Presentations—Gelato ICE Itanium® Conference & Expo | San Jose | April 2006
Over 200 scientists, developers, and engineers from 80 companies and institutions met in San Jose, California, for the April 2006 Gelato ICE: Itanium® Conference & Expo. Attendees addressed current high-performance computing issues and collaborative solutions specific
to Linux on the Intel Itanium architecture. Over a 3-day period, attendees were treated to
65+ technical presentations by some of the top research and industry users of Linux on
the Itanium-based platform.
Aside from the presentations and discussions, attendees participated in a variety of social
events. In addition to the technical presentations listed below, view some of the
photographs from the meeting.
Keynotes
Itanium: Its Rationale and Potential from an HP Labs Perspective, William S. Worley - Secure64 & Itanium Solutions Alliance
Trends in Computer System Design, Jerry Huck - HP
The Road Ahead: Intel Itanium Architecture and Software, Don Soltis & James Reinders - Intel
General Interest
Welcome, Mark K. Smith - Gelato Central Operations
Basic Itanium Architecture, Cameron McNairy - Intel
An Evaluation of High Performance Octave on Itanium, Ashok Krishnamurthy - Ohio Supercomputer Center
Highlights of the Upcoming October Gelato Conference, Jon Lau - National Grid Office
An Overview of Common Interconnects for Commodity Clusters, Doug Johnson - Ohio Supercomputer Center
Numerical Computation Tools for Itanium, Matthieu Delahaye & Shailesh Patel - Gelato Central Operations
Topics for Enterprise
Oracle: An Enterprise Itanium Use Case Study, Brian Hirano - Oracle
An Update on Xen on Itanium, Alex Williamson - HP
Enterprise Graphics on IPF, Hansong Zhang - SGI
MCA: Machine Check Architecture, Cameron McNairy - Intel
Itanium Solutions Alliance Developer Days
Itanium Architecture, Cameron McNairy - Intel
Hardware Overview, Jeff Donsbach - HP
Itanium Firmware (EFI), Jeff Donsbach - HP
A Systematic Approach to Tuning Software, Sverre Jarp - European Organization for Nuclear Research
Completing a Successful Migration, Jeff Donsbach - HP
Tools and Tuning
Columbia Application Tuning Case Studies, Johnny Chang - National Aeronautics and Space Administration
HP Caliper: An Update to the Linux IPF Performance Tool, Curt Wohlgemuth & Steve Williams - HP
VTune Update, Paul M. Cohen - Intel
An Update on the Current State of Open|SpeedShop, Jack Carter - SGI
Valgrind, Julian Seward - OpenWorks
A Dynamic Instrumentation-Based System for Building Program Analysis Tools for the IPF Platform, Jasper Kamperman - Intel
OpenMP: Past, Present, and Future, Timothy Mattson - Intel
Update on the Perfmon2 Interface, Stéphane Eranian - HP
Focus on GCC
The ISP RAS Effort to Improve GCC for Itanium, Arutyun I. Avetisyan - Institute for System Programming, Russian Academy of Sciences
GCC IP Issues, Dan Berlin - Google
Open64: An Alternative Backend for GCC, Shin-Ming Liu - HP
Aliasing in GCC, Dan Berlin - Google
Superblock Update, Robert Kidd - University of Illinois at Urbana-Champaign
An Interblock VLIW-Targeted Instruction Scheduler for GCC, Andrey Belevantsev - Institute for System Programming, Russian Academy of Sciences
Parallel Programming with GCC, Diego Novillo - Red Hat
LTO: A Brief Introduction, Mark Mitchell - CodeSourcery
LLVM: A Brief Introduction, Chris Lattner - Apple
Focus on Scalability
Blktrace: An Overview, Alan Brunelle - HP
Scaling Linux to 512 Processors and Beyond, John Hawkes - SGI
NFS Performance, Peter Chubb - University of New South Wales
Scalability Mini-Track Wrap Up, Lee Schermerhorn - HP
Advanced Topics
Mathematical Modeling to Formally Prove Correctness, John R. Harrison - Intel
Kernel Optimization for Enterprise Workloads, Kenneth Chen - Intel
Suggested Improvements in Itanium and Software, Clemens C. J. Roothaan - Gelato Honorary Member
Evolution of PCI IO: A Linux IO Geek's Perspective on HW, Grant Grundler - HP
Local and Remote Memory: Memory in a NUMA System, Christoph Lameter - SGI
Decimal Floating-Point, John Crawford - Intel
The Itanium Vector Math Library (VML), Clemens C. J. Roothaan - Gelato Honorary Member
Research
Preparing for the First Beam at the LHC, Lawrence Pinsky - University of Houston
Computing Optimal Equilibrium Strategies for Network Economies, Alejandro Jofré - University of Chile
Mathematical Libraries and the Implementation of Parallel Solvers for Engineering, Hugo Daniel Scolnik - University of Buenos Aires
Superpages / VM Work, Ian Wienand - University of New South Wales
Experiences on the Itanium-Based Grid Test Bed at UPRM, Wilson Rivera - University of Puerto Rico Mayaguez
In Search of Collaboration, Ping-Hui Kao - HP
Itanium Virtualization and vNUMA, Matthew Chapman - University of New South Wales
Bioinformatics in Biomining, Nicholas Loira & Andres Aravena - University of Chile
Keynotes
Monday
Itanium: Its Rationale and Potential from an HP Labs Perspective
The Intel/HP Itanium architecture definition effort started with the
results of an HP Labs research program, called PA Wide Word internally,
conducted from January 1990 to December 1993. Concepts and conclusions
formulated during this research program established technical
principles for a fundamental advance in processor architecture and led
to the Intel/HP partnership. Less noticed in published accounts is the
fact that many capabilities Intel and HP jointly innovated in the
Itanium architecture were specifically designed to enable construction
of secure systems. Non-security objectives have led modern
general-purpose operating systems to continue to rely upon a more than
40-year-old, CPU-only, hardware protection model. This limited hardware
protection model simply is incapable of supporting the levels of
remote-attack security required in today's massively complex systems,
in today's online world. As a result, we find vulnerable servers
surrounded by vulnerable external protective appliances. All require
periodic patching and re-testing. It's not clear the good guys are
winning.
Intel's Itanium 2 systems now offer the means for building "inherently
secure" systems. Inherently secure means that the software controlling
the hardware platform has specific, strong security properties. Without
an inherently secure foundation, the current trends of virtualizing
servers and consolidating network protective appliances magnify, rather
than mitigate, security risks. Secure64's inherently secure hardware
platform control software fully utilizes the capabilities of the
Itanium architecture to provide such a foundation. This offers
substantial benefits for information systems and infrastructures, and
can establish Itanium hardware platforms as the winners both for secure
consolidation and for secure virtualization.
William S. Worley - Secure64 & Itanium Solutions Alliance
Dr. William (Bill) Worley Jr. is the CTO of Secure64 Software Corporation.
He is a retired HP Fellow (Chief Scientist and Distinguished
Contributor) and a Commissioner of the Colorado Governor's Science and
Technology Commission. He received an MS (Physics) and MS (Information
Science) from the University of Chicago and a PhD (Computer Science)
from Cornell University. Bill is a system architect. At HP, he directed
the team that developed the PA RISC architecture. He later directed the
development of the PA Wide Word architecture, the foundation for the
HP/Intel partnership that led to the Itanium 2 microprocessor family.
Prior to HP, during 13 years with IBM, he contributed to architectures
for mainframes, storage systems, and IBM's first RISC architecture. In
the years prior to his retirement from HP, Bill focused upon hardware
and software architectures for secure systems. Following retirement,
Bill joined Secure64 Software as a co-founder and CTO. Secure64 has
developed a multi-core platform control system, including a queued,
asynchronous network stack, which fully exploits the security
capabilities of the Itanium architecture.
Tuesday
Trends in Computer System Design
This presentation will examine the issues and tradeoffs in high-performance
commercial system design. The current family of chipsets and system
enclosures from HP will be used to examine how system requirements
influence design choices. These requirements include performance,
power, reliability, availability, serviceability, and manageability.
Jerry Huck - HP
Jerry Huck is an HP Fellow with HP's server global business unit that
produces the Itanium-based HP Integrity servers running HP-UX, Linux,
OpenVMS, and Windows operating environments. He participates in
technology and strategy development. This includes work
on platform and processor architecture, virtualization, performance
analysis, and manageability solutions. Huck joined HP in 1983 and
participated in the development of HP's PA-RISC architecture
specializing in floating-point and virtual memory definition. He and
his team developed the 64-bit instruction set extensions to PA-RISC in
the early 1990s. Starting in 1994, Huck led the HP side of the
instruction set and platform definition team for the co-developed Intel
Itanium architecture. He continues to evangelize HP's server offerings
with customers and industry analysts. He received his PhD from Stanford
and holds more than 15 patents in computer architecture and design.
Wednesday
The Road Ahead: Intel Itanium Architecture and Software
Join us to hear about the road ahead for Itanium processors from hardware
and software experts working at Intel. Itanium processor-based systems
are winning in traditional RISC markets such as scalable enterprise,
high-performance computing (HPC) and mainframe replacement. These
markets require robust throughput, scalar and floating-point (FP)
performance of the processor, as well as its memory and I/O system.
Future Itanium processor designs will feature increased core count,
higher operating frequencies, increased memory bandwidth, and lower
memory latency. Itanium processor-based systems span from a few to
thousands of processors. Itanium processors are well
suited for these varied environments because of high reliability, agile
configurability, and strong software support. Enhanced reliability
results from specialized soft error resistant circuits, integrated
checking and error recovery algorithms along with extensive error
checking and correction of array elements and datapaths. Processor
reliability is increasingly critical for virtualization, because many
virtual processors may run on each physical processor, and an
unrecoverable soft error in one physical processor would affect all of those
virtual processors. Intel designs excel at addressing this challenge.
Itanium processor designs also benefit greatly from the unmatched
manufacturing capabilities and silicon processing experience of Intel,
as well as a strong software ecosystem and excellent software
development products.
Don Soltis - Intel
Don Soltis is a Senior Principal Engineer at Intel and has spent the past
10 years on Itanium CPU architecture, design, and development. He has
20 years of experience in CPU and ASIC design, working on PA-RISC CPUs,
I/O, memory and graphics chips. His favorite activity is freshwater
fly-fishing in the Colorado mountains and saltwater fly-fishing in
southwest Florida.
James Reinders - Intel
James Reinders is a Senior Engineer who joined Intel Corporation in 1989 and
has contributed to projects including the world's first TeraFLOP
supercomputer (ASCI Red), compilers, and architecture work for the
iWarp, Pentium Pro, Pentium II, Itanium, and Pentium 4 processors.
Reinders is currently the Director of Business Development and
Marketing for Intel's Software Development Products and serves as the
chief evangelist and spokesperson. He has been a leader in the creation
of Intel's Software Products including product plans, support,
technical marketing, marketing, and business development. Reinders is
also the author of a recent book titled "VTune Performance Analyzer
Essentials."
General Interest
Monday
Welcome
Welcome, introduction, and overview of Gelato Federation activities.
Mark K. Smith - Gelato Central Operations
Mark K. Smith is the Managing Director of the Gelato Federation. He works
with Federation members and sponsors around the world, fostering
collaborative relationships among members, sponsors, and the general
community to advance the Linux-Itanium platform. Mark leads a technical
team at the University of Illinois and dedicates time to educating the
general community about the advantages of the platform. Prior to
joining Gelato, he worked in the software industry for 10 years. Mark
holds a PhD in Engineering from the University of Illinois.
Basic Itanium Architecture
The Itanium architecture and the paradigm of explicit parallel instruction
computing (EPIC) are often poorly understood. This presentation will
cover important aspects of the EPIC paradigm, including software
pipelining, register save engine, predication, parallel instruction
groups, data and control speculation, and many other mysteries of the
Itanium application and system architectures.
Cameron McNairy - Intel
Cameron McNairy is a Principal Engineer and an Intel Architect for the
Montecito program. Before Montecito, Cameron was a micro-architect
for the Itanium 2 processor, contributing to its design and final
validation. He plans to focus on performance, RAS (reliability,
availability, serviceability), and system interface issues in the
design of future IPF products. He came to the Itanium 2 team soon after
its inception from performance work on the first Itanium processor.
Cameron received a BSEE and an MSEE from Brigham Young University. He
is a member of the Institute of Electrical and Electronics Engineers.
An Evaluation of High Performance Octave on Itanium
GNU Octave is a MATLAB-style interactive application for performing
numerical computations. The Octave language is mostly compatible with
MATLAB. MATLAB (and Octave) are being used as an executable
specification language to develop synthetic compact applications for
the DARPA HPCS program. This work has identified a clear need for a
MATLAB-style interpreter that can handle large address spaces, run on
multiple processors, and leverage high-performance interconnects. The
Ohio Supercomputer Center (OSC), Ohio State University, and Indiana
University have been collaborating on research and software
technologies for parallel Octave. We have constructed a version of
parallel Octave for the Itanium 2 cluster at OSC. This interpreter has
a 64-bit address space for large matrix support and uses the
high-bandwidth Myrinet interconnect. This talk will review the software
architecture, performance and scalability of parallel Octave on the OSC
Itanium 2 cluster.
Ashok Krishnamurthy - Ohio Supercomputer Center
Ashok Krishnamurthy is the Director of Research and Scientific Development at
Ohio Supercomputer Center and an Associate Professor of Electrical and
Computer Engineering at Ohio State University. His research interests
are signal and image processing, high-performance computing
applications, and data mining. His undergraduate degree in Electrical
Engineering is from the Indian Institute of Technology, Madras, and his
MS and PhD in Electrical and Computer Engineering are from the
University of Florida. He is involved in the DARPA High Productivity
Computing Systems Program and the DoD High Performance Computing
Modernization Program.
Wednesday
Highlights of the Upcoming October Gelato Conference
This presentation will highlight the next Gelato ICE: Itanium Conference & Expo, to be held October 1-4 in Singapore.
Jon Lau - National Grid Office
Jon Lau is the Assistant Head (Technical) at the National Grid Office (NGO) as well as the Technical Manager of the National Grid Pilot Platform (NGPP).
He coordinates the technical issues of the NGPP and virtual grid
communities, which span from network and security to middleware
software. He developed the first Access Grid (AG) node in Singapore and
has since seen the deployment of several sites in Singapore. He is
currently involved in several other initiatives such as the Global
Operational Grid, the Digital Media Grid Rendering Service, and the
SG@Schools PC-Grid. Prior to joining the NGO in January
2003, he was the Director of Engineering at eXage Private Limited, a
high-tech spin-off from the Kent Ridge Digital Labs (KRDL), where he
led the development team in designing a scalable architecture ready to
evolve to meet the needs of eXage's customers. Jon's technological
experience draws on hardware interests and software R&D
work at the Information Technology Institute and KRDL. The many
projects that Jon has been involved with include WinViz, a data
visualization tool, as well as the Expert Advisory System on the
Internet (a national project), where he performed the role of a
Technical Manager. Jon holds a Bachelor's Degree in Computing and a
Master of Technology, both from the National University of Singapore.
An Overview of Common Interconnects for Commodity Clusters
There is a wide variety of interconnects for commodity clusters. Determining
the appropriate network when constructing a new cluster can be
a daunting task. This presentation intends to give a hardware
overview of the more common interconnects available, their performance,
and a comparison of the software available for the hardware.
Doug Johnson - Ohio Supercomputer Center
Doug Johnson is the Technical Lead for the Cluster Ohio project and
production Linux clusters at OSC. He has worked on many projects to
address usability and manageability of clusters of commodity systems.
His current areas of interest include grid meta-scheduling, storage
for clusters, and high-availability services for clusters.
Numerical Computation Tools for Itanium
If you are working on developing new algorithms (signal processing, voice
encoding, etc.), analyzing and visualizing data, or simply performing
scientific and numerical computations like matrix operations, several
tools are available today to help you. These applications usually
manipulate large amounts of data and perform CPU intensive operations.
Therefore, the Itanium processor is a suitable platform. We will
explore the available solutions (MATLAB, Octave, and Scilab) and the
functionality they offer, then present the results of our informal
benchmarking/speed comparison tests and discuss the planned evolution.
Matthieu Delahaye - Gelato Central Operations
Although Matthieu Delahaye has worked on the Gelato portal since its creation in
2002, he officially joined Gelato Central Operations as a Software
Engineer in August 2004. In addition to maintaining the Gelato portal,
Matthieu works on Gelato Coconut, Gelato Vanilla, and other challenging
infrastructure and development projects around the Itanium processor.
Matthieu made his first kernel hacks while involved in the parisc-linux
port effort, and then joined the Debian Project. At the same time, he
received an MS in Computer Science from ESIEE, where he subsequently
worked for two years in the IT Department.
Shailesh Patel - Gelato Central Operations
Shailesh Patel was born in India and grew up in Dubai, UAE. He graduated from
the National Institute of Technology (NIT), India with a BS in
Engineering and then completed his MS in Computer Engineering from
California State University, Long Beach. He has worked as a J2EE
developer, creating software for the subtitling and marketing industry.
At the University of Illinois at Urbana-Champaign, he worked with the
SandBox group and the openIMPACT team. Currently, Patel works for
Gelato Central Operations on the Vanilla project, developing optimized
binaries for the Itanium platform.
Topics for Enterprise
Monday
Oracle: An Enterprise Itanium Use Case Study
Oracle's 4-way Itanium 2 TPC-C benchmarks, announced in November of 2002, were
the culmination of a two-year project involving engineers from Intel,
HP, and Oracle. Since that time, multiple groups in Oracle and Intel
have continued to work closely on multiple versions of Oracle and
Linux-based Itanium platforms to ensure performance and stability for
enterprise solutions. This talk discusses the initial performance work
and how Oracle's and Intel's focus has evolved, and presents some of
the current areas Oracle and Intel are jointly investigating.
Brian Hirano - Oracle
Brian Hirano is a Consulting Member of the technical staff in Server
Technologies at Oracle Corporation. He leads the Oracle effort to
release TPC-C benchmarks on McKinley-based Itanium platforms on Linux,
HP-UX, and Windows, working with teams from Intel and HP. In addition
to his development duties in the Oracle Database's Virtual Operating
System group, Brian also works with hardware and operating system
vendors on Oracle-related issues.
An Update on Xen on Itanium
Xen is rapidly becoming the de facto standard for open-source
virtualization, with capabilities and performance matching or exceeding
leading industry products. Paravirtualization techniques, efficient
inter-domain virtual I/O mechanisms, clever migration, and support for
multiple architectures (including VT and Pacifica hardware) have
contributed to a large, broad base of developers and piqued industry
interest. Xen/ia64 is the first non-x86 architecture supported by Xen.
It is still a work in progress, but the core hypervisor component
utilizes code and/or experience from Xen, Linux/ia64, and the HP
vBlades research project. Many interesting strategies are employed to
ensure correctness, optimize performance, and leverage the many rapidly
developing layers of tools provided by Xen. We will provide
a brief overview of virtualization in general, Xen specifically, and
the current status of Xen/ia64. Then, we will spend the remaining time
discussing some interesting details about the inner workings of Xen on
Itanium.
Alex Williamson - HP
Alex Williamson is a member of HP's Open Source and Linux Organization
focusing on HP Integrity enablement and more recently Xen/ia64. Alex
has been involved with Linux/ia64 since 2000 and has made numerous
contributions to the Linux kernel.
Tuesday
Enterprise Graphics on IPF
Large shared memory architectures enable friendly programming models and allow
efficient processing and visualization of large data sets produced in
the areas of computer-aided design (CAD), science and engineering
simulations, and new high-resolution sensor technology. In this talk,
we'll look at the SGI Altix multiprocessor systems as an example of
large shared memory architectures. We'll then showcase applications
that have a large memory footprint in genome matching and visualization
of CAD and high-resolution sensor data.
Hansong Zhang - SGI
Dr. Hansong Zhang leads CPU-based visualization efforts at SGI, where he
advocates the cross-pollination between parallel computing,
visualization, and media applications. Prior to SGI, Zhang worked at
nVidia on real-time special effects. He was also the graphics architect
at Intrinsic Graphics, a vendor of cross-platform game software. Zhang
received his degree from the University of North Carolina, Chapel Hill.
MCA: Machine Check Architecture
The Itanium Machine Check Architecture (MCA) is at the center of the
Itanium reliability, availability, and serviceability (RAS) approach.
Itanium's MCA defines methods and requirements that tie together the
processor, processor abstraction layer (PAL), system abstraction layer
(SAL), operating system (OS), and application. This presentation will
cover the various components and their roles, and then turn the focus
to the MCA foundations: the PAL and the processor that it abstracts.
Itanium Solutions Alliance Developer Days
Wednesday
Itanium Architecture
The Itanium architecture and the paradigm of explicit parallel instruction
computing (EPIC) are often poorly understood. This presentation will
cover important aspects of the EPIC paradigm, including software
pipelining, register save engine, predication, parallel instruction
groups, data and control speculation, and many other mysteries of the
Itanium application and system architectures.
Hardware Overview
This presentation will be an overview of the current Itanium product lines
offered in the marketplace and a quick summary of the integral system
specifications that set these systems apart.
Jeff Donsbach - HP
Jeff Donsbach is a Senior Software Engineer in HP's Solutions Alliances
Engineering organization and Linux Expertise Center. The group helps
ISVs, large and small, port and optimize their applications for HP
platforms. Jeff has 20+ years of application development experience on
various UNIX flavors and 15 years of experience working with 64-bit
systems. Jeff has worked with a wide range of applications and ISVs
from various industries, including databases, CAD/CAM, software development
tools, molecular modeling, high-performance computing, and middleware.
Itanium Firmware (EFI)
This session provides an overview of the Extensible Firmware Interface
(EFI), which is used to manage system boot, install, diagnostics, and
firmware properties.
A Systematic Approach to Tuning Software
In this talk, we will look at performance optimization and bottleneck
identification. In order to optimize an application, one needs to
understand the "phase space" defined by the hardware and the external
software. One also needs to understand the application itself: the
algorithms used and the overall impact on the hardware platform.
Furthermore, one needs to know which hardware/software tools are
available for performance work. This talk will therefore try to define a systematic and detailed approach in this field:
- Definition of the hardware/compiler phase space:
  - CPU specifications (frequency, microarchitectural features), multi-core
    designs, cache sizes, bus speeds, chip sets, I/O rates, etc.
  - Compilers (versions, features, flags, etc.) and how the compilers cope
    with the application software, algorithms, programming style, etc.
- Review of performance tools (hardware/software)
- Illustration of measurements inside our phase space with a few applications, ideally in three forms:
  - Software kernels (testing only one feature at a time)
  - Well-known physics benchmark jobs (typically with emphasis on one
    physics feature, such as tracking in detector geometries)
  - Full-blown applications (e.g., a physics simulation framework)
The talk will hopefully provide some answers, and also give the audience enough "ammunition" to get started on their own.
Sverre Jarp - European Organization for Nuclear Research
Sverre Jarp is the Chief Technology Officer at CERN's
openlab for DataGrid Application, which is a joint collaboration with
industry in order to assess leading-edge information technology for the
Large Hadron Collider's Computing Grid in 2007. He has been working in
computing at CERN, the European Organization for Nuclear Research, for
over 30 years and has held various managerial and technical positions
promoting advanced but cost-effective computing solutions for the
laboratory. In 2001-02, he spent a sabbatical year at HP Labs, Palo
Alto, California, USA, working on software for the Itanium Processor
Family. His current field of interest is compiler optimization. Jarp
holds a degree in Theoretical Physics from the Norwegian University of
Science and Technology in Trondheim.
Completing a Successful Migration
This session will cover tips on evaluating, locating, and resolving
migration problems before they happen, and will point to additional
resources on Linux solution migrations.
Tools and Tuning
Monday
Columbia Application Tuning Case Studies
This talk will present several case studies of application performance
enhancements on the SGI Altix platform. The enhancements include both
explicit (dplace) and implicit (cpubind/cpuset_pin) process pinning,
eliminating memory contention in OpenMP applications, eliminating
unaligned memory accesses, and system profiling. These enhancements
enabled 2- to 20-fold improvements in application performance.
Johnny Chang - National Aeronautics and Space Administration
Johnny Chang is a member of the Application Performance and Productivity group
at the NASA Advanced Supercomputing (NAS) Division located in Moffett
Field, California. He is part of a group that provides consulting
services to the 700+ users of the Columbia supercomputer, a cluster of
twenty 512-processor SGI Altix systems. His work includes code porting,
debugging, tuning and optimization, and code scaling. Johnny received
his PhD in Chemical Physics from the University of Texas at Austin in
1985. He has published papers in multi-photon dynamics, quantum
scattering, path-integral methods, quantum functional sensitivity
analysis, and, most recently, weather modeling.
HP Caliper: An Update to the Linux IPF Performance Tool
HP Caliper is a sophisticated general-purpose performance analysis tool
that takes advantage of the Itanium processor's advanced performance
monitoring unit to provide detailed and accurate performance
measurements at the application and system level with minimal
perturbation to the system's behavior. Besides an overview
of HP Caliper, we will discuss new features, including system-wide
profiling and a new graphical user interface based on the rich client
platform of Eclipse.
Curt Wohlgemuth - HP
Curt Wohlgemuth is an engineer on the HP Caliper project. He has worked at
HP for many years, primarily in the areas of language tools, dynamic
translation, and performance tools.
Steve Williams - HP
Stephen Williams is a member of the HP Caliper team and has worked at HP for
the past 17 years. He has worked on debuggers and performance tools and
has specialized in user interfaces.
VTune Update
This talk will cover what's new for tuning Intel Itanium 2-based
applications, including native Eclipse IDE support and NUMA-aware
data collection.
Paul M. Cohen - Intel
Paul Cohen is a Performance Tools Product Line Marketing Manager at Intel.
He is responsible for Intel tools targeted at improving the performance
of customer applications. His current focus is improving usability of
the VTune Performance Analyzer, making it a robust enterprise-grade
solution able to deal with extremely large executables (100MB+) that
other products are unable to profile. In addition, he is working on
integration of the VTune Analyzer with the Intel C and Fortran compilers
under Eclipse with the ability to provide a close connection between
Intel compiler optimization reports and performance bottlenecks
represented in the VTune Analyzer.
An Update on the Current State of Open|SpeedShop
Open|SpeedShop is SGI's next-generation Linux performance analysis
tool. Based on the concepts of SGI's IRIX SpeedShop, Open|SpeedShop is
designed to be modular and easily extendable. It supports the concept
of plugins, which allow users to create their own performance
experiments. Another key feature of the performance tool is its
usability. Its user interface is designed for scientists in general,
not just computer scientists. Open|SpeedShop currently supports four user
interfaces: a GUI, an interactive command line, batch command files, and a
pure Python module. The Open|SpeedShop baseline functionality includes
support for single system image (SSI) machines and for clusters (i.e.,
multiple OS kernels). Current experiments include exclusive and inclusive
user time, program counter (PC)
sampling, MPI call tracing, input/output tracing, floating point
exception tracing, and CPU hardware performance counter experiments.
Open|SpeedShop enables FORTRAN (77, 90, and 95), C, and C++ programmers
to use an advanced performance analysis tool within the open-source
environment. The infrastructure and base components are released as
open source under the GPL and LGPL licenses. Open|SpeedShop is being
co-funded by the Department of Energy (DOE).
Jack Carter - SGI
Jack Carter has over 20 years of experience working with compilers and
compiler-related tools, with extensive work in linkage and post-linkage
object transformation technology. Currently he is a member of
SGI's Open|SpeedShop team.
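
Program counter (PC) sampling is one of the Open|SpeedShop experiments listed above. The idea is simple: interrupt the program periodically, record the PC, and afterwards attribute each sample to the function whose address range contains it. The Python sketch below shows only that attribution step, and it rests on several assumptions. The symbol table `SYMBOLS` and the sample addresses are hypothetical. Each function is assumed to run up to the next symbol's start address. Nothing here reflects Open|SpeedShop's actual collectors or plugin interface.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical symbol table: (start address, function name), sorted by address.
# Each function is assumed to extend up to the next entry; "_end" marks the end.
SYMBOLS = [(0x4000, "main"), (0x4200, "solve"), (0x4800, "dot_product"), (0x5000, "_end")]
STARTS = [addr for addr, _ in SYMBOLS]


def function_for_pc(pc):
    """Return the name of the function containing pc, or None."""
    i = bisect_right(STARTS, pc) - 1
    if i < 0 or i >= len(SYMBOLS) - 1:  # before the first symbol or past "_end"
        return None
    return SYMBOLS[i][1]


def pc_histogram(samples):
    """Attribute each sampled PC to a function and count samples per function."""
    return Counter(function_for_pc(pc) or "<unknown>" for pc in samples)


# Hypothetical PC values, as a periodic sampling timer might record them.
samples = [0x4010, 0x4850, 0x4900, 0x4204, 0x4FFF, 0x6000]
for name, count in pc_histogram(samples).most_common():
    print(f"{name:12s} {count:3d} samples ({100 * count / len(samples):.1f}%)")
```

The binary search (`bisect_right`) keeps each lookup logarithmic in the number of symbols. That matters when millions of samples are mapped against a large executable.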

Tuesday
Valgrind
Valgrind is a GPL'd suite of simulation-based debugging and profiling tools for
Linux. Around a common core, a number of tools have been built, two of
which are Memcheck, a memory error detector, and Cachegrind, a
low-level cache profiler. The system is structured as a common core,
which provides CPU virtualization, debug info management, and error
management, and handles other simulation nasties, particularly signals,
threads, and syscalls. The rich set of services provided by the core
makes it relatively easy to build sophisticated dynamic analysis tools.
The project Web site is http://www.valgrind.org. Valgrind
currently runs on {x86,amd64,ppc32,ppc64}-linux. A key component is
dynamic-translation based CPU virtualization. This converts blocks of
code into an architecture-neutral intermediate representation, hands
them to the currently active tool for instrumentation, and then
re-synthesizes runnable code from them. In this talk, I will take a
look at the challenges of porting this and other important Valgrind
components to Itanium.
Julian Seward - OpenWorks
Julian Seward founded the Valgrind project in 2000 and is the project lead and
a full-time developer. His background is in compiler technology for
functional programming languages. He worked for several years on the
Glasgow Haskell Compiler, an open-source compiler for the functional
language Haskell, with earlier
postdoctoral work on compilation of a hybrid functional/OO language.
More recently, he led a small group developing a vectorizing code
generator for SIMD architectures. He holds a PhD in Computer Science
from the University of Manchester, UK. He is heavily involved with
open-source software and is also the author of bzip2, a widely used
lossless compression program.
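
The Valgrind abstract describes a dynamic-translation pipeline: a block of guest code is converted to an architecture-neutral IR, the active tool instruments it, and runnable code is then re-synthesized. That pipeline can be illustrated at toy scale. The sketch below is hypothetical throughout. The tiny register-machine ISA, the IR opcode names, and the load/store-counting "tool" are inventions for illustration. "Re-synthesis" here means generating and compiling Python source rather than host machine code. None of it is Valgrind's actual IR or tool interface.

```python
from collections import Counter

# Hypothetical toy guest ISA (not IA-64); each instruction is a tuple:
#   ("li", rd, imm)      rd = imm
#   ("add", rd, ra, rb)  rd = ra + rb
#   ("ld", rd, ra)       rd = mem[ra]
#   ("st", rs, ra)       mem[ra] = rs
#   ("bnz", rs, target)  if rs != 0 goto target; ends a block
#   ("halt",)            stop; ends a block

def translate_block(code, pc):
    """Front end: turn the guest block starting at pc into a flat IR whose
    only contact with guest registers is through GET/PUT operations."""
    ir, next_tmp = [], 0

    def tmp():
        nonlocal next_tmp
        next_tmp += 1
        return next_tmp - 1

    while True:
        ins = code[pc]
        op = ins[0]
        if op == "li":
            t = tmp()
            ir += [("CONST", t, ins[2]), ("PUT", ins[1], t)]
        elif op == "add":
            a, b, c = tmp(), tmp(), tmp()
            ir += [("GET", a, ins[2]), ("GET", b, ins[3]),
                   ("ADD", c, a, b), ("PUT", ins[1], c)]
        elif op == "ld":
            a, v = tmp(), tmp()
            ir += [("GET", a, ins[2]), ("LOAD", v, a), ("PUT", ins[1], v)]
        elif op == "st":
            a, v = tmp(), tmp()
            ir += [("GET", a, ins[2]), ("GET", v, ins[1]), ("STORE", a, v)]
        elif op == "bnz":
            c = tmp()
            return ir + [("GET", c, ins[1]), ("EXIT_IF", c, ins[2]), ("GOTO", pc + 1)]
        elif op == "halt":
            return ir + [("GOTO", None)]
        pc += 1

def instrument(ir):
    """Tool pass: insert a counter update before every load and store."""
    out = []
    for op in ir:
        if op[0] in ("LOAD", "STORE"):
            out.append(("COUNT", op[0].lower()))
        out.append(op)
    return out

# Back-end templates: one line of Python source per IR operation.
TEMPLATES = {
    "CONST": "t{1} = {2!r}",
    "GET": "t{1} = regs.get({2!r}, 0)",
    "PUT": "regs[{1!r}] = t{2}",
    "ADD": "t{1} = t{2} + t{3}",
    "LOAD": "t{1} = mem.get(t{2}, 0)",
    "STORE": "mem[t{1}] = t{2}",
    "COUNT": "stats[{1!r}] += 1",
    "EXIT_IF": "if t{1} != 0: return {2!r}",
    "GOTO": "return {1!r}",
}

def synthesize(ir):
    """Re-synthesis: emit Python source for the block and compile it
    (standing in for generation of host machine code)."""
    src = ["def block(regs, mem, stats):"]
    src += ["    " + TEMPLATES[op[0]].format(*op) for op in ir]
    namespace = {}
    exec("\n".join(src), namespace)
    return namespace["block"]

def run(code, regs, mem):
    """Dispatcher: translate each block once, cache it, and reuse it."""
    cache, stats, pc = {}, Counter(), 0
    while pc is not None:
        if pc not in cache:
            cache[pc] = synthesize(instrument(translate_block(code, pc)))
        pc = cache[pc](regs, mem, stats)
    return stats, len(cache)

# Guest program: sum mem[100..102] into r3, storing each partial sum at mem[200].
program = [
    ("li", "r1", 100), ("li", "r2", 3), ("li", "r3", 0),
    ("li", "r4", 1), ("li", "r5", -1), ("li", "r7", 200),
    ("ld", "r6", "r1"), ("add", "r3", "r3", "r6"),  # loop body starts at index 6
    ("add", "r1", "r1", "r4"), ("add", "r2", "r2", "r5"),
    ("st", "r3", "r7"), ("bnz", "r2", 6), ("halt",),
]
regs, mem = {}, {100: 5, 101: 7, 102: 9}
stats, blocks = run(program, regs, mem)
print(regs["r3"], mem[200], dict(stats), blocks)
```

`run` caches translated blocks by guest address, and this is what makes the approach affordable. The loop body is translated once and then re-executed from the cache on later iterations.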

Wednesday
A Dynamic Instrumentation-Based System for Building Program Analysis Tools for the IPF Platform
We will present a dynamic instrumentation-based system called Pin for
building a variety of program analysis tools for the IPF platform. In
this talk, we will introduce the basic concepts of dynamic
instrumentation and provide details of the inner workings of this
system. We will also talk about various optimizations performed by
this system to ensure that programs running under the control of Pin
perform reasonably well. Specific features of IPF that create
challenges for building a system like Pin will be explored. We will
provide several real world examples of how this system has been used
for building program analysis tools. We will also talk about various
applications of this system in building tools for architecture research
and performance analysis.
Jasper Kamperman - Intel
Jasper Kamperman is the Product Manager for the Performance Tools Lab in
Intel's Developer Products Division. He has a Master's Degree in
Physics from the University of Utrecht in the Netherlands and holds a
PhD in Computer Science from the University of Amsterdam. Jasper has
presented at numerous conferences and published in scientific as well
as trade journals. Before joining Intel, Jasper was the Director of
Product Management at Reasoning, Inc. Previous engagements include a
position as researcher at CWI, the Dutch Center for Mathematics and
Computer Science, and consultant with ID Research (now Ordina
Research), a high-tech consultancy firm.

OpenMP: Past, Present, and Future
As the industry moves to multi-core processors, multi-threaded software
will be essential. OpenMP is the industry-standard API for writing
multi-threaded software. It is focused on the needs of applications
programmers and attempts to make it relatively simple to write parallel
software. In this talk, we will discuss the history of OpenMP, some of
the more innovative ways it's being used today, and OpenMP innovations
you can expect to see in the future.
Timothy Mattson - Intel
Tim Mattson earned a PhD for his work on quantum molecular scattering
theory (UCSC, 1985). This was followed by a postdoc at Caltech where
he worked on the Caltech/JPL hypercubes. Since then, he has held a
number of commercial and academic positions with high performance
computers as the common thread. Application areas have included
mathematics libraries, exploration geophysics, computational chemistry,
molecular biology, and bioinformatics. Dr. Mattson joined
Intel in 1993. Among his many roles at Intel, he was applications
manager for the ASCI teraFLOPS project, helped create OpenMP, founded
the Open Cluster Group (OSCAR), and launched Intel's programs in
computing for the Life Sciences. Currently, Mattson is conducting
research on abstractions that bridge across parallel system design,
parallel programming environments, and application software. This work
builds on his recent book on Design Patterns in Parallel Programming
(written with Professors Beverly Sanders and Berna Massingill and
published by Addison-Wesley). The patterns provide the "human angle"
and help keep his research focused on technologies that help general
programmers solve real problems.

Update on the Perfmon2 Interface
In this short presentation, we will update the audience on the progress
of the perfmon2 interface. What are the latest features on Itanium and
other architectures? We will cover the user-level tools and Montecito
support, and will report on our progress toward getting the implementation
accepted into the mainline kernel for all major platforms.
Stéphane Eranian - HP
Stéphane Eranian is a Senior Research Scientist at HP Labs, where he has been
working on porting Linux to the IA-64 platform since 1998. He
has made numerous contributions to the Linux/IA-64 kernel and related
user-level programs. He is the main architect of the Linux/IA-64 kernel
performance monitoring subsystem (perfmon). He is also the creator of
the pfmon tool, which uses this subsystem to collect performance
information. Before joining HP, Stéphane worked on his PhD
at Chorus Systems (now Jaluna) in France. He holds a D.E.A. (BSc
degree) in Operating systems from Universite PARIS 6, France, and a
Doctorate (PhD degree) in Computer Science from Universite PARIS 7,
France. He is a member of USENIX and co-author of "IA-64 Linux Kernel:
Design and Implementation."

Focus on GCC

Monday
The ISP RAS Effort to Improve GCC for Itanium
Ongoing work at ISP RAS on improving GCC for Itanium processors will be
presented. Discussion will cover a past project with HP on improving
GCC instruction scheduling and the current effort on implementing a new
VLIW-targeted instruction scheduler. Future plans for improving GCC for
Itanium and potential collaboration projects will also be presented,
including plans for a GCC meeting in Moscow this summer.
Arutyun I. Avetisyan - Institute for System Programming, Russian Academy of Sciences
Arutyun Avetisyan is the Deputy Director of the Institute for System Programming (ISP)
at the Russian Academy of Sciences (RAS) in Moscow, Russia. His
research interests include parallel and distributed programming, cluster
and grid technologies, and compiler technologies. Dr. Avetisyan leads a
project on a model-based parallel program performance tuning system.

GCC IP Issues
This talk will cover a variety of intellectual property issues that come up when working on GCC, including:
- Copyright: Assignments of copyright, and how we deal with issues of
contributions of code from other open source/commercial projects.
- Patents: How we deal with them in GCC, and what we require of companies that are going to contribute to GCC.
- Other general issues related to intellectual property and GCC.
Dan Berlin - Google
Daniel Berlin is an Advisory Engineer at IBM T.J. Watson Research Center,
where he works on compiler optimization research for current and future
IBM architectures. His main focus is designing and implementing new and
existing optimization algorithms for GCC. He is responsible for
implementing and maintaining several passes in GCC, including alias
analysis, various SSA optimizations, and high level loop transforms. He
received his CS degree from the University of Rochester and has a JD
from George Washington University School of Law.

Open64: An Alternative Backend for GCC
While GCC's Tree-based SSA optimization has been making good progress, the
Itanium processor may benefit more in the near future from alternate
high-performance optimizations. The Open64 compiler is the basis of the
Open Research Compiler (ORC), which Intel has been promoting for
Itanium-specific optimizations over the past couple of years. This effort
aims to present Open64 as an alternative backend for GCC/G++ on the
Itanium/Linux platform. In addition to Itanium, this alternative
backend supports the EM64T/IA32 target as well as several other
embedded processors. In alignment with this effort, HP is coordinating
the update of the GCC/G++ front-end and driving the quality on the
Itanium/Linux platform. In this talk, the short- and long-term
perspectives of this alternative backend will be presented.
Shin-Ming Liu - HP
Shin-Ming Liu is the Project Manager for High-Level Optimization and GCC of the
Itanium C/C++ Compiler Section of the Java, Compiler, and Tools Lab at
HP in Cupertino, California. Liu led the development effort for the
high-level optimizer and code generator of the compiler targeting the
Itanium processor. In this project, he helped redesign the
high-level optimizer into a highly robust, scalable, and efficient
component by rearchitecting the infrastructure, on which many new
techniques were developed. Many widely recognized program analysis
methods were adopted as well. Liu led the reinvention of compiler
development methodology by focusing on modularization, memory footprint
control, canonical internal representation, and automatic error
detection. Before joining HP, he worked at MIPS/SGI in the area of
compiler front end, middle end, back end, and linker. During that time,
he co-authored several technical publications.

Aliasing in GCC
This talk will cover aliasing in GCC, including:
- An overview of the algorithms used to generate aliasing information.
- An overview of how the aliasing information is represented in GCC's IR.
- The improvements made in recent GCC versions to both of the above.

Superblock Update
Superblock scheduling is a common technique to increase the level of ILP in
generated code. By performing tail duplication, a Superblock-forming
compiler creates a longer extended basic block, simplifying the task of
moving instructions across basic block boundaries. More significantly,
the control flow into the duplicated tail is dramatically simplified.
This allows the compiler to draw much tighter bounds on the conditions
that exist when the block is executed and allows the code in the block
to be specialized for those conditions. This combination of radical
control flow transformation followed by specializing optimizations,
termed "structural compilation," has been shown in the OpenIMPACT
compiler to be particularly useful in developing ILP when compiling for
the Itanium processor. As a first step toward developing
structural compilation techniques in GCC, we implemented Superblock
formation at the Tree-SSA level. By performing structural
transformations early, we give the compiler's high level optimizers an
opportunity to specialize the transformed program, thereby cultivating
higher levels of ILP. The early results of this modification are mixed,
with some benchmarks improving and others slowing down. I will present the
effects of this structural transformation on later optimizations and
thoughts on the changes that will be necessary to allow optimizations
to benefit from this transformation.
Robert Kidd - University of Illinois at Urbana-Champaign
Robert Kidd is a graduate student in the IMPACT research group at the
University of Illinois at Urbana-Champaign. Within the IMPACT compiler,
he is responsible for the development of an interprocedural analysis
and optimization framework that fits within the usage model of a
traditional production compiler. Previous work within IMPACT has
addressed GCC compatibility and general maintenance of the code
generator. His work with GCC, supported by the Gelato Federation, aims
to improve performance on the Itanium processor.
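
The tail duplication step in the Superblock abstract can be sketched on a plain control-flow graph. This is a simplified, hypothetical Python illustration, not the GCC Tree-SSA implementation. It makes these assumptions:

- The CFG is a dict from block name to successor list, and every block appears as a key.
- The hot trace has already been selected, and no block appears in it twice.
- Duplicated blocks are named with a `'` suffix.

Starting at the first trace block that has a side entrance, the rest of the trace is copied. Every edge into that tail, except the trace edge itself, is redirected to the copies.

```python
def predecessors(cfg):
    """Map each block to the set of blocks that branch to it."""
    preds = {b: set() for b in cfg}
    for b, succs in cfg.items():
        for s in succs:
            preds[s].add(b)
    return preds


def form_superblock(cfg, trace):
    """Tail-duplicate so that control can enter `trace` only at trace[0]."""
    preds = predecessors(cfg)
    # First trace block reachable from somewhere other than its trace predecessor.
    first = next((i for i in range(1, len(trace))
                  if preds[trace[i]] - {trace[i - 1]}), None)
    if first is None:  # already single-entry
        return {b: list(s) for b, s in cfg.items()}
    tail = trace[first:]
    copy_of = {b: b + "'" for b in tail}
    trace_pred = {trace[i]: trace[i - 1] for i in range(first, len(trace))}

    graph = {b: list(s) for b, s in cfg.items()}
    for b in tail:
        graph[copy_of[b]] = list(cfg[b])  # each copy starts with the original's successors

    # Only the trace edge trace[i-1] -> trace[i] may still enter the tail;
    # every other edge into a tail block goes to that block's copy instead.
    return {u: [copy_of[v] if v in copy_of and u != trace_pred[v] else v
                for v in succs]
            for u, succs in graph.items()}


cfg = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"], "E": []}
trace = ["A", "B", "D", "E"]
sb = form_superblock(cfg, trace)
for block, succs in sorted(sb.items()):
    print(block, "->", succs)
```

In this example, the side entrance C -> D is redirected to the copy D', which flows into E'. As a result, the hot trace A, B, D, E can be entered only at A. That single entry is what lets later optimizations specialize the block for the conditions on the hot path.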

An Interblock VLIW-Targeted Instruction Scheduler for GCC
Modern VLIW architectures (e.g. Itanium) require instruction level parallelism
(ILP) to be explicitly exposed by a compiler. An instruction scheduler
is a key compiler component for utilizing ILP. The current GCC
scheduler has a number of shortcomings in approaching this goal, including
an outdated interblock scheduling algorithm, non-optimal region
formation, a traditional two-pass execution scheme, and a lack of
transformations for eliminating false dependencies. This
presentation will cover an ongoing approach for implementing a new
aggressive instruction scheduler for GCC. The scheduling algorithm is
based on a selective scheduling approach. It is mainly targeted for
VLIW-like platforms, but the framework being implemented is general
enough that it can be used for other targets in the future. The key
features of the approach are as follows: it works with DAG regions,
supports code motion with the insertion of bookkeeping insns, supports register
renaming and forward substitution, and integrates with software
pipelining. We will discuss the algorithm and its adaptation to GCC,
implementation issues, and the current state of the project.
Andrey Belevantsev - Institute for System Programming, Russian Academy of Sciences
Andrey Belevantsev is a Project Manager for the GCC Itanium project at RAS
with a team of six. The current project of the team is implementing an
aggressive VLIW-targeted interblock scheduler for GCC. Andrey's
responsibilities include leading the team, designing the scheduler
infrastructure, and implementing the code motion part. His research
interests lie in the area of compiler optimizations, static analysis,
and security, focusing on instruction scheduling, alias analysis, and
interprocedural optimizations.

Parallel Programming with GCC
Multiprocessor systems are becoming increasingly popular, but taking advantage of
their parallel capabilities is not always straightforward. Software
developed for these systems must explicitly make use of concurrency. In
this talk, I will describe two recent additions to the GNU Compiler
Collection (GCC) for developing software that can take advantage of
parallelism: vectorization and OpenMP. Vectorization is a compiler
feature that takes advantage of the multimedia (SIMD) capabilities of modern
CPUs by executing several iterations of some inner loops at once with
vector instructions. OpenMP is a standard specification of compiler
directives for C, C++, and FORTRAN. It provides new directives to
specify parallelism, synchronization, and data sharing. This talk will
describe both features in detail, provide usage examples, and give tips
to take full advantage of these features when developing your
applications.
Diego Novillo - Red Hat
Diego Novillo was born in Cordoba, Argentina, and holds a PhD in Parallel
Computing from the University of Alberta, Canada. He is currently a
member of the compiler group at Red Hat Canada, working to improve the
GNU Compiler Collection (GCC), developing new ports and implementing
new analyses and optimizations. He is one of the main architects of
GCC's global optimization framework.

Tuesday
LTO: A Brief Introduction
Many compilers have obtained significant performance wins by using
"link-time optimization," i.e. by performing optimizations that cross
the boundaries of a single program unit. For example, if the argument
to a function is a constant in one module and the function is defined
in another module, the result of the function call may be constant as
well. But compilation of either module independently cannot determine
that fact. The GNU compiler collection (GCC) does not
presently implement link-time optimization, although it does provide a
limited form of inter-module optimization, as implemented by Geoff
Keating. Working with partners at AMD, HP, and IBM, we have developed a
proposal for implementing link-time optimization in GCC based on
serializing GCC's existing data structures. Thus, our proposal is
conservative in that it leverages GCC's existing data structures and
requires only minimal changes to GCC's core optimizers. A significant
advantage of our approach is that the serialized data structures will
be available to other consumers, such as program analyzers and IDEs.
Finally, our approach would facilitate the implementation of the most
significant missing feature in G++: the "export" keyword.
Mark Mitchell - CodeSourcery
Mark Mitchell is the founder of CodeSourcery and has been the Free Software
Foundation's Release Manager for GCC since 2001. Mitchell received
degrees in Computer Science from Harvard and Stanford. He left
Stanford's PhD program after starting CodeSourcery, where, with his
fellow Sourcerers, he strives to make the GNU Toolchain the choice of
software developers everywhere.
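
The LTO abstract's example (a function defined in one module and called with a constant in another) can be made concrete with a toy expression language. The sketch below is hypothetical and far simpler than GCC's serialized data structures. Each "module" is a dict of single-parameter, non-recursive functions. "Linking" simply merges the dicts, so the constant folder can see across module boundaries.

```python
import operator

OPS = {"add": operator.add, "mul": operator.mul}

# Expressions: ("const", n), ("arg",), ("add" | "mul", lhs, rhs), ("call", name, arg).
# Each function has one parameter, ("arg",), and must not be recursive.


def substitute(expr, value):
    """Replace references to the parameter with `value`."""
    kind = expr[0]
    if kind == "arg":
        return value
    if kind == "const":
        return expr
    if kind == "call":
        return ("call", expr[1], substitute(expr[2], value))
    return (kind, substitute(expr[1], value), substitute(expr[2], value))


def fold(expr, visible):
    """Constant-fold expr; a call folds only if the callee's body is visible."""
    kind = expr[0]
    if kind in ("const", "arg"):
        return expr
    if kind == "call":
        arg = fold(expr[2], visible)
        body = visible.get(expr[1])
        if body is not None and arg[0] == "const":
            return fold(substitute(body, arg), visible)  # inline, then fold
        return ("call", expr[1], arg)
    lhs, rhs = fold(expr[1], visible), fold(expr[2], visible)
    if lhs[0] == "const" and rhs[0] == "const":
        return ("const", OPS[kind](lhs[1], rhs[1]))
    return (kind, lhs, rhs)


module_a = {"square": ("mul", ("arg",), ("arg",))}
module_b = {"f": ("add", ("call", "square", ("const", 3)), ("const", 1))}

# Compiling module_b on its own: square() is external, so the call survives.
print(fold(module_b["f"], module_b))
# "Link time": both modules are visible, so f() folds to a constant.
program = {**module_a, **module_b}
print(fold(module_b["f"], program))
```

Compiling either module alone cannot discover that `f` always returns 10. That fact only becomes visible once both bodies are available to the same optimizer, which is the argument for link-time optimization made above.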

LLVM: A Brief Introduction
This talk will provide a brief introduction to LLVM (http://llvm.org),
focusing on LLVM's robust interprocedural link-time optimization,
runtime optimization, and just-in-time code generation support. Work is
currently underway to integrate LLVM's mid-level and interprocedural
optimization capabilities into the GNU Compiler Collection (GCC). The
design, implementation, and status of the GCC integration will be
discussed.
Chris Lattner - Apple
Chris Lattner is the Chief Architect of the LLVM Compiler Infrastructure,
which aims to build efficient and highly optimizing open-source
compiler components. He currently leads a team at Apple Computer, which
aims to integrate the GCC front-end with the LLVM optimizer and code
generator, providing GCC with interprocedural link-time optimizations
as well as a modern and efficient code generator. Chris holds a PhD in
Computer Science from the University of Illinois at Urbana-Champaign
(UIUC).

Focus on Scalability

Wednesday
Blktrace: An Overview
"You can't count what you can't measure" is an old software engineering
truism that inspires one to develop means to accurately and efficiently
measure the various subsystems within Linux in order to make concrete
performance improvements to the Linux kernel itself. Given that
measuring how Linux manages I/O is a key component towards
understanding overall system performance, Jens Axboe has recently been
working on a new capability within Linux called Blktrace, which allows
one to efficiently capture block I/O subsystem events for later
analysis. This presentation will start by providing an
overview of Blktrace through a discussion about its kernel
implementation and an overview of the utilities provided to capture
traces. We will then show how it is currently being used to measure the
LVM/DM subsystem as part of an effort to understand Linux I/O
performance from top to bottom.
Alan Brunelle - HP
Alan D. Brunelle works for HP's Open Source and Linux Organization in the
Linux Scalability and Performance Group. He has been working on tools
to measure performance in order to help understand how to improve Linux
in that area. During his time in the group, Alan has primarily been
focused on the Linux storage I/O stack, and his work on the blktrace
utility has combined efforts in tool smithing with I/O. Prior to his
Linux work, Alan worked in Tru64 TruCluster technology, again primarily
in the I/O sphere. Prior to joining HP in 1988, he worked on attached
processor card software with Alacron, Inc, as well as graphics
algorithmic design and Unix/Mach device driver development with
CalComp. Alan earned an MSc in Computer Science from the UMASS/Lowell
(1989) and a BSc in Computer Science from the University of New
Hampshire (1984).
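
As a rough illustration of the post-processing that captured block I/O traces enable, the sketch below pairs per-request events and computes latency phases. It is modeled loosely on blktrace's Q (queued), D (issued to the driver), and C (completed) actions, and on the Q2C and D2C latencies that btt reports. The simplified event tuples are hypothetical. Requests are matched by starting sector only, and the sketch ignores merges, splits, and requeues, which real tools must handle.

```python
from statistics import mean


def request_latencies(events):
    """Pair Q, D, and C events by sector and return per-request latency
    phases (queueing delay Q2D, device time D2C, total Q2C)."""
    pending, done = {}, []
    for ts, sector, action in sorted(events):  # process in time order
        if action == "Q":
            pending[sector] = {"Q": ts}
        elif action == "D" and sector in pending:
            pending[sector]["D"] = ts
        elif action == "C" and sector in pending:
            req = pending.pop(sector)
            if "D" in req:
                done.append({"Q2D": req["D"] - req["Q"],
                             "D2C": ts - req["D"],
                             "Q2C": ts - req["Q"]})
    return done


# Hypothetical events: (timestamp in microseconds, starting sector, action)
events = [
    (100, 2048, "Q"), (150, 2048, "D"), (4150, 2048, "C"),
    (200, 4096, "Q"), (1200, 4096, "D"), (3200, 4096, "C"),
]
reqs = request_latencies(events)
for phase in ("Q2D", "D2C", "Q2C"):
    print(phase, "mean:", mean(r[phase] for r in reqs), "us")
```

Separating queueing delay from device service time is what helps localize a slowdown. Time spent before D points at the block layer or a stacked driver such as LVM/DM. Time spent after D points at the device.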

Scaling Linux to 512 Processors and Beyond
SGI's Altix family of servers currently supports up to 512 Intel Itanium 2
processors and four terabytes of cache-coherent shared main memory, and
newer platforms will substantially increase those limits. Some
high-performance computing workloads benefit from executing on maximum
hardware configurations and in a single system image environment. In
the past few years as hardware capacity has increased, SGI and the
Linux community in general have pushed kernel scalability to keep up.
This presentation discusses the technical challenges of scaling to
hundreds, even thousands, of processors and many terabytes of memory,
what has been done to overcome those challenges, and what work remains.
John Hawkes - SGI
John Hawkes has been involved with the development and tuning of
high-performance multiprocessor computers since the early 1970s, from
HP's earliest multiprocessor Basic Language computer, to Elxsi's custom
message-passing SMP, to MIPS's R6000 Uni- and Multiprocessors, to SGI's
Challenge SMP and Altix ia64 ccNUMA. His involvement with Linux dates
back to SGI's exploratory work in the late 1990s and continues today
with the Altix servers, principally focusing on the measurement and
analysis of system performance and scaling. In recent years he has
co-authored papers about Linux performance for Usenix/Freenix and the
Ottawa Linux Symposium (OLS).

NFS Performance
There have been many complaints about NFS performance on the Linux kernel
mailing lists when it is compared with performance on IRIX or Solaris.
Is it *really* so bad? And, what can be done to fix the problem? Over
the Southern Summer, Gelato@UNSW has been trying to find out. We
currently have tools to capture traces from real systems, anonymize
them (so that real users don't mind if we grab information), and replay
at a higher rate. In doing so, we have discovered, firstly, that there
are problems; secondly, that there is a degree of regularity in most
traces that can be exploited to improve NFS performance generally. This
is a work-in-progress talk; we expect to have more results by the time
of this conference.
Peter Chubb - University of New South Wales
Peter Chubb is a Senior Research Engineer at National ICT Australia and a
Research Officer at UNSW. He completed his PhD under Associate
Professor John Lions in 1989. Peter worked at Softway Pty Ltd as a
consultant and software engineer doing UNIX kernel, security, and
embedded work. He joined Gelato@UNSW at its inception in 2002. Peter
started using UNIX in 1979 and has never used Microsoft operating
systems for more than a few moments. His home life includes wife Lucy,
who also works at Gelato@UNSW, and two small daughters. Peter's hobbies
include music (he runs a recorder consort), aquaria (3 tanks at
present, no room for more), and fine wines.
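
The capture, anonymize, and replay workflow in the NFS abstract can be sketched as follows. This is not the Gelato@UNSW tooling, and several choices are assumptions made for illustration:

- The record format is hypothetical.
- Anonymization uses a keyed hash (HMAC-SHA256 from Python's standard library), applied to each path component separately. Identical names therefore map to identical tokens, and the directory structure survives.
- "Replay at a higher rate" is modeled simply by dividing inter-arrival times by a speedup factor.

```python
import hashlib
import hmac

# Hypothetical key, kept secret by whoever collected the trace.
SECRET_KEY = b"site-local secret"


def pseudonym(name):
    """Keyed hash: the same name always maps to the same opaque token."""
    return hmac.new(SECRET_KEY, name.encode(), hashlib.sha256).hexdigest()[:12]


def anonymize_path(path):
    """Hash each path component separately so the directory tree shape survives."""
    return "/".join(pseudonym(part) if part else "" for part in path.split("/"))


def anonymize(trace):
    return [(ts, pseudonym(client), op, anonymize_path(path))
            for ts, client, op, path in trace]


def replay_schedule(trace, speedup):
    """Offsets at which to reissue each request, compressed by `speedup`."""
    start = trace[0][0]
    return [((ts - start) / speedup, op, path) for ts, _client, op, path in trace]


# Hypothetical records: (time in seconds, client, NFS operation, path)
trace = [
    (100.0, "10.0.0.7", "LOOKUP", "/home/alice/thesis.tex"),
    (100.5, "10.0.0.7", "READ", "/home/alice/thesis.tex"),
    (102.0, "10.0.0.9", "GETATTR", "/home/bob"),
]
for offset, op, path in replay_schedule(anonymize(trace), speedup=4.0):
    print(f"{offset:6.3f}s {op:8s} {path}")
```

Keeping the key secret matters. An unkeyed hash of common names such as `home` could be reversed with a dictionary attack. The 12-character truncation is only for readable output.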

Scalability Mini-Track Wrap Up
In each of the scalability presentations, we will try to leave time for
questions and answers. However, we expect/hope that attendees will have
additional scalability questions, issues, or topics not directly
related to the presentations. The scalability wrap up session will
provide an opportunity to discuss general scalability topics and areas
for further investigation and collaboration to measure and improve the
scalability of Linux on Itanium platforms. To this end, we encourage
attendees to share any scalability or general performance concerns, war
stories ("wins" are good, too!), unsolved mysteries, work in progress,
etc., including a couple of slides/graphs if you think that would be
helpful to illustrate the issue.
Lee Schermerhorn - HP
As a member of the Linux Performance and Scalability team within HP's Open
Source and Linux Organization (OSLO), Lee Schermerhorn works on
performance characterization and engineering for Linux on HP platforms
(primarily HP's Itanium-based Integrity platforms), with emphasis on
NUMA scheduling/affinity and (storage) IO performance.

Advanced Topics

Monday
Mathematical Modeling to Formally Prove Correctness
Formal verification attempts to establish the correctness of a computer
artifact (hardware, software, microcode, protocol, etc.) by rigorous
modeling and mathematical proof, rather than merely by testing or
simulation. Formal verification in the hardware industry is widely
practiced, and increasingly seen as necessary. We can perhaps identify
at least three reasons:
- Hardware is designed in a more modular way than most software, with
refinement an important design method. Constraints of interconnect
layering and timing mean that one cannot really design "spaghetti hardware."
- More proofs in the hardware domain can be largely automated, reducing
the need for intensive interaction by a human expert with the mechanical
theorem-proving system.
- The potential consequences of a hardware error are greater, since such
errors often cannot be patched or worked around, and may in extremis
necessitate a hardware replacement.
It is not surprising that a considerable amount of effort has been
devoted to the floating-point domain. Floating-point
algorithms have proven themselves difficult to get right. Yet in marked
contrast to some other targets for formal verification, it is not hard
to come up with widely accepted formal specifications of how
floating-point operations should behave. In fact, many operations are
specified almost completely by the IEEE Standard governing binary
floating-point arithmetic. However, in some other respects,
floating-point operations present a difficult challenge for formal
verification. We will describe some of our work in formally verifying
algorithms for operations such as division, square root, and
transcendental functions for the Intel Itanium architecture.
John R. Harrison - Intel
John Harrison has worked in formal verification and automated theorem proving since 1990, when he joined Mike Gordon's "Hardware Verification Group" (HVG) at the University of Cambridge Computer Laboratory. As well as working on the development of the HOL theorem prover,
he developed a particular interest in the formalization of real
analysis and its application to formal verification of floating-point
hardware. After completing his PhD research in 1995, John Harrison
spent a very enjoyable year at Åbo Akademi University and Turku Centre for Computer Science (TUCS) in Turku, Finland, where he was a member of Ralph Back's Programming Methods Research Group. John Harrison then returned to Cambridge and worked on a formal model of floating-point arithmetic and its application
to the verification of some realistic algorithms for transcendental
functions. This work attracted the attention of Intel, and in 1998 John
Harrison joined the company as a Senior Software Engineer, specializing
in the design and formal verification of mathematical algorithms. He
has formally verified and in many cases designed or redesigned numerous
algorithms for mathematical functions including division, square root and trigonometric functions.
In his limited spare time over the past 10 years, John Harrison has
been working on a book, giving a comprehensive introduction to
automated theorem proving. He hopes that this book will finally reach
publication in 2006, and the associated code is already available from his Web page.
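
Floating-point division on Itanium is computed in software. It starts from a hardware reciprocal approximation (the frcpa instruction), which is then refined by Newton-Raphson-style iterations, and this is part of why such proofs are needed. The sketch below shows the algebraic core of one such argument, assuming exact rational arithmetic. Define the relative error e_k = 1 - b*y_k. One step y_{k+1} = y_k(2 - b*y_k) then gives e_{k+1} = e_k^2, so the number of correct bits doubles per step. A real verification, like the work described above, must also bound the rounding error of every fused multiply-add and handle special cases, which this sketch omits. The starting value accurate to 8 bits is an assumption, not frcpa's specified accuracy.

```python
from fractions import Fraction


def refine_reciprocal(b, y0, steps):
    """Newton-Raphson for 1/b: y <- y * (2 - b*y), in exact rational arithmetic."""
    ys = [y0]
    for _ in range(steps):
        y = ys[-1]
        ys.append(y * (2 - b * y))
    return ys


b = Fraction(3)
# Assumed starting approximation with relative error exactly 2**-8.
y0 = Fraction(1, 3) * (1 - Fraction(1, 2**8))
ys = refine_reciprocal(b, y0, 3)
errors = [1 - b * y for y in ys]  # relative error e_k = 1 - b*y_k

# Algebraic core: e_{k+1} = 1 - b*y_k*(2 - b*y_k) = (1 - b*y_k)**2 = e_k**2.
for e_prev, e_next in zip(errors, errors[1:]):
    assert e_next == e_prev ** 2

for k, e in enumerate(errors):
    print(k, float(e))
```

With exact arithmetic, three steps take the relative error from 2^-8 to 2^-64. In hardware, each step's rounding error adds to that bound, and a formal proof must track it.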

Kernel Optimization for Enterprise Workloads
Linux has been receiving a great deal of attention in the past few years. Its
popularity is being propelled by widespread adoption of Linux for
enterprise computing. Major software vendors have been supporting their
products on Linux for many years. As the enterprise software solution
stack grows every day, it is crucial that Linux kernel development
take this opportunity to ensure that the kernel provides the necessary
infrastructure for enterprise applications to excel. This means
developing enterprise-focused OS features, improving performance by
extending scalability, and improving many other areas. Adding
to the excitement, the Intel Itanium 2 processor is built with many
innovative features that push the performance envelope. Featuring
massive caches and ample CPU execution resources, EPIC technology (Explicitly
Parallel Instruction Computing) provides a variety of optimization
opportunities. In this talk, we will highlight kernel optimization work
done on Linux-ia64, ranging from several critical low-level assembly routines
to generic kernel components. We will present how the linux-ia64 kernel
utilizes Itanium architecture features to extend scalability and
performance for enterprise workloads.
Kenneth Chen - Intel
Ken Chen works at Intel as a Linux kernel engineer. His first encounter
with Itanium was developing processor firmware for the first generation
of the Itanium processor, followed by many years of work on optimizing
enterprise software on the Itanium architecture. For the last several
years, he has worked on the Linux kernel, which he has optimized for Itanium
platforms, ranging from low-level assembly code to generic SMP/ccNUMA
scalability. His latest venture is optimizing the Linux kernel for a
wide range of enterprise workloads and collaborating with the Linux
community to produce a superior enterprise-class-ready Linux kernel on
Itanium.

Suggested Improvements in Itanium and Software
In general, the Itanium is a major step forward in computer design.
Nevertheless, there are still gaps in the instruction repertoire, and
the specifications of some instructions could be expanded or
modified. There are also some mandates by C++ concerning corner cases,
which cannot be justified by any mathematical reasoning whatsoever;
there is even one IEEE mandate that cannot pass muster. A detailed list
of shortcomings and possible remedies will be presented for your
consideration.
Clemens C. J. Roothaan - Gelato Honorary Member
Clemens Roothaan is a Professor Emeritus of Physics and Chemistry at the
University of Chicago. In the 1950s, he published detailed algorithms
for computing the quantum mechanical motion of electrons in molecules and
atoms. Today, most computer programs in this area are based on his
method. After his retirement from the University in 1988, Roothaan
started to work for HP Labs in Palo Alto, California. He has worked on
the Itanium design team since 1990. Currently, Roothaan is working on a
large software suite of scientific tools for function evaluation.

Evolution of PCI IO: A Linux IO Geek's Perspective on HW
PCI has been around since 1993 and has seen substantial changes since its
conception. New features and functionality have been introduced with
each generation (e.g. 64-bit, 3.3 V, MSI, MSI-X, split transactions,
etc.). PCI-e is the latest generation and is *not* HW compatible with
previous generations. This gave HW vendors the "opportunity" (forced
them really) to re-implement and take advantage of some of the features
PCI-e offers. This talk will explain a few PCI features and
broken HW implementations, and will cover the reasons why PCI-e is an
improvement over previous PCI-X implementations.
Grant Grundler - HP
Grant Grundler was born in Toronto, Canada, and grew up near Silicon Valley,
California. He graduated from California State University, Hayward with
a BS in Computer Science. He lived and worked in Germany for three
years as a PC technician/support, ski tour "host," windsurf instructor,
and firmware designer/developer for a custom TokenBus networking card.
Back in "the States," Grant worked for three years at Olivetti on SVR4
ports to i860, MIPS R4000 (M700-10), and the first Alpha workstation.
Since 1993, Grant has worked for HP on HP-UX SCSI drivers and HP-UX PCI
subsystem design. He currently works on IO support and drivers for both
parisc- and ia64-linux ports. Grant's public presentations are
available at http://iou.parisc-linux.org/. |
| | | | Tuesday | Local and Remote Memory: Memory in a NUMA System
Memory
becomes difficult to handle in a NUMA system because storage is
available at various "distances" from the running process. A greater
distance means higher latency or lower bandwidth, and therefore
slower access to memory. Performance in a NUMA system depends on
assigning available memory to processes in such a way that memory
access speed is optimized. The kernel has various mechanisms to
control NUMA memory placement automatically or manually. The
page allocator attempts to allocate memory near the node where a
process is executing. However, if the data will later be used by
processes running on other nodes, this placement is not optimal.
The kernel therefore allows manual control of memory allocation per
process via memory allocation policies. Similar issues occur in the
SLAB allocator, which was revised last year to
ensure that allocations are placed optimally and
are controllable in the same way as those of the page allocator. The
kernel itself must be aware of where its own data structures will be
placed and ensure that data used by particular processors resides on
memory nodes local to those processors. Improvements in this area
have enhanced the placement of core kernel structures and also allow device
drivers to place their data local to hardware devices. Finally, the
kernel now has the ability to migrate the physical location of pages to
improve performance after a process has been reassigned to a processor
on another node. | |  | Christoph Lameter - SGI
Christoph
Lameter is the Technical Lead at SGI for the Linux Kernel. He has been
leading the effort to make the kernel more NUMA aware by reworking the
SLAB allocator, page allocator, and various other components for
optimal performance. Christoph's patches made it possible for the
kernel to change the physical location of pages transparently while
processes are running (page migration), and he introduced the
functionality needed to reclaim memory locally so that allocations can be
placed optimally. Christoph is currently serving on the Technical Advisory
Board of OSDL, the Technical Program Committee for Gelato ICE, and the
Advisory Board of the Linux Professional Institute. He has been
teaching various classes on operating system design and programming
languages in San Jose. He earned a PhD for work on the implications of
quantum theory for concepts of reality. |
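To make the placement trade-off in the NUMA talk concrete, here is a toy Python simulation, not kernel code. It assumes a hypothetical 4-node machine with a SLIT-style distance table (10 = local); the `NumaModel` class and its policy names, which only loosely mirror the Linux default, bind, and interleave memory policies, are invented for illustration. Each request is served by the nearest allowed node that still has free pages.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Hypothetical 4-node machine. Distances follow the ACPI SLIT convention:
# 10 means "local"; larger values mean higher latency / lower bandwidth.
DISTANCE = [
    [10, 20, 30, 30],
    [20, 10, 30, 30],
    [30, 30, 10, 20],
    [30, 30, 20, 10],
]


@dataclass
class Node:
    node_id: int
    free_pages: int


@dataclass
class NumaModel:
    nodes: List[Node]
    next_interleave: int = 0  # round-robin cursor for the interleave policy

    def _take(self, n: int) -> int:
        self.nodes[n].free_pages -= 1
        return n

    def alloc_page(self, cpu_node: int, policy: str = "local",
                   allowed: Optional[Iterable[int]] = None) -> Optional[int]:
        """Return the node a page is taken from, or None if no allowed node has room."""
        candidates = list(allowed) if allowed is not None else list(range(len(self.nodes)))
        if policy == "interleave":
            # Spread pages round-robin over the allowed nodes, skipping full ones.
            for i in range(len(candidates)):
                idx = (self.next_interleave + i) % len(candidates)
                if self.nodes[candidates[idx]].free_pages > 0:
                    self.next_interleave = (idx + 1) % len(candidates)
                    return self._take(candidates[idx])
            return None
        # "local" and "bind" both prefer the nearest candidate node and fall back
        # to farther ones; "bind" simply restricts the candidate set.
        for n in sorted(candidates, key=lambda c: (DISTANCE[cpu_node][c], c)):
            if self.nodes[n].free_pages > 0:
                return self._take(n)
        return None
```

In this model, a request from a CPU on node 0 is satisfied locally until node 0 fills up and then spills to node 1, the nearest remote node; a bind request stays within its node set, and an interleaved request spreads pages round-robin regardless of distance.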
| | | | Wednesday |
Decimal Floating-Point
The
IEEE 754 Floating-Point Standard is up for revision. A major new
addition is a set of decimal data types and the rules for computing with them.
This talk will define decimal (vs. binary) floating-point (FP), motivate
the need for it, and demonstrate that an implementation based on a
binary-integer encoding is effective for high-performance software
emulation and is also amenable to sharing hardware with binary FP units
for maximum efficiency and leverage. | |  | John Crawford - Intel
John
H. Crawford is an Intel Fellow at Intel Corporation, Santa Clara,
California, where he investigates emerging technology directions and
issues for future Itanium Processor Family products. Crawford was the
Chief Architect of both the Intel386 and Intel486 microprocessors, and
co-project manager of the Pentium microprocessor. He managed the joint
Intel/HP team that defined the Itanium Processor Family instruction set
architecture, and directed aspects of Itanium processor product
development. Crawford received the ACM/IEEE Eckert-Mauchly Award
and the IEEE Ernst Weber Engineering Leadership Recognition. He was
elected to the National Academy of Engineering in 2002. Crawford
received an ScB in Computer Science from Brown University and an MS in Computer
Science from the University of North Carolina, Chapel Hill. |
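The binary-versus-decimal difference motivating the decimal FP talk can be seen with Python's standard `decimal` module. The second half is a hypothetical, minimal sketch of the binary-integer idea: a value is stored as a binary integer coefficient times a power of ten, so exact decimal addition reduces to integer multiplies and adds. It omits everything a real IEEE 754 decimal format needs (a fixed coefficient precision, rounding, and bit-level packing of sign, exponent, and coefficient), and `bid_from_str` and `bid_add` are illustrative names, not any library's API.

```python
from decimal import Decimal
from typing import Tuple

# Binary FP cannot represent 0.1 exactly, so the sum picks up an error.
binary_sum = 0.1 + 0.2                          # 0.30000000000000004
# Python's standard decimal module computes in base 10, so 0.1 is exact.
decimal_sum = Decimal("0.1") + Decimal("0.2")   # Decimal('0.3')

BidValue = Tuple[int, int]  # (coefficient, exponent): value = coefficient * 10**exponent


def bid_from_str(s: str) -> BidValue:
    """Parse a plain decimal string such as '-12.345' into (coefficient, exponent).

    The coefficient is an ordinary binary integer and the exponent is a power
    of ten; keeping the coefficient as a binary integer is the idea behind a
    binary-integer decimal (BID) encoding.
    """
    sign = -1 if s.startswith("-") else 1
    s = s.lstrip("+-")
    whole, _, frac = s.partition(".")
    return sign * int(whole + frac or "0"), -len(frac)


def bid_add(a: BidValue, b: BidValue) -> BidValue:
    """Exact decimal addition: align exponents with integer multiplies, then add."""
    (ca, ea), (cb, eb) = a, b
    e = min(ea, eb)
    return ca * 10 ** (ea - e) + cb * 10 ** (eb - e), e
```

Because 0.1 and 0.2 become the integers 1 and 2 with exponent -1, their sum is exactly the representation of 0.3, which is the property binary FP lacks.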
| | |
| | The Itanium Vector Math Library (VML)
The
VML project was conceived in 1990 in parallel with the development of
Itanium. To match the ambitious design of Itanium, all mathematical
and computational procedures relevant to the functions at hand were
re-examined. Powerful new methods were developed to determine (1)
Chebyshev expansions of a function by a straightforward transformation
of its Maclaurin expansion, and (2) a Remez expansion from a Chebyshev
expansion by a quadratically convergent iterative process.
The functions implemented are (1) reciprocal, division, square root,
and reciprocal square root; (2) exponentials, logarithms, and the power
function; (3) trigonometric functions and their inverses; and (4)
hyperbolic functions and their inverses. Each VML routine is actually a
subroutine that yields a vector of results from a vector of arguments.
For corner cases due to pathological input arguments, the VML codes
deliver the expected results directly, thereby avoiding error detection
by hardware and the elaborate, costly error processing that would follow.
Version 1 of VML, comprising 56 functions, is available in the public
domain. For 55 of these functions, the floating-point performance of
the inner loop is 100% saturated; the one exception is the
single-precision logarithm, which has one unfilled floating-point slot
in its 12-cycle inner loop. The design and implementation
strategies of the VML programming model will be shown in detail using a
representative example. An earlier presentation entitled "Exploiting
the Power of Itanium," given on July 31, 2003, in Amsterdam, is available at
http://www.sara.nl/news/2003/20030813/lecture_roothaan_eng.html. | | | Clemens C. J. Roothaan - Gelato Honorary Member
|
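Step (1) of the VML abstract, obtaining a Chebyshev expansion directly from a Maclaurin expansion, can be illustrated with the classical identity that rewrites each power `x**n` as a combination of Chebyshev polynomials `T_m`. The Python sketch below applies that identity term by term and evaluates the result with Clenshaw's recurrence. It is a generic textbook construction assumed for illustration, not the VML code, and it does not attempt the Remez step (2).

```python
from math import comb, exp, factorial
from typing import List


def maclaurin_to_chebyshev(a: List[float]) -> List[float]:
    """Turn power-series coefficients a[n] (f(x) ~ sum a[n] x**n) into
    Chebyshev coefficients c[m] (f(x) ~ sum c[m] T_m(x)) on [-1, 1].

    Uses the exact identity
        x**n = 2**(1 - n) * sum_{k=0}^{n//2} C(n, k) * T_{n-2k}(x),
    in which the T_0 term (present only for even n) is halved.
    """
    c = [0.0] * len(a)
    for n, an in enumerate(a):
        for k in range(n // 2 + 1):
            m = n - 2 * k
            weight = comb(n, k) * 2.0 ** (1 - n)
            if m == 0:
                weight /= 2
            c[m] += an * weight
    return c


def clenshaw(c: List[float], x: float) -> float:
    """Evaluate sum c[m] * T_m(x) with Clenshaw's recurrence."""
    b1 = b2 = 0.0
    for cm in reversed(c[1:]):
        b1, b2 = 2.0 * x * b1 - b2 + cm, b1
    return x * b1 - b2 + c[0]


if __name__ == "__main__":
    # Degree-12 Maclaurin polynomial of exp(x), converted to Chebyshev form.
    cheb = maclaurin_to_chebyshev([1.0 / factorial(n) for n in range(13)])
    print(clenshaw(cheb, 0.5) - exp(0.5))  # small truncation error
    # The trailing Chebyshev coefficients are tiny; since |T_m(x)| <= 1 on
    # [-1, 1], dropping them changes the result by at most their absolute sum.
    print(cheb[-3:])
```

Converting to Chebyshev form is what makes truncation attractive: the high-order Chebyshev coefficients are much smaller than the corresponding Maclaurin coefficients, so a shorter polynomial reaches the same accuracy over the whole interval.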
| | |
| Research | | Monday | Preparing for the First Beam at the LHC
The
Large Hadron Collider (LHC) at CERN, the European Laboratory for
Particle Physics in Geneva, Switzerland, is expected to deliver its
first beam next year. This is the culmination of a construction project
spanning more than a decade, including the development of the supporting
software and computing models. ALICE is one of the four major detectors
being prepared for physics at the LHC, and the University of
Houston is a member of the US contingent of institutions involved in
that experiment. Along with the Ohio Supercomputer Center and the
facility at NERSC (LBL), the University of Houston Itanium cluster has
been participating in the increasingly demanding sequence of "data
challenges" that are now being wrapped up in preparation for the actual
turn-on of the LHC. The ALICE computing model is necessarily dependent
upon a grid-based model that will include many different platforms,
Itanium among them. The data challenges have provided a good venue to
compare the relative attributes of the various platforms in running the
kind of simulations and analysis codes that are relevant to particle
physics applications. An overview of these results will be presented
along with a summary of the overall ALICE computing plans and the
status of its deployment. | |  | Lawrence Pinsky - University of Houston
Lawrence
Pinsky is the chairperson of the Physics Department at the University
of Houston. He holds a BS in Physics from Carnegie-Mellon University
and an MA and PhD in Physics from the University of Rochester.
Professor Pinsky also holds JD and LLM degrees from the University of
Houston's Law Center. He has published over 125 articles in refereed
journals and gives 5 to 10 invited talks each year. He is on the
organizing committees of several major international conferences each
year, including the recent CHEP'06 (Computing in High Energy
Physics-2006) conference in Mumbai, India. Professor Pinsky
is a member of the ALICE-USA Collaboration and has served as the
Computing Coordinator for that effort. He is a member of the ALICE
Computing Board and the CERN Grid Deployment Board. At the University
of Houston, he is a member of the Executive Committee of the Texas
Learning and Computation Center. Pinsky also has an extensive
NASA-supported research effort in the development of Monte Carlo
Transport codes for use in simulating the space radiation environment. |
| | | Computing Optimal Equilibrium Strategies for Network Economies
Models
for regulating, planning, and operating network industries
such as energy, transportation, and telecommunications are key
tools today. These models correspond
to large stochastic optimization/equilibrium problems, which are very
difficult to solve. In this talk, we will present three new distributed
algorithms/strategies for computing a solution, and their implementation on
Itanium 2 clusters. This family of models and solutions is
currently used by several companies and institutions participating in
these industries. | |  | Alejandro Jofré - University of Chile
Dr.
Alejandro Jofré is a Professor at the University of Chile, where he is a
researcher in the Department of Mathematical Engineering. His research
interests include optimization and mathematical economics. Since April
2000, he has been the Vice Director of the Centre for Mathematical
Modeling, and he leads several applied projects, including work on energy and
telecommunication networks and the project "Rockmass Geo-Mechanical
Instabilities in Copper Mines." He is also a Professor in the PhD
program in Mathematical Economics at the University Paris 1-Sorbonne
and an associate member of the Center for Experimental Math, Canada.
Jofré holds a PhD in Applied Mathematics from the University of Pau,
France, and has published more than 30 papers. |
| | | Mathematical Libraries and the Implementation of Parallel Solvers for Engineering
Our
research is focused on developing a highly efficient, parallelizable
solver for the huge systems of linear equations that arise from finite
element discretizations of complex engineering problems.
Because these problems are nonlinear, they require many linearizations and hence
several days of CPU time on Itanium platforms. Another important
application is the reconstruction of tomographic images. This
work includes a comparison of mathematical libraries such as Intel MKL
(on Linux) and HP MLIB in terms of their performance on
numerical problems, using sequential and parallel implementations. The
new solver uses Level 1, 2, and 3 BLAS routines, excluding complex data
types. We present the results obtained on
several problems. | |  | Hugo Daniel Scolnik - University of Buenos Aires
Hugo
Daniel Scolnik is a Professor in the Computer Sciences Department (which
he founded in 1984) at the School of Sciences of the University of
Buenos Aires (UBA) where he teaches Cryptography, Numerical Analysis,
and Optimization. For his Gelato-related work, Scolnik co-directed a
Gelato-sponsored project comparing 64- and 32-bit architectures from
the point of view of their performance for scientific programming.
Scolnik is also currently directing the Gelato-related theses of three
of his five graduate students.
Beyond his work at UBA, Scolnik was an international consultant
for United Nations agencies, HP, and Hitachi. He has been a Visiting
Professor in several countries. He represents Argentina on the
International Federation for Information Processing (IFIP) Technical
Committee 7 (TC7). He has published papers on Optimization, Numerical
Analysis, Automata Theory, Artificial Intelligence, Robotics, and
Mathematical Modeling, and has refereed for several journals. In 2003,
Scolnik won the Konex Award for the most distinguished career in Science and
Technology for the 1993-2003 decade in the area of Informatics. Scolnik
received the degree of Licenciado en Ciencias Matemáticas from the University of
Buenos Aires in 1964, and a PhD in Mathematics from the University of
Zurich, Switzerland, in 1970. |
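The solver abstract does not name its iterative method, so the following is only an assumed example of how such solvers are typically built from BLAS kernels: a conjugate gradient iteration for symmetric positive-definite systems, written in plain Python with helpers named after the Level 1 and Level 2 BLAS operations they stand in for (DDOT, DAXPY, DGEMV). In production code these would be calls into an optimized library such as MKL or MLIB, and Level 3 routines (e.g., DGEMM) would appear in blocked or multi-right-hand-side variants not shown here.

```python
from typing import List, Optional

Vector = List[float]
Matrix = List[List[float]]


def dot(x: Vector, y: Vector) -> float:
    """Level 1 BLAS-style inner product (cf. DDOT)."""
    return sum(xi * yi for xi, yi in zip(x, y))


def axpy(alpha: float, x: Vector, y: Vector) -> Vector:
    """Level 1 BLAS-style alpha*x + y (cf. DAXPY), returned as a new list."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def gemv(a: Matrix, x: Vector) -> Vector:
    """Level 2 BLAS-style matrix-vector product A @ x (cf. DGEMV)."""
    return [dot(row, x) for row in a]


def conjugate_gradient(a: Matrix, b: Vector, tol: float = 1e-10,
                       max_iter: Optional[int] = None) -> Vector:
    """Solve A x = b for a symmetric positive-definite A, starting from x = 0."""
    x = [0.0] * len(b)
    r = list(b)            # residual b - A x (x is zero initially)
    p = list(r)            # search direction
    rr = dot(r, r)
    for _ in range(max_iter if max_iter is not None else 10 * len(b)):
        if rr ** 0.5 <= tol:
            break
        ap = gemv(a, p)
        alpha = rr / dot(p, ap)
        x = axpy(alpha, p, x)
        r = axpy(-alpha, ap, r)
        rr_new = dot(r, r)
        p = axpy(rr_new / rr, p, r)   # p = r + beta * p with beta = rr_new / rr
        rr = rr_new
    return x
```

Each iteration costs one matrix-vector product plus a handful of vector operations, which is why the speed of the underlying BLAS library matters so much for the comparison described in the abstract.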
| | | Superpages / VM Work
This
talk will present a short overview of Gelato@UNSW's latest work on
issues relating to Itanium Linux virtual memory. Our work revolves
around two themes: taking advantage of unique properties of the Itanium MMU,
and some more "radical" ideas for overhauling parts of the Linux VM
layer. Topics will include using the long-format VHPT,
strategies for providing dynamic superpages, and approaches for greater
abstraction within the Linux VM implementation. | |  | Ian Wienand - University of New South Wales
Ian
Wienand has been a Research Assistant with Gelato@UNSW since late 2003,
working on various Itanium Linux projects. He has recently shifted his
role to undertake a Master's degree within the
group, investigating new approaches to Itanium Linux virtual memory. |
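Superpages only pay off when a region can be mapped with large, naturally aligned pages. The hypothetical helper below shows that basic constraint: it covers an address range greedily with the largest page that is both aligned and fits. The page-size list is an assumption for illustration, and this is not Gelato@UNSW's implementation, which would also have to decide when to promote or demote superpages dynamically.

```python
from typing import List, Sequence, Tuple

KB = 1 << 10
MB = 1 << 20

# Assumed page-size menu for illustration only (all powers of two). Real
# systems must use the sizes the processor and kernel actually support.
PAGE_SIZES = [4 * KB, 8 * KB, 16 * KB, 64 * KB, 256 * KB,
              1 * MB, 4 * MB, 16 * MB, 64 * MB, 256 * MB]


def superpage_cover(start: int, end: int,
                    sizes: Sequence[int] = PAGE_SIZES) -> List[Tuple[int, int]]:
    """Cover [start, end) with naturally aligned pages, largest first.

    A page of size s may only begin at an address that is a multiple of s,
    so large pages are used only where both alignment and length allow.
    Returns (base_address, page_size) pairs in address order.
    """
    if any(s <= 0 or s & (s - 1) for s in sizes):
        raise ValueError("page sizes must be powers of two")
    smallest = min(sizes)
    if start % smallest or end % smallest:
        raise ValueError("range must be aligned to the smallest page size")
    largest_first = sorted(sizes, reverse=True)
    pages = []
    addr = start
    while addr < end:
        for size in largest_first:
            if addr % size == 0 and addr + size <= end:
                pages.append((addr, size))
                addr += size
                break
    return pages
```

A range that starts just past a large boundary ends up as a ladder of small, then progressively larger pages, which is one reason superpage support usually tries to align allocations in the first place.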
| | |
| Tuesday | Experiences on the Itanium-Based Grid Test Bed at UPRM
The
Parallel and Distributed Computing Laboratory (PDCLab) at the
University of Puerto Rico, Mayaguez, has deployed an experimental grid
test bed to perform research in the area of grid computing. The PDCLab
grid test bed was deployed using components that allow flexible
reconfiguration, management, and programmability. The test bed was
built upon heterogeneous components, including an Itanium-based cluster.
This presentation discusses the hardware and software
configurations of the grid test bed, the rationale for choosing each
of the grid components, and the research issues being investigated. | |  | Wilson Rivera - University of Puerto Rico Mayaguez
Dr.
Wilson Rivera obtained his PhD in Computational Engineering from
Mississippi State University, while working at the NSF Engineering
Research Center for Computational Field Simulation. There he
concentrated on developing domain decomposition algorithms for solving
time-dependent partial differential equations with applications in
Computational Fluid Dynamics. Dr. Rivera is an Associate Professor at
the University of Puerto Rico Mayaguez Campus (UPRM). He leads the
Parallel and Distributed Computing Laboratory (PDCLab) at UPRM. His
current funded projects address fundamental research problems in the
areas of grid computing (automated grid deployment, adaptive grid
services, dynamic resource management and grid performance) and
workflow management (workflow modeling, metadata description and
dynamic scheduling). Rivera is also the Executive Director for the
Institute for Computing and Informatics Studies at UPRM and is a
faculty member of the NSF Center for Subsurface Sensing and Imaging
Systems (CenSSIS) and the NSF Center for Collaborative Adaptive Sensing
of the Atmosphere (CASA). |
| | | In Search of Collaboration
In
advancing Linux on Itanium, there are many technical areas crying out
for collaboration. HP is particularly interested in collaborating
with research institutes, universities, and vendors in three areas:
scalability, virtualization, and GCC. We believe these three areas are
critical to the success of Itanium. In this presentation, we will
describe the needs in these three areas from HP's viewpoint, and we will
briefly discuss your needs as well. The goal is to stimulate
off-line discussions concerning potential collaborations. In some
cases, this could lead to funding from HP. Please join us to start more
collaborations. | |  | Ping-Hui Kao - HP
Ping-Hui
Kao is a System Architect in HP Open Source and Linux Organization
(OSLO) R&D. He has contributed to the HP-UX operating system and
kernel (especially file systems), Windows NT on HPPA-based
platforms, Linux kernel development, and high-availability (HA) and cluster technologies.
In addition to various engineering tasks, he is in charge of
Orchestrated Collaborative Engineering (OCE). As part of OCE, Ping-Hui
manages an R&D lab in Beijing, China, and coordinates collaboration
with the OSS community and consortia for OSLO. |
|
| | | Wednesday |
Itanium Virtualization and vNUMA
In
recent years, virtualization has become a hot technology, being widely
deployed for applications such as server consolidation. However,
Itanium, like x86, was not originally designed with virtualization as a
goal. In this presentation, I will talk about the challenges of
virtualizing the Itanium architecture. I will present the various
possible approaches, including para-virtualization, pre-virtualization
(an automated technique we have developed), and hardware-assisted
virtualization in the form of Intel Virtualization Technology. I
will also provide an overview of vNUMA, a novel application of these
virtualization techniques. vNUMA provides a virtual ccNUMA-like
environment on a cluster by transparently implementing shared memory
underneath the operating system. Thus, a single instance of an existing
operating system such as Linux can run across multiple nodes of a
cluster. While the general principles are applicable to any
architecture, the initial version has been built for Itanium systems. | |  | Matthew Chapman - University of New South Wales
Matthew
Chapman is a PhD student at the University of New South Wales, Sydney,
Australia, and also works part-time for HP Labs. His research interests
include operating systems, computer architecture, and virtualization.
Matthew has considerable experience with the Itanium architecture,
having contributed to the Itanium ports of Linux, L4, Xen, and his own
project vNUMA. He is also active in the wider open-source community,
notably through the rdesktop project, which he founded. |
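The vNUMA abstract describes shared memory implemented transparently beneath the operating system across cluster nodes. One common way to do this is a page-granularity, write-invalidate protocol; the toy directory below sketches that general idea. It is an assumed, simplified protocol (log strings instead of real messages, a single hypothetical home node, no concurrency), not vNUMA's actual design, which the abstract does not describe.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class PageState:
    readers: Set[int] = field(default_factory=set)  # nodes holding read-only copies
    writer: Optional[int] = None                    # node holding the only writable copy


class PageDirectory:
    """Toy single-writer / multiple-reader (write-invalidate) page directory.

    Invariant: if writer is not None, readers is empty.
    """

    def __init__(self, home_node: int = 0):
        self.home_node = home_node           # where every page starts out (assumption)
        self.pages: Dict[int, PageState] = {}
        self.log: List[str] = []             # stand-ins for network messages

    def _state(self, page: int) -> PageState:
        return self.pages.setdefault(page, PageState(readers={self.home_node}))

    def read_fault(self, node: int, page: int) -> None:
        st = self._state(page)
        if node == st.writer or node in st.readers:
            return                            # node already has access
        if st.writer is not None:
            # Demote the writer so it keeps a read-only copy alongside the new reader.
            self.log.append(f"downgrade page {page} on node {st.writer}")
            st.readers.add(st.writer)
            st.writer = None
        self.log.append(f"copy page {page} to node {node}")
        st.readers.add(node)

    def write_fault(self, node: int, page: int) -> None:
        st = self._state(page)
        if node == st.writer:
            return                            # already exclusive
        others = sorted(st.readers - {node})
        if st.writer is not None:
            others.insert(0, st.writer)
        for other in others:
            self.log.append(f"invalidate page {page} on node {other}")
        if node not in st.readers:
            self.log.append(f"copy page {page} to node {node}")
        st.readers = set()
        st.writer = node
```

Read sharing is cheap in this scheme, while a write by one node forces invalidations everywhere else, so workloads that write-share pages across nodes pay the highest network cost.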
| | |
Bioinformatics in Biomining
Given
the explosive growth of genomic databases in recent times, the
development of efficient search tools becomes more important every
day. In particular, the design of biochemical elements used for
bioidentification experiments requires search algorithms incorporating
specific biological constraints. These experiments are designed to
identify the biological diversity of metagenomic or environmental
samples and are useful in ecology, environmental studies, and infection
diagnosis, among others. We are focusing on text search algorithms for
short words (under 60 symbols) where a small number of substitutions
is allowed. The databases used are on the order of gigabytes. We have
developed an efficient solution to this problem that can take
advantage of the Itanium 2 architecture. In this work, we will present
a comparative performance study of this algorithm on several
architectures. This work is being carried out at the Laboratory of
Bioinformatics and Mathematics of Genome, Center for Mathematical
Modeling, University of Chile. | |  | Nicolas Loira - University of Chile
Nicolas
Loira is a Computer Engineer from the University of Chile, with a
background in videogame programming, system administration, and IT.
After four years in the field of bioinformatics, Nicolas currently
designs and implements algorithms to handle and analyze the
massive amounts of data produced by some of the most important
biotechnology projects in Chile. |
 | Andres Aravena - University of Chile
Andres
Aravena is a Mathematical Engineer and MSc candidate in Computer
Science at the University of Chile. He is currently the Project Manager
of the Laboratory of Bioinformatics and Genome Mathematics at the
Center for Mathematical Modeling at the University of Chile, a group
focused on one of the main Chilean biotechnological projects. His
previous experience includes system development and network management
at a nationwide university, and serving as a representative on the national academic
networking consortium. |
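The search problem in the biomining abstract (short patterns, a few substitutions allowed) is often attacked with a pigeonhole filter: split the pattern into k+1 pieces, find exact occurrences of each piece, and verify each candidate window by Hamming distance. The sketch below shows that standard technique on an in-memory string. It is not the authors' Itanium-tuned algorithm, and gigabyte-scale databases would need an index rather than `str.find`.

```python
from typing import List


def hamming_within(a: str, b: str, k: int) -> bool:
    """True if equal-length strings a and b differ in at most k positions."""
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > k:
                return False
    return True


def search_with_substitutions(text: str, pattern: str, k: int) -> List[int]:
    """Sorted start positions where pattern occurs in text with <= k substitutions.

    Pigeonhole filter: split the pattern into k+1 non-empty pieces; any window
    with at most k substitutions must contain at least one piece exactly.
    """
    m = len(pattern)
    if k + 1 > m:
        # Pieces would be empty, so the filter is useless; check every window.
        return [i for i in range(len(text) - m + 1)
                if hamming_within(text[i:i + m], pattern, k)]
    bounds = [j * m // (k + 1) for j in range(k + 2)]
    hits = set()
    for lo, hi in zip(bounds, bounds[1:]):
        piece = pattern[lo:hi]
        pos = text.find(piece)
        while pos != -1:
            start = pos - lo                  # where the whole pattern would begin
            if 0 <= start <= len(text) - m and hamming_within(text[start:start + m], pattern, k):
                hits.add(start)
            pos = text.find(piece, pos + 1)   # allow overlapping occurrences
    return sorted(hits)
```

The filter's value is that exact piece matches are cheap to find, so the expensive per-position comparison runs only at a small number of candidate offsets.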
| | |
|