Presentations—October 2005 Gelato Federation Meeting
Over 90 scientists, developers, and engineers from 25+ institutions met at the Pontifical Catholic University of Rio Grande do Sul (PUCRS) in Porto Alegre, Brazil on October 2-5, 2005. The meeting, titled "The Itanium ERA: Education, Research, Application," allowed attendees to address current high-performance computing issues and collaborative solutions specific to Linux on the Intel Itanium processor. In a 2-1/2-day period, attendees were treated to two dozen technical presentations by some of the top research and industry users of Linux on Itanium. Aside from the presentations and discussions, attendees participated in a variety of social events. Besides the technical presentations listed below, view some of the photographs from the meeting.
Welcome and Itanium Research at PUCRS, Jorge Audy and César De Rose - Pontifical Catholic University of Rio Grande do Sul
State of the Federation, Mark K. Smith - Gelato Central Operations
Itanium Solutions Alliance, An Introduction, Caryl Malone - Itanium Solutions Alliance
Itanium Tools, led by Mark Davis - Intel
Itanium Architecture 101, Eric W. Moore - Intel
Compiler Optimizations for Transaction Processing Workloads on Itanium Linux Systems, Mark Davis - Intel
Focus on Grid, led by Walfredo Cirne - Universidade Federal de Campina Grande
CERN Loops - Can We Extrapolate Total Application Performance?, Sverre Jarp - European Organization for Nuclear Research
| GCC Improvement, led by Shin-Ming Liu - HP
Bugs! Getting Them Stomped, Peter Chubb - University of New South Wales
New Intel Itanium Architecture Specific Features for Intel VTune, Paul M. Cohen - Intel
Itanium Research at the University of Split, Mile Dzelalija - University of Split
Focus on Scalability, led by Lee Schermerhorn - HP
Performance Comparison of Data Reordering Algorithms, Alvaro Coutinho - UFRJ
Paravirtualization without Pain, Peter Chubb - University of New South Wales
Itanium Research at the University of Buenos Aires, Hugo Daniel Scolnik - University of Buenos Aires
Xen-Virtualized Machines, Dan Magenheimer - HP
Proposal for Enhanced Open-Source Compiler Performance, Shin-Ming Liu - HP
|
Welcome and Itanium Research at PUCRS
PUCRS will welcome Gelato meeting attendees to their Porto Alegre campus and will present their three main Itanium-related projects.
|
 | Jorge Audy - Pontifical Catholic University of Rio Grande do Sul
Jorge Audy is a Professor of Computer Science and Vice President of Research and Graduate Studies at the Pontifical Catholic University of Rio Grande do Sul (PUCRS), Brazil. He holds a PhD in Information Systems and is currently involved in projects related to global software development and project management. |
 | César De Rose - Pontifical Catholic University of Rio Grande do Sul
César De Rose is an Associate Professor in the Computer Science Department at the Pontifical Catholic University of Rio Grande do Sul (PUCRS), Porto Alegre, Brazil. His primary research interests are parallel and distributed computing and parallel architectures. He is currently conducting research on a variety of topics applied to clusters and grids, including resource management, resource monitoring, and distributed allocation strategies. Dr. De Rose received a PhD in Computer Science from the University Karlsruhe, Germany, in 1998. He currently leads the Research Center in High Performance Computing (CPAD - PUCRS/HP) at PUCRS. |
| State of the Federation
Mark will give an overview and status of the Federation's growth and will report on collective member characteristics. He will also discuss some of the Federation's key areas of interest and recent activity at Gelato Central Operations. In addition, Matthieu Delahaye will brief attendees on the Gelato Vanillat project, and Andy Schuh will preview the April 2006 Gelato Conference to be held in San Jose, California.
|
 | Mark K. Smith - Gelato Central Operations
Mark K. Smith is the Managing Director of the Gelato Federation. He works with Federation members and sponsors around the world, fostering collaborative relationships among members, sponsors, and the general community to advance the Linux-Itanium platform. Mark leads a technical team at the University of Illinois and dedicates time to educating the general community about the advantages of the platform. Prior to joining Gelato, he worked in the software industry for 10 years. Mark holds a PhD in Engineering from the University of Illinois. |
|
Itanium Solutions Alliance, An Introduction
|
| Caryl Malone - Itanium Solutions Alliance
|
| Itanium Tools
Experts will introduce the pfmon tool, qtools, OProfile, PAPI-based tools, Intel VTune, Intel C++ Compiler (icc), and performance libraries.
Audience participation is encouraged.
|
| Philip Mucci - Royal Institute of Technology
|
| Itanium Architecture 101
This lecture will cover the principles of the EPIC architecture that software developers need to know. In this pursuit, the lecture will introduce the registers, instruction formats, predication, and speculation of the Intel Itanium architecture, culminating with the vocabulary and principles of how software pipelining is accomplished.
|
 | Eric W. Moore - Intel
Eric Wynne Moore is a Senior Software Engineer working in the Software Products Division at Intel Corporation. In the past, he has worked at Rational, Microsoft, RealAudio, Digital, Compaq, and Keane. His specialties include operating systems, compilers, high-performance computing, and performance tuning. In the last couple of years, Moore has trained more than 500 engineers in performance optimization, including engineers at PNNL, CIA, FBI, IBM, Dell, Hewlett Packard, SGI, Cisco, Intel, and several universities, in locations around the world including Korea, China, Brazil, and Europe. |
| Compiler Optimizations for Transaction Processing Workloads on Itanium Linux Systems
This talk describes a repertoire of well-known and new compiler optimizations that help produce excellent server application performance and investigates their performance contributions. These optimizations combined produce a 40% speed-up in on-line transaction processing performance and have been implemented in the Intel C/C++ Itanium compiler.
|
 | Mark Davis - Intel
Mark Davis is a Senior Principal Engineer at Intel. He serves as an architect of Intel's Itanium Compiler Lab, providing high-quality, high-performance compilers for enterprise-class Itanium platforms. He has also been co-manager of the Itanium Compiler Development team, and co-manager of the Itanium code generator. Mark has specialized in compiler optimizations, performance analysis, and architecture design in his career. At Digital/Compaq, he worked on the GEM compiler for Alpha, and was technical lead when GEM was targeted to Itanium. As compiler lead at Stellar/Stardent, he helped design the world's first graphics supercomputer and delivered high performance compilers for it. While at Intermetrics, he was a language designer in DoD's competition for the Ada language and then was technical lead for an Ada compiler; later he helped design and became technical director of a highly-optimizing PL/I compiler developed for IBM.
Dr. Davis received his PhD in Computer Science from Harvard. |
| Focus on Grid
This section intends to discuss the grid efforts of Gelato members and how these efforts relate to Linux on Itanium. There will be an open plenary discussion and three short presentations. Dr. De Rose will talk about the GerpavGrid Project and how it utilizes the power of computational grids in the public administration of the City of Porto Alegre. Dr. Jarp will bring the latest updates on the CERN grid. Dr. Cirne will present the OurGrid project and how it intends to reduce the "scientific digital divide."
|
 | Walfredo Cirne - Universidade Federal de Campina Grande
Since July 1995, Dr. Walfredo Cirne has been a Professor at UFCG, acting as a researcher in distributed and parallel computing, advisor of MS and PhD students, and teacher of graduate and undergraduate classes. Prior to 1997, Dr. Cirne worked on computer networks and machine learning. Since then, his research has focused on grid computing. Currently, he coordinates one of the largest research projects in computational grids in Brazil, OurGrid, a project developed in cooperation with HP that aims to provide a complete grid solution for bag-of-tasks applications and includes the pursuit of good out-of-the-box performance for grid middleware running on Itanium.
Dr. Cirne holds a PhD in Computer Science from the University of California San Diego, and has published dozens of papers in the main international publications and computer science conferences. |
| CERN Loops - Can We Extrapolate Total Application Performance?
CERN programs often have very flat execution profiles, which means that the execution time is spread over many routines/methods. Consequently, compiler optimization must be applied to the whole program and not just a few inner loops. In this talk, we nevertheless discuss the value of extracting some of the most heavily used routines (relatively speaking) and using them to gauge overall performance. An initial set of ten C++ routines has been extracted from three CERN packages (ROOT, GEANT4, and CLHEP). One main advantage is, of course, that the routines compile and execute in seconds, allowing extensive testing of different platforms, compilers, and compiler options. The speaker will review the initial selection and show results with GCC and icc on both the Xeon and the Itanium platforms.
|
 | Sverre Jarp - European Organization for Nuclear Research
Sverre Jarp is the Chief Technology Officer in CERN's openlab for DataGrid Application, which is a joint collaboration with industry in order to assess leading-edge information technology for the Large Hadron Collider's Computing Grid in 2007. He has been working in computing at CERN, the European Organization for Nuclear Research, for over 28 years and has held various managerial and technical positions promoting advanced but cost-effective computing solutions for the Laboratory. In 2001-02, he spent a sabbatical year at the HP Labs (Palo Alto, USA) working on software for the Itanium Processor Family. His current field of interest is, in particular, compiler optimization. Jarp holds a degree in Theoretical Physics from the Norwegian University of Science and Technology in Trondheim. |
| GCC Improvement
Three presentations are planned for this evening:
In this talk, Diego will provide a general architectural overview of GCC. He will also describe the development process: how to participate and contribute to its development. Finally, he will talk about some of the major challenges and opportunities for improvement.
The HP GCC project team is made up of three full-time engineers. The primary objective of the project is to improve the customer experience on the Linux-Itanium platform. The current focus of the team is to fix GCC defects posted in the Bugzilla database, submit patches to the FSF tree, and regularly monitor the quality of the major support line of GCC and report or fix any regressions. The project also helps coordinate the GCC performance enhancement activities among various organizations.
Superblock formation creates a chain of single-predecessor basic blocks, which can then be specialized by later optimizations. Moving the superblock formation pass to early in the Tree-SSA stages of GCC opens the door for application of optimizations in a more powerful and maintainable intermediate language. Bob will discuss the details of his patch, present results, and outline future plans.
Canqun will give a brief introduction to the CCRG group and their improvements for GCC on Itanium 2, which focus on function inlining. Performance results produced by their improved GCC will be given, as well as a look at their ongoing work.
|
 | Shin-Ming Liu - HP
Shin-Ming Liu is the Project Manager for High-Level Optimization and GCC of the Itanium C/C++ Compiler Section of the Java, Compiler, and Tools Lab at HP in Cupertino, California. Liu led the development effort for the high-level optimization and code generator project in the compiler targeted for the Itanium processor. In this project, he helped redesign the high-level optimization into a highly robust, scalable, and efficient component by rearchitecting the infrastructure, from which many new techniques were developed. Many highly recognized program analysis methods were adopted as well. Liu led the reinvention of compiler development methodology by focusing on modularization, memory footprint control, canonical internal representation, and automatic error detection. Before joining HP, he worked at MIPS/SGI in the area of compiler front end, middle end, back end, and linker. During that time, he co-authored several technical publications. |
| Bugs! Getting Them Stomped
Eric Raymond coined what he called Linus's Law: "Given enough eyeballs, all bugs are shallow." What he meant was that in the open-source world, where bug reporter, bug fixer, and core developer share a common view of the system, bug reports are of better quality, and fixes are easier to find. Unfortunately, even though much of what we work on is open-source, it's often hard to work out how to report a problem and get it fixed. Consequently, many of us carry along sets of patches and workarounds for many months, maybe even years, until "something happens." In this talk, I'll attempt to explain how to interact with the open-source community so that the problems *you* have are fixed upstream. The essence is communication.
|
 | Peter Chubb - University of New South Wales
Peter Chubb is a Senior Research Engineer at National ICT Australia and a Research Officer at UNSW. He completed his PhD under Associate Professor John Lions in 1989. Peter worked at Softway Pty Ltd as a consultant and software engineer doing UNIX kernel, security, and embedded work. He joined Gelato@UNSW at its inception in 2002. Peter started using UNIX in 1979 and has never used Microsoft operating systems for more than a few moments. His home life includes wife Lucy, who also works at Gelato@UNSW, and two small daughters. Peter's hobbies include music (he runs a recorder consort), aquaria (3 tanks at present, no room for more), and fine wines. |
| New Intel Itanium Architecture Specific Features for Intel VTune
Intel's VTune Performance Analyzer is a robust, enterprise-grade solution, even with large executables (100MB+) that other products are unable to profile. This talk will cover specific features being added to better support supercomputer systems based on Intel Itanium 2 processors.
Productive Eclipse 3.1 Integrated Design Environment
VTune Performance Analyzer 8.1 for Linux makes application performance tuning easier with an improved graphical user interface, which will be available for the first time on Intel Itanium 2 processors. This includes wizards to simplify configuration and quickly get to application hotspots without learning about the tool. No recompiles or changes to your build script are required to use VTune Analyzer.
Intel Itanium 2 Processor Features
Specific topics I expect to cover include: new lower-overhead ways to collect data on multi-processor computers; event editing, which allows more experienced users to tailor event-based sampling for specific purposes; opcode matching, an advanced collection technique specific to Intel Itanium family processors; compiler optimization report integration, which allows users to easily view complex reports; and multi-user callgraph, which allows several users to tune on a single Itanium 2 computer simultaneously.
|
 | Paul M. Cohen - Intel
Paul Cohen is a Performance Tools Product Line Marketing Manager at Intel. He is responsible for Intel tools targeted at improving the performance of customer applications. His current focus is on improving usability of VTune Performance Analyzer, making it a robust enterprise-grade solution able to deal with extremely large executables (100MB+) that other products are unable to profile. In addition, he is working on integration of the VTune Analyzer with Intel C and FORTRAN compilers under Eclipse with the ability to provide a close connection between Intel compiler optimization reports and performance bottlenecks represented in the VTune Analyzer. |
| Itanium Research at the University of Split
|
 | Mile Dzelalija - University of Split
Mile Dzelalija is a Professor of Physics at the University of Split. He received his PhD in 1995 at the University of Zagreb, Croatia, with the thesis "Entropy in Au + Au Reactions at Relativistic Energies" after researching at the GSI, Darmstadt, Germany. Recently, he has been very active in quality assurance in education at the university and national level. Dzelalija's main research interests include simulations and data analysis of heavy-ion reactions (FOPI and CBM experiments at the GSI, Darmstadt), and the observability of the Higgs boson and supersymmetric particles in high-energy experiments (future CMS experiment at the LHC at CERN in Geneva), especially in determining some global properties of reaction systems and new particles. He is also interested in experimental and theoretical research activities in Biomechanics, Environmental Physics, and Physics in Conservation of Fine Arts. In 2004, as part of the UNESCO (United Nations Educational, Scientific and Cultural Organization) project sponsored by HP, the Department of Physics of the Faculty of Natural Sciences, Mathematics, and Education at the University of Split received two Itanium machines. The focus of the project is to design, enhance, port, and tune the Faculty's own or third-party applications to the Itanium architecture in order to improve both run time and precision. The Faculty intends to integrate their computer systems into this grid project. |
| Focus on Scalability
The focus on scalability session will contain a series of short presentations of work in progress at several organizations to measure and improve the scalability of Linux.
On a multiprocessor computer, the Linux kernel uses several process queues from which processes are selected to be scheduled. Because a process queue can become overloaded while other queues are empty (or "underloaded"), Linux has a load balancing algorithm to balance all process queues in the system. Currently, Linux builds a memory access level hierarchy that does not correctly represent the machine's actual number of memory access levels. In our project, we have implemented a new strategy to build the correct hierarchy based on information provided in the SLIT table. We intend to present some results we have already produced based on this new strategy.
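As a rough illustration of the idea of deriving memory access levels from a SLIT, the sketch below groups NUMA nodes by distance. It is not the project's implementation: the 4-node table, its values, and the function names `distance_levels` and `build_hierarchy` are assumptions made up for this example. The only convention relied on is that a SLIT entry of 10 denotes local access.

```python
# Hypothetical 4-node SLIT: entry [i][j] is the relative distance from
# node i to node j (10 means local). The values are invented for illustration.
SLIT = [
    [10, 20, 40, 40],
    [20, 10, 40, 40],
    [40, 40, 10, 20],
    [40, 40, 20, 10],
]


def distance_levels(slit):
    """Return the sorted distinct distances that appear in the table."""
    return sorted({d for row in slit for d in row})


def build_hierarchy(slit):
    """Group nodes at each distance level.

    Returns one (distance, groups) pair per distinct distance. At a given
    level, a group holds the nodes reachable from some node within that
    distance. Each pair is one candidate level of a balancing hierarchy.
    """
    n = len(slit)
    hierarchy = []
    for level in distance_levels(slit):
        groups = set()
        for i in range(n):
            groups.add(frozenset(j for j in range(n) if slit[i][j] <= level))
        hierarchy.append((level, sorted(sorted(g) for g in groups)))
    return hierarchy


if __name__ == "__main__":
    for level, groups in build_hierarchy(SLIT):
        print(level, groups)
```

For this made-up table the sketch finds three levels (single nodes, node pairs, and the whole machine), so the number of levels follows the distances reported by the firmware rather than a fixed depth.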
Peter will be presenting results of file system scalability measurements and other efforts underway at UNSW.
Lee will present an overview of various performance and scalability issues that the HP Linux Scalability and Performance project has investigated or is currently working on. The presentation will include information regarding some of the Linux performance measurement/instrumentation tools being used at HP to investigate these issues. Areas of investigation include: AIM7 scaling on midrange systems [16cpu, ~140 file systems, ...], locking bottlenecks [page up to date lock, global inode lock, ...], LVM2/DM scaling vs raw SCSI performance, etc.
|
 | Lee Schermerhorn - HP
As a member of the Linux Performance and Scalability team within HP's Linux and Open Source Lab (LOSL), Lee Schermerhorn works on performance engineering for Linux, primarily on HP Integrity (Itanium) platforms, with emphasis on NUMA scheduling/affinity and (storage) IO performance. |
| Performance Comparison of Data Reordering Algorithms
Several performance improvements for parallel finite element edge-based sparse matrix-vector multiplication algorithms on unstructured grids are presented and tested. Edge data structures for tetrahedral meshes and triangular interface elements are treated, focusing on node and edge renumbering strategies for improving processor and memory hierarchy use. Benchmark computations on Intel Itanium 2 processors are performed. The results show improvements in CPU time by factors ranging from 2 to 3.
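To make the terminology concrete, here is a minimal sketch of an edge-based matrix-vector product for a symmetric matrix, together with one simple edge reordering. It is an illustration only and not the algorithms evaluated in the talk: the storage layout (a diagonal array plus one coefficient per edge), the sort-by-node-index reordering, and the names `edge_spmv` and `reorder_edges` are all assumptions.

```python
def edge_spmv(diag, edges, coeffs, x):
    """Compute y = A x for a symmetric A stored by edges.

    diag[i] is the diagonal entry of row i; each edge (i, j) with
    coefficient a contributes a to both A[i][j] and A[j][i].
    """
    y = [d * xi for d, xi in zip(diag, x)]
    for (i, j), a in zip(edges, coeffs):
        y[i] += a * x[j]
        y[j] += a * x[i]
    return y


def reorder_edges(edges, coeffs):
    """Sort edges by (lower node, higher node).

    Consecutive edges then tend to touch nearby entries of x and y, which
    is one simple way to improve memory locality. The product is unchanged.
    """
    order = sorted(range(len(edges)),
                   key=lambda k: (min(edges[k]), max(edges[k])))
    return [edges[k] for k in order], [coeffs[k] for k in order]


if __name__ == "__main__":
    # Made-up 4-node ring mesh with a Laplacian-like matrix.
    diag = [4, 4, 4, 4]
    edges = [(2, 3), (0, 1), (1, 2), (0, 3)]
    coeffs = [-1, -1, -1, -1]
    x = [1, 2, 3, 4]
    print(edge_spmv(diag, edges, coeffs, x))
    sorted_edges, sorted_coeffs = reorder_edges(edges, coeffs)
    print(sorted_edges)
    print(edge_spmv(diag, sorted_edges, sorted_coeffs, x))
```

Reordering changes only the order in which memory is visited, not the result, which is why renumbering strategies can be compared purely on run time.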
|
 | Alvaro Coutinho - UFRJ
Alvaro Coutinho is the Director of the Center for Parallel Computing and a Professor in the Department of Civil Engineering at the Alberto Luiz Coimbra Institute for Graduate Studies and Research in Engineering (COPPE) at the Federal University of Rio de Janeiro (UFRJ), and is also a Research Fellow with the Brazilian National Scientific Research Council. Coutinho's areas of current research include new algorithms for improving processor efficiency in unstructured grid parallel computations, a computational environment for petroleum systems modeling, and the GRAD-GIGA project, a computational grid for high-performance computing.
Coutinho received MSc and DSc degrees in Civil Engineering from UFRJ and has published 62 journal papers and 250 conference papers. |
| Paravirtualization without Pain
Virtual systems are useful for many purposes. One of my favorites is to allow the development of operating systems without having to reboot. They are also useful for server consolidation, resource isolation, and so on.
However, a fully virtual system usually runs much more slowly than the bare metal, because page faults and privileged operations cause expensive traps into the host system. A common approach is to modify the guest system to call directly into the host without trapping, allowing much better performance. Unfortunately, paravirtualization (as this approach is called) leads to fairly invasive changes to the guest operating system. And because the changes are usually specific to a particular virtual machine, they have to be redone for each virtual machine. We propose a technique called "afterburning" that automatically paravirtualizes the guest operating system. With a bit of linker/loader magic, the resulting binary can run either on a virtual or a real machine. Performance is very similar to that of a manually paravirtualized system, but the amount of change to the operating system is much less.
|
 | Peter Chubb - University of New South Wales
Peter Chubb is a Senior Research Engineer at National ICT Australia and a Research Officer at UNSW. He completed his PhD under Associate Professor John Lions in 1989. Peter worked at Softway Pty Ltd as a consultant and software engineer doing UNIX kernel, security, and embedded work. He joined Gelato@UNSW at its inception in 2002. Peter started using UNIX in 1979 and has never used Microsoft operating systems for more than a few moments. His home life includes wife Lucy, who also works at Gelato@UNSW, and two small daughters. Peter's hobbies include music (he runs a recorder consort), aquaria (3 tanks at present, no room for more), and fine wines. |
| Itanium Research at the University of Buenos Aires
Topics presented will cover: new fast and robust solvers for very large linear systems implemented on Itanium 2 computers, and applications to computational mechanics and image reconstruction problems.
|
 | Hugo Daniel Scolnik - UBA
Hugo Daniel Scolnik is a Professor in the Computer Sciences Department (which he founded in 1984) at the School of Sciences of the University of Buenos Aires (UBA), where he teaches Cryptography, Numerical Analysis, and Optimization. As part of his Gelato-related work, Scolnik co-directed a Gelato-sponsored project comparing 64- and 32-bit architectures from the point of view of their performance for scientific programming. Scolnik is also currently directing three of his five graduate students on Gelato-related theses.
Beyond his work at UBA, Scolnik was an international consultant for United Nations agencies, HP, and Hitachi. He has been a Visiting Professor in several countries. He represents Argentina on the International Federation for Information Processing (IFIP) Technical Committee 7 (TC7). He has published papers on Optimization, Numerical Analysis, Automata Theory, Artificial Intelligence, Robotics, and Mathematical Modeling, and has refereed several journals. In 2003, Scolnik won the Konex Award for the best trajectory in Science and Technology for the 1993-2003 decade in the area of Informatics.
Scolnik received a Licenciado en Ciencias Matemáticas at the University of Buenos Aires in 1964, and a PhD in Mathematics from the University of Zurich, Switzerland, in 1970. |
| Xen-Virtualized Machines
Xen is rapidly becoming the de facto standard for open-source virtualization, with capabilities and performance matching or exceeding leading industry products. Paravirtualization techniques, efficient inter-domain virtual I/O mechanisms, clever migration, and support for multiple architectures (including VT and Pacifica hardware) have contributed to a broad base of developers and have piqued industry interest. Xen/ia64 is the first non-x86 architecture supported by Xen. It is still a work in progress, but the core hypervisor component utilizes code and/or experience from Xen, Linux/ia64, and the HP vBlades research project. Many interesting strategies are employed to ensure correctness, optimize performance, and leverage the many rapidly developing layers of tools provided by Xen. We will provide a brief overview of virtualization in general, Xen specifically, and the current status of Xen/ia64. Then we will spend the remaining time discussing some interesting details about the inner workings of Xen on Itanium.
|
 | Dan Magenheimer - HP
Dan Magenheimer is a senior scientist working for HP Labs, Fort Collins, Colorado, USA. Dan joined HP in 1982 as a member of the processor architecture team that developed PA-RISC; he wrote the first PA-RISC simulator, remote debugger, object-code emulator (for the 16-bit HP3000), integer multiplication algorithm, and linker.
From 1985 until 2001, he managed various software teams in HP's software, server, and storage divisions. Returning to HP Labs in 2001, Dan joined a team investigating security and virtualization on the Itanium platform; this team developed vBlades, the first Itanium virtual machine monitor. When the Xen open-source virtual machine monitor was announced in 2003, Dan commenced a port of Xen to Itanium (Xen/ia64), utilizing the lessons learned in vBlades and also directly leveraging Linux/ia64 code. Dan is currently the maintainer of Xen/ia64 and is working with a multi-company, worldwide team of Itanium experts to help deliver the first open-source virtual machine monitor for Itanium capable of running multiple SMP guests and supporting migration.
Dan has a BA in Computer Science from the University of California and an MSEE from Stanford University. He is a member of the ACM and IEEE. |
| Proposal for Enhanced Open-Source Compiler Performance
In contrast to superscalar RISC processors, which implement an out-of-order instruction execution model, the Itanium Processor Family (IPF) shifts the responsibility to the compiler to expose and exploit available instruction-level parallelism through aggressive code transformation and scheduling techniques. As such, Itanium compilers exert a big influence on delivered application performance. Although GCC performs very well in optimizing scalar codes, it lags behind commercial compiler offerings in optimization techniques employed for floating-point-intensive or loop-intensive codes. While good progress is being made in implementing Tree-SSA-based optimizations for GCC, an alternative, highly leveraged back-end compile path has the potential for delivering increased performance for loop-intensive codes in the short term, and could be specifically targeted for the Montecito processor release.
The Open64 compiler is a 64-bit, open-source optimizing compiler that has been derived from the proven SGI MIPSpro production compiler. It has been ported to Itanium and was open sourced in 2000. Intel helped promote the use of this compiler to stimulate Itanium compiler research and christened it the Open Research Compiler (ORC). ORC integrates the GCC front end, a robust middle end, and an optimized Itanium code generator. It could serve as the basis for a high-performance open-source compiler for IPF in a relatively short time, and it can also be leveraged to quickly bring up an alternative IPF-tuned back-end compilation path for GCC that complements the evolving Tree-SSA-based GCC compiler optimization strategy.
In this talk, we advocate this complementary approach to implementing an IPF-tuned alternate GCC back-end to deliver the performance potential of the Montecito processor for the Gelato community and Linux/IPF GCC development community at large. To be clear, the long-term interests of GCC and its broad Linux user base are best served with a single high performance backend. In this talk we'll review a possible evolutionary path to integrating the mainstream GCC optimization path with the alternative backend with the support of the GCC developer community, drawing on the strengths of each.
|
 | Shin-Ming Liu - HP
Shin-Ming Liu is the Project Manager for High-Level Optimization and GCC of the Itanium C/C++ Compiler Section of the Java, Compiler, and Tools Lab at HP in Cupertino, California. Liu led the development effort for the high-level optimization and code generator project in the compiler targeted for the Itanium processor. In this project, he helped redesign the high-level optimization into a highly robust, scalable, and efficient component by rearchitecting the infrastructure, from which many new techniques were developed. Many highly recognized program analysis methods were adopted as well. Liu led the reinvention of compiler development methodology by focusing on modularization, memory footprint control, canonical internal representation, and automatic error detection. Before joining HP, he worked at MIPS/SGI in the area of compiler front end, middle end, back end, and linker. During that time, he co-authored several technical publications. |
|
|