Presentations—May 2004 Gelato Federation Meeting
More than 90 scientists and engineers from 30 Gelato Member institutions and several corporate Itanium vendors attended the May 2004 Gelato Federation meeting, which focused on current high performance computing issues and collaborative solutions specific to the Itanium platform. In a two-day period, attendees were treated to over two dozen presentations by some of the top research and industry users of Linux on Itanium.

Besides the technical presentations listed below, view some of the project posters and photographs from the meeting.

Digital Publishing, High-Performance Computing, and the Grid - Jan Allebach (Purdue University)

Lustre - Phil Schwan (Cluster File Systems, Inc.)

OpenIMPACT Team Presentation - Wen-mei Hwu and Team (University of Illinois)

CERN's Itanium Cluster Update - Including Gridification - Sverre Jarp (CERN)

Itanium-Based HP Linux systems - Jerry Huck (HP)

The Roar of Thunder: LLNL Goes Itanium in a Big Way - Robin Goldstone (LLNL)

High-Performance Computing at BP - Keith Gray (BP)

NCSA Team Presentation - Rob Pennington and Team (NCSA)

HPCS2: The Success of the IA-64 Cluster - Evan Felix (Pacific Northwest National Laboratory)

Comparing and Evaluating epoll, select, and poll - Louay Gammo (University of Waterloo) and Tim Brecht (University of Waterloo)

Biomedical Computations on Itanium - Guna Rajagopal (Bioinformatics Institute)

Buster - Parallel Debugger on IA-64 Cluster - Wei-Min Zheng (Tsinghua University)

SMP Concurrent Software Development in C++ - Peter Buhr (University of Waterloo)

Producing Standards-Compliant Eclipse Binaries: An LSB Case Study - Kevin Cernekee (Gelato Federation)

OpenSSI Linux Clustering for HPC on Itaniums - Bruce Walker (HP)

Portable Atomic Operations and Lock-Free Synchronization - Hans Boehm (HP)

Quality-Awareness for Data-Intensive Applications - Karsten Schwan (Georgia Institute of Technology)

Rocks Cluster Distribution - Mason J. Katz (San Diego Supercomputer Center)

Overview of q-syscollect and q-view - David Mosberger (HP)

Hyperspectral Classification and Dimensionality Reduction Algorithms - Wilson Rivera (University of Puerto Rico Mayaguez)

Heterogeneous Mid-Size Clusters: A New Research Focus for the Cluster Track - César De Rose (Pontifical Catholic University)

Linux/IA-64 Support for Performance Monitoring for the 2.6 Kernel Series - Stéphane Eranian (HP)

Itanium 2 Processor Architecture - Eric W. Moore (Intel Corporation)

Cluster Security and Pfilter - Neil Gorsuch (NCSA)

User Mode Drivers / Superpages and Page Tables - Peter Chubb (University of New South Wales)

Porting, Building, and Optimizing Applications for the Itanium 2 Processor - Eric W. Moore (Intel Corporation)

Itanium Tricks and Gotchas - Gernot Heiser (University of New South Wales)
 
Day One - Monday, May 24
Digital Publishing, High-Performance Computing, and the Grid (PDF)
Digital publishing refers to the integration of digital technologies in the traditional workflow of document creation, prepress preparation, rasterization, printing, finishing, and distribution that is associated with commercial printing. The traditional commercial printing world is essentially a cottage industry and the workflow is carried out in a very labor-intensive and paper-centric way. In that world, only long press runs of identical copies of a document are economically viable. With the emergence of digital presses that support variable data printing, each succeeding copy off the press can be completely different, as content is pulled from databases in real time. In addition, the industry is ripe for a transformation of the entire workflow to support distributed implementation at every stage from document creation through the actual printing. High-performance computing and the grid have important roles to play in this transformation. In this talk, I will discuss in some detail the digital publishing workflow, in order to characterize the needs and opportunities with respect to high-performance computing and the grid.
Jan Allebach (Purdue University)
Jan P. Allebach received his BSEE from the University of Delaware in 1972 and his PhD from Princeton University in 1976. He was on the faculty at the University of Delaware from 1976 to 1983. Since 1983, he has been at Purdue University where he is the Michael J. and Katherine R. Birck Professor of Electrical and Computer Engineering. His current research interests include image rendering, image quality, digital publishing, color imaging and color measurement, and document management. Allebach is a member of the IEEE Signal Processing (SP) Society, the Society for Imaging Science and Technology (IS&T), and SPIE. He is a fellow of the IEEE SP Society and IS&T. He has served as Distinguished/Visiting Lecturer for both societies and has served as an officer and board member of both. Allebach is a past associate editor for the IEEE Transactions on Signal Processing and the IEEE Transactions on Image Processing. He is presently editor for the IS&T/SPIE Journal of Electronic Imaging. He received the Senior (best paper) Award from the IEEE SP Society and the Bowman Award from IS&T. In 2004, he was named Electronic Imaging Scientist of the Year by IS&T and SPIE.
 
Lustre (PDF)
The Lustre cluster file system is well known as a work in progress with many advanced features planned for the future. Less well known is that Lustre is installed and stable in several production environments, including 4 of the 5 largest Linux supercomputers in the world, and two IA-64 clusters (totalling more than 6,000 IA-64 CPUs). We will discuss Lustre's success in the real world, its extreme scalability, and how you can integrate Lustre into your next IA-64 cluster.
Phil Schwan (Cluster File Systems, Inc.)
Phil Schwan is the CEO of Cluster File Systems Inc., the designers and maintainers of the Lustre cluster file system. Co-developer of the first version of Lustre, he now chiefly manages the improvement and support of production versions, and oversees relationships with current and potential mission-critical Lustre users.
 
OpenIMPACT Team Presentation (PDF)
The OpenIMPACT team will present a status report on the compiler, covering both recent SPECint2000 research results, showing the benefit of more aggressive compilation on Itanium2 relative to GCC and commercial compilers, and progress on new infrastructure components that will enable stronger optimization on a broader range of applications. Topics will include our new and improved multi-file optimization environment, a promising new pointer analysis system, and continuing strides in compiling powerfully and efficiently for instruction-level parallelism.

Robert Kidd

Updating the IMPACT compiler from a research-oriented compiler to a general purpose application compiler has required significant effort. Almost every part of the compiler, from the installer and interface to the core libraries, has been touched in some way. This presentation will detail some of the interesting challenges the IMPACT team encountered in this process, including user interface, debugging, and compile-time performance problems.

Shailesh Patel

Compatibility with the C++ language has been a goal of the IMPACT team for some time. Although this work is still in the early stages, the compiler can already process some C++ sources. Shailesh will present the challenges the team has faced in adapting the compiler to the C++ language. As this work is far from complete, the presentation will conclude with some discussion of outstanding issues and future plans for C++ support.
Wen-mei Hwu and Team (University of Illinois)
Wen-mei W. Hwu is the Walter J. ("Jerry") Sanders - Advanced Micro Devices Endowed Chair in Electrical and Computer Engineering in the Coordinated Science Laboratory of the University of Illinois at Urbana-Champaign. From 1997 to 1999, Prof. Hwu served as the chairman of the Computer Engineering Program at the University of Illinois. His research interest is in the area of architecture, implementation, and software for high-performance computer systems. He is the director of the OpenIMPACT project, which has delivered new compiler and computer architecture technologies to the computer industry since 1987. For his contributions to the areas of compiler optimization and computer architecture, he received the 1993 Eta Kappa Nu Outstanding Young Electrical Engineer Award, the 1994 Xerox Award for Faculty Research, the 1994 University Scholar Award of the University of Illinois, the 1997 Eta Kappa Nu-Holmes MacDonald Outstanding Teaching Award, the 1998 ACM SigArch Maurice Wilkes Award, the 1999 ACM Grace Murray Hopper Award, the 2001 Tau Beta Pi Daniel C. Drucker Eminent Faculty Award, and the 2002 ComputerWorld Honors Archive Medal. Prof. Hwu holds four patents and is a fellow of IEEE and ACM. He serves on the Executive Committee of the MARCO/DARPA Centers for Circuit and System Solutions and Gigascale Systems Research. Prof. Hwu received his PhD degree in Computer Science from the University of California, Berkeley.
 
CERN's Itanium Cluster Update - Including Gridification (PDF)
We will give an update of the current activities around CERN's Itanium-based cluster. This includes processor upgrades and installation of additional nodes, tests with 10 Gbit WAN, Infiniband interconnect, "gridification" of the cluster, recent benchmarks, and other items. In addition, we will discuss some of our plans for the future.
Sverre Jarp (CERN)
Sverre Jarp is the Chief Technology Officer in CERN's Openlab for DataGrid Application, a joint collaboration with industry to assess leading-edge information technology for the Large Hadron Collider's Computing Grid in 2007. He has been working in computing at CERN, the European Organization for Nuclear Research, for over 28 years and has held various managerial and technical positions promoting advanced but cost-effective computing solutions for the Laboratory. In 2001-02, he spent a sabbatical at HP Labs (Palo Alto, USA) working on software for the Itanium Processor Family. His current field of interest is, in particular, compiler optimization. Jarp holds a degree in Theoretical Physics from the Norwegian University of Science and Technology in Trondheim.
 
Itanium-Based HP Linux systems (PDF)
This talk covers HP's use of Itanium-based processors in systems running the Linux operating system. HP has spearheaded the porting of Linux to Itanium and continues to provide support and enhancements for HP's systems. Enterprise and HPC applications are being ported and other enhancements are making Linux a full-fledged member of HP's server offerings. Included will be a quick overview of the technology used by HP's custom chipsets and the features that enable both high-performance and commercial systems.
Jerry Huck (HP)
Jerry Huck is an HP Fellow and CTO of HP's server global business unit, which produces Itanium-based systems running HP-UX, Linux, and Microsoft Windows operating environments. He is responsible for technology and strategy development for HP's business-critical servers. This includes HP's work on compilers, platform and processor architecture, virtualization, and manageability solutions for HP's servers. Huck joined HP in 1983 and participated in the development of HP's PA-RISC architecture, specializing in floating-point and virtual memory definition. He and his team developed the 64-bit instruction set extensions to PA-RISC in the early 90s. Starting in 1994, Huck led the HP side of the instruction set and platform definition team for the co-developed Intel Itanium architecture. He continues to evangelize for HP's server offerings with customers and industry analysts. He received his PhD from Stanford in 1983 and holds more than 15 patents in computer architecture and design.
 
The Roar of Thunder: LLNL Goes Itanium in a Big Way (PDF)
Lawrence Livermore National Laboratory (LLNL) has long been at the forefront of high-performance scientific computing. LLNL houses some of the world's largest supercomputers, with an aggregate peak capability of nearly 90TF. Over half of this capability currently comes in the form of parallel Linux clusters. In October 2003 LLNL solicited bids to build its largest Linux cluster to date: a 1024-node system with quad Itanium2 processors and the Quadrics Elan4 high-performance interconnect. The culmination of this effort was the achievement of a Linpack result of 19.94TF (nearly 87% of the 22.9TF peak) in April 2004. This talk will provide an overview of LLNL's Linux cluster strategy and then chronicle the birth of the Thunder Cluster, including the RFP process, build and integration, software development efforts, Linpack benchmarking, lessons learned, and future directions.
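The peak and efficiency figures quoted above can be reproduced with a short calculation. This is only a sketch: the node and processor counts come from the abstract, but the 1.4 GHz clock rate and the four floating-point operations per cycle are assumptions not stated in the abstract, and the function names are illustrative.

```python
# Figures taken from the abstract.
NODES = 1024
CPUS_PER_NODE = 4
LINPACK_TFLOPS = 19.94

# Assumptions (not stated in the abstract): clock rate and the number of
# floating-point operations each Itanium 2 processor completes per cycle.
CLOCK_HZ = 1.4e9
FLOPS_PER_CYCLE = 4


def peak_tflops(nodes, cpus_per_node, clock_hz, flops_per_cycle):
    """Theoretical peak in teraflops: every processor busy on every cycle."""
    return nodes * cpus_per_node * clock_hz * flops_per_cycle / 1e12


def efficiency(measured_tflops, peak):
    """Fraction of the theoretical peak achieved by a measured result."""
    return measured_tflops / peak


peak = peak_tflops(NODES, CPUS_PER_NODE, CLOCK_HZ, FLOPS_PER_CYCLE)
eff = efficiency(LINPACK_TFLOPS, peak)
print(f"peak = {peak:.1f} TF, efficiency = {eff:.1%}")
```

Under these assumptions the peak comes out at about 22.9TF and the Linpack result at a little under 87% of it, in line with the figures in the abstract.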
Robin Goldstone (LLNL)
Robin Goldstone is the group leader for the Production Linux Group at LLNL. She manages the kernel and system programmers who develop the Linux software stack for LLNL's production HPC Linux clusters, as well as the system administrators who support these systems. Goldstone has been involved in the integration and deployment of HPC systems at LLNL for the past eight years. Prior to becoming involved in the LLNL Linux cluster effort, Goldstone was the platform integration lead for ASCI White, a large IBM SP system that was formerly the world's fastest supercomputer. Goldstone holds BS and MS degrees in Computer Science from California State University, Chico.
 
High-Performance Computing at BP
High-performance computing is critical to the success of BP's seismic imaging research. In support of that, we have deployed one of the largest Intel Itanium2 clusters. We continue to search for scalable I/O solutions to handle large volumes of seismic data in the cluster. This talk will discuss our cluster architecture and future requirements.
Keith Gray (BP)
Keith Gray is the Manager of High-Performance Computing at British Petroleum, where he is the architect for technical infrastructure for the company's Upstream Digital Business. He graduated from Virginia Tech with a degree in Geophysics and has spent 17 years within the oil and gas industry.
 
Day Two - Tuesday, May 25
NCSA Team Presentation (PDF)
Welcome, Rob Pennington

TeraGrid Software Management, Mike Showerman

This talk will provide an overall view of the TeraGrid project and will discuss software management and control for these systems. Early experiences with the first phase of TeraGrid IA-64 hardware/software stack will be detailed.

The Virtual Machine Interface (VMI): A High-Performance Communication Middleware, Avneesh Pant

The last few years have seen increasing acceptance and deployment of large-scale clusters as a viable high-performance computing platform. These clusters leverage COTS components in their construction, which leads to increasingly heterogeneous platforms. The current Top500 list of supercomputers consists of three large clusters in the top 5, each sporting a distinct processor/network combination (Xeon/Myrinet, PowerPC/Infiniband, and Itanium/Quadrics). This environment hinders the portability and "roaming" of applications between clusters. We believe that for cluster-based grids to be a viable resource in this heterogeneous environment, these issues need to be addressed. The VMI middleware provides a high-performance communication layer that abstracts the underlying network infrastructure from the application, allowing the ability to Compile Once Run Everywhere (CORE). NCSA has built an MPI 1.2-compliant implementation based on MPICH, using the VMI middleware. This talk will provide an overview of the VMI middleware and the MPICH implementation layered atop it.

NCSA Grid Computing Environments for Applications, Jay Alameda

We have been working to enable classes of applications to take advantage of grid-enabled resources. I will be discussing two classes of applications, severe storm simulation and a multiscale chemical engineering application, and how these applications informed the development of our middleware components, centered around the Open Grid computing environments Runtime Engine (OGRE). We have used this infrastructure successfully to place science applications on a variety of platforms, including the IA-64-based TeraGrid cluster.

Benchmarking NCSA machines, Dr. Nahil Sobh

The Performance Engineering and Computational Methods (PECM) group tested a combination of applications and kernels on NCSA's production machines. These applications and kernels are part of NCSA's benchmark suite known as the "NCSAbench." The NCSAbench has a dual role: to serve as a component in the acceptance criteria and to serve as a reference point when assessing the relative performance of new systems. All NCSAbench runs were executed on dedicated systems. In this short talk, we will report on our experience in benchmarking these systems.
Rob Pennington and Team (NCSA)
Rob Pennington is the Interim Director of NCSA. He leads the center's efforts to build and deploy terascale-level high-performance computing clusters, including a large Itanium-based cluster, for a national community of academic researchers. Pennington is the czar of the Cluster-in-a-Box effort, which is part of the new In-a-Box technology deployment initiative. He is a member of the HPC Open Source Working Group and the IEEE Task Force for Cluster Computing. He has given presentations on issues related to Linux and NT clusters at SC99 and SC2000, the Intel Developers' Forum, and the Alliance Chautauquas. He is also the author or co-author of several papers on high-performance clusters and distributed cluster computing.
 
HPCS2: The Success of the IA-64 Cluster (PDF)
This presentation will feature an overview of the IA-64-based cluster that is currently in production at Pacific Northwest National Laboratory's Environmental Molecular Sciences Laboratory, a discussion of the deployment of the 54 TB Lustre filesystem, and a new look at how large-scale clusters can be instrumented and improved.
Evan Felix (Pacific Northwest National Laboratory)
At PNNL, Evan Felix has been a key contributor to the design and implementation of MPP2, a high-performance, Itanium-based computing cluster, where he served as the primary parallel file system designer. Evan is currently developing layers to enable Lustre to process scientific data at the storage server level. He is also working with beta storage systems using Lustre, multi-terabyte software RAID arrays, and various interconnects, such as Quadrics Elan4 and Infiniband.

Evan is a graduate of Utah State University, where he studied Computer Science and worked in the data processing lab at the USU Space Dynamics Laboratory. He joined Hewlett Packard as an engineer working on the HP Virtual Array storage devices. He has contributed to many open source projects, including Lustre, scsidev, and scsi generic drivers. Recently, his contributions have been included in the stock Linux 2.6 kernel.
 
Comparing and Evaluating epoll, select, and poll (PDF)
The epoll event notification subsystem in the 2.6 Linux kernel is intended as a faster and more scalable replacement for poll() and select(). We are conducting preliminary experiments to evaluate its performance on Itanium2 systems using a high-performance, event-driven HTTP server (userver). Our initial findings indicate that, under the workload examined, the userver performs worse when using level-triggered epoll events than when using select(). Profiling indicates that with epoll, the epoll_ctl() system call accounts for a substantial portion of the execution time. We are using the userver to investigate the programming model used to obtain events with epoll and comparing the performance with that of select() and poll(). We are also investigating the performance impact of a new system call that allows the server to aggregate large numbers of epoll_ctl() calls into one epoll_ctlv() system call. Preliminary results indicate that under the workload being used, the performance obtained using epoll_ctlv() is better than that using epoll_ctl().

Unfortunately, performance is still lower using epoll_ctlv() than when using select() or poll(). We hope to report on experiments using a wider variety of workloads and alternative approaches to using epoll, including the use of the edge-triggered epoll mechanisms.
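The difference between the two programming models can be sketched in a few lines. This is an illustration only, not the userver code: it uses Python's standard `select` module, the helper names (`ready_with_select`, `ready_with_epoll`, `demo`) are hypothetical, and the epoll path is assumed to run on Linux, with a fallback to select() elsewhere. With select(), the whole interest set is passed in on every call; with epoll, the interest set lives in the kernel and each change to it is made through a separate registration call, which corresponds to epoll_ctl().

```python
import select
import socket


def ready_with_select(socks):
    """select(): the full interest set is handed to the kernel on each call."""
    readable, _, _ = select.select(socks, [], [], 0)
    return sorted(s.fileno() for s in readable)


def ready_with_epoll(socks):
    """epoll: the interest set is stored in the kernel.

    Each register() call corresponds to one epoll_ctl() system call, so a
    server that changes its interest set often pays for many such calls.
    """
    ep = select.epoll()
    try:
        for s in socks:
            # EPOLLIN without EPOLLET requests level-triggered notification.
            ep.register(s.fileno(), select.EPOLLIN)
        return sorted(fd for fd, _events in ep.poll(0))
    finally:
        ep.close()


def demo():
    """Create two socket pairs, make one end readable, and query both APIs."""
    a, b = socket.socketpair()
    c, d = socket.socketpair()
    try:
        b.sendall(b"x")  # only `a` now has data waiting
        via_select = ready_with_select([a, c])
        if hasattr(select, "epoll"):  # epoll is Linux-specific
            via_epoll = ready_with_epoll([a, c])
        else:
            via_epoll = via_select
        return a.fileno(), via_select, via_epoll
    finally:
        for s in (a, b, c, d):
            s.close()
```

Calling `demo()` returns the file descriptor of the readable socket along with the ready lists reported by each mechanism; both lists should contain only that descriptor.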
Louay Gammo (University of Waterloo)
Louay Gammo has been developing system software for System V, Solaris, and Linux-based systems for over 15 years. He has worked on C, C++, Java, and Ada compilers. He was also a key contributor on a high-availability software platform for Sun Microsystems. Most recently, he developed the software for a consumer-electronics product that uses a Linux kernel for its operating system.

Tim Brecht (University of Waterloo)
Tim Brecht obtained his BSc from the University of Saskatchewan in 1983, MMath from the University of Waterloo in 1985, and PhD from the University of Toronto in 1994. Before joining the University of Waterloo as an Associate Professor, he held positions as an Associate Professor at York University (Toronto), a Visiting Scientist at IBM's Center for Advanced Studies, and a Research Scientist with Hewlett Packard Labs. Current research interests include: operating systems; Internet systems, services and applications; parallel computing; and performance evaluation.
 
Biomedical Computations on Itanium (PDF)
The BII has been in the forefront of applying Itanium processors in biomedical research. It acquired its first Itanium1 system in late 2001 and subsequently upgraded to its current Itanium2 system. Since its incorporation into the existing mix of computing platforms, great demands have been placed on the Itanium system for research in which floating point calculations are central. This talk will provide an overview of ongoing work on the Itanium platforms, touching on molecular dynamics simulations, modeling biological networks, representing signal transduction pathways, etc. The challenges that we met and overcame in getting our applications to run on the Itanium platforms will be highlighted.
Guna Rajagopal (Bioinformatics Institute)
Dr. Gunaretnam ("Guna") Rajagopal is the Deputy Executive Director of the Bioinformatics Institute (BII) in Singapore, where he headed the team that was involved in the forward planning of the state-of-the-art cyber-infrastructure for the Biopolis, a biomedical complex of private and public research organizations to serve as the foundation for Singapore's effort in enhancing its biomedical R&D capabilities.

Prior to this appointment, Rajagopal was a staff member at the Cavendish Laboratory, University of Cambridge; a Fellow and Director of Studies in Physics and Mathematics at Jesus College, Cambridge; and a member of the College Council. Rajagopal received his BSc Ed from the University of Malaya in 1981, an MSc from the University of Malaya in Theoretical High Energy Physics in 1983, and a PhD in Computational Physics from the Georgia Institute of Technology in 1991.
 
Buster - Parallel Debugger on IA-64 Cluster (PDF)
Buster is a portable debugger for PVM / MPI programs on IA-64 clusters. It is based on the client-server model and uses the existing sequential debugger, GDB. Its most important characteristics include portability, robustness, scalability, and practicability. That means it can be practically used on most clusters with Linux-like operating systems. We will talk about the details of its design and implementation, such as leveled model, use of GDB, precisely-defined communication protocols, automatic detection of processes, and so on, to show how we achieve these characteristics. Finally, we will compare it with other debuggers.
Wei-Min Zheng (Tsinghua University)
Wei-Min Zheng is a Professor in the Department of Computer Science and Technology at Tsinghua University in Beijing, China. His major research interests include parallel/distributed and cluster computing, compiler techniques, and run-time system design for parallel processing systems. Zheng and his group, with 16 faculty and research staff members and 90 graduate students, are currently working on a number of R&D projects supported by the Natural Science Foundation of China and the National High-Tech Program in these areas.
 
SMP Concurrent Software Development in C++ (PDF)
C++ is a powerful, efficient, object-oriented development environment for HPC applications; however, it lacks concurrency. The uC++ project extends C++ with high-level object-oriented concurrency integrated into the C++ programming model, and tools to aid in the development and debugging process.

Three aspects of the uC++ project are presented:
1. Overview of the uC++ concurrent extensions: coroutines, monitors, tasks, exception handling, files/sockets, and real-time. Examples are presented showing how uC++ concurrency retains all aspects of the C++ coding model: inheritance, exception handling, templates, etc. uC++ uses an M:N thread model.
2. Overview of the uProfiler: statistical and exact profiling matching the C++/uC++ execution model. Software profiling for execution-state transitions, call graphs, partial-order event tracing, and memory usage. Hardware profiling using Perfmon (IA-64) tied into the software profiling, e.g., exact call-graph showing cache misses per task per routine.
3. Overview of thread-aware non-blocking I/O: file, server, acceptor, client objects. These objects simplify file and socket programming in HPC applications. Scalability issues are being examined for very large I/O applications (connections & data-flow).

Currently, the uC++ programming environment is being used to develop a high-performance multiprocessor web-server in conjunction with Tim Brecht's userver group, and for real-time capability in networking software for multithreaded network processors, e.g., Intel IXP2400.
Peter Buhr (University of Waterloo)
Peter A. Buhr received BSc Hons, MSc, and PhD degrees in computer science from the University of Manitoba in 1976, 1978, and 1985, respectively. He is currently an Associate Professor in the Department of Computer Science at the University of Waterloo in Canada. His research interests include concurrency, concurrent profiling/debugging, persistence, and polymorphism. Dr. Buhr is a member of the Association for Computing Machinery.
 
Producing Standards-Compliant Eclipse Binaries: An LSB Case Study (PDF)
Gelato believes that the deployment of professional-grade development tools, such as the Eclipse IDE, is essential to the success of the IA-64 platform. Kevin has spent the past several months familiarizing himself with both Eclipse and the Linux Standard Base (LSB), a comprehensive set of standards that aims to provide binary-level compatibility across compliant systems. In this talk, he will explain the benefits of LSB compliance for both users and developers and will discuss his experiences generating LSB-compliant packages for the Eclipse IDE and all of its library dependencies.
Kevin Cernekee (Gelato Federation)
Kevin Cernekee is a software engineer at UIUC. He is in charge of Gelato software standardization efforts and also works on developer tools, such as Eclipse and OpenIMPACT. Kevin brings a wealth of industrial-level Linux/UNIX software engineering and system management experience to Gelato. He has done development work in diverse areas, such as kernel enhancements, user-mode applications and utilities, embedded devices, and high-level web application programming.
 
OpenSSI Linux Clustering for HPC on Itaniums (PDF)
The Open Single System Image Linux Cluster Project (OpenSSI) provides a very rich environment for HPC on Itanium. It simultaneously addresses scalability, usability, availability, and manageability. It is very stable, supported by HP, and features a single available root filesystem, integration with Lustre, single process space, process and connection load balancing, and integration with open source tools like Scaleable PBS, Maui, MPICH, Ganglia, etc. OpenSSI is also integrated with HP MPI and CMU and features transparent MPI checkpointing.
Bruce Walker (HP)
For the last 20 years, Dr. Walker has been involved with Single System Image (SSI) UNIX/Linux clustering. He got his PhD from UCLA with a thesis on clustered filesystems and was one of the founders of the Locus Computing Corporation. Dr. Walker has given numerous industry presentations and has published over a dozen papers and two books on computer security and clustering. Dr. Walker was the chief architect of the NonStop Clusters product and is now the architect and project lead for the OpenSSI Linux Cluster project.
 
Portable Atomic Operations and Lock-Free Synchronization (PDF)
Non-numerical multithreaded programs generally rely on monitors and condition variables for synchronization. This is sometimes insufficient, particularly if the application either is very sensitive to the performance of inter-thread communication or makes use of Posix signal handlers. We present a small library that provides access to hardware-provided lock-free atomic operations and memory barriers. We provide implementations for the most common platforms, allowing the code to remain reasonably portable. Unlike other such packages, we provide routines that represent combinations of atomic operations and memory ordering constraints. This allows us to take advantage of diverse hardware primitives – including Itanium's ordering completers – with minimal performance loss. This talk will conclude with some examples of applications.
Hans Boehm (HP)
Hans Boehm is the primary author of a widely used open source garbage collector library. Boehm holds a PhD from Cornell University. He was on the faculty at the University of Washington and at Rice University before joining Xerox PARC, SGI, and finally HP Labs. Boehm has chaired several programming language research conferences and is the past chair of ACM SIGPLAN.
 
Quality-Awareness for Data-Intensive Applications (PDF)
This talk presents middleware- and system-level support to dynamically manage the quality of information produced and presented by data-intensive applications. Targeted applications are characterized by their data-intensive nature; examples include remote data visualization for scientific applications and the event streams present in operational information systems. Results of our work demonstrate the necessity and utility of online quality management, jointly performed by system-level and middleware-level functionality, for this class of applications.
Karsten Schwan (Georgia Institute of Technology)
Karsten Schwan is a Professor in the College of Computing at the Georgia Institute of Technology. He is also the director of the Center for Experimental Research in Computer Systems (CERCS), which spans both Georgia Tech's College of Computing and School of Electrical and Computer Engineering. Professor Schwan's MSc and PhD degrees are from Carnegie Mellon University (CMU) in Pittsburgh, Pennsylvania. At CMU, he initiated his research in high-performance computing, addressing operating and programming systems support for the Cm* multiprocessor. At Ohio State University, he established the PArallel, Real-time Systems (PARTS) Laboratory, containing both custom embedded processors and commercial parallel machines, and conducted research on operating and programming system support for cluster computing and for adaptive real-time systems. At Georgia Tech, his work concerns: middleware and systems support for computational grids and for pervasive systems, focusing on the interactive nature of modern distributed and parallel applications (online monitoring and program steering); the adaptive nature of pervasive applications (system support for predictable operation in dynamic environments); and the online configuration and adaptation of distributed application or system components (network-aware communication).
 
Rocks Cluster Distribution (PDF)
This talk will introduce the Rocks Cluster Distribution, an effort to bring commodity clusters to application scientists. The first release of Rocks was in November 2000, and our goal has remained 'to make clusters easy.' The target users of our software are individual researchers who want to build their own supercomputer without hiring a system administrator. Rocks scales from test bed size (2 nodes) to world class size (over 300 nodes). Rocks is used around the world by researchers, companies, and government labs. Rocks supports x86, Itanium, and Opteron chips, all with identical software. A focus of this talk will be on the issues we face with Itanium and how software architecture minimizes the differences between CPU architectures.
Mason J. Katz (San Diego Supercomputer Center)
Mason J. Katz is the Group Leader for Cluster Development for the San Diego Supercomputer Center (SDSC) at the University of California, San Diego (UCSD). Mr. Katz received his BS in Systems Engineering from the University of Arizona. He worked for five years as an embedded software engineer on networks of lightning detection sensors. He has spent the past six years working at the University of Arizona and then at UCSD/SDSC on projects ranging from network security protocols (IPSec) and operating systems (x-kernel, Scout) to commodity clustering (HPVM, NPACI Rocks).
 
Overview of q-syscollect and q-view (PDF)
This talk provides an overview of the q-syscollect and q-view performance tools and the philosophy underlying their design. A distinguishing feature of these tools is that they can produce gprof-compatible results for an entire machine at both the user and kernel level without any intrusion into the monitored programs (no recompilation or dynamic code patching required). Furthermore, unlike with gprof, multithreaded applications and shared libraries are fully supported.
David Mosberger (HP)
David Mosberger is a senior research scientist at HP Labs, where he has been working on Linux-related projects. His research interests are in operating systems, high-performance Internet systems, and computer architecture. He holds a PhD degree in Computer Science from the University of Arizona, where he was one of the primary contributors to making Linux 64-bit clean and getting it to work on the Digital Alpha platform. Mosberger is also the co-author of the popular open-source scanning API known as SANE. At HP Labs, he has authored httperf and has been spearheading the effort to bring Linux to the IA-64 platform. Most recently, he published a book called "IA-64 Linux Kernel: Design and Implementation."
 
Hyperspectral Classification and Dimensionality Reduction Algorithms (PDF)
Hyperspectral imagery (HSI) analysis demands large input data sets and requires significant CPU time and memory capacity. Currently HSI researchers need tools that allow them to make image processing faster and more efficient. In this talk, we discuss our experiences in porting and tuning a set of hyperspectral classification and dimensionality reduction algorithms on Itanium, the parallelization of this set of algorithms, and its deployment on a Grid computing platform.
Wilson Rivera (University of Puerto Rico Mayaguez)
Wilson Rivera is an Assistant Professor of Computer Science in the Electrical and Computer Engineering Department at the University of Puerto Rico-Mayaguez (UPRM). His major research interests include parallel and distributed computing, high-performance computing, and information technology. He also leads the PDCLab at UPRM.
 
Heterogeneous Mid-Size Clusters: A New Research Focus for the Cluster Track (PDF)
After the October 2003 Gelato Federation meeting in Stockholm, the Cluster Scalability Focus Group and the Cluster Performance Focus Group were combined into one group to better accommodate members' research interests in this particular area. Some members also expressed interest in building heterogeneous clusters combining different architectures, including Itanium2, as a reduced-budget alternative to homogeneous Itanium2 clusters. The idea of this research focus is to investigate and solve the issues that result from a heterogeneous cluster configuration (with mixed architectures including Itanium1, Itanium2, Pentium, Xeon, etc.), such as node interoperability at several levels (network, MPI, code), resource management, load balancing, and cross-compiling. This presentation will detail some of these issues with the intention of drawing more members into this cluster research focus.
César De Rose (Pontifical Catholic University)
César De Rose is an Associate Professor in the Computer Science Department at the Pontifical Catholic University of Rio Grande do Sul (PUCRS), Porto Alegre, Brazil. His primary research interests are parallel and distributed computing and parallel architectures. He is currently conducting research on a variety of topics applied to clusters and grids, including resource management, resource monitoring, and distributed allocation strategies. Prof. De Rose received his doctoral degree in Computer Science from the University Karlsruhe, Germany, in 1998. He currently leads the Research Center in High-Performance Computing (CPAD - PUCRS/HP) at PUCRS.
 
Linux/IA-64 Support for Performance Monitoring for the 2.6 Kernel Series (PDF)
This talk is divided into two major parts. In the first part, we introduce the new perfmon-2 interface, which is implemented in all 2.6-based kernels. We present the major new features and the differences from the previous interface used in the 2.4 kernels. In particular, we detail the new file descriptor-based approach, the support for custom sampling buffer formats, and what it can do to help port existing monitoring tools. In the second part of the presentation, we give an overview of the existing performance monitoring tools and libraries. In particular, we focus on pfmon and q-syscollect, two tools developed by HP Labs. We show examples of what they provide to help understand performance problems of your applications.
Stéphane Eranian (HP)
Stéphane Eranian is a Senior Research Scientist at Hewlett Packard Labs, where he has been working on the porting of Linux to the IA-64 platform since 1998. He has made numerous contributions to the Linux/IA-64 kernel and related user-level programs. He is the main architect of the Linux/IA-64 kernel performance monitoring subsystem (perfmon). He is also the creator of the pfmon tool, which uses this subsystem to collect performance information.

Before joining HP, Stéphane worked on his PhD at Chorus Systems (now Jaluna) in France. He holds a D.E.A. (BSc degree) in Operating systems from Universite PARIS 6, France, and a Doctorate (PhD degree) in Computer Science from Universite PARIS 7, France. He is a member of USENIX and co-author of "IA-64 Linux Kernel: Design and Implementation."
 
Itanium 2 Processor Architecture (PDF)
This presentation is a basic overview of the Itanium (IA-64) architecture.
Eric W. Moore (Intel Corporation)
Eric Wynne Moore is a Senior Software Engineer working in the Software Products Division at Intel Corporation. In the past, he has worked at Rational, Microsoft, RealAudio, Digital, Compaq, and Keane. His specialties include Operating Systems, Compilers, High Performance Computing, and Performance Tuning. In the last couple of years, Moore has trained more than 500 engineers in performance optimization, including engineers at PNNL, CIA, FBI, IBM, Dell, Hewlett Packard, SGI, Cisco, Intel, and several universities, at sites around the world, including the USA, Korea, China, Brazil, and Europe.
 
Cluster Security and Pfilter (PDF)
Various aspects of cluster security will be presented, including special security difficulties introduced by the nature of clusters. Security layers will be discussed, concentrating on packet filtering systems and the use of the Pfilter firewall compiler system.
Neil Gorsuch (NCSA)
Neil Gorsuch started his career working as a software consultant in Southern California. He co-founded a company that designed oil well control equipment. He invented a device to add various types of serial and parallel ports to a non-expandable UNIX workstation. After being the sole owner and running that company for a number of years, Gorsuch decided to concentrate on his family rather than being an entrepreneur. Moving to the Midwest, he worked at Motorola, writing kernel drivers and cell phone firmware. Currently, Gorsuch works for the NCSA at the University of Illinois in the computational cluster development group, where he writes system and security software for clusters. He enjoys being with his family and activities such as house restoration and scuba diving.
 
User Mode Drivers / Superpages and Page Tables (PDF)
User Mode Drivers: Since the Stockholm meeting, we've managed to get a usable user-mode gigabit Ethernet driver going, and have fixed many of the performance problems with the user-mode IDE driver. User-mode IDE now outperforms the in-kernel driver. We're still working on the gigabit Ethernet driver.

Superpages and Page Tables: The Linux kernel has always used a fixed-radix, two- or three-level page table. Architectures such as IA-64 and PPC, which have no hardware support for the Linux page table, copy information from the Linux page table into their own architecture-specific page table format. TLB misses are then handled by hardware walking the architecture-specific page table. If the page table entry is not present, the operating system is invoked to walk the Linux page table, copy the data into the architecture-specific page table, and then reinvoke the page fault. TLB misses can be reduced by using larger pages; pages larger than the base page size are called superpages.

We're working on two fronts:
1. Introduce a new page table format that is better suited to large sparse virtual address spaces and that is easier to use with superpages: the guarded page table [1].
2. Introduce and experiment with superpage support within the existing three-level page table, with work based on [2].
[1] Liedtke, Page table structures for fine-grain virtual memory, Tech. Rep. 872, German National Research Center for Computer Science (GMD), Oct. 1994. http://citeseer.ist.psu.edu/liedtke94page.html
[2] http://shimizu-lab.dt.u-tokai.ac.jp/lsp.html
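
The page table walk and the effect of superpages on TLB misses described in this abstract can be illustrated with a small model. The following is a minimal sketch, not the Linux or IA-64 implementation: the 4 KiB base page, the 9 index bits per level, the nested-dictionary table, the LRU replacement policy, and all names (split_vaddr, PageTable, tlb_misses) are assumptions chosen for illustration.

```python
from collections import OrderedDict

# Illustrative parameters only (assumed, not the real Linux/IA-64 layout).
PAGE_SHIFT = 12   # 4 KiB base pages
LEVEL_BITS = 9    # 512 entries per page-table level
LEVELS = 3        # fixed-radix, three-level table


def split_vaddr(vaddr):
    """Split a virtual address into per-level indices (top first) and a page offset."""
    offset = vaddr & ((1 << PAGE_SHIFT) - 1)
    vpn = vaddr >> PAGE_SHIFT
    mask = (1 << LEVEL_BITS) - 1
    indices = [(vpn >> (level * LEVEL_BITS)) & mask
               for level in reversed(range(LEVELS))]
    return indices, offset


class PageTable:
    """Toy three-level page table; each level is a sparse dict of entries."""

    def __init__(self):
        self.root = {}

    def map_page(self, vaddr, frame):
        """Map the page containing vaddr to a physical frame number."""
        indices, _ = split_vaddr(vaddr)
        node = self.root
        for idx in indices[:-1]:
            node = node.setdefault(idx, {})   # allocate intermediate levels on demand
        node[indices[-1]] = frame

    def walk(self, vaddr):
        """Return the physical address, or None if no mapping is present (a fault)."""
        indices, offset = split_vaddr(vaddr)
        node = self.root
        for idx in indices[:-1]:
            node = node.get(idx)
            if node is None:
                return None
        frame = node.get(indices[-1])
        if frame is None:
            return None
        return (frame << PAGE_SHIFT) | offset


def tlb_misses(addresses, page_shift, tlb_entries):
    """Count misses in a small LRU TLB for a given page size (2**page_shift bytes)."""
    tlb = OrderedDict()
    misses = 0
    for addr in addresses:
        page = addr >> page_shift
        if page in tlb:
            tlb.move_to_end(page)        # refresh LRU position on a hit
        else:
            misses += 1
            tlb[page] = True
            if len(tlb) > tlb_entries:
                tlb.popitem(last=False)  # evict the least recently used entry
    return misses
```

In this model, calling tlb_misses on the same address stream with a larger page_shift stands in for using superpages: each TLB entry covers more memory, so fewer distinct entries are needed to cover the same range.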
Peter Chubb (University of New South Wales)
Peter Chubb is a Senior Research Engineer at National ICT Australia and a Research Officer at UNSW. He completed his PhD under Associate Professor John Lions in 1989. Peter worked at Softway Pty Ltd as a consultant and software engineer doing UNIX kernel, security, and embedded work. He joined Gelato@UNSW at its inception in 2002.

Peter started using UNIX in 1979 and has never used Microsoft operating systems for more than a few moments. His home life includes wife Lucy, who also works at Gelato@UNSW, and two small daughters. Peter's hobbies include music (he runs a recorder consort), aquaria (3 tanks at present, no room for more), and fine wines.
 
Porting, Building, and Optimizing Applications for the Itanium 2 Processor (PDF)
A methodical approach to identifying and removing execution bottlenecks on the Intel Itanium 2 Processor in order to achieve peak execution efficiency will be described. This presentation will also discuss advanced usage of various Intel Software Development Tools to achieve this performance.
Eric W. Moore (Intel Corporation)
 
Itanium Tricks and Gotchas (PDF)
This talk covers our experiences optimizing Itanium system code, using a fast message-passing implementation as an example. What seemed like an optimal implementation (going by the documentation) ran about four times slower than expected. Through careful instruction scheduling, we managed to reduce the cost by 70%. More than half of this was achieved by reducing backend pipeline stalls. We used the processor's performance monitoring support for thorough cycle accounting. Our optimization efforts were hampered by the lack of available documentation regarding processor implementation details, such as cache replacement policies and properties of internal processor buffers. This made it difficult to identify and eliminate some of the stall conditions. We were able to deduce some of the details using a number of microbenchmarks, but we still cannot account for some of the observed timings. We expect that compiler writers face similar issues on the Itanium, and would like to appeal to Intel to provide more complete documentation.
Gernot Heiser (University of New South Wales)
Gernot Heiser is a Professor of Operating Systems, School of Computer Science & Engineering, UNSW, and a Program Leader, Embedded, Real-Time and Operating Systems, National ICT Australia. His research focuses on operating systems, embedded systems, computer architecture, real-time systems, and distributed systems.

 

All content © copyright 2002-2006 Gelato Federation.

Gelato Central Operations is housed within the Coordinated Science Laboratory (CSL) of the College of Engineering at the University of Illinois at Urbana-Champaign (UIUC).