10/27/2010

Lab joins HPC leaders in effort to address data storage challenges

Donald B Johnston, LLNL, (925) 423-4902, johnston19@llnl.gov



Kim Cupps, LLNL High Performance Systems division leader, and Mark Seager, LLNL assistant department head for Advanced Technologies, inspect a newly installed rack for Dawn, a 500-teraFLOPS (trillion floating-point operations per second) IBM BlueGene/P system. Dawn will help lay the foundation for the 20-petaFLOPS (quadrillion floating-point operations per second) Sequoia system.


The Laboratory has joined forces with Cray, DataDirect Networks and Oak Ridge National Laboratory to form a non-profit corporation dedicated to addressing one of the great challenges facing high performance computing - data storage and management.

Open Scalable File System, Inc. (OpenSFS) will focus on the development of high performance computing (HPC) storage software technology to help supercomputing centers meet current and future data storage requirements. Also called the OpenSFS Alliance, the organization serves as a forum for collaboration among institutions using file systems on leading-edge HPC systems.

"By bringing together partners' expertise, we can do collectively what would be very difficult and time-consuming to do individually. The alliance will allow us to accelerate the development of storage technology for next-generation HPC," said Mark Seager, LLNL lead for new HPC systems. "Ultimately it will benefit the scientific community, national security, industry and the nation's economic competitiveness."

A major challenge confronting the HPC community is storing and managing the enormous volumes of data generated by ever more powerful supercomputing systems. OpenSFS will work to improve currently available open source file systems, notably Lustre, as well as accelerate the development of future Lustre file systems. Lustre is "open-source" - software whose source code is shared and made available to the user community at no cost.

Lustre is a massively parallel distributed file system, developed by Oracle, used by large computing clusters. The name Lustre is derived from combining the operating system "Linux" and "cluster." Many of the world's most powerful supercomputing systems use Lustre file systems because they can support thousands of client systems, petabytes (quadrillions of bytes) of storage and hundreds of gigabytes (billions of bytes) per second of input/output communication or "throughput." Research and data centers use Lustre because of its scalability - the ease with which the system can be expanded to accommodate more powerful HPC systems.

"Our NNSA national security mission requires high-end HPC resources. We deploy over 22 systems in two distinct world-class simulation environments for multiple programs at LLNL. The key integrating element in both of these simulation environments is the Lustre parallel file system," Seager said. "We welcome the formation of OpenSFS in order to provide the HPC Lustre community with a mechanism to share development and support resources."

The alliance will leverage established centers of excellence at ORNL and LLNL. Working together and with other members, these centers will use OpenSFS as a focal point for day-to-day collaboration. In addition to these activities, OpenSFS will hold an annual scalable file system workshop and will provide a variety of services (education and outreach, testing, documentation and project management) to the user community.

For more information about OpenSFS, Inc., see the Web.

 


Founded in 1952, Lawrence Livermore National Laboratory is a national security laboratory, with a mission to ensure national security and apply science and technology to the important issues of our time. Lawrence Livermore National Laboratory is managed by Lawrence Livermore National Security, LLC for the U.S. Department of Energy's National Nuclear Security Administration.