Time is running out on the U.S. nuclear weapons stockpile. As the weapons age beyond their design lifetimes, important questions arise: Are the weapons still safe? Will they still perform reliably? How long will they continue to be reliable? What maintenance and retrofitting should be prescribed to extend their working life? These questions must be answered with confidence as long as nuclear deterrence remains an essential part of U.S. national security policy.
With the U.S. commitment to the Comprehensive Test Ban Treaty, the viability of the U.S. nuclear arsenal can no longer be determined through underground nuclear testing. Thus, new approaches are being taken to maintain and preserve the U.S. nuclear deterrent through DOE's Stockpile Stewardship Program.
One key component of the multifaceted Stockpile Stewardship Program is the Accelerated Strategic Computing Initiative (ASCI), an effort to push computational power far beyond present capabilities so scientists can simulate the aging of U.S. nuclear weapons and predict their performance. To calculate in precise detail all the complex events of a thermonuclear explosion requires computational power that does not yet exist, nor would it exist any time soon without the ASCI push, even at computer development speeds predicted by Moore's Law (that computer power doubles about every two years). ASCI's goal is to put such a high-fidelity simulation capability in place in the near future. To do that, the American computer industry must dramatically speed up the pace of computational development. Currently, computing's top speed is 1.8 teraflops, that is, 1.8 trillion floating-point (arithmetic) operations per second. This speed must increase to at least 100 teraflops by 2004, growth that must be coordinated with a host of accomplishments in code development and networking.
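The gap between the Moore's Law pace and the ASCI target can be checked with a little arithmetic. The sketch below is illustrative only: it takes the 1.8-teraflops and 100-teraflops figures and the two-year doubling period from the text, and assumes (this is not stated above) that about six years remain before 2004. The function names are hypothetical.

```python
import math

# Figures quoted in the text.
START_TFLOPS = 1.8        # current top speed
TARGET_TFLOPS = 100.0     # required speed by 2004
MOORE_PERIOD_YEARS = 2.0  # "doubles about every two years"

# Assumption (not from the text): roughly six years remain until 2004.
YEARS_AVAILABLE = 6.0


def doublings_needed(start_tflops, target_tflops):
    """Number of successive doublings needed to go from start to target."""
    return math.log2(target_tflops / start_tflops)


def years_at_doubling_period(start_tflops, target_tflops, period_years):
    """Years needed if speed doubles once every `period_years`."""
    return doublings_needed(start_tflops, target_tflops) * period_years


def required_doubling_period(start_tflops, target_tflops, years_available):
    """Doubling period needed to reach the target in `years_available`."""
    return years_available / doublings_needed(start_tflops, target_tflops)


if __name__ == "__main__":
    n = doublings_needed(START_TFLOPS, TARGET_TFLOPS)
    t = years_at_doubling_period(START_TFLOPS, TARGET_TFLOPS, MOORE_PERIOD_YEARS)
    p = required_doubling_period(START_TFLOPS, TARGET_TFLOPS, YEARS_AVAILABLE)
    print(f"doublings needed: {n:.2f}")
    print(f"years at Moore's Law pace: {t:.1f}")
    print(f"doubling period required for 2004: {p:.2f} years")
```

Under these assumptions, reaching 100 teraflops takes a little under six doublings, which is more than eleven years at the Moore's Law pace; meeting the 2004 date would require speed to double roughly once a year.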
Why is this accelerated schedule necessary? Not only are weapons aging, so are the nuclear weapons experts with experience in designing and testing them. The Stockpile Stewardship Program must have this high-fidelity, three-dimensional simulation capability in place before that expertise is gone. "It's a tremendously ambitious goal, especially under such a short schedule," says Randy Christensen, ASCI's deputy program leader at Lawrence Livermore National Laboratory. Christensen describes the work as something akin to "trying to get a computer code to run in a few days a simulation that would have taken so long with current capability that it would not have been attempted."
Developing the Platform
ASCI's computer hardware is being developed by a consortium of three national laboratories and a select group of industrial partners in a prime example of government-industry cooperation. The national laboratories (Lawrence Livermore, Los Alamos, and Sandia) are each teamed with a major commercial computer manufacturer (IBM, Silicon Graphics-Cray, and Intel, respectively) to design and build parallel supercomputing platforms capable of teraflops speeds.
The development of infrastructure technologies seeks to tap all available resources to make these computer platforms perform the kind of high-fidelity simulation that stockpile stewardship requires. ASCI has a PathForward component, a program that invites computer companies to collaborate in developing required technologies. For instance, the program's first PathForward contracts, announced on February 3, 1998, awarded more than $50 million over four years to four major U.S. computer companies to develop and engineer high-bandwidth and low-latency technologies for the interconnection of 10,000 commodity processors that are needed to build the 30-teraflops computer. (See box below.) As a result of this effort, subsequent collaborations involving other agencies, academia, and industry are expected.
ASCI is an applications-driven program. Unprecedented computer power with a first-rate computing environment is required to do ASCI's stockpile stewardship job, which is to run new computer codes programmed with all of the accumulated scientific knowledge necessary to simulate the long-term viability of our weapons systems. The new generation of advanced simulation codes being developed in the ASCI program must cover a wide range of events and describe many complex physical phenomena. They must address the weapon systems' normal performance from high-explosive initiation through final nuclear yield and the effects of changes introduced by remanufacturing (perhaps using different materials and fabrication methods) or defects brought on by aging. In addition, they must simulate weapon behavior in a wide variety of abnormal conditions to examine weapon safety issues in any conceivable accident scenario. If this weren't difficult enough, the new codes must provide a level of fidelity to the actual behavior of weapons that is much higher than their predecessors provided.
The major challenges facing the developers of these advanced simulation codes are to base them on rigorous, first-principles physics and eliminate many of the numerical approximations and simplified physics that limit the fidelity of current codes; make them run efficiently on emerging high-performance computer architectures; validate their usefulness by means of nonnuclear experiments and archival nuclear test data; and do all of these in time to meet stockpile needs.
Meeting these challenges requires the coordinated efforts of over a hundred physicists, engineers, and computer scientists organized into many teams. Some teams create the advanced weapon simulation codes, writing and integrating hundreds of smaller programs that treat individual aspects of weapon behavior into a single, powerful simulation engine that can model an entire weapon. Other teams are devoted to developing the advanced numerical algorithms that will allow these codes to run quickly on machines consisting of thousands of individual processors, a feat never before achieved with programs this complex. Still others are developing much improved models for the physics of nuclear weapon operation or for the behavior of weapon materials under the extreme conditions of a nuclear explosion. Both the scale (the largest teams have about 20 people) and the degree of integration demanded by this complex effort have required a much greater level of planning and coordination than was needed in the past.
One example of the advanced simulation capabilities being developed in the ASCI program is its material modeling program. Enormously powerful ASCI computers are being used to carry out very accurate, first-principles calculations of material behavior at the atomic and molecular level. This information is then used to create accurate and detailed models of material behavior at larger and larger length scales until we have a model that can be used directly in the weapon simulation codes (Figure 2). This computational approach to material modeling has already produced a much better understanding of the phase changes in actinides (the chemical family of plutonium and uranium). The new approach is expected to be applied to many weapons materials, ranging from plutonium to high explosives. When fully developed, it will become a powerful tool for understanding and predicting the behavior of any material (for example, alloys used in airplane construction, steel in bridges), not just those used in nuclear weapons.
Developing the Infrastructure
In addition to platform and applications development, ASCI is also developing a powerful computer infrastructure. A high-performance problem-solving environment must be available to support and manage the workflow and the communications between all the ASCI machines. At any time, over 700 classified and unclassified code developers and testers may be accessing ASCI computers, either from within the national laboratories or via the Internet. A scalable network architecture, in which individual computers are connected by very high-speed switches into one system, makes this high-demand access possible. With such a configuration, the network is, in effect, the computer (Figure 3).
Allowing large numbers of computers to communicate over a network as if they were a single system requires sophisticated new tools to perform scientific data management, resource allocation, parallel input and output, ultrahigh-speed and high-capacity intelligent storage, and visualization. These capabilities must be layered into the computer architecture, between user and hardware, so that the two can interact effectively and transparently. The applications integrate the computing environment and allow users, for example, to access a file at any of the three national laboratory sites as if it were a local file or to share a local file with collaborators at any ASCI site.
At Livermore, ASCI staff are performing numerous projects to develop this integrated computing environment. One team is working on a science-data management tool that organizes, retrieves, and shares data. An important objective of this tool is to reduce the amount of data needed for browsing terascale data sets. Another team is developing data storage that will offer a vast storage repository for keeping data available and safe 24 hours a day. The repository will store petabytes (quadrillions of bytes) of information, equivalent to one hundred times the contents of the Library of Congress. The storage device will also rapidly deliver information to users, at a rate of billions of bytes per second. Another ASCI team is writing scalable input and output software to move data from computer to computer and reduce congestion between computers and storage. The changes resulting from its improvements will be tantamount to moving busloads of data rather than carloads, a sort of mass transit for data.
Weapons scientists will be confronted with analyzing and understanding overwhelmingly large amounts of data derived from three-dimensional numerical models. To help them, ASCI is developing advanced tools and techniques for computer visualization, wherein stored data sets are read into a computer, processed into smaller data sets, and then rendered into images. The development of visualization tools for use across three national laboratories will require close collaboration with regard to programming language, organization, and data-formatting standards. The Livermore team is focusing on how to reduce data sets for visualization (because they surely will become larger and larger) through the use of such techniques as resampling, multiresolution representation, feature extraction, pattern recognition, subsetting, and probing.
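Three of the reduction techniques named above (subsetting, resampling, and multiresolution representation) can be illustrated on a small three-dimensional grid. The following is a minimal sketch in plain Python, not a description of ASCI's actual visualization tools: the grid layout (nested lists indexed x, y, z), the block-averaging choice for resampling, and all function names are assumptions made for illustration.

```python
def make_grid(nx, ny, nz, f):
    """Build an nx-by-ny-by-nz grid of values f(x, y, z) as nested lists."""
    return [[[f(x, y, z) for z in range(nz)]
             for y in range(ny)]
            for x in range(nx)]


def subset(grid, x_range, y_range, z_range):
    """Subsetting: keep only the half-open index box given by the ranges."""
    (x0, x1), (y0, y1), (z0, z1) = x_range, y_range, z_range
    return [[row[z0:z1] for row in plane[y0:y1]] for plane in grid[x0:x1]]


def downsample(grid, factor):
    """Resampling by block averaging: each factor**3 block becomes one value.

    Assumes every grid dimension is divisible by `factor`.
    """
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])
    if nx % factor or ny % factor or nz % factor:
        raise ValueError("grid dimensions must be divisible by factor")
    block_size = factor ** 3
    coarse = []
    for x in range(0, nx, factor):
        plane = []
        for y in range(0, ny, factor):
            row = []
            for z in range(0, nz, factor):
                total = 0.0
                for dx in range(factor):
                    for dy in range(factor):
                        for dz in range(factor):
                            total += grid[x + dx][y + dy][z + dz]
                row.append(total / block_size)
            plane.append(row)
        coarse.append(plane)
    return coarse


def build_pyramid(grid, levels, factor=2):
    """Multiresolution representation: the full grid plus coarser copies."""
    pyramid = [grid]
    for _ in range(levels):
        pyramid.append(downsample(pyramid[-1], factor))
    return pyramid


if __name__ == "__main__":
    full = make_grid(4, 4, 4, lambda x, y, z: x + y + z)
    pyramid = build_pyramid(full, levels=2)
    for level, g in enumerate(pyramid):
        print(f"level {level}: {len(g)} x {len(g[0])} x {len(g[0][0])}")
```

A viewer built on such a pyramid could draw the coarsest level first and fetch finer levels, or a subset of them, only for the region a scientist is probing, so that far less data has to move for an initial look.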
While the fast, powerful machines and complex computer codes garner most of the headlines, this problem-solving-environment effort is fundamental to fulfilling the ASCI challenge. As we come to understand that "the network is the computer," the significance of this element of the ASCI program comes sharply into focus.
In Pursuit of 100 Teraflops
Improvements to ASCI power will occur over five generations of high-performance computers. To ensure success, multiple-platform development approaches are being attempted. This strategy will reduce risk, allow faster progress, and result in greater breadth of computing capability. For example, the Sandia/Intel Red machine, which was put on line in August 1995, has achieved 1.8-teraflops speed (currently the world's fastest) and is now being used for both code development and simulation. The Lawrence Livermore/IBM Blue Pacific and the Los Alamos/Silicon Graphics-Cray Blue Mountain systems, which resulted from technical bids awarded in late 1996, are already running calculations.
Blue Pacific was delivered to Lawrence Livermore on September 20, 1996, with a thousand times more power than Livermore's existing Cray YMP supercomputer (Figure 5a). The Lawrence Livermore/IBM team installed and powered up the system and had it running calculations within two weeks. Already, it is conducting some of the most detailed code simulations to date.
The Blue Pacific initial-delivery system, which arrived in 340 refrigerator-sized crates, takes up a significant portion of Livermore's computing machine room space, operates at 136 gigaflops, and has 67 gigabytes of memory and 2.5 terabytes of storage (Figure 5b). Initially, each of its 512 nodes contained one processor. During March 1998, these nodes were replaced with four-way symmetrical multiprocessors, quadrupling the number of processors. A further improvement will endow it with thousands of significantly improved processor nodes for the ASCI production model. These reduced-instruction-set computing microprocessors operate at a peak of 800 megaflops and, in this configuration, will bring the system to a total of 3.28 teraflops.
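The figures in this paragraph can be cross-checked with simple arithmetic. The sketch below is a back-of-the-envelope calculation, not a specification of the machine: it assumes that a system's peak speed is simply the number of processors multiplied by each processor's peak speed, and the processor count it derives for the 3.28-teraflops configuration is implied by that assumption rather than stated in the text. All names are hypothetical.

```python
# Figures quoted in the text.
INITIAL_NODES = 512                # one processor per node at delivery
INITIAL_GFLOPS = 136.0             # initial-delivery system speed
PROCESSORS_PER_UPGRADED_NODE = 4   # four-way symmetrical multiprocessors
PEAK_MFLOPS_PER_PROCESSOR = 800.0  # peak of the improved microprocessors
TARGET_TFLOPS = 3.28               # total for the production configuration


def processors_after_upgrade(nodes, processors_per_node):
    """Processor count once every node holds `processors_per_node` CPUs."""
    return nodes * processors_per_node


def mflops_per_processor(total_gflops, processors):
    """Average speed per processor, assuming speed divides evenly."""
    return total_gflops * 1000.0 / processors


def processors_for_target(target_tflops, peak_mflops_per_processor):
    """Processors implied by a target speed, assuming peak speeds just add."""
    return target_tflops * 1.0e6 / peak_mflops_per_processor


if __name__ == "__main__":
    upgraded = processors_after_upgrade(INITIAL_NODES, PROCESSORS_PER_UPGRADED_NODE)
    initial_each = mflops_per_processor(INITIAL_GFLOPS, INITIAL_NODES)
    implied = processors_for_target(TARGET_TFLOPS, PEAK_MFLOPS_PER_PROCESSOR)
    print(f"processors after March 1998 upgrade: {upgraded}")
    print(f"initial speed per processor: {initial_each:.1f} megaflops")
    print(f"processors implied at 3.28 teraflops: {implied:.0f}")
```

On these assumptions, the March 1998 upgrade gives 2,048 processors, and the 3.28-teraflops figure corresponds to roughly 4,100 processors at 800 megaflops each, consistent with the "thousands" of processor nodes described above.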
In that three-teraflops configuration, the Blue Pacific's "Sustained Stewardship Teraflops" system alone would more than fill up all the space in Livermore's current machine room. For that reason, construction crews are now building and wiring new space to accommodate it. In new, larger quarters, workers have been installing electric power, replacing air handlers and coolers, and hooking up new fans as part of necessary building upgrades. The numbers are impressive: 12,000 square feet of building extension, 5.65 megawatts of power, 11 tons of air conditioning, 16 air handlers that replace the air four times per minute, and controllers that keep the temperature between 52° and 72°F at all times. This machine is scheduled to be installed in March or April of 1999 (Figure 6).
Although work on weapons physics is classified, work on the methods and techniques for predictive materials models encompasses unclassified research activities. ASCI thus can pursue a strategy of scientific exchange with academic institutions that will more rapidly establish the viability of large-scale computational simulation and advance simulation technology. This strategy is embodied in the Academic Strategic Alliances Program. The program invites the nation's best scientists and engineers to help develop the computational tools needed to apply numerical simulation to real-world problems. In this way, a broader scientific expertise is at work making the case for simulation; simulation algorithms are tested over a broad range of problems; and the independently produced simulations provide a peer review that helps validate stockpile stewardship simulations (see box below).
The Academic Strategic Alliances Program
In July 1997, the Academic Strategic Alliances Program awarded Level I funds to five universities to perform scientific modeling to establish and validate modeling and simulation as viable scientific methodologies.
Stanford University will develop simulation technology for power generation and for designing gas turbine engines that are used in aircraft, locomotives, and boats. This technology is applicable to simulating high-explosive detonation and ignition.
At their computational facility for simulating the dynamic response of materials, the California Institute of Technology will investigate the effect of shock waves induced by high explosives on various materials in different phases.
The University of Chicago will simulate and analyze astrophysical thermonuclear flashes.
The University of Utah at Salt Lake will provide a set of tools to simulate accidental fires and explosions.
The University of Illinois at Urbana/Champaign will focus on detailed, whole-system simulation of solid-propellant rockets. This effort will increase the understanding of shock physics and the quantum chemistry of energetic materials, as well as the effects of aging and other deterioration.
These Level I projects are part of a 10-year program, in which projects can be renewed after five years. Also under the Alliances program, smaller research projects are being funded at universities across the country as Level II and III collaborations.
Computers Changed It All
In the short span of time since computers came into general use, the nature of problem-solving has changed, by first becoming reliant on computers, and then becoming constrained by the limits of computer power. ASCI will develop technologies that will make computational capability no longer the limiting factor in solving huge problems. Just as important, ASCI will change the fundamental way scientists and engineers solve problems, moving toward full integration of numerical simulation with scientific understanding garnered over decades of experimentation.
In the stockpile stewardship arena, the ASCI effort will support high-confidence assessments and stockpile certification through higher fidelity simulations. Throughout American science and industry, new products and technologies can be developed at reduced risk and cost. Advanced simulation technologies will allow scientists and engineers to do such things as study the workings of disease molecules, so they can design drugs that combat the disease; observe the effects of car crashes without an actual crash; and model global weather to determine how human activities might be affecting it. The uses are limitless, and their benefits would more than justify this investment in high-end computing, even beyond the benefits of ASCI's principal national-security objective. --Gloria Wilt
Key Words: Academic Strategic Alliances Program, Accelerated Strategic Computing Initiative (ASCI), computer infrastructure, computer platform, parallel computing, PathForward, problem-solving environment, Stockpile Stewardship Program, simulation, teraflops, weapons codes.
For further information contact Randy B. Christensen (925) 423-3054.
RANDY CHRISTENSEN is Deputy Program Leader of the Department of Energy's Accelerated Strategic Computing Initiative (ASCI). He has broad management responsibilities within the ASCI program as well as specific responsibility for applications development. He holds a B.S. in physics from Utah State University and an M.S. and Ph.D. in physics from the University of Illinois. Following a postdoctoral fellowship at the Joint Institute for Laboratory Astrophysics (1978-1981), he joined Lawrence Livermore National Laboratory as a code physicist in the Defense and Nuclear Technologies Directorate. He held a number of leadership positions in that directorate before becoming Deputy Associate Director of the Computation Directorate in 1992, where his responsibilities included management of the Livermore Computing Center.