Obama rolls out plan to boost U.S. supercomputer prowess

F-22Raptor · Jul 31, 2015

U.S. President Barack Obama has signed an executive order setting up the National Strategic Computing Initiative, to coordinate government agencies, academia and the private sector for the development of high-performance computing systems.

Adopting a "whole-of-government" approach, involving all departments and agencies with expertise and interests in high-performance computing (HPC), one of the objectives of the NSCI will be to speed up the delivery of "a capable exascale computing system that integrates hardware and software capability to deliver approximately 100 times the performance of current 10 petaflop systems across a range of applications representing government needs."

China currently leads the supercomputer race with the Tianhe-2 computer, developed by China's National University of Defense Technology, which has headed the list for over two years. The Tianhe-2's maximum achieved performance of 33.86 petaflops (quadrillions of floating-point calculations per second) on the Linpack benchmark is almost double that of Titan, a Cray XK7 supercomputer installed at the U.S. Department of Energy, which rated 17.59 petaflops, according to the latest edition of the Top500 list of the world's top supercomputers.
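As a quick sanity check on the figures quoted above, the Python sketch below works out the Tianhe-2 to Titan ratio and how far each machine sits from the NSCI goal. It assumes the exascale target is simply 100 times a 10 petaflop system, as the wording of the order suggests, and it compares Linpack results against a nominal target, so the percentages are only rough.

```python
# Linpack results quoted from the Top500 list, in petaflops.
TIANHE2_PFLOPS = 33.86
TITAN_PFLOPS = 17.59

# Assumption: the NSCI target is 100x a 10-petaflop system (1,000 petaflops).
EXASCALE_TARGET_PFLOPS = 100 * 10

# How much faster Tianhe-2 is than Titan on Linpack.
ratio = TIANHE2_PFLOPS / TITAN_PFLOPS

# Fraction of the nominal exascale target each machine reaches today.
tianhe2_share = TIANHE2_PFLOPS / EXASCALE_TARGET_PFLOPS
titan_share = TITAN_PFLOPS / EXASCALE_TARGET_PFLOPS

print(f"Tianhe-2 / Titan: {ratio:.2f}x")
print(f"Tianhe-2 share of exascale target: {tianhe2_share:.1%}")
print(f"Titan share of exascale target: {titan_share:.1%}")
```

The ratio comes out a little under 2, which matches the "almost double" wording.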

The U.S. still has the largest number of supercomputers on the Top500 list with 233 such computers, down from 265 on the November 2013 list.

The government appears to want to remedy that situation. "Maximizing the benefits of HPC in the coming decades will require an effective national response to increasing demands for computing power, emerging technological challenges and opportunities, and growing economic dependency on and competition with other nations," Obama wrote in his order on Wednesday.

One of the objectives of the NSCI will be to provide a viable path over the next 15 years, even after the limits of current semiconductor technology are reached in the "post-Moore's law era." The law, named after Intel cofounder Gordon Moore, predicts the doubling of transistor density approximately every two years, allowing chips to get faster and cheaper.

The three lead agencies for the NSCI will be the Department of Energy, the Department of Defense and the National Science Foundation, each focusing on different areas of HPC. They will work with the Intelligence Advanced Research Projects Activity (IARPA) and the National Institute of Standards and Technology (NIST), described as foundational research and development agencies. While IARPA will focus on alternatives to standard semiconductor computing technologies, NIST will focus on measurement science.

Obama has also named deployment agencies, which are essentially key user agencies that could participate in and influence the design of systems, software and applications, to integrate their special requirements. These agencies are NASA, the FBI, the National Institutes of Health, Department of Homeland Security, and the National Oceanic and Atmospheric Administration.

The NSCI executive council, co-chaired by the Director of the Office of Science and Technology Policy and the Director of the Office of Management and Budget, will submit an implementation plan within 90 days of Obama's order.

Obama did not, however, announce a timeline for the creation of the supercomputer. The Department of Energy has said it plans to develop and deliver exascale computing systems by 2023 or 2024, with a hundred- to thousand-fold increase in sustained performance over current computing capabilities. A department task force said last year that an incremental $3 billion in investment would be required over 10 years to deliver exascale computing.

Obama rolls out plan to boost U.S. supercomputer prowess | Computerworld

Kao Boy · Jul 31, 2015

The US might not surpass China in supercomputer prowess, but it has surely surpassed it in the number of supercomputers it owns... Tianhe-2 would still remain undisputed with the latest update...

F-22Raptor · Jul 31, 2015

Info on 3 US supercomputers coming over the next 3 years...

For the first time in over twenty years of supercomputing history, a chipmaker, as opposed to a systems vendor, has been awarded the contract to build a leading-edge national computing resource. This machine, expected to reach a peak performance of 180 petaflops, will provide massive compute power to Argonne National Laboratory, which will receive the HPC gear in 2018.

Supercomputer maker Cray, which itself has had a remarkable couple of years contract-wise in government and commercial spheres, will be the integrator and manufacturer of the “Aurora” super for Argonne. This machine will be a next-generation variant of its “Shasta” supercomputer line, which it has been designing in conjunction with Intel since the chip maker bought the Cray interconnect business three years ago for $140 million.

The new $200 million supercomputer is set to be installed at Argonne’s Leadership Computing Facility in 2018, rounding out a trio of systems aimed at bolstering nuclear security initiatives as well as pushing the performance of key technical computing applications valued by the Department of Energy and other agencies.

This is the third and final announcement for pre-exascale class systems under the CORAL initiative, a $525 million undertaking that was announced in 2014 to bring a new generation of large-scale systems into play that will set the stage for future machines, which the DoE expects to be capable of pushing exaflops somewhere in the 2020 to 2022 timeframe. CORAL is a collaboration involving three national labs in the United States (Argonne, Oak Ridge, and Lawrence Livermore) with inter-agency support from the Department of Energy, the National Nuclear Security Administration (NNSA), and the Office of Science.

The machine at Oak Ridge National Laboratory, called Summit, will be delivered in 2017. It will provide over 5X the performance of the current top system at the lab, the Opteron and GPU-powered Cray XK7 Titan machine (also the second most powerful supercomputer on the planet) in one-fifth the number of nodes. The new Summit machine is expected to push 150 to 300 peak theoretical petaflops—a significant improvement over Titan, which tops out at 27 petaflops. The CORAL supercomputer at Lawrence Livermore National Laboratory, named Sierra, which is also set to be installed in 2017, will provide over 100 petaflops of peak performance.

For context, the relative performance of existing DoE supercomputers is below. Recall as well that the current top machine at Argonne is the Mira supercomputer, which delivers 10 petaflops peak while drawing 4.8 megawatts of power. Its successor, Aurora, with its 180 petaflops peak, will pull 13 megawatts. This is an 18X improvement in performance with just 2.7X the power. During interviews today we will hopefully be able to extract how many nodes and cores this Aurora machine will have.
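The Mira and Aurora figures can also be turned into an efficiency number. The sketch below is a back-of-envelope calculation using only the peak petaflops and megawatt figures quoted in the article; the helper name `gflops_per_watt` is made up for illustration, and peak figures will overstate what real applications see.

```python
def gflops_per_watt(peak_pflops, power_mw):
    """Peak gigaflops delivered per watt of power drawn."""
    gflops = peak_pflops * 1e6   # 1 petaflop = 1,000,000 gigaflops
    watts = power_mw * 1e6       # 1 megawatt = 1,000,000 watts
    return gflops / watts

# Figures quoted in the article (peak performance, power draw).
MIRA = (10, 4.8)
AURORA = (180, 13)

mira_eff = gflops_per_watt(*MIRA)
aurora_eff = gflops_per_watt(*AURORA)

perf_gain = AURORA[0] / MIRA[0]        # performance multiple
power_gain = AURORA[1] / MIRA[1]       # power multiple
efficiency_gain = aurora_eff / mira_eff

print(f"Mira:   {mira_eff:.2f} GFLOPS/W")
print(f"Aurora: {aurora_eff:.2f} GFLOPS/W")
print(f"Performance x{perf_gain:.0f}, power x{power_gain:.1f}, "
      f"efficiency x{efficiency_gain:.1f}")
```

On these numbers Aurora would deliver roughly six to seven times more peak flops per watt than Mira.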

Both of these other CORAL machines sport a similar architecture that is based on IBM Power9 chips, which will emerge just in time with delivery of these systems. The performance and efficiency boost will come from Nvidia Volta Tesla GPUs with Mellanox EDR InfiniBand hooking the hybrid Power-Tesla nodes together. Two key features of the Volta GPU are stacked memory and NVLink interconnect, both of which are important for keeping the processors on both the Power and Tesla components fed.

All of this raises a question about Aurora. If IBM, Nvidia, and Mellanox have collaborated to provide performance on these two earlier systems, it only stands to reason that with a delivery year of 2018, the Aurora machine will sport a different architecture (generally for multi-system procurements there is a mix of system designs and vendors for balance). By the time 2018 rolls around, there is one architecture that appears to be most fitting, although it has only been a glimmer in Intel’s eye as it continues development of the predecessor architectures, Knights Corner and Knights Landing.

Knights Hill is the third in this generation of processors that so far have only found a real home in large-scale supercomputers, even though Intel is predicting and engineering the Knights processors for an expansion into enterprise markets. Further, Knights Hill will very likely mark a significant performance improvement over the Knights Landing processors that will be found in two other upcoming DoE supercomputers, the Trinity and Cori machines.

Intel has not said much about the future Knights Hill massively parallel processor, and very likely will not until it is about a year away from shipping the part in systems, if history is any guide. But to get the supercomputing labs and other possible commercial customers to start investing in the Knights chips, Intel had to provide some kind of roadmap that extended out beyond the impending Knights Landing chips, which it did last fall at the SC14 supercomputing conference. We have learned a lot more about Knights Landing in the past month, and that gives us some clues as to what can and will be done to create Knights Hill, which is very likely the motor inside of Aurora.

Just to review, Knights Landing has at least 60 heavily modified “Silvermont” Atom cores, which have a pair of 512-bit AVX-512 vector math units attached to them. The rumor is that Knights Landing will have 72 cores, which means 144 of these vector units. Yields and thermal envelopes being what they are, it seems likely that Intel will not ship Knights Landing parts with all 72 cores activated; we expect clock speeds in the range of 1.2 GHz to 1.3 GHz if the core count is relatively low and slower if the core count is higher. The main thing is that Intel wants to have the Knights Landing chip hit more than 3 teraflops, which is triple that of the current “Knights Corner” coprocessor. The Knights Landing chip will have 16 GB of high bandwidth memory hooked to the same 2D mesh that links all of the cores together, and also has two DDR4 memory controllers on the mesh that link out to up to 384 GB of memory. (This near and far memory can be addressed in a number of ways, as we explained last month.) The important thing is not just local memory, but high memory bandwidth. Avinash Sodani, chief architect of the Knights Landing chip at Intel, told The Platform that the DDR4 far memory has about 90 GB/sec of bandwidth, which is on par with a Xeon server chip, and that 16 GB of HBM delivers more than 400 GB/sec of aggregate memory bandwidth.
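To see how the core counts and clock speeds above relate to the 3 teraflops target, here is a rough peak-flops sketch in Python. It assumes double precision (eight 64-bit lanes per 512-bit unit) and that each lane retires one fused multiply-add, counted as two flops, per cycle; the function name and defaults are illustrative, not anything Intel has published in this article.

```python
def peak_teraflops(cores, clock_ghz, vector_units_per_core=2,
                   vector_bits=512, fma=True):
    """Theoretical peak double-precision teraflops for a many-core chip."""
    lanes = vector_bits // 64                 # 64-bit doubles per vector unit
    flops_per_lane = 2 if fma else 1          # assumption: FMA counts as 2 flops
    flops_per_cycle = cores * vector_units_per_core * lanes * flops_per_lane
    gigaflops = flops_per_cycle * clock_ghz   # cycles/sec are in GHz
    return gigaflops / 1000.0

# Core counts and clocks mentioned in the article.
for cores, clock in [(60, 1.3), (72, 1.2), (72, 1.3)]:
    print(f"{cores} cores @ {clock} GHz: "
          f"{peak_teraflops(cores, clock):.2f} TFLOPS peak")
```

Under these assumptions, even 72 cores at 1.3 GHz land just under 3 teraflops, so the stated goal would need slightly higher clocks or a different flops-per-cycle count than the one assumed here.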

The 2D mesh on the Knights Landing chip is new for Intel, and when asked about how far it could scale, Sodani said that the architecture could scale for two to four generations for sure. And when we suggested that Intel could, in theory, create NUMA-capable Xeon Phi processors but that it made far more sense to just add more components – cores, memory controllers, and HBM ports – to this 2D mesh and ride down the Moore’s Law curve, Sodani did not say we were wrong and moreover, said that making NUMA machines would not be possible because the bandwidth of memory requests across the NUMA interconnect would utterly swamp the processor interconnect. So, that tells us Knights Hill will have more cores, and as many cores as the process used to etch it will allow.

Intel has said that Knights Hill will use a 10 nanometer process, which is the one it will be ramping later this year for its processors for various client devices. The shift from the 14 nanometer processes used with Knights Landing to the 10 nanometer processes used with Knights Hill probably won’t yield a big change in core counts or clock speeds – maybe something on the order of a 30 percent to 50 percent boost in cores (call it somewhere between 90 and 100 cores) and about the same clock speed (somewhere around 1.2 GHz). Intel could keep adding more AVX vector units to the custom Silvermont cores to boost floating point performance, but has said that it wants to show pretty decent integer performance with the Knights chips, too. So it seems unlikely it will let them get too far out of balance after just getting them into balance with Knights Landing.

That’s another way of saying that Knights Hill might deliver somewhere between 4 teraflops and 4.5 teraflops of peak floating point performance. It is safe to say that local memory on the package and addressable through DDR controllers – as well as bandwidth on these memories – will scale proportionately.
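The estimate above can be reproduced with simple scaling, and it also gives a feel for how many chips a 180 petaflops machine might need. The sketch below assumes Knights Landing lands at 3 teraflops, that peak performance scales linearly with core count at a fixed clock, and that all of Aurora's peak comes from Knights Hill chips. None of that is confirmed, and the helper names are hypothetical.

```python
import math

KNL_PEAK_TFLOPS = 3.0       # Intel's stated Knights Landing goal, per the article
AURORA_PEAK_PFLOPS = 180    # Aurora's expected peak, per the article

def scaled_peak(base_tflops, core_boost):
    """Peak teraflops after a fractional increase in cores at the same clock."""
    return base_tflops * (1.0 + core_boost)

def chips_needed(system_pflops, chip_tflops):
    """Chips required to reach a system peak, rounding up to whole chips."""
    return math.ceil(system_pflops * 1000 / chip_tflops)

# The article's guess: a 30 to 50 percent boost in cores.
for boost in (0.30, 0.50):
    chip = scaled_peak(KNL_PEAK_TFLOPS, boost)
    print(f"+{boost:.0%} cores: {chip:.1f} TFLOPS per chip, "
          f"{chips_needed(AURORA_PEAK_PFLOPS, chip):,} chips for Aurora")
```

A 30 to 50 percent core boost gives roughly 3.9 to 4.5 teraflops per chip, in line with the range above, and implies something in the low tens of thousands of chips for Aurora.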

Intel has also said that Knights Hill will support the second generation of its Omni-Path networking, which very likely runs at 200 Gb/sec if it is to compete against the HDR InfiniBand switching from Mellanox Technologies, due in the late 2017/early 2018 timeframe.

Future Intel Chips Shine in 180 Petaflops Argonne Supercomputer

Kao Boy said:
The US might not surpass China in supercomputer prowess, but it has surely surpassed it in the number of supercomputers it owns... Tianhe-2 would still remain undisputed with the latest update...

Not with the 3 supercomputers below in development...

C130 · Jul 31, 2015

right now it's about who's willing to spend the most.
