China is upgrading its number two supercomputer with new Chinese-made Matrix-2000 GPDSP accelerators. They will replace the existing Intel Knights Corner Xeon Phi coprocessors that were installed in the Tianhe-2 back in 2013. The upgraded supercomputer will be called the Tianhe-2A.
Tianhe-2 Supercomputer Being Upgraded to 95 Petaflops
Michael Feldman | September 20, 2017 05:46 CEST
The number two-ranked Tianhe-2 supercomputer, installed at the National Super Computer Center in Guangzhou, is being upgraded to 94.97 petaflops, nearly doubling its current peak performance of 54.9 petaflops.
The news comes out of the International HPC Forum (IHPCF), via a series of tweets from Satoshi Matsuoka posted on Tuesday. During the morning session, it was revealed that the upgraded system, dubbed Tianhe-2A, will sport the new Chinese-made Matrix-2000 GPDSP accelerators. They will replace the existing Intel Knights Corner Xeon Phi coprocessors that were installed in the Tianhe-2 back in 2013.
The original plan was to upgrade the system with the newer Knights Landing devices. But after the US government instituted an embargo on these chips to certain Chinese supercomputing sites, including the Guangzhou center, the National University of Defense Technology (NUDT) had to come up with plan B. In this case, that meant developing their own coprocessor. That turned out to be the Matrix-2000, a DSP-type chip, tweaked for more general-purpose computation.
According to slides presented at the forum, each Matrix-2000 will deliver 2.4576 teraflops (peak), which more than doubles the 1.0 teraflops delivered by the original Xeon Phi chip. The Matrix-2000 consists of 128 cores, each one providing 16 double precision flops per cycle. Those flops are delivered by a 256-bit vector unit, which, as Satoshi notes, is in line with the Knights Corner chip it replaces.
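The per-chip figures above can be cross-checked with a little arithmetic. The sketch below is not from the slides; it simply assumes that peak flops equals cores times flops per cycle times clock rate, and backs out the clock frequency that the quoted numbers would imply. The function name is made up for illustration.

```python
# Figures quoted in the article for one Matrix-2000 chip.
PEAK_FLOPS_PER_CHIP = 2.4576e12      # 2.4576 teraflops (peak)
CORES = 128
FLOPS_PER_CYCLE_PER_CORE = 16        # double precision

def implied_clock_hz(peak_flops, cores, flops_per_cycle):
    """Clock rate implied by peak = cores * flops_per_cycle * clock.

    This relation is an assumption made for the sketch, not a
    statement from the presentation.
    """
    return peak_flops / (cores * flops_per_cycle)

clock_hz = implied_clock_hz(PEAK_FLOPS_PER_CHIP, CORES, FLOPS_PER_CYCLE_PER_CORE)
print(f"Implied clock: {clock_hz / 1e9:.2f} GHz")
```

Under that assumption the quoted peak works out to a clock of about 1.2 GHz.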
At least for the time being, the system will retain the original host CPUs from Tianhe-2, which are Intel Xeon processors. Each supercomputer node will pair two of those Intel CPUs with two Matrix-2000 coprocessors, hooked in via PCIe. The node count is being increased from 16,000 to 17,792.
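As a rough sanity check on the node configuration, the sketch below multiplies out the accelerator count and compares it with the 94.97 petaflops system peak quoted at the top of the article. Attributing the remainder to the host Xeon CPUs is an assumption on our part; the article does not break the total down.

```python
# Figures quoted in the article.
NODES = 17_792
ACCELERATORS_PER_NODE = 2
PEAK_TFLOPS_PER_ACCELERATOR = 2.4576
QUOTED_SYSTEM_PEAK_PFLOPS = 94.97

def accelerator_peak_pflops(nodes, per_node, tflops_each):
    """Aggregate accelerator peak in petaflops (1 petaflop = 1000 teraflops)."""
    return nodes * per_node * tflops_each / 1000.0

accel_pflops = accelerator_peak_pflops(
    NODES, ACCELERATORS_PER_NODE, PEAK_TFLOPS_PER_ACCELERATOR
)
# Whatever is left over presumably comes from the host CPUs (assumption).
remainder_pflops = QUOTED_SYSTEM_PEAK_PFLOPS - accel_pflops

print(f"Accelerators: {accel_pflops:.2f} petaflops")
print(f"Remainder:    {remainder_pflops:.2f} petaflops")
```

On these numbers the Matrix-2000 chips account for roughly 87.5 of the 94.97 petaflops, leaving about 7.5 petaflops unaccounted for by the accelerators.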
Other enhancements include an interconnect that is 40 percent faster (up to 14 Gbps) and has 50 percent lower latency (1 us). This is likely the TH-Express-2+ that NUDT has talked about before. In addition, main memory has been bumped from 1.4 to 3.4 petabytes, slightly improving the bytes-to-flops ratio of the Tianhe-2. Storage has also been enhanced in both capacity and I/O bandwidth. All the particulars are below, courtesy of James Lin, who tweeted some nice screen images from the presentation.
Source: James Lin, @jameslinsjtu
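The bytes-to-flops claim can be checked directly from the quoted memory and peak-performance figures. This is only a sketch: it assumes the ratio is taken against peak flops (54.9 petaflops before, 94.97 after), which the article does not state explicitly.

```python
def bytes_per_flop(memory_petabytes, peak_petaflops):
    """Main memory per unit of peak performance (bytes per flop/s).

    The peta prefixes cancel, so the ratio of the two quoted
    figures can be used directly.
    """
    return memory_petabytes / peak_petaflops

# Tianhe-2: 1.4 PB of memory, 54.9 petaflops peak.
before = bytes_per_flop(1.4, 54.9)
# Tianhe-2A: 3.4 PB of memory, 94.97 petaflops peak.
after = bytes_per_flop(3.4, 94.97)

print(f"Tianhe-2:  {before:.4f} bytes per flop/s")
print(f"Tianhe-2A: {after:.4f} bytes per flop/s")
```

Measured this way, the ratio moves from roughly 0.026 to roughly 0.036 bytes per flop/s.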
Even though peak performance is going to nearly double, the system’s total power draw of 18 MW is just slightly more than that of the original system. That gives it a power efficiency of more than 5 gigaflops per watt, which would place it somewhere around the number 20 slot on the Green500 list.
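The efficiency figure follows from dividing the quoted peak by the quoted power draw. The sketch below does just that; note that it uses peak rather than measured Linpack performance, so it is an upper bound and not a Green500-style number.

```python
def peak_gigaflops_per_watt(peak_petaflops, power_megawatts):
    """Peak performance per watt, in gigaflops per watt."""
    peak_gigaflops = peak_petaflops * 1e6   # 1 petaflop = 1e6 gigaflops
    power_watts = power_megawatts * 1e6     # 1 MW = 1e6 W
    return peak_gigaflops / power_watts

# Figures quoted in the article: 94.97 petaflops peak at 18 MW.
efficiency = peak_gigaflops_per_watt(94.97, 18.0)
print(f"Peak efficiency: {efficiency:.2f} gigaflops per watt")
```

That comes to a little under 5.3 gigaflops per watt at peak.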
Ironically, the upgrade won’t improve the system’s position in the TOP500 rankings. The number one Sunway TaihuLight has a peak performance of 125.4 petaflops, and attains 93 petaflops on the High Performance Linpack (HPL) benchmark. It’s unlikely Tianhe-2A will come in at better than 70 or 80 petaflops on HPL.
Nevertheless, the upgrade further cements China’s status as a serious supercomputing power, and does so, once again, with domestically produced technology. The country is currently the odds-on favorite to stand up the first exascale system, which it intends to do in the 2019-2020 timeframe.
https://www.top500.org/news/tianhe-2-supercomputer-being-upgraded-to-95-petaflops/
Technical pictures and info on 95 petaflop supercomputer
brian wang | September 20, 2017 |
Satoshi Matsuoka tweeted out details of a technical presentation on upgrades to the second most powerful supercomputer in the world, Tianhe-2A (improved from 56 petaflops to 95 petaflops).
https://www.nextbigfuture.com/2017/09/technical-pictures-and-info-on-95-petaflop-supercomputer.html
Floating Point 16 bit will be at 2-3 exaflop supercomputers in 2018
brian wang | September 20, 2017 |
A double precision exaflop/s has been the traditional definition of a general-purpose exaflop supercomputer. For domain-specific machines, and even for the American DoE Summit and Sierra supercomputers, the picture can be different. Because of the NVIDIA Volta, these two machines will have significant acceleration in reduced-precision FP16 arithmetic, using what NVIDIA calls Tensor Cores, which are in reality 4-by-4 FP16 single-cycle matrix engines. The peak performance of a Volta chip in this mode is 120 Tflop/s. So Summit and Sierra, which will deploy these chips in the tens of thousands, may reach somewhere around 130-200 Petaflop/s in double precision arithmetic, but in terms of FP16 AI flop/s they will be at 2-3 exaflop/s. The world has been fixated on double precision arithmetic as the general-purpose measure, but in reality people are building machines that are a little more domain-specific, and in that sense we will already reach exascale by next year.
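To see how the 2-3 exaflop/s figure relates to "tens of thousands" of chips, the sketch below works out how many Volta chips would be needed at the quoted 120 Tflop/s each. It assumes every chip runs at its full peak rate; the actual chip counts for Summit and Sierra are not given in the text and are not assumed here.

```python
# Quoted peak for one Volta chip, as an exact integer (120 Tflop/s).
VOLTA_PEAK_FLOPS = 120 * 10**12

def chips_needed(target_flops, per_chip_flops):
    """Smallest whole number of chips whose combined peak reaches the target."""
    return -(-target_flops // per_chip_flops)   # integer ceiling division

low = chips_needed(2 * 10**18, VOLTA_PEAK_FLOPS)    # 2 exaflop/s
high = chips_needed(3 * 10**18, VOLTA_PEAK_FLOPS)   # 3 exaflop/s

print(f"2 exaflop/s needs about {low:,} chips")
print(f"3 exaflop/s needs about {high:,} chips")
```

The result, roughly 17,000 to 25,000 chips, is consistent with the "tens of thousands" mentioned above.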
China’s second 100 Petaflop/s machine, the Tianhe-2A, which is the successor to the Tianhe-2, is going to be deployed sometime in 2017 or early next year. It nonetheless uses indigenous Chinese technology, since China has been prohibited from using the Intel Xeon Phi (Knights Landing) that was the original plan. Everybody agrees the US Intel chip ban actually drove them to their goal more quickly, and there are several other companies and centers in the running to reach exascale by 2020 at the earliest, or maybe 2021. In addition to Sunway TaihuLight and the Tianhe-2A, there is a third project still in the running; three of their prototypes are to be presented and demonstrated before moving on towards exascale.
The US Exascale Computing Project (ECP) operates under the guidance and direction of the Department of Energy. Both sides of the department are involved: the Office of Science and the National Nuclear Security Administration (NNSA). It is a very risk-averse, very responsible project that has been defined under somewhat lower budgets than had been anticipated, but at least with the assumption that it will achieve its end goal shortly after 2020.
Japan is largely on track with its Post-K. However, since the last ISC it has been announced that this machine will be delayed by one or two years because semiconductor scaling is slowing down, so the anticipated performance could not be reached with the original plan. Fujitsu and Riken had to reorganize around a new plan that targets deployment in the 2021-2022 timeframe. They are adding new features; for example, it was announced in August that the machine will use an ARM processor with the SVE vector instruction set extensions.
In Europe, the intention to build exascale machines with European technology has been announced, with the European Commission being very proactive in promoting this new direction. The details have not been disclosed yet, so we will see next year what happens with these European efforts. There are lots of research projects in Europe, but none of them are, I would say, concrete enough by themselves to build these large-scale machines in production. Still, I think Europe is finally stepping up to this game. However, compared to other countries, it does not have industrial backing to this extent.
https://www.nextbigfuture.com/2017/...be-at-2-3-exaflop-supercomputers-in-2018.html