Typical simulation work completed on Sunway TaihuLight
- Peta-Scale Fully-Implicit Solver for Nonhydrostatic Atmospheric Dynamics with 8.5M Cores / 千万亿次八百五十万核可扩展非静力大气动力全隐求解器
This research won the 2016 Gordon Bell Prize in High Performance Computing.
An ultra-scalable fully implicit solver is developed for stiff time-dependent problems frequently found in atmospheric dynamics. In the solver, a hybrid multigrid domain decomposition preconditioner is proposed to greatly accelerate convergence and to exploit coarse-grained parallelism. A physics-based multi-block asynchronized incomplete LU factorization method is customized to solve the subproblems on each overlapped subdomain, gaining further fine-grained concurrency. We perform systematic optimizations at different hardware levels to make the best use of the heterogeneous computing units and to substantially reduce the cost of data movement. The solver enables fast and accurate atmospheric simulations on the emerging heterogeneous Sunway supercomputer in China, scaling to over 8.5 million heterogeneous cores.
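This project designed and developed a highly scalable fully implicit solver for the stiff time-dependent problems that frequently arise in atmospheric dynamics. The solver uses a hybrid multigrid domain decomposition algorithm, which significantly speeds up convergence and exploits coarse-grained parallelism. In addition, a physics-based multi-block asynchronous incomplete LU factorization method was designed to solve the subproblems on each overlapping subdomain, gaining further fine-grained concurrency. System-level optimizations were also carried out at different hardware levels to make full use of the heterogeneous computing units and reduce the cost of data movement. On the Sunway TaihuLight supercomputer, the solver delivers fast and accurate atmospheric simulation and scales to 8.5 million cores.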
- Large Scale Phase Field Simulation for Coarsening Dynamics Based on Cahn-Hilliard Equation with Degenerated Mobility / 钛合金微结构演化相场模拟
We present large-scale phase field simulations on the new Sunway TaihuLight supercomputer. The highly nonlinear and severely stiff Cahn-Hilliard equations with degenerated mobility for microstructure evolution are solved at extreme scale, demonstrating that the latest high performance computing platforms and new advances in algorithm design now make it possible to accurately simulate coarsening dynamics at unprecedented spatial and time scales.
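Titanium alloy fabrication is a complex process, and the mechanisms and laws governing microstructure formation are hard to obtain by experiment, so software simulation is often used instead. The phase field method can simulate the evolution of microstructure and is widely used in the design of new materials. ScETD-PF is phase field simulation software built on a scalable compact exponential time differencing algorithm library. It was developed in-house by the Computer Network Information Center of the Chinese Academy of Sciences and supports research simulation in computational materials science, computational physics, computational life science and other disciplines. This application achieved, for the first time, the world's largest phase field simulation of microstructure coarsening in titanium alloys, significantly speeding up the design and process optimization of China's new titanium alloys.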
This research was selected as a finalist for the 2016 Gordon Bell Prize.
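For readers who haven't met the equation, here is a commonly used textbook form of the Cahn-Hilliard equation with degenerate mobility. The symbols (u for the phase variable, μ for the chemical potential, ε for the interface width) and the particular choices of M and F are my assumptions for illustration; the summary above does not state the exact form the team solved.

```latex
\begin{aligned}
\frac{\partial u}{\partial t} &= \nabla \cdot \bigl( M(u)\, \nabla \mu \bigr), \\
\mu &= F'(u) - \varepsilon^{2} \Delta u, \\
M(u) &= 1 - u^{2}, \qquad F(u) = \tfrac{1}{4}\,\bigl(u^{2} - 1\bigr)^{2}.
\end{aligned}
```

The mobility M(u) vanishes in the pure phases u = ±1, which is what "degenerate" refers to. Substituting μ gives a fourth-order spatial derivative of u, so explicit schemes need time steps that shrink roughly like the fourth power of the grid spacing; that is one way to see why the problem is described as severely stiff.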
- A Highly Effective Global Surface Wave Numerical Simulation with Ultra-high Resolution / 高分辨率海浪数值模拟
This research was selected as a finalist for the 2016 Gordon Bell Prize.
Surface waves are among the most energetic motions in the global ocean and are crucially important to marine safety and climate change. A high-resolution global wave model plays a key role in accurate wave forecasting. However, parallel efficiency under such a large computational load has so far been a major barrier for this kind of model. In this work, we achieve a breakthrough in the design and application of irregular quasi-rectangular grid decomposition, a master-slave cooperative computing workflow, and pipelining schemes for a high-resolution global wave model. Based on these innovations, a global wave model with an ultra-high horizontal resolution of (1/60)° by (1/60)° is implemented on the new Sunway heterogeneous supercomputer, which has a peak performance of 100 PFlops. The results show that the peak performance of our model reaches 30.07 PFlops on the full-scale system of 8,519,680 cores. These innovations provide good scalability and high efficiency for ultra-high-resolution global wave models.
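For ocean model simulation, higher resolution brings a sharp increase in computation. If the horizontal resolution is increased tenfold, the computational cost of the model grows by a factor of several hundred or even more than a thousand, which makes it a driving application for future exascale systems. This application implemented a high-resolution (1/60)° global ocean model on the Sunway TaihuLight supercomputer. Through slave-core acceleration together with load balancing, communication overlap, instruction pipelining and other optimizations, the model was successfully scaled to 8,519,680 cores and reached a peak performance of 30.07 PFlops, with excellent scalability and parallel efficiency.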
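A rough way to see where "hundreds to a thousand times" comes from when the horizontal resolution goes up tenfold: the number of grid cells grows with the square of the refinement, and if the time step has to shrink in proportion to the grid spacing, you pick up one more factor. The sketch below is only back-of-the-envelope arithmetic. The function names are hypothetical, the time-step scaling is an assumption (a CFL-type limit), and the grid count assumes an idealized regular latitude-longitude grid that ignores land.

```python
def cost_factor(refinement, horizontal_dims=2, scale_time_step=True):
    """Estimate compute growth when horizontal grid spacing shrinks by `refinement`.

    Assumption: cost is proportional to (number of cells) x (number of time steps).
    """
    factor = refinement ** horizontal_dims  # more cells in each horizontal direction
    if scale_time_step:
        factor *= refinement                # assumed: time step shrinks with grid spacing
    return factor


def grid_points(cells_per_degree):
    """Cells on an idealized global lat-lon grid (hypothetical; land not excluded)."""
    return (360 * cells_per_degree) * (180 * cells_per_degree)


print(cost_factor(10, scale_time_step=False))  # 100
print(cost_factor(10))                         # 1000
print(grid_points(60))                         # 233280000 cells at (1/60) degree
```

So the low end (about 100x) is the cell count alone, and the high end (about 1000x) adds the shorter time step.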
- Numerical Simulation of the Aerospace-craft Unification Algorithm / 航天飞行器统一算法数值模拟
Tiangong-1 is China’s first space station, which serves as an experimental testbed for orbital rendezvous and docking. Tiangong-1 is also the prototype of China’s future space lab. In this work, we simulate the turbulent flow state around the two-cabin simplified model (10 meters long, with a cross-section diameter of around 3.5 meters) of the Tiangong-1 spacecraft during its falling process (altitude = 65 km, Ma = 13). Using 16,384 processors of the Sunway system, the computation job, which normally takes 12 months, was finished in 20 days. In addition, the simulation results fit the wind tunnel test results well.
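On the Sunway TaihuLight supercomputer, a large-scale parallel simulation was carried out of the flow around the two-cabin simplified configuration of the Tiangong-1 vehicle (more than 10 meters long, with a cross-section diameter of nearly 3.5 meters) during its falling flight (H = 65 km and 62 km, Ma = 13). Using 16,384 processors, a computation that normally takes 12 months was completed within 20 days. The results agree well with wind tunnel experiments and provide important data support for the Tiangong-1 flight test.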
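To put the "12 months down to 20 days" figure in perspective, here is the arithmetic as a tiny sketch. The 30-day month is my assumption, and the function name is hypothetical; the summary does not say what machine the 12-month baseline refers to.

```python
def speedup(baseline_days, accelerated_days):
    """Ratio of the baseline wall-clock time to the accelerated wall-clock time."""
    return baseline_days / accelerated_days


DAYS_PER_MONTH = 30                   # assumption: a nominal 30-day month
baseline_days = 12 * DAYS_PER_MONTH   # "normally takes 12 months"
sunway_days = 20                      # "finished in 20 days"

print(speedup(baseline_days, sunway_days))  # 18.0
```

That is roughly an 18x reduction in turnaround time for this job.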
This is a particularly interesting piece of work. China conducted the Tiangong-1 mission in 2011, and such simulations are usually carried out two to three years before the real mission. So does it mean Sunway TaihuLight, or a prototype of Sunway TaihuLight, already existed in 2008/09?
- Refactoring and Optimizing the CAM on the New Sunway Many-core Supercomputer / 基于国产平台的国产地球系统模式
Our work refactors and optimizes the Community Atmosphere Model (CAM) on the new Sunway supercomputer, which uses a many-core processor that consists of management processing elements (MPEs) and clusters of computing processing elements (CPEs). To map the large code base of CAM to millions of cores on the Sunway system, we take OpenACC-based refactoring as the major tool, and apply source-to-source translator tools to generate the most suitable parallelism for the CPE cluster and to fit the intermediate variables into the limited on-chip fast buffer. For single kernels, when comparing the original ported version using only MPEs with the refactored version using both the MPE and CPE clusters, we achieve up to 22x speedup for the compute-intensive kernels. For the 25 km resolution CAM global model, we manage to scale to 24,000 MPEs and 1,536,000 CPEs, and achieve a simulation speed of 2.81 model years per day.
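This work refactors and optimizes the code of the Community Atmosphere Model (CAM) on the Sunway supercomputer. To scale CAM, with its very large code base, to the millions of compute cores of the Sunway system, the research team relied on the OpenACC framework provided by the Sunway system to refactor the original code and designed compute code that matches the system's computing and memory model, effectively improving performance. Compared with the master-core-only version, the optimized program using both master and slave cores achieves a 22x performance improvement. Using 24,000 master cores and 1,536,000 slave cores, the global simulation at 25 km resolution reaches 2.81 model years per day.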
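A quick sanity check on the CAM numbers, using only the figures quoted above. The function and key names are hypothetical; this is just arithmetic, not anything from the team's code.

```python
def cam_run_summary(mpes, cpes, model_years_per_day):
    """Derive a few ratios from the quoted core counts and simulation speed."""
    return {
        "cpes_per_mpe": cpes // mpes,    # CPEs attached to each MPE
        "total_cores": mpes + cpes,      # MPEs and CPEs together
        "hours_per_model_year": round(24 / model_years_per_day, 2),
    }


summary = cam_run_summary(mpes=24_000, cpes=1_536_000, model_years_per_day=2.81)
print(summary["cpes_per_mpe"])          # 64
print(summary["total_cores"])           # 1560000
print(summary["hours_per_model_year"])  # 8.54
```

The ratio works out to 64 CPEs per MPE, about 1.56 million cores in total, and roughly eight and a half wall-clock hours per simulated year.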
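The floating platform for island and reef construction is in the 100-meter class in overall length and can berth 10,000-ton-class ships. Its functions include unloading earth, stone and all kinds of building materials from ships, loading and transport by heavy-duty trucks on the platform, delivery to the reef flat over a trestle bridge, stacking and hoisting of machinery and cargo, power and fuel supply, accommodation for construction personnel, freshwater production, and sewage treatment. It can be towed to different islands and reefs awaiting construction and reused. The platform meets the engineering needs of near-term construction in the Xisha Islands and future construction on relevant islands and reefs in the Zhongsha and Nansha Islands: delivering materials onto the islands, efficient unloading of ship-borne earth and stone, and efficient construction of permanent bases on the reef flats. With support from the Ministry of Science and Technology 973 Program project "Response of Very Large Floating Structures in Complex Ocean Environments and Structural Safety" and the Ministry of Industry and Information Technology high-tech ship research project "Key Technologies for Medium-Sized (300-Meter-Class Overall Length) Island and Reef Floating Structures", the 702 Research Institute of China Shipbuilding Industry Corporation further developed a three-dimensional hydroelastic analysis method that can account for the influence of complex seabed topography.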
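Using THAFTS, a mature software package with visualization that was developed from the three-dimensional hydroelasticity theory founded by Academician Wu Yousheng and can account for forward speed, second-order nonlinearity in the frequency domain, seabed topography and other factors, the users carried out, for the first time, large-scale parallel computation on several million processor cores of the three-dimensional hydroelastic problems of near-reef floating platforms and very large floating structures at sea. The results accurately reveal the motion and load response characteristics of the floating platform under the influence of seabed variation near reefs and wave inhomogeneity. The data are reliable and cross-validated against experimental results, and they give a fairly accurate assessment of the platform's structural stress levels under complex near-reef environmental conditions, which has significant theoretical value and practical engineering significance.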