Introducing the Cloud-Native Supercomputing Architecture

Historically, supercomputers were designed to run a single application and were confined to a small set of well-controlled users. With AI and HPC becoming primary compute environments for wide commercial use, supercomputers now need to serve a broad population of users and to host a more diverse software ecosystem, delivering non-stop services dynamically. New supercomputers must be architected to deliver bare-metal performance in a multi-tenancy environment.

The design of a supercomputer focuses on its most important mission: maximum performance with the lowest overhead. The goal of the cloud-native supercomputer architecture is to maintain these performance characteristics while meeting cloud services requirements: least-privilege security policies and isolation, data protection, and instant, on-demand AI and HPC services.

The data processing unit, or DPU, is an infrastructure platform that is architected to deliver infrastructure services for supercomputing applications while maintaining their native performance. The DPU handles all provisioning and management of hardware and the virtualization of services: computing, networking, storage, and security. It improves the overall performance of multi-user supercomputers by optimizing the placement of applications and by optimizing network traffic and storage performance, while assuring quality of service.

DPUs also support protected data computing, making it possible to use supercomputing services to process highly confidential data. The DPU architecture securely transfers data between client storage and the cloud supercomputer, executing data encryption on behalf of the user.
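To make that flow concrete, here is a minimal sketch of client-side symmetric encryption in Python, assuming the third-party cryptography package. It illustrates the concept only: on BlueField the equivalent work is done by dedicated crypto engines on the data path, not in application software.

    # Conceptual illustration only: encrypting data before it is staged
    # to shared storage, then decrypting it on behalf of the user.
    # On BlueField this work is done by hardware crypto engines.
    from cryptography.fernet import Fernet  # pip install cryptography

    # Hypothetical key handling: in a real tenant workflow the key would
    # come from the tenant's own key-management service, never from the
    # shared infrastructure.
    key = Fernet.generate_key()
    cipher = Fernet(key)

    plaintext = b"confidential simulation input"
    ciphertext = cipher.encrypt(plaintext)   # what shared storage sees
    restored = cipher.decrypt(ciphertext)    # decrypted for the user
    assert restored == plaintext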

The NVIDIA BlueField DPU consists of the industry-leading NVIDIA ConnectX network adapter, combined with an array of Arm cores; purpose-built, high-performance-computing hardware acceleration engines with full data-center-infrastructure-on-a-chip programmability; and a PCIe subsystem. The combination of the acceleration engines and the programmable cores makes it possible to migrate complex infrastructure management, user isolation, and protection from the host to the DPU, simplifying them and eliminating their associated overheads, as well as to accelerate high-performance communication and storage frameworks.

By migrating infrastructure management, user isolation and security, and the communication and storage frameworks from the untrusted host to the trusted infrastructure control plane that the DPU is part of, truly cloud-native supercomputing becomes possible for the first time. The CPUs and GPUs can devote more of their compute cycles to applications and operate more synchronously, yielding higher overall performance and scalability.

The BlueField DPU enables a zero-trust supercomputing domain at the edge of every node, providing bare-metal performance with full isolation and protection in a multi-tenancy supercomputing infrastructure.

The BlueField DPU can host untrusted multi-node tenants and ensure that supercomputing resources used by one tenant will be handed over clean to a new tenant. As part of this process, the BlueField DPU protects the integrity of the nodes, reprovisions resources as needed, clears states left behind, provides a clean boot image for a newly scheduled tenant, and more.
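In pseudocode, that handover sequence might look like the sketch below. Every function here is hypothetical and stands in for a step that BlueField performs in firmware and through the infrastructure control plane, out of band from the host:

    # Hypothetical tenant-handover sequence. None of these functions are
    # real NVIDIA APIs; each stands in for an out-of-band step the DPU
    # and control plane perform between tenants.

    def attest_node(node):
        print(f"verifying firmware and boot integrity of {node}")

    def clear_tenant_state(node):
        print(f"scrubbing memory, storage, and network state on {node}")

    def provision(node, tenant):
        print(f"reprovisioning {node} for {tenant}")

    def boot_clean_image(node, tenant):
        print(f"booting {node} from a clean image for {tenant}")

    def hand_over(node, new_tenant):
        attest_node(node)            # protect the integrity of the node
        clear_tenant_state(node)     # clear states left behind
        provision(node, new_tenant)  # reprovision resources as needed
        boot_clean_image(node, new_tenant)

    hand_over("node-042", "tenant-b")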

HPC and AI communication frameworks such as Unified Communication X (UCX), Unified Collective Communications (UCC), the Message Passing Interface (MPI), and Symmetric Hierarchical Memory (SHMEM) provide programming models for exchanging data between cooperating parallel processes. These libraries include point-to-point and collective communication semantics (with or without data) for synchronization, data collection, or reduction purposes. They are latency and bandwidth sensitive and play a critical role in determining application performance. Offloading the communication libraries from the host to the DPU lets communication and computation progress in parallel (that is, overlap) and reduces the negative effect of system noise.
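For example, with MPI's non-blocking collectives a process can start an all-to-all exchange, compute on independent data while it progresses, and complete the exchange afterward. The minimal sketch below uses the mpi4py Python bindings (an assumption for illustration; production HPC codes typically call MPI from C or Fortran):

    # Overlap pattern with a non-blocking MPI all-to-all.
    # Run with, for example: mpirun -np 4 python overlap.py
    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    size = comm.Get_size()
    n = 1 << 16                              # elements sent to each rank

    sendbuf = np.full(size * n, comm.Get_rank(), dtype='d')
    recvbuf = np.empty(size * n, dtype='d')

    req = comm.Ialltoall(sendbuf, recvbuf)   # start the exchange

    local = np.sin(np.ones(n)) * 2.0         # compute on independent data
                                             # while the exchange progresses

    req.Wait()                               # complete the exchange

With a host-based MPI, the exchange may only advance inside MPI calls; offloading progress to the DPU is what makes the overlap real in practice.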

BlueField DPUs include dedicated hardware acceleration engines (for example, NVIDIA In-Network Computing engines) that accelerate parts of the communication frameworks, such as data-reduction-based collective communications and tag matching. The remaining parts of the communication frameworks can be offloaded to the DPU Arm cores, enabling asynchronous progress of the communication semantics. One example is leveraging BlueField for MPI non-blocking All-to-All collective communication. The MVAPICH team at Ohio State University (OSU) and the X-ScaleSolutions team have migrated this MPI collective operation onto the DPU Arm cores with the OSU MVAPICH MPI and have demonstrated 100 percent overlap of communication and computation, 99 percent higher than using the host CPU for this operation.
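Overlap can be quantified in the spirit of the OSU micro-benchmarks: time the communication alone, the computation alone, and the two together, then see how much of the shorter phase was hidden. The sketch below (mpi4py again) uses a common simplification of that metric, not the exact OSU definition:

    # Rough estimate of communication/computation overlap for Ialltoall.
    import time
    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    size = comm.Get_size()
    n = 1 << 16
    sendbuf = np.full(size * n, comm.Get_rank(), dtype='d')
    recvbuf = np.empty(size * n, dtype='d')

    def compute():                           # stand-in for application work
        x = np.ones(1 << 20)
        for _ in range(10):
            x = np.sin(x)

    t0 = time.perf_counter()
    comm.Ialltoall(sendbuf, recvbuf).Wait()
    t_comm = time.perf_counter() - t0        # communication alone

    t0 = time.perf_counter()
    compute()
    t_comp = time.perf_counter() - t0        # computation alone

    t0 = time.perf_counter()
    req = comm.Ialltoall(sendbuf, recvbuf)
    compute()
    req.Wait()
    t_both = time.perf_counter() - t0        # the two together

    # 1.0 means fully overlapped, 0.0 means fully serialized.
    overlap = max(0.0, (t_comm + t_comp - t_both) / min(t_comm, t_comp))
    if comm.Get_rank() == 0:
        print(f"estimated overlap: {overlap:.0%}")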

Parallel Three-Dimensional Fast Fourier Transforms (P3DFFT) is a library used for large-scale computer simulations in a wide range of fields, including studies of turbulence, climatology, astrophysics, and material science. P3DFFT is written in Fortran 90 and is optimized for parallel performance. It uses MPI for interprocessor communication and depends heavily on the performance of MPI All-to-All. Leveraging the OSU MVAPICH MPI over BlueField, the OSU and X-ScaleSolutions teams have demonstrated a 1.4X performance acceleration for P3DFFT.[1]
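The all-to-all dependence comes from the transpose steps of a distributed FFT: each rank transforms the axes it holds locally, then redistributes the data so the remaining axis becomes local. The toy 2-D analogue below (numpy plus mpi4py; the real P3DFFT is a 3-D Fortran library) shows why that transpose is exactly an MPI all-to-all:

    # 2-D distributed FFT via slab decomposition and an all-to-all
    # transpose -- a simplified analogue of what P3DFFT does in 3-D.
    # Run with a process count p that divides N, e.g. mpirun -np 4 ...
    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    p, rank = comm.Get_size(), comm.Get_rank()
    N = 64                                   # global N x N grid
    rows = N // p

    slab = np.random.rand(rows, N)           # this rank's rows of the grid
    a = np.fft.fft(slab, axis=1)             # FFT along the local axis

    # Transpose across ranks: split the columns into p blocks and send
    # block c to rank c -- this is the all-to-all the text refers to.
    blocks = np.ascontiguousarray(a.reshape(rows, p, rows).swapaxes(0, 1))
    recv = np.empty_like(blocks)
    comm.Alltoall(blocks, recv)

    cols = recv.reshape(N, rows)             # this rank's columns, assembled
    result = np.fft.fft(cols, axis=0)        # FFT along the remaining axis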

[1] The performance tests were conducted by Ohio State University on the HPC-AI Advisory Council's Cluster Center, with the following system configuration: 32 servers with dual-socket, 16-core Intel Xeon E5-2697A V4 CPUs @ 2.60GHz (32 cores per node), 256GB of DDR4-2400 RDIMM memory, and a 1TB 7.2K RPM SATA 2.5-inch hard drive per node. The servers were connected with NVIDIA BlueField-2 InfiniBand HDR100 DPUs and an NVIDIA Quantum QM8700 40-port HDR 200Gb/s InfiniBand switch.

Extracting the highest possible performance from supercomputing systems while achieving efficient utilization has traditionally been incompatible with the secured, multi-tenant architecture of modern cloud computing. A cloud-native supercomputing platform provides the best of both worlds for the first time, combining peak performance and cluster efficiency with a modern zero-trust model for security isolation and multi-tenancy.

Learn more about the NVIDIA Cloud-Native Supercomputing Platform.

© 2021 NVIDIA Corporation. All rights reserved. NVIDIA, the NVIDIA logo, BlueField, ConnectX, DOCA, and Magnum IO are trademarks and/or registered trademarks of NVIDIA Corporation in the U.S. and other countries. Other company and product names may be trademarks of the respective companies with which they are associated. All other trademarks are property of their respective owners.

ARM, AMBA and ARM Powered are registered trademarks of ARM Limited. Cortex, MPCore and Mali are trademarks of ARM Limited. ARM is used to represent ARM Holdings plc; its operating company ARM Limited; and the regional subsidiaries ARM Inc.; ARM KK; ARM Korea Limited.; ARM Taiwan Limited; ARM France SAS; ARM Consulting (Shanghai) Co. Ltd.; ARM Germany GmbH; ARM Embedded Technologies Pvt. Ltd.; ARM Norway, AS and ARM Sweden AB.
