Fungible can solve the public cloud Hotel California problem – Blocks and Files

DPU startup Fungible recently briefed Blocks & Files on its views regarding Intel's Infrastructure Processing Unit. Co-founder and CEO Pradeep Sindhu said it was inadequate as a data centre cost saver, lacked imagination, and couldn't help solve the public cloud's trillion-dollar paradox, also known as the Hotel California problem.

Intel launched its data processing unit (DPU) line recently, differentiating itself from DPU suppliers such as Fungible and Pensando by calling its product an Infrastructure Processing Unit (IPU). While Fungible would agree that processing infrastructure-centric instructions is the key need, it thinks Intel's DPU vision is painfully inadequate and provides much smaller TCO benefits.

The basic difference between the Intel and Fungible approaches, in Blocks & Files' view, is that Intel is focussed on incremental improvements using SmartNICs such as Nvidia's BlueField, whereas Fungible is looking for up to a 12x improvement.

Intel says it is the DPU market leader by virtue of its sales to hyperscaler data centre operators. Such operators have thousands of servers and switches, and tens of thousands of storage drives. They sell compute cycles, so any percentage gain in compute cycle capacity is worthwhile. A ten to 20 per cent improvement obtained by offloading network, security and storage functions to a SmartNIC means a significant increase in server utilisation, and hence revenues.

Sindhu mentioned Amazon's Annapurna approach, in which Amazon took infrastructure processes that were running on x86 and put them on Arm. The Arm CPUs were not as powerful as x86 processors but, at $5/core, were much lower cost than x86, which cost $40/core: an 8x improvement. By making that switch, AWS freed up x86 compute cycles, which it could sell to customers.
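The per-core figures quoted above can be checked with a few lines of arithmetic. This is only a minimal sketch: the $5 and $40 per-core costs come from the text, while the 16-core example and the one-for-one core swap are hypothetical simplifications (the article notes that the Arm cores are less powerful, so a real swap would not be one-for-one).

```python
# Per-core costs quoted by Sindhu (USD).
ARM_COST_PER_CORE = 5
X86_COST_PER_CORE = 40


def cost_ratio(x86_cost: float, arm_cost: float) -> float:
    """How many times cheaper an Arm core is than an x86 core."""
    return x86_cost / arm_cost


def offload_saving(cores: int) -> float:
    """Cost difference for `cores` cores of infrastructure work.

    Assumes, hypothetically, that each x86 core is replaced by one Arm core.
    """
    return cores * (X86_COST_PER_CORE - ARM_COST_PER_CORE)


ratio = cost_ratio(X86_COST_PER_CORE, ARM_COST_PER_CORE)
print(f"Arm is {ratio:.0f}x cheaper per core")
print(f"Cost difference for 16 cores: ${offload_saving(16)}")
```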

But Fungible wants more. Sindhu told us that he believes the infrastructure server processing burden is becoming a much bigger component of overall server CPU cycle utilisation. As application server populations in data centres increase, and as non-server processing resources such as GPUs increase as well, the amount of internal-to-the-data-centre infrastructure processing skyrockets. Sindhu said there is much more east-west data movement in such data centres than north-south data movement.

That means more and more network, storage and security processing tasks are the result of such data movement within and across the infrastructure. It is, he thinks, pointless for application servers to execute the myriad repetitive instructions needed for this work. Yes, it should be offloaded from the servers but not in a small-scale, incremental way with SmartNICs.

In our view, that is like putting lipstick on a pig.

Sindhu said: "The DPU concept is broader than Intel's IPU concept," but he is happy to have Intel and Nvidia "validating our vision."

He said Intel's IPU concept lacks imaginative vision because Fungible's DPUs can do so much more than an Intel IPU. One example: legacy applications are compute-centric, so user-initiated computations should be executed close to where user data is stored. SQL primitives can be executed directly on DPUs (with SSDs plugged into them) with little data movement.
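The point about running SQL primitives next to the data can be illustrated with a toy comparison. This is a hypothetical sketch, not Fungible's API or implementation: the table, the `region` column and both function names are invented for illustration, and the count of rows moved merely stands in for network traffic.

```python
# Hypothetical table held on SSDs attached to a DPU.
ROWS = [{"id": i, "region": "EU" if i % 4 == 0 else "US"} for i in range(1000)]


def is_eu(row: dict) -> bool:
    """Stand-in for a SQL predicate such as WHERE region = 'EU'."""
    return row["region"] == "EU"


def host_side_filter(rows, predicate):
    """Ship every row to the host server, then filter there."""
    moved = list(rows)  # all rows cross the network
    result = [row for row in moved if predicate(row)]
    return result, len(moved)


def near_data_filter(rows, predicate):
    """Filter next to the storage; only matching rows cross the network."""
    result = [row for row in rows if predicate(row)]
    return result, len(result)


host_result, host_moved = host_side_filter(ROWS, is_eu)
near_result, near_moved = near_data_filter(ROWS, is_eu)

print(f"Rows moved with host-side filtering: {host_moved}")
print(f"Rows moved with near-data filtering: {near_moved}")
```

Both paths return the same rows; the only difference in this sketch is how much data has to travel before the predicate is applied.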

Sindhu talked about a second example: machine learning involves parameter-serving problems in which previous GPU results are put into Comma and the results distributed. These computations are best done in the network and on the fly, he said. In other words, on the Fungible DPU.

What Intel actually has is an accelerator (crypto and others), not a DPU, plus a bunch of vanilla Atom cores. Just integrating them on the same die will not solve the problem.

DPU computations have four characteristics:

What is needed is a more or less complete offload of infrastructure-centric processing to dedicated processing chips, tailor-made with instructions and architecture specifically designed for IT infrastructure processing. Fungible's view is that its specialised CPU can process these infrastructure-centric computations far more efficiently than anything else out there.

In effect, an infrastructure-focussed data processing system, using DPUs, is deployed inside a data centre as a central hub, with application CPUs and GPUs using it to carry out the processing tasks that distract from their main purpose: running application code.

Sindhu said Nvidia's BlueField SmartNIC is a hardware implementation of the NVMe-over-Fabrics RoCE (RDMA over Converged Ethernet) protocol, with a ConnectX-5 chip (RoCE v2) and 8 and then 16 Arm cores. The issue, he said, is that for RoCE it's fine but for other computations it's just another general-purpose CPU. In short, it's inefficient.

"Intel and Nvidia are using 7nm designs, yet we beat the pants off them with our 16nm chip because we have a better architecture. We'll move to 7nm and then on to 5nm, and no-one will be able to catch us."

"Our FS1600 does close to 15 million IOPS. If you use two Xeons with the same SSDs you'll get one million IOPS if you're lucky. We can see our way to 20 million IOPS with software improvements."

Fungible claims that hyperscalers and near-hyperscaler data centres can realise a 12x improvement in TCO by using its specialised chips and software. That consists of 4x improvement from eliminating resource silos and then 3x from improving efficiency.

A Fungible slide shows these two aspects of Fungible's TCO claim. The 3x efficiency improvement claim is shown by taking a nominal $100 of existing IT network, compute and storage infrastructure spend and claiming that the equivalent Fungible infrastructure spend (the "Spend with DPU" column) would be $36. That's a 64 per cent reduction. Put another way, a $10M infrastructure spend without Fungible would be a $3.6M spend with Fungible, a saving of $6.4M.

The slide includes a middle "Spend Smart NIC" column, which is where Intel's IPU-based infrastructure would fit. It partially offloads the host server CPUs but doesn't affect the network and storage spend elements, resulting in a ten per cent TCO reduction: a $1M saving in the case of the $10M infrastructure spend example above, with Fungible saving you $5.4M more.

The rather large storage spend saving with Fungible, $40 down to $8, comes from its data reduction and erasure coding, both of which mean less raw capacity is needed to store the same amount of data.
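The slide figures quoted above can be reproduced with simple arithmetic. This is a minimal sketch that uses only the numbers in the text (the $10M example, the ten and 64 per cent reductions, and the $40 to $8 storage spend); the function and variable names are illustrative.

```python
def spend_after(baseline: float, reduction_pct: float) -> float:
    """Spend remaining after a percentage reduction."""
    return baseline * (100 - reduction_pct) / 100


BASELINE = 10_000_000    # the $10M infrastructure spend example
SMARTNIC_REDUCTION = 10  # per cent, "Spend Smart NIC" column
DPU_REDUCTION = 64       # per cent, $100 down to $36

smartnic_spend = spend_after(BASELINE, SMARTNIC_REDUCTION)
dpu_spend = spend_after(BASELINE, DPU_REDUCTION)

smartnic_saving = BASELINE - smartnic_spend
dpu_saving = BASELINE - dpu_spend
extra_saving = dpu_saving - smartnic_saving

# Ratios implied by the slide's nominal figures.
efficiency_factor = 100 / 36  # overall spend, $100 -> $36
storage_factor = 40 / 8       # storage spend, $40 -> $8

print(f"SmartNIC saving: ${smartnic_saving:,.0f}")
print(f"DPU saving: ${dpu_saving:,.0f}")
print(f"Extra saving with DPU: ${extra_saving:,.0f}")
print(f"Overall efficiency factor: {efficiency_factor:.2f}x")
print(f"Storage spend factor: {storage_factor:.0f}x")
```

On these figures the $100 to $36 reduction works out to roughly 2.8x, which the slide rounds to 3x.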

Sindhu says overall enterprise data centre IT equipment utilisation is less than eight per cent. AWS's data centre utilisation is 32 per cent. He says: "Fungible can bring 10x better efficiency to enterprise data centres and 2 to 2.5x better efficiency than hyperscalers."
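Multiplying the quoted utilisation figures by the quoted efficiency factors gives a feel for what the claim implies. This is back-of-the-envelope arithmetic on the numbers in the text, not a figure Fungible stated; eight per cent is treated as an upper bound because the text says "less than eight per cent".

```python
ENTERPRISE_UTILISATION_PCT = 8  # upper bound quoted for enterprise data centres
AWS_UTILISATION_PCT = 32        # figure quoted for AWS


def implied_utilisation(current_pct: float, factor: float) -> float:
    """Utilisation implied by applying an efficiency multiplier."""
    return current_pct * factor


enterprise_implied = implied_utilisation(ENTERPRISE_UTILISATION_PCT, 10)
aws_implied_low = implied_utilisation(AWS_UTILISATION_PCT, 2)
aws_implied_high = implied_utilisation(AWS_UTILISATION_PCT, 2.5)

print(f"Enterprise at 10x: up to {enterprise_implied:.0f} per cent")
print(f"Hyperscaler at 2x to 2.5x: {aws_implied_low:.0f} to {aws_implied_high:.0f} per cent")
```

Both claims point towards a similar ceiling of around 80 per cent utilisation, which suggests the two multipliers are consistent with each other.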

The Fungible marketing message is that its products let customers operate a data centre with higher efficiency than the hyperscalers themselves. That means tier-1 and tier-2 data centre operators are likely prospects for Fungible.

Equinix and other colocation centres are examples of potentially good targets for Fungible. But a lot of Equinix's revenue comes from the hyperscalers themselves, with systems such as direct connect brokerage. Equinix might not want to bite the hands that feed it.

Toby Owen, Fungible's VP for Product, said: "I think a service provider (SP) would be a model for us that has a huge potential. A whole bunch of customers would rent AI/ML resources from SPs."

Sindhu referred to the Andreessen Horowitz "Trillion Dollar Paradox" article by Sarah Wang and Martin Casado.

This said that cloud IT resources were cheap when companies started using them, but quite rapidly became high-cost after a few years. The chart above shows worldwide spending on on-premises data centres and the public cloud from 2010 to 2020, with the cloud growing past the on-premises spend.

Cloud spend puts so much pressure on a company's margins that its share price falls and its market capitalisation suffers. The article explains: "It's becoming evident that while cloud clearly delivers on its promise early on in a company's journey, the pressure it puts on margins can start to outweigh the benefits, as a company scales and growth slows. Because this shift happens later in a company's life, it is difficult to reverse as it's a result of years of development focused on new features, and not infrastructure optimisation."

Sindhu says that large public cloud customers end up in Hotel California: "You can check out any time you like, but you can never leave!"*

Wang and Casado's conclusion is: "We show (using relatively conservative assumptions) that across 50 of the top public software companies currently utilising cloud infrastructure, an estimated $100B of market value is being lost among them due to cloud impact on margins relative to running the infrastructure themselves."

Sindhu's message is that a large business can save extensive amounts of money by repatriating public cloud IT infrastructure spend to its on-premises data centres, and equipping them with Fungible's DPU hardware and software. That way they get better than public cloud data centre efficiency and boost their market capitalisation even more.

* Hotel California lyrics © Cass County Music, Red Cloud Music, Fingers Music. Songwriters: Don Henley, Glenn Frey, Don Felder. Source: Musixmatch
