April, 2023
One of the more secular trends in AI is the rapid datacenter architecture upgrade hyperscalers are undertaking to ensure exponentially larger amounts of compute are available at the same, if not lower, cost per unit. This is happening even as both Moore's law and Dennard scaling slow down.
Moore's law says that the number of transistors on an integrated circuit doubles roughly every 24 months. Combined with Dennard scaling, this meant that performance per joule doubled about every 18 months, an observation known as Koomey's Law. So far so good.
However, recent studies indicate that Koomey's Law has slowed to doubling only every 2.6 years. In other words, efficiency gains no longer keep pace with demand: delivering exponentially more compute now means consuming substantially more power, creating a cost bottleneck for everything related to AI (e.g. training a model).
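A rough back-of-the-envelope sketch makes the gap concrete. The numbers below are illustrative assumptions (compute demand doubling yearly, the two efficiency doubling periods from the paragraph above), not measured data:

```python
# Illustrative sketch: relative power draw when compute demand grows
# exponentially but efficiency (ops/joule) improves at a slower doubling rate.
# All inputs are assumptions for illustration, not measured figures.

def power_multiplier(years, compute_doubling_yrs, efficiency_doubling_yrs):
    """Relative power draw after `years`, starting from 1.0.

    Compute demand doubles every `compute_doubling_yrs`;
    efficiency (ops/joule) doubles every `efficiency_doubling_yrs`.
    Power = compute / efficiency.
    """
    compute = 2 ** (years / compute_doubling_yrs)
    efficiency = 2 ** (years / efficiency_doubling_yrs)
    return compute / efficiency

# Assume compute demand doubles every year (hypothetical).
historical = power_multiplier(10, 1.0, 1.57)  # classic Koomey pace (~18 months)
slowed = power_multiplier(10, 1.0, 2.6)       # the slowed pace cited above

print(f"power after 10 yrs at 1.57-yr efficiency doubling: {historical:.1f}x")
print(f"power after 10 yrs at 2.6-yr efficiency doubling:  {slowed:.1f}x")
```

Under these assumptions the slowed efficiency curve leaves you paying for several times more power for the same decade of compute growth, which is the cost bottleneck in question.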
The increase in cost also makes intuitive sense given the extra components now required on a motherboard, such as GPUs and accelerators like field-programmable gate arrays (FPGAs) and application-specific integrated circuits (ASICs), all of which invariably consume more energy.
Moreover, these components need to work optimally in tandem, but usually don't. A simple example: a traditional server ships with 1 unit each of CPU, GPU, and memory, but a particular workload needs 1 unit of CPU and 2 units each of GPU and memory, so it requires a minimum of 2 servers.
That leaves 1 unit of CPU powered on but unused. Such suboptimal usage is the main reason why Koomey's law is showing signs of slowing. One answer to this compute-versus-cost conundrum is a form of datacenter architecture called "composable infrastructure".
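The stranding arithmetic above can be sketched in a few lines. The server shape and workload demands are the hypothetical round numbers from the example, not real hardware specs:

```python
import math

# Toy model of resource stranding on fixed-ratio servers.
# Shapes are the illustrative 1:1:1 units from the example above.
SERVER = {"cpu": 1, "gpu": 1, "mem": 1}    # fixed ratio per server
WORKLOAD = {"cpu": 1, "gpu": 2, "mem": 2}  # what the job actually needs

# Servers needed is driven by the most-demanded resource type.
servers = max(math.ceil(WORKLOAD[r] / SERVER[r]) for r in SERVER)

# Anything provisioned beyond the workload's needs is powered on but idle.
stranded = {r: servers * SERVER[r] - WORKLOAD[r] for r in SERVER}

print(f"servers needed: {servers}")
print(f"stranded resources: {stranded}")
```

Running this shows 2 servers with 1 CPU unit stranded: the workload's GPU and memory demand forces you to buy (and power) CPU capacity you never use.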
Composable infrastructure is a system architecture that disaggregates some of a traditional server's hardware into pools of resources, delivering a system that precisely matches the requirements of the workload it is intended to run.
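In the pooled model, the same workload carves out exactly what it needs and strands nothing. A minimal sketch, assuming a hypothetical shared pool of 8 units per resource type:

```python
# Sketch of composing a node from a disaggregated resource pool.
# Pool sizes and the `compose` helper are hypothetical, for illustration only.
POOL = {"cpu": 8, "gpu": 8, "mem": 8}
WORKLOAD = {"cpu": 1, "gpu": 2, "mem": 2}

def compose(pool, demand):
    """Carve exactly the demanded resources out of the pool.

    Returns the remaining pool; raises if any resource type is short.
    """
    if any(pool[r] < demand[r] for r in demand):
        raise ValueError("pool exhausted")
    return {r: pool[r] - demand[r] for r in demand}

remaining = compose(POOL, WORKLOAD)
# The composed node matches the workload exactly, so nothing is stranded,
# and the leftover pool stays available for other workloads.
print(f"remaining pool: {remaining}")
```

The design point is that allocation granularity follows the workload rather than the server chassis, which is what lets utilization (and thus cost per unit of compute) improve.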
The story doesn't end there, as there is one more issue: data latency. As you disaggregate and separate server components, you need to ensure data moves between them quickly and reliably. The data interconnect will need to be spectacular.
As a result, I/O technologies such as PCIe and CXL (USB's counterparts in the datacenter), as well as the retimer chips made by BVP's portfolio company @kandoubus, which are needed to accurately reproduce source signals at the destination, are absolutely key to lowering the cost of AI.