Gartner: Top trends in energy-efficient generative AI compute systems

The rapid expansion of compute infrastructure to support generative AI (GenAI) training has created a significant electrical power availability challenge. Gartner predicts that by 2027, more than 40% of current data centers deploying AI workloads will be constrained by electrical power. This shortfall creates business challenges for data center operations, including higher costs and weaker sustainability performance, all of which will eventually affect data center operators’ customers and end users. However, a Gartner survey indicates that semiconductor innovation will contribute to solving 35% of the GenAI power issue.

Product leaders and companies reliant on GenAI must account for the following trends to deploy energy-efficient GenAI solutions.

Advancing AI Compute Chip Roadmap Despite Challenges

Semiconductor technology has driven innovation in consumer electronics and high-performance computing for decades. Today, the stakes are higher due to the massive increase in compute demand from GenAI. Hardware remains the focal point, and market and investor scrutiny of semiconductor compute’s contribution to this problem is at an all-time high.

The cost of semiconductor production at the leading edge has stifled innovation across the industry and limited the number of companies that can afford to access and experiment with it. Gartner predicts that by 2035, a lack of material business opportunity, along with pricing pressure, will reduce the number of viable chip manufacturers with the capability and scale to deliver leading-edge AI chips.

Product leaders must consider three potential scenarios regarding the AI compute chip technology roadmap and adjust their product/service portfolio and planning accordingly.

  • Market Consolidation: The already concentrated market of chipmakers involved in leading-edge logic manufacturing may witness further consolidation due to a lack of material business opportunities and pricing pressure.
  • Emergence of New Players: New players may emerge driven by nations pursuing semiconductor self-reliance or existing players with mature nodes pushing forward, opening new avenues despite initial hiccups.
  • Alternative Technologies: Alternative materials or technologies might enable scaling below 7 nm at lower costs with the current manufacturing setup. However, this would require close monitoring of academic and other research and early partnerships to gain access to new technology.

To address these challenges, companies should reduce costs and shorten time to market by leveraging AI-based chip design combined with design technology co-optimization (DTCO) and system technology co-optimization (STCO). They should also maximize the energy efficiency of AI compute chips by embracing technologies on the horizon, such as wafer-scale processing, and those still in R&D, such as analog computing. Finally, optimizing compute performance through techniques like energy-proportional computing and low-arithmetic-precision computing (sketched below) will also be important.
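To make low-arithmetic-precision computing concrete, here is a minimal Python/NumPy sketch of symmetric INT8 quantization applied to a matrix multiply. The shapes, data, and error figure are illustrative assumptions, not a description of any vendor’s hardware; the energy-relevant point is that INT8 weights store and move a quarter of the bytes of FP32.

```python
# Minimal sketch of low-arithmetic-precision computing: symmetric INT8
# quantization of a matrix multiply. Illustrative assumptions only; real
# AI chips implement this in dedicated hardware.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((64, 256)).astype(np.float32)   # activations
w = rng.standard_normal((256, 128)).astype(np.float32)  # weights

def quantize_int8(a):
    """Symmetric per-tensor quantization to INT8."""
    scale = np.abs(a).max() / 127.0
    return np.round(a / scale).astype(np.int8), scale

xq, sx = quantize_int8(x)
wq, sw = quantize_int8(w)

# INT8 multiply with INT32 accumulation, then dequantize the result.
y_q = xq.astype(np.int32) @ wq.astype(np.int32)
y_deq = y_q.astype(np.float32) * (sx * sw)

y_fp32 = x @ w
rel_err = np.linalg.norm(y_deq - y_fp32) / np.linalg.norm(y_fp32)
print(f"relative error after quantization: {rel_err:.4f}")  # ~1e-2 here
print(f"weight bytes: {wq.nbytes} (INT8) vs {w.nbytes} (FP32), 4x less")
```

Because data movement, not arithmetic, dominates energy cost in these systems, quartering the bytes per weight pulls the same lever that near-memory and wafer-scale designs pull architecturally.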

Solving the Memory Bottleneck Is the Key Differentiator

Gartner predicts that by 2030, the failure to commercialize a suitable memory architecture or technology will be the biggest constraint on GenAI compute systems.

Currently, memory latency and bandwidth limit AI system performance, a barrier exacerbated by the weakening of Moore’s Law and by worsening signal integrity from faster compute logic chips. The issue is particularly severe in the highly parallel systems used for AI, real-time, or analytics applications, where on-chip and off-chip data movement and memory access carry high energy costs. With emerging GenAI models requiring vast amounts of data for training, memory cost and performance have become bottlenecks, necessitating new architecture designs for exponential improvement. Techniques such as processing near memory with high-bandwidth memory (HBM) and graphics double data rate (GDDR) memory have been adopted to tackle these issues, but HBM currently faces cost and supply constraints.
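A back-of-envelope roofline calculation shows why bandwidth, not compute, is the binding constraint during token-at-a-time inference. The peak-throughput and bandwidth figures below are hypothetical placeholders, as is the 70B-parameter model; only the orders of magnitude matter.

```python
# Roofline-style sketch: why GenAI token generation is memory-bound.
# All hardware figures are hypothetical placeholders, not real specs.
PEAK_FLOPS = 1000e12   # assumed 1,000 TFLOP/s of low-precision compute
PEAK_BW = 3e12         # assumed 3 TB/s of HBM bandwidth

# Decoding one token through an assumed 70B-parameter model at
# 2 bytes per weight: every weight is read once, doing one
# multiply-accumulate (2 FLOPs) per weight.
params = 70e9
bytes_moved = params * 2
flops = 2 * params

arithmetic_intensity = flops / bytes_moved      # 1 FLOP per byte
machine_balance = PEAK_FLOPS / PEAK_BW          # ~333 FLOPs per byte

print(f"intensity: {arithmetic_intensity:.1f} FLOP/B, "
      f"balance: {machine_balance:.0f} FLOP/B")

# Intensity sits far below machine balance, so time per token is set by
# memory traffic, not by the arithmetic units.
t_memory_limited = bytes_moved / PEAK_BW
print(f"bandwidth-limited time per token: {t_memory_limited * 1e3:.1f} ms")
```

At roughly 1 FLOP per byte against a machine balance in the hundreds, adding arithmetic units buys little; raising effective bandwidth, through HBM, PIM, or CXL-attached memory pools, is what helps.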

Product leaders and enterprises must take several actions to address these challenges effectively. First, they should minimize end-user risk by collaborating with software developers to accelerate the adoption of processing in memory (PIM), in which processing functionality and memory are combined in the same logical design, allowing logical operations to be applied directly to data stored within the memory array. Additionally, designing Compute Express Link (CXL) architectures can free up memory resources from less-performant workloads, enabling workload consolidation and operational efficiencies.
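The toy model below sketches the PIM idea under stated assumptions: the PimBank class and its methods are invented for illustration and correspond to no real PIM device or CXL API. The point is the traffic difference, since compute placed beside the memory array returns a scalar instead of shipping the whole array to the host.

```python
# Toy model of processing in memory (PIM). PimBank is an invented
# illustration, not a real device API: it stands in for a memory bank
# with simple logic placed next to the stored array.
import numpy as np

class PimBank:
    def __init__(self, data):
        self._data = data          # lives "inside" the memory device

    def read_all(self):
        """Conventional path: ship every byte across the interconnect."""
        return self._data.copy()

    def reduce_sum(self):
        """PIM path: compute beside the array, return only the result."""
        return self._data.sum()

bank = PimBank(np.ones(1_000_000, dtype=np.float32))

# Conventional: ~4 MB crosses the bus, then the host computes.
total_host = bank.read_all().sum()

# PIM: ~4 bytes cross the bus.
total_pim = bank.reduce_sum()

assert total_host == total_pim   # same answer, ~1e6x less data moved
```

The result is identical either way; what changes is the megabytes versus bytes crossing the interconnect, which is where the energy goes.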

Author bio: Gaurav Gupta, VP Analyst at Gartner
