Thermal-Aware GPU-Based Design Engine for On-Chip Power Delivery in Power-Efficient Multi-Core Chips



Peng Li


National Science Foundation (NSF)

ECCS Division of Electrical, Communications and Cyber Systems

ENG Directorate for Engineering

NSF Award #: 0903485

Semiconductor Research Corporation (SRC)



September 1, 2009 – August 31, 2012





The objective of this research is to address the computational challenges in multi-core power distribution design by leveraging recent advances in single-instruction multiple-data (SIMD) graphics processing units (GPUs). The approach is to develop a massively parallel GPU-accelerated design engine to facilitate the analysis, design and verification of power-gated multi-core on-chip power delivery networks encompassing both electrical and thermal integrity issues.

Intellectual Merit: Aggressive fine-grained power gating is essential to pushing the performance vs. power envelope of current and future multi-core chip designs. This need introduces significant challenges in the design and verification of power delivery networks under complex power gating scenarios. While the recent GPU advances provide a potentially promising computing solution, the effective use of such SIMD compute power requires rethinking computer-aided design. In this work, GPU-specific computing paradigms, algorithms and implementations will be developed to address multi-core power distribution design and associated full-chip thermal challenges via efficient parallel computing on low-cost SIMD graphics processors.

Broader Impacts: This work exploits recent SIMD GPU based massively parallel platforms for addressing CAD challenges. The acquired experience is likely to contribute to computing advances in other science and engineering fields. The PI will promote research participation by undergraduate students and students from underrepresented groups. The outcomes of this work will be integrated into the PI's graduate-level VLSI courses to provide educational and research experiences to students. The developed algorithms and methodologies will be disseminated to the research community at large and to major semiconductor and EDA companies for potential industrial application.


Research Highlights 

Large-scale GPU-accelerated Power Grid Analysis

We envision that the recent emergence of general-purpose graphics processing unit (GPU) computing platforms offers interesting opportunities to leverage these powerful single-instruction multiple-data (SIMD) machines for VLSI CAD challenges. To be successful, however, one has to properly expose the “hidden” data parallelism in the application and develop “smart” algorithms and implementations that overcome the SIMD barrier of the GPU platform.

The challenging task of analyzing on-chip power (ground) distribution networks with multi-million node complexity and beyond is key to today’s large chip designs. For the first time, we show how to exploit recent massively parallel GPU platforms to tackle power grid analysis with promising performance. Several key enablers, including GPU-specific algorithm design, circuit topology transformation, workload partitioning and performance tuning, are embodied in our GPU-accelerated hybrid multigrid algorithm, GpuHMD, and its implementation. In particular, a proper interplay between algorithm design and SIMD architecture considerations is shown to be essential to achieving good runtime performance. Unlike standard CPU-based CAD development, care must be taken to balance computation against memory access, reduce random memory access patterns and simplify flow control to achieve efficiency on the GPU platform.

As shown in the following figure, to efficiently accelerate power grid analysis on SIMD GPUs, we first approximate a mostly regular real-life multi-layer power grid as a fully regular grid, and then solve this approximation very efficiently on the GPU by leveraging its high efficiency in processing regular data and control flows. To ensure overall analysis accuracy, the solution returned from the GPU is checked on the CPU host against the original power grid; this checking step is computationally straightforward and can be done easily on the CPU by leveraging its flexibility in processing irregular data structures and control flows. The computed residual is returned to the GPU solver, which computes a correction to the solution. A few such GPU-CPU approximate-and-correct iterations are performed to reach a fully converged solution. In this way, the strengths of both general-purpose CPU and SIMD GPU hardware are exploited simultaneously, leading to the high overall efficiency of the solver [Feng and Li, ICCAD’08].
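The approximate-and-correct loop described above can be sketched in a few lines of Python. This is not the GpuHMD implementation: it assumes a one-dimensional resistive chain instead of a multi-layer grid, and a tridiagonal (Thomas) solve on the regularized chain stands in for the GPU multigrid solver. All function names, conductance values and loads are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only. A 1-D chain stands in for the power grid, and a
# Thomas (tridiagonal) solve stands in for the GPU multigrid solver.

def build_chain(link_g, pad_g):
    """Tridiagonal conductance matrix (lower, diag, upper) of a resistive chain."""
    diag = list(pad_g)                     # conductance from each node to VDD
    for i, g in enumerate(link_g):         # link i joins node i and node i + 1
        diag[i] += g
        diag[i + 1] += g
    off = [-g for g in link_g]
    return off, diag, off


def matvec(grid, x):
    """Multiply the tridiagonal conductance matrix by the vector x."""
    lower, diag, upper = grid
    y = [d * v for d, v in zip(diag, x)]
    for i in range(len(x) - 1):
        y[i] += upper[i] * x[i + 1]
        y[i + 1] += lower[i] * x[i]
    return y


def thomas_solve(grid, rhs):
    """Direct solve of a tridiagonal system (forward sweep, back substitution)."""
    lower, diag, upper = grid
    n = len(diag)
    c, d = [0.0] * n, [0.0] * n
    denom = diag[0]
    d[0] = rhs[0] / denom
    for i in range(1, n):
        c[i - 1] = upper[i - 1] / denom
        denom = diag[i] - lower[i - 1] * c[i - 1]
        d[i] = (rhs[i] - lower[i - 1] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def approximate_and_correct(true_grid, regular_grid, b, tol=1e-10, max_iters=50):
    """Refine x using regular-grid solves until the true-grid residual is small."""
    x = [0.0] * len(b)
    b_norm = max(abs(v) for v in b)
    for corrections in range(max_iters):
        # "CPU" step: residual evaluated on the original (irregular) grid.
        r = [bi - yi for bi, yi in zip(b, matvec(true_grid, x))]
        if max(abs(v) for v in r) <= tol * b_norm:
            return x, corrections
        # "GPU" step: solve the regular approximation for a correction.
        e = thomas_solve(regular_grid, r)
        x = [xi + ei for xi, ei in zip(x, e)]
    return x, max_iters


# Hypothetical test case: 200 nodes, a few links deviating from the regular value.
n = 200
pad_g = [0.05] * n
regular_links = [1.0] * (n - 1)
true_links = [0.8 if i % 7 == 0 else 1.2 if i % 11 == 0 else 1.0
              for i in range(n - 1)]
loads = [0.002 * (i % 5) for i in range(n)]
b = [g * 1.0 - load for g, load in zip(pad_g, loads)]   # VDD assumed to be 1.0

true_grid = build_chain(true_links, pad_g)
regular_grid = build_chain(regular_links, pad_g)
x, corrections = approximate_and_correct(true_grid, regular_grid, b)
```

The loop converges here because the regular chain is a close approximation of the true one, so each correction removes most of the remaining error. The number of corrections in this toy depends on how far the true chain deviates from the regular one and is not meant to reproduce iteration counts measured on real power grids.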


The accuracy of a single GPU solve is illustrated in the following figure, where the GPU solution is compared with the exact solution obtained by a direct solve for the steady-state response of an IBM test case. The close match between the two solutions shows the effectiveness of the regular grid approximation. Typically, 2-3 GPU-CPU iterations are sufficient to reach full convergence.


Extensive experiments on industrial and synthetic benchmarks show that, for DC power grid analysis on a single GPU, the proposed simulation engine achieves up to 100X runtime speedup over a state-of-the-art direct solver and more than 50X speedup over a CPU-based multigrid implementation. On a four-core, four-GPU system, a grid with eight million nodes can be solved within about one second. The proposed approach also scales favorably with circuit complexity, at a rate of about one second per two million nodes on a single GPU card.
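As a rough back-of-the-envelope check of these figures, the sketch below assumes strictly linear scaling at two million nodes per second per GPU and an ideal, overhead-free split of the grid across GPUs. Both are simplifying assumptions, and the function name is hypothetical.

```python
# Assumption: runtime grows linearly with node count at the quoted rate of
# about one second per two million nodes on a single GPU card.
NODES_PER_SECOND_PER_GPU = 2_000_000


def estimated_dc_solve_seconds(num_nodes, num_gpus=1):
    """Estimate DC solve time, assuming an ideal split across num_gpus GPUs."""
    if num_nodes < 0 or num_gpus < 1:
        raise ValueError("num_nodes must be >= 0 and num_gpus must be >= 1")
    return num_nodes / (NODES_PER_SECOND_PER_GPU * num_gpus)


single_gpu_seconds = estimated_dc_solve_seconds(8_000_000)     # one GPU
four_gpu_seconds = estimated_dc_solve_seconds(8_000_000, 4)    # four GPUs
```

Under these assumptions an eight-million-node grid takes about four seconds on one GPU and about one second on four, which is consistent with the four-GPU result quoted above.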


Related Publications 


Note: Supervised students are marked with an asterisk (*).


[DAC10] *Zhiyu Zeng, *Xiaoji Ye, Zhuo Feng and Peng Li, “Tradeoff analysis and optimization of power networks with on-chip voltage regulation,” in Proc. of ACM/IEEE Design Automation Conf., June 2010 (acceptance rate 24.4%).


[ICCAD08] *Zhuo Feng and Peng Li, “Multigrid on GPU: tackling power grid analysis on parallel SIMT platforms,” in Proc. of IEEE/ACM Int. Conf. on Computer-Aided Design, pp. 647-654, November 2008 (acceptance rate 26.6%; best paper award nomination, 14 out of 458 submissions, 3%).



Acknowledgement and Disclaimer 


This material is based upon work supported by the National Science Foundation under Grant No. 0903485 and the Semiconductor Research Corporation.

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.


Copyright by Peng Li.