PICNIC: Silicon Photonic Interconnected Chiplets with Computational Network and In-memory Computing for LLM Inference Acceleration
By Yue Jiet Chong, Yimin Wang, Zhen Wu and Xuanyao Fong
National University of Singapore, Singapore

Abstract
This paper presents a 3D-stacked, chiplet-based large language model (LLM) inference accelerator consisting of non-volatile in-memory-computing processing elements (PEs) and an Inter-PE Computational Network (IPCN), interconnected via silicon photonics to effectively address communication bottlenecks. An LLM mapping scheme was developed to optimize hardware scheduling and workload mapping. Simulation results show that the accelerator achieves a 3.95× speedup and a 30× efficiency improvement over the Nvidia A100 before the chiplet clustering and power gating (CCPG) scheme is applied. With CCPG, the system scales further to accommodate larger models and attains a 57× efficiency improvement over the Nvidia H100 at similar throughput.
Index Terms: LLM Inference, Hardware Accelerator, HW-SW Co-design
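To make the two mechanisms the abstract names more concrete, the sketch below illustrates (1) mapping LLM weight matrices onto in-memory-computing PE crossbar tiles spread across chiplets, and (2) a CCPG-style policy in which only the chiplets a model actually occupies are powered on while the rest stay gated off. This is a minimal toy model, not the paper's implementation: the tile dimensions, chiplet counts, greedy packing strategy, and all identifiers (`Chiplet`, `tiles_needed`, `map_model`) are assumptions chosen for illustration.

```python
# Hypothetical sketch of PE-tile mapping with CCPG-style power gating.
# All sizes and names are illustrative assumptions, not from the paper.
import math
from dataclasses import dataclass

@dataclass
class Chiplet:
    cid: int
    pe_tiles: int          # in-memory-computing PE tiles per chiplet (assumed)
    tile_rows: int = 256   # crossbar rows per tile (assumed)
    tile_cols: int = 256   # crossbar columns per tile (assumed)
    powered: bool = False  # CCPG: chiplet stays gated off until a layer claims it

def tiles_needed(rows: int, cols: int, chip: Chiplet) -> int:
    """PE tiles required to hold one rows x cols weight matrix."""
    return math.ceil(rows / chip.tile_rows) * math.ceil(cols / chip.tile_cols)

def map_model(layer_shapes, chiplets):
    """Greedily pack layer weights onto chiplets; power on only what is used."""
    mapping, idx = {}, 0
    free = chiplets[0].pe_tiles
    for name, (rows, cols) in layer_shapes.items():
        need, placed = tiles_needed(rows, cols, chiplets[idx]), []
        while need > 0:
            if free == 0:                       # current chiplet is full
                idx += 1
                if idx >= len(chiplets):
                    raise RuntimeError("model exceeds available PE tiles")
                free = chiplets[idx].pe_tiles
            chiplets[idx].powered = True        # claim chiplet for the cluster
            take = min(need, free)
            placed.append((chiplets[idx].cid, take))
            need -= take
            free -= take
        mapping[name] = placed
    return mapping

if __name__ == "__main__":
    chips = [Chiplet(cid=i, pe_tiles=256) for i in range(16)]
    # Toy transformer block: attention projections + FFN (hidden dim 4096)
    shapes = {"W_q": (4096, 4096), "W_k": (4096, 4096),
              "W_v": (4096, 4096), "W_o": (4096, 4096),
              "W_ff1": (4096, 16384), "W_ff2": (16384, 4096)}
    plan = map_model(shapes, chips)
    print("powered chiplets (cluster):", [c.cid for c in chips if c.powered])
    print("gated chiplets:", [c.cid for c in chips if not c.powered])
```

In this toy run the six weight matrices occupy twelve of the sixteen chiplets, so a CCPG-style policy would keep that twelve-chiplet cluster powered and gate the remaining four, which is the intuition behind the efficiency gains the abstract reports for larger models.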