Saturday, May 11, 2024

HBM introduction

The special memory powering the deep learning revolution and AI

MI300: an AI accelerator that uses HBM

HBM (High Bandwidth Memory) does not refer to a special type of dynamic RAM cell or a special chip. Rather, it is a standard, administered by the Joint Electron Device Engineering Council (JEDEC), for interfacing the DRAM with the compute. HBM introduces the concept of stacking DRAM dies and running many independent memory channels through the stack. The goal is to provide very high data transfer rates for more advanced computing applications, such as AI, that are more amenable to parallelism. The vendor can also choose to add a logic base die, as shown in the JEDEC spec diagram. This additional logic handles signal redistribution, test logic, and other external communications.

JEDEC's standard defines for the manufacturer how the HBM system has to work and which features it needs to support. It covers, for instance, the commands and signals sent through the memory channels in the DRAM stack. But it does not specify how the vendor should structure the stack, nor does it specify anything outside the stack. This allows the various HBM vendors in the memory industry to offer differentiated products. HBM buyers can put the HBM stack right on top of their CPU or GPU die. Or they can do what AMD and NVIDIA do, which is to add a silicon interposer that connects multiple HBM die stacks to the GPU, much like a PCB. JEDEC also administers other standards, such as DDR and LPDDR.
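To see why running many independent channels through the stack matters for data rate, here is a rough back-of-the-envelope sketch. The function name and all of the figures in it are my own illustrative assumptions (a wide, slow-per-pin interface versus a narrow, fast-per-pin one); they are not taken from the JEDEC spec text discussed above.

```python
def peak_bandwidth_gbytes_per_s(channels: int, bits_per_channel: int,
                                gbits_per_pin: float) -> float:
    """Peak bandwidth in GB/s: total data pins times per-pin rate, over 8 bits per byte."""
    total_pins = channels * bits_per_channel
    return total_pins * gbits_per_pin / 8


# Assumed, illustrative figures only (not from the post):
# a stacked device with many wide channels running at a modest per-pin rate...
wide_stack = peak_bandwidth_gbytes_per_s(channels=8, bits_per_channel=128,
                                         gbits_per_pin=2.0)
# ...versus a single narrow device running each pin much faster.
narrow_device = peak_bandwidth_gbytes_per_s(channels=2, bits_per_channel=16,
                                            gbits_per_pin=16.0)

print(wide_stack)     # 256.0
print(narrow_device)  # 64.0
```

Under these assumed numbers, the wide interface comes out ahead even though each pin is eight times slower, which is the basic trade the many-channel design makes.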

    DDR is the standard you might be most familiar with; it is used for general-purpose memory modules like those you put into your PC. The traditional memory interface standard for graphics cards is GDDR, or Graphics Double Data Rate. Right now it is on its sixth generation, GDDR6, and a number of gaming cards use it. So why do we need another standard? A few things make GDDR less suitable for heavy AI processing. First, while each new GDDR standard does feature higher data rates, it also employs a point-to-point connection. This means that each memory channel connects to just one module of memory. This is in contrast to HBM, where channels run through all the modules in the stack. This one-module-per-channel arrangement makes it harder to scale the system's total memory capacity, because we have to scale that single module's capacity. That essentially requires shrinking DRAM cells even further, which is hard. I did a whole video about this a while ago, discussing the 3D DRAM cell.
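The capacity argument can be put into a tiny model. This is a deliberately simplified sketch of the description above, with hypothetical function names and a made-up die density; real products have more knobs than this.

```python
def point_to_point_capacity_gb(channels: int, device_gb: float) -> float:
    # One memory device per channel, so capacity grows only with
    # more channels or with a denser device.
    return channels * device_gb


def stacked_capacity_gb(stacks: int, dies_per_stack: int, die_gb: float) -> float:
    # Channels run through every die in a stack, so stack height
    # is one more knob for raising capacity.
    return stacks * dies_per_stack * die_gb


DIE_GB = 2.0  # hypothetical die density, chosen only for illustration

p2p = point_to_point_capacity_gb(channels=8, device_gb=DIE_GB)
stack_4 = stacked_capacity_gb(stacks=1, dies_per_stack=4, die_gb=DIE_GB)
stack_8 = stacked_capacity_gb(stacks=1, dies_per_stack=8, die_gb=DIE_GB)

print(p2p, stack_4, stack_8)  # 16.0 8.0 16.0
```

In this model, doubling the point-to-point capacity with the same channel count means doubling `device_gb`, that is, a denser die. The stacked case doubles from 8 to 16 GB just by going from 4 to 8 dies of the same density.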

    Future shrinks seem to require new capacitor structures, like the pillar, which may or may not actually be manufacturable. By vertically stacking the modules, HBM makes it conceptually easy to raise the memory capacity. There are also some physical size gains when it 

