
What kind of chip does ChatGPT use?

Recently, generative models led by ChatGPT have become a new hot spot in AI. Microsoft and Google in Silicon Valley are investing heavily in such technologies (Microsoft has a $10 billion stake in OpenAI, the company behind ChatGPT, and Google has recently released its own Bard model), while Internet technology companies in China, represented by Baidu, have also indicated that they are developing such technologies and will launch them in the near future.

 


The generative models represented by ChatGPT share a common feature: they are pre-trained on massive amounts of data and are often paired with a powerful language model. The language model's main function is to learn from a massive existing corpus; after training, it can understand the user's natural-language instructions and, further, generate relevant text output according to those instructions.

 

Generative models can be broadly classified into two categories: language-based generative models and image-based generative models. The language-based category is represented by ChatGPT, whose language model, after training on massive data, can not only understand the meaning of a user instruction (e.g., "write a poem in the style of Li Bai") but also generate relevant text based on it. This means that ChatGPT needs a Large Language Model (LLM) that is large enough both to understand the user's language and to produce high-quality language output - for example, the model must understand how to generate poetry. It also means that the large language model in language-based generative AI needs a very large number of parameters to carry out this kind of complex learning and to remember so much information. ChatGPT, for example, has 175 billion parameters (stored as standard 32-bit floating point numbers, these would take up 700 GB of storage space), which shows how "big" its language model is.
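The 700 GB figure follows from simple arithmetic: 175 billion parameters at 4 bytes each. The sketch below reproduces it, assuming decimal gigabytes (1 GB = 10^9 bytes) and counting only the raw parameters; the function name is illustrative, and the 16-bit line is added purely for comparison.

```python
def parameter_storage_gb(num_params: int, bytes_per_param: int) -> float:
    """Storage for the raw parameters in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9

PARAMS = 175_000_000_000  # parameter count quoted in the text

fp32_gb = parameter_storage_gb(PARAMS, 4)  # 32-bit floating point, 4 bytes each
fp16_gb = parameter_storage_gb(PARAMS, 2)  # 16-bit floating point, for comparison only

print(f"32-bit: {fp32_gb:.0f} GB, 16-bit: {fp16_gb:.0f} GB")
```

Halving the precision halves the storage, which is one reason reduced-precision formats matter for models of this size.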

 

An example of ChatGPT generating responses

 


The other class of generative models is the image-based generative models represented by diffusion models, typically DALL-E from OpenAI, Imagen from Google, and the most popular, Stable Diffusion from Stability AI and Runway. These image generation models also use a language model to understand the user's natural-language commands and then generate high-quality images based on those commands. Unlike in language-based generative models, the language model here only needs to understand user input and does not generate language output, so its parameter count can be quite small (on the order of a few hundred million). The image diffusion model itself is also relatively small, so the total is on the order of a few billion parameters. The computational effort is nevertheless not small, because the resolution of the generated images or videos can be very high.

 



An example of an image generated by an image generation model

 

Trained on massive amounts of data, generative models can produce output of unprecedented quality, and there are already a number of clear application markets, including search, conversational bots, and image generation and editing. More applications are expected in the future, which in turn creates demand for the related chips.

 

Chip requirements of generative models

 

As mentioned earlier, the generative models represented by ChatGPT need to learn from large amounts of training data in order to achieve high-quality generative output. To support efficient training and inference, generative models place their own requirements on the related chips.

 

The first is the need for distributed computing. Language generative models such as ChatGPT have hundreds of billions of parameters, so training and inference on a single machine are almost impossible, and large-scale distributed computing must be used. Distributed computing places heavy demands on the interconnect bandwidth between machines and on the computing chips' support for such communication (such as RDMA), because the bottleneck of a task is often not computation but data interconnection. In large-scale distributed computing especially, efficient chip-level support for distributed computing becomes even more critical.
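To see why a single machine is not enough, one can estimate the minimum number of accelerators needed just to hold the parameters. The sketch below uses the 700 GB figure quoted earlier; the 80 GB of memory per device is an assumption chosen for illustration, not a figure from this article, and the function name is hypothetical.

```python
import math

def min_devices(model_gb: float, device_memory_gb: float) -> int:
    """Smallest number of devices whose combined memory can hold the parameters."""
    return math.ceil(model_gb / device_memory_gb)

MODEL_GB = 700.0          # parameter storage quoted in the text
DEVICE_MEMORY_GB = 80.0   # ASSUMPTION: memory per accelerator, for illustration only

devices = min_devices(MODEL_GB, DEVICE_MEMORY_GB)
print(f"At least {devices} devices are needed to hold the parameters")
```

This is only a lower bound: it ignores activations, and for training also gradients and optimizer state, so real deployments would need more devices, all of which must exchange data over the interconnect.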

 

Next is memory capacity and bandwidth. Although distributed training and inference are unavoidable for language-based generative models, the local memory capacity and bandwidth of each chip largely determine the execution efficiency of a single chip (because each chip's memory is used to its limit). For image-based generative models, the whole model (around 20 GB) can currently fit in the chip's memory, but as these models evolve, their memory requirements are likely to grow as well. From this perspective, ultra-high-bandwidth memory technology represented by HBM will become the inevitable choice for the related accelerator chips, and generative models will in turn push HBM toward higher capacity and bandwidth. In addition to HBM, new memory technologies such as CXL, coupled with software optimizations, have the potential to increase the capacity and performance of local memory in such applications, and are expected to gain wider industrial adoption as generative models rise.
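A rough way to see why bandwidth matters: if one pass over the model has to read every parameter from local memory once, that pass cannot finish faster than the model size divided by the memory bandwidth. The sketch below applies this bound to the roughly 20 GB image model mentioned above. Both bandwidth values are assumptions picked for illustration (an HBM-class and a conventional-memory-class figure), not measurements, and the function name is hypothetical.

```python
def min_weight_read_seconds(model_gb: float, bandwidth_gb_per_s: float) -> float:
    """Lower bound on the time to stream all parameters from memory once."""
    return model_gb / bandwidth_gb_per_s

MODEL_GB = 20.0                # image model size quoted in the text
HBM_CLASS_BANDWIDTH = 2000.0   # ASSUMPTION: 2 TB/s, illustrative HBM-class figure
DDR_CLASS_BANDWIDTH = 200.0    # ASSUMPTION: 200 GB/s, illustrative conventional-memory figure

hbm_seconds = min_weight_read_seconds(MODEL_GB, HBM_CLASS_BANDWIDTH)
ddr_seconds = min_weight_read_seconds(MODEL_GB, DDR_CLASS_BANDWIDTH)
print(f"HBM-class: {hbm_seconds * 1000:.0f} ms per pass, DDR-class: {ddr_seconds * 1000:.0f} ms per pass")
```

Under these assumptions, a tenfold difference in bandwidth translates directly into a tenfold difference in the floor on per-pass latency, regardless of how much compute the chip has.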

 

Finally, in terms of computation, both language-based and image-based generative models have large computational demands, and image-based generative models may need much more computing power as they generate higher and higher resolutions and move toward video applications - current mainstream image generative models have a computational load of around 20 TFLOPS, and as they move toward high resolution and video, 100-1000 TFLOPS is likely to become the norm.

 

To sum up, we believe that the chip requirements of generative models cover distributed computing, memory, and computation, which together touch every aspect of chip design. More importantly, all of these requirements must be combined in a balanced way so that no single aspect becomes a bottleneck, which makes this a system engineering problem in chip design.

 

GPUs and new AI chips: who has the better chance?

 

Generative models place new demands on chips. Between GPUs (represented by Nvidia and AMD) and new AI chips (represented by Habana and Graphcore), who has the better chance of capturing this new demand and market?

 

First, for language-based generative models, GPU vendors that already have a complete ecosystem in place hold the advantage, because these models have a huge number of parameters and need good distributed computing support. This is a system engineering problem that requires a complete software and hardware solution. In this regard, Nvidia has launched the Triton solution for its GPUs, which supports distributed training and distributed inference and allows a model to be divided into multiple parts that are processed on different GPUs, thus solving the problem of a model having too many parameters to fit in the memory of a single GPU. Whether you use Triton directly or build on it in the future, it is more convenient to work with GPUs that have a complete ecosystem. From a computational point of view, the main computation in language-based generative models is matrix computation, which is a strength of the GPU, so new AI chips do not have an obvious advantage over GPUs here.
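The idea of dividing a model into parts that run on different GPUs can be illustrated with a single layer's weight matrix split column-wise. The sketch below is a generic illustration of this kind of model parallelism, simulated on one machine in plain Python; it is not how Triton implements it, and all function names are hypothetical.

```python
def matmul(a, b):
    """Plain matrix product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def split_columns(w, parts):
    """Split weight matrix w column-wise into `parts` equally sized shards."""
    cols = len(w[0])
    assert cols % parts == 0, "column count must divide evenly across shards"
    step = cols // parts
    return [[row[i:i + step] for row in w] for i in range(0, cols, step)]

def sharded_matmul(x, w, parts):
    """Compute x @ w with w split across `parts` simulated devices."""
    shards = split_columns(w, parts)
    # Each partial product needs only one shard, so each simulated device
    # holds 1/parts of the weights.
    partials = [matmul(x, shard) for shard in shards]
    # Concatenate the per-device output columns back into full rows.
    return [sum((p[r] for p in partials), []) for r in range(len(x))]

x = [[1, 2], [3, 4]]              # input activations (2 x 2)
w = [[1, 0, 2, 1], [0, 1, 1, 3]]  # layer weights (2 x 4)

full = matmul(x, w)
sharded = sharded_matmul(x, w, parts=2)
print(sharded == full)
```

Each shard holds only part of the weights, but the final concatenation step stands in for the inter-device communication that makes interconnect bandwidth so important.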

 

For image-based generative models, the number of parameters is also large but one to two orders of magnitude smaller than in language-based generative models. In addition, their computation still relies heavily on convolution, so in inference applications AI chips may have some opportunities if they are very well optimized. The optimizations here include a large amount of on-chip storage to hold parameters and intermediate results, and efficient support for both convolution and matrix operations.

 

In general, the current generation of AI chips was designed for smaller models (parameters at the billion level, computation at the 1 TOPS level), and the demands of generative models are still considerably larger than that original design target. GPUs are designed for flexibility at the cost of efficiency, while AI chips do the opposite and pursue efficiency in the target application. Therefore, we believe that GPUs will still dominate generative model acceleration in the next year or two, but as generative model designs become more stable and AI chip designs have time to catch up with generative model iterations, AI chips have the opportunity to surpass GPUs in the generative model space from an efficiency perspective.

 
