Hardware for artificial intelligence
Specialized computer hardware is often used to execute artificial intelligence (AI) programs faster and with less energy; examples include Lisp machines, neuromorphic hardware, event cameras, and physical neural networks. Since 2017, several consumer-grade CPUs and SoCs have included on-die neural processing units (NPUs). As of 2023, the market for AI hardware is dominated by GPUs.[1]
In the 2020s, GPUs have been joined by newer domain-specific accelerators such as Google's Tensor Processing Units (TPUs), AMD's Instinct MI300 series, and the on-device NPUs found in consumer hardware.[2][3]
Scope
For the purposes of this article, AI hardware refers to computing components and systems specifically designed or optimized to accelerate artificial-intelligence workloads such as machine-learning training or inference. This includes general-purpose accelerators used for AI (for example, GPUs) and domain-specific accelerators (for example, TPUs, NPUs, and other AI ASICs).[4]
Event-based cameras are sometimes discussed in the context of neuromorphic computing, but they are input sensors rather than AI compute devices. Conversely, components such as memristors are basic circuit elements rather than specialized AI hardware when considered alone.[5][6]
Lisp machines
Lisp machines were developed in the late 1970s and early 1980s to make artificial-intelligence programs written in the Lisp programming language run faster.
Dataflow architecture
Dataflow architecture processors used for AI take varied forms, including the polymorphic dataflow[7] Convolution Engine[8] from Kinara (formerly Deep Vision), the structure-driven dataflow architecture from Hailo,[9] and the dataflow scheduling used by Cerebras.[10]
Component hardware
AI accelerators
Since the 2010s, advances in computer hardware have led to more efficient methods for training deep neural networks that contain many layers of non-linear hidden units and a very large output layer.[11] By 2019, graphics processing units (GPUs), often with AI-specific enhancements, had displaced central processing units (CPUs) as the dominant means to train large-scale commercial cloud AI.[12] OpenAI estimated the hardware compute used in the largest deep learning projects from AlexNet (2012) to AlphaZero (2017) and found a 300,000-fold increase in the amount of compute needed, with a doubling time of 3.4 months.[13][14]
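As a rough consistency check (an illustrative calculation, not a figure from the cited source), compounding a 3.4-month doubling time over the roughly five years between AlexNet and AlphaZero reproduces the order of magnitude of the reported increase:

```python
# Sketch: compound a 3.4-month doubling time over ~5.2 years
# (mid-2012 to late 2017, approximate dates) to recover the order
# of magnitude of OpenAI's reported 300,000-fold compute increase.
months = 5.2 * 12                       # elapsed time in months (assumed)
doubling_time = 3.4                     # months per doubling, per OpenAI
growth = 2 ** (months / doubling_time)
print(f"{growth:,.0f}x")                # on the order of 300,000x
```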
General-purpose GPUs for AI
Since the 2010s, graphics processing units (GPUs) have been widely used to train and deploy deep learning models because of their highly parallel architecture and high memory bandwidth. Modern data-center GPUs include dedicated tensor or matrix-math units that accelerate neural-network operations.
In 2022, NVIDIA introduced the Hopper-generation H100 GPU, adding FP8 precision support and faster interconnects for large-scale model training.[15] AMD and other vendors have also developed GPUs and accelerators aimed at AI and high-performance computing workloads.[16]
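As a minimal sketch of how such hardware is exercised in practice (assuming the PyTorch library, which is not mentioned above), the half-precision matrix multiply below is the kind of operation a data-center GPU dispatches to its tensor or matrix-math units:

```python
import torch

# Illustrative only: a half-precision matmul of the kind that modern
# data-center GPUs lower to dedicated tensor-core instructions.
use_cuda = torch.cuda.is_available()
device = "cuda" if use_cuda else "cpu"
dtype = torch.float16 if use_cuda else torch.float32  # fp16 path needs a GPU
a = torch.randn(4096, 4096, device=device, dtype=dtype)
b = torch.randn(4096, 4096, device=device, dtype=dtype)
c = a @ b  # on recent NVIDIA GPUs this maps to tensor-core MMA operations
print(c.shape, c.dtype)
```

Frameworks select such kernels automatically when operand types and shapes allow, which is one reason reduced-precision formats such as FP16 and FP8 feature prominently in accelerator designs.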
Domain-specific accelerators (ASICs / NPUs)
Beyond general-purpose GPUs, several companies have developed application-specific integrated circuits (ASICs) and neural processing units (NPUs) tailored for AI workloads. Google introduced the Tensor Processing Unit (TPU) in 2016 for deep-learning inference, with later generations supporting large-scale training through dense systolic-array designs and optical interconnects.[17] Other vendors have released similar devices—such as Apple's Neural Engine and various on-device NPUs—that emphasize energy-efficient inference in mobile or edge computing environments.[18]
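To illustrate the systolic-array idea behind designs such as the TPU's matrix unit, the following is a functional (not cycle-accurate) sketch of the weight-stationary dataflow in NumPy; it is a simplified illustration, not any vendor's implementation:

```python
import numpy as np

def systolic_matmul(activations: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Functional sketch of a weight-stationary systolic array.

    Processing element (PE) (k, n) permanently holds weights[k, n].
    Activation rows stream in from the left edge, and the partial sum
    for output (m, n) accumulates as it cascades down column n, one
    multiply-accumulate (MAC) per PE. Real arrays pipeline these MACs
    so that many complete on every clock cycle.
    """
    m_dim, k_dim = activations.shape
    k_dim2, n_dim = weights.shape
    assert k_dim == k_dim2, "inner dimensions must match"
    out = np.zeros((m_dim, n_dim))
    for m in range(m_dim):            # each activation row streamed through
        for n in range(n_dim):        # each column of the PE grid
            acc = 0.0
            for k in range(k_dim):    # partial sum flows down the column
                acc += activations[m, k] * weights[k, n]  # one MAC per PE
            out[m, n] = acc
    return out

a = np.random.rand(4, 8)
w = np.random.rand(8, 3)
assert np.allclose(systolic_matmul(a, w), a @ w)  # matches a dense matmul
```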
Memory and interconnects
AI accelerators rely on fast memory and inter-chip links to manage the large data volumes of training and inference. Stacked high-bandwidth memory (HBM), standardized by JEDEC as HBM3 (JESD238A, 2023), provides terabytes per second of aggregate throughput on modern GPUs and ASICs.[19] These accelerators are often connected through dedicated fabrics such as NVIDIA's NVLink and NVSwitch, or the optical interconnects used in TPU systems, to scale performance across thousands of chips.[20]
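As a worked example of where such figures come from (using the interface parameters given in the HBM3 standard), the peak bandwidth of a single stack follows from the pin count and the per-pin data rate:

```python
# Peak bandwidth of one HBM3 stack, from the JESD238 interface figures:
# a 1024-bit interface (16 channels x 64 bits) at up to 6.4 Gb/s per pin.
pins = 1024
gbps_per_pin = 6.4
peak_gb_per_s = pins * gbps_per_pin / 8   # convert bits to bytes
print(peak_gb_per_s)                      # 819.2 GB/s per stack
```

A GPU or ASIC carrying several such stacks therefore reaches multiple terabytes per second of aggregate memory bandwidth.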
Sources
[edit]- ^ "Nvidia: The chip maker that became an AI superpower". BBC News. 25 May 2023. Retrieved 18 June 2023.
- ^ "NVIDIA H100 Tensor Core GPU Architecture Whitepaper". NVIDIA. 2022. Retrieved 4 November 2025.
- ^ "Google Cloud TPU v5 Announcement". Google Cloud Blog. 2023. Retrieved 4 November 2025.
- ^ Sze, Vivienne; Chen, Yu-Hsin; Yang, Tien-Ju; Emer, Joel (2017). "Efficient Processing of Deep Neural Networks: A Tutorial and Survey". Proceedings of the IEEE. 105 (12): 2295–2329. doi:10.1109/JPROC.2017.2761740. Retrieved 4 November 2025.
- ^ Gallego, Guillermo (2022). "Event-based Vision: A Survey" (PDF). IEEE Transactions on Pattern Analysis and Machine Intelligence. doi:10.1109/TPAMI.2020.3008413. Retrieved 4 November 2025.
- ^ Strukov, D. B.; Snider, G. S.; Stewart, D. R.; Williams, R. S. (2008). "The Missing Memristor Found". Nature. 453: 80–83. doi:10.1038/nature06932. Retrieved 4 November 2025.
- ^ Maxfield, Max (24 December 2020). "Say Hello to Deep Vision's Polymorphic Dataflow Architecture". Electronic Engineering Journal. Techfocus media.
- ^ "Kinara (formerly Deep Vision)". Kinara. 2022. Retrieved 2022-12-11.
- ^ "Hailo". Hailo. Retrieved 2022-12-11.
- ^ Lie, Sean (29 August 2022). Cerebras Architecture Deep Dive: First Look Inside the HW/SW Co-Design for Deep Learning. Cerebras (Report). Archived from the original on 15 March 2024. Retrieved 13 December 2022.
- ^ Research, AI (23 October 2015). "Deep Neural Networks for Acoustic Modeling in Speech Recognition". AIresearch.com. Retrieved 23 October 2015.
- ^ Kobielus, James (27 November 2019). "GPUs Continue to Dominate the AI Accelerator Market for Now". InformationWeek. Retrieved 11 June 2020.
- ^ Ray, Tiernan (2019). "AI is changing the entire nature of compute". ZDNet. Retrieved 11 June 2020.
- ^ "AI and Compute". OpenAI. 16 May 2018. Retrieved 11 June 2020.
- ^ "NVIDIA H100 Tensor Core GPU Architecture". NVIDIA. 2022. Retrieved 4 November 2025.
- ^ "AMD Instinct MI300X Accelerator". AMD. 2024. Retrieved 4 November 2025.
- ^ "Introducing Cloud TPU v5p and the AI Hypercomputer". Google Cloud Blog. 6 December 2023. Retrieved 4 November 2025.
- ^ "Apple Neural Engine". Apple Machine Learning Research. Retrieved 4 November 2025.
- ^ "JESD238A: High Bandwidth Memory (HBM3) Standard". JEDEC. January 2023. Retrieved 4 November 2025.
- ^ "NVIDIA Hopper Architecture In-Depth". NVIDIA Developer Blog. 22 March 2022. Retrieved 4 November 2025.