upload/duxiu_main2/【星空藏书馆】/【星空藏书馆】等多个文件/Kindle电子书库(012)/2022更新/2022/7月/Processing-in-Memory for AI From Circuits to Systems (Joo-Young Kim, Bongjin Kim, Tony Tae-Hyoung Kim) (z-lib.org).pdf
Processing-in-Memory for AI: From Circuits to Systems
Joo-Young Kim, Bongjin Kim, Tony Tae-Hyoung Kim
Springer International Publishing AG, Springer Nature, Cham, 2022
English [en] · Chinese [zh] · PDF · 7.9 MB · 2022 · 📘 Non-fiction book · /lgli/upload/zlib
Description
This book provides a comprehensive introduction to processing-in-memory (PIM) technology, from architectures to circuit implementations across multiple memory types, and describes how PIM can serve as a viable computer architecture in the era of AI and big data. The authors summarize the challenges of AI hardware systems and the constraints of PIM, and derive system-level requirements for a practical and feasible PIM solution. The presentation focuses on PIM solutions that can be implemented and used in real systems, covering architectures, circuits, and implementation cases for each major memory type (SRAM, DRAM, and ReRAM).
Publication date: 10.07.2022
Alternative filename
lgli/Joo-Young Kim, Bongjin Kim, Tony Tae-Hyoung Kim - Processing-in-Memory for AI: From Circuits to Systems (2022, Springer).pdf
Alternative filename
zlib/Computers/Artificial Intelligence (AI)/Joo-Young Kim, Bongjin Kim, Tony Tae-Hyoung Kim/Processing-in-Memory for AI: From Circuits to Systems_21932713.pdf
Alternative title
大手笔是怎样炼成的(套装共4册)
Alternative author
Adobe InDesign 16.3 (Windows)
Alternative author
谢亦森
Alternative publisher
Springer Nature Switzerland AG
Alternative publisher
长江文艺出版社
Alternative edition
Switzerland, Switzerland
Alternative edition
Cham, Switzerland, 2022
Alternative edition
1st ed. 2023, 2022
Metadata comments
producers:
Adobe PDF Library 15.0
Alternative description
Contents 5
1 Introduction 6
1.1 Hardware Acceleration for Artificial Intelligence and Machine Learning 6
1.2 Machine Learning Computations 7
1.2.1 Fully Connected Layer 8
1.2.2 Convolutional Layer 9
1.2.3 Recurrent Layer 10
1.3 von Neumann Bottleneck 11
1.3.1 Memory Wall Problem 11
1.3.2 Latest AI Accelerators with High-Bandwidth Memories 11
1.4 Processing-in-Memory Architecture 12
1.4.1 Paradigm Shift from Compute to Memory 12
1.4.2 Challenges 14
1.5 Book Organization 15
References 15
2 Backgrounds 19
2.1 Basic Memory Operations 19
2.1.1 SRAM Basics 20
2.1.2 DRAM Basics 22
2.1.3 ReRAM Basics 25
2.2 PIM Fundamentals 31
2.3 PIM Output Read-out 36
2.4 PIM Design Challenges 40
References 43
3 SRAM-Based Processing-in-Memory (PIM) 45
3.1 Introduction 45
3.2 SRAM-Based PIM Cell Designs 46
3.2.1 Standard 6T SRAM-Based PIM 46
3.2.2 Custom SRAM Cells for PIM 47
3.3 SRAM-Based PIM Macro Designs 52
3.4 Summary 65
References 66
4 DRAM-Based Processing-in-Memory 68
4.1 Introduction 68
4.2 Basic DRAM Operation 68
4.3 Bulk Bitwise Processing-in-Memory 70
4.3.1 AMBIT 70
4.3.1.1 Triple Row Activation 70
4.3.1.2 AMBIT DRAM Organization 71
4.3.1.3 Fast Row Copy 72
4.3.1.4 Bulk Bitwise NOT 72
4.3.1.5 Row Addressing 74
4.3.1.6 AMBIT Command Execution 74
4.3.1.7 Evaluation 74
4.3.2 DRISA 75
4.3.2.1 Motivation 75
4.3.2.2 Cell Microarchitectures 75
4.3.2.3 Computing Using NOR Operation 77
4.3.2.4 Evaluation 78
4.4 Bank-Level Processing-in-Memory 79
4.4.1 Newton 80
4.4.1.1 Motivation 80
4.4.1.2 Architecture 80
4.4.1.3 Newton's Operation 81
4.4.1.4 Evaluation 83
4.4.2 HBM-PIM 84
4.4.2.1 Motivation 84
4.4.2.2 HBM-PIM Architecture 85
4.4.2.3 HBM-PIM Controller 86
4.4.2.4 Programmable Computing Unit 86
4.4.2.5 Operation Flow 87
4.4.2.6 Data Movements 89
4.4.2.7 Implementation Results 89
4.5 3-D Processing-in-Memory 90
4.5.1 Neurocube 90
4.5.2 Tetris 91
4.5.3 iPIM 92
References 93
5 ReRAM-Based Processing-in-Memory (PIM) 95
5.1 Introduction 95
5.2 Basic ReRAM PIM Operation 96
5.3 Multiplication in ReRAM PIMs 97
5.3.1 Binary Multiply 97
5.3.2 Multiplication with Ternary Weight 98
5.3.3 Multi-bit Multiplication 98
5.3.3.1 Multiplication Using One Cycle and One Column 98
5.3.3.2 Parallel-Input Parallel-Weight (PIPW) 99
5.3.3.3 Serial-Input Parallel-Weight (SIPW) 99
5.4 ReRAM PIM Architecture 101
5.4.1 Introduction 101
5.4.2 Non-volatile PIM Processor 102
5.4.3 ReRAM PIM Architecture 103
5.4.4 ADCs and DACs in ReRAM PIM 104
5.5 ReRAM Co-processor 106
5.5.1 Architecture 106
5.5.2 Mixed-Signal Interface 106
5.5.3 ADCs and DACs Operation 107
5.6 Transposable ReRAM for Inference and Training 109
5.7 Bitline Sensing for MAC Accuracy Improvement 109
5.7.1 Variations in Bitline Current 109
5.7.2 Input-Aware Dynamic Reference Generation 111
5.7.3 Weighted Current Generation 112
5.7.3.1 PIM Macro Architecture 112
5.7.3.2 Serial-Input Non-weighted Product (SINWP) 113
5.7.3.3 Down-scaling Weighted Current Translator (DSWCT) 114
5.8 Versatile ReRAM-Based PIM Functions 116
5.8.1 Versatile PIM Architecture 116
5.8.2 2T2R ReRAM Bit Cell for Versatile Functions 117
5.8.2.1 Basic Memory Operation 117
5.8.2.2 TCAM Operation 118
5.8.2.3 Logic-in-Memory Operation 118
5.8.2.4 Dot Product Operation 119
5.9 Summary 120
References 121
6 PIM for ML Training 123
6.1 Introduction 123
6.2 Training Computations 124
6.2.1 Feed-Forward Propagation 124
6.2.2 Backward Propagation 125
6.2.3 Gradient Calculation and Weight Update 126
6.3 SRAM-Based PIM for Training 126
6.3.1 Two-Way Transpose SRAM PIM 126
6.3.1.1 SRAM Compute-in-Memory Macro Design 127
6.3.1.2 In-memory Multiplication for Forward and Backward Propagation 127
6.3.2 CIMAT 128
6.3.2.1 7T and 8T Transpose SRAM Cell Design 129
6.3.2.2 Weight Mapping Strategies and Data Flow 130
6.3.2.3 Pipeline Design 131
6.3.3 HFP-CIM 132
6.3.3.1 Heterogeneous Floating-Point Computing Architecture 133
6.3.3.2 Overall Processor Design and Sparsity Handling 135
6.4 ReRAM-Based PIM for Training 136
6.4.1 PipeLayer 136
6.4.1.1 Architecture of PipeLayer 137
6.4.1.2 Data Mapping and Parallelism of PipeLayer 138
6.4.2 FloatPIM 139
6.4.2.1 FloatPIM's Digital Operation 140
6.4.2.2 Hardware Architecture 141
6.4.2.3 Training of FloatPIM 142
References 143
7 PIM Software Stack 145
7.1 PIM Software Stack Overview 145
7.1.1 PIM Software Stack Challenges 149
7.2 PIM Offloading Execution 150
7.3 PIM Data Mapping 152
7.4 PIM Execution Scheduling 154
7.5 Cache Coherence 158
References 161
8 Conclusion 163
Alternative description
"This book provides a comprehensive introduction to processing-in-memory (PIM) technology, from its architectures to circuits implementations on multiple memory types and describes how it can be a viable computer architecture in the era of AI and big data. The authors summarize the challenges of AI hardware systems, processing-in-memory (PIM) constraints and approaches to derive system-level requirements for a practical and feasible PIM solution. The presentation focuses on feasible PIM solutions that can be implemented and used in real systems, including architectures, circuits, and implementation cases for each major memory type (SRAM, DRAM, and ReRAM)."--Cover page 4
Date open sourced
2022-07-13
We strongly recommend that you support the author by buying the book or donating on their personal website, or by borrowing it from your local library.
Support authors and libraries
✍️ If you like this book and can afford it, consider buying the original, or supporting the authors directly.
📚 If this book is available at your local library, consider borrowing it there for free.
Total downloads:
A file's MD5 is a hash computed from the file's contents, and is reasonably unique with respect to that content. All of the shadow libraries indexed here primarily use MD5s to identify files.
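As a brief illustration of the identifier described above, an MD5 digest can be computed locally and compared against the one a library reports. This is a minimal sketch using Python's standard `hashlib` module; the helper name `file_md5` is an assumption for illustration, not part of any library's API:

```python
import hashlib


def file_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks
    so that large files (like a 7.9 MB PDF) need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter(..., b"") keeps reading fixed-size chunks until EOF.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The resulting 32-character hex string is what shadow libraries use as the file's primary key; if a downloaded file's digest matches the catalog's MD5, the contents are byte-for-byte identical to the indexed copy.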
A file may appear in multiple shadow libraries. For information about the various datasets we have compiled, see the Datasets page.
For detailed information about this file, see its JSON file (live/debug versions of the JSON and the page are also available).