Neural Networks and Deep Learning: A Textbook, 2nd Edition
Charu C. Aggarwal
Springer International Publishing; Imprint: Springer, 2nd edition, 2023
English [en] · EPUB · 40.3MB · 2023 · 📘 Non-fiction book · 🚀/lgli/lgrs/nexusstc/upload/zlib
Description
Neural networks were developed to simulate the human nervous system for machine learning tasks by treating the computational units in a learning model in a manner similar to human neurons. The grand vision of neural networks is to create artificial intelligence by building machines whose architecture simulates the computations in the human nervous system. Although the biological model of neural networks is an exciting one and evokes comparisons with science fiction, neural networks have a much simpler and more mundane mathematical basis than a complex biological system. The neural network abstraction can be viewed as a modular approach to building learning algorithms based on continuous optimization over a computational graph of mathematical dependencies between the input and the output; the short sketch below illustrates this framing on a toy example. These ideas are strikingly similar to classical optimization methods in control theory, which historically preceded the development of neural network algorithms.
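To make the framing of continuous optimization on a computational graph concrete, here is a minimal NumPy sketch (not taken from the book; the architecture, toy data, and hyperparameters are illustrative assumptions). It builds a two-layer network as a small computational graph, evaluates it forward, differentiates it backward with the chain rule, and updates every parameter by gradient descent:

```python
# A minimal sketch of "continuous optimization on a computational graph":
# a two-layer network trained on XOR with plain full-batch gradient descent.
import numpy as np

rng = np.random.default_rng(0)

# Toy data: XOR, a function no single linear unit can represent.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# Parameters of the graph: input -> hidden (tanh) -> output (sigmoid).
W1 = rng.normal(scale=0.5, size=(2, 4))
b1 = np.zeros((1, 4))
W2 = rng.normal(scale=0.5, size=(4, 1))
b2 = np.zeros((1, 1))

lr = 0.5
for step in range(5000):
    # Forward pass: evaluate each node of the computational graph.
    h_pre = X @ W1 + b1
    h = np.tanh(h_pre)
    o_pre = h @ W2 + b2
    p = 1.0 / (1.0 + np.exp(-o_pre))        # sigmoid output node

    # Loss node: mean binary cross-entropy.
    loss = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))

    # Backward pass: apply the chain rule node by node (backpropagation).
    d_o_pre = (p - y) / len(X)              # gradient of the loss w.r.t. o_pre
    dW2 = h.T @ d_o_pre
    db2 = d_o_pre.sum(axis=0, keepdims=True)
    d_h = d_o_pre @ W2.T
    d_h_pre = d_h * (1.0 - np.tanh(h_pre) ** 2)
    dW1 = X.T @ d_h_pre
    db1 = d_h_pre.sum(axis=0, keepdims=True)

    # Continuous optimization: one gradient-descent step on every parameter.
    W1 -= lr * dW1
    b1 -= lr * db1
    W2 -= lr * dW2
    b2 -= lr * db2

print("final loss:", float(loss))
print("predictions:", p.round(2).ravel())
```

Because a single linear unit cannot fit XOR, the toy example also previews why nonlinearity and depth matter, themes the book develops in Chapters 1 and 4.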
Neural networks were developed soon after the advent of computers in the fifties and sixties. Rosenblatt’s perceptron algorithm was seen as a cornerstone of neural networks and caused an initial period of euphoria; it was soon followed by disappointment as the initial successes proved somewhat limited. Eventually, at the turn of the century, greater data availability and increasing computational power led to renewed successes of neural networks, and this area was reborn under the new label of “Deep Learning.” Although we are still far from the day when Artificial Intelligence (AI) approaches human performance across the board, there are specific domains, such as image recognition, self-driving cars, and game playing, in which AI has matched or exceeded human performance. It is also hard to predict what AI might be able to do in the future. For example, few computer vision experts would have thought two decades ago that any automated system could ever perform an intuitive task like categorizing an image more accurately than a human. The large amounts of data available in recent years, together with increased computational power, have enabled experimentation with more sophisticated and deeper neural architectures than was previously possible. The resulting success has changed the broader perception of the potential of Deep Learning. This book discusses neural networks from this modern perspective.
The chapters of the book are organized as follows:
1. The basics of neural networks: Chapters 1, 2, and 3 discuss the basics of neural network design and the backpropagation algorithm. Many traditional machine learning models can be understood as special cases of neural learning. Understanding the relationship between traditional machine learning and neural networks is the first step to understanding the latter. The simulation of various machine learning models with neural networks is provided in Chapter 3. This will give the analyst a feel for how neural networks push the envelope of traditional machine learning algorithms.
2. Fundamentals of neural networks: Although Chapters 1, 2, and 3 provide an overview of the training methods for neural networks, a more detailed understanding of the training challenges is provided in Chapters 4 and 5. Chapters 6 and 7 present radial-basis function (RBF) networks and restricted Boltzmann machines.
3. Advanced topics in neural networks: Much of the recent success of deep learning is a result of specialized architectures for various domains, such as recurrent neural networks and convolutional neural networks. Chapters 8 and 9 discuss recurrent and convolutional neural networks. Graph neural networks are discussed in Chapter 10. Several advanced topics like deep reinforcement learning, attention mechanisms, neural Turing machines, and generative adversarial networks are discussed in Chapters 11 and 12.
Alternative filename
nexusstc/Neural Networks and Deep Learning: A Textbook/8ea71fb5ba41797181b956d6e6fb3676.epub
Alternative filename
lgli/Neural Networks and Deep Learning 2Ed.epub
Alternative filename
lgrsnf/Neural Networks and Deep Learning 2Ed.epub
Alternative filename
zlib/Computers/Networking/Charu C. Aggarwal/Neural Networks and Deep Learning: A Textbook, 2nd Edition_25343913.epub
Alternative publisher
Springer International Publishing AG
Alternative publisher
Springer Nature Switzerland AG
Alternative edition
Springer Nature (Textbooks & Major Reference Works), [N.p.], 2023
Alternative edition
Second edition, Cham, Switzerland
Alternative edition
Switzerland
Alternative edition
2nd ed. 2023, Cham
Comments in metadata
{"edition":"2","isbns":["3031296419","3031296427","9783031296413","9783031296420"],"last_page":541,"publisher":"Springer","source":"crossref"}
Alternative description
1 An Introduction to Neural Networks
1.1 Introduction
1.2 Single Computational Layer: The Perceptron
1.2.1 Use of Bias
1.2.2 What Objective Function Is the Perceptron Optimizing?
1.3 The Base Components of Neural Architectures
1.3.1 Choice of Activation Function
1.3.2 Softmax Activation Function
1.3.3 Common Loss Functions
1.4 Multilayer Neural Networks
1.4.1 The Multilayer Network as a Computational Graph
1.5 The Importance of Nonlinearity
1.5.1 Nonlinear Activations in Action
1.6 Advanced Architectures and Structured Data
1.7 Two Notable Benchmarks
1.7.1 The MNIST Database of Handwritten Digits
1.7.2 The ImageNet Database
1.8 Summary
1.9 Bibliographic Notes and Software Resources
1.10 Exercises
2 The Backpropagation Algorithm
2.1 Introduction
2.2 The Computational Graph Abstraction
2.2.1 Computational Graphs Create Complex Functions
2.3 Backpropagation in Computational Graphs
2.3.1 Computing Node-to-Node Derivatives with the Chain Rule
2.3.2 Dynamic Programming for Computing Node-to-Node Derivatives
2.3.3 Converting Node-to-Node Derivatives into Loss-to-Weight Derivatives
2.4 Backpropagation in Neural Networks
2.4.1 Some Useful Derivatives of Activation Functions
2.4.2 Examples of Updates for Various Activations
2.5 The Vector-Centric View of Backpropagation
2.5.1 Derivatives with Respect to Vectors
2.5.2 Vector-Centric Chain Rule
2.5.3 A Decoupled View of Vector-Centric Backpropagation
2.5.4 Vector-Centric Backpropagation with Non-Layered Architectures
2.6 The Not-So-Unimportant Details
2.6.1 Mini-Batch Stochastic Gradient Descent
2.6.2 Learning Rate Decay
2.6.3 Checking the Correctness of Gradient Computation
2.6.4 Regularization
2.6.5 Loss Functions on Hidden Nodes
2.6.6 Backpropagation Tricks for Handling Shared Weights
2.7 Tuning and Preprocessing
2.7.1 Tuning Hyperparameters
2.7.2 Feature Preprocessing
2.7.3 Initialization
2.8 Backpropagation Is Interpretable
2.9 Summary
2.10 Bibliographic Notes and Software Resources
2.11 Exercises
3 Machine Learning with Shallow Neural Networks
3.1 Introduction
3.2 Neural Architectures for Binary Classification Models
3.2.1 Revisiting the Perceptron
3.2.2 Least-Squares Regression
3.2.2.1 Widrow-Hoff Learning
3.2.2.2 Closed Form Solutions
3.2.3 Support Vector Machines
3.2.4 Logistic Regression
3.2.5 Comparison of Different Models
3.3 Neural Architectures for Multiclass Models
3.3.1 Multiclass Perceptron
3.3.2 Weston-Watkins SVM
3.3.3 Multinomial Logistic Regression (Softmax Classifier)
3.4 Unsupervised Learning with Autoencoders
3.4.1 Linear Autoencoder with a Single Hidden Layer
3.4.1.1 Connections with Singular Value Decomposition
3.4.1.2 Sharing Weights in the Encoder and Decoder
3.4.2 Nonlinear Activation Functions and Depth
3.4.3 Application to Visualization
3.4.4 Application to Outlier Detection
3.4.5 Application to Multimodal Embeddings
3.4.6 Benefits of Autoencoders
3.5 Recommender Systems
3.6 Text Embedding with Word2vec
3.6.1 Neural Embedding with Continuous Bag of Words
3.6.2 Neural Embedding with Skip-Gram Model
3.6.3 Word2vec (SGNS) Is Logistic Matrix Factorization
3.7 Simple Neural Architectures for Graph Embeddings
3.7.1 Handling Arbitrary Edge Counts
3.7.2 Beyond One-Hop Structural Models
3.7.3 Multinomial Model
3.8 Summary
3.9 Bibliographic Notes and Software Resources
3.10 Exercises
4 Deep Learning: Principles and Training Algorithms
4.1 Introduction
4.2 Why Is Depth Beneficial?
4.2.1 Hierarchical Feature Engineering: How Depth Reveals Rich Structure
4.3 Why Is Training Deep Networks Hard?
4.3.1 Geometric Understanding of the Effect of Gradient Ratios
4.3.2 The Vanishing and Exploding Gradient Problems
4.3.3 Cliffs and Valleys
4.3.4 Convergence Problems with Depth
4.3.5 Local Minima
4.4 Depth-Friendly Neural Architectures
4.4.1 Activation Function Choice
4.4.2 Dying Neurons and “Brain Damage”
4.4.2.1 Leaky ReLU
4.4.2.2 Maxout Networks
4.4.3 Using Skip Connections
4.5 Depth-Friendly Gradient-Descent Strategies
4.5.1 Importance of Preprocessing and Initialization
4.5.2 Momentum-Based Learning
4.5.3 Nesterov Momentum
4.5.4 Parameter-Specific Learning Rates
4.5.4.1 AdaGrad
4.5.4.2 RMSProp
4.5.4.3 AdaDelta
4.5.5 Combining Parameter-Specific Learning and Momentum
4.5.5.1 RMSProp with Nesterov Momentum
4.5.5.2 Adam
4.5.6 Gradient Clipping
4.5.7 Polyak Averaging
4.6 Second-Order Derivatives: The Newton Method
4.6.1 Example: Newton Method in the Quadratic Bowl
4.6.2 Example: Newton Method in a Non-Quadratic Function
4.6.3 The Saddle-Point Problem with Second-Order Methods
4.7 Fast Approximations of Newton Method
4.7.1 Conjugate Gradient Method
4.7.2 Quasi-Newton Methods and BFGS
4.8 Batch Normalization
4.9 Practical Tricks for Acceleration and Compression
4.9.1 GPU Acceleration
4.9.2 Parallel and Distributed Implementations
4.9.3 Algorithmic Tricks for Model Compression
4.10 Summary
4.11 Bibliographic Notes and Software Resources
4.12 Exercises
5 Teaching Deep Learners to Generalize
5.1 Introduction
5.1.1 Example: Linear Regression
5.1.2 Example: Polynomial Regression
5.2 The Bias-Variance Trade-Off
5.3 Generalization Issues in Model Tuning and Evaluation
5.3.1 Evaluating with Hold-Out and Cross-Validation
5.3.2 Issues with Training at Scale
5.3.3 How to Detect Need to Collect More Data
5.4 Penalty-Based Regularization
5.4.1 Connections with Noise Injection
5.4.2 L1-Regularization
5.4.3 L1- or L2-Regularization?
5.4.4 Penalizing Hidden Units: Learning Sparse Representations
5.5 Ensemble Methods
5.5.1 Bagging and Subsampling
5.5.2 Parametric Model Selection and Averaging
5.5.3 Randomized Connection Dropping
5.5.4 Dropout
5.5.5 Data Perturbation Ensembles
5.6 Early Stopping
5.6.1 Understanding Early Stopping from the Variance Perspective
5.7 Unsupervised Pretraining
5.7.1 Variations of Unsupervised Pretraining
5.7.2 What About Supervised Pretraining?
5.8 Continuation and Curriculum Learning
5.9 Parameter Sharing
5.10 Regularization in Unsupervised Applications
5.10.1 When the Hidden Layer Is Broader than the Input Layer
5.10.1.1 Sparse Feature Learning
5.10.2 Noise Injection: De-noising Autoencoders
5.10.3 Gradient-Based Penalization: Contractive Autoencoders
5.10.4 Hidden Probabilistic Structure: Variational Autoencoders
5.10.4.1 Reconstruction and Generative Sampling
5.10.4.2 Conditional Variational Autoencoders
5.10.4.3 Relationship with Generative Adversarial Networks
5.11 Summary
5.12 Bibliographic Notes and Software Resources
5.13 Exercises
6 Radial Basis Function Networks
6.1 Introduction
6.2 Training an RBF Network
6.2.1 Training the Hidden Layer
6.2.2 Training the Output Layer
6.2.3 Iterative Construction of Hidden Layer
6.2.4 Fully Supervised Learning of Hidden Layer
6.3 Variations and Special Cases of RBF Networks
6.3.1 Classification with Perceptron Criterion
6.3.2 Classification with Hinge Loss
6.3.3 Example of Linear Separability Promoted by RBF
6.3.4 Application to Interpolation
6.4 Relationship with Kernel Methods
6.4.1 Kernel Regression Is a Special Case of RBF Networks
6.4.2 Kernel SVM Is a Special Case of RBF Networks
6.5 Summary
6.6 Bibliographic Notes and Software Resources
6.7 Exercises
7 Restricted Boltzmann Machines
7.1 Introduction
7.2 Hopfield Networks
7.2.1 Training a Hopfield Network
7.2.2 Building a Toy Recommender and Its Limitations
7.2.3 Increasing the Expressive Power of the Hopfield Network
7.3 The Boltzmann Machine
7.3.1 How a Boltzmann Machine Generates Data
7.3.2 Learning the Weights of a Boltzmann Machine
7.4 Restricted Boltzmann Machines
7.4.1 Training the RBM
7.4.2 Contrastive Divergence Algorithm
7.5 Applications of Restricted Boltzmann Machines
7.5.1 Dimensionality Reduction and Data Reconstruction
7.5.2 RBMs for Collaborative Filtering
7.5.3 Using RBMs for Classification
7.5.4 Topic Models with RBMs
7.5.5 RBMs for Machine Learning with Multimodal Data
7.6 Using RBMs beyond Binary Data Types
7.7 Stacking Restricted Boltzmann Machines
7.7.1 Unsupervised Learning
7.7.2 Supervised Learning
7.7.3 Deep Boltzmann Machines and Deep Belief Networks
7.8 Summary
7.9 Bibliographic Notes and Software Resources
7.10 Exercises
8 Recurrent Neural Networks
8.1 Introduction
8.2 The Architecture of Recurrent Neural Networks
8.2.1 Language Modeling Example of RNN
8.2.2 Backpropagation Through Time
8.2.3 Bidirectional Recurrent Networks
8.2.4 Multilayer Recurrent Networks
8.3 The Challenges of Training Recurrent Networks
8.3.1 Layer Normalization
8.4 Echo-State Networks
8.5 Long Short-Term Memory (LSTM)
8.6 Gated Recurrent Units (GRUs)
8.7 Applications of Recurrent Neural Networks
8.7.1 Contextualized Word Embeddings with ELMo
8.7.2 Application to Automatic Image Captioning
8.7.3 Sequence-to-Sequence Learning and Machine Translation
8.7.4 Application to Sentence-Level Classification
8.7.5 Token-Level Classification with Linguistic Features
8.7.6 Time-Series Forecasting and Prediction
8.7.7 Temporal Recommender Systems
8.7.8 Secondary Protein Structure Prediction
8.7.9 End-to-End Speech Recognition
8.7.10 Handwriting Recognition
8.8 Summary
8.9 Bibliographic Notes and Software Resources
8.10 Exercises
9 Convolutional Neural Networks
9.1 Introduction
9.1.1 Historical Perspective and Biological Inspiration
9.1.2 Broader Observations about Convolutional Neural Networks
9.2 The Basic Structure of a Convolutional Network
9.2.1 Padding
9.2.2 Strides
9.2.3 The ReLU Layer
9.2.4 Pooling
9.2.5 Fully Connected Layers
9.2.6 The Interleaving between Layers
9.2.7 Hierarchical Feature Engineering
9.3 Training a Convolutional Network
9.3.1 Backpropagating Through Convolutions
9.3.2 Backpropagation as Convolution with Inverted/Transposed Filter
9.3.3 Convolution/Backpropagation as Matrix Multiplications
9.3.4 Data Augmentation
9.4 Case Studies of Convolutional Architectures
9.4.1 AlexNet
9.4.2 ZFNet
9.4.3 VGG
9.4.4 GoogLeNet
9.4.5 ResNet
9.4.6 Squeeze-and-Excitation Networks (SENets)
9.4.7 The Effects of Depth
9.4.8 Pretrained Models
9.5 Visualization and Unsupervised Learning
9.5.1 Visualizing the Features of a Trained Network
9.5.2 Convolutional Autoencoders
9.6 Applications of Convolutional Networks
9.6.1 Content-Based Image Retrieval
9.6.2 Object Localization
9.6.3 Object Detection
9.6.4 Natural Language and Sequence Learning with TextCNN
9.6.5 Video Classification
9.7 Summary
9.8 Bibliographic Notes and Software Resources
9.9 Exercises
10 Graph Neural Networks
10.1 Introduction
10.2 Node Embeddings with Conventional Architectures
10.2.1 Adjacency Matrix Representation and Feature Engineering
10.3 Graph Neural Networks: The General Framework
10.3.1 The Neighborhood Function
10.3.2 Graph Convolution Function
10.3.3 GraphSAGE
10.3.4 Handling Edge Weights
10.3.5 Handling New Vertices
10.3.6 Handling Relational Networks
10.3.7 Directed Graphs
10.3.8 Gated Graph Neural Networks
10.3.9 Comparison with Image Convolutional Networks
10.4 Backpropagation in Graph Neural Networks
10.5 Beyond Nodes: Generating Graph-Level Models
10.6 Applications of Graph Neural Networks
10.7 Summary
10.8 Bibliographic Notes and Software Resources
10.9 Exercises
11 Deep Reinforcement Learning
11.1 Introduction
11.2 Stateless Algorithms: Multi-Armed Bandits
11.3 The Basic Framework of Reinforcement Learning
11.4 Monte Carlo Sampling
11.4.1 Monte Carlo Sampling Algorithm
11.4.2 Monte Carlo Rollouts with Function Approximators
11.5 Bootstrapping for Value Function Learning
11.5.1 Q-Learning
11.5.2 Deep Learning Models as Function Approximators
11.5.3 Example: Neural Network Specifics for Video Game Setting
11.5.4 On-Policy versus Off-Policy Methods: SARSA
11.5.5 Modeling States versus State-Action Pairs
11.6 Policy Gradient Methods
11.6.1 Finite Difference Methods
11.6.2 Likelihood Ratio Methods
11.6.3 Actor-Critic Methods
11.6.4 Continuous Action Spaces
11.7 Monte Carlo Tree Search
11.8 Case Studies
11.8.1 AlphaGo and AlphaZero for Go and Chess
11.8.2 Self-Learning Robots
11.8.2.1 Deep Learning of Locomotion Skills
11.8.2.2 Deep Learning of Visuomotor Skills
11.8.3 Building Conversational Systems: Deep Learning for Chatbots
11.8.4 Self-Driving Cars
11.8.5 Neural Architecture Search with Reinforcement Learning
11.9 Practical Challenges Associated with Safety
11.10 Summary
11.11 Bibliographic Notes and Software Resources
11.12 Exercises
12 Advanced Topics in Deep Learning
12.1 Introduction
12.2 Attention Mechanisms
12.2.1 Recurrent Models of Visual Attention
12.2.2 Attention Mechanisms for Image Captioning
12.2.3 Soft Image Attention with Spatial Transformer
12.2.4 Attention Mechanisms for Machine Translation
12.2.5 Transformer Networks
12.2.5.1 How Self-Attention Helps
12.2.5.2 The Self-Attention Module
12.2.5.3 Incorporating Positional Information
12.2.5.4 The Sequence-to-Sequence Transformer
12.2.5.5 Multihead Attention
12.2.6 Transformer-Based Pre-trained Language Models
12.2.6.1 GPT-n
12.2.6.2 BERT
12.2.6.3 T5
12.2.7 Vision Transformer (ViT)
12.2.8 Attention Mechanisms in Graphs
12.3 Neural Turing Machines
12.4 Adversarial Deep Learning
12.5 Generative Adversarial Networks (GANs)
12.5.1 Training a Generative Adversarial Network
12.5.2 Comparison with Variational Autoencoder
12.5.3 Using GANs for Generating Image Data
12.5.4 Conditional Generative Adversarial Networks
12.6 Competitive Learning
12.6.1 Vector Quantization
12.6.2 Kohonen Self-Organizing Map
12.7 Limitations of Neural Networks
12.7.1 An Aspirational Goal: Few-Shot Learning
12.7.2 An Aspirational Goal: Energy-Efficient Learning
12.8 Summary
12.9 Bibliographic Notes and Software Resources
12.10 Exercises
Bibliography
Index
Alternative description
This book covers both classical and modern models in deep learning. The primary focus is on the theory and algorithms of deep learning, which are essential for understanding the design of neural architectures in different applications. Why do neural networks work? When do they work better than off-the-shelf machine-learning models? When is depth useful? Why is training neural networks so hard? What are the pitfalls? The book also discusses a wide range of applications in order to give the practitioner a flavor of how neural architectures are designed for different types of problems. Deep learning methods for various data domains, such as text, images, and graphs, are presented in detail. The chapters of this book span three categories:
1. The basics of neural networks: The backpropagation algorithm is discussed in Chapter 2. Many traditional machine learning models can be understood as special cases of neural networks. Chapter 3 explores the connections between traditional machine learning and neural networks. Support vector machines, linear/logistic regression, singular value decomposition, matrix factorization, and recommender systems are shown to be special cases of neural networks.
2. Fundamentals of neural networks: A detailed discussion of training and regularization is provided in Chapters 4 and 5. Chapters 6 and 7 present radial-basis function (RBF) networks and restricted Boltzmann machines.
3. Advanced topics in neural networks: Chapters 8, 9, and 10 discuss recurrent neural networks, convolutional neural networks, and graph neural networks. Several advanced topics like deep reinforcement learning, attention mechanisms, transformer networks, Kohonen self-organizing maps, and generative adversarial networks are introduced in Chapters 11 and 12.
The textbook is written for graduate students and upper-level undergraduate students; researchers and practitioners working in related fields will also find it useful. Where possible, an application-centric view is highlighted in order to provide an understanding of the practical uses of each class of techniques. The second edition is substantially reorganized and expanded, with separate chapters on backpropagation and graph neural networks. Many chapters have been significantly revised over the first edition, and greater focus is placed on modern deep learning ideas such as attention mechanisms, transformers, and pre-trained language models.
Open source date
2023-06-30