Neural networks

One lesson I learned during graduate school was the importance of understanding things from first principles. So for learning about neural networks, I really enjoyed reading Grokking Deep Learning, which implements neural networks using only numpy, keeping the reader from getting bogged down in implementation details. Another excellent introduction is this video from Grant Sanderson*, which also goes into the mathematical details. If you prefer watching videos to reading, Josh Starmer’s StatQuest Neural Networks playlist is also excellent. After that, Michael Nielsen’s Neural Networks and Deep Learning is a good follow-up.

General concepts

Like other machine learning systems, neural networks aim to return a set of outputs given some inputs: classes for an image classification task, phrases in the target language for a translation task, and so on. What makes neural networks unique is the scale of the inputs and outputs and thus the difficulty of training: modern Large Language Models (LLMs) have billions to hundreds of billions of parameters. As a result, progress in Artificial Intelligence (now almost always synonymous with large neural networks) has often involved making the training process more efficient.
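
To make the input-to-output idea concrete, here is a minimal sketch in plain numpy (the layer sizes, names, and toy input are my own choices, not from any particular library): a tiny two-layer network that turns a feature vector into class probabilities.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy shapes: 4 input features -> 8 hidden units -> 3 class scores.
W1 = rng.normal(scale=0.1, size=(4, 8))
b1 = np.zeros(8)
W2 = rng.normal(scale=0.1, size=(8, 3))
b2 = np.zeros(3)

def forward(x):
    """One forward pass: linear map, ReLU nonlinearity, linear map, softmax."""
    h = np.maximum(0.0, x @ W1 + b1)       # hidden activations
    logits = h @ W2 + b2                   # unnormalized class scores
    exps = np.exp(logits - logits.max())   # numerically stable softmax
    return exps / exps.sum()

x = rng.normal(size=4)                     # a made-up input
print(forward(x))                          # probabilities over 3 classes
```

Training amounts to nudging W1, b1, W2 and b2 so that these outputs match known labels, which is where backpropagation (discussed below) comes in.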

Accessibility

Since they are very large, commercial-grade LLMs are generally cloud-hosted. Another consideration is licensing: since they require enormous amounts of time, infrastructure, technical expertise and electrical energy to train and run, commercial LLMs are almost always closed-source. However, smaller open-source models (though still on the order of billions of parameters) can be obtained and run locally, for example through Ollama.
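
As a rough illustration of what running a model locally looks like (this assumes an Ollama server is already installed and listening on its default port, and the model name is just an example), you can query its local HTTP API from Python:

```python
import json
import urllib.request

# Ask a locally hosted model a question via Ollama's /api/generate endpoint.
# The model name and prompt are placeholders; any locally pulled model works.
payload = json.dumps({
    "model": "llama3",
    "prompt": "In one sentence, what is a neural network?",
    "stream": False,
}).encode()

request = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(request) as response:
    print(json.loads(response.read())["response"])
```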

A bit of history

The idea of artificial neural networks has been around since at least the 1940s in the form of Hebbian theory, though modern artificial neural networks bear only a vague conceptual resemblance to their biological analogues. Neural networks gained a lot of attention after the famous 1986 backpropagation paper co-authored by Geoffrey Hinton, which showed how the gradient of the error with respect to every weight in a network can be computed efficiently, making it practical to learn many parameters at once.

The availability of big datasets and the development of general-purpose GPU computing (GPGPU) accelerated the use of neural networks in many areas, such as convolutional neural networks (CNNs) in image processing. Another big advance came in 2017 with the Attention Is All You Need paper, which described the Transformer architecture and enabled the development of GPTs.

Network tuning

One of the interesting and frustrating problems in modeling complex data is overfitting. The solution often involves a combination of picking the right tools and knowing how to interpret their output. In a neural network context, once you have picked an appropriate architecture and the right activation functions, you may still need to implement regularization such as dropout, as in the sketch below.
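
Here is a sketch of inverted dropout in plain numpy (the drop probability and activations are made up, and real frameworks provide this as a built-in layer): during training, each hidden unit is randomly silenced, and the survivors are rescaled so the expected activation is unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(h, p_drop=0.5, train=True):
    """Randomly zero out activations during training; pass through at inference."""
    if not train or p_drop == 0.0:
        return h
    mask = rng.random(h.shape) > p_drop   # keep each unit with probability 1 - p_drop
    return h * mask / (1.0 - p_drop)      # rescale so the expected value is unchanged

h = rng.normal(size=(2, 5))               # made-up hidden-layer activations
print(dropout(h))                         # roughly half the units zeroed out
```

Because no single unit can be relied on all the time, the network is pushed toward more redundant, general representations, which helps with overfitting.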

Neural Smithing covers some network tuning issues.

*Vlogging on YouTube as 3Blue1Brown, Sanderson has an excellent series where he explains math concepts visually.

Written on January 20, 2020