Quickstart on LLMs

  1. Quickstart on LLMs
    1. Understand the lay of the land
    2. Learn how to use LLMs
    3. Learn how GPUs execute code

Updated on May, 25

This simple index is meant to help you get started with LLMs. The area is evolving continuously, so parts of this index may become outdated soon. The only real prerequisite is a basic understanding of neural networks.

Understand the lay of the land

The first goal should be to understand what LLMs are, why they follow certain structures (decoder-only transformer networks), and how they can be used.

  1. Research papers
    1. Attention is all you need
    2. BERT
    3. Language Models are Few-Shot Learners
    4. Emergent Abilities of Large Language Models
    5. Improving language understanding with unsupervised learning
  2. State of GPT by Andrej Karpathy
  3. Don’t teach. Incentivize
  4. Stanford CS229 Building Large Language Models
  5. The illustrated Transformer
  6. The illustrated GPT-2
  7. How GPT3 works
  8. Building LLM applications for production by Chip Huyen
  9. RLHF: Reinforcement Learning from Human Feedback by Chip Huyen
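
To make "decoder-only" a little more concrete before diving into the material above: the defining trait is causal attention, where each token can only look at itself and the tokens before it. The snippet below is a minimal, pure-Python sketch of single-head causal self-attention. The function name and the toy inputs are my own illustration and are not taken from any of the linked resources; real implementations are batched, multi-head, and run on tensors.

```python
import math


def causal_self_attention(q, k, v):
    """Single-head scaled dot-product attention with a causal mask.

    q, k, v are lists of float vectors, one vector per token position.
    Position i attends only to positions 0..i, which is what lets a
    decoder-only model generate text one token at a time.
    """
    d = len(q[0])
    outputs = []
    for i in range(len(q)):
        # Scores against the current and earlier positions only (the causal mask).
        scores = [
            sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d)
            for j in range(i + 1)
        ]
        # Numerically stable softmax over the visible positions.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of the visible value vectors.
        outputs.append([
            sum(w * v[j][c] for j, w in enumerate(weights))
            for c in range(len(v[0]))
        ])
    return outputs


# Toy example: three token positions with 2-dimensional vectors (made-up values).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = causal_self_attention(x, x, x)
```

Because of the mask, the first position can only attend to itself, and changing a later token never changes the outputs at earlier positions. This is also why the keys and values of earlier tokens can be cached during generation.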

Learn how to use LLMs

The second goal should be to understand how to use LLMs, both locally run and cloud-hosted ones, and what it actually takes to run one.

  1. A Hackers’ Guide to Language Models by Jeremy Howard
  2. Optimizing your LLM in production
  3. KV Cache brief explainer
  4. Understanding Llama2: KV Cache, Grouped Query Attention, Rotary Embedding and More
  5. LoRA explained (and a bit about precision and quantization)
  6. Paged Attention
  7. How continuous batching enables 23x throughput in LLM inference while reducing p50 latency
  8. How is LLaMa.cpp possible?
  9. Transformer Inference Arithmetic
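
As a rough feel for "what does it take to run an LLM", here is a back-of-the-envelope memory estimate in the spirit of the resources above. It is only a sketch: the helper names are hypothetical, and the model shape (about 7B parameters, 32 layers, 32 KV heads, head dimension 128, 16-bit values) is an assumed example configuration. It ignores activations, framework overhead, and memory fragmentation.

```python
GIB = 1024 ** 3


def weight_bytes(n_params, bytes_per_param):
    """Memory needed just to hold the model weights."""
    return n_params * bytes_per_param


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch_size, bytes_per_value):
    """Memory for cached keys and values across all layers.

    The leading factor of 2 accounts for storing both a key and a value
    vector per token, per layer, per KV head.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch_size * bytes_per_value


# Assumed example configuration (illustrative, not tied to a specific checkpoint).
weights = weight_bytes(n_params=7_000_000_000, bytes_per_param=2)  # 16-bit weights
cache = kv_cache_bytes(
    n_layers=32,
    n_kv_heads=32,
    head_dim=128,
    seq_len=4096,
    batch_size=1,
    bytes_per_value=2,  # 16-bit keys and values
)

print(f"weights:  {weights / GIB:.1f} GiB")
print(f"KV cache: {cache / GIB:.1f} GiB")
```

Under these assumptions the KV cache for a single 4096-token sequence comes to 2 GiB on top of the weights. The formula also shows which knobs the listed resources turn: quantization lowers the bytes per parameter or per value, and grouped query attention lowers the number of KV heads.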

Learn how GPUs execute code

  1. Stanford CS336 Lecture on GPUs
    1. This video is one of the best primers I have seen for understanding GPUs

From here, it depends on which area you are specifically interested in. Broadly (in my view), there are three tracks, and each requires a significant time investment to build mastery.

  1. Inference optimization, training workload optimizations
  2. Pre-training and post-training (can be for different modalities)
  3. RL