The current curriculum covers topics from basic NLP techniques to the most modern ones that may be helpful for custom training of LLMs:
- NLP Basics: tokenization, text preprocessing, text representations (a toy tokenizer sketch follows this list)
- Text & Language Models: embeddings, n-gram models, RNNs, LSTMs, seq2seq, attention
- Transformers & LLMs: Transformer, pre-training (MLM/CLM), prompting, fine-tuning, PEFT
- Scaling & Optimization: distributed training, MoE, KV-cache, Flash Attention, efficient inference, quantization
- Retrieval & Agents: Information Retrieval, RAG, agent-based systems
- Post-training: alignment, RLHF, DPO
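To give a flavor of the very first topic in the list, here is a minimal word-level tokenizer sketch. It is purely illustrative; the course itself goes well beyond this toward subword and model-specific tokenization.

```python
import re

def simple_tokenize(text: str) -> list[str]:
    """Toy word-level tokenizer: lowercase the text, then split it
    into word tokens and individual punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text.lower())

print(simple_tokenize("Tokenization splits text into units!"))
# ['tokenization', 'splits', 'text', 'into', 'units', '!']
```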
- German Gritsai @grgera
- Anastasiia Vozniuk @natriistorm
- Ildar Khabutdinov @depinwhite
| Week # | Date | Topic | Lecture | Seminar | Additional | Recording |
|---|---|---|---|---|---|---|
| 1 | February 10 | Intro to NLP & Tokenization | slides | ipynb | materials | TBA |
The rest of the schedule is TBA.
Final mark = 0.3 × (oral answer grade) + 0.7 × (average score for practical assignments)
Both the oral exam and the homework assignments are blocking parts: you need to pass each of them to pass the course.
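As a quick illustration, here is a minimal sketch of how the final mark is computed. The 10-point scale and the `pass_threshold=4.0` blocking cutoff are assumptions made for the example, not official course rules.

```python
def final_mark(oral_grade: float, assignment_scores: list[float],
               pass_threshold: float = 4.0) -> float | None:
    """Illustrative sketch of the grading formula.

    Both parts are blocking: failing either one fails the course.
    The 10-point scale and pass_threshold are assumptions for
    illustration only.
    """
    avg_hw = sum(assignment_scores) / len(assignment_scores)
    if oral_grade < pass_threshold or avg_hw < pass_threshold:
        return None  # a blocking part was failed
    return 0.3 * oral_grade + 0.7 * avg_hw

# Oral = 8, homework average = (7 + 9 + 8) / 3 = 8
# Final = 0.3 * 8 + 0.7 * 8 = 8.0
print(final_mark(8.0, [7.0, 9.0, 8.0]))
```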
- Probability Theory + Statistics
- Machine Learning
- Python (see the Python guide)
- Basic knowledge of NLP
We expect students to know the basics of Natural Language Processing, as the course focuses on more advanced topics. If you are unsure about the basics, we recommend reading these lectures/materials:
