Advanced Text-As-Data - Winter School - IESP-UERJ
Days 3 and 4: Transformers, Pre-Trained Models, and LLMs
What to Expect?
Over the last two days you worked through some heavy matrix-algebra, dot-product, sigmoid-function, mostly-black-magic-believe-me content. Now we enter a paradox of complexity: the material we cover today is so complex that it is best approached through intuition and hands-on experience.
The material will be divided into three parts:
- Transformers: a lecture-heavy introduction to the Transformer architecture.
- Pre-trained models: how the Transformer architecture is used in pre-trained models, and how to apply them in practice (code).
- Large Language Models (LLMs): their applications and limitations, and how to implement them in practice (code; a minimal sketch of both coding workflows follows this list).
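To make the two "(code)" items concrete, here is a minimal sketch of the kind of workflow those parts involve. It assumes the Hugging Face `transformers` library and the specific model checkpoints named below, which are illustrative choices rather than anything confirmed by this page:

```python
# A minimal sketch, assuming the Hugging Face `transformers` library.
from transformers import pipeline

# Part 2 flavor: apply a pre-trained Transformer-based classifier to text.
# The checkpoint below is a real, publicly available model, chosen purely
# for illustration.
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)
print(classifier("Transformers are best understood through practice."))
# -> [{'label': 'POSITIVE', 'score': 0.99...}]

# Part 3 flavor: generate text with a (small) pre-trained language model.
generator = pipeline("text-generation", model="gpt2")
print(generator("Text-as-data methods let researchers", max_new_tokens=20))
```

The `pipeline` helper wraps tokenization, model inference, and decoding in a single call, which is what makes it a convenient entry point before unpacking those steps one by one.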
Material
Here is where you can find all the lectures and code we will be using during the next two days:
- Slides for “Introduction to Transformers”: link.
- Slides for “Pre-Trained Transformers-Based Models”: link.
- Code for: