Advanced Text-As-Data - Winter School - IESP-UERJ


Days 3 and 4: Transformers, Pre-Trained Models, and LLMs

Professor: Sebastián Vallejo


What to Expect?

Over the past two days you learned some heavy matrix-algebra, dot-product, sigmoid-function, mostly-black-magic-believe-in-me content. Now we enter the paradox of complexity: the material we cover today is so complex that it is best understood through intuition and experience.

The material will be divided into three parts:

  • Transformers: a lecture-heavy introduction to the Transformers architecture (a short attention sketch in NumPy follows this list).
  • Pre-trained models: applying the Transformers architecture to pre-trained models and implementing them (code).
  • Large Language Models (LLMs): what LLMs are, their applications and limitations, and how to implement them (code).
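
Since the lectures lean on intuition, a minimal sketch may help fix ideas: the code below implements scaled dot-product attention, the core operation of the Transformer, in plain NumPy. It is an illustration only, not code from the course notebooks.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V for a single attention head."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                           # query-key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                                        # weighted average of value vectors

# Toy self-attention over 3 tokens with 4-dimensional embeddings
# (random numbers, purely for illustration): Q = K = V = X.
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))
print(scaled_dot_product_attention(X, X, X).shape)  # (3, 4): one contextual vector per token
```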

Material

Here is where you can find all the lectures and code we will be using during the next two days:

Day 3

  1. Slides for “Introduction to Transformers”: link.
  2. Slides for “Pre-Trained Transformers-Based Models”: link.
  3. Code for (a short preview sketch follows this list):
    1. Using Fine-Tuned Models: link.
    2. Fine-Tuning a Model: link.
    3. Further Pre-Training a Model: link.
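
As a rough preview of what the “Using Fine-Tuned Models” notebook covers, the sketch below loads an off-the-shelf fine-tuned classifier, assuming the Hugging Face transformers library; the model name is only an example and may differ from the one used in class.

```python
from transformers import pipeline

# Load a fine-tuned sentiment classifier from the Hugging Face Hub.
# The checkpoint name is illustrative; the course notebooks may use another one.
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

texts = [
    "The committee praised the new transparency bill.",
    "The debate devolved into personal attacks.",
]
for text, pred in zip(texts, classifier(texts)):
    print(f"{pred['label']:>8}  {pred['score']:.3f}  {text}")
```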

Day 4

  1. Slides for “A Primer on Large Language Models”: link.
  2. Code for (a short preview sketch follows this list):
    1. Using LLMs: link.
    2. LLMs as Annotators: link.
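
As a preview of the LLM notebooks, here is a minimal sketch of using an LLM as an annotator, assuming the OpenAI Python client; the provider, model name, prompt, and label set are illustrative choices, not necessarily those used in the course.

```python
from openai import OpenAI  # assumes the OpenAI client; the course may use a different provider

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

def annotate(text: str) -> str:
    """Ask the model for a single label; the prompt and label set are illustrative only."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # example model name, not necessarily the one used in class
        messages=[
            {"role": "system",
             "content": "Classify the tone of the text as POSITIVE, NEGATIVE, or NEUTRAL. "
                        "Answer with the label only."},
            {"role": "user", "content": text},
        ],
        temperature=0,  # near-deterministic output helps annotation consistency
    )
    return response.choices[0].message.content.strip()

print(annotate("The new budget is a disaster for working families."))
```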