
Lab 6: Language Models

Download the files for Lab 6 from the following links:

We recommend that you use Google Colab, as training will be faster on the GPU.

To enable the GPU on Colab, go to Edit / Notebook settings / Hardware accelerator and select T4 GPU.
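Once the runtime is set, you can confirm that the notebook actually sees the GPU. A minimal check, assuming you use PyTorch as in the previous labs:

    import torch

    # Should print True once the T4 runtime is active.
    print(torch.cuda.is_available())

    # Select the GPU when available, falling back to the CPU otherwise.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    print(device)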


Instructions on how to download and use Jupyter Notebooks can be found here. You can find a static version of the notebook below.



  1. Consider the following sentence, chosen in honor of the best Miranda:

    If by your art, my dearest father, you have put the wild waters in this roar, allay them.

    We can, as the figure “Language is a time series” illustrates, parse this sentence as a time series. In this time series the first vector is $x_0 = $ “If”, the second vector is $x_1 = $ “by”, the third is $x_2 = $ “your”, and so on.
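    As a minimal illustration of this view (a plain whitespace split, not the tokenizer we use later in the lab), we can index the words of the sentence by time:

        sentence = ("If by your art, my dearest father, you have "
                    "put the wild waters in this roar, allay them.")
        tokens = sentence.split()  # crude whitespace tokenization
        for t, word in enumerate(tokens[:3]):
            print(f"x_{t} = {word!r}")  # x_0 = 'If', x_1 = 'by', x_2 = 'your'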

    If we interpret language as a time series, we can use a transformer to predict the next word in a sequence, as we did in Chapter 5. If we then execute this predictor recursively, we can predict several words in a row. This is a strategy for generating language, sketched in code below.
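    The following sketch shows what this recursive (greedy) generation loop looks like in code. The model here is an untrained placeholder with made-up dimensions, and we omit the causal mask and the training loop; the point is only the feedback loop in which each predicted word is appended to the input and fed back in:

        import torch
        import torch.nn as nn

        # Placeholder next-word predictor: embedding, transformer encoder,
        # and a linear readout over a (made-up) vocabulary.
        vocab_size, embed_dim = 100, 32
        embed = nn.Embedding(vocab_size, embed_dim)
        layer = nn.TransformerEncoderLayer(embed_dim, nhead=4, batch_first=True)
        encoder = nn.TransformerEncoder(layer, num_layers=2)
        readout = nn.Linear(embed_dim, vocab_size)

        def generate(prompt_ids, num_words):
            # Greedy recursion: predict one word, append it, and feed the
            # extended sequence back into the predictor.
            ids = prompt_ids
            for _ in range(num_words):
                z = encoder(embed(ids))        # (1, T, embed_dim)
                logits = readout(z[:, -1, :])  # scores for the next word
                next_id = logits.argmax(dim=-1, keepdim=True)
                ids = torch.cat([ids, next_id], dim=1)
            return ids

        prompt = torch.tensor([[1, 5, 7]])  # three arbitrary word indices
        print(generate(prompt, num_words=5))

    With trained weights in place of the placeholder, the same loop produces text once the predicted indices are mapped back to words.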

    The first challenge in implementing this strategy is representing words numerically. We do that with word embeddings, as we discuss in the following section.