LLM Workshop - Introduction

Overview

  • History of AI
  • Compute and AI
  • Ethics and concerns
  • Introducing the workshop hardware

History of AI

  • How has AI developed over time?

AI as a term is coined

We propose that a 2-month, 10-man study of artificial intelligence be carried out during the summer of 1956 at Dartmouth College in Hanover, New Hampshire. [...] An attempt will be made to find how to make machines use language, form abstractions and concepts, solve kinds of problems now reserved for humans, and improve themselves. We think that a significant advance can be made in one or more of these problems if a carefully selected group of scientists work on it together for a summer.

Early NLP – 1956-1966

Rule based NLP pipeline

  • Rule based, lots of manual effort
  • Lots of LISP
  • Used for:
    • Information retrieval
    • Basic chat-bots e.g. Eliza
    • Limited translation systems

AI Winter I – 1974-1980

  • NLP Winter started even earlier

Wintery landscape

Statistical NLP – 1980s

Statistical NLP pipeline

  • Using statistics of the corpus
  • Bag-of-words, N-grams
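
The two representations above can be sketched with the standard library alone; the toy corpus is made up for illustration:

```python
# Minimal sketch of bag-of-words and bigram (N-gram, N=2) counts
# over a toy corpus; real pipelines add tokenization and smoothing.
from collections import Counter

corpus = "the cat sat on the mat the cat slept"
tokens = corpus.split()

# Bag-of-words: word frequencies, word order is discarded
bag_of_words = Counter(tokens)

# Bigrams: counts of adjacent word pairs, capturing some local order
bigrams = Counter(zip(tokens, tokens[1:]))

print(bag_of_words["the"])      # 3
print(bigrams[("the", "cat")])  # 2
```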

AI Winter II – 1990s, early 2000s

  • The term "AI" is no longer a buzzword
  • Research continues under other names

Deep Learning – 2012-

Embedding words for Deep Learning

Deep Learning – Transformer architecture

No recurrent connections, thus more parallelizable.

Transformer architecture

Deep Learning – Attention mechanism

  • Scaled Dot-Product Attention $$ \mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^\top}{\sqrt{d_k}}\right)V $$
  • Cross-attention: \(Q = X_\mathrm{dec} W_Q\), \(K = X_\mathrm{enc} W_K\) and \(V = X_\mathrm{enc} W_V\)
  • Self-attention: Same \(X\) used for all matrices
  • In the decoder, self-attention masks future tokens
  • Autoregressive unimodal LLMs are usually decoder-only
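
The scaled dot-product attention above, including the decoder's causal masking, can be sketched in NumPy; the shapes and random inputs are illustrative assumptions:

```python
# Minimal sketch of scaled dot-product attention with an optional
# causal mask, as used in decoder self-attention.
import numpy as np

def attention(Q, K, V, causal=False):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # (n_q, n_k) similarity scores
    if causal:
        # Mask future positions: token i may only attend to tokens <= i
        mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
        scores = np.where(mask, -np.inf, scores)
    # Softmax over the key dimension (shifted for numerical stability)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))            # 4 tokens, model dimension 8
out = attention(X, X, X, causal=True)  # self-attention: same X for Q, K, V
print(out.shape)                       # (4, 8)
```

With the causal mask, the first token can only attend to itself, so its output row equals its own value row.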

Compute and AI

The biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin.

Compute use over time

Compute and performance

What was new with ChatGPT?

  • Base models are pure language models
  • Chat models are:
    • Instruct tuned (supervised)
    • Reinforcement Learning from Human Feedback

RLHF

RLHF diagram

  • Enables RL when no clear scoring function is available
  • Relatively little human input needed

Further scaling

Three scaling laws: pre-training, post-training and test-time

Ethics and issues

  • Societal concerns
  • Misuse concerns
  • Misalignment concerns

Societal concerns

  • Perpetuated bias
  • Confident falsehood and sycophancy
  • Copyright & IP issues
  • Distribution of wealth and the job market
  • Climate footprint

Misuse concerns

Misalignment

  • RLHF is only a step in the right direction
  • Goodhart's law
  • Misaligned leadership
    • What future are they aiming for?
  • Superintelligence

If Anyone Builds It, Everyone Dies, book cover

Introduction to the hardware for this workshop

Compute clusters

Generic compute cluster diagram

The compute node

Generic compute node with GPUs

  • Speed-up by parallelization
  • Feeding data to GPU memory (VRAM) is often the bottleneck

Software

  • Default software environment intentionally sparse
  • Use modules or containers to run software
  • (Follow our recommendations when installing Python packages)
  • We will use containers in this course

Software – containers

  • Apptainer containers
  • A single file for your software and all dependencies
  • (Building containers)
  • Running software in a container
    apptainer exec your_container.sif python your_code.py
    
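A typical build-and-run round trip might look as follows; the image and definition file names are placeholders:

```
# Build a container image from a definition file (names are placeholders)
apptainer build your_container.sif your_definition.def

# Run your code inside the resulting container
apptainer exec your_container.sif python your_code.py
```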

SLURM

  • Batch queueing system
  • Allocates resources in a fair and effective manner
  • Resources are finite so expect queue times occasionally

SLURM workflow – Preparing job

Work on login node

SLURM workflow – Submitting job to queue

Submitting a job

SLURM workflow – Job starts

Job starts

Partial command overview

  • View queued and running jobs: squeue [--me]
  • View previous jobs: sacct
  • Submit jobs: sbatch <JOBSCRIPT>
  • Cancel queued or running jobs: scancel <JOBID>
  • More complete overview at SLURM documentation and Alvis intro
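A minimal job script combining containers and SLURM might look like the sketch below; the account name, GPU type, time limit and container name are placeholders to adapt to your own allocation:

```
#!/bin/bash
# Minimal SLURM job script sketch - all names below are placeholders.
#SBATCH --account=YOUR_PROJECT    # your project/allocation ID
#SBATCH --gpus-per-node=1         # request one GPU
#SBATCH --time=00:10:00           # wall-time limit (HH:MM:SS)

# Run your code inside the Apptainer container
apptainer exec your_container.sif python your_code.py
```

Submit it with sbatch and follow its progress with squeue --me.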

Summary of Introduction

  • Compute is a key component in the success of LLMs
  • Use and development of AI is not without its issues
  • The hardware you can access
  • Containers for accessing software
  • SLURM batch queue system for running things on the cluster

Exercise

  1. Finish prerequisites
  2. Navigate to your instance of LLM-workshop
  3. Do a git pull to get the latest changes
  4. Launch an interactive session through https://alvis.c3se.chalmers.se/
  5. Modify and then submit LLM-workshop/excercises/introduction/hello-llms.sh TODO