Transformers

Topics Covered:

Encoder Decoder
- Architecture of Encoder and Decoders
- Encoder Forward Pass
- Decoder Forward Pass
- Improvements to the basic encoder-decoder architecture
  1. Using embeddings
  2. Deep LSTMs
  3. Reversing the input
Attention Mechanism
- Stating the main Problems with Vanilla Encoder Decoder Architecture
- Deep dive on attention mechanism
- Bahdanau Attention VS Luong Attention

The main research paper used for this: Attention Is All You Need [https://arxiv.org/pdf/1706.03762]

Self Attention
- Introduction to Self Attention
- Explaining the need for self attention
- Converting the embeddings to context-aware embeddings
- Query - Key - Value concept
- Parallel Operations
- Explanation behind Scaled Dot Product
- Geometric Intuition
- Why named 'Self Attention'
- The problem with Self Attention - Why do we need multi head attention
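
The query-key-value and scaled dot product topics above can be illustrated with a minimal sketch in plain Python. It assumes Q, K and V are already given as lists of vectors; in a real model they come from learned linear projections of the same embeddings, which is omitted here. The function names are illustrative and are not taken from the repository code.

```python
import math

def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(Q, K, V):
    # Q, K, V: lists of vectors (one per token). Assumption: the learned
    # projections W_q, W_k, W_v have already been applied.
    d_k = len(K[0])
    outputs = []
    for q in Q:
        # Similarity of this query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Context-aware embedding: weighted sum of the value vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, V))
                        for j in range(len(V[0]))])
    return outputs

# Self attention: the same token embeddings play all three roles.
embeddings = [[1.0, 0.0], [0.0, 1.0]]
context = scaled_dot_product_attention(embeddings, embeddings, embeddings)
```

Each output row is a weighted average of the value vectors, so every token's new embedding mixes in information from the other tokens. Dividing by sqrt(d_k) keeps the scores from growing with the vector dimension, which would otherwise push the softmax into regions with very small gradients.
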
Multi Head Attention
- In-depth explanation with example
- Matrix Calculation
- Problem with Self Attention in the context of sequence order
Positional Encoding
- Gradually explaining positional encoding concept from scratch using examples
- Worked example: computing the positional encoding for a vector
- The linear relationship property
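
As a rough illustration of the positional encoding topics above, the sketch below computes the sinusoidal encoding from the paper, PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)), using only the standard library. It assumes an even `d_model`; the helper `shift_encoding` is a hypothetical name added here to demonstrate the linear relationship property.

```python
import math

def positional_encoding(pos, d_model):
    # Assumption: d_model is even, so dimensions pair up as (sin, cos).
    pe = []
    for i in range(0, d_model, 2):
        angle = pos / (10000 ** (i / d_model))
        pe.append(math.sin(angle))  # even dimension
        pe.append(math.cos(angle))  # odd dimension
    return pe

def shift_encoding(pe, k, d_model):
    # Linear relationship: PE(pos + k) is obtained from PE(pos) by rotating
    # each (sin, cos) pair by an angle that depends only on the offset k.
    shifted = []
    for i in range(0, d_model, 2):
        w = 1.0 / (10000 ** (i / d_model))
        s, c = pe[i], pe[i + 1]
        shifted.append(s * math.cos(w * k) + c * math.sin(w * k))
        shifted.append(c * math.cos(w * k) - s * math.sin(w * k))
    return shifted

pe_3 = positional_encoding(3, 8)
pe_5 = shift_encoding(pe_3, 2, 8)  # matches positional_encoding(5, 8) up to rounding
```

The rotation in `shift_encoding` does not depend on `pos`, which is why a fixed offset between two positions corresponds to the same linear transformation everywhere in the sequence.
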
Layer Normalization
- Normalization
  - Where to apply?
  - Benefits of normalization
  - Revisiting Batch Normalization
  - Why batch normalization is not used
  - Explaining Layer Normalization
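
A small sketch of layer normalization for a single token vector, assuming the usual formulation with a learnable scale `gamma` and shift `beta` (defaulting here to 1 and 0). The statistics are computed across the features of one token, so the result does not depend on the other sequences in the batch, unlike batch normalization.

```python
import math

def layer_norm(x, gamma=None, beta=None, eps=1e-5):
    # x: feature vector of one token. Mean and variance are taken over
    # the features of this vector only, not over the batch.
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    normed = [(v - mean) / math.sqrt(var + eps) for v in x]
    # Assumption: gamma and beta default to the identity transform.
    gamma = gamma if gamma is not None else [1.0] * n
    beta = beta if beta is not None else [0.0] * n
    return [g * v + b for g, v, b in zip(gamma, normed, beta)]

normalized = layer_norm([1.0, 2.0, 3.0])
```

Because padding tokens and variable sequence lengths distort per-feature batch statistics, normalizing each token on its own is the more natural fit for sequence models.
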
Transformer Architecture
- Encoder
  - Explaining all the parts step by step
    - inputs -> tokens -> embeddings -> positional encoding -> multi head attention -> add and normalization -> feed forward network -> add and normalization -> output of one encoder block
    - repeat the block from multi head attention through the second add and normalization 6 times -> final output
- Masked Self Attention
  - During Training
    - Sequential (time series)
    - Parallel
  - During Inference
- Cross Attention
- Transformer Decoder Architecture during Training -> Non-Autoregressive
- Transformer Decoder Architecture during Inference -> Autoregressive
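
The masked self attention item above can be sketched as follows. This is an illustrative example, not the repository's implementation: `causal_mask` and `masked_softmax` are hypothetical helper names. The mask lets position i attend only to positions j <= i, which is what allows the decoder to process all target positions in parallel during training without seeing future tokens.

```python
import math

def causal_mask(n):
    # mask[i][j] is True when position i is allowed to attend to position j.
    return [[j <= i for j in range(n)] for i in range(n)]

def masked_softmax(scores, allowed):
    # Disallowed positions get a score of -inf, so their weight becomes 0.
    masked = [s if ok else float("-inf") for s, ok in zip(scores, allowed)]
    m = max(masked)  # finite, since the diagonal is always allowed
    exps = [math.exp(s - m) for s in masked]
    total = sum(exps)
    return [e / total for e in exps]

mask = causal_mask(3)
# Attention weights for the second token: the third (future) token is ignored.
weights = masked_softmax([1.0, 2.0, 3.0], mask[1])
```

At inference time the same mask is effectively implied, because tokens are generated one at a time and future positions do not exist yet.
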
