Training LLMs

Generally, I don’t deal with training or deeper tech, but might as well keep some notes on it.

Training

RAG Grounding

Fine Tuning

  • Example using Llama 3
  • LoRA - (Low-Rank Adaptation) is an algorithm used in the fine-tuning of neural network models, especially transformers. It introduces a low-rank decomposition of the update to the weight matrices, so only a small fraction of the parameters change during fine-tuning. LoRA retains most of the pre-trained model’s knowledge while adapting to new tasks, and significantly reduces the number of parameters that need to be updated, leading to faster and more resource-efficient fine-tuning.
    • Basically - bolt a pair of small low-rank matrices onto each frozen weight matrix and only fine-tune those (see the sketch after this list).
  • What is GGUF (GPT-Generated Unified Format)?
    • A file format for storing models, used by llama.cpp and friends. GGUF is GGML.vNext, i.e. the successor to the older GGML format (see the loading example after this list).
  • What is Quantization (Q4_0, IQ2_XXS)? Those names are llama.cpp quantization types - see the Quantization section at the end.
  • RLHF - Reinforcement Learning from Human Feedback
  • RLAIF - Reinforcement Learning from AI Feedback (aka LLM-as-judge)
  • DPO - Direct Preference Optimization - essentially RLHF v.Next, with more simplicity: it drops the separately trained reward model and optimizes the policy directly on the preference pairs. Data collection still involves putting a human on the end who is given a choice of A/B and decides which they like better (see the loss sketch after this list).
  • Alpaca - An instruction-following dataset used in the early fine-tuning efforts. What’s clever is they used an OpenAI model (text-davinci-003) to generate the instruction/response pairs, which made building it much cheaper than hand-writing them. I’m assuming soon there will be even better sets.
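
A minimal PyTorch sketch of the LoRA idea (illustrative only, not the peft library’s implementation): wrap a frozen Linear layer and train just the two small matrices A and B.

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        # Wrap a frozen Linear layer and learn only a low-rank update B @ A.
        def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
            super().__init__()
            self.base = base
            for p in self.base.parameters():
                p.requires_grad = False  # pre-trained weights stay frozen
            self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
            self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # starts as a no-op
            self.scale = alpha / rank

        def forward(self, x):
            # original output plus the low-rank correction x @ (B @ A)^T
            return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

    # Only A and B receive gradients: 2 * 8 * 4096 = 65,536 trainable params
    # versus ~16.8M in the full 4096x4096 matrix.
    layer = LoRALinear(nn.Linear(4096, 4096), rank=8)
    print(sum(p.numel() for p in layer.parameters() if p.requires_grad))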
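
And a quick example of loading a quantized GGUF file, assuming the llama-cpp-python bindings and a made-up model path:

    from llama_cpp import Llama  # pip install llama-cpp-python

    # The "Q4_0" in the filename is the quantization type baked into the GGUF file.
    llm = Llama(model_path="./models/llama-3-8b.Q4_0.gguf")  # hypothetical path
    out = llm("Q: What is GGUF? A:", max_tokens=64, stop=["Q:"])
    print(out["choices"][0]["text"])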

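A toy sketch of the DPO loss itself (illustrative; real trainers such as TRL’s DPOTrainer handle the plumbing): both responses are scored by the policy being trained and by a frozen reference model, and the loss pushes the policy to prefer the chosen response more strongly than the reference does.

    import torch
    import torch.nn.functional as F

    def dpo_loss(policy_chosen_logp, policy_rejected_logp,
                 ref_chosen_logp, ref_rejected_logp, beta=0.1):
        # Inputs are summed log-probs of each full response under the trained
        # policy and under a frozen copy of the starting (reference) model.
        chosen_margin = policy_chosen_logp - ref_chosen_logp
        rejected_margin = policy_rejected_logp - ref_rejected_logp
        # Minimizing this maximizes the gap between chosen and rejected margins.
        return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()

    # Toy batch of two preference pairs with made-up log-probabilities.
    loss = dpo_loss(torch.tensor([-12.0, -9.5]), torch.tensor([-14.0, -11.0]),
                    torch.tensor([-12.5, -10.0]), torch.tensor([-13.5, -10.5]))
    print(loss)
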
Style transfer

Model Merging

Awesome blog

Fine-Tune Your Own Llama 2 Model in a Colab Notebook

Quantization

Not quite training but need to stuff this somewhere…

https://mlabonne.github.io/blog/posts/Introduction_to_Weight_Quantization.html
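
One of the simplest schemes covered there is absmax quantization: scale the weights into the int8 range and keep the scale factor so they can be dequantized later. A toy sketch (not the block-wise Q4_0 / IQ2_XXS formats llama.cpp uses, which also store per-block scales):

    import numpy as np

    def absmax_quantize(weights):
        # Map floats into [-127, 127] so they fit in int8; keep the scale.
        scale = 127.0 / np.max(np.abs(weights))
        return np.round(weights * scale).astype(np.int8), scale

    def dequantize(q, scale):
        return q.astype(np.float32) / scale

    w = np.random.randn(4, 4).astype(np.float32)
    q, s = absmax_quantize(w)
    print(np.abs(w - dequantize(q, s)).max())  # small round-trip error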