Training LLMs
Generally, I don’t deal with training or deeper tech, but might as well keep some notes on it.
Training
RAG Grounding
Fine Tuning
- Example using Llama 3
- LoRA (Low-Rank Adaptation) is an algorithm used in fine-tuning neural network models, especially transformers. It introduces a low-rank decomposition of the weight updates: the pre-trained weight matrices stay frozen, and only the small added matrices are trained. By modifying only a tiny fraction of the model’s parameters during fine-tuning, LoRA retains most of the pre-trained model’s knowledge while adapting to new tasks. This significantly reduces the number of parameters that need to be updated, leading to faster and more resource-efficient fine-tuning.
- Basically: bolt a pair of narrow matrices onto each weight matrix, and only fine-tune those (see the LoRA sketch after this list).
- What is GGUF (GPT-Generated Unified Format)?
- A file format for storing models for local inference (used by llama.cpp). GGUF is GGML.vNext, the successor to the older GGML format (see the loading sketch after this list).
- What is Quantization (Q4_0, IQ2_XXS)?
- A method to compress an LLM by shrinking each weight from 16 bits to maybe 4 bits, or less.
- A scorecard of quantization compression methods
- This is lossy compression, so you need to eval how well it works. A common approach is to compare the quantized model’s answers against the full-precision model’s via embedding similarity (see the sketch after this list).
- RLHF - Reinforcement Learning from Human Feedback
- RLAIF - Reinforcement Learning from AI Feedback (aka LLM-as-judge)
- DPO - Direct Preference Optimization - I don’t quite understand it fully, but it seems like RLHF v.Next with more simplicity: instead of training a separate reward model, it optimizes the policy directly on the preference pairs. The data-collection phase still involves putting a human on the end who is given a choice of A/B and decides which they like better (see the loss sketch after this list).
- Alpaca - An instruction-following dataset used in the early fine-tuning efforts. What’s clever is they used an OpenAI model (text-davinci-003) to generate the 52K examples, which made it much cheaper than human labeling. I’m assuming soon there will be even better sets.
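To make the LoRA bullet concrete, here’s a minimal toy sketch in PyTorch (my own illustration, not the reference implementation): the pre-trained weight stays frozen, and a pair of narrow matrices A and B carry the trainable update.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update: y = Wx + (alpha/r) * B(Ax)."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)  # freeze the pre-trained weights
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        # Low-rank pair: A projects down to r dims, B projects back up.
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: training starts at the base model
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * ((x @ self.A.T) @ self.B.T)

# Wrap one 4096x4096 layer; only A and B receive gradients.
layer = LoRALinear(nn.Linear(4096, 4096), r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 65,536 trainable params vs ~16.8M frozen in the base layer
```

In practice you’d reach for a library like Hugging Face’s peft rather than hand-rolling this.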
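For the GGUF note: assuming llama-cpp-python is installed and you have some .gguf file on disk (the filename below is a placeholder), loading and prompting one looks roughly like this:

```python
from llama_cpp import Llama  # pip install llama-cpp-python

# model_path is a placeholder; point it at any quantized .gguf file.
llm = Llama(model_path="./llama-3-8b-instruct.Q4_0.gguf", n_ctx=2048)

out = llm("Q: What is the GGUF file format for? A:", max_tokens=64)
print(out["choices"][0]["text"])
```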
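For the embedding-comparison eval mentioned above, a sketch using sentence-transformers (the embedding model is just a common default, and the answer strings stand in for real outputs from the FP16 and quantized models):

```python
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # any sentence-embedding model works

def answer_drift(full_answers, quant_answers):
    """Per-prompt cosine similarity between full-precision and quantized answers."""
    a = embedder.encode(full_answers, convert_to_tensor=True)
    b = embedder.encode(quant_answers, convert_to_tensor=True)
    return util.cos_sim(a, b).diagonal()

# Stand-ins for running the same prompts through the FP16 and Q4_0 models.
sims = answer_drift(["Paris is the capital of France."],
                    ["The capital of France is Paris."])
print(sims)  # near 1.0 means quantization barely changed the answer
```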
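And the DPO loss itself is short enough to write out. A sketch, assuming you already have the summed per-sequence log-probs of the chosen and rejected answers under both the policy and a frozen reference model:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Push the policy to prefer the chosen answer by a bigger margin than the reference does."""
    policy_margin = policy_chosen_logp - policy_rejected_logp
    ref_margin = ref_chosen_logp - ref_rejected_logp
    return -F.logsigmoid(beta * (policy_margin - ref_margin)).mean()

# Toy numbers: the policy already prefers the chosen answer more than the reference does.
loss = dpo_loss(torch.tensor([-12.0]), torch.tensor([-15.0]),
                torch.tensor([-13.0]), torch.tensor([-13.5]))
print(loss)  # shrinks as the policy's preference margin grows past the reference's
```

Libraries like TRL wrap this up (DPOTrainer), so you rarely write it by hand.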
Style transfer
Model Merging
Awesome blog
Fine-Tune Your Own Llama 2 Model in a Colab Notebook
Quantization
Not quite training but need to stuff this somewhere…
https://mlabonne.github.io/blog/posts/Introduction_to_Weight_Quantization.html
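To make that post concrete, here’s a minimal sketch of absmax quantization, one of the techniques it walks through: scale by the largest absolute weight so everything fits in int8, then measure the round-trip error.

```python
import torch

def absmax_quantize(w: torch.Tensor):
    """Quantize a float tensor to int8 by scaling with its largest absolute value."""
    scale = 127 / w.abs().max()
    q = (w * scale).round().to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor):
    return q.to(torch.float32) / scale

w = torch.randn(4, 4)
q, scale = absmax_quantize(w)
error = (w - dequantize(q, scale)).abs().mean()
print(q.dtype, f"mean abs error: {error:.5f}")  # lossy: small but nonzero error
```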