Hey! If you're interested in trying out finetuning Llama-3 8B (Meta's new 15-trillion-token!! model), I made a Colab that finetunes Llama-3 2x faster, uses 60% less VRAM, and supports 4x longer contexts than HF + FA2.
Also uploaded Llama-3 70b pre-quantized to 4bit so you can download it 4x faster: unsloth/llama-3-70b-bnb-4bit
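For reference, loading either model looks roughly like this (a minimal sketch based on the usual Unsloth notebook setup; the max_seq_length and dtype values here are just example settings, not requirements):

```python
from unsloth import FastLanguageModel

# Load the pre-quantized 4bit model straight from the Hub -- no need to
# quantize it yourself, which is what makes the download ~4x faster.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/llama-3-70b-bnb-4bit",  # or "unsloth/llama-3-8b-bnb-4bit"
    max_seq_length = 2048,  # example value; raise this for longer contexts
    dtype = None,           # auto-detects float16 / bfloat16
    load_in_4bit = True,
)
```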
No catch at all!! There are 0 approximations, so everything is exact! We just have a custom backprop engine: we rewrite everything in OpenAI's Triton language and do all the differentiation and maths ourselves :)
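To give a flavour of what Triton code looks like (a purely illustrative toy kernel, not one of our actual kernels!): you write Python that gets JIT-compiled to GPU code, with blocking, masking and memory access under your control:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def scale_kernel(x_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide chunk of the tensor.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements           # guard against out-of-bounds loads
    x = tl.load(x_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x * 2.0, mask=mask)

x = torch.randn(4096, device="cuda")
out = torch.empty_like(x)
grid = lambda meta: (triton.cdiv(x.numel(), meta["BLOCK_SIZE"]),)
scale_kernel[grid](x, out, x.numel(), BLOCK_SIZE=1024)
```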
With our latest long context update, Unsloth can also fit 4x longer context windows than HF + Flash Attention 2 while using 30% less VRAM, at the cost of a slight +1.9% overhead.
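The long context path is switched on when you add the LoRA adapters (again a sketch; the LoRA hyperparameters below are just the common notebook defaults, not prescriptions):

```python
# Attach LoRA adapters; use_gradient_checkpointing="unsloth" enables
# the long-context / reduced-VRAM mode from the update.
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,  # example LoRA rank
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = "unsloth",
    random_state = 3407,
)
```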