Show HN: Finetune Llama-3 2x faster in a Colab notebook (colab.research.google.com)
45 points by danielhanchen on April 19, 2024 | 6 comments


Hey! If you're interested in trying out finetuning Llama-3 8B (Meta's new model trained on 15 trillion tokens!!), I made a Colab to finetune Llama-3 2x faster with 60% less VRAM, and it supports 4x longer contexts than HF + FA2.
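
Roughly what the notebook does, as a minimal sketch (assuming Unsloth's FastLanguageModel API; the exact arguments in the Colab may differ):

    from unsloth import FastLanguageModel

    # Load Llama-3 8B pre-quantized to 4-bit to cut VRAM.
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name="unsloth/llama-3-8b-bnb-4bit",
        max_seq_length=2048,
        load_in_4bit=True,
    )

    # Attach LoRA adapters so only a small fraction of weights train.
    model = FastLanguageModel.get_peft_model(
        model,
        r=16,
        lora_alpha=16,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                        "gate_proj", "up_proj", "down_proj"],
    )

From there it's a standard trl SFTTrainer loop, as in the notebook.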

I also uploaded Llama-3 70B pre-quantized to 4-bit so you can download it 4x faster: unsloth/llama-3-70b-bnb-4bit
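
Using the pre-quantized repo is just a model-name swap (same sketch and hedges as above); since 4-bit weights are roughly a quarter the size of 16-bit ones, there's ~4x less to download:

    from unsloth import FastLanguageModel

    # Pre-quantized 4-bit weights: ~4x less data to download than
    # fetching 16-bit weights and quantizing locally.
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name="unsloth/llama-3-70b-bnb-4bit",
        load_in_4bit=True,
    )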


Hard to read on mobile, but if you don’t mind me asking, what’s the catch? Is there a penalty for training faster and using less VRAM?


No catch at all!! There are 0 approximations, so everything is exact! We just have a custom backprop engine: we rewrite everything in OpenAI's Triton language and do all the differentiation and maths ourselves :)
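
As a toy illustration of the idea (not our actual kernels): instead of letting autograd generate the backward pass, you write both the forward and the hand-derived backward as Triton kernels. SiLU, for example:

    import torch
    import triton
    import triton.language as tl

    @triton.jit
    def silu_fwd(x_ptr, y_ptr, n, BLOCK: tl.constexpr):
        offs = tl.program_id(0) * BLOCK + tl.arange(0, BLOCK)
        mask = offs < n
        x = tl.load(x_ptr + offs, mask=mask).to(tl.float32)
        s = 1.0 / (1.0 + tl.exp(-x))
        tl.store(y_ptr + offs, x * s, mask=mask)  # y = x * sigmoid(x)

    @triton.jit
    def silu_bwd(x_ptr, dy_ptr, dx_ptr, n, BLOCK: tl.constexpr):
        offs = tl.program_id(0) * BLOCK + tl.arange(0, BLOCK)
        mask = offs < n
        x = tl.load(x_ptr + offs, mask=mask).to(tl.float32)
        dy = tl.load(dy_ptr + offs, mask=mask).to(tl.float32)
        s = 1.0 / (1.0 + tl.exp(-x))
        # hand-derived: d/dx [x * sigmoid(x)] = s * (1 + x * (1 - s))
        tl.store(dx_ptr + offs, dy * s * (1 + x * (1 - s)), mask=mask)

    def silu(x):
        y = torch.empty_like(x)
        grid = (triton.cdiv(x.numel(), 1024),)
        silu_fwd[grid](x, y, x.numel(), BLOCK=1024)
        return y

Exact maths, just fused and hand-scheduled, which is where the speed comes from.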

Unsloth can also fit 4x longer context windows than HF + Flash Attention 2 with our latest long-context update, using 30% less VRAM at the expense of a slight +1.9% overhead.
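
Concretely it's the same get_peft_model call as in the first sketch, plus one option (again, argument names may differ slightly from the Colab):

    # "unsloth" gradient checkpointing offloads/recomputes activations,
    # which is where the VRAM savings and the ~+1.9% overhead come from.
    model = FastLanguageModel.get_peft_model(
        model,
        r=16,
        lora_alpha=16,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                        "gate_proj", "up_proj", "down_proj"],
        use_gradient_checkpointing="unsloth",
    )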


Any news on when Unsloth's parallel full tuning will be available?


Full training, hmm - for now, finetuning in 16-bit and 4-bit is supported - if people are interested, I can work on it!
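
The 16-bit vs 4-bit choice is just the loading flags (sketch; repo names are the ones we uploaded, dtype depends on your GPU):

    import torch
    from unsloth import FastLanguageModel

    # 4-bit (QLoRA-style, lowest VRAM):
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name="unsloth/llama-3-8b-bnb-4bit",
        load_in_4bit=True,
    )

    # 16-bit LoRA (more VRAM, no quantization):
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name="unsloth/llama-3-8b",
        load_in_4bit=False,
        dtype=torch.bfloat16,
    )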


I also have a Kaggle notebook: https://www.kaggle.com/code/danielhanchen/kaggle-llama-3-8b-...

Kaggle provides Tesla T4s for 30 hours per week for free!!



