Train & Fine-Tune Llama 3 Using LLaMA-Factory

Sep 13, 2024

This article will explore LLaMA-Factory, released on 21 March 2024, and learn how to fine-tune Llama 3 on a cloud GPU. For our task, we will use the NVIDIA A4000 GPU, considered one of the most powerful single-slot GPUs, enabling seamless integration into various workstation setups.

Utilizing the NVIDIA Ampere architecture, the RTX A4000 integrates 48 second-generation RT Cores, 192 third-generation Tensor Cores, and 6,144 CUDA cores alongside 16GB of graphics memory with error-correction code (ECC); this ensures precise and reliable computing for innovative projects.

Until recently, fine-tuning a large language model was a complex task mainly reserved for machine learning and A.I. experts. However, this landscape is changing quickly with the ever-evolving field of artificial intelligence. New tools like LLaMA-Factory are emerging, making the fine-tuning process more accessible and efficient. In addition, one can now use techniques such as DPO, ORPO, PPO, and SFT for fine-tuning and model optimization. Furthermore, you can now efficiently train and fine-tune models such as Llama, Mistral, Falcon, and more.

Prerequisites

This is an intermediate-level tutorial which details the process of fine-tuning a Llama 3 model with a demo. We recommend that all readers be familiar with the general functionality of Generative Pretrained Transformers before continuing.

To run the demo, a sufficiently powerful NVIDIA GPU is required. We recommend using an H100.

What is model fine-tuning?

Fine-tuning a model involves adjusting the parameters of a pre-trained or base model so that it can be used for a specific task or dataset, enhancing its performance and accuracy. This process involves providing the model with new data and modifying its weights, biases, and certain parameters to minimize loss and cost. By doing so, this new model can perform well on a new task or dataset without starting from scratch, helping to save time and resources.
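To make this concrete, here is a minimal sketch in plain PyTorch, with a small linear model and synthetic data standing in for a real pre-trained LLM and dataset; the point is only the shape of the loop (start from existing weights, update them on new data to reduce loss), not the scale:

import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in for a "pre-trained" model; a real run would load an LLM
# checkpoint here rather than a freshly initialized linear layer.
model = nn.Linear(16, 1)

# New, task-specific data (synthetic for this sketch).
x = torch.randn(64, 16)
y = torch.randn(64, 1)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

# Fine-tuning loop: nudge the existing weights to minimize loss on the new data.
for epoch in range(5):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    print(f"epoch {epoch}: loss={loss.item():.4f}")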

Typically, when a new large language model (LLM) is created, it undergoes training on a large corpus of textual data, which may include potentially harmful or toxic content. Following the pre-training or initial training phase, the model is fine-tuned with safety measures, ensuring it avoids generating harmful or toxic responses. However, this approach could still be improved. Nonetheless, the concept of fine-tuning addresses the need to adapt models to specific requirements.

Why use LLaMA-Factory?

Enter LLaMA-Factory, a tool that facilitates the efficient and cost-effective fine-tuning of over 100 models. LLaMA-Factory streamlines the process of fine-tuning models, making it accessible and user-friendly. It also has a Hugging Face Space provided by hiyouga that can be used to fine-tune models.

LLaMA Board (Hugging Face Space)

This Space also supports LoRA and GaLore configurations to reduce GPU usage. With an easy slider bar, users can change parameters such as dropout, epochs, batch size, etc. There are also multiple dataset options to choose from to fine-tune your model. As discussed in this article, LLaMA-Factory supports many models, including different versions of Llama, Mistral, and Falcon. It also supports advanced algorithms like GaLore, BAdam, and LoRA, offering various features such as flash attention, positional encoding, and scaling.
Additionally, you can integrate monitoring tools like TensorBoard, Weights & Biases (W&B), and MLflow. For faster inference, you can utilize Gradio and the CLI. In essence, LLaMA-Factory provides a diverse set of options to enhance model performance and streamline the fine-tuning process.
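As a rough sketch of how one of these trackers might be enabled, assuming LLaMA-Factory forwards report_to to the underlying Hugging Face trainer (as standard transformers training arguments do):

# Hedged sketch: enabling an experiment tracker in a LLaMA-Factory training config.
# "wandb" assumes the wandb package is installed and you are logged in.
args = dict(
    stage="sft",
    do_train=True,
    model_name_or_path="unsloth/llama-3-8b-Instruct-bnb-4bit",
    dataset="identity",
    template="llama3",
    finetuning_type="lora",
    output_dir="llama3_lora",
    report_to="wandb",  # or "tensorboard" / "mlflow" / "none"
)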

LLaMA Board: A Unified Interface for LLaMA Factory

LLaMA Board is a user-friendly tool that helps people adjust and improve language model (LLM) performance without needing to know how to code. It's like a dashboard where you can easily customize how a language model learns and processes information.

Here are some key features:

  1. Easy Customization: You can change how the model learns by adjusting settings on a webpage. The default settings work well for most situations. You can also preview how your data will look to the model before you start.
  2. Monitoring Progress: As the model learns, you can see updates and graphs showing how well it's doing. This helps you understand whether it's improving or not.
  3. Flexible Testing: You can check how well the model understands text by comparing its output to known answers or by talking to it yourself. This helps you see if the model is getting better at understanding language.
  4. Support for Different Languages: LLaMA Board can work in English, Russian, and Chinese, making it useful for people who speak different languages. It's also set up to add more languages in the future.

Fine-tune Llama 3

Let's log in to the platform, select the GPU of your choice, and start the notebook. You can also click the link in the article to help you start the notebook.

We will start by cloning the repo and installing the necessary libraries,

!git clone https://github.com/hiyouga/LLaMA-Factory.git
%cd LLaMA-Factory
%ls

Next, we will install Unsloth, which allows us to fine-tune the model efficiently. Further, we will install xformers and bitsandbytes.

!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install --no-deps xformers==0.0.25
!pip install .[bitsandbytes]
!pip install 'urllib3<2'

Once everything is installed, we will check the GPU specifications,

!nvidia-smi


Next, we will import torch and check that CUDA is available, since we are using a GPU,

import torch

try:
    assert torch.cuda.is_available() is True
except AssertionError:
    print("Your GPU is not setup!")

We will now import the dataset which comes with the GitHub repo that we cloned. We can also create a custom dataset and use that instead.

import json

%cd /notebooks/LLaMA-Factory

MODEL_NAME = "Llama-3"

with open("/notebooks/LLaMA-Factory/data/identity.json", "r", encoding="utf-8") as f:
    dataset = json.load(f)

# Replace the placeholders so the model introduces itself correctly.
for sample in dataset:
    sample["output"] = sample["output"].replace("MODEL_NAME", MODEL_NAME).replace("AUTHOR", "LLaMA Factory")

with open("/notebooks/LLaMA-Factory/data/identity.json", "w", encoding="utf-8") as f:
    json.dump(dataset, f, indent=2, ensure_ascii=False)
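If you would rather fine-tune on your own data, LLaMA-Factory accepts Alpaca-style datasets registered in data/dataset_info.json. Here is a minimal sketch (the file name my_dataset.json and its contents are illustrative):

import json

# A tiny Alpaca-style dataset: instruction / input / output per sample.
custom_data = [
    {
        "instruction": "Summarize the following sentence.",
        "input": "LLaMA-Factory makes fine-tuning large language models accessible.",
        "output": "LLaMA-Factory simplifies LLM fine-tuning.",
    }
]
with open("data/my_dataset.json", "w", encoding="utf-8") as f:
    json.dump(custom_data, f, indent=2, ensure_ascii=False)

# Register the file so it can be selected by name (e.g. dataset="my_dataset").
with open("data/dataset_info.json", "r", encoding="utf-8") as f:
    info = json.load(f)
info["my_dataset"] = {"file_name": "my_dataset.json"}
with open("data/dataset_info.json", "w", encoding="utf-8") as f:
    json.dump(info, f, indent=2, ensure_ascii=False)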

Once this is done, we will execute the code below to generate the Gradio web app link for LLaMA-Factory.

%cd /notebooks/LLaMA-Factory
!GRADIO_SHARE=1 llamafactory-cli webui

You can click on the generated public link to proceed to the GUI.


  1. Model Selection
    • You can choose any model; here, we choose Llama 3 with 8 billion parameters.
  2. Adapter Configuration
    • You have the option to specify the adapter path.
    • Available options include LoRA, QLoRA, freeze, or full.
    • You can refresh the adapter list if needed.
  3. Training Options
    • You can train the model using supervised fine-tuning (SFT).
    • Alternatively, you can opt for DPO (Direct Preference Optimization) or PPO (Proximal Policy Optimization) if applicable.
  4. Dataset Selection
    • The selected dataset is for supervised fine-tuning (SFT).
    • You can also choose your own dataset.
  5. Hyperparameter Configuration
    • You can adjust hyperparameters, such as the number of epochs, maximum gradient norm, and maximum number of samples.
  6. LoRA Configuration
    • Detailed configuration options are available for the LoRA model.
  7. Start Training
    • Once all configurations are set, you can initiate the training process by clicking the “Start” button.

This will start the training.

We can also start the training and fine-tuning using CLI commands. You can use the below code to specify the parameters.

args = dict( stage="sft", do_train=True, model_name_or_path="unsloth/llama-3-8b-Instruct-bnb-4bit", dataset="identity,alpaca_gpt4_en", template="llama3", finetuning_type="lora", lora_target="all", output_dir="llama3_lora", per_device_train_batch_size=2, gradient_accumulation_steps=4, lr_scheduler_type="cosine", logging_steps=10, warmup_ratio=0.1, save_steps=1000, learning_rate=5e-5, num_train_epochs=3.0, max_samples=500, max_grad_norm=1.0, quantization_bit=4, loraplus_lr_ratio=16.0, use_unsloth=True, fp16=True, ) json.dump(args, open("train_llama3.json", "w", encoding="utf-8"), indent=2)

Next, open a terminal and run the below command,

!llamafactory-cli train train_llama3.json

This will start the training process.

Once the model training is completed, we can use the model for inference. Let us try doing that and check how the model works.

args = dict(
    model_name_or_path="unsloth/llama-3-8b-Instruct-bnb-4bit",
    adapter_name_or_path="llama3_lora",  # the adapter saved during training
    finetuning_type="lora",
    template="llama3",
    quantization_bit=4,
    use_unsloth=True,
)
json.dump(args, open("infer_llama3.json", "w", encoding="utf-8"), indent=2)

Here, we specify our model with the saved adapter, select the chat template, and define the user-assistant interaction.

Next, run the below command in your terminal.

!llamafactory-cli chat infer_llama3.json
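If you prefer to stay inside the notebook rather than the terminal, the repository also exposes a Python chat API. The following is a minimal sketch, assuming the ChatModel class behaves as in the repo's example notebooks and that the adapter was saved to llama3_lora:

from llamafactory.chat import ChatModel

# Same settings as infer_llama3.json, passed directly as a dict.
chat_model = ChatModel(dict(
    model_name_or_path="unsloth/llama-3-8b-Instruct-bnb-4bit",
    adapter_name_or_path="llama3_lora",  # assumed location of the saved adapter
    finetuning_type="lora",
    template="llama3",
    quantization_bit=4,
    use_unsloth=True,
))

# A single user turn; chat() returns a list of Response objects.
messages = [{"role": "user", "content": "Hi, who are you?"}]
for response in chat_model.chat(messages):
    print(response.response_text)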

We encourage our users to try LLaMA-Factory with any model and experiment with the parameters.

Conclusion

Effective fine-tuning has become a necessity for large language models (LLMs) to adapt to specific tasks. However, it requires some amount of effort and is quite challenging at times. With the introduction of LLaMA-Factory, a comprehensive framework that consolidates advanced efficient training techniques, users can easily customize fine-tuning for over 100 LLMs without any coding requirements.

Many people who are becoming curious about large language models (LLMs) will be drawn to LLaMA-Factory to see if they can tune their own models. This helps the open-source community grow and become more active. LLaMA-Factory is becoming well-known and has even been highlighted in Awesome Transformers as a leading tool for fine-tuning LLMs efficiently.

We hope that this article encourages more developers to use this framework to create LLMs that can benefit society. Remember, it's important to follow the rules of the model's license when using LLaMA-Factory to fine-tune LLMs to prevent any potential misuse.

With this, we come to the end of this article. We saw how easy it is nowadays to fine-tune any model within minutes. We can also use the Hugging Face CLI to push this model to the Hugging Face Hub.
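As a sketch of that last step (the config values and repo name below are illustrative; llamafactory-cli export merges the LoRA adapter into the base weights, and since merging on top of a quantized base is discouraged, a full-precision base model is assumed):

import json

# Hedged sketch: merge the trained LoRA adapter into full-precision base weights.
args = dict(
    model_name_or_path="meta-llama/Meta-Llama-3-8B-Instruct",  # assumed full-precision base
    adapter_name_or_path="llama3_lora",
    template="llama3",
    finetuning_type="lora",
    export_dir="llama3_lora_merged",
)
json.dump(args, open("merge_llama3.json", "w", encoding="utf-8"), indent=2)

Then, in a terminal, merge and upload the result (replace your-username with your Hugging Face account):

!llamafactory-cli export merge_llama3.json
!huggingface-cli upload your-username/llama3-finetuned llama3_lora_merged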

References

  • LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models
  • GitHub: LLaMA-Factory