Zagora
Train a lightweight QLoRA adapter on your dataset using SFT (Supervised Fine-Tuning).
{"messages": [{"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}
{"instruction": "Your prompt here", "response": "Expected response"}
{"text": "Complete formatted text as a single string"}
{"chosen": [{"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}], "rejected": [...], "ref_chosen_logps": -1.5, "ref_rejected_logps": -3.2}