---
language:
- en
tags:
- llama
- instruct
- instruction
- empirischtech
pipeline_tag: text-generation
base_model:
- meta-llama/Llama-3.1-8B-Instruct
license: llama3.1
---

# Llama-3.1-10B-Instruct model card

## Model Details

* **Developed by**: [EmpirischTech](https://empirischtech.at)/[ChaperoneAI](https://chaperoneai.net)
* **Backbone Model**: [Llama 3.1](https://github.com/meta-llama/llama3)
* **Language(s)**: English
* **Library**: [HuggingFace Transformers](https://github.com/huggingface/transformers)
* **License**: This model is under a **Non-commercial** Bespoke License and governed by the Meta license. You should only use this repository if you have been granted access to the model by filling out [this form](https://docs.google.com/forms/d/e/1FAIpQLSfqNECQnMkycAp2jP4Z9TFX0cGR4uf7b_fBxjY_OjhJILlKGA/viewform), but have either lost your copy of the weights or encountered issues converting them to the Transformers format
* **Where to send comments**: Instructions on how to provide feedback or comments on the model can be found by opening an issue in the [model repository's community tab](https://huggingface.co/empirischtech/Llama-3.1-10B-Instruct/discussions)
* **Contact**: For questions and comments about the model, please reach out via [contact-us](https://chaperoneai.net/contact)

## Training

Bigger models, more data, and better hardware have consistently improved deep learning performance. Whether in NLP or computer vision, larger models have led to major breakthroughs. However, most cutting-edge models are still trained from scratch, meaning they start with randomly initialized weights. The problem? Training costs are skyrocketing.

To address the escalating computational cost of training large-scale models, various approaches have been proposed. We present our results validating depth up-scaling (DUS), a method that combines depthwise scaling with continued pretraining. Unlike other LLM up-scaling approaches that rely on mixture-of-experts, DUS requires no complex modifications for efficient training and inference, making it a simple yet effective strategy for scaling high-performance LLMs from smaller models.

In this work, we take a step toward realizing such an approach. Specifically, we extend an existing **8B**-parameter model to **10B** parameters by initializing the additional layers with pretrained weights, followed by continued pretraining on a smaller dataset across multiple epochs. Due to budget constraints, we were unable to surpass the foundation model on the **EleutherAI** evaluation benchmark. However, the average scores are very close, demonstrating the potential for cost-efficient scaling strategies in large language model development.
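To make the depth up-scaling step concrete, the following is a minimal sketch, assuming the added decoder layers are initialized by duplicating a block of pretrained layers from the 8B backbone before continued pretraining. The duplicated-layer count (`n_extra = 8`), the choice to copy the last layers, and the output path are illustrative assumptions, not the exact recipe used for this model.

```python
# Minimal sketch of depthwise scaling (illustrative; not the exact recipe used here):
# deepen a pretrained 8B Llama model by duplicating decoder layers, then save the
# result as the starting point for continued pretraining.
import copy

import torch
from torch import nn
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",   # pretrained 8B backbone
    torch_dtype=torch.bfloat16,
)

layers = base.model.layers    # nn.ModuleList of the 32 decoder layers in the 8B model
n_extra = 8                   # hypothetical number of added layers (~10B parameters total)

# Initialize the added depth by duplicating the last n_extra pretrained layers.
extra = [copy.deepcopy(layers[i]) for i in range(len(layers) - n_extra, len(layers))]
base.model.layers = nn.ModuleList(list(layers) + extra)
base.config.num_hidden_layers = len(base.model.layers)

# Re-index attention modules so each layer addresses its own KV-cache slot
# (the attribute path may differ across transformers versions).
for idx, layer in enumerate(base.model.layers):
    layer.self_attn.layer_idx = idx

base.save_pretrained("llama-3.1-10b-init")  # continued pretraining starts from this checkpoint
```

Continued pretraining then proceeds on this up-scaled checkpoint with the standard causal-language-modeling objective; no routing or other architectural changes are required, which is what keeps DUS simpler than mixture-of-experts up-scaling.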
## Usage

- Tested on an A100 80GB GPU
- Our model can handle up to 128K (131,072) input tokens, as supported by the Llama-3.1 architecture.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

model_id = "empirischtech/Llama-3.1-10B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.float16,
)

prompt = (
    "### User:\nEmma feels perfectly fine, yet she still has an appointment "
    "at the hospital. What might be the reasons?\n\n### Assistant:\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
inputs.pop("token_type_ids", None)  # drop if present; the Llama tokenizer does not return it

streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
output = model.generate(**inputs, streamer=streamer, use_cache=True, max_new_tokens=1024)
output_text = tokenizer.decode(output[0], skip_special_tokens=True)
```

## Hardware and Software

* **Hardware**: We utilized 8× A100 GPUs for training our model
* **Training Factors**: The model was pretrained using a combination of the [DeepSpeed library](https://github.com/microsoft/DeepSpeed) and the [HuggingFace Trainer](https://huggingface.co/docs/transformers/main_classes/trainer)

## Evaluation Results

### Harness Evaluation

- The performance evaluation is based on tasks from the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard). The model is evaluated on four benchmark tasks: `ARC-Challenge`, `HellaSwag`, `MMLU-Pro`, and `IFEval`, using the [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) library.

#### Main Results

| Benchmark   | **Llama-3.1-8B-Instruct** | **Llama-3.1-10B-Instruct** |
|-------------|:-------------------------:|:--------------------------:|
| ARC         | 55.05                     | 52.47                      |
| HellaSwag   | 79.28                     | 77.08                      |
| MMLU-Pro    | 40.34                     | 33.59                      |
| IFEval      | 59.95                     | 54.80                      |
| **Average** | **58.66**                 | **54.49**                  |

#### Scripts to generate evaluation results

```python
# Install the harness first: pip install "lm-eval>=0.4.7"
# (see https://github.com/EleutherAI/lm-evaluation-harness)
import json

from lm_eval import evaluator

tasks_list = ["arc_challenge", "ifeval", "mmlu_pro", "hellaswag"]  # benchmark tasks
model_path = "empirischtech/Llama-3.1-10B-Instruct"

# Run evaluation
results = evaluator.simple_evaluate(
    model="hf",                          # Hugging Face model backend
    cache_requests=False,
    model_args=f"pretrained={model_path}",
    tasks=tasks_list,
    batch_size=4,
    device="cuda:0",
)

# Extract and serialize the per-task results
results = results["results"]
json_string = json.dumps(results, indent=4)
```

## Ethical Issues

### Ethical Considerations

- There were no ethical issues involved, as we did not include the benchmark test set or the training set in the model's training process.

## Contact Us

### Why Our LLMs?

- [EmpirischTech](https://empirischtech.at)/[ChaperoneAI](https://chaperoneai.net): Unlock the full potential of private LLMs for your business with ease. Customize and fine-tune them using your own data for a solution that fits your unique needs. Want a seamless integration? Let’s connect! ► [Get in touch](https://chaperoneai.net/contact)