Add pipeline tag and transformers library
#2
by nielsr (HF Staff) - opened

README.md CHANGED
@@ -1,20 +1,20 @@
 ---
-license: apache-2.0
-language:
-- en
 base_model:
 - sfairXC/FsfairX-LLaMA3-RM-v0.1
+language:
+- en
+license: apache-2.0
 tags:
 - reward model
 - fine-grained
+pipeline_tag: text-ranking
+library_name: transformers
 ---
 
 # MDCureRM
 
-
 [📄 Paper](https://arxiv.org/pdf/2410.23463) | [🤗 HF Collection](https://huggingface.co/collections/yale-nlp/mdcure-6724914875e87f41e5445395) | [⚙️ GitHub Repo](https://github.com/yale-nlp/MDCure)
 
-
 ## Introduction
 
 **MDCure** is an effective and scalable procedure for generating high-quality multi-document (MD) instruction tuning data to improve MD capabilities of LLMs. Using MDCure, we construct a suite of MD instruction datasets complementary to collections such as [FLAN](https://github.com/google-research/FLAN) and fine-tune a variety of already instruction-tuned LLMs from the FlanT5, Qwen2, and LLAMA3.1 model families, up to 70B parameters in size. We additionally introduce **MDCureRM**, an evaluator model specifically designed for the MD setting to filter and select high-quality MD instruction data in a cost-effective, RM-as-a-judge fashion. Extensive evaluations on a wide range of MD and long-context benchmarks spanning various tasks show MDCure consistently improves performance over pre-trained baselines and over corresponding base models by up to 75.5%.
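For reference, the two added metadata fields are what the Hub uses to list the model under the text-ranking task filter and to pick transformers as the default loading library. Below is a minimal sketch of checking that the fields are live once this change is merged, assuming the repo id is `yale-nlp/MDCureRM` (the repo id itself is not shown in this diff):

```python
# Minimal check of the Hub metadata added by this PR.
# Assumption: the model lives at yale-nlp/MDCureRM (repo id not visible in the diff).
from huggingface_hub import HfApi

info = HfApi().model_info("yale-nlp/MDCureRM")
print(info.pipeline_tag)   # expected "text-ranking" after this PR is merged
print(info.library_name)   # expected "transformers" after this PR is merged
```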
@@ -113,10 +113,16 @@ reward_weights = torch.tensor([1/9, 1/9, 1/9, 2/9, 2/9, 2/9], device="cuda")
 source_text_1 = ...
 source_text_2 = ...
 source_text_3 = ...
-context = f"{source_text_1}\n\n{source_text_2}\n\n{source_text_3}"
+context = f"{source_text_1}
+
+{source_text_2}
+
+{source_text_3}"
 instruction = "What happened in CHAMPAIGN regarding Lovie Smith and the 2019 defense improvements? Respond with 1-2 sentences."
 
-input_text = f"Instruction: {instruction}\n\n{context}"
+input_text = f"Instruction: {instruction}
+
+{context}"
 tokenized_input = tokenizer(
     input_text,
     return_tensors='pt',
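The hunk above touches only how the scoring prompt is assembled; the README's surrounding lines (not part of this diff) load the tokenizer and reward model and define `reward_weights`, as seen in the hunk header. A hedged sketch of how the remaining scoring step could look, assuming the model is available as `reward_model` and returns one logit per fine-grained criterion (six in total, matching the six-element weight vector):

```python
import torch

# Continuation sketch, not part of the diff. Assumes `reward_model`, `tokenizer`, and
# `reward_weights` are defined earlier in the README; the six-logit output shape is an
# assumption inferred from the six-element reward_weights vector.
with torch.no_grad():
    outputs = reward_model(**tokenized_input.to("cuda"))
    fine_grained_scores = outputs.logits.squeeze()   # one score per criterion, shape (6,)
    overall_score = (fine_grained_scores * reward_weights).sum()

print(f"Overall MDCureRM score: {overall_score.item():.4f}")
```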
@@ -141,7 +147,7 @@ Beyond MDCureRM, we open-source our best MDCure'd models at the following links:
 | **MDCure-Qwen2-1.5B-Instruct** | [🤗 HF Repo](https://huggingface.co/yale-nlp/MDCure-Qwen2-1.5B-Instruct) | **Qwen2-1.5B-Instruct** fine-tuned with MDCure-72k |
 | **MDCure-Qwen2-7B-Instruct** | [🤗 HF Repo](https://huggingface.co/yale-nlp/MDCure-Qwen2-7B-Instruct) | **Qwen2-7B-Instruct** fine-tuned with MDCure-72k |
 | **MDCure-LLAMA3.1-8B-Instruct** | [🤗 HF Repo](https://huggingface.co/yale-nlp/MDCure-LLAMA3.1-8B-Instruct) | **LLAMA3.1-8B-Instruct** fine-tuned with MDCure-72k |
-| **MDCure-LLAMA3.1-70B-Instruct** | [🤗 HF Repo](https://huggingface.co/yale-nlp/MDCure-LLAMA3.1-70B-Instruct) | **LLAMA3.1-70B-Instruct** fine-tuned with MDCure-
+| **MDCure-LLAMA3.1-70B-Instruct** | [🤗 HF Repo](https://huggingface.co/yale-nlp/MDCure-LLAMA3.1-70B-Instruct) | **LLAMA3.1-70B-Instruct** fine-tuned with MDCure-72k |
 
 ## Citation
 
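The checkpoints in the table above are standard instruction-tuned causal LMs (Qwen2 and LLaMA-3.1 bases), so they should load with the usual `transformers` classes. A minimal, hedged sketch with the 1.5B model; the chat-template usage and generation settings are illustrative, not taken from the model card:

```python
# Sketch: load and query one of the released MDCure'd models listed above.
# Assumption: AutoModelForCausalLM + chat template work as for the Qwen2-Instruct base.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "yale-nlp/MDCure-Qwen2-1.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Summarize the key points shared across the documents."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```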