melisa's Collections

lshort-transformers

updated May 24, 2024

Papers useful when writing the paper: "The Not So Short Transformers"

  • ShortGPT: Layers in Large Language Models are More Redundant Than You Expect

    Paper • 2403.03853 • Published Mar 6, 2024 • 65

  • SliceGPT: Compress Large Language Models by Deleting Rows and Columns

    Paper • 2401.15024 • Published Jan 26, 2024 • 74

  • Your Transformer is Secretly Linear

    Paper • 2405.12250 • Published May 19, 2024 • 158

  • Yi: Open Foundation Models by 01.AI

    Paper • 2403.04652 • Published Mar 7, 2024 • 66

  • Arcee's MergeKit: A Toolkit for Merging Large Language Models

    Paper • 2403.13257 • Published Mar 20, 2024 • 20

  • The Unreasonable Ineffectiveness of the Deeper Layers

    Paper • 2403.17887 • Published Mar 26, 2024 • 81

  • Weight subcloning: direct initialization of transformers using larger pretrained ones

    Paper • 2312.09299 • Published Dec 14, 2023 • 19

  • Evolutionary Optimization of Model Merging Recipes

    Paper • 2403.13187 • Published Mar 19, 2024 • 54

  • Resolving Interference When Merging Models

    Paper • 2306.01708 • Published Jun 2, 2023 • 14

  • What Makes Good Data for Alignment? A Comprehensive Study of Automatic Data Selection in Instruction Tuning

    Paper • 2312.15685 • Published Dec 25, 2023 • 16