pszemraj's Collections

LLM training

updated Oct 27, 2024

small-scale pretraining experiments of mine

  • BEE-spoke-data/smol_llama-101M-GQA

    Text Generation • Updated Dec 25, 2023 • 564 • 28

  • BEE-spoke-data/smol_llama-220M-GQA

    Text Generation • Updated Jun 28, 2024 • 510 • 12

  • BEE-spoke-data/smol_llama-220M-GQA-fineweb_edu

    Text Generation • Updated Jul 18, 2024 • 12 • 1

    Note: smol_llama-220M-GQA with continued pretraining (CPT) on fineweb-edu for 10 billion tokens


  • BEE-spoke-data/smol_llama-81M-tied

    Text Generation • Updated Nov 20, 2023 • 15 • 6

  • BEE-spoke-data/mega-ar-126m-4k

    Text Generation • Updated Jan 28, 2024 • 2.81k • 4

  • BEE-spoke-data/verysmol_llama-v11-KIx2

    Text Generation • Updated Jan 10, 2024 • 11 • 4

  • pszemraj/pythia-31m-KI_v1-2048-scratch

    Text Generation • Updated Nov 18, 2023 • 11

  • BEE-spoke-data/bert-plus-L8-4096-v1.0

    Fill-Mask • Updated Feb 14, 2024 • 3

  • BEE-spoke-data/mega-encoder-small-16k-v1

    Fill-Mask • Updated Mar 17, 2024 • 4 • 4

  • BEE-spoke-data/NanoLlama-GQA-L10-A32_KV8-v13-KI

    Text Generation • Updated Mar 4, 2024 • 12 • 2

    Note: this is a mid-training checkpoint of what is now smol_llama-220M


  • pszemraj/jamba-900M-v0.13-KIx2

    Text Generation • Updated May 18, 2024 • 20 • 4