Qwen

Enterprise

company

https://qwen.ai/

alibaba_qwen

QwenLM

Activity Feed Request to join this org

AI & ML interests

None defined yet.

Recent Activity

jklj077 new activity about 1 hour ago

Qwen/Qwen3-30B-A3B:KeyError: 'rope_type'

jklj077 new activity about 1 hour ago

Qwen/Qwen3-30B-A3B:KeyError: 'rope_type'

jklj077 new activity about 1 hour ago

Qwen/Qwen3-30B-A3B:Update README.md

View all activity

Qwen's activity

jklj077

in Qwen/Qwen3-30B-A3B about 1 hour ago

KeyError: 'rope_type'

#4 opened 1 day ago by

yuchenxie

KeyError: 'rope_type'

#7 opened about 12 hours ago by

Rolfie33

Update README.md

#11 opened about 9 hours ago by

yuchenxie

jklj077

in Qwen/Qwen3-235B-A22B-FP8 about 2 hours ago

is this W8A16 or W8A8?

#3 opened about 3 hours ago by

ehartford

jklj077

in Qwen/Qwen3-32B-FP8 about 2 hours ago

Remove vLLM FP8 Limitation

#3 opened 1 day ago by

simon-mo

jklj077

in Qwen/Qwen3-8B-FP8 about 2 hours ago

Remove vLLM FP8 Limitation

#2 opened 1 day ago by

simon-mo

jklj077

in Qwen/Qwen3-1.7B-FP8 about 2 hours ago

Remove vLLM FP8 Limitation

#2 opened 1 day ago by

simon-mo

jklj077

in Qwen/Qwen3-14B-FP8 about 2 hours ago

Remove vLLM FP8 Limitation

#2 opened 1 day ago by

simon-mo

jklj077

in Qwen/Qwen3-4B-FP8 about 2 hours ago

Remove vLLM FP8 Limitation

#2 opened 1 day ago by

simon-mo

jklj077

in Qwen/Qwen3-0.6B-FP8 about 2 hours ago

Remove vLLM FP8 Limitation

#3 opened 1 day ago by

simon-mo

jklj077

in Qwen/Qwen3-30B-A3B-FP8 about 2 hours ago

Remove vLLM FP8 Limitation

#2 opened 1 day ago by

simon-mo

jklj077

in Qwen/Qwen3-235B-A22B-FP8 about 2 hours ago

Remove vLLM FP8 Limitation

#2 opened 1 day ago by

simon-mo

ehartford

in Qwen/Qwen3-235B-A22B-FP8 about 3 hours ago

is this W8A16 or W8A8?

#3 opened about 3 hours ago by

ehartford

KnutJaegersberg

posted an update about 10 hours ago

Post

502

Mining LLM Pretraining Data: Topics, Skills, and Cognitive Patterns

Summary
The technical blog post details an analysis of pretraining data from various Large Language Models (LLMs) like GPT-2, Falcon, and Gemma2. Using text mining techniques including embeddings, clustering, and LLM-based annotation on datasets like OpenWebText, The Pile, and C4, the study identified key patterns.

Findings show the data is dominated by topics like Technology, Politics, Health, Business, and Culture, originating from diverse sources including web scrapes, academic papers, code repositories, and news media. The data reflects the work of professionals primarily in Journalism/Media, Content Creation, Analysis/Research, Academia, and Tech/Engineering. Consequently, LLMs learn corresponding skills (e.g., Research, Critical Thinking, Communication, Domain Expertise) and task representations (e.g., Analysis, Content Creation, Compliance).

The analysis also uncovered distinct writing styles, underlying cognitive frameworks (beliefs, frames, schemas, memes), and common cognitive biases (like Confirmation Bias) embedded in the data. LLM capability progression appears linked to data scale and task frequency, following a power law. The study concludes that LLMs are powerful data-driven simulators whose capabilities and limitations are shaped by the composition and inherent biases of their pretraining corpora, highlighting the importance of data understanding and curation.

https://huggingface.co/blog/KnutJaegersberg/mining-llm-pretraining-data

1 reply

yangapku

updated 6 models about 21 hours ago

AI & ML interests

Recent Activity

Team members 137

Qwen's activity

KeyError: 'rope_type'

KeyError: 'rope_type'

Update README.md

is this W8A16 or W8A8?

Remove vLLM FP8 Limitation

Remove vLLM FP8 Limitation

Remove vLLM FP8 Limitation

Remove vLLM FP8 Limitation

Remove vLLM FP8 Limitation

Remove vLLM FP8 Limitation

Remove vLLM FP8 Limitation

Remove vLLM FP8 Limitation

is this W8A16 or W8A8?