title: Awesome Turkish Language Models
emoji: 😎
short_description: A curated list of Turkish AI models, datasets, papers
colorFrom: purple
colorTo: blue
pinned: false
sdk: static
awesome-turkish-language-models
A curated list of Turkish AI models, datasets, papers
The purpose of this repo is to share and spread information about Turkish AI models, datasets, and papers. Turkish resources are scarce and scattered across the web, and this repo aims to bring a curated selection of them together. This is a selection, not a list of all Turkish NLP/LLM models or datasets, so not every BERT- or LLaMA-based model will make it here. The same applies to low-quality Google Translate translations of datasets. We aim for each entry to have something unique about it: model performance, uniqueness of the task, highlighting the groups/companies behind it (not everyone shares their work, so why not appreciate it!), etc. If you want to add anything, you are welcome to :smirk:. Please check out the Contributing section.
Table of Contents
Models
LLMs
- ytu-ce-cosmos/Turkish-Llama
- Trendyol/Llama-3-Trendyol-LLM-8b-chat-v2.0
- TURKCELL/Turkcell-LLM-7b-v1
- KOCDIGITAL/Kocdigital-LLM-8b-v0.1
- WiroAI/OpenR1-Qwen-7B-Turkish Reasoning model
- WiroAI/wiroai-turkish-llm-9b
VLMs
NLP
- Trendyol/tybert
- Trendyol/tyroberta
- ytu-ce-cosmos/turkish-base-bert-uncased
- ytu-ce-cosmos/turkish-colbert
- ytu-ce-cosmos/turkish-gpt2-large
- dbmdz/bert-base-turkish-128k-uncased
- TURKCELL/bert-offensive-lang-detection-tr
- asafaya/kanarya-2b
- boun-tabi-LMG/TURNA
- Helsinki-NLP group Lots of translation models for Turkish
- VRLLab/TurkishBERTweet Tweet sentiment analysis
- akdeniz27/bert-base-turkish-cased-ner
Speech models
To be added
Multi-modal models
- kesimeg/lora-turkish-clip CLIP model fine-tuned on a Turkish dataset
Datasets
Text only
- merve/turkish_instructions Instruction tuning dataset
- BrewInteractive/alpaca-tr Instruction tuning dataset
- AYueksel/TurkishMMLU
- Metin/WikiRAG-TR
- MBZUAI/Bactrian-X
- alibayram/turkish_mmlu
- Helsinki-NLP group Lots of translation datasets for Turkish
- ytu-ce-cosmos/gsm8k_tr
- turkish-nlp-suite/turkish-wikiNER
- turkish-nlp-suite/InstrucTurca
- WiroAI/dolphin-r1-turkish Reasoning dataset
- allenai/c4 Web scrape
- HPLT/HPLT2.0_cleaned Web scrape
- unimelb-nlp/wikiann NER
- TUR2SQL Text to SQL query dataset
Text & Images
- ytu-ce-cosmos/Turkish-LLaVA-Finetune
- ytu-ce-cosmos/Turkish-LLaVA-Pretrain
- ytu-ce-cosmos/turkce-kitap
- 99eren99/LLaVA1.5-Data-Turkish
- TasvirEt
- Cohere For AI Has various datasets for VLM benchmarking
Text & Speech
- mozilla-foundation/common_voice_17_0 This dataset also has older versions (v16, v15, etc.)
Papers
- Cosmos-LLaVA: Chatting with the Visual
- Introducing cosmosGPT: Monolingual Training for Turkish Language Models
- TurkishMMLU: Measuring Massive Multitask Language Understanding in Turkish
- TURSpider: A Turkish Text-to-SQL Dataset and LLM-Based Study
Benchmarks
- malhajar/OpenLLMTurkishLeaderboard_v0.2
- KUIS-AI/Cetvel
- kesimeg/Turkish-rewardbench Reward model comparison
Tutorials and Codes
To be added
Tools and APIs
- Glosbe
- Wiktionary
- Zemberek Some Turkish NLP tools
- 3rt4nm4n/turkish-apis A list of Turkish APIs
State of AI in Türkiye
- KUIS-AI YouTube channel
- TR-AI YouTube channel
- Trendyol Tech YouTube channel Has videos related to their AI products
Miscellaneous
- Mukayese: Turkish NLP Strikes Back
- Mukayese GitHub repo
- Wikipedia dumps Can be used as a dataset
Contributing
If you have anything to add, just make a pull request! Before opening one, please consider whether the model, dataset, etc. has enough quality or uniqueness. Hugging Face is crowded with fine-tunes of LLaMA and BERT, and the same applies to datasets: many have multiple machine-translated versions. This makes it hard to find good-quality sources. We want to keep this list as curated as possible while still covering enough sources.