Ksenia Se

Kseniase

AI & ML interests

None yet

Recent Activity

replied to their post about 1 hour ago
posted an update about 1 hour ago
15 types of attention mechanisms

Attention mechanisms allow models to dynamically focus on specific parts of their input when performing tasks. In our recent article, we discussed Multi-Head Latent Attention (MLA) in detail, and now it's time to summarize the other existing types of attention. Here is a list of 15 types of attention mechanisms used in AI models:

1. Soft attention (Deterministic attention) -> https://huggingface.co/papers/1409.0473
Assigns a continuous weight distribution over all parts of the input, producing a weighted sum of the input with attention weights that sum to 1.

2. Hard attention (Stochastic attention) -> https://huggingface.co/papers/1508.04025
Makes a discrete selection of one part of the input to focus on at each step, rather than attending to everything.

3. Self-attention -> https://huggingface.co/papers/1706.03762
Each element in the sequence "looks" at the other elements and "decides" how much to borrow from each of them for its new representation.

4. Cross-Attention (Encoder-Decoder attention) -> https://huggingface.co/papers/2104.08771
The queries come from one sequence while the keys/values come from another, letting the model combine information from two different sources.

5. Multi-Head Attention (MHA) -> https://huggingface.co/papers/1706.03762
Runs multiple attention "heads" in parallel: the model computes several attention distributions, each with its own set of learned projections for queries, keys, and values (see the code sketch after this list).

6. Multi-Head Latent Attention (MLA) -> https://huggingface.co/papers/2405.04434
Extends MHA by incorporating a latent space where attention heads can dynamically learn different latent factors or representations.

7. Memory-Based attention -> https://huggingface.co/papers/1503.08895
Involves an external memory and uses attention to read from and write to this memory.

See other types in the comments 👇
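To make a few of these concrete, below is a minimal NumPy sketch of scaled dot-product self-attention (item 3), cross-attention (item 4), and a multi-head wrapper (item 5). This is an illustrative toy under simplifying assumptions: the function names and shapes are our own, the weights are random rather than learned, and masking, batching, and dropout are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention (item 3).
    X: (seq_len, d_model); Wq, Wk, Wv: (d_model, d_head)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # (seq_len, seq_len) similarities
    weights = softmax(scores, axis=-1)        # soft attention: each row sums to 1 (item 1)
    return weights @ V                        # weighted sum of the value vectors

def cross_attention(X, Y, Wq, Wk, Wv):
    """Cross-attention (item 4): queries come from X, keys/values from Y."""
    Q, K, V = X @ Wq, Y @ Wk, Y @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    return softmax(scores, axis=-1) @ V

def multi_head_attention(X, heads, Wo):
    """Multi-head attention (item 5): run several heads in parallel, each with
    its own projections, then concatenate the outputs and mix them with Wo."""
    outs = [self_attention(X, Wq, Wk, Wv) for Wq, Wk, Wv in heads]
    return np.concatenate(outs, axis=-1) @ Wo  # (seq_len, d_model)

# Toy usage with random weights (in a real model these are learned).
rng = np.random.default_rng(0)
d_model, d_head, n_heads, seq_len = 16, 4, 4, 5
X = rng.normal(size=(seq_len, d_model))
heads = [tuple(rng.normal(size=(d_model, d_head)) for _ in range(3))
         for _ in range(n_heads)]
Wo = rng.normal(size=(n_heads * d_head, d_model))
print(multi_head_attention(X, heads, Wo).shape)  # (5, 16)
```

Note how the softmax row in self_attention is exactly the soft attention of item 1: every input position gets a nonzero weight, the weights sum to 1, and the output is the corresponding weighted sum of the values.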

Organizations

Turing Post · Journalists on Hugging Face · Social Post Explorers · Hugging Face Discord Community · Sandbox

Kseniase's activity

published an article 3 days ago
How to Reduce Memory Use in Reasoning Models
By Kseniase and 1 other · 8 upvotes

published an article 6 days ago
🌁#90: Why AI’s Reasoning Tests Keep Failing Us
By Kseniase · 9 upvotes

published an article 6 days ago
🦸🏻#13: Action! How AI Agents Execute Tasks with UI and API Tools
By Kseniase · 4 upvotes

published an article 7 days ago
🦸🏻#12: How Do Agents Learn from Their Own Mistakes? The Role of Reflection in AI
By Kseniase · 5 upvotes

published an article 10 days ago
Everything You Need to Know about Knowledge Distillation
By Kseniase and 1 other · 18 upvotes

published an article 20 days ago
🌁#89: AI in Action: How AI Engineers, Self-Optimizing Models, and Humanoid Robots Are Reshaping 2025
By Kseniase · 4 upvotes

published an article 27 days ago
🌁#88: Can DeepSeek Inspire Global Collaboration?
By Kseniase · 3 upvotes

published an article 29 days ago
🦸🏻#10: Does Present-Day GenAI Actually Reason?
By Kseniase · 7 upvotes

published an article about 1 month ago
Topic 27: What are Chain-of-Agents and Chain-of-RAG?
By Kseniase and 1 other · 12 upvotes

published an article about 1 month ago
What is test-time compute and how to scale it?
By Kseniase and 1 other · 54 upvotes

published an article about 1 month ago
🦸🏻#9: Does AI Remember? The Role of Memory in Agentic Workflows
By Kseniase · 15 upvotes

published an article about 1 month ago
🦸🏻#8: Rewriting the Rules of Knowledge: How Modern Agents Learn to Adapt
By Kseniase · 5 upvotes

published an article about 2 months ago
🅰️ℹ️ 1️⃣0️⃣1️⃣ The Keys to Prompt Optimization
By Kseniase and 1 other · 4 upvotes

published an article about 2 months ago
🌁#85: Curiosity, Open Source, and Timing: The Formula Behind DeepSeek’s Phenomenal Success
By Kseniase · 6 upvotes