Ksenia Se

Kseniase

AI & ML interests

None yet

Recent Activity

replied to their post about 1 hour ago
posted an update about 1 hour ago
15 types of attention mechanisms

Attention mechanisms allow models to dynamically focus on specific parts of their input when performing tasks. In our recent article, we discussed Multi-Head Latent Attention (MLA) in detail, and now it's time to summarize the other existing types of attention. Here is a list of 15 types of attention mechanisms used in AI models:

1. Soft attention (Deterministic attention) -> https://huggingface.co/papers/1409.0473
Assigns a continuous weight distribution over all parts of the input, producing a weighted sum of the input with attention weights that sum to 1.

2. Hard attention (Stochastic attention) -> https://huggingface.co/papers/1508.04025
Makes a discrete selection of one part of the input to focus on at each step, rather than attending to everything.

3. Self-attention -> https://huggingface.co/papers/1706.03762
Each element in the sequence "looks" at the other elements and "decides" how much to borrow from each of them for its new representation.

4. Cross-Attention (Encoder-Decoder attention) -> https://huggingface.co/papers/2104.08771
The queries come from one sequence while the keys/values come from another, letting the model combine information from two different sources.

5. Multi-Head Attention (MHA) -> https://huggingface.co/papers/1706.03762
Runs multiple attention "heads" in parallel: the model computes several attention distributions, each with its own set of learned projections for queries, keys, and values (see the code sketch after this list).

6. Multi-Head Latent Attention (MLA) -> https://huggingface.co/papers/2405.04434
Extends MHA by incorporating a latent space where attention heads can dynamically learn different latent factors or representations.

7. Memory-Based attention -> https://huggingface.co/papers/1503.08895
Involves an external memory and uses attention to read from and write to this memory.

See other types in the comments 👇
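To make a few of these concrete, below is a minimal NumPy sketch of scaled dot-product self-attention (item 3), cross-attention (item 4), and a multi-head wrapper (item 5). This is an illustrative toy under simplifying assumptions: the function names and shapes are our own, the weights are random rather than learned, and masking, batching, and dropout are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention (item 3).
    X: (seq_len, d_model); Wq, Wk, Wv: (d_model, d_head)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # (seq_len, seq_len) similarities
    weights = softmax(scores, axis=-1)        # soft attention: each row sums to 1 (item 1)
    return weights @ V                        # weighted sum of the value vectors

def cross_attention(X, Y, Wq, Wk, Wv):
    """Cross-attention (item 4): queries come from X, keys/values from Y."""
    Q, K, V = X @ Wq, Y @ Wk, Y @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    return softmax(scores, axis=-1) @ V

def multi_head_attention(X, heads, Wo):
    """Multi-head attention (item 5): run several heads in parallel, each with
    its own projections, then concatenate the outputs and mix them with Wo."""
    outs = [self_attention(X, Wq, Wk, Wv) for Wq, Wk, Wv in heads]
    return np.concatenate(outs, axis=-1) @ Wo  # (seq_len, d_model)

# Toy usage with random weights (in a real model these are learned).
rng = np.random.default_rng(0)
d_model, d_head, n_heads, seq_len = 16, 4, 4, 5
X = rng.normal(size=(seq_len, d_model))
heads = [tuple(rng.normal(size=(d_model, d_head)) for _ in range(3))
         for _ in range(n_heads)]
Wo = rng.normal(size=(n_heads * d_head, d_model))
print(multi_head_attention(X, heads, Wo).shape)  # (5, 16)
```

Note how the softmax row in self_attention is exactly the soft attention of item 1: every input position gets a nonzero weight, the weights sum to 1, and the output is the corresponding weighted sum of the values.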

Organizations

Turing Post · Journalists on Hugging Face · Social Post Explorers · Hugging Face Discord Community · Sandbox

Kseniase's activity

published an article 3 days ago
How to Reduce Memory Use in Reasoning Models
By Kseniase and 1 other · 8 upvotes

published an article 6 days ago
🌁#90: Why AI’s Reasoning Tests Keep Failing Us
By Kseniase · 9 upvotes

published an article 6 days ago
🦸🏻#13: Action! How AI Agents Execute Tasks with UI and API Tools
By Kseniase · 4 upvotes

published an article 7 days ago
🦸🏻#12: How Do Agents Learn from Their Own Mistakes? The Role of Reflection in AI
By Kseniase · 5 upvotes

published an article 10 days ago
Everything You Need to Know about Knowledge Distillation
By Kseniase and 1 other · 18 upvotes

published an article 20 days ago
🌁#89: AI in Action: How AI Engineers, Self-Optimizing Models, and Humanoid Robots Are Reshaping 2025
By Kseniase · 4 upvotes

published an article 27 days ago
🌁#88: Can DeepSeek Inspire Global Collaboration?
By Kseniase · 3 upvotes

published an article 29 days ago
🦸🏻#10: Does Present-Day GenAI Actually Reason?
By Kseniase · 7 upvotes

published an article about 1 month ago
Topic 27: What are Chain-of-Agents and Chain-of-RAG?
By Kseniase and 1 other · 12 upvotes

published an article about 1 month ago
What is test-time compute and how to scale it?
By Kseniase and 1 other · 54 upvotes

published an article about 1 month ago
🦸🏻#9: Does AI Remember? The Role of Memory in Agentic Workflows
By Kseniase · 15 upvotes

published an article about 1 month ago
🦸🏻#8: Rewriting the Rules of Knowledge: How Modern Agents Learn to Adapt
By Kseniase · 5 upvotes

published an article about 2 months ago
🅰️ℹ️ 1️⃣0️⃣1️⃣ The Keys to Prompt Optimization
By Kseniase and 1 other · 4 upvotes

published an article about 2 months ago
🌁#85: Curiosity, Open Source, and Timing: The Formula Behind DeepSeek’s Phenomenal Success
By Kseniase · 6 upvotes