    DeepSeek tests “sparse attention” to slash AI processing costs

    By AI Logic News | October 1, 2025 | 2 Mins Read

    The attention bottleneck

    In AI, “attention” is a term for a software technique that determines which words in a text are most relevant to understanding each other. Those relationships map out context, and context builds meaning in language. For example, in the sentence “The bank raised interest rates,” attention helps the model establish that “bank” relates to “interest rates” in a financial context, not a riverbank context. Through attention, conceptual relationships become quantified as numbers stored in a neural network. Attention also governs how AI language models choose what information “matters most” when generating each word of their response.
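As a rough illustration of how such relationships become numbers, here is a minimal sketch of attention weights for the example sentence. The two-dimensional vectors are made up for illustration, and the same vector serves as both query and key for each word; a real model learns much larger vectors and separate query and key projections.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys):
    """Scaled dot-product scores between one query and every key."""
    d = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
        for key in keys
    ]
    return softmax(scores)

# Hypothetical 2-d vectors, chosen by hand for this example only.
tokens = ["The", "bank", "raised", "interest", "rates"]
vectors = {
    "The": [0.1, 0.0],
    "bank": [1.0, 0.2],
    "raised": [0.3, 0.1],
    "interest": [0.9, 0.4],
    "rates": [0.8, 0.3],
}

keys = [vectors[t] for t in tokens]
weights = attention_weights(vectors["bank"], keys)

for token, weight in zip(tokens, weights):
    print(f"bank -> {token:<8} {weight:.3f}")
```

With these toy vectors, "bank" gives more weight to "interest" and "rates" than to "The" or "raised", which is the kind of relationship the paragraph above describes.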

    Calculating context with a machine is tricky, and it wasn’t practical at scale until chips like GPUs, which can calculate these relationships in parallel, reached a certain level of capability. Even so, the original Transformer architecture from 2017 took a brute-force approach, checking the relationship of each word in a prompt with every other word. So if you fed a 1,000-word prompt into the AI model, it resulted in 1,000 × 1,000 comparisons, or 1 million relationships to compute. With 10,000 words, that becomes 100 million relationships. The cost grows quadratically with the length of the input, which creates a fundamental bottleneck for processing long conversations.
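The arithmetic is easy to check. This short sketch counts comparisons the way the article does, as every word paired with every word; the function name is ours, not part of any library.

```python
def pairwise_comparisons(n_tokens):
    """Comparisons when every token is checked against every token."""
    return n_tokens * n_tokens

for n in (1_000, 10_000, 100_000):
    print(f"{n:>7,} tokens -> {pairwise_comparisons(n):>14,} comparisons")
```

Each tenfold increase in input length multiplies the work by a hundred.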

    Although it’s likely that OpenAI uses some sparse attention techniques in GPT-5, long conversations still suffer performance penalties. Every time you submit a new message to ChatGPT, the AI model at its core processes context comparisons for the entire conversation history all over again.
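To see how that adds up over a conversation, here is a simplified sketch that takes the description above at face value: each turn adds a fixed number of tokens, and the full history is compared against itself every time. The figures are hypothetical, and production systems may cache earlier work, so treat this as an upper-bound illustration rather than a description of how ChatGPT operates.

```python
def total_comparisons(tokens_per_turn, turns):
    """Cumulative comparisons if the whole history is reprocessed each turn."""
    total = 0
    history = 0
    for _ in range(turns):
        history += tokens_per_turn   # the conversation grows
        total += history * history   # full quadratic pass over the history
    return total

# Hypothetical conversation: 100 tokens added per turn.
for turns in (3, 10, 50):
    print(f"{turns:>2} turns -> {total_comparisons(100, turns):,} comparisons")
```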

    Of course, the researchers behind the original Transformer model designed it for machine translation with relatively short sequences (maybe a few hundred tokens, which are chunks of data that represent words), where quadratic attention was manageable. It’s when people started scaling to thousands or tens of thousands of tokens that the quadratic cost became prohibitive.

    ailogicnews.ai
    © 2026 Lee Enterprises
