    Apple Says Claude, DeepSeek-R1, and o3-mini Can’t Really Reason

    By AI Logic News | June 9, 2025 | 6 min read

    AI critic Gary Marcus is smiling again, thanks to Apple. 

    In a new paper titled The Illusion of Thinking, researchers from the Cupertino-based company argue that even the most advanced AI models, including the so-called large reasoning models (LRMs), don’t actually think. Instead, they simulate reasoning without truly understanding or solving complex problems.

    The paper, released just ahead of Apple’s Worldwide Developers Conference, tested leading AI models, including OpenAI’s o1/o3, DeepSeek-R1, Claude 3.7 Sonnet Thinking, and Gemini Thinking, using specially designed algorithmic puzzle environments rather than standard benchmarks.

    The researchers argue that traditional benchmarks, like math and coding tests, are flawed due to “data contamination” and fail to reveal how these models actually “think”.

    “We show that state-of-the-art LRMs still fail to develop generalisable problem-solving capabilities, with accuracy ultimately collapsing to zero beyond certain complexities across different environments,” the paper noted.

    Interestingly, one of the authors of the paper is Samy Bengio, the brother of Turing Award winner Yoshua Bengio. Yoshua recently launched LawZero, a Canada-based nonprofit AI safety lab working on building systems that prioritise truthfulness, safety, and ethical behaviour over commercial interests. 

    The lab has secured around $30 million in initial funding from prominent backers, including former Google CEO Eric Schmidt’s philanthropic organisation, Skype co-founder Jaan Tallinn, Open Philanthropy, and the Future of Life Institute.

    Backing the paper’s claims, Marcus could not contain his excitement. “AI is not hitting a wall. But LLMs probably are (or at least a point of diminishing returns). We need new approaches, and to diversify the which roads are being actively explored.”

    “I don’t think LLMs are a good way to get there [AGI]. They might be part of the answer, but I don’t think they are the whole answer,” Marcus said in a previous interaction with AIM, stressing that LLMs are not “useless”. He also expressed optimism about AGI, describing it as a machine capable of approaching new problems with the flexibility and resourcefulness of a smart human being. “I think we’ll see it someday,” he added.

    Taking a more balanced view, Ethan Mollick, professor at The Wharton School, said in a post on X, “I think the Apple paper on the limits of reasoning models in particular tests is useful & important, but the ‘LLMs are hitting a wall’ narrative on X around it feels premature at best. Reminds me of the buzz over model collapse—limitations that were overcome quickly in practice.”

    He added that the current approach to reasoning likely has real limitations for a variety of reasons. However, the reasoning approaches themselves were made public less than a year ago. “There are just a lot of approaches that might overcome these issues. Or they may not. It’s just very early.”

    Hemanth Mohapatra, partner at Lightspeed India, said that the Apple paper, which shows reasoning models struggling with complex problems, confirms what many experts, like Yann LeCun, have long sensed. He acknowledged that while a new direction is necessary, current AI capabilities still promise significant productivity gains.

    “We do need a different hill to climb, but that doesn’t mean existing capabilities won’t have huge impact on productivity,” he said.

    Meanwhile, Subbarao Kambhampati, professor at Arizona State University, who has been pretty vocal about LLMs’ inability to reason and think, quipped that another advantage of being a university researcher in AI is, “You don’t have to deal with either the amplification or the backlash as a surrogate for ‘The Company’. Your research is just your research, fwiw.”

    How the Models Were Tested

    Instead of relying on familiar benchmarks, Apple’s team used controlled puzzle environments, such as variants of the Tower of Hanoi, to precisely manipulate problem complexity and observe how models generate step-by-step “reasoning traces”. This allowed them to see not just the final answer, but the process the model used to get there.
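
    A puzzle environment of this kind can be sketched in a few lines. The example below is a minimal illustration, not Apple's actual test harness: the function name `check_hanoi_trace`, the peg numbering (0 to 2), and the move format `(source, destination)` are all assumptions made for this sketch. It replays a proposed move list for a Tower of Hanoi instance, where `num_disks` is the single dial that sets problem complexity, and reports the first illegal step, so the whole trace is checked rather than only the final answer.

```python
from typing import List, Tuple

# Hypothetical move format: (source peg, destination peg), pegs numbered 0..2.
Move = Tuple[int, int]


def check_hanoi_trace(num_disks: int, moves: List[Move]) -> Tuple[bool, str]:
    """Replay a proposed move list and report whether it solves the puzzle."""
    # Peg 0 starts with every disk; the largest (num_disks) sits at the bottom.
    pegs = [list(range(num_disks, 0, -1)), [], []]

    for step, (src, dst) in enumerate(moves, start=1):
        if src not in (0, 1, 2) or dst not in (0, 1, 2) or src == dst:
            return False, f"step {step}: invalid pegs {src}->{dst}"
        if not pegs[src]:
            return False, f"step {step}: peg {src} is empty"
        disk = pegs[src][-1]
        # A disk may never be placed on top of a smaller one.
        if pegs[dst] and pegs[dst][-1] < disk:
            return False, f"step {step}: disk {disk} placed on smaller disk"
        pegs[dst].append(pegs[src].pop())

    # Goal state: all disks stacked on the last peg.
    if len(pegs[2]) == num_disks:
        return True, "solved"
    return False, "legal moves, but goal state not reached"
```

    Because every intermediate state is simulated, a checker like this can tell a trace that breaks the rules midway apart from one that is legal but stops short of the goal.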

    The paper found that for simpler problems, non-reasoning models often outperformed more advanced LRMs, which tended to “overthink” and miss the correct answer. 

    As the difficulty level rose to moderate, the reasoning models showed their strength, successfully following more intricate logical steps. However, when faced with truly complex puzzles, all models, regardless of their architecture, struggled and ultimately failed. 

    Rather than putting in more effort, the AI responses grew shorter and less thoughtful, as if the models were giving up. 

    While large language models continue to struggle with complex reasoning, that doesn’t make them useless. 

    Abacus.AI CEO Bindu Reddy pointed out on X that many people are misinterpreting the paper as proof that LLMs don’t work. “All this paper is saying is LLMs can’t solve arbitrarily hard problems yet,” she said, adding that they’re already handling tasks beyond the capabilities of most humans.

    Why Does This Happen?

    The researchers suggest that what appears to be reasoning is often just the retrieval and adaptation of memorised solution templates from training data, not genuine logical deduction. 

    When confronted with unfamiliar and highly complex problems, the models’ reasoning abilities tend to collapse almost immediately, exposing the apparent reasoning as an illusion of thought.

    The study makes it clear that current large language models are still far from being true general-purpose reasoners. Their ability to handle reasoning tasks does not extend beyond a certain level of complexity, and even supplying them with the correct algorithm results in only minor improvements.
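
    For a puzzle such as the Tower of Hanoi, the correct algorithm is short. The sketch below shows the standard textbook recursive solution, which is not necessarily the form the researchers used; the function name `solve_hanoi` and the peg numbering are assumptions for illustration. It also shows why adding disks raises complexity so sharply: the shortest solution for n disks takes 2^n - 1 moves, so each extra disk roughly doubles the length of the sequence a model has to carry out without a single slip.

```python
from typing import List, Tuple


def solve_hanoi(num_disks: int, src: int = 0, dst: int = 2,
                aux: int = 1) -> List[Tuple[int, int]]:
    """Return the shortest move list as (source peg, destination peg) pairs."""
    if num_disks == 0:
        return []
    # Move the top n-1 disks out of the way, move the largest disk,
    # then move the n-1 disks back on top of it.
    return (solve_hanoi(num_disks - 1, src, aux, dst)
            + [(src, dst)]
            + solve_hanoi(num_disks - 1, aux, dst, src))


if __name__ == "__main__":
    for n in (3, 7, 10):
        # Prints 7, 127 and 1023 moves respectively (2**n - 1).
        print(n, len(solve_hanoi(n)))
```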

    Cover-Up for Siri’s Failure?

    Andrew White, co-founder of FutureHouse, questioned Apple’s approach, saying that its AI researchers seem to have adopted an “anti-LLM cynic ethos” by repeatedly publishing papers that argue reasoning LLMs are fundamentally limited and lack generalisation ability. He pointed out the irony, saying Apple has “the worst AI products” like Siri and Apple Intelligence, and admitted he has no idea what their actual strategy is.

    What This Means for the Future

    Apple’s research serves as a cautionary message for AI developers and users alike. While today’s chatbots and reasoning models appear impressive, their core abilities remain limited. As the paper puts it, “despite sophisticated self-reflection mechanisms, these models fail to develop generalizable reasoning capabilities beyond certain complexity thresholds.”

    “We need models that can represent and manipulate abstract structures, not just predict tokens. Hybrid systems that combine LLMs with symbolic logic, memory modules, or algorithmic planners are showing early promise. These aren’t just add-ons — they reshape how the system thinks,” said Pradeep Sanyal, AI and data leader at a global tech consulting firm, in a LinkedIn post.

    He further added that combining neural and symbolic parts isn’t without drawbacks. It introduces added complexity around coordination, latency, and debugging. But the improvements in precision and transparency make it a direction worth exploring.

    © 2026 Lee Enterprises
