While publicly cautioning about “irrationality” in the AI market, Google leadership has privately ordered an aggressive infrastructure expansion to double AI serving capacity every six months.
According to a presentation viewed by CNBC this week, the directive targets a 1,000-fold scale increase within five years to support the compute-heavy “age of inference.”
Delivered by infrastructure VP Amin Vahdat, this internal mandate starkly contrasts with CEO Sundar Pichai’s recent comments in a BBC interview about a potential bubble.
Driven by the existential fear of underinvesting, the strategy relies on custom silicon such as Google’s Ironwood TPU chips to prevent costs from spiraling alongside the capacity growth.
The 1,000x Mandate: Inside Google’s War Room
Details emerging from the November 6 all-hands meeting paint a picture of a company operating on a wartime footing.
Infrastructure VP Amin Vahdat presented a roadmap titled “AI Infrastructure” that laid out the exponential growth requirements needed to keep pace with demand. The directive explicitly requires Google to double its AI serving capacity every six months to maintain its competitive position.
Long-term projections target a staggering 1,000-fold increase in capacity within just four to five years. Driving this acceleration is not model training, which has historically consumed the bulk of compute resources, but a fundamental shift to the “age of inference.”
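The arithmetic behind that roadmap is straightforward compounding: doubling every six months means ten doublings in five years, and 2^10 = 1,024, which lines up with the reported 1,000-fold target. A minimal sketch (the timeframes are the article’s; the function name is our own):

```python
# Compounding check for the reported roadmap: AI serving capacity
# doubles every 6 months, so 5 years = 10 doubling periods.
def capacity_multiple(years: float, doubling_period_years: float = 0.5) -> float:
    """Return the capacity multiple after `years` of steady doubling."""
    periods = years / doubling_period_years
    return 2 ** periods

print(capacity_multiple(4))  # 256.0  -> four years falls short of the target
print(capacity_multiple(5))  # 1024.0 -> five years matches the ~1,000x goal
```

This is why the projection is framed as “four to five years”: at four years the multiple is only 256x, and the 1,000x threshold is crossed just before the fifth year.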
Models like the recently launched Gemini 3 Pro require massive, continuous compute power to perform reasoning tasks and execute code.
Vahdat warned that “the competition in AI infrastructure is the most critical and also the most expensive part of the AI race.”
This sentiment was reinforced by CEO Sundar Pichai, who cited missed opportunities with the company’s video generation tool, Veo, due to hardware limitations. Pichai admitted that despite strong cloud growth, “those numbers would have been much better if we had more compute.”
Far from retrenching in the face of market skepticism, the internal tone frames 2026 as an “intense” year of “ups and downs.” Leadership’s message is clear: the primary constraint on growth is no longer software capability but the physical availability of compute.
The Silicon Shield: Ironwood, Axion, and the Efficiency Trap
Scaling capacity by 1,000x using off-the-shelf hardware would be financially ruinous. Google’s strategy hinges on decoupling performance gains from linear cost increases. Vahdat outlined the engineering requirement:
Google needs to “be able to deliver 1,000 times more capability, compute, storage networking for essentially the same cost and increasingly, the same power, the same energy level,” Vahdat said.
Underpinning this massive expansion is a simple but brutal economic reality: efficiency is the only path to sustainability. Reliance on the Ironwood TPU, which entered general availability just recently, is central to this strategy.
This seventh-generation chip claims a 10x peak performance improvement over the v5p and delivers 2x performance per watt compared to the previous generation, Trillium.
General-purpose workloads are being offloaded to the new Arm-based Axion CPUs to free up power and thermal headroom for AI tasks. By moving standard compute jobs to more efficient processors, Google aims to maximize the energy available for its power-hungry TPUs.
Adopting a “co-design” philosophy, engineers integrate software directly with hardware architecture. Research from Google DeepMind informs the chip design, allowing the company to squeeze out gains where standard hardware cannot. Vahdat noted that “it won’t be easy but through collaboration and co-design, we’re going to get there.”
Looming large, however, is the “efficiency trap.” Jevons paradox suggests that as compute becomes more efficient, demand will rise to consume the surplus, negating cost savings. Should the cost of inference drop, the volume of queries – driven by agentic workflows and “Deep Think” reasoning – is expected to explode, keeping total energy consumption high.
The Bubble Paradox: Betting Against ‘Irrationality’
This aggressive internal expansion proceeds amid growing external skepticism about the return on investment (ROI) for generative AI.
In an interview with the BBC, Pichai conceded there are “elements of irrationality” in the current market valuation of AI. Despite this public caution, Alphabet has raised its 2025 capital expenditure forecast to $93 billion, with a “significant increase” planned for 2026.
Employees directly challenged leadership on this disconnect during the Q&A session. One question specifically addressed the tension between soaring spending and the fear of a market correction:
“Amid significant AI investments and market talk of a potential AI bubble burst, how are we thinking about ensuring long-term sustainability and profitability if the AI market doesn’t mature as expected?”
Pichai’s defense rests on the company’s balance sheet. He argued: “We are better positioned to withstand, you know, misses, than other companies.”
The logic is defensive: the risk of underinvesting – and potentially becoming irrelevant – is existential, whereas overinvesting is merely expensive.
Such reasoning currently drives the ongoing AI Capex Boom, in which infrastructure buildouts have become detached from immediate revenue reality. Google is effectively betting that it can outlast competitors in a capital-intensive war of attrition.
Market Reality: The Prisoner’s Dilemma of AI
Collectively, the “Big Four” – Google, Microsoft, Amazon, and Meta – are projected to spend over $380 billion on infrastructure this year, according to figures cited by CNBC. Nvidia CEO Jensen Huang explicitly rejected the “bubble” narrative this week, citing tangible demand, a view Google must hedge against.
Competitor OpenAI is facing its own struggles. An internal memo from Sam Altman that surfaced this week suggests that the industry leader is increasingly grappling with the economic realities of scaling. This creates an opening for Google to leverage its vertical integration.
The “age of inference” shifts the bottleneck from data availability to pure token-generation speed and cost. Google’s specific advantage lies in its custom silicon stack, which could allow it to weather a margin-crushing price war better than rivals reliant solely on Nvidia hardware.
Recent product launches, such as Gemini 3 Pro and Gemini 3 Pro Image, drive this demand further. Ultimately, the outcome depends on whether premium features such as “Deep Think” and agentic workflows can generate revenue faster than the hardware depreciates.
Despite the staggering costs involved, Google so far appears committed to a “build it and they will come” strategy.

