Tuesday, June 23, 2026

The Great AI Cooling Crisis: Why Big Tech Is Drowning in Debt to Keep Chips From Melting



            

By Staff

The hyperscalers are betting hundreds of billions on AI infrastructure they may not be able to cool or afford.

The numbers are almost too large to comprehend. In 2025 alone, four companies (Google, Amazon, Microsoft, and Meta) will spend over $300 billion on capital expenditures, the vast majority of it on AI data centers. That's more than the GDP of Finland. It's more than the entire US federal highway budget. And nearly all of it is being financed with debt.

The problem isn't just the money. It's physics.

Every dollar of that spending produces heat (enormous, relentless, expensive-to-remove heat), and the industry is quietly panicking about how to get rid of it before its million-dollar silicon furnaces melt into slag.

A modern AI data center is, fundamentally, a space heater that happens to do mathematics. Every watt that enters a GPU emerges as thermal energy. An NVIDIA H100 draws approximately 700 watts. The B200 draws up to roughly 1,200 watts. Roadmap chips from NVIDIA's Rubin architecture are expected to hit 1,500 to 2,000 watts per socket: roughly the power draw of a household space heater, condensed into something the size of a paperback book.

At those densities, traditional air cooling becomes physically impractical. You would need hurricane-force winds moving through a 1U server chassis. The fans alone would consume more power than the compute they're supposed to enable.
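The air-cooling limit can be made concrete with a back-of-the-envelope heat balance: the airflow needed to carry away a heat load P with an air temperature rise of dT is roughly P / (density x specific heat x dT). The sketch below is illustrative only. The air properties are textbook approximations, and the chassis load, temperature rise, and open intake area are assumed values, not figures from any vendor.

```python
# Rough sizing of the airflow needed to air-cool a server.
# All inputs below are illustrative assumptions, not vendor data.

AIR_DENSITY_KG_M3 = 1.2      # approximate density of air near room temperature
AIR_CP_J_PER_KG_K = 1005.0   # approximate specific heat of air


def airflow_m3_per_s(heat_watts: float, delta_t_kelvin: float) -> float:
    """Volumetric airflow required to remove heat_watts at a given air temperature rise."""
    return heat_watts / (AIR_DENSITY_KG_M3 * AIR_CP_J_PER_KG_K * delta_t_kelvin)


def face_velocity_m_per_s(airflow: float, open_area_m2: float) -> float:
    """Average air speed through the chassis intake for a given open area."""
    return airflow / open_area_m2


if __name__ == "__main__":
    # Hypothetical 1U chassis: four 2,000 W sockets, a 15 K air temperature rise,
    # and about 0.01 m^2 of unobstructed intake area (all assumed).
    flow = airflow_m3_per_s(heat_watts=4 * 2000, delta_t_kelvin=15.0)
    speed = face_velocity_m_per_s(flow, open_area_m2=0.01)
    print(f"airflow: {flow:.2f} m^3/s, intake velocity: {speed:.0f} m/s")
```

Under these assumed numbers the intake velocity comes out above 40 meters per second; hurricane-force wind starts at about 33 meters per second. Different assumptions about chassis size or allowable temperature rise would move the result considerably.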

"The industry is building cathedrals for a religion that hasn't been invented yet," said one data center engineer at a major hyperscaler who spoke on condition of anonymity. "We're designing cooling systems for chips we haven't seen, using fluid dynamics models that barely keep up with the hardware roadmaps. It's terrifying."

The mismatch between infrastructure timelines and hardware evolution is stark. A data center building lasts 30 years. The cooling system inside it may be obsolete in five. The chips it houses are superseded in two.

Yet the hyperscalers are pouring concrete at an unprecedented pace. Microsoft has committed over $80 billion. Amazon's AWS division crossed the $100 billion threshold. Google parent Alphabet is spending roughly 20% of its annual revenue on AI infrastructure. Meta, not to be left behind, has earmarked upwards of $60 billion.

These aren't experimental bets. They're existential ones.

Google faces the disintermediation of its search business by AI chatbots that bypass advertising entirely. Amazon's AWS profit engine, which generates more operating income than the entire retail operation, could erode if AI workloads migrate to competitors with tighter OpenAI integrations or proprietary silicon. Microsoft is betting that its OpenAI partnership and enterprise distribution will let it capture the lion's share of the coming AI productivity boom.

All of them are terrified of being the Blockbuster of artificial intelligence. So they spend. And spend. And spend.

The desperation to solve the thermal problem has spawned a bewildering array of technologies, each with its own trade-offs and none fully mature.

Direct-to-chip liquid cooling runs coolant through metal cold plates mounted directly on processors. It removes 60 to 80 percent of server heat at the source. Companies like CoolIT and Asetek have ridden this wave to rapid growth, but the approach still requires supplementary air cooling for memory, voltage regulators, and storage, and retrofitting existing data centers for liquid plumbing is a nightmare.
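The 60 to 80 percent capture figure implies that a meaningful heat load still lands on the air system. A minimal sketch of that split follows; the 100 kW rack is a hypothetical example, and the function name is made up for illustration.

```python
# Split a rack's heat load between the liquid loop and the residual air system.
# The rack size used in the example is an assumption, not a quoted figure.

def split_heat_load(rack_watts: float, liquid_capture_fraction: float) -> tuple[float, float]:
    """Return (watts removed by cold plates, watts left for air cooling)."""
    if not 0.0 <= liquid_capture_fraction <= 1.0:
        raise ValueError("liquid_capture_fraction must be between 0 and 1")
    liquid_watts = rack_watts * liquid_capture_fraction
    air_watts = rack_watts - liquid_watts
    return liquid_watts, air_watts


if __name__ == "__main__":
    # Capture fractions taken from the 60 to 80 percent range cited in the article.
    for fraction in (0.6, 0.8):
        liquid, air = split_heat_load(rack_watts=100_000, liquid_capture_fraction=fraction)
        print(f"capture {fraction:.0%}: liquid {liquid / 1000:.0f} kW, air {air / 1000:.0f} kW")
```

For a hypothetical 100 kW rack, that range leaves 20 to 40 kW for fans and air handlers to remove, which is why the plumbing does not eliminate the air-side design problem.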

Immersion cooling takes a more radical approach: dunk entire servers in dielectric fluid. Single-phase systems keep the fluid liquid and pump it through heat exchangers. Two-phase systems use specialized fluids that boil at low temperatures, around 49 degrees Celsius, pulling heat out through the phase change itself. The technology works beautifully in theory. In practice, the synthetic fluorinated fluids are eye-wateringly expensive and environmentally questionable (3M is phasing out its Novec line due to PFAS concerns), and supply chains are fragile. If a condenser fails in a two-phase tank, the entire system boils dry in minutes.
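The "boils dry in minutes" claim follows from a simple energy balance: once the condenser stops returning liquid, the time to vaporize the fluid is roughly its mass times its latent heat, divided by the heat load. The sketch below uses assumed round numbers throughout. The latent heat in particular is a placeholder of the right general order for fluorinated dielectric fluids, not a datasheet value, and the model ignores pressure rise, venting behavior, and any thermal throttling by the servers.

```python
# Crude estimate of how long a two-phase immersion tank lasts without a condenser.
# Every number in the example is an illustrative assumption.

def boil_dry_seconds(fluid_mass_kg: float,
                     latent_heat_j_per_kg: float,
                     heat_load_watts: float) -> float:
    """Time to vaporize all fluid, assuming it is already at its boiling point."""
    return fluid_mass_kg * latent_heat_j_per_kg / heat_load_watts


if __name__ == "__main__":
    seconds = boil_dry_seconds(
        fluid_mass_kg=500.0,            # assumed fluid charge in the tank
        latent_heat_j_per_kg=100_000.0, # assumed placeholder, not a datasheet value
        heat_load_watts=100_000.0,      # assumed 100 kW of servers in the tank
    )
    print(f"time to boil dry: {seconds / 60:.1f} minutes")
```

With these placeholder values the tank lasts a little over eight minutes. Real systems would trip protective shutdowns long before that point, so this is best read as a sense of scale, not a prediction.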

Underwater data centers represent the most dramatic solution. Microsoft proved the concept with Project Natick, submerging 864 servers off the coast of Scotland. The sealed nitrogen-atmosphere pod experienced one-eighth the failure rate of equivalent land-based servers: no oxygen corrosion, no human error, no temperature fluctuations. Microsoft canceled the project in 2024 anyway. The economics didn't work for a hyperscaler that can simply buy more land in Iowa.

China, unburdened by shareholder demands or environmental review processes, has pushed ahead with a state-funded offshore data center near Shanghai, using seawater cooling and co-located offshore wind. It's a strategic infrastructure play disguised as an environmental initiative, and it sidesteps the zoning battles, NIMBY lawsuits, and water permits that bedevil American projects.

Private ventures in the United States are scrambling to catch up. Peter Thiel is leading a reported $140 million investment, at a $1 billion valuation, in Panthalassa, a startup designing floating data centers that generate their own power from wave motion while using surrounding seawater for cooling. DeepGreen Holdings has filed federal permits for tidal-powered underwater AI data centers off the coasts of Maine and Alaska, though Maine's legislature has already imposed a moratorium until late 2027. A smaller outfit called The Ocean Data has submerged modular server pods and logged over 2,600 hours of immersion with zero failures, targeting industrial and sovereign edge deployments rather than hyperscale cloud.

The capex binge is being financed largely through debt issuance at a scale that should make anyone who remembers the year 2000 deeply uncomfortable.

The parallel isn't the dot-com bubble of Pets.com and sock puppets. It's Cisco in 1999. Cisco was building the actual physical infrastructure of the internet (routers, switches, fiber-optic backbones), and it was absolutely correct that the internet would transform the global economy. The company had real products, real revenue, and real technology. But the market had priced in a decade of growth in 18 months. When the music stopped, Cisco's stock dropped 86 percent and took more than two decades to recover to its year-2000 peak, despite the company being fundamentally right about the destination.
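The arithmetic of a drawdown explains why the recovery took so long: a stock that loses a fraction d of its value must gain 1 / (1 - d) - 1 just to get back to even. A short sketch, using the 86 percent figure from the Cisco example (the function name is illustrative):

```python
# Gain required to recover from a drawdown.

def gain_to_recover(drawdown_fraction: float) -> float:
    """Fractional gain needed to return to the prior peak after a given drawdown."""
    if not 0.0 <= drawdown_fraction < 1.0:
        raise ValueError("drawdown_fraction must be in [0, 1)")
    return 1.0 / (1.0 - drawdown_fraction) - 1.0


if __name__ == "__main__":
    # 0.86 is the Cisco drawdown cited in the article.
    gain = gain_to_recover(0.86)
    print(f"gain needed after an 86% drop: {gain:.0%}")  # prints 614%
```

An 86 percent decline therefore requires a gain of roughly 614 percent to reach the old peak, which is far harder than the headline number suggests.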

The hyperscalers are walking the same tightrope today. Being right about AI's transformative potential doesn't mean you can't go bankrupt on the journey.

The revenue simply hasn't caught up to the spending. AI products are generating real income (GitHub Copilot, AWS Bedrock, Google Cloud AI, ChatGPT subscriptions), but it's a rounding error compared to the $300 billion flowing out the door. The hyperscalers are building infrastructure for a demand curve they are hoping will arrive, not one that already exists.

Meanwhile, the open-source frontier is closing fast. DeepSeek's models demonstrated that frontier-level performance can be achieved at a fraction of the training cost. If that trend holds (cheaper training, cheaper inference, more competition), the massive infrastructure build-out starts looking like catastrophic overcapacity. Inference becomes commoditized, price wars destroy margins, and the companies that spent most aggressively to build capacity are the ones least able to survive the margin compression.

A contrarian perspective is emerging from engineers and strategists who argue that the smartest move is to slow down deliberately and let the thermal management technology mature before locking in permanent infrastructure.

The logic is straightforward. If you skip the air-cooled generation entirely and go straight to two-phase immersion or direct-die refrigerant cooling when the technology stabilizes, you end up with lower total cost of ownership, higher compute density, dramatically better power usage effectiveness, and simpler operations. You'd be behind for two to three years and then suddenly ahead.
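Power usage effectiveness (PUE) is the ratio of total facility power to the power that actually reaches IT equipment, so a value of 1.0 would mean zero overhead. The sketch below compares two hypothetical facilities to show how the metric translates into annual overhead energy. The cooling and overhead figures are invented for illustration and do not describe any real site or cooling product.

```python
# Compare power usage effectiveness (PUE) for two hypothetical facilities.
# All load figures are assumed for illustration only.

HOURS_PER_YEAR = 8760


def pue(it_kw: float, cooling_kw: float, other_overhead_kw: float) -> float:
    """PUE = total facility power / IT equipment power."""
    return (it_kw + cooling_kw + other_overhead_kw) / it_kw


def annual_overhead_mwh(it_kw: float, pue_value: float) -> float:
    """Energy per year spent on everything other than IT equipment, in MWh."""
    return it_kw * (pue_value - 1.0) * HOURS_PER_YEAR / 1000.0


if __name__ == "__main__":
    it_load_kw = 10_000.0  # assumed 10 MW of IT load in both cases
    scenarios = {
        "air-cooled (assumed)": pue(it_load_kw, cooling_kw=4000.0, other_overhead_kw=1000.0),
        "liquid-cooled (assumed)": pue(it_load_kw, cooling_kw=300.0, other_overhead_kw=700.0),
    }
    for name, value in scenarios.items():
        overhead = annual_overhead_mwh(it_load_kw, value)
        print(f"{name}: PUE {value:.2f}, overhead {overhead:,.0f} MWh/year")
```

The point of the comparison is the shape of the trade, not the specific values: every tenth of a point of PUE on a 10 MW IT load is 8,760 MWh per year that must be bought, delivered, and removed as heat.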

No publicly traded company has the stomach for that kind of patience. The quarterly earnings report is a tyrant. Executive compensation is tied to stock price, and stock price rewards bold capex narratives, not prudence. The debt has already been issued; you cannot un-borrow the money. And critically, no one wants to be the first to blink while competitors keep spending.

The result is a classic collective action problem: individually rational behavior that produces collectively irrational outcomes. Every hyperscaler is making the right decision for its own competitive position. Together, they may be constructing the largest stranded asset problem in the history of computing.

Behind the cooling panic, behind the underwater data centers, behind the wave-powered floating server farms, lies a simpler and more embarrassing truth. The United States electrical grid cannot handle the load.

The grid was largely built during the Eisenhower administration. It is aging, fragmented, and bottlenecked by interconnection queues that stretch years into the future. Energy analytics firm Wood Mackenzie noted in a recent report that data center development has slowed specifically because of limited electricity capacity growth. The hyperscalers aren't chasing seawater cooling and tidal power because these are optimal thermal solutions. They're chasing them because they are desperate for power sources that bypass the grid entirely.

Fixing the grid would require political will, long timelines, and taking on entrenched utility monopolies. It's easier to sink a server pod in the ocean and call it innovation. The move-fast crowd isn't running toward the future; it's running away from infrastructure decay.

The most likely scenario is not a single dramatic collapse but a grinding reckoning. Margin compression from AI compute commoditization. Stranded assets from thermally obsolete data centers. Debt service costs that consume an increasing share of free cash flow. Executive turnover as boards lose patience with the gap between capex and revenue.

The real winners may not be the companies doing the building at all. NVIDIA and the cooling equipment manufacturers get paid regardless of whether AI revenue materializes. Private equity firms will eventually buy distressed data center assets at 30 cents on the dollar, the same playbook they ran after the dot-com crash and the 2008 financial crisis. Late movers who let the hyperscalers overbuild and then acquire infrastructure on the cheap will inherit the physical layer of AI without having paid for its construction.

China, meanwhile, is building methodically, state-funded and unburdened by quarterly earnings pressure. The strategic balance could shift quietly while American companies compete themselves into insolvency.

The technology is real. The transformation is coming. But as the railroad barons, the fiber-optic pioneers, and the internet infrastructure builders all discovered, being right about the revolution doesn't mean you survive it.

The ocean, at least, will keep things cool while it all plays out.

Source breakdown by section

The $300 billion capex figures

  • Alphabet $75B capex for 2025 — Alphabet's Q4 2024 earnings call and 2025 guidance (February 2025), widely covered by financial press including Bloomberg, Reuters, CNBC
  • Amazon $100B+ — Amazon's Q4 2024 earnings call (February 2025), Andy Jassy explicitly framed the majority as AI/data center spend
  • Microsoft $80B+ — Microsoft's fiscal year 2025 capex guidance, Brad Smith's January 2025 blog post announcing $80B for AI infrastructure
  • Meta $60-65B — Mark Zuckerberg's January 2025 post announcing 2025 capex guidance, explicitly calling it "a defining year for AI"

Project Natick / underwater data centers

  • Microsoft Project Natick — Microsoft Research's own published documentation at the Natick project site; Phase 1 (2015) and Phase 2 (2018-2020) timelines, the one-eighth failure rate finding, and the 2024 cancellation all documented by Microsoft and covered by Data Center Dynamics
  • China offshore data center near Shanghai — Scientific American, 2025
  • Panthalassa / Peter Thiel — Financial Times coverage of the funding round (May 2026), WDC TV News reporting on the $140M raise and $1B valuation, Panthalassa's own public statements about Ocean-1/2/3 prototypes
  • DeepGreen Holdings / Maine / Alaska — FERC preliminary permit filings (February 2026), Needham Observer reporting (May 2026), Maine LD307 legislative record
  • The Ocean Data — Company's own public website with immersion testing data (2,640+ hours, zero failures)

Cooling technologies

  • Direct-to-chip liquid cooling (CoolIT, Asetek, Chilldyne) — publicly documented product lines, widely covered in data center trade press (Data Center Knowledge, Data Center Frontier, The Register)
  • Immersion cooling (single-phase and two-phase) — LiquidStack, Submer, and 3M Novec product documentation; 3M's PFAS phaseout announced December 2022 and widely reported
  • NVIDIA chip power draws — NVIDIA's published specs for H100 (700W), B200 (1,200W), and Rubin architecture roadmaps covered in NVIDIA GTC presentations and tech press (ServeTheHome, SemiAnalysis)
  • Free air cooling at Luleå — Meta/Facebook's own published case studies on the Luleå, Sweden data center
  • Deep Lake Water Cooling Toronto — Enwave Energy Corporation's public documentation
  • Thermoacoustic cooling / SoundEnergy — Published research, SoundEnergy's own public materials

Electrical grid constraints

  • Wood Mackenzie report on data center development slowdown due to grid capacity — referenced in the WDC TV News article on Panthalassa (May 2026)
  • Grid interconnection queue backlogs — Lawrence Berkeley National Laboratory's annual interconnection queue reports, widely cited in energy policy coverage

Financial / bubble analysis

  • Cisco comparison — historical stock data (CSCO peaked at about $80 in March 2000, fell to roughly $11 by 2002, and did not reclaim its 2000 peak until late 2025); widely analyzed in financial literature
  • DeepSeek cost efficiency — DeepSeek's published technical papers on V3 and R1 models, extensively covered January 2025
  • Google search disintermediation threat — public coverage of ChatGPT, Perplexity, and AI-powered search alternatives; Alphabet earnings call discussions of AI risk factors
  • AWS profit engine — Amazon's segment reporting in 10-K filings showing AWS operating income relative to total company

Navy involvement

  • NAVFAC EXWC underwater fuel cell testing — Navy.mil press release (September 2023), publicly documented
  • Sean James submarine background leading to Natick — Microsoft's own published Natick origin story
