The AI History That Explains Fears of a Bubble

December 22, 2025

Concerns are mounting among some investors that the AI sector, which has singlehandedly prevented the economy from sliding into recession, has become an unsustainable bubble. Nvidia, the main supplier of chips used in AI, became the first company valued at $5 trillion. Meanwhile, OpenAI, the developer of ChatGPT, has yet to make a profit and is burning through billions of investment dollars per year. Still, financiers and venture capitalists continue to pour money into OpenAI, Anthropic, and other AI startups. Their bet is that AI will transform every sector of the economy and, as happened to the typists and switchboard operators of yesteryear, replace jobs with technology.

Yet there are reasons to be concerned that this bet may not pay off. For the past three decades, AI research has been organized around making improvements on narrowly specified tasks like speech recognition. With the emergence of large language models (LLMs) like ChatGPT and Claude, however, AI agents are increasingly being asked to do tasks without clear methods for measuring improvement.

Take, for example, the seemingly mundane task of creating a PowerPoint presentation. What makes a good presentation? We may be able to point to best practices, but the “ideal” slideshow depends on creative processes, expert judgments, pacing, narrative sense, and subjective tastes that are all highly contextual. Annual review presentations differ from start-up pitches and project updates. You know a good presentation when you see it—and a bad one when it flops. But the standardized tests that the field currently uses to evaluate AI cannot capture these qualities.

This may seem like a minor problem, but crises of evaluation have contributed to historical AI busts. And without accurate measures of how good AI really is, it’s hard to know whether we’re headed towards another one now.

The birth of AI is often traced back to a small workshop at Dartmouth in 1956 that brought together computer scientists, psychologists, and others with a shared interest in mimicking human intelligence in machines. The field quickly found a powerful benefactor in the Defense Advanced Research Projects Agency (DARPA), an agency within the Department of Defense charged with maintaining technological supremacy during the Cold War. To avoid falling behind in the science race, DARPA lavished AI researchers at universities and private firms with significant no-strings-attached grants over the next 40 years.

These first decades of the field were defined by peaks of excitement, as new technologies were invented, followed by valleys of disappointment, as they failed to evolve into useful applications. During the 1980s, this cycle was spurred by an AI technology called “expert systems,” which promised to build machines with the intelligence of professionals like doctors and financial planners. Under the hood, these programs encoded human expertise in formal rules: if the patient has a fever and a rash, then test for measles.
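
To make the idea concrete, here is a minimal sketch of what such a rule base might look like in code. The rules, the patient fields, and the diagnose function are hypothetical illustrations of the approach, not a reconstruction of any actual 1980s system.

```python
# A toy "expert system": human knowledge written down as explicit if-then rules.
# Everything here (the rules, the patient record fields) is a hypothetical example.

def diagnose(patient):
    """Apply hand-written rules to a patient record and return suggested tests."""
    suggestions = []
    if patient.get("fever") and patient.get("rash"):
        suggestions.append("test for measles")
    if patient.get("cough") and patient.get("chest_pain"):
        suggestions.append("order a chest X-ray")
    # Every new symptom combination needs another explicit rule;
    # that bookkeeping is what eventually overwhelmed these systems.
    return suggestions

print(diagnose({"fever": True, "rash": True}))    # ['test for measles']
print(diagnose({"fever": True, "fatigue": True})) # [] ; no rule covers this case
```

Even this tiny example hints at the brittleness described below: any situation the rule-writers did not anticipate simply falls through.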

Expert systems attracted significant attention and investment from industry based on early successes like automating loan applications. But this optimism was largely fueled by hype, rather than rigorous testing. In practice, these expert systems tended to make strange and sometimes disastrous mistakes when challenged with more complex tasks. During one humorous showcase, an expert system suggested a man’s infection might have been caused by a prior amniocentesis (a procedure performed on pregnant women). It turned out researchers had forgotten to add a rule for gender.

At the time, fiery AI critic Hubert Dreyfus described these failures as the “fallacy of the first step,” arguing that equating expert systems with progress toward real intelligence was like “claiming that the first primate to climb a tree was taking a first step towards flight to the moon.” The problem was that, as tasks became more complicated, the number of rules needed to cover every possible case mushroomed. Like moving from tic-tac-toe to checkers to chess, the number of possibilities doesn’t merely increase; it explodes exponentially.

When it became apparent that expert systems could not climb further, AI research entered a so-called “AI Winter” in the late 1980s. Grants dried up, companies shut down, and AI became a dirty word.

In the aftermath, DARPA re-evaluated its AI funding strategy. Rather than give no-strings-attached grants, government program managers began conditioning awards on attaining the highest score on a standardized test they called a “benchmark.” In contrast to complex problems like medical diagnosis, benchmarks focused on bite-sized tasks that were attainable and of immediate commercial and military value. They also used quantitative metrics to verify results. Can your system accurately translate this sentence from Russian to English, transcribe this audio snippet, or digitize letters in these documents? Researchers had to do more than make flashy claims based on promising but incomplete technologies. To get funded, they had to deliver concrete evidence of improvements on the benchmarks.
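
In code, a benchmark of this kind is little more than a fixed, labeled test set and a single agreed-upon metric. The sketch below, with a made-up test set and a placeholder predict() function, shows roughly how a digit-recognition score might be computed; it is an illustration of the idea, not any specific DARPA benchmark.

```python
# Sketch of a benchmark evaluation: a frozen, labeled test set and one metric.
# The filenames, labels, and predict() function are hypothetical placeholders.

test_set = [
    ("digit_001.png", 7),
    ("digit_002.png", 3),
    ("digit_003.png", 0),
    ("digit_004.png", 7),
]

def predict(image_path):
    """Stand-in for a competing team's digit-recognition system."""
    return 7  # a weak system that always guesses 7

correct = sum(1 for image, label in test_set if predict(image) == label)
accuracy = correct / len(test_set)
print(f"Benchmark accuracy: {accuracy:.0%}")  # every submission is ranked by this one number
```

The appeal to a funder is obvious: a single score leaves little room for flashy but unverifiable claims. The limitation, as the rest of the piece argues, is that the tasks worth measuring this way are exactly the narrow ones.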

These benchmark competitions unified an unruly field by funneling AI researchers towards common problems. Instead of each research group choosing its own projects, DARPA shaped the collective agenda of the field by funding researchers to work on specific tasks like digit recognition or speech-to-text. The competitive nature of the new funding regime meant that approaches that fared worse on the benchmarks were crowded out. For example, the very first benchmark competition demonstrated that “machine learning” algorithms, which learn patterns from data, dominated the hand-crafted, rule-based approaches of the past.

Public leaderboards were soon erected to provide real-time feedback on which algorithms held the current highest scores on each benchmark, allowing researchers to learn from past successes. As tasks were solved, more complex tasks were put in their place. Translating words led to translating paragraphs, and eventually multiple languages. Digit recognition gave way to object recognition in images, then videos.

In the early 2010s, progress accelerated after benchmarks convinced researchers to go all in on one machine-learning approach inspired by the human brain, called artificial neural networks or “deep learning,” which now underpins today’s generative AI. Within a couple of years, speech-to-text algorithms were powering modern AI assistants, and tumor-recognition algorithms began to outperform radiologists on some cancers. Benchmarking had seemingly cracked the first step toward usable AI in everyday life.

By the end of the decade, the field was surprised to discover that its progress on benchmark tasks had led to deep-learning algorithms that could generate fluent, socially appropriate text like screenplays and poetry. These abilities did not show up in the benchmarks because the benchmarks weren’t designed to find them. This revelation catalyzed the generative AI revolution, leading to large language models like ChatGPT, Claude, and others that dominate the market today. It was the field’s greatest triumph. Yet, with this new technology, the field faces a new crisis.

Put simply, the tasks we now seek to automate no longer have clear benchmarks. There is no “correct” PowerPoint, marketing campaign, scientific hypothesis, or poem. Unlike object recognition, where there is a right or wrong answer, these are complex, creative, multi-dimensional, and process-based problems, and even the hardest benchmarks simply cannot objectively measure progress on them.

As a result, new versions of ChatGPT, Claude, Gemini, and Copilot are evaluated as much by “vibe tests” as by concrete benchmarks. We’re currently caught between two inadequate approaches: old-style benchmarks that measure narrow capabilities precisely, and qualitative assessments that try to capture the practical capacities of these systems but cannot produce clear, quantitative evidence of progress. Researchers are exploring new evaluation systems that bridge these two perspectives, but it is a hard, unsolved problem.

Current investments assume significant automation will arrive in the next three to five years. But without reliable evaluation methods, we cannot know whether LLM-based technologies are leading us toward genuine automation or repeating Dreyfus’ fallacy, taking the first step on a dead-end path. This is the difference between the infrastructure of the future and a bubble. Right now, it’s difficult to tell which one we’re building.

Bernard Koch is an assistant professor of sociology at the University of Chicago who studies how evaluation shapes science, technology, and culture. David Peterson is an assistant professor of sociology at Purdue University who studies how AI is transforming science.

Made by History takes readers beyond the headlines with articles written and edited by professional historians. Learn more about Made by History at TIME here. Opinions expressed do not necessarily reflect the views of TIME editors.

OpenAI and TIME have a licensing and technology agreement that allows OpenAI to access TIME’s archives.
