Detailed Summary
The speaker introduces the often frustrating debate in AI circles over whether LLMs can think, reason, or understand. He highlights a common pattern in which people dismiss AI's capabilities with phrases like 'it's just a stochastic parrot' or 'it's just a next-token predictor,' a phenomenon he calls 'just-a-ism.'
- The discussion around LLMs' ability to think, reason, and understand is often frustrating due to a lack of clear testing criteria.
- Many critics use 'just-a-ism' (e.g., 'just a stochastic parrot') to dismiss AI capabilities without proposing concrete tests.
- Scott Aaronson's concept of a 'just-a-ism religion' is introduced, describing those who attribute AI's abilities to simplistic mechanisms.
- Examples of 'just-a-ism' include claims that LLMs are 'giant Plinko boards,' that they are mere 'mechanistic processing,' or that they lack the 'chemical processes' needed for thinking.
- The speaker predicts that no one will propose an actual test for AI reasoning, instead resorting to 'just-a-ism' arguments.
How to Test if Something "Can" (2:08 - 6:42)
The speaker explains that to determine if something has an ability, one must provide an objective test and observe if it passes. He uses examples like humans reasoning, rocks reasoning, and birds flying to illustrate that a single positive example proves a capability, while a single failure does not disprove it.
- To prove an ability, one must show an example of it, such as a human reasoning or a bird flying.
- The inability of one individual (e.g., an unhealthy bird) does not disprove the general capability of the species.
- The speaker introduces the concept of testing if something 'can tell time' using various objects like metal gears, a stick in the mud (sundial), and a digital watch.
- He argues that dismissing these objects' ability to tell time by saying 'it's just a bunch of gears' or 'just a stick' is a flawed 'just-a-ism' argument.
- An objective test for telling time requires regular and measurable change, observable progress from that change, and consistent intervals over time (see the sketch following this list).
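To make the three criteria concrete, here is a minimal Python sketch of such a check; the function name `can_tell_time`, the 5% tolerance, and the sample readings are invented for illustration and are not from the video. The readings are assumed to be sampled at equal real-time steps, whatever mechanism produces them:

```python
import statistics

def can_tell_time(readings, tolerance=0.05):
    """Check the video's three criteria against readings sampled at equal
    real-time steps: measurable change, steady progress, and consistent
    intervals. (Function name and tolerance are illustrative assumptions.)"""
    deltas = [b - a for a, b in zip(readings, readings[1:])]
    measurable_change = all(d != 0 for d in deltas)   # something changes
    steady_progress = all(d > 0 for d in deltas)      # change accumulates
    mean = statistics.mean(deltas)
    consistent = all(abs(d - mean) <= tolerance * mean for d in deltas)
    return measurable_change and steady_progress and consistent

# The mechanism never enters the test: hourly sundial shadow angles pass
# the same check as a digital watch's second counter.
print(can_tell_time([0, 15, 30, 45, 61]))  # True: near-consistent steps
print(can_tell_time([0, 15, 15, 90]))      # False: stalls, then jumps
```

Note that 'it's just gears' or 'it's just a stick' appears nowhere in the check; only observed behavior does, which is the speaker's point.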
The speaker emphasizes the need for objective tests to determine abilities, drawing a parallel to the four-minute mile. He argues that individual failures of LLMs do not disprove their general capacity for reasoning, just as one person failing to run a four-minute mile doesn't mean all humans can't.
- Objective tests are necessary to determine if something possesses an ability, like running a four-minute mile.
- Rules and norms are established for such tests to prevent loopholes and ensure fairness.
- A single failure (e.g., the speaker running a nine-minute mile) does not prove that humans cannot achieve a four-minute mile.
- Historically, the four-minute mile was thought to be biologically impossible until one person broke the barrier, proving the general human capability.
- Similarly, individual instances of LLMs failing a task or showing a lack of reasoning do not disprove their overall capacity for reasoning.
- Pointing to a broken clock doesn't mean working clocks can't tell time, and human foolishness doesn't mean humans can't reason (the underlying asymmetry is formalized below).
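The quantifier asymmetry running through both the bird example and the mile example can be spelled out formally; the notation below (Can, does) is introduced here for clarity and is not from the video:

```latex
% "S can do A" is an existential claim over the members x of S:
\text{Can}(S, A) \;\equiv\; \exists x \in S : \text{does}(x, A)

% Hence a single success establishes the capability:
\text{does}(x_0, A) \;\Rightarrow\; \exists x \in S : \text{does}(x, A)

% while a single failure refutes only the universal claim, not the existential one:
\neg\,\text{does}(x_1, A) \;\Rightarrow\; \neg\,\forall x \in S : \text{does}(x, A)
```

The first sub-four-minute mile instantiates the second line; the speaker's nine-minute mile instantiates only the third, which leaves the existential claim untouched.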
The speaker suggests that the belief that LLMs can't reason is often driven by fear and emotional responses. He notes that discussions about AI often trigger strong negative reactions, especially from those whose sense of self-worth is tied to human intelligence.
- The belief that LLMs can't reason is often a 'silly belief' held without evidence from any specific test.
- Discussions about AI's capabilities, particularly reasoning, often trigger strong emotional and negative responses, which the speaker calls 'AI hate.'
- This fear stems from the idea that if machines can think and reason like humans, it might diminish human importance or lead to job displacement.
- A quote from October 2023 warns that those who value intelligence above all other human qualities will 'have a bad time' as AI advances.
- This fear leads to logical fallacies like 'just-a-ism' and dismissing AI's achievements based on individual mistakes.
- The speaker also notes a burden-of-proof fallacy: those asserting that 'AI can't reason' demand proof from challengers instead of offering a test for their own claim.
The speaker discusses how critics of AI constantly shift the goalposts for what constitutes intelligence. He provides examples from chess, Go, object recognition, and scientific research, where AI's achievements are consistently dismissed as 'not real intelligence' once accomplished.
- Critics of AI constantly move the goalposts for what defines intelligence, making it difficult to objectively assess AI's capabilities.
- When AI beat world-class chess players, it was dismissed as 'brute force and heuristics,' not true intelligence.
- Similarly, AI's mastery of trivia and language was labeled 'just statistics.'
- AI's ability to recognize diverse objects was dismissed as 'just pattern matching,' a label applied without further explanation.
- Even in Go, where AI made creative moves, it was still called 'narrow intelligence.'
- The speaker refutes the idea that AI only works within its dataset, citing AlphaFold and Google DeepMind's novel scientific research as counter-examples.
- He also dismisses the argument that an LLM's self-reported inability to understand or reason counts as proof, comparing it to settling whether humans can reason by taking a human's self-report at face value.
- The argument that thinking requires chemical processes, which LLMs lack, is likened to saying a digital watch can't tell time because it lacks gears, ignoring alternative mechanisms.
The speaker argues that many intelligent people make illogical arguments against AI due to emotional responses rather than rational thought. He suggests that when asked if LLMs can reason, a primal, fear-driven part of the brain takes over, leading to irrational conclusions.
- Intelligent individuals often make illogical arguments against AI due to emotional rather than rational processing.
- The question 'Can LLMs reason?' triggers a 'red alert' in some people's brains, activating a primal fear response.
- This fear leads to a subconscious desire to prove that AI cannot reason, as the alternative (AI can reason) is perceived as threatening to human importance or survival.
- The arguments against AI often don't logically answer the question posed but rather address a different, fear-driven prompt: 'Explain why LLMs can't do the thing that makes me feel special.'
- The speaker illustrates this with an example of an artist criticizing AI-generated animation, dismissing it as 'not made' by the user, 'soulless,' or 'unwatchable,' while applying different standards to human-made art or photography.
- This bias is evident in how different groups react negatively to AI excelling in areas that make them feel special (e.g., artists to AI art, musicians to AI music).
The speaker elaborates on how AI triggers negative reactions, particularly from professionals whose work is being challenged by AI's advancements. He notes that critics often invent new, vague terminology to describe AI outputs negatively, even when objective measures show AI performing well.
- Negative reactions to AI advancements often come from people who pride themselves on skills that AI is starting to perform at a human level.
- Musicians react negatively to AI music, coders to AI coding, and artists to AI art, often using emotional rather than objective critiques.
- When judging AI outputs, critics often invent new terminology like 'no soul,' 'no depth,' 'tinny,' 'shallow,' or 'no meaning' to differentiate it from human work.
- A LinkedIn chart illustrating why readers dislike AI writing (lack of subtext, author intention, etc.) is presented as an example of this phenomenon.
- However, studies show that humans often prefer AI-generated poems over human-written ones when evaluated blindly, rating AI poems as clearer and more aesthetically pleasing.
- This preference reverses when participants are informed the poem is AI-generated, demonstrating a bias against AI-created content.
The speaker concludes by urging listeners to avoid self-deception when evaluating AI. He stresses the importance of rational assessment, focusing on objective tests and average performance rather than emotional biases or comparisons to the top 1% of human talent. He humorously suggests that only YouTubers might be safe from AI, before highlighting the circular definition of 'reason' and 'logic.'
- It is crucial to avoid self-deception and maintain rationality when assessing AI's capabilities, especially in the coming years.
- Individuals should conduct specific tests to evaluate AI's abilities in their field, rather than inventing abstract concepts to dismiss its potential.
- It's important to consider AI's impact on average performance (e.g., AI coding better than 50% of developers) rather than comparing it only to the top 1%.
- Studies showing average people cannot distinguish between AI and human-made music (a 50/50 guess) should inform market impact assessments.
- Overreacting or underreacting to AI's expansion will lead to issues; a balanced, rational approach is necessary.
- The speaker humorously suggests that YouTubers might be the only profession safe from AI due to the need for 'lived experience,' 'presence,' 'soul,' and 'X factor.'
- He concludes by pointing out the circular dictionary definitions of 'reason' (thinking using logic) and 'logic' (reasoning conducted according to strict principles of validity), highlighting the ambiguity in the core terms of the debate.