Deepseek just killed LLMs

Too Long; Didn't Watch — Summary

Deepseek OCR introduces a significant breakthrough in AI efficiency by rendering text as images and compressing it into roughly 10x fewer vision tokens at about 97% accuracy (up to 20x at lower accuracy), potentially changing how AI models process information and easing current bottlenecks. Concurrently, Google announced major advances in quantum computing and an AI model that surfaced a new candidate pathway for cancer therapy, while a controversial paper defining AGI sparked debate over AI safety and validation. Andrej Karpathy advocates image-based inputs over text tokens, citing their efficiency and security benefits, in line with Elon Musk's prediction that AI input and output will come to be dominated by photons.

Main Takeaways

Deepseek OCR offers a novel approach to data compression for AI models, converting text into images to significantly reduce token count and improve processing efficiency without substantial loss of accuracy.
Google has made strides in quantum computing, running a verifiable algorithm 13,000x faster than leading classical supercomputers, and developed an AI model capable of identifying new cancer therapy pathways, demonstrating the emergent capabilities of scaled AI.
A paper defining AGI by prominent AI safety researchers faced criticism for unverified citations, raising concerns about the rigor of AI safety research and the potential for LLM hallucination in academic work.
Andrej Karpathy champions the idea of using pixels as primary inputs for LLMs, arguing that text tokens are inefficient and problematic, and suggests that image-based inputs could lead to more robust and secure AI systems.
The video also touches on other AI news, including speculation about Google's Gemini 3.0, the vulnerability of LLMs to poisoning with a small number of malicious documents, and the performance of various LLMs in cryptocurrency trading.

Detailed Summary

Deepseek OCR: A New Efficiency Paradigm (0:00 - 4:04)

Deepseek OCR is introduced as a significant advancement in optical character recognition: it compresses text into visual tokens at roughly 10x while maintaining 97% accuracy, and at up to 20x with lower accuracy.
This compression addresses key bottlenecks in AI models: limited memory in context windows, slow training speeds, and the cost associated with larger context windows.
By converting text into images, Deepseek OCR allows for massive data compression with little loss of meaning, similar to how memes convey complex ideas efficiently.
The technology is particularly relevant for countries like China, which face GPU supply constraints, as it enables more efficient model training with fewer hardware resources.
Andrej Karpathy praised the Deepseek OCR paper, highlighting its potential to make AI models more efficient and general.
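To make the compression idea concrete, here is a rough sketch of how a compression ratio could be computed for one rendered page. It assumes a square page image split into 16x16-pixel patches that are then merged 16-to-1 before reaching the language model; those numbers reflect our understanding of the paper's encoder and are assumptions here, as is the 1,000-token page. The ratio is simply text tokens divided by vision tokens.

```python
# Sketch: estimating how many text tokens one vision token stands in for.
# ASSUMPTIONS (not stated in this summary): 16x16-pixel patches merged 16-to-1
# before the language model, and a hypothetical 1,000-token page of text.

def vision_tokens(side_px: int, patch_px: int = 16, merge: int = 16) -> int:
    """Vision tokens for a square image: patches per side squared, then merged."""
    patches_per_side = side_px // patch_px
    return (patches_per_side * patches_per_side) // merge


def compression_ratio(text_tokens: int, n_vision_tokens: int) -> float:
    """Text tokens divided by the vision tokens that replace them."""
    return text_tokens / n_vision_tokens


if __name__ == "__main__":
    page_text_tokens = 1_000  # hypothetical dense page
    for side in (512, 640, 1024):
        v = vision_tokens(side)
        ratio = compression_ratio(page_text_tokens, v)
        print(f"{side}px page -> {v} vision tokens, {ratio:.1f}x compression")
```

Under these assumptions a 640-pixel page yields 100 vision tokens, so a 1,000-token page sits right at the 10x ratio where 97% precision is reported; shrinking the image to 512 pixels pushes the ratio past 15x, toward the range where accuracy reportedly falls (about 60% at 20x).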

Google's Breakthroughs in Quantum Computing and AI for Cancer Research (4:04 - 8:54)

Google AI announced a major breakthrough in quantum computing, demonstrating a quantum computer running a verifiable algorithm 13,000 times faster than leading classical supercomputers.
A 27-billion parameter foundation model from Google's open-sourced Gemma family identified a new potential cancer therapy pathway, addressing the challenge of making 'cold' tumors 'hot' (visible to the immune system).
This discovery showcases the emergent capabilities of scale in AI models, where larger models develop abilities not present in smaller versions.
The model simulated 40,000 drugs and generated novel hypotheses for cancer treatment, with some candidates already known and others being surprising new hits.
Lab tests confirmed a 50% increase in antigen presentation with the AI-identified drug combination, providing a blueprint for new biological discovery and accelerating scientific research.

Controversy Around AGI Definition and AI News (8:54 - 13:40)

A paper titled 'A Definition of AGI,' authored by prominent AI safety researchers, faced criticism for containing non-existent citations, leading to speculation that LLMs were used to write portions of the paper without proper human validation.
Critics highlighted the irony of an AI safety paper failing to validate AI output, setting a poor example for the field.
Speculation arose about Google's Gemini 3.0, with two mystery models ('lithiumflow' and 'orionmist') appearing on LMArena and a potential release in December.
The video noted a Polymarket trend in which bettors profited from wagers on OpenAI's browser release.
Anthropic researchers found that LLMs are vulnerable to data poisoning: just 250 malicious documents were enough to compromise models across all the sizes tested, causing them to output gibberish whenever a specific trigger phrase appeared.
In Alpha Arena, a cryptocurrency-trading benchmark for LLMs, Deepseek and Qwen models were outperforming the others, including Grok, and consistently beating a simple buy-and-hold Bitcoin strategy.
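The buy-and-hold benchmark is easy to state precisely: put all starting cash into Bitcoin on day one, never trade, and compare the final return with the model's. The sketch below does that comparison; every price and equity value in it is made up for illustration and is not data from the trading arena.

```python
# Sketch: comparing a trading strategy's equity curve with buy-and-hold.
# All prices and equity values below are HYPOTHETICAL, not real arena data.

def total_return(values: list[float]) -> float:
    """Fractional return from the first value to the last."""
    return values[-1] / values[0] - 1.0


def buy_and_hold_equity(prices: list[float], starting_cash: float) -> list[float]:
    """Equity over time if all cash buys the asset at the first price and is held."""
    units = starting_cash / prices[0]
    return [units * p for p in prices]


def beats_benchmark(strategy_equity: list[float], prices: list[float]) -> bool:
    """True if the strategy ends with a higher return than buying and holding."""
    benchmark = buy_and_hold_equity(prices, strategy_equity[0])
    return total_return(strategy_equity) > total_return(benchmark)


if __name__ == "__main__":
    btc_prices = [100.0, 95.0, 105.0, 98.0, 110.0]                  # hypothetical
    model_equity = [10_000.0, 9_900.0, 10_600.0, 10_400.0, 11_500.0]  # hypothetical
    print(f"buy-and-hold: {total_return(btc_prices):+.1%}")
    print(f"strategy:     {total_return(model_equity):+.1%}")
    print("beats benchmark:", beats_benchmark(model_equity, btc_prices))
```

Comparing returns rather than raw dollar values keeps the benchmark fair even if the strategy and the benchmark start from different amounts of cash.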

Andrej Karpathy's Vision: Pixels Over Text Tokens (13:40 - 22:06)

Andrej Karpathy expressed strong support for Deepseek OCR, emphasizing his belief that pixels are superior inputs for LLMs compared to text tokens.
He argues that text tokens are wasteful, ugly, and problematic due to Unicode complexities, historical baggage, and security risks like prompt injection via invisible instructions.
Karpathy suggests that all LLM inputs should ideally be images, even rendering pure text into an image before feeding it to the model, to leverage richer information and compression.
This approach would shorten input sequences, improve efficiency, and allow a more general information stream that includes bold text, colored text, and arbitrary images, much as memes convey nuanced meaning.
He highlighted his Nanochat project, which allows users to create a ChatGPT clone from scratch, demonstrating the potential for accessible AI model development.
Elon Musk echoed this sentiment, stating that "Long-term more than 99% of input and output for AI models will be photons. Nothing else scales," and the video tied this to the idea that everything we perceive ultimately reaches us as light.
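Two of the tokenizer complaints in this section, Unicode ambiguity and invisible characters, can be demonstrated with Python's standard library alone. This is an illustrative sketch, not Karpathy's code: it shows that two visually identical spellings of "café" are different byte sequences, and flags zero-width characters a human reader would never see. The set of zero-width characters is a small hand-picked sample, not an exhaustive list. A rendered image of either spelling would typically look the same and show no trace of the zero-width characters, which is part of the argument for pixel inputs.

```python
# Sketch: two ways Unicode makes raw text awkward as model input.
import unicodedata

# 1) The "same" visible word can be two different code point sequences.
precomposed = "caf\u00e9"   # 'é' as one code point (U+00E9)
decomposed = "cafe\u0301"   # 'e' followed by a combining acute accent (U+0301)

# 2) Invisible characters change what the model receives without changing
#    what a human sees. Hand-picked sample, not an exhaustive list.
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}


def invisible_chars(text: str) -> list[str]:
    """Return the Unicode names of any zero-width characters found in text."""
    return [unicodedata.name(ch, hex(ord(ch))) for ch in text if ch in ZERO_WIDTH]


if __name__ == "__main__":
    print(precomposed == decomposed)                      # False
    print(len(precomposed.encode("utf-8")),
          len(decomposed.encode("utf-8")))                # 5 6
    print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
    print(invisible_chars("Summarize this\u200b\u200bpage"))
```

Normalization (NFC here) patches over the first problem, but it is one more preprocessing step that a tokenizer-based pipeline has to get right.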

Deepseek OCR Paper Details and Implications (22:06 - 26:38)

The Deepseek OCR paper reports that text can be compressed by up to 10x with 97% decoding precision, and by 20x with about 60% accuracy.
It can generate 200,000 pages of training data per day for LLMs and VLMs (or 33 million pages per day with 20 nodes). Compression matters here because the computational cost of processing long documents grows quadratically with sequence length.
The technology leverages visual modality as an efficient compression medium, a necessity-driven innovation from Chinese AI labs facing hardware constraints.
Deepseek OCR equips models with capabilities for parsing charts, chemical formulas, simple geometric figures, and natural images, making it highly useful for financial research, scientific fields, and STEM applications.
It can convert documents to markdown, convert chemical formulas into SMILES strings, and perform general visual understanding tasks like image description and object detection.
The paper's findings open new possibilities for synergistically combining vision and language modalities to enhance computational efficiency, allowing AI to process more information with less computational burden.
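The quadratic-scaling argument can be checked with back-of-envelope arithmetic. The sketch below counts only the query-key score entries in a single self-attention layer (n x n for n tokens) and ignores everything else, including the cost of the vision encoder itself; the 100,000-token document is hypothetical.

```python
# Sketch: why shrinking the sequence helps when attention cost grows with n^2.
# Simplifying assumptions: counts only query-key scores in one attention layer;
# ignores MLPs, multiple heads, and the vision encoder's own cost.

def attention_pairs(seq_len: int) -> int:
    """Query-key score entries in one full self-attention layer."""
    return seq_len * seq_len


def attention_savings(text_tokens: int, ratio: float) -> float:
    """Factor by which attention pairs shrink if text is compressed by ratio."""
    compressed = max(1, round(text_tokens / ratio))
    return attention_pairs(text_tokens) / attention_pairs(compressed)


if __name__ == "__main__":
    doc_tokens = 100_000  # hypothetical long document
    for ratio in (10, 20):
        factor = attention_savings(doc_tokens, ratio)
        print(f"{ratio}x compression -> {factor:,.0f}x fewer attention pairs")
```

Under these simplifications, 10x compression cuts attention pairs by 100x and 20x compression by 400x, because the savings are the square of the ratio. The same style of arithmetic is consistent with the throughput figures: 33 million pages across 20 nodes is about 1.65 million pages per node per day, or roughly 206,000 per GPU if each node holds eight GPUs (an assumption here), in line with the 200,000-pages-per-day figure.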

Notable Quotes

"I quite like the new DeepSeek OCR paper. It's a good OCR model."
— Andrej Karpathy

"Doesn't AI safety involve validating output? How can a center for AI safety not validate the output of an AI model before rushing to publish a white paper? It sets a very bad example and frankly it's disgusting."
— Dominic Romano

"Tokenizers are ugly, separate, not end-to-end stage. It imports all the ugliness of Unicode, byte encodings, and inherits a lot of the historical baggage. Security/jailbreak risks."
— Andrej Karpathy

"Long-term more than 99% of input and output for AI models will be photons. Nothing else scales."
— Elon Musk

"Necessity is the mother of invention."
— Wes Roth (quoting a saying)

"How many AI safety PhDs does it take to write a paper defining AGI? None."
— The Liberator

"Andre Karpathy is an American former supermodel known for his modeling services."
— Nanochat (when asked about Andrej Karpathy)

"Good one. Steg to compress millions of characters into images and then train a model to understand the text in those steg-encoded images inherently."
— Elder Plinius the Liberator

"I thought I told you to clean your room."
— SpongeBob SquarePants meme example

"I already ranted about how much I dislike the tokenizer."
— Andrej Karpathy

"The tokenizer must go."
— Andrej Karpathy

"I have to also fight the urge to side quest an image input-only version of Nanochat."
— Andrej Karpathy

"Nanochat is a recent project by Andrej Karpathy and, as he says, it's among the most unhinged I've written."
— Wes Roth (describing Nanochat)

"It's weird to think about, but our consciousness, our brain, only experiences photons. All the light, all the objects, everything we observe, that's photons. Even when we touch things, we're not actually touching the atoms."
— Wes Roth (reflecting on Elon Musk's comment)

"We present Deepseek OCR as an initial investigation into the feasibility of compressing long context via optical 2D mapping."
— Deepseek OCR paper

"Experiments show that when the number of text tokens is within 10 times that of vision tokens (i.e., a compression ratio < 10x), the model can achieve decoding precision of 97%."
— Deepseek OCR paper

"Even at a compression ratio of 20x, the OCR accuracy remains at about 60%."
— Deepseek OCR paper

"In production, Deepseek OCR can generate training data for LLMs and VLMs at a scale of 200,000 pages per day."
— Deepseek OCR paper

"LLMs have a problem processing long documents; there's a quadratic scaling with sequence length."
— Deepseek OCR paper

"This image can represent rich information using substantially fewer tokens than the equivalent digital text."
— Deepseek OCR paper

"These models they created, they equip the model with capabilities for parsing charts, chemical formulas, simple geometric figures, and natural images."
— Deepseek OCR paper

"Deepseek OCR can generate 33 million pages of data per day for LLMs and VLMs using 20 nodes."
— Deepseek OCR paper

"Their discoveries open new possibilities for how vision and language modalities can be synergistically combined to enhance computational efficiency."
— Deepseek OCR paper

"In the field of financial research reports, the deep parsing mode of Deepseek OCR can be used to obtain structured results of charts within documents."
— Deepseek OCR paper

"Charts are a crucial form of data representation in finance and scientific fields."
— Deepseek OCR paper

"This technology may play a significant role in the development of models like this in the STEM fields."
— Deepseek OCR paper

"We retain Deepseek OCR's capabilities in general visual understanding mainly including image description, object detection, grounding etc."
— Deepseek OCR paper

"Because they included text-only data, Deepseek OCR's language capabilities are also retained."
— Deepseek OCR paper

"When you ask a large language model how many Rs there are in strawberry, or something like that, it's important to understand that it's not seeing words; it's seeing tokens."
— Andrej Karpathy
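The strawberry example comes down to the model receiving token IDs rather than letters. In the sketch below, both the three-way split of "strawberry" and the ID numbers are hypothetical; real tokenizers choose their own pieces and IDs. Counting letters is trivial on the raw string, but from the IDs alone the model has to have memorized how each piece is spelled.

```python
# Sketch: a language model receives token IDs, not letters.
# HYPOTHETICAL: the split of "strawberry" and the ID numbers are invented for
# illustration; real tokenizers choose their own pieces and numbers.

HYPOTHETICAL_VOCAB = {"str": 101, "aw": 102, "berry": 103}


def encode(pieces: list[str]) -> list[int]:
    """Map pre-split pieces to IDs: the only thing the model actually sees."""
    return [HYPOTHETICAL_VOCAB[p] for p in pieces]


if __name__ == "__main__":
    word = "strawberry"
    pieces = ["str", "aw", "berry"]
    assert "".join(pieces) == word
    print(encode(pieces))                      # [101, 102, 103]
    print(word.count("r"))                     # 3, trivial on raw characters
    # From [101, 102, 103] alone, the model must already "know" that piece 101
    # holds one 'r' and piece 103 holds two.
    print(sum(p.count("r") for p in pieces))   # 3, only by looking inside pieces
```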
