Detailed Summary
The video introduces LongCat, the AI lab of the Chinese food-delivery giant Meituan, and highlights its surprisingly rapid emergence as a serious AI research outfit. In the four months since September 2025, LongCat has released a wide range of open-source work: LLMs, audio encoders, video generation models, datasets, benchmarks, reasoning models, multimodal models, and image models. The presenter asks how a company seemingly unrelated to AI could produce output of this volume and quality, even training its first LLM in just 30 days.
Hostinger Sponsorship (1:32 - 2:48)
This section is a sponsored message for Hostinger, promoting its all-in-one platform for website building, hosting, domains, email, marketing tools, and AI features. It highlights Hostinger's VPS KVM 2 plan, which supports self-hosting applications such as Docker and Ollama, with no scaling limits and generous resources. The discount code "BYCLOUD" gives an additional 10% off the already discounted plan, which is 61% off at $6.99 a month plus two months free on a 24-month term.
Meituan's Background and LongCat's Name (2:49 - 5:04)
The presenter delves into Meituan's history, noting that it is a massive company, generating nearly four times the revenue of DoorDash and slightly less than Uber. Founded in 2010 by Wang Xing, Meituan evolved from a group-buying site into a dominant food-delivery service, entering the market before Uber Eats even existed. The video also explores the possible origin of the name "LongCat": it may come from the Chinese word for chinchilla, "lóngmāo" (龙猫, literally "dragon cat"), a possible nod to DeepMind's Chinchilla paper, or from the internet "longcat" meme. Meituan's technical pedigree predates the lab: it has published technical blogs since 2013 and over 80 research papers on arXiv since 2019, evidence of a deep-rooted culture of open research and technical capability long before LongCat's official announcement in September 2025.
LongCat Flash Chat: Innovative LLM (5:05 - 9:30)
LongCat's first open-source release, LongCat Flash Chat, is a non-reasoning model accompanied by a 36-page technical report. The report is praised for its unusual depth, covering everything from pre-training to distributed training strategies, down to infrastructure details such as a claimed training cost of $0.5 per million tokens. The model introduces a context-aware dynamic computation mechanism that places "zero-computation experts" inside its Mixture of Experts (MoE) layers, letting the router allocate more compute to difficult tokens and less to easy ones and substantially improving efficiency. Despite using this uncommon MoE++-style mechanism, LongCat Flash Chat trained in just 30 days and can serve 100 tokens per second on H100s, with top-tier performance in agentic tool use and instruction following, comparable to DeepSeek V3.1.
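The zero-computation-expert idea can be sketched in a few lines: the router's top-k choice is made over both real FFN experts and identity "zero" experts, so a token whose scores favor the zero slots skips the FFN entirely. The sketch below is illustrative only; the function names, toy experts, and scoring are assumptions, not LongCat's actual implementation.

```python
def moe_forward(x, router_scores, experts, top_k=2):
    """Mixture-of-experts step with zero-computation experts.

    `experts` holds the real FFN callables; router score slots beyond
    len(experts) are identity "zero" experts that return the token
    unchanged, so routing there costs no FLOPs.
    """
    # Pick the top-k scoring expert slots (real or zero).
    ranked = sorted(range(len(router_scores)),
                    key=router_scores.__getitem__, reverse=True)[:top_k]
    total = sum(router_scores[i] for i in ranked)
    out = [0.0] * len(x)
    for i in ranked:
        weight = router_scores[i] / total
        y = experts[i](x) if i < len(experts) else x  # zero expert: identity
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out

# Two toy "real" experts plus two zero-expert slots (4 router scores).
experts = [lambda v: [2.0 * vi for vi in v],
           lambda v: [vi + 1.0 for vi in v]]
# An "easy" token whose router scores favor the zero slots:
easy = moe_forward([1.0, 2.0], [0.125, 0.125, 0.5, 0.25], experts)
# easy ≈ [1.0, 2.0]: the token passes through untouched, at near-zero cost.
```

Because the zero experts still occupy router slots, the model spends real FFN compute only where the router deems it worthwhile, which is the efficiency lever the report emphasizes.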
LongCat Video: Generative Video Model (9:31 - 11:13)
Released a month after Flash Chat, LongCat Video is presented as a particularly valuable contribution given how many video generation models are closed source. It proposes a 3D block sparse attention mechanism over video latents that reduces attention compute by over 90% while remaining near-lossless in quality. The model also applies GRPO, the reinforcement-learning method popularized by DeepSeek, to flow-matching video models, and unifies text-to-video, image-to-video, and video continuation into a single input format, a meaningful step toward true world models. This unified conditioning contrasts with the brute-force cross-attention approaches common in other models.
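To see where a >90% compute reduction could come from, here is a toy calculation of attention density when video-latent tokens are partitioned into 3D (time, height, width) blocks and each query attends only within its own block. This is a deliberately simplified stand-in for the paper's scheme, which also selects a few remote blocks to attend to; the function and parameters below are assumptions for illustration.

```python
from itertools import product

def block_sparse_density(T, H, W, bt, bh, bw):
    """Fraction of the full (T*H*W) x (T*H*W) attention matrix kept
    when each query attends only to keys inside its own 3D block."""
    def block_id(t, h, w):
        return (t // bt, h // bh, w // bw)
    coords = list(product(range(T), range(H), range(W)))
    kept = sum(block_id(*q) == block_id(*k) for q in coords for k in coords)
    return kept / len(coords) ** 2

# An 8x8x8 latent grid with 2x2x2 blocks gives 64 blocks, so only
# 1/64 (~1.6%) of attention pairs survive in this toy setting.
density = block_sparse_density(8, 8, 8, 2, 2, 2)
```

Even this crude within-block-only variant skips over 98% of attention pairs, so keeping a small budget of extra selected blocks still leaves the claimed >90% reduction plausible.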
LongCat Flash Omni: Omnimodal Model (11:14 - 12:24)
One of LongCat's latest papers introduces an omnimodal model built on LongCat Flash, combining visual and audio inputs with text and audio outputs. While its inputs and outputs are not fully symmetric, its multimodal understanding is comparable to leading models such as Qwen3-Omni, GPT-4o, and Gemini 2.5. A key contribution is LongCat's in-depth discussion of omnimodal training infrastructure, including techniques such as modality-decoupled parallelism and chunk-based modality bridges that other labs rarely disclose. This transparency underscores LongCat's commitment to open knowledge sharing.
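The chunk-based bridging idea, at its simplest, means slicing each modality's feature stream into fixed-size time chunks and interleaving them so the backbone sees temporally aligned audio and video together. The sketch below is a guess at the general shape under that assumption; the function name and interleaving order are invented, not LongCat's actual bridge.

```python
def chunk_interleave(video_feats, audio_feats, chunk):
    """Interleave per-chunk video and audio features into one
    time-ordered token stream (toy chunk-based modality bridge)."""
    stream = []
    for i in range(0, max(len(video_feats), len(audio_feats)), chunk):
        stream += video_feats[i:i + chunk]   # video tokens for this window
        stream += audio_feats[i:i + chunk]   # then the aligned audio tokens
    return stream

merged = chunk_interleave(["v0", "v1", "v2", "v3"],
                          ["a0", "a1", "a2", "a3"], chunk=2)
# merged == ["v0", "v1", "a0", "a1", "v2", "v3", "a2", "a3"]
```

Keeping chunks time-aligned rather than concatenating whole modalities is what lets a streaming omnimodal model respond before an utterance or clip has finished.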
The video concludes by emphasizing LongCat's consistent innovation in optimization and efficiency, with many new techniques focused on infrastructure and compute reduction. The presenter recommends keeping an eye on Meituan's LongCat in 2026, especially for those interested in AI research, as their papers are a valuable resource for learning advanced AI infrastructure skills. The presenter then promotes their own new learning website, "Intuitive AI Academy," which offers intuitive explanations of LLM concepts from the ground up, with over 100,000 words written. A limited-time "NYNM" discount code is offered for 50% off a yearly plan. The video ends with acknowledgments to patrons and YouTube members.