
Alibaba’s Qwen3.5-Omni writes code from speech—no training required

Hangzhou, China
the-decoder.com
Published: Apr 12, 2026 at 08:29 UTC

  • Omnimodal model claims audio task lead over Gemini 3.1 Pro
  • Spoken instructions and video-to-code without explicit training
  • Developer reactions split between skepticism and cautious optimism

Qwen3.5-Omni isn’t just another multimodal upgrade. It’s Alibaba’s attempt to outflank Google’s Gemini 3.1 Pro on audio tasks while stumbling into an unplanned feature: translating voice memos and video walkthroughs into executable code. Early benchmarks—always a minefield—suggest it edges out Gemini in audio comprehension, but the real curiosity is how it acquired coding skills without targeted fine-tuning.

The demo reels show a researcher verbally describing a Python function, followed by the model spitting out syntactically correct (if not always elegant) code. More intriguing: it reportedly parses video tutorials of terminal commands and replicates them. That’s the kind of emergent behavior that makes engineers lean forward—or roll their eyes, depending on how many times they’ve seen "unsupervised learning" overpromise.
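To make the demo pattern concrete, here is a hypothetical exchange of the kind described above; the spoken prompt and the function below are illustrative, not taken from Alibaba's materials:

```python
# Hypothetical spoken prompt (as transcribed by the model):
# "Write a function that takes a list of numbers and returns
# the squares of the even ones."
#
# The kind of syntactically correct, if unremarkable, output
# the demo reels show:

def squares_of_evens(numbers):
    """Return the squares of the even numbers in `numbers`."""
    return [n * n for n in numbers if n % 2 == 0]

print(squares_of_evens([1, 2, 3, 4, 5]))  # prints [4, 16]
```

Whether output at this level generalizes beyond toy functions to real codebases is precisely the open question.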

This isn’t Alibaba’s first rodeo with multimodal models, but the coding angle is new. The company’s Qwen2 series focused on text and vision; adding audio and video input was inevitable. The twist? The model’s ability to bridge these modalities into code output without explicit training data for that task—a claim that strains credibility until you remember how often these systems surprise even their creators.

The gap between emergent capability and deployable skill

Here’s where the hype filter kicks in. Emergent capabilities are fascinating until you ask: How reliably? The difference between a demo converting a spoken loop into Python and handling a real-world debugging session is the difference between a party trick and a product. Alibaba’s documentation stays vague on failure rates, edge cases, or whether this works beyond contrived examples.

The developer community’s reaction on GitHub and forums like r/MachineLearning is a study in measured skepticism. Some praise the model’s audio chops; others note that "writing code from video" often means transcribing visible text, not inferring logic from pixels. The real test will be whether this translates to practical workflows or remains a benchmark footnote.

Competitively, this puts pressure on Google and Mistral to prove their multimodal models can do more than parse inputs—they must synthesize across them. For Alibaba, it’s a chance to position Qwen as the Swiss Army knife for developers who’d rather dictate than type. But as with all emergent behaviors, the question isn’t just can it do this—it’s how often does it do it right?

Tags: Qwen3.5-Omni · Nova Dimenzija · Multimodal AI

