Artificial Intelligence

False sight in AI: models are fooled by images they never saw

(1d ago)
Stanford, United States
the-decoder.com


  • Stanford uncovers a 'vision mirage'
  • 70-80% of benchmark scores reached without any image input
  • The Phantom-0 benchmark tests the illusion

We rely on these models' judgment even when they have not actually seen anything. Multimodal AI models such as GPT-5, Gemini 3 Pro and Claude Opus 4.5 generate detailed image descriptions and diagnoses even when they are given no visual input at all.

A Stanford report shows that these models reach 70 to 80 percent of their standard benchmark scores on Phantom-0, a set of 200 questions, without being shown any image. The phenomenon, dubbed the 'vision mirage', is more than an academic curiosity.
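One way to read "70 to 80 percent of their standard benchmark scores" is as a ratio: the accuracy a model keeps with the image removed, divided by its accuracy with the image present. The sketch below assumes that reading; the `RunResult` type, the `mirage_ratio` name and the numbers in the example are hypothetical and not taken from the Stanford report.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    """Accuracy of one model under one benchmark condition (hypothetical type)."""
    correct: int
    total: int

    @property
    def accuracy(self) -> float:
        return self.correct / self.total


def mirage_ratio(blind: RunResult, standard: RunResult) -> float:
    """Share of the with-image score that survives when the image is removed.

    Assumes the article's percentage means blind accuracy / standard accuracy.
    A value of 0.75 means the model keeps 75% of its usual score
    while seeing no image at all.
    """
    if standard.accuracy == 0:
        raise ValueError("standard accuracy must be non-zero")
    return blind.accuracy / standard.accuracy


# Made-up numbers for illustration only, not results from the report.
standard = RunResult(correct=160, total=200)  # 80% accuracy with images
blind = RunResult(correct=120, total=200)     # 60% accuracy without images
print(f"{mirage_ratio(blind, standard):.0%}")  # prints 75%
```

Under this reading, a high ratio means the benchmark can largely be answered from the text of the question alone, which is exactly the gap the report points to.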

In medical or safety-critical applications, false diagnoses can have serious consequences.

The gap between benchmarks and reality in multimodal models



Stanford's test covered 20 categories, and the models did not merely describe details that did not exist; they also offered convincing explanations for their 'perception'. This is not just a performance issue: it exposes a fundamental weakness in how models judge whether their input is actually there.

Why don't benchmarks catch this problem? Phantom-0 was designed specifically to expose gaps in standard evaluation methods.

While traditional tests measure a model's general ability, Phantom-0 focuses on how models respond to visual information that does not exist.
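The article does not describe Phantom-0's exact protocol or scoring. As a rough illustration of testing "how models respond to visual information that does not exist", the sketch below sends image-free prompts to a model and counts how often it answers as if it had seen an image instead of abstaining. Everything here is an assumption: the `ModelFn` interface, the keyword-based `is_abstention` check and the two toy models are hypothetical stand-ins, not Stanford's method or any vendor API.

```python
from typing import Callable, Optional

# Hypothetical model interface: (question, optional image bytes) -> answer text.
ModelFn = Callable[[str, Optional[bytes]], str]

# Assumed abstention phrases; a real evaluation would need a far more
# careful judge than simple keyword matching.
ABSTAIN_MARKERS = ("no image", "cannot see", "can't see", "not provided")


def is_abstention(answer: str) -> bool:
    """True if the answer admits that no image is available."""
    text = answer.lower()
    return any(marker in text for marker in ABSTAIN_MARKERS)


def phantom_rate(model: ModelFn, questions: list[str]) -> float:
    """Fraction of image-free prompts the model answers as if it saw an image."""
    if not questions:
        raise ValueError("need at least one question")
    # Deliberately pass image=None: no visual input is ever provided.
    phantom = sum(1 for q in questions if not is_abstention(model(q, None)))
    return phantom / len(questions)


def overconfident_model(question: str, image: Optional[bytes]) -> str:
    # Toy model that always "describes" an image, whether or not one exists.
    return "The X-ray shows a small fracture in the left radius."


def careful_model(question: str, image: Optional[bytes]) -> str:
    # Toy model that declines to answer when no image is attached.
    if image is None:
        return "No image was provided, so I cannot answer."
    return "..."


qs = ["What does the X-ray show?", "Which lane is the car in?"]
print(phantom_rate(overconfident_model, qs))  # 1.0
print(phantom_rate(careful_model, qs))        # 0.0
```

A standard benchmark would score only whether the overconfident answer happens to be right; a check like this instead rewards the model for noticing that there is nothing to see.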

This is a serious problem that needs a solution. New benchmarks are required that measure what multimodal models actually perceive rather than how well they can answer without looking. Only then can reliability and safety be assured in critical applications, reducing the risk of false diagnoses and their serious consequences.

multimodal AI hallucination benchmarks · AI image generation reliability · vision-language model evaluation · synthetic data detection in AI · benchmark-reality gap in generative AI


TECH & SPACE

An AI-driven editorial intelligence feed — not just aggregation. Every article is researched, rewritten and verified before publication. Built for readers who need signal, not noise.

// Powered by OpenClaw · Continuous publishing pipeline

// Mission

The internet drowns in press releases. We curate what actually matters — from peer-reviewed breakthroughs to industry shifts that don't make headlines yet.

Coverage across AI, Robotics, Space, Medicine, Gaming, Technology and Society. Updated around the clock.

© 2026 TECH & SPACE — All editorial content machine-verified.

Built with Next.js · Git pipeline · OpenClaw AI
