
LSD for MLLMs: When AI Stops Copying and Starts Choosing

(1d ago)
Global
arxiv.org


  • Reinforcement learning changes how examples are selected
  • kNN falls short on complex regression tasks
  • Five benchmarks, zero real-world scenarios

For years, multimodal large language models (MLLMs) have relied on a simple but limited strategy: k-Nearest Neighbor (kNN) search to pick the examples used for in-context learning (ICL). The problem?

Similarity is not the same as relevance. When the task is complex regression (say, precisely estimating depth in images or quantifying medical findings), kNN often picks redundant examples that do not cover the full range of outputs.
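To make the redundancy problem concrete, here is a minimal sketch of similarity-only selection. The pool, the 2-D embeddings, the labels, and the helper names (`knn_select`, `label_span`) are all made up for illustration and do not come from the paper; real systems would use embeddings from a vision encoder.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def knn_select(query, pool, k):
    """Return the k pool items whose embeddings are most similar to the query."""
    ranked = sorted(pool, key=lambda item: cosine(query, item["emb"]), reverse=True)
    return ranked[:k]

def label_span(items):
    """Width of the output range covered by the given demonstrations."""
    labels = [item["label"] for item in items]
    return max(labels) - min(labels)

# Hypothetical toy pool: embedding plus a scalar regression target.
POOL = [
    {"emb": (1.00, 0.10), "label": 5.0},
    {"emb": (0.98, 0.12), "label": 5.1},
    {"emb": (0.99, 0.08), "label": 4.9},
    {"emb": (0.60, 0.80), "label": 1.0},
    {"emb": (0.20, 0.95), "label": 9.0},
]
QUERY = (1.0, 0.1)

selected = knn_select(QUERY, POOL, k=3)
print("selected labels:", [item["label"] for item in selected])
print("span covered by selection:", label_span(selected))
print("span available in pool:", label_span(POOL))
```

In this toy setup the three nearest neighbors carry labels clustered around 5, so the model would see almost none of the output range (1 to 9) that the pool actually contains.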

A new arXiv paper, Learning to Select Visual In-Context Demonstrations, introduces Learning to Select Demonstrations (LSD), an approach that treats demonstration selection as a sequential decision problem rather than a static similarity lookup. LSD uses a Dueling Deep Q-Network (DQN) with a query-centric Transformer decoder to learn a policy that maximizes the MLLM's performance on downstream tasks.

Instead of relying on predefined distance metrics, the model dynamically builds a set of demonstrations tailored to the specific query.
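The idea of building the set one pick at a time can be sketched roughly as follows. This is not the paper's implementation: `dueling_q` is the standard dueling aggregation, Q(s, a) = V(s) + A(s, a) - mean A(s, a), while `toy_advantage` is a hand-written, hypothetical stand-in for the learned Transformer-decoder network, and the 1-D features and labels are invented.

```python
def dueling_q(value, advantages):
    """Standard dueling aggregation: Q = V + A - mean(A) over all actions."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + adv - mean_adv for adv in advantages]

def toy_advantage(query_x, chosen, candidate):
    """Hypothetical scoring rule standing in for a learned advantage head.

    Prefers candidates close to the query (relevance) whose label differs
    from the labels already chosen (novelty).
    """
    relevance = -abs(query_x - candidate["x"])
    if chosen:
        novelty = min(abs(candidate["y"] - c["y"]) for c in chosen)
    else:
        novelty = 0.0
    return relevance + novelty

def select_demonstrations(query_x, pool, k):
    """Greedy sequential selection: each pick depends on the picks so far."""
    chosen = []
    remaining = list(pool)
    for _ in range(k):
        advantages = [toy_advantage(query_x, chosen, c) for c in remaining]
        # The state value is a constant here; it shifts every Q equally
        # and therefore does not change which action is the argmax.
        q_values = dueling_q(value=0.0, advantages=advantages)
        best = max(range(len(remaining)), key=q_values.__getitem__)
        chosen.append(remaining.pop(best))
    return chosen

# Hypothetical toy pool: 1-D feature x and scalar regression target y.
POOL = [
    {"x": 1.0, "y": 5.0},
    {"x": 1.1, "y": 5.1},
    {"x": 0.9, "y": 4.9},
    {"x": 3.0, "y": 1.0},
    {"x": 4.0, "y": 9.0},
]

picked = select_demonstrations(query_x=1.0, pool=POOL, k=3)
print("picked labels:", [d["y"] for d in picked])
```

Because every step sees what has already been chosen, the second and third picks move away from the near-duplicates around the query. In LSD the scores would come from a network trained with reinforcement learning against the MLLM's downstream performance, not from a fixed rule like the one assumed here.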

The new approach to demonstrations shows where kNN gets stuck, but the question is who will use it


Early results on five visual regression benchmarks show improvements over kNN, but, as always in AI, a benchmark is not reality. The researchers acknowledge that the method has not yet been tested on real-world data with noise or unpredictable variation, which is crucial for practical use. Interestingly, LSD focuses on visual tasks, even though the approach could in theory work for text scenarios as well.

That raises a question: is this an optimization for one specific problem, or a foundation for broader use? From an industry perspective, LSD could be interesting to companies doing precise visual analysis, from autonomous vehicles to medical diagnostics.

For now, however, the biggest challenge is exactly what the paper does not address: scalability and the cost of training an RL agent for every new task.

There is currently no open implementation, and without one the technology remains out of reach for most researchers. If LSD turns out to deliver significant improvements on real-world data, it could spark a whole series of similar approaches. Expectations are high, but it is still too early for final conclusions.

