
Google TurboQuant shrinks the KV cache for LLMs to 3 bits

(3d ago)
Mountain View, United States
tomshardware.com

  • TurboQuant compresses the cache to just 3 bits
  • 8x faster processing on H100 GPUs
  • no loss of model accuracy

Google Research has introduced TurboQuant, an algorithm that compresses the key-value (KV) caches of large language models (LLMs) to just three bits per value without any loss of accuracy. Tests on Nvidia H100 GPUs show an eightfold speedup in computing attention logits compared with uncompressed 32-bit keys. The technique, evaluated on long-context benchmarks such as LongBench and Needle In A Haystack, targets the growing memory bottleneck in models with ever larger context windows.
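To make "three bits per value" concrete, here is a minimal sketch of a plain uniform min/max quantizer applied to one key vector, followed by a comparison of the exact and approximate attention logit. This is not TurboQuant's algorithm, which the article does not describe in detail; the function names, the head dimension of 128 and the random test vectors are all assumptions chosen for illustration.

```python
import math
import random

BITS = 3
LEVELS = 2 ** BITS  # 8 representable values per coordinate


def quantize(vec, bits=BITS):
    """Uniform min/max quantization (illustrative, NOT TurboQuant).

    Returns integer codes in [0, 2**bits - 1] plus the offset and step
    needed to reconstruct approximate values.
    """
    lo, hi = min(vec), max(vec)
    levels = 2 ** bits
    step = (hi - lo) / (levels - 1) or 1.0  # fall back to 1.0 for constant vectors
    codes = [round((x - lo) / step) for x in vec]
    return codes, lo, step


def dequantize(codes, lo, step):
    """Map integer codes back to approximate floating-point values."""
    return [lo + c * step for c in codes]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Hypothetical data: one query and one key with an assumed head dimension.
rng = random.Random(0)
head_dim = 128
query = [rng.gauss(0, 1) for _ in range(head_dim)]
key = [rng.gauss(0, 1) for _ in range(head_dim)]

codes, lo, step = quantize(key)
key_hat = dequantize(codes, lo, step)

# Scaled dot-product attention logit, before and after quantizing the key.
exact_logit = dot(query, key) / math.sqrt(head_dim)
approx_logit = dot(query, key_hat) / math.sqrt(head_dim)
print(f"exact={exact_logit:.4f} approx={approx_logit:.4f}")
```

Note that this naive scheme has to store `lo` and `step` next to every quantized vector. That per-vector metadata is the kind of bookkeeping cost conventional quantizers carry on top of the codes themselves.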

TurboQuant works as a two-stage process that removes the memory overhead that traditional quantization methods carry. Rather than only shrinking the cache, it reorganizes the data so that redundant computation is eliminated directly. According to early measurements, the optimization cuts the memory used by the cache by up to a factor of six, which makes room for larger batches and longer sequences in production environments. The approach also requires no model retraining, which makes it especially attractive to industry.
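A back-of-envelope calculation shows where savings of this order come from and why metadata overhead matters. The sketch below is only arithmetic on bit widths: the model shape, the group size of 32 and the 32 bits of per-group metadata are hypothetical values, not figures from Google. The article does not say which baseline precision its "factor of six" is measured against; against a 16-bit cache, pure bit-width arithmetic gives roughly 5.3x, and against 32-bit roughly 10.7x.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, batch, bits_per_value):
    """Bytes needed to store keys and values for every layer and token."""
    values = 2 * layers * kv_heads * head_dim * seq_len * batch  # 2 = keys + values
    return values * bits_per_value / 8


def effective_bits(code_bits, group_size, metadata_bits):
    """Bits per value once per-group metadata (e.g. scale, zero point) is included."""
    return code_bits + metadata_bits / group_size


# Hypothetical model shape, chosen only for illustration.
shape = dict(layers=32, kv_heads=8, head_dim=128, seq_len=128_000, batch=1)

fp16 = kv_cache_bytes(**shape, bits_per_value=16)
three_bit = kv_cache_bytes(**shape, bits_per_value=3)
# Assumed conventional scheme: 3-bit codes plus a 16-bit scale and a
# 16-bit zero point shared by each group of 32 values.
with_overhead = kv_cache_bytes(**shape, bits_per_value=effective_bits(3, 32, 32))

GIB = 2 ** 30
print(f"16-bit cache:          {fp16 / GIB:.2f} GiB")
print(f"3-bit, no overhead:    {three_bit / GIB:.2f} GiB ({fp16 / three_bit:.2f}x smaller)")
print(f"3-bit + group metadata: {with_overhead / GIB:.2f} GiB ({fp16 / with_overhead:.2f}x smaller)")
```

Under these assumptions, one extra bit per value of metadata is enough to pull the compression ratio from about 5.3x down to 4x, which is why a scheme that avoids storing such constants can matter as much as the nominal bit width.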

Google's new technique eliminates the memory bottleneck in large language models


Although Google skipped the classic call to action this time, the fact that the paper will be presented at ICLR 2026 suggests this is serious technology rather than a mere marketing move. It remains an open question how well TurboQuant will carry over to GPU architectures other than Nvidia's H100, although early signals point to broader applicability.

The community is already reacting to the prospect of lower memory requirements, particularly among users running high-performance AI inference workloads. For developers and companies running LLMs in production, an optimization like this could mean a significant reduction in hardware costs. Another open question is how quickly the algorithm will be integrated into existing frameworks such as TensorFlow or PyTorch.

Google TurboQuant · 3-bit quantization · LLM memory optimization · KV cache compression · large language model efficiency
