
AI can disprove claims too, researchers say

(19h ago)
Global
arxiv.org

  • Fine-tuning LLMs to produce counterexamples in Lean 4
  • A symbolic mutation strategy
  • Three new benchmarks for evaluation

The authors of arXiv:2603.19514v1 show that large language models (LLMs) can be trained to generate counterexamples to mathematical claims and have them checked automatically in Lean 4. This is more than an academic exercise: it is an attempt to fill a large gap in AI mathematical reasoning.

Until now, prover models such as those from DeepSeek have focused on building formal proofs of true statements, neglecting the equally real need to refute false ones. The key shift comes from fine-tuning models so that they can refute claims as well as confirm them, while the integrated Lean 4 check ensures that the counterexamples are formally valid rather than guesses.
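
To make the idea of a formally checked counterexample concrete, here is a minimal Lean 4 sketch. The claim and the theorem names are illustrative and are not taken from the paper; the example assumes only core Lean 4, without Mathlib.

```lean
-- False claim: every natural number is strictly smaller than its square.
-- The witness n = 1 refutes it, and Lean checks this by computation.
theorem exists_counterexample : ∃ n : Nat, ¬ (n < n * n) :=
  ⟨1, by decide⟩

-- Consequently the universally quantified claim cannot hold.
theorem claim_is_false : ¬ (∀ n : Nat, n < n * n) :=
  fun h => absurd (h 1) (by decide)
```

If the proposed witness were wrong (say n = 2), `decide` would fail and Lean would reject the file, which is what separates a verified counterexample from a plausible guess.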

The problem is that research on mathematical reasoning has so far concentrated almost exclusively on proof construction, while counterexample generation has stayed in the shadows, even though it is just as important for robust formal verification.

From proof to refutation: a forgotten skill tied to formal verification

The method is based on a symbolic mutation strategy that synthesizes diverse training data by removing selected hypotheses from theorems, which yields new statements that admit counterexamples. In addition, multi-reward expert iteration was used to further improve the models at generating both counterexamples and proofs.
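
The hypothesis-dropping idea can be illustrated with a small Lean 4 sketch. The theorem and its mutated version are made up for illustration and do not come from the paper's data; only core Lean 4 is assumed.

```lean
-- Original theorem (true): the hypothesis `0 < n` is essential,
-- because subtraction on Nat truncates at zero.
theorem sub_one_add_one_of_pos (n : Nat) (h : 0 < n) : n - 1 + 1 = n :=
  Nat.sub_add_cancel h

-- Mutated statement: the hypothesis is dropped, so the claim becomes false.
-- The witness n = 0 refutes it, since 0 - 1 + 1 evaluates to 1 on Nat.
theorem mutated_claim_is_false : ¬ (∀ n : Nat, n - 1 + 1 = n) :=
  fun h => absurd (h 0) (by decide)
```

A pair like this gives a model one provable statement and one refutable statement from the same source theorem, which is the kind of training signal the mutation strategy is described as producing.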

Experiments were run on three new benchmarks that test a model's ability to refute claims, and the results suggest that this approach is more effective than traditional methods. What does this mean for the industry?

Vendors of formal tools and research teams working on AI mathematical reasoning now have a way to track and verify counterexamples, which could speed up the development of more robust AI verification systems. The researchers note that it will be crucial to follow progress on all three benchmarks, especially regarding integration with Lean 4.

Lean 4 has served for years as a standard for formal mathematics and software verification, so it is natural that it is also becoming a testing ground for AI models meant to work in formal environments.

The result has considerable potential to improve AI systems for mathematical reasoning. More advanced applications of these models may follow, opening new avenues for research and development.

