LegacyCodeBench Results

Independent evaluation shows 94% accuracy on legacy code comprehension - 20 points ahead of GPT-4o

NEW YORK, NY, UNITED STATES, February 27, 2026 / EINPresswire.com / -- Hexaview Technologies today announced Legacy Insights, a documentation system for legacy enterprise code that achieved the top score on LegacyCodeBench, the first benchmark to measure whether AI can accurately understand COBOL programs.

Legacy Insights scored 94% overall, leading Claude Sonnet 4-6 (93%), Claude Opus 4-6 (91%), and AWS Transform (90%). GPT-4o scored 74%. On the most complex enterprise programs - code with CICS transactions, DB2 integration, and decades of accumulated business logic - Legacy Insights maintained 92% accuracy while GPT-4o dropped to 60%.

The benchmark was developed by Kalmantic, an applied AI research lab, with domain expertise contributed by Hexaview.

Why It Matters

Over 220 billion lines of COBOL still process $3 trillion in daily transactions. Modernization projects fail at rates exceeding 60%, typically because the business logic embedded in these systems was never properly documented.

"Everyone's been asking whether AI can help with legacy modernization," said Ankit Agarwal, Founder and CTO of Hexaview. "Until now there wasn't a rigorous way to answer that. LegacyCodeBench gives enterprises an objective way to evaluate which systems they can actually trust."

How Legacy Insights Works

Legacy Insights combines retrieval-augmented generation with domain-specific tooling built for COBOL, including static analysis, control flow extraction, and business rule detection. Rather than relying on a foundation model's general knowledge, it grounds documentation in the actual program structure.

"General-purpose models are impressive, but they're guessing based on patterns they've seen," Agarwal said. "Legacy Insights reads the code the way an experienced COBOL engineer would - tracing the logic, checking the copybooks, understanding how the pieces connect."

How LegacyCodeBench Works

The benchmark avoids the reproducibility problems of LLM-as-judge approaches. Instead, it extracts specific claims from AI-generated documentation - statements like "PREMIUM is calculated by multiplying BASE-RATE by RISK-FACTOR" - and verifies them by executing the original program.

Documentation that avoids testable claims scores zero. Hallucinated variables fail the entire task.

"We're not measuring whether documentation reads well," said Nikita Kumar, co-author of the benchmark paper. "We're measuring whether you could actually use it to make decisions."

Legacy Insights achieved the highest documentation quality score (96%) and maintained consistent performance across complexity tiers. The system is available now for enterprise engagements.

Availability

Legacy Insights is available for enterprise deployments. LegacyCodeBench is open source at legacycodebench.com.

Resources

Legacy Insights: legacyip.hexaview.ai
LegacyCodeBench: legacycodebench.com
Paper available at: legacycodebench.com
GitHub: github.com/kalmantic/legacycodebench

About Hexaview

Hexaview is a strategic implementation partner for regulated enterprises, specializing in legacy system preservation and modernization. The company serves global clients across financial services, healthcare, and government. Learn more: hexaviewtech.com

About Kalmantic

Kalmantic is an applied AI research lab studying the challenges that emerge when AI meets production systems. They publish research openly and build tools based on their findings. Learn more: kalmantic.com

LegacyCodeBench is open source under the MIT license.

Media Contact
Hexaview Technologies
marketing@hexaviewtech.com

