DataHub Cloud update boosts analytics agent accuracy

Tue, 2nd Jun 2026

DataHub has released a new version of DataHub Cloud designed to give analytics agents trusted context for enterprise data.

The release centres on a context layer between analytics agents and corporate data sources such as data warehouses and data lakes. The system ingests metadata and other signals from across an organisation, structures the information and serves it to analytics tools before they generate queries.

The update targets a problem that has become more visible as businesses test natural-language analytics tools: agents can produce plausible but wrong answers because they lack context on definitions, data freshness, lineage and previous usage. In practice, that can mean selecting the wrong metric, relying on old data or missing known data quality issues.

According to DataHub, the latest version of its cloud product continuously extracts semantic meaning from enterprise query history, combines it with operational signals and lets experts validate and refine the resulting context. It positions this as an alternative to approaches that require developers to define semantic models in advance.

DataHub cited collaboration software group Miro as a customer example. "Starting with Snowflake metadata alone, our analytics agent answered about half of our benchmark questions correctly," said Ronald Angel, Product Manager, Data Platform, Miro.

"After layering in DataHub Cloud as our context platform, including data product documentation, cross-source context and business meaning derived from our query history, we nearly doubled accuracy from close to 50% to around 90%," said Angel.

Another customer, Swedish retailer ICA, said the software had already surfaced useful knowledge and data quality concerns. "Every system in the enterprise holds context, but DataHub Cloud is the platform built to unify it, and that makes it the natural foundation for AI agents," said Björn Barrefors, Metadata Management Lead, ICA.

"We have already seen it surface institutional knowledge that analysts would never have found on their own, while also flagging known data quality issues before the query ran. The next step is putting that context layer behind every data question our business asks," said Barrefors.

New features

DataHub grouped the release into four areas. The first, Context Ingestion, is intended to address fragmented information spread across catalogues, metric tools and internal documentation systems. It can build a unified graph from structured material and unstructured sources, including workplace knowledge platforms.

The second, Context Intelligence, focuses on query history. DataHub says it turns historical analyst queries, dashboards and documents into a semantic index so that, when an agent receives a question, it can retrieve not only schema information but also query patterns that have answered similar questions.

That matters because many organisations have years of SQL and reporting logic embedded in analyst work, but much of it remains inaccessible to generative AI systems. By converting that history into a structured reference layer, DataHub is aiming to make existing usage patterns reusable without a lengthy manual setup process.
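The retrieval idea can be illustrated with a minimal Python sketch. It is not DataHub's implementation: the `PastQuery` record, the sample history and the word-overlap (Jaccard) scoring are all assumptions chosen for brevity, and a production system would more likely rely on embeddings and much richer metadata.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class PastQuery:
    """Hypothetical record pairing an analyst question with the SQL that answered it."""
    question: str
    sql: str


def tokens(text: str) -> set:
    """Lower-case the text and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def jaccard(a: set, b: set) -> float:
    """Word-overlap similarity: size of intersection divided by size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(question: str, history: list, k: int = 1) -> list:
    """Return the k past queries whose questions overlap most with the new one."""
    q = tokens(question)
    ranked = sorted(history, key=lambda p: jaccard(q, tokens(p.question)), reverse=True)
    return ranked[:k]


# Illustrative history only; table and column names are made up.
HISTORY = [
    PastQuery(
        "monthly active users by region",
        "SELECT region, COUNT(DISTINCT user_id) FROM events GROUP BY region",
    ),
    PastQuery(
        "total revenue by product last quarter",
        "SELECT product, SUM(amount) FROM orders GROUP BY product",
    ),
]

best = retrieve("revenue by product", HISTORY)[0]
print(best.sql)
```

In this sketch the agent would receive the matched SQL pattern alongside schema information, rather than having to infer the join and aggregation logic from table names alone.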

The third area, Context Hub, provides a workspace for domain experts to review and approve AI-generated context, resolve conflicting definitions and test how changes may affect text-to-SQL output before publication. The fourth, Context Activation, is designed to let agents and workflows access that context through APIs, software development tools and built-in user interfaces.

Shirshanka Das, Co-Founder and Chief Technology Officer at DataHub, said the company views enterprise query history as a source of operational knowledge, not just a record of past activity. "We turn years of query history into a living knowledge base, fuse in real-time operational signals and compound it with every expert correction from the field," he said.

"Every change is timestamped and versioned; agents don't just know the right answer, they know why it changed. That's auditable context, and it's how agents stop hallucinating and start earning trust," said Das.
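What timestamped, versioned context might look like can be sketched in a few lines of Python. The `ContextEntry` and `Version` classes, the metric name and the definitions below are hypothetical and do not reflect DataHub's data model; the point is only that each change keeps its reason and time alongside the new definition.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Version:
    """One immutable revision of a definition (hypothetical structure)."""
    number: int
    definition: str
    reason: str
    changed_at: datetime


class ContextEntry:
    """Append-only history of a business definition."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.versions = []

    def update(self, definition: str, reason: str) -> Version:
        """Record a new revision with its reason and a UTC timestamp."""
        version = Version(
            number=len(self.versions) + 1,
            definition=definition,
            reason=reason,
            changed_at=datetime.now(timezone.utc),
        )
        self.versions.append(version)
        return version

    def current(self) -> Version:
        """Return the latest revision; earlier ones remain available for audit."""
        return self.versions[-1]


# Illustrative values only.
entry = ContextEntry("active_user")
entry.update("logged in within 30 days", "initial definition")
entry.update("performed an action within 28 days", "aligned with finance reporting")

latest = entry.current()
print(latest.number, latest.definition, "| why:", latest.reason)
```

Because revisions are appended and never overwritten, an agent reading the entry can report both the current definition and the reason it replaced the previous one.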

Market backdrop

The announcement comes as data management suppliers try to position themselves around AI agents and natural-language interfaces. Much of that competition is now focused on reliability rather than novelty, particularly for finance, operations and other business uses where a wrong answer can quickly erode confidence.

Industry researchers have also pointed to the need for stronger data context around AI systems. In comments included by DataHub, Kevin Petrie, Vice President of Research at BARC, said context engineering requires both machine and human curation across usage patterns, business definitions and semantic signals.

DataHub said the product can ingest metadata from more than 100 sources. The company traces its roots to the open-source DataHub project, which it says has more than 15,000 contributors and is used by thousands of organisations. Its backers include Bessemer Venture Partners, LinkedIn and 8VC.

It also argued that supplying pre-validated context instead of raw schema can reduce the number of tokens an analytics agent needs to answer a question, lowering inference costs. Whether that cost argument resonates as strongly as the accuracy claim may depend on how quickly companies move from AI pilots to broader operational use. But the core challenge is already clear: the quality of the answer depends on the quality of the context the system receives.

