Our CTO, Oscar Klink, has spent the better part of his career at the intersection of technology, design, and legal operations. With a background that spans product development, user experience, and system architecture, he's played a central role in shaping how Precisely builds contract management that actually works in practice. Oscar has a knack for cutting through hype to get to what's real, especially when it comes to new technologies like AI.
In this piece, he reflects on the current wave of AI in contract lifecycle management: where it's genuinely useful, where risks are hiding in plain sight, and how to approach it with equal parts curiosity and caution.
AI in CLM: separating value from hype
Lately, AI has been dominating every conversation in the CLM space. That's a good thing, but it can also be risky. In this quick guide to common AI topics in CLM, I want to separate what's genuinely useful from what's marketing. I've seen plenty of slides claiming "95% accuracy," but these numbers usually come from carefully controlled, in-distribution test sets where the inputs are predictable and the outputs are already known. If I know the answers in advance, I can adjust prompts and rules until I hit 95% as well. That doesn't mean the system will perform at that level when it's faced with your real contracts: your NDAs, MSAs, and DPAs, with their own quirks, formats, and edge cases.
That's why the more pragmatic path is deliberately unflashy: get the CLM basics in place, run tests on your own documents, keep humans involved in reviewing outputs, and look for step-by-step improvements rather than overnight miracles.
And when it comes to vendor claims, a common one is "we use OpenAI," as if that alone answers every question about security, governance, and compliance. It doesn't. Yes, OpenAI now offers EU data residency for certain types of API traffic and for ChatGPT Enterprise/Edu content, and yes, Azure OpenAI runs the same models within Microsoft's cloud infrastructure with enterprise-grade controls. But your actual risk exposure still depends on the specific service you're using, which region it's configured in, and what the retention settings look like in practice.
What fuels the hype (and why it matters)
Bold AI claims often spread because there is pressure behind them: marketing teams want striking results, investors expect momentum, and many buyers are afraid of missing out. A single headline number such as "95% accuracy" can look convincing at first. In practice, that figure usually comes from a curated test set that the vendor has already tuned against.
Your own situation is rarely the same. Contract templates, scan quality, languages in use, and internal risk thresholds all affect outcomes. Once you move from a demo set to your actual repository, performance will shift. The only dependable way to know if a system works for you is to test it directly on your own data before making a purchase. This is one of many reasons to be deliberate when evaluating vendors — see Avoiding Common Pitfalls in CLM Adoption for a full checklist.
Risks of moving too fast
Compliance and residency gaps
For EU-based organizations, it is important to confirm where inference runs and which logs are retained. OpenAI's API may retain inputs and outputs for up to 30 days for abuse monitoring, although some endpoints support zero data retention. Azure OpenAI processes data in the region or geography you select, with modified abuse monitoring available to approved customers. These technical choices should always be mapped against GDPR requirements and the record-keeping duties of the EU AI Act.
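To make that concrete, here is a minimal Python sketch of pinning inference to an EU region via Azure OpenAI, using the official openai SDK. The resource name, deployment name, and region are placeholders for illustration; retention and abuse-monitoring behavior are governed by your service configuration and contract with Microsoft, not by anything in this code.

```python
import os
from openai import AzureOpenAI  # pip install openai

# Hypothetical Azure OpenAI resource created in an EU region
# (e.g. Sweden Central). The region is fixed when the resource is
# provisioned, so the endpoint itself encodes where inference runs.
client = AzureOpenAI(
    azure_endpoint="https://my-eu-resource.openai.azure.com",  # placeholder resource
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-06-01",
)

response = client.chat.completions.create(
    model="my-gpt4o-deployment",  # placeholder deployment name
    messages=[{"role": "user", "content": "Summarize the termination clause."}],
)
print(response.choices[0].message.content)

# Note: log retention and abuse monitoring are set at the service and
# contract level, not in this code. Verify them separately; "it runs
# in the EU" is not the whole answer.
```

The point is that "we use OpenAI" collapses several distinct choices: which service, which region, and which retention settings actually apply to your traffic.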
Data residency is not just a compliance checkbox — it is a foundational design decision for any AI-enabled CLM. For more on why this matters, read Why Data Sovereignty Matters for Contract Management and What It Means for AI.
Wrong expectations from demos
Accuracy scores presented in vendor demos are usually based on controlled, known datasets. That is very different from the mix of third-party contracts and edge cases that most legal teams actually deal with.
Sham digitization
If the underlying repository, metadata, versioning, and search functions are not in good shape, AI will not solve those problems. Instead, it risks magnifying the inconsistency.
What to do instead
Build the CLM foundation first
AI can only amplify what is already in place. A central contract repository, reliable metadata, clear version control for templates, and effective search are the basics that make advanced tools useful. Without this foundation, the gains from AI will be limited. You cannot fix data quality downstream — read more in You Can't Fix Data Quality Downstream.
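To make "reliable metadata" a little more concrete, here is a minimal sketch of the kind of structured record that foundation implies. The field names are illustrative, not a prescribed schema.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class ContractRecord:
    """Illustrative minimal metadata for one contract in a repository."""
    contract_id: str           # stable identifier, never reused
    counterparty: str
    contract_type: str         # e.g. "NDA", "MSA", "DPA"
    effective_date: date
    expiry_date: date | None   # None for evergreen contracts
    template_version: str      # which template version it was drafted from
    file_uri: str              # canonical location of the signed document
    tags: list[str] = field(default_factory=list)

# If fields like template_version are missing or inconsistent, AI-based
# clause extraction inherits that inconsistency rather than fixing it.
```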
Test on your own documents
A realistic pilot should include a small but representative set of your own contracts. A golden set of 20–50 documents that cover templates, third-party paper, and known edge cases works well. Keep the set blind until test day to avoid bias. Research such as ContractNLI shows just how hard it is to capture contractual nuance, which is why results from polished demo datasets rarely translate directly to real-world use.
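As one way to run such a pilot, here is a minimal Python sketch of scoring a tool against a golden set. It assumes your annotations are stored as JSON and that the tool under evaluation exposes some extraction function; extract_fields is a hypothetical stand-in, and the file format is just an example. It reports accuracy per field rather than one headline number.

```python
import json
from collections import Counter

def evaluate_golden_set(golden_path: str, extract_fields) -> dict[str, float]:
    """Compare extracted fields against human annotations, per field.

    golden_path: JSON file of [{"file": ..., "fields": {name: value}}, ...]
    extract_fields: callable(file_path) -> {name: value}; a hypothetical
    stand-in for whatever the vendor pilot exposes.
    """
    with open(golden_path) as f:
        golden = json.load(f)

    correct, total = Counter(), Counter()
    for doc in golden:
        predicted = extract_fields(doc["file"])
        for name, truth in doc["fields"].items():
            total[name] += 1
            if predicted.get(name) == truth:
                correct[name] += 1

    # Per-field accuracy makes it obvious where the tool struggles
    # (e.g. governing law may be easy, liability caps hard).
    return {name: correct[name] / total[name] for name in total}
```

Keeping the comparison this blunt, exact match per field, is deliberate for a first pass; you can loosen it to normalized or fuzzy matching once you understand the failure modes.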
Keep humans in the loop
Automation works best for low-risk, repetitive patterns. For higher-risk suggestions, human review should remain part of the process. Even with more autonomous "agentic" tools, oversight in legal workflows is still necessary.
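As a sketch of what that division of labor can look like in practice, here is one possible routing policy in Python. The risk levels and confidence thresholds are invented for illustration; in reality they would come from your own playbook and your golden-set results.

```python
from dataclasses import dataclass

@dataclass
class Suggestion:
    clause: str
    proposed_change: str
    risk_level: str    # "low", "medium", "high" per your playbook
    confidence: float  # model confidence, 0.0 to 1.0

def route(s: Suggestion) -> str:
    """Decide how a single AI suggestion is handled. Thresholds are
    illustrative; tune them against your own pilot results."""
    if s.risk_level == "low" and s.confidence >= 0.95:
        return "auto-apply"      # repetitive, low-stakes patterns
    if s.risk_level == "high":
        return "lawyer-review"   # always a human decision
    return "review-queue"        # everything in between

print(route(Suggestion("renewal notice", "update address", "low", 0.98)))
# -> auto-apply
```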
Bottom line
AI is set to play an important role in contract lifecycle management. The teams that benefit most will be the ones that combine the strengths of AI with solid governance practices, careful testing on their own documents, and ongoing human oversight. This approach turns polished demos into lasting improvements in contract cycle times, while keeping control over risk and compliance. For Precisely's own approach to responsible AI innovation, see Smarter Contracting with AI: Inside Precisely's Approach to Responsible Innovation.