We’ve processed thousands of commercial leases and related contracts through our abstraction pipeline: office leases, FM contracts, and service agreements, across dozens of countries and languages. The technology works. But the interesting part isn’t the technology itself. It’s what happens to portfolio operations when abstraction becomes instant, structured, and continuous instead of slow, manual, and periodic.

This article is about the hard problems we’ve solved and what changes when you have structured lease data available in minutes instead of weeks.

The amendment problem

The single hardest challenge in contract abstraction isn’t reading the head lease. It’s reading the head lease plus three amendments plus a side letter plus a deed of variation — and correctly determining which terms are still in force.

Consider a common scenario: a 2018 head lease specifies base rent of £450,000 per annum. Amendment 1 in 2020 adjusts the rent to £420,000 as a COVID concession. Amendment 2 in 2022 reverts to the original indexation schedule but references a new base. A side letter modifies the break clause notice period from 12 months to 9 months. What is the current rent? What is the effective break date?

Most legacy abstraction approaches — including many that claim to use AI — process each document independently. They’ll extract £450,000 from the head lease and £420,000 from the amendment as separate data points, leaving a human to reconcile which is current. This defeats the entire purpose.

Our pipeline reads the full document set as a single context. The model understands that Amendment 2 supersedes Amendment 1, which in turn modifies the head lease. It resolves the chain and extracts the current effective terms, not the historical terms from each individual document. This is the difference between extraction and abstraction. Extraction pulls text. Abstraction understands the legal state of the contract.
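To make the idea concrete, the resolved state can be pictured as a fold over the document set, with each later document overriding only the fields it touches. The sketch below is illustrative and is not how the pipeline is implemented (the model resolves the chain from the full text). `DocumentTerms`, the field names, the exact dates, and the content of Amendment 2 are all assumptions, since the example above does not state them.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class DocumentTerms:
    """Terms that one document in the chain sets or overrides (hypothetical shape)."""
    name: str
    effective: date
    changes: dict = field(default_factory=dict)


def resolve_effective_terms(documents):
    """Apply documents in order of effective date; later documents win.

    Returns (terms, provenance), where provenance maps each field to the
    name of the document that last set it.
    """
    terms, provenance = {}, {}
    for doc in sorted(documents, key=lambda d: d.effective):
        for key, value in doc.changes.items():
            terms[key] = value
            provenance[key] = doc.name
    return terms, provenance


# Deliberately listed out of order: upload order should not matter.
documents = [
    DocumentTerms("Amendment 1", date(2020, 6, 1), {"base_rent_gbp": 420_000}),
    DocumentTerms("Head lease", date(2018, 1, 1), {
        "base_rent_gbp": 450_000,
        "break_notice_months": 12,
        "indexation_base_year": 2018,   # assumed field and value
    }),
    # Assumed content: the example does not say what the new base is.
    DocumentTerms("Amendment 2", date(2022, 6, 1), {"indexation_base_year": 2022}),
    # Assumed date: the example does not date the side letter.
    DocumentTerms("Side letter", date(2022, 9, 1), {"break_notice_months": 9}),
]

terms, provenance = resolve_effective_terms(documents)
for key in sorted(terms):
    print(f"{key}: {terms[key]} (set by {provenance[key]})")
```

Real amendment chains are rarely this clean. A later document may vary only part of a clause, or restate a term conditionally, which is why a per-document key lookup is not enough and the full set has to be read together.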

30+ fields, type-safe, in minutes

Every lease we abstract produces a structured object with 30+ fields specific to the contract type. An office lease extracts differently from an FM contract. A service agreement has different relevant terms than a warehouse lease. The schema adapts to the document.

The output isn’t free text that needs further parsing. Every field is typed and validated against its schema: rent is a number with a currency, the commencement date is an ISO date, break options are an array of objects each with a date, notice period, and conditions. This means the data is immediately usable by every downstream system — portfolio analytics, IFRS 16 calculations, obligation calendars, cost benchmarking — without a human reformatting step.
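As an illustration of what "typed and validated" means in practice, here is a minimal sketch of a small slice of such an object using Python dataclasses. The class names, field names, and sample values are assumptions for illustration, not our production schema, and only three of the 30+ fields are shown.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class Money:
    amount: Decimal
    currency: str  # three-letter code such as "GBP"

    def __post_init__(self):
        if self.amount < 0:
            raise ValueError("amount must be non-negative")
        code = self.currency
        if len(code) != 3 or not code.isalpha() or not code.isupper():
            raise ValueError(f"not a three-letter currency code: {code!r}")


@dataclass(frozen=True)
class BreakOption:
    break_date: date
    notice_period_months: int
    conditions: tuple = ()

    def __post_init__(self):
        if self.notice_period_months < 0:
            raise ValueError("notice period must be non-negative")


@dataclass(frozen=True)
class OfficeLeaseAbstract:
    base_rent: Money
    commencement_date: date
    break_options: tuple = ()

    @classmethod
    def from_raw(cls, raw: dict) -> "OfficeLeaseAbstract":
        """Parse a raw dict (for example decoded JSON); raises on invalid input."""
        rent = raw["base_rent"]
        return cls(
            base_rent=Money(Decimal(str(rent["amount"])), rent["currency"]),
            commencement_date=date.fromisoformat(raw["commencement_date"]),
            break_options=tuple(
                BreakOption(
                    break_date=date.fromisoformat(b["date"]),
                    notice_period_months=int(b["notice_period_months"]),
                    conditions=tuple(b.get("conditions", ())),
                )
                for b in raw.get("break_options", ())
            ),
        )


# Hypothetical sample values.
raw = {
    "base_rent": {"amount": 420000, "currency": "GBP"},
    "commencement_date": "2018-03-01",
    "break_options": [
        {"date": "2025-03-01", "notice_period_months": 9,
         "conditions": ["rent paid up to date"]},
    ],
}
lease = OfficeLeaseAbstract.from_raw(raw)
print(lease.base_rent, lease.commencement_date, lease.break_options[0].break_date)
```

Because parsing fails loudly on a malformed date or currency, a downstream system can rely on the types instead of re-checking every value.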

The fields stream in real time as the model processes the document. You watch the abstraction happen: parties appearing, then rent, then dates, then break clauses, then obligations. If something looks wrong in the first 10 seconds, you can see it immediately rather than waiting for a batch process to complete.

Multi-language without translation

A European portfolio might contain leases in English, German, Swedish, French, Dutch, and Spanish — sometimes within the same company. The traditional approach was to translate everything to English first, then abstract the translation. This adds cost, introduces a new error source (the translator may not understand CRE terminology), and creates a delay.

Our model reads leases in their original language and extracts structured data in English. A German Mietvertrag and a Swedish hyresavtal produce the same structured output format as an English lease. The extraction quality is equivalent to that for English-language leases because the underlying model has native comprehension of these languages: it isn’t translating and then extracting; it’s understanding the document directly.

This matters operationally. When a portfolio manager uploads 40 leases from a Nordic acquisition, they don’t need to sort them by language or engage a translation service first. They upload everything and get structured data back for the full set.

Confidence scoring changes the workflow

No abstraction system is perfect. We’re transparent about that because it’s the honest position — and because pretending otherwise leads to worse outcomes than acknowledging uncertainty explicitly.

Every field we extract carries a confidence score. High confidence means the term was clearly stated, unambiguous, and consistent across all documents. Low confidence means the model found conflicting information, ambiguous language, or had to infer a value from context rather than finding it stated explicitly.

This transforms the review workflow. Instead of a human checking every field on every lease — which is what "manual abstraction with AI assistance" really means at most competitors — the reviewer focuses exclusively on low-confidence fields. For a typical commercial lease, 85-90% of fields extract at high confidence. The reviewer looks at the remaining 10-15%.

The 200-lease portfolio that used to take 3 months of outsourced legal review now takes an afternoon of targeted review on the fields that actually need human judgement.

The confidence score, together with the source reference behind it, also serves as an audit trail. When a CFO asks "how do we know this break date is correct?", you can show that the system extracted it at high confidence from clause 8.3 of the head lease, confirmed by the unchanged reference in Amendment 2. That’s a better audit trail than "someone on the team read the lease and typed it into a spreadsheet."
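The review workflow described above amounts to filtering on confidence and keeping the source reference next to each value. The following is a minimal sketch under stated assumptions: a numeric score between 0 and 1, a cut-off of 0.85, and the `ExtractedField` shape are all hypothetical, as are the sample values and every clause reference except clause 8.3.

```python
from dataclasses import dataclass

REVIEW_THRESHOLD = 0.85  # assumed cut-off between "high" and "low" confidence


@dataclass(frozen=True)
class ExtractedField:
    name: str
    value: object
    confidence: float  # assumed scale: 0.0 (no confidence) to 1.0 (certain)
    source: str        # where in the document set the value was found


def review_queue(fields, threshold=REVIEW_THRESHOLD):
    """Return the fields a reviewer should check, least confident first."""
    flagged = [f for f in fields if f.confidence < threshold]
    return sorted(flagged, key=lambda f: f.confidence)


# Hypothetical extraction result for one lease.
fields = [
    ExtractedField("base_rent", "GBP 420,000", 0.97, "Amendment 1, clause 2"),
    ExtractedField("break_date", "2025-03-01", 0.95, "Head lease, clause 8.3"),
    ExtractedField("break_notice_months", 9, 0.62, "Side letter, paragraph 3"),
    ExtractedField("indexation_base", "2022", 0.48, "Amendment 2, clause 4"),
]

for f in review_queue(fields):
    print(f"REVIEW {f.name}: {f.value!r} ({f.confidence:.2f}) from {f.source}")
```

Keeping `source` on every field, including those that never reach the queue, is what lets the same record answer the audit question later.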

Abstraction is the foundation, not the product

Here’s what most abstraction-only tools miss: the value of structured lease data is not the data itself. It’s what the data enables when it’s connected to everything else.

When abstraction feeds directly into a live portfolio system, you get:

  • Obligation calendars that populate automatically — every break clause, every notice period, every rent review date extracted from the lease and visible on a unified timeline the moment the contract is uploaded
  • Cost benchmarking from day one — the extracted rent, service charge, and area data feed directly into portfolio-wide cost/sqm analysis, compared against market benchmarks, without manual data entry
  • IFRS 16 schedules that generate themselves — lease term, rent, indexation, and discount rate feed into right-of-use asset and liability calculations automatically
  • AI insights that reference the actual contract: "Your Stockholm lease has a break option in 280 days with a 9-month notice period, so the deadline to serve notice falls next week" is only possible when the AI has structured access to the abstracted terms
  • Portfolio intelligence that improves with every upload — each new lease enriches the dataset that powers anomaly detection, risk scoring, and strategic recommendations
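The obligation calendar and the break-option reminder in the list above both rest on simple date arithmetic once the break date and notice period are structured. Here is a minimal sketch, assuming the notice period is counted back in whole calendar months from the break date. How notice is actually counted depends on the lease drafting and jurisdiction, and the function names are hypothetical.

```python
import calendar
from datetime import date


def subtract_months(d: date, months: int) -> date:
    """Return the date `months` calendar months before `d`.

    The day is clamped to the end of the target month when needed,
    so 31 May minus 3 months gives the last day of February.
    """
    index = d.year * 12 + (d.month - 1) - months
    year, month_zero_based = divmod(index, 12)
    month = month_zero_based + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def notice_deadline(break_date: date, notice_months: int) -> date:
    """Latest date on which break notice can be served (simplified)."""
    return subtract_months(break_date, notice_months)


def days_until(target: date, today: date) -> int:
    """Days from `today` to `target`; negative if the date has passed."""
    return (target - today).days


# Hypothetical lease: break on 1 March 2026 with 9 months' notice.
break_date = date(2026, 3, 1)
deadline = notice_deadline(break_date, 9)
today = date(2025, 5, 25)  # fixed for the example instead of date.today()
print(f"Serve notice by {deadline}, in {days_until(deadline, today)} days")
```

The reminder has to key off the notice deadline rather than the break date itself: a break option that is still months away may already be impossible to exercise.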

Abstraction in isolation produces a spreadsheet. Abstraction as the entry point to a portfolio intelligence system produces a continuously updating, always-current view of your entire lease liability — with every critical date tracked, every cost benchmarked, and every risk flagged.
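To make the IFRS 16 item in the list above concrete, the core of the initial lease liability is a present value of the remaining payments. The sketch below is a simplification under stated assumptions: annual payments at the end of each year, a single discount rate, and no initial direct costs, incentives, or prepayments. The function names, the 5% rate, and the five-year term are hypothetical, and real schedules need the payment timing and adjustments the standard requires.

```python
def lease_liability(payments, annual_discount_rate):
    """Present value of payments made at the end of years 1..n."""
    return sum(
        payment / (1 + annual_discount_rate) ** year
        for year, payment in enumerate(payments, start=1)
    )


def liability_schedule(payments, annual_discount_rate):
    """Unwind the liability year by year.

    Each row is (year, opening, interest, payment, closing), where
    closing = opening + interest - payment.
    """
    rows = []
    opening = lease_liability(payments, annual_discount_rate)
    for year, payment in enumerate(payments, start=1):
        interest = opening * annual_discount_rate
        closing = opening + interest - payment
        rows.append((year, opening, interest, payment, closing))
        opening = closing
    return rows


# Hypothetical inputs: five remaining years at the concession rent,
# discounted at an assumed 5% per annum.
payments = [420_000] * 5
for year, opening, interest, payment, closing in liability_schedule(payments, 0.05):
    print(f"Year {year}: opening {opening:,.0f}, interest {interest:,.0f}, "
          f"payment {payment:,.0f}, closing {closing:,.0f}")
```

The closing balance after the final payment should come out at zero (up to floating-point rounding), which is a useful sanity check on any generated schedule.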

What changes for the RE team

The operational impact is concrete. Teams that adopt AI abstraction as part of an integrated platform report three consistent shifts:

First: the data gap closes. Leases that sat unread in shared drives for years are suddenly structured, searchable, and analysable. Portfolio managers discover break clauses they didn’t know existed, obligations they weren’t tracking, and rent review mechanisms they hadn’t factored into their budget.

Second: reporting becomes real-time. The quarterly scramble to assemble a board pack from fragmented sources disappears when the underlying data is structured and continuously updated. The report generates itself because the data is already there.

Third: the team’s role shifts. Less time extracting and reconciling data. More time on strategy, negotiation, and portfolio optimisation — the work that actually requires human judgement and domain expertise.


Contract abstraction is, for practical purposes, a solved problem. The question for RE teams isn’t whether AI can read their leases: it can, accurately, in their original language, in a few minutes, with the uncertain fields flagged for review. The question is whether the abstracted data sits in a spreadsheet or flows into a system that acts on it.