Updating our mapping of GLEIF data to BODS to better capture the lifecycle of LEI data changes
Understanding and capturing details about how ownership data changes over time is crucial to making sense of up-to-date and historical records.
As part of the release of version 0.4 of the Beneficial Ownership Data Standard (BODS), Open Ownership has worked on a new feature to help implementers represent change over time in ownership data.
To support learning and inform the development of BODS, our data analysts have revisited work we unveiled last year to map and republish corporate ownership data from the Global Legal Entity Identifier Foundation (GLEIF) on millions of entities which have a Legal Entity Identifier (LEI).
After building a more detailed understanding of the full lifecycle of an LEI and all the ways this data can change, we have:
- improved the process by which GLEIF’s data is mapped to version 0.2 of BODS;
- updated our regularly refreshed open dataset to make it more detailed;
- validated the change-over-time improvements we have made in version 0.4 of BODS.
We have also written a data use guide to explain our new process and to better support those who want to reuse the updated dataset.
Background
GLEIF is tasked with supporting the implementation and use of the LEI. It was set up by the Financial Stability Board and the G20 in the wake of the 2008 worldwide financial crash to develop a universal identifier that can be applied to any legal entity that engages in financial transactions.
GLEIF enables “smarter, less costly and more reliable decisions about who to do business with” by providing “open, standardised and high quality legal entity reference data” to uniquely identify companies. This open data includes its Level 2 Data on who owns whom.
Understanding ownership structures is a key part of Open Ownership’s mission to drive the global shift towards transparency over who owns and controls corporate vehicles. We focus on beneficial ownership transparency, which shows how natural persons own or control companies and other legal entities or arrangements, such as trusts.
The corporate ownership data collected and published openly by GLEIF is not beneficial ownership data, as it does not identify the natural persons at the end of ownership chains. Even so, this data can give crucial insights into corporate ownership structures by offering detailed information on parent and child entity relationships and by making it easier to map entities to other datasets, such as sanctions lists, watchlists, and lists of politically exposed persons.
Lessons learned
To understand how LEI records change over time, there are two critical information sources:
- GLEIF Golden Copy and Delta Files Specification / User Manual (contains information about how the golden copy and delta files can be accessed and used)
- State Transition and Validation Rules for Common Data File formats (contains detailed information mainly aimed at publishers of LEI data)
Below are the lessons that the Open Ownership data and technology team has learned from using this information to enhance our mapping of GLEIF data to BODS:
Understanding the full lifecycle of data and explaining changes
Facilitating testing by enhancing BODS data with annotations
While testing our lifecycle mapping for the GLEIF data, it became clear that manually checking the data was difficult. Establishing the context of a particular statement, and verifying that it was correct, often required tracing through chains of multiple statements. This is very cumbersome when doing manual spot checks, and in some cases it is hard to tell why a particular BODS statement exists (i.e. which GLEIF record caused it to be created).
BODS provides a mechanism for adding extra information to statements through annotations. This feature allows arbitrary information to be included in a statement, and it even has a “pointer” to indicate what part of the statement the annotation refers to.
We found that including annotations which provided the general context of the statement, and in particular which GLEIF record caused the statement to be created, was invaluable when conducting manual checks on the mapped GLEIF data. It should be noted that if the annotations are sufficiently regular and designed well, they can also be useful in automated checks of the data.
In our use case, we found that the most crucial annotations were to document:
- why an ownershipOrControlStatement had been created, e.g.:
"annotations": [{"motivation": "commenting", "description": "Describes GLEIF relationship: 549300EBO1XL27SM6630 is subject, W0CZ7N0GH8UIGXDM1H41 is interested party", "statementPointerTarget": "/", "creationDate": "2024-06-03", "createdBy": {"name": "Open Ownership", "uri": "https://www.openownership.org"}}]
- why an unknown entityStatement or personStatement had been created, e.g.:
"annotations": [{"motivation": "commenting", "description": "This statement was created due to a NO_KNOWN_PERSON GLEIF Reporting Exception for 984500LB75C39960N129", "statementPointerTarget": "/", "creationDate": "2024-06-03", "createdBy": {"name": "Open Ownership", "uri": "https://www.openownership.org"}}]
- and, for completeness, why a known entityStatement had been created, e.g.:
"annotations": [{"motivation": "commenting", "description": "GLEIF data for this entity - LEI: 259400WW62FRVH552J81; Registration Status: ISSUED", "statementPointerTarget": "/", "creationDate": "2024-06-03", "createdBy": {"name": "Open Ownership", "uri": "https://www.openownership.org"}}]
A lesson here is that open data is likely to be used in many different ways, including automated processing as well as manual examination. The bulk of BODS is focused on providing structured data which can be processed by a variety of automated tools and processes, but the annotations feature can provide a useful mechanism for enhancing the quality or usefulness of the data, particularly when direct human interaction with the data is necessary.
We would recommend that publishers of BODS data carefully consider the ways in which the data could be used, and design an annotation scheme with these in mind. In particular, publishers should find that annotations can be extremely useful for facilitating their own quality-control processes.
That said, annotations are a highly flexible feature of the data standard, and there are many other potential use cases beyond helping to test the data, such as including data for which there are no specific fields in the standard, or providing metadata on how the data has been processed or transformed.
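As an illustration of how regular annotations can support automated checks, here is a minimal Python sketch that classifies an annotation description and extracts the LEIs it mentions. The regular expressions are derived only from the three example descriptions shown earlier in this section; the names `PATTERNS` and `classify_annotation` are hypothetical and are not part of BODS or of Open Ownership's published tooling.

```python
import re

# An LEI is a 20-character alphanumeric code.
LEI = r"[A-Z0-9]{20}"

# Assumption: descriptions follow exactly the wording of the examples above.
PATTERNS = {
    "relationship": re.compile(
        rf"^Describes GLEIF relationship: (?P<subject>{LEI}) is subject, "
        rf"(?P<interested_party>{LEI}) is interested party$"
    ),
    "exception": re.compile(
        rf"^This statement was created due to a (?P<exception>[A-Z_]+) "
        rf"GLEIF Reporting Exception for (?P<lei>{LEI})$"
    ),
    "entity": re.compile(
        rf"^GLEIF data for this entity - LEI: (?P<lei>{LEI}); "
        rf"Registration Status: (?P<status>[A-Z_]+)$"
    ),
}


def classify_annotation(description):
    """Return (kind, fields) for a recognised description, else (None, {})."""
    for kind, pattern in PATTERNS.items():
        match = pattern.match(description)
        if match:
            return kind, match.groupdict()
    return None, {}


kind, fields = classify_annotation(
    "Describes GLEIF relationship: 549300EBO1XL27SM6630 is subject, "
    "W0CZ7N0GH8UIGXDM1H41 is interested party"
)
print(kind, fields["subject"], fields["interested_party"])
```

A check built on this could, for instance, confirm that the LEIs named in a relationship annotation match the identifiers of the entity statements the relationship points to.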
Explaining the use of BODS annotations
Annotations have been used in entity statements to identify the registration status of an LEI, as there is no obvious field for this in BODS. Annotations can point to a specific field or an entire statement. In this case, the annotation points to the entire statement:
"annotations": [{"motivation": "commenting", "description": "GLEIF data for this entity - LEI: 259400WW62FRVH552J81; Registration Status: ISSUED", "statementPointerTarget": "/", "creationDate": "2024-06-03", "createdBy": {"name": "Open Ownership", "uri": "https://www.openownership.org"}}]
Annotations have also been used to refer back to the source data.
For example, a NATURAL_PERSONS reporting exception maps to a person statement representing an anonymous person and an ownershipOrControlStatement linking that anonymous person to the entity. Both the person statement and the ownershipOrControlStatement will have an annotation saying “This statement was created due to a NATURAL_PERSONS GLEIF Reporting Exception for [LEI]”.
A relationship reported by GLEIF maps to an ownershipOrControlStatement. This statement will have an annotation saying “Describes GLEIF relationship: [LEI] is subject, [LEI] is interested party”. This makes it easier to check what the ownershipOrControlStatement is describing without needing to look up the entity statements for these entities, where the LEI is stored under identifiers.
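The annotation shapes described in this section could be generated with a few small helpers. The sketch below reproduces the fields seen in the JSON examples in this post; the function names are hypothetical, and this is not presented as the code Open Ownership actually uses.

```python
from datetime import date


def make_annotation(description, created=None):
    """Build a whole-statement annotation with the fields used in the examples."""
    return {
        "motivation": "commenting",
        "description": description,
        # "/" points at the entire statement rather than a single field.
        "statementPointerTarget": "/",
        "creationDate": (created or date.today()).isoformat(),
        "createdBy": {
            "name": "Open Ownership",
            "uri": "https://www.openownership.org",
        },
    }


def relationship_annotation(subject_lei, interested_party_lei, created=None):
    """Annotation for an ownershipOrControlStatement derived from a relationship."""
    return make_annotation(
        f"Describes GLEIF relationship: {subject_lei} is subject, "
        f"{interested_party_lei} is interested party",
        created,
    )


def exception_annotation(exception_type, lei, created=None):
    """Annotation for statements derived from a GLEIF reporting exception."""
    return make_annotation(
        f"This statement was created due to a {exception_type} "
        f"GLEIF Reporting Exception for {lei}",
        created,
    )


annotation = relationship_annotation(
    "549300EBO1XL27SM6630", "W0CZ7N0GH8UIGXDM1H41", date(2024, 6, 3)
)
print(annotation["description"])
```

Generating every description from one template per case keeps the wording regular, which is what makes the annotations usable in automated checks as well as manual ones.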
Limitations of BODS version 0.2
This piece of work has reinforced the need for some of the changes we have made to BODS between versions 0.2 and 0.4.
The handling of information updates in BODS version 0.2 is ambiguous and inefficient:
- When a statement is replaced, the new statement references the old statement in the “replacesStatements” array. However, when that new statement is itself replaced, users have a choice as to whether the “replacesStatements” array should include only the most recent statement or all of its predecessors as well. In our mapping, replacesStatements only references the most recent statement being replaced. In BODS version 0.4, the replacesStatements field has been removed. Instead, all of these statements would be grouped under a single recordId.
- When a relationship ends, for example when an entity has a new fund manager, users could issue a new “voiding statement” using interest.endDate to indicate an interest has ended, or they could simply issue a new ownershipOrControlStatement with a new interested party and reference the previous statement in the replacesStatements array. In our mapping, we chose to use voiding statements in these cases. In BODS version 0.4, we have improved change-over-time modelling. In this case, a new statement would be issued with recordStatus closed to represent the end of the previous relationship, and a new relationship statement with its own recordId and recordStatus would be issued to represent the new relationship.
- When an updated entity statement is issued for a routine matter (such as renewing an LEI), even though nothing has materially changed, any ownership or control statements linked to that entity statement must also be re-issued to reference the latest statementId for that entity. In BODS version 0.4, this has been resolved by separating record IDs and statement IDs. An entity will always have the same record ID, and this is what is referenced in the relationship statement.
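To make the difference concrete, the following Python sketch reconstructs the history of a record under each approach. It assumes the field names `statementID` and `replacesStatements` for version 0.2 and `statementId`, `recordId`, `recordStatus`, and `statementDate` for version 0.4; check these against the published schemas before relying on them. The function names are hypothetical.

```python
from collections import defaultdict


def history_v02(statements, latest_id):
    """Walk replacesStatements links back from the latest statement (newest first).

    Assumes, as in the mapping described above, that each statement lists
    only the single most recent statement it replaces.
    """
    by_id = {s["statementID"]: s for s in statements}
    chain, seen, current = [], set(), latest_id
    while current in by_id and current not in seen:
        seen.add(current)  # guard against accidental cycles
        chain.append(current)
        replaced = by_id[current].get("replacesStatements", [])
        current = replaced[0] if replaced else None
    return chain


def history_v04(statements):
    """Group statements by recordId, oldest first within each record."""
    grouped = defaultdict(list)
    for statement in statements:
        grouped[statement["recordId"]].append(statement)
    for versions in grouped.values():
        # ISO 8601 date strings sort chronologically as plain strings.
        versions.sort(key=lambda s: s["statementDate"])
    return dict(grouped)


v02 = [
    {"statementID": "a"},
    {"statementID": "b", "replacesStatements": ["a"]},
    {"statementID": "c", "replacesStatements": ["b"]},
]
print(history_v02(v02, "c"))
```

In the version 0.2 case the full history only emerges by following links one statement at a time, whereas in the version 0.4 case a single grouping pass is enough.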
Challenges of handling GLEIF reporting exceptions
Deciding how to handle reporting exceptions in GLEIF also posed challenges when developing this mapping.
The exceptions handled are:
- NO_LEI – the parent entity does not have an LEI
- NATURAL_PERSONS – the entity is controlled by a person
- NON_CONSOLIDATING – the entity is controlled by entities not subject to consolidation
- NO_KNOWN_PERSON – the entity is controlled by unknown persons, e.g. controlled by diverse shareholders
In BODS, these cases would be handled using the same statement types as any other representation – a person statement can be used to represent an anonymous person, for example.
This part of the mapping was challenging, as we needed to consolidate information from all data sets. For example, if a NO_LEI exception is reported but a relationship is then reported later on, the two must be cross-referenced so that the ownershipOrControlStatement can reference the statement previously created for the exception in replacesStatements.
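One way to picture this cross-referencing is as a lookup of exception-derived statements that have not yet been superseded. The Python sketch below is illustrative only: the `open_exceptions` index, the `category` label, and the statement IDs are hypothetical, and it is not a description of how the Open Ownership pipeline stores this information.

```python
def link_to_prior_exception(relationship_statement, child_lei, category, open_exceptions):
    """Attach the ID of an earlier exception-derived statement, if one exists.

    open_exceptions maps (child LEI, category) to the statementID of a
    statement created from a reporting exception that is still current.
    """
    previous_id = open_exceptions.pop((child_lei, category), None)
    if previous_id is not None:
        relationship_statement.setdefault("replacesStatements", []).append(previous_id)
    return relationship_statement


# Hypothetical index built while processing reporting exceptions.
open_exceptions = {("984500LB75C39960N129", "direct"): "stmt-exception-1"}

statement = link_to_prior_exception(
    {"statementID": "stmt-relationship-1", "statementType": "ownershipOrControlStatement"},
    "984500LB75C39960N129",
    "direct",
    open_exceptions,
)
print(statement["replacesStatements"])
```

Removing the entry from the index once it has been linked means a later relationship for the same entity will not reference the same exception statement twice.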
Importance of good documentation
GLEIF’s thorough documentation made this mapping process much easier. GLEIF has provided clear and easy-to-understand schemas for its Common Data Formats. Detailed definitions of each field and relevant codelists are available in a user-friendly format on the GLEIF website: https://www.gleif.org/en/about-lei/common-data-file-format.
To follow this precedent, Open Ownership has also written a data use guide to explain how we have mapped LEI changes to BODS as part of our new process and to better support those who want to reuse our updated open dataset.
Make data available in multiple formats to support user needs
GLEIF publishes golden copies of its data three times per day. These files are a complete set of up-to-date data showing the status of LEIs and their relationships at the time of publishing.
GLEIF also publishes delta files, which represent changes in the data. Delta files are available at intervals of eight hours, 24 hours, seven days, and one month.
In our BODS mapping, we have used the golden copy file from 6th June 2024 as our starting point. We have then used the monthly (31-day) delta files to map change over time. It is much more efficient to process the delta files than to look for changes across golden copies, which are much larger (several gigabytes each).
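The general pattern of starting from a full snapshot and replaying deltas can be sketched in Python as follows. This is a simplified illustration under stated assumptions: records are plain dictionaries keyed by an `LEI` field, each delta entry carries the full latest version of a changed record, and the short placeholder identifiers and the `status` field are made up for the example. The real GLEIF file layout is described in the Golden Copy and Delta Files Specification.

```python
def apply_delta(snapshot, delta_records):
    """Apply one delta to the snapshot and return the LEIs that changed."""
    changed = []
    for record in delta_records:
        lei = record["LEI"]
        # Only records that differ from the current state need new statements.
        if snapshot.get(lei) != record:
            snapshot[lei] = record
            changed.append(lei)
    return changed


def replay(golden_copy_records, deltas):
    """Build a snapshot from the golden copy, then apply deltas in order."""
    snapshot = {record["LEI"]: record for record in golden_copy_records}
    history = [apply_delta(snapshot, delta) for delta in deltas]
    return snapshot, history


# Placeholder data: "A", "B", and "C" stand in for real 20-character LEIs.
golden = [{"LEI": "A", "status": "ISSUED"}, {"LEI": "B", "status": "ISSUED"}]
deltas = [
    [{"LEI": "B", "status": "LAPSED"}],
    [{"LEI": "C", "status": "ISSUED"}, {"LEI": "A", "status": "ISSUED"}],
]
snapshot, history = replay(golden, deltas)
print(history)
```

Each list in `history` names the records touched by one delta, which is the set a mapper would need to turn into new or updated statements.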
GLEIF publishes its data as XML, JSON, and CSV files, and data is available via the GLEIF application programming interface. We have used the XML files.
Reuse the data
On GitHub, you can see how Open Ownership has updated the way we ingest, map, and transform this data in line with BODS. The resulting dataset is published under the Creative Commons CC0 1.0 Universal (CC0 1.0) Public Domain Dedication licence and will be refreshed on a regular basis.
This work is part of Open Ownership’s strategy to ensure more and higher-quality data on the ownership and control of corporate vehicles is available to stakeholders in governments, companies, and civil society.
Publication type
Blog post
Topics
Beneficial Ownership Data Standard
Sections
Technology
Open Ownership Principles
Up-to-date and historical records,
Structured data