Standardised Metadata Model

What Is This?

This is a metadata model: a blueprint for organising information about your data. Think of it like a library catalogue. The catalogue doesn't contain the books themselves, but it tells you what books exist, where they are, who wrote them, and how they're related.

Why Does This Matter?

Finding Data

In large organisations, people spend hours searching for the right data. A metadata model helps answer: "Where is customer data stored?" or "What does this column actually mean?"

Understanding Data

Numbers in a spreadsheet are meaningless without context. Metadata provides definitions, examples, and business meaning, turning raw data into usable information.

Trusting Data

Who owns this data? Where did it come from? Is it accurate? Metadata tracks lineage and ownership, so you know whether to trust what you're looking at.

Governing Data

Regulations require knowing where personal data lives and who can access it. Metadata enables compliance by tracking sensitivity, policies, and access controls.

The Core Concepts

Assets

The "things" being catalogued: tables, columns, files, business terms, people. Anything you need to track. This model defines 19 asset types across 6 layers.

Relationships

How assets connect. A column belongs to a table. A person owns a database. Data flows from one table to another. These connections create the knowledge graph.

Attributes

Properties of assets. A column has a data type. A table has a sensitivity level. A business term has a definition. Attributes add the descriptive detail.

Layers

Logical groupings of related assets. Physical layer for databases and files. Business layer for terms and policies. Ownership layer for people and roles. Each layer serves a distinct purpose.
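
As a rough illustration, the four concepts above could be represented as follows. This is a minimal sketch, not part of the model: the class names `Asset` and `Relationship`, the field names, the relationship label `belongs_to`, and the helper `neighbours` are all assumptions chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    """One catalogued thing, e.g. a table or a column."""
    name: str
    asset_type: str          # e.g. "TABLE", "COLUMN"
    layer: str               # e.g. "Physical", "Business"
    attributes: dict = field(default_factory=dict)


@dataclass(frozen=True)
class Relationship:
    """A directed connection between two assets (labels are illustrative)."""
    source: str
    kind: str
    target: str


orders = Asset("orders", "TABLE", "Physical", {"sensitivity": "internal"})
order_id = Asset("order_id", "COLUMN", "Physical", {"data_type": "integer"})
edges = [Relationship(order_id.name, "belongs_to", orders.name)]


def neighbours(asset_name: str, relationships: list[Relationship]) -> list[str]:
    """Return the names of assets directly connected to asset_name."""
    out = []
    for rel in relationships:
        if rel.source == asset_name:
            out.append(rel.target)
        elif rel.target == asset_name:
            out.append(rel.source)
    return out
```

Calling `neighbours("orders", edges)` walks the relationship list in both directions, which is the basic operation behind "click a node to see its connections".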

How To Use This Document

Use the tabs above to explore:

Overview: Quick statistics and the layer architecture at a glance
Layer View: Detailed diagrams showing how assets connect within each layer
Interactive Graph: Explore the full model visually; click nodes to see connections
Compliance Notes: How privacy, risk, and security attributes work
Scope: What's explicitly excluded from this model

Select any asset in the side panel to see its definition, relationships, and attributes.

Assets

Alternate Types

~50

Relationships

Layers

Layer Architecture

Construction Principles

Minimalism

As few assets, relationships, and attributes as possible. Connections strictly defined.

Flexibility

Asset types allow variants without new assets. Model adapts with minimal structural changes.

Sufficiency

All core metadata needs addressed: groupings, hierarchy, and lineage at multiple granularities.

Key Design Decisions

Edges as Assets

LINEAGE EDGE and ROLE are modelled as assets rather than simple relationships. This allows them to carry attributes (transformation details, ownership type, time validity) while functioning as edges in visualisations.
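
One way to picture an edge that is also an asset is sketched below. The class name `LineageEdge`, its field names, and the `as_edge` helper are hypothetical; the model only states that such an asset can carry transformation details and time validity.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class LineageEdge:
    """A lineage connection stored as an asset so it can carry attributes."""
    source: str                          # upstream asset name
    target: str                          # downstream asset name
    transformation: str = ""             # hypothetical attribute name
    valid_from: Optional[date] = None    # None means no lower bound (assumption)
    valid_to: Optional[date] = None      # None means no upper bound (assumption)

    def as_edge(self) -> tuple[str, str]:
        """Collapse to a plain (source, target) pair for drawing."""
        return (self.source, self.target)


edge = LineageEdge("raw.orders", "mart.orders", transformation="deduplicate")
```

A visualisation would use only `edge.as_edge()`, while a detail panel could still show `edge.transformation`.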

Dimensional Layer for Org Hierarchy

TAGs provide natural organisational inheritance (Global → Europe → UK). Policy scope and classification cascade down through the hierarchy.
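
The cascade can be sketched as a walk up a parent map. This is an illustration under assumptions: the `PARENT` dictionary, the function names, and the policy names used in the usage note are invented for the example.

```python
# child TAG -> parent TAG, using the hierarchy from the example above
PARENT = {"UK": "Europe", "Europe": "Global", "Global": None}


def ancestors(tag):
    """Return the tag followed by its ancestors, nearest first."""
    chain = []
    while tag is not None:
        chain.append(tag)
        tag = PARENT[tag]
    return chain


def policies_in_scope(tag, policies_by_tag):
    """Collect policies attached to the tag itself or to any ancestor."""
    found = []
    for t in ancestors(tag):
        found.extend(policies_by_tag.get(t, []))
    return found
```

With a policy attached to Global and another to Europe, `policies_in_scope("UK", ...)` returns both, because UK sits below each of them.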

Alternate Types

Asset Types allow variants without new assets. TABLE becomes Database View, FILE becomes API Endpoint. Model adapts with minimal structural changes.

XOR Constraints

COLUMN belongs to TABLE or REPORT (not both). BUSINESS RULE is grouped by POLICY or CONTRACT (not both). ROLE links to PERSON or GROUP (not both).
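
A validation check for these constraints might look like the sketch below. The function names are hypothetical, and the sketch assumes "exactly one" semantics, so a COLUMN with neither parent is rejected as well; the model text only rules out having both.

```python
def exactly_one(*parents) -> bool:
    """True when exactly one of the candidate parents is set."""
    return sum(p is not None for p in parents) == 1


def validate_column(table=None, report=None) -> None:
    """Raise ValueError unless the COLUMN has exactly one parent."""
    if not exactly_one(table, report):
        raise ValueError("COLUMN must belong to one TABLE or one REPORT, not both")
```

The same `exactly_one` helper would cover BUSINESS RULE (POLICY or CONTRACT) and ROLE (PERSON or GROUP).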

Bidirectional CDE Inheritance

Critical Data Element status propagates between BUSINESS TERM ↔ COLUMN, but only one hop. Inherited status doesn't propagate further.
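
The one-hop rule can be sketched as follows. The function name, the input shapes, and the sample asset names are assumptions made for the example; only the rule itself (native status propagates one hop, inherited status does not) comes from the model.

```python
def effective_cde(native_cde: set[str], links: list[tuple[str, str]]) -> dict[str, str]:
    """Map asset name -> "native" or "inherited".

    links are (business_term, column) pairs. Only natively flagged assets
    propagate, so inherited status never travels a second hop.
    """
    status = {name: "native" for name in native_cde}
    for term, column in links:
        # Try both directions: term -> column and column -> term.
        for src, dst in ((term, column), (column, term)):
            if src in native_cde and dst not in native_cde:
                status[dst] = "inherited"
    return status


links = [("Customer Email", "crm.email"), ("Contact Email", "crm.email")]
status = effective_cde({"Customer Email"}, links)
```

Here `crm.email` inherits CDE status from "Customer Email", but "Contact Email" does not pick it up from `crm.email`, because that column's status is only inherited.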

Physical Path Precedence

When ownership could inherit from both physical (DB→Schema→Table) and conceptual (Business Term) paths, physical wins by default.
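
As a small sketch of this default, assuming each path has already been resolved to a single candidate owner (the function and parameter names are hypothetical, and the `prefer_physical` switch is an assumption about how the default might be overridden):

```python
from typing import Optional


def resolve_owner(physical: Optional[str],
                  conceptual: Optional[str],
                  prefer_physical: bool = True) -> Optional[str]:
    """Pick an owner when both inheritance paths may supply one."""
    first, second = (physical, conceptual) if prefer_physical else (conceptual, physical)
    # Fall back to the other path only when the preferred one is unset.
    return first if first is not None else second
```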

Time as Visibility Filter

valid_from and valid_to filter what is visible; time is not treated as a spatial dimension. Nodes appear and disappear rather than move through time.
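
A visibility check along these lines might look like the sketch below. It assumes inclusive bounds and treats a missing bound as open-ended; neither detail is specified by the model, and the function name is hypothetical.

```python
from datetime import date
from typing import Optional


def is_visible(valid_from: Optional[date],
               valid_to: Optional[date],
               as_of: date) -> bool:
    """True if the asset is visible on as_of. None means open-ended."""
    if valid_from is not None and as_of < valid_from:
        return False
    if valid_to is not None and as_of > valid_to:
        return False
    return True
```

Rendering the graph for a given date then reduces to keeping the nodes for which `is_visible` returns true.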

Diamond Resolution by Timestamp

When same-named BUSINESS TERMs have equal hierarchical distance, earliest creation timestamp wins. Override available at any scope.
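
The tie-break can be sketched as a sort on two keys. The tuple layout, the function name, and the term identifiers are invented for the example, and the sketch assumes that a nearer term wins before timestamps are compared, which the text above implies but does not state.

```python
from datetime import datetime


def resolve_term(candidates, override=None):
    """Pick one BUSINESS TERM from same-named candidates.

    candidates: list of (term_id, hierarchical_distance, created_at).
    An explicit override, if given, always wins.
    """
    if override is not None:
        return override
    # Nearest first; among equals, the earliest creation timestamp wins.
    return min(candidates, key=lambda c: (c[1], c[2]))[0]


candidates = [
    ("bt-2", 1, datetime(2024, 3, 1)),
    ("bt-1", 1, datetime(2023, 1, 15)),
    ("bt-3", 2, datetime(2020, 1, 1)),
]
```

For these candidates the result is `bt-1`: it ties with `bt-2` on distance and was created earlier, while `bt-3` is older but further away.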

Dimensional Layer
Ownership Layer
Lineage Layer
Data Quality Layer
Business Layer
Physical Layer

Interactive Graph legend. Layers: Dimensional, Ownership, Lineage, Data Quality, Business, Physical. Edge types: Standard, XOR constraint, Universal (TAG/ROLE).

Compliance, Privacy, Risk, Security & Sensitivity

Compliance is applied to assets through POLICY and TAGs. The remaining four attributes (pii, risk, security, sensitivity) are applied directly to assets, and all four are optional.

PII

What the data is. Contains personally identifiable information, directly or in combination with other assets.

Boolean · false < true

Risk

What happens if things go wrong. Business impact if asset is compromised, lost, or misused.

Category · low < medium < high < critical

Security

How it must be protected. Protection controls required for the asset.

Category · low < medium < high < critical

Sensitivity

Who can access it. Classification level determining permitted access and distribution.

Category · public < internal < confidential < restricted

Applicable Assets

All four attributes apply to: DATABASE, SCHEMA, TABLE, COLUMN, DIRECTORY, FILE, REPORT and BUSINESS TERM.

Inheritance Rules

Downward Inheritance (Physical Path)

DATABASE → SCHEMA → TABLE → COLUMN
DIRECTORY → FILE
FILE → TABLE (if file contains table)
REPORT → COLUMN (report elements)

Child inherits parent's value unless explicitly set. Explicit value overrides inheritance.
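
Downward inheritance along the physical path can be sketched as a walk towards the root that stops at the first explicit value. The dictionaries, asset names, and function name below are assumptions for illustration only.

```python
# child -> parent along the physical path (names are illustrative)
PARENT_OF = {
    "crm.public.customers.email": "crm.public.customers",
    "crm.public.customers": "crm.public",
    "crm.public": "crm",
}

# explicitly set sensitivity values; anything absent is null
EXPLICIT = {
    "crm": "internal",
    "crm.public.customers.email": "restricted",
}


def effective_value(asset, explicit=EXPLICIT, parent_of=PARENT_OF):
    """Return the asset's own value, else the nearest ancestor's, else None."""
    current = asset
    while current is not None:
        value = explicit.get(current)
        if value is not None:
            return value          # an explicit value overrides inheritance
        current = parent_of.get(current)
    return None                   # nothing set anywhere on the path
```

The table `crm.public.customers` resolves to "internal" from the database, while its `email` column keeps its own explicit "restricted".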

Bidirectional Inheritance (BT ↔ COLUMN)

Follows CDE pattern: native values propagate one hop, inherited values do not propagate further.

BUSINESS TERM (native) → grouped COLUMNs (inherited)
COLUMN (native) → its BUSINESS TERM (inherited)

Conflict Resolution

When an asset could inherit from multiple paths (physical and BT), the higher value wins.

Multi-concept columns: highest value from any linked BT applies.

Null Semantics

Null inherits from parent if parent is set. Null with null parent remains null.

Null does not participate in "higher wins" logic — a set value always beats null.
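
The "higher wins" rule and the null handling can be sketched together. The function name and the use of a plain list for the ordering are assumptions; the ordering itself is the sensitivity scale defined earlier.

```python
SENSITIVITY_ORDER = ["public", "internal", "confidential", "restricted"]


def highest(values, order=SENSITIVITY_ORDER):
    """Return the highest set value, ignoring None; None if nothing is set."""
    present = [v for v in values if v is not None]
    if not present:
        return None
    # Position in the ordered list decides which value is higher.
    return max(present, key=order.index)
```

For a column that inherits "internal" from its table and "restricted" from a linked BUSINESS TERM, `highest(["internal", "restricted"])` yields "restricted"; a null candidate never displaces a set value.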

No Upward Inheritance

A TABLE does not inherit sensitivity from its COLUMNs. Upward roll-up is computed at query time for analytics.

Enables queries like "show tables containing restricted columns" without storing derived attributes.
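
A query-time roll-up of this kind could be sketched as below. The row layout, the sample data, and the function name are hypothetical; the point is that the table-level answer is computed on demand rather than stored.

```python
def tables_containing(columns, level):
    """Return sorted names of tables with at least one column at the level.

    columns: iterable of (table, column, sensitivity) rows.
    """
    return sorted({table for table, _column, sensitivity in columns
                   if sensitivity == level})


rows = [
    ("customers", "email", "restricted"),
    ("customers", "country", "internal"),
    ("orders", "total", "internal"),
]
```

`tables_containing(rows, "restricted")` returns only `customers`, without any sensitivity value being written back to the TABLE.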

Validation

All attribute values must be from the defined fixed set. This enables ordering for conflict resolution.

Mutability: An asset's value can be changed, but the set of allowed values is fixed (no custom values permitted).
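
A validation check against the fixed sets might look like this sketch. The `ALLOWED` dictionary and the function name are assumptions; the permitted values are the ones listed for each attribute above, and null is accepted because all four attributes are optional.

```python
ALLOWED = {
    "risk": ("low", "medium", "high", "critical"),
    "security": ("low", "medium", "high", "critical"),
    "sensitivity": ("public", "internal", "confidential", "restricted"),
}


def is_valid(attribute: str, value) -> bool:
    """Check a value against the fixed set for its attribute."""
    if value is None:
        return True                       # attributes are optional
    if attribute == "pii":
        return isinstance(value, bool)    # rejects 0/1 and strings
    return value in ALLOWED[attribute]
```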

Scope & Boundaries

This document defines the logical metadata model: the conceptual structure of assets, relationships, and attributes. It does not prescribe implementation details. The authoritative source is the markdown specification (standardised_metadata_model.md).

Explicitly Out of Scope

Issues

Issue tracking, bug reports, and problem resolution workflows are not modelled.

Permissions

Modelling permissions would require workflows and direct ties to real data. Access control implementation is a separate concern.

Workflows

Approval processes, state machines, and task orchestration are implementation details.

Operational Data

Runtime outputs such as DQ execution results, query logs, and system metrics are not part of the model.

Enforced Visibility

The model can mark an asset as 'confidential', but it cannot mask that asset from users who shouldn't see it. The model assumes transparency.

Productisation Requirements

To build a working metadata management product from this model, the following capabilities would need to be implemented:

Search & Discovery

Full-text search across all assets. Faceted filtering by layer, type, TAG, ownership. Fuzzy matching for typo tolerance. Relevance ranking.

Consider: Elasticsearch, Typesense, or similar

Ingestion Framework

Connectors for source systems (databases, BI tools, file systems). Schema extraction. Change detection. Bulk and incremental sync.

Patterns: Polling, CDC, push-based webhooks

User Management

Authentication (SSO/SAML/OIDC). User profiles linked to PERSON assets. Session management. API keys for programmatic access.

PERSON in model ≠ system user (but can be linked)

Change Tracking & Audit

Full history of all changes. Who changed what, when. Diff views. Rollback capability. Compliance audit trails.

Event sourcing or CDC on metadata store

Admin Console

DIMENSION/TAG hierarchy management. Asset type configuration. Validation rule customisation. Bulk operations. Import/export.

Governance team tooling

Lineage Visualisation

Interactive lineage graphs. Impact analysis (upstream/downstream). Column-level lineage rendering. Filtering by scope.

Core differentiator for data governance

Data Quality Integration

Rule execution engine or integration with external DQ tools. Result capture. Threshold alerting. Trend tracking.

Great Expectations, dbt tests, custom engines

Notifications & Workflows

Subscription to asset changes. Approval workflows for sensitive changes. Slack/email/webhook integrations.

Ownership change requests, deprecation notices

API Layer

REST or GraphQL endpoints for all operations. Bulk mutations. Streaming for large exports. SDK generation.

Enables ecosystem integrations

Reporting & Analytics

Coverage metrics (% assets with owners, descriptions). Quality scores. Usage analytics. Executive dashboards.

Metadata about metadata

