What Is This?
This is a metadata model: a blueprint for organising information about your data. Think of it like a library catalogue. The catalogue doesn't contain the books themselves, but it tells you what books exist, where they are, who wrote them, and how they're related.
Why Does This Matter?
Finding Data
In large organisations, people spend hours searching for the right data. A metadata model helps answer: "Where is customer data stored?" or "What does this column actually mean?"
Understanding Data
Numbers in a spreadsheet are meaningless without context. Metadata provides definitions, examples, and business meaning, turning raw data into usable information.
Trusting Data
Who owns this data? Where did it come from? Is it accurate? Metadata tracks lineage and ownership, so you know whether to trust what you're looking at.
Governing Data
Regulations require knowing where personal data lives and who can access it. Metadata enables compliance by tracking sensitivity, policies, and access controls.
The Core Concepts
Assets
The "things" being catalogued: tables, columns, files, business terms, people. Anything you need to track. This model defines 19 asset types across 6 layers.
Relationships
How assets connect. A column belongs to a table. A person owns a database. Data flows from one table to another. These connections create the knowledge graph.
Attributes
Properties of assets. A column has a data type. A table has a sensitivity level. A business term has a definition. Attributes add the descriptive detail.
Layers
Logical groupings of related assets. Physical layer for databases and files. Business layer for terms and policies. Ownership layer for people and roles. Each layer serves a distinct purpose.
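The four concepts above can be sketched as a small data structure. This is an illustrative sketch only, not the model's implementation: the class names, field names, and the "belongs_to" label are assumptions made for the example.

```python
from dataclasses import dataclass, field

@dataclass
class Asset:
    name: str
    asset_type: str                      # e.g. "TABLE", "COLUMN", "PERSON"
    layer: str                           # e.g. "Physical", "Business", "Ownership"
    attributes: dict = field(default_factory=dict)

@dataclass(frozen=True)
class Relationship:
    source: str                          # name of the source asset
    kind: str                            # hypothetical label, e.g. "belongs_to"
    target: str                          # name of the target asset

# A table with a sensitivity level and a column with a data type.
customers = Asset("customers", "TABLE", "Physical", {"sensitivity": "confidential"})
email = Asset("customers.email", "COLUMN", "Physical", {"data_type": "varchar"})

# Relationships between assets form the knowledge graph.
graph = [Relationship(email.name, "belongs_to", customers.name)]

def neighbours(asset_name, relationships):
    """Return the names of assets directly connected to asset_name."""
    connected = set()
    for rel in relationships:
        if rel.source == asset_name:
            connected.add(rel.target)
        elif rel.target == asset_name:
            connected.add(rel.source)
    return connected
```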
How To Use This Document
Use the tabs above to explore:
- Overview: Quick statistics and the layer architecture at a glance
- Layer View: Detailed diagrams showing how assets connect within each layer
- Interactive Graph: Explore the full model visually; click nodes to see connections
- Compliance Notes: How privacy, risk, and security attributes work
- Scope: What's explicitly excluded from this model
Select any asset in the side panel to see its definition, relationships, and attributes.
Layer Architecture
Construction Principles
Minimalism
As few assets, relationships, and attributes as possible. Connections strictly defined.
Flexibility
Asset types allow variants without new assets. Model adapts with minimal structural changes.
Sufficiency
All core metadata needs addressed: groupings, hierarchy, and lineage at multiple granularities.
Key Design Decisions
Edges as Assets
LINEAGE EDGE and ROLE are modelled as assets rather than simple relationships. This allows them to carry attributes (transformation details, ownership type, time validity) while functioning as edges in visualisations.
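A minimal sketch of the idea, assuming hypothetical field names: a lineage edge is stored as an object with its own attributes, yet can still be reduced to a plain (source, target) pair when a visualisation needs an edge.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple

@dataclass
class LineageEdge:
    source: str
    target: str
    transformation: str = ""             # hypothetical attribute name
    valid_from: Optional[date] = None    # time validity, open-ended when None
    valid_to: Optional[date] = None

    def as_edge(self) -> Tuple[str, str]:
        """Collapse the asset to a simple edge for graph rendering."""
        return (self.source, self.target)

edge = LineageEdge(
    "staging.orders",
    "mart.orders",
    transformation="deduplicate on order_id",
)
```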
Dimensional Layer for Org Hierarchy
TAGs provide natural organisational inheritance (Global → Europe → UK). Policy scope and classification cascade through hierarchy.
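One way to picture the cascade, as a sketch: the hierarchy is stored as child-to-parent links, and a policy scoped to a TAG is taken to apply to every TAG beneath it. The downward direction of the cascade and the names TAG_PARENT, ancestors, and policy_applies are assumptions for illustration.

```python
# Hypothetical TAG hierarchy stored as child -> parent links.
TAG_PARENT = {"UK": "Europe", "Europe": "Global", "Global": None}

def ancestors(tag):
    """Return the tag followed by its ancestors, nearest first."""
    chain = []
    while tag is not None:
        chain.append(tag)
        tag = TAG_PARENT.get(tag)
    return chain

def policy_applies(policy_scope_tag, asset_tag):
    """A policy scoped to a tag cascades to every tag beneath it."""
    return policy_scope_tag in ancestors(asset_tag)
```

Under these assumptions, a policy scoped to Europe reaches an asset tagged UK, but a policy scoped to UK does not reach an asset tagged Europe.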
Alternate Types
Asset Types allow variants without new assets. TABLE becomes Database View, FILE becomes API Endpoint. Model adapts with minimal structural changes.
XOR Constraints
COLUMN belongs to TABLE or REPORT (not both). BUSINESS RULE grouped by POLICY or CONTRACT (not both). ROLE links to PERSON or GROUP (not both).
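A check for these constraints could look like the sketch below. It assumes that exactly one parent is required; if an unattached asset is permitted, the comparison would become "at most one". The function names are hypothetical.

```python
def exactly_one(*parents):
    """True when exactly one of the candidate parents is set."""
    return sum(p is not None for p in parents) == 1

def is_valid_column(table=None, report=None):
    # COLUMN belongs to TABLE or REPORT, not both.
    return exactly_one(table, report)

def is_valid_business_rule(policy=None, contract=None):
    # BUSINESS RULE is grouped by POLICY or CONTRACT, not both.
    return exactly_one(policy, contract)

def is_valid_role(person=None, group=None):
    # ROLE links to PERSON or GROUP.
    return exactly_one(person, group)
```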
Bidirectional CDE Inheritance
Critical Data Element status propagates between BUSINESS TERM ↔ COLUMN, but only one hop. Inherited status doesn't propagate further.
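The one-hop rule can be sketched by propagating only from natively flagged assets, so an inherited flag never becomes a source for further propagation. The function name and the shape of the inputs are assumptions for the example.

```python
def effective_cde(native_cde, links):
    """Compute CDE status with one-hop propagation.

    native_cde: set of asset names flagged as CDE directly.
    links: list of (business_term, column) pairs.
    Returns a dict mapping asset name to "native" or "inherited".
    """
    status = {name: "native" for name in native_cde}
    for term, column in links:
        # Only native flags propagate, which limits the reach to one hop.
        if term in native_cde:
            status.setdefault(column, "inherited")
        if column in native_cde:
            status.setdefault(term, "inherited")
    return status

# col_a is natively a CDE; the term "Customer Email" groups col_a and col_b.
links = [("Customer Email", "col_a"), ("Customer Email", "col_b")]
status = effective_cde({"col_a"}, links)
```

Here the term inherits from col_a, but col_b stays unflagged because the term's status is inherited rather than native.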
Physical Path Precedence
When ownership could inherit from both physical (DB→Schema→Table) and conceptual (Business Term) paths, physical wins by default.
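As a rough sketch of this default, the function below prefers the physical path and falls back to the conceptual one when the physical path supplies nothing. The fallback behaviour and the prefer parameter are assumptions; the source only states that physical wins by default.

```python
def resolve_owner(physical_owner, conceptual_owner, prefer="physical"):
    """Pick an owner when both inheritance paths may supply one."""
    if prefer not in ("physical", "conceptual"):
        raise ValueError("prefer must be 'physical' or 'conceptual'")
    if prefer == "physical":
        first, second = physical_owner, conceptual_owner
    else:
        first, second = conceptual_owner, physical_owner
    # Fall back to the other path only when the preferred one is unset.
    return first if first is not None else second
```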
Time as Visibility Filter
The valid_from and valid_to attributes filter what is visible; time is not treated as a spatial dimension. Nodes appear and disappear rather than move through time.
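A visibility filter of this kind might be sketched as follows. It assumes that a missing bound means open-ended validity and that both bounds are inclusive; neither detail is stated in the source.

```python
from datetime import date

def visible(asset, as_of):
    """True when the asset is valid on the as_of date."""
    start = asset.get("valid_from")
    end = asset.get("valid_to")
    # Assumption: None means no bound on that side; bounds are inclusive.
    return (start is None or start <= as_of) and (end is None or as_of <= end)

def snapshot(assets, as_of):
    """Return only the assets visible on the given date."""
    return [a for a in assets if visible(a, as_of)]
```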
Diamond Resolution by Timestamp
When same-named BUSINESS TERMs have equal hierarchical distance, earliest creation timestamp wins. Override available at any scope.
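The tie-break can be sketched as a sort on (distance, creation timestamp), with an explicit override taking priority. That the nearest candidate wins before the timestamp is consulted is an assumption implied by the text, and the dictionary keys are hypothetical.

```python
from datetime import datetime

def resolve_term(candidates, override=None):
    """Choose among same-named BUSINESS TERMs.

    candidates: list of dicts with 'id', 'distance', and 'created_at'.
    override: id chosen explicitly at some scope, if any.
    """
    if override is not None:
        return override
    # Nearest first (assumed); earliest creation timestamp breaks ties.
    winner = min(candidates, key=lambda c: (c["distance"], c["created_at"]))
    return winner["id"]

candidates = [
    {"id": "bt-1", "distance": 2, "created_at": datetime(2023, 5, 1)},
    {"id": "bt-2", "distance": 2, "created_at": datetime(2022, 1, 1)},
]
```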
Dimensional Layer
Ownership Layer
Lineage Layer
Data Quality Layer
Business Layer
Physical Layer
Compliance, Privacy, Risk, Security & Sensitivity
Compliance is applied through POLICY and TAGs to assets. The remaining attributes (pii, risk, security, sensitivity) are applied directly to assets. All four attributes are optional.
PII
What the data is. Contains personally identifiable information, directly or in combination with other assets.
Boolean · false < true
Risk
What happens if things go wrong. Business impact if asset is compromised, lost, or misused.
Category · low < medium < high < critical
Security
How it must be protected. Protection controls required for the asset.
Category · low < medium < high < critical
Sensitivity
Who can access it. Classification level determining permitted access and distribution.
Category · public < internal < confidential < restricted
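The four ordered scales above can be written down as lists whose position gives the rank. This is a sketch; the names SCALES, rank, and higher are assumptions for the example.

```python
# Each list is ordered from lowest to highest, as defined above.
SCALES = {
    "pii": [False, True],
    "risk": ["low", "medium", "high", "critical"],
    "security": ["low", "medium", "high", "critical"],
    "sensitivity": ["public", "internal", "confidential", "restricted"],
}

def rank(attribute, value):
    """Position of value on its scale.

    list.index raises ValueError for a value outside the fixed set.
    """
    return SCALES[attribute].index(value)

def higher(attribute, a, b):
    """Return whichever of a and b ranks higher on the attribute's scale."""
    return a if rank(attribute, a) >= rank(attribute, b) else b
```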
Applicable Assets
All four attributes apply to: DATABASE, SCHEMA, TABLE, COLUMN, DIRECTORY, FILE, REPORT and BUSINESS TERM.
Inheritance Rules
Downward Inheritance (Physical Path)
DATABASE → SCHEMA → TABLE → COLUMN
DIRECTORY → FILE
FILE → TABLE (if file contains table)
REPORT → COLUMN (report elements)
Child inherits parent's value unless explicitly set. Explicit value overrides inheritance.
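Downward inheritance along the physical path can be sketched as a walk up the parent chain until an explicitly set value is found. The asset names and the PARENT and EXPLICIT mappings are hypothetical example data.

```python
# Hypothetical physical path stored as child -> parent links
# (DATABASE -> SCHEMA -> TABLE -> COLUMN).
PARENT = {
    "sales.public.orders.email": "sales.public.orders",
    "sales.public.orders": "sales.public",
    "sales.public": "sales",
    "sales": None,
}

# Sensitivity values set explicitly on individual assets.
EXPLICIT = {"sales": "internal", "sales.public.orders": "confidential"}

def effective(asset):
    """Return the asset's own value, else the nearest ancestor's value."""
    while asset is not None:
        if asset in EXPLICIT:
            return EXPLICIT[asset]      # explicit value overrides inheritance
        asset = PARENT.get(asset)
    return None                          # nothing set anywhere on the path
```

With this data the schema inherits "internal" from the database, while the email column inherits "confidential" from its table, because the table's explicit value overrides the database's.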
Bidirectional Inheritance (BT ↔ COLUMN)
Follows CDE pattern: native values propagate one hop, inherited values do not propagate further.
BUSINESS TERM (native) → grouped COLUMNs (inherited)
COLUMN (native) → its BUSINESS TERM (inherited)
Conflict Resolution
When an asset could inherit from multiple paths (physical and BT), the higher value wins.
Multi-concept columns: highest value from any linked BT applies.
Null Semantics
Null inherits from parent if parent is set. Null with null parent remains null.
Null does not participate in "higher wins" logic — a set value always beats null.
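The "higher wins" rule together with the null semantics can be sketched as below, using the sensitivity scale as the example. The function name resolve is an assumption.

```python
SENSITIVITY = ["public", "internal", "confidential", "restricted"]

def resolve(*candidates):
    """Return the highest set value, or None when every path is null."""
    set_values = [v for v in candidates if v is not None]
    if not set_values:
        return None                      # null with null parents stays null
    # Nulls were dropped above, so a set value always beats null.
    return max(set_values, key=SENSITIVITY.index)
```

For a multi-concept column, the candidates would be the physical-path value plus the value from each linked BUSINESS TERM.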
No Upward Inheritance
A TABLE does not inherit sensitivity from its COLUMNs. Upward roll-up is computed at query time for analytics.
Enables queries like "show tables containing restricted columns" without storing derived attributes.
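A query-time roll-up of this kind might look like the sketch below, which derives the answer from column metadata on demand instead of storing anything on the table. The record layout and function name are assumptions for illustration.

```python
# Hypothetical column metadata; tables carry no derived sensitivity.
COLUMNS = [
    {"table": "orders", "name": "email", "sensitivity": "restricted"},
    {"table": "orders", "name": "total", "sensitivity": "internal"},
    {"table": "products", "name": "title", "sensitivity": "public"},
    {"table": "customers", "name": "passport_no", "sensitivity": "restricted"},
]

def tables_containing(columns, level):
    """Names of tables with at least one column at the given level."""
    return sorted({c["table"] for c in columns if c.get("sensitivity") == level})
```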
Validation
All attribute values must be from the defined fixed set. This enables ordering for conflict resolution.
Mutability: an asset's value can be changed after it is set, but the set of permitted values is fixed (no custom values permitted).
Scope & Boundaries
This document defines the logical metadata model: the conceptual structure of assets, relationships, and attributes. It does not prescribe implementation details. The authoritative source is the markdown specification (standardised_metadata_model.md).
Explicitly Out of Scope
Issues
Issue tracking, bug reports, and problem resolution workflows are not modelled.
Permissions
Permissions require workflows and direct ties to real data, so access control implementation is treated as a separate concern.
Workflows
Approval processes, state machines, and task orchestration are implementation details.
Operational Data
Runtime outputs such as DQ execution results, query logs, and system metrics are not part of the model.
Enforced Visibility
The model can mark an asset as 'confidential', but it cannot mask that asset from users who shouldn't see it. The model assumes transparency.
Productisation Requirements
To build a working metadata management product from this model, the following capabilities would need to be implemented:
Search & Discovery
Full-text search across all assets. Faceted filtering by layer, type, TAG, ownership. Fuzzy matching for typo tolerance. Relevance ranking.
Consider: Elasticsearch, Typesense, or similar
Ingestion Framework
Connectors for source systems (databases, BI tools, file systems). Schema extraction. Change detection. Bulk and incremental sync.
Patterns: Polling, CDC, push-based webhooks
User Management
Authentication (SSO/SAML/OIDC). User profiles linked to PERSON assets. Session management. API keys for programmatic access.
PERSON in model ≠ system user (but can be linked)
Change Tracking & Audit
Full history of all changes. Who changed what, when. Diff views. Rollback capability. Compliance audit trails.
Event sourcing or CDC on metadata store
Admin Console
DIMENSION/TAG hierarchy management. Asset type configuration. Validation rule customisation. Bulk operations. Import/export.
Governance team tooling
Lineage Visualisation
Interactive lineage graphs. Impact analysis (upstream/downstream). Column-level lineage rendering. Filtering by scope.
Core differentiator for data governance
Data Quality Integration
Rule execution engine or integration with external DQ tools. Result capture. Threshold alerting. Trend tracking.
Great Expectations, dbt tests, custom engines
Notifications & Workflows
Subscription to asset changes. Approval workflows for sensitive changes. Slack/email/webhook integrations.
Ownership change requests, deprecation notices
API Layer
REST or GraphQL endpoints for all operations. Bulk mutations. Streaming for large exports. SDK generation.
Enables ecosystem integrations
Reporting & Analytics
Coverage metrics (% assets with owners, descriptions). Quality scores. Usage analytics. Executive dashboards.
Metadata about metadata