Typed Markdown Collections Specification

Version: 0.2.1 Last Updated: 2026-02-15

Abstract

This specification defines the behaviour of tools that treat folders of markdown files as typed, queryable data collections. It covers schema definition, field types, validation, querying, and CRUD operations.

Motivation

Markdown files with YAML frontmatter are a common way to store structured content. The pattern appears in static site generators, knowledge management tools like Obsidian, documentation systems, and increasingly in AI agent frameworks that use markdown for persistent state.

Each of these ecosystems has developed its own conventions for frontmatter structure, querying, and validation. This specification defines one coherent set of behaviours so that:

A CLI tool and an editor plugin can operate on the same files with consistent semantics
An AI agent can read and write markdown files that a human can also inspect and edit
Tool authors have a behaviour contract to implement against rather than inventing new conventions

Intended implementers

CLI tools for querying, validating, and manipulating markdown collections from the command line.

Editor plugins (for Obsidian, VS Code, etc.) that provide validation, autocomplete, and query interfaces. The expression syntax is designed for compatibility with Obsidian Bases.

Libraries in various languages that other applications can use to work with typed markdown.

AI agent frameworks that need structured, human-readable persistent storage.

What a conforming tool does

A tool implementing this specification:

Recognises collections by the presence of an mdbase.yaml config file
Loads type definitions from markdown files in a designated folder
Matches files to types based on explicit declaration or configurable rules
Validates frontmatter against type schemas, reporting errors at configurable severity levels
Executes queries using an expression language for filtering and sorting (with optional advanced features like grouping and summaries)
Performs CRUD operations with validation, default values, and auto-generated fields
Updates references when files are renamed, keeping links consistent across the collection

The specification defines the expected behaviour for each of these capabilities, along with conformance levels for partial implementations.

Design Principles

Files are the source of truth. Tools read from and write to the filesystem. Indexes and caches are derived and disposable.

Human-readable first. Tools should not require proprietary formats. A user with a text editor should be able to read and modify any file.

Progressive strictness. Tools should work on collections with no schema at all. Validation is opt-in and configurable.

Portable. Collections should work with any conforming tool. No vendor lock-in.

Git-friendly. All persistent state is text files suitable for version control.

How It Works

A collection is a folder with an `mdbase.yaml` marker

my-project/
├── mdbase.yaml            # Marks this folder as a collection
├── _types/                # Type definitions (schemas)
│   ├── task.md
│   └── person.md
├── tasks/
│   ├── fix-bug.md         # A record of type "task"
│   └── write-docs.md
└── people/
    └── alice.md           # A record of type "person"

The minimal config just declares the spec version:

# mdbase.yaml
spec_version: "0.2.1"

Types are defined as markdown files

A type is a schema for a category of files. Types live in the _types/ folder and are themselves markdown — the frontmatter defines the schema, the body documents it.

---
name: task
fields:
  title:
    type: string
    required: true
  status:
    type: enum
    values: [open, in_progress, done]
    default: open
  priority:
    type: integer
    min: 1
    max: 5
  assignee:
    type: link
    target: person
---

# Task

A task represents a unit of work. Set `status` to track progress.

Records are markdown files with typed frontmatter

A file declares its type and provides field values in frontmatter. The body is free-form markdown.

---
type: task
title: Fix the login bug
status: in_progress
priority: 4
assignee: "[[alice]]"
tags: [bug, auth]
---

The login form throws a validation error when the email contains a `+` character.

Queries filter and sort records using expressions

Queries are YAML objects with optional clauses for filtering, sorting, and pagination:

query:
  types: [task]
  where:
    and:
      - 'status != "done"'
      - "priority >= 3"
  order_by:
    - field: due_date
      direction: asc
  limit: 20

The expression language supports field access, comparison, boolean logic, string and list methods, date arithmetic, and link traversal:

status == "open" && tags.contains("urgent")
due_date < today() + "7d"
assignee.asFile().team == "engineering"

Validation is progressive

Collections work with no types at all — every file is an untyped record. Types can be added incrementally, and validation severity is configurable per-collection or per-type (off, warn, error). Strictness controls whether unknown fields are allowed, warned, or rejected.

Links connect records across the collection

Records can reference each other using wikilinks ([[alice]]) or markdown links ([Alice](../people/alice.md)). When a file is renamed, conforming tools update all references automatically.

Conformance is levelled

Implementations don't need to support everything. Six conformance levels let tools start with basic CRUD (Level 1) and progressively add matching, querying, links, reference updates, and caching. See §14 for details.

Specification Structure

Document	Description
01-terminology.md	Definitions of key terms
02-collection-layout.md	How tools identify and scan collections
03-frontmatter.md	Frontmatter parsing, null semantics, serialization
04-configuration.md	The `mdbase.yaml` configuration file
05-types.md	Type definitions as markdown files
06-matching.md	How tools match files to types
07-field-types.md	Primitive and composite field types
08-links.md	Link syntax, parsing, resolution
09-validation.md	Validation levels and error reporting
10-querying.md	Query model, filters, sorting
11-expressions.md	Expression language for filters and formulas
12-operations.md	Create, Read, Update, Delete, Rename
13-caching.md	Optional caching and indexing
14-conformance.md	Conformance levels and testing
15-watching.md	Watch mode and change events
Appendix A	Complete examples
Appendix B	Formal expression grammar
Appendix C	Standard error codes
Appendix D	Compatibility with existing tools

Versioning

This specification uses semantic versioning. The current version is 0.2.1. Breaking changes may occur before 1.0.0.

Tools should declare which specification version they implement and should reject configuration files with unsupported spec_version values.

License

This specification is released under the MIT License.

1. Terminology

This section defines the key terms used throughout the specification. Understanding these definitions is essential for correctly interpreting the requirements.

Core Concepts

Collection
A directory (and optionally its subfolders) containing markdown files managed as a unit. A collection is identified by the presence of an mdbase.yaml configuration file at its root. The collection is the fundamental unit of organization—all operations, queries, and validations occur within the scope of a single collection.

Collection Root
The directory containing the mdbase.yaml configuration file. All paths in the specification are relative to this directory unless otherwise stated.

File (or Record)
A markdown file within a collection. Files have the extension .md (or optionally .mdx or .markdown if configured). Each file represents a single record in the collection—analogous to a row in a database table, but richer: it has structured frontmatter, unstructured body content, and file system metadata.

Frontmatter
YAML metadata at the beginning of a file, delimited by --- markers. The frontmatter is a YAML mapping (object) containing the structured fields of the record.

---
title: My Document
tags: [important, review]
---

The body content begins here.

Body
The markdown content following the frontmatter. The body is treated as opaque content by default—this specification primarily concerns itself with frontmatter structure. Body querying is optional unless an implementation claims Level 3+ conformance, in which case file.body filtering is required.

Type
A named schema defining the expected frontmatter fields, their types, constraints, and validation rules for a category of files. Types are themselves defined as markdown files in a designated folder, making them versionable and documentable. A file may be associated with zero, one, or multiple types.

Type Definition File
A markdown file in the types folder whose frontmatter defines a type schema. The body of the file can contain documentation, examples, and usage notes for the type.

Untyped File
A file that is not associated with any type. Untyped files are valid members of a collection—they simply have no schema constraints applied to them. This allows for gradual adoption of typing.

Config File
The mdbase.yaml file at the collection root. This file defines global settings, the location of type definitions, and collection-wide behavior. It does not contain type definitions themselves—those live in separate markdown files.

Operations and Queries

Expression
A string that evaluates to a value, used in query filters and computed formulas. Expressions follow the syntax defined in the Expression Language section.

Query
A request to retrieve files matching certain criteria. Queries can filter by type, field values, file metadata, and path patterns, with results sorted and paginated.

Formula
A computed field defined by an expression. Formulas are evaluated at query time and can be used for filtering, sorting, and display.

Validation
The process of checking whether a file's frontmatter conforms to the schemas of its matched types. Validation can report issues, warn, or fail operations depending on configuration.

Links and References

Link
A reference from one file to another, expressed in frontmatter or body content. Links can use wikilink syntax ([[target]]), markdown link syntax ([text](path.md)), or bare paths.

Resolution
The process of determining which file a link refers to. Resolution takes into account relative paths, collection-wide search, and optional type-scoped lookups.

Backlink
An incoming link—a reference to a file from another file. Backlinks require indexing to compute efficiently and are an optional feature.

Implementation Terms

Implementation
Any tool, library, or application that reads, writes, or operates on collections according to this specification.

Conformance Level
A defined subset of the specification that an implementation may claim to support. See Conformance for the defined levels.

Cache
An optional derived data store that accelerates queries. Caches MUST be rebuildable from the source files and MUST NOT be the source of truth.

RFC 2119 Keywords

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

In brief:

MUST / REQUIRED / SHALL: Absolute requirement
MUST NOT / SHALL NOT: Absolute prohibition
SHOULD / RECOMMENDED: There may be valid reasons to ignore, but implications must be understood
SHOULD NOT / NOT RECOMMENDED: There may be valid reasons to do it, but implications must be understood
MAY / OPTIONAL: Truly optional; implementations may or may not include the feature

2. Collection Layout

This section defines how collections are identified, how files are discovered, and the overall structure of a compliant collection.

2.1 Identifying a Collection

A directory is recognized as a collection if and only if it contains a file named mdbase.yaml at its root. This file is the collection root marker. The directory containing this file is the collection root, and all paths in the specification are relative to this directory.

my-project/
├── mdbase.yaml           # Collection root marker
├── _types/             # Type definition files (configurable location)
│   ├── task.md
│   └── note.md
├── tasks/
│   ├── fix-bug.md
│   └── write-docs.md
├── notes/
│   └── meeting-2024-01.md
└── README.md

If a directory does not contain mdbase.yaml, it is not a collection. Implementations MUST NOT treat arbitrary directories of markdown files as collections without this marker.

2.2 File Discovery

Included Files

Implementations MUST scan the collection root and, by default, all subdirectories (recursively) for markdown files.

A file is considered a markdown file if:

It has the extension .md, OR
It has an extension listed in settings.extensions in the config file (e.g., .mdx, .markdown)

Implementations MUST treat files with these extensions as collection members (records).

Excluded Paths

Implementations MUST exclude certain paths from scanning:

The mdbase.yaml config file itself (it is not a record)
Paths listed in settings.exclude in the config file
The types folder (by default _types/, configurable via settings.types_folder)
The cache folder if present (by default .mdbase/)

Default exclusions (applied unless overridden):

.git
node_modules
.mdbase

Symlinks

Implementations MAY follow symlinks during scanning, but MUST prevent the resolved path from escaping the collection root. Symlinks that resolve outside the collection MUST be ignored with a warning.

Subdirectory Scanning

By default, implementations MUST scan subdirectories recursively. This behavior can be disabled by setting settings.include_subfolders: false in the config file.

When subdirectory scanning is disabled, only files directly in the collection root are considered records.

2.3 The Types Folder

Type definitions are stored as markdown files in a designated folder. By default, this folder is _types/ at the collection root, but it can be configured via settings.types_folder.

# mdbase.yaml
settings:
  types_folder: "_schemas"  # Use _schemas/ instead of _types/

The types folder:

MUST be excluded from the regular file scan (type files are not records)
MUST be scanned separately to load type definitions
MAY contain subdirectories for organization (all .md files are scanned)

See Types for the format of type definition files.

2.4 Path Conventions

All paths in the specification and in queries use forward slashes (/) regardless of operating system. Implementations on Windows MUST normalize backslashes to forward slashes.

Paths are relative to the collection root unless explicitly stated otherwise.

Examples:

tasks/fix-bug.md refers to a file in the tasks subdirectory
./sibling.md in a link is relative to the containing file's directory
../other/file.md in a link navigates up one directory

2.5 Reserved Names

The following names have special meaning and SHOULD NOT be used as regular record filenames:

Name	Purpose
`mdbase.yaml`	Collection configuration
`_types/`	Default types folder (configurable)
`.mdbase/`	Default cache folder

Implementations SHOULD warn if a user attempts to create a record with a reserved name.

2.6 Minimal Collection Example

The simplest valid collection consists of a config file and zero or more markdown files:

minimal/
├── mdbase.yaml
└── hello.md

mdbase.yaml:

spec_version: "0.2.1"

hello.md:

---
title: Hello World
---

This is a minimal collection with one untyped file.

This collection has no types defined. The single file is untyped but still a valid record. As the collection grows, types can be added incrementally.

2.7 Recommended Collection Structure

While the specification allows flexibility, the following structure is recommended for clarity:

my-collection/
├── mdbase.yaml               # Required: collection configuration
├── _types/                 # Type definitions
│   ├── task.md
│   ├── note.md
│   └── person.md
├── .mdbase/                  # Cache (gitignored)
│   └── index.sqlite
├── tasks/                  # Records organized by type or purpose
│   ├── task-001.md
│   └── task-002.md
├── notes/
│   └── weekly-review.md
├── people/
│   └── alice.md
└── README.md               # Documentation (also a record unless excluded)

Note that README.md is a valid record in this structure. If you want to exclude documentation files from the collection, add them to settings.exclude.

2.8 Nested Collections

If a subdirectory within a collection also contains an mdbase.yaml file, it defines an independent nested collection:

The parent collection's scan MUST NOT descend into the nested collection.
The nested collection's files are NOT records of the parent collection.
The nested mdbase.yaml acts as a boundary marker, similar to exclude patterns.
Implementations SHOULD automatically exclude directories containing mdbase.yaml.

my-collection/
├── mdbase.yaml          # Parent collection
├── tasks/
│   └── task-001.md      # Record in parent
└── sub-project/
    ├── mdbase.yaml      # Nested collection (independent)
    └── docs/
        └── readme.md    # Record in sub-project, NOT in parent

This behavior ensures that collections remain self-contained and do not interfere with each other.

2.9 Non-Markdown Files

Collections may contain non-markdown files such as images, PDFs, or other binary assets. These files are NOT records — they have no frontmatter and are not returned by queries.

Status in the Collection

Non-markdown files are valid link targets: [[photo.png]] and ![img](photo.png) resolve normally
They appear in file.links when referenced via non-embed link syntax (e.g., [[photo.png]])
They appear in file.embeds when referenced via embed syntax (![[...]] or ![](...))
Non-markdown files are not assigned types and cannot be validated against type schemas

File Discovery

File discovery MUST skip non-markdown files when building the record set
File discovery MUST NOT skip non-markdown files during link resolution — links to images, PDFs, and other assets MUST resolve by path
Implementations MUST resolve links to non-markdown files by path only (no id_field lookup, since non-markdown files are not records and have no frontmatter)

Example

my-collection/
├── mdbase.yaml
├── tasks/
│   └── task-001.md        # Record (included in queries)
├── images/
│   ├── diagram.png        # Not a record, but valid link target
│   └── screenshot.jpg     # Not a record, but valid link target
└── attachments/
    └── report.pdf         # Not a record, but valid link target

In task-001.md:

---
type: task
title: Fix the layout
---

See the [[images/diagram.png]] for reference.

![[images/screenshot.jpg]]

Here, diagram.png appears in file.links because it uses non-embed link syntax; screenshot.jpg appears in file.embeds because it uses embed syntax.

3. Frontmatter Parsing and Serialization

This section defines how frontmatter is parsed from files and how it should be written back. Correct handling of YAML edge cases—especially null values and empty fields—is essential for interoperability.

3.1 Frontmatter Delimiters

A file MAY begin with YAML frontmatter. Frontmatter is delimited by two lines consisting of exactly three hyphens (---):

---
title: My Document
---

# Heading

Body content begins here.

Rules:

The opening --- MUST be the very first line of the file (no leading whitespace or blank lines). A UTF-8 BOM, if present, MUST be ignored for this check.
The closing --- MUST be on its own line.
The content between the delimiters MUST be valid YAML.
If a file does not begin with ---, it has no frontmatter. The entire file is treated as body content, and the record has an empty frontmatter mapping ({}).
If additional --- blocks appear later in the file, they are part of the body and are not treated as frontmatter.

Examples of files without frontmatter:

# Just a heading

No frontmatter here.


---
This is not frontmatter because there's a blank line before the dashes.
---

3.2 YAML Parsing Requirements

Top-Level Structure

The frontmatter MUST parse as a YAML mapping (object/dictionary).

If the frontmatter parses as a different YAML type (scalar, list, null), implementations MUST treat it as invalid frontmatter and handle according to the validation level:

off: Treat as empty frontmatter, log warning
warn: Treat as empty frontmatter, emit warning
error: Fail the operation

Invalid example:

---
- item1
- item2
---

This is a YAML list, not a mapping, and is invalid frontmatter.

YAML Version

Implementations SHOULD support YAML 1.2. Implementations MAY support YAML 1.1 for compatibility with existing tools, but SHOULD prefer 1.2 semantics where they differ.

Character Encoding

Files MUST be UTF-8 encoded. Implementations MUST reject files with invalid UTF-8 sequences.

3.3 Null and Empty Value Semantics

Correct handling of null and empty values is critical for interoperability. This section defines canonical behavior that all implementations MUST follow.

Reading Null Values

The following YAML patterns MUST all be interpreted as null:

# Explicit null keyword
field1: null
field2: Null
field3: NULL

# Tilde (YAML null alias)
field4: ~

# Empty value (key with no value)
field5:

# Explicit empty (flow style)
field6:

All of the above result in field having the value null.

Empty String vs Null

An empty string is distinct from null:

# This is null:
empty_null:

# This is an empty string (not null):
empty_string: ""

# This is also an empty string:
empty_quoted: ''

Implementations MUST preserve this distinction. A field with value "" is a present field with an empty string value. A field with value null (or empty) is a present field with no value.

Missing vs Null

A missing field (key not present) is distinct from a null field (key present with null value):

---
present_null: null
present_string: "hello"
# 'missing_field' is not here
---

present_null is present with value null
present_string is present with value "hello"
missing_field is missing (not present at all)

This distinction matters for:

The exists() function in expressions (returns true when key is present, even if null)
The isEmpty() method (returns true when value is null, empty, or missing)
The required constraint (requires present and non-null)
Default value application (applies to missing, not to null)

Summary Table

YAML	Parsed Value	`exists(field)`	Satisfies `required`? (before defaults)
`field: null`	null	true	No
`field: ~`	null	true	No
`field:`	null	true	No
`field: ""`	`""` (empty string)	true	Yes (string value)
(key absent)	undefined	false	No

Presence vs Meaningful Value

exists(field) is true when the key is present in raw persisted frontmatter, even if the value is null.
field.isEmpty() is true when the value is null, empty, or missing.
required: true requires the key to be present in the effective frontmatter and the value to be non-null (see §9.2.1).

Implementations MUST preserve these distinctions in validation and query evaluation.

Implementation Notes (Non-Normative)

Practical approach to avoid null-semantics bugs:

Preserve raw parsed keys and values exactly after YAML parse.
Represent missing keys and present-null keys distinctly in memory.
Run exists() and file.hasProperty() checks against raw persisted keys.
Build effective frontmatter in a separate projection layer where defaults are applied only to missing keys.
Run required checks after effective-frontmatter construction, but still treat explicit null as failing required.

3.4 Writing Frontmatter

When implementations write or update frontmatter, they MUST follow these rules to ensure consistency and avoid ambiguity.

Never Write Empty-Value Nulls

Implementations MUST NOT write the "empty value" null form:

# ❌ NEVER write this
field:

This form is ambiguous in some contexts and causes issues with YAML tools that normalize whitespace.

Writing Null Values

When a field's value is null and the field should be written, implementations MUST use one of:

Option 1: Explicit null (preferred when preserving the field)

field: null

Option 2: Omit the field entirely (preferred when null means "no value")

# field is simply not present

The choice between these options is controlled by settings.write_nulls:

`write_nulls`	Behavior
`"omit"` (default)	Omit fields with null values
`"explicit"`	Write `field: null`

Writing Empty Strings

Empty strings MUST be written with explicit quotes:

field: ""

Writing Empty Lists

Empty lists can be written as [] or omitted, controlled by settings.write_empty_lists:

`write_empty_lists`	Behavior
`true` (default)	Write `field: []`
`false`	Omit fields with empty list values

3.5 Formatting Preservation

When updating a file, implementations SHOULD preserve as much of the original formatting as practical. Preservation applies to untouched fields and content; fields explicitly modified by the operation MAY be re-serialized using implementation defaults.

SHOULD Preserve

Field order: Keep fields in their original order when possible
Blank lines: Preserve blank lines within frontmatter (YAML allows them)
String style: If a string was written with quotes, keep the quotes
Comment proximity: Keep comments near their associated fields

MUST Preserve

Body content: The body MUST NOT be modified by frontmatter updates (except when explicitly changing the body)
Line endings: Preserve the file's line ending style (LF vs CRLF)

MAY Normalize

Implementations MAY normalize:

Indentation (2 spaces is conventional)
Trailing whitespace
Final newline (files SHOULD end with a newline)

3.6 Special Characters in Field Names

Field names containing special characters MUST be quoted in YAML:

"field-with-dashes": value
"field.with.dots": value
"field:with:colons": value

Field names that would otherwise parse as non-strings in YAML (e.g., yes, no, null, true, false) MUST be quoted to ensure they are treated as strings.

In expressions and queries, such fields are accessed with bracket notation:

note["field-with-dashes"]

Implementations SHOULD avoid requiring special characters in schema-defined field names. User-defined fields may use them.

3.7 Multi-line Strings

YAML supports several multi-line string formats. Implementations MUST support all standard YAML multi-line syntaxes:

Literal block (preserves newlines):

description: |
  This is a multi-line string.
  Line breaks are preserved.

Folded block (newlines become spaces):

description: >
  This is a long line that will be
  folded into a single line.

Quoted strings with escapes:

description: "Line 1\nLine 2"

When writing multi-line values, implementations SHOULD use literal block style (|) for readability.

3.8 Type Coercion

YAML has automatic type inference that can cause surprises. Implementations MUST be aware of these patterns:

YAML Value	YAML Type	Notes
`true`, `false`	Boolean
`yes`, `no`	Boolean (YAML 1.1)	Avoid; prefer `true`/`false`
`on`, `off`	Boolean (YAML 1.1)	Avoid; prefer `true`/`false`
`123`	Integer
`12.5`	Float
`1e10`	Float (scientific)
`0x1A`	Integer (hex)
`.inf`, `-.inf`	Float (infinity)
`.nan`	Float (NaN)
`2024-01-15`	Date (if YAML date extension)	Implementations MAY parse as date
`null`, `~`	Null
`"123"`	String	Quoted values are strings

When schema specifies a type, implementations MUST coerce compatible values (e.g., reading 123 for a string field as "123"). When coercion is not possible, it is a validation error.

3.9 Example: Round-Trip Preservation

Given this input file:

---
title: My Task
status: open
tags:
  - important
  - review
due_date: 2024-03-15
notes: |
  This is a longer note.
  It spans multiple lines.
---

# Task Details

The body content here.

After updating status to "done", the output SHOULD be:

---
title: My Task
status: done
tags:
  - important
  - review
due_date: 2024-03-15
notes: |
  This is a longer note.
  It spans multiple lines.
---

# Task Details

The body content here.

Note that:

Field order is preserved
Multi-line string style is preserved
List formatting is preserved
Body content is unchanged

4. Configuration

This section defines the structure and semantics of the mdbase.yaml configuration file that identifies and configures a collection.

4.0 File Encoding

The mdbase.yaml configuration file and all type definition files MUST be encoded in UTF-8 (consistent with the UTF-8 requirement for markdown files in §3.2).

4.1 File Location and Format

The configuration file MUST be named mdbase.yaml and MUST be located at the collection root. mdbase.yml is NOT supported. This file:

Identifies the directory as a collection
Specifies the schema version
Configures collection behavior
Points to the types folder

The file MUST be valid YAML and MUST parse as a mapping at the top level.

4.2 Minimal Configuration

The simplest valid configuration declares only the specification version:

spec_version: "0.2.1"

This creates a collection with all default settings and no types (all files are untyped).

4.3 Full Configuration Schema

# =============================================================================
# REQUIRED
# =============================================================================

# Specification version this configuration conforms to
# Implementations MUST reject versions they do not support
spec_version: "0.2.1"

# =============================================================================
# OPTIONAL: Collection Metadata
# =============================================================================

# Human-readable name for the collection
name: "My Project Tasks"

# Description of the collection's purpose
description: "Task and note management for the My Project initiative"

# =============================================================================
# OPTIONAL: Settings
# =============================================================================

settings:
  # ---------------------------------------------------------------------------
  # File Discovery
  # ---------------------------------------------------------------------------
  
  # Additional file extensions to treat as markdown (beyond .md which is always included)
  # Default: []
  # Common additions: ["mdx", "markdown"]
  # Entries MAY include a leading dot; implementations MUST normalize to no-dot.
  extensions: ["mdx"]
  
  # Paths to exclude from scanning (relative to collection root)
  # Default: [".git", "node_modules", ".mdbase"]
  # Glob patterns are supported
  exclude:
    - ".git"
    - "node_modules"
    - ".mdbase"
    - "drafts/**"
    - "*.draft.md"
  
  # Whether to scan subdirectories recursively
  # Default: true
  include_subfolders: true
  
  # ---------------------------------------------------------------------------
  # Types Configuration
  # ---------------------------------------------------------------------------
  
  # Folder containing type definition files (relative to collection root)
  # Default: "_types"
  types_folder: "_types"
  
  # Frontmatter keys that explicitly declare a file's type(s)
  # If a file has any of these keys, its value determines the type(s)
  # Default: ["type", "types"]
  explicit_type_keys: ["type", "types"]
  
  # ---------------------------------------------------------------------------
  # Validation
  # ---------------------------------------------------------------------------
  
  # Default validation level for operations
  # "off": No validation
  # "warn": Report issues but don't fail
  # "error": Report issues and fail operations
  # Default: "warn"
  default_validation: "warn"
  
  # Default strictness for types that don't specify their own
  # false: Extra fields allowed
  # true: Extra fields cause validation failure
  # "warn": Extra fields allowed but emit warning
  # Default: false
  default_strict: false

  # ---------------------------------------------------------------------------
  # Time
  # ---------------------------------------------------------------------------

  # Timezone for now()/today() and naive date comparisons
  # Use an IANA timezone name (e.g., "UTC", "America/New_York")
  # Default: local system timezone
  timezone: "UTC"
  
  # ---------------------------------------------------------------------------
  # Link Resolution
  # ---------------------------------------------------------------------------
  
  # Field name used as unique identifier for link resolution
  # When a link is a simple name (no path), implementations search for files
  # where this field matches the link target
  # Default: "id"
  id_field: "id"
  
  # ---------------------------------------------------------------------------
  # Write Behavior
  # ---------------------------------------------------------------------------
  
  # How to handle null values when writing frontmatter
  # "omit": Don't write fields with null values
  # "explicit": Write as `field: null`
  # Default: "omit"
  write_nulls: "omit"

  # Whether to write fields that are filled solely by defaults
  # true: Defaults are materialized on disk
  # false: Defaults only appear in effective output, not on disk
  # Default: true
  write_defaults: true
  
  # Whether to write empty lists
  # true: Write as `field: []`
  # false: Omit fields with empty list values
  # Default: true
  write_empty_lists: true
  
  # ---------------------------------------------------------------------------
  # Rename Behavior
  # ---------------------------------------------------------------------------
  
  # Whether to update references across the collection when a file is renamed
  # Default: true
  rename_update_refs: true
  
  # ---------------------------------------------------------------------------
  # Caching
  # ---------------------------------------------------------------------------
  
  # Folder for cache files (relative to collection root)
  # Default: ".mdbase"
  cache_folder: ".mdbase"

4.4 Setting Details

`spec_version` (Required)

The version of this specification the configuration conforms to. Implementations MUST check this value and MUST reject configuration files with versions they do not support.

Valid values: "0.2.1"

Compatibility: Implementations MAY accept "0.2" as an alias for "0.2.1", but SHOULD emit a warning and normalize to "0.2.1" when writing.

4.4.1 Version Compatibility

The spec_version field uses semantic versioning (MAJOR.MINOR.PATCH):

PATCH bumps (e.g., 0.2.0 → 0.2.1): Clarifications and errata only. No behavioral changes. All implementations of X.Y.z are compatible with any X.Y.z′.

MINOR bumps (e.g., 0.2.0 → 0.3.0): Additive changes only — new optional fields, new expression functions, new config keys. Implementations of X.Y MUST ignore unknown config keys introduced in X.Y+1 rather than rejecting them. Collections authored for X.Y work on X.Y+N without modification.

MAJOR bumps (e.g., 0.x → 1.0): Breaking changes. Implementations MUST reject configuration files with a different major version than the one they support.

During the 0.x series: MINOR bumps MAY contain breaking changes. Implementations SHOULD treat 0.x and 0.y (x ≠ y) as potentially incompatible.

Unknown keys: Implementations MUST ignore unknown keys under settings with a warning, to support forward compatibility within a major version. Unknown top-level keys (outside settings) MUST also be ignored with a warning.

`name` and `description`

Human-readable metadata about the collection. These have no semantic effect but are useful for documentation and tooling that displays collection information.

`settings.extensions`

File extensions to scan. The extension .md is always implicitly included. This setting specifies additional extensions beyond .md:

Default: []

Normalization:

Implementations MUST treat entries with or without a leading dot as equivalent.
The .md extension is always implicitly included and MUST NOT be required in this list.
If md or .md appears in extensions, it SHOULD be ignored with a warning.

Example: To include MDX files:

settings:
  extensions: ["mdx"]

`settings.exclude`

Paths or glob patterns to exclude from file scanning. Paths are relative to the collection root.

Default: [".git", "node_modules", ".mdbase"]

Glob patterns:

* matches any characters except /
** matches any characters including /
? matches a single character

Example:

settings:
  exclude:
    - ".git"
    - "node_modules"
    - "*.draft.md"      # Exclude all draft files
    - "archive/**"      # Exclude everything in archive/

`settings.types_folder`

The folder containing type definition files. Type files are markdown files whose frontmatter defines a schema.

Default: "_types"

The types folder:

Is automatically excluded from the regular file scan
Is scanned separately to load type definitions
May contain subdirectories (all .md files are processed)

`settings.migrations_folder`

Folder containing migration manifests (see §5.11.1).

Default: "<types_folder>/_migrations" (e.g., _types/_migrations)

Migration manifests are optional. If the folder exists and an implementation supports migrations, it MUST load manifests from this location.

`settings.explicit_type_keys`

Frontmatter keys that can explicitly declare a file's type(s). When a file has one of these keys, its value determines the type assignment, overriding any match rules.

Default: ["type", "types"]

Usage:

# Single type
type: task

# Multiple types
types: [task, urgent]

Using type as a normal field: If you want a frontmatter field named type to be treated as ordinary data, remove it from settings.explicit_type_keys and choose different declaration keys (e.g., kind, kinds).

`settings.write_defaults`

Whether to persist fields that are filled solely by defaults.

Default: true

When true, create and update operations MUST write default-only fields to disk. When false, defaults are only applied to the effective frontmatter returned by read/query/validation.

`settings.default_validation`

The default validation level applied when not otherwise specified.

Value	Behavior
`"off"`	No validation performed
`"warn"`	Validation issues are reported but operations succeed
`"error"`	Validation issues cause operations to fail

Default: "warn"

`settings.default_strict`

Default strictness mode for types that don't declare their own.

Value	Behavior
`false`	Unknown fields are allowed
`"warn"`	Unknown fields are allowed but trigger warnings
`true`	Unknown fields cause validation failure

Default: false

`settings.timezone`

Timezone for date/time functions (now(), today()) and naive datetime comparisons.

Default: local system timezone

Format: IANA timezone name (e.g., "UTC", "America/New_York")

Portability note: For deterministic results across machines, collections SHOULD set settings.timezone explicitly (e.g., "UTC"). If unset, identical queries can yield different results on different systems.

`settings.id_field`

The field name used as a unique identifier for link resolution. When a link is a simple name (not a path), implementations search for files where this field matches.

Uniqueness requirement: Values of the id_field MUST be unique across the collection. Implementations MUST validate uniqueness and report duplicate_id issues when multiple files share the same id_field value.

Default: "id"

Example: With id_field: "id", the link [[task-001]] would resolve to a file with id: task-001 in its frontmatter.

`settings.write_nulls`

Controls how null values are written to frontmatter.

Value	Behavior
`"omit"`	Fields with null values are not written
`"explicit"`	Null values are written as `field: null`

Default: "omit"

`settings.rename_update_refs`

Whether renaming a file automatically updates references to it across the collection.

Default: true

When enabled, implementations MUST update:

Link fields in frontmatter that resolve to the renamed file
Link syntax in body content that references the renamed file

See Operations for details.

4.5 Configuration Validation

Implementations MUST validate the configuration file before processing the collection. Validation checks:

Structure: The file parses as valid YAML with a mapping at the top level
Required fields: spec_version is present
Type correctness: Each field has the expected type
Valid values: Enum fields have allowed values
Path validity: Paths in exclude, types_folder, etc. are syntactically valid

If validation fails, implementations MUST NOT process the collection and MUST report the error clearly.

4.6 Configuration Examples

Minimal

spec_version: "0.2.1"

Standard Project

spec_version: "0.2.1"
name: "Project Documentation"
description: "Specs, decisions, and meeting notes"

settings:
  exclude:
    - ".git"
    - "node_modules"
    - "drafts/**"
  default_validation: "error"

Knowledge Base with Custom Types Folder

spec_version: "0.2.1"
name: "Personal Knowledge Base"

settings:
  types_folder: "schemas"
  extensions: ["mdx"]
  default_strict: "warn"
  id_field: "uid"

Strict Validation

spec_version: "0.2.1"
name: "Production Data"

settings:
  default_validation: "error"
  default_strict: true
  write_nulls: "explicit"

4.7 Environment Variables (Optional)

Implementations MAY support environment variable substitution in configuration values using ${VAR} syntax (and MAY also support ${VAR:-default} for default values):

settings:
  cache_folder: "${MDBASE_CACHE:-/tmp/mdbase}"

This feature is OPTIONAL. If not supported, implementations MUST treat ${...} as literal strings. If ${VAR:-default} is not supported, implementations MUST treat the entire string literally (no partial expansion).

If supported, substitution is a pure string-to-string transform applied to all string scalars in the configuration tree (including nested objects and list values). If a placeholder is unsupported or malformed, implementations MUST leave the entire string unchanged (no partial expansion). If a variable is referenced without a default and is undefined, implementations MUST substitute the empty string.

4.8 Security Considerations

Regular Expressions

Match rules, field constraints (pattern), and expressions (the matches operator) may contain regular expressions.

Required baseline: Implementations MUST support ECMAScript (ES2018+) regular expression syntax as the baseline flavor. This aligns with JavaScript-based tools (e.g., Obsidian) and is available in every major programming language.

Required features (MUST support):

Feature	Syntax	Example
Character classes	`[abc]`, `[^abc]`, `\d`, `\w`, `\s`	`\d{4}`
Quantifiers	`*`, `+`, `?`, `{n}`, `{n,m}`	`\w+`
Alternation	`\|`	`cat\|dog`
Anchors	`^`, `$`	`^TASK-`
Capturing groups	`(...)`	`(\d+)-(\d+)`
Non-capturing groups	`(?:...)`	`(?:foo\|bar)`
Lookahead	`(?=...)`, `(?!...)`	`\d+(?= items)`

Optional features (SHOULD support):

Feature	Syntax	Notes
Lookbehind	`(?<=...)`, `(?<!...)`	Supported in ES2018 but not in RE2-based engines
Named groups	`(?<name>...)`	Supported in ES2018 but not in RE2-based engines

Implementations that do not support optional features MUST reject patterns using those features with a clear error rather than silently ignoring them.

Invalid regex patterns in type definitions MUST be rejected at type load time with invalid_type_definition.

ReDoS mitigations: Implementations SHOULD guard against Regular Expression Denial of Service (ReDoS) by:

Setting timeouts on regex evaluation
Rejecting patterns with known dangerous constructs (e.g., nested quantifiers)
Documenting any regex restrictions

Environment Variables

If an implementation supports environment variable expansion in configuration (e.g., ${VAR}), it MUST:

Only expand variables explicitly referenced in configuration
Never log expanded values that may contain secrets
Document which config fields support expansion

Expression Evaluation

Implementations SHOULD set resource limits on expression evaluation:

Maximum expression nesting depth
Maximum number of function calls per evaluation
Timeout for individual expression evaluations

These limits prevent pathological expressions from consuming unbounded resources.

5. Types

This section defines how types (schemas) are created, structured, and interpreted. In this specification, types are markdown files—they live in a designated folder, have frontmatter that defines the schema, and body content that provides documentation.

5.1 Types as Markdown Files

A type is defined by a markdown file in the types folder (default: _types/). The file's frontmatter contains the schema definition; the body contains documentation for the type.

Example: _types/task.md

---
name: task
description: A task or todo item with status tracking
extends: base
strict: false

fields:
  title:
    type: string
    required: true
    description: Short summary of the task
  status:
    type: enum
    values: [open, in_progress, blocked, done]
    default: open
  priority:
    type: integer
    min: 1
    max: 5
    default: 3
  due_date:
    type: date
  tags:
    type: list
    items:
      type: string
    default: []
  assignee:
    type: link
    target: person
---

# Task

A task represents a discrete unit of work that can be tracked through its lifecycle.

## Status Values

- **open**: Not yet started
- **in_progress**: Currently being worked on
- **blocked**: Cannot proceed due to external dependency
- **done**: Completed

## Usage

Tasks are typically stored in the `tasks/` folder. Example:

```yaml
---
type: task
title: Fix the login bug
status: in_progress
priority: 4
due_date: 2024-03-15
assignee: "[[alice]]"
tags: [bug, auth]
---

The login form throws an error when...

person - For assignees
project - Tasks can belong to projects


This approach has several benefits:

1. **Documentation lives with the schema**: The markdown body explains how to use the type
2. **Version control friendly**: Types are tracked like any other content
3. **Human readable**: Anyone can understand the type by reading the file
4. **Editable anywhere**: No special tooling required to modify schemas
5. **Meta-consistency**: Types use the same format as the content they describe

---

## 5.2 Type Definition Schema

The frontmatter of a type file MUST conform to this structure:

```yaml
# =============================================================================
# REQUIRED
# =============================================================================

# The type name (must match filename without extension)
name: task

# =============================================================================
# OPTIONAL: Metadata
# =============================================================================

# Human-readable description
description: "A task or todo item"

# Optional schema version for migration tooling
version: 1

# Field to use as a human-readable display name for records of this type
# If missing or empty on a record, implementations MUST fall back to file.basename
display_name_key: title

# Type to inherit fields from
extends: base

# Strictness mode (overrides settings.default_strict)
# false: Allow unknown fields
# true: Reject unknown fields
# "warn": Allow but warn about unknown fields
strict: false

# =============================================================================
# OPTIONAL: Matching Rules
# =============================================================================

# Rules for automatically associating files with this type
# If not specified, files must explicitly declare their type
match:
  path_glob: "tasks/**/*.md"
  fields_present: [status]
  where:
    # Field predicates using expression operators
    tags:
      contains: "task"

# =============================================================================
# OPTIONAL: Path Pattern
# =============================================================================

# Pattern for validating/generating filenames
# Variables in {} reference field values
path_pattern: "{id}.md"

# =============================================================================
# REQUIRED (unless extends provides all fields)
# =============================================================================

# Field definitions
fields:
  field_name:
    type: string
    required: false
    # ... field options

Optional metadata fields:

description: Human-readable description of the type
version: Positive integer for migration tooling (informational only; see §5.11.2)
display_name_key: Field used as a human-readable label (see §5.13)

5.3 The `name` Field

Every type MUST have a name field that matches the filename (without extension).

_types/task.md    →  name: task
_types/person.md  →  name: person

If the name doesn't match the filename, implementations MUST emit a warning and use the name value as the canonical type name.

Names MUST:

Consist of lowercase letters, numbers, hyphens, and underscores
Start with a letter
Not exceed 64 characters
Be non-empty

Type names are canonicalized to lowercase. Implementations SHOULD treat type names case-insensitively when reading frontmatter (type/types) and SHOULD normalize them to lowercase for matching and output while emitting a warning for non-canonical casing.

Reserved names (MUST NOT be used):

Names starting with _ (reserved for internal use)
file, formula, this (reserved keywords in expressions)

5.4 Type Inheritance

Types MAY inherit from another type using the extends field:

# _types/base.md
---
name: base
fields:
  id:
    type: string
    required: true
  created_at:
    type: datetime
    generated: now
  updated_at:
    type: datetime
    generated: now_on_write
---

# _types/task.md
---
name: task
extends: base
fields:
  title:
    type: string
    required: true
  status:
    type: enum
    values: [open, done]
---

The task type inherits id, created_at, and updated_at from base, and adds title and status.

Inheritance Rules

Single inheritance only: A type can extend at most one parent type
Chains allowed: task extends base extends root is valid
Field override: Child fields with the same name override parent fields completely
Circular inheritance: MUST be detected and rejected with an error
Missing parent: If the parent type doesn't exist, validation MUST fail
Strictness: Child inherits parent's strict unless explicitly overridden

Field Override Example

# Parent defines priority as 1-3
# _types/base-task.md
fields:
  priority:
    type: integer
    min: 1
    max: 3

# Child redefines priority as 1-5
# _types/task.md
extends: base-task
fields:
  priority:
    type: integer
    min: 1
    max: 5  # Now allows 4 and 5

The child completely replaces the parent's field definition; constraints are not merged.

Important: Complete replacement includes all field properties (generated, default, required, unique, deprecated, and constraints). If a child redefines a field and omits a property that existed on the parent, that property is lost. Implementations SHOULD warn if a child override removes a generated strategy from a parent field.

5.5 Strictness

The strict field controls how unknown fields are handled during validation:

Value	Behavior
`false`	Unknown fields are allowed without warning
`"warn"`	Unknown fields are allowed but trigger warnings
`true`	Unknown fields cause validation failure

Default: Inherits from settings.default_strict in the config (which defaults to false).

"Unknown fields" are fields in a file's frontmatter that are not defined in the type's schema (including inherited fields).

5.6 Path Patterns

The optional path_pattern defines expected relative paths for a type, not just filenames. It may include subdirectories (e.g., people/{slug}.md).

For backward compatibility, filename_pattern is accepted as a deprecated alias for path_pattern. If both are present, path_pattern takes precedence and implementations SHOULD emit a warning.

Patterns use {} to reference field values. Common placeholders:

{id}: The id field value
{slug}: A URL-safe slug (implementations should slugify automatically)
{date}: A date field formatted as YYYY-MM-DD

Use cases:

Validation: Check that existing paths match the pattern
Generation: When creating new files, derive path from the effective frontmatter (defaults + generated values)

Slugification rules:

Lowercase all characters
Replace spaces and special characters with hyphens
Remove consecutive hyphens
Trim hyphens from start and end
Unicode handling: Implementations MUST use Unicode-aware lowercasing (not locale-dependent). Non-ASCII letters SHOULD be transliterated to their ASCII equivalents where a well-known mapping exists (e.g., ü → u, ñ → n). Characters with no ASCII equivalent SHOULD be removed rather than replaced with hyphens.

Unresolvable variables:

If a {variable} refers to a field that has no value (null/undefined/empty string), path derivation MUST fail with path_required.
If a {variable} refers to a field that is not defined in the type schema, type loading SHOULD emit a warning and path derivation MUST fail with path_required.
If a {variable} refers to a computed field, the type definition MUST be rejected with invalid_type_definition (computed fields are evaluated at read time and cannot be used for path derivation).

Path safety: The resolved path MUST be a valid relative path and MUST NOT contain path traversal (..) or invalid characters. Invalid paths MUST fail with invalid_path.

Generated field restriction: If path_pattern references a generated field that depends on file.* metadata, implementations MUST reject the type definition with invalid_type_definition to avoid circular dependencies (see §7.15).

5.7 Type Loading Order

When loading types from the types folder:

Scan all .md files in the types folder (including subdirectories)
Parse each file's frontmatter
Build a dependency graph based on extends relationships
Detect and reject circular dependencies
Load types in dependency order (parents before children)
Merge inherited fields into each type's effective schema

5.8 Built-in vs User Types

This specification does not define built-in content types. All content types are user-defined via markdown files.

However, implementations MUST create a meta type during collection initialization (see §12.12) that describes type definition files themselves. This meta type:

Lives in the types folder
Matches files in the types folder
Enables validating type files as ordinary records

Meta type requirements:

# _types/meta.md (or in the configured types folder)
---
name: meta
description: Schema for type definition files

match:
  path_glob: "_types/**/*.md"

strict: false

fields:
  name:
    type: string
    required: true
  description:
    type: string
  display_name_key:
    type: string
  extends:
    type: string
  strict:
    type: enum
    values: ["true", "false", "warn"]
  match:
    type: object
  path_pattern:
    type: string
  filename_pattern:
    type: string
  fields:
    type: any
---

Implementations MUST adjust match.path_glob to the configured types folder if settings.types_folder is customized. The fields property uses type any because field definitions are polymorphic. The strict field is modeled as a string enum because the type system does not support union types; YAML booleans are coerced to strings via §7.16.

Implementations MAY also provide starter templates for common types (task, note, person, etc.) that users can copy into their types folder and customize.

5.9 Creating Type Files Programmatically

Implementations MUST provide a way to create type definition files. This is a normal write operation that:

Validates the type definition schema
Checks for name conflicts with existing types
Writes the file to the types folder
Reloads the types registry

Example CLI interaction:

# Create a new type interactively
mdbase type create

# Create with a template
mdbase type create --from-template task

# Scaffold from an existing file's frontmatter
mdbase type create --infer-from notes/example.md

5.10 Type Documentation (Body Content)

The body of a type file is documentation. It has no semantic effect on the schema but SHOULD explain:

The purpose of the type
How to use each field
Example files
Relationships with other types
Best practices

Implementations MAY render this documentation in tooling (e.g., showing field help, type browser).

5.11 Schema Evolution

When a type definition changes, existing files are NOT automatically migrated — files are the source of truth. The following rules define what happens for each kind of schema change:

Field added (optional): Existing files without the field remain valid. The field is undefined (not null) until explicitly set. If a default is specified in the type definition, it applies to the effective value at read/query/validation time.

Field added (required): Existing files without the field fail validation. Implementations MUST report missing_required errors. Users must add the field to affected files manually or via batch update.

Field removed: Existing files with the removed field are treated as having an unknown field. Behavior depends on the type's strict setting (see §5.5). No data is deleted from files.

Field type changed: Existing files with values of the old type fail validation with type_mismatch. No automatic coercion of persisted data is performed.

Field renamed: The specification does not track field renames — this is equivalent to removing one field and adding another. Implementations MAY provide a batch rename tool as a convenience.

Type renamed: Existing files with type: old_name fail type matching. Implementations MUST provide a batch update command to update type declarations across files.

Type removed: Existing files with an explicit type that no longer exists MUST report unknown_type during validation. Implementations SHOULD NOT delete or rewrite files automatically.

Inheritance changed: The effective schema is recomputed. Fields gained from a new parent apply the same rules as "field added." Fields lost apply the same rules as "field removed."

Materializing defaults: Defaults are persisted based on settings.write_defaults (see §4). If a default is written, it MUST equal the declared default at the time of the write.

Migration strategy: Validation is the baseline mechanism, but implementations MAY also support migration manifests and the backfill operation for structured, repeatable migrations (see §5.11.1 and §12.8).

5.11.1 Migration Manifests (Level 6)

Migration manifests provide a declarative, ordered list of schema and data changes. Level 6 implementations MUST support migration manifests and the migrate operation (§12.13). Lower-level implementations MAY ignore them.

Location:
By default, manifests live under the migrations folder:

settings.migrations_folder if set
otherwise <types_folder>/_migrations/ (e.g., _types/_migrations/)

Format: A migration manifest is a markdown file. Its frontmatter contains a steps array. The body is free-form documentation.

Execution: Migration manifests are executed via the migrate operation (§12.13).

---
id: "2026-02-03-migrate-tasks"
description: "Add status and backfill existing tasks"
steps:
  - id: "add-status-field"
    op: add_field
    type: task
    field: status
    optional: true
    default: "open"

  - id: "backfill-status"
    op: backfill
    type: task
    fields: [status]
    apply:
      defaults: true
      generated: false
    dry_run: false
---

Supported op values:

add_field (schema change, descriptive)
rename_field (schema change, descriptive)
change_type (schema change, descriptive)
rename_type (schema change, descriptive)
move_path (schema change, descriptive)
backfill (data change; see §12.8)

Semantics:

Schema-change steps (add_field, rename_field, change_type, rename_type, move_path) are descriptive. They document the intended change but do not mutate files by themselves. Implementations MAY provide tooling to automate these, but automation is not required for conformance.
backfill steps MUST execute the Backfill operation (§12.8) with the provided parameters.
Steps MUST be executed in order. A failure in one step MUST stop execution. Implementations MAY expose an override, but it is not required for conformance.
Migration manifests MUST NOT be treated as type definition files, even though they live under the types folder. The migrations folder is excluded from type loading and from record scans.

5.11.2 Schema Versioning (Optional)

Types MAY include a version integer in their frontmatter. Tools MAY also write a _schema_version field into records as part of migrations or backfills. These fields are informational and MUST NOT affect validation unless an implementation opts into a migration system.

Rules:

version MUST be a positive integer if present.
_schema_version, if present on a record, MUST be a positive integer.
Implementations MUST ignore _schema_version during validation unless explicitly documented otherwise.

5.12 Computed Fields

Type definitions MAY include fields with a computed property containing an expression:

fields:
  full_name:
    type: string
    computed: "first_name + ' ' + last_name"
  overdue:
    type: boolean
    computed: "due_date < today() && status != 'done'"

Rules

Computed fields are evaluated at read time and are NOT persisted to the file
They are available in queries, formulas, and expressions like any other field
Computed fields MUST NOT be required (they are always derived)
Computed fields MUST NOT have default or generated — these are mutually exclusive mechanisms
Computed fields evaluate against effective frontmatter (defaults applied, computed excluded)
If a file contains a frontmatter key matching a computed field name, the persisted value is ignored and the computed value takes precedence. Implementations SHOULD emit a warning
If a type definition has both computed and required: true on a field, implementations MUST reject the type definition with an invalid_type_definition error

Conformance

Computed fields are a Level 3 (Querying) capability. Implementations below Level 3 MUST still load type definitions containing computed fields without error, but MUST ignore the computed property and treat the field as a regular (non-computed) field. This ensures type definitions are portable across conformance levels.

5.13 Display Name Key

Types MAY define display_name_key, which indicates which field should be used as a human-readable label for records of that type (e.g., title, name, subject). The field SHOULD be defined in the type schema. If the field is missing or empty on a record, implementations MUST fall back to file.basename (the filename without the final extension).

Evaluation Order

Non-computed fields are resolved first, then computed fields in dependency order. Computed fields MAY reference other computed fields, which are resolved via dependency ordering.

Circular computed field dependencies MUST be detected and rejected with a circular_computed error.

Inheritance

Computed fields from parent types are inherited and MAY be overridden by child types.

Example

# _types/task.md
---
name: task
fields:
  first_name:
    type: string
  last_name:
    type: string
  full_name:
    type: string
    computed: "first_name + ' ' + last_name"
  due_date:
    type: date
  status:
    type: enum
    values: [open, in_progress, done]
  is_overdue:
    type: boolean
    computed: "due_date < today() && status != 'done'"
---

5.14 Complete Type File Example

---
name: meeting-note
description: Notes from a meeting
extends: base

strict: "warn"

match:
  path_glob: "meetings/**/*.md"
  fields_present: [date, attendees]

path_pattern: "{date}-{title}.md"

fields:
  title:
    type: string
    required: true
    description: Meeting title or topic
  
  date:
    type: date
    required: true
    description: Date the meeting occurred
  
  attendees:
    type: list
    items:
      type: link
      target: person
    min_items: 1
    description: People who attended
  
  agenda:
    type: list
    items:
      type: string
    default: []
    description: Planned discussion topics
  
  decisions:
    type: list
    items:
      type: object
      fields:
        topic:
          type: string
          required: true
        decision:
          type: string
          required: true
        owner:
          type: link
          target: person
    default: []
    description: Decisions made during the meeting
  
  action_items:
    type: list
    items:
      type: link
      target: task
    default: []
    description: Tasks created from this meeting
  
  next_meeting:
    type: date
    description: Scheduled follow-up date
---

# Meeting Note

Meeting notes capture discussions, decisions, and action items from team meetings.

## Required Fields

- **title**: A short, descriptive title (e.g., "Q1 Planning", "Design Review")
- **date**: When the meeting occurred (YYYY-MM-DD format)
- **attendees**: At least one person must be linked

## Decisions Format

Decisions are structured objects with:
- `topic`: What was being decided
- `decision`: The outcome
- `owner`: Who is responsible for follow-through

```yaml
decisions:
  - topic: API versioning strategy
    decision: Use URL path versioning (/v1/, /v2/)
    owner: "[[alice]]"

Linking to Tasks

Action items should be created as separate task files and linked:

action_items:
  - "[[tasks/update-api-docs]]"
  - "[[tasks/create-v2-endpoints]]"

Example

---
type: meeting-note
title: Sprint Planning
date: 2024-03-01
attendees:
  - "[[alice]]"
  - "[[bob]]"
  - "[[charlie]]"
agenda:
  - Review last sprint
  - Estimate new stories
  - Assign work
decisions:
  - topic: Sprint length
    decision: Keep 2-week sprints
    owner: "[[alice]]"
action_items:
  - "[[tasks/story-123]]"
next_meeting: 2024-03-15
---

## Discussion

Sprint velocity was 42 points last sprint...

6. Type Matching

This section defines how files are associated with types. Unlike traditional schemas where each record belongs to exactly one table, this specification supports multi-type matching: a file may match zero, one, or multiple types simultaneously.

6.1 Matching Overview

Type matching determines which types apply to a file. This happens:

When reading a file (to know which schemas to validate against)
When querying (to filter by type)
When updating (to apply type-specific validation)

A file's types are determined by:

Explicit declaration (highest precedence): If the file's frontmatter contains a type key (e.g., type: task), only those declared types apply
Match rules: If no explicit declaration, each type's match rules are evaluated; all matching types apply
Untyped: If nothing matches, the file is untyped

6.2 Explicit Type Declaration

Files can explicitly declare their type(s) using frontmatter keys defined in settings.explicit_type_keys (default: type and types). Type names SHOULD be treated case-insensitively when read from frontmatter and normalized to lowercase for matching; non-canonical casing SHOULD emit a warning.

If you want to use a field like type as ordinary data, remove it from settings.explicit_type_keys and use different keys (e.g., kind, kinds) for type declarations.

Custom keys replace the defaults. When settings.explicit_type_keys is non-empty, the default keys type and types have no special meaning and are treated as ordinary fields.

Single Type

---
type: task
title: Fix the bug
---

This file is a task and only task. Match rules are not evaluated.

Multiple Types

---
types: [task, urgent]
title: Fix critical security bug
---

This file is both a task and an urgent record. It must validate against both schemas.

Precedence

If both type and types are present, implementations SHOULD prefer types (the plural form).

---
type: task          # Ignored when types is present
types: [task, bug]  # This is used
---

6.3 Match Rules

Types can define rules for automatically associating files without explicit declaration. Match rules are specified in the type's match field:

# _types/task.md
---
name: task
match:
  path_glob: "tasks/**/*.md"
  fields_present: [status, due_date]
  where:
    status:
      exists: true
    priority:
      gte: 1
---

All conditions in match are combined with AND logic—all must be true for the type to match.

6.4 Match Conditions

`path_glob`

Matches files by their path relative to the collection root.

match:
  path_glob: "tasks/**/*.md"

Glob syntax:

* matches any characters except /
** matches any characters including /
? matches a single character

Examples:

Pattern	Matches
`tasks/*.md`	`tasks/foo.md`, not `tasks/sub/foo.md`
`tasks/*/.md`	Any `.md` in `tasks/` or subdirectories
`*.task.md`	`foo.task.md`, `bar.task.md`
`notes/2024-*.md`	`notes/2024-01.md`, `notes/2024-12.md`

`fields_present`

Matches files that have all specified fields present and non-null.

match:
  fields_present: [status, assignee]

A field is "present" for matching purposes if:

The key exists in frontmatter, AND
The value is not null

Note: This is stricter than the exists() expression function, which returns true even for null values. The fields_present match condition requires a meaningful (non-null) value.

`where`

Matches files based on field value conditions. This uses a subset of the expression language operators:

match:
  where:
    # Exact equality
    kind: "task"
    
    # Field exists and is non-null
    status:
      exists: true
    
    # Comparison operators
    priority:
      gte: 3
    
    # List contains
    tags:
      contains: "important"
    
    # String prefix
    title:
      startsWith: "URGENT:"

Available operators in where:

Operator	Description	Example
(direct value)	Exact equality	`status: open`
`exists`	Field is present and non-null (true) or missing/null (false)	`assignee: { exists: true }`
`eq`	Equal to	`priority: { eq: 3 }`
`neq`	Not equal to	`status: { neq: "done" }`
`gt`	Greater than	`priority: { gt: 2 }`
`gte`	Greater than or equal	`priority: { gte: 3 }`
`lt`	Less than	`priority: { lt: 4 }`
`lte`	Less than or equal	`priority: { lte: 5 }`
`contains`	List contains value	`tags: { contains: "bug" }`
`containsAll`	List contains all values	`tags: { containsAll: ["bug", "urgent"] }`
`containsAny`	List contains any value	`tags: { containsAny: ["bug", "feature"] }`
`startsWith`	String starts with	`title: { startsWith: "WIP:" }`
`endsWith`	String ends with	`file: { endsWith: ".draft.md" }`
`matches`	Regex match (see §4.8 for regex flavor)	`title: { matches: "^TASK-\\d+" }`

Note: where: { field: { exists: true } } uses the same semantics as fields_present: the key must exist AND the value must be non-null. This keeps the match rule system internally consistent — both mechanisms agree on what "present" means.

Missing/null behavior: For all where operators, if the field is missing or null, the condition evaluates to false (no match). This is a silent non-match, not an error.

Evaluation errors: If a where condition triggers an evaluation error (e.g., type mismatch on a method call), the condition evaluates to false and the type does not match. Implementations MUST NOT error or warn during match evaluation.

Match `where` vs Query `where`

The match rule where clause uses a YAML-structured form with operator keys:

# Match rule where (YAML-structured)
match:
  where:
    status:
      neq: "done"

The query where clause (see Querying §10.3) uses expression strings:

# Query where (expression string)
where: 'status != "done"'

These are two distinct syntaxes. Match rules use the structured form because they are evaluated during type matching (before the expression engine is available). Query where clauses use expression strings for greater flexibility.

Match rules are evaluated against effective frontmatter (defaults applied, computed excluded). Computed fields are not available during matching. Type definitions that reference computed fields in match.where MUST be rejected with invalid_type_definition.

6.5 Multi-Type Matching

When a file matches multiple types (whether by explicit declaration or match rules), the file must conform to all matched types.

Validation

The file is validated against each type's schema. All validations must pass:

# File: tasks/urgent-bug.md
---
types: [task, urgent]
title: Fix login
status: open
escalation_contact: alice@example.com
---

This file must:

Have all required fields from task
Satisfy all constraints from task
Have all required fields from urgent
Satisfy all constraints from urgent

Field Conflicts

When two types define the same field differently:

Compatible definitions occur when both types define the same field with the same base type. Constraints are merged by taking the most restrictive intersection:

Constraint	Merge Rule
`required`	`true` if EITHER type requires it
`min` / `min_length` / `min_items`	Take the higher minimum
`max` / `max_length` / `max_items`	Take the lower maximum
`pattern`	Value must match all patterns
`values` (enum)	Take the intersection of allowed values
`default`	If both define defaults, they MUST be equal; otherwise it is an error
`unique` (list)	`true` if EITHER type requires it
`unique` (cross-file)	`true` if EITHER type requires it

# Type A: priority as integer 1-5
# Type B: priority as integer 1-3
# Effective: priority as integer 1-3 (most restrictive)

Composite and advanced constraints:

Constraint	Merge Rule
`list.items`	Item schemas MUST be compatible and are merged recursively using these same rules
`object.fields`	Fields are merged by name; overlapping fields are merged recursively
`generated`	If both define `generated`, the values MUST be identical; otherwise error
`deprecated`	`true` if EITHER type marks the field deprecated
`link.target`	If both define `target`, they MUST be identical; otherwise error
`link.validate_exists`	`true` if EITHER type sets it to true

Incompatible definitions occur when:

The base types differ (e.g., string vs integer)
Enum intersections produce an empty set
Merged min exceeds merged max

# Type A: status as string
# Type B: status as enum [open, closed]
# Incompatible: different base types → validation error

When field types are incompatible, implementations MUST report a type_conflict error. The file cannot satisfy both schemas simultaneously.

Implementation Notes (Non-Normative)

Suggested merge strategy for multi-type validation:

Collect matched types first; do not attempt eager conflict resolution during type load.
Build a merged constraint map per field at validation time.
For each overlapping field, merge constraints by intersection rules in this section.
Fail early with type_conflict for base-type mismatch, empty enum intersection, or min > max.
Keep per-source metadata (type, field) so diagnostics can point to both conflicting definitions.

Querying

A multi-type file appears in queries for ANY of its matched types:

# Query for tasks
query:
  types: [task]
# Returns files that are tasks (including files that are also other types)

# Query for files that are BOTH task AND urgent
query:
  where:
    and:
      - 'types.contains("task")'
      - 'types.contains("urgent")'

6.6 Matching Evaluation Order

Check explicit declaration: If type or types is in frontmatter, use those types exclusively. Stop.
Evaluate match rules: For each type with match rules, evaluate all conditions:
- If all conditions pass, the type matches
- A type without match rules never matches implicitly
Collect matches: The file's types are all types that matched in step 2.
Untyped: If no types matched, the file is untyped.

6.7 Match Rule Examples

Path-Based Matching

# _types/task.md
match:
  path_glob: "tasks/**/*.md"

All files in tasks/ are tasks.

Field-Based Matching

# _types/actionable.md
match:
  fields_present: [due_date]

Any file with a due_date field is actionable.

Tag-Based Matching

# _types/urgent.md
match:
  where:
    tags:
      contains: "urgent"

Any file tagged "urgent" matches this type.

Combined Matching

# _types/active-task.md
match:
  path_glob: "tasks/**/*.md"
  fields_present: [status, assignee]
  where:
    status:
      neq: "done"

Files in tasks/ with status and assignee fields, where status is not "done".

6.8 Type-Only Files (No Matching)

A type without match rules will never automatically match files. Files must explicitly declare the type:

# _types/template.md
---
name: template
# No match rules
fields:
  template_name:
    type: string
    required: true
---

This type only applies to files that declare type: template or types: [template, ...].

6.9 The `types` Property in Expressions

In expressions, files have a types property (list of strings) representing their matched types:

# Filter: files that are tasks
filters: 'types.contains("task")'

# Filter: files that are both task and urgent
filters: 'types.contains("task") && types.contains("urgent")'

# Filter: files that have no type
filters: "types.length == 0"

6.10 Debugging Type Matching

Implementations SHOULD provide a way to see why a file matched (or didn't match) specific types. For example:

# Show matching analysis for a file
mdbase debug match tasks/fix-bug.md

# Output:
# tasks/fix-bug.md
# ├── Explicit types: none
# ├── Matched types: [task, urgent]
# │   ├── task: matched via path_glob "tasks/**/*.md"
# │   └── urgent: matched via where.tags.contains("urgent")
# └── Unmatched types:
#     └── done: failed where.status.eq("done")

7. Field Types and Constraints

This section defines the data types that can be used in type definitions, along with their constraints and validation rules.

7.1 Field Definition Structure

Every field in a type's fields section has this structure:

fields:
  field_name:
    # Required: the data type
    type: string
    
    # Optional: is this field required?
    required: false
    
    # Optional: default value if field is missing
    default: "untitled"
    
    # Optional: auto-generation strategy
    generated: now
    
    # Optional: human-readable description
    description: "A brief summary"
    
    # Optional: mark as deprecated
    deprecated: false
    
    # Type-specific constraints (see each type below)

7.2 Common Field Options

These options apply to all field types:

`type` (Required)

The data type. One of: string, integer, number, boolean, date, datetime, time, enum, list, object, link, any.

`required`

Whether the field must be present and non-null.

Value	Behavior
`false` (default)	Field may be missing or null
`true`	Field must be present and non-null

`default`

A default value applied to the effective value when the field is missing. The default is NOT applied when the field is present but null. Defaults are not required to be persisted unless the caller explicitly requests materialization (see §5.11).

status:
  type: enum
  values: [open, done]
  default: open  # Applied only if 'status' key is absent

`generated`

Automatic value generation. See 7.15 Generated Fields.

`description`

Human-readable description of the field's purpose. Implementations MAY display this in tooling.

`deprecated`

Mark a field as deprecated. Implementations SHOULD warn when deprecated fields are used.

`unique`

Cross-file uniqueness constraint.

Value	Behavior
`false` (default)	No uniqueness checking
`true`	Field value MUST be unique across all files matching the declaring type

Rules:

Null/undefined values are exempt from uniqueness checks — multiple files may omit the field
For multi-type files: uniqueness is checked within each type's file set independently
Validation of uniqueness requires scanning all files of the type. Implementations SHOULD use caching for performance
Error code: duplicate_value, reported with the field name, conflicting file paths, and the duplicate value

Note: settings.id_field implicitly has unique: true behavior (see §4.4). The unique option makes cross-file uniqueness available for any field.

Example:

fields:
  slug:
    type: string
    unique: true
  email:
    type: string
    unique: true

7.3 `string`

A text value.

title:
  type: string
  required: true
  min_length: 1
  max_length: 200
  pattern: "^[A-Z].*"

Constraints:

Constraint	Type	Description
`min_length`	integer	Minimum string length
`max_length`	integer	Maximum string length
`pattern`	string	Regex pattern the value must match

Validation:

Value must be a string (or coercible to string)
Length constraints apply to character count (not bytes)
Pattern uses ECMAScript (ES2018+) regular expression syntax as the required baseline (see §4.8 for the full regex specification)

7.4 `integer`

A whole number.

priority:
  type: integer
  min: 1
  max: 5
  default: 3

Constraints:

Constraint	Type	Description
`min`	integer	Minimum value (inclusive)
`max`	integer	Maximum value (inclusive)

Validation:

Value must be a whole number (no decimal part)
YAML integers, strings containing integers, and floats with no decimal part MAY be coerced
Implementations MUST support at least the signed 53-bit integer range (−9,007,199,254,740,991 to 9,007,199,254,740,991). Values outside this range MAY be rejected with constraint_violation.

7.5 `number`

A floating-point number.

rating:
  type: number
  min: 0.0
  max: 5.0

Constraints:

Constraint	Type	Description
`min`	number	Minimum value (inclusive)
`max`	number	Maximum value (inclusive)

Validation:

Value must be numeric (integer or float)
Implementations MUST support at least IEEE 754 double-precision numbers. Higher-precision support is optional.
Floating-point precision and rounding follow IEEE 754 double semantics.
IEEE 754 special values (NaN, Infinity) are allowed unless explicitly constrained
When min or max is defined, NaN MUST produce constraint_violation because NaN is not orderable — all comparisons with NaN return false per IEEE 754. Infinity values are orderable and compared normally.

7.6 `boolean`

A true/false value.

draft:
  type: boolean
  default: false

Validation:

Accepts YAML boolean values: true, false
Implementations SHOULD also accept YAML 1.1 boolean spellings: yes, no, on, off
These should be normalized to true/false on write

7.7 `date`

A calendar date without time.

due_date:
  type: date

Format: ISO 8601 date: YYYY-MM-DD

Examples: 2024-03-15, 2024-12-01

Validation:

Must be a valid date (no February 30th)
String format must match ISO 8601
Year range MUST be between 0001 and 9999 (inclusive)
YAML date scalars MAY be accepted and MUST be normalized to ISO 8601 on write

7.8 `datetime`

A date with time.

created_at:
  type: datetime

Format: ISO 8601 datetime with optional timezone:

YYYY-MM-DDTHH:MM:SS
YYYY-MM-DDTHH:MM:SSZ
YYYY-MM-DDTHH:MM:SS+HH:MM

Examples:

2024-03-15T10:30:00
2024-03-15T10:30:00Z
2024-03-15T10:30:00+05:30

Validation:

Must be valid datetime
Year range MUST be between 0001 and 9999 (inclusive)
Implementations MUST preserve timezone information if present
YAML timestamp scalars MAY be accepted and MUST be normalized to ISO 8601 on write

Timezone Comparison Rules

Datetime values with explicit offsets are compared as absolute instants (convert to a common epoch before comparison)
Datetime values WITHOUT offsets (naive) are treated as local time in the implementation's configured timezone
Comparing an offset-aware datetime with a naive datetime: the naive datetime is interpreted in local time, then both are compared as absolute instants
now() returns an offset-aware datetime in settings.timezone if configured; otherwise local timezone
today() returns a date in settings.timezone if configured; otherwise local timezone
Date arithmetic preserves offset: datetime_with_offset + "1d" keeps the same offset
Serialization MUST preserve the original offset if present. 2024-03-15T10:00:00+05:30 MUST NOT be normalized to UTC on write
Implementations MAY provide a configuration option for default timezone (not specified in this version — use the local system timezone)

See also §11.7 for date/time functions in expressions.

7.9 `time`

A time without date.

meeting_time:
  type: time

Format: HH:MM or HH:MM:SS

Examples: 14:30, 09:00:00

7.10 `enum`

A value from a fixed set of options.

status:
  type: enum
  values: [draft, review, published, archived]
  default: draft

Required constraint:

Constraint	Type	Description
`values`	list	The allowed values (must be strings)

Validation:

Value must exactly match one of the values entries
Comparison is case-sensitive
Enum values MUST be strings
The values list MUST be non-empty. Empty lists MUST be rejected with invalid_type_definition.

7.11 `list`

An ordered collection of values.

tags:
  type: list
  items:
    type: string
  min_items: 0
  max_items: 10
  unique: true

Required constraint:

Constraint	Type	Description
`items`	field definition	The type of each list element

Optional constraints:

Constraint	Type	Description
`min_items`	integer	Minimum list length
`max_items`	integer	Maximum list length
`unique`	boolean	If true, no duplicate values allowed

Validation:

Value must be a YAML list
Each element is validated against items
Type coercion rules (§7.16) apply to list elements in the same way they apply to top-level fields
If unique: true, duplicates cause validation failure

Nested lists:

matrix:
  type: list
  items:
    type: list
    items:
      type: number

7.12 `object`

A nested structure with its own fields.

author:
  type: object
  fields:
    name:
      type: string
      required: true
    email:
      type: string
    url:
      type: string

Required constraint:

Constraint	Type	Description
`fields`	mapping	Field definitions for the nested object

Validation:

Value must be a YAML mapping
Each field is validated according to its definition
Unknown fields are handled according to type's strictness

Implementations MAY enforce a maximum nesting depth for object/list schemas to prevent pathological definitions. Implementations MUST support at least 16 levels of nesting; deeper schemas MAY be rejected with invalid_type_definition.

7.13 `link`

A reference to another file in the collection.

parent_task:
  type: link
  target: task
  validate_exists: false

related:
  type: list
  items:
    type: link

Optional constraints:

Constraint	Type	Description
`target`	string	Type name to constrain resolution scope
`validate_exists`	boolean	If true, validate that target file exists

Accepted formats:

Wikilinks: [[target]], [[target|alias]], [[folder/target]]
Markdown links: [text](path.md), [text](./relative.md)
Bare paths: ./sibling.md, ../parent/file.md

See Links for detailed parsing and resolution rules.

7.14 `any`

Accepts any valid YAML value.

metadata:
  type: any

Use cases:

Migration: Temporarily accept untyped data
Flexible schemas: When structure varies
Extension points: Allow arbitrary user data

Validation:

Any valid YAML value is accepted
No constraints available

7.15 Generated Fields

Fields can be automatically populated using the generated option:

fields:
  id:
    type: string
    generated: ulid
  
  created_at:
    type: datetime
    generated: now
  
  updated_at:
    type: datetime
    generated: now_on_write
  
  slug:
    type: string
    generated:
      from: title
      transform: slugify

Generation strategies:

Strategy	Description
`ulid`	Generate a ULID (Universally Unique Lexicographically Sortable Identifier)
`uuid`	Generate a UUID v4
`{random: N}`	Generate a random lowercase alphanumeric string of length `N`
`sequence` / `{sequence: { ... }}`	Generate an auto-incrementing integer
`now`	Current datetime (on create only)
`now_on_write`	Current datetime (on every write)
`{from, transform}`	Derive from another field or file metadata

Transform functions for derived fields:

Transform	Description
`slugify`	Convert to URL-safe slug
`lowercase`	Convert to lowercase
`uppercase`	Convert to uppercase

Important rules:

Generated values are only applied when the field is missing
User-provided values are NEVER overwritten by now or ulid/uuid
now_on_write ALWAYS updates the field on every write operation
Generated fields can still have required: true (they'll satisfy the requirement via generation)
random MAY only be used on string fields; sequence MAY only be used on integer fields

Ordering and dependencies:

Generated fields are resolved in dependency order. A generated field MAY reference another generated field.
If a generated field depends on another generated field, the dependency MUST be resolved first.
Cycles in generated-field dependencies MUST be rejected with invalid_type_definition.

Random String Generation

Use {random: N} to generate a random string of length N using the alphabet a-z and 0-9. N MUST be an integer between 1 and 64 (inclusive). Implementations SHOULD use a cryptographically secure random generator when available.

If N is missing, non-integer, or out of range, implementations MUST reject the type definition with invalid_type_definition.

fields:
  short_id:
    type: string
    generated:
      random: 8

Sequence Generation

Use sequence to generate an auto-incrementing integer.

Defaults:

Start at 1
Scope is per-type (each type has its own counter)
No gap reuse (next value is max + 1)

fields:
  number:
    type: integer
    generated: sequence

Optional configuration:

fields:
  number:
    type: integer
    generated:
      sequence:
        start: 100
        scope: type   # or "collection"

Implementations MUST ensure sequence assignment is race-safe (no duplicates) within a collection.

If start is provided, it MUST be an integer. Invalid sequence configuration MUST be rejected with invalid_type_definition.

Derived Values From File Metadata

The {from, transform} strategy MAY reference file metadata using the same file.* names defined in §10.5:

file.name (filename with extension)
file.basename (filename without final extension)
file.ext
file.path
file.folder

On create, these values refer to the final resolved path. On update, they refer to the current file path. If a generated field depends on file.*, that field MUST NOT be referenced by path_pattern (to avoid circular dependencies). Implementations MUST reject such type definitions with invalid_type_definition.

7.16 Type Coercion

When reading values, implementations MUST attempt to coerce compatible types:

Schema Type	Accepts
`string`	Any scalar (converted via toString)
`integer`	Integer, float with no decimal, numeric string
`number`	Integer, float, numeric string
`boolean`	Boolean, "true"/"false" strings, yes/no
`date`	ISO date string, YAML date
`datetime`	ISO datetime string, YAML timestamp

When coercion fails, it is a validation error.

7.17 Summary Table

Type	YAML	Constraints	Notes
`string`	String	`min_length`, `max_length`, `pattern`
`integer`	Integer	`min`, `max`	Whole numbers only
`number`	Float/Int	`min`, `max`	Allows decimals
`boolean`	Boolean	—	Normalized to true/false
`date`	String	—	ISO 8601 date
`datetime`	String	—	ISO 8601 datetime
`time`	String	—	HH:MM or HH:MM:SS
`enum`	String	`values` (required)	Must match exactly
`list`	List	`items` (required), `min_items`, `max_items`, `unique`
`object`	Mapping	`fields` (required)	Nested structure
`link`	String	`target`, `validate_exists`	Reference to file
`any`	Any	—	No validation

8. Links

Links are references from one file to another. They are a first-class concept in this specification due to their prevalence in markdown-based knowledge systems. This section defines link syntax, parsing, resolution, and traversal.

8.1 Why Links Matter

Links transform a folder of files into a knowledge graph. They enable:

Navigation: Jump between related documents
Backlinks: See what references a document
Queries: Find documents by their relationships
Validation: Ensure references point to real files
Refactoring: Rename files without breaking connections

This specification treats links as typed data with well-defined parsing and resolution semantics.

8.2 Link Formats

The link field type accepts three input formats:

8.2.1 Wikilinks

The format popularized by wikis and knowledge management tools:

[[target]]
[[target|alias]]
[[target#anchor]]
[[target#anchor|alias]]
[[folder/target]]
[[./relative]]
[[../parent/target]]

Components:

target: The file being linked to (without extension by default)
alias: Display text (does not affect resolution)
anchor: A heading or block reference within the target
path: May be absolute (from collection root) or relative (from current file)

Examples:

# Simple link
parent: "[[task-001]]"

# Link with alias (alias is metadata, not used for resolution)
assignee: "[[alice|Alice Smith]]"

# Link to specific section
reference: "[[api-docs#authentication]]"

# Relative link
related: "[[./sibling-task]]"

# Path from root
spec: "[[docs/specs/api-v2]]"

8.2.2 Markdown Links

Standard markdown link syntax:

[text](path.md)
[text](./relative.md)
[text](../other/file.md)
[text](path.md#anchor)

The text portion is treated as an alias.

Examples:

parent: "[Parent Task](./tasks/parent.md)"
reference: "[API Docs](docs/api.md#auth)"

8.2.3 Bare Paths

A path without link syntax:

./sibling.md
../other/file.md
folder/file.md

Examples:

config: "./config.md"
parent: "../parent-project/overview.md"

Bare paths follow the same resolution rules as markdown links: they are relative to the containing file's directory unless they start with / (root-relative).

8.3 Link Parsing

When a link value is read, implementations MUST parse it into a structured representation:

Component	Type	Description
`raw`	string	Original string value exactly as written
`target`	string	File path or identifier (without anchor/alias)
`alias`	string?	Display text if provided, otherwise null
`anchor`	string?	Heading/block reference if provided, otherwise null
`format`	enum	One of: `wikilink`, `markdown`, `path`
`is_relative`	boolean	Whether path starts with `./` or `../`

Parsing examples:

Input	target	alias	anchor	format	is_relative
`[[task-001]]`	`task-001`	null	null	wikilink	false
`[[task-001\|My Task]]`	`task-001`	`My Task`	null	wikilink	false
`[[docs/api#auth]]`	`docs/api`	null	`auth`	wikilink	false
`[[./sibling]]`	`./sibling`	null	null	wikilink	true
`[Link](file.md)`	`file.md`	`Link`	null	markdown	false
`./other.md`	`./other.md`	null	null	path	true

8.4 Link Resolution

Resolution transforms a parsed link into an absolute path (relative to collection root) pointing to the target file.

Resolution Algorithm

Given a link value and the path of the file containing it:

Parse the link into components (target, format, is_relative)
If format is markdown or path:
- If target starts with /, resolve from collection root (strip the leading /)
- Otherwise, resolve relative to the containing file's directory (markdown-standard behavior)
- Example: Link [Docs](docs/api.md) in notes/meeting.md resolves to notes/docs/api.md
If format is wikilink:
- If target starts with ./ or ../, resolve relative to the containing file's directory
- If target starts with /, resolve from collection root (strip the leading /)
- If target contains / (and is not relative), resolve from collection root
- Example: [[docs/api]] resolves to docs/api
If simple name (no /, no ./ or ../, and format is wikilink):
- Define the search scope:
  - If the link field has target constraint specifying a type, scope to files matching that type
  - Otherwise, scope to the entire collection
- ID match pass: Search scoped files for id_field == name
  - If exactly one match, resolve to it
  - If multiple matches, resolution MUST fail with ambiguous_link
- Filename match pass: If no id_field match exists, search scoped markdown files (records) by filename
- If multiple filename candidates match, apply tiebreakers in order: a. Same directory: Prefer a file in the same directory as the referring file b. Shortest path: Prefer the file with the shortest path (closest to collection root) c. Alphabetical: Sort candidate paths lexicographically and take the first
- If multiple candidates remain after all tiebreakers, resolve to null and emit an ambiguous_link warning
Note: Non-markdown files are not records and are not candidates for simple name matching (per §2.9). A wikilink [[diagram]] will not match diagram.png — only markdown files are searched by id_field and filename. Non-markdown files are only resolved when explicitly referenced by path (with extension or relative path).
Extension handling:
- If target lacks extension, try configured extensions in order (default: .md)
- Example: [[readme]] tries readme.md, readme.mdx, etc.
Path traversal check:
- After resolution and normalization, if the resolved path would escape the collection root, abort with path_traversal
Return:
- The absolute path (relative to collection root) if found
- null if no matching file exists

Resolution Examples

Given collection structure:

/
├── mdbase.yaml
├── tasks/
│   ├── task-001.md
│   └── subtasks/
│       └── task-002.md
├── notes/
│   └── meeting.md
├── people/
│   └── alice.md
└── journal/
    └── 2024/
        └── 01/
            └── 15.md

Link resolution from tasks/subtasks/task-002.md:

Link Value	Resolved Path	Notes
`[[task-001]]`	`tasks/task-001.md`	Search by name
`[[../task-001]]`	`tasks/task-001.md`	Relative path
`[[./task-003]]`	`tasks/subtasks/task-003.md`	Relative (may not exist)
`[[notes/meeting]]`	`notes/meeting.md`	Absolute from root
`[[meeting]]`	`notes/meeting.md`	Search by name
`[[alice]]`	`people/alice.md`	Search by name
`[link](../task-001.md)`	`tasks/task-001.md`	Markdown, relative
`../task-001.md`	`tasks/task-001.md`	Bare path, relative

8.5 Link Schema Options

When defining a link field in a type:

fields:
  parent:
    type: link
    target: task           # Constrain resolution to 'task' type
    validate_exists: true  # Fail if target doesn't exist
    description: "Parent task this subtask belongs to"
  
  related:
    type: list
    items:
      type: link          # List of links (no constraints)

`target` Constraint

Limits resolution scope to files of a specific type:

assignee:
  type: link
  target: person

When resolving [[alice]] for this field:

Implementation searches only files that match the person type
Matches by the configured id_field (default: id)
If no person type file has id: alice, resolution fails

`validate_exists` Constraint

When true, unresolved links cause validation errors:

parent:
  type: link
  validate_exists: true

Default is false (links can point to non-existent files).

8.6 Link and Tag Extraction (for `file.*` properties)

To support file.links, file.backlinks, file.embeds, and file.tags, implementations MUST extract links and tags from both frontmatter and body content using these rules:

Link Extraction

Included:

Frontmatter fields of type link and list of link
Body links in wikilink form ([[target]], [[target|alias]], [[target#anchor]])
Body links in markdown form ([text](path.md)), including #anchor
Embeds in wikilink form (![[target]]) and markdown form (![alt](path.md))

Excluded:

Links inside fenced code blocks (``` or ~~~ delimiters)
Links inside indented code blocks (4+ spaces or tab-indented lines per CommonMark §4.4)
Links inside inline code spans (` delimiters)
Escaped wikilinks: a backslash immediately before [[ (i.e., \[[) prevents wikilink parsing. The backslash is consumed and the brackets are treated as literal text. This does not apply to markdown links, which follow standard CommonMark escaping rules.

file.links returns all non-embed links; file.embeds returns only embeds.

Tag Extraction

file.tags includes:

Raw persisted frontmatter tags field if present (string or list of strings)
Inline tags in body content of the form #tag

Inline tags MUST:

Be preceded by whitespace or appear at the start of a line
Match the pattern [A-Za-z0-9_/-]+ after # (forward slashes create nested tag hierarchies)
Be outside fenced code blocks, indented code blocks, and inline code spans
Not be preceded by ]( or ](http patterns (to exclude URL fragments)

Implementations SHOULD ignore # fragments in URLs. A simple heuristic is to skip any # that is preceded by ), ", ', or appears within a markdown link target ([text](url#fragment)).

Nested Tags

Tags MAY contain forward slashes (/) to create hierarchies: #inbox/to-read, #project/alpha/urgent.

The file.hasTag() function performs prefix matching on nested tags: file.hasTag("inbox") matches #inbox, #inbox/to-read, and #inbox/processing. This is consistent with Obsidian's tag behavior.

Nesting has no depth limit. The / character is purely conventional — implementations do not need to build a tree structure.

8.7 Link Traversal

Links can be traversed to access properties of the linked file using the asFile() method.

The `asFile()` Method

In expressions, link.asFile() resolves a link to its target file object:

# Filter: tasks assigned to someone on the engineering team
filters: 'assignee.asFile().team == "engineering"'

# Formula: get the parent task's status
formulas:
  parent_status: "parent.asFile().status"

If the link cannot be resolved, asFile() returns null. Subsequent property access on null returns null (no error).

Self-referential links (a file linking to itself) are allowed. asFile() resolves to the same file, and the traversal depth limit prevents infinite loops when chained.

Multi-Hop Traversal

asFile() MAY be chained to traverse multiple links:

assignee.asFile().manager.asFile().name
parent.asFile().project.asFile().status

Each asFile() call resolves the link field on the current file and returns the target file object.

Null propagation: If any hop returns null, the entire chain evaluates to null (no error).

Depth limit: Implementations MUST enforce a maximum traversal depth (default: 10 hops). Exceeding this limit MUST produce an expression_depth_exceeded error. Circular traversal (A → B → A) does not cause infinite loops because the depth limit applies.

Accessing Linked File Properties

Once resolved, you can access:

Frontmatter fields:

parent.asFile().status
parent.asFile().priority
assignee.asFile().name

File metadata:

parent.asFile().file.name
parent.asFile().file.mtime
parent.asFile().file.path

Performance Considerations

Each hop requires loading and parsing the target file. Implementations SHOULD:

Cache resolved files during query execution
Document performance characteristics for multi-hop queries
Consider lazy resolution (only resolve when accessed)
Warn users about expensive traversals in large collections

8.8 Link Functions

The following functions operate on links and files in expressions:

Function	Description	Example
`link.asFile()`	Resolve link to file object	`assignee.asFile().name`
`file.hasLink(target)`	File contains link to target	`file.hasLink(link("tasks/main"))`
`file.links`	List of outgoing links	`file.links.length > 5`
`file.backlinks`	List of incoming links (requires index)	`file.backlinks.length`
`link(path)`	Construct a link from path	`link("people/alice")`

Backlinks

file.backlinks returns files that link or embed the current file (frontmatter link fields, body wikilinks/markdown links, and embeds). This requires either:

A full scan of all files (slow without cache)
A pre-computed reverse index (requires cache)

Implementations SHOULD document whether file.backlinks requires caching for reasonable performance.

Example: Find files linking to current file

filters: "file.hasLink(this.file)"

8.9 Link Storage and Round-Tripping

When writing links to frontmatter, implementations SHOULD preserve the original format when possible:

If user wrote [[note]], prefer outputting [[note]] over ./note.md
If user wrote a relative path, preserve relativity when possible
If user wrote with an alias, preserve the alias

This preserves user intent and keeps files human-readable.

8.10 Links in Body Content

While this specification focuses on frontmatter, links also appear in markdown body content. Implementations SHOULD support:

Parsing links from body content
Updating body links during rename operations
Including body links in file.links

Implementations that do NOT support body link parsing MUST document this limitation. See Operations for rename behavior.

8.11 Broken Links

A "broken link" is a link that cannot be resolved to an existing file. Handling options:

Scenario	Behavior
`validate_exists: false` (default)	Broken links are allowed; `asFile()` returns `null`
`validate_exists: true`	Broken links cause validation errors
Rename operations	Implementations SHOULD update links to maintain validity
Delete operations	Implementations MAY warn about incoming links that will break

8.12 Link Examples

Simple Task Hierarchy

# tasks/parent.md
---
type: task
id: parent
title: Main Feature
subtasks:
  - "[[child-1]]"
  - "[[child-2]]"
---

# tasks/child-1.md
---
type: task
id: child-1
title: Subtask One
parent: "[[parent]]"
---

Cross-Type References

# tasks/implement-api.md
---
type: task
assignee: "[[alice]]"
spec: "[[docs/api-spec]]"
related:
  - "[[tasks/write-tests]]"
  - "[[tasks/update-docs]]"
---

Relative Links

# projects/alpha/tasks/task-001.md
---
type: task
parent: "[[../overview]]"           # projects/alpha/overview.md
sibling: "[[./task-002]]"           # projects/alpha/tasks/task-002.md
global_reference: "[[people/bob]]"  # people/bob.md (from root)
---

8.13 Path Sandboxing

Link resolution MUST NOT resolve to paths outside the collection root.

Rules

After resolving relative paths (applying ../ segments), the resulting absolute path MUST be within the collection root directory
If resolution would escape the collection root, the link MUST resolve to null and implementations MUST emit a path_traversal error
This applies to all link formats: wikilinks, markdown links, and bare paths
Implementations MUST normalize paths (resolve . and .. segments) before checking containment

Example

In a collection rooted at /home/user/notes/:

Link	From File	Result
`[[../../../etc/passwd]]`	`notes/daily.md`	`null` + `path_traversal` error
`[[../../secrets/key]]`	`deep/nested/file.md`	`null` + `path_traversal` error
`[[../sibling]]`	`tasks/task-001.md`	Resolves normally (stays within root)

9. Validation

Validation ensures that files conform to their type schemas. This section defines what is validated, when validation occurs, and how errors are reported.

9.1 Validation Levels

Implementations MUST support three validation levels:

Level	Behavior
`off`	No validation performed
`warn`	Validation runs; issues are reported but operations succeed
`error`	Validation runs; issues cause operations to fail

The default level is configured via settings.default_validation (default: "warn").

At warn level, operations MUST return success and include issues as warnings. For operation outputs that include a valid flag, valid MUST be true even when warnings are present.

Operations MAY override the default level:

# Force error-level validation
mdbase validate --level error

# Create with no validation
mdbase create --no-validate

9.2 What Is Validated

For each typed file, validation checks the following:

9.2.1 Required Fields

Fields marked required: true MUST be:

Present in the effective frontmatter (defaults applied; computed fields excluded)
Non-null (value is not null)

Note: exists(field) checks for a present key in raw persisted frontmatter even if its value is null. Required fields must be present in the effective frontmatter and non-null.

# Type definition
fields:
  title:
    type: string
    required: true

# Valid
title: "My Document"

# Invalid: missing
# (no title key)

# Invalid: null
title: null
title:

9.2.2 Type Correctness

Values MUST match their declared type (or be coercible):

# Type definition
fields:
  priority:
    type: integer

# Valid
priority: 5
priority: "5"  # Coerced to integer

# Invalid
priority: "high"
priority: 5.5

9.2.3 Field Constraints

Type-specific constraints MUST be satisfied:

Type	Constraints
`string`	`min_length`, `max_length`, `pattern`
`integer`, `number`	`min`, `max`
`list`	`min_items`, `max_items`, `unique`
`enum`	`values`
`link`	`validate_exists`

9.2.4 Unknown Fields (Strictness)

When a type has strict: true, unknown fields cause validation failure:

# Type definition: strict: true
fields:
  title:
    type: string

# Valid
title: "Doc"

# Invalid: unknown field
title: "Doc"
extra_field: "not allowed"

With strict: "warn", unknown fields trigger warnings but pass validation.

Implicit fields: The following frontmatter keys are always implicitly allowed, even in strict mode:

type / types — type declaration keys (configurable via settings.explicit_type_keys)
Any keys listed in settings.explicit_type_keys

These keys are structural and do not need to be declared in the type's fields definition.

9.2.5 Multi-Type Validation

When a file matches multiple types, it MUST validate against ALL of them:

# File matches both 'task' and 'urgent' types
# Must satisfy:
# - All required fields from 'task'
# - All constraints from 'task'
# - All required fields from 'urgent'
# - All constraints from 'urgent'

9.2.6 Link Existence

For link fields with validate_exists: true, the target file MUST exist:

# Type definition
fields:
  parent:
    type: link
    validate_exists: true

# Valid (if file exists)
parent: "[[existing-task]]"

# Invalid (file doesn't exist)
parent: "[[nonexistent]]"

9.2.7 Path Patterns

If a type defines path_pattern (or the filename_pattern alias), paths MAY be validated:

# Type definition
path_pattern: "{id}.md"

# File: task-001.md with id: "task-001" → valid
# File: random-name.md with id: "task-001" → warning (mismatch)

Path pattern validation is RECOMMENDED but not strictly required.

9.2.8 Unique ID Field

If settings.id_field is configured (default: id), values of that field MUST be unique across the collection. If duplicates exist, validation MUST emit a duplicate_id issue for each file that shares the duplicated value.

9.3 Validation Issue Format

Each validation issue MUST include:

Field	Type	Description
`path`	string	File path relative to collection root
`field`	string	Field path (e.g., `author.email`, `tags[0]`)
`code`	string	Error code (see Appendix C)
`message`	string	Human-readable error description
`severity`	enum	`error` or `warning`

Optional fields:

Field	Type	Description
`expected`	any	Expected value or type
`actual`	any	Actual value found
`type`	string	Type name that triggered the issue
`line`	integer	1-based line number in the source file
`column`	integer	1-based column number
`end_line`	integer	End line of the issue range
`end_column`	integer	End column of the issue range

Implementations SHOULD include line and column fields when source position information is available. These fields enable LSP-style diagnostics and precise issue reporting in CI tooling.

Example Issue

{
  "path": "tasks/fix-bug.md",
  "field": "priority",
  "code": "constraint_violation",
  "message": "Value 7 exceeds maximum of 5",
  "severity": "error",
  "expected": { "max": 5 },
  "actual": 7,
  "type": "task",
  "line": 5,
  "column": 11,
  "end_line": 5,
  "end_column": 12
}

9.4 Validation Timing

Implementations MAY validate at different times:

When	Description
On read	Validate when loading a file
On write	Validate before creating or updating
On demand	Validate via explicit command
Continuous	Watch mode; validate on file changes

The specification does not mandate when validation occurs, only the behavior when it does.

Recommended Behavior

Create/Update operations: Validate before writing; fail if validation: error
Read/Query operations: Optionally validate; report issues but don't fail
Explicit validate command: Full collection validation with detailed report

9.5 Validation Commands

Implementations SHOULD provide explicit validation commands:

# Validate entire collection
mdbase validate

# Validate specific files
mdbase validate tasks/fix-bug.md notes/meeting.md

# Validate files of a specific type
mdbase validate --type task

# Validate with specific level
mdbase validate --level error

# Output validation report as JSON
mdbase validate --format json

9.6 Partial Validation

For large collections, implementations MAY support partial validation:

Validate only modified files (since last validation)
Validate only files in specific folders
Validate only files matching certain types

This is an optimization; full validation MUST remain available.

9.7 Validation Report Example

Human-readable format:

Validation Report
================

Errors: 3
Warnings: 5

tasks/fix-bug.md
  ERROR [missing_required] Field 'title' is required but missing
  ERROR [type_mismatch] Field 'priority': expected integer, got string "high"
  WARNING [unknown_field] Field 'custom' is not defined in type 'task'

notes/meeting.md
  ERROR [constraint_violation] Field 'attendees': minimum 1 item required, got 0
  WARNING [deprecated_field] Field 'old_field' is deprecated

tasks/subtask.md
  WARNING [link_not_found] Field 'parent': target '[[nonexistent]]' not found

JSON format:

{
  "summary": {
    "files_checked": 42,
    "files_valid": 39,
    "files_invalid": 3,
    "errors": 3,
    "warnings": 5
  },
  "issues": [
    {
      "path": "tasks/fix-bug.md",
      "field": "title",
      "code": "missing_required",
      "message": "Field 'title' is required but missing",
      "severity": "error",
      "type": "task"
    }
  ]
}

9.8 Auto-Fix (Optional)

Implementations MAY support automatic fixing of certain issues:

Issue	Auto-Fix
Missing field with default	Apply default value
Type coercion possible	Coerce value
Missing generated field	Generate value

Auto-fix MUST NOT:

Delete user data
Make changes that could lose information
Fix issues where the correct resolution is ambiguous

# Preview fixes
mdbase validate --fix --dry-run

# Apply fixes
mdbase validate --fix

9.9 Validation in Multi-Type Context

When a file matches multiple types, validation follows these rules:

All types validated: The file must pass validation for ALL matched types
Issues attributed: Each issue includes which type triggered it
Conflict detection: If types have incompatible field definitions, report as error

Example conflict:

# Type 'a' defines: status as string
# Type 'b' defines: status as enum [open, closed]

# File matches both types
# File has: status: "pending"

# Result:
# - Passes type 'a' validation (valid string)
# - Fails type 'b' validation ("pending" not in enum)
# - Overall: FAIL (must pass all types)

9.10 Skipping Validation

Certain scenarios may warrant skipping validation:

Migration: Importing data that doesn't yet conform
Bulk operations: Performance-critical batch updates
Emergency fixes: Bypassing validation to fix broken state

Implementations SHOULD support:

# Skip validation on create
mdbase create --no-validate task.md

# Skip validation on update
mdbase update --no-validate task.md

Skipping validation SHOULD be logged for audit purposes.

10. Query Model

Queries retrieve files from the collection based on filters, with support for sorting, pagination, and computed fields. This section defines the query structure and semantics.

10.1 Query Overview

A query is a request to retrieve files matching certain criteria. Queries can:

Filter by type
Filter by frontmatter field values
Filter by file metadata
Filter by path patterns
Sort results
Paginate results
Compute derived fields (formulas)

Queries operate on the collection as a flat list of files. The result is a list of file records matching the criteria.

10.2 Core Query Structure

A query is expressed as a YAML object with optional clauses:

query:
  # Filter by type(s) - optional
  types: [task]
  
  # Filter by folder prefix - optional
  folder: "projects/alpha"
  
  # Filter expressions - optional
  where:
    and:
      - 'status != "done"'
      - "priority >= 3"
  
  # Sorting - optional
  order_by:
    - field: due_date
      direction: asc
    - field: priority
      direction: desc

  # Pagination - optional
  limit: 20
  offset: 0

Core Query checklist: types, folder, where, order_by, limit, offset, include_body

Core vs Query+ Summary

Clause	Core	Query+
`types`	✅	—
`folder`	✅	—
`where`	✅	—
`order_by`	✅	—
`limit` / `offset`	✅	—
`include_body`	✅	—
`formulas`	—	✅
`groupBy`	—	✅
`summaries`	—	✅
`property_summaries`	—	✅
`properties`	—	✅

10.3 Core Query Clauses

`types`

Filter to files matching specified type(s):

# Single type
types: [task]

# Multiple types (OR)
types: [task, note]

Files must match at least one of the listed types.

`folder`

Filter to files within a folder (and subfolders):

folder: "projects/alpha"

Matches files with paths starting with projects/alpha/.

`where`

Filter by expression conditions. Can be:

A single expression string:

where: 'status == "open"'

A logical combination:

where:
  and:
    - 'status == "open"'
    - "priority >= 3"
    - or:
        - 'tags.contains("urgent")'
        - "due_date < today()"
    - not: "draft == true"

Shape rules (Obsidian Bases-compatible):

A where value MAY be a string expression.
A where value MAY be a logical object with one of the keys and, or, not.
and/or values are lists of conditions (each condition is either a string expression or another logical object).
not value is a single condition (string expression or logical object).

See Expressions for the full expression language.

`order_by`

Sort results by one or more fields:

order_by:
  - field: due_date
    direction: asc      # ascending (oldest first)
  - field: priority
    direction: desc     # descending (highest first)

Direction values:

asc: Ascending (A-Z, 1-9, oldest-newest)
desc: Descending (Z-A, 9-1, newest-oldest)

Null handling: Null values sort LAST in ascending order and FIRST in descending order. Non-scalar fields: If order_by references a field whose value is a list, object, or other non-scalar type, implementations MUST sort by a deterministic representation: lists by length, objects by key count. Implementations SHOULD emit a warning when a non-scalar sort key is used, since it is often unintentional. This avoids undefined behavior from comparing incomparable types. Tie-breakers: If all order_by fields compare equal, implementations MUST apply a stable tie-breaker by ascending file.path to ensure deterministic output.

Formula sorting:

order_by:
  - field: formula.urgency_score
    direction: desc

String Collation

Default string ordering uses Unicode code point order (lexicographic comparison of Unicode scalar values):

Comparison is case-sensitive by default: uppercase letters sort before lowercase ("A" < "a")
Null values sort LAST in ascending order and FIRST in descending order
Implementations MAY support locale-aware collation as an ext-prefixed extension
For enum fields, sort order follows the values list declaration order, not string order

`limit` and `offset`

Paginate results:

limit: 20    # Return at most 20 results
offset: 40   # Skip the first 40 results

Together these enable pagination: page 3 of 20 items = offset: 40, limit: 20.

10.4 Logical Operators in `where`

The where clause supports nested logical operators:

Operator	YAML Key	Description
AND	`and:`	All conditions must be true
OR	`or:`	At least one condition must be true
NOT	`not:`	Condition must be false

Examples:

# AND: all must match
where:
  and:
    - 'status == "open"'
    - "priority >= 3"

# OR: any must match
where:
  or:
    - 'status == "blocked"'
    - "due_date < today()"

# NOT: must not match
where:
  not: 'status == "done"'

# Nested logic
where:
  and:
    - 'status != "done"'
    - or:
        - "priority >= 4"
        - 'tags.contains("urgent")'

Alternatively, use expression operators directly:

where: 'status != "done" && (priority >= 4 || tags.contains("urgent"))'

10.5 Property Namespaces

In query expressions, properties are accessed through namespaces:

Namespace	Description	Example
(bare)	Effective frontmatter property (defaults applied, computed excluded)	`status`, `priority`
`note.`	Raw persisted frontmatter (for reserved names)	`note.type`, `note["my-field"]`
`file.`	File metadata	`file.name`, `file.mtime`
`formula.`	Computed fields	`formula.overdue`
`this`	Context file (for embedded queries)	`this.file.name`

File Properties

Property	Type	Description
`file.name`	string	Filename with extension (e.g., `"task-001.md"`)
`file.basename`	string	Filename without final extension (e.g., `"task-001"`; for `"file.draft.md"` this is `"file.draft"`)
`file.path`	string	Full path from collection root
`file.folder`	string	Parent folder path
`file.ext`	string	File extension without dot (e.g., `"md"`)
`file.size`	number	File size in bytes
`file.ctime`	datetime	Created time
`file.mtime`	datetime	Modified time
`file.display_name`	string	Human-readable display name (from type `display_name_key`, or `file.basename` fallback)
`file.links`	list	Outgoing links (including links to non-markdown files)
`file.backlinks`	list	Incoming links (requires index); MAY be null/empty if backlinks are unsupported
`file.tags`	list	All tags (raw frontmatter `tags` + inline `#tags`, including nested)
`file.properties`	object	Raw persisted frontmatter properties only (no computed fields, no applied defaults). This is equivalent to `note.`
`file.embeds`	list	All embed links in the file body

Bare properties (no namespace) are evaluated against effective frontmatter, which includes defaults and type coercion per the matched schema. The note./file.properties namespaces expose raw persisted values without coercion.

file.display_name is derived from the matched type's display_name_key (see §5.13). If multiple matched types define display_name_key, implementations SHOULD use the first type in the types list. If the key is missing or empty on a record, implementations MUST fall back to file.basename.

Body Content Properties

The file.body property provides access to the raw markdown body content (everything after the frontmatter closing ---):

# Find files that mention a keyword in their body
query:
  where: 'file.body.contains("TODO")'

# Case-insensitive body search
query:
  where: 'file.body.lower().contains("important")'

# Regex body search
query:
  where: 'file.body.matches("\\bAPI\\b")'

Rules:

file.body is a string and supports all string methods from §11.5: .contains(), .matches(), .lower(), .startsWith(), etc.
Body search operates on raw markdown text including syntax characters
Content inside fenced code blocks IS included in file.body (it is the raw text)
Implementations SHOULD support file.body in filters without requiring include_body: true in the query — the body is used for filtering, not necessarily returned in results
Performance note: Body search without caching requires reading every file. Implementations SHOULD use full-text indexes when available
Note: file.body includes content inside code blocks, but file.links and file.tags exclude links and tags inside code blocks (see §8). This means file.body.contains("[[foo]]") may match a link that does not appear in file.links

The `this` Context

In embedded queries (queries within a file), this refers to the containing file:

# Find files linking to current file
where: "file.hasLink(this.file)"

# Find tasks assigned to current file's author
where: "assignee == this.author"

10.6 Result Structure

Query results return file objects with this structure. The frontmatter field is the effective frontmatter (defaults applied, computed fields excluded). Raw persisted values are available via file.properties/note..

- path: "tasks/fix-bug.md"
  types: [task, urgent]
  frontmatter:  # Effective frontmatter (defaults applied, computed excluded)
    id: "task-001"
    title: "Fix the login bug"
    status: open
    priority: 4
    tags: [bug, auth]
  formulas:
    overdue: true
    days_until_due: -3
  file:
    name: "fix-bug.md"
    folder: "tasks"
    mtime: "2024-03-15T10:30:00Z"
    size: 1234
  body: "..."  # Optional, if requested

Result Envelope

Query results MUST include metadata alongside the result list:

results:
  - path: "tasks/fix-bug.md"
    types: [task, urgent]
    frontmatter:
      id: "task-001"
      title: "Fix the login bug"
      # ...
meta:
  total_count: 142    # Total matching records (before limit/offset)
  limit: 20
  offset: 0
  has_more: true      # Whether more results exist beyond this page

Fields:

total_count: The total number of records matching the query filters, ignoring limit and offset. Implementations MUST compute this accurately
has_more: true if offset + length(results) < total_count
When no limit is specified, has_more is false and total_count equals the result count

Including Body Content

By default, body content is not included in results. To include it:

query:
  include_body: true

This increases memory usage for large result sets.

10.7 Query+ (Optional Advanced Features)

The following clauses are OPTIONAL and are part of the Query+ profile. Implementations are not required to support Query+ to claim conformance at Level 3.

`formulas`

Define computed fields evaluated for each result:

formulas:
  overdue: "due_date < today() && status != 'done'"
  days_until_due: "due_date - today()"
  display_priority: 'if(priority >= 4, "🔴", if(priority >= 2, "🟡", "🟢"))'

Formulas are accessible via the formula. namespace in subsequent expressions and in results.

Formula error handling: Invalid formula syntax MUST produce invalid_formula. Circular references MUST produce circular_formula. Runtime evaluation errors inside formulas (including type mismatches, division by zero, invalid method calls, and invalid regex) MUST produce formula_evaluation_error (not type_error) as defined in Appendix C.5.

`groupBy`

Group results by a property value. Each unique value creates a group:

groupBy:
  property: status
  direction: ASC    # ASC or DESC

Only one groupBy property is supported per query.
direction controls the sort order of groups: ASC (default) or DESC.
Results within each group follow the order_by sort.
Ungrouped results (null/missing group value) appear in a separate group. The null group sorts LAST in ASC direction and FIRST in DESC direction, consistent with the order_by null handling rules (§10.3).

`summaries`

Define custom summary formulas. In summary expressions, the values keyword represents all values for the associated property across the result set:

summaries:
  custom_avg: "values.reduce(acc + value, 0) / values.length"
  rounded_mean: "values.reduce(acc + value, 0) / values.length"

Summary semantics:

values is an ordered list matching the result order (or group order when groupBy is used).
Missing properties contribute null values to values.
Implementations SHOULD preserve null values in values for custom summaries.
Built-in summaries SHOULD ignore null/empty values unless otherwise specified (e.g., Empty, Filled).

See Expressions §11.14 for default summary functions.

`property_summaries`

Assign summary functions to specific properties. These calculate an aggregate value across all records (or per group when groupBy is used):

property_summaries:
  priority: Average
  estimate_hours: Sum
  due_date: Earliest
  formula.overdue: Checked

Values reference either default summary names (see Expressions §11.14) or custom summaries defined in the summaries section.

When groupBy is present, property summaries are computed per group.

`properties`

Display configuration for properties. Does not affect query logic---used by view renderers:

properties:
  status:
    displayName: "Current Status"
  formula.overdue:
    displayName: "Overdue?"
  file.ext:
    displayName: "Extension"

Display names are not used in filters or formulas.

10.8 Query Examples

Core Examples

All Open Tasks

query:
  types: [task]
  where: 'status == "open"'

High Priority Tasks Due This Week

query:
  types: [task]
  where:
    and:
      - "priority >= 4"
      - "due_date <= today() + '7d'"
      - 'status != "done"'

Files Modified Today

query:
  where: "file.mtime > today()"

Tasks Tagged Urgent or Blocker

query:
  types: [task]
  where: 'tags.containsAny("urgent", "blocker")'

Tasks Assigned to Engineering Team Members

query:
  types: [task]
  where: 'assignee.asFile().team == "engineering"'

Notes Linking to a Specific Task

query:
  types: [note]
  where: 'file.hasLink(link("tasks/task-001"))'

Backlinks to Current File

query:
  where: "file.hasLink(this.file)"

Files Matching Multiple Types

query:
  where:
    and:
      - 'types.contains("actionable")'
      - 'types.contains("urgent")'

Query+ Examples

Overdue Tasks Sorted by Priority

query:
  types: [task]
  where:
    and:
      - "formula.is_overdue == true"
      - 'status != "blocked"'
  formulas:
    is_overdue: "due_date < today() && status != 'done'"
    urgency_score: "priority + if(due_date < today() - '7d', 5, 0)"
  order_by:
    - field: formula.urgency_score
      direction: desc
  limit: 10

Tasks Grouped by Status (Query+)

query:
  types: [task]
  where: 'status != "cancelled"'
  groupBy:
    property: status
    direction: ASC
  property_summaries:
    priority: Average
    estimate_hours: Sum
  order_by:
    - field: priority
      direction: desc

Untyped Files

query:
  where: "types.length == 0"

Paginated Results

# Page 1
query:
  types: [task]
  order_by:
    - field: created_at
      direction: desc
  limit: 20
  offset: 0

# Page 2
query:
  types: [task]
  order_by:
    - field: created_at
      direction: desc
  limit: 20
  offset: 20

10.9 Query Optimization

Implementations SHOULD optimize queries where possible:

Index usage: Use indexes for common filters (type, path prefix)
Short-circuit evaluation: Stop evaluating OR clauses on first match
Lazy loading: Don't parse body content unless requested
Caching: Cache query results for repeated queries

Complex queries (link traversal, formulas) may require full scans. Implementations SHOULD document performance characteristics.

10.10 Query API Considerations

Implementations exposing queries via API SHOULD support:

Programmatic access:

const results = await collection.query({
  types: ['task'],
  where: 'status == "open"',
  orderBy: [{ field: 'priority', direction: 'desc' }],
  limit: 10
});

CLI access:

mdbase query --type task --where 'status == "open"' --limit 10

The exact API surface is implementation-dependent.

11. Expression Language

Expressions are strings that evaluate to values. They are used in query filters, match conditions, and computed formulas. This section defines the expression syntax and available functions.

11.1 Expression Context

Expressions are evaluated in a context that provides:

Frontmatter fields: Direct access via bare names (effective values: defaults applied, computed excluded)
Raw frontmatter: Via the note. namespace (equivalent to file.properties)
File metadata: Via file. prefix
Formula values: Via formula. prefix
Context reference: Via this (in embedded queries)
Built-in functions: Date functions, type checks, etc.

11.2 Literals

Strings

"hello world"    // Double quotes
'hello world'    // Single quotes
"line 1\nline 2" // Escape sequences supported

Numbers

123       // Integer
45.67     // Decimal
-10       // Negative
1e6       // Scientific notation

Booleans

true
false

Null

null

Lists (in expressions)

["a", "b", "c"]
[1, 2, 3]

11.3 Property Access

Frontmatter Fields

status              // Direct access
priority            // Direct access
author.name         // Nested object
tags[0]             // List index (0-based)

Bracket Notation

For fields with special characters:

note["field-with-dashes"]
note["field.with.dots"]

File Metadata

file.name           // "task-001.md"
file.path           // "tasks/task-001.md"
file.folder         // "tasks"
file.ext            // "md"
file.size           // 1234 (bytes)
file.ctime          // Created datetime
file.mtime          // Modified datetime
file.body           // Raw markdown body content (string)

Formulas

formula.overdue
formula.urgency_score

Context (this)

this.file.name      // Current file's name
this.author         // Current file's author field

11.4 Operators

Comparison Operators

Operator	Description	Example
`==`	Equal	`status == "open"`
`!=`	Not equal	`status != "done"`
`>`	Greater than	`priority > 3`
`<`	Less than	`priority < 3`
`>=`	Greater or equal	`priority >= 3`
`<=`	Less or equal	`priority <= 3`

Arithmetic Operators

Operator	Description	Example
`+`	Addition	`priority + 1`
`-`	Subtraction	`total - discount`
`*`	Multiplication	`count * 2`
`/`	Division	`total / count`
`%`	Modulo	`index % 2`
`( )`	Grouping	`(a + b) * c`

Boolean Operators

Operator	Description	Example
`&&`	Logical AND	`a && b`
`\|\|`	Logical OR	`a \|\| b`
`!`	Logical NOT	`!done`

Implementation Notes (Non-Normative)

For predictable behavior across implementations, evaluate boolean expressions left-to-right with short-circuiting:

a && b: if a is falsy, do not evaluate b.
a || b: if a is truthy, do not evaluate b.
Keep method/function evaluation side-effect free so short-circuiting only affects performance and null-safety, not state.

Null Coalescing

value ?? default    // Returns default if value is null

11.5 String Methods

Method	Description	Example
`.length`	String length (field)	`title.length`
`.contains(str)`	Contains substring	`title.contains("bug")`
`.containsAll(...strs)`	Contains all substrings	`title.containsAll("bug", "fix")`
`.containsAny(...strs)`	Contains any substring	`title.containsAny("bug", "fix")`
`.startsWith(str)`	Starts with prefix	`title.startsWith("WIP:")`
`.endsWith(str)`	Ends with suffix	`file.name.endsWith(".draft.md")`
`.isEmpty()`	Empty or absent	`title.isEmpty()`
`.lower()`	Convert to lowercase	`status.lower()`
`.upper()`	Convert to uppercase	`status.upper()`
`.title()`	Title case	`name.title()`
`.trim()`	Remove whitespace	`title.trim()`
`.slice(start, end?)`	Extract substring	`id.slice(0, 4)`
`.split(sep, n?)`	Split to list	`tags_str.split(",")`
`.replace(pattern, repl)`	Replace all occurrences	`title.replace("old", "new")`
`.repeat(count)`	Repeat string	`"-".repeat(3)`
`.reverse()`	Reverse string	`name.reverse()`
`.matches(regex)`	Regex match (case-sensitive)	`title.matches("^TASK-\\d+")`

.replace() semantics: When pattern is a string, .replace() replaces all occurrences (matching Obsidian Bases behavior). This differs from JavaScript's String.prototype.replace() which only replaces the first match. Implementations that support regex literal patterns (e.g., /pattern/flags) SHOULD respect the g flag to control first-vs-all behavior, but string patterns always replace all.

.matches() semantics: The regex argument is a string containing an ECMAScript regular expression (see §4.8). Matching is case-sensitive by default and tests whether any part of the string matches the pattern (equivalent to a non-anchored search). To match the full string, use anchors: title.matches("^exact{{SPEC_CONTENT}}quot;). Implementations that support regex literal syntax (e.g., /pattern/i) MAY allow flags such as i (case-insensitive) and m (multiline).

11.6 List Methods

Method	Description	Example
`.length`	List length (field)	`tags.length`
`.contains(value)`	Contains element	`tags.contains("urgent")`
`.containsAll(...values)`	Contains all elements	`tags.containsAll("a", "b")`
`.containsAny(...values)`	Contains any element	`tags.containsAny("a", "b")`
`.isEmpty()`	List is empty	`tags.isEmpty()`
`[index]`	Element at index	`tags[0]`
`.filter(expr)`	Filter elements	`items.filter(value > 2)`
`.map(expr)`	Transform elements	`tags.map(value.lower())`
`.reduce(expr, init)`	Reduce to single value	`nums.reduce(acc + value, 0)`
`.flat()`	Flatten nested lists	`nested.flat()`
`.reverse()`	Reverse element order	`items.reverse()`
`.slice(start, end?)`	Extract portion	`items.slice(0, 3)`
`.sort()`	Sort ascending	`tags.sort()`
`.unique()`	Remove duplicates	`tags.unique()`
`.join(sep)`	Join to string	`tags.join(", ")`

In filter(), map(), and reduce(), the implicit variables value and index refer to the current element and its position. For reduce(), acc is the accumulator. These implicit variables shadow any frontmatter fields with the same names. To access a field named value, index, or acc, use an explicit namespace (e.g., note.value or file.properties.value) or this.value in embedded-query contexts.

containsAll() and containsAny() are variadic; passing a list literal counts as a single value and does not auto-expand.

reduce() requires an initial accumulator value. If reduce() is called without the init argument, implementations MUST treat it as a structural error and abort evaluation with wrong_argument_count. This avoids underspecified behavior on empty lists and keeps evaluation deterministic.

11.7 Date/Time Functions

Current Date/Time

Function	Returns	Description
`now()`	datetime	Current date and time
`today()`	date	Current date (no time)

Timezone semantics:

now() and today() use settings.timezone if configured; otherwise they use the implementation's local timezone.
Date-only values (date type) are interpreted in the local timezone for comparisons.
Datetime values with explicit offsets MUST be compared in absolute time.

Parsing

Function	Description	Example
`date(string)`	Parse date	`date("2024-03-15")`
`datetime(string)`	Parse datetime	`datetime("2024-03-15T10:30:00Z")`

Date Components

Method	Returns	Description
`.year`	integer	Year component
`.month`	integer	Month (1-12)
`.day`	integer	Day of month
`.hour`	integer	Hour (0-23)
`.minute`	integer	Minute (0-59)
`.second`	integer	Second (0-59)
`.dayOfWeek`	integer	Day of week (0=Sunday)
`.date()`	date	Date portion only
`.time()`	time	Time portion only

Date Formatting

due_date.format("YYYY-MM-DD")
created_at.format("MMM D, YYYY")

Common format tokens:

YYYY: 4-digit year
MM: 2-digit month
DD: 2-digit day
HH: 2-digit hour (24h)
mm: 2-digit minute
ss: 2-digit second

11.8 Date Arithmetic

Dates support arithmetic with duration strings:

due_date + "7d"           // Add 7 days
now() - "1w"              // Subtract 1 week
file.mtime > now() - "24h"  // Modified in last 24 hours

Duration units:

Unit	Aliases
`y`	`year`, `years`
`M`	`month`, `months`
`w`	`week`, `weeks`
`d`	`day`, `days`
`h`	`hour`, `hours`
`m`	`minute`, `minutes`
`s`	`second`, `seconds`

Unit case sensitivity: Duration units are case-sensitive. M is months and m is minutes.

Duration string format:

Each duration string contains a single number-unit pair. Whitespace between the number and unit is allowed ("7d" and "7 days" are equivalent). Compound durations in a single string (e.g., "1d12h") are NOT supported — chain additions instead:

date + "1M" + "4h" + "3m"  // Add 1 month, 4 hours, 3 minutes

Calendar arithmetic: Adding months or years clamps to the last day of the target month. For example, date("2024-01-31") + "1M" returns 2024-02-29 (2024 is a leap year), not 2024-03-02.

Examples:

today() + "30d"           // 30 days from now
due_date - "2w"           // 2 weeks before due date
created_at + "1y"         // 1 year after creation

Date comparison:

due_date < today()                    // Overdue
due_date < today() + "7d"             // Due within a week
file.mtime > now() - "1h"             // Modified in last hour

Date subtraction:

Subtracting two dates returns the difference in milliseconds:

now() - file.ctime                    // Milliseconds since creation
(today() - due_date) / 86400000       // Days overdue (negative if not yet due)
(now() + "1d") - now()                // Returns 86400000

Duration function:

The duration() function explicitly parses a duration string. This is needed when performing arithmetic on durations themselves:

now() + (duration("1d") * 2)          // 2 days from now
duration("5h") * 3                    // Duration must be on the left

11.9 Conditional Expression

if(condition, then_value, else_value)

Examples:

if(priority > 3, "high", "normal")
if(status == "done", "✓", "○")
if(due_date < today(), "overdue", if(due_date < today() + "7d", "soon", "ok"))

11.10 Null Handling

Check Existence

exists(field)    // true if field key is present (including null values)
field.isEmpty()  // true if field is null, empty, or absent

Note: exists() checks for key presence in raw persisted frontmatter. A field with value null exists but is empty. Use isEmpty() to check if a field has a meaningful value.

Provide Default

default(field, value)  // Return value if field is null or missing
field ?? value         // Null coalescing operator

Examples:

exists(due_date)                    // Has a due date?
default(priority, 3)                // Default priority to 3
assignee ?? "unassigned"            // Default to "unassigned"

Missing vs null: In expressions, missing properties are treated like null for default() and ??. Use exists(field) to distinguish missing from present-null.

11.11 Type Checking and Conversion

Type Checking

value.isType("string")     // true if value is a string
value.isType("number")     // true if value is a number
value.isType("boolean")    // true if value is a boolean
value.isType("date")       // true if value is a date
value.isType("list")       // true if value is a list
value.isType("object")     // true if value is an object

Type Conversion

value.toString()           // Convert any value to string
number("3.14")             // Parse string to number
number(true)               // Returns 1 (false returns 0)
number(date_value)         // Milliseconds since epoch
value.isTruthy()           // Coerce to boolean
list(value)                // Wrap in list if not already a list

11.12 Link Functions

Function	Description	Example
`link.asFile()`	Resolve link to file	`parent.asFile().status`
`link(path)`	Construct link	`link("tasks/task-001")`
`file.hasLink(target)`	File links to target	`file.hasLink(link("api-docs"))`
`file.hasTag(...tags)`	File has any of the given tags; uses prefix matching for nested tags (see §8)	`file.hasTag("important")`
`file.hasProperty(name)`	Raw persisted frontmatter has the key	`file.hasProperty("status")`
`file.inFolder(path)`	File is in folder (or subfolder)	`file.inFolder("archive")`
`file.asLink(display?)`	Convert file to link	`file.asLink("display text")`

file.asLink(display?) returns a link value equivalent to link(file.path). The default format is a wikilink ([[path]]). If display is provided, it is used as the link alias (e.g., [[path|display]]).

11.13 Object Methods

Method	Description	Example
`.isEmpty()`	Has no properties	`metadata.isEmpty()`
`.keys()`	List of property names	`metadata.keys()`
`.values()`	List of property values	`metadata.values()`

11.14 Summary Functions

Summary functions operate on a collection of values across all matching records. They are used in the summaries section of a query (see Querying).

In summary formulas, the values keyword represents all values for a given property across the result set. The formula MUST return a single value.

Summary value semantics:

values is ordered to match the query result order (or group order when grouped).
Missing properties contribute null values to values.
Custom summaries receive values with null entries intact.
Built-in summaries SHOULD ignore null/empty values unless the function is explicitly about emptiness (e.g., Empty, Filled).

values.reduce(acc + value, 0)         // Sum
values.reduce(acc + value, 0) / values.length  // Average
values.filter(value.isTruthy()).length // Count of truthy values

Default Summary Functions

Implementations SHOULD provide these built-in summary functions:

Name	Input Type	Description
`Average`	Number	Mean of all numeric values
`Min`	Number	Smallest number
`Max`	Number	Largest number
`Sum`	Number	Sum of all numbers
`Range`	Number	Difference between Max and Min
`Median`	Number	Median value
`Earliest`	Date	Earliest date
`Latest`	Date	Latest date
`Checked`	Boolean	Count of `true` values
`Unchecked`	Boolean	Count of `false` values
`Empty`	Any	Count of empty/null values
`Filled`	Any	Count of non-empty values
`Unique`	Any	Count of unique values

11.15 Operator Precedence

From highest to lowest:

( ) - Grouping
. [] - Property access
! - (unary) - Negation
* / % - Multiplication
+ - - Addition
< <= > >= - Comparison
== != - Equality
&& - Logical AND
|| - Logical OR
?? - Null coalescing

Use parentheses to clarify complex expressions.

11.16 Lambda Expressions

List methods like filter(), map(), and reduce() use implicit variables rather than arrow function syntax:

// value refers to the current element, index to its position
items.filter(value > 2)
tags.map(value.lower())
items.map(value.toString() + " (" + index.toString() + ")")

// reduce also provides acc (accumulator)
numbers.reduce(acc + value, 0)

Implementations MAY also support arrow function syntax as an extension:

tags.map(t => t.lower())
tasks.filter(t => t.status != "done")

If arrow functions are supported, implementations SHOULD parse them only within function argument positions and treat => as part of the lambda expression itself (not as a general-purpose operator).

11.17 Expression Examples

Simple Filters

status == "open"
priority >= 4
tags.contains("urgent")

Combined Conditions

status == "open" && priority >= 4
status == "blocked" || due_date < today()
!(status == "done")

Date Filters

due_date < today()
due_date < today() + "7d"
file.mtime > now() - "24h"
created_at.year == 2024

String Filters

title.contains("bug")
title.lower().contains("urgent")
file.name.startsWith("draft-")
id.matches("^TASK-\\d{4}$")

List Operations

tags.length > 0
tags.contains("important")
tags.containsAny("urgent", "critical")
assignees.filter(a => a.asFile().team == "eng").length > 0

Computed Fields (Formulas)

// Is overdue?
due_date < today() && status != "done"

// Days overdue (date subtraction returns milliseconds)
(today() - due_date) / 86400000

// Priority display
if(priority >= 4, "🔴 Critical", if(priority >= 2, "🟡 Normal", "🟢 Low"))

// Urgency score
priority * 10 + if(due_date < today(), 50, 0)

Link Traversal

parent.asFile().status == "done"
assignee.asFile().team == "engineering"
blocks.map(b => b.asFile().status).contains("blocked")

11.18 Error Handling

Expression evaluation errors MUST be handled gracefully and MUST NOT abort the overall query:

Error	Behavior
Property access on null	Returns null
Method call on null	Returns null
Division by zero	Returns null and emits `type_error`
Invalid regex	Evaluation error (see §4.8 for regex flavor)
Type mismatch	Returns null and emits `type_error`

Implementations SHOULD log evaluation errors and continue processing where possible.

Note: These error-handling rules apply to all expression evaluation contexts, including where filters in queries and match rules. A type mismatch such as "hello" + 5 returns null and emits type_error as a diagnostic — it does not abort the query. The file is simply excluded from results because the where expression evaluated to a non-truthy value. Only structural errors (invalid_expression, unknown_function, wrong_argument_count) abort the query, because they indicate a malformed query rather than a data-dependent evaluation issue.

Implementation Notes (Non-Normative)

A robust evaluator typically separates:

Parse-time structural errors (abort evaluation/query):
- invalid_expression
- unknown_function
- wrong_argument_count
Runtime data-dependent errors (return null, continue):
- null property access
- method call on null
- type mismatch
- invalid regex
- division by zero

This split keeps query behavior deterministic while preserving per-record fault tolerance.

11.18.1 General Expression Nesting Limit

Independently of the 10-hop asFile() traversal limit defined in §8.7, implementations MUST enforce a general nesting depth limit for expressions. This prevents stack overflow from deeply nested constructs such as if(true, if(true, if(true, ...))).

Parameter	Value
Default limit	64 levels
Configurable range	32–256 levels
Error code	`expression_depth_exceeded`

The nesting depth is incremented for each nested function call, sub-expression grouping, or property chain step. When the limit is exceeded, the expression MUST produce an expression_depth_exceeded error.

This limit is distinct from the asFile() hop limit: the asFile() limit counts link traversal hops (default 10), while the general nesting limit counts expression parse depth (default 64). Both emit expression_depth_exceeded.

11.19 Expression Portability

Expressions using only spec-defined functions and operators are portable expressions. This section defines rules for maintaining portability across implementations.

Custom Functions

Implementations MAY define custom functions beyond those specified in this document. Custom functions MUST be namespaced with the ext prefix using either :: or . as a delimiter:

ext::myFunc(value)    // Double-colon delimiter
ext.myFunc(value)     // Dot delimiter

Both delimiter forms are equivalent. Implementations MUST accept either form for custom functions they define.

Rules

Namespace requirement: Implementations MUST namespace all custom functions with the ext prefix. Unprefixed custom functions are not permitted.
No shadowing: Implementations MUST NOT override or shadow built-in functions or operators defined in this specification.
Non-portable warnings: Implementations SHOULD emit a warning when evaluating an expression that uses non-portable functions (i.e., ext-prefixed functions).
Documentation: Type definitions and queries SHOULD note when they depend on non-portable expressions.

Example

# Portable expression — uses only spec-defined functions
filters: 'status == "open" && due_date < today()'

# Non-portable expression — uses a custom function
filters: 'ext::sentiment(title) > 0.5'

Implementations encountering an unknown ext-prefixed function MUST treat it as an evaluation error (see §11.18).

12. Operations

This section defines the behavior of Create, Read, Update, Delete, and Rename operations on collection files.

12.1 Create

Creates a new file in the collection.

Input

Parameter	Required	Description
`type`	No	Type name(s) for the file
`frontmatter`	Yes	Field values (may be partial)
`body`	No	Markdown body content
`path`	No	Target file path (may be derived)

Behavior

Determine type(s): Use provided type, or infer from frontmatter if explicit type keys are present
- Explicit type keys are defined by settings.explicit_type_keys
- If settings.explicit_type_keys is empty, type inference relies solely on match rules
Apply defaults: For each missing field with a default value, apply the default to the effective record used for validation and output
Generate values: For fields with generated strategy:
- ulid: Generate ULID
- uuid: Generate UUID v4
- {random: N}: Generate random string
- sequence: Generate auto-incrementing integer
- now: Set to current datetime
- {from, transform}: Derive from source field
Validate: If validation level is not off:
- Validate against all matched type schemas
- If level is error and validation fails, abort
Determine path:
- If path provided, use it
- If type has path_pattern (or filename_pattern alias), derive from the effective frontmatter (including defaults and generated values)
- Otherwise, require explicit path
- If any pattern variable is unresolved (null/undefined/empty), fail with path_required
Check match rules (explicit type only, Level 2+):
- If the type was specified explicitly (via input type or explicit type keys), the final record MUST satisfy that type's match rules (§6.3–§6.4) when match rules are supported
- This check uses the effective frontmatter and the final path
- If the record does not satisfy the match rules, abort with match_failed
- Level 1 implementations MAY skip this step because match rule evaluation is a Level 2 capability
Check existence: If file already exists at path, abort with error
Write file:
- Serialize frontmatter to YAML
- If settings.explicit_type_keys is non-empty and no explicit type key is present in the frontmatter, implementations MUST write the type using the first key in settings.explicit_type_keys
- If settings.explicit_type_keys is empty, implementations MUST NOT write any type declaration field
- MUST include all explicitly provided fields and all generated fields
- MUST write fields filled solely by defaults when settings.write_defaults is true (default); SHOULD omit them when settings.write_defaults is false
- Ensure parent directories exist for the final path (create as needed)
- Combine with body
- Write atomically (temp file + rename)

Output

The output MUST include the path field containing the final file path relative to the collection root.

path: "tasks/task-001.md"
frontmatter:
  id: "01ARZ3NDEKTSV4RRFFQ69G5FAV"
  title: "Fix the bug"
  status: open
  created_at: "2024-03-15T10:30:00Z"
  # ... all fields including generated

Errors

Code	Description
`unknown_type`	Specified type doesn't exist
`validation_failed`	Validation errors (with details)
`path_conflict`	File already exists at target path
`path_required`	Cannot determine path
`match_failed`	Created record does not satisfy the specified type's match rules

Example

mdbase create task \
  --field title="Fix login bug" \
  --field priority=4 \
  --field "assignee=[[alice]]"

12.2 Read

Reads a file and returns its parsed content.

Input

Parameter	Required	Description
`path`	Yes	File path relative to collection root
`validate`	No	Whether to validate (default: per settings)
`include_body`	No	Include body content (default: true)

Behavior

Load file: Read from filesystem
- Reads respect collection scanning rules (settings.include_subfolders, settings.exclude, and settings.types_folder). If a path is excluded from the record set, read MUST return file_not_found.
Parse frontmatter: Extract YAML frontmatter and body
Determine types:
- Check for explicit type/types field
- If match rules are supported (Level 2+), evaluate match rules for all types
- Collect matched types
Validate (if enabled):
- Validate against all matched types
- Collect validation issues
Return record: Structured representation
- frontmatter is the effective frontmatter (defaults applied, computed fields excluded)
- file.properties (see Querying §10.5) provides raw persisted frontmatter when needed

Output

path: "tasks/task-001.md"
types: [task]
frontmatter:
  id: "task-001"
  title: "Fix the bug"
  status: open
  # ... all fields
file:
  name: "task-001.md"
  folder: "tasks"
  display_name: "Fix the bug"
  mtime: "2024-03-15T10:30:00Z"
  size: 1234
body: "## Description\n\nThe login form..."
validation:
  valid: true
  issues: []

Errors

Code	Description
`file_not_found`	File doesn't exist
`invalid_frontmatter`	YAML parse error

12.3 Update

Modifies an existing file's frontmatter and/or body.

Input

Parameter	Required	Description
`path`	Yes	File path
`fields`	No	Field updates (partial)
`body`	No	New body content (null = no change)

Behavior

Read existing file: Load current content
Merge fields: Apply field updates to existing frontmatter
- New fields are added
- Existing fields are replaced
- Explicit null removes the field (if write_nulls: omit) or writes null
- If a required field is removed via null, validation MUST treat it as missing
Determine types: Use explicit type keys (per settings.explicit_type_keys) and match rules, including path-based matching
- Updates MAY change match-relevant fields; reclassification is allowed and MUST NOT be treated as an error
Update generated fields: For fields with generated: now_on_write, update to current time
Apply defaults: For each missing field with a default, apply the default to the effective record used for validation and output
Validate: If enabled, validate merged frontmatter (using effective values for required checks)
Write file:
- Preserve field order where possible
- Preserve body if not provided
- Write atomically

Null Handling on Update

When updating a field to null:

`write_nulls` setting	Behavior
`"omit"` (default)	Remove the field from frontmatter
`"explicit"`	Write `field: null`

Important: Never write the empty-value form field: (see Frontmatter).

Output

path: "tasks/task-001.md"
frontmatter:  # Effective frontmatter (defaults applied, computed excluded)
  # ... updated frontmatter
previous:
  status: open
updated:
  status: done

Errors

Code	Description
`file_not_found`	File doesn't exist
`validation_failed`	Validation errors

Example

# Update single field
mdbase update tasks/task-001.md --field status=done

# Update multiple fields
mdbase update tasks/task-001.md \
  --field status=done \
  --field "completed_at=$(date -Iseconds)"

# Clear a field
mdbase update tasks/task-001.md --field assignee=null

12.4 Delete

Removes a file from the collection.

Input

Parameter	Required	Description
`path`	Yes	File path
`check_backlinks`	No	Warn about incoming links (default: true)

Behavior

Check existence: Verify file exists
Check backlinks (if enabled):
- Find files that link to this file
- Warn user about potential broken links
Delete file: Remove from filesystem

Output

path: "tasks/task-001.md"
deleted: true
broken_links:
  - path: "tasks/parent.md"
    field: "subtasks"

Errors

Code	Description
`file_not_found`	File doesn't exist

Example

# Delete with confirmation
mdbase delete tasks/task-001.md

# Delete without backlink check
mdbase delete tasks/task-001.md --no-check-backlinks

# Force delete
mdbase delete tasks/task-001.md --force

12.5 Rename (and Move)

Renames or moves a file, optionally updating references across the collection.

Input

Parameter	Required	Description
`from`	Yes	Current file path
`to`	Yes	New file path
`update_refs`	No	Update references (default: per settings)

Behavior

Validate paths: Ensure source exists and target doesn't
Rename file: Move file to new path atomically

Update references (if rename_update_refs is true):

Frontmatter links: Update link fields in all files that reference the renamed file

# Before: links to tasks/old-name.md
parent: "[[old-name]]"

# After: file renamed to tasks/new-name.md
parent: "[[new-name]]"

Body links: Update link syntax in markdown body content

<!-- Before -->
See [[old-name]] for details.
Check [the task](./old-name.md).

<!-- After -->
See [[new-name]] for details.
Check [the task](./new-name.md).

Reference Update Rules

Preserve link style:
- Wikilinks stay as wikilinks
- Markdown links stay as markdown links
- Relative links stay relative when possible
Update all matching references:
- By resolved path (most reliable)
- By name when unambiguous
ID-based links:
- If a simple-name link ([[name]]) resolves via id_field and the target file's id_field value did not change, implementations SHOULD NOT rewrite the link during rename (to avoid unnecessary churn).
Handle ambiguity:
- If a link could refer to multiple files, don't update
- Emit warning for manual review
Scope: Update references in ALL collection files, not just same folder

Output

from: "tasks/old-name.md"
to: "tasks/new-name.md"
references_updated:
  - path: "tasks/parent.md"
    field: "subtasks[0]"
    old_value: "[[old-name]]"
    new_value: "[[new-name]]"
  - path: "notes/meeting.md"
    location: "body"
    old_value: "[[old-name]]"
    new_value: "[[new-name]]"
warnings:
  - path: "archive/legacy.md"
    message: "Ambiguous link '[[name]]' not updated"

Partial Reference Update Failure

When update_refs is enabled and some reference updates fail (e.g., due to concurrent modification or I/O errors on individual files), the rename operation returns a partial success response:

from: "tasks/old-name.md"
to: "tasks/new-name.md"
references_updated:
  - path: "tasks/parent.md"
    field: "subtasks[0]"
    old_value: "[[old-name]]"
    new_value: "[[new-name]]"
ref_update_errors:
  - path: "notes/meeting.md"
    code: concurrent_modification
    message: "File was modified during reference update"
  - path: "docs/index.md"
    code: permission_denied
    message: "Cannot write to file"
error:
  code: rename_ref_update_failed
  message: "Rename succeeded but 2 reference updates failed"

The rename_ref_update_failed error code is returned alongside the successful rename result (from/to). The renamed file is at its new path. The ref_update_errors array provides per-file failure details including the file path and error code.

Errors

Code	Description
`file_not_found`	Source file doesn't exist
`path_conflict`	Target path already exists
`rename_ref_update_failed`	Rename succeeded but one or more reference updates failed (see above)

Example

# Simple rename
mdbase rename tasks/old.md tasks/new.md

# Move to different folder
mdbase rename tasks/task.md archive/task.md

# Rename without updating references
mdbase rename tasks/old.md tasks/new.md --no-update-refs

# Dry run (show what would change)
mdbase rename tasks/old.md tasks/new.md --dry-run

12.6 Atomicity

All write operations (Create, Update, Delete, Rename) SHOULD be atomic:

Create/Update: Write to temporary file, then rename
Delete: Single filesystem delete
Rename: Filesystem rename (atomic on most systems)

For Rename with reference updates, atomicity across multiple files is not guaranteed. Implementations SHOULD:

Complete the rename first
Update references file by file
Report partial failures clearly

12.7 Batch Operations

Implementations MAY support batch operations for efficiency:

# Bulk update
mdbase update --where 'status == "open"' --field status=in_progress

# Bulk delete
mdbase delete --where 'tags.contains("archive")' --confirm

# Bulk move
mdbase move 'tasks/*.md' archive/

Validation Phase

Before applying any changes, implementations MUST validate ALL affected files. If any file fails validation and default_validation is error, the entire batch MUST be aborted with no files modified.

Execution Phase

After validation passes, apply changes file by file.

Partial Failure

If a file write fails during execution (I/O error, concurrent modification):

Implementations MUST NOT roll back already-written files (filesystem operations are not transactional)
Implementations MUST continue processing remaining files (best-effort)
Implementations MUST report per-file results: success, failure (with error code), or skipped

Result Format

batch_result:
  total: 50
  succeeded: 47
  failed: 2
  skipped: 1
  details:
    - path: "tasks/task-001.md"
      status: "success"
    - path: "tasks/task-002.md"
      status: "failed"
      error: { code: "concurrent_modification", message: "..." }
    - path: "tasks/task-003.md"
      status: "skipped"
      reason: "Depends on failed task-002.md"

Dry-Run Mode

Batch operations MUST support --dry-run which validates all changes and reports what would happen without modifying any files.

12.8 Backfill (Level 6)

Backfill applies defaults and/or generated values to missing fields across many files. It is intended for schema evolution and migration workflows (see §5.11.1).

Input

Parameter	Required	Description
`type`	No	Type name to target (optional if `where` is provided)
`where`	No	Query filter expression to select files
`fields`	No	Specific fields to backfill (default: all fields with defaults/generated values)
`apply.defaults`	No	Apply default values to missing fields (default: true)
`apply.generated`	No	Apply generated values to missing fields (default: true)
`dry_run`	No	Validate and report changes without writing files

Behavior

Select candidates:
- If type is provided, include files matching that type
- If where is provided, filter candidates using the query expression
- If neither is provided, the operation MUST fail with invalid_request
Compute changes:
- Only missing fields are eligible for backfill
- Explicit null values are considered present and are NOT backfilled
- If fields is provided, only those fields are considered
- If apply.defaults is true, apply defaults for missing fields
- If apply.generated is true, apply generated values for missing fields (including ulid, uuid, {random: N}, sequence, now, {from, transform})
- If a file has no eligible missing fields, it MUST be reported as skipped
Validate:
- Apply the backfill changes to the effective frontmatter
- If default_validation is error and any file fails validation, abort the entire operation with no files modified
Write (unless dry_run):
- Persist fields that were filled by backfill
- Honor settings.write_defaults when writing default-filled fields
- Preserve formatting per §12.9

Output

Returns a batch-style result (same batch_result structure as §12.7, including skipped and per-file details):

batch_result:
  total: 12
  succeeded: 10
  failed: 1
  skipped: 1
  details:
    - path: "tasks/a.md"
      status: "success"
      changed_fields: [status, id]
    - path: "tasks/b.md"
      status: "skipped"
      reason: "No missing fields to backfill"
    - path: "tasks/c.md"
      status: "failed"
      error: { code: "validation_failed", message: "..." }

Errors

Code	Description
`invalid_request`	Missing `type` and `where`
`validation_failed`	Validation errors (with details)

12.9 Formatting Preservation

When writing files, implementations SHOULD preserve:

MUST Preserve

Body content (unless explicitly changed)
Line ending style (LF vs CRLF)

SHOULD Preserve

Frontmatter field order
String quoting style
Multi-line string format (literal vs folded)
Comments (if YAML parser supports it)

MAY Normalize

Indentation (recommend 2 spaces)
Trailing whitespace
Final newline (files SHOULD end with newline)

12.10 Hooks (Optional)

Implementations MAY support hooks for custom logic:

Hook	When
`beforeCreate`	Before validation and write
`afterCreate`	After successful write
`beforeUpdate`	Before validation and write
`afterUpdate`	After successful write
`beforeDelete`	Before deletion
`afterDelete`	After successful deletion
`beforeRename`	Before rename
`afterRename`	After successful rename and ref updates

Hooks receive operation context and can:

Modify values (before hooks)
Perform side effects (after hooks)
Abort operation (before hooks, by throwing)

This is an OPTIONAL feature; implementations need not support hooks.

12.11 Concurrency

Read-Modify-Write Cycle

When updating a file, implementations MUST detect concurrent modifications. The recommended approach is optimistic concurrency using file mtime:

Read file, record mtime
Apply changes in memory
Before writing, check that file mtime has not changed
If mtime changed, abort with concurrent_modification error
Write atomically (temp file + rename)

Implementations MAY use content hashing instead of mtime for more reliable conflict detection.

Conflict Behavior

On detecting a concurrent modification, implementations MUST abort the operation and report concurrent_modification. Implementations MUST NOT silently overwrite concurrent changes.

Implementations MAY offer a retry mechanism (re-read, re-apply, re-check) but MUST NOT retry automatically without user/caller consent.

Cross-File Operations

Rename with reference updates touches multiple files. These are NOT atomic across files. Implementations MUST:

Complete the primary rename first
Update references file by file
Use mtime checking on each referenced file before updating
Report partial failures — which files were updated and which were skipped due to conflicts

File Locking

Implementations MAY use advisory file locks for write operations. If used:

Locks MUST be released on operation completion (including error paths)
Lock timeouts SHOULD be documented
Implementations MUST NOT require locking for read operations

12.12 Init

Initializes a new collection in a directory.

Input

Parameter	Required	Description
`path`	No	Directory to initialize (default: current working directory)
`config`	No	Configuration object or YAML string to write as `mdbase.yaml`

If config is omitted, implementations MUST write a minimal configuration containing only spec_version.

Behavior

Create mdbase.yaml at the collection root
Create the types folder (settings.types_folder, default _types/)
Create the meta type file in the types folder (see §5.8)

Init MUST be atomic with respect to other init operations. If another process has already initialized the collection (or does so concurrently), init MUST fail with path_conflict rather than partially overwriting existing files.

Output

path: "/path/to/collection"
config_path: "mdbase.yaml"
types_folder: "_types"
meta_type_path: "_types/meta.md"

Errors

Code	Description
`path_conflict`	Collection already exists at target path
`invalid_path`	Path is malformed

12.13 Migrate (Level 6)

Applies a migration manifest in order. Migration manifests are defined in §5.11.1.

Input

Parameter	Required	Description
`id`	No	Migration manifest ID (from manifest frontmatter)
`path`	No	Explicit path to a migration manifest file
`dry_run`	No	If true, execute backfill steps in dry-run mode

Exactly one of id or path MUST be provided.

Behavior

Load manifest:
- If path is provided, load that file
- If id is provided, search settings.migrations_folder for a manifest with matching id
Validate manifest:
- Manifest frontmatter MUST contain steps as a list
- Each step MUST include id and op
- Unknown op values MUST fail with invalid_migration
Execute steps in order:
- For backfill steps, run the Backfill operation (§12.8) with the step’s parameters
  - If dry_run is true on the migrate operation, it overrides the step’s dry_run to true
- For schema-only steps (add_field, rename_field, change_type, rename_type, move_path), record the step as manual and continue
- If any backfill step fails, stop execution and return migration_failed

Output

The output MUST include:

migration_result.id
migration_result.steps[] with id, op, and status
For backfill steps, steps[].result.batch_result

migration_result:
  id: "2026-02-03-migrate-tasks"
  steps:
    - id: "add-status-field"
      op: add_field
      status: manual
    - id: "backfill-status"
      op: backfill
      status: success
      result:
        batch_result:
          total: 12
          succeeded: 12
          failed: 0

Errors

Code	Description
`invalid_request`	Missing or conflicting `id`/`path`
`invalid_migration`	Manifest is malformed or contains invalid steps
`migration_failed`	A step failed during execution (see details)

13. Caching and Indexing

Caching and indexing are optional features that accelerate queries on large collections. This section defines cache behavior and requirements.

13.1 Core Principle

Files are the source of truth.

Caches are derived data. They MUST be:

Rebuildable from files alone
Deletable without data loss
Optional for correctness (only affect performance)

If you delete the cache folder, the collection still works—queries just run slower.

13.2 When Caching Helps

Caching significantly improves performance for:

Operation	Without Cache	With Cache
Query by type	Scan all files	Index lookup
Query by field	Scan all files	Index lookup
Path prefix filter	Filesystem scan	Index lookup
Link resolution	Search all files	Direct lookup
Backlink queries	Scan all files	Reverse index
Full-text body search	Read all files	Text index

For small collections (< 100 files), caching overhead may not be worthwhile. For large collections (1000+ files), caching is strongly recommended.

13.3 Cache Requirements

If an implementation supports caching, it MUST follow these rules:

13.3.1 Derivable

The cache MUST be completely rebuildable from:

Collection files (markdown)
Configuration (mdbase.yaml)
Type definitions

No information should exist only in the cache.

13.3.2 Optional

All operations MUST work without the cache, possibly slower:

Queries scan files directly
Backlinks are computed on demand
Link resolution searches the collection

13.3.3 Detectable Staleness

The implementation MUST detect when the cache is stale:

File modified after cache entry
File deleted but still in cache
New file not in cache
Config changed since cache build

13.3.4 Explicit Rebuild

Users MUST be able to force a full cache rebuild:

mdbase cache rebuild

13.3.5 Deletable

Deleting the cache folder MUST NOT affect:

File contents
Collection integrity
Operation correctness

13.4 Cache Location

The default cache location is .mdbase/ at the collection root, configurable via settings.cache_folder.

my-collection/
├── mdbase.yaml
├── _types/
├── tasks/
├── notes/
└── .mdbase/                  # Cache folder
    ├── index.sqlite        # Main index (example)
    ├── links.json          # Link graph (example)
    └── meta.json           # Cache metadata

Gitignore

The cache folder SHOULD be gitignored. Add to .gitignore:

.mdbase/

Cache files are machine-specific and should not be version controlled.

13.5 What to Cache

Implementations MAY cache:

Data	Purpose
File metadata	Fast file lookups
Parsed frontmatter	Avoid re-parsing
Type assignments	Fast type queries
Field values	Field-based queries
Link graph	Link resolution, backlinks
Full-text index	Body content search

Minimum Recommended

At minimum, caching implementations SHOULD index:

File paths and mtimes (for staleness detection)
Type assignments (for type queries)
Link relationships (for backlinks)

13.6 Cache Invalidation

Staleness Detection

For each file, track:

Last known mtime
Content hash (optional, more reliable)

On query:

Check if file mtime matches cached mtime
If different, mark entry stale
Re-parse file and update cache

Change Triggers

Cache entries should be invalidated when:

Change	Invalidation
File modified	Re-index that file
File created	Index new file
File deleted	Remove from index
File renamed	Update path, check links
Config changed	Full rebuild
Type definition changed	Re-index affected files

Incremental vs Full Rebuild

Incremental: Update only changed entries (fast, normal operation)

Full rebuild: Recreate entire cache (slow, guaranteed consistent)

Implementations SHOULD support both.

13.7 Backlinks and Caching

The file.backlinks property requires knowing which files link TO a given file. This requires a reverse link index.

Building the Backlink Index

For each file A in the collection:

Parse A's frontmatter and body
Extract all links from A
For each link target B:
- Add A to B's backlinks set

Storage

{
  "tasks/task-001.md": {
    "backlinks": [
      "tasks/parent.md",
      "notes/meeting.md"
    ]
  }
}

Performance Note

Without caching, computing backlinks requires scanning every file. For large collections, this is prohibitively slow. Implementations SHOULD:

Document that file.backlinks requires caching for good performance
Warn when backlink queries are slow
Suggest enabling caching

13.8 Cache Commands

Implementations SHOULD provide cache management commands:

# Show cache status
mdbase cache status
# Output: Cache valid, 1234 files indexed, last built 5 min ago

# Rebuild entire cache
mdbase cache rebuild

# Clear cache
mdbase cache clear

# Update cache incrementally
mdbase cache update

# Verify cache integrity
mdbase cache verify

13.9 Cache Implementation Options

Implementations may use various storage backends:

SQLite (Recommended)

.mdbase/index.sqlite

Pros: ACID, queryable, single file, widely supported Cons: Binary file (not human-readable)

JSON Files

.mdbase/files.json
.mdbase/links.json
.mdbase/types.json

Pros: Human-readable, simple Cons: Full rewrite on update, no concurrent access

Memory Only

No persistent cache; rebuild on each run.

Pros: No disk I/O, always fresh Cons: Slow startup for large collections

13.10 Concurrent Access

When multiple processes may access the collection:

Read-Only Access

Multiple readers can safely share a cache. Use file locking or SQLite's WAL mode.

Write Access

When one process writes:

Acquire exclusive lock
Perform operation
Update cache
Release lock

Implementations SHOULD document concurrency behavior.

13.11 Cache Warming

For large collections, initial cache build can take time. Implementations MAY support:

Eager warming: Build cache on first access

mdbase cache build

Lazy warming: Build entries on first query for each file

Background warming: Build cache asynchronously while serving queries

13.12 Cache Versioning

Cache format may change between implementation versions. Include a version marker:

{
  "version": "1.0",
  "spec_version": "0.2.1",
  "built_at": "2024-03-15T10:30:00Z"
}

When loading cache:

Check version compatibility
If incompatible, trigger full rebuild
Log version mismatch for debugging

14. Conformance

This section defines conformance levels and testing requirements for implementations.

14.1 Conformance Levels

Implementations may claim conformance at different levels. Each level builds on all previous levels.

Level 1: Core

Required capabilities:

Parse mdbase.yaml configuration
Locate and scan markdown files in collection
Parse YAML frontmatter from files
Handle null values correctly (per §3)
Load type definitions from types folder
Single-type matching via explicit declaration (type field)
Validate fields against type schemas
Implement Create, Read, Update, Delete operations
Type coercion (per §7.16)

Test coverage: Basic parsing, validation, CRUD operations, concurrency (basic mtime conflict detection)

Level 2: Matching

Additional capabilities:

Path-based type matching (path_glob)
Field presence matching (fields_present)
Field value matching (where conditions in match rules)
Multi-type matching (files matching multiple types)
Multi-type validation with constraint merging (per §6.5)

Test coverage: Match rule evaluation, multi-type scenarios, constraint merging

Level 3: Querying

Additional capabilities:

Core query model (filter, sort, limit, offset)
Expression evaluation (all operators in §11)
String, list, and object methods
Date arithmetic (including date subtraction)
Duration parsing (duration())
Null coalescing (??)

Test coverage: Core query correctness, expression edge cases, body_search, computed_fields

Level 4: Links

Additional capabilities:

Parse all link formats (wikilink, markdown, bare path)
Resolve links to files
asFile() traversal in expressions
file.hasLink() and file.hasTag() functions
file.links property

Test coverage: Link parsing, resolution, traversal

Level 5: References

Additional capabilities:

Rename with reference updates
Backlink computation (file.backlinks)
Body link detection and update

Test coverage: Reference update correctness, backlink accuracy

Level 6: Full

All capabilities including:

Caching with staleness detection (per §13)
Batch operations
Watch mode with event delivery (per §15)
Type creation via tooling
Nested collection detection (per §2)
Migration manifests and backfill (per §5.11.1 and §12.8/§12.13)

Test coverage: Performance, cache correctness, watching, edge cases

14.1.1 Optional Profiles (Non-Normative)

Implementations MAY support optional profiles beyond the core levels. The Query+ profile adds advanced query features (formulas, groupBy, summaries, property_summaries, properties) as defined in Querying §10.7. Support for Query+ is not required for conformance.

14.2 Conformance Claims

Implementations SHOULD clearly state their conformance level:

mdbase-tool v1.0.0
Conformance: Level 4 (Links)
Specification: 0.2.1

Implementations MAY implement features from higher levels while claiming a lower level, but SHOULD NOT claim a level without passing all tests for that level.

14.3 Test Suite

A conformance test suite is provided as a collection of test cases. Each test group specifies a spec_ref identifying the specification section(s) under test. Individual test cases MAY also include a spec_ref to pinpoint the exact clause being validated.

The spec_ref field uses section numbers (e.g., "§7.2", "§3.4, §7.2"). When a test case omits spec_ref, it inherits the group-level reference.

name: "required field validation"
level: 1
category: validation
spec_ref: "§7.2"

setup:
  config: |
    spec_version: "0.2.1"
  types:
    task.md: |
      ---
      name: task
      fields:
        title:
          type: string
          required: true
      ---
  files:
    tasks/valid.md: |
      ---
      type: task
      title: "Valid task"
      ---
    tasks/invalid.md: |
      ---
      type: task
      ---

tests:
  - name: "valid file passes validation"
    operation: validate
    input:
      path: "tasks/valid.md"
    expect:
      valid: true
      issues: []

  - name: "missing required field fails validation"
    spec_ref: "§7.2"
    operation: validate
    input:
      path: "tasks/invalid.md"
    expect:
      valid: false
      issues:
        - code: missing_required
          field: title

Test Categories

Category	Description	Spec Reference
`config`	Configuration parsing and validation	§4
`types`	Type definition loading and inheritance	§5
`matching`	Type matching rules	§6
`validation`	Schema validation	§9
`expressions`	Expression evaluation	§11
`queries`	Query execution	§10
`links`	Link parsing and resolution	§8
`operations`	CRUD operations	§12
`references`	Reference updates	§12.5
`migration`	Migration manifests and backfill	§5.11.1 / §12.8
`caching`	Cache behavior	§13
`concurrency`	Concurrent modification detection	§12.11
`watching`	Watch mode event delivery	§15
`body_search`	Body content filtering	§10.5
`computed_fields`	Computed field evaluation	§5.12

14.3.1 Extended Test Format

The conformance test suite uses several extensions beyond the base format shown in §14.3. Test runners MUST support these extensions to execute the full suite.

Query Usage at Lower Levels

Some Level 1–2 tests use operation: query as test infrastructure for collection scanning and path discovery. Implementations claiming Level 1–2 conformance MUST support a minimal query subset sufficient for these tests:

types filtering
folder filtering
order_by with file.path
limit and offset
meta.total_count and meta.has_more

This subset does not require expression evaluation, computed fields, or body search. Full query semantics remain a Level 3 requirement.

Conformance Level Integrity (Non-Normative)

Test suites SHOULD include a guardrail that validates each test file’s declared level against the spec_ref sections it cites. This prevents accidental drift where lower-level suites require higher-level features. A simple static mapping from spec sections to conformance levels is sufficient.

Extended Operations

Operation	Description	Spec Reference
`batch_update`	Bulk update matching files	§12.7
`batch_delete`	Bulk delete matching files	§12.7
`backfill`	Backfill missing fields across files	§12.8
`create_type`	Create a type definition file	§5.9
`init`	Initialize a new collection	§12.12
`migrate`	Apply a migration manifest	§12.13
`watch`	Start filesystem watcher and simulate events	§15

Extended Input Fields

Field	Type	Description
`simulate`	object	Injects side-effects between read and write phases (e.g., `external_modify`, `external_create`, `io_error_on`)
`context_file`	string	Path to the file that provides the `this` context for embedded-query expression testing
`dry_run`	boolean	When `true`, validates changes without writing to disk

Extended Assertion Fields

Field	Type	Description
`verify_after`	object	Executes a follow-up operation after the primary test and checks its `expect` block (e.g., verifying atomicity after failure)
`frontmatter_written`	mapping	Asserts exact field values that were persisted to disk (as opposed to effective values with defaults)
`frontmatter_not_written`	list	Asserts that named fields were NOT persisted to disk
`frontmatter_not_bare_null`	list	Asserts that named fields are not written in the bare `field:` form (empty-value null)
`frontmatter_changed`	list	Asserts that listed fields have different values after the operation compared to before
`message_present`	boolean	On a validation issue, asserts that the `message` field exists and is a non-empty string
`mtime_present`	boolean	Asserts that `file.mtime` exists and is a valid datetime
`size_positive`	boolean	Asserts that `file.size` exists and is a positive integer
`ctime_present`	boolean	Asserts that `file.ctime` exists and is a valid datetime
`batch_result`	object	Asserts batch operation outcome with `total`, `succeeded`, `failed` counts

Extended Setup Fields

Field	Type	Description
`encoding`	string	File encoding for setup files (e.g., `"latin-1"`). Default is UTF-8
`line_endings`	string	Line ending style: `"LF"` or `"CRLF"`
nested collections	—	A `setup.files` entry like `sub-project/mdbase.yaml` creates a nested collection marker that the implementation must detect and exclude

Implementor On-Ramp (Non-Normative)

For practical implementation guidance and runnable fixtures, see:

`IMPLEMENTING.md` for a Level 1 quickstart
`REFERENCE-RUNNER.md` for adapter protocol details
`examples/adapter-template.py` for a minimal adapter scaffold
`QUICK-REFERENCE.md` for a compact behavior card
`examples/annotated-collection/` for a manual test collection

14.4 Required Test Coverage

For each conformance level, implementations MUST pass:

Level	Required Categories
1	config, types (basic), validation (basic), operations, concurrency
2	+ matching
3	+ expressions, queries, body_search, computed_fields
4	+ links
5	+ references
6	+ caching, watching

Out-of-Scope Checks

Some MUST-level behaviors are difficult to test portably (e.g., OS-level permission_denied errors, or logging requirements such as “do not log expanded environment variable values”). These behaviors remain normative, but the conformance suite does not currently enforce them. Implementations SHOULD document compliance for these items.

14.5 Test Execution

Test suite can be run against any implementation:

# Run all tests
python scripts/mdbase-test.py run --impl ./my-impl

# Run specific level
python scripts/mdbase-test.py run --impl ./my-impl --level 3

# Run specific category
python scripts/mdbase-test.py run --impl ./my-impl --category validation

14.6 Implementation Notes

Edge Cases to Handle

Implementations should correctly handle:

Empty frontmatter (---\n---)
File without frontmatter
Frontmatter with only null values
Empty collection (no files)
Type with no fields defined
File matching zero types
File matching multiple conflicting types
Circular type inheritance (should error)
Self-referential links
Links to non-existent files
Very long field values
Unicode in field names and values
Files with unusual characters in names

Performance Expectations

While not strictly required, implementations SHOULD:

Operation	Target	Collection Size
Read single file	< 10ms	Any
Query by type	< 100ms	1000 files
Query with filter	< 500ms	1000 files
Link resolution	< 10ms	Any
Backlink query	< 1s	1000 files (cached)

Error Messages

Implementations SHOULD provide helpful error messages:

❌ Validation failed: tasks/task-001.md

  Field 'priority' has invalid value
    Expected: integer between 1 and 5
    Actual: "high" (string)
    
  At line 5 in frontmatter:
    priority: high
              ^^^^

  Hint: Use a number like `priority: 3`

14.7 Extensions

Implementations MAY extend the specification with additional features:

Custom field types
Additional expression functions
Query output formats
Integration hooks
Custom validation rules

Extensions SHOULD:

Be clearly documented as non-standard
Not conflict with standard behavior
Be optional (spec-compliant usage should work without them)

Extensions SHOULD NOT:

Change the meaning of standard features
Require non-standard syntax for basic operations
Break interoperability with compliant tools

14.8 Reporting Issues

If the specification is ambiguous or conflicts with practical implementation needs, please report issues to the specification maintainers. The goal is a spec that is:

Clear and unambiguous
Implementable in any language
Useful for real-world applications

15. Watching

This section defines the watch mode event model for monitoring a collection for changes. Watch mode is a required capability for Level 6 conformance.

15.1 Overview

Watch mode enables implementations to monitor a collection for filesystem changes and emit structured events. This supports real-time UIs, continuous validation, and incremental cache updates.

15.2 Event Types

Event	Trigger	Payload Fields
`file_created`	New markdown file detected	`path`, `types`, `frontmatter`
`file_modified`	Existing file content changed	`path`, `types`, `frontmatter`, `changed_fields`
`file_deleted`	Markdown file removed	`path`, `last_known_types`
`file_renamed`	File moved/renamed (if detectable)	`from`, `to`, `types`
`type_changed`	Type definition file modified	`type_name`, `affected_files`
`config_changed`	`mdbase.yaml` modified	`previous_hash`, `new_hash`
`validation_error`	File fails validation after change	`path`, `issues`

When present, frontmatter in events is the effective frontmatter (defaults applied, computed excluded).

15.3 Event Payload Structure

All events include a common set of fields:

Field	Type	Description
`event`	string	Event type (e.g., `file_created`)
`timestamp`	datetime	When the event was emitted
`path`	string	File path relative to collection root

Additional fields per event type are listed in §15.2.

Example Payloads

file_created:

event: file_created
timestamp: "2024-03-15T10:30:00Z"
path: "tasks/task-042.md"
types: [task]
frontmatter:  # Effective frontmatter (defaults applied, computed excluded)
  title: "New task"
  status: open

file_modified:

event: file_modified
timestamp: "2024-03-15T10:31:00Z"
path: "tasks/task-042.md"
types: [task]
frontmatter:  # Effective frontmatter (defaults applied, computed excluded)
  title: "New task"
  status: in_progress
changed_fields: [status]  # Raw persisted frontmatter keys that changed

file_deleted:

event: file_deleted
timestamp: "2024-03-15T10:32:00Z"
path: "tasks/task-042.md"
last_known_types: [task]

file_renamed:

event: file_renamed
timestamp: "2024-03-15T10:33:00Z"
from: "tasks/task-042.md"
to: "archive/task-042.md"
types: [task]

15.4 Debouncing

Implementations MUST debounce filesystem events — multiple rapid changes to the same file MUST be coalesced into a single event.

Batch operations: Batch updates/deletes MAY emit multiple per-file events as each file is written. No aggregate batch event is required. Event ordering guarantees still apply per file.

Recommended debounce window: 100–500ms (implementation-defined)
After the debounce window, the implementation reads the file's current state and emits one event reflecting the net change
If a file is created and then immediately deleted within the debounce window, no event is emitted

15.5 Rename Detection

Filesystem watchers typically see a delete followed by a create rather than a rename.

Implementations SHOULD detect renames by correlating a delete and create within a short window (e.g., same content hash, or same id_field value)
If rename detection succeeds, emit a single file_renamed event
If rename detection fails, implementations MUST emit separate file_deleted and file_created events

15.6 Event Delivery

Implementations MUST support callback/listener registration for events
Events MUST be delivered in order per file — events for the same file are delivered sequentially. Events for different files MAY be delivered concurrently
If event processing (in a listener callback) fails, the error MUST NOT stop the watcher. Implementations SHOULD log the error and continue

15.7 Interaction with Caching

When a cache is present (see §13):

Watch events SHOULD trigger incremental cache updates
Cache updates MUST complete before the event is delivered to listeners, so that listeners see consistent state when they query the collection

Appendix A: Complete Examples

This appendix provides complete, working examples of collections and their components.

A.1 Minimal Collection

The simplest valid collection:

minimal/
├── mdbase.yaml
└── hello.md

mdbase.yaml:

spec_version: "0.2.1"

hello.md:

---
title: Hello World
---

This is a minimal collection with one untyped file.

A.2 Task Management Collection

A complete task management setup with types, queries, and examples.

Structure

tasks-project/
├── mdbase.yaml
├── _types/
│   ├── meta.md
│   ├── base.md
│   ├── task.md
│   ├── person.md
│   └── urgent.md
├── people/
│   ├── alice.md
│   └── bob.md
├── tasks/
│   ├── feature-login.md
│   ├── bug-crash.md
│   └── subtasks/
│       └── login-ui.md
└── .mdbase/
    └── (cache files)

Configuration

mdbase.yaml:

spec_version: "0.2.1"

name: "Project Tasks"
description: "Task tracking for the main project"

settings:
  exclude:
    - ".git"
    - "node_modules"
    - ".mdbase"
    - "*.draft.md"
  
  default_validation: "warn"
  default_strict: false
  id_field: "id"
  rename_update_refs: true
  write_nulls: "omit"

Type Definitions

_types/meta.md:

---
name: meta
description: Schema for type definition files

match:
  path_glob: "_types/**/*.md"

strict: false

fields:
  name:
    type: string
    required: true
  fields:
    type: any
---

_types/base.md:

---
name: base

fields:
  id:
    type: string
    required: true
    generated: ulid
  created_at:
    type: datetime
    generated: now
  updated_at:
    type: datetime
    generated: now_on_write
---

# Base Type

Common fields for all tracked entities. Provides automatic ID generation and timestamps.

_types/task.md:

---
name: task
description: A task or todo item with lifecycle tracking
extends: base

match:
  path_glob: "tasks/**/*.md"

path_pattern: "{id}.md"

fields:
  title:
    type: string
    required: true
    min_length: 1
    max_length: 200
    description: Short, descriptive task title
  
  status:
    type: enum
    values: [open, in_progress, blocked, done, cancelled]
    default: open
    description: Current lifecycle state
  
  priority:
    type: integer
    min: 1
    max: 5
    default: 3
    description: "1 = lowest, 5 = highest"
  
  assignee:
    type: link
    target: person
    description: Person responsible for this task
  
  due_date:
    type: date
    description: When the task should be completed
  
  tags:
    type: list
    items:
      type: string
    default: []
    description: Categorization tags
  
  parent:
    type: link
    target: task
    description: Parent task (for subtasks)
  
  blocks:
    type: list
    items:
      type: link
      target: task
    default: []
    description: Tasks that this task blocks
  
  estimate_hours:
    type: number
    min: 0
    description: Estimated hours to complete
---

# Task

Tasks represent discrete units of work tracked through their lifecycle.

## Status Values

| Status | Description |
|--------|-------------|
| `open` | Not started |
| `in_progress` | Currently being worked on |
| `blocked` | Waiting on external dependency |
| `done` | Completed successfully |
| `cancelled` | Will not be done |

## Priority Scale

- **5**: Critical - drop everything
- **4**: High - do this week
- **3**: Medium - normal priority
- **2**: Low - when time permits
- **1**: Someday - nice to have

## Example

```yaml
---
type: task
title: Implement user authentication
status: in_progress
priority: 4
assignee: "[[alice]]"
due_date: 2024-04-01
tags: [feature, security]
estimate_hours: 16
---

## Requirements

- OAuth 2.0 support
- Remember me functionality
- Password reset flow


**_types/person.md:**
```markdown
---
name: person
description: A team member
extends: base

match:
  path_glob: "people/**/*.md"

fields:
  name:
    type: string
    required: true
    description: Full name
  
  email:
    type: string
    description: Email address
  
  team:
    type: string
    description: Team or department
  
  role:
    type: string
    description: Job title or role
  
  active:
    type: boolean
    default: true
    description: Whether currently on the team
---

# Person

Team member records for assignment and reference.

_types/urgent.md:

---
name: urgent
description: Marks items requiring immediate attention

match:
  where:
    tags:
      contains: "urgent"

fields:
  escalation_contact:
    type: string
    description: Who to contact for escalation
  
  sla_hours:
    type: integer
    description: Hours until SLA breach
---

# Urgent

Items tagged "urgent" automatically get this type applied.
This enables additional tracking fields for urgent items.

Sample Files

people/alice.md:

---
type: person
id: alice
name: Alice Chen
email: alice@example.com
team: engineering
role: Senior Developer
active: true
---

Alice is the tech lead for the backend team.

## Expertise
- Authentication systems
- API design
- Performance optimization

tasks/feature-login.md:

---
type: task
id: feature-login
title: Implement user login system
status: in_progress
priority: 4
assignee: "[[alice]]"
due_date: 2024-04-01
tags: [feature, security, auth]
estimate_hours: 24
---

# Login System Implementation

## Overview

Build complete authentication system with OAuth support.

## Subtasks

- [[subtasks/login-ui]] - Frontend components
- Database schema design
- API endpoints
- Testing

tasks/bug-crash.md:

---
types: [task, urgent]
id: bug-crash
title: Fix crash on startup
status: open
priority: 5
assignee: "[[bob]]"
tags: [bug, urgent, production]
escalation_contact: alice@example.com
sla_hours: 4
---

# Critical: App Crashes on Startup

## Symptoms

App crashes immediately when user opens it.

## Impact

100% of users affected.

## Workaround

None known.

A.3 Query Examples

Core Examples

All Open Tasks

query:
  types: [task]
  where: 'status == "open"'
  order_by:
    - field: priority
      direction: desc

My Tasks (Assigned to Alice)

query:
  types: [task]
  where:
    and:
      - 'assignee.asFile().id == "alice"'
      - 'status != "done"'
  order_by:
    - field: due_date
      direction: asc

Tasks Due This Week

query:
  types: [task]
  where:
    and:
      - "due_date >= today()"
      - "due_date <= today() + '7d'"
      - 'status != "done"'
  order_by:
    - field: due_date
      direction: asc

High Priority Blockers

query:
  types: [task]
  where:
    and:
      - "priority >= 4"
      - 'status == "blocked"'

Urgent Items (Multi-Type)

query:
  where: 'types.contains("urgent")'
  order_by:
    - field: sla_hours
      direction: asc

Query+ Examples

Overdue Tasks (Query+)

query:
  types: [task]
  where:
    and:
      - "due_date < today()"
      - 'status != "done"'
      - 'status != "cancelled"'
  formulas:
    days_overdue: "(today() - due_date) / 86400000"  # date subtraction returns milliseconds
  order_by:
    - field: formula.days_overdue
      direction: desc

Workload by Person (Query+)

query:
  types: [task]
  where: 'status != "done" && exists(assignee)'
  formulas:
    assignee_name: "assignee.asFile().name"

To group by assignee (Query+):

query:
  types: [task]
  where: 'status != "done" && exists(assignee)'
  formulas:
    assignee_name: "assignee.asFile().name"
  groupBy:
    property: formula.assignee_name
    direction: ASC
  property_summaries:
    estimate_hours: Sum

A.4 Knowledge Base Collection

A personal wiki / knowledge base setup.

Structure

knowledge-base/
├── mdbase.yaml
├── _types/
│   ├── document.md
│   ├── concept.md
│   ├── source.md
│   └── daily.md
├── concepts/
│   ├── machine-learning.md
│   └── distributed-systems.md
├── sources/
│   └── attention-paper.md
├── daily/
│   └── 2024/
│       └── 03/
│           └── 15.md
└── inbox/
    └── random-thought.md

Types

_types/document.md:

---
name: document
description: General note or document

match:
  path_glob: "**/*.md"

fields:
  title:
    type: string
  tags:
    type: list
    items:
      type: string
    default: []
  related:
    type: list
    items:
      type: link
    default: []
---

# Document

Base type for all notes. Matched by default for any markdown file.

_types/concept.md:

---
name: concept
description: A concept or topic being studied
extends: document

match:
  path_glob: "concepts/**/*.md"

fields:
  aliases:
    type: list
    items:
      type: string
    default: []
    description: Alternative names for this concept
  
  status:
    type: enum
    values: [stub, developing, mature]
    default: stub
  
  sources:
    type: list
    items:
      type: link
      target: source
    default: []
---

# Concept

Represents a topic or idea being studied. Evolves from stub to mature.

_types/daily.md:

---
name: daily
description: Daily journal entry
extends: document

match:
  path_glob: "daily/**/*.md"

path_pattern: "{date}.md"

fields:
  date:
    type: date
    required: true
  
  mood:
    type: enum
    values: [great, good, okay, rough, bad]
  
  highlights:
    type: list
    items:
      type: string
    default: []
---

# Daily Note

Journal entries organized by date.

A.5 CLI Workflow Example

# Initialize a new collection
mkdir my-project && cd my-project
mdbase init
# Creates mdbase.yaml, _types/, and _types/meta.md

# Create a type
mdbase type create task

# Create a task
mdbase create task \
  --field title="Build the thing" \
  --field priority=4

# List all tasks
mdbase query --type task

# Find overdue tasks
mdbase query --type task --where 'due_date < today() && status != "done"'

# Update a task
mdbase update tasks/01ABC.md --field status=done

# Rename with reference updates
mdbase rename tasks/old-name.md tasks/new-name.md

# Validate the collection
mdbase validate

# Rebuild cache
mdbase cache rebuild

Appendix B: Expression Grammar

This appendix provides a formal grammar for the expression language used in filters, match conditions, and formulas.

B.1 Grammar Notation

This grammar uses Extended Backus-Naur Form (EBNF):

= defines a production rule
| denotes alternatives
[ ] denotes optional elements
{ } denotes zero or more repetitions
" " denotes literal strings
( ) groups elements
/* */ are comments

B.2 Complete Grammar

(* Top-level *)
expression = null_coalescing_expression ;

(* Null coalescing - low precedence *)
null_coalescing_expression = or_expression { "??" or_expression } ;

(* Logical operators *)
or_expression = and_expression { "||" and_expression } ;

and_expression = not_expression { "&&" not_expression } ;

not_expression = "!" not_expression
               | comparison_expression ;

(* Comparison operators *)
comparison_expression = additive_expression [ comparison_op additive_expression ] ;

comparison_op = "==" | "!=" | "<" | ">" | "<=" | ">=" ;

(* Arithmetic operators *)
additive_expression = multiplicative_expression { ( "+" | "-" ) multiplicative_expression } ;

multiplicative_expression = unary_expression { ( "*" | "/" | "%" ) unary_expression } ;

unary_expression = "-" unary_expression
                 | postfix_expression ;

(* Property access and function calls *)
postfix_expression = primary_expression { postfix_op } ;

postfix_op = "." identifier [ call_arguments ]   (* Method or property *)
           | "[" expression "]"                   (* Index access *)
           | call_arguments ;                     (* Function call *)

call_arguments = "(" [ argument_list ] ")" ;

argument_list = expression { "," expression } ;

(* Primary expressions *)
primary_expression = literal
                   | identifier
                   | "(" expression ")"
                   | if_expression
                   | list_literal ;

(* If expression *)
if_expression = "if" "(" expression "," expression "," expression ")" ;

(* Literals *)
literal = string_literal
        | number_literal
        | boolean_literal
        | null_literal ;

string_literal = '"' { string_char } '"'
               | "'" { string_char } "'" ;

string_char = /* any character except quote or backslash */
            | escape_sequence ;

escape_sequence = "\\" ( '"' | "'" | "\\" | "n" | "r" | "t" ) ;

number_literal = integer_literal [ exponent ]
               | float_literal ;

integer_literal = [ "-" ] digit { digit } ;

float_literal = [ "-" ] digit { digit } "." digit { digit } [ exponent ] ;

exponent = ( "e" | "E" ) [ "+" | "-" ] digit { digit } ;

boolean_literal = "true" | "false" ;

null_literal = "null" ;

list_literal = "[" [ expression { "," expression } ] "]" ;

(* Identifiers *)
identifier = ( letter | "_" ) { letter | digit | "_" } ;

letter = "a" | "b" | ... | "z" | "A" | "B" | ... | "Z" ;

digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;

B.3 Operator Precedence

From highest to lowest precedence:

Level	Operators	Associativity	Description
1	`( )`	—	Grouping
2	`.` `[]` `()`	Left-to-right	Property access, index, call
3	`!` `-` (unary)	Right-to-left	Logical NOT, negation
4	`*` `/` `%`	Left-to-right	Multiplication, division, modulo
5	`+` `-`	Left-to-right	Addition, subtraction
6	`<` `<=` `>` `>=`	Left-to-right	Comparison
7	`==` `!=`	Left-to-right	Equality
8	`&&`	Left-to-right	Logical AND
9	`		`
10	`??`	Left-to-right	Null coalescing

B.4 Reserved Words

The following identifiers are reserved:

true
false
null
if
note
file
formula
this

These cannot be used as bare field names without bracket notation (e.g., use note["file"] to access a frontmatter field named file).

The keywords note, file, formula, and this serve as namespace prefixes for property access (see Querying §10.5). note. accesses raw persisted frontmatter; bare field names access effective frontmatter. When a frontmatter field name collides with a namespace keyword, use the note. prefix with bracket notation: note["file"], note["formula"].

`file` Namespace Properties

The file namespace provides access to file metadata. The following properties are valid under file.:

file.name       file.basename   file.path       file.folder
file.ext        file.size       file.ctime      file.mtime
file.body       file.links      file.backlinks  file.tags
file.properties file.embeds

file.body is a string containing the raw markdown body content (everything after the frontmatter closing ---). It supports all string methods defined in §11.5.

Chained Method Calls

The grammar supports chained method calls through the recursive postfix_op production. This explicitly includes chained .asFile() for multi-hop link traversal:

assignee.asFile().manager.asFile().name

This is parsed as a sequence of postfix operations:

postfix_expression
├── identifier: "assignee"
├── method_call: "asFile" ()
├── property_access: "manager"
├── method_call: "asFile" ()
└── property_access: "name"

Implementations MUST enforce a maximum traversal depth (default: 10) to prevent unbounded chains. See §8.7 for traversal rules.

B.5 Whitespace and Comments

Whitespace (spaces, tabs, newlines) is ignored except within string literals.

Comments are not supported in the expression language. (Use YAML comments in query files instead.)

B.6 String Escaping

Within string literals:

Escape	Meaning
`\\`	Backslash
`\"`	Double quote
`\'`	Single quote
`\n`	Newline
`\r`	Carriage return
`\t`	Tab

B.7 Duration Literals

Duration literals are strings with a special format used in date arithmetic:

duration_literal = string_literal ;  (* Must match duration pattern *)

duration_pattern = number duration_unit ;

duration_unit = "y" | "year" | "years"
              | "M" | "month" | "months"
              | "w" | "week" | "weeks"
              | "d" | "day" | "days"
              | "h" | "hour" | "hours"
              | "m" | "minute" | "minutes"
              | "s" | "second" | "seconds" ;

Examples: "7d", "2 weeks", "1h", "30m"

Note: Durations are regular strings; the arithmetic operators recognize them contextually.

B.8 Parse Examples

Simple Comparison

status == "open"

Parse tree:

comparison_expression
├── additive_expression
│   └── primary_expression
│       └── identifier: "status"
├── comparison_op: "=="
└── additive_expression
    └── primary_expression
        └── string_literal: "open"

Combined Logic

priority >= 3 && status != "done"

Parse tree:

and_expression
├── comparison_expression
│   ├── identifier: "priority"
│   ├── ">="
│   └── number_literal: 3
└── comparison_expression
    ├── identifier: "status"
    ├── "!="
    └── string_literal: "done"

Method Chain (Arrow Extension Only)

tags.filter(t => t.startsWith("bug")).length > 0

Parse tree (only valid if the optional arrow-function extension is enabled):

comparison_expression
├── postfix_expression
│   ├── postfix_expression
│   │   ├── identifier: "tags"
│   │   └── method_call: "filter"
│   │       └── lambda_expression
│   │           ├── parameter: "t"
│   │           └── method_call
│   │               ├── identifier: "t"
│   │               └── method: "startsWith"
│   │                   └── string_literal: "bug"
│   └── property_access: "length"
├── ">"
└── number_literal: 0

Conditional

if(priority > 3, "high", "normal")

Parse tree:

if_expression
├── condition
│   └── comparison_expression
│       ├── identifier: "priority"
│       ├── ">"
│       └── number_literal: 3
├── then_value
│   └── string_literal: "high"
└── else_value
    └── string_literal: "normal"

B.9 Implementation Notes

Tokenization

Recommended token types:

STRING        : '"' ... '"' | "'" ... "'"
NUMBER        : [0-9]+ ('.' [0-9]+)?
IDENTIFIER    : [a-zA-Z_][a-zA-Z0-9_]*
BOOLEAN       : 'true' | 'false'
NULL          : 'null'
IF            : 'if'
OPERATOR      : '==' | '!=' | '<=' | '>=' | '<' | '>' | '&&' | '||' | '!' | '+' | '-' | '*' | '/' | '%' | '??' | '=>'
               (* include '=>' only if the optional arrow-function extension is enabled *)
PUNCTUATION   : '(' | ')' | '[' | ']' | '.' | ','

Error Recovery

When parsing fails, implementations SHOULD:

Report the position of the error
Provide context (surrounding tokens)
Suggest likely fixes for common errors

Example error message:

Expression parse error at position 15:
  status == "open" && 
                      ^
  Expected: expression
  Found: end of input
  
  Hint: Expression is incomplete after '&&'

B.10 Optional Arrow-Function Extension

Implementations MAY support arrow-function syntax in list methods. If supported, the following grammar is added for lambda expressions used within argument lists:

lambda_expression = identifier "=>" expression
                 | "(" [ parameter_list ] ")" "=>" expression ;

parameter_list = identifier { "," identifier } ;

Appendix C: Error Codes

This appendix defines standard error codes for validation issues and operation errors.

C.1 Validation Error Codes

Field Errors

Code	Description	Example
`missing_required`	Required field is absent or null	Field `title` is required
`type_mismatch`	Value doesn't match declared type	Expected integer, got string
`constraint_violation`	Value violates min/max/pattern/etc	Value 7 exceeds max of 5
`invalid_enum`	Value not in enum options	"pending" not in [open, done]
`unknown_field`	Field not in schema (strict mode)	Unknown field "custom"
`deprecated_field`	Field is marked deprecated	Field "old_name" is deprecated
`duplicate_id`	`id_field` value is not unique in collection	`id: task-001` appears in multiple files
`duplicate_value`	Field value violates cross-file `unique` constraint	`slug: "my-post"` appears in multiple files

List Errors

Code	Description	Example
`list_too_short`	List has fewer than min_items	Minimum 1 item required
`list_too_long`	List has more than max_items	Maximum 10 items allowed
`list_duplicate`	Duplicate in list with unique=true	Duplicate value "a"
`list_item_invalid`	List item fails `items` validation (item-level cause may be included)	Item [2] type mismatch

String Errors

Code	Description	Example
`string_too_short`	String shorter than min_length	Minimum 1 character required
`string_too_long`	String longer than max_length	Maximum 200 characters allowed
`pattern_mismatch`	String doesn't match pattern	Must match "^[A-Z].*"

Number Errors

Code	Description	Example
`number_too_small`	Number below min	Value -1 below min of 0
`number_too_large`	Number above max	Value 100 above max of 10
`not_integer`	Expected integer, got float	3.5 is not an integer

Link Errors

Code	Description	Example
`invalid_link`	Link cannot be parsed	Malformed wikilink
`link_not_found`	Link target doesn't exist	Target "[[missing]]" not found
`link_wrong_type`	Target is wrong type	Expected person, found task
`ambiguous_link`	Multiple candidates for a simple-name link (error on ID ambiguity, warning if still ambiguous after tiebreakers)	"[[note]]" matches notes/note.md and archive/note.md

Date/Time Errors

Code	Description	Example
`invalid_date`	Cannot parse as date	"tomorrow" is not ISO date
`invalid_datetime`	Cannot parse as datetime	Invalid datetime format
`invalid_time`	Cannot parse as time	Invalid time format

C.2 Type System Errors

Code	Description	Example
`unknown_type`	Type name not defined	Type "taks" not found
`circular_inheritance`	Type inheritance forms cycle	task → base → task
`missing_parent_type`	Parent type doesn't exist	Parent "base" not found
`type_conflict`	Multi-type field incompatibility	"status" defined as string and enum
`invalid_type_definition`	Type file has invalid schema	Missing required "name" field
`circular_computed`	Circular dependency between computed fields	`full_name` depends on `display` depends on `full_name`

C.3 Operation Errors

File Operations

Code	Description	Example
`file_not_found`	File doesn't exist	tasks/missing.md not found
`path_conflict`	File already exists at target path (on create or rename)	tasks/task.md already exists
`path_required`	Cannot determine file path	No path provided or derivable
`invalid_path`	Path is malformed (e.g., null bytes, control characters)	Path contains invalid characters
`invalid_frontmatter`	Frontmatter YAML cannot be parsed	YAML syntax error in frontmatter
`validation_failed`	Frontmatter fails validation against type schema(s)	Contains individual validation issues (see §C.1)
`invalid_request`	Operation input is missing required parameters or is otherwise invalid	Backfill requires `type` or `where`
`invalid_migration`	Migration manifest is malformed or contains invalid steps	Missing `steps` array
`migration_failed`	Migration step failed during execution	Backfill step failed validation
`permission_denied`	Filesystem permission error	Cannot write to file
`concurrent_modification`	File was modified by another process during operation	File mtime changed between read and write
`path_traversal`	Path resolution would escape collection root. Applies to any path resolution — link resolution, operation input paths, computed paths	`[[../../../etc/passwd]]` escapes root
`match_failed`	Created record does not satisfy explicit type match rules	Type match rules not satisfied by final record

Rename Operations

Code	Description	Example
`rename_ref_update_failed`	Reference update failed for one or more files	Could not update links in X

Configuration Errors

Code	Description	Example
`invalid_config`	Config file malformed	YAML parse error
`missing_config`	No mdbase.yaml found	Not a collection
`unsupported_version`	spec_version not supported	Version 0.3.0 not supported

C.4 Expression Errors

Code	Description	Example
`invalid_expression`	Expression syntax error	Unexpected token
`unknown_function`	Function doesn't exist	Unknown function "foo"
`wrong_argument_count`	Wrong number of arguments	if() requires 3 arguments
`type_error`	Type error in expression	Cannot add string and number
`expression_depth_exceeded`	Expression nesting or traversal exceeded maximum depth	General expression nesting exceeds 64-level limit (see §11.18.1), or chained `asFile()` calls exceed 10-hop limit (see §8.7)

C.5 Formula Errors

Code	Description	Example
`circular_formula`	Formula references form cycle	a refs b refs a
`invalid_formula`	Formula expression invalid	Parse error in formula
`formula_evaluation_error`	Runtime error in formula	Division by zero

C.6 Error Response Format

Errors SHOULD be returned in a consistent format:

Single Error

{
  "error": {
    "code": "file_not_found",
    "message": "File 'tasks/missing.md' not found",
    "path": "tasks/missing.md"
  }
}

Validation Errors

{
  "valid": false,
  "errors": [
    {
      "path": "tasks/task-001.md",
      "field": "priority",
      "code": "constraint_violation",
      "message": "Value 7 exceeds maximum of 5",
      "severity": "error",
      "expected": { "max": 5 },
      "actual": 7,
      "type": "task",
      "line": 5,
      "column": 11,
      "end_line": 5,
      "end_column": 12
    },
    {
      "path": "tasks/task-001.md",
      "field": "custom_field",
      "code": "unknown_field",
      "message": "Field 'custom_field' is not defined in type 'task'",
      "severity": "warning",
      "type": "task",
      "line": 8,
      "column": 1,
      "end_line": 8,
      "end_column": 27
    }
  ],
  "warnings": 1,
  "errorCount": 1
}

C.7 Error Severity

Severity	Description	Effect
`error`	Definite problem	Fails validation at error level
`warning`	Potential problem	Reported but doesn't fail
`info`	Informational	Logged, no effect

C.8 Human-Readable Messages

Error messages SHOULD be:

Clear: State what went wrong
Specific: Include relevant values
Actionable: Suggest how to fix

Good example:

Field 'priority' has value 7, but maximum allowed is 5.
Change the value to 5 or less.

Bad example:

Constraint violation on priority.

C.9 Exit Codes (CLI)

For CLI implementations:

Code	Description
0	Success
1	General error
2	Validation error(s)
3	Configuration error
4	File not found
5	Permission denied

Appendix D: Compatibility Notes

This appendix describes compatibility with existing tools and migration paths from other systems.

D.1 Obsidian Bases Compatibility

This specification was designed with Obsidian Bases compatibility as a goal. Many expression and query patterns are directly compatible.

Compatible Features

Feature	This Spec	Obsidian Bases
Property access	`status`, `file.name`	Same
Comparison	`==`, `!=`, `<`, `>`, `<=`, `>=`	Same
Boolean logic	`&&`, `\|\|`, `!`	Same
Date functions	`now()`, `today()`	Same
Date arithmetic	`date + "7d"`	Same
String methods	`.contains()`, `.startsWith()`	Same
List methods	`.contains()`, `.length`	Same
Link traversal	`link.asFile()`	Same
File metadata	`file.mtime`, `file.path`, `file.tags`	Same
Context	`this.file`, `this.property`	Same
Logical structure	`and:`, `or:`, `not:` in YAML	Same
Type checking	`.isType("string")`	Same
Type conversion	`number()`, `list()`, `.toString()`	Same
List methods	`.unique()`, `.reduce()`, `.reverse()`	Same
String methods	`.lower()`, `.upper()`, `.split()`	Same
Summaries	`values` keyword, default functions	Same
Grouping	`groupBy` (single property)	Same

Extended Features

This specification adds features not in Obsidian Bases:

Type definitions as markdown files (version-controlled schemas)
Multi-type matching with constraint merging
Formal validation with error codes and levels
Rename with reference updates
CRUD operations specification
Generated fields (ULID, UUID, timestamps)
Filename patterns with slug generation
Match rules for automatic type assignment
Nested collection detection
Security considerations (ReDoS, resource limits)

Differences

Aspect	This Spec	Obsidian Bases
Type storage	Markdown files in types folder	Obsidian internal
Configuration	`mdbase.yaml`	Obsidian settings
Views	Not specified (query only)	Table, Board, Gallery, etc.
Grouping	`groupBy` clause (single property, per §10.7)	Built-in groupBy
Summaries	`property_summaries` and custom `summaries` (per §10.7, §11.14)	Built-in summary functions
Lambda style	Implicit variables (`value`, `index`, `acc`); arrow syntax optional	Implicit variables
Method names	`.lower()`, `.upper()`, `.title()`	Same

Optional Compatibility Profile (Non-Normative)

Implementations MAY provide an optional "Bases compatibility" profile that mirrors Obsidian Bases query and expression behavior. This is not a required part of conformance. If provided, tools SHOULD document:

Which Bases features are supported
Any behavioral differences
How to enable the profile (if applicable)

Migration from Bases Queries

Most Bases queries work directly:

# Bases query
types: [task]
where: 'status == "open"'
order_by:
  - field: due_date
    direction: asc

# This spec: identical!

D.2 Dataview Compatibility

Dataview is a popular Obsidian plugin with its own query language. Here's how to migrate common patterns.

Query Migration

Dataview	This Spec
`FROM "tasks"`	`folder: "tasks"`
`WHERE status = "open"`	`where: 'status == "open"'`
`WHERE contains(tags, "urgent")`	`where: 'tags.contains("urgent")'`
`SORT due_date ASC`	`order_by: [{field: due_date, direction: asc}]`
`LIMIT 10`	`limit: 10`

Full Example

Dataview:

TABLE title, status, due_date
FROM "tasks"
WHERE status != "done"
SORT due_date ASC
LIMIT 20

This Spec:

query:
  folder: "tasks"
  where: 'status != "done"'
  order_by:
    - field: due_date
      direction: asc
  limit: 20

Unsupported Dataview Features

Feature	Notes
Inline fields (`field::`)	Not supported; use frontmatter
TABLE format	Implementations define output format
LIST format	Implementations define output format
TASK queries	Use `where` with checkbox fields
CALENDAR view	Implementation-specific
DataviewJS	Not applicable

D.3 Hugo/Jekyll Front Matter

Static site generators use frontmatter similarly but with different conventions.

Hugo

Hugo uses specific frontmatter keys:

Hugo	This Spec
`title`	Same (user-defined)
`date`	Same (user-defined)
`draft`	Same (user-defined)
`weight`	Same (user-defined)
`taxonomies`	Use `list` fields

Migration: Hugo content mostly works directly. Define types that match your content structure.

Jekyll

Jekyll collections map naturally:

# Jekyll _config.yml
collections:
  posts:
    output: true
  projects:
    output: true

# This spec mdbase.yaml
spec_version: "0.2.1"
settings:
  types_folder: "_types"
# Define post and project types

D.4 Notion Export Compatibility

When exporting from Notion to markdown:

Database Properties → Frontmatter

Notion databases export properties as frontmatter:

---
title: My Page
Status: In Progress
Due Date: 2024-03-15
Tags:
  - important
  - review
---

Type Creation

Create types matching your Notion databases:

# _types/notion-task.md
---
name: notion-task
fields:
  title:
    type: string
  Status:
    type: enum
    values: [Not Started, In Progress, Done]
  "Due Date":
    type: date
  Tags:
    type: list
    items:
      type: string
---

Note: Notion uses spaces in property names. Use bracket notation in queries: note["Due Date"].

D.5 Logseq Compatibility

Logseq uses a block-based structure with page properties.

Page Properties

Logseq page properties map to frontmatter:

title:: My Page
status:: open
tags:: #task #urgent

Becomes:

---
title: My Page
status: open
tags: [task, urgent]
---

Block Properties

Logseq block properties don't have a direct equivalent. Consider:

Converting important blocks to separate files
Using structured frontmatter objects
Using the body for detailed content

D.6 Tana Compatibility

Tana exports use a specific JSON/markdown format. Key mappings:

Tana	This Spec
Supertags	Types
Fields	Frontmatter fields
References	Links

D.7 Migration Strategies

Incremental Migration

Start untyped: Import files without types
Add types gradually: Create types for one category at a time
Enable validation: Move from off to warn to error
Enforce strictness: Enable strict: true when ready

Automated Migration

# 1. Initialize collection
mdbase init

# 2. Scan existing files and suggest types
mdbase infer-types --output _types/

# 3. Review and adjust generated types
# (manual step)

# 4. Validate and fix issues
mdbase validate --fix --dry-run
mdbase validate --fix

Handling Legacy Fields

For fields that don't fit the new schema:

# Option 1: Type with any field for legacy data
fields:
  legacy:
    type: any
    deprecated: true

# Option 2: Loose strictness during migration
strict: false  # Allow unknown fields

D.8 Tool-Specific Notes

VS Code

Extensions can parse mdbase.yaml for IntelliSense
Frontmatter validation possible via YAML schemas
Query preview via custom webviews

Vim/Neovim

YAML syntax highlighting works for frontmatter
Custom commands can invoke CLI tools
LSP integration possible for validation

Emacs

Org-mode users: consider bidirectional sync
Markdown-mode with YAML support
Custom functions for query execution

D.9 Interoperability Best Practices

Stick to common field types: String, integer, date, list work everywhere
Avoid tool-specific features: Keep frontmatter portable
Use standard date formats: ISO 8601 always
Keep links simple: Wikilinks are most portable
Document your schema: Types are self-documenting
Version your types: Track schema changes in git

#Typed Markdown Collections Specification

#Abstract

#Motivation

#Intended implementers

#What a conforming tool does

#Design Principles

#How It Works

#A collection is a folder with an `mdbase.yaml` marker

#Types are defined as markdown files

#Records are markdown files with typed frontmatter

#Queries filter and sort records using expressions

#Validation is progressive

#Links connect records across the collection

#Conformance is levelled

#Specification Structure

#Versioning

#License

#1. Terminology

#Core Concepts

#Operations and Queries

#Links and References

#Implementation Terms

#RFC 2119 Keywords

#2. Collection Layout

#2.1 Identifying a Collection

#2.2 File Discovery

#Included Files

#Excluded Paths

#Symlinks

#Subdirectory Scanning

#2.3 The Types Folder

#2.4 Path Conventions

#2.5 Reserved Names

#2.6 Minimal Collection Example

#2.7 Recommended Collection Structure

#2.8 Nested Collections

#2.9 Non-Markdown Files

#Status in the Collection

#File Discovery

#Example

#3. Frontmatter Parsing and Serialization

#3.1 Frontmatter Delimiters

#3.2 YAML Parsing Requirements

#Top-Level Structure

#YAML Version

#Character Encoding

#3.3 Null and Empty Value Semantics

#Reading Null Values

#Empty String vs Null

#Missing vs Null

#Summary Table

#Presence vs Meaningful Value

#3.4 Writing Frontmatter

#Never Write Empty-Value Nulls

#Writing Null Values

#Writing Empty Strings

#Writing Empty Lists

#3.5 Formatting Preservation

#SHOULD Preserve

#MUST Preserve

#MAY Normalize

#3.6 Special Characters in Field Names

#3.7 Multi-line Strings

#3.8 Type Coercion

#3.9 Example: Round-Trip Preservation

#4. Configuration

#4.0 File Encoding

#4.1 File Location and Format

#4.2 Minimal Configuration

#4.3 Full Configuration Schema

#4.4 Setting Details

#`spec_version` (Required)

4.4.1 Version Compatibility

#`name` and `description`

#`settings.extensions`

#`settings.exclude`

#`settings.types_folder`

#`settings.migrations_folder`

#`settings.explicit_type_keys`

#`settings.write_defaults`

Typed Markdown Collections Specification

Abstract

Motivation

Intended implementers

What a conforming tool does

Design Principles

How It Works

A collection is a folder with an `mdbase.yaml` marker

Types are defined as markdown files

Records are markdown files with typed frontmatter

Queries filter and sort records using expressions

Validation is progressive

Links connect records across the collection

Conformance is levelled

Specification Structure

Versioning

License

1. Terminology

Core Concepts

Operations and Queries

Links and References

Implementation Terms

RFC 2119 Keywords

2. Collection Layout

2.1 Identifying a Collection

2.2 File Discovery

Included Files

Excluded Paths

Symlinks

Subdirectory Scanning

2.3 The Types Folder

2.4 Path Conventions

2.5 Reserved Names

2.6 Minimal Collection Example

2.7 Recommended Collection Structure

2.8 Nested Collections

2.9 Non-Markdown Files

Status in the Collection

File Discovery

Example

3. Frontmatter Parsing and Serialization

3.1 Frontmatter Delimiters

3.2 YAML Parsing Requirements

Top-Level Structure

YAML Version

Character Encoding

3.3 Null and Empty Value Semantics

Reading Null Values

Empty String vs Null

Missing vs Null

Summary Table

Presence vs Meaningful Value

3.4 Writing Frontmatter

Never Write Empty-Value Nulls

Writing Null Values

Writing Empty Strings

Writing Empty Lists

3.5 Formatting Preservation

SHOULD Preserve

MUST Preserve

MAY Normalize

3.6 Special Characters in Field Names

3.7 Multi-line Strings

3.8 Type Coercion

3.9 Example: Round-Trip Preservation

4. Configuration

4.0 File Encoding

4.1 File Location and Format

4.2 Minimal Configuration

4.3 Full Configuration Schema

4.4 Setting Details

`spec_version` (Required)

`name` and `description`

`settings.extensions`

`settings.exclude`

`settings.types_folder`

`settings.migrations_folder`

`settings.explicit_type_keys`

`settings.write_defaults`

`settings.default_validation`