SPECIFICATION v0.1.0

Typed Markdown Collections Specification

Version: 0.1.0
Status: Draft
Last Updated: 2026-01-30


Abstract

This specification defines the behaviour of tools that treat folders of markdown files as typed, queryable data collections. It covers schema definition, field types, validation, querying, and CRUD operations.


Motivation

Markdown files with YAML frontmatter are a common way to store structured content. The pattern appears in static site generators, knowledge management tools like Obsidian, documentation systems, and increasingly in AI agent frameworks that use markdown for persistent state.

Each of these ecosystems has developed its own conventions for frontmatter structure, querying, and validation. This specification defines one coherent set of behaviours so that:

  • A CLI tool and an editor plugin can operate on the same files with consistent semantics
  • An AI agent can read and write markdown files that a human can also inspect and edit
  • Tool authors have a behaviour contract to implement against rather than inventing new conventions

Intended implementers

CLI tools for querying, validating, and manipulating markdown collections from the command line.

Editor plugins (for Obsidian, VS Code, etc.) that provide validation, autocomplete, and query interfaces. The expression syntax is designed for compatibility with Obsidian Bases.

Libraries in various languages that other applications can use to work with typed markdown.

AI agent frameworks that need structured, human-readable persistent storage.


What a conforming tool does

A tool implementing this specification:

  1. Recognises collections by the presence of an mdbase.yaml config file
  2. Loads type definitions from markdown files in a designated folder
  3. Matches files to types based on explicit declaration or configurable rules
  4. Validates frontmatter against type schemas, reporting errors at configurable severity levels
  5. Executes queries using an expression language for filtering and sorting (with optional advanced features like grouping and summaries)
  6. Performs CRUD operations with validation, default values, and auto-generated fields
  7. Updates references when files are renamed, keeping links consistent across the collection

The specification defines the expected behaviour for each of these capabilities, along with conformance levels for partial implementations.


Design Principles

Files are the source of truth. Tools read from and write to the filesystem. Indexes and caches are derived and disposable.

Human-readable first. Tools should not require proprietary formats. A user with a text editor should be able to read and modify any file.

Progressive strictness. Tools should work on collections with no schema at all. Validation is opt-in and configurable.

Portable. Collections should work with any conforming tool. No vendor lock-in.

Git-friendly. All persistent state is text files suitable for version control.


How It Works

A collection is a folder with an `mdbase.yaml` marker

my-project/
├── mdbase.yaml            # Marks this folder as a collection
├── _types/                # Type definitions (schemas)
│   ├── task.md
│   └── person.md
├── tasks/
│   ├── fix-bug.md         # A record of type "task"
│   └── write-docs.md
└── people/
    └── alice.md           # A record of type "person"

The minimal config just declares the spec version:

# mdbase.yaml
spec_version: "0.1.0"

Types are defined as markdown files

A type is a schema for a category of files. Types live in the _types/ folder and are themselves markdown — the frontmatter defines the schema, the body documents it.

---
name: task
fields:
  title:
    type: string
    required: true
  status:
    type: enum
    values: [open, in_progress, done]
    default: open
  priority:
    type: integer
    min: 1
    max: 5
  assignee:
    type: link
    target: person
---

# Task

A task represents a unit of work. Set `status` to track progress.

Records are markdown files with typed frontmatter

A file declares its type and provides field values in frontmatter. The body is free-form markdown.

---
type: task
title: Fix the login bug
status: in_progress
priority: 4
assignee: "[[alice]]"
tags: [bug, auth]
---

The login form throws a validation error when the email contains a `+` character.

Queries filter and sort records using expressions

Queries are YAML objects with optional clauses for filtering, sorting, and pagination:

query:
  types: [task]
  where:
    and:
      - 'status != "done"'
      - "priority >= 3"
  order_by:
    - field: due_date
      direction: asc
  limit: 20

The expression language supports field access, comparison, boolean logic, string and list methods, date arithmetic, and link traversal:

status == "open" && tags.contains("urgent")
due_date < today() + "7d"
assignee.asFile().team == "engineering"

Validation is progressive

Collections work with no types at all — every file is an untyped record. Types can be added incrementally, and validation severity is configurable per-collection or per-type (off, warn, error). Strictness controls whether unknown fields are allowed, warned, or rejected.

Records can reference each other using wikilinks ([[alice]]) or markdown links ([Alice](../people/alice.md)). When a file is renamed, conforming tools update all references automatically.

Conformance is levelled

Implementations don't need to support everything. Six conformance levels let tools start with basic CRUD (Level 1) and progressively add matching, querying, links, reference updates, and caching. See §14 for details.


Specification Structure

| Document | Description |
|---|---|
| 01-terminology.md | Definitions of key terms |
| 02-collection-layout.md | How tools identify and scan collections |
| 03-frontmatter.md | Frontmatter parsing, null semantics, serialization |
| 04-configuration.md | The mdbase.yaml configuration file |
| 05-types.md | Type definitions as markdown files |
| 06-matching.md | How tools match files to types |
| 07-field-types.md | Primitive and composite field types |
| 08-links.md | Link syntax, parsing, resolution |
| 09-validation.md | Validation levels and error reporting |
| 10-querying.md | Query model, filters, sorting |
| 11-expressions.md | Expression language for filters and formulas |
| 12-operations.md | Create, Read, Update, Delete, Rename |
| 13-caching.md | Optional caching and indexing |
| 14-conformance.md | Conformance levels and testing |
| 15-watching.md | Watch mode and change events |
| Appendix A | Complete examples |
| Appendix B | Formal expression grammar |
| Appendix C | Standard error codes |
| Appendix D | Compatibility with existing tools |

Versioning

This specification uses semantic versioning. The current version is 0.1.0, indicating a draft in active development. Breaking changes may occur before 1.0.0.

Tools should declare which specification version they implement and should reject configuration files with unsupported spec_version values.


License

This specification is released under CC BY 4.0.

1. Terminology

This section defines the key terms used throughout the specification. Understanding these definitions is essential for correctly interpreting the requirements.


Core Concepts

Collection
A directory (and optionally its subfolders) containing markdown files managed as a unit. A collection is identified by the presence of an mdbase.yaml configuration file at its root. The collection is the fundamental unit of organization—all operations, queries, and validations occur within the scope of a single collection.

Collection Root
The directory containing the mdbase.yaml configuration file. All paths in the specification are relative to this directory unless otherwise stated.

File (or Record)
A markdown file within a collection. Files have the extension .md (or optionally .mdx or .markdown if configured). Each file represents a single record in the collection—analogous to a row in a database table, but richer: it has structured frontmatter, unstructured body content, and file system metadata.

Frontmatter
YAML metadata at the beginning of a file, delimited by --- markers. The frontmatter is a YAML mapping (object) containing the structured fields of the record.

---
title: My Document
status: draft
tags: [important, review]
---

The body content begins here.

Body
The markdown content following the frontmatter. The body is treated as opaque content by default—this specification primarily concerns itself with frontmatter structure. Implementations MAY support body queries, but this is optional.

Type
A named schema defining the expected frontmatter fields, their types, constraints, and validation rules for a category of files. Types are themselves defined as markdown files in a designated folder, making them versionable and documentable. A file may be associated with zero, one, or multiple types.

Type Definition File
A markdown file in the types folder whose frontmatter defines a type schema. The body of the file can contain documentation, examples, and usage notes for the type.

Untyped File
A file that is not associated with any type. Untyped files are valid members of a collection—they simply have no schema constraints applied to them. This allows for gradual adoption of typing.

Config File
The mdbase.yaml file at the collection root. This file defines global settings, the location of type definitions, and collection-wide behavior. It does not contain type definitions themselves—those live in separate markdown files.


Operations and Queries

Expression
A string that evaluates to a value, used in query filters and computed formulas. Expressions follow the syntax defined in the Expression Language section.

Query
A request to retrieve files matching certain criteria. Queries can filter by type, field values, file metadata, and path patterns, with results sorted and paginated.

Formula
A computed field defined by an expression. Formulas are evaluated at query time and can be used for filtering, sorting, and display.

Validation
The process of checking whether a file's frontmatter conforms to the schemas of its matched types. Depending on the configured validation level (off, warn, or error), validation is skipped, reports issues without failing, or reports issues and fails the operation.


Links and References

Link
A reference from one file to another, expressed in frontmatter or body content. Links can use wikilink syntax ([[target]]), markdown link syntax ([text](path.md)), or bare paths.

Resolution
The process of determining which file a link refers to. Resolution takes into account relative paths, collection-wide search, and optional type-scoped lookups.

Backlink
An incoming link—a reference to a file from another file. Backlinks require indexing to compute efficiently and are an optional feature.


Implementation Terms

Implementation
Any tool, library, or application that reads, writes, or operates on collections according to this specification.

Conformance Level
A defined subset of the specification that an implementation may claim to support. See Conformance for the defined levels.

Cache
An optional derived data store that accelerates queries. Caches MUST be rebuildable from the source files and MUST NOT be the source of truth.


RFC 2119 Keywords

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

In brief:

  • MUST / REQUIRED / SHALL: Absolute requirement
  • MUST NOT / SHALL NOT: Absolute prohibition
  • SHOULD / RECOMMENDED: There may be valid reasons to ignore, but implications must be understood
  • SHOULD NOT / NOT RECOMMENDED: There may be valid reasons to do it, but implications must be understood
  • MAY / OPTIONAL: Truly optional; implementations may or may not include the feature

2. Collection Layout

This section defines how collections are identified, how files are discovered, and the overall structure of a compliant collection.


2.1 Identifying a Collection

A directory is recognized as a collection if and only if it contains a file named mdbase.yaml at its root. This file is the collection root marker. The directory containing this file is the collection root, and all paths in the specification are relative to this directory.

my-project/
├── mdbase.yaml         # Collection root marker
├── _types/             # Type definition files (configurable location)
│   ├── task.md
│   └── note.md
├── tasks/
│   ├── fix-bug.md
│   └── write-docs.md
├── notes/
│   └── meeting-2024-01.md
└── README.md

If a directory does not contain mdbase.yaml, it is not a collection. Implementations MUST NOT treat arbitrary directories of markdown files as collections without this marker.


2.2 File Discovery

Included Files

Implementations MUST scan the collection root and, by default, all subdirectories (recursively) for markdown files.

A file is considered a markdown file if:

  1. It has the extension .md, OR
  2. It has an extension listed in settings.extensions in the config file (e.g., .mdx, .markdown)

Implementations MUST treat files with these extensions as collection members (records).

Excluded Paths

Implementations MUST exclude certain paths from scanning:

  1. The mdbase.yaml config file itself (it is not a record)
  2. Paths listed in settings.exclude in the config file
  3. The types folder (by default _types/, configurable via settings.types_folder)
  4. The cache folder if present (by default .mdbase/)

Default exclusions (applied unless overridden):

  • .git
  • node_modules
  • .mdbase

Subdirectory Scanning

By default, implementations MUST scan subdirectories recursively. This behavior can be disabled by setting settings.include_subfolders: false in the config file.

When subdirectory scanning is disabled, only files directly in the collection root are considered records.
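The discovery rules above can be sketched in Python roughly as follows. This is a minimal illustration, not a reference implementation: the function name `discover_records` and its parameters are hypothetical, glob matching is simplified to `fnmatch` against the collection-relative path, and directories that contain their own mdbase.yaml are skipped as described in §2.8.

```python
import fnmatch
import os

DEFAULT_EXCLUDE = [".git", "node_modules", ".mdbase"]


def discover_records(root, extensions=(), exclude=DEFAULT_EXCLUDE,
                     include_subfolders=True, types_folder="_types",
                     cache_folder=".mdbase"):
    """Return sorted collection-relative record paths using forward slashes."""
    # .md is always included; configured extensions are normalized to no-dot.
    exts = {"md"} | {e.lstrip(".") for e in extensions}
    skip = [p.rstrip("/") for p in list(exclude) + [types_folder, cache_folder]]

    def excluded(rel):
        # Simplification: a pattern excludes a path when it matches the whole
        # relative path or names one of its parent folders.
        return any(fnmatch.fnmatchcase(rel, p) or rel.startswith(p + "/")
                   for p in skip)

    records = []
    for dirpath, dirnames, filenames in os.walk(root):
        rel_dir = os.path.relpath(dirpath, root).replace(os.sep, "/")
        rel_dir = "" if rel_dir == "." else rel_dir
        if not include_subfolders:
            dirnames[:] = []  # only files directly in the root are records
        else:
            kept = []
            for d in dirnames:
                rel = f"{rel_dir}/{d}" if rel_dir else d
                nested = os.path.exists(os.path.join(dirpath, d, "mdbase.yaml"))
                if not excluded(rel) and not nested:
                    kept.append(d)
            dirnames[:] = kept  # prune excluded and nested-collection folders
        for name in filenames:
            rel = f"{rel_dir}/{name}" if rel_dir else name
            ext = os.path.splitext(name)[1].lstrip(".")
            if ext in exts and not excluded(rel):
                records.append(rel)
    return sorted(records)
```

Because mdbase.yaml does not carry a markdown extension, it never enters the record set and needs no special case in this sketch.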


2.3 The Types Folder

Type definitions are stored as markdown files in a designated folder. By default, this folder is _types/ at the collection root, but it can be configured via settings.types_folder.

# mdbase.yaml
settings:
  types_folder: "_schemas"  # Use _schemas/ instead of _types/

The types folder:

  • MUST be excluded from the regular file scan (type files are not records)
  • MUST be scanned separately to load type definitions
  • MAY contain subdirectories for organization (all .md files are scanned)

See Types for the format of type definition files.


2.4 Path Conventions

All paths in the specification and in queries use forward slashes (/) regardless of operating system. Implementations on Windows MUST normalize backslashes to forward slashes.

Paths are relative to the collection root unless explicitly stated otherwise.

Examples:

  • tasks/fix-bug.md refers to a file in the tasks subdirectory
  • ./sibling.md in a link is relative to the containing file's directory
  • ../other/file.md in a link navigates up one directory
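As a small illustration of these conventions, the following Python sketch normalizes separators and resolves a relative link against the containing file's directory. The helper names `normalize_path` and `resolve_relative_link` are hypothetical and chosen only for this example.

```python
import posixpath


def normalize_path(path):
    """Use forward slashes regardless of the host operating system."""
    return path.replace("\\", "/")


def resolve_relative_link(containing_file, target):
    """Resolve ./ and ../ targets against the containing file's directory."""
    base = posixpath.dirname(normalize_path(containing_file))
    return posixpath.normpath(posixpath.join(base, normalize_path(target)))


print(normalize_path("tasks\\fix-bug.md"))
print(resolve_relative_link("tasks/fix-bug.md", "./sibling.md"))
print(resolve_relative_link("tasks/fix-bug.md", "../other/file.md"))
```

Using `posixpath` rather than `os.path` keeps the result identical on every platform, which matches the forward-slash requirement.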

2.5 Reserved Names

The following names have special meaning and SHOULD NOT be used as regular record filenames:

| Name | Purpose |
|---|---|
| mdbase.yaml | Collection configuration |
| _types/ | Default types folder (configurable) |
| .mdbase/ | Default cache folder |

Implementations SHOULD warn if a user attempts to create a record with a reserved name.


2.6 Minimal Collection Example

The simplest valid collection consists of a config file and zero or more markdown files:

minimal/
├── mdbase.yaml
└── hello.md

mdbase.yaml:

spec_version: "0.1.0"

hello.md:

---
title: Hello World
---

This is a minimal collection with one untyped file.

This collection has no types defined. The single file is untyped but still a valid record. As the collection grows, types can be added incrementally.


2.7 Recommended Structure

While the specification allows flexibility, the following structure is recommended for clarity:

my-collection/
├── mdbase.yaml             # Required: collection configuration
├── _types/                 # Type definitions
│   ├── task.md
│   ├── note.md
│   └── person.md
├── .mdbase/                # Cache (gitignored)
│   └── index.sqlite
├── tasks/                  # Records organized by type or purpose
│   ├── task-001.md
│   └── task-002.md
├── notes/
│   └── weekly-review.md
├── people/
│   └── alice.md
└── README.md               # Documentation (also a record unless excluded)

Note that README.md is a valid record in this structure. If you want to exclude documentation files from the collection, add them to settings.exclude.


2.8 Nested Collections

If a subdirectory within a collection also contains an mdbase.yaml file, it defines an independent nested collection:

  • The parent collection's scan MUST NOT descend into the nested collection.
  • The nested collection's files are NOT records of the parent collection.
  • The nested mdbase.yaml acts as a boundary marker, similar to exclude patterns.
  • Implementations SHOULD automatically exclude directories containing mdbase.yaml.

my-collection/
├── mdbase.yaml          # Parent collection
├── tasks/
│   └── task-001.md      # Record in parent
└── sub-project/
    ├── mdbase.yaml      # Nested collection (independent)
    └── docs/
        └── readme.md    # Record in sub-project, NOT in parent

This behavior ensures that collections remain self-contained and do not interfere with each other.


2.9 Non-Markdown Files

Collections may contain non-markdown files such as images, PDFs, or other binary assets. These files are NOT records — they have no frontmatter and are not returned by queries.

Status in the Collection

  • Non-markdown files are valid link targets: [[photo.png]] and ![img](photo.png) resolve normally
  • They appear in file.links when referenced via non-embed link syntax (e.g., [[photo.png]])
  • They appear in file.embeds when referenced via embed syntax (![[...]] or ![](...))
  • Non-markdown files are not assigned types and cannot be validated against type schemas

File Discovery

  • File discovery MUST skip non-markdown files when building the record set
  • File discovery MUST NOT skip non-markdown files during link resolution — links to images, PDFs, and other assets MUST resolve by path
  • Implementations MUST resolve links to non-markdown files by path only (no id_field lookup, since non-markdown files are not records and have no frontmatter)

Example

my-collection/
├── mdbase.yaml
├── tasks/
│   └── task-001.md        # Record (included in queries)
├── images/
│   ├── diagram.png        # Not a record, but valid link target
│   └── screenshot.jpg     # Not a record, but valid link target
└── attachments/
    └── report.pdf         # Not a record, but valid link target

In task-001.md:

---
type: task
title: Fix the layout
---

See the [[images/diagram.png]] for reference.

![[images/screenshot.jpg]]

Here, diagram.png appears in file.links because it uses non-embed link syntax; screenshot.jpg appears in file.embeds because it uses embed syntax.
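The link/embed distinction in this example could be computed along these lines. This is a rough sketch under stated assumptions: the function name `classify_references` is hypothetical, the regular expressions cover only the plain `[[target]]`, `![[target]]`, `[text](path)`, and `![text](path)` forms shown in this section, and results are grouped by syntax rather than returned in document order.

```python
import re

# Optional leading "!" marks an embed; group 2 is the raw target.
WIKILINK = re.compile(r"(!?)\[\[([^\]]+)\]\]")
MARKDOWN_LINK = re.compile(r"(!?)\[[^\]]*\]\(([^)\s]+)\)")


def classify_references(body):
    """Return (links, embeds) found in a markdown body."""
    links, embeds = [], []
    for pattern in (WIKILINK, MARKDOWN_LINK):
        for bang, target in pattern.findall(body):
            (embeds if bang else links).append(target.strip())
    return links, embeds


body = "See the [[images/diagram.png]] for reference.\n\n![[images/screenshot.jpg]]\n"
print(classify_references(body))
```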

3. Frontmatter Parsing and Serialization

This section defines how frontmatter is parsed from files and how it should be written back. Correct handling of YAML edge cases—especially null values and empty fields—is essential for interoperability.


3.1 Frontmatter Delimiters

A file MAY begin with YAML frontmatter. Frontmatter is delimited by two lines consisting of exactly three hyphens (---):

---
title: My Document
status: draft
---

# Heading

Body content begins here.

Rules:

  1. The opening --- MUST be the very first line of the file (no leading whitespace or blank lines).

  2. The closing --- MUST be on its own line.

  3. The content between the delimiters MUST be valid YAML.

  4. If a file does not begin with ---, it has no frontmatter. The entire file is treated as body content, and the record has an empty frontmatter mapping ({}).

Examples of files without frontmatter:

# Just a heading

No frontmatter here.

---
This is not frontmatter because there's a blank line before the dashes.
---
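The delimiter rules can be sketched as a small splitter. The function name `split_frontmatter` is hypothetical, and treating an unclosed opening delimiter as "no frontmatter" is an assumption of this sketch, since the rules above do not cover that case.

```python
def split_frontmatter(text):
    """Return (yaml_text, body); yaml_text is None when there is no frontmatter."""
    lines = text.split("\n")
    # Rule 1: the opening delimiter must be the very first line.
    if lines[0].rstrip("\r") != "---":
        return None, text
    # Rule 2: the closing delimiter must be on its own line.
    for i in range(1, len(lines)):
        if lines[i].rstrip("\r") == "---":
            return "\n".join(lines[1:i]), "\n".join(lines[i + 1:])
    # Assumption: an unclosed block is treated as body content.
    return None, text


doc = "---\ntitle: My Document\nstatus: draft\n---\n\n# Heading\n"
yaml_text, body = split_frontmatter(doc)
print(repr(yaml_text))
print(repr(body))
```

The returned YAML text still has to be parsed and checked against rule 3 by a YAML library; that step is left out here to keep the sketch free of third-party dependencies.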

3.2 YAML Parsing Requirements

Top-Level Structure

The frontmatter MUST parse as a YAML mapping (object/dictionary).

If the frontmatter parses as a different YAML type (scalar, list, null), implementations MUST treat it as invalid frontmatter and handle according to the validation level:

  • off: Treat as empty frontmatter, log warning
  • warn: Treat as empty frontmatter, emit warning
  • error: Fail the operation

Invalid example:

---
- item1
- item2
---

This is a YAML list, not a mapping, and is invalid frontmatter.
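One possible way to apply the three validation levels to an already-parsed frontmatter value is sketched below. The names `ensure_mapping` and `InvalidFrontmatter` are hypothetical, and the reading of "log warning" (write to a log only) versus "emit warning" (return the issue to the caller) is an assumption made for illustration.

```python
import logging


class InvalidFrontmatter(ValueError):
    """Raised at level "error" when frontmatter is not a YAML mapping."""


def ensure_mapping(parsed, level="warn"):
    """Return (frontmatter, issues) for a parsed frontmatter value."""
    if isinstance(parsed, dict):
        return parsed, []
    message = f"frontmatter must be a mapping, got {type(parsed).__name__}"
    if level == "error":
        raise InvalidFrontmatter(message)
    if level == "off":
        # Assumption: "log warning" goes to the log and is not reported.
        logging.getLogger("mdbase").warning(message)
        return {}, []
    # "warn": continue with empty frontmatter and report the issue.
    return {}, [message]


print(ensure_mapping(["item1", "item2"], level="warn"))
```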

YAML Version

Implementations SHOULD support YAML 1.2. Implementations MAY support YAML 1.1 for compatibility with existing tools, but SHOULD prefer 1.2 semantics where they differ.

Character Encoding

Files MUST be UTF-8 encoded. Implementations MUST reject files with invalid UTF-8 sequences.


3.3 Null and Empty Value Semantics

Correct handling of null and empty values is critical for interoperability. This section defines canonical behavior that all implementations MUST follow.

Reading Null Values

The following YAML patterns MUST all be interpreted as null:

# Explicit null keyword
field1: null
field2: Null
field3: NULL

# Tilde (YAML null alias)
field4: ~

# Empty value (key with no value)
field5:

# Explicit empty (flow style)
field6:

In every case above, the field (field1 through field6) has the value null.

Empty String vs Null

An empty string is distinct from null:

# This is null:
empty_null:

# This is an empty string (not null):
empty_string: ""

# This is also an empty string:
empty_quoted: ''

Implementations MUST preserve this distinction. A field with value "" is a present field with an empty string value. A field with value null (or empty) is a present field with no value.

Missing vs Null

A missing field (key not present) is distinct from a null field (key present with null value):

---
present_null: null
present_string: "hello"
# 'missing_field' is not here
---

  • present_null is present with value null
  • present_string is present with value "hello"
  • missing_field is missing (not present at all)

This distinction matters for:

  • The exists() function in expressions (returns true when key is present, even if null)
  • The isEmpty() method (returns true when value is null, empty, or missing)
  • The required constraint (requires present and non-null)
  • Default value application (applies to missing, not to null)

Summary Table

| YAML | Parsed Value | exists(field) | Satisfies required? (before defaults) |
|---|---|---|---|
| field: null | null | true | No |
| field: ~ | null | true | No |
| field: | null | true | No |
| field: "" | "" (empty string) | true | Yes (string value) |
| (key absent) | undefined | false | No |

Presence vs Meaningful Value

  • exists(field) is true when the key is present in raw persisted frontmatter, even if the value is null.
  • field.isEmpty() is true when the value is null, empty, or missing.
  • required: true requires the key to be present in the effective frontmatter and the value to be non-null (see §9.2.1).

Implementations MUST preserve these distinctions in validation and query evaluation.
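These distinctions map naturally onto a dictionary in which a null value is `None` and a missing field is an absent key. The helpers below are a hypothetical sketch (the names are not defined by this specification), and treating empty lists and mappings as "empty" alongside the empty string is an assumption.

```python
def exists(frontmatter, key):
    """True when the key is present, even if its value is null."""
    return key in frontmatter


def is_empty(frontmatter, key):
    """True when the value is null, empty, or the key is missing."""
    value = frontmatter.get(key)
    return value is None or (isinstance(value, (str, list, dict)) and len(value) == 0)


def satisfies_required(frontmatter, key):
    """required: true needs a present key with a non-null value."""
    return key in frontmatter and frontmatter[key] is not None


def apply_default(frontmatter, key, default):
    """Defaults apply to missing keys only, never to explicit nulls."""
    if key not in frontmatter:
        return {**frontmatter, key: default}
    return frontmatter


record = {"present_null": None, "present_string": "hello", "empty_string": ""}
print(exists(record, "present_null"), exists(record, "missing_field"))
```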


3.4 Writing Frontmatter

When implementations write or update frontmatter, they MUST follow these rules to ensure consistency and avoid ambiguity.

Never Write Empty-Value Nulls

Implementations MUST NOT write the "empty value" null form:

# ❌ NEVER write this
field:

This form is ambiguous in some contexts and causes issues with YAML tools that normalize whitespace.

Writing Null Values

When a field's value is null and the field should be written, implementations MUST use one of:

Option 1: Explicit null (preferred when preserving the field)

field: null

Option 2: Omit the field entirely (preferred when null means "no value")

# field is simply not present

The choice between these options is controlled by settings.write_nulls:

| write_nulls | Behavior |
|---|---|
| "omit" (default) | Omit fields with null values |
| "explicit" | Write field: null |

Writing Empty Strings

Empty strings MUST be written with explicit quotes:

field: ""

Writing Empty Lists

Empty lists can be written as [] or omitted, controlled by settings.write_empty_lists:

| write_empty_lists | Behavior |
|---|---|
| true (default) | Write field: [] |
| false | Omit fields with empty list values |
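The write rules for nulls, empty strings, and empty lists can be sketched as follows. The function `render_fields` is hypothetical; it handles only flat, simple values and falls back to JSON encoding for everything else, relying on JSON scalars and flow collections also being valid YAML. It makes no attempt at the formatting preservation described in §3.5.

```python
import json


def render_fields(fields, write_nulls="omit", write_empty_lists=True):
    """Render a flat mapping as YAML lines following the rules of this section."""
    lines = []
    for key, value in fields.items():
        if value is None:
            # Never emit the bare "key:" form; either omit or write null.
            if write_nulls == "explicit":
                lines.append(f"{key}: null")
        elif isinstance(value, list) and not value:
            if write_empty_lists:
                lines.append(f"{key}: []")
        elif value == "":
            lines.append(f'{key}: ""')  # empty strings keep explicit quotes
        else:
            lines.append(f"{key}: {json.dumps(value, ensure_ascii=False)}")
    return lines


print(render_fields({"a": None, "b": "", "c": [], "d": "x"}))
```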

3.5 Formatting Preservation

When updating a file, implementations SHOULD preserve as much of the original formatting as practical:

SHOULD Preserve

  • Field order: Keep fields in their original order when possible
  • Blank lines: Preserve blank lines within frontmatter (YAML allows them)
  • String style: If a string was written with quotes, keep the quotes
  • Comment proximity: Keep comments near their associated fields

MUST Preserve

  • Body content: The body MUST NOT be modified by frontmatter updates (except when explicitly changing the body)
  • Line endings: Preserve the file's line ending style (LF vs CRLF)

MAY Normalize

Implementations MAY normalize:

  • Indentation (2 spaces is conventional)
  • Trailing whitespace
  • Final newline (files SHOULD end with a newline)

3.6 Special Characters in Field Names

Field names containing special characters MUST be quoted in YAML:

"field-with-dashes": value
"field.with.dots": value
"field:with:colons": value

In expressions and queries, such fields are accessed with bracket notation:

note["field-with-dashes"]

Implementations SHOULD avoid requiring special characters in schema-defined field names. User-defined fields may use them.


3.7 Multi-line Strings

YAML supports several multi-line string formats. Implementations MUST support all standard YAML multi-line syntaxes:

Literal block (preserves newlines):

description: |
  This is a multi-line string.
  Line breaks are preserved.

Folded block (newlines become spaces):

description: >
  This is a long line that will be
  folded into a single line.

Quoted strings with escapes:

description: "Line 1\nLine 2"

When writing multi-line values, implementations SHOULD use literal block style (|) for readability.


3.8 Type Coercion

YAML has automatic type inference that can cause surprises. Implementations MUST be aware of these patterns:

| YAML Value | YAML Type | Notes |
|---|---|---|
| true, false | Boolean | |
| yes, no | Boolean (YAML 1.1) | Avoid; prefer true/false |
| on, off | Boolean (YAML 1.1) | Avoid; prefer true/false |
| 123 | Integer | |
| 12.5 | Float | |
| 1e10 | Float (scientific) | |
| 0x1A | Integer (hex) | |
| .inf, -.inf | Float (infinity) | |
| .nan | Float (NaN) | |
| 2024-01-15 | Date (if YAML date extension) | Implementations MAY parse as date |
| null, ~ | Null | |
| "123" | String | Quoted values are strings |

When a schema specifies a type, implementations MUST coerce compatible values (e.g., reading 123 for a string field as "123"). When coercion is not possible, it is a validation error.
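A deliberately narrow sketch of coercion is shown below. It covers only the number-to-string case given as the example above; which other conversions count as "compatible" is not defined in this section, so everything else is rejected here. The names `coerce` and `CoercionError` are hypothetical.

```python
class CoercionError(ValueError):
    """Raised when a value cannot be coerced; callers report a validation error."""


def coerce(value, field_type):
    """Coerce a parsed YAML value to the schema's field type, or raise."""
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    if field_type == "string":
        if isinstance(value, str):
            return value
        if is_number:
            return str(value)  # e.g. 123 read for a string field becomes "123"
    elif field_type == "integer":
        if isinstance(value, int) and not isinstance(value, bool):
            return value
    raise CoercionError(f"cannot coerce {value!r} to {field_type}")


print(repr(coerce(123, "string")))
```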


3.9 Example: Round-Trip Preservation

Given this input file:

---
title: My Task
status: open
tags:
  - important
  - review
due_date: 2024-03-15
notes: |
  This is a longer note.
  It spans multiple lines.
---

# Task Details

The body content here.

After updating status to "done", the output SHOULD be:

---
title: My Task
status: done
tags:
  - important
  - review
due_date: 2024-03-15
notes: |
  This is a longer note.
  It spans multiple lines.
---

# Task Details

The body content here.

Note that:

  • Field order is preserved
  • Multi-line string style is preserved
  • List formatting is preserved
  • Body content is unchanged
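One way to get this kind of round trip is to edit only the line that changes instead of re-serializing the whole mapping. The sketch below is a minimal illustration: `set_scalar_field` is a hypothetical helper, it only handles top-level keys whose existing value fits on one line, and the new value is assumed to be already valid YAML text.

```python
import re


def set_scalar_field(text, key, value):
    """Replace one top-level single-line field, leaving all other bytes alone."""
    newline = "\r\n" if "\r\n" in text else "\n"  # keep the file's line endings
    lines = text.split(newline)
    if lines[0] != "---":
        raise ValueError("file has no frontmatter")
    end = lines.index("---", 1)  # closing delimiter; ValueError if absent
    pattern = re.compile(rf"^{re.escape(key)}:(\s|$)")
    for i in range(1, end):
        if pattern.match(lines[i]):
            lines[i] = f"{key}: {value}"
            return newline.join(lines)
    raise KeyError(key)


source = (
    "---\ntitle: My Task\nstatus: open\ntags:\n  - important\n  - review\n"
    "due_date: 2024-03-15\nnotes: |\n  This is a longer note.\n"
    "  It spans multiple lines.\n---\n\n# Task Details\n\nThe body content here.\n"
)
print(set_scalar_field(source, "status", "done"))
```

A full implementation would more likely use a YAML library that preserves comments and styles; the line-based approach is shown only because it makes the preservation guarantees easy to see.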

4. Configuration

This section defines the structure and semantics of the mdbase.yaml configuration file that identifies and configures a collection.


4.0 File Encoding

The mdbase.yaml configuration file and all type definition files MUST be encoded in UTF-8 (consistent with the UTF-8 requirement for markdown files in §3.2).


4.1 File Location and Format

The configuration file MUST be named mdbase.yaml and MUST be located at the collection root. This file:

  • Identifies the directory as a collection
  • Specifies the specification version
  • Configures collection behavior
  • Points to the types folder

The file MUST be valid YAML and MUST parse as a mapping at the top level.


4.2 Minimal Configuration

The simplest valid configuration declares only the specification version:

spec_version: "0.1.0"

This creates a collection with all default settings and no types (all files are untyped).


4.3 Full Configuration Schema

# =============================================================================
# REQUIRED
# =============================================================================

# Specification version this configuration conforms to
# Implementations MUST reject versions they do not support
spec_version: "0.1.0"

# =============================================================================
# OPTIONAL: Collection Metadata
# =============================================================================

# Human-readable name for the collection
name: "My Project Tasks"

# Description of the collection's purpose
description: "Task and note management for the My Project initiative"

# =============================================================================
# OPTIONAL: Settings
# =============================================================================

settings:
  # ---------------------------------------------------------------------------
  # File Discovery
  # ---------------------------------------------------------------------------
  
  # Additional file extensions to treat as markdown (beyond .md which is always included)
  # Default: []
  # Common additions: ["mdx", "markdown"]
  # Entries MAY include a leading dot; implementations MUST normalize to no-dot.
  extensions: ["mdx"]
  
  # Paths to exclude from scanning (relative to collection root)
  # Default: [".git", "node_modules", ".mdbase"]
  # Glob patterns are supported
  exclude:
    - ".git"
    - "node_modules"
    - ".mdbase"
    - "drafts/**"
    - "*.draft.md"
  
  # Whether to scan subdirectories recursively
  # Default: true
  include_subfolders: true
  
  # ---------------------------------------------------------------------------
  # Types Configuration
  # ---------------------------------------------------------------------------
  
  # Folder containing type definition files (relative to collection root)
  # Default: "_types"
  types_folder: "_types"
  
  # Frontmatter keys that explicitly declare a file's type(s)
  # If a file has any of these keys, its value determines the type(s)
  # Default: ["type", "types"]
  explicit_type_keys: ["type", "types"]
  
  # ---------------------------------------------------------------------------
  # Validation
  # ---------------------------------------------------------------------------
  
  # Default validation level for operations
  # "off": No validation
  # "warn": Report issues but don't fail
  # "error": Report issues and fail operations
  # Default: "warn"
  default_validation: "warn"
  
  # Default strictness for types that don't specify their own
  # false: Extra fields allowed
  # true: Extra fields cause validation failure
  # "warn": Extra fields allowed but emit warning
  # Default: false
  default_strict: false
  
  # ---------------------------------------------------------------------------
  # Link Resolution
  # ---------------------------------------------------------------------------
  
  # Field name used as unique identifier for link resolution
  # When a link is a simple name (no path), implementations search for files
  # where this field matches the link target
  # Default: "id"
  id_field: "id"
  
  # ---------------------------------------------------------------------------
  # Write Behavior
  # ---------------------------------------------------------------------------
  
  # How to handle null values when writing frontmatter
  # "omit": Don't write fields with null values
  # "explicit": Write as `field: null`
  # Default: "omit"
  write_nulls: "omit"
  
  # Whether to write empty lists
  # true: Write as `field: []`
  # false: Omit fields with empty list values
  # Default: true
  write_empty_lists: true
  
  # ---------------------------------------------------------------------------
  # Rename Behavior
  # ---------------------------------------------------------------------------
  
  # Whether to update references across the collection when a file is renamed
  # Default: true
  rename_update_refs: true
  
  # ---------------------------------------------------------------------------
  # Caching
  # ---------------------------------------------------------------------------
  
  # Folder for cache files (relative to collection root)
  # Default: ".mdbase"
  cache_folder: ".mdbase"

4.4 Setting Details

`spec_version` (Required)

The version of this specification the configuration conforms to. Implementations MUST check this value and MUST reject configuration files with versions they do not support.

Valid values: "0.1.0"

Compatibility: Implementations MAY accept "0.1" as an alias for "0.1.0", but SHOULD emit a warning and normalize to "0.1.0" when writing.

4.4.1 Version Compatibility

The spec_version field uses semantic versioning (MAJOR.MINOR.PATCH):

PATCH bumps (e.g., 0.1.0 → 0.1.1): Clarifications and errata only. No behavioral changes. All implementations of X.Y.z are compatible with any X.Y.z′.

MINOR bumps (e.g., 0.1.0 → 0.2.0): Additive changes only — new optional fields, new expression functions, new config keys. Implementations of X.Y MUST ignore unknown config keys introduced in X.Y+1 rather than rejecting them. Collections authored for X.Y work on X.Y+N without modification.

MAJOR bumps (e.g., 0.x → 1.0): Breaking changes. Implementations MUST reject configuration files with a different major version than the one they support.

During the 0.x series: MINOR bumps MAY contain breaking changes. Implementations SHOULD treat 0.x and 0.y (x ≠ y) as potentially incompatible.

Unknown keys: Implementations MUST ignore unknown keys under settings with a warning, to support forward compatibility within a major version. Unknown top-level keys (outside settings) MUST also be ignored with a warning.
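
The version rules above can be sketched as a small check. This is a minimal illustration, not a normative algorithm: the function name `check_spec_version` and the `SUPPORTED` constant are hypothetical, and the choice to reject a differing 0.x minor version outright (rather than merely warn) is an assumption of this sketch.

```python
import re

SUPPORTED = "0.1.0"  # version this hypothetical implementation targets


def check_spec_version(declared: str, supported: str = SUPPORTED) -> tuple[str, list[str]]:
    """Return (normalized_version, warnings) or raise ValueError if unsupported."""
    warnings: list[str] = []
    if declared == "0.1":  # optional alias described in the spec_version details
        warnings.append('spec_version "0.1" is an alias; normalizing to "0.1.0"')
        declared = "0.1.0"
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)", declared)
    if match is None:
        raise ValueError(f"malformed spec_version: {declared!r}")
    major, minor, _patch = (int(part) for part in match.groups())
    s_major, s_minor, _ = (int(part) for part in supported.split("."))
    if major != s_major:
        raise ValueError(f"unsupported major version: {declared}")
    if major == 0 and minor != s_minor:
        # During 0.x, minor bumps may be breaking; this sketch chooses to reject.
        raise ValueError(f"potentially incompatible 0.x version: {declared}")
    if minor > s_minor:
        warnings.append(f"{declared} is newer than {supported}; unknown keys will be ignored")
    return declared, warnings
```

Patch-level differences pass silently, which mirrors the rule that PATCH bumps carry no behavioral changes.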

`name` and `description`

Human-readable metadata about the collection. These have no semantic effect but are useful for documentation and tooling that displays collection information.

`settings.extensions`

Additional file extensions to scan. The .md extension is always implicitly included, so this setting lists only extensions beyond .md.

Default: []

Normalization:

  • Implementations MUST treat entries with or without a leading dot as equivalent.
  • The .md extension is always implicitly included and MUST NOT be required in this list.
  • If md or .md appears in extensions, it SHOULD be ignored with a warning.

Example: To include MDX files:

settings:
  extensions: ["mdx"]
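
A minimal sketch of the normalization rules, assuming extensions are compared exactly as written (the specification does not say whether comparison is case-sensitive). The helper name `normalize_extensions` is hypothetical.

```python
def normalize_extensions(configured: list[str]) -> tuple[list[str], list[str]]:
    """Return (extensions_to_scan, warnings); every extension carries a leading dot."""
    result = [".md"]  # always implicitly included
    warnings: list[str] = []
    for entry in configured:
        # "mdx" and ".mdx" are equivalent spellings.
        ext = entry if entry.startswith(".") else "." + entry
        if ext == ".md":
            warnings.append(f"{entry!r} in settings.extensions is redundant and ignored")
        elif ext not in result:
            result.append(ext)
    return result, warnings
```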

`settings.exclude`

Paths or glob patterns to exclude from file scanning. Paths are relative to the collection root.

Default: [".git", "node_modules", ".mdbase"]

Glob patterns:

  • * matches any characters except /
  • ** matches any characters including /
  • ? matches a single character

Example:

settings:
  exclude:
    - ".git"
    - "node_modules"
    - "*.draft.md"      # Exclude all draft files
    - "archive/**"      # Exclude everything in archive/

`settings.types_folder`

The folder containing type definition files. Type files are markdown files whose frontmatter defines a schema.

Default: "_types"

The types folder:

  • Is automatically excluded from the regular file scan
  • Is scanned separately to load type definitions
  • May contain subdirectories (all .md files are processed)

`settings.explicit_type_keys`

Frontmatter keys that can explicitly declare a file's type(s). When a file has one of these keys, its value determines the type assignment, overriding any match rules.

Default: ["type", "types"]

Usage:

# Single type
type: task

# Multiple types
types: [task, urgent]

Using type as a normal field: If you want a frontmatter field named type to be treated as ordinary data, remove it from settings.explicit_type_keys and choose different declaration keys (e.g., kind, kinds).

`settings.default_validation`

The default validation level applied when not otherwise specified.

| Value | Behavior |
|---|---|
| `"off"` | No validation performed |
| `"warn"` | Validation issues are reported but operations succeed |
| `"error"` | Validation issues cause operations to fail |

Default: "warn"

`settings.default_strict`

Default strictness mode for types that don't declare their own.

| Value | Behavior |
|---|---|
| `false` | Unknown fields are allowed |
| `"warn"` | Unknown fields are allowed but trigger warnings |
| `true` | Unknown fields cause validation failure |

Default: false

`settings.id_field`

The field name used as a unique identifier for link resolution. When a link is a simple name (not a path), implementations search for files where this field matches.

Uniqueness requirement: Values of the id_field MUST be unique across the collection. Implementations MUST validate uniqueness and report duplicate_id issues when multiple files share the same id_field value.

Default: "id"

Example: With id_field: "id", the link [[task-001]] would resolve to a file with id: task-001 in its frontmatter.
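
One way to picture the uniqueness check and simple-name resolution is an index keyed by the id_field value. The sketch below assumes frontmatter has already been parsed into dictionaries; the function names and the shape of the issue records are hypothetical, and leaving duplicated ids out of the index is a choice made here, not something the specification requires. Link aliases and path-style links are out of scope.

```python
from collections import defaultdict


def build_id_index(files: dict[str, dict], id_field: str = "id"):
    """Map id values to paths and collect duplicate_id issues.

    `files` maps a file path to its parsed frontmatter (a hypothetical in-memory shape).
    """
    by_id: dict[str, list[str]] = defaultdict(list)
    for path, frontmatter in sorted(files.items()):
        value = frontmatter.get(id_field)
        if value is not None:
            by_id[str(value)].append(path)
    # Only unambiguous ids are resolvable in this sketch.
    index = {value: paths[0] for value, paths in by_id.items() if len(paths) == 1}
    issues = [
        {"code": "duplicate_id", "value": value, "paths": paths}
        for value, paths in by_id.items()
        if len(paths) > 1
    ]
    return index, issues


def resolve_simple_link(link: str, index: dict[str, str]):
    """Resolve a simple-name link such as "[[task-001]]"; return None if unresolved."""
    target = link.strip()
    if target.startswith("[[") and target.endswith("]]"):
        target = target[2:-2]
    return index.get(target)
```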

`settings.write_nulls`

Controls how null values are written to frontmatter.

| Value | Behavior |
|---|---|
| `"omit"` | Fields with null values are not written |
| `"explicit"` | Null values are written as `field: null` |

Default: "omit"
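
The interaction between write_nulls and the write_empty_lists setting from the configuration reference can be sketched as a filter applied before serialization. This assumes the settings apply to top-level frontmatter keys only, which the specification does not spell out; `prepare_for_write` is a hypothetical helper name.

```python
def prepare_for_write(
    frontmatter: dict,
    write_nulls: str = "omit",
    write_empty_lists: bool = True,
) -> dict:
    """Drop keys that the write settings say should not be persisted."""
    out = {}
    for key, value in frontmatter.items():
        if value is None and write_nulls == "omit":
            continue  # null values are omitted unless write_nulls is "explicit"
        if isinstance(value, list) and not value and not write_empty_lists:
            continue  # empty lists are omitted when write_empty_lists is false
        out[key] = value
    return out
```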

`settings.rename_update_refs`

Whether renaming a file automatically updates references to it across the collection.

Default: true

When enabled, implementations MUST update:

  • Link fields in frontmatter that resolve to the renamed file
  • Link syntax in body content that references the renamed file

See Operations for details.


4.5 Configuration Validation

Implementations MUST validate the configuration file before processing the collection. Validation checks:

  1. Structure: The file parses as valid YAML with a mapping at the top level
  2. Required fields: spec_version is present
  3. Type correctness: Each field has the expected type
  4. Valid values: Enum fields have allowed values
  5. Path validity: Paths in exclude, types_folder, etc. are syntactically valid

If validation fails, implementations MUST NOT process the collection and MUST report the error clearly.
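
As an illustration of checks 1 to 4, the sketch below validates a configuration that has already been parsed from YAML into Python values. The function name, the error message wording, and the grouping of settings into lookup tables are assumptions; path validity (check 5) and warnings for unknown keys are left out for brevity.

```python
ENUMS = {
    "default_validation": {"off", "warn", "error"},
    "write_nulls": {"omit", "explicit"},
}
LIST_OF_STR = {"extensions", "exclude", "explicit_type_keys"}
STRINGS = {"types_folder", "id_field", "cache_folder"}
BOOLS = {"write_empty_lists", "rename_update_refs"}


def validate_config(config: object) -> list[str]:
    """Return a list of error messages; an empty list means the config passed."""
    if not isinstance(config, dict):
        return ["top level must be a mapping"]
    errors = []
    if "spec_version" not in config:
        errors.append("spec_version is required")
    elif not isinstance(config["spec_version"], str):
        errors.append("spec_version must be a string")
    settings = config.get("settings", {})
    if not isinstance(settings, dict):
        return errors + ["settings must be a mapping"]
    for key, value in settings.items():
        if key in ENUMS and not (isinstance(value, str) and value in ENUMS[key]):
            errors.append(f"settings.{key}: invalid value {value!r}")
        elif key in LIST_OF_STR and not (
            isinstance(value, list) and all(isinstance(v, str) for v in value)
        ):
            errors.append(f"settings.{key}: expected a list of strings")
        elif key in STRINGS and not isinstance(value, str):
            errors.append(f"settings.{key}: expected a string")
        elif key in BOOLS and not isinstance(value, bool):
            errors.append(f"settings.{key}: expected a boolean")
        elif key == "default_strict" and not (isinstance(value, bool) or value == "warn"):
            errors.append('settings.default_strict: expected true, false, or "warn"')
    return errors
```

A caller would refuse to process the collection whenever the returned list is non-empty.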


4.6 Configuration Examples

Minimal

spec_version: "0.1.0"

Standard Project

spec_version: "0.1.0"
name: "Project Documentation"
description: "Specs, decisions, and meeting notes"

settings:
  exclude:
    - ".git"
    - "node_modules"
    - "drafts/**"
  default_validation: "error"

Knowledge Base with Custom Types Folder

spec_version: "0.1.0"
name: "Personal Knowledge Base"

settings:
  types_folder: "schemas"
  extensions: ["mdx"]
  default_strict: "warn"
  id_field: "uid"

Strict Validation

spec_version: "0.1.0"
name: "Production Data"

settings:
  default_validation: "error"
  default_strict: true
  write_nulls: "explicit"

4.7 Environment Variables (Optional)

Implementations MAY support environment variable substitution in configuration values using ${VAR} syntax (and MAY also support ${VAR:-default} for default values):

settings:
  cache_folder: "${MDBASE_CACHE:-/tmp/mdbase}"

This feature is OPTIONAL. If not supported, implementations MUST treat ${...} as literal strings. If ${VAR:-default} is not supported, implementations MUST treat the entire string literally (no partial expansion).
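
For implementations that do opt in, substitution might look like the following sketch. Two behaviors are assumptions rather than requirements: a reference to an unset variable with no default is left as literal text, and the default applies only when the variable is unset (POSIX shells also apply it when the variable is empty). The name `expand_env` is hypothetical.

```python
import os
import re

# ${NAME} or ${NAME:-default}
_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def expand_env(value: str, env=None) -> str:
    """Expand ${VAR} and ${VAR:-default} references in a configuration string."""
    env = os.environ if env is None else env

    def replace(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        if name in env:
            return env[name]
        if default is not None:
            return default
        return match.group(0)  # assumption: leave unresolved references untouched

    return _VAR.sub(replace, value)
```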


4.8 Security Considerations

Regular Expressions

Match rules, field constraints (pattern), and expressions (the matches operator) may contain regular expressions.

Required baseline: Implementations MUST support ECMAScript (ES2018+) regular expression syntax as the baseline flavor. This aligns with JavaScript-based tools (e.g., Obsidian), and compatible engines or libraries exist for most major programming languages.

Required features (MUST support):

| Feature | Syntax | Example |
|---|---|---|
| Character classes | `[abc]`, `[^abc]`, `\d`, `\w`, `\s` | `\d{4}` |
| Quantifiers | `*`, `+`, `?`, `{n}`, `{n,m}` | `\w+` |
| Alternation | `\|` | `cat\|dog` |
| Anchors | `^`, `$` | `^TASK-` |
| Capturing groups | `(...)` | `(\d+)-(\d+)` |
| Non-capturing groups | `(?:...)` | `(?:foo\|bar)` |
| Lookahead | `(?=...)`, `(?!...)` | `\d+(?= items)` |

Optional features (SHOULD support):

| Feature | Syntax | Notes |
|---|---|---|
| Lookbehind | `(?<=...)`, `(?<!...)` | Supported in ES2018 but not in RE2-based engines |
| Named groups | `(?<name>...)` | Supported in ES2018; some RE2-based engines accept only the `(?P<name>...)` spelling |

Implementations that do not support optional features MUST reject patterns using those features with a clear error rather than silently ignoring them.

ReDoS mitigations: Implementations SHOULD guard against Regular Expression Denial of Service (ReDoS) by:

  • Setting timeouts on regex evaluation
  • Rejecting patterns with known dangerous constructs (e.g., nested quantifiers)
  • Documenting any regex restrictions

Environment Variables

If an implementation supports environment variable expansion in configuration (e.g., ${VAR}), it MUST:

  • Only expand variables explicitly referenced in configuration
  • Never log expanded values that may contain secrets
  • Document which config fields support expansion

Expression Evaluation

Implementations SHOULD set resource limits on expression evaluation:

  • Maximum expression nesting depth
  • Maximum number of function calls per evaluation
  • Timeout for individual expression evaluations

These limits prevent pathological expressions from consuming unbounded resources.

5. Types

This section defines how types (schemas) are created, structured, and interpreted. In this specification, types are markdown files—they live in a designated folder, have frontmatter that defines the schema, and body content that provides documentation.


5.1 Types as Markdown Files

A type is defined by a markdown file in the types folder (default: _types/). The file's frontmatter contains the schema definition; the body contains documentation for the type.

Example: _types/task.md

---
name: task
description: A task or todo item with status tracking
extends: base
strict: false

fields:
  title:
    type: string
    required: true
    description: Short summary of the task
  status:
    type: enum
    values: [open, in_progress, blocked, done]
    default: open
  priority:
    type: integer
    min: 1
    max: 5
    default: 3
  due_date:
    type: date
  tags:
    type: list
    items:
      type: string
    default: []
  assignee:
    type: link
    target: person
---

# Task

A task represents a discrete unit of work that can be tracked through its lifecycle.

## Status Values

- **open**: Not yet started
- **in_progress**: Currently being worked on
- **blocked**: Cannot proceed due to external dependency
- **done**: Completed

## Usage

Tasks are typically stored in the `tasks/` folder. Example:

```yaml
---
type: task
title: Fix the login bug
status: in_progress
priority: 4
due_date: 2024-03-15
assignee: "[[alice]]"
tags: [bug, auth]
---

The login form throws an error when...
```

This approach has several benefits:

  1. Documentation lives with the schema: The markdown body explains how to use the type
  2. Version control friendly: Types are tracked like any other content
  3. Human readable: Anyone can understand the type by reading the file
  4. Editable anywhere: No special tooling required to modify schemas
  5. Meta-consistency: Types use the same format as the content they describe


5.2 Type Definition Schema

The frontmatter of a type file MUST conform to this structure:

```yaml
# =============================================================================
# REQUIRED
# =============================================================================

# The type name (must match filename without extension)
name: task

# =============================================================================
# OPTIONAL: Metadata
# =============================================================================

# Human-readable description
description: "A task or todo item"

# Type to inherit fields from
extends: base

# Strictness mode (overrides settings.default_strict)
# false: Allow unknown fields
# true: Reject unknown fields
# "warn": Allow but warn about unknown fields
strict: false

# =============================================================================
# OPTIONAL: Matching Rules
# =============================================================================

# Rules for automatically associating files with this type
# If not specified, files must explicitly declare their type
match:
  path_glob: "tasks/**/*.md"
  fields_present: [status]
  where:
    # Field predicates using expression operators
    tags:
      contains: "task"

# =============================================================================
# OPTIONAL: Filename Pattern
# =============================================================================

# Pattern for validating/generating filenames
# Variables in {} reference field values
filename_pattern: "{id}.md"

# =============================================================================
# REQUIRED (unless extends provides all fields)
# =============================================================================

# Field definitions
fields:
  field_name:
    type: string
    required: false
    # ... field options
```

5.3 The `name` Field

Every type MUST have a name field that matches the filename (without extension).

_types/task.md    →  name: task
_types/person.md  →  name: person

If the name doesn't match the filename, implementations MUST emit a warning and use the name value as the canonical type name.

Names MUST:

  • Consist of lowercase letters, numbers, hyphens, and underscores
  • Start with a letter
  • Not exceed 64 characters

Type names are canonicalized to lowercase. Implementations SHOULD treat type names case-insensitively when reading frontmatter (type/types) and SHOULD normalize them to lowercase for matching and output while emitting a warning for non-canonical casing.

Reserved names (MUST NOT be used):

  • Names starting with _ (reserved for internal use)
  • file, formula, this (reserved keywords in expressions)
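
The naming rules lend themselves to a compact check. The sketch below assumes "lowercase letters" means ASCII a to z; the function names and message wording are hypothetical.

```python
import re

# A letter followed by up to 63 more allowed characters (64 characters in total).
NAME_RE = re.compile(r"[a-z][a-z0-9_-]{0,63}")
RESERVED = {"file", "formula", "this"}


def check_type_name(name: str) -> list[str]:
    """Return the rule violations for a type name declared in a type file."""
    errors = []
    if name.startswith("_"):
        errors.append("names starting with '_' are reserved")
    elif not NAME_RE.fullmatch(name):
        errors.append(
            "name must start with a letter, use only a-z, 0-9, '-', '_', "
            "and be at most 64 characters"
        )
    if name in RESERVED:
        errors.append(f"{name!r} is a reserved keyword")
    return errors


def canonical_type_name(raw: str) -> tuple[str, list[str]]:
    """Lowercase a type name read from frontmatter, warning on non-canonical casing."""
    lowered = raw.lower()
    warnings = [] if lowered == raw else [f"type name {raw!r} normalized to {lowered!r}"]
    return lowered, warnings
```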

5.4 Type Inheritance

Types MAY inherit from another type using the extends field:

# _types/base.md
---
name: base
fields:
  id:
    type: string
    required: true
  created_at:
    type: datetime
    generated: now
  updated_at:
    type: datetime
    generated: now_on_write
---
# _types/task.md
---
name: task
extends: base
fields:
  title:
    type: string
    required: true
  status:
    type: enum
    values: [open, done]
---

The task type inherits id, created_at, and updated_at from base, and adds title and status.

Inheritance Rules

  1. Single inheritance only: A type can extend at most one parent type
  2. Chains allowed: task extends base extends root is valid
  3. Field override: Child fields with the same name override parent fields completely
  4. Circular inheritance: MUST be detected and rejected with an error
  5. Missing parent: If the parent type doesn't exist, validation MUST fail
  6. Strictness: Child inherits parent's strict unless explicitly overridden

Field Override Example

# Parent defines priority as 1-3
# _types/base-task.md
fields:
  priority:
    type: integer
    min: 1
    max: 3

# Child redefines priority as 1-5
# _types/task.md
extends: base-task
fields:
  priority:
    type: integer
    min: 1
    max: 5  # Now allows 4 and 5

The child completely replaces the parent's field definition; constraints are not merged.


5.5 Strictness

The strict field controls how unknown fields are handled during validation:

| Value | Behavior |
|---|---|
| `false` | Unknown fields are allowed without warning |
| `"warn"` | Unknown fields are allowed but trigger warnings |
| `true` | Unknown fields cause validation failure |

Default: Inherits from settings.default_strict in the config (which defaults to false).

"Unknown fields" are fields in a file's frontmatter that are not defined in the type's schema (including inherited fields).


5.6 Filename Patterns

The optional filename_pattern defines expected filename structure:

filename_pattern: "{id}-{slug}.md"

Patterns use {} to reference field values. Common placeholders:

  • {id}: The id field value
  • {slug}: A URL-safe slug (implementations should slugify automatically)
  • {date}: A date field formatted as YYYY-MM-DD

Use cases:

  1. Validation: Check that existing filenames match the pattern
  2. Generation: When creating new files, derive filename from field values

Slugification rules:

  • Lowercase all characters
  • Replace spaces and special characters with hyphens
  • Remove consecutive hyphens
  • Trim hyphens from start and end
  • Unicode handling: Implementations MUST use Unicode-aware lowercasing (not locale-dependent). Non-ASCII letters SHOULD be transliterated to their ASCII equivalents where a well-known mapping exists (e.g., ü → u, ñ → n). Characters with no ASCII equivalent SHOULD be removed rather than replaced with hyphens.
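
A minimal sketch of these slugification rules follows. It approximates transliteration by stripping combining marks after NFKD decomposition, which covers accented Latin letters such as ü and ñ but not letters like ß or ø that have no decomposition; a fuller mapping table would be an implementation choice. The function name `slugify` is hypothetical.

```python
import re
import unicodedata


def slugify(text: str) -> str:
    """Turn arbitrary text into a lowercase, hyphen-separated ASCII slug."""
    # Decompose accented letters (ü becomes u plus a combining mark), then drop the marks.
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    lowered = stripped.lower()  # Unicode-aware and locale-independent in Python 3
    # Remove remaining non-ASCII characters rather than turning them into hyphens.
    ascii_only = lowered.encode("ascii", "ignore").decode("ascii")
    # Collapse every run of other characters into one hyphen, then trim the ends.
    return re.sub(r"[^a-z0-9]+", "-", ascii_only).strip("-")
```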

5.7 Type Loading Order

When loading types from the types folder:

  1. Scan all .md files in the types folder (including subdirectories)
  2. Parse each file's frontmatter
  3. Build a dependency graph based on extends relationships
  4. Detect and reject circular dependencies
  5. Load types in dependency order (parents before children)
  6. Merge inherited fields into each type's effective schema
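
Steps 3 to 6 can be sketched as a recursive resolution over already-parsed type definitions. Because inheritance is single, following the extends chain while tracking visited names is enough to detect cycles. The function name, the input shape, and the use of ValueError are assumptions; inheritance of strict is omitted.

```python
def load_types(definitions: dict[str, dict]) -> dict[str, dict]:
    """Return each type's effective field map, resolving parents before children.

    `definitions` maps a type name to its parsed frontmatter (hypothetical shape).
    """
    effective: dict[str, dict] = {}

    def resolve(name: str, chain: tuple[str, ...]) -> dict:
        if name in effective:
            return effective[name]
        if name in chain:
            raise ValueError("circular inheritance: " + " -> ".join(chain + (name,)))
        if name not in definitions:
            raise ValueError(f"missing parent type: {name}")
        definition = definitions[name]
        parent = definition.get("extends")
        fields = dict(resolve(parent, chain + (name,))) if parent else {}
        # Child definitions replace parent definitions wholesale; constraints are not merged.
        fields.update(definition.get("fields", {}))
        effective[name] = fields
        return fields

    for type_name in definitions:
        resolve(type_name, ())
    return effective
```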

5.8 Built-in vs User Types

This specification does not define built-in types. All types are user-defined via markdown files.

However, implementations MAY provide starter templates for common types (task, note, person, etc.) that users can copy into their types folder and customize.


5.9 Creating Type Files Programmatically

Implementations MUST provide a way to create type definition files. This is a normal write operation that:

  1. Validates the type definition schema
  2. Checks for name conflicts with existing types
  3. Writes the file to the types folder
  4. Reloads the types registry

Example CLI interaction:

# Create a new type interactively
mdbase type create

# Create with a template
mdbase type create --from-template task

# Scaffold from an existing file's frontmatter
mdbase type create --infer-from notes/example.md

5.10 Type Documentation (Body Content)

The body of a type file is documentation. It has no semantic effect on the schema but SHOULD explain:

  • The purpose of the type
  • How to use each field
  • Example files
  • Relationships with other types
  • Best practices

Implementations MAY render this documentation in tooling (e.g., showing field help, type browser).


5.11 Schema Evolution

When a type definition changes, existing files are NOT automatically migrated — files are the source of truth. The following rules define what happens for each kind of schema change:

Field added (optional): Existing files without the field remain valid. The field is undefined (not null) until explicitly set. If a default is specified in the type definition, it applies to the effective value at read/query/validation time.

Field added (required): Existing files without the field fail validation. Implementations MUST report missing_required errors. Users must add the field to affected files manually or via batch update.

Field removed: Existing files with the removed field are treated as having an unknown field. Behavior depends on the type's strict setting (see §5.5). No data is deleted from files.

Field type changed: Existing files with values of the old type fail validation with type_mismatch. No automatic coercion of persisted data is performed.

Field renamed: The specification does not track field renames — this is equivalent to removing one field and adding another. Implementations MAY provide a batch rename tool as a convenience.

Type renamed: Existing files with type: old_name fail type matching. Implementations MUST provide a batch update command to update type declarations across files.

Inheritance changed: The effective schema is recomputed. Fields gained from a new parent apply the same rules as "field added." Fields lost apply the same rules as "field removed."

Materializing defaults: Defaults are not required to be persisted to disk. Implementations MAY provide a flag to write default values on create/update; if a default is written, it MUST equal the declared default at the time of the write.

Migration strategy: Validation is the migration mechanism. Run validation on the collection after schema changes, review reported errors, and fix affected files.


5.12 Computed Fields

Type definitions MAY include fields with a computed property containing an expression:

fields:
  full_name:
    type: string
    computed: "first_name + ' ' + last_name"
  overdue:
    type: boolean
    computed: "due_date < today() && status != 'done'"

Rules

  • Computed fields are evaluated at read time and are NOT persisted to the file
  • They are available in queries, formulas, and expressions like any other field
  • Computed fields MUST NOT be required (they are always derived)
  • Computed fields MUST NOT have default or generated — these are mutually exclusive mechanisms
  • If a file contains a frontmatter key matching a computed field name, the persisted value is ignored and the computed value takes precedence. Implementations SHOULD emit a warning
  • If a type definition has both computed and required: true on a field, implementations MUST reject the type definition with an invalid_type_definition error

Conformance

Computed fields are a Level 3 (Querying) capability. Implementations below Level 3 MUST still load type definitions containing computed fields without error, but MUST ignore the computed property and treat the field as a regular (non-computed) field. This ensures type definitions are portable across conformance levels.

Evaluation Order

Non-computed fields are resolved first, then computed fields in dependency order. Computed fields MAY reference other computed fields, which are resolved via dependency ordering.

Circular computed field dependencies MUST be detected and rejected with a circular_computed error.
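
Dependency ordering of computed fields is a topological sort. The sketch below uses Python's standard graphlib module and assumes the set of field names referenced by each expression has already been extracted; that extraction, the function name, and the use of ValueError to carry the circular_computed code are all assumptions.

```python
from graphlib import CycleError, TopologicalSorter


def computed_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Order computed fields so each one follows the computed fields it references.

    `dependencies` maps a computed field to the field names its expression uses.
    """
    computed = set(dependencies)
    # Non-computed fields are resolved first, so only computed-to-computed edges matter.
    graph = {name: deps & computed for name, deps in dependencies.items()}
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        raise ValueError(f"circular_computed: {exc.args[1]}") from exc
```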

Inheritance

Computed fields from parent types are inherited and MAY be overridden by child types.

Example

# _types/task.md
---
name: task
fields:
  first_name:
    type: string
  last_name:
    type: string
  full_name:
    type: string
    computed: "first_name + ' ' + last_name"
  due_date:
    type: date
  status:
    type: enum
    values: [open, in_progress, done]
  is_overdue:
    type: boolean
    computed: "due_date < today() && status != 'done'"
---

5.13 Complete Type File Example

---
name: meeting-note
description: Notes from a meeting
extends: base

strict: "warn"

match:
  path_glob: "meetings/**/*.md"
  fields_present: [date, attendees]

filename_pattern: "{date}-{title}.md"

fields:
  title:
    type: string
    required: true
    description: Meeting title or topic
  
  date:
    type: date
    required: true
    description: Date the meeting occurred
  
  attendees:
    type: list
    items:
      type: link
      target: person
    min_items: 1
    description: People who attended
  
  agenda:
    type: list
    items:
      type: string
    default: []
    description: Planned discussion topics
  
  decisions:
    type: list
    items:
      type: object
      fields:
        topic:
          type: string
          required: true
        decision:
          type: string
          required: true
        owner:
          type: link
          target: person
    default: []
    description: Decisions made during the meeting
  
  action_items:
    type: list
    items:
      type: link
      target: task
    default: []
    description: Tasks created from this meeting
  
  next_meeting:
    type: date
    description: Scheduled follow-up date
---

# Meeting Note

Meeting notes capture discussions, decisions, and action items from team meetings.

## Required Fields

- **title**: A short, descriptive title (e.g., "Q1 Planning", "Design Review")
- **date**: When the meeting occurred (YYYY-MM-DD format)
- **attendees**: At least one person must be linked

## Decisions Format

Decisions are structured objects with:
- `topic`: What was being decided
- `decision`: The outcome
- `owner`: Who is responsible for follow-through

```yaml
decisions:
  - topic: API versioning strategy
    decision: Use URL path versioning (/v1/, /v2/)
    owner: "[[alice]]"
```

Linking to Tasks

Action items should be created as separate task files and linked:

action_items:
  - "[[tasks/update-api-docs]]"
  - "[[tasks/create-v2-endpoints]]"

Example

---
type: meeting-note
title: Sprint Planning
date: 2024-03-01
attendees:
  - "[[alice]]"
  - "[[bob]]"
  - "[[charlie]]"
agenda:
  - Review last sprint
  - Estimate new stories
  - Assign work
decisions:
  - topic: Sprint length
    decision: Keep 2-week sprints
    owner: "[[alice]]"
action_items:
  - "[[tasks/story-123]]"
next_meeting: 2024-03-15
---

## Discussion

Sprint velocity was 42 points last sprint...

6. Type Matching

This section defines how files are associated with types. Unlike traditional schemas where each record belongs to exactly one table, this specification supports multi-type matching: a file may match zero, one, or multiple types simultaneously.


6.1 Matching Overview

Type matching determines which types apply to a file. This happens:

  • When reading a file (to know which schemas to validate against)
  • When querying (to filter by type)
  • When updating (to apply type-specific validation)

A file's types are determined by:

  1. Explicit declaration (highest precedence): If the file's frontmatter contains a type key (e.g., type: task), only those declared types apply
  2. Match rules: If no explicit declaration, each type's match rules are evaluated; all matching types apply
  3. Untyped: If nothing matches, the file is untyped

6.2 Explicit Type Declaration

Files can explicitly declare their type(s) using frontmatter keys defined in settings.explicit_type_keys (default: type and types). Type names SHOULD be treated case-insensitively when read from frontmatter and normalized to lowercase for matching; non-canonical casing SHOULD emit a warning.

If you want to use a field like type as ordinary data, remove it from settings.explicit_type_keys and use different keys (e.g., kind, kinds) for type declarations.

Single Type

---
type: task
title: Fix the bug
---

This file is a task and only task. Match rules are not evaluated.

Multiple Types

---
types: [task, urgent]
title: Fix critical security bug
---

This file is both a task and an urgent record. It must validate against both schemas.

Precedence

If both type and types are present, implementations SHOULD prefer types (the plural form).

---
type: task          # Ignored when types is present
types: [task, bug]  # This is used
---
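
Reading an explicit declaration can be sketched as follows. The specification only states the precedence for the default keys, so this sketch assumes the last listed key in settings.explicit_type_keys wins, which reproduces the recommended types-over-type behavior for the default setting. Treating a key with a null value as absent is also an assumption, and `declared_types` is a hypothetical name.

```python
def declared_types(frontmatter: dict, keys=("type", "types")):
    """Return (types, warnings); types is None when no explicit declaration exists."""
    present = [key for key in keys if frontmatter.get(key) is not None]
    if not present:
        return None, []
    chosen = present[-1]  # assumption: later keys take precedence
    warnings = [f"{key!r} ignored because {chosen!r} is present" for key in present[:-1]]
    raw = frontmatter[chosen]
    names = raw if isinstance(raw, list) else [raw]
    result = []
    for name in names:
        lowered = str(name).lower()
        if lowered != name:
            warnings.append(f"type name {name!r} normalized to {lowered!r}")
        if lowered not in result:
            result.append(lowered)
    return result, warnings
```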

6.3 Match Rules

Types can define rules for automatically associating files without explicit declaration. Match rules are specified in the type's match field:

# _types/task.md
---
name: task
match:
  path_glob: "tasks/**/*.md"
  fields_present: [status, due_date]
  where:
    status:
      exists: true
    priority:
      gte: 1
---

All conditions in match are combined with AND logic—all must be true for the type to match.


6.4 Match Conditions

`path_glob`

Matches files by their path relative to the collection root.

match:
  path_glob: "tasks/**/*.md"

Glob syntax:

  • * matches any characters except /
  • ** matches any characters including /
  • ? matches a single character

Examples:

| Pattern | Matches |
|---|---|
| `tasks/*.md` | `tasks/foo.md`, not `tasks/sub/foo.md` |
| `tasks/**/*.md` | Any `.md` in `tasks/` or subdirectories |
| `*.task.md` | `foo.task.md`, `bar.task.md` |
| `notes/2024-*.md` | `notes/2024-01.md`, `notes/2024-12.md` |
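
The glob rules differ from Python's fnmatch, where * also matches /, so a small translator to a regular expression is one way to implement them. The sketch assumes that a `**/` segment may match zero directories (needed for `tasks/**/*.md` to cover files directly in `tasks/`), that `?` does not match `/`, and that patterns are matched against the whole relative path. Function names are hypothetical.

```python
import re


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a glob using *, ** and ? into a compiled regular expression."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append("(?:.*/)?")  # zero or more whole directories
            i += 3
        elif pattern.startswith("**", i):
            parts.append(".*")  # any characters, including /
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")  # any characters except /
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")  # a single character (assumed not to be /)
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def path_matches(pattern: str, path: str) -> bool:
    """Check a collection-relative path against a glob pattern."""
    return glob_to_regex(pattern).fullmatch(path) is not None
```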

`fields_present`

Matches files that have all specified fields present and non-null.

match:
  fields_present: [status, assignee]

A field is "present" for matching purposes if:

  • The key exists in frontmatter, AND
  • The value is not null

Note: This is stricter than the exists() expression function, which returns true even for null values. The fields_present match condition requires a meaningful (non-null) value.

`where`

Matches files based on field value conditions. This uses a subset of the expression language operators:

match:
  where:
    # Exact equality
    kind: "task"
    
    # Field is present in frontmatter
    status:
      exists: true
    
    # Comparison operators
    priority:
      gte: 3
    
    # List contains
    tags:
      contains: "important"
    
    # String prefix
    title:
      startsWith: "URGENT:"

Available operators in where:

| Operator | Description | Example |
|---|---|---|
| (direct value) | Exact equality | `status: open` |
| `exists` | Field is present (true) or missing (false) | `assignee: { exists: true }` |
| `eq` | Equal to | `priority: { eq: 3 }` |
| `neq` | Not equal to | `status: { neq: "done" }` |
| `gt` | Greater than | `priority: { gt: 2 }` |
| `gte` | Greater than or equal | `priority: { gte: 3 }` |
| `lt` | Less than | `priority: { lt: 4 }` |
| `lte` | Less than or equal | `priority: { lte: 5 }` |
| `contains` | List contains value | `tags: { contains: "bug" }` |
| `containsAll` | List contains all values | `tags: { containsAll: ["bug", "urgent"] }` |
| `containsAny` | List contains any value | `tags: { containsAny: ["bug", "feature"] }` |
| `startsWith` | String starts with | `title: { startsWith: "WIP:" }` |
| `endsWith` | String ends with | `file: { endsWith: ".draft.md" }` |
| `matches` | Regex match (see §4.8 for regex flavor) | `title: { matches: "^TASK-\\d+" }` |
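
Because the structured where form needs no expression parser, it can be evaluated with a plain operator table. The sketch below makes several assumptions: `exists` tests key presence as described in the table, an absent or null value fails every other operator (including `neq`), mismatched types compare as false, and Python's re module stands in for the ECMAScript flavor the specification requires. The names `where_matches`, `COMPARE`, and `MISSING` are hypothetical.

```python
import re

MISSING = object()  # sentinel: key absent from frontmatter

COMPARE = {
    "eq": lambda v, a: v == a,
    "neq": lambda v, a: v != a,
    "gt": lambda v, a: v > a,
    "gte": lambda v, a: v >= a,
    "lt": lambda v, a: v < a,
    "lte": lambda v, a: v <= a,
    "contains": lambda v, a: isinstance(v, list) and a in v,
    "containsAll": lambda v, a: isinstance(v, list) and all(x in v for x in a),
    "containsAny": lambda v, a: isinstance(v, list) and any(x in v for x in a),
    "startsWith": lambda v, a: isinstance(v, str) and v.startswith(a),
    "endsWith": lambda v, a: isinstance(v, str) and v.endswith(a),
    "matches": lambda v, a: isinstance(v, str) and re.search(a, v) is not None,
}


def where_matches(where: dict, frontmatter: dict) -> bool:
    """Evaluate a match-rule where mapping; all conditions are combined with AND."""
    for field, condition in where.items():
        value = frontmatter.get(field, MISSING)
        # A bare value is shorthand for exact equality.
        tests = condition if isinstance(condition, dict) else {"eq": condition}
        for op, arg in tests.items():
            if op == "exists":
                ok = (value is not MISSING) == arg
            elif value is MISSING or value is None:
                ok = False  # assumption: absent or null values fail every other operator
            else:
                try:
                    ok = bool(COMPARE[op](value, arg))
                except TypeError:
                    ok = False  # e.g. comparing a number with a string
            if not ok:
                return False
    return True
```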

Match `where` vs Query `where`

The match rule where clause uses a YAML-structured form with operator keys:

# Match rule where (YAML-structured)
match:
  where:
    status:
      neq: "done"

The query where clause (see Querying §10.3) uses expression strings:

# Query where (expression string)
where: 'status != "done"'

These are two distinct syntaxes. Match rules use the structured form because they are evaluated during type matching (before the expression engine is available). Query where clauses use expression strings for greater flexibility.


6.5 Multi-Type Matching

When a file matches multiple types (whether by explicit declaration or match rules), the file must conform to all matched types.

Validation

The file is validated against each type's schema. All validations must pass:

# File: tasks/urgent-bug.md
---
types: [task, urgent]
title: Fix login
status: open
escalation_contact: alice@example.com
---

This file must:

  • Have all required fields from task
  • Satisfy all constraints from task
  • Have all required fields from urgent
  • Satisfy all constraints from urgent

Field Conflicts

When two types define the same field differently:

Compatible definitions occur when both types define the same field with the same base type. Constraints are merged by taking the most restrictive intersection:

| Constraint | Merge Rule |
|---|---|
| `required` | true if EITHER type requires it |
| `min` / `min_length` / `min_items` | Take the higher minimum |
| `max` / `max_length` / `max_items` | Take the lower maximum |
| `pattern` | Value must match all patterns |
| `values` (enum) | Take the intersection of allowed values |
| `default` | If both define defaults, they MUST be equal; otherwise it is an error |
| `unique` (list) | true if EITHER type requires it |
| `unique` (cross-file) | true if EITHER type requires it |

# Type A: priority as integer 1-5
# Type B: priority as integer 1-3
# Effective: priority as integer 1-3 (most restrictive)

Composite and advanced constraints:

| Constraint | Merge Rule |
|---|---|
| `list.items` | Item schemas MUST be compatible and are merged recursively using these same rules |
| `object.fields` | Fields are merged by name; overlapping fields are merged recursively |
| `generated` | If both define generated, the values MUST be identical; otherwise error |
| `deprecated` | true if EITHER type marks the field deprecated |
| `link.target` | If both define target, they MUST be identical; otherwise error |
| `link.validate_exists` | true if EITHER type sets it to true |

Incompatible definitions occur when:

  • The base types differ (e.g., string vs integer)
  • Enum intersections produce an empty set
  • Merged min exceeds merged max

# Type A: status as string
# Type B: status as enum [open, closed]
# Incompatible: different base types → validation error

When field types are incompatible, implementations MUST report a type_conflict error. The file cannot satisfy both schemas simultaneously.
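
A partial sketch of the merge rules for scalar constraints is shown below. It covers required, the min/max families, enum values, and default; pattern, unique, and the composite constraints are omitted. Raising a single `TypeConflict` exception for every failure, including mismatched defaults, is an assumption of this sketch, since the specification names the type_conflict code only for incompatible field types.

```python
class TypeConflict(Exception):
    """Raised when two definitions of one field cannot both be satisfied."""


def merge_field(a: dict, b: dict) -> dict:
    """Merge two definitions of the same field into the most restrictive form."""
    if a["type"] != b["type"]:
        raise TypeConflict(f"type_conflict: {a['type']} vs {b['type']}")
    merged = {"type": a["type"]}
    if a.get("required") or b.get("required"):
        merged["required"] = True
    for key in ("min", "min_length", "min_items"):
        bounds = [d[key] for d in (a, b) if key in d]
        if bounds:
            merged[key] = max(bounds)  # higher minimum wins
    for key in ("max", "max_length", "max_items"):
        bounds = [d[key] for d in (a, b) if key in d]
        if bounds:
            merged[key] = min(bounds)  # lower maximum wins
    for low, high in (("min", "max"), ("min_length", "max_length"), ("min_items", "max_items")):
        if low in merged and high in merged and merged[low] > merged[high]:
            raise TypeConflict(f"type_conflict: merged {low} exceeds merged {high}")
    if "values" in a and "values" in b:
        merged["values"] = [v for v in a["values"] if v in b["values"]]
        if not merged["values"]:
            raise TypeConflict("type_conflict: empty enum intersection")
    elif "values" in a or "values" in b:
        merged["values"] = list(a.get("values") or b.get("values"))
    if "default" in a and "default" in b and a["default"] != b["default"]:
        raise TypeConflict("type_conflict: defaults differ")
    for definition in (a, b):
        if "default" in definition:
            merged["default"] = definition["default"]
    return merged
```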

Querying

A multi-type file appears in queries for ANY of its matched types:

# Query for tasks
query:
  types: [task]
# Returns files that are tasks (including files that are also other types)

# Query for files that are BOTH task AND urgent
query:
  where:
    and:
      - 'types.contains("task")'
      - 'types.contains("urgent")'

6.6 Matching Evaluation Order

  1. Check explicit declaration: If type or types is in frontmatter, use those types exclusively. Stop.

  2. Evaluate match rules: For each type with match rules, evaluate all conditions:

    • If all conditions pass, the type matches
    • A type without match rules never matches implicitly
  3. Collect matches: The file's types are all types that matched in step 2.

  4. Untyped: If no types matched, the file is untyped.
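
The evaluation order can be sketched as follows. This is an illustrative Python outline under stated assumptions: match rules are modelled as plain predicate functions rather than parsed path_glob / fields_present / where rules, and the function name matched_types is hypothetical.

```python
from typing import Callable, Optional

# A condition receives (path, frontmatter) and reports whether it passes.
Condition = Callable[[str, dict], bool]


def matched_types(path: str, frontmatter: dict,
                  rules: dict[str, Optional[list[Condition]]]) -> list[str]:
    """Return the types of a file following the matching evaluation order."""
    # Step 1: an explicit declaration is used exclusively.
    if "types" in frontmatter:
        return list(frontmatter["types"])
    if "type" in frontmatter:
        return [frontmatter["type"]]
    # Steps 2-3: a type matches only if it has rules and all of them pass.
    matches = [name for name, conditions in rules.items()
               if conditions and all(cond(path, frontmatter) for cond in conditions)]
    # Step 4: an empty list means the file is untyped.
    return matches


# Hypothetical rule set: "template" has no match rules and never matches implicitly.
RULES = {
    "task": [lambda p, fm: p.startswith("tasks/")],
    "actionable": [lambda p, fm: "due_date" in fm],
    "template": None,
}
```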


6.7 Match Rule Examples

Path-Based Matching

# _types/task.md
match:
  path_glob: "tasks/**/*.md"

All files in tasks/ are tasks.

Field-Based Matching

# _types/actionable.md
match:
  fields_present: [due_date]

Any file with a due_date field is actionable.

Tag-Based Matching

# _types/urgent.md
match:
  where:
    tags:
      contains: "urgent"

Any file tagged "urgent" matches this type.

Combined Matching

# _types/active-task.md
match:
  path_glob: "tasks/**/*.md"
  fields_present: [status, assignee]
  where:
    status:
      neq: "done"

Files in tasks/ with status and assignee fields, where status is not "done".


6.8 Type-Only Files (No Matching)

A type without match rules will never automatically match files. Files must explicitly declare the type:

# _types/template.md
---
name: template
# No match rules
fields:
  template_name:
    type: string
    required: true
---

This type only applies to files that declare type: template or types: [template, ...].


6.9 The `types` Property in Expressions

In expressions, files have a types property (list of strings) representing their matched types:

# Filter: files that are tasks
filters: 'types.contains("task")'

# Filter: files that are both task and urgent
filters: 'types.contains("task") && types.contains("urgent")'

# Filter: files that have no type
filters: "types.length == 0"

6.10 Debugging Type Matching

Implementations SHOULD provide a way to see why a file matched (or didn't match) specific types. For example:

# Show matching analysis for a file
mdbase debug match tasks/fix-bug.md

# Output:
# tasks/fix-bug.md
# ├── Explicit types: none
# ├── Matched types: [task, urgent]
# │   ├── task: matched via path_glob "tasks/**/*.md"
# │   └── urgent: matched via where.tags.contains("urgent")
# └── Unmatched types:
#     └── done: failed where.status.eq("done")

7. Field Types and Constraints

This section defines the data types that can be used in type definitions, along with their constraints and validation rules.


7.1 Field Definition Structure

Every field in a type's fields section has this structure:

fields:
  field_name:
    # Required: the data type
    type: string
    
    # Optional: is this field required?
    required: false
    
    # Optional: default value if field is missing
    default: "untitled"
    
    # Optional: auto-generation strategy
    generated: now
    
    # Optional: human-readable description
    description: "A brief summary"
    
    # Optional: mark as deprecated
    deprecated: false
    
    # Type-specific constraints (see each type below)

7.2 Common Field Options

These options apply to all field types:

`type` (Required)

The data type. One of: string, integer, number, boolean, date, datetime, time, enum, list, object, link, any.

`required`

Whether the field must be present and non-null.

Value Behavior
false (default) Field may be missing or null
true Field must be present and non-null

`default`

A default value applied to the effective value when the field is missing. The default is NOT applied when the field is present but null. Defaults are not required to be persisted unless the caller explicitly requests materialization (see §5.11).

status:
  type: enum
  values: [open, done]
  default: open  # Applied only if 'status' key is absent
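
The distinction between a missing key and an explicit null can be illustrated with a short Python sketch. The helper name effective_value is hypothetical; the sketch only shows the rule stated above and does not persist anything.

```python
_MISSING = object()  # sentinel distinguishing "key absent" from "key present but null"


def effective_value(frontmatter: dict, name: str, field_def: dict):
    """Return the effective value of a field, applying the default only if the key is absent."""
    value = frontmatter.get(name, _MISSING)
    if value is _MISSING:
        # Key absent: the default (if any) becomes the effective value
        return field_def.get("default")
    # Key present, including an explicit null: the default is NOT applied
    return value
```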

`generated`

Automatic value generation. See 7.15 Generated Fields.

`description`

Human-readable description of the field's purpose. Implementations MAY display this in tooling.

`deprecated`

Mark a field as deprecated. Implementations SHOULD warn when deprecated fields are used.

`unique`

Cross-file uniqueness constraint.

Value Behavior
false (default) No uniqueness checking
true Field value MUST be unique across all files matching the declaring type

Rules:

  • Null/undefined values are exempt from uniqueness checks — multiple files may omit the field
  • For multi-type files: uniqueness is checked within each type's file set independently
  • Validation of uniqueness requires scanning all files of the type. Implementations SHOULD use caching for performance
  • Error code: duplicate_value, reported with the field name, conflicting file paths, and the duplicate value

Note: settings.id_field implicitly has unique: true behavior (see §4.4). The unique option makes cross-file uniqueness available for any field.

Example:

fields:
  slug:
    type: string
    unique: true
  email:
    type: string
    unique: true
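
A cross-file uniqueness check might look like the following Python sketch. It assumes the frontmatter of every file of the declaring type has already been loaded into a mapping from path to frontmatter; the function name find_duplicates is hypothetical, and each entry it returns would be reported as a duplicate_value issue.

```python
from collections import defaultdict


def find_duplicates(files: dict[str, dict], field: str) -> dict:
    """Map each duplicated value of `field` to the sorted paths that share it."""
    seen = defaultdict(list)
    for path, frontmatter in files.items():
        value = frontmatter.get(field)
        if value is None:
            # Null and missing values are exempt from uniqueness checks
            continue
        # Assumes hashable scalar values such as strings or integers
        seen[value].append(path)
    return {value: sorted(paths) for value, paths in seen.items() if len(paths) > 1}
```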

7.3 `string`

A text value.

title:
  type: string
  required: true
  min_length: 1
  max_length: 200
  pattern: "^[A-Z].*"

Constraints:

Constraint Type Description
min_length integer Minimum string length
max_length integer Maximum string length
pattern string Regex pattern the value must match

Validation:

  • Value must be a string (or coercible to string)
  • Length constraints apply to character count (not bytes)
  • Pattern uses ECMAScript (ES2018+) regular expression syntax as the required baseline (see §4.8 for the full regex specification)

7.4 `integer`

A whole number.

priority:
  type: integer
  min: 1
  max: 5
  default: 3

Constraints:

Constraint Type Description
min integer Minimum value (inclusive)
max integer Maximum value (inclusive)

Validation:

  • Value must be a whole number (no decimal part)
  • YAML integers, strings containing integers, and floats with no decimal part MAY be coerced

7.5 `number`

A floating-point number.

rating:
  type: number
  min: 0.0
  max: 5.0

Constraints:

Constraint Type Description
min number Minimum value (inclusive)
max number Maximum value (inclusive)

Validation:

  • Value must be numeric (integer or float)
  • IEEE 754 special values (NaN, Infinity) are allowed unless explicitly constrained

7.6 `boolean`

A true/false value.

draft:
  type: boolean
  default: false

Validation:

  • Accepts YAML boolean values: true, false
  • Implementations SHOULD also accept YAML 1.1 boolean spellings: yes, no, on, off
  • Such spellings SHOULD be normalized to true/false on write

7.7 `date`

A calendar date without time.

due_date:
  type: date

Format: ISO 8601 date: YYYY-MM-DD

Examples: 2024-03-15, 2024-12-01

Validation:

  • Must be a valid date (no February 30th)
  • String format must match ISO 8601
  • YAML date scalars MAY be accepted and MUST be normalized to ISO 8601 on write

7.8 `datetime`

A date with time.

created_at:
  type: datetime

Format: ISO 8601 datetime with optional timezone:

  • YYYY-MM-DDTHH:MM:SS
  • YYYY-MM-DDTHH:MM:SSZ
  • YYYY-MM-DDTHH:MM:SS+HH:MM

Examples:

  • 2024-03-15T10:30:00
  • 2024-03-15T10:30:00Z
  • 2024-03-15T10:30:00+05:30

Validation:

  • Must be valid datetime
  • Implementations MUST preserve timezone information if present
  • YAML timestamp scalars MAY be accepted and MUST be normalized to ISO 8601 on write

Timezone Comparison Rules

  • Datetime values with explicit offsets are compared as absolute instants (convert to a common epoch before comparison)
  • Datetime values WITHOUT offsets (naive) are treated as local time in the implementation's configured timezone
  • Comparing an offset-aware datetime with a naive datetime: the naive datetime is interpreted in local time, then both are compared as absolute instants
  • now() returns an offset-aware datetime in the implementation's local timezone
  • today() returns a date in the implementation's local timezone
  • Date arithmetic preserves offset: datetime_with_offset + "1d" keeps the same offset
  • Serialization MUST preserve the original offset if present. 2024-03-15T10:00:00+05:30 MUST NOT be normalized to UTC on write
  • Implementations MAY provide a configuration option for default timezone (not specified in this version — use the local system timezone)
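
The comparison rules can be illustrated with Python's standard datetime module. This is a sketch under stated assumptions: the helper to_instant is hypothetical, and the local timezone is passed in explicitly so the example does not depend on the system configuration.

```python
from datetime import datetime, timedelta, timezone, tzinfo


def to_instant(value: str, local_tz: tzinfo) -> datetime:
    """Parse an ISO 8601 datetime; naive values are interpreted in local_tz."""
    if value.endswith("Z"):
        # Older Python versions do not accept a trailing "Z" in fromisoformat
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=local_tz)
    return parsed


# Hypothetical configured timezone for the example (UTC+05:30)
LOCAL = timezone(timedelta(hours=5, minutes=30))
aware = to_instant("2024-03-15T05:00:00Z", LOCAL)
naive = to_instant("2024-03-15T10:30:00", LOCAL)
same_instant = aware == naive  # aware datetimes compare as absolute instants
```

Parsing keeps the original offset on the value, so an implementation can serialize it unchanged rather than normalizing to UTC.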

See also §11.7 for date/time functions in expressions.


7.9 `time`

A time without date.

meeting_time:
  type: time

Format: HH:MM or HH:MM:SS

Examples: 14:30, 09:00:00


7.10 `enum`

A value from a fixed set of options.

status:
  type: enum
  values: [draft, review, published, archived]
  default: draft

Required constraint:

Constraint Type Description
values list The allowed values (must be strings)

Validation:

  • Value must exactly match one of the values entries
  • Comparison is case-sensitive
  • Enum values MUST be strings

7.11 `list`

An ordered collection of values.

tags:
  type: list
  items:
    type: string
  min_items: 0
  max_items: 10
  unique: true

Required constraint:

Constraint Type Description
items field definition The type of each list element

Optional constraints:

Constraint Type Description
min_items integer Minimum list length
max_items integer Maximum list length
unique boolean If true, no duplicate values allowed

Validation:

  • Value must be a YAML list
  • Each element is validated against items
  • If unique: true, duplicates cause validation failure

Nested lists:

matrix:
  type: list
  items:
    type: list
    items:
      type: number

7.12 `object`

A nested structure with its own fields.

author:
  type: object
  fields:
    name:
      type: string
      required: true
    email:
      type: string
    url:
      type: string

Required constraint:

Constraint Type Description
fields mapping Field definitions for the nested object

Validation:

  • Value must be a YAML mapping
  • Each field is validated according to its definition
  • Unknown fields are handled according to type's strictness

7.13 `link`

A reference to another file in the collection.

parent_task:
  type: link
  target: task
  validate_exists: false

related:
  type: list
  items:
    type: link

Optional constraints:

Constraint Type Description
target string Type name to constrain resolution scope
validate_exists boolean If true, validate that target file exists

Accepted formats:

  • Wikilinks: [[target]], [[target|alias]], [[folder/target]]
  • Markdown links: [text](path.md), [text](./relative.md)
  • Bare paths: ./sibling.md, ../parent/file.md

See Links for detailed parsing and resolution rules.


7.14 `any`

Accepts any valid YAML value.

metadata:
  type: any

Use cases:

  • Migration: Temporarily accept untyped data
  • Flexible schemas: When structure varies
  • Extension points: Allow arbitrary user data

Validation:

  • Any valid YAML value is accepted
  • No constraints available

7.15 Generated Fields

Fields can be automatically populated using the generated option:

fields:
  id:
    type: string
    generated: ulid
  
  created_at:
    type: datetime
    generated: now
  
  updated_at:
    type: datetime
    generated: now_on_write
  
  slug:
    type: string
    generated:
      from: title
      transform: slugify

Generation strategies:

Strategy Description
ulid Generate a ULID (Universally Unique Lexicographically Sortable Identifier)
uuid Generate a UUID v4
now Current datetime (on create only)
now_on_write Current datetime (on every write)
{from, transform} Derive from another field

Transform functions for derived fields:

Transform Description
slugify Convert to URL-safe slug
lowercase Convert to lowercase
uppercase Convert to uppercase

Important rules:

  1. Generated values are applied only when the field is missing, with the exception of now_on_write (rule 3)
  2. User-provided values are NEVER overwritten by now or ulid/uuid
  3. now_on_write ALWAYS updates the field on every write operation
  4. Generated fields can still have required: true (they'll satisfy the requirement via generation)
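
The rules above can be sketched in Python. Several details here are assumptions made for illustration: the function name apply_generated, treating "missing" as "key absent", the exact slugify behaviour (this specification does not define it in this section), and the omission of ulid, which has no standard-library generator.

```python
import re
import uuid
from datetime import datetime, timezone
from typing import Optional


def slugify(text: str) -> str:
    """Assumed slug rule: lowercase, non-alphanumeric runs become single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def apply_generated(frontmatter: dict, fields: dict, is_create: bool,
                    now: Optional[str] = None) -> dict:
    """Return a copy of frontmatter with generated fields filled in."""
    now = now or datetime.now(timezone.utc).isoformat(timespec="seconds")
    out = dict(frontmatter)
    for name, field_def in fields.items():
        strategy = field_def.get("generated")
        if strategy is None:
            continue
        if strategy == "now_on_write":
            out[name] = now  # refreshed on every write
        elif name in out:
            continue  # user-provided values are never overwritten
        elif strategy == "now":
            if is_create:
                out[name] = now  # on create only
        elif strategy == "uuid":
            out[name] = str(uuid.uuid4())
        elif isinstance(strategy, dict) and strategy.get("transform") == "slugify":
            source = out.get(strategy["from"])
            if source is not None:
                out[name] = slugify(str(source))
    return out
```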

7.16 Type Coercion

When reading values, implementations MUST attempt to coerce compatible types:

Schema Type Accepts
string Any scalar (converted via toString)
integer Integer, float with no decimal, numeric string
number Integer, float, numeric string
boolean Boolean, "true"/"false" strings, yes/no
date ISO date string, YAML date
datetime ISO datetime string, YAML timestamp

When coercion fails, it is a validation error.
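
As an illustration of the integer row, the following Python sketch coerces a raw YAML value or raises. The function name coerce_integer is hypothetical, and rejecting booleans and underscore-separated digit strings are assumptions of this sketch, not requirements of the table.

```python
import re

# Plain decimal digits with an optional sign; no underscores or decimals
_INT_STRING = re.compile(r"[+-]?[0-9]+")


def coerce_integer(value):
    """Coerce a raw YAML value to int, or raise ValueError if not coercible."""
    # bool is a subclass of int in Python; rejecting it here is an assumption
    if isinstance(value, bool):
        raise ValueError(f"not an integer: {value!r}")
    if isinstance(value, int):
        return value
    # Floats are accepted only when they have no decimal part
    if isinstance(value, float) and value.is_integer():
        return int(value)
    if isinstance(value, str) and _INT_STRING.fullmatch(value.strip()):
        return int(value.strip())
    raise ValueError(f"not an integer: {value!r}")
```

A caller would translate the ValueError into a validation issue for the affected field.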


7.17 Summary Table

| Type | YAML | Constraints | Notes |
|------|------|-------------|-------|
| string | String | min_length, max_length, pattern | |
| integer | Integer | min, max | Whole numbers only |
| number | Float/Int | min, max | Allows decimals |
| boolean | Boolean | | Normalized to true/false |
| date | String | | ISO 8601 date |
| datetime | String | | ISO 8601 datetime |
| time | String | | HH:MM or HH:MM:SS |
| enum | String | values (required) | Must match exactly |
| list | List | items (required), min_items, max_items, unique | |
| object | Mapping | fields (required) | Nested structure |
| link | String | target, validate_exists | Reference to file |
| any | Any | | No validation |

8. Links

Links are references from one file to another. They are a first-class concept in this specification due to their prevalence in markdown-based knowledge systems. This section defines link syntax, parsing, resolution, and traversal.


Links transform a folder of files into a knowledge graph. They enable:

  • Navigation: Jump between related documents
  • Backlinks: See what references a document
  • Queries: Find documents by their relationships
  • Validation: Ensure references point to real files
  • Refactoring: Rename files without breaking connections

This specification treats links as typed data with well-defined parsing and resolution semantics.


The link field type accepts three input formats:

8.2.1 Wikilinks

The format popularized by wikis and knowledge management tools:

[[target]]
[[target|alias]]
[[target#anchor]]
[[target#anchor|alias]]
[[folder/target]]
[[./relative]]
[[../parent/target]]

Components:

  • target: The file being linked to (without extension by default)
  • alias: Display text (does not affect resolution)
  • anchor: A heading or block reference within the target
  • path: May be absolute (from collection root) or relative (from current file)

Examples:

# Simple link
parent: "[[task-001]]"

# Link with alias (alias is metadata, not used for resolution)
assignee: "[[alice|Alice Smith]]"

# Link to specific section
reference: "[[api-docs#authentication]]"

# Relative link
related: "[[./sibling-task]]"

# Path from root
spec: "[[docs/specs/api-v2]]"

8.2.2 Markdown Links

Standard markdown link syntax:

[text](path.md)
[text](./relative.md)
[text](../other/file.md)
[text](path.md#anchor)

The text portion is treated as an alias.

Examples:

parent: "[Parent Task](./tasks/parent.md)"
reference: "[API Docs](docs/api.md#auth)"

8.2.3 Bare Paths

A path without link syntax:

./sibling.md
../other/file.md
folder/file.md

Examples:

config: "./config.md"
parent: "../parent-project/overview.md"

Bare paths follow the same resolution rules as markdown links: they are relative to the containing file's directory unless they start with / (root-relative).


When a link value is read, implementations MUST parse it into a structured representation:

Component Type Description
raw string Original string value exactly as written
target string File path or identifier (without anchor/alias)
alias string? Display text if provided, otherwise null
anchor string? Heading/block reference if provided, otherwise null
format enum One of: wikilink, markdown, path
is_relative boolean Whether path starts with ./ or ../

Parsing examples:

| Input | target | alias | anchor | format | is_relative |
|-------|--------|-------|--------|--------|-------------|
| [[task-001]] | task-001 | null | null | wikilink | false |
| [[task-001\|My Task]] | task-001 | My Task | null | wikilink | false |
| [[docs/api#auth]] | docs/api | null | auth | wikilink | false |
| [[./sibling]] | ./sibling | null | null | wikilink | true |
| [Link](file.md) | file.md | Link | null | markdown | false |
| ./other.md | ./other.md | null | null | path | true |
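
A parser producing this structured representation might be sketched as follows in Python. The class name ParsedLink, the function name parse_link, and the two regular expressions are assumptions of this sketch; it handles the three documented input formats and makes no attempt at escaping rules.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParsedLink:
    raw: str
    target: str
    alias: Optional[str]
    anchor: Optional[str]
    format: str
    is_relative: bool


# [[target#anchor|alias]] with optional anchor and alias
_WIKILINK = re.compile(r"^\[\[([^\]|#]+)(?:#([^\]|]+))?(?:\|([^\]]+))?\]\]$")
# [text](path#anchor) with optional anchor
_MARKDOWN = re.compile(r"^\[([^\]]*)\]\(([^)#]+)(?:#([^)]+))?\)$")


def parse_link(raw: str) -> ParsedLink:
    """Split a link value into target, alias, anchor, and format."""
    wiki = _WIKILINK.match(raw)
    md = _MARKDOWN.match(raw)
    if wiki:
        target, anchor, alias = wiki.group(1), wiki.group(2), wiki.group(3)
        fmt = "wikilink"
    elif md:
        alias, target, anchor = md.group(1) or None, md.group(2), md.group(3)
        fmt = "markdown"
    else:
        # Anything else is treated as a bare path
        target, _, fragment = raw.partition("#")
        alias, anchor, fmt = None, fragment or None, "path"
    is_relative = target.startswith(("./", "../"))
    return ParsedLink(raw, target, alias, anchor, fmt, is_relative)
```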

Resolution transforms a parsed link into an absolute path (relative to collection root) pointing to the target file.

Resolution Algorithm

Given a link value and the path of the file containing it:

  1. Parse the link into components (target, format, is_relative)

  2. If format is markdown or path:

    • If target starts with /, resolve from collection root (strip the leading /)
    • Otherwise, resolve relative to the containing file's directory (markdown-standard behavior)
    • Example: Link [Docs](docs/api.md) in notes/meeting.md resolves to notes/docs/api.md
  3. If format is wikilink:

    • If target starts with ./ or ../, resolve relative to the containing file's directory
    • If target starts with /, resolve from collection root (strip the leading /)
    • If target contains / (and is not relative), resolve from collection root
    • Example: [[docs/api]] resolves to docs/api
  4. If simple name (no /, no ./ or ../, and format is wikilink):

    • Define the search scope:
      • If the link field has target constraint specifying a type, scope to files matching that type
      • Otherwise, scope to the entire collection
    • ID match pass: Search scoped files for id_field == name
      • If exactly one match, resolve to it
      • If multiple matches, resolution MUST fail with ambiguous_link
    • Filename match pass: If no id_field match exists, search scoped files by filename
    • If multiple filename candidates match, apply tiebreakers in order:
      a. Same directory: Prefer a file in the same directory as the referring file
      b. Shortest path: Prefer the file with the shortest path (closest to collection root)
      c. Alphabetical: Sort candidate paths lexicographically and take the first
    • If multiple candidates remain after all tiebreakers, resolve to null and emit an ambiguous_link warning
  5. Extension handling:

    • If target lacks extension, try configured extensions in order (default: .md)
    • Example: [[readme]] tries readme.md, readme.mdx, etc.
  6. Path traversal check:

    • After resolution and normalization, if the resolved path would escape the collection root, abort with path_traversal
  7. Return:

    • The absolute path (relative to collection root) if found
    • null if no matching file exists
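
The filename tiebreakers in step 4 can be sketched in Python. The function name pick_candidate is hypothetical, and reading "shortest path" as "fewest path segments" is an assumption of this sketch; an implementation could equally compare string length.

```python
import posixpath
from typing import Optional


def pick_candidate(candidates: list[str], referring_file: str) -> Optional[str]:
    """Choose one path among filename matches using the tiebreakers in order."""
    if not candidates:
        return None
    # a. Same directory: prefer files next to the referring file
    here = posixpath.dirname(referring_file)
    same_dir = [c for c in candidates if posixpath.dirname(c) == here]
    pool = same_dir or candidates
    # b. Shortest path (fewest segments, an assumption), then c. alphabetical
    return min(pool, key=lambda path: (path.count("/"), path))
```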

Resolution Examples

Given collection structure:

/
├── mdbase.yaml
├── tasks/
│   ├── task-001.md
│   └── subtasks/
│       └── task-002.md
├── notes/
│   └── meeting.md
├── people/
│   └── alice.md
└── journal/
    └── 2024/
        └── 01/
            └── 15.md

Link resolution from tasks/subtasks/task-002.md:

Link Value Resolved Path Notes
[[task-001]] tasks/task-001.md Search by name
[[../task-001]] tasks/task-001.md Relative path
[[./task-003]] tasks/subtasks/task-003.md Relative (may not exist)
[[notes/meeting]] notes/meeting.md Absolute from root
[[meeting]] notes/meeting.md Search by name
[[alice]] people/alice.md Search by name
[link](../task-001.md) tasks/task-001.md Markdown, relative
../task-001.md tasks/task-001.md Bare path, relative

When defining a link field in a type:

fields:
  parent:
    type: link
    target: task           # Constrain resolution to 'task' type
    validate_exists: true  # Fail if target doesn't exist
    description: "Parent task this subtask belongs to"
  
  related:
    type: list
    items:
      type: link          # List of links (no constraints)

`target` Constraint

Limits resolution scope to files of a specific type:

assignee:
  type: link
  target: person

When resolving [[alice]] for this field:

  1. Implementation searches only files that match the person type
  2. Matches by the configured id_field (default: id)
  3. If no file of type person has id: alice, resolution falls back to filename matching within the same scope; if that also finds nothing, resolution fails

`validate_exists` Constraint

When true, unresolved links cause validation errors:

parent:
  type: link
  validate_exists: true

Default is false (links can point to non-existent files).


To support file.links, file.backlinks, file.embeds, and file.tags, implementations MUST extract links and tags from both frontmatter and body content using these rules:

Included:

  • Frontmatter fields of type link and list of link
  • Body links in wikilink form ([[target]], [[target|alias]], [[target#anchor]])
  • Body links in markdown form ([text](path.md)), including #anchor
  • Embeds in wikilink form (![[target]]) and markdown form (![alt](path.md))

Excluded:

  • Links inside fenced code blocks
  • Links inside inline code spans

file.links returns all non-embed links; file.embeds returns only embeds.

Tag Extraction

file.tags includes:

  • Raw persisted frontmatter tags field if present (string or list of strings)
  • Inline tags in body content of the form #tag

Inline tags MUST:

  • Be preceded by whitespace or appear at the start of a line
  • Match the pattern [A-Za-z0-9_/-]+ after # (forward slashes create nested tag hierarchies)
  • Be outside fenced code blocks and inline code spans
  • Not be preceded by ]( or ](http patterns (to exclude URL fragments)

Implementations SHOULD ignore # fragments in URLs. A simple heuristic is to skip any # that is preceded by ), ", ', or appears within a markdown link target ([text](url#fragment)).

Nested Tags

Tags MAY contain forward slashes (/) to create hierarchies: #inbox/to-read, #project/alpha/urgent.

The file.hasTag() function performs prefix matching on nested tags: file.hasTag("inbox") matches #inbox, #inbox/to-read, and #inbox/processing. This is consistent with Obsidian's tag behavior.

Nesting has no depth limit. The / character is purely conventional — implementations do not need to build a tree structure.
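
Inline tag extraction and prefix matching could be sketched as below. This Python outline is illustrative: the function names inline_tags and has_tag are hypothetical, code regions are removed with simple regular expressions rather than a full markdown parser, and the fence string is built at runtime only to keep the example readable.

```python
import re

FENCE = "`" * 3  # a fenced code block delimiter
_FENCED = re.compile(re.escape(FENCE) + r".*?" + re.escape(FENCE), re.DOTALL)
_INLINE_CODE = re.compile(r"`[^`\n]*`")
# "#" must sit at the start of the text or directly after whitespace
_TAG = re.compile(r"(?<!\S)#([A-Za-z0-9_/-]+)")


def inline_tags(body: str) -> list[str]:
    """Return inline tags found outside fenced code blocks and code spans."""
    text = _FENCED.sub(" ", body)
    text = _INLINE_CODE.sub(" ", text)
    return [match.group(1) for match in _TAG.finditer(text)]


def has_tag(tags: list[str], query: str) -> bool:
    """Prefix matching on nested tags: 'inbox' matches 'inbox' and 'inbox/to-read'."""
    return any(tag == query or tag.startswith(query + "/") for tag in tags)
```

Because the tag pattern requires whitespace or start of text before #, URL fragments such as page.md#section are skipped without a separate check.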


Links can be traversed to access properties of the linked file using the asFile() method.

The `asFile()` Method

In expressions, link.asFile() resolves a link to its target file object:

# Filter: tasks assigned to someone on the engineering team
filters: 'assignee.asFile().team == "engineering"'

# Formula: get the parent task's status
formulas:
  parent_status: "parent.asFile().status"

If the link cannot be resolved, asFile() returns null. Subsequent property access on null returns null (no error).

Multi-Hop Traversal

asFile() MAY be chained to traverse multiple links:

assignee.asFile().manager.asFile().name
parent.asFile().project.asFile().status

Each asFile() call resolves the link field on the current file and returns the target file object.

Null propagation: If any hop returns null, the entire chain evaluates to null (no error).

Depth limit: Implementations MUST enforce a maximum traversal depth (default: 10 hops). Exceeding this limit MUST produce an expression_depth_exceeded error. Circular traversal (A → B → A) does not cause infinite loops because the depth limit applies.
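
Null propagation and the depth limit can be illustrated with a small Python sketch. Files are modelled as plain dictionaries and resolve is a hypothetical callable that maps a link value to a file or None; the names traverse and ExpressionDepthExceeded are likewise assumptions standing in for an implementation's evaluator and its expression_depth_exceeded error.

```python
from typing import Callable, Optional


class ExpressionDepthExceeded(Exception):
    """Hypothetical exception standing in for expression_depth_exceeded."""


def traverse(start: dict, link_fields: list[str],
             resolve: Callable[[str], Optional[dict]],
             max_depth: int = 10) -> Optional[dict]:
    """Follow a chain of link fields, one asFile() hop per field name."""
    if len(link_fields) > max_depth:
        raise ExpressionDepthExceeded(f"{len(link_fields)} hops exceed {max_depth}")
    current: Optional[dict] = start
    for name in link_fields:
        if current is None:
            return None  # null propagation: the whole chain is null
        link = current.get(name)
        current = resolve(link) if link is not None else None
    return current
```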

Accessing Linked File Properties

Once resolved, you can access:

Frontmatter fields:

parent.asFile().status
parent.asFile().priority
assignee.asFile().name

File metadata:

parent.asFile().file.name
parent.asFile().file.mtime
parent.asFile().file.path

Performance Considerations

Each hop requires loading and parsing the target file. Implementations SHOULD:

  • Cache resolved files during query execution
  • Document performance characteristics for multi-hop queries
  • Consider lazy resolution (only resolve when accessed)
  • Warn users about expensive traversals in large collections

The following functions operate on links and files in expressions:

Function Description Example
link.asFile() Resolve link to file object assignee.asFile().name
file.hasLink(target) File contains link to target file.hasLink(link("tasks/main"))
file.links List of outgoing links file.links.length > 5
file.backlinks List of incoming links (requires index) file.backlinks.length
link(path) Construct a link from path link("people/alice")

file.backlinks returns files that link TO the current file. This requires either:

  • A full scan of all files (slow without cache)
  • A pre-computed reverse index (requires cache)

Implementations SHOULD document whether file.backlinks requires caching for reasonable performance.

Example: Find files linking to current file

filters: "file.hasLink(this.file)"

When writing links to frontmatter, implementations SHOULD preserve the original format when possible:

  • If user wrote [[note]], prefer outputting [[note]] over ./note.md
  • If user wrote a relative path, preserve relativity when possible
  • If user wrote with an alias, preserve the alias

This preserves user intent and keeps files human-readable.


While this specification focuses on frontmatter, links also appear in markdown body content. Implementations SHOULD support:

  • Parsing links from body content
  • Updating body links during rename operations
  • Including body links in file.links

Implementations that do NOT support body link parsing MUST document this limitation. See Operations for rename behavior.


A "broken link" is a link that cannot be resolved to an existing file. Handling options:

Scenario Behavior
validate_exists: false (default) Broken links are allowed; asFile() returns null
validate_exists: true Broken links cause validation errors
Rename operations Implementations SHOULD update links to maintain validity
Delete operations Implementations MAY warn about incoming links that will break

Simple Task Hierarchy

# tasks/parent.md
---
type: task
id: parent
title: Main Feature
subtasks:
  - "[[child-1]]"
  - "[[child-2]]"
---

# tasks/child-1.md
---
type: task
id: child-1
title: Subtask One
parent: "[[parent]]"
---

Cross-Type References

# tasks/implement-api.md
---
type: task
assignee: "[[alice]]"
spec: "[[docs/api-spec]]"
related:
  - "[[tasks/write-tests]]"
  - "[[tasks/update-docs]]"
---

# projects/alpha/tasks/task-001.md
---
type: task
parent: "[[../overview]]"           # projects/alpha/overview.md
sibling: "[[./task-002]]"           # projects/alpha/tasks/task-002.md
global_reference: "[[people/bob]]"  # people/bob.md (from root)
---

8.13 Path Sandboxing

Link resolution MUST NOT resolve to paths outside the collection root.

Rules

  • After resolving relative paths (applying ../ segments), the resulting absolute path MUST be within the collection root directory
  • If resolution would escape the collection root, the link MUST resolve to null and implementations MUST emit a path_traversal error
  • This applies to all link formats: wikilinks, markdown links, and bare paths
  • Implementations MUST normalize paths (resolve . and .. segments) before checking containment

Example

In a collection rooted at /home/user/notes/:

Link From File Result
[[../../../etc/passwd]] notes/daily.md null + path_traversal error
[[../../../secrets/key]] deep/nested/file.md null + path_traversal error
[[../sibling]] tasks/task-001.md Resolves normally (stays within root)
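
The containment check can be sketched in Python with posixpath. The names resolve_path and PathTraversal are hypothetical, paths are collection-root-relative with forward slashes, and the sketch covers only path-style targets (relative or root-relative), not name lookup.

```python
import posixpath


class PathTraversal(Exception):
    """Hypothetical exception standing in for the path_traversal error."""


def resolve_path(target: str, containing_file: str) -> str:
    """Resolve a path-style link target to a collection-root-relative path."""
    if target.startswith("/"):
        # Root-relative: strip the leading slash
        joined = target.lstrip("/")
    else:
        joined = posixpath.join(posixpath.dirname(containing_file), target)
    # Normalize . and .. segments before checking containment
    resolved = posixpath.normpath(joined)
    if resolved == ".." or resolved.startswith("../"):
        raise PathTraversal(f"{target!r} escapes the collection root")
    return resolved
```

A caller would catch PathTraversal, resolve the link to null, and report the error.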

9. Validation

Validation ensures that files conform to their type schemas. This section defines what is validated, when validation occurs, and how errors are reported.


9.1 Validation Levels

Implementations MUST support three validation levels:

Level Behavior
off No validation performed
warn Validation runs; issues are reported but operations succeed
error Validation runs; issues cause operations to fail

The default level is configured via settings.default_validation (default: "warn").

Operations MAY override the default level:

# Force error-level validation
mdbase validate --level error

# Create with no validation
mdbase create --no-validate

9.2 What Is Validated

For each typed file, validation checks the following:

9.2.1 Required Fields

Fields marked required: true MUST be:

  1. Present in the effective frontmatter (defaults applied; computed fields excluded)
  2. Non-null (value is not null)

Note: exists(field) checks for a present key in raw persisted frontmatter even if its value is null. Required fields must be present in the effective frontmatter and non-null.

# Type definition
fields:
  title:
    type: string
    required: true

# Valid
title: "My Document"

# Invalid: missing
# (no title key)

# Invalid: null
title: null
title:
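
A required-field check over the effective frontmatter might be sketched as follows. The function name check_required is hypothetical; the returned dictionaries use the issue fields from §9.3 and the missing_required code, and the message wording is illustrative.

```python
def check_required(path: str, effective: dict, fields: dict, type_name: str) -> list[dict]:
    """Report required fields that are absent or null in the effective frontmatter."""
    issues = []
    for name, field_def in fields.items():
        if not field_def.get("required"):
            continue
        # dict.get returns None for both a missing key and an explicit null
        if effective.get(name) is None:
            issues.append({
                "path": path,
                "field": name,
                "code": "missing_required",
                "message": f"Field '{name}' is required but missing or null",
                "severity": "error",
                "type": type_name,
            })
    return issues
```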

9.2.2 Type Correctness

Values MUST match their declared type (or be coercible):

# Type definition
fields:
  priority:
    type: integer

# Valid
priority: 5
priority: "5"  # Coerced to integer

# Invalid
priority: "high"
priority: 5.5

9.2.3 Field Constraints

Type-specific constraints MUST be satisfied:

Type Constraints
string min_length, max_length, pattern
integer, number min, max
list min_items, max_items, unique
enum values
link validate_exists

9.2.4 Unknown Fields (Strictness)

When a type has strict: true, unknown fields cause validation failure:

# Type definition: strict: true
fields:
  title:
    type: string

# Valid
title: "Doc"

# Invalid: unknown field
title: "Doc"
extra_field: "not allowed"

With strict: "warn", unknown fields trigger warnings but pass validation.

Implicit fields: The following frontmatter keys are always implicitly allowed, even in strict mode:

  • type / types — type declaration keys (configurable via settings.explicit_type_keys)
  • Any keys listed in settings.explicit_type_keys

These keys are structural and do not need to be declared in the type's fields definition.

9.2.5 Multi-Type Validation

When a file matches multiple types, it MUST validate against ALL of them:

# File matches both 'task' and 'urgent' types
# Must satisfy:
# - All required fields from 'task'
# - All constraints from 'task'
# - All required fields from 'urgent'
# - All constraints from 'urgent'

9.2.6 Link Validation

For link fields with validate_exists: true, the target file MUST exist:

# Type definition
fields:
  parent:
    type: link
    validate_exists: true

# Valid (if file exists)
parent: "[[existing-task]]"

# Invalid (file doesn't exist)
parent: "[[nonexistent]]"

9.2.7 Filename Patterns

If a type defines filename_pattern, filenames MAY be validated:

# Type definition
filename_pattern: "{id}.md"

# File: task-001.md with id: "task-001" → valid
# File: random-name.md with id: "task-001" → warning (mismatch)

Filename pattern validation is RECOMMENDED but not strictly required.

9.2.8 Unique ID Field

If settings.id_field is configured (default: id), values of that field MUST be unique across the collection. If duplicates exist, validation MUST emit a duplicate_id issue for each file that shares the duplicated value.


9.3 Validation Issue Format

Each validation issue MUST include:

Field Type Description
path string File path relative to collection root
field string Field path (e.g., author.email, tags[0])
code string Error code (see Appendix C)
message string Human-readable error description
severity enum error or warning

Optional fields:

Field Type Description
expected any Expected value or type
actual any Actual value found
type string Type name that triggered the issue
line integer 1-based line number in the source file
column integer 1-based column number
end_line integer End line of the issue range
end_column integer End column of the issue range

Implementations SHOULD include line and column fields when source position information is available. These fields enable LSP-style diagnostics and precise issue reporting in CI tooling.

Example Issue

{
  "path": "tasks/fix-bug.md",
  "field": "priority",
  "code": "constraint_violation",
  "message": "Value 7 exceeds maximum of 5",
  "severity": "error",
  "expected": { "max": 5 },
  "actual": 7,
  "type": "task",
  "line": 5,
  "column": 11,
  "end_line": 5,
  "end_column": 12
}

9.4 Validation Timing

Implementations MAY validate at different times:

When Description
On read Validate when loading a file
On write Validate before creating or updating
On demand Validate via explicit command
Continuous Watch mode; validate on file changes

The specification does not mandate when validation occurs, only the behavior when it does.

  • Create/Update operations: Validate before writing; fail if the validation level is error
  • Read/Query operations: Optionally validate; report issues but don't fail
  • Explicit validate command: Full collection validation with detailed report

9.5 Validation Commands

Implementations SHOULD provide explicit validation commands:

# Validate entire collection
mdbase validate

# Validate specific files
mdbase validate tasks/fix-bug.md notes/meeting.md

# Validate files of a specific type
mdbase validate --type task

# Validate with specific level
mdbase validate --level error

# Output validation report as JSON
mdbase validate --format json

9.6 Partial Validation

For large collections, implementations MAY support partial validation:

  • Validate only modified files (since last validation)
  • Validate only files in specific folders
  • Validate only files matching certain types

This is an optimization; full validation MUST remain available.


9.7 Validation Report Example

Human-readable format:

Validation Report
================

Errors: 3
Warnings: 3

tasks/fix-bug.md
  ERROR [missing_required] Field 'title' is required but missing
  ERROR [type_mismatch] Field 'priority': expected integer, got string "high"
  WARNING [unknown_field] Field 'custom' is not defined in type 'task'

notes/meeting.md
  ERROR [constraint_violation] Field 'attendees': minimum 1 item required, got 0
  WARNING [deprecated_field] Field 'old_field' is deprecated

tasks/subtask.md
  WARNING [link_not_found] Field 'parent': target '[[nonexistent]]' not found

JSON format:

{
  "summary": {
    "files_checked": 42,
    "files_valid": 39,
    "files_invalid": 3,
    "errors": 3,
    "warnings": 5
  },
  "issues": [
    {
      "path": "tasks/fix-bug.md",
      "field": "title",
      "code": "missing_required",
      "message": "Field 'title' is required but missing",
      "severity": "error",
      "type": "task"
    }
  ]
}

9.8 Auto-Fix (Optional)

Implementations MAY support automatic fixing of certain issues:

Issue Auto-Fix
Missing field with default Apply default value
Type coercion possible Coerce value
Missing generated field Generate value

Auto-fix MUST NOT:

  • Delete user data
  • Make changes that could lose information
  • Fix issues where the correct resolution is ambiguous

# Preview fixes
mdbase validate --fix --dry-run

# Apply fixes
mdbase validate --fix

9.9 Validation in Multi-Type Context

When a file matches multiple types, validation follows these rules:

  1. All types validated: The file must pass validation for ALL matched types
  2. Issues attributed: Each issue includes which type triggered it
  3. Conflict detection: If types have incompatible field definitions, report as error

Example conflict:

# Type 'a' defines: status as string
# Type 'b' defines: status as enum [open, closed]

# File matches both types
# File has: status: "pending"

# Result:
# - Passes type 'a' validation (valid string)
# - Fails type 'b' validation ("pending" not in enum)
# - Overall: FAIL (must pass all types)
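
The three rules can be sketched as a loop over per-type validators. This is an illustration only: the validators mapping, the check_a and check_b functions, and the issue codes chosen for them are hypothetical stand-ins for real schema validation.

```python
def validate_multi_type(frontmatter, validators):
    """Validate against every matched type and attribute each issue to its type.

    validators: dict mapping type name -> function(frontmatter) -> list of issues.
    Returns (passed, issues); the file passes only if no type reports an error.
    """
    issues = []
    for type_name, validate in validators.items():
        for issue in validate(frontmatter):
            issues.append({**issue, "type": type_name})
    passed = not any(i["severity"] == "error" for i in issues)
    return passed, issues


def check_a(frontmatter):
    # Type 'a': status must be a string
    if isinstance(frontmatter.get("status"), str):
        return []
    return [{"field": "status", "code": "type_mismatch", "severity": "error"}]


def check_b(frontmatter):
    # Type 'b': status must be one of the enum values
    if frontmatter.get("status") in ("open", "closed"):
        return []
    return [{"field": "status", "code": "constraint_violation", "severity": "error"}]
```

Calling validate_multi_type({"status": "pending"}, {"a": check_a, "b": check_b}) reproduces the example above: type 'a' reports nothing, type 'b' reports one error, and the file fails overall.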

9.10 Skipping Validation

Certain scenarios may warrant skipping validation:

  • Migration: Importing data that doesn't yet conform
  • Bulk operations: Performance-critical batch updates
  • Emergency fixes: Bypassing validation to fix broken state

Implementations SHOULD support:

# Skip validation on create
mdbase create --no-validate task.md

# Skip validation on update
mdbase update --no-validate task.md

Skipping validation SHOULD be logged for audit purposes.

10. Query Model

Queries retrieve files from the collection based on filters, with support for sorting, pagination, and computed fields. This section defines the query structure and semantics.


10.1 Query Overview

A query is a request to retrieve files matching certain criteria. Queries can:

  • Filter by type
  • Filter by frontmatter field values
  • Filter by file metadata
  • Filter by path patterns
  • Sort results
  • Paginate results
  • Compute derived fields (formulas)

Queries operate on the collection as a flat list of files. The result is a list of file records matching the criteria.


10.2 Core Query Structure

A query is expressed as a YAML object with optional clauses:

query:
  # Filter by type(s) - optional
  types: [task]
  
  # Filter by folder prefix - optional
  folder: "projects/alpha"
  
  # Filter expressions - optional
  where:
    and:
      - 'status != "done"'
      - "priority >= 3"
  
  # Sorting - optional
  order_by:
    - field: due_date
      direction: asc
    - field: priority
      direction: desc

  # Pagination - optional
  limit: 20
  offset: 0

Core Query checklist: types, folder, where, order_by, limit, offset, include_body

Core vs Query+ Summary

Clause Core Query+
types yes yes
folder yes yes
where yes yes
order_by yes yes
limit / offset yes yes
include_body yes yes
formulas no yes
groupBy no yes
summaries no yes
property_summaries no yes
properties no yes

10.3 Core Query Clauses

`types`

Filter to files matching specified type(s):

# Single type
types: [task]

# Multiple types (OR)
types: [task, note]

Files must match at least one of the listed types.

`folder`

Filter to files within a folder (and subfolders):

folder: "projects/alpha"

Matches files with paths starting with projects/alpha/.
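
Matching on the prefix with a trailing slash matters: a plain string-prefix test would also match sibling folders such as projects/alpha-2. A minimal sketch, with in_folder as a hypothetical helper name:

```python
def in_folder(path, folder):
    """True if path lies inside folder or one of its subfolders.

    Both arguments are collection-relative paths using forward slashes.
    """
    prefix = folder.rstrip("/") + "/"
    return path.startswith(prefix)
```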

`where`

Filter by expression conditions. Can be:

A single expression string:

where: 'status == "open"'

A logical combination:

where:
  and:
    - 'status == "open"'
    - "priority >= 3"
    - or:
        - 'tags.contains("urgent")'
        - "due_date < today()"
    - not: "draft == true"

Shape rules (Obsidian Bases-compatible):

  • A where value MAY be a string expression.
  • A where value MAY be a logical object with one of the keys and, or, not.
  • and/or values are lists of conditions (each condition is either a string expression or another logical object).
  • The not value is a single condition (string expression or logical object).

See Expressions for the full expression language.
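
The shape rules translate into a short recursive evaluator. The sketch below is an assumption-laden illustration: eval_where is a hypothetical name, and eval_expr stands in for whatever function an implementation uses to evaluate a single expression string against a file.

```python
def eval_where(cond, eval_expr):
    """Evaluate a where value: a string expression or an and/or/not object."""
    if isinstance(cond, str):
        return bool(eval_expr(cond))
    if isinstance(cond, dict) and len(cond) == 1:
        (key, value), = cond.items()
        if key == "and" and isinstance(value, list):
            return all(eval_where(c, eval_expr) for c in value)
        if key == "or" and isinstance(value, list):
            return any(eval_where(c, eval_expr) for c in value)
        if key == "not":
            # not takes exactly one condition; a list falls through to the error
            return not eval_where(value, eval_expr)
    raise ValueError(f"invalid where condition: {cond!r}")
```

Because all() and any() stop at the first decisive result, nested conditions are short-circuited without extra code.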

`order_by`

Sort results by one or more fields:

order_by:
  - field: due_date
    direction: asc      # ascending (oldest first)
  - field: priority
    direction: desc     # descending (highest first)

Direction values:

  • asc: Ascending (A-Z, 1-9, oldest-newest)
  • desc: Descending (Z-A, 9-1, newest-oldest)

Null handling: Null values sort last in ascending order and first in descending order.

Tie-breakers: If all order_by fields compare equal, implementations MUST apply a stable tie-breaker by ascending file.path to ensure deterministic output.

Formula sorting:

order_by:
  - field: formula.urgency_score
    direction: desc

String Collation

Default string ordering uses Unicode code point order (lexicographic comparison of Unicode scalar values):

  • Comparison is case-sensitive by default: uppercase letters sort before lowercase ("A" < "a")
  • Null values sort LAST in ascending order and FIRST in descending order
  • Implementations MAY support locale-aware collation as an ext-prefixed extension
  • For enum fields, sort order follows the values list declaration order, not string order
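
The ordering rules (null placement, enum declaration order, and the file.path tie-breaker) can be combined using a sequence of stable sorts. This sketch assumes records are plain dicts with a "path" key and that non-null values of one field are mutually comparable; sort_records and enum_orders are hypothetical names, and placing unknown enum values after declared ones is an assumption.

```python
def sort_records(records, order_by, enum_orders=None):
    """Sort records by order_by clauses with a stable ascending-path tie-breaker.

    order_by: list of {"field": name, "direction": "asc" | "desc"}.
    enum_orders: optional {field: [values in declaration order]}.
    """
    enum_orders = enum_orders or {}

    def key_for(field):
        ranks = {v: i for i, v in enumerate(enum_orders.get(field, []))}

        def key(record):
            value = record.get(field)
            if value is None:
                return (1, 0)  # nulls rank after every non-null value
            if ranks:
                return (0, ranks.get(value, len(ranks)))
            return (0, value)

        return key

    # Tie-breaker first: later stable sorts preserve this order among equals.
    result = sorted(records, key=lambda r: r["path"])
    # Apply clauses from least to most significant.
    for clause in reversed(order_by):
        descending = clause["direction"] == "desc"
        result = sorted(result, key=key_for(clause["field"]), reverse=descending)
    return result
```

Python's sorted() keeps equal elements in their original order even with reverse=True, so descending clauses still fall back to ascending path, and reversing the key order is what moves nulls to the front.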

`limit` and `offset`

Paginate results:

limit: 20    # Return at most 20 results
offset: 40   # Skip the first 40 results

Together these enable pagination: page 3 with 20 items per page corresponds to offset: 40, limit: 20.


10.4 Logical Operators in `where`

The where clause supports nested logical operators:

Operator YAML Key Description
AND and: All conditions must be true
OR or: At least one condition must be true
NOT not: Condition must be false

Examples:

# AND: all must match
where:
  and:
    - 'status == "open"'
    - "priority >= 3"

# OR: any must match
where:
  or:
    - 'status == "blocked"'
    - "due_date < today()"

# NOT: must not match
where:
  not: 'status == "done"'

# Nested logic
where:
  and:
    - 'status != "done"'
    - or:
        - "priority >= 4"
        - 'tags.contains("urgent")'

Alternatively, use expression operators directly:

where: 'status != "done" && (priority >= 4 || tags.contains("urgent"))'

10.5 Property Namespaces

In query expressions, properties are accessed through namespaces:

Namespace Description Example
(bare) Effective frontmatter property (defaults applied, computed excluded) status, priority
note. Raw persisted frontmatter (for reserved names) note.type, note["my-field"]
file. File metadata file.name, file.mtime
formula. Computed fields formula.overdue
this Context file (for embedded queries) this.file.name

File Properties

Property Type Description
file.name string Filename with extension (e.g., "task-001.md")
file.basename string Filename without final extension (e.g., "task-001"; for "file.draft.md" this is "file.draft")
file.path string Full path from collection root
file.folder string Parent folder path
file.ext string File extension without dot (e.g., "md")
file.size number File size in bytes
file.ctime datetime Created time
file.mtime datetime Modified time
file.links list Outgoing links (including links to non-markdown files)
file.backlinks list Incoming links (requires index); MAY be null/empty if backlinks are unsupported
file.tags list All tags (raw frontmatter tags + inline #tags, including nested)
file.properties object Raw persisted frontmatter properties only (no computed fields, no applied defaults). Equivalent to the note. namespace
file.embeds list All embed links in the file body
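
The path-derived properties can be computed without touching the file system. A minimal sketch using Python's pathlib; the function name file_properties and the use of an empty string as the folder of a root-level file are assumptions.

```python
from pathlib import PurePosixPath


def file_properties(rel_path):
    """Derive name, basename, path, folder, and ext from a collection-relative path."""
    p = PurePosixPath(rel_path)
    parent = str(p.parent)
    return {
        "name": p.name,                      # filename with extension
        "basename": p.stem,                  # strips only the final extension
        "path": str(p),
        "folder": "" if parent == "." else parent,  # assumption: "" at the root
        "ext": p.suffix.lstrip("."),         # extension without the dot
    }
```

For "file.draft.md" this yields basename "file.draft" and ext "md", matching the table above.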

Body Content Properties

The file.body property provides access to the raw markdown body content (everything after the frontmatter closing ---):

# Find files that mention a keyword in their body
query:
  where: 'file.body.contains("TODO")'

# Case-insensitive body search
query:
  where: 'file.body.lower().contains("important")'

# Regex body search
query:
  where: 'file.body.matches("\\bAPI\\b")'

Rules:

  • file.body is a string and supports all string methods from §11.5: .contains(), .matches(), .lower(), .startsWith(), etc.
  • Body search operates on raw markdown text including syntax characters
  • Content inside fenced code blocks IS included in file.body (it is the raw text)
  • Implementations SHOULD support file.body in filters without requiring include_body: true in the query — the body is used for filtering, not necessarily returned in results
  • Performance note: Body search without caching requires reading every file. Implementations SHOULD use full-text indexes when available
  • Note: file.body includes content inside code blocks, but file.links and file.tags exclude links and tags inside code blocks (see §8). This means file.body.contains("[[foo]]") may match a link that does not appear in file.links

The `this` Context

In embedded queries (queries within a file), this refers to the containing file:

# Find files linking to current file
where: "file.hasLink(this.file)"

# Find tasks assigned to current file's author
where: "assignee == this.author"

10.6 Result Structure

Query results return file objects with this structure. The frontmatter field is the effective frontmatter (defaults applied, computed fields excluded). Raw persisted values are available via file.properties or the note. namespace.

- path: "tasks/fix-bug.md"
  types: [task, urgent]
  frontmatter:  # Effective frontmatter (defaults applied, computed excluded)
    id: "task-001"
    title: "Fix the login bug"
    status: open
    priority: 4
    tags: [bug, auth]
  formulas:
    overdue: true
    days_until_due: -3
  file:
    name: "fix-bug.md"
    folder: "tasks"
    mtime: "2024-03-15T10:30:00Z"
    size: 1234
  body: "..."  # Optional, if requested

Result Envelope

Query results MUST include metadata alongside the result list:

results:
  - path: "tasks/fix-bug.md"
    types: [task, urgent]
    frontmatter:
      id: "task-001"
      title: "Fix the login bug"
      # ...
meta:
  total_count: 142    # Total matching records (before limit/offset)
  limit: 20
  offset: 0
  has_more: true      # Whether more results exist beyond this page

Fields:

  • total_count: The total number of records matching the query filters, ignoring limit and offset. Implementations MUST compute this accurately
  • has_more: true if offset + length(results) < total_count
  • When no limit is specified, every matching record after offset is returned, so has_more is false; with offset: 0, total_count equals the result count
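
The envelope fields follow directly from the filtered list. In this sketch, paginate is a hypothetical helper that receives the full list of matches after filtering and sorting.

```python
def paginate(matches, limit=None, offset=0):
    """Slice the matches and build the result envelope."""
    total = len(matches)
    if limit is None:
        page = matches[offset:]
    else:
        page = matches[offset:offset + limit]
    return {
        "results": page,
        "meta": {
            "total_count": total,  # counted before limit/offset are applied
            "limit": limit,
            "offset": offset,
            "has_more": offset + len(page) < total,
        },
    }
```

An offset beyond the end of the list yields an empty page with has_more set to false, since no further records exist.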

Including Body Content

By default, body content is not included in results. To include it:

query:
  include_body: true

This increases memory usage for large result sets.


10.7 Query+ (Optional Advanced Features)

The following clauses are OPTIONAL and are part of the Query+ profile. Implementations are not required to support Query+ to claim conformance at Level 3.

`formulas`

Define computed fields evaluated for each result:

formulas:
  overdue: "due_date < today() && status != 'done'"
  days_until_due: "(due_date - today()) / 86400000"
  display_priority: 'if(priority >= 4, "🔴", if(priority >= 2, "🟡", "🟢"))'

Formulas are accessible via the formula. namespace in subsequent expressions and in results.

`groupBy`

Group results by a property value. Each unique value creates a group:

groupBy:
  property: status
  direction: ASC    # ASC or DESC

  • Only one groupBy property is supported per query.
  • direction controls the sort order of groups: ASC (default) or DESC.
  • Results within each group follow the order_by sort.
  • Ungrouped results (null/missing group value) appear in a separate group.
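
A sketch of the grouping step, assuming results arrive already sorted by order_by and carry a "frontmatter" dict as in §10.6. The name group_results is hypothetical, and placing the null/missing group last is an assumption; this section does not fix its position.

```python
def group_results(results, prop, direction="ASC"):
    """Group sorted results by a property; returns a list of (key, records) pairs."""
    groups = {}
    for record in results:
        key = record["frontmatter"].get(prop)  # missing and null both map to None
        groups.setdefault(key, []).append(record)  # preserves order_by within a group

    ungrouped = groups.pop(None, None)
    keys = sorted(groups, reverse=(direction == "DESC"))
    ordered = [(key, groups[key]) for key in keys]
    if ungrouped is not None:
        ordered.append((None, ungrouped))  # assumption: null group goes last
    return ordered
```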

`summaries`

Define custom summary formulas. In summary expressions, the values keyword represents all values for the associated property across the result set:

summaries:
  custom_avg: "values.reduce(acc + value, 0) / values.length"
  rounded_mean: "values.reduce(acc + value, 0) / values.length"

Summary semantics:

  • values is an ordered list matching the result order (or group order when groupBy is used).
  • Missing properties contribute null values to values.
  • Implementations SHOULD preserve null values in values for custom summaries.
  • Built-in summaries SHOULD ignore null/empty values unless otherwise specified (e.g., Empty, Filled).

See Expressions §11.14 for default summary functions.

`property_summaries`

Assign summary functions to specific properties. These calculate an aggregate value across all records (or per group when groupBy is used):

property_summaries:
  priority: Average
  estimate_hours: Sum
  due_date: Earliest
  formula.overdue: Checked

Values reference either default summary names (see Expressions §11.14) or custom summaries defined in the summaries section.

When groupBy is present, property summaries are computed per group.

`properties`

Display configuration for properties. It does not affect query logic; it is used by view renderers:

properties:
  status:
    displayName: "Current Status"
  formula.overdue:
    displayName: "Overdue?"
  file.ext:
    displayName: "Extension"

Display names are not used in filters or formulas.


10.8 Query Examples

Core Examples

All Open Tasks

query:
  types: [task]
  where: 'status == "open"'

High Priority Tasks Due This Week

query:
  types: [task]
  where:
    and:
      - "priority >= 4"
      - "due_date <= today() + '7d'"
      - 'status != "done"'

Files Modified Today

query:
  where: "file.mtime > today()"

Tasks Tagged Urgent or Blocker

query:
  types: [task]
  where: 'tags.containsAny("urgent", "blocker")'

Tasks Assigned to Engineering Team Members

query:
  types: [task]
  where: 'assignee.asFile().team == "engineering"'

Notes Linking to a Specific Task

query:
  types: [note]
  where: 'file.hasLink(link("tasks/task-001"))'

Files Linking to the Current File (Embedded Query)

query:
  where: "file.hasLink(this.file)"

Files Matching Multiple Types

query:
  where:
    and:
      - 'types.contains("actionable")'
      - 'types.contains("urgent")'

Query+ Examples

Overdue Tasks Sorted by Priority

query:
  types: [task]
  where:
    and:
      - "formula.is_overdue == true"
      - 'status != "blocked"'
  formulas:
    is_overdue: "due_date < today() && status != 'done'"
    urgency_score: "priority + if(due_date < today() - '7d', 5, 0)"
  order_by:
    - field: formula.urgency_score
      direction: desc
  limit: 10

Tasks Grouped by Status (Query+)

query:
  types: [task]
  where: 'status != "cancelled"'
  groupBy:
    property: status
    direction: ASC
  property_summaries:
    priority: Average
    estimate_hours: Sum
  order_by:
    - field: priority
      direction: desc

Untyped Files

query:
  where: "types.length == 0"

Paginated Results

# Page 1
query:
  types: [task]
  order_by:
    - field: created_at
      direction: desc
  limit: 20
  offset: 0

# Page 2
query:
  types: [task]
  order_by:
    - field: created_at
      direction: desc
  limit: 20
  offset: 20

10.9 Query Optimization

Implementations SHOULD optimize queries where possible:

  • Index usage: Use indexes for common filters (type, path prefix)
  • Short-circuit evaluation: Stop evaluating OR clauses on first match
  • Lazy loading: Don't parse body content unless requested
  • Caching: Cache query results for repeated queries

Complex queries (link traversal, formulas) may require full scans. Implementations SHOULD document performance characteristics.


10.10 Query API Considerations

Implementations exposing queries via API SHOULD support:

Programmatic access:

const results = await collection.query({
  types: ['task'],
  where: 'status == "open"',
  orderBy: [{ field: 'priority', direction: 'desc' }],
  limit: 10
});

CLI access:

mdbase query --type task --where 'status == "open"' --limit 10

The exact API surface is implementation-dependent.

11. Expression Language

Expressions are strings that evaluate to values. They are used in query filters, match conditions, and computed formulas. This section defines the expression syntax and available functions.


11.1 Expression Context

Expressions are evaluated in a context that provides:

  • Frontmatter fields: Direct access via bare names (effective values: defaults applied, computed excluded)
  • Raw frontmatter: Via the note. namespace (equivalent to file.properties)
  • File metadata: Via file. prefix
  • Formula values: Via formula. prefix
  • Context reference: Via this (in embedded queries)
  • Built-in functions: Date functions, type checks, etc.

11.2 Literals

Strings

"hello world"    // Double quotes
'hello world'    // Single quotes
"line 1\nline 2" // Escape sequences supported

Numbers

123       // Integer
45.67     // Decimal
-10       // Negative
1e6       // Scientific notation

Booleans

true
false

Null

null

Lists (in expressions)

["a", "b", "c"]
[1, 2, 3]

11.3 Property Access

Frontmatter Fields

status              // Direct access
priority            // Direct access
author.name         // Nested object
tags[0]             // List index (0-based)

Bracket Notation

For fields with special characters:

note["field-with-dashes"]
note["field.with.dots"]

File Metadata

file.name           // "task-001.md"
file.path           // "tasks/task-001.md"
file.folder         // "tasks"
file.ext            // "md"
file.size           // 1234 (bytes)
file.ctime          // Created datetime
file.mtime          // Modified datetime
file.body           // Raw markdown body content (string)

Formulas

formula.overdue
formula.urgency_score

Context (this)

this.file.name      // Current file's name
this.author         // Current file's author field

11.4 Operators

Comparison Operators

Operator Description Example
== Equal status == "open"
!= Not equal status != "done"
> Greater than priority > 3
< Less than priority < 3
>= Greater or equal priority >= 3
<= Less or equal priority <= 3

Arithmetic Operators

Operator Description Example
+ Addition priority + 1
- Subtraction total - discount
* Multiplication count * 2
/ Division total / count
% Modulo index % 2
( ) Grouping (a + b) * c

Boolean Operators

Operator Description Example
&& Logical AND a && b
|| Logical OR a || b
! Logical NOT !done

Null Coalescing

value ?? default    // Returns default if value is null

11.5 String Methods

Method Description Example
.length String length (field) title.length
.contains(str) Contains substring title.contains("bug")
.containsAll(...strs) Contains all substrings title.containsAll("bug", "fix")
.containsAny(...strs) Contains any substring title.containsAny("bug", "fix")
.startsWith(str) Starts with prefix title.startsWith("WIP:")
.endsWith(str) Ends with suffix file.name.endsWith(".draft.md")
.isEmpty() Empty or absent title.isEmpty()
.lower() Convert to lowercase status.lower()
.upper() Convert to uppercase status.upper()
.title() Title case name.title()
.trim() Remove whitespace title.trim()
.slice(start, end?) Extract substring id.slice(0, 4)
.split(sep, n?) Split to list tags_str.split(",")
.replace(pattern, repl) Replace pattern title.replace("old", "new")
.repeat(count) Repeat string "-".repeat(3)
.reverse() Reverse string name.reverse()
.matches(regex) Regex match (see §4.8 for regex flavor) title.matches("^TASK-\\d+")

11.6 List Methods

Method Description Example
.length List length (field) tags.length
.contains(value) Contains element tags.contains("urgent")
.containsAll(...values) Contains all elements tags.containsAll("a", "b")
.containsAny(...values) Contains any element tags.containsAny("a", "b")
.isEmpty() List is empty tags.isEmpty()
[index] Element at index tags[0]
.filter(expr) Filter elements items.filter(value > 2)
.map(expr) Transform elements tags.map(value.lower())
.reduce(expr, init) Reduce to single value nums.reduce(acc + value, 0)
.flat() Flatten nested lists nested.flat()
.reverse() Reverse element order items.reverse()
.slice(start, end?) Extract portion items.slice(0, 3)
.sort() Sort ascending tags.sort()
.unique() Remove duplicates tags.unique()
.join(sep) Join to string tags.join(", ")

In filter(), map(), and reduce(), the implicit variables value and index refer to the current element and its position. For reduce(), acc is the accumulator.

containsAll() and containsAny() are variadic; passing a list literal counts as a single value and does not auto-expand.


11.7 Date/Time Functions

Current Date/Time

Function Returns Description
now() datetime Current date and time
today() date Current date (no time)

Timezone semantics:

  • now() and today() use the implementation's local timezone unless otherwise configured.
  • Date-only values (date type) are interpreted in the local timezone for comparisons.
  • Datetime values with explicit offsets MUST be compared in absolute time.

Parsing

Function Description Example
date(string) Parse date date("2024-03-15")
datetime(string) Parse datetime datetime("2024-03-15T10:30:00Z")

Date Components

Method Returns Description
.year integer Year component
.month integer Month (1-12)
.day integer Day of month
.hour integer Hour (0-23)
.minute integer Minute (0-59)
.second integer Second (0-59)
.dayOfWeek integer Day of week (0=Sunday)
.date() date Date portion only
.time() time Time portion only

Date Formatting

due_date.format("YYYY-MM-DD")
created_at.format("MMM D, YYYY")

Common format tokens:

  • YYYY: 4-digit year
  • MM: 2-digit month
  • DD: 2-digit day
  • HH: 2-digit hour (24h)
  • mm: 2-digit minute
  • ss: 2-digit second

11.8 Date Arithmetic

Dates support arithmetic with duration strings:

due_date + "7d"           // Add 7 days
now() - "1w"              // Subtract 1 week
file.mtime > now() - "24h"  // Modified in last 24 hours

Duration units:

Unit Aliases
y year, years
M month, months
w week, weeks
d day, days
h hour, hours
m minute, minutes
s second, seconds

Duration string format:

Each duration string contains a single number-unit pair. Whitespace between the number and unit is allowed ("7d" and "7 days" are equivalent). Compound durations in a single string (e.g., "1d12h") are NOT supported — chain additions instead:

date + "1M" + "4h" + "3m"  // Add 1 month, 4 hours, 3 minutes
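
A duration parser following these rules can be a single anchored regular expression plus an alias table. This is a sketch: parse_duration and the normalized unit names are hypothetical, and accepting decimal numbers is an assumption. Note that unit matching is case-sensitive because M (month) and m (minute) differ only in case.

```python
import re

_UNITS = {
    "y": "years", "year": "years", "years": "years",
    "M": "months", "month": "months", "months": "months",
    "w": "weeks", "week": "weeks", "weeks": "weeks",
    "d": "days", "day": "days", "days": "days",
    "h": "hours", "hour": "hours", "hours": "hours",
    "m": "minutes", "minute": "minutes", "minutes": "minutes",
    "s": "seconds", "second": "seconds", "seconds": "seconds",
}

# Exactly one number-unit pair, optional whitespace between number and unit.
_DURATION = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([A-Za-z]+)\s*$")


def parse_duration(text):
    """Parse a duration string into (amount, normalized unit)."""
    match = _DURATION.match(text)
    if match is None:
        raise ValueError(f"invalid duration: {text!r}")
    number, unit = match.groups()
    if unit not in _UNITS:
        raise ValueError(f"unknown duration unit: {unit!r}")
    return float(number), _UNITS[unit]
```

A compound string such as "1d12h" fails the anchored match, which is the behavior described above.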

Calendar arithmetic: Adding months or years clamps to the last day of the target month. For example, date("2024-01-31") + "1M" returns 2024-02-29 (2024 is a leap year), not 2024-03-02.
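
The clamping rule can be sketched with the standard library. The helper name add_months is hypothetical; adding years is the same operation with months multiplied by 12.

```python
import calendar
from datetime import date


def add_months(d, months):
    """Add calendar months to a date, clamping to the last day of the target month."""
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]  # number of days in target month
    return d.replace(year=year, month=month, day=min(d.day, last_day))
```

Negative values for months work as well, because divmod floors toward negative infinity and keeps month0 in the range 0 to 11.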

Examples:

today() + "30d"           // 30 days from now
due_date - "2w"           // 2 weeks before due date
created_at + "1y"         // 1 year after creation

Date comparison:

due_date < today()                    // Overdue
due_date < today() + "7d"             // Due within a week
file.mtime > now() - "1h"             // Modified in last hour

Date subtraction:

Subtracting two dates returns the difference in milliseconds:

now() - file.ctime                    // Milliseconds since creation
(today() - due_date) / 86400000       // Days overdue (negative if not yet due)
(now() + "1d") - now()                // Returns 86400000

Duration function:

The duration() function explicitly parses a duration string. This is needed when performing arithmetic on durations themselves:

now() + (duration("1d") * 2)          // 2 days from now
duration("5h") * 3                    // Duration must be on the left

11.9 Conditional Expression

if(condition, then_value, else_value)

Examples:

if(priority > 3, "high", "normal")
if(status == "done", "✓", "○")
if(due_date < today(), "overdue", if(due_date < today() + "7d", "soon", "ok"))

11.10 Null Handling

Check Existence

exists(field)    // true if field key is present (including null values)
field.isEmpty()  // true if field is null, empty, or absent

Note: exists() checks for key presence in raw persisted frontmatter. A field with value null exists but is empty. Use isEmpty() to check if a field has a meaningful value.

Provide Default

default(field, value)  // Return value if field is null or missing
field ?? value         // Null coalescing operator

Examples:

exists(due_date)                    // Has a due date?
default(priority, 3)                // Default priority to 3
assignee ?? "unassigned"            // Default to "unassigned"

Missing vs null: In expressions, missing properties are treated like null for default() and ??. Use exists(field) to distinguish missing from present-null.
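
The difference between the three checks is easiest to see side by side. In this sketch, raw is a dict of raw persisted frontmatter, the function signatures are hypothetical, and treating the empty string, empty list, and empty object as "empty" is an assumption.

```python
def exists(raw, key):
    """True if the key is present, even when its value is null."""
    return key in raw


def default(raw, key, fallback):
    """Return fallback when the key is missing or its value is null."""
    value = raw.get(key)
    return fallback if value is None else value


def is_empty(raw, key):
    """True if the value is null, absent, or an empty string/list/object."""
    value = raw.get(key)
    return value is None or value == "" or value == [] or value == {}
```

With raw = {"due_date": None, "priority": 0}, exists(raw, "due_date") is true while is_empty(raw, "due_date") is also true; default(raw, "priority", 3) keeps 0 because zero is a real value, not null.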

11.11 Type Checking and Conversion

Type Checking

value.isType("string")     // true if value is a string
value.isType("number")     // true if value is a number
value.isType("boolean")    // true if value is a boolean
value.isType("date")       // true if value is a date
value.isType("list")       // true if value is a list
value.isType("object")     // true if value is an object

Type Conversion

value.toString()           // Convert any value to string
number("3.14")             // Parse string to number
number(true)               // Returns 1 (false returns 0)
number(date_value)         // Milliseconds since epoch
value.isTruthy()           // Coerce to boolean
list(value)                // Wrap in list if not already a list

11.12 Link and File Functions

Function Description Example
link.asFile() Resolve link to file parent.asFile().status
link(path) Construct link link("tasks/task-001")
file.hasLink(target) File links to target file.hasLink(link("api-docs"))
file.hasTag(...tags) File has any of the given tags; uses prefix matching for nested tags (see §8) file.hasTag("important")
file.hasProperty(name) Raw persisted frontmatter has the key file.hasProperty("status")
file.inFolder(path) File is in folder (or subfolder) file.inFolder("archive")
file.asLink(display?) Convert file to link file.asLink("display text")

11.13 Object Methods

Method Description Example
.isEmpty() Has no properties metadata.isEmpty()
.keys() List of property names metadata.keys()
.values() List of property values metadata.values()

11.14 Summary Functions

Summary functions operate on a collection of values across all matching records. They are used in the summaries section of a query (see Querying).

In summary formulas, the values keyword represents all values for a given property across the result set. The formula MUST return a single value.

Summary value semantics:

  • values is ordered to match the query result order (or group order when grouped).
  • Missing properties contribute null values to values.
  • Custom summaries receive values with null entries intact.
  • Built-in summaries SHOULD ignore null/empty values unless the function is explicitly about emptiness (e.g., Empty, Filled).

values.reduce(acc + value, 0)         // Sum
values.reduce(acc + value, 0) / values.length  // Average
values.filter(value.isTruthy()).length // Count of truthy values

Default Summary Functions

Implementations SHOULD provide these built-in summary functions:

Name Input Type Description
Average Number Mean of all numeric values
Min Number Smallest number
Max Number Largest number
Sum Number Sum of all numbers
Range Number Difference between Max and Min
Median Number Median value
Earliest Date Earliest date
Latest Date Latest date
Checked Boolean Count of true values
Unchecked Boolean Count of false values
Empty Any Count of empty/null values
Filled Any Count of non-empty values
Unique Any Count of unique values
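
A sketch of the built-in summaries with null/empty values ignored, except for Empty and Filled, which count them. The names summarize and is_empty are hypothetical; the definition of "empty" (null, empty string, empty list) and returning null for an aggregate over no values are assumptions.

```python
from statistics import mean, median


def is_empty(value):
    # Assumption: null, empty string, and empty list count as empty.
    return value is None or value == "" or value == []


_AGGREGATES = {
    "Average": mean,
    "Min": min,
    "Max": max,
    "Sum": sum,
    "Range": lambda vs: max(vs) - min(vs),
    "Median": median,
    "Earliest": min,
    "Latest": max,
    "Checked": lambda vs: sum(1 for v in vs if v is True),
    "Unchecked": lambda vs: sum(1 for v in vs if v is False),
    "Unique": lambda vs: len(set(vs)),  # assumes hashable (scalar) values
}

_COUNTS = ("Sum", "Checked", "Unchecked", "Unique")


def summarize(name, values):
    """Apply a built-in summary function to a list of property values."""
    if name == "Empty":
        return sum(1 for v in values if is_empty(v))
    if name == "Filled":
        return sum(1 for v in values if not is_empty(v))
    present = [v for v in values if not is_empty(v)]
    if not present and name not in _COUNTS:
        return None  # assumption: nothing to report when no values remain
    return _AGGREGATES[name](present)
```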

11.15 Operator Precedence

From highest to lowest:

  1. ( ) - Grouping
  2. . [] - Property access
  3. ! - (unary) - Negation
  4. * / % - Multiplication
  5. + - - Addition
  6. < <= > >= - Comparison
  7. == != - Equality
  8. && - Logical AND
  9. || - Logical OR
  10. ?? - Null coalescing

Use parentheses to clarify complex expressions.


11.16 Lambda Expressions

List methods like filter(), map(), and reduce() use implicit variables rather than arrow function syntax:

// value refers to the current element, index to its position
items.filter(value > 2)
tags.map(value.lower())
items.map(value.toString() + " (" + index.toString() + ")")

// reduce also provides acc (accumulator)
numbers.reduce(acc + value, 0)

Implementations MAY also support arrow function syntax as an extension:

tags.map(t => t.lower())
tasks.filter(t => t.status != "done")

If arrow functions are supported, implementations SHOULD parse them only within function argument positions and treat => as part of the lambda expression itself (not as a general-purpose operator).


11.17 Expression Examples

Simple Filters

status == "open"
priority >= 4
tags.contains("urgent")

Combined Conditions

status == "open" && priority >= 4
status == "blocked" || due_date < today()
!(status == "done")

Date Filters

due_date < today()
due_date < today() + "7d"
file.mtime > now() - "24h"
created_at.year == 2024

String Filters

title.contains("bug")
title.lower().contains("urgent")
file.name.startsWith("draft-")
id.matches("^TASK-\\d{4}$")

List Operations

tags.length > 0
tags.contains("important")
tags.containsAny("urgent", "critical")
assignees.filter(value.asFile().team == "eng").length > 0

Computed Fields (Formulas)

// Is overdue?
due_date < today() && status != "done"

// Days overdue (date subtraction returns milliseconds)
(today() - due_date) / 86400000

// Priority display
if(priority >= 4, "🔴 Critical", if(priority >= 2, "🟡 Normal", "🟢 Low"))

// Urgency score
priority * 10 + if(due_date < today(), 50, 0)

Link Traversal

parent.asFile().status == "done"
assignee.asFile().team == "engineering"
blocks.map(value.asFile().status).contains("blocked")

11.18 Error Handling

Expression evaluation errors MUST be handled gracefully and MUST NOT abort the overall query:

Error Behavior
Property access on null Returns null
Method call on null Returns null
Division by zero Returns null and emits type_error
Invalid regex Evaluation error (see §4.8 for regex flavor)
Type mismatch Returns null and emits type_error

Implementations SHOULD log evaluation errors and continue processing where possible.
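
One way to satisfy these rules is to have each operator return null and record an error instead of raising. The sketch below shows this for division and a filter loop; safe_divide, filter_records, and the errors list are hypothetical, and treating a null operand as a type mismatch and a null filter result as a non-match are assumptions.

```python
def safe_divide(left, right, errors):
    """Divide two numbers; on failure return None and record a type_error."""

    def is_number(value):
        return isinstance(value, (int, float)) and not isinstance(value, bool)

    if not (is_number(left) and is_number(right)):
        errors.append({"code": "type_error", "message": "operands of / must be numbers"})
        return None
    if right == 0:
        errors.append({"code": "type_error", "message": "division by zero"})
        return None
    return left / right


def filter_records(records, predicate):
    """Evaluate predicate(record, errors) for every record without aborting."""
    matched, errors = [], []
    for record in records:
        if predicate(record, errors):  # assumption: a null result does not match
            matched.append(record)
    return matched, errors
```

The caller receives both the matching records and the collected errors, so a single bad record degrades the result without failing the query.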


11.19 Expression Portability

Expressions using only spec-defined functions and operators are portable expressions. This section defines rules for maintaining portability across implementations.

Custom Functions

Implementations MAY define custom functions beyond those specified in this document. Custom functions MUST be namespaced with the ext prefix using either :: or . as a delimiter:

ext::myFunc(value)    // Double-colon delimiter
ext.myFunc(value)     // Dot delimiter

Both delimiter forms are equivalent. Implementations MUST accept either form for custom functions they define.

Rules

  1. Namespace requirement: Implementations MUST namespace all custom functions with the ext prefix. Unprefixed custom functions are not permitted.

  2. No shadowing: Implementations MUST NOT override or shadow built-in functions or operators defined in this specification.

  3. Non-portable warnings: Implementations SHOULD emit a warning when evaluating an expression that uses non-portable functions (i.e., ext-prefixed functions).

  4. Documentation: Type definitions and queries SHOULD note when they depend on non-portable expressions.

Example

# Portable expression — uses only spec-defined functions
filters: 'status == "open" && due_date < today()'

# Non-portable expression — uses a custom function
filters: 'ext::sentiment(title) > 0.5'

Implementations encountering an unknown ext-prefixed function MUST treat it as an evaluation error (see §11.18).
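
One way to implement the non-portable warning in rule 3 is a textual scan for ext-prefixed calls. The sketch below is a simplification under stated assumptions: it uses a regular expression rather than a real expression parser, so it would also match text inside string literals, and the function name non_portable_functions is hypothetical.

```python
import re

# Matches ext::name( or ext.name( ; assumes identifier-like function names.
EXT_CALL = re.compile(r"\bext(?:::|\.)([A-Za-z_][A-Za-z0-9_]*)\s*\(")


def non_portable_functions(expression: str) -> list:
    """Return the sorted, de-duplicated names of ext-prefixed functions used."""
    return sorted(set(EXT_CALL.findall(expression)))


def portability_warnings(expression: str) -> list:
    """Build one warning message per custom function found."""
    return [
        f"non-portable function ext::{name} used in expression"
        for name in non_portable_functions(expression)
    ]
```

A production implementation would more likely collect these names while walking the parsed expression tree, which avoids false positives from string literals.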

12. Operations

This section defines the behavior of Create, Read, Update, Delete, and Rename operations on collection files.


12.1 Create

Creates a new file in the collection.

Input

Parameter Required Description
type No Type name(s) for the file
frontmatter Yes Field values (may be partial)
body No Markdown body content
path No Target file path (may be derived)

Behavior

  1. Determine type(s): Use provided type, or infer from frontmatter if type/types key present

  2. Apply defaults: For each missing field with a default value, apply the default to the effective record used for validation and output

  3. Generate values: For fields with generated strategy:

    • ulid: Generate ULID
    • uuid: Generate UUID v4
    • now: Set to current datetime
    • {from, transform}: Derive from source field
  4. Validate: If validation level is not off:

    • Validate against all matched type schemas
    • If level is error and validation fails, abort
  5. Determine path:

    • If path provided, use it
    • If type has filename_pattern, derive from field values
    • Otherwise, require explicit path
  6. Check existence: If file already exists at path, abort with error

  7. Write file:

    • Serialize frontmatter to YAML
    • MUST include all explicitly provided fields and all generated fields
    • SHOULD omit fields that were filled solely by defaults, unless the caller explicitly requests default materialization
    • Combine with body
    • Write atomically (temp file + rename)
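
Step 3 (value generation) can be sketched in Python as follows. This is an illustration under assumptions, not a reference implementation: the ULID encoder follows the public ULID layout (48-bit millisecond timestamp, 80 random bits, Crockford base32), generated values are only filled in when the caller did not supply the field, and the {from, transform} strategy is left out. The names new_ulid and apply_generated are hypothetical.

```python
import os
import time
import uuid
from datetime import datetime, timezone

CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def new_ulid() -> str:
    """Encode a 48-bit ms timestamp plus 80 random bits as 26 base32 characters."""
    value = (int(time.time() * 1000) << 80) | int.from_bytes(os.urandom(10), "big")
    chars = []
    for _ in range(26):
        chars.append(CROCKFORD[value & 31])
        value >>= 5
    return "".join(reversed(chars))


def apply_generated(frontmatter: dict, field_defs: dict) -> dict:
    """Return a copy of frontmatter with missing generated fields filled in."""
    record = dict(frontmatter)
    for name, spec in field_defs.items():
        strategy = spec.get("generated")
        if strategy is None or name in record:
            continue  # assumption: caller-provided values win
        if strategy == "ulid":
            record[name] = new_ulid()
        elif strategy == "uuid":
            record[name] = str(uuid.uuid4())
        elif strategy == "now":
            record[name] = datetime.now(timezone.utc).isoformat(timespec="seconds")
    return record
```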

Output

path: "tasks/task-001.md"
frontmatter:
  id: "01ARZ3NDEKTSV4RRFFQ69G5FAV"
  title: "Fix the bug"
  status: open
  created_at: "2024-03-15T10:30:00Z"
  # ... all fields including generated

Errors

Code Description
unknown_type Specified type doesn't exist
validation_failed Validation errors (with details)
path_conflict File already exists at target path
path_required Cannot determine path

Example

mdbase create task \
  --field title="Fix login bug" \
  --field priority=4 \
  --field "assignee=[[alice]]"

12.2 Read

Reads a file and returns its parsed content.

Input

Parameter Required Description
path Yes File path relative to collection root
validate No Whether to validate (default: per settings)
include_body No Include body content (default: true)

Behavior

  1. Load file: Read from filesystem

  2. Parse frontmatter: Extract YAML frontmatter and body

  3. Determine types:

    • Check for explicit type/types field
    • Evaluate match rules for all types
    • Collect matched types
  4. Validate (if enabled):

    • Validate against all matched types
    • Collect validation issues
  5. Return record: Structured representation

    • frontmatter is the effective frontmatter (defaults applied, computed fields excluded)
    • file.properties (see Querying §10.5) provides raw persisted frontmatter when needed

Output

path: "tasks/task-001.md"
types: [task]
frontmatter:
  id: "task-001"
  title: "Fix the bug"
  status: open
  # ... all fields
file:
  name: "task-001.md"
  folder: "tasks"
  mtime: "2024-03-15T10:30:00Z"
  size: 1234
body: "## Description\n\nThe login form..."
validation:
  valid: true
  issues: []

Errors

Code Description
file_not_found File doesn't exist
invalid_frontmatter YAML parse error

12.3 Update

Modifies an existing file's frontmatter and/or body.

Input

Parameter Required Description
path Yes File path
fields No Field updates (partial)
body No New body content (null = no change)

Behavior

  1. Read existing file: Load current content

  2. Merge fields: Apply field updates to existing frontmatter

    • New fields are added
    • Existing fields are replaced
    • Explicit null removes the field (if write_nulls: omit) or writes field: null (if write_nulls: explicit)
  3. Update generated fields: For fields with generated: now_on_write, update to current time

  4. Apply defaults: For each missing field with a default, apply the default to the effective record used for validation and output

  5. Validate: If enabled, validate merged frontmatter (using effective values for required checks)

  6. Write file:

    • Preserve field order where possible
    • Preserve body if not provided
    • Write atomically

Null Handling on Update

When updating a field to null:

write_nulls setting Behavior
"omit" (default) Remove the field from frontmatter
"explicit" Write field: null

Important: Never write the empty-value form field: (see Frontmatter).
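
The merge step and the two write_nulls modes can be sketched as a small pure function. This is a minimal Python illustration; merge_fields is a hypothetical helper name, and Python's None stands in for YAML null.

```python
def merge_fields(existing: dict, updates: dict, write_nulls: str = "omit") -> dict:
    """Apply partial field updates to existing frontmatter.

    Python dicts keep insertion order, so existing fields stay in place
    and new fields are appended at the end.
    """
    merged = dict(existing)
    for key, value in updates.items():
        if value is None and write_nulls == "omit":
            merged.pop(key, None)  # explicit null removes the field
        else:
            merged[key] = value  # replace, add, or write an explicit null
    return merged
```

For example, merge_fields({"status": "open", "assignee": "[[alice]]"}, {"assignee": None}) drops assignee under the default setting, while passing write_nulls="explicit" keeps the key with a null value.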

Output

path: "tasks/task-001.md"
frontmatter:  # Effective frontmatter (defaults applied, computed excluded)
  # ... updated frontmatter
previous:
  status: open
updated:
  status: done

Errors

Code Description
file_not_found File doesn't exist
validation_failed Validation errors

Example

# Update single field
mdbase update tasks/task-001.md --field status=done

# Update multiple fields
mdbase update tasks/task-001.md \
  --field status=done \
  --field "completed_at=$(date -Iseconds)"

# Clear a field
mdbase update tasks/task-001.md --field assignee=null

12.4 Delete

Removes a file from the collection.

Input

Parameter Required Description
path Yes File path
check_backlinks No Warn about incoming links (default: true)

Behavior

  1. Check existence: Verify file exists

  2. Check backlinks (if enabled):

    • Find files that link to this file
    • Warn user about potential broken links
  3. Delete file: Remove from filesystem

Output

path: "tasks/task-001.md"
deleted: true
broken_links:
  - path: "tasks/parent.md"
    field: "subtasks"

Errors

Code Description
file_not_found File doesn't exist

Example

# Delete with confirmation
mdbase delete tasks/task-001.md

# Delete without backlink check
mdbase delete tasks/task-001.md --no-check-backlinks

# Force delete
mdbase delete tasks/task-001.md --force

12.5 Rename (and Move)

Renames or moves a file, optionally updating references across the collection.

Input

Parameter Required Description
from Yes Current file path
to Yes New file path
update_refs No Update references (default: per settings)

Behavior

  1. Validate paths: Ensure source exists and target doesn't

  2. Rename file: Move file to new path atomically

  3. Update references (if update_refs is enabled; default: the rename_update_refs setting):

    Frontmatter links: Update link fields in all files that reference the renamed file

    # Before: links to tasks/old-name.md
    parent: "[[old-name]]"
    
    # After: file renamed to tasks/new-name.md
    parent: "[[new-name]]"

    Body links: Update link syntax in markdown body content

    <!-- Before -->
    See [[old-name]] for details.
    Check [the task](./old-name.md).
    
    <!-- After -->
    See [[new-name]] for details.
    Check [the task](./new-name.md).

Reference Update Rules

  1. Preserve link style:

    • Wikilinks stay as wikilinks
    • Markdown links stay as markdown links
    • Relative links stay relative when possible
  2. Update all matching references:

    • By resolved path (most reliable)
    • By name when unambiguous
  3. ID-based links:

    • If a simple-name link ([[name]]) resolves via id_field and the target file's id_field value did not change, implementations SHOULD NOT rewrite the link during rename (to avoid unnecessary churn).
  4. Handle ambiguity:

    • If a link could refer to multiple files, don't update
    • Emit warning for manual review
  5. Scope: Update references in ALL collection files, not just same folder
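
The "preserve link style" rule can be illustrated with a body rewrite that touches only the link target. The Python sketch below is deliberately narrow and rests on assumptions: the rename stays within one folder, the old name has already been resolved as unambiguous, and aliases or heading anchors inside links are not handled. The function name rewrite_body_links is hypothetical.

```python
import re


def rewrite_body_links(body: str, old_name: str, new_name: str) -> str:
    """Rewrite [[old]] wikilinks and (./old.md) markdown links to the new name."""
    # Wikilinks stay wikilinks: [[old-name]] -> [[new-name]]
    body = re.sub(
        r"\[\[" + re.escape(old_name) + r"\]\]",
        lambda m: "[[" + new_name + "]]",
        body,
    )
    # Markdown links stay markdown links; link text and ./ prefix are kept.
    body = re.sub(
        r"(\]\((?:\./)?)" + re.escape(old_name) + r"(\.md\))",
        lambda m: m.group(1) + new_name + m.group(2),
        body,
    )
    return body
```

Replacement callables are used instead of replacement strings so that backslashes in a file name are not interpreted as regex escapes.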

Output

from: "tasks/old-name.md"
to: "tasks/new-name.md"
references_updated:
  - path: "tasks/parent.md"
    field: "subtasks[0]"
    old_value: "[[old-name]]"
    new_value: "[[new-name]]"
  - path: "notes/meeting.md"
    location: "body"
    old_value: "[[old-name]]"
    new_value: "[[new-name]]"
warnings:
  - path: "archive/legacy.md"
    message: "Ambiguous link '[[name]]' not updated"

Errors

Code Description
file_not_found Source file doesn't exist
path_conflict Target path already exists

Example

# Simple rename
mdbase rename tasks/old.md tasks/new.md

# Move to different folder
mdbase rename tasks/task.md archive/task.md

# Rename without updating references
mdbase rename tasks/old.md tasks/new.md --no-update-refs

# Dry run (show what would change)
mdbase rename tasks/old.md tasks/new.md --dry-run

12.6 Atomicity

All write operations (Create, Update, Delete, Rename) SHOULD be atomic:

  1. Create/Update: Write to temporary file, then rename
  2. Delete: Single filesystem delete
  3. Rename: Filesystem rename (atomic on POSIX systems when source and target are on the same filesystem; a cross-filesystem move is not)

For Rename with reference updates, atomicity across multiple files is not guaranteed (see §12.10, Cross-File Operations). Implementations SHOULD:

  • Complete the rename first
  • Update references file by file
  • Report partial failures clearly
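
The "temp file + rename" pattern for Create and Update might look like this in Python. It is a sketch under assumptions: the temporary file is created in the target's own directory so the final rename stays on one filesystem, and write_atomic is a hypothetical helper name.

```python
import os
import tempfile


def write_atomic(path: str, text: str) -> None:
    """Write text to path so readers see either the old or the new content."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-", suffix=".md")
    try:
        # newline="" disables newline translation, preserving LF vs CRLF.
        with os.fdopen(fd, "w", encoding="utf-8", newline="") as handle:
            handle.write(text)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_path, path)  # atomic replacement on the same filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)  # do not leave stray temp files behind
        raise
```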

12.7 Batch Operations

Implementations MAY support batch operations for efficiency:

# Bulk update
mdbase update --where 'status == "open"' --field status=in_progress

# Bulk delete
mdbase delete --where 'tags.contains("archive")' --confirm

# Bulk move
mdbase move 'tasks/*.md' archive/

Validation Phase

Before applying any changes, implementations MUST validate ALL affected files. If any file fails validation and default_validation is error, the entire batch MUST be aborted with no files modified.

Execution Phase

After validation passes, apply changes file by file.

Partial Failure

If a file write fails during execution (I/O error, concurrent modification):

  • Implementations MUST NOT roll back already-written files (filesystem operations are not transactional)
  • Implementations MUST continue processing remaining files (best-effort)
  • Implementations MUST report per-file results: success, failure (with error code), or skipped

Result Format

batch_result:
  total: 50
  succeeded: 47
  failed: 2
  skipped: 1
  details:
    - path: "tasks/task-001.md"
      status: "success"
    - path: "tasks/task-002.md"
      status: "failed"
      error: { code: "concurrent_modification", message: "..." }
    - path: "tasks/task-003.md"
      status: "skipped"
      reason: "Depends on failed task-002.md"

Dry-Run Mode

Batch operations MUST support --dry-run which validates all changes and reports what would happen without modifying any files.
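
The validation phase, execution phase, and dry-run mode can be combined in one driver. The following Python sketch makes several assumptions: validate and apply are caller-supplied callables, validate returns a list of issues (empty when valid), and the io_error code is a placeholder rather than a code defined by this specification.

```python
def run_batch(paths, validate, apply, dry_run=False) -> dict:
    """Validate every file first, then apply changes file by file."""
    # Validation phase: any failure aborts before a single file is modified.
    invalid = {}
    for path in paths:
        issues = validate(path)
        if issues:
            invalid[path] = issues
    if invalid:
        return {"aborted": True, "invalid": invalid, "details": []}

    # Execution phase: best-effort, no rollback, per-file results.
    details = []
    if not dry_run:
        for path in paths:
            try:
                apply(path)
                details.append({"path": path, "status": "success"})
            except OSError as exc:
                error = {"code": "io_error", "message": str(exc)}  # placeholder code
                details.append({"path": path, "status": "failed", "error": error})

    succeeded = sum(1 for d in details if d["status"] == "success")
    return {
        "aborted": False,
        "dry_run": dry_run,
        "total": len(paths),
        "succeeded": succeeded,
        "failed": len(details) - succeeded,
        "details": details,
    }
```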


12.8 Formatting Preservation

When writing files, implementations preserve or normalize formatting according to the following tiers:

MUST Preserve

  • Body content (unless explicitly changed)
  • Line ending style (LF vs CRLF)

SHOULD Preserve

  • Frontmatter field order
  • String quoting style
  • Multi-line string format (literal vs folded)
  • Comments (if YAML parser supports it)

MAY Normalize

  • Indentation (recommend 2 spaces)
  • Trailing whitespace
  • Final newline (files SHOULD end with newline)

12.9 Hooks (Optional)

Implementations MAY support hooks for custom logic:

Hook When
beforeCreate Before validation and write
afterCreate After successful write
beforeUpdate Before validation and write
afterUpdate After successful write
beforeDelete Before deletion
afterDelete After successful deletion
beforeRename Before rename
afterRename After successful rename and ref updates

Hooks receive operation context and can:

  • Modify values (before hooks)
  • Perform side effects (after hooks)
  • Abort operation (before hooks, by throwing)

This is an OPTIONAL feature; implementations need not support hooks.


12.10 Concurrency

Read-Modify-Write Cycle

When updating a file, implementations MUST detect concurrent modifications. The recommended approach is optimistic concurrency using file mtime:

  1. Read file, record mtime
  2. Apply changes in memory
  3. Before writing, check that file mtime has not changed
  4. If mtime changed, abort with concurrent_modification error
  5. Write atomically (temp file + rename)

Implementations MAY use content hashing instead of mtime for more reliable conflict detection.

Conflict Behavior

On detecting a concurrent modification, implementations MUST abort the operation and report concurrent_modification. Implementations MUST NOT silently overwrite concurrent changes.

Implementations MAY offer a retry mechanism (re-read, re-apply, re-check) but MUST NOT retry automatically without user/caller consent.
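
The five-step read-modify-write cycle can be sketched in Python using nanosecond mtimes. The names ConcurrentModification and update_with_mtime_check are hypothetical, and transform is a caller-supplied function that maps the old file text to the new text.

```python
import os


class ConcurrentModification(Exception):
    """Raised when the file changed between read and write."""
    code = "concurrent_modification"


def update_with_mtime_check(path: str, transform) -> None:
    # 1. Read file, record mtime.
    before = os.stat(path).st_mtime_ns
    with open(path, encoding="utf-8", newline="") as handle:
        original = handle.read()
    # 2. Apply changes in memory.
    updated = transform(original)
    # 3-4. Abort if the file was modified in the meantime.
    if os.stat(path).st_mtime_ns != before:
        raise ConcurrentModification(path)
    # 5. Write atomically (temp file + rename).
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8", newline="") as handle:
        handle.write(updated)
    os.replace(tmp_path, path)
```

A small window remains between the mtime check and the rename; where that matters, an implementation could combine this check with the advisory locks described under File Locking.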

Cross-File Operations

Rename with reference updates touches multiple files. These are NOT atomic across files. Implementations MUST:

  1. Complete the primary rename first
  2. Update references file by file
  3. Use mtime checking on each referenced file before updating
  4. Report partial failures — which files were updated and which were skipped due to conflicts

File Locking

Implementations MAY use advisory file locks for write operations. If used:

  • Locks MUST be released on operation completion (including error paths)
  • Lock timeouts SHOULD be documented
  • Implementations MUST NOT require locking for read operations

13. Caching and Indexing

Caching and indexing are optional features that accelerate queries on large collections. This section defines cache behavior and requirements.


13.1 Core Principle

Files are the source of truth.

Caches are derived data. They MUST be:

  • Rebuildable from files alone
  • Deletable without data loss
  • Optional for correctness (only affect performance)

If you delete the cache folder, the collection still works—queries just run slower.


13.2 When Caching Helps

Caching significantly improves performance for:

| Operation | Without Cache | With Cache |
|-----------|---------------|------------|
| Query by type | Scan all files | Index lookup |
| Query by field | Scan all files | Index lookup |
| Path prefix filter | Filesystem scan | Index lookup |
| Link resolution | Search all files | Direct lookup |
| Backlink queries | Scan all files | Reverse index |
| Full-text body search | Read all files | Text index |

For small collections (< 100 files), caching overhead may not be worthwhile. For large collections (1000+ files), caching is strongly recommended.


13.3 Cache Requirements

If an implementation supports caching, it MUST follow these rules:

13.3.1 Derivable

The cache MUST be completely rebuildable from:

  • Collection files (markdown)
  • Configuration (mdbase.yaml)
  • Type definitions

No information should exist only in the cache.

13.3.2 Optional

All operations MUST work without the cache, possibly slower:

  • Queries scan files directly
  • Backlinks are computed on demand
  • Link resolution searches the collection

13.3.3 Detectable Staleness

The implementation MUST detect when the cache is stale:

  • File modified after cache entry
  • File deleted but still in cache
  • New file not in cache
  • Config changed since cache build

13.3.4 Explicit Rebuild

Users MUST be able to force a full cache rebuild:

mdbase cache rebuild

13.3.5 Deletable

Deleting the cache folder MUST NOT affect:

  • File contents
  • Collection integrity
  • Operation correctness

13.4 Cache Location

The default cache location is .mdbase/ at the collection root, configurable via settings.cache_folder.

my-collection/
├── mdbase.yaml
├── _types/
├── tasks/
├── notes/
└── .mdbase/                  # Cache folder
    ├── index.sqlite        # Main index (example)
    ├── links.json          # Link graph (example)
    └── meta.json           # Cache metadata

Gitignore

The cache folder SHOULD be gitignored. Add to .gitignore:

.mdbase/

Cache files are machine-specific and should not be version controlled.


13.5 What to Cache

Implementations MAY cache:

Data Purpose
File metadata Fast file lookups
Parsed frontmatter Avoid re-parsing
Type assignments Fast type queries
Field values Field-based queries
Link graph Link resolution, backlinks
Full-text index Body content search

At minimum, caching implementations SHOULD index:

  • File paths and mtimes (for staleness detection)
  • Type assignments (for type queries)
  • Link relationships (for backlinks)

13.6 Cache Invalidation

Staleness Detection

For each file, track:

  • Last known mtime
  • Content hash (optional, more reliable)

On query:

  1. Check if file mtime matches cached mtime
  2. If different, mark entry stale
  3. Re-parse file and update cache
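
The staleness cases from §13.3.3 reduce to comparing two path-to-mtime maps. Here is a minimal Python sketch, assuming the cache stores one mtime per path and that the caller has already collected current mtimes from disk; classify_entries is a hypothetical name.

```python
def classify_entries(cached: dict, on_disk: dict) -> dict:
    """Compare cached mtimes with current mtimes (both map path -> mtime)."""
    return {
        # File modified after the cache entry was written.
        "stale": sorted(p for p in cached if p in on_disk and on_disk[p] != cached[p]),
        # File deleted but still present in the cache.
        "deleted": sorted(p for p in cached if p not in on_disk),
        # New file not yet in the cache.
        "new": sorted(p for p in on_disk if p not in cached),
    }
```

Entries under "stale" and "new" would be re-parsed and re-indexed, and entries under "deleted" removed, which is the incremental update path described below.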

Change Triggers

Cache entries should be invalidated when:

Change Invalidation
File modified Re-index that file
File created Index new file
File deleted Remove from index
File renamed Update path, check links
Config changed Full rebuild
Type definition changed Re-index affected files

Incremental vs Full Rebuild

Incremental: Update only changed entries (fast, normal operation)

Full rebuild: Recreate entire cache (slow, guaranteed consistent)

Implementations SHOULD support both.


13.7 Backlink Index

The file.backlinks property requires knowing which files link to a given file. This requires a reverse link index.

To build the index, for each file A in the collection:

  1. Parse A's frontmatter and body
  2. Extract all links from A
  3. For each link target B:
    • Add A to B's backlinks set

Storage

{
  "tasks/task-001.md": {
    "backlinks": [
      "tasks/parent.md",
      "notes/meeting.md"
    ]
  }
}
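
The index-building loop can be sketched in Python as follows. The sketch simplifies heavily and says so: it only recognizes wikilinks, it resolves a link by matching the bare file name (assumed unique across the collection), and it scans raw file text instead of parsing frontmatter and body separately. The name build_backlinks is hypothetical.

```python
import re
from collections import defaultdict
from pathlib import PurePosixPath

# Captures the target of [[target]], stopping at an alias or anchor separator.
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def build_backlinks(files: dict) -> dict:
    """files maps path -> raw text; returns target path -> sorted linking paths."""
    by_name = {PurePosixPath(path).stem: path for path in files}
    backlinks = defaultdict(set)
    for source, text in files.items():
        for target_name in WIKILINK.findall(text):
            target = by_name.get(target_name.strip())
            if target is not None:  # links to non-existent files are skipped
                backlinks[target].add(source)
    return {target: sorted(sources) for target, sources in backlinks.items()}
```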

Performance Note

Without caching, computing backlinks requires scanning every file. For large collections, this is prohibitively slow. Implementations SHOULD:

  1. Document that file.backlinks requires caching for good performance
  2. Warn when backlink queries are slow
  3. Suggest enabling caching

13.8 Cache Commands

Implementations SHOULD provide cache management commands:

# Show cache status
mdbase cache status
# Output: Cache valid, 1234 files indexed, last built 5 min ago

# Rebuild entire cache
mdbase cache rebuild

# Clear cache
mdbase cache clear

# Update cache incrementally
mdbase cache update

# Verify cache integrity
mdbase cache verify

13.9 Cache Implementation Options

Implementations may use various storage backends:

SQLite

.mdbase/index.sqlite

Pros: ACID, queryable, single file, widely supported

Cons: Binary file (not human-readable)

JSON Files

.mdbase/files.json
.mdbase/links.json
.mdbase/types.json

Pros: Human-readable, simple

Cons: Full rewrite on update, no concurrent access

Memory Only

No persistent cache; rebuild on each run.

Pros: No disk I/O, always fresh

Cons: Slow startup for large collections


13.10 Concurrent Access

When multiple processes may access the collection:

Read-Only Access

Multiple readers can safely share a cache. Use file locking or SQLite's WAL mode.

Write Access

When one process writes:

  1. Acquire exclusive lock
  2. Perform operation
  3. Update cache
  4. Release lock

Implementations SHOULD document concurrency behavior.


13.11 Cache Warming

For large collections, initial cache build can take time. Implementations MAY support:

Eager warming: Build the entire cache up front, either on first access to the collection or through an explicit command

mdbase cache build

Lazy warming: Build entries on first query for each file

Background warming: Build cache asynchronously while serving queries


13.12 Cache Versioning

Cache format may change between implementation versions. Include a version marker:

{
  "version": "1.0",
  "spec_version": "0.1.0",
  "built_at": "2024-03-15T10:30:00Z"
}

When loading cache:

  1. Check version compatibility
  2. If incompatible, trigger full rebuild
  3. Log version mismatch for debugging
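
The load-time check might be sketched like this in Python. Treating any version string other than an exact match as incompatible is an assumption made for simplicity; the specification leaves the compatibility rule to the implementation. CACHE_FORMAT_VERSION and needs_rebuild are hypothetical names.

```python
import json

CACHE_FORMAT_VERSION = "1.0"  # the format this (hypothetical) build understands


def needs_rebuild(meta_text: str, expected: str = CACHE_FORMAT_VERSION) -> bool:
    """Return True when cache metadata is unreadable or has another version."""
    try:
        meta = json.loads(meta_text)
    except json.JSONDecodeError:
        return True  # corrupt metadata: rebuilding is always safe
    if not isinstance(meta, dict):
        return True
    return meta.get("version") != expected
```

Since caches are derived data, erring on the side of a full rebuild costs time but never correctness.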

14. Conformance

This section defines conformance levels and testing requirements for implementations.


14.1 Conformance Levels

Implementations may claim conformance at different levels. Each level builds on all previous levels.

Level 1: Core

Required capabilities:

  • Parse mdbase.yaml configuration
  • Locate and scan markdown files in collection
  • Parse YAML frontmatter from files
  • Handle null values correctly (per §3)
  • Load type definitions from types folder
  • Single-type matching via explicit declaration (type field)
  • Validate fields against type schemas
  • Implement Create, Read, Update, Delete operations
  • Type coercion (per §7.16)

Test coverage: Basic parsing, validation, CRUD operations, concurrency (basic mtime conflict detection)

Level 2: Matching

Additional capabilities:

  • Path-based type matching (path_glob)
  • Field presence matching (fields_present)
  • Field value matching (where conditions in match rules)
  • Multi-type matching (files matching multiple types)
  • Multi-type validation with constraint merging (per §6.5)

Test coverage: Match rule evaluation, multi-type scenarios, constraint merging

Level 3: Querying

Additional capabilities:

  • Core query model (filter, sort, limit, offset)
  • Expression evaluation (all operators in §11)
  • String, list, and object methods
  • Date arithmetic (including date subtraction)
  • Duration parsing (duration())
  • Null coalescing (??)

Test coverage: Core query correctness, expression edge cases, body_search, computed_fields

Level 4: Links

Additional capabilities:

  • Parse all link formats (wikilink, markdown, bare path)
  • Resolve links to files
  • asFile() traversal in expressions
  • file.hasLink() and file.hasTag() functions
  • file.links property

Test coverage: Link parsing, resolution, traversal

Level 5: References

Additional capabilities:

  • Rename with reference updates
  • Backlink computation (file.backlinks)
  • Body link detection and update

Test coverage: Reference update correctness, backlink accuracy

Level 6: Full

All capabilities including:

  • Caching with staleness detection (per §13)
  • Batch operations
  • Watch mode with event delivery (per §15)
  • Type creation via tooling
  • Nested collection detection (per §2)

Test coverage: Performance, cache correctness, watching, edge cases


14.1.1 Optional Profiles (Non-Normative)

Implementations MAY support optional profiles beyond the core levels. The Query+ profile adds advanced query features (formulas, groupBy, summaries, property_summaries, properties) as defined in Querying §10.7. Support for Query+ is not required for conformance.


14.2 Conformance Claims

Implementations SHOULD clearly state their conformance level:

mdbase-tool v1.0.0
Conformance: Level 4 (Links)
Specification: 0.1.0

Implementations MAY implement features from higher levels while claiming a lower level, but SHOULD NOT claim a level without passing all tests for that level.


14.3 Test Suite

A conformance test suite is provided as a collection of test cases. Each test group specifies a spec_ref identifying the specification section(s) under test. Individual test cases MAY also include a spec_ref to pinpoint the exact clause being validated.

The spec_ref field uses section numbers (e.g., "§7.2", "§3.4, §7.2"). When a test case omits spec_ref, it inherits the group-level reference.

name: "required field validation"
level: 1
category: validation
spec_ref: "§7.2"

setup:
  config: |
    spec_version: "0.1.0"
  types:
    task.md: |
      ---
      name: task
      fields:
        title:
          type: string
          required: true
      ---
  files:
    tasks/valid.md: |
      ---
      type: task
      title: "Valid task"
      ---
    tasks/invalid.md: |
      ---
      type: task
      ---

tests:
  - name: "valid file passes validation"
    operation: validate
    input:
      path: "tasks/valid.md"
    expect:
      valid: true
      issues: []

  - name: "missing required field fails validation"
    spec_ref: "§7.2"
    operation: validate
    input:
      path: "tasks/invalid.md"
    expect:
      valid: false
      issues:
        - code: missing_required
          field: title

Test Categories

Category Description Spec Reference
config Configuration parsing and validation §4
types Type definition loading and inheritance §5
matching Type matching rules §6
validation Schema validation §9
expressions Expression evaluation §11
queries Query execution §10
links Link parsing and resolution §8
operations CRUD operations §12
references Reference updates §12.5
caching Cache behavior §13
concurrency Concurrent modification detection §12.10
watching Watch mode event delivery §15
body_search Body content filtering §10.5
computed_fields Computed field evaluation §5.12

14.4 Required Test Coverage

For each conformance level, implementations MUST pass:

Level Required Categories
1 config, types (basic), validation (basic), operations, concurrency
2 + matching
3 + expressions, queries, body_search, computed_fields
4 + links
5 + references
6 + caching, watching

14.5 Test Execution

The test suite can be run against any implementation:

# Run all tests
mdbase-test run --impl ./my-impl

# Run specific level
mdbase-test run --impl ./my-impl --level 3

# Run specific category
mdbase-test run --impl ./my-impl --category validation

# Generate conformance report
mdbase-test report --impl ./my-impl --output report.html

14.6 Implementation Notes

Edge Cases to Handle

Implementations should correctly handle:

  • Empty frontmatter (---\n---)
  • File without frontmatter
  • Frontmatter with only null values
  • Empty collection (no files)
  • Type with no fields defined
  • File matching zero types
  • File matching multiple conflicting types
  • Circular type inheritance (should error)
  • Self-referential links
  • Links to non-existent files
  • Very long field values
  • Unicode in field names and values
  • Files with unusual characters in names

Performance Expectations

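While not strictly required, implementations SHOULD meet the following performance targets:

| Operation | Target | Collection Size |
|-----------|--------|-----------------|
| Read single file | < 10ms | Any |
| Query by type | < 100ms | 1000 files |
| Query with filter | < 500ms | 1000 files |
| Link resolution | < 10ms | Any |
| Backlink query | < 1s | 1000 files (cached) |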

Error Messages

Implementations SHOULD provide helpful error messages:

❌ Validation failed: tasks/task-001.md

  Field 'priority' has invalid value
    Expected: integer between 1 and 5
    Actual: "high" (string)
    
  At line 5 in frontmatter:
    priority: high
              ^^^^

  Hint: Use a number like `priority: 3`

14.7 Extensions

Implementations MAY extend the specification with additional features:

  • Custom field types
  • Additional expression functions
  • Query output formats
  • Integration hooks
  • Custom validation rules

Extensions SHOULD:

  • Be clearly documented as non-standard
  • Not conflict with standard behavior
  • Be optional (spec-compliant usage should work without them)

Extensions SHOULD NOT:

  • Change the meaning of standard features
  • Require non-standard syntax for basic operations
  • Break interoperability with compliant tools

14.8 Reporting Issues

If the specification is ambiguous or conflicts with practical implementation needs, please report issues to the specification maintainers. The goal is a spec that is:

  • Clear and unambiguous
  • Implementable in any language
  • Useful for real-world applications

15. Watching

This section defines the watch mode event model for monitoring a collection for changes. Watch mode is a required capability for Level 6 conformance.


15.1 Overview

Watch mode enables implementations to monitor a collection for filesystem changes and emit structured events. This supports real-time UIs, continuous validation, and incremental cache updates.


15.2 Event Types

| Event | Trigger | Payload Fields |
|-------|---------|----------------|
| file_created | New markdown file detected | path, types, frontmatter |
| file_modified | Existing file content changed | path, types, frontmatter, changed_fields |
| file_deleted | Markdown file removed | path, last_known_types |
| file_renamed | File moved/renamed (if detectable) | from, to, types |
| type_changed | Type definition file modified | type_name, affected_files |
| config_changed | mdbase.yaml modified | previous_hash, new_hash |
| validation_error | File fails validation after change | path, issues |

When present, frontmatter in events is the effective frontmatter (defaults applied, computed excluded).


15.3 Event Payload Structure

All events include a common set of fields:

Field Type Description
event string Event type (e.g., file_created)
timestamp datetime When the event was emitted
path string File path relative to collection root (single-file events only; file_renamed carries from and to instead, see §15.2)

Additional fields per event type are listed in §15.2.

Example Payloads

file_created:

event: file_created
timestamp: "2024-03-15T10:30:00Z"
path: "tasks/task-042.md"
types: [task]
frontmatter:  # Effective frontmatter (defaults applied, computed excluded)
  title: "New task"
  status: open

file_modified:

event: file_modified
timestamp: "2024-03-15T10:31:00Z"
path: "tasks/task-042.md"
types: [task]
frontmatter:  # Effective frontmatter (defaults applied, computed excluded)
  title: "New task"
  status: in_progress
changed_fields: [status]  # Raw persisted frontmatter keys that changed

file_deleted:

event: file_deleted
timestamp: "2024-03-15T10:32:00Z"
path: "tasks/task-042.md"
last_known_types: [task]

file_renamed:

event: file_renamed
timestamp: "2024-03-15T10:33:00Z"
from: "tasks/task-042.md"
to: "archive/task-042.md"
types: [task]

15.4 Debouncing

Implementations MUST debounce filesystem events — multiple rapid changes to the same file MUST be coalesced into a single event.

  • Recommended debounce window: 100–500ms (implementation-defined)
  • After the debounce window, the implementation reads the file's current state and emits one event reflecting the net change
  • If a file is created and then immediately deleted within the debounce window, no event is emitted
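
The coalescing rule can be expressed as a pure function over the raw events collected during one debounce window. The Python sketch below leaves out timers and file reads. It assumes raw events arrive as (path, kind) pairs, and it treats a delete followed by a create as a modification, which is an interpretation rather than something this specification states. The name coalesce is hypothetical.

```python
def coalesce(raw_events: list) -> list:
    """Reduce raw (path, kind) events from one window to one net event per file.

    kind is one of "created", "modified", "deleted", in arrival order.
    """
    first, last = {}, {}
    for path, kind in raw_events:
        first.setdefault(path, kind)
        last[path] = kind

    net = []
    for path in first:
        existed_before = first[path] != "created"
        exists_after = last[path] != "deleted"
        if existed_before and exists_after:
            net.append((path, "file_modified"))
        elif existed_before:
            net.append((path, "file_deleted"))
        elif exists_after:
            net.append((path, "file_created"))
        # Created and deleted inside the window: no event is emitted.
    return net
```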

15.5 Rename Detection

Filesystem watchers typically see a delete followed by a create rather than a rename.

  • Implementations SHOULD detect renames by correlating a delete and create within a short window (e.g., same content hash, or same id_field value)
  • If rename detection succeeds, emit a single file_renamed event
  • If rename detection fails, implementations MUST emit separate file_deleted and file_created events
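
Correlation by content hash might be sketched as follows in Python. The inputs are assumptions: deleted maps each removed path to its last known content hash (for example, from the cache), and created maps each new path to the hash of its current content. When a hash matches more than one new file, the sketch falls back to separate events instead of guessing. The name correlate is hypothetical.

```python
def correlate(deleted: dict, created: dict) -> list:
    """Pair deletes and creates seen in one window into file_renamed events."""
    events = []
    unmatched_created = dict(created)
    for old_path, digest in deleted.items():
        matches = [p for p, d in unmatched_created.items() if d == digest]
        if len(matches) == 1:
            new_path = matches[0]
            del unmatched_created[new_path]
            events.append({"event": "file_renamed", "from": old_path, "to": new_path})
        else:
            # No match, or an ambiguous one: report a plain deletion.
            events.append({"event": "file_deleted", "path": old_path})
    for new_path in unmatched_created:
        events.append({"event": "file_created", "path": new_path})
    return events
```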

15.6 Event Delivery

  • Implementations MUST support callback/listener registration for events
  • Events MUST be delivered in order per file — events for the same file are delivered sequentially. Events for different files MAY be delivered concurrently
  • If event processing (in a listener callback) fails, the error MUST NOT stop the watcher. Implementations SHOULD log the error and continue
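
Listener registration with failure isolation can be sketched in a few lines of Python. The Watcher class and its on and emit methods are hypothetical names, and the sketch delivers events synchronously on one thread, which trivially satisfies per-file ordering but says nothing about concurrent delivery.

```python
import logging


class Watcher:
    """Minimal event dispatcher: a failing listener never stops delivery."""

    def __init__(self) -> None:
        self._listeners = []

    def on(self, listener) -> None:
        """Register a callback that receives each event dict."""
        self._listeners.append(listener)

    def emit(self, event: dict) -> None:
        for listener in self._listeners:
            try:
                listener(event)
            except Exception:
                # Log and continue so the remaining listeners still run.
                logging.exception("listener failed for event %s", event.get("event"))
```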

15.7 Interaction with Caching

When a cache is present (see §13):

  • Watch events SHOULD trigger incremental cache updates
  • Cache updates MUST complete before the event is delivered to listeners, so that listeners see consistent state when they query the collection

Appendix A: Complete Examples

This appendix provides complete, working examples of collections and their components.


A.1 Minimal Collection

The simplest valid collection:

minimal/
├── mdbase.yaml
└── hello.md

mdbase.yaml:

spec_version: "0.1.0"

hello.md:

---
title: Hello World
---

This is a minimal collection with one untyped file.

A.2 Task Management Collection

A complete task management setup with types, queries, and examples.

Structure

tasks-project/
├── mdbase.yaml
├── _types/
│   ├── base.md
│   ├── task.md
│   ├── person.md
│   └── urgent.md
├── people/
│   ├── alice.md
│   └── bob.md
├── tasks/
│   ├── feature-login.md
│   ├── bug-crash.md
│   └── subtasks/
│       └── login-ui.md
└── .mdbase/
    └── (cache files)

Configuration

mdbase.yaml:

spec_version: "0.1.0"

name: "Project Tasks"
description: "Task tracking for the main project"

settings:
  exclude:
    - ".git"
    - "node_modules"
    - ".mdbase"
    - "*.draft.md"
  
  default_validation: "warn"
  default_strict: false
  id_field: "id"
  rename_update_refs: true
  write_nulls: "omit"

Type Definitions

_types/base.md:

---
name: base

fields:
  id:
    type: string
    required: true
    generated: ulid
  created_at:
    type: datetime
    generated: now
  updated_at:
    type: datetime
    generated: now_on_write
---

# Base Type

Common fields for all tracked entities. Provides automatic ID generation and timestamps.

_types/task.md:

---
name: task
description: A task or todo item with lifecycle tracking
extends: base

match:
  path_glob: "tasks/**/*.md"

filename_pattern: "{id}.md"

fields:
  title:
    type: string
    required: true
    min_length: 1
    max_length: 200
    description: Short, descriptive task title
  
  status:
    type: enum
    values: [open, in_progress, blocked, done, cancelled]
    default: open
    description: Current lifecycle state
  
  priority:
    type: integer
    min: 1
    max: 5
    default: 3
    description: "1 = lowest, 5 = highest"
  
  assignee:
    type: link
    target: person
    description: Person responsible for this task
  
  due_date:
    type: date
    description: When the task should be completed
  
  tags:
    type: list
    items:
      type: string
    default: []
    description: Categorization tags
  
  parent:
    type: link
    target: task
    description: Parent task (for subtasks)
  
  blocks:
    type: list
    items:
      type: link
      target: task
    default: []
    description: Tasks that this task blocks
  
  estimate_hours:
    type: number
    min: 0
    description: Estimated hours to complete
---

# Task

Tasks represent discrete units of work tracked through their lifecycle.

## Status Values

| Status | Description |
|--------|-------------|
| `open` | Not started |
| `in_progress` | Currently being worked on |
| `blocked` | Waiting on external dependency |
| `done` | Completed successfully |
| `cancelled` | Will not be done |

## Priority Scale

- **5**: Critical - drop everything
- **4**: High - do this week
- **3**: Medium - normal priority
- **2**: Low - when time permits
- **1**: Someday - nice to have

## Example

```yaml
---
type: task
title: Implement user authentication
status: in_progress
priority: 4
assignee: "[[alice]]"
due_date: 2024-04-01
tags: [feature, security]
estimate_hours: 16
---

## Requirements

- OAuth 2.0 support
- Remember me functionality
- Password reset flow
```

_types/person.md:

---
name: person
description: A team member
extends: base

match:
  path_glob: "people/**/*.md"

fields:
  name:
    type: string
    required: true
    description: Full name
  
  email:
    type: string
    description: Email address
  
  team:
    type: string
    description: Team or department
  
  role:
    type: string
    description: Job title or role
  
  active:
    type: boolean
    default: true
    description: Whether currently on the team
---

# Person

Team member records for assignment and reference.

_types/urgent.md:

---
name: urgent
description: Marks items requiring immediate attention

match:
  where:
    tags:
      contains: "urgent"

fields:
  escalation_contact:
    type: string
    description: Who to contact for escalation
  
  sla_hours:
    type: integer
    description: Hours until SLA breach
---

# Urgent

Items tagged "urgent" automatically get this type applied.
This enables additional tracking fields for urgent items.
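
The `generated` strategies declared in the base type (`ulid`, `now`, `now_on_write`) could be applied at write time roughly as sketched below. The function names are hypothetical, and the sketch assumes that `now` is set only when a file is created, that `now_on_write` is refreshed on every write, and that explicitly supplied values (such as `id: alice`) are never overwritten.

```python
import os
import time
from datetime import datetime, timezone

CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def new_ulid() -> str:
    """48-bit millisecond timestamp + 80 random bits, Crockford base32 (26 chars)."""
    value = (int(time.time() * 1000) << 80) | int.from_bytes(os.urandom(10), "big")
    chars = []
    for _ in range(26):
        chars.append(CROCKFORD[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))


def apply_generated(frontmatter: dict, fields: dict, creating: bool) -> dict:
    """Return a copy of frontmatter with generated fields filled in."""
    result = dict(frontmatter)
    now = datetime.now(timezone.utc).isoformat(timespec="seconds")
    for name, spec in fields.items():
        strategy = spec.get("generated")
        if strategy == "now_on_write":
            result[name] = now              # assumed: refreshed on every write
        elif creating and name not in result:
            if strategy == "ulid":
                result[name] = new_ulid()   # only when no id was supplied
            elif strategy == "now":
                result[name] = now          # assumed: set once, at creation
    return result
```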

Sample Files

people/alice.md:

---
type: person
id: alice
name: Alice Chen
email: alice@example.com
team: engineering
role: Senior Developer
active: true
---

Alice is the tech lead for the backend team.

## Expertise
- Authentication systems
- API design
- Performance optimization

tasks/feature-login.md:

---
type: task
id: feature-login
title: Implement user login system
status: in_progress
priority: 4
assignee: "[[alice]]"
due_date: 2024-04-01
tags: [feature, security, auth]
estimate_hours: 24
---

# Login System Implementation

## Overview

Build complete authentication system with OAuth support.

## Subtasks

- [[subtasks/login-ui]] - Frontend components
- Database schema design
- API endpoints
- Testing

tasks/bug-crash.md:

---
types: [task, urgent]
id: bug-crash
title: Fix crash on startup
status: open
priority: 5
assignee: "[[bob]]"
tags: [bug, urgent, production]
escalation_contact: alice@example.com
sla_hours: 4
---

# Critical: App Crashes on Startup

## Symptoms

App crashes immediately when user opens it.

## Impact

100% of users affected.

## Workaround

None known.
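
To illustrate how the match rules in the type definitions above would classify these sample files, here is a minimal sketch of rule-based matching only (explicit `type`/`types` declarations are not handled). The glob translation is an assumption: it treats `**/` as "zero or more directories" and `*` as "any characters except `/`", which may differ from the normative glob rules. All function names are hypothetical.

```python
import re


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a path glob to a regex (assumed semantics, see lead-in)."""
    out, i = [], 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:[^/]+/)*")   # zero or more directory segments
            i += 3
        elif pattern[i] == "*":
            out.append("[^/]*")         # anything within one segment
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out) + r"\Z")


# Match rules copied from the task, person, and urgent type definitions.
TYPE_RULES = {
    "task": {"path_glob": "tasks/**/*.md"},
    "person": {"path_glob": "people/**/*.md"},
    "urgent": {"where": {"tags": {"contains": "urgent"}}},
}


def matches(rule: dict, path: str, frontmatter: dict) -> bool:
    glob = rule.get("path_glob")
    if glob is not None and not glob_to_regex(glob).match(path):
        return False
    for field, condition in rule.get("where", {}).items():
        value = frontmatter.get(field)
        if "contains" in condition:
            if not isinstance(value, (list, str)) or condition["contains"] not in value:
                return False
    return True


def matched_types(path: str, frontmatter: dict) -> list:
    return [name for name, rule in TYPE_RULES.items() if matches(rule, path, frontmatter)]
```

Under these assumptions, `tasks/bug-crash.md` matches both `task` (by path) and `urgent` (by tag), which agrees with its declared `types: [task, urgent]`.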

A.3 Query Examples

Core Examples

All Open Tasks

query:
  types: [task]
  where: 'status == "open"'
  order_by:
    - field: priority
      direction: desc

My Tasks (Assigned to Alice)

query:
  types: [task]
  where:
    and:
      - 'assignee.asFile().id == "alice"'
      - 'status != "done"'
  order_by:
    - field: due_date
      direction: asc

Tasks Due This Week

query:
  types: [task]
  where:
    and:
      - "due_date >= today()"
      - "due_date <= today() + '7d'"
      - 'status != "done"'
  order_by:
    - field: due_date
      direction: asc

High Priority Blockers

query:
  types: [task]
  where:
    and:
      - "priority >= 4"
      - 'status == "blocked"'

Urgent Items (Multi-Type)

query:
  where: 'types.contains("urgent")'
  order_by:
    - field: sla_hours
      direction: asc

Query+ Examples

Overdue Tasks (Query+)

query:
  types: [task]
  where:
    and:
      - "due_date < today()"
      - 'status != "done"'
      - 'status != "cancelled"'
  formulas:
    days_overdue: "(today() - due_date) / 86400000"  # date subtraction returns milliseconds
  order_by:
    - field: formula.days_overdue
      direction: desc
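
The `days_overdue` formula divides a millisecond difference by 86,400,000 (24 × 60 × 60 × 1000) to obtain days. The arithmetic can be checked with a short sketch; the helper names are hypothetical, and dates are assumed to be interpreted as midnight UTC.

```python
from datetime import date, datetime, timezone

MS_PER_DAY = 24 * 60 * 60 * 1000  # 86,400,000 milliseconds in one day


def to_epoch_ms(d: date) -> int:
    """Milliseconds since the Unix epoch for midnight UTC on the given date."""
    midnight = datetime(d.year, d.month, d.day, tzinfo=timezone.utc)
    return int(midnight.timestamp() * 1000)


def days_overdue(due_date: date, today: date) -> float:
    """Mirror of the formula: (today() - due_date) / 86400000."""
    return (to_epoch_ms(today) - to_epoch_ms(due_date)) / MS_PER_DAY
```

For a task due on 2024-04-01 evaluated on 2024-04-08, the difference is 604,800,000 ms, so `days_overdue` is 7.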

Workload by Person (Query+)

query:
  types: [task]
  where: 'status != "done" && exists(assignee)'
  formulas:
    assignee_name: "assignee.asFile().name"

To group by assignee (Query+):

query:
  types: [task]
  where: 'status != "done" && exists(assignee)'
  formulas:
    assignee_name: "assignee.asFile().name"
  groupBy:
    property: formula.assignee_name
    direction: ASC
  property_summaries:
    estimate_hours: Sum
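
As a rough illustration of what the grouped query computes, the sketch below groups already-filtered rows by a key and sums one property per group. The function name and result shape are hypothetical, and treating a missing `estimate_hours` as 0 in the sum is an assumption.

```python
from collections import defaultdict


def group_and_sum(rows, group_key, sum_field):
    """Group rows by group_key (ascending) and sum sum_field within each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row)

    result = []
    for key in sorted(groups):  # corresponds to direction: ASC
        members = groups[key]
        total = sum(r.get(sum_field) or 0 for r in members)  # assumed: null counts as 0
        result.append({"group": key, "rows": members, "summaries": {sum_field: total}})
    return result


# Illustrative rows; "Bob" as a display name is made up for the example.
rows = [
    {"assignee_name": "Alice Chen", "estimate_hours": 24},
    {"assignee_name": "Bob", "estimate_hours": None},
    {"assignee_name": "Alice Chen", "estimate_hours": 8},
]
workload = group_and_sum(rows, "assignee_name", "estimate_hours")
```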

A.4 Knowledge Base Collection

A personal wiki / knowledge base setup.

Structure

knowledge-base/
├── mdbase.yaml
├── _types/
│   ├── document.md
│   ├── concept.md
│   ├── source.md
│   └── daily.md
├── concepts/
│   ├── machine-learning.md
│   └── distributed-systems.md
├── sources/
│   └── attention-paper.md
├── daily/
│   └── 2024/
│       └── 03/
│           └── 15.md
└── inbox/
    └── random-thought.md

Types

_types/document.md:

---
name: document
description: General note or document

match:
  path_glob: "**/*.md"

fields:
  title:
    type: string
  tags:
    type: list
    items:
      type: string
    default: []
  related:
    type: list
    items:
      type: link
    default: []
---

# Document

Base type for all notes. Matched by default for any markdown file.

_types/concept.md:

---
name: concept
description: A concept or topic being studied
extends: document

match:
  path_glob: "concepts/**/*.md"

fields:
  aliases:
    type: list
    items:
      type: string
    default: []
    description: Alternative names for this concept
  
  status:
    type: enum
    values: [stub, developing, mature]
    default: stub
  
  sources:
    type: list
    items:
      type: link
      target: source
    default: []
---

# Concept

Represents a topic or idea being studied. Evolves from stub to mature.

_types/daily.md:

---
name: daily
description: Daily journal entry
extends: document

match:
  path_glob: "daily/**/*.md"

filename_pattern: "{date}.md"

fields:
  date:
    type: date
    required: true
  
  mood:
    type: enum
    values: [great, good, okay, rough, bad]
  
  highlights:
    type: list
    items:
      type: string
    default: []
---

# Daily Note

Journal entries organized by date.

A.5 CLI Workflow Example

# Initialize a new collection
mkdir my-project && cd my-project
mdbase init

# Create a type
mdbase type create task

# Create a task
mdbase create task \
  --field title="Build the thing" \
  --field priority=4

# List all tasks
mdbase query --type task

# Find overdue tasks
mdbase query --type task --where 'due_date < today() && status != "done"'

# Update a task
mdbase update tasks/01ABC.md --field status=done

# Rename with reference updates
mdbase rename tasks/old-name.md tasks/new-name.md

# Validate the collection
mdbase validate

# Rebuild cache
mdbase cache rebuild

Appendix B: Expression Grammar

This appendix provides a formal grammar for the expression language used in filters, match conditions, and formulas.


B.1 Grammar Notation

This grammar uses Extended Backus-Naur Form (EBNF):

  • = defines a production rule
  • | denotes alternatives
  • [ ] denotes optional elements
  • { } denotes zero or more repetitions
  • " " denotes literal strings
  • ( ) groups elements
  • (* *) encloses comments; /* */ encloses an informal description of a terminal

B.2 Complete Grammar

(* Top-level *)
expression = null_coalescing_expression ;

(* Null coalescing - low precedence *)
null_coalescing_expression = or_expression { "??" or_expression } ;

(* Logical operators *)
or_expression = and_expression { "||" and_expression } ;

and_expression = not_expression { "&&" not_expression } ;

not_expression = "!" not_expression
               | comparison_expression ;

(* Comparison operators *)
comparison_expression = additive_expression [ comparison_op additive_expression ] ;

comparison_op = "==" | "!=" | "<" | ">" | "<=" | ">=" ;

(* Arithmetic operators *)
additive_expression = multiplicative_expression { ( "+" | "-" ) multiplicative_expression } ;

multiplicative_expression = unary_expression { ( "*" | "/" | "%" ) unary_expression } ;

unary_expression = "-" unary_expression
                 | postfix_expression ;

(* Property access and function calls *)
postfix_expression = primary_expression { postfix_op } ;

postfix_op = "." identifier [ call_arguments ]   (* Method or property *)
           | "[" expression "]"                   (* Index access *)
           | call_arguments ;                     (* Function call *)

call_arguments = "(" [ argument_list ] ")" ;

argument_list = expression { "," expression } ;

(* Primary expressions *)
primary_expression = literal
                   | identifier
                   | "(" expression ")"
                   | if_expression
                   | list_literal ;

(* If expression *)
if_expression = "if" "(" expression "," expression "," expression ")" ;

(* Literals *)
literal = string_literal
        | number_literal
        | boolean_literal
        | null_literal ;

string_literal = '"' { string_char } '"'
               | "'" { string_char } "'" ;

string_char = /* any character except quote or backslash */
            | escape_sequence ;

escape_sequence = "\\" ( '"' | "'" | "\\" | "n" | "r" | "t" ) ;

number_literal = integer_literal [ exponent ]
               | float_literal ;

integer_literal = [ "-" ] digit { digit } ;

float_literal = [ "-" ] digit { digit } "." digit { digit } [ exponent ] ;

exponent = ( "e" | "E" ) [ "+" | "-" ] digit { digit } ;

boolean_literal = "true" | "false" ;

null_literal = "null" ;

list_literal = "[" [ expression { "," expression } ] "]" ;

(* Identifiers *)
identifier = ( letter | "_" ) { letter | digit | "_" } ;

letter = "a" | "b" | ... | "z" | "A" | "B" | ... | "Z" ;

digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;
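
A recursive-descent evaluator maps naturally onto this grammar, with one method per production. The sketch below covers only a subset (logical, comparison, and arithmetic operators, literals, bare identifiers, and parentheses); `??`, postfix operations, `if`, list literals, string escapes, and single-quoted strings are omitted. All names are hypothetical, and the truthiness and eager-evaluation behaviour are assumptions of the sketch.

```python
import operator
import re

TOKEN_RE = re.compile(r"""\s*(?:
      (?P<number>\d+(?:\.\d+)?)
    | "(?P<string>[^"\\]*)"
    | (?P<ident>[A-Za-z_][A-Za-z0-9_]*)
    | (?P<op>\|\||&&|==|!=|<=|>=|[<>!+\-*/%()])
)""", re.VERBOSE)
COMPARE = {"==": operator.eq, "!=": operator.ne, "<": operator.lt,
           ">": operator.gt, "<=": operator.le, ">=": operator.ge}
ARITH = {"*": operator.mul, "/": operator.truediv, "%": operator.mod}
KEYWORDS = {"true": True, "false": False, "null": None}


def tokenize(text):
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append((match.lastgroup, match.group(match.lastgroup)))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, tokens, env):
        self.tokens, self.pos, self.env = tokens, 0, env

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else (None, None)

    def accept(self, *ops):
        kind, value = self.peek()
        if kind == "op" and value in ops:
            self.pos += 1
            return value
        return None

    def or_expr(self):                      # or_expression
        left = self.and_expr()
        while self.accept("||"):
            right = self.and_expr()
            left = bool(left) or bool(right)
        return left

    def and_expr(self):                     # and_expression
        left = self.not_expr()
        while self.accept("&&"):
            right = self.not_expr()
            left = bool(left) and bool(right)
        return left

    def not_expr(self):                     # not_expression
        if self.accept("!"):
            return not self.not_expr()
        return self.comparison()

    def comparison(self):                   # comparison_expression
        left = self.additive()
        op = self.accept(*COMPARE)
        return COMPARE[op](left, self.additive()) if op else left

    def additive(self):                     # additive_expression
        left = self.multiplicative()
        while (op := self.accept("+", "-")):
            right = self.multiplicative()
            left = left + right if op == "+" else left - right
        return left

    def multiplicative(self):               # multiplicative_expression
        left = self.unary()
        while (op := self.accept("*", "/", "%")):
            left = ARITH[op](left, self.unary())
        return left

    def unary(self):                        # unary_expression
        return -self.unary() if self.accept("-") else self.primary()

    def primary(self):                      # primary_expression (subset)
        kind, value = self.peek()
        self.pos += 1
        if kind == "number":
            return float(value) if "." in value else int(value)
        if kind == "string":
            return value
        if kind == "ident":
            return KEYWORDS[value] if value in KEYWORDS else self.env.get(value)
        if (kind, value) == ("op", "("):
            inner = self.or_expr()
            if not self.accept(")"):
                raise SyntaxError("expected ')'")
            return inner
        raise SyntaxError(f"unexpected token {value!r}")


def evaluate(text, env):
    parser = Parser(tokenize(text), env)
    value = parser.or_expr()
    if parser.pos != len(parser.tokens):
        raise SyntaxError("unexpected trailing input")
    return value
```

Because `not_expression` sits above `comparison_expression` in the grammar, this sketch parses `!a == b` as `!(a == b)`.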

B.3 Operator Precedence

From highest to lowest precedence:

| Level | Operators | Associativity | Description |
|-------|-----------|---------------|-------------|
| 1 | ( ) | | Grouping |
| 2 | . [] () | Left-to-right | Property access, index, call |
| 3 | ! - (unary) | Right-to-left | Logical NOT, negation |
| 4 | * / % | Left-to-right | Multiplication, division, modulo |
| 5 | + - | Left-to-right | Addition, subtraction |
| 6 | < <= > >= | Left-to-right | Comparison |
| 7 | == != | Left-to-right | Equality |
| 8 | && | Left-to-right | Logical AND |
| 9 | \|\| | Left-to-right | Logical OR |
| 10 | ?? | Left-to-right | Null coalescing |

B.4 Reserved Words

The following identifiers are reserved:

true
false
null
if
note
file
formula
this

These cannot be used as bare field names without bracket notation (e.g., use note["file"] to access a frontmatter field named file).

The keywords note, file, formula, and this serve as namespace prefixes for property access (see Querying §10.5). note. accesses raw persisted frontmatter; bare field names access effective frontmatter. When a frontmatter field name collides with a namespace keyword, use the note. prefix with bracket notation: note["file"], note["formula"].

`file` Namespace Properties

The file namespace provides access to file metadata. The following properties are valid under file.:

file.name       file.basename   file.path       file.folder
file.ext        file.size       file.ctime      file.mtime
file.body       file.links      file.backlinks  file.tags
file.properties file.embeds

file.body is a string containing the raw markdown body content (everything after the frontmatter closing ---). It supports all string methods defined in §11.5.

Chained Method Calls

The grammar supports chained method calls through the recursive postfix_op production. This explicitly includes chained .asFile() for multi-hop link traversal:

assignee.asFile().manager.asFile().name

This is parsed as a sequence of postfix operations:

postfix_expression
├── identifier: "assignee"
├── method_call: "asFile" ()
├── property_access: "manager"
├── method_call: "asFile" ()
└── property_access: "name"

Implementations MUST enforce a maximum traversal depth (default: 10) to prevent unbounded chains. See §8.7 for traversal rules.
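
A depth-limited traversal might look like the following sketch. The in-memory `files` mapping, the simplified wikilink resolution (stripping the brackets and looking up the name), and the function and exception names are all assumptions; only the default limit of 10 and the `expression_depth_exceeded` code come from this specification.

```python
MAX_TRAVERSAL_DEPTH = 10  # default limit stated above


class ExpressionDepthExceeded(Exception):
    code = "expression_depth_exceeded"


def traverse(files, start, hops, max_depth=MAX_TRAVERSAL_DEPTH):
    """Follow link fields hop by hop, e.g. hops=["assignee", "manager"].

    `files` maps a simple file name to its frontmatter (hypothetical store).
    Returns the frontmatter of the final target, or None if a link is
    missing or unresolved along the way.
    """
    if len(hops) > max_depth:
        raise ExpressionDepthExceeded(
            f"chained asFile() calls exceed {max_depth}-hop limit"
        )
    current = files[start]
    for field in hops:
        link = current.get(field)           # e.g. "[[alice]]"
        if link is None:
            return None
        current = files.get(link.strip("[]"))
        if current is None:
            return None
    return current


# Illustrative data; "carol" and the manager field are made up for the example.
files = {
    "feature-login": {"assignee": "[[alice]]"},
    "alice": {"name": "Alice Chen", "manager": "[[carol]]"},
    "carol": {"name": "Carol"},
}
manager = traverse(files, "feature-login", ["assignee", "manager"])
```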


B.5 Whitespace and Comments

Whitespace (spaces, tabs, newlines) is ignored except within string literals.

Comments are not supported in the expression language. (Use YAML comments in query files instead.)


B.6 String Escaping

Within string literals:

| Escape | Meaning |
|--------|---------|
| \\ | Backslash |
| \" | Double quote |
| \' | Single quote |
| \n | Newline |
| \r | Carriage return |
| \t | Tab |
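
Decoding these escapes takes a single pass over the body of the string literal. The sketch below is one possible approach; the function name is hypothetical, and rejecting any escape not listed in the table is an assumption based on the `escape_sequence` production.

```python
ESCAPES = {"\\": "\\", '"': '"', "'": "'", "n": "\n", "r": "\r", "t": "\t"}


def unescape(body: str) -> str:
    """Decode the escape sequences in a string literal body (quotes removed)."""
    out, i = [], 0
    while i < len(body):
        ch = body[i]
        if ch == "\\":
            if i + 1 >= len(body) or body[i + 1] not in ESCAPES:
                raise ValueError(f"invalid escape at position {i}")
            out.append(ESCAPES[body[i + 1]])
            i += 2          # consume the backslash and the escaped character
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```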

B.7 Duration Literals

Duration literals are strings with a special format used in date arithmetic:

duration_literal = string_literal ;  (* Must match duration pattern *)

duration_pattern = number [ " " ] duration_unit ;  (* optional space, as in "2 weeks" *)

duration_unit = "y" | "year" | "years"
              | "M" | "month" | "months"
              | "w" | "week" | "weeks"
              | "d" | "day" | "days"
              | "h" | "hour" | "hours"
              | "m" | "minute" | "minutes"
              | "s" | "second" | "seconds" ;

Examples: "7d", "2 weeks", "1h", "30m"

Note: Durations are regular strings; the arithmetic operators recognize them contextually.
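
Since durations are ordinary strings, an evaluator has to recognise them when an arithmetic operand is a string. A minimal sketch of that recognition follows; the function names are hypothetical, unit matching is assumed to be case-sensitive (`M` is months, `m` is minutes), and only fixed-length units are added to dates here because months and years need calendar-aware arithmetic.

```python
import re
from datetime import date, timedelta

UNITS = {
    "y": "years", "year": "years", "years": "years",
    "M": "months", "month": "months", "months": "months",
    "w": "weeks", "week": "weeks", "weeks": "weeks",
    "d": "days", "day": "days", "days": "days",
    "h": "hours", "hour": "hours", "hours": "hours",
    "m": "minutes", "minute": "minutes", "minutes": "minutes",
    "s": "seconds", "second": "seconds", "seconds": "seconds",
}
DURATION_RE = re.compile(r"(\d+(?:\.\d+)?)\s*([A-Za-z]+)")
FIXED_LENGTH = {"weeks", "days", "hours", "minutes", "seconds"}


def parse_duration(text: str):
    """Return (amount, canonical_unit), or None if text is not a duration."""
    match = DURATION_RE.fullmatch(text.strip())
    if match is None or match.group(2) not in UNITS:
        return None
    return float(match.group(1)), UNITS[match.group(2)]


def add_duration(day: date, text: str) -> date:
    """Evaluate `day + "<duration>"` for fixed-length units only."""
    parsed = parse_duration(text)
    if parsed is None or parsed[1] not in FIXED_LENGTH:
        raise ValueError(f"unsupported duration: {text!r}")
    amount, unit = parsed
    return day + timedelta(**{unit: amount})
```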


B.8 Parse Examples

Simple Comparison

status == "open"

Parse tree:

comparison_expression
├── additive_expression
│   └── primary_expression
│       └── identifier: "status"
├── comparison_op: "=="
└── additive_expression
    └── primary_expression
        └── string_literal: "open"

Combined Logic

priority >= 3 && status != "done"

Parse tree:

and_expression
├── comparison_expression
│   ├── identifier: "priority"
│   ├── ">="
│   └── number_literal: 3
└── comparison_expression
    ├── identifier: "status"
    ├── "!="
    └── string_literal: "done"

Method Chain (Arrow Extension Only)

tags.filter(t => t.startsWith("bug")).length > 0

Parse tree (only valid if the optional arrow-function extension is enabled):

comparison_expression
├── postfix_expression
│   ├── postfix_expression
│   │   ├── identifier: "tags"
│   │   └── method_call: "filter"
│   │       └── lambda_expression
│   │           ├── parameter: "t"
│   │           └── method_call
│   │               ├── identifier: "t"
│   │               └── method: "startsWith"
│   │                   └── string_literal: "bug"
│   └── property_access: "length"
├── ">"
└── number_literal: 0

Conditional

if(priority > 3, "high", "normal")

Parse tree:

if_expression
├── condition
│   └── comparison_expression
│       ├── identifier: "priority"
│       ├── ">"
│       └── number_literal: 3
├── then_value
│   └── string_literal: "high"
└── else_value
    └── string_literal: "normal"

B.9 Implementation Notes

Tokenization

Recommended token types:

STRING        : '"' ... '"' | "'" ... "'"
NUMBER        : [0-9]+ ('.' [0-9]+)? ([eE] [+-]? [0-9]+)?
IDENTIFIER    : [a-zA-Z_][a-zA-Z0-9_]*
BOOLEAN       : 'true' | 'false'
NULL          : 'null'
IF            : 'if'
OPERATOR      : '==' | '!=' | '<=' | '>=' | '<' | '>' | '&&' | '||' | '!' | '+' | '-' | '*' | '/' | '%' | '??' | '=>'
               (* include '=>' only if the optional arrow-function extension is enabled *)
PUNCTUATION   : '(' | ')' | '[' | ']' | '.' | ','

Error Recovery

When parsing fails, implementations SHOULD:

  1. Report the position of the error
  2. Provide context (surrounding tokens)
  3. Suggest likely fixes for common errors

Example error message:

Expression parse error at position 20:
  status == "open" && 
                      ^
  Expected: expression
  Found: end of input
  
  Hint: Expression is incomplete after '&&'
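
A message in this shape can be produced by a small formatter such as the one sketched here. The function name and parameters are hypothetical, and `position` is assumed to be a zero-based character offset into the expression.

```python
def format_parse_error(expression, position, expected, found, hint=None):
    """Build a multi-line parse error with a caret under the failing offset."""
    lines = [
        f"Expression parse error at position {position}:",
        f"  {expression}",
        "  " + " " * position + "^",   # caret aligned with the offset
        f"  Expected: {expected}",
        f"  Found: {found}",
    ]
    if hint:
        lines += ["", f"  Hint: {hint}"]
    return "\n".join(lines)


message = format_parse_error(
    'status == "open" && ',
    position=20,
    expected="expression",
    found="end of input",
    hint="Expression is incomplete after '&&'",
)
```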

B.10 Optional Arrow-Function Extension

Implementations MAY support arrow-function syntax in list methods. If supported, the following grammar is added for lambda expressions used within argument lists:

lambda_expression = identifier "=>" expression
                 | "(" [ parameter_list ] ")" "=>" expression ;

parameter_list = identifier { "," identifier } ;

Appendix C: Error Codes

This appendix defines standard error codes for validation issues and operation errors.


C.1 Validation Error Codes

Field Errors

| Code | Description | Example |
|------|-------------|---------|
| missing_required | Required field is absent or null | Field title is required |
| type_mismatch | Value doesn't match declared type | Expected integer, got string |
| constraint_violation | Value violates min/max/pattern/etc. | Value 7 exceeds max of 5 |
| invalid_enum | Value not in enum options | "pending" not in [open, done] |
| unknown_field | Field not in schema (strict mode) | Unknown field "custom" |
| deprecated_field | Field is marked deprecated | Field "old_name" is deprecated |
| duplicate_id | id_field value is not unique in collection | id: task-001 appears in multiple files |
| duplicate_value | Field value violates cross-file unique constraint | slug: "my-post" appears in multiple files |

List Errors

| Code | Description | Example |
|------|-------------|---------|
| list_too_short | List has fewer than min_items | Minimum 1 item required |
| list_too_long | List has more than max_items | Maximum 10 items allowed |
| list_duplicate | Duplicate in list with unique=true | Duplicate value "a" |
| list_item_invalid | List item fails validation | Item [2] type mismatch |

String Errors

| Code | Description | Example |
|------|-------------|---------|
| string_too_short | String shorter than min_length | Minimum 1 character required |
| string_too_long | String longer than max_length | Maximum 200 characters allowed |
| pattern_mismatch | String doesn't match pattern | Must match "^[A-Z].*" |

Number Errors

| Code | Description | Example |
|------|-------------|---------|
| number_too_small | Number below min | Value -1 below min of 0 |
| number_too_large | Number above max | Value 100 above max of 10 |
| not_integer | Expected integer, got float | 3.5 is not an integer |

Link Errors

| Code | Description | Example |
|------|-------------|---------|
| invalid_link | Link cannot be parsed | Malformed wikilink |
| link_not_found | Link target doesn't exist | Target "[[missing]]" not found |
| link_wrong_type | Target is wrong type | Expected person, found task |
| ambiguous_link | Multiple candidates for simple name link after tiebreakers | "[[note]]" matches notes/note.md and archive/note.md |

Date/Time Errors

| Code | Description | Example |
|------|-------------|---------|
| invalid_date | Cannot parse as date | "tomorrow" is not ISO date |
| invalid_datetime | Cannot parse as datetime | Invalid datetime format |
| invalid_time | Cannot parse as time | Invalid time format |
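
To show how a validator might map constraint failures onto these codes, here is a minimal sketch for string, integer, and enum fields. The function name and return shape are hypothetical. The sketch reports the specific codes (for example `number_too_large`); whether an implementation reports those or the general `constraint_violation` code in a given situation is not settled by this sketch.

```python
def validate_field(name, spec, value):
    """Return a list of (code, message) tuples for a single field."""
    if value is None:
        if spec.get("required"):
            return [("missing_required", f"Field {name} is required")]
        return []

    issues = []
    kind = spec.get("type")
    if kind == "string":
        if not isinstance(value, str):
            return [("type_mismatch", f"Expected string, got {type(value).__name__}")]
        if len(value) < spec.get("min_length", 0):
            issues.append(("string_too_short",
                           f"Minimum {spec['min_length']} character(s) required"))
        if "max_length" in spec and len(value) > spec["max_length"]:
            issues.append(("string_too_long",
                           f"Maximum {spec['max_length']} characters allowed"))
    elif kind == "integer":
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return [("type_mismatch", f"Expected integer, got {type(value).__name__}")]
        if isinstance(value, float) and not value.is_integer():
            return [("not_integer", f"{value} is not an integer")]
        if "min" in spec and value < spec["min"]:
            issues.append(("number_too_small",
                           f"Value {value} below min of {spec['min']}"))
        if "max" in spec and value > spec["max"]:
            issues.append(("number_too_large",
                           f"Value {value} above max of {spec['max']}"))
    elif kind == "enum":
        if value not in spec.get("values", []):
            issues.append(("invalid_enum", f"{value!r} not in {spec.get('values', [])}"))
    return issues
```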

C.2 Type System Errors

| Code | Description | Example |
|------|-------------|---------|
| unknown_type | Type name not defined | Type "taks" not found |
| circular_inheritance | Type inheritance forms cycle | task → base → task |
| missing_parent_type | Parent type doesn't exist | Parent "base" not found |
| type_conflict | Multi-type field incompatibility | "status" defined as string and enum |
| invalid_type_definition | Type file has invalid schema | Missing required "name" field |
| circular_computed | Circular dependency between computed fields | full_name depends on display depends on full_name |

C.3 Operation Errors

File Operations

| Code | Description | Example |
|------|-------------|---------|
| file_not_found | File doesn't exist | tasks/missing.md not found |
| path_conflict | File already exists at target path (on create or rename) | tasks/task.md already exists |
| path_required | Cannot determine file path | No path provided or derivable |
| invalid_path | Path is malformed | Path contains invalid characters |
| invalid_frontmatter | Frontmatter YAML cannot be parsed | YAML syntax error in frontmatter |
| validation_failed | Frontmatter fails validation against type schema(s) | Contains individual validation issues (see §C.1) |
| permission_denied | Filesystem permission error | Cannot write to file |
| concurrent_modification | File was modified by another process during operation | File mtime changed between read and write |
| path_traversal | Link resolution attempted to escape collection root | [[../../../etc/passwd]] escapes root |
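
The `path_traversal` check can be illustrated with a purely lexical sketch: normalise the resolved path and confirm it still lies under the collection root. The function and exception names are hypothetical, paths are assumed to use `/` separators, and symlinks are not considered.

```python
import posixpath


class PathTraversalError(Exception):
    code = "path_traversal"


def resolve_link(root: str, source_dir: str, target: str) -> str:
    """Resolve `target` relative to `source_dir` inside the collection `root`.

    Raises PathTraversalError if the normalised result escapes the root.
    """
    root_norm = posixpath.normpath(root)
    resolved = posixpath.normpath(posixpath.join(root_norm, source_dir, target))
    inside = resolved == root_norm or resolved.startswith(root_norm + "/")
    if not inside:
        raise PathTraversalError(f"[[{target}]] escapes root")
    return resolved
```

For example, with a hypothetical root of `/vault`, the target `../people/alice.md` linked from `tasks/` resolves to `/vault/people/alice.md`, while `../../../etc/passwd` is rejected.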

Rename Operations

| Code | Description | Example |
|------|-------------|---------|
| rename_ref_update_failed | Reference update failed for one or more files | Could not update links in X |

Configuration Errors

| Code | Description | Example |
|------|-------------|---------|
| invalid_config | Config file malformed | YAML parse error |
| missing_config | No mdbase.yaml found | Not a collection |
| unsupported_version | spec_version not supported | Version 2.0 not supported |

C.4 Expression Errors

| Code | Description | Example |
|------|-------------|---------|
| invalid_expression | Expression syntax error | Unexpected token |
| unknown_function | Function doesn't exist | Unknown function "foo" |
| wrong_argument_count | Wrong number of arguments | if() requires 3 arguments |
| type_error | Type error in expression | Cannot add string and number |
| expression_depth_exceeded | Expression traversal exceeded maximum depth | Chained asFile() calls exceed 10-hop limit |

C.5 Formula Errors

| Code | Description | Example |
|------|-------------|---------|
| circular_formula | Formula references form cycle | a refs b refs a |
| invalid_formula | Formula expression invalid | Parse error in formula |
| formula_evaluation_error | Runtime error in formula | Division by zero |

C.6 Error Response Format

Errors SHOULD be returned in a consistent format:

Single Error

{
  "error": {
    "code": "file_not_found",
    "message": "File 'tasks/missing.md' not found",
    "path": "tasks/missing.md"
  }
}

Validation Errors

{
  "valid": false,
  "errors": [
    {
      "path": "tasks/task-001.md",
      "field": "priority",
      "code": "constraint_violation",
      "message": "Value 7 exceeds maximum of 5",
      "severity": "error",
      "expected": { "max": 5 },
      "actual": 7,
      "type": "task",
      "line": 5,
      "column": 11,
      "end_line": 5,
      "end_column": 12
    },
    {
      "path": "tasks/task-001.md",
      "field": "custom_field",
      "code": "unknown_field",
      "message": "Field 'custom_field' is not defined in type 'task'",
      "severity": "warning",
      "type": "task",
      "line": 8,
      "column": 1,
      "end_line": 8,
      "end_column": 27
    }
  ],
  "warnings": 1,
  "errorCount": 1
}

C.7 Error Severity

| Severity | Description | Effect |
|----------|-------------|--------|
| error | Definite problem | Fails validation at error level |
| warning | Potential problem | Reported but doesn't fail |
| info | Informational | Logged, no effect |

C.8 Human-Readable Messages

Error messages SHOULD be:

  1. Clear: State what went wrong
  2. Specific: Include relevant values
  3. Actionable: Suggest how to fix

Good example:

Field 'priority' has value 7, but maximum allowed is 5.
Change the value to 5 or less.

Bad example:

Constraint violation on priority.

C.9 Exit Codes (CLI)

For CLI implementations:

| Code | Description |
|------|-------------|
| 0 | Success |
| 1 | General error |
| 2 | Validation error(s) |
| 3 | Configuration error |
| 4 | File not found |
| 5 | Permission denied |
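
A CLI could derive its exit status from the error code of a failed operation. The mapping below is an assumption inferred from the names of the codes in §C.3 and the descriptions in the table above; it is not a normative assignment, and the identifiers are hypothetical.

```python
from typing import Optional

(EXIT_SUCCESS, EXIT_GENERAL, EXIT_VALIDATION,
 EXIT_CONFIG, EXIT_NOT_FOUND, EXIT_PERMISSION) = range(6)

# Assumed mapping from error codes to exit codes.
EXIT_BY_ERROR_CODE = {
    "validation_failed": EXIT_VALIDATION,
    "invalid_config": EXIT_CONFIG,
    "missing_config": EXIT_CONFIG,
    "unsupported_version": EXIT_CONFIG,
    "file_not_found": EXIT_NOT_FOUND,
    "permission_denied": EXIT_PERMISSION,
}


def exit_code_for(error_code: Optional[str]) -> int:
    """Return the process exit code for an error code (None means success)."""
    if error_code is None:
        return EXIT_SUCCESS
    # Anything without a dedicated exit code falls back to "general error".
    return EXIT_BY_ERROR_CODE.get(error_code, EXIT_GENERAL)
```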

Appendix D: Compatibility Notes

This appendix describes compatibility with existing tools and migration paths from other systems.


D.1 Obsidian Bases Compatibility

This specification was designed with Obsidian Bases compatibility as a goal. Many expression and query patterns are directly compatible.

Compatible Features

| Feature | This Spec | Obsidian Bases |
|---------|-----------|----------------|
| Property access | status, file.name | Same |
| Comparison | ==, !=, <, >, <=, >= | Same |
| Boolean logic | &&, \|\|, ! | Same |
| Date functions | now(), today() | Same |
| Date arithmetic | date + "7d" | Same |
| String methods | .contains(), .startsWith() | Same |
| List methods | .contains(), .length | Same |
| Link traversal | link.asFile() | Same |
| File metadata | file.mtime, file.path, file.tags | Same |
| Context | this.file, this.property | Same |
| Logical structure | and:, or:, not: in YAML | Same |
| Type checking | .isType("string") | Same |
| Type conversion | number(), list(), .toString() | Same |
| List methods | .unique(), .reduce(), .reverse() | Same |
| String methods | .lower(), .upper(), .split() | Same |
| Summaries | values keyword, default functions | Same |
| Grouping | groupBy (single property) | Same |

Extended Features

This specification adds features not in Obsidian Bases:

  • Type definitions as markdown files (version-controlled schemas)
  • Multi-type matching with constraint merging
  • Formal validation with error codes and levels
  • Rename with reference updates
  • CRUD operations specification
  • Generated fields (ULID, UUID, timestamps)
  • Filename patterns with slug generation
  • Match rules for automatic type assignment
  • Nested collection detection
  • Security considerations (ReDoS, resource limits)

Differences

| Aspect | This Spec | Obsidian Bases |
|--------|-----------|----------------|
| Type storage | Markdown files in types folder | Obsidian internal |
| Configuration | mdbase.yaml | Obsidian settings |
| Views | Not specified (query only) | Table, Board, Gallery, etc. |
| Grouping | groupBy clause (single property, per §10.7) | Built-in groupBy |
| Summaries | property_summaries and custom summaries (per §10.7, §11.14) | Built-in summary functions |
| Lambda style | Implicit variables (value, index, acc); arrow syntax optional | Implicit variables |
| Method names | .lower(), .upper(), .title() | Same |

Optional Compatibility Profile (Non-Normative)

Implementations MAY provide an optional "Bases compatibility" profile that mirrors Obsidian Bases query and expression behavior. This is not a required part of conformance. If provided, tools SHOULD document:

  • Which Bases features are supported
  • Any behavioral differences
  • How to enable the profile (if applicable)

Migration from Bases Queries

Most Bases queries work directly:

# Bases query
types: [task]
where: 'status == "open"'
order_by:
  - field: due_date
    direction: asc

# This spec: identical!

D.2 Dataview Compatibility

Dataview is a popular Obsidian plugin with its own query language. Here's how to migrate common patterns.

Query Migration

| Dataview | This Spec |
|----------|-----------|
| FROM "tasks" | folder: "tasks" |
| WHERE status = "open" | where: 'status == "open"' |
| WHERE contains(tags, "urgent") | where: 'tags.contains("urgent")' |
| SORT due_date ASC | order_by: [{field: due_date, direction: asc}] |
| LIMIT 10 | limit: 10 |

Full Example

Dataview:

TABLE title, status, due_date
FROM "tasks"
WHERE status != "done"
SORT due_date ASC
LIMIT 20

This Spec:

query:
  folder: "tasks"
  where: 'status != "done"'
  order_by:
    - field: due_date
      direction: asc
  limit: 20

Unsupported Dataview Features

| Feature | Notes |
|---------|-------|
| Inline fields (field::) | Not supported; use frontmatter |
| TABLE format | Implementations define output format |
| LIST format | Implementations define output format |
| TASK queries | Use where with checkbox fields |
| CALENDAR view | Implementation-specific |
| DataviewJS | Not applicable |

D.3 Hugo/Jekyll Front Matter

Static site generators use frontmatter similarly but with different conventions.

Hugo

Hugo uses specific frontmatter keys:

| Hugo | This Spec |
|------|-----------|
| title | Same (user-defined) |
| date | Same (user-defined) |
| draft | Same (user-defined) |
| weight | Same (user-defined) |
| taxonomies | Use list fields |

Migration: Hugo content mostly works directly. Define types that match your content structure.

Jekyll

Jekyll collections map naturally:

# Jekyll _config.yml
collections:
  posts:
    output: true
  projects:
    output: true

# This spec mdbase.yaml
spec_version: "0.1.0"
settings:
  types_folder: "_types"
# Define post and project types

D.4 Notion Export Compatibility

When exporting from Notion to markdown:

Database Properties → Frontmatter

Notion databases export properties as frontmatter:

---
title: My Page
Status: In Progress
Due Date: 2024-03-15
Tags:
  - important
  - review
---

Type Creation

Create types matching your Notion databases:

_types/notion-task.md:

---
name: notion-task
fields:
  title:
    type: string
  Status:
    type: enum
    values: [Not Started, In Progress, Done]
  "Due Date":
    type: date
  Tags:
    type: list
    items:
      type: string
---

Note: Notion uses spaces in property names. Use bracket notation in queries: note["Due Date"].


D.5 Logseq Compatibility

Logseq uses a block-based structure with page properties.

Page Properties

Logseq page properties map to frontmatter:

title:: My Page
status:: open
tags:: #task #urgent

Becomes:

---
title: My Page
status: open
tags: [task, urgent]
---
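
The conversion shown above is mechanical enough to script. The sketch below turns Logseq `key:: value` page properties into a frontmatter mapping; the function name is hypothetical, and it assumes tags are written as space-separated `#tag` tokens, as in the example.

```python
def logseq_properties_to_frontmatter(text: str) -> dict:
    """Convert Logseq `key:: value` page properties to a frontmatter dict."""
    frontmatter = {}
    for line in text.splitlines():
        key, sep, value = line.partition("::")
        key, value = key.strip(), value.strip()
        if not sep or not key:
            continue                      # not a property line
        if key == "tags":
            # "#task #urgent" -> ["task", "urgent"]
            frontmatter[key] = [tag.lstrip("#") for tag in value.split()]
        else:
            frontmatter[key] = value
    return frontmatter


page = "title:: My Page\nstatus:: open\ntags:: #task #urgent"
converted = logseq_properties_to_frontmatter(page)
```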

Block Properties

Logseq block properties don't have a direct equivalent. Consider:

  • Converting important blocks to separate files
  • Using structured frontmatter objects
  • Using the body for detailed content

D.6 Tana Compatibility

Tana exports use a specific JSON/markdown format. Key mappings:

| Tana | This Spec |
|------|-----------|
| Supertags | Types |
| Fields | Frontmatter fields |
| References | Links |

D.7 Migration Strategies

Incremental Migration

  1. Start untyped: Import files without types
  2. Add types gradually: Create types for one category at a time
  3. Enable validation: Move from off to warn to error
  4. Enforce strictness: Enable strict: true when ready

Automated Migration

# 1. Initialize collection
mdbase init

# 2. Scan existing files and suggest types
mdbase infer-types --output _types/

# 3. Review and adjust generated types
# (manual step)

# 4. Validate and fix issues
mdbase validate --fix --dry-run
mdbase validate --fix

Handling Legacy Fields

For fields that don't fit the new schema:

# Option 1: Type with any field for legacy data
fields:
  legacy:
    type: any
    deprecated: true

# Option 2: Loose strictness during migration
strict: false  # Allow unknown fields

D.8 Tool-Specific Notes

VS Code

  • Extensions can parse mdbase.yaml for IntelliSense
  • Frontmatter validation possible via YAML schemas
  • Query preview via custom webviews

Vim/Neovim

  • YAML syntax highlighting works for frontmatter
  • Custom commands can invoke CLI tools
  • LSP integration possible for validation

Emacs

  • Org-mode users: consider bidirectional sync
  • Markdown-mode with YAML support
  • Custom functions for query execution

D.9 Interoperability Best Practices

  1. Stick to common field types: String, integer, date, list work everywhere
  2. Avoid tool-specific features: Keep frontmatter portable
  3. Use standard date formats: ISO 8601 always
  4. Keep links simple: Wikilinks are most portable
  5. Document your schema: Types are self-documenting
  6. Version your types: Track schema changes in git