davay42/mdld-parse


MD-LD

Markdown-Linked Data: write RDF knowledge graphs as plain Markdown. Parse to quads, generate back, merge documents. Zero dependencies, round-trip safe.

MD-LD is the only RDF format that is both writable by humans and parseable by machines in the same document. Unlike Turtle (write-only), JSON-LD (machine-only), and RDFa (embedded-in-HTML-only), MD-LD annotations flow with natural Markdown prose, making knowledge graphs readable without a renderer.

mdld.js.org

NPM

📚 Documentation Hub

  • 📋 Specification: Formal specification and test suite
  • 📖 Documentation: Complete documentation with guides and references
  • 🎯 Examples: Real-world MD-LD examples and use cases
  • 📚 Grammar: EBNF+TextMate grammar specifications
  • 🧩 Ontologies: W3C and related standard ontologies used in RDF

🎯 What is MD-LD?

MD-LD is not just another RDF syntax. It's a universal semantic writing interface that removes the intermediary between human text and machine-readable graphs.

Traditional systems require:

Human β†’ UI β†’ App Logic β†’ Hidden Database β†’ APIs β†’ Exports

MD-LD enables:

Human text β†’ Graph immediately

Core value: Author and maintain knowledge graphs as plain text with deterministic round-trip safety. No platforms, databases, or proprietary SaaS mediation required.

[ex] <tag:ame@example.com,2026:>

# Alice {=ex:alice .prov:Person label}

[Alice Smith] {ex:fullName}
[alice@example.com] {ex:email}

Generates RDF quads that work with n3.js, rdflib, and any RDF/JS-compatible library.

🚀 Quick Start

pnpm install mdld-parse

import { parse, generate, merge } from 'mdld-parse';

// Parse MDLD to RDF quads
const result = parse({ text: mdldString });
console.log(result.quads); // RDF/JS quads
console.log(result.primary); // Primary metadata (subject, type, label, comment)
console.log(result.statements); // Elevated statements
console.log(result.origin); // Provenance tracking

// Generate MDLD from quads
const { text } = generate({ quads: result.quads });

// Merge multiple documents (CRDT-style)
const merged = merge([doc1, doc2, doc3]);

💡 Why MD-LD?

The Problem with Current Systems

Most software today uses graphs internally but hides them behind UIs:

  • Notion, Slack, Google Docs: Human interfaces over hidden graphs
  • CRMs, task apps, note apps: Proprietary data silos
  • Social networks: Platform-controlled knowledge prisons

Users cannot access the graph directly. Semantics are hidden. Data is locked in products.

The MD-LD Solution

MD-LD removes the intermediary. Writing becomes publishing. Publishing becomes graph construction.

Key benefits:

  • Graph sovereignty: You own text, graph, provenance, execution, history
  • No central platform required: Works offline, in browsers, on servers
  • Universal semantic substrate: Agents can read, reason, write, execute, validate
  • Continuous semantic narrative: Unifies chat, tasks, notes, emails, calendar, files
  • Native time dimension: Every action, statement, correction becomes part of the graph
  • Decentralized authority: RFC 4151 tag: URIs enable self-sovereign identity without central registries
  • Text-native agent memory: An LLM agent memory substrate in plain text. Parse context, write knowledge, merge with other agents, all as Markdown files. No database required.

Real-World Applications

Personal Knowledge Management

[alice] <tag:alice@example.com,2026:>

# Meeting Notes {=alice:meeting-2024-01-15 .alice:Meeting label}

Attendees:
**Alice** {+alice:alice ?alice:attendee label}
**Bob** {+alice:bob ?alice:attendee label}

Action items:
**Review proposal** {+alice:task-1 ?alice:actionItem label}

Developer Documentation

[api] <tag:brian@example.org,2026:app/api/>
# Get User by ID {=api:/users/:id .api:Endpoint label}

Method: [GET] {+api:methods/GET ?api:method}
Path: [/users/:id] {api:path}
Status: [OK] {api:status}

Academic Research

[alice] <tag:alice@example.org,2026:>
# Semantic Web {=alice:research/paper-semantic-markdown .alice:ScholarlyArticle label}
Is part of [semantic research] {+alice:research/semantic !member}

Authored by [Alice Johnson] {+alice:alice-johnson ?alice:author} on [2026-08-12] {alice:datePublished ^^xsd:date}.

Content Management

[blog] <tag:justin@example.org,2026:>
# Understanding MD-LD {=blog:post-mdld .blog:Post label}

[MD-LD] {blog:emphasized} allows you to embed RDF directly in Markdown.

✨ Core Features

  • πŸ”— Prefix folding β€” Build hierarchical namespaces with CURIE-based IRI authoring
  • πŸ“ Subject declarations β€” {=IRI} and {=#fragment} for context setting
  • 🎯 Object IRIs β€” {+IRI} and {+#fragment} for temporary object declarations
  • πŸ”„ Three predicate forms β€” p (Sβ†’L), ?p (Sβ†’O), !p (Oβ†’S)
  • 🏷️ Type declarations β€” .Class for rdf:type triples
  • πŸ“… Datatypes & language β€” ^^xsd:date and @en support
  • 🧩 Fragments β€” Document structuring with {=#fragment}
  • ⚑ Polarity system β€” Sophisticated diff authoring with + and - prefixes
  • πŸ“ Origin tracking β€” Complete provenance with lean quad-to-source mapping
  • πŸ”— Span chains β€” Walkable textual topology between semantic blocks for context recovery and resonance
  • 🎯 Elevated statements β€” Automatic rdf:Statement pattern detection
  • 🏷️ Primary metadata quartet β€” Subject, type, label, comment for document identity
  • πŸ”„ Round-trip safety β€” Deterministic parse ↔ generate cycles

Bundle size: 86KB unminified, 20KB gzipped

📦 Installation

Node.js

pnpm install mdld-parse
node --input-type=module -e "
import { parse } from 'mdld-parse';
console.log(parse({ text: '# Test {=tag:test@example.org,2026:index .prov:Entity label}' }));
"

Browser ESM (importmap)

<script type="importmap">
{
  "imports": {
    "mdld-parse": "https://cdn.jsdelivr.net/npm/mdld-parse/+esm"
  }
}
</script>
<script type="module">
  import { parse } from 'mdld-parse';
  const result = parse({ text: '[ex] <tag:my@example.com,2026:test/>\n\n# Hello {=ex:init .prov:Activity label}' });
</script>

Example use in browser console

You can copy and paste this code into your browser console to see the list of tasks as an easy-to-render JSON array.

const mdld = await import('https://cdn.jsdelivr.net/npm/mdld-parse/+esm')

const text = `[my] <tag:alice@example.org:>

# Tasks {=my:tasks .prov:Collection label}

## Task 1 {=my:tasks/1 .prov:Activity label}
One of my [urgent] {my:tasks/status} [tasks] {+my:tasks !prov:hadMember}
> Explore deeper the concept of a triple in RDF {comment}

## Task 2 {=my:tasks/2 .prov:Activity label}
One of my [tasks] {+my:tasks !prov:hadMember}
> Start building knowledge graphs {comment}
`;

const result = mdld.parse({ text });

function extractByType (quads, type) {
  return Object.values(
    quads.reduce((acc, q) => {
      const s = q.subject.value;
      const key = q.predicate.value.split(/[#/]/).pop();

      (acc[s] ??= { iri: s })[key] = q.object.value;

      return acc;
    }, {})
  )
  .filter(x => x.type === type)
  .map(({ type, ...x }) => x);
}

const tasks = extractByType(result.quads,"http://www.w3.org/ns/prov#Activity")

console.log(tasks);
/*
[
  {
    "iri": "tag:alice@example.org:tasks/1",
    "label": "Task 1",
    "status": "urgent",
    "comment": "Explore deeper the concept of a triple in RDF"
  },
  {
    "iri": "tag:alice@example.org:tasks/2",
    "label": "Task 2",
    "comment": "Start building knowledge graphs"
  }
]
*/

🧠 Semantic Model

MD-LD encodes a directed labeled multigraph where three nodes may be in scope:

  • S β€” current subject (IRI)
  • O β€” object resource (IRI from link/image)
  • L β€” literal value (string + optional datatype/language)

Predicate Routing

Each predicate form determines the graph edge:

| Form | Edge  | Example                   | Meaning          |
|------|-------|---------------------------|------------------|
| p    | S → L | [Alice] {label}           | literal property |
| ?p   | S → O | [NASA] {=ex:nasa ?org}    | object property  |
| !p   | O → S | [Parent] {=ex:p !hasPart} | reverse object   |
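
The routing rules in the table can be modeled in a few lines. This is an illustrative sketch only, not the parser's implementation: routePredicate, the scope object, and the objectKind field are hypothetical names introduced here.

```javascript
// Hypothetical model of the routing table; not part of mdld-parse.
// scope.S = current subject IRI, scope.O = object IRI, scope.L = literal text.
function routePredicate(token, scope) {
  if (token.startsWith('?')) {
    // ?p : subject -> object resource
    return { subject: scope.S, predicate: token.slice(1), object: scope.O, objectKind: 'iri' };
  }
  if (token.startsWith('!')) {
    // !p : object resource -> subject (reverse edge)
    return { subject: scope.O, predicate: token.slice(1), object: scope.S, objectKind: 'iri' };
  }
  // p : subject -> literal value
  return { subject: scope.S, predicate: token, object: scope.L, objectKind: 'literal' };
}

const scope = { S: 'ex:apollo11', O: 'ex:nasa', L: 'NASA' };
console.log(routePredicate('label', scope));         // edge: ex:apollo11 -> "NASA"
console.log(routePredicate('?ex:organizer', scope)); // edge: ex:apollo11 -> ex:nasa
console.log(routePredicate('!ex:hasPart', scope));   // edge: ex:nasa -> ex:apollo11
```

The only difference between ?p and !p is which end of the edge the current subject occupies.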

🎨 Syntax Quick Reference

Subject Declaration

Set current subject (emits no quads):

[ex] <tag:nasa@example.org,2026:>
## Apollo 11 {=ex:apollo11}

Type Declaration

Emit rdf:type triple:

[ex] <tag:nasa@example.org,2026:>
## Apollo 11 {=ex:apollo11 .ex:SpaceMission .prov:Entity}

Literal Properties

Inline value carriers emit literal properties:

[ex] <tag:nasa@example.org,2026:>
# Mission {=ex:apollo11}
[Neil Armstrong] {ex:commander}
[1969] {ex:year ^^xsd:gYear}
[Historic mission] {ex:description @en}
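
For orientation, the three carriers above correspond to the three shapes an RDF/JS Literal can take: plain, datatyped, and language-tagged. The sketch below builds plain objects in that shape. makeLiteral is a hypothetical helper written for this example, and the exact terms the parser emits are an assumption.

```javascript
const XSD = 'http://www.w3.org/2001/XMLSchema#';
const RDF = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#';

// Hypothetical helper: returns a plain object shaped like an RDF/JS Literal.
function makeLiteral(value, { datatype, language } = {}) {
  const namedNode = iri => ({ termType: 'NamedNode', value: iri });
  if (language) {
    // Language-tagged strings carry rdf:langString as their datatype.
    return { termType: 'Literal', value, language, datatype: namedNode(RDF + 'langString') };
  }
  // Without an explicit datatype, a literal defaults to xsd:string.
  return { termType: 'Literal', value, language: '', datatype: namedNode(datatype ?? XSD + 'string') };
}

const commander = makeLiteral('Neil Armstrong');                         // {ex:commander}
const year = makeLiteral('1969', { datatype: XSD + 'gYear' });           // {ex:year ^^xsd:gYear}
const description = makeLiteral('Historic mission', { language: 'en' }); // {ex:description @en}

console.log(year.datatype.value); // http://www.w3.org/2001/XMLSchema#gYear
```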

Object Properties

Links create relationships (use ? prefix):

[ex] <tag:nasa@example.org,2026:>
# Mission {=ex:apollo11}
[NASA] {=ex:nasa ?ex:organizer}

Resource Declaration

Declare resources inline with {+iri}:

[ex] <tag:nasa@example.org,2026:>
# Mission {=ex:apollo11}
[Neil Armstrong] {+ex:armstrong ?ex:commander .Person}

Diff Authoring (Polarity)

Use + and - for retractions:

[ex] <tag:carol@example.org,2026:>

New student [Alice] {=ex:new-student .prov:Person ex:name} is our [class] {+ex:my-class !member}. I think she might know [Bob] {+ex:bob ?ex:knows}.

**Correction:** [Her] {=ex:new-student} name is not [Alice] {-ex:name}, it's [Ellie] {ex:name}.

**Correction:** I asked her directly - no, she doesn't know [him] {+ex:bob -?ex:knows}.

**IRI replacement:** Let's create a proper [Class] {=ex:my-class} record for [Ellie] {+ex:Ellie .prov:Person ex:name label ?member} instead of temporary [Ellie] {+ex:new-student -.prov:Person -ex:name -?member} record created earlier.

After a round trip through generate(parse({ text })), the document would look like this:

[ex] <tag:carol@example.org,2026:>

# Ellie {=ex:Ellie .prov:Person label}
[Ellie] {ex:name}

# my-class {=ex:my-class}
[ex:Ellie] {+ex:Ellie ?member}

🔧 API Reference

parse({ text, context, dataFactory, graph })

Parse MDLD to RDF quads with lean origin tracking.

Parameters:

  • text (string, required): MDLD formatted text
  • context (object, optional): Prefix mappings
  • dataFactory (object, optional): Custom RDF/JS DataFactory
  • graph (string, optional): Named graph IRI

Returns: { quads, remove, statements, origin, context, primarySubject, primary, md }

  • quads: RDF/JS Quads (final resolved graph state)
  • remove: RDF/JS Quads (external retractions for diff workflows)
  • statements: Elevated SPO quads from rdf:Statement patterns
  • origin: Lean origin tracking: quadIndex, blocks, spans, documentStructure
  • context: Final context with prefixes
  • primarySubject: String IRI or null (canonical append identity)
  • primary: Primary metadata quartet: { subject, type, label, comment }
  • md: Clean Markdown with annotations stripped
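
To make the statements field more concrete, the sketch below shows one way reified rdf:Statement nodes could be lifted into plain subject/predicate/object triples. It assumes standard RDF reification (rdf:subject, rdf:predicate, rdf:object). elevateStatements and the sample data are hypothetical, and the parser's actual detection rules may differ.

```javascript
const RDF_NS = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#';

// Hypothetical sketch: lift reified statements out of RDF/JS-shaped quads.
function elevateStatements(quads) {
  const nodes = new Map();
  for (const q of quads) {
    if (!q.predicate.value.startsWith(RDF_NS)) continue;
    const key = q.predicate.value.slice(RDF_NS.length); // 'type', 'subject', ...
    // Ignore rdf:type values other than rdf:Statement.
    if (key === 'type' && q.object.value !== RDF_NS + 'Statement') continue;
    const entry = nodes.get(q.subject.value) ?? {};
    entry[key] = q.object;
    nodes.set(q.subject.value, entry);
  }
  // Keep only nodes typed rdf:Statement that carry all three parts.
  return [...nodes.values()]
    .filter(n => n.type && n.subject && n.predicate && n.object)
    .map(n => ({ subject: n.subject, predicate: n.predicate, object: n.object }));
}

const node = value => ({ termType: 'NamedNode', value });
const reified = [
  { subject: node('ex:claim1'), predicate: node(RDF_NS + 'type'), object: node(RDF_NS + 'Statement') },
  { subject: node('ex:claim1'), predicate: node(RDF_NS + 'subject'), object: node('ex:alice') },
  { subject: node('ex:claim1'), predicate: node(RDF_NS + 'predicate'), object: node('ex:knows') },
  { subject: node('ex:claim1'), predicate: node(RDF_NS + 'object'), object: node('ex:bob') },
];
console.log(elevateStatements(reified).length); // 1
```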

merge(docs, options)

Merge multiple MDLD documents with diff polarity resolution.

Parameters:

  • docs (array): Array of markdown strings or ParseResult objects
  • options (object, optional):
    • context (object): Prefix mappings

Returns: { quads, remove, statements, origin, context, primarySubjects, primary }

  • quads: RDF/JS Quads (final resolved graph state)
  • remove: RDF/JS Quads (external retractions)
  • statements: Elevated statements from all documents
  • origin: Merge origin with document tracking
  • context: Final context with prefixes
  • primarySubjects: Array of string IRIs (canonical identities)
  • primary: Array of primary metadata objects

Use case: CRDT-style state management with append-only documents.
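
One plausible reading of polarity resolution over append-only documents is sketched below. It is not the merge() implementation. resolvePolarity and the { op, s, p, o } change records are hypothetical, and the rules that a retraction cancels an earlier matching assertion, or otherwise lands in remove, are assumptions based on the descriptions of quads and remove above.

```javascript
// Hypothetical sketch of polarity resolution; not the mdld-parse merge() code.
// Each change is { op: '+' | '-', s, p, o }; documents are applied in order.
function resolvePolarity(documents) {
  const asserted = new Map(); // key -> triple currently in the graph
  const external = new Map(); // key -> retraction with nothing to cancel
  const keyOf = ({ s, p, o }) => JSON.stringify([s, p, o]);

  for (const doc of documents) {
    for (const change of doc) {
      const key = keyOf(change);
      const triple = { s: change.s, p: change.p, o: change.o };
      if (change.op === '+') {
        external.delete(key); // assumption: a later assertion overrides a retraction
        asserted.set(key, triple);
      } else if (asserted.has(key)) {
        asserted.delete(key); // cancel a triple asserted earlier
      } else {
        external.set(key, triple); // targets a triple outside these documents
      }
    }
  }
  return { quads: [...asserted.values()], remove: [...external.values()] };
}

const doc1 = [
  { op: '+', s: 'ex:new-student', p: 'ex:name', o: 'Alice' },
  { op: '+', s: 'ex:new-student', p: 'ex:knows', o: 'ex:bob' },
];
const doc2 = [
  { op: '-', s: 'ex:new-student', p: 'ex:name', o: 'Alice' },
  { op: '+', s: 'ex:new-student', p: 'ex:name', o: 'Ellie' },
  { op: '-', s: 'ex:new-student', p: 'ex:knows', o: 'ex:bob' },
  { op: '-', s: 'ex:archived', p: 'ex:name', o: 'Zed' },
];
console.log(resolvePolarity([doc1, doc2]));
```

Here the final graph keeps only the Ellie name triple, while the retraction of ex:archived, which was never asserted in either document, is reported separately.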

generate({ quads, context, primarySubject, compactInline, renderReverse, remove, lang })

Generate deterministic MDLD from RDF quads.

Parameters:

  • quads (array, required): RDF/JS Quads to convert
  • context (object, optional): Prefix mappings
  • primarySubject (string, optional): IRI to place first in output
  • compactInline (boolean, optional): Inline type/label compaction (default: false)
  • renderReverse (boolean, optional): Reverse connections as !p (default: false)
  • remove (array, optional): RDF/JS Quads to retract (for diff generation)
  • lang (string, optional): Preferred language for labels (e.g., 'en', 'es', 'fr'). Priority: specified lang → untagged → English → any language

Returns: { text, context, compactStats }

  • text: Generated MDLD text
  • context: Full context with prefixes
  • compactStats: Compaction metrics

Features: Visual styling, label-in-heading, round-trip safe, diff generation, language preference.

Example with language preference:

const { text } = generate({
  quads: result.quads,
  lang: 'es'  // Prefer Spanish labels
});
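
The priority order given for lang (specified language, then untagged, then English, then any language) can be sketched as a small selection function. pickLabel is a hypothetical helper, not a library export, and it matches language tags exactly, which is a simplification.

```javascript
// Hypothetical sketch of the label priority:
// requested language -> untagged -> English -> any language.
// labels is an array of { value, language } objects ('' means untagged).
function pickLabel(labels, lang) {
  const byLang = tag => labels.find(l => l.language === tag);
  const chosen = (lang ? byLang(lang) : undefined) ?? byLang('') ?? byLang('en') ?? labels[0];
  return chosen?.value;
}

const labels = [
  { value: 'Misión', language: 'es' },
  { value: 'Mission', language: 'en' },
];
console.log(pickLabel(labels, 'es')); // Misión
console.log(pickLabel(labels, 'fr')); // Mission
```

In the second call there is neither a French nor an untagged label, so the English one wins.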

generateNode({ quads, focusIRI, context, compactInline, renderReverse, lang })

Generate node-centric MDLD for a specific IRI.

Parameters:

  • quads (array, required): RDF/JS Quads to search
  • focusIRI (string, required): IRI to center view on
  • context (object, optional): Prefix mappings
  • compactInline (boolean, optional): Inline compaction (default: true)
  • renderReverse (boolean, optional): Reverse connections (default: true)
  • lang (string, optional): Preferred language for labels (e.g., 'en', 'es', 'fr'). Priority: specified lang → untagged → English → any language

Returns: { text, context, compactStats }

Safety: Returns empty text if focusIRI not found (prevents accidental full database rendering).

updateValue({ text, quad, value, origin })

Update carrier text of a literal quad in MDLD text.

Parameters:

  • text (string): Original MDLD text
  • quad (object): Quad to update
  • value (string): New carrier text
  • origin (object, optional): ParseResult.origin

Returns: Updated MDLD text (fail-safe)

Use case: Editor applications updating literal values.
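
Conceptually, updating a carrier comes down to splicing new text into a known range of the source. The sketch below illustrates that idea under stated assumptions: replaceRange is a hypothetical helper, the range is taken to be [start, end) string offsets, and "fail-safe" is interpreted as returning the original text when the range is invalid.

```javascript
// Hypothetical sketch: splice new carrier text into a source string.
// The range is assumed to be [start, end) offsets into the text.
function replaceRange(text, [start, end], value) {
  const valid = Number.isInteger(start) && Number.isInteger(end) &&
    start >= 0 && end <= text.length && start <= end;
  if (!valid) return text; // fail-safe: leave the text untouched
  return text.slice(0, start) + value + text.slice(end);
}

const source = '[Alice] {ex:name}';
const valueRange = [1, 6]; // covers "Alice"
console.log(replaceRange(source, valueRange, 'Ellie')); // [Ellie] {ex:name}
```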

locate(quad, origin)

Locate quad origin entry for UI navigation.

Returns: { blockId, range, valueRange, carrierType, ... } or null

Utility Functions

import {
  DEFAULT_CONTEXT,    // Default prefix mappings
  DataFactory,        // RDF/JS DataFactory
  hash,              // String hashing
  expandIRI,         // IRI expansion
  shortenIRI,        // IRI shortening
  parseSemanticBlock // Semantic block parsing
} from 'mdld-parse';
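
As a rough illustration of what IRI expansion and shortening involve, here is a minimal sketch. expandCurie and shortenIri are hypothetical stand-ins written for this example. The exported expandIRI and shortenIRI may use different signatures and rules.

```javascript
// Hypothetical sketch of CURIE expansion and shortening; not the library code.
function expandCurie(curie, context) {
  const i = curie.indexOf(':');
  if (i === -1) return curie;
  const prefix = curie.slice(0, i);
  // Unknown prefixes (including absolute IRIs such as tag:...) pass through.
  return Object.hasOwn(context, prefix) ? context[prefix] + curie.slice(i + 1) : curie;
}

function shortenIri(iri, context) {
  // Prefer the longest matching namespace so nested prefixes win.
  let best = null;
  for (const [prefix, base] of Object.entries(context)) {
    if (iri.startsWith(base) && (!best || base.length > best.base.length)) {
      best = { prefix, base };
    }
  }
  return best ? `${best.prefix}:${iri.slice(best.base.length)}` : iri;
}

const context = {
  ex: 'tag:nasa@example.org,2026:',
  prov: 'http://www.w3.org/ns/prov#',
};
console.log(expandCurie('ex:apollo11', context)); // tag:nasa@example.org,2026:apollo11
console.log(shortenIri('http://www.w3.org/ns/prov#Entity', context)); // prov:Entity
```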

🏗️ Architecture

Design Principles

  • Zero dependencies: Pure JavaScript, 86KB unminified (20KB gzipped)
  • Streaming-first: Single-pass parsing, O(n) complexity
  • Character-based tokenization: 20-28% faster than regex-based approaches
  • Standards-compliant: RDF/JS data model, W3C CURIE 1.0 syntax
  • Deterministic: Same input always produces same output
  • Explicit semantics: No guessing, inference, or heuristics
  • Dual-layer origin: Every parse emits both a semantic quad graph and a walkable textual topology graph simultaneously

Origin: Blocks and Spans

The parser output includes a complete document chain at no extra cost:

[Block] --(Span)-- [Block] --(Span)-- [Block]
  • Blocks (origin.blocks): semantic anchors, the tokens that produced RDF quads, with prevSpanId/nextSpanId links
  • Spans (origin.spans): textual observations, the raw source ranges between blocks, with bidirectional block and span links

Spans store no text; content is always recovered via sourceText.slice(span.range[0], span.range[1]). This unlocks context-aware UI, autocomplete neighborhood retrieval, and cross-document topology without any parser-level interpretation.
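
The block and span chain can be pictured with a small hand-built example. The data below is hypothetical and only mirrors the fields named above (range, prevSpanId, nextSpanId). How the real origin object keys and stores blocks and spans is an assumption.

```javascript
// Hypothetical data shaped after the description above; not parser output.
const sourceText = 'Intro. [Alice] {ex:name} then some prose. [Bob] {ex:friend} end.';

function rangeOf(fragment) {
  const start = sourceText.indexOf(fragment);
  return [start, start + fragment.length];
}

const blocks = {
  b1: { range: rangeOf('[Alice] {ex:name}'), prevSpanId: 's0', nextSpanId: 's1' },
  b2: { range: rangeOf('[Bob] {ex:friend}'), prevSpanId: 's1', nextSpanId: 's2' },
};
const spans = {
  s0: { range: [0, blocks.b1.range[0]] },
  s1: { range: [blocks.b1.range[1], blocks.b2.range[0]] },
  s2: { range: [blocks.b2.range[1], sourceText.length] },
};

// Spans hold only ranges, so their text is recovered by slicing the source.
function spanText(spanId) {
  const span = spans[spanId];
  return span ? sourceText.slice(span.range[0], span.range[1]) : '';
}

// Neighborhood of a block: the prose immediately before and after it.
function contextOf(blockId) {
  const block = blocks[blockId];
  return { before: spanText(block.prevSpanId), after: spanText(block.nextSpanId) };
}

console.log(contextOf('b2')); // { before: ' then some prose. ', after: ' end.' }
```

Because spans carry ranges rather than copies of the text, the chain stays small and always reflects the current source string.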

Performance Characteristics

  • Real-time (60fps): Up to 4,527 quads per frame
  • Batch processing: Up to 225,059 quads per second
  • Memory efficient: ~640 bytes per quad retained after GC
  • Streaming-friendly: Full document never in memory

RDF/JS Compatibility

Quads work with n3.js, rdflib, and any RDF/JS-compatible library.

Standards Compliance

  • RDF 1.1: Core RDF concepts
  • RDFS: Schema vocabulary
  • PROV-O: Provenance ontology
  • SHACL: Constraint validation
  • W3C CURIE 1.0: Compact URI syntax

🧪 Testing

pnpm test

Comprehensive test suite covering:

  • Syntax parsing and tokenization
  • Context management and prefix folding
  • Polarity system and retractions
  • Elevated statements detection
  • Primary metadata extraction
  • Round-trip parse/generate cycles
  • Origin tracking and provenance

Governance

MD-LD is a craft project. Its coherence comes from a single evolving understanding of how semantic text should work: not from consensus, but from sustained attention to the same problem over time.

This means:

  • Decisions are made by the steward, informed by discussion and use
  • The project prioritizes conceptual integrity over inclusiveness
  • Contributions that align with the model are welcomed and incorporated
  • Contributions that expand scope without deepening coherence are respectfully declined
  • The spec will not grow features to attract users; it will grow depth to serve understanding

Licensing

MD-LD is currently published as copyrighted source material.

The project is under active development and no open-source license has been selected yet.

Individuals, researchers, educators, and non-commercial users are welcome to experiment with the technology.

Organizations interested in production or commercial use should contact the author.

The long-term governance and licensing model remains under evaluation.

The primary goal at this stage is preserving the simplicity, interoperability, and long-term integrity of the system while the ecosystem forms around it.
