Class reference

Codecs & canonical content

Codecs

A Document is Kinogaki Core's one native shape: a tree of path-addressed Elements holding typed Values, wired by connections. The native .prisma text and .prism binary formats serialize that shape directly. Every other format is handled by a codec: a bidirectional translator between foreign bytes and a Document. The codec layer is how a Markdown note, an HTML page, a JSON tree, or an SVG drawing becomes a Document you can read, query, edit, and emit in any format that can represent it.

This is also the layer the rest of Kinogaki renders through. The issue tracker and these docs are authored in Markdown, normalized into the document model, and emitted as the HTML page you are reading. The pipeline is decode → Document → encode, and the Document in the middle is plain, diff-clean, and inspectable.

Orienting example

Read a Markdown file, look at it as a Document, write it back out as HTML:

#include "kinogaki/Codecs.h"
using namespace kinogaki;

Document doc;
doc.load("notes.md");                  // Markdown → Document (Codec::Auto picks it from .md)
std::string prisma = doc.toString();   // the document model, as .prisma ASCII
doc.save("notes.html", Codec::Html);   // emit the same model as HTML

Markdown and HTML target the same document model, so loading through one codec and saving through another is a real conversion with no per-pair adapter. There is one model and a codec on each side.

The Codec selector

Codec is the one value you pass to every read or write. It names the format. Auto resolves from a file's extension on load/save, or sniffs the native bytes on decode.

enum class Codec { Auto, Prism, PrismBinary, Json, Markdown, Html, Svg, Text, Blob };

The Python IntEnum uses the same integer values as kinogaki::Codec and the C ABI, so one selector works on both sides of the language boundary.

| Codec | Format | Target model | Notes |
| --- | --- | --- | --- |
| Auto | by extension / sniff | n/a | the default for load and save |
| Prism | .prisma ASCII | native | the default for toString |
| PrismBinary | .prism binary | native | the only codec that honors compress |
| Json | JSON | scene/structured | encode renders any Document |
| Markdown | Markdown | document | CommonMark common-core subset |
| Html | HTML | document | superset of the Markdown subset |
| Svg | SVG | vector | structural element tree, not a blob |
| Text | plain text | document (text) | lossless lines, byte-exact round-trip |
| Blob | arbitrary bytes | document (blob) | lossless bytes, byte-exact round-trip |

codecName, codecByName, and codecForPath map between a Codec and its name/extension:

const char*          codecName(Codec codec);              // "markdown", "html", …
std::optional<Codec> codecByName(std::string_view name);  // "markdown"/"md"/"json"/… → Codec
Codec                codecForPath(std::string_view path); // by file extension; Blob fallback
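A minimal, self-contained sketch of the kind of lookup codecForPath performs, in case a tool that does not link Kinogaki needs the same resolution. It is not the library's implementation: the local Codec enum mirrors the selector's member names only, codecForPathSketch is a hypothetical name, and the extension table (including .txt → Text and .htm → Html) and the case-insensitive match are assumptions. Only the final path component's extension is consulted, and anything unrecognised falls back to Blob, as the real function is documented to do.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <string_view>

// Local stand-in mirroring the selector's member names (not its ABI values).
enum class Codec { Auto, Prism, PrismBinary, Json, Markdown, Html, Svg, Text, Blob };

// Hypothetical re-implementation of extension-based resolution with a Blob
// fallback. The table and the case-insensitive match are assumptions.
Codec codecForPathSketch(std::string_view path) {
    // Consult only the last path component, so "v1.2/README" has no extension.
    const std::size_t slash = path.find_last_of("/\\");
    const std::string_view file = (slash == std::string_view::npos) ? path : path.substr(slash + 1);
    const std::size_t dot = file.rfind('.');
    if (dot == std::string_view::npos) return Codec::Blob;

    std::string ext(file.substr(dot + 1));
    std::transform(ext.begin(), ext.end(), ext.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    if (ext == "prisma") return Codec::Prism;
    if (ext == "prism") return Codec::PrismBinary;
    if (ext == "json") return Codec::Json;
    if (ext == "md" || ext == "markdown") return Codec::Markdown;
    if (ext == "html" || ext == "htm") return Codec::Html;
    if (ext == "svg") return Codec::Svg;
    if (ext == "txt") return Codec::Text;
    return Codec::Blob;  // unknown extension: treat the bytes as opaque
}
```

With this table, codecForPathSketch("notes.md") yields Codec::Markdown, and "style.css" falls through to Codec::Blob.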

Encode and decode

The two in-memory primitives convert bytes to and from a Document. Both directions are fallible.

std::optional<Document>    decode(std::string_view bytes, Codec codec, ParseError* err = nullptr);
std::optional<std::string> encode(const Document& doc, Codec codec, bool compress = false);

decode normalizes foreign bytes (or native .prisma/.prism bytes) into a Document. On malformed input it returns nullopt and, if err is non-null, fills it with a diagnosis that locates the problem. Codecs fail closed: they never hand back a half-parsed Document. encode renders a Document as the codec's bytes, and may decline: asking a text codec to write an arbitrary scene returns nullopt rather than garbage. Declining is what lets a tool dispatch blindly over the codec set and still fail cleanly. compress applies to PrismBinary only.

def decode(data: "str | bytes", codec: Codec) -> Document      # raises PrismError if it can't parse
def encode(doc: Document, codec: Codec, *, compress=False) -> "str | bytes"  # raises if it declines

In Python the fallibility surfaces as a raised PrismError instead of nullopt. Text codecs return str; PRISM_BINARY and BLOB return bytes.
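The fail-closed contract is what makes blind dispatch safe. The sketch below models it with hypothetical stand-ins (a two-field Document, a four-member Codec, and a toy encodeSketch), not the Kinogaki types: text-oriented codecs return std::nullopt for a non-prose document, and the caller simply keeps whichever codecs accepted. The rendered strings are placeholders; the point is the std::optional control flow.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-ins, not Kinogaki types: a Document is reduced to a flag
// saying whether it is prose (document-model content) or an arbitrary scene.
enum class Codec { Json, Markdown, Html, Text };
struct Document {
    bool isProse;
    std::string body;
};

// Toy encoder with the contract described above: JSON renders anything, while
// the text-oriented codecs decline a non-prose document instead of emitting
// garbage. The rendered strings are placeholders.
std::optional<std::string> encodeSketch(const Document& doc, Codec codec) {
    switch (codec) {
        case Codec::Json:
            return "json:" + doc.body;
        case Codec::Markdown:
        case Codec::Html:
        case Codec::Text:
            if (!doc.isProse) return std::nullopt;  // decline
            return doc.body;
    }
    return std::nullopt;  // unreachable; keeps compilers quiet
}

// Blind dispatch: try every codec, keep the ones that accepted.
std::vector<Codec> acceptingCodecs(const Document& doc, const std::vector<Codec>& all) {
    std::vector<Codec> accepted;
    for (Codec c : all)
        if (encodeSketch(doc, c)) accepted.push_back(c);
    return accepted;
}
```

For a scene, acceptingCodecs returns only Codec::Json; for prose, all four codecs accept.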

The Document methods

The everyday surface is the methods on a Document — load, loadString, toString, save. They wrap decode/encode with file I/O and an Auto-by-extension default. load/loadString replace the document's contents.

bool        load(const std::string& path, Codec codec = Codec::Auto);  // Auto → codec by extension
bool        loadString(std::string_view bytes, Codec codec);
std::string toString(Codec codec = Codec::Prism) const;  // .prisma ASCII by default; any codec otherwise
bool        save(const std::string& path, Codec codec = Codec::Auto) const;  // Auto → format by extension

load and save return false (Python: False) on a read/parse/write failure or a declined encode. toString raises in Python if the codec declines the document; in C++ a declining codec yields an empty string.
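Because save reports a declined encode and an I/O failure through the same false, a save-style wrapper can be layered on any fallible encode. writeEncoded below is a hypothetical helper, not part of Kinogaki; it writes in binary mode so PrismBinary and Blob bytes pass through unchanged. A related design point: since a declining codec makes C++ toString return an empty string, a caller that must tell a decline apart from a legitimately empty encoding may prefer encode, whose std::nullopt is unambiguous.

```cpp
#include <fstream>
#include <optional>
#include <string>

// Hypothetical helper (not Kinogaki API): persist the result of a fallible
// encode. A declined encode and an I/O failure both come back as false,
// which is the same signal save() gives.
bool writeEncoded(const std::string& path, const std::optional<std::string>& bytes) {
    if (!bytes) return false;  // the codec declined: write nothing
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out) return false;    // could not open the file for writing
    out.write(bytes->data(), static_cast<std::streamsize>(bytes->size()));
    out.close();               // flush now so a late write error is observed
    return !out.fail();
}
```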

A worked example: Markdown → Document → HTML

Start with a Markdown note:

# Notes

A paragraph with *emphasis*.

load lands it on the document model, a tree of typed Elements. A document's blocks and inline runs are ordered body content, so they are nameless Elements (/document/[0], [1], …): their order is their identity, and the file reads as the structure itself.

Document doc;
doc.load("notes.md");                  // Markdown → Document
std::string prisma = doc.toString();   // the document model, as .prisma ASCII

The Document, written as .prisma:

#prisma 3.0
def document "document" {
    def heading {
        int32 level = 1
        def text {
            str text = "Notes"
        }
    }
    def paragraph {
        def text {
            str text = "A paragraph with "
        }
        def emphasis {
            def text {
                str text = "emphasis"
            }
        }
        def text {
            str text = "."
        }
    }
}

Save that same Document as HTML. The HTML codec walks the shared model straight through:

doc.save("notes.html", Codec::Html);

<h1>Notes</h1>
<p>A paragraph with <em>emphasis</em>.</p>

The heading, paragraph, emphasis, and text Elements are the model both codecs speak: load through one, save through the other.
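To make "one model, a codec on each side" concrete, here is a toy emitter pair over a hand-built tree shaped like the .prisma listing above. Node, toHtml, and toMarkdown are hypothetical and far simpler than the real codecs (no ids, links, or Markdown escaping); they only show that both outputs are plain tree walks over the same nameless, ordered children.

```cpp
#include <string>
#include <vector>

// Toy document tree (hypothetical; not Kinogaki's Element/Value types).
// Children are ordered and nameless, like the .prisma listing above.
struct Node {
    std::string kind;            // "document", "heading", "paragraph", "emphasis", "text"
    std::string text;            // payload of a "text" node
    int level = 0;               // heading level for a "heading" node
    std::vector<Node> children;  // ordered body content
};

// Escape the three characters that matter inside HTML text content.
std::string escapeHtml(const std::string& s) {
    std::string out;
    for (char c : s) {
        switch (c) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;"; break;
            case '>': out += "&gt;"; break;
            default:  out += c;
        }
    }
    return out;
}

// HTML side: one tree walk, one tag per kind.
std::string toHtml(const Node& n) {
    if (n.kind == "text") return escapeHtml(n.text);
    std::string inner;
    for (const Node& c : n.children) inner += toHtml(c);
    if (n.kind == "emphasis") return "<em>" + inner + "</em>";
    if (n.kind == "paragraph") return "<p>" + inner + "</p>\n";
    if (n.kind == "heading") {
        const std::string tag = "h" + std::to_string(n.level);
        return "<" + tag + ">" + inner + "</" + tag + ">\n";
    }
    return inner;  // "document" (and unknown kinds): children only
}

// Markdown side: the same walk, different spelling. No escaping (toy only).
std::string toMarkdown(const Node& n) {
    if (n.kind == "text") return n.text;
    if (n.kind == "document") {
        std::string out;
        for (std::size_t i = 0; i < n.children.size(); ++i) {
            if (i > 0) out += "\n";  // blank line between blocks
            out += toMarkdown(n.children[i]);
        }
        return out;
    }
    std::string inner;
    for (const Node& c : n.children) inner += toMarkdown(c);
    if (n.kind == "emphasis") return "*" + inner + "*";
    if (n.kind == "paragraph") return inner + "\n";
    if (n.kind == "heading") return std::string(n.level, '#') + " " + inner + "\n";
    return inner;
}
```

Built from the Notes example, toHtml returns the two HTML lines shown above (each ending in a newline), and toMarkdown returns the original "# Notes" note.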

The content vocabulary

A codec does not invent ad-hoc Elements per file. It targets a canonical model for a kind of content, and that model is shared by every codec of its kind. The document model below is shared by the Markdown and HTML codecs, which is exactly why md → Document → html works.

The document is rooted at /document. Its blocks are ordered anonymous children, and a block's inline content is likewise ordered anonymous children, so the tree itself carries the meaning without synthetic "0"/"1" child names.
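Because blocks and inline runs are nameless, an element's address is its position. This sketch enumerates index paths over a toy tree; Elem and collectPaths are hypothetical, and the nested spelling (/document/[1]/[1]/[0]) is extrapolated from the /document/[0] form above rather than taken from the library. It also shows the consequence of order being identity: inserting a block at the front would renumber every path after it.

```cpp
#include <string>
#include <vector>

// Toy tree (hypothetical): only a kind and ordered, nameless children.
struct Elem {
    std::string kind;
    std::vector<Elem> children;
};

// Pre-order walk recording each element's index path. A child's address is
// its parent's path plus "/[i]", where i is its position among siblings.
void collectPaths(const Elem& e, const std::string& path, std::vector<std::string>& out) {
    out.push_back(path);
    for (std::size_t i = 0; i < e.children.size(); ++i)
        collectPaths(e.children[i], path + "/[" + std::to_string(i) + "]", out);
}
```

Called on the Notes tree with root path "/document", it yields /document, /document/[0], /document/[0]/[0], /document/[1], and so on in document order.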

Block kinds:

Inline kinds:

HTML is a superset of the Markdown subset, so any model from a Markdown document renders to clean HTML. The HTML codec also carries the document-v1 constructs these docs use beyond CommonMark core: heading.id (<h* id>), definitionList/term/definition (<dl>/<dt>/<dd>), figure/caption (<figure>/<figcaption>), note (<div class="note">), and a foreign <svg> island kept verbatim as a single rawHtml block — the HTML-only escape hatch. Unknown tags are flattened, their text kept. Markdown constructs outside the subset (tables, raw HTML, reference links) are not modelled.
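A rough sketch of the HTML-side decode decision: known tags map to a model kind, and an unknown tag is flattened by keeping the text beneath it. HtmlNode, kindForTag, and textContent are hypothetical; the tag table covers only the constructs named above plus p and em from the worked example, and the rawHtml handling of <svg> islands is deliberately left out.

```cpp
#include <optional>
#include <string>
#include <vector>

// Toy parsed-HTML node (hypothetical). A text node has an empty tag.
struct HtmlNode {
    std::string tag;   // lower-case tag name, "" for a text node
    std::string cls;   // value of the class attribute, if any
    std::string text;  // text of a text node
    std::vector<HtmlNode> children;
};

// Map a known tag to its document-model kind; nullopt means "unknown tag".
std::optional<std::string> kindForTag(const std::string& tag, const std::string& cls) {
    if (tag.size() == 2 && tag[0] == 'h' && tag[1] >= '1' && tag[1] <= '6') return "heading";
    if (tag == "p") return "paragraph";
    if (tag == "em") return "emphasis";
    if (tag == "dl") return "definitionList";
    if (tag == "dt") return "term";
    if (tag == "dd") return "definition";
    if (tag == "figure") return "figure";
    if (tag == "figcaption") return "caption";
    if (tag == "div" && cls == "note") return "note";
    return std::nullopt;
}

// Flattening an unknown element: drop the wrapper, keep the text beneath it.
std::string textContent(const HtmlNode& n) {
    if (n.tag.empty()) return n.text;
    std::string out;
    for (const HtmlNode& c : n.children) out += textContent(c);
    return out;
}
```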

The other kinds

Each kind of content has one canonical model; Prism's native shape is the scene/structured kind.

All kinds reduce to one substrate, which is what lets a single Document representation hold any of them: an image channel, a vertex position, a vector control point, and a heading's text are all typed array Values, and layers and AOVs are named properties, the same mechanism that carries a material parameter.
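One way to picture the shared substrate in plain C++17, assuming a much smaller set of element types than Kinogaki actually supports: every payload is a typed array held in a std::variant, and every named thing (a layer, an AOV, a material parameter, a text run) is an entry in a property map. Value, Properties, Element, and valueLength are hypothetical names for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <variant>
#include <vector>

// Hypothetical substrate: every payload is a typed array.
using Value = std::variant<std::vector<float>,
                           std::vector<std::int32_t>,
                           std::vector<std::string>>;

// Named properties: the same map holds an image layer, an AOV, a material
// parameter, or a text run; only the key and the array type differ.
using Properties = std::map<std::string, Value>;

struct Element {
    std::string kind;
    Properties props;
};

// Generic code can treat any payload uniformly, whatever its element type.
std::size_t valueLength(const Value& v) {
    return std::visit([](const auto& arr) { return arr.size(); }, v);
}
```

For example, an image's red channel can be stored as props["R"] = std::vector<float>{...} and a text run as props["text"] = std::vector<std::string>{"Notes"}; valueLength works on both without knowing which it holds.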

Bundle

A Document is already a path-addressed hierarchy, which is exactly what a directory tree is — so a folder of files is a Document. Bundle is the filesystem convention on top, letting a whole site or project pack into one Document and ship as one compressed .prism.

Filenames are data, not path segments. A path segment may contain only [A-Za-z0-9_], and dotted names do not survive the binary crate, so each element carries a safe segment plus its real name. The render-as rule keeps storage and presentation separate: a file's type is the storage truth, and its name's extension is the materialisation target. A document named page.html materialises as HTML; the same content named page.md materialises as Markdown.
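A sketch of deriving a safe segment while keeping the real filename as data. safeSegment and FileEntry are hypothetical; the replace-with-underscore rule and the non-empty fallback are assumptions, not Kinogaki's documented behavior. Note that this rule alone maps a.b and a_b to the same segment, so a real implementation needs a disambiguation step that this sketch omits.

```cpp
#include <string>
#include <string_view>

// Hypothetical: reduce a filename to the [A-Za-z0-9_] alphabet allowed in a
// path segment by replacing every other byte with '_' (an assumed rule).
std::string safeSegment(std::string_view name) {
    std::string seg;
    for (unsigned char c : name) {
        const bool allowed = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                             (c >= '0' && c <= '9') || c == '_';
        seg += allowed ? static_cast<char>(c) : '_';
    }
    if (seg.empty()) seg = "_";  // assumption: never produce an empty segment
    return seg;
}

// The segment addresses the element; the original name travels as data, so
// nothing about the filename is lost.
struct FileEntry {
    std::string segment;
    std::string name;
};

FileEntry makeEntry(std::string_view name) {
    return FileEntry{safeSegment(name), std::string(name)};
}
```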

#include "kinogaki/codecs/Bundle.h"
using namespace kinogaki::codecs;

Document site;
Path root  = bundle::root();
Path pages = bundle::addFolder(site, root, "pages");
bundle::addFile(site, pages, "index.html", htmlBytes);   // kind chosen from the extension
bundle::addFile(site, root,  "style.css",  cssBytes);    // → an opaque blob

std::optional<std::string> html = bundle::materializeFile(site, /* the index path */);

addFile picks the kind from the extension (html/htm/md/markdown → a document decoded via the matching codec; svg → a structural svg; anything else → a blob). materializeFile reads a file element back to bytes, honoring the render-as rule by choosing the codec from the file's name extension. The functions are filesystem-free: they operate only on the Document they are given, and a tool walks the real directory and calls them.
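The extension rules for addFile, and the render-as choice materializeFile makes from the name, can be sketched as two small lookups. FileKind, kindForFilename, and materializeAs are hypothetical; case-insensitive matching is an assumption, and so is the raw-bytes fallback in materializeAs for other extensions, since the behavior there is not stated.

```cpp
#include <cctype>
#include <string>
#include <string_view>

// Hypothetical: the three storage kinds described for addFile.
enum class FileKind { Document, Svg, Blob };

// Lower-cased text after the last '.', or "" when there is none.
std::string lowerExtension(std::string_view name) {
    const std::size_t dot = name.rfind('.');
    if (dot == std::string_view::npos) return "";
    std::string ext(name.substr(dot + 1));
    for (char& c : ext) c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return ext;
}

// Storage kind, chosen once when the file is added.
FileKind kindForFilename(std::string_view name) {
    const std::string ext = lowerExtension(name);
    if (ext == "html" || ext == "htm" || ext == "md" || ext == "markdown") return FileKind::Document;
    if (ext == "svg") return FileKind::Svg;
    return FileKind::Blob;
}

// Render-as: the materialise target comes from the name, not the stored kind,
// so the same document content named page.md or page.html emits differently.
std::string materializeAs(std::string_view name) {
    const std::string ext = lowerExtension(name);
    if (ext == "md" || ext == "markdown") return "markdown";
    if (ext == "html" || ext == "htm") return "html";
    if (ext == "svg") return "svg";
    return "blob";  // assumption: other names emit their stored bytes as-is
}
```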

A lossless contract

Every codec is held to a stated contract. A within-format round-trip is the identity: doc.md → doc.prisma → doc.md returns your file. Cross-format degradation is defined and tabulated: a construct a target format cannot express degrades in a documented way (an HTML-only island becomes a verbatim raw block on the way to Markdown), so you always know what a conversion preserves. The shipping codecs cover JSON, Markdown, HTML, SVG, plain text, and arbitrary blobs. That is enough for anything you hand the tooling to convert cleanly, and enough to build these docs entirely out of converted Markdown.
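The byte-exact round-trip listed for the Text codec comes down to one detail: splitting on '\n' must keep the empty final piece, so a trailing newline survives, and '\r' must stay inside its line. A toy version with hypothetical names (decodeLines, encodeLines, roundTrips), not the real codec:

```cpp
#include <string>
#include <string_view>
#include <vector>

// Toy Text decode: split on '\n' and keep the (possibly empty) final piece,
// so "a\n" becomes {"a", ""} and the trailing newline is recoverable.
std::vector<std::string> decodeLines(std::string_view bytes) {
    std::vector<std::string> lines;
    std::size_t start = 0;
    for (;;) {
        const std::size_t nl = bytes.find('\n', start);
        if (nl == std::string_view::npos) {
            lines.emplace_back(bytes.substr(start));
            return lines;
        }
        lines.emplace_back(bytes.substr(start, nl - start));  // '\r' stays in the line
        start = nl + 1;
    }
}

// Toy Text encode: join with '\n', the exact inverse of decodeLines.
std::string encodeLines(const std::vector<std::string>& lines) {
    std::string out;
    for (std::size_t i = 0; i < lines.size(); ++i) {
        if (i > 0) out += '\n';
        out += lines[i];
    }
    return out;
}

// The within-format contract: decode then encode is the identity.
bool roundTrips(std::string_view bytes) {
    return encodeLines(decodeLines(bytes)) == bytes;
}
```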