Layer 2 (Pipeline)

Layer 2 Schema

Layer 2 of the design describes how to build pipelines of computations, and provides a schema for publishing snapshots of previous computation results, associating them with human readable names for lookup, and attaching complete instructions for running those pipelines again.

(Refer to the Design chapter if further contextualization is required.)

Schema

The key concepts at Layer 2 are the Module and the Catalog. Modules are about computing new stuff; Catalogs are about how we reference stuff we’ve computed before, and how we pass those references around and share them with other people.

Modules

### ----------------------------
## Recall we have the following types
##  already defined from Layer 0:
##    - AbsPath
## And from Layer 1:
##    - FormulaAction
### ----------------------------

type Module struct {
    imports {SlotName:ImportRef}
    steps   {StepName:Step}
    exports {ItemName:SlotRef}
}

type Step union {
    | Operation "operation"
    | Module "module"
} representation keyed

## Operation is analogous to Formula, but lifted
## into Layer 2, and using *named* references (SlotRef)
## instead of WareID hashes for inputs.
##
## Each Operation is compiled down to a Formula in order
## to be evaluated; so, evaluating an Operation in effect
## produces a Formula _and_ a RunRecord.
##
## Because Operations are written with named references,
## we can write a bunch of connected Operations at one time,
## before having to evaluate any of them; and then evaluate
## the whole sequence (literally, Module) at once.
##
type Operation struct {
    inputs {AbsPath:SlotRef}
    action FormulaAction
    outputs {SlotName:AbsPath}
}

type StepName string
type SlotName string

## SlotRef is a struct in nature, but serialized as a string:
## when serialized, it looks like "{optional StepName}.{SlotName}".
type SlotRef struct {
    ## If specified, this SlotRef will refer to an output
    ## of the step with this StepName;
    ## Lack of a stepName means it's a reference to the
    ## module imports rather than another step.
    stepName optional StepName

    ## SlotName corresponds to either an Operation output,
    ## or to a Module import, depending on whether stepName is set.
    slotName SlotName
} representation string sequence (join=".")

### TODO: ImportRef -- needs a lot of description, it's Fun

Note that Operation is basically isomorphic to Formula at Layer 1: it just uses SlotRef and SlotName – human-readable labels – instead of going straight to WareIDs. These labels allow us to describe a graph of computation, without yet having actually evaluated any of the concrete values.
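As a small illustration of these labels, here is a sketch in Python of how a SlotRef string could be parsed and printed back. The class and method names are hypothetical, and the treatment of a missing step name is an assumption: the schema only says the serialized form looks like "{optional StepName}.{SlotName}", so this sketch accepts both "base" and ".base" as references to a module import.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SlotRef:
    """Hypothetical in-memory form of the SlotRef struct from the schema."""

    slot_name: str
    # None means the reference points at a module import, not a step output.
    step_name: Optional[str] = None

    @classmethod
    def parse(cls, text: str) -> "SlotRef":
        # Split on the last "."; with no "." at all, `step` comes back empty.
        step, _, slot = text.rpartition(".")
        if not slot:
            raise ValueError(f"SlotRef {text!r} has no slot name")
        # Assumption: an empty step part means "no stepName".
        return cls(slot_name=slot, step_name=step or None)

    def __str__(self) -> str:
        if self.step_name is None:
            return self.slot_name
        return f"{self.step_name}.{self.slot_name}"


step_output = SlotRef.parse("stepA.out")   # output "out" of step "stepA"
module_import = SlotRef.parse("base")      # module import named "base"
```

Note that nothing here involves a WareID yet: the reference is purely a name, which is what lets a whole graph be written down before any of it has been evaluated.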

Module is just a big group of Operations wired up together. (They can be defined recursively, too, but that’s just syntactic sugar; you don’t need to use this feature unless you find it helpful for scoping and namespacing in a really large project.)

Some of the types from Layer 1 – like RunRecord – don’t show up again explicitly in Layer 2… but they’re still in the background. Imagine that when you evaluate a Module, it’s precipitating out Formulas and RunRecords as the computation proceeds.

Similarly, types from Layer 0 – like WareID – don’t show up explicitly in Modules in Layer 2… but like Formula and RunRecord, they’re produced as a consequence of evaluating a Module: the final product of evaluating a module is a map of {ItemName:WareID}! Catalogs (which we’ll discuss more in a moment) also make heavy use of WareID.
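To make that flow concrete, here is a minimal sketch in Python of evaluating a module: order the steps by their references, pin each named input to a concrete WareID, run the step, and collect the exports. It is not the real evaluator. It assumes Python 3.9+ (for `graphlib`), handles only Operation steps (no nested Modules), assumes import references contain no ".", and takes already-resolved imports as an argument. The `fake_run` function, the WareID strings, the import string, and the contents of `action` are all placeholders standing in for Layer 1.

```python
from graphlib import TopologicalSorter


def evaluate_module(module, resolved_imports, run_operation):
    """Evaluate Operation steps in dependency order; return {ItemName: WareID}."""
    # For each step, find the steps whose outputs it consumes ("step.slot" refs).
    deps = {
        name: {
            ref.partition(".")[0]
            for ref in step["operation"]["inputs"].values()
            if "." in ref
        }
        for name, step in module["steps"].items()
    }
    # Known slots, keyed by serialized SlotRef; starts with the module imports.
    slots = dict(resolved_imports)
    for name in TopologicalSorter(deps).static_order():
        op = module["steps"][name]["operation"]
        # Pinning names to WareIDs is the "compile down to a Formula" moment.
        pinned = {path: slots[ref] for path, ref in op["inputs"].items()}
        for slot_name, ware_id in run_operation(pinned, op["action"], op["outputs"]).items():
            slots[f"{name}.{slot_name}"] = ware_id
    return {item: slots[ref] for item, ref in module["exports"].items()}


calls = []  # the pinned inputs of each run, in the order they happened


def fake_run(pinned_inputs, action, outputs):
    # Placeholder for Layer 1: a real runner would execute a Formula and
    # report output WareIDs in a RunRecord. Here we just invent them.
    calls.append(pinned_inputs)
    return {slot: f"tar:fake-{slot}" for slot in outputs}


module = {
    "imports": {"base": "catalog:example.org/base:v1:linux-amd64"},
    "steps": {
        # Deliberately listed out of order: references decide the ordering.
        "pack": {"operation": {
            "inputs": {"/": "base", "/in": "build.bin"},
            "action": {"exec": ["tar", "-cf", "/task/pkg/out.tar", "/in"]},
            "outputs": {"pkg": "/task/pkg"},
        }},
        "build": {"operation": {
            "inputs": {"/": "base"},
            "action": {"exec": ["make"]},
            "outputs": {"bin": "/task/out"},
        }},
    },
    "exports": {"release": "pack.pkg"},
}

result = evaluate_module(module, {"base": "tar:fake-base"}, fake_run)
```

With these placeholders, `result` is the {ItemName:WareID} map for the module's exports, and each entry in `calls` is the fully pinned input map that a Formula for that step would carry.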

Catalogs

// TODO

Modules vs Replayable Modules

Modules can be considered to have two different flavors: “replayable”, or… not.

The difference comes down to what kinds of ImportRef have been used in the module’s imports. Certain kinds of imports – namely, the "catalog" kind – are things we can expect to reproducibly resolve. A module that uses only these imports is something that we thus expect to be able to run again – “replay”, if you will – at any time, even in the distant future, or without much of the ambient context of our current environment.

Remember that we can also have imports of the "ingest" kind. Ingest imports are a mechanism for bringing new data into the Timeless ecosystem. Ingest imports handle un-contained un-tracked content! Though ingests turn such data into WareIDs and thus tracked content… the ingest itself needs the un-contained context to do that. So, ingests are not easily replayable without that context.
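The distinction can be sketched as a simple check over a module's imports. Since ImportRef is not yet described in the schema above, the representation used here is an assumption made purely for illustration: each import is a string whose kind is the part before the first ":". The function names and example values are hypothetical as well, and nested modules are ignored.

```python
def import_kind(import_ref: str) -> str:
    """Return the kind of an import, e.g. "catalog" or "ingest".

    Assumption: an ImportRef is serialized as "kind:details".
    """
    return import_ref.split(":", 1)[0]


def is_replayable(module: dict) -> bool:
    """A module is replayable if every import is of the "catalog" kind."""
    return all(import_kind(ref) == "catalog" for ref in module["imports"].values())


pinned_module = {
    "imports": {"base": "catalog:example.org/base:v1:linux-amd64"},
}
working_module = {
    "imports": {
        "base": "catalog:example.org/base:v1:linux-amd64",
        # Placeholder ingest: pulls in whatever is in the local checkout.
        "src": "ingest:git:.:HEAD",
    },
}

replayable = is_replayable(pinned_module)
not_replayable = is_replayable(working_module)
```

A single ingest import is enough to make the whole module unreplayable, because resolving it depends on context outside the module.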

Fortunately, we can make an unreplayable module – one that uses “ingest” imports – into a replayable one. It’s easy: all we have to do is turn any of the “ingest” imports into “catalog” imports instead.

This conversion to a replayable module can even be automatic. For example, if a module has an ingest import, and one of its exports uses a SlotRef that refers directly to that ingest, then when making a release out of this module we can automatically rewrite the import to refer to a catalog: the very one we’re making the release into at that moment.
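A rough sketch of such a rewrite, in Python, might look like the following. Everything about the string formats is an assumption: ImportRef is not yet specified in the schema above, so the "ingest:..." prefix and the "catalog:{catalog}:{release}:{item}" shape are hypothetical, as are the function name and example values. The sketch also assumes that an export string with no "." refers directly to a module import.

```python
import copy


def rewrite_ingests_for_release(module: dict, catalog_name: str, release_name: str) -> dict:
    """Return a copy of `module` whose directly-exported ingest imports
    point at the release being published instead."""
    rewritten = copy.deepcopy(module)
    # Exports that refer straight to an import: import slot name -> item name.
    direct_exports = {
        ref: item for item, ref in module["exports"].items() if "." not in ref
    }
    for slot, ref in module["imports"].items():
        if ref.startswith("ingest:") and slot in direct_exports:
            item = direct_exports[slot]
            rewritten["imports"][slot] = f"catalog:{catalog_name}:{release_name}:{item}"
    return rewritten


module = {
    "imports": {
        "base": "catalog:example.org/base:v1:linux-amd64",
        "src": "ingest:git:.:HEAD",  # placeholder ingest of the local checkout
    },
    "steps": {},  # steps omitted; they are not touched by the rewrite
    "exports": {"src": "src", "bin": "build.out"},
}

released = rewrite_ingests_for_release(module, "example.org/proj", "v1.0")
```

The rewrite works because the release itself records the WareID that the ingest produced, under the exported item name; the rewritten import simply looks that value up again. An ingest import that is not exported directly is left alone in this sketch, so such a module would remain unreplayable.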

Writing modules using “ingest” imports is great for productivity. Replayable modules are great for the timeless ability to recompute things. Automatic rewrite of modules into replayable form is a tool we use to get the best of both worlds.