Layer 2 Schema

Layer 2 of the design describes how to build pipelines of computations. It also provides a schema for publishing snapshots of previous computation results, associating them with human-readable names for lookup, and attaching complete instructions for running those pipelines again.

(Refer to the Design chapter if further contextualization is required.)

Schema

The key concepts at Layer 2 are the Module and the Catalog.

# Recall we have the following types
#  already defined from Layer 0:
#  - AbsPath
# And from Layer 1:
#  - Formula

struct Module = {
	imports {SlotName:ImportRef}
	steps   {StepName:Step}
	exports {ItemName:SlotRef}
}

union Step keyed =
	| Operation "operation"
	| Module "module"

struct Operation = {
	inputs {AbsPath:SlotRef}
	action alias Formula.action
	outputs {SlotName:AbsPath}
}

string StepName
string SlotName

# SlotRef is a struct in nature, but serialized as a string:
# it looks like "{optional StepName}:{SlotName}".
struct SlotRef = {
	# If specified, this SlotRef refers to an output
	# of the step with this StepName.
	# If stepName is absent, it is a reference to the
	# module imports rather than to another step.
	stepName optional StepName
	# SlotName corresponds to either an Operation output
	# or a Module import, depending on whether stepName is set.
	slotName SlotName
} representation string sequence ":"

### TODO: ImportRef -- needs a lot of description, it's Fun

Note that Operation is basically isomorphic to Formula at Layer 1: it just uses SlotRef and SlotName – human-readable labels – instead of going straight to WareIDs. These labels allow us to describe a graph of computation, without yet having actually evaluated any of the concrete values.
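As a rough illustration of the string form of SlotRef, the sketch below parses and formats "{optional StepName}:{SlotName}" values. It is not taken from any real implementation: the names `SlotRef`, `parse_slot_ref`, and `format_slot_ref` are hypothetical, and it assumes that an import reference is written with an empty step part (":base") and that a bare name with no colon is also accepted as an import reference.

```python
from typing import NamedTuple, Optional


class SlotRef(NamedTuple):
    step_name: Optional[str]  # None means "refers to a module import"
    slot_name: str


def parse_slot_ref(text: str) -> SlotRef:
    """Parse "{optional StepName}:{SlotName}" into a SlotRef.

    Assumption: an empty or missing step part means the reference
    points at the module imports.
    """
    parts = text.split(":")
    if len(parts) == 1:
        step, slot = "", parts[0]
    elif len(parts) == 2:
        step, slot = parts
    else:
        raise ValueError(f"too many ':' separators in SlotRef: {text!r}")
    if not slot:
        raise ValueError(f"SlotRef has no slot name: {text!r}")
    return SlotRef(step or None, slot)


def format_slot_ref(ref: SlotRef) -> str:
    """Serialize a SlotRef back to its string form."""
    return f"{ref.step_name or ''}:{ref.slot_name}"


print(parse_slot_ref("build:out"))  # refers to output "out" of step "build"
print(parse_slot_ref(":base"))      # refers to module import "base"
```

Neither form mentions a WareID: both are labels, which is what lets the graph be written down before anything has been evaluated.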

Module is just a big group of Operations wired up together. (They can be defined recursively, too, but that’s just syntactic sugar; you don’t need to use this feature unless you find it helpful for scoping and namespacing in a really large project.)
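To make the "wired up together" idea concrete, here is a minimal sketch that derives a valid execution order for a Module's steps from the SlotRefs in their inputs. Several things in it are assumptions rather than part of the schema: the module is written as a plain Python dict mirroring the struct fields, the `action` value and the import entry are placeholders, import references use an empty step part, and nested Module steps are not handled.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# A hypothetical two-step module. ImportRef is not specified in this
# chapter, so the import value is only a placeholder.
module = {
    "imports": {"base": "<ImportRef placeholder>"},
    "steps": {
        "test": {"operation": {
            "inputs": {"/": ":base", "/app": "build:out"},
            "action": {"exec": ["/app/run-tests"]},  # placeholder action
            "outputs": {"report": "/report"},
        }},
        "build": {"operation": {
            "inputs": {"/": ":base"},
            "action": {"exec": ["make"]},  # placeholder action
            "outputs": {"out": "/output"},
        }},
    },
    "exports": {"app": "build:out"},
}


def step_order(module: dict) -> list:
    """Return step names so that each step comes after the steps it reads from."""
    graph = {}
    for name, step in module["steps"].items():
        deps = set()
        for ref in step["operation"]["inputs"].values():
            step_name, _, _slot = ref.rpartition(":")
            if step_name:  # an empty step part means "module import"
                deps.add(step_name)
        graph[name] = deps
    # TopologicalSorter raises graphlib.CycleError if steps depend on each other.
    return list(TopologicalSorter(graph).static_order())


print(step_order(module))
```

The order in which steps are declared does not matter here; only the SlotRef wiring determines what must run first.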

Some of the types from Layer 1 – like RunRecord – don’t show up again explicitly in Layer 2… but they’re still in the background. Imagine that when you evaluate a Module, it’s precipitating out Formulas and RunRecords as the computation proceeds.

Similarly, types from Layer 0 – like WareID – don’t show up explicitly in Layer 2… but like Formula and RunRecord, they’re produced as a consequence of evaluating a Module. In fact, at the end of evaluating a Module, when all the computations are complete, what you actually get out is a map of {ItemName:WareID}!
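The evaluation described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the real evaluator: imports are taken to be already resolved to WareIDs, the Formula is modeled as a dict with `inputs`, `action`, and `outputs`, the WareID strings are made up, nested Module steps are ignored, and `fake_run` is a hypothetical stand-in for actually running a Formula at Layer 1.

```python
import hashlib
import json
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def split_ref(ref):
    """Split a SlotRef string into (step name or None, slot name)."""
    step, _, slot = ref.rpartition(":")
    return (step or None), slot


def fake_run(formula):
    """Stand-in for Layer 1: invent one WareID per output path by hashing the formula."""
    blob = json.dumps(formula, sort_keys=True).encode()
    digest = hashlib.sha256(blob).hexdigest()[:12]
    return {path: f"fake:{digest}{path}" for path in formula["outputs"]}


def evaluate(module, import_wares, run):
    """Evaluate each Operation in dependency order; return {ItemName: WareID}."""
    # Known values, keyed like SlotRefs: (None, name) for imports.
    resolved = {(None, name): ware for name, ware in import_wares.items()}
    graph = {
        name: {split_ref(r)[0] for r in step["operation"]["inputs"].values()} - {None}
        for name, step in module["steps"].items()
    }
    for name in TopologicalSorter(graph).static_order():
        op = module["steps"][name]["operation"]
        # "Precipitate" a Formula: every SlotRef label becomes a concrete WareID.
        formula = {
            "inputs": {path: resolved[split_ref(ref)] for path, ref in op["inputs"].items()},
            "action": op["action"],
            "outputs": sorted(op["outputs"].values()),
        }
        results = run(formula)  # plays the role of a RunRecord's results
        for slot, path in op["outputs"].items():
            resolved[(name, slot)] = results[path]
    return {item: resolved[split_ref(ref)] for item, ref in module["exports"].items()}


module = {
    "steps": {
        "build": {"operation": {
            "inputs": {"/": ":base"},
            "action": {"exec": ["make"]},  # placeholder action
            "outputs": {"out": "/output"},
        }},
    },
    "exports": {"app": "build:out"},
}
result = evaluate(module, {"base": "tar:exampleBaseWare"}, fake_run)
print(result)
```

The returned map has one entry per export, and the intermediate Formulas and results exist only inside the loop, which mirrors how Layer 1 types stay "in the background" of Layer 2.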