Welcome

Welcome to the Timeless Stack documentation.

The Timeless Stack is a suite of tools for running processes repeatedly. All of the tools and APIs in the stack are tailored to make task definition precise, environment setup portable, and results reproducible.

The Timeless Stack is the logical next step for Linux containers: isolating and sandboxing processes is a critical start, and the Timeless Stack tools now let you take total control of your container image builds with a purely functional and, well, timeless approach to composition and dependency management.

Philosophy and Goals

The Timeless Stack is about reliable computing: reliable software builds and reliable data processing; enabling them through reliable data storage; and making the processing itself just another piece of data we can easily reason about.

Container technology is a key part of our approach to reliability: containers make it possible to have a zero-ambiguity environment, where all the inputs necessary to make a process run are absolutely clear.

More important than the technology, though, are the principles of how we use it:

  • Zero-ambiguity environment: the Timeless Stack is developed on the principle of "precise-by-default".

  • Deep-time reproducibility: the Timeless Stack represents a commitment to reproducible results today, tomorrow, next week, next year, and... you get the picture.

  • Communicable results: the Timeless Stack describes processes in a Formula. Communicating a Formula -- via email, gist, pastebin, whatever -- should be enough for anyone to repeat your work. Everything in the Timeless Stack is API-driven, easy to serialize, and easy to share.

  • Control over data flow: Unlike other container systems, in the Timeless Stack you can compose filesystem trees how you want: multiple inputs, in any order; and explicitly declare sections of filesystem that are useful results to export (meaning just as importantly, you can choose what files to leave behind). Granular control lets you build pipelines that are clean, explicit, and fast. More importantly, it lets us reason about our processes, and thus scale up our ability to share.

  • Labeling instead of contamination: The Timeless Stack configuration explicitly enforces a split between << data identity >> and << data naming >>. We work with hashes as primary identifiers, which makes it easy to decentralize any processing built with the Timeless Stack.

  • Variation builds on precision: the Timeless Stack designs for systems like automatic updates and matrix tests on environmental variations by building them on top of the zero-ambiguity/deep-time-reproducible API layer of Formulas. This allows clear identification of each version/test/etc, making it possible to clearly report what's been covered and what needs to be finished.

The Timeless Stack is not a build tool, and it's not just a container image builder; think of it more as a workspace manager. It's important to have a clean workspace, fill it with good tools, and keep the materials going both in and out of your workspace well-inventoried. Like other container systems, you can use make, cake, rake, bake, or whatever's popular this month inside a Timeless Stack Formula; unlike other container systems, with the Timeless Stack you can control the inputs separately, update them separately, and perhaps most importantly of all actually produce results which are smaller than the whole fileset you started with.

All of these properties come together towards two big goals:

  • Decentralization: things built with the Timeless Stack can be built again, anywhere, by anyone, and anywhen.
  • Simplicity: shipping full system images is great, but with the Timeless Stack you can also choose to produce more granular products, which lets you keep individual parts simple even while building bigger systems.

Let's hear more...!

You should see a table of contents on the left, and page-flip buttons just below. (On mobile screens, you may need to click the "pancake stack" icon at the top of the page to show/hide the table of contents.)

If you're evaluating the Timeless Stack to understand why the project exists and what's unique about the ecosystem, you'll want to start with the Design chapters.

If you're a person of action: the Quickstart section has examples of using the tools. The quickstart examples start with the lowest level tools, then work gradually up the stack to the more expressive layers -- so keep this in mind if you have limited patience or time to experiment; the commands you'll use the most in practice are actually at the end of the quickstart series.

If you just need to brush up on some reference material: head to the CLI docs. (But keep in mind most of the tools will also generate their own help text in response to the -h or --help flags!)

The Glossary is also a good quick reference for core concepts.

To get to source code, jump to the CLI docs, and each tool's source repo will be linked from their individual reference pages.

For a new reader, we recommend giving the Design chapters a quick skim for highlights, but jump to the Quickstart examples as soon as things get too abstract. Go back and forth in whatever order interests you!

Happy hacking!

Getting Started with Repeatr and the Timeless Stack

In this getting started doc, we'll show working examples. You should be able to copy the snippets in this file directly, and they should work without modification. You should then be able to modify them to build in the directions you wish to explore.

prerequisites

  • a linux kernel (we're about to use linux containers).
  • either the timeless stack tool binaries, OR a go compiler to build them.

The host system requirements for running the core Timeless Stack tools are intentionally very, very small.

installing

First things first: we'll need a repeatr (and rio) binary on our $PATH.

To build the latest versions: clone and follow the instructions in https://github.com/polydawn/repeatr . This will require a go compiler, bash, git, and not much else.

computing with repeatr

The first piece of the Timeless Stack we'll use is Repeatr. Repeatr computes things -- and, you guessed it, hopefully repeatedly. To do this, Repeatr uses containers to isolate environments, and it will be our job to give Repeatr a list identifying all of our raw materials so it can set up that isolated environment.

hello-world formula

{"formula": {
    "inputs": {
        "/": "tar:6q7G4hWr283FpTa5Lf8heVqw9t97b5VoMU6AGszuBYAz9EzQdeHVFAou7c4W9vFcQ6"
    },
    "action": {
        "exec": ["/bin/echo", "hello world!"]
    }
},
"context": {
    "fetchUrls": {
        "/": [
            "ca+https://repeatr.s3.amazonaws.com/warehouse/"
        ]
    }
}}

This snippet is called a Formula (plus some "context" configuration). It lists the inputs we need -- these are specified using WareIDs -- and describes the action we want to run in the container.

Copy and paste the formula and its context into a file called example.formula, and we can run it!

repeatr run example.formula

You should see a couple lines of logs scroll by, then the "hello" output, and finally, a json object. The logs emitted as repeatr sets up the environment are routed to stderr, as is the "hello" print from the commands run in the container. The json object comes out on stdout -- and is the only thing on stdout, so you can easily pipe it to other tools (e.g. jq).

log: lvl=info msg=read for ware "tar:6q7G4hW...vFcQ6" opened from warehouse "ca+https://repeatr.s3.amazonaws.com/warehouse/"
hello world!
{
    "guid": "by356nem-e0trxfw4-mt4xk1m9",
    "formulaID": "VvzXuRSogyW7JXvt49JWSfbJoAhpovCRPM69bd8xnDXyU8L5TMhrUmKGodWffysmK",
    "results": {},
    "exitCode": 0,
    // ...additional metadata elided...
}

This json object is called a RunRecord. You get one from every repeatr run invocation. It describes both the setup (the formulaID property is a hash of the formula we just ran -- this will be very useful later!) and the results. This time, the results field is empty, but we'll see it used in just a moment; you can also see that the command in the container exited successfully from the "exitCode": 0 line.
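Because the RunRecord is the only thing on stdout, it's easy for scripts to consume. As a quick illustration -- this is hypothetical glue code, not part of the Timeless Stack tools -- here's how a wrapper might pick apart a RunRecord like the one above:

```python
import json

# A RunRecord as printed on stdout by `repeatr run` (values from the example above).
run_record_json = """
{
    "guid": "by356nem-e0trxfw4-mt4xk1m9",
    "formulaID": "VvzXuRSogyW7JXvt49JWSfbJoAhpovCRPM69bd8xnDXyU8L5TMhrUmKGodWffysmK",
    "results": {},
    "exitCode": 0
}
"""

record = json.loads(run_record_json)

# A CI wrapper might gate on the exit code and log the formula identity:
assert record["exitCode"] == 0, "contained process failed"
print("ran formula", record["formulaID"])
print("outputs:", record["results"])
```

The same parse works on the output of any `repeatr run`, since stdout carries nothing else.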

This is a reliable, repeatable way to distribute software and run it regardless of host environment. But it's actually pretty boring! Let's build something with it, next.

producing outputs

This is a formula that produces outputs:

{"formula": {
    "inputs": {
        "/": "tar:6q7G4hWr283FpTa5Lf8heVqw9t97b5VoMU6AGszuBYAz9EzQdeHVFAou7c4W9vFcQ6"
    },
    "action": {
        "exec": ["/bin/mkdir", "-p", "/task/out/beep"]
    },
    "outputs": {
        "/task/out": {"packtype": "tar"}
    }
},
"context": {
    "fetchUrls": {
        "/": [
            "ca+https://repeatr.s3.amazonaws.com/warehouse/"
        ]
    },
    "saveUrls": {
        "/task/out": "ca+file://./warehouse/"
    }
}}

As you can see, a formula with outputs isn't much more than what we've already seen: you just name the filesystem path you want to save when the container exits, and Repeatr will make it happen.

As with the fetchUrls for inputs, we now have saveUrls for the output. These are optional; you can list an output with no matching saveUrl if you want to hash the result but discard the data. Typically, of course, you do want to save the output, so you can either pass it on to more formulas, or use rio unpack to extract it on your host.

Okay, save this formula to a file and repeatr run it as before:

{
    "guid": "by37z50k-6kh08vp6-ofnhpc8c",
    "formulaID": "8jjTTBhvBixJZz2XcV6UjpmdnJSFz1QoR17E8UqcYNjM3gJc7nfRN5ithU6FGTLaTe",
    "results": {
        "/task/out": "tar:729LuUdChuu7traKQHNVAoWD9AjmrdCY4QUquhU6sPeRktVKrHo4k4cSaiQ523Nn4D"
    },
    "exitCode": 0,
    // ...additional metadata elided...
}

Now our RunRecord's results field has members! There will be one entry for every entry you requested in the formula's outputs section. Each value is a WareID -- the same format we use to identify formula inputs.

Congrats! You just made your first reproducibly-built ware :D

But where did it go?

Here, our saveUrl was ca+file://./warehouse/. This URL indicates three things:

  • file:// indicates we'll use the local filesystem as the storage warehouse;
  • the ca+ prefix indicates we'll use it in Content Addressable mode;
  • ./warehouse is the local directory we'll store things in.
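That breakdown can be sketched as a tiny parser. This is purely illustrative -- it is not Rio's actual URL-handling code:

```python
def parse_warehouse_url(url: str):
    """Split a warehouse URL like 'ca+file://./warehouse/' into its parts.

    Illustrative only -- not Rio's real implementation."""
    mode = "plain"
    # A '+' before the scheme separator marks the addressing mode (e.g. "ca+").
    if "+" in url.split("://")[0]:
        mode, url = url.split("+", 1)
    scheme, _, path = url.partition("://")
    return {"mode": mode, "scheme": scheme, "path": path}

parts = parse_warehouse_url("ca+file://./warehouse/")
# parts == {"mode": "ca", "scheme": "file", "path": "./warehouse/"}
```

The same shape applies to the fetch URL we used earlier: ca+https://repeatr.s3.amazonaws.com/warehouse/ is Content Addressable mode over HTTPS.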

So, if you run find ./warehouse on your host, you should now see a file with a (quite long) path which is the hash you just saw in the RunRecord. That's your packed ware. Since we're using the "tar" pack format in this example, you can actually extract it with any regular tar command -- but maybe hang on; we'll cover the rio pack and rio unpack commands in a sec, which are a bit smoother (and handle things consistently for other pack formats, as well).

Since the results are WareIDs, and inputs to formulas are WareIDs, we don't have to stop here and unpack the results -- we can chain formulas together to build more complex software. We'll demonstrate formula chaining right after the unpack commands.

other things to try

There are lots of different options you can configure in formulas, such as setting environment variables, setting the user IDs to run as, and many other knobs to twiddle. We'll skip over those in this quickstart.

One thing you may have wondered already is why the "context" is separate from the "formula". You can answer that question by changing some of the "context" fields -- say, adding or removing more URLs to the fetchUrls list -- and then calling repeatr run again. Notice anything? The formulaID doesn't change ;)
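To make the idea concrete, here's a simplified model of that behavior. Repeatr's real formulaID uses its own canonical serialization and encoding, so the hash below won't match repeatr's output; the point is only that the "context" section is excluded from the hash:

```python
import hashlib
import json

def formula_id(document: dict) -> str:
    """Illustrative stand-in for repeatr's formulaID: hash only the
    "formula" section, canonically serialized. (Not repeatr's actual
    hashing scheme -- just the same principle.)"""
    canonical = json.dumps(document["formula"], sort_keys=True)
    return hashlib.sha384(canonical.encode()).hexdigest()

doc = {
    "formula": {
        "inputs": {"/": "tar:6q7G4hWr..."},  # truncated hash, for illustration
        "action": {"exec": ["/bin/echo", "hello world!"]},
    },
    "context": {
        "fetchUrls": {"/": ["ca+https://repeatr.s3.amazonaws.com/warehouse/"]},
    },
}

id_before = formula_id(doc)
# Edit the context: add another mirror to fetch from...
doc["context"]["fetchUrls"]["/"].append("ca+https://mirror.example.org/warehouse/")
id_after = formula_id(doc)
assert id_before == id_after  # ...and the formulaID is unchanged.
```

Where you fetch the inputs from is an operational detail; *what* the computation is stays identical, and so does its identity.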

packing and unpacking Wares

Once you have things packed into wares and identified by WareIDs, it's easy to assemble them and also build new ones with Repeatr. But what about at the edges of the system? How do we import new stuff from the outside world? How do we export stuff we make to other folks?

The answer to all these questions is pretty simple: rio. You can use rio --help to get an overview of everything Rio can do; in short, it's for moving packed Wares around and for shuffling files in and out of packed form. rio is what repeatr used earlier to get and save your files; if you watch ps while it's running, you'll see a rio child process for every input and output.

packing files into wares

rio pack <packtype> <filesetPath> [--target=<warehouseURL> ...]

Packing turns a fileset -- any ol' directory full of files -- into a packed form, and if a target warehouse is specified, uploads the packed data there.

This command returns a WareID on stdout. You can easily pipe this to other commands (like rio unpack to simply get the same files back again), or template it into a formula's input section.
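For example, hypothetical glue code might capture that WareID and template it into a formula's inputs section. (The set_input helper here is illustrative, not a Timeless Stack API):

```python
import json

def set_input(formula: dict, path: str, ware_id: str) -> dict:
    """Template a WareID (as printed by `rio pack`) into a formula's inputs.
    Hypothetical helper, for illustration only."""
    packtype, _, hash_part = ware_id.partition(":")
    assert packtype and hash_part, "expected '<packtype>:<hash>'"
    formula["inputs"][path] = ware_id
    return formula

# Suppose this WareID was captured from `rio pack` stdout:
ware_id = "tar:8ZaAmtWZbjtNfJWD8nmGRLDn2Ec745wKWoee4Tu1ZcxacdmMWMv1ssjbGrg8kmwn1e"

formula = {"inputs": {}, "action": {"exec": ["/bin/ls", "/app"]}}
set_input(formula, "/app", ware_id)
print(json.dumps(formula, indent=2))
```

In a shell pipeline you'd do the equivalent with command substitution or a templating tool; the point is that pack output and formula input are the same currency.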

unpacking wares into files

rio unpack <wareID> <destinationPath> [--source=<warehouseURL> ...]

(Note that <wareID> looks like <packtype>:<hash> -- they're the same thing.)

Unpacking fetches data based on its wareID -- a content-addressable identifier, based on cryptographic hash, which means what you get is immutable and always exactly what you asked for -- and unpacks it into a fileset in a local directory.

Unpacking, like packing, prints a WareID on stdout when done. Depending on your other flags, this may be a different WareID than the one you asked for! rio unpack will unpack the files with your current user's UID and GID by default; doing so results in a slightly different filesystem, and that's what this resulting WareID is describing. Check out the rio unpack --help for more info on these flags (particularly, --uid, --gid, and --sticky.)

You will need to start rio with superuser privileges to successfully perform a rio unpack with UID and GID settings (as usual -- no magic here).

scanning existing packs for WareIDs

rio scan <packtype> --source=<singleItemWarehouseURL>

Rio can scan existing packed data and report the Rio WareID. This only makes sense for some pack types: for example, it's easy to do this with tar archives produced by other processes... but utterly nonsensical to do with git repositories, because there's no such thing as identifying git repo content without a hash.

The --source argument uses the same style of warehouse URLs as all Rio subcommands, but interprets it slightly differently: the URL must identify one ware only. For example, you can use ca+file://./warehouse/ in rio pack and rio unpack, but you cannot use that URL with rio scan; you'll have to drop to a non-CA variant so that a specific ware is pointed to rather than a whole warehouse.

mirroring existing packs to many warehouses

// TODO rio mirror

composing multiple formulas

// TODO

Timeless Stack Design

As we covered in the intro, the Timeless Stack is three things: a philosophy, some tooling, and an ecosystem that plays nicely together. Thus, the design documentation also has a major split: some things are very literal tool and API design subjects, and some are recommendations and guidelines for good ecosystem integration in the things you generate with the tools.

  • Timeless Stack API Layers, the next section, covers where the core APIs begin and end. It's important to understand where these API layers are separated in order to understand how the Timeless Stack facilitates building reusable components without going the typical road of building a new walled garden of a distro.
  • Release Schema then covers how the Timeless Stack standardizes publishing both build instructions and artifact identifiers so that both data and all the mechanisms to regenerate that data can be shared. This standardization makes it possible for many different projects with many different authors to all effectively collaborate, publishing releases and maintaining dependency trees even without a centralized online authority.
  • Responsible Packaging takes the next step and describes how we recommend building and packaging software for end users so that it works well in not just the Timeless Stack ecosystem, but also will be trivially exportable to any other distro and environment. (Note that this section is guidelines and recommendations -- not code, structures, or requirements. You can start building software and processing data with the Timeless Stack without reading this section.)

Timeless Stack API Layers

The Timeless Stack APIs are split into several distinct levels, based on their expressiveness. The lower level layers are extremely concrete references, and focus heavily on immutability and use of hashes as identifiers; these layers are the "timeless" parts of the stack, because they leave no ambiguity and are simple serializable formats. The higher level layers are increasingly expressive, but also require increasing amounts of interpretation.

  • Layer 0: Identifying Content — simple, static identifiers for snapshots of filesystems.
  • Layer 1: Identifying Computation — scripts, plus explicit declarations of needed input filesystem snapshots, and selected paths which should be snapshotted and kept as outputs.
  • Layer 2: Computation Graphs — statically represented pipelines, using multiple isolated computations (each with independent, declarative environments) to build complex outputs.
  • Layer 3+: Planners — use any tools you want to generate Layer 2 pipelines! The Timeless Stack provides standard import and export APIs, and you can compute Layer 2 however you like!

The Timeless Stack focuses on ensuring the lower level layers are appropriate to track in version control. There's a strong separation between Layer 3 and everything below: since Layer 3 may require computation itself to generate the Layer 2 specifications, we require all of the lower layers to make sense and be manipulable without any relationship or dependency on Layer 3 semantics.

As with the layers of a pyramid: the lower layers are absolutely essential foundation for everything that comes on top of them; and also, relatively small amounts of code at the highest levels can direct massive amounts of work in the lower layers.

The Layers, in detail

Layer 0: Identifying Content

The most basic part of the Timeless Stack APIs is the WareID -- a hash, which identifies content, fully immutably.

The main tool at this level is Rio. Operations like rio pack and rio unpack convert filesystems into packed Wares (which are easy to replicate to other computers) and WareIDs (so we can easily refer to the Wares even before copying them)... and back again to filesystems.

Data Examples

Data at Layer 0 is very terse: it's all WareIDs, which are simple string identifiers composed of a "packtype" (e.g. tar, git, zip, etc) and a hash.

These are all examples of WareIDs:

  • tar:6q7G4hWr283FpTa5Lf8heVqw9t97b5VoMU6AGszuBYAz9EzQdeHVFAou7c4W9vFcQ6
  • tar:8ZaAmtWZbjtNfJWD8nmGRLDn2Ec745wKWoee4Tu1ZcxacdmMWMv1ssjbGrg8kmwn1e
  • git:825d8382ac3d46deb89104460bbfb5fbc779dab5
  • git:3cf6a45846f1b33e6459adee244f1ac18ae0d511

As you can see, these aren't very human-readable. We'll address this in the higher protocol layers -- around Layer 2 we'll begin to construct mappings that associate human-readable names to these opaque and immutable references.

Layer 1: Identifying Computation

Formulas and RunRecords -- hashable documents which identify computations; they contain no human naming, and are fully static.

The main tool at this level is Repeatr. The most common command is repeatr run, which takes a Formula, evaluates it, and returns a RunRecord (see the example data structures, below).

Data Examples

A formula looks something like this (in YAML format), though they may have many inputs, and also multiple outputs:

# This is a Formula.
inputs:
  "/":       "tar:6q7G4hWr283FpTa5Lf8heVqw9t97b5VoMU6AGszuBYAz9EzQdeHVFAou7c4W9vFcQ6"
  "/app/go": "tar:8ZaAmtWZbjtNfJWD8nmGRLDn2Ec745wKWoee4Tu1ZcxacdmMWMv1ssjbGrg8kmwn1e"
  "/task":   "git:825d8382ac3d46deb89104460bbfb5fbc779dab5"
action:
  exec:
    - "/bin/bash"
    - "-c"
    - |
      export PATH=$PATH:/app/go/go/bin
      ./goad install
outputs:
  "/task/bin": {packtype: "tar"}

As you can see, a Formula composes many of the Layer 0 components. It also generates more Layer 0 WareIDs. When you feed the above formula to repeatr run, you'll get a JSON object on stdout called a RunRecord, which resembles this one:

# This is a RunRecord.
{
    "guid": "c3rms673-o2k84p3y-4ztef48q",
    "time": 1515875768,
    "formulaID": "3vFsH3UbWJZHPrhgckpf5DJrq5DisykE3ND6Z14ineQJxdvZb9iapiKKGtE8ZHEDzM",
    "exitCode": 0,
    "results": {
        "/task/bin": "tar:6XKnQ4Kcf6zmf16VNUAyBHirTEKV8WfB3JunSx3Szenc7keiotuEDCNZjCXcxod7mH"
    }
}

RunRecords contain several items which are essentially random -- namely, the time and guid fields. They also contain many fields which should be deterministic given the same Formula -- specifically, formulaID is actually a hash of the Formula that was evaluated; it's an immutable, unforgeable reference back to the Formula. Most importantly, though, the RunRecord contains the results map. This contains a key-value pair of path to WareID -- one pair for each output path specified in the Formula. The results section depends on what your formula does, of course.

Like Formulas, RunRecords can also be hashed to produce unique identifiers. These hashes cover the unreproducible fields like time and guid, so they tend not to collide, and thus are useful as a primary key for storing RunRecords. The collision resistance makes it easy to gather RunRecords from many different authors -- useful if we want to compare their results fields later!
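To sketch what that comparison might look like: group gathered RunRecords by their formulaID, and flag any formula whose results disagree. (This is hypothetical audit code with truncated example hashes -- not a Timeless Stack tool):

```python
from collections import defaultdict

def check_reproducibility(run_records):
    """Map each formulaID to True if every gathered RunRecord for it
    reported identical results. Illustrative sketch only."""
    by_formula = defaultdict(set)
    for rr in run_records:
        # Freeze the results map so distinct outcomes can be counted in a set.
        by_formula[rr["formulaID"]].add(tuple(sorted(rr["results"].items())))
    return {fid: len(variants) == 1 for fid, variants in by_formula.items()}

# Two parties ran the same formula and got the same output ware:
records = [
    {"guid": "a", "formulaID": "3vFsH3...", "results": {"/task/bin": "tar:6XKnQ4..."}},
    {"guid": "b", "formulaID": "3vFsH3...", "results": {"/task/bin": "tar:6XKnQ4..."}},
]
print(check_reproducibility(records))  # {'3vFsH3...': True}
```

A False entry would mean the computation wasn't reproducible -- exactly the kind of signal a decentralized audit wants to surface.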

(repeatr run will also emit the stdout and stderr printed by your contained process on its own stderr channel, plus some decoration. This is configurable, but the important note here is that we consider those streams to be debug info, and we don't keep them. Use tee or route them to a file if they're needed as outputs that can be referenced by other formulas later.)

Layer 2: Computation Graphs

The Timeless Stack represents multi-stage computations by generating a series of formulas from a document of pseudo-formulas which, rather than having all inputs already pinned to hashed WareIDs, use human-readable names and references. These human-readable references can connect the outputs of one formula to the inputs of another, or reference external data (e.g. previous releases of a system which have publicly tracked names).

These multi-stage computations are called "modules", and each pseudo-formula is a "step". The module-local named references which connect Steps are called "slot refs". Slot Refs can be initialized either by the outputs of a Step, or by external data using an "import". Imports come in several forms, but the main one is "catalog imports", which refer to snapshots of previously produced data.

At Layer 2 we also begin to have multiple documents of different types which will all be referenced at the same time. We have not just the Module, which you will author; we will also need to import data from Catalogs, and export releases to put more info into Catalogs for future use. We're starting to move towards updatability rather than repeatability here; it will be up to the user to make sure all these documents are versioned in a coherent snapshot for deep-time repeatability.

As a result of the use of human-meaningful names rather than hashes, documents at Layer 2 are not trivially globally content-addressable. In other words, two different people can write semantically identical Layer 2 modules, which generate totally identical Layer 1 formulas... and while the Layer 1 formulas will converge to the exact same identity hashes, the Layer 2 modules may not, if different locally-scoped names were used. Examples of differences that may result in identical Layer 1 content but distinct Layer 2 modules include step names (one author may have called a step "stepFoo" while the other titled it "stepBaz") or imports which resolve to the same Wares but got there via different references (one module might import a WareID released as "foo:v1.0:linux" while another references it as "foo:v1.0rc2:linux", regardless of whether both names resolve to the same WareID).

Data Examples

A module is composed of several steps (mostly "operations", which are the precursor to a formula, and will generate a formula when all inputs are resolved to hashes), plus some information to wire intermediate steps together ("imports") and information to name the final interesting results ("exports"):

{
    "imports": {
        "base": "catalog:example.timeless.io/base:201801:linux-amd64"
    },
    "steps": {
        "stepBar": {
            "operation": {
                "inputs": {
                    "base": "/",
                    "stepFoo.out": "/woof"
                },
                "action": {
                    "exec": [
                        "cat",
                        "/woof/records"
                    ]
                },
                "outputs": {}
            }
        },
        "stepFoo": {
            "operation": {
                "inputs": {
                    "base": "/"
                },
                "action": {
                    "exec": [
                        "bash",
                        "-c",
                        "mkdir out\nls -la /usr/bin | tee > out/records"
                    ]
                },
                "outputs": {
                    "out": "/task/out"
                }
            }
        }
    },
    "exports": {
        "a-final-product": "stepFoo.out"
    }
}

Evaluating a module simply evaluates each step in order, plugs together any intermediates, templates this info into a formula, evaluates it, and then turns the crank for the next step. (Modules can be automatically topo-sorted based on dependencies, and Timeless Stack tools will evaluate things in that order.)
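That evaluation order can be derived mechanically from the step references. Here's an illustrative sketch -- not the actual tooling -- that topologically sorts a module's steps (shaped like the example above) using Python's standard library:

```python
from graphlib import TopologicalSorter

def step_order(steps: dict) -> list:
    """Derive an evaluation order from "stepName.output" references in
    each operation's inputs. Illustrative sketch only."""
    deps = {}
    for name, step in steps.items():
        # An input key like "stepFoo.out" names another step's output.
        deps[name] = {
            ref.split(".", 1)[0]
            for ref in step["operation"]["inputs"]
            if "." in ref and ref.split(".", 1)[0] in steps
        }
    return list(TopologicalSorter(deps).static_order())

steps = {
    "stepBar": {"operation": {"inputs": {"base": "/", "stepFoo.out": "/woof"}}},
    "stepFoo": {"operation": {"inputs": {"base": "/"}}},
}
print(step_order(steps))  # ['stepFoo', 'stepBar']
```

Since stepBar consumes stepFoo's output, stepFoo must run first -- no matter what order the steps appear in the document.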

The final result of evaluating a Layer 2 Module is very similar to the results of Layer 1 evaluation: each formula will yield a RunRecord, so we get a whole series of those which we can retain (mostly for audit purposes)... plus, we get a map of all the WareIDs produced that were marked for export.

The final map of exports is isomorphic to a catalog release items map. You can pipe the exports map right into making a new release!

:warning: Layer 2's Module format is recently developed (mid 2018). It is subject to change.

Modules have several other interesting features, such as "submodules" and "ingest references" -- docs for these are TODO :)

Layer 3: Planners

Planners at large: this layer is open to substantial interpretation and is not actually standardized; the only constraint for integration into the Timeless ecosystem is that, whatever is going on at this layer, it has to produce the "module" format; from there, other tools can interoperate.

Which layer should I interact with?

Which layer you should interact with depends on what kind of work you're doing.

In short, most people will author stuff up at Layer 3. It's where the most expressive forms of authorship are at. But most tools will operate on the lower layers, and integrations with other ecosystems (you want to track releases on the blockchain? Hello, welcome!) will similarly want to interface with these lower layers.

What can we do without Layer 3?

We can do a great deal of work with Layer 0/1/2 alone!

  • We can transport snapshots of data, source code, and program binaries;
  • We can run programs and compilers (exact versions of them, on exact versions of source code and input data);
  • and we can run whole pipelines of various programs and compilers, each with their own complete environments.

And since all of these pieces of data are serializable, we can commit entire snapshots of these pipelines to version control.

This means someone else, given this snapshot (and the Timeless Stack tools), can reproduce our entire environments, with all dependencies, and repeat our entire pipeline of data processing.

What do we need Layer 3 semantics for, then?

In short, moving forward.

When handling data at Layer 0, it's all immutable.

When handling computations at Layer 1, they're still all immutable. The only way that using the same Layer 1 instructions will generate different data is if they generate random data (which is probably a Problem rather than anything you'd want).

When handling sets of computations at Layer 2, they're still, yes still, all immutable. Even though some steps refer to other steps for their inputs, the typical expectation is that each step should reliably produce the same data, so the overall semantics of re-executing a whole Layer 2 module should be the same as an individual Layer 1 step.

Layer 3 is where we finally relax on immutability, and thus it's where we begin to do the interesting work of generating new modules and updating inputs. Layer 3 can look up release information from other projects, for example, and bring that in as an input to the Layer 2 data. Being precise in this information in Layer 2 is critical for later auditability and reproducibility; but in Layer 3, we're free to compute new plans using whatever latest freshest data we want.

Thus, most human authorship happens using Layer 3 tools and languages, because it provides the most flexibility and leverage -- then, we bake those plans into Layer 2 immutable data ASAP, in order to get the best benefits of both worlds (expressive and immutable).

Integration Examples

Layer 0 WareIDs are short and easy to copy-paste to share in emails, slacks, tweets, or good ol' IRC. Other people can download data produced by Timeless Stack pipelines without a fuss.

Layer 1 Formulas can be put on something as simple as pastebin in order to share with other people. They can be useful for self-contained bug reports, for example.

Layer 2 Modules are suitable to feed to tools which can traverse graphs and e.g. draw nice renderings of build dependencies. Such tools could also ask and quickly answer questions like "Find all dependencies ever used, recursively, to build $tool-foobar; now, tell me if they currently have any security vulnerabilities".

Layer 2 Modules are suitable for publication in a distributed ledger. This can be used as part of a system to make read-only public audit and accountability possible.

Layer 2 Modules, like Layer 1 Formulas, can be easily re-evaluated -- even by other people, on other machines, and even months or years later -- so they can be used to distribute small and reproducible instructions rather than large binary blobs that take lots of network and disk space. This makes them excellent for Continuous Integration / Continuous Deployment systems, which can use them to track and report the health as well as historical states of large systems.

The possibilities are pretty much endless. If you can parse a JSON API, you can build integrations with the Timeless Stack at whatever layer seems the most useful to you.

Release Schema

"Releases" are a data structure associating a human-readable name to a WareID (and, some metadata about it: descriptions of the content, info about mirrors where it can likely be fetched from, etc).

Catalogs and Release records

Naming a specific artifact takes a three-tuple:

  • Catalog name -- catalogs represent a project, and a single authoring party. (In terms of key management, a Catalog is the unit of signing!)
  • Release name -- a release may contain several wares, but is made as one atomic object. Releases can be tagged with all sorts of metadata. (They also share a single "replay" -- jump to the next section for more about replays.)
  • Item name -- releases often contain several "items" -- a typical example is a "docs" item, a "linux-amd64" item, a "darwin-amd64" item, etc.

This complete tuple -- catalog:release:item -- is enough to identify a specific WareID.
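As a sketch, resolving that tuple against a catalog document (shaped like the example data in the next section) is a simple lookup. The resolve function here is illustrative glue, not a real Timeless Stack API:

```python
def resolve(catalog: dict, release_name: str, item_name: str) -> str:
    """Resolve a catalog:release:item tuple to a WareID.
    Illustrative only -- not a real Timeless Stack API."""
    for release in catalog["releases"]:  # releases are an ordered array, not a map
        if release["name"] == release_name:
            return release["items"][item_name]
    raise KeyError(f"no release {release_name!r} in catalog {catalog['name']!r}")

catalog = {
    "name": "domain.org/team/project",
    "releases": [
        {"name": "v1.1", "items": {"linux-amd64": "tar:BLZEe0usTSDjgjZ8NkMmCPUb5rTHtj"}},
    ],
}

# "domain.org/team/project:v1.1:linux-amd64" resolves to a WareID:
print(resolve(catalog, "v1.1", "linux-amd64"))
```

Note the lookup selects a release by its explicit name, never by its position in the array -- "latest" is not a concept we lean on.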

Example data

Here is an example of a single release catalog, containing three releases, each of which has three or four WareIDs itemized in the release:

{
  "name": "domain.org/team/project",
  "releases": [
    {
      "name": "v2.0rc1",
      "items": {
        "docs":         "tar:SiUoVi9KiSJoQ0vE29VJDWiEjFlK9s",
        "linux-amd64":  "tar:Ee0usTSDBLZjgjZ8NkMmCPTHtjUb5r",
        "darwin-amd64": "tar:G9ei3jf9weiq00ijvlekK9deJjFjwi",
        "src":          "tar:KE29VJDJKWlSiUoV9siEjF0vi9iSoQ"
      },
      "metadata": {
        "anything": "goes here -- it's a map[string]string",
        "semver":   "2.0rc1",
        "tracks":   "nightly,beta,2.x"
      },
      "hazards": null
    },{
      "name": "v1.1",
      "items": {
        "docs":         "tar:iSJSiUoVi9KoQ0vE29VJDWlK9siEjF",
        "linux-amd64":  "tar:BLZEe0usTSDjgjZ8NkMmCPUb5rTHtj",
        "darwin-amd64": "tar:weiG9ei3jf9q00ijvlekK9FjwideJj",
        "src":          "tar:KWlKE29VJDJSiUoV9siEjFiSoQ0vi9"
      },
      "metadata": {
        "anything": "goes here -- it's a map[string]string",
        "semver":   "1.1",
        "tracks":   "nightly,beta,stable,1.x"
      },
      "hazards": null
    },{
      "name": "v1.0",
      "items": {
        "docs":         "tar:iUiSi0vE29VQJSoVK9sjFJDWiEl9Ko",
        "linux-amd64":  "tar:e0BLTjZ8NkMgZEusb5rtjmCPTHUSDj",
        "src":          "tar:E2KWJUoV9siilK9VSoQi9EjF0viDJS"
      },
      "metadata": {
        "anything": "goes here -- it's a map[string]string",
        "semver":   "1.0",
        "tracks":   "nightly,beta,stable,1.x"
      },
      "hazards": {
        "something critical": "CVE-asdf-1234"
      }
    }
  ]
}

Notice that each Release name is unique, but also that releases are stored in an ordered array rather than an unordered map. This is to remove any potential ambiguity or complex decision making about the sorting when a user-facing tool must decide in what order to present choices. (Automation should refrain from assuming that the top of the list is "latest", however -- remember, this is the Timeless Stack; "latest" is not a concept we want to give much credence to at any point in time!)

Release names are free-text. Typically, we recommend they start with a "v" out of convention. It is also typical to follow something roughly resembling SemVer (though there are many, many variations on this, and the Timeless Stack does not explicitly recommend nor require adherence to any particular variations of SemVer rules).

Item names are also free-text. By convention, the most likely names you'll see are similar to the ones in this example: "docs" and "src" are extremely common; tuples representing architecture and host OS like "linux-amd64" are also common. Usually, the same item names should occur in subsequent releases as in earlier ones -- tooling that generates formulas using WareIDs from release catalogs expects the release name to change for each new version, but the item names to remain essentially constant.
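Since a release catalog is just JSON, resolving the full catalog:release:item tuple to a WareID is a few lines in any language. Here's a minimal sketch in Python; the `resolve` function name and error handling are illustrative, not part of any official API:

```python
# Minimal sketch: resolving a catalog:release:item tuple to a WareID.
# The catalog dict below follows the example schema shown above.

def resolve(catalog, release_name, item_name):
    """Look up a WareID by release name and item name in a parsed catalog."""
    for release in catalog["releases"]:
        if release["name"] == release_name:
            return release["items"][item_name]
    raise KeyError("no release named %r in %r" % (release_name, catalog["name"]))

catalog = {
    "name": "domain.org/team/project",
    "releases": [
        {"name": "v1.1", "items": {"src": "tar:KWlKE29VJDJSiUoV9siEjFiSoQ0vi9"}},
    ],
}
print(resolve(catalog, "v1.1", "src"))  # -> tar:KWlKE29VJDJSiUoV9siEjFiSoQ0vi9
```

Note that lookup scans the ordered releases array rather than indexing into a map, matching the schema's choice of an array for releases.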

Replay instructions

As we've already established, releases are at heart a mapping: the human-readable catalog:release:item tuple to a specific WareID. However, there's a lot more we'd like to communicate as well: wouldn't it be nice if we could share all our build instructions along with a release?

We can. Remember Computation Graphs from Layer 2 of the API schema? These structures are suitable for associating with a release. In fact, the exports section of a Module lines up precisely with the "item name" section of a release record.
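A quick way to picture that alignment: a tool attaching a replay to a release might sanity-check that every item in the release is covered by a module export. This sketch assumes plain "exports" and "items" maps keyed by name, shaped after this doc rather than a normative schema:

```python
# Hedged sketch: checking that a Module's exports cover a release's item
# names, as described above. Field names ("exports", "items") are assumed.

def exports_match_items(module, release):
    """True when every item named in the release is covered by a module export."""
    return set(release["items"]) <= set(module["exports"])

module = {"exports": {"docs": "/task/docs", "src": "/task/src", "linux-amd64": "/task/bin"}}
release = {"name": "v1.0", "items": {"docs": "tar:aaaa", "src": "tar:bbbb"}}
print(exports_match_items(module, release))  # -> True
```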

Responsible Packaging

This section is TODO :)

Cliff notes:

  • "Package Management" is really a bunch of things and we need to acknowledge that and split these roles out:
    • Authoring packages;
    • Syncing package metadata, and making it possible to compute selections of packages;
    • Distributing the bulk data of packages;
    • and finally Installing packages (and note well, if you weren't able to separate this from selection, you dun goofed megabad).
    • Yes, there are some more peripheral bonus features we can define...
      • for example, keeping enumerations of what's been installed on a host
      • but let's not get distracted: these are bonuses. Even that example is stretching it: "has been installed" on a "host" doesn't even make sense in all situations, such as containers.
  • Okay! Now that we've got that split defined, we can identify sensible requirements for each role.
    • Authoring is a human story; we'll leave that aside for this discussion.
    • Publishing metadata is the job of catalogs. This is already well-spec'd in the Timeless Stack.
    • Performing selections is a somewhat freetext area in the Timeless Stack (though if your result is a Module, we certainly do ensure the selection-vs-usage separation).
    • Distributing the bulk data is explicitly out of band for the Timeless Stack -- and because we have the WareID hash contract, that's okay and easy to punt on without compromising other system design details.
    • Installing packages is what we want to talk more about.
  • Installing should be easy.
    • This requires rational design up-front.
    • Easy means stateless.
    • Easy means drag-n-drop.
    • Easy means no post-install hooks.

The rest of this document should discuss how we quantitatively measure "easy", then discuss how we make things that qualify.

Spoilers:

  • relocatable binaries
  • static linking is an acceptable relocatability
  • ELF header relative links are too
  • the XORIGIN hack
  • PATH is still the monster in the shadows
  • sharing shared libraries with CA install paths
  • or not (and making that as transparent as possible)

Formulas

Formulas are one of the core API objects in the Timeless Stack. They're a Layer 1 object in the big picture.

Formulas are a description of a container, as a pure function: we list inputs (by hash, thus immutably), then describe a process, and list what paths in the filesystem we want to save as outputs.

What's in a Formula?

Formulas come in three parts:

{
    "inputs": {
        [... map of paths to WareIDs ...]
    },
    "action": {
        [... structure with commands and env ...]
    },
    "outputs": {
        [... set of paths we should save ...]
    }
}

Formula inputs and outputs are fairly straightforward; you can quickly get a grasp of what they're representing if you've already understood the rio command and the Layer 0 model of Wares.

Formula actions can be thought of as roughly a shell script that runs in a container which is populated with the filesets specified by the WareIDs in your input section. They also include many other fields:

  • exec command
  • environment vars
  • uid/gid for exec
  • working directory

More information about all the fields in an action, their details, and their defaults is covered in the following sections.

(There's also a fourth section, called "context", which often accompanies a formula. The purpose of "context" is to carry around all the other incidental details we need to make things runnable, like URLs where we expect to be able to fetch the Wares named by the WareIDs in the inputs. But since this is, well, contextual to when and where we're evaluating the formula, we keep it separated.)
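To make the three-part shape concrete, here's an illustrative formula assembled as plain JSON from Python. The field contents are placeholders shaped after this doc -- the WareID is invented and the exact shapes of "outputs" entries and the "context" section are assumptions, not a validated schema:

```python
import json

# Illustrative sketch only: the three formula sections, plus the separate
# "context" that travels alongside but is not part of the hashed formula.

formula = {
    "inputs": {
        "/": "tar:SiUoVi9KiSJoQ0vE29VJDWiEjFlK9s",  # rootfs ware (invented ID)
    },
    "action": {
        "exec": ["/bin/echo", "hello"],  # the command to run
        "env": {"GREETING": "hi"},       # environment vars
    },
    "outputs": {
        "/task/output": {},  # a path we'd like saved as a resultant ware
    },
}

# "context" is kept separate, since it's specific to when/where we evaluate:
context = {"fetchUrls": {"/": ["https://mirror.example/wares/"]}}  # assumed shape

print(json.dumps(formula, indent=2))
```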

Defaults in Formulas

Repeatr goes to fairly great lengths to make sure the default behavior for formulas is always roughly what you mean.

We also commit to a fairly stable definition of that, because implicit changes over time to what the blank spaces in a formula mean would cause major problems throughout the ecosystem (formula hashes would not change even while semantics did -- not good).

So, here are some very -- even overly, boringly -- specific docs of what we mean by "good defaults", and why.

Opting Out

We'd be monsters if you couldn't disable these "helpful" defaults if you disagree with them.

Set formula.action.cradle to "disable" to skip out on anything that can be skipped. (What does that mean? Well, you can't have a UID set to null, so that default will still be computed. But all changes to the env var map will be skipped, all filesystem tweaks skipped, and the default cwd becomes plain '/'.)

Default Action & Command Environment

All of the optional fields in the formula.action declaration have defaults:

Working Directory (cwd)

The current working directory when the process is launched defaults to /task.

(This path will be created if it does not exist, and set to reasonable permissions if necessary -- skip on to the "Default Filesystem Setup" section of this doc for more detail.)

UID

The default UID is 1000.

Note that the UID has nothing to do with privilege levels (you may wish to read the Policy doc for more information about privilege levels).

GID

The default GID is 1000.

Username

The default username is "reuser", unless your UID is zero; if your UID is zero, the username defaults to "root" instead, which is probably what you expected.

The $USER environment variable will be set to this value, unless already explicitly set in the formula.action.env map.

Homedir

The default homedir is "/home/$USER" (as defined by the Username section, above -- i.e. the formula.action.env is not considered), unless your UID is zero; if your UID is zero, the homedir defaults to "/root" instead, which is probably what you expected.

The $HOME environment variable will be set to this value, unless already explicitly set in the formula.action.env map.

Path

The $PATH environment variable, unless otherwise specified, will always be set to:

/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

(This is a conservative choice, given that most distros are moving towards a unified "/bin"; but here, being conservative has no downside.)

Hostname

The hostname (on executor engines that support setting hostname) will default to the execution ID, which is a random value.
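The environment defaults above can be summarized as a small function of the UID and any explicitly-set env vars. This is a sketch of the stated rules, not Repeatr's actual code; function and constant names are ours:

```python
# Sketch of the documented defaults for $USER, $HOME, and $PATH.
# Note: $HOME defaults from the *computed* username, not from any
# explicitly-set $USER, per the Homedir section above.

DEFAULT_PATH = "/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"

def default_env(uid=1000, env=None):
    """Apply the documented env defaults, respecting explicit settings."""
    env = dict(env or {})
    user = "root" if uid == 0 else "reuser"  # Username rule
    env.setdefault("USER", user)             # only set if not already explicit
    env.setdefault("HOME", "/root" if uid == 0 else "/home/" + user)
    env.setdefault("PATH", DEFAULT_PATH)
    return env

print(default_env())       # USER=reuser, HOME=/home/reuser, PATH=<default>
print(default_env(uid=0))  # USER=root, HOME=/root
```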

Default Filesystem Setup

After stitching up your input filesets, Repeatr will also perform some small (but fairly essential) tweaks to the filesystem right before launching your commands. These are meant to make sure you have a minimum-viable / minimum-sane environment (even if your input filesystems shifted radically).

(All of these mutations, if made, will still preserve the mtimes of parent dirs, for your convenience and sanity if you intend to scan parts of those parent directories into an output which preserves those properties.)

(As an edge case, these mutations will be skipped if the paths they would affect would end up outside of any mounts through to the host.)

Cwd

The cwd will be made, if it doesn't exist, and it will be readable and writable to the owner (i.e. bitwise |0700). The owner UID and GID will be set to the formula's UID and GID. All of the parent dirs will be made traversable to "everyone" (i.e. bitwise |0001) if they aren't already.

tl;dr: Your process should always be free to write in its own cwd.

Homedir

The homedir will be made, if it doesn't exist, and it will be readable and writable to the owner (i.e. bitwise |0700). The owner UID and GID will be set to the formula's UID and GID. All of the parent dirs will be made traversable to "everyone" (i.e. bitwise |0001) if they aren't already.

tl;dr: Your process should always be free to write in its own homedir.

Tempdir

The /tmp dir will be made, if it doesn't exist. The permissions will be forced to bitwise 01777 unconditionally. If the dir was made, the owner UID and GID will be 0 and 0.

tl;dr: Any process should be free to write the tempdir, and it should generally behave exactly how you expect a tempdir to behave.
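The tweak rules above (cwd and homedir made owner-writable with traversable parents, /tmp forced to 01777) can be sketched against a scratch root like so. This is illustrative only -- Repeatr's real implementation differs, and this sketch omits the parent-dir mtime preservation noted earlier:

```python
import os

# Illustrative sketch of the pre-launch filesystem tweaks described above,
# applied under a scratch root directory.

def ensure_dir(root, path, uid, gid):
    """Make `path` under `root`, owner-writable, with traversable parents."""
    full = os.path.join(root, path.lstrip("/"))
    os.makedirs(full, exist_ok=True)
    os.chmod(full, os.stat(full).st_mode | 0o700)  # owner rwx (bitwise |0700)
    os.chown(full, uid, gid)                       # needs privilege if not self
    parent = os.path.dirname(full)
    while len(parent) >= len(root):                # walk up to (and incl.) root
        # make each parent traversable to "everyone" (bitwise |0001)
        os.chmod(parent, os.stat(parent).st_mode | 0o001)
        parent = os.path.dirname(parent)

def ensure_tmp(root):
    """Make /tmp with mode 01777 forced unconditionally."""
    tmp = os.path.join(root, "tmp")
    os.makedirs(tmp, exist_ok=True)
    os.chmod(tmp, 0o1777)
```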

/dev and /proc

Here be dragons.

Some container executors will force the creation and mounting of the /dev and /proc filesystems, and populate them with all of the magical wonderful interfaces to the kernel you might expect.

We make very few guarantees about what you may find under these paths. They are implementation (and host kernel...!) specific.

Policies for Formula Execution

Policy settings are how Formulas describe privilege levels. Internally, they translate to linux kernel "capabilities", but the formula policies concept is intentionally much less rich, and designed around the concept of safe and minimal defaults.

By default, executing a Formula will try to use at least as much isolation as a regular posix user account would provide: a non-zero UID and GID are assigned. Operating on files with other owners is a permissions error; etcetera. You can configure other policies to give your contained process more privileges.

Policy Levels

Policy settings are a short enum:

  • routine
  • governor
  • sysad

This list goes from lowest to highest privilege levels.

Roughly:

The "routine" policy is extremely safe; it has no escalated privileges; if files aren't owned by your UID you're not getting any special treatment; etc. This is the default if not overridden, and it should be enough for most daily work.

The "governor" policy is a bit like root on your host -- the process can read and write to any files, change its UID and GID, etc, as if superuser -- but it's still in containers, locked in a chroot, and reasonably safe. (Notably, you still cannot create device files.) If you can get away with "routine" mode, do so! If you need "governor", it's fairly safe. (Running a legacy distro package manager often requires more privileges than well-designed container-era tools should need, for example. Sometimes you can get away with merely action.uid=0, but in practice policy=governor is also often required.)

The "sysad" policy explicitly means giving the contained process enough privilege that it may be able to escalate to root on your host, reboot your machine, create and manipulate device files, etc. You could conceivably want to use this if you want containers for organizational purposes, but use tools in them which really do run administrative operations on your host.

Long story short: As always in security, use the lowest privilege levels you can get the job done with; and we've made those the default. You probably don't want to use the higher Policy settings if you can help it, and certainly under no circumstances should one ever use the "sysad" policy level when handling any untrusted filesets or executable content retrieved from the network.

CLI Documentation

The Timeless Stack ships as several individual binaries, separated by which layer they operate on:

  • rio -- Operations on filesets and wares
  • repeatr -- Evaluates formulas in containers
  • hitch -- Manages release databases
  • heft -- A Layer3+ pipeline planning tool

The rio Tool

rio -- an abbreviation for Repeatable Input/Output -- is the tool in the Timeless Stack which handles all packing, identification, unpacking, transport, and mirroring of filesystems and data.

rio is (sort of) comparable to the role of the venerable and ancient tar command: it specifies a way to pack and transport data. rio is also much more than tar, because rio also handles identifying data by hash -- we call this a WareID -- which lets us be clear about handling immutable snapshots of filesystems.

rio has a ton of different capabilities -- it can handle many different pack formats; as long as a consistent hash can be defined, rio can probably handle it. Most typically, we use rio with the "tar" packType, but there's also support for "git" (yes! git support is built in!), and support for more formats is welcome in the future.

rio abstracts the actual storage location from the identity of the data. The most obvious expression of this is that most of the rio commands can take the --source=<url> and --target=<url> arguments multiple times. rio also has native support for a wide variety of cloud storage systems in addition to using your local filesystem: AWS S3, GCP Cloud Storage, and local filesystems can all be used pretty much interchangeably, as well as HTTPS URLs for read-only modes.

CLI synopsis

rio unpack <packType:wareID> <dstPath> [--source=<url>...]
rio pack   <packType> <srcPath> [--target=<url>]
rio scan   <packType> --source=<url>
rio mirror <packType:wareID> --target=<url> [--source=<url>...]
  • rio pack takes files on your filesystem and packs them into a Ware (also uploading it to a warehouse, if one is specified).
  • rio unpack fetches a Ware by WareID, and unpacks it into a Fileset on your local filesystem.
  • rio scan examines an existing data stream to see if it matches a pack format we recognize, and computes its WareID. This is useful for importing data made somewhere outside the Timeless Stack.
  • rio mirror replicates data to more storage warehouses.

The rio pack and rio unpack commands contain many flags for how to handle POSIX permission and ownership bits, as well as timestamps (which are discarded by default in pack operations, for reproducibility reasons). Check rio pack -h and rio unpack -h for more information on those options.

Repeatr

Repeatr — github.com/polydawn/repeatr — evaluates a Formula, producing a RunRecord.

Hitch

Hitch — github.com/polydawn/hitch — manages a filesystem database of Releases.

Hitch is used by Layer 3 Planners (like Heft, for example) as a source of information for which Wares might be used. WareIDs discovered by scanning or produced as results of executing a Formula can be added to new releases... as can entire sets of multi-step build instructions in Layer 2 format.

Heft

Heft — github.com/polydawn/heft — is a Layer 3 "planning" tool, which generates Layer 2 computation graphs for execution.

Heft uses info from release databases managed by Hitch to select which versions of Wares to use. The computation graphs Heft produces can be evaluated by Repeatr, and the results of this evaluation are more Wares (which can be put into a new release!).

Examples

These are the examples you are looking for :)

Glossary

Fileset

A Fileset is a set of files and directories, including some standard posix metadata. Roughly, you can consider it interchangeable with simply saying "directory". We give it a name in the Timeless Stack glossary just so we can speak about it unambiguously. A Fileset can be "packed" into a 'Ware'.


Ware

The "packed" form of a Fileset. Tarballs, git commits; many formats are defined. Wares are immutable and identified by a 'WareID'. We say that we "pack" a Fileset into a Ware, which results in a WareID; and we "unpack" a Ware to produce a Fileset after fetching it by WareID.


WareID

The hash identifying a Ware. Holding a WareID gives you an immutable reference to a Ware (which you can unpack into a Fileset).


content-addressable

Describes the practice of identifying data based on its own content (rather than identifying it based on a name which conveys other meanings). Typically implemented by using a cryptographic hash over the content. Content-addressable systems are immutable.


Formula

An API structure describing a series of Wares, how to arrange them in a filesystem, some action to perform on them, and what parts of the filesystem to save as resultant Wares. Since Wares in a Formula are referred to by their content-addressable WareID, Formulas in turn are an immutable description of how to set up and run something. Repeatr evaluates a Formula to produce a RunRecord.