repeatr

Usage & Examples

Repeatr is a tool meant to compose well in a unix-y style. The first few examples use Repeatr formulas directly; you should be able to copy-paste these and get running. Building pipelines uses a separate tool, Reppl. Later examples cover Reppl, which makes things much simpler, removes most of the need to copy-paste hashes, and enables much more complex operations.

Repeatr basics

hello, world!

Let's start with a very simple formula: it takes a filesystem for "/" (commonly called a rootfs, in other systems), and says "hello".

inputs:
  "/":
    type: "tar"
    hash: "aLMH4qK1EdlPDavdhErOs0BPxqO0i6lUaeRE4DuUmnNMxhHtF56gkoeSulvwWNqT"
    silo: "http+ca://repeatr.s3.amazonaws.com/assets/"
action:
  command: ["echo", "hello"]

Paste this formula into a file, and then run it with `repeatr run thefile.formula`.

Nothing much to see here; just a deterministic, reliable deployment. It's a little verbose, but hang on. We'll improve that with Reppl later; and as the other examples will show, there's a reason to specify all these values.

hello, output!

Repeatr has a concept of "outputs": paths on the filesystem you want to save; these can then be used as inputs to other Repeatr formulas. Let's try that now:

inputs:
  "/":
    type: "tar"
    hash: "aLMH4qK1EdlPDavdhErOs0BPxqO0i6lUaeRE4DuUmnNMxhHtF56gkoeSulvwWNqT"
    silo: "http+ca://repeatr.s3.amazonaws.com/assets/"
action:
  command:
    - "/bin/bash"
    - "-c"
    - |
      mkdir -p /task/output/reproducible
      echo "Hello, World" > /task/output/reproducible/wow
outputs:
  "/task/output/reproducible":
    type: "tar"
    silo: "file+ca://./wares/"

Paste this formula into a file, run `mkdir -p ./wares/` so we have a place to save the outputs, and then run it with `repeatr run thefile.formula`.

Notice that this launched nearly instantaneously, because the input filesystem was already cached. You'll see a few more lines of logs at the end as well, when Repeatr reports on its progress in saving the outputs we configured it to keep.

The JSON output from the `repeatr run` command is now a little more interesting: the map now contains the name of the output, and a hash. If you run this formula repeatedly, you'll see that the output hash is always the same -- because the output filesystem itself is always the same!

If you look at the tar produced, you'll find two entries: the "./" directory, and a single file, "./wow". Note the lack of an absolute path: if you use this in another formula, you specify wherever you want it to appear in the container filesystem.
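
To see why relative entries make an output relocatable, here is a small sketch using plain `tar` on the host. It does not involve Repeatr at all: the temp directory and the `elsewhere` path are made up for illustration, and the archive is only meant to mimic the layout described above.

```shell
# Build a directory shaped like the formula's output path (temp dir is illustrative).
work="$(mktemp -d)"
mkdir -p "$work/task/output/reproducible"
echo "Hello, World" > "$work/task/output/reproducible/wow"

# Archive the *contents* of the directory, so entries are relative ("./", "./wow").
tar -cf "$work/out.tar" -C "$work/task/output/reproducible" .
listing="$(tar -tf "$work/out.tar")"
echo "$listing"

# Because no absolute path is recorded, the same archive unpacks anywhere.
mkdir -p "$work/elsewhere"
tar -xf "$work/out.tar" -C "$work/elsewhere"
cat "$work/elsewhere/wow"
```

The listing contains only relative entries, which is what lets a later formula mount the same ware at any path it likes.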

hello, compile!

Repeatr allows you to compose multiple input filesystems. Compared to other container image formats, this means you have much more flexibility and control over what you bring in, how you update it, and what you ship for results. Let's look at a sample formula where we compile a simple golang program to demonstrate:

inputs:
  "/":
    type: "tar"
    hash: "aLMH4qK1EdlPDavdhErOs0BPxqO0i6lUaeRE4DuUmnNMxhHtF56gkoeSulvwWNqT"
    silo: "http+ca://repeatr.s3.amazonaws.com/assets/"
  "/app/go":
    type: "tar"
    hash: "UoY1amg4W8_JVQJ6tg6I4BQm1Mlw3ngT_kutZNr6XfFvvWAZfGrwDxDcQD2TzOVz"
    silo: "https://storage.googleapis.com/golang/go1.8.linux-amd64.tar.gz"
action:
  env:
    "GOROOT": "/app/go/go/"
  command:
    - bash
    - -c
    - |
      set -euo pipefail
      export PATH=$PATH:/app/go/go/bin/
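      # Stand-in source (an assumption): any small Go main package works here.
      cat > main.go <<'EOF'
      package main

      import "fmt"

      func main() { fmt.Println("hello from awesomeserver") }
      EOF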
      mkdir /task/bin
      go build -o /task/bin/awesomeserver
outputs:
  "/task/bin":
    type: "tar"
    silo: "file+ca://./wares/"

(We cheat a little here, using a bash heredoc for more content than you probably should. Jump down to the next example to see how to use git to bring in sources instead!)

As before, paste this formula into a file, run `mkdir -p ./wares/` so we have a place to save the outputs, and then run it with `repeatr run thefile.formula`.

The first time this runs, the filesystem for /app/go must be downloaded, but if you've already run the previous examples, the filesystem for / is already cached and instantly available. There's no dependency between the two: you can update either input filesystem separately, and the other will remain cached correctly regardless. Also notice how we're able to use any upstream storage location we want! There are no special magic registries required; Repeatr is all just files.

The output in this case is also interesting. As before, we have one output. And as before (perhaps somewhat surprisingly!), the output hash is always the same if we run the formula repeatedly: the golang compiler is deterministic! We can actually rely on this behavior, since we've pinned not only the exact rootfs image, but also the exact version of the golang compiler. When we get to the Reppl examples and begin building pipelines, this theme will return: we can build caching systems on top of formulas which can skip any process with an already-known result.

Lastly, notice that we're only packaging the compiled binary as an output: we're not bringing along the golang compiler, and we're certainly not bringing along the entire rootfs. This means we could build another formula which takes our output here as an input, and effectively ship new versions of our server that take up single megabytes instead of potentially hundreds of megabytes for a full linux image.

hello, git!

So far, we've been using type: "tar" for all of our inputs. However, that's not the only thing that Repeatr supports! Anything that can be named by a hash can be used as a Repeatr input. For example, the popular version control system git fits in nicely!

inputs:
  "/":
    type: "tar"
    hash: "aLMH4qK1EdlPDavdhErOs0BPxqO0i6lUaeRE4DuUmnNMxhHtF56gkoeSulvwWNqT"
    silo: "http+ca://repeatr.s3.amazonaws.com/assets/"
  "/task":
    type: "git"
    hash: "b271f7fa349d07d2092b43e38a450bdc605f2453"
    silo: "https://github.com/polydawn/repeatr.git"
action:
  command: ["/bin/true"] ## no-op; we're just showing off git inputs.
outputs:
  "/task/doc": ## export the repeatr docs from the source repo!
    type: "tar"
    silo: "file+ca://./wares/"

As before, paste this formula into a file, run `mkdir -p ./wares/` so we have a place to save the outputs, and then run it with `repeatr run thefile.formula`.

You know the drill: this gets both the inputs separately, caches them separately, assembles them, and launches your command. The result? A tarball with all the files from the doc/ dir in the Repeatr source repo.

repeating repeatr

Building Repeatr is a cool example because it brings together everything we've already demonstrated, and it's a simple, self-contained, deterministic process. The repeat-thyself script in the Repeatr source repo uses Repeatr to build Repeatr with no other dependencies; check it out!

Pipelines with Reppl

Reppl is a tool in the Repeatr ecosystem that manages hashes, making it possible to write formulas that update. Developers write formulas with names for the inputs, and Reppl automatically substitutes in the hashes. With Reppl, you can assemble whole pipelines and have many steps which update independently. Best of all, since Reppl understands Repeatr hashes, caching entire steps of the pipeline is both fast and guaranteed to be correct.

Note that Reppl is not your only option here! In fact, Reppl interfaces with Repeatr entirely by templating Formula files like the ones we've already seen in the earlier examples. Reppl is one solution to the need for updating formulas, but if it doesn't tickle your fancy, it's entirely reasonable to build your own system on top of the primitives of Repeatr formulas.

a simple pipeline

Reppl is an imperative system for stringing together formulas. Stringing together a series of Reppl commands in a bash script is common; this is what we'll demonstrate.

The most basic Reppl files look like this:

mkdir -p wares
reppl init
reppl put hash base  aLMH4qK1EdlPDavdhErOs0BPxqO0i6lUaeRE4DuUmnNMxhHtF56gkoeSulvwWNqT  --warehouse=http+ca://repeatr.s3.amazonaws.com/assets/
reppl put hash go    UoY1amg4W8_JVQJ6tg6I4BQm1Mlw3ngT_kutZNr6XfFvvWAZfGrwDxDcQD2TzOVz  --warehouse=https://storage.googleapis.com/golang/go1.8.linux-amd64.tar.gz
reppl eval step-A.frm
reppl eval step-B.frm
reppl eval step-G.frm
reppl unpack hellogopher debug/hellogopher

This is a demo from the Reppl source repo: see the rest of the files. You'll need to see the step-{A,B,G}.frm formula files themselves to see the way names are used to connect their outputs and inputs together.

This script will cause each of the three "eval" steps to be executed in their own containers. If one of them assigns a name to an output, after that formula is run, Reppl will save that name->hash mapping. If one of them later uses that name for an input, Reppl will substitute in the hash it remembers for that name.
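
The name->hash bookkeeping described above can be pictured with a tiny shell sketch. This is not Reppl's implementation or its storage format: the `remember` and `lookup` helpers, the temp-file table, and the placeholder hashes are all hypothetical, made up purely to show the idea of "latest hash wins for a name".

```shell
# Hypothetical name->hash table: one "name hash" pair per line in a temp file.
PINS_FILE="$(mktemp)"

# remember NAME HASH: record the hash for NAME, replacing any earlier entry.
remember() {
  awk -v name="$1" '$1 != name' "$PINS_FILE" > "$PINS_FILE.tmp"
  printf '%s %s\n' "$1" "$2" >> "$PINS_FILE.tmp"
  mv "$PINS_FILE.tmp" "$PINS_FILE"
}

# lookup NAME: print the hash currently remembered for NAME (nothing if unknown).
lookup() {
  awk -v name="$1" '$1 == name { print $2 }' "$PINS_FILE"
}

# A pinned input, as "reppl put hash base ..." would provide.
remember base aLMH4qK1EdlPDavdhErOs0BPxqO0i6lUaeRE4DuUmnNMxhHtF56gkoeSulvwWNqT
# An output name assigned by one step, then updated by a later run (placeholder values).
remember hellogopher placeholder-hash-1
remember hellogopher placeholder-hash-2

# A later step that names "hellogopher" as an input would get the newest hash.
echo "hellogopher -> $(lookup hellogopher)"
```

The point of the sketch is only the data flow: outputs write into the table, and inputs read from it by name.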

Things really get interesting when you run the same script again: nothing happens. Reppl looks at each formula (with the input hashes pinned) before running it; if it's seen that exact setup before, it skips it. Thus, if you run the same set of steps again, and none of their inputs have changed, the whole thing will no-op itself out, and complete instantly.
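
The skip-if-seen behavior can be sketched in a few lines of shell. This is an illustration of the caching idea only, not how Reppl is actually built: `hash_file`, `eval_cached`, and the cache directory are hypothetical names, SHA-256 is a stand-in for whatever hashing Reppl uses, and the "formula" here is just a text file.

```shell
# Hypothetical cache: one empty marker file per formula hash already evaluated.
CACHE_DIR="$(mktemp -d)"

# hash_file FILE: print a content hash of FILE (SHA-256 as a stand-in).
hash_file() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | awk '{ print $1 }'
  else
    shasum -a 256 "$1" | awk '{ print $1 }'
  fi
}

# eval_cached FILE: report "run" the first time this exact content is seen,
# and "skip" on every later call with identical content.
eval_cached() {
  key="$(hash_file "$1")"
  if [ -e "$CACHE_DIR/$key" ]; then
    echo "skip"
  else
    # A real tool would execute the formula here before recording it.
    : > "$CACHE_DIR/$key"
    echo "run"
  fi
}

step="$(mktemp)"
printf 'input pinned to hash A\n' > "$step"
first="$(eval_cached "$step")"    # never seen: runs
second="$(eval_cached "$step")"   # identical content: skipped
printf 'input pinned to hash B\n' > "$step"
third="$(eval_cached "$step")"    # an input hash changed: runs again
echo "$first $second $third"
```

Because the key is derived from the fully pinned formula, changing any input hash changes the key, so a stale result is never reused.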

The last step in this script is an "unpack": this places the latest version of the ware named "hellogopher" (in this example) in the "debug" dir. This is a convenient way to drop results of a pipeline back onto your host filesystem, composable with whatever steps you want to take next. Using Repeatr wares and content-addressable storage techniques for intermediates is still the way to go because it enables perfect caching and even parallel builds, but at the end of the day, getting the result on your $PATH in the simplest way possible is still important: a `reppl unpack` command gets it done without a fuss.

reppl gone wild

Since Reppl commands can be composed easily in bash, you can easily bind up other systems. For example, here is a (much) more complex use of Reppl, with eight formulas, and git (!) as an input.

reppl init
reppl put hash base        aLMH4qK1EdlPDavdhErOs0BPxqO0i6lUaeRE4DuUmnNMxhHtF56gkoeSulvwWNqT  --warehouse="http+ca://repeatr.s3.amazonaws.com/assets/"
reppl put hash go          UoY1amg4W8_JVQJ6tg6I4BQm1Mlw3ngT_kutZNr6XfFvvWAZfGrwDxDcQD2TzOVz  --warehouse="https://storage.googleapis.com/golang/go1.8.linux-amd64.tar.gz"
reppl put hash raceway-src "$(git rev-parse HEAD)"  --type=git  --warehouse="."
reppl eval ./radd/formulary/basis.formula
reppl eval ./radd/formulary/repeatr.formula
reppl eval ./radd/formulary/monotar.formula
reppl eval ./radd/formulary/monotar-test.formula
reppl eval ./radd/formulary/monotar-assembly.formula
reppl eval ./meta/build/raceway-linux-amd64.formula
reppl eval ./meta/build/raceway-darwin-amd64.formula
reppl put hash minikube-linux-amd64    FuU0FrKlNxIFVp37OsuYHazEMMPMZCiEilTLPYPMyjUYTYcIRnB5Ti5VOlE1VaLG  --kind=file  --warehouse="https://storage.googleapis.com/minikube/releases/v0.12.0/minikube-linux-amd64"
reppl put hash minikube-darwin-amd64   htQ5zinrGTPmgQ75Hgv7s0mOM8xRJU8M4Sjpwp5tKniEf2EEPZP92aE9yE_5sPCl  --kind=file  --warehouse="https://storage.googleapis.com/minikube/releases/v0.12.0/minikube-darwin-amd64"
reppl put hash minikube-iso            sYsWNgzuegzFnUuDifm5HGkcKlYD3dbEfBPl-uygYGsq-syGXkzlfjyAaQO9tWLu  --kind=file  --warehouse="https://storage.googleapis.com/minikube/minikube-0.7.iso"
reppl eval ./meta/build/shrinkwrap.formula

Notice how we were able to easily take a git commit hash from the surrounding environment in this example. Just pop in `reppl put hash myproj-src "$(git rev-parse HEAD)"` and you instantly have a build pipeline that will build your checked out commit! (Try this technique in a real repo, then switch back and forth between a couple of different branches to really see how awesome Repeatr caching and Reppl pipelines are!)

Ready, set...

You have read enough. Go build something wonderful!