Reproducible Resolve

Preface

It’s important to disambiguate {ware content access} vs {ware selection}. (Both of these very different operations are sometimes referred to as ‘resolving’ in other systems, which can be confusing.)

When working with hermetic computation, one important operation is going from a WareID to the content referred to by that ID.

When working with computation planning and coordination systems, one important operation is selecting which tools and which pieces of data (e.g. which WareIDs) one wants to work with.

These operations are very distinct.

What is “resolve”?

“Resolve” in the Timeless Stack is broken into two steps:

  • Translating names and version numbers into a WareID (in essence, translating a mutable reference to an immutable hash) is one step;
  • Taking an imprecise version range specifier, transitive dependencies description, or other kind of version resolution cue, and translating it into a specific version name (which can feed into the above process) is another step.

In any system which uses human-readable names, or in any system which has a concept of “updates”, we’ll end up with a concept of “resolve”.

In the Timeless Stack, we try to separate these two phases, because it’s easier to make them reproducible when handling them distinctly… but for both, we’ll see that input snapshotting is again a key part of a reproducibility solution.
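The two steps listed above can be sketched as two separate pure functions that both read from the same snapshot of inputs. This is only an illustration under stated assumptions: the catalog layout, the `select_version` and `lookup_ware_id` helpers, the range syntax, and the WareID strings are all hypothetical and are not the Timeless Stack's actual formats or APIs.

```python
# Hypothetical catalog snapshot: plain data, never mutated by resolve.
# Names, versions, and WareID strings are made up for illustration.
CATALOG = {
    "example.org/libfoo": {
        "1.0.0": "tar:aaaa",
        "1.2.0": "tar:bbbb",
        "2.0.0": "tar:cccc",
    },
}


def parse_version(text):
    """Turn '1.2.0' into (1, 2, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def select_version(catalog, name, minimum, below):
    """Version range -> specific version name (the second bullet above)."""
    low, high = parse_version(minimum), parse_version(below)
    candidates = [v for v in catalog[name] if low <= parse_version(v) < high]
    if not candidates:
        raise LookupError(f"no version of {name} in [{minimum}, {below})")
    return max(candidates, key=parse_version)


def lookup_ware_id(catalog, name, version):
    """Name + version -> WareID (the first bullet above)."""
    return catalog[name][version]


version = select_version(CATALOG, "example.org/libfoo", "1.0.0", "2.0.0")
ware_id = lookup_ware_id(CATALOG, "example.org/libfoo", version)
print(version, ware_id)  # prints: 1.2.0 tar:bbbb
```

Because neither function consults anything except its arguments, running them again against the same snapshot gives the same answer, which is the property the rest of this page builds on.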

What is “reproducible resolve”?

“Reproducible resolve” describes having a… heck, the term says it all, doesn’t it?… having a resolve process which is reproducible.

The key to attaining this is simply to identify “resolve” as an operation itself, and hold that computation to as high a standard as everything else. So: of course the Resolve operation has to be reproducible.

And then, it follows naturally, of course the Resolve operation must be handled using Hermetic Computation.

And thence it follows naturally that of course we’ll need an atomic snapshot of all of the inputs to the Resolve operation.

So: Reproducible Resolve is what happens when you take your resolve operation, have a clear mechanism for hermetic evaluation of it, and have a defined pathway for distributing snapshots of the data that resolve needs as inputs.

In the Timeless Stack, a key data structure for this is called a “Catalog”; you can read more about catalogs in the Design chapter on Catalogs.

The key concept of the Timeless Catalog is that it supplies a whole system snapshot: when we perform Resolve operations in the Timeless Stack tools (both the “version range to specific version” kind and the “name to hash” kind), all the metadata and all the inputs for Resolve processes must come from a Catalog. The Catalog used as the input to Resolve is carefully defined to be snapshottable as well as easy to replicate, and it’s even possible to address it as a WareID. This makes Reproducible Resolve possible, and makes Timeless Stack tools reliable and predictable even when working with the normally-touchy subject of updates.
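To make “snapshottable, easy to replicate, and addressable by a hash” a little more concrete, here is a minimal sketch. It assumes a catalog can be represented as plain nested data and identifies a snapshot by hashing a canonical serialization of it. The use of canonical JSON plus SHA-256, the `snapshot_id` helper, and the data shown are assumptions made for illustration; the real Catalog format and WareID scheme are defined elsewhere in the Timeless Stack docs and may differ.

```python
import hashlib
import json


def snapshot_id(catalog):
    """Content-address a catalog snapshot (illustrative scheme only).

    Sorting keys and fixing separators gives one canonical byte string per
    logical catalog, so replicas built in a different order hash the same.
    """
    canonical = json.dumps(catalog, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two replicas holding the same information, assembled in different order.
replica_a = {"example.org/libfoo": {"1.0.0": "tar:aaaa", "1.1.0": "tar:dddd"}}
replica_b = {"example.org/libfoo": {"1.1.0": "tar:dddd", "1.0.0": "tar:aaaa"}}

# A later snapshot that has learned about one more release.
later = {
    "example.org/libfoo": {
        "1.0.0": "tar:aaaa",
        "1.1.0": "tar:dddd",
        "1.2.0": "tar:eeee",
    }
}

same_snapshot = snapshot_id(replica_a) == snapshot_id(replica_b)
new_snapshot = snapshot_id(replica_a) != snapshot_id(later)
print(same_snapshot, new_snapshot)  # prints: True True
```

Recording the snapshot identifier next to a resolve result is enough to say later exactly which inputs that result was computed from.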

Why is this important?

Reproducible Resolve is part of the “Reproducible by Design”/“un-degrading” vision and the “Decentralized” goals.

In systems without reproducible resolve, it tends to be difficult to explain the state of the system at any time other than the immediate moment it was “updated”.

With reproducible resolve, we can always backtrack to the catalog snapshot we used during our last update: and using that snapshot, we can re-evaluate our resolve logic, watch what it does, and have a complete understanding of why our resolved system is in the state that it is.

Better updating

A system built around Reproducible Resolve naturally ends up with a very nice user experience during “updates”: users can fetch new information (a new Catalog, if you will), and then see what this information might recommend they do with their system; and then decide whether they want to make those changes or not.

Reproducible Resolve naturally makes for a very highly controllable process.
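The update flow described above (fetch a new catalog, see what it would recommend, then decide) can be sketched as a comparison between two resolve runs. This is a hedged illustration: `resolve_latest`, `propose_update`, the “pick the highest version” policy, and all the data are hypothetical, not the behavior of any Timeless Stack tool.

```python
def parse_version(text):
    """Turn '1.2.0' into (1, 2, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def resolve_latest(catalog, names):
    """Resolve each name to (version, WareID) using only the given catalog."""
    resolved = {}
    for name in names:
        version = max(catalog[name], key=parse_version)
        resolved[name] = (version, catalog[name][version])
    return resolved


def propose_update(old_catalog, new_catalog, names):
    """Report what would change, without changing anything."""
    before = resolve_latest(old_catalog, names)
    after = resolve_latest(new_catalog, names)
    return {n: (before[n], after[n]) for n in names if before[n] != after[n]}


old_catalog = {
    "example.org/libfoo": {"1.0.0": "tar:aaaa"},
    "example.org/libbar": {"0.3.0": "tar:ffff"},
}
new_catalog = {
    "example.org/libfoo": {"1.0.0": "tar:aaaa", "1.1.0": "tar:dddd"},
    "example.org/libbar": {"0.3.0": "tar:ffff"},
}

proposal = propose_update(
    old_catalog, new_catalog, ["example.org/libfoo", "example.org/libbar"]
)
for name, (was, now) in proposal.items():
    print(f"{name}: {was[0]} -> {now[0]}")
# prints: example.org/libfoo: 1.0.0 -> 1.1.0
```

Nothing is applied until the user accepts the proposal, and the old catalog remains available to explain the current state whether or not they do.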

(Could we have skipped this whole thing by avoiding names in the first place?)

Yes… but also no.

It’s true that if you model the universe as a set of functions (functions for build, functions for composing a system, etc.), and you apply function composition a la f(a(b(), c(d(), e()))), and all the function names are actually the operators themselves… then you can construct an awful lot of things without ever using names.

And to some extent, we can certainly do this in the Timeless Stack as well.

However, this model tends not to result in explainable “updates” in a way that’s very satisfying. Given such a tree of functions, it’s awfully hard to create clearly documented relationships (such as, say, where one version of a function is a newer and preferred replacement for another).

In some scenarios (especially, say, a “monorepo” situation), having extremely limited options for reasoning about updates is fine. However, this does not scale very well. In particular, if one wants to work with a distributed development mindset, one needs to be able to communicate the concept of changes over time to other people, and do so in a way where the other parties can have some agency over how they adopt changes. In such a situation, the anonymous functions model just doesn’t do it.

(Some systems do use the monorepo approach. The Guix and Nix systems used this approach for some time. It’s not nonviable. It’s just not optimal, either, if one’s goal is decentralizing coordination of development.)

In short: introducing naming to the system is key to the “Decentralized” goal.

For more about how we compose naming with our reproducibility goals, check out the Design chapter on Catalogs.