Semantic Errors Are the High Blood Pressure of the Software Industry

High blood pressure is a notorious health problem. Since it’s a vascular condition, it can trigger a range of issues across the body, but what really makes it dangerous is that it has no outward signs or symptoms. If high blood pressure is not proactively detected and treated, it most likely will continue unnoticed until it causes a disaster.

This post is about semantics, and how semantic errors are the high blood pressure of the software industry.

Like many terms in computer science, “semantics” comes to us from linguistics, by way of logic. It refers to the rules that define how to interpret a message (as opposed to syntax, the rules that define what a valid message looks like.) Put loosely, the semantics of a piece of data are the meaning of that piece of data; when you draw conclusions about the state of the world (or the possible states of the world) based on that data, you are reasoning about that data’s semantics.

You probably already have some intuition around what a semantic error feels like. It’s basically an “oops I made some assumptions about this data/API that aren’t really true” error. Common variants include: “oops I thought this method was idempotent but it’s not”, “oops I didn’t think users could be active in two teams at once but they can”, and “oops I didn’t think these events could happen in that order but they did”. Seasoned engineers tend to write few logic bugs (“the code doesn’t do what I intended it to do”), which means a lot of the bugs they do write are semantic bugs (“the code does what I intended it to do, but in the context of the larger system the thing I intended is wrong”.) This risk is especially high in larger projects, where there’s too much code to fit in any one person’s head, and developers often have to reason about models and interfaces owned by other teams.

In the hierarchy of things that can go wrong, semantic bugs are pretty bad, for a couple reasons:

  • They’re hard to design around. A lot of incident types have well-understood failure modes - usually[1] the end result is a node either crashing or becoming unreachable or unresponsive. This is undesirable, but it’s possible to design systems that recover gracefully from this kind of problem on their own. With semantic errors, crashing is the best case scenario. The worst case scenario is that the faulty component exhibits essentially undefined behavior; potentially lying to other components about the state of the world in a way that’s hard to detect. Bad behavior can spread through the system, causing it to corrupt data or take inappropriate action (when I worked at Gusto the worst bugs caused our app to file tax forms with bad information, leading to scary IRS notices for our customers.) Designing a system that’s robust to semantic bugs is really tough, and often involves checking multiple redundant sources of truth to make sure one isn’t feeding you garbage that happens to pass all your validations (this may not always be possible!)
  • They can be hard to catch with tests, if only for the reason that you’re apt to make the same faulty assumptions writing your test cases that you made writing your code.
  • They’re pernicious. Sometimes your wrong assumptions may only be wrong under very specific conditions, and you may be (un)lucky enough that it takes months for those conditions to occur in production. This can be maddening - when bugs occur directly after deploying a code change, you can at least point to a probable culprit, but when these “sleeper” bugs hit it feels like your happy, stable system keels over and dies for no reason. What’s worse, since so much has happened since the bug was checked in, it’s very difficult to figure out where and what it is, when it was deployed, and who wrote it - especially since everyone is almost certainly working on something completely different than they were a few months ago.

For most classes of bug, it’s advisable to find a balance between preventing incidents altogether and rapidly diagnosing/resolving them when they do occur. Semantic errors, however, are so costly and unpredictable that it’s almost always better to keep them from making it to production in the first place.

Fundamentally, semantic bugs are breakdowns in communication. Someone (maybe a past version of you!) designed some part of the system with a certain set of semantics in mind, and either those semantics were not conveyed to you, they were and you misunderstood them, or - most commonly - a combination of both. A lot of advice on preventing this boils down to “improve your general communication skills” (have explicit documentation, do regular spec reviews, write as much institutional knowledge down as you can, etc.) If you did all these things and semantic misalignment is no longer an issue for your org, that’s awesome! You can stop reading here (but you’re welcome to continue if you want.) Otherwise, press on, and we’ll look at some architectural techniques you can use to make your codebase naturally resistant to semantic bugs. Along the way we’ll make the case for this overarching design principle:

In general, the most effective way to preclude semantic errors is to give every team the flexibility to model the world in a way that makes sense to them.

I’ve done my best to lay this post out along a smooth gradient from the practical to the theoretical. While the earlier sections include some abstract discussion to set the stage and tie everything together, for the most part they provide tactical advice that you can deploy right now, with minimal disruption. Model Networks (which this post introduces) are motivated and described at around the midpoint, along with some of their immediate applications. The final sections investigate the effect Model Networks could have on the way we write software, and consider some best-case “moonshot” scenarios that assume sizable time scales and significant adoption. These are just thought experiments, but whenever possible they make concrete and specific predictions about what aspects of everyday software engineering might change and how.

If you’re mainly interested in ideas you can use today, you can stop reading after you finish Part 1. If you’d also like to hear some big-picture theories about what the future might look like, you should read this whole post!

Part 1: Architecting for Semantic Safety

Our goal, in a nutshell, is to ensure that when an engineer works with a piece of data or an API, they can be confident the assumptions they’re making about it are true (and they also have an accessible way to clear up anything they’re unsure about.) This is a moving target - we also need a way for people to announce when they change the semantics of something, and a way for folks downstream to identify and fix any resulting breakages. 

We could always make a bunch of documents that contain all the context on our program that anybody could need, but we’re looking for an architectural solution - a way to structure our code so that it’s easy for developers to avoid misunderstandings in the first place. Where do we start, though? What parts of our program should we be focusing on?

I contend that we should focus on the Model Layer. To build software in a semantically safe way, we need to be very intentional about the way we do our data modeling, because in most programs models function as semantic repositories. I’ll explain what I mean:

Models Are Semantic Repositories

Did you know that the Navy has its own miniature ocean? Well, relatively miniature: it’s as big as six Olympic-sized swimming pools put together and holds twelve million gallons of water. When the Navy wants to test a new ship design, they’ll build a tiny version of it and place it in the pool, which has sophisticated actuators that can recreate any wave pattern you might find in the real ocean. Then they’ll study what happens.

The Navy’s mini-ocean (taken together with the mini-ships that brave it) is an example of a model. A model captures some targeted features of reality (in this case, the behavior of certain ships in various wave patterns) while leaving out features that are irrelevant or impractical (the colossal size, weight, and cost of a real-life battleship.) It’s not always necessary to use models to study the world like this, but when you can it’s usually advisable. For example, the Navy could simply build their ships, drop them in the ocean, and observe the results (the naval equivalent of testing in production) - but this would be a costlier and riskier way to get the same information their ocean model could have given them ahead of time. 

Every model is a simplified version of the real world, and therefore inherently lossy (to paraphrase George Box: “All models are wrong, but some are useful.”) On the flip side, this same simplicity can make models straightforward to study and reason about. The key to constructing an effective model is not to make it perfectly faithful to reality, but to make it faithful only in the ways that are important to you.

Models exist in software too, and unlike physical models they’re never optional: there is no way to embed real-life objects into your code (despite what Tron may suggest), so if a program needs to know about or manipulate some aspect of the world, it must create a model that captures it.

I’m using a fairly broad conception of a model in this post, under which models describe not only the shape of objects, but what they can do and what you can do with them (i.e. they include “verbs” in addition to “nouns”.) This blurs the line a bit between models and interfaces - note that under this definition you could loosely consider APIs public-facing models.
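
As a tiny (and entirely hypothetical) illustration, a model in this broader sense might look like this - the data fields are the “nouns”, and the methods are the “verbs”:

// A hypothetical model that includes verbs as well as nouns.
type InvoiceStatus = "draft" | "sent" | "paid";

interface Invoice {
    // Nouns: what an invoice is.
    id: number;
    customerId: number;
    amountCents: number;
    status: InvoiceStatus;

    // Verbs: what you can do with one.
    send(): Promise<void>;                  // draft -> sent
    markPaid(paidAt: Date): Promise<void>;  // sent -> paid
}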

Part of effective software design is knowing what you want your models to represent and how - you usually have a surprising number of options to choose from. For example, let’s say you were at a company that needed to keep track of billing events. Here are two simple models you could use (notated as TypeScript data types - we’ll assume a pseudo-ORM paradigm where the types correspond to SQL tables); let’s say these belong to Service 1 and Service 2:

// Service 1’s Models:

enum EventType {
    Click,
    Impression,
    BillableHour,
    DataTransfer,
    APICall,
    // others...
}

enum PayeeType {
    Individual,
    Business,
    Affiliate
}

type BillingEventCounter1 = {
    windowStartTime: Date,
    type: EventType,
    totalHourlyCount: number, //non-negative
    customerId: number,
    payeeId: number, 
    payeeType: PayeeType, 
    eventSources: Array<EventSourceJoin>
}

type BillingRates = {
    type: EventType,
    rate: number, // amount billed per event of this type
    effectiveStartDate: Date,
    effectiveEndDate: Date
}

type EventSourceJoin = {
    billingEventTime: Date,
    billingEventType: EventType,
    customerId: number,
    payeeId: number,
    sourceId: number
}

// Service 2’s Model:

type BillingEventCounter2 = {
    counterId: number,
    startTime: Date,
    durationMs: number,
    sourceIds: Array<number>,
    customerId: number,
    billerId: number,

    clickCount: number,
    impressionCount: number,
    viewCount: number,
    engagementCount: number,

    clickBillingRate: number,
    impressionBillingRate: number,
    viewBillingRate: number,
    engagementBillingRate: number
}

We can notice some things about these models:

  • BillingEventCounter2 has a duration field, and BillingEventCounter1 doesn’t. BillingEventCounter1 has a field called totalHourlyCount, so from that we can guess that BillingEventCounter1 always covers a period of an hour, while the period that BillingEventCounter2 covers is adjustable. The second approach is more flexible and allows us to adjust our measurement granularity dynamically, without having to make any changes to the model. The first approach, meanwhile, can be more expedient in some cases - knowing that each counter always covers an hour may help to simplify downstream calculations.
  • The type of the event is specified in a field in BillingEventCounter1. BillingEventCounter2 instead has hardcoded fields for four different types of events. By using the second approach we can track multiple types of events with a single object, but we can’t add new event types or remove existing ones without changing the underlying table structure. The second approach is better suited for cases where we know that the possible types of billing events are pretty much set in stone.
  • payeeType is specified in BillingEventCounter1 (i.e. it’s a polymorphic association), but not in BillingEventCounter2. From this we can assume that Service 1 has different models for different types of payees and Service 2 probably doesn’t care about those distinctions (it’s also possible that Service 2 just models them in a different way - for example, with a single model for all payees that has a payeeType field on it.)
  • Billing rates are specified in a separate table in the first case, and are inlined into event counters in the second case (in the table this would be something like a JSON column.) The design of BillingEventCounter2 suggests a software architecture where the current billing rate is injected into the event counter when the counter is persisted. Service 1’s approach requires some fairly complicated logic to calculate the total amount billed over a certain period or periods (see the sketch just after this list), but it’s also a little safer and more flexible than Service 2’s approach. For example, back-dating a change in the billing rate for Service 2 requires manually updating a lot of event counters, which, depending on the scale, may not be practical - meanwhile for Service 1, we just have to update a single row in the BillingRates table. Service 2 may also be susceptible to race conditions and timing bugs that can skew the billing calculations. If we know for a fact billing rates will never change after the fact, and we can tolerate possible errors in our rate calculations, Service 2’s approach could save some time - but otherwise Service 1’s approach is probably better.
  • BillingEventCounter1 and BillingEventCounter2 can both reference multiple sources, but BillingEventCounter1 uses a join table and BillingEventCounter2 has a simple array of IDs (in this case it’d be stored in something like a JSON column.) This may seem like a matter of taste, but details like this can offer clues about how the rest of the app might be designed. For example, since the EventSourceJoin table is decoupled from BillingEventCounter1, Service 1 is probably more likely than Service 2 to have a way to get all of the events for a particular source.
  • BillingEventCounter2 has a single numeric primary key, which is probably generated when the event is stored. BillingEventCounter1 has a compound primary key that consists of the event timestamp, event type, payee ID, and customer ID.
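
To make the billing-rates point concrete, here’s a rough sketch of what Service 1’s “total billed” logic might look like (the function and the exact rate-matching rules are hypothetical; the point is that each counter has to be joined against whichever rate was in effect at the time):

// Hypothetical sketch of Service 1's billing calculation. Each hourly
// counter is matched against the rate that was effective for its event
// type at the time the counter's window started.
function totalBilled(
    counters: Array<BillingEventCounter1>,
    rates: Array<BillingRates>
): number {
    return counters.reduce((sum, counter) => {
        const rate = rates.find(r =>
            r.type === counter.type &&
            r.effectiveStartDate.getTime() <= counter.windowStartTime.getTime() &&
            counter.windowStartTime.getTime() < r.effectiveEndDate.getTime()
        );
        // If no rate is on file for this window, we (arbitrarily) bill nothing.
        return sum + (rate ? counter.totalHourlyCount * rate.rate : 0);
    }, 0);
}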

To emphasize - neither of these models is necessarily better or worse than the other! They are both suited to different contexts, different applications, and different requirements.

As I mentioned before, models inherently carry a lot of semantic information with them; some of this is conveyed via what details the modeler chose to include and how they’re structured, some of it is conveyed via language level constructs (types, validations, required fields), and a lot of it is conveyed informally via comments (in the ideal case - in the non-ideal case it just lives in the heads of developers!) 

For example, here are some of the things BillingEventCounter1 communicates to us:

  • As we pointed out before, totalHourlyCount tells us that each BillingEventCounter1 aggregates events over a period of one hour. Ideally in a real project there would be a comment confirming this for us as well.
  • We can see that there is a many-to-many relationship between event counters and sources, which implies that an event counter can aggregate events from multiple sources. There does not appear to be a way to tell how many events came from each source.
  • We can tell that the EventSourceJoin object joins together BillingEventCounter1 with an unlisted Source object, and we’d expect it to contain primary keys for both of the objects it joins together. sourceId is clearly the primary key for the Source object; all the other fields on EventSourceJoin are also present on BillingEventCounter1. This suggests that BillingEventCounter1 is uniquely identified by:
    • The timestamp of the counter
    • The event type the counter is tracking
    • The customer ID
    • The payee ID
Notably, the payee type is not part of BillingEventCounter1’s primary key. Among other things, this suggests that payees with different types will always have different IDs.
  • There’s a comment that tells us that totalHourlyCount cannot be negative; from this we can assume that it isn’t possible to correct billing event counts by making a new counter with a negative totalHourlyCount (it also suggests, but does not prove, that count corrections are not supported in general.) Note we can also infer this from the primary keys of BillingEventCounter1, which imply that you cannot have two counters for the same hour, event type, customer, and payee.

In practice, developers tend to treat models as a system of record for the semantics of their data and operations. When trying to figure out what a piece of software does or means, studying the models is usually a great place to start.[2]

We want engineers to have the context they need to make the right assumptions when building software, which means we need to do our modeling in a way that effectively communicates semantic information to stakeholders across the organization while leaving minimal room for misinterpretation. Let’s look at some options!

Model Management Strategy 1: The Null Strategy

The Null Strategy represents an absence of technical strategy around model management. Organizations using the Null Strategy don’t have a consistent way to organize their models; they may not think of model management as a priority, or even realize that it’s something they could prioritize. They may not have a concept of semantic errors as a separate class of bug from everything else, and if they do they probably don’t have strong opinions on how to prevent them, apart from generally trying to keep everyone “on the same page.” Teams create and update models as they need them, but there’s no formal notion of ownership.

Before I say anything more, I want to make it clear that the Null Strategy isn’t always the worst strategy. In fact, before you hit a certain number of engineers it’s probably the best strategy - I think really drilling down on your models (beyond common-sense things like getting your types and associations right) pays off when you start forming truly distinct teams with formalized interactions between them. Focusing on model management and model design when your whole system still fits in a single person’s head is like starting with microservices from day one - the cost/benefit ratio is probably going to be very high.

There’s another fairly special case where the Null Strategy can be preferable even in the long-term, and that’s when your system is complex, but your semantics aren’t. This occurs in cases where your engineering challenges are about handling logistics for gigantic amounts of relatively simple data (e.g. social networks, adtech). Model management pays off the most in the opposite situation, where you have a smaller amount of more sophisticated data (Jack Danger calls this “Small Data Engineering”.)

Despite these special cases, at a certain size and domain complexity the Null Strategy usually starts to break down pretty quickly. There are a few signs this is happening:

  • Your models start filling up with stuff that isn’t obvious. It’s kind of subjective what this means, of course, but you’re past the point where anyone on the team could crack open any model and figure out how to use it just from context and a general understanding of how the app works. Silos of domain knowledge are starting to form in your org, and there are things in the models that are hard to understand if you weren’t in the room when they were being spec’d out. The amount and quality of documentation varies a lot. People are getting nervous about using fields and entities they didn’t add themselves, which is leading to some redundancy and bloat.
  • You have a “tragedy of the commons” situation with your larger models (this is a classic symptom.) This tends to happen with models that are nominally owned by one team, but in practice used by everyone (things like Company, Customer, Document, etc.) Everyone is adding their own stuff to these “mega-models”, so they’re huge, but no one is equipped to curate them, so it’s not apparent who added what or why (this means if you’re confused about something, there’s not a good way to find out who to ask beyond “git blame”.) It’s also very hard to tell who depends on different parts of the models, and because dependencies can’t really be audited, they aren’t controlled. People add dependencies whenever it’s convenient, which means that everyone ends up depending on everything, big-ball-of-mud style. This is causing your mega-models to ossify, because changing them usually entails a nasty product-wide refactor.
  • There’s a lot of data munging going on in random places. Engineers are frequently having to deal with data that’s awkward for them to work with because it was structured by a different team for a different use case; in the absence of any formal policy they’re defaulting to the most sensible and expedient option, which is to massage the data into a more palatable format at the exact point they need it, either at the top of the file or function they’re working on, or in the nearest util package. One-off transformations are starting to pile up around the codebase (and there’s a lot of duplicate logic.) There’s so much of this extra machinery, in fact, that it’s beginning to have a small but material effect on development speed.
  • The general effect of all of this is that things just get slower, sometimes because changes take more mechanical effort, but often because developers are afraid to use a model or API until they can track down the person who wrote it and confirm that the way they think things work is the way things actually work. The really bad part of all this is that, despite the caution, bugs are still slipping through the cracks and causing meltdowns in production. It’s kind of the worst of both worlds.

Okay, so we’ve decided the Null Strategy isn’t working and we need to start paying more attention to our models. Now what?

Model Management Strategy 2: Centralize Everything

When any horizontal aspect of a software org feels like it’s getting too chaotic, the classic playbook is to centralize it by splitting it out into its own service and/or team (it’s also common to use vendors[3] for things that aren’t core competencies.) For example, you’re probably familiar with centralized auth platforms like Stytch or Auth0, centralized secrets management platforms like Vault, and centralized error tracking & telemetry platforms like Sentry (your org may also have a bunch of internally-developed services that are more specific to your needs, especially if you have a Platform or Developer Experience team.)

Centralization is popular because a lot of the time it’s really effective. Some benefits of centralization are:

  • Standardization: keeping everyone on a single platform dramatically drives down the amount of technical volatility in the org. If ten teams are using ten different platforms to handle a particular concern, that’s ten upgrade paths to consider, ten times the amount of potential bugs and vulnerabilities, and ten different feature sets to design around - plus, engineers who want to move to a different team will have to learn a new platform. Aligning everyone to a single system can be a huge efficiency win.
  • Auditability/Transparency: as you grow it becomes important to be able to see what’s going on in your org, not only for compliance reasons, but also just to get an idea of what’s happening so you can dig into issues, see what’s working, and identify areas for improvement (think of this as “organizational telemetry”.) It can also be useful for frontline employees to have a single place to look when they’re searching for e.g. where to put secrets; that way they don’t have to dig through a bunch of code/wikis.
  • Compliance: there are certain administrative tasks that every org of a certain size needs to do for security reasons - things like (de)provisioning, scoping, permissions, and audit trails. Having one centralized platform means the administrator in charge of these things only has to deal with one system. Plus, it’s easy to ensure that a single system (as opposed to ten) has all the features you need to stay compliant.

The second model management strategy we’ll consider is a straightforward application of this centralization principle. Organizations that follow this strategy typically have a “universal model” stored in a single place and intended for use by the whole company. This model is extensively documented (sometimes even with tags for things like PII/PHI) and serves as a system of record for a single set of semantics that everyone shares. It’s intended to prevent semantic errors by acting as a “common language” - think of it as the Esperanto of data modeling. 

Sometimes there’s a team responsible for curating the model and updating it as needed in response to client requests. Other times there are guidelines on how to make changes, perhaps paired with a blocking review process. At minimum, someone somewhere is accountable for making sure the model stays clean. Centralized modeling can feel like a natural fit when you’re developing a monolith (I’ve also heard of people using it in conjunction with microservices.) Data teams also often implement this pattern on top of a data warehouse or data fabric.

As it happens, I was one of the founding members of Gusto’s data team, and for a while complete centralization was the goal behind our first data warehouse. I can attest from personal experience that this kind of vision is very difficult to pull off, for a number of reasons:

  • It’s easy to sell the concept of a universal model; nobody likes ambiguity and no one will second-guess you if you tell them you can make that go away. Selling the reality is different, because standardization requires a sustained commitment from the whole company before it really starts to work. This puts every team in a difficult kind of prisoner’s dilemma, where they can prioritize investing in the central model (over other pressing concerns!) in exchange for some theoretical future payoff, but if other teams flake out and the project fails they’ll have wasted a lot of time. Successfully executing a centralized strategy requires a lot of political capital - enough to get a critical mass of teams enthusiastically onboard. Failing that, you’re going to be processing everyone else’s exhaust - teams will make decisions and then (maybe) tell you about it, and you’ll have to react as fast as you can. This is not a good place to be.
  • Even with a central data model, teams are reflexively going to make incorrect assumptions about the data. Your product team is going to think “Monthly Active Users” means one thing and your finance team is going to think it means another. Your growth team will think “onboarding_completed” means one thing and your risk team will think it means another. This isn’t the teams’ fault: they have a lot of work to get done and simply don’t have the time to pore over your meticulously crafted docs every time they want to write a query; on top of that they’re having to constantly fight the instinctual urge to rely on the model they have in their heads (which is built up from years of domain experience) instead of the central model they’re supposed to be using. At some point, even the most diligent team will get rushed and distracted, and then they’re going to make mistakes.
  • You can’t realistically enforce that all the code, reports, etc in the company always stay locked to the central model. I mean, you can, if you’re okay with adding huge amounts of drag and tacking on another stakeholder to every single spec review. But seriously, you can’t - you can’t ask everyone to do a refactor whenever you want to make changes, and when someone wants to build something you can’t slam on the brakes to ruminate on how their new models fit into the company-wide picture. If you try, you’ll probably just get left behind (see point #1).
  • It’s hard to build a data catalog that gives users an idea of what’s available without degenerating into an intimidating mass of columns. As mentioned, people can’t afford to spend that much time or energy leafing through your docs; if they feel overwhelmed they’ll just wing it, or they’ll ask you to guide them (the latter happened often at Gusto; we were happy to help when we could, but we did not have much manpower.) The best mitigation I’ve found for this is, ironically, to have one schema per team - but that’s precisely what we were trying to avoid by centralizing!

All this said, if you feel like you’re in a position where you can pull centralization off, then go for it! Done under the right conditions it can work really well - but in most cases there are simply too many organizational forces working against it for it to be tenable. I think of it more as a specialty approach.

So much for a centralized strategy, then. Let’s look at the other end of the spectrum.

Model Management Strategy 2.5: Team-Based Models

In certain respects the pendulum has been swinging away from centralized systems for a while, leading to popular decentralized approaches that let individual teams keep control over their own technology decisions in the name of moving fast. Probably the most famous example is the microservices architecture, in which each team owns and operates a tiny, discrete service, complete with a separate repo and a separate deployment process. Teams are not bound by the architectural choices of their peers; they are free to use the stack that works best for them. The closest analogue in the data realm is the data mesh architecture. 

Two points to make about decentralized approaches:

  • They’re not completely decentralized. Complete decentralization would amount to everybody doing whatever they wanted, which is just the Null Strategy. Instead, there is usually some infrastructure that acts as “connective tissue”, centralizing certain aspects of the architecture. Microservices, for example, are typically built on top of a centralized orchestration platform like Kubernetes, and often use a service mesh like Istio to standardize aspects of intra-service communication like TLS, load balancing, and rate limiting.
  • There’s usually some flexibility in what axis you choose to decentralize on and how. The “pure” form of microservices has services all running in their own containers and all separated by a network boundary. There are alternative formulations where the codebase is actually still a monolith and the “services” are components in a single process, and of course there’s a whole spectrum of approaches in-between. People have found success with all of these - the important thing is to decompose your codebase into well-defined modules (each owned by a single team), and to have explicit boundaries and contracts between them, whatever those may be.

The decentralized form of model management involves breaking up your domain model and having different teams own the different parts. This can also involve decomposing individual models into specialized components, especially if you have the kind of huge “mega models” discussed in Section 1. The team that owns a particular model is responsible for making it clean and legible, both to themselves and to the rest of the org. 

In the microservices approach, each service is responsible for storing the data for the parts of the model it owns in its own database and allowing other services to retrieve that data via an API. Most of the time this practice of model splitting is thought of as an implicit part of moving to microservices, but it’s also called out in Domain-Driven Design via the notion of “Bounded Contexts.”

If you’re a fan of Domain-Driven Design, you’ve probably been waiting for me to bring it up since the moment I started talking about models and semantics (and if you don’t know what Domain-Driven Design is don’t worry about it.) This post indeed references concepts that are part of DDD (and I’ll point those references out), but I’m agnostic when it comes to the framework as a whole. You can take the techniques discussed here and apply them in the context of a DDD deployment, or you can just use them on their own!

You’ll notice I’m calling this strategy 2.5 and not strategy 3. That’s not because I think it’s bad! On the contrary, it addresses a lot of the issues we had with the Null Strategy while dodging the pitfalls of centralization. I do think it’s incomplete, though, since it still requires developers to architect their software around models that are owned and designed by other teams, which leaves the door open for mistakes and misunderstandings. In the next section we’ll look at a straightforward way to address this that’s easy to layer on if you’re already using team-based models.

Model Management Strategy 3: Point-to-Point Translations

I’ve suggested several times so far that it’s dangerous for one team to directly reference another team’s models. This might seem like an odd claim to make. How else would one consume data from another team? And who better to model a dataset than the team that owns it?

To understand where the potential problems lie, we need to back up a bit. This article started with a brief introduction to models. We talked about both physical models and code-based models, but there’s another category I didn’t mention, which is mental models. These are pretty straightforward: human brains can’t perceive all facets of reality directly, so we create informal representations in our heads that we can reference and manipulate, both to make sense of the present and to try and predict the future. Like other kinds of models, mental models are imperfect proxies of reality, and the “proper” choice of details to include or omit depends a lot on context. For example, if you’re thinking about where to pick up groceries on the way home from work, the “grocery store” model in your head probably has location information and some coarse information about food selection (e.g. “is this a specialty store; does this store have my favorite snack”, etc.) On the other hand, if you run a fancy restaurant and you’re thinking through suppliers, your “grocery store” mental model probably has selection information, but also details on quality, price, reliability, and any preexisting business relationships with the owners.

We as software developers focus a lot on data models, but we also rely heavily on mental models (usually semi-consciously) to represent and reason about our users and their data[4] - especially in the ideation stage, before we’ve written any code. When an engineer first conceptualizes a feature, they start by thinking (for example) “okay, to do this I need to get the status and total runtime of each of the logged-in user’s jobs and wire it into the dashboard” - they don’t think “I need to pull currentUser and traverse the jobs association; then for every job I need to look at the tasks array, and for every task I need to look at the taskStatusInfo object and pull runTime and status, then I need to subtract idleTime from runTime in each case…” They do have to eventually do all of these things, of course, but nobody uses literal data models in the brainstorming phase - mental models are simply more convenient tools of thought (unless you have a photographic memory, but I’m going to argue that’s a special case.) A big part of software development is taking your abstract plan that uses mental models and figuring out how to rewrite it in terms of the models you have in your codebase; depending on your workflow, this could happen during speccing, in real-time when writing code, or some combination of both. This “model transcription” step isn’t often talked about, but it should be, because it’s a surprisingly tricky process - and a major vector for semantic bugs.
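
To make that transcription step a little more tangible, here’s a rough sketch of what the dashboard example above might look like once it’s been rewritten against the actual data models (all of the types here are hypothetical stand-ins, just to show the shape of the work):

// Hypothetical stand-in types for the models the codebase actually has.
type TaskStatusInfo = { status: string; runTime: number; idleTime: number };
type Task = { taskStatusInfo: TaskStatusInfo };
type Job = { id: number; tasks: Array<Task> };
type User = { jobs: Array<Job> };

// Mental model: "get the status and total runtime of each of the
// logged-in user's jobs." Transcribed into the data models:
function dashboardRows(currentUser: User) {
    return currentUser.jobs.map(job => {
        const infos = job.tasks.map(t => t.taskStatusInfo);
        return {
            jobId: job.id,
            // One status per task; how these roll up into a single "job status"
            // is exactly the kind of detail the mental model glosses over.
            taskStatuses: infos.map(info => info.status),
            // "Total runtime" turns out to mean runTime minus idleTime, summed.
            totalRuntime: infos.reduce((sum, info) => sum + (info.runTime - info.idleTime), 0),
        };
    });
}

None of this is difficult, exactly - but every line is a place where the transcription from mental model to data model can quietly go wrong.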

When we look at the whole picture - considering the mental models involved as well as the data models - we can more clearly see the risk involved in sharing models across teams.


Imagine this situation: the Catalog Team and the Inventory Team are both teams at a Vertical SaaS startup that makes software to manage bookstores. The Catalog Team owns a catalog display service that allows bookstores to put their catalogs online. The Inventory Team owns an inventory tracking service that allows bookstores to keep track of what books they have in their warehouse. Let’s say that since the Inventory Service serves as the “system of record” for books, the Inventory Team owns a Book model that many teams (including the Catalog Team) reference in their codebase. There are various problems that can arise from the impedance mismatch between the Catalog Team’s mental model of a book and the Inventory Team’s actual Book model. Let’s look at some failure modes:

  • Failure Mode #1 (the Catalog Team and the Inventory Team have two concepts that have the same name but mean different things): The Inventory Team’s Book model has a rating field that ranges from 1 to 5. This field represents the quality of the book - the quality of the cover, binding, etc. It’s intended to be used by the bookseller to track average book quality across shipments - if a supplier is delivering poor-quality books, that supplier can be flagged and potentially replaced.

    The Catalog Team, meanwhile, doesn’t have a concept of physical quality like the Inventory Team does. The term ‘rating’ exists in their mental model of a book, but it refers to the reader-generated rating of a book.

    Let’s say that the Catalog Team currently cross-references Amazon to get the ratings for the books in the catalog, but there’s a new engineer on the team that doesn’t know that. One of his first big features is to add search to the catalog, indexing books by title, description, and rating (this guy’s pretty senior, so everyone just trusts him to figure it out.) The first thing he does is look at the Book model for the fields he needs to index; he finds title, description, and yes - rating. “Cool”, he thinks, “I’ve got everything I need.” (The Inventory Team didn’t document rating especially well because they felt it was self-evident. Remember, in their mental model of the world, “rating” only means a single thing.) After a quick once-over by a tired reviewer, our developer’s bug slips into production. Books are now being indexed by their physical quality, not their user-generated rating. Because books with high physical quality are more likely to be highly rated by readers, it takes a long time for anyone to notice that the ratings of books on the search page don’t always make sense.
  • Failure Mode #2 (the Catalog Team has a concept that the Inventory Team doesn’t have): The Catalog Team has a concept in their mental model of ‘in-stock’ copies of a book - the number of in-stock copies is the number of remaining copies that the customer is able to order. The Catalog uses this field to display a warning that says ‘only <x> copies left!’ when the number of in-stock copies is low.

    The Inventory Team doesn’t have a concept of ‘in-stock’ copies. The closest thing it has is a field called quantity - but quantity only tracks how many books are physically in the warehouse. This could be strictly greater than the number of in-stock books if, for example, some of the books in the warehouse have been pre-ordered, or are part of a shipment that hasn’t gone out yet. To calculate the total number of in-stock copies, you’d have to pull information both from the Inventory Service, and from whatever service tracks order information (i.e. the Order Service).

    This puts the Catalog Team in a bit of a bind: they need to make it obvious that quantity does not track in-stock quantities, because if anyone on the Catalog team mixes those two up it could result in really bad outcomes, like someone being allowed to place an order for a book that wasn’t actually available to order. One solution would be to add an inStock field to the Book model, but the Inventory Team doesn’t like that solution because they don’t want to pollute their model (on its own, the Inventory Service doesn’t have a reason to track in-stock copies; it only cares about what comes into and leaves the warehouse). 

    Since the Catalog Team isn’t able to modify the Book model, they need to have extra machinery on their end to pull from the Order service, and then calculate the inStock field themselves (there’s a sketch of what this might look like just after this list). Then, they have to find a way to clearly communicate to everyone that they should not refer to the Book model for this particular field. This ends up being a pretty awkward situation.
  • Failure Mode #3 (the Inventory Team has a concept that the Catalog Team doesn’t have): The Inventory Team has a title field on its Book model, but it also has separate optional fields for various metadata about the book - things like the edition of the book, the series that the book belongs to, and whether the book has been signed. On the Catalog side, all of this metadata is dynamically embedded in the title when the book is displayed, so a full title looks something like “A Wrinkle In Time, First Edition Hardcover (Book 1 of the Time Quintet) [Signed by the Author!]” What this means is that there are no dedicated spaces for metadata in the catalog; for better or for worse, in the Catalog Team’s mental model, a book doesn’t have an edition, etc. - it only has a title.

    Now, let’s say that someone on the Catalog Team has a ticket to build out a “New Arrivals” banner on the top of the catalog homepage. This person sketches out a feature in their head using their mental models - they need to make some new space on the homepage, take the top n books ordered by arrival date, and then, for each book, display the book’s picture, author, and title. They build and deploy their feature, which pulls all of the information it needs from the Inventory Service - including the title field. A couple weeks later, bookstore owners write in to complain that the exciting new Special Edition they just ordered is showing up in New Arrivals, but the title makes it look like a normal book. Our developer forgot to embed the proper metadata in the book titles, because that metadata wasn’t part of their mental model.
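
Here’s the kind of extra machinery Failure Mode #2 forces onto the Catalog Team - a hypothetical sketch, with invented names, of deriving ‘in-stock’ copies from data owned by two other services:

// Hypothetical sketch: the Catalog Team deriving "in-stock" copies, since
// the Inventory Team's quantity field only tracks what's in the warehouse.
type InventoryBook = { id: number; quantity: number }; // copies physically in the warehouse

interface OrderService {
    // Copies already promised to customers (pre-orders, unshipped orders, etc.)
    reservedCopies(bookId: number): Promise<number>;
}

async function inStockCopies(book: InventoryBook, orders: OrderService): Promise<number> {
    const reserved = await orders.reservedCopies(book.id);
    // "In stock" = what a customer can still order, which is not the same
    // thing as what's sitting in the warehouse.
    return Math.max(book.quantity - reserved, 0);
}

Logic like this tends to get copy-pasted wherever it’s needed unless there’s a designated place for it to live.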

The moral of the story: even a clean, well-architected data model can be hard to use if it doesn’t line up with the mental model of the world that lives in the user’s head.


Now that we’ve covered the problem in detail, let’s think about a solution. Two sections ago I compared centralized models to Esperanto, since they’re intended to be a “common language” for your company. Esperanto has a lot of passionate advocates and is perhaps the most successful constructed language in history, but it has not achieved its stated goal of becoming a universal standard of communication[5], and probably never will.[6] There are no truly universal common languages - the closest thing we have is English, but even English speakers (native and otherwise) comprise just under one-fifth of the global population. The historical way we’ve worked around this is to use lots and lots of translators - special individuals that are proficient enough in two or more languages to cleanly port the semantics of a piece of text between them. 

The translation approach works because: 

  • For each individual person involved, the scope of the problem is small enough to be tractable. No one - not even the most prodigious polyglot - knows the ins and outs of every language on Earth. On the other hand, most people have at minimum a lot of instinctual knowledge about their native tongue, and while developing expertise in a new language takes patience and dedication, it’s certainly achievable (especially if you have a chance to practice every day.) 
  • Nobody is forced to learn and use a second language except for the translators, who are in the vast minority. The vision behind Esperanto was for everyone to use it as a second language, which is simply too large a project to be feasible.

The “native language”[7] of an engineering team is the model of the world they keep in their heads. This is the model they use to think and talk about their domain, and it may or may not align with the model they use in their code.

It’s a tall order to ask anyone to be fluent in the language of every team at the company, but we know that every team is an expert in their own language, because they use it every day! This is the foundation of the translation-based approach to model management.[8] The strategy is:

  • Every team owns their own model layer. The data models in a team’s model layer should align with that team’s mental models as much as possible. It should also cover all of the data that team uses - not just the data the team owns!
  • When two teams want to share data, they should work together to craft an appropriate translation function between their two model layers (see the sketch just after this list). That way when data moves between them it can be correctly transformed to fit the receiving team’s mental model (this technique can also be used when the receiver needs to stitch together data from multiple sources).
  • It’s fine for a team to directly import data models wholesale, as long as those models don’t clash with the team’s mental models.
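
Translation functions can be as plain as this - a hypothetical sketch, going back to the bookstore example, of how the Catalog Team might translate the Inventory Team’s Book model into their own model layer:

// Hypothetical sketch of a translation from the Inventory Team's model
// into the Catalog Team's model layer.
type InventoryBook = {
    id: number;
    title: string;
    rating: number; // physical quality of the copies, 1-5
};

type CatalogBook = {
    id: number;
    title: string;
    physicalQuality: number;     // renamed so it can't be mistaken for a reader rating
    readerRating: number | null; // comes from a different source entirely
};

function toCatalogBook(book: InventoryBook): CatalogBook {
    return {
        id: book.id,
        title: book.title,
        physicalQuality: book.rating,
        readerRating: null, // filled in later by the Catalog Team's own ratings pipeline
    };
}

Even a rename-only translation like this one encodes a semantic decision - the Catalog Team never has to remember that “rating” means something different upstream.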

Using this approach, teams can still write services/components/etc. that integrate cleanly, but everyone gets to develop using models that align with their intuition! 

Note that the point of this approach is to have one Model Layer per “domain”, and to bridge the gap between the Model Layers by writing translations. The implementation details are flexible! The Model Layers in question here could all live in individual microservices, but they could also correspond to groups of microservices, or they could live in different components that are all part of a single monolith.

Let’s look at a practical example:


Imagine an org that writes software for the operations team of a major commercial brewer. A big part of the job is monitoring the brewer’s giant fleet of fermentation tanks, which are all equipped with “smart gauges” that monitor pressure in real time and send it to the ops platform. We’ll be focusing on two services:

  • The first is a unified configuration service that is responsible for managing sensor configuration in a single place and pushing config changes to the individual sensors. The brewer has been upgrading sensors as they wear out, and different sensor models work better with different types of tanks, so the fleet has a wide variety of different sensors that all need to be configured in subtly different ways. Most of the sensors measure using psi, but a few of them - especially the older ones - use different units, like bar, kPa, etc.
  • The second is an alerting and visualization service that ingests time-series sensor data, displays it, and alerts when pressure is too high or low.

For ease of use, the configuration service allows users to set alerting thresholds for different classes of sensor and tank. This is represented using a threshold field. To avoid any confusion, the alert for a sensor is represented in the units that the sensor itself uses. The unit of the threshold is kept in a separate sensorUnit field.

On the other hand, the alerting service converts everything to psi internally. Alert thresholds are also converted to psi after they’re pulled from the configuration service. Everything is displayed to users in terms of psi.

Let’s say that some of the sensors start drifting a little over time, so the configuration service adds driftRate and driftStart fields to compensate until the faulty sensors can be replaced. A developer on the alerting team gets a ticket to use these fields to correct both the visualization and the alerting for any drift. driftStart is a timestamp, and driftRate is represented in the same units as threshold (i.e. sensorUnit). 

Now, let’s think about our developer’s mental model. Every time she personally deals with sensor data, it’s in psi. In fact, the vast majority of the time she doesn’t really have to think about units at all; she’s more concerned with making sure the calculations are right and that users are reliably notified of problems. Accordingly, she does a great job factoring in drift when calculating whether an alert has been tripped…but it slips her mind to convert the drift to psi. Her team has integration tests to catch these sorts of things, but she updates the test fixtures using the same faulty assumption that drift is always in psi. Her reviewers (who are also on the alerting team) fail to notice this, because, like her, they are mainly concerned with getting the logic right when applying the drift.

The reviewers sign off, the feature is shipped, and a few days pass without any issues. Everyone breathes a sigh of relief and starts to focus on other things. Six weeks later a call comes in from HQ.

It turns out a faulty batch of sensors was attached to a faulty batch of tanks. The sensors, which measured pressure in bar, were drifting downwards, while the pressure in the tanks was gradually drifting up. One bar is roughly 14.5 psi, and since the alerting system wasn’t doing any unit conversions, it treated the drift on the sensors as roughly a fourteenth of what it actually was. That meant it wasn’t correcting for the drift nearly as much as it should have been, so its drift-corrected pressure readings came out much lower than the true tank pressure - and when the pressure in the tanks climbed far above the alerting threshold, no one was alerted. Thankfully a physical failsafe tripped before anyone was hurt, but many of the tanks were damaged beyond repair. That means lost capacity while the tanks are replaced - and a black eye for the operations team.

The problem here is that our developer never stepped away from her own mental model. If, before starting work on this feature, she had to figure out what new translations were needed between the two services - if she had to sit down and think about what the differences were between the Configuration Team’s mental model and the Alerting team’s mental model - it’s more likely she would have realized that the drift wasn’t always going to be in psi (especially if she was pairing with someone on the Configuration Team.) Note that having the unit conversion for the alert thresholds stored in an explicit translation layer instead of elsewhere in the codebase would also have probably made it more noticeable, and having unit conversion happen at the service boundary would ensure that no one on the Alerting Team could mistakenly use unconverted alert thresholds or driftRates going forward.
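
For illustration, here’s a rough sketch of what that boundary translation could look like on the Alerting Team’s side (the type and field names are hypothetical; the point is that the unit conversion lives in exactly one, easy-to-find place):

// Hypothetical translation layer owned by the Alerting Team. Everything
// crossing the boundary from the configuration service is converted to psi
// here, so nothing downstream ever sees raw sensor units.
type SensorUnit = "psi" | "bar" | "kPa";

type ConfigServiceThreshold = {
    sensorId: number;
    threshold: number;  // in sensorUnit
    driftRate: number;  // also in sensorUnit
    driftStart: Date;
    sensorUnit: SensorUnit;
};

type AlertingThreshold = {
    sensorId: number;
    thresholdPsi: number;
    driftRatePsi: number;
    driftStart: Date;
};

const PSI_PER_UNIT: Record<SensorUnit, number> = {
    psi: 1,
    bar: 14.5038,
    kPa: 0.145038,
};

function fromConfigService(t: ConfigServiceThreshold): AlertingThreshold {
    const factor = PSI_PER_UNIT[t.sensorUnit];
    return {
        sensorId: t.sensorId,
        thresholdPsi: t.threshold * factor,
        driftRatePsi: t.driftRate * factor, // the conversion our developer forgot
        driftStart: t.driftStart,
    };
}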


Some points about this example:

  • Team-to-team translations don’t always have to be complicated! The translation we just looked at was a simple unit conversion. In fact, for a lot of fields the appropriate translation will simply be the identity function. Like with many things in software, usually 80% of the problems will be concentrated in 20% of the fields.
  • As is often the case, the incident in our example was the result of multiple things going wrong at the same time.
  • The problem was introduced by a change to one of the models. This is why I referred to semantic alignment as a “moving target” - as engineers we’re on the hook not only for building systems from scratch, but also for rapidly changing existing systems without introducing bugs. The second task can be a lot harder than the first!
  • In this case a translation did exist between the source and destination models, but it was buried in the code instead of being in a designated place where everyone knew to check.

This last point is important! Most likely, semantic translations are already present in your org - either concretely, in various functions scattered across your codebase, or implicitly, in your developers’ heads. Either way, until you single them out and create a formal way to manage them, they’ll remain the kind of architectural dark matter that’s hard to audit, hard to influence, and a breeding ground for bugs.

There’s no right or wrong way to implement/manage your translations, but it’s a good idea to start with the simplest workable approach first. My own preference is to have each consumer keep a library of transforms in their own codebase and apply those transforms to parsed API responses (this is especially nice if you’re using static response types.) The transforms can be maintained either by the consumer, by the producer, or by both working together (I’d recommend it be both, just because a good translation usually requires familiarity with the models on both ends.)

If you’ve tried this approach and it’s working well for you, there may come a point where you want to build or buy centralized infrastructure to handle transforms across your whole org. The benefits of centralizing your transforms are similar to the benefits of centralizing any other concern: for example, with a centralized system you have a single place you can consult to see which teams in the org are sharing data and how their specific models depend on each other (ideally there’s also a centralized audit log that will show you who changed what and when - this might even be integrated with your code review tools.) Think of a translation-management system as the kind of service mesh-style connective tissue I mentioned earlier when talking about microservices.

I want to emphasize that you don’t have to use a vendor to do translation-based model management, and in fact you probably shouldn’t until the need becomes obvious. You can usually get 80% of the benefit of a centralized solution just by using well-defined conventions when writing/storing your translations.

Let’s look at a few more useful properties of point-to-point translations:

Point-to-point translations make dependencies between teams explicit and easy to track. This might remind you of consumer-driven contracts[9]; like consumer-driven contracts, translations make it straightforward to tell when and how a change to a model is going to break downstream consumers. Unlike consumer-driven contracts, translations show not only which parts of the model a consumer depends on, but how that consumer is interpreting them. Here’s another example:


In this example we’ll think about a company that makes SaaS HR software. The employee service powers employee-related functionality, including the employee dashboard, which employees can use to see details of their relationship with their current and past employers. The admin service powers the admin interface, in which HR admins can perform various tasks for the companies they administer.  Naturally, the employee service owns the employee model. 

The employee dashboard often displays the user’s employers in a list format, and it delegates the choice of ordering to the backend dashboard service. When the dashboard asks the employee service to pull an employee object, that employee’s employer memberships (i.e. objects describing the relationship between a particular employer and employee) are attached as an array; the frontend simply displays employer memberships in the order in which they were returned. For the convenience of the employee, the active employer membership (i.e. the membership for the company that the employee currently works at) is always the first item in the list (if there is an active employer membership!)

The admin interface allows admins to see all the employees working at a specific company, and to terminate employees if needed. When the admin service is asked to terminate an employee from a company, it updates a lot of state in the system; among other things, it changes the state of the employer membership from “Active” to “Terminated”. There’s only one active employer membership per employee, and that membership is guaranteed to be at the head of the employerMemberships array on the employee object, so to change the state of the active relationship the admin service just does employee.employerMemberships[0].state = “Terminated”.

Now imagine that the company decides to let employees be active at multiple employers. This is a big change that touches a lot of different services; among other things, the employerMemberships array can now contain multiple active employer memberships. The array now begins with a list of active employer memberships, sorted alphabetically.

This, of course, means that the logic in the admin service is no longer correct. Consider this scenario: an employee is employed by both ACME Corp and Zeon Industries. An admin goes to the admin dashboard and terminates the employee from Zeon Industries. The admin service runs employee.employerMemberships[0].state = "Terminated" - but Zeon is in the second slot of employerMemberships, and ACME is in the first slot! The admin service makes various other changes to terminate the employee from Zeon, but then updates the state of the ACME relationship to “Terminated”, while leaving the state of the Zeon relationship unchanged. This puts the system in an inconsistent state.

Crucially, the type of the employerMemberships array didn’t change, which means there are no type errors alerting the admin team to the faulty logic. Because of this, the problem gets overlooked.

What about the tests? Well, in a stroke of bad luck, whoever updated the test fixtures did so by inserting new active employer memberships after existing ones - so tests of the termination functionality kept on working, and the bug slipped through.

The bug goes out, and things are placid for a bit. Then people start terminating employees, and data starts getting corrupted.  Reports of truly bizarre behavior start flooding in. Employees are getting termination notices with the wrong information, certain pages of the app are throwing exceptions, people can’t generate reports - it’s chaos. 

The problem here is that there was a semantic fact about the model that was not embedded within the model, so when the fact was no longer true, the model didn’t change, and no one noticed the newly-introduced bug. If there’s a semantic assumption you or your partners are making about a model, it’s best to have the structure of the model reflect that whenever possible (this principle is embodied in the slogan: “make invalid states unrepresentable”.)

Now let’s think about what the situation would look like if the admin team had the flexibility to choose a model that reflected their assumptions. Instead of a simple array, we could have an employerMembershipsByState object shaped like this (I’ll use a Typescript type again here):

{
    active: EmployerMembership;
    terminated: Array<EmployerMembership>;
    otherState1: Array<EmployerMembership>;
    otherState2: Array<EmployerMembership>;
}

Notice that this type reflects the assumption that an employee can have only one active employer membership at a time!

The employee service wants to stick with its array representation for the sake of the frontend, so there should be a translation that splits out employer memberships by state when the admin service pulls employee data from the employee service. In this scenario, when the employee team changes its model to allow for multiple active employer memberships, part of the change is figuring out what updates need to be made to the employee service->admin service translation (as always, the employee team or admin team could do this alone, or they could pair on it.) Now it’s much easier to notice that the admin service only has space for one active employer membership, at which point the admin team could go and refactor their faulty logic. Giving each team their own model makes it easier to see what the semantic dependencies are between teams, which makes it easier to ensure those semantic dependencies are not broken.
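
As a minimal sketch (using the field names from this example; everything else is hypothetical), the employee service -> admin service translation might look like this:

    // Employee service's representation.
    interface EmployerMembership {
      employerId: string;
      state: string;          // "Active", "Terminated", ...
    }

    // Admin service's representation - note that it only has room for a single
    // active membership, which encodes the admin team's assumption directly.
    interface EmployerMembershipsByState {
      active: EmployerMembership;
      terminated: Array<EmployerMembership>;
      // ...other states elided for brevity
    }

    function toAdminEmployee(memberships: Array<EmployerMembership>): EmployerMembershipsByState {
      const active = memberships.filter((m) => m.state === "Active");
      if (active.length !== 1) {
        // This sketch assumes exactly one active membership, as the admin team did.
        // A violated assumption fails loudly here, at the translation, instead of
        // silently corrupting data downstream.
        throw new Error(`expected exactly one active membership, got ${active.length}`);
      }
      return {
        active: active[0],
        terminated: memberships.filter((m) => m.state === "Terminated"),
      };
    }

When the employee team later allows multiple active memberships, this translation is an obvious place where the change surfaces.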


At this point I’d like to plant a few ideas in your head. They’re relevant now, but they’ll become even more relevant as we start talking about Model Networks in the next section.

So far we’ve been using translations to translate incoming information, but there are cases where we might want to translate outgoing information as well. Here are some examples:

  • If you use GraphQL, or other approaches that let you pull particular parts of an object instead of the whole thing, you might find it useful to apply translations to your queries, so teams can both request and receive data in terms of their own models. For example, in the example above, our admin service might send out a query to pull data for employerMembershipsByState.active - the translation layer could rewrite that query to ask the employee service for the employerMemberships array. Then, we could filter the result array down to active employer memberships and return that to the admin service.
  • For similar reasons, you may find it useful to translate writes as well as reads. For example, if you were on the admin service and wanted to update the state of several different employer memberships, you could move them around to different subfields of employerMembershipsByState (e.g. to terminate an employee from an employer, you could move that employer membership to the terminated array.) Then, you could ask the employee service to update the relevant employee with your new employerMembershipsByState object. Along the way, your translation layer could take the employerMembershipsByState object and turn it into the array format that the employee service expects.
  • You can also apply translations to domain events - for example, you could fire an EmployerMembershipsChanged event when making the kind of update we mentioned in the previous bullet. You’d probably want to have translations that could do the appropriate object/array transform on the payload of the event based on who was looking at it.
  • You may want to use your translation layer as a facade in front of RPC-style methods exposed by other services[10] (earlier I mentioned that our working conception of “model” in this post includes operations as well as data - this is what I meant!) There are a few different things you can do with this facade:
    • You can translate the arguments to the methods. For example, let’s say the employee service exposes an updateMemberships() method that accepts an employee ID and an array of employer memberships. You can use the facade to provide the admin service with a version of updateMemberships() that accepts an object instead - then, in the translation layer, you can translate that argument to an array before calling the real updateMemberships() function.
    • You can wrap extra logic around the function call. For example, admins using the admin view usually only terminate employees from a single employer at a time, so we may decide it makes more sense for the admin team to use a function that only terminates one employee from one employer, instead of the updateMemberships() function - this way, we can guard against spurious or accidental membership changes. We can use our facade to expose a terminateEmployee() method that takes an employee ID, an employer ID, and the employerMembershipsByState object; then our translation layer can convert employerMembershipsByState to an array, set the state of the appropriate entry to “Terminated”, and pass the result to updateMemberships(). You can also do more complicated things in the facade, like calling multiple underlying methods, calling different methods based on conditional logic, etc.
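
Here’s a minimal sketch of that facade, reusing the types from the earlier sketch (updateMemberships() and the service stub are hypothetical):

    // The employee service's RPC, as seen from the translation layer.
    declare const employeeService: {
      updateMemberships(employeeId: string, memberships: Array<EmployerMembership>): Promise<void>;
    };

    // Facade method exposed to the admin service: terminate one employee from one
    // employer, guarding against accidental changes to unrelated memberships.
    async function terminateEmployee(
      employeeId: string,
      employerId: string,
      byState: EmployerMembershipsByState
    ): Promise<void> {
      // Convert the admin service's object back into the array format the
      // employee service expects, flipping only the targeted membership.
      const memberships = [byState.active, ...byState.terminated].map((m) =>
        m.employerId === employerId ? { ...m, state: "Terminated" } : m
      );
      await employeeService.updateMemberships(employeeId, memberships);
    }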

Inter-service translation layers can also act as “refactor firewalls.”[11] They create isolated coupling points, ensuring that teams can change their own models without forcing synchronous refactors that ripple out past translation layers to other teams.

For example, let’s say the employee team decides that employee.employerMemberships needs a little more structure, so they change it from a flat array to an array of arrays, with one sub-array per membership state. This will probably necessitate a bunch of simple changes on the employee side, but the admin team is insulated from the change; the employee team just needs to change the appropriate translation between the two services. 
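
In a hypothetical sketch of that change, only the translation needs to learn about the new nesting - the admin service keeps receiving the same employerMembershipsByState shape:

    // Before: the employee service returned a flat Array<EmployerMembership>.
    // After: it returns one sub-array per membership state.
    function toAdminEmployeeV2(grouped: Array<Array<EmployerMembership>>): EmployerMembershipsByState {
      // Flatten back into the old shape and reuse the existing translation logic.
      return toAdminEmployee(grouped.flat());
    }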

As emphasized before, you probably are already using ad-hoc translations to isolate your services/components from each other to some degree - the idea here is to do it in a systematic way, in a single designated place!

In this section we’ve seen the benefits of writing a translation between any pair of teams that wants to work together. So what are the drawbacks? Well, engineering orgs can have a lot of teams, especially once you’re big enough that you have to build internal-facing software for support, sales, finance, operations, etc. Plus, your company probably uses a lot of SaaS tools internally (if you’re B2B, your customers likely do as well.) You’ll most likely end up with a mandate at some point to tie those into your ecosystem, which means managing relationships with external teams via their public-facing APIs.

As with microservices and other decentralized architectures, a major organizational goal of our translation-based approach is to allow for serendipity; if two teams have a cool idea and want to collaborate on it, they should be able to try it out without having to go through any bureaucratic bottlenecks. But if your org has n teams in it, the total possible number of pairs is (n * (n-1)) / 2, which is O(n^2) - 50 teams, for example, means 1,225 possible pairs. If you have a lot of teams and your graph of partnerships is fairly dense, you could end up with tons and tons of translations.

This is a famously extreme example, but here’s a graph of Uber’s microservice layout circa 2018 (from here). Take a look at the number of nodes - and, more importantly, the number of edges:

At first this might not seem like a big deal, but:

  • There’s a lot of maintenance work that scales with the number of translations. For example, if you have two or three translations between you and partner teams, it’s pretty easy to check them for breakages when you want to change your data model. If you have twenty-five translations, that’s a lot of overhead.
  • When you have enough translations you’re almost certainly going to have to deal with some redundant logic.

    A simple example: Consider a large-ish company that owns a suite of productivity products - this includes both full-featured chat and email solutions. There are a bunch of services that support both of these product domains. Separately, there is a unified user management and identity service that keeps track of user profile details. The chat and email products both have notions of “availability”; a user is considered “available” if you can send them a message and they’ll probably get it and read it right away. The availability criteria are different between products, however, and they use different details of the UserProfile object. For chat, a user is available if online is true and awayMessage is empty, and for email, a user is available if currentMeeting is empty, outOfOfficeMessage is empty, and the user’s local time is within normal business hours.

    There are lots of services that both pull profile data from the user management service and reference the concept of user availability. Using naive point-to-point translations, we’d have to embed the appropriate availability calculation in the translation for every one of those services. We’re only looking at a single field in this example, but you could imagine this being true for a bunch of fields at a time.

    Redundant logic is a threat to efficiency, since developers are having to do the same work over and over. It’s a threat to agility, since any change that touches this redundant logic must be replicated to all relevant transforms (in the worst case this makes changes harder to make, which can wreck your code velocity.) Finally, it’s a threat to correctness - when you have lots of duplication, you always run the risk that someone updates the logic in one place but not the other, potentially causing subtle (or not-so-subtle) consistency problems.

Point-to-point translations can be really effective at a certain size, but as your org keeps growing the chaos can creep back in. Let’s see if we can update our approach to make it work better on a larger scale.

Model Management Strategy 4: Model Networks

So far we’ve looked at a totally centralized approach, where one team manages everything, and a totally decentralized approach, where every connection is managed on its own (and the null approach, where nobody manages anything.) Is there anything in-between? 

Well, consider what a typical LAN looks like - at a certain size, wiring every computer to every other is inefficient and impractical, and routing everything through one huge switch puts a lot of load on a single point of failure. It’s standard practice to instead use a hierarchical design, where the macro network is composed of smaller networks, which are in turn composed of still smaller networks (and so on, for however many layers you want.)

This layout might seem familiar, because it’s one of those patterns that kind of pops up everywhere. For example, the dependency graph of your codebase probably looks similar to this: high level components interact via high-level interfaces; each component is made of sub-components that interact amongst themselves, and so on:

(There are, in fact, organizations that have used this kind of hierarchy to combat microservice sprawl - see LinkedIn’s Superblocks or Uber’s DOMA.)

This pattern also probably reflects the communication structure of your engineering org. At the bottom are “two-pizza” teams which individually own a component, service, or user experience. At higher levels we have larger organizational units (called various things: departments, groups, divisions, sometimes also teams, etc.) that are responsible for whole platforms or product areas. Organizational units at every level have specific processes and interfaces by which they collaborate, and teams are generally more likely to talk directly to other teams in their own department/group/whatever:

Hierarchies are pretty common because they are a simple way to accommodate both high-level and low-level concerns.[12] Low-level work is tactical, specific, and usually only relevant to a small group of specialist stakeholders (the internals of a particular component, for example, are mainly of interest to the team that owns that component.) Things are also usually more fast-paced on the lower levels - since changes have a small blast radius, they’re easier to make. High-level work is core, “load-bearing” work that a lot of people depend on (e.g. the interface of a major component is probably of interest to many teams or departments across the org.) Changes to high-level items are highly disruptive and require lots of coordination, so they tend to be rare and slow to roll out. In the diagrams we’ve shown, low-level items are at the bottom and high-level items at the top. In our network diagram, for example, you can see that cutting links at the bottom has a local effect - you might create a network partition, but only in a small part of the network. Cutting links at the top could create a partition between two very large parts of the network.

When used judiciously, hierarchies strike an effective balance - higher-level teams are insulated from rapidly-changing details that aren’t relevant to them, and lower-level teams don’t have to slow down to address broad structural issues.

In the last section I kind of hinted that we need to draw a distinction between high-level and low-level translation logic. We can handle the redundant logic in our point-to-point translations the way we handle redundant logic elsewhere - by finding a way to pull it out and move it to a higher level. It would be nice if we could apply a hierarchical topology, but we don’t have a notion of topology for our bag of translations. What if we did, though? What if we could create a network of translations?

To start networking our translations, we need to introduce a notion of transitivity, which lets us create new translations by chaining existing translations end-to-end. This is as simple as applying the transforms one after the other; let’s look at two cases:

  • We have a User model with a displayName field. Initially, this is an email, but then we apply a translation that lops off the “@” and everything after it. Then, we apply another translation that converts the displayName to all lowercase. In this case chaining translations means we’re translating a field (or set of fields) twice.
  • We apply one translation to our User model that parses an address field into address1, address2, city, state, and zipCode. Then, we apply another translation that takes a role enum and maps it to an isAdmin boolean that’s true if role is set to “admin”. In this case chaining translations means translating multiple disjoint sets of fields.
  • Translation chains can combine both of these cases!

Adding transitivity is a pretty simple change[13], but it lets us tie our translations together into a hierarchical network structure, which can be much more efficient and easier to manage than a point-to-point topology.[14]
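
A minimal sketch of what chaining might look like, using the displayName example from the bullets above (the chain combinator itself is hypothetical):

    type Transform<A, B> = (input: A) => B;

    // Transitivity is just function composition: the output of one translation
    // becomes the input of the next.
    function chain<A, B, C>(first: Transform<A, B>, second: Transform<B, C>): Transform<A, C> {
      return (input) => second(first(input));
    }

    interface User { displayName: string }

    const stripEmailDomain: Transform<User, User> = (u) => ({
      ...u,
      displayName: u.displayName.split("@")[0],
    });

    const lowercaseDisplayName: Transform<User, User> = (u) => ({
      ...u,
      displayName: u.displayName.toLowerCase(),
    });

    // A new translation derived by chaining two existing ones end-to-end.
    const normalizeDisplayName = chain(stripEmailDomain, lowercaseDisplayName);
    // normalizeDisplayName({ displayName: "Ada.Lovelace@example.com" })
    //   => { displayName: "ada.lovelace" }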


Let’s take another look at the hypothetical productivity product suite that we mentioned in an earlier example. Recall that there was a bunch of repetitive logic in the translations between the user management service and the services that powered the chat and email products - namely, we were duplicating the logic for the available field. We can pull that logic out using the following topology:

In this case, we’ve inserted a Chat Parent Model and Email Parent Model as intermediaries between the user management service and the many services comprising the chat product and the email product. The translations between the user management service and the chat/email parent models contain logic just for the available field (and any other logic that is common to all of the chat/email services.) The translations between the parent models and the individual service models contain logic that is specific to each chat/email service.
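
As a rough sketch (reusing the availability criteria from the earlier example; the model names are hypothetical), the shared logic can now live in a single translation into the Chat Parent Model:

    // Relevant fields from the user management service's UserProfile model.
    interface UserProfile {
      online: boolean;
      awayMessage: string;
    }

    // Hypothetical Chat Parent Model: availability is computed once, here,
    // rather than in every chat service's point-to-point translation.
    interface ChatParentUser {
      available: boolean;
    }

    function toChatParentModel(profile: UserProfile): ChatParentUser {
      return {
        available: profile.online && profile.awayMessage === "",
      };
    }

    // An analogous toEmailParentModel() would encode the email product's
    // (different) availability criteria exactly once as well.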

We may decide we want to go further - for example, we might have some fields and logic in our transforms that are common to admin-facing products, and we might have some that are common to user-facing products. We could add another level to our hierarchy:

Before going further, I should point out that we used a new concept in both these examples - that of the virtual model. These aren’t strictly required to build Model Networks, but they make it easier. A virtual model is a model that is not attached to any particular service or component; it simply acts as an intermediate node that people can translate to and from as needed. The User-Facing Products Model, Admin-Facing Products Model, Chat Parent Model, and Email Parent Model are all virtual models, since they don’t belong to any particular service. It can be very effective to use virtual models in a 1-1 correspondence with higher-level organizational units, in the spirit of Conway’s Law - but you can use them however you want!

You’ll notice that in this last example data only traveled down the hierarchy. Let’s look at an example in which data flows up the hierarchy and back down as it moves from service to service. For this example let’s imagine a hypothetical company that operates a Github-based source control service and offers a complementary suite of DevOps products.[15]

Let’s say that you work on the Pipeline Execution team, which is responsible for running CI pipelines, and you’re building a feature that triggers pipelines when a merge request has been created and updated. To build your feature, you need to reference the MergeRequest model, which is owned by the Code Review team. Luckily, your organization has a preexisting network of translations that converts MergeRequests into different forms:

(From bottom to top, the organizational units are Groups, Stages, and Sections. Each Section is made up of many Stages, which are made up of many Groups.)

(We need a shorthand for this new type of network, but “Translation Network” doesn’t really roll off the tongue, so I’ll call it a Model Network, since the nodes are models.)

The translations in the network transform MergeRequests into formats that align with different mental models used by different parts of the company. These translations are specific at the bottom levels of the hierarchy and general at the top. For example:

  • Dev Section -> CI/CD Section: In the Dev Section’s mental model, a MergeRequest is a user-facing object that represents an interaction among developers, i.e. it's part of the development workflow. In the CI/CD Section’s mental model, a MergeRequest is essentially an event that triggers a pipeline on a certain version of the code. The Dev Section is concerned with allowing developers to author, interact with, and visualize merge requests; the CI/CD Section is concerned with reacting to them. Accordingly, the Dev Section -> CI/CD Section transform should:
    1. Remove information that is irrelevant to the CI/CD Section - for example, title and status
    2. Massage information when necessary to make it align with the CI/CD Section’s mental model: for example, let’s say that in our hypothetical DevOps Pipeline, CI always runs on the latest commit of the MergeRequest - so we can replace the one-to-many commits relationship on our MergeRequest with a one-to-one latestCommit relationship (this has the added benefit of ensuring that there’s no way for a bug to cause us to run CI on the wrong commit of a MergeRequest - i.e. we’ve made an invalid state unrepresentable!) A sketch of this transform appears after this list.
  • As mentioned, the CI/CD side models a MergeRequest primarily as an event, but the Verify and Deploy stages have different ways to model that event, and the CI/CD Section -> Verify and the CI/CD Section -> Deploy transforms should reflect that. Let’s say our product had a feature that deployed a new staging environment when a new commit was pushed to a MergeRequest - teams under the Deploy stage would need a list of the reviewers on the MergeRequest in order to set up permissions for the new environment properly.[16] Teams under the Verify stage, on the other hand, wouldn’t care.
  • The Create and Plan stages have different mental models - the Plan stage (which builds project planning features - think JIRA, Kanban, etc.) thinks of a MergeRequest as a unit of work and cares mostly about its high level status and progress, rather than its specific content. The Create -> Plan transform might preserve things like title, status, and reviewers, but throw away things like commits. It could also do various massaging - like replacing the comments relationship with a commentCount field. 
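
Here’s a minimal sketch of the Dev Section -> CI/CD Section transform described in the first bullet (the model shapes are hypothetical and heavily abbreviated):

    // Dev Section's mental model: a user-facing object in the development workflow.
    interface DevMergeRequest {
      id: string;
      title: string;
      status: string;
      commits: Array<{ sha: string; pushedAt: string }>;   // ISO timestamps, at least one commit
    }

    // CI/CD Section's mental model: an event that triggers a pipeline on one commit.
    interface CiMergeRequestEvent {
      id: string;
      latestCommit: { sha: string; pushedAt: string };
    }

    function toCiCdSection(mr: DevMergeRequest): CiMergeRequestEvent {
      // Drop fields the CI/CD Section doesn't care about (title, status), and
      // collapse the one-to-many commits relationship into a one-to-one
      // latestCommit relationship. Assumes ISO timestamps, so string comparison
      // orders commits correctly.
      const latestCommit = mr.commits.reduce((a, b) => (a.pushedAt >= b.pushedAt ? a : b));
      return { id: mr.id, latestCommit };
    }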

A couple other important points about our Model Network:

  • As in the last example, we have a few virtual models in our hierarchy - “Create”, “Plan”, “Verify”, “Deploy”, “Dev Section”, and “CI/CD Section” are all virtual models. As before, each virtual model corresponds to a different organizational unit.
  • In general, when traversing a path in a Model Network, it doesn’t make much sense to shuttle the data to each individual node (especially when the path involves virtual models - where would the data even go?) Instead, we should:
  1. Physically move data from the source node to the destination node.
  2. Apply the appropriate transforms to the data, in sequence.

A Model Network is a logical network which overlays a physical network; the two shouldn’t be coupled.

As mentioned, our hierarchy allows us to pull out “load-bearing” translation logic, which can make the process of refactoring smoother than it would be when using point-to-point transforms. If, for example, you had a model change that was relevant to only the Create stage, the blast radius would be limited to the Create Model’s subtree and immediate neighbors - in other words, you would need to refactor at most the Create -> Plan and Create -> Dev Section transforms. The transforms are still acting as refactor firewalls, but in this case they’re insulating the rest of the tree from the node that changed.

Note that your hierarchy doesn’t have to look like the one in these examples! It could be taller or shorter in certain places; you don’t even have to make a hierarchical network at all if you don’t want to (although I’d recommend that you at least consider it, for the reasons I pointed out earlier.) You can use whatever topology works best for you.

Earlier, we used a single path of translations from the data provider to the data consumer. Sometimes we may need to use multiple paths to transform the data - in particular, in cases where the high level path is too lossy, and we need a higher level of detail than it provides. Say, for example, that the Pipeline Execution team wants to let users use keywords in the comments or description of a MergeRequest to control CI (e.g. to bypass or expedite a pipeline). They could quickly prototype this feature by adding a new transform directly from the Create stage (or the Code Review Group) to their own service:

To transform data across multiple paths, we run each one of the translation chains on the data and then pick a way to merge the results (a sensible default is to merge all the fields of the result objects, with a mechanism by which the user can specify ahead of time which option they’d prefer in the event of a conflict.)
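
A rough sketch of that merge step, with a caller-supplied tie-breaking preference (everything here is hypothetical):

    type QueryResult = Record<string, unknown>;

    // Merge the outputs of several translation paths into one object.
    // Fields from the preferred path win whenever two paths disagree.
    function mergePathResults(
      results: Array<{ path: string; data: QueryResult }>,
      preferredPath: string
    ): QueryResult {
      const merged: QueryResult = {};
      // Apply the preferred path last so its fields overwrite conflicting ones.
      const ordered = [...results].sort((a, b) =>
        a.path === preferredPath ? 1 : b.path === preferredPath ? -1 : 0
      );
      for (const { data } of ordered) {
        Object.assign(merged, data);
      }
      return merged;
    }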


You’ll notice that in adding our new transform we broke the strict hierarchy we established previously; that’s ok! There’s nothing wrong with having an escape hatch for special cases. If you eventually find that low-level teams are writing the same logic on a certain set of fields over and over, you can promote that logic to a higher level in the network. The idea is to have the Model Network reflect the actual communication structure of your org, which will change over time; it’s important to make sure you evolve your Model Network accordingly.

If you’re small(ish) and you intend to stay small(ish), a Model Network is most likely overkill; you can get by just fine building ad-hoc translations when you need them. However if a) you’re small now, b) you anticipate getting larger, and c) you plan to use a Model Network in the future, you won’t hamstring yourself by rolling one out a little early. Since Model Networks support arbitrary topologies, they’ll work just fine in smaller orgs, and they can smoothly track your org structure as you grow.

We’ve seen that we can chain translations together to clean things up if our point-to-point strategy starts to get out of control, but there’s another benefit we get from taking a network-based approach that may not be as obvious. In the last section we touched on the idea of using our translations to transform queries as well as query results - that way every team could both ask for and receive data using their own models. You clearly can do the same thing in a Model Network:

This works fine, but we can go a little further. Since the Model Network (ideally!) connects the whole organization, we can use it to broadcast a query, using the information in the network to transform the query into an appropriate format for every source, and then collecting the results and returning them to the requester:

Of course, this is another situation where the naive approach is not very efficient. We don’t really want to send every query to every node on the network; we’d basically DoS ourselves. A better option is to have sources declare, in a central place, which data they actually provide - then we can route queries only to the sources that can actually fulfill them (in the example above, only the Code Review Group provides MergeRequests, so broadcasting the query would just reduce to pulling data from the Code Review Group like we were already doing.) The idea is that you ask the Model Network for data (using your own familiar models), the Model Network finds all the data necessary to satisfy your query (wherever it lives), stitches it all together, and returns it to you in the format you asked for. I like to refer to this as “Model-Addressed Querying”[17], since you don’t have to specify where the data is or how to get it - you just declaratively describe what you want.[18]
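
A very rough sketch of the routing idea (all interfaces here are hypothetical): the platform keeps track of which sources actually provide which models, and only fans a query out to those.

    interface ModelQuery {
      model: string;            // expressed in the requester's own model, e.g. "MergeRequest"
      fields: Array<string>;
    }

    interface DataSource {
      providesModel(model: string): boolean;
      // Translate the query into the source's model, run it, and translate the
      // results back into the requester's model.
      fetch(query: ModelQuery): Promise<Array<Record<string, unknown>>>;
    }

    // Route a Model-Addressed Query only to sources that declare they can serve it,
    // instead of broadcasting it to every node in the network.
    async function runModelAddressedQuery(
      query: ModelQuery,
      sources: Array<DataSource>
    ): Promise<Array<Record<string, unknown>>> {
      const capable = sources.filter((s) => s.providesModel(query.model));
      const results = await Promise.all(capable.map((s) => s.fetch(query)));
      return results.flat();
    }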

When you use a Model-Addressed Query, the way you query the data is decoupled from the actual distribution of the data across data sources. Model-Addressed Queries pay off the most when:

  • You need to blend and/or join datasets across many different data sources. For example, if you have a freemium, product-led sales motion, you may track leads in Salesforce while also tracking them in your own product database after they sign up for a free account. In this case you potentially need to handle two problems when pulling leads:
    • The dataset of leads might be split across Salesforce and your product db. You need a way to merge the two into a unified dataset, as well as a policy for breaking ties.
    • Leads will probably need to refer to auxiliary tables that reside in both sources. For a truly unified interface you need to transparently handle cross-source joins - i.e. cases in which a lead in Salesforce has a foreign key to a product table, and vice versa.
  • You want to migrate your data with minimal disruption. Let’s say you’re replacing an old service and have cut over to a newer version. While you work on moving existing data over, you may end up in a situation where a particular record might be in the new system or the old system, so you need to query both. Having a single facade in front of the two systems means you are shielded from the details of the migration.
  • You generally have data distributions that are complex, awkward, or rapidly changing.

We’ve thrown around a lot of ideas in this section, but at some point we have to implement them. You may be wondering if you can do what you did with point-to-point translations and take a simple DIY approach.

Well, you can try, but it’ll be tough. Model Networks are powerful, but the machinery involved is complicated enough that you’re usually better off using a vendor than trying to build it all yourself (if you’re dead-set on keeping things in-house, you should think hard about whether you can get away with just doing point-to-point.) Here are some of the trickier design considerations that a centralized, production-grade Model Network platform needs to take into account:

  • First and foremost, we need to make transitivity work. Composing transforms is simple (we just run one after the other) but as we’ve discussed, we need a way to apply a chain of transforms without having to shuttle data between multiple services:[19]

The approach I suggested in the point-to-point section (in which transforms are simply embedded in the codebase of each individual service) doesn’t make sense in this scenario; a possible solution is to store the translations in a central place and allow services to retrieve them on-demand. That way, when one service wants to query data from another service, it can ask the translation database (i.e. our “Model Network platform”) for the chain(s) of translations connecting the two services, pull the data from the producer service, and apply the translation chain(s) afterwards:

The biggest technical challenge with this approach is figuring out a canonical format to store transforms in. This is non-negotiable if you want to compose transforms in a polyglot environment; if one node is a Typescript service and one node is a Go service, then depending on which direction you’re going you may need to transpile your translations from your canonical format into either Typescript or Go code.
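
One hedged possibility is a small declarative spec that the platform stores centrally and transpiles (or interprets) per language - this is just an illustration of the idea, not a proposal for a specific format:

    // A hypothetical language-neutral description of a transform. The platform
    // could transpile specs like this into Typescript or Go, or interpret them
    // directly in an out-of-process agent.
    interface FieldRule {
      from: string;                                   // source field path
      to: string;                                     // destination field path
      op: "copy" | "lowercase" | "stripEmailDomain";  // a small vocabulary of operations
    }

    interface TransformSpec {
      source: string;        // e.g. "user-service/User"
      target: string;        // e.g. "chat-parent/User"
      rules: Array<FieldRule>;
    }

    const displayNameSpec: TransformSpec = {
      source: "user-service/User",
      target: "chat-parent/User",
      rules: [
        { from: "displayName", to: "displayName", op: "stripEmailDomain" },
        { from: "displayName", to: "displayName", op: "lowercase" },
      ],
    };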

Another option is to have an out-of-process agent that interprets translations on its own and hands the results back to the requester:[20]

Finding a common format and execution strategy for translations is an open-ended design challenge, but it has to be done if Model Networks are going to be flexible enough to be useful.

  • When constructing point-to-point translations, it’s almost always going to be the case that one party hosts the data while the other party consumes it - so the transform between them will be unidirectional. On the other hand, if we really want to achieve any-to-any connectivity in a Model Network, we will need to apply transforms in both directions, i.e. the transforms will need to be bidirectional.[21] Ideally we should not make users maintain separate transforms for the forward and backward directions - both to spare them the extra effort and to make sure the two directions don’t get out of sync. Users should only have to write one direction, and the other direction should be auto-generated. Additionally, some functions are not invertible (aggregate functions, for example), so there needs to be machinery that surfaces lossy translations to the user so they can modify them or work around them, if necessary.
  • Model-Addressed Queries are complicated to implement; it’s hard to make arbitrary queries work with arbitrary data distributions. There are especially many edge cases to handle once we drop the assumption that there is a single source of truth for the data - i.e. when multiple sources may have conflicting records for the same primary key (in this case we need a mechanism to merge data from different sources, as well as a way for clients to indicate that they want to trust one source over another.) It may not be obvious that full generality is worth it, but the point of Model-Addressed Querying is that the user is insulated from the underlying infrastructure - if there are a bunch of special exceptions they have to keep track of, it kind of defeats the purpose. Any Model Network platform is going to need a specialized engine to interpret and execute Model-Addressed Queries.
  • Earlier we looked at some of the benefits of having a central repository for your point-to-point translations (things like auditability, validations, permissioning, etc.) These are all still things you’d want out of a Model Network platform. Model Networks are more heavyweight than simple point-to-point translations, so if you’re using Model Networks you’ll probably want these centralized capabilities sooner rather than later!

You may be wondering if you can use a Model Network with Domain Driven Design or a Data Mesh. The answer is yes! I know I mentioned both of these as examples of decentralized architectures - Model Networks augment them, but they don’t have to supplant them.

I touched on this before, but you also don’t have to use Domain Driven Design or a Data Mesh to use a Model Network - you don’t even have to use microservices. Model Networks will work just fine with a monolith, a data lake, or a data warehouse. You can use a Model Network whenever you have lots of different teams that all want to model the world in different ways.

(It goes without saying, but we spend all our time at Nerve thinking about and building Model Networks. If you think a Model Network might be a good fit for your org and you want to roll out a pilot, we can help!)

Summing Up

We’ve covered a lot of ground! Here’s a quick summary of all the approaches to Model Management we looked at and when you may want to use them:

Null Strategy
  • Description: No official policy on Model Management; everything is ad-hoc.
  • Pros: Zero overhead.
  • Cons: Not sustainable past a certain size.
  • When to use: When you’re small enough that everyone still understands how the whole system works, or when you’re a “big data” company with high volume & low domain complexity (social networks, adtech).
  • Should you use a vendor? N/A

Centralize Everything
  • Description: One data model that the whole company uses (usually called something like a “data catalog”, “semantic layer”, or “federated schema”), curated by a single team or department.
  • Pros: A single master schema defines a “common language” for the company, reducing confusion and mistakes (theoretically). Lightweight and straightforward to implement from an infrastructure standpoint.
  • Cons: “All or nothing” dynamics make it risky for stakeholders to lean in. All of the coordination work is bottlenecked on a single team, which tends to burn out. Locking everyone into the central schema introduces a ton of drag, but letting people do what they want and playing catch-up is exhausting. People will still make some mistakes.
  • When to use: In very controlled circumstances where the entire organization is very tightly aligned.
  • Should you use a vendor? Maybe, but first see how far you can get with a well-maintained wiki or spreadsheet.

Split Models Across Teams
  • Description: Every team is free to model the data that it owns, and all consumers of that data reference that team’s model.
  • Pros: No single bottleneck as in the centralized approach. Easy to get buy-in; teams are probably starting to move towards an informal version of this anyway. Teams are free to make decisions on their own, which keeps things fast and flexible.
  • Cons: Still has the potential for “impedance mismatches” between the data model in the code and the model in developers’ heads. Sometimes causes high coupling, which can lead to painful migrations and refactors.
  • When to use: When you want to give teams the flexibility to own their own models (but if you’re using this strategy, consider whether layering on point-to-point translations can help - usually it can!).
  • Should you use a vendor? Only if you need centralized governance (see below).

Point-to-Point Translations
  • Description: Every team has their own individual model which represents their specific view of the world; when teams want to partner, they build translations to translate data, queries, and API calls between their respective models.
  • Pros: Allows every team to use data models that align with the mental models in their heads. Translations act as explicit contracts, making it easy to audit dependencies and catch breaking changes. Easy to see how each team is interpreting the other’s data - interpretations are no longer implicit, they’re reified in code. Translations act as “refactor firewalls”, preventing refactors from spidering out across service boundaries.
  • Cons: If your org has a large number of cross-team dependencies, writing and maintaining transforms for each one can be tedious. Teams can end up reimplementing the same transform logic over and over, and discrepancies can work their way in, causing problems. Requiring a separate transform for every new partnership can drive up the cost of collaboration, both in terms of time/effort and in terms of extra code.
  • When to use: When you’re big enough that your teams are starting to have divergent mental models, but you still want to facilitate safe and convenient bottom-up collaboration, and when the total number of translations (i.e. the total number of partnerships between your teams) is still small enough for you to manage.
  • Should you use a vendor? Only if you need centralized governance (e.g. standardization, dependency tracking/static analysis, analytics, PII/PHI tracking, audit trails & reporting, permissioning, etc.), or if you have a polyglot environment and don’t want to write a ton of connectors.

Model Network
  • Description: Similar to point-to-point translations, but now we allow for transitivity (chaining translations end-to-end).
  • Pros: Strikes a balance between centralized control/efficiency and decentralized flexibility, and allows for (but does not require!) a hierarchical network structure that separates high-level and low-level transformation logic. The network structure is flexible and can evolve with the structure of your system and org. Allowing translations to be chained together means you can connect all the teams in your org with an O(n) (not O(n^2)) number of translations. Allows for “Model-Addressed Querying”, which enables easy and stable querying over weird polyglot data distributions.
  • Cons: More of a learning curve than point-to-point. Not at all easy to roll your own solution, especially if you want Model-Addressed Queries. Overkill for small orgs.
  • When to use: When your point-to-point translations are starting to get out of control and it’s hard to manage/track all of them, when there’s a lot of redundant logic in your translations and it’s making changes slow and tedious, or when you want to be able to freely translate data between any two teams in your org.
  • Should you use a vendor? Yes, unless you have very specific needs and are willing to commit to a substantial infra buildout.

This concludes Part 1 of this post. I hope you now feel better equipped to identify, resolve, and guard against semantic errors in your codebase. Thank you for reading!


Part 2: Digging Deeper Into Model Networks

We’re now going to switch gears pretty dramatically. If you’re not a big fan of theory or speculation, this is your off-ramp, because two things are about to happen:

  • The discussion is going to get more abstract. We’re not going to be talking about anything wildly futuristic, but we’ll cover some potential applications of Model Networks that aren’t as obvious as what we’ve seen so far.
  • Our design goals are going to shift. Earlier we looked at how to use Model Networks to achieve safety (“how can we drive down the number of semantic misunderstandings in our org?”); now we’re going to look at how to use them to achieve portability  (“how can we drive down or eliminate integration costs to make our code and infrastructure reusable and composable?”)[22] If it’s not apparent how Model Networks enable portability, don’t worry - I’ll explain soon!

I’d like to begin by taking the Model Network concept and extending it a bit.

A Public Model Network

Earlier I introduced Model Networks by drawing an analogy to a LAN - a LAN facilitates communication between an organization’s computers, and a Model Network facilitates collaboration between an organization’s teams. But at any modern company, much (most?) of the network traffic doesn’t stay on the private LAN. It ventures out into the Internet - a massive public network-of-networks - on its way to some public host that may be on the other side of the world. What might a public Model Network (a “Model Internet”) look like, and what could we use it for?

Starting from first principles, a public Model Network would have to have public models that anyone from any organization could link to (a public model would act as sort of a public API in this case.) For example, Salesforce could expose a public model that we could link to from our organization-wide Model Network like so:

It also makes sense to let public models link with each other (as in a private network, these links should be transitive). So, for example, Hubspot may want to link their public model to Salesforce’s, and Pipedrive might want to link with Hubspot:

These don’t have to be directly linked either; perhaps they’re all linked via some winding path in the public network:

As with our private Model Network, we need to have a place where the structure of the public Model Network is stored and managed. There are lots of ways to go about this, but for the purposes of our discussion we’ll assume that there is one global Model Network platform that keeps track of the translations for all Model Networks - public and private - and serves them on demand. To keep things clean, I won’t explicitly depict the Model Network platform on these diagrams - but remember that it’s there!

Right away we can see there are a couple properties that our Model Internet should have if we want it to be practical:

  • It’s common practice to put private networks behind a firewall, so hosts on the private network can initiate connections with hosts on the public network, but not the other way around. Correspondingly, private models should be able to build and manage translations between themselves and public models, but not the other way around. An organization’s private models should generally only be visible to users in that same organization.
  • The result of a network request depends on where it originates from - an IP that resolves to a certain host in one private network may resolve to a different host in another private network, or it may resolve to the same thing, or it may not resolve at all. Earlier we talked about using a Model Network to resolve Model-Addressed queries - now that we’ve thrown a public Model Network into the mix, Model-Addressed queries can end up pulling data from internal sources that are only accessible in-process or over the local (physical) network, from public sources that are accessible over the Internet, or a combination of both. The mix of sources that a query will pull from depends on which organization is running the query - i.e. the same query can resolve differently based on the context it’s run in. Here are some examples; let’s assume that all of these orgs are running a query for the same data:

Org A stores the data in an internal service - although its Model Network is hooked into Salesforce’s public model, it doesn’t store the data the query is asking for in Salesforce:

Org B only stores the data in Pipedrive, so fulfilling the Model-Addressed Query requires pulling data from a single external source:

Org C stores the data in many places - satisfying the query requires fetching and transforming data from Salesforce, Hubspot, and multiple internal sources:

In all three of these cases the query itself is the same! Remember, using the information in the Model Network we can translate the query as needed so that every data source can understand it - and translate the results back into the form the requester is expecting.

It goes without saying that any implementation of Model-Addressed Querying as described in this article is going to have to include a lot of built-in connectors to lots of different kinds of data sources and APIs - the idea is for the Model Network to act as a coordination and distribution layer that cleanly overlays existing infrastructure.

Now that we’ve sketched a rough outline of our public Model Network, let’s look at a few things we can do with it:

  • We can switch our code to use a different third-party service with minimal disruption. For example, if one of our services pulls data from Salesforce, we can instead have it pull data from Pipedrive (say, for example, we just migrated all our data from Salesforce to Pipedrive and want to cut our service over), and the transforms in the Model Network will keep the data in a consistent format. You don’t even need to write any new transforms - you can just inform your Model Network Platform that your data now lives in Pipedrive instead of Salesforce.
  • We can use the Model Network to smoothly integrate new software with our existing infrastructure. For example, say we wanted to use a cool Marketing Analytics SaaS product that is a member of the public Model Network (and is connected to Salesforce via some path of transforms.) All we have to do is tell the product where our data lives (in the pictured example some of it lives in Hubspot and some of it lives in one of our own internal services) and give it permission to access our data on our behalf; then the product can use a Model-Addressed Query to pull the data in and transform it into a format it can understand and analyze:

Again - we didn’t have to build any new transforms to get our private data in a format our cool new SaaS solution could understand! The SaaS solution is connected to Salesforce by some path in the public Model Network, and we have an existing link between our private Model Network and Salesforce, so there exists a chain of transforms that will turn our data into a format fit for consumption by the SaaS solution.

Notice that if you’re already using a private Model Network, you don’t have to learn anything new or special to start using the public Model Network. You just build a translation from one of your private models to a public model you happen to already know well (in our example that was Salesforce.)

How Do Model Networks Enable Portability?

I mentioned before that our focus in this section has changed from safety (i.e. prevention of semantic misalignment) to portability. This is a subtle shift, since in a certain sense these two concepts are duals of each other: a major reason a lot of code is not portable is that switching it out can cause semantic problems, especially if that code contains complicated business logic! If you have code that processes lead data from Salesforce, for example, you can’t just point that code at Zoho, because even though both CRMs provide leads, the model that Salesforce uses will be subtly different than the model that Zoho uses in ways that will make your code blow up (this is true even disregarding custom objects, etc. - even if you have the same data in your Zoho instance as you have in your Salesforce instance, your code will still not be portable!)

When we look at the concept of a Model Network from a portability-oriented viewpoint, different things jump out at us (even though technically all the functionality remains the same.) Consider what it means for two nodes to have a path of bidirectional translations between them in a Model Network. Before, the most interesting implication was that the two nodes could pass data back and forth with minimal potential for misunderstanding. Now, the most interesting implication is that the two nodes can be safely substituted for each other.[23] We saw what that looks like with Salesforce and Pipedrive; we even used the principle to make our own data understandable to third-party code.

Just to emphasize again - two nodes with a bidirectional path between them in the Model Network can be safely substituted for each other if we can successfully resolve differences in authentication, handshakes, query/API format, pagination, etc. etc. The assumption is that either both nodes adhere to a standard that covers these lower-level concerns[24] or that the Model Network includes machinery that successfully abstracts them away (in practice we’ll probably need a combination of both.)

These notions of portability and substitution may still seem kind of vague. Let’s fix that by looking at some examples:

  • As we’ve just seen, under the right circumstances we can leverage the public Model Network to easily substitute services for each other, and to make new services work with existing infrastructure.
  • We can use the public Model Network to tie together our own internal models. Imagine that we are building infrastructure for an online clothing retailer. This retailer not only has a service backing their general clothing store, they have a separate brand (with its own separate service) for designer sneakers. They also have an inventory management service that is built around a general notion of SKUs (Stock-Keeping Units.)

    These services all have various differences between their models. The Shoe model used by the designer sneaker service needs to have a lot more detail than the Shoe model used by the general clothing service; the SKU model, on the other hand, is going to be structured in the way that makes inventory management the easiest. We could resolve these differences by writing transforms between these models, but we could also hook these models into the public Model Network, assuming the public network had models that represented retail clothing, designer sneakers, and SKUs (these could be exposed as a public API for a particular piece of software or community-owned; they could correspond to particular services or they could be virtual models.) Assuming these public models had paths between them in the public network, we’d essentially get translations between our three services’ models “for free” (albeit with a much greater dependence on the logic in the public Model Network, which necessitates a greater degree of trust - see the discussion at the end of this section!):

  • Internally-deployed OSS services can use the public Model Network to interpret data that comes from other parts of our infrastructure. For example, let’s say you’re self-hosting an OSS billing service to handle your usage-based billing (here are some examples.) Let’s also say this service is hooked into the public Model Network - maybe it’s connected to Stripe’s public model[25] - which means it can easily ingest data from the rest of our infrastructure, since the public Model Network ideally already contains the transforms needed to translate raw events from our app into usage data a billing platform can understand:

OSS authors can use this pattern to write software that’s easy to integrate and easy to distribute. Note that this pattern isn’t limited to out-of-process services - we can also use it for in-process libraries! For example, assume our billing service comes with an in-proc instrumentation library - we could transform the data when the library gets it, instead of when the service gets it. In this case, our instrumentation library applies transforms on the data it receives, so it needs a way to get those transforms in the first place - one option is for the library to pull and cache the transforms from the global Model Network platform at runtime, but if you’d rather avoid any network calls, you could also have consumers run a script when they install the package that asks the Model Network to transpile the transforms and bakes the transpiled transforms into the package itself (note that with this approach consumers would need to re-run the install script when they want to pick up new changes.)

Earlier we briefly saw an example in which we needed to use multiple translation paths to fully fulfill a query in a private Model Network, since neither path provided complete coverage of fields on its own. This scenario can also come up in a public Model Network, although we haven’t investigated it in our examples yet. Here’s a Model Network with a topology similar to the one we saw earlier, but with a few extra edges added. You can see that in this instance the Model Network uses multiple paths to transform the results when we ask for data from Pipedrive (as before, the outputs of the two different translation paths are merged at the end). For this particular example I’ve added arrows showing the direction the transforms are applied in:

In the ideal situation, each path of translations translates a disjoint set of fields. Often, however, different translation paths will translate overlapping sets of fields. This leads to the unfortunate “diamond problem”, in which two different paths yield conflicting answers for what the value of a certain field (or fields) should be. This is a pretty manageable problem in a private Model Network -  it’s straightforward to find the owners of the translations involved and hash out a way to resolve the conflict. The situation is more complicated with a public Model Network: the translations making up the different paths may be owned by completely unrelated parties, and it may not be easy to tell which path’s answer should be preferred. A robust implementation of a public Model Network will need to have a sensible, scalable policy for breaking these kinds of ties.

Remember that some of these paths may also be lossy in ways that we’d need to work around. Let’s go back to our earlier case, the one that involves a cutover from Salesforce to Pipedrive:

Let’s say, hypothetically, that Salesforce Accounts have various booleans on them indicating that the account would prefer not to be contacted in a certain way (“Do Not Call”, “Do Not Email”, etc.) Meanwhile, Pipedrive only has a single “Do Not Contact” field. Let’s say that there’s a transform somewhere between Salesforce and Pipedrive that sets the “Do Not Contact” flag if any of the “Do Not <X>” fields are set. This transform is non-invertible - it is impossible to reconstruct the original state of the “Do Not <X>” flags based on whether or not “Do Not Contact” is set.
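
In code, the lossy direction of that transform might look something like the sketch below (the flag names come from the example; the reverse direction simply cannot be written without guessing):

    // A few Salesforce-style "Do Not <X>" flags (abbreviated).
    interface DoNotContactFlags {
      doNotCall: boolean;
      doNotEmail: boolean;
    }

    // Pipedrive-style single flag.
    interface DoNotContact {
      doNotContact: boolean;
    }

    // Forward direction: easy.
    function toSingleFlag(flags: DoNotContactFlags): DoNotContact {
      return { doNotContact: flags.doNotCall || flags.doNotEmail };
    }

    // Backward direction: impossible to do faithfully. Given doNotContact === true,
    // there is no way to recover which of the original flags were set - the
    // transform is non-invertible, and the platform should surface that to us.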

This complicates our cutover a bit, because we can no longer translate data from Pipedrive’s model to Salesforce’s model with perfect fidelity. If our internal code depends on any of the “Do Not <X>” fields, it won’t work with data that comes from Pipedrive! Our Model Network Platform needs to surface this fact to us when we ask it to satisfy our query using data from Pipedrive - at that point, we can either refactor our code to remove its dependence on the “Do Not <X>” flags, or accept that we need to store our data in another CRM if we want our code to work (one advantage of Model Networks is that these kinds of misalignments are surfaced to us immediately, instead of halfway through a migration.)

As with private Model Networks, our public Model Network has no technical constraints on network topology. I would bet, however, that a public Model Network would end up organically adopting a hierarchical Small-World structure. In this situation, the “high-level” nodes would be the major players in a space[26], and these would have important “load-bearing” links between them, with smaller players branching off of these major nodes, and so on. Going back to our earlier example - the link between Salesforce and Hubspot would be a major “load-bearing” link; in turn, we could imagine a lot of smaller, less established CRMs wanting to link their own public models with either Salesforce’s public model or Hubspot’s public model (based on which model they were more familiar with, among other things.)

These major links could conceivably span different domains as well - for example, a customer service app like Zendesk might want to build a link to Salesforce so it could understand certain touchpoints (emails, meetings, phone calls, etc.) that come from CRMs. In a certain sense, you can think of the links in a Model Network as moving data across “domain space”, the same way that, for example, airline networks move humans across physical space.
It’s worth pointing out that taking a portability point of view means making sure the computer can interpret the data correctly, not just the human using the computer. Structural differences that might be trivial for a human to reconcile can easily crash a program - for example, a phone number that is represented as three integers instead of as a formatted string. These are in addition to the classic semantic errors that we’ve been considering in the earlier parts of this post - for example, interpreting activeUsers as a count of paying customers when in reality it also counts users on the free tier. Preventing *both* kinds of errors becomes vital when we’re trying to connect two pieces of software that weren’t explicitly designed to work together.

Making a Public Model Network Practical

It’s important to acknowledge that building and using a Model Network in a public setting is inherently a lower-trust endeavor; when a team uses a public Model Network, they are taking a dependency on translations that are built and maintained by unaffiliated entities. It’s a lot like the difference between using a package made by another team at your company and reusing a package off of Github made by someone you’ve never met. It would be naive to think that we could simply take our earlier Model Network design, which was intended for private use, and open it up to the public without encountering any problems. A public Model Network would need extra infrastructure to ensure that loosely-coupled participants could work together effectively.

In our earlier discussion of Model Networks we talked about some security and auditability features that we may want in our platform; in a lower-trust environment these become hard requirements instead of nice-to-haves. A few other features we would probably need to be successful:

  • Trust and Safety
    • A verification system and a way to catch/remove impersonators.
    • Data quality assurances - a way for consumers and producers to specify health checks/validation on their data, and a way to notify users when those aren’t being met. When a producer consistently provides bad data, there should be a way to surface that to consumers, and it should be easy for consumers to work around the problem. (A minimal sketch of what such a check could look like follows this list.)
    • A way to test when a translation turns valid data into invalid data, and a way to identify & route around the bad translation.
    • Allow/Deny lists to ensure that users have control over who they work with and who they permit to work with them. 
    • A reputation system to identify and remove bad actors.
    • A way to handle the “diamond problem” that I alluded to earlier. This could involve a default policy for resolving conflicts - e.g. we could choose to trust the path that has the better aggregate reputation, or the path that is more heavily used by other consumers. There also needs to be a way for users to override the default policy as needed.
  • Versioning and Change Management
    • A way for users to announce breaking changes to the Model Network before they roll them out, so other affected users can be alerted and have time to adapt. This is especially important when making changes to the “load-bearing” transforms I alluded to earlier.
    • Support for multiple versions of a transform, deprecation, and upgrade paths - perhaps even automated upgrades, to the degree that’s possible.
  • Communication
    • A centralized way for data producers to manage their relationship with data consumers - a place where consumers can post comments, questions, requests, and bug reports, and a place where producers can post feature announcements, updates, and documentation.
    • Standard community management (anti-bullying/harassment, anti-spam, moderation, notifications)
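
As an illustration of the data quality point above, here’s a minimal sketch of what a consumer-registered health check might look like. The check shape, field ID, and registration mechanics are all assumptions - a real platform would define its own API:

// Hypothetical data-quality check a consumer could register against a producer's field.
const doNotContactCheck = {
  fieldId: "pipedrive.person.doNotContact",
  validate: (value) => typeof value === "boolean",
};

// Run the check against a sample of the producer's values and report the result,
// so the platform can notify consumers when expectations aren't being met.
function runHealthCheck(check, sampleValues) {
  const failures = sampleValues.filter((value) => !check.validate(value));
  return { fieldId: check.fieldId, passed: failures.length === 0, failures };
}

console.log(runHealthCheck(doNotContactCheck, [true, false, "yes"]));
// -> { fieldId: "pipedrive.person.doNotContact", passed: false, failures: ["yes"] }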

I mentioned Github earlier, and I think it’s an apt analogy when it comes to the kinds of problems our Model Network would need to solve. What we need is a suite of standard tools and processes to update, maintain, and manage both the network and the community around it.


One Level Deeper

I’d like to end by making some broad, fundamental claims about what Model Networks are and what problem they solve. Fair warning - this section will be even more abstract and speculative than the last!

We’ve considered two big issues so far: safety and portability. The punchline of this entire post is that these are actually two views of the same problem: even though we sometimes think of data as objective and self-describing, in practice it’s really not - the meaning of a certain piece of data always depends on who is interpreting it and how. This is why you can’t just hand off raw data to another team and expect them not to draw the wrong conclusions; this is why you can’t simply hook up another payroll provider’s database to Gusto’s code and expect anything to work. Any long-term promise Model Networks hold stems from their capacity to address this issue  -  to give data some degree of objective meaning.

Objective Semantics and the Semantic Web

The subjectivity of data is so obvious it may seem like a simple fact of life, but back in the early 2000s it was actually the focus of a major initiative called the Semantic Web (also sometimes called Web 3.0 - this was the first Web3, before the blockchain one!) That name’s not an accident; Web 3.0 was the brainchild of Tim Berners-Lee, who in fact saw it as the final evolutionary state of the Web.[27] The Semantic Web was a complicated and ambitious project with many goals (and sometimes different sets of goals, depending on who you asked), but one point of consistency was an emphasis on attaching machine-readable semantics to data.[28] The idea was to use existing techniques in knowledge representation and automated reasoning to create ontologies (written in a new language called the Web Ontology Language, or OWL) that would very thoroughly describe what a given piece of data meant and how to reason about it.[29] That way, any program that wanted to understand a particular dataset could simply download the relevant ontologies and (as long as it knew how to parse and interpret OWL) it would know what the data represented and how to use it; it could even make deductions based on the data in an automated way. For this scenario to work, the ontologies had to be a rich and comprehensive source of semantic information, so they were often very complicated (and intimidating!)

OWL was intended to be a standard way for machines to represent and consume semantics. If every program could correctly interpret OWL, then every program could come to precisely the same conclusion about the meaning of a given piece of data. Data would no longer be subjective!

Of course, things didn’t turn out that way. Twenty years have passed, and although the Semantic Web has found success in certain domains (for example, Google uses some Semantic Web technologies to power parts of its search engine), it hasn’t achieved anything close to the monumental impact of Web 1.0. This raises the question: what happened? Why didn’t the Semantic Web take off? Is it simply unrealistic to try and “solve” data’s subjectivity? And - most importantly for our purposes - if the Semantic Web couldn’t do it, is there any way Model Networks could?

I think it’s possible, because Model Networks take an extensional approach to the problem where the Semantic Web took an intensional one. Let’s explore what that means!

Model Networks Give Data Extensional Semantics

Extensionality and intensionality are technical terms that come from the field of logic; informally, they represent two ways to think about meaning. Intensional definitions are definitions as we normally think of them - to define a term intensionally, we write down a description that captures the meaning of that term. For example, one intensional definition of the term “planet” is: “any of the large bodies that revolve around the sun in the solar system.” To define a term extensionally, we present a set containing all the objects that term represents. The extensional version of our earlier definition would be:

{
  Mercury,
  Venus,
  Earth,
  Mars,
  Jupiter,
  Saturn,
  Uranus,
  Neptune
}

(sorry Pluto)

Sometimes extensional definitions are infinite sets: for example, the extensional definition of “prime number” is a set containing every prime number, which is infinite and cannot be written down. Other times the definitions are simply very large: the extensional definition of “dog” is the set of all dogs past or present, real or fictional. If there exists an infinite multiverse of dog-planets, we don’t know about it, so for now we can assume the set of all dogs is vast but finite.
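
In code terms, you can loosely think of the two styles like this (a rough sketch, not a rigorous definition - the properties checked by the predicate are made up):

// Intensional: a rule that decides whether something is a planet.
const isPlanet = (body) =>
  body.orbitsTheSun && body.isRoundedByItsOwnGravity && body.hasClearedItsOrbit;

// Extensional: the set of all planets, written out explicitly.
const planets = new Set([
  "Mercury", "Venus", "Earth", "Mars",
  "Jupiter", "Saturn", "Uranus", "Neptune",
]);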

Model Networks are - in a deep though informal sense - an extensional way to give a piece of data objective semantics. Let’s look at an example to see how this is true; it’s an important and subtle point so I want to make sure I’m being clear about it. 

Say we have an object that looks like this:

{
    "id": "cypher04",
    "eventDT": "2024/04/16 07:37:08",
    "super": true,
    "location": "Atlanta, GA"
}

To make it easier to talk about this piece of data, we’ll henceforth refer to it as Packet X (the “X” makes it mysterious.)

Taken in isolation, Packet X clearly has a  subjective meaning; there are several competing ways to interpret it, and they’re all reasonable. cypher04 could be a username, but it might also be the name of a server; a lot of people live in Atlanta but it also has a few datacenters. eventDT is probably an event date, but what’s the event? Maybe it’s a login timestamp. Maybe it’s the time of the latest successful boot and it’s meant to be used to calculate uptime - or maybe it’s the time of the latest health check. super could mean “superuser”; does that mean the user logged in as a superuser? Maybe it means the packet is tracking some process running as root, or maybe it refers to some kind of ad-hoc “super server” concept we don’t know about.

Let’s assume, for the sake of example, that Packet X comes from a service that we own, and that this service tracks login events. id is the username, super is whether the user logged in as a superuser, and eventDT is the timestamp of the login in local time. We, the owners of Packet X, understand its meaning to be “the user identified by ‘cypher04’ logged in as a superuser from somewhere in Atlanta, Georgia on April 16th, 2024, at 7:37 AM EDT”; however, the names and types of the fields on Packet X are not enough to unambiguously communicate this. How can we write the meaning of our packet down in a machine-readable format that every consumer can understand?

The Semantic Web approach would be to use OWL to write an ontology that comprehensively defined the categories involved and their relationships to each other. This ontology would probably look something like this and would specify things like:

  • A login event is a type of event that can happen in a system.
  • A login event indicates the point at which a user becomes active within a system.
  • A user is uniquely identified by a username.
  • A username is a pseudonym for a person or a program.
  • A superuser is a type of user.
  • A superuser is a user with elevated privileges.
  • A user with “elevated privileges” can do things that a normal user cannot do.
  • A user with “superuser privileges” may or may not choose to log in as a superuser.
  • If a user chooses to log in as a superuser, their original ID is still retained for tracking purposes.
  • A user may or may not be required to present authentication details to log in.
  • A login event must have a user, a timestamp, and a location.
  • If the user is a person, the login event’s location likely indicates the physical location of the user when they logged in, although it doesn’t have to (if the user is using a VPN, for example.)
  • Two login events cannot happen at the same time for the same user.

This is a sample of the types of things we could put in our ontology; we could be as thorough and detailed as we want (for example, we could go a little further and specify things like “a login happens on a machine or machines”, “a machine is a personal computer, server, or embedded system”, “a machine can be owned or rented by a user”, “a user can potentially have a name, multiple email addresses, and an employer”, etc.). The next step would be to annotate the different parts of Packet X with concepts from the ontology we just created. This communicates that:

  • id represents a username.
  • eventDT represents the local time of the login, relative to location.
  • super represents whether the user logged in as a superuser or not.
  • location represents where the user (maybe) logged in from.

This is an intensional way to describe the semantics of our data; we are using an ontology to explicitly and completely define the meaning and implications of the information contained in Packet X, to the best of our ability. In this way we endow Packet X with an objective meaning, insofar as all consumers will interpret Packet X the same way - if they have the specialized machinery required to understand and use information contained in OWL ontologies!

While this is perhaps the most obvious way to give Packet X objective semantics, it’s not the only way. We can start to see another path if we consider that many consumers of our packet already have a built-in concept of a login event, and the ones who don’t probably don’t care about our packet anyway. The issue is that every consumer has its own idiosyncratic way to structure information. Packet X represents the statement “the user identified by ‘cypher04’ logged in as a superuser from somewhere in Atlanta, Georgia on April 16th, 2024, at 7:37 AM EDT.” Let’s assume that Consumer A represents this information like so:

{
  "eventDetails": {
    "actor": "cypher04",
    "actorType": "User",
    "eventType": "loginEvent",
    "eventTimestamp": "2024/04/16 07:37:08",
    "eventTimezone": "UTC -4"
  },
  "eventMeta": {
    "login": {
      "location": "Atlanta, GA",
      "isSuperuser": true
    }
  }
}

Consumer B represents it like so:

{
  "username": "cypher04",
  "pw": null,
  "timeUTC": "2024/04/16 11:37:08",
  "admin": true,
  "coordinates": {
    "value": [33.753746, -84.386330],
    "exact": false
  },
  "sessionDuration": null
}

Consumer C represents it like so:

{
  "user": {
    "id": "cypher04",
    "name": "<unknown>",
    "location": {
      "city": "Atlanta",
      "state": "GA"
    }
  },
  "loginTime": "2024/04/16 07:37:08"
}

Notice that sometimes Packet X has information consumers don’t care about (e.g. Consumer C doesn’t have a field indicating that the user logged in as a superuser) and sometimes the consumers care about information that Packet X doesn’t have (e.g. Consumer B has fields for password, session duration, and the exact coordinates of the user instead of a city; Consumer C has a field for the name of the user). Notice also that different consumers have different ways of saying “this information wasn’t recorded” - Consumer C sets the value of the missing field to “<unknown>”, and Consumer B sets it to null.

Now consider a special object where each key is the ID of a consumer (this ID could be a URL in a decentralized system, or a simple string in a centralized one), and the value for each key is that consumer’s representation of Packet X (if it has one.) This object has one entry for every consumer in the whole world:

{
  "A": {
    "eventDetails": {
      "actor": "cypher04",
      "actorType": "User",
      "eventType": "loginEvent",
      "eventTimestamp": "2024/04/16 07:37:08",
      "eventTimezone": "UTC -4"
    },
    "eventMeta": {
      "login": {
        "location": "Atlanta, GA",
        "isSuperuser": true
      }
    }
  },
  "B": {
    "username": "cypher04",
    "pw": null,
    "timeUTC": "2024/04/16 11:37:08",
    "admin": true,
    "coordinates": {
      "value": [33.753746, -84.386330],
      "exact": false
    },
    "sessionDuration": null
  },
  "C": {
    "user": {
      "id": "cypher04",
      "name": "<unknown>",
      "location": {
        "city": "Atlanta",
        "state": "GA"
      }
    },
    "loginTime": "2024/04/16 07:37:08"
  },
  "D": null,
  ...
}

Consider the (still hypothetical!) case in which we pass this special “representation object” around instead of Packet X. Every consumer can use their ID to look up their own special representation of Packet X, and then handle that representation in whatever way they want. This still satisfies our goal of objectivity! Every consumer comes to the same understanding of the information conveyed by the packet and can react appropriately. Note that the appropriate reaction may be for the consumer to discard the packet because it’s not something they care about (as in consumer D’s case), or to throw an error because the packet is missing information that the consumer requires to do its job.

Just like before, we want to figure out a way to express our earlier statement (“The user identified by ‘cypher04’...”, etc.) and have it mean the same thing to every consumer. This time, instead of writing down the statement in a format that is both machine-readable and completely consumer-agnostic, we present a set containing every machine-readable representation that would be meaningful to at least one consumer. This is an extensional way to describe what our data means. 

At first glance this all seems kind of silly. An object that had an entry for every system that could possibly consume our data would be massive: far too big to simply pass around. It’s also not clear yet how any of this ties back into Model Networks.

Let’s make a couple changes. First, we’ll rewrite our “representation object.” Each key will still be a consumer ID; the corresponding value will not be that consumer’s representation of Packet X, but a function that maps Packet X to that representation:[30]

{
  // Transform Packet X into Consumer A's representation.
  // (parseLocation, getTZ, getCoords, and adjustTZToUTC are assumed helpers.)
  "A": (packetX) => {
    const returnObj = { eventDetails: {}, eventMeta: { login: {} } };

    returnObj.eventDetails.actor = packetX.id;
    returnObj.eventDetails.actorType = "User";
    returnObj.eventDetails.eventType = "loginEvent";
    returnObj.eventDetails.eventTimestamp = packetX.eventDT;

    const parsedLocation = parseLocation(packetX.location);
    returnObj.eventDetails.eventTimezone = getTZ(parsedLocation);
    returnObj.eventMeta.login.location =
      `${parsedLocation.city}, ${parsedLocation.state}`;
    returnObj.eventMeta.login.isSuperuser = packetX.super;

    return returnObj;
  },
  // Transform Packet X into Consumer B's representation.
  "B": (packetX) => {
    const returnObj = {};

    returnObj.username = packetX.id;
    returnObj.coordinates = getCoords(parseLocation(packetX.location));
    returnObj.timeUTC = adjustTZToUTC(packetX.eventDT, returnObj.coordinates);
    returnObj.admin = packetX.super;

    returnObj.sessionDuration = null;
    returnObj.pw = null;

    return returnObj;
  },
  // Transform Packet X into Consumer C's representation.
  "C": (packetX) => {
    const returnObj = { user: {} };
    returnObj.user.id = packetX.id;
    returnObj.user.name = "<unknown>";
    returnObj.user.location = parseLocation(packetX.location);
    returnObj.loginTime = packetX.eventDT;

    return returnObj;
  },
  // Consumer D has no representation of Packet X.
  "D": (packetX) => {
    return null;
  },
  ...
}

This new object is functionally equivalent to the old one - now we just have to pass Packet X around with the object. Then, instead of each consumer doing:

myRepresentation = representationObject[myId]

They do:

myRepresentation = representationObject[myId](packetX)

Using functions as values makes our solution more general - now it works not just for Packet X, but for any packet with the same shape.

Now we’ll add in transitivity. This means that consumers can write transforms among themselves, and as long as there’s some chain of transforms that starts from Packet X and ends at consumer Y’s representation, consumer Y will be able to transform Packet X into a format it understands. At this point we shouldn’t just stuff our functions into a big object anymore; we probably need a central place to store the network of transforms, let people inspect and query it, etc. 
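
Here’s a minimal sketch of what “following a chain of transforms” amounts to; the individual transforms named in the comment are hypothetical:

// Apply an ordered list of transforms, feeding each one's output into the next.
function translateAlongPath(packet, path) {
  return path.reduce((data, transform) => transform(data), packet);
}

// e.g. if consumer Y only has a transform from Consumer A's format to its own,
// it can still understand Packet X as long as a transform into A's format exists:
// const yRepresentation = translateAlongPath(packetX, [xToA, aToY]);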

Now we’ve reached the payoff of this thought experiment - we’ve turned our extensional definition into a Model Network! Conversely, we can see that our global Model Network provides objective extensional semantics for all the fields and objects it contains (in the ideal scenario where every service is hooked into our global Model Network.) In this particular case, we can connect Packet X’s model to the network - then, whenever a particular consumer wants to understand what Packet X means, we can find a path in the Model Network between that consumer’s model and Packet X’s model. Then, we can use that path to transform Packet X into a form that consumer natively understands (if such a form exists.)

If this line of argument doesn’t feel intuitive to you yet, consider the case of written language. On its own, the message “I’m going to get a pizza” has subjective semantics - someone who only understands Japanese would interpret it as gibberish - but taken together with our global collective ability to translate from English into any language, it has an objective meaning.[31] Anyone who wanted to know what you were saying could find a way to translate it to their preferred language (potentially via one or more intermediary languages) and come to an understanding that you are about to get a pizza.[32]

...But Why Does That Matter?

Hopefully by this point I’ve convinced you that our theoretical Model Internet is, from one point of view, the extensional counterpart of the Semantic Web, but it may not be apparent why that’s interesting. I think there’s a deep reason why an extensional approach has a better chance of succeeding than the intensional one did. To call back to our human language analogy: translators do not have to be experts in linguistics to do their jobs.

The Semantic Web approach is clearly grounded in the fields of knowledge representation and automated reasoning, which generally take for granted a robust analytical understanding of the semantics involved. Pretty much any intensional strategy is going to be like this - you cannot create a workable, complete, and explicit definition of something without a strong theoretical grasp of what that thing is and how best to describe it in a formal way. The problem is that writing good intensional definitions entails a lot of sitting around thinking about logic and philosophy and the fundamental meaning of different terms; the vast majority of engineers are not doing this and have no desire to do this. If you are a developer at Airbnb, it’s not your job to construct rigorous taxonomies of all the different possible types of dwellings or debate what it means precisely to “book” a room - you only think about these things to the extent that they help you write software that serves the customer and the business. You certainly aren’t worried about defining any concepts in a completely universal and unimpeachable manner. 

The Achilles’ Heel of the Semantic Web was that it simply demanded too much of its users. It not only asked developers to adopt an unfamiliar academic mindset, it asked them to rewrite a lot of their code: a big part of the vision was for software to process and reason about OWL-annotated data in a very general, rules engine-y, GOFAI way[33], but most non-academic applications just use hardcoded business logic instead and that usually works fine. The investment required to get started with the Semantic Web was so prohibitively high that it almost never made sense to even try.

The extensional approach has an edge because it’s less ambitious than its intensional counterpart - it solves a simpler problem. The key observation is that to give a piece of data an objective meaning, we do not actually need to come up with a complete and philosophically satisfying delineation of that meaning; we just need to preserve the meaning of the data as it moves from service to service. Denoting meaning in a truly context-independent way is so difficult and abstract that there’s an entire subfield of AI dedicated to it. On the other hand, preserving meaning (i.e. writing translations) is much more tractable - it may be a grind (and pretty difficult to automate), but it’s the kind of work most engineers are at least familiar with.[34] In fact, it’s basically taken for granted that most new integrations require a hand-built translation between the source and target data models. This means the activation energy for something like a Model Network is pretty low; we just have to convince developers to use a different platform to do the work they were probably going to do anyway (if we wanted, we could even conceivably create a push-button way to port existing integrations into (and out of) a Model Network.)

Intensional Semantic Web-style definitions consist of small amounts of complex data (i.e. OWL ontologies.) Extensional Model Network-style definitions consist of large amounts of simple data (i.e. translations in the Model Network.) A lot of the most successful Web-scale projects are built on small contributions from a very large number of loosely invested participants (for example, consider how Google was able to harvest the innate structure of the Web to triumph over Yahoo’s meticulously hand-crafted directory[35]) - the extensional strategy is informed by this fact.

It bears repeating that this section assumes the existence of an ideal Model Network that completely and accurately connects the vast majority of available services and objects. It goes without saying that if this kind of network ever comes to be, it will take many years to build - the idea is that, as the network grows, we’ll start to converge on our ideal state of perfect objectivity (asymptotically, of course - there will always be some number of imperfections.) Importantly, this is not an all-or-nothing scenario! A small Model Network provides plenty of value, and the value scales up with the size of the network.

Would it take a huge amount of time and effort to build a global Model Network? Of course. Would there be many difficulties and compromises along the way? Of course. But I believe it can be done.

(Here’s a pithy way to sum up this whole section: with the Semantic Web, the semantics live in the schemas; with Model Networks, the semantics live in the translations.)


So What?

We just sketched out a picture of the future where we can use a global Model Network to endow our data and APIs with objective semantics[36], insofar as every piece of software will come to the same conclusion about what those things mean, relative to their own models. But what do we get from that? Why does it matter?

No one can predict the future, of course, but if we could pull this vision off I think it would change the way we develop software in some important ways. A few examples are:

  • Negligible integration costs
  • Portable domain logic
  • More and better software ecosystems
  • Wildcard AI developments

Let’s go over these one by one.

Negligible Integration Costs

In general, high-level software doesn’t “just work” together out of the box. If you want two pieces of software to cooperate, someone has to write adapters to bridge the gap; maybe you fire off a script to do it, maybe an integration specialist at your company uses Mulesoft or something like it, or maybe you use a low-code tool like Zapier - or maybe one of the two vendors involved (or even a third-party) has written a turn-key integration for you. No matter the situation, there is no truly general way to integrate software at the moment, especially if you want externally-developed software to interface with your own internal systems. In practice this incurs a steady-state tax on the use and development of software - let’s call this tax the “integration cost.”

The biggest long-term impact of having objective semantics would be to dramatically reduce integration costs, in most situations. 

This is a big claim to make! If you need some convincing, consider the following:

Software integration is a tricky and somewhat ill-defined problem that includes a lot of moving parts, but many of these are consistent enough across applications to admit standardization. These tend to be the logistic parts: the parts that have to do with moving messages between participants in a secure, reliable, and scalable way. A few simple examples:

  • Message transport and serialization: SOAP/XML was an early standard, but the current frontrunners seem to be REST/JSON and gRPC/Protobuf (and sometimes GraphQL.) Also included in this category are Webhooks and message queues like Kafka.
  • Authentication and Authorization: OpenID and OAuth.
  • Interface Description: OpenAPI on the REST side, Protobuf on the gRPC side.
  • Client Libraries: sometimes openapi-generator (although discontent with this library has led to the creation of some client generation startups like Speakeasy)

None of these areas are trivial or even easy yet, but increasing standardization has made them (and many other problems) less difficult over time in a straightforward way. It will take a lot of hard work and innovation to conclusively resolve the logistic challenges inherent in integration (and there will be legacy cruft to deal with for a long time - maybe forever), but at least there’s a clear path forward. 

If logistic concerns are fairly uniform across integrations, semantic concerns - i.e. finding agreement on the meaning of messages - are extremely diverse and thus very resistant to standardization. Software is used in a massive number of intricate domains; these domains can be so granular that they only encompass a handful of companies (or even a handful of integrations), and a semantic consensus established in one domain does not carry over to any others. The problem must be solved over and over, and indeed this part of integrating software is still mainly done by hand and case-by-case. I contend that the “hard” (i.e. intractable) part of standardizing software integration is suitably addressing this semantic dimension.

There are some startups attempting to do this by using a “Unified API” strategy, forging ahead with their own standards for select categories (like CRMs, ATSs, Accounting software, and HR software) and making their approach feasible by keeping their schemas quite high-level. This is suitable for prototyping and for simpler integrations, but I don’t think it will scale to the general case.

We’ve already seen two other approaches to achieving general semantic interoperability without the use of global schemas. The Semantic Web, while an ambitious effort, has apparently stalled out for reasons already discussed. The success of Model Networks remains to be seen.

Earlier we touched a bit on how a global, public Model Network could help with integration and portability; here are a few more examples:

  • You have two services running in parallel - one new and one legacy - that overlap in data and functionality. Both services have their models hooked into the public Model Network and are using model-addressed queries to pull data. You want to deprecate the legacy service and move all of its data to the new service. You can do that with the following steps:
    • Ask the public Model Network to see if you can smoothly translate data from the legacy service to the new service (you may need to either augment the new service’s model or figure out how to handle missing fields from the legacy service.)
    • Use a migration tool that moves data from the legacy service’s database to the new service’s database. The migration tool uses the information in the public Model Network to apply the necessary transforms. You can tell the migration tool whether to prefer the legacy service’s data or the new service’s data in the event of a conflict. (A rough sketch of what configuring such a tool could look like appears after this list.)
    • Point the legacy service’s code at the new service’s database. Since the legacy service is pulling data via the Model Network using a Model-Addressed Query, this is as simple as telling the Model Network to use a new data source when satisfying the query. The Model Network will take care of translating the data back from the new service’s format into the legacy service’s format.
    • Make any changes you need to the new service to reach feature parity with the legacy service.
    • Decommission the legacy service.
  • You want to make a “customer journey” report by combining data from various systems in your data warehouse. You use off-the-shelf ETL software like Fivetran to move your data into your warehouse. All of your first-party data and all the data from your third-party vendors has been hooked into the public Model Network, so it has objective semantics. That means you don’t have to do any transforms on the data to get it into a consistent form; you can write your reports against your own model (or whatever model in the Model Network you find convenient) - then your reporting software can use the information in the Model Network to interpret all of your data warehouse data in terms of the model you wrote the report against, doing appropriate merges and joins as needed. You don’t even have to write the reports yourself if you don’t want to: you could drop in a premade report that was written against its own model, and the software could still do all the necessary translations for you.
  • You run a solutions engineering team for a startup selling into the enterprise. Luckily, in this hypothetical universe it’s common for enterprises to hook all their data and APIs into the public Model Network, giving them objective semantics. Your team builds reusable, productized software that is also hooked into the public Model Network; because of the translation work the Model Network is doing, you can generally just drop your software into any of your clients’ internal environments and have it work with relatively little manual customization. You can onboard clients to your solution with much less toil and much more code reuse. On the flip side, the enterprises themselves benefit from a much faster build-out cycle when buying new software, lowering the barrier to experimenting and staying agile.
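
Here’s a rough sketch of what configuring the migration tool from the first example might look like. Every name and option here is hypothetical - it’s meant to show the shape of the workflow, not a real product’s API:

// Hypothetical migration-tool config: the tool looks up the needed transforms
// in the public Model Network instead of requiring hand-written mapping code.
const migrationConfig = {
  source: { service: "legacy-crm", model: "legacy-crm.public-model" },
  target: { service: "new-crm", model: "new-crm.public-model" },
  modelNetwork: { endpoint: "https://public-model-network.example" },
  // Which side wins when both services already have a value for the same field.
  conflictPolicy: "prefer-target",
  // Fail fast if the Model Network reports that some fields can't be translated.
  onLossyTranslation: "abort",
};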

There’s an analogy to draw here with installation costs. Way back when, every new piece of software (not to mention patches and updates) had to be rolled out on company-owned hardware by a specific person or team. SaaS (by way of the Internet) gave us software that had essentially zero installation cost. This not only saved a lot of time and effort for everyone involved, it had substantial qualitative impacts as well. Fundamental improvements to the economics of software distribution shifted the dynamics of the software industry; yes, removing client-side deployment toil meant more people were buying software (which meant more people were making software), but there were also major changes in the way we built products, the kinds of products we built, and even the way we conceptualized the things we were building. 

Reducing or eliminating integration costs would further lower the barrier to buying and/or using software. I think this would again markedly increase the raw amount of software being developed and deployed (especially in formerly cost-prohibitive areas) - but the dynamical changes would probably be different than they were with SaaS. In the remaining sections I’ll go over a few of those potential shifts:

Portable Domain Logic

In the last section I argued you could decompose every integration into a logistic part (the part that has to do with moving messages around) and a semantic part (the part that has to do with interpreting the messages.) In truth I think a version of this is true for every piece of software: when software is manipulating data (which is, reductively, what every piece of software does), either it’s agnostic as to the data’s contents (i.e. it’s concerned with moving, storing, organizing, or retrieving the data - it’s logistic) or it cares about the data’s contents and will behave differently when handed different data (i.e. it’s transforming or updating the data, or creating new data, or making something happen in the physical world, or showing something to the user, etc. - it’s semantic.) Software further down the stack (i.e. systems software) tends to be mostly logistic; top-level software products or applications[37] usually have a lot of both logistic and semantic parts (I’m hard-pressed to think of software that’s mostly semantic.) 

A database, for example, would be mostly logistic. HR software would be more semantic! We (roughly) refer to the semantic parts of business software as business logic.

As I mentioned earlier, logistic problems tend to be the same across different software projects. This has led to a pervasive culture of sharing and reuse in the logistic space.[38] If you work anywhere but FAANG, odds are that the vast majority of your infrastructure - from the operating system all the way up to the deployment platform - is either open-source or handled by a vendor. You probably even outsource part of your application code to OSS language-level packages. 

Business logic, meanwhile, is generally not shared between orgs (think about the last time you saw a reusable package that (for example) took all the necessary steps to add a new beneficiary to a health insurance plan.[39]) On its face this kind of sharing doesn’t even really make sense, because business logic is necessarily coupled to the domain models that it references, which makes it very difficult to port. 

Thinking about it from first principles, though - if we did have a way to share business logic, it would make sense to do so, for many of the same reasons that it makes sense to share infrastructure. In a lot of domains, there are features that customers expect every product to have (for example, consider meeting passwords and waiting rooms for video conferencing software, or integrated code reviews for a VCS.) In the spirit of comparative advantage, it would be great to have shared, documented, battle-tested, and standardized implementations of the business logic behind some of these table-stakes features, so product engineers could instead focus on building things that differentiate the product within its domain (this is the same reasoning behind the common advice to use a vendor for infrastructure concerns that aren’t specific to your org.)

In terms of sharing and reuse, I believe Model Networks can help us make the situation for semantic code look more like the situation for logistic code. 

Consider the case where business objects are handed off to high-level third-party service objects that implement certain operations. The Model Network translates domain objects to and from the format expected by the service objects. For example: imagine we’re building a feature to handle PTO requests, and we’ve decided to use an in-process package that encapsulates the standard business logic behind a PTO request feature.[40] Both our model and the package’s data model are hooked into the public Model Network (let’s say the package owner has written a translation between the package’s data model and Gusto’s PTO data model, and we’ve written a translation between our data model and BambooHR’s PTO data model), so the package can use the translations in the Model Network to transform the data that we give it into its own internal data model. Like we saw in an earlier example (the one involving the instrumentation library for a hypothetical OSS billing system), the package has two options for doing this - either it can pull the transforms at runtime or transpile them and bake them in at installation time. Either way, the generic PTO package handles the business logic you’d need to implement a basic PTO feature, including:

  • Exposing a list of allowable actions.
  • Applying the actions to an employee’s PTO information, creating or updating data as appropriate and handing the results back to the requester (the package should not be responsible for persisting anything! It should only manipulate the data handed to it - i.e. it should be a collection of pure functions.)
  • Validating the PTO data both before and after the operation is applied.
  • Disallowing any illegal operations.

The package consumer is still responsible for:

    • Handling the UX and presenting information to the user (including validation errors and forbidden operations.)
  • Retrieving and persisting user data.
    • Shuttling data between the UX component and the business logic package.

(As mentioned, this approach works best for cases where the business logic being outsourced to the package is stable across different consumers. However, it would be unrealistic to require every consumer to have exactly the same business logic. A successful business logic package would probably need to expose hooks to let consumers customize the logic in the package to suit their needs.)
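
To make the division of labor concrete, here’s a minimal sketch of what the surface of such a package might look like. Every name here is invented for illustration; the point is just that the package is a set of pure functions that never persists anything:

// Hypothetical generic PTO business-logic package.
const ptoPackage = {
  // Expose the list of allowable actions.
  allowableActions: () => ["request", "approve", "deny", "cancel"],

  // Validate a PTO record; returns a (possibly empty) list of errors.
  validate: (ptoRecord) => {
    const errors = [];
    if (ptoRecord.hoursRequested <= 0) errors.push("hoursRequested must be positive");
    if (ptoRecord.hoursRequested > ptoRecord.hoursAvailable) {
      errors.push("not enough PTO available");
    }
    return errors;
  },

  // Apply an action and hand the updated record back to the caller; disallow
  // anything illegal. Persistence and UX stay with the package consumer.
  applyAction: (action, ptoRecord) => {
    if (!ptoPackage.allowableActions().includes(action)) {
      throw new Error(`illegal action: ${action}`);
    }
    const statusByAction = {
      request: "requested",
      approve: "approved",
      deny: "denied",
      cancel: "cancelled",
    };
    return { ...ptoRecord, status: statusByAction[action] };
  },
};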

There are plenty of other ways the idea of portable domain logic could be realized in practice (for example, we may see more standalone OSS services that are specialized to specific domain areas), but I think this example illustrates the concept.

More and Better Software Ecosystems

Jack Danger makes a compelling case that mature, well-architected software products should actually look more like ecosystems under the hood, with many-to-many relationships within and between lots of different services and user interfaces, serving various stakeholders inside and outside the company, all built on a shared foundation.

I agree with this position - I think high-quality software creates such an absurd amount of leverage that you should be deploying it in as many parts of your business as you possibly can. In fact, I believe software is at its best when it’s in “ecosystem mode”, operating as a sort of computational fabric that supports a combinatorial explosion of different workflows and use-cases.[41] I think this is true not only for internal tools but also for off-the-shelf software, and as evidence I’ll point to the longstanding trope of the “platform play” - every software company has ambitions of being the indispensably sticky foundation of an expansive “partner ecosystem”, and the companies that achieve this tend to be very successful (e.g. Salesforce.) Even better is to build an entire suite of services yourself on your own internal foundations; this can take a ton of work, but those that pull it off (Microsoft, Rippling) can reap huge rewards.

There are, of course, structural reasons that platform companies tend to win so much of the market, but tightly integrated software ecosystems also just provide a great user experience. Cloud providers are a good example: AWS may have a lot of rough edges, but the fact remains that you can create a lambda function that pushes events to an SQS queue, set the queue up to dump events into S3, and write a report in SQL on top of the resulting dataset, all by clicking a bunch of buttons in the AWS console. That’s pretty attractive.

As great as they are, ecosystems (internal and otherwise) are very hard to build, and the ones that do work tend to revolve around (or be completely owned by) a well-capitalized power player. When I hear about a new “ecosystem” or “platform”, I generally think of an incumbent cementing their dominance by inducing network effects.

This might be stating the obvious, but I believe bringing down integration costs (i.e. making rich, stable integrations easier to create and maintain at scale between loosely coupled participants) could make it cost-effective for smaller players to form ecosystems on their own. Point solutions banding together to provide their own “platform experiences”[42] would lead to some great UX across the board, and could conceivably change the competitive dynamics of the industry; ecosystem-based network effects would be less of the formidable moat they are today, and customers would no longer have to choose between having an integrated platform or using focused, best of breed solutions - they could get both.

The hope is that lowering integration costs can spur a meaningful increase in the number, quality, and interoperability of software ecosystems. Perhaps we could get to the point where everything kind of feels like an “omni-platform” and using lots of apps together in a single workflow is the norm rather than a hard-won exception.

Wildcard AI Developments

AI (particularly Generative AI) is on everyone’s mind these days, but we’re still so early in its development that it’s hard to predict what the technology will look like in the coming years (much less the ecosystem around it), which is why I’ve designated this a “wildcard” section of sorts - I’m basically trying to guess which way the ball is going to bounce.

Even in these early days, the AI community has discovered some important principles for using Large Language Models (or LLMs) effectively in production. One foundational technique is called “grounding.”

LLMs are powerful but single-minded completion engines - if you feed them a piece of text, they will do their best to predict what is most likely to come next, based on what they’ve observed in their training data. LLMs can “absorb” a lot of knowledge this way, especially if it’s common knowledge (i.e. if it’s well-represented in the training set.) For example, if you hand one of the major models the following prompt: “the three branches of the American federal government are: ” it will almost certainly respond with “the executive branch, the legislative branch, and the judicial branch.” This is the truth, and because it’s the truth, it’s by far the most likely way that sentence would be completed in the training data (since relatively few people on the Internet are inclined to tell deliberate lies about the structure of the federal government.) 

On the other hand, LLMs are not very forthcoming when you ask them for information that is not present in their training set (many times this is information specific to a specialized domain.) Instead of communicating that it is unable to answer with sufficient confidence, a basic LLM will do the only thing it knows how: find the most likely completion of the prompt you provided. Usually this is something that, statistically, looks like it could be in the training set, but isn’t actually in the training set. The end result is often an assertion that sounds plausible but is actually false. These are called hallucinations.

(An example: I asked GPT-4 for a fun video game my partner and I could play together, and it enthusiastically recommended a wintery platformer called “Icy Couples”. Icy Couples sounded cute, but sadly it does not exist.)

Grounding is a way to get around hallucinations. Informally, grounding a general-purpose LLM means finding a way to augment its knowledge base with a smaller, more specific dataset (for example, if you were building a chatbot that answered questions about your codebase, you’d want to ground it in your own source code, specs, documentation, etc.) For a while the weapon of choice for grounding a model was fine-tuning; in fine-tuning, the standard training process is simply continued with a novel training set until all the new information is incorporated into the weights of your now-specialized LLM. Fine-tuning is arguably the most obvious way to bake new information into a model - but it turns out you can also just take the things you want the LLM to know about and inject them directly into the prompt. For example, your prompt might look like: “

    Please answer the following question, using the following documents.

    Question:

    <question>

    Documents:

    <documents>

    Write your answer in the json form:

    {{

        "answer": "your answer"

    }}

    Make sure your answer is just the answer in json form, with no commentary.

 “, where “<question>” is the specific question you’d like to ask, and “<documents>” contains all the new information you’d like the LLM to reference. This may seem like cheating - like it’s a little too convenient to be effective in practice - but it works! In fact, it often works better than fine-tuning, and it’s much easier to implement. This technique, called Retrieval-Augmented Generation (or RAG), is now a very popular method of grounding.
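
As a small sketch, assembling a prompt like the one above is just string construction; buildPrompt and the document format here are illustrative, not a real library’s API:

// Hypothetical helper that stuffs retrieved documents into a RAG-style prompt.
function buildPrompt(question, documents) {
  const docText = documents
    .map((doc, i) => `Document ${i + 1}:\n${doc}`)
    .join("\n\n");

  return [
    "Please answer the following question, using the following documents.",
    `Question:\n\n${question}`,
    `Documents:\n\n${docText}`,
    'Write your answer in the json form:\n\n{\n    "answer": "your answer"\n}',
    "Make sure your answer is just the answer in json form, with no commentary.",
  ].join("\n\n");
}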

Most implementations of RAG don’t actually pass the user’s query directly to the underlying LLM. Instead, they use the query to fetch supplemental information (most of the time this is unstructured information), then they construct a new prompt containing both the supplemental info and the user query, then they hand the newly constructed prompt to the LLM. The obvious question is how to do the first step: how do we retrieve a specific document based only on a natural-language query from the user? 

The “standard” approach (insofar as there is a standard way to do anything this early in the game) is to store the information you want to access in a vector database. Without getting too deep into the details - when a document is stored in a vector database, the natural language contained in that document is transformed into a vector embedding, and then indexed by that embedding. A vector embedding is just a point in n-dimensional vector space, but the embeddings vector databases use have a special property - if two pieces of natural language have similar meanings, their embeddings will be close together. For example, the embedding for “a bushel of ripe oranges” will be closer to the embedding for “a lot of juicy Valencias” than it will to the embedding for “my brother’s collection of ball bearings.” This concept of similarity is general and flexible enough that we can usually expect the embedding of a document to be reasonably close to the embedding of a sentence describing the content of that document. So, if we want to search for a document based on a natural-language description that the user provided, we simply compute the embedding of the description, and then look for the closest neighbors of our embedding in the vector database. Then we return the documents attached to those neighbor embeddings as our search results.
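
Here’s a toy sketch of the retrieval step, assuming the embeddings have already been computed by some embedding model; real systems use approximate nearest-neighbor indexes rather than the brute-force scan shown here:

// Cosine similarity between two embedding vectors of the same length.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// "index" is a list of { document, embedding } entries; return the k documents
// whose embeddings are closest to the embedding of the user's query.
function nearestDocuments(index, queryEmbedding, k = 3) {
  return index
    .map((entry) => ({ ...entry, score: cosineSimilarity(entry.embedding, queryEmbedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((entry) => entry.document);
}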

Vector databases are a great way to store and retrieve unstructured documents, but there are also ways to do RAG using structured data (e.g. rows from a data warehouse.) In many of these cases the LLM is also provided with a detailed listing of the relevant table schemas or object models. There’s a version of this approach for AI agents[43] that also includes typed APIs that the agent can call when it needs to. 

As far as I know there isn’t an official name for this “structured context + actions” approach to RAG and grounding (I like to call it “structured grounding”), but I’m a big fan of it. I think that providing an agent with both raw data and a formal and explicit model of your domain reduces the amount of inference it needs to do to understand what your data means (and the number of fiddly details it needs to get right to successfully do what you asked it to.) I believe the most promising use cases of AI are these “cybernetic” setups where the human and the LLM both do the parts of the job that they’re best suited for. In the case of structured grounding, the humans provide a precise, pre-constructed structural layer, and the LLM does inference on top of it.
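
A loose sketch of what “structured grounding” might look like in practice - handing the model a schema plus a list of callable actions alongside the user’s request (the schema, actions, and prompt format are all made up for illustration):

// Hypothetical structured-grounding prompt: raw request + explicit domain model + typed actions.
function buildGroundedPrompt(userRequest, schemas, actions) {
  const schemaText = schemas
    .map((s) => `Table ${s.table}: ${s.columns.join(", ")}`)
    .join("\n");
  const actionText = actions
    .map((a) => `${a.name}(${a.params.join(", ")}): ${a.description}`)
    .join("\n");

  return [
    "You can read from the following tables:",
    schemaText,
    "You can call the following actions:",
    actionText,
    `User request: ${userRequest}`,
  ].join("\n\n");
}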

It’s important to note that most agents will need to pull data from and push data to a large number of loosely affiliated sources. There are various ways to connect and reconcile all the different schemas and datasets involved. One of these is just to let the LLM figure out how to wire everything up on its own - for reasons I detailed earlier, I’m not confident that this is a bulletproof strategy.[44]

Another approach is to find a way to transform data into a unified model before handing it off to the agent (and to reverse the transformation when the agent hands data back, if necessary.) Probably the most high-profile example of this is Microsoft Copilot - Microsoft’s flagship AI agent. Copilot is grounded in the Microsoft Graph - a unified API and data model that encompasses all of the services in Microsoft 365. Palantir also seems to be throwing their weight behind this paradigm: they encourage customers to model their available data and capabilities using an Ontology (not to be confused with a Semantic Web ontology!), which is then used to ground Palantir’s own AI product.[45] Their workflow looks something like this:

Palantir’s flavor (and, to a degree, Microsoft’s flavor) of grounded AI is a pretty standard platform play, in that you need to create and manage a central Ontology using Palantir software before you can start grounding any models. We touched on some of the pros and cons of this kind of approach before: it can work great if the whole company is bought into the Palantir product and philosophy, but otherwise Ontology maintainers are faced with the unenviable task of keeping up with all the moving parts of the org.[46]

Let’s think about how we could use our global Model Network for structured grounding. Some reasons we might want to do this:

  • It doesn’t lock clients into a single platform or a single set of source systems - as we’ll see, we can implement grounding via Model Networks in a way that lets users get the benefits of AI using whatever stack they want.
  • It doesn’t require a centralized object model, so there’s no bottleneck on a single person or team.
  • It doesn’t require the actual LLM to try and reconcile disparate APIs on its own.

Here’s one way (out of many!) that we might implement this:

First of all, LLMs need to be able to run Model-Addressed queries and actions using the Model Network (this is a matter of exposing a few APIs on the Model Network side and calling them from the LLM using LangChain or an equivalent.)

Next, we need a way for LLMs to be able to determine which parts of the Model Network (i.e. which field IDs, object IDs, etc) they should use in their queries or commands, based on the natural language prompt they receive from the user. 

As mentioned, whenever you want to pull data based on natural-language descriptions, the usual solution nowadays is to set up semantic search using vector embeddings; in our case, we can help LLMs figure out how to construct the appropriate queries by indexing all the field and entity IDs in the Model Network using a global vector database.[47] The embeddings for each ID should be constructed using all available documentation for that ID, as well as a natural-language summary of that ID’s general function in the network (i.e. “this field ID represents an integer that is a non-nullable foreign key representing a one-to-many relationship between entityID-A and entityID-B…”, and so on.) The goal is for an LLM to be able to take a prompt and use the vector database to construct queries in terms of some model(s) in the Model Network (it’s not particularly important which ones it chooses, since we can use the Model Network to rewrite the queries in terms of the “correct” models at runtime).
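
Here’s a rough sketch of how an entry in that global vector database might be built for a single field ID. The field shape and embedText are stand-ins; everything below is an assumption about how such an index could work:

// Hypothetical: build the text we embed for a field ID, combining its docs with
// a generated summary of its role in the Model Network, then index it.
function buildFieldIndexEntry(field, embedText) {
  const summary = [
    `Field ${field.id} on entity ${field.entityId}.`,
    field.documentation || "",
    `Type: ${field.type}. Nullable: ${field.nullable}.`,
    ...(field.relationships || []).map(
      (rel) => `${rel.kind} relationship from ${field.entityId} to ${rel.targetEntityId}.`
    ),
  ].join(" ");

  return { fieldId: field.id, embedding: embedText(summary), summary };
}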

Let’s look at an example flow, end-to-end:

  • We want to keep track of cloud spend, so we give this prompt to our agent: “Please figure out how much we’re spending on all of our different units of cloud infrastructure this past month. If we’re spending more on a piece of infrastructure than we budgeted for, find the engineer responsible for it and cut a ticket asking them to investigate.”
  • In our particular org we use Azure; we also use Sage as an ERP, and Pivotal Tracker to keep track of tickets. We also use an internal database to keep track of which engineers are the primary maintainers for each piece of infrastructure. 
  • While processing our prompt, our agent sends a request to the global Model Network platform. The Model Network platform has its own LLM which takes a natural language description of a query and returns a fully written Model-Addressed Query the requester can run against the Model Network. 
  • The prompt our agent sends to the query-writing LLM is: “Write a query that pulls the following information for an organization:

    For each unit of infrastructure:

        - The amount in dollars the organization spent on the infrastructure this past month.
        - The total budget for the infrastructure this past month.
        - The engineer responsible for the unit of infrastructure.”

  • Our agent also sends the query-writing LLM a list of the different services our org is using; this ensures that the generated query only pulls information our org is able to provide.
  • The query-writing LLM uses the provided information along with the global vector database we mentioned earlier to construct a Model-Addressed Query for our agent to use. The query-writing LLM writes the query in terms of AWS’s public model, Netsuite’s public model, Jira’s public model, and PagerDuty’s public model (PagerDuty’s public model has a concept of an infrastructure owner, so we connected our internal database’s model to PagerDuty’s to hook it into the public Model Network.) The query-writing LLM picks these models because they were the most thoroughly documented, but the Model Network itself can take care of performing the translations needed to pull data from the sources we’re actually using.
  • The agent runs the query against the Model Network, and uses the resulting data to perform the necessary calculations. After it’s finished, it follows a similar process to ask the query-writing LLM to write it a Model-Addressed Action that cuts a ticket in the appropriate system. It executes that Model-Addressed Action using the Model Network, and informs us that it’s finished with our request. (A rough sketch of this orchestration appears just after this list.)
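
Tying the steps above together, here’s a rough sketch of the agent-side orchestration, reusing the hypothetical runQuery/runAction wrappers and query/action types from the earlier sketch. The query-writing helpers, the row shape, and the over-budget check are all invented for illustration; a real agent would presumably drive this through its tool-calling loop rather than hard-coded logic.

```typescript
// Hypothetical end-to-end flow for the cloud-spend prompt. The query-writing LLM
// returns a Model-Addressed Query/Action; the Model Network handles translation
// to the org's actual sources (Azure, Sage, Pivotal Tracker, the internal DB).

type InfraSpendRow = {
  infrastructureId: string;
  spendThisMonth: number;    // dollars
  budgetThisMonth: number;   // dollars
  ownerEmail: string;
};

async function handleCloudSpendPrompt(
  orgServices: string[],     // e.g. ["Azure", "Sage", "Pivotal Tracker", "internal-infra-db"]
  writeQuery: (description: string, services: string[]) => Promise<ModelAddressedQuery>,
  writeAction: (description: string, services: string[]) => Promise<ModelAddressedAction>,
): Promise<void> {
  // 1. Ask the Model Network platform's query-writing LLM for a Model-Addressed Query.
  const query = await writeQuery(
    "For each unit of infrastructure: spend this past month, budget this past month, and the responsible engineer.",
    orgServices,
  );

  // 2. Run it against the Model Network, which rewrites it for the org's real sources.
  const rows = (await runQuery(query)) as InfraSpendRow[];

  // 3. For anything over budget, cut a ticket via a Model-Addressed Action.
  for (const row of rows) {
    if (row.spendThisMonth > row.budgetThisMonth) {
      const action = await writeAction(
        `Create a ticket for ${row.ownerEmail} asking them to investigate overspend on ${row.infrastructureId}.`,
        orgServices,
      );
      await runAction(action);
    }
  }
}
```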

Here’s a visual representation of all the moving parts. (Note that, to keep the diagram from becoming too cluttered, I’ll show the process in two parts - remember that these parts would happen sequentially in response to a single prompt.)

Again, this is all very theoretical, but the idea is to give LLMs a stable, unified way to understand and interact with their users’ software and data completely out-of-the-box, with no integration or reconciliation work demanded from the end user or from the LLM itself. This could allow for truly plug-and-play AI Agents that can immediately be put to work in any context - which I think is a powerful idea.


That’s it! Thanks for reading! I’m always happy to answer questions or comments about the ideas in this post - just drop me a line at mprast@get-nerve.com.

Footnotes

[1] Not always, of course. Everybody who’s been around long enough has their own crazy long-tail story; my personal one is from my time at Microsoft, where I spent a week trying to debug a mysterious race condition only to find out from the C# team that ReaderWriterLock was broken.

[2] It can also be a good idea to start with the models when building new software. To quote Linus Torvalds: “git actually has a simple design, with stable and reasonably well-documented data structures. In fact, I'm a huge proponent of designing your code around the data, rather than the other way around, and I think it's one of the reasons git has been fairly successful.”

[3] These vendors are usually described as “platform”, “infrastructure”, or “API” companies.

[4] Domain-Driven Design has a loose analogue for this type of mental model called a “Domain Model” (although these are more formal and usually written down.)

[5] “Were there but an international language, all translations would be made into it alone ... and all nations would be united in a common brotherhood.” - L.L. Zamenhof, creator of Esperanto.

[6] From Wikipedia: “While many of its advocates continue to hope for the day that Esperanto becomes officially recognized as the international auxiliary language, some…have stopped focusing on this goal and instead view the Esperanto community as a stateless diasporic linguistic group based on freedom of association.”

[7] The analogous term in Domain-Driven Design is “ubiquitous language”.

[8] This idea of translation is sometimes mentioned when discussing Bounded Contexts in DDD, especially in the context of the Anti-Corruption Layer pattern. In this article I’m taking a stronger position on translation than DDD does; I’m basically arguing that (an equivalent of) an Anti-Corruption Layer should be used whenever teams need to work together - not just to guard against messy code or legacy systems (another way to say this is that I think the Partnership communication strategy should use translations like the Anti-Corruption Layer strategy does.)

[9] If you’re unfamiliar with this pattern, Thoughtworks sums it up pretty well: “In Consumer-Driven Contracts, each consumer captures their expectations of the provider in a separate contract. All of these contracts are shared with the provider so they gain insight into the obligations they must fulfill for each individual client…This lets them stay agile and make changes that do not affect any consumer, and pinpoint consumers that will be affected by a required change for deeper planning and discussion.”

[10] Not to belabor the point, but these don’t have to literally be exposed over RPC; they could be local method calls if you’re working in a monolith and the different services are actually just different components, etc. etc. etc.

[11] I’m not using “firewall” here in the networking sense; rather, this metaphor refers to physical firewalls - just as physical firewalls are meant to contain a fire, refactor firewalls are meant to contain the scope of code changes in your codebase(s).

[12] Note that we’re not using “high-level” and “low-level” here in the sense of high-level and low-level infrastructure, or high-level and low-level software!

[13] Conceptually! Actually implementing it in a general and robust way is surprisingly difficult - more on this later.

[14] It’s worth noting that you don’t have to adopt a hierarchical network structure. I’d recommend it, for the reasons I outlined earlier, but you can use whatever structure you’d like!

[15] You guessed it - this example is inspired by Gitlab's org structure.

[16] Note that providing a list of reviewers to the Deploy stage would entail passing this list through the Dev Section -> CI/CD Section transform.

[17] Although the name makes reference to Queries only, it’s also possible to build Model-Addressed Writes and Model-Addressed RPC calls. The central concept is the same, but now sources declare whether they support writing to a particular field or performing a particular action. For simplicity, I’ll refer to all three of these techniques as “Model-Addressed Querying”.

[18] This is similar to the philosophy that some GraphQL companies take.

[19] To keep these diagrams uncluttered, I’m omitting the query translation stage and sticking to the case where we pull data from one service. Remember that we can pull data from multiple services using a Model-Addressed Query and the process would look largely the same!

[20] Another advantage of having an out-of-process agent is that you could use one agent to serve many consumers at once.

[21] Model-Addressed Queries in particular can really benefit from bidirectional transforms.

[22] To be clear, Model Networks are intended to facilitate safety and portability - we’re just moving our focus from one to the other in this section!

[23] To the greatest extent possible, of course. There might be reasons a complete translation isn’t feasible - for example, sometimes one model will have a field that the other one simply doesn’t have, or vice-versa.

[24] I think it’s reasonable to assume that these kinds of concerns can be standardized (or close to it), given the success of OAuth/OpenID, GraphQL, OpenAPI, etc. I also think it’s reasonable to expect that we would need to do some legwork to cover the long tail of sources that are using older or custom-built ways of exposing their data, but I think this is in the realm of possibility, especially if we allow data providers to build their own connectors/plugins.

[25] This isn’t the only possibility, of course! It’s possible, for example, that a consortium of billing projects could come together to make an “Open Billing” public virtual model (which could then be connected to other public models, e.g. Stripe’s model, and maybe some domain-specific event models.)

[26] It’s also conceivable that some high-level nodes could be Virtual Models.

[27] “'Twenty years from now, we'll look back and say this was the embryonic period,' said Tim Berners-Lee, 50, who established the programming language of the Web in 1989 with colleagues at CERN, the European science institute.” [source]

[28] "The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation." (emphasis mine) [source]

[29] “For the semantic web to function, computers must have access to structured collections of information and sets of inference rules that they can use to conduct automated reasoning. Artificial-intelligence researchers have studied such systems since long before the Web was developed.” [source]

[30] This function is written in JavaScript, with a couple of helper functions elided; note that this “function object” is presented for illustration purposes only.

[31] It may be more accurate to say “pseudo-objective” here. Every language has its own idioms and idiosyncrasies; accordingly, there are some statements which cannot be translated into every language with perfect fidelity - just as not every data model can completely represent every piece of information (as we previously noted.)

[32] Of course, machine translation has progressed to the point where finding a human translator is not always necessary. This raises the question: could we use generative AI to build translations between different data models? We might even be able to translate data automatically and completely on demand, in which case we probably wouldn’t need something like a Model Network. 

Personally I believe that this is one of those scenarios in which the steady-state is gen AI automating 80-90% of the work with humans stepping in and tweaking the details at the end. I’m pessimistic about getting to 100%; there’s a huge variety of domain areas that models and translations could be in, and I think gathering enough training data to get perfect performance in every domain is simply going to be impractical. As is often the case in software, small errors in translation functions can cause serious problems, so I think humans ought to remain involved in authoring and managing them. As such, I believe we’ll still need something like a Model Network to keep complexity under control.

[33] Good Old-Fashioned AI.

[34] Interestingly, the Semantic Web did have a mechanism to create mappings between ontologies, but it appears that this was intended to augment ontology building, not to replace it. Semantic Web proponents seemed to be focused on getting people to write new, semantically rich ontologies instead of finding ways to link together the markedly simpler schemas that people were already using.

[35] Consider also the “Bitter Lesson” of AI research: AI has historically included many symbolic systems grounded in intricate formal theories about the workings of the human mind, but these efforts, while admirable, are reliably out-performed by straightforward approaches that leverage massive amounts of real-world data. As Sutton puts it: “The eventual success [of these data-oriented systems] is tinged with bitterness, and often incompletely digested, because it is success over a favored, human-centric approach.”

[36] Or ever-improving approximations of objective semantics!

[37] With exceptions, of course - many of them from infrastructure or dev tools companies.

[38] Which, to be clear, I think is a great thing! There’s no reason to re-develop these solutions over and over.

[39] E-commerce is a notable exception here! One could also argue that API companies like Stripe and Twilio count as shared business logic, but 1) comparatively there are not a lot of these, and 2) they aren’t purely semantic - they also include a lot of logistical infrastructure, and that’s usually a big part of the value prop.

[40] If you squint you can kind of conceptualize this as the backend equivalent of a headless component.

[41] In other words, I’m a staunch adherent of the Unix Philosophy.

[42] A timely example is the newly-formed Lattice Excellence Alliance, which is conceivably a response to Rippling’s platform ambitions.

[43] In short - Agents are LLMs which are granted the power to perform or suggest external actions in response to requests from the user. There’s a lot of work going on to make Agents capable of performing longer-term autonomous workflows.

[44] To refresh your memory: “there’s a huge variety of domain areas that models and translations could be in, and I think gathering enough training data to get perfect performance in every domain is simply going to be impractical. As is often the case in software, small errors in translation functions can cause serious problems, so I think humans ought to remain involved in authoring and managing them.”

[45] “By leveraging the encoded relationships and rules within ontologies, AI systems can understand context, preferences, and nuances in a way that aligns with human thought processes, making decisions or recommendations based on a nuanced understanding of complex interdependencies. This capability transforms ontologies into a cornerstone of artificial intelligence, bridging the gap between raw data and meaningful insights by infusing AI with a scaffold of human knowledge. Integrating ontologies into AI systems, particularly agents, provides a framework to allow AI to take actions that affect the real world. Ontologies provide a semantic framework that enhances data understanding and supports complex reasoning processes. For AI agents, this means an ability to comprehend the semantics of the data — understanding not just the data itself but its context and relationships.” [source]

[46] I’m sure some readers are going to argue that HyperAuto can automate some or all of this work - as far as I can tell, HyperAuto is mainly a package of detailed, pre-written reports, pipelines, and metrics for several common business use cases. It’s impressive, to be sure, but as an integration solution I’d put it in the same bucket as Unified APIs - and as I discussed earlier, those don’t really generalize.

[47] This technique is not without precedent; for example, Pinterest uses vector embeddings for table selection in their internal text-to-SQL tool.