yggdrasil-network.github.io/_posts/2019-09-01-actors.md

---
layout: post
title:  "Acting out"
date:   2019-09-01 21:00:00 +0000
author: Arceliar
---

### Overture

We've recently rewritten much of Yggdrasil's internals to change from Go's native [communicating sequential processes](https://en.wikipedia.org/wiki/Communicating_sequential_processes) (goroutine+channel) style to using an asynchronous [actor model](https://en.wikipedia.org/wiki/Actor_model) approach to concurrency. While this change should be invisible to the average user, it dramatically changes what we developers need to think about when working on the code. I thought it would be useful to explain a little about the motivation for rewriting things this way, and what the consequences are.

Caution: theatre puns and references throughout, because `Actor`s.

### Exposition

Yggdrasil is written in the Go programming language. Go makes it easy to start a function running concurrently, and gives developers the tools they need to make concurrently executing functions communicate, but it's not always easy to use them correctly. To be clear, the things I'm about to rant about are all fixable. Working around them is a normal thing to do in Go. More importantly, it's a case where doing things the obvious way (which is sometimes even safe in isolation) leads to *wrong* behavior in a larger program. I prefer models where the obvious thing is still correct, and non-obvious things are only needed as a performance optimization.

#### Composition

There's a common pattern that has emerged many times in the Yggdrasil code base. We'll have a `struct` with some mutable fields that need reading or updating, such as information about a particular cryptographic session, or the switch's table of idle peers and buffered traffic. Since shared mutable state is hard, and Go is all about "[Share Memory By Communicating](https://blog.golang.org/share-memory-by-communicating)", we'll have packets get passed to a dedicated worker goroutine that "owns" that particular `struct`. The worker uses information from the packet and the owned `struct` to do whatever it is needs to do, updates these things accordingly, and passes the packet along to the next goroutine in the pipeline.

This often results in a "`for select`" pattern, where goroutines sit in an infinite `for` loop and `select` on several channels, to wait for packets to process or various types of signals from other goroutines. There are a few ways around it (with heavy use of `reflect` or `chan interface{}`, for example), but in most cases, every `select` statement needs to fully enumerate every behavior that the goroutine may need to engage in at that point in the code. If there's a common set of `case`s that always need to be handled, and then a few exceptional `case`s that may or may not matter (possibly when the associated `struct`s the workers are using are similar but not exactly the same types, or as the state of a `struct`'s fields change), then that typically involves multiple `select` statements with only the addition or modification of one or two `case`s.

Go embraces composition in its type system, but `select` statements (and channel operations in general) make execution resistant to composition.

#### Deadlocks

The "`for select`" pattern is safe, as far as I know, if the flow of messages through the program form a directed acyclic graph. However, in our case, cycles emerge if we try to handle things in the obvious way. For example, a cryptographic session needs to somehow get outbound encrypted traffic to the switch, but incoming encrypted traffic also needs to make it from the switch to the sessions for decryption (via the router, which is responsible for, among other things, identify which session is associated with the traffic).

When cycles of goroutines naively pass messages over channels, deadlocks are all but inevitable. There are a few ways to address this, but they're not always appropriate. Ideally, we would change the design to remove cycles, but this is not always possible, and may require significant changes to the workflow in cases where it is possible. In practice, what we'd actually do is either buffer messages (having some dedicated reader goroutine to take the message, add it to a slice, and then pass it to the real destination ASAP) or drop messages entirely (with a `select` statement that aborts and does cleanup in a `default` case, or by having a dedicated reader that drops messages more intelligently, such as from the front of the queue, under the assumption that older messages are less useful).

#### Leaks

Typically, when a goroutine is started, it continues to run until either the function returns or the program exits. For this reason, if a goroutine executes any statements which can block (such as a channel operation), it's important to include some `case` which signals that it's time to return. Forgetting to do this can result in goroutine leaks. [Never start a goroutine without knowing how it will stop](https://dave.cheney.net/2016/12/22/never-start-a-goroutine-without-knowing-how-it-will-stop), or so the experts say.

This is sometimes harder than it needs to be. To be blunt, the single producer N consumer cases are fine, you just close the channel and have all the consumers take this as a signal to exit. Anything involving multiple producers requires some sort of signaling to indicate that all producers have exited. Since you're using a channel already, the obvious option is a `select` statement with another channel that closes to signal shutdown, and then something like e.g. a [`sync.WaitGroup`](https://golang.org/pkg/sync/#WaitGroup) to wait for all producers to exit before closing the channel. Until your number of producers needs to change at runtime, and you realize that this races if you start to `Wait` before `Add`ing everything to the group, so you need to implement a custom counter, and be careful that additions and subtractions can also race and cause it to shut down early. And have fun solving it, because with how much `select` resists composition and code reuse, you're going to be implementing the same patterns over, and over, and over, and over...

It's not that this is some impossible problem to solve, it's just that Go's take on the [CSP](https://en.wikipedia.org/wiki/Communicating_sequential_processes), combined with the rest of the tools the language gives you, makes it easy and concise to run thing the *wrong* way, and leads to comparatively complex and delicate code when trying to run it the right way. At least, that's my personal view of it based on my experience so far, but it probably varies some based on the problem the code is trying to solve.

### Rising action

The [actor model](https://en.wikipedia.org/wiki/Actor_model) is another programming paradigm that embraces concurrency with a "share memory by communicating" philosophy.

For our purposes, an actor is basically a data type with a few special properties:
1. It has an inbox where messages to the actor are placed.
2. It has an associated unit of execution, such as a thread, which processes messages from the inbox one at a time.
3. Rather than exposing ordinary functions for other code to call, the actor exposes *behaviors*. A behavior is a function which has no return value, and is executed only for its side effects. When an actor `A` calls a behavior of an actor `B`, what really happens is that `A` places a message in `B`'s inbox, and `B` processes that message by executing some code.

Different implementations differ on details after that, such as what order messages are processed in, if actors are allowed to wait for a particular type of message before continuing, whether actors run locally or are distributed across a cluster, etc., but they tend to all include some version of the broad strokes above.

### Turing point

<!-- a play on "Turning point", aka the Climax of a classic 5-act play structure, which is what this post's structure is modeled after if you hadn't figured it out by this point -->

I'm particularly fond of the [pony](https://ponylang.io) programming language's take on the actor model. I really can't say enough nice things about their approach, and fully describing it is beyond the scope of this blog post, but if you come out of here with an interest in the actor model, then I highly recommend checking out that language. Maybe watch a few of the talks from the developers that have been posted to YouTube, or read their papers about what is *easily* the most promising approach to garbage collection I've ever come across.

Anyway, I don't actually work on anything written in pony, but I like their version of the actor model so much that I decided to see if I could trick Go's runtime into faking it. The result is [`phony`](https://github.com/Arceliar/phony), which manages to do most of what I want in under 70 lines of code. When we write code using this asynchronous message passing style, instead of ordinary goroutines+channels, the implications are pretty significant:

1. There are no deadlocks. Message sends always succeed, and are quite fast (it doesn't even require [CAS](https://en.wikipedia.org/wiki/Compare-and-swap) instructions in the normal case).
2. Inbox sizes stay small due to backpressure: if the sender sees that the receiver's inbox has too many pending messages, it will schedule itself to stop at some deadlock-free safe point in the future, to wait until the receiver signals that it's handled the message.
3. `Actor`s are *shockingly* lightweight: on a modern 64-bit processor, an idle `Actor`'s only resources are 24 bytes for an empty `Inbox`, some of which is padding that may not apply if embedded into a struct. In particular, an idle `Actor` with an empty `Inbox` has no associated goroutine, so it requires no stack.
4. The lack of a goroutine also means that idle `Actor`s, even cycles of `Actor`s, can be garbage collected automatically.
5. Any `struct` that embeds an `Inbox` satisfies the `Actor` interface. Since `Actor`s encapsulate their own unit of execution, it means the range of behaviors that unit of execution can engage in are encoded into the type system and can even be abstracted through `interface` types. In my opinion, the resulting code is cleaner, easier to read and understand, and far easier to reuse or extend than the `for select` pattern from goroutine+channel use.

### Falling action

I'm happy enough with the current state of `phony` that I decided to start migrating the `yggdrasil-go` code base to use it. This is still work in progress (there are some non-`Actor` goroutines around the edges of the code, mostly in main `Accept` loops and that sort of thing), but the hot paths are now `Actor` based.

Most of this was done in a weekend and came together with surprisingly little pain. I had exactly 2 crashes the entire time (1 accidental `nil` pointer deference and 1 legitimate bug I needed to fix in `phony`), and more importantly, 0 deadlocks. Most things just worked as intended the first time they compiled. There were a few bugs to work out when I was rewriting the `link` code, but nothing compared to the mess I had to deal with when writing the old code (which was a couple of horrifying interdependent `for select` loops to build a state machine).

So by now you're probably wondering what any of this looks like in practice. Just to give a generic example, suppose we have some struct with an exported function that needs to run code on a worker goroutine. We could end up with something like the following when writing Go in the CSP style:

```Go

// This is the function we want the worker to run.
func (n *NonActorStruct) theFunction(arg1 Type1, arg2 Type2) {
    // this is where the code we actually care about goes, the rest is basically boilerplate
}

// This is the struct that we want the worker to own and manipulate.
type NonActorStruct struct {
    inputForTheFunction chan argsForTheFunction
    // fields we care about, plus maybe more channels for other things
}

// Needed to initialize the channel to a working state
func NewNonActorStruct() *NonActorStruct {
    n := NonActorStruct{
        inputForTheFunction: make(chan argsForTheFunction),
    }
    return &n
}

// This is just a helper struct to carry arguments for the function.
type argsForTheFunction struct {
    Arg1 Type1
    Arg2 Type2
}

// This is the function we export.
func (n *NonActorStruct) RunTheFunction(arg1 Type1, arg2 Type2) {
    n.inputForTheFunction<-argsForTheFunction{arg1, arg2}
}

// This is needed to start the worker, otherwise things block.
func (n *NonActorStruct) Start() {
    go func() {
        for {
            select{
            // cases for other things we may need to do would also be here
            // presumably at least one is involved in safely shutting down
            case args := <-n.inputForTheFunction:
                // We could possibly have a switch statement here
                // Then switch on the arg type to pick which function to run
                n.theFunction(args.Arg1, args.Arg2)
            }
        }
    }()
}

// This is needed to stop the worker when we're done.
func (n *NonActorStruct) Stop() {
    // Actual implemenation depends on what else the worker does in its loop,
    // but it probably just sends a specific message and/or closes some channel.
}

// Then to use the code, we have something like:
myStruct := NewNonActorStruct()
myStruct.Start()
defer myStruct.Stop() // Or arrange this to happen somewhere else
myStruct.RunTheFunction(arg1, arg2)
```
<!-- just to reset ugly highlighting in my editor, ignore me -->

When migrating to the actor model, the basic pattern that emerged was to embed a `phony.Inbox` into any `struct` we wanted to make into a `phony.Actor`, and then define functions of the struct like so:

```Go

// This is the function we want the worker to run.
func (a *ActorStruct) theFunction(arg1 Type1, arg2 Type2) {
    // this is where the code we actually care about goes, the rest is basically boilerplate
}

// This is the struct that we want the worker to own and manipulate.
type ActorStruct struct {
    phony.Inbox // This defines the Act function, satisfying the Actor interface
    // fields we care about
}

// This is the function we export.
func (a *ActorStruct) RunTheFunction(from phony.Actor, arg1 Type1, arg2 Type2) {
    a.Act(from, func() {
        a.theFunction(arg1, arg2)
    })
}

// And then to use it, an Actor x would run something like:
myActor := new(ActorStruct)
myActor.RunTheFunction(x, arg1, arg2)
```

And that's about it. The first argument to `myActor.RunTheFunction` also `nil`able, if we have non-`Actor` code that needs to send a message, it just means there's no backpressure to slow down the non-`Actor` code if it's sending messages faster than the `Actor` can handle them. A `phony.Block` function exists to help non-`Actor`s wait for an `Actor` to process a message before continuing, since this seems like a common enough use case (especially when a package wants to export a non-`Actor` interface that uses `Actor` code internally).

What's great is that we don't need to think about starting or stopping workers, deadlocks and leaks are not possible outside of blocking operations (e.g. I/O), and we can add or reuse behaviors just as easily as any function.  I find the code easier to read and reason about too.

I/O is one rough spot, since an `Actor` can block on a `Read` or a `Write` and not process incoming messages as a result. This isn't really any worse than working with normal Go code, and the pattern we've adopted is to have separate `Actor`s for `Read` and `Write`, where one mostly just sits in a `Read` loop and sends the results (and/or error) somewhere whenever a `Read` finishes. These two workers can be children of some parent `Actor`, which is the only one the rest of the code needs to know about, and then all we need to remember to do is close the `ReadWriteCloser` (e.g. socket) at some point when we're done. This is the sort of thing that we'll eventually want to write a standard `struct` for, update our code everywhere to use it, and then never have to think about it again. In the meantime, we have a couple of very similar implementations for working with sockets or the tun/tap device.

### Dénouement

The Go language makes concurrency easy, but for some problems it can be difficult to do safely out-of-the-box. However, the language provides the tools needed to implement an actor model approach very easily. While I won't claim that the actor model is a panacea for all development woes, Yggdrasil by its very nature requires us to think about networks of nodes communicating asynchronously, so it makes sense to use a programming paradigm that lets us model that approach more explicitly in our code base. Outside of a couple of corner cases (namely blocking I/O for the network sockets and the tun/tap device), we expect this to obviate any need to even thing about deadlocks, make development easier moving forward, and generally lead to a better user experience as a result. The code migration is still a work in progress, but `Actor`s have replace `for select` workers along the hot paths through the code (minus 1 crypto worker pool in the session code) and will slowly replace synchronization primitives in the remaining code base. The current code has been merged into our `develop` branch, and I'm quite excited to see it land in Yggdrasil `v0.3.9`, along with the usual bug fixes and incremental improvements, which we plan to release in the near future.