Yggdrasil Network

Upcoming v0.5 Release

2023-10-22T00:00:00+00:00

Introduction

With the v0.5.0 release coming soon, now seems like a good time to explain what we’ve been working on for the past couple of years. While we’ve generally been pretty happy with v0.4.X, there are a few problems with that design which can cause the network to behave in ways we do not like. This blog post is meant to give a short review of how v0.4 works, explain the problems with this approach, and describe the changes we’ve made in v0.5 to try to address them.

Background

The v0.4.X design has 3 major components to the routing scheme:

A DHT-based routing scheme, used to route traffic when no route to the destination is known.
A greedy treespace routing scheme, used to route certain protocol traffic in the DHT and the “pathfinder” for source routing.
A source routing scheme, which encodes a path found through treespace in packet headers, so traffic can take a more direct route than what the DHT offers (and keep routing while the state of the tree is changing).

The life cycle of a connection walks through those three stages in sequence. During the initial key exchange, no path to the destination is known, so traffic is routed over the DHT. When nodes receive traffic that was routed over the DHT, they initiate pathfinding to find a more efficient route through treespace. When pathfinding finishes, they switch to source routing, which encapsulates the DHT-routable packet inside a source-routed packet. If a source-routed packet ever hits a dead end, the source routing header is removed and the packet finishes its journey via the DHT. Receiving this DHT-routed packet triggers pathfinding in the background, so a new path can be found.
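The life cycle above can be sketched as a small state model. This is a minimal illustration, not Yggdrasil's code: the `Session` class, the dict-based packets, and the set of `working_ports` standing in for link health are all hypothetical. It shows the three behaviours described: DHT routing until a path is known, source routing afterwards, and falling back to the DHT (which restarts pathfinding at the receiver) when a source route hits a dead end.

```python
class Session:
    """Hypothetical per-peer session state sketching the v0.4 life cycle."""

    def __init__(self):
        self.source_route = None  # list of ports, once pathfinding succeeds
        self.pathfinding = False  # True while a background lookup is in flight

    def make_packet(self, payload):
        # Prefer a known source route; with no route, the packet uses the DHT.
        return {"payload": payload, "source_route": self.source_route}

    def on_receive(self, packet):
        # DHT-routed traffic from the peer means no working path is known,
        # so (re)start pathfinding in the background.
        if packet["via"] == "dht":
            self.pathfinding = True

    def on_path_found(self, ports):
        self.source_route = list(ports)
        self.pathfinding = False


def forward(packet, working_ports):
    """Follow a source route; on a dead end, strip it and fall back to the DHT."""
    route = packet["source_route"]
    if route is None:
        return {**packet, "via": "dht"}
    for port in route:
        if port not in working_ports:
            # Dead end: drop the source routing header, finish via the DHT.
            return {"payload": packet["payload"], "source_route": None, "via": "dht"}
    return {**packet, "via": "source"}


a, b = Session(), Session()
first = forward(a.make_packet(b"hello"), working_ports={1, 2})
b.on_receive(first)          # arrived over the DHT, so b starts pathfinding
a.on_path_found([1, 2])      # later, a learns a source route
```

After `a.on_path_found`, packets from `a` travel by source route until a port on that route stops working, at which point they quietly revert to the DHT.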

Overall, this design works well. Nodes can begin communicating (in our case, sending key exchange traffic) before needing to look up any routes, and things fall back gracefully. No special protocol traffic is needed to detect broken paths, since the DHT fallback takes care of signaling that a path has failed.

Problems

While I don’t have any concerns with the overall design of v0.4, the individual components all have issues.

First and foremost, the DHT design used in v0.4 does not scale as well as we had hoped. Nodes need to keep track of not only the paths to their keyspace neighbors, but also any such paths that go through that node. This means some fraction of nodes are stuck knowing a large percentage of all paths through their node. That leads to high memory costs and potentially high bandwidth. The v0.4 network’s DHT bandwidth use is relatively low, since the DHT is predominantly hard state, but attempts at a more secure DHT all led to soft state designs where the bandwidth costs can become significant. Without securing the DHT, it would remain vulnerable to some attacks (or behave badly in the presence of misconfigured nodes, such as accidental anycast nodes). The more insidious issue is DHT convergence time: it takes O(n) “steps” to converge in the worst case, and we have good reason to believe that some typical use cases experience this. Additionally, the hard state design required actively monitoring each peer link, to quickly detect when a link is dead. This leads to a lot more idle traffic between peers than what we’d like to see.

Secondly, the tree can produce inconsistent views of the network, depending on which peer’s information a node pays attention to. This leads to “flapping”, when a non-parent ancestral link fails, as nodes tend to switch to a new parent that used the same (now broken) link, but which hasn’t had time to advertise the link failure yet. So nodes tend to switch from their parent to an alternative, and then back to the original parent, when the alternative eventually advertises the same failure. That flapping causes down-tree (child) nodes to flap, which can cascade through the network. There are mechanisms in place to throttle how fast things flap in v0.4, but that’s a band-aid fix to an underlying problem in the design.

Lastly, source routing is good in principle, but the packet format we used for this is not. It’s too easy for a malicious node to insert multiple redundant hops to produce a (finite) loop, which can waste bandwidth on a targeted set of links.

Changes

Quite a number of changes have been made to the design of Yggdrasil in an effort to combat the above issues. The new approaches are not necessarily how we want the network to function long term, but rather they are alternatives that we wanted to test to better explore the solution space. Generally speaking, these are not user-facing, outside of some changes to the information available in yggdrasilctl’s API.

Destination Lookups

The most significant change is the removal of the DHT-based routing scheme used to initially set up routes through treespace. We now use a simpler YggIP/key->coord lookup protocol which resembles ARP/NDP lookups in an ethernet broadcast network (but without broadcast traffic through the full network). Nodes keep track of which peers are reachable by an on-tree link (that is, the node’s parent and children) along with a bloom filter of the keys of all nodes reachable by that link (with keys truncated to the parts used in /64 prefixes, to allow for IP/prefix lookups). A lookup packet received from an on-tree link is forwarded to any other on-tree link where the destination is found in the bloom filter.
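A minimal sketch of the per-link bloom filters and the lookup forwarding rule follows. The filter size and hash count come from this post, but the hashing scheme (SHA-256 over an index and the key) is an assumption and is not what Yggdrasil uses. The truncation of keys to the parts used in /64 prefixes is also left out, and link names such as `"parent"` are hypothetical.

```python
import hashlib
from typing import Dict, List, Optional

M_BITS = 8192  # filter size given in the post
K_HASHES = 8   # number of hash functions given in the post

class Bloom:
    """Minimal bloom filter; the hashing scheme here is an assumption."""

    def __init__(self, m: int = M_BITS, k: int = K_HASHES):
        self.m, self.k, self.bits = m, k, 0  # a Python int used as a bit set

    def _positions(self, key: bytes):
        # Derive k bit positions from SHA-256(index || key).
        for i in range(self.k):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, key: bytes) -> None:
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def __contains__(self, key: bytes) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(key))

    def union(self, other: "Bloom") -> "Bloom":
        # Aggregating filters, e.g. a node's own key plus its children's filters.
        out = Bloom(self.m, self.k)
        out.bits = self.bits | other.bits
        return out


def forward_lookup(dest: bytes, arrived_on: Optional[str],
                   tree_links: Dict[str, Bloom]) -> List[str]:
    """On-tree links (other than the one the lookup arrived on) whose
    filter claims to contain dest. Pass arrived_on=None for local lookups."""
    return sorted(link for link, bloom in tree_links.items()
                  if link != arrived_on and dest in bloom)
```

Because a filter can only produce false positives, never false negatives, a lookup always reaches the true destination if it is in the tree. The cost of a saturated filter is extra copies of the lookup, which is the trade-off discussed next.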

While there are downsides to this approach, it has a number of advantages. First, accidental anycast configurations (using the same key from multiple nodes) will not break any network-wide data structures; they simply cause lookup traffic to arrive at more than one node. Subsequent steps will generally fail (route lookup, key exchange, etc.), but there is no collateral damage to the rest of the network. Secondly, this requires very little idle maintenance traffic, and only needs a constant amount of state per peer. This means nodes in the core of the network are not responsible for maintaining a view of anything more than their immediate neighborhood, and are not hammered with idle DHT maintenance traffic originating at distant nodes. Similarly, nodes at the edge of the network do not need to send any regular DHT keepalive traffic, which may help with bandwidth use and power consumption on mobile devices. Third, this structure converges asynchronously and in time proportional to the depth of the tree, rather than sequentially and in time proportional to the size of the network, so the very poor worst-case convergence times of the DHT are avoided.

The major downside to this approach is that bloom filters can and will generate false positives as they fill. In practice, we would expect filters in the “core” of the network to saturate, where every node appears to be reachable by every path. This in turn means that a node’s route to the “core” of the network (generally via their parent) will take on the role of a “default route” and receive a copy of every lookup sent by the node. We expect lookup traffic will reach the core of the network, effectively act like broadcast traffic within the core, and then be culled by the bloom filters as it approaches the edges (such that a leaf node is unlikely to receive any traffic for which they are not the intended recipient). In short, the nodes in the core will see lower memory use and less bandwidth used by idle maintenance traffic, but active network use will consume more bandwidth. It remains to be seen whether or not this is a worthwhile trade-off.

Just to put some hard numbers on things: we use 8192-bit bloom filters with 8 hash functions. If a node acts as the gateway to a subnet with 200 nodes in it, then its filter has a false positive rate of about 1 in a million (that is, we expect the network to need about a million nodes before the gateway sees any false positive lookup traffic). Even for a gateway to a 500-node subnet in a 1-million-node network, true positives still make up the majority of its lookup traffic.
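These figures can be checked with the standard approximation for a bloom filter's false positive rate, p ≈ (1 − e^(−kn/m))^k, with m = 8192 bits and k = 8 hash functions as stated above. This sketch assumes the filter behaves like an ideal bloom filter with independent hashes; the real implementation may deviate slightly.

```python
import math

def bloom_fp_rate(n_items, m_bits=8192, k_hashes=8):
    """Standard approximation of a bloom filter's false positive rate."""
    return (1.0 - math.exp(-k_hashes * n_items / m_bits)) ** k_hashes

# Gateway to a 200-node subnet.
p200 = bloom_fp_rate(200)
print(f"200-node subnet: p = {p200:.2e}, about 1 in {1 / p200:,.0f}")

# Gateway to a 500-node subnet in a 1,000,000-node network.
p500 = bloom_fp_rate(500)
false_pos = p500 * (1_000_000 - 500)  # lookups for nodes outside the subnet
true_pos = 500                        # lookups for nodes inside the subnet
print(f"500-node subnet: ~{false_pos:.0f} false vs {true_pos} true positives")
```

The 500-node case is close to the break-even point: the expected number of false positives (a little under 500) only just trails the true positives, and a somewhat larger subnet tips the balance the other way.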

So in practice, most nodes should not see any meaningful number of false positives, unless they are acting as the gateway to a very large subnet (or are in a network many orders of magnitude larger than the current v0.4 network). In our current network, a handful of nodes may find themselves in the “core” region, where they receive false positive lookup traffic from most lookups. We hope this is still preferable to constant idle DHT maintenance traffic and potentially very high memory requirements.

CRDT Tree

Previously, each node’s location in the tree was verified by a chain of signatures (each referencing its parent) from the node back to the root. This can lead to inconsistencies where different nodes have mutually incompatible views of the same ancestor (e.g. node A says parent P has grandparent G, but node B says the same parent P has grandparent G’), which complicates parent selection in response to changes in network state. To address this, we have broken up the tree information into separate per-link information, which is gossiped between nodes and merged into a CRDT structure. This forces nodes to have a locally consistent view of the network, which prevents unnecessary “flapping” in some cases where a node’s route to the root has broken. This also reduces the amount of information which must be sent over the wire, as a node does not need to send information back to a peer when it knows the peer has already seen it.
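One way to picture the per-link merge is as a last-writer-wins map, a standard CRDT. In this sketch, which assumes rather than reproduces Yggdrasil's actual structure, each node's entry records its current parent and a sequence number that it bumps whenever that changes, and merging keeps the entry with the higher sequence number. Signatures and the wire format are omitted. Because the merge is commutative, associative and idempotent, nodes that have seen the same updates end up with the same view regardless of arrival order, and a path to the root is read out of that single view, so the P/G versus P/G’ conflict above cannot arise within one node.

```python
from typing import Dict, List, NamedTuple, Optional

class LinkInfo(NamedTuple):
    parent: str  # key of the node's chosen parent (itself if it is a root)
    seq: int     # sequence number, bumped whenever the node changes parent

def merge(a: Dict[str, LinkInfo], b: Dict[str, LinkInfo]) -> Dict[str, LinkInfo]:
    """Join two views: per node, keep the newest entry (ties broken by parent key)."""
    out = dict(a)
    for node, info in b.items():
        cur = out.get(node)
        if cur is None or (info.seq, info.parent) > (cur.seq, cur.parent):
            out[node] = info
    return out

def path_to_root(view: Dict[str, LinkInfo], node: str) -> Optional[List[str]]:
    """Follow parent pointers in one consistent view; None if it loops or is incomplete."""
    path, seen = [node], {node}
    while True:
        info = view.get(path[-1])
        if info is None:
            return None              # missing information
        if info.parent == path[-1]:
            return path              # reached a root
        if info.parent in seen:
            return None              # loop, treat as unusable
        seen.add(info.parent)
        path.append(info.parent)

a = {"P": LinkInfo("G", 1), "G": LinkInfo("R", 1), "R": LinkInfo("R", 0)}
b = {"P": LinkInfo("G2", 2), "G2": LinkInfo("R", 1)}
merged = merge(a, b)  # P's newer entry (parent G2) wins in either merge order
```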

Greedy Routing

Source routing (from v0.4) has been removed in favor of greedy routing (as was done in v0.3.X). In a stable network, this has no effect on the route that packets take, only on how the decision to take that route is made. We may move back to a source routed approach in the future, but the approach used in v0.4 had some issues that would need to be addressed first. Source routing is a nice performance optimization to have, if it can be done securely, but it’s not an explicit goal of this project. While I have ideas on how to do this, it isn’t a high priority in the short term. Since the source routed scheme would presumably still depend on greedy routing for pathfinding, I think it’s useful to focus on stress testing the greedy routing part of the network in this release, and leave source routing for when other parts of the stack are closer to stable.

Per-peer Keepalive Removed

We no longer spam peer links with keepalive traffic every few seconds. Instead, when traffic is sent, we require an acknowledgement within a few seconds (unless the traffic we sent was an ack). This means we do not detect link failures as quickly in an idle network (we need to wait for user traffic or protocol traffic to use the link), but it should reduce idle bandwidth consumption (and likely reduce power consumption for mobile devices). Note that this is separate from e.g. TCP’s own keepalive mechanisms, which are left enabled.
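A minimal sketch of ack-based liveness detection on a single peer link. The three-second timeout is an assumed value standing in for “a few seconds”, and the `PeerLink` class and injectable clock are hypothetical, chosen so the behaviour is easy to test. Note that an idle link never arms a deadline, which is exactly why failures on idle links are detected more slowly.

```python
import time

ACK_TIMEOUT = 3.0  # seconds; assumed value for "a few seconds"

class PeerLink:
    """Hypothetical sketch: a link is declared dead if sent traffic goes un-acked."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.ack_deadline = None  # None means nothing is awaiting an ack

    def on_send(self, is_ack: bool):
        # Acks are not themselves acknowledged; anything else arms a
        # deadline, unless one is already pending.
        if not is_ack and self.ack_deadline is None:
            self.ack_deadline = self.clock() + ACK_TIMEOUT

    def on_receive_ack(self):
        self.ack_deadline = None

    def is_dead(self) -> bool:
        # With no pending deadline (an idle link), we have no evidence either way.
        return self.ack_deadline is not None and self.clock() > self.ack_deadline

now = [0.0]
link = PeerLink(clock=lambda: now[0])
link.on_send(is_ack=False)  # user or protocol traffic arms a deadline at t = 3
```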

New Features

There are also a few new features added in v0.5. It is now possible to restrict peering with a ?password=X argument on the listen and peer connection strings (and in the multicast configuration). This requires nodes to agree on the password before they will peer. Note that this does not allow for network isolation: nodes can still peer with the rest of the network if they wish, and reachability is still transitive. This does make it easier to restrict who can automatically connect within a subnet, or to set up a node that’s public-facing without allowing connections from everyone who finds it. There’s also support for quic:// connections. Peering over QUIC only uses a single stream of traffic, so it has largely the same semantics as peering over TCP/TLS, but it may be useful in cases where UDP packets have an easier time punching through a NAT or firewall. We generally expect it to perform worse than TCP/TLS, so we do not recommend using it when it’s not needed.

Summary

Barring any unforeseen delays, Yggdrasil v0.5 should be out within the next few weeks. We’ve hopefully addressed the most significant issues with stability and scaling in v0.4, and significantly reduced the memory footprint and idle bandwidth consumption for some nodes. Some aspects of the new design are radically different from v0.4, so it remains to be seen how well these changes will work in the real world. Preliminary tests (and lots of simulation work) have us optimistic that v0.5 will give us a stable foundation to build on for the immediate future, as we study any limitations of this new approach and work on the inevitable redesign for v0.6.

v0.4 Pre-release Benchmarks

2021-06-26T21:00:00+00:00

Revisiting v0.3

In the current stable release of Yggdrasil, v0.3.16, routing works basically the same way that it has always worked since release. Traffic is forwarded by greedy routing in a metric space. In essence, each node has a “distance label”, and given the distance label of any two nodes, you can calculate the distance of some path between them. In the code, this label is usually called coords, as it represents a position in the tree, but technically we don’t care about the position itself, we only care that it works as a distance label. Traffic is forwarded to whichever peer minimizes that distance to the destination. This has been discussed in an earlier blog post, so let’s not worry about the details of how it works for now. Instead, we’ll focus on what happens when it doesn’t work.

To be able to send traffic to a destination D, the sender S must look up D’s distance label and key in the DHT. This happens just before session setup, where ephemeral keys are exchanged. You can think of it a bit like a DNS lookup: it maps some known static information (the node’s Yggdrasil IPv6 address) onto some unknown or dynamic information (the node’s static key and dynamic distance label). If anything happens to the network that causes the destination node D’s distance label to change, then all traffic to D will drop until S can look up D’s new distance label. However, that lookup depends on the DHT, and the DHT also uses distance labels for communication, so DHT lookups for D will fail for some amount of time, until the out-of-date information about D times out or is replaced. While that’s happening, S cannot communicate with D, even if the path between S and D is unaffected. Further exacerbating the problem, the DHT search is an iterative process, which requires round trip communication with multiple nodes. These nodes are, for the most part, randomly distributed across the physical network, meaning most of them are likely to be near the edge of the network, where connections are comparatively unreliable and costly to use. If any part of the lookup fails, then this delays search progress (if it doesn’t cause the search to fail entirely).

The network tries to combat these problems by having D refresh itself in the DHT and send a notification to S when D’s distance label changes. However, there is no guarantee that D knows every node which is tracking it in the DHT, and these notifications will hit a dead end and be dropped if the distance labels of the recipients have also changed. This often happens if S and D share a common ancestor in the tree.

To give a concrete example, if S and D are in a LAN with gateway G, and G’s connection to the outside world dies, then this disrupts the traffic flow between S and D. That happens even when the path between them in their own network is unaffected. It also causes various issues in the DHT, which hurt performance for the network in general, and prevents S and D in particular from being able to resume communication.

Improvements in v0.4

As noted in a recent post, the upcoming v0.4 release will include a number of major changes to how Yggdrasil routes traffic. Most of these changes aim to improve performance in dynamic networks and reduce bandwidth consumption from protocol traffic. Without repeating too much from that earlier blog post, the basic goal here is to insulate the routing from changes to distance labels. This happens through a mix of reactive opportunistic source routing and falling back to proactive DHT-based routing, both of which use distance labels for path setup, but neither of which is broken when the distance labels change (provided that the links in the path still work).

Since it may take a while to see how this affects performance in a live network, and because it’s a bit difficult to actually measure these things in a real network, it seems like it would be useful to look at some results from benchmarks on simulated networks.

Mesh Network Lab

All of the results shown here are from meshnet-lab. You should probably just read the documentation if you want to know more, but to summarize: meshnet-lab simulates mesh networks using network namespaces on Linux. Each node is given a network namespace, which can be linked to other namespaces to simulate an arbitrary topology. Links are added and removed as needed to e.g. simulate movement in a mobile ad hoc network.

Although meshnet-lab supports many other mesh networking protocols, this post will focus on comparing Yggdrasil v0.3.16 (the latest stable release) with v0.4rc3 (the most recent release candidate). Comparisons with other mesh routers would be interesting, but it would be best if those were done by an unbiased 3rd party (and using a stable v0.4.X release instead of a release candidate). Instead, this post will try to highlight (qualitatively) what sort of performance changes we expect to see in the new release.

Mobility1

The mobility1 benchmark simulates a dynamic unit disc graph. Nodes are simulated within a two-dimensional Euclidean plane, with each node having connections to other nodes that fall within a certain radius. The network periodically moves all nodes a random distance between 0 and X (X=10,30,60m) in a 1km x 1km virtual space, then waits some amount of time (10s or 30s) before pinging 200 random paths. The paths are limited to source/destination pairs that are in the same connected component, so it only tests paths that plausibly could work.
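The benchmark setup can be sketched as follows. This is not meshnet-lab's code: the connection radius and node count are not given in this post, so the values in the usage are hypothetical, and the random-direction movement model with clamping to the square is an assumption. The sketch builds a unit disc graph, moves nodes, and samples source/destination pairs only from within the same connected component.

```python
import math
import random

def unit_disc_edges(pos, radius):
    """Link every pair of nodes no farther apart than `radius`."""
    nodes = list(pos)
    return {(a, b) for i, a in enumerate(nodes) for b in nodes[i + 1:]
            if math.dist(pos[a], pos[b]) <= radius}

def components(nodes, edges):
    """Label each node with a connected-component id (simple union-find)."""
    parent = {n: n for n in nodes}
    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n
    for a, b in edges:
        parent[find(a)] = find(b)
    return {n: find(n) for n in nodes}

def step(pos, max_move, size=1000.0, rng=random):
    """Move each node a random distance in [0, max_move], clamped to the square."""
    out = {}
    for n, (x, y) in pos.items():
        d, theta = rng.uniform(0, max_move), rng.uniform(0, 2 * math.pi)
        out[n] = (min(max(x + d * math.cos(theta), 0.0), size),
                  min(max(y + d * math.sin(theta), 0.0), size))
    return out

def sample_pairs(pos, radius, count, rng=random):
    """Pick source/destination pairs that share a connected component."""
    comp = components(pos, unit_disc_edges(pos, radius))
    candidates = [(a, b) for a in pos for b in pos if a != b and comp[a] == comp[b]]
    return rng.sample(candidates, min(count, len(candidates)))

rng = random.Random(1)
nodes = {i: (rng.uniform(0, 1000), rng.uniform(0, 1000)) for i in range(50)}
nodes = step(nodes, max_move=30, rng=rng)                 # one 30 m mobility step
pairs = sample_pairs(nodes, radius=150, count=200, rng=rng)  # hypothetical radius
```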

These mobility tests are an area where Yggdrasil has struggled up to now, as seen in the v0.3.16 results. Basically, when a node moves, this can affect the coords of other nodes in the network. With the changes in v0.4rc3, the 30s tests are generally in good shape. The 10s tests see some loss, due to the time it takes to detect failed links before we can route around them.

Mobility2

The mobility2 test is essentially a much more aggressive variation of the above. Nodes periodically move a random (increasing) step size with a 15s delay before testing 200 random paths. This test also monitors bandwidth usage.

The main feature to note is that, aside from having terrible reliability in this test, v0.3.16 uses a ridiculous amount of bandwidth when mobility is involved. With v0.4rc3, the bandwidth use drops to at or below around 10KBps, depending on how mobile things are. I’m fairly certain that most of this bandwidth is still a reaction to mobility events in the network, because (as we’re about to see) the bandwidth use is pretty low in static networks.

Scalability1

The scalability1 test set involves running the network over line, tree, or square grid networks. The line and tree networks start at 50 nodes and increase to 300. The grid network starts at 49 nodes (7x7) and increases the side length by 1 at each step, up to 289 nodes (17x17). This test waits for about 5 minutes before pinging 200 paths (slowly, over an additional 5 minutes), and measures both packet delivery rate and network utilization.
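The grid sizes in this test set follow directly from the side lengths, growing from 7x7 to 17x17 one step at a time:

```python
# Square grids from 7x7 up to 17x17, growing the side length by one each step.
grid_sizes = [side * side for side in range(7, 18)]
print(grid_sizes)  # [49, 64, 81, 100, 121, 144, 169, 196, 225, 256, 289]
```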

There’s not a whole lot to say here: v0.4rc3 is simply an improvement across the board. Note that it’s a little surprising how the bandwidth use decreases as the network grows. This may be an artifact of how the test works, since a fixed number of pings may represent proportionally more traffic in a small network, but that’s speculation.

Conclusion

The upcoming v0.4 release changes how packets are routed through the network. While it’s hard to say exactly how things will behave in the real world, the performance gains in the simulated networks give us reason to be optimistic.

If things go according to plan, then these changes should improve the user experience and overall usefulness of the network. Changes to the network state should no longer affect existing traffic flows, as long as the path the flow is using is unaffected. In cases where the path is affected, it should take much less time for the network to detect this and route around the damage (when it’s possible to do so). With or without disruptive changes in the network, there should be reduced bandwidth from protocol traffic, leading to lower data use and longer battery life in energy constrained environments (e.g. mobile phones).

Preparing for Yggdrasil v0.4

2021-06-19T21:00:00+00:00

Version 0.4 is coming soon

In the coming weeks, we will be preparing to release Yggdrasil v0.4. This is a significant change from the v0.3 branch with an all-new protocol implementing an improved routing scheme.

This release brings some new and significant benefits:

Improved mobility performance — For nodes that move around or change peerings frequently. This was largely impractical with v0.3 as the sessions would have to time out and a new search repeated.
Spanning tree changes are now less disruptive — Previously it was common for sessions to fail or for traffic to be dropped if the root or parent coordinates changed. This is no longer the case as tree routing is largely only used for bootstrapping DHT paths and determining source routes.
Opportunistic source routing — Session traffic will now use source routing if available, to ensure that the overall connection quality of sessions is preserved. If a source-routed path fails, the traffic will revert to DHT forwarding seamlessly.

However, there are also a number of user-impacting changes coming in this release to be aware of, as we have worked to simplify the codebase and reduce complexity.

Protocol changes

Yggdrasil v0.4 contains a number of breaking changes to the protocol. That means that v0.4 nodes will not peer with v0.3 nodes. We will be wiping the public peers list around the time of release as a result and asking users to re-submit their information once they have upgraded their public nodes.

IPv6 address changes

In v0.3, IPv6 addresses on the network were generated as a hash of your curve25519 keys (the EncryptionPublicKey configuration option). This was made possible due to the iterative search nature of the DHT. In v0.4, the new DHT is based on ed25519 keys, and therefore we have had to switch to generating IPv6 addresses from the ed25519 keys instead.

This is sadly unavoidable. We understand that this is a rather disruptive change, especially for those who operate public services. However, we believe that the added robustness of the new routing scheme and DHT is more than worth the disruption. We will also be clearing the public services list and asking service operators to re-submit their details after upgrading.

Session firewall deprecated

We decided to remove the session firewall from Yggdrasil v0.4. It’s no longer straightforward to implement in the new codebase and we believe that it often lulled users into a false sense of security. While it may have given the impression of being stateful, it was much more rudimentary. For example, if the user allowed only outbound connections, it would still be possible for the remote side to send traffic back to you for the length of the session. It was also possible to extend this window just by sending more session traffic.

With the new version, the SessionFirewall options are no longer present in the configuration and will not take effect. You should look to use your operating system firewall instead if you need to control traffic coming to your node. If you intend to operate a node solely as an Yggdrasil router and do not need to send/receive Yggdrasil traffic from that node directly, you can also disable the TUN adapter by setting IfName to "none" in the configuration file.

Tunnel routing deprecated

We also took the decision to remove tunnel routing from v0.4. We know that this has been a somewhat popular feature with some users, but it ultimately was the source of a significant number of bugs within v0.3. It increased the complexity of the TUN module substantially and often also didn’t behave in the way that users expected, particularly those who were used to configuring Wireguard already.

With the new version, the TunnelRouting configuration options are no longer present and will not take effect either. It’s still possible to tunnel over Yggdrasil by using a number of other technologies: GRE, IPIP, Wireguard and others, using the Yggdrasil IPs as endpoint addresses for the tunnels. We recommend tunnelling one of these protocols over Yggdrasil instead.

Release candidates

There are release candidate builds available if you want to try out v0.4 today. We do recommend, though, that you back up your configuration before upgrading or installing any packages, and be aware that you will not be able to peer with or access services from the v0.3 network.

At this point the release candidates are using a developmental protocol number. We will bump the protocol version on the final release candidate, but until then, any nodes running v0.4 release candidates should be considered to be experimental only.

Technical Details

Routing

The core routing logic has been redesigned and written into a separate library. This began as a toy hard-state reimplementation of Yggdrasil’s routing logic to test a new DHT design, but it eventually became a soft-state implementation using generational hard state – basically, nodes periodically set up a new network and throw away the old one, but everything acts like a hard-state protocol within the life of any one generation of the network.

Similar tree and DHT structures were reimplemented in pinecone, so if you grok the SNEK then this should seem very familiar.

Treespace

The spanning tree works largely the same way as before. The only significant differences are with the root selection and updates: the root is the node with the lowest ed25519 public key, rather than the highest sha512sum hash of the public key, and the root updates the timestamp for its spanning tree announcements every 30 minutes (previously 30 seconds) with a timeout after 60 minutes (previously 60 seconds). Parent selection uses whatever non-looping path has advertised the best root & timestamp combination the longest, i.e. the path that sent the update the fastest, unless that path was unstable, in which case any flapping should push the network towards the fastest stable path.
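A minimal sketch of the root and parent selection rules described above. The `Announcement` fields and the use of a local “first seen” time for both expiry and tie-breaking are assumptions made for illustration, and the instability handling (pushing towards the fastest *stable* path when a path flaps) is omitted. Keys are compared as bytes, so a lower key wins.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

TIMEOUT = 60 * 60  # seconds; announcements expire after 60 minutes
# (The root refreshes its timestamp every 30 minutes, so a live root
# never lets its announcements reach this timeout.)

@dataclass
class Announcement:
    root: bytes        # root's ed25519 public key
    timestamp: int     # root-assigned timestamp of this announcement
    path: List[bytes]  # keys on the path from the root to the advertising peer
    first_seen: float  # local time this peer first sent this root/timestamp

def rank(a: Announcement):
    # Lower root key wins; for the same root, a newer timestamp wins.
    return (a.root, -a.timestamp)

def select_parent(self_key: bytes, anns: Dict[str, Announcement],
                  now: float) -> Optional[str]:
    """Return the peer to use as parent, or None if this node should be root."""
    usable = {peer: a for peer, a in anns.items()
              if self_key not in a.path            # skip looping paths
              and now - a.first_seen < TIMEOUT}    # skip stale announcements
    if not usable:
        return None
    best = min(rank(a) for a in usable.values())
    if self_key < best[0]:
        return None  # our own key is lower than any advertised root
    ties = [peer for peer, a in usable.items() if rank(a) == best]
    # Among equally good offers, keep the one advertised the longest
    # (i.e. the path that delivered the update first).
    return min(ties, key=lambda peer: usable[peer].first_seen)
```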

As before, each node uses the path from the root to itself as the node’s distance label. Given the distance labels of two nodes, the distance between them can easily be calculated (it’s the sum of the distance from each of them to their last common ancestor in the tree). This is used to do greedy routing in a metric space, but this is only used to find paths for protocol traffic. User traffic uses one of the following two routing schemes.
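The distance calculation and the greedy forwarding decision can be sketched in a few lines. Here a distance label is assumed to be the list of port numbers on the path from the root to the node (the post only says it is the path from the root), and `next_hop` is a hypothetical helper that forwards only to a peer strictly closer to the destination than the current node.

```python
def tree_distance(a, b):
    """Hops between two nodes via their last common ancestor.
    Labels are lists of ports on the path from the root to each node."""
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return (len(a) - common) + (len(b) - common)

def next_hop(self_label, peers, dest):
    """Greedy routing: pick the peer strictly closer to dest than we are.
    `peers` maps peer name -> that peer's label. Returns None at a dead end
    or when we are the destination."""
    best, best_dist = None, tree_distance(self_label, dest)
    for name, label in sorted(peers.items()):
        d = tree_distance(label, dest)
        if d < best_dist:
            best, best_dist = name, d
    return best

# Root [], children A [1] and B [2], grandchildren C [1, 3] and D [2, 5].
print(tree_distance([1, 3], [2, 5]))                       # 4, via the root
print(next_hop([1, 3], {"A": [1]}, [2, 5]))                # A, one step up the tree
print(next_hop([1, 3], {"A": [1], "D": [2, 5]}, [2, 5]))   # D, a direct non-tree link
```

The last case shows why greedy routing often beats pure tree routing: a direct peering link that is not part of the tree is used whenever it gets the packet closer.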

Keyspace

Each node needs to be able to contact any other node given only the node’s IPv6 address (in the Yggdrasil address range). To do that, we use a distributed hash table (DHT). The new DHT for v0.4 is very different from the old DHT (which was based on Chord).

The new DHT takes advantage of the fact that node identifiers are simply ed25519 public keys, and that the root of the tree is the node with the lowest key. Since every node knows a path to the root, and the root is at one of the edges of keyspace, we don’t need to wrap keyspace to form a Chord-like ring. So the new DHT is simply a line of nodes, ordered from the lowest key to the highest key, beginning with the root of the tree.

Each node is responsible for setting up a path from itself to its predecessor in the line. The predecessor uses that path to route traffic to the node. In addition to this, intermediate nodes store a routing table entry for the path. So if node B sets up a path to node A, which node A uses to forward traffic to B, then every node in the path A->B also has a routing table entry for a path that routes traffic towards B. If a link in the path times out, then the nodes on either end of the broken link send an explicit notification that the path is broken, which is what allows the DHT to quickly detect and react to mobility events.

Packets are forwarded towards the key that’s highest without being higher than the destination (“The Price Is Right” rules). These routing decisions are made by any node along a path, not only the nodes at the endpoints of a path. So if A is the predecessor of B, then A need not handle traffic to B — traffic which happens to cross any node on the path A->B will flow towards B without reaching A. The root, as well as all peers and all ancestors of peers, act as additional DHT paths that nodes know about “for free” (since they were learned by necessity as part of the spanning tree setup).
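A minimal sketch of “The Price Is Right” forwarding rule. It assumes small integer keys in place of 32-byte ed25519 keys (bytes compare the same way in Python) and a hypothetical `known` table mapping every key this node has a DHT path or tree link for to the peer it would forward through. The usage also shows why a bootstrap packet addressed to a new node’s own key dead-ends at its predecessor: nobody has a path to the new key yet, so each hop heads for the highest key below it.

```python
from typing import Dict, Optional

def dht_next(self_key: int, known: Dict[int, str], dest: int) -> Optional[str]:
    """'The Price Is Right' forwarding: head for the highest known key that is
    not higher than dest. `known` maps key -> peer to forward through.
    Returns None when this node is already the best match (a dead end)."""
    best = self_key if self_key <= dest else None
    hop = None
    for key, peer in known.items():
        if key <= dest and (best is None or key > best):
            best, hop = key, peer
    return hop

# A line of keys 1, 4, 7, 9; a new node with key 6 sends a bootstrap
# toward its own key before anyone has a path to it.
print(dht_next(9, {1: "to-1", 4: "to-4", 7: "to-7"}, 6))  # to-4
print(dht_next(4, {1: "to-1", 7: "to-7", 9: "to-9"}, 6))  # None: 4 is 6's predecessor
```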

To discover their keyspace neighbors, nodes with no predecessor (or with a predecessor based on an outdated version of the tree) periodically send a bootstrap packet. The bootstrap packet is routed via the DHT until it hits a dead end: at the node’s predecessor. The bootstrap contains the sender’s treespace distance label, which the predecessor uses to send a bootstrap acknowledgement message. The acknowledgement includes the predecessor’s treespace distance label, which the original node uses to set up a path for the DHT.

Because the new DHT rules are based around forwarding, rather than lookups, DHT searches (and the crawling operations that come with them) are no longer part of the network. Instead of looking up a path to a node, nodes simply forward traffic towards the destination key via the DHT. While establishing a session, the nodes set up a source route and transparently switch to it in the background.

To figure out how many DHT entries are needed in the network, consider the following:

Each node sets up at most 1 DHT path to other nodes (its predecessor).
Each path has 1 DHT entry in the routing table of each node in the path.
The longest possible path between two nodes is one which goes through the root.
In a stable network, nodes select a parent which minimizes the latency of the path from the root.
If we assume latency is proportional to hop count, then this minimizes the hop count.
This means the longest path we expect on the tree is equal to at most the diameter of the network d.
This means the longest possible path between two nodes via treespace is 2d.
Therefore, in an n-node network, there are O(nd) DHT entries across all nodes, or an average of O(d) entries per node (but with no bounds on how that’s distributed across nodes).
Internet-like (scale-free) graphs are observed to have a diameter that scales slowly with network size, most likely d ~ log n (or possibly d ~ log log n).
Therefore, we expect O(n log n) total DHT routing table entries in a large internet-like network, or O(log n) average state per node (with no bounds on the variance).
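The counting argument above condenses into a few lines. A path of at most 2d hops visits at most 2d + 1 nodes, each holding one entry for that path, and there is at most one such path per node:

```latex
\begin{aligned}
\text{entries for one DHT path} &\le 2d + 1 \\
\text{total entries} &\le n\,(2d + 1) = O(nd) \\
\text{average entries per node} &= O(nd)/n = O(d) \\
d \sim \log n \;\Rightarrow\; \text{total} &= O(n \log n), \qquad \text{average} = O(\log n)
\end{aligned}
```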

That works out to the same average state per node as in most popular DHT implementations, so this may scale OK in practice. However, we expect that distribution to be skewed, with a large number of nodes having very few entries, which plausibly could mean that some small fraction of nodes have O(n) routing table entries in the worst case. These nodes are the same nodes that would need to carry per-path keep-alive traffic if intermediate nodes did not store routing state, so in reality we’re trading one resource for another (possibly higher memory use in exchange for lower bandwidth consumption).

Source routing

The DHT seems quite reliable in benchmarks (using e.g. meshnet-lab), but it is not efficient: paths through the DHT keyspace typically have higher stretch than paths through treespace. In addition, while coord flapping and node join/leave events cause less disruption for the new DHT, that’s not the same as no disruption. To combat this, nodes transparently switch to source routing, and fall back to routing on the DHT only when no source route is known or when a source route reaches a dead end.

To be specific, when a node A sends traffic to node B, A adds B to a list of nodes it cares about. If B also cares about A, then when B receives DHT-routed traffic from A, B sends a notification packet back to A (containing B’s treespace distance label). If A receives a notification from a node it cares about (in this case, B), then A sends a pathfinding packet via the tree. At each hop along the way, the receiving node appends the port number of the link back to the previous node to a reverse route at the end of the packet. When B receives the pathfinding packet, it sends an acknowledgement back to A, using the reverse route back to A as a source route. The acknowledgement builds up its own reverse route, which represents a path back to B. When A receives the acknowledgement, A stores that reverse route as its source route to B. Subsequent traffic is sent to B via that source route, with infrequent (1/minute) checks for a new source route. The same process occurs in reverse, with A sending a notification to B when A receives DHT-routed traffic from B.
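The reverse-route bookkeeping is easier to see in a toy model. The sketch below is a minimal illustration, not Yggdrasil's implementation. The `network` map, the port numbers and the helper names are all hypothetical. It also assumes that a reverse route is simply reversed before being used as a source route; the post does not say how the real code orders it. `walk` plays the part of a packet that collects ports along the way, and `follow` forwards a packet along a source route.

```go
package main

import "fmt"

// network maps each node to its ports, and each port to the neighbour it
// leads to. This model and all names in it are hypothetical.
type network map[string]map[uint64]string

// portTo returns the port on node `from` that leads to neighbour `to`.
func (g network) portTo(from, to string) uint64 {
	for port, peer := range g[from] {
		if peer == to {
			return port
		}
	}
	panic("not neighbours: " + from + " -> " + to)
}

// walk carries a packet along path and returns the reverse route it builds:
// each node that receives the packet appends the port leading back to the
// node it came from.
func (g network) walk(path []string) []uint64 {
	var reverse []uint64
	for i := 1; i < len(path); i++ {
		reverse = append(reverse, g.portTo(path[i], path[i-1]))
	}
	return reverse
}

// follow forwards a packet from src along a source route of ports and
// returns every node it visits, starting with src (dead ends not handled).
func (g network) follow(src string, route []uint64) []string {
	visited := []string{src}
	for _, port := range route {
		src = g[src][port]
		visited = append(visited, src)
	}
	return visited
}

// reversed returns a reversed copy: walk records ports in travel order, but
// retracing the path from the far end needs them in the opposite order.
func reversed(ports []uint64) []uint64 {
	out := make([]uint64, len(ports))
	for i, p := range ports {
		out[len(ports)-1-i] = p
	}
	return out
}

func main() {
	g := network{
		"A": {1: "X"},
		"X": {1: "A", 2: "Y"},
		"Y": {3: "X", 1: "B"},
		"B": {2: "Y"},
	}
	// A's pathfinding packet follows the tree path, collecting ports back toward A.
	toA := reversed(g.walk([]string{"A", "X", "Y", "B"}))
	// B source-routes its acknowledgement to A; the ack collects ports back toward B.
	ackPath := g.follow("B", toA)
	toB := reversed(g.walk(ackPath))
	fmt.Println(ackPath, toB, g.follow("A", toB))
}
```

Running it prints `[B Y X A] [1 2 1] [A X Y B]`. The acknowledgement retraces the tree path back to A, and the route it collects takes A's later traffic straight to B.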

If node A’s source routed traffic to B hits a dead end, due to e.g. a link failure in the network, then the source route is stripped from the packet, and the packet is routed the rest of the way over the DHT. When B receives the DHT-routed packet from A, this immediately and automatically causes B to send a new notify (possibly with B’s new treespace distance label) back to A, which leads to A discovering a new source route to B.
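The dead-end fallback can be sketched the same way. This is also a simplified, hypothetical model, and the `node` and `packet` types are invented here. Each hop consumes one port from the source route. If the link for that port is gone, the route is stripped and the packet is flagged as DHT-routed from that node onwards. Actual DHT forwarding is not modelled.

```go
package main

import "fmt"

// node is a hypothetical, simplified router: ports maps each port number to
// the neighbour it leads to, and only contains links that are currently up.
type node struct {
	name  string
	ports map[uint64]*node
}

// packet carries the remaining source route; an empty route means the
// packet is routed over the DHT instead.
type packet struct {
	sourceRoute []uint64
	viaDHT      bool // set once the source route has been stripped
}

// forward advances p along its source route starting at n and returns the
// node where source routing stopped. On a dead end, the source route is
// stripped and the packet is marked for DHT routing from that node onwards.
func forward(n *node, p *packet) *node {
	for len(p.sourceRoute) > 0 {
		next, ok := n.ports[p.sourceRoute[0]]
		if !ok {
			p.sourceRoute = nil // drop the source routing header
			p.viaDHT = true     // the rest of the trip uses the DHT
			return n
		}
		p.sourceRoute = p.sourceRoute[1:]
		n = next
	}
	return n
}

func main() {
	a := &node{name: "A", ports: map[uint64]*node{}}
	x := &node{name: "X", ports: map[uint64]*node{}}
	b := &node{name: "B", ports: map[uint64]*node{}}
	a.ports[1] = x
	x.ports[2] = b

	p := &packet{sourceRoute: []uint64{1, 2}}
	at := forward(a, p)
	fmt.Println(at.name, p.viaDHT) // B false

	delete(x.ports, 2) // the X-B link fails
	p = &packet{sourceRoute: []uint64{1, 2}}
	at = forward(a, p)
	fmt.Println(at.name, p.viaDHT) // X true: the DHT takes over from X
}
```

When such a packet eventually reaches B with `viaDHT` set, that is the signal that prompts B to send a fresh notify back to A.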

This gives us the best of both worlds. In a stable network, traffic is source routed along the familiar treespace routes from v0.3.X and earlier. If the path between A and B is stable, but the network is not, then the source route continues to work while the tree and DHT try to catch up to changes in network state. If the destination node is mobile, then source routed traffic gets as far as it can before falling back to the DHT, which often puts the packet close to the destination (minimizing the stretch added by the DHT fallback).

Encryption

The encryption and session logic has seen some minor changes as well. Since node IDs are now based on ed25519 keys, we no longer have permanent curve25519 keys to use for the Diffie–Hellman key exchange. Also, in older versions, we would perform one ephemeral key exchange, and then keep using the same key pair for the life of the session. The new version uses a ratcheting system, where keys are rotated after each round trip (or whenever a sender nonce overflows). This should offer better forward secrecy than the previous code, though it’s still subject to change in future versions (if we find something off-the-shelf that we’re happier with).

Conclusion

We are looking forward to releasing Yggdrasil v0.4 and are optimistic that the benefits will significantly outweigh any disruption caused at this stage. We’ve also made a number of other fixes and developed both iOS and Android apps, which we will talk more about soon.

We will be continuing to perfect the release candidates and will make announcements both in our ~~Matrix channel~~ and on the blog around the time of the release. Please stay tuned for updates!

As always, please bear in mind that Yggdrasil is not production-grade software and we ask you to continue to report problems to us on GitHub.

Release v0.3.13

2020-02-21T09:00:00+00:00

Release time!

Our last Yggdrasil release, v0.3.12, was merged a couple of months ago at the end of November. For the most part we have seen good stability with the v0.3.12 builds, not to mention good adoption (with the crawler showing over 500 nodes running it). Today we are releasing our next version, v0.3.13.

Many of our releases tend not to warrant blog post entries, especially given that the changelog documents the changes. However, there are some fairly big news points associated with this version, so this post aims to discuss them in a bit more detail.

TUN adapter changes

The first big talking point is that this is the first Yggdrasil release that departs entirely from the Water library and replaces it with the Wireguard TUN library. There are a few reasons why we decided to switch from Water to the Wireguard library, but one of the most prominent is that it gives us better TUN support across all platforms and allows us to finally remove TAP support altogether.

At a high-level, TUN interfaces are effectively emulating “Layer 3” interfaces - they deal only in IP packets - whereas TAP interfaces are emulating “Layer 2” full-fat Ethernet interfaces.

To run in TAP mode, Yggdrasil not only had to add and remove Ethernet headers for each packet, but it also had to include a full NDP implementation and track MAC addresses in order to trick the host operating system into believing that there was a real Ethernet domain on the other end of the adapter. Needless to say, the amount of boilerplate code needed to make TAP mode work correctly was significant, and much of that code was very fragile.

Although we implemented NDP, we did not ever get around to implementing ARP, which also meant that sending tunnel-routed IPv4 traffic over TAP interfaces invariably did not work either. We have now been able to remove much of this code and simplify the TUN code massively, closing the gaps between some of our supported platforms.

There is one platform that is negatively impacted by this change and that’s NetBSD. The Wireguard TUN package that we are using currently has no support for NetBSD, so we are also removing NetBSD as a supported target until the necessary code appears upstream. To our knowledge, we don’t have a base of NetBSD users anyway, but we will aim to re-add this soon.

The IfTAPMode configuration option has now been removed from Yggdrasil entirely and it will be ignored if specified. If you are using TAP mode today, then this will affect you. Please make sure to check your Yggdrasil configuration since this may result in interface naming changes and you may have to update network settings in your host operating system.

Initially we added TAP support into Yggdrasil as it was the only way to support Windows, since the OpenVPN driver that we used at the time only supported TAP mode. Thankfully, this is no longer a problem, as the Wireguard project have also released Wintun, which is supported by the Wireguard TUN library. The net result is that we gain TUN support on Windows and the performance is far better than the buggy OpenVPN driver, which is a nice segue into…

Windows installer and performance

We have spent a lot of time trying to improve the installation and setup experience on Windows. This mostly falls into two areas.

The first is that using the Wintun driver has massively improved performance, in some cases by hundreds of MB/s, and starting the Yggdrasil process is now much more reliable too - it should no longer be necessary to restart Yggdrasil due to cases of the TAP adapter not being set up or configured correctly.

The second is that we now automatically generate Windows .msi installers using Appveyor, which means that installing or upgrading Yggdrasil is now simpler than ever. It is no longer necessary to create directories, copy files and register Windows services by hand - a marked improvement!

The installer also bundles the Wintun driver and it is installed automatically if required, therefore there is no longer a need to hunt down and install the OpenVPN TAP driver separately. We hope that these changes will help to encourage adoption of Yggdrasil on Windows platforms by significantly reducing the barrier to entry.

As in the previous section, Yggdrasil on Windows has gone from supporting TAP mode only to now supporting TUN mode only. This may mean that you need to review your configuration. If you no longer need the OpenVPN TAP driver on your system, it is best to entirely uninstall it. It is also important to make sure that the IfName configuration option in your yggdrasil.conf does not specify the same name as an existing OpenVPN TAP interface or Yggdrasil may fail to start.

End of the v0.3 release cycle

Generally we try, where possible, to avoid making any changes which would break backward compatibility with previous versions. The last version that had breaking changes was v0.2.1 - over a year and a half ago. However, maintaining backward compatibility so tightly also prevents us from improving the Yggdrasil design in various ways.

Therefore, unless any serious bugs or security vulnerabilities appear, it is very likely that this version will be the last in the v0.3 release cycle. Instead, we will start working on the v0.4 release, which is likely to include a number of breaking protocol changes and will be incompatible with v0.3 releases as a result.

More information will be announced on the types of changes in v0.4 as they happen - expect to see more blog posts and chatter in the ~~Matrix channel~~ on this subject - but we will aim to give as much notice as possible before releases occur that contain breaking changes.

Final mentions

In addition to the release notes above, I’d like to relay the message that @mwarning has a proposal open for a Google Summer of Code (GSoC) project under the Freifunk umbrella, comparing a number of mesh routing protocols including Yggdrasil. More information about the proposal is available here. If you are interested, please reach out!

Acting out

2019-09-01T21:00:00+00:00

Overture

We’ve recently rewritten much of Yggdrasil’s internals to change from Go’s native communicating sequential processes (goroutine+channel) style to using an asynchronous actor model approach to concurrency. While this change should be invisible to the average user, it dramatically changes what we developers need to think about when working on the code. I thought it would be useful to explain a little about the motivation for rewriting things this way, and what the consequences are.

Caution: theatre puns and references throughout, because Actors.

Exposition

Yggdrasil is written in the Go programming language. Go makes it easy to start a function running concurrently, and gives developers the tools they need to make concurrently executing functions communicate, but it’s not always easy to use them correctly. To be clear, the things I’m about to rant about are all fixable. Working around them is a normal thing to do in Go. More importantly, it’s a case where doing things the obvious way (which is sometimes even safe in isolation) leads to wrong behavior in a larger program. I prefer models where the obvious thing is still correct, and non-obvious things are only needed as a performance optimization.

Composition

There’s a common pattern that has emerged many times in the Yggdrasil code base. We’ll have a struct with some mutable fields that need reading or updating, such as information about a particular cryptographic session, or the switch’s table of idle peers and buffered traffic. Since shared mutable state is hard, and Go is all about “Share Memory By Communicating”, we’ll have packets get passed to a dedicated worker goroutine that “owns” that particular struct. The worker uses information from the packet and the owned struct to do whatever it needs to do, updates these things accordingly, and passes the packet along to the next goroutine in the pipeline.

This often results in a “for select” pattern, where goroutines sit in an infinite for loop and select on several channels, to wait for packets to process or various types of signals from other goroutines. There are a few ways around it (with heavy use of reflect or chan interface{}, for example), but in most cases, every select statement needs to fully enumerate every behavior that the goroutine may need to engage in at that point in the code. If there’s a common set of cases that always need to be handled, and then a few exceptional cases that may or may not matter (possibly when the associated structs the workers are using are similar but not exactly the same types, or as the state of a struct’s fields changes), then that typically involves multiple select statements that differ only by the addition or modification of one or two cases.

Go embraces composition in its type system, but select statements (and channel operations in general) make execution resistant to composition.

Deadlocks

The “for select” pattern is safe, as far as I know, if the flow of messages through the program forms a directed acyclic graph. However, in our case, cycles emerge if we try to handle things in the obvious way. For example, a cryptographic session needs to somehow get outbound encrypted traffic to the switch, but incoming encrypted traffic also needs to make it from the switch to the sessions for decryption (via the router, which is responsible for, among other things, identifying which session is associated with the traffic).

When cycles of goroutines naively pass messages over channels, deadlocks are all but inevitable. There are a few ways to address this, but they’re not always appropriate. Ideally, we would change the design to remove cycles, but this is not always possible, and may require significant changes to the workflow in cases where it is possible. In practice, what we’d actually do is either buffer messages (having some dedicated reader goroutine to take the message, add it to a slice, and then pass it to the real destination ASAP) or drop messages entirely (with a select statement that aborts and does cleanup in a default case, or by having a dedicated reader that drops messages more intelligently, such as from the front of the queue, under the assumption that older messages are less useful).
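The buffering workaround can be sketched as a small relay goroutine. This is a minimal, generic illustration, not code from Yggdrasil. The function name and the use of `int` messages as a stand-in for packets are assumptions. It removes the blocking that causes the deadlock, but at the price of an unbounded queue if the reader never catches up.

```go
package main

import "fmt"

// bufferedRelay moves messages from in to out through an unbounded queue,
// so a sender on in is never held up waiting for the reader of out.
func bufferedRelay(in <-chan int, out chan<- int) {
	go func() {
		defer close(out)
		var queue []int
		for in != nil || len(queue) > 0 {
			// Only enable the send case while something is queued: a nil
			// channel is never ready in a select.
			var send chan<- int
			var next int
			if len(queue) > 0 {
				send, next = out, queue[0]
			}
			select {
			case msg, ok := <-in:
				if !ok {
					in = nil // input closed: deliver what is queued, then stop
					continue
				}
				queue = append(queue, msg)
			case send <- next:
				queue = queue[1:]
			}
		}
	}()
}

func main() {
	in, out := make(chan int), make(chan int)
	bufferedRelay(in, out)
	// All three sends complete even though nothing is reading out yet.
	for i := 1; i <= 3; i++ {
		in <- i
	}
	close(in)
	for v := range out {
		fmt.Println(v) // 1, then 2, then 3
	}
}
```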

Leaks

Typically, when a goroutine is started, it continues to run until either the function returns or the program exits. For this reason, if a goroutine executes any statements which can block (such as a channel operation), it’s important to include some case which signals that it’s time to return. Forgetting to do this can result in goroutine leaks. Never start a goroutine without knowing how it will stop, or so the experts say.

This is sometimes harder than it needs to be. To be blunt, the single producer N consumer cases are fine, you just close the channel and have all the consumers take this as a signal to exit. Anything involving multiple producers requires some sort of signaling to indicate that all producers have exited. Since you’re using a channel already, the obvious option is a select statement with another channel that closes to signal shutdown, and then something like e.g. a sync.WaitGroup to wait for all producers to exit before closing the channel. Until your number of producers needs to change at runtime, and you realize that this races if you start to Wait before Adding everything to the group, so you need to implement a custom counter, and be careful that additions and subtractions can also race and cause it to shut down early. And have fun solving it, because with how much select resists composition and code reuse, you’re going to be implementing the same patterns over, and over, and over, and over…
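For the static case, the usual shape of the multiple-producer pattern is as follows. Every `wg.Add(1)` happens before its producer starts, and a separate goroutine closes the output channel once `Wait` returns. This is a generic sketch, and `fanIn` is a made-up name. It deliberately does not handle the harder cases described above, such as producers joining at runtime or an external shutdown signal.

```go
package main

import (
	"fmt"
	"sync"
)

// fanIn starts one producer goroutine per input slice and returns a channel
// that is closed only after every producer has finished. wg.Add is called
// before each goroutine starts, so Wait cannot see a zero counter too early.
func fanIn(inputs ...[]int) <-chan int {
	out := make(chan int)
	var wg sync.WaitGroup
	for _, in := range inputs {
		wg.Add(1)
		go func(values []int) {
			defer wg.Done()
			for _, v := range values {
				out <- v
			}
		}(in)
	}
	// Closing from a separate goroutine lets the consumer start reading immediately.
	go func() {
		wg.Wait()
		close(out)
	}()
	return out
}

func main() {
	sum := 0
	for v := range fanIn([]int{1, 2, 3}, []int{10, 20}) {
		sum += v
	}
	fmt.Println(sum) // 36
}
```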

It’s not that this is some impossible problem to solve; it’s just that Go’s take on CSP, combined with the rest of the tools the language gives you, makes it easy and concise to do things the wrong way, and leads to comparatively complex and delicate code when trying to do them the right way. At least, that’s my personal view of it based on my experience so far, but it probably varies somewhat based on the problem the code is trying to solve.

Rising action

The actor model is another programming paradigm that embraces concurrency with a “share memory by communicating” philosophy.

For our purposes, an actor is basically a data type with a few special properties:

It has an inbox where messages to the actor are placed.
It has an associated unit of execution, such as a thread, which processes messages from the inbox one at a time.
Rather than exposing ordinary functions for other code to call, the actor exposes behaviors. A behavior is a function which has no return value, and is executed only for its side effects. When an actor A calls a behavior of an actor B, what really happens is that A places a message in B’s inbox, and B processes that message by executing some code.
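To make those three properties concrete, here is a deliberately naive actor in plain Go. It is not phony: it uses a mutex, has no backpressure, and `toyInbox` and `counter` are names invented for this example. It does show the key shape. Behaviors are closures with no return value. They run one at a time, in the order they were sent. A goroutine exists only while the inbox is non-empty.

```go
package main

import (
	"fmt"
	"sync"
)

// toyInbox is a simplified actor inbox for illustration only. Messages are
// closures, run one at a time and in order, by a goroutine that exists only
// while there is work queued.
type toyInbox struct {
	mu      sync.Mutex
	queue   []func()
	running bool
}

// Act enqueues a behavior and returns immediately; it never waits for it to run.
func (b *toyInbox) Act(behavior func()) {
	b.mu.Lock()
	b.queue = append(b.queue, behavior)
	start := !b.running
	b.running = true
	b.mu.Unlock()
	if start {
		go b.run()
	}
}

// run drains the inbox and then exits, so an idle actor holds no goroutine.
func (b *toyInbox) run() {
	for {
		b.mu.Lock()
		if len(b.queue) == 0 {
			b.running = false
			b.mu.Unlock()
			return
		}
		next := b.queue[0]
		b.queue = b.queue[1:]
		b.mu.Unlock()
		next()
	}
}

// counter is an actor: its state is only touched from inside its own behaviors.
type counter struct {
	toyInbox
	total int
}

// Add is a behavior: no return value, executed only for its side effects.
func (c *counter) Add(n int) {
	c.Act(func() { c.total += n })
}

func main() {
	c := new(counter)
	for i := 1; i <= 100; i++ {
		c.Add(i)
	}
	// Read the result from inside the actor, after all earlier messages.
	done := make(chan int)
	c.Act(func() { done <- c.total })
	fmt.Println(<-done) // 5050
}
```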

Different implementations differ on details after that, such as what order messages are processed in, if actors are allowed to wait for a particular type of message before continuing, whether actors run locally or are distributed across a cluster, etc., but they tend to all include some version of the broad strokes above.

Turing point

I’m particularly fond of the pony programming language’s take on the actor model. I really can’t say enough nice things about their approach, and fully describing it is beyond the scope of this blog post, but if you come out of here with an interest in the actor model, then I highly recommend checking out that language. Maybe watch a few of the talks from the developers that have been posted to YouTube, or read their papers about what is easily the most promising approach to garbage collection I’ve ever come across.

Anyway, I don’t actually work on anything written in pony, but I like their version of the actor model so much that I decided to see if I could trick Go’s runtime into faking it. The result is phony, which manages to do most of what I want in under 70 lines of code. When we write code using this asynchronous message passing style, instead of ordinary goroutines+channels, the implications are pretty significant:

There are no deadlocks. Message sends always succeed, and are quite fast (it doesn’t even require CAS instructions in the normal case).
Inbox sizes stay small due to backpressure: if the sender sees that the receiver’s inbox has too many pending messages, it will schedule itself to stop at some deadlock-free safe point in the future, to wait until the receiver signals that it’s handled the message.
Actors are shockingly lightweight: on a modern 64-bit processor, an idle Actor’s only resources are 24 bytes for an empty Inbox, some of which is padding that may not apply if embedded into a struct. In particular, an idle Actor with an empty Inbox has no associated goroutine, so it requires no stack.
The lack of a goroutine also means that idle Actors, even cycles of Actors, can be garbage collected automatically.
Any struct that embeds an Inbox satisfies the Actor interface. Since Actors encapsulate their own unit of execution, it means the range of behaviors that unit of execution can engage in are encoded into the type system and can even be abstracted through interface types. In my opinion, the resulting code is cleaner, easier to read and understand, and far easier to reuse or extend than the for select pattern from goroutine+channel use.

Falling action

I’m happy enough with the current state of phony that I decided to start migrating the yggdrasil-go code base to use it. This is still work in progress (there are some non-Actor goroutines around the edges of the code, mostly in main Accept loops and that sort of thing), but the hot paths are now Actor based.

Most of this was done in a weekend and came together with surprisingly little pain. I had exactly 2 crashes the entire time (1 accidental nil pointer dereference and 1 legitimate bug I needed to fix in phony), and more importantly, 0 deadlocks. Most things just worked as intended the first time they compiled. There were a few bugs to work out when I was rewriting the link code, but nothing compared to the mess I had to deal with when writing the old code (which was a couple of horrifying interdependent for select loops to build a state machine).

So by now you’re probably wondering what any of this looks like in practice. Just to give a generic example, suppose we have some struct with an exported function that needs to run code on a worker goroutine. We could end up with something like the following when writing Go in the CSP style:


// This is the function we want the worker to run.
func (n *NonActorStruct) theFunction(arg1 Type1, arg2 Type2) {
    // this is where the code we actually care about goes, the rest is basically boilerplate
}

// This is the struct that we want the worker to own and manipulate.
type NonActorStruct struct {
    inputForTheFunction chan argsForTheFunction
    // fields we care about, plus maybe more channels for other things
}

// Needed to initialize the channel to a working state
func NewNonActorStruct() *NonActorStruct {
    n := NonActorStruct{
        inputForTheFunction: make(chan argsForTheFunction),
    }
    return &n
}

// This is just a helper struct to carry arguments for the function.
type argsForTheFunction struct {
    Arg1 Type1
    Arg2 Type2
}

// This is the function we export.
func (n *NonActorStruct) RunTheFunction(arg1 Type1, arg2 Type2) {
    n.inputForTheFunction <- argsForTheFunction{arg1, arg2}
}

// This is needed to start the worker, otherwise things block.
func (n *NonActorStruct) Start() {
    go func() {
        for {
            select {
            // cases for other things we may need to do would also be here
            // presumably at least one is involved in safely shutting down
            case args := <-n.inputForTheFunction:
                // We could possibly have a switch statement here
                // Then switch on the arg type to pick which function to run
                n.theFunction(args.Arg1, args.Arg2)
            }
        }
    }()
}

// This is needed to stop the worker when we're done.
func (n *NonActorStruct) Stop() {
    // Actual implementation depends on what else the worker does in its loop,
    // but it probably just sends a specific message and/or closes some channel.
}

// Then to use the code, we have something like:
myStruct := NewNonActorStruct()
myStruct.Start()
defer myStruct.Stop() // Or arrange this to happen somewhere else
myStruct.RunTheFunction(arg1, arg2)

When migrating to the actor model, the basic pattern that emerged was to embed a phony.Inbox into any struct we wanted to make into a phony.Actor, and then define functions of the struct like so:


import "github.com/Arceliar/phony"

// This is the function we want the worker to run.
func (a *ActorStruct) theFunction(arg1 Type1, arg2 Type2) {
    // this is where the code we actually care about goes, the rest is basically boilerplate
}

// This is the struct that we want the worker to own and manipulate.
type ActorStruct struct {
    phony.Inbox // This defines the Act function, satisfying the Actor interface
    // fields we care about
}

// This is the function we export.
func (a *ActorStruct) RunTheFunction(from phony.Actor, arg1 Type1, arg2 Type2) {
    a.Act(from, func() {
        a.theFunction(arg1, arg2)
    })
}

// And then to use it, an Actor x would run something like:
myActor := new(ActorStruct)
myActor.RunTheFunction(x, arg1, arg2)

And that’s about it. The first argument to myActor.RunTheFunction is also nilable: non-Actor code that needs to send a message can pass nil, which just means there’s no backpressure to slow down the non-Actor code if it’s sending messages faster than the Actor can handle them. A phony.Block function exists to help non-Actors wait for an Actor to process a message before continuing, since this seems like a common enough use case (especially when a package wants to export a non-Actor interface that uses Actor code internally).

What’s great is that we don’t need to think about starting or stopping workers, deadlocks and leaks are not possible outside of blocking operations (e.g. I/O), and we can add or reuse behaviors just as easily as any function. I find the code easier to read and reason about too.

I/O is one rough spot, since an Actor can block on a Read or a Write and not process incoming messages as a result. This isn’t really any worse than working with normal Go code, and the pattern we’ve adopted is to have separate Actors for Read and Write, where one mostly just sits in a Read loop and sends the results (and/or error) somewhere whenever a Read finishes. These two workers can be children of some parent Actor, which is the only one the rest of the code needs to know about, and then all we need to remember to do is close the ReadWriteCloser (e.g. socket) at some point when we’re done. This is the sort of thing that we’ll eventually want to write a standard struct for, update our code everywhere to use it, and then never have to think about it again. In the meantime, we have a couple of very similar implementations for working with sockets or the tun/tap device.

Dénouement

The Go language makes concurrency easy, but for some problems it can be difficult to do safely out-of-the-box. However, the language provides the tools needed to implement an actor model approach very easily. While I won’t claim that the actor model is a panacea for all development woes, Yggdrasil by its very nature requires us to think about networks of nodes communicating asynchronously, so it makes sense to use a programming paradigm that lets us model that approach more explicitly in our code base. Outside of a couple of corner cases (namely blocking I/O for the network sockets and the tun/tap device), we expect this to obviate any need to even think about deadlocks, make development easier moving forward, and generally lead to a better user experience as a result. The code migration is still a work in progress, but Actors have replaced for select workers along the hot paths through the code (minus 1 crypto worker pool in the session code) and will slowly replace synchronization primitives in the remaining code base. The current code has been merged into our develop branch, and I’m quite excited to see it land in Yggdrasil v0.3.9, along with the usual bug fixes and incremental improvements, which we plan to release in the near future.

Meshing using Apple Wireless Direct Link (AWDL)

2019-08-19T08:00:00+00:00

Wireless without borders

I was mostly prompted to write this post in response to a Hacker News thread recently, which announced the release of an open-source AirDrop implementation called OpenDrop, from the same team at Seemoo Lab who produced an open-source implementation of Apple Wireless Direct Link (AWDL) protocol called OWL. AWDL is the secret sauce behind AirDrop, peer-to-peer AirPlay and some other Apple wireless technologies. Even though everything covered in this post was done some time ago, I have never spent the time to document it.

With a few exceptions, most wireless networks in the world operate in “infrastructure mode” which is where a wireless access point serves one or more wireless clients. Think of your Wi-Fi at home, at work or in a coffee shop. However, as implied by the name, reliable and usable infrastructure Wi-Fi is often only available in certain physical locations with “good infrastructure”. If you wanted to connect some devices together anywhere not served by an infrastructure Wi-Fi network, or in a location where you can’t suddenly plug in a wireless access point, you may not have many options (Bluetooth aside).

AWDL is designed to avoid this problem by extending the 802.11 wireless standard to allow client devices to communicate directly with each other, without the help of the central wireless access point. You can walk out into a field with a couple of iPhones or Macs and they can use AWDL to discover each other and exchange data, peer-to-peer. Even better is that nearby devices that are connected to different infrastructure Wi-Fi networks can still communicate with each other using AWDL!

The science

Normally, when connected to a wireless access point, wireless clients remain locked to the specific radio channel that the AP is using. AWDL works by instructing the wireless adapter in the device to “hop” between channels so that it can not only remain connected to the wireless access point, but can also listen to other nearby devices.

Devices announce their presence and information about their services on a “social channel” for other devices to hear, effectively creating peer-to-peer service discovery. Once two devices have decided that they want to communicate directly, they agree to jump to another channel for real data exchange so that they don’t interrupt existing Wi-Fi networks or, indeed, the social channel. These “hops” between wireless channels happen so quickly that there’s very little disruption to what the user is doing with their Wi-Fi connection already (except for some minor wireless performance degradation - to be covered later).

A number of papers have been published by the OWLink team on the inner workings of the AWDL protocol, which can be found here. In particular, this paper from Mobicom 2018 contains a significant amount of detail about the AWDL protocol itself, channel hopping techniques and security considerations, amongst other things.

Mesh opportunities

Yggdrasil is designed to create a mesh network automatically out of interconnected nodes - the idea being that all nodes can route to all other nodes on the mesh network by routing through other nodes.

Today, many of these connections happen between nodes across the Internet, since the community is still relatively small and geographically dispersed. A node joining the Yggdrasil network needs to only peer with a single device that is already connected to the wider network in order to participate in the fully-routable mesh.

However, it’s not the goal of Yggdrasil to remain something that we just toy with over the Internet. We want to build a protocol that can scale globally and work ad-hoc, even in places where infrastructure might not be particularly strong otherwise. We think that one of Yggdrasil’s greatest strengths is that it is very close to zero-configuration, beyond giving it a very small number of configuration options, and it should scale well too in principle.

Yggdrasil can already discover potential peers on the same network segment by using multicast service discovery, which sounds a lot like what AWDL does on the social channel. You can configure which interfaces Yggdrasil beacons on with the MulticastInterfaces configuration directive.

I wanted to know if we could blend the two so that Yggdrasil could automatically discover other nearby devices and initiate peering connections with them using AWDL.

Getting started

Macs are a good target for developing and testing AWDL-aware applications as AWDL is exposed to userspace through a network adapter called awdl0. It sits there with a link-local IPv6 address, you can run tcpdump or Wireshark on it to listen to AWDL traffic and you can even ping multicast group addresses on the interface and get responses from other nearby devices, e.g. using ping6 ff02::1%awdl0! However, Apple devices don’t always keep AWDL alive and listening all of the time.

On macOS, the AWDL driver is only woken up when either AirDrop is being actively used in Finder, or where a NetService has been created (usually through Objective-C or Swift) which requests peer-to-peer networking. AWDL is normally kept alive long enough to satisfy connectivity for these sessions and then will be sent back to sleep after a period of idleness.

On iOS, the story is somewhat similar to above, except that AWDL is often woken up as soon as the device is unlocked if AirDrop is enabled. The NetService API otherwise functions the same way.

tvOS is the outlier in that it seems to wake up and listen to AWDL randomly, even when the device is otherwise asleep, presumably because it is advertising the ability to receive incoming AirPlay sessions to nearby devices.

From a user perspective, the awdl0 interface looks entirely unremarkable. It behaves largely like any other ethernet interface, carrying regular IPv6 traffic. In the background it’s a bit more complicated, as the AWDL driver performs traffic filtering for security reasons, namely, to stop someone sat next to you in the airport from browsing your file shares. Regular listening sockets won’t accept connections over AWDL unless a specific socket option was configured on the socket before it started listening.

Multicast traffic, however, does largely get passed through the filter untouched. Bingo.

Waking up AWDL

The NetService API is effectively a wrapper around multicast DNS-SD, which in Apple’s colourful language, is affectionately known as Bonjour. The API has the added benefit of being able to tell the operating system to wake up the AWDL driver pretty much on demand on behalf of “peer-to-peer” services.

So all we would need to do to wake up AWDL is to call the NetService API, publish a service that requests peer-to-peer functionality and let the operating system do the hard work for us. Yggdrasil, being written in Go, didn’t have any concept of NetService but thankfully we were able to use Cgo to do this instead.

We wrote a Cgo function which calls the NetService API and advertises our new fake service, _yggdrasil._tcp, which causes the operating system to wake up the AWDL driver. Amazingly this worked.

Yggdrasil doesn’t actually use DNS-SD - we currently use a custom-formatted multicast beacon on a different multicast group. It is planned to eventually migrate to something more standard, like DNS-SD, for service discovery. However, in this instance, registering a fake DNS-SD service was just enough to wake up AWDL.

Peering automatically

Once the driver is active, the regular Yggdrasil multicast beacons on the ff02::114 multicast group address seem to be passed through to the driver normally and the Yggdrasil nodes running on each machine start to hear each other’s calls.

The only thing that remained to be done was to configure the sockets with the aforementioned socket option to allow them to communicate over the AWDL interface. This socket option is called SO_RECV_ANYIF and is defined in sys/socket.h on Darwin as 0x1104.

We configure the socket option on our TCP peering socket:

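// SO_RECV_ANYIF is 0x1104 in Darwin's sys/socket.h; it lets the socket
// accept connections arriving on restricted interfaces such as awdl0.
const soRecvAnyIf = 0x1104

err = unix.SetsockoptInt(int(fd), unix.SOL_SOCKET, soRecvAnyIf, 1)
if err != nil {
	// handle the error
}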

Now that the Yggdrasil nodes can hear each other’s advertisements over the awdl0 interface, the regular automatic peering process kicks in and a TCP session is opened between the two devices, creating a peering. The net result? AWDL peerings!

$ sudo yggdrasilctl getSwitchPeers
   bytes_recvd   bytes_sent  coords       endpoint                         ip                                      port  proto
1  244278        313907      [3 5 5 2 1]  fe80::xxxx:xxxx:xxxx:xxxx%awdl0  xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx  1     tcp

To further cement the experiment, we can disconnect the two devices from the network they share, or connect them to different Wi-Fi networks, and the peering over the awdl0 interface still continues to function!

An iperf3 test over Yggdrasil using the new AWDL link looks fairly good - the devices are sat next to each other:

[ ID] Interval           Transfer     Bandwidth
[  5]   0.00-1.00   sec  15.4 MBytes   129 Mbits/sec
[  5]   1.00-2.00   sec  16.9 MBytes   141 Mbits/sec
[  5]   2.00-3.00   sec  15.9 MBytes   133 Mbits/sec
[  5]   3.00-4.00   sec  17.6 MBytes   147 Mbits/sec
[  5]   4.00-5.00   sec  16.8 MBytes   141 Mbits/sec
[  5]   5.00-6.00   sec  16.2 MBytes   136 Mbits/sec
[  5]   6.00-7.00   sec  12.5 MBytes   105 Mbits/sec
[  5]   7.00-8.00   sec  12.7 MBytes   106 Mbits/sec
[  5]   8.00-9.00   sec  14.9 MBytes   125 Mbits/sec
[  5]   9.00-10.00  sec  13.5 MBytes   113 Mbits/sec

Observations and iOS

As the iperf3 test above shows, the link performance is actually quite good! It routinely exceeds 100 Mbit/s, although this is between only two devices. I have not been able to test this with Yggdrasil nodes running over AWDL in any particular density due to only having a limited number of Macs to hand.

One thing that I did notice though is that, while AWDL is active, my wireless connection to my home Wi-Fi network does reduce in speed somewhat. This is to be expected, given that the wireless chipset is hopping between channels rather than spending all of its time on a single channel.

Sadly we weren’t able to reproduce this test using iOS TestFlight builds of Yggdrasil. On iOS, we implement Yggdrasil as a VPN service which is subject to a number of probably reasonable restrictions imposed by the OS, which presumably exist to stop VPN extensions from spying on you.

We were able to create a NetService from within the VPN extension and the service beacons were advertised as expected. However, we weren’t able to initiate any other kind of connections over the awdl0 interface. After a chat with an engineer at Apple, it turns out that the awdl0 interface isn’t scoped for use within a VPN extension, thus squashing our hopes and dreams of being able to sprinkle this kind of magic onto our iOS port of Yggdrasil. We have a feature request radar open with Apple in the hope that they may be able to change this restriction in the future.

But we were able to get this to work on macOS and that, itself, is quite awesome.

Conclusion

Yggdrasil doesn’t enable AWDL by default because of the reduction in wireless performance that AWDL being active can cause. Therefore, to enable AWDL peering, you must add the awdl0 interface specifically into the MulticastInterfaces configuration option in yggdrasil.conf. However, we do have working support for connecting Macs together and meshing automatically using AWDL, and you can enable it very easily if you wish to experiment!
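As a rough illustration of opting in, the Go sketch below models how a list of interface patterns might select the interfaces that beacon. It assumes, for illustration only, that each MulticastInterfaces entry is a regular expression matched against interface names, and that the macOS defaults look something like `en.*` and `bridge.*`; the function `matchingInterfaces` is hypothetical and not part of Yggdrasil.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchingInterfaces returns the interface names that match at least one of
// the regular expressions in patterns (a hypothetical model of how the
// MulticastInterfaces option selects interfaces).
func matchingInterfaces(patterns, ifaces []string) ([]string, error) {
	var res []*regexp.Regexp
	for _, p := range patterns {
		re, err := regexp.Compile(p)
		if err != nil {
			return nil, fmt.Errorf("bad pattern %q: %w", p, err)
		}
		res = append(res, re)
	}
	var matched []string
	for _, name := range ifaces {
		for _, re := range res {
			if re.MatchString(name) {
				matched = append(matched, name)
				break
			}
		}
	}
	return matched, nil
}

func main() {
	ifaces := []string{"lo0", "en0", "awdl0", "bridge0"}
	// Assumed macOS-style defaults, which leave AWDL out.
	withoutAWDL, _ := matchingInterfaces([]string{"en.*", "bridge.*"}, ifaces)
	// Listing awdl0 explicitly opts in to AWDL peering.
	withAWDL, _ := matchingInterfaces([]string{"en.*", "bridge.*", "awdl0"}, ifaces)
	fmt.Println(withoutAWDL) // [en0 bridge0]
	fmt.Println(withAWDL)    // [en0 awdl0 bridge0]
}
```

Because Go's `MatchString` is unanchored, a pattern such as `awdl0` would also match any interface whose name merely contains that string; `^awdl0$` is the stricter form.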

We’d love to hear if you are peering Yggdrasil nodes using AWDL, or have performed any more extensive tests of how it performs in real-world scenarios - join us on our ~~Matrix channel~~.

Version 0.3.6

2019-08-03T08:00:00+00:00

New release!

It’s been nearly five months since we released version 0.3.5 of Yggdrasil. In that time we’ve seen the node count rise to over 400 nodes on the public network at times (over 80% of which are running the latest released version) and we’ve gained valuable insight into the kinds of challenges that our users have. We’ve worked to fix a number of bugs and to improve Yggdrasil.

In terms of lines of code changed, version 0.3.6 is the biggest release of Yggdrasil to date, with several thousand lines of code affected. It represents a massive refactoring exercise in which we’ve broken up and modularised the code, dividing core Yggdrasil functionality, TUN/TAP, admin socket and multicast features into their own respective Go packages.

Fixes

Most of the user-facing changes in this release are fairly minimal, but some bugs have been corrected. A complete list is available in the changelog.

Highlights include peers now being added correctly even when one or more configured peers are unavailable or unreachable. Multicast interfaces are also being evaluated more frequently now, which can help if an interface becomes available or goes down after Yggdrasil has already started.

A number of bugs have been fixed in the TUN/TAP and IP-specific code, including problems that affected ICMPv6 and Neighbour Discovery in TAP mode specifically. This helps reliability on platforms where TAP mode is used more commonly, e.g. on BSD platforms or on Windows, although it improves TAP support on Linux too.

Refactoring and API

Around the previous release, it became obvious to us that our codebase was turning into a monolith. We had pretty much all of the necessary behaviour in a single yggdrasil package to run a single node, but this made our codebase inflexible and difficult to maintain and extend. It also meant that Yggdrasil was virtually impossible to integrate into other applications.

Our refactoring efforts in version 0.3.6 mean that our codebase is now easier to manage and to understand. It also includes the first taste of our API! The API makes it possible to take the Yggdrasil core, drop it into your own Go application and use the Yggdrasil network as a fully end-to-end encrypted and distributed transport layer. We’ve also moved all of the IP-specific code into the TUN/TAP module, which means that Yggdrasil’s core now provides a completely protocol-agnostic transport.

Documentation on how to use the API to integrate Yggdrasil into your own applications will follow soon—watch this space! In the meantime, godoc can be used to examine our new API functions.

Please note though that API functions are not yet finalised and may be subject to change in future versions. Yggdrasil is still alpha-grade software at this point so all of the usual warnings apply.

Platform Support

We enjoy great support from our community in bringing and packaging Yggdrasil on new platforms. Since the release of version 0.3.5, the following third-party packages have cropped up, and we are very grateful to the maintainers:

A new RPM build for Red Hat, Fedora, CentOS etc.
An AUR package for Arch Linux
A Void package for Void Linux
A MacPorts package for macOS

We expect that any third-party packages which have not yet been updated for v0.3.6 will be updated soon!

We are aware of a few outstanding issues with Windows, which are largely related to one or two bugs in the Water library which we use for TUN/TAP support. We are hoping to address these problems with the maintainer of this library soon. Using Yggdrasil in router-only mode does work as expected, but some bugs when using the TAP adapter still remain. In the meantime, we’d certainly welcome any assistance in maintaining the Windows port of Yggdrasil.

The iOS build has been largely neglected due to API changes, although hopefully a new TestFlight build for version 0.3.6 will be available before too long.

Upgrading

We recommend that all Yggdrasil users always run the latest version of the code wherever possible, so please upgrade as soon as it is convenient. New downloads are available from our Builds page and Neil’s S3 repositories are up-to-date for Debian and EdgeRouter installs.

If you have installed through a package manager, you should be able to upgrade in-place as soon as the new packages are available. On macOS, you can simply install the new .pkg from the builds page over the top of the old one. On Windows, and on any installation where the binary was installed by hand, you can simply replace the yggdrasil and yggdrasilctl binaries with the newly released builds.

Building from source is simple if you have Git and Go 1.11 or later installed:

git clone https://github.com/yggdrasil-network/yggdrasil-go
cd yggdrasil-go
./build

Feedback

We always welcome feedback, so please do feel free to join us either in our Matrix channel or on IRC in #yggdrasil on Freenode. You can also raise bug reports and issues in our GitHub repository.

Practical peering

2019-03-25T04:00:00+00:00

How many peers do I need, and which ones?

Perhaps the most common questions we receive are about peering. If you’re not familiar with how Yggdrasil works, or even if you are but you haven’t tested things carefully, then it’s sometimes easy to do things which seem like they should work right, but lead to higher latency and lower bandwidth for you or nodes that depend on you in the network. This post is meant to explain what happens when the wrong peers are selected, and what you can do to avoid it.

The problem

When building a physical network, the cost of adding a link between two nodes, as well as the benefits that having that link would give, play a role in deciding which nodes are ultimately linked. That cost often correlates with the cost of using the link – long links are more expensive to create and have higher latency than short links of the same type, for example, and there’s no point in adding a link to another node unless it’s worth the cost. Yggdrasil is designed to work well on the kinds of networks we see in the real world, and makes implicit assumptions that rely on long links being both more expensive to create and more expensive to use.

However, when peering Yggdrasil nodes over the internet, the performance difference between two links can be dramatic, but the cost of creating them is always the same: adding a link over the internet to a node costs the same whether it’s 1 km away or 1000 km away. As a result, it’s easy to add links over the internet which would make no sense if deploying dedicated infrastructure, and which can violate some of Yggdrasil’s assumptions. This can lead to worse performance for not only the two linked nodes, but other nodes in their area.

Rules of thumb

In an effort to clarify how nodes should connect to public peers, and how public peers should connect to each other, I think it’s helpful if we establish some rules of thumb:

When deciding whether to connect to another node, you should only connect to the ones that are “good enough” to be worth the effort. Here, “good enough” means that they have (approximately) at least as much bandwidth as your own. A fast node shouldn’t decide to connect to a slow node; instead, the slow node should decide if it wants to connect to the fast one.
When connecting to nodes, start with the “closest” (lowest latency) nodes, subject to the above constraint, and work your way out. Try not to skip over (equal or better) nodes if there’s no reason to.

While this may not be the only way to fix the problem, following these rules of thumb should approximate the kinds of constraints that real networks need to deal with. Nodes tend to connect to whoever is closest, and better nodes tend to skip over worse ones to establish a long range “backbone” connection between remote points.

In addition, the number of peers you want to add depends on what you want to do. If you only want to connect to the network, then 1 (better connected) peer is technically enough, but this acts as a single point of failure. Two to four peers adds some redundancy, but keep in mind that you may end up routing traffic between these peers if that ends up being the best route they can find. If your goal is to set up a public peer that can route traffic for the network, and you have enough bandwidth to spare, then keep adding peers. Generally speaking, an asymmetric home internet connection shouldn’t try to route traffic. And, wherever possible, replace internet links with real connections over directional wifi or similar – to avoid having multiple peers share bandwidth over a shared link.

What happens when things go wrong

Let’s imagine we have some nodes in New York, and initially they follow the peering rules outlined above. Now suppose that two of these nodes decide that they want to add connections to London. In Yggdrasil, nodes tend to select parents that minimize latency to the root, which happens to be a node in Paris at the time I’m writing this. As a result, both of the NY nodes are likely to select their respective London peers as their parents. If the nodes are following the peering rules, then at least one of them has also decided to peer with the other, so they have a shortcut they can use to talk to each other (or any descendants in the tree).

However, if they ignore the peering rules and don’t peer with each other, then they are likely to route through London instead of communicating over their local mesh network. A shorter path exists, through their local mesh network, but it’s not one that the network must know about for routing to work, so they won’t necessarily know about it. As a result, the latency between these two nodes (or descendants thereof) will likely be an order of magnitude more than it needs to be (and probably lower bandwidth as well).

Conclusion

Yggdrasil was designed with scalability in mind, and to that end, it makes some assumptions about how nodes in the network are connected to avoid communicating unnecessary information. Peering over the internet allows you to violate these assumptions. When this happens, it’s possible for network performance to suffer unintended consequences when adding new links. If you prioritize adding new links the same way as you would when building physical links, you can expect lower latency and, in many cases, higher bandwidth, compared to adding peers at random.

History of the World Tree, Part I

2019-01-09T05:00:00+00:00

How did Yggdrasil get started?

On a few occasions I’ve been asked about how Yggdrasil was started, or what motivated certain things about the design. I’ve talked about the motivation and technical details in other blog posts, but I haven’t talked about the history before, so I thought it was about time.

B.A.T.M.A.N. begins

The first time I can recall hearing about mesh networks, as a concept, was some time in late 2010 or early 2011, when B.A.T.M.A.N. reached the mainline Linux kernel. I liked the idea, but since I obsess over how things scale, I was worried about the network’s ability to cope with an internet-like number of users. In B.A.T.M.A.N., as in most other protocols, nodes must either rely on some externally configured (and coordinated) subnetting, or else every node in a network must know about every other node in the network. In particular, each node periodically sends a broadcast packet through the network, which allows the rest of the network to find a path back to the originating node. Other approaches, such as AODV, only search for routes when they’re needed, but the same ~O(n) cost applies for each node in a network with n nodes. At a certain point, particularly in a shared medium wireless network, the cost of protocol traffic can become larger than the resources available to the network, and so the network no longer has room to route any traffic for the user.

CJDNS

I came across cjdns in the summer of 2012. The thing about cjdns that caught my attention was how it used a Distributed Hash Table to allow each node to look up a path to any other node, instead of relying on broadcast traffic. The idea being, if you can use a DHT instead of broadcast traffic, then you can just throw the whole network into one large subnet, with “flat” identifiers (IP addresses) that have nothing to do with the position of a node in the network. Then, since you still need some way to assign addresses, you can derive them from a hash of a node’s public encryption key. That simultaneously addresses the protocol overhead issue, address assignment, and lets you do end-to-end encryption without depending on public key infrastructure.
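As a simplified illustration of deriving a flat address from a key, the Go sketch below hashes a public key with SHA-512 and packs the start of the digest behind a fixed prefix byte. The 0xfd prefix, the truncation and the name `addrFromKey` are assumptions for illustration only; this is not the exact scheme used by cjdns or Yggdrasil.

```go
package main

import (
	"crypto/sha512"
	"fmt"
	"net"
)

// addrFromKey derives a flat IPv6 identifier from a public key. The address
// says nothing about where the node sits in the network; it only depends on
// the key. Prefix byte and truncation are illustrative assumptions.
func addrFromKey(pub []byte) net.IP {
	sum := sha512.Sum512(pub)
	ip := make(net.IP, net.IPv6len)
	ip[0] = 0xfd                            // hypothetical fixed prefix
	copy(ip[1:], sum[:net.IPv6len-1])       // first 15 bytes of the digest
	return ip
}

func main() {
	key := []byte("example public key bytes") // placeholder, not a real key
	fmt.Println(addrFromKey(key))
}
```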

What could go wrong? Well, the best short example I can give is to imagine that Alice wants to deliver a package to Carol, and they live in a world without maps or addresses, and where you can’t rely on directions like “go North by any route until you reach X”, so everyone needs to memorize any roads or routes that they care about. Alice doesn’t know where Carol lives, but she knows where Bob lives, and she has reason to believe that Bob knows where Carol lives. So, Alice visits Bob and asks for directions to Carol. Bob tells Alice how to get from Bob’s house to Carol’s house, and Alice memorizes this. Now, any time Alice wants to deliver a package to Carol, she travels from her house to Bob’s house, and then from Bob’s house to Carol’s house. If anyone asks Alice for a path to Carol, she will give them the path from herself to Carol, including the unnecessary detour past Bob. If someone knows enough about the layout of the streets to recognize the detours, or otherwise knows that there’s a shorter path between two points somewhere on the route, then they could improve upon this path, but in general this doesn’t happen, because nobody knows enough about the layout of things to see the big picture of where everything is.

That’s basically how cjdns routing worked before supernodes were introduced. Supernodes keep a (centralized) view of the full network, and then other nodes can ask a supernode (instead of doing DHT lookups) for a path. Ignoring any technical complaints I may have about that approach, it sidesteps the problem I’m interested in solving, so I stopped actively contributing to cjdns once the decision was made to go that route, and started looking for other ways to solve the routing problems cjdns had faced.

Just like the simulations

By around the middle of 2015, I had thrown together a basic skeleton of a network simulator in python, so I could compare the paths that different routing schemes find to the shortest paths through the same networks. Having studied up on the latest and greatest academic works at the time, I had initially been thinking that something resembling Thorup and Zwick’s universal compact routing scheme made the most sense, but I had issues finding a way to implement that securely as a distributed algorithm running on a dynamic network.

To make a long story short, I ultimately took the most inspiration from Robert Kleinberg’s approach, which is to use a greedy embedding. Here’s the thing: the Kleinberg approach grows a spanning tree of a (static) network, and embeds the tree in the hyperbolic plane, then proves that this embedding is always greedy (meaning, if you just forward to the point in the metric space closest to the destination, you’ll never hit a dead end). The only real difference is that Yggdrasil doesn’t bother to embed the tree in the hyperbolic plane. Instead, each node remembers the path from the root to itself, and we use these paths to calculate distance apart on the tree. This saves us the trouble of embedding, and we’d need to know the per-hop tree information anyway to securely build the tree, so this saves us some complexity.

By using a DHT to look up who we want to talk to (specified by an IPv6 “address”, which is a flat identifier / hash of a key, as in cjdns), we can learn where they are on the spanning tree. Then, when a node needs to forward a packet, it checks the tree location of each of its peers and forwards to whichever one is closest to the destination (give or take a few caveats about congestion control). This is explained in more detail in earlier blog posts, if you’re not familiar with how Yggdrasil routes and care to read more.
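To make the tree distance and greedy forwarding steps concrete, here is a minimal Go sketch. It assumes each node's tree location is the list of port numbers on its path from the root, measures distance as hops up to the deepest common ancestor and back down, and ignores congestion control entirely; `Coords`, `treeDist` and `nextHop` are hypothetical names, not Yggdrasil's code.

```go
package main

import "fmt"

// Coords is a node's path from the root of the spanning tree, as a list of
// port numbers (an assumed representation for this sketch).
type Coords []uint64

// treeDist counts the hops between two nodes on the tree: up from a to the
// deepest common ancestor, then down to b.
func treeDist(a, b Coords) int {
	common := 0
	for common < len(a) && common < len(b) && a[common] == b[common] {
		common++
	}
	return len(a) + len(b) - 2*common
}

// nextHop returns the index of the peer whose tree location is closest to
// dest, or -1 if no peer is strictly closer than self. If the peers include
// our tree parent and children, -1 only happens at the destination itself.
func nextHop(self Coords, peers []Coords, dest Coords) int {
	best, bestDist := -1, treeDist(self, dest)
	for i, p := range peers {
		if d := treeDist(p, dest); d < bestDist {
			best, bestDist = i, d
		}
	}
	return best
}

func main() {
	self := Coords{3, 5}
	peers := []Coords{
		{3},       // our parent
		{3, 5, 7}, // one of our children
		{3, 8, 2}, // an off-tree peer in another branch
	}
	dest := Coords{3, 8, 2, 4}
	fmt.Println(treeDist(self, dest))       // 4 hops if we stayed on the tree
	fmt.Println(nextHop(self, peers, dest)) // 2: the off-tree shortcut
}
```

The off-tree peer wins even though it is neither our parent nor a child, which is the shortcut behaviour described above. On a pure tree, the parent or a child is always exactly one hop closer, which is why greedy forwarding never hits a dead end.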

In our package delivery example, imagine if the streets in Alice’s town were laid out in a grid, and then named and numbered systematically by blocks, with street signs to label where any off-grid bypasses go. Alice and friends still haven’t bought maps, but they know each other’s addresses instead. So, if Alice wants to contact Carol, she first travels to Bob’s house and asks him for Carol’s address. Now, when she wants to deliver a package to Carol, she can simply follow the block structure of the town until she arrives on Carol’s block, and she has the option to take any bypass she happens to come across if it brings her closer to Carol’s place. That’s basically how routing on the tree, or taking an off-tree shortcut, work in Yggdrasil’s greedy routing scheme, except with a tree instead of a grid (which, in addition to working everywhere, seems to work well in the places we care about).

I had most of the important parts of this working, in simulations, by mid September of 2015. Initially, I also included off-tree distance-vector like routes to nodes where the on-tree path would be too long, but I abandoned this once I saw that it added relatively little (except protocol overhead) for the kinds of networks that tend to show up in practice, including some internet topology maps from CAIDA and DIMES. In particular, it seems to work well any time the network diameter is small and the number of triangles in the network is large, since the former limits the worst case scenario paths that the network can use, and the latter adds many opportunities for off-tree shortcuts.

Going public

Having (mostly) finished simulation tests by about spring of 2016, I sat on the idea for a while, trying to work up the motivation to do anything with it. I eventually sat down one weekend and worked through gobyexample. The language seemed fast enough for a reasonable prototype, easy enough to learn/read that other people could pick it up quickly if they want to contribute, and generally made multithreading/multiprocessing bearable for me. Since I wanted to continue playing with the language, and I’d been meaning to implement my routing scheme for a while, I ultimately resolved to rewrite my sim in Go, refactor the important parts into the library, and then add the missing pieces to make it more-or-less a cjdns clone with different routing. Most of the work happened over a couple of long weekends, and I released the first working prototype on GitHub just before the end of 2017.

Changes since then are mostly documented in the git log, GitHub issues and pull requests, and discussions in our public matrix channel. Neil joined and started adding support for other platforms, and we started to roll out public nodes and attract more users. As of writing, a year or so after the first public release, there are around 130-140 nodes in the network, depending on the time of day, with maybe half of them having joined in the last few months.

Announcing Yggdrasil Network v0.3

2018-12-12T00:00:00+00:00

It’s finally here

At the end of 2017, Yggdrasil’s first commit was uploaded to GitHub - a project to explore whether it was possible to build a decentralised, end-to-end encrypted and scalable compact routing scheme modelled around the concept of a global spanning tree. Many concept routing schemes that we have seen to date seem to have problems with scalability - after the network exceeds a certain size, they either fail to perform or they start to rely on centralised points in order to consolidate routing information. We want to figure out how to build something that would not be subject to these limitations, and to maintain decentralisation as far as possible, and the best way to test our ideas is to build that network. To our knowledge, this hasn’t quite been achieved before.

Throughout the course of 2018, Yggdrasil has gone from being a very early-stage project supporting only a single platform to a feature-strong and relatively stable project which now runs on many supported platforms. Although we currently still haven’t advanced from the “alpha” label, our network has grown to exceed 70 nodes across the world (and growing slowly but steadily), with a good portion of these users coming on-board and contributing their own services to the network and using the network for their own purposes. We’ve even had a small amount of publicity - Toronto Mesh have been exploring using Yggdrasil on their city-wide mesh net, and even presented some Yggdrasil fundamentals to the Norwegian Unix User Group (NUUG) back in October.

So far, we believe that Yggdrasil is well on track to delivering on its promises to build a fully end-to-end encrypted, self-arranging IPv6 network. We also believe that Yggdrasil should be scalable on paper; we have somewhat proven this in simulations, but the real proof will come in how the Yggdrasil Network scales up in the real world, on real hardware, across real links. Having users helping us to test brings us closer to our goal and enhances our understanding of how our software will behave on large-scale network graphs.

Version 0.3 has been quite some time coming - we released version 0.2.7 on the 13th October and we have been working since then on what will make it into this release. Even though it feels in some ways that version 0.3 is a relatively small evolutionary release, it’s actually by far our biggest release yet. We’ve included quite a large list of fixes, changes and even new features, with over 2000 lines of code changed. We’ve taken a lot of feedback from our users about their use-cases and pain points, and we’ve collected topological data from various contributor nodes to try and get a good view of what the network looks like. We’ve even experienced some rather large topology changes and enjoyed relatively good network stability throughout.

For much of the time that we were developing v0.3, we had thought that there would end up being protocol-breaking changes and that this would render v0.3 incompatible with nodes running previous versions. I am happy to announce that we have not needed to introduce breaking changes at this stage and currently the network has been running a mix of both older and newer developmental nodes without any particular issues.

Features

You can see the full list of modifications that have been made in our changelog.

Perhaps the largest user-visible change is the introduction of Crypto-Key Routing for traffic tunnelling, allowing you to effectively use Yggdrasil as a VPN for both IPv4 and IPv6 traffic between any two given points on the network. This tunnelled traffic enjoys the same benefits as regular Yggdrasil IPv6 traffic in that it is end-to-end encrypted and our many optimisations assist in preventing TCP-over-TCP anomalies that often arise in other solutions. I wrote an introductory blog post back at the beginning of November about CKR, which explains some more about how to configure it and how it works.

In the background, we’ve made a substantial change from using a Kademlia-based DHT to a Chord-based DHT. The Chord-based approach allows us to do lookups with O(1) (constant) state, and only depends on additional O(log n) state as a performance optimisation, which allows us to bootstrap more quickly after changes. We also believe that using Chord can help us to reduce some idle DHT chatter on the network in the future, which will save a little bandwidth, and may be helpful on battery-powered devices.
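To illustrate why a Chord-style lookup can get by with constant state, here is a minimal Go sketch of greedy forwarding on an identifier ring. The 64-bit identifiers and the `ringDist` and `nextLookupHop` names are simplifying assumptions for illustration; real DHT identifiers are much wider, and this is not Yggdrasil's DHT code.

```go
package main

import "fmt"

// ringDist is the clockwise distance from a to b on a 2^64 identifier ring;
// unsigned subtraction wraps around, which gives the modular distance.
func ringDist(a, b uint64) uint64 { return b - a }

// nextLookupHop picks, among the nodes we know about, the one closest to
// target without passing it. Knowing only our successor is enough for the
// lookup to make progress one hop at a time (constant state); extra known
// nodes just shorten the walk. If no known node lies between self and
// target, it returns self: greedy forwarding can go no further.
func nextLookupHop(self uint64, known []uint64, target uint64) uint64 {
	best := self
	for _, n := range known {
		if ringDist(self, n) <= ringDist(self, target) &&
			ringDist(n, target) < ringDist(best, target) {
			best = n
		}
	}
	return best
}

func main() {
	known := []uint64{20, 60, 1 << 63}            // successor plus two extra nodes
	fmt.Println(nextLookupHop(10, known, 70))     // 60
	fmt.Println(nextLookupHop(10, known[:1], 70)) // 20: successor only
}
```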

The spanning tree is now constructed a little differently. Previously, in a stable network, each node would select a new parent only if this reduced the length of the path to the root of the tree, measured by the number of other Yggdrasil nodes in the path. This has the virtue of simplicity, but it sometimes leads to poor performance when a node replaces a few low-latency/high-bandwidth local links with a comparatively high-latency/low-bandwidth link over the internet (or an anonymous overlay like Tor or I2P). Starting with this release, nodes will switch to a new parent if it provides a consistently lower-latency path to the root, and they are less eager to immediately switch again after having just changed parents. This should lead to lower latency in stable networks, and better reliability in unstable ones.

We’ve fixed a reasonable number of bugs and crashes, including in the DHT, switch and ICMPv6 code, and have made a number of additions to the admin socket in order to support new functionality and to make parameter naming more consistent throughout.

Upgrading

Our CI pipeline automatically produces builds for all supported platforms and these will become available on our Builds page. In addition, our S3 repository for Debian and RPM-based distributions will also be updated with the new package releases.

New macOS .pkg installers are now available as a part of the v0.3 release too, so installing and upgrading on macOS is now significantly easier than before. You can find these installers on the Builds page also.

On other platforms, simply download the latest binary for your platform and drop it into place. Remember to take a backup of your configuration and normalise it, which will add any new options for features in v0.3:

cp /path/to/yggdrasil.conf /var/backups/yggdrasil.conf
yggdrasil -useconffile /var/backups/yggdrasil.conf -normaliseconf > /path/to/yggdrasil.conf

What’s next?

Our work is far from over. We still have a list of things that can potentially be rolled into future releases and we will be looking to see what we should prioritise for our next version.

A big thanks to our contributors, particularly those who have worked on creating packages for Yggdrasil and bringing it to their distributions of choice, and to all of the users who use Yggdrasil, contributing services and providing feedback to us on a regular basis!