FFMUC: Half a year with WireGuard
VXLAN + B.A.T.M.A.N. and some python included
FFWCW 2021
2
Who am I?
awlnx
● Annika Wickert
● Senior Network Engineer / OpenSource since 2010
● Twitter @awlnx / GitHub @awlx
3
FFMUC?
• Freie Netze München e.V. since 2014
• Community Freifunk München since 2004
• Wifi
• #FFMEET
• DoH/DoT/DNSCrypt/DNS
• Streaming
4
FFMUC ran on fastd
• FFMUC was built with fastd and B.A.T.M.A.N.
• We got bigger compute nodes and bigger uplinks - we wanted to leverage the
resources
• We didn’t want to change too much at once => not too much risk
• So why not change _only_ the transport network and keep B.A.T.M.A.N.?
5
WireGuard vs fastd
• fastd is a single-threaded userspace process
• WireGuard runs in kernel space and can spread its work across multiple CPU cores
• WireGuard is a Layer 3 tunnel and cannot transport Layer 2 protocols - B.A.T.M.A.N. is one of them
• We need another encapsulation which solves this problem => VXLAN
[Diagram: B.A.T.M.A.N. runs over VXLAN, which runs over WireGuard]
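Each layer in this stack costs MTU. The sketch below walks the arithmetic for an IPv6 underlay. The overhead figures are commonly cited values and not numbers from the talk, and the 32 bytes for B.A.T.M.A.N. in particular are an assumption.

```python
# Assumed per-layer overhead in bytes (IPv6 underlay); not taken from the talk.
WIREGUARD_OVERHEAD = 40 + 8 + 32   # outer IPv6 + UDP + WireGuard data header and auth tag
VXLAN_OVERHEAD = 40 + 8 + 8 + 14   # IPv6 + UDP + VXLAN header + inner Ethernet header
BATMAN_ADV_OVERHEAD = 32           # headroom assumed for batman-adv encapsulation


def stack_mtus(uplink_mtu: int) -> dict[str, int]:
    """Return the MTU left at each layer, from the outside in."""
    wireguard = uplink_mtu - WIREGUARD_OVERHEAD
    vxlan = wireguard - VXLAN_OVERHEAD
    client = vxlan - BATMAN_ADV_OVERHEAD
    return {"wireguard": wireguard, "vxlan": vxlan, "client": client}


print(stack_mtus(1500))
```

With a 1500 byte uplink this leaves 1420 bytes for WireGuard, 1350 for VXLAN and 1318 for client traffic under these assumptions.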
6
What does it look like in the end?
7
Challenges we already knew
• No systemd-networkd support for B.A.T.M.A.N.
• We are an open network - we don’t want node owners to sign up
• WireGuard only talks to peers whose public keys are already configured
=> we need a daemon which receives incoming keys and programs them
on the gateways
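A daemon that accepts keys from unknown nodes should at least check that what it receives looks like a WireGuard public key. A minimal sketch, assuming keys arrive as base64 strings; the function name is hypothetical and this is not the wgkex code.

```python
import base64


def is_valid_wireguard_public_key(key: str) -> bool:
    """A WireGuard public key is 32 bytes, transported as 44 base64 characters."""
    try:
        raw = base64.b64decode(key, validate=True)
    except ValueError:  # binascii.Error is a subclass of ValueError
        return False
    return len(raw) == 32
```

Anything that fails this check can be dropped before it gets anywhere near the gateway configuration.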
8
WGKex!
9
How does it work?
• WireGuard peers on the gateways are created by wgkex
• The allowed IP is derived from the public key of the node
• VXLAN forwarding database entries are created by wgkex
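The three steps above can be sketched as follows. The hash, the MAC construction and the interface names `wg-example` and `vx-example` are assumptions for illustration; the real wgkex may derive the address differently and programs the kernel directly. The sketch only builds the equivalent `wg` and `bridge` command lines and does not run them.

```python
import hashlib
import ipaddress


def link_local_from_pubkey(public_key: str) -> ipaddress.IPv6Address:
    """Derive a stable link-local address from a public key (illustrative scheme)."""
    digest = hashlib.md5(public_key.encode("ascii")).digest()
    mac = bytes([0x02]) + digest[:5]  # locally administered MAC
    # EUI-64: flip the universal/local bit and insert ff:fe in the middle.
    eui64 = bytes([mac[0] ^ 0x02]) + mac[1:3] + b"\xff\xfe" + mac[3:]
    return ipaddress.IPv6Address(b"\xfe\x80" + bytes(6) + eui64)


def gateway_commands(
    public_key: str, wg_if: str = "wg-example", vx_if: str = "vx-example"
) -> list[list[str]]:
    """Build the commands that add the peer and the VXLAN FDB entry."""
    addr = link_local_from_pubkey(public_key)
    return [
        # Allow exactly the derived address for this peer.
        ["wg", "set", wg_if, "peer", public_key, "allowed-ips", f"{addr}/128"],
        # Send flooded VXLAN traffic to the peer through the WireGuard interface.
        ["bridge", "fdb", "append", "00:00:00:00:00:00", "dev", vx_if,
         "dst", str(addr), "via", wg_if],
    ]
```

Because the address is a pure function of the key, node and gateway can compute it independently and no address assignment has to be stored anywhere.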
10
Get in touch with maintainers
• To get things like data validation right for wgkex etc.
• We contacted WireGuard maintainers early in the process
• Asked questions about known scaling issues
• Opened PRs early as drafts to see if there is a chance of merging
• systemd-networkd https://github.com/systemd/systemd/pull/17252
• gluon-community-packages
https://github.com/freifunk-gluon/community-packages/pull/6
11
Solve problems upstream!
• We invested a lot of time in systemd-networkd
• We wanted to get our changes merged upstream
• No custom solutions for our setup, only upstream-compatible ones, which saves a lot of
resources in the future
12
Gateways
• Everything is automated with Saltstack
• systemd-networkd takes care of all interfaces
• 800 - 1000 nodes per gateway are easy to handle
• We are able to run the whole of FFMUC on just two gateways
13
Debugging … Flamegraphs and Bugs
• WireGuard performs well but we have too much load on our gateways. Why?
14
Upstream fixes!
• B.A.T.M.A.N.
■ https://patchwork.open-mesh.org/project/b.a.t.m.a.n./patch/20201126153120.1053700-1-sven@narfation.org/
■ https://patchwork.open-mesh.org/project/b.a.t.m.a.n./patch/20201127173849.19208-4-sw@simonwunderlich.de/
■ https://patchwork.open-mesh.org/project/b.a.t.m.a.n./patch/20201127173849.19208-2-sw@simonwunderlich.de/
• VXLAN
■ https://patchwork.open-mesh.org/project/b.a.t.m.a.n./patch/20201126125247.1047977-1-sven@narfation.org/
15
Keep your NTP sync!
• Sync NTP before you try to connect to WireGuard
• If you don’t do that, many funky things happen
• OpenWrt defaults its clock to the build date of the firmware, so it works for the first few
days after release … because that’s close enough
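One likely reason for the funky behaviour: a WireGuard handshake initiation carries a timestamp, and the responder ignores initiations whose timestamp is not newer than the newest one it has already seen from that peer. The toy model below illustrates only that rule. The class and method names are hypothetical and it is not WireGuard code.

```python
from dataclasses import dataclass, field


@dataclass
class Responder:
    """Toy model of the per-peer timestamp check on the gateway side."""
    latest: dict = field(default_factory=dict)

    def accept_initiation(self, peer: str, timestamp: float) -> bool:
        last = self.latest.get(peer)
        if last is not None and timestamp <= last:
            return False  # looks like a replay: not newer than what we saw
        self.latest[peer] = timestamp
        return True


gateway = Responder()
print(gateway.accept_initiation("node", 1000.0))  # clock was synced
print(gateway.accept_initiation("node", 500.0))   # rebooted, clock fell back to build date
```

In this model the second handshake is refused until the node's clock has moved past the old value again, which is why the node should sync NTP before it connects.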
16
Not enough randomness during boot
• The ERX didn’t have a good enough random seed …
• After flashing, it’s unreachable for … hours … days … maybe weeks?
=> fixed
https://github.com/oszilloskop/UBNT_ERX_Gluon_Factory-Image/issues/3
17
So is it faster?
18
Lessons learned
• Contribute as much as possible upstream
• Work closely with upstream
• Get as much feedback as you can from all the communities/other people
• Involve as many people as you can
• Start your project anyway ;)
19
What’s next?
• We want to get rid of B.A.T.M.A.N. for gateway uplinks (make broadcast
domains small)
■ Should boost performance by 5x to 7x depending on CPU
■ Maybe VXLAN first, then a fully routed approach
■ https://github.com/freifunkMUC/site-ffm/issues/87
20
Community
• Freifunk Darmstadt and Freifunk Regensburg helped a lot during development
of wgkex!
• B.A.T.M.A.N. developers helped a lot with debugging the performance issue
and created many bugfixes
• Everything is open source and available on GitHub
https://github.com/freifunkMUC
• More background and all fixes:
https://ffmuc.net/freifunkmuc/2020/12/03/wireguard-firmware/
21
Thanks to everyone involved
• Freifunk Darmstadt @hexa
• Freifunk Regensburg @MoepMan
• Freifunk Hannover @aiyion, @Codefetch
• systemd Yu Watanabe, Lennart Poettering
• WireGuard Jason A. Donenfeld
• B.A.T.M.A.N. @ecsv @T_X
• All the folks of FFMUC for testing
• Everyone else who was involved in any way and whom we forgot
=> Community rocks! #Together #OpenSource
