Konrad Scherer
TUESDAY, 2 MAY 2017

Docker Multi Host Networking

Introduction

I recently did a presentation at work covering the basics of getting Docker containers on different hosts to talk to one another. I was motivated by wanting to understand all the strange networking options available and why Kubernetes chose the one network per pod model as the default.

Docker networking breaks many of the current assumptions about networking. A modern server can easily run 100+ containers and a datacenter rack can hold 80+ servers. If the networking model is one IP per container, that implies 100+ IPs per machine and thousands per rack. Ephemeral containers with short lifespans mean that the network has to react quickly.
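
As a rough back-of-the-envelope check of those numbers, here is a small Python sketch. The container and server counts are the round figures from the paragraph above, not measurements, and the prefix calculation assumes a single flat IPv4 subnet per rack, which is only one possible design.

```python
# Rough address-count estimate for a one-IP-per-container model.
# The counts are the round numbers from the text, not measurements.
containers_per_server = 100
servers_per_rack = 80

ips_per_server = containers_per_server  # one IP per container
ips_per_rack = ips_per_server * servers_per_rack

# Smallest IPv4 prefix that holds the rack, assuming one flat subnet
# (an assumption) plus the network and broadcast addresses.
needed_addresses = ips_per_rack + 2
host_bits = (needed_addresses - 1).bit_length()
rack_prefix = 32 - host_bits

print(f"IPs per server: {ips_per_server}")
print(f"IPs per rack:   {ips_per_rack}")
print(f"Smallest flat subnet per rack: /{rack_prefix}")
```

Even at these modest densities a single rack already needs more addresses than a /20 can provide.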

Of course there are competing container networking standards: CNM (libnetwork from Docker) and CNI (CoreOS and Kubernetes). Beyond the supported network models in Docker there is also a docker network plugin ecosystem with various vendors providing special integration with their gear.

Bridge Mode

Let’s start simple with the default bridge mode. Docker creates a Linux bridge and a veth pair per container. By default containers can access the external network, but the external network cannot access the containers. This is the safe default. To allow external access to a container, host ports are forwarded to container ports. iptables rules can be used to prevent inter-container communication. This functionality works with older kernels.

Bridge mode example

> docker run --detach --publish 1234:1234 ubuntu:16.04 sleep infinity

# docker0 is the bridge, veth is connected to the docker0 bridge
> ip addr
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
    inet 128.224.56.107/24 brd 128.224.56.255 scope global eth0
3: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
    inet 172.17.0.1/16 scope global docker0
8: vethea44ea7@if7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master docker0 state UP

# iptables rules for packet forwarding
> iptables -L
Chain FORWARD (policy DROP)
target     prot opt source               destination
DOCKER     all  --  anywhere             anywhere

Chain DOCKER (1 references)
target     prot opt source               destination
ACCEPT     tcp  --  anywhere             172.17.0.2           tcp dpt:1234

# docker-proxy program forwards traffic from host port 1234 to container port 1234
> pgrep -af proxy
30676 /usr/bin/docker-proxy -proto tcp -host-ip 0.0.0.0 -host-port 1234 \
    -container-ip 172.17.0.2 -container-port 1234

Bridge Mode Limitations

  • The container IP is hidden and cannot be used for service discovery
  • Host ports become a limiting resource
  • Service discovery must have host IP and port
  • Port forwarding has a performance cost
  • Does not scale well
  • Application must support non-standard port numbers
  • Large scale solutions involve load balancers + service discovery

Overlay

The overlay network feature uses VXLAN to create a private network. It is part of Docker swarm mode. Each group of containers (Pod) has a dedicated network, which is the Kubernetes network model. It does not require any modification of the underlay network, i.e. the network that the hosts are using. The Docker swarm integration is very well done and many of the details are nicely abstracted away.

Benefits include:

  • Applications can use standard ports
  • Simplified service discovery can use DNS

Overlay Example - Create Swarm

manager> docker swarm init --advertise-addr 128.224.56.106
Swarm initialized: current node (tqxsn8ytpdq8ntd4sswl6qxjo) is now a manager.

To add a worker to this swarm, run the following command:

docker swarm join \
--token SWMTKN-1-0f49cat8w4xm29qndjza1u294i2 128.224.56.106:2377

worker1> docker swarm join ...
This node joined a swarm as a worker.

manager> docker node ls
ID                           HOSTNAME         STATUS  AVAILABILITY  MANAGER STATUS
qshqkznzaty8ggbyiodzb9jy9    worker2          Ready   Active
r0fj6inhs1tsin07mdsxoaiam    worker1          Ready   Active
tqxsn8ytpdq8ntd4sswl6qxjo *  manager          Ready   Active        Leader

Swarm Network and Load Balancing

I found this great example in the Nginx example repository:

Docker Swarm Load Balancing

Docker Swarm has built-in DNS, scheduling and load balancing! In the following example, A is Service1 and B is Service2, which is not externally accessible.

Overlay Example - Create Service

> docker network create --driver overlay demo_net
> docker service create --name service1 --replicas=3 \
        --network demo_net -p 8111:80 service1
> docker service create --name service2 --replicas=3 --network demo_net service2

Containers spread across three machines

> docker service ps service1
5fl2xpzbka28  service1.1  service1  worker1  Running
u3f4bd8q3p6d  service1.2  service1  worker2  Running
i85jdtgtinxr  service1.3  service1  manager  Running
> docker service ps service2
b5bzfdqw10y2  service2.1  service2  worker1  Running
k39m6utcq56o  service2.2  service2  worker2  Running
uaftc3ax0k17  service2.3  service2  manager  Running

Overlay Example - Load Balancing

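Service 1 contacts Service 2 using internal DNS. Swarm load-balances requests across the replicas: by default through a virtual IP per service, with round-robin DNS available via --endpoint-mode dnsrr.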

manager> curl -s http://worker1:8111/service1.php | grep address
service1 address: 10.255.0.9
service2 address: 10.0.0.5
manager> curl -s http://worker2:8111/service1.php | grep address
service1 address: 10.255.0.8
service2 address: 10.0.0.4
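
The rotation behind round-robin load balancing can be sketched in a few lines of Python. This is only an illustration of the idea, not how Swarm implements it; the address list is an assumption (10.0.0.4 and 10.0.0.5 appear in the output above, 10.0.0.6 is made up for a third replica).

```python
from itertools import cycle

# Hypothetical replica addresses: the first two appear in the output
# above, the third is invented for illustration.
replicas = ["10.0.0.4", "10.0.0.5", "10.0.0.6"]


def round_robin(addresses):
    """Return an iterator that yields addresses in order, wrapping around forever."""
    return cycle(addresses)


picker = round_robin(replicas)
first_six = [next(picker) for _ in range(6)]
print(first_six)
```

Each replica is handed out once before any replica is handed out a second time.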

Overlay Limitations

  • VXLAN MTU and UDP complications
  • VXLAN adds latency (10-20%) and reduces throughput (50-75%)
  • Debugging VXLAN problems is difficult
  • Docker swarm hides all the setup and routing complexity
  • Some network vendors provide VXLAN integration
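
The MTU complication comes from the encapsulation headers that VXLAN adds to every packet. A minimal Python sketch of the arithmetic, assuming an IPv4 underlay with no extra options or VLAN tags (the function name is illustrative):

```python
# Headers that count against the underlay IP MTU when an inner Ethernet
# frame is carried in VXLAN over IPv4 (sizes in bytes).
OUTER_IPV4 = 20
OUTER_UDP = 8
VXLAN_HEADER = 8
INNER_ETHERNET = 14

VXLAN_OVERHEAD = OUTER_IPV4 + OUTER_UDP + VXLAN_HEADER + INNER_ETHERNET


def inner_mtu(underlay_mtu: int) -> int:
    """Largest inner IP packet that fits without fragmenting the outer packet."""
    return underlay_mtu - VXLAN_OVERHEAD


print(f"VXLAN overhead: {VXLAN_OVERHEAD} bytes")
print(f"Inner MTU on a 1500-byte underlay: {inner_mtu(1500)}")
print(f"Inner MTU on a 9000-byte underlay: {inner_mtu(9000)}")
```

If the containers assume a 1500-byte MTU on a 1500-byte underlay, full-size packets no longer fit once encapsulated, which is where many hard-to-debug VXLAN problems start.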

Macvlan

  • Linux Networking driver feature
  • Low performance overhead
  • MAC and IP per container, similar to VM
  • MacVlan does not use VLANs!
  • Recently moved out of Docker experimental

Macvlan Example

host1> docker network create --driver macvlan --subnet 128.224.56.0/24 \
    --gateway 128.224.56.1 -o parent=eth0 mv1
host2> docker network create --driver macvlan --subnet 128.224.56.0/24 \
    --gateway 128.224.56.1 -o parent=eth0 mv1

Choose unused IPs

host1> docker run -it --rm --net=mv1 --ip=128.224.56.119 alpine /bin/sh
host2> docker run -it --rm --net=mv1 --ip=128.224.56.120 alpine /bin/sh
/ # ping 128.224.56.119
PING 128.224.56.119 (128.224.56.119): 56 data bytes
64 bytes from 128.224.56.119: seq=0 ttl=64 time=0.782 ms

Imagine a /16 subnet where each host has a /24 for container IPs.
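
A quick way to picture that layout is Python's standard ipaddress module. The 10.10.0.0/16 range and the host names below are assumptions for illustration only; they are not the network used in the example above.

```python
import ipaddress

# Hypothetical /16 reserved for containers; the range is an assumption.
container_space = ipaddress.ip_network("10.10.0.0/16")
hosts = ["host1", "host2", "host3"]

# Hand each host the next free /24 out of the /16.
per_host = dict(zip(hosts, container_space.subnets(new_prefix=24)))

for name, subnet in per_host.items():
    usable = subnet.num_addresses - 2  # minus network and broadcast addresses
    print(f"{name}: {subnet} ({usable} usable addresses)")

# How many hosts the /16 can serve at one /24 each.
total_subnets = 2 ** (24 - container_space.prefixlen)
print(f"/24 blocks available: {total_subnets}")
```

Each host gets a little over 250 container addresses, and the /16 has room for 256 such hosts.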

Macvlan Limitations

  • Subnet and gateway must match host network
  • Requires new kernels: 4.2+
  • Requires IPAM and network cooperation
  • Isolation requires VLANs and/or firewalls
  • Limited to one broadcast domain
  • Too many MACs can overflow the NIC buffer
  • Docker can allocate IPs in a given range
  • IPVlan L2 mode is very similar

IPVlan L3 Mode

  • Linux Networking driver feature
  • Low performance overhead
  • Multicast and broadcast traffic silently dropped
  • Mimics Internet architecture of aggregated L3 domains
  • Scales well due to no broadcast domain
  • Docker experimental as of 1.13

IPVlan Example

Create network (requires dockerd run with --experimental)

host1> docker network create --driver ipvlan --subnet 192.168.120.0/24 \
    -o parent=eth0 -o ipvlan_mode=l3 iv1
host2> docker network create --driver ipvlan --subnet 192.168.121.0/24 \
    -o parent=eth0 -o ipvlan_mode=l3 iv1

Setup routes: host1=128.224.56.106, host2=128.224.56.107

host1> ip route add 192.168.121.0/24 via 128.224.56.107
host2> ip route add 192.168.120.0/24 via 128.224.56.106
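
With more than two hosts, every host needs a route to every other host's container subnet, so the commands are worth generating. A minimal Python sketch that prints the same `ip route add` commands as above; the dictionary layout and the function name are illustrative assumptions, and the addresses are the ones from this example.

```python
# Host address and container subnet per host, taken from the example above.
hosts = {
    "host1": {"address": "128.224.56.106", "subnet": "192.168.120.0/24"},
    "host2": {"address": "128.224.56.107", "subnet": "192.168.121.0/24"},
}


def route_commands(hosts):
    """Return, per host, the `ip route add` commands for every other host's subnet."""
    commands = {}
    for name in hosts:
        commands[name] = [
            f"ip route add {peer['subnet']} via {peer['address']}"
            for peer_name, peer in hosts.items()
            if peer_name != name
        ]
    return commands


routes = route_commands(hosts)
for name, cmds in routes.items():
    for cmd in cmds:
        print(f"{name}> {cmd}")
```

The number of routes grows with the square of the number of hosts, which is why larger setups hand this job to a routing protocol.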

Create containers

host1> docker run -it --rm --net=iv1 --ip=192.168.120.10 alpine /bin/sh
host2> docker run -it --rm --net=iv1 --ip=192.168.121.10 alpine /bin/sh
/ # ping 192.168.120.10
PING 192.168.120.10 (192.168.120.10): 56 data bytes
64 bytes from 192.168.120.10: seq=0 ttl=64 time=0.408 ms

IPVlan Limitations

  • Currently experimental
  • Requires new kernels: 4.2+
  • Isolation requires VLANs and/or iptables
  • Manage routes using BGP with Calico, Cumulus, etc.
  • Container networking becomes a routing problem, which is a well understood problem
  • Policies using BPF on veth and Cilium

Conclusion

  • Docker Multi-Host Networking is complicated!
  • Performance and Scale dictate solution
  • Balance between simplifying applications and infrastructure



MONDAY, 1 MAY 2017

Book Review: Sapiens

“Sapiens: A Brief History of Humankind” by Yuval Noah Harari

It is hard to do such a dense and well written book justice in a short blog post. I really enjoyed the content and writing style.

Much of the content overlaps with books like “Guns, Germs and Steel” and “The Third Chimpanzee” by Jared Diamond. The mass extinctions and genocides directly attributed to our ancestors are covered. The book is very careful to draw clear boundaries around the limits of our historical knowledge.

The first concept that really got me thinking was Culture as shared myth or fiction. Getting large groups of humans to live together requires mechanisms to limit anti-social behaviour, but violence and surveillance do not scale well. Shared fictions like the hierarchy of royalty over common people can be much more effective at regulating behavior. The clearest example from the book is the concept of a corporation. It exists only because people accept that it exists. It does not exist because a few people scribbled on some paper, although the ritual can be important. A corporation is technically just a group of people. What binds them together is an imagined construct of hierarchy, rules, values and an identity which is accepted as real by potentially millions of people.

One of my favorite lines of the book:

Yet it is an iron rule of history that every imagined hierarchy
disavows its fictional origins and claims to be natural and
inevitable.

Every culture from the Greeks to modern democracy to Communist Russia made the same claim of being natural and inevitable. The book even takes on imagined hierarchies such as race and gender and dismantles their proponents' arguments. Money is another convenient shared fiction that many people claim as inevitable. It also makes the excellent point that our current society places rich above poor and this is no more natural than placing men above women or whites above blacks. It makes the current discussions of wealth inequality even more urgent.

There is so much thought-provoking material in this book that I cannot cover it all. The last section talks about the future of our species: changing our genetics, becoming cyborgs and creating an intelligence more capable than our own. Each of these paths has mind-boggling possibilities. The final line of the book sums it up very well:

Since we might soon be able to engineer our desires too, the real
question facing us is not "What do we want to become?" but "What
do we want to want?" Those who are not spooked by this question
probably haven't given it enough thought.

It is a book that changed the way I see and think about the world. That is the highest praise for a book that I can think of.

Rating: Highly recommended




THURSDAY, 20 APRIL 2017

Book Review: Ego is the Enemy

“Ego is the Enemy” by Ryan Holiday

The central message of this book is not new. Ego has always been a double-edged sword. It motivates and energizes, but it also undermines us in many ways. Fundamentally all worthwhile progress involves more than one person and ego undermines human relationships. This book was a fantastic reminder of all the ways ego can undermine our relationships with other people and progress on our goals. The most enjoyable part was all the examples of famous and less famous people and how they succeeded by controlling ego or failed due to their ego.

The single line that resonated with me the most was “We choose to be or to do”. We either choose to expend our energy projecting an image of who we want to be or expend our energy doing the work. Do the work because it is important, not because we expect to be rewarded or acknowledged.

Thoughts provoked by this book:

The meaning of work is often a matter of perspective. A piece of code can be both “just a hack” and a valuable contribution to the world of open source software. Still, I don’t want to make a small contribution seem more important than it really is.

Sometimes the work feels meaningful and sometimes I have to remind myself to change my perspective. Sometimes I think a different job would be more meaningful, but that ignores all the drudgery that is part of any job. I feel most motivated when I feel part of something much bigger than myself. For me it has always been the mythical “community” of open source software. Time to buckle down and do the work.

Rating: Recommended
