Sunday 23 August 2020

Linux LVS with docker containers

I came across Linux LVS while reading about load balancing and realized it is an old Linux feature that I had never tested.
LVS (Linux Virtual Server) is a mechanism implemented in the Linux kernel to do layer-4 (transport layer) switching, mainly to distribute load across real servers for high availability purposes.
LVS is managed using a command-line tool: ipvsadm.
I used Docker containers running Apache 2.4 to test the concept on a VirtualBox VM.
First, I needed to install Docker and create Docker volumes to store the Apache configuration and the content outside the containers:

root@fingolfin:~# docker volume create httpdcfg1
httpdcfg1
root@fingolfin:~# docker volume create httpdcfg2
httpdcfg2
root@fingolfin:~# docker volume create www2
www2
root@fingolfin:~# docker volume create www1
www1
root@fingolfin:~#

Once the volumes are created, we can start the containers:

root@fingolfin:~# docker run -d --name httpd1 -p 8180:80 -v www1:/usr/local/apache2/htdocs/ -v httpdcfg1:/usr/local/apache2/conf/ httpd:2.4
a6e11431a228498b8fc412dfcee6b0fc682ce241e79527fdf33e7ceb1945e54a
root@fingolfin:~#
root@fingolfin:~# docker run -d --name httpd2 -p 8280:80 -v www2:/usr/local/apache2/htdocs/ -v httpdcfg2:/usr/local/apache2/conf/ httpd:2.4
b40e29e187b0841d81b345ca975cd867bcce587be8b3f79e43a2ec0d1087aba8
root@fingolfin:~#

root@fingolfin:~# docker container ps
CONTAINER ID        IMAGE               COMMAND              CREATED             STATUS              PORTS                  NAMES
a665073770ec        httpd:2.4           "httpd-foreground"   41 minutes ago      Up 41 minutes       0.0.0.0:8280->80/tcp   httpd2
d1dc596f68a6        httpd:2.4           "httpd-foreground"   54 minutes ago      Up 45 minutes       0.0.0.0:8180->80/tcp   httpd1
root@fingolfin:~#

Then we need to change the index.html on the www1 and www2 volumes so that each node serves distinguishable content.
This can be done by editing the file directly at the Docker volume mount point:

root@fingolfin:~# docker volume inspect www1
[
    {
        "CreatedAt": "2020-08-23T16:59:50+02:00",
        "Driver": "local",
        "Labels": {},
        "Mountpoint": "/var/lib/docker/volumes/www1/_data",
        "Name": "www1",
        "Options": {},
        "Scope": "local"
    }
]
root@fingolfin:~# cd /var/lib/docker/volumes/www1/_data/
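
From there, each node's index.html can be edited in place. A minimal sketch, assuming the same "It works! node N" text that shows up in the curl test further down:

# write distinguishable content into each node's document root (the volume mount points)
echo 'It works! node 1' > /var/lib/docker/volumes/www1/_data/index.html
echo 'It works! node 2' > /var/lib/docker/volumes/www2/_data/index.html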

The next step is to obtain the container IP addresses using docker container inspect:

root@fingolfin:~# docker container inspect httpd1|grep '"IPAddress":'
            "IPAddress": "172.17.0.2",
                    "IPAddress": "172.17.0.2",
root@fingolfin:~#
root@fingolfin:~# docker container inspect httpd2|grep '"IPAddress":'
            "IPAddress": "172.17.0.3",
                    "IPAddress": "172.17.0.3",
root@fingolfin:~#

One more thing: we need to install ping inside the containers, just to make sure we can troubleshoot the network in case things don't work.
To do this we use docker exec:
root@fingolfin:~# docker exec -it httpd1 bash
root@a6e11431a228:/usr/local/apache2#
root@a6e11431a228:/usr/local/apache2# apt update; apt install iputils-ping -y
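
The same can be done for httpd2 without an interactive shell; a sketch:

# install ping in the second container in one shot
docker exec httpd2 bash -c 'apt update && apt install -y iputils-ping'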

Next we will create a sub-interface IP address, or as people call it a VIP (virtual IP), on the main Ethernet device of the Docker host. I will use an address from the 10.0.2.0/24 network, but it should be equally fine to use any other IP address.

root@fingolfin:~# ifconfig enp0s3:1 10.0.2.200 netmask 255.255.255.0 broadcast 10.0.2.255
root@fingolfin:~# ip addr show dev enp0s3
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 08:00:27:7e:6e:97 brd ff:ff:ff:ff:ff:ff
    inet 10.0.2.15/24 brd 10.0.2.255 scope global dynamic noprefixroute enp0s3
       valid_lft 74725sec preferred_lft 74725sec
    inet 172.17.60.200/16 brd 172.17.255.255 scope global enp0s3:0
       valid_lft forever preferred_lft forever
    inet 10.0.2.200/24 brd 10.0.2.255 scope global secondary enp0s3:1
       valid_lft forever preferred_lft forever
    inet6 fe80::a790:c580:9be5:55ef/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
root@fingolfin:~#
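
The ifconfig alias syntax still works but is considered legacy; the same VIP can be added with the ip tool. A sketch:

# iproute2 equivalent, labelling the address so it shows up as enp0s3:1
ip addr add 10.0.2.200/24 brd 10.0.2.255 dev enp0s3 label enp0s3:1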

We can then ping the host's IP, 10.0.2.15, from within the Docker container to make sure the container can reach it:

root@a6e11431a228:/usr/local/apache2# ping 10.0.2.15
PING 10.0.2.15 (10.0.2.15) 56(84) bytes of data.
64 bytes from 10.0.2.15: icmp_seq=1 ttl=64 time=0.114 ms
64 bytes from 10.0.2.15: icmp_seq=2 ttl=64 time=0.127 ms
^C
--- 10.0.2.15 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 17ms
rtt min/avg/max/mdev = 0.114/0.120/0.127/0.012 ms
root@a6e11431a228:/usr/local/apache2#
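
Note that ipvsadm is not installed by default on most distributions, and the -m (masquerading/NAT) forwarding method used below generally relies on IP forwarding being enabled on the host. A sketch for a Debian-based host:

# install the userspace tool, load the IPVS module and enable IPv4 forwarding
apt install ipvsadm
modprobe ip_vs
sysctl -w net.ipv4.ip_forward=1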

Then set up the Linux LVS from the host prompt:

root@fingolfin:~# ipvsadm -A -t 10.0.2.200:80 -s rr
root@fingolfin:~# ipvsadm -a -t 10.0.2.200:80 -r 172.17.0.3:80 -m
root@fingolfin:~# ipvsadm -a -t 10.0.2.200:80 -r 172.17.0.2:80 -m
root@fingolfin:~# ipvsadm -L -n 
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn   
TCP  10.0.2.200:80 rr
  -> 172.17.0.2:80                Masq    1      0          0         
  -> 172.17.0.3:80                Masq    1      0          0    
root@fingolfin:~#

Then to test the setup, we use curl from the host command line:

root@fingolfin:~# curl http://10.0.2.200
It works! node 1
root@fingolfin:~# curl http://10.0.2.200
It works! node 2
root@fingolfin:~# curl http://10.0.2.200
It works! node 1
root@fingolfin:~# curl http://10.0.2.200
It works! node 2
root@fingolfin:~# 

root@fingolfin:~# ipvsadm -L -n --stats
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port               Conns   InPkts  OutPkts  InBytes OutBytes
  -> RemoteAddress:Port
TCP  10.0.2.200:80                       4       24       16     1576     1972
  -> 172.17.0.2:80                       2       12        8      788      986
  -> 172.17.0.3:80                       2       12        8      788      986
root@fingolfin:~#

As can be seen, the requests are equally distributed between the 2 Apache containers.
Other load-balancing algorithms can be used to suit the needs of the use case via the -s (scheduler) option.
More information can be found in the ipvsadm man page: https://linux.die.net/man/8/ipvsadm
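
For example, the existing virtual service could be switched from round robin to weighted least-connection scheduling using the -E (edit) flag; a sketch:

# change the scheduler of the running virtual service
ipvsadm -E -t 10.0.2.200:80 -s wlc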

References:
http://www.ultramonkey.org/papers/lvs_tutorial/html/

Sunday 16 August 2020

Tomcat Session Replication Clustering + Apache Load Balancer

I wanted to start putting together some test configurations for a future project based on Tomcat 9.

I needed to set up a test environment that runs on a single VM, just to test my Apache and Tomcat clustering configuration. The test setup is as follows:

The setup is going to have 2 Tomcat 9 nodes with clustering enabled, which should allow sessions to be replicated between the 2 nodes; session persistence is not configured yet.

In order to use the cluster effectively, we also need a load balancer to proxy the requests to the cluster nodes; I have chosen to use Apache for this project.

Tomcat 9 can be downloaded from the official Tomcat website: https://tomcat.apache.org/download-90.cgi.

Once downloaded, we just need to extract the Tomcat installation into a directory named node01 and copy that into another directory named node02.
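
A sketch of that step, assuming the .tar.gz distribution and the 9.0.37 build that shows up later in the logs:

# extract once, then duplicate the tree for the second node
tar xzf apache-tomcat-9.0.37.tar.gz
mv apache-tomcat-9.0.37 node01
cp -a node01 node02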

We then need to modify the Tomcat server.xml so that the 2 nodes use different ports, since they run on the same machine. This can be done by modifying the Connector element and the shutdown port in the Server element:

<Server port="8105" shutdown="SHUTDOWN">
.....
    <Connector port="8180" protocol="HTTP/1.1"
               connectionTimeout="20000"
               redirectPort="8443" />
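
For node02 the same elements get a second set of ports. A sketch, where 8280 matches what the Apache balancer configuration further down points at; the 8205 shutdown port is just a value I picked for illustration:

<!-- node02: different shutdown and HTTP ports -->
<Server port="8205" shutdown="SHUTDOWN">
.....
    <Connector port="8280" protocol="HTTP/1.1"
               connectionTimeout="20000"
               redirectPort="8443" />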

Then we need to add the default cluster configuration below. It is essentially copied from the Tomcat documentation, though we still need to modify the Receiver port since we are running on the same machine:

 <Host name="localhost"  appBase="webapps"
              unpackWARs="true" autoDeploy="true">

              <Cluster className="org.apache.catalina.ha.tcp.SimpleTcpCluster"
                 channelSendOptions="8">

                  <Manager className="org.apache.catalina.ha.session.DeltaManager"
                   expireSessionsOnShutdown="false"
                   notifyListenersOnReplication="true"/>

                  <Channel className="org.apache.catalina.tribes.group.GroupChannel">
                        <Membership className="org.apache.catalina.tribes.membership.McastService"
                                address="228.0.0.4"
                                port="45564"
                                frequency="500"
                                dropTime="3000"/>
                        <Receiver className="org.apache.catalina.tribes.transport.nio.NioReceiver"
                                address="0.0.0.0"
                                port="4001"
                                autoBind="100"
                                selectorTimeout="5000"
                                maxThreads="6"/>

                        <Sender className="org.apache.catalina.tribes.transport.ReplicationTransmitter">
                                <Transport className="org.apache.catalina.tribes.transport.nio.PooledParallelSender"/>
                        </Sender>
                        <Interceptor className="org.apache.catalina.tribes.group.interceptors.TcpFailureDetector"/>
                        <Interceptor className="org.apache.catalina.tribes.group.interceptors.MessageDispatchInterceptor"/>
                </Channel>

                <Valve className="org.apache.catalina.ha.tcp.ReplicationValve"
                        filter=""/>
                <Valve className="org.apache.catalina.ha.session.JvmRouteBinderValve"/>

                <Deployer className="org.apache.catalina.ha.deploy.FarmWarDeployer"
                    tempDir="/tmp/war-temp/"
                    deployDir="/tmp/war-deploy/"
                    watchDir="/tmp/war-listen/"
                    watchEnabled="false"/>

                <ClusterListener className="org.apache.catalina.ha.session.ClusterSessionListener"/>
          </Cluster>
.....
 </Host>

In this case I placed the cluster configuration under the Host element so that the FarmWarDeployer doesn't cause a warning at Tomcat start-up, as deployment is specific to a host, not to the whole engine.

You can find the whole server.xml file in this URL: https://github.com/sherif-abdelfattah/ExSession_old/blob/master/server.xml.

Note that I included the JvmRouteBinderValve in the configuration. It is needed to recover from a failed session stickiness setup: the cluster simply rewrites the JSESSIONID cookie and continues to serve the same session from the node that the request landed on.
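
For the route=node01/node02 stickiness in the Apache configuration below to match, each node also needs a jvmRoute attribute on its Engine element in server.xml, since Tomcat appends that value as a suffix to the JSESSIONID. A minimal sketch for node01 (node02 would use jvmRoute="node02"):

    <Engine name="Catalina" defaultHost="localhost" jvmRoute="node01">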

Once the 2 nodes are configured, we start them up and can see the messages below in the Tomcat catalina.out:

node01 output when it starts to join node02:

16-Aug-2020 19:37:34.751 INFO [main] org.apache.coyote.AbstractProtocol.init Initializing ProtocolHandler ["http-nio-8180"]
16-Aug-2020 19:37:34.768 INFO [main] org.apache.catalina.startup.Catalina.load Server initialization in [431] milliseconds
16-Aug-2020 19:37:34.804 INFO [main] org.apache.catalina.core.StandardService.startInternal Starting service [Catalina]
16-Aug-2020 19:37:34.804 INFO [main] org.apache.catalina.core.StandardEngine.startInternal Starting Servlet engine: [Apache Tomcat/9.0.37]
16-Aug-2020 19:37:34.812 INFO [main] org.apache.catalina.ha.tcp.SimpleTcpCluster.startInternal Cluster is about to start
16-Aug-2020 19:37:34.818 INFO [main] org.apache.catalina.tribes.transport.ReceiverBase.bind Receiver Server Socket bound to:[/0.0.0.0:4001]
16-Aug-2020 19:37:34.829 INFO [main] org.apache.catalina.tribes.membership.McastServiceImpl.setupSocket Setting cluster mcast soTimeout to [500]
16-Aug-2020 19:37:34.830 INFO [main] org.apache.catalina.tribes.membership.McastServiceImpl.waitForMembers Sleeping for [1000] milliseconds to establish cluster membership, start level:[4]
16-Aug-2020 19:37:35.221 INFO [Membership-MemberAdded.] org.apache.catalina.ha.tcp.SimpleTcpCluster.memberAdded Replication member added:[org.apache.catalina.tribes.membership.MemberImpl[tcp://{0, 0, 0, 0}:4002,{0, 0, 0, 0},4002, alive=3358655, securePort=-1, UDP Port=-1, id={-56 72 63 -39 -27 123 70 86 -109 22 97 122 50 -99 0 24 }, payload={}, command={}, domain={}]]
16-Aug-2020 19:37:35.832 INFO [main] org.apache.catalina.tribes.membership.McastServiceImpl.waitForMembers Done sleeping, membership established, start level:[4]
16-Aug-2020 19:37:35.836 INFO [main] org.apache.catalina.tribes.membership.McastServiceImpl.waitForMembers Sleeping for [1000] milliseconds to establish cluster membership, start level:[8]
16-Aug-2020 19:37:35.842 INFO [Tribes-Task-Receiver[localhost-Channel]-1] org.apache.catalina.tribes.io.BufferPool.getBufferPool Created a buffer pool with max size:[104857600] bytes of type: [org.apache.catalina.tribes.io.BufferPool15Impl]
16-Aug-2020 19:37:36.837 INFO [main] org.apache.catalina.tribes.membership.McastServiceImpl.waitForMembers Done sleeping, membership established, start level:[8]
16-Aug-2020 19:37:36.842 INFO [main] org.apache.catalina.ha.deploy.FarmWarDeployer.start Cluster FarmWarDeployer started.
16-Aug-2020 19:37:36.849 INFO [main] org.apache.catalina.ha.session.JvmRouteBinderValve.startInternal JvmRouteBinderValve started
16-Aug-2020 19:37:36.855 INFO [main] org.apache.catalina.startup.HostConfig.deployWAR Deploying web application archive [/opt/tomcat_cluster/apache-tomcat-9.0.37_node1/webapps/ExSession.war]
16-Aug-2020 19:37:37.000 INFO [main] org.apache.catalina.ha.session.DeltaManager.startInternal Register manager [/ExSession] to cluster element [Host] with name [localhost]
16-Aug-2020 19:37:37.000 INFO [main] org.apache.catalina.ha.session.DeltaManager.startInternal Starting clustering manager at [/ExSession]
16-Aug-2020 19:37:37.005 INFO [main] org.apache.catalina.ha.session.DeltaManager.getAllClusterSessions Manager [/ExSession], requesting session state from [org.apache.catalina.tribes.membership.MemberImpl[tcp://{0, 0, 0, 0}:4002,{0, 0, 0, 0},4002, alive=3360160, securePort=-1, UDP Port=-1, id={-56 72 63 -39 -27 123 70 86 -109 22 97 122 50 -99 0 24 }, payload={}, command={}, domain={}]]. This operation will timeout if no session state has been received within [60] seconds.
16-Aug-2020 19:37:37.121 INFO [main] org.apache.catalina.ha.session.DeltaManager.waitForSendAllSessions Manager [/ExSession]; session state sent at [8/16/20, 7:37 PM] received in [105] ms.
16-Aug-2020 19:37:37.160 INFO [main] org.apache.catalina.startup.HostConfig.deployWAR Deployment of web application archive [/opt/tomcat_cluster/apache-tomcat-9.0.37_node1/webapps/ExSession.war] has finished in [305] ms
16-Aug-2020 19:37:37.161 INFO [main] org.apache.catalina.startup.HostConfig.deployDirectory Deploying web application directory [/opt/tomcat_cluster/apache-tomcat-9.0.37_node1/webapps/ROOT]
16-Aug-2020 19:37:37.185 INFO [main] org.apache.catalina.startup.HostConfig.deployDirectory Deployment of web application directory [/opt/tomcat_cluster/apache-tomcat-9.0.37_node1/webapps/ROOT] has finished in [24] ms
16-Aug-2020 19:37:37.185 INFO [main] org.apache.catalina.startup.HostConfig.deployDirectory Deploying web application directory [/opt/tomcat_cluster/apache-tomcat-9.0.37_node1/webapps/docs]
16-Aug-2020 19:37:37.199 INFO [main] org.apache.catalina.startup.HostConfig.deployDirectory Deployment of web application directory [/opt/tomcat_cluster/apache-tomcat-9.0.37_node1/webapps/docs] has finished in [14] ms
16-Aug-2020 19:37:37.200 INFO [main] org.apache.catalina.startup.HostConfig.deployDirectory Deploying web application directory [/opt/tomcat_cluster/apache-tomcat-9.0.37_node1/webapps/examples]
16-Aug-2020 19:37:37.393 INFO [main] org.apache.catalina.startup.HostConfig.deployDirectory Deployment of web application directory [/opt/tomcat_cluster/apache-tomcat-9.0.37_node1/webapps/examples] has finished in [193] ms
16-Aug-2020 19:37:37.393 INFO [main] org.apache.catalina.startup.HostConfig.deployDirectory Deploying web application directory [/opt/tomcat_cluster/apache-tomcat-9.0.37_node1/webapps/host-manager]
16-Aug-2020 19:37:37.409 INFO [main] org.apache.catalina.startup.HostConfig.deployDirectory Deployment of web application directory [/opt/tomcat_cluster/apache-tomcat-9.0.37_node1/webapps/host-manager] has finished in [16] ms
16-Aug-2020 19:37:37.409 INFO [main] org.apache.catalina.startup.HostConfig.deployDirectory Deploying web application directory [/opt/tomcat_cluster/apache-tomcat-9.0.37_node1/webapps/manager]
16-Aug-2020 19:37:37.422 INFO [main] org.apache.catalina.startup.HostConfig.deployDirectory Deployment of web application directory [/opt/tomcat_cluster/apache-tomcat-9.0.37_node1/webapps/manager] has finished in [13] ms
16-Aug-2020 19:37:37.422 INFO [main] org.apache.catalina.startup.HostConfig.deployDirectory Deploying web application directory [/opt/tomcat_cluster/apache-tomcat-9.0.37_node1/webapps/ExSession_exploded]
16-Aug-2020 19:37:37.435 INFO [main] org.apache.catalina.startup.HostConfig.deployDirectory Deployment of web application directory [/opt/tomcat_cluster/apache-tomcat-9.0.37_node1/webapps/ExSession_exploded] has finished in [12] ms
16-Aug-2020 19:37:37.436 INFO [main] org.apache.coyote.AbstractProtocol.start Starting ProtocolHandler ["http-nio-8180"]
16-Aug-2020 19:37:37.445 INFO [main] org.apache.catalina.startup.Catalina.start Server startup in [2676] milliseconds

These logs show that the cluster has started and that our node01 joined node02, which is listening on receiver port 4002.

Next we need to configure Apache as a load balancer. Since we have a running cluster with session replication, we have 2 options: either set up session stickiness, or leave it out and let the Tomcat cluster handle sessions through session replication alone.

The Apache load balancer configuration in both cases would look like this:

# Apache Reverse proxy loadbalancer with session stickiness set to use JSESSIONID

<Proxy balancer://mycluster>
    BalancerMember http://127.0.0.1:8180  route=node01
    BalancerMember http://127.0.0.1:8280  route=node02
</Proxy>
ProxyPass /ExSession balancer://mycluster/ExSession stickysession=JSESSIONID|jsessionid
ProxyPassReverse /ExSession balancer://mycluster/ExSession stickysession=JSESSIONID|jsessionid

# Apache Reverse proxy loadbalancer with no session stickiness

#<Proxy "balancer://mycluster">
#    BalancerMember "http://127.0.0.1:8180"
#    BalancerMember "http://127.0.0.1:8280"
#</Proxy>
#ProxyPass "/ExSession" "balancer://mycluster/ExSession"
#ProxyPassReverse "/ExSession" "balancer://mycluster/ExSession"
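
For either variant, the relevant proxy modules need to be loaded in Apache; a sketch, assuming a Debian-style Apache installation:

# enable the reverse proxy and balancer modules, then reload Apache
a2enmod proxy proxy_http proxy_balancer lbmethod_byrequests
systemctl reload apache2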

I have commented out the config without session stickiness and used the one with session stickiness, so that I can deploy an application, shut down the node my session is stuck to, and see whether my session gets replicated or not.

To test this, I wrote a small servlet based on the Tomcat servlet examples; it can be found at this URL: https://github.com/sherif-abdelfattah/ExSession_old/blob/master/ExSession.war.

One thing about the ExSession servlet application: the objects it stores in the session need to implement the java.io.Serializable interface so that the Tomcat session managers can replicate them, and the application must be marked with the <distributable/> tag in its web.xml file.

These are preconditions for an application to work with the Tomcat cluster.
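
A minimal sketch of where the tag goes in web.xml:

<web-app xmlns="http://xmlns.jcp.org/xml/ns/javaee" version="4.0">
    <!-- mark the application as distributable so its sessions can be replicated -->
    <distributable/>
</web-app>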

Once Apache is up and running, it is able to serve our servlet page directly from localhost on port 80.
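
The JSESSIONID cookie can also be inspected from the command line; a sketch, assuming the balancer listens on localhost port 80:

# show the Set-Cookie header carrying the JSESSIONID handed out by the backend
curl -s -D - -o /dev/null http://localhost/ExSession/ | grep -i 'Set-Cookie'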

Take note of the JSESSIONID cookie value: we are currently served from node02, the ID ends in CF07, and the session was created on Aug. 16th at 20:19:55.

Now let's stop Tomcat node02 and refresh the browser page:

It can be seen that Tomcat failed our session over to node01: the session still has the same timestamp and the same ID, but it is now served from node01.

This proves that session replication works as expected.