Open Source Windows service for reporting server load back to HAProxy (load balancer feedback agent).

In general, when you load balance a cluster you can spread connections evenly across the servers and get fairly consistent, even load. However, with some applications, such as RDS (Microsoft Terminal Servers), just a few users doing heavy work can put a very high load on a single server. The solution is to use some kind of server load feedback agent. We’ve had one of these in our product for a while, and now, with a lot of help from Simon Horman, we’ve managed to integrate the functionality into the main branch of HAProxy (well, soon anyway). We thought it would be a good idea to open source the previous work on Ldirectord/LVS, make it compatible with HAProxy, and release our Windows service code under the GPL.

Until the work is merged and tested in an official release of HAProxy, we’ve compiled a patched version of HAProxy (dev19-ish), available here (http://downloads.loadbalancer.org/agent/haproxy-agent-check-20130813.tar.gz). Alternatively, you can get the patches from the mailing list archive…

UPDATE: The Loadbalancer.org feedback agent code is now supported in HAProxy 1.5-dev21

Download the Windows Feedback Agent Service Here:  http://downloads.loadbalancer.org/agent/loadbalanceragent.msi


Simply compile HAProxy as usual and then modify the configuration for your RDS cluster:

listen RDSTest
	bind 192.168.69.22:3389
	mode tcp
	balance leastconn
	persist rdp-cookie
	server backup 127.0.0.1:9081 backup  non-stick
	tcp-request inspect-delay 5s
	tcp-request content accept if RDP_COOKIE
	timeout client 12h
	timeout server 12h
	option tcpka
	option redispatch
	option abortonclose
	maxconn 40000
	server Win2008R2 192.168.64.50:3389 weight 100 check agent-check agent-port 3333 inter 2000  rise 2  fall 3 minconn 0  maxconn 0  on-marked-down shutdown-sessions

The important bit, agent-check agent-port 3333, tells HAProxy to constantly monitor each backend server in the cluster by connecting to port 3333 (much as telnet would) and reading the response, which will usually be a percentage idle value, e.g.:

80% – I am not very busy please increase my weight and send me more traffic
10% – I’m busy please decrease my weight and stop sending me so much traffic
drain – Set the weight to 0 and gradually drain the traffic from this server for maintenance
stop – Stop all traffic immediately, kill this backend server
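To make the effect of a percentage reply concrete, here is a minimal sketch of how it scales a server's configured weight. HAProxy applies the percentage relative to the weight set in the configuration (weight 100 in the example above). The function name effective_weight is hypothetical and is not part of HAProxy, and the integer division is an assumption of this sketch rather than a statement about HAProxy's exact rounding.

```shell
#!/bin/sh
# Hypothetical helper, not part of HAProxy: illustrates how an agent reply
# changes the weight HAProxy uses for a server.
# usage: effective_weight CONFIGURED_WEIGHT AGENT_REPLY
effective_weight() {
    case "$2" in
        drain)
            # drain sets the weight to 0
            echo 0
            ;;
        *%)
            # strip the trailing "%" and scale the configured weight
            pct=${2%?}
            echo $(( $1 * pct / 100 ))
            ;;
        *)
            # any other reply leaves the weight unchanged in this sketch
            echo "$1"
            ;;
    esac
}

effective_weight 100 "80%"   # prints 80
effective_weight 100 "10%"   # prints 10
effective_weight 100 drain   # prints 0
```

So a server configured with weight 100 that reports 10% idle ends up with a tenth of the share of new connections it would get when fully idle.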

If you have a Linux backend you could create a simple service calling the following script:

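#!/bin/bash
# Report a 1-second average of CPU idle time as a percentage.
# vmstat prints two samples; the last line covers the most recent second,
# and column 15 is the "id" (idle) column.
LOAD=$(/usr/bin/vmstat 1 2 | /usr/bin/tail -1 | /usr/bin/awk '{print $15}')
echo "$LOAD%"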

Save the script as /usr/bin/lb-feedback.sh and make sure that it is executable:

chmod +x /usr/bin/lb-feedback.sh


Insert this line into /etc/services

lb-feedback 3333/tcp # loadbalancer.org feedback daemon

Now create the following file called /etc/xinetd.d/lb-feedback

# default: on
# description: lb-feedback socket server
service lb-feedback
{
 port = 3333
 socket_type = stream
 flags = REUSE
 wait = no
 user = nobody
 server = /usr/bin/lb-feedback.sh
 log_on_success += USERID
 log_on_failure += USERID
 disable = no
}

Then change permissions and restart xinetd:

chmod 644 /etc/xinetd.d/lb-feedback
/etc/init.d/xinetd restart

You can now test this service by using telnet:

telnet 127.0.0.1 3333
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
95%
Connection closed by foreign host.
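Before pointing HAProxy at the agent, it can help to sanity-check that the reply is one of the forms listed earlier. The following is a minimal sketch; is_valid_reply is a hypothetical helper name, and it only accepts the replies described in this post (a whole-number percentage from 0 to 100, drain, or stop).

```shell
#!/bin/sh
# Hypothetical helper: returns success if the reply looks like one of the
# agent responses described in this post.
is_valid_reply() {
    case "$1" in
        drain|stop)
            return 0
            ;;
        *%)
            n=${1%?}                      # strip the trailing "%"
            case "$n" in
                ''|*[!0-9]*) return 1 ;;  # empty or not a whole number
            esac
            [ "$n" -le 100 ]
            ;;
        *)
            return 1
            ;;
    esac
}

for reply in "95%" "drain" "idle"; do
    if is_valid_reply "$reply"; then
        echo "$reply: ok"
    else
        echo "$reply: unexpected"
    fi
done
```

On the Linux backend you could pass the output of /usr/bin/lb-feedback.sh to the function in the same way.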

Now, if you have a Windows server as your backend, you can use our open source monitor service. You can download the Loadbalancer.org Windows feedback agent here (http://downloads.loadbalancer.org/agent/loadbalanceragent.msi).

Source code is here together with the binary: CpuMonitor_4.3.0.zip

Once you have installed the Loadbalancer.org feedback service, you should find the monitor.exe file in Program Files/LoadBalancer.org:

[Screenshot: Feedback agent]

Simply hit the ‘start’ button and the agent should start responding to telnet on port 3333 (you may need to make an exception for that port in your Windows firewall).

You can change the ‘mode’ setting to drain and then click ‘apply settings and restart’; HAProxy will then set the weight to 0 and the status to drain (shown in blue):

[Screenshot: drain]

Or you can set the ‘mode’ to halt and then click ‘apply settings and restart’; HAProxy will then immediately set the status to DOWN (shown in yellow):

[Screenshot: down]

When the agent is running in normal mode it will report back the percentage idle of the system based on the settings in the feedback agent XML file:

<xml>
  <Cpu>
    <ImportanceFactor value="1" />
    <ThresholdValue value="100" />
  </Cpu>
  <Ram>
    <ImportanceFactor value="0" />
    <ThresholdValue value="100" />
  </Ram>
  <TCPService>
    <Name value="HTTP" />
    <IPAddress value="*" />
    <Port value="80" />
    <MaxConnections value="0" />
    <ImportanceFactor value="0" />
  </TCPService>
  <ReadAgentStatusFromConfig value="False" />
  <ReadAgentStatusFromConfigInterval value="5" />
  <AgentStatus value="Normal" />
</xml>

Notice that you can control both the importance of CPU and RAM utilization and a threshold for each. The following logic is used:

If CPU importance = 0 then ignore
If RAM importance = 0 then ignore
If Threshold level is reached on any monitor then immediately go into DRAIN mode.

Otherwise, the agent adds up the weighted utilization of each monitor and divides the total by the number of monitors involved. The percentage idle it reports is 100 minus that utilization.

If you are using two monitors (CPU and RAM) then:

utilization = utilization + cpuLoad * cpuImportance%;
utilization = utilization + ramOccupied * ramImportance%;
utilization = utilization / 2

So if importance was 1 for both cpu and ram you would only get 0% reported if both CPU and RAM were 100%.

And if the importance is zero then ignore completely i.e.

utilization = utilization + cpuLoad * cpuImportance%;
//utilization = utilization + ramOccupied * 0 (importance is zero so ignore)
utilization = utilization (one service only so don’t divide)
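The two cases above can be put together in a short sketch. This is not the agent's actual code: idle_percent is a hypothetical name, the loads are whole percentages, the importance values are taken to be 0 (ignore) or 1 as in the XML file shown earlier, and the threshold check is left out.

```shell
#!/bin/sh
# Sketch of the reporting logic described above; not the agent's own code.
# usage: idle_percent CPU_LOAD CPU_IMPORTANCE RAM_OCCUPIED RAM_IMPORTANCE
# Loads are whole percentages (0-100); importance is 0 (ignore) or 1.
idle_percent() (
    utilization=0
    factors=0
    if [ "$2" -ne 0 ]; then
        utilization=$(( utilization + $1 * $2 ))
        factors=$(( factors + 1 ))
    fi
    if [ "$4" -ne 0 ]; then
        utilization=$(( utilization + $3 * $4 ))
        factors=$(( factors + 1 ))
    fi
    # Divide by the number of monitors in use (nothing to do if none are).
    if [ "$factors" -gt 0 ]; then
        utilization=$(( utilization / factors ))
    fi
    echo "$(( 100 - utilization ))%"
)

idle_percent 100 1 100 1   # prints 0%
idle_percent 60 1 90 0     # prints 40%
```

The second call shows the single-monitor case: RAM importance is 0, so only the CPU load counts and nothing is divided.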

Also, the final section, TCPService, effectively lets you load balance on the number of established connections to your server, so you could balance based on the number of RDP connections to port 3389.

For this setting it is important to specify MaxConnections, as otherwise the agent will have no idea how to calculate the load, i.e.
utilization = (number of current connections / MaxConnections) * 100 * importance%
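As a worked example, the sketch below reads the formula as the current connection count expressed as a percentage of MaxConnections, then scaled by the importance. That reading is an assumption, as are the hypothetical function name connection_utilization and the treatment of importance as 0 or 1.

```shell
#!/bin/sh
# Sketch only: connection count as a percentage of MaxConnections.
# usage: connection_utilization CURRENT_CONNECTIONS MAX_CONNECTIONS IMPORTANCE
connection_utilization() {
    if [ "$2" -le 0 ]; then
        # With MaxConnections at 0 there is nothing to measure against.
        echo "MaxConnections must be greater than 0" >&2
        return 1
    fi
    echo $(( $1 * 100 / $2 * $3 ))
}

connection_utilization 50 200 1   # prints 25
```

With MaxConnections set to 200 and 50 established RDP sessions, the service counts as 25% utilized.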

In the following screenshot from a Loadbalancer.org appliance, you can see that the Win2008R2 server is healthy and 99% idle, whereas the Linux server was busy at 43% idle before the Linux agent was put into maintenance mode and the server taken out of the group.

[Screenshot: sysoverview]

Does that make sense? Have a play with the config file and let us know what you think….


7 thoughts on “Open Source Windows service for reporting server load back to HAProxy (load balancer feedback agent).”

  1. Felix,
    Full support for the new agent will be incorporated in the Loadbalancer.org ENTERPRISE VA v7.6. The new version will also include external health check scripts, ssl re-encryption etc. I’ll update this post as soon as we have a release date (which should be very soon).

  2. Hi,

    We are currently testing with this custom version of HAProxy. The backend servers are multiple Windows 2008R2 servers with the loadbalancer.org agent.

    The problem is that the weight value is not decreased during heavy load. The agent is reporting the correct value. Please could someone help me to fix this problem? My config file:

    global
    daemon
    stats socket /var/run/haproxy.stat mode 600 level admin
    pidfile /var/run/haproxy.pid
    maxconn 40000
    ulimit-n 81000
    tune.maxrewrite 1024
    defaults
    mode http
    balance roundrobin
    timeout connect 4000
    timeout client 42000
    timeout server 43000
    listen RDP_Test
    bind 172.17.20.8:3389
    mode tcp
    balance leastconn
    option tcpka
    tcp-request inspect-delay 5s
    tcp-request content accept if RDP_COOKIE
    option tcpka
    timeout client 12h
    timeout server 12h
    option redispatch
    option abortonclose
    maxconn 40000
    server SRV-TS01 172.17.20.5:3389 weight 100 check agent-port 3333 inter 2000 rise 2 fall 3 minconn 0 maxconn 0 on-marked-down shutdown-sessions
    server SRV-TS02 172.17.20.6:3389 weight 100 check agent-port 3333 inter 2000 rise 2 fall 3 minconn 0 maxconn 0 on-marked-down shutdown-sessions
    server SRV-TS03 172.17.20.7:3389 weight 100 check agent-port 3333 inter 2000 rise 2 fall 3 minconn 0 maxconn 0 on-marked-down shutdown-sessions
    listen stats :7777
    stats enable
    stats uri /
    option httpclose
    stats auth loadbalancer:loadbalancer

    Thanks!

  3. Hi there,

    I was looking for something about HAProxy and found this post, and it’s funny because we have already worked on this topic. We developed such a feature in the past year for HAProxy, relying on the http check “disable-on-404” feature. In fact, we have a light service (a .NET service with a listening socket) that returns 200 if the load is OK and 404 if not. The load trigger is fully configurable and can be based on CPU load, RAM load or both. The period over which the load is monitored is also configurable, and all settings are stored in the registry to permit configuration deployment through GPO. When the load is OK, the returned page also contains the actual load of the server.

  4. Willem,

    Could you provide some more detail?
    Do the weights in HAProxy change correctly under normal load?
    Is it when the real server is under high load only that the problem occurs?
