Thursday 30 April 2020

Python & Redash data regression modeling

I was fitting curves to performance data that relates a Java application's heap size to the number of data objects the application uses. I wanted to test whether Python's NumPy library could do the curve fitting using polynomial regression. I also wanted to explore the capabilities of Redash as a dashboarding tool that can pull data from JDBC data sources and offers an SQL query interface for fetching the data behind its graphs from a backend database.

Python proved to be quite handy and offers better regression capabilities.
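As a rough illustration of the kind of fit involved (this is not the code from my repository), here is a minimal sketch using NumPy's polyfit. The data points and the helper name fit_polynomial are made up for the example; the real measurements would replace them.

```python
import numpy as np

# Hypothetical sample data: number of objects vs. observed max heap (bytes).
# The values are made up for illustration and follow an exact quadratic.
obj_count = np.array([1000, 2000, 3000, 4000, 5000], dtype=float)
max_heap = 2.0e-3 * obj_count**2 + 150.0 * obj_count + 5.0e6


def fit_polynomial(x, y, degree):
    """Return a callable polynomial fitted to (x, y) by least squares."""
    coeffs = np.polyfit(x, y, degree)  # coefficients, highest power first
    return np.poly1d(coeffs)


model = fit_polynomial(obj_count, max_heap, degree=2)
predicted = model(6000.0)  # extrapolate the heap size for 6000 objects
print(f"predicted max heap for 6000 objects: {predicted:.0f} bytes")
```

The degree is a judgment call: a higher degree follows the samples more closely but tends to extrapolate badly, which matters when the goal is capacity planning.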
I also used the Matplotlib library to plot the curves.
This is the sample code: https://github.com/sherif-abdelfattah/Pyplots_regression
And below is the output:


I also experimented with the Postgres built-in aggregate functions that calculate the slope and intercept using the least squares method, and plotted the data in Redash.
This worked too, though Redash didn't offer a decent way of doing curve fitting, so I had to do the calculation on the Postgres side, mainly by creating extra tables and views to hold the linear regression information displayed on the Redash dashboards.
On the Postgres side, I created views similar to the one below:

create view reg_obj_heap as
select (SELECT regr_intercept("objcount", "max_heap") intercept FROM obj_heap) + (SELECT regr_slope("objcount", "max_heap") as slope FROM obj_heap) * "max_heap"
 as reg_obj, "max_heap" from obj_heap; 
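To make clear what the view computes, here is a small Python sketch of the same ordinary least squares calculation, where regr_slope(Y, X) and regr_intercept(Y, X) take the dependent variable first. The function name linear_regression and the sample rows are assumptions made up for the example, not data from my tables.

```python
from statistics import mean


def linear_regression(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical rows of obj_heap as (max_heap, objcount); made-up values
rows = [(1000, 12), (2000, 22), (3000, 32), (4000, 42)]
heaps = [r[0] for r in rows]
counts = [r[1] for r in rows]

# Same roles as in the view: objcount is Y, max_heap is X
slope, intercept = linear_regression(heaps, counts)

# Equivalent of the reg_obj column: the fitted value for every max_heap
reg_obj = [intercept + slope * h for h in heaps]
print(slope, intercept, reg_obj)
```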



On Redash side, I used another SQL query to build the graph:

WITH group1 AS (
  SELECT "objcount", "max_heap",
  case
    -- inclusive lower bounds so values such as 999999 still get a label
    when "max_heap" >= 1000 and "max_heap" < 1000000 then "max_heap"/1000 || ' kb'
    when "max_heap" >= 1000000 and "max_heap" < 1000000000 then "max_heap"/1000000 || ' mb'
    when "max_heap" >= 1000000000 then "max_heap"/1000000000 || ' gb'
  end as "Heap"
    FROM obj_heap
),
group2 AS (
  SELECT reg_obj, "max_heap"
    FROM reg_obj_heap
)
SELECT Distinct group1.*,group2.reg_obj
  FROM group1
  JOIN group2 ON group1."max_heap" = group2."max_heap"
  order by "max_heap"; 


Complicated SQL grouping (the two CTEs joined on max_heap) is needed to draw 2 series on the Redash graph. The graph looks like this:



Python, on the other hand, offers good regression capabilities and can still produce decent graphics similar to what can be done in Redash. You can even export Matplotlib graphs to HTML using the mpld3 library and build your own dashboards.

The use of both Python and Redash for data analytics should help me to do effective capacity planning and performance modeling for my services in the future.


Saturday 18 April 2020

Testing http/2 on Apache and CentOS 8

Back in 2015, Apache (2.4.17 onwards) started to support the http/2 protocol through a dedicated module, mod_http2.
CentOS 8 ships with Apache/2.4.37 and is thus capable of serving http/2, so it was worth a test to see how this protocol update works.

Apache on CentOS 8 ships with mod_http2 enabled by default, but in order to use the http/2 protocol, one needs to enable it explicitly using the Protocols directive. (Note that it is Protocols with an 's'.)
Below is a sample Apache config that enables both http/2 over clear text, 'h2c', and the standard 'h2', which works on top of SSL using ALPN (Application-Layer Protocol Negotiation).
Please note that for h2 to work SSL must be enabled, so OpenSSL and Apache mod_ssl should be installed and https should be configured.


Apache config:

[root@beren ~]# cat /etc/httpd/conf.d/http2link.conf
### Adding http2 link headers and  H2PushResource

Protocols h2c h2 http/1.1
H2EarlyHints on

Header add Link "</test/ysf_100.png>; rel=preload; as=image"
Header add Link "</test/ysf_099.png>; rel=preload; as=image"
Header add Link "</test/ysf_098.png>; rel=preload; as=image"
Header add Link "</test/ysf_097.png>; rel=preload; as=image"
Header add Link "</test/ysf_096.png>; rel=preload; as=image"

H2PushResource /test/ysf.png
H2PushResource /test/ysf_096.png
H2PushResource /test/ysf_095.png
H2PushResource /test/ysf_094.png
H2PushResource /test/ysf_093.png
H2PushResource /test/ysf_092.png

[root@beren ~]#


The configuration mainly contains the Protocols directive, which lists the protocols Apache will offer to the client in order of preference from left to right. Ordering matters, as per the Apache documentation.
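To illustrate why the order matters, here is a deliberately simplified Python model of the selection: the server walks its own preference list and takes the first protocol the client also offers. The function select_protocol is a made-up name, and this sketch ignores details of the real server, such as the ProtocolsHonorOrder setting and the fact that h2c only applies to clear text connections.

```python
def select_protocol(server_preference, client_offer):
    """Pick the first server-preferred protocol that the client also offers."""
    offered = set(client_offer)
    for proto in server_preference:
        if proto in offered:
            return proto
    return None  # no protocol in common


# Order taken from the Protocols directive in the config above
server = ["h2c", "h2", "http/1.1"]

# A TLS client offering h2 and http/1.1 through ALPN
print(select_protocol(server, ["h2", "http/1.1"]))

# A client that only speaks http/1.1
print(select_protocol(server, ["http/1.1"]))
```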

Then we explicitly set early hints to on. With this, Apache sends an HTTP 103 Early Hints interim response, which is used together with the http/2 server push feature to speed up page load times.
There are 2 ways to make Apache use http/2 server push: either by adding a 'Link' header as shown above using the mod_headers 'Header add' directive, or by using the mod_http2 'H2PushResource' directive to push a certain resource to the client using early hints.
I created an HTML page with some 110 image resources and pushed the above subset of resources to test the configuration.

I tried 3 different clients: curl (which supports http/2 if compiled with the nghttp2 library; CentOS 8 offers curl compiled with the http/2 feature), the nghttp client tool, and the Vivaldi browser (Chromium-based).

Below is the output of curl connecting to Apache using h2c (http/2 over clear text). The protocol upgrade is visible: there is an extra response, HTTP 101 Switching Protocols, before the standard HTTP 200 response:

sherif@fingolfin:~$ curl -v --http2 http://192.168.56.105/test/test.html >/dev/null
*   Trying 192.168.56.105...
* TCP_NODELAY set
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0* Connected to 192.168.56.105 (192.168.56.105) port 80 (#0)
> GET /test/test.html HTTP/1.1
> Host: 192.168.56.105
> User-Agent: curl/7.58.0
> Accept: */*
> Connection: Upgrade, HTTP2-Settings
> Upgrade: h2c
> HTTP2-Settings: AAMAAABkAARAAAAAAAIAAAAA
>
< HTTP/1.1 101 Switching Protocols
< Upgrade: h2c
< Connection: Upgrade
* Received 101
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Connection state changed (MAX_CONCURRENT_STREAMS updated)!
< HTTP/2 103
< link: </test/ysf.png>; rel=preload, </test/ysf_096.png>; rel=preload, </test/ysf_095.png>; rel=preload, </test/ysf_094.png>; rel=preload, </test/ysf_093.png>; rel=preload, </test/ysf_092.png>; rel=preload
< HTTP/2 200
< date: Sun, 00 Jan 1900 00:00:00 GMT
< server: Apache/2.4.37 (centos) OpenSSL/1.1.1c
< last-modified: Fri, 17 Apr 2020 18:57:51 GMT
< etag: W/"1326-5a3812047b27b"
< accept-ranges: bytes
< content-length: 4902
< link: </test/ysf_100.png>; rel=preload; as=image
< link: </test/ysf_099.png>; rel=preload; as=image
< link: </test/ysf_098.png>; rel=preload; as=image
< link: </test/ysf_097.png>; rel=preload; as=image
< link: </test/ysf_096.png>; rel=preload; as=image
< content-type: text/html; charset=UTF-8
<
{ [4902 bytes data]
100  4902  100  4902    0     0  2393k      0 --:--:-- --:--:-- --:--:-- 2393k
* Connection #0 to host 192.168.56.105 left intact
sherif@fingolfin:~$ 
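
The HTTP2-Settings request header in the trace above is a base64url-encoded HTTP/2 SETTINGS payload, where each setting is a 16-bit identifier followed by a 32-bit value. As a small sketch of how to read it, the Python below decodes the exact value curl sent; the function name decode_http2_settings is made up for the example, and the identifier names come from the HTTP/2 specification.

```python
import base64
import struct

# Setting identifiers defined by the HTTP/2 specification (RFC 7540, section 6.5.2)
SETTING_NAMES = {
    0x1: "HEADER_TABLE_SIZE",
    0x2: "ENABLE_PUSH",
    0x3: "MAX_CONCURRENT_STREAMS",
    0x4: "INITIAL_WINDOW_SIZE",
    0x5: "MAX_FRAME_SIZE",
    0x6: "MAX_HEADER_LIST_SIZE",
}


def decode_http2_settings(value):
    """Decode a base64url HTTP2-Settings header into (name, value) pairs."""
    padded = value + "=" * (-len(value) % 4)  # restore any stripped padding
    payload = base64.urlsafe_b64decode(padded)
    # Each setting is a 16-bit identifier followed by a 32-bit value
    return [
        (SETTING_NAMES.get(ident, hex(ident)), val)
        for ident, val in struct.iter_unpack("!HI", payload)
    ]


# Header value copied from the curl trace above
settings = decode_http2_settings("AAMAAABkAARAAAAAAAIAAAAA")
for name, val in settings:
    print(name, val)
```

Decoded, the header carries MAX_CONCURRENT_STREAMS=100, INITIAL_WINDOW_SIZE=1073741824 and ENABLE_PUSH=0. The last one suggests that curl asked the server not to push on this connection, which would be consistent with the trace showing the 103 hints but no pushed streams.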

You can also see the Link headers with rel=preload that Apache added to the response; these should direct the browser to fetch those resources early on to speed up page loading.
Below is the output of using http/2 over SSL, this time using h2 with ALPN negotiation. curl is seen offering h2 and then http/1.1 via ALPN, Apache accepts h2, and TLS 1.3 is used for the handshake:

sherif@fingolfin:~$ curl -v -k --http2 https://192.168.56.105/test/test.html >/dev/null
*   Trying 192.168.56.105...
* TCP_NODELAY set
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0* Connected to 192.168.56.105 (192.168.56.105) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/certs/ca-certificates.crt
  CApath: /etc/ssl/certs
} [5 bytes data]
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
} [512 bytes data]
* TLSv1.3 (IN), TLS handshake, Server hello (2):
{ [122 bytes data]
* TLSv1.3 (IN), TLS Unknown, Certificate Status (22):
{ [1 bytes data]
* TLSv1.3 (IN), TLS handshake, Unknown (8):
{ [15 bytes data]
* TLSv1.3 (IN), TLS Unknown, Certificate Status (22):
{ [1 bytes data]
* TLSv1.3 (IN), TLS handshake, Certificate (11):
{ [2672 bytes data]
* TLSv1.3 (IN), TLS Unknown, Certificate Status (22):
{ [1 bytes data]
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
{ [264 bytes data]
* TLSv1.3 (IN), TLS Unknown, Certificate Status (22):
{ [1 bytes data]
* TLSv1.3 (IN), TLS handshake, Finished (20):
{ [52 bytes data]
* TLSv1.3 (OUT), TLS change cipher, Client hello (1):
} [1 bytes data]
* TLSv1.3 (OUT), TLS Unknown, Certificate Status (22):
} [1 bytes data]
* TLSv1.3 (OUT), TLS handshake, Finished (20):
} [52 bytes data]
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use h2
* Server certificate:
*  subject: C=US; O=Unspecified; CN=beren; emailAddress=root@beren
*  start date: Apr 16 19:10:23 2020 GMT
*  expire date: Apr 21 20:50:23 2021 GMT
*  issuer: C=US; O=Unspecified; OU=ca-5235634170413358813; CN=beren; emailAddress=root@beren
*  SSL certificate verify result: self signed certificate in certificate chain (19), continuing anyway.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
} [5 bytes data]
* TLSv1.3 (OUT), TLS Unknown, Unknown (23):
} [1 bytes data]
* TLSv1.3 (OUT), TLS Unknown, Unknown (23):
} [1 bytes data]
* TLSv1.3 (OUT), TLS Unknown, Unknown (23):
} [1 bytes data]
* Using Stream ID: 1 (easy handle 0x55e221b8d580)
} [5 bytes data]
* TLSv1.3 (OUT), TLS Unknown, Unknown (23):
} [1 bytes data]
> GET /test/test.html HTTP/2
> Host: 192.168.56.105
> User-Agent: curl/7.58.0
> Accept: */*
>
{ [5 bytes data]
* TLSv1.3 (IN), TLS Unknown, Certificate Status (22):
{ [1 bytes data]
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
{ [265 bytes data]
* TLSv1.3 (IN), TLS Unknown, Certificate Status (22):
{ [1 bytes data]
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
{ [265 bytes data]
* TLSv1.3 (IN), TLS Unknown, Unknown (23):
{ [1 bytes data]
* Connection state changed (MAX_CONCURRENT_STREAMS updated)!
} [5 bytes data]
* TLSv1.3 (OUT), TLS Unknown, Unknown (23):
} [1 bytes data]
* TLSv1.3 (IN), TLS Unknown, Unknown (23):
{ [1 bytes data]
* TLSv1.3 (IN), TLS Unknown, Unknown (23):
{ [1 bytes data]
< HTTP/2 103
< link: </test/ysf.png>; rel=preload, </test/ysf_096.png>; rel=preload, </test/ysf_095.png>; rel=preload, </test/ysf_094.png>; rel=preload, </test/ysf_093.png>; rel=preload, </test/ysf_092.png>; rel=preload
{ [5 bytes data]
* TLSv1.3 (IN), TLS Unknown, Unknown (23):
{ [1 bytes data]
< HTTP/2 200
< date: Fri, 17 Apr 2020 19:59:34 GMT
< server: Apache/2.4.37 (centos) OpenSSL/1.1.1c
< last-modified: Fri, 17 Apr 2020 18:57:51 GMT
< etag: "1326-5a3812047b27b"
< accept-ranges: bytes
< content-length: 4902
< link: </test/ysf_100.png>; rel=preload; as=image
< link: </test/ysf_099.png>; rel=preload; as=image
< link: </test/ysf_098.png>; rel=preload; as=image
< link: </test/ysf_097.png>; rel=preload; as=image
< link: </test/ysf_096.png>; rel=preload; as=image
< content-type: text/html; charset=UTF-8
<
{ [978 bytes data]
* TLSv1.3 (IN), TLS Unknown, Unknown (23):
{ [1 bytes data]
* TLSv1.3 (IN), TLS Unknown, Unknown (23):
{ [1 bytes data]
* TLSv1.3 (IN), TLS Unknown, Unknown (23):
{ [1 bytes data]
* TLSv1.3 (IN), TLS Unknown, Unknown (23):
{ [1 bytes data]
100  4902  100  4902    0     0  76593      0 --:--:-- --:--:-- --:--:-- 76593
* Connection #0 to host 192.168.56.105 left intact
sherif@fingolfin:~$

The next test was done with the nghttp client tool that ships with nghttp2/1.33.0; it is a client built on libnghttp2, the http/2 C library.
The main objective here is to test the resources being pushed, see whether pushing works as expected, and see the difference between resources pushed with early hints and resources pushed with the Link header:

Testing pushes:

sherif@fingolfin:~$ nghttp -vnys https://192.168.56.105/test/test.html
[  0.003] Connected
The negotiated protocol: h2
[  0.054] recv SETTINGS frame <length=6, flags=0x00, stream_id=0>
          (niv=1)
          [SETTINGS_MAX_CONCURRENT_STREAMS(0x03):100]
[  0.054] recv WINDOW_UPDATE frame <length=4, flags=0x00, stream_id=0>
          (window_size_increment=2147418112)
[  0.054] send SETTINGS frame <length=12, flags=0x00, stream_id=0>
          (niv=2)
          [SETTINGS_MAX_CONCURRENT_STREAMS(0x03):100]
          [SETTINGS_INITIAL_WINDOW_SIZE(0x04):65535]
[  0.054] send SETTINGS frame <length=0, flags=0x01, stream_id=0>
          ; ACK
          (niv=0)
[  0.054] send PRIORITY frame <length=5, flags=0x00, stream_id=3>
          (dep_stream_id=0, weight=201, exclusive=0)
[  0.054] send PRIORITY frame <length=5, flags=0x00, stream_id=5>
          (dep_stream_id=0, weight=101, exclusive=0)
[  0.054] send PRIORITY frame <length=5, flags=0x00, stream_id=7>
          (dep_stream_id=0, weight=1, exclusive=0)
[  0.054] send PRIORITY frame <length=5, flags=0x00, stream_id=9>
          (dep_stream_id=7, weight=1, exclusive=0)
[  0.054] send PRIORITY frame <length=5, flags=0x00, stream_id=11>
          (dep_stream_id=3, weight=1, exclusive=0)
[  0.054] send HEADERS frame <length=50, flags=0x25, stream_id=13>
          ; END_STREAM | END_HEADERS | PRIORITY
          (padlen=0, dep_stream_id=11, weight=16, exclusive=0)
          ; Open new stream
          :method: GET
          :path: /test/test.html
          :scheme: https
          :authority: 192.168.56.105
          accept: */*
          accept-encoding: gzip, deflate
          user-agent: nghttp2/1.30.0
[  0.056] recv SETTINGS frame <length=0, flags=0x01, stream_id=0>
          ; ACK
          (niv=0)
[  0.057] recv (stream_id=13) :scheme: https
[  0.057] recv (stream_id=13) :authority: 192.168.56.105
[  0.057] recv (stream_id=13) :path: /test/ysf.png
[  0.057] recv (stream_id=13) :method: GET
[  0.057] recv (stream_id=13) accept: */*
[  0.057] recv (stream_id=13) accept-encoding: gzip, deflate
[  0.057] recv (stream_id=13) user-agent: nghttp2/1.30.0
[  0.057] recv (stream_id=13) host: 192.168.56.105
[  0.057] recv PUSH_PROMISE frame <length=60, flags=0x04, stream_id=13>
          ; END_HEADERS
......          (padlen=0)
          ; First push response header
[  0.067] recv (stream_id=20) :status: 200
[  0.067] recv (stream_id=20) date: Fri, 17 Apr 2020 20:02:08 GMT
[  0.067] recv (stream_id=20) server: Apache/2.4.37 (centos) OpenSSL/1.1.1c
[  0.067] recv (stream_id=20) last-modified: Thu, 16 Apr 2020 20:38:41 GMT
[  0.067] recv (stream_id=20) etag: "36ec-5a36e6b16758e"
[  0.067] recv (stream_id=20) accept-ranges: bytes
[  0.067] recv (stream_id=20) content-length: 14060
[  0.067] recv (stream_id=20) link: </test/ysf_100.png>; rel=preload; as=image
[  0.067] recv (stream_id=20) link: </test/ysf_099.png>; rel=preload; as=image
[  0.067] recv (stream_id=20) link: </test/ysf_098.png>; rel=preload; as=image
[  0.067] recv (stream_id=20) link: </test/ysf_097.png>; rel=preload; as=image
[  0.067] recv (stream_id=20) link: </test/ysf_096.png>; rel=preload; as=image
[  0.067] recv (stream_id=20) content-type: image/png
[  0.067] recv HEADERS frame <length=37, flags=0x04, stream_id=20>
          ; END_HEADERS
          (padlen=0)
[  0.067] recv (stream_id=2) :status: 103
[  0.067] recv (stream_id=2) link: </test/ysf.png>; rel=preload, </test/ysf_096.png>; rel=preload, </test/ysf_095.png>; rel=preload, </test/ysf_094.png>; rel=preload, </test/ysf_093.png>; rel=preload, </test/ysf_092.png>; rel=preload
[  0.067] recv HEADERS frame <length=2, flags=0x04, stream_id=2>
          ; END_HEADERS
          (padlen=0)
          ; First push response header
[  0.068] recv (stream_id=2) :status: 200
[  0.068] recv (stream_id=2) date: Fri, 17 Apr 2020 20:02:08 GMT
[  0.068] recv (stream_id=2) server: Apache/2.4.37 (centos) OpenSSL/1.1.1c
[  0.068] recv (stream_id=2) last-modified: Thu, 16 Apr 2020 20:37:22 GMT
[  0.068] recv (stream_id=2) etag: "36ec-5a36e665c587d"
........[  0.079] recv DATA frame <length=1291, flags=0x00, stream_id=20>
[  0.079] recv DATA frame <length=487, flags=0x01, stream_id=14>
          ; END_STREAM
[  0.079] recv DATA frame <length=1291, flags=0x00, stream_id=2>
[  0.079] recv DATA frame <length=1150, flags=0x01, stream_id=20>
          ; END_STREAM
[  0.079] recv DATA frame <length=167, flags=0x01, stream_id=2>
          ; END_STREAM
[  0.079] send GOAWAY frame <length=8, flags=0x00, stream_id=0>
          (last_stream_id=20, error_code=NO_ERROR(0x00), opaque_data(0)=[])
***** Statistics *****

Request timing:
  responseEnd: the  time  when  last  byte of  response  was  received
               relative to connectEnd
 requestStart: the time  just before  first byte  of request  was sent
               relative  to connectEnd.   If  '*' is  shown, this  was
               pushed by server.
      process: responseEnd - requestStart
         code: HTTP status code
         size: number  of  bytes  received as  response  body  without
               inflation.
          URI: request URI

see http://www.w3.org/TR/resource-timing/#processing-model

sorted by 'complete'

id  responseEnd requestStart  process code size request path
 13     +9.49ms       +675us   8.81ms  200   4K /test/test.html
  4    +13.76ms *    +3.93ms   9.83ms  200  13K /test/ysf_096.png
  6    +23.55ms *    +4.11ms  19.44ms  200  13K /test/ysf_095.png
  8    +23.60ms *    +4.29ms  19.31ms  200  13K /test/ysf_094.png
 10    +23.65ms *    +4.68ms  18.96ms  200  13K /test/ysf_093.png
 12    +23.93ms *    +6.61ms  17.32ms  200  13K /test/ysf_092.png
 16    +25.72ms *    +8.12ms  17.59ms  200  13K /test/ysf_099.png
 18    +25.75ms *    +8.29ms  17.46ms  200  13K /test/ysf_098.png
 14    +25.81ms *    +7.96ms  17.85ms  200  13K /test/ysf_100.png
 20    +25.87ms *    +8.58ms  17.28ms  200  13K /test/ysf_097.png
  2    +25.90ms *    +3.51ms  22.39ms  200  13K /test/ysf.png
sherif@fingolfin:~$ 

From the above output, we can see that all the starred resources were pushed by the server; both the set of resources pushed with the 'Link' header and the set pushed using 'H2PushResource' are visible.
The resource ysf_096.png was pushed once, even though it was mentioned twice in the config. Also, resources pushed with the early hint mechanism using 'H2PushResource' are announced in the HTTP 103 Early Hints response containing the Link header, which could also be seen in the previous curl output.
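
The counts can be cross-checked with a small Python sketch that parses the Link values from the traces and removes duplicates. The parser preload_targets is a made-up helper and is intentionally naive (it splits on commas, so it would break on URIs that contain commas); the header values are copied from the output above.

```python
def preload_targets(link_value):
    """Extract the URI of every rel=preload entry from a Link header value."""
    targets = []
    for entry in link_value.split(","):
        parts = [p.strip() for p in entry.split(";")]
        uri, params = parts[0], parts[1:]
        if uri.startswith("<") and uri.endswith(">") and "rel=preload" in params:
            targets.append(uri[1:-1])
    return targets


# Link value from the 103 Early Hints response (H2PushResource entries)
early_hints = (
    "</test/ysf.png>; rel=preload, </test/ysf_096.png>; rel=preload, "
    "</test/ysf_095.png>; rel=preload, </test/ysf_094.png>; rel=preload, "
    "</test/ysf_093.png>; rel=preload, </test/ysf_092.png>; rel=preload"
)

# Link headers from the 200 response ('Header add Link' entries)
response_links = [
    "</test/ysf_100.png>; rel=preload; as=image",
    "</test/ysf_099.png>; rel=preload; as=image",
    "</test/ysf_098.png>; rel=preload; as=image",
    "</test/ysf_097.png>; rel=preload; as=image",
    "</test/ysf_096.png>; rel=preload; as=image",
]

announced = preload_targets(early_hints)
for value in response_links:
    announced.extend(preload_targets(value))

# dict.fromkeys drops duplicates while keeping the first-seen order
unique = list(dict.fromkeys(announced))
print(len(announced), "announced,", len(unique), "distinct")
```

The distinct count matches the number of starred rows in the nghttp statistics table, with ysf_096.png being the one resource listed in both sets.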
The last test is to verify how browsers handle server pushes.
This was done using the Vivaldi browser:


http/2 is gaining popularity, especially with CDNs, as it helps cut down page load times by a big margin.
Dynamic applications written in languages like PHP, Java and the like can always inject Link headers into their responses to push large resources that might otherwise only be referenced deep inside the page or in one of its dependencies, speeding up page load time.
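
As a minimal sketch of that idea, here is a Python WSGI application that adds a preload Link header in the same format used in the Apache config above. The app and its page body are made up for illustration; whether the header actually results in a push depends on how the application is deployed behind an http/2 capable server, which I have not tested here.

```python
def app(environ, start_response):
    """Minimal WSGI app that announces an image to preload via a Link header."""
    body = b"<html><body><img src='/test/ysf.png'></body></html>"
    headers = [
        ("Content-Type", "text/html; charset=UTF-8"),
        ("Content-Length", str(len(body))),
        # Same header format as the 'Header add Link' lines in the Apache config
        ("Link", "</test/ysf.png>; rel=preload; as=image"),
    ]
    start_response("200 OK", headers)
    return [body]


# Call the app directly with a stub start_response to inspect its headers
recorded = {}


def record(status, headers):
    recorded["status"] = status
    recorded["headers"] = dict(headers)


page = b"".join(app({}, record))
print(recorded["status"], recorded["headers"]["Link"])
```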