Sunday 27 October 2019

Regular experession matching in Postgres

Lately I ran across the need to extract user id text strings from a text colum in a database.
The text could be large and might contain multiple user id matches in the same row.

Thus I imported to the data into Postgres and started looking for Postgres functions that would help me use regular expressions to extract a text part of the column that matches the user id text.

I found a very useful Postgres function: REGEXP_MATCHES('input string', 'regexp pattern', 'flags').
This function works very similar to the sed Unix command, it applies the pattern against input string and allows the use of flag modifiers similar to the one used with sed.

Below is an example of a sample data:

The ids as you can see can be found in any part of the comment-text field.
A query using the  REGEXP_MATCHES function would help extract those Ids into a string arrary type defined in Postgres, the query looks like below:

select REGEXP_MATCHES("comment-text", '\[uidkey\:user[a-fA-F0-9]+\]','g') from comments where "comment-text" like '%[uidkey:%]%';

The output of the query looks like below:
As you can see the REGEXP_MATCHES function matched all 5 ids from the 2 rows that contain them and returned 5 array objects with 1 element each containing those ids.
To cast the array objects to standard text strings we use another Postgres function: ARRAY_TO_STRING(Array_object, 'array element separator', 'optional null element replacement'.

The final version of the query looks like below:

select array_to_string(REGEXP_MATCHES("comment-text", '\[uidkey\:user[a-fA-F0-9]+\]','g'),',')
from comments where "comment-text" like '%[uidkey:%]%';

The output of the query looks like below:
Thus, using those methods, we can extract text from text fields and build more complex queries based on that data.

Enjoy playing with your data !!






Tuesday 19 March 2019

Setting RequestHeader in Apache

Apache can be used to inject a Request Header in the incoming request that can be either consumed by Apache or forwarded further to another underlying service, in this case Apache works essentially as a reverse proxy.

In a test setup where Apache works as a reverse proxy in front of tomcat, the below Apache configuration is used to implement the reverse proxy functionality and add a Request Header:

<VirtualHost *>
   <Location "/sherif">
      ProxyPass http://127.0.0.1:8080/sherif
      ProxyPassReverse http://127.0.0.1:8080/sherif
      RequestHeader set myh "valueofarequestheader"
   </Location>
</VirtualHost>

In this setup, tomcat is using default port 8080, it has a defined context path for a dummy application defined in tomcat under its context.xml:

<Context>
    <WatchedResource>WEB-INF/web.xml</WatchedResource>
    <WatchedResource>WEB-INF/tomcat-web.xml</WatchedResource>
    <WatchedResource>${catalina.base}/conf/web.xml</WatchedResource>

    <Manager pathname="/sherif" />
</Context>


Under tomcat webapps, we have folder created as sherif and has a simple index.html file to support the test:
[root@localhost conf]# ls -lt ../webapps/sherif/
total 4
-rw-r--r-- 1 root root 14 Mar 19 16:27 index.html
[root@localhost conf]#

To verify the header being added, we configure tomcat to log the head myh.
This is done on tomcat server.xml accesslog value as below:

<Valve className="org.apache.catalina.valves.AccessLogValve" directory="logs"
               prefix="localhost_access_log" suffix=".txt"
               pattern="%h %l %u %t &quot;%r&quot; %s %b %{myh}i" />

Once a request is sent to http://localhost/sherif, tomcat logs the below log showing the request header being added by Apache and reaching tomcat:

127.0.0.1 - - [19/Mar/2019:16:46:47 -0400] "GET /sherif/ HTTP/1.1" 200 17 valueofarequestheader

This configuration is useful in passing headers to backend services in case those are not already sent by the source user agent.