Sunday, 27 October 2019

Regular experession matching in Postgres

Lately I ran across the need to extract user id text strings from a text colum in a database.
The text could be large and might contain multiple user id matches in the same row.

Thus I imported to the data into Postgres and started looking for Postgres functions that would help me use regular expressions to extract a text part of the column that matches the user id text.

I found a very useful Postgres function: REGEXP_MATCHES('input string', 'regexp pattern', 'flags').
This function works very similar to the sed Unix command, it applies the pattern against input string and allows the use of flag modifiers similar to the one used with sed.

Below is an example of a sample data:

The ids as you can see can be found in any part of the comment-text field.
A query using the  REGEXP_MATCHES function would help extract those Ids into a string arrary type defined in Postgres, the query looks like below:

select REGEXP_MATCHES("comment-text", '\[uidkey\:user[a-fA-F0-9]+\]','g') from comments where "comment-text" like '%[uidkey:%]%';

The output of the query looks like below:
As you can see the REGEXP_MATCHES function matched all 5 ids from the 2 rows that contain them and returned 5 array objects with 1 element each containing those ids.
To cast the array objects to standard text strings we use another Postgres function: ARRAY_TO_STRING(Array_object, 'array element separator', 'optional null element replacement'.

The final version of the query looks like below:

select array_to_string(REGEXP_MATCHES("comment-text", '\[uidkey\:user[a-fA-F0-9]+\]','g'),',')
from comments where "comment-text" like '%[uidkey:%]%';

The output of the query looks like below:
Thus, using those methods, we can extract text from text fields and build more complex queries based on that data.

Enjoy playing with your data !!