Ops Daily: Simple apache archiva crawler

Tuesday, 19 July 2016

Simple apache archiva crawler

I had a requirement to identify newly deployed applications on the fly.
The applications are pushed to an Apache Archiva repository, which stores the application snapshots.

The requirement also states that new releases of already-deployed applications must be identified automatically.

To accomplish this, we need a simple crawler that looks for the latest war files stored in Archiva.

Below is a simple script that does this:

#!/bin/sh
# Base URL that holds one folder per application.
ARCHIVA_BASE="http://archiva:8080/archiva/repository/snapshots/com/sherif/"

# Print the href targets of the directory listing at URL $1, skipping the parent link.
list_links() {
        curl -s "$1" | grep '<li><a href=' | cut -d'"' -f2 | grep -v '\.\./'
}

for pUrl in $(list_links "${ARCHIVA_BASE}")
do
        # Latest snapshot folder: sort numerically on each of the three version sections.
        LAST_SNAPSHOT=$(list_links "${ARCHIVA_BASE}${pUrl}" | grep -v 'xml' | grep -E '[0-9]+\.[0-9]+\.[0-9]+' | sort -t. -k1,1n -k2,2n -k3,3n | tail -1)
        [ -z "${LAST_SNAPSHOT}" ] && continue

        # The war file itself, not its md5/sha checksums or pom.
        WAR=$(list_links "${ARCHIVA_BASE}${pUrl}${LAST_SNAPSHOT}" | grep -E '\.war$' | grep -Ev 'md5|sha|pom')
        [ -z "${WAR}" ] && continue

        echo "${ARCHIVA_BASE}${pUrl}${LAST_SNAPSHOT}${WAR}"
done
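
To see what the grep/cut stage of the script does without hitting a live server, the extraction can be tried on a canned listing. This is only a sketch: extract_links is a hypothetical helper name, and the sample HTML is made up to match the "<li><a href=" pattern the script greps for, not copied from a real Archiva page.

```shell
# extract_links is a hypothetical helper; it reads listing HTML on stdin
# and prints the href target of every entry except the parent link.
extract_links() {
        grep '<li><a href=' | cut -d'"' -f2 | grep -v '\.\./'
}

# Made-up sample of a directory listing, for illustration only.
SAMPLE_LISTING='<ul>
<li><a href="../">Parent</a></li>
<li><a href="billing/">billing/</a></li>
<li><a href="portal/">portal/</a></li>
</ul>'

printf '%s\n' "${SAMPLE_LISTING}" | extract_links
```

With this sample it prints billing/ and portal/, one per line. Each printed name is relative, which is why the script can simply append it to the URL it just fetched.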

The script makes some assumptions, as per the requirement:

1- The application folders are immediately under the Archiva base URL above.
2- Only war files will be deployed; with a small modification the script can also pick up other file types.
3- The release snapshot folders are immediately under the application folders.
4- The release snapshot folder names have the format "11.22.33{Anystring}", i.e. they start with three numeric sections, which are used for sorting.
5- The war files are found immediately under the snapshot folder.
6- Each snapshot folder contains only one war file.
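
Assumptions 2 and 4 can be tried out offline. The following is a minimal sketch, not part of the original script: latest_version and pick_artifact are hypothetical helper names, and the folder and file names are made up for illustration.

```shell
# latest_version (hypothetical helper): reads folder names on stdin and prints
# the one whose three leading numeric sections are highest.
latest_version() {
        grep -E '^[0-9]+\.[0-9]+\.[0-9]+' | sort -t. -k1,1n -k2,2n -k3,3n | tail -1
}

# pick_artifact (hypothetical helper): keeps only names ending in the given
# extension, so .md5/.sha1 checksums and .pom files are dropped.
pick_artifact() {
        grep -E "\.$1\$"
}

# Made-up names, for illustration only.
printf '%s\n' '1.9.3-SNAPSHOT/' '1.10.0-SNAPSHOT/' '1.2.10-SNAPSHOT/' | latest_version
printf '%s\n' 'app-1.10.0.war' 'app-1.10.0.war.md5' 'app-1.10.0.pom' | pick_artifact war
printf '%s\n' 'app-1.10.0.ear' 'app-1.10.0.ear.sha1' | pick_artifact ear
```

The first command prints 1.10.0-SNAPSHOT/, whereas a plain alphabetical sort would put 1.9.3-SNAPSHOT/ last; each section needs its own numeric sort key. The other two commands print app-1.10.0.war and app-1.10.0.ear, showing that passing a different extension is the kind of small modification needed for other file types.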
