Open Source Software As a Learning Tool


With all the recent talk about open source software, I increasingly see people think about it as a cost vs. convenience matrix: they only want to avoid licensing costs if doing so won’t be too hard. I often hear something along the lines of:

“How much more work am I willing to do to save money?”

For some tasks there simply is no better option than proprietary software, and for those jobs the decision is easy. But what about the cases where the best tool for the job, or at least a really good one, comes from an open source project? One of the great benefits of using these projects is that you get more involved with the data you are working with. Because there usually isn’t a “push button” way to get from point A to point B, you have to learn a bit more about what it is you are trying to do. An added bonus is that you usually get to know your computer a bit better, as well as the project you plan on using.

A great example from my everyday life is the open source project GDAL. You can read, write, or transform tons of data formats with GDAL, but what I find most useful is using it to work with aerial photos. We all know just how much time gets burned up processing imagery, and batch processing it with GUI tools wears on my nerves, so with the help of a little scripting and a text editor I script up my image manipulation to run as I sleep. One task that seems to come up over and over is re-projecting imagery. Let’s run through what we’ll need to get that job done.

1.    GDAL

Here is a good chance to learn about how applications and files are handled by your operating system. Start by getting a copy for your platform from GDAL Downloads. You will want to be sure that the GDAL executables are on your system path. That is easy enough to check: just open a command prompt and type gdalinfo. If the command is found, it’s on the path. If not, you will want to add the GDAL folder to either your user path or the system path.
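The same path check can be done from Python. This is a minimal sketch, assuming Python 3; the list of tool names and the helper `find_tools` are only illustrations, not part of GDAL itself.

```python
import shutil

# A few executables that ship with GDAL; this list is just an illustration.
GDAL_TOOLS = ["gdalinfo", "gdalwarp", "ogr2ogr"]


def find_tools(names):
    """Map each executable name to its full path, or None if it is not on the PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools(GDAL_TOOLS).items():
        print(f"{name}: {path or 'not found on PATH'}")
```

If any tool comes back as not found, the folder holding the GDAL executables still needs to be added to your user or system path.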

2.    A good text editor

Let’s lay hands on a good text editor. On Windows I like to use Notepad++, although I’ve been fond of Sublime Text as of late. On Linux (desktop) I normally use gedit, and on our Linux servers I stick with vim. It’s best to go with what you feel comfortable with; there’s no point in getting too terribly fancy, since we’ll only be writing a few lines of script to get us going.

3. A scripting language.

I think everyone who deals in data for a living should know at least a little of one scripting language; it can shave hours off your day. For this kind of task I just can’t seem to get away from Python. It makes short work of turning directories full of files into a batch file or shell script to do our image processing. Here I’m printing out the commands to re-project or “warp” imagery from one coordinate system to another.

<pre>import os

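# Folder that will receive the re-projected images
NEW_DIR = r'/home/mbratcher/transimgs'

# Walk the source folder and print one gdalwarp command per .tif file
for root, dirs, files in os.walk(r'/home/mbratcher/newimgs'):
    for f in files:
        name, ext = os.path.splitext(f)
        if ext == '.tif':
            ofile = os.path.join(root, f)
            nfile = os.path.join(NEW_DIR, f)
            # print() with a single argument works in both Python 2 and Python 3
            print("gdalwarp -s_srs EPSG:2264 -t_srs EPSG:3358 -co TILED=YES %s %s" % (ofile, nfile))
</pre>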
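The loop above prints bare paths, so a file name containing a space would break the generated script. Here is a hedged variation, assuming Python 3 and a POSIX shell (the quoting from `shlex.quote` is not meant for Windows batch files). The function names `build_warp_commands` and `write_script` are my own for illustration, and the EPSG codes are simply carried over from the example.

```python
import os
import shlex


def build_warp_commands(src_dir, dst_dir, s_srs="EPSG:2264", t_srs="EPSG:3358"):
    """Return one shell-quoted gdalwarp command line per .tif file under src_dir."""
    commands = []
    for root, dirs, files in os.walk(src_dir):
        for f in sorted(files):
            # Accept .tif and .TIF alike; skip everything else
            if os.path.splitext(f)[1].lower() != ".tif":
                continue
            ofile = os.path.join(root, f)
            nfile = os.path.join(dst_dir, f)
            args = ["gdalwarp", "-s_srs", s_srs, "-t_srs", t_srs,
                    "-co", "TILED=YES", ofile, nfile]
            # Quote each argument so spaces in file names survive the shell
            commands.append(" ".join(shlex.quote(a) for a in args))
    return commands


def write_script(commands, script_path):
    """Write the commands to a shell script, one per line."""
    with open(script_path, "w") as fh:
        fh.write("#!/bin/sh\n")
        fh.write("\n".join(commands) + "\n")
```

Calling `write_script(build_warp_commands(src, dst), "warp_all.sh")` leaves you with a script you can review in your text editor before kicking it off for the night.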


When I first started using some of these tools I was overwhelmed by the number of options, switches, and file formats. In order to get a grasp on what I was doing I was forced to learn more about the data I was working with. What is a tiled tif? Why would tiling it be a good or bad idea? And what the what is LZMA compression? I know it can seem daunting to jump into using this crazy command-line driven tool set, but believe me, it is way easier than it looks. In no time at all you’ll be generating indexes from a collection of images with gdal_tindex, you’ll be moving data from shapefiles into PostGIS with ogr2ogr, and all the while you’ll be getting more familiar with how all these technologies work and can work together.
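To give a feel for those last two tools, here is a rough sketch of how their command lines could be assembled in the same scripted style. It assumes Python 3.8 or newer (for `shlex.join`), and the file names and the database name `gis` are made up for illustration. A real PostGIS connection string will usually need more than a database name (host, user, and so on).

```python
import shlex


def tindex_command(index_path, image_paths):
    """Build a gdal_tindex call: the index file first, then the images to index."""
    return ["gdal_tindex", index_path, *image_paths]


def shapefile_to_postgis_command(shapefile, dbname):
    """Build an ogr2ogr call that loads a shapefile into a PostgreSQL/PostGIS database."""
    return ["ogr2ogr", "-f", "PostgreSQL", f"PG:dbname={dbname}", shapefile]


if __name__ == "__main__":
    # Hypothetical file names and database name, purely for illustration
    for cmd in (tindex_command("index.shp", ["a.tif", "b.tif"]),
                shapefile_to_postgis_command("parcels.shp", "gis")):
        print(shlex.join(cmd))
```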

Written by mbratcher