It has been a while since we started writing in a consistent pace. But somehow, I see that happening now. Today, we will see how to organize and align your data so that you can make a map or two out of it.
We often deal with data in CSV formats, which potentially can be visualized as a map. Let’s start with a sample file.
The table above shows the first few rows from a CSV file containing SSLC results in Karanataka for the year 2012. You can download the complete file here. The contents of the file and what each row means is very evident from the column headers.
The column of interest for you right now should be ‘district’. We will now use this column to make a map from this data. The process of converting an address or part of an address to a geographic coordinate is called geocoding. We will geocode this data to find the latitude and longitude of the districts.
There are several ways of geocoding data – from free and easy APIs to comprehensive as well as expensive ones. Two of our favourites are: Batch Geocode and the MapBox Google Docs Geo plugin. We will use the second one for this exercise.
Geocoding the CSV on Google Drive
We have to first upload the file to Google Drive. This is very straight forward – just start by clicking the ‘Upload’ icon. Before uploading, make sure that you have turned on ‘Conversion’. This converts the CSV to the Google Doc format. Just how you open the CSV in a spreadsheet application.
After uploading, open the spreadsheet in Google Drive. Now let us enable the Geo plugin. Go to Tools -> Script Gallery and search for ‘geo’. From the list of search results, select ‘Geo by MapBox’. Click ‘Install’ and when asked to authorize, click the ‘Authorize’ button. Refresh the page and now you should see an new menu item called ‘Geo’.
Back to the data. Notice that the data has only the name of the district. The golden rule of geocoding is this: the descriptive the address, the precise the coordinates. For instance, an address like MG Road, Bangalore, Karnataka, India will have a better accuracy than MG Road, Bangalore. Since we know that all the district names that we see in the CSV are in Karnataka, we will add two more columns with the state and country name.
Now that the data is ready, let us geocode it. Select the district, state and country columns. Click Geo -> Geocode Addresses. The plugin will prompt you to select any geocoder API service – MapQuest or Yahoo! The accuracy of both the geocoders are pretty good and you might want to see what works best for your region and the data structure that you have. We will go with MapQuest for now. Just leave the API Key field blank and click ‘Geocode’. This will immediately add new columns for the coordinates and accuracy measure. When the geocoding is done, inspect the data and see if there are rows without coordinates. These rows have to be manually geocoded. I use OpenStreetMap to search for these locations manually and add the coordinates to the spreadsheet.
Now we have geocoded the spreadsheet. Great! Here’s a preview of few of the columns:
One of the important feature that the Geo plugin offers is to export the data to GeoJSON. We recommend that you use GeoJSON for mapping or even visualizing this data on the web. To export the data, click Geo -> Export to GeoJSON and save the file.
Verifying the GeoJSON
We will now use QGIS, an open source, geospatial data management software. QGIS is at the heart of any cartogapher. If you don’t have QGIS installed already, download the binary from the website and set it up. If you have question, let us know in the comments. Let us open QGIS and import the GeoJSON file as a layer. Click on the ‘Add Vector Layer’ icon.
In the file browser menu, change the format to GeoJSON and open the file. You should now see several coloured dots in QGIS.
You can now use your data to make maps!
Making a map
We will quickly make a map and introduce a very important feature in QGIS. Spatially speaking, our data is now a set of points. But we know that each of the dots correspond to a district. So what if we can have a map of all the districts in Karnataka with the data represented somehow? Great. For this, we need the district boundaries which can be downloaded from GADM. We extracted the boundaries from the country shapefile and you can download it here. Extract the archive and open it in QGIS. You should see a map like this.
Now that we have both the layers in QGIS, let us merge them together. Click Vector -> Data Management Tools -> Join attributes by location. Select the target layer as the district layer and the join vector layer as the GeoJSON layer. Choose a shapefile to save this merged data and click OK.
We’ve the map data. Import this new layer, right click on it in the QGIS layer pane and click ‘Open Attribute Table’. You should see the result data merged to the district boundary data. Yay! In the next post, we will discuss how to use this shapefile and make a map to show the data.
(SSLC data from the Karnataka Learning Partnership.)