It has been a while since we started writing in a consistent pace. But somehow, I see that happening now. Today, we will see how to organize and align your data so that you can make a map or two out of it.
We often deal with data in CSV formats, which potentially can be visualized as a map. Let’s start with a sample file.
The table above shows the first few rows from a CSV file containing SSLC results in Karanataka for the year 2012. You can download the complete file here. The contents of the file and what each row means is very evident from the column headers.
The column of interest for you right now should be ‘district’. We will now use this column to make a map from this data. The process of converting an address or part of an address to a geographic coordinate is called geocoding. We will geocode this data to find the latitude and longitude of the districts.
There are several ways of geocoding data – from free and easy APIs to comprehensive as well as expensive ones. Two of our favourites are: Batch Geocode and the MapBox Google Docs Geo plugin. We will use the second one for this exercise.
I recently rewrote the maps portal for the Karnataka Learning Partnership. The map is an important part of our project, action and process because it serves as the pivot point of navigation. I will quickly talk about the data and tools before we discuss the design aspects.
We have a fairly large dataset of schools in Karnataka. The name of the school, location, number of girls and boys etc. in a database. Fortunately, the data was clean and properly stored in a PostgreSQL database with PostGIS extensions. Most of my task was to modify the API to throw GeoJSON to the client using the ST_AsGeoJSON function and export the data.
We used the amazing Leaflet.js library and a wide range of plugins. Most of the UI elements are from Twitter’s Bootstrap. I cannot say that Leaflet and Bootstrap works well all the time, but in case you want to add something on the map, make sure that you use extend leaflet’s control layer. For instance, see how we added the Stop Drawing control.
We made several design decisions mostly inspired by the series of blog posts by Brian Timoney.
I was employed as a spatial data and cartographic consultant on a project to analyse specific agricultural commodities and Agricultural Produce Marketing Committees (APMCs) in the Indian states of Karnataka and Madhya Pradesh. The final product was a set of maps for various publications, as well as the clean datasets themselves.
Agricultural market datasets for the states of Karnataka and Madhya Pradesh were obtained for the purposes of spatial visualisation; these contained information on wheat procurement in Madhya Pradesh (2008 – 2012), tuar production in Karnataka (2007 – 2009) and the locations and categories of APMCs in both these states. Some of the data was linked to district names, while the rest was geocoded using a free online geocoding service. I used Quantum GIS, TextEdit and Microsoft Excel extensively for this project; Excel and TextEdit are invaluable when processing CSV files, and QGIS is where all the actual mapping itself takes place.
The actual process itself involved lots of data-cleaning and a little bit of mapping. First, for the geocoding, I ran the column containing the village names through the geocoder thrice; at each repetition, I tweaked the names a little more to get more accurate coordinate results. I then had to similarly tweak the district names to get them to match up with my source shapefiles; fixing bad spellings can be a LOT of work. In its entirity, this was a tedious process that involved organising, cleaning and validating four distinct datasets with both automated and manual operations. However, the final products were datasets that were clean, had accurate spatial locations and could easily be used to produce analytically valuable maps.