It has been a while since we wrote at a consistent pace, but somehow I see that happening now. Today, we will see how to organize and align your data so that you can make a map or two out of it.

We often deal with data in CSV format that could potentially be visualized as a map. Let’s start with a sample file.

code  district  boys_appeared  girls_appeared  total_appeared  boys_passed  girls_passed  total_passed  pass_%  rank
GA    UDUPI     8013           8058            16071           6852         7537          14389         89.53   1
PA    SIRSI     4582           4633            9215            3955         4183          8138          88.31   2
LL    HASSAN    11783          11968           23751           9722         10685         20407         85.92   3
DD    TUMKUR    12312          11085           23397           10305        9780          20085         85.84   4

The table above shows the first few rows from a CSV file containing SSLC results in Karnataka for the year 2012. You can download the complete file here. The contents of the file and the meaning of each row are evident from the column headers.

The column of interest for you right now should be ‘district’. We will now use this column to make a map from this data. The process of converting an address or part of an address to a geographic coordinate is called geocoding. We will geocode this data to find the latitude and longitude of the districts.

There are several ways of geocoding data, from free and easy APIs to comprehensive but expensive ones. Two of our favourites are Batch Geocode and the MapBox Google Docs Geo plugin. We will use the second one for this exercise.

Geocoding the CSV on Google Drive

We first have to upload the file to Google Drive. This is very straightforward: just start by clicking the ‘Upload’ icon. Before uploading, make sure that you have turned on ‘Conversion’. This converts the CSV to the Google Docs format, much like opening the CSV in a spreadsheet application.

Upload to Google Drive

After uploading, open the spreadsheet in Google Drive. Now let us enable the Geo plugin. Go to Tools -> Script Gallery and search for ‘geo’. From the list of search results, select ‘Geo by MapBox’. Click ‘Install’ and, when asked to authorize, click the ‘Authorize’ button. Refresh the page and you should now see a new menu item called ‘Geo’.

Enable Geo Plugin

Back to the data. Notice that the data has only the name of the district. The golden rule of geocoding is this: the more descriptive the address, the more precise the coordinates. For instance, an address like MG Road, Bangalore, Karnataka, India will geocode more accurately than MG Road, Bangalore. Since we know that all the district names in the CSV are in Karnataka, we will add two more columns with the state and country names.

code  district  state      country  boys_appeared  girls_appeared  total_appeared  boys_passed  girls_passed  total_passed  pass_%  rank
GA    UDUPI     Karnataka  India    8013           8058            16071           6852         7537          14389         89.53   1
PA    SIRSI     Karnataka  India    4582           4633            9215            3955         4183          8138          88.31   2
LL    HASSAN    Karnataka  India    11783          11968           23751           9722         10685         20407         85.92   3
DD    TUMKUR    Karnataka  India    12312          11085           23397           10305        9780          20085         85.84   4
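
If you would rather add these two columns with a script than by hand, here is a minimal sketch in Python. It assumes a comma-separated file with a `district` column as in the sample; the function name `add_region_columns` is our own and is not part of the Geo plugin.

```python
import csv
import io


def add_region_columns(csv_text, state="Karnataka", country="India"):
    """Insert constant 'state' and 'country' columns right after 'district'."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fields = []
    for name in reader.fieldnames:
        fields.append(name)
        if name == "district":
            fields.extend(["state", "country"])
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Every row gets the same state and country (an assumption that
        # holds for this file, where all districts are in Karnataka).
        row["state"] = state
        row["country"] = country
        writer.writerow(row)
    return out.getvalue()


# Shortened sample with only three of the original columns.
sample = "code,district,rank\nGA,UDUPI,1\nPA,SIRSI,2\n"
print(add_region_columns(sample))
```

The sketch works on text in memory so that it is easy to try; for a real file you would read the text from disk and write the result back out.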

Now that the data is ready, let us geocode it. Select the district, state and country columns. Click Geo -> Geocode Addresses. The plugin will prompt you to select a geocoder API service: MapQuest or Yahoo!. The accuracy of both geocoders is pretty good, and you might want to see which works best for your region and the data structure that you have. We will go with MapQuest for now. Just leave the API Key field blank and click ‘Geocode’. This will immediately add new columns for the coordinates and an accuracy measure. When the geocoding is done, inspect the data and see if there are rows without coordinates. These rows have to be geocoded manually. I use OpenStreetMap to search for these locations and add the coordinates to the spreadsheet by hand.

Now we have geocoded the spreadsheet. Great! Here’s a preview of a few of the columns:

code  district  state      country  longitude      latitude
GA    UDUPI     Karnataka  India    74.7548510231  13.48048385
PA    SIRSI     Karnataka  India    74.8350192     14.6176591
LL    HASSAN    Karnataka  India    76.1668778061  13.02376815
DD    TUMKUR    Karnataka  India    76.8748572157  13.41179805

One of the important features that the Geo plugin offers is exporting the data to GeoJSON. We recommend that you use GeoJSON for mapping or even visualizing this data on the web. To export the data, click Geo -> Export to GeoJSON and save the file.
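
To give an idea of what a GeoJSON file of points looks like, here is a small Python sketch that builds one from rows like those in the preview above. This is not the plugin's code, and the plugin's output may differ in its details; the function name `rows_to_geojson` is ours.

```python
import json


def rows_to_geojson(rows):
    """Turn dicts with 'longitude' and 'latitude' keys into a GeoJSON FeatureCollection."""
    features = []
    for row in rows:
        # Everything except the coordinates becomes a property of the feature.
        props = {k: v for k, v in row.items() if k not in ("longitude", "latitude")}
        features.append({
            "type": "Feature",
            # GeoJSON stores coordinates as [longitude, latitude], in that order.
            "geometry": {
                "type": "Point",
                "coordinates": [float(row["longitude"]), float(row["latitude"])],
            },
            "properties": props,
        })
    return {"type": "FeatureCollection", "features": features}


rows = [
    {"code": "GA", "district": "UDUPI", "longitude": "74.7548510231", "latitude": "13.48048385"},
    {"code": "PA", "district": "SIRSI", "longitude": "74.8350192", "latitude": "14.6176591"},
]
print(json.dumps(rows_to_geojson(rows), indent=2))
```

Note the coordinate order: longitude comes first, which is an easy thing to get wrong when a point lands in the wrong place on the map.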

Verifying the GeoJSON

We will now use QGIS, an open source geospatial data management application. QGIS is at the heart of any cartographer’s toolkit. If you don’t have QGIS installed already, download the binary from the website and set it up. If you have questions, let us know in the comments. Let us open QGIS and import the GeoJSON file as a layer. Click on the ‘Add Vector Layer’ icon.

Add Vector Layer

In the file browser menu, change the format to GeoJSON and open the file. You should now see several coloured dots in QGIS.

QGIS Point Layer

You can now use your data to make maps!

Making a map

We will quickly make a map and introduce a very important feature in QGIS. Spatially speaking, our data is now a set of points. But we know that each of the dots corresponds to a district. So what if we could have a map of all the districts in Karnataka with the data represented on it somehow? For this, we need the district boundaries, which can be downloaded from GADM. We extracted the boundaries from the country shapefile and you can download them here. Extract the archive and open it in QGIS. You should see a map like this.

qgis-map

Now that we have both the layers in QGIS, let us merge them together. Click Vector -> Data Management Tools -> Join attributes by location. Select the district layer as the target layer and the GeoJSON layer as the join vector layer. Choose a shapefile to save this merged data and click OK.
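
If you are curious what a join by location does conceptually, the idea can be sketched in a few lines of Python: for each district polygon, find the point that falls inside it and copy the point's attributes over. QGIS uses its own geometry engine for this, so treat the following only as an illustration; the function names and the toy square "districts" are made up.

```python
def point_in_polygon(lon, lat, ring):
    """Ray-casting test: is (lon, lat) inside the polygon ring (a list of (lon, lat))?"""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does a horizontal ray going right from the point cross this edge?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def join_by_location(polygons, points):
    """Attach the attributes of the first contained point to each polygon."""
    joined = []
    for poly in polygons:
        attrs = dict(poly["properties"])
        for pt in points:
            if point_in_polygon(pt["lon"], pt["lat"], poly["ring"]):
                attrs.update(pt["properties"])
                break  # keep the first matching point only
        joined.append(attrs)
    return joined


# Toy data: two square "districts" and two result points (not real boundaries).
districts = [
    {"properties": {"name": "A"}, "ring": [(0, 0), (2, 0), (2, 2), (0, 2)]},
    {"properties": {"name": "B"}, "ring": [(2, 0), (4, 0), (4, 2), (2, 2)]},
]
results = [
    {"lon": 1.0, "lat": 1.0, "properties": {"district": "UDUPI", "rank": 1}},
    {"lon": 3.0, "lat": 1.0, "properties": {"district": "SIRSI", "rank": 2}},
]
print(join_by_location(districts, results))
```

A polygon with no point inside it simply keeps its own attributes, which is a quick way to spot districts that were geocoded to the wrong place.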

qgis-join

We now have the map data. Import this new layer, right click on it in the QGIS layer pane and click ‘Open Attribute Table’. You should see the result data merged with the district boundary data. Yay! In the next post, we will discuss how to use this shapefile and make a map to show the data.

(SSLC data from the Karnataka Learning Partnership.)

I recently rewrote the maps portal for the Karnataka Learning Partnership. The map is an important part of our project, action and process because it serves as the pivot point of navigation. I will quickly talk about the data and tools before we discuss the design aspects.

We have a fairly large dataset of schools in Karnataka: the name of each school, its location, the number of girls and boys, and so on. Fortunately, the data was clean and properly stored in a PostgreSQL database with the PostGIS extension. Most of my task was to modify the API to serve GeoJSON to the client using the ST_AsGeoJSON function, and to export the data.

We used the amazing Leaflet.js library and a wide range of plugins. Most of the UI elements are from Twitter’s Bootstrap. I cannot say that Leaflet and Bootstrap work well together all the time, but in case you want to add something on the map, make sure that you extend Leaflet’s control class. For instance, see how we added the Stop Drawing control.
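
As a rough illustration of that step: ST_AsGeoJSON returns each geometry as a JSON string, which the API then has to wrap into features. The sketch below is in Python purely for illustration; the table and column names in the query are hypothetical, and the rows are a stand-in for what a database cursor would return.

```python
import json

# Hypothetical table and column names; the real schema is not shown in this post.
QUERY = "SELECT id, name, ST_AsGeoJSON(geom) FROM schools;"


def rows_to_feature_collection(rows):
    """rows: iterable of (id, name, geometry_json) tuples, as the query above would return."""
    features = [
        {
            "type": "Feature",
            "id": school_id,
            # ST_AsGeoJSON gives the geometry as a JSON string, so parse it first.
            "geometry": json.loads(geometry_json),
            "properties": {"name": name},
        }
        for school_id, name, geometry_json in rows
    ]
    return {"type": "FeatureCollection", "features": features}


# Stand-in for rows fetched from the database (a made-up school).
fake_rows = [(1, "Example School", '{"type":"Point","coordinates":[77.59,12.97]}')]
print(json.dumps(rows_to_feature_collection(fake_rows)))
```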

We made several design decisions mostly inspired by the series of blog posts by Brian Timoney.

popup

Show only the required information – depending on the zoom level, we show the user the information relevant at that level. For instance, the district layer is hidden at zoom levels beyond 8, and clusters and projects are not shown beyond 10. At the same time, the user is free to turn these layers on from the layer control on the right side. In case you are curious how to do this, here’s the bit of code that does this trick.
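
The portal's own implementation is JavaScript on top of Leaflet and is not reproduced here. The rule itself is simple, though, and can be sketched in Python as follows. The thresholds come from the paragraph above; the layer names and the `forced_on` parameter are our own naming for illustration.

```python
# Thresholds taken from the text: districts are hidden beyond zoom 8,
# clusters and projects are hidden beyond zoom 10.
MAX_ZOOM = {"district": 8, "cluster": 10, "project": 10}


def visible_layers(zoom, forced_on=()):
    """Return the layers to show at a zoom level.

    forced_on lists layers the user switched on manually from the layer control.
    """
    shown = {name for name, max_zoom in MAX_ZOOM.items() if zoom <= max_zoom}
    return sorted(shown | set(forced_on))


print(visible_layers(7))
print(visible_layers(9))
print(visible_layers(12, forced_on=["district"]))
```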

clustering

Clustered markers – the markers are clustered based on location. This makes the map much more intuitive than laying out all the markers outright. There are several performance aspects to consider when doing this, though. Clustering gives the user a bird’s-eye view of how the data is spread over the locations.
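
To show the general idea of clustering by location, here is a deliberately simple grid-based sketch in Python. It is a stand-in for illustration and not the algorithm used on the portal: map clustering plugins usually work per zoom level and are considerably smarter. The function name and the cell size are assumptions.

```python
from collections import defaultdict


def grid_cluster(points, cell_size):
    """Group (lon, lat) points into square cells of cell_size degrees.

    Returns a list of (centroid_lon, centroid_lat, count) tuples,
    one per non-empty cell.
    """
    cells = defaultdict(list)
    for lon, lat in points:
        key = (int(lon // cell_size), int(lat // cell_size))
        cells[key].append((lon, lat))
    clusters = []
    for members in cells.values():
        n = len(members)
        centroid_lon = sum(p[0] for p in members) / n
        centroid_lat = sum(p[1] for p in members) / n
        clusters.append((centroid_lon, centroid_lat, n))
    return clusters


# Made-up points: two close together and one far away.
points = [(77.1, 12.1), (77.3, 12.3), (75.0, 14.0)]
for cluster in grid_cluster(points, cell_size=1.0):
    print(cluster)
```

Each cluster can then be drawn as one marker labelled with its count, instead of drawing every point.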

search

Search – a very easy-to-use search bar is provided. Search is absolutely the best feature that any maps portal can offer. It helps users quickly find their location of interest rather than panning the map around. In a way, search takes the user to the exact information that they are looking for.

filter

Filter – suppose the user wants to find one particular school in an area well known to them. The filter tool on the portal does just this. It helps the user filter the data according to different levels, and the map changes its view whenever the user selects an attribute. You are more than welcome to try this out yourself. Just click the filter icon on the right side.

Bounding circle – we’ve often seen our users wanting to find schools within a particular distance from their point of interest. We incorporated the Bounding circle tool, which lets the user draw a circle on the map and then loads all the schools in that region. Moreover, it also lists all these schools in a drop-down on the left side of the map.
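
The filtering behind such a tool boils down to a distance check against the centre of the circle. Here is a minimal sketch in Python using the haversine formula. It is an illustration under our own assumptions (the function names and the sample schools are made up) and not the portal's actual code.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance between two lon/lat points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def schools_within(schools, center_lon, center_lat, radius_km):
    """Keep only the schools whose distance from the centre is at most radius_km."""
    return [
        s for s in schools
        if haversine_km(center_lon, center_lat, s["lon"], s["lat"]) <= radius_km
    ]


# Made-up schools: one close to the centre, one far away.
schools = [
    {"name": "School A", "lon": 77.60, "lat": 12.97},
    {"name": "School B", "lon": 76.64, "lat": 12.30},
]
print(schools_within(schools, center_lon=77.59, center_lat=12.97, radius_km=5))
```

With a spatial database, the same check would normally be pushed down into the query rather than done over every row like this.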

location

Locate the user – upon loading, the map portal asks the user for their location. If it’s available, the map changes its view to that location and loads the data. This gives more context to the volunteers at KLP. If you are starting out to build a map portal and are looking for a real-world example, the maps might be a good place for you to start. Also, don’t forget to check the code!

I was employed as a spatial data and cartographic consultant on a project to analyse specific agricultural commodities and Agricultural Produce Marketing Committees (APMCs) in the Indian states of Karnataka and Madhya Pradesh. The final product was a set of maps for various publications, as well as the clean datasets themselves.

Agricultural market datasets for the states of Karnataka and Madhya Pradesh were obtained for the purposes of spatial visualisation; these contained information on wheat procurement in Madhya Pradesh (2008 – 2012), tuar production in Karnataka (2007 – 2009) and the locations and categories of APMCs in both these states. Some of the data was linked to district names, while the rest was geocoded using a free online geocoding service. I used Quantum GIS, TextEdit and Microsoft Excel extensively for this project; Excel and TextEdit are invaluable when processing CSV files, and QGIS is where all the actual mapping takes place.

The process itself involved lots of data cleaning and a little bit of mapping. First, for the geocoding, I ran the column containing the village names through the geocoder thrice; at each repetition, I tweaked the names a little more to get more accurate coordinate results. I then had to similarly tweak the district names to get them to match up with my source shapefiles; fixing bad spellings can be a LOT of work. In its entirety, this was a tedious process that involved organising, cleaning and validating four distinct datasets with both automated and manual operations. However, the final products were datasets that were clean, had accurate spatial locations and could easily be used to produce analytically valuable maps.
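
Part of the spelling clean-up can be semi-automated. The following Python sketch is not what was done on this project, where the work happened in Excel and TextEdit. It only shows how approximate string matching from the standard library could suggest the closest reference name for each misspelt one. The function name, the cutoff of 0.8 and the sample names are assumptions for illustration.

```python
import difflib


def match_names(raw_names, reference_names, cutoff=0.8):
    """Map each raw name to its closest reference name, or None if nothing is close enough."""
    # Compare in lower case, but return the reference spelling as given.
    lookup = {ref.lower(): ref for ref in reference_names}
    matches = {}
    for raw in raw_names:
        candidates = difflib.get_close_matches(
            raw.strip().lower(), list(lookup), n=1, cutoff=cutoff
        )
        matches[raw] = lookup[candidates[0]] if candidates else None
    return matches


# Reference spellings as they might appear in a shapefile, and some raw variants.
reference = ["Tumkur", "Hassan", "Udupi"]
print(match_names(["TUMKURU", "Hasan", "Mysore"], reference))
```

Names that come back as None still need a human to look at them, and even the suggested matches are worth a quick manual check before joining on them.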

CASI _ Five years of wheat procurement in Madhya Pradesh _ Animated

I’d like to say that processing agricultural data obtained from official Indian sources is a difficult task and involves manual intervention at various stages. However, these are rich datasets which can help create and implement national level agricultural policies. The creation of maps, or cartographic visualisations of these datasets, has a small but influential role to play in informing such policy creation. At a glance, complex spatial data can be digested and used as evidence to follow a certain course of action.

Once these datasets are clean and useable, it is possible to analyse them in multiple ways. For example, these same datasets can be used in association with web resources, such as Google Charts or JavaScript libraries (such as d3.js) to create interactive online maps that can be shared with a far larger audience than was traditionally possible.

EPW _ Five years of wheat procurement in Madhya Pradesh

Part of this work was published in an EPW article earlier this year, available at this link.