We often find ourselves choosing between various data formats while dealing with spatial data. Consider this (not-so) hypothetical example: your data collection department passed on a bunch of KML files, but your analysts insist on SHP files, and your web team is very particular about its GeoJSON. If this sounds familiar, you’re reading the right post; we will quickly run through some of the popular vector and raster data formats you should care about and discuss some of the ways to convert data between these formats.
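To make the conversion problem concrete, here is a minimal Python sketch that turns the point placemarks of a KML document into a GeoJSON FeatureCollection using only the standard library. It assumes the KML 2.2 namespace and handles points only. Lines, polygons, styles and extended data are ignored, so treat it as an illustration of what a conversion involves rather than a general-purpose converter. The function name `kml_points_to_geojson` and the sample document are hypothetical.

```python
import json
import xml.etree.ElementTree as ET

# Assumed KML 2.2 namespace; files using another namespace URI will not match.
KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}


def kml_points_to_geojson(kml_text: str) -> dict:
    """Convert every Placemark containing a <Point> into a GeoJSON Point feature.

    Non-point placemarks (lines, polygons) are skipped in this sketch.
    """
    root = ET.fromstring(kml_text)
    features = []
    for placemark in root.iter(f"{{{KML_NS['kml']}}}Placemark"):
        coords = placemark.find("kml:Point/kml:coordinates", KML_NS)
        if coords is None or not coords.text:
            continue  # not a point placemark
        # KML stores "lon,lat[,alt]"; GeoJSON also expects [lon, lat] order.
        lon, lat = (float(v) for v in coords.text.strip().split(",")[:2])
        name = placemark.findtext("kml:name", default="", namespaces=KML_NS)
        features.append({
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"name": name},
        })
    return {"type": "FeatureCollection", "features": features}


if __name__ == "__main__":
    # Hypothetical single-placemark document for illustration only.
    sample = (
        '<kml xmlns="http://www.opengis.net/kml/2.2"><Document>'
        "<Placemark><name>Sample</name>"
        "<Point><coordinates>10.0,20.0,0</coordinates></Point>"
        "</Placemark></Document></kml>"
    )
    print(json.dumps(kml_points_to_geojson(sample), indent=2))
```

Because both formats list longitude before latitude, no axis swap is needed here; the real work in a full converter lies in handling the other geometry types and carrying attributes across.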
The shapefile, introduced by Esri, is perhaps the most popular spatial data format.
It is developed and regulated by Esri as a (mostly) open specification for data interoperability among Esri and other GIS software products. – Wikipedia
While Esri still has the right to change the format if and when it chooses to, the shapefile is otherwise open and highly interoperable. Shapefiles can store all the commonly used spatial geometries (points, lines, polygons) along with the attributes that describe these features. Unlike most other vector formats, a shapefile comes as a set of three or more files: the mandatory .shp, .shx and .dbf files, and the optional .prj file. The .shp file holds the actual geometries, the .shx file is an index that lets you ‘seek’ to individual features in the shapefile, the .dbf file stores the attributes, and the .prj file specifies the projection in which the geometries are stored.
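To see how the component files fit together, here is a small standard-library sketch that decodes the fixed 100-byte header at the start of a .shp file (the .shx index begins with a header of the same layout) and checks whether the mandatory companion files sit next to it. The byte offsets and shape-type codes follow Esri's published shapefile technical description; the function names are hypothetical, and the sidecar check assumes lowercase extensions, which not every dataset uses.

```python
import struct
from pathlib import Path

# Shape type codes from Esri's shapefile technical description.
SHAPE_TYPES = {
    0: "Null", 1: "Point", 3: "PolyLine", 5: "Polygon", 8: "MultiPoint",
    11: "PointZ", 13: "PolyLineZ", 15: "PolygonZ", 18: "MultiPointZ",
    21: "PointM", 23: "PolyLineM", 25: "PolygonM", 28: "MultiPointM",
    31: "MultiPatch",
}


def read_shp_header(data: bytes) -> dict:
    """Decode the 100-byte header at the start of a .shp (or .shx) file."""
    if len(data) < 100:
        raise ValueError("a shapefile header is 100 bytes long")
    # Bytes 0-3: file code 9994, stored big-endian.
    (file_code,) = struct.unpack(">i", data[0:4])
    if file_code != 9994:
        raise ValueError(f"not a shapefile (file code {file_code})")
    # Bytes 24-27: total file length in 16-bit words, big-endian.
    (length_words,) = struct.unpack(">i", data[24:28])
    # Bytes 28-35: version (1000) and shape type, little-endian.
    version, shape_type = struct.unpack("<ii", data[28:36])
    # Bytes 36-67: Xmin, Ymin, Xmax, Ymax as little-endian doubles.
    bbox = struct.unpack("<4d", data[36:68])
    return {
        "file_length_bytes": length_words * 2,
        "version": version,
        "shape_type": SHAPE_TYPES.get(shape_type, f"Unknown ({shape_type})"),
        "bbox": bbox,
    }


def missing_components(shp_path: str) -> list[str]:
    """Return the mandatory extensions (.shp, .shx, .dbf) not found on disk.

    Assumes lowercase extensions; datasets using .SHP/.SHX/.DBF need extra handling.
    """
    base = Path(shp_path)
    return [ext for ext in (".shp", ".shx", ".dbf")
            if not base.with_suffix(ext).exists()]
```

For a hypothetical `roads.shp`, reading the first 100 bytes in binary mode and passing them to `read_shp_header` reports the geometry type and bounding box, while `missing_components("roads.shp")` flags an incomplete set of files before any other tool complains.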
It has taken us a while to settle into a consistent writing pace, but we finally seem to be getting there. Today, we will see how to organize and align your data so that you can make a map or two out of it.
We often deal with data in CSV format that could potentially be visualized on a map. Let’s start with a sample file.
The table above shows the first few rows of a CSV file containing SSLC results in Karnataka for the year 2012. You can download the complete file here. The contents of the file, and what each row means, are evident from the column headers.
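Before geocoding, it helps to organize the rows around the column we will map. The sketch below uses Python's `csv` module to group rows by district, collapsing stray whitespace so that " Mysore " and "Mysore" land in the same group. It assumes only that the file has a `district` column; the function name `rows_by_district` is hypothetical and every other column is passed through untouched.

```python
import csv
from collections import defaultdict
from typing import Iterable


def rows_by_district(lines: Iterable[str], column: str = "district") -> dict[str, list[dict]]:
    """Group CSV rows by the value in `column`, normalizing whitespace in the key."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for row in csv.DictReader(lines):
        # Collapse leading, trailing and repeated spaces so that
        # inconsistently typed district names share one group.
        key = " ".join(row[column].split())
        groups[key].append(row)
    return dict(groups)
```

Passing an open file object (opened with `newline=""`) works as well as a list of lines, since `csv.DictReader` accepts any iterable of strings. The number of keys in the result is also the number of distinct places you will actually need to geocode.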
The column of interest right now is ‘district’, which we will use to place this data on a map. The process of converting an address, or part of an address, into geographic coordinates is called geocoding. We will geocode the district names to find their latitude and longitude.
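Whichever geocoding service you choose, the bookkeeping around it looks roughly the same. The sketch below adds `latitude` and `longitude` columns to each row, calling a geocoder only once per distinct district and caching the answer. The `geocode` argument is a hypothetical stand-in for your service: any function that takes a query string and returns `(lat, lon)` or `None`. Appending ", Karnataka, India" to each query is an assumption meant to reduce ambiguous matches.

```python
from typing import Callable, Iterable, Optional

LatLon = tuple[float, float]


def geocode_rows(
    rows: Iterable[dict],
    geocode: Callable[[str], Optional[LatLon]],
    column: str = "district",
    suffix: str = ", Karnataka, India",
) -> list[dict]:
    """Return copies of `rows` with latitude/longitude added from `geocode`."""
    cache: dict[str, Optional[LatLon]] = {}
    out = []
    for row in rows:
        place = row[column].strip()
        if place not in cache:
            # Each distinct district is sent to the geocoder only once.
            cache[place] = geocode(place + suffix)
        coords = cache[place]
        # Unmatched places get empty cells so they are easy to spot and fix by hand.
        lat, lon = coords if coords is not None else ("", "")
        out.append({**row, "latitude": lat, "longitude": lon})
    return out
```

Caching matters here because the file repeats each district across many rows; without it, a rate-limited or paid geocoding API would be called far more often than necessary.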
There are several ways to geocode data, ranging from free, easy-to-use APIs to comprehensive but expensive ones. Two of our favourites are Batch Geocode and the MapBox Google Docs Geo plugin; we will use the latter for this exercise.