We often find ourselves choosing between various data formats while dealing with spatial data. Consider this (not-so) hypothetical example: your data collection department passed on a bunch of KML files but your analysts insist on SHP files and your web team is very particular about their GeoJSON. If this sounds familiar, you’re reading the right post; we will quickly run through some of the popular vector and raster data formats you should care about and discuss some of the ways to convert data between these formats.
The shapefile, introduced by Esri, is perhaps the most popular spatial data format.
Although Esri retains the right to change the format if and when it chooses, the format is otherwise open and highly interoperable. Shapefiles can store all the commonly used spatial geometries (points, lines, polygons) along with the attributes that describe these features. Unlike most other vector formats, a shapefile comes as a set of three or more files: the mandatory .shp, .shx and .dbf files and the optional .prj file. The .shp file holds the actual geometries, the .shx file is an index that allows you to ‘seek’ the features in the shapefile, the .dbf file stores the attributes, and the .prj file specifies the projection the geometries are stored in.
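Because a shapefile is really several files, a quick sanity check before handing data over can save a round trip. The following is a minimal sketch, assuming the component files sit next to each other and use lower-case extensions; the function names are our own and not part of any library.

```python
from pathlib import Path

# Component files described above: three mandatory, one optional.
REQUIRED = (".shp", ".shx", ".dbf")
OPTIONAL = (".prj",)


def shapefile_parts(shp_path):
    """Report which component files exist alongside the given .shp path."""
    base = Path(shp_path)
    return {ext: base.with_suffix(ext).exists() for ext in REQUIRED + OPTIONAL}


def missing_required(shp_path):
    """Return the mandatory extensions that are absent."""
    parts = shapefile_parts(shp_path)
    return [ext for ext in REQUIRED if not parts[ext]]
```

Calling missing_required("roads.shp") (a hypothetical file name) returns an empty list when the set is complete. A missing .prj is not reported as an error, since that file is optional, but without it the projection has to be known from elsewhere.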
GPX (GPS Exchange Format) is the format in which most GPS (Global Positioning System) devices store and export the data they collect. If you are familiar with OpenStreetMap, you will notice that its editors support GPX, among other formats. GPX can record waypoints (individual points) and tracks (ordered collections of points recorded at periodic intervals), and is considered a lightweight and easy-to-handle format.
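To make the waypoint and track structure concrete, here is a small sketch that reads both from a GPX 1.1 document using only the Python standard library. The sample data and the read_gpx function are made up for illustration; real device exports usually carry more fields (elevation, extensions and so on).

```python
import xml.etree.ElementTree as ET

# GPX 1.1 elements live in this XML namespace.
GPX_NS = {"gpx": "http://www.topografix.com/GPX/1/1"}

# Hypothetical sample: one waypoint and one track with two points.
SAMPLE_GPX = """<gpx version="1.1" creator="example"
     xmlns="http://www.topografix.com/GPX/1/1">
  <wpt lat="12.9716" lon="77.5946"><name>Start</name></wpt>
  <trk>
    <name>Morning walk</name>
    <trkseg>
      <trkpt lat="12.9716" lon="77.5946"><time>2015-01-01T06:00:00Z</time></trkpt>
      <trkpt lat="12.9720" lon="77.5950"><time>2015-01-01T06:00:10Z</time></trkpt>
    </trkseg>
  </trk>
</gpx>
"""


def read_gpx(text):
    """Return (waypoints, tracks) parsed from a GPX string."""
    root = ET.fromstring(text)
    # Each waypoint becomes a (lat, lon, name) tuple.
    waypoints = [
        (
            float(w.get("lat")),
            float(w.get("lon")),
            w.findtext("gpx:name", default="", namespaces=GPX_NS),
        )
        for w in root.findall("gpx:wpt", GPX_NS)
    ]
    # Each track becomes a list of (lat, lon) tuples across its segments.
    tracks = []
    for trk in root.findall("gpx:trk", GPX_NS):
        points = [
            (float(p.get("lat")), float(p.get("lon")))
            for p in trk.findall("gpx:trkseg/gpx:trkpt", GPX_NS)
        ]
        tracks.append(points)
    return waypoints, tracks


waypoints, tracks = read_gpx(SAMPLE_GPX)
```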
Keyhole Markup Language (KML) was developed primarily to work with Google’s spatial ecosystem, which includes Google Earth, MapMaker and Google Maps. Those who’ve used XML will find that KML is simply XML with a set of custom, meaningful tags. Apart from spatial geometries and their textual attributes, KML can also accommodate images and 3D models. KML is often used to display spatial data in the Google spatial ecosystem, and also to export data from it.
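Since KML is XML, any XML parser can read it. The sketch below pulls point placemarks out of a KML 2.2 string with the Python standard library. The sample document and the read_placemarks function are illustrative assumptions, and only Point geometries are handled.

```python
import xml.etree.ElementTree as ET

# KML 2.2 elements live in this XML namespace.
KML_URI = "http://www.opengis.net/kml/2.2"
KML_NS = {"kml": KML_URI}

# Hypothetical sample with a single point placemark.
SAMPLE_KML = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark>
      <name>Office</name>
      <description>Hypothetical sample point</description>
      <Point><coordinates>77.5946,12.9716,0</coordinates></Point>
    </Placemark>
  </Document>
</kml>
"""


def read_placemarks(text):
    """Return a list of dicts for every Placemark that holds a Point."""
    root = ET.fromstring(text)
    placemarks = []
    for pm in root.iter("{%s}Placemark" % KML_URI):
        coords = pm.findtext(
            "kml:Point/kml:coordinates", default="", namespaces=KML_NS
        ).strip()
        if not coords:
            continue  # skip lines, polygons and other geometries
        # KML stores coordinates as lon,lat[,altitude].
        lon, lat = (float(v) for v in coords.split(",")[:2])
        name = pm.findtext("kml:name", default="", namespaces=KML_NS)
        placemarks.append({"name": name, "lon": lon, "lat": lat})
    return placemarks


placemarks = read_placemarks(SAMPLE_KML)
```

Note the coordinate order: KML writes longitude first, whereas GPX stores latitude and longitude as separate attributes, which is an easy thing to trip over when moving data between the two.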
With the advent of D3.js and other tools, map-based visualisation on the web became very popular, and large GeoJSON files made client-side rendering increasingly processor-intensive. Mike Bostock introduced a solution in the form of TopoJSON; this format eliminates the redundancy in the way GeoJSON encodes shared boundaries and thus drastically reduces the size of the files.
TopoJSON is an extension of GeoJSON that encodes topology. Rather than representing geometries discretely, geometries in TopoJSON files are stitched together from shared line segments called arcs.
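To illustrate how geometries are stitched together from arcs, here is a minimal sketch that rebuilds GeoJSON-style polygon rings from a tiny hand-written topology of two adjacent squares. It assumes an unquantized topology, that is, one without a "transform" member, so the arcs hold absolute coordinates; quantized files delta-encode their arcs and need an extra decoding step that is not shown here. The function names are our own.

```python
# Two unit squares side by side. The edge they share is stored once, as arc 0.
TOPOLOGY = {
    "type": "Topology",
    "objects": {
        "squares": {
            "type": "GeometryCollection",
            "geometries": [
                {"type": "Polygon", "arcs": [[0, 1]]},   # left square
                {"type": "Polygon", "arcs": [[-1, 2]]},  # right square
            ],
        }
    },
    "arcs": [
        [[1, 0], [1, 1]],                   # arc 0: shared edge
        [[1, 1], [0, 1], [0, 0], [1, 0]],   # arc 1: rest of the left square
        [[1, 0], [2, 0], [2, 1], [1, 1]],   # arc 2: rest of the right square
    ],
}


def decode_arc(topology, index):
    """Return an arc's points; a negative index i means arc ~i, reversed."""
    arcs = topology["arcs"]
    if index < 0:
        return list(reversed(arcs[~index]))
    return list(arcs[index])


def stitch_ring(topology, arc_indexes):
    """Join arcs end to end into one ring of coordinates."""
    ring = []
    for index in arc_indexes:
        points = decode_arc(topology, index)
        # The first point of each later arc repeats the previous arc's last point.
        ring.extend(points[1:] if ring else points)
    return ring


def polygon_to_geojson(topology, geometry):
    """Convert one TopoJSON Polygon into a GeoJSON-style Polygon dict."""
    rings = [stitch_ring(topology, ring) for ring in geometry["arcs"]]
    return {"type": "Polygon", "coordinates": rings}


left, right = (
    polygon_to_geojson(TOPOLOGY, g)
    for g in TOPOLOGY["objects"]["squares"]["geometries"]
)
```

In the equivalent GeoJSON, the shared edge would be written out twice, once for each square. Here it is stored once and referenced as 0 by the left square and as -1 (arc 0 reversed) by the right one, which is where the size saving comes from.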
TopoJSON is worth exploring if you are creating visualisations that will be rendered primarily in a browser.
Scalable Vector Graphics (SVG) is a popular vector image format supported by most image manipulation applications and by the leading web browsers. It is a fairly robust XML-based format, and libraries like D3.js can render their graphics in it. Maps intended for print are also often published in SVG, owing to its interoperability and versatility.
Digital satellite imagery, aerial photos, elevation models, and scanned maps are often stored in GeoTIFF format.
Almost everyone has come across a JPEG file. JPEG is one of the most popular ways of storing images and employs lossy compression. Maps that are scanned or illustrated using a design suite are often found in JPEG format, and need to be geo-referenced before any spatial analysis is conducted. Alternatively, .jpg files can be used as is for representational purposes.
The ASC format in the spatial world usually refers to the ASCII raster file generated by the Esri suite. It is not considered a format for storing data, but rather a way to transfer data from one software application to another.
When an existing raster is output to an ESRI ASCII format raster, the file will begin with header information that defines the properties of the raster such as the cell size, the number of rows and columns, and the coordinates of the origin of the raster. The header information is followed by cell value information specified in space-delimited row-major order, with each row separated by a carriage return.
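The layout described above is simple enough to read by hand. The following is a minimal sketch of a reader in plain Python, assuming the usual header keywords (ncols, nrows, xllcorner or xllcenter, yllcorner or yllcenter, cellsize and the optional NODATA_value); the sample grid and the function name are made up for illustration.

```python
# Hypothetical 3 x 2 grid with one NODATA cell.
SAMPLE_ASC = """ncols 3
nrows 2
xllcorner 100.0
yllcorner 200.0
cellsize 10.0
NODATA_value -9999
1 2 3
4 -9999 6
"""

# Header keywords, compared in lower case.
HEADER_KEYS = {
    "ncols", "nrows", "xllcorner", "yllcorner",
    "xllcenter", "yllcenter", "cellsize", "nodata_value",
}


def read_ascii_grid(text):
    """Return (header, rows) from the text of an ASCII grid file."""
    header = {}
    values = []
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        key = tokens[0].lower()
        if key in HEADER_KEYS:
            header[key] = float(tokens[1])
        else:
            # Cell values are space-delimited and stored in row-major order.
            values.extend(float(t) for t in tokens)
    ncols, nrows = int(header["ncols"]), int(header["nrows"])
    if len(values) != ncols * nrows:
        raise ValueError("cell count does not match ncols * nrows")
    rows = [values[r * ncols:(r + 1) * ncols] for r in range(nrows)]
    return header, rows


header, rows = read_ascii_grid(SAMPLE_ASC)
```

The first row of values is the top row of the raster, while the origin in the header refers to the lower-left corner, so row 0 here is the northernmost one.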
Converting between data formats
There are several ways to convert data between the formats discussed above. We will quickly run through some of the important ones:
Geospatial Data Abstraction Library (GDAL) is an open source collection of procedures to read and write data in raster formats. The GDAL website is a great place to start and learn about the supported formats and how to use the commands. GDAL also has bindings for several popular programming languages if your applications need to use it. GDAL comes with a collection of utilities, of which gdalinfo is notable. If you have spatial data in an unknown format, gdalinfo will list the features and properties of the file to help you identify the data.
OGR is part of the GDAL/OGR suite and focusses on vector data formats. Just like GDAL, OGR comes with a few utilities for data conversion and feature extraction. The ogr2ogr utility is widely used to convert between vector formats.
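As a rough illustration of the scenario from the introduction, the sketch below wraps ogr2ogr in a small Python helper. It assumes GDAL is installed and ogr2ogr is on your PATH; the file names are hypothetical, and the helper functions are ours rather than part of GDAL. The driver names used ("ESRI Shapefile", "GeoJSON") are the ones OGR uses for those formats.

```python
import shutil
import subprocess


def ogr2ogr_command(src, dst, driver):
    """Build the argument list for a basic format conversion.

    Note that ogr2ogr expects the destination before the source.
    """
    return ["ogr2ogr", "-f", driver, dst, src]


def convert(src, dst, driver):
    """Run the conversion, failing early if ogr2ogr is not installed."""
    if shutil.which("ogr2ogr") is None:
        raise RuntimeError("ogr2ogr not found on PATH; install GDAL first")
    subprocess.run(ogr2ogr_command(src, dst, driver), check=True)


# Hypothetical file names: KML in, shapefile out for the analysts,
# then GeoJSON for the web team.
to_shp = ogr2ogr_command("survey.kml", "survey.shp", "ESRI Shapefile")
to_geojson = ogr2ogr_command("survey.shp", "survey.geojson", "GeoJSON")
```

The same two conversions can of course be typed directly into a terminal; wrapping them in a script mainly helps when you have a whole folder of files to process.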
OGRE is a web client for the ogr2ogr utility that converts data to the GeoJSON format and also from GeoJSON to shapefile. It is fast and easy, as you can provide it with a file on your computer or with a remote URL.
As the name suggests, Kml2gpx is an online service to convert data between KML and GPX formats.
In this post, we’ve tried to discuss the major vector and raster formats as well as the ways to convert data between them. If you think we missed something, or need help with figuring out spatial data formats and how to use them, drop us a line in the comments below and we can start a discussion.