We often find ourselves choosing between various data formats when dealing with spatial data. Consider this (not-so) hypothetical example: your data collection department passed on a bunch of KML files, but your analysts insist on SHP files and your web team is very particular about their GeoJSON. If this sounds familiar, you're reading the right post: we will quickly run through some of the popular vector and raster data formats you should care about and discuss some of the ways to convert data between these formats.

Vector

Shapefiles

The shapefile, introduced by Esri, is perhaps the most popular spatial data format.

It is developed and regulated by Esri as a (mostly) open specification for data interoperability among Esri and other GIS software products. – Wikipedia

Although Esri still has the right to change the format when and if it chooses to do so, the format is otherwise open and highly interoperable. Shapefiles can store all the commonly used spatial geometries (points, lines, polygons) along with the attributes that describe these features. Unlike other vector formats, a shapefile comes as a set of three or more files: the mandatory .shp, .shx and .dbf files, and the optional .prj file. The .shp file holds the actual geometries, the .shx file is an index that allows you to 'seek' the features in the shapefile, the .dbf file stores the attributes, and the .prj file specifies the projection the geometries are stored in.
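Since a shapefile is really a bundle of files that share one base name, a quick check before passing one along is to confirm that the sidecar files travel with the .shp. A minimal sketch in Python, assuming all components sit in the same folder with lowercase extensions; `shapefile_parts` is a hypothetical helper, not a library function.

```python
from pathlib import Path

MANDATORY = (".shp", ".shx", ".dbf")
OPTIONAL = (".prj",)

def shapefile_parts(shp_path):
    """Report which shapefile component files exist next to shp_path.

    Assumes every component shares the .shp file's base name and uses a
    lowercase extension (e.g. roads.shp, roads.shx, roads.dbf, roads.prj).
    """
    shp = Path(shp_path)
    # with_suffix swaps only the final extension, so "roads.v2.shp" -> "roads.v2.shx"
    present = {ext: shp.with_suffix(ext).exists() for ext in MANDATORY + OPTIONAL}
    missing = [ext for ext in MANDATORY if not present[ext]]
    return present, missing
```

An empty `missing` list means the mandatory trio is there; a `False` for `.prj` simply means the projection will have to be supplied some other way.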

GPX

GPX (GPS Exchange Format) is the format in which most GPS (Global Positioning System) devices store and export the data they collect. If you are familiar with OpenStreetMap, you will notice that the popular OpenStreetMap editors support GPX, among other formats. GPX can record waypoints (individual points) and tracks (series of points recorded at periodic intervals), and is considered a lightweight, easy-to-handle format.
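GPX is plain XML, so Python's standard library is enough to pull out waypoints and track points. This sketch assumes a GPX 1.1 file (namespace `http://www.topografix.com/GPX/1/1`) and ignores routes, elevations and timestamps; `read_gpx` and the sample coordinates are illustrative.

```python
import xml.etree.ElementTree as ET

# GPX 1.1 namespace; GPX 1.0 files use http://www.topografix.com/GPX/1/0 instead
NS = {"gpx": "http://www.topografix.com/GPX/1/1"}

def read_gpx(text):
    """Return (waypoints, track_points) as lists of (lat, lon) tuples."""
    root = ET.fromstring(text)

    def latlon(el):
        return float(el.get("lat")), float(el.get("lon"))

    waypoints = [latlon(w) for w in root.findall("gpx:wpt", NS)]
    # A track holds one or more segments, each an ordered list of points
    track = [latlon(p) for p in root.findall("gpx:trk/gpx:trkseg/gpx:trkpt", NS)]
    return waypoints, track

sample = """<gpx xmlns="http://www.topografix.com/GPX/1/1" version="1.1" creator="example">
  <wpt lat="12.9716" lon="77.5946"><name>Start</name></wpt>
  <trk><trkseg>
    <trkpt lat="12.9716" lon="77.5946"/>
    <trkpt lat="12.9720" lon="77.5950"/>
  </trkseg></trk>
</gpx>"""

waypoints, track = read_gpx(sample)
```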

KML

Keyhole Markup Language (KML) was developed primarily to work with Google's spatial ecosystem, which includes Google Earth, MapMaker and Google Maps. Those who've used XML will find that KML is simply an XML-based language with its own custom, meaningful tags. Apart from spatial geometries and their textual attributes, KML can also accommodate images and 3D models. KML is often used to display spatial data in the Google spatial ecosystem, and also to export data from it.
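Because KML is XML, the same standard-library approach works for reading point placemarks. A sketch assuming KML 2.2 (namespace `http://www.opengis.net/kml/2.2`) and handling only `Point` geometries; note that KML lists longitude before latitude. `read_kml_points` and the sample placemark are hypothetical.

```python
import xml.etree.ElementTree as ET

KML = "{http://www.opengis.net/kml/2.2}"

def read_kml_points(text):
    """Return (name, lon, lat) for each Placemark that contains a Point."""
    root = ET.fromstring(text)
    points = []
    for placemark in root.iter(KML + "Placemark"):
        coords = placemark.find(f"{KML}Point/{KML}coordinates")
        if coords is None:
            continue  # this sketch skips lines, polygons and other geometries
        # KML coordinates are "lon,lat[,alt]": longitude comes first
        lon, lat = (float(v) for v in coords.text.strip().split(",")[:2])
        points.append((placemark.findtext(KML + "name"), lon, lat))
    return points

sample = """<kml xmlns="http://www.opengis.net/kml/2.2"><Document>
  <Placemark><name>Office</name>
    <Point><coordinates>77.5946,12.9716,0</coordinates></Point>
  </Placemark>
</Document></kml>"""

points = read_kml_points(sample)
```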

GeoJSON

GeoJSON is a fairly new format that is widely popular among web developers. As the name suggests, it is based on JSON (JavaScript Object Notation). It can store spatial geometries as well as their attributes, and is remarkably easy to use when building web applications: GeoJSON files can be parsed and populated easily on the back end and visualised on the front end. Recently, GitHub added a feature that renders any GeoJSON file in your repository as a map. Now you know why your web team wants just GeoJSON!
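Because GeoJSON is ordinary JSON, a feature can be written and read back with nothing more than a standard JSON library. A minimal Python sketch; `point_feature` and the sample values are illustrative. As in KML, positions are written longitude first.

```python
import json

def point_feature(lon, lat, **properties):
    """Build a GeoJSON Feature for a single point.

    GeoJSON positions are [longitude, latitude], in that order.
    """
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": properties,
    }

collection = {
    "type": "FeatureCollection",
    "features": [point_feature(77.5946, 12.9716, name="Office")],
}

text = json.dumps(collection)
# Any JSON parser, in any language, can read this back; no GIS library required
parsed = json.loads(text)
```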

TopoJSON

With the advent of D3.js and other tools, map-based visualisation on the web became very popular, and large GeoJSON files made client-side rendering increasingly processor-intensive. Mike Bostock introduced a solution in the form of TopoJSON: this format eliminates redundancy in the way GeoJSON is encoded and thus drastically reduces file sizes.

TopoJSON is an extension of GeoJSON that encodes topology. Rather than representing geometries discretely, geometries in TopoJSON files are stitched together from shared line segments called arcs.
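To make the idea of shared arcs concrete, here is a small sketch of how a reader might reassemble polygon rings from a TopoJSON topology. It assumes a quantized topology (one with a `transform` holding `scale` and `translate`), where each arc is delta-encoded and a negative arc index `~i` means arc `i` traversed in reverse, as described in the TopoJSON specification. The function names and the two-square example are our own.

```python
def decode_arc(arc, transform):
    """Turn one quantized, delta-encoded arc into absolute coordinates."""
    (sx, sy), (tx, ty) = transform["scale"], transform["translate"]
    x = y = 0
    points = []
    for dx, dy in arc:
        x, y = x + dx, y + dy  # each position is relative to the previous one
        points.append((x * sx + tx, y * sy + ty))
    return points

def ring_from_arcs(indices, arcs, transform):
    """Stitch a ring from arc indices; a negative index ~i means arc i reversed."""
    ring = []
    for i in indices:
        points = decode_arc(arcs[~i if i < 0 else i], transform)
        if i < 0:
            points.reverse()
        # Consecutive arcs share an end point, so drop the duplicate joint
        ring.extend(points if not ring else points[1:])
    return ring

# Two squares side by side (in quantized units); arc 0 is the edge they share
transform = {"scale": [0.5, 0.5], "translate": [100.0, 10.0]}
arcs = [
    [[1, 0], [0, 1]],                    # shared edge
    [[1, 1], [-1, 0], [0, -1], [1, 0]],  # rest of the left square
    [[1, 0], [1, 0], [0, 1], [-1, 0]],   # rest of the right square
]
left = ring_from_arcs([0, 1], arcs, transform)
right = ring_from_arcs([2, -1], arcs, transform)
```

Both rings reuse arc 0, so the shared edge is stored only once; plain GeoJSON would write it out in full for each polygon.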

TopoJSON is worth exploring if you are creating visualisations that will be rendered primarily in a browser.

SVG

Scalable Vector Graphics (SVG) is a popular vector image format supported by most image manipulation applications and by the leading web browsers. It is a fairly robust XML-based format, and libraries like D3.js can render their graphics in it. Maps intended for print are also often published in SVG, owing to its interoperability and versatility.
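Rendering a map to SVG ultimately means turning coordinates into path data. The sketch below makes a simplifying assumption: it scales longitude and latitude linearly into the image (no real map projection) and flips the y axis, because y grows downward in SVG. `polygon_to_svg` and the sample triangle are hypothetical.

```python
def polygon_to_svg(ring, bbox, width=200, height=100):
    """Draw one (lon, lat) ring as an SVG path, stretched to fill the image.

    bbox is (min_x, min_y, max_x, max_y) in the same units as the ring.
    """
    min_x, min_y, max_x, max_y = bbox
    sx = width / (max_x - min_x)
    sy = height / (max_y - min_y)
    # Flip y: the northernmost latitude maps to the top of the image (y = 0)
    pts = [((x - min_x) * sx, (max_y - y) * sy) for x, y in ring]
    d = "M " + " L ".join(f"{px:.1f},{py:.1f}" for px, py in pts) + " Z"
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
        f'<path d="{d}" fill="none" stroke="black"/></svg>'
    )

triangle = [(77.0, 12.0), (79.0, 12.0), (79.0, 13.0), (77.0, 12.0)]
svg = polygon_to_svg(triangle, bbox=(77.0, 12.0, 79.0, 13.0))
```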

Raster

GeoTIFF

GeoTIFF is an extension of the TIFF raster image format that embeds georeferencing information within the image file.

The potential additional information includes map projection, coordinate systems, ellipsoids, datums, and everything else necessary to establish the exact spatial reference for the file.

Digital satellite imagery, aerial photos, elevation models, and scanned maps are often stored in GeoTIFF format.
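GDAL, covered below, exposes a GeoTIFF's georeferencing as a six-number geotransform (returned by `Dataset.GetGeoTransform()` in its Python bindings). Assuming you already have those six numbers, this sketch shows how they map a pixel position to map coordinates; `pixel_to_map` is a hypothetical helper and the values describe a made-up north-up raster with 0.001° pixels.

```python
def pixel_to_map(gt, col, row):
    """Map a pixel position to georeferenced coordinates.

    gt = (x_origin, pixel_width, row_rotation, y_origin, col_rotation, pixel_height),
    following GDAL's geotransform layout. (col, row) = (0, 0) is the outer
    top-left corner of the top-left pixel; add 0.5 to each for pixel centres.
    """
    x = gt[0] + col * gt[1] + row * gt[2]
    y = gt[3] + col * gt[4] + row * gt[5]
    return x, y

# North-up raster: no rotation, and a negative pixel height because rows run southward
gt = (77.0, 0.001, 0.0, 13.0, 0.0, -0.001)
x, y = pixel_to_map(gt, 10, 20)
```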

JPEG

Almost everyone has come across a JPEG file: it is one of the most popular image formats and uses lossy compression. Maps that are scanned or illustrated using a design suite are often found in JPEG format, and are georeferenced before any spatial analysis is conducted. Alternatively, JPEGs can be used as-is for representational purposes.
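One common way to georeference a JPEG is a world file: a small text sidecar (`.jgw` for a `.jpg`) holding six numbers that define an affine transform from pixel to map coordinates. A sketch assuming a well-formed six-line file; the helper names and sample values are hypothetical.

```python
def read_world_file(text):
    """Parse a world file's six lines (A, D, B, E, C, F) into (a, b, c, d, e, f)."""
    a, d, b, e, c, f = (float(line) for line in text.split())
    return a, b, c, d, e, f

def pixel_centre_to_map(params, col, row):
    """Map the centre of pixel (col, row) to map coordinates."""
    a, b, c, d, e, f = params
    # C and F locate the centre of the upper-left pixel, not its outer corner
    return a * col + b * row + c, d * col + e * row + f

# Hypothetical world file: 0.5-unit pixels, no rotation, y decreasing downward
jgw = """0.5
0.0
0.0
-0.5
500000.25
1400000.75
"""
params = read_world_file(jgw)
```

Unlike the GDAL geotransform, which is anchored at the outer corner of the top-left pixel, a world file is anchored at that pixel's centre, so the two differ by half a pixel.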

ASC

In the spatial world, the ASC format usually refers to the ASCII grid file generated by the Esri suite. It is not meant as a format for storing data so much as a way to transfer data from one software application to another.

When an existing raster is output to an ESRI ASCII format raster, the file will begin with header information that defines the properties of the raster such as the cell size, the number of rows and columns, and the coordinates of the origin of the raster. The header information is followed by cell value information specified in space-delimited row-major order, with each row separated by a carriage return.
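The header-plus-values layout just described is simple enough to parse by hand. A minimal Python sketch, assuming the usual header keywords (`ncols`, `nrows`, `xllcorner`/`xllcenter`, `yllcorner`/`yllcenter`, `cellsize` and an optional `NODATA_value`); `read_ascii_grid` is a hypothetical name and does little validation beyond a cell count check.

```python
HEADER_KEYS = {"ncols", "nrows", "xllcorner", "yllcorner",
               "xllcenter", "yllcenter", "cellsize", "nodata_value"}

def read_ascii_grid(text):
    """Parse an Esri ASCII grid into (header, rows); NODATA cells become None."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    header = {}
    # Header lines start with a keyword; the first numeric line starts the data
    while lines and lines[0][0].lower() in HEADER_KEYS:
        key, value = lines.pop(0)[:2]
        header[key.lower()] = float(value)
    ncols, nrows = int(header["ncols"]), int(header["nrows"])
    values = [float(v) for parts in lines for v in parts]
    if len(values) != ncols * nrows:
        raise ValueError(f"expected {ncols * nrows} cells, found {len(values)}")
    nodata = header.get("nodata_value")
    # Row-major order, starting with the northernmost (top) row
    return header, [
        [None if v == nodata else v for v in values[r * ncols:(r + 1) * ncols]]
        for r in range(nrows)
    ]

asc = """ncols 3
nrows 2
xllcorner 100.0
yllcorner 10.0
cellsize 0.5
NODATA_value -9999
1 2 3
4 -9999 6
"""
header, rows = read_ascii_grid(asc)
```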

Converting between data formats

There are several ways to convert data between the formats discussed above. We will quickly run through some of the important ones:

GDAL

The Geospatial Data Abstraction Library (GDAL) is an open source library for reading and writing raster data formats. The GDAL website is a great place to start learning about the supported formats and how to use the commands. GDAL also has bindings for several popular programming languages if you want to use it from your own applications. GDAL comes with a collection of utilities, of which gdalinfo is notable: if you have spatial data in an unknown format, gdalinfo will report the driver, dimensions, coordinate system and other properties of the file to help you identify the data.
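If you would rather script the command-line utilities than use the language bindings, a thin wrapper is often enough. This sketch only builds the argument lists and runs them on request; it assumes the GDAL utilities are installed and on your PATH, and the helper names are hypothetical. `GTiff` and `AAIGrid` (Esri ASCII grid) are GDAL driver short names.

```python
import subprocess

def gdalinfo_cmd(path):
    # gdalinfo reports the driver, size, coordinate system and bands of a raster
    return ["gdalinfo", path]

def gdal_translate_cmd(src, dst, out_format="GTiff"):
    # -of selects the output driver by its short name, e.g. GTiff or AAIGrid
    return ["gdal_translate", "-of", out_format, src, dst]

def run(cmd):
    """Run a GDAL utility; raises CalledProcessError if it fails."""
    subprocess.run(cmd, check=True)
```

For example, `run(gdal_translate_cmd("dem.asc", "dem.tif"))` would convert an ASCII grid into a GeoTIFF, and `run(gdalinfo_cmd("dem.tif"))` would describe the result.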

OGR

OGR is part of the GDAL/OGR suite and focuses on vector data formats. Just like GDAL, OGR comes with a few utilities for data conversion and feature extraction; ogr2ogr in particular is widely used to convert between vector formats.
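A similar wrapper for ogr2ogr can pick the output driver from the destination file's extension. One detail that often trips people up: ogr2ogr takes the destination before the source. The extension-to-driver table covers only the formats in this post and is our own assumption; `-f` and `-t_srs` are standard ogr2ogr options, though not every driver accepts every geometry type. Helper names are hypothetical.

```python
import os
import subprocess

# GDAL/OGR driver short names for the vector formats discussed in this post
DRIVERS = {
    ".shp": "ESRI Shapefile",
    ".geojson": "GeoJSON",
    ".json": "GeoJSON",
    ".kml": "KML",
    ".gpx": "GPX",
}

def ogr2ogr_cmd(src, dst, t_srs=None):
    """Build an ogr2ogr call; note the destination comes BEFORE the source."""
    driver = DRIVERS[os.path.splitext(dst)[1].lower()]
    cmd = ["ogr2ogr", "-f", driver]
    if t_srs:
        cmd += ["-t_srs", t_srs]  # reproject on output, e.g. "EPSG:4326"
    return cmd + [dst, src]

def convert(src, dst, t_srs=None):
    # Requires GDAL/OGR installed with ogr2ogr on PATH
    subprocess.run(ogr2ogr_cmd(src, dst, t_srs), check=True)
```

With this, `convert("survey.kml", "survey.shp")` would turn the data collection team's KML into a shapefile for the analysts, and `convert("survey.shp", "survey.geojson", "EPSG:4326")` would hand the web team WGS 84 GeoJSON.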

OGRE

OGRE is a web client for the ogr2ogr utility that converts data to GeoJSON and also from GeoJSON to shapefile. It is fast and easy to use: you can provide it with a file on your computer or with a remote URL.

GPSBabel

GPSBabel is a free, cross-platform software package that allows most GPS devices to interface with a computer and converts waypoints, tracks and routes between many GPS data formats. The package can be downloaded for most operating systems from the official website.
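GPSBabel also ships a command-line tool, where `-i`/`-f` name the input format and file and `-o`/`-F` the output format and file. A sketch of a KML-to-GPX conversion driven from Python, assuming `gpsbabel` is on your PATH; the helper names and file names are hypothetical.

```python
import subprocess

def gpsbabel_cmd(in_fmt, in_file, out_fmt, out_file):
    # -i/-f: input format and file; -o/-F: output format and file
    return ["gpsbabel", "-i", in_fmt, "-f", in_file, "-o", out_fmt, "-F", out_file]

def run(cmd):
    """Run gpsbabel; raises CalledProcessError if the conversion fails."""
    subprocess.run(cmd, check=True)
```

For example, `run(gpsbabel_cmd("kml", "trip.kml", "gpx", "trip.gpx"))` would convert a KML file into GPX.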

Kml2gpx

As the name suggests, Kml2gpx is an online service to convert data between KML and GPX formats.


In this post, we’ve tried to discuss the major vector and raster formats as well as the ways to convert data between them. If you think we missed something, or need help with figuring out spatial data formats and how to use them, leave us a line in the comments below and we can start a discussion.
