Our tutorials so far have been focused on several aspects of cartography, from data structures to their analysis and representation. Not surprisingly, most of them are aligned to web technologies, and client browsers expect the applications to consume relatively low resources. Spatial data have comparatively higher memory footprints owing to the structure and the amount of information they hold. For instance, the taluk boundary level data for India is 46.3 MB in GeoJSON format. This means that it cannot be used directly in a web project; it needs to be optimised first.

Optimising spatial data essentially translates to simplifying the geometries in the file. Since, in a web context, it is not using it for analysis, a slight difference in the area or shape of corners will not make a huge difference. Users may not even realise that the shapes are simplified, if it is done in just the right way.

To give you an idea of the process, have a look at the following maps of Florida. The first row showcases the original data from the Florida Geographic Data Library, converted to GeoJSON (8.2MB). The second set of images shows the simplification of the geometry (note the sharp edges) in the GeoJSON (now 427KB). This really hasn’t changed the way the map looks on the whole, which is exactly what we need for web representation.

florida_combined florida_optimised_combined

In this article I will quickly look at a few easy methods to simplify geometries.

TopoJSON

TopoJSON, developed by Mike Bostock, is an extension of GeoJSON with encoded Topology.

Rather than representing geometries discretely, geometries in TopoJSON files are stitched together from shared line segments called arcs.

This simplifies the structure of the data by identifying the relationships and storing them in the same file, thus eliminating redundancy. TopoJSON works seamlessly with D3.js and can be integrated with pretty much any other web application.

Simplify using QGIS

The QGIS vector processing suite comes with a tool for simplifying geometries. It employs the popular Ramer–Douglas–Peucker algorithm which reduces the number of points in a curve. You have to select the layer that you want to simplify and pick a tolerance level. The higher the tolerance, the lesser the number of points and the lower the size of the file.

qgis_simplify

PostGIS ST_Simplify

In case you are serving spatial data from a PostgreSQL database through an API to the client-side, PostGIS implements the previously mentioned Ramer-Douglas-Peucker algorithm through the procedure called ST_Simplify. For example, to apply ST_Simplify on a geometry called ‘state’ of id 1, with a tolerance of 0.002 from a table called ‘country, the PostGIS command would be:

SELECT ST_Simplify(state, 0.002) from country where id=1;

These techniques are very essential when you deal with large amounts of spatial data that are required to be rendered in the browser. If you have more ideas or questions, let us know in the comments!

 

 

We often find ourselves choosing between various data formats while dealing with spatial data. Consider this (not-so) hypothetical example: your data collection department passed on a bunch of KML files but your analysts insist on SHP files and your web team is very particular about their GeoJSON. If this sounds familiar, you’re reading the right post; we will quickly run through some of the popular vector and raster data formats you should care about and discuss some of the ways to convert data between these formats.

Vector

Shapefiles

The shapefile is perhaps the most popular spatial data format, introduced by Esri.

It is developed and regulated by Esri as a (mostly) open specification for data interoperability among Esri and other GIS software products. – Wikipedia

Esri still has the right to change the format when and if they choose to do so, it is otherwise open and is highly interoperable. Shapefiles can store all the commonly used spatial geometries (points, lines, polygons) along with the attributes to describe these features. Unlike other vector formats, a shapefile comes as a set of three or more files – the mandatory .shp, .shx, .dbf and the optional .prj file The .shp file holds the actual geometries, the .shx is an index which allows you to ‘seek’ the features in the shapefile, the .dbf file stores the attributes and the .prj file specifies the projection the geometries are stored in.
Continue reading