Hexbins!

Binning is a general term for grouping a dataset of N values into less than N discrete groups. These groups/bins may be spatial, temporal, or otherwise attribute-based. In this post I’m only talking about spatial (long-lat) and 2-dimensional attribute-based (scatterplot) bins. Such binnings may be thought of as 2D histograms. This may make more or less sense after what lies beneath.

If you’re just after that sweet honey that is my code, bear down on my Github repository for this project — hexbin-js.

Rectangular binning

The simplest 2D bin is rectangular. Indeed, for most purposes rectangular bins suffice, and their computational simplicity is a significant advantage.

The above is a shot from a little example I produced on jsFiddle, while learning Mike Bostock’s fantastic D3 JavaScript library for HTML and SVG data-binding and visualization. The example demonstrates the need for and the technique of 2D rectangular binning and aggregation.

Binning can be good for both the users and the creators/developers of static or interactive thematic maps or other visualizations. For the user, showing every single point can lead to cognitive overload, and may even be inaccurate, as overlapping points lead to a misreading of density.

In the above image (from Antaeus Concepts) the data points are represented in black, and due to overlap the true concentration/density distribution is indiscernible from the graphic.

A binned representation may reveal patterns not readily seen in the raw point representation of the data (see Antaeus’ sunflower plot for the same data below). For the developer or cartographer, too, binning can present an advantage, chiefly in efficiency. Back in the day, manual cartographers probably weren’t too keen on drawing 10’s of 1000’s of data points — I’m guessing they’d rather do the cheap bin math so that they could then only draw 10’s or 100’s of uniformly-shaped bins to represent the same data.

Though we modern cartographers have fancy-fangled computers to draw our data points for us, even these beasts hiccup when asked to draw 10,000+ points at once. Hiccups are acceptable in static rendering, but not in real-time apps that employ animation (as even at just 5 frames per second, rendering 1000s of points each frame would prove intensive for even newer home desktops).

So anyway, binned representations can be beneficial for both users and creators. Below I’ll just describe one binning method (hexagon binning, or hexbinning), its implementation, and some examples.

Hexagonal binning

I first encountered hexagonal binning in the sweet 2006 O’Reilly book by Joseph Adler, Baseball Hacks. Adler demonstrates the need for binning by showing a “spraychart” of David Ortiz’s 2003 balls-in-play.

Adler writes,

Clearly, you can see that David Ortiz tends to hit balls more often to right field. Wouldn’t it be nice to have a cleaner way to see this density? We’ll use another visualization technique, called hexagonal binning, to get a clearer picture of where Ortiz’s hits land.

The idea of hexagonal binning is to break a two-dimensional plane into different bins. First, the bins make interlocking hexagons. It is possible to use squares (or interlocking triangles or another shape), but hexagons look “rounder” than squares.

To turn his spraychart into hexbins, Adler used the hexbin package for R developed by Nicholas Lewin-Koh and Martin Maechler (inspired by Dan Carr’s work; more on this below).

Though unfortunately printed smaller in the book than the spraychart, I think the hexbin representation does effectively simplify the data while revealing patterns not easily retrieved from the full spraychart representation.

Hex history and theory

The technique of using interlocking hexagons to aggregate 2-dimensional data was first described in a 1987 paper by four Pacific Northwest Laboratory statisticians (D.B. Carr et al, “Scatterplot Matrix Techniques for Large N”). The authors note, though, that they were inspired by another binning technique — sunflower plots — described a few years earlier by William Cleveland and Robert McGill.

Cleveland and McGill (1984, “The Many Faces of a Scatterplot”) didn’t mention hexagons; indeed they specified that squares be used to bin the data, with each bin then being tranformed into a “sunflower”, with each “petal” representing a datum within the bin:

In their later 1987 paper, D.B. Carr et al suggested that hexagon-bin-based sunflowers represented the data more faithfully than the rectangular-bin version. They cite various reasons for this advantage, but I think it basically comes down to Joseph Adler’s later observation that “hexagons look rounder than squares”. Indeed, a regular tessellation of a 2D surface is not possible with polygons of greater than six sides, making the hexagonal tessellation the most efficient and compact division of 2D data space.

Besides hexbin sunflower plots, D.B. Carr et al note two other possible symbologies to employ once the data have been successfully hex-binned. To use thematic mapping parlance, the methods are proportional symbols and choropleth. These symbologies and other possibilities are discussed below.

Hexbin symbologies

Hexbinning consists of 1) laying a hexagonal grid or lattice atop a 2-dimensional field of data and 2) determining data point counts for each hexagon. This says nothing of the symbolization or representation method that can then be employed to communicate these counts to the graphic’s reader.

Sunflower plots

Sunflowers are likely the most complex symbology for representing hexbins, but I’m covering them first in this section because they actually inspired the hexbinning method itself.

The above screenshot is from the Antaeus statistics project, and hexagonal sunflower plots have also been implemented within the popular Stata statistics package:

Whether square-bin or hexbin-based, these sunflower plots are notable for allowing the user to simultaneously view generalities and retrieve specifics. A higher number of “petals” leads to a darker hexagon, and clusters of such hexagons will reveal the overall trend of the data. At the same time, individual petals can be counted to determine the exact number of data lying within each hexagonal bin.

Proportional symbol

The choropleth and proportional symbologies don’t really need any explanation here, so images should suffice.

Choropleth
Multivariate

In a multivariate hexbin representation, the secondary variable can be anything, though the sum or average of some attribute of the binned points within each hex would be typical.

In the below, a bivariate symbology is shown, though in this case the variable distribution represented by color value/saturation is the same as that represented by size (making this a redundant symbolization).

In addition to size and saturation/value, alpha (opacity) could also be used to represent a hex attribute (see the bottom of this example page for a value-by-alpha representation of density above a Google map).

Implementation

As noted above, hexbins have been implemented in a few statistical packages, including R. But I haven’t seen any other implementations, and certainly not a client-side one. At first I tried to just port the R code, but it made no sense to me, and included many dependencies within R. So I decided I’d have to roll my own. Rolling my own seemed really hard, so luckily I ran into Alex Tingle’s libhex library. Though certainly not designed for hex-binning (I believe it was made with game developers in mind), this great library has all the methods necessary to create a hexagonal lattice data structure. And though written in C++, the author had completed an experimental JavaScript version.

It took some back-and-forth with libhex’s author Alex to figure it out, but the library’s methods make it easy to create a hexagonal lattice or grid of a given size with specified hexagon size. Once created, the Grid.Hex method can be used to determine corresponding hexes for each 2D point in the dataset to be binned. After this is achieved, we have a hexagonal lattice data structure, with each hex in the grid storing the points contained within its data space.

This creates the data structure, and performs the necessary binning routine for each point, but the library wasn’t meant to render anything. I therefore paired it with my favorite data-driven visualization (or document-manipulation) library, D3.js. The result is d3.hexbin.js, which can be used like so:

var hexset = d3.layout.hexbin()
    .xValue( function(d) { return d.x; } )
    .yValue( function(d) { return d.y; } )
    .hexI( hexI )
        ( data );

The result returned by d3.layout.hexbin()hexset in the above — is simply an array of hex objects with important properties — notably data (the binned points for the particular hex) and pointString (a string representing the outline points of the hex). As far as symbolization, once the hexbinning routine is completed, representing the data with multiple symbologies is quite easy with D3. The example posted here shows off the exact same random data represented with dots, choropleth, proportional symbol, bivariate, and value-by-alpha symbologies. The code below is used to render the frequency hexgrid as a choropleth.

d3.select( "div#choroplethHexgrid" )
    .append( "svg:svg" )
    .append( "svg:g" )
        .attr( "class", cbScheme + " stroke-true" )
        .selectAll( "polygon" )
        .data( hexset )
        .enter().append( "svg:polygon" )
            .attr( "class", function(d)
            {
                return 'q'
                    + ( (numClasses-1) - scale(d.data.length))
                    + "-" + numClasses;
            })
            .attr( "points", function(d)
            {
                return d.pointString;
            });

All the code is stored in the hexbin-js Github repository. Please browse the src and tests folders to see more of what’s going on here.

Because I’m a cartographer (or something), I wanted to create a geographic proof-of-concept for this method. I came up with this, combining my custom hexbin layout class with D3.js and Polymaps. The resultant interactive visualization shows an overview + focus view of all Walmarts in the USA. Hexes can be moused over for a focus view.

d3.layout.hexbin requires D3.js, though the dependencies don’t go that deep, so a few quick mods would allow you to use the hexbinning methods with any JavaScript mapping or graphics library. Use as you will. Many thanks to both Alex Tingle and Mike Bostock for answering my questions.

Dymaxion projection in OpenLayers

The great Buckminster Fuller created a series of Dymaxion maps utilizing various forms of his patented Fuller projection in the 1940s and 50s. The most well-known, icosahedral form of the projection (above) was devised in Raleigh, North Carolina, in 1954. Fuller intended the projection to better balance shape and areal distortion, while also eschewing the north-south cultural bias he saw in common projections.

I’ve seen a few code implementations of Bucky’s design over the years, including the Perl scripts Schuyler Erle wrote for his touchstone Mapping Hacks. Erle’s modules were based on Robert Gray’s foundational work in determining the appropriate transformation equations which were then incorporated into C source code. But I hadn’t seen a client-side implementation until I saw this map in the examples section of Jeff Heer and Mike Bostock’s excellent JavaScript visualization framework Protovis. In this post I just show how their JavaScript Dymaxion code can be brought into OpenLayers as a custom projection.

Since Buckminster Fuller, Robert Gray, and Mike Bostock have already done all the hard work, adding the projection to OpenLayers is a cinch. But I think it’s worth showing here because there isn’t much info out there on using custom (that is, outside of PROJ.4) projections in OpenLayers. And I’d love to see more online slippy maps using such experimental projections.

OpenLayers implementation

Most web mapping frameworks only display data in the Web Mercator projection. This is basically because of a decision Google made six or seven years ago and because web mapping platforms have been used more for reference than thematic mapping. OpenLayers is unique in allowing coordinate system transforms from any arbitrary projection to any other. Of course, if you’re loading in Google or Bing tiles, you’ll have to stick to Web Mercator for any overlays; in this case OpenLayers will just be transforming your overlay data from lat/long to Mercator for display. But if you’re using all vector data — from KML, GeoJSON, or many other formats — you can take advantage of OpenLayers’ projection abilities. And this is almost always going to involve the Proj4js library, a port of PROJ.4.

Bjørn Sandvik provides a great introduction to projecting KML data with OpenLayers and Proj4js. Proj4js contains most of your old favorites, like Lambert and Albers and Transverse Mercator. But for custom projections like Dymaxion, OpenLayers includes the OpenLayers.Projection.addTransform method, which I’m using like so:

OpenLayers.Projection.addTransform( "EPSG:4326", "DYMAX",
    function( point )
    {
        var converted = convert_s_t_p( point.x, point.y );
        point.x = converted.x * 150;
        point.y = converted.y * 150;
        return point;
    }
);

EPSG:4326 in the above method represents the standard WGS 84 lat/long coordinate system. “DYMAX” is an arbitrary string that we’ll use whenever we want to apply the Dymaxion transformation. The convert_s_t_p method is from the dymax.js code included with Protovis and does all the heavy lifting of converting lat/long points to Fuller x-y coordinates. Notice I’m only defining the forward transformation — from lat/long to Fuller. The inverse would be quite difficult but is luckily not necessary for projecting geodata onto a Dymaxion map.

To utilize the defined projection, then, we just have to instantiate an OpenLayers.Projection object with our “DYMAX” projection identifier, and include this as an option when we create our OpenLayers map.

var map = new OpenLayers.Map("olmap", {
    projection : new OpenLayers.Projection( "DYMAX" ),
    maxExtent : new OpenLayers.Bounds( 0, 0, 860, 400 ),
    allOverlays : true
} );

Below’s an image linking to an example of the Protovis Dymaxion code being used within OpenLayers. View source on that page to see the very little code that’s driving the example. The triangles and world countries should load right away in the Fuller projection, but the 3000+ world cities may take a bit.

Next

I’d be remiss if I didn’t mention Mike Migurski’s remarkable work on the “Faumaxion” projection, which resulted in an experimental and path-breaking interactive browser that re-orients and re-configures its equilateral triangular coordinate system based on the map center and scale. Ideally something similar would be possible using OpenLayers and the icosahedral geometry of the Dymaxion projection. But that’s way ahead of me and this post. Please see Mr. Migurski’s posts (I, II, III, and IV) for more information on the Dymaxion transformation and Migurski’s reasons for switching to a gnomonic projection thanks in part to its extant and efficient inverse equation.

I just want to see more and more geographic projections available in web mapping frameworks. An argument can certainly be made that no more than a handful of projections are necessary for the vast majority of reference and thematic mapping purposes. But this leaves out certain artistic and political mapping pursuits that may benefit from more experimental projections like the Dymaxion.

Gene Keyes, a Fuller devotee, has declared a similar but earlier octahedral projection (below) by B.J.S. Cahill superior to Fuller’s (h.t. the Map Room). So next I’d like to see about implementing Cahill’s also-patented “butterfly” projection for vector data in JavaScript. Both Fuller and Cahill are in a class of interrupted projections that are 1) especially difficult to implement for vector data because of section crossings and 2) potentially quite useful for mapping global datasets due to minimized shape/area distortion and the horizontalization of the classic world map projection’s north-south orientation.

Much love to the Buckminster Fuller Institute, the current patent-holders of Fuller’s geographic transformations. And again full credit must go to the Protovis people for their client-side implementation of the Dymaxion/Fuller projection algorithm.

Noncontiguous cartograms in OpenLayers and Polymaps

I’m happy to be doing less Flash and more JavaScript development these days. In particular, I’ve been investigating two open-source JavaScript web mapping platforms: one old, OpenLayers, and one new, Polymaps.

OpenLayers has been around a while, but still performs remarkably well as a slippy map framework while allowing easy thematic map customization. Polymaps is brand new (from Stamen so you know it’s going to blow your mind), but is remarkable for enabling web standards-based thematic customization of geographic layers loaded via GeoJSON or KML onto “vector tiles that are rendered with SVG”.

In this post I show how either OpenLayers or Polymaps can be used to create dynamic and customizable noncontiguous cartograms with very little code.

Idea

NY Times noncontiguous cartogram example from 2007

The above is a cartogram of state electoral influence from 2007 by the NY Times

I’ve written about these gals before. The form involves resizing features (like states or countries) relative to the units’ attribute values in a given field (often population). Unlike the more common, contiguous form of the cartogram, noncontiguous cartograms don’t attempt to maintain topology, but are therefore free to maintain shape perfectly and experiment with position, as in the above example.

Here’s an example from Judy Olson’s original 1976 article on this cartogram form.

noncontiguous cartogram from Olson\

See my older post or Olson’s article for more info on the technique and the theory behind it. Olson produced the above image semi-manually using a projector — each state was projected at a precise scaling factor and then traced. Below I show how to do more or less the same thing, but with JavaScript and either OpenLayers or Polymaps.

Implementation

I wanted to be able to load in any polygonal geodata file (supported by the chosen web mapping framework) and resize the features based on any numerical attribute in order to form a noncontiguous cartogram. The advantage of implementing this within a web mapping framework is obviously that additional data layers from various sources can easily be over or underlain.

As a test and proof of concept for both frameworks, I wanted to reproduce Olson’s graphic (above) as best as I fairly easily could. Olson used 1970 Census data to show the number of people aged 65+ by state; here I’m updating it with estimated 2009 data. Specifically, I’ll be loading this Geocommons data layer uploaded last year.

I’ve been working with OpenLayers for about half a year so I implemented cartograms there first.

OpenLayers

Implementing noncontiguous cartograms in OpenLayers is fairly straightforward, thanks to the helpful methods provided by this comprehensive framework. The first step is loading the geodata.

Load geodata

OpenLayers makes loading geodata quite easy; the library can parse WKT, GML, KML, GeoJSON, GeoRSS, etc. For all we’re gonna use the Layer.Vector class. In this case I’ll load in the KML version from Geocommons, and therefore OpenLayer’s Format.KML parser.

var kmlLayer = new OpenLayers.Layer.Vector( "KML", {
    projection : new OpenLayers.Projection("EPSG:4326"),
    strategies: [ new OpenLayers.Strategy.Fixed() ],
    protocol: new OpenLayers.Protocol.HTTP( {
        url: "http://geocommons.com/overlays/55629.kml",
        format: new OpenLayers.Format.KML( {
            extractStyles: false,
            extractAttributes: true,
            maxDepth: 2
        } )
    } ),
    style : {
        'fillColor' : '#dddddd', 'fillOpacity' : 1,
        'strokeColor' : '#666666', 'strokeWidth' : 1
    }
} );

Noncontiguous cartograms don’t require a geographic projection — regardless of the projection of the original linework the features can still be scaled up or down accurately to form the cartogram. So the only reasons for projection are aesthetics and to enhance recognizability of features. I’m unsure of what projection Olson used, but I just went for that old classic, Albers Equal Area. Specifically, I used Proj4js and the following definition:

Proj4js.defs["MY_ALBERS"] = “+proj=aea +lat_1=32 +lat_2=58 +lat_0=45 +lon_0=-97 +x_0=0 +y_0=0 +ellps=WGS84 +datum=WGS84 +units=m +no_defs”;

To apply it, I just create a new OpenLayers.Projection which gets applied to the map when it’s instantiated. Thematicmapping.org has more info on projections and OpenLayers.

Unless you already know the maximum value of whatever attribute you’re mapping, you’ll have to loop through them all once before you can loop through them again to scale. Attributes are accessible via the attributes property of each OpenLayers.Feature.Vector (accessible via the features property of the OpenLayers.Layer.Vector).

Scale features

In order to scale a polygonal feature for a noncontiguous cartogram, we must know:

  1. the feature’s value for the chosen thematic attribute (see above)
  2. the feature’s area and centroid as rendered

    OpenLayers makes these easily accessible. Each feature’s geometry object has getArea and getCentroid methods. As far as I can tell, the getCentroid function returns the true polygonal center of mass, and not just the center of the feature’s bounding box.

  3. the feature’s desired area (in pixels) given the maximum area provided and the feature’s value as a percentage of the layer’s maximum value

    desiredArea = ( value / maxValue ) * maxArea;

  4. finally, the feature’s scale which is just a function of its original and desired area

    desiredScale = Math.sqrt( desiredArea / originalArea );

    This scale is then applied via the resize method of each feature’s geometry.

    feature.geometry.resize( desiredScale, centroid );

Result

Here’s an image from the example you can find on this page. All source can just be accessed from there.

noncontiguous cartogram from Olson redrawn in OpenLayers

Hey, that looks pretty great. Michigan’s kinda off-center, but I believe that’s because the polygon is only defined by one ring of coordinates, though it should be a multipolygon. Perhaps more noticeable is the overlap in the Northeast. In Judy Olson’s original example, states were blown up by a visual projector and then traced. But the projector could be aimed before tracing, thus avoiding overlap. In this case I simply scale the states and keep them at their original centroids.

I could avoid overlap by determining appropriate positioning and setting this within OpenLayers (but overlap would be very difficult to determine and then address dynamically) or by significantly reducing the configurable maximum area allowable on the resultant cartogram (but in order to be sure you’re preventing overlap features would have to be scaled quite small, which would detract from the readability of the cartogram).

Polymaps

Polymaps is a fairly new JavaScript mapping library by Stamen and SimpleGeo. Geocommons recently introduced Polymaps to their online mapping service, creating a quite powerful Flash-free online thematic mapping tool. Creating noncontiguous cartograms in Polymaps was a bit tougher than the process detailed above, just because the library is so light-weight.

Load geodata

The first step is quite easy. As far as vector geodata formats, Polymaps only has built-in support for GeoJSON, though they do provide a KML example that takes advantage of an optional fetch method specified in the GeoJSON layer constructor.

But I’ll just go with GeoJSON for this one. I found out in this post from GeoIQ that I can access a GeoJSON version of features in any Geocommons dataset by going to the “features.json” endpoint. So for my 2009 estimated census dataset I’ll be loading in http://geocommons.com/overlays/55629/features.json?geojson=1 using the following simple method:

var url = "http://geocommons.com/overlays/55629/features.json?geojson=1";                     
map.add(po.geoJson()
    .url(url)
    .on("load", load)
    .tile( false )
    .id("states"));

Note that loading directly from Geocommons only works while developing locally because of cross-domain policy. So in my finished example I end up loading a local version of the JSON.

Polymaps is limited to the Web Mercator projection for display, but we can still produce a passable reproduction of Olson’s original.

As in the OpenLayers example above, unless you already know the maximum value of your attribute, you’ll need to first loop through the features to determine it. In the above code, you can see I’m using the layer’s on method to listen for the “load” event. And in there I can get my features off the event object’s features property. Attributes are stored on each feature’s data.properties property.

Scale features

To scale each feature we must first know it’s value in the chosen attribute (see above). Then we need to determine it’s current area (in pixels) in order to figure the feature’s desired area on the eventual cartogram. OpenLayers provides a convenient method for this but in Polymaps we have to roll our own; for this Mike Bostock (one of the primary authors of Polymaps) was of much help. To calculate the area of each feature I just needed access to the list of projected coordinates (then I could employ the basic technique detailed here). Mr. Bostock pointed me to the pathSegList property of the SVGPathElement interface. The pathSegList exposes a list of path segments with the SVGPathSeg interface. Mike said I could count on these segments being one of types “M” (move to), “L” (line to), or “Z” (end line). With this information I quickly put together a method that should return the projected area of any SVGPathElement that Polymaps may produce.

function getPathArea( segList )
{               
    var area = 0;
    var seg1, seg2;                 
    var nPts = segList.numberOfItems;
     
    // let's see if the last item is a 'Z' (it should be)
    var lastLetter =
        segList.getItem( nPts - 1 ).pathSegTypeAsLetter;
    if ( lastLetter.toLowerCase() == 'z' )
        nPts --;
    var j = nPts - 1;
    segItem_list:
    for ( var i = 0; i < nPts; j=i++ )
    {
        seg1 = segList.getItem( i );
        seg2 = segList.getItem( j );
        area += seg1.x * seg2.y;
        area -= seg1.y * seg2.x;
    }
    area /= 2;
    return Math.abs( area );
}

I could easily create a similar centroid method, but I got lazy and decided to just use the center of each feature’s bounding box (accessible via each feature SVG element’s getBBox method).

The desired area and scale are calculated just as we did above in OpenLayers:

desiredArea = ( value / maxValue ) * maxArea;
desiredScale = Math.sqrt( desiredArea / originalArea );

The scale is then applied via the ‘transform’ attribute of each SVG element; both scale and x-y translation must be defined in the ‘transform’ attribute:

feature.element.setAttribute(
    "transform",
    "scale(" + scale + " " + scale + ")" + " " +             
    "translate(" +
        -( ( centerX * scale - centerX ) / scale ) +
        " " +
        -( ( centerY * scale - centerY ) / scale ) +
    ")"
);
Result

As before, here’s an image captured from the example you can find on this page. You’ll see a bit of the code there, but please ‘view source’ to see all the code and markup.

noncontiguous cartogram from Olson redrawn in OpenLayers

Aside from the necessary difference of projection, the main thing that stands out is the overlap in the Northeast. I discussed possible ways to avoid this in the OpenLayers example above.

Conclusion

Nothing here is new. I’ve been doing this in Flash for a while — see here and here; and of course Judy Olson was doing it some 35 years ago. But it’s nice to see it working dynamically in a couple of open source, web standards-compliant libraries. Thanks to OpenLayers, Polymaps, and Geocommmons for making it possible!