indiemaps.com/blog

For a while now, I’ve wanted to build geographic topology in Flash. Topology, as described in an ESRI white paper, is

a set of rules and behaviors that model how points, lines, and polygons share geometry. For example, adjacent features, such as two counties, will share a common edge.

For the applications I’ve had in mind, polygonal or areal topology is all that’s required: I need to know which features share a common border with which features. As a bonus, it’d be nice to know how long a border they share (how contiguous are they?).

Described further down is a raster method to determine vector feature adjacency. First, though, I’ll cover why such information must be built in the first place and how it is useful for certain cartographic applications.

Application to cartography and visualization

Topology needs to be built because it is not encoded in the most popular vector GIS formats (KML and the shapefile). In these formats, features (states or counties perhaps) are all stored separately; redundant points (like the shared corner of the “four corners” states) are repeated. There are topological GIS formats, like Arc/Info Export (e00), but geospatial data are rarely distributed in such formats.

One potential use for topological information is graph decomposition. Dasgupta et al write in Algorithms:

A wide range of problems can be expressed with clarity and precision in the concise pictorial language of graphs. For instance, consider the task of coloring a political map. What is the minimum number of colors needed, with the obvious restriction that neighboring countries should have different colors? One of the difficulties in attacking this problem is that the map itself, even a stripped-down version like Figure 3.1(a), is usually cluttered with irrelevant information: intricate boundaries, border posts where three or more countries meet, open seas, and meandering rivers. Such distractions are absent from the mathematical object of Figure 3.1(b), a graph with one vertex for each country (1 is Brazil, 11 is Argentina) and edges between neighbors. It contains exactly the information needed for coloring, and nothing more.

The simple graph above is two steps away from a circular cartogram of the form popularized by Daniel Dorling. On such maps, 1) feature circles are sized according to some numeric attribute, and neighbors — instead of being connected by lines — are 2) iteratively moved together or apart so that adjacencies are maintained where possible while avoiding circle overlap. The circular cartogram below is a snapshot of a NY Times interactive app developed in part by Lee Byron.

In addition to knowing the neighbors of all features, Daniel Dorling’s original algorithm requires as input the shared border lengths of all contiguous features. These are used in the force-repulsion algorithm — while attempting to maintain all feature adjacencies, Dorling’s algorithm applies extra attraction forces to features that share relatively longer borders.

In cartography, topology has much value beyond that described above. Generalization immediately comes to mind; to generalize/simplify/smooth the outline of a polygonal feature, one must also consider the feature’s neighbors. If the cartographer fails to do so, gaps may be created where features previously neatly adjoined. That noted, I should say that the method described here — while quite accurate in evaluating adjacencies — can only determine adjacency and relative (pixel-based) feature overlap, whereas polygonal shapefile generalization necessitates a true (and computationally intensive) topological indexing of each and every coordinate in the dataset. More on this later.

Method

Features stored in KML and shapefiles are defined by a series of latitude-longitude coordinates. The brute force method of determining if two features are topological neighbors would be to loop through their coordinates searching for an exact match. Not only would this be quite computationally intensive, but it wouldn’t detect true neighbors that were digitized independently; a shared border does not necessarily mean that the two features are defined using the exact same control points.

Of course it can be done. Most GIS software today will accurately build topology from a non-topological data source. The NY Times’ Matt Bloch detailed a server-side method in his Wisconsin (natch) Cartography MS thesis (as well as a 2006 AUTOCARTO proceedings paper), used in his MapShaper app, for essentially converting a polygonal shapefile to a non-redundant polyline shapefile. Though more efficient than simple brute force, these applications still rely on algorithms that require a large number of computations, and this scales up quickly as features get more complex. This is why topology is always built server-side.

The method I settled on is similar to pixel-based collision detection (also called “pro pixel collision” or “pixel-perfect collision”) familiar to game developers. In Flash, hitTests are only possible with specific points. To test for overlap of complex polygonal features, game developers have often relied on a pixel-level bitmap test for overlap. As described in an article by Troy Gilbert,

It’s pretty straightforward: you render two display objects to separate color channels, combine the color channels, then search the resulting image for any overlapping color.

I first encountered the technique in a post by Grant Skinner describing a method for shape-based collision detection in Flash published in 2005. GSkinner’s method doesn’t work as is in detecting adjacency in geographic features drawn from shapefiles. But with a few additions, the method is quite accurate and efficient in detecting feature neighbors in real-time. Try it out:

Some code

This isn’t really fully developed. It’s just something I’ve been thinking about for a while and wanted to get out there. Here though is a class I wrote to perform this pixel-based adjacency testing, as used in the above demo. It contains two chief public static methods:

getNeighborsForFeature(): Returns a Vector of DisplayObject features determined to be neighbors of the passed-in feature (from a passed-in Vector of all features)
checkFeatureAdjacency(): Checks the two passed-in DisplayObject features for adjacency. Returns the number of overlapping pixels.

How it works

To test for adjacency, features are drawn to a bitmap. The second feature is overlain using the difference blend mode. Any pixels shared by both features will now be lighter than before; I can check for such pixels using the BitmapData.threshold() method.

Adjacent features, though, will only overlap if they’ve been drawn with an external stroke. The two methods above accept a strokeWidth parameter. If this is less than 2 (features with external strokes of 1 sometimes fail to detect overlap accurately), I apply a precise GlowFilter to each raster. This increases accuracy but affects performance; though it’s not necessary, the methods perform best when features are initially drawn with a 2px stroke (as in the demo above).

Limitations

Accuracy

This method is quite accurate for most realistic geographies. But of course it’s all dependent on 1) the scale and complexity of your data and 2) the size of the BitmapData instance used for testing. One particular area of concern is features that share a corner but no actual border line segments. In some applications, these should be considered neighbors; in others, not. With a large enough BitmapData, such adjacencies will be detected.

If such features should not be considered neighbors in your application, a threshold may be set. checkFeatureAdjacency() returns the number of overlapping pixels; if this number is sufficiently low, features can be considered non-adjacent.

Performance

I haven’t done any real performance tests, but the method works fine, in a test with the 3000+ U.S. counties drawn from a large-scale shapefile in Firefox on my MacBook; neighbor detection was detected in real-time and nothing was cached.

As noted above, the methods perform fastest when features are initially drawn with a 2px stroke. If the methods are to be called repeatedly, it’s best to pass in shared BitmapData instances to be used for testing; the larger the BitmapData instance, the greater the accuracy, though this is offset by increased processing time for the BitmapData instance methods. No matter what you pass in, though, this raster-based method is far faster than any true coordinate-based adjacency testing algorithm (important b/c it is being performed client-side).

Generalization

This method simply doesn’t work for generalization (simplification or smoothing of feature border linework). Such processes require knowledge of all shared line segments, necessitating true coordinate-based topology building; my method only returns the number of overlapping pixels, which only correlates with actual shared border length (this, though, is sufficient information for the Dorling circular cartogram algorithm).

But I hope it’s useful for at least a few visualization applications. True topological indexing is best; but this computationally intensive process cannot yet be performed client-side in Flash. I’ll be working with these methods in the future, most likely in an implementation of Dorling’s circular cartograms in indiemapper.

2 Comments

Nice work!

Do you think there’s any way to take the overlapping pixels you generate here and turn them into coordinates to generate a set of vector shared borders? (As a step toward, say, generalization.) I can imagine such a thing being pretty inaccurate and still computationally intense, but you probably have a better sense of it from your work here. I also ask this without having tried out your methods yet, so maybe it’s a stupid question in the first place.

— Andy Woodruff

Posted January 13, 2010 at 9:00 am | Permalink
@Andy Woodruff- good Q. I don’t think so though. The BitmapData.threshold() method only returns an integer of pixels that match the test. Other BitmapData methods may be used to gather more info about the location of overlapping pixels, and perhaps a rough vector could be approximated. But I doubt it, and I wonder how the resultant vector would be associated with the non-overlapping coordinates of overlapping features.

No worries, though. Soon enough, true topological indexing will be old hat in Flash. What I’ve presented is just a shortcut method for the meantime.

— zach'ry

Posted January 17, 2010 at 1:57 am | Permalink

neocartography + information visualization

the notebook of cartographer Zachary Forest Johnson

Building topology in Flash