Building topology in Flash

For a while now, I’ve wanted to build geographic topology in Flash. Topology, as described in an ESRI white paper, is

a set of rules and behaviors that model how points, lines, and polygons share geometry. For example, adjacent features, such as two counties, will share a common edge.

For the applications I’ve had in mind, polygonal or areal topology is all that’s required: I need to know which features share a common border with which features. As a bonus, it’d be nice to know how long a border they share (how contiguous are they?).

Described further down is a raster method to determine vector feature adjacency. First, though, I’ll cover why such information must be built in the first place and how it is useful for certain cartographic applications.

Application to cartography and visualization

Topology needs to be built because it is not encoded in the most popular vector GIS formats (KML and the shapefile). In these formats, features (states or counties perhaps) are all stored separately; redundant points (like the shared corner of the “four corners” states) are repeated. There are topological GIS formats, like Arc/Info Export (e00), but geospatial data are rarely distributed in such formats.

One potential use for topological information is graph decomposition. Dasgupta et al write in Algorithms:

A wide range of problems can be expressed with clarity and precision in the concise pictorial language of graphs. For instance, consider the task of coloring a political map. What is the minimum number of colors needed, with the obvious restriction that neighboring countries should have different colors? One of the difficulties in attacking this problem is that the map itself, even a stripped-down version like Figure 3.1(a), is usually cluttered with irrelevant information: intricate boundaries, border posts where three or more countries meet, open seas, and meandering rivers. Such distractions are absent from the mathematical object of Figure 3.1(b), a graph with one vertex for each country (1 is Brazil, 11 is Argentina) and edges between neighbors. It contains exactly the information needed for coloring, and nothing more.

The simple graph above is two steps away from a circular cartogram of the form popularized by Daniel Dorling. On such maps, 1) feature circles are sized according to some numeric attribute, and neighbors — instead of being connected by lines — are 2) iteratively moved together or apart so that adjacencies are maintained where possible while avoiding circle overlap. The circular cartogram below is a snapshot of a NY Times interactive app developed in part by Lee Byron.

In addition to knowing the neighbors of all features, Daniel Dorling’s original algorithm requires as input the shared border lengths of all contiguous features. These are used in the force-repulsion algorithm — while attempting to maintain all feature adjacencies, Dorling’s algorithm applies extra attraction forces to features that share relatively longer borders.

In cartography, topology has much value beyond that described above. Generalization immediately comes to mind; to generalize/simplify/smooth the outline of a polygonal feature, one must also consider the feature’s neighbors. If the cartographer fails to do so, gaps may be created where features previously neatly adjoined. That noted, I should say that the method described here — while quite accurate in evaluating adjacencies — can only determine adjacency and relative (pixel-based) feature overlap, whereas polygonal shapefile generalization necessitates a true (and computationally intensive) topological indexing of each and every coordinate in the dataset. More on this later.

Method

Features stored in KML and shapefiles are defined by a series of latitude-longitude coordinates. The brute force method of determining if two features are topological neighbors would be to loop through their coordinates searching for an exact match. Not only would this be quite computationally intensive, but it wouldn’t detect true neighbors that were digitized independently; a shared border does not necessarily mean that the two features are defined using the exact same control points.

Of course it can be done. Most GIS software today will accurately build topology from a non-topological data source. The NY Times’ Matt Bloch detailed a server-side method in his Wisconsin Cartography MS thesis (as well as a 2006 AUTOCARTO proceedings paper), used in his MapShaper app, for essentially converting a polygonal shapefile to a non-redundant polyline shapefile. Though more efficient than simple brute force, these applications still rely on algorithms that require a large number of computations, and this scales up quickly as features get more complex. This is why topology is always built server-side.

The method I settled on is similar to pixel-based collision detection (also called “pro pixel collision” or “pixel-perfect collision”) familiar to game developers. In Flash, hitTests are only possible with specific points. To test for overlap of complex polygonal features, game developers have often relied on a pixel-level bitmap test for overlap. As described in an article by Troy Gilbert,

It’s pretty straightforward: you render two display objects to separate color channels, combine the color channels, then search the resulting image for any overlapping color.

I first encountered the technique in a post by Grant Skinner describing a method for shape-based collision detection in Flash published in 2005. GSkinner’s method doesn’t work as is in detecting adjacency in geographic features drawn from shapefiles. But with a few additions, the method is quite accurate and efficient in detecting feature neighbors in real-time. Try it out:

Some code

This isn’t really fully developed. It’s just something I’ve been thinking about for a while and wanted to get out there. Here though is a class I wrote to perform this pixel-based adjacency testing, as used in the above demo. It contains two chief public static methods:

  • getNeighborsForFeature(): Returns a Vector of DisplayObject features determined to be neighbors of the passed-in feature (from a passed-in Vector of all features)
  • checkFeatureAdjacency(): Checks the two passed-in DisplayObject features for adjacency. Returns the number of overlapping pixels.

How it works

To test for adjacency, features are drawn to a bitmap. The second feature is overlain using the difference blend mode. Any pixels shared by both features will now be lighter than before; I can check for such pixels using the BitmapData.threshold() method.

Adjacent features, though, will only overlap if they’ve been drawn with an external stroke. The two methods above accept a strokeWidth parameter. If this is less than 2 (features with external strokes of 1 sometimes fail to detect overlap accurately), I apply a precise GlowFilter to each raster. This increases accuracy but affects performance; though it’s not necessary, the methods perform best when features are initially drawn with a 2px stroke (as in the demo above).

Limitations

Accuracy

This method is quite accurate for most realistic geographies. But of course it’s all dependent on 1) the scale and complexity of your data and 2) the size of the BitmapData instance used for testing. One particular area of concern is features that share a corner but no actual border line segments. In some applications, these should be considered neighbors; in others, not. With a large enough BitmapData, such adjacencies will be detected.

If such features should not be considered neighbors in your application, a threshold may be set. checkFeatureAdjacency() returns the number of overlapping pixels; if this number is sufficiently low, features can be considered non-adjacent.

Performance

I haven’t done any real performance tests, but the method works fine, in a test with the 3000+ U.S. counties drawn from a large-scale shapefile in Firefox on my MacBook; neighbor detection was detected in real-time and nothing was cached.

As noted above, the methods perform fastest when features are initially drawn with a 2px stroke. If the methods are to be called repeatedly, it’s best to pass in shared BitmapData instances to be used for testing; the larger the BitmapData instance, the greater the accuracy, though this is offset by increased processing time for the BitmapData instance methods. No matter what you pass in, though, this raster-based method is far faster than any true coordinate-based adjacency testing algorithm (important b/c it is being performed client-side).

Generalization

This method simply doesn’t work for generalization (simplification or smoothing of feature border linework). Such processes require knowledge of all shared line segments, necessitating true coordinate-based topology building; my method only returns the number of overlapping pixels, which only correlates with actual shared border length (this, though, is sufficient information for the Dorling circular cartogram algorithm).

But I hope it’s useful for at least a few visualization applications. True topological indexing is best; but this computationally intensive process cannot yet be performed client-side in Flash. I’ll be working with these methods in the future, most likely in an implementation of Dorling’s circular cartograms in indiemapper.

the first thematic maps

Below’s a quick outline of the first maps created with six common cartographic symbologies. All of the below is out there, in books, articles, and blog posts. Particularly helpful are Alan MacEachren’s 1979 article The Evolution of Thematic Cartography, Arthur Robinson’s Early Thematic Mapping in the History of Cartography (1982), Borden Dent’s thematic cartography textbook, and Michael Friendly and Daniel J. Denis’ Milestones chronology. I couldn’t find a good modern summary, though, of these “firsts” of thematic cartography. So I put a quick one together.

The six symbologies are the classic thematic cartography representation methods: choropleth, proportional symbol, dot density, flow, isarithmic, and cartogram. Borden Dent’s great cartography textbook, in the section titled “Techniques of Quantitative Thematic Mapping”, dedicates a chapter to each of these symbologies, and to no others. The classic Robinson textbook, as well as the modern Slocum et al one, also dedicate more space to these six representation methods than to any others.

isoline

The first isarithmic (representation of continuous phenomena using lines of equal value) map is often ascribed to Edwin Halley, with his 1701 isogonic contour maps of magnetic declination.

Edwin Halley's (1701) isogonic contour map

Not so, as the symbology has been traced back to as early as 1584. Robinson notes in Early Thematic Mapping:

The contour had tentative beginnings as early as the end of the sixteenth centruly in the form of lines of equal depth on a map of the River Spaarne made in 1584 by the Dutch surveyor Pieter Bruinsz. The next use of the symbol came more than a hundred years later in 1697 by Pierre Ancellin…

the earliest known isoline or contour map, produced by Pieter Bruinsz in 1584

You can’t see too much in the above — a terrible scan I made from Robinson’s Early Thematic Mapping. This map isn’t listed on Milestones, nor could I find any reproductions of Bruinsz’s map on the internet. What I did find was a detail image of unknown provenance on the interesting site, Dutch thematic maps on the web.

detail of the earliest known isoline or contour map, produced by Pieter Bruinsz in 1584

On the above you can clearly see the dashed bathymetric line of equal depth.

choropleth

Frenchman Charles Dupin’s (1826) Carte figurative de l’instruction populaire de la France is the first known choropleth map.

the earliest choropleth map, produced by Charles Dupin in 1826

Dupin’s map was and is well known, and had a great impact on cartographers and statisticians at the time. Interestingly, the particular form of Dupin’s choropleth map — the unclassed variety, in which each unique value gets a unique shade — didn’t stick. The classed variety quickly took hold, with the first such map appearing in an 1828 Prussian atlas, Administrativ-Statistischer Atlas vom Preussischen Staatae.

the earliest classed choropleth, included in an 1828 Prussian atlas

Notice not the subtle lean in my scan (from Robinson since I couldn’t find it elsewhere), but the class breaks shown along the bottom. The author of the thematic portion of the atlas is unknown.

dot density

Dupin’s fellow countryman Frère de Montizon is responsible for the first dot density map, Carte philosophique figurant la population de la France (1830).

the earliest dot density map, produced by Frère de Montizon in 1830

Unlike Dupin, who is fairly well known, Robinson describes Montizon as a “mystery man”. He continues,

It is one of the accidents of history that Frère de Montizon’s invention of the thematic dot map should have gone completely unnoticed. In terms of cartographic innovation it ranks with the isothermal map, yet as far as can be ascertained no reference to him or to his dot map was made by anyone well into the twentieth century…this basically simple, logical idea had to wait some thirty years to be reinvented and much longer than that to become generally known.

The symbology was “reinvented” by Thure Alexander von Mentzer in his 1859 map of population distribution in the Scandinavian peninsula. I can’t find any reproductions of this map anywhere.

proportional symbols

The first known example of a proportional symbol map appeared in the Atlas to Accompany Second Report of the Railway Commissioners, Ireland (1837). Though the atlas was apparently quite innovative with regard to thematic cartography, it went largely unnoticed due to limited distribution.

Here, Henry Drury Harness’s map of Irish population density from the atlas.

the earliest proportional symbol map, of Irish population, published by Henry Drury Harness in 1837

They’re a bit hard to make out in this horrible scan (again from Robinson — listed as part of his personal collection), but there they are: circles scaled according to population centered at various Irish cities. This wasn’t the first time that proportional circles had been used to convey data, but it was the first time it had been done on a map. Notice the shading?– this is also one of the first dasymetric maps.

flow maps

The following map is more well known, and could share the title of first proportional symbol map, as it appeared in the same 1837 atlas as the above map. Here, the innovation is the use of line thickness to convey a quantitative value, in this case the traffic between Irish cities (again by Henry Drury Harness).

the earliest known flow map, produced by Henry Drury Harness in 1837

cartogram

I’ve written about this before, so below’s a quick summary.

Many sources, including Dent’s text, point to Émile Levasseur as the originator of the value-by-area cartogram. Levasseur included diagrammatic maps like the following 1868 map of Europe in his economic geography textbooks.

early diagrammatic map (supposedly first cartogram) by Levasseur

The above map was reprinted by Funkhouser in 1937 and by Waldo Tobler in his Thirty Five Years of Computer Cartograms. In the latter, Tobler notes:

This is a map of the countries of Europe in which each country is represented by a square whose size is proportional to the area of the country, and with countries in their approximately correct position and adjacency. Could this be called an equal area map? Or is it an equal area cartogram?

I believe the former: that this should be considered a diagrammatic equal area map, but not a value-by-area cartogram. Sara Fabrikant makes the case for the German provenance of the first cartogram:

Another notable and early European contribution to the thematic mapper’s toolbox is the value-by-area cartogram. Hermann Haack and H. Wiechel published a cartogram depicting election results from the German Reichstag in 1903 (cited in Eckert 1925).

Haack and Wiechel’s map is indeed cited in Eckert’s famous Die Kartenwissenschaft volumes, but it isn’t reprinted there. Nor is it clear that unit areas on their election maps represented something other than land area. The earliest true value-by-area cartogram that I’ve seen reproduced came by way of Professor John Krygier, who reprinted William Bailey’s 1911 population cartogram.

perhaps the first American cartogram

I doubt the above is truly the first cartogram, and welcome any scans or references to true value-by-area cartograms that precede the 1911 map.

conclusion

Four of the six classic thematic cartography symbologies — choropleth, dot density, proportional symbol, and flow — originated between 1826 and 1837. Two of them — proportional symbol and flow — were initially produced by one man (Harness), and appeared in the same obscure railway atlas. All were refined in the 19th century, and only one (isolines) predate the century.

Conspicuously absent above are the names William Playfair and Joseph Minard. Indeed, some sources still cite these well-known engineer-statisticians as the originators of the proportional symbol and flow symbologies. Playfair may have been the first to use proportional (value-by-area) symbols (specifically circles) to represent quantitative data, but it wasn’t on a map.

Funkhouser credited Minard with the invention of the flow symbology, ignoring Harness’s innovation eight years earlier:

The second, called cartograms with bands (cartogramme a bandes), was originated simultaneously and independently by the French engineer, Minard, and the Belgian engineer, Belpaire, in 1845. This consists of colored bands or ribbons which follow the watercourses and railways on a map, the width of the band being proportional to the amount of traffic or number of passengers carried.

It’s certainly the case, though, that Playfair paved the way for much thematic mapping experimentation in the early 19th C. and that Minard helped popularize the flow, proportional symbol, and choropleth symbologies.

classed cartograms

Classification is commonplace in thematic cartography. In choropleth mapping, classification is the norm. This was not always so. The first choropleth map (created by Baron Charles Dupin in 1826) was unclassed.

According to Arthur Robinson, in his Early Thematic Mapping in the History of Cartography:

So far as we now know, the first choropleth map to provide a legend and give class limits to the tones was the 1828 Prussian manuscript map of population density. After the early 1830s most choropleth maps employed classes of various sorts, some based on rankings of the districts, some based on percentage departures from a mean, and some based on categorizing the data itself.

It seems that classification in choropleth mapping was adopted for technical rather than theoretical reasons. Robinson notes that “engravers were unsuccessful in making such [unclassed] maps, since they did not have full control over the darkness of very many flat tones.” Nonetheless, empirical research, begun in the late 1970s and continuing into the 1990s, backed up the decision, at least as regards the acquisition of specific information. Data on the recall of specific information and the acquisition of general information was less conclusive; a good summary of this research is provided in Slocum et al’s cartography textbook.

(classed histogram legend from my first thematic map, produced for Mark Harrower’s Geography 370: Introduction to Cartography)

Range-graded proportional symbols

In proportional symbol mapping, the unclassed form is the norm. This, too, seems to be a technical rather than theoretically-based trend. As noted in the Slocum text:

Classed, or range-graded, maps can be created by classing the data and letting a single symbol size represent a range of data values, but unclassed proportional symbol maps are more common. This might seem surprising given the frequency with which classed choropleth maps are used. The difference stems, in part, from the ease with which unclassed proportional symbol maps could be created in manual cartography (either an ink compass or a circle template could be used to draw circles of numerous sizes).

The logical extension of classed choropleth mapping to proportional symbol mapping was first suggested by Hans Joachim Meihoefer in the late 1960s. Here, too, classification is purported to make the thematic map easier to comprehend; also from Slocum:

Range grading is considered advantageous because readers can easily discriminate symbol sizes and thus readily match map and legend symbols; another advantage is that the contrast in circle sizes might enhance the map pattern, in a fashion similar to the use of an arbitrary exponent…

This comes at the expense of precision, as map reader’s can never get exact values for enumeration units. Borden Dent notes, “In this scaling method, symbol-size discrimination is the design goal, rather than magnitude estimation.”

Rarely, though, is exact value retrieval (magnitude estimation) the goal of thematic cartography, at least of the static variety. Further, map readers are notoriously bad size estimators, as an array of psychophysical studies in and outside of academic cartography have established. The graph below (from John Krygier’s excellent blog post on perceptual scaling) sums up the average response to 1, 2, and 3D proportional symbols.

The above suggests that subjects uniformly underestimate the areas of proportional symbols. Perceptual scaling has been developed in response, in which symbol sizes are perceptually scaled according to a power function, rather than being scaled mathematically.

Perhaps more important, though, are the deviations among subjects hidden in the graph above. T.L.C. Griffin, in a 1985 study, found an average underestimation similar to previous studies, but stressed that “perceptual rescaling was shown to be inadequate to correct the estimates of poor judges, while seriously impairing the results of those who were more consistent.”

Both the Slocum et al and Dent cartography textbooks introduce range-grading of proportional symbols as a potential solution to the 1) average poor estimation of symbol size and 2) high deviation in this underestimation.

Cartograms then

This makes a case for classification in proportional symbol mapping. It also makes the case for classification on cartograms. Cartograms can be considered a variant of proportional symbol map in which the symbol shape used is that of the original enumeration unit in some projection. But in my cartogram research for the M.S. Cartography degree at the University of Wisconsin, I ran across no references to classifying the area of cartogram units. A more recent search also revealed no references. Classification seems particularly appropriate to cartograms because the limited research done on their perception suggests that users estimate cartogram feature area much less accurately than the simplified shapes of standard proportional symbol maps.

The above is a normal, unclassed cartogram showing U.S. population. Notice the continuous range of state areas, shown below in order of population (but smaller).

Despite the two legend chips, I think it’s quite difficult to estimate the population of states with any degree of precision. The following classed cartogram abandons precision in favor of a more easily and quickly interpreted map.

Exact values can never be retrieved, but each state can quickly be matched with one of four classes (grouped according to Jenks natural breaks classification).

The above maps were created with a beta version of indiemapper. No other mapping software allows the creation of classed cartograms. You can get around this when using tools such as ScapeToad or Frank Hardisty’s Cartogram Generator (both of which utilize the contiguous Gastner-Newman cartogram algorithm) by pre-classing your data, so that only 3-7 unique values are found in the dataset for a given attribute.

In addition to 1) increased discriminability of symbol sizes and 2) a potentially enhanced spatial pattern, classed cartograms may hold a third advantage over the unclassed variety, though only in bivariate mapping. Cartograms, though, are most typically and appropriately employed in bivariate mapping. When constructing a bivariate cartogram, the cartographer sizes enumeration units according to one variable (typically population) and colors the units, choropleth-style, according to some other variable (often election results). In unclassed cartograms, some features may shrink down so as to be nearly invisible, making the reading of the second (coloring) attribute impossible. In the classed variant, a small but still legible minimum size can be established for the first class of data, ensuring that the coloring attribute can be interpreted on all units. A minimum size can be established on an unclassed cartogram, but if mathematical/proportional scaling is employed it may result in absurdly large features at the higher end of the mapped variable.

Of course, classed cartograms won’t always have a larger minimum size; this must be a conscious design decision (in the U.S. cartograms above, the classed cartogram minimum size is smaller than the minimum size found on the unclassed version).

Inappropriate

Classification in cartogram scaling is not always appropriate. Indeed, the typical unclassed form is preferable in many cases. Borden Dent writes the following of classification and choropleth mapping; I believe it’s equally applicable to the proportional and cartogram symbologies:

The purpose of the choropleth map dictates its form. If the map’s main purpose is to simplify a complex geographical distribution for the purpose of imparting a particular message, theme, or concept, the conventional [classed] choropleth technique should be followed. On the other hand, if the goal is to provide an inventory for the reader in a form that the reader must simplify and generalize, then the unclassed form should be chosen.

This sounds a lot like the modern distinction between cartography and geovisualization, or like the older one between communication and represenation. These distinctions have been overblown; experts and amateurs are often looking at the same map. But there are certainly cases where the purpose and audience are more clear-cut. In such cases, classed cartograms may be an option.

While providing the cartographer with the “ability to direct the message of the communication” (Dent), this power comes with great responsibility — specifically the responsible choice of the 1) number of classes, 2) classification method, and 3) appropriate and easily discriminable symbol sizes for each class.

Research on classification in choropleth and proportional symbol mapping has settled on three and seven as the minimum and maximum number of classes that are effective. Classification methods have been much studied in academic cartography, statistics, and other fields. Jenks natural breaks classification is often employed; equal intervals and quantiles are also typical. Cartographers have less guidance on the third choice, that of an appropriate symbol size for each class.

The above shows a set of ten class sizes for range grading developed by Hans Joachim Meihoefer as the result of a visual experiment; the symbol sizes were developed for maximum discriminability, while maintaining a realistic size range (none too small, none too large). Ten classes weren’t recommended, but supposedly any five contiguous sizes could be chosen. Indiemapper uses a slight variation of Meihoefer’s sizing scheme for classed proportional symbol and cartogram sizing.

Despite the additional responsibilities placed on the cartographer, I believe a carefully-crafted classed cartogram can be a very effective representation of some datasets for certain audiences.