Automatic label placement
Encyclopedia
Automatic label placement (sometimes called text placement or name placement) refers to the computer methods of placing labels automatically on a map or chart. This is related to the typographic design of such labels
.
Map
s communicate spatial information to the reader, therefore they are a medium of communication. The typical features depicted on a geographic map are line features (e.g. roads), area features (countries, parcels, forests, lakes, etc.), and point features (villages, cities, etc.). In addition to depicting the map's features in a geographically accurate manner, it is of critical importance to place the names that identify these features, in a way that the reader knows instantly which name describes which feature.
Automatic text placement is one of the most difficult, complex, and time-consuming problems in mapmaking and GIS (Geographic Information System)
. Other kinds of computer-generated graphics - like chart
s, graph
s etc. - require good placement of labels as well, not to mention engineering drawings, and professional programs which produce these drawings and charts, like spreadsheets (e.g. Microsoft Excel
) or computational software programs (e.g. Mathematica
).
Naively placed labels overlap excessively, resulting in a map that is difficult or even impossible to read. Therefore, a GIS must allow a few possible placements of each label, and often also an option of resizing, rotating, or even removing (suppressing) the label. Then, it selects a set of placements that results in the least overlap, and has other desirable properties. For all but the most trivial setups, the problem is NP-Hard
.
Cartographers work based on accepted conventions and rules and they place labels in order of importance. For example, New York City, Vienna, Berlin, Paris, or Tokyo must show up on country maps because they are high-priority labels. Once those are placed, the cartographer places the next most important class of labels, for example major roads, rivers, and other large cities. In every step they ensure that (1) the text is placed in a way that the reader easily associates it with the feature, and (2) the label does not overlap with those already placed on the map.
places consecutive labels on the map in positions that result in minimal overlap of labels. Its results are not satisfactory even for very simple problems, but it is extremely fast.
Slightly more complex algorithms rely on local optimization to reach a local optimum of a placement evaluation function - in each iteration placement of a single label is moved to another position, and if it improves the result, the move is preserved. It performs reasonably well for maps that are not too densely labelled. Slightly more complex variations try moving 2 or more labels at the same time. The algorithm ends after reaching some local optimum.
The algorithm that yields good results with relatively good performance - simulated annealing
- is very simple. It works like local optimization, but it may keep a change even if it worsens the result. The chance of keeping such a change is , where is the change in the evaluation function, and is the temperature. The temperature is gradually lowered according to the annealing schedule. When the temperature is high, simulated annealing performs almost random changes to the label placement, being able to escape a local optimum
. Later, when hopefully a very good local optimum has been found, it behaves in a manner similar to local optimization. The main challenges in developing a simulated annealing solution are choosing a good evaluation function and a good annealing schedule. Generally too fast cooling will degrade the solution, and too slow cooling will degrade the performance, but the schedule is usually quite a complex algorithm, with more than just one parameter.
Another class of direct search algorithms are the various evolutionary algorithm
s, e.g. genetic algorithm
s. Other algorithms are also used, like various graph solutions, integer programming etc.
One simple optimization that is important on real maps is dividing a set of labels into smaller sets that can be solved independently. Two labels are rivals if they can overlap in one of the possible placements. Transitive
closure of this relation divides the set of labels into possibly much smaller sets. On uniformly and densely labelled maps, usually the single set will contain the majority of labels, and on maps for which the labelling is not uniform it may bring very big performance benefits. For example when labelling a map of the world, America
is labelled independently from Eurasia
etc.
If a map labeling problem can be reduced to a situation in which each remaining label has only two potential positions in which it can be placed, then it may be solved efficiently by using an instance of 2-satisfiability
to find a placement avoiding any conflicting pairs of placements; several exact and approximate label placement algorithms for more complex types of problems are based on this principle.
Labeling (map design)
Cartographic labeling is a form of typography and strongly deals with form, style, weight and size of type on a map. Essentially, labeling denotes the correct way to label features .- Form :...
.
Map
Map
A map is a visual representation of an area—a symbolic depiction highlighting relationships between elements of that space such as objects, regions, and themes....
s communicate spatial information to the reader, therefore they are a medium of communication. The typical features depicted on a geographic map are line features (e.g. roads), area features (countries, parcels, forests, lakes, etc.), and point features (villages, cities, etc.). In addition to depicting the map's features in a geographically accurate manner, it is of critical importance to place the names that identify these features, in a way that the reader knows instantly which name describes which feature.
Automatic text placement is one of the most difficult, complex, and time-consuming problems in mapmaking and GIS (Geographic Information System)
Geographic Information System
A geographic information system, geographical information science, or geospatial information studies is a system designed to capture, store, manipulate, analyze, manage, and present all types of geographically referenced data...
. Other kinds of computer-generated graphics - like chart
Chart
A chart is a graphical representation of data, in which "the data is represented by symbols, such as bars in a bar chart, lines in a line chart, or slices in a pie chart"...
s, graph
Graph of a function
In mathematics, the graph of a function f is the collection of all ordered pairs . In particular, if x is a real number, graph means the graphical representation of this collection, in the form of a curve on a Cartesian plane, together with Cartesian axes, etc. Graphing on a Cartesian plane is...
s etc. - require good placement of labels as well, not to mention engineering drawings, and professional programs which produce these drawings and charts, like spreadsheets (e.g. Microsoft Excel
Microsoft Excel
Microsoft Excel is a proprietary commercial spreadsheet application written and distributed by Microsoft for Microsoft Windows and Mac OS X. It features calculation, graphing tools, pivot tables, and a macro programming language called Visual Basic for Applications...
) or computational software programs (e.g. Mathematica
Mathematica
Mathematica is a computational software program used in scientific, engineering, and mathematical fields and other areas of technical computing...
).
Naively placed labels overlap excessively, resulting in a map that is difficult or even impossible to read. Therefore, a GIS must allow a few possible placements of each label, and often also an option of resizing, rotating, or even removing (suppressing) the label. Then, it selects a set of placements that results in the least overlap, and has other desirable properties. For all but the most trivial setups, the problem is NP-Hard
NP-hard
NP-hard , in computational complexity theory, is a class of problems that are, informally, "at least as hard as the hardest problems in NP". A problem H is NP-hard if and only if there is an NP-complete problem L that is polynomial time Turing-reducible to H...
.
Rule-based algorithms
The best computer algorithms are those that emulate an experienced human cartographer. Over centuries, cartographers have developed the art of mapmaking and label placement. For example, an experienced cartographer repeats road names several times for long roads, instead of placing them once, or in the case of Ocean City depicted by a point very close to the shore, the cartographer would place the label "Ocean City" over the water to emphasize that it is a coastal town.Cartographers work based on accepted conventions and rules and they place labels in order of importance. For example, New York City, Vienna, Berlin, Paris, or Tokyo must show up on country maps because they are high-priority labels. Once those are placed, the cartographer places the next most important class of labels, for example major roads, rivers, and other large cities. In every step they ensure that (1) the text is placed in a way that the reader easily associates it with the feature, and (2) the label does not overlap with those already placed on the map.
Other algorithms
The simplest greedy algorithmGreedy algorithm
A greedy algorithm is any algorithm that follows the problem solving heuristic of making the locally optimal choice at each stagewith the hope of finding the global optimum....
places consecutive labels on the map in positions that result in minimal overlap of labels. Its results are not satisfactory even for very simple problems, but it is extremely fast.
Slightly more complex algorithms rely on local optimization to reach a local optimum of a placement evaluation function - in each iteration placement of a single label is moved to another position, and if it improves the result, the move is preserved. It performs reasonably well for maps that are not too densely labelled. Slightly more complex variations try moving 2 or more labels at the same time. The algorithm ends after reaching some local optimum.
The algorithm that yields good results with relatively good performance - simulated annealing
Simulated annealing
Simulated annealing is a generic probabilistic metaheuristic for the global optimization problem of locating a good approximation to the global optimum of a given function in a large search space. It is often used when the search space is discrete...
- is very simple. It works like local optimization, but it may keep a change even if it worsens the result. The chance of keeping such a change is , where is the change in the evaluation function, and is the temperature. The temperature is gradually lowered according to the annealing schedule. When the temperature is high, simulated annealing performs almost random changes to the label placement, being able to escape a local optimum
Local optimum
Local optimum is a term in applied mathematics and computer science.A local optimum of a combinatorial optimization problem is a solution that is optimal within a neighboring set of solutions...
. Later, when hopefully a very good local optimum has been found, it behaves in a manner similar to local optimization. The main challenges in developing a simulated annealing solution are choosing a good evaluation function and a good annealing schedule. Generally too fast cooling will degrade the solution, and too slow cooling will degrade the performance, but the schedule is usually quite a complex algorithm, with more than just one parameter.
Another class of direct search algorithms are the various evolutionary algorithm
Evolutionary algorithm
In artificial intelligence, an evolutionary algorithm is a subset of evolutionary computation, a generic population-based metaheuristic optimization algorithm. An EA uses some mechanisms inspired by biological evolution: reproduction, mutation, recombination, and selection...
s, e.g. genetic algorithm
Genetic algorithm
A genetic algorithm is a search heuristic that mimics the process of natural evolution. This heuristic is routinely used to generate useful solutions to optimization and search problems...
s. Other algorithms are also used, like various graph solutions, integer programming etc.
One simple optimization that is important on real maps is dividing a set of labels into smaller sets that can be solved independently. Two labels are rivals if they can overlap in one of the possible placements. Transitive
Transitive relation
In mathematics, a binary relation R over a set X is transitive if whenever an element a is related to an element b, and b is in turn related to an element c, then a is also related to c....
closure of this relation divides the set of labels into possibly much smaller sets. On uniformly and densely labelled maps, usually the single set will contain the majority of labels, and on maps for which the labelling is not uniform it may bring very big performance benefits. For example when labelling a map of the world, America
United States
The United States of America is a federal constitutional republic comprising fifty states and a federal district...
is labelled independently from Eurasia
Eurasia
Eurasia is a continent or supercontinent comprising the traditional continents of Europe and Asia ; covering about 52,990,000 km2 or about 10.6% of the Earth's surface located primarily in the eastern and northern hemispheres...
etc.
If a map labeling problem can be reduced to a situation in which each remaining label has only two potential positions in which it can be placed, then it may be solved efficiently by using an instance of 2-satisfiability
2-satisfiability
In computer science, 2-satisfiability is the problem of determining whether a collection of two-valued variables with constraints on pairs of variables can be assigned values satisfying all the constraints...
to find a placement avoiding any conflicting pairs of placements; several exact and approximate label placement algorithms for more complex types of problems are based on this principle.