Geohash
Encyclopedia
Geohash is a latitude
/longitude
geocode
system invented
by Gustavo Niemeyer when writing the web service at geohash.org, and
put into the public domain. It is a hierarchical spatial data structure
which subdivides space into buckets of grid
shape.
Geohashes offer properties like arbitrary precision and the possibility of
gradually removing characters from the end of the code to reduce its size
(and gradually lose precision).
As a consequence of the gradual precision degradation, nearby places will
often (but not always) present similar prefixes. Conversely, the longer a
shared prefix is, the closer the two places are.
s which uniquely identify positions on the Earth
, so that referencing them in email
s, forums
, and website
s is more convenient.
To obtain the Geohash, the user provides an address to be geocoded
, or latitude
and longitude
coordinates, in a single input box (most commonly used formats for latitude and longitude pairs are accepted), and performs the request.
Besides showing the latitude and longitude corresponding to the given Geohash, users who navigate to a Geohash at geohash.org are also presented with an embedded map, and may download a GPX file, or transfer the waypoint directly to certain GPS receivers. Links are also provided to external sites that may provide further details around the specified
location.
For example, the coordinate pair 57.64911,10.40744 (near the tip of the peninsula
of Jutland
, in Denmark
) produces a hash of u4pruydqqvj, which can be used in the URL http://geohash.org/u4pruydqqvj
Geohashes have also been proposed to be used for geotagging
.
When used in a database, the structure of geohashed data has two advantages. First, data indexed by geohash will have all points for a given rectangular area in contiguous slices (the number of slices depends on the precision required and the presence of geohash "fault lines"). This is especially useful in database systems where queries on a single index are much easier or faster than multiple-index queries. Second, this index structure can be used for a quick-and-dirty proximity search - the closest points are often among the closest geohashes.
This operation results in the bit
s 01101 11111 11000 00100 00010. Assuming that counting starts at 0 in the left side, the even bits are taken for the longitude code (0111110000000), while the odd bits are taken for the latitude code (101111001001).
again from the left to the right side. For the latitude value, the interval
-90 to +90 is divided by 2, producing two intervals: -90 to 0, and 0 to +90. Since
the first bit is 1, the higher interval is chosen, and becomes the current interval.
The procedure is repeated for all bits in the code. Finally, the latitude value is
the center of the resulting interval. Longitudes are processed in an equivalent way,
keeping in mind that the initial interval is -180 to +180.
Finishing the procedure should yield approximately latitude 42.6 and
longitude -5.6.
Each subsequent bit halves this error. This table shows the effect of each bit. At each stage, the relevant half of the range is highlighted in green - a low bit selects the lower range, a high bit the upper range.
The last column shows the latitude, simply the mean value of the range. Each subsequent bit makes this value more precise.
(The numbers in the above table have been rounded to 3 decimal places for clarity)
Final rounding should be done carefully in a way that
So if rounding 42.605 to 42.61 or 42.6 is correct, rounding to 43 it is not.
locations close to each other but on opposite sides of the Equator
or a meridian
can result in Geohash codes with no common prefix.
Secondly a geohash essentially defines a bounding box within which a location lies, therefore two locations may be spatially very close but have different geohashes. In order to be useful to proximity searches, the surrounding eight geohashes of a geohash must be calculated and the locations matching these pulled out, therefore complicating potential usage in proximity search
es.
Despite those issues, there are possible workarounds, and the algorithm has been successfully used in MongoDB to implement proximity searches.
by its inventor in
the public announcement date, on February 26, 2008.
While comparable algorithms have been successfully
patented and
had copyright claimed upon, GeoHash is based on an entirely different algorithm and approach.
Latitude
In geography, the latitude of a location on the Earth is the angular distance of that location south or north of the Equator. The latitude is an angle, and is usually measured in degrees . The equator has a latitude of 0°, the North pole has a latitude of 90° north , and the South pole has a...
/longitude
Longitude
Longitude is a geographic coordinate that specifies the east-west position of a point on the Earth's surface. It is an angular measurement, usually expressed in degrees, minutes and seconds, and denoted by the Greek letter lambda ....
geocode
Geocoding
Geocoding is the process of finding associated geographic coordinates from other geographic data, such as street addresses, or zip codes...
system invented
by Gustavo Niemeyer when writing the web service at geohash.org, and
put into the public domain. It is a hierarchical spatial data structure
which subdivides space into buckets of grid
Grid (spatial index)
In the context of a spatial index, a grid is a regular tessellation of a manifold or 2-D surface that divides it into a series of contiguous cells, which can then be assigned unique identifiers and used for spatial indexing purposes...
shape.
Geohashes offer properties like arbitrary precision and the possibility of
gradually removing characters from the end of the code to reduce its size
(and gradually lose precision).
As a consequence of the gradual precision degradation, nearby places will
often (but not always) present similar prefixes. Conversely, the longer a
shared prefix is, the closer the two places are.
Service
The purpose of the geohash.org service, launched in February 2008, is to offer short URLUniform Resource Locator
In computing, a uniform resource locator or universal resource locator is a specific character string that constitutes a reference to an Internet resource....
s which uniquely identify positions on the Earth
Earth
Earth is the third planet from the Sun, and the densest and fifth-largest of the eight planets in the Solar System. It is also the largest of the Solar System's four terrestrial planets...
, so that referencing them in email
Email
Electronic mail, commonly known as email or e-mail, is a method of exchanging digital messages from an author to one or more recipients. Modern email operates across the Internet or other computer networks. Some early email systems required that the author and the recipient both be online at the...
s, forums
Internet forum
An Internet forum, or message board, is an online discussion site where people can hold conversations in the form of posted messages. They differ from chat rooms in that messages are at least temporarily archived...
, and website
Website
A website, also written as Web site, web site, or simply site, is a collection of related web pages containing images, videos or other digital assets. A website is hosted on at least one web server, accessible via a network such as the Internet or a private local area network through an Internet...
s is more convenient.
To obtain the Geohash, the user provides an address to be geocoded
Geocoding
Geocoding is the process of finding associated geographic coordinates from other geographic data, such as street addresses, or zip codes...
, or latitude
Latitude
In geography, the latitude of a location on the Earth is the angular distance of that location south or north of the Equator. The latitude is an angle, and is usually measured in degrees . The equator has a latitude of 0°, the North pole has a latitude of 90° north , and the South pole has a...
and longitude
Longitude
Longitude is a geographic coordinate that specifies the east-west position of a point on the Earth's surface. It is an angular measurement, usually expressed in degrees, minutes and seconds, and denoted by the Greek letter lambda ....
coordinates, in a single input box (most commonly used formats for latitude and longitude pairs are accepted), and performs the request.
Besides showing the latitude and longitude corresponding to the given Geohash, users who navigate to a Geohash at geohash.org are also presented with an embedded map, and may download a GPX file, or transfer the waypoint directly to certain GPS receivers. Links are also provided to external sites that may provide further details around the specified
location.
For example, the coordinate pair 57.64911,10.40744 (near the tip of the peninsula
Peninsula
A peninsula is a piece of land that is bordered by water on three sides but connected to mainland. In many Germanic and Celtic languages and also in Baltic, Slavic and Hungarian, peninsulas are called "half-islands"....
of Jutland
Jutland
Jutland , historically also called Cimbria, is the name of the peninsula that juts out in Northern Europe toward the rest of Scandinavia, forming the mainland part of Denmark. It has the North Sea to its west, Kattegat and Skagerrak to its north, the Baltic Sea to its east, and the Danish–German...
, in Denmark
Denmark
Denmark is a Scandinavian country in Northern Europe. The countries of Denmark and Greenland, as well as the Faroe Islands, constitute the Kingdom of Denmark . It is the southernmost of the Nordic countries, southwest of Sweden and south of Norway, and bordered to the south by Germany. Denmark...
) produces a hash of u4pruydqqvj, which can be used in the URL http://geohash.org/u4pruydqqvj
Uses
The main usages of Geohashes are- as a unique identifier.
- represent point data e.g. in databases.
Geohashes have also been proposed to be used for geotagging
GeoTagging
Geotagging is the process of adding geographical identification metadata to various media such as a geotagged photograph or video, websites, SMS messages, QR Codes or RSS feeds and is a form of geospatial metadata...
.
When used in a database, the structure of geohashed data has two advantages. First, data indexed by geohash will have all points for a given rectangular area in contiguous slices (the number of slices depends on the precision required and the presence of geohash "fault lines"). This is especially useful in database systems where queries on a single index are much easier or faster than multiple-index queries. Second, this index structure can be used for a quick-and-dirty proximity search - the closest points are often among the closest geohashes.
Example
Using the hash ezs42 as an example, here is how it is decoded into a decimal latitude and longitudeDecode from base 32
The first step is decoding it from base 32 using the following character map:Decimal | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | |||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Base 32 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | b | c | d | e | f | g | |||
Decimal | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | |||
Base 32 | h | j | k | m | n | p | q | r | s | t | u | v | w | x | y | z |
This operation results in the bit
Bit
A bit is the basic unit of information in computing and telecommunications; it is the amount of information stored by a digital device or other physical system that exists in one of two possible distinct states...
s 01101 11111 11000 00100 00010. Assuming that counting starts at 0 in the left side, the even bits are taken for the longitude code (0111110000000), while the odd bits are taken for the latitude code (101111001001).
Decode binary to decimal
Each binary code is then used in a series of divisions, considering one bit at a time,again from the left to the right side. For the latitude value, the interval
-90 to +90 is divided by 2, producing two intervals: -90 to 0, and 0 to +90. Since
the first bit is 1, the higher interval is chosen, and becomes the current interval.
The procedure is repeated for all bits in the code. Finally, the latitude value is
the center of the resulting interval. Longitudes are processed in an equivalent way,
keeping in mind that the initial interval is -180 to +180.
Finishing the procedure should yield approximately latitude 42.6 and
longitude -5.6.
Worked example
Here's a worked example decoding 101111001001 into 42.6. To start with, we know the latitude is somewhere in the range -90 to 90. With no bits, we'd have to guess the latitude was 0, giving us an error of ±90. With one bit, we can decide whether its in the range -90 to 0, or 0 to 90. The first bit is high, so we know our latitude is somewhere between 0 and 90. Without any more bits, we'd guess the latitude was 45, giving us an error of ±45Each subsequent bit halves this error. This table shows the effect of each bit. At each stage, the relevant half of the range is highlighted in green - a low bit selects the lower range, a high bit the upper range.
The last column shows the latitude, simply the mean value of the range. Each subsequent bit makes this value more precise.
bit | min | mid | max | val | err |
---|---|---|---|---|---|
1 | -90.000 | 0.000 | 90.000 | 45.000 | 45.000 |
0 | 0.000 | 45.000 | 90.000 | 22.500 | 22.500 |
1 | 0.000 | 22.500 | 45.000 | 33.750 | 11.250 |
1 | 22.500 | 33.750 | 45.000 | 39.375 | 5.625 |
1 | 33.750 | 39.375 | 45.000 | 42.188 | 2.813 |
1 | 39.375 | 42.188 | 45.000 | 43.594 | 1.406 |
0 | 42.188 | 43.594 | 45.000 | 42.891 | 0.703 |
0 | 42.188 | 42.891 | 43.594 | 42.539 | 0.352 |
1 | 42.188 | 42.539 | 42.891 | 42.715 | 0.176 |
0 | 42.539 | 42.715 | 42.891 | 42.627 | 0.088 |
0 | 42.539 | 42.627 | 42.715 | 42.583 | 0.044 |
1 | 42.539 | 42.583 | 42.627 | 42.605 | 0.022 |
(The numbers in the above table have been rounded to 3 decimal places for clarity)
Final rounding should be done carefully in a way that
So if rounding 42.605 to 42.61 or 42.6 is correct, rounding to 43 it is not.
geohash length | lat bits | lng bits | lat error | lng error | km error |
---|---|---|---|---|---|
1 | 2 | 3 | ±23 | ±23 | ±2500 |
2 | 5 | 5 | ± 2.8 | ± 5.6 | ±630 |
3 | 7 | 8 | ± 0.70 | ± 0.7 | ±78 |
4 | 10 | 10 | ± 0.087 | ± 0.18 | ±20 |
5 | 12 | 13 | ± 0.022 | ± 0.022 | ±2.4 |
6 | 15 | 15 | ± 0.0027 | ± 0.0055 | ±0.61 |
7 | 17 | 18 | ±0.00068 | ±0.00068 | ±0.076 |
8 | 20 | 20 | ±0.000085 | ±0.00017 | ±0.019 |
Limitations
One limitation of the Geohash algorithm is in attempting to utilize it to find points in proximity to each other based on a common prefix. Edge caseEdge case
An edge case is a problem or situation that occurs only at an extreme operating parameter.For example, a stereo speaker might distort audio when played at its maximum rated volume, even in the absence of other extreme settings or conditions.An edge case can be expected or unexpected...
locations close to each other but on opposite sides of the Equator
Equator
An equator is the intersection of a sphere's surface with the plane perpendicular to the sphere's axis of rotation and containing the sphere's center of mass....
or a meridian
Meridian (geography)
A meridian is an imaginary line on the Earth's surface from the North Pole to the South Pole that connects all locations along it with a given longitude. The position of a point along the meridian is given by its latitude. Each meridian is perpendicular to all circles of latitude...
can result in Geohash codes with no common prefix.
Secondly a geohash essentially defines a bounding box within which a location lies, therefore two locations may be spatially very close but have different geohashes. In order to be useful to proximity searches, the surrounding eight geohashes of a geohash must be calculated and the locations matching these pulled out, therefore complicating potential usage in proximity search
Proximity search
In text processing, a proximity search looks for documents where two or more separately matching term occurrences are within a specified distance, where distance is the number of intermediate words or characters...
es.
Despite those issues, there are possible workarounds, and the algorithm has been successfully used in MongoDB to implement proximity searches.
License and patents
The Geohash geocode has been put in the public domainPublic domain
Works are in the public domain if the intellectual property rights have expired, if the intellectual property rights are forfeited, or if they are not covered by intellectual property rights at all...
by its inventor in
the public announcement date, on February 26, 2008.
While comparable algorithms have been successfully
patented and
had copyright claimed upon, GeoHash is based on an entirely different algorithm and approach.
See also
- Grid (spatial index)Grid (spatial index)In the context of a spatial index, a grid is a regular tessellation of a manifold or 2-D surface that divides it into a series of contiguous cells, which can then be assigned unique identifiers and used for spatial indexing purposes...
- Morton number (number theory)
- Natural Area CodeNatural Area CodeThe Natural Area Code is a proprietary geocode system for identifying an area anywhere on the Earth, or a volume of space anywhere around the Earth...
- Maidenhead Locator SystemMaidenhead Locator SystemThe Maidenhead Locator System is a geographic coordinate system used by amateur radio operators. Dr. John Morris, G4ANB, originally devised the system, and a group of VHF managers, meeting in Maidenhead, England in 1980, adopted it...
- Military grid reference systemMilitary grid reference systemThe Military Grid Reference System is the geocoordinate standard used by NATO militaries for locating points on the earth. The MGRS is derived from the UTM grid system and the UPS grid system, but uses a different labeling convention...
External links
- geohash.org
- Perl module to interact with geohash.org
- Java classes for encoding and decoding Geohashes without interacting with geohash.org
- https://sourceforge.net/projects/jgeohash/files/1.3.8/ Java library for encoding and decoding Geohashes without interacting with geohash.org and find the adjacent
- Another implementation of Geohash in Java containing quite some more features
- Javascript module for encoding and decoding geohashes without interacting with geohash.org and Demo
- MySQL Function for encoding and decoding geohash
- PostGIS function that returns a geohash representation of a geometry
- Ruby gem for encoding and decoding geohashes without interacting with geohash.org
- Perl module for encoding and decoding geohashes without interacting with geohash.org
- PHP class for encoding and decoding geohashes without interacting with geohash.org
- Python module for encoding and decoding geohashes without interacting with geohash.org
- Haskell library for encoding and decoding geohashes without interacting with geohash.org
- Ocaml module for encoding and decoding geohashes without interacting with geohash.org
- Proposal for a similar encoding and associated 'geo' URL scheme
- http://loc.is - a geohash designed for analytics.
- kml file for Google Earth displaying geohash grid
- ngeohash, nodejs module for encoding and decoding geohash
- geoplace.it: a free service to reference positions using an approach different than geohash.org and with a more accurate service