// echo DESIGN.adoc | entr sh -c "asciidoctor DESIGN.adoc; printf 'Done\n'"
:toc:
:nofooter:
:!webfonts:
:source-highlighter: rouge
:rouge-style: molokai
// :source-linenums-option:
:stem:

= Design

++++
<!--
<meta http-equiv="refresh" content="60" />
-->
++++

The goal of this document is to walk through how the design was chosen.

== 10,000 meter view

This project allows anyone to address ~1 square meter of land by bringing together:

. Memorizability (the format is encoded like an address; e.g., `2891 APPLE SPONGE BALLOON`)
. Accuracy (between `0.73` and `1.52` square meters per point, so points are spaced nearly uniformly: no point is too close to or too far from its neighbors)
. Accessibility (multiple interfaces, open source, and offline operation)

Using these tricks:

. Using Hilbert curves to map the 2-dimensional surface area of Earth to a 1-dimensional integer, thanks to Google's link:https://s2geometry.io/[S2 Geometry^] addressing scheme and the unofficial Rust link:https://lib.rs/crates/s2[s2^] crate
. Encoding the entire space of stem:[4.22*10^14] points using a small wordlist so only very common words are used
. Efficiency

== Goals [[goals]]

The list is in order of importance.

. Create a memorizable, address-like mapping at human scale (e.g., houses, shops, roads)
+
Each address covers about 1 square meter of Earth's surface.
. Accessible, open source, and offline. The only requirement of this project should be to have a computing device.
** Provide many user-friendly interfaces for *all* common use cases
** Decode/encode to standards like UTM and lat/lon
** Small binary, with low minimum CPU requirements
** Minimal (ideally zero) network traffic
. Optimize memorizability
** Use only common words by using a smaller wordlist
** Map homonyms and singular/plural words to identical values
. Equally distribute points
.. The internal data structure therefore cannot be lat/lon directly (see <<algorithm>>)
. Locality (TODO: I don't know if this is a good idea or not)

=== Non-goals [[non-goals]]

. By-hand encoding/decoding
** This is a nice feature of Xaddress, but it is not feasible with my algorithm
. Variable-resolution grid size
** It is not important that this algorithm can support higher- or lower-resolution mappings. The resolution is fixed.

== Comparison [[comparison]]

Yes, this is yet another standard, but I believe it has significant improvements that no other existing algorithm can provide.

For a more detailed comparison of existing systems, see the link:https://wiki.openstreetmap.org/wiki/What3words[OpenStreetMap wiki page on what3words^] instead.

.Comparison with similar algorithms
[%header,cols="h,,,,"]
|===
|
|this_algorithm
|link:https://what3words.com/[what3words^]
|link:https://xaddress.org/[Xaddress^]
|link:https://maps.google.com/pluscodes/[Plus Codes^]/link:https://en.wikipedia.org/wiki/Open_Location_Code[Open Location Codes^]

|Format
m|2891 APPLE SPONGE BALLOON
m|///clip.apples.leap
m|7150 MAGICAL PEARL
m|849VCWC8+R9

|Open Source
|Yes
|No
|Yes
|Yes

|Memorizable
footnote:[This is subjective, of course. I am defining this to mean similar enough to an address, which I consider memorizable]
|Yes
|Yes (is shorter than this_algorithm)
|Yes (is shorter than this_algorithm)
|No

|Small wordlist
footnote:[This is important for a few reasons.
Firstly, a smaller wordlist means you can exclusively use very common words (for example, clicking around in 30 seconds on what3words, link:https://what3words.com/rampage.unanswerable.desirability[`///balaclava.jostles.ghoulish`^] was found as an address.
These words are not commonly used).
Secondly, it is easier to translate the common words to other languages.
Thirdly, it allows plural words and homonyms to be mapped to the same point easily.]
|Yes (Under 5000)
|No (25,000-40,000)
|No (~200k)
|N/A

|Compact (can be recorded in a small number of bytes)
|No
|No
|No
|Yes

|Relatively uniform grid size
footnote:[The variance in distance between two adjacent points is low.
Non-uniform grid size usually comes from mapping linearly to latitude/longitude.
For example, 1° of longitude at the equator is a much longer distance than 1° of longitude near a pole
]
|Yes (Points range from .73m^2^ to 1.52m^2^)
footnote:[Mappings are uniform in Hilbert space, but variance comes from projecting the flat cube faces back onto the curved surface of the sphere]
footnote:[This is because a link:https://en.wikipedia.org/wiki/Hilbert_curve[Hilbert curve^] is used to evenly distribute points instead of mapping linearly to latitude/longitude.
See link:https://s2geometry.io/resources/s2cell_statistics[S2 Cell Statistics^] (level 23) for more.]
|?
footnote:[TODO: Does anybody know the answer to this?]
|No
(11 meters at the equator and .1 meters at the poles)
footnote:[Minimum point resolution is lat 0.0001, lon 0.0001. At the equator this is stem:[4.0*10^7*.0001/360=11] meters. On a circle of radius 11 m around 90° north it is stem:[11*2*3.14*.0001/360=1.9*10^-5] meters (though Xaddress cannot address this area, since it is not in a country)
]
|N/A?
footnote:[TODO: I don't know]

|Whole-Earth coverage
|Yes
|Yes
|No
|Yes

|Encode/decode offline
|Yes
|Yes (closed-source app is required)
|Yes (though no app exists)
|Yes

|Locality (similar addresses are physically nearby)
|? (This is being considered. Either option is possible)
|No
|No
|?

|===

== Algorithm [[algorithm]]

The algorithm was selected by thinking through each of these parts in order.

=== Addressing Scheme [[addressing-scheme]]

The original motivation for creating a new algorithm was a problem shared by similar existing projects: they map directly to latitude and longitude.
Mapping linearly to angular coordinates like lat/lon produces a non-uniform distribution of points.

The arc length of 1° of longitude along the equator is

[stem]
++++
d_(0 text(°))=d_(text(circumference)) * theta / (360 text(°))
= (4.01*10^7 text(m)) * (1 text(°)) / (360 text(°))
= 1.11*10^5 text(m)
++++

While the arc length of 1° of longitude at latitude 89.9° is approximately

[stem]
++++
d_(89.9 text(°))
= cos(89.9 text(°)) * d_(text(circumference)) * theta / (360 text(°))
= 194 text(m)
++++
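
As a quick check of the two arc lengths above, here is a minimal sketch that treats Earth as a perfect sphere with the same stem:[4.01*10^7] m circumference. The function and constant names are my own and are not part of any library.

[source,rust]
----
/// Equatorial circumference used in the text above, in meters.
const CIRCUMFERENCE_M: f64 = 4.01e7;

/// Arc length, in meters, of `delta_lon_deg` degrees of longitude measured
/// along the parallel at `lat_deg` degrees of latitude (spherical Earth).
fn lon_arc_length_m(lat_deg: f64, delta_lon_deg: f64) -> f64 {
    lat_deg.to_radians().cos() * CIRCUMFERENCE_M * delta_lon_deg / 360.0
}
----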

Any algorithm that maps linearly to lat/lon therefore cannot have a uniform distribution of points.

Xaddress does not address this, and what3words created a proprietary algorithm so that denser locations have more points.
The aim of this algorithm is to distribute points evenly across the globe and to encode at a resolution that is at least as functional as what3words and Xaddress.

The addressing scheme I want to use, in principle, maps Earth's surface to a link:https://en.wikipedia.org/wiki/Hilbert_curve[Hilbert curve^].
This is done by treating Earth as a perfect sphere, then projecting a Hilbert curve onto each of the 6 faces of the sphere's bounding cube.
A more graphical representation of this Earth Cube can be found on Google's S2 Geometry website link:https://s2geometry.io/resources/earthcube[here^].
This application only uses the addressing scheme of the S2 geometry library, and no other features.

Next, the coordinates are reprojected a few times in order to more accurately represent Earth's surface.
Again, all of these come from link:https://s2geometry.io[S2].
They are described link:https://s2geometry.io/devguide/s2cell_hierarchy#coordinate-systems[here^].

.Hilbert curves projected and transformed onto Earth's surface. From link:https://s2.sidewalklabs.com/planetaryview/[sidewalklabs planetary view^]
image::./32g2t7pp.png[Image of Earth with 6 Hilbert curve projections]

=== Search space [[search-space]]

S2 addresses map to a cell at a given level. Two candidate cell levels were chosen based on their cell count and average resolution.

.Statistics of the two candidates for cell levels for this project. More can be seen link:https://s2geometry.io/resources/s2cell_statistics[here^]
|===
|Level |Min area |Max area |Average area |Number of cells

|22
|2.90m^2^
|6.08m^2^
|4.83m^2^
|1.05*10^14^

|23
|0.73m^2^
|1.52m^2^
|1.21m^2^
|4.22*10^14^

|===
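
The cell counts in the last column follow from the cube structure: 6 faces, each split into 4 children per level. A minimal sketch of that arithmetic, using a helper name of my own choosing:

[source,rust]
----
/// Number of S2 cells at a subdivision level: 6 cube faces, each split
/// into 4 children per level, giving 6 * 4^level cells.
fn cells_at_level(level: u32) -> u64 {
    assert!(level <= 30, "S2 cell levels range from 0 to 30");
    6 * 4u64.pow(level)
}
----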

The last column indicates the number of cells that need to be mappable by this algorithm.
That is, I need a format with enough distinct values to map to all 1.05*10^14^ or 4.22*10^14^ cells.

=== Wordlist Size [[wordlist-size]]

Goal 1 of this project is to make the address memorizable.
A string of only digits or letters is not memorizable, and there are already better methods for that (just use lat/lon if you only want to use numbers; use Plus Codes if you only want to use letters and numbers).

A wordlist is a good candidate for memorizability since existing street addresses are word-based.
Xaddress has a similar idea: map a number and some words together to make a memorizable address.

In order to find a good wordlist that can map to all cases, a format needs to be selected.
Given a format, the minimum size of the wordlist can be found.

For example, if I wanted to see how many words I need to support level 22 with a format like `1234 WORD1 WORD2` (4-digit number, then 1 word, then a second word), I could compute

[stem]
++++
n_(22)
= 1.05*10^14
= 10^4 * n_(text(w)) * n_(text(w))
++++

Where stem:[n_(22)] is the number of cells at level 22, stem:[10^4] is the number of possible 4-digit numbers (0000-9999), and stem:[n_(text(w))] is the number of words in the wordlist.
Therefore, the number of words required in a wordlist to support this would be

[stem]
++++
1.05*10^14 = 10^4 * n_(text(w)) * n_(text(w))
= 10^4 * n_(text(w))^2
++++

[stem]
++++
n_(text(w)) = sqrt(1.05*10^14 / 10^4)
= 1.025*10^5
++++

I would need a wordlist of about 102,500 words to support an algorithm like this.
I compare the wordlist sizes for different formats below.

The general formula is

[stem]
++++
n_(text(w)) = (text(total_combinations)/text(num_prefix))^(1/text(num_words))
++++

Where stem:[text(num_prefix)] is the number of number/letter combinations in the prefix, stem:[text(num_words)] is the number of word components, and stem:[text(total_combinations)] is the total number of cells, stem:[n_(22)] or stem:[n_(23)].

.Comparison of approximate wordlist size for different formats, sorted by wordlist size
[%header,cols="m,,,"]
|===
|Format |Words needed for level 23 (thousands) |Words needed for level 22 (thousands) |Consider?

|999 WORD WORD
|649.62
|324.04
|No, wordlist too big

|(0-999)(A-Z0-9) WORD WORD
|108.27
|54.01
|No, wordlist too big

|(0-999)(A-Z0-9)(A-Z0-9) WORD WORD
|18.04
|9.00
|No, wordlist too big

|(1-128) WORD WORD WORD
|14.88
|9.36
|No, restricting numbers to 128 is not worth it

|999 WORD WORD WORD
|7.50
|4.72
|Maybe

|(1-1024) WORD WORD WORD
|7.44
|4.68
|Maybe

|WORD WORD WORD WORD
|4.53
|3.20
|No, this does not look like an address

|9999 WORD WORD WORD
|3.48
|2.19
|Maybe

|===
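
The rows of the table can be reproduced with the general formula. This is a minimal sketch; the function name is my own, and the inputs are the approximate cell counts from <<search-space>>.

[source,rust]
----
/// Minimum wordlist size for a format whose prefix can take `num_prefix`
/// distinct values and is followed by `num_words` word components, when
/// `total_combinations` cells must be addressable.
fn min_wordlist_size(total_combinations: f64, num_prefix: f64, num_words: u32) -> f64 {
    (total_combinations / num_prefix).powf(1.0 / f64::from(num_words))
}
----

For example, `min_wordlist_size(4.22e14, 1024.0, 3)` gives roughly 7,440 words, matching the `(1-1024) WORD WORD WORD` row for level 23.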

NOTE: This project will use the `(1-1024) WORD1 WORD2 WORD3` variation (1 number component and 3 word components).

It is longer than Xaddress and what3words, but with the tradeoff of having a significantly smaller dictionary than both.
It requires a larger wordlist than the `9999 WORD WORD WORD` variation, but it allows versioning.

=== Reversibility [[reversibility]]

Xaddress allows addresses to be encoded as either `WORD1 WORD2 0000` or `0000 WORD2 WORD1`.
This might make more sense in locations where numbers come before the word portions of addresses.
TODO: I need better reasoning here with examples.

NOTE: This algorithm will allow exactly two encoding types.
Both `0000 WORD1 WORD2 WORD3` and `WORD3 WORD2 WORD1 0000` formats are equivalent.
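
A minimal sketch of a parser that accepts both orders and normalizes them to the same value. The function name and return type are assumptions of mine, and no wordlist lookup is attempted here.

[source,rust]
----
/// Parses either `0000 WORD1 WORD2 WORD3` or `WORD3 WORD2 WORD1 0000` into
/// `(number, [word1, word2, word3])`. Words are upper-cased so that both
/// spellings of an address compare equal.
fn parse_address(input: &str) -> Option<(u32, [String; 3])> {
    let parts: Vec<&str> = input.split_whitespace().collect();
    if parts.len() != 4 {
        return None;
    }
    let upper = |s: &str| s.to_uppercase();
    if let Ok(number) = parts[0].parse::<u32>() {
        // Number-first form: words are already in WORD1..WORD3 order.
        Some((number, [upper(parts[1]), upper(parts[2]), upper(parts[3])]))
    } else if let Ok(number) = parts[3].parse::<u32>() {
        // Number-last form: words appear reversed, so flip them back.
        Some((number, [upper(parts[2]), upper(parts[1]), upper(parts[0])]))
    } else {
        None
    }
}
----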

=== Versioning [[versioning]]

If there is ever any version update, the protocol should be able to support it in some fashion.
Old addresses should remain decodable by new decoders, and old decoders should be able to report that they cannot decode addresses from newer versions.

Therefore, some version identifier should be embedded within the address.

I believe 2 bits for version information is sufficient.
These versions allow for adding different languages, adding different processing techniques to words, or any other generic change, since the version is read first.

NOTE: Least significant bits 11-12 in the number component will be used for versioning when the number component is parsed as a 32-bit unsigned integer.

For this version, version 0, both bits will be 0.
For example, the bits responsible for determining the algorithm version of this project for the address `382 WORD WORD WORD` are:

[source]
----
# 382 parsed as a 32-bit unsigned integer yields:
0000 0000 0000 0000 0000 0001 0111 1110
# Version bits are:      ^^
# Data bits are:           ^^ ^^^^ ^^^^

# Therefore, this address is using version 0
----
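
The same split can be written as two masks. This is a minimal sketch of the rule stated above (bits 1-10 are data, bits 11-12 are the version); the names are my own.

[source,rust]
----
const DATA_BITS: u32 = 10;
const VERSION_BITS: u32 = 2;

/// Splits the parsed number component into `(version, data)`.
fn split_number(number: u32) -> (u32, u32) {
    let data = number & ((1u32 << DATA_BITS) - 1);
    let version = (number >> DATA_BITS) & ((1u32 << VERSION_BITS) - 1);
    (version, data)
}
----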

=== Locality [[locality]]

Because a Hilbert curve is used, points that are close on the CellID mapping (the step immediately before encoding as `999 WORD WORD WORD`) tend to be physically close, so it is possible for nearby locations to have similar addresses.
This is similar to the real world where `123 Random Road, Washington D.C.` is close to `222 Random Road, Washington D.C.`.

This is not always a good idea.
For example, imagine if the wordlist did not have distinct enough words.
The address `111 word word word` will have a similar location to `111 words word word`, which may cause confusion, whereas if they were in significantly distant locations, there would be less confusion.

There are a few options:

[cols=',a,a,a']
|===
|Option |Example |Pros |Cons

|Intentionally scramble addresses to avoid locality
|Knowing that CellID-encoded addresses have locality, randomize the bit order or word order so that nearby locations intentionally get significantly different addresses.

`1234 APPLE GRAPE ORANGE` and `1234 SPONGE GREEN FACE` may be close together or may be far apart
|* Users will be used to drastically different addresses, so there will be no confusion about close addresses being dissimilar or distant addresses being similar
|* Loses ability for humans to see nearby locations from just looking at the address alone

|Preserve locality in some components
|`1234 APPLE GRAPE ORANGE` and `1234 SPONGE GREEN ORANGE` are relatively close together because `ORANGE` is equivalent and `GRAPE` and `GREEN` are similar
For example, words 2 and 3 could be analogous to country and state.
|* Allows for some form of at-a-glance distance comparison
|* Might lead to less confusion if close addresses are not always given similar names

|Similarity implies locality in all components.
|`1234 APPLE GRAPE ORANGE` and `1234 SPONGE GREEN ORANGE` are somewhat close together because `ORANGE` is equivalent and `GRAPE` and `GREEN` are similar
For example, the digits could be the smallest resolution on a per-adjacent cell basis and word1, word2, and word3 can be analogous to a local city, state, and country respectively
|* Simple implementation - just alphabetize the wordlist
* Allows for rough estimations of closeness
* It might be easier to memorize multiple locations if only some components change for nearby areas
|* (TODO: Confirm) Locality is broken at the prime meridian, so close locations to the prime meridian will have significantly different addresses
* Confusing property of similar addresses indicating closeness, but differing addresses not necessarily indicating distance

|Sameness implies locality in all components + scramble addresses
|`1234 APPLE GRAPE ORANGE` and `1234 SPONGE GREEN ORANGE` are somewhat close together because `ORANGE` is equivalent, but that is the only equivalent component, so nothing else can be inferred
|* May reduce confusion when trying to memorize close locations (as it may be hard to remember many addresses with very similar, but not the same, words)
* Simple implementation - just randomize the wordlist
* Behaves similarly to addresses in the United States, where city, state, zip code, and country are all included
* No confusing property of dissimilar addresses not representing nearby locations
|* ?

|===

The domain of each component of the address `0000 WORD1 WORD2 WORD3` is as follows:

. `0000` - Responsible for the least significant bits in the encoded layout (smallest area)
. `WORD1`
. `WORD2`
. `WORD3` - Responsible for the most significant bits in the encoded layout (largest area)

NOTE: This algorithm will preserve locality in all components by requiring that sameness implies locality in every component.

See <<encoded-layout>> for how this algorithm will implement locality specifically.

=== Encoded Layout [[encoded-layout]]

The layout of the encoded address determines which bits map to which parts of the address string.

The CellID is a 64-bit unsigned integer and is the representation immediately before the final address-like encoding.
CellIDs are represented as:

[source]
----
Example: Face 2, level 23

# Most significant 3 bits are for the face
face_number = 0b010

# This algorithm is always level 23
data_bits = level * 2 = 23 * 2 = 46

# The bit after the data bits is always 1
# All subsequent bits are always 0

Bit                     : 64              48               32              16             1
                        : |               |                |               |              |
                        : 01001011101010001011100010010011 10010011001001001100000000000000
Face number             : ^^^
Data bits               :    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^
Bit after data bits (1) :                                                   ^
All remaining bits (0)  :                                                    ^^^^^^^^^^^^^^
----

There are 6 faces at the top level encoding, which takes 3 bits to represent.
There are 2 divisions (1 horizontal and 1 vertical division) per level, which take up 2 bits per level.

The number of bits that encode the actual address is therefore

[stem]
++++
n_text(total_bits_required)
= n_text(face_bits) + n_text(subdivision_bits)
= 3 + 2*l
++++

Where stem:[l] is the subdivision level, such as 22 or 23.

Therefore, there are stem:[3+23(2)=49] bits required to represent level 23 and stem:[3+22(2)=47] bits required to represent level 22.

NOTE: Level 23 will be used for this project, which requires 49 bits to fully represent.
This does not include the <<versioning>> bits.

Since this algorithm will only ever use level 23, we know that bit 15 will always be 1 and bits 1-14 will always be 0, so these can be excluded from our encoding/decoding process.

The only 4 components in this address that can be used to represent a position are the number component and the three word components.

The number component will be parsed as a 32-bit unsigned integer.
For version 0, least significant bits 1-10 will be used for data.
Therefore, stem:[b_text(number)=10]

[stem]
++++
49
=b_text(number) + 3 * b_text(word)
=10 + 3 * b_text(word)
++++

[stem]
++++
b_text(word)
=13
++++

Therefore, each word component needs to represent 13 bits of information, or stem:[2^13=8192] words.

Using the responsibilities of each component we set in <<locality>>, the final layout can be determined.

NOTE: The layout of our encoding will be:

(From the example above)

[source]
----
All remaining bits (0)  :                                                    vvvvvvvvvvvvvv
Bit after data bits (1) :                                                   v
Data bits               :    vvvvvvvvvvvvvvvvvvvvvvvvvvvvv vvvvvvvvvvvvvvvvv
Face number             : vvv
Bit                     : 64              48               32              16             1
                        : |               |                |               |              |
                        : 01001011101010001011100010010011 10010011001001001100000000000000
Not represented         :                                                   ^^^^^^^^^^^^^^^
0000 (10 bits)          :                                         ^^^^^^^^^^
WORD1 (13 bits)         :                           ^^^^^^ ^^^^^^^
WORD2 (13 bits)         :              ^^^^^^^^^^^^^
WORD3 (13 bits)         : ^^^^^^^^^^^^^
----
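
As an illustration of the layout, here is a minimal sketch that slices a level-23 CellID into its four bit fields and reassembles it. It only extracts raw field values; how a 13-bit value is turned into a word is not specified here, and all names are my own assumptions.

[source,rust]
----
/// Low 15 bits of a level-23 CellID: a single 1 marker followed by 14 zeros.
const TRAILER_BITS: u32 = 15;
const TRAILER: u64 = 1 << 14;
const NUMBER_BITS: u32 = 10;
const WORD_BITS: u32 = 13;

/// Slices a level-23 CellID into `(number, [word1, word2, word3])` bit
/// fields, or returns `None` if the trailing bits do not mark level 23.
fn split_cell_id(cell_id: u64) -> Option<(u16, [u16; 3])> {
    if (cell_id & ((1u64 << TRAILER_BITS) - 1)) != TRAILER {
        return None;
    }
    let payload = cell_id >> TRAILER_BITS; // 49 bits: face + position
    let number = (payload & ((1u64 << NUMBER_BITS) - 1)) as u16;
    let word = |index: u32| -> u16 {
        let shift = NUMBER_BITS + index * WORD_BITS;
        ((payload >> shift) & ((1u64 << WORD_BITS) - 1)) as u16
    };
    Some((number, [word(0), word(1), word(2)]))
}

/// Inverse of `split_cell_id`: reassembles the 64-bit CellID.
fn join_cell_id(number: u16, words: [u16; 3]) -> u64 {
    let mut payload = u64::from(number) & ((1u64 << NUMBER_BITS) - 1);
    for (index, word) in words.iter().enumerate() {
        let shift = NUMBER_BITS + index as u32 * WORD_BITS;
        payload |= (u64::from(*word) & ((1u64 << WORD_BITS) - 1)) << shift;
    }
    (payload << TRAILER_BITS) | TRAILER
}
----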

Note that this only shows which bits each component is responsible for encoding; it does not specify exactly how to encode the selected bits.

=== Wordlist Selection [[wordlist-selection]]

Considerations when designing a wordlist:

. Word complexity
. Plural vs singular
. Homonyms
. Different languages
. Repetition
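
One way to handle the plural and homonym considerations is to let several spellings share one word index. This is only a sketch of that idea, not the project's wordlist format; the function name and the grouping of spellings are hypothetical.

[source,rust]
----
use std::collections::HashMap;

/// Builds a lookup from every accepted spelling to its word index.
/// Each entry of `groups` lists the spellings that share one index
/// (for example a singular, its plural, and any homonyms).
fn build_lookup(groups: &[&[&str]]) -> HashMap<String, u16> {
    let mut lookup = HashMap::new();
    for (index, spellings) in groups.iter().enumerate() {
        for spelling in spellings.iter() {
            lookup.insert(spelling.to_uppercase(), index as u16);
        }
    }
    lookup
}
----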

=== Implementation [[implementation]]

The link:https://s2geometry.io/[S2 Geometry^] project addressing scheme is exactly the format we want.
However, I couldn't find a concrete description of the math behind the projections without looking at the source code.
The implementation used in this algorithm must never change, because existing addresses might otherwise stop mapping to the same location.

Therefore, I will define the algorithm below, so it is independent from S2's link:https://github.com/google/s2geometry/blob/master/src/s2/s2point.h[Point^] and link:https://github.com/google/s2geometry/blob/master/src/s2/s2cell_id.h[CellID^] source code in case S2 ever changes their design, but it will be almost the exact same code as the current implementations.
All credit for the projections should go to the S2 team.

.TODO
[%header,cols="m,a,"]
|===
|Name |Format |Description

|(number, word1, word2, word3)
|number ∈ [1, 9999]

word1, word2, word3 ∈ wordlist

len(wordlist) ≅ 2000
|

|(cellid)
|cellid ∈ [0, TODO] ∩ ℤ
|Cell id: A 64-bit encoding of a face and a Hilbert curve parameter on that face, as discussed above.
The Hilbert curve parameter implicitly encodes both the position of a cell and its subdivision level.

|(face, i, j)
|face ∈ [0, 5] ∩ ℤ

i, j ∈ [0, 2^30^-1] ∩ ℤ

|Leaf-cell coordinates: The leaf cells are the subsquares that result after 30 levels of Hilbert curve subdivision, consisting of a 2^30^ × 2^30^ array on each face.
stem:[i] and stem:[j] are integers in the range [0, 2^30^-1] that identify a particular leaf cell.
The (i, j) coordinate system is right-handed on every face, and the faces are oriented such that Hilbert curves connect continuously from one face to the next.

|(face, s, t)
|face ∈ [0, 5] ∩ ℤ

s, t ∈ [0, 1] ∩ ℝ

|Cell-space coordinates: stem:[s] and stem:[t] are real numbers in the range [0,1] that identify a point on the given face.
For example, the point stem:[(s, t) = (0.5, 0.5)] corresponds to the center of the cell at level 0.
Cells in (s, t)-coordinates are perfectly square and subdivided around their center point, just like the Hilbert curve construction.

|(face, u, v)
|face ∈ [0, 5] ∩ ℤ

u, v ∈ [-1, 1] ∩ ℝ

|Cube-space coordinates: To make the cells at each level more uniform in size after they are projected onto the sphere, we apply a nonlinear transformation of the form stem:[u=f(s)], stem:[v=f(t)] before projecting points onto the sphere.
This function also scales the stem:[(u,v)]-coordinates so that each face covers the biunit square [-1,1]×[-1,1].
Cells in stem:[(u,v)]-coordinates are rectangular, and are not necessarily subdivided around their center point (because of the nonlinear transformation stem:[f]).

|(x, y, z)
|x, y, z ∈ [-1, 1] ∩ ℝ
|Spherical point: The final S2Point is obtained by projecting the (face, u, v) coordinates onto the unit sphere.
Cells in stem:[(x,y,z)]-coordinates are quadrilaterals bounded by four spherical geodesic edges.

|(lat, lon)
|lat ∈ [-90, 90]

lon ∈ [-180, 180]
|

|===

The encoding and decoding code is adapted from link:https://docs.rs/s2/latest/s2/[S2 Rust^] and link:https://github.com/golang/geo/blob/master/s2/cellid.go[S2 Go^].
It might seem like duplication, but separating the math from the code allows this algorithm to be replicated in any language.

.Encoding process
[%header,cols="2m,5a"]
|===
|Step |Code

|(lat, lon)
|Given

|(x, y, z)/Point
| [source,rust]
----
// `lat` and `lon` are in radians
fn lat_lon_to_xyz(lat: f64, lon: f64) -> (f64, f64, f64) {
    let x = lon.cos() * lat.cos();
    let y = lon.sin() * lat.cos();
    let z = lat.sin();

    (x, y, z)
}
----

|(face, u, v)
| [source,rust]
----
// Picks the cube face hit by the ray through (x, y, z):
// the axis with the largest magnitude, plus 3 if it is negative
fn face(x: f64, y: f64, z: f64) -> u8 {
    let (x_abs, y_abs, z_abs) = (x.abs(), y.abs(), z.abs());
    let mut id = 0;
    let mut value = x;
    if y_abs > x_abs {
        id = 1;
        value = y;
    }
    if z_abs > value.abs() {
        id = 2;
        value = z;
    }
    if value < 0. {
        id += 3;
    }
    id
}

fn xyz_to_fuv(x: f64, y: f64, z: f64) -> (u8, f64, f64) {
    let f: u8 = face(x, y, z);
    let (u, v) = match f {
        0 => (y / x, z / x),
        1 => (-x / y, z / y),
        2 => (-x / z, -y / z),
        3 => (z / x, y / x),
        4 => (z / y, -x / y),
        5 => (-y / z, -x / z),
        _ => panic!("Face {f} out of bounds"),
    };

    (f, u, v)
}
----

|(face, s, t)
| [source,rust]
----
// Quadratic transform from cube-space [-1, 1] to cell-space [0, 1]
pub fn u_or_v_to_s_or_t(u_or_v: f64) -> f64 {
    if u_or_v >= 0.0 {
        0.5 * (1.0 + 3.0 * u_or_v).sqrt()
    } else {
        1.0 - 0.5 * (1.0 - 3.0 * u_or_v).sqrt()
    }
}

fn fuv_to_fst(f: u8, u: f64, v: f64) -> (u8, f64, f64) {
    let s = u_or_v_to_s_or_t(u);
    let t = u_or_v_to_s_or_t(v);

    (f, s, t)
}

----

|(face, i, j)
| [source,rust]
----
// MAX_SIZE is 2^30: the number of leaf cells along one edge of a face
fn st_to_ij(s: f64) -> u32 {
    clamp((MAX_SIZE as f64 * s).floor() as u32, 0, MAX_SIZE_I32 - 1)
}

fn fst_to_fij(f: u8, s: f64, t: f64) -> (u8, u32, u32) {
    let i = st_to_ij(s);
    let j = st_to_ij(t);

    (f, i, j)
}
----

|(cellid)
| [source,go]
----
func cellIDFromFaceIJ(f, i, j int) CellID {
    // Note that this value gets shifted one bit to the left at the end
    // of the function.
    n := uint64(f) << (posBits - 1)
    // Alternating faces have opposite Hilbert curve orientations; this
    // is necessary in order for all faces to have a right-handed
    // coordinate system.
    bits := f & swapMask
    // Each iteration maps 4 bits of "i" and "j" into 8 bits of the Hilbert
    // curve position. The lookup table transforms a 10-bit key of the form
    // "iiiijjjjoo" to a 10-bit value of the form "ppppppppoo", where the
    // letters [ijpo] denote bits of "i", "j", Hilbert curve position, and
    // Hilbert curve orientation respectively.
    for k := 7; k >= 0; k-- {
        mask := (1 << lookupBits) - 1
        bits += ((i >> uint(k*lookupBits)) & mask) << (lookupBits + 2)
        bits += ((j >> uint(k*lookupBits)) & mask) << 2
        bits = lookupPos[bits]
        n \|= uint64(bits>>2) << (uint(k) * 2 * lookupBits)
        bits &= (swapMask \| invertMask)
    }
    return CellID(n*2 + 1)
}
----
[source,rust]
----
// Rust equivalent of cellIDFromFaceIJ above
fn fij_to_cellid(f: u8, i: u32, j: u32) -> u64 {
    let mut n = u64::from(f) << (POS_BITS - 1);
    let mut bits = u32::from(f & SWAP_MASK);

    let mut k = 7;
    let mask = (1 << LOOKUP_BITS) - 1;
    loop {
        bits += ((i >> (k * LOOKUP_BITS)) & mask) << (LOOKUP_BITS + 2);
        bits += ((j >> (k * LOOKUP_BITS)) & mask) << 2;
        bits = LOOKUP_POS[bits as usize] as u32;
        n \|= ((bits >> 2) as u64) << ((k * 2 * LOOKUP_BITS) as u64);
        bits &= u32::from(SWAP_MASK \| INVERT_MASK);

        if k == 0 {
            break;
        }
        k -= 1;
    }
    n * 2 + 1
}
----

|(number, word1, word2, word3)
|

|===

.Decoding process
[%header,cols="m,a,"]
|===
|Step |Formula |Description

|(number, word1, word2, word3)
|Given
|

|(cellid)
|
|

|(face, i, j)
|
|

|(face, s, t)
|
|

|(face, u, v)
|
|

|(x, y, z)/Point
|
|

|(lat, lon)
|
|

|===
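
The decoding table is not filled in yet. As a sketch of what two of its steps could look like, the functions below simply invert `u_or_v_to_s_or_t` and `lat_lon_to_xyz` from the encoding table algebraically. The names are my own and this is not taken from the S2 sources.

[source,rust]
----
/// Inverse of the quadratic `u_or_v_to_s_or_t` transform: maps a cell-space
/// coordinate in [0, 1] back to a cube-space coordinate in [-1, 1].
fn s_or_t_to_u_or_v(s_or_t: f64) -> f64 {
    if s_or_t >= 0.5 {
        (1.0 / 3.0) * (4.0 * s_or_t * s_or_t - 1.0)
    } else {
        (1.0 / 3.0) * (1.0 - 4.0 * (1.0 - s_or_t) * (1.0 - s_or_t))
    }
}

/// Inverse of `lat_lon_to_xyz`: returns `(lat, lon)` in radians.
fn xyz_to_lat_lon(x: f64, y: f64, z: f64) -> (f64, f64) {
    let lat = z.atan2((x * x + y * y).sqrt());
    let lon = y.atan2(x);
    (lat, lon)
}
----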
|
|
|||
|
|
|||
|
Sample source code:
|
|||
|
|
|||
|
[source,rust]
|
|||
|
----
|
|||
|
// Translating lat, lon to this_algorithm
|
|||
|
|
|||
|
impl<'a> From<&'a Point> for CellID {
|
|||
|
fn from(p: &'a Point) -> Self {
|
|||
|
let (f, u, v) = xyz_to_face_uv(&p.0);
|
|||
|
let i = st_to_ij(uv_to_st(u));
|
|||
|
let j = st_to_ij(uv_to_st(v));
|
|||
|
CellID::from_face_ij(f, i, j) // Important
|
|||
|
}
|
|||
|
}
|
|||
|
impl<'a> From<&'a LatLng> for CellID {
|
|||
|
fn from(ll: &'a LatLng) -> Self {
|
|||
|
let p: Point = ll.into();
|
|||
|
Self::from(&p)
|
|||
|
}
|
|||
|
}
|
|||
|
impl<'a> From<&'a CellID> for LatLng {
|
|||
|
fn from(id: &'a CellID) -> Self {
|
|||
|
LatLng::from(Point::from(id))
|
|||
|
}
|
|||
|
}
|
|||
|
impl<'a> From<&'a LatLng> for Point {
|
|||
|
fn from(ll: &'a LatLng) -> Self {
|
|||
|
let phi = ll.lat.rad();
|
|||
|
let theta = ll.lng.rad();
|
|||
|
let cosphi = phi.cos();
|
|||
|
Point(Vector {
|
|||
|
x: theta.cos() * cosphi,
|
|||
|
y: theta.sin() * cosphi,
|
|||
|
z: phi.sin(),
|
|||
|
})
|
|||
|
}
|
|||
|
}
|
|||
|
impl<'a> From<&'a CellID> for Point {
|
|||
|
fn from(id: &'a CellID) -> Self {
|
|||
|
Point(id.raw_point().normalize()) // Important
|
|||
|
}
|
|||
|
}
|
|||
|
impl CellID {
|
|||
|
pub fn raw_point(&self) -> Vector {
|
|||
|
let (face, si, ti) = self.face_siti();
|
|||
|
face_uv_to_xyz(
|
|||
|
face,
|
|||
|
st_to_uv(siti_to_st(si as u64)),
|
|||
|
st_to_uv(siti_to_st(ti as u64)),
|
|||
|
)
|
|||
|
}
|
|||
|
fn face_siti(&self) -> (u8, u32, u32) {
|
|||
|
let (face, i, j, _) = self.face_ij_orientation(); // <= Important
|
|||
|
let delta = if self.is_leaf() {
|
|||
|
1
|
|||
|
} else if (i ^ (self.0 as u32 >> 2)) & 1 != 0 {
|
|||
|
2
|
|||
|
} else {
|
|||
|
0
|
|||
|
};
|
|||
|
(face, 2 * i + delta, 2 * j + delta)
|
|||
|
}
|
|||
|
pub fn is_leaf(&self) -> bool {
|
|||
|
self.0 & 1 != 0
|
|||
|
}
|
|||
|
}
|
|||
|
----
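
For a leaf cell, `face_siti` uses `delta = 1`, so `si = 2 * i + 1` points at the centre of cell `i` on a grid twice as fine as the leaf grid. The sketch below checks that this centre maps back to the same `i`. It assumes S2's maximum level of 30 (so `MAX_SIZE` is 2^30 and `MAX_SITI` is 2^31). The helper names `leaf_center_st` and `st_to_leaf_index` are hypothetical and only mirror `siti_to_st` and `st_to_ij`.

```rust
const MAX_LEVEL: u32 = 30; // assumption: S2 leaf level
const MAX_SIZE: u64 = 1 << MAX_LEVEL; // leaf cells per face edge
const MAX_SITI: u64 = MAX_SIZE << 1; // (si, ti) grid is twice as fine

/// Centre of leaf cell `i` along one axis, as an s (or t) value in [0, 1].
pub fn leaf_center_st(i: u32) -> f64 {
    let si = 2 * u64::from(i) + 1; // delta = 1 for leaf cells
    si as f64 / MAX_SITI as f64
}

/// Leaf cell index containing the coordinate `s`, clamped to the valid range.
pub fn st_to_leaf_index(s: f64) -> u32 {
    let scaled = (MAX_SIZE as f64 * s).floor();
    scaled.clamp(0.0, (MAX_SIZE - 1) as f64) as u32
}
```

Because `(2 * i + 1) / 2^31` scaled by `2^30` is exactly `i + 0.5`, flooring recovers `i` without rounding error, which is why decoding a cell and re-encoding its centre yields the same cell.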
|
|||
|
|
|||
|
.Helper functions
|
|||
|
[source,rust]
|
|||
|
----
|
|||
|
pub fn siti_to_st(si: u64) -> f64 {
|
|||
|
if si > MAX_SITI {
|
|||
|
1f64
|
|||
|
} else {
|
|||
|
(si as f64) / (MAX_SITI as f64)
|
|||
|
}
|
|||
|
}
|
|||
|
|
|||
|
pub fn face_uv_to_xyz(face: u8, u: f64, v: f64) -> Vector {
|
|||
|
match face {
|
|||
|
0 => Vector::new(1., u, v),
|
|||
|
1 => Vector::new(-u, 1., v),
|
|||
|
2 => Vector::new(-u, -v, 1.),
|
|||
|
3 => Vector::new(-1., -v, -u),
|
|||
|
4 => Vector::new(v, -1., -u),
|
|||
|
5 => Vector::new(v, u, -1.),
|
|||
|
_ => unimplemented!(),
|
|||
|
}
|
|||
|
}
|
|||
|
pub fn st_to_uv(s: f64) -> f64 {
|
|||
|
if s >= 0.5 {
|
|||
|
(1. / 3.) * (4. * s * s - 1.)
|
|||
|
} else {
|
|||
|
(1. / 3.) * (1. - 4. * (1. - s) * (1. - s))
|
|||
|
}
|
|||
|
}
|
|||
|
|
|||
|
pub fn uv_to_st(u: f64) -> f64 {
|
|||
|
if u >= 0. {
|
|||
|
0.5 * (1. + 3. * u).sqrt()
|
|||
|
} else {
|
|||
|
1. - 0.5 * (1. - 3. * u).sqrt()
|
|||
|
}
|
|||
|
}
|
|||
|
|
|||
|
pub fn xyz_to_face_uv(r: &Vector) -> (u8, f64, f64) {
|
|||
|
let f = face(r);
|
|||
|
let (u, v) = valid_face_xyz_to_uv(f, r);
|
|||
|
(f, u, v)
|
|||
|
}
|
|||
|
|
|||
|
pub fn valid_face_xyz_to_uv(face: u8, r: &Vector) -> (f64, f64) {
|
|||
|
}
|
|||
|
|
|||
|
fn st_to_ij(s: f64) -> u32 {
|
|||
|
clamp((MAX_SIZE as f64 * s).floor() as u32, 0, MAX_SIZE_I32 - 1)
|
|||
|
}
|
|||
|
----
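
The helpers above call `face(r)` and `valid_face_xyz_to_uv` without showing how they work. The sketch below shows one way they could look, derived by inverting the `face_uv_to_xyz` table above. It is an illustration under assumptions: vectors are plain `[f64; 3]` arrays rather than the crate's `Vector`, the face is chosen as the axis with the largest absolute component, and ties are broken arbitrarily, which may differ from the library.

```rust
type Xyz = [f64; 3];

/// Cube face (0..=5) whose axis has the largest absolute component.
/// Faces 0, 1, 2 are +x, +y, +z; faces 3, 4, 5 are -x, -y, -z.
pub fn face(r: Xyz) -> u8 {
    let mut axis = 0;
    for k in 1..3 {
        if r[k].abs() > r[axis].abs() {
            axis = k;
        }
    }
    if r[axis] < 0.0 { axis as u8 + 3 } else { axis as u8 }
}

/// Projects a point onto its cube face, giving (face, u, v) with u, v in [-1, 1].
pub fn xyz_to_face_uv(r: Xyz) -> (u8, f64, f64) {
    let [x, y, z] = r;
    let f = face(r);
    // Each arm inverts the matching arm of `face_uv_to_xyz` below.
    let (u, v) = match f {
        0 => (y / x, z / x),
        1 => (-x / y, z / y),
        2 => (-x / z, -y / z),
        3 => (z / x, y / x),
        4 => (z / y, -x / y),
        _ => (-y / z, -x / z),
    };
    (f, u, v)
}

/// Same table as the helper above, on plain arrays (result is not normalized).
pub fn face_uv_to_xyz(f: u8, u: f64, v: f64) -> Xyz {
    match f {
        0 => [1.0, u, v],
        1 => [-u, 1.0, v],
        2 => [-u, -v, 1.0],
        3 => [-1.0, -v, -u],
        4 => [v, -1.0, -u],
        _ => [v, u, -1.0],
    }
}

/// Scales a vector to unit length.
pub fn normalize(r: Xyz) -> Xyz {
    let len = (r[0] * r[0] + r[1] * r[1] + r[2] * r[2]).sqrt();
    [r[0] / len, r[1] / len, r[2] / len]
}
```

Projecting a unit vector to `(face, u, v)` and back, then normalizing, should return the original vector up to floating-point error.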
|
|||
|
|
|||
|
== Interfaces [[interfaces]]
|
|||
|
|
|||
|
<<goals,Goal 2>> of this project is to provide many easy-to-use interfaces.
|
|||
|
|
|||
|
All interfaces *must* use minimal resources (network, CPU, RAM) so that they can run on virtually any device.
|
|||
|
|
|||
|
.Some Ideas for Interfaces
|
|||
|
. Command line
|
|||
|
** `this_algorithm -e -90,180` \=> `\...`
|
|||
|
** Useful for developers, test data generation, etc.
|
|||
|
** Easiest to write
|
|||
|
|
|||
|
. HTTP API
|
|||
|
** Simple HTTP endpoint for JSON/XML encoding/decoding
|
|||
|
** Will not require accounts and will not be resource intensive
|
|||
|
|
|||
|
. JavaScript/HTML File
|
|||
|
** Downloadable `.html` file with embedded CSS and JavaScript that can encode/decode offline
|
|||
|
|
|||
|
. Offline PWA
|
|||
|
** Useful for smartphones
|
|||
|
** Should work offline if possible, but can also integrate with Google Maps/JAWG/OSM when an internet connection is available
|
|||
|
** Acts as a share target, so users can translate any map link (Google Maps/OSM/Apple Maps) to this_algorithm
|
|||
|
|
|||
|
. OsmAnd/other existing mapping applications
|
|||
|
** Requires input from these applications
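
As a sketch of what the command line interface would need before encoding, the function below parses a `LAT,LON` argument such as the `-90,180` in the example above and rejects out-of-range values. The name `parse_lat_lon` and the error messages are assumptions made for illustration, and the flag handling itself is not shown.

```rust
/// Parses "LAT,LON" in decimal degrees, e.g. "-90,180".
pub fn parse_lat_lon(arg: &str) -> Result<(f64, f64), String> {
    let (lat_str, lon_str) = arg
        .split_once(',')
        .ok_or_else(|| format!("expected LAT,LON but got {:?}", arg))?;
    let lat: f64 = lat_str
        .trim()
        .parse()
        .map_err(|e| format!("bad latitude {:?}: {}", lat_str, e))?;
    let lon: f64 = lon_str
        .trim()
        .parse()
        .map_err(|e| format!("bad longitude {:?}: {}", lon_str, e))?;
    // Range checks also reject NaN, because NaN is never inside a range.
    if !(-90.0..=90.0).contains(&lat) {
        return Err(format!("latitude {} is outside [-90, 90]", lat));
    }
    if !(-180.0..=180.0).contains(&lon) {
        return Err(format!("longitude {} is outside [-180, 180]", lon));
    }
    Ok((lat, lon))
}
```

Validating at the boundary keeps the encoder itself free of input checks, which suits the goal of small, low-resource interfaces.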
|
|||
|
|
|||
|
++++
|
|||
|
<style>
|
|||
|
#header, #content, #footnotes, #footer {
|
|||
|
max-width: unset !important;
|
|||
|
}
|
|||
|
.hll {
|
|||
|
background-color: #ff0;
|
|||
|
}
|
|||
|
</style>
|
|||
|
++++
|