Add more info to docs

Austen Adler 2023-03-05 23:46:11 -05:00
parent af4dfae163
commit 876a7c2f38


@ -264,55 +264,73 @@ Where stem:[text(num_prefix)] is the number of number/letter combinations in the
.Comparison of approximate wordlist size for different formats, sorted by wordlist size
[%header,cols="m,,,"]
|===
|Format |n~23~ (thousand) |n~22~ (thousand) |Consider?
|Format |n~22~ (thousand) |n~23~ (thousand) |Consider?
|999 WORD WORD
|649.62
|324.04
|649.62
|No, wordlist too big
|WORD WORD WORD
|75.01
|47.18
|75.01
|No, this is the exact format what3words uses
|(0-999)(A-Z0-9) WORD WORD
|108.27
|54.01
footnote:wordlistsize[The letters `OD0Q LI1 Z2 S5 8B` would be excluded from the alphanumeric list, making the number of alphanumeric characters 36-13=23]
|62.40
|125.00
|No, wordlist too big
|(0-999)(A-Z0-9)(A-Z0-9) WORD WORD
|18.04
|9.00
footnote:wordlistsize[]
|14.10
|28.20
|No, wordlist too big
|(1-128) WORD WORD WORD
|14.88
|9.36
|14.88
|No, restricting numbers to 128 is not worth it
|999 WORD WORD WORD
|7.50
|4.72
|7.50
|Maybe
|WORD WORD WORD WORD
|4.53
|3.20
|4.53
|No, this does not look like an address
|9999 WORD WORD WORD
|3.48
|2.19
|3.48
|Maybe
|(3-9A-Y)(3-9A-Y)(3-9A-Y) WORD WORD
footnote:wordlistsize[]
|92.90
|186.00
|No, wordlist too big
|(3-9A-Y)(3-9A-Y)(3-9A-Y) WORD WORD WORD
footnote:wordlistsize[]
|2.05
|3.26
|Maybe -- contender
|(1-1024) WORD WORD WORD
|7.44
|4.68
|Maybe
|7.44
|Maybe -- contender
|===
TODO: Decide if the second to last implementation makes sense.
The alphanumeric component needs to encode 13 bits if the word component only encodes 12 bits each (stem:[49-12*3=13]).
stem:[log_2(23^3) = log_2(12.17*10^3) = 13.57], so there is enough room for the 13 bits.
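The bit-budget check above can be reproduced with a few lines of integer arithmetic. This is only a sketch: the function names `whole_bits` and `prefix_bits_needed` are hypothetical and are not part of the project's code.

```rust
/// Whole bits that `symbols^positions` distinct values can encode,
/// i.e. floor(log2(symbols^positions)).
/// Assumes `symbols >= 1` and that the power fits in a u64.
fn whole_bits(symbols: u64, positions: u32) -> u32 {
    let combinations = symbols.pow(positions);
    // For a non-zero u64, floor(log2(x)) is 63 minus the number of leading zero bits.
    63 - combinations.leading_zeros()
}

/// Bits the prefix must carry once `words` words of `bits_per_word` bits
/// each have been subtracted from the total.
fn prefix_bits_needed(total_bits: u32, words: u32, bits_per_word: u32) -> u32 {
    total_bits - words * bits_per_word
}
```

With the numbers from the text, `prefix_bits_needed(49, 3, 12)` is 13 and `whole_bits(23, 3)` is also 13, so three characters from a 23-symbol alphabet are just enough.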
NOTE: This project will use the `(1-1024) WORD0 WORD1 WORD2` variation (1 number component and 3 word components).
It is longer than Xaddress and what3words addresses, but in exchange it needs a significantly smaller dictionary than either.
@ -750,6 +768,43 @@ fn st_to_ij(s: f64) -> u32 {
}
----
=== Multi-encoding
TODO: Describe more
Due to the <<locality>> property, nearby locations will likely have common suffixes (`123 APPLE ORANGE GRAPE` will be close to `876 APPLE ORANGE GRAPE`).
This property could be used to encode multiple addresses together.
For example, one might consider an encoding like `123 AND 876 APPLE ORANGE GRAPE` to encode both addresses above, with the `AND` conjoining keyword causing a fork at its position:
* `111 AND 222 A B C` => `111 A B C` and `222 A B C`
* `111 A AND 222 D B C` => `111 A B C` and `222 D B C`
* `111 A B AND 222 D E C` => `111 A B C` and `222 D E C`
* `111 A B C AND 222 D E F` => `111 A B C` and `222 D E F`
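The fork rule in the examples above can be sketched as a small parser. This is a minimal sketch under several assumptions that the text does not settle: exactly one `AND` per input, a complete address on the right-hand side, `AND` never appearing in the wordlist, and a hypothetical function name `expand_pair`.

```rust
/// Components in a full address: one number plus three words.
const COMPONENTS: usize = 4;

/// Expands `"<address prefix> AND <full address>"` into two full addresses.
/// Returns `None` if the input does not have that shape.
fn expand_pair(input: &str) -> Option<(String, String)> {
    let tokens: Vec<&str> = input.split_whitespace().collect();
    // Locate the conjoining keyword; inputs without it are rejected.
    let and_at = tokens.iter().position(|t| *t == "AND")?;
    let (left, rest) = tokens.split_at(and_at);
    let right = &rest[1..]; // skip the `AND` token itself

    // The right side must be complete, the left side a non-empty prefix,
    // and a second `AND` is outside the scope of this sketch.
    if right.len() != COMPONENTS
        || left.is_empty()
        || left.len() > COMPONENTS
        || right.contains(&"AND")
    {
        return None;
    }

    // The left address borrows its missing trailing components from the right.
    let mut first: Vec<&str> = left.to_vec();
    first.extend_from_slice(&right[left.len()..]);

    Some((first.join(" "), right.join(" ")))
}
```

For instance, `expand_pair("111 A AND 222 D B C")` yields the pair `111 A B C` and `222 D B C`, matching the second bullet.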
If two addresses share the same number component, the result might look slightly strange:
* `111 A B AND 111 D E C` => `111 A B C` and `111 D E C`
Ordering might also cause issues.
For example, to encode `111 A B C`, `222 D E F`, and `333 A B C`:
* If the order of the addresses does not matter, it's simple: `111 AND 333 A B C AND 222 D E F`
* If the order must be preserved, it's more complicated: `111 A B C AND 222 D E F AND 333 A B C`
=== Compact hashing/Emojis
TODO: Consider this more
It would be useful, especially in multi-encoding, to generate a hash that maps to one or two (no more than 3) small pictures that can be used for verification.
This would allow an address to be verified at a quick glance.
Considerations:
* The list of emojis/pictures _must_ be large in order to be effective.
* When using emoji, the exact same emoji set must be used.
Otherwise, one person might describe a "blue rocket" on their device, which might be displayed as a green rocket on another device.
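One way to picture the idea is to hash the address text and use the hash to pick entries from a fixed picture list. The sketch below is an assumption throughout: the text names no hash function or picture list, so FNV-1a is used only because it is tiny and deterministic, and `PICTURES`, `fnv1a_64`, and `verification_pictures` are hypothetical names. Pictures are referred to by name rather than by glyph to sidestep the emoji-set problem noted above.

```rust
/// Hypothetical picture list; a real list would need to be much larger.
const PICTURES: [&str; 8] = [
    "rocket", "tree", "anchor", "bell", "key", "moon", "fish", "star",
];

/// 64-bit FNV-1a hash of a byte slice.
fn fnv1a_64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325; // FNV offset basis
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x100000001b3); // FNV prime
    }
    hash
}

/// Picks `count` pictures from `PICTURES` based on the hash of `address`.
/// The same address text always yields the same pictures.
fn verification_pictures(address: &str, count: usize) -> Vec<&'static str> {
    let len = PICTURES.len() as u64;
    let mut hash = fnv1a_64(address.as_bytes());
    let mut out = Vec::with_capacity(count);
    for _ in 0..count {
        // Use the remainder as an index, then discard the consumed part of the hash.
        out.push(PICTURES[(hash % len) as usize]);
        hash /= len;
    }
    out
}
```

Calling `verification_pictures("123 AND 876 APPLE ORANGE GRAPE", 2)` returns two picture names that both parties can compare. With only 8 pictures, collisions would be frequent, which is why the list must be large.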
== Sample Data [[sample-data]]
To test this algorithm, I want to ensure the conversions are correct.