diff --git a/docs/DESIGN.adoc b/docs/DESIGN.adoc
index 8ff6240..c23b331 100644
--- a/docs/DESIGN.adoc
+++ b/docs/DESIGN.adoc
@@ -264,55 +264,73 @@ Where stem:[text(num_prefix)] is the number of number/letter combinations in the
 .Comparison of approximate wordlist size for different formats, sorted by wordlist size
 [%header,cols="m,,,"]
 |===
-|Format |n~23~ (thousand) |n~22~ (thousand) |Consider?
+|Format |n~22~ (thousand) |n~23~ (thousand) |Consider?

 |999 WORD WORD
-|649.62
 |324.04
+|649.62
 |No, wordlist too big

 |WORD WORD WORD
-|75.01
 |47.18
+|75.01
 |No, this is the exact format what3words uses

 |(0-999)(A-Z0-9) WORD WORD
-|108.27
-|54.01
+footnote:wordlistsize[The letters `OD0Q LI1 Z2 S5 8B` would be excluded from the alphanumeric list, making the number of alphanumeric characters 36-13=23]
+|62.40
+|125.00
 |No, wordlist too big

 |(0-999)(A-Z0-9)(A-Z0-9) WORD WORD
-|18.04
-|9.00
+footnote:wordlistsize[]
+|14.10
+|28.20
 |No, wordlist too big

 |(1-128) WORD WORD WORD
-|14.88
 |9.36
+|14.88
 |No, restricting numbers to 128 is not worth it

 |999 WORD WORD WORD
-|7.50
 |4.72
+|7.50
 |Maybe

 |WORD WORD WORD WORD
-|4.53
 |3.20
+|4.53
 |No, this does not look like an address

 |9999 WORD WORD WORD
-|3.48
 |2.19
+|3.48
 |Maybe

+|(3-9A-Y)(3-9A-Y)(3-9A-Y) WORD WORD
+footnote:wordlistsize[]
+|92.90
+|186.00
+|No, wordlist too big
+
+|(3-9A-Y)(3-9A-Y)(3-9A-Y) WORD WORD WORD
+footnote:wordlistsize[]
+|2.05
+|3.26
+|Maybe -- contender
+
 |(1-1024) WORD WORD WORD
-|7.44
 |4.68
-|Maybe
+|7.44
+|Maybe -- contender

 |===

+TODO: Decide if the second to last implementation makes sense.
+The alphanumeric component needs to encode 13 bits if the word component only encodes 12 bits each (stem:[49-12*3=13]).
+stem:[log_2(23^3) = log_2(12.20*10^3) = 13.6], so there is enough room.
+
 NOTE: This project will use the `(1-1024) WORD0 WORD1 WORD2` variation (1 number component, and 3 word component).
 It is longer than Xaddress and what3words, but with the tradeoff of having a significantly smaller dictionary than both.

@@ -750,6 +768,43 @@ fn st_to_ij(s: f64) -> u32 {
 }
 ----

+=== Multi-encoding
+
+TODO: Describe more
+
+Due to the <> property, nearby locations will likely have common suffixes (`123 APPLE ORANGE GRAPE` will be close to `876 APPLE ORANGE GRAPE`).
+This is a useful property that might be used to encode multiple addresses together.
+
+For example, one might consider an encoding like `123 AND 876 APPLE ORANGE GRAPE` to encode both addresses above, with the `AND` conjoining keyword causing a fork at its position:
+
+* `111 AND 222 A B C` => `111 A B C` and `222 A B C`
+* `111 A AND 222 D B C` => `111 A B C` and `222 D B C`
+* `111 A B AND 222 D E C` => `111 A B C` and `222 D E C`
+* `111 A B C AND 222 D E F` => `111 A B C` and `222 D E F`
+
+If two addresses have the same component, it might look slightly strange:
+
+* `111 A B AND 111 D E C` => `111 A B C` and `111 D E C`
+
+It must also be noted that ordering might cause issues.
+For example, to encode `111 A B C`, `222 D E F`, and `333 A B C`:
+
+* Without ordering, it's simple: `111 AND 333 A B C AND 222 D E F`
+* With ordering, it's complicated: `111 A B C AND 222 D E F AND 333 A B C`
+
+=== Compact hashing/Emojis
+
+TODO: Consider this more
+
+It would be useful, especially in multi-encoding, to generate a hash that maps to one or two (no more than 3) small pictures that can be used for verification.
+This would allow verification at a quick glance.
+
+Considerations:
+
+* The list of emojis/pictures _must_ be large in order to be more effective.
+* When using emoji, the exact same emoji set must be used.
+Otherwise, different emoji sets can cause issues: one person might describe a "blue rocket" on their device, which might be displayed as a green rocket on another device.
+
 == Sample Data [[sample-data]]

 In order to test this algorithm out, I want to ensure the conversions are not incorrect.
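
The `AND` fork rule from the Multi-encoding section can be sketched in Rust as follows. This is only one possible reading of the examples, not a settled design. It assumes that every address has exactly four components (one number plus three words, as in the `(1-1024) WORD0 WORD1 WORD2` format), that a short segment borrows its missing trailing components from the address immediately to its right, and that the final segment is always complete. The names `expand_multi` and `COMPONENTS` are hypothetical.

```rust
/// Components per address: 1 number + 3 words (an assumption taken from the NOTE).
const COMPONENTS: usize = 4;

/// Expands a multi-encoded string such as `111 A B AND 222 D E C`
/// into a list of full addresses (hypothetical helper).
/// Returns `None` if a segment is empty, has more than `COMPONENTS` parts,
/// or if the last segment is incomplete and has nothing to borrow from.
fn expand_multi(input: &str) -> Option<Vec<String>> {
    // Split the token stream into segments at every `AND` keyword.
    let tokens: Vec<&str> = input.split_whitespace().collect();
    let segments: Vec<&[&str]> = tokens.split(|t| *t == "AND").collect();

    let mut resolved: Vec<Vec<&str>> = Vec::with_capacity(segments.len());

    // Walk right to left so each short segment can borrow from its right neighbour.
    for &seg in segments.iter().rev() {
        if seg.is_empty() || seg.len() > COMPONENTS {
            return None;
        }
        let mut full: Vec<&str> = seg.to_vec();
        if full.len() < COMPONENTS {
            // The most recently pushed address is the neighbour to the right.
            let next = resolved.last()?;
            // Copy the trailing components that this segment is missing.
            full.extend_from_slice(&next[seg.len()..]);
        }
        resolved.push(full);
    }

    // Restore the left-to-right order of the input.
    resolved.reverse();
    Some(resolved.iter().map(|parts| parts.join(" ")).collect())
}
```

Under these assumptions, `expand_multi("111 A B AND 222 D E C")` yields `111 A B C` and `222 D E C`, and the unordered example `111 AND 333 A B C AND 222 D E F` yields `111 A B C`, `333 A B C`, and `222 D E F`. An input whose last segment is incomplete, such as `111 A B AND 222`, is rejected with `None`.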