Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

We do something similar for some limited geospatial search using elastic search. We make a set of h3 indexes for each of the hundreds of millions of gps recordings on our service, and store them in elastic search. Geospatial queries become full text search queries, where a point is on the line if the set of h3 indexes contains the point. You can do queries on how many cells overlap, which lets you match geospatial tracks on the same paths, and with ES coverage queries, you can tune how much overlap you want.

Instead of using integers IDs for the hexes, we created an encoded version of the ID that has the property that removing a character gets you the containing parent of the cell. This means we can do basic containment queries by querying with a low resolution hex (short string) as a prefix query. If a gps track goes through this larger parent cell, the track will have hexes with the same prefix. You don’t get perfect control of distances because hexes have varying diameters (or rather the approximation, since they aren’t circles they are hexes), but in practice and at scale for a product that doesn’t require high precision, it’s very effective.

I think at the end of this year we’ll have about 6tb of these hex sets in a four node 8 process ES cluster. Performance is pretty good. Also acts as our full text search. Half the time we want a geo search we also want keyword / filtering / etc on the metadata of these trips.

Pretty fun system to build, and the concept works with a wide variety of data stores. Felt like a total hack job but it has stood the test of time.

Thanks uber, h3 is a great library!



Elastisearch and Opensearch have a built in geo_shape type that is a bit more optimal for queries like this.

Before that existed (pre 1.0 actually), I did something similar with geohashes, which are similar to h3 but based on simple string encoded quad trees. I indexed all the street segments in openstreetmap with that (~800 million at the time) and implemented a simple reverse geocoder. Worked shockingly well.

The geo_shape type uses a bkd tree in binary format. It's heavily optimized for this type of intersects/overlaps queries at scale. Basically does the same thing but using a lot less disk space and memory. It's similar to what you would find in proper GIS databases. Elasticsearch/opensearch also support h3 and geohash grid aggregations on top of geo_shape or geo_point types.

I'm guessing the author is using something like postgresql which of course has similar geospatial indexing support via post gis.


Doesn’t meet all our product requirements unfortunately. We used returned hexes in certain queries, and we also hacked in directionality of line using least significant 12 bits of the hex (didn’t need that level of hex precision), and we are doing direction oriented matching and counting. For simpler use cases it’s definitely a better option. thanks for reminding me and other people reading my comment!


Beware that the parent hexagon does not contain its children...


No idea if they are doing this, but you can use Gosper islands (https://en.wikipedia.org/wiki/Gosper_curve) which are close to hexagons, but can be exactly decomposed into 7 smaller copies.


Can Gosper islands tile the sphere though?


Yes! A Gosper Island in H3 is just the outline of all the descendants of a cell at a some resolution. The H3 cells at that resolution tile the sphere, and the Gosper Islands are just non-overlapping subsets of those cells, which means they tile the sphere.


Not quite - you need 12 pentagons in a mostly hexagonal tiling of the sphere (and if you're keeping them similar sizes, Gosper-islands force hexagon-like adjacency). I don't think it's possible to tile the sphere using more than 20 exactly identical pieces.

You could get a Gosper-island like tiling starting from H3 by saying that each "Hex" is defined recursively to be the union of its 6/7 parts (stopping at some small enough hexagons/pentagons if you really want). Away from the pentagons, these tiles would be very close to Gosper islands.


> I don't think it's possible to tile the sphere using more than 20 exactly identical pieces.

I was wrong about this (e.g. https://en.wikipedia.org/wiki/Rhombic_triacontahedron). It still seems possible to me that there's a limit to the smallest tile that can tile a unit sphere on its own. (Smallest by diameter as a set of points in R^3).


Awesome comment, thanks for sharing the details. I love this kind of pragmatic optimization. Also, one dev's "total hack* job" [e.g. yourself, in the past] is another's stroke of genius.

* I'd frame it as "kludge", reserving "hack" for the positive HN sense. :)


Very cool! And the prefix queries you mention are what I was trying to get at in another comment, but you explained it better :)


Does this effect writes negatively?


Not any differently than another indexed text field


Its pretty good then.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: