
Very nice. Worth keeping in mind prior examples for comparison's sake. My favorites so far:

- https://www.opendatanetwork.com -- what I would call the "Google, for Socrata datasets"

- https://public.enigma.com/ -- One of the best collections of U.S. federal data, with good taxonomy and lots of useful options for refining a search, such as filtering by dataset size.

- https://www.data.gov/ -- Not as useful as what most people would want -- e.g. unlike Enigma and Socrata, it's a directory of self-submitted (by the government) data sources, not one in which the data is stored/provided in a standardized way. But it's a pretty good listing, though not sure if it's much better than just using Google.

- https://data.gov.uk/ -- Better than the U.S. version in terms of usability and taxonomy.




@danso thanks for the feedback on data.gov. I'm part of the small (3 person) team that helps to manage it. If you have a moment to chat, I'd like to reach out and see whether you'd be interested in participating in some more in-depth user research in the future. Folks can also always leave feedback via email, GitHub, Twitter, and other means - https://www.data.gov/contact

The Federal Data Strategy will also be opening up for comments again in October - https://strategy.data.gov/feedback/

Data.gov and Federal agencies use the same metadata standard (DCAT) that Google Dataset Search is using, so much of our metadata is also being syndicated there.


I think the biggest blocker with using publicly available datasets is stale data.

If you, or anyone else who aggregates these datasets, could make it EASY to find the FREQUENCY of updates, rather than just the LAST UPDATED timestamp, it'd incentivize people to consume APIs more.

I realize having a snapshot from 2014 is better than what was publicly available before. But I feel no one's really talked about why they would or wouldn't use particular data.
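To make the frequency-versus-timestamp point concrete, here is a minimal sketch of a staleness check. It assumes a catalog record exposes a last-modified date plus a declared update cadence written as an ISO 8601 repeating interval; the field names `modified` and `accrualPeriodicity`, the day counts, and the sample record are all assumptions for illustration, not taken from any specific portal.

```python
from datetime import date, timedelta

# Approximate day counts for a few ISO 8601 repeating-interval codes.
# Illustrative and deliberately incomplete (an assumption, not a standard table).
PERIOD_DAYS = {
    "R/P1D": 1,    # daily
    "R/P1W": 7,    # weekly
    "R/P1M": 31,   # monthly
    "R/P3M": 92,   # quarterly
    "R/P1Y": 366,  # annual
}


def is_stale(record, today, grace=1.5):
    """Return True if the record has missed its declared update cadence.

    Returns None when the cadence is missing or not recognised, which is
    exactly the case where a bare "last updated" timestamp tells you little.
    """
    period = PERIOD_DAYS.get(record.get("accrualPeriodicity"))
    if period is None:
        return None
    modified = date.fromisoformat(record["modified"])
    # Allow some slack (grace) before calling a dataset stale.
    return today - modified > timedelta(days=period * grace)


# Hypothetical record, not taken from any real catalog.
record = {
    "title": "Example rainfall totals",
    "modified": "2014-06-30",
    "accrualPeriodicity": "R/P1M",
}
print(is_stale(record, today=date(2018, 9, 5)))
```

A 2014 snapshot that declares a monthly cadence is flagged as stale, while the same timestamp with no declared cadence comes back as unknown.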


I think this is exactly correct. Frequency of updates (and clear documentation of the lag between when data is reported and the period the data applies to) is often missing or hard to find.

The value of increasing the cadence of updates should also not be understated! A lot of public datasets report at annual frequencies with more than a quarter of delay... Although this is a different issue altogether that has more to do with the processes of the reporting agency.
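The lag described above can be sketched as a simple date difference, assuming a record states both the period it covers and the date it was published. The field names `temporal` (a "start/end" pair of ISO 8601 dates) and `issued`, and the sample record, are hypothetical.

```python
from datetime import date


def reporting_lag_days(record):
    """Days between the end of the covered period and the publication date."""
    # "temporal" is assumed to be "start/end" with ISO 8601 dates.
    _start, end = record["temporal"].split("/")
    # Keep only the YYYY-MM-DD part in case a time component is present.
    period_end = date.fromisoformat(end[:10])
    issued = date.fromisoformat(record["issued"][:10])
    return (issued - period_end).days


# Hypothetical annual dataset published several months after year end.
record = {
    "title": "Example annual survey",
    "temporal": "2016-01-01/2016-12-31",
    "issued": "2017-05-15",
}
print(reporting_lag_days(record))
```

A portal that surfaced this number next to each dataset would show at a glance whether "annual" means available in January or available the following summer.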


Yes, it's interesting how much difference the data about data management can make in people's engagement with the platform.


Definitely, feel free to email me (in my user bio). Thanks for the info about the upcoming comment period, will have to put a reminder on my calendar for that.


Interesting. There have been a lot of attempts at "meta data portals" that search across portals. Most of them have struggled.

At Open Knowledge we built a really early one called opendatasearch.org in 2011/2012 - now defunct - and were involved in the first version of the pan EU open data portal. We also had the original https://ckan.net/ (and subsites) which is now https://datahub.io/ and has become much more focused on quality data and data deployment. [Disclosure: I was/am involved in many of these projects]

The challenge, as others have mentioned, is that data quality is very variable and searching for datasets is complicated (think of software as an analogy - searching for good code libraries is a bit of an art).

I imagine Google are trying this out before making datasets another "special type" of search result -- after all, you can already search Google for datasets. In addition, Google are already Google, so including datasets will have a level of comprehensiveness and exposure you struggle with elsewhere (part of the power of monopoly in a sense!).

PS: for those looking for data gov sites https://dataportals.org/ has most of these.


https://data.gov.ie -- this is a repository of a bunch of the open data produced in the Irish public sector.

(Disclosure, I work on this)


Playing around, a lot of countries use this url scheme:

https://data.gov.be

http://data.gov.ro/

https://www.dati.gov.it/ (note, Italy redirects data. to dati.)

https://data.gov.pl/

https://data.gov.za/ (South Africa's has a cert problem)

https://data.gov.in

https://data.gov.au/

https://datos.gob.mx/


Although data.gov.pl exists, and even presents a valid cert, it has no content. The place with the data is:

https://danepubliczne.gov.pl/en/#

(website is being terribly slow for me right now)







I'm a particular fan of the boundary "dataset" you have that's a low-resolution TIFF file.

(edit to add something more productive: the site is littered, to the tune of at least 25% and maybe even a third, with junk "data", all obviously added to get the number of records as high as possible, with no regard to whether that data is useful to anybody, is machine-readable in any way at all, or -- in the example above -- even qualifies as "data". Data.gov.ie would be moderately interesting if all the shit in it was removed.)


The quality of the datasets varies greatly depending on the source. Some work well, some, less so. There are data sources that are undergoing active development to harvest them more accurately. None of them were added to pump up the numbers.

The biggest numbers bump recently was ca. 1,600 Met Éireann rainfall record datasets from all around the country, some of them daily rainfall dating back 60 years. (Spoiler, there’s a lot of rain)

This is specifically a catalog of data sets, it doesn’t host the data except for previews, and even doing that is pretty complicated in all its glory.


> None of them were added to pump up the numbers.

Then kindly explain these

https://data.gov.ie/advanced_search?query=&texttype=anywhere...

Or these

https://data.gov.ie/advanced_search?query=&texttype=anywhere...

(I'd show only the PDF-only ones but your search doesn't work.)

Oh, and look: two of these also contain no machine-readable data whatsoever

https://data.gov.ie/advanced_search?query=&texttype=anywhere...


Are you going to release any street address datasets at some point?


I believe that the Eircode dataset is one of the most highly requested sources, but it’s a private, for-profit database.

If you are aware of such a dataset that a public body is hosting, then it would certainly be something to include. Convincing (and helping) the public bodies to publish their data is still a big task.


I believe that An Post built a similar system many moons ago, and they should presumably be more amenable to open-sourcing.

The handling of the whole Eircode thing makes my blood boil, to be honest.


Blatantly self-promoting, but we (in Australia) are trying to develop a general solution for better open data search over at https://search.data.gov.au - solution is all open source at https://github.com/magda-io/magda.


Also, https://data.gov.in -- Lots of self-submitted datasets, and includes an API. Usability is average, much like the other government data sources.


Another example is Elsevier's Data Search project

- https://datasearch.elsevier.com/


I really like NYC Open Data: https://opendata.cityofnewyork.us/



