- https://public.enigma.com/ -- One of the best collections of U.S. federal data, with good taxonomy and lots of useful options for refining a search, such as filtering by dataset size.
- https://www.data.gov/ -- Not as useful as most people would want. Unlike Enigma and Socrata, it's a directory of data sources self-submitted by government agencies, not one in which the data is stored or provided in a standardized way. It's a pretty good listing, though I'm not sure it's much better than just using Google.
- https://data.gov.uk/ -- Better than the U.S. version in terms of usability and taxonomy.
@danso thanks for the feedback on data.gov. I'm part of the small (3-person) team that helps to manage it. If you have a moment to chat, I'll reach out to see if you'd be interested in participating in some more in-depth user research in the future. Folks can also always leave feedback via email, GitHub, Twitter, and other means - https://www.data.gov/contact
Data.gov and Federal agencies use the same metadata standard (DCAT) that Google Dataset Search is using, so much of our metadata is also being syndicated there.
I think the biggest blocker with using publicly available datasets is stale data.
If you, or anyone else who aggregates these datasets, could make it EASY to find the FREQUENCY of updates, rather than just the LAST UPDATED timestamp, it'd incentivize people to consume APIs more.
I realize having a snapshot from 2014 is better than what was publicly available before. But I feel no one's really talked about why they would or wouldn't use particular data.
I think this is exactly correct. The frequency of updates (and clear documentation of the lag between when data is reported and the period it applies to) is often missing or hard to find.
The value of increasing the cadence of updates should also not be understated! A lot of public datasets are reported annually, with more than a quarter of delay... Although this is a different issue altogether that has more to do with the processes of the reporting agency.
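To make the "frequency vs. last updated" point concrete, here is a minimal sketch of how a consumer could flag overdue datasets. It assumes a catalog entry shaped like the Project Open Data `data.json` schema, where `modified` is an ISO 8601 date and `accrualPeriodicity` is an ISO 8601 repeating duration such as `R/P1Y`. The function names, the 30-day month and 365-day year approximations, and the `grace` factor are my own illustrative choices, not anything a portal provides.

```python
import re
from datetime import date, timedelta

# Matches simple ISO 8601 repeating durations such as "R/P1Y", "R/P3M", "R/P1D".
# Fractional or time-based values (e.g. "R/PT1S") are deliberately not handled.
_PERIOD = re.compile(r"^R/P(?:(\d+)Y)?(?:(\d+)M)?(?:(\d+)W)?(?:(\d+)D)?$")


def period_days(accrual):
    """Approximate length in days of a repeating duration, or None if unknown."""
    m = _PERIOD.match(accrual or "")
    if not m or not any(m.groups()):
        return None
    years, months, weeks, days = (int(g) if g else 0 for g in m.groups())
    # Rough calendar approximations; good enough for a staleness heuristic.
    return years * 365 + months * 30 + weeks * 7 + days


def is_stale(entry, today, grace=1.5):
    """True if overdue, False if on schedule, None if it cannot be determined.

    `grace` (hypothetical knob) allows some slack beyond the declared period.
    """
    days = period_days(entry.get("accrualPeriodicity"))
    modified = entry.get("modified")
    if days is None or not modified:
        return None
    last = date.fromisoformat(modified[:10])  # ignore any time-of-day suffix
    return today - last > timedelta(days=days * grace)
```

An entry declared as annual (`R/P1Y`) but last modified in 2014 would come back as stale, while an entry with no declared periodicity returns `None`, which is itself a useful signal that the publisher has not documented its cadence.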
Definitely, feel free to email me (in my user bio). Thanks for the info about the upcoming comment period, will have to put a reminder on my calendar for that.
Interesting. There have been a lot of attempts at "meta data portals" that search across portals. Most of them have struggled.
At Open Knowledge we built a really early one called opendatasearch.org in 2011/2012 - now defunct - and were involved in the first version of the pan EU open data portal. We also had the original https://ckan.net/ (and subsites) which is now https://datahub.io/ and has become much more focused on quality data and data deployment. [Disclosure: I was/am involved in many of these projects]
The challenge, as others have mentioned, is that data quality is very variable and searching for datasets is complicated (think of software as an analogy - searching for good code libraries is a bit of an art).
I imagine Google are trying this out before making datasets another "special type" of search result -- after all, you can already search Google for datasets. In addition, Google are already Google, so including datasets will have a level of comprehensiveness and exposure you struggle with elsewhere (part of the power of monopoly in a sense!).
I'm a particular fan of the boundary "dataset" you have that's a low-resolution TIFF file.
(edit to add something more productive: at least 25% of the site, and maybe even a third, is junk "data", all obviously added to get the number of records as high as possible, with no regard to whether that data is useful to anybody, is machine-readable in any way at all, or -- as in the example above -- even qualifies as "data". Data.gov.ie would be moderately interesting if all the shit in it was removed.)
The quality of the datasets varies greatly depending on the source. Some work well, some less so. Some data sources are under active development so that they can be harvested more accurately. None of them were added to pump up the numbers.
The biggest recent bump in the numbers was about 1,600 Met Éireann rainfall record datasets from all around the country, some of them daily rainfall dating back 60 years. (Spoiler: there’s a lot of rain.)
This is specifically a catalog of datasets; it doesn’t host the data except for previews, and even doing that is pretty complicated.
I believe that the Eircode dataset is one of the most highly requested sources, but it’s a private, for-profit database.
If you are aware of such a dataset that a public body is hosting, then it would certainly be something to include. Convincing (and helping) the public bodies to publish their data is still a big task.
- https://www.opendatanetwork.com -- what I would call the "Google, for Socrata datasets"