
Sorta kinda.

Your code has to put the right things in a transaction, all the time, for transactions to work right. If there is some flow of information like

   application does query -> application thinks -> application does update
you have to wrap the whole sandwich in a transaction, and people frequently don't do that. If I'm writing 20 of those for an application, I want something that I know is bulletproof.
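A minimal sketch of the sandwich done right, using Python's sqlite3 (the inventory table and sku are made up for illustration): BEGIN IMMEDIATE takes the write lock before the read, so the "think" step can't be invalidated under you.

    import sqlite3

    conn = sqlite3.connect("app.db", isolation_level=None)  # manual transaction control
    cur = conn.cursor()
    cur.execute("BEGIN IMMEDIATE")  # lock before reading, not after thinking
    try:
        (qty,) = cur.execute(
            "SELECT quantity FROM inventory WHERE sku = ?", ("ABC-1",)
        ).fetchone()
        if qty > 0:  # the "application thinks" step
            cur.execute(
                "UPDATE inventory SET quantity = quantity - 1 WHERE sku = ?",
                ("ABC-1",),
            )
        cur.execute("COMMIT")
    except Exception:
        cur.execute("ROLLBACK")
        raise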

My experience with SQL is that the average SQL developer doesn't really understand how to do transactions right, but their ass gets saved (in a probabilistic sense) by the grouping of updates that is implicit in running an INSERT or an UPDATE against a table.
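Concretely, the free save looks like this (sqlite3 again, accounts table made up): a single statement is atomic on its own, even with no transaction open, while a hand-rolled loop doing the same work is not.

    import sqlite3

    conn = sqlite3.connect("app.db", isolation_level=None)  # autocommit mode

    # atomic anyway: every account gets the interest or none do
    conn.execute("UPDATE accounts SET balance = balance * 1.01")

    # not atomic: crash midway and half the accounts got interest;
    # this is the version that needed an explicit transaction
    for (acct,) in conn.execute("SELECT id FROM accounts").fetchall():
        conn.execute(
            "UPDATE accounts SET balance = balance * 1.01 WHERE id = ?", (acct,)
        )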

There's also the fact that a lot of triple stores are seriously half-baked, research-quality code, if that. Many triple stores struggle if you just try to load 100,000 triples sequentially; that's disqualifying for an application like my YOShInOn RSS reader, which I expect to use every day without patching or maintaining anything for 18+ months. (OK, a 20GB database that needs to be pruned crept up on me gradually, but that's an ArangoDB problem; I'd expect the average triple store to have crumbled 17 months ago.)

I'd love to have something that updates like a document-oriented database but lets you run a SPARQL query against the union of all the documents. Database experts, though, always seem to change the subject when it comes to a graph algebra that lets you UNION 10 million graphs.
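For what it's worth, the small-scale shape of that already exists in rdflib, where a ConjunctiveGraph treats the default graph as the union of all named graphs; a toy sketch (namespace and data invented, and rdflib is exactly the kind of thing that would crumble at 10 million graphs):

    from rdflib import ConjunctiveGraph, Namespace, URIRef

    EX = Namespace("http://example.org/")
    cg = ConjunctiveGraph()  # default graph = union of all named graphs

    # each "document" lives in its own named graph
    doc1 = cg.get_context(URIRef("http://example.org/doc1"))
    doc1.add((EX.alice, EX.knows, EX.bob))
    doc2 = cg.get_context(URIRef("http://example.org/doc2"))
    doc2.add((EX.bob, EX.knows, EX.carol))

    # the SPARQL query sees the union of both documents
    q = "PREFIX ex: <http://example.org/> SELECT ?s ?o WHERE { ?s ex:knows ?o }"
    for row in cg.query(q):
        print(row.s, row.o)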

(For that matter, I sure as hell couldn't pitch any kind of "boxes-and-lines" query tool [1] that passed JSON documents/RDF graphs over the lines between the operators to the VCs and private-equity people who were buying up query engines circa 2015, because they were hung up on the speed of columnar query engines. This despite the fact that the tools that pass relational rows over the lines require people who really aren't qualified to do so to create analysis jobs that look like terrible hairballs because of all the joins they do.)

[1] Alteryx, KNIME



> you have to wrap the whole sandwich in a transaction

True, SPARQL does not allow "opening" a transaction such that you can run one query, do some logic, and run another query before committing. Which was a pain for me. RDF4J has a non-standard API for that; I think they are trying to upstream it to SPARQL 1.2.
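The race is visible right in the protocol: over plain SPARQL 1.1 the read and the write are two unrelated HTTP requests, so the read-think-write pattern can't be made atomic. A sketch with Python's requests (Fuseki-style endpoint URL and the data are made up):

    import requests

    endpoint = "http://localhost:3030/ds"  # hypothetical endpoint

    # request 1: read the current count
    resp = requests.post(
        f"{endpoint}/query",
        data={"query": "SELECT ?n WHERE { <urn:item:1> <urn:p:count> ?n }"},
        headers={"Accept": "application/sparql-results+json"},
    )
    qty = int(resp.json()["results"]["bindings"][0]["n"]["value"])

    # ...application logic here; nothing stops another client
    # from writing between these two requests...

    if qty > 0:
        # request 2: write, based on a possibly stale read
        requests.post(
            f"{endpoint}/update",
            data={"update": f"DELETE DATA {{ <urn:item:1> <urn:p:count> {qty} }} ; "
                            f"INSERT DATA {{ <urn:item:1> <urn:p:count> {qty - 1} }}"},
        )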

> There's also the fact that a lot of triple stores are seriously half baked research-quality code if that.

Also true. Although the excellent researchers who wrote one of the best reasoners (Pellet) did leave academia to build a production-grade system. They succeeded with Stardog, but you don't want to know how much a license costs.

> couldn't pitch any kind any kind of "boxes-and-lines" query tool [1] etc. that passed JSON documents/RDF graphs

I really enjoy this talk from one of the creators of OWL [1]. He makes the point that OWL is unpopular not because it's too complex but because it's not advanced enough to solve the real problems people care about (read: are ready to pay money for). I think the case you described comes down to VCs having clarity on how to make money off one thing but not the other. I do think that Semantic Web 3.0 (if we count Linked Data as a Semantic Web 2.0, aka Semantic Web Lite, attempt) will need a better case, one that appeals to business, than the one presented in the 2001 SciAm paper.

[1]: https://videolectures.net/videos/eswc2011_hendler_work


OWL ontologies are making a big comeback as part of knowledge-graph groundings for LLM outputs. And several SPARQL and RDF knowledge-graph startups are VC-backed and thriving. The world is a big place.


Well, there is the new use case that appeals to VCs! And I guess it's a good reminder that I should re-subscribe to your blog :)


Personally I thought Stardog was trash, but if I'd had different requirements I might have been happy with it.

The trouble w/ OWL as I see it (talked about in that TR) is that people don't really want "first-order logic"; they want "first-order logic + arithmetic," which is the nightmare Kurt Gödel warned you about. (ISO 20022, which that TR relates to, is about the financial domain, which is all about arithmetic.)

After Doug Lenat's death a lot of stuff came out that revealed the problems w/ Cyc, not least that even if you build something "knowledge based," it can't practically solve all the problems you want with an SMT-based strategy; you have to build a library of special-purpose algorithms for everything you want to do, and it turns out to be a godawful mess.

I'm disappointed that the semweb community hasn't made a serious crack at usable and efficient production rules (dealing w/ problems like negation, controlling execution order, RETE execution, and retraction); instead we get half-answers like SPIN with fixed-point execution (I used an even more half-baked version of that to research that TR; it gets you somewhere). Of course, production rules never got standardized in any domain because nobody can agree on how to address those four issues, even though it usually isn't hard to find an answer that's fine for a particular application.
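To make "fixed-point execution" concrete: the whole strategy is "apply every rule until nothing new appears," which fits in a few lines of Python (toy triples, the rule written as a plain function). Note there is no hook here for negation, execution order, or retraction, which is exactly the complaint:

    # naive fixed-point forward chaining: apply every rule until
    # the fact set stops growing
    def close(facts, rules):
        facts = set(facts)
        while True:
            new = set()
            for rule in rules:
                new |= rule(facts) - facts
            if not new:
                return facts
            facts |= new

    # one "rule": transitivity of ex:broader
    def broader_transitive(facts):
        return {
            (a, "ex:broader", c)
            for (a, p1, b) in facts if p1 == "ex:broader"
            for (b2, p2, c) in facts if p2 == "ex:broader" and b2 == b
        }

    kb = {("ex:a", "ex:broader", "ex:b"), ("ex:b", "ex:broader", "ex:c")}
    print(close(kb, [broader_transitive]))  # includes ("ex:a", "ex:broader", "ex:c")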

(It's a frequent problem that experts on a technology can get by on half-baked specific answers that would need a general solution to be useful for a general audience. One reason parser generators are so bad is that if you understand parser generators well enough to write one, you aren't bothered by their terrible developer experience.)


You seem nice.


Sorry for the negativity, Kendall, but the semweb didn't return the love that I gave it. I did hundreds of sales calls that went nowhere, but my phone kept ringing with people who wanted me to work on neural nets.


That’s tough. Not sure what that has to do with Stardog. Biggest companies in the world rely on it daily and you say it’s trash. I couldn’t find an email from you using it since 2013. I guess we figured something out. NNs are cool too; at last count we use half a dozen different ones including GNNs… NeSy is hot and I can hardly read a paper these days that doesn’t talk about triples.


(1) I'll grant it was a long time ago. Things could have changed a lot.

(2) It's generic that a new database comes out, gets hyped, and turns out to be "trash" when you try to use it; if a new database were actually good, that would be exceptional. (It probably satisfied somebody's requirements in 2013, but the hype for Stardog at the time seemed entirely out of line with what I needed for the project I was doing.)

I thought Postgres was trash in 2001 and called it CrashGreSlow; now I swear by it. Early on people were making big claims for it that weren't substantiated, but people did the hard work over a long time to make it great.

I thought MongoDB was trash when it came out; then I worked for a place that used it even though the engineers believed it was trash and begged me not to use it for a spike prototype. It never got better. Now it is common knowledge that MongoDB is trash.

(3) Maybe it's not fair, but I was hurt by the experience; my wife was furious at the balance I'd run up on the HELOC chasing my Moby Dick. As an applications programmer accustomed to getting things right, I had a terrible opinion of most of the luminaries in the semantic web field at the time, many of whom were shipping code that was academic quality at best.


You mean brutally honest.




