Telehash: JSON+UDP+DHT = Freedom

jeremie · on Jan 5, 2011

It's a little on the young side yet, and I have to prove out some of the apps/demos of it at scale, but it's coming along very nicely and shows a lot of promise.

Ultimately it's a public DHT that enables apps to punch through the NATs (to go direct device-to-device) and talk JSON to each other. It doesn't solve all the problems you face being distributed, but it's a good start :)

jamii · on Jan 5, 2011

Are you aware of http://libswift.org/ ? The goals are different (swift is targeted more at bittorrent/multicast) but I think there is a lot of potential for crossover between the two projects.

pohl · on Jan 5, 2011

Reminds me of JXTA with its Pipe abstraction, which works in environments where the only thing allowed is outgoing HTTP. Are you familiar with it? It has the burden of XML though. I like the JSON approach.

aston · on Jan 4, 2011

Jeremie Miller is the guy behind XMPP, so this is coming from somebody who really knows what he's doing when it comes to protocols.

This type of thing would be cool as a federation mechanism for something like Google Wave or Diaspora. I think it could also be used as an alternative to XMPP itself...

bkudria · on Jan 4, 2011

I agree with you, but I can see how some might strenuously object - XMPP is not universally loved.

(Luckily Telehash uses JSON)

jerf · on Jan 5, 2011

That's how you learn, though.

XMPP is much better than most people realize, anyhow. Most people do not realize how much of the messiness comes from the problem and not the protocol. IM's easy, right? Just message here, message there? Yeah, sure, the first iteration of IM in the mid-1990s was. File transfers are easy, right? Yes, when both clients and the server are on the same LAN they're trivial. On the real Internet? No, actually they're quite hard. Conferences are no sweat, right? If you're hacking together a Node.js demo that simply shovels out messages to everyone in the conference with no further features, sure. If you want like features and stuff in some sort of standard way, it gets harder. And so on. I'm not saying it's perfect in every way, in particular the oldest parts of the protocol are a bit crufty (like the overloading of the presence tag to mean too many things), but there's a lot of people who rag on it for really misguided reasons, or who failed to read the core standard and got tripped up on the connection process. (In fact, there's a lot of people who don't even spend enough time on it to realize that the standard is modularized and you don't have to take every piece for every purpose.)

If I were designing it from scratch today, yes it would be a JSON datagram protocol. But XMPP is many years older than JSON as a distinct thing.

moe · on Jan 5, 2011

XMPP is much better than most people realize, anyhow.

No, it is actually worse than most people realize. You only begin to realize the magnitude of the failure once you try to implement a client, and most people don't do that. XMPP is a trainwreck. It has never seen serious adoption because nobody wants to touch it - and for good reasons.

XMPP is what I cite when I try to explain the "XML mindset". It leads to bad things. It leads to ridiculous overengineering through layered complexity. It leads to a client/server ecosystem where each implementation speaks a different dialect because it's nearly impossible to get the protocol right.

There was a time when my roster would get screwed up in new, random, interesting ways whenever I launched a different client. Some clients would even manage to unsubscribe existing contacts for inexplicable reasons. And don't get me started on "Transports".

However, instant messaging is not rocket science. Neither is semi-decentralized instant messaging. XMPP makes it seem like a much harder problem than it really is, but only because XMPP is broken beyond repair.

Most people do not realize how much of the messiness comes from the problem and not the protocol.

Wrong.

Take a lesson from IRC, a group-chat protocol that, despite its age, works and scales amazingly well. A protocol that, despite an immense range of features, can easily be typed by a human on a telnet prompt, in real time.

It wouldn't take much to fix the warts on IRC and extend it to cover everything that XMPP tries to do. This is what the XMPP author should have done in the first place.

jerf · on Jan 5, 2011

I have implemented two clients, a generalized libpurple transport, and spent significant time on a server in a professional environment for sale to real customers. IRC isn't a replacement for what it does, and yeah, the problems come from the problem domain, not the protocol. I'd know the difference. None of the "professional" protocols seem to be significantly simpler, because they can't be... the domain complexity forbids it. I spent much more time working on semantic issues than I ever did working on the raw XML, orders of magnitude difference. The XML was always the easy part, even if JSON would have been even easier. The trappings are irrelevant. I could transform the entire XMPP protocol stack into JSON in a week, with implementation in ejabberd and one of my clients. XML is a sideshow.

The problem is getting the standardized semantics. If XMPP taught me anything, it is that it doesn't matter how many specs you throw at a programmer, they're just going to bash on the program until it sort of works most of the time and release it. That's where your roster problems come from, it's where a not insignificant number of your transport problems come from too. (Though the transport protocol is one of the spottier bits of the protocol.) The core bits of XMPP are generally reasonably well specified and in my experience actually held up surprisingly well as I bent and spindled it a little bit. (Corporate customers don't "get" rosters, don't get that you can start with a blank roster and work your way up, so I added a module to build rosters based on grouping criteria specified by the user and driven by outside input. Well beyond the stock ejabberd shared rosters, but pretty custom to our environment. XMPP actually dealt with these semi-magical roster entries just fine, to my surprise.) Those specifications really matter and just sort of bashing some stuff out that's 90% correct most of the time isn't good enough when you're trying to communicate with so many different systems. XMPP actually managed to avoid a lot of problems that even the "professional" systems had, having learned from their experiences; AIM last I knew still had some encoding corner cases you wouldn't expect in a modern program, all of the protocols had major encoding growing pains, surprising versioning issues, all these little quirks inside them that you never noticed because you can paper over a lot when you control both the clients and the servers. Again, I know it's not perfect but in the space of "deployed IM protocols" it does not make a bad showing.

IRC clients are actually just as quirky, IRC just doesn't hang on to anywhere near as much state or you'd see it mangled, spindled, and mutilated too. (Also part of the reason it's not a replacement, real users want that state.) This is fine, too, I don't have a problem with IRC for what it is, but you can't just drop it in everywhere you see an XMPP server.

If you try to work IRC up to be a real, true XMPP replacement, you'll be complaining about how hard it sucks in no time. Too much suckage is in the problem space.

moe · on Jan 5, 2011

If you try to work IRC up to be a real, true XMPP replacement, you'll be complaining about how hard it sucks in no time.

I doubt that.

The parts that actually somewhat work in XMPP would be fairly straightforward to add to IRC (mostly related to persistence and identity). From there the question is where you'd want to take it, not what idiocies XMPP fell for. I.e. the task would be to do it right, not to imitate a broken protocol.

Just compare http://www.ietf.org/rfc/rfc1459.txt to http://xmpp.org/protocols - where the latter isn't even the full story.

And then tell me with a straight face the complexity is "inherent to the problem". No. It's not.

IRC handles very similar problems to XMPP already (and then some that XMPP doesn't have) and the specification, in its entirety, is only 3643 lines long. Extending that for distributed, message-persisting operation would not bring it anywhere near the insanity of XMPP.

Naturally that's an academic exercise, nobody would actually re-shape IRC into an IM system that way. However, when cherry-picking concepts for a new protocol then IRC should be high on the list, and XMPP rather low.

pyre · on Jan 5, 2011

I challenge you to write a functional irc client based solely on the rfc. IRC is a fractured protocol where the real spec is in the source code of the implementations out there.

moe · on Jan 5, 2011

I challenge you to write a functional irc client based solely on the rfc.

I've actually written IRC bots mostly from the top of my head.

The protocol and semantics are really simple.

Type this into a console near you:

   nc irc.freenode.org 6667
   USER foo bar batz boo
   NICK test345
   JOIN #testchannel
   PRIVMSG #testchannel hello world
   PRIVMSG test345 hello self

Yes, that's all it takes for a minimal, functional client. (just remember to type PONG every once in a while)
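The session above can also be scripted. Here is a minimal sketch in Python of the two pieces a bare-bones client needs: the registration lines and the keepalive reply. It assumes the server sends `PING :token` and expects `PONG :token` back; the helper names `registration_lines` and `reply_to` are made up for illustration, and the socket handling is left as a comment because networks differ in what they expect.

```python
def registration_lines(nick, user="foo", realname="boo"):
    """Build the USER/NICK lines typed by hand in the session above (hypothetical helper)."""
    return [
        f"USER {user} bar batz {realname}\r\n",
        f"NICK {nick}\r\n",
    ]


def reply_to(line):
    """Return the reply a minimal client owes the server, or None."""
    # Servers send "PING :token" and expect "PONG :token" back.
    if line.startswith("PING"):
        return "PONG" + line[4:].rstrip("\r\n") + "\r\n"
    return None


if __name__ == "__main__":
    print(registration_lines("test345"))
    print(repr(reply_to("PING :irc.example.net\r\n")))
    # To talk to a real server, write these lines to a TCP socket on port 6667
    # and feed every received line through reply_to().
```

Everything else (JOIN, PRIVMSG) is just more lines written to the same socket.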

I'm not sure what you mean by fractured. Like every protocol it has a few rough edges, but those are nowhere near the semantic nightmare that I witnessed when trying to dabble with XMPP (which admittedly was more than a year ago).

pyre · on Jan 5, 2011

I remember attempting to write an IRC bot a while back and finding that RFC severely lacking when it came to connecting to whatever IRC network I had chosen to test it against. The problem was that the handshake to join the server was different in the RFC than what the server was expecting. This sentiment was mirrored by others I consulted with over IRC who were devs on the epic3 IRC client. So, while writing an IRC bot may be simple, I would imagine that writing a client isn't. The protocol itself was the same (in the "COMMAND args" sense), but the conventions differed.

moe · on Jan 5, 2011

Let me tackle this from a different angle:

How long did it take you to bring your client into a reliably working state? And have you tried to do the same with an XMPP client for comparison?

As said, I didn't mean to claim IRC is perfect - nothing is.

But if you think the differences that IRC networks have introduced are problematic then I invite you to try and build a most basic jabber client.

kragen · on Jan 5, 2011

It's a lot easier to write a functional IRC client by glancing at the RFC and banging on a piece of code until it mostly works. For example, http://lists.canonical.org/pipermail/kragen-hacks/2008-Febru... was written that way. Given that the only protocol message it has special handling for is PING, though, I'm pretty sure you could have written it based solely on the RFC.

However, I think it's a big mistake to claim IRC "scales amazingly well". The biggest IRC network today has tens of thousands of users (at the moment, freenode has 64000, undernet has 58000, and EFNet is down in the 32000 range) and the IRC networks are constantly suffering from breakdowns from overcapacity. Compare this to Skype, Facebook, or Gmail, with tens of millions of concurrent users.

remosi · on Jan 5, 2011

Speaking as someone who spent a long time working on Undernet trying to make IRC work: the RFC is poorly specified (e.g. ~ and ^ are missing from the case equivalent list). IRC scales poorly (the protocol relies heavily on global state). IRC doesn't do UTF-8, or in fact, any sane character set (see the case mapping mentioned above). Clients ignore the parsing rules (e.g. the : marking multiword last arguments) and kludge around them. Each network went off and did their own thing, fragmenting the protocol space, and then declared themselves as being the One True IRC Protocol. IRC puts a lot of trust in all the server admins on the network, making it difficult to federate, and so on.

IRC was a great protocol in the 1980s; it's been dead for a while, there are just no good replacements.
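To make the two parsing complaints concrete, here is a rough Python sketch of the case folding and the `:` trailing-argument rule. It assumes the common "rfc1459" case mapping, where `{}|` fold from `[]\` and `~` is treated as the lowercase of `^`; the function names are hypothetical and there is no handling for malformed or empty lines.

```python
# RFC 1459 treats {}| as the lowercase forms of []\ ; the ^ -> ~ pair is the
# commonly used addition that the RFC's list leaves out (an assumption here).
_RFC1459_FOLD = str.maketrans("[]\\^", "{}|~")


def irc_lower(name):
    """Case-fold a nick or channel name under the assumed rfc1459 mapping."""
    return name.lower().translate(_RFC1459_FOLD)


def parse_line(line):
    """Split one IRC line into (prefix, command, params)."""
    line = line.rstrip("\r\n")
    prefix = None
    if line.startswith(":"):
        prefix, line = line[1:].split(" ", 1)
    # Everything after " :" is one trailing parameter that may contain spaces.
    if " :" in line:
        head, trailing = line.split(" :", 1)
        params = head.split() + [trailing]
    else:
        params = line.split()
    command = params.pop(0)
    return prefix, command, params


if __name__ == "__main__":
    print(irc_lower("Nick[Away]^"))
    print(parse_line(":a!b@c PRIVMSG #chan :hello world\r\n"))
```

Two nicks should compare equal only after both go through `irc_lower`, which is exactly the step that plain `str.lower()` gets wrong.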

moe · on Jan 5, 2011

Yes, IRC has warts, I didn't mean to declare it as the end of all things.

The point I was trying to make is that IRC would be a more sane starting point than XMPP. Even despite all the shortcomings you mentioned and some more that you didn't. And even despite it being a strictly centralized design that would require more server-side changes than XMPP to turn it into a distributed system.

I'll lean out of the window and even claim you could make a distributed IRCd backwards compatible to existing IRC clients, as far as the core business of presence/state, chat and group-chat are concerned.

gst · on Jan 5, 2011

I mostly agree. A good overview of the drawbacks of XMPP is also given at: http://about.psyc.eu/Jabber

johnny22 · on Jan 7, 2011

only in that they are promoting what they are offering. :)

jemfinch · on Jan 5, 2011

"Luckily"? If the protocol were to catch on, imagine how many terabits of bandwidth would be wasted by such an unnecessarily verbose and redundant protocol. Imagine how many CPU-years will be wasted parsing and unparsing the packets. Imagine how much memory wasted because packets can't be processed as they come off the wire, but must be buffered in their entirety before they can be parsed.

This is exactly the sort of thing libraries like Google's Protocol Buffers and Facebook's Thrift were invented for, both of which are open source.

Optimizing for human readability using netcat just doesn't make sense for potentially core protocols such as this one.

jerf · on Jan 5, 2011

Ever gzipped yourself some JSON being used in a standardized protocol? I've got a JSON-based protocol at work for my project and my typical compression ratio on any but the smallest messages with a simple gzip is 16:1. YMMV depending on the exact contents but I don't lose much sleep over the bandwidth.
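If you want to check this against your own traffic, a quick measurement looks something like the sketch below. The message is made up (a list of similar records, which is the case where repeated key names compress well), so the ratio it prints says nothing about any particular protocol.

```python
import gzip
import json

# Made-up message: many records sharing the same keys.
message = {
    "events": [
        {"id": i, "type": "presence", "status": "available", "user": f"user{i}@example.com"}
        for i in range(200)
    ]
}

raw = json.dumps(message).encode("utf-8")
packed = gzip.compress(raw)
ratio = len(raw) / len(packed)

print(f"{len(raw)} bytes raw, {len(packed)} bytes gzipped, ratio {ratio:.1f}:1")
```

Very small messages tend to compress poorly because of the fixed gzip header and the lack of repetition, which matches the "any but the smallest messages" caveat.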

As for CPU years, meh. It's human years that matter. JSON parsing isn't that big a challenge anyhow. Protocol buffers et al are great, certainly, and they exist for a reason, but not every app needs them and human readability turns out to be very useful in practice.

zachrose · on Jan 5, 2011

What's "unparsing?"

(Not being snarky. Genuinely curious.)

loup-vaillant · on Jan 5, 2011

That's serializing: when you go from a data structure to a text file. The exact reverse of "parsing", actually.
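In Python terms, a small illustration using the standard `json` module (the packet contents are borrowed from an example elsewhere in this thread):

```python
import json

packet = {"see": ["5.6.7.8:23456", "11.22.33.44:11223"]}

text = json.dumps(packet)    # "unparsing" / serializing: data structure -> text
restored = json.loads(text)  # parsing: text -> data structure

print(text)
print(restored == packet)
```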

wooster · on Jan 5, 2011

There are a few pretty decent streaming JSON parsers out there. I use yajl for exactly that.

Rantenki · on Jan 5, 2011

XMPP is positively streamlined compared to the bloat in a protocol like say SMTP; have you ever looked at all those headers? IRC is simpler, but it has some pretty severe limitations. There are some worse protocols that could catch on (AMQP) ;)

jedsmith · on Jan 5, 2011

XMPP is streamlined compared to SMTP's bloat? Please elaborate on that opinion.

Rantenki · on Jan 5, 2011

Sure; XMPP's actual standard calls for a bunch of extraneous XML to get a basic message sent, no argument there. Also, a basic SMTP message calls for very little additional info (i.e. to, from, subject, body); however, the reality is that an SMTP message from almost any real-world provider contains a ton of extra crap. I am not saying that SMTP is inferior in some way, just that in the real world, reliable message delivery isn't as streamlined as we would like. Here is a basic email from Shapeways PR:

Delivered-To: derek@foodomain.com

Received: by 10.216.159.146 with SMTP id s18cs544645wek; Wed, 5 Jan 2011 11:11:57 -0800 (PST)

Received: by 10.150.158.4 with SMTP id g4mr22888707ybe.38.1294254716497; Wed, 05 Jan 2011 11:11:56 -0800 (PST)

Return-Path: <bounces-1e89d88424-b62849dacd@b.cts.vresp.com>

Received: from mkt4-sc.verticalresponse.com (mkt4-sc.verticalresponse.com [74.116.89.111]) by mx.google.com with ESMTP id l18si3840529ybn.44.2011.01.05.11.11.54; Wed, 05 Jan 2011 11:11:55 -0800 (PST)

... redacted another 2055 bytes of crap ...

Content-Type: multipart/alternative; boundary="__________MIMEboundary__________"; charset="UTF-8"

This is a multi-part message in MIME format.

--__________MIMEboundary__________

Content-Type: text/plain; charset="UTF-8"

Content-Transfer-Encoding: quoted-printable

And here is where the actual message started...

bkudria · on Jan 5, 2011

Really? I think they can cross that bridge when they come to it. For now, adoption is critical.

cwp · on Jan 4, 2011

Ok, possibly interesting. What's it for? The about page says it's for sending small bits of JSON around. Why would I want to do that?

jamii · on Jan 5, 2011

It looks like telehash aims to be a substrate for building p2p apps. It basically solves addressing and NAT for you which are always the first things you have to deal with in a p2p system on the public internet.

lukeschlather · on Jan 4, 2011

This wiki page seems a little clearer, though anything prefaced with "My Understanding" might be a little suspect.

https://github.com/quartzjer/TeleHash/wiki/My-Understanding-...

So my understanding of that is that TeleHash is something of a cross between BitTorrent and Tor. Theoretically it allows for a totally distributed Internet. Though it looks like the implementation is rough, so it's hard to say how well-defined the protocol is.

Mizza · on Jan 5, 2011

Not like Tor, nor like BT, really.

It's more like.. Gnutella and JSON?

mathgladiator · on Jan 4, 2011

It looks like an alternative for DNS. ?

Kind of interesting, but I wonder how it works with security implications.

cilantro · on Jan 4, 2011

My guess would be that it doesn't. A public DHT can ensure that data is not corrupted or changed, but you cannot prevent others from seeing your data.

jamii · on Jan 5, 2011

I think the DHT is just used to address peers and do NAT negotiation. There is no need for every message to go through the DHT.

y0ghur7_xxx · on Jan 4, 2011

This could be combined with http://www.unhosted.org/ to deliver completely distributed applications.

scrame · on Jan 5, 2011

UDP does not guarantee delivery. I wonder why it was done with this instead of a reliable multicast protocol (0mq comes to mind).

brandon · on Jan 5, 2011

ØMQ came to mind when I read the doc, too, but:

* UDP could be useful for hole-punching firewalls
* In most networks, multicast traffic will be filtered at the WAN boundary
* The nodes in most big DHTs aren't reliable anyway

The eventing would be nice, though... and the notion of an internet-wide ØMQ network is neat.

dedward · on Jan 5, 2011

0mq is a library to make network programming easier, not a new protocol itself.

UDP is being used because this is based on setting up a DHT which requires small messages that may or may not get through by nature to begin with - not bulk data transfer - presumably you deal with that at higher layers.

scrame · on Jan 5, 2011

0mq is an extension of the standard socket library and does support a reliable multicast protocol. It adds to sockets; it's not just an API to make working with the existing standards easier.

http://manpages.ubuntu.com/manpages/maverick/man7/zmq_pgm.7....

Though, it looks like it might need more permissions than just using standard sockets.

dochtman · on Jan 5, 2011

Seems like an interesting idea. I also wonder why the max size of a JSON object is 1400 bytes; that seems kind of limiting. UDP can do 65,507 bytes of content...

agazso · on Jan 5, 2011

The bigger a UDP packet, the lower the probability that it arrives. 1400 was probably chosen because it is less than the usual Ethernet MTU of 1500, so the packet does not get fragmented.
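The arithmetic behind both numbers, as a small Python sketch. The header sizes are the standard minimums for IPv4 and UDP; the reading that 1400 is headroom below the unfragmented maximum is a guess, and `fits` is a hypothetical helper.

```python
import json

ETHERNET_MTU = 1500    # typical Ethernet MTU, as mentioned above
IPV4_HEADER = 20       # minimum IPv4 header, no options
UDP_HEADER = 8
TELEHASH_LIMIT = 1400  # the limit discussed in this thread

# Largest UDP payload that fits in one unfragmented Ethernet frame over IPv4.
max_unfragmented = ETHERNET_MTU - IPV4_HEADER - UDP_HEADER

# Largest UDP payload at all over IPv4 (16-bit total-length field).
max_udp_payload = 65535 - IPV4_HEADER - UDP_HEADER


def fits(obj, limit=TELEHASH_LIMIT):
    """True if the JSON encoding of obj fits within the byte limit."""
    return len(json.dumps(obj).encode("utf-8")) <= limit


if __name__ == "__main__":
    print(max_unfragmented, max_udp_payload)
    print(fits({"see": ["5.6.7.8:23456", "11.22.33.44:11223"]}))
```

So 1400 sits a little under the 1472 bytes that would fit in one frame, presumably leaving room for links with a smaller MTU or larger headers.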

phuff · on Jan 4, 2011

Telehash is being proposed and bootstrapped by Jeremie Miller, the guy who originally did XMPP. I think it has some good potential to allow better decentralized communication between things like mobile devices, etc.

Deejahll · on Jan 4, 2011

"TeleHash is the culmination of years of discussions with many people, and is being primarily bootstrapped by Jeremie Miller."

This is the same Jeremie Miller who invented the Jabber (XMPP) protocol.

Groxx · on Jan 5, 2011

".see" as a property seems it could get a little annoying in many languages, given that period - you can't map keys directly. Why "." and not some other character? I'd think common unary characters should be avoided, as they may not be unambiguous in language-X, but why periods? And since there are periods and plusses, why not go hog-wild, and allow "^supercommand", "$(&@^!boom", and others?

dochtman · on Jan 5, 2011

Might be cleaner to separate headers and contents. I.e. {"h": {"ring": 43723}, "c": {"see": ["5.6.7.8:23456","11.22.33.44:11223"]}}. Get h for headers, c for commands, s for signals and d for data. Gets you a clean separation, no faux namespacing.
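A rough sketch of what that regrouping could look like in Python, converting a flat prefix-namespaced packet into the nested form. The "." and "+" prefixes come from this thread; using "_" for headers and mapping unprefixed keys to "d" are assumptions made for the example, as is the name `split_sections`.

```python
# Assumed prefix convention for the flat form: "." (commands) and "+" (signals)
# are mentioned in this thread; "_" for headers is an assumption.
PREFIX_TO_SECTION = {"_": "h", ".": "c", "+": "s"}


def split_sections(flat):
    """Regroup a flat, prefix-namespaced packet into h/c/s/d sections."""
    nested = {}
    for key, value in flat.items():
        section = PREFIX_TO_SECTION.get(key[:1])
        if section is None:
            # No known prefix: treat as plain data and keep the key unchanged.
            nested.setdefault("d", {})[key] = value
        else:
            nested.setdefault(section, {})[key[1:]] = value
    return nested


if __name__ == "__main__":
    flat = {"_ring": 43723, ".see": ["5.6.7.8:23456", "11.22.33.44:11223"]}
    print(split_sections(flat))
```

With the nested form, every key is a plain identifier, so it maps directly onto object properties in most languages.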

brandon · on Jan 5, 2011

I'd love to see a DHT network with some level of attention to durability.

Having played a lot with Kademlia, the biggest bummer was that in practice, stored resources had a lifetime of about 4 hours max, and that was when the requester was able to find the node holding your resource. This necessitated spamming your important resources pretty frequently to a redundant set of deterministic keys.
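For anyone who hasn't done this, the "redundant set of deterministic keys" trick can be sketched roughly as below. Kademlia's 160-bit IDs and XOR distance are standard; the `name#replica` key derivation and all function names are just assumptions for the example, not what any particular implementation does.

```python
import hashlib


def key_for(name, replica=0):
    """Derive a deterministic 160-bit key for one replica of a resource."""
    digest = hashlib.sha1(f"{name}#{replica}".encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def redundant_keys(name, copies=3):
    """The keys a publisher would periodically re-store the resource under."""
    return [key_for(name, i) for i in range(copies)]


def xor_distance(a, b):
    """Kademlia's distance metric between two IDs."""
    return a ^ b


def closest_nodes(key, node_ids, k=2):
    """Pick the k node IDs closest to key; these would hold the value."""
    return sorted(node_ids, key=lambda n: xor_distance(n, key))[:k]


if __name__ == "__main__":
    for key in redundant_keys("my-resource"):
        print(f"{key:040x}")
```

Because the keys are derived from the name alone, a requester can compute the same list and try each replica in turn when the first lookup fails.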

bobf · on Jan 5, 2011

It would be interesting to see this used in a non-public fashion, like for maintaining a distributed MongoDB-like system.

senthil_rajasek · on Jan 5, 2011

This uses UDP on a non-standard port. Personally, I would be wary of poking a hole in my firewall to make this work.

brandon · on Jan 5, 2011

Why, exactly? To your point, it uses UDP on a non-standard port; forwarding the traffic won't expose any of your existing behind-the-firewall services.

Anyhow, one of the draws of UDP for this stuff is that it's possible to do clever hole punching so that you don't have to open the port. http://en.wikipedia.org/wiki/UDP_hole_punching
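The packet pattern is simple enough to sketch in Python: both peers send first, then each reads what the other sent. This runs on loopback, where there is no NAT at all, so it only illustrates the exchange and not a real traversal; the function names are hypothetical, and a real setup needs some third party to tell each peer the other's public ip:port.

```python
import json
import socket


def open_peer(port=0):
    """Bind a UDP socket on loopback; port 0 lets the OS pick a free one."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("127.0.0.1", port))
    sock.settimeout(2.0)
    return sock


def send_json(sock, addr, obj):
    """Send one JSON object as a single datagram."""
    sock.sendto(json.dumps(obj).encode("utf-8"), addr)


def recv_json(sock):
    """Receive one datagram and decode it as JSON."""
    data, addr = sock.recvfrom(1400)
    return json.loads(data.decode("utf-8")), addr


def punch(a, b):
    """Both peers send first, then each reads what the other sent."""
    addr_a, addr_b = a.getsockname(), b.getsockname()
    # On a real network these would be the public ip:port pairs learned via a
    # third party; the outbound packet is what makes the NAT accept the reply.
    send_json(a, addr_b, {"hello": "from a"})
    send_json(b, addr_a, {"hello": "from b"})
    return recv_json(a)[0], recv_json(b)[0]


if __name__ == "__main__":
    a, b = open_peer(), open_peer()
    print(punch(a, b))
    a.close()
    b.close()
```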

pbrumm · on Jan 5, 2011

I like the idea. Having the data be encrypted would be cool too. A time-to-live would be nice so that it doesn't grow forever.

Although it sounds like an easy place for virus writers to store plans of attack. And hard to take down without taking down the whole system which by design would be very difficult.

jaimzob · on Jan 5, 2011

Looks very cool but are there particular advantages over, say, Scribe over FreePastry (http://www.freepastry.org/SCRIBE/default.htm)? How does routing efficiency compare?

mey · on Jan 4, 2011

Why couldn't this be done over TCP instead of UDP?

zacharypinter · on Jan 5, 2011

Might be for NAT traversal, ala something like:

http://en.wikipedia.org/wiki/UDP_hole_punching

One of the wiki pages (https://github.com/quartzjer/TeleHash/wiki/My-Understanding-...) specifically talks about working with NAT: "Packets can continue to pass through NAT network devices like home routers"

mburns · on Jan 5, 2011

No reason it couldn't, it is a matter of minimizing overhead.

zemanel · on Jan 5, 2011

first thing i thought when reading this was "distributed twitter"