If you want to use Rust but also need HTTP (client or server), I've got https://github.com/chris-morgan/rust-http, which has rapidly become the de facto HTTP library (though it only got started after Rust 0.7 was released). It's far from complete, but that's an opportunity for you to join in, if you want.
Thanks for reminding me: done. I saw that one earlier when I was looking for places to <del>plug</del><ins>advise people about</ins> rust-http, but at that time I only had an HTTP server. Then when I implemented the client I forgot to add an answer there.
It's good enough that the Servo team decided to use its client (and now do), but the approach I've taken (implementing the HTTP spec thoroughly, putting Rust's type system to good use) will take quite a long time to polish properly (e.g. we must implement a distinct type for each header, rather than just using strings and leaving it to users to interpret them [often incorrectly]). It's still a little experimental, but it's the only really serious HTTP library for Rust out there.
Just wait until it's done. It will be a really great HTTP library.
Having done occasional database design work in the past, I can sympathise with the impulse to model things with properly restrictive types, and I'd be surprised if 10% of HTTP clients and servers could correctly quote and unquote the filename in a Content-Disposition header, but I have to wonder whether such a restrictive HTTP library would be very useful in practice.
For example, last week as a learning exercise I implemented an OAuth client, which involves adding a bunch of stuff to the "Authorization" header of an HTTP request, none of which was ever mentioned in the original HTTP RFCs, let alone specified. Likewise, the HTTP RFCs have a fixed and rather small set of verbs, but things like DAV add a bunch more.
How can you balance the reliability of strict typing with all the HTTP extensions that assume anyone can stick arbitrary strings anywhere?
An important design goal is to support everything: for headers, for example, unsupported ones fall into the enum variant `ExtensionHeader(~str, ~str)`, and for methods there's `ExtensionMethod(~str)`.
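In sketch form (the extension variants are real; the typed variants shown here are just illustrative stand-ins for the much larger enums rust-http actually has):

    enum Header {
        ContentLength(uint),         // illustrative typed variant
        ExtensionHeader(~str, ~str), // unrecognised header: (name, raw value)
    }

    enum Method {
        Get,
        Post,
        ExtensionMethod(~str),       // e.g. WebDAV's PROPFIND or MKCOL
    }

That way nothing is ever dropped: anything the library doesn't recognise still round-trips through the extension variant.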
Taking your example of the Authorization header: that uses the `credentials` type, defined in RFC 2617 (https://tools.ietf.org/html/rfc2617), which ends up as something along these lines (a sketch of the shape; take the field names as illustrative):
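    // RFC 2617 defines
    //     credentials = auth-scheme #auth-param
    // i.e. a scheme name plus key/value parameters.
    pub struct Credentials {
        auth_scheme: ~str,
        auth_params: ~[(~str, ~str)],
    }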
But then, Basic and Digest come into the mix, and they've got data that should be treated as data rather than text. I'll probably end up renaming the struct above to ExtensionCredentials (oh no! it doesn't have a proper name!) and using an enum:
    enum Credentials {
        BasicCredentials(BasicCredentials),   // A new struct
        DigestCredentials(DigestCredentials), // Ditto
        ExtensionCredentials(ExtensionCredentials),
    }
I've been toying with the idea of traits to convert such things as custom credentials, so that you wouldn't need to maintain the glue yourself in your own code, but it's not an easy problem however you slice it.
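Something vaguely like this, perhaps (purely hypothetical; nothing of the sort exists in rust-http yet):

    // A custom scheme knows its name and how to convert itself to and
    // from the generic extension form, in the same spirit as FromStr.
    trait CredentialsScheme {
        fn scheme_name(&self) -> ~str;
        fn to_extension(&self) -> ExtensionCredentials;
        fn from_extension(raw: &ExtensionCredentials) -> Option<Self>;
    }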
In the end, it is all about balance, as you say, and I'm not sure precisely where the balance falls, yet. But I know it uses the type system a whole lot more than almost all of the code that's out there.
I haven't taken a look at your library, but as a developer who has always wondered "why use strings for these things that could be enums, constants, or types?" in APIs, I appreciate your effort and your thoroughness.
That's been my feeling exactly, and why I was delighted that Rust didn't have HTTP support yet.
That follows through to many other aspects of, e.g., web frameworks; there's a lot there that is stringly typed. When rust-http is stable enough, I'll be getting on to my dream framework, which will be astonishingly safe and bamboozlingly quick (to start and to run, if not quite to compile), incorporating and extending various ideas currently found only in Haskell frameworks and a couple of other similar language-frameworks (e.g. Ur/Web). It'll be fun!
I've been mostly a Python developer hitherto, but I'd never have tried something like this in Python—it simply wouldn't work. You need a type system like Rust's before it can work, but then it really works.
The answer is because that's what's in the spec. For HTTP, it's probably a mistake to hardcode header types. They are defined as key/value pairs of strings in the spec. There are a few keys that are specified, but how they actually work in practice (upper? lower? quoted?) is difficult to predict. There are just too many variations. So you end up with proper types for the most common ones, and then throw the extras into a separate "others" type. Which is great, except that now you have two places to check for things.
In this case (HTTP), it's easier (and more correct) to just leave them as strings. The general principle with network protocols is to be strict in what you send, and forgiving in what you receive.
The headers are data, not text. Somewhere along the way you'll need to interpret them; doing a good job of that at the system boundary is the only sensible approach. (It's not the approach the majority of tools have taken, but it is the only sensible approach). If it gets into the system as text, people will start pulling it apart in even worse and less consistent ways.
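In miniature, the boundary principle looks something like this (illustrative only, not rust-http's actual API):

    // Parse once at the edge; hand typed data inward. Downstream code
    // never sees the raw string, so it can't reinterpret it badly.
    fn parse_content_length(raw: &str) -> Option<uint> {
        from_str::<uint>(raw.trim())
    }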
I agree with you that the parse behaviour for HTTP headers is poorly defined. That's something I'll be wrestling with all the time.
Supported headers will be in one place and uncommon extension headers in another. Such, alas, is life. But really, the only time when I would expect this to cause any trouble at all is when new headers are added. Compare it with things like the CGI standard and how it handles headers and you'll realise it's not such a bad system.
I should make it quite clear that the specs are (unfortunately) only a starting point for rust-http. Where there are deviations, more leniency may be added. But it'll be added thoroughly and properly.
I get a rather strong sense of déjà vu looking at lots of spray-http code: it's a pretty good model of what I was already starting to do or what I had in mind a lot of the time.
My own header definitions are pretty clumsy at present; I'm just about up to the stage of improving that with macros now. (I didn't do that to start with so that I could write a few and get a feel for what it would need to be like.)
Interesting. I took a quick look at the methods, and it looks like your library doesn't accept lower-case methods (which is technically correct). That made me a bit curious, so I tried google.com and apache.org, and indeed, both give an error for a simple "get /", but give reasonable HTTP/1.0 responses to a "GET /".
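(The test, roughly, as raw request lines over a TCP connection:)

    get /     <- error from both servers
    GET /     <- a reasonable HTTP/1.0 response from both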
Method is one of the few places where the spec does indicate case-sensitivity rather than the default of case-insensitivity. From RFC 2616, section 5.1.1: "The method is case-sensitive."
This is going to be a problem for your users. RFCs are worth following when they represent a superset of the standard implementations, but when the RFC is more restrictive than what people actually use, you're doing yourself a disservice by sticking to the spec.
That is something I'll need to take care over. Real-world usage will be very important. At present, for performance, it doesn't preserve the header value as it is reading it, so an invalid value is entirely lost. (Performance meaning you don't need to do an extra heap allocation for each header.) Providing the raw value of the header when parsing fails is something I may need to do; I'm not yet sure. I already know that "invalid" values for the Expires header (especially -1, as noted in RFC 2616) are normal (and so the Expires header has been switched back from being a Tm to being a ~str for the moment).
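One way to get the best of both worlds might be to parse where possible and keep the raw text where not, so nothing is lost (a sketch only; rust-http currently just stores Expires as a ~str):

    extern mod extra;
    use extra::time::Tm;

    enum Expires {
        ExpiresDate(Tm),       // a valid HTTP-date
        ExpiresInvalid(~str),  // e.g. "-1", kept verbatim for the caller
    }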
As I get further along, I intend to use the data from the Common Crawl, which fortuitously includes response headers, to see how my validation goes. Of course, that's only a small set of the real-world headers (cache ones in particular will be scarcely stressed by that at all). Validating request headers will be harder; I've still got to figure out what to do about that.
In the end, though, I'm determined that it will work and work well. Servo using it (and thus demanding robust HTTP support) will help with that goal.
Something I discovered a few hours ago, reading the specs: I believe a header such as ``ETag: W/"Super Encoding™"`` should be valid, with the value being interpreted as the weak entity tag ``Super Encoding™``. I wonder how many clients or servers would support it? No idea yet.
I don't know the Rust type system well enough (nor the internal representation of strings), but if strings allow you to reference sub-strings without re-allocating, then you need only one contiguous section of memory for the headers, which you can "point" into for the values (maybe this is too C-like to be possible in Rust). My recommendation (feel free to ignore it) would be something that supports typed headers as well as arbitrary string headers, because the ability to fall back to strings will make your library usable in a much broader sense.
I'd need to think about whether that's feasible or not in the overall design. (Locally, it'd work fine, but I don't think I want to be keeping the raw value around once it's validated.)
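(For what it's worth, referencing sub-strings without re-allocating is exactly what Rust's borrowed &str slices give you, and the borrow checker keeps a slice from outliving its buffer. A sketch, not actual rust-http code:)

    // Return a zero-copy view of one header value within the read buffer.
    fn header_value<'a>(buf: &'a str, start: uint, end: uint) -> &'a str {
        buf.slice(start, end)
    }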
Arbitrary string headers are essential. Conversion between the typed header and strings is part of the design (though only partially implemented at present). As for other extension-headers (as they are designated in RFC 2616), that's the header enum variant ExtensionHeader(~str, ~str).
I'm not sure that's the case when it comes to HTTP methods though -- I thought it was, but having seen two pretty high-traffic sites running different "real-world" web servers both reject this, apparently it's an area in which we've already moved a bit away from "be lenient in what you accept; be strict in what you send".
It's been so long since I've played with netcat and HTTP that I can't remember if 'get' vs 'GET' "used to" work or not...
Still, it might be something that should be possible to toggle with a flag (case-insensitive parsing on/off or something like that).
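Something like this, say (purely hypothetical; no such option exists in rust-http today):

    struct ParserConfig {
        case_insensitive_methods: bool, // accept "get" and treat it as GET
        preserve_invalid_headers: bool, // keep raw text when parsing fails
    }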
Might also be useful to keep in mind that there are very real differences between HTTP/1.0 and HTTP/1.1. For browser-facing stuff, 1.1 should be fine these days(?) -- for APIs etc., I don't know if "proper" 1.0 support makes sense or not.