
I took a week-long Sawzall class at Google along with my team. Our manager had requested it because there were a lot of things that we wanted to do with logs in the long term, which were quickest to do with Sawzall. The people who taught the class were awesome, but I can't help feeling that they were a bit apologetic for 1) knowing so much about the language, 2) having to teach us, and 3) not being able to change the language in any way. But maybe that's hindsight and altered memories based on the dinner and drinks we shared after the week of classes.

The opacity of the language led to a shared conclusion among those in the class (some of whom were taking it as a refresher): that all of the unique Sawzall code at Google had already been written in the first few months of its use, and everyone else had just been copying and pasting snippets from everyone else's scripts.

I can understand why Google released it: it's the start of a halfway decent map-reduce implementation, with low-overhead startup, quick runtime, etc. (compared to Python, which had been an initial logs processor but was punted for logs-processing mapreduces thanks to its relatively high startup costs compared to processing time). But with things like Hadoop (and its support for arbitrary languages for operations), I can't help but feel like this is a little late to the open source game.

Also, back at Google, I had started a project to translate a subset of Python to Sawzall so that people wouldn't have to suffer, and could potentially write better logs processing code. I left before even getting close to finishing it.




I'm wondering if this is the start of a series of open source launches from Google of its core tools. AFAIK, Sawzall without mapreduce is like a car without the engine, but anyway I'm very pleased to read the code and start trying the language. Kudos to Google!


Google's implementation of MapReduce is so tightly bound to their internal infrastructure (GFS, BigTable, etc.) that open-sourcing it wouldn't do anybody much good.


Do you mean an engine without a car? It seems like you mean Sawzall is only useful for mapreduce.


Sawzall as a language is quite a bit uglier than the vast majority of general purpose languages. Couple this with the read/emit nature of the language, and it's either useful as a stream processing language, or as a step in a mapreduce chain.

Given how easy it is to process streams, tag output, etc. in other languages, and that Sawzall doesn't really have an idea of shared state between "records" (aside from data emitted), it's hard to find things that Sawzall is good at other than mapreduce.
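For anyone who hasn't seen the read/emit model, here is a rough Python sketch of the idea (not Sawzall, and not how Google's runtime works). The names SumTable, process_record, and run, and the "path status" record layout, are all made up for illustration. The point is just that the per-record function sees one record at a time, and the only thing that survives between records is whatever gets emitted.

```python
from collections import defaultdict


class SumTable:
    """Hypothetical stand-in for an aggregator that sums emitted values per key."""

    def __init__(self):
        self.totals = defaultdict(int)

    def emit(self, key, value):
        self.totals[key] += value


def process_record(line, table):
    """Handle exactly one record; no state is kept between calls except emits."""
    fields = line.split()
    if len(fields) < 2:
        return  # skip malformed records (assumed layout: "<path> <status>")
    status = fields[1]
    table.emit(status, 1)


def run(records):
    """Drive the per-record function over a stream and return the aggregated totals."""
    table = SumTable()
    for record in records:
        process_record(record, table)
    return dict(table.totals)
```

Calling run() on a list of log lines gives you a count per status code. In a mapreduce, process_record would be the map side and the table would be the reduce side, which is why the model fits that setting and not much else.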


Also, the language was purpose-built to be used as a step in mapreduce, and not as a general purpose language.


You can always use it with Yahoo!'s s4 (released yesterday). http://wiki.s4.io/Manual/S4Overview


Or I could use any one of a dozen other languages that are more convenient to use, already available on my system, already work with S4, and have a syntax that doesn't make me want to cry :P


Buzz post from a Googler on the release: "We have inflicted Sawzall on the world."

http://www.google.com/buzz/100587561646339426146/2Ehn2s7Nf1D...



