How to Build a Data Startup

tricky · on Nov 3, 2010

My wife was part of a data startup. They'd take freely available epidemiology data, massage it into a spreadsheet, and sell it to pharma. They do really well, but only because they know their data and they know that their customers know that they know what they know. You know?

exactly.

zackattack · on Nov 3, 2010

Would you kindly go into greater detail?

tricky · on Nov 3, 2010

You develop a drug that will work on a couple of different diseases (or, more likely, different cancers). You can only afford to run a clinical trial on one disease. How do you choose? You hire a company that has a good handle on the data (a lot of which is gathered by the CDC and other government orgs, who make it freely available). The data people mine that data and come back to say, "x% more people die from disease 1 than 2, but insurance will reimburse more for disease 2, so go with 2."

I'm totally making that up, but it is something you could feasibly derive from the data. Mostly it was along the lines of "teens get acne; American teens would rather use a pill than a cream, so develop a pill for them and a cream for the rest of the world."
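
A minimal sketch of the disease 1 vs. disease 2 comparison, in keeping with the made-up example. Every number, the `Indication` type, and the flat `market_share` assumption are invented for illustration; none of it is real epidemiology or reimbursement data.

```python
from dataclasses import dataclass


@dataclass
class Indication:
    name: str
    annual_patients: int              # hypothetical patient count
    reimbursement_per_patient: float  # hypothetical dollars reimbursed


def expected_revenue(ind: Indication, market_share: float = 0.1) -> float:
    """Crude score: patients reached times reimbursement per patient."""
    return ind.annual_patients * market_share * ind.reimbursement_per_patient


def pick_indication(candidates):
    """Return the candidate with the highest crude revenue score."""
    return max(candidates, key=expected_revenue)


# Disease 1 affects 20% more people, but disease 2 reimburses more.
candidates = [
    Indication("disease 1", annual_patients=120_000, reimbursement_per_patient=2_000.0),
    Indication("disease 2", annual_patients=100_000, reimbursement_per_patient=5_000.0),
]

best = pick_indication(candidates)
print(best.name)
```

A real analysis would presumably weigh trial cost, competition, and approval odds as well; this only shows how a larger patient count can lose to a higher reimbursement.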

wlievens · on Nov 4, 2010

So it's market research as a product rather than a service, so to speak?

psynix · on Nov 3, 2010

Much easier to read on the original site: http://radar.oreilly.com/2010/10/strata-week-building-data-s...

physcab · on Nov 4, 2010

I think there are multiple ways to start a startup based off of data.

1) Take other people's data and learn how to represent it in a way that will let them understand it better. See FlowingData, Stamen, and NY Times graphics (http://www.nytimes.com/interactive/2009/03/10/us/20090310-im...)

2) Take other people's data, glean information from it, and offer a new service based off that information. See FlightCaster and TweetFeel

3) Develop tools that will allow people to play with their own data and do their own analysis in-house. See Datameer, Google Prediction API, and Palantir.

Whatever the route, you'll probably need someone who is comfortable dealing with databases, a graphic artist to make the information pretty, and someone with the algorithmic knowledge to capture new insights from massive amounts of data.

anthonyb · on Nov 4, 2010

This article is really just a rehash of a tiny part of O'Reilly's roundup: http://radar.oreilly.com/2010/10/strata-week-building-data-s...

You're much better off reading the originals, which I think have already been posted on HN anyway:

http://datasyndrome.com/post/1375987697/analytic-product-tea...

and

http://petewarden.typepad.com/searchbrowser/2010/10/how-to-t...

il · on Nov 3, 2010

I'm wondering how they determined that a data startup needs exactly those three founders.

I think a data startup needs two founders: a hacker to collect and analyze the data, and a business guy to provide actionable recommendations and sell it.

But then again, I'm currently a single founder working on a data startup and wearing all of the hats, so what do I know.

Shameless Plug: Anyone want to analyze huge datasets and create recommendations with me? Email me!

anthonyb · on Nov 4, 2010

It's much clearer in the original article - this one is really just a rehash/mix and match.

http://datasyndrome.com/post/1375987697/analytic-product-tea...

rjurney · on Nov 3, 2010

Mostly I determined it through failure.

benzheren · on Nov 4, 2010

Visualization is a key part of working with data. I highly recommend all of Edward Tufte's books on information visualization.

earlyresort · on Nov 3, 2010

An ok article, but if you're seriously considering building a data startup, the #1 most-important thing above all is to know exactly how to scale beforehand. Bad architecture is expensive, and trying to switch architectures midstream is a nightmare.

il · on Nov 3, 2010

I'm going to get flamed for this, but I think, even for a data startup, worrying about scaling before you have users or traction is premature, especially with how cheap hardware is becoming.

Case in point: I'm currently hacking together an inefficient, unoptimized prototype analyzing pretty large datasets on probably the worst architecture for this kind of thing known to man, and the whole thing still runs pretty well on a single $50 VPS.

earlyresort · on Nov 3, 2010

Do you have full control over the amount of data your system is taking in?

The startup I founded had analytics code in a ton of iPhone applications and was handling the load just fine right up until the day it suddenly wasn't. By that point we had customers who relied on us, and we had to deal with it very quickly. Not fun. And there's certainly more to scaling than just cheap architecture. We thought EC2 would handle the overflow until we unexpectedly became completely I/O bound. Firing up a few more instances can't fix that.

If you're just running some scraper and can control what you're taking in, that's a completely different story.

il · on Nov 3, 2010

You're absolutely right, I hadn't considered analytics as an example.

Some data startups I've seen, as well as my own project, take in existing data sets and simply generate reports from them for customers. That makes it a lot easier to scale.

SkyMarshal · on Nov 3, 2010

I think there's a happy middle ground between premature optimization and naive development.

While the former should not be allowed to impede one's progress toward an MVP, real customer feedback, and the potential need to adapt or pivot, neither should one ignore early optimization decisions where they are inexpensive and may only minimally impede (if at all) that progress.

Being able to recognize the difference is a talent that comes with experience.

Maro · on Nov 4, 2010

I agree. The first step should be to get a dataset that fits into a large CSV file or a regular DB, and play around with it in Gnuplot 'til your eyes pop out.
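
As a rough sketch of that first step, here is one way to sanity-check a small CSV using only the Python standard library before plotting it. The sample data, the column names, and the `column_summary` helper are all hypothetical.

```python
import csv
import io
import statistics

# Hypothetical sample data; in practice this would be read from a file.
SAMPLE_CSV = """region,cases
north,120
south,95
east,143
west,88
"""


def column_summary(csv_text, column):
    """Return count, min, max, and mean for one numeric column."""
    rows = csv.DictReader(io.StringIO(csv_text))
    values = [float(row[column]) for row in rows]
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
    }


summary = column_summary(SAMPLE_CSV, "cases")
print(summary)
```

If the whole dataset fits in memory like this, a single machine is probably enough to start exploring it.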

asanwal · on Nov 3, 2010

Two things (from the perspective of a data startup founder):

- Datasets needn't be huge to be high value. In these instances, scale is not and should not be your primary worry (esp. day one).
- Figure out if people want your data before worrying about scaling it.

caffeine · on Nov 4, 2010

To be honest, that's probably the #2 most important thing, #1 being, of course, that people want your data.

chewxy · on Nov 4, 2010

I was once advised to go look for customers who want to buy your data before scaling or planning to scale. It's very offline-ish and hack-y, but I suppose it was good advice.

gsteph22 · on Nov 3, 2010

The world is data. This is a killer article.

dstorrs · on Nov 3, 2010

Personally, I found it a bit fluffy. It pretty much boils down to:

"Data startups need three bodies (hustler, designer, prodineer). Talk to customers early. Here are the levels of knowledge: 1) data, 2) charts, 3) reports, 4) actionable analytics; higher numbered levels are more valuable."

rjurney · on Nov 3, 2010

Some other posts at www.datasyndrome.com are less fluffy on the same topic. The 3 founders one was written to suggest a 3rd founder to a startup. Data hackers tend to underestimate the importance of hustling and design.

sparky · on Nov 4, 2010

Surely they have numbers to back them up on that.

rjurney · on Nov 4, 2010

I don't think data on this is available, just experience.

kno · on Nov 3, 2010

How to present and sell the data is key.