My wife was part of a data startup. They'd take freely available epidemiology data, massage it into a spreadsheet, and sell it to pharma. They do really well, but only because they know their data and they know that their customers know that they know what they know. You know?
Say you develop a drug that will work on a couple of different diseases (or, more likely, different cancers). You can only afford to run a clinical trial on one disease. How do you choose? You hire a company that has a good handle on the data (a lot of which is gathered by the CDC and other government orgs that make it freely available). The data people mine that data and come back to say, "x% more people die from disease 1 than 2, but insurance will reimburse more for disease 2, so go with 2."
I'm totally making that up, but it is something you could feasibly derive from the data. Mostly it was along the lines of "teens get acne, and American teens would rather use a pill than a cream, so develop a pill for them and a cream for the rest of the world."
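For what it's worth, the disease 1 vs. disease 2 reasoning above comes down to simple arithmetic. Here is a rough sketch of it in Python. Every number is invented for illustration, the `Indication` class and `pick_indication` helper are hypothetical names, and patient counts stand in for the mortality figures. A real analysis would weigh far more than two factors.

```python
from dataclasses import dataclass


@dataclass
class Indication:
    """One candidate disease for the clinical trial (all values hypothetical)."""
    name: str
    patients_per_year: int            # made-up figure, e.g. derived from public health data
    reimbursement_per_patient: float  # made-up figure for what insurers would pay

    def addressable_revenue(self) -> float:
        # Crude upper bound: every patient treated and fully reimbursed.
        return self.patients_per_year * self.reimbursement_per_patient


def pick_indication(candidates):
    """Return the candidate with the largest addressable revenue."""
    return max(candidates, key=lambda c: c.addressable_revenue())


# Disease 1 affects 20% more people, but disease 2 reimburses better.
disease_1 = Indication("disease 1", patients_per_year=120_000, reimbursement_per_patient=2_000.0)
disease_2 = Indication("disease 2", patients_per_year=100_000, reimbursement_per_patient=3_500.0)

best = pick_indication([disease_1, disease_2])
print(best.name, best.addressable_revenue())
```

With these invented inputs the higher reimbursement outweighs the smaller patient population, so the sketch picks disease 2, which matches the "go with 2" recommendation in the example.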
2) Take other people's data, glean information from it, and offer a new service based on that information. See FlightCaster and TweetFeel.
3) Develop tools that will allow people to play with their own data and do their own analysis in-house. See Datameer, Google Prediction API, and Palantir.
Whatever the route, you'll probably need someone who is comfortable dealing with databases, a graphic artist to make the information pretty, and someone with the algorithmic knowledge to capture new insights from massive amounts of data.
I'm wondering how they determined that a data startup needs exactly those three founders.
I think a data startup needs two founders: a hacker to collect and analyze the data, and a business guy to provide actionable recommendations and sell it.
But then again, I'm currently a single founder working on a data startup and wearing all of the hats, so what do I know?
Shameless Plug: Anyone want to analyze huge datasets and create recommendations with me? Email me!
An OK article, but if you're seriously considering building a data startup, the most important thing is to know exactly how you will scale before you start. Bad architecture is expensive, and trying to switch architectures midstream is a nightmare.
I'm going to get flamed for this, but I think, even for a data startup, worrying about scaling before you have users or traction is premature, especially with how cheap hardware is becoming.
Case in point: I'm currently hacking together an inefficient, unoptimized prototype analyzing pretty large datasets on probably the worst architecture for this kind of thing known to man, and the whole thing still runs pretty well on a single $50 VPS.
Do you have full control over the amount of data your system is taking in?
The startup I founded had analytics code in a ton of iPhone applications and was handling the load just fine right up until the day it suddenly wasn't. By that point we had customers who relied on us, and we had to deal with it very quickly. Not fun. And there's certainly more to scaling than just cheap architecture. We thought EC2 would handle the overflow until we unexpectedly became completely I/O bound. Firing up a few more instances can't fix that.
If you're just running some scraper and can control what you're taking in, that's a completely different story.
You're absolutely right; I hadn't considered analytics as an example.
Some data startups I've seen, as well as my own project, take in existing data sets and simply generate reports from them for customers. That makes it a lot easier to scale.
I think there's a happy middle ground between premature optimization and naive development.
Premature optimization should not be allowed to impede one's progress toward an MVP, real customer feedback, and the potential need to adapt or pivot. But neither should one ignore early optimization decisions that are inexpensive and impede that progress only minimally, if at all.
Being able to recognize the difference is a talent that comes with experience.
Two things (from the perspective of a data startup founder):
- Datasets needn't be huge to be high value. In these instances, scale is not and should not be your primary worry (especially on day one).
- Figure out if people want your data before worrying about scaling it.
I was once advised to go look for customers who want to buy your data before scaling or even planning to scale. It's very offline-ish and hack-y, but I suppose it was good advice.
Personally, I found it a bit fluffy. It pretty much boils down to:
"Data startups need three bodies (hustler, designer, prodineer). Talk to customers early. Here are the levels of knowledge: 1) data, 2) charts, 3) reports, 4) actionable analytics; higher numbered levels are more valuable."
Some other posts at www.datasyndrome.com are less fluffy on the same topic. The three-founders post was written to suggest a third founder to a startup. Data hackers tend to underestimate the importance of hustling and design.
Exactly.