The cold start problem: how to build your machine learning portfolio (towardsdatascience.com)
340 points by sharemywin on Dec 10, 2018 | 99 comments



I agree with the general point of this article. Showing that you have the skill to accomplish a job effectively will get you a job most anywhere.

There's a major issue when this is applied to a certain class of problems in ML and Data Science that people tend to ignore.

If you could get a job as a civil or mechanical engineer (building bridges or whatnot) by showing that you built a small bridge in your backyard... We'd have some unstable bridges.

If hospitals just let residents run the hospital... We'd have a lot of mistreated illnesses.

If you could show a realty company that you can build a recommendation engine and they hire you to build their advertisement algorithm... Suddenly you're breaking housing discrimination laws.

I am all for folks being able to get jobs from their cool projects. But we need ethical standards and educational standards before folks are given large, real-world problems to work with.

We need to take a page out of engineering and medical playbooks and build official education or apprenticeship requirements. We need to have licenses that can be revoked if someone fails to follow ethical or quality standards.

So - love creative people getting jobs. Now give them a high-quality education program along with those jobs.


Programming is the last decent job that doesn't have a bunch of artificial barriers to entry put up to protect privileged people from competition from the industrious poor. The day I need a license from the government to own a compiler is the day I check out of this life.


Does it bother you that every day, your personal information is transferred between a massive number of services created, maintained and secured by individuals with no security or ethical training whatsoever?

That personal information is then sold to even more organizations that need no scientific or statistical basis, again with no ethical oversight, to make decisions that impact your future and the future of those around you.

Picking up a compiler will never be banned, any more than picking up a scalpel is. But to call yourself a doctor, you accept the responsibility (ethical and professional) that it entails. To call yourself a professional engineer, you accept the responsibility that it entails. You hold yourself to a certain level of quality and ethics, or you give up your right to hold the title.

Without a standard level of responsibility there is no way to build any more of a foundation than we have. We will always have the rickety mess of data leaks, corruption, and general lack of accountability that we have now.

We will have employers that can pressure programmers to go directly against their morals and ethics to get what they want.

I'm not proposing that the poor be excluded from the field. I'm proposing that you not only give them a job but give them a pathway to a quality, independent education that will continue to prepare them for the major challenges they will be expected to face and the ethical questions that they'll have to make a stand on.


> Does it bother you that every day, your personal information is transferred between a massive number of services created, maintained and secured by individuals with no security or ethical training whatsoever?

No, it doesn't bother me at all.

What does bother me is that large companies such as Google have a business model that is based on gathering as much information as possible about everyone, and then using that information to manipulate them, typically into buying things.

Credentialism will not solve that problem.

> We will have employers that can pressure programmers to go directly against their morals and ethics to get what they want.

Credentialism in programming will not solve that problem either. What will solve it is:

- a reduction in income and wealth inequality, and a basic income, so that employees have more bargaining power with respect to employers

- laws against big companies behaving badly


> - laws against big companies behaving badly

Well, actually, big companies do in fact use their influence to subvert those laws, so they can get away with bad behavior and create barriers to competition. It would probably be simpler to put an upper limit on the size of any organization (government too) and use vigilante justice to enforce the limit, HHOS.


Fair enough, but as a member of a protected profession myself (engineering) it's not the regulations per se that prevent people from gaming the system - it's a culture of professional pride. Where I practice we have this weird "swearing in" type ritual developed by Kipling back when the profession was born. I think it actually does more to ensure high standards than any legal threat the society can bring to bear on bad actors.


Hope that by ‘this life’ you mean the life of a software developer.


That sounded bad. If programming becomes illegal, I'll focus on graphic design. Short of a Bolshevik-style coup, they won't be able to ban that in the United States.


Indeed - look at law, where in the UK at least the pupillage requirement (essentially an internship/apprenticeship after university, IIRC) ensures it remains the preserve of the privileged, especially in the upper echelons of the field such as barristers and judges.

We see the same with the iron grip that the medical council has on the number of medicine students (imagine if we fixed the numbers of software engineering students and made it difficult to register foreign qualifications).

I agree that educational standards are important - but if that isn't coupled with improving access to higher education (including post-graduate education) then it's basically just saying only the rich should be allowed in.


I definitely agree that higher education should be accessible to everyone and for free.

I feel like the conversation should be:

- "We need ethical and professional standards because our current way of doing things isn't working out too well!"

- "But that will exclude people who don't have access to education!"

- "Then we should give everyone a means to get the education. Because it is an unlimited resource that can be freely given without being taken from someone else and enriches all of humanity!"

- "Yeah! Let's do both those things!"

At least that's how I had hoped the conversation would lead.

But we can't throw our hands in the air and say "Accessible education is impossible and so we can never have standards of ethics or quality". That is definitely a dead end for society.

Anywho - I absolutely agree!


This is the problem with a lot of software development. Look, this bright young man just built a wooden box all by himself. Let’s employ him to build our new shed. A shed is just a big box, isn’t it? This bright young man just built a shed. A house is just a big shed, isn’t it? He’ll figure out insulation, electricity and plumbing when we get to those parts...


You know, Achilles should never try to catch that turtle, because he would first have to walk half the distance to the turtle, but in order to walk half the distance he first must walk 1/4 the distance; but first he needs to walk 1/8...

Going from ML projects at home to an ML job is nothing like going from building a box to building a shed; it's more like going from building a shed to building a house.

Sure, building a house is more complicated, but you won't be building that house alone, and if someone hired you to build a house because you were good at building sheds, I'm sure you'll start your job as a junior house builder, not master architect.


There were budget cuts. Teams were shuffled around.

You're running the team now. Your colleagues have only ever built boxes. Go get em', master architect!

This is my (hopefully humorous, but actually taken from my experience) way of saying that companies will often do what is most immediately profitable rather than what's best in the long run (for humanity or themselves).


If they are building houses where the roof caves in after the first rain, they won't be in business long.


Why? Nowhere was it written in the requirements that the house must be able to withstand rain.


woodpeckers. civilization.


Tbh your analogy doesn't work. Modern software dev would mean that if the house can get by without insulation and plumbing until it needs them, then it will. When the requirement comes up, extra dev resources will be hired to plug them in. As long as you managed to get the frame up and the wiring complete, the other components are superfluous as long as someone is happy to stay in that house. Then you can add insulation, then you can add plumbing.

This is part of the "if you're not unhappy with the first version of your product then you've launched too late" philosophy, and it does work in non-critical sectors.


And you are kind of illustrating my point. Building a house requires a lot of additional considerations: what kind of daylight you get in different rooms, that maybe the toilet shouldn’t open directly onto the dining area, how the kids’ rooms are located relative to the living room. And you will probably have to redo 70% of the work on the kitchen if you don’t start by making sure that the plumbing is in place. (Someone who knows more about houses than I do can come up with much better examples. For this exact reason, I’m consulting an architect before I start renovating my house, even though I’m perfectly competent to both remove and put up drywall myself.)

Of course people learn with experience, and luckily redoing things in software development is cheap compared with when building houses. But that’s the only reason we get away with it.

My point is, we would build better and cheaper systems if we acknowledged from the start that we have to take completely different sets of considerations into account as we move up the scale. A shed isn’t just a big box, a house isn’t just a big shed, etc.


> Of course people learn with experience, and luckily redoing things in software development is cheap compared with when building houses. But that’s the only reason we get away with it.

Here in software, we've turned that into a positive thing and made a philosophy out of it. How do you know the toilet shouldn't be in the kitchen? Maybe the users like it? You know what, the data actually shows that in houses where the toilet is next to the kitchen stove, people spend (on average) more time in the living room, thus raising the core metric of happiness.


If you were building houses, who would you choose?

(a) Bright young man who built a wooden box

(b) Bright young man who says he has read for years about building houses

I'd pick (a) over (b) and put him on a team where they build houses so he actually learns on the job.


SF has a homeless problem. Maybe they would prefer a big box to life on the street.


Lol sounds exactly like my life as a first year dev at a small startup


This reminds me of the (relatively) recent post about breakdowns and descriptions of major and well-crafted applications.

Link: http://aosabook.org/en/index.html

A lot of Computer Science graduates and/or self-taught engineers are missing the commentary of 'masters' or senior people who can explain why to do something one way and not another.

Just as the difference between building a bridge across a river and one in your backyard is far more than just scaling up, seeing design patterns and common pitfalls writ large is one more step towards the proper education a modern programmer needs, IMO.


A good start would be only actual professional engineers calling themselves engineers.


In Germany, engineer ("Ingenieur") is a protected professional title, and I was granted the right to use it with my B.Sc. in CompSci (though a B.Sc. in Germany is also a lot more focused than an undergrad degree in the US, afaik). I have no idea how many people in IT use the engineer title without being entitled to, because, in contrast to e.g. construction engineers, there is no "Kammer" (guild?) for software engineers.

There is also higher need for programmers ("developers") than for actual engineers.


You'd need a professional accreditation body for software engineers then.


Wasn't this tried/mooted a few years ago but got nowhere in the end? But you also have to understand that for many/most of these types of professional bodies in other fields, you need a basic BS-type degree to be accredited, and this is where I think it fell flat, as quite a few SWEs (compared to other fields) were self-taught, dropped out etc. and didn't necessarily have a college degree. Maybe they could have a different set of criteria for the SW field.


> quite a few SWEs (compared to other fields) were self-taught, dropped out etc. and didn't necessarily have a college degree.

Accreditation isn't just for enforcing a minimum standard of education - it's primarily so that people can actually be held responsible for bad decisions, and so that they have leverage over their bosses (it's harder to find an actual engineer or a doctor willing to do unethical things for you, when they could lose their license over it).


I totally agree with your point. But unfortunately/fortunately, those other fields you mention do require a degree as a minimum. But again, they also tend to have larger consequences (usually), compared to a few bad lines of code (usually).


Companies want people that can come in and hit the ground running and are self-starters.

It is always better to go into the interview, meeting, pitch with projects that are shipped or working to demonstrate.

For instance, if you want to get into the game industry, build games.

Same with any field, when you create/build you can rise above education, experience and more competitive metrics.

Created/functional projects, and especially shipped products, are the best qualifications.


Instead of the n-th tutorial website, I would love a list of worthwhile projects one can build, in order of increasing difficulty, starting with MNIST and moving up. You don't have to hand-hold me by providing a sandbox, but suggested frameworks are welcome.

Ending the list with a couple of research/unsolved problems is perfect.

To me, this is the perfect way to learn, almost like feeling your way blind through a cave until you reach its limits.


OpenAI has "Requests for Research" from easy to very difficult:

https://openai.com/requests-for-research/

https://blog.openai.com/requests-for-research-2/


Woah. This is a good list, I just kinda assumed it was all unsolved problems (“request for research”) but it’s not!


Here's a collection of project ideas:

http://nifty.stanford.edu/


I've hired dozens of people in tech and building something that works and is related to the job you want is so important it's hard to overstate. IMO, it's far more important than what school you went to and maybe even your employment history. Tech moves so fast that if your portfolio is something you built 3+ years ago, it can be difficult to find a job other than at companies that can afford substantial training programs.


In gamedev, I would not want to work with someone just because he has built a game on his own. Gamedev is notorious for having a low barrier to entry followed by an exponential learning curve. I'd rather work with someone who has a deep understanding of some specific area or has worked on a library of some sort. I think a lot of other technical fields work the same way.


Shipped games, both on teams/companies and individually, are game development gold. You will be in if you have shipped decent titles, especially individually or with a small team.

If you have no shipped titles, the second-best way is to have at least game components (networking, game mechanics, etc.) that you can demo.

Personally, I only like to work with people who can take a project to ship and are product-focused. Gamedevs who don't know the depth and troubles of shipping a title will otherwise have to learn that at the company, and usually don't know when to cut or scale the gameplay, mechanics, product, etc. correctly. A big problem with AAA development and broken systems is heavy specialization: it is why games come out buggy, as there is very little ownership and few people who could ship a game solo and understand how all the parts connect.

Unless you are only working on triple-A titles, most game companies are small or medium-sized and need shippers who aren't as specialized, especially in mobile. Most people who go into AAA games think they will get to make their game, but in actuality they work on such small parts that they eventually break out into their own smaller studios or go indie to build games.

Everyone at a game company wants to build their own games; it is why people go into games. For success, you need shippers who understand the whole shipping adventure and want to come together to get games out the door that each of them can have input on as a team. Ideally they have also already scratched the itch by shipping some of their own games, and have an outlet for creativity if specialization is too heavy on a project. The best game studios and games are developed by smaller teams and key individuals who understand this, especially from pre-production into early production (prototyping, game mechanics). During production and post-production you want more specialization, but even then you want people who have shipped.


I think that experience in shipping and experience in maintaining a project are largely different. I'd rather work with someone who has maintained one live project than with someone who has shipped 5. Especially on mobile. Shipping a simple game is not hard. Understanding how much you have fucked up with load times and fps is valuable. Understanding why certain technical and architectural decisions are wrong is valuable. I have rewritten big parts of a game just to fix performance issues, even though the team had shipped 5 games before.


> I'd rather work with someone who has maintained one live project, rather than with someone who has shipped 5.

I agree, I coupled shipping with maintaining as I see that as one. Just getting it out in the world isn't enough, updating it, not breaking it, smooth updates for users with little friction, not breaking profiles, no crashes, upgrade/update testing, library/version updates etc are all massively important.

Definitely better to have someone that has done some production on a live state of the game over someone that shipped and forgot, but both are better than no shipped title or production.


I agree with you, but would argue that this doesn't rise above experience; it shows experience. Having shipped something, and maybe even supported and maintained it, is invaluable experience as a software developer, and not something taught at school, at least not at my university.


> doesn't rise above experience, this shows experience.

Experience is highly valuable, but concrete examples of it that were done individually or completely self-propelled are the way to go. Creating and shipping projects to show does demonstrate experience and skills.

Some people have experience but can't ship. If you have created and shipped projects, even with less experience, it truly demonstrates that you have enough experience to be worthwhile and to improve the place you are going to.

I'd argue, from both interviewing and being on the hiring side, that showing projects beats experience and education. It can also keep interviews focused on your projects and what you bring, rather than just whiteboarding and talking about what you have done.

For instance when I went into gaming I had multiple games made for every place I was aiming for, i.e. a racing game at a racing game company, promotional games at advertising/promotional jobs, and networking/multiplayer shooters/online for more hardcore gaming on console and mobile.

Ultimately, projects are the only way to win clients as a freelancer or small business pitching to clients, and the same approach works when going for full-time employment.


Is this true outside of the SV bubble?


Certainly it's true everywhere. It's also not necessary in most places (including many places in SV). But you still want to convince the company that you can learn quickly and well enough to be worth it. If you don't have any of the traditional ways of signalling that (fancy degree, reference from somebody already in the game, etc), then this is probably the best way to go.


In addition to development, any creative field is this way including movies, writing, gaming, design etc. It is also true of vocational skills and probably even legal/accounting/executive and other corporate fields.


I learned two things from this -

1. Apparently modeling is a solved problem. No need for any knowledge of math/stats, folks; just use the latest Python packages.

2. Data collection is more important than actually knowing how a model works and even more importantly doesn't work. What happens when he is hired and can't put together a reliable model?

I think his approach is solid (using projects to learn and being ambitious), but it feels like trying to run before you can walk.


Like Jeremy Howard (fast.ai) says, nobody learns baseball by first studying several years how to build baseball bats, optimal playing strategies, or managing baseball teams. Nope, you're given a bat, told where to stand, swing the bat and try to hit the ball. Suddenly, you're playing baseball.


This feels inaccurate, or misleading because you typically learn baseball when very young. Going out and swinging wildly without knowing how is hardly "playing baseball."

When I tried to learn how to golf a lot of time was spent on proper form and club choice, not just "swing!" I swung as much without a ball in front of me as with one.


Did your instructor also sit you down and explain how the materials in your club make it possible for you to swing?

The club is just the component you need to launch the ball towards the hole using your skill. You don't need or want to know what the club is made of until you think of buying a more expensive club (which might be never).


My problem was not with Alex or his approach (which I agree is a great way to get started as stated in my original post).

It was the narrative the article was pushing about how knowledge of the field doesn't matter, just strong work ethic and selling yourself visually to potential employers.


I get where you're coming from (and I was convinced that I would dislike the article when I realised what it was about).

However, there's some good advice:

- get real data

- clean it, play with it, build models

- focus on the cleaning, as that's what the gig normally is (and damn right too, your models will be way better if you've taken the time to understand your data).

I did find the "and then he got a job" part annoying, but the post was much better than I expected (relative to other towardsdatascience posts).


To be fair data collection is a really important skill as well, but hardly counts as ML/AI imo.

For me it feels more like "software plumber".


This part is often referred to as "data engineering"

Though I admit - I love "software plumber" and want that as a job title some day! Sounds very cyberpunk.


I am not afraid to outsource domain-agnostic projects to smart junior practitioners, though. Your conclusion reflects a cultural barrier; metrics are the real king. Domain expertise is absolutely needed in a few specific cases only; the rest is gatekeeping.


I was maybe a bit too snarky; apologies if so. But the crux of it is that his model never achieved > 50% accuracy. It was never necessary because he already got the job. But the thing is, that is the hard part.

Closing those gaps, not just in accuracy but also in generalization, is the data science portion of the task (and requires much more knowledge than what is demonstrated by this blog post, although they could have left out a lot of detail). They make it seem like if he just had a little bit more time, this would have been straightforward. But I am not sure about that.

I am all for giving junior practitioners a chance. But in my opinion, this is like hiring an English major for aerospace engineering because they built a model airplane. But maybe I vastly underestimate the amount of extremely low-hanging fruit out there for ML projects.


You’re welcome, and not snarky at all. I am not sure whether you work on commercially driven ML projects? It really depends on the metrics only. I can’t trust an ML black box for cancer screening (both false negatives and false positives have a big cost) or for complex industrial failures (stopping a plant triggers a lot of expensive compensating manoeuvres), but nobody in their right mind has a real problem with 88, 90 or 92% accuracy for online retail recommenders (the whole field is a kind of magic beyond a certain baseline, and money never earned is not money lost from a bookkeeping perspective) or for language translation (which is still human-proofread wherever it gets legal value of any kind). Hope I made my point clearer. Cheers.
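To make the screening point concrete: on rare-positive data, accuracy alone can rank models backwards once false negatives and false positives carry different costs. A minimal Python sketch; the 1,000 cases, the 2% prevalence, both "models" and the 50:1 cost ratio are all made-up assumptions for illustration, not real screening figures:

```python
# Illustrative only: every number below is hypothetical.

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 means positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def summarize(y_true, y_pred, cost_fn, cost_fp):
    """Accuracy, recall, and total misclassification cost for one model."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    cost = cost_fn * fn + cost_fp * fp
    return accuracy, recall, cost


# 1,000 hypothetical cases, 20 positives (2% prevalence).
y_true = [1] * 20 + [0] * 980

# Model A: always predicts "negative".
model_a = [0] * 1000
# Model B: catches 18 of the 20 positives but raises 60 false alarms.
model_b = [1] * 18 + [0] * 2 + [1] * 60 + [0] * 920

# Assumed costs: a missed positive is 50x worse than a false alarm.
for name, preds in [("A (always negative)", model_a), ("B", model_b)]:
    acc, rec, cost = summarize(y_true, preds, cost_fn=50.0, cost_fp=1.0)
    print(f"{name}: accuracy={acc:.3f} recall={rec:.2f} cost={cost:.0f}")
# A (always negative): accuracy=0.980 recall=0.00 cost=1000
# B: accuracy=0.938 recall=0.90 cost=160
```

Model A "wins" on accuracy while missing every positive; for a recommender that trade-off may be acceptable, for screening it is not.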


Do you have some to assign right now? :)


The title of this article is misleading. I thought it was a discussion of the "Cold Start Problem" (https://en.wikipedia.org/wiki/Cold_start_(computing)), which is a common technical challenge. Instead, it is a recollection of two stories for how to get a job in ML.


It's a playful title -- using a term from the field in a new way.


It is using a term from the field in an incorrect way.


You've just failed the Turing test :)


Huh, that’s an interesting way of thinking about the Turing test. State something that is categorically wrong according to a dictionary lookup (“cold start problem” means X in machine learning and ML only) and see if the respondent is malleable enough to use it in a different context. Of course, that’s the definition of a Turing test, but in a way I hadn’t thought about.


Well, you know, this common technical challenge got its name from another problem.


I thought the title expressed what the article was about very clearly, and I expect most people will feel the same.


Omg yes! I was confused after reading the first paragraph and did not understand why it was titled this way.


One tricky thing I've noticed when trying to hire ML people is that it's very difficult to tell when someone is okay at ML and when someone is great at ML from their past projects. Because machine learning is just statistics, it is often resilient to errors in design and implementation. You can mess things up and still get reasonable (but suboptimal) results.

This means that someone can easily claim "this problem is difficult to learn for machines" when they fail or claim "we got X% accuracy look how great we are!" when they do okay. But a really good engineer or scientist would have succeeded in the same task, or have gotten X+10% accuracy with the right models, data, or engineering.


On my resume, I compare my results to either the previously implemented model or the state-of-the-art for that specific problem. If you bring someone in for an interview, ask them what the previous best was.
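Putting a number next to its reference points is easy to automate. A minimal Python sketch, assuming a classification task; the function names, the 100 labels and the 80% "previous best" are hypothetical placeholders. It reports accuracy alongside a majority-class baseline and the previous best, so "we got X%" always comes with its context:

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline_accuracy(y_true):
    """Accuracy of always predicting the most common label."""
    most_common_count = Counter(y_true).most_common(1)[0][1]
    return most_common_count / len(y_true)


def report(y_true, y_pred, previous_best=None):
    """One-line summary of a model's accuracy next to its reference points."""
    acc = accuracy(y_true, y_pred)
    parts = [
        f"accuracy={acc:.1%}",
        f"majority baseline={majority_baseline_accuracy(y_true):.1%}",
    ]
    if previous_best is not None:
        parts.append(f"previous best={previous_best:.1%}")
        parts.append(f"delta={(acc - previous_best) * 100:+.1f} pts")
    return ", ".join(parts)


# Hypothetical data: 70 negatives, 30 positives; the model gets 85 right.
labels = [0] * 70 + [1] * 30
preds = [0] * 65 + [1] * 5 + [1] * 20 + [0] * 10

print(report(labels, preds, previous_best=0.80))
# accuracy=85.0%, majority baseline=70.0%, previous best=80.0%, delta=+5.0 pts
```

Here 85% sounds impressive on its own, but it is only 15 points above always guessing the majority class and 5 points above the (assumed) previous model.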


This is also a great way to become a better engineer - a lot of jobs are improving existing systems, so you get a different category of lessons building something from scratch to completion. Doing a write-up at the end also teaches you to communicate better.


The number of prospective employers that will actually pull up your github / portfolio project website / etc (let alone read it) is not especially different from zero.

It's important to have stories to tell during an interview, but getting the interview is the hard part.


Getting an interview when you have a substantial Github portfolio is very easy.

Prospective employers not only do actually pull up your github profile, many of them will find you because of it.

As an aside, writing this makes me realize how successful Github has been at implementing their motto of "social coding" when none of us had any idea wtf that could actually mean or look like.


I'd be interested in the numbers on this. To get a rough number, I typed "data scientist" "San Francisco, CA" into indeed.com and it's reporting 2,832 job listings.

So of those 2,832 openings that companies are trying to fill, how many employers are out actively recruiting candidates based on their github profiles?

It seems to me that Github / social coding is an interesting mix of networking (the social kind) and coding (which demonstrates abilities), but like most networking, it's only one of many ways into the door, and it's unclear how many of the 2,832 positions will be filled via that particular flavor of networking. My guess is that it's the minority.


This is false at the company I work for now (a large enterprise) and was false when I was hiring for my startup.


I'd be curious to hear your experiences hiring - I've been on the hiring side in the chemical industry, but not in data science.

Let's say that the funnel of applicants for one position starts with 250 resumes. How do you proceed? Among the applicants with github profiles, are all of the github profiles looked at?


I had a similar experience with a personal project. I scraped NFL player/game stats dating back past 2000 and spent a bunch of time on the data, feature engineering, and modeling. Although my original goal was to beat the Vegas models (I got to even on some metrics, but not enough to cover the juice), it ended up being the best line on my resume. Every interviewer during my last job search asked about it, and it was very easy to talk about.

I did make money in the end however, but only because I was buying bitcoins to gamble with in the 2017 season.
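"Covering the juice" has a concrete threshold. Assuming standard -110 point-spread pricing (risk 110 units to win 100; this is an assumption, since the comment doesn't say which lines were bet), a pick with win probability p has zero expected profit when:

```latex
p \cdot 100 - (1 - p) \cdot 110 = 0
\quad\Longrightarrow\quad
p = \frac{110}{210} \approx 0.524
```

So under that assumption a model has to be right on roughly 52.4% of its picks against the spread just to break even, which is why matching the bookmakers on some metrics still loses money.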


Serendipity? Or just good planning.


There's another important trend in these two examples that generalizes well beyond just machine learning.

If you show a willingness to scrape resources together to work on difficult problems, that's scrappier than the majority of your competition for the job. Startups love that.

Of course actual technical skill matters, but in my experience, the willingness to reach for aspirational goals is rarer than baseline competence, so I'll happily interview someone who shows that specific trait.


> Alex planned to improve his accuracy, of course, but he was hired before he got the chance.

Reading these posts I get the impression people prefer portfolio projects over studying the mathematical fundamentals; "dazzle hiring managers with these three easy tricks!" Seems like a great strategy to getting hired, but one wonders how long people using these lifehacks persist in the role.


(not an ML person, take with grain of salt)

I think someone without experience training a model to high-accuracy might struggle at a startup where they're the first hire in the department, but at a larger company, presumably there are senior folks to help newcomers along. I think having motivation is a much better signal, and you need to be motivated to develop a side-project of significant magnitude.


> Reading these posts I get the impression people prefer portfolio projects over studying the mathematical fundamentals

It didn't appear as though these two neglected their mathematical fundamentals, though. I believe the gist here is to find an interesting project tailored to the field you want to work in, and make it visual for the non-technical interviewers. Once you do get their attention, you would naturally need to pass a real DS interview, which would test your mathematical fundamentals.


I had a hunch. I've been sitting here perusing theory after theory for years now. I need to just sit down with some data and a Python IDE. It's more fun that way anyway. This doesn't mean I think I'll get, or even want, a job, but I'll get a better sense of what the work is and how best to use it.


Side projects are very useful if they are directly related to the area someone wants a job in, or if they use trendy technologies.

However if it’s in unrelated technology (or no longer trendy), then I’d argue they should be removed from the profile altogether. These might raise unnecessary questions and trigger biases.


>But it was similar enough that they quickly asked Ron to make his repo private.

How is this infringement? Why did he oblige? Unless they paid him, he should not have complied with their request.


Ron wanted a job with the organization that asked him to make his repo private. Cooperating is the best strategy given the goal. Ron can always make his repo public later.


Seems to me more like a case where companies no longer train grads and instead expect them to build, unpaid, a proof of concept that solves a hard problem. A sad state of affairs.


I disagree.

I think it is absolutely immoral (and dumb from an IP risk point of view) for companies to try to get interviewees to solve real problems for free.

This is different though: this is about potential candidates finding creative ways to demonstrate their skills.

The companies didn't even ask people to do this: these people chose to take on projects that would demonstrate their skills in a new space in which they did not yet have commercial experience. I think that's commendable and a very smart strategy.


This isn't different though. This article was written as "How to get into ML" and is basically selling the idea that getting into ML requires (or is aided by) doing some independent open source work to bulk up a profile.

A very competent developer who is interested in doing some ML work may have just recently read this article and gotten to work on an ML project because he feels that not doing so will hurt his chances of getting a job.

As a note, I have no idea what the right solution to this problem is. It is good to confirm candidates' knowledge (I prefer creative ways similar to the parent's), and that knowledge can certainly include open source work. But going down this road too far leads to people feeling obligated to produce open source work for their resumes, and even to people faking open source work to try to land a job.


"but going down this road too much leads to people feeling obligated to make open source work to put on their resume"

...which makes the world a better place. Why do you object to people writing open source code?


I agree, and I recommend doing it for those wishing to get into the field. I believe one of the things that helped me get where I am now is my performance in Kaggle competitions.

I do have one concern: if it ever became an industry prerequisite, it would become a filter favoring those who have more free time to work on side projects.


The current problem with things like data science thought pieces and MOOCs is that they don't prepare for the realities of machine learning/data science in the real world. (longer blog post on the subject: https://minimaxir.com/2018/10/data-science-protips/)

Doing a unique project is much better for learning cost/benefits of implementing AI/ML, although this post may be overoptimistic on how that can lead to a job offer.


Thanks for posting the article. While I've read up on ML, I don't work in the industry. I'm curious to see what other people think of the article. Can other people weigh in?


I agree with my sibling comment and would add:

Max is correct to point out the irony in his anti-thoughtpiece thoughtpiece, as he falls into the same trap of vagueness as those other articles. Specifically, he rails against general “black box” approaches to modeling, then takes a general “black box” approach to the work of operationalizing a model (much harder than building the prototype to begin with!).

The discussion of “pulling data” does not match the practical reality, since pulling via BI tools is not scalable and rarely automatable. SQL may cover this insofar as you dump data from SQL to...what, though? A Python session on your laptop? Automating this process allows a data scientist to scale their impact.

For more specificity on engineering practices required for data science, I recommend Robert Chang’s series of posts: https://link.medium.com/CG7c7mQdyS

For details on how a data scientist can impact an organization, I recommend this from First Round Review by Jeremy Stanley and Daniel Tunkelang: https://firstround.com/review/doing-data-science-right-your-...


For clarity, I only covered the data science part of the perspective; the data engineering/DevOps part is another, more difficult can of worms which would require its own post!


I'm a junior SWE at a mid-sized AI startup. The article is pretty much spot on, but I'm not sure that the romantic view of data science the article argues against is actually that prevalent.

It seems like you would need a very specific level of knowledge to romanticize data science in that way. Most people know too little ("So you're basically trying to build Skynet?"), and most of the rest are either in the industry or know someone who is, and so have a more realistic view.

I don't know, I did maths in undergrad, maybe some of the more clueless CS majors thought this way.


The reality is that there are enough people with experience in machine learning that companies don't need to train people.


This is not remotely true. It's true that there are lots of data scientists who can string together sklearn or R models. However, very few people know how these models work, when to apply them _and why_, or what concerns need to be addressed around data, metrics and deployment. Even fewer know how to improve these models once they have picked all of the low-hanging fruit.


Playing devil's advocate: the other side is that they hire a false positive and suffer through a different sad state of affairs?


Skunkworks: omg this kid has replicated our Auto-GAS system using only computer vision

Lockheed Martin CEO: So what the fuck is this youtube video?

https://www.youtube.com/watch?v=WkZGL7RQBVw


I don’t know why everyone cares about machine learning so much. Frankly, there seem to be so many prerequisites to understanding what goes on, when I imagine a software developer could just get better at Java and then learn about parallel code or something to earn lots of money for less effort. I’m not in software, so I don’t know if any of that is actually true.


There are lots of ways to make money easier and faster. I guess this is for people who are genuinely interested in the subject.


Well, because ML is fun.



