Hacker News
Data Science Interview Questions (hackernoon.com)
123 points by eaguyhn on March 1, 2020 | 22 comments



I would consider it a strong negative signal if a company used this kind of question for anything more than a very quick initial screen - a sort of "data science FizzBuzz." Firstly, most of these questions are things that a reasonably smart person with the right technical background could just look up on Wikipedia or in a textbook, etc. - why base your interview on whether the candidate has bothered to memorize a bunch of easily at hand knowledge? More importantly, none of these questions focus on the parts of a data science job that--for almost all data scientists--are actually the most difficult and subtle: understanding your data and where it came from, disentangling confounds, communicating results, understanding the right question to answer, etc.


For example,

> What kind of CNN architectures for classification do you know?

is rated as an expert-level question, but reads more like "what are some of your favorite Pokémon?"


I can almost guarantee you that this is because "deep learning" is considered "advanced DS" in most corporate settings. DS has gotten so broad that, if you try to fit everything into a single category (rather than breaking it out into sub-areas like ML Engineer, Data Engineer, Research Scientist, etc.), this is what you end up with. What "advanced" here really means is "relatively new field."

You can't ask someone just out of a stats program about CNNs because they probably have never touched them, even though someone doing a lot of DL work in CS may consider this trivial, so you mark it as intermediate. And you can't ask someone with 10 years of pre-DL experience in DS about CNNs because they've been doing forests and regression and clustering and such, so you move it from intermediate to advanced.

In other words, it's "advanced" because HR wants to make sure it's "optional".


I like your suggestion about understanding data and where it came from. You are right that there are more interesting and complex questions than knowledge-based questions, and I have no doubt that many people here agree with you. Knowledge questions in interviews seem to be a popular punching bag these days.

Let me offer a few thoughts to maybe soften the extreme point of view that knowledge questions are any kind of negative signal.

First, please don’t assume that the existence of knowledge questions means that the author's interview process is based on them or primarily about them. Most, if not all, companies have multiple interview modalities, and basic knowledge questions are just one of them.

Given that there are multiple phases for an interview, the most important thing a candidate should do is avoid being rejected for something simple. It’s a gauntlet, and if you only think about the boss stage and forget to do really basic things like spell-check your resume, or you choke on summarizing what gradient descent is, you might not even make it to the hard interview questions.

From experience, it certainly is tempting to think that because you can look up knowledge questions, people will just game them and ace them, but that’s not what happens in practice at all. In reality, people who don’t have a background in the subject might be able to recite something they’ve heard or read, but have a very hard time answering any followup questions, dealing with terminology they haven’t heard, or connecting topics together. It’s very easy as an interviewer to detect BS by asking very straightforward followup questions, by having a conversation. Plus there’s a much bigger risk in trying to memorize answers to things you don’t know; when it’s quickly discovered you were trying to fake it, that is grounds for being quickly rejected.

Knowledge is part of the background you need to get to the interview, and while it might seem like you don’t use it directly on the job, it’s part of a foundation of exposure to the topics that connect the pieces together. There’s nothing wrong with exploring the bounds of a candidate’s knowledge to get a sense of how much they covered in school, how much they exercised during internships or other jobs, and how much they’ve thought about it on their own.

It’s perfectly okay for there to be easy interview questions (you’d be surprised how many get answered incorrectly!), and for there to be questions that are broader in scope than things you’ll do day-to-day. Keep in mind, it’s a conversation where you’re trying to convince the employer that you’re already trained in the subject, and also well-rounded and have the potential to grow.


All very fair points. Thanks.


I have never seen confound used as a noun before (I am not a data scientist). Is it the same concept as a confounding variable?


That's the sense in which I meant it, but I think it's an abuse of terminology, i.e. "confound" is not obviously a noun :)


I like to ask about the models the candidate has used. I'm fine if someone has never used a CNN or another method. For me the signal is whether or not they care to know about the methods they use. If not, it usually means that they are happy pushing data in and getting some numbers out.


Yes, of course - we should never ask questions where the answers can be found on wikipedia


Why does someone always say this? What are you supposed to ask these people, "hey, are you good at communicating?" "is u good with data?"? You need technical questions somewhere, and if you ask someone specific technical questions then you're being a pedantic dick, but if you broaden it (like these questions), you're only asking easily looked-up knowledge. I mean, come on.

Many of these questions are foundational or semi-foundational questions, and if you're not asking those then how can you possibly tell if someone understands the basics? I don't care that someone can look up what an ROC curve is, I care that they can explain why it's used and the relationship between TP, FP, recall, precision, etc., and why these distinctions are important.
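To make the relationships mentioned above concrete, here is a minimal sketch in plain Python of how TP, FP, recall, precision, and the points of an ROC curve relate to each other. The function names (`confusion_counts`, `rates`, `roc_points`) and the toy labels and scores are invented for illustration; they do not come from the linked article or from any particular library.

```python
def confusion_counts(y_true, y_score, threshold):
    """Count TP, FP, TN, FN when scores >= threshold are predicted positive."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, y_score):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif not predicted_positive and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def rates(tp, fp, tn, fn):
    """Derive recall (TPR), precision, and false positive rate from counts."""
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return {"recall": recall, "precision": precision, "fpr": fpr}


def roc_points(y_true, y_score):
    """Sweep the threshold over every distinct score, from high to low.

    Each threshold yields one (FPR, TPR) point; together they trace the ROC curve.
    """
    points = [(0.0, 0.0)]  # a threshold above every score predicts no positives
    for threshold in sorted(set(y_score), reverse=True):
        r = rates(*confusion_counts(y_true, y_score, threshold))
        points.append((r["fpr"], r["recall"]))
    return points


# Hypothetical toy data: 1 = positive class, scores are model outputs.
y_true = [1, 1, 0, 1, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]

print(rates(*confusion_counts(y_true, y_score, 0.5)))
print(roc_points(y_true, y_score))
```

Lowering the threshold can only raise recall and the false positive rate, which is why the ROC curve runs from (0, 0) to (1, 1). Precision has no such guarantee, and that is one reason the distinction between the metrics matters on imbalanced data.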

And yes, your last sentence is correct, but it's ONLY correct when the individual actually understands the foundational parts of ds/stats. If they don't, then it doesn't really matter what questions they ask or how good their communication is.

For some reason there is this widespread notion that in DS, because the data is so important, and so tied into the business logic/processes, you don't need to know the math, and that's just not true. That's how you end up with "data scientists" working in Excel and putting out meaningless reports that no one ever uses, because they can never get to any real, actionable results when anything more than linear regression is an impossible task.


I have to agree with you there. It's not nice to have to ask such questions; it feels like the equivalent of just hammering the candidate with questions about the time complexity of a bunch of random algorithms.

Coding or maths interview questions are quite nice because they can be treated as puzzles that the candidate can tackle in different ways, and the interviewer can guide them without giving too much away if they get too lost.

DS questions on the other hand come in two flavors: pop quiz type of questions like the ones in this post, or the "pretend you have data, how would you write a recommendation system" ones, which also suck.

I try to mitigate this by inserting questions when the candidate is talking about previous projects, so I can gauge if they have a deep enough understanding of the stuff that they've used in the past, and also to give more context for the questions.


This blogger has questions and answers:

[40 Statistics Interview Problems and Answers for Data Scientists](https://towardsdatascience.com/40-statistics-interview-probl...)

[Amazon’s Data Scientist Interview Practice Problems](https://towardsdatascience.com/amazon-data-scientist-intervi...)

[Microsoft Data Science Interview Questions and Answers](https://towardsdatascience.com/microsoft-data-science-interv...)


Making a list of questions is the easy part. There are intricacies to all of them. Understanding that is the hard part.

For those with an advanced degree in statistics, what % of DS interviewers remotely understand the questions they ask?

I was great at interviewing. Then I took the time to really learn it, and made myself unhireable. So now I moved to a field where I am ignorant, and everybody seems brilliant again.


> made myself unhireable

you mean because your answers weren't cookie-cutter enough?


He/she talks about negative selection, probably


No answers. Any recommendations on books or blogs to learn data science?

I'm working through this book, which I do like:

https://www.amazon.com/Data-Science-Scratch-Principles-Pytho...

I’m rewriting the examples in Swift to help me learn:

https://github.com/melling/data-science-from-scratch-swift

Something with a little more theory might be good. Lots of the questions seem to require more theoretical knowledge.


I can really recommend An Introduction to Statistical Learning by James, Witten, Hastie and Tibshirani if you are looking for something that covers the theory without going into too much detail.

There is also The Elements of Statistical Learning by Hastie, Tibshirani and Friedman if you are looking for something more rigorous. I haven’t read very much of it but it is supposed to be very good too.


Try Machine Learning Bookcamp (http://bit.ly/mlbookcamp), which teaches machine learning by doing projects.


The first time I loaded this page they had a female-looking icon for intermediate and a somewhat male-looking icon for expert. It now has a star and a rocketship, respectively.


What are considered good data science interview questions? I think it depends on the purpose of the company.



How odd our field really is. Many of these questions are basic stats. Others seem like the kind of statistical process control I learned in my undergraduate engineering curriculum. Some are more modern, and would be learned by going through something like the scikit-learn documentation, or taking a Coursera course, though I suppose this might be in a more current formal curriculum.

The odd part is, do senior actuaries get quizzed on integration by parts when they interview? Do law firms put new hires through their 1L civil procedure exam?

And while those are fields with a standardized entrance exam (actuarial exams, bar exam, etc.), I know lots of people in non-credentialed fields that are still knowledge-based. They may get asked about a database or their experience with it, but they don't go through anything like this kind of technical exam-style interview. It really is an exam, it's just administered capriciously, often by people who have unknown or dodgy credentials, without any review by experts or assurances that it won't be used to discriminate (the modern day equivalent of a "literacy" test, I suppose), and is graded under conditions of great secrecy (people often sign a non-disclosure agreement prior to interviewing; Google asks people to do this).

I personally think a company is well within its rights to subject experienced people to an eternal cycle of undergraduate midterms if that's what it chooses to do. I think that this practice deters a lot of people from going into these fields as well, which is also their right as free and full members of the society they live in, free to choose a career path in response to their own interests and how they align with what employers want and what they're offering.

But... well, here we go. The companies that do this are almost always talking about a shortage of workers, one that the government needs to solve (or at least mitigate) by creating a special visa that allows employers to decide who is allowed to work in the United States and the conditions under which they are allowed to remain. Conditions that, surprise surprise, often involve working in a field that subjects people to endless repeats of their undergraduate midterms. When people with choice won't choose a particular job, isn't that the market's answer? Just to be clear, I'm positive about immigration, provided the people who immigrate are, well, free. But why on earth would we create a special corporate-controlled worker visa so that companies can continue to engage in practices that drive away people who can choose, in large numbers?

This is the pits, people. Don't these lists bug you just a little? I mean, I'm not trying to put down the list itself, I suppose it's probably a pretty good list for people who are going to subject themselves to this kind of interview exam, either because they like the field or because it's the only way to get through the US's byzantine immigration system without relatives.

But is it really a surprise that people with choice (i.e., the "free" citizens of Rome who are not bound by law to work only in certain fields as a condition of remaining within Rome) are rationally choosing to do things where they don't have to put up with this bullshit?



