As data science becomes embedded in critical systems across a wide range of industries, greater care is needed when recruiting for these positions. In some cases, an erroneous evaluation can not only affect a company’s profit margins but also put lives at risk. When data science is integrated into the AI engines of self-driving car companies or into medical products and services, for instance, there is far more at stake.
Finding ideal candidates who are stress-resistant and adept at vital technologies is challenging enough, but the wave of ‘fake’ data scientists masquerading as skilled professionals makes it even harder.
With data scientist hailed as one of the ‘sexiest jobs of the 21st century’, more and more people are branding themselves as such, even if they work with data only remotely or have just a few related tech skills.
While not intentional in most cases, this can largely be attributed to the novelty still attached to the field of data science, and the lack of a rigid job description that accompanies it. But how can recruiters filter through this to find the right candidates?
A good place to start would be to identify the skill set you should be seeking for that position. We have compiled a list of five broad questions that you can ask candidates when interviewing for data science roles. Although these should be adapted to the opening, most should help you test the technical competency of candidates:
What technologies do you typically use for your data science projects?
Learning which technologies candidates employ and are comfortable working with will offer you a window into whether or not they can be a good fit for your company. If they are inflexible about the tools they use, it may be prudent to think about the associated costs of hiring them.
In some cases, companies may be willing to incur that cost if candidates are able to prove their mettle, and are able to effectively convince you that those choices make practical sense. It will also inform you about the breadth of their knowledge about technologies.
What machine learning model would you choose to make predictions for a [given] use case? Explain the rationale behind your choice.
Taking the previous question up a notch, this will enable candidates to demonstrate their mathematical understanding of the algorithm they are employing. With the help of a practical use case that is similar to the company’s, this question can be posed to them as part of a larger coding exercise.
In addition to quizzing them on the performance of their ML algorithm, you can also dig a little deeper and test their knowledge of the pros and cons of that approach. Follow this up with questions on what they would have done to improve the predictive performance of their approach if given more time.
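One way to ground the discussion about performance is to ask how the candidate's model compares with a trivial baseline. The sketch below is illustrative only: the labels and the model predictions are hypothetical, and accuracy is assumed to be a suitable metric, which may not hold for heavily imbalanced data.

```python
from collections import Counter


def majority_baseline(train_labels):
    """Return the most frequent label seen in the training data."""
    return Counter(train_labels).most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical labels for a churn-style use case.
train_labels = ["stay", "stay", "churn", "stay", "stay"]
test_labels = ["stay", "churn", "stay", "stay", "churn"]

# Hypothetical output of the candidate's model on the test set.
model_predictions = ["stay", "churn", "stay", "stay", "stay"]

# The baseline always predicts the majority training label.
baseline_label = majority_baseline(train_labels)
baseline_predictions = [baseline_label] * len(test_labels)

baseline_acc = accuracy(test_labels, baseline_predictions)
model_acc = accuracy(test_labels, model_predictions)

print(f"baseline accuracy: {baseline_acc:.2f}")
print(f"model accuracy:    {model_acc:.2f}")
```

A candidate who can explain why the gap over the baseline matters, and when accuracy is the wrong yardstick, is usually demonstrating the kind of understanding this question is meant to surface.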
Test their data preprocessing experience/skills.
It may be useful at some point during the interview process to have an understanding of their data preprocessing experience or skills. Give them a coding exercise using some internal data inputs, and ask them to clean the data.
Since a lot of time goes into preprocessing, it would be useful to know how candidates tackle these issues. Although experience will teach aspiring candidates how to perform better, an understanding of where your candidate stands on this scale would be helpful.
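A cleaning exercise of this kind could look something like the sketch below. The records, the field names (`name`, `age`) and the cleaning rules are all hypothetical; a real exercise would use the company's own data and its own definition of a valid row.

```python
def clean_records(records):
    """Normalize names, drop rows with missing or invalid fields, remove duplicates."""
    cleaned = []
    seen = set()
    for row in records:
        # Normalize whitespace and casing so equivalent names compare equal.
        name = (row.get("name") or "").strip().title()

        # Treat blank, missing or non-numeric ages as invalid and skip the row.
        try:
            age = int(str(row.get("age")).strip())
        except ValueError:
            continue

        # Skip rows with an empty name or an impossible age.
        if not name or age < 0:
            continue

        # Keep only the first occurrence of each (name, age) pair.
        key = (name, age)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"name": name, "age": age})
    return cleaned


# Hypothetical messy input with the usual problems:
# stray whitespace, mixed types, blanks, duplicates and bad values.
raw = [
    {"name": "  alice smith ", "age": "34"},
    {"name": "Alice Smith", "age": 34},
    {"name": "bob jones", "age": ""},
    {"name": "", "age": "51"},
    {"name": "Carol Diaz", "age": "-3"},
    {"name": "dan lee", "age": " 28 "},
]

result = clean_records(raw)
print(result)
```

What matters in the interview is less the code itself than the decisions behind it: whether the candidate drops or imputes missing values, how they decide two rows are duplicates, and whether they ask what the data will be used for before discarding anything.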
Citing real-world problems, explain how you can validate your findings based on the modeling technique you used.
If the answer to this is not clear from the previous questions, it may be important to address it now. While some candidates may know all there is to know about algorithms and statistics, few can identify which techniques are appropriate for solving specific problems in the real world.
Furthermore, it would only be fair to give candidates an opportunity to take this a step further and even propose a solution to that problem. The ensuing discussion will shed light on the issues of scale, and their knowledge about the sector the company operates in as well – a critical component for data science professionals to understand.
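As one concrete example of validation that a candidate might describe, the sketch below runs k-fold cross-validation on a deliberately naive model that always predicts the training mean. It is a minimal illustration, not a recommended procedure: the target values are hypothetical, the folds are contiguous and unshuffled, and mean absolute error is an assumed choice of metric.

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def cross_validate_mean_predictor(targets, k):
    """Score a predict-the-training-mean model on each held-out fold."""
    scores = []
    for fold in k_fold_indices(len(targets), k):
        held_out = set(fold)
        train = [t for i, t in enumerate(targets) if i not in held_out]
        test = [targets[i] for i in fold]
        # "Fit" the model on the training folds only.
        prediction = sum(train) / len(train)
        scores.append(mean_absolute_error(test, [prediction] * len(test)))
    return scores


# Hypothetical target values, including one outlier.
targets = [10, 12, 11, 13, 30, 12]
fold_scores = cross_validate_mean_predictor(targets, k=3)

for i, score in enumerate(fold_scores, start=1):
    print(f"fold {i}: MAE = {score:.2f}")
```

The spread of scores across folds is often more informative than their average. Here the fold containing the outlier scores far worse than the others, which is the kind of observation that can lead naturally into a discussion of scale and domain knowledge.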
How do you acquire knowledge about new machine learning tools? Do you do that on a consistent basis?
While this question could take the shape of a discussion, it will give you an idea of how invested they are in the field of data science. You can also gauge their awareness about the industry by the source of their day-to-day learning.
Asking them to list an existing tool that appears to be under-appreciated/over-hyped, and the areas within the discipline of data science that they would like to learn more about will also help you understand the candidates better. Another question like this would be – which data scientists do you admire most and why?
Outlook
Data science is a blend of scientific tools and techniques comprising machine learning algorithms, statistics, and programming. Although this requires data scientists to be proficient in related technologies, it also demands competence in other fields.
This includes business acumen, communication skills, and the ability to understand where technology fits in this larger puzzle in order to deliver insights. While it is no easy task to assess – or prove – these qualities in an individual, the above questions could help guide interviews in the right direction.