Usually, I write tutorials for other data scientists, or for people who want to become data scientists. Today, I want to write one for managers who have never hired a data scientist before. Hopefully, my data scientist friends will find this tutorial for managers just as informative as one that teaches them to apply a new data science technique, since it should show them what will be, or rather what should be, expected of them. So this article should be useful to everyone.
Why Do You Really Want To Hire A Data Scientist?
When your organization decides to hire its first data scientist, you need to ask yourself: why? What can this person do that no one else in your organization can do?
The answer, of course, is that your organization’s business has become so complicated that answering questions about it is incredibly difficult. Imagine your business as a large, hairy ball of twisted problems. Why do you hire a data scientist? You hire her or him to untangle that mess. A good data scientist can take hold of the end of one of those problems and tug and pull until the entire strand is free from the ball. From there, they can tell you all about that problem, and they should even be able to give you options for dealing with it.
Remember, this is the first and foremost thing you are hiring a data scientist for: you are trying to hire someone who can help you make decisions. Your first data scientist should be considered a senior member of your staff. They are the person to whom, to a large degree, you will be turning for answers. Do not alienate this person by failing to give their proposals real weight. I am not saying that they do not need to earn their place at the table. I am saying that you hired them to help you make decisions. If you ask them for a recommendation and decide to go against it, remember that this individual has been looking at hard numbers and has expressed an opinion based on them. Please do not leave them in the dark. Explain why you went against their recommendation.
I think that tells you exactly what you need to know about a data scientist, especially your first one. You need to find someone with backbone. This person needs to be able to express an opinion. When you go looking for your first data scientist, keep that in mind.
What Skills Should My First Data Scientist Have?
You may be tempted to throw buzzwords into the job description. Resist that urge. I know you want to find somebody who is technically competent, but buzzwords don’t help. Good data scientists already know that you need someone who is technically capable, and they will be. Besides, do you want someone who knows Hadoop and other distributed technologies, with deep learning expertise, if you are running everything out of a MySQL database and just want to know whether it is a good idea to lower the price of product X?
No. As an absolute minimum, your data scientist needs some coding experience (I would suggest Python, since it is so versatile), some SQL experience, and a background in statistics. Having been the first data scientist or data analyst on a number of different teams, I can tell you that anything else is probably overkill. Seriously, it is straight-up overkill.
I would fill your job description with items that signal to good data scientists that you know what the biggest problems facing your organization are, and that you don’t shy away from the fact that your data isn’t going to be pleasant to work with. You should include keywords like interpolation, outlier identification, data consolidation, merging of disparate data sources, and feature engineering. These signal that you are on the lookout for the right kind of individual: the kind who knows what being the first data person on a team is going to look like. It is also fine to add key technologies, since your candidates will likely be checking whether their current stack of tools fits into your ecosystem. If you want someone who uses R, feel free to say so. Do they need to write their own queries? Throw SQL in there as well. Finally, make sure that this person understands linear regression.
How Do I Interview A Data Scientist?
There are a lot of schools of thought on this. My personal belief is that your data scientist should be able to answer questions that you have about a business problem. So ask about a business problem, and make sure that there are only two possible courses of action you can take. Tailor the question so that it is realistic for your needs. Then ask them to take a second to imagine that they have done the analysis and that the result points to a clear decision, and have them explain to you why this hypothetical analysis clearly indicates what you should do. If they can’t do this with a made-up analysis in their head that clearly points in one direction, how well will they be able to communicate their actual results to you when the data isn’t so cut and dried? Even better, come into the interview with your own analysis: a single visualization that, in your opinion, says something incredibly important about your business. This is, of course, tricky. You need to make it easy enough that people from diverse backgrounds can figure out what is going on with your data, but hard enough that the right answer isn’t obvious. Finding that balance is complicated by the fact that you are already so close to your data and your business that you probably can’t see where the line is.
The next thing you want to know is whether they are what you need them to be in terms of data science. There are, after all, many different types of data scientists. I find it useful to distinguish between a few of them: the New Grads, Plumbers, type A, type B, and Unicorns. You should be asking your candidates questions to ascertain which type of data scientist they are.
Avoid trying to get a unicorn. You aren’t ready for one. Your first hire should never be a unicorn. A unicorn is the combination of the other four types of data scientists, and that is not what you need at all. You would think that catching a unicorn would be a wondrous thing. But answer this: what are you going to do with the unicorn once you have it? How will you hold on to the unicorn? How will you keep the unicorn happy? Unicorns make terrible first hires.
Okay, what is a new grad? A new grad isn’t necessarily a fresh-faced graduate. It could be a grizzled veteran of several companies. The new grad is the data scientist who is obsessed with the latest algorithms. They’ll tell you why you should use LightGBM over XGBoost (both are gradient-boosted tree ensembles). They think that the algorithm is of the utmost importance, and they don’t spend enough time engineering features, cleaning data, and doing what needs to be done. This person will be disappointing as a first hire, because they will take you down many rabbit holes.
A plumber might actually be the type of data scientist you need. However, they shouldn’t be your first hire. At this point, you don’t know whether you need someone who specializes in building data pipelines. If your data is in that rough a shape, you need something other than a data scientist. You have large structural problems in your business, and you should be investing in IT personnel and IT infrastructure. A good type A or type B data scientist will let you know when it is time to hire a data plumber.
Type A stands for Analyst. These guys know how to analyze data and build analytics. They are more akin to your basic, run-of-the-mill statistician. I like type A data scientists. It’s the type of data scientist I would have classified myself as for most of my professional life, and to a certain extent I still do. You can think of these individuals as the chief explainers of your business. They are so confident with data that you could ask them about anything, and as long as they have access to your database, they will get that answer for you. You want these guys around, obviously. If you can’t find a type B data scientist, you could make a far worse decision than hiring one of these as your first hire.
What you really want to see, though, is a type B data scientist walking through your door. Type B stands for Builder. These data scientists lead by building things, such as a dashboard that explains a model to you. They give you tools to find answers for yourself. What you want is a data scientist who gives the rest of the organization the tools to answer complicated questions on their own. They make everyone in the organization better by giving them access to the models they build. Aside from the unicorn, these data scientists are the rarest kind to find.
In my next blog post, I’ll give you specific questions to ask to determine what type of candidate you have sitting in front of you.