Interviewing a data scientist is an experience like no other. Over the past year, we’ve interviewed more than 100 data scientists, and most, if not all, of them are brilliant. After all, they are data scientists and have spent many years mastering their craft. The purpose of this post is to assist technology leaders who are considering hiring a data scientist or a data science team. There are five items to consider:
Leadership: Before getting into the required skill set, you first need to find a leader, and leadership is very hard to identify in an interview. The data science lead, in most cases, will oversee all of the ML models being built at your organization. He/she should have the drive to document and summarize, in non-technical terms, each phase of the model building process. The goal is to load balance model building across your data science team, and this person must have the business acumen to do just that.
Project Portfolio: In addition to leadership skills, the data scientist you hire should have a good number of models that he/she has already built. Those projects should be outlined in the resume and understood by the interviewer, so that it is clear where the candidate’s strengths lie. For example, one data scientist might lean heavily towards predictive analytics, with a strong background in statistical engineering and perhaps a master’s in computer science. Another might lean towards NLP and sentiment analysis. Scientists love to work to their strengths, so it doesn’t make sense to hire someone for a sentiment analysis project when he/she is more adept at building predictive models. On the other hand, a diverse project portfolio can itself be a good reason to hire, because it shows the candidate can handle varied work.
Technical Competencies or Skill Sets: Beyond the breadth and diversity of the project portfolio, you’ll need to assess the candidate’s skill set: industry experience, creativity in modeling, platforms, programming and visualization.
Start with your industry. Does the data scientist (DS) have experience in it? This is an essential factor because you will need subject matter experts (SMEs) to gain insight from the datasets you are working on, and from any datasets the DS or the SME introduces when building highly accurate models. We look at industry fit just after education and projects. You might also look at the candidate’s “delivered solutions,” which you can draw out by asking questions about the projects. In any case, highlight these: quant methods, propensity models, time-to-event models, time-series forecasting, NLP, sentiment analysis, regression, classification, clustering and the methods used within them, including trees, KNNs, CNNs, DNNs, etc. There are hundreds of types of models; these are just a few.
Creativity is particularly helpful when a data scientist is building out models. One sign of it is experience with ensembles, which combine several models into one prediction, so make sure the interviewee has used them in the past.
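For interviewers who have not worked with ensembles, here is a minimal sketch of the simplest kind, a majority vote. It is only an illustration: the three threshold rules and the `majority_vote` helper are hypothetical stand-ins for real trained models, not a method taken from any particular library.

```python
from collections import Counter
from typing import Callable, Sequence

# A "model" here is any function that maps a score to a 0/1 label.
Classifier = Callable[[float], int]


def majority_vote(models: Sequence[Classifier], x: float) -> int:
    """Return the label that most of the models predict for input x."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


# Three hypothetical weak models, each a simple threshold rule.
# In practice these would be separately trained classifiers.
models = [
    lambda x: int(x > 0.3),
    lambda x: int(x > 0.5),
    lambda x: int(x > 0.7),
]

# The ensemble's answer for a few example inputs.
predictions = [majority_vote(models, x) for x in (0.2, 0.4, 0.6, 0.8)]
print(predictions)
```

A candidate with real ensemble experience should be able to explain why combining models like this can outperform any single one, and when it does not.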
Platforms are essential, although most resumes will list AWS, GCP and Azure. Look for other platforms for building transportable models, such as Splice Machine, Databricks or even Dataiku. From a programming perspective, Python is a must, and Julia is gaining attention as an alternative. A resume should include all of the following: NoSQL, SQL, Spark, JavaScript, Elasticsearch and Kubernetes. You can include Docker here as well. Visualization is key to the overall user experience of the model building process, and good viz skills are critical to a comprehensive summary of the completed model or case study. Flask, Django, Matplotlib, Dash, Looker, Tableau, Seaborn and Plotly are pretty much the norm. Before you hire, ask to see some visualizations and decide for yourself.
Business Acumen: The single most crucial factor in achieving success is a DS with good business acumen, one who can pin down a company’s problem statement by first identifying the problem and then asking concise, probing questions about the outcome variable you want to model. Clarity about the outcome variable lets the data scientist do the job more efficiently. Suppose you are about to use machine learning to analyze each touchpoint of the customer journey, churn rate, or segmentation. In that case, you’ll want a data scientist who can focus on the changes that move those metrics most and who can introduce a “next best action” dataset to launch more effective campaigns.
Academic Diversity: Finally, a well-rounded team needs a diverse set of backgrounds, not only in the traditional sense of the word but also academically. A competent team should include, but not be limited to, a balance of PhDs from the academic world who can deliver deeper insights, people with experience in solving problem statements, and data science interns who, in my experience, are an overachieving asset on any team. There is a strong current movement among PhDs who want to transfer their knowledge from academia into business to solve real-world problems at large technology companies and startups alike. Further, these academic types can be of genuine value to your organization, since many of them belong to elite organizations that can introduce you to thousands of ML practitioners, researchers, and data science executives. Top-notch data scientists are scarce these days. Having an academic on your team who is also an influencer, who is well-read and widely cited, and who might even have a blog can benefit your organization. Some academics might also hold patents. Having such a person on your team will not necessarily make him/her indispensable, but he/she can bolster and mentor your data science team. Good luck in your efforts and your research.