Today, data science is one of the most exciting occupations, and data science solutions and consulting companies keep multiplying. However, the field is a black box to most people, and it is not as predictable as it seems. So, what is data science consulting? Four years ago, there were almost 3 million data scientists in the world. Some people say there is still a shortage of data scientists, while others believe there are too many of them.
Data science is a relatively new career path, and getting started can be challenging. The field is continuously evolving, which makes it difficult for companies to develop an efficient process for identifying and recruiting data science talent. Businesses write unclear, inaccurate job descriptions and attract candidates who pose as data scientists even though they only work with data in conventional ways. These issues plague the field and cause a massive waste of resources.
Online data science crash courses and boot camps add to the problem by producing hundreds of so-called data scientists without proper evaluation or scientific background.
Another interesting thing about data science is that it is not an exact science. Data scientists work with real-world data and datasets and make choices based on that information. So, unlike practitioners of an exact science, data scientists need to think creatively and have in-depth business knowledge. This prompts the following question: if the data science process starts with understanding the business requirement, can someone become a data scientist at 12 years old? Given the hands-on experience and years of formal education required to shape a well-rounded data scientist, we would say that it is challenging.
High salaries and a surge in demand for candidates produce many fake data scientists who do more harm than good and fail to attain the primary goal of data science: creating computational business value from data. Whether you are looking to outsource data scientists or hire a data science consulting firm, you will save time and money if you can quickly identify the right person for your project.
Who Are Fake Data Scientists?
We describe fake data scientists as button-pushers. And here is why:
- They usually do not have a technical degree. As a result, they often neither think mathematically nor offer custom solutions to real-world business problems. They also probably have not had any formal training on the subject.
- They can explain and execute pre-made functions but have little or no knowledge of what is happening under the hood.
- They can answer basic questions about supervised vs. reinforcement learning that will make you believe that they are experts on the subject matter. However, they cannot explain when to use logistic regression rather than a decision tree for a classification problem.
- Finally, they understand neither the problem they are solving nor the data required for it. So, instead of letting the data inform their conclusions, fake data scientists force the data and the issue at hand to fit their initial assumptions.
- Fake data scientists focus on finding the best visualization methods while ignoring statistical methods and data preparation. This makes them unable to anticipate problems that can occur during or after the data collection stage.
In short, fake data scientists are scared of actual hands-on projects that require mathematical and analytical skills. They love talking about tools: they debate the advantages of R over Python, laugh at Excel, brag about their SQL skills, and so on. They also do a lot of unsupervised learning.
Real data scientists are concerned with how their model will behave in practice, while fake data scientists are obsessed with yet another neural network.
With so many data science consulting companies emerging in the market, everyone claims to be a professional in big data, deep learning, and artificial intelligence. The real test is to ask them to code an actual algorithm, run it live, and see if it works.
Keep reading and master the skill of recognizing a real data scientist. This knowledge will save you time and money. It will help you find professionals to optimize actual business processes.
Data Scientist vs. Data Engineer vs. Data Analyst
The confusion about different types of data professionals is overwhelming. We’ve all heard terms like data analyst, data engineer, or data scientist. Although these roles are related, they are not the same. Many common definitions are too narrow for the actual role, even if they hold true at certain companies.
For example, some define a data analyst as a person who focuses on producing reports based on data queried using SQL and Excel. On the other hand, some tend to think of data scientists as software engineers. This premise may lead to the wrong conclusion that a data analyst position is not suitable for someone who aims to become a data scientist.
These roles overlap in many areas, so we have tried to keep our high-level definitions as distinct as possible:
- Data scientists possess a variety of interdisciplinary skills (math, technology, and business) that allow them to produce insights and predictions from data.
- Data engineers focus on making data accessible and secure by building data pipelines that organize enormous amounts of data and move it from point A to point B. Data engineers may not possess the same skills as data scientists, but they work towards the same goal: they give logic and meaning to a large amount of data.
- Data analysts focus on using a statistical approach to collect and uncover data insights that can help make informed business decisions. Some steps in the data analytics process are data mining, statistical analysis, and data presentation.
What Do Real Data Scientists Do?
Now that you know the differences between various data professionals, it is time to dig deeper into the role of data scientists. There is no specific or widely accepted definition, but we will try to summarize the role in a few words.
Many believe that data scientists are magicians who sit behind their screens and develop new miraculous machine learning algorithms from scratch. No doubt, data scientists should know how these algorithms work and they should be able to create them from scratch. However, in most cases, it's enough to just use existing libraries that are widely available online. The main focus of data scientists is not on inventing new algorithms but on solving business problems using data. This is how they do it:
- Identify a business problem. For example, one of our clients was looking to save time and cut operating costs by reducing the effort needed for video processing.
- Collect data based on the business problem.
- Test various approaches and techniques to solve the problem before choosing the best ones. Use information science, mathematics, computer science, statistics, and machine learning methods to attain the desired goal.
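The last step, testing several approaches before choosing one, can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the data is synthetic, the two candidate models (a mean baseline and a least-squares line) are deliberately simple, and all names are hypothetical. The point is that every candidate is fitted on one part of the data and judged on another part it has not seen.

```python
import random

# Synthetic data (an assumption for illustration): y depends linearly on x plus noise.
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(200)]
data = [(x, 2.0 * x + 1.0 + rng.gauss(0, 1)) for x in xs]
train, valid = data[:150], data[150:]  # fit on one part, judge on the other

def fit_mean(points):
    """Baseline: always predict the average target seen in training."""
    mean_y = sum(y for _, y in points) / len(points)
    return lambda x: mean_y

def fit_line(points):
    """Least-squares straight line through the training points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

def mean_absolute_error(model, points):
    """Average absolute difference between predictions and true values."""
    return sum(abs(model(x) - y) for x, y in points) / len(points)

candidates = {"mean baseline": fit_mean, "least-squares line": fit_line}
scores = {
    name: mean_absolute_error(fit(train), valid)
    for name, fit in candidates.items()
}
best = min(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: validation MAE = {score:.2f}")
print("best candidate:", best)
```

Keeping a trivial baseline among the candidates is a design choice worth asking about in an interview: a model that cannot beat it is not creating value.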
Now that you know what data scientists do, you are ready to start interviewing your candidates to find the right one.
How to Interview a Data Scientist?
Neural networks are built on algebra, probability theory, and the theory of algorithms. Hence, questions related to these disciplines will indicate whether a data scientist has a formal education and understands what’s happening under the hood.
Some questions you could ask include:
- What is the difference between a probability density function and a probability mass function? What other probability distribution functions exist?
- What are the standard probability distributions, and what is the most basic distribution function for continuous random variables?
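To make the first question concrete, here is a minimal Python sketch of the distinction, using only the standard library. The binomial and normal distributions are our own choice of examples, and the function names are illustrative. A probability mass function assigns a probability to each discrete outcome, so its values sum to 1. A probability density function describes a continuous variable, so a single value can exceed 1 and only the area under the curve is a probability.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for a binomial variable: k successes in n trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal variable at x (not a probability by itself)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def integrate(f, a, b, steps=10_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width

# PMF values are probabilities and sum to 1 over all possible outcomes.
pmf_total = sum(binomial_pmf(k, 10, 0.3) for k in range(11))

# A density can exceed 1 when the distribution is narrow...
narrow_density = normal_pdf(0.0, sigma=0.1)

# ...but the area under it (here over ten standard deviations each way) is about 1.
pdf_area = integrate(lambda x: normal_pdf(x, sigma=0.1), -1.0, 1.0)

print(f"sum of PMF values:   {pmf_total:.6f}")
print(f"density at the mean: {narrow_density:.3f}")
print(f"area under the PDF:  {pdf_area:.6f}")
```

A candidate who understands the difference should be able to explain why the second printed number is larger than 1 without that being a contradiction.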
As we’ve mentioned before, a data scientist needs to solve business problems using data.
Here are some questions that will help you determine whether the candidate knows how to define a business problem and choose the right approach to solve it:
- What data does the data scientist base decisions on? How did he or she choose this data? Is it new data?
The dataset has to be prepared before training machine learning models. Hence, the outcome will depend on the choice of the correct data.
- How does a data scientist get to know the client's business domain? What information does he or she gather about the industry?
- What metrics and controls does a data scientist use to measure the results?
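As an illustration of the metrics question, the sketch below computes accuracy, precision, and recall for a binary classifier in plain Python. The labels and predictions are made up for the example, and the function names are our own. It shows why a single metric is rarely enough: on imbalanced data, a model that never predicts the positive class can reach the same accuracy as a model that actually finds positives.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, and recall; undefined ratios are reported as 0.0."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Made-up, imbalanced labels: 2 positives out of 10 objects.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_negative = [0] * 10                       # never predicts the positive class
some_model = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]      # finds one positive, one false alarm

lazy_metrics = classification_metrics(y_true, always_negative)
model_metrics = classification_metrics(y_true, some_model)

print("always negative:", lazy_metrics)
print("some model:     ", model_metrics)
```

Both predictors have the same accuracy here, yet only the second one has non-zero precision and recall. Which metric matters most depends on the business problem, which is exactly what the candidate should be able to argue.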
Many believe that machine learning starts at the push of a button. However, there is a long preparation process that precedes it. Ask a potential data scientist:
- What steps does he or she take before training a model?
The actual steps include building a numeric table of objects and their attributes, visualizing the data, and preparing the training set so that the classifier can be trained correctly.
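A rough Python sketch of such preparation is shown below. It is not a complete pipeline: the records, attribute names, and encoding are hypothetical, visualization is left out, and the specific choices (mean imputation, min-max scaling, a two-thirds split) are assumptions made for illustration. What it does show is the numeric table of objects (rows) and attributes (columns) taking shape before any model is trained.

```python
import random

# Hypothetical raw records: each dict is one object, each key one attribute.
raw = [
    {"age": 34, "plan": "basic", "monthly_spend": 20.0, "churned": 0},
    {"age": 51, "plan": "premium", "monthly_spend": None, "churned": 1},
    {"age": 27, "plan": "basic", "monthly_spend": 15.5, "churned": 0},
    {"age": 45, "plan": "premium", "monthly_spend": 80.0, "churned": 1},
    {"age": 39, "plan": "basic", "monthly_spend": 22.5, "churned": 0},
    {"age": 62, "plan": "premium", "monthly_spend": 95.0, "churned": 1},
]

# 1. Build a numeric table: one row per object, one column per attribute.
plan_codes = {"basic": 0.0, "premium": 1.0}  # encode the categorical attribute
table = [[float(r["age"]), plan_codes[r["plan"]], r["monthly_spend"]] for r in raw]
labels = [r["churned"] for r in raw]

# 2. Shuffle and hold out some rows before computing any statistics,
#    so the held-out rows do not leak into the preparation.
indices = list(range(len(table)))
random.Random(0).shuffle(indices)
cut = len(indices) * 2 // 3
train_rows = [table[i] for i in indices[:cut]]
test_rows = [table[i] for i in indices[cut:]]
train_labels = [labels[i] for i in indices[:cut]]
test_labels = [labels[i] for i in indices[cut:]]

# 3. Fill missing values with the column mean of the training rows.
def column_means(rows):
    means = []
    for column in zip(*rows):
        known = [v for v in column if v is not None]
        means.append(sum(known) / len(known))
    return means

def fill_missing(rows, means):
    return [[m if v is None else v for v, m in zip(row, means)] for row in rows]

means = column_means(train_rows)
train_rows = fill_missing(train_rows, means)
test_rows = fill_missing(test_rows, means)

# 4. Rescale every column using the minimum and maximum of the training rows.
def fit_min_max(rows):
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]

def apply_min_max(rows, lows, highs):
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

lows, highs = fit_min_max(train_rows)
train_scaled = apply_min_max(train_rows, lows, highs)
test_scaled = apply_min_max(test_rows, lows, highs)

print("training rows:", len(train_scaled), "held-out rows:", len(test_scaled))
```

Splitting before imputing and scaling is a deliberate choice in this sketch: statistics computed on all rows would let information from the held-out data influence training.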
The advance of big data shows no signs of slowing down, and the competition for top data science talent is fierce. It’s getting more difficult to find people with the right combination of scientific background, education, business knowledge, and analytical skills.
Knowing statistics, math, and Python is not enough to be a successful data scientist. Understanding the business, diving into the domain, formulating a problem, and proposing solutions form a craft that not many data scientists master.
Valuable data insights and data analysis can drive revolutionary change within an organization and illuminate its way toward its business goals. Data scientists can drive the creation of better products, solutions, and paradigms, and support business leaders in making crucial data-driven decisions.