Whenever I visit colleges and universities in India for talks and lectures, I come across a common question from students enrolled in undergraduate engineering and science programs. Almost universally, the question is, “Can I get a data scientist job after college?” While the answer is typically “No”, there are clear reasons why this is the case. In this blog post, I wish to explain five key reasons why new graduates in data science and analytics have difficulty finding jobs in these fields.
Data Science Roles and “Junior” Data Scientists
Data science roles have been around for years, but their scope and definition have changed considerably over the last few. Given the hype around big data and data science, a number of data analysts who perform unsophisticated analytical tasks, such as reporting, dashboard generation, and simple data analysis, have been branding themselves as data scientists. True to this trend, some companies that hire talent fresh out of college for data scientist positions hand them simple analytical work, while following a learning plan to skill these data scientists up so that they can eventually take on advanced analytics projects.
Generally, these entry-level data science roles are not intended to be as rigorous as the advanced work that an experienced data scientist might do – and we firmly believe that data scientists should come with sufficient experience in the application of statistics, computer science and machine learning to problem-solving. As we will see below, there are other skills required as well.
There are also firms that offer a ‘junior’ data scientist position to freshly graduated students. Broadly, people in the data science community tend to agree that such a role exists merely to attract young talent, because the data scientist role requires advanced skills that are rarely taught in colleges.
Why do companies not hire new graduates for data science roles?
Even as colleges and universities start to roll out new courses and programs related to data science, there are still several gaps between the topics taught in class and the practical demands of industry. I’d like to highlight five specific gaps in this context:
1) Communication skills
In recent years, colleges offering data science or analytics degrees and courses have tended to focus heavily on technical topics such as statistics, linear algebra, optimization, machine learning and programming frameworks. They have generally failed to develop the communication skills of aspiring data scientists, even though these are key skills for data science roles in industry. In such roles, one expectation is to be able to deliver data-centric presentations and express complex relationships and interactions in data, such as hypotheses, descriptions of models and mathematical measures, in language that is business-relevant. In this sense, data scientists play an important translation role in organizations, essentially bridging the insight divide between the technical, strategic and business teams.
Also, in industry, data scientists are generally treated as consultants when requirements and possible solutions are being discussed. They tend to work closely with the customer or business user, where their communication skills and other soft skills are put to the test every single day. Even if a data scientist is extremely competent technically, they will have to explain the interpretation of the results in simple terms to a business user who, at times, might find it difficult to understand words that are even remotely technical. For this reason, college curricula and students of data science and analytics should treat communication skills as a key success factor when skilling up to become data scientists.
2) Domain/Business Knowledge
Due to the very nature of the role, data scientists are expected to possess broad functional knowledge across different industries, in addition to good communication skills. One fine morning, the data scientist might work for a company like Decathlon on sports analytics and, a few days later, for a company such as Netflix on movie recommendations. Even within the same client organization, the data scientist might be working on a demand forecasting problem for the sales team on one day and predicting cash flows for the finance team on another. While it is possible to learn this domain or business knowledge on the job, companies generally prefer data scientists who have a head start when it comes to a basic understanding of different functions across industries.
Colleges and universities focus heavily on the theoretical and technical front of data science but fall short on the practical and applied side. Although some universities are bucking this trend, most students I interact with are very good at understanding algorithms such as K-means clustering, linear regression, random forests and deep learning, but fail to understand why and where to use a specific algorithm. On the flip side, candidates with business experience or higher business degrees might fare better in this respect, but generally shy away from programming and other technical aspects. This leaves a real lacuna in the spectrum of data science talent: a lack of data scientists who understand when and how to apply a specific analytical technique and are, at the same time, able to discuss it in a business-relevant manner.
3) Be a Jack-of-all-Trades
In addition to a good grasp of functional knowledge across industries, data scientists are expected to be a jack of all trades when it comes to handling data science projects. They are expected to come to the office with a can-do mindset every day. In a traditional software product development scenario, there may be several players with distinct responsibilities: a business analyst for gathering customer requirements, a software tester, an experienced developer for reviewing and optimizing code, a project manager, and a group of developers. In data science projects, however, data scientists are expected to wear multiple hats and handle the projects with the help of peers and, in some cases, project managers and business analysts. This requires that they be aware of the breadth and depth of skill in different functions, tools, and technologies, and be able to straddle the professional worldviews of these different functions. Even a highly motivated, freshly graduated data scientist might take at least a year or two to grow into such a role, and even then, the role is more likely to be performed effectively by a more experienced data scientist.
4) Understanding the Data Science Process
College curricula focus heavily on imparting knowledge of tools such as Weka, SPSS and SAS in the context of data mining, data analysis, text analytics and statistical inference. While this is not a bad thing, they don’t emphasize the value of following a good process or understanding the state of industry practice. Naturally, this means that data scientists trained by academia sometimes don’t understand the challenges of the data science processes followed in industry. While such tools are still used in companies for different use cases, most companies have defined their own data science processes and pipelines. This is in addition to traditional data mining processes such as CRISP-DM and SEMMA, and newer data science processes such as ASUM-DM by IBM and TDSP by Microsoft. At TheDataTeam, we follow a data science process through the Data Science Management Framework (DSMF), which focusses on the operationalization of analytics and insights. Technologies that enable such operationalization, such as Docker containers and Kubernetes, are not taught in colleges, and neither are aspects of deployment.
5) Get the Basics Right
There are no shortcuts to becoming a data scientist. One must be strong in the basic concepts before moving on to advanced topics. Unfortunately, a large number of analytics-focused programs at the Master’s level begin their curricula with programming tasks centered around machine learning algorithms in languages like Python or R. Often they include the analysis of a random dataset using a data mining tool or the libraries and frameworks that have become popular in the data science community. While it is difficult to cover all aspects of data science in a single generic degree program, introducing students to high-level algorithms and tools without the basics is at best useless and at worst detrimental.
Unless they can motivate students of data science to explain the “how” and the “why” of different data science algorithms, tools and methods, universities will produce data science graduates who are unprepared for the professional challenges that data scientists face in the real world. To answer these deeper questions, data scientists need to understand the essential ingredients of all data science algorithms: statistics, calculus, linear algebra, optimization and programmatic representations of these concepts. The big data analytics and data science space is increasingly being disrupted by automation. For data-driven automation to happen effectively, data scientists must be able to get back to the basics and reason about the nuts and bolts of machine learning and statistics. This demands a clear understanding of the basic concepts and heuristics surrounding data science.
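As a small illustration of what “getting the basics right” can look like in practice, here is a minimal sketch in plain Python (no libraries) that fits a straight line to a handful of points in two ways: once using the closed-form least-squares formulas from statistics, and once using gradient descent from optimization. The data points, function names, learning rate and step count are all made up for illustration; the point is only that a student who understands the basics should be able to explain why both routes arrive at the same answer.

```python
def fit_closed_form(xs, ys):
    """Least-squares slope and intercept via the textbook formulas."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = (sum of cross-deviations) / (sum of squared deviations of x).
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def fit_gradient_descent(xs, ys, lr=0.01, steps=20000):
    """Minimise mean squared error by repeatedly stepping down its gradient."""
    n = len(xs)
    slope, intercept = 0.0, 0.0
    for _ in range(steps):
        # Prediction error for every point under the current line.
        errors = [slope * x + intercept - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error.
        grad_slope = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_intercept = 2 / n * sum(errors)
        slope -= lr * grad_slope
        intercept -= lr * grad_intercept
    return slope, intercept


# Hypothetical toy data, roughly following y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

closed = fit_closed_form(xs, ys)
descended = fit_gradient_descent(xs, ys)
print("closed form:     ", closed)
print("gradient descent:", descended)
```

A library call would return the same line in one step, but being able to write and reason about both versions is the kind of “how” and “why” understanding discussed above.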
In recent years, the data scientist job market has become highly competitive, and definitions of the data scientist’s role are constantly evolving. With candidates of varying experience levels flooding the market, it is now difficult for recent graduates to compete with experienced professionals who possess strong functional knowledge. In this post, we’ve discussed five key elements of data scientist success: communication skills, domain knowledge, flexibility, process orientation, and, last but not least, strong foundations in the basics of data science. In my next post, I’ll address how college graduates can upskill themselves to reduce the gap between what they have learnt in academia and what’s required for industry data science positions.
About the author:
Balachandran Siddharthan is a Data Scientist @ The Data Team. With a Master’s degree in Big Data Analytics, Bala has more than 7 years of experience in areas such as Data Science, Business Intelligence, and Data Analysis.