
Why do companies care about Data Scientists/Analysts?
Data scientist has already been crowned the best job in America for 2016, yet the definition of the
role and the skill set it requires are in a constant state of flux. Advancements in technology and business demand
drive its evolution in an ever-changing industry. In this article, we take a closer look at the role
of a Data Scientist in 2016.
Dave Holtz writes that the title 'data scientist' is often used as a blanket title to describe a set of
jobs that are drastically different. He attributes this to the fact that the field of data science is still
in its infancy and so is ill-defined. Working in what is broadly described as an
interdisciplinary field, a data scientist extracts knowledge or insights from large
volumes of data in various forms.
The age of big data is upon us, and it's here to stay. With more data being collected than ever
before, extracting value from this data is only going to become more intricate and demanding as
time goes on. The logic behind the big data economy is shaping our personal lives in ways that
we probably can't even conceive or predict; every electronic move that we make produces a
statistic and insight into our life.
As participants in the consumer economy, we are mined for data when we connect to any
website or electronic service, and a data scientist is there to collect, clean and analyse the
data that we provide, and to make predictions from it, using a combination of computer science,
statistical analysis and intricate business knowledge.
The following diagram shows the skillsets required for a Data Scientist. As we can see, this
responsibility is a combination of multiple skillsets and expertise compared to a typical Big Data
Developer or Business Analyst.

Figure 1. Data Scientist Skillsets

What sets a data scientist apart from other, seemingly similar, data jobs?
Rivera and Haverson suggest that, whilst previous data professionals were concerned with
past movements and the interpretation of data, a data scientist tends to be more
mathematically focused, concentrating on providing an insight into future patterns identified
from past and current data. If one takes the two words literally ('science' implying knowledge
gained through systematic study; 'data' being an information set of qualitative or quantitative
variables), a data scientist can be defined as one who systematically studies
the organisation and properties of information.
Notwithstanding the crucial role of statisticians and others who study data analytics, the role of a
data scientist, described by Anjul Bhambari as 'part analyst, part artist', is set to revolutionise the
way that traditional data is analysed and used.

The growing demand for data scientists

The success of business networking site LinkedIn is a prime example of the crucial benefit that
data scientists are bringing to business intelligence. As an enterprise that relies almost solely on
the data transferred by its 380,000,000 users making connections with each other, LinkedIn is
utilising those professionals with the training and curiosity to make discoveries in the world of
big data.
LinkedIn, alongside other large knowledge-industry companies such as Facebook and Google, uses
data scientists to bring structure to large quantities of formless data, to determine the
significance of its value and to find systematic relationships between the variables.
A recent survey of C-suite executives by KPMG found that 99% of respondents thought analysis of
big data was important to their strategy next year. In an age where enterprise data is expected to
exceed 240 exabytes per day by 2020, the need for data scientists with the skills to extract
valuable insights from this data is more important than ever. However, an article by Travis
Wright for VentureBeat suggests that demand for data scientists is very much outstripping
supply and that companies in the United States alone will need to hire between 140,000 and
190,000 data scientists if they are to keep up with the new data economy.
Ironically, there is a great deal of conflicting data on the average salary for a data scientist.
What is clear, however, is that the average salary tends to reflect the
high level of demand for data scientists. Not surprisingly, if employers are asking candidates to be
experienced with data mining algorithms, able to work comprehensively in languages like R and
Python, experienced in working with large databases (SQL or similar), implementing Java
applications and manipulating NoSQL databases (to quote about 10% of a job specification), all
with the ability to communicate all of this to a non-technical audience, an average salary of about
$120,000 doesn't seem too far-fetched.

The role of a Data Scientist


Whilst the role of a data scientist crosses over with more conventional data analysis positions,
there are some stark differences.
A data analyst or architect can extract information from large sets of data, yet they are bound by
the SQL queries and analytics packages used to slice these datasets. Through an advanced
knowledge of machine learning and programming/engineering, data scientists can manipulate
data at will, uncovering deeper insight. They are not bound by these programmes.
Whilst your typical data analyst looks to the past and what's happened, a data scientist must go
beyond this and look to the future. Through the application of advanced statistics and complex data
modelling, they must uncover patterns and make future predictions.
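
The difference between looking back and looking forward can be sketched in a few lines of Python. This is only an illustration, assuming a simple straight-line trend; the monthly sign-up figures and the `fit_line` helper are hypothetical and not taken from any real dataset or library.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical monthly sign-up counts (illustrative values only).
months = [1, 2, 3, 4, 5, 6]
signups = [120, 135, 149, 160, 178, 190]

# Descriptive view (the analyst's question): what has happened so far?
average_so_far = mean(signups)

# Predictive view (the data scientist's question): what is likely next?
slope, intercept = fit_line(months, signups)
forecast_month_7 = slope * 7 + intercept

print(f"Average of past months: {average_so_far:.1f}")
print(f"Forecast for month 7:   {forecast_month_7:.1f}")
```

Real predictive work would use richer models and validate them on held-out data; the point here is only the shift from summarising the past to extrapolating from it.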

The skills required of a Data Scientist

Successful data analytics relies on being able to clean, integrate and transform the data, and
this is the crucial combination of skills all data scientists must possess. By combining a scientific
background with computational and analytical skills, you can put yourself a cut above the rest.
Figure 2 below shows the several areas of focus of a typical data science discipline.

Figure 2. Data Science Focus Areas
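
As a rough sketch of what cleaning, integrating and transforming data can look like in practice, the Python below runs the three steps over a handful of made-up records. The field names, the customer lookup and the helper functions are all hypothetical and chosen only for illustration.

```python
# Hypothetical raw records from two sources (all names and values are illustrative).
raw_orders = [
    {"customer_id": " 1 ", "amount": "19.99"},
    {"customer_id": "2", "amount": "n/a"},  # unusable amount, dropped by clean()
    {"customer_id": "1", "amount": "5.01"},
    {"customer_id": "3", "amount": "42"},
]
customers = {1: "Ada", 2: "Grace", 3: "Linus"}


def clean(rows):
    """Normalise field types and drop rows that cannot be parsed."""
    cleaned = []
    for row in rows:
        try:
            cleaned.append({
                "customer_id": int(row["customer_id"].strip()),
                "amount": float(row["amount"]),
            })
        except ValueError:
            continue  # skip malformed rows rather than guessing a value
    return cleaned


def integrate(orders, names):
    """Attach the customer name from the second source to each order."""
    return [
        {**order, "name": names[order["customer_id"]]}
        for order in orders
        if order["customer_id"] in names
    ]


def transform(orders):
    """Aggregate the orders into total spend per customer."""
    totals = {}
    for order in orders:
        totals[order["name"]] = totals.get(order["name"], 0.0) + order["amount"]
    return totals


spend_per_customer = transform(integrate(clean(raw_orders), customers))
print(spend_per_customer)
```

Whether a malformed row is dropped, repaired or flagged is a judgement call that depends on the data; dropping it is just the simplest choice for a sketch.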

But let's dig deeper into the actual skills required to become a data scientist. Mark van
Rijmenam, CEO at Datafloq, recommends that data scientists possess the following skills:
statistical, mathematical and ethical, as well as a high degree of predictive modelling experience
in order to build the algorithms necessary to ask the right questions and find the right answers.
Ferris Jumah from LinkedIn goes further to neatly group the skills required, despite the huge
array of skills and different job roles a data scientist might perform.
A data scientist must:

Look at data with a mathematical mind-set. Skills such as machine learning,
data mining, data analysis and statistics are crucial. A data scientist will need to interpret
and represent data mathematically.

Use a common language to access, explore and model data. Knowledge of a statistical
programming language will be critical. Languages like R, Python or MATLAB, and a
database querying language like SQL, are some of the most popular skills in demand.
Data extraction, exploration and hypothesis testing are central to the data science
practice.

Develop strong computer science and software engineering backgrounds. This
involves developing a skill set which could include Java, C++ or knowledge of
algorithms and Hadoop. These skills will be used to leverage data to architect systems.
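
To make the hypothesis testing mentioned above a little more concrete, here is a minimal sketch of an exact two-sided permutation test on the difference in means, written in plain Python. The function name and the two samples (page-load times for two site variants) are hypothetical, and the approach only suits small samples because it enumerates every possible split.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_test(group_a, group_b):
    """Two-sided exact permutation test on the difference in means.

    Returns the fraction of all ways to split the pooled values into
    groups of the original sizes whose absolute mean difference is at
    least as large as the one observed.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)
    observed = abs(mean(group_a) - mean(group_b))

    extreme = 0
    splits = 0
    for indices in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in indices)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        # Small tolerance guards against floating-point rounding.
        if diff >= observed - 1e-12:
            extreme += 1
        splits += 1
    return extreme / splits


# Hypothetical page-load times in milliseconds (illustrative values only).
variant_a = [10, 11, 12, 13, 14]
variant_b = [20, 21, 22, 23, 24]

p_value = exact_permutation_test(variant_a, variant_b)
print(f"p-value: {p_value:.4f}")
```

A small p-value suggests the difference between the two groups is unlikely to be due to chance alone; for larger samples a random subset of permutations is normally used instead of full enumeration.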

Tools of a data scientist


Unlike your typical programmer, who may use a standardised set of tools, data scientists tend to
use a wide array of ever-changing tools. This is because the data science landscape is evolving
rapidly, with many new tools still far from maturity. That being said, below we've compiled a
series of popular tools for data scientists, aligned to specific practices:
Data Analysis:
Here, the tools are really just the programming languages a data scientist uses to extract and
analyse data. These are typically Python, R and SQL.
Data Warehousing:
A data scientist may choose to have their own database into which they can extract data for
analysis. MySQL is among the most popular for handling reasonably sized datasets. Moving into the
realms of big data, they would typically turn to tools like Hive or Redshift. You'd also be
surprised how far most data scientists can go with the average .CSV file before it falls over.
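
As a small illustration of how Python, SQL and a humble CSV file fit together, the sketch below loads CSV rows into a database and summarises them with a query. It uses SQLite from the Python standard library as a stand-in for a server such as MySQL, and the table, columns and values are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical CSV content; in practice this would be read from a file on disk.
csv_text = """region,units
north,10
south,4
north,6
south,9
east,3
"""

rows = list(csv.DictReader(io.StringIO(csv_text)))

# An in-memory SQLite database stands in for the data scientist's own database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, units INTEGER)")
conn.executemany(
    "INSERT INTO sales (region, units) VALUES (?, ?)",
    [(row["region"], int(row["units"])) for row in rows],
)

# SQL does the slicing: total units per region, in alphabetical order.
region_totals = conn.execute(
    "SELECT region, SUM(units) FROM sales GROUP BY region ORDER BY region"
).fetchall()
conn.close()

for region, units in region_totals:
    print(region, units)
```

The same pattern carries over to larger systems: only the connection and the SQL dialect change, while the extract, load and query steps stay the same.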
Data Visualisation:
Among the most commonly mentioned tools for data visualisation are D3.js and Tableau. With
D3.js, if you can imagine a data visualisation, a data scientist can achieve it using the library.
Tableau is the most popular data visualisation tool out there at the moment, allowing data to be
compiled from hundreds of inputs and then easily transformed into visualisations.
Machine Learning:
This is perhaps the area most in flux, with new tools emerging daily. The most established and widely
used is perhaps scikit-learn, a machine learning library for Python. Then of course there is
Spark MLlib, which is Apache's own machine learning library for Spark and Hadoop.
