Here are some of the key challenges to analysing big data 1,2:

Data access
The majority of big data is used for commercial purposes to increase profits, provide better
services, or gain competitive advantage. Thus, organisations are hesitant to share their data
with outsiders. Even when organisations allow access to their data, they usually restrict
access to certain portions of the data or impose rate limits on the amount of data that can be
accessed per day or per user. This makes it difficult not only for researchers and non-profit
organisations to obtain data, but also for organisations to integrate their own data with that
of other organisations. However, many countries now promote ‘open data’ portals, where
datasets are made available to the public.

Inconsistent and incomplete data
Even though we are collecting more data than ever before, the overall quality of our data has
not increased. The percentage of incorrect or incomplete data points remains the same. For
example, take an electronic sensor that records an incorrect reading once in every 1,000
readings. If the number of readings increases by a factor of ten to 10,000, the number of
incorrect readings also increases by a factor of ten, from one to ten. Written text on the web
will likewise always include spelling mistakes. As the amount of text posted increases, it may
even contain a higher percentage of mistakes. Therefore, data cleaning becomes an important
task for big data analytics.
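
The sensor arithmetic above can be sketched in a few lines of Python. This is only an illustration under the assumption of a constant error rate of 1 in 1,000; the function name `expected_bad_readings` is hypothetical and not taken from any library.

```python
from fractions import Fraction

# Assumed error rate from the sensor example: one bad reading per 1,000.
ERROR_RATE = Fraction(1, 1000)


def expected_bad_readings(total_readings: int, error_rate: Fraction = ERROR_RATE) -> Fraction:
    """Expected number of incorrect readings at a constant error rate."""
    return total_readings * error_rate


for total in (1_000, 10_000, 100_000):
    bad = expected_bad_readings(total)
    # The share of bad readings stays the same; only the absolute count grows.
    print(f"{total:>7} readings -> {bad} expected incorrect ({float(bad / total):.1%})")
```

The percentage column stays constant while the absolute count grows with the volume, which is why cleaning effort scales with the size of the dataset.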

Heterogeneity of data
Heterogeneity of data refers to how much the data differs across the dataset we are looking
at. This can include differences in data format, number of missing values, level of detail, or
length of time period for which data is available.

Heterogeneity is a particular issue when we bring together data from unconnected sources.
For example, it may be useful to connect population data from government sources with data
from environmental sensors to inform a drinking water management plan for a city. The data
from these different sources will need to be carefully matched to ensure valid analysis
results.
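
As a rough sketch of what such matching can involve, the Python example below joins two small, invented datasets whose district names are written differently. All records, field names, and the helpers `normalise_key` and `match_sources` are hypothetical; real sources would likely need more careful matching rules.

```python
import re

# Hypothetical records; the field names and values are invented for illustration.
population = [
    {"district": "North District", "residents": 52000},
    {"district": "South District", "residents": 31000},
]
sensor_readings = [
    {"area": "north-district", "nitrate_mg_l": 12.5},
    {"area": "SOUTH DISTRICT", "nitrate_mg_l": None},   # missing value
    {"area": "east district", "nitrate_mg_l": 8.1},     # no population record
]


def normalise_key(name: str) -> str:
    """Lower-case a name and collapse hyphens, underscores and spaces into one space."""
    return re.sub(r"[\s_\-]+", " ", name.strip().lower())


def match_sources(population, sensor_readings):
    """Join the two sources on a normalised district name.

    Returns (matched, unmatched): matched rows carry fields from both sources,
    unmatched lists sensor areas that have no population record.
    """
    residents_by_key = {normalise_key(p["district"]): p["residents"] for p in population}
    matched, unmatched = [], []
    for reading in sensor_readings:
        key = normalise_key(reading["area"])
        if key not in residents_by_key:
            unmatched.append(reading["area"])
            continue
        matched.append({
            "district": key,
            "residents": residents_by_key[key],
            "nitrate_mg_l": reading["nitrate_mg_l"],  # may be None; kept explicit
        })
    return matched, unmatched


matched, unmatched = match_sources(population, sensor_readings)
print(matched)
print(unmatched)
```

Keeping the unmatched areas and the missing values visible, rather than silently dropping them, makes it easier to judge how far the combined data can be trusted.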

Data privacy and protection
More and more data about personal interests, behaviours, and attitudes is being stored. While
consumers often trade their personal data for a product customised to their liking, their
privacy needs to be protected by clear policies. In addition, the results of analysing personal
data, perhaps from multiple sources, may be more sensitive than the individual parts taken
separately. As the saying commonly attributed to Aristotle goes: ‘The whole is greater than
the sum of its parts’.3

Data privacy and protection are not important for individuals alone. Organisations also need
to have their data and intellectual property protected by policies and laws.
