
1. Innovative Approaches To Improve Your Data Science

2. 8 Information Science Thoughts That Can Inspire Your Companions

3. Everyone Knew About Data Science

4. Data Science Ideas That Can Impress Your Friends

Big Data can be intimidating! If you are new to Big Data, please read ‘What is Big Data’ and
‘Who coined Big Data’ to get started. With the basic concepts under your belt, let’s focus on
some key terms to impress your date or boss or family. By the way, I am putting together a
much more exhaustive list of Big Data terms (almost 100) and if you are interested in that,
please leave a comment below with a note ‘I want more of Big Data’ or something like that.

So let’s get going with this shorter list. Also, check out the infographic image midway through
this article and feel free to download and keep it in your pocket.

Algorithm: A mathematical formula or statistical process used to perform an analysis of data.

How is ‘algorithm’ related to Big Data? Even though algorithm is a generic term, Big Data
analytics made the term contemporary and more popular. (Bonus: Here’s a pickup line for your
date: ‘You show me your algorithms and I’ll show you mine.’ OK, OK, I’ll stop! No more
cheesy jokes.)

Analytics: Most likely, your credit card company sent you a year-end statement with all your
transactions for the entire year. What if you dug into it to see what percentage you spent on
food, clothing, entertainment etc.? You are doing ‘analytics’. You are drawing insights from
your raw data which can help you make decisions regarding spending for the upcoming year. What
if you did the same exercise on tweets or Facebook posts made by your friends/network or your
own company? Now we are talking Big Data analytics. It is about making inferences and
story-telling with large sets of data. There are 3 different types of analytics, so let’s
discuss them while we are on this topic.

Descriptive Analytics: If you just told me that you spent 25% on food, 35% on clothing, 20%
on entertainment and the rest on miscellaneous items last year using your credit card, that is
descriptive analytics. Of course, you can go into a lot more detail as well.
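
As a rough illustration of the credit card example, the sketch below sums spending per
category and reports each category's share. The transaction list and the function name
spending_shares are made up for this example; a real statement would have many more rows.

```python
from collections import defaultdict

# Hypothetical transactions: (category, amount in dollars).
transactions = [
    ("food", 250.0),
    ("clothing", 350.0),
    ("entertainment", 200.0),
    ("misc", 200.0),
]

def spending_shares(rows):
    """Return each category's share of total spending, as a percentage."""
    totals = defaultdict(float)
    for category, amount in rows:
        totals[category] += amount
    grand_total = sum(totals.values())
    return {category: 100.0 * amount / grand_total for category, amount in totals.items()}

shares = spending_shares(transactions)
for category, pct in shares.items():
    print(f"{category}: {pct:.0f}%")
```

With these made-up numbers the shares match the split quoted above: 25% food, 35% clothing,
20% entertainment and 20% miscellaneous.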

Predictive Analytics: If you analyzed your credit card history for the past 5 years and the
split is somewhat consistent, you can forecast with high probability that next year will be
similar to past years. The fine print here is that this is not about ‘predicting the future’
but rather about ‘forecasting with probabilities’ what might happen. In Big Data predictive
analytics, data scientists may use advanced techniques like data mining, machine learning, and
advanced statistical processes (we’ll discuss all these terms later) to forecast the weather,
the economy etc.
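
To make ‘forecasting with probabilities’ a little more concrete, here is a deliberately
simple sketch: it takes five years of a made-up food share and forecasts next year as the
historical mean, with a band of two standard deviations around it. The data, the function
name forecast_share and the two-sigma band are all assumptions for illustration; real
predictive analytics would use far richer models.

```python
from statistics import mean, stdev

# Hypothetical food share of total spending (%) over the past five years.
food_share_history = [24.0, 26.0, 25.0, 25.5, 24.5]

def forecast_share(history):
    """Forecast next year's share as the historical mean, with a rough +/- 2 sigma band."""
    center = mean(history)
    spread = 2 * stdev(history)
    return center, center - spread, center + spread

expected, low, high = forecast_share(food_share_history)
print(f"Food share next year: about {expected:.1f}% (roughly {low:.1f}% to {high:.1f}%)")
```

The point of the band is the ‘fine print’ above: the output is a likely range, not a promise.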

Prescriptive Analytics: Still using the credit card transactions example, you may want to find
out which spending to target (i.e. food, entertainment, clothing etc.) to make a huge impact on
your overall spending. Prescriptive analytics builds on predictive analytics by including
‘actions’ (i.e. reduce food or clothing or entertainment) and analyzing the resulting outcomes to
‘prescribe’ the best category to target to reduce your overall spend. You can extend this to big
data and imagine how executives can make data-driven decisions by looking at the impacts of
various actions in front of them.
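
One way to picture the ‘actions and outcomes’ idea is to evaluate each candidate action
against a forecast and pick the one with the biggest effect. In the sketch below the forecast
amounts, the feasible cut per category and the function name best_action are all hypothetical.

```python
# Hypothetical forecast of next year's spending per category, in dollars.
forecast = {"food": 2500.0, "clothing": 3500.0, "entertainment": 2000.0}

# Hypothetical candidate actions: the fraction of each category that could realistically be cut.
feasible_cut = {"food": 0.05, "clothing": 0.20, "entertainment": 0.30}

def best_action(forecast, feasible_cut):
    """Evaluate each candidate action and return the one with the largest saving."""
    savings = {cat: forecast[cat] * feasible_cut[cat] for cat in forecast}
    target = max(savings, key=savings.get)
    return target, savings

target, savings = best_action(forecast, feasible_cut)
print(f"Target {target}: saves about ${savings[target]:.0f}")
```

Note that the answer depends on both the size of the category and how much of it can
realistically be cut, which is why a prediction alone is not enough to prescribe an action.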

Batch processing: Even though Batch data processing has been around since mainframe days,
Batch processing gained additional significance with Big Data given the large datasets that it
deals with. Batch data processing is an efficient way of processing high volumes of data where
a group of transactions is collected over a period of time. Hadoop, which I’ll describe later, is
focused on batch data processing.
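
The core idea of collecting records and then processing them a group at a time can be
sketched in a few lines. The helper name batches, the batch size and the sample amounts are
assumptions for illustration; this is not how Hadoop itself is implemented.

```python
from itertools import islice

def batches(records, batch_size):
    """Yield successive lists of up to batch_size records."""
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch

# Hypothetical transaction amounts collected over a period of time.
collected = [12.5, 40.0, 7.25, 19.0, 3.0, 88.0, 5.5]

# Process each batch as a unit (here, just total it).
batch_totals = [sum(batch) for batch in batches(collected, batch_size=3)]
print(batch_totals)
```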

Cassandra: A beautiful name for a popular open source database management system managed
by the Apache Software Foundation. Apache can be credited with many big data technologies,
and Cassandra was designed to handle large volumes of data across distributed servers.

Cloud computing: Well, cloud computing has become ubiquitous so it may not be needed here,
but I included it just for completeness’ sake. It’s essentially software and/or data hosted
and running on remote servers and accessible from anywhere on the internet.

Cluster computing: It’s a fancy term for computing using a ‘cluster’ of pooled resources of
multiple servers. Getting more technical, we might be talking about nodes, cluster management
layer, load balancing, and parallel processing etc.

Dark Data: This, in my opinion, was coined to scare the living daylights out of senior
management. Basically, this refers to all the data that is gathered and processed by
enterprises but not used for any meaningful purpose; hence it is ‘dark’ and may never be
analyzed. It could be social network feeds, call center logs, meeting notes and what have you.
There are many estimates that anywhere from 60-90% of all enterprise data may be ‘dark data’,
but who really knows.

Data lake: When I first heard of this, I really thought someone was pulling an April Fools’
joke. But it’s a real term! A data lake is a large repository of enterprise-wide data in raw
format. While we are here, let’s talk about the data warehouse, which is similar in concept in
that it is also a repository for enterprise-wide data, but in a structured format after
cleaning and integrating with other sources. Data warehouses are typically used for
conventional data (but not exclusively). Supposedly, a data lake makes it easy to access
enterprise-wide data, but you really need to know what you are looking for and how to process
it and make intelligent use of it.

Data mining: Data mining is about finding meaningful patterns and deriving insights in large
sets of data using sophisticated pattern recognition techniques. It is closely related to the
term Analytics that we discussed earlier, in that you mine the data to do analytics. To derive
meaningful patterns, data miners use statistics (yup, good old math), machine learning
algorithms, and artificial intelligence.

Data Scientist: Talk about a career that is HOT! It is someone who can make sense of big data
by extracting raw data (did you say from a data lake?), massaging it, and coming up with
insights. Some of the skills required for data scientists are what a superman/woman would
have: analytics, statistics, computer science, creativity, story-telling and an understanding
of business context. No wonder they are so highly paid.

Distributed File System: As big data is too large to store on a single system, a Distributed
File System is a data storage system meant to store large volumes of data across multiple
storage devices, helping to decrease the cost and complexity of storing large amounts of data.

ETL: ETL stands for extract, transform, and load. It refers to the process of ‘extracting’ raw
data, ‘transforming’ it by cleaning/enriching the data to make it ‘fit for use’, and ‘loading’
it into the appropriate repository for the system’s use. Even though it originated with data
warehouses, ETL processes are also used while ‘ingesting’, i.e. taking/absorbing, data from
external sources in big data systems.
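
The three steps can be sketched with nothing but the Python standard library. Everything
specific here is an assumption for illustration: the raw CSV text, the cleaning rules, and
the use of an in-memory SQLite table as the ‘repository’.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; in practice this would come from a file or an external system.
RAW_CSV = """date,category,amount
2023-01-05, Food ,12.50
2023-01-06,clothing,40.00
2023-01-07,FOOD,not-a-number
2023-01-08,entertainment,19.99
"""

def extract(text):
    """Extract: read raw rows from CSV text."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: normalize category names and drop rows with unparseable amounts."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows that are not fit for use
        clean.append((row["date"], row["category"].strip().lower(), amount))
    return clean

def load(rows, connection):
    """Load: write the cleaned rows into the repository (an in-memory SQLite table)."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS spending (date TEXT, category TEXT, amount REAL)"
    )
    connection.executemany("INSERT INTO spending VALUES (?, ?, ?)", rows)
    connection.commit()

connection = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), connection)
loaded = connection.execute("SELECT category, amount FROM spending ORDER BY date").fetchall()
print(loaded)
```

The row with the bad amount is dropped during the transform step, so only three cleaned rows
reach the table.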

Hadoop: When people think of big data, they immediately think about Hadoop. Hadoop (with
its cute elephant logo) is an open source software framework that includes what is called the
Hadoop Distributed File System (HDFS) and allows for storage, retrieval, and analysis of very
large data sets using distributed hardware. If you really want to impress someone, talk about
YARN (Yet Another Resource Negotiator) which, as the name says, is a resource scheduler. I
am really impressed by the folks who come up with these names. The Apache Software Foundation,
home of Hadoop, is also responsible for Pig, Hive, and Spark (yup, they are all names of
various software pieces). Aren’t you impressed with these names?

In-memory computing: In general, any computing that can be done without accessing I/O is
expected to be faster. In-memory computing is a technique of moving the working datasets
entirely into a cluster’s collective memory and avoiding writing intermediate calculations to
disk. Apache Spark is an in-memory computing system and it has a huge speed advantage
over I/O-bound systems like Hadoop’s MapReduce.

IoT: The latest buzzword is Internet of Things or IoT. IoT is the interconnection of computing
devices embedded in objects (sensors, wearables, cars, fridges etc.) via the internet, enabling
them to send and receive data. IoT generates huge amounts of data, presenting many big data
analytics opportunities.

Machine learning: Machine learning is a method of designing systems that can learn, adjust,
and improve based on the data fed to them. Using predictive and statistical algorithms that are
fed to these machines, they learn and continually zero in on “correct” behavior and insights and
they keep improving as more data flows through the system. Fraud detection, online
recommendations based

MapReduce: MapReduce could be a little bit confusing but let me give it a try. MapReduce is a
programming model and the best way to understand this is to note that Map and Reduce are two
separate items. In this, the programming model first breaks up the big data dataset into pieces
(in technical terms into ‘tuples’ but let’s not get too technical here) so it can be distributed
across different computers in different locations (i.e. cluster computing described earlier),
which is essentially the Map part. Then the model collects the results and ‘reduces’ them into
one report. MapReduce’s data processing model goes hand-in-hand with Hadoop’s distributed file
system.
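
The classic way to see the two parts is a word count. The sketch below runs on a single
machine in plain Python, so it only mimics the model: the function names map_phase, shuffle
and reduce_phase and the sample chunks are assumptions, and it does not use Hadoop’s actual
API.

```python
from collections import defaultdict

def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of the input."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(pairs):
    """Group the emitted values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

# Hypothetical input, already split into chunks that could live on different machines.
chunks = ["big data is big", "data science uses big data"]

mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
word_counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
print(word_counts)
```

In a real cluster each chunk would be mapped on a different node and the grouped keys would
be spread across reducers; the logic per chunk and per key stays this simple.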

NoSQL: It almost sounds like a protest against ‘SQL’ (Structured Query Language), which is the
bread-and-butter of traditional Relational Database Management Systems (RDBMS), but
NoSQL actually stands for Not ONLY SQL :-) . NoSQL refers to database management systems
that are designed to handle large volumes of data that does not have a structure, or what’s
technically called a ‘schema’ (like relational databases have). NoSQL databases are often
well-suited for big data systems because of their flexibility and the distributed-first
architecture needed for large unstructured databases.

R: Can anyone think of any worse name for a programming language? Yes, ‘R’ is a
programming language that works very well for statistical computing. You ain’t a data
scientist if you don’t know ‘R’. (Please don’t send me nastygrams if you don’t know ‘R’.) It is
just that ‘R’ is one of the most popular languages in data science.

Spark (Apache Spark): Apache Spark is a fast, in-memory data processing engine to efficiently
execute streaming, machine learning or SQL workloads that require fast iterative access to
datasets. Spark is generally a lot faster than MapReduce that we discussed earlier.

Stream processing: Stream processing is designed to act on real-time and streaming data with
“continuous” queries. Combined with streaming analytics i.e. the ability to continuously
calculate mathematical or statistical analytics on the fly within the stream, stream processing
solutions are designed to handle very high volume in real time.
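
A tiny example of calculating a statistic ‘on the fly within the stream’ is a running mean
that updates as each value arrives, without storing the whole stream. The generator names and
the sample readings below are assumptions for illustration; real stream processors add
windowing, fault tolerance and distribution on top of this idea.

```python
def running_mean(stream):
    """Yield the mean of all values seen so far, updating as each new value arrives."""
    count = 0
    total = 0.0
    for value in stream:
        count += 1
        total += value
        yield total / count

def sensor_readings():
    """Hypothetical stand-in for an unbounded stream of readings."""
    yield from [20.0, 22.0, 21.0, 25.0]

means = list(running_mean(sensor_readings()))
print(means)
```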

Structured v Unstructured Data: This is one of the ‘V’s of Big Data, i.e. Variety. Structured
data is basically anything that can be put into relational databases and organized in such a
way that it relates to other data via tables. Unstructured data is everything that can’t –
email messages, social media posts, recorded human speech etc.

5. Eliminate Your Fear And Pick Up Data Science Today

Introduction

Different people have different answers and viewpoints on the question above. I don’t want to
get into this debate here. I am taking a safer approach instead: I will tell you about a few
applications which are already impacting a layman’s life. You can read them for yourself and
decide whether this is a buzz or an opportunity.

What to learn from this article?

In this article, I’ve listed some of the most common applications of data science that we
use in our daily lives. Alongside, you’ll also find some advanced applications which are yet to
come in the near future. The idea behind sharing them is not to tell you the tools and
techniques used in these cases – it is even more fundamental in nature. I want to showcase the
impact data science is making and excite you about what is in store for the future.

If you know of any other application which I have missed, please feel free to add it through
the comments below.

Applications / Uses of Data Science

Using data science, companies have become intelligent enough to push & sell products as per
customers’ purchasing power & interest. Here’s how they are ruling our hearts and minds:

Internet Search

When we speak of search, we think ‘Google’. Right? But there are many other search engines
like Yahoo, Bing, Ask, AOL, DuckDuckGo etc. All these search engines (including Google)
make use of data science algorithms to deliver the best result for our searched query in a
fraction of a second. Consider that Google processes more than 20 petabytes of data
every day. Had there been no data science, Google wouldn’t have been the ‘Google’ we know
today.

Digital Advertisements (Targeted Advertising and re-targeting)

If you thought Search would have been the biggest application of data science and machine
learning, here is a challenger – the entire digital marketing spectrum. Starting from the display
banners on various websites to the digital billboards at the airports – almost all of them are
decided by using data science algorithms.

This is the reason why digital ads have been able to get a much higher CTR than traditional
advertisements. They can be targeted based on a user’s past behaviour. This is the reason why I
see ads for analytics training while my friend sees ads for apparel in the same place at the
same time.

Recommender Systems

Who can forget the suggestions about similar products on Amazon? They not only help you
find relevant products from the billions of products available, but also add a lot to the
user experience.

A lot of companies have fervidly used this engine / system to promote their products /
suggestions in accordance with the user’s interest and the relevance of information. Internet
giants like Amazon, Twitter, Google Play, Netflix, LinkedIn, IMDb and many more use this system
to improve user experience. The recommendations are made based on previous search results for a
user.
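
To give a feel for how such suggestions can be produced, here is a very small ‘people who
bought what you bought also bought’ sketch. It is an assumed, simplified technique (overlap
counting on purchase histories), not the method any of the companies above actually use, and
the users, items and function name recommend are invented.

```python
from collections import Counter

# Hypothetical purchase histories: user -> set of items.
purchases = {
    "ana": {"laptop", "mouse", "keyboard"},
    "ben": {"laptop", "mouse"},
    "cy": {"laptop", "monitor"},
    "dee": {"mouse", "mousepad"},
}

def recommend(user, purchases, top_n=2):
    """Recommend items bought by users who share at least one item with `user`."""
    own = purchases[user]
    scores = Counter()
    for other, items in purchases.items():
        if other == user:
            continue
        overlap = len(own & items)  # how similar the two users are
        for item in items - own:    # only suggest items the user lacks
            scores[item] += overlap
    # Highest score first; ties broken alphabetically for a stable result.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, score in ranked if score > 0][:top_n]

recommendations = recommend("ben", purchases)
print(recommendations)
```

Items from users with more overlap get a higher score, so the keyboard (bought by the most
similar user) ranks first for "ben" here.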

Image Recognition

You upload your image with friends on Facebook and you start getting suggestions to tag your
friends. This automatic tag suggestion feature uses a face recognition algorithm. Similarly,
while using WhatsApp Web, you scan a QR code in your web browser using your mobile phone. In
addition, Google provides you the option to search for images by uploading them. It uses image
recognition and provides related search results.

Speech Recognition

Some of the best examples of speech recognition products are Google Voice, Siri, Cortana etc.
Using the speech recognition feature, even if you aren’t in a position to type a message, your
life wouldn’t stop. Simply speak out the message and it will be converted to text. However, at
times, you will notice that speech recognition doesn’t perform accurately, as in the
conversation between Cortana and Satya Nadella (CEO, Microsoft).

Gaming

EA Sports, Zynga, Sony, Nintendo and Activision-Blizzard have taken the gaming experience to
the next level using data science. Games are now designed using machine learning algorithms
which improve / upgrade themselves as the player moves up to a higher level. In motion gaming
also, your opponent (the computer) analyzes your previous moves and shapes up its game
accordingly.

Price Comparison Websites

At a basic level, these websites are driven by lots and lots of data which is fetched using
APIs and RSS feeds. If you have ever used these websites, you know the convenience
of comparing the price of a product from multiple vendors in one place. PriceGrabber,
PriceRunner, Junglee, Shopzilla and DealTime are some examples of price comparison websites.
Nowadays, price comparison websites can be found in almost every domain, such as
technology, hospitality, automobiles, durables, apparel etc.

Airline Route Planning

The airline industry across the world is known to bear heavy losses. Except for a few airline
service providers, companies are struggling to maintain their occupancy ratio and operating
profits. The steep rise in air fuel prices and the need to offer heavy discounts to customers
have made the situation worse. It wasn’t long before airline companies started using data
science to identify the strategic areas of improvement. Now, using data science, the airline
companies can:

Predict flight delays
Decide which class of airplanes to buy
Decide whether to fly directly to the destination or take a halt in between (for example, a flight can have a direct route from New Delhi to New York, or it can choose to halt in any country)
Effectively drive customer loyalty programs

Southwest Airlines and Alaska Airlines are among the top companies that have embraced data
science to bring changes in their way of working.

Fraud and Risk Detection

One of the first applications of data science originated in the finance discipline. Companies
were fed up with bad debts and losses every year. However, they had a lot of data which used to
get collected during the initial paperwork while sanctioning loans. They decided to bring in
data science practices to rescue them from losses. Over the years, banking companies
learned to divide and conquer data via customer profiling, past expenditures and other essential
variables to analyze the probabilities of risk and default. Moreover, it also helped them to push
their banking products based on customers’ purchasing power.

Delivery logistics

Who says data science has limited applications? Logistics companies like DHL, FedEx, UPS and
Kuehne+Nagel have used data science to improve their operational efficiency. Using data
science, these companies have discovered the best routes to ship, the best suited time to
deliver, the best mode of transport to choose and much more, thus leading to cost efficiency.
Furthermore, the data that these companies generate using the GPS units installed provides them
with lots of possibilities to explore using data science.
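
As a toy version of ‘discovering the best route’, the sketch below tries every visiting
order for a handful of stops and keeps the shortest round trip. The coordinates, stop names
and helper functions are made up, and brute force is an assumption chosen for clarity: it only
works for a few stops, whereas real logistics systems rely on far more scalable optimization
methods.

```python
from itertools import permutations
from math import dist

# Hypothetical depot and delivery stops as (x, y) coordinates in kilometres.
depot = (0.0, 0.0)
stops = {"A": (0.0, 3.0), "B": (4.0, 3.0), "C": (4.0, 0.0)}

def route_length(order):
    """Total distance of depot -> stops in the given order -> depot."""
    points = [depot] + [stops[name] for name in order] + [depot]
    return sum(dist(a, b) for a, b in zip(points, points[1:]))

def best_route(stop_names):
    """Try every visiting order and keep the shortest (feasible only for a few stops)."""
    return min(permutations(stop_names), key=route_length)

order = best_route(stops)
print(order, route_length(order))
```

With these coordinates the stops form a rectangle with the depot, so the shortest round trip
simply goes around its edge.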

Miscellaneous

Apart from the applications mentioned above, data science is also used in Marketing, Finance,
Human Resources, Health Care, Government Policies and every possible industry where data
gets generated. Using data science, the marketing departments of companies decide which
products are best for up-selling and cross-selling, based on behavioral data from customers.
In addition, questions such as predicting the wallet share of a customer, which customer is
likely to churn and which customer should be pitched a high-value product can be answered
by data science. Many tasks in Finance (credit risk, fraud) and Human Resources (which
employees are most likely to leave, employee performance, deciding employee bonuses) are also
accomplished using data science.

Coming Up In Future

Not much has been revealed about these except the prototypes, nor do I know when
they will be available to the common man. Hence, I’ve kept these amazing
applications of data science in the ‘Coming Up’ section. We need to wait and watch how
successful Google becomes with its self-driving cars project. Robots, as we know, have been
around for a while but aren’t being used as a commodity yet due to associated security issues.
Let’s see what our future holds for us!

End Notes

I am sure you had fun watching the videos. The intent wasn’t just fun, but learning
at the same time. By now, you will have developed an understanding of the boundless potential
that data science has in this world. Almost everything on this planet which generates data falls
under the radar of data science, which can improve and optimize its existing processes.

I wasn’t sure where to add the ‘Coming up in future’ section, since not much has been
revealed about these future advancements. Yet, I thought of leaving you with something
controversial (robots) just to ignite a discussion on the future of robotics.

