Theme
Summary:
Introduction
Definition
Conclusion
Bibliography
Introduction:
We are in an age often referred to as the information age. In this information age, because we
believe that information leads to power and success, and thanks to sophisticated technologies
such as computers, satellites, etc., we have been collecting tremendous amounts of
information. Initially, with the advent of computers and means for mass digital storage, we
started collecting and storing all sorts of data, counting on the power of computers to help
sort through this amalgam of information. Unfortunately, these massive collections of data
stored on disparate structures very rapidly became overwhelming. This initial chaos has led to
the creation of structured databases and database management systems (DBMS).
Efficient database management systems have been very important assets for managing
a large corpus of data, and especially for the effective and efficient retrieval of particular
information from a large collection whenever needed. The proliferation of database
management systems has also contributed to recent massive gathering of all sorts of
information. Today, we have far more information than we can handle, from business
transactions and scientific data to satellite pictures, text reports, and military intelligence.
Information retrieval alone is no longer enough for decision-making. Confronted with
huge collections of data, we now need new tools to help us make better managerial
choices: automatic summarization of data, extraction of the “essence” of
the stored information, and the discovery of patterns in raw data.
Definition:
Generally, data mining (sometimes called data or knowledge discovery) is the process of
analyzing data from different perspectives and summarizing it into useful information that
can be used to increase revenue, cut costs, or both. Data mining software is one of a number
of analytical tools for analyzing data. It allows users to analyze data from many different
dimensions or angles, categorize it, and summarize the relationships identified. Technically,
data mining is the process of finding correlations or patterns among dozens of fields in large
relational databases. [1]
Example:
For example, one Midwest grocery chain used the data mining capacity of Oracle software to
analyze local buying patterns. They discovered that when men bought diapers on Thursdays
and Saturdays, they also tended to buy beer. Further analysis showed that these shoppers
typically did their weekly grocery shopping on Saturdays. On Thursdays, however, they only
bought a few items. The retailer concluded that they purchased the beer to have it available
for the upcoming weekend. The grocery chain could use this newly discovered information in
various ways to increase revenue. For example, they could move the beer display closer to the
diaper display. Moreover, they could make sure beer and diapers were sold at full price on
Thursdays. [2]
Data, Information, and Knowledge:
Data:
Data are any facts, numbers, or text that can be processed by a computer. Today,
organizations are accumulating vast and growing amounts of data in different formats and
different databases. This includes:
● Operational or transactional data, such as sales, cost, inventory, payroll, and
accounting.
● Nonoperational data, such as industry sales, forecast data, and macro-economic
data.
● Metadata, that is, data about the data itself, such as logical database design or data
dictionary definitions.
Information:
The patterns, associations, or relationships among all this data can provide information. For
example, analysis of retail point of sale transaction data can yield information on which
products are selling and when.
Knowledge:
Information can be converted into knowledge about historical patterns and future trends. For
example, summary information on retail supermarket sales can be analyzed in light of
promotional efforts to provide knowledge of consumer buying behavior. Thus, a manufacturer
or retailer could determine which items are most susceptible to promotional efforts.
Scientific Viewpoint:
● Data are collected and stored at enormous speeds (GB/hour) by sources such as
remote sensors on a satellite, telescopes scanning the skies, and microarrays
generating gene expression data.
How does data mining work?
How exactly is data mining able to tell you important things that you did not know or what is
going to happen next? The technique that is used to perform these feats in data mining is
called modeling. Modeling is simply the act of building a model in one situation where you
know the answer and then applying it to another situation that you do not. For instance, if you
were looking for a sunken Spanish galleon on the high seas the first thing you might do is to
research the times when Spanish treasure had been found by others in the past. You might
note that these ships often tend to be found off the coast of Bermuda and that there are
certain characteristics to the ocean currents, and certain routes that have likely been taken by
the ship’s captains in that era. You note these similarities and build a model that includes the
characteristics that are common to the locations of these sunken treasures. With this
model in hand, you sail off looking for treasure where the model indicates it is most likely to
be, given similar situations in the past. Hopefully, if you have a good model, you find your
treasure.
This act of model building is thus something that people have been doing for a long time,
certainly before the advent of computers or data mining technology. What happens on
computers, however, is not much different from the way people build models. Computers are
loaded up with lots of information about a variety of situations where an answer is known and
then the data mining software on the computer must run through that data and distill the
characteristics of the data that should go into the model. Once the model is built, it can then
be used in similar situations where you do not know the answer. For example, say that you
are the director of marketing for a telecommunications company and you would like to
acquire some new long distance phone customers. You could just randomly go out and mail
coupons to the general population, just as you could randomly sail the seas looking for sunken
treasure. In neither case would you achieve the results you desired. You have
the opportunity to do much better than random: you could use the business experience
stored in your database to build a model.
As the marketing director, you have access to a lot of information about all of your customers:
their age, sex, credit history and long distance calling usage. The good news is that you also
have a lot of information about your prospective customers: their age, sex, credit history etc.
Your problem is that you do not know the long distance calling usage of these prospects (since
they are most likely now customers of your competition). You would like to concentrate on
those prospects who have large amounts of long distance usage. You can accomplish this by
building a model. Table 2 illustrates the data used for building a model for new customer
prospecting in a data warehouse.[5]
Customer Prospects
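The prospecting idea above can be sketched in a few lines of Python. This is only an illustrative sketch: the customer and prospect records, the two age bands, and the helper names (segment, build_model, predict) are assumptions made up for the example, not data or methods from the source. The model simply learns the average long distance usage per segment from existing customers and applies it to prospects whose usage is unknown.

```python
from statistics import mean

# Hypothetical customers: (age, credit rating, monthly long distance minutes).
customers = [
    (25, "good", 320),
    (31, "good", 280),
    (47, "good", 150),
    (52, "poor", 60),
    (29, "poor", 200),
    (58, "good", 90),
]

# Hypothetical prospects: usage is unknown, only age and credit rating are available.
prospects = [
    ("P1", 27, "good"),
    ("P2", 55, "poor"),
    ("P3", 33, "poor"),
]

def segment(age, credit_rating):
    """Map a person to a coarse segment: age band plus credit rating."""
    band = "under_40" if age < 40 else "40_plus"
    return (band, credit_rating)

def build_model(records):
    """Learn the average usage per segment from customers whose usage is known."""
    by_segment = {}
    for age, rating, minutes in records:
        by_segment.setdefault(segment(age, rating), []).append(minutes)
    return {seg: mean(values) for seg, values in by_segment.items()}

def predict(model, age, credit_rating, default):
    """Estimate usage for a person whose usage is unknown."""
    return model.get(segment(age, credit_rating), default)

model = build_model(customers)
overall = mean(minutes for _, _, minutes in customers)

# Rank prospects by estimated usage, highest first.
ranked = sorted(
    ((name, predict(model, age, rating, overall)) for name, age, rating in prospects),
    key=lambda item: item[1],
    reverse=True,
)
for name, estimate in ranked:
    print(name, estimate)
```

With this toy data the prospects are ranked P1, P3, P2, so a mailing would start with P1. A real model would use many more attributes and would be validated before use.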
Clustering
Clustering is identifying groups of similar objects in unlabeled data. It is the task of
grouping a set of objects in such a way that objects in the same group are more similar to each
other than to those in other groups. Once the clusters are decided, the objects are labelled
with their corresponding clusters, and the common features of the objects in each cluster are
summarized to form a class description. For example, a bank may cluster its customers into
several groups based on similarities in their income, age, sex, residence, etc., and the common
characteristics of the customers in a group can be used to describe that group of customers.
This helps the bank understand its customers better and thus provide customized services.
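As a rough illustration of the bank example, the sketch below runs a plain k-means procedure on a handful of made-up customers described by (age, income in thousands). The source does not name a clustering algorithm, so k-means is an assumption chosen for the example, as are the data, the starting centroids, and the function name kmeans.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster; leave it in place if the cluster is empty.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical bank customers: (age, annual income in thousands).
points = [(23, 28), (25, 31), (27, 30), (52, 85), (55, 90), (58, 88)]

# Start from two of the data points so the run is deterministic.
centroids, clusters = kmeans(points, centroids=[points[0], points[-1]])

for centroid, members in zip(centroids, clusters):
    # The centroid acts as a simple "class description" of the group.
    print("average (age, income):", centroid, "members:", members)
```

Here the younger, lower-income customers and the older, higher-income customers end up in separate groups. In practice the attributes would usually be rescaled first so that income does not dominate the distance.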
Classification
Classification is learning rules that can be applied to new data. It typically includes the
following steps: data preprocessing, model design, learning/feature selection, and
validation/evaluation. Classification predicts categorical class labels rather than
continuous-valued functions. For example, we can build a classification model to categorize
bank loan applications as either safe or risky. Classification is the derivation of a model that
determines the class of an object based on its attributes. A set of objects is given as a training
set, in which every object is represented by a vector of attributes along with its class. By
analyzing the relationship between the attributes and the classes of the objects in the training
set, a classification model can be constructed. Such a classification model can be used to
classify future objects and to develop a better understanding of the classes of the objects in
the database. For example, from a set of loan borrowers (Name, Age, and Income) who serve as
the training set, a classification model can be built that classifies each bank loan application
as either safe or risky (for example: if age = Youth then loan decision = risky).
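The loan example can be made concrete with a very small rule learner. The sketch below is an assumption-laden illustration, not the method of the source: the training and held-out rows are invented, and the learner just maps each value of one attribute to the majority class seen in training, which is enough to produce a rule of the form "if age = youth then loan decision = risky".

```python
from collections import Counter, defaultdict

# Hypothetical training set: (age group, income level, loan decision).
training = [
    ("youth", "low", "risky"),
    ("youth", "medium", "risky"),
    ("youth", "high", "safe"),
    ("middle_aged", "low", "risky"),
    ("middle_aged", "medium", "safe"),
    ("middle_aged", "high", "safe"),
    ("senior", "medium", "safe"),
    ("senior", "high", "safe"),
]

# Hypothetical held-out applications used only for evaluation.
held_out = [
    ("youth", "low", "risky"),
    ("senior", "high", "safe"),
    ("middle_aged", "low", "risky"),
]

def learn_rules(rows, attribute_index):
    """Map each value of one attribute to the majority class among the training rows."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[attribute_index]][row[-1]] += 1
    return {value: tally.most_common(1)[0][0] for value, tally in counts.items()}

def classify(rules, value, default):
    """Apply the learned rules; fall back to a default class for unseen values."""
    return rules.get(value, default)

age_rules = learn_rules(training, attribute_index=0)
default_class = Counter(row[-1] for row in training).most_common(1)[0][0]

# Validation step: fraction of held-out applications classified correctly.
correct = sum(classify(age_rules, age, default_class) == label for age, _, label in held_out)
accuracy = correct / len(held_out)

print(age_rules)
print("held-out accuracy:", accuracy)
```

On this toy data the learned rules label youth as risky and the other age groups as safe, and two of the three held-out applications are classified correctly. The one miss shows why the validation step matters before a model is trusted.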
Regression
Regression is finding a function that models the data with minimal error. It is a statistical
methodology that is most often used for numeric prediction. Regression analysis is widely used
for prediction and forecasting, where its use has substantial overlap with the field of machine
learning. Regression analysis is also used to understand which of the independent
variables are related to the dependent variable, and to explore the forms of these
relationships. In restricted circumstances, regression analysis can be used to infer causal
relationships between the independent and dependent variables. However, this can lead to
illusory or false relationships, so caution is advisable [6]; for example, correlation does not
imply causation.
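To show what "a function with minimal error" can look like, here is a minimal sketch of simple linear regression fitted by ordinary least squares. The choice of a straight line, the function name fit_line, and the sample numbers are assumptions for illustration only.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one independent variable)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical observations: x = promotion spend, y = sales (arbitrary units).
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]

slope, intercept = fit_line(xs, ys)
prediction = intercept + slope * 6  # numeric prediction for a new value x = 6

print(round(slope, 2), round(intercept, 2), round(prediction, 2))
```

For these numbers the fitted slope is about 1.97 and the intercept about 0.09. The fit only describes how y moves with x in the sample; it does not by itself show that x causes y.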
Association
Association is looking for relationships between variables or objects. It aims to extract
interesting associations, correlations, or causal structures among the objects, i.e., how the
appearance of one set of objects relates to the appearance of another set of objects [7].
Association rules can be useful for marketing, commodity management, advertising, etc.
Association rule learning is a popular and well researched method for discovering interesting
relations between variables in large databases. It is intended to identify strong rules
discovered in databases using different measures of interestingness [6]. Based on the concept
of strong rules presented in [8], association rules were introduced for discovering regularities
between products in large-scale transaction data recorded by point-of-sale (POS) systems in
supermarkets. For example, the rule {onions, potatoes} → {burger} found in the sales data of a
supermarket would indicate that if a customer buys onions and potatoes together, he or she is
likely to also buy hamburger meat. Such information can be used as the basis for decisions
about marketing activities such as promotional pricing or product placement. In addition to
the above example from market basket analysis, association rules are employed today in many
application areas, including Web usage mining, intrusion detection, continuous production,
and bioinformatics.
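Two common measures of how strong a rule is are support and confidence. The sketch below computes them for the rule {onions, potatoes} → {burger} on a tiny set of made-up transactions. The transactions and the function names are assumptions for illustration, and the source does not say which interestingness measures it has in mind.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Of the transactions containing the antecedent, the fraction that also contain the consequent."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

# Hypothetical point-of-sale baskets.
transactions = [
    {"onions", "potatoes", "burger"},
    {"onions", "potatoes", "burger", "milk"},
    {"onions", "potatoes"},
    {"milk", "bread"},
    {"potatoes", "burger"},
]

antecedent = {"onions", "potatoes"}
consequent = {"burger"}

rule_support = support(antecedent | consequent, transactions)
rule_confidence = confidence(antecedent, consequent, transactions)

print("support:", rule_support)
print("confidence:", round(rule_confidence, 3))
```

In this toy data the rule holds in 2 of 5 baskets (support 0.4), and 2 of the 3 baskets with onions and potatoes also contain burger (confidence about 0.667). A rule is usually reported only if both values exceed thresholds chosen by the analyst.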
Data Mining Applications in Sales/Marketing:
Data mining enables businesses to understand the hidden patterns inside historical purchasing
transaction data, thus helping them plan and launch new marketing campaigns in a prompt
and cost-effective way. The following illustrates several data mining applications in sales and
marketing.
● Data mining is used for market basket analysis to provide information on what product
combinations were purchased together, when they were bought, and in what
sequence. This information helps businesses promote their most profitable products
and maximize profit. In addition, it encourages customers to purchase related
products that they may have missed or overlooked.
● Retail companies use data mining to identify customers’ buying behavior patterns.
● Data mining is applied in claims analysis, such as identifying which medical procedures
are claimed together.
● Data mining makes it possible to forecast which customers will potentially purchase new
policies.
● Data mining allows insurance companies to detect risky customers’ behavior patterns.
● Data mining helps detect fraudulent behavior.
Data Mining Applications in Transportation
● Data mining helps determine the distribution schedules among warehouses and
outlets and analyze loading patterns.
Marketing / Retail
Data mining helps marketing companies build models based on historical data to predict who
will respond to new marketing campaigns such as direct mail, online marketing
campaigns, etc. With these results, marketers can take an appropriate approach to selling
profitable products to targeted customers.
Data mining brings many benefits to retail companies in the same way as to marketing. Through
market basket analysis, a store can arrange its products so that
customers can conveniently buy frequently purchased products together. In addition, it helps
retail companies offer discounts on particular products that will attract more
customers.
Finance / Banking
Data mining gives financial institutions insight into loans and credit
reporting. By building a model from historical customer data, banks and financial
institutions can distinguish good loans from bad ones. In addition, data mining helps banks detect
fraudulent credit card transactions to protect credit card owners.
Manufacturing
By applying data mining to operational engineering data, manufacturers can detect faulty
equipment and determine optimal control parameters. For example, semiconductor
manufacturers face the challenge that, even when the conditions of the manufacturing
environments at different wafer production plants are similar, the quality of the wafers is not
the same, and some wafers have defects for unknown reasons. Data mining has been applied to
determine the ranges of control parameters that lead to the production of the golden wafer.
Those optimal control parameters are then used to manufacture wafers of the desired quality.
Governments
Data mining helps government agencies by digging through and analyzing records of financial
transactions to build patterns that can detect money laundering or criminal activities.
Privacy Issues
Concerns about personal privacy have been increasing enormously in recent years, especially
as the internet booms with social networks, e-commerce, forums, blogs, and so on. Because of
privacy issues, people are afraid that their personal information is collected and used in an
unethical way that could cause them a lot of trouble. Businesses collect information
about their customers in many ways to understand their purchasing behavior trends.
However, businesses do not last forever; some day they may be acquired by others or go out
of business. At that point, the personal information they own may be sold to others or leaked.
Security issues
Security is a big issue. Businesses own information about their employees and customers,
including social security numbers, birthdays, payroll, etc. However, how properly this information
is taken care of is still in question. There have been many cases in which hackers accessed and
stole large amounts of customer data from big corporations such as Ford Motor Credit Company,
Sony… With so much personal and financial information available, credit card theft and
identity theft have become a big problem.
Information collected through data mining for ethical purposes can be
misused. This information may be exploited by unethical people or businesses to take advantage
of vulnerable people or to discriminate against a group of people. In addition, data mining
techniques are not perfectly accurate. Therefore, if inaccurate information is used for
decision-making, it can cause serious consequences.
Conclusion:
[1]
[2]
[3]
[4]
[5] http://www.thearling.com/text/dmwhite/dmwhite.htm (accessed 7_11_2016)
[6] R. Kaur, S. Kaur, A. Kaur, R. Kaur, and A. Kaur, “An Overview of Database Management System,
Data Warehousing and Data Mining”, IJARCCE, Vol. 2, Issue 7, July 2013.
[7] Y. Fu, Data Mining: Tasks, Techniques and Applications.
[8] Y. Ramamohan, K. Vasantharao, C. Kalyana Chakravarti, and A.S.K.Ratnam, “A Study of
Data Mining Tools in Knowledge