
University Kasdi Merbah Ouargla

Faculty of New Information Technologies and Communication

Department of Computer Science and Information Technology

Theme

 Chaima Derouiche Khalil Mezriche


 Hadjer EL_karbo

Contents:
Introduction
Definition
Data, information and knowledge
Reasons for using data mining
What kind of data can be mined
Origin of data mining
How data mining works
The tasks of data mining
Data mining applications
Advantages and disadvantages of data mining
Conclusion
Bibliography

Introduction:
We are in an age often referred to as the information age. In this information age, because we
believe that information leads to power and success, and thanks to sophisticated technologies
such as computers, satellites, etc., we have been collecting tremendous amounts of
information. Initially, with the advent of computers and means for mass digital storage, we
started collecting and storing all sorts of data, counting on the power of computers to help
sort through this amalgam of information. Unfortunately, these massive collections of data
stored on disparate structures very rapidly became overwhelming. This initial chaos has led to
the creation of structured databases and database management systems (DBMS). Efficient
database management systems have been very important assets for the management of
a large corpus of data and especially for effective and efficient retrieval of particular
information from a large collection whenever needed. The proliferation of database
management systems has also contributed to recent massive gathering of all sorts of
information. Today, we have far more information than we can handle from business
transactions and scientific data, to satellite pictures, text reports and military intelligence.
Information retrieval is simply not enough anymore for decision-making. Confronted with
huge collections of data, we have now created new needs to help us make better managerial
choices. These needs are automatic summarization of data, extraction of the “essence” of
information stored, and the discovery of patterns in raw data.

Definition:
Generally, data mining (sometimes called data or knowledge discovery) is the process of
analyzing data from different perspectives and summarizing it into useful information that
can be used to increase revenue, cut costs, or both. Data mining software is one of a number
of analytical tools for analyzing data. It allows users to analyze data from many different
dimensions or angles, categorize it, and summarize the relationships identified. Technically,
data mining is the process of finding correlations or patterns among dozens of fields in large
relational databases. [1]

Example:
One Midwest grocery chain used the data mining capacity of Oracle software to
analyze local buying patterns. They discovered that when men bought diapers on Thursdays
and Saturdays, they also tended to buy beer. Further analysis showed that these shoppers
typically did their weekly grocery shopping on Saturdays. On Thursdays, however, they only
bought a few items. The retailer concluded that they purchased the beer to have it available
for the upcoming weekend. The grocery chain could use this newly discovered information in
various ways to increase revenue. For example, they could move the beer display closer to the
diaper display. Moreover, they could make sure beer and diapers were sold at full price on
Thursdays. [2]

Data, Information, and Knowledge:
Data:
Data are any facts, numbers, or text that can be processed by a computer. Today,
organizations are accumulating vast and growing amounts of data in different formats and
different databases. This includes:

• Operational or transactional data, such as sales, cost, inventory, payroll, and
accounting.
• Nonoperational data, such as industry sales, forecast data, and macro-economic
data.
• Metadata: data about the data itself, such as logical database design or data
dictionary definitions.

Information:
The patterns, associations, or relationships among all this data can provide information. For
example, analysis of retail point of sale transaction data can yield information on which
products are selling and when.

Knowledge:
Information can be converted into knowledge about historical patterns and future trends. For
example, summary information on retail supermarket sales can be analyzed in light of
promotional efforts to provide knowledge of consumer buying behavior. Thus, a manufacturer
or retailer could determine which items are most susceptible to promotional efforts.

Reasons for using data mining:

Commercial Viewpoint:
● Lots of data is being collected and warehoused
   • Web data, e-commerce
   • Purchases at department/grocery stores
   • Bank/credit card transactions
● Computers have become cheaper and more powerful
● Competitive pressure is strong: providing better, customized services gives an edge (e.g. in
Customer Relationship Management)

Scientific Viewpoint:
● Data is collected and stored at enormous speeds (GB/hour)
   • Remote sensors on a satellite
   • Telescopes scanning the skies
   • Microarrays generating gene expression data
   • Scientific simulations generating terabytes of data
● Traditional techniques are infeasible for raw data
● Data mining may help scientists
   • in classifying and segmenting data
   • in hypothesis formation

What kind of Data can be mined?


In principle, data mining is not specific to one type of media or data. Data mining should be
applicable to any kind of information repository. However, algorithms and approaches may
differ when applied to different types of data. Indeed, the challenges presented by different
types of data vary significantly. Data mining is being put into use and studied for databases,
including relational databases, object-relational databases and object-oriented databases,
data warehouses, transactional databases, unstructured and semi-structured repositories
such as the World Wide Web, advanced databases such as spatial databases, multimedia
databases, time-series databases and textual databases, and even flat files. Here are some
examples in more detail:
• Flat files: Flat files are actually the most common data source for data mining algorithms,
especially at the research level. Flat files are simple data files in text or binary format with a
structure known by the data-mining algorithm to be applied. The data in these files can be
transactions, time-series data, scientific measurements, etc.
• Relational Databases: Briefly, a relational database consists of a set of tables containing
either values of entity attributes, or values of attributes from entity relationships. Tables have
columns and rows, where columns represent attributes and rows represent tuples. A tuple in
a relational table corresponds to either an object or a relationship between objects and is
identified by a set of attribute values representing a unique key.[3]
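
The relational model described above can be sketched with Python's standard-library sqlite3 module. This is only an illustration: the table name customer, its columns, and its rows are hypothetical and do not come from any real data set.

```python
import sqlite3

# In-memory database; the table and its contents are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer ("
    "  customer_id INTEGER PRIMARY KEY,"  # unique key identifying each tuple
    "  name TEXT,"
    "  age INTEGER)"
)
conn.executemany(
    "INSERT INTO customer (customer_id, name, age) VALUES (?, ?, ?)",
    [(1, "Amina", 34), (2, "Karim", 27), (3, "Sara", 45)],
)

# Each row is a tuple; each column is an attribute.
rows = conn.execute(
    "SELECT customer_id, name, age FROM customer ORDER BY customer_id"
).fetchall()

# A key value picks out exactly one tuple.
karim = conn.execute(
    "SELECT name, age FROM customer WHERE customer_id = ?", (2,)
).fetchone()

print(rows)
print(karim)
```

Here customer_id plays the role of the unique key: looking up one key value returns a single tuple of attribute values.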

Origin of data mining:


● Draws ideas from machine learning/AI, pattern recognition, statistics, and database systems
● Traditional techniques may be unsuitable due to
   • Enormity of data
   • High dimensionality of data
   • Heterogeneous, distributed nature of data [4]

How data mining works:
How exactly is data mining able to tell you important things that you did not know or what is
going to happen next? The technique that is used to perform these feats in data mining is
called modeling. Modeling is simply the act of building a model in one situation where you
know the answer and then applying it to another situation that you do not. For instance, if you
were looking for a sunken Spanish galleon on the high seas the first thing you might do is to
research the times when Spanish treasure had been found by others in the past. You might
note that these ships often tend to be found off the coast of Bermuda and that there are
certain characteristics to the ocean currents, and certain routes that have likely been taken by
the ship’s captains in that era. You note these similarities and build a model that includes the
characteristics that are common to the locations of these sunken treasures. With these
models in hand you sail off looking for treasure where your model indicates it most likely might
be given a similar situation in the past. Hopefully, if you've got a good model, you find your
treasure.

This act of model building is thus something that people have been doing for a long time,
certainly before the advent of computers or data mining technology. What happens on
computers, however, is not much different than the way people build models. Computers are
loaded up with lots of information about a variety of situations where an answer is known and
then the data mining software on the computer must run through that data and distill the
characteristics of the data that should go into the model. Once the model is built, it can then
be used in similar situations where you do not know the answer. For example, say that you
are the director of marketing for a telecommunications company and you would like to
acquire some new long distance phone customers. You could just randomly go out and mail
coupons to the general population - just as you could randomly sail the seas looking for sunken
treasure. In neither case would you achieve the results you desired and of course you have
the opportunity to do much better than random - you could use your business experience
stored in your database to build a model.

As the marketing director, you have access to a lot of information about all of your customers:
their age, sex, credit history and long distance calling usage. The good news is that you also
have a lot of information about your prospective customers: their age, sex, credit history etc.
Your problem is that you do not know the long distance calling usage of these prospects (since
they are most likely now customers of your competition). You would like to concentrate on
those prospects who have large amounts of long distance usage. You can accomplish this by
building a model. Table 2 illustrates the data used for building a model for new customer
prospecting in a data warehouse.[5]

                                                        Customers   Prospects
General information (e.g. demographic data)             Known       Known
Proprietary information (e.g. customer transactions)    Known       Target
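
The idea of building a model where the answer is known (customers) and applying it where it is not (prospects) can be sketched as follows. This is a minimal illustration, not the method of the cited source: the records, the field names (age, credit, ld_minutes) and the segment-average model are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical customer records: general information plus known usage.
customers = [
    {"age": 25, "credit": "good", "ld_minutes": 300},
    {"age": 28, "credit": "good", "ld_minutes": 260},
    {"age": 47, "credit": "good", "ld_minutes": 90},
    {"age": 52, "credit": "poor", "ld_minutes": 40},
    {"age": 33, "credit": "poor", "ld_minutes": 120},
]

# Hypothetical prospects: same general information, usage unknown (the target).
prospects = [
    {"name": "P1", "age": 26, "credit": "good"},
    {"name": "P2", "age": 50, "credit": "poor"},
]


def segment(record):
    """Map a record to a coarse segment built from general information only."""
    age_band = "under_40" if record["age"] < 40 else "40_plus"
    return (age_band, record["credit"])


def build_model(records):
    """Learn the average long distance usage of each segment."""
    totals = defaultdict(lambda: [0, 0])  # segment -> [sum of minutes, count]
    for r in records:
        entry = totals[segment(r)]
        entry[0] += r["ld_minutes"]
        entry[1] += 1
    return {seg: total / count for seg, (total, count) in totals.items()}


def predict(model, record, default=0.0):
    """Estimate usage for a record whose usage is unknown."""
    return model.get(segment(record), default)


model = build_model(customers)

# Rank prospects so that the highest predicted usage comes first.
ranked = sorted(prospects, key=lambda p: predict(model, p), reverse=True)
for p in ranked:
    print(p["name"], predict(model, p))
```

The model only ever sees the "Known" cells of the table when it is built; the "Target" cell is what predict estimates for each prospect.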

The tasks of Data mining:


Summarization
Summarization is the generalization or abstraction of data. A set of relevant data is
abstracted and summarized, resulting in a smaller set which gives a general overview of the data.
For example, the long distance calls of a customer can be summarized into total minutes,
total calls, total spending, etc. instead of detailed calls. Similarly, the calls can be summarized
into local calls, STD calls, ISD calls, etc.
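
As a small sketch of the call example above, the following Python snippet collapses detailed call records into overall totals and into totals per call type. The call records are hypothetical values invented for the illustration.

```python
from collections import defaultdict

# Hypothetical detailed call records for one customer:
# (call type, duration in minutes, cost)
calls = [
    ("local", 5, 0.50),
    ("STD", 12, 3.00),
    ("local", 3, 0.25),
    ("ISD", 8, 6.00),
    ("STD", 10, 2.50),
]

# Overall summary: many detailed rows collapse into three totals.
summary = {
    "total_calls": len(calls),
    "total_minutes": sum(minutes for _, minutes, _ in calls),
    "total_spending": sum(cost for _, _, cost in calls),
}

# Summary by call type (local, STD, ISD).
by_type = defaultdict(lambda: {"calls": 0, "minutes": 0})
for call_type, minutes, _ in calls:
    by_type[call_type]["calls"] += 1
    by_type[call_type]["minutes"] += minutes

print(summary)
print(dict(by_type))
```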

Clustering
Clustering is identifying similar groups from unstructured data. Clustering is the task of
grouping a set of objects in such a way that objects in the same group are more similar to each
other than to those in other groups. Once the clusters are decided, the objects are labelled
with their corresponding clusters, and the common features of the objects in a cluster are
summarized to form a class description. For example, a bank may cluster its customers into
several groups based on the similarities of their income, age, sex, residence, etc., and the common
characteristics of the customers in a group can be used to describe that group of customers.
This will help the bank to understand its customers better and thus provide customized services.
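
The text does not name a particular clustering algorithm. As one possible illustration, the sketch below uses k-means, a common choice, written in plain Python. The customer points (age, income in thousands) and the two starting centroids are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, centroids, max_iterations=100):
    """Alternate assignment and update steps until the centroids stop moving."""
    centroids = list(centroids)
    labels = []
    for _ in range(max_iterations):
        # Assignment step: label each point with its nearest centroid.
        labels = [
            min(range(len(centroids)),
                key=lambda i: squared_distance(p, centroids[i]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for i, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == i]
            new_centroids.append(mean_point(members) if members else old)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical bank customers: (age, income in thousands)
customers = [(25, 30), (27, 32), (24, 28), (55, 80), (58, 85), (60, 78)]
labels, centroids = k_means(customers, centroids=[customers[0], customers[3]])

print(labels)
print(centroids)
```

Each final centroid acts as a simple class description of its group, for example "younger customers with lower income" versus "older customers with higher income".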

Classification
Classification is learning rules that can be applied to new data. It typically includes the
following steps: data preprocessing, model design, learning/feature selection, and
validation/evaluation. Classification predicts categorical class labels, whereas numeric
prediction models continuous-valued functions. For example, we can build a classification
model to categorize bank loan applications as either safe or risky. Classification is the
derivation of a model which determines the class of an object based on its attributes. A set of
objects is given as the training set, in which every object is represented by a vector of
attributes along with its class. By analyzing the relationship between the attributes and the
class of the objects in the training set, a classification model can be constructed. Such a
classification model can be used to classify future objects and to develop a better understanding
of the classes of the objects in the database. For example, from a set of loan borrowers
(Name, Age, and Income) who serve as the training set, a classification model can be built
which classifies a bank loan application as either safe or risky (e.g. if age = Youth then
loan decision = risky).
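
The loan example can be sketched with a very simple rule learner. The snippet below uses the One Rule (1R) idea: for each attribute, predict the majority class of each attribute value, then keep the attribute whose rules make the fewest errors on the training set. This technique is chosen here only for illustration, and the training records and attribute values are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical training set: attributes plus the known class ("decision").
training = [
    {"age": "youth", "income": "low", "decision": "risky"},
    {"age": "youth", "income": "high", "decision": "risky"},
    {"age": "middle_aged", "income": "high", "decision": "safe"},
    {"age": "middle_aged", "income": "low", "decision": "safe"},
    {"age": "senior", "income": "high", "decision": "safe"},
    {"age": "senior", "income": "low", "decision": "safe"},
]


def learn_one_rule(records, attributes, target):
    """Return (attribute, rules, errors) for the single best attribute."""
    best = None
    for attribute in attributes:
        # Count how often each class occurs for each attribute value.
        counts = defaultdict(Counter)
        for r in records:
            counts[r[attribute]][r[target]] += 1
        # Rule per value: predict the majority class.
        rules = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(1 for r in records if rules[r[attribute]] != r[target])
        if best is None or errors < best[2]:
            best = (attribute, rules, errors)
    return best


def classify(model, record, default="risky"):
    """Apply the learned rules to a new, unlabelled record."""
    attribute, rules, _ = model
    return rules.get(record[attribute], default)


model = learn_one_rule(training, ["age", "income"], "decision")
new_application = {"age": "youth", "income": "high"}

print(model)
print(classify(model, new_application))
```

On this toy training set the learner selects age, which reproduces the rule "if age = youth then loan decision = risky".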

Regression
Regression is finding a function that models the data with minimal error. It is a statistical
methodology that is most often used for numeric prediction. Regression analysis is widely used
for prediction and forecasting, where its use has substantial overlap with the field of machine
learning. Regression analysis is also used to understand which among the independent
variables are related to the dependent variable, and to explore the forms of these
relationships. In restricted circumstances, regression analysis can be used to infer causal
relationships between the independent and dependent variables. However, this can lead to
illusions or false relationships, so caution is advisable [6]; for example, correlation does not
imply causation.
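
As a minimal sketch of "finding a function with minimal error", the snippet below fits a straight line by ordinary least squares using the closed-form formulas for slope and intercept. The data points are hypothetical and were constructed to lie exactly on a line so that the result is easy to check by hand.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: x could be advertising spend, y the resulting sales.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

slope, intercept = fit_line(xs, ys)
prediction = slope * 6 + intercept  # numeric prediction for an unseen x

print(slope, intercept, prediction)
```

A good fit only shows that x and y move together in the data; as noted above, it does not by itself show that x causes y.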

Association
Association is looking for relationships between variables or objects. It aims to extract
interesting associations, correlations or causal structures among the objects, i.e. cases where
the appearance of one set of objects implies the appearance of another set of objects [7].
Association rules can be useful for marketing, commodity management, advertising, etc.
Association rule learning is a popular and well-researched method for discovering interesting
relations between variables in large databases. It is intended to identify strong rules
discovered in databases using different measures of interestingness [6]. Based on the concept
of strong rules, association rules were introduced for discovering regularities between
products in large-scale transaction data recorded by point-of-sale (POS) systems in
supermarkets [8]. For example, the rule {onions, potatoes} → {burger} found in the sales data
of a supermarket would indicate that if a customer buys onions and potatoes together, he or
she is likely to also buy hamburger meat. Such information can be used as the basis for
decisions about marketing activities such as promotional pricing or product placements. In
addition to the above example from market basket analysis, association rules are employed
today in many application areas including Web usage mining, intrusion detection, continuous
production, and bioinformatics.
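
Two widely used measures of interestingness for a rule such as {onions, potatoes} → {burger} are support (how often the items occur together) and confidence (how often the consequent appears when the antecedent does). The sketch below computes both on a hypothetical list of transactions; the function names are chosen for this example only.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support of antecedent and consequent together, divided by support of antecedent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


# Hypothetical supermarket baskets.
transactions = [
    {"onions", "potatoes", "burger"},
    {"onions", "potatoes", "burger", "milk"},
    {"onions", "potatoes"},
    {"milk", "bread"},
    {"potatoes", "burger"},
]

rule_support = support({"onions", "potatoes", "burger"}, transactions)
rule_confidence = confidence({"onions", "potatoes"}, {"burger"}, transactions)

print(rule_support, rule_confidence)
```

In this toy data, two of the five baskets contain all three items, and two of the three baskets with onions and potatoes also contain a burger.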

Data Mining Applications:

Data Mining Applications in Sales/Marketing:
Data mining enables businesses to understand the hidden patterns inside historical purchasing
transaction data, thus helping them plan and launch new marketing campaigns in a prompt
and cost-effective way. The following illustrates several data mining applications in sales and
marketing.

• Data mining is used for market basket analysis to provide information on which product
combinations were purchased together, when they were bought, and in what
sequence. This information helps businesses promote their most profitable products
and maximize profit. In addition, it encourages customers to purchase related
products that they may have missed or overlooked.
• Retail companies use data mining to identify customers' buying behavior patterns.

Data Mining Applications in Banking / Finance

• Several data mining techniques, e.g. distributed data mining, have been researched,
modeled and developed to help detect credit card fraud.
• Data mining is used to identify customer loyalty by analyzing data on customers'
purchasing activities, such as the frequency of purchases in a period of time, the
total monetary value of all purchases, and when the last purchase was made. After analyzing
those dimensions, a relative measure is generated for each customer. The higher
the score, the more loyal the customer is relative to others.
• Data mining is applied to help banks retain credit card customers. By analyzing
past data, data mining can help banks predict which customers are likely to change their
credit card affiliation, so that they can plan and launch special offers to retain
those customers.
• Credit card spending by customer groups can be identified by using data mining.
• Hidden correlations between different financial indicators can be discovered by
using data mining.
• From historical market data, data mining makes it possible to identify stock trading rules.
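
The loyalty measure described in the list above combines how recently a customer purchased, how often, and how much. One simple way to turn those three dimensions into a relative score is to rank customers on each dimension and add the ranks. This is only a sketch under that assumption: the customer data, field names and scoring scheme are hypothetical, and ties between customers are not handled.

```python
# Hypothetical purchase summaries per customer.
customers = {
    "A": {"days_since_last": 5, "purchases": 12, "total_spent": 600.0},
    "B": {"days_since_last": 40, "purchases": 3, "total_spent": 150.0},
    "C": {"days_since_last": 15, "purchases": 7, "total_spent": 900.0},
}


def rank_scores(values, higher_is_better=True):
    """Give each customer a score from 1 (worst) to n (best) on one dimension."""
    ordered = sorted(values, key=values.get, reverse=not higher_is_better)
    return {name: position for position, name in enumerate(ordered, start=1)}


# Recency: fewer days since the last purchase is better.
recency = rank_scores(
    {n: c["days_since_last"] for n, c in customers.items()},
    higher_is_better=False,
)
frequency = rank_scores({n: c["purchases"] for n, c in customers.items()})
monetary = rank_scores({n: c["total_spent"] for n, c in customers.items()})

# Relative loyalty score: the sum of the three ranks.
loyalty = {n: recency[n] + frequency[n] + monetary[n] for n in customers}

print(loyalty)
```

The score is relative by construction: it only says how a customer compares with the other customers in the same data set.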

Data Mining Applications in Health Care and Insurance


The growth of the insurance industry depends entirely on the ability to convert data into
knowledge, information or intelligence about customers, competitors, and markets. Data
mining has only recently been applied in the insurance industry, but it has brought tremendous
competitive advantages to the companies that have implemented it successfully. The data
mining applications in the insurance industry are listed below:

• Data mining is applied in claims analysis, such as identifying which medical procedures
are claimed together.
• Data mining makes it possible to forecast which customers will potentially purchase new
policies.
• Data mining allows insurance companies to detect risky customers' behavior patterns.
• Data mining helps detect fraudulent behavior.

Data Mining Applications in Transportation

• Data mining helps determine the distribution schedules among warehouses and
outlets and analyze loading patterns.

Data Mining Applications in Medicine

• Data mining makes it possible to characterize patient activities in order to foresee
upcoming office visits.
• Data mining helps identify the patterns of successful medical therapies for different
illnesses.

Advantages and Disadvantages of Data Mining


Advantages of Data Mining

Marketing / Retail

Data mining helps marketing companies build models based on historical data to predict who
will respond to new marketing campaigns such as direct mail, online marketing
campaigns, etc. Using the results, marketers can take an appropriate approach to selling
profitable products to targeted customers.

Data mining brings many benefits to retail companies in the same way as it does to marketing.
Through market basket analysis, a store can arrange its products so that customers can
conveniently buy frequently purchased products together. In addition, it also helps
retail companies offer discounts on particular products that will attract more
customers.

Finance / Banking

Data mining gives financial institutions information about loans and credit
reporting. By building a model from historical customer data, banks and financial
institutions can distinguish good loans from bad ones. In addition, data mining helps banks
detect fraudulent credit card transactions to protect credit card owners.

Manufacturing

By applying data mining to operational engineering data, manufacturers can detect faulty
equipment and determine optimal control parameters. For example, semiconductor
manufacturers face the challenge that even when the conditions of the manufacturing
environments at different wafer production plants are similar, the quality of the wafers is not
the same, and some wafers even have defects for unknown reasons. Data mining has been applied
to determine the ranges of control parameters that lead to the production of the golden wafer.
Those optimal control parameters are then used to manufacture wafers of the desired quality.

Governments

Data mining helps government agencies by digging through and analyzing records of financial
transactions to build patterns that can detect money laundering or criminal activities.

Disadvantages of data mining

Privacy Issues

Concerns about personal privacy have been increasing enormously, especially as the
internet is booming with social networks, e-commerce, forums, blogs, and so on. Because of
privacy issues, people are afraid that their personal information will be collected and used in
an unethical way that could cause them a lot of trouble. Businesses collect information
about their customers in many ways in order to understand their purchasing behavior and trends.
However, businesses do not last forever; some day they may be acquired by others or go out of
business. At that point, the personal information they own may be sold to others or leaked.

Security issues

Security is a big issue. Businesses own information about their employees and customers,
including social security numbers, birthdays, payroll, etc. However, how well this information
is protected is still in question. There have been many cases in which hackers accessed and
stole large amounts of customer data from big corporations such as Ford Motor Credit Company
and Sony. With so much personal and financial information available, credit card theft and
identity theft have become a big problem.

Misuse of information/inaccurate information

Information collected through data mining for ethical purposes can be
misused. This information may be exploited by unethical people or businesses to take advantage
of vulnerable people or to discriminate against a group of people. In addition, data mining
techniques are not perfectly accurate. Therefore, if inaccurate information is used for
decision-making, it can have serious consequences.

Conclusion:

Data mining is an important part of the knowledge discovery process, in which we
analyze an enormous set of data to extract hidden and useful knowledge. Data
mining is applied effectively not only in the business environment but also in
other fields such as weather forecasting, medicine, transportation, healthcare,
insurance, and government. Data mining has many advantages when used in a
specific industry. Besides those advantages, data mining also has its own
disadvantages, e.g. privacy, security, and misuse of information.

Bibliography:
[1]
[2]
[3]
[4]
[5] http://www.thearling.com/text/dmwhite/dmwhite.htm (accessed 7/11/2016)
[6] R. Kaur, S. Kaur, A. Kaur, R. Kaur, A. Kaur, “An Overview of Database Management System,
Data Warehousing and Data Mining”, IJARCCE, Vol. 2, Issue 7, July 2013.
[7] Y. Fu, Data Mining: Tasks, Techniques and Applications.
[8] Y. Ramamohan, K. Vasantharao, C. Kalyana Chakravarti, and A.S.K. Ratnam, “A Study of
Data Mining Tools in Knowledge
