
Table of Contents

Overview
Data Mining History & Background
    Statistics
    Artificial Intelligence
    Machine Learning
How Data Mining Works
Steps in Data Mining
Data Mining Elements
Data, Information, and Knowledge
Advantages and Disadvantages of Data Mining
    Advantages of Data Mining
    Disadvantages of Data Mining
Sample Company which uses Data Mining
Conclusion

Overview
Data mining is the process of discovering knowledge in the databases stored in data warehouses. It is a powerful new technology that helps companies focus on the most important information in their data warehouses. It allows users to analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified. Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases. Data mining tools predict future trends and behaviors, allowing businesses to make practical, knowledge-driven decisions. These tools answer business questions that would otherwise be too time-consuming to resolve. The purpose is to identify valid, useful, and understandable patterns in data.

Data mining involves a seven (7) step process:

Data Integration

Data Selection

Data Cleaning

Data Transformation

Data Mining

Pattern Evaluation and Knowledge Presentation

Decisions / Use of Discovered Knowledge

Companies have used powerful computers to sift through volumes of supermarket scanner data and analyze market research reports for years. However, continuous innovations in computer processing power, disk storage, and statistical software are dramatically increasing the accuracy of analysis while driving down the cost.

For example, one Midwest grocery chain used the data mining capacity of Oracle software to analyze local buying patterns. They discovered that when men bought diapers on Thursdays and Saturdays, they also tended to buy beer. Further analysis showed that these shoppers typically did their weekly grocery shopping on Saturdays. On Thursdays, however, they bought only a few items. The retailer concluded that they purchased the beer to have it available for the upcoming weekend. The grocery chain could use this newly discovered information in various ways to increase revenue. For example, it could move the beer display closer to the diaper display, and it could make sure beer and diapers were sold at full price on Thursdays.
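The diaper and beer finding is an example of association analysis. As a minimal sketch, the Python below computes the support, confidence, and lift of the rule "diapers implies beer". The basket data and the function names are invented for illustration; they are not the grocery chain's data and do not represent how the Oracle software works.

```python
# Hypothetical shopping baskets (assumption: invented for illustration).
baskets = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"milk", "bread"},
    {"diapers", "beer", "milk"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Share of baskets with the antecedent that also hold the consequent."""
    base = support(antecedent, baskets)
    if base == 0:
        return 0.0  # the antecedent never occurs, so the rule has no evidence
    return support(set(antecedent) | set(consequent), baskets) / base


def lift(antecedent, consequent, baskets):
    """Confidence relative to how common the consequent is overall."""
    base = support(consequent, baskets)
    if base == 0:
        return 0.0
    return confidence(antecedent, consequent, baskets) / base


print("support   :", support({"diapers", "beer"}, baskets))
print("confidence:", confidence({"diapers"}, {"beer"}, baskets))
print("lift      :", lift({"diapers"}, {"beer"}, baskets))
```

A lift above 1 suggests the two items appear together more often than they would if purchases were independent, which is the kind of signal a retailer might act on.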

Data Mining History & Background

The history of data mining goes back about 30 to 40 years. It began as statistical analysis, promoted by two companies: SAS (Statistical Analysis System) and SPSS (an IBM company). Statistical methods such as regression analysis, standard distribution, standard deviation, variance, cluster analysis, and confidence intervals are still important, but newer techniques add greatly to the power of the statistical routines. Methods such as fuzzy logic, heuristics, and neural networks arrived on the scene in the 1980s. These can be classified into two groups: artificial intelligence and machine learning. The first workshops on knowledge discovery in databases (KDD, another name for data mining) date from the late 1980s and early 1990s. Data mining can therefore be said to have three roots: statistics, artificial intelligence, and machine learning.

Statistics
Statistics has long been at the core of data analysis and has contributed significantly to the business intelligence sector. Nevertheless, statistics alone has not been able to produce all the outcomes expected under the complex business requirements of modern industries. The classical statistical model comprised concepts such as regression analysis, correlation analysis, standard distribution, variance, standard deviation, and cluster analysis. All of these techniques can be described as the study of data and their relationships in a static manner.

Artificial Intelligence
Artificial intelligence has long been an appealing topic among research groups. Its inspiration was to use heuristics rather than statistics, in an effort to simulate the human thought process in statistical problems. Although this was an excellent theoretical concept, its need for high computational power made it impractical in the early 1980s, when it came into the limelight.

Machine Learning
Machine learning emerged as a union of the AI model and the classical statistical model. While AI was not a major commercial success, machine learning has incorporated many of its concepts as computational power has become cheaper. Machine learning can be seen as an evolution of AI, since it combines the heuristic model of AI with advanced statistical analysis. It lets computers learn from the data they process and reach a goal by applying what they have learned to an advanced statistical model.

Data mining as it is defined today is believed to be about 10 to 15 years old.

[Figure: the roots of data mining: statistics, artificial intelligence, and machine learning]

How Data Mining Works


If you are interested in using data mining, it is important to understand how it works. The method that data mining uses to make predictions is called modeling, which is the process of creating a model. A model can be defined as a set of examples or a mathematical relationship, built from existing situations where the answer is already known. The purpose of data mining is to take the model and apply it to situations where the answer is unknown. While the mathematics behind data mining has existed for centuries, it is the recent increase in computer processing power and storage that has made data mining technology feasible.

To see how a model is created, imagine that you are the head of marketing for a telecommunications company. You have decided to direct your advertising and sales towards the people who are most likely to use long distance communication services. While you may know your own customers, it is difficult to identify all the attributes shared by your best customers because there are too many variables to take into consideration. However, if you have a database that holds the income, credit history, age, sex, and occupation of your customers, you can use data mining tools to find the attributes common to the customers who frequently make long distance phone calls. Data mining may reveal that most of your high value customers are middle-aged women of about 45 whose average income is in excess of $50,000 a year. Now that you know more about your best customers, you can tailor your advertising efforts to suit their needs, which improves your chances of earning a profit.

Data mining programs rely on computer algorithms; nevertheless, the factors that have led to the increasing popularity of data mining technologies are the increases in both processing power and storage. Graphical interfaces have also contributed to this popularity. They have made the programs easier to use, which has allowed them to be adopted by a larger segment of the population.

Artificial neural networks are a cutting edge technology that is increasingly used in data mining applications. Unlike conventional programs built on fixed rules, neural networks can represent nonlinear relationships and are capable of learning from examples. They are loosely modeled on the human brain and have powerful applications in data mining that have not been fully explored. Decision trees also play an important role in data mining programs. As the name implies, a decision tree is a structure made up of a number of decisions, each of which can be called a branch. Together, the decisions define the rules for a given set of data. Another important element is rule induction, which extracts rules from data in the form of "if-then" statements. Genetic algorithms use techniques based on mutation and natural selection. The last important element is the nearest neighbor method, which categorizes a record according to the records in the database that are most similar to it.
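The idea of building a model from cases where the answer is known and applying it where the answer is unknown can be sketched with the nearest neighbor method. The example below is a minimal illustration only: the customer records, the two chosen attributes (age and income), and the function names are all hypothetical, and a real tool would use far more data and attributes.

```python
import math

# Hypothetical known cases: (age, annual income in $1000s) and whether the
# customer frequently makes long distance calls (assumption: invented data).
known_customers = [
    ((45, 62), True),
    ((47, 58), True),
    ((44, 71), True),
    ((23, 28), False),
    ((31, 35), False),
    ((67, 30), False),
]


def scale(record, lows, highs):
    """Rescale each attribute to [0, 1] so age and income weigh equally."""
    return tuple(
        (value - lo) / ((hi - lo) or 1)  # "or 1" avoids dividing by zero
        for value, lo, hi in zip(record, lows, highs)
    )


def nearest_neighbor_predict(record, training):
    """Return the label of the training record closest to the new record."""
    features = [attrs for attrs, _ in training]
    lows = [min(column) for column in zip(*features)]
    highs = [max(column) for column in zip(*features)]
    target = scale(record, lows, highs)
    _, label = min(
        training,
        key=lambda item: math.dist(scale(item[0], lows, highs), target),
    )
    return label


# Apply the model to customers whose calling behavior is not yet known.
print(nearest_neighbor_predict((46, 60), known_customers))
print(nearest_neighbor_predict((25, 30), known_customers))
```

Scaling matters here because income and age are measured on different ranges; without it, the attribute with the larger numbers would dominate the distance.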

There are a number of real-world applications of data mining programs. Generally, having information which is highly detailed will allow you to make predictions that are equally detailed. Using this detailed information to make predictions about the behavior of your customers can allow you to make large profits. Companies can use data mining tools to get answers to complex questions. For example, a credit card company that wants to increase its revenues could use data mining to find out if reducing the minimum payments would allow them to earn more interest. If the company has detailed information related to their customers, they should be able to make accurate predictions about how customers will react to policies.

Steps in Data Mining


1. Data Integration: First, the data are collected and combined from all the different sources.
2. Data Selection: Not all the data collected in the first step may be needed, so in this step we select only the data that we think are useful for data mining.
3. Data Cleaning: The collected data are not clean and may contain errors, missing values, and noisy or inconsistent entries, so we apply different techniques to remove such anomalies.
4. Data Transformation: Even after cleaning, the data are not ready for mining; they must be transformed into forms appropriate for mining. The techniques used to accomplish this include smoothing, aggregation, and normalization.
5. Data Mining: Now we are ready to apply data mining techniques to the data to discover interesting patterns. Clustering and association analysis are among the many techniques used for data mining.
6. Pattern Evaluation and Knowledge Presentation: This step involves visualizing and transforming the patterns we generated and removing redundant ones.
7. Decisions / Use of Discovered Knowledge: This step helps the user apply the acquired knowledge to make better decisions.

Please see diagram overleaf.
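Steps 1 through 4 above, the data preparation stages, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the two source lists, the field names, and the helper functions are hypothetical, cleaning is reduced to dropping rows with missing values, and transformation is reduced to min-max normalization of one field.

```python
# Hypothetical records from two source systems (assumption: invented data).
store_sales = [
    {"customer": "C1", "spend": 120.0, "region": "north"},
    {"customer": "C2", "spend": None, "region": "south"},  # missing value
]
web_sales = [
    {"customer": "C3", "spend": 80.0, "region": "north"},
    {"customer": "C4", "spend": 200.0, "region": "east"},
]


def integrate(*sources):
    """Step 1: combine the records from all sources into one list."""
    return [dict(row) for source in sources for row in source]


def select(rows, fields):
    """Step 2: keep only the fields considered useful for mining."""
    return [{field: row.get(field) for field in fields} for row in rows]


def clean(rows):
    """Step 3: drop rows that contain missing values."""
    return [row for row in rows if all(v is not None for v in row.values())]


def normalize(rows, field):
    """Step 4: min-max normalization of one numeric field to [0, 1]."""
    values = [row[field] for row in rows]
    low, high = min(values), max(values)
    span = (high - low) or 1.0  # avoid dividing by zero if all values match
    return [{**row, field: (row[field] - low) / span} for row in rows]


combined = integrate(store_sales, web_sales)
selected = select(combined, ["customer", "spend"])
prepared = normalize(clean(selected), "spend")

for row in prepared:
    print(row)
```

The output of this preparation would then be passed to a mining technique in step 5, such as clustering or association analysis.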

Data Mining Elements


Data mining consists of five major elements:

Extract, transform, and load transaction data onto the data warehouse system.

Store and manage the data in a multidimensional database system.

Provide data access to business analysts and information technology professionals.

Analyze the data by application software.

Present the data in a useful format, such as a graph or table.


Data, Information, and Knowledge


Data
Data can be defined as any facts, numbers, or text that can be processed by a computer. Organizations are gathering immense and growing amounts of data in different formats and different databases. This includes:

Operational or transactional data, such as sales, cost, inventory, payroll, and accounting.

Nonoperational data, such as industry sales, forecast data, and macro-economic data.

Metadata: data about the data itself, such as logical database design or data dictionary definitions.

Information
The patterns, associations, or relationships among all this data can provide information. For example, analysis of retail point of sale transaction data can yield information on which products are selling and when.

Knowledge
Information can be converted into knowledge about historical patterns and future trends. For example, summary information on retail supermarket sales can be analyzed in light of promotional efforts to provide knowledge of consumer buying behavior. Thus, a manufacturer or retailer could determine which items are most susceptible to promotional efforts.
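The move from data to information can be illustrated with a small sketch that summarizes raw point of sale records into "which products are selling and when". The transaction records below are hypothetical and the variable names are chosen only for this example.

```python
from collections import Counter

# Hypothetical point of sale data: (product, weekday, units sold).
transactions = [
    ("beer", "Thu", 2),
    ("diapers", "Thu", 1),
    ("beer", "Sat", 6),
    ("milk", "Sat", 3),
    ("beer", "Sat", 4),
]

# Information: total units sold per product and weekday.
units_by_product_day = Counter()
for product, weekday, units in transactions:
    units_by_product_day[(product, weekday)] += units

# The strongest product and weekday combination in this sample.
best_seller = units_by_product_day.most_common(1)[0]

for (product, weekday), units in sorted(units_by_product_day.items()):
    print(product, weekday, units)
print("best seller:", best_seller)
```

Turning such summaries into knowledge would require a further step, for example comparing them against the dates of promotions.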


Advantages and Disadvantages of Data Mining


Advantages of Data Mining
Relevant information is gathered for use within the organization or company. Marketing companies use data mining to build models on historical data that predict who will respond to new marketing campaigns, such as direct mail and online marketing. With this prediction, marketers can approach targeted customers appropriately to sell profitable products while providing satisfaction. Likewise, data mining brings many benefits to retail companies. It helps them offer discounts on particular products that will attract customers.

Useful and accurate trends in the purchasing behavior of an organization's customers are obtained. The data gathered can be sanitized for different departments within the organization. Data mining provides financial institutions with information about loans and credit reporting. By building a model from previous customer data with common characteristics, an institution can estimate which loans are good or bad and the level of risk of each. It also helps banks detect fraudulent credit card transactions so that credit card owners avoid losses.

Information is available for future reference and for projecting market trends. Manufacturers can detect faulty equipment and determine optimal control parameters by applying data mining to operational engineering data.


Management decisions can be made to provide greater benefit to the staff and the organization. Data mining helps government agencies by digging through and analyzing records of financial transactions to build patterns that can detect money laundering or other criminal activity.

Disadvantages of Data Mining


Privacy Issues: The boom in social networks, e-commerce, forums, and blogs has enormously increased concerns about personal privacy. People are afraid that their personal information will be collected and used in unethical ways that may cause harm. Businesses collect information about their customers in many ways in order to understand trends in their purchasing behavior. However, businesses sometimes close down or are acquired by others. At that point, the personal information they hold may be sold or leaked to a third party.

Security Issues: Businesses hold information about their employees and customers, including social security numbers, birthdays, payroll, and other personal details, and whether this information is properly stored is a great concern. There have been many cases in which hackers accessed and stole large amounts of customer data from big corporations such as Ford Motor Credit Company, Sony, and others. With so much personal and financial information available, credit card theft and identity theft are serious problems.


Misuse of information/inaccurate information: Information collected through data mining for marketing or other ethical purposes can be misused. It may be exploited by unethical people or businesses to take advantage of vulnerable people or to discriminate against a group of people. Data mining techniques are also not necessarily accurate, so decisions based on inaccurate results can have serious consequences.


Sample Company which uses Data Mining

Digicel is a sample company that uses data mining. After 11 years of operation, Digicel Group Limited has over 12.8 million customers across its thirty-two markets in the Caribbean, Central America and the Pacific. The company is renowned for delivering best value, best service and best network.

Digicel is the lead sponsor of Caribbean, Central American and Pacific sports teams, including the Special Olympics teams throughout these regions. Digicel sponsors the West Indies cricket team and is also the title sponsor of the Digicel Caribbean Cup. In the Pacific, Digicel is the proud sponsor of several national rugby teams and also sponsors the Vanuatu cricket team.

Digicel also runs a host of community-based initiatives across its markets and has set up Digicel Foundations in Jamaica, Haiti and Papua New Guinea which focus on educational, cultural and social development programs.

It is extremely important for data to be captured and stored for all the markets in which Digicel operates so that management decisions can be made on a daily basis. Three (3) software packages are used in data mining, alongside other applications: Sales Force, Customer Relation Manager (CRM), and E-care. These packages capture relevant data from customers and staff for the various departments. Several reports are generated from the acquired data and from job cards, and decisions are made based on these reports.

Conclusion
In the short term, the results of data mining will be in profitable, if mundane, business related areas. Micro marketing campaigns will explore new niches. Advertising will target potential customers with new precision. In the medium term, data mining may be as common and easy to use as email. We may use these tools to find the best airfare to New York, track down the phone number of a long-lost classmate, or find the best prices on lawn mowers. In the long term, data mining can support decisions about the sustainability, continuity, and profitability of an organization, and help project revenue growth and investments. Imagine intelligent agents turned loose on medical research data or on sub-atomic particle data. Computers may reveal new treatments for disease or new insights into the nature of the universe.





