
6/25/2015

Cross Industry Standard Process for Data Mining - Wikipedia, the free encyclopedia

Cross Industry Standard Process for Data Mining
From Wikipedia, the free encyclopedia

Cross Industry Standard Process for Data Mining, commonly known by its acronym CRISP-DM,[1] is a
data mining process model that describes the approaches data mining experts commonly use to tackle
problems. Polls conducted in 2002, 2004, and 2007 show that it is the leading methodology used by data
miners.[2][3][4] The only other data mining standard named in these polls was SEMMA. However, 3-4 times
as many people reported using CRISP-DM. A review and critique of data mining process models in 2009
called CRISP-DM the "de facto standard for developing data mining and knowledge discovery
projects."[5] Other reviews of CRISP-DM and data mining process models include Kurgan and Musilek's
2006 review,[6] and Azevedo and Santos' 2008 comparison of CRISP-DM and SEMMA.[7] An updated
version of CRISP-DM, the Standard Methodology for Analytical Models (SMAM), has recently been
introduced, more fully describing the methodology as an end-to-end process in the business.[8]

Major phases
CRISP-DM breaks the process of data mining into six major phases.[9]
The sequence of the phases is not strict, and moving back and forth between different phases is always
required. The arrows in the process diagram indicate the most important and frequent dependencies
between phases. The outer circle in the diagram symbolizes the cyclic nature of data mining itself. A data
mining process continues after a solution has been deployed. The lessons learned during the process can
trigger new, often more focused business questions, and subsequent data mining processes will benefit from
the experiences of previous ones.
Business Understanding
This initial phase focuses on understanding the project objectives and requirements from a business
perspective, and then converting this knowledge into a data mining problem definition and a
preliminary plan designed to achieve the objectives. A decision model, especially one built using the
Decision Model and Notation standard, can be used.
Data Understanding
The data understanding phase starts with an initial data collection and proceeds with activities in
order to get familiar with the data, to identify data quality problems, to discover first insights into
the data, or to detect interesting subsets to form hypotheses for hidden information.
Data Preparation
The data preparation phase covers all activities to construct the final dataset (data that will be fed
into the modeling tool(s)) from the initial raw data. Data preparation tasks are likely to be performed
multiple times, and not in any prescribed order. Tasks include table, record, and attribute selection as
well as transformation and cleaning of data for modeling tools.
[Figure: Process diagram showing the relationship between the different phases of CRISP-DM]
Modeling
In this phase, various modeling techniques are selected and applied, and their parameters are
calibrated to optimal values. Typically, there are several techniques for the same data mining problem
type. Some techniques have specific requirements on the form of data. Therefore, stepping back to
the data preparation phase is often needed.
Evaluation
At this stage in the project you have built a model (or models) that appears to have high quality, from
a data analysis perspective. Before proceeding to final deployment of the model, it is important to
more thoroughly evaluate the model, and review the steps executed to construct the model, to be
certain it properly achieves the business objectives. A key objective is to determine if there is some
important business issue that has not been sufficiently considered. At the end of this phase, a decision
on the use of the data mining results should be reached.
Deployment
Creation of the model is generally not the end of the project. Even if the purpose of the model is to
increase knowledge of the data, the knowledge gained will need to be organized and presented in a
way that the customer can use it. Depending on the requirements, the deployment phase can be as
simple as generating a report or as complex as implementing a repeatable data scoring (e.g. segment
allocation) or data mining process. In many cases it will be the customer, not the data analyst, who
will carry out the deployment steps. Even if the analyst deploys the model, it is important for the
customer to understand up front the actions which will need to be carried out in order to actually
make use of the created models.
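
The non-strict ordering of the six phases can be illustrated with a small directed graph. The following is a minimal Python sketch, not part of the CRISP-DM specification: the phase names come from the descriptions above, while the transition map is an assumption reconstructed from the commonly reproduced process diagram, and the names `TRANSITIONS`, `can_move`, and `shortest_path` are hypothetical helpers introduced only for this illustration. The outer circle of the diagram (starting a new cycle after deployment) is deliberately not modeled.

```python
from collections import deque

# Assumed arrows of the process diagram (illustrative, not normative).
# Keys are phases; values are the phases that can directly follow them.
TRANSITIONS = {
    "Business Understanding": ["Data Understanding"],
    "Data Understanding": ["Business Understanding", "Data Preparation"],
    "Data Preparation": ["Modeling"],
    "Modeling": ["Data Preparation", "Evaluation"],
    "Evaluation": ["Business Understanding", "Deployment"],
    "Deployment": [],  # the outer "new cycle" circle is not modeled here
}


def can_move(src, dst):
    """Return True if the assumed diagram has a direct arrow src -> dst."""
    return dst in TRANSITIONS[src]


def shortest_path(src, dst):
    """Breadth-first search for the shortest phase sequence from src to dst.

    Returns a list of phase names, or None if dst cannot be reached.
    """
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt in TRANSITIONS[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


if __name__ == "__main__":
    # Stepping back from modeling to data preparation is a direct move.
    print(can_move("Modeling", "Data Preparation"))
    # An evaluation that exposes an unconsidered business issue leads
    # back through the earlier phases before modeling resumes.
    print(shortest_path("Evaluation", "Modeling"))
```

Under these assumed arrows, returning from Evaluation to Modeling passes through Business Understanding, Data Understanding, and Data Preparation again, which mirrors the point that moving back and forth between phases is part of the process.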

History
CRISP-DM was conceived in 1996. In 1997 it got underway as a European Union project under the
ESPRIT funding initiative. The project was led by five companies: SPSS, Teradata, Daimler AG, NCR
Corporation and OHRA, an insurance company.
This core consortium brought different experiences to the project. ISL was later acquired and merged
into SPSS Inc. The computer giant NCR Corporation produced the Teradata data warehouse and its own
data mining software. Daimler-Benz had a significant data mining team. OHRA was just starting to
explore the potential use of data mining.
The first version of the methodology was presented at the 4th CRISP-DM SIG Workshop in Brussels in
March 1999,[10] and published as a step-by-step data mining guide later that year.[11]
Between 2006 and 2008 a CRISP-DM 2.0 SIG was formed and there were discussions about updating the
CRISP-DM process model.[5][12] The current status of these efforts is not known. However, the original
crisp-dm.org website cited in the reviews,[6][7] and the CRISP-DM 2.0 SIG website[5][12] are both no longer
active.
While many non-IBM data mining practitioners use CRISP-DM,[2][3][4][5] IBM is the primary corporation
that currently embraces the CRISP-DM process model. It makes some of the old CRISP-DM documents
available for download[11] and it has incorporated it into its SPSS Modeler product.

References
1. Shearer C., The CRISP-DM model: the new blueprint for data mining, J Data Warehousing (2000) 5:13-22.
2. Gregory Piatetsky-Shapiro (2002) KDnuggets Methodology Poll
(http://www.kdnuggets.com/polls/2002/methodology.htm)
3. Gregory Piatetsky-Shapiro (2004) KDnuggets Methodology Poll
(http://www.kdnuggets.com/polls/2004/data_mining_methodology.htm)
4. Gregory Piatetsky-Shapiro (2007) KDnuggets Methodology Poll
(http://www.kdnuggets.com/polls/2007/data_mining_methodology.htm)
5. Óscar Marbán, Gonzalo Mariscal and Javier Segovia (2009) A Data Mining & Knowledge Discovery Process
Model (http://cdn.intechopen.com/pdfs/5937/InTech-A_data_mining_amp_knowledge_discovery_process_model.pdf).
In Data Mining and Knowledge Discovery in Real Life Applications, Book edited by: Julio Ponce and Adem
Karahoca, ISBN 9783902613530, pp. 438-453, February 2009, I-Tech, Vienna, Austria.
6. Lukasz Kurgan and Petr Musilek (2006) A survey of Knowledge Discovery and Data Mining process models
(http://journals.cambridge.org/action/displayAbstract?fromPage=online&aid=451120). The Knowledge
Engineering Review. Volume 21 Issue 1, March 2006, pp 1-24, Cambridge University Press, New York, NY,
USA. doi:10.1017/S0269888906000737.
7. Azevedo, A. and Santos, M. F. (2008) KDD, SEMMA and CRISP-DM: a parallel overview
(http://www.iadis.net/dl/final_uploads/200812P033.pdf). In Proceedings of the IADIS European Conference on
Data Mining 2008, pp 182-185.
8. "Standard Methodology for Analytical Models"
(https://en.wikipedia.org/w/index.php?title=Standard_Methodology_for_Analytical_Models) (in English). 2015-06-04.
9. Harper, Gavin; Stephen D. Pickett (August 2006). "Methods for mining HTS data"
(http://www.sciencedirect.com/science?_ob=ArticleURL&_udi=B6T64-4KDJSRH-4&_user=793840&_coverDate=08%2F31%2F2006&_rdoc=4&_fmt=full&_orig=browse&_srch=doc-info(%23toc%235020%232006%23999889984%23627946%23FLA%23display%23Volume)&_cdi=5020&_sort=d&_docanchor=&view=c&_ct=17&_acct=C000043460&_version=1&_urlVersion=0&_userid=793840&md5=f7f5b2376172e12b63177a32b03de111).
Drug Discovery Today 11 (15-16): 694-699.
doi:10.1016/j.drudis.2006.06.006 (https://dx.doi.org/10.1016%2Fj.drudis.2006.06.006). PMID 16846796
(https://www.ncbi.nlm.nih.gov/pubmed/16846796).
10. Pete Chapman (1999) The CRISP-DM User Guide (http://lyle.smu.edu/~mhd/8331f03/crisp.pdf).
11. Pete Chapman, Julian Clinton, Randy Kerber, Thomas Khabaza, Thomas Reinartz, Colin Shearer, and Rüdiger
Wirth (2000) CRISP-DM 1.0 Step-by-step data mining guides
(ftp://ftp.software.ibm.com/software/analytics/spss/support/Modeler/Documentation/14/UserManual/CRISP-DM.pdf).
12. Colin Shearer (2006) First CRISP-DM 2.0 Workshop Held (http://www.kdnuggets.com/news/2006/n19/4i.html)

Retrieved from "https://en.wikipedia.org/w/index.php?title=Cross_Industry_Standard_Process_for_Data_Mining&oldid=666299567"
Categories: Applied data mining
This page was last modified on 10 June 2015, at 06:25.
Text is available under the Creative Commons Attribution-ShareAlike License; additional terms may
apply. By using this site, you agree to the Terms of Use and Privacy Policy. Wikipedia is a
registered trademark of the Wikimedia Foundation, Inc., a non-profit organization.
