Gradiance Online Accelerated Learning
Mrunal
Submission number: 130164
Submission certificate: JF748747
Submission time: 2016-05-06 20:57:33 PST (GMT - 8:00)
Number of questions: 5
Positive points per question: 3.0
Negative points per question: 1.0
Your score: 0
1. Suppose we perform the PCY algorithm to find frequent pairs, with market-basket data meeting the following specifications:
s, the support threshold, is 10,000.
There are one million items, which are represented by the integers 0, 1, ..., 999999.
There are 250,000 frequent items, that is, items that occur 10,000 times or more.
There are one million pairs that occur 10,000 times or more.
There are P pairs that occur exactly once and consist of 2 frequent items.
No other pairs occur at all.
Integers are always represented by 4 bytes.
When we hash pairs, they distribute among buckets randomly, but as evenly as possible; i.e., you may assume that each bucket gets exactly its fair share of the P pairs that occur once.
Suppose there are S bytes of main memory. In order to run the PCY algorithm successfully, the number of buckets must be sufficiently large that most buckets are not frequent. In addition, on the second pass, there must be enough room to count all the candidate pairs. As a function of S, what is the largest value of P for which we can successfully run the PCY algorithm on this data? Demonstrate that you have the correct formula by indicating which of the following is a value for S and a value for P that is approximately (i.e., to within 10%) the largest possible value of P for that S.
a) S = 500,000,000, P = 3,200,000,000
b) S = 500,000,000, P = 5,000,000,000
c) S = 500,000,000, P = 10,000,000,000
d) S = 300,000,000, P = 3,500,000,000
Answer submitted: a)
Your answer is incorrect.
Here are some hints:
1. The number of infrequent pairs per bucket on the first pass will be about P divided by the number of buckets.
2. A pair can only be a candidate pair for the second pass if it is in a
frequent bucket. For the values of P and S found in this question, that can only occur if the bucket contains one of the 1,000,000 frequent pairs.
3. You must use a hash table to count candidate pairs on the second pass of PCY. This hash table takes 12 bytes per candidate pair.
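The three hints can be combined into a rough memory model. The sketch below is one possible reading, not the official solution: it assumes the first pass spends essentially all of memory on 4-byte bucket counts (so there are about S/4 buckets), that only the buckets holding one of the 1,000,000 frequent pairs are frequent, and that the second pass needs 12 bytes per candidate pair while the bitmap and item tables are neglected. The function and variable names are illustrative only.

```python
# Rough PCY memory model built from the hints above; names are illustrative.
FREQUENT_PAIRS = 1_000_000    # pairs that occur 10,000 times or more
BYTES_PER_INT = 4             # size of one bucket count on pass 1
BYTES_PER_CANDIDATE = 12      # hash-table entry on pass 2 (hint 3)


def max_p(s_bytes: float) -> float:
    """Largest P for which the pass-2 candidate table fits in s_bytes.

    Simplifying assumptions (not stated in the question itself):
      * pass 1 uses all of memory for bucket counts (item counts ignored),
      * pass 2 ignores the bitmap and the frequent-item table,
      * only buckets holding a frequent pair are frequent (hint 2).
    """
    buckets = s_bytes / BYTES_PER_INT
    # Candidates = frequent pairs + once-only pairs sharing their buckets:
    #   12 * FREQUENT_PAIRS * (1 + P / buckets) <= S, solved for P.
    return (s_bytes / (BYTES_PER_CANDIDATE * FREQUENT_PAIRS) - 1) * buckets


def within_10_percent(value: float, target: float) -> bool:
    return abs(value - target) <= 0.10 * target


options = {
    "a": (500_000_000, 3_200_000_000),
    "b": (500_000_000, 5_000_000_000),
    "c": (500_000_000, 10_000_000_000),
    "d": (300_000_000, 3_500_000_000),
}

matches = [label for label, (s, p) in options.items()
           if within_10_percent(p, max_p(s))]
print(matches)
```

Under these assumptions the bound is roughly P = S^2 / 48,000,000, and the only listed option within 10% of it is b).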
2. Suppose we have transactions that satisfy the following assumptions:
s, the support threshold, is 10,000.
There are one million items, which are represented by the integers 0, 1, ..., 999999.
There are N frequent items, that is, items that occur 10,000 times or more.
There are one million pairs that occur 10,000 times or more.
There are 2M pairs that occur exactly once. M of these pairs consist of two frequent items, the other M each have at least one nonfrequent item.
No other pairs occur at all.
Integers are always represented by 4 bytes.
Suppose we run the a-priori algorithm to find frequent pairs and can choose on the second pass between the triangular-matrix method for counting candidate pairs (a triangular array count[i][j] that holds an integer count for each pair of items (i, j) where i < j) and a hash table of item-item-count triples. Neglect in the first case the space needed to translate between original item numbers and numbers for the frequent items, and in the second case neglect the space needed for the hash table. Assume that item numbers and counts are always 4-byte integers.
As a function of N and M, what is the minimum number of bytes of main memory needed to execute the a-priori algorithm on this data? Demonstrate that you have the correct formula by selecting, from the choices below, the triple consisting of values for N, M, and the (approximate, i.e., to within 10%) minimum number of bytes of main memory, S, needed for the a-priori algorithm to execute with this data.
a) N = 20,000, M = 80,000,000, S = 1,100,000,000
b) N = 50,000, M = 200,000,000, S = 2,500,000,000
c) N = 10,000, M = 50,000,000, S = 600,000,000
d) N = 100,000, M = 40,000,000, S = 800,000,000
Answer submitted: a)
Your answer is incorrect.
Here's a hint. When considering the hash table for counting pairs of frequent items that actually occur in the dataset, remember that you need 12 bytes per entry, 4 each to store the two item IDs and 4 to store the integer count. The number of 12-byte entries will be the number of pairs that occur in the data and have both items frequent.
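The comparison in the hint can be sketched as follows. This is an illustration under stated assumptions rather than the grader's own formula: the triangular matrix is taken to hold one 4-byte count per pair of frequent items, the triples table holds a 12-byte entry for each of the 1,000,000 frequent pairs plus the M once-only pairs whose items are both frequent, and the second pass is assumed to dominate the memory requirement. All names are hypothetical.

```python
# Second-pass memory for a-priori: triangular matrix vs. triples table.
FREQUENT_PAIRS = 1_000_000    # pairs that occur 10,000 times or more


def triangular_bytes(n_frequent_items: int) -> int:
    # One 4-byte count for each unordered pair of frequent items.
    return 4 * (n_frequent_items * (n_frequent_items - 1) // 2)


def triples_bytes(m_once_pairs: int) -> int:
    # 12 bytes per pair that occurs in the data with both items frequent.
    return 12 * (FREQUENT_PAIRS + m_once_pairs)


def apriori_min_bytes(n: int, m: int) -> int:
    # We may pick whichever counting method is cheaper.
    return min(triangular_bytes(n), triples_bytes(m))


def within_10_percent(value: float, target: float) -> bool:
    return abs(value - target) <= 0.10 * target


options = {
    "a": (20_000, 80_000_000, 1_100_000_000),
    "b": (50_000, 200_000_000, 2_500_000_000),
    "c": (10_000, 50_000_000, 600_000_000),
    "d": (100_000, 40_000_000, 800_000_000),
}

matches = [label for label, (n, m, s) in options.items()
           if within_10_percent(s, apriori_min_bytes(n, m))]
print(matches)
```

With this model the requirement is approximately min(2N^2, 12(1,000,000 + M)) bytes, and only option b) lands within 10% of it.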
3. Suppose we perform the 3-pass multistage algorithm to find frequent pairs, with market-basket data meeting the following specifications:
s, the support threshold, is 10,000.
There are one million items, which are represented by the integers 0, 1, ..., 999999.
All one million items are frequent; that is, they occur at least 10,000 times.
There are one million pairs that occur 10,000 times or more.
There are P pairs that occur exactly once.
Integers are always represented by 4 bytes.
When we hash pairs, they distribute among buckets randomly, but as evenly as possible; i.e., you may assume that each bucket gets exactly its fair share of the P pairs that occur once.
The hash functions on the first two passes are completely independent.
Suppose there are S bytes of main memory. As a function of S and P, what is the expected number of candidate pairs on the third pass of the multistage algorithm? Demonstrate the correctness of your formula by identifying for which of the following triples of values for S, P, and N the value N is approximately (i.e., to within 10%) the expected number of candidate pairs for the third pass.
a) S = 300,000,000, P = 100,000,000,000, N = 19,000,000
b) S = 200,000,000, P = 10,000,000,000, N = 3,400,000
c) S = 300,000,000, P = 100,000,000,000, N = 9,300,000
d) S = 500,000,000, P = 5,000,000,000, N = 10,500,000
Answer submitted: a)
You have answered the question correctly.
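One way to see why a) fits is the following sketch. It assumes, as a simplification not spelled out in the question, that each of the first two passes has about S/4 buckets, that a bucket is frequent only if it holds one of the 1,000,000 frequent pairs, and that a once-only pair therefore survives each pass with probability 1,000,000 / (S/4), independently across the two passes. The names are illustrative.

```python
# Expected third-pass candidates for 3-pass multistage; names are illustrative.
FREQUENT_PAIRS = 1_000_000    # pairs that occur 10,000 times or more
BYTES_PER_INT = 4             # size of one bucket count


def expected_candidates(s_bytes: float, p_once_pairs: float) -> float:
    buckets = s_bytes / BYTES_PER_INT
    # Fraction of buckets that are frequent on either of the first two passes.
    frequent_fraction = FREQUENT_PAIRS / buckets
    # Independent hash functions: a once-only pair must land in a frequent
    # bucket on both passes to remain a candidate.
    surviving_once_pairs = p_once_pairs * frequent_fraction ** 2
    return FREQUENT_PAIRS + surviving_once_pairs


def within_10_percent(value: float, target: float) -> bool:
    return abs(value - target) <= 0.10 * target


options = {
    "a": (300_000_000, 100_000_000_000, 19_000_000),
    "b": (200_000_000, 10_000_000_000, 3_400_000),
    "c": (300_000_000, 100_000_000_000, 9_300_000),
    "d": (500_000_000, 5_000_000_000, 10_500_000),
}

matches = [label for label, (s, p, n) in options.items()
           if within_10_percent(n, expected_candidates(s, p))]
print(matches)
```

Under these assumptions the expected count is about 1,000,000 + P * (4,000,000 / S)^2, which for option a) comes to roughly 18.8 million.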
4. During a run of Toivonen's Algorithm with set of items {A,B,C,D,E,F,G,H}, a sample is found to have the following maximal frequent itemsets: {A,B}, {A,C}, {A,D}, {B,C}, {E}, {F}. Compute the negative border. Then, identify in the list below the set that is NOT in the negative border.
a) {G}
b) {F,G}
c) {A,B,C}
d) {B,F}
Answer submitted: a)
Your answer is incorrect.
This set is in the negative border because it is not frequent, yet each of its immediate proper subsets, i.e., the empty set only, is frequent. Note that a subset of a maximal frequent itemset, such as the empty set, must itself be frequent.
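The definition in the explanation above can be checked by brute force over all subsets of the eight items. This is a minimal sketch, with helper names chosen for illustration: an itemset is treated as frequent exactly when it is contained in one of the listed maximal frequent itemsets, and it belongs to the negative border when it is not frequent but every subset obtained by deleting one item is.

```python
from itertools import combinations

ITEMS = "ABCDEFGH"
MAXIMAL = [set("AB"), set("AC"), set("AD"), set("BC"), set("E"), set("F")]


def is_frequent(itemset: frozenset) -> bool:
    # Frequent iff contained in some maximal frequent itemset
    # (the empty set is contained in all of them).
    return any(itemset <= maximal for maximal in MAXIMAL)


def negative_border() -> list:
    border = []
    for size in range(1, len(ITEMS) + 1):
        for combo in combinations(ITEMS, size):
            candidate = frozenset(combo)
            if is_frequent(candidate):
                continue
            # Every immediate proper subset must itself be frequent.
            if all(is_frequent(candidate - {item}) for item in candidate):
                border.append(candidate)
    return border


border = negative_border()
for label, itemset in [("a", "G"), ("b", "FG"), ("c", "ABC"), ("d", "BF")]:
    print(label, sorted(itemset), frozenset(itemset) in border)
```

Here {F,G} fails the test because its subset {G} is not frequent, whereas {G}, {A,B,C}, and {B,F} all pass.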
5. In this problem, assume all integers and pointers occupy 4 bytes. The assumption that we can represent pair counts with triples (i, j, c) for the pair i, j with count c does not account for the space needed to build an efficient data structure to find i-j pairs when we need them. Suppose we use a binary search tree, where each node is a quintuple (i, j, c, leftChild, rightChild). Suppose also that there are I items, and P pairs that actually appear in the data. Under what circumstances does it save space to use the above binary search tree rather than a triangular matrix?
a) I = 500,000, P = 20,000,000,000
b) I = 200,000, P = 5,000,000,000
c) I = 50,000, P = 3,000,000,000
d) I = 1000, P = 120,000
Answer submitted: d)
Your answer is incorrect.
Hint: If there are P pairs in the data, then the binary search tree requires five integers or pointers per pair. How much memory does that require?
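Following the hint, the two costs can be compared directly. The sketch below assumes 20 bytes per tree node (five 4-byte fields) and a triangular matrix with one 4-byte count for each of the I(I-1)/2 item pairs; the function names are illustrative.

```python
# Space comparison: binary search tree of quintuples vs. triangular matrix.
BYTES_PER_FIELD = 4           # integers and pointers both occupy 4 bytes


def bst_bytes(pairs_in_data: int) -> int:
    # Each node stores (i, j, c, leftChild, rightChild): five 4-byte fields.
    return 5 * BYTES_PER_FIELD * pairs_in_data


def triangular_bytes(n_items: int) -> int:
    # One 4-byte count for every unordered pair of items.
    return BYTES_PER_FIELD * (n_items * (n_items - 1) // 2)


options = {
    "a": (500_000, 20_000_000_000),
    "b": (200_000, 5_000_000_000),
    "c": (50_000, 3_000_000_000),
    "d": (1000, 120_000),
}

tree_saves_space = [label for label, (i, p) in options.items()
                    if bst_bytes(p) < triangular_bytes(i)]
print(tree_saves_space)
```

The tree wins roughly when 20P < 2I^2, that is, when P < I^2 / 10; among the listed options only a) satisfies this.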
Copyright 2007-2015 Gradiance Corporation.
http://www.newgradiance.com/cru/servlet/COTC?Method=GET&Command=ViewHomeworkAnswers&submissionId=130164&Screen=HomePage:StudentHo