
CSE 847 Project: Large Scale Image Classification

1. Introduction

The objective of this project is to build large-scale image classifiers. You are required to build
programs that efficiently learn classification models from one million high-dimensional training
examples, and apply the learned classifiers to make predictions for around 200,000 test examples.
Although you are allowed to use off-the-shelf tools, you are encouraged to develop your own
classification algorithms and learning programs. The course project will be evaluated in three
aspects: the classification performance of your algorithms (70%), your presentation (20%), and
your final report (10%).
To evaluate the performance of your algorithms and programs, you are required to submit the
classification results for the testing data, which will be evaluated by the instructor using the
metric described in Section 4. The ranking of the evaluation results will be released at the final
presentation. In your presentation, you need to report the running times of your programs for
training and testing, and the maximum memory used in training and testing. You may also
describe any special efforts you put into the course project to improve the efficiency and the
accuracy of your learning programs. For instance, you can explain the strategy you used to
efficiently train a classifier from a large number of training examples using only a limited amount
of memory.
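
One possible limited-memory strategy is to update the model one example at a time, so that only the weight matrix stays in memory. The sketch below uses a multiclass perceptron purely as an illustration; the project does not prescribe this algorithm, and the function names `train_perceptron_stream` and `predict` are hypothetical.

```python
from typing import Iterable, List, Sequence, Tuple


def _scores(weights: List[List[float]], features: Sequence[float]) -> List[float]:
    """Dot product of the feature vector with each class's weight row."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def predict(weights: List[List[float]], features: Sequence[float]) -> int:
    """Return the 1-based label of the highest-scoring class."""
    scores = _scores(weights, features)
    return max(range(len(weights)), key=scores.__getitem__) + 1


def train_perceptron_stream(
    examples: Iterable[Tuple[Sequence[float], int]],
    num_classes: int,
    num_features: int,
) -> List[List[float]]:
    """One pass of a multiclass perceptron over a stream of (features, label).

    Labels are assumed to be 1-based. Only the weight matrix
    (num_classes x num_features) is held in memory; examples are
    consumed one at a time from the iterable.
    """
    weights = [[0.0] * num_features for _ in range(num_classes)]
    for features, label in examples:
        predicted = predict(weights, features)
        if predicted != label:
            # Perceptron update: promote the true class, demote the wrong one.
            true_row = weights[label - 1]
            wrong_row = weights[predicted - 1]
            for j, x in enumerate(features):
                true_row[j] += x
                wrong_row[j] -= x
    return weights
```

Pure Python is far too slow for a million 900-dimensional examples; the point is only the access pattern, which a vectorized implementation could keep unchanged.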

2. Dataset

The dataset used in this project is modified from the ImageNet Large Scale Visual Recognition
Challenge 2010 (ILSVRC2010). For more details on the original dataset, you can visit the
ILSVRC2010 website: http://www.imagenet.org/challenges/LSVRC/2010/

The dataset used in the course project consists of 1,262,106 images distributed over
164 classes. Some of the classes are taken directly from the ImageNet dataset, while the others are
generated by merging multiple classes in order to make the task more challenging. Each image in the
dataset is represented by a vector of 900 dimensions and is assigned to one of the 164 classes.
All the features are integers.
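
As a rough sense of scale, the arithmetic below estimates how much memory a dense in-memory copy of the features would need. The helper name `dense_size_gib` and the candidate bytes-per-value widths are assumptions for illustration; the handout says only that the features are integers, not what range they cover.

```python
def dense_size_gib(num_images: int, num_features: int, bytes_per_value: int) -> float:
    """Size in GiB of a dense num_images x num_features array."""
    return num_images * num_features * bytes_per_value / 2**30


if __name__ == "__main__":
    # 1,000,000 training images with 900 features each, at several integer widths.
    for bytes_per_value in (1, 2, 4, 8):
        size = dense_size_gib(1_000_000, 900, bytes_per_value)
        print(f"{bytes_per_value} byte(s) per value: {size:.2f} GiB")
```

With 4-byte integers, for example, the training features alone would take about 3.35 GiB, which is why the storage width and the loading strategy are worth deciding early.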

We randomly choose 1,000,000 images from the dataset to form the training set, and use the
remaining 262,106 images as the testing set. Furthermore, we randomly select 125,000 images from
the training set to create a small development set, which will be used for algorithm
development. For each set, the image features and the corresponding class assignments are
saved in two plain-text files, named xxx.txt and xxx_label.txt, respectively. Of course, the
test_label.txt file is unavailable. Each line in xxx.txt is the feature vector of an image, and the
values in each feature vector are separated by spaces. The line numbers are used as the index
ids for images. For example, the image whose features are on the first line has index id 1. Each
line in xxx_label.txt is the class label for the corresponding image in xxx.txt.
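
The file layout described above can be read one image at a time, which avoids loading a whole set into memory. The following is a minimal sketch under that reading of the format; the function name `iter_examples` is hypothetical, and it assumes one integer label per line in the label file.

```python
from typing import Iterator, List, Optional, TextIO, Tuple


def iter_examples(
    feature_file: TextIO, label_file: Optional[TextIO] = None
) -> Iterator[Tuple[int, List[int], Optional[int]]]:
    """Yield (index, features, label) for one image at a time.

    The index is the 1-based line number of the image in the feature file.
    The label is None when no label file is given (e.g. for the testing set).
    """
    for index, line in enumerate(feature_file, start=1):
        # Feature values are integers separated by spaces.
        features = [int(value) for value in line.split()]
        label = None
        if label_file is not None:
            # The label on line n belongs to the image on line n.
            label = int(label_file.readline())
        yield index, features, label
```

A typical call would open both files with `open(...)` and pass the two handles in; for the testing set, only the feature file is passed.
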
In the course project, the development set will be distributed on 03/14/2013. It can be
downloaded from http://www.cse.msu.edu/~cse847/project/development.rar. Both the
training set and the testing set will be available on 04/04/2013 and can be downloaded from
http://www.cse.msu.edu/~cse847/project/training.rar and
http://www.cse.msu.edu/~cse847/project/testing.rar. You need to send your prediction results
for the testing set by email to your instructor by 04/17/2013 (11:59 pm).

3. Submissions

For each class, you need to return a list of the indices of 100 test images, in descending
order of the classification scores; i.e., the first image index in the ranking list should be the one
that is most likely to be assigned to the class, and so on. Please use the following format for each
line in the submitted file:

class label<Tab>image index

where class label is the label of the predicted class, ranging from 1 to 164, and image index is the
index of a test image. The two fields are separated by a Tab. Please put the 100 image indices of
class 1, ordered by the classification scores, at the top of the file, followed by the 100 image
indices of class 2, and so on. Below is an example of the file:

1	65
1	32464
...		(100 image indices for the first class, 1)
...
164	3332
164	8476
...		(100 image indices for the last class, 164)
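
The ranking and file layout described in this section could be produced along the following lines. This is a sketch, not a required implementation: the function name `rank_top_images` and the `scores` layout (a dictionary from test image index to a list of per-class scores) are assumptions made for the example.

```python
import heapq
from typing import Dict, List, Sequence


def rank_top_images(
    scores: Dict[int, Sequence[float]], num_classes: int, top_k: int = 100
) -> List[str]:
    """Build submission lines from per-image class scores.

    scores[image_index][c - 1] is assumed to hold the score of class c
    for that test image. Lines are grouped by class, from class 1 upward,
    and ordered by descending score within each class.
    """
    lines: List[str] = []
    for label in range(1, num_classes + 1):
        # nlargest returns the top_k image indices, best score first.
        best = heapq.nlargest(
            top_k, scores, key=lambda idx: scores[idx][label - 1]
        )
        lines.extend(f"{label}\t{idx}" for idx in best)
    return lines
```

Writing the result is then a matter of joining the lines with newlines and saving them to a text file. For the full testing set, keeping every score in a dictionary may be too costly; a per-class bounded heap updated while streaming predictions is one alternative.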

4. Evaluation metric

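The Mean Average Precision (MAP) is used to evaluate the performance. It is computed as
follows:

$$\mathrm{MAP} = \frac{1}{C} \sum_{i=1}^{C} \frac{1}{K} \sum_{k=1}^{K} P_i(k)$$

where $C$ is the total number of classes (i.e., 164) and $K$ is the number of images returned for each
class (i.e., 100). $P_i(k)$ is called the precision, and is defined as the percentage of the first $k$ test
images, returned by your programs for class $i$, that belong to class $i$.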
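
For checking results on the development set, the metric can be computed locally. The sketch below follows one reading of the description above: average the precision over the first k returned images for every k from 1 to the list length, then average over classes. The function name `mean_average_precision` and both argument layouts are assumptions, and the instructor's own evaluation script may differ in detail.

```python
from typing import Dict, Sequence


def mean_average_precision(
    ranked: Dict[int, Sequence[int]], true_labels: Dict[int, int]
) -> float:
    """Mean over classes of the average precision-at-k.

    ranked[i] is the ordered, non-empty list of image indices returned
    for class i; true_labels[image_index] is the ground-truth class.
    """
    total = 0.0
    for label, indices in ranked.items():
        hits = 0
        precision_sum = 0.0
        for k, image_index in enumerate(indices, start=1):
            if true_labels[image_index] == label:
                hits += 1
            # Precision over the first k returned images for this class.
            precision_sum += hits / k
        total += precision_sum / len(indices)
    return total / len(ranked)
```

As a small check, with two classes whose lists each have a correct first image and an incorrect second image, each class scores (1 + 0.5) / 2 = 0.75, so the MAP is 0.75.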
