Sunteți pe pagina 1din 1

ExplanationonA3TextClassificationtask

IwanttogiveanexplanationonA3TextClassificationtask,mostofthesubmissionfallintotwokinds
oferrors:
1. Overfittingisamajorprobleminmachinelearning,thisproblemoccurifwetryto
comeupswiththemodelthatutilizeallthedata(data+noise).Hencethemodelthatwe
learnhasmorevariancefromthedata,itcanbehaveverybadlyontheunseendata.
Mostofthepeopleperformoverfittingbyutilizingmorefeaturesthatarenottruly
basedontheclassificationproblem.Theintentionofthisassignmentistolearnthe
relevantfeaturesfromthedata.Themodeltrainedbyoverfittingsimplygrabthenon
relevantfeaturesaswell.Ihavetestedthemodelsuchproducedonthenewdataset,
anditgivesaccuracyofabout32%.<Thisistheproblemwithstudentswhohaveused
morefeatures(alldistinctwords,wordsmorethan100,lessfrequentwords,etc).You
canjudgethistypeofproblemwhenyouhaveTrainingErrorislow,butvalidationerror
ishigh.
2. Underfittingisalsoamajorproblembutnotasmajorasoverfitting.Inthisproblem
weintentionallyusesfewfeaturesfromtheexampletobuildourmodel.Hencedueto
thislimitedfeaturespaceovermodelisunderfit.Itmeansitwillgivebothlowaccuracy
ontrainingandtestingdatasets.itmeanswehaveuseverysimplemodeltofitinour
data,likefittingwithastraightline.Ihavetestedthemodelsuchproducedonthenew
dataset,anditgivesaccuracyofabout29%.<Thisistheproblemwithstudentswho
haveusedveryfewfeatures(toptenwordsfromthedataset).Youcanjudgethistype
ofproblemwhenyouhaveTrainingErrorislow,butvalidationerrorislow.

Agoodfitmodelisrequiredasasolutiontothisproblem,almostallarefailedtograbthisidea.So,
whatarethegoodfeaturesforthisassignmentthatperformbothtrainingandtestingerrortoa
minimum.Trytocomeupwithmostrelevantfeatures(with10+features)forthetwoclasses:
CameraandCaralongwithahierarchicalarrangementofgeneraldescriptorslike:Category,Model,
Configuration,Attributes,andusage.Withthesehierarchicalfeatureswewillhaveanaccuracyofabout
92%onbothtrainingandtestdatasets.

S-ar putea să vă placă și