
Zenoss Discovery and Classification

February 2009 Jane Curry Skills 1st Ltd www.skills-1st.co.uk

DRAFT

Jane Curry, Skills 1st Ltd, 2 Cedar Chase, Taplow, Maidenhead SL6 0EU, 01628 782565, jane.curry@skills1st.co.uk

Skills 1st Ltd

12 Feb 2009

Synopsis
Zenoss has several possibilities for discovering devices, both manual and automatic. Once discovered, the subsequent monitoring of a device depends very much on the device class that an element is allocated to. This paper focuses particularly on a scenario that automatically discovers devices in networks and then allocates those devices to device classes.

The scenario uses a number of Zenoss techniques including Python scripts, event commands and event transforms.


Table of Contents
1 Introduction
2 Automatic device class allocation scenario
3 Elements of the solution
3.1 dev_to_class.py scheduled script for device class allocation
3.2 Devices that do not (initially) support SNMP
3.3 SNMP agent installed subsequently for device in /Ping
4 Conclusions


1 Introduction
Zenoss provides a number of methods for discovering devices and their components. The simplest method is to manually add individual systems but this technique obviously does not scale well. If manual discovery is used, then many characteristics of the device can also be specified.

Figure 1: Dialogue for manually adding a new device

It is also possible to discover all devices on a network; this uses a ping-spray mechanism, so it can be good if networks are subnetted to Class C with up to 254 devices, but it is not good if there are Class B networks! Both techniques can either be driven from the Zenoss Graphical User Interface (GUI) or can be driven from the command line using zendisc. If devices are discovered by running discovery on a network, they are automatically added to the device class of /Discovered.

Device classes generally control the availability and performance monitoring of a device. All device classes have zProperties that control SNMP access, telnet access, Windows Management Instrumentation (WMI) access and ssh access.
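To put the ping-sweep caveat about Class C and Class B networks in numbers, the following minimal sketch counts the host addresses a sweep has to probe. It uses only Python 3's standard ipaddress module; the two example networks are arbitrary private ranges chosen for illustration and do not come from the scenario.

```python
import ipaddress


def sweep_size(cidr):
    """Number of host addresses a ping sweep of this network must probe."""
    network = ipaddress.ip_network(cidr)
    # Exclude the network address and the broadcast address.
    return network.num_addresses - 2


# Illustrative networks only: a Class C sized /24 and a Class B sized /16.
print(sweep_size("192.168.1.0/24"))
print(sweep_size("172.16.0.0/16"))
```

A /16 sweep has to probe roughly 250 times as many addresses as a /24, which is why network discovery is best aimed at small subnets.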


Figure 2: zProperties for the top-level /Devices class (partial panel)

Note that the zCollectorPlugins property (third from the top) can be used to control the information that will be collected from a device on a modelling cycle (as opposed to a discovery poll).


Figure 3: zCollectorPlugins for /Devices device class

The third element affected by device class is the performance Template. This controls the performance data that is collected and any thresholds that may generate events, based on that performance data.

Figure 4: Performance Templates for /Devices

All device classes and, indeed, individual devices, have a zProperties page that can change any default zProperty. Zenoss's object-oriented class hierarchy means that common properties can be specified higher up the class hierarchy with specific attributes being overridden further down the hierarchy. So, for example, the device class /Server/Linux has extra zCollectorPlugins defined.


Figure 5: zCollector plugins for /Server/Linux

Any device that is placed in this class or a subclass of /Server/Linux will, by default, have CPU utilisation collected along with filesystem and software information from the SNMP Host Resources MIB. Note that the zCollectorPlugins are not specifying performance data; they are specifying availability information.

The standard performance template (called Device) for /Devices is also overridden at the /Server/Linux subclass.
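The override behaviour described for zProperties and templates can be pictured as a lookup that walks up the class hierarchy until it finds a value. The sketch below is a toy model, not the Zenoss API: the Organizer class, its get_zprop method and the plugin names in the lists are all hypothetical stand-ins chosen for illustration.

```python
class Organizer(object):
    """Toy stand-in for a device class node; not the Zenoss implementation."""

    def __init__(self, path, parent=None, **zprops):
        self.path = path
        self.parent = parent
        self.zprops = dict(zprops)  # only the values overridden at this level

    def get_zprop(self, name):
        """Return the nearest definition, walking up towards the root."""
        node = self
        while node is not None:
            if name in node.zprops:
                return node.zprops[name]
            node = node.parent
        raise KeyError(name)


# Hypothetical plugin names; real plugin lists are shown in Figures 3 and 5.
base_plugins = ["basic-snmp", "interfaces", "routes"]
devices = Organizer("/Devices", zSnmpCommunity="public",
                    zCollectorPlugins=base_plugins)
server = Organizer("/Devices/Server", parent=devices)
linux = Organizer("/Devices/Server/Linux", parent=server,
                  zCollectorPlugins=base_plugins + ["cpu", "filesystems"])

print(linux.get_zprop("zCollectorPlugins"))   # overridden at /Server/Linux
print(server.get_zprop("zCollectorPlugins"))  # inherited from /Devices
print(linux.get_zprop("zSnmpCommunity"))      # inherited from /Devices
```

The point of the model is that moving a device between classes changes which overrides it sees, which is why class allocation matters so much for monitoring.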


Figure 6: Performance template (called Device) for the /Server/Linux device class

Devices that are placed into /Server/Linux automatically have CPU, memory and IO data collected, graphs defined, and thresholds set so that events are generated for extremes of CPU utilisation and low swap.

Any of these availability or performance monitoring characteristics can be overridden either by a device subclass or by a specific device instance. Note that most default information collection relies on SNMP support.

The /Discovered device class, to which automatically discovered devices are added, has the same characteristics as the top-level /Devices class:


• ping polling is active
• SNMP polling is active with community name possibilities of public and private
• WMI monitoring is inactive
• zCollector plugins will collect basic SNMP information, including interface and routing information
• performance information will only be gathered for interfaces

If auto-discovery for a network is to be deployed, then a mechanism is required to assign discovered devices to a suitable class.

2 Automatic device class allocation scenario


Take the scenario where devices are automatically discovered by Zenoss for a particular network. They will be allocated to the /Discovered device class. Note that (at least by default) devices will only be discovered if they respond to ping.

Of those devices, some may support SNMP; others don't. If devices do support SNMP then their SNMP Object Identifier (OID) will be collected and stored in Zenoss's Configuration Management Database (CMDB). Further, on a modelling cycle, this OID will be rechecked and will be used to populate the hardware manufacturer and model and the Operating System make and version.

Figure 7: Status page for a device showing Hardware and OS information

Zenoss has a large database of hardware and software information out of the box, which can be added to and modified by the user.


Figure 8: Product entry for Cisco 7206 router showing SNMP OID association

A Product class also has zProperties which appear to do exactly what is required: assign a device to a device class, based on the SNMP OID.

Figure 9: zProperties for the Product class /Manufacturers/Cisco/7206

Unfortunately, with the current version of Zenoss (2.3.2), these zProperties are documented as "For future use" and do not work.

The other alluring possibility for automatically assigning a device to a device class is that the zendisc command documents an --assign-devclass-script option. (This used to be called the --auto-allocate option prior to Zenoss 2.2.) Unfortunately, there is no documentation as to how to use this script, and the code in $ZENHOME/Products/DataCollector/zendisc.py has comments that say this option does not work! So, we have to handcraft a solution using various facets of Zenoss.

The scenario described here assumes a device to be discovered into the /Discovered class. If it does not support SNMP on discovery then approximately 1 day can go by before the device will be moved from the /Discovered class to the /Ping class. During this period, the device will be polled for SNMP every 5 minutes and, if no response is received, then the count on the initial event reporting "SNMP agent down" will be incremented. When the count on the event gets to 300, the device will be moved to /Ping which, by default, only ping polls; there are no SNMP polls and no performance templates assigned.


This permits staff to recognise that a newly discovered device needs an SNMP agent installing and/or configuring for use with Zenoss. Obviously, the period elapsed before action is taken can be adjusted.

If the device is reconfigured after this 300 x 5 minute period, the assumption is that the device will send an SNMP coldstart TRAP to the Zenoss system. Zenoss will be configured so that this TRAP checks the current device class of the device and, if it is /Ping, moves the device back to the class /Discovered. The coldstart TRAP will also automatically clear any "SNMP agent down" events from the same device.

For newly discovered devices that do support SNMP, and devices that may subsequently support SNMP, a script will be run periodically by the Unix cron utility; it uses the device's OID from the Zenoss CMDB and moves the device to an appropriate device class.

Figure 10: Flowchart for assigning newly discovered devices to device class


3 Elements of the solution


This solution uses a number of facilities that Zenoss provides, plus the ability to run scripts periodically using the cron scheduler of the operating system hosting Zenoss.

3.1 dev_to_class.py scheduled script for device class allocation


The easy, optimal solution is when a device is discovered and responds to SNMP. During the initial discovery, some SNMP information is gathered, including the SNMP OID of the device. Once a device has been discovered with ping, a modelling process takes place. Subsequently, modelling can be run manually for a device using the dropdown menu and the Manage > Model Device option. Alternatively, the zenmodeler process runs automatically, by default every 12 hours (configure from the left-hand Collectors > localhost menu).

One possibility would appear to be to use the event generated when a device is discovered to move the device to an appropriate class. Either an event class transform or an event command might be considered. This turns out not to be such a good idea because:

• the event is generated before the collector plugins of the modelling process have been run, so the SNMP OID may not be known at event time
• if an event command is used, commands are actually processed asynchronously every minute (by default) by the zenactions daemon, and you can get bad performance issues and inconsistencies in updating Zenoss's Zope CMDB configuration database if several updates run within the 1 minute cycle
• if an event transform is used, similar performance issues can ensue

Hence, this solution does not attempt to modify a device's class at discovery time but runs a scheduled script which simply checks through all devices currently in the class /Discovered, moves the devices to appropriate classes and then performs a single CMDB commit at the end.


Figure 11: dev_to_class.py Python script to move devices from /Discovered to more appropriate classes

Obviously, this script can be modified to add extra exact matches for the SNMP OID and other partial matches based on the start of an OID.

Note that if you have a Zenoss GUI window opened for a device whose class has changed then you will get an error message when you return to that window. You can simply return to any available option in the left-hand menu and continue.
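The idea of combining exact OID matches with partial matches on the start of an OID can be sketched as a small lookup function. This is a minimal illustration, not the script from Figure 11: the function name classify_oid, the two tables and the choice of target classes are assumptions, and mapping a whole vendor subtree to one class is deliberately crude. The OIDs themselves are the well-known net-snmp Linux identifier and the Cisco and Microsoft enterprise subtrees.

```python
# Exact sysObjectID values take priority over prefix matches.
# Table contents are illustrative assumptions, not the author's mappings.
EXACT_MATCHES = {
    "1.3.6.1.4.1.8072.3.2.10": "/Server/Linux",  # net-snmp agent on Linux
}

# (OID prefix, device class). The trailing dot stops "…4.1.9." from
# matching an unrelated enterprise such as "…4.1.99999".
PREFIX_MATCHES = [
    ("1.3.6.1.4.1.9.", "/Network/Router/Cisco"),  # Cisco enterprise subtree
    ("1.3.6.1.4.1.311.", "/Server/Windows"),      # Microsoft enterprise subtree
]


def classify_oid(oid):
    """Return a device class path for an OID, or None to leave the device alone."""
    oid = (oid or "").strip().lstrip(".")  # tolerate a leading dot
    if not oid:
        return None  # no SNMP data collected yet
    if oid in EXACT_MATCHES:
        return EXACT_MATCHES[oid]
    best = None
    for prefix, device_class in PREFIX_MATCHES:
        # Prefer the longest matching prefix so specific rules beat general ones.
        if oid.startswith(prefix) and (best is None or len(prefix) > len(best[0])):
            best = (prefix, device_class)
    return best[1] if best else None


print(classify_oid(".1.3.6.1.4.1.9.1.222"))
```

In a real script this function would be applied to each device currently in /Discovered, followed by one commit for the whole run; the Zenoss calls needed for that are left out here because they are not shown in the text.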


Figure 12: Error message when returning to device page after device class move

3.2 Devices that do not (initially) support SNMP


There may be many reasons why Zenoss cannot get SNMP information from a device:

• the device has no SNMP agent
• the device has a different SNMP community name than Zenoss is using
• the device uses a different port for SNMP than Zenoss is using (UDP/161 is the default)
• the device uses a different version of SNMP than Zenoss is using (v1 is the Zenoss default; v2c and v3 are possible)
• there may be a firewall between Zenoss and the device blocking SNMP

Whatever the reason, Zenoss will continue to poll devices in the /Discovered class every 5 minutes. After the first failure, an event appears in the Event Console.

Figure 13: "SNMP agent down" event with event count increasing


On subsequent 5 minute polls, the event count is increased. This scenario is based on allowing a period after discovery for the SNMP agent to be fixed, so action will be taken when the event count reaches 300 (25 hours). After that, there is no point in continuing to issue SNMP polls to these devices so they need moving to the /Ping device class.

This is achieved using an event command, which has a very flexible filter mechanism to define exactly when something should happen. Basically, an event command simply runs a shell script.

The filter ensures that:

• the device is currently in the class /Discovered; we don't want to affect devices that have already been allocated to useful classes
• the event is of class /Status/Snmp
• the summary of the event contains the string "SNMP agent down". There are several events that map to the /Status/Snmp event class; we are only interested specifically in the "SNMP agent down" event
• the count > 300; obviously this is easily adjusted

When ALL the filters are matched (the criteria are logically ANDed) then the script is run the next time that zenactions wakes up.
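The four ANDed criteria, and the 25 hour figure, can be written down as a small predicate. This is only a model of the filter logic for reasoning about it; the real filter is built in the Zenoss GUI, and the dictionary keys used for the event fields here are illustrative assumptions.

```python
POLL_MINUTES = 5       # SNMP polling interval quoted in the text
COUNT_THRESHOLD = 300  # event count at which action is taken


def should_move_to_ping(event, threshold=COUNT_THRESHOLD):
    """True only when every filter criterion matches (logical AND)."""
    return (
        event.get("deviceClass") == "/Discovered"
        and event.get("eventClass") == "/Status/Snmp"
        and "SNMP agent down" in event.get("summary", "")
        and event.get("count", 0) > threshold
    )


def hours_until_action(threshold=COUNT_THRESHOLD, poll_minutes=POLL_MINUTES):
    """Elapsed time represented by `threshold` failed polls."""
    return threshold * poll_minutes / 60.0


event = {
    "deviceClass": "/Discovered",
    "eventClass": "/Status/Snmp",
    "summary": "SNMP agent down",
    "count": 301,
}
print(should_move_to_ping(event), hours_until_action())
```

Changing the threshold is the only adjustment needed to shorten or lengthen the grace period: 300 polls at 5 minutes each is 1500 minutes, or 25 hours.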


Figure 14: disco_to_ping_file event command

A possible negative aspect of the event command is that it is run asynchronously by the zenactions daemon which runs every minute (by default). If the event command attempts to move a device to a different class and if there are lots of devices that get processed similarly at the same time (as is likely if a network discovery was performed) then you can end up with a large number of event commands all running at once, all trying to modify the CMDB database. When I first tried this, I ended up with 80 concurrent processes, all spawned by zenactions, all trying to update the CMDB. Performance was horrible and the CMDB transactions failed. So, the event command simply echoes the name of the device to a temporary file. The screenshot above creates a different file every hour with a <yy><mm><dd><hh> suffix.

Now a mechanism is required to process the temporary data file. A small shell script is run by the cron scheduler that concatenates any temporary data files into a single file, $ZENHOME/local/discoto_ping.out. The code that moves devices from one class to another is Python, not shell script, so rather than call Python code from the script, the Zenoss utility to generate an event, zensendevent, is used. This means that there is a tracking event within the Zenoss Event Console for when the process is run.
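The queue-then-batch pattern described above (append device names to an hourly file, then gather all hourly files into one) can be illustrated in Python. This is a stand-in for the idea only: the author's event command and cron job are shell, and the file prefix, the function names and the decision to delete hourly files once gathered are assumptions made for this sketch.

```python
import glob
import os
import time


def hourly_queue_file(directory, prefix="disco_to_ping_", when=None):
    """Path of the queue file for the given hour, suffixed <yy><mm><dd><hh>."""
    when = time.localtime() if when is None else when
    return os.path.join(directory, prefix + time.strftime("%y%m%d%H", when))


def queue_device(directory, device_name, when=None):
    """Append one device name; cheap, and it touches no database."""
    with open(hourly_queue_file(directory, when=when), "a") as handle:
        handle.write(device_name + "\n")


def collect_queue(directory, out_path, prefix="disco_to_ping_"):
    """Concatenate all hourly files into out_path and return the names found."""
    names = []
    for path in sorted(glob.glob(os.path.join(directory, prefix + "*"))):
        with open(path) as handle:
            names.extend(line.strip() for line in handle if line.strip())
        os.remove(path)  # assumption: hourly files are discarded once gathered
    with open(out_path, "w") as out:
        out.writelines(name + "\n" for name in names)
    return names
```

Because each event command only appends a line to a file, dozens of them can run in the same minute without contending for the CMDB; the expensive work is deferred to a single later pass.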


Figure 15: disco_to_ping.sh script run periodically by cron

The zensendevent command takes a number of parameters, including:

-d    the device that generated the event; the Zenoss system in this case
-c    the event class to generate; a locally created class, /Skills/Disco_to_ping
-s    the severity of the event; Info (blue) in this case

The remainder of the line is treated as the event summary.

A new event class is created, /Skills/Disco_to_ping, and it is configured with an event class transform to run the Python code (from an event class, use the dropdown table menu to reach the Transform option). Note that this is an event class transform, not an event class mapping transform; it runs whenever an event of class /Skills/Disco_to_ping arrives.


Figure 16: Event class transform for /Skills/Disco_to_ping

The script simply works through each line in discoto_ping.out and moves each device to the /Ping device class. A single commit is performed at the end, which makes for fewer transaction problems with the CMDB database.

Note that you cannot run an event transform when the "SNMP agent down" event reaches a certain count as the count field of the event is not available at event transform time.

Thus far, the solution moves devices to an appropriate class soon after discovery if the device supports SNMP and moves non-SNMP devices to the /Ping device class if there is no SNMP support 25 hours after discovery. There are two scripts to be scheduled by cron or run manually: dev_to_class.py and disco_to_ping.sh.
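The batch pattern used by the transform (read every queued device name, move each one, commit once) can be sketched independently of Zenoss. In this minimal illustration the move and commit operations are passed in as plain callables; move_queued_devices and both callables are hypothetical names, and inside Zenoss they would be replaced by whatever device-move and transaction-commit calls the transform in Figure 16 uses.

```python
def move_queued_devices(lines, move_device, commit, target="/Ping"):
    """Move every queued device to `target`, then commit exactly once."""
    seen = set()
    moved = []
    for line in lines:
        name = line.strip()
        if not name or name in seen:
            continue  # skip blank lines and duplicate entries
        seen.add(name)
        if move_device(name, target):  # callable returns True on success
            moved.append(name)
    if moved:
        commit()  # one transaction for the whole batch
    return moved


# Stand-ins that record what happened instead of touching a database.
commits = []
placements = {}


def fake_move(name, target):
    placements[name] = target
    return True


queued = ["router1\n", "printer7\n", "\n", "router1\n"]
moved = move_queued_devices(queued, fake_move, lambda: commits.append(1))
print(moved, len(commits))
```

Committing once per batch rather than once per device is what avoids the many competing transactions that caused the failures described earlier.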

3.3 SNMP agent installed subsequently for device in /Ping


To get good monitoring for a device, a responsive SNMP agent is a big help. Hopefully, most devices will eventually have an agent installed and configured suitably to communicate with Zenoss. The first indication of a well-configured SNMP agent is likely to be a coldstart TRAP from the device to Zenoss.

Zenoss has configuration out of the box that interprets the generic coldstart TRAP but it maps to the /Unknown event class. To effect useful actions, this TRAP needs to map to a specific event. To that end, create a new event subclass, Snmp_agent_start, under /Status/Snmp (use the SubClasses dropdown table menu and Add New Organizer).

Figure 17: New event subclasses under /Events/Status/Snmp

A coldstart TRAP in the Event Console can be mapped to this new class very simply by selecting the event and using the dropdown table menu to Map events to Class; choose the new /Status/Snmp/Snmp_agent_start class from the dropdown list. This mapping will apply to all coldstart TRAPs.

The scenario here calls for a different action only if the device is in the /Ping device class; action should certainly not be initiated whenever an SNMP agent is bounced, as we are really trying to define the first appearance of an SNMP agent. This is achieved with a second event class mapping for the /Status/Snmp/Snmp_agent_start event class.

Figure 18: Event class mappings for /Status/Snmp/Snmp_agent_start


As can be seen above, the original mapping simply maps if the summary of the event contains the string "snmp trap snmp_coldStart". Create a second mapping for this event using the Event Class Mapping dropdown table menu and Add Mapping.

Figure 19: Event class mapping for /Status/Snmp/Snmp_agent_start for devices in /Ping device class

This mapping should only match if the device that sent the event is currently in device class /Ping, so a mapping Rule specifies this. The mapping Transform then achieves two things:

• the component field of the event is set to snmp. This is so that we can eventually use this event to clear any associated "SNMP agent down" events.
• the device is moved from /Ping to /Discovered
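The Rule and Transform pair just described can be modelled with two small functions. This is a sketch of the logic only: the Event and Device classes are toy stand-ins rather than Zenoss objects, and the attribute and function names are assumptions. In Zenoss the rule is an expression and the transform a code snippet entered on the mapping, as shown in Figure 19.

```python
class Event(object):
    """Toy event carrying only the fields this sketch needs."""

    def __init__(self, summary, component=""):
        self.summary = summary
        self.component = component


class Device(object):
    """Toy device that remembers its device class path."""

    def __init__(self, name, device_class):
        self.name = name
        self.device_class = device_class


def rule_matches(device):
    """Mapping Rule: apply only to devices currently in /Ping."""
    return device is not None and device.device_class == "/Ping"


def transform(evt, device):
    """Mapping Transform: tag the event and return the device to /Discovered."""
    evt.component = "snmp"  # lets this event clear "SNMP agent down" events
    device.device_class = "/Discovered"


def handle_coldstart(evt, device):
    """Run the transform only when the rule matches; report whether it ran."""
    if rule_matches(device):
        transform(evt, device)
        return True
    return False


dev = Device("printer7", "/Ping")
evt = Event("snmp trap snmp_coldStart")
print(handle_coldstart(evt, dev), dev.device_class, evt.component)
```

A device already sitting in a useful class fails the rule, so an ordinary agent restart leaves its class untouched.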


Since there are now two mappings for the /Status/Snmp/Snmp_agent_start event class, they should be prioritised using the Sequence tab, with the more specific mapping being sequence number 0 (i.e. tested first).

Figure 20: Sequence numbers for the /Status/Snmp/Snmp_agent_down event mappings

This ensures that only coldstart TRAPs for devices in the /Ping class will have their class changed; ordinary reboots of SNMP agents will be unaffected.

To complete the scenario, it would be useful for the specific /Status/Snmp/Snmp_agent_start mapping to clear any "SNMP agent down" events. Use the zProperties tab to set the zEventSeverity to Clear and the zEventClearClasses to include /Status/Snmp events. Note that for zEventClearClasses to be utilised, events must be from the same device and the event component field must be the same. This is why the event mapping transform sets the component field to 'snmp', to match those generated by the "SNMP agent down" events.
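The clearing conditions stated above (Clear severity, same device, same component, and an event class listed in zEventClearClasses) can be expressed as a predicate. This is a simplified model based only on those stated conditions, not the Zenoss implementation; the function name and the dictionary keys are assumptions.

```python
def is_cleared_by(open_event, clear_event, clear_classes):
    """True when clear_event should clear open_event under the stated conditions."""
    return (
        clear_event["severity"] == "Clear"
        and open_event["device"] == clear_event["device"]
        and open_event["component"] == clear_event["component"]
        and open_event["eventClass"] in clear_classes
    )


agent_down = {
    "device": "printer7",
    "component": "snmp",
    "eventClass": "/Status/Snmp",
    "severity": "Error",
}
agent_start = {
    "device": "printer7",
    "component": "snmp",  # set by the mapping transform
    "eventClass": "/Status/Snmp/Snmp_agent_start",
    "severity": "Clear",
}

print(is_cleared_by(agent_down, agent_start, ["/Status/Snmp"]))
```

If the transform did not set the component field, the component comparison would fail and the "SNMP agent down" event would stay open.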

Figure 21: zProperties for event class mapping /Status/Snmp/Snmp_agent_start

At this point, recently discovered devices which have latterly had an SNMP agent installed are now in the same position as those devices that had SNMP support on discovery, so the periodic dev_to_class.py script should move the devices to a more appropriate device class, based on their SNMP OID.


4 Conclusions
This paper demonstrates a method for classifying devices that are automatically discovered. For small environments where devices can be added and/or configured manually, it is overkill. For larger environments where hundreds of devices may be discovered, especially where a significant proportion do not initially support SNMP, the solution seems to be helpful.

When testing the solution, $ZENHOME/log/zenactions.log is useful for checking the progress of event commands and $ZENHOME/log/zenhub.log is useful for debugging problems with event transforms. Python code can be tested as a standalone program and bits of Python can be tested using Zenoss's zendmd command environment.
