Performance Scenario:

Diagnosing and resolving sudden slowdown on two node RAC
Introduction
Karl Arao, OCP-DBA, RHCT
Senior Consultant at SQL*Wizard
RAC user for 3 years
1st environment on VMware
I heart performance
Don't like to guess when troubleshooting
Scenario
One Thursday a client called.

There was a SUDDEN slowdown on ALL of the applications,
a big impact to the business.

And it's running on RAC,
with no changes on the RAC nodes or on the applications.
Some of 10g Performance Features
OEM Performance Page
ADDM
SQL Tuning Advisor
AWR (DBA_HIST_)
ASH
Time Model (total time for all DB calls)
Wait Class (12 wait classes)
Metrics (v$ performance metric deltas)
Services
Setup
Server and Storage: Sun Fire X4200 (2 CPU, 12 GB memory) with LUNs on EMC CX300
OS: RHEL 4.3 ES
Database and clusterware: Oracle 10.2.0.3
Database files, Flash Recovery Area, OCR, and voting disk are located on OCFS2 filesystems
Application: Forms and Reports (6i and also lower)
Troubleshooting Principle

Systematic / Layered approach..
Understand..
Then Fix..

Let's get it on!
1. Measured the OS stack
Monitored the following:
CPU (vmstat, top, mpstat)
I/O (iostat)
memory (vmstat, meminfo)
network (netstat)
process info (top, ps)
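
As a rough illustration of turning the OS measurements above into numbers that can be compared between server1 and server2, here is a minimal Python sketch that parses `vmstat` text output and flags samples with high I/O wait. The sample text is illustrative only (not output from the system in this case), and the 20% threshold and both function names are assumptions made for the example.

```python
def parse_vmstat(text):
    """Parse `vmstat <interval>` text into a list of dicts keyed by column name."""
    samples = []
    columns = None
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("procs"):
            continue  # skip blank lines and the "procs ---memory---" banner
        if fields[0] == "r":
            columns = fields  # column header line (vmstat repeats it periodically)
            continue
        if columns and len(fields) == len(columns):
            samples.append(dict(zip(columns, (int(f) for f in fields))))
    return samples


def high_iowait(samples, threshold=20):
    """Return the samples whose 'wa' (I/O wait %) is at or above the threshold.

    The threshold is an arbitrary value chosen for this sketch.
    """
    return [s for s in samples if s.get("wa", 0) >= threshold]


# Illustrative input only, not measurements from the case described here.
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 1  0      0 812340  90212 9123456    0    0    12    40 1020  2100  5  2 92  1
 0  3      0 801220  90300 9130000    0    0  4200   310 1500  3900  4  3 60 33
"""

flagged = high_iowait(parse_vmstat(SAMPLE))
for sample in flagged:
    print("high iowait sample:", sample)
```

Parsing by header name rather than by column position keeps the sketch tolerant of vmstat versions that add or drop columns.
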
[Graphs, one each for server1 and server2: CPU, datafiles, OCR & voting disk, archive logs, Flash Recovery Area, memory]
2. Checked the DB environment
Compared my past & current RDA of the database
Query on some v$ views.. a query on v$session showed that server1 has more connections (89% of the total users)
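
The connection skew can be expressed as a simple percentage per instance. The sketch below assumes the session list was fetched with something like `SELECT inst_id FROM gv$session` (the slides only say "a query on v$session"); the counts are made up to reproduce the 89% figure and are not the real session counts.

```python
from collections import Counter


def connection_share(inst_ids):
    """Given one instance id per session, return {inst_id: percent of all sessions}."""
    counts = Counter(inst_ids)
    total = sum(counts.values())
    return {inst: round(100.0 * n / total, 1) for inst, n in counts.items()}


# Hypothetical data: 89 sessions on instance 1 and 11 on instance 2.
sessions = [1] * 89 + [2] * 11
share = connection_share(sessions)
print(share)
```
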
This could be because of:
1) The clients having lower versions (< SQL*Plus 8.1 or OCI8, see Note 97926.1) that may not support TAF (FAILOVER_MODE) and Load Balancing (LOAD_BALANCE)
OR
2) They are using TNS entries explicitly connecting to server1
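
For comparison with a TNS entry that points only at server1, the following sketch renders what an entry using both LOAD_BALANCE and FAILOVER_MODE might look like. The alias, service name, host names, and the RETRIES/DELAY values are all hypothetical; the actual entries used at this site are not shown in the slides.

```python
def tns_entry(alias, service_name, hosts, port=1521):
    """Render a tnsnames.ora entry with client-side load balancing and TAF.

    All names passed in are placeholders chosen by the caller.
    """
    addresses = "\n".join(
        f"      (ADDRESS = (PROTOCOL = TCP)(HOST = {h})(PORT = {port}))"
        for h in hosts
    )
    return (
        f"{alias} =\n"
        "  (DESCRIPTION =\n"
        "    (ADDRESS_LIST =\n"
        "      (LOAD_BALANCE = yes)\n"  # pick an address at random per connect
        "      (FAILOVER = on)\n"       # try the next address if one fails
        f"{addresses}\n"
        "    )\n"
        "    (CONNECT_DATA =\n"
        f"      (SERVICE_NAME = {service_name})\n"
        "      (FAILOVER_MODE = (TYPE = SELECT)(METHOD = BASIC)(RETRIES = 20)(DELAY = 5))\n"
        "    )\n"
        "  )\n"
    )


# Hypothetical alias, service, and virtual host names.
entry = tns_entry("RACDB", "racdb", ["server1-vip", "server2-vip"])
print(entry)
```

An entry with a single ADDRESS for server1 and no FAILOVER_MODE clause would match the second explanation above: every client lands on server1 and has no failover.
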
Users don't have FAILOVER capabilities
Checked the application module usage on server1
How 'bout I graph it in Excel? Will the data be more meaningful?
..YES, most of the users use the xxxlogin.fmx module
3. Checked instance wide DB performance
Graphed the ASH data..
..suffering from gc cr block lost and gc cr multi block request from 7am to 4pm
Researched on Metalink for known issues..
Found Doc ID: 563566.1 gc lost blocks diagnostics
Was able to pinpoint the peak period from the graph. Then, generated ADDM and AWR reports on that peak period..
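
Pinpointing the peak period from ASH data amounts to bucketing samples by time and counting them per wait event. A minimal sketch of that idea follows; it assumes the ASH rows have already been exported as (sample time, event) pairs, and the rows shown are invented for illustration, not the client's data.

```python
from collections import Counter, defaultdict
from datetime import datetime


def hourly_event_counts(samples):
    """Bucket (sample_time, event) pairs by hour.

    Returns {hour_start: Counter({event: number_of_samples})}.
    """
    buckets = defaultdict(Counter)
    for ts, event in samples:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets[hour][event] += 1
    return dict(buckets)


def peak_hour(samples, event):
    """Return the hour with the most samples of `event`, or None if no samples."""
    buckets = hourly_event_counts(samples)
    if not buckets:
        return None
    return max(buckets, key=lambda h: buckets[h][event])


# Invented sample rows for illustration only.
samples = [
    (datetime(2009, 1, 8, 7, 15, 0), "gc cr block lost"),
    (datetime(2009, 1, 8, 9, 0, 5), "gc cr block lost"),
    (datetime(2009, 1, 8, 9, 0, 6), "gc cr block lost"),
    (datetime(2009, 1, 8, 9, 30, 0), "gc cr multi block request"),
    (datetime(2009, 1, 8, 17, 0, 0), "db file sequential read"),
]
peak = peak_hour(samples, "gc cr block lost")
print("peak hour for gc cr block lost:", peak)
```

The hour returned here is the kind of window one would then pass as the snapshot range for the ADDM and AWR reports.
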
ADDM

Elapsed Time: 60 min
DB Time: 61.83 min
AAS: 1.03
Max CPU: 2
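
The AAS (average active sessions) figure follows directly from the two times above: DB time divided by elapsed time. The short sketch below reproduces the arithmetic; reading "Max CPU: 2" as the number of CPUs on the node is an assumption based on the Setup slide.

```python
def average_active_sessions(db_time_min, elapsed_min):
    """AAS = DB time / elapsed wall-clock time, both in the same unit."""
    return db_time_min / elapsed_min


aas = average_active_sessions(61.83, 60)
cpu_count = 2  # assumed to be what "Max CPU: 2" refers to

print(f"AAS = {aas:.2f}")
print(f"AAS per CPU = {aas / cpu_count:.2f}")
```

An AAS of about 1 on a 2-CPU node means the instance was not CPU-saturated during the peak hour, which is consistent with the time going to waits rather than to CPU.
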
Should I follow these recommendations right away?
Nope, collect more facts, numbers, figures
AWR
Do we have a workload distribution problem?
Nope, even with distributed users..
We still have a performance problem..
4. Checked session level DB performance
The database has too much activity, where do I start? Where to drill down?
gv$session_longops & gv$session_wait output: too many users, and requires repetitive monitoring
In the spirit of Method R:
"WORK FIRST TO REDUCE THE BIGGEST RESPONSE TIME COMPONENT OF A BUSINESS' MOST IMPORTANT USER ACTION"

Went to the Accounting Department, checked on the desktop terminals
Users PC1069 (with SID 601) and PC918 (with SID 483) are on total hang
Checked on:
the performance/wait counters
the current SQLs
v$session_wait (SID 601)
v$sesstat (SID 601)
v$sql, v$sql_plan, v$sql_plan_statistics (SID 601)

Running for 98 minutes
Just 12.14 seconds on CPU
v$sesstat (SID 483)
v$sql, v$sql_plan, v$sql_plan_statistics (SID 483)

Running for 3 hours
Just 2.68 seconds on CPU
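
To put the two sessions' figures in proportion, the sketch below computes how much of each statement's elapsed time was spent on CPU. It uses only the numbers quoted above; treating everything that is not CPU as non-CPU (wait) time is a simplification made for this example.

```python
def cpu_share(cpu_seconds, elapsed_seconds):
    """Fraction of elapsed time that was spent on CPU."""
    return cpu_seconds / elapsed_seconds


sid601 = cpu_share(12.14, 98 * 60)    # running for 98 minutes
sid483 = cpu_share(2.68, 3 * 3600)    # running for 3 hours

for sid, share in (("601", sid601), ("483", sid483)):
    print(f"SID {sid}: {share:.2%} on CPU, {1 - share:.2%} elsewhere")
```

Both sessions spent well under 1% of their elapsed time on CPU, so the response time component worth chasing is whatever they were waiting on, not the SQL's CPU cost.
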
Another graph of ASH
5. Drilled down on the network interconnect

Generated a cat & egrep command to look for problems in the interconnect from the OS Watcher netstat output
(from Metalink Doc ID: 563566.1 gc lost blocks diagnostics)
$ cat server1_netstat.dat | egrep -i "udpInOverflows|packet receive errors|fragments dropped|reassembles failed|fragments dropped after timeout"
34096 fragments dropped after timeout
306030 packet reassembles failed
15 packet receive errors
34096 fragments dropped after timeout
306268 packet reassembles failed
15 packet receive errors
34096 fragments dropped after timeout
306574 packet reassembles failed
output snipped
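
The counters above are cumulative, so what matters is whether they keep growing between snapshots. The following Python sketch does the same filtering as the egrep command and then computes the change between consecutive snapshots. The input is the snippet shown above; the function names are made up for this example, and the real OS Watcher file contains many more lines than these.

```python
import re

# Matches lines such as "306030 packet reassembles failed".
PATTERN = re.compile(
    r"^\s*(\d+)\s+(packet receive errors|packet reassembles failed|"
    r"fragments dropped after timeout)\s*$",
    re.IGNORECASE,
)


def counter_series(text):
    """Collect each counter's values in the order the snapshots appear."""
    series = {}
    for line in text.splitlines():
        m = PATTERN.match(line)
        if m:
            series.setdefault(m.group(2).lower(), []).append(int(m.group(1)))
    return series


def deltas(values):
    """Differences between consecutive cumulative counter values."""
    return [b - a for a, b in zip(values, values[1:])]


SNAPSHOTS = """\
34096 fragments dropped after timeout
306030 packet reassembles failed
15 packet receive errors
34096 fragments dropped after timeout
306268 packet reassembles failed
15 packet receive errors
34096 fragments dropped after timeout
306574 packet reassembles failed
"""

series = counter_series(SNAPSHOTS)
for name, values in series.items():
    print(f"{name}: {values} deltas={deltas(values)}")
```

In this snippet only "packet reassembles failed" is still climbing from one snapshot to the next, which is the counter to watch after any change to the interconnect.
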
Restarted the switch

STILL THERE IS A PERFORMANCE PROBLEM
Replaced the switch

THEY GOT FAST
karao@karl:~/Desktop$ cat karlarao.dat | egrep -i "udpInOverflows|packet receive errors|fragments dropped|reassembles failed|fragments dropped after timeout"
0 packet receive errors
0 packet receive errors
0 packet receive errors
0 packet receive errors
0 packet receive errors
0 packet receive errors
0 packet receive errors
0 packet receive errors
0 packet receive errors
0 packet receive errors
0 packet receive errors
Another graph of ASH (Stacked graph)
Another graph of ASH (3D view)
Conclusion

You don't have to guess..

Even if it's a RAC environment..

It just takes facts, numbers, figures
to solve a performance problem
References and Tools
http://karlarao.wordpress.com
http://blog.tanelpoder.com
http://www.tanelpoder.com/files/TPT_public.zip
http://www.tanelpoder.com/files/PerfSheet.zip
Neil Gunther & Tanel Poder, Multidimensional Visualization of Oracle Performance using Barry007, http://arxiv.org/pdf/0809.2532
http://ashmasters.com
http://www.perfvision.com
http://www.methodr.com

Metalink Doc ID 97926.1 Failover Issues and Limitations [Connect-time failover and TAF]
Metalink Doc ID 563566.1 gc lost blocks diagnostics
Metalink Doc ID 301137.1 OS Watcher User Guide
Join Oracle Users Philippines

Facebook
http://www.facebook.com/home.php#/pages/OracleUsersPhilippines/86773013086?ref=ts

LinkedIn
http://www.linkedin.com/groups?home=&gid=2028295&trk=anet_ug_hm
Contact me through:

karao@sqlwizard.com
09192673389
8896999
