
Troubleshooting Tools and Methodology in a Citrix XenApp 5.0 Environment
Kapil Ramlal (KappA)
Escalation Engineer

Agenda
XenApp troubleshooting
The right tool, right place at the right time
Troubleshooting scenarios
Top utilities
Case studies
Additional resources/Q&A


XenApp troubleshooting
Understanding the infrastructure
The anatomy of a XenApp farm
Information: Static and Dynamic
Components: Where to focus troubleshooting
Understanding what happens from logon to launch
Types of issues: Denial of service, bottlenecks
Troubleshooting: MedEvac, performance monitoring, CDF

Types of Information
Static: Data Store (cached locally in the Local Host Cache, LHC)
Does not change frequently
Farm configuration: changes made in the Management Console

Dynamic: Dynamic Store
Constantly changing
Load management information
Information required for application launch

Logon to launch
[Figure: components involved from logon to application launch: Client, Web Interface, XML Broker, Active Directory, Zone Data Collector, Least Loaded Server, Data Store]

MedEvac (CTX107935)
XML Broker tests
Verifies that the XML Service is able to respond to an XML/client request
Verifies that the XML Service is able to contact the Zone Data Collector

Zone Data Collector tests
Verifies that the ZDC can provide the address of the least loaded server for the requested application
Verifies that the IMA Service is able to respond
Verifies that the IMA Service can read the Local Host Cache
Verifies that the IMA Service can read its Dynamic Store

Least Loaded Server tests
Verifies that Terminal Services is able to respond
Verifies that the RPC Service is able to respond
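MedEvac runs these checks at the protocol level. As a much cruder first pass, the sketch below only checks whether each tier accepts TCP connections on its service port. The port numbers are assumptions based on common defaults (XML Service 80, IMA 2512, ICA 1494, RPC endpoint mapper 135); your farm may use different ones, and an open port does not prove that the service behind it is healthy.

```python
import socket

# Assumed default ports; check your own farm's configuration
# (for example, the XML Service port is configurable).
DEFAULT_PORTS = {
    "XML Service": 80,
    "IMA Service": 2512,
    "ICA listener": 1494,
    "RPC endpoint mapper": 135,
}


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, name not resolved
        return False


def probe(host: str, ports: dict = DEFAULT_PORTS) -> dict:
    """Map each named service to whether its port accepts TCP connections."""
    return {name: port_open(host, port) for name, port in ports.items()}
```

For example, `probe("xmlbroker01")` (a hypothetical server name) returns a dictionary such as `{"XML Service": True, ...}`; anything reported as unreachable is the place to start the real MedEvac tests.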

How to Monitor Farm Health using MedEvac?


See knowledge center article CTX119899

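Monitoring
[Figure: what to monitor at each component of the logon-to-launch path]
XML Broker: XML threads
Web Interface: ASP requests
Zone Data Collector: IMA work item queues, IMA % CPU time, zone elections won
Active Directory: RSOP
CDF tracing on the Citrix components along the path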

Citrix counters to monitor (counter: description; threshold; servers to monitor)

Application Resolution Time (ms): time to resolve the least loaded server (LLS)
Threshold: determine a baseline
Monitor on: all XML Brokers

Data Store Connection Failure: number of minutes the server has been disconnected from the Data Store
Threshold: determine a threshold that takes scheduled reboots and maintenance into account
Monitor on: all XenApp servers

Number of Busy XML Threads: number of XML requests currently being processed (max = 16)
Threshold: 16, sustained for 1 minute or longer
Monitor on: all XML Brokers

WorkItem Queue Ready Count: number of work items that are ready and waiting to be processed by IMA
Threshold: sustained above 0 for 1 minute or longer
Monitor on: all XML Brokers; Most Preferred and Preferred Data Collectors

Resolution WorkItem Queue Ready Count: number of work items (related to application launches) waiting to be processed by IMA
Threshold: sustained above 0 for 1 minute or longer
Monitor on: all XML Brokers; Most Preferred and Preferred Data Collectors

Zone Elections Won: number of times this server won an election
Threshold: the counter increments by 2 in a 1-hour period
Monitor on: Most Preferred and Preferred Data Collectors
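The "sustained for 1 minute" and "increments by 2 in a 1-hour period" thresholds can be evaluated against exported counter samples. A minimal sketch, assuming the samples have already been collected (for example, exported from Performance Monitor) as time-stamped values sorted by time; the `Sample` type and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Sample:
    t: float      # seconds since the start of the capture
    value: float  # counter value at time t


def sustained(samples: List[Sample], condition: Callable[[float], bool],
              min_duration: float = 60.0) -> bool:
    """True if `condition` held for an unbroken run of samples spanning at
    least `min_duration` seconds (samples must be sorted by time)."""
    run_start = None
    for s in samples:
        if condition(s.value):
            if run_start is None:
                run_start = s.t
            if s.t - run_start >= min_duration:
                return True
        else:
            run_start = None  # the run is broken; start over
    return False


def elections_alert(samples: List[Sample], window: float = 3600.0,
                    increments: int = 2) -> bool:
    """True if the Zone Elections Won counter rose by `increments` or more
    within any `window`-second span."""
    for i, first in enumerate(samples):
        for later in samples[i + 1:]:
            if later.t - first.t > window:
                break
            if later.value - first.value >= increments:
                return True
    return False
```

For Number of Busy XML Threads, use `sustained(samples, lambda v: v >= 16)`; for the two WorkItem queue counters, `sustained(samples, lambda v: v > 0)`. A run is measured from its first to its last qualifying sample, so the sampling interval limits how precisely the 1-minute window is detected.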

XenApp 5.0 Health Monitoring and Recovery


Enterprise & Platinum Editions of XenApp
Performs tests to monitor state and identify health risks
Terminal Services tests
XML Service test
Citrix IMA Service test
Logon Monitor test
Check DNS test
Local Host Cache test
XML threads test
Citrix Print Manager Service test
Microsoft Print Spooler test
ICA Listener test
See page 307 of the XenApp 5.0 Administrator's Guide (CTX115519) for more information

Large Farm Tips


Limit additional roles on Zone Data Collectors
Limit the number of zones in the environment
Do not run management consoles on the ZDCs, or point them at the ZDCs
Read the Key Infrastructure Tuning article: CTX116492
Free the ZDC!

The evolution continues!


Citrix XenApp 5.0 opens the door for delivering resources on
Windows Server 2008
Customers are also moving more users to Windows Vista
Say hello to the next-generation troubleshooting artillery for the XenApp 5.0 environment
Existing tools have been updated, and new tools introduced

The right tool, right place at the right time


DON'T
Use troubleshooting tools just because you can
Recommend tools that are not relevant to the problem
Use troubleshooting tools without understanding their impact on the environment

DO

Use tools to help automate time-consuming tasks


Use tools at the right time, such as when the problem is occurring and not afterwards
Understand what the tool is trying to accomplish, so that the right data is obtained
Use tools with a clear purpose
Maintain a local toolkit, so that the right tools are always available in times of crisis

CDF Tracing & CDFControl 2.5

Common Diagnostic Facility (CDF)


Provides the ability to collect traces for problem diagnosis on Citrix
binaries without disrupting the services or users
Citrix's standard debug tracing facility
Efficient and non-intrusive data collection process
Enabled without stopping and starting services
Faster & easier tracing for retail modules
Flexible & customizable troubleshooting facility
Consistency across most Citrix products

CDF Basics
To better understand what a CDF trace message is, let's look at the following pseudo code example

In the example, the function belongs to a service, which can be considered a Trace Provider (more on this later)

The moral of the story

We could capture a CDF trace to determine whether CitrixFeatureDLL.dll loaded successfully

How difficult would it be to debug without this tracing?

You need special symbol files (TMF files) to be able to read the trace messages

This allows certain information to remain private as needed (similar to .pdb files)

You get more by default!
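Why trace messages are unreadable without TMF files can be shown with a simplified model: the provider records only a message identifier and its raw arguments, and the format string that turns them into readable text lives in a separate file. In this sketch the message IDs, the `TMF` dictionary and the `loader` callback are all hypothetical; real TMF files key messages by a GUID and message number rather than a small integer.

```python
from typing import Callable, Tuple

# Hypothetical "TMF" table: message ID -> format string.
# Without it, a consumer only sees IDs and raw arguments.
TMF = {
    1: "LoadLibrary(%s) succeeded",
    2: "LoadLibrary(%s) failed, error %d",
}


def emit(buffer: list, message_id: int, *args) -> None:
    """Provider side: record only the message ID and the raw arguments."""
    buffer.append((message_id, args))


def load_feature(buffer: list, loader: Callable[[str], Tuple[bool, int]],
                 name: str = "CitrixFeatureDLL.dll") -> bool:
    """Try to load a module and leave a trace point for success or failure."""
    ok, error = loader(name)
    if ok:
        emit(buffer, 1, name)
    else:
        emit(buffer, 2, name, error)
    return ok


def decode(buffer: list, tmf: dict) -> list:
    """Consumer side: render records using whatever format information exists."""
    lines = []
    for message_id, args in buffer:
        fmt = tmf.get(message_id)
        if fmt is None:
            lines.append(f"<unknown message {message_id}: no format information>")
        else:
            lines.append(fmt % args)
    return lines
```

With a loader that fails with error 126 (the Windows ERROR_MOD_NOT_FOUND code, used here only as a plausible value), `decode(buf, TMF)` yields `"LoadLibrary(CitrixFeatureDLL.dll) failed, error 126"`, while `decode(buf, {})` yields only the opaque message number.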

CDF Internals
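To better understand CDF, let's take a quick look at how the operating system supports Event Tracing for Windows (ETW)

[Figure: CDFControl acting as both controller and consumer. The controller enables/disables trace providers such as CDM.sys, RadeSvc.exe and WFShell.exe; their events are collected in session buffers, which are written to a trace file or delivered to the consumer.]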

ETW Components
Providers:

Modules instrumented with tracing that can be enabled or disabled


Example: MF_Driver_Cdm (Cdm.sys)
Controllers:

Enables/Disables a provider
Configures trace capture settings
Starts/Stops a trace
Consumer:

Reads trace events from log file


Reads trace events real-time from a trace session
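The provider, controller and consumer roles above can be modeled in a few lines. This is a conceptual sketch only, not the ETW API: the class names are hypothetical, the session buffer is a simple in-memory queue, and MF_Driver_Cdm is used only as a provider name. It shows why a disabled provider costs almost nothing and why a consumer sees only events from providers that a controller has enabled.

```python
from collections import deque


class Session:
    """A trace session: events from enabled providers land in its buffers."""
    def __init__(self):
        self.buffers = deque()


class Provider:
    """A module instrumented with trace points (e.g. MF_Driver_Cdm)."""
    def __init__(self, name: str):
        self.name = name
        self.session = None  # set by a controller when enabled

    def trace(self, message: str) -> None:
        # Disabled provider: one cheap check, nothing is recorded.
        if self.session is not None:
            self.session.buffers.append((self.name, message))


class Controller:
    """Enables/disables providers for a session."""
    def __init__(self, session: Session):
        self.session = session

    def enable(self, provider: Provider) -> None:
        provider.session = self.session

    def disable(self, provider: Provider) -> None:
        provider.session = None


class Consumer:
    """Reads events from the session (here: drains the in-memory buffer)."""
    def __init__(self, session: Session):
        self.session = session

    def read(self):
        while self.session.buffers:
            yield self.session.buffers.popleft()
```

A hybrid tool in the style of CDFControl simply holds both a `Controller` and a `Consumer` for the same session.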

CDFControl v2.5
CDFControl is a hybrid controller and consumer
It can start/stop/enable and configure an ETW/CDF trace session
It can consume (read) trace events from a log file, or from a live realtime trace session
The original version operated only as an ETW controller, and was published under CTX111961

CDFControl 2.5 Demo

Troubleshooting Scenarios

Troubleshooting scenarios
Application Streaming

Database

Seamless/Multi-Monitor

Network

3rd Party Applications

Black Hole Effect

CPU Spikes

XenApp Plugin (PNA)

Deadlocks/Hangs

Debugging

Application Streaming
What happens on the client side?
[Figure: the end user's streaming client and AIE receive the RAD file, then the manifest file, AIE rules, executables (.exes), .dlls, data files and other files from the network file servers]

1. The end user launches the application from Web Interface or PN Agent
2. The RAD file is downloaded
3. The RAD file launches the client Application Isolation Environment (AIE)
4. The RAD file instructs the streaming client to download the manifest file, AIE rules, application executable, and pre- and post-execution scripts
5. The streaming client launches the executable according to the instructions in the manifest file and AIE rules, including the pre- and post-execution scripts, and registers with ctxsbx.sys (the redirector)
6. The application is available to the user
7. The streaming client requests additional files as required, checking the client cache first and, if necessary, downloading them from the file server
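Step 7 is a cache-first lookup. A minimal sketch of that behavior, assuming a dictionary stands in for the client cache and a caller-supplied `download` function stands in for the file-server fetch (both hypothetical, not the streaming client's real interface):

```python
from typing import Callable, Dict


def get_file(path: str, cache: Dict[str, bytes],
             download: Callable[[str], bytes]) -> bytes:
    """Return file contents, checking the client cache first and
    downloading from the file server only on a cache miss."""
    if path in cache:
        return cache[path]      # cache hit: no network traffic
    data = download(path)       # cache miss: fetch from the file server
    cache[path] = data          # keep it for the next request
    return data
```

When a streamed application is slow only on first use, this pattern is the reason: the first request for each file pays the download cost, later ones are served locally.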

Application Streaming
Isolate the Issue
When?
Profiling
Publishing
Streaming

How?
Streaming to Server
Streaming to Client

Versions?
WI 4.5, 5.0
License server 4.5, 5.0
Client

Application Streaming
Streaming Client Troubleshooting:
The streaming client must be installed on workstations
Verify that the Citrix Streaming Service is started, or restart it
See CTX116483 for the required permissions
Enable debug console
HKEY_LOCAL_MACHINE\Software\Citrix\Rade
REG_DWORD: EnableDebugConsole
Value: 1 to switch on, 0 to switch off

Application Streaming
Leverage realtime CDF tracing!
Run CDFControl on the client (where the streaming client is installed)
Choose the Application Streaming category
Enable realtime tracing
Provide a TMF path (CTX106233)
Start tracing and reproduce the launch failure

Seamless/Multi-Monitor
SEAMLESS HOST COMPONENTS
[Figure: Winlogon and Default desktops; winlogon.exe; sehook20.dll; icast.exe; TWISysTrayAgent; icactls.dll; seamls20.dll; TWIWorker and TWIReader; wfshell.exe; connected to the ICA Client]

Seamless/Multi-Monitor
SEAMLESS CLIENT COMPONENTS
[Figure: wfica32.exe, vdtwin30.dll, LVB, vdtwn.dll, ctxsrcc.lib, GAI]

Seamless/Multi-Monitor
Multi-Monitor
An optional component
The client provides the monitor layout over the ThinWire channel; the layout is shared, via shared memory, by all processes that load mmhook.dll
Work area changes are always posted to the host. A change can come from a change in the work area of an existing monitor, or from a change in the virtual screen size when monitors are added or removed.
API hooks are controlled by flags and can be customized per process. Refer to CTX115637 for the various configuration options

Seamless/Multi-Monitor
Press Shift+F2 to switch to full-screen mode
Reconnect as a fixed-size window session
Set the global flags to 0x26DEA7 to see if this fixes the issue.
This value is a combination of flags, including the following (see CTX101644 for details of each bit):
0x1 (Disable session sharing), 0x2 (Disable modality check), 0x4 (Disable AA hook)
Analyze CDF trace for MF_DLL_CTXNOTIF and MF_SESSION_TWI
Analyze window information using SPY++/Window History/Message History
Try per-window exception flags
Analyze application logic (API flow) using TracePlus utility
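Because the global flags value is a bitmask, it helps to split it into individual bits before deciding which one to toggle. 0x26DEA7 has 14 bits set, including 0x1, 0x2 and 0x4. The sketch below names only the three flags listed above; every other bit is left for lookup in CTX101644 rather than guessed.

```python
# Names taken from the list above; other bits are documented in CTX101644.
KNOWN_FLAGS = {
    0x1: "Disable session sharing",
    0x2: "Disable modality check",
    0x4: "Disable AA hook",
}


def decode_flags(mask: int) -> list:
    """Split a flag mask into (bit value, description) pairs, lowest bit first."""
    bits = []
    bit = 1
    while bit <= mask:
        if mask & bit:
            bits.append((bit, KNOWN_FLAGS.get(bit, "see CTX101644")))
        bit <<= 1
    return bits


for value, meaning in decode_flags(0x26DEA7):
    print(f"{value:#08x}  {meaning}")
```

To test a single flag, clear it from the combined value (for example `0x26DEA7 & ~0x4`) and retry, so each bit's effect can be isolated.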

Seamless/Multi-Monitor
Get the Window class name which is exhibiting the problem
Collect the CDF traces for concerned module ONLY
CTXNOTIF, MMHOOK, TWCDS, TWI, TWI_HOOK

Analyze which behavior could be affected by the hooks

Does it also happen on a single monitor? If yes, the multi-monitor hook (mmhook) is unlikely to be the cause. If not, disable mmhook and see what happens
Compare the window styles on the host and the client
For seamless-specific issues, verify whether the problem also occurs in an ICA desktop or RDP session

3rd Party Applications


How does the application work?

Is it native, or does it run on a framework such as .NET or Java?

Do you have the right versions of the framework installed?
Are the correct dependencies present, and does it work at the console?
Does it require certain file and registry access (for example, write permissions)?
Does it require component registration?

Inspect core functionality

View the application/process under an analysis tool such as Process Explorer or WinDbg

Inspect all modules (DLLs) loaded by the application
Validate any dependencies (missing DLLs?)
Inspect named events and handle usage (synchronization/resource problems?)
Validate file and registry access using Process Monitor
Run the application under the AppVerifier utility to check for a multitude of issues

3rd Party Applications


Leverage the Global Flags for user-mode applications using the Gflags utility
Configure the 3rd party application under Image File Execution Options
Configure a debugger (such as WinDbg) to be invoked when the application starts
When the application launches, the debugger will automatically attach to the process and halt its execution!
This gives the opportunity to explore all application threads from process initialization (~*kb)
From here the internals of the application can be understood at the native Windows API level (i.e., which Windows APIs are being used)

3rd Party Applications


Use Process Explorer to view the loaded modules for a process, and check for the presence of any hook modules (hooking DLLs)
Hook modules can alter the natural behavior of applications, which can sometimes cause problems
Try excluding the problem application from all Citrix hooks (CTX107825)

CPU Spikes
Try to define a pattern (leverage perfmon)
Determine the offending thread ID causing the spike (Process Explorer, QSlice)
Obtain a userdump of the offending process immediately, while the spike is happening (Userdump.exe, WinDbg.exe)
Check the CDF trace for repeated (looping) messages (if it is a Citrix component)
Use an application spy to look at what the application is doing (TracePlus, Logger)
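Tools such as Process Explorer find the offending thread by comparing each thread's accumulated CPU time between two samples. A sketch of that arithmetic, assuming you already have per-thread CPU seconds for two points in time (how you collect them is out of scope); threads that appear only in the second snapshot are counted from zero.

```python
def thread_cpu_percent(before: dict, after: dict, interval_s: float,
                       cpus: int = 1) -> list:
    """Given {thread_id: cpu_seconds} snapshots taken interval_s apart,
    return (thread_id, percent_of_all_cpus) pairs, busiest first."""
    usage = []
    for tid, cpu_after in after.items():
        delta = cpu_after - before.get(tid, 0.0)  # CPU time used in the interval
        usage.append((tid, 100.0 * delta / (interval_s * cpus)))
    return sorted(usage, key=lambda item: item[1], reverse=True)
```

The thread at the top of the list is the one whose call stack (`~<n>s` then `kb` in the debugger) is worth examining in the userdump.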

Deadlocks
Windows Vista and Server 2008 offer the new Wait Chain Traversal (WCT) API!
This offers applications a mechanism to check internally for wait conditions, and also allows custom tools to be created which can also check for application hangs
The Windows Task Manager can capture user dumps live in Vista & 2008!!!
No cool WCT tools available? The debugger is your friend!
Attach to hung process/service and generate a dump for post-mortem analysis:
.dump /ma c:\PathToDump\DeadlockedApp.dmp

Manually inspect thread states, and get the debugger's opinion with:
!analyze -hang -v
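Wait chain analysis comes down to following "thread waits on object, object is owned by thread" links until the chain ends or loops back on itself; a loop is a deadlock. A simplified sketch, assuming each thread waits on at most one lock and each lock has at most one owner (WCT itself reports richer node types, such as critical sections, mutexes and ALPC); the thread and lock names are hypothetical.

```python
def wait_chain(start: str, waits_on: dict, owner: dict) -> tuple:
    """Follow thread -> awaited lock -> owning thread, starting at `start`.

    waits_on maps a thread to the lock it is blocked on;
    owner maps a lock to the thread that currently holds it.
    Returns (list of threads in the chain, True if the chain is a cycle).
    """
    chain = [start]
    seen = {start}
    thread = start
    while thread in waits_on:
        holder = owner.get(waits_on[thread])
        if holder is None:
            break  # the awaited lock is free, so this wait can complete
        chain.append(holder)
        if holder in seen:
            return chain, True  # back to a thread already in the chain
        seen.add(holder)
        thread = holder
    return chain, False
```

In a dump, the same walk is done by hand: find what each blocked thread is waiting on, find the owner of that object, and repeat.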

Slow logons
Understand the logon process and identify the slowdown!
Validate via a network trace that the connection between server and client is good
If the connection makes it to the server, check which processes exist
Use Task Manager and sort by session ID
Gather userdumps of each process in the slow session to try to identify any synchronization problems, such as LPC and ALPC wait chain conditions
Ensure Terminal Services is running (svchost.exe) and that its thread count appears normal
Ensure critical Citrix processes are okay, such as IMA, CpSvc and XML
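Once you have time stamps for the logon milestones (from traces, event logs or even a stopwatch), identifying the slowdown means finding the longest gap between consecutive milestones. A sketch with hypothetical milestone names:

```python
def slowest_phase(milestones: list) -> tuple:
    """milestones: (name, seconds) pairs in the order they occurred during logon.
    Returns (from_milestone, to_milestone, gap_seconds) for the largest gap."""
    if len(milestones) < 2:
        raise ValueError("need at least two milestones")
    pairs = zip(milestones, milestones[1:])  # consecutive milestone pairs
    (start, t0), (end, t1) = max(pairs, key=lambda p: p[1][1] - p[0][1])
    return start, end, t1 - t0


# Hypothetical logon timeline for one slow session.
timeline = [("connect", 0.0), ("authenticated", 2.0),
            ("profile loaded", 40.0), ("shell started", 44.0)]
print(slowest_phase(timeline))
```

The phase with the largest gap tells you which processes in the session to dump first.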

The XenApp client


PNAgent.exe starts up and communicates with PNAMain.exe to share application launch and shortcut details
PNAMain.exe initiates communication with the Web Server for application requests and config.xml settings
WFCRun32.exe works with WFICA32.exe to launch an application
It is best to use a live-debug approach, as there is no built-in tracing readily available on the client

The XenApp client


For single sign-on problems ensure:
PNSSON is at the top of the network provider list
SSONSVR is running
Nothing is causing logon delays (such as 3rd party monitoring applications), as this would cause the SSON ticket to expire and SSONSVR to exit
Enable a default debugger to catch any unexpected termination of the client processes
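The network provider order is stored as a comma-separated list in the ProviderOrder value under HKLM\SYSTEM\CurrentControlSet\Control\NetworkProvider\Order. The sketch below only parses and rewrites that string; reading and writing the registry is deliberately left out, and the example provider lists are illustrative.

```python
def _entries(provider_order: str) -> list:
    """Split the ProviderOrder string into trimmed, non-empty entries."""
    return [p.strip() for p in provider_order.split(",") if p.strip()]


def pnsson_first(provider_order: str) -> bool:
    """True if PNSSON is the first network provider (case-insensitive)."""
    entries = _entries(provider_order)
    return bool(entries) and entries[0].upper() == "PNSSON"


def with_pnsson_first(provider_order: str) -> str:
    """Return the provider order with PNSSON moved (or added) to the front."""
    rest = [p for p in _entries(provider_order) if p.upper() != "PNSSON"]
    return ",".join(["PNSSON"] + rest)
```

Back up the registry value before writing a reordered string, and test a logon afterwards.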

Debugging
User Mode versus Kernel Mode
The Windows operating system can be conceptually divided into two parts:
User Space (User Mode)
Kernel Space (Kernel Mode)

Applications run in User Mode


System drivers run in Kernel Mode (Privileged Mode)

[Figure: user space holds the user-mode applications; kernel space holds drivers such as rusb2w2k.sys, keyboard.sys, win32k.sys and tcpip.sys]

Debugging
Windows Vista and Server 2008 no longer rely on boot.ini for debug settings
Say hello to the BCDEdit utility!
(http://technet.microsoft.com/en-us/library/cc721886.aspx)

To do a live local debug, you need to first enable debugging on the server
bcdedit /debug on (requires a reboot)

Debugging
In the event of a system crash (BSOD), ensure that:
1. The pagefile (pagefile.sys) is configured on the system drive (where Windows is installed)
2. The pagefile is larger than the amount of physical RAM on the server
3. Startup and recovery options are set for a kernel or complete memory dump
4. Enough space exists to write the dump file
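The four preconditions above can be expressed as a simple checklist. A sketch, assuming you have already gathered the values (the `DumpConfig` fields are hypothetical) and assuming, conservatively, that the dump may need as much free space as there is physical RAM:

```python
from dataclasses import dataclass


@dataclass
class DumpConfig:
    pagefile_on_system_drive: bool
    pagefile_mb: int
    physical_ram_mb: int
    dump_type: str       # e.g. "kernel", "complete", "small", "none"
    free_space_mb: int   # free space where the dump file will be written


def dump_problems(cfg: DumpConfig) -> list:
    """Return a list of reasons a crash dump might not be captured."""
    problems = []
    if not cfg.pagefile_on_system_drive:
        problems.append("pagefile is not on the system drive")
    if cfg.pagefile_mb <= cfg.physical_ram_mb:
        problems.append("pagefile is not larger than physical RAM")
    if cfg.dump_type not in ("kernel", "complete"):
        problems.append("not set to a kernel or complete memory dump")
    if cfg.free_space_mb < cfg.physical_ram_mb:  # conservative assumption
        problems.append("possibly not enough free space for the dump file")
    return problems
```

An empty list means the server is at least configured to write a dump the next time it crashes.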

Debugging
To debug application crashes, configure a default application
debugger to handle fatal application errors!
Dr. Watson is gone in Vista and Server 2008
Manually configure a default application debugger (CTX105888)
Use the TestDefaultDebugger tool to ensure that the server is able to capture userdumps (CTX111901)

Debugger Basics
ntsd -pn ProcessName: attaches to a running process
~*kb: dumps the call stacks of all threads
x *!*Symbol*: searches for symbols matching the pattern specified
bp: sets a breakpoint (typically used with a symbol)
kb: dumps the call stack of the current thread
!analyze -v: analyzes exceptions
!analyze -hang -v: analyzes wait chains (hangs)

Debugger Basics (The Call Stack)


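[Figure: annotated call stack output showing the thread number, PID and TID, switching to thread 4 (~4s), the function parameters (first parameter, second parameter, first parameter off the stack), and each frame's module name, function name and offset]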
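The same fields can be pulled out of `kb` output programmatically, for example to count how often a module appears across many thread stacks. This sketch assumes the x86 `kb` layout (ChildEBP, RetAddr, three "Args to Child" values, then module!function+offset); x64 output uses different columns, frames without symbols will not match, and the sample line in the test is illustrative.

```python
import re
from typing import Optional

# x86 "kb" frame: ChildEBP RetAddr Arg1 Arg2 Arg3 module!function+offset
FRAME = re.compile(
    r"^(?P<child_ebp>[0-9a-f]{8})\s+(?P<ret_addr>[0-9a-f]{8})\s+"
    r"(?P<arg1>[0-9a-f]{8})\s+(?P<arg2>[0-9a-f]{8})\s+(?P<arg3>[0-9a-f]{8})\s+"
    r"(?P<module>[^!\s]+)!(?P<function>[^+\s]+)(?:\+(?P<offset>0x[0-9a-f]+))?",
    re.IGNORECASE,
)


def parse_frame(line: str) -> Optional[dict]:
    """Split one kb stack frame into its fields; None if it is not a frame."""
    match = FRAME.match(line.strip())
    return match.groupdict() if match else None
```

`arg1` corresponds to the "first parameter" column in the annotated output; `module`, `function` and `offset` identify where in the code each frame was executing.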

Case Studies

Introducing the Citrix Symbol Server


#1 feedback during SMART post-incident reviews
Traditional data collection/upload/analysis cycle takes too long

Live debugging while the problem is occurring

Significant delays introduced when waiting on large uploads to complete
Resources are strained during CritSits; keep the focus on issue resolution

64-bit adoption increasing


Full system dump files will get larger
Significantly longer upload times

Citrix Symbol Server: The Payoff (A Case Study)


A critical Citrix service is crashing on startup
Users unable to connect

Debugger attached to process at startup


Crash caused by heap corruption

Full page heap enabled


New stack trace points to root cause
Case archives reveal that problem is resolved with an existing hotfix

Time to resolve
With symbol server: less than 1 hour
Estimated time without symbol server: more than 1 business day

Using the Citrix Symbol Server


Products supported
Citrix Presentation Server 3.0, 4.0 and 4.5 (all languages and hotfixes)
XenApp 5.0 (all languages and hotfixes)

Location
Add http://ctxsym.citrix.com/symbols to your symbol path
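Debuggers accept a symbol path made of `srv*<local cache>*<server URL>` elements separated by semicolons. A small sketch that builds one, assuming a local cache at c:\symbols and pairing the Citrix server with Microsoft's public symbol server (the pairing and the cache path are illustrative choices, not requirements):

```python
MICROSOFT_SYMBOLS = "http://msdl.microsoft.com/download/symbols"
CITRIX_SYMBOLS = "http://ctxsym.citrix.com/symbols"


def symbol_path(cache_dir: str,
                servers: tuple = (MICROSOFT_SYMBOLS, CITRIX_SYMBOLS)) -> str:
    """Build a srv*cache*server symbol path, one element per symbol server."""
    return ";".join(f"srv*{cache_dir}*{url}" for url in servers)


print(symbol_path(r"c:\symbols"))
```

Set the result as the `_NT_SYMBOL_PATH` environment variable, or in WinDbg pass it to `.sympath` and then run `.reload`.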

Questions / Feedback
Article CTX118622 on Citrix Knowledge Base (http://support.citrix.com)
Send additional feedback to symsrv@citrix.com

Case Study: CDFControl Realtime Tracing Demo

Questions?
