
How To Debug CCI Issues Version 1.3


Introduction
This short document describes some of the many ways CCI can break. It should be used as
a list of the things to check with a customer whenever CCI problems are raised.
Check: This tag is used to highlight what to look for.

Installation and Environment Variables


UNIX
You would think it is easy to get this right, but we see many errors caused by bad
installation. Here is what should be done. Let us assume a non-default installation, as
that is the easiest for the user to get wrong.
Let us assume we want to install CCI in /opt/HORCM.
1 Copy the installation file to the hard disk somewhere. It is called RMHORC. Let's
copy it to /var/tmp.
2 cd /opt
Change directory to where you want /HORCM created.
3 cpio -idmu < /var/tmp/RMHORC
This will copy all the files in the RMHORC package to /opt/HORCM.

Check: We have seen cases where users build their own installation packages for
HORCM and then copy the files from one host to another. In UNIX particularly this is
dangerous. CCI needs a hidden directory called .uds or it will not start.
In CCI 01-16-03 and below it was in /var/tmp.
In CCI 01-17-03 and above it is in /yourdirectory/HORCM.
This directory contains UNIX sockets (note the leading 's' in the ls output) when the
instances are started. They are deleted when the instance stops. Thus, you will see this:
root@SYD-E250-1:/opt/HORCM/.uds#:ls -al
total 4
drwxrwxrwx   2 root     sys          512 Feb 22 15:26 .
dr-xr-xr-x  12 root     sys          512 Feb 22 15:04 ..
root@SYD-E250-1:/opt/HORCM/.uds#:horcmstart.sh 4
starting HORCM inst 4
HORCM inst 4 starts successfully.
root@SYD-E250-1:/opt/HORCM/.uds#:ls -al
total 6
drwxrwxrwx   3 root     sys          512 Feb 22 16:29 .
dr-xr-xr-x  12 root     sys          512 Feb 22 15:04 ..
drwxrwxrwx   2 root     other        512 Feb 22 16:29 .lcmcl04
srwxrwxrwx   1 root     other          0 Feb 22 16:29 .lcmep04

Mike Le Voi
15/12/2012

Always check for this directory if you have a case where CCI does not start.
4 ln -s /opt/HORCM /HORCM
Check: You must create a link or the install in the next step will fail.
5 /HORCM/horcminstall.sh
Check: You must do this on UNIX to create links to the CCI commands.
6 raidqry -h

Here is what you see if the user has done everything right.
root@SYD-E250-1:/opt/HORCM/.uds#:raidqry -h
Model : RAID-Manager/Solaris
Ver&Rev: 01-19-03/04
Usage : raidqry [options] for HORC
 -h                 Help/Usage
 -I[#]              Set to HORCMINST#
 -IH[#] or -ITC[#]  Set to HORC mode [and HORCMINST#]
 -IM[#] or -ISI[#]  Set to MRCF mode [and HORCMINST#]
 -z                 Set to the interactive mode
 -zx                Set to the interactive mode and HORCM monitoring
 -q                 Quit(Return to main())
 -g                 Specify for getting all group name on local
 -l                 Specify the local query
 -r <group>         Specify the remote query
 -f                 Specify display for floatable host

Check: Always get the user to run this command and send you the output.

If the user is using an old version of CCI, ask why. 01-19-03/04 or later is preferred for
many reasons which will be discussed later.
In some special cases, like HPtM, a specific level of CCI may be stated in the ECN or
Release Notes. In this case, it may be advisable to stick with that level. Also, the
microcode ECNs for 9900V and USP always recommend a CCI level.
However, in my experience, CCI is always backwards compatible, and the developer
has confirmed this, so one should always use at least the minimum level stated in the ECNs.
Any command this user issues will be assumed to be TrueCopy (refer to "for HORC" in the
Usage line above). If the user is trying to perform a ShadowImage operation, you now know
why it is failing.
No instance has been set. Here is what you see if the instance variable has been set:

root@SYD-E250-1:/opt/HORCM/.uds#:export HORCMINST=4
root@SYD-E250-1:/opt/HORCM/.uds#:raidqry -h
Model : RAID-Manager/Solaris
Ver&Rev: 01-19-03/04
Usage : raidqry [options] for HORC[4]
 -h                 Help/Usage



If the user is trying to control Instance 1, now you know why it is failing.
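
The UNIX installation checks above can be scripted. The following is a minimal sketch only, assuming CCI 01-17-03 or above (so that .uds lives under the installation directory). The function name check_cci_install and its arguments are made up for illustration and are not part of CCI.

```shell
# Sketch: verify the two installation prerequisites described above.
# Usage: check_cci_install <install_dir> [link_path]
# check_cci_install is a hypothetical helper, not a CCI command.
check_cci_install() {
    base=$1
    link=${2:-/HORCM}
    rc=0
    # CCI 01-17-03 and above keep the hidden .uds directory under the install dir.
    if [ ! -d "$base/.uds" ]; then
        echo "missing directory: $base/.uds"
        rc=1
    fi
    # For a non-default install, /HORCM must exist as a link to the install dir.
    if [ ! -e "$link" ]; then
        echo "missing link: $link"
        rc=1
    fi
    return $rc
}
```

For the installation in this document, the call would be check_cci_install /opt/HORCM. If it reports nothing, go on to raidqry -h as in step 6.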
Windows
Installation is easier. Double click the EXE and follow the bouncing ball. I always
recommend taking the default of C:\HORCM. The directory is only about 10 MB in size,
so we are not likely to fill the drive.
Check: The same rule applies as for UNIX. Always ask for raidqry output.

Environment Variables
Here is a ShadowImage example.
C:\HORCM\ETC>set horcminst=4
C:\HORCM\ETC>set horcc_mrcf=1
C:\HORCM\ETC>raidqry -h
Model : RAID-Manager/WindowsNT
Ver&Rev: 01-19-03/04
Usage : raidqry [options] for HOMRCF[4]
 -h                 Help/Usage
 -I[#]              Set to HORCMINST#
 -IH[#] or -ITC[#]  Set to HORC mode [and HORCMINST#]
 -IM[#] or -ISI[#]  Set to MRCF mode [and HORCMINST#]
 -z                 Set to the interactive mode

Check: horcminst is case insensitive on Windows; it is case sensitive on UNIX.

Change mode of operation to TrueCopy:


C:\HORCM\ETC>set horcc_mrcf=
C:\HORCM\ETC>raidqry -h
Model : RAID-Manager/WindowsNT
Ver&Rev: 01-19-03/04
Usage : raidqry [options] for HORC[4]

Now try this on UNIX. Change mode of operation:


root@SYD-E250-1:/opt/HORCM/.uds#:export HORCC_MRCF=1
root@SYD-E250-1:/opt/HORCM/.uds#:raidqry -h
Model : RAID-Manager/Solaris
Ver&Rev: 01-19-03/04
Usage : raidqry [options] for HOMRCF[4]
root@SYD-E250-1:/opt/HORCM/.uds#:export HORCC_MRCF=
root@SYD-E250-1:/opt/HORCM/.uds#:raidqry -h
Model : RAID-Manager/Solaris
Ver&Rev: 01-19-03/04
Usage : raidqry [options] for HOMRCF[4]



Check: This does not work. On UNIX, setting the variable to an empty value is not
enough; you must unset it:

root@SYD-E250-1:/opt/HORCM/.uds#:unset HORCC_MRCF
root@SYD-E250-1:/opt/HORCM/.uds#:raidqry -h
Model : RAID-Manager/Solaris
Ver&Rev: 01-19-03/04
Usage : raidqry [options] for HORC[4]
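
The difference between an empty variable and an unset one is easy to test for in the shell. The sketch below only mirrors the behaviour shown in the Solaris sessions above; cci_mode is a made-up helper name, not a CCI command, and it does not ask CCI anything.

```shell
# Sketch: predict which mode raidqry -h will report on UNIX.
# cci_mode is a hypothetical helper; it only inspects the environment.
cci_mode() {
    # ${HORCC_MRCF+set} expands to "set" whenever the variable exists,
    # even when its value is empty (as after: export HORCC_MRCF=).
    if [ -n "${HORCC_MRCF+set}" ]; then
        echo "HOMRCF"
    else
        echo "HORC"
    fi
}
```

After export HORCC_MRCF= this prints HOMRCF; after unset HORCC_MRCF it prints HORC, which matches the raidqry output above.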

Sending logs to GSC


If you have to escalate the problem to GSC, we will need the complete set of HORCM
logs and all the HORCM CONF files. In general, the preferred method of doing this is to
run getconfig. These executables/scripts are available on the TUF web site
(https://tuf.hds.com).
If for any reason you do not run these scripts, you must zip up all the LOG directories
underneath the HORCM directory. Never pick and choose which log to upload: many of
them have the same name and GSC may need to refer to all of them!
In addition, the factory always asks for the output from these commands (Windows only):
inqraid -CLI -fgx $Phys
inqraid -CLI -fgvx $Vol
inqraid -CLI -fgx $LETALL
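
On a UNIX host where getconfig has not been run, collecting every log directory in one go might look like the sketch below. It uses tar in place of zip and assumes the log directories are named log followed by the instance number, as in /opt/HORCM/log4 later in this document. The function name collect_horcm_logs is an invention for this example.

```shell
# Sketch: archive every log directory under the HORCM directory, whole.
# Usage: collect_horcm_logs <horcm_dir> <absolute_path_of_output_tar>
# collect_horcm_logs is a hypothetical helper, not part of CCI.
collect_horcm_logs() {
    horcm_dir=$1
    out=$2
    # Run in a subshell so the caller's working directory is unchanged.
    # log* picks up log0, log4, log8, ... without choosing between them.
    ( cd "$horcm_dir" && tar cf "$out" log* )
}
```

For example, collect_horcm_logs /opt/HORCM /var/tmp/horcm_logs.tar. The HORCM CONF files still have to be sent separately.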

Finding Command Devices


You cannot create a HORCM CONF file or check it for accuracy without doing INQRAID
commands for UNIX/Windows and RAIDSCAN commands for Windows.
UNIX
Check: Get the user to send you the result of this command:

root@SYD-E250-1:/opt/HORCM/.uds#:ls /dev/rdsk/* | inqraid -CLI -fxg
DEVICE_FILE   PORT      SERIAL  LDEV  CTG  H/M/12  SSID  R:Group  PRODUCT_ID
...
c2t0d16s2     CL1-A-4    10111     0                              OPEN-V-CM
c2t2d36s2
c2t4d0s2      CL1-A-20   20169    43                              OPEN-V-CM
c2t6d0s2      CL1-A-11   80025   31F                              OPEN-V-CM
c2t6d14s2
c3t2d128s2    CL2-A-6    10262  2180                              OPEN-V-CM
c3t3d4s2      CL2-A-7     3157     1                              DF600F-CM
...

Here are 5 command devices; 2 are in bold. The first is a USP, the second is a 9570V. If
the user wants to use the first one, they need to code:
/dev/rdsk/c2t6d0s2
in the HORCM CONF file.


Windows

Check: Get the user to send you the result of these commands:

C:\HORCM\ETC>raidscan -x findcmddev h0,20
cmddev of Ser# 10111 = \\.\PhysicalDrive2
cmddev of Ser# 10111 = \\.\PhysicalDrive5
cmddev of Ser#    41 = \\.\PhysicalDrive7
cmddev of Ser# 10262 = \\.\PhysicalDrive8
cmddev of Ser# 80025 = \\.\PhysicalDrive10
cmddev of Ser# 20169 = \\.\PhysicalDrive11
cmddev of Ser# 20169 = \\.\Volume{3c107ab6-7dbf-11db-a1ed-000e0c6abf1d}

Check: Do not use ANY of these names! If you find a user using this syntax, ask
that it be changed. See the INQRAID output below.
Harddisk numbers can change after a reboot.
GUID numbers can change in an MS Cluster environment after a reboot.
Do yourself a favour: do not use these names.

C:\HORCM\ETC>inqraid $Phys -CLI
DEVICE_FILE  PORT   SERIAL  LDEV  CTG  H/M/12  SSID  R:Group  PRODUCT_ID
Harddisk0                                                     00JS-22MHB0
Harddisk1                                                     00JS-22MHB0
Harddisk2    CL1-A  10111    515  -    -       -     -        OPEN-V-CM
Harddisk3    CL1-A  10111   1920  -    s/s/ss  9997  5:03-02  OPEN-V
Harddisk4    CL1-A  10111    768  -    s/s/ss  9993  5:06-02  OPEN-V
Harddisk5    CL1-A  10111   1856  -    -       -     -        OPEN-V-CM
Harddisk6    CL1-A  10111   2632  -    P/s/ss  999A  5:06-02  OPEN-V
Harddisk7    CL1-A     41      0  -    -       -     -        DF600F-CM
Harddisk8    CL1-A  10262     16  -    -       -     -        OPEN-V-CM
Harddisk9    CL1-A  10262   8320  -    s/s/ss  2000  5:01-05  OPEN-V
Harddisk10   CL1-A  80025    784  -    -       -     -        OPEN-V-CM
Harddisk11   CL1-A  20169     13  -    -       -     -        OPEN-V-CM

\\.\CMD syntax has been around since 01-17-03/05. There is no reason not to use it! If the
user is running 01-17-03/05 or below, get them to use 01-19-03/04 or higher.
In this case, for Harddisk8/USP 10262, the correct syntax in the HORCM CONF file is:
\\.\CMD-10262-16
or even
\\.\CMD-10262-16-CL1-A-12 if you know this is HSD 12, or
\\.\CMD-10262-16-CL1-A
or, for slack people,
\\.\CMD-10262

\\.\CMD-10262-16 is my preferred coding technique as this takes care of multipath
environments as well.
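
When preparing a deck for a user, the preferred name can be built from the SERIAL and LDEV columns of the inqraid output. This is only a sketch: cmddev_name is a hypothetical helper, and it covers only the serial-plus-LDEV form shown above.

```shell
# Sketch: build the preferred \\.\CMD-<serial>-<ldev> name.
# Usage: cmddev_name <serial> <ldev>
# cmddev_name is a hypothetical helper, not a CCI command.
cmddev_name() {
    # In a printf format each \\ prints one backslash, so this prints \\.\CMD-...
    printf '\\\\.\\CMD-%s-%s\n' "$1" "$2"
}
```

With the values for Harddisk8 above, cmddev_name 10262 16 prints \\.\CMD-10262-16.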

Starting HORCM Instances


There are so many ways for this to fail that I could write a book on this topic.
So, always take the easy way out. Send the user a deck that is bound to work. If it does
not, then you have very little to debug. Here is such a deck, HORCM4.CONF:
UNIX

HORCM_MON
#ip_address     service     poll(10ms)     timeout(10ms)
10.129.2.53     11004       1000           3000

HORCM_CMD
#dev_name               dev_name        dev_name        dev_name
/dev/rdsk/c2t6d0s2

HORCM_DEV
#dev_group      dev_name    port#   TargetID    LU#     MU#

HORCM_INST
#dev_group      ip_address  service

There are only 3 things to check:

Is the IP address correct? Note: You can use localhost here, but this will not work
for TC environments using 2 different CCI servers.
Is 11004 a free UDP port? Almost certainly it is.
Is the CMDDEV right? You can tell that from the commands we have already issued.

UNIX HORCM CONF files are kept in /etc.
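
To pull the first two of these values out of a user's file quickly, something like the following sketch can be used. It assumes the layout of the deck above (section name on its own line, comment lines starting with #); horcm_mon_fields is a made-up name for this example.

```shell
# Sketch: print the ip_address and service fields of the HORCM_MON section.
# Usage: horcm_mon_fields <conf_file>
# horcm_mon_fields is a hypothetical helper, not part of CCI.
horcm_mon_fields() {
    awk '
        /^HORCM_MON/ { in_mon = 1; next }   # start of the HORCM_MON section
        /^HORCM_/    { in_mon = 0 }         # any other section header ends it
        # first non-empty, non-comment line inside the section
        in_mon && NF && $1 !~ /^#/ { print $1, $2; exit }
    ' "$1"
}
```

For the HORCM4.CONF above this prints the IP address followed by 11004, which are the two values to check against the host and the free UDP ports.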


Windows
Here is HORCM8.CONF for Windows:
HORCM_MON
#ip_address     service     poll(10ms)     timeout(10ms)
10.129.3.127    11008       1000           3000

HORCM_CMD
#dev_name               dev_name        dev_name        dev_name
\\.\CMD-10262-16

HORCM_DEV
#dev_group      dev_name    port#   TargetID    LU#     MU#

HORCM_INST
#dev_group      ip_address  service

Use the same logic as for UNIX. Windows HORCM CONF files are in C:\WINDOWS.
Other recommendations:

HDvM uses HORCM CONF files called HORCM900.CONF to HORCM988.CONF
for temporary HORCM CONF files. Do not use these numbers yourself.
I suggest that you use 0-799 for user created files and 800-899 for HDvM created
permanent HORCM CONF files.
I also suggest a port numbering convention of 11000 + x, where x is the number in
HORCMx.CONF (so HORCM4.CONF uses 11004). This means that you will need to reserve
UDP ports 11000 to 11899 for HORCM CONF usage.
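
The numbering convention suggested above is simple arithmetic, sketched here as a small shell function. horcm_port is a hypothetical name, and the 0-899 limit is just the range recommended in this document, not something CCI enforces.

```shell
# Sketch: UDP port for instance x under the suggested 11000 + x convention.
# Usage: horcm_port <instance_number>   (decimal, no leading zeros)
# horcm_port is a hypothetical helper, not a CCI command.
horcm_port() {
    inst=$1
    case $inst in
        ''|*[!0-9]*) echo "not a number: $inst" >&2; return 1 ;;
    esac
    # 900-988 are used by HDvM for temporary files, so stay at 899 or below.
    if [ "$inst" -gt 899 ]; then
        echo "instance $inst is outside the suggested 0-899 range" >&2
        return 1
    fi
    echo $((11000 + inst))
}
```

So horcm_port 4 gives 11004 and horcm_port 8 gives 11008, matching the two decks above.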


Updating the Services file


Many people code HORCM CONF files like this:
HORCM_MON
#ip_address     service     poll(10ms)     timeout(10ms)
10.129.3.127    horcm8      1000           3000

In this case, the UDP port horcm8 must be defined in the Services file:
Windows   C:\WINDOWS\system32\drivers\etc\services
UNIX      /etc/services

Like this:
horcm0    11000/udp
horcm1    11001/udp
horcm8    11008/udp
horcm9    11009/udp
(blank line)

Check: Under Windows, if there is no blank line after horcm9 (in this example),
that definition will be ignored! PS: No blank lines at the end of the HORCM CONF file,
please.
Check: If you have 2 CCI servers, using horcm8 and horcm9, for example, then
both horcm8 and horcm9 have to be defined on both servers!
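
Checking that a service name is really defined takes one awk call. The sketch below only looks for a name with a udp port on the same line; has_horcm_service is a made-up helper name, and it does not test the Windows blank-line rule mentioned above.

```shell
# Sketch: succeed if <service_name> is defined as a udp port in a services file.
# Usage: has_horcm_service <services_file> <service_name>
# has_horcm_service is a hypothetical helper, not part of CCI.
has_horcm_service() {
    awk -v name="$2" '
        # first field is the name, second is port/protocol, e.g. 11008/udp
        $1 == name && $2 ~ /^[0-9]+\/udp$/ { found = 1 }
        END { exit !found }
    ' "$1"
}
```

On each of the two servers, run it once for horcm8 and once for horcm9, for example has_horcm_service /etc/services horcm8.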

Reading the LOGS


Windows
Let's start with Windows this time.
In our example we used Instance 8, so you will find the log here:
C:\HORCM\log8\curlog\horcm_ml_acer510_log.txt
because this server is called ml_acer510.
Let us examine it in detail:
*****************************************************************************
- HORCM STARTUP LOG - Thu Feb 22 18:03:08 2007
*****************************************************************************
18:03:08-39210-05000- horcmgr started on Thu Feb 22 18:03:08 2007
18:03:08-39210-05000- _spawnvp() horcmd_08 using horcmgr [CWD=C:\HORCM\ETC]
18:03:08-3d090-07240- Fibre address conversion TBL has been set to '2'
P.P.   : RAID Manager for WindowsNT
Model  : RAID-Manager/WindowsNT
Ver&Rev: 01-19-03/04
Release: Production(GA)
ALL Rights Reserved, Copyright (c) 1998-2006, Hitachi, Ltd.
HORCM(ml_acer510 7240) started by Administrator (0) on Thu Feb 22 18:03:08 2007

Lots of useful information here. See the data in bold.

18:03:08-3d090-07240- horcmd_08 started on Thu Feb 22 18:03:08 2007
18:03:08-3d090-07240- [horcmcfgrdf] access(conf_file) OK.
18:03:08-3d090-07240- [horcmcfgrdf] access(check) OK.
18:03:08-3d090-07240- [horcmcfgrdf] open(conf_file) OK.
18:03:08-3d090-07240- [horcmcfgetent] fseek(top) OK.
18:03:08-40b28-07240- converted CMDDEV filename \\.\CMD-10262-16 to \\.\PhysicalDrive8

Here is where \\.\CMD syntax is converted to a physical drive number.


18:03:08-40b28-07240- [horcmcfgetent] read(conf_file) OK.
18:03:08-40b28-07240- [horcmcfgrdf] close(conf_file) OK.
18:03:08-40b28-07240- [horcmcfgrdf] check(conf) OK.
18:03:08-40b28-07240- [horcmcfgrdf] horccmddev(0) OK.
18:03:08-40b28-07240- [horcread] cmddevopen() start
18:03:08-40b28-07240- [horcread] cmddevopen() finished
18:03:08-40b28-07240- ***** horcread *****
---ADDR--- -OFF0-1-2-3- 4-5-6-7- 8-9-A-B- C-D-E-F ------CHAR-----
[0x0012d8d4]0000: 48495441 43484920 4f50454e 2052454d HITACHI OPEN REM
[0x0012d8e4]0010: 4f544520 434f5059 20535953 54454d20 OTE COPY SYSTEM
[0x0012d8f4]0020: 44415441 20545950 45203031 00000000 DATA TYPE 01....
[0x0012d904]0030: f0f0f0f0 f0f0f0f1 f0f2f6f2 00040d09 000000010262....

Here is the USP serial number.

[0x0012d914]0040: 50090100 00040000 00040004 00040004 P...............
[0x0012d924]0050: ffffffff ffffffff 00060006 00060006 ................
[0x0012d934]0060: 00070007 00070007 000f0c00 00000000 ................
[0x0012d944]0070: 00000000 ef00e011 08030100 01004000 ..............@.
[0x0012d954]0080: 38000400 04400100 01000400 00ff0100 8....@..........
[0x0012d964]0090: 80000000 00000000 00000000 00000000 ................
[0x0012d974]00a0: 00000000 00000000 00000000 00000000 ................
[0x0012d984]00b0: 00800012 000e0002 00000000 00000000 ................
[0x0012d994]00c0: 00000000 00000000 00000000 00000000 ................
[0x0012d9a4]00d0: 00000000 00000000 00000000 00000000 ................
[0x0012d9b4]00e0: 00000000 00000000 00000000 00000000 ................
[0x0012d9c4]00f0: 00000000 00000000 00000000 00000000 ................
---ADDR--- -OFF0-1-2-3- 4-5-6-7- 8-9-A-B- C-D-E-F ------CHAR-----
[0x0012d9d4]0100: 0004ffff 00060007 ffffffff ffff000b ................
[0x0012d9e4]0110: ffff000d 000e000f 00100011 00120013 ................
[0x0012d9f4]0120: 00140015 00160017 9914ffff 001a001b ................
[0x0012da04]0130: 001c001d 001e001f 00200021 00220023 ......... .!.".#
[0x0012da14]0140: 20002001 00260027 00280029 002a002b . ..&.'.(.).*.+
[0x0012da24]0150: 002c002d ffff002f 00300031 00320033 .,.-.../.0.1.2.3
[0x0012da34]0160: 00340035 00360037 00380039 003a003b .4.5.6.7.8.9.:.;
[0x0012da44]0170: 003c003d 003e003f 00400041 00420043 .<.=.>.?.@.A.B.C
18:03:08-40b28-07240- [HORCREAD]: maxldev = 16384, unitnum = 256
18:03:08-40b28-07240- [HORCREAD]: maxhorc = 4, maxmrcf = 64, maxlun = 1024,
maxctg = 256, maxjnlg = 256, mixport =1, slprflag = 1
18:03:08-40b28-07240- [HORCREAD]: Number of used instance(s) = 17, Number of
attached instance(s) = UNKNOWN, Number of same instance(s) = 1
18:03:08-40b28-07240- [HORCREAD] execute-test read is done.:\\.\PhysicalDrive8
18:03:08-40b28-07240- [HORCMCFGRDF] SLPR is supported.
18:03:08-40b28-07240- ***** SLPR bitmap *****
---ADDR--- -OFF0-1-2-3- 4-5-6-7- 8-9-A-B- C-D-E-F ------CHAR-----
[0x0012db18]0000: 80000000 00000000 00000000 00000000 ................
[0x0012db28]0010: 00000000 00000000 00000000 00000000 ................
18:03:08-40b28-07240- [HORCRELOWNLBA] floatable LBA(e011) is released.ID:0:\\.\PhysicalDrive8
18:03:08-40b28-07240- [horcread] cmddevopen() start
18:03:08-40b28-07240- [horcread] cmddevopen() finished
18:03:08-40b28-07240- ***** horcread *****
---ADDR--- -OFF0-1-2-3- 4-5-6-7- 8-9-A-B- C-D-E-F ------CHAR-----
[0x0012d8d4]0000: 48495441 43484920 4f50454e 2052454d HITACHI OPEN REM
[0x0012d8e4]0010: 4f544520 434f5059 20535953 54454d20 OTE COPY SYSTEM
[0x0012d8f4]0020: 44415441 20545950 45203031 00000000 DATA TYPE 01....
[0x0012d904]0030: f0f0f0f0 f0f0f0f1 f0f2f6f2 00040d09 000000010262....
[0x0012d914]0040: 50090100 00040000 00040004 00040004 P...............
[0x0012d924]0050: ffffffff ffffffff 00060006 00060006 ................
[0x0012d934]0060: 00070007 00070007 000f0c00 00000000 ................
[0x0012d944]0070: 00000000 ef00e011 08030100 01004000 ..............@.
[0x0012d954]0080: 38000400 04400100 01000400 00ff0100 8....@..........
[0x0012d964]0090: 80000000 00000000 00000000 00000000 ................
[0x0012d974]00a0: 00000000 00000000 00000000 00000000 ................
[0x0012d984]00b0: 00800012 000e0002 00000000 00000000 ................
[0x0012d994]00c0: 00000000 00000000 00000000 00000000 ................
[0x0012d9a4]00d0: 00000000 00000000 00000000 00000000 ................
[0x0012d9b4]00e0: 00000000 00000000 00000000 00000000 ................
[0x0012d9c4]00f0: 00000000 00000000 00000000 00000000 ................
---ADDR--- -OFF0-1-2-3- 4-5-6-7- 8-9-A-B- C-D-E-F ------CHAR-----
[0x0012d9d4]0100: 0004ffff 00060007 ffffffff ffff000b ................
[0x0012d9e4]0110: ffff000d 000e000f 00100011 00120013 ................
[0x0012d9f4]0120: 00140015 00160017 9914ffff 001a001b ................
[0x0012da04]0130: 001c001d 001e001f 00200021 00220023 ......... .!.".#
[0x0012da14]0140: 20002001 00260027 00280029 002a002b . ..&.'.(.).*.+
[0x0012da24]0150: 002c002d ffff002f 00300031 00320033 .,.-.../.0.1.2.3
[0x0012da34]0160: 00340035 00360037 00380039 003a003b .4.5.6.7.8.9.:.;
[0x0012da44]0170: 003c003d 003e003f 00400041 00420043 .<.=.>.?.@.A.B.C
18:03:08-40b28-07240- [HORCREAD]: maxldev = 16384, unitnum = 256
18:03:08-40b28-07240- [HORCREAD]: maxhorc = 4, maxmrcf = 64, maxlun = 1024,
maxctg = 256, maxjnlg = 256, mixport =1, slprflag = 1
18:03:08-40b28-07240- [HORCREAD]: Number of used instance(s) = 17, Number of
attached instance(s) = UNKNOWN, Number of same instance(s) = 1
18:03:08-40b28-07240- [HORCREAD] execute-test read is done.:\\.\PhysicalDrive8
18:03:08-40b28-07240- [HORCMCFGRDF] SLPR bitmap is checked.
18:03:08-40b28-07240- [horcmcfgrdf] horccmddev(0) OK.
18:03:08-40b28-07240- [HORCRELOWNLBA] floatable LBA(e011) is released.ID:0:\\.\PhysicalDrive8
18:03:08-40b28-07240- [horcread] cmddevopen() start
18:03:08-40b28-07240- [horcread] cmddevopen() finished
18:03:08-449a8-07240- ***** horcread *****
---ADDR--- -OFF0-1-2-3- 4-5-6-7- 8-9-A-B- C-D-E-F ------CHAR-----
[0x0012d8d4]0000: 48495441 43484920 4f50454e 2052454d HITACHI OPEN REM
[0x0012d8e4]0010: 4f544520 434f5059 20535953 54454d20 OTE COPY SYSTEM
[0x0012d8f4]0020: 44415441 20545950 45203031 00000000 DATA TYPE 01....
[0x0012d904]0030: f0f0f0f0 f0f0f0f1 f0f2f6f2 00040d09 000000010262....
[0x0012d914]0040: 50090100 00040000 00040004 00040004 P...............
[0x0012d924]0050: ffffffff ffffffff 00060006 00060006 ................
[0x0012d934]0060: 00070007 00070007 000f0c00 00000000 ................
[0x0012d944]0070: 00000000 ef00e011 08030100 01004000 ..............@.
[0x0012d954]0080: 38000400 04400100 01000400 00ff0100 8....@..........
[0x0012d964]0090: 80000000 00000000 00000000 00000000 ................
[0x0012d974]00a0: 00000000 00000000 00000000 00000000 ................
[0x0012d984]00b0: 00800012 000e0002 00000000 00000000 ................
[0x0012d994]00c0: 00000000 00000000 00000000 00000000 ................
[0x0012d9a4]00d0: 00000000 00000000 00000000 00000000 ................
[0x0012d9b4]00e0: 00000000 00000000 00000000 00000000 ................
[0x0012d9c4]00f0: 00000000 00000000 00000000 00000000 ................
---ADDR--- -OFF0-1-2-3- 4-5-6-7- 8-9-A-B- C-D-E-F ------CHAR-----
[0x0012d9d4]0100: 0004ffff 00060007 ffffffff ffff000b ................
[0x0012d9e4]0110: ffff000d 000e000f 00100011 00120013 ................
[0x0012d9f4]0120: 00140015 00160017 9914ffff 001a001b ................
[0x0012da04]0130: 001c001d 001e001f 00200021 00220023 ......... .!.".#
[0x0012da14]0140: 20002001 00260027 00280029 002a002b . ..&.'.(.).*.+
[0x0012da24]0150: 002c002d ffff002f 00300031 00320033 .,.-.../.0.1.2.3
[0x0012da34]0160: 00340035 00360037 00380039 003a003b .4.5.6.7.8.9.:.;
[0x0012da44]0170: 003c003d 003e003f 00400041 00420043 .<.=.>.?.@.A.B.C
18:03:08-449a8-07240- [HORCREAD]: maxldev = 16384, unitnum = 256
18:03:08-449a8-07240- [HORCREAD]: maxhorc = 4, maxmrcf = 64, maxlun = 1024,
maxctg = 256, maxjnlg = 256, mixport =1, slprflag = 1
18:03:08-449a8-07240- [HORCREAD]: Number of used instance(s) = 17, Number of
attached instance(s) = UNKNOWN, Number of same instance(s) = 1
18:03:08-449a8-07240- [HORCREAD] execute-test read is done.:\\.\PhysicalDrive8
18:03:08-449a8-07240- [horcmcfgrdf] seldevdata() OK.
18:03:08-449a8-07240- MON(HORC): Size of memory allocation for CONFIG_DB = 64
bytes.
18:03:08-449a8-07240- MON:HORCM_CMD=\\.\PhysicalDrive8[Fibre][AL-PA=0xef -> C=5,T=1] port=CL1-A , targ=1, lun=12

Here is the AL-PA for the Port, and the Port, target ID and LUN.
18:03:08-449a8-07240- MON(HORC):number of Mus = 0
18:03:08-449a8-07240- MON(HOMRCF): Size of memory allocation for CONFIG_DB = 0
bytes.
18:03:08-449a8-07240- MON(HOMRCF):number of Mus = 0
18:03:10-d1b78-05000- horcmgr : executed 'CreateProcess(raidscan.exe -pi $PhysicalDrive -find inst -z2w=NUL -z1w=NUL)',exit = 0.

I have quoted this in full for a reason. This is what you should expect to see if it all works.
If it does not work, at least you can compare the user log with this one.
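
The serial number in the horcread dump (offset 0030) is stored as EBCDIC digits, where f0 to f9 stand for 0 to 9. If you want to read it out of a user's log without counting nibbles, a sketch like the following will do. decode_serial is a made-up helper name, and it assumes you paste in only the three words that hold the serial.

```shell
# Sketch: turn the EBCDIC digit words from offset 0030 into a serial number.
# Usage: decode_serial "f0f0f0f0 f0f0f0f1 f0f2f6f2"
# decode_serial is a hypothetical helper, not a CCI command.
decode_serial() {
    printf '%s\n' "$1" |
        tr -d ' ' |                    # join the three 4-byte words
        sed 's/f\([0-9]\)/\1/g' |      # f0..f9 become the digits 0..9
        sed 's/^0*//'                  # drop the leading zero padding
}
```

For the Windows log above, decode_serial "f0f0f0f0 f0f0f0f1 f0f2f6f2" gives 10262, the serial number of the USP used in the \\.\CMD name.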
UNIX
Here is the output for a Solaris server called SYD-E250-1. This is the log for instance 4.
As we installed CCI in /opt/HORCM, the log is here:
root@SYD-E250-1:/opt/HORCM/log4/curlog#:ls -al
total 28
drwxr-xr-x   3 root     other        512 Mar  7 16:29 .
drwxr-xr-x   4 root     other        512 Mar  7 16:29 ..
-rw-r--r--   1 root     other      10274 Mar  7 16:30 horcm_SYD-E250-1.log
drwxr-xr-x   2 root     other        512 Mar  7 16:29 horcmlog_SYD-E250-1

*****************************************************************************
- HORCM STARTUP LOG - Wed Mar 7 16:29:59 2007
*****************************************************************************
16:29:59-cac9d-11271- horcmgr started on Wed Mar 7 16:29:59 2007
16:29:59-cd940-11271- execvp() horcmd_04 using /etc/horcmgr [CWD=/]
16:29:59-e99c5-11272- Fibre address conversion TBL has been set to '1'
P.P.   : RAID Manager for Solaris
Model  : RAID-Manager/Solaris
Ver&Rev: 01-19-03/04
Release: Production(GA)
ALL Rights Reserved, Copyright (c) 1998-2006, Hitachi, Ltd.
HORCM(SYD-E250-1 11272) started by root (0) on Wed Mar 7 16:30:00 2007
16:30:00-11d9d-11272- horcmd_04 started on Wed Mar 7 16:30:00 2007
16:30:00-17e65-11272- [horcmcfgrdf] access(conf_file) OK.
16:30:00-1c076-11272- [horcmcfgrdf] access(check) OK.
16:30:00-1e127-11272- [horcmcfgrdf] open(conf_file) OK.
16:30:00-29cf3-11272- [horcmcfgetent] fseek(top) OK.
16:30:00-31d0e-11272- [horcmcfgetent] read(conf_file) OK.
16:30:00-34856-11272- [horcmcfgrdf] close(conf_file) OK.
16:30:00-389cb-11272- [horcmcfgrdf] check(conf) OK.
16:30:00-4a34c-11272- [horcmcfgrdf] horccmddev(0) OK.
16:30:00-5ac7f-11272- [horcread] cmddevopen() start


16:30:00-63837-11272- [horcread] cmddevopen() finished
16:30:00-6e384-11272- ***** horcread *****
---ADDR--- -OFF0-1-2-3- 4-5-6-7- 8-9-A-B- C-D-E-F ------CHAR-----
[0xffbfcb94]0000: 48495441 43484920 4f50454e 2052454d HITACHI OPEN REM
[0xffbfcba4]0010: 4f544520 434f5059 20535953 54454d20 OTE COPY SYSTEM
[0xffbfcbb4]0020: 44415441 20545950 45203031 00000000 DATA TYPE 01....
[0xffbfcbc4]0030: f0f0f0f0 f0f0f0f8 f0f0f2f5 00070d09 000000080025....

NSC55 with a Serial Number of 80025

[0xffbfcbd4]0040: 50090500 00020000 00050005 00050005 P...............
[0xffbfcbe4]0050: 00040004 00040004 00060006 00060006 ................
[0xffbfcbf4]0060: 00070007 00070007 000f2a00 00000000 ..........*.....
[0xffbfcc04]0070: 00000000 b200e00c 08030100 01004000 ..............@.
[0xffbfcc14]0080: f8000400 04400100 01000400 00ff0100 8....@..........
[0xffbfcc24]0090: 80000000 00000000 00000000 00000000 ................
[0xffbfcc34]00a0: 00000000 00000000 00000000 00000000 ................
[0xffbfcc44]00b0: 0080000e 00080002 00000000 00000000 ................
[0xffbfcc54]00c0: 00000000 00000000 00000000 00000000 ................
[0xffbfcc64]00d0: 00000000 00000000 00000000 00000000 ................
[0xffbfcc74]00e0: 00000000 00000000 00000000 00000000 ................
[0xffbfcc84]00f0: 00000000 00000000 00000000 00000000 ................
---ADDR--- -OFF0-1-2-3- 4-5-6-7- 8-9-A-B- C-D-E-F ------CHAR-----
[0xffbfcc94]0100: 00050004 00060007 00080009 0060ffff .............`..
[0xffbfcca4]0110: ffffffff ffffffff ffff000d ffffffff ................
[0xffbfccb4]0120: 0020ffff ffffffff ffffffff ffffffff . ..............
[0xffbfccc4]0130: ffffffff ffffffff ffffffff ffffffff ................
[0xffbfccd4]0140: 0039ffff ffffffff ffffffff ffffffff .9..............
[0xffbfcce4]0150: 0028ffff ffffffff ffff002d ffffffff .(.........-....
[0xffbfccf4]0160: ffffffff 00320033 ffffffff ffffffff .....2.3........
[0xffbfcd04]0170: ffffffff ffffffff ffffffff ffff000a ................
16:30:01-ae6ea-11272- [HORCREAD]: maxldev = 16384, unitnum = 256
16:30:01-b1cea-11272- [HORCREAD]: maxhorc = 4, maxmrcf = 64, maxlun = 1024,
maxctg = 256, maxjnlg = 256, mixport =1, slprflag = 1
16:30:01-b5e34-11272- [HORCREAD]: Number of used instance(s) = 13, Number of
attached instance(s) = UNKNOWN, Number of same instance(s) = 1
16:30:01-c2226-11272- [HORCREAD] execute-test read is done.:/dev/rdsk/c2t6d0s2

Here is the CMDDEV


16:30:01-c636e-11272- [HORCMCFGRDF] SLPR is supported.
16:30:01-ca4bf-11272- ***** SLPR bitmap *****
---ADDR--- -OFF0-1-2-3- 4-5-6-7- 8-9-A-B- C-D-E-F ------CHAR-----
[0xffbfce08]0000: 80000000 00000000 00000000 00000000 ................
[0xffbfce18]0010: 00000000 00000000 00000000 00000000 ................
16:30:01-dad71-11272- [HORCRELOWNLBA] floatable LBA(e00c) is released.ID:0:/dev/rdsk/c2t6d0s2
16:30:01-deb6b-11272- [horcread] cmddevopen() start
16:30:01-e2d12-11272- [horcread] cmddevopen() finished
16:30:01-e7502-11272- ***** horcread *****
---ADDR--- -OFF0-1-2-3- 4-5-6-7- 8-9-A-B- C-D-E-F ------CHAR-----
[0xffbfcb94]0000: 48495441 43484920 4f50454e 2052454d HITACHI OPEN REM
[0xffbfcba4]0010: 4f544520 434f5059 20535953 54454d20 OTE COPY SYSTEM
[0xffbfcbb4]0020: 44415441 20545950 45203031 00000000 DATA TYPE 01....
[0xffbfcbc4]0030: f0f0f0f0 f0f0f0f8 f0f0f2f5 00070d09 000000080025....
[0xffbfcbd4]0040: 50090500 00020000 00050005 00050005 P...............
[0xffbfcbe4]0050: 00040004 00040004 00060006 00060006 ................
[0xffbfcbf4]0060: 00070007 00070007 000f2a00 00000000 ..........*.....
[0xffbfcc04]0070: 00000000 b200e00c 08030100 01004000 ..............@.
[0xffbfcc14]0080: f8000400 04400100 01000400 00ff0100 8....@..........
[0xffbfcc24]0090: 80000000 00000000 00000000 00000000 ................
[0xffbfcc34]00a0: 00000000 00000000 00000000 00000000 ................
[0xffbfcc44]00b0: 0080000e 00080002 00000000 00000000 ................
[0xffbfcc54]00c0: 00000000 00000000 00000000 00000000 ................
[0xffbfcc64]00d0: 00000000 00000000 00000000 00000000 ................
[0xffbfcc74]00e0: 00000000 00000000 00000000 00000000 ................
[0xffbfcc84]00f0: 00000000 00000000 00000000 00000000 ................
---ADDR--- -OFF0-1-2-3- 4-5-6-7- 8-9-A-B- C-D-E-F ------CHAR-----
[0xffbfcc94]0100: 00050004 00060007 00080009 0060ffff .............`..
[0xffbfcca4]0110: ffffffff ffffffff ffff000d ffffffff ................
[0xffbfccb4]0120: 0020ffff ffffffff ffffffff ffffffff . ..............
[0xffbfccc4]0130: ffffffff ffffffff ffffffff ffffffff ................
[0xffbfccd4]0140: 0039ffff ffffffff ffffffff ffffffff .9..............
[0xffbfcce4]0150: 0028ffff ffffffff ffff002d ffffffff .(.........-....
[0xffbfccf4]0160: ffffffff 00320033 ffffffff ffffffff .....2.3........
[0xffbfcd04]0170: ffffffff ffffffff ffffffff ffff000a ................
16:30:02-62fd9-11272- [HORCREAD]: maxldev = 16384, unitnum = 256
16:30:02-6712a-11272- [HORCREAD]: maxhorc = 4, maxmrcf = 64, maxlun = 1024,
maxctg = 256, maxjnlg = 256, mixport =1, slprflag = 1
16:30:02-6b268-11272- [HORCREAD]: Number of used instance(s) = 13, Number of
attached instance(s) = UNKNOWN, Number of same instance(s) = 1
16:30:02-77659-11272- [HORCREAD] execute-test read is done.:/dev/rdsk/c2t6d0s2
16:30:02-7b7d2-11272- [HORCMCFGRDF] SLPR bitmap is checked.
16:30:02-7f90c-11272- [horcmcfgrdf] horccmddev(0) OK.
16:30:02-85faf-11272- [HORCRELOWNLBA] floatable LBA(e00c) is released.ID:0:/dev/rdsk/c2t6d0s2
16:30:02-89c66-11272- [horcread] cmddevopen() start
16:30:02-8de05-11272- [horcread] cmddevopen() finished
16:30:02-925ff-11272- ***** horcread *****
---ADDR--- -OFF0-1-2-3- 4-5-6-7- 8-9-A-B- C-D-E-F ------CHAR-----
[0xffbfcb94]0000: 48495441 43484920 4f50454e 2052454d HITACHI OPEN REM
[0xffbfcba4]0010: 4f544520 434f5059 20535953 54454d20 OTE COPY SYSTEM
[0xffbfcbb4]0020: 44415441 20545950 45203031 00000000 DATA TYPE 01....
[0xffbfcbc4]0030: f0f0f0f0 f0f0f0f8 f0f0f2f5 00070d09 000000080025....
[0xffbfcbd4]0040: 50090500 00020000 00050005 00050005 P...............
[0xffbfcbe4]0050: 00040004 00040004 00060006 00060006 ................
[0xffbfcbf4]0060: 00070007 00070007 000f2a00 00000000 ..........*.....
[0xffbfcc04]0070: 00000000 b200e00c 08030100 01004000 ..............@.
[0xffbfcc14]0080: f8000400 04400100 01000400 00ff0100 8....@..........
[0xffbfcc24]0090: 80000000 00000000 00000000 00000000 ................
[0xffbfcc34]00a0: 00000000 00000000 00000000 00000000 ................
[0xffbfcc44]00b0: 0080000e 00080002 00000000 00000000 ................
[0xffbfcc54]00c0: 00000000 00000000 00000000 00000000 ................
[0xffbfcc64]00d0: 00000000 00000000 00000000 00000000 ................
[0xffbfcc74]00e0: 00000000 00000000 00000000 00000000 ................
[0xffbfcc84]00f0: 00000000 00000000 00000000 00000000 ................
---ADDR--- -OFF0-1-2-3- 4-5-6-7- 8-9-A-B- C-D-E-F ------CHAR-----
[0xffbfcc94]0100: 00050004 00060007 00080009 0060ffff .............`..
[0xffbfcca4]0110: ffffffff ffffffff ffff000d ffffffff ................
[0xffbfccb4]0120: 0020ffff ffffffff ffffffff ffffffff . ..............
[0xffbfccc4]0130: ffffffff ffffffff ffffffff ffffffff ................
[0xffbfccd4]0140: 0039ffff ffffffff ffffffff ffffffff .9..............
[0xffbfcce4]0150: 0028ffff ffffffff ffff002d ffffffff .(.........-....
[0xffbfccf4]0160: ffffffff 00320033 ffffffff ffffffff .....2.3........
[0xffbfcd04]0170: ffffffff ffffffff ffffffff ffff000a ................
16:30:02-e7a8a-11272- [HORCREAD]: maxldev = 16384, unitnum = 256
16:30:02-ebbdb-11272- [HORCREAD]: maxhorc = 4, maxmrcf = 64, maxlun = 1024,
maxctg = 256, maxjnlg = 256, mixport =1, slprflag = 1
16:30:02-efd23-11272- [HORCREAD]: Number of used instance(s) = 13, Number of
attached instance(s) = UNKNOWN, Number of same instance(s) = 1
16:30:03-07ece-11272- [HORCREAD] execute-test read is done.:/dev/rdsk/c2t6d0s2
16:30:03-0e0d4-11272- [horcmcfgrdf] seldevdata() OK.
16:30:03-12354-11272- MON(HORC): Size of memory allocation for CONFIG_DB = 64
bytes.
16:30:03-16392-11272- MON:HORCM_CMD=/dev/rdsk/c2t6d0s2[Fibre][AL-PA=0xb2 -> C=2,T=32] port=CL1-A , targ=32, lun=42

Here is the AL-PA for the Port, and the Port, target ID and LUN.
16:30:03-1a4ba-11272- MON(HORC):number of Mus = 0
16:30:03-1e633-11272- MON(HOMRCF): Size of memory allocation for CONFIG_DB = 0
bytes.
16:30:03-2275a-11272- MON(HOMRCF):number of Mus = 0



16:30:07-b3adf-11271- horcmgr executed 'system(/bin/ls /dev/rdsk/* /HORCM/usr/bin/raidscan -find inst)'.

Audit Logging
Check:
Always set full logging if possible. This was introduced with 01-17-03/05
but it is disabled by default. The environment variable is HORCC_LOGSZ. If this
environment variable is not set, only errors are logged. With this variable set, successful
commands are logged as well, which is very useful if you need to know what was typed and when.
However, only the input is logged, not the output! So, always cut and paste the entire
Command Prompt session and send that to GSC as well.
Check:
Can the user reproduce this problem at will? If so, get them to stop CCI,
delete the LOGx directories and then start CCI and issue the command that fails. This will
make reading the LOGx files much easier as the only messages in the logs will be what
you want to look at
Windows
TSTART.BAT

BAT file to start CCI and set the correct options for TC

@echo off
rem
rem Batch file to start HORCM for TrueCopy operations
rem
rem turn on CCI logging for 01-17-03/05 or later
set HORCC_LOGSZ=2048
rem
raidscan -x findcmddev h0,20
set horcmfctbl=2
rem set instance to match your naming convention for the PVOL instance
set horcminst=0
rem next line with a value for SI only!
set horcc_mrcf=
horcmstart 0 1
TSTOP.BAT

BAT file to stop CCI

@echo off
rem
rem Batch file to stop HORCM after TrueCopy operations
rem
horcmshutdown 0 1
set horcmfctbl=
set horcminst=
set horcc_mrcf=
set HORCC_LOGSZ=

UNIX
Check:
Always ask the user to cut and paste the command line input and output.
You need to know what they typed and what the result was.
root@SYD-E250-1:/opt/HORCM/log4/curlog#:horcmstart.sh 4



starting HORCM inst 4
HORCM inst 4 starts successfully.
root@SYD-E250-1:/opt/HORCM/log4/curlog#:export HORCC_LOGSZ=2048
root@SYD-E250-1:/opt/HORCM/log4/curlog#:raidscan -p CL1-A

root@SYD-E250-1:/opt/HORCM/log4/curlog#:horcmshutdown.sh 4
inst 4:
HORCM Shutdown inst 4 !!!
root@SYD-E250-1:/opt/HORCM/log4/curlog#:cd ..
root@SYD-E250-1:/opt/HORCM/log4#:ls -al
total 10
drwxr-xr-x   4 root  other  512 Mar  7 16:50 .
dr-xr-xr-x  12 root  sys    512 Feb 22 15:04 ..
drwxr-xr-x   3 root  other  512 Mar  7 16:49 curlog
-rw-r--r--   1 root  other  289 Mar  7 16:51 horcc_SYD-E250-1.log
drwxr-xr-x   3 root  other  512 Mar  7 16:29 tmplog

Here are the contents of LOG file: horcc_SYD-E250-1.log


COMMAND NORMAL : EUserId for HORC[4] : root (0) Wed Mar 7 16:50:36 2007
CMDLINE : raidscan -p CL1-A
16:50:37-450c6-11368- [raidscan][exit(0)]
COMMAND NORMAL : EUserId for HORC[4] : root (0) Wed Mar 7 16:51:53 2007
CMDLINE : /usr/bin/horcctl -S
16:51:54-0f8cf-11376- [horcctl][exit(0)]
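
When a customer sends a long horcc log, it can help to reduce it to one line per command. The following is only a rough sketch in Python, not a CCI tool. It assumes the log uses the "CMDLINE :" and "[command][exit(n)]" line shapes seen in the logs in this document; the function name summarize_commands and the sample text are made up for illustration.

```python
import re

# Sample text in the shape of the horcc log shown in this document.
SAMPLE_LOG = """\
COMMAND NORMAL : EUserId for HORC[4] : root (0) Wed Mar 7 16:50:36 2007
CMDLINE : raidscan -p CL1-A
16:50:37-450c6-11368- [raidscan][exit(0)]
COMMAND NORMAL : EUserId for HORC[4] : root (0) Wed Mar 7 16:51:53 2007
CMDLINE : /usr/bin/horcctl -S
16:51:54-0f8cf-11376- [horcctl][exit(0)]
"""

# Assumed line shapes, taken only from the examples in this document.
CMDLINE_RE = re.compile(r"^CMDLINE\s*:\s*(.+)$")
EXIT_RE = re.compile(r"\[(\w+)\]\[exit\((\d+)\)\]")


def summarize_commands(log_text):
    """Return a list of (command line, exit code) pairs found in the log text."""
    results = []
    current = None
    for line in log_text.splitlines():
        match = CMDLINE_RE.match(line.strip())
        if match:
            # Remember the command until its exit line turns up.
            current = match.group(1).strip()
            continue
        match = EXIT_RE.search(line)
        if match and current is not None:
            results.append((current, int(match.group(2))))
            current = None
    return results


if __name__ == "__main__":
    for command, code in summarize_commands(SAMPLE_LOG):
        print(f"exit={code:3d}  {command}")
```

A non-zero exit code in the summary tells you which command to look up in the rest of the log.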

Command Device Reject


Most CCI errors are self explanatory. However, this one is usually impossible for the user
to debug.
Here is a simple ShadowImage example:
HORCM8.CONF
HORCM_MON
#ip_address   service   poll(10ms)   timeout(10ms)
localhost     11008     1000         3000
HORCM_CMD
\\.\CMD-977-5
HORCM_DEV
#dev_group    dev_name   port#     TargetID   LU#   MU#
#
VG01          LDEV49     CL1-A-1
HORCM_INST
#dev_group    ip_address   service
#
VG01          localhost    11009

HORCM9.CONF
HORCM_MON
#ip_address   service   poll(10ms)   timeout(10ms)
localhost     11009     1000         3000
HORCM_CMD
\\.\CMD-977-5
HORCM_DEV
#dev_group    dev_name   port#     TargetID   LU#   MU#
#
VG01          LDEV49     CL1-A-1
HORCM_INST
#dev_group    ip_address   service
#
VG01          localhost    11008

Check:
Is the user using good syntax?

Even though this is a 9500V, users should always use Port-HSD-LUN syntax.
I strongly recommend against Port-LDEV syntax: where is the cross-check?
Is MU# specified for ShadowImage? On some levels of CCI, this is mandatory.
However, you should specify it anyway as this is Best Practice.

How to check if the HORCM CONF files are correct:


C:\HORCM\ETC>raidscan -p CL1-A-1 -m 0
PORT# /ALPA/C,TID#, LU#..Num(LDEV#....)...P/S, Status, LDEV#,P-Seq#,P-LDEV#
CL1-A-1/ef/ 5, 1, 0-0 .1(13)............S-VOL PAIR    13, ----    10
CL1-A-1/ef/ 5, 1, 1-0 .1(29)............P-VOL PSUS    29,  977   309
CL1-A-1/ef/ 5, 1, 2-0 .1(48)............P-VOL PSUS    48,  977   300
CL1-A-1/ef/ 5, 1, 3-0 .1(309)...........S-VOL SSUS   309, ----    29
CL1-A-1/ef/ 5, 1, 4-0 .1(310)...........S-VOL SSUS   310, ----    29
CL1-A-1/ef/ 5, 1, 5-0 .1(308)...........S-VOL SSUS   308, ----    24
CL1-A-1/ef/ 5, 1, 6-0 .1(305)...........S-VOL SSUS   305, ----     1
CL1-A-1/ef/ 5, 1, 7-0 .1(49)............SMPL  ---- -----, ----- ----
CL1-A-1/ef/ 5, 1, 8-0 .1(50)............SMPL  ---- -----, ----- ----

C:\HORCM\ETC>pairdisplay -g VG01
Group PairVol(L/R) (Port#,TID, LU-M) ,Seq#,LDEV#.P/S,Status, Seq#,P-LDEV# M
VG01  LDEV49(L)    (CL1-A-1, 1, 7-0 ) 977   49.SMPL ----,----- ---- -
VG01  LDEV49(R)    (CL1-A-1, 1, 8-0 ) 977   50.SMPL ----,----- ---- -

Check the PVOL and SVOL LDEV numbers (in this case, 49 and 50)
Check the Port/HSD/LUN (in this case, CL1-A-1 LUNs 7 and 8)

But now the failure:


C:\HORCM\ETC>paircreate -g VG01 -vl
paircreate: [EX_CMDRJE] An order to the control/command device was rejected
Refer to the command log(C:\HORCM\log8\horcc_hp2k5_log.txt) for details.

And in the log we see this:


COMMAND ERROR : EUserId for HOMRCF[8] : Administrator (0) Wed Mar 07 17:02:30
2007
CMDLINE : paircreate -g VG01 -vl
17:02:30-9a8a8-12452- ERROR:cm_sndrcv[rc < 0 from HORCM]
17:02:30-9e728-12452- [paircreate] L_CMD(CREATEPAIR) ERROR :rc = -35
17:02:30-9e728-12452- [paircreate][exit(221)]
[EX_CMDRJE] An order to the control/command device was rejected
[Cause ]: An order to the command(control) device failed,or was rejected.
[Action]:Please confirm the following items.If this trouble doesn't resolve,then
collect HORCM error logs(HORCM_LOG=C:\HORCM\log8\curlog) and Remote HORCM
logs,and send them to service personnel.
(1) Check if the 'HORC or HOMRCF function' is installed in the RAID.
(2) Check if the RCP and LCP are installed in the RAID.
(3) Check if the path between the RAID CUs is established by using the SVP.
(4) Check if the pair target volume is an appropriate status.



Yes, meaningless error message numbers like -35 and 221. If this is a RAID subsystem,
check the SSBLOGS on the SVP. However, for DF, the SSB is logged in CCI.
Contents of C:\HORCM\log8\curlog\horcmlog_servername\horcm_log.txt:
17:02:30-9a8a8-14140- SCSI : Check Condition.
17:02:30-9a8a8-14140- ***** SCSI SENSE DATA *****
---ADDR--- -OFF0-1-2-3- 4-5-6-7- 8-9-A-B- C-D-E-F------CHAR-----
[0x0012f2b4]0000: 70000500 00000038 8400000d 961c0000 p......8........
[0x0012f2c4]0010: 00001000 00000000 00000000 00000000 ................
[0x0012f2d4]0020: 00000000 00000000 00000000 00000000 ................
[0x0012f2e4]0030: 00000000 00000000 00000000 00000000 ................
[0x0012f2f4]0040: 00000000 00000000 00000000 00000000 ................
[0x0012f304]0050: 00000000 00000000 00000000 00000000 ................
17:02:30-9a8a8-14140- SKEY = 0x05
17:02:30-9a8a8-14140- ASC = 0x96
17:02:30-9a8a8-14140- SSB = 0x8400,000d
17:02:30-9a8a8

The decoded SKEY, ASC and SSB lines are the cross-check against the raw sense data. Next, it is not obvious, but the error code is:

961C 000D
Now, get hold of the latest AMS CCI manual which contains Appendix A.4
A.4 How to Read Detailed Error Log Codes
and this subsection
A.4.4 Sense Code and Detail Code
Beware: some versions of this manual do not contain these sections. Find one that
does.
Table A.5 Sense Codes and Detailed Codes

Error       Contents                                             Recommended Action
961C 000C   The S-VOL is a Sub LU of a unified LU.               Check the status of the LU.
961C 000D   The default controllers controlling the P-VOL and
            S-VOL are not the same.
961C 000E   The P-VOL is a Cache Residency LU.                   Check the status of

In this case, the PVOL and SVOL default controllers are not the same.
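
If you have to do this more than once, the lookup key can be pulled out of the first row of the sense data mechanically. Here is a minimal sketch in Python. The byte positions are the standard SCSI fixed-format sense layout (sense key in byte 2, ASC in byte 12, ASCQ in byte 13). Treating bytes 8-11 as the SSB, and building the manual's code from ASC, ASCQ and the low half of the SSB, is my inference from this one example and not something the manual states. The function name decode_sense is made up.

```python
# First 16 bytes of the sense data from the log above, as printed.
SENSE_WORDS = "70000500 00000038 8400000d 961c0000"


def decode_sense(words):
    """Decode the first row of a fixed-format SCSI sense dump.

    Returns the sense key, ASC, ASCQ, the 32-bit value in bytes 8-11
    (assumed here to be the SSB) and a 'XXXX YYYY' string in the style
    used by the error table in this document.
    """
    data = bytes.fromhex(words.replace(" ", ""))
    if len(data) < 14:
        raise ValueError("need at least 14 bytes of sense data")
    if data[0] & 0x7F not in (0x70, 0x71):
        raise ValueError("not fixed-format sense data")
    skey = data[2] & 0x0F          # sense key is the low nibble of byte 2
    asc = data[12]                 # additional sense code
    ascq = data[13]                # additional sense code qualifier
    ssb = int.from_bytes(data[8:12], "big")
    detail = ssb & 0xFFFF          # assumption: low half is the detail code
    return {
        "skey": skey,
        "asc": asc,
        "ascq": ascq,
        "ssb": ssb,
        "code": f"{asc:02X}{ascq:02X} {detail:04X}",
    }


if __name__ == "__main__":
    decoded = decode_sense(SENSE_WORDS)
    print(f"SKEY=0x{decoded['skey']:02x} ASC=0x{decoded['asc']:02x} "
          f"SSB=0x{decoded['ssb']:08x} code={decoded['code']}")
```

The decoded SKEY, ASC and SSB should agree with the lines CCI prints under the dump. If they do not, you are probably reading the wrong row.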

Old Syntax HORCM CONF Files


This problem only applies to RAID subsystems from 9900V onwards. With 7700E and
9900, there were no Host Storage Domains (HSD) so all LUNs were on the real port.
With 9900V, USP etc the LUNs are normally considered to be attached to logical ports
which are called HSD or Host Groups.
However, it is still possible to use the old syntax. This always causes confusion after a
while as LUNs get added and deleted from various HSD. Here is an example:

Imagine that 3 HSD are created on an empty port: HSD 1, 2 and 3.
Each HSD has 3 LUNs added, numbered 0, 1 and 2.



If this is done in sequence, HSD 1 has absolute LUNs 0-2, HSD 2 has absolute LUNs
3-5 and HSD 3 has absolute LUNs 6-8
Now imagine that the following actions have been performed some time later:
Delete HSD 2
Add HSD 4 with LUNs 0 and 1
And then you allocate LUN 3 to HSD 1 and 3. If you did not know that the previous
changes had been made, it would be impossible for you to guess that:

HSD 1 LUN 3 was absolute LUN 5


HSD 3 LUN 3 was absolute LUN 9

Even worse, you have no way of looking at the LUN allocations via Storage Navigator as
that only shows relative LUN numbers.
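
To convince yourself of the numbers above, you can replay the sequence. This is a small Python sketch, not anything the subsystem runs. It assumes that each new LUN on a port takes the lowest free absolute LUN number. That assumption matches the example, but it is my reading of the behaviour and not a documented rule. The class name Port and its methods are made up.

```python
class Port:
    """Toy model of one physical port and its absolute LUN numbers."""

    def __init__(self):
        # absolute LUN number -> (HSD number, relative LUN number)
        self.absolute = {}

    def _first_free(self):
        # Assumption: the lowest unused absolute LUN number is taken next.
        number = 0
        while number in self.absolute:
            number += 1
        return number

    def add_lun(self, hsd, relative_lun):
        """Add a LUN to an HSD and return the absolute LUN it received."""
        absolute_lun = self._first_free()
        self.absolute[absolute_lun] = (hsd, relative_lun)
        return absolute_lun

    def delete_hsd(self, hsd):
        """Delete an HSD, freeing every absolute LUN it was using."""
        self.absolute = {
            number: owner
            for number, owner in self.absolute.items()
            if owner[0] != hsd
        }


def replay_example():
    """Replay the sequence of changes described in the text."""
    port = Port()
    for hsd in (1, 2, 3):          # HSD 1, 2 and 3, each with LUNs 0, 1, 2
        for relative_lun in (0, 1, 2):
            port.add_lun(hsd, relative_lun)
    port.delete_hsd(2)             # Delete HSD 2
    for relative_lun in (0, 1):    # Add HSD 4 with LUNs 0 and 1
        port.add_lun(4, relative_lun)
    hsd1_lun3 = port.add_lun(1, 3) # Allocate LUN 3 to HSD 1
    hsd3_lun3 = port.add_lun(3, 3) # Allocate LUN 3 to HSD 3
    return hsd1_lun3, hsd3_lun3


if __name__ == "__main__":
    print(replay_example())
```

The point is that the absolute number depends on the whole history of the port, which is exactly what you do not have when the call comes in.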
In a recent case, 47 S-VOL LUNs were deleted by mistake from a HSD. When the mistake
was noticed, the same 47 S-VOL LUNs were added back in the same order. However, a
subsequent pairdisplay showed the following:
TC-WRP 1003-108A(L) (CL2-F , 0, 45)32179  10b5.S-VOL PAIR ASYNC , 0     102e  -
TC-WRP 1003-108A(R) (CL1-C , 0,  4)32208  1003.P-VOL PAIR ASYNC , 0     108a  - (1)
TC-WRP 1004-108B(L) (CL2-F , 0, 46)32179 -----.---- ---- ------,----- -----  -
TC-WRP 1004-108B(R) (CL1-C , 0,  5)32208  1004.P-VOL PAIR ASYNC , 0     108b  -
TC-WRP 1005-108C(L) (CL2-F , 0, 47)32179 -----.---- ---- ------,----- -----  -
TC-WRP 1005-108C(R) (CL1-C , 0,  6)32208  1005.P-VOL PAIR ASYNC , 0     108c  -
TC-WRP 1006-108D(L) (CL2-F , 0, 48)32179 -----.---- ---- ------,----- -----  -
TC-WRP 1006-108D(R) (CL1-C , 0,  7)32208  1006.P-VOL PAIR ASYNC , 0     108d  -
TC-WRP 1007-108E(L) (CL2-F , 0, 49)32179  108a.S-VOL PAIR ASYNC , 0     1003  - (2)
TC-WRP 1007-108E(R) (CL1-C , 0,  8)32208  1007.P-VOL PAIR ASYNC , 0     108e  - (3)
...

What can you tell from the display above? Firstly, the pairdisplay was issued by the DR
CCI server as (L) refers to the S-VOL. Next, we have obvious mismatches in yellow.
What is less obvious is that the turquoise and green pairs are also invalid. Indeed:
(3) P-VOL is 10:07 and the associated S-VOL is 10:8E
(2) S-VOL is 10:8A and the associated P-VOL is 10:03
This entry does not go with (3)
(1) This is the associated P-VOL for (2)
Here is an excerpt from the old HORCM CONF file using absolute LUN numbers:
TC-WRP   1003-108A   CL2-F   0   45
TC-WRP   1004-108B   CL2-F   0   46
TC-WRP   1005-108C   CL2-F   0   47
TC-WRP   1006-108D   CL2-F   0   48

And here is the same excerpt after the file has been changed to use HSD syntax:
TC-WRP   1003-108A   CL2-F-2   0   6
TC-WRP   1004-108B   CL2-F-2   0   7
TC-WRP   1005-108C   CL2-F-2   0   8
TC-WRP   1006-108D   CL2-F-2   0   9


As you can see, the new HORCM CONF file is easier to understand and compare with
Storage Navigator.
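
If a customer sends a large HORCM CONF file, you may want to flag the entries that still use the old syntax. Here is a minimal Python sketch. It assumes the only difference you care about is whether the port field looks like CL2-F (old, absolute LUN) or CL2-F-2 (port plus HSD number), as in the excerpts above. The function names are made up, and it expects only HORCM_DEV data rows, not the whole file.

```python
import re

# Port field: "CL2-F" (old syntax) or "CL2-F-2" (port plus HSD number).
PORT_RE = re.compile(r"^(CL\d+-[A-Za-z]+)(?:-(\d+))?$")


def classify_port(port):
    """Return 'HSD' for CLx-y-n style ports and 'absolute' for CLx-y."""
    match = PORT_RE.match(port)
    if not match:
        raise ValueError(f"unrecognised port field: {port!r}")
    return "HSD" if match.group(2) is not None else "absolute"


def old_syntax_entries(dev_lines):
    """Return the dev_name of every HORCM_DEV row using the old syntax.

    Rows are expected as: dev_group dev_name port# TargetID LU# [MU#].
    Comment lines and lines with fewer than three fields are skipped.
    """
    flagged = []
    for line in dev_lines:
        fields = line.split()
        if len(fields) < 3 or fields[0].startswith("#"):
            continue
        if classify_port(fields[2]) == "absolute":
            flagged.append(fields[1])
    return flagged


if __name__ == "__main__":
    rows = [
        "#dev_group dev_name port# TargetID LU#",
        "TC-WRP 1003-108A CL2-F 0 45",
        "TC-WRP 1004-108B CL2-F-2 0 7",
    ]
    print(old_syntax_entries(rows))
```

Anything it flags is a candidate for conversion to HSD syntax before you spend time on the rest of the problem.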
By the way, here is how you find out the absolute and relative LUN numbers:
raidscan -p CL2-F -fx
...
CL2-F   /88/ 3, 0, 49.1(108a)..........S-VOL PAIR ASYNC  108a, ----- 1003
CL2-F   /88/ 3, 0, 50.1(108b)..........S-VOL PAIR ASYNC  108b, ----- 1004
CL2-F   /88/ 3, 0, 51.1(108c)..........S-VOL PAIR ASYNC  108c, ----- 1005
...
raidscan -p CL2-F-2 -fx
...
CL2-F-2 /88/ 3, 0,  6.1(108a)..........S-VOL PAIR ASYNC  108a, ----- 1003
CL2-F-2 /88/ 3, 0,  7.1(108b)..........S-VOL PAIR ASYNC  108b, ----- 1004
CL2-F-2 /88/ 3, 0,  8.1(108c)..........S-VOL PAIR ASYNC  108c, ----- 1005
...

Secured CMDDEV and HORCMPERM Implications


If you use a normal (i.e. non-secured) CMDDEV, you can control CCI for any LUNs on
any Host. This also means that you can destroy anyone's data by using SI to copy your
LUNs over the top of their LUNs.
For this reason, you normally only let the Storage Administrator have access to a normal
CMDDEV and you always give normal users access to a Secured CMDDEV.
You can tell if a CMDDEV is secured as follows:
C:\HORCM\ETC>horcmstart 0
starting HORCM inst 0
HORCM inst 0 starts successfully.
C:\HORCM\ETC>set horcminst=0
C:\HORCM\ETC>horcctl -D
Current control device = \\.\PHYSICALDRIVE1

This is a normal CMDDEV. For this test, I also had access to a secured CMDDEV and
it is possible to swap between them as follows:
C:\HORCM\ETC>horcctl -C
Changed control device(\\.\PHYSICALDRIVE1 -> \\.\PHYSICALDRIVE10*
C:\HORCM\ETC>horcctl -D
Current control device = \\.\PHYSICALDRIVE10*

The asterisk means that the CMDDEV is secured. Normally, of course, you would not
give any user access to different types of CMDDEV as that will cause problems.
To test what will happen before giving a secured CMDDEV to a user, you can set the
HORCMPROMOD environment variable as follows:
C:\HORCM\etc>set HORCMPROMOD=1
C:\HORCM\ETC>horcmstart 410
starting HORCM inst 410
HORCM inst 410 starts successfully.
C:\HORCM\ETC>set horcminst=410
C:\HORCM\etc>horcctl -D
Current control device = \\.\PhysicalDrive53


Note, however, that this does not affect the horcctl display!
Here is some pairdisplay output when HORCMPROMOD is not set on any CCI server.
C:\HORCM\etc>pairdisplay -g VG01
Group PairVol(L/R) (Port#,TID, LU),Seq#,LDEV#.P/S,Status,Fence,Seq#,P-LDEV# M
VG01  d0(L) (CL2-D , 1, 410)77010027   410.P-VOL PAIR NEVER ,75010010   410 -
VG01  d0(R) (CL1-A , 1, 410)75010010   410.S-VOL PAIR NEVER ,-----      410 -
VG01  d1(L) (CL2-D , 1, 411)77010027   411.P-VOL PAIR NEVER ,75010010   411 -
VG01  d1(R) (CL1-A , 1, 411)75010010   411.S-VOL PAIR NEVER ,-----      411 -
VG01  d2(L) (CL2-D , 1, 412)77010027   412.P-VOL PAIR NEVER ,75010010   412 -
VG01  d2(R) (CL1-A , 1, 412)75010010   412.S-VOL PAIR NEVER ,-----      412 -
VG01  d3(L) (CL2-D , 1, 413)77010027   413.P-VOL PAIR NEVER ,75010010   413 -
VG01  d3(R) (CL1-A , 1, 413)75010010   413.S-VOL PAIR NEVER ,-----      413 -
VG01  d4(L) (CL2-D , 1, 414)77010027   414.P-VOL PAIR NEVER ,75010010   414 -
VG01  d4(R) (CL1-A , 1, 414)75010010   414.S-VOL PAIR NEVER ,-----      414 -

As you can see, LDEVs 410-414 on an AMS1000 (SN begins with 770x) are paired with
LDEVs 410-414 on an AMS500 (SN begins with 750x).
Here is the same pairdisplay output when HORCMPROMOD has been set on both CCI
servers.
C:\HORCM\etc>set HORCMPROMOD=1
C:\HORCM\etc>horcmstart 410
starting HORCM inst 410
HORCM inst 410 starts successfully.
C:\HORCM\ETC>set horcminst=410
C:\HORCM\etc>pairdisplay -g VG01
Group PairVol(L/R) (Port#,TID, LU),Seq#,LDEV#.P/S,Status,Fence,Seq#,P-LDEV# M
VG01  d0(L) (CL2-D , 1, 410)77010027   410.P-VOL PAIR NEVER ,75010010   410 -
VG01  d0(R) (CL1-A , 1, 410)75010010 *****.---- ---- ------,----- ----- -
VG01  d1(L) (CL2-D , 1, 411)77010027   411.P-VOL PAIR NEVER ,75010010   411 -
VG01  d1(R) (CL1-A , 1, 411)75010010 *****.---- ---- ------,----- ----- -
VG01  d2(L) (CL2-D , 1, 412)77010027   412.P-VOL PAIR NEVER ,75010010   412 -
VG01  d2(R) (CL1-A , 1, 412)75010010 *****.---- ---- ------,----- ----- -
VG01  d3(L) (CL2-D , 1, 413)77010027   413.P-VOL PAIR NEVER ,75010010   413 -
VG01  d3(R) (CL1-A , 1, 413)75010010 *****.---- ---- ------,----- ----- -
VG01  d4(L) (CL2-D , 1, 414)77010027   414.P-VOL PAIR NEVER ,75010010   414 -
VG01  d4(R) (CL1-A , 1, 414)75010010 *****.---- ---- ------,----- ----- -

As you can see, the local CCI instance (L) has access to all its LUNs/LDEVs. However,
the DR CCI server (R) has no access to LDEVs 410-414.
If you attempt to do any commands such as pairsplit, the following will happen:
C:\HORCM\etc>pairsplit -g VG01
pairsplit: [EX_ENPERM] Permission denied with the LDEV
Refer to the command log(C:\HORCM\log410\horcc_Verdande_log.txt) for details.

You can use HORCMPERM*.CONF (* is the instance number) to further limit CCI
access. HORCMPERM*.CONF does not give you access to LDEVs that you are not
allowed to process. It removes access to LDEVs that you are allowed to process but do
not wish to process.
How does this work? Let's start instance 410 with HORCMPROMOD=1 and no
HORCMPERM*.CONF file. At the bottom of the start up log you will see this:
11:01:48-518b0-02092- HORCM has been set to the PROTECT MODE on 'ENV'.



...
11:01:50-e2900-01428- horcmgr : executed 'CreateProcess(raidscan.exe -pi
$PhysicalDrive -find inst -z2w=NUL -z1w=NUL)',exit = 0.

As you can see, raidscan is called internally with an argument of -pi $PhysicalDrive (all
physical drives) thus allowing all LUNs on this server to be accessed.
Now let's stop horcm and define a file as follows:
C:\HORCM\etc>type C:\WINDOWS\HORCMPERM410.CONF
hd0-56
C:\HORCM\etc>

HORCMPERM410.CONF contains a list of every device that we wish to be able to access
via CCI. Here is the resultant pairdisplay after a restart of horcm:
C:\HORCM\etc>pairdisplay -g VG01
Group PairVol(L/R) (Port#,TID, LU),Seq#,LDEV#.P/S,Status,Fence,Seq#,P-LDEV# M
VG01  d0(L) (CL2-D , 1, 410)77010027   410.P-VOL PAIR NEVER ,75010010   410 -
VG01  d0(R) (CL1-A , 1, 410)75010010 *****.---- ---- ------,----- ----- -
VG01  d1(L) (CL2-D , 1, 411)77010027   411.P-VOL PAIR NEVER ,75010010   411 -
VG01  d1(R) (CL1-A , 1, 411)75010010 *****.---- ---- ------,----- ----- -
VG01  d2(L) (CL2-D , 1, 412)77010027   412.P-VOL PAIR NEVER ,75010010   412 -
VG01  d2(R) (CL1-A , 1, 412)75010010 *****.---- ---- ------,----- ----- -
VG01  d3(L) (CL2-D , 1, 413)77010027 *****.---- ---- ------,----- ----- -
VG01  d3(R) (CL1-A , 1, 413)75010010 *****.---- ---- ------,----- ----- -
VG01  d4(L) (CL2-D , 1, 414)77010027 *****.---- ---- ------,----- ----- -
VG01  d4(R) (CL1-A , 1, 414)75010010 *****.---- ---- ------,----- ----- -

The d3(L) and d4(L) lines (bold in the original) show what has changed. Here is the bottom of the start up log file.
11:08:03-7d3e8-02408- horcmgr : executed 'CreateProcess(raidscan.exe
-find inst -z0r=C:\WINDOWS\horcmperm410.conf -z2w=NUL -z1w=NUL)',exit =
0.

As you can see, raidscan has been called internally and is using the list of disks in
HORCMPERM410.CONF to determine which LDEVs can be accessed. Here is some
inqraid output:
C:\HORCM\etc>inqraid $LETALL -CLI
DEVICE_FILE     PORT   SERIAL    LDEV CTG H/M/12 SSID R:Group PRODUCT_ID
E:\Vol13\Dsk54  CL2-D  77010027  410  -   P/s/ss 0000 A:07-00 DF600F
F:\Vol14\Dsk55  CL2-D  77010027  411  -   P/s/ss 0000 A:07-00 DF600F
Q:\Vol11\Dsk12  CL1-B  3157      169  -   P/s/ss 0000 5:02-00 DF600F
G:\Vol15\Dsk56  CL2-D  77010027  412  -   P/s/ss 0000 A:07-00 DF600F
R:\Vol12\Dsk13  CL1-B  3157      170  -   P/s/ss 0000 5:02-00 DF600F
H:\Vol16\Dsk57  CL2-D  77010027  413  -   P/s/ss 0000 A:07-00 DF600F
I:\Vol17\Dsk58  CL2-D  77010027  414  -   P/s/ss 0000 A:07-00 DF600F
J:\Vol2\Dsk0    -      -         -    -   -      -    -       ST336754LC

The H: and I: lines (bold in the original) show that LDEVs 413 and 414 are Physical
Drives 57 and 58. As we only allowed access to Physical Drives 0-56, this explains why
the pairdisplay has changed.
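
The same check can be done without staring at the listing. Here is a rough Python sketch that takes a HORCMPERM-style entry and a list of Physical Drive numbers and tells you which drives fall outside it. It only understands the hdN and hdN-M forms that appear in this document. The real file may accept other forms, so treat the parser as an assumption. The function names are made up.

```python
import re


def parse_hd_spec(spec):
    """Turn 'hd0-56' or 'hd57' into a range of Physical Drive numbers.

    Only the 'hdN' and 'hdN-M' forms seen in this document are handled.
    """
    match = re.fullmatch(r"hd(\d+)(?:-(\d+))?", spec.strip())
    if not match:
        raise ValueError(f"unsupported entry: {spec!r}")
    first = int(match.group(1))
    last = int(match.group(2)) if match.group(2) is not None else first
    if last < first:
        raise ValueError(f"range runs backwards: {spec!r}")
    return range(first, last + 1)


def blocked_disks(spec, disks_in_use):
    """Return the drive numbers in disks_in_use that the entry does not cover."""
    allowed = parse_hd_spec(spec)
    return [disk for disk in disks_in_use if disk not in allowed]


if __name__ == "__main__":
    # Dsk numbers of the five CL2-D devices in the inqraid output above.
    print(blocked_disks("hd0-56", [54, 55, 56, 57, 58]))
```

Get the Dsk numbers from inqraid and the entry from HORCMPERM*.CONF. Any drive it reports will show as ***** in pairdisplay.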
Note that it is possible to fix this mistake by manual use of the raidscan command as
follows:
C:\HORCM\etc>echo hd57-58 | raidscan -find inst
DEVICE_FILE  Group  PairVol  PORT   TARG  LUN  M  SERIAL    LDEV
Harddisk57   VG01   d3       CL2-D  1     413  0  77010027  413
Harddisk57   VG01   d3       CL2-D  1     413  0  77010027  413
Harddisk58   VG01   d4       CL2-D  1     414  -  77010027  414
Harddisk58   VG01   d4       CL2-D  1     414     77010027  414

C:\HORCM\etc>pairdisplay -g VG01
Group PairVol(L/R) (Port#,TID, LU),Seq#,LDEV#.P/S,Status,Fence,Seq#,P-LDEV# M
VG01  d0(L) (CL2-D , 1, 410)77010027   410.P-VOL PAIR NEVER ,75010010   410 -
VG01  d0(R) (CL1-A , 1, 410)75010010 *****.---- ---- ------,----- ----- -
VG01  d1(L) (CL2-D , 1, 411)77010027   411.P-VOL PAIR NEVER ,75010010   411 -
VG01  d1(R) (CL1-A , 1, 411)75010010 *****.---- ---- ------,----- ----- -
VG01  d2(L) (CL2-D , 1, 412)77010027   412.P-VOL PAIR NEVER ,75010010   412 -
VG01  d2(R) (CL1-A , 1, 412)75010010 *****.---- ---- ------,----- ----- -
VG01  d3(L) (CL2-D , 1, 413)77010027   413.P-VOL PAIR NEVER ,75010010   413 -
VG01  d3(R) (CL1-A , 1, 413)75010010 *****.---- ---- ------,----- ----- -
VG01  d4(L) (CL2-D , 1, 414)77010027   414.P-VOL PAIR NEVER ,75010010   414 -
VG01  d4(R) (CL1-A , 1, 414)75010010 *****.---- ---- ------,----- ----- -

Of course, you are unlikely to fix such an issue with raidscan. You would normally fix
HORCMPERM*.CONF and then stop and restart horcm.

Basic HORCM CONF problems


When HORCM will not start, you strip the CONF file back to the bare essentials and
then change one thing at a time. Sometimes even this fails. Here are the most common
reasons.
HORCM_MON
#ip_address    service   poll(10ms)   timeout(10ms)
10.129.3.127   11042     1000         3000

HORCM_CMD
#dev_name
# CMDDEV0 - USP600 - SN 10111 - \\.\CMD-10111-4
\\.\CMD-10111-4

The above file is correct. Let us make some simple changes to break it.

1 Wrong IP Address

Change 10.129.3.127 to 10.129.2.127. A simple typo, but here is what you get:
Windows
[System Call Error]
SysCall: bind
WSAerr : 10049(0x00002741) (See winsock2.h)
ErrInfo: Internal Error
ErrTime: Mon Sep 08 12:43:03 2008
SrcFile: shorcmc.c
SrcLine: 2405
ERROR:cmr_repcre[scmcrepcr fail]

Of course, it is the Internal Error that confuses most people here. The real error is in the
line above. It is the result of a standard call to an OS socket service, in this case Winsock.
Here is the relevant section from winsock2.h:

#define WSABASEERR 10000
#define WSAEADDRNOTAVAIL (WSABASEERR+49)

Here is a useful web page and some useful information:


http://www.sockets.com/err_lst1.htm

WSAEADDRNOTAVAIL (10049) Cannot assign requested address.


Berkeley description: Normally results from an attempt to create a socket with an address
not on this machine.
So, the error is obvious when you know where to look. The problem is, not many people
know where to look!
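
If you see these often, a small lookup saves a trip to winsock2.h. The following Python sketch parses the "WSAerr :" line in the shape shown above and names the error. The table only holds the two codes that appear in this document (10049 and 10013), so anything else comes back as unknown. The function name explain_wsaerr is made up.

```python
import re

WSABASEERR = 10000

# Only the two codes discussed in this document; extend from winsock2.h as needed.
WINSOCK_ERRORS = {
    WSABASEERR + 13: ("WSAEACCES", "Permission denied"),
    WSABASEERR + 49: ("WSAEADDRNOTAVAIL", "Cannot assign requested address"),
}

# Line shape as printed by CCI in the examples: "WSAerr : 10049(0x00002741) ..."
WSAERR_RE = re.compile(r"WSAerr\s*:\s*(\d+)\(0x([0-9a-fA-F]+)\)")


def explain_wsaerr(line):
    """Return (code, name, text) for a 'WSAerr :' line, or None if no match."""
    match = WSAERR_RE.search(line)
    if not match:
        return None
    code = int(match.group(1))
    if code != int(match.group(2), 16):
        # The decimal and hex values should be the same number.
        raise ValueError("decimal and hex error codes disagree")
    name, text = WINSOCK_ERRORS.get(code, ("unknown", "not in this table"))
    return code, name, text


if __name__ == "__main__":
    print(explain_wsaerr("WSAerr : 10049(0x00002741) (See winsock2.h)"))
```

Note that the ErrInfo line is ignored on purpose. It is the WSAerr number that carries the information.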
UNIX
UNIX error messages are not only different, they are different on each platform! Here is
the same error for Solaris:
[System Call Error]
SysCall: bind
Errorno: 126 (Cannot assign requested address)
ErrInfo: Internal Error
ErrTime: Tue Sep 2 11:45:40 2008
SrcFile: shorcmc.c
SrcLine: 2427
ERROR:cmr_repcre[scmcrepcr fail]

Here is a useful web page:


http://www.ioplex.com/~miallen/errcmpp.html
The relevant line for this error says:
                AIX 4.3,5.1          HP-UX 11.22          Solaris 9,10
EADDRNOTAVAIL   68                   227                  126
                Can't assign         Can't assign         Can't assign
                requested address    requested address    requested address

Once again, this is not the most intuitive error I have seen.
2 Invalid CMDDEV

Here is what you get if you change the CMDDEV to \\.\CMD-10111-42


12:52:23-16b48-04004- horcread():cannot open command device:\\.\CMD-10111-42
12:52:23-16b48-04004- [WARNING] This device(\\.\CMD-10111-42) is not ready for receiving a command.
12:52:23-16b48-04004- No device is ready for receiving a command in 1 line from HORC_CMD.
12:52:23-16b48-04004- ERROR:horcm_cfg_create
12:52:28-0b3b0-01136- horcmgr:Failed to connect to HORCM.

Here I think it is pretty obvious what the problem is



3 Invalid service name

Change 11042 to horcm42


17:29:02-d59f8-02260- [horcmcfgrdf] open(conf_file) OK.
17:29:02-d59f8-02260- ERROR: A wrong ipaddr or servicename line exists in HORCM_MON: line 4
17:29:02-d59f8-02260- 10.129.3.127 horcm42 1000 3000
17:29:02-d59f8-02260- [horcmcfgrdf] close(conf_file) OK.
17:29:02-d59f8-02260- ERROR:horcm_cfg_create

Once again, it is more obvious what is wrong.


4 UDP port which is in use

Change 11042 to 1030. This is not a sensible port number. It was chosen to cause an
error.
[System Call Error]
SysCall: bind
WSAerr : 10013(0x0000271d) (See winsock2.h)
ErrInfo: Internal Error
ErrTime: Mon Sep 08 17:39:46 2008
SrcFile: shorcmc.c
SrcLine: 2405
ERROR:cmr_repcre[scmcrepcr fail]

Here is the relevant section from winsock2.h:


#define WSAEACCES (WSABASEERR+13)

The following web page has more information:


http://www.sockets.com/err_lst1.htm

WSAEACCES (10013) Permission denied.


Berkeley description: An attempt was made to access a file in a way forbidden by its file
access permissions.
However, in this case, that is hardly descriptive of the problem. Of course, if one had
access to a command prompt, one could do this:
C:\HORCM\ETC>netstat -a -p UDP
Active Connections
Proto  Local Address             Foreign Address  State
UDP    ml_acer510:microsoft-ds   *:*
UDP    ml_acer510:isakmp         *:*
UDP    ml_acer510:1030           *:*
UDP    ml_acer510:54323          *:*

It is not likely that you will be this lucky
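
If the customer can run a script on the CCI server, you can test the HORCM_MON address and port before starting HORCM at all. Here is a minimal Python sketch using only the standard socket module. It tries the same UDP bind and reports the symbolic error name if the bind fails. The function name try_bind_udp is made up. The name you get back depends on the platform: on Windows, a port held exclusively by another process can come back as WSAEACCES as in the example above, while other systems usually report EADDRINUSE.

```python
import errno
import socket


def try_bind_udp(ip_address, port):
    """Try to bind a UDP socket to (ip_address, port).

    Returns None if the bind works, otherwise the symbolic name of the
    error (for example EADDRNOTAVAIL or EADDRINUSE), or the raw error
    text if the platform has no name for it.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.bind((ip_address, port))
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc))
    finally:
        # Always release the socket so the test itself does not hold the port.
        sock.close()
    return None


if __name__ == "__main__":
    # Values from the HORCM_MON example in this document; HORCM must be stopped.
    result = try_bind_udp("10.129.3.127", 11042)
    print("bind OK" if result is None else f"bind failed: {result}")
```

Run it with HORCM stopped. If HORCM is already running on that port, the bind will of course fail.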


Comments
This is a work in progress. If you would like to see anything else, let me know.

Mike Le Voi
Software Technical Specialist
APAC Global Support Centre
8th September 2008
