board Replacement
Connectivity to the rack will depend on the customer's access requirements. The following
procedure is to be used with the latest EIS Checklist and the Exalogic Owner's Guide (Section 3.4),
and assumes a laptop attached to the Cisco management switch. If no port is available in a full
rack, temporarily disconnect a port used for one of the PDUs and export the backup files to
/tmp on another compute node in the rack using scp. If the customer does not allow login access to
the host ILOM, the customer will need to run the commands given below.
If connecting to the ILOM via serial cable, remember that the baud rate is 9600 for replacement
boards. The post-install procedure corrects this to the Exalogic default of 115200 used on
installed boards.
Note: If you disconnect one of the ILOM ports for temporary use, the entire compute node will
be inaccessible from the management network.
Reference links:
http://eis.us.oracle.com/checklists/pdf/Exalogic-X5.pdf
http://docs.oracle.com/cd/E18476_01/index.htm
ILOM Firmware
Prior to going to site it is a good idea to find out what ILOM firmware is running on the
customer's rack systems. You may well need to take this software with you to upgrade the
firmware to the
Assuming the ILOM is not the reason for replacing the system motherboard, take a
current backup of the ILOM SP configuration using a browser, under the ILOM Configuration tab
(different versions of ILOM may have this in different locations in the BUI), or from the ILOM CLI:
-> cd /SP/config
-> set passphrase=welcome1
-> set dump_uri=scp://root:password@laptop_IP/var/tmp/SP.config
The FSE/customer should also collect the output of show /SP/network and show /SP, since the
IP address must be set manually after the MB is replaced.
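The backup commands above, and the matching restore with load_uri used later in this procedure, differ only in the property being set. As a minimal sketch, the helper below assembles the CLI lines so the same passphrase and scp target are used in both directions. The function name sp_config_commands and its parameters are hypothetical, and the values shown are the placeholders from this document, not real credentials.

```python
def sp_config_commands(action, passphrase, user, password, host, path):
    """Return the ILOM CLI lines for an SP configuration backup or restore.

    action is "backup" (sets dump_uri) or "restore" (sets load_uri).
    This only formats text; it does not connect to any ILOM.
    """
    properties = {"backup": "dump_uri", "restore": "load_uri"}
    if action not in properties:
        raise ValueError("action must be 'backup' or 'restore'")
    if not path.startswith("/"):
        raise ValueError("path must be absolute, for example /var/tmp/SP.config")
    uri = f"scp://{user}:{password}@{host}{path}"
    return [
        "cd /SP/config",
        f"set passphrase={passphrase}",
        f"set {properties[action]}={uri}",
    ]


# Placeholder values copied from the procedure text.
for line in sp_config_commands("backup", "welcome1", "root", "password",
                               "laptop_IP", "/var/tmp/SP.config"):
    print("-> " + line)
```

Keeping both directions in one helper makes it harder to restore with a passphrase that differs from the one used for the backup.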
4. If the system is not already down due to the problem that requires the motherboard to be
replaced, the system administrator should prepare the system for service by performing any
application-related functions required to shut down the compute node. This might include, but is
not limited to, performing a system backup, failing over applications or services, and finally
shutting down the system. If this is an OVS Virtual installation, confirm that the SA has put the
compute node in ‘Maintenance’ mode from OVMM/EMOC.
Maintenance Mode:
Consult the site SA on these steps, as the SA should run them.
The general process to follow when an identity-changing repair is needed is as
follows:
a) Identify any vServers running on the Node to be serviced and shut them down.
c) Remove server from server pool and from OVMM (interacting with OVMM)
1) remove server from server pool
2) unexpose OVS repository
3) remove server from OVMM
Note:
The above procedure works if the node is up. If the node is dead, it may come up with no
/OVS mounted and fail to join the cluster.
Workaround:
1. Edit the server pool and remove the node already added.
2. Go to the events view in OVMM for the unassigned server and acknowledge all events (green).
3. Rediscover the server.
4. Edit the server pool and add it once more (this may cause the node to reboot).
5. When the node is back up, add the server to the pool again.
Note: A new system board will have a new UUID, which will change the compute node's identity.
The affected node should already have no VMs running and be in ‘Maintenance’ mode.
Check doc:
How To Stop and Start the Entire Exalogic Control Stack In An Exalogic EECS v2.0.6.0.0 and
later Virtual releases (Doc ID 1594223.1)
EMOC tasks
Example here assumes Compute Node 2 has had a failure. See Doc ID 1551724.1 for removal of
node from EMOC.
Identify any vServers running on the Node to be serviced and shut them down.
You may see an error at this point; it can be ignored.
Check that the node to be serviced (Compute Node 2 in this case) disappears from the left panel.
Select ‘Server pool’ and click the ‘Edit Server pool’ action. When the “Edit Server Pool” dialog
pops up, select the ‘Server’ tab, move the node (Compute Node 2) to the left pane, and click OK.
A new job starts; wait for it to complete.
In the OVMM BUI, click the ‘Repositories’ tab, select / in the left repositories pane, then click
the two green arrows. Next select ‘Servers’ and check that the cnode is not present on the right.
In the ‘Storage’ tab, select the Generic Network File System. This brings up the Add/Remove
Admin Server list; ensure the cnode is not listed there. If it is, select << to remove it from the list.
NOTE: Pull the power cords before opening the top cover to avoid an SP degraded condition.
2. Carefully follow the port numbers on the cables when re-attaching so they are not reversed. It is
easiest to plug cables in while the server is in the fully extended maintenance position.
Solution:
Update MAC in ifcfg-eth interfaces as indicated in document: Network Interfaces Not
2. Update the Serial Number on the new motherboard to that of the server chassis. This is
REQUIRED in order for ASR to continue to work on the unit, and is REQUIRED for all servers
that are part of Exalogic racks that may have a future Service Request, whether ASR is
configured now or not.
Exalogic Elastic Cloud X2-2 and X3-2 machines with x4170m2 or X3-2 / X4-2 / X5-2
Compute Nodes
These platforms use the Top Level Indicator (TLI) feature in ILOM to perform the motherboard
serial number update automatically.
For more information on TLI and restricted shell lease refer to the following 2 MOS notes for these
systems:-
NOTE:- The serial numbers of each server can be found at the front on the left hand side.
If the replacement has the correct product serial number, then skip to step 2 of the post-replacement
procedures. If the replacement does not have the product serial number populated correctly, then
continue:
(c) Where there is at least one container which still contains valid TLI information, a
service mode command copypsnc can be used to update the product serial number.
-> cd /SP/users
-> create sunny role=aucros (will ask for password)
(d) Gather “version”, “show /SYS” and “show /SP/clock” outputs needed for
generating the service mode password:
-> version
SP firmware 3.0.9.27.a
SP firmware build number: 58740
SP firmware date: Tue Sep 14 15:48:24 EDT 2010
SP filesystem version: 0.1.23
BRAND : sun
MODE : service
VERSION : 3.0.9.27.a
SERIAL : 00000000
UTC DATE : 10/24/2011 17:12
POP DOLL PHI TOW BRAN TAUT FEND PAW SKI SCAR BURG CEIL MINT DRAB
KAHN FIR MAGI LEAF LIMB EM LAWS BRAE DEAL BURN GOAL HEFT HEAR KEY
SEE A
(f) Log out of root, log back in as the 'sunny' user that you created, and enter Service
mode:
-> showpsnc
Primary: fruid:///SYS/PDB
Backup 1: fruid:///SYS/MB
Backup 2: fruid:///SYS/DBP
------------------+--------------------------+--------------------------+--------------------------
Container Status    Invalid                    Valid                      Valid
PPN                 602-4980-01                602-4980-01                602-4980-01
PSN                 00000000                   1039FMM0E6                 1039FMM0E6
Product Name        SUN FIRE X4170 M2 SERVER   SUN FIRE X4170 M2 SERVER   SUN FIRE X4170 M2 SERVER
WWN                 500605b00290a3e0           500605b00290a3e0           500605b00290a3e0
->
(j) Log out from the 'sunny' user, log back in as root, and remove the 'sunny' user:
2. Re-flash the ILOM/BIOS to the correct levels required for Exalogic Elastic Cloud.
(a) Log in to another compute node's ILOM and check the version.
-> version
SP firmware 3.0.16.10.a
SP firmware build number: 68533
SP firmware date: Wed Oct 12 10:46:03 EDT 2011
SP filesystem version: 0.1.23
(b) If you do not have the correct firmware installed, and you know the correct
version, then it can be obtained from MOS patches or EIS DVD.
http://eis.central.sun.com/eisdvd/eisdvd.html
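The comparison in steps (a) and (b) can be done by eye, but when several nodes are checked a small parser reduces transcription mistakes. The following is a minimal sketch that assumes the `version` output has been captured as text in the layout shown above; parse_ilom_version and needs_reflash are hypothetical helper names, not part of ILOM.

```python
def parse_ilom_version(text):
    """Extract the fields of ILOM `version` output into a dict."""
    fields = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Check the longer prefixes first: they also begin with "SP firmware ".
        if line.startswith("SP firmware build number:"):
            fields["build"] = line.split(":", 1)[1].strip()
        elif line.startswith("SP firmware date:"):
            fields["date"] = line.split(":", 1)[1].strip()
        elif line.startswith("SP filesystem version:"):
            fields["filesystem"] = line.split(":", 1)[1].strip()
        elif line.startswith("SP firmware "):
            fields["firmware"] = line[len("SP firmware "):].strip()
    return fields


def needs_reflash(replaced, reference):
    """True if the replaced board's firmware or build differs from the reference node."""
    return (replaced.get("firmware"), replaced.get("build")) != (
        reference.get("firmware"), reference.get("build"))


# The two outputs shown in this procedure.
new_board = parse_ilom_version(
    "SP firmware 3.0.9.27.a\nSP firmware build number: 58740\n"
    "SP firmware date: Tue Sep 14 15:48:24 EDT 2010\nSP filesystem version: 0.1.23")
other_node = parse_ilom_version(
    "SP firmware 3.0.16.10.a\nSP firmware build number: 68533\n"
    "SP firmware date: Wed Oct 12 10:46:03 EDT 2011\nSP filesystem version: 0.1.23")
print(needs_reflash(new_board, other_node))
```

The sketch only reports a difference. Which firmware level is correct for the rack still has to come from the EIS checklist or the MOS patch notes.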
-> cd /SP/config
-> set passphrase=welcome1
-> set load_uri=scp://root:password@laptop_IP/var/tmp/SP.config
If an SP backup was not possible, check with the customer for network information and use another
ILOM within the rack for general settings. The primary Exalogic-specific settings are:
(a) Baud rate is 115200
(b) /SP system_identifier is set to the appropriate rack type string and master
Rack Serial Number. This is critical for ASR deployments. The master Rack Serial
Number can be found at the top left inside the cabinet or from show /SP on any other
ILOM. The string should be of the following format:
system_identifier = Oracle Exalogic X2-2 1052AK22D6
For example:
-> show /SP
Properties:
check_physical_presence = true
hostname = elx22bur09cn01-ilom
reset_to_defaults = none
system_contact = (none)
system_description = SUN FIRE X4170 M2 SERVER, ILOM v3.0.16.10.a, r68533
system_identifier = Oracle Exalogic X2-2 1052AK22D6
system_location = (none)
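Because ASR depends on the exact system_identifier string, it can help to build and check it mechanically rather than retype it. The sketch below assumes the format shown above ("Oracle Exalogic", a rack type such as X2-2, then the master rack serial number). The regular expression, including the assumption that serials are uppercase letters and digits, and the function names are illustrative only.

```python
import re

# Assumed pattern, derived from the single example in this procedure.
_IDENTIFIER = re.compile(r"^Oracle Exalogic (X\d-\d) ([0-9A-Z]+)$")


def make_system_identifier(rack_type, rack_serial):
    """Build the system_identifier value and reject obviously malformed input."""
    value = f"Oracle Exalogic {rack_type} {rack_serial}"
    if not _IDENTIFIER.match(value):
        raise ValueError(f"unexpected system_identifier format: {value!r}")
    return value


def read_system_identifier(show_sp_output):
    """Return the system_identifier value from captured `show /SP` output, or None."""
    for line in show_sp_output.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "system_identifier":
            return value.strip()
    return None


sample = """Properties:
check_physical_presence = true
hostname = elx22bur09cn01-ilom
system_identifier = Oracle Exalogic X2-2 1052AK22D6
system_location = (none)"""

wanted = make_system_identifier("X2-2", "1052AK22D6")
print(read_system_identifier(sample) == wanted)
```

Comparing the value read from another ILOM in the rack with the value set on the replaced node catches spacing and case differences that are easy to miss on screen.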
If the root password has not yet been changed to the customer's, have the customer change it,
or change it manually:
Finally, check that you can log in to all interfaces and that the ILOM can be accessed using a
browser and ssh from another system on the customer's management network.
Example:
-> show /SP/network
/SP/network
Targets:
interconnect
ipv6
test
Properties:
commitpending = (Cannot show property)
dhcp_server_ip = none
ipaddress = 10.152.223.171
ipdiscovery = static
ipgateway = 10.152.223.1
ipnetmask = 255.255.255.0
macaddress = 00:21:28:A5:BE:21
managementport = /SYS/MB/NET0 <--
outofbandmacaddress = 00:21:28:A5:BE:20 <--
pendingipaddress = 10.152.223.171
pendingipdiscovery = static
pendingipgateway = 10.152.223.1
pendingipnetmask = 255.255.255.0
pendingmanagementport = /SYS/MB/NET0 <--
sidebandmacaddress = 00:21:28:A5:BE:21 <--
state = enabled <--
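The properties marked with arrows in the example are the ones to verify after the network settings are re-entered. As a rough sketch, the captured `show /SP/network` output can be parsed and checked as follows. The helper names are hypothetical, and the checks (expected IP, pending values equal to committed values, state enabled) are one reasonable reading of the example rather than an official validation list.

```python
def parse_properties(text):
    """Parse 'name = value' lines from ILOM show output, dropping arrow annotations."""
    props = {}
    for line in text.splitlines():
        if " = " not in line:
            continue
        key, value = line.split(" = ", 1)
        for marker in ("<--", "\u2190"):  # annotation arrows used in this document
            value = value.split(marker)[0]
        props[key.strip()] = value.strip()
    return props


def network_problems(props, expected_ip):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if props.get("ipaddress") != expected_ip:
        problems.append(f"ipaddress is {props.get('ipaddress')}, expected {expected_ip}")
    for name in ("ipaddress", "ipdiscovery", "ipgateway", "ipnetmask", "managementport"):
        if props.get(name) != props.get("pending" + name):
            problems.append(f"pending{name} differs from {name} (uncommitted change?)")
    if props.get("state") != "enabled":
        problems.append(f"state is {props.get('state')}, expected enabled")
    return problems


sample = """ipaddress = 10.152.223.171
ipdiscovery = static
ipgateway = 10.152.223.1
ipnetmask = 255.255.255.0
managementport = /SYS/MB/NET0 <--
pendingipaddress = 10.152.223.171
pendingipdiscovery = static
pendingipgateway = 10.152.223.1
pendingipnetmask = 255.255.255.0
pendingmanagementport = /SYS/MB/NET0 <--
state = enabled <--"""

print(network_problems(parse_properties(sample), "10.152.223.171"))
```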
(b) Reset the ILOM under the Maintenance tab or from the ILOM CLI:
-> reset /SP
(b) For Solaris Physical ECU, there are no BIOS changes required.
For Linux Physical ECU, Enable the C-states.
Make sure the CPU C-State is ENABLED. Follow the steps below:
• Log in to the compute node ILOM
• set /HOST boot_device=bios
• start /SYS
• start /SP/console
• Wait for Menu
• Advanced Tab
• CPU Configuration
• Scroll to bottom
• Confirm enabled
For an ECU with an OVS Virtual installation, there are specific BIOS settings that need to be set.
Make sure that the PCI Payload is set to 256. Follow the steps below:
• PCI Express Configuration
• Change Maximum Payload Size to '256'
Make sure that the CPU C-State is DISABLED. Follow the steps below:
• CPU Configuration
• scroll to bottom
• confirm DISABLED
Enable SRIOV if the nodes have been freshly imaged to OVS from OEL. If
SRIOV is not enabled, the IB network will not be enabled. To enable SRIOV do
the following:
• Select I/O Virtualization and enable it
• Save Changes and Exit
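The BIOS requirements above differ by installation type, so a compact lookup can serve as a checklist. This is a sketch only: the keys such as "ovs-virtual" and "cpu_c_state" are labels invented here, and the observed values are assumed to be noted down by hand from the BIOS screens, since nothing in this procedure reads them programmatically.

```python
# Expected settings per installation type, as stated in this procedure.
# SR-IOV applies when nodes have been freshly imaged to OVS from OEL.
EXPECTED_BIOS = {
    "solaris-physical": {},  # no BIOS changes required
    "linux-physical": {"cpu_c_state": "enabled"},
    "ovs-virtual": {
        "pci_max_payload_size": "256",
        "cpu_c_state": "disabled",
        "sriov": "enabled",
    },
}


def bios_mismatches(install_type, observed):
    """Compare hand-recorded BIOS values against the expected ones.

    Returns a sorted list of mismatch descriptions (empty if all match).
    """
    expected = EXPECTED_BIOS[install_type]
    return sorted(
        f"{name}: expected {want}, found {observed.get(name, 'not recorded')}"
        for name, want in expected.items()
        if observed.get(name) != want
    )


# Example: an OVS node where C-states were left enabled.
noted = {"pci_max_payload_size": "256", "cpu_c_state": "enabled", "sriov": "enabled"}
for problem in bios_mismatches("ovs-virtual", noted):
    print(problem)
```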
Below are the detailed steps to return a node to EMOC and OVMM.
Consult the site SA on these steps, as the SA should run them.
If the customer needs assistance with these procedures, they should engage
EEST through the SR.
Once the node is up and running, it is good practice to check the admin network and all other
networks, such as the ILOM, node eth-admin, and ipoib-xxx IP addresses, and confirm that they can
be pinged from other nodes. If everything is OK, proceed with the next steps.
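The reachability check described above can be scripted from another node. The sketch below assumes a Linux host whose ping accepts -c (count) and -W (timeout in seconds), as iputils ping does; Solaris ping takes different arguments. The target names and addresses are placeholders to be replaced with the node's real ILOM, eth-admin, and IPoIB addresses.

```python
import subprocess


def ping_once(address, timeout_s=2):
    """Return True if a single ICMP echo succeeds (Linux iputils `ping` flags)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), address],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def unreachable(targets, ping=ping_once):
    """Return the names of targets that did not answer.

    targets maps a label (for example "ilom") to an IP address.
    The ping argument can be swapped for a stub when testing.
    """
    return [name for name, address in targets.items() if not ping(address)]
```

A call such as unreachable({"ilom": "10.152.223.171"}) returns an empty list when every address answers. Any names returned should be investigated before the node is handed back to EMOC and OVMM.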
EMOC Details:
Log in to EMOC as the root user.
Now that the EMOC part is completed, check in Assets that the cnode is listed correctly in the
left panel, along with the other nodes.
Log in to Oracle VM Manager using the admin user credentials and discover the new compute
node. Use the IP address of the IPoIB-ovm-mgmt partition.
In the ‘Servers and VMs’ tab, select Server Pools and click the ‘Discover’ server action.
The default password for ovs-agent in OVMM is oracle. This is required during discovery.
Check whether the node is presented in the repositories (use the two green arrows action icon and
select the server in the pop-up).
In the example screenshot, the node is already presented.
Check that the cnode is configured correctly, with the Utility, VM Server, and Take Ownership
flags selected, so that the node is ready for normal operation in the virtual environment.
The customer should now be ready to test whether the VMs run OK on the new cnode. The procedure
to set vserver_placement.ignore_node=true on all other cnodes, except the node on which the
test VMs should start, can be used for this.
It is good practice to leave EC VMs on the default nodes where they should be.
Note: If a vServer does not start, verify that OVMM has the following checked after re-discovery
of the asset.
Utility Server X
VM Server X
You can now hand the system back to the customer System Administrator to check that all services
are up. If this was an OVS Virtual install, they will also need to verify that the VMs are able to
come up properly. If the customer DBA requires assistance beyond this, direct them to call back
the parent SR owner in EEST.