FW RELEASE:
OEM 0005
Customer(s): STD_OEM
Product(s): MantaRay
Interface(s): SAS
NOTES:
0004 to 0005
Drive is unresponsive after servo code corruption in the flash during download
Doc Reference
CDD-169723
Likelihood
Low
Severity
Major
Drive Hang
Failure Scenario
Drives were returned from a customer as DNR (Drive Not Ready). Investigation determined that the servo code
in the flash was corrupted because power was pulled during a download while the flash was being programmed,
which caused a servo unload failure on the download reset. When the drive is in this state, downloading a
combined controller plus servo code image could result in a drive hang.
Root Cause
F/W Change Descr
CDD-173832
Likelihood
Low
Severity
Moderate
Spec Violation
Failure Scenario
- The drive attempts to send a response for a command, but the response fails for some reason (e.g., it is
NAK'ed by the initiator, the port goes down, etc.).
- The command associated with the response is aborted (e.g., due to a hard reset).
- The Re-Transmit bit may then be incorrectly set to 1 in the response frame for a subsequent command.
Root Cause
When a response needs to be re-transmitted, a flag is set to indicate that the Re-Transmit bit should be set in the
response frame. If the command associated with the response is aborted before the response is re-transmitted,
the flag may not be cleaned up properly, so a subsequent response frame for a different command may have the
Re-Transmit bit incorrectly set to 1.
When a command is aborted, properly clean up the flag that is used to indicate that the Re-Transmit bit is to be
set.
CDD-174655
Likelihood
Low
Severity
Moderate
Unexpected status
Failure Scenario
Multiple Start Immediate commands issued together can cause unexpected status to be returned (e.g., 02/04/02,
Need SSU, when it should have been 02/04/01, In process of coming ready). This can lead to unexpected
system behavior.
Root Cause
When processing the second of two Start Immediate commands received together, a structure that typically
contains command context is referenced after being freed and reused by a new command.
When evaluating the properties of a Start Stop Unit command, only reference fields that are still valid after the
command returns status to the host.
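A minimal C sketch of the "snapshot before returning status" pattern follows. The context layout (`cmd_ctx_t`) and the helpers `return_status` and `process_ssu` are hypothetical; `return_status` clears the context to simulate the slot being freed and immediately reused by a new command. The background action is decided from local copies, never from the context after status has gone out.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical command context slot, recycled once status is returned. */
typedef struct {
    bool in_use;
    bool immed;   /* IMMED bit of the Start Stop Unit CDB */
    bool start;   /* START bit of the Start Stop Unit CDB */
} cmd_ctx_t;

/* Returning status frees the slot; a new command may reuse it at once.
   Clearing it here simulates that reuse. */
static void return_status(cmd_ctx_t *ctx) {
    memset(ctx, 0, sizeof *ctx);
}

/* Returns true if a spin-up should be started. */
bool process_ssu(cmd_ctx_t *ctx) {
    assert(ctx != NULL && ctx->in_use);
    /* Snapshot every field needed after status goes out. */
    const bool immed = ctx->immed;
    const bool start = ctx->start;
    if (immed) {
        /* IMMED=1: status is returned right away; from here on ctx may
           describe a different command and must not be read. */
        return_status(ctx);
    }
    const bool spin_up = start;   /* uses the snapshot, not ctx->start */
    if (!immed) {
        return_status(ctx);       /* IMMED=0: status after the action */
    }
    return spin_up;
}
```

Reading `ctx->start` after the immediate `return_status` call would see whatever the next command left in the slot, which is the class of defect the note describes.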
CDD-175009
Likelihood
Low
Severity
Major
Failure Scenario
Power cycle a machine that has a RAID controller. Upon booting, the controller hangs the bus in the middle of an RLA
read. After hanging for 20 seconds, the controller sends a hard reset to the drive. As a result, the drive asserts.
Root Cause
When the host dropped the bus in the middle of a read, the drive would not properly abort RLA reads if the bus was
dropped in a very specific window. If a reset is sent in that ~4 microsecond window, it could cause the drive to free an
internal data pointer repeatedly and lead to an assert.
Updated aborting command logic to prevent data pointers from being freed repeatedly.
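One common way to make an abort path safe against being entered twice is to release a buffer only if the pointer is still set and to clear it immediately afterward. The C sketch below shows that idea; it is an assumption for illustration, not the firmware's actual change, and `rla_read_t`, `release_buffer`, `abort_rla_read`, and the `release_count` instrumentation are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical record for an RLA read in progress. */
typedef struct {
    void *data_buf;   /* internal data pointer owned by the read */
} rla_read_t;

static unsigned release_count = 0;   /* instrumentation for the sketch */

static void release_buffer(void *buf) {
    free(buf);
    release_count++;
}

/* Abort may be reached more than once for the same command, e.g. from the
   bus-drop handling and again from the hard-reset handling. Clearing the
   pointer after the first release makes repeated aborts harmless. */
void abort_rla_read(rla_read_t *r) {
    assert(r != NULL);
    if (r->data_buf != NULL) {
        release_buffer(r->data_buf);
        r->data_buf = NULL;
    }
}
```

Calling `abort_rla_read` twice on the same record releases the buffer exactly once.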
CDD-175689
Likelihood
Low
Severity
Minor
Spec Violation
Failure Scenario
After a stop command is issued, an internal process may still send a write request to the read/write subsystem.
The request fails, causing data to be logged to internal error logs. The event is nonfatal and does not affect drive
operation.
Root Cause
A routine that tries to save state information to the media was not checking the state of the drive before issuing
its request to the read/write subsystem.
In the routine that tries to save state information to the media, check whether the disc is in a state to accept
writes before sending the request to the read/write subsystem, and return an error if it is not.
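The guard described above can be sketched in C as follows. The state names (`DISC_READY`, `DISC_STOPPING`, `DISC_STOPPED`), the status codes, and the functions `rw_submit_write` and `save_state_to_media` are hypothetical stand-ins; the note does not say which drive states permit writes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical disc states and return codes. */
typedef enum { DISC_READY, DISC_STOPPING, DISC_STOPPED } disc_state_t;
typedef enum { SAVE_OK = 0, SAVE_ERR_NOT_READY = -1 } save_status_t;

static unsigned rw_request_count = 0;   /* instrumentation for the sketch */

/* Stand-in for handing a write to the read/write subsystem. */
static save_status_t rw_submit_write(const void *data, size_t len) {
    (void)data;
    (void)len;
    rw_request_count++;
    return SAVE_OK;
}

/* Check the disc state first; after a stop command the request is never
   sent, so nothing fails and nothing is written to the error logs. */
save_status_t save_state_to_media(disc_state_t state, const void *data, size_t len) {
    assert(data != NULL);
    if (state != DISC_READY) {
        return SAVE_ERR_NOT_READY;
    }
    return rw_submit_write(data, len);
}
```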
Drive incorrectly reports link reset received during data transfer error
Doc Reference
CDD-177916
Likelihood
Low
Severity
Minor
Protocol Violation
Failure Scenario
A loss of sync that would lead to a drive-initiated OOB could occur at a point in time such that the drive would
later incorrectly initiate a data transfer even though the loss of sync had already been detected. As a result, the
drive incorrectly aborted the command and returned a link reset received during data transfer error (0B/4B/03/01)
instead of holding off the data transfer until the link was re-synced.
Root Cause
The function called by firmware to check whether the port was blocked did not include an out-of-sync indication, so
firmware was not blocked even though the port was in a loss-of-sync state at the time.
Changed the function that determines whether the port is blocked to include the out-of-sync condition.
Pseudo Read Errors Incorrectly Counted Toward A Hardware Error SMART Trip
Doc Reference
CDD-184858
Likelihood
Low
Severity
Major
Spec Violation
Failure Scenario
A pseudo unrecovered error can be generated on an LBA by the WRITE LONG command with the COR_DIS bit
set to one and the WR_UNCOR bit set to one. When these pseudo unrecovered errors were read, the resulting
unrecovered read errors were getting counted against the Hardware Error SMART trip counter. If a number of
these errors are encountered over a defined interval of time, then a Hardware Error SMART trip would occur.
Root Cause
FW did not adequately filter pseudo read errors from the Hardware Error SMART trip counter.
F/W Change Descr
Changed FW so that it does not increment the Hardware Error SMART counter if the read error is a pseudo error
that results in an 03/11/00/83.
SDD-176401
Likelihood
Low
Severity
Major
BMS Failure
Failure Scenario
When the drive temperature is out of range (hot or cold), BMS is suspended. BMS does not restart, even when
the temperature returns to normal, until the drive is power cycled.
Root Cause
After BMS is suspended because the temperature is out of range, a request to restart BMS is never issued.
If BMS is suspended due to temperature being out of range, an attempt to restart it will be performed every 5
minutes. If the temperature gets within the valid range, BMS will restart.