OneFS
Version 7.2.0.0 - 7.2.0.4
Release Notes
CONTENTS
Chapter 1  OneFS Release Notes
Chapter 2  Upgrading OneFS
    Target Code
    Supported upgrade paths
Chapter 3  New features, software support, logging, and controls
    SmartLock
    SmartQuotas
    SMB
Chapter 4  New hardware and firmware support
Chapter 5  Resolved issues
Chapter 6
Chapter 7
Chapter 8  Known issues
Chapter 9
CHAPTER 1
OneFS Release Notes
The OneFS release notes contain information about new features, changes in
functionality, issues that are resolved, support for new hardware and firmware, and
known issues and limitations in the Isilon OneFS 7.2.0 operating system.
The new features, functionality changes, resolved issues, and known issues listed in the
release notes are categorized by functional area. For a list of the functional areas used to
categorize the release notes and a brief description of what each functional area typically
contains, see the Functional areas in the OneFS release notes section in the OneFS release
resources section at the end of this document.
For a list of available OneFS releases and information about target code releases and
general availability (GA) releases, see Current Isilon Software Releases on the EMC Online
Support site.
CHAPTER 2
Upgrading OneFS
OneFS upgrades comprise a full operating system upgrade and require that the Isilon
cluster be rebooted. To help ensure that the version of OneFS to which you upgrade
contains all of the resolved issues included in the version you are upgrading from,
upgrades are supported only from designated previous releases of OneFS.
Before upgrading OneFS, review the Supported upgrade paths section of this document to
verify that the cluster can be upgraded from your current version of OneFS directly to this
release.
See the OneFS Upgrade Planning and Process Guide on the EMC Online Support site for
detailed upgrade instructions and additional upgrade information.
To download the installer for this maintenance release, see the OneFS Downloads page
on the EMC Online Support site.
Target Code
Supported upgrade paths
Target Code
OneFS 7.2.0.3 is the current 7.2.0.x target code version. A OneFS release is designated as
Target Code after it satisfies specific criteria, which includes production time in the field,
deployments across all supported node platforms, and additional quality metrics. For
information about upgrading to OneFS Target Code, see Upgrading to OneFS Target Code
on the Isilon EMC Community Network (ECN) pages.
Supported upgrade paths
Rolling upgrades to OneFS 7.2.0.4 are supported from the following OneFS versions:
Rolling upgrades to OneFS 7.2.0.3 are supported from the following OneFS versions:
CHAPTER 3
New features, software support, logging, and
controls
This section contains descriptions of new features, new software support, new protocol
and protocol version support, additional logging, and new controls such as command-line options and sysctl parameters.
New features enable you to perform tasks or implement configurations that were
previously unavailable.
These new features include:
New logging
New controls such as command options, sysctl parameters, and OneFS web administration controls
Functionality changes include modifications and enhancements to OneFS that enable you
to perform preexisting tasks in new ways, or that improve underlying OneFS functionality
or performance. These changes also include removing support for deprecated protocols
and software.
The functionality changes documented in the release notes include:
Changes to enable functionality in the OneFS web administration interface that was previously available only from the command-line interface
In most cases, the LDAP bind-dn and bind-password settings must be configured in order to use virtual list view (VLV).
Cluster configuration
New protection policy
To ensure that node pools made up of new Isilon HD400 nodes can maintain a data
protection level that meets EMC Isilon guidelines for meantime to data loss (MTTDL),
OneFS offers a new requested protection option, +3d:1n1d (3 drives or 1 node and 1
drive). This setting ensures that data remains protected in the event of three
simultaneous drive failures, or the simultaneous failure of one drive and one node.
This protection policy can also be applied to node pools that do not contain HD400
nodes.
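As a hedged illustration (not OneFS code), the failure combinations that a +3d:1n1d requested protection level is described above as tolerating can be sketched as a simple decision function; the function name and signature are invented for this example:

```python
# Hypothetical sketch of the +3d:1n1d tolerance rule described above:
# data survives up to 3 simultaneous drive failures, or the simultaneous
# failure of 1 node and 1 drive. This is not the OneFS implementation.
def tolerates(failed_drives: int, failed_nodes: int) -> bool:
    """Return True if +3d:1n1d protection covers this failure combination."""
    if failed_nodes == 0:
        return failed_drives <= 3   # up to three simultaneous drive failures
    if failed_nodes == 1:
        return failed_drives <= 1   # one node plus at most one drive
    return False                    # two or more node failures exceed +3d:1n1d

print(tolerates(3, 0))  # True: three drives
print(tolerates(1, 1))  # True: one node and one drive
print(tolerates(2, 1))  # False: one node and two drives
```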
Suggested protection
OneFS now includes a function to calculate a recommended protection level based
on cluster configuration. This capability is available only on new clusters. Clusters
upgraded to OneFS 7.2 do not have this capability. Although you can specify a
different requested protection on a node pool, the suggested protection level strikes
the best balance between data protection and storage efficiency. In addition, as you
add nodes to your Isilon cluster, OneFS continually evaluates the protection level
and alerts you if the cluster falls below the suggested protection level.
Node equivalency
OneFS now enables nodes of different generations to be compatible based on
certain criteria and constraints. You can specify compatibilities between Isilon S200
and similarly configured Isilon S210 nodes, and between X400 and similarly
configured X410 nodes. Nodes must have compatible RAM amounts and identical
HDD and SSD configurations. Compatibilities allow newer generation nodes to be
joined to existing node pools made up of older generation nodes. After you add
three or more newer generation nodes, you can delete the compatibility so that
OneFS can autoprovision the new nodes into their own node pools. This enables you
to take advantage of the speed and efficiency characteristics of the newer node
types in their own node pools.
Zone-aware ID mapping
OneFS now supports management of ID mapping rules for each access zone. ID
mapping associates Windows identifiers to UNIX identifiers to provide consistent
access control across file sharing protocols within an access zone.
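The per-zone mapping idea can be sketched as nested lookup tables; the zone names, SID, and UIDs below are invented for illustration, and this is not the OneFS implementation:

```python
# Hypothetical sketch: per-access-zone ID mapping tables, so the same
# Windows SID can map to different UNIX UIDs in different zones.
id_maps = {
    "zone-finance": {"S-1-5-21-1111-2222-3333-1001": 5001},
    "zone-hr":      {"S-1-5-21-1111-2222-3333-1001": 7001},
}

def map_sid_to_uid(zone: str, sid: str):
    """Look up the UNIX UID for a Windows SID within one access zone."""
    return id_maps.get(zone, {}).get(sid)

print(map_sid_to_uid("zone-finance", "S-1-5-21-1111-2222-3333-1001"))  # 5001
print(map_sid_to_uid("zone-hr", "S-1-5-21-1111-2222-3333-1001"))       # 7001
```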
File system
L3 cache stores metadata only on archive platforms
For Isilon NL400 and HD400 nodes that contain SSDs, L3 cache is enabled by default
and cannot be disabled. In addition, on archive platforms, whose workloads consist mostly of data writes, L3 cache stores only metadata in SSDs. By storing metadata only, L3 cache optimizes the performance of write-based operations.
Hardware
Automatic drive firmware updates
OneFS now supports automatic drive firmware updates for new and replacement
drives. This is enabled through drive support packages.
Improved InfiniBand stability
The stability of back-end connections to the cluster has been improved by addressing a number of issues that were encountered when one or more InfiniBand switches were rebooted, whether manually or unexpectedly (for example, because of a memory leak or a race condition). If any of these issues occurred, the affected nodes typically lost connectivity to the cluster and, in some cases, had to be manually rebooted to reestablish a connection.
HDFS
Increased Hadoop support
Networking
Source-based routing
OneFS now supports source-based routing, which selects which gateway to direct
outgoing client traffic through based on the source IP address in each packet
header.
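The gateway-selection idea can be sketched with the standard library; the subnets and gateway addresses are invented, and this is a conceptual illustration, not OneFS routing code:

```python
# Illustrative sketch of source-based routing: the gateway for outgoing
# traffic is chosen from the packet's SOURCE address, not its destination.
import ipaddress

# Hypothetical mapping of local subnets to their gateways.
source_routes = {
    ipaddress.ip_network("10.1.0.0/16"): "10.1.0.1",
    ipaddress.ip_network("10.2.0.0/16"): "10.2.0.1",
}

def gateway_for(source_ip: str):
    """Pick the gateway whose subnet contains the packet's source address."""
    addr = ipaddress.ip_address(source_ip)
    for net, gw in source_routes.items():
        if addr in net:
            return gw
    return None  # no match: fall back to the default route

print(gateway_for("10.2.34.7"))  # 10.2.0.1
```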
NFS
NFS service improvements
OneFS incorporates a number of improvements to the NFS service, including support
of NFS v4 and NFS v3 (NFS v2 is no longer supported). Other improvements include
moving the service from the operating system kernel into userspace for increased
reliability; supporting audit features for NFS events; incorporating access zone
support for NFS clients; autobalancing across all nodes to achieve performance
parity and ensure continuous service; and the ability to create aliases to simplify
client connections to NFS exports.
OneFS API
RESTful interface for object storage
OneFS introduces Isilon Swift, an object storage application for Isilon clusters based
on the object storage API provided by OpenStack Swift. The Swift RESTful API, an
HTTP-based protocol, allows Swift clients to execute Swift API commands directly
with Isilon to execute object storage requests. Accounts, containers, and objects
that form a basis for the object storage can be accessed through the NFS, SMB, FTP,
and RAN protocols in addition to the Swift RESTful API. The following Swift RESTful
API calls are supported: GET, PUT, POST, HEAD, DELETE, and COPY.
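The account/container/object hierarchy maps directly onto URL paths in the Swift-style API. As a hedged sketch (the host name, the /swift/v1 path prefix, and the names below are assumptions for illustration; no request is sent), a client-side URL for one object might be built like this:

```python
# Illustrative sketch of how the Swift object hierarchy maps onto the
# HTTP-based RESTful API described above. Host and path prefix are
# hypothetical, not the documented Isilon Swift endpoint.
def swift_object_url(host: str, account: str, container: str, obj: str) -> str:
    """Build the URL a Swift client would use for one object."""
    return f"https://{host}/swift/v1/{account}/{container}/{obj}"

# A PUT to this URL would create or replace the object; GET retrieves it,
# HEAD fetches its metadata, and DELETE removes it (per the verbs above).
print(swift_object_url("cluster.example.com", "acct", "logs", "day1.log"))
```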
Security
Telnet_d support disabled on upgrade
The Telnet service, which was removed in OneFS 7.0.0, stops functioning on upgrade to OneFS 7.2.0. Use SSH for all shell access.
SMB
Support for SMB2 symbolic links
Beginning in OneFS 7.2.0, OneFS natively supports translation of SMB2 symbolic
links. This change might affect the behavior of SMB2 symbolic links in environments
that rely on them. For more information, see article 193808 on the EMC Online
Support site.
ID: 156600
A user that attempts to connect to the cluster over SSH, through the OneFS API, or through a serial cable can no longer be authenticated on clusters running in compliance mode if any of the following identifiers are assigned to the user as either the user's primary ID or as a supplemental ID:
UID: 0
SID: S-1-22-1-0
ID: 151058
The message that is logged in the /var/log/lsassd.log file when a trusted Active Directory domain is offline now includes the name of the domain that cannot be reached. In the example below, <domain_name> is the name of the domain that is offline:
[lsass] Domain '<domain_name>' is offline
File system
New and changed in OneFS 7.2.0.4
ID: 147333
If you run the stat command to view information about a file, the Snapshot ID of the file is now included in the output. This information appears in the st_snapid field.
Hardware
ID: 156892
Wear life thresholds were added for the system area on the following Sunset Cove Plus SSD drive models:
HGST HUSMM1620ASS200
HGST HUSMM1640ASS200
HGST HUSMM1680ASS200
HGST HUSMM1680ASS205
HGST HUSMM1616ASS200
The addition of these thresholds enables OneFS to generate alerts and log events if the wear life of the system area on these SSD drive models reaches 88 percent (warn), 89 percent (critical), or 90 percent (smartfail).
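The threshold rule stated above can be sketched as a small classification function (an illustration of the rule, not OneFS code):

```python
# Sketch of the alerting rule described above: classify system-area wear
# life against the 88/89/90 percent thresholds.
def wear_state(percent_used: float) -> str:
    """Map a wear-life percentage to the alert level described above."""
    if percent_used >= 90:
        return "smartfail"
    if percent_used >= 89:
        return "critical"
    if percent_used >= 88:
        return "warn"
    return "ok"

print(wear_state(87.5))  # ok
print(wear_state(88.0))  # warn
print(wear_state(89.5))  # critical
print(wear_state(91.0))  # smartfail
```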
ID: 154222
New control: Options were added to the isi_dsp_install command to enable you to display the version number of the most recently installed drive support package (DSP) or to display a list of previously installed DSPs. To display the version number of the most recently installed DSP, run the following command:
isi_dsp_install --latest
ID: 150724
The error that appears if you run the isi_dmilog command on a platform that does not support the command was changed. The new error message is:
dmilog functions not supported on this platform - please consult 'isi_hwmon -h'
For more information about the isi_hwmon command, see article 199270 on the EMC Online Support site.
HDFS
New and changed in OneFS 7.2.0.4
ID: 157860
ID: 154873

Networking
ID: 151707
The default network flow control setting for Isilon nodes that contain Intel network interface cards (ixgbe NICs) was changed. The default flow control setting is now 1: the ixgbe NIC can receive pause frames but does not send pause frames. This configuration is consistent with Isilon nodes that contain Chelsio NICs.

Note
Ethernet flow control in a full-duplex physical link provides a mechanism that allows an interface or switch to request a short pause in frame transmission from a sender by issuing a media access control (MAC) control message and PAUSE specification, as described in the 802.3x full-duplex supplement standard.
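The MAC control PAUSE message mentioned in the note has a fixed layout in 802.3x: a reserved multicast destination, EtherType 0x8808, opcode 0x0001, and a two-byte pause duration in quanta. As an illustration (the source MAC is invented), such a frame can be assembled byte by byte:

```python
# Illustrative construction of an 802.3x MAC PAUSE frame, the control
# message referred to in the note above. Not OneFS or driver code.
import struct

def build_pause_frame(src_mac: bytes, pause_quanta: int) -> bytes:
    dst_mac = bytes.fromhex("0180c2000001")       # reserved MAC-control multicast
    ethertype = struct.pack("!H", 0x8808)         # MAC control EtherType
    opcode = struct.pack("!H", 0x0001)            # PAUSE opcode
    pause_time = struct.pack("!H", pause_quanta)  # pause duration in quanta
    frame = dst_mac + src_mac + ethertype + opcode + pause_time
    return frame + b"\x00" * (60 - len(frame))    # pad to minimum frame size

frame = build_pause_frame(bytes.fromhex("001122334455"), 0xFFFF)
print(len(frame))          # 60
print(frame[12:14].hex())  # 8808
```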
Security
New and changed in OneFS 7.2.0.4
ID: 157417
On clusters running in compliance mode, you can no longer run the su command to assume the privileges of a user with root-level (UID=0) access to the cluster. If you attempt to run the su command to assume the privileges of a user with root-level privileges, the following message appears on the console:
su: UID 0 denied by compliance mode
ID: 154655
ID: 145892
The version of OpenSSL that is installed on the cluster was updated to version 0.9.8zg.
ID: 156585
The pre-upgrade check can be run alone, or as part of the upgrade process. In either case, if the configuration size of an SMB share exceeds the maximum size allowed, a message similar to the following appears on the console during the pre-upgrade check:
Error: The 'share_name' share has too many access permissions and it cannot be upgraded.
The suggested resolution for this issue is:
1. Remove those users from the share permissions.
2. Add those users to a group.
3. Add that group to the share permissions.
4. Retry the upgrade.
If the pre-upgrade check detects that the configuration size of an SMB share exceeds the maximum size allowed when it is running as part of the default upgrade process, the pre-upgrade check portion of the upgrade completes; however, the OneFS upgrade is not started, and a message similar to the following appears on the console, and in the SMB upgrade log file located in the /ifs/.ifsvar/tmp directory:
Error: The 'share_name' share has too many access permissions and it cannot be upgraded.
The suggested resolution for this issue is:
1. Remove those users from the share permissions.
2. Add those users to a group.
3. Add that group to the share permissions.
4. Retry the upgrade.
Under these conditions, the upgrade process cannot be completed until the SMB share configuration information is reduced in size. In most cases, this can be accomplished by following the resolution suggested during the pre-upgrade check. If you encounter this limitation and cannot reduce the size of the SMB configuration information by following these steps, contact EMC Isilon Technical Support for assistance.
Note
Prior to the addition of this check, if the configuration size of an SMB share on a cluster that was being upgraded to OneFS 7.1.0 or later exceeded the maximum size allowed, some of the share information might not have been preserved during the upgrade process, and an error similar to the following might have appeared in the /var/log/isi_gconfig_d.log file:
Update error: value for key 'share_name' has size (12324) greater than max allowed value size (8192)
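The check described above can be sketched as a size comparison against the 8192-byte limit quoted in the gconfig error; the share names and sizes below are invented for illustration, and this is not the OneFS pre-upgrade check itself:

```python
# Hedged sketch of the pre-upgrade size check: flag any SMB share whose
# serialized configuration exceeds the maximum the error above mentions.
MAX_CONFIG_SIZE = 8192  # "max allowed value size (8192)" from the log excerpt

def oversized_shares(share_configs: dict) -> list:
    """Return names of shares whose config blob is too large to upgrade."""
    return [name for name, blob in share_configs.items()
            if len(blob) > MAX_CONFIG_SIZE]

configs = {
    "small_share": b"x" * 1000,
    "big_share":   b"x" * 12324,  # size taken from the example error message
}
print(oversized_shares(configs))  # ['big_share']
```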
ID: 153429
Although the isi pkg command was not intended to be used to install a drive support package (DSP), it was possible to install a DSP by running the isi pkg command. If a DSP was installed using the isi pkg command, the cluster might have exhibited unexpected behavior until the DSP was removed.
Beginning in OneFS 7.2.0.4, if you attempt to install a DSP using the isi pkg command, the installation fails and a message similar to the following appears in the /var/log/isi_pkg log file:
ID: 150558

Hardware
ID: 152083
A new version of the QLogic BXE driver was incorporated into this release.

ID: 145967
Adds a check to the OneFS software event 400120001 to detect boot drives that are missing mirror components.

ID: 142241
Improves the node format command so that the progress of the node format operation is reported as a percentage complete. Prior to this change, dots were displayed on the console until the operation was complete.

ID: 142147
Removed redundant requests for a node's sensor data from the isi_hw_status command, to improve the response time on A100, S210, X410, and HD400 nodes.
HDFS
ID: 153925, 143461, 140053
Support for the HDFS truncate remote procedure call was added.

Networking
ID: 149662
Support for PTR record lookup for SmartConnect zone member addresses was added.

ID: 145012
New control: The following parameters were added to the isi networks command:
--enable-dns-tcp-support
--disable-dns-tcp-support
The first parameter can be used to enable TCP support for SmartConnect; the second parameter can be used to disable TCP support for SmartConnect. By default, TCP support is enabled and SmartConnect works as expected. If TCP support is disabled, SmartConnect does not listen for TCP connections on the DNS port (53), and clients that attempt a DNS query over TCP receive a connection refused error.
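One reason TCP support matters for a DNS responder: per RFC 1035, DNS messages carried over TCP are framed with a two-byte big-endian length prefix, unlike UDP. This can be sketched with the standard library (the query bytes are a minimal invented stub; no real query is sent):

```python
# Illustrative sketch of DNS-over-TCP framing (RFC 1035 section 4.2.2):
# each message is prefixed with its length as a two-byte field.
import struct

def frame_dns_tcp(message: bytes) -> bytes:
    """Prepend the 2-byte big-endian length required for DNS over TCP."""
    return struct.pack("!H", len(message)) + message

# Minimal 12-byte DNS header stub (ID, flags, counts) for illustration.
query = b"\x12\x34" + b"\x01\x00" + b"\x00\x01" + b"\x00\x00" * 3
framed = frame_dns_tcp(query)
print(framed[:2].hex())          # 000c: the 12-byte length prefix
print(len(framed) - len(query))  # 2
```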
Security
New and changed in OneFS 7.2.0.3
ID: 136994
The version of Apache that is installed on the cluster was updated to version 2.2.29. For more information, see ESA-2015-093 on the EMC Online Support site.
SMB
New and changed in OneFS 7.2.0.3
ID: 149826, 149776
The default logging level for the srvsvc process was changed from WARNING to INFO. The user name and domain name for the user performing an action is logged in the /var/log/srvsvcd.log file, in addition to the SID. An example of the new logging output appears below, where <USER SID info> is the name and SID of the user and SMB_SHARE_NAME is the name of the share:
Log level changed to INFO
DOMAIN_NAME\USER_NAME <USER SID info> set info on share SMB_SHARE_NAME
DOMAIN_NAME\USER_NAME <USER SID info> deleted share SMB_SHARE_NAME

ID: 149777
Note
Support for the SMB 2 QFid SMB CREATE Request value allows a file opened from an SMB share to be temporarily cached on an SMB 2 client, reducing some network traffic associated with opening and closing the file.
ID: 142083
The MCP virus_scan parameter was added to the isi_rpc_d configuration file.

Authentication
ID: 138759
The number and type of actions that are logged when a machine password change triggers a configuration update were increased. Beginning in OneFS 7.2.0.2, if a machine password is updated, the following activities are logged:
The time at which an lsass thread starts the machine password update

Cluster configuration
New and changed in OneFS 7.2.0.2
ID: 138729
Logging was added to help identify issues that are caused by applying restrictive permissions to the /usr/share/zoneinfo directory or its subdirectories.
File system
New and changed in OneFS 7.2.0.2
ID: 141959
New control: The --reserved option was added to the isi get command, and the isi get command was modified so that it runs only on specific, reserved logical inodes (LINs) when the command is run with both the --reserved option and the -L option.

ID: 139667
Logging similar to the following was added to the /var/log/messages file if the NVRAM journal cannot be read:
Bad type: 0
ID: 138733
Logging was added to improve diagnosis of issues that can occur if a necessary OneFS python file fails to load. If this condition is encountered, a message similar to the following appears in the /var/log/messages file, where <python_file> is the name of the python file that failed to load:

Note
In addition to the messages described above, if you run the isi stat command or the isi events list -w command, a bad marshal error appears on the console. If you encounter the issue that this new logging is intended to help diagnose, contact EMC Isilon Technical Support for assistance. For more information about this issue, see article 197403 on the EMC Online Support site.
HDFS
New and changed in OneFS 7.2.0.2
ID: 145759, 142558, 140040
Support for the getEZForPath and checkAccess HDFS RPC calls was added.

Note
In previous versions of OneFS, if an HDFS client sent a request to the HDFS server that contained one of these RPC calls, the call failed, and messages similar to the following were returned to the client:
org.apache.hadoop.ipc.RemoteException (org.apache.hadoop.ipc.RpcNoSuchMethodException): Unknown rpc: getEZForPath
org.apache.hadoop.ipc.RemoteException (org.apache.hadoop.ipc.RpcNoSuchMethodException): Unknown rpc: checkAccess

ID: 140051
Security
New and changed in OneFS 7.2.0.2
ID: 143337
The version of GNU bash installed on the cluster was updated to version 4.1.17. For more information, see ESA-2014-146 on the EMC Online Support site.

ID: 140931, 137884, 137111
User input that is passed to a command line is now escaped using quotation marks. For more information, see ESA-2015-112 on the EMC Online Support site.

ID: 134439
Adds a setting to the OneFS registry that enables you to configure the maximum amount of memory that can be allocated to the lsass process.

Note
Without this setting, the maximum amount of memory that can be allocated to the lsass process is set to a default of 512 MB. If the system approaches that limit, LDAP connections are closed, and lines similar to the following appear in the lsassd.log file:
Error code
Retrying.
Error code
Retrying.
Error code
Retrying.
Work with EMC Isilon Support to determine whether you need to configure the amount of memory allocated to the lsass process. The memory limit must be at least 512 MB, and no more than 1024 MB. If the memory limit is set outside that range, the system will restore the default value of 512 MB.
For more information, see article 195564 on the EMC Online Support site.
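The clamping rule described above (valid range 512 MB to 1024 MB, with out-of-range values restored to the 512 MB default) can be sketched as follows; the function name is invented and this is an illustration of the rule, not OneFS code:

```python
# Sketch of the lsass memory-limit rule described above.
DEFAULT_MB, MIN_MB, MAX_MB = 512, 512, 1024

def effective_lsass_limit(configured_mb: int) -> int:
    """Return the limit that would actually apply, per the text above."""
    if MIN_MB <= configured_mb <= MAX_MB:
        return configured_mb
    return DEFAULT_MB  # out-of-range values restore the default

print(effective_lsass_limit(768))   # 768
print(effective_lsass_limit(2048))  # 512
```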
Cluster configuration
New and changed in OneFS 7.2.0.1
ID: 135492
Updates the time zone database that OneFS relies on when you configure the cluster time zone to Time Zone Data v. 2014h. This database is made available by the Internet Assigned Numbers Authority (IANA).
Diagnostic tools
New and changed in OneFS 7.2.0.1
ID: 135226
New control: The following options were added to the isi_gather_info command:

Note
dump refers to files that are logged when the node stops responding, and core refers to files that are logged when the node unexpectedly restarts.
File transfer
New and changed in OneFS 7.2.0.1
ID: 134432
The throughput calculation performed by the vsftpd process was improved so that the total throughput perceived by FTP clients is more precisely controlled by configuring the local_max_rate option in the /etc/mcp/templates/vsftpd.conf file.

Note
Prior to implementing this fix, after configuring the local_max_rate option, the total throughput perceived by FTP clients was lower than expected.
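As an illustration only (the value below is hypothetical; in vsftpd, local_max_rate is expressed in bytes per second), a per-client cap of roughly 1 MB/s would look like this in the template:

```
# Example fragment for /etc/mcp/templates/vsftpd.conf (illustrative value):
# cap each local client's transfer rate at about 1 MB/s
local_max_rate=1048576
```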
HDFS
New and changed in OneFS 7.2.0.1
ID: 138484
ID: 133358
Security
New and changed in OneFS 7.2.0.1
ID: 137904, 137895
The versions of the Network Time Protocol daemon (NTPD) and Apache were updated as follows:
The version of Apache that is installed on the cluster was updated from 2.2.21 to 2.2.25.
The version of NTPD that is installed on the cluster was updated from 4.2.4p4 to 4.2.6p5.

ID: 134760
The version of ConnectEMC installed on the cluster was updated from version 3.2.0.4 to 3.2.0.6. This upgrade changes the behavior of the ConnectEMC component so that it no longer uses an internal version of OpenSSL and instead relies on the version of OpenSSL installed on the Isilon cluster. For more information, see ESA-2015-038 on the EMC Online Support site.
SmartLock
New and changed in OneFS 7.2.0.1
ID: 133285
Adds commands to the sudoers file, which defines the commands that a user with sudo privileges is permitted to run. These additional commands enable EMC Isilon Technical Support staff to troubleshoot clusters that are in compliance mode.
SmartQuotas
New and changed in OneFS 7.2.0.1
ID: 131283
New control: The efs.quota.allow_remote_root sysctl parameter was added to allow a root user who is connected to the cluster remotely to make changes to files and directories within a SmartQuota domain, even if those changes would exceed or further exceed the quota domain's hard threshold.
For more information about sysctls, see article 89232 on the EMC Online Support site.
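The behavior this sysctl toggles can be modeled as a small decision rule; the function and parameter names are invented for illustration, and this is not the OneFS quota enforcement code:

```python
# Hypothetical model of the behavior described above: with
# efs.quota.allow_remote_root enabled, a remotely connected root user may
# exceed a SmartQuotas hard threshold; otherwise the write is denied.
def write_allowed(bytes_after_write: int, hard_threshold: int,
                  is_remote_root: bool, allow_remote_root: bool) -> bool:
    """Decide whether a write that may cross the hard threshold proceeds."""
    if bytes_after_write <= hard_threshold:
        return True                      # within quota: always allowed
    return is_remote_root and allow_remote_root

print(write_allowed(2_000_000, 1_000_000, True, True))   # True
print(write_allowed(2_000_000, 1_000_000, True, False))  # False
```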
SMB
New and changed in OneFS 7.2.0.1
ID: 136296
New control:
1. Open an SSH connection on any node in the cluster and log on using the root account.
2. Run the following command, where <max_buffer> is the desired maximum buffer size:
isi_gconfig registry.Services.lwio.Parameters.Drivers.srv.MaxBufferSizeSMB1=<max_buffer>

Note
For optimal interoperability with Kazeon, the maximum buffer size should be set to 16644.

ID: 134448
CHAPTER 4
New hardware and firmware support
The following sections list new support for hardware and firmware revisions that was
added in the specified OneFS releases.
Model Number            Drive Type  Compatible Nodes                          Firmware
                                    IQ108NL, NL400, S200, X200, X400          Ver7.02k or Ver7.02w

Model Number            Drive Type  Compatible Nodes                          Firmware
HGST HUSMM1680ASS205    SED SSD     HD400, NL400, X200, X400, X410            D252
HGST HUA723030ALA640    HDD         X200, X400, NL400, IQ 108NL, IQ 108000X   MKAOA580

Model Number            Drive Type  Compatible Nodes                          Firmware
HGST HUSMM1680ASS205    SED SSD     X200, X400, NL400, X410, HD400            D252

Hardware

Model Number            Drive Type  Compatible Nodes                          Firmware
HGST HUSMM1616ASS200    SSD         X410, S210, HD400                         S8FM08.0
HGST HUSMM1680ASS200    SSD
HGST HUSMM1640ASS200    SSD
HGST HUSMM1620ASS200    SSD
                        HDD         HD400, NL400                              1EZ
                        SATA        X200, X400, NL400, IQ72000X, IQ72NL       MFAOABW0
                        SATA        X200, X400, NL400, IQ108NL                MFAOABW0
                        SATA        X400, NL400                               MFAOABW0
CHAPTER 5
Resolved issues
Antivirus
Antivirus issues resolved in OneFS 7.2.0.4
ID: 153117
The OneFS web administration interface did not list any files in the Detected Threats section of the Antivirus > Reports page if any ASCII special characters (for example, an ampersand (&)) were in the path name of any infected file.

ID: 144726
The OneFS antivirus client could not connect to some ICAP servers if the ICAP URL that you configured on the cluster was not in the following format:
icap://<hostname>:<port>/avscan
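A configured URL can be checked against that required shape with the standard library; the host name and port below are examples, and this validator is an illustration, not the OneFS client's actual parsing code:

```python
# Hedged sketch of validating an ICAP URL against the required format,
# icap://<hostname>:<port>/avscan.
from urllib.parse import urlparse

def is_valid_icap_url(url: str) -> bool:
    """Check scheme, host, explicit port, and the /avscan path."""
    parts = urlparse(url)
    return (parts.scheme == "icap"
            and bool(parts.hostname)
            and parts.port is not None
            and parts.path == "/avscan")

print(is_valid_icap_url("icap://av1.example.com:1344/avscan"))  # True
print(is_valid_icap_url("icap://av1.example.com/avscan"))       # False: no port
```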
Authentication
Authentication issues resolved in OneFS 7.2.0.4
ID: 155570
A local user who did not have root privileges could not change their password by running the UNIX passwd command. As a result, if an affected user's password expired, they were unable to log on to the cluster until the password was reset through another method.

ID: 154257
If an SMB client sent a request to apply an invalid security identifier (SID) to a file or directory on the cluster, the cluster returned a STATUS_IO_TIMEOUT response. Depending on the application that was used to send the request, a message similar to the following might have appeared on the client:
The specified network name is no longer available

ID: 150915
If the cluster was not joined to a Microsoft Active Directory (AD) domain, and you attempted to change the access control list (ACL) of a file on the cluster from a Windows client, the operation failed, and a message similar to the following appeared on the client:

Under these conditions, ACLs could only be modified through the OneFS command-line interface.
ID: 157106
While synchronizing data between source and target clusters in compliance mode, if the file flags applied to a file on the source cluster differed from the file flags assigned to the file on the target cluster, SyncIQ attempted to update the file attributes of WORM committed files on the target cluster even if the retention date for those files had not yet passed. As a result, the synchronization failed. If this issue occurred, lines similar to the following appeared in the /var/log/messages file:
bam_ads_setmode error: 30Local error : syncattr error for <path_to_WORM_file>: chfal
ID: 155965
During an initial SyncIQ data replication, access control lists (ACLs) applied to symbolic links, pipes, block devices, and character devices were not replicated from the SyncIQ source cluster to the SyncIQ target cluster. As a result, following an initial synchronization, applications and users were prevented from accessing these file system objects and were also prevented from accessing files and directories on the cluster through symbolic links.

ID: 155782
When performing an NDMP restore, OneFS verifies the end of the data stream by detecting two consecutive blocks of zeroes. In rare cases, in OneFS 7.2.0.0 through 7.2.0.3, if the second block of zeroes was stored in a different buffer than the first block of zeroes, OneFS did not read the second block of zeroes from the other buffer, and instead read the data that followed the first block of zeroes. If this occurred, the restore operation was immediately stopped, and data that was in the process of being restored might have been incompletely restored.
Note
This issue did not occur if the RESTORE_OPTIONS NDMP environment variable was set to 1, specifying that a single-threaded restore operation be performed.

ID: 154830
Stack: -------------------------------------------------
/lib/libc.so.7:__sys_kill+0xc
/usr/lib/libisi_util.so.1:isi_assert_halt+0xa0
/usr/lib/libisi_migrate_private.so.2:get_lmap_name+0x54
/usr/bin/isi_migr_sworker:work_init_callback+0xacd
/usr/bin/isi_migr_sworker:old_work_init4_callback+0x16f
/usr/lib/libisi_migrate_private.so.2:generic_msg_unpack+0x8bc
/usr/lib/libisi_migrate_private.so.2:migr_process+0x2f1
/usr/bin/isi_migr_sworker:main+0xafa
/usr/bin/isi_migr_sworker:_start+0x8c
-------------------------------------------------
/boot/kernel.amd64/kernel: pid 24302 (isi_migr_sworker), uid 0: exited on signal 6 (core dumped)

Note
Starting in OneFS 7.2.0.4, the following message will appear in the /var/log/isi_migrate.log file:
Source version unsupported. 'sync_id' must contain a valid policy id.
ID: 154326
ID: 154311
policies stopped running, and the following message appeared in the /var/log/isi_migrate.log file:
Cannot allocate memory

ID: 154250
ID: 154248
Although multiple IPv4 and/or IPv6 addresses were defined, NDMP listened to only one IPv4 and/or one IPv6 address. For example:
If a node had multiple IPv4 addresses defined, NDMP listened to only one IPv4 address.
If a node had multiple IPv6 addresses defined, NDMP listened to only one IPv6 address.
If a node had both IPv4 addresses and IPv6 addresses defined, NDMP listened to only one IPv4 address and only one IPv6 address.

ID: 154246
During a snapshot-based incremental backup, a Write Once Read Many (WORM) file might have been backed up as a regular file. If this occurred, and the files were restored, the files were restored as regular files, and they could have been modified after they were restored.
ID: 154244
If the isi_ndmp_d process was stopped, the NDMP process ID file was still locked by one or more NDMP child processes. As a result, the mcp process could not restart the isi_ndmp_d process, and no new NDMP connections could be established. If this occurred, a "Failed to spawn NDMP daemon" message appeared in the /var/log/isi_ndmp_d.log file.

ID: 154211
If you queried for the date on which a SyncIQ policy would next be run by using the next_run OneFS API property, the date and time that was returned was incorrect.
This was not an issue if the tape drive was used only by Isilon backup accelerators.

ID: 153446
While a SyncIQ policy was running, if a SyncIQ primary worker (pworker) process on the source cluster sent a list of directories to delete to a secondary worker (sworker) on the target cluster, and then the pworker process unexpectedly stopped, the pworker's work range was transferred to another pworker. The other pworker then sent the list of directories to another sworker. This action resulted in two sworker processes on the target cluster trying to delete the same directory at the same time. If this issue occurred, the SyncIQ job stopped, and lines similar to the following appeared in the /var/log/messages file:
/boot/kernel.amd64/kernel: [kern_sig.c:3349](pid 70="isi_migr_sworker")(tid=2) Stack trace:
/boot/kernel.amd64/kernel: Stack: -------------------------------------------------
/boot/kernel.amd64/kernel: /usr/bin/isi_migr_sworker:move_dirents+0x1b6
/boot/kernel.amd64/kernel: /usr/bin/isi_migr_sworker:delete_lin+0x279
/boot/kernel.amd64/kernel: /usr/bin/isi_migr_sworker:delete_lin_callback+0x143
/boot/kernel.amd64/kernel: /usr/lib/libisi_migrate_private.so.2:generic_msg_unpack+0x8bc
/boot/kernel.amd64/kernel: /usr/lib/libisi_migrate_private.so.2:migr_process+0x2f1
/boot/kernel.amd64/kernel: /usr/bin/isi_migr_sworker:main+0xa18
/boot/kernel.amd64/kernel: /usr/bin/isi_migr_sworker:_start+0x8c
/boot/kernel.amd64/kernel: -------------------------------------------------
/boot/kernel.amd64/kernel: pid 70 (isi_migr_sworker), uid 0: exited on signal 10 (core dumped)
If a SyncIQ policy designated a target directory that was nested within the SyncIQ
target directory of a pre-existing policy, an error occurred during SyncIQ protection
domain creation, which caused the SyncIQ policy's protection domain to be
incomplete. If this occurred, the following message appeared in
the /var/log/isi_migrate.log file:
153444
In addition, if you ran the isi domain list -lw command, the Type field for
the affected SyncIQ target was marked Incomplete.
If you ran a full SyncIQ data replication to a target directory that contained a large
number of files that no longer existed in the source directory, it was possible for the
process that removes extra files from a target directory to conflict with the process
that created the domain for the target directory. If this occurred, the SyncIQ job
failed and had to be restarted.
153437
If the --skip_bb_hash option of an initial SyncIQ policy was set to no (the
default setting), and if a SyncIQ file split work item was split between pworkers, the
pworker that was handling the file split work item might have attempted to transfer
data that had already been transferred to the target cluster. If this occurred, the
isi_migr_pworker process repeatedly restarted and the SyncIQ policy failed. In
addition, the following lines appeared in the /var/log/messages file:
153377
isi_migrate[45328]: isi_migr_pworker: *** FAILED ASSERTION cur_len != 0 @ /usr/src/isilon/bin/isi_migrate/pworker/handle_dir.c:463:
/boot/kernel.amd64/kernel: [kern_sig.c:3376](pid 45328="isi_migr_pworker")(tid=100957) Stack trace:
/boot/kernel.amd64/kernel: Stack: --------------------------------------------------
/boot/kernel.amd64/kernel: /lib/libc.so.7:__sys_kill+0xc
/boot/kernel.amd64/kernel: /usr/lib/libisi_util.so.1:isi_assert_halt+0xa0
/boot/kernel.amd64/kernel: /usr/bin/isi_migr_pworker:migr_continue_file+0x1507
/boot/kernel.amd64/kernel: /usr/bin/isi_migr_pworker:migr_continue_generic_file+0x9a
/boot/kernel.amd64/kernel: /usr/bin/isi_migr_pworker:migr_continue_work+0x70
/boot/kernel.amd64/kernel: /usr/lib/libisi_migrate_private.so.2:migr_process+0xf
/boot/kernel.amd64/kernel: /usr/bin/isi_migr_pworker:main+0x606
/boot/kernel.amd64/kernel: /usr/bin/isi_migr_pworker:_start+0x8c
/boot/kernel.amd64/kernel: --------------------------------------------------
/boot/kernel.amd64/kernel: pid 45328 (isi_migr_pworker), uid 0: exited on signal 6 (core dumped)
If a SyncIQ job was interrupted during the change compute deletion phase
(STF_PHASE_CC_DIR_DEL), the Logical Inodes (LINs) could have been incorrectly
removed from the SyncIQ job work list. If this occurred, the SyncIQ job failed, and
messages similar to the following appeared in
the /var/log/isi_migrate.log file:
150613
If you viewed the details of a snapshot alias in the OneFS web administration
interface, the Most Recent Snapshot Name was always No value, and the
Most Recent Snapshot ID was always 0.
145938
If you started a restartable backup with a user snapshot, after the backup was
completed and the BRE context was removed, the expiration time of the snapshot
was changed. As a result, the snapshot might have been deleted prematurely.
144427
Cluster configuration
Cluster configuration issues resolved in OneFS 7.2.0.4
ID
154322
As a result, only one node could be excluded from contacting an external NTP
server.
Diagnostic tools
Diagnostic tools issues resolved in OneFS 7.2.0.4
ID
Because the following ESRS log files were not listed in the newsyslog.conf file
(a configuration file that manages log file rotation), over time the files could have
grown in size and could have filled the /var partition:
/var/log/GWExt.log
/var/log/GWExtHTTPS.log
154107
Note
If the /var partition on a node in the cluster is 90% full, OneFS logs an event
warning that a full /var partition can lead to system stability issues. Depending on
how the cluster is configured, an alert might also be issued for this event.
When EMC Secure Remote Services (ESRS) was configured on the cluster, the ESRS
process automatically selected the first available IP address, rather than selecting
an IP address from an IP address pool in the System access zone. Because only the
System access zone allows SSH access for remote management, if the selected IP
address was not in the System access zone, EMC Isilon Support could not monitor
the cluster remotely.
153455
Auditing
Auditing issues resolved in OneFS 7.2.0.4
ID
Because isi_rest_server, a component of the Platform API, did not check for the
correct error codes when interacting with the OneFS auditing system's queue
producer library (QPL), if configuration auditing was enabled and there was an error
in the QPL, the error was not handled correctly. If this issue occurred, it might have
prevented system configuration changes from being audited.
156400
If auditing is enabled, the audit filter waits for a response from the queue producer
library (QPL) before sending audit events to the auditing process (isi_audit_d).
In OneFS 7.2.0.0 through 7.2.0.3, if the QPL became disconnected from the
auditing process, isi_audit_d, while the auditing process was waiting for a
response, the QPL failed to send a response to the auditing process. If this
occurred, auditing events continued to collect in the auditing process until the
queue became full. If the auditing process queue became full, processes related to
events that were being audited (for example, processes related to file system
protocols and configuration changes) might have stopped working. Depending on
which related processes were affected, various cluster operations could have been
disrupted by this issue; for example, if configuration auditing was enabled, you
might have been prevented from making configuration changes through the OneFS
web administration interface.
156398
Under some circumstances, multiple isi_papi_d process threads might have called
the same code at the same time. If this occurred, the isi_papi_d process might
have unexpectedly restarted.
154324
If file system auditing was enabled and you configured the system to audit events
in which a user renamed a file, if the user renamed the file from a Mac client
connected to the cluster through a virtual private network (VPN), the complete path
to the file was not always captured in the audit log. If this occurred, applications
that relied on the file paths in the audit logs might have been adversely affected.
Beginning in OneFS 7.2.0.4, if a user attempts to rename a file and the complete
file path to the renamed file is not captured in the audit log, the file is not renamed
and an error appears in the audit log.
153463
Only the root user was permitted to run the isi_audit_viewer command. This
limitation prevented other users, including users with sudo privileges, from
viewing configuration audit logs and protocol audit logs on the cluster.
153439
If you enabled auditing on the cluster, only nodes that had the primary external
interface (em0) configured could communicate with the Common Event Enabler
(CEE) server, even if a secondary interface, such as em1, was configured and active
on the node. As a result, the audit logs from these nodes were not collected on the
CEE server.
153432
If you configured OneFS to send syslog messages to a remote syslog server, the
HOSTNAME of the cluster was not included in the messages. The absence of the
HOSTNAME entry made it difficult to distinguish messages sent from multiple
clusters to the same syslog server.
153417
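The missing field matters because of how BSD-syslog lines are laid out: HOSTNAME is the third element, between the timestamp and the process tag, and it is what lets a shared syslog server attribute a line to a particular cluster. A sketch of the expected layout (the hostname and tag below are placeholders, not actual OneFS output):

```python
import time

def rfc3164_message(pri: int, hostname: str, tag: str, msg: str, now: float) -> str:
    """Format a BSD-syslog (RFC 3164) line. The HOSTNAME field (third
    element) is what the affected OneFS versions omitted.
    Note: RFC 3164 space-pads the day of month; %d zero-pads, which is
    close enough for a sketch."""
    timestamp = time.strftime("%b %d %H:%M:%S", time.localtime(now))
    return f"<{pri}>{timestamp} {hostname} {tag}: {msg}"

print(rfc3164_message(13, "cluster-1", "isi_celog", "test event", 0.0))
```

Without the hostname token, two clusters logging to the same server produce lines that differ only in content, which is the ambiguity this fix removes.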
Because the OneFS auditing system did not correctly convert a POSIX path with
multiple path separators (/) into a Microsoft UNC path, if NFS protocol auditing was
enabled, incorrect paths could have been recorded in the audit log, and
applications that relied on the information in the audit log might have been
adversely affected.
150920
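The conversion in question amounts to collapsing repeated POSIX separators before mapping the path into UNC form. A minimal illustration of that rule (not the OneFS implementation; the server and share names are placeholders):

```python
import re

def posix_to_unc(posix_path: str, server: str, share: str) -> str:
    """Convert a POSIX path to a UNC-style path, first collapsing
    repeated '/' separators (the step the bug skipped)."""
    collapsed = re.sub(r"/+", "/", posix_path)   # '/data//dir' -> '/data/dir'
    relative = collapsed.lstrip("/").replace("/", "\\")
    return f"\\\\{server}\\{share}\\{relative}"

print(posix_to_unc("/data//dir///file.txt", "cluster", "share"))
# \\cluster\share\data\dir\file.txt
```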
If file system protocol auditing was enabled and a client opened a parent directory
and then opened a subdirectory or file within the parent directory, the auditing
system might have incorrectly appended the subdirectory or file path to the parent
directory path. If this occurred, the incorrect path might have caused an error in the
auditing process, and file system protocol events that were in the process of being
logged might not have been captured. If the incorrect path was logged,
applications that relied on file paths in the audit log might have been adversely
affected.
150918
File system
File system issues resolved in OneFS 7.2.0.4
ID
If a node ran for more than 497 days without being rebooted, an issue that affected
the OneFS journal buffer sometimes disrupted the drive sync operation. If this issue
occurred, OneFS reported that the journal was full and, as a result, resources that
were waiting for a response from the journal entered a deadlock state. Any cluster
that contains a node that has run for more than 497 consecutive days with no
downtime might unexpectedly reboot as a result of this issue.
For more information, see ETA 202452 on the EMC Online Support site.
158417
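The 497-day figure is characteristic of a 32-bit tick counter incrementing at 100 Hz wrapping around; the counter width and tick rate here are an assumption for illustration, not something these notes state:

```python
# A 32-bit counter incremented 100 times per second wraps after:
ticks = 2**32            # counter capacity (assumed 32-bit)
hz = 100                 # assumed tick rate, ticks per second
seconds = ticks / hz
days = seconds / 86400   # 86400 seconds per day
print(round(days, 1))    # 497.1
```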
If a node ran for eight months or longer without a reboot and the node's internal
clock rolled over, the universal memory allocator (UMA) processed an invalid value,
which prevented the UMA from reclaiming any of the memory it had allocated. If
this issue occurred, the affected node might have run out of memory, causing the
node to unexpectedly reboot.
157489
On a compliance mode cluster, if either the retention period or the DOS Read
Only flag that was applied to a file on a SyncIQ source cluster was changed after
the initial synchronization, subsequent incremental SyncIQ jobs failed, and
messages similar to the following appeared in the /var/log/messages file,
where <path> was the path to the file on the target cluster:
156270
If OneFS was not mounted on a node and you ran the isi_flush --l3-full
command on that node, the node restarted unexpectedly and messages similar to
the following appeared in the /var/log/messages file:
154264
Stack: --------------------------------------------------
kernel:trap_fatal+0x9f
kernel:trap_pfault+0x386
kernel:trap+0x303
efs.ko:mgmt_finish_super+0x4e
efs.ko:l3_mgmt_nuke+0x70
efs.ko:sysctl_l3_nuke+0xcb
kernel:sysctl_root+0x132
kernel:userland_sysctl+0x18f
kernel:__sysctl+0xa9
kernel:isi_syscall+0x39
kernel:syscall+0x28b
--------------------------------------------------
If you attempted to smartfail multiple nodes that were holding user locks, the lock
was held by LK client entries but not present in lock failover (LKF) entries. As a
result of this inconsistency, future lock attempts failed, and a manual release of the
lock was required to grant the desired access.
153436
If you exceeded the number of recommended snapshots on a cluster, nodes in the
cluster might have rebooted unexpectedly. If this issue occurred, lines similar to
the following appeared in the /var/log/messages file:
152660
/boot/kernel.amd64/kernel: Stack: --------------------------------------------------
/boot/kernel.amd64/kernel: kernel:isi_assert_halt+0x42
/boot/kernel.amd64/kernel: efs.ko:pset_resize+0x107
/boot/kernel.amd64/kernel: efs.ko:pset_add+0x50
/boot/kernel.amd64/kernel: efs.ko:bam_data_lock_get_impl+0x1c8
/boot/kernel.amd64/kernel: efs.ko:bam_data_lock_get+0x2b
/boot/kernel.amd64/kernel: efs.ko:ifm_read_op_init+0xa8
/boot/kernel.amd64/kernel: efs.ko:bam_mark_file_data+0xfd
/boot/kernel.amd64/kernel: efs.ko:ifs_mark_file_data+0x373
/boot/kernel.amd64/kernel: efs.ko:_sys_ifs_mark_file_data+0x166
/boot/kernel.amd64/kernel: kernel:isi_syscall+0x53
/boot/kernel.amd64/kernel: kernel:syscall+0x1db
/boot/kernel.amd64/kernel: -------------------------
If you ran a SmartPools job on a file with an alternate data stream (ADS), the job
sometimes failed, and continued to fail, even if the job was manually started. If the
SmartPools job failed for this reason, the SmartPools process eventually stopped
running scheduled jobs, and this might have caused node pools to become full,
degrading cluster performance. If this occurred, the SmartPools job reported an
error similar to the following in the job history report:
151619
In some environments where there was a heavy workload on the cluster, a node
could run out of reserved kernel threads. This condition could have caused the
node to restart unexpectedly. If this issue occurred, client connectivity to that node
was interrupted, and lines similar to the following appeared in
the /var/log/messages file:
panic @ time 1422835686.820, thread 0xffffff0248243000: ktp: No reserved threads left
cpuid = 6
Panic occurred in module efs.ko loaded at 0xffffff87b7c84000:
Stack: --------------------------------------------------
efs.ko:ktp_assign_reserve+0x29f
efs.ko:dfq_reassign_cb+0x9b
kernel:_sx_xlock_hard+0x276
kernel:_sx_xlock+0x4f
efs.ko:lki_unlock_impl+0x306
efs.ko:lk_unlock+0xbe
efs.ko:bam_put_delete_lock_by_lin+0x36
efs.ko:_bam_free_free_store+0x34
efs.ko:dfq_service_thread+0x139
efs.ko:kt_main+0x83
kernel:fork_exit+0x7f
143399
Hardware
Hardware issues resolved in OneFS 7.2.0.4
ID
In rare cases, a failing dual in-line memory module (DIMM) caused a burst of
correctable error correcting code (ECC) errors. If this burst of errors was extreme
(for example, if it occurred tens of thousands of times per hour), the performance of
the node and the cluster might have been degraded. If this issue occurred, a
message similar to the following appeared tens of thousands of times per hour in
the /var/log/messages file and on the console:
156345
154596
This issue occurred because earlier versions of OneFS, and earlier versions of the
firmware package did not recognize PSU part number 071-000-022-00.
Note
If a node with a LOX NVRAM card was unable to communicate with the NVRAM card
because the NVRAM card controller was unexpectedly reset, the cluster became
unresponsive to all client requests and data on the cluster was unavailable until
the affected node was rebooted.
153693
Note
Beginning in OneFS 7.2.0.4, if this issue is encountered, the affected node will be
rebooted automatically to prevent the cluster from becoming unresponsive.
HDFS
HDFS issues resolved in OneFS 7.2.0.4
ID
156921
http://isilon_ip:8082/webhdfs/v1/?op=GetFileStatus&user.name=root
If multiple threads attempted to simultaneously update the stored list of blocked IP
addresses, the HDFS service restarted and client sessions were disconnected. The
service was automatically restored after a few seconds.
156306
Because the WebHDFS CREATE operation does not explicitly instruct the system to
create parent directories, if OneFS received a WebHDFS request to create a file or
directory within a parent directory that did not yet exist, the request failed.
Beginning in OneFS 7.2.0.4, OneFS will automatically create parent directories if it
receives a WebHDFS create request that requires them.
154404
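For reference, a WebHDFS CREATE request is an HTTP PUT of the shape sketched below, and a client targeting a pre-7.2.0.4 cluster had to create missing ancestor directories itself (with op=MKDIRS) before the CREATE could succeed. The host, port, and paths are placeholders:

```python
from urllib.parse import urlencode

def webhdfs_create_url(host: str, port: int, path: str, user: str) -> str:
    """Build the URL for a WebHDFS CREATE request (sent as HTTP PUT).
    Note that op=CREATE carries no 'create parents' instruction;
    OneFS 7.2.0.4 now creates missing parents server-side."""
    query = urlencode({"op": "CREATE", "user.name": user})
    return f"http://{host}:{port}/webhdfs/v1{path}?{query}"

def missing_parent_dirs(path: str):
    """Ancestor directories a pre-7.2.0.4 client would have to create
    explicitly (op=MKDIRS) before the CREATE could succeed."""
    parts = path.strip("/").split("/")[:-1]
    return ["/" + "/".join(parts[: i + 1]) for i in range(len(parts))]

print(webhdfs_create_url("isilon_ip", 8082, "/tmp/a/b/file.txt", "root"))
print(missing_parent_dirs("/tmp/a/b/file.txt"))  # ['/tmp', '/tmp/a', '/tmp/a/b']
```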
Migration
Migration issues resolved in OneFS 7.2.0.4
ID
154335
Note
You might still encounter this issue if you restart an isi_vol_copy migration of a
single, large file three or more times.
Networking
Networking issues resolved in OneFS 7.2.0.4
ID
154455
and down did not resolve the issue and the node had to be rebooted to reestablish
the link.
In some cases, the Mellanox InfiniBand driver waited for a hardware status register
to be cleared, which caused the driver to enter a read and retry loop. If the retry
loop timed out, the driver attempted to print out a significant amount of system
data three times. Since printing the system data output was enabled by default,
and because there was a significant amount of data to be processed, the driver
eventually triggered several Software Watchdog time outs. After five of these time
outs, the software watchdog rebooted the affected node and the following lines
appeared in the /var/log/messages file:
153425
Note
Beginning in OneFS 7.2.0.4, the system data is not printed by default, allowing the
read and retry loop to complete more quickly and minimizing the chance of
software watchdog time out events.
If Source Based Routing (SBR) was enabled on the cluster, client connections that
were handled by SBR were disconnected if the MAC address (ARP entry) for the
relevant subnet gateway expired. This issue occurred because nodes in the cluster
did not send an ARP request to refresh the MAC address and, as a result, attempted
to send network traffic to an incorrect destination MAC address for the gateway.
150647
Note
NFS
NFS issues resolved in OneFS 7.2.0.4
ID
If an NFS operation failed because the NFSv3 client that attempted to perform the
operation did not have adequate access permissions, and then the same NFSv3
client sent a request for file system information, the NFS server unexpectedly
restarted and an error message similar to the following was logged in
the /var/log/nfs.log file:
156109
If all of the following conditions were met, users connected to an NFS export
received Permission denied errors when they attempted to access file system
objects to which they should have had access:
l The --map-lookup-uid option was enabled (set to yes) for the affected
NFS export.
l The group owner of the affected file system object was one of the user's
supplemental groups rather than the user's primary group.
154927
This issue occurred because, when the lookup for the user's UID failed, OneFS did
not correctly apply supplemental group permissions to the user. As a result, the
user was denied access to the file system object.
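The corrected behavior reduces to a standard rule: group-class permission bits apply if the file's group owner matches any of the user's groups, primary or supplemental. A minimal sketch of that check (the mode bits and group IDs are illustrative, not OneFS internals):

```python
def group_access_allowed(file_gid: int, group_mode_bits: int,
                         primary_gid: int, supplemental_gids: list,
                         wanted_bits: int) -> bool:
    """Grant group-class access if the file's group owner matches ANY
    of the user's groups, primary or supplemental (the buggy path only
    consulted the primary group when the UID lookup failed)."""
    if file_gid == primary_gid or file_gid in supplemental_gids:
        return (group_mode_bits & wanted_bits) == wanted_bits
    return False

# File owned by group 2001 with r-x (0o5) for the group class; the
# user's primary group is 100, but 2001 is a supplemental group.
print(group_access_allowed(2001, 0o5, 100, [2000, 2001], 0o4))  # True
```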
If an NFSv3 or NFSv4 client attempted to move a subdirectory from one directory to
another within a parent directory to which a directory SmartQuota was applied, the
subdirectory could not be moved and messages similar to the following appeared
on the console:
154910
OR
cannot move `directory_name1' to `directory_name2': Input/output error
If an NFS client attempted to send an NLM asynchronous request to lock a file and
received an error in response to the request, a socket was opened but was not
closed. Over time, it was possible for the maximum number of open sockets to be
reached. If this occurred, processes could not open new sockets on the affected
node. As a result, affected nodes might have been slow to respond to file lock
requests, or lock requests sent to an affected node might have timed out. If lock
requests timed out, NFS clients could have been prevented from accessing files or
applications on the cluster.
153453
If NFSv4 clients mounted NFS exports on the cluster through NFS aliases, it was
possible to encounter a race condition that caused the NFS service to unexpectedly
restart. This issue was more likely to occur when many NFSv4 clients were
simultaneously mounting exports through NFS aliases. If this race condition was
encountered, the NFS service on the affected node unexpectedly restarted, NFS
clients connected to the node might have been disconnected, some NFS clients
might have been prevented from mounting an export, and the following lines
appeared in the /var/log/messages file:
152337, 151697
/lib/libc.so.7:thr_kill+0xc
/usr/likewise/lib/lwio-driver/nfs.so:NfsAssertionFailed+0xa4
/usr/likewise/lib/lwio-driver/nfs.so:Nfs4OpenOwnerAddOpen+0x112
/usr/likewise/lib/lwio-driver/nfs.so:NfsProtoNfs4ProcOpen+0x2567
/usr/likewise/lib/lwio-driver/nfs.so:NfsProtoNfs4ProcCompound+0x5fe
/usr/likewise/lib/lwio-driver/nfs.so:NfsProtoNfs4Dispatch+0x43a
/usr/likewise/lib/lwio-driver/nfs.so:NfsProtoNfs4CallDispatch+0x3e
/usr/likewise/lib/liblwbase.so.0:SparkMain+0xb7
150347
OneFS API
OneFS API issues resolved in OneFS 7.2.0.4
ID
In OneFS, a numeric request ID is included in API client requests that are generated
by a script or application that relies on the isi.rest python module to
communicate with the OneFS API. Because, after generating 1431 request IDs, the
formula that was used to generate the API request ID generated an ID of zero, which
is an invalid value, the next API request failed.
The impact of the failed request depended on how the application or script that
sent the request was designed to handle this type of failure. If the request was
retried, a new request ID was generated and the request succeeded.
157487
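The underlying pattern is generic: a wrapping counter whose formula can land on a reserved value must skip it. An illustrative generator (not the isi.rest source; the modulus is arbitrary):

```python
import itertools

def request_ids(start: int = 1, modulus: int = 2**16):
    """Yield wrapping numeric request IDs, never emitting 0
    (treated here as an invalid/reserved value)."""
    n = start
    while True:
        yield n
        n = (n + 1) % modulus
        if n == 0:          # skip the reserved value instead of emitting it
            n = 1

ids = list(itertools.islice(request_ids(start=2**16 - 2, modulus=2**16), 4))
print(ids)  # [65534, 65535, 1, 2] -- wraps past 0
```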
In the OneFS web administration interface, if the path to the shared directory for an
SMB share was long enough to exceed the width of the SMB shares page, the
shared directory Edit link was sometimes not visible.
144423
Note
The Edit link was accessible if you used the Tab key to move to the link.
SmartQuotas
SmartQuotas issues resolved in OneFS 7.2.0.4
ID
If you edited the usage limits of an existing directory quota in the OneFS web
administration interface, the Show Available Space as: Size of hard
threshold and Size of cluster options were missing from the Set a hard
limit section. This issue occurred if you chose the Size of cluster option
when you created the directory quota with a hard limit.
154331
If a SmartQuota threshold was exceeded and then files were moved or deleted to
correct the issue, an alert was sometimes sent after the issue was corrected, even
though the threshold was no longer exceeded. If this occurred, a false alert similar
to the following was generated, where /ifs/<path> was the path of the directory
that temporarily exceeded the configured threshold:
Your root quota under /ifs/<path> has been exceeded.
Your quota is 12 TB, and 6.7 TB is in use. You must delete files
to bring usage below 12 TB before you can create or modify files.
Please clean up and free some disk space.
149570
SMB
SMB issues resolved in OneFS 7.2.0.4
ID
If an SMB share on the cluster was configured with the Impersonate Guest
security setting set to Always, and if a large number of SMB sessions to the share
were being opened and closed, an extra cred file was opened for each SMB
session. However, when the SMB session ended, the extra cred file was not
correctly closed and, over time, it was possible for the number of open cred files to
reach the maximum number of open files allowed. If this occurred, new SMB
sessions to the affected node could not be established, and messages similar to
the following appeared in the /var/log/lwiod.log file:
157030
155057
Due to a race condition that could occur when multiple SMB 1 sessions were being
opened on the same connection, the lwio process sometimes unexpectedly
restarted. If the process restarted, SMB clients connected to the affected node were
disconnected from the cluster.
154962
If SMB auditing was enabled and you set the --max-cached-messages
parameter to 0 (zero) to disable message caching, the SMB client session and
negotiate requests that were waiting to be audited might have prevented new SMB
session and negotiate requests from being processed. If this occurred, SMB clients
might have been prevented from establishing new connections to the cluster until
the backlog of audit messages was processed.
154271
Note
In addition, if the EMCopy tool attempted to retry the failed operation, the retry
failed and an error similar to the following appeared on the EMCopy client:
ERROR (4392) : \path_to_target\symbolic_link -> Unable to open,
Failed after 1 retries.
153366
Under some circumstances, after an SMB2 client attempted to access a file on the
cluster through a symbolic link, OneFS returned an ESYMLINKSMB2 error (an
internal error that is not seen on the client). If this error was returned, the symbolic
link was resolved; however, some kernel memory that was allocated in order to
complete the process of resolving the symbolic link was not deallocated after the
link was resolved. As a result, over time a node's kernel processes might have run
out of memory to allocate. If this occurred, the affected node rebooted
unexpectedly, and messages similar to the following appeared in
the /var/log/messages file on the affected node:
/boot/kernel.amd64/kernel: Pageout daemon can't find enough free pages.
System running low on memory. Check for memory pigs
152404
149841
Note
Searches with only the *.* string listed the entire contents of the directory, as
expected.
Authentication
Authentication issues resolved in OneFS 7.2.0.3
ID
Due to a file descriptor (FD) leak that occurred when SMB clients listed files and
directories within an SMB share, it was possible for OneFS to eventually run out of
available file descriptors. If this occurred, an ACCESS_DENIED or
STATUS_TOO_MANY_OPENED_FILES response was sent to SMB clients that
attempted to establish a new connection to the cluster or SMB clients that were
connected to the cluster that attempted to view or open files. As a result, new SMB
connections could not be established, and SMB clients that were connected to the
cluster could not view, list, or open files. If this issue occurred, messages similar to
the following appeared on the Dashboard > Event summary page of the OneFS
web administration interface, and in the command-line interface when you ran the
isi events list -w | grep -i descriptor command:
152809
149810
Stack: --------------------------------------------------
/usr/lib/libkrb5.so.3:krb5_copy_principal+0x33
/usr/lib/kt_isi_pstore.so:krb5_pktd_get_next+0xe6
/usr/lib/libkrb5.so.3:krb5_dyn_get_next+0x5e
/usr/lib/libkrb5.so.3:krb5_rd_req_decoded_opt+0x4a4
/usr/lib/libkrb5.so.3:krb5_rd_req_decoded+0x1d
/usr/lib/libkrb5.so.3:krb5_rd_req+0xc1
/usr/lib/libgssapi_krb5.so.2:krb5_gss_accept_sec_context+0x8fd
/usr/lib/libgssapi_krb5.so.2:gss_accept_sec_context+0x22c
/usr/lib/libgssapi_krb5.so.2:spnego_gss_accept_sec_context+0x3d6
/usr/lib/libgssapi_krb5.so.2:gss_accept_sec_context+0x22c
/usr/likewise/lib/lwio-driver/srv.so:SrvGssContinueNegotiate+0x2c5
/usr/likewise/lib/lwio-driver/srv.so:SrvGssNegotiate+0xd3
/usr/likewise/lib/lwio-driver/srv.so:SrvProcessSessionSetup_SMB_V2+0x6c6
/usr/likewise/lib/lwio-driver/srv.so:SrvProtocolExecute_SMB_V2+0x1324
/usr/likewise/lib/lwio-driver/srv.so:SrvProtocolExecuteInternal+0x51b
/usr/likewise/lib/lwio-driver/srv.so:SrvProtocolExecuteWorkItemCallback+0x28
/usr/likewise/lib/liblwbase.so.0:WorkThread+0x1f7
/lib/libthr.so.3:_pthread_getprio+0x15d
--------------------------------------------------
If an LDAP server was configured to handle Virtual List View (VLV) search instead of
paged search, and if LDAP users were listed, a memory leak occurred when
returning more than one page of information. If users were listed a sufficiently large
number of times, the lsass process could run out of memory and restart
unexpectedly. As a result, SMB users could not be authenticated for the several
seconds it took for the lsass process to restart.
149797
Microsoft Active Directory (AD) users in trusted domains were allowed a higher level
of access to EMC Isilon clusters by default if RFC 2307 was enabled on the cluster,
and if Windows Services for UNIX (SFU) was not configured on the trusted domain.
149795
If the lsassd process was not able to resolve user and group IDs, a message was
logged to the /var/log/messages file. In rare and extreme cases, excessive
logging could decrease the wear life of the boot disks on the affected node. If this
occurred, lines similar to the following appeared in the /var/log/messages
file:
149769
If you configured public key SSH authentication on a cluster running OneFS 7.1.1.2
through OneFS 7.1.1.5 or OneFS 7.2.0.1 through OneFS 7.2.0.2, and then you
upgraded to OneFS 7.2.0.x, the root user could no longer log in to the cluster
through SSH without entering their password.
138180
SyncIQ
SyncIQ issues resolved in OneFS 7.2.0.3
ID
Reduces lock contention by changing the lock type used by the SyncIQ coordinator
when reading the siq-policies.gc file from an exclusive lock to a shared lock.
149818
During a SyncIQ job, if the rm command that was run during the cleanup process of
the temporary working directory on the target cluster exited with an error, the
SyncIQ policy went into an infinite loop, and data could not be synced to the
cluster. If this occurred, a message similar to the following appeared in
the /var/log/isi_migrate.log file:
Unable to cleanup tmp working directory, error is
149771
149668
SyncIQ consumed excessive amounts of CPU during the phase when SyncIQ was
listing the contents of snapshot directories. This caused SyncIQ policies to take
longer to complete.
148431
147200
146395
Stack: --------------------------------------------------
/lib/libc.so.7:__sys_kill+0xc
/usr/lib/libisi_util.so.1:isi_assert_halt+0xa0
/usr/lib/libisi_migrate.so.2:siq_job_summary_save_new+0x200
/usr/bin/isi_migr_sched:sched_main_node_work+0xf3f
/usr/bin/isi_migr_sched:main+0xf13
/usr/bin/isi_migr_sched:_start+0x8c
--------------------------------------------------
When a SyncIQ job was running, in certain cases the target sworker did not
acknowledge completing some tasks. Furthermore, if a SyncIQ job was very large, a
source pworker could have accumulated a large number of unacknowledged tasks
and then waited for the target worker to acknowledge work that was already
completed. If this occurred, the SyncIQ job ran indefinitely.
142966
If a directory was renamed to a path that had been excluded from a SyncIQ job, the
SyncIQ state information for the directory and its children remained stored.
However, the directory and its children tree were removed from the SyncIQ target.
Any future changes that were made to the directory or its children were treated as
changes to included paths. If this occurred, a SyncIQ target error similar to the
following appeared in the /var/log/isi_migrate.log file:
141584
If all directories that had been excluded from the SyncIQ job were removed in an
incremental SyncIQ job, that incremental SyncIQ job could have failed while trying
to delete an excluded directory. If this occurred, an error similar to the following
appeared in the /var/log/messages or /var/log/isi_migrate.log files:
FAILED ASSERTION found == true
141176
Note
Beginning in OneFS 7.2.0.3, the protection policy for SyncIQ System B-Trees is set
to the system disk pool default, which enhances SyncIQ performance. If you want
to change the default protection policy for SyncIQ System B-Trees, contact EMC
Isilon Technical Support.
If SyncIQ encountered an issue when processing an alternate data stream for a
directory, an incorrect directory path appeared in the error message that was
logged in the /var/log/isi_migrate.log file.
132233
ID
When preformatted drives were added to a node, the drives were not properly
repurposed for the pool to which they were being added. If this issue occurred,
data was not written to the drive, and the drive remained unprovisioned until it was
reformatted.
150040
If the isi_cpool_rd driver was enabled and the FILE_OPEN_REPARSE_POINT flag was
also enabled, then, if an SMB client attempted to open a symbolic link, the
symbolic link was inaccessible, and the following error appeared on the console:
STATUS_STOPPED_ON_SYMLINK
149010
If a file on the cluster was deleted or modified, and the most recent snapshot of
that file was deleted, any changes to SmartPools policies might have silently failed
to propagate to some snapshot files.
147958
Available space remaining on SSDs that are deployed as L3 cache was incorrectly
reported in the OneFS web administration interface.
141931
ID
When you selected Help > Help on This Page or Help > Online Help from the
General Settings page of the web administration interface, a page appeared with
the following message:
Not Found The requested URL /onefs/help/GUID-E395ABA6-B63A-4F40-8281-3574CCF6C8B1.html was not found on this server.
146846
Note
This issue did not affect the SNMP Monitoring and SupportIQ general settings
pages.
If you ran the isi_gather_info command with the --ftp-proxy-port and
--save-only options or with the --ftp-proxy and --save-only options,
the specified FTP proxy port or FTP proxy host values were not saved. As a result,
the desired FTP proxy settings had to be specified each time the
isi_gather_info command was run.
142784
75677
153565
The stated storage capacities for /, /var, and /var/crash were reported 8 times too high in the OneFS statistics system. This sometimes caused incorrect capacity sizes to appear in the web administration interface, in SNMP queries, or in Platform API-enabled applications.
151651
The following event message did not automatically clear after the boot drive was
replaced:
150730
150625
isi_celog_monitor[5723:MainThread:ceutil:92]ERROR: MemoryError
isi_celog_monitor[5723:MainThread:ceutil:89]ERROR: Exception in serve_forever()
If the CELOG notification master node went down, delivery of event notifications
stopped until the down node returned to service or until the CELOG notification
subsystem (isi_celog_notification) was restarted, at which point the subsystem
would elect a new notification master with the updated group information.
149682
If phase 2 of an FSAnalyze job took longer than 100 minutes to complete, the job
sometimes stopped progressing, might have progressed very slowly, or might have
failed and then resumed. This issue occurred because, during phase 2, the
FSAnalyze job updated an SQLite index, and while the job was updating this index,
it could not handle other job engine requests, which prevented the job from
progressing. In addition, if, while the SQLite index was being created, the number
of requests waiting to be handled grew to more than 100 (the maximum allowed),
the job was terminated and then resumed from a point before the 100 minutes had
elapsed.
147009
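The stall described above is the classic pattern of a long-running build that cannot service its request queue. A minimal sketch of the usual remedy, building the index in small batches and draining pending requests between batches, is shown below; the batch size and helper names are illustrative, not OneFS internals.

```python
from collections import deque

def build_index_in_batches(rows, pending, handle, batch_size=8):
    """Build a sorted index over rows in small batches, draining the
    pending request queue between batches so requests are never starved."""
    index = []
    for start in range(0, len(rows), batch_size):
        index.extend(rows[start:start + batch_size])
        # Service queued requests between batches instead of blocking
        # for the entire build, which is what caused the stall above.
        while pending:
            handle(pending.popleft())
    return sorted(index)
```

With this structure, the number of pending requests stays bounded regardless of how long the overall index build takes.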
The isi_papi_d process did not properly handle CELOG events that referenced a path name that contained special characters or multibyte characters. If the cluster was being monitored by an InsightIQ server, this issue might also have resulted in a lost connection between the InsightIQ server and the cluster.
144742
The physIfaces object identifier (OID) was incorrectly named in the ISILON-TRAP-MIB.txt file, available on the General Settings > SNMP Monitoring tab of the OneFS web administration interface. As a result, it was not always possible to monitor the cluster through SNMP.
144382
Protocol event logging in the /var/log/audit_protocol.log file always
showed a value of 0 bytes written for a write event, and close events did not have
138957
135108
If you attempted to run InsightIQ 3.1.x to monitor a cluster, disk statistics were not collected because the Platform API disk statistics query returned an error. As a result, InsightIQ could not be used to collect drive statistics from the cluster.
129187
File system
ID
If SMB2 symbolic link translation was disabled on the cluster by running the following command:
isi_gconfig registry.Services.lwio.Parameters.Drivers.onefs.SMB2Symlinks=0
symbolic links to directories might have failed, and an error similar to the following might have appeared on the client:
The symbolic link cannot be followed because its type is disabled.
150833
that the SSDs never automatically transitioned from the [REPLACE] back to the
Isilon A100 nodes might have restarted unexpectedly during a group change,
resulting in data unavailability. If this issue occurred, lines similar to the following
appeared in the /var/log/messages file:
149687
Due to a race condition that could occur while file metadata was being upgraded
following an upgrade from OneFS 6.5.5.x to OneFS 7.2.0.x, a node might have
unexpectedly restarted. If this issue occurred, the following lines appeared in
the /var/log/messages file on the affected node:
panic @ time 1406566983.500, thread 0xffffff07b80ae560:
Assertion Failure
Stack:
-------------------------------------------------
kernel:isi_assert_halt+0x42
efs.ko:ifm_di_get_current_protection+0x61
efs.ko:ifm_get_parity_flag+0x33
efs.ko:bam_read_block+0x5f
efs.ko:bam_read_range+0xd8
efs.ko:bam_read+0x613
efs.ko:bam_read_uio+0x36
efs.ko:bam_coal_read_wantlock+0x37a
efs.ko:ifs_vnop_wrapunlocked_read+0x2c6
nfsserver.ko:nfsvno_read+0x58b
nfsserver.ko:nfsrvd_read+0x55c
nfsserver.ko:nfsrvd_dorpc+0x4d3
nfsserver.ko:nfs_proc+0x243
nfsserver.ko:nfssvc_program+0x7b1
krpc.ko:svc_run_internal+0x3c6
krpc.ko:svc_thread_start+0xa
kernel:fork_exit+0x7f
-------------------------------------------------
*** FAILED ASSERTION ifm_di_getinodeversion(dip) == 6 @/build/mnt/src/sys/ifs/ifm/ifm_dinode.c:397:ifm_di_get_current_protection: wrong inode
149669
It was possible for a race condition between the group change and the deadlock probe (a mechanism that attempts to detect and correct deadlock conditions) to cause a node to restart unexpectedly.
149667
If a cluster had run for more than 248.5 consecutive days, an issue that affected
the OneFS journal buffer could sometimes disrupt the drive sync operation. When
this issue occurred, OneFS reported that the journal was full, and as a result,
resources that were waiting for a response from the journal entered a deadlocked
state. When the journal was in this state, nodes that were affected rebooted to
clear the deadlock. In addition, a message similar to the following appeared in
the /var/log/messages file:
/boot/kernel.amd64/kernel:efs.ko:rbm_buf_timelock_panic_all_cb+0xd0
148960
Under rare circumstances, the lock subsystem did not drain fast enough, causing
an assertion failure. When this issue occurred, the node restarted, and the
following stack was logged to the /var/log/messages file:
Stack:
-------------------------------------------------
kernel:isi_assert_halt+0x2e
kernel:lki_lazy_drain+0xf76
kernel:_lki_split_drain_locks+0xa8
kernel:kt_main+0x15e
kernel:fork_exit+0x75
-------------------------------------------------
<3>*** FAILED ASSERTION must_drain ==> !pool->lazy_queue_size || !li->mounted @ /b/mnt/src/sys/ifs/lock/lk_initiator.c:13270: lki_lazy_drain_pool on LK_DOMAIN_DATALOCK took 302454934. lazy queue 1870 -> 11. li->llw_count = 0, iter_count=11087431 chk_space_time = 0, chk_space_iters = 0 llw_time = 880073 llw_iters = 2503 reject_drain_time = 1550050 reject_drain_iters = 1 yield_time = 282713930 yield_iters = 11084926 shrink_lazy_queue_count = 11087431
148123
If an SMB client changed the letter case of the name of a file or directory stored on the cluster, the file or directory's ctime (change time) value was not updated. As a result, the affected file or directory was not backed up during incremental backups.
147606
If SmartCache write caching was enabled and if clients were performing
synchronous writes to the cluster, it was possible to encounter a runtime assert
that caused an affected node to unexpectedly restart. If this issue occurred, lines
similar to the following appeared in the /var/log/messages file:
146541
Stack:
-------------------------------------------------
kernel:cregion_issue_write+0xdcb
kernel:_cregion_write+0x1f5
kernel:cregion_write+0x24
kernel:cregion_flush+0xf6
kernel:coalescer_flush_overlapping+0x219
kernel:coalescer_flush_local_overlap+0x275
kernel:bam_coal_flush_local_overlap+0x2d
--------------------------------------------------
While running an initial SyncIQ job, the target root directory and its contents
remained in a read-write state instead of read-only until the SyncIQ job completed.
As a result, files could be deleted or modified in the target cluster.
145714
144278
In rare cases, an SMB client released its lease on a file before OneFS received a
request to release the lease. If this occurred, the lwio process restarted
unexpectedly, SMB clients connected to the affected node were disconnected, and
lines similar to the following appeared in the /var/log/messages file:
139833
Stack:
-------------------------------------------------
/lib/libc.so.7:thr_kill+0xc
/usr/likewise/lib/liblwiocommon.so.0:LwIoAssertionFailed+0x9f
/usr/likewise/lib/lwio-driver/onefs.so:OnefsOplockBreakFillBuffer_inlock+0xbf
/usr/likewise/lib/lwio-driver/onefs.so:OnefsOplockComplete_inlock+0x7e
/usr/likewise/lib/lwio-driver/onefs.so:OnefsOplockBreakToRH+0x187
/usr/lib/libisi_ecs.so.1:oplocks_event_dispatcher+0xf3
/usr/likewise/lib/lwio-driver/onefs.so:OnefsOplockChannelRead+0x8c
/usr/likewise/lib/liblwbase.so.0:EventThread+0x333
/usr/likewise/lib/liblwbase.so.0:LwRtlThreadRoutine+0xec
/lib/libthr.so.3:_pthread_getprio+0x15d
--------------------------------------------------
File transfer
File transfer issues resolved in OneFS 7.2.0.3
ID
If a client was connected to the cluster through vsftpd and ran the ls or dir
commands for directories that contained more than 100,000 files, the vsftpd
process reached its memory limit, and a memory allocation error occurred. As a
result, the files in the affected directories could not be listed.
149665
Hardware
ID
The isi firmware status command did not report the firmware version of the Mellanox IB/NVRAM card. This issue affected the S200, X200, X400, and NL400 series nodes.
150725
The LED on the chassis turned solid red for a drive prior to completion of the
smartfail process. As a result, the drive might have been replaced prematurely,
possibly causing data loss.
145348
If you installed a new drive support package (DSP) on a node that already had a DSP installed, and you then attempted to update a drive whose update was included only in the new DSP, the fwupdate command did not update the drive unless either the isi_drive_d process or the affected node was restarted. If this issue occurred, and you ran the isi devices -a fwupdate command before
145268
restarting the isi_drive_d process or the node, the following error appeared on the
console:
'fwupdate' action complete, 0 drives updated, 0 updates failed
If you attempted to install a node firmware package that did not have support for the Chassis Management Controller (CMC) component on a node that contained a CMC (for example, an S210, X210, X410, NL410, or HD400 node), the installation failed and an unhandled exception error similar to the following appeared on the console:
144708
The isi_sasphymon process could potentially close a valid 0 file descriptor. If this
issue occurred, any drive associated with the file descriptor would no longer be
monitored by the isi_sasphymon process. This issue would also cause excessive
logging in the /var/log/isi_sasphymon.log file similar to the following:
143042
isi_sasphymon[3979]: Can't get SCSI Log Sense page 0x18 from Bay
2 - scan 6
isi_sasphymon[3979]: cam_get_inquiry: error from cam_send_ccb: 9
isi_sasphymon[3979]: scsi_get_info: error from scsi_get_inquiry
If you ran the isi_reformat_node command on a node containing self-encrypting drives (SEDs), the SEDs sometimes could not be released from ownership, and when the node rebooted, the unreleased SEDs came up in a SED_ERROR state.
141983
Note
If the reformat process continues without reverting the listed drives, it is likely they
will be in a SED_ERROR state on the next node boot.
After replacing boot flash drives in a node and running the gmirror status command, the correct number of active components was displayed, but a status of DEGRADED was incorrectly returned for some components in the output. In the example output, the mirrors (mirror/keystore, mirror/var-crash, mirror/mfg, mirror/journal-backup, mirror/var1, mirror/var0, and mirror/root1) were listed with a Status of COMPLETE or DEGRADED and with Components such as ad4p4 and ad7p4.
128304
Although the operation of the node was unaffected, the incorrect Status sometimes
led to unnecessary service calls for hardware exchanges.
HDFS
HDFS issues resolved in OneFS 7.2.0.3
ID
If the maximum number of HDFS client connections to the cluster was reached, all
worker threads remained busy during processing. As a result, no further cluster
connections could be established, namenode remote procedure calls (RPCs) were
queued for long periods of time, and the HDFS server incorrectly appeared to be
unavailable.
154175
If you tried to change ownership of files or directories through the WebHDFS REST
API by setting only the owning user or the owning group of a file or directory (but
not both), an exception error similar to the following might have appeared in the
command-line interface:
153786
"RemoteException":
{
"exception"
: "SecurityException",
"javaClassName": "java.lang.SecurityException",
"message"
: "Failed to get id rec: 1:"
}
}
Additionally, Ambari 2.1 might have failed to install Hortonworks Data Platform 2.3
through the WebHDFS REST API.
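The failing request above is the standard WebHDFS SETOWNER operation, in which the owner and group query parameters are each optional. A sketch of how such a request URL is formed (the host name, port, and path below are placeholders, not values from this document):

```python
from urllib.parse import urlencode

def webhdfs_setowner_url(host, path, owner=None, group=None):
    """Build a WebHDFS SETOWNER request URL. Either owner or group may be
    omitted, which is the case that failed on OneFS."""
    params = {"op": "SETOWNER"}
    if owner is not None:
        params["owner"] = owner
    if group is not None:
        params["group"] = group
    return "http://%s/webhdfs/v1%s?%s" % (host, path, urlencode(params))
```

Setting only one of the two parameters, for example owner without group, produced the SecurityException shown above.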
The datanode port that HDFS listens on was changed from 1021 to 585 to avoid
conflicts with other processes that might have been listening on the same port.
152933
If the maximum number of HDFS client connections to the cluster was reached, all
worker threads remained busy during processing. As a result, no further cluster
connections could be established, namenode remote procedure calls (RPCs) were
queued for long periods of time, and the HDFS server incorrectly appeared to be
unavailable.
147723
146753
When Kerberos authentication was used with HDFS, the isi_hdfs_d process could
eventually run out of memory and unexpectedly stop. If this issue occurred, an
isi_hdfs_d.core file was created in the /var/log/crash/ directory, and the
following lines appeared in the /var/log/messages file:
146026
Java class names were not included for remote exceptions in WebHDFS. The
exclusion of Java class names might have caused unexpected errors, similar to the
following, when creating and writing a file through WebHDFS:
142056
If a Hadoop client tried to export data in Hive to a directory that already existed,
and the client did not have permissions on the directory to make the change, the
mkdir command failed. If the mkdir command failed, an error similar to the
following appeared on the client:
142049
139269
Job engine
Job engine issues resolved in OneFS 7.2.0.3
ID
If you tried to start a PermissionRepair job from the ClusterManagement > Job
Operations > Job Types > Start Job dialog, and you set the Repair Type to
Clone: copy permissions from the chosen path to all files and directories
or Inherit: recursively apply an ACL, the Template File or Directory field did
not appear. As a result, you could not configure a PermissionRepair job to
154094
148016
In the web administration interface, the Edit Job Type Details page for jobs that
had a schedule set to Every Sunday at 12:00am displayed Close and Edit Job
Type buttons instead of Cancel and Save Changes buttons.
144692
Migration
ID
149816
149815
149814
If the isi_vol_copy_vnx tool was used to migrate data from a VNX array to a OneFS
cluster, and if the data contained any NULL SIDs, the migration process stopped,
and a message similar to the following appeared in the /var/log/messages
file:
149760
/boot/kernel.amd64/kernel: [bam_acl.c:190](pid 83648="isi_vol_copy_vnx")(tid=101308) ifs_verify_acl: Failed verifying security_ace on lin:1:02df:da06. Ace#3. An ACE cannot have a NULL identity type.
Networking
Networking issues resolved in OneFS 7.2.0.3
ID
S210 and X410 nodes that were configured to communicate through a 10 GigE network interface card that was using the BXE driver, and that were also configured to use aggregate interfaces with the link aggregation control protocol (LACP), experienced connectivity issues with those interfaces if the node was rebooted or if the MTU on those interfaces was reconfigured.
150883, 152083
If you performed an extended link flapping test on a node containing a Chelsio network interface card (NIC), the NIC eventually became unresponsive and had to be manually disabled and then re-enabled before it resumed normal operations. While the NIC was unresponsive, external clients could not communicate with the node; however, because the node's back-end communication was unaffected, data on the node was still available to clients connected to the cluster through other nodes.
149767
If the cluster contained X410, S210, or HD400 nodes that had BXE 10 GigE NIC
cards and any external network subnets connected to the cluster were set to 9000
MTU, an error similar to the following appeared in the /var/log/messages file,
and the affected nodes rebooted:
148695,
152083
For more information, see ETA 200096 on the EMC Online Support site.
A memory leak in the networking process, isi_flexnet_d, might have caused the process to stop running, and could have damaged the /etc/ifs/flx_config.xml file. If the file was damaged, all clients could have lost their connections to the cluster.
141822
NFS
ID
Because OneFS 7.2.0 and later returned 64-bit NFS cookies, some older, 32-bit NFS clients were unable to correctly handle read directory (readdir) and extended read directory (readdirplus) responses from OneFS. In some cases, the affected 32-bit clients became unresponsive, and in other cases, the clients could not view all of the directories in an NFS export. In the latter cases, the client could typically view the current directory (".") and its parent directory ("..").
For more information, see ETA 205085 on the EMC Online Support site.
153737
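The incompatibility comes down to truncation: a 32-bit client keeps only the low 32 bits of each directory cookie, so the cookie it sends back no longer matches what the server issued. A minimal illustration (the cookie value is invented for the example):

```python
def truncate_cookie(cookie64):
    """Simulate a 32-bit readdir client storing a 64-bit NFS directory
    cookie: only the low 32 bits survive the round trip."""
    return cookie64 & 0xFFFFFFFF

# A cookie with any of its high 32 bits set no longer identifies the
# same directory position after the client round-trips it.
server_cookie = 0x0000002500000001
client_cookie = truncate_cookie(server_cookie)
```

Cookies that happen to fit in 32 bits round-trip correctly, which is why the failure was intermittent and directory-dependent.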
Because NFSv3 Kerberos authentication requires all NFS procedure calls to use RPCSEC_GSS authentication, some older Linux clients (for example, RHEL 5 clients) that started the FSINFO procedure call with AUTH_NULL authentication before attempting the FSINFO procedure call with RPCSEC_GSS authentication were prevented from mounting an NFS export if the export was configured with the Kerberos V5 (krb5) security type. Newer clients that started the FSINFO procedure call with RPCSEC_GSS were not affected.
151582
If the lsass process was not running when NFS configuration information was
refreshed on the cluster, it was possible for empty netgroups to be propagated to
some or all of the cluster nodes. If this issue occurred, NFS clients were unable to
mount NFS exports.
149781
If you created a hard link that contained a colon (:) from an NFSv3 client, the colon
and any characters that followed it were removed from the hard link name. As a
result, the hard link on the cluster did not have the correct name.
If removing the colon and following characters resulted in changing the hard link name to a file name that was already in use in the destination directory on the cluster, a file name conflict resulted, and a "File exists" error appeared on the NFS client.
148001
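The faulty behavior above is equivalent to truncating the requested link name at the first colon, which is also why two distinct requested names could collide. A sketch (the example names are invented):

```python
def truncated_link_name(requested):
    """Reproduce the faulty behavior: the colon and everything after it
    were dropped from the requested hard link name."""
    return requested.split(":", 1)[0]

# "backup:2015" and "backup:2016" collapse to the same on-cluster name,
# producing the "File exists" collision described above.
```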
If a client held a read lock on a file and an NFSv4 client checked the lock status of the file, the response from the cluster incorrectly reported that the original client was holding a write lock on the file.
This issue might have caused the program that the NFS client was using to work
improperly.
147638
If an NFS client attempted to list a file or directory at the root of an NFS export mount point directory that began with two dots (for example, /mnt/nfs_export/..my_folder), and the requested file or directory did not exist, OneFS returned the contents of the NFS export instead of a file not found error message.
147404
A memory leak in the isi_papi_d process might have caused an out-of-memory error when running isi nfs exports commands.
145209
Because the nfs and onefs_nfs drivers (and the flt_audit_nfs driver, if you enabled
protocol auditing) share the same process ID, if one of these drivers failed to start,
the MCP process did not always detect the failure and did not always restart the
stopped drivers.
144485
On the NFS Export Details page, if you added a secondary group for either the
Map Root User or the Map Non Root User, the value field did not display until you
refreshed the web administration interface page.
142343
If the NFS server shut down in the middle of an NFS export refresh, it was possible for an NFS resolver thread to be in use when the NFS server was attempting to shut down. If this issue occurred, a core file might have been created, and lines similar to the following appeared in the /var/log/messages file:
142296
Stack:
-------------------------------------------------
/lib/libthr.so.3:_umtx_op_err+0xa
/usr/likewise/lib/liblwbase.so.0:WaiterSleep+0xe0
/usr/likewise/lib/liblwbase.so.0:LwRtlMvarTake+0x69
/usr/likewise/lib/lwio-driver/nfs.so:NfsLockMvar+0x19
/usr/likewise/lib/lwio-driver/nfs.so:NfsExportManagerResolveCallback+0x5f8
/usr/likewise/lib/liblwbase.so.0:SparkWorkItem+0x56
/usr/likewise/lib/liblwbase.so.0:WorkThread+0x256
/usr/likewise/lib/liblwbase.so.0:LwRtlThreadRoutine+0xee
/lib/libthr.so.3:_pthread_getprio+0x15d
-------------------------------------------------
It was possible for two NFS threads to create a race condition when the threads were inserting NFS export information into the hash table. This race condition could damage the hash table, causing the NFS process to restart. When this race condition occurred, lines similar to the following appeared in the /var/log/messages file:
139673
/boot/kernel.amd64/kernel: [kern_sig.c:3376](pid 7997="nfs")(tid=100859) Stack trace:
/boot/kernel.amd64/kernel: Stack:
-------------------------------------------------
/boot/kernel.amd64/kernel: /usr/likewise/lib/liblwbase_nothr.so.0:HashLookup+0x31
/boot/kernel.amd64/kernel: /usr/likewise/lib/liblwbase_nothr.so.0:LwRtlHashTableInsert+0x5a
/boot/kernel.amd64/kernel: /usr/likewise/lib/liblwbase_nothr.so.0:LwRtlHashTableResize+0xaf
/boot/kernel.amd64/kernel: /usr/likewise/lib/liblwbase_nothr.so.0:LwRtlHashTableResizeAndInsert+0x2e
/boot/kernel.amd64/kernel: /usr/likewise/lib/liblwbase_nothr.so.0:LwRtlHashMapInsert+0x6f
/boot/kernel.amd64/kernel: /usr/likewise/lib/lwio-driver/nfs.so:NfsExportManagerResolveCallback+0x66
/boot/kernel.amd64/kernel: /usr/likewise/lib/liblwbase.so.0:SparkWorkItem+0x563
/boot/kernel.amd64/kernel: /usr/likewise/lib/liblwbase.so.0:WorkThread+0x256
/boot/kernel.amd64/kernel: /usr/likewise/lib/liblwbase.so.0:LwRtlThreadRoutine+0xee
/boot/kernel.amd64/kernel: /lib/libthr.so.3:_pthread_getprio+0x15d
/boot/kernel.amd64/kernel:
-------------------------------------------------
/boot/kernel.amd64/kernel: pid 7997 (nfs), uid 0: exited on signal 11 (core dumped)
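Concurrent inserts into an unsynchronized hash table can corrupt it mid-resize, which is the crash pattern in the trace above. The standard fix is to serialize inserts with a lock; a minimal sketch, using a Python dict and lock in place of the LwRtl hash table (not the OneFS implementation):

```python
import threading

exports = {}
exports_lock = threading.Lock()

def insert_export(name, info):
    # Serialize inserts so a concurrent resize cannot corrupt the table.
    with exports_lock:
        exports[name] = info

# Many threads inserting concurrently, as the two NFS threads did above.
threads = [threading.Thread(target=insert_export, args=("export%d" % i, i))
           for i in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```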
If there was a group change in the cluster, it was possible that the NFS server would not shut down after a set period of time. After the set period of time elapsed, the NFS server was forcefully signaled to stop. When the NFS server was forcefully stopped, a core file was created and lines similar to the following appeared in the /var/log/messages file:
131197
Stack:
-------------------------------------------------
/lib/libc.so.7:_kevent+0xc
/usr/likewise/lib/liblwbase.so.0:EventThread+0x964
/usr/likewise/lib/liblwbase.so.0:LwRtlThreadRoutine+0xee
/lib/libthr.so.3:_pthread_getprio+0x15d
--------------------------------------------------
SmartLock
SmartLock issues resolved in OneFS 7.2.0.3
ID
139167
SmartQuotas
ID
149758
If you changed a quota's soft or hard limit through the web administration
interface, the Enforced parameter changed from Yes to No, making the quota
accounting-only. Any usage limit that was set was not enforced.
148807
If a quota was created with a hard, soft, or advisory threshold that included a decimal point (for example, isi quota quotas create --hard-threshold=4.5T), the operation failed, and a message similar to the following appeared on the console:
Unknown suffix '.5T'; expected one of ['b', 'K', 'M', 'G', 'T', 'P', 'B', 'KB', 'MB', 'GB', 'TB', 'PB']
145943
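The failing case above requires the suffix parser to accept a fractional number, not just an integer, ahead of the unit suffix. A hedged sketch of such a parser (it handles only the single-letter suffixes from the list above, and is not the OneFS parser):

```python
import re

UNITS = {"b": 1, "K": 2**10, "M": 2**20, "G": 2**30, "T": 2**40, "P": 2**50}

def parse_threshold(text):
    """Parse a quota threshold such as '4.5T' or '500G' into bytes,
    accepting the fractional values that the original parser rejected."""
    m = re.fullmatch(r"(\d+(?:\.\d+)?)([bKMGTP])", text)
    if m is None:
        raise ValueError("Unknown threshold %r" % text)
    return int(float(m.group(1)) * UNITS[m.group(2)])
```

Matching the number and suffix together, rather than splitting at the first non-digit, is what prevents "4.5T" from being misread as the unknown suffix ".5T".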
In the web administration interface, after clicking View details for a quota on the
Quotas & Usage page, the %Used value under Usage Limits did not always
correctly match the percentage value displayed under %Used in the top summary
row for the quota.
123355
SMB
ID
If you created an SMB share and then created a single user or group with run-as-root permissions to the share, the user or group could not be deleted, and the user or group's run-as-root permission could not be modified. If you attempted to delete the user or group, the command appeared to complete successfully; however, the user or group was not deleted. If you attempted to modify the user or group's permissions, the command appeared to complete successfully; however, the original permissions entry was not removed, and an additional entry, with the modified permissions, was added to the share. In the example below, the domain admins group displays the duplicate entries created when the group's run-as-root permission was modified:
146616
Account                Account Type  Run as Root  Permission Type  Permission
-----------------------------------------------------------------------------
EXAMPLE\domain admins  group         True         allow            full
EXAMPLE\domain users   group         False        allow            change
EXAMPLE\domain admins  group         False        allow            full
SMB clients were unable to display alternate data stream information for files on
the cluster that contained alternate data streams.
153666
During an upgrade to OneFS 7.2.0.x, an upgrade script did not properly interpret an empty string value for the HostAcl parameter in the /ifs/.ifsvar/main_config.gc file. This caused SMB shares to be inaccessible after the upgrade was complete, and as a result, the SMB shares had to be re-created. If this occurred, output similar to the following appeared after running the isi_gconfig registry.Services.lwio.Parameters.Drivers.srv.HostAcl command:
registry.Services.lwio.Parameters.Drivers.srv.HostAcl (char**) = [ "" ]
150658
149811
In OneFS 7.2.0.x clusters, the SMB2 connection was sending invalid share flags. As a result, if the inheritable-path ACL was set while creating a share, files on the cluster that were accessed through UNC path hyperlinks in Microsoft Outlook emails failed to open.
149796
If you ran the isi statistics client command to view information about some SMB1 and SMB2 read and write operations (for example, the namespace_write operation), the word UNKNOWN appeared in the UserName column instead of a valid user name. As a result, if you ran scripts to filter read/write operations per user, the scripts did not work correctly.
149683
If you attempted to override the default Windows ACL settings that were applied to an SMB share by adding custom ACLs to the /ifs/.ifsvar/smb/isi-share-default-acl/ template directory, the overrides were not implemented. As a result, actual access permissions on the SMB share did not match expected results.
149664
If the FILE_OPEN_REPARSE_POINT flag was enabled, and an SMB client opened an alternate data stream (ADS) through a symbolic link, the ADS was inaccessible, and the following error appeared on the console:
STATUS_STOPPED_ON_SYMLINK
148734
If you ran the EMCopy application to migrate data containing symbolic links to the cluster, the SMB process unexpectedly restarted because of an lwio process assertion failure. When the SMB process restarted, clients were disconnected from the cluster and the following error message appeared in the /var/log/lwiod.log file:
ASSERTION FAILED: Expression = (pFcb->bIsDirectory == bIsDirectory)
On the client, EMCopy might have displayed the following error message:
ERROR (50) : \\TARGET\symlink ->
145612
ID
If the Disable access logging option was set in the OneFS web administration interface, and then you upgraded your cluster from OneFS 6.5.x to OneFS 7.x, the apache2 service failed to start, and an error similar to the following appeared repeatedly in the /var/log/isi_mcp file:
149812
149695
149684
If you attempted to scan an infected file from the OneFS web administration interface, and if the file name or the path name where the file was located contained the apostrophe (') character, the web interface displayed an HTTP 500 error.
141960
If the job that was running an antivirus scan policy was terminated, either by
another process or due to a software failure, the antivirus scan policy continued to
be listed as running in the OneFS web administration interface, and the job could
not be manually cancelled or cleared from the list of running jobs. The correct
status of the policy was displayed when viewed from the command-line interface.
141954
Because some antivirus scan reporting fields accepted invalid characters from
SQLite queries, running or completed antivirus scan policies were not listed in the
OneFS web administration interface, and messages similar to the following
appeared in the webware_webui.log file where <policy_ID> was the ID of the
affected policy:
138754
135097
Note
Because repeated logging to the /var partition can adversely affect the wear life of a node's boot flash drives, logging is reduced under the previously described circumstances: if a large number of duplicate messages are logged within a short period of time, some of the messages are suppressed, and a message similar to the following appears in the /var/log/isi_avscan_d.log file:
isi_avscan_d[1764]: Suppressed 152 similar messages!
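The suppression behavior described in the note can be sketched as collapsing runs of identical messages into the first message plus a summary line; this is an illustrative model, not the isi_avscan_d implementation:

```python
def collapse_duplicates(messages):
    """Collapse runs of identical log messages: emit the first message of
    each run plus a 'Suppressed N similar messages!' summary line."""
    out = []
    i = 0
    while i < len(messages):
        j = i
        # Find the end of the run of messages identical to messages[i].
        while j < len(messages) and messages[j] == messages[i]:
            j += 1
        out.append(messages[i])
        if j - i > 1:
            out.append("Suppressed %d similar messages!" % (j - i - 1))
        i = j
    return out
```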
Authentication
Authentication issues resolved in OneFS 7.2.0.2
ID
147221
145590
If an LDAP provider returned a UID or a GID that was greater than 4294967295 (the maximum value that can be assigned to an unsigned 32-bit integer), an incorrect UID or GID was assigned to the associated user or group. This issue could have affected a user's ability to access data on the cluster.
144002
Note
such user error will be returned. Additional logging was also added to
the /var/log/lsassd.log file to help identify these issues.
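The 32-bit overflow behind issue 144002 is ordinary unsigned truncation: an ID above 4294967295 wraps modulo 2**32 and therefore maps to a different identity. A minimal illustration:

```python
UINT32_MAX = 4294967295

def stored_id(ldap_id):
    """Simulate storing an LDAP UID/GID in an unsigned 32-bit field:
    values above UINT32_MAX wrap modulo 2**32."""
    return ldap_id % (UINT32_MAX + 1)
```

For example, an LDAP UID of 4294967297 would wrap to 1, silently colliding with whichever user holds UID 1.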
If the selective authentication setting was enabled for a Windows trusted domain,
and if a user who was a member of the domain was assigned to a group to which
the ISI_PRIV_LOGIN_SSH or ISI_PRIV_LOGIN_PAPI role-based access privilege was
assigned, the user was denied access to the cluster when attempting to log in
through an SSH connection or through the OneFS web administration interface.
This issue occurred because the selective authentication setting prevented OneFS from resolving the user's group membership.
142088
If a DNS server became unavailable while the lsass process was sending RPC
requests to a domain controller, the lsass process might have restarted
unexpectedly. If this issue occurred, authentication services were temporarily unavailable, and a message similar to the following appeared in the /var/log/messages file:
142073
Stack:
-------------------------------------------------
/usr/likewise/lib/liblsaonefs.stat.so:LsaOnefsGetIpv4Address+0x9
/usr/likewise/lib/liblsaonefs.stat.so+0xee4:0x807315ee4
/usr/likewise/lib/liblsaserverstats.so.0:LsaSrvStatisticsRelease+0x82
/usr/likewise/lib/lsa-provider/ad_open.so:AD_NetLookupObjectSidsByNames+0x3bc
/usr/likewise/lib/lsa-provider/ad_open.so:AD_NetLookupObjectSidByName+0x1b1
/usr/likewise/lib/lsa-provider/ad_open.so:LsaDmConnectDomain+0x205
/usr/likewise/lib/lsa-provider/ad_open.so:LsaDmWrapNetLookupObjectSidByName+0x76
/usr/likewise/lib/lsa-provider/ad_open.so:LsaDmEngineGetDomainNameWithDiscovery+0x6a5
/usr/likewise/lib/lsa-provider/ad_open.so:AD_ServicesDomainWithDiscovery+0x79
/usr/likewise/lib/lsa-provider/ad_open.so:AD_AuthenticateUserEx+0x418
/usr/likewise/lib/liblsaserverapi.so.0:LsaSrvAuthenticateUserExInternal+0x436
/usr/likewise/lib/liblsaserverapi.so.0:LsaSrvAuthenticateUserEx+0x4be
/usr/likewise/lib/libntlmserver.so.0:NtlmValidateResponse+0xeb1
/usr/likewise/lib/libntlmserver.so.0:NtlmServerAcceptSecurityContext+0x10a
/usr/likewise/lib/libntlmserver.so.0:NtlmSrvIpcAcceptSecurityContext+0x325
/usr/likewise/lib/liblwmsg.so.0:lwmsg_peer_assoc_call_worker+0x20
/usr/likewise/lib/liblwbase.so.0:CompatWorkItem+0x16
/usr/likewise/lib/liblwbase.so.0:WorkThread+0x256
/usr/likewise/lib/liblwbase.so.0:LwRtlThreadRoutine+0xec
/lib/libthr.so.3:_pthread_getprio+0x15d
--------------------------------------------------
141947
/usr/lib/libisi_persona.so.1:persona_get_type+0x1
/usr/lib/libisi_auth_cpp.so.
1:_ZN4auth15json_to_personaERKN4Json5ValueERKNS_14lsa_connectionER
KSs+0xc08
/usr/lib/libisi_auth_cpp.so.
1:_ZN4auth15persona_to_jsonERKNS_7personaERKNS_14lsa_connectionEb
+0x62
/usr/lib/libisi_platform_api.so.
1:_ZN4auth15sec_obj_to_jsonERKNS_7sec_objERKNS_14lsa_connectionEbb
+0x178
/usr/lib/libisi_platform_api.so.
1:_ZN18auth_users_handler8http_getERK7requestR8response+0x4c4
/usr/lib/libisi_rest_server.so.
1:_ZN11uri_handler19execute_http_methodERK7requestR8response+0x56e
/usr/lib/libisi_rest_server.so.
1:_ZN11uri_manager15execute_requestER7requestR8response+0x100
/usr/lib/libisi_rest_server.so.
1:_ZN14request_thread7processEP12fcgi_request+0x112
/usr/lib/libisi_rest_server.so.1:_ZN14request_thread6on_runEv+0x1b
/lib/libthr.so.3:_pthread_getprio+0x15d
If a machine password was changed by a node while the lwreg process on another
node was refreshing that node's lsass configuration, the lsass process on the
second node could have cached both the old and new machine passwords. If this
occurred, the lsass process unexpectedly restarted, and clients connected to the
affected node could not be authenticated. In addition, lines similar to the following
appeared in the /var/log/messages file:
141940
/lib/libc.so.7:thr_kill+0xc
/usr/likewise/lib/lsa-provider/
ad_open.so:LsaPcachepEnsurePasswordInfoAndLock+0x9b6
/usr/likewise/lib/lsa-provider/
ad_open.so:LsaPcacheGetMachineAccountInfoA+0x28
/usr/likewise/lib/lsa-provider/
ad_open.so:AD_MachineCredentialsCacheInitialize+0x38
/usr/likewise/lib/lsa-provider/ad_open.so:AD_Activate+0x9d5
/usr/likewise/lib/lsa-provider/ad_open.so:LsaAdProviderStateCreate
+0xb22
/usr/likewise/lib/lsa-provider/
ad_open.so:AD_RefreshConfigurationCallback+0x792
/usr/likewise/lib/liblsaserverapi.so.0:LsaSrvRefreshConfiguration
+0x432
/usr/likewise/lib/lw-svcm/lsass.so:LsaSvcmRefresh+0x209
/usr/likewise/lib/liblwbase.so.0:RefreshWorkItem+0x24
/usr/likewise/lib/liblwbase.so.0:WorkThread+0x256
/usr/likewise/lib/liblwbase.so.0:LwRtlThreadRoutine+0xec
/lib/libthr.so.3:_pthread_getprio+0x15d
If a cluster that was joined to a Microsoft Active Directory (AD) domain was also
140851
configured with an IPv6 subnet, and if the AD domain controller was configured to
use an IPv6 address, the netlogon process on the cluster repeatedly restarted and
members of the Windows AD domain could not be authenticated to the cluster. If
the netlogon process restarted as a result of this issue, Windows clients might
have received an Access Denied error when attempting to access SMB shares
on the cluster, or they might have received a Logon failure: unknown
139654
138738
Stack:
--------------------------------------------------
/boot/kernel.amd64/kernel: /usr/lib/libisi_persona.so.
1:persona_len+0x1
/boot/kernel.amd64/kernel: /usr/lib/libisi_acl.so.1:cleanup_sd
+0x506
/boot/kernel.amd64/kernel: /usr/lib/libisi_acl.so.1:sd_from_text
+0x1f1
Although an LDAP or NIS file provider was configured with a list of unfindable users
through the --unfindable-users option of the isi auth create or isi
auth modify command, a user's groups were still queried through the LDAP
or NIS provider.
137897
If an update to Microsoft Active Directory (AD) succeeded, but the subsequent LDAP
query for the new password failed, OneFS did not update the cluster's machine
password configuration file, pstore.gc. As a result, there was a mismatch
between the machine password registered with Active Directory and the machine
password being used by the cluster, and clients attempting to connect to the
cluster could not be authenticated.
137743
During a parallel restore operation, if only a portion of the restore operation's file
data write was written to disk, the remaining file data from that write could have
been discarded. Because a restore operation writes a maximum of 1 MB of data at
a time, it was extremely unlikely that only a portion of the data would be written to
disk.
142339
Under some circumstances, the NDMP process might have failed to correctly
account for the number of isi_ndmp_d instances running on a node, and the
number of running instances might have exceeded the maximum number allowed.
In some cases, the running instances might have consumed all available resources,
causing a node to unexpectedly reboot, and the running NDMP job to fail. If this
issue occurred, clients connected to the node were disconnected, and lines similar
to the following appeared in the /var/log/messages file:
142075
/boot/kernel.amd64/kernel: pid 56071(isi_ndmp_d), uid 0 inumber
2111 on /tmp/ufp: out of inodes
isi_ndmp_d[56071]: ufp copy error: failed to open destination
for /tmp/ufp/isi_ndmp_d/4675/gc ==>/tmp/ufp/isi_ndmp_d/.56071.tmp/
gc: No space left on device
isi_ndmp_d[56071]: ufp error: Failed to initialise failpoints for
isi_ndmp_d/56071
If a snapshot's expiration time was extended or changed to zero (indicating that the
snapshot never expires) while the snapshot was being deleted, the isi_snapshot_d
process could have missed the expiration change, and, as a result, the snapshot
might have been deleted.
142072
If the --skip_bb_hash option of a SyncIQ policy was set to no (the default
setting) and if a SyncIQ file split work item was split between pworkers, it was
possible for the pworker that was handling the file split work item to attempt to
transfer data that had already been transferred to the target cluster. If this
occurred, the isi_migr_pworker process repeatedly restarted and the SyncIQ policy
failed. In addition, the following lines appeared in the /var/log/messages file:
142058
/usr/bin/isi_migr_pworker:migr_continue_file+0x1507
/boot/kernel.amd64/kernel:
/usr/bin/isi_migr_pworker:migr_continue_generic_file+0x9a
/boot/kernel.amd64/kernel:
/usr/bin/isi_migr_pworker:migr_continue_work+0x70
/boot/kernel.amd64/kernel:
/usr/lib/libisi_migrate_private.so.2:migr_process+0xf1
/boot/kernel.amd64/kernel:
/usr/bin/isi_migr_pworker:main+0x606
/boot/kernel.amd64/kernel:
/usr/bin/isi_migr_pworker:_start+0x8c
/boot/kernel.amd64/kernel:
--------------------------------------------------
/boot/kernel.amd64/kernel: pid 45328 (isi_migr_pworker), uid 0:
exited on signal 6 (core dumped)
If a Collect job had not been run for a long time, snapshots were not processed,
and, over time, they accumulated. As a result, it took longer than expected to
delete files associated with a large number of accumulated snapshots.
141968
141935
In the OneFS web administration interface, the View Details hyperlink on the Data
Protection > SnapshotIQ > Snapshot Schedules page displayed only one line of
the snapshot schedule settings. As a result, the full details of the schedule were
not available unless the user's mouse hovered outside of the browser window.
141933
Although configuring an NDMP backup job with both the BACKUP_FILE_LIST
environment variable and the BACKUP_MODE=SNAPSHOT environment variable
negated the effect of setting the BACKUP_MODE=SNAPSHOT environment variable
(faster incremental backups), it was possible to configure a job with both
environment variables. Beginning in OneFS 7.2.0.1, if you configure both
environment variables, the job does not run, and the following message appears
on the Data Management Application (DMA), on the console, and in
the /var/log/ndmp_debug.log file:
141928
This issue occurred because the SyncIQ process attempted to decrease the
retention period of a WORM-committed file, which is not permitted.
Beginning in OneFS 7.2.0.2, if the retention date applied to a file on the source
cluster predates the retention date on the target cluster, no attempt is made to
update the retention date on the target cluster during synchronization.
138935
If a SnapRevert job was run on a directory to which both a SyncIQ domain and a
SnapRevert domain were applied, and if the SyncIQ domain was set to read/write
mode, the SnapRevert job failed, and lines similar to the following appeared in
the /var/log/messages file and in the /var/log/isi_migrate.log file:
138780
isi_job_d[20805]: Man
Working(manager_from_worker_stopped_handler, 2012):
Error from worker 2:14-12-03 12:16:50
SnapRevert[409] Node 1 (1) task 2-1:
Snaprevert job finished with status failed: Unable to create and
get file descriptor for tmp working directory:
Read-only file system(unrunnable)
from snap_revert_item_process(/usr/src/isilon/bin/isi_job_d/
snap_revert_job.c:730)
from worker_process_task_item(/usr/src/isilon/bin/isi_job_d/
worker.c:940)
isi_job_d[20805]:snap_revert_item_process:743: Snap revert job
finished with status failed: Unable to create and
get file descriptor for tmp working directory: Read-only file
system (unrunnable)
isi_job_d[1910]: SnapRevert[409]Fail
Due to a memory leak in the isi_webui_d process, while viewing SyncIQ reports
through the OneFS web administration interface, the isi_webui_d process
unexpectedly restarted. As a result, the OneFS web administration interface
stopped responding, and users who were logged into the OneFS web
administration interface were disconnected and returned to the log-in screen. In
addition, messages similar to the following appeared in
the /var/log/webware-errors file:
138731
Cluster configuration
Cluster configuration issues resolved in OneFS 7.2.0.2
ID
If you attempted to reconfigure an existing file pool policy from the OneFS web
administration interface without selecting the disk or node pool in the Storage
Settings section again, an error similar to the following appeared, and the file pool
policy change was not saved:
143453
File Pool Policy Edit Failed The edit to the file pool policy did
not save due to the following error: Invalid storage pool
'<storage-pool-name> (node pool)'
After a cluster that was configured with manual node pools was upgraded, it was
possible for the drive purpose database file (drive_purposing.db) to contain
incorrect node equivalence information for the nodes in the manual node pools.
Because OneFS relies on the information in the drive_purposing.db file when
provisioning nodes, if this issue was encountered, it might have prevented new
nodes from being provisioned.
142026
Diagnostic tools
Diagnostic tools issues resolved in OneFS 7.2.0.2
ID
If you ran the isi_gather_info command with the --ftp-port <alt-port>
--save-only options, where <alt-port> was the number of the alternate FTP port to
set as the new default, the isi_gather_info command ignored the request, and
used the default FTP port (port 21) instead. As a result, the alternate FTP port
number had to be specified each time the isi_gather_info command was run.
141922
Because the following isi_gather_info command options were processed
immediately before all other command options, the options that followed these
options were sometimes ignored:
l --verify-upload
l --save
l --save-only
l --re-upload
135541
As a result, the .tar file that is created when the isi_gather_info command
is run might not have been uploaded to Isilon Technical Support, and running the
command sometimes had unexpected results. For example, if you ran the following
command, the --ftp-proxy-host option was ignored:
isi_gather_info --verify-upload --ftp-proxy-host=x
If you ran the isi_gather_info command with the -f option (an option that
enables you to designate a specific directory to gather), and if you specified that
the /ifs/data/Isilon_Support directory should be gathered, the .tar file
that was created by the command could have been extremely large. This issue
occurred because /ifs/data/Isilon_Support is the default temporary
directory that is used to store the .tar files that are created when the
isi_gather_info command is run, and, as such, this directory might contain
previous .tar files that are large in size. In addition, the isi_gather_info -f
command gathers the contents of the /ifs/data/Isilon_Support directory
from each node in the cluster, multiplying the size of the resulting .tar file <x>
times, where <x> is the number of nodes in the cluster.
Note
Beginning in OneFS 7.2.0.1, if you run the isi_gather_info command with the
-f option, and if you specify that the /ifs/data/Isilon_Support directory
should be gathered, the following message appears on the console and the
command does not run:
WARNING: ignored path /ifs/data/Isilon_Support
135540
In some cases, a race condition between the I/O request packet (IRP) cancellation
callback function and the IRP dispatch function caused the lwio process to restart.
If the process restarted as a result of this issue, client connections to the cluster
were disrupted, and the following lines appeared in the /var/log/messages
file:
147471
/boot/kernel.amd64/kernel: /lib/libc.so.7:thr_kill+0xc
/boot/kernel.amd64/kernel: /usr/likewise/lib/liblwiocommon.so.
0:LwIoAssertionFailed+0xa3
/boot/kernel.amd64/kernel: /usr/likewise/lib/libiomgr.so.
0:IopFltContextReleaseAux+0x79
/boot/kernel.amd64/kernel:
/usr/likewise/lib/libiomgr.so.0:IoFltReleaseContext+0x2f
/boot/kernel.amd64/kernel: /usr/lib/libisi_flt_audit.so.1:_init
+0x3b37
/boot/kernel.amd64/kernel: /usr/likewise/lib/libiomgr.so.
0:IopFmIrpCancelCallback_inlock+0x2af
In OneFS 7.2.0.1, if a file whose name contained multibyte characters was audited,
the isi_audit_cee process did not decode the file name correctly when it forwarded
audit events to the EMC Common Event Enabler (CEE). As a result, the name of a file
that contained multibyte characters was incorrect within the auditing software.
146609
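The failure mode described above can be sketched in a few lines. This is an illustrative example only, not the actual isi_audit_cee code, and the file name used is hypothetical; it shows how decoding UTF-8 bytes with the wrong codec garbles multibyte characters.

```python
# Illustrative sketch (not the actual isi_audit_cee code): decoding a UTF-8
# file name with the wrong codec splits each multibyte character into two
# wrong characters, the kind of corruption described above.
name_bytes = "réport.txt".encode("utf-8")  # raw file-name bytes (hypothetical name)
garbled = name_bytes.decode("latin-1")     # wrong codec: multibyte chars break apart
correct = name_bytes.decode("utf-8")       # correct round trip

print(garbled)  # rÃ©port.txt
print(correct)  # réport.txt
```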
Some information regarding NFS clients that were being audited, such as the
user ID, was omitted from the audit stream. As a result, NFS clients could not be
correctly audited.
138945
Note
Although all of the necessary information regarding NFS clients is now included in
the audit stream, NFS clients might not be correctly audited by some auditing
software.
If memory allocated to the CELOG monitoring process (isi_celog_monitor) became
very fragmented, the isi_celog_monitor process stopped performing any work. As a
result, no new events were recorded, alerts regarding detected events were not
sent, and messages similar to the following were repeatedly logged in
the /var/log/isi_celog_monitor.log file:
138874
isi_celog_monitor[5723:MainThread:ceutil:92]ERROR: MemoryError
isi_celog_monitor[5723:MainThread:ceutil:89]ERROR: Exception in
serve_forever()
138737
138675
Note
Log files that are not correctly rotated can grow in size, and might eventually fill
the /var partition, which can affect cluster performance.
Because commas were not correctly escaped in the output of the isi
statistics --csv command, if the data returned from the command contained
commas, the commas were treated as separators, and the data could not be
accurately interpreted by third-party monitoring tools.
138613
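The parsing problem can be sketched briefly. This is an illustrative example with invented values, not OneFS code: a field that contains commas must be quoted, or downstream tools split it into extra columns.

```python
# Illustrative sketch of the parsing problem: a field containing commas must be
# quoted, or a CSV consumer splits it into extra columns. Values are invented.
import csv
import io

row = ["node-1", "path,with,commas", "42"]

# Naive join, effectively what unescaped output produced:
naive = ",".join(row)
print(len(naive.split(",")))  # 5 columns recovered instead of 3

# Proper CSV quoting keeps the field intact:
buf = io.StringIO()
csv.writer(buf).writerow(row)
parsed = next(csv.reader(io.StringIO(buf.getvalue())))
print(parsed == row)  # True
```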
If users attempted to access a file under an audited SMB share and the attempt
failed, the failed access attempts were not recorded in the audit log. As a result,
these events could not be tracked.
138068
File system
File system issues resolved in OneFS 7.2.0.2
ID
If L3 cache was enabled on a cluster running OneFS 7.2.0.1, it was possible for
OneFS to erroneously report that the journal on one or more nodes was invalid. This
issue was more likely to affect S210 and X410 nodes.
147475
Note
Although OneFS reported that a node's journal was invalid, the journal was actually
intact. This issue occurred because a OneFS script erroneously detected that the
journal was invalid.
If this issue occurred, the affected node or nodes could not boot, and the following
message appeared on the console:
Checking Isilon Journal integrity...
Attempting to save journal to default location
Warning: /etc/ifs/journal_bad exists. Saving bad journal.
OneFS is unmounted
A valid backup journal already exists. Not saving.
NVRAM autorestore status: Not performed...
Attempting to restore journal from disk backup...
Restore from disk failed
Attempting to save and restore journal to clear any ECC errors in
unused DRAM blocks...
Restore failed
Could not recover journal. Contact Isilon Customer Support
immediately.
On clusters with L3 cache enabled, if you updated SSD firmware by using an Isilon
Drive Support Package (DSP), it was possible to encounter an issue that could
cause data loss. If this issue occurred, data integrity (IDI) issues were reported as
an IDI event, and a critical event notification similar to the following was sent:
146182
For more information, see article 200097 on the EMC Online Support site.
When a node joins an Isilon cluster, the file system acquires a merge lock in order
to postpone joining the node until running file system operations are complete. In
rare cases, if an AutoBalance, FlexProtect, or MediaScan job was running while a
node was joining the cluster, the merge lock was not released in a timely manner,
and the merge lock timed out. If this occurred, the file system could not be
accessed until the issue was resolved. In addition, messages similar to the
following appeared in the /var/log/messages log file, where <time> was the
number of milliseconds that the merge lock was held before timing out:
144214
error
142835
Note
140906
If either L1 or L2 prefetch was disabled for a 4 TB file, nodes that handled the file
unexpectedly rebooted while reading the last block of the file. If this issue
occurred, the following FAILED ASSERTION message appeared in
the /var/log/messages file:
140639
*** FAILED ASSERTION end_l1 <= max_lbn @
/build/mnt/src/sys/ifs/bam/bam_file.c:1128
Note
Beginning in OneFS 7.2.0.1, the snmpget function will time out after ten seconds
and will retry the affected request once.
Hardware
Hardware issues resolved in OneFS 7.2.0.2
ID
142946
This issue occurred because the value assigned to the maximum number of logical
drives allowed was not updated to fully accommodate HD400 nodes. For more
information, see article 198924 on the EMC Online Support site.
On older nodes running OneFS 7.2.0.0 through 7.2.0.1, if the
getNumBatteries() function was called to count the number of NVRAM
batteries in the node, the function did not return the correct number. As a result,
processes that relied on this information might not have performed correctly. For
example, battery tests might not have been correctly configured.
142159
If you ran the isi firmware update command to update node firmware on an
X210 or an X410 node, the update failed and the following error appeared on the
console, where <X> was the number of the node on which the update failed:
142141
ERROR: Node <X>: failed to cold reset car and unable to get
completion code and bit flag
Note
This issue occurred only on X210 and X410 nodes with CMC firmware version 00.0f
or earlier. You can confirm your version of the CMC firmware by logging on to any
node in the cluster and running the following command:
isi firmware status
141986
Note
Beginning in OneFS 7.2.0.2, if a node containing SEDs that cannot be released from
ownership is reimaged by using a USB flash drive, before the node shuts down, the
following messages appear:
Failed to release SEDs, one or more drive(s) will be in SED_ERROR
state after reimage is complete and will require a PSID revert.
This may result in /ifs being unable to mount.
Press Enter to continue
If you press ENTER, the reimage process completes, and the node shuts down.
When the node is subsequently booted, you might be required to manually revert
the affected SEDs to restore the node to normal operation.
If the /var/db/hwmon/isi_hwmon.p file was damaged and you attempted to
start the isi_hwmon service, the service failed to start. In addition, lines similar to
the following appeared in the /var/log/isi_mcp file, confirming repeated
attempts to restart the isi_hwmon service:
141929
Note
139718
139697
Beginning in OneFS 7.2.0.1, the preceding messages are no longer logged, and the
relevant messages related to this test appear only on the node on which the
isi_bootdisk_read_test is run.
Even though a drive was smartfailed, physically removed, and replaced, the old
drive appeared in the output of the isi devices list command in a
suspended, smartfailed, or erased state. For example:
Unavailable drives:
Lnum 40
[SUSPENDED]
Unavailable drives:
Lnum 40
[SMARTFAIL]
Unavailable drives:
Lnum 40
[ERASE]
138207
After you installed an Isilon Drive Support Package (DSP) on a cluster, the year and
month of the date recorded in the /var/log/isi_dsp_tool.log file were
overwritten. Because the day of the month was not also overwritten, it was possible
for the resulting date to be invalid. For example, the date could have been changed
to February 31st. If this occurred, an error similar to the following appeared on the
console during the post-install verification phase of the installation:
137271
ValueError: day is out of range for month
Although an error appeared, the DSP was successfully installed. For more
information, see article 194343 on the EMC Online Support site.
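The logged error is the standard Python date-validation failure; a minimal sketch reproduces it. The year used is arbitrary, chosen only for illustration.

```python
# Illustrative sketch: constructing the impossible date from the example above
# raises the same ValueError that appeared during the post-install checks.
from datetime import date

try:
    date(2015, 2, 31)  # February 31st does not exist
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)  # day is out of range for month
```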
If the LCD server process was unable to communicate with the LCD on the front
panel of a node, extraneous messages were repeatedly logged in
the /var/log/messages file.
136603
Note
File "/usr/local/lib/python2.6/site-packages/isi/ui/lcd/
noritake.py", line 357, in verifyModel
File "/usr/local/lib/python2.6/site-packages/isi/ui/lcd/
noritake.py", line 350, in waitForResponse
LCDError: LCD did not respond
If you ran the isi firmware status command, OneFS might have
encountered an error while attempting to log a value that was too large. If this
occurred, the following error appeared on the console after running the isi
firmware status command:
135083
Job engine
Job engine issues resolved in OneFS 7.2.0.2
When certain jobs were run, the isi_job_d process created temporary files in
the /var/tmp directory. Files written to this directory are stored on the cluster's
boot flash drives. In rare cases, writing to the boot flash drives could cause
excessive wear and premature boot flash drive failure.
Beginning in OneFS 7.2.0.2, the temporary files are created in
the /ifs/.ifsvar/tmp/jobengine directory.
141951
If a snapshot, or the first of a set of snapshots, was empty when the snapshot
delete job ran, the isi_job_d process failed, and lines similar to the following
appeared in the /var/log/messages log file:
140865
Note
Migration
Migration issues resolved in OneFS 7.2.0.2
After an initial VNX data migration, if a source file was replaced by a file that was a
Block Device file or a Character Device file with the same name, the new file was
not copied to the target during the next or subsequent incremental data migrations.
147197
146151
Networking
Networking issues resolved in OneFS 7.2.0.2
On the Cluster Management > Network Configuration page in the OneFS web
administration interface, if you enabled the int-b interface and the InfiniBand (IB)
internal failover network and specified a valid subnet mask, and then assigned the
same IP address range or overlapping IP address ranges to the int-b network and
the IB failover network, a Subnet overlaps error appeared and you could not
142889
142068
142065
Note
This issue did not affect nodes that were rebooted following an upgrade.
A race condition sometimes occurred when the isi_flexnet_d and isi_dnsiq_d
141924
processes were both configuring IP addresses. If this condition occurred, the nodes
restarted unexpectedly, and lines similar to the following appeared in
the /var/log/messages file:
Stack: --------------------------------------------------
kernel:trap_fatal+0x9f
kernel:trap_pfault+0x287
kernel:trap+0x313
kernel:sysctl_iflist+0x1e7
kernel:sysctl_rtsock+0x200
kernel:sysctl_root+0x121
kernel:userland_sysctl+0x18f
141920
141587
show statistics
^
error: expecting {cache,cluster,debug,dns,parameters,server}
On the Cluster Management > Network Configuration page of the OneFS web
administration interface, it was possible to configure multiple subnets with the
same gateway priority value, even though gateway priority values must be unique.
If multiple subnet gateways were configured with the same priority value, users
were unable to access the cluster from a client in one subnet, but could
successfully connect to the same cluster from a client in a different subnet.
140368
Note
It is not possible to configure multiple subnet gateways with the same priority value
from the command-line interface.
For more information, see article 88862 on the EMC Online Support site.
If a client used statically assigned cluster IP addresses to mount the cluster, and if
that client was connected to the cluster through SMB 2, the client could be
139170
139044
138727
If you removed a gateway from a subnet, either through the OneFS web
administration interface or the command-line interface, the IP address for the
gateway remained in the routing table. As a result, if you ran the netstat
command to view information about the network configuration, the IP address that
was removed continued to appear in the output.
133973
If source-based routing (SBR) was enabled and static routes were also configured,
it was possible for SBR to override the static routes.
123581
Note
Beginning in OneFS 7.2.0.2, if SBR is enabled and static routes are also
configured, SBR excludes the static routes from SBR management.
NFS
NFS issues resolved in OneFS 7.2.0.2
When an NFSv4 client initiated a request to mount the pseudo file system, the
information that OneFS returned about the file system indicated that the maximum
file size allowed within the system was zero. As a result, some NFSv4 clients (for
example, AIX 6.1 clients) did not attempt to mount the file system.
143912
While OneFS was closing an idle client connection to an NFS export, it was
possible to encounter a race condition. If this race condition was encountered, the
NFS server unexpectedly restarted and NFSv4 clients were disconnected from the
cluster. In addition, the following lines appeared in the /var/log/messages
file:
142269
/usr/likewise/lib/lwio-driver/nfs.so:__svc_zc_clean_idle+0x1f7
/usr/likewise/lib/lwio-driver/nfs.so:rendezvous_request+0x7f6
/usr/likewise/lib/lwio-driver/nfs.so:svc_getreq_xprt+0x120
/usr/likewise/lib/lwio-driver/nfs.so:NfsListenerProcessTask+0x3b
0x800f15e5c (lookup_symbol: error copying in Ehdr:14)
0x800f1da9e (lookup_symbol: error copying in Ehdr:14)
0x8014f56bd (lookup_symbol: error copying in Ehdr:14)
142074
If an NFSv4 client sent a request to the cluster while the file system was
unavailable (for example, while nodes were rebooting), OneFS returned the wrong
response and did not correctly disconnect the client. If this occurred, lines similar
to the following appeared in the /var/log/messages file:
140511
Note
Beginning in OneFS 7.2.0.2, if an NFSv4 client sends a request to the cluster while
the file system is unavailable, the client is disconnected from the cluster and an
informative message is logged in the /var/log/messages file.
Under some circumstances, although an NFS export was configured to return 32-bit
file IDs for files created within the export, 64-bit file IDs were instead sent to the
client. As a result, the client could not access files on the cluster.
140372
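The size mismatch can be sketched without any OneFS specifics. This is an illustrative example only, with an invented file ID: a value wider than 32 bits cannot be carried in a 32-bit field, and a naive truncation silently changes it.

```python
# Illustrative sketch (not OneFS code): a file ID wider than 32 bits cannot be
# packed into a 32-bit field, and masking it down silently changes the value.
import struct

file_id = (1 << 32) + 42  # an invented 64-bit file ID that exceeds 32 bits

try:
    struct.pack("!I", file_id)  # 32-bit unsigned field: overflows
    overflowed = False
except struct.error:
    overflowed = True

truncated = file_id & 0xFFFFFFFF  # naive truncation yields a different ID
print(overflowed, truncated)
```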
In environments where many NFSv4 clients were reading from and writing to the
cluster, it was possible to encounter a condition that enabled a memory resource to
be over-allocated. If this issue occurred, the following lines appeared in
the /var/log/messages file:
139910
/lib/libc.so.7:thr_kill+0xc
/lib/libc.so.7:__assert+0x35
/usr/likewise/lib/lw-svcm/nfs.so:xdr_iovec_allocate+0x191
/usr/likewise/lib/lw-svcm/nfs.so:svc_zc_getrec+0x1db
/usr/likewise/lib/lw-svcm/nfs.so:svc_zc_recv+0xa1
/usr/likewise/lib/lw-svcm/nfs.so:svc_getreq_xprt+0x11e
/usr/likewise/lib/lw-svcm/nfs.so:NfsSocketProcessTask+0x415
/usr/likewise/lib/liblwbase.so.0:EventThread+0x6b0
/usr/likewise/lib/liblwbase.so.0:LwRtlThreadRoutine+0x100
/lib/libthr.so.3:_pthread_getprio+0x15d
The isi_cbind command did not parse numbers correctly. As a result, the
command could not be used to change settings that required a numeric value.
139008
141970
SmartLock
SmartLock issues resolved in OneFS 7.2.0.2
On clusters running in compliance mode, the compadmin user did not have
permission to run the newsyslog command. As a result, the compadmin could
not manually rotate OneFS log files.
141953
SMB
SMB issues resolved in OneFS 7.2.0.2
In some cases, while the lwio process was shutting down on a node (because it
was manually or automatically restarted), the lwio SRV component waited
indefinitely for a file object to be freed and did not shut down. If this occurred, after
5 minutes, the SRV service was stopped by the lwsm process and then
automatically restarted. SMB clients were unable to connect to the affected node
until the SRV service restarted.
147473
Distributed Computing Environment (DCE) Remote Procedure Calls (RPCs) that were
sent to the cluster in big-endian byte order were not correctly handled. As a result,
clients with CPUs designed to format RPCs in big-endian byte order (including
PowerPC-based clients) were unable to communicate with the cluster. For
example, PowerPC-based clients running Mac OS 10.5 and earlier were unable to
connect to SMB shares. If a packet capture was gathered to diagnose this issue, an
nca_invalid_pres_context_id RPC reject status code appeared in the
packet capture.
147470
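The underlying byte-order problem can be sketched generically. This is an illustrative example, not the OneFS RPC code: the same four bytes decode to different integers depending on the assumed byte order, which is why a server that ignores the sender's declared endianness misreads big-endian requests.

```python
# Illustrative sketch (not the OneFS RPC code): identical wire bytes decode to
# different integers under different assumed byte orders.
import struct

payload = struct.pack(">I", 1)               # value 1, big-endian on the wire
as_big = struct.unpack(">I", payload)[0]     # 1 (correct interpretation)
as_little = struct.unpack("<I", payload)[0]  # 16777216 (misread)
print(as_big, as_little)
```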
Although path names that are up to 1024 bytes in length are supported in OneFS
7.2.0.x, if a user who was connected to the cluster from an SMB client attempted to
rename a file on the cluster in Windows Explorer, and if the full path to the renamed
file was greater than 255 bytes in length, the file was not renamed and the
following error appeared:
144100
The file name(s) would be too long for the destination folder.
You can shorten the file name and try again, or try a location
that has a shorter path.
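Because the limits above are stated in bytes rather than characters, a path built from multibyte characters can exceed a byte limit while its character count stays small. A minimal sketch, using an invented path:

```python
# Illustrative sketch: a path's byte length (not its character count) is what
# counts against a byte-based limit; multibyte characters inflate byte length.
name = "\u65e5" * 100             # 100 characters, each 3 bytes in UTF-8
path = "/ifs/data/" + name        # invented path for illustration

print(len(path))                  # 110 characters
print(len(path.encode("utf-8")))  # 310 bytes, well over a 255-byte limit
```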
If you ran the isi smb settings shares modify command with the
--revert-impersonate-user option to restore the --impersonate-user
option applied to a share to the default value, the command did not take effect
until the lwio process was restarted.
142066
After upgrading a cluster to OneFS 7.2.0.0 through OneFS 7.2.0.1, Linux and Mac
clients connecting to the cluster through SMB 1 were unable to view or list SMB
shares. If an affected Linux client attempted to list shares, the following error
appeared:
142060
NT_STATUS_INVALID_NETWORK_RESPONSE
As a result, SMB shares were not accessible to those Linux and Mac clients.
If an SMB2 client sent a compound request to the cluster, OneFS did not send the
correct response. As a result, the client was disconnected from the cluster.
141961
141943
If the SMB2Symlinks option was disabled on the cluster and a Windows client
navigated to a symbolic link that pointed to a directory, under some circumstances,
the system returned incorrect information about the symbolic link. If this occurred,
the symbolic link appeared to be a file, and the referenced directory could not be
opened.
141323
In addition, because OneFS 7.2.0.1 did not consistently check the OneFS registry to
verify whether the SMB2Symlinks option was disabled, in some cases, although
the SMB2Symlinks option was disabled, the lwio process attempted to handle
symbolic links when it should have allowed them to be processed by the OneFS file
system. If this occurred, the following error appeared on the client:
The symbolic link cannot be followed because its type is disabled.
138763
If both the antivirus Scan files when they are opened option and the
SMBPerformance Settings Oplocks option were enabled, and a file was opened,
modified, and closed multiple times through an application such as Microsoft
137822
In Microsoft Windows, if you ran the mklink command to create a symbolic link to
a file or directory in an SMB share on the cluster, the command failed and the lwio
process sometimes unexpectedly restarted, if the name of the symbolic link began
with a colon (:). In addition, the following error appeared on the console:
137820
137772
Because OneFS did not respond correctly to a specific Local Security Authority
(LSA) request made by Mac OS 10 clients running Mac OS 10.6 through 10.10, the
ACLs and POSIX owner applied to an affected share could not be viewed from Mac
OS 10 clients running those versions.
135560
146937
If a OneFS upgrade was performed while nodes were down, the SmartPools portion
of the upgrade failed without presenting an error or logging a CELOG event. If this
issue occurred, new nodes could not be added to the cluster and nodes that were
removed (for example, nodes that were smartfailed) could not be re-added to the
cluster.
If you encountered this issue, and you ran the following command, the disk pool
version listed was not correct for the version of OneFS to which the cluster was
upgraded:
139285
Note
The correct disk pool version for clusters running OneFS 7.2.0.x is version 8.
If a USB flash drive with a bootable image of OneFS was attached to a node while
the node was being smartfailed, the partition table on the flash drive became
damaged. As a result, the node could not boot from the flash drive after it was
smartfailed, and the image on the flash drive was unusable.
110337
Virtual plug-ins
ID
138741
/usr/lib/libisi_vasa_service.so:_Z39__ns5__query
AssociatedPortsForProcessorP4soapP38_ns4__query
AssociatedPortsForProcessorP46_ns4__query
AssociatedPortsForProce+0x102prodisi1-6(id6)
/boot/kernel.amd64/kernel:
/usr/lib/
libisi_vasa_service.so:_Z50soap_serve___ns5__query
AssociatedPortsForProcessorP4soap+0xf7prodisi1-6(id6)
/boot/kernel.amd64/kernel:
/usr/lib/libisi_vasa_service.so:_Z10soap_serveP4soap
+0x58prodisi1-6(id6)
/boot/kernel.amd64/kernel:
/usr/local/apache2/modules/libmod_gsoap.so:_init
+0x1b66prodisi1-6(id6)
/boot/kernel.amd64/kernel:
/usr/local/apache2/bin/httpd:ap_run_handler
+0x72prodisi1-6(id6)
/boot/kernel.amd64/kernel:
/usr/local/apache2/bin/httpd:ap_invoke_handler
+0x7eprodisi1-6(id6)
/boot/kernel.amd64/kernel:
/usr/local/apache2/bin/httpd:ap_process_request
+0x18eprodisi1-6(id6)
/boot/kernel.amd64/kernel:
/usr/local/apache2/bin/httpd:ap_process_http_connection
+0x13dprodisi1-6(id6)
/boot/kernel.amd64/kernel:
/usr/local/apache2/bin/httpd:ap_run_process_connection
+0x70prodisi1-6(id6)
/boot/kernel.amd64/kernel:
/usr/local/apache2/bin/httpd:worker_thread
+0x24bprodisi1-6(id6)
/boot/kernel.amd64/kernel:
/lib/libthr.so.3:_pthread_getprio+0x15d
ID
AVScan reports were deleted from the OneFS system 24 hours after the job
successfully completed because the end date for the reports was incorrectly set to
1970-01-01.
113563
Authentication
Authentication issues resolved in OneFS 7.2.0.1
ID
138750
to the site configuration information require a refresh of the lsass service, this
behavior caused authentication services to become slow or unresponsive.
On a cluster with multiple access zones configured that was upgraded from OneFS
7.0.x or earlier to OneFS 7.2.0.0, if you attempted to create a local user from the
command line interface or through the OneFS web administration interface in an
access zone other than the System access zone, an error similar to the following
appeared, and the user could not be added to the access zone:
135537
135182
134860
If you ran a recursive chmod command to add, remove, or modify an access control
entry (ACE) on a directory that contained files that were quarantined by an antivirus
scan, the command stopped running when it encountered a quarantined file. As a
result, ACEs were only modified on the files and directories that were processed
before the command stopped running.
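The corrected behavior can be illustrated with a hypothetical sketch (not the OneFS implementation): a recursive permission walk that records failures on inaccessible entries, such as quarantined files, and continues instead of stopping:

```python
import os

def chmod_recursive(root, mode):
    """Apply mode to root's tree; collect failures instead of
    aborting at the first inaccessible (for example, quarantined)
    entry, so the remaining files are still processed."""
    failures = []
    for dirpath, _dirnames, filenames in os.walk(root):
        targets = [dirpath] + [os.path.join(dirpath, f) for f in filenames]
        for path in targets:
            try:
                os.chmod(path, mode)
            except OSError as exc:
                failures.append((path, exc))  # remember, keep walking
    return failures
```

Returning the failure list lets the caller report which entries were skipped.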
134825
In the OneFS web administration interface, if you created a user mapping rule that
contained incorrect syntax related to the use of quotation marks, the following error
appeared when you attempted to save the updated Access Zone Details:
Your access zone edit was not saved
Error #1: Rules parsing failed at ' ': syntax error, unexpected
QUOTED, expecting BINARY_OP or UNARY_OP
ID
A SyncIQ job configured with the --disable_stf option set to true sometimes
failed when an sworker (a process responsible for transferring data during
replication) detected differences between files on the source and target clusters
and then attempted to access and update the linmap database. If a SyncIQ job
failed as a result of this issue, the following error appeared in the
isi_migrate.log file:
132579
If a Multiscan or Collect job was running, it was possible for the job to attempt to
update the snapshot tracking file (STF) for a snapshot at the same time that a write
was made to a file under that snapshot. If this occurred, and if the STF file
contained a large number of files (in the millions), it was possible for the Multiscan
or Collect job to fail to account for some blocks of data in the STF file, or to account
138403
for some blocks of data more than once. If this issue occurred, errors similar to the
following appeared in the /var/log/idi.log file:
Malformed block history: marking free block
or
Malformed block history: freeing free block
Note
In addition to the errors that were logged, a coalesced event appeared in the list of
new events on the Dashboard > Events > Summary page in the OneFS web
administration interface. The event ID, which can be found by clicking View
details in the Actions column, was 899990001, and the message was as follows:
File system problems detected
135187
The NDMP process ignored the protocol version setting in the config.xml file. As
a result, only NDMP version 4 messages were accepted and sent.
In environments with a large number of configured SyncIQ policies, the
isi_classic sync job report and isi_classic sync list
commands sometimes took several minutes to return a list of SyncIQ reports.
135183
134846
134845
If the paths added to the NDMP EXCLUDE or FILES environment variables exceeded
the maximum length allowed (1024 characters), the affected backup job failed
and an error similar to the following appeared in the ndmp_debug.log file:
ERRO:NDMP fnmmatching.c:413:isi_fnm_is_valid_pattern
Exclude pattern longer than 1024 limit
Note
The maximum length allowed is now handled by the Data Management Application
(DMA).
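As an illustration of the kind of validation a DMA can perform before submitting a job, the sketch below (hypothetical, assuming a comma-separated pattern list) separates patterns that exceed the 1024-character limit:

```python
MAX_PATTERN_LEN = 1024  # limit reported in the error above

def split_env_patterns(value, max_len=MAX_PATTERN_LEN):
    """Split a comma-separated EXCLUDE/FILES value into patterns that
    fit the limit and patterns that are too long to be valid."""
    valid, too_long = [], []
    for pattern in value.split(","):
        (too_long if len(pattern) > max_len else valid).append(pattern)
    return valid, too_long
```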
In rare circumstances, the isi_snapshot_d process failed due to an internal error
but the process would not exit. As a result, it was not possible to create new
scheduled snapshots or to recover previous versions of snapshot files created by
the scheduling system, and the following error message appeared in
the /var/log/isi_snapshot_d.log file, where [####] is the PID for the
isi_snapshot_d service:
134808
In environments with a large number of configured SyncIQ policies, the isi sync
job report and isi sync list commands sometimes took several minutes
to return a list of SyncIQ reports.
134429
SmartLock compliant files and directories that were backed up through an NDMP
file list back up could not be restored to a SmartLock domain. This issue occurred
because the selected files were not backed up in SmartLock compliance mode. If
this issue occurred, lines similar to the following appeared in the
ndmp_debug.log file:
134227
Cluster configuration
Cluster configuration issues resolved in OneFS 7.2.0.1
ID
Lwio subscriptions held by the isi_gconfig_d process were not always released in a
timely manner. As a result, the subscriptions sometimes accumulated. If a large
number of subscriptions accumulated, it sometimes took a long time to release
these resources back to the system and it was possible for the isi_gconfig_d
process to become unresponsive until the operation was complete. Because the
isi_gconfig_d process is responsible for maintaining SMB share configuration
information, if this issue occurred, SMB clients were prevented from viewing or
creating shares, and messages similar to the following appeared in
the /var/log/lwiod.log file:
139741
Command-line interface
Command-line interface issues resolved in OneFS 7.2.0.1
ID
If you ran the isi status -d -w command in an environment with long pool
names, the pool names broke across multiple lines in the output (as many lines as
were needed to fit into the table). Because the table was not widened to
accommodate the pool name, this caused issues with scripts that parse the table output.
134717
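The fix widens the column instead of wrapping. A minimal sketch of that approach (illustrative, not the isi status code):

```python
def format_table(headers, rows):
    """Render a fixed-width table whose columns are widened to the
    longest value, so long pool names stay on a single line and
    remain easy for scripts to parse."""
    widths = [max(len(str(v)) for v in col) for col in zip(headers, *rows)]
    def fmt(values):
        return "  ".join(str(v).ljust(w) for v, w in zip(values, widths))
    return "\n".join([fmt(headers)] + [fmt(r) for r in rows])
```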
ID
The safe.id.nvram onsite verification test (OVT) did not include support for the
version 2.1 MLC NVRAM card model. As a result, the safe.id.nvram test failed and
139905
If you edited or added a notification rule, the first six configurable events listed on
the Edit Notification Rule and Add Notification Rule pages were related to
CloudPools, a feature that was not available on the cluster.
136709
135006
134420
ID
Under rare circumstances, the FlexProtect and FlexProtectLin jobs left pointers to
blocks on a node or a drive that was no longer in the cluster. If a file was partially
truncated during a repair job (the job that is responsible for removing nodes or
drives), there was a narrow window where, if a further unlikely circumstance
occurred (such as a node reboot or a temporary network issue that affected back-end network connections between nodes), then some snapshot data might have
been left under-protected. A subsequent mark job (such as MultiScan or
IntegrityScan) would then log attempts to mark blocks owned by a snapshot of the
truncated file on the node or drive that was no longer on the cluster. As a result,
messages similar to the following appeared in the /var/log/idi.log
and /var/log/messages files, where <Node>,<Drive> identified the device that
was no longer in the cluster:
139723
File system
Filesystem problems
Note
This information is also available on the Dashboard > Events > Cluster Events
Summary page in the OneFS web administration interface. Contact EMC Isilon
Technical Support immediately if you see these messages on the console or in the
web administration interface.
136061
If protocol auditing was enabled and the NFS auditing service was running, the NFS
service failed to start. As a result, data access through NFS was limited. In addition,
the following NFS statuses appeared in the output after running the lwsm list |
grep nfs command:
flt_audit_nfs    [driver]    running
nfs              [driver]    stopped
onefs_nfs        [driver]    stopped
After adding a node to a large cluster that had L3 Cache enabled, some nodes in
the cluster might have unexpectedly rebooted.
136031
If there were millions of back end batch messages in a single batch initiator on a
node, the counter in the batch data structure sometimes reached the maximum
allowed value. If this occurred, the affected node could have rebooted
unexpectedly, causing clients connected to the node to be disconnected, and a
message similar to the following appeared in the /var/log/messages file:
135828
In the OneFS web administration interface, if you increased the size of an existing
iSCSI LUN, OneFS did not include the space already used by the LUN when
calculating how much space the LUN would occupy after the LUN was resized. As a
result, the web administration interface would display a Size exceeds
134851
134725
efs.ko:_sys_ifs_mark_file_data+0x14c
kernel:isi_syscall+0x53
kernel:syscall+0x1db
--------------------------------------------------
134217
The descriptions of several sysctl options incorrectly stated that the
assigned value represented the number of milliseconds that the scanner thread
would sleep, when the value actually represented the number of operating system
ticks that the thread would sleep. The descriptions of the following sysctl options
have been updated to reflect the correct information:
efs.bam.av.scan_on_open_timeout
efs.bam.av.scan_on_close_timeout
efs.bam.av.batch_scan_timeout
efs.bam.av.nfs_request_expiration
efs.bam.av.scanner_wait_time
efs.bam.av.nfs_worker_wait_time
efs.bam.av.av_opd_restart_sleep
Note
To view the description of a sysctl option, run the following command where
<option> is the option whose description you want to view:
sysctl -d <option>
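The distinction matters because a value intended as milliseconds must be converted to ticks using the kernel tick rate. A hypothetical helper (the hz value is configuration-dependent; this is not an OneFS API):

```python
def ms_to_ticks(milliseconds, hz=1000):
    """Convert a timeout in milliseconds to kernel ticks, where hz is
    the kernel tick rate; a value written as if it were milliseconds
    is only correct when hz happens to be 1000."""
    return max(1, (milliseconds * hz) // 1000)
```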
Hardware
ID
138521
X210 and X410 nodes that were configured to communicate through a 10 GigE
network interface card that was using the Broadcom NetXtreme Ethernet (BXE)
driver that was introduced in OneFS 7.2.0 might have restarted unexpectedly. If this
occurred, a message similar to the following appeared in the /var/log/
messages file:
Node panicked with Panic Msg: sleeping thread 0xffffff04692a0000
owns a nonsleepable lock
137173
configCheck command on an HD400 node (a node that uses a new part number
format with more than 11 digits), the part number could not be processed, and
errors similar to the following appeared on the console:
Unexpected exception:
<type 'exceptions.TypeError'>
If you attempted to install a drive support package (DSP) while the /ifs partition
was not mounted, the following lines appeared on the console:
136710
Note
Beginning in OneFS 7.2.0.1, if you attempt to install a DSP when the /ifs partition
is not mounted, the following error appears:
ERROR: Cannot check if DSP is installed. Please ensure /ifs is
mounted.
If you ran the isi firmware update command on an HD400 node and it
included updating the Chassis Management Controller (CMC) device firmware
along with other devices, the firmware update process might have failed. If the
process failed, errors similar to the following appeared on the console:
136039
HDFS
HDFS issues resolved in OneFS 7.2.0.1
ID
If a HAWQ client attempted to connect to HDFS over Kerberos, the connection and
authentication process failed and an error similar to the following was logged in
the /var/log/isi_hdfs_d.log file:
137967
135859
During read operations, an HDFS client sometimes closed its connection to the
server before reading the entire message received from the server. Although closing
connections in this manner did not cause any issues on the cluster, if this occurred,
the following message appeared multiple times in the isi_hdfs_d.log file:
Received bad DN READ ACK status: -1
If a user ran the hdfs dfs -ls command to view the contents of a directory on
the cluster, files to which the user did not have read access did not appear in the
output of the command.
135858
135644
Because OneFS did not properly handle requests from HDFS clients if the requests
contained fields that the OneFS implementation of HDFS did not support, affected
clients were unable to write data to the cluster. If this issue occurred, a
java.io.EOFException error similar to the following appeared on the client:
135568
135185
Under some circumstances, the isi_hdfs_d process handled the return value of a
system call incorrectly, causing the HDFS process to restart. If this occurred, HDFS
clients were disconnected from the affected node, and the following error appeared
in the isi_hdfs_d.log file:
FAILED ASSERTION pr >= 0
135184
During read operations, an HDFS client sometimes closed its connection to the
server before reading the entire message received from the server. Although closing
connections in this manner did not cause any issues on the cluster, if this occurred,
the following message appeared multiple times in the isi_hdfs_d.log file:
Received bad DN READ ACK status: -1
134863
If a Hadoop Distributed File System (HDFS) client attempted to perform a recursive
operation on a directory tree, a race condition sometimes occurred in the
isi_hdfs_d process which caused the process to restart unexpectedly. This race
condition was most frequently encountered while an HDFS client was recursively
deleting directories. If the isi_hdfs_d process unexpectedly restarted as a result of
this condition, HDFS clients connected to the affected node were disconnected and
messages similar to the following might have appeared in the /var/log/
isi_hdfs_d.log file:
isi_hdfs_d: RPC delete raised exception:
Permission denied from rpc_impl_delete
(/usr/src/isilon/bin/isi_hdfs_d/rpc_impl.c:484)
from _rpc2_delete_ap_2_0_2 (/usr/src/isilon/bin/isi_hdfs_d/
rpc_v2.c:811)
Job engine
Job engine issues resolved in OneFS 7.2.0.1
ID
If a cluster was experiencing heavy client traffic, OneFS might have significantly
limited the amount of cluster resources that job engine jobs were allowed to
consume, causing jobs to run very slowly.
136193
Migration

ID
135028
After performing an initial full migration from a VNX array to an Isilon cluster
through isi_vol_copy_vnx, if a hard link was deleted from the source VNX
array and a new file with the same name was then created on the source array, it
was possible for the data from the new file to be improperly copied to the hard link
on the target cluster. This issue occurred because the isi_vol_copy_vnx utility
copied data from the new file into the pre-existing hard link when it should have
deleted the hard link from the target cluster, and then created the new file on the
target cluster. If this occurred, the new file was not accessible on the target cluster.
If the isi_vol_copy utility was unable to resolve on-disk identities associated
with data being migrated to a OneFS cluster, the operation timed out. If the
operation timed out, the correct user and group information might not have been
applied to the migrated data, and valid users and groups might not have had
access to the data following the migration. In addition, messages similar to the
following appeared on the console and in the /var/log/messages file:
134715
134434
If you ran the isi_vol_copy utility to migrate files from a NetApp filer to an Isilon
cluster, and the ACL setting Deny permission to modify files with
DOS read-only attribute over both UNIX (NFS) and Windows
File Sharing (SMB) was enabled, incremental migrations might have failed to
transfer some files to which the DOS read-only attribute was applied. If this
occurred, errors similar to the following appeared in the isi_vol_copy.log file:
./dirX/fileY.txt: cannot create file: Operation not permitted
Networking
Networking issues resolved in OneFS 7.2.0.1
ID
The OneFS web administration interface allowed the same IP address range or
overlapping IP address ranges to be assigned to the int-a and int-b interfaces and
the InfiniBand internal failover network. If a cluster was configured with the same
or overlapping IP address ranges, nodes sometimes displayed unexpected
behavior or unexpectedly rebooted.
136888
Note
Beginning in OneFS 7.2.0.1, the IP ranges for the int-b interface and the InfiniBand
internal failover network cannot be configured until a valid Netmask has been
specified.
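A configuration check of this kind reduces to an interval-overlap test on the numeric values of the addresses. A minimal sketch using Python's standard ipaddress module (illustrative only):

```python
import ipaddress

def ranges_overlap(start_a, end_a, start_b, end_b):
    """Return True if two inclusive IPv4 address ranges overlap;
    rejecting overlapping ranges prevents the same addresses from
    being assigned to int-a, int-b, and the failover network."""
    a_lo = int(ipaddress.ip_address(start_a))
    a_hi = int(ipaddress.ip_address(end_a))
    b_lo = int(ipaddress.ip_address(start_b))
    b_hi = int(ipaddress.ip_address(end_b))
    return a_lo <= b_hi and b_lo <= a_hi
```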
The rate of data transfer to and from nodes that were configured with link
aggregation on their 10GbE network interfaces in combination with a maximum
transfer unit (MTU) of 1500 was sometimes slower than the rate of data transfer to
and from nodes that were not configured in this way.
136887
136704
If SmartConnect zone aliases were configured on a FlexNet pool, a memory leak that
could affect several processes related to the SyncIQ scheduler was sometimes
encountered. If this memory leak occurred, scheduled SyncIQ jobs did not move to
the running state, and lines similar to the following appeared in the
isi_migrate.log file:
isi_migrate[6923]: sched: siq_gc_conf_load: Failed to
gci_ctx_new: Could not allocate parser read buffer: Cannot
allocate memory
As a result, SyncIQ jobs in a scheduled state never moved to the running state.
If a new node was added to a cluster that was configured for dynamic IP allocation,
SmartConnect did not detect the configuration change and did not assign the new
node an IP address. As a result, clients could not connect to the affected node. If a
group change occurred after the new node was added, or if IP addresses were
manually rebalanced by running the isi networks --sc-rebalance-all
command, SmartConnect then detected the configuration change and assigned an
IP address to the new node.
136295
Because the driver for the 10 GbE interfaces on the A100 Accelerator nodes was
out-of-date, the interfaces sometimes unexpectedly stopped transferring data. If
you ran the ifconfig command to confirm the status of an affected interface, a
no carrier message appeared, even if a cable in good working order was
136293
135193
Server Failure
This affected the performance of applications running on the cluster that performed
large numbers of DNS lookups, such as mountd.
If an IPv4 SmartConnect zone was a subdomain of another SmartConnect zone (for
example, name.com and west.name.com), clients that sent a type AAAA (IPv6) DNS
request for the subdomain zone received an NXDOMAIN (nonexistent domain)
response from the server. This response could have been cached for both type A
(IPv4) and type AAAA requests. If this occurred, future DNS requests for the
subdomain zone (in this example, west.name.com) could also receive an
NXDOMAIN response, preventing access to that SmartConnect zone.
135173
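The underlying DNS rule is that a name that exists but has no records of the requested type must receive NOERROR with an empty answer (NODATA), not NXDOMAIN, because NXDOMAIN is negatively cached for every record type. A hypothetical sketch of that decision:

```python
def dns_response(records, qname, qtype):
    """records maps name -> {rtype: [values]}.  An existing name with
    no records of the requested type gets NOERROR and an empty answer
    (NODATA) rather than NXDOMAIN, so resolvers do not negatively
    cache the name for all record types."""
    if qname not in records:
        return "NXDOMAIN", []
    return "NOERROR", records[qname].get(qtype, [])
```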
134723
messages file, and the interfaces had to be manually removed from the pool.
NFS
NFS issues resolved in OneFS 7.2.0.1
ID
If all of the following factors were true, a user with appropriate POSIX permissions
was denied access to modify a file:
141210
The user was a member of a group that was granted read-write access to the
file through POSIX mode bit permissions, for example, -rwxrwxr-x (775).
Depending on how the file was accessed, errors similar to the following might have
appeared on the console:
Permission denied
or
Operation not permitted
For more information, see article 197292 on the EMC Online Support site.
If users were being authenticated through a Kerberos authentication mechanism,
NFS export mapping rules such as map-root and map-user were not being enforced
for those users. As a result, the file permissions check was not correct, and users
might have had incorrect allow or deny file access permissions.
139001
If the NFS server was unable to look up a user through the expected provider (for
example, if the LDAP provider was not accessible), the NFS server did not attempt
to look up the user in the local database, but instead mapped the user to the
nobody (anonymous) user account. As a result, some users were denied access to
resources that they should have had access to.
138784
Due to a memory leak, each time an NFS client registered or unregistered through
Network Lock Manager (NLM), some memory was allocated but never returned to
the system. Over time, this behavior could have caused a node to run out of
available memory, which would have caused the affected node to unexpectedly
reboot. If a node unexpectedly rebooted, clients connected to that node were
disconnected.
137261
If an NFS export that was hosting a virtual machine's (VM) file system over NFSv3
became unresponsive, the VM's file system became read-only.
136637
If the OneFS NFS server was restarted, it assigned client IDs to NFS clients
beginning with client ID 1. As a result, in environments with very few NFS clients, it
was possible for a client to be assigned the same client ID before and after the NFS
server was restarted. If this occurred, the NFS client did not begin the necessary
process to recover from the loss of connection to the NFS server, and the NFS client
became unresponsive.
136365
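One common way to avoid reusing client IDs across restarts is to combine a per-boot value with a counter. A hypothetical sketch (not the OneFS implementation):

```python
import itertools
import time

class ClientIdAllocator:
    """Allocate IDs that remain unique across server restarts by
    combining a per-boot epoch (here, the start time in seconds)
    with a per-connection counter, instead of restarting at 1."""
    def __init__(self, epoch=None):
        self.epoch = int(time.time()) if epoch is None else int(epoch)
        self._counter = itertools.count(1)

    def next_id(self):
        return (self.epoch << 32) | next(self._counter)
```

Because the epoch differs between boots, an ID handed out before a restart can never collide with one handed out after it.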
135780
If a network or network provider became unavailable, the LDAP provider might have
evaluated some error conditions incorrectly, causing inaccurate or empty netgroup
information to be cached and distributed to nodes in the cluster. If incorrect or
empty netgroup information was distributed, LDAP users could not be
authenticated and could not access the cluster.
If the isi_nfs4mgmt tool was called to manage clients on a node that had
thousands of NFSv4 clients connected, the NFS service unexpectedly restarted,
causing a brief interruption in service, and lines similar to the following appeared
in the /var/log/messages file:
135690
/usr/likewise/lib/lwio-driver/nfs.so:xdr_pointer+0x74
/usr/likewise/lib/lwio-driver/nfs.so:xdr_nfs4client+0x114
/usr/likewise/lib/lwio-driver/nfs.so:xdr_reference+0x42
/usr/likewise/lib/lwio-driver/nfs.so:xdr_pointer+0x74
/usr/likewise/lib/lwio-driver/nfs.so:xdr_nfs4client+0x114
/usr/likewise/lib/lwio-driver/nfs.so:xdr_reference+0x42
/usr/likewise/lib/lwio-driver/nfs.so:xdr_pointer+0x74
[ repeats many times ]
135528
While the NFS service was being shut down, it could have attempted to use memory
that was already freed. If this occurred, the NFS service restarted. Because the
service was being shut down, there was no impact to client services.
In environments with NFSv4 connections, the 30-second lease time setting for the
vfs.nfsrv.nfsv4.lockowner_nolock_expiry sysctl was not properly
applied by the OneFS NFS server if locks were held for a very brief duration. As a
result, the server prematurely timed out lock owners, causing the server to send an
NFS4ERR_BAD_STATEID error to the client. In some cases, affected NFS
135467
clients were temporarily prevented from accessing one or more files on the cluster.
135222
Because the NFS refresh time was in the range of 10 minutes per 1000 NFS exports,
if you had thousands of exports, there was a significant delay before changes and
additions became effective. This delay might have adversely affected NFS
workflows.
If you ran the isi nfs exports create command with the --force option
to force the command to ignore bad hostname errors, the command also ignored
export rule conflicts. As a result, it was possible to create two exports on the same
path with different rules. For example, you could create two exports of the /ifs/
data directory where export 1 was set to read-write permissions and export 2 was
set to read-only permissions. If an NFS client connected to the /ifs/data export,
either rule could have been applied, resulting in an inconsistent experience for the
client.
135217
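The corrected semantics scope the force flag to hostname resolution only. A hypothetical sketch (the error classes and host_resolves helper are illustrative, not OneFS APIs):

```python
class HostnameError(Exception):
    pass

class RuleConflictError(Exception):
    pass

def host_resolves(host):
    # hypothetical stand-in for a real DNS lookup
    return not host.startswith("bad-")

def create_export(path, clients, existing, force=False):
    """Create an export; force bypasses only unresolvable hostnames.
    A conflicting rule on the same path always fails, so two exports
    with different permissions cannot cover the same path."""
    for host in clients:
        if not host_resolves(host) and not force:
            raise HostnameError(host)
    if path in existing:
        raise RuleConflictError(path)  # never suppressed by force
    existing[path] = list(clients)
```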
135192
During the NFS export host check, although an IPv6 address (AAAA) was not
configured on the node, AAAA addresses were searched. As a result, during startup,
mountd would be very slow to load export configurations that referred to many
client hosts.
On systems with thousands of NFS exports, it might have taken several minutes to
list the exports with the isi nfs export list command.
135111
If you attempted to modify thousands of exports using the isi nfs export
modify command, the following error appeared:
135107
NFS clients to the affected node sometimes timed out and messages similar to the
following appeared in the /var/log/messages file:
lkfd_simple_waiter_backup_resp_cb: Unregister for client: 0x<lkfclient-id> failed with error: 16
If the cluster was handling many client requests from clients connected through
different protocols (for example, both SMB and NFS clients), contention for file system resources sometimes caused delays in client request processing. If the
processing of client requests was delayed, kernel resources might have been
reserved more quickly than they were released until all resources were eventually
consumed, and then the node restarted unexpectedly.
133963
OneFS API

ID
136526
Because the RESTful Access to the Namespace (RAN) API process was not case-sensitive,
if you queried for a directory or file name through the RAN API, it was
possible for the query to return the wrong file. For example, if the file system
contained a file named AbC.txt and a file named abc.txt, a query for AbC.txt
might have returned abc.txt instead.
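The fix is an exact, case-sensitive comparison. A trivial sketch of the corrected lookup behavior (illustrative only):

```python
def lookup(entries, name):
    """Case-sensitive, exact-match lookup: with both AbC.txt and
    abc.txt present, a query for one must never return the other."""
    for entry in entries:
        if entry == name:  # exact comparison, no case folding
            return entry
    return None
```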
If a user with an RBAC role was deleted from Active Directory and then the role that
the user belonged to was modified, an erroneous entry was added to the sudoers
file. As a result, if a user ran the sudo command, a syntax error similar to the
following appeared:
sudo:
sudo:
sudo:
sudo:
135186
134445
If a namespace API query used the max-depth query parameter to discover the
number of files and subdirectories in the /ifs/home directory, the query
sometimes returned only a portion of the contents of the directory. In other cases,
the query returned the entire contents of the directory. If either result was returned,
the object_d job unexpectedly restarted.
134416
ID
In the OneFS web administration interface, on the Cluster Diagnostics > Gather
Info page, if you clicked the Start Gather button to collect and send log files to
EMC Isilon Technical Support and the file upload failed, the Gather Status bar
indicated that the gather succeeded. However, no .tgz file was created and new
gathers could not be started.
134854
In the OneFS web administration interface, if the cluster time zone was changed,
the new date and time set on the cluster was sometimes incorrect. If the new date
and time set on the cluster was significantly different than the correct date and
time in the selected time zone, the difference could prevent the cluster from
properly communicating or synchronizing with external systems, such as Active
Directory domain controllers.
134426
SmartLock

ID
134422
In compliance mode, the compadmin role did not have read permissions for several
log files, including the isi_papi_d and isi_papi_d_audit log files. As a
result, the log files were not collected during the isi_gather_info process.
SmartQuotas
SmartQuotas issues resolved in OneFS 7.2.0.1
ID
If a default-user quota existed on a directory where the user did not have a linked
quota, and you modified the default-user quota to clear a threshold and then again
to set a threshold, the user quota domain was not created, and the following
message appeared if the isi quota quotas create command was run,
where <username> was the name of the specific user:
135225
Creating:
user:<username>@snaps=no@/ifs/data/ec_workareas FAIL
!! Failed to create domain 'user:<username>@snaps=no@/ifs/data/
ec_workareas': Failed to save
!! domain: Invalid argument
134213
133641
SMB
ID
142313
If a Windows client that was connected to the cluster through SMB copied a file
from the cluster, the timestamp metadata applied to the file might have become
invalid. This issue occurred because OneFS did not properly interpret the value
assigned to a file's timestamp metadata if the value was set to -1, which is a valid
value. Workflows that rely on timestamp metadata might have been negatively
affected by this issue.
Note
The SMB protocol specifies that, when file attributes are set, a value of -1 indicates
that the attribute in the corresponding field must not be changed.
For more information, see ETA 198187 on the EMC Online Support site.
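A set-attributes handler therefore has to treat the sentinel as "leave this field alone." A hypothetical sketch (in the SMB basic-information encoding, 0 also means "do not change"; the field names here are illustrative):

```python
NO_CHANGE = (0, -1, 0xFFFFFFFFFFFFFFFF)  # -1 as signed or unsigned 64-bit

def apply_basic_info(current, requested):
    """Merge a set-file-attributes request into the current
    timestamps, leaving any field whose requested value is a
    'do not change' sentinel untouched."""
    merged = dict(current)
    for field, value in requested.items():
        if value not in NO_CHANGE:
            merged[field] = value
    return merged
```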
139852
On a Microsoft Windows client, if you attempted to delete a file from an SMB share
and the letter case of the file path that you wanted to delete did not exactly match
the actual letter case of the share path, the file was not deleted, and, if lwio logging
was increased to the DEBUG level, the following messages appeared in
the /var/log/lwiod.log file:
Status: STATUS_OBJECT_NAME_NOT_FOUND
136889
NetBIOS requests sent over SMB 2 were not properly handled. As a result, the lwio
process unexpectedly restarted and lines similar to the following appeared in
the /var/log/messages file:
/lib/libc.so.7:thr_kill+0xc
/usr/likewise/lib/liblwbase_nothr.so.0:__LwRtlAssertFailed+0x5a
/usr/lib/libisi_ntoken.so.1+0x23d673:0x808490673
/usr/lib/libisi_ntoken.so.1+0x243b4e:0x808496b4e
/usr/lib/libisi_ntoken.so.1+0x2453c5:0x8084983c5
/usr/lib/libisi_ntoken.so.1+0x2e450f:0x80853750f
/usr/likewise/lib/liblwbase.so.0:EventThread+0x333
/usr/likewise/lib/liblwbase.so.0:LwRtlThreadRoutine+0xec
/lib/libthr.so.3:_pthread_getprio+0x15d
135468
If SRV logging was enabled in the OneFS registry, incoming SMB 1 requests caused
the lwio process to unexpectedly restart. If the lwio process restarted, SMB clients
connected to the cluster were disconnected.
134224
OneFS permitted the use of forward slashes in path names, and, in OneFS, forward
slashes within an SMB request were converted to backslashes. This behavior did
not comply with the SMB protocol, which specifies that such a request should fail
and return the following error:
OBJECT_NAME_INVALID
132952
Note
OneFS 7.2.0.1 and later versions of OneFS comply with the SMB protocol. If an SMB
request that contains a forward slash is received, an OBJECT_NAME_INVALID error
is returned.
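The corrected behavior can be sketched as a simple path check; the function and return values below are illustrative, not the OneFS implementation.

```python
# Sketch: failing SMB request paths that contain a forward slash
# instead of converting them to backslashes, per the note above.
OBJECT_NAME_INVALID = "OBJECT_NAME_INVALID"

def check_smb_path(path):
    """SMB paths use backslash separators; a forward slash anywhere in
    the requested path is invalid and the request must fail."""
    if "/" in path:
        return OBJECT_NAME_INVALID
    return "OK"
```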
Because OneFS sent an incorrect response to a NetBIOS session request, the
request to connect was closed and the NetBIOS client could not connect to the
cluster. If the session request was closed, lines similar to the following appeared in
the packet capture:
10.0.0.35 10.0.0.100
10.0.0.100 10.0.0.35
10.0.0.35 10.0.0.100
10.0.0.100 10.0.0.35
132574
Virtual plug-ins
Virtual plug-ins issues resolved in OneFS 7.2.0.1
ID
Due to an error that occurred when drive capacity was checked during the creation
of a new OneFS 7.2.0.0 cluster through the OneFS 7.2.0.0 simulator, after creating
a cluster on a system running Microsoft Windows or on a Microsoft Windows virtual
machine, the new cluster did not boot up, and the following messages appeared on
the console:
mount_efs:
mount_efs:
mount_efs:
IFS failed
133546
Antivirus
ID
If the ID of an antivirus scan report was more than 15 characters long, the OneFS
web administration interface and command-line interface would report the job as
running forever. Any threats detected by the scan would not be associated with the
correct policy.
125535
If you ran the isi avscan report purge command while an antivirus scan
was running, OneFS would sometimes delete the report of the antivirus scan that
was currently in progress.
125534
A syntax error in the .xml file from which AVScan reports are generated caused
reports accessed from the Data Protection > Antivirus > Reports page in the
OneFS web administration interface to not include threats that appeared on the
Data Protection > Antivirus > Detected Threats page.
125526
Authentication
Authentication issues resolved in OneFS 7.2.0.0
ID
Users assigned to the admin group were able to reuse a previously used password
immediately even if the Password History Length option was configured to
prevent the reuse of a specified number of previously used passwords.
130656
Users with assigned roles could not access the Cluster Management >
Diagnostics page because permission to access the Diagnostics page was
assigned only to the ISI_PRIV_SYS_SUPPORT privilege.
130342
OneFS now defaults to LDAP paged search if both paged search and Virtual List
View (VLV) are supported. If paged search is not supported and VLV is enabled on
the LDAP server, OneFS will use VLV when returning the results from a search.
130171
Note
In most cases, bind-dn and bind-password must be enabled in order to use VLV.
If a mapping rule contained a username with a space, mapping tokens would fail,
which prevented users from joining.
130024
Because the lsass process could not distinguish between different trust domains
that shared the same NetBIOS name, role-based authentication would fail when
clients that were connected to the cluster through SSH, CIFS, or the web
administration interface tried to access the identically named domains. As a result,
the identically named domains were inaccessible.
130003
If the dup() function (a function that duplicates a file descriptor) failed, no error was
returned to the lsass process. As a result, the lsass process attempted to pass a
nonexistent file descriptor to the lwio process. If this condition was encountered,
there was a potential for SMB clients to be temporarily prevented from
authenticating on the cluster.
128435
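The fix amounts to propagating the dup() failure to the caller rather than forwarding an invalid descriptor. A minimal sketch (illustrative wrapper, not the lsass code):

```python
import os

# Sketch: surface a dup() failure instead of passing along a
# nonexistent file descriptor, as in the resolved issue above.
def duplicate_fd(fd):
    """Duplicate fd; return (new_fd, None) on success or (None, error)
    on failure so the caller cannot forward an invalid descriptor."""
    try:
        return os.dup(fd), None
    except OSError as err:
        return None, err
```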
If you changed the machine name for the local provider (system zone) to include
periods or commas, errors similar to the following were logged in the /var/log/
messages file when an administrator attempted to create new users from the
command line:
123878
While the lwio process was in the process of shutting down, it sometimes
referenced a data structure that no longer existed. If this occurred, the following
lines were logged in the /var/log/messages file:
Stack: --------------------------------------------------
/lib/libthr.so.3:_pthread_mutex_lock+0x1d
/usr/likewise/lib/lwio-driver/onefs.so:OnefsAsyncTableGet+0x1f
/usr/likewise/lib/lwio-driver/onefs.so:OnefsAsyncUpcallCallback+0x58
/usr/lib/libisi_ecs.so.1:oplocks_event_dispatcher+0xb9
/usr/likewise/lib/lwio-driver/onefs.so:OnefsOplockChannelRead+0x8c
/usr/likewise/lib/liblwbase.so.0:EventThread+0x333
/usr/likewise/lib/liblwbase.so.0:LwRtlThreadRoutine+0xec
/lib/libthr.so.3:_pthread_getprio+0x15d
--------------------------------------------------
123397
When the RequireSecureConnection over LDAP setting was enabled, connection to
the LDAP server failed because the StartTLS command was not sent from the cluster
to the LDAP server.
114935
ID
If one or more objects, such as a file or directory, were moved out of the scope of a
SyncIQ policy's root path between two sequential snapshots, subsequent
ChangelistCreate jobs for those two snapshots failed and errors similar to the
following appeared in the isi_job_d log file:
Stack: --------------------------------------------------
kernel:isi_assert_halt+0x2e
kernel:btree_leaf_get_key_at_or_before+...
kernel:sbt_txn_get_entry_at+...
kernel:_sys_ifs_sbt_get_entry_at+0x2a7
kernel:isi_syscall+0x7f
kernel:syscall+0x325
--------------------------------------------------
*** FAILED ASSERTION pct.num < pct.den @ /build/mnt/src/sys/ifs/btree/btree_leaf.c:2418
Stack: --------------------------------------------------
kernel:isi_assert_halt+0x2e
kernel:btree_leaf_get_key_at+0x15c
kernel:sbt_txn_get_entry_at+0x287
kernel:_sys_ifs_sbt_get_entry_at+0x2a7
kernel:isi_syscall+0x7f
kernel:syscall+0x325
--------------------------------------------------
*** FAILED ASSERTION pct.num < pct.den @ /build/mnt/src/sys/ifs/btree/btree_inner.c:2532
133809, 133504
A SyncIQ job sometimes failed while handling a file with a hard link if the hard link
referred to a file that no longer existed.
131302
If you created a SmartLock directory, set the retention date to "forever," and then
attempted to restore the directory through NDMP, the NDMP job failed and the
131138
131001
130756
CLI timeout exceeded while waiting for the server to respond; the
request still may have completed.
A SyncIQ job configured with the --disable_stf option set to true sometimes
failed when an sworker (a process responsible for transferring data during
replication) detected differences between files on the source and target clusters
and then attempted to access and update the linmap database.
If a SyncIQ job failed as a result of this issue, the following error appeared in the
isi_migrate.log file:
A work item has been restarted too many times. This is usually
caused by a network failure or a persistent worker crash.
130340
If a SyncIQ policy designated a target directory that was nested within the SyncIQ
target directory of a preexisting policy, an error occurred during SyncIQ protection
domain creation, which caused the SyncIQ policy's protection domain to be
incomplete. If this occurred, the following message appeared in the /var/log/
isi_migrate.log file:
In addition, if you ran the isi domain list-lw command, the Type field for the
affected SyncIQ target was marked Incomplete.
130337
SyncIQ requests from the OneFS command-line interface and web administration
interface repeatedly opened and closed the reports.db SQLite database. As a result,
changes made in the web administration interface would not take effect and
commands run from the command-line interface might not return results and
eventually failed.
130000
If a large number of replication policies existed on the cluster, the isi sync
policies list command might time out before the command completed.
129999
The pthread-cancel process would sometimes fail without releasing the resources it
contained. As a result, other processes stopped indefinitely.
129997
If the number of file system event snapshots exceeded the amount of space
allocated by the OnefsEnumerateSnapshots buffer, the lwio process restarted on
various nodes, causing clients to be disconnected and then reconnected to the
cluster.
125571
125536
Replication reports created for the first run of a replication policy sometimes
contained inaccurate values. All other replication reports were accurate.
122906
Due to a memory leak in the isi_papi_d process, the process would sometimes stop
responding. As a result, SyncIQ policies would not be listed in the OneFS web
administration interface or after running the isi sync policy list command
from the command-line interface.
120509
Cluster configuration
Cluster configuration issues resolved in OneFS 7.2.0.0
ID
If an unprovisioned drive was physically removed from a node without first being
smartfailed and the isi_drive_d process was subsequently restarted (either
manually or automatically), OneFS attempted to reprovision the removed drive,
preventing new drives and nodes from being provisioned. As a result, new drives
and nodes could not be added to the cluster.
132913
Default SmartPools jobs incorrectly scanned configuration information for all files
on a cluster. As a result, SmartPools jobs progressed for days, but did not
complete.
132309
131035
Some events were configured with invalid variable bindings (the association
between a variable name and its value). As a result, SNMP alerts were not sent for
these events.
130621
If a Windows 8.1 client or a Windows Server 2012 R2 SMB2 client requested file
system volume and attribute information from the cluster and the maximum
response length requested by the client was too small to hold the entire response,
the affected node would return a STATUS_BUFFER_TOO_SMALL response
130589
130155
130128
The flt_audit lwio filter driver would fail to audit SMB traffic on files with non-ASCII
characters in their names. As a result, these files were not audited, and Failed
130011
129143
Access to files with non-ASCII characters in their names was not audited. A client
could access and modify such a file without problem, but the action would cause
an error in the audit filter driver and the following message would display in the
lwiod.log file:
ERROR:flt_audit:0x805c02560:SyncGetFileName():audit_info_util.cpp:563: Failed to allocate memory for a path: UNKNOWN'
127508
If you ran the isi statistics heat command with the --events option, the
output was not filtered correctly.
125549
Incorrect SNMP traps were sent for some alerts. The alerts were sent with the
correct alert level but indicated that the wrong threshold had been exceeded. For
example, when a high temperature threshold was exceeded, a critical SNMP trap
was sent; however, the trap stated that the low temperature threshold was
exceeded.
125541
SNMP traps were not sent if two or more SNMP recipients were defined in an event
notification rule.
125537
File system
ID
Whenever the asynchronous delete operation (an operation that deletes files in
the background while the user can run other OneFS operations) finished before all
the data was deleted, the synchronous delete path reverted a file back to
asynchronous delete. As a result, the asynchronous delete operation became stuck
in an endless loop, and multiple nodes attempted to delete the file at the same
time. This resulted in performance issues for the user.
132921
A race condition could be encountered if Network Lock Manager (NLM) received a
would_block lock request from an NFS client just before a group change began. If
the race condition was encountered, a node could have been prevented from
leaving the group and the node that prevented the group change could become
unavailable. If a node became unavailable, client connections to the affected node
timed out or were unresponsive.
131724
If a node was leaving the cluster at the same time that the node received a lock
request from an NFS client, a lock failover (LKF) waiter might be created. If this
occurred, the affected node was prevented from leaving the cluster and would
unexpectedly restart.
131198
It was possible to configure the overcommit limit below the low and high
overcommit thresholds (which is an invalid configuration). If the overcommit
thresholds were configured incorrectly, nodes sometimes ran out of memory and
unexpectedly rebooted.
130363
Note
The overcommit thresholds are set through the following sysctl settings:
l vfs.nfsrv.rpc.request_space_overcommit
l vfs.nfsrv.rpc.request_space_high
l vfs.nfsrv.rpc.request_space_low
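A configuration check that rejects the invalid ordering described above might look like the following sketch. The validation rule (low <= high <= overcommit) is an assumption inferred from the description, not documented OneFS logic.

```python
# Sketch: reject overcommit settings in which the limit falls below
# the low and high thresholds. Keys mirror the sysctl names in the
# Note; the ordering rule itself is an assumption.
def validate_overcommit(settings):
    low = settings["vfs.nfsrv.rpc.request_space_low"]
    high = settings["vfs.nfsrv.rpc.request_space_high"]
    limit = settings["vfs.nfsrv.rpc.request_space_overcommit"]
    return low <= high <= limit
```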
129562
128750
If the filename related to a change notify event was not a valid UTF-8 string, an
assertion error would sometimes occur, resulting in the lwio process restarting.
128077
If the event count for change events exceeded the 32-bit counter limit, multiple
nodes might reboot unexpectedly and lines similar to the following would appear in
the kernel stack trace:
kernel:isi_assert_halt+0x42
efs.ko:bam_event_synchronize+0x7f5
efs.ko:ifs_vnop_wrapunlocked_write_mbuf+0x612
kernel:recvfile+0x6e7
kernel:isi_syscall+0x64
kernel:syscall+0x26e
124720
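The failure mode is a plain 32-bit wraparound. The following sketch emulates it with a mask (Python integers are unbounded); it illustrates the overflow only and is not OneFS code.

```python
# Sketch: a 32-bit event counter wraps to zero once it passes
# 0xFFFFFFFF, the overflow condition behind the issue above.
UINT32_MAX = 0xFFFFFFFF

def next_event_id(counter):
    """Increment a counter the way a fixed-width 32-bit integer would,
    wrapping at 2**32."""
    return (counter + 1) & UINT32_MAX
```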
If a customer-issued API call was used to look up the lock status of any given file in
the cluster at the exact same time that a node was being taken off the cluster, and
that node was the one tasked with the API verification, the entire cluster might have
become unable to serve any NFS or SMB connections for about five minutes.
122759
File transfer
File transfer issues resolved in OneFS 7.2.0.0
ID
If an HTTP client sent a request to the cluster through the Apache WebUI service,
the following message appeared repeatedly in the /var/log/apache2/
webui_httpd_error.log file:
Requested service "WK" doesn't match authenticated session services.
130648
If the httpd process was handling a large number of client connections, the process
sometimes unexpectedly restarted while accepting a connection from an HTTP
client. If the process restarted, HTTP clients connected to the affected node were
disconnected from the cluster.
127190
Hardware
Hardware issues resolved in OneFS 7.2.0.0
ID
In some cases, when an X410 or S210 node was configured for the first time,
during the initial boot-up process the node did not boot completely, and the
following error messages appeared on the console:
131674
For more information about this issue, see KB 190590 on EMC Online Support.
If a battery that supplies power to the mt25208 NVRAM failed, the LED on that
battery remained green instead of turning red, even though the CELOG alert
correctly indicated the battery's failure. This issue affected S200, X200, X400, and
NL400 nodes.
130683
If the PCIe connection between the motherboard and the NVRAM/IB card was
disrupted, the affected node stopped responding. If the unresponsive node was
subsequently powered down, the NVRAM/IB card failed to set Fast Self Refresh
(FSR). If FSR was not set when the node was powered down, the NVRAM journal was
not preserved and the following message appeared on reboot:
Could not recover journal. Contact Isilon Customer Support
Immediately.
129914
Installing a power supply unit (PSU) firmware update on a common form factor (CFF)
PSU in a node with only one working PSU caused the node to shut down. This
occurred because the working PSU was rebooted as part of the PSU firmware
update process.
129810
Attempting to smartfail a self-encrypting drive (SED) would sometimes fail, even
after the drive had been manually removed and successfully replaced.
129326
If the QLogic 10 Gig Ethernet card experienced a timeout from a request initiated by
the Direct Memory Access Engine (DMAE), lines similar to the following appeared in
the /var/log/messages file:
Stack: --------------------------------------------------
if_bxe.ko:bxe_write_dmae+0xd0
if_bxe.ko:bxe_write_dmae_phys_len+0x78
if_bxe.ko:ecore_init_block+0x122
if_bxe.ko:bxe_init_hw_common+0x7d5
if_bxe.ko:bxe_init_hw_common_chip+0x18
if_bxe.ko:ecore_func_hw_init+0xd7
if_bxe.ko:ecore_func_state_change+0x10c
if_bxe.ko:bxe_init_hw+0x41
if_bxe.ko:bxe_nic_load+0x726
if_bxe.ko:bxe_init_locked+0x18c
if_bxe.ko:bxe_handle_chip_tq+0x86
kernel:taskqueue_run_locked+0x9a
kernel:taskqueue_thread_loop+0x48
kernel:fork_exit+0x7f
--------------------------------------------------
129252
129012
If you added an unsupported boot drive to a node, a CELOG alert was properly
generated, but a traceback blocked the addition of the entry to the baseboard
management controller's (BMC) system events log. As a result, there was no report
of the unsupported drive in the events log.
128984
When installing the drive support package (DSP) that contained firmware for
upcoming hardware models, the installer reported errors similar to the following. In
addition, although installation was successful, an error message indicating that the
installation had failed appeared on the console and in the isi_dsptool.log
file:
- ERROR: Found 2 error messages in isi_dsptool logfile
- ERROR Gconfig parse warnings for file /dsp_staging/config/models/HGST_HUS726060ALA640.gc; dropping unrecognized entries
- ERROR Gconfig parse warnings for file /ifs/.ifsvar/modules/hardware/drives/config/models/HGST_HUS726060ALA640.gc.2; dropping unrecognized entries
DSP Install Failed
127958
Note
The CTO upgrade process did not complete on clusters in compliance mode.
126895
If you added a node to a cluster that was configured to synchronize its time with an
external NTP server, the cluster would sometimes synchronize its time with the
node that was added. As a result, the cluster time might have been so different
from the time on the NTP server that the cluster would not automatically correct
itself.
126652
Unprovisioned nodes could not be added to a manual node pool. If the nodes in
your cluster were assigned to one or more manual node pools and you then added
one or two nodes to the cluster, OneFS could not provision the new nodes into a
node pool, and the resulting unprovisioned nodes could not be added to the
manual node pools.
126363
A100 nodes reported warning-level sensor errors when a power supply was
removed or failed, rather than reporting a critical-level redundant power supply
failure.
126321
If you removed a power cable from the power supply unit (PSU) on an A100 node,
the isi_hw_status command incorrectly displayed the following output:
Power Supplies OK
126240
After a drive-down operation completed, nodes would sometimes panic with the
following error message:
126239
If you shut down a node with a failed boot drive, the node would sometimes stop
responding during the shut down process because the journal for the node could
not be saved to the failed boot drive.
126219
If the InfiniBand card in a node ran out of memory, the affected node might have
been disconnected from the cluster.
124325
HDFS
ID
Due to an issue in the Hadoop Distributed File System (HDFS) code, HDFS-1497, the
sequence numbers assigned to HDFS data packets were not always consecutive. If
OneFS received an HDFS data packet with a sequence number that was not
consecutive, the affected HDFS client connection was closed.
127983
Job engine
Job engine issues resolved in OneFS 7.2.0.0
ID
The isi_job_d process might fail, causing jobs to briefly pause and then resume.
136028
Data reliability issues could occur after the job engine ran Collect or MultiScan
jobs.
132695,
132696,
132697,
132698
Job engine logging always ran at trace level, a level used to gather detailed
information about job engine processes. As a result, job engine performance was
adversely affected and the job engine log file, isi_job_d.log, was
unnecessarily flooded with messages.
132895
If the cluster was being monitored by InsightIQ, the FSA job could fail with the
following error message:
130999
If you ran the FlexProtect job while the impact of the job was set to high, nodes that
contained SSD drives would sometimes panic.
129445
When smartfailing a drive with very little data on it, a FlexProtect or FlexProtectLin
job could pause in phase 2 for as long as two hours, causing the job to be
cancelled by the system. Until a FlexProtect or FlexProtectLin job successfully
completed, no other jobs could run. In addition, the cluster could fall below the
configured protection level.
129349
126675
After creating a custom job policy with isi job policy create, the values for
the job impact policies were incorrectly set and jobs could not be run. Errors similar
to the following appeared:
Parse warnings from defaults:
Multiple errors:
Repeated disk record:
old={ivar:impact.policies {token:0, version:1, flags:---I---} = (read: write:)}
new={ivar:impact.policies {token:0, version:1, flags:---I---} = (read: write:)}
Repeated disk record:
old={ivar:impact.policies {token:0, version:1, flags:---I---} = (read: write:)}
new={ivar:impact.policies {token:0, version:1, flags:---I---} = (read:HIGH write:HIGH)}
125544
If you upgraded your cluster from OneFS 6.5.5, you could not modify job impact
policies.
125543
Migration
ID
The isi_vol_copy_vnx utility did not properly handle new files that were added to
new directories between incremental copies. As a result, incremental copies failed.
131728
Networking
ID
Some changes to VLAN tagging pools, such as adding a Network Interface Card
(NIC) or rebalancing dynamic IPs, caused the SmartConnect process to stop
responding to DNS queries until the cluster was rebooted or until the isi_dnsiq_d
service was restarted.
132022
Due to issues in the failover code path in the Sockets Direct Protocol (SDP), failover
to the backup InfiniBand (IB) fabric could fail. If failover was unsuccessful, the
Isilon cluster was unavailable until the IB switches were rebooted. The potential for
encountering these issues was limited, but the potential increased in proportion to
the number of nodes in the cluster.
131544
Under some conditions, the flx_conf.xml file could not be accessed
immediately after a group change occurred on a cluster. If this issue was
encountered, the SmartConnect process, isi_dnsiq_d, unexpectedly restarted on
one or more nodes and the following lines appeared in the /var/log/messages
file of the affected nodes:
/usr/lib/libisi_flexnet.so.1:flx_config_get_kevent+0x13
/usr/sbin/isi_dnsiq_d:update_flx_config_kevent+0x25
/usr/sbin/isi_dnsiq_d:realloc_ips+0xf9
/usr/sbin/isi_dnsiq_d:main+0xbf0
/usr/sbin/isi_dnsiq_d:_start+0x8c
130702
If you ran the isi networks support sc_put command to manually assign
a dynamic IP address to an interface on a specific node, the command failed and a
FAILED ASSERTION message similar to the following appeared:
130652
If the SmartConnect service IP address was the only IP address assigned to the
external network interface of a node, Flexnet would not populate the subnet
gateway. As a result, the affected node did not respond to DNS queries.
130642
If a static route was assigned to a storage pool (by running the isi networks
modify pool command with the --add-static-routes option), Flexnet
checked each node in the pool for UP interfaces to which the route could be
assigned. If Flexnet did not detect any UP interfaces, the following informational
message was sometimes repeatedly logged in the isi_flexnet_d.log file:
130343
Note
This message will now be logged only if the Flexnet logging level is set to debug or
higher.
SmartConnect did not pause for 10 seconds between rebalance operations and
thus rebalanced IP addresses more frequently than necessary.
130318
If you added a static route and incorrectly set the gateway, the affected node
sometimes became unresponsive and the OneFS software watchdog rebooted the
node.
130263
Note
An unreachable IP address
After an IB switch was rebooted, the FlexNet process running on each node
updated the flx_config.xml file, causing the SmartConnect process to lock the
file. As a result, SmartConnect would fail to respond to new DNS requests for up to
two minutes on large clusters.
125193
If an administrator added a static route that would send traffic across different
interface types using the IP address of a node as the route destination, the affected
node rebooted unexpectedly.
88072
NFS
NFS issues resolved in OneFS 7.2.0.0
ID
The isi nfs exports list command sometimes timed out, preventing users
from viewing or configuring NFS exports from the command-line Interface or the
OneFS web administration interface.
130270
If an inherit_only Access Control Entry (ACE) was applied to the owner of a file, and
the Access Control List (ACL) was modified, the inherit_only ACE was mapped to the
NFSv4 entry OWNER@. If the OWNER@ entry was subsequently remapped, the entry
was re-mapped to creator_owner rather than the original owner of the file, which
could prevent the original owner from accessing the file.
130253
If either of the following conditions existed, the lockd process would stop
responding, preventing some NFS clients from accessing files on the cluster
because they could not be granted file locks:
l A lock was granted to an NFS client while the client was unregistering from the
LKF system.
l The isi_classic nfs client rm command was run while there were
several lock waiters on the NFS client.
129900
If a cluster received NLM requests that included the AUTH_NONE credential, OneFS
would return a locking error instead of the correct error message.
125482
During the OneFS boot process, a race condition prevented some sysctl parameters
that were required for NFS Kerberos authentication from being read. This issue
caused Kerberos authentication to be unavailable to NFS clients; if this issue
occurred, messages similar to the following were logged in the nfs.log file:
Kerberos not available: gss_acquire_cred: Key table entry not
found;
125479
If an NFS client sent lock requests with security type AUTH_NONE, the client
received an incorrect error message that did not indicate the reason for failure.
123567
If the cluster was configured with the overcommit limit below the low and high
settings (which is an invalid configuration), nodes could run out of memory and
unexpectedly reboot.
116133
ID
130720
On the Cluster Management > Access Management > LDAP page in the OneFS
web administration interface, if the length of the Bind to value exceeded the width
of the page, the corresponding edit link was not available.
130336
The OneFS web administration interface was not accessible to clients using
Microsoft Internet Explorer 8, 9, or 10 in compatibility view. In addition, if a client
attempted to access the web administration interface using Internet Explorer in
compatibility view, the IE console displayed the following error:
119315
You could not set a netmask of 0.0.0.0 through the OneFS web administration
interface.
96604
SmartLock
ID
If a file was committed to a WORM directory through the RESTful Namespace API,
the file permissions were altered and, as a result, the file was accessible to
everyone.
130319
On clusters running in compliance mode, the compadmin user did not have access
to core files that were created when system processes stopped running. This
prevented the compadmin user from analyzing the cause of a failure if a system
process unexpectedly restarted. This also prevented the compadmin user from
deleting the files.
130284
If a cluster was running in SmartLock compliance mode, you could not renew the
SSL certificate of the Isilon web administration interface.
128443
The CTO upgrade process did not complete on clusters in compliance mode.
118428
SmartQuotas
ID
When a soft quota was modified, if the --soft-grace option was modified but
the --soft-threshold option was not modified, the command-line interface
ignored the configuration change.
130640
SMB
ID
Because OneFS relied on a function that could handle only file descriptors with a
maximum value of 1024, the lsass process unexpectedly restarted when it
attempted to process file descriptors assigned a value higher than 1024. As a
result, SMB users could not be authenticated for the few seconds it took for the
process to restart.
132043
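A 1024-descriptor ceiling is characteristic of select()-style interfaces, whose fd_set is sized by FD_SETSIZE. The sketch below models that generic limit; it illustrates the failure mode only and is not the lsass internals.

```python
# Sketch: select()-family calls can only track descriptors numbered
# below a fixed ceiling (typically 1024); higher-numbered descriptors
# require poll()- or kqueue-style interfaces instead.
FD_SETSIZE = 1024  # typical select() descriptor ceiling

def fd_in_select_range(fd):
    """Return True if fd can be tracked by a select()-style call."""
    return 0 <= fd < FD_SETSIZE
```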
While the lwio process was handling a symbolic link (a file that acts as a reference
to another file or directory) a memory allocation issue could occur in the lwio
process. If this issue was encountered, the lwio process unexpectedly restarted
and SMB clients that were connected to the affected node were disconnected.
131751
While executing a zero-copy system call, the lwio process could attempt to access
memory that was previously released to the system (also known as freed memory).
If the lwio process attempted to access freed memory, the lwio process
unexpectedly restarted and SMB clients that were connected to the affected node
were disconnected.
131748
The lwio process sometimes attempted to read data from a socket connection that
was not ready to be read from. If this occurred, the lwio process unexpectedly
restarted and the following ASSERTION FAILED message appeared in the
lwiod.log file:
[lwio] ASSERTION FAILED: Expression = (pConnection->readerState.pRequestPacket->bufferUsed <= (maxHeader+sizeof(NETBIOS_HEADER)))
131745
Under some circumstances, the lwio process reported the length of a file name in
bytes when a different value type was expected. As a result, the lwio process
attempted to access memory that wasn't allocated to it, causing the lwio process to
crash. If the lwio process crashed, SMB clients that were connected to the affected
node were disconnected.
131711
If an SMB2 client experienced connection issues at the same time that it attempted
to place a lease on a file, a race condition could occur that resulted in the client
being disconnected from the cluster.
131681
Under rare circumstances, if a subprocess of the lwio process opened a new file
handle on an existing lease at the same time that another subprocess was breaking
the lease, the lwio process unexpectedly restarted. If the lwio process restarted,
SMB clients that were connected to the affected node were disconnected.
131586
When upgrading to OneFS 7.1.1.0, if any share names contained an invalid
character (for example, a bracket, colon, asterisk, or question mark), or if a share
path did not start with /ifs, the SMB configuration could not be upgraded. In
addition, no SMB shares would be visible after the cluster was upgraded and SMB
clients could not connect to the cluster until the invalid shares were removed and
the SMB configuration was successfully upgraded.
131364
Note
In OneFS 6.0 and earlier, an SMB share name could contain invalid characters and
shares could be created outside of the /ifs directory (an invalid share
configuration). On an upgrade to OneFS 6.5 through 7.0, an SMB share
configuration that contained shares with an invalid character or share paths that
did not start with /ifs could be successfully upgraded; however, the invalid
shares were inaccessible. Although the shares were inaccessible in OneFS 6.5 and
later, the existence of these shares could adversely affect upgrades to OneFS
7.1.1.0.
Under some circumstances, the lwio process reported the length of a file name in
bytes when a different value type was expected. As a result, the lwio process
attempted to access memory that wasn't allocated to it, causing the lwio process to
crash. If the lwio process crashed, SMB clients that were connected to the affected
node were disconnected.
130641
While a network socket was being closed, contention between process threads
could cause data structures referencing the socket to be prematurely freed. If the
freed structures were then accessed by another thread, the lwio process
unexpectedly restarted and SMB clients connected to the affected node were
disconnected.
130353
In environments with more than 12,000 SMB shares, the isi_webui_d process
sometimes ran out of memory and stopped running. If the isi_webui_d process
stopped running, the OneFS web administration interface was unavailable until the
process restarted and existing connections to the web administration interface
were unresponsive or were disconnected.
130267
Note
12,000 SMB shares exceeds the maximum number of shares supported by OneFS.
A work item could be scheduled in SRV and then freed before it could run. As a
result, crashes could occur.
130132
Because some SMB2 functions used an incorrect value type to manage SMB
message sequence numbers, SMB sometimes incorrectly returned a
STATUS_INVALID_PARAMETER error in response to SMB2 client requests.
130130
130032
If a user requested access to a file to which they had write access and the file was
located in a share to which they had Read-Only access, the user might be
incorrectly denied access to the file if the create disposition of the request was
FILE_OPEN_IF.
130030
Note
The create disposition of the file specifies the action the system will take in
response to a request to access a file, based on whether the requested file does or
does not already exist.
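The note's description of create dispositions can be sketched as a small decision table. The following is an illustrative model using the standard SMB2 disposition names; it is not OneFS source code:

```python
# Illustrative model of SMB2 create dispositions (names from the SMB2
# protocol specification); a sketch, not the OneFS implementation.
FILE_OPEN, FILE_CREATE, FILE_OPEN_IF = 1, 2, 3

def resolve_create(disposition, file_exists):
    """Return the action a server takes for a create request."""
    if disposition == FILE_OPEN:
        return "open" if file_exists else "fail: not found"
    if disposition == FILE_CREATE:
        return "fail: exists" if file_exists else "create"
    if disposition == FILE_OPEN_IF:
        # Open the file if it exists; otherwise create it. Opening an
        # existing file should be governed by the file's own access
        # rights, which is why a read-only share should not have
        # blocked it (the behavior corrected by this fix).
        return "open" if file_exists else "create"
    raise ValueError("unsupported disposition")
```

In the resolved issue, a FILE_OPEN_IF request against an existing file in a read-only share was incorrectly treated like a create, which requires write access to the share.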
SMB2 message IDs larger than 65,536 were incorrectly wrapped to zero, which
caused Active Directory domain controller connections to be reset.
130021
130010
Clients connected to the cluster through SMB were disconnected if the lwiod
process crashed. When the process crashed, the following lines were logged in the
stack trace:
130001
If auditing was enabled and a directory was accessed, the isDirectory flag was
sometimes incorrectly set to false. As a result, the audit log incorrectly indicated
that the item accessed was a file rather than a directory.
129455
A race condition could occur in which an SMB1 session setup request and the Tree
Connect process simultaneously tried to access the in-memory security context
object. As a result, the lwio process would stop, and existing connections to the
node would close.
128076
The domain controller (DC) connection would reset when the MessageID wrapped to
zero at 65,536, although it should have continued incrementing through the full
64-bit MessageID range.
127778
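The faulty and corrected counter handling can be illustrated with a short sketch. This is hypothetical code, not the lwio source: SMB2 MessageIDs are 64-bit values, so masking the counter to 16 bits resets it to zero far too early.

```python
# Hypothetical illustration of the MessageID wrap bug: the buggy path
# effectively truncated the counter to 16 bits, so it wrapped to 0 at
# 65,536; the fixed path keeps the full 64-bit range.
def next_message_id_buggy(mid):
    return (mid + 1) & 0xFFFF                  # wraps at 65,536

def next_message_id_fixed(mid):
    return (mid + 1) & 0xFFFFFFFFFFFFFFFF      # full 64-bit counter

print(next_message_id_buggy(0xFFFF))   # 0 -- the value that reset the DC connection
print(next_message_id_fixed(0xFFFF))   # 65536 -- keeps incrementing
```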
127010
If there was a mismatch between share names stored in memory and share
names stored in the registry, an assert would sometimes occur and lwio might
restart unexpectedly with a signal 6 error.
127005
126496
Through MMC, you could create a share for a path that did not exist. If you did
this, you could view the share through the OneFS web administration interface.
However, you could not access the share, because the path did not exist.
125888
When the number of file system event snapshots exceeded the amount of space
allocated by the OnefsEnumerateSnapshots buffer, the lwio process restarted on
various nodes, causing clients to be disconnected and then reconnected to the
cluster.
125570
If OneFS received an SMB request that contained a file path, OneFS would convert
any forward slashes (/) to backslashes (\) before processing the request. This was
contrary to SMB standards, which specify that requests containing file paths that
include forward slashes return a STATUS_OBJECT_NAME_INVALID error.
125566
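The corrected behavior can be sketched as follows. This is an illustrative check with a hypothetical helper name; the OneFS implementation is not shown here:

```python
# Illustrative sketch of SMB path validation (hypothetical helper, not
# OneFS code). SMB path components are separated by backslashes; a
# forward slash is invalid in the pathname, so the request is rejected
# rather than the slash being silently rewritten.
STATUS_SUCCESS = 0x00000000
STATUS_OBJECT_NAME_INVALID = 0xC0000033

def check_smb_path(path: str) -> int:
    if "/" in path:
        return STATUS_OBJECT_NAME_INVALID
    return STATUS_SUCCESS
```

For example, a request naming `dir/file.txt` is now answered with STATUS_OBJECT_NAME_INVALID instead of being rewritten to `dir\file.txt`.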
If a user had permission to access a shared directory, but the user was not granted
access to the parent directory that contained the shared directory, the user could
not rename files or folders contained in the shared directory.
125036
Clients connected to the cluster over SMB were disconnected when the lwio
process crashed. When the process crashed, the following lines were logged in
the /var/log/messages file:
/lib/libthr.so.3:pthread_rwlock_init+0x117
/usr/likewise/lib/lwio-driver/srv.so:SrvConnection2SetInvalidEx+0x22
/boot/kernel.amd64/kernel:
/usr/likewise/lib/lwio-driver/srv.so:SrvProtocolTransport1DriverSendDone+0x6e
/usr/likewise/lib/lwio-driver/srv.so:SrvSocketProcessTaskWrite+0x2dc
/usr/likewise/lib/lwio-driver/srv.so:SrvSocketProcessTask+0x3d0
/usr/likewise/lib/liblwbase.so.0:EventThread+0x333
/usr/likewise/lib/liblwbase.so.0:LwRtlThreadRoutine+0xec
/lib/libthr.so.3:_pthread_getprio+0x15d
124981
If a client sent an oplock or lease break acknowledgment for an oplock or lease
that was never requested, a crash would occur with the following stack trace:
/boot/kernel.amd64/kernel: /lib/libc.so.7:thr_kill+0xc
/boot/kernel.amd64/kernel: /usr/likewise/lib/liblwbase_nothr.so.0:__LwRtlAssertFailed+0x5a
/boot/kernel.amd64/kernel: /usr/likewise/lib/lwio-driver/srv.so:SrvPrepareOplockStateAsync_SMB_V2+0x57
/boot/kernel.amd64/kernel: /usr/likewise/lib/lwio-driver/srv.so:SrvOplockBeginPolling+0x36
/boot/kernel.amd64/kernel: /usr/likewise/lib/lwio-driver/srv.so:SrvExecContextContinue2+0x1c7
/boot/kernel.amd64/kernel: /usr/likewise/lib/lwio-driver/srv.so:SrvProtocolExecute2+0xdf
/boot/kernel.amd64/kernel: /usr/likewise/lib/lwio-driver/srv.so:SrvExecuteCreateAsyncCB_SMB_V2+0x6a
/boot/kernel.amd64/kernel: /usr/likewise/lib/libiomgr.so.0:IopIrpCompleteInternal+0x324
/boot/kernel.amd64/kernel: /usr/likewise/lib/libiomgr.so.0:IoFmIrpDispatchContinue+0x8c4
/boot/kernel.amd64/kernel: /usr/likewise/lib/libiomgr.so.0:IoIrpComplete+0x33
/boot/kernel.amd64/kernel: /usr/likewise/lib/lwio-driver/onefs.so:OnefsCompleteIrpContext+0xa9
/boot/kernel.amd64/kernel: /usr/likewise/lib/lwio-driver/onefs.so:OnefsProcessIrpContext+0x18b
/boot/kernel.amd64/kernel: /usr/likewise/lib/liblwbase.so.0:CompatWorkItem+0x16
/boot/kernel.amd64/kernel: /usr/likewise/lib/liblwbase.so.0:WorkThread+0x256
/boot/kernel.amd64/kernel: /usr/likewise/lib/liblwbase.so.0:LwRtlThreadRoutine+0xec
/boot/kernel.amd64/kernel: /lib/libthr.so.3:_pthread_getprio+0x15d
123747
If OneFS received a request from an SMB client whose Kerberos service ticket could
not be decrypted, OneFS returned a STATUS_LOGON_FAILURE response to
the SMB client that sent the request. If this response was sent, the affected SMB
client might have experienced issues accessing files or applications that were
stored on the cluster.
114524
Note
In OneFS 7.2.0.0 and later, if OneFS receives a request from an SMB client whose
Kerberos service ticket cannot be decrypted, a
STATUS_MORE_PROCESSING_REQUIRED response is returned. This
response prompts the affected SMB client to search for a secondary cluster. If the
search for a secondary cluster fails, the affected SMB client could still experience
issues accessing files or applications on the cluster.
ID
If all of the following conditions were met and you upgraded to OneFS 7.0 or
later, the SMB configuration was not successfully upgraded and one or more
services were sometimes disrupted following the upgrade:
130266
The file path to one or more SMB shares on the cluster contained a multibyte
character.
The upgrade process might detect events that do not appear in the OneFS web
administration interface or the output of the isi events list command.
Because these events are older than 30 days, they are not displayed by default.
125551
During a OneFS upgrade, the crontab file was not updated with data from the
crontab.smbtime file. As a result, crontab overrides that were configured before
the upgrade were not applied after the cluster was upgraded.
125550
Virtual plug-ins
Virtual plug-ins issues resolved in OneFS 7.2.0.0
ID
Due to a capacity checking error, if you created a new cluster through the OneFS
Simulator on Windows, Windows VMs, or VMware Fusion workstations, the cluster
failed to mount /ifs and the following error message appeared:
133546
CHAPTER 6
Isilon ETAs and ESAs related to this release
The following section provides a list of EMC Technical Advisories (ETAs) and EMC
Security Advisories (ESAs) that describe issues that affect the latest 7.2.0 release
or previous 7.2.0 releases.
For the most up-to-date list of Isilon ETAs and ESAs, see the Notifications section of the
Isilon Uptime Info Hub on the EMC Isilon Community Network site. You can also subscribe
to receive ETAs and ESAs related to OneFS via email by visiting the EMC Isilon OneFS
product page on the EMC Isilon Support site and clicking the Manage Advisory
Subscriptions link under Advisories.
Functional area                  ETA     Description  Status                     ID
Authentication                   199379               Resolved in OneFS 7.2.0.2  147221
Backup, recovery, and snapshots  203815                                          154269
File system                      202452                                          158417
Hardware                         198924                                          142946
NFS                              197460               Resolved in OneFS 7.2.0.1  141210
NFS                              204898               Resolved in OneFS 7.2.0.4  149743
NFS                              205085               Resolved in OneFS 7.2.0.3  153737
Networking                       200096                                          152083, 148695
SMB                              198187                                          142313
ESA           Description  Status                     ID
ESA-2015-154               Resolved in OneFS 7.2.0.4  154655
ESA-2015-114                                          136994
ESA-2015-112               Resolved in OneFS 7.2.0.2  140931
ESA-2015-093               Resolved in OneFS 7.2.0.2  137884
ESA-2014-146                                          143337
ESA-2015-015               Resolved in OneFS 7.2.0.1  137904
ESA-2015-038                                          134760
CHAPTER 7
OneFS patches included in this release
The following section provides a list of patches that address issues that are now fixed in
OneFS. If you previously installed one or more of the listed patches, and you upgrade to a
release that includes the fix for the issue the patch addressed, you do not need to
reinstall those patches after you upgrade.
After upgrading, see Current Isilon OneFS Patches on the EMC Online Support site to find
out if any new patches were released that might apply to the version of OneFS you
upgraded to.
Description
Patch ID
Authentication
Functionality change:
Users that attempt to connect to the cluster over SSH,
through the OneFS API, or through a serial cable, can no
longer be authenticated on clusters running in compliance
mode if any of the following identifiers are assigned to the
user as either the user's primary ID or as a supplemental ID:
patch-156748
UID: 0
SID: S-1-22-1-0
HDFS
patch-159065
HDFS
NFS
patch-158509
NFS
patch-156230
SMB
patch-154603
Description
Patch ID
Events, alerts,
and cluster
monitoring
patch-153659
Job engine
patch-156835
Functional
area
Description
Patch ID
NFS
patch-151610
SMB
Description
Authentication
Patch ID
Functional area
Description
Patch ID
NFS
patch-142630
SMB
Functional area
Description
Patch ID
SMB
Security
patch-139164
Description
Patch ID
Networking
patch-138767
Functional area
Description
NFS
Patch ID
or
Permission denied.
SMB
patch-142418
CHAPTER 8
Known issues
Unless otherwise noted, the following issues are known to affect OneFS 7.2.0.0 through
OneFS 7.2.0.4.
Antivirus
Antivirus known issues
ID
In the OneFS web administration interface, on the Antivirus Policies page, if you
double-click the Start link for a policy, multiple instances of the AVScan job start.
54477
Authentication
ID
If there are files or directories on the cluster with ACLs that include SIDs for
which no corresponding UID is found (for example, ACLs that include SIDs for users
that were deleted from an Active Directory, also known as orphaned SIDs), OneFS
queries external authentication providers in an attempt to map the SID to an
authoritative UID.
If the attempt to map an orphaned SID to an authoritative UID fails, OneFS
continues to query external authentication providers for the missing UID, and, in
environments where a large number of orphaned SIDs exist, the volume of queries
sent to the providers might adversely affect the performance of the external
authentication provider. If this occurs, users might be prevented from being
authenticated to the cluster.
158867, 158243
Note
In some cases, if the security mode of the SMB file sharing service is unchanged
from the default configuration in OneFS 6.5.5.x, and if another SMB share setting
(for example, Change Notify) is also changed from the default setting in OneFS
6.5.5, then, during an upgrade to OneFS 7.1.1.x, the Impersonate guest security
parameter is changed from Always to Never. If this issue occurs, following the
upgrade, SMB clients might not be able to access shares on the cluster until the
Impersonate Guest value for the share is manually set to Always.
154826
The OneFS SMB server might fail to respond to NetrWkstaGetInfo remote procedure
calls at info level 102 that a client (typically, an embedded system in a printer or
scanner) might make when establishing a connection to the cluster. This could
cause the client to fail to establish a connection to the cluster.
134324
The lwio process cannot rename files to a name longer than 255 bytes.
134304
Incorrect sequence numbers during SMB2 traffic could cause the lwsmd process to
fail, resulting in a temporary loss of SMB service.
134247
The lsass process might fail while running the NtlmGetDomainNameFromResponse
function due to an incorrectly formed request during NTLM authentication, resulting
in a temporary loss of authentication service.
134239
The lsass process might fail while running the NtlmValidateResponse function due
to an incorrectly formed request during NTLM authentication, resulting in a
temporary loss of authentication service.
134238
The lsass process might fail while running the AuthenticateNTLMv2 function due to
an incorrectly formed request during NTLM authentication, resulting in a temporary
loss of authentication service.
134237
131835
When you run the isi auth local users modify command with the
--password-never-expires option on one of the default services accounts, you
receive an Invalid Parameter error. For example, the following command attempts
to set the password to never expire for the insightiq user account:
isi auth local users modify --name=insightiq --password-never-expires
For more information, see article 89092 on the EMC Online Support site.
83444
ID
139186
the actual configured expiration time. This issue occurs because the expiration
time is interpreted differently by the web interface, the command-line interface,
and the isi_papi_d process.
Workaround: Use the following commands to configure snapshot schedule
expiration times from the command-line interface. Values set by running these
commands are not interpreted by the isi_papi_d process and are, therefore, not
affected by this issue:
isi_classic snapshot schedule create
isi_classic snapshot schedule modify
For more information about the preceding commands, run the following
commands:
isi_classic snapshot schedule create -h
isi_classic snapshot schedule modify -h
The isi_migr_sched process might fail when it is not possible to run replication
jobs, such as while a node is shutting down.
135744
Restoring an especially large amount of data (more than 50 TB) might fail due to a
memory allocation error with the following error message:
133591
Parallel NDMP restores might fail while the cluster is under heavy load.
130693
If you set the read-only DOS attribute to deny modification of files over both UNIX
(NFS) and Windows File Sharing (SMB) on a target directory of a replication policy,
the associated replication jobs will fail.
127652
125897
rm -rf /ifs/.ifsvar/modules/ndmp/sessions/*
This will remove all stale files while retaining current sessions.
The following message might appear in the /var/log/messages file:
124767
isi_migrate[98488]: coord[cert2-long123-d0b]:
Problem reading from socket of (null):
Connection reset by peer
Workaround: Ignore the error message. This is a transient error that OneFS will
recover from automatically.
Backing up large sparse files takes a very long time because OneFS must build
sparse maps for the files, and OneFS cannot back up data while building a map.
OneFS might run out of memory while backing up a sparse file with a large number
of sparse regions.
124216
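A sparse map records which byte ranges of a file actually hold data, so that holes need not be read or transferred; each data region costs one map entry, which is why a file with a very large number of sparse regions is expensive to map. The following toy model illustrates the idea and is not the OneFS NDMP implementation:

```python
# Toy model of a sparse map (illustrative only, not OneFS code).
# chunks: sequence of booleans, True where a fixed-size chunk holds data.
# Returns (offset, length) ranges covering only the data regions; a
# backup can then read just these ranges and skip the holes.
def build_sparse_map(chunks, chunk_size):
    ranges = []
    start = None
    for i, has_data in enumerate(chunks):
        if has_data and start is None:
            start = i * chunk_size          # data region begins
        elif not has_data and start is not None:
            ranges.append((start, i * chunk_size - start))
            start = None                    # hole begins
    if start is not None:                   # data region runs to EOF
        ranges.append((start, len(chunks) * chunk_size - start))
    return ranges
```

For example, a file laid out as data-hole-data-data in 10-byte chunks maps to two entries, (0, 10) and (20, 20); a file alternating data and holes every chunk produces one entry per data chunk, which is the memory-growth pattern described above.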
File list backups are not supported with dir/node file history format.
113999
The SyncIQ scheduler service applies UTF-8 encoding even if the cluster is set with
a different encoding. As a result, DomainMark and SnapRevert jobs, which apply
cluster encoding, might fail to run.
99383
If you revert a snapshot that contains a SmartLock directory, the operation might
fail and leave the directory partially reverted.
99211
When SyncIQ and SmartQuotas domains overlap, a SyncIQ job might fail with one
of the following errors:
97492
unable to delete
failed to move
unable to rename
For more information, see article 88602 on the EMC Online Support site.
If you are using the CommVault Simpana data management application (DMA), you
cannot browse the backup if the data set has file names with non-ASCII characters.
As a result, you cannot select single files to restore. Full restoration of the dataset
is unaffected.
For more information, see article 88714 on the EMC Online Support site.
96545
If you use SyncIQ to synchronize data and some data is freed on the source cluster
because a file on the source decreased in size, the data is not freed on the target
cluster when the file is synchronized. As a result, the space consumed on the target
cluster might be greater than the space consumed on the source.
94614
SyncIQ allows a maximum of five jobs to run at a time. If a SnapRevert job starts
while five SyncIQ jobs are running, the SnapRevert job might appear to stop
responding instead of pausing until the SyncIQ job queue can accept the new job.
93061
After performing a successful NDMP backup that contains a large number of files
(in the tens of millions), when you restore that backup using Symantec NetBackup,
the operation fails and you receive the following error message:
87092
For more information, see article 88740 on the EMC Online Support site.
Cluster configuration
Cluster configuration known issues
ID
If a user is assigned only the ISI_PRIV_AUDIT privilege, the user can view the
controls to delete file pool policies on the File System > Storage Pools > File
Pool Policies page.
134378
Note
Although the ISI_PRIV_AUDIT privilege does not allow a user to delete file pool
policies, a user who is assigned the ISI_PRIV_AUDIT privilege can view the controls
to delete file pool policies on the File System > Storage Pools > File Pool
Policies page.
The isi_cpool_io_d process might fail while attempting to close a file, generating
"bad file descriptor" errors in the log. This occurs because a stale descriptor is
left for the cache header.
132397
The command-line wizard requires a default gateway to set up a cluster. You may
not have a default gateway if your network uses a local DNS server.
Workaround: Enter 0.0.0.0 for your default gateway.
24621
Command-line interface
Command-line interface known issues
ID
If you run an isi command with the --help option to get more information about
the command, the text that is displayed might provide information about the
related isi_classic command instead of providing information about the
command that you typed. For example, if you run the isi storagepools
command with the --help option, the following information appears:
129637
The isi version osreldate command returns a random number rather than
the expected OneFS release date.
98452
Diagnostic tools
Diagnostic tools known issues
ID
On the Gather Info page in the OneFS web administration interface, the Gather
Status progress bar indicates that the Gather Info process is complete while the
process is still running.
103906
ID
If an NFS request specifies an inode rather than a file name, and more than one
hard link to the specified inode exists, OneFS auditing will be unable to determine
which hard link was intended by the NFS client. If this happens, OneFS auditing
might select the incorrect hard link, which can cause client permissions to be
misrepresented in audit logs.
136038
The isi_papi_d process might fail when InsightIQ begins monitoring a cluster that
contains 80 or more nodes.
135767
The isi_stats_hist_d process might fail when the cluster is under heavy load, with
the following lines in the stack trace:
135641
/lib/libc.so.7:thr_kill+0xc
/lib/libc.so.7:__assert+0x35
/usr/sbin/isi_stats_hist_d:_ZN15stats_hist_ring4initEitb+0x506
/usr/sbin/isi_stats_hist_d:_ZN10ring_cache3getEiiiiii+0x228
/usr/sbin/isi_stats_hist_d:_ZN11db_mgr_impl5queryER20stats_timeseries_setP10stats_impltRK11query_timesRK14stats_hist_pol+0x33d
/usr/sbin/isi_stats_hist_d:_ZN16database_manager5queryER20stats_timeseries_setP10stats_impltRK11query_timesRK14stats_hist_pol+0x28
/usr/sbin/isi_stats_hist_d:_ZN20ecd_query_timeseries8query_dbEP10stats_impltRK11query_timesRK14stats_hist_pol+0x3d
/usr/sbin/isi_stats_hist_d:_ZN20ecd_query_timeseries12proc_commandEl+0x56c
/usr/sbin/isi_stats_hist_d:main+0xbcd
/usr/sbin/isi_stats_hist_d:_start+0x8c
The isi_celog_coalescer process fails when the garbage collector reaches across
multiple threads and connections and attempts to clear out objects that it deems
unreferenced.
132398
The SNMP daemon might restart after a drive is smartfailed and then replaced.
129711
If you have auditing with NFS enabled on your cluster, the NFS service might restart
unexpectedly. If this occurs, lines similar to the following appear in
the /var/log/messages file:
129098
Stack:
--------------------------------------------------
/usr/lib/libstdc++.so.6:_ZNSs6assignERKSs+0x1e
/usr/lib/libisi_flt_audit.so.1:_init+0x3b60
/usr/lib/libisi_flt_audit.so.1:_init+0x4092
/usr/likewise/lib/libiomgr.so.0:IopFmIrpStateDispatchPostopExec+0x16a
/usr/likewise/lib/libiomgr.so.0:IoFmIrpDispatchContinue+0x74d
/usr/likewise/lib/libiomgr.so.0:IopIrpDispatch+0x317
/usr/likewise/lib/libiomgr.so.0:IopRenameFile+0x117
/usr/likewise/lib/libiomgr.so.0:IoRenameFile+0x22
/usr/lib/libisi_uktr.so.1+0x167873:0x8082f2873
/usr/lib/libisi_uktr.so.1+0x194a17:0x80831fa17
/usr/lib/libisi_uktr.so.1+0x18fc90:0x80831ac90
/usr/lib/libisi_uktr.so.1+0x169b2c:0x8082f4b2c
/usr/likewise/lib/liblwbase.so.0:SparkMain+0xb7
--------------------------------------------------
113689,
112774
Workaround: Run the isi events quiet all command on the master node.
If the email address list for an event notification rule is modified from the
command line, the existing list of email addresses is overwritten by the new email
addresses.
For more information, see article 88736 on the EMC Online Support site.
89086
Although SNMP requests can reference multiple object IDs, the OneFS subtree
responds only to the first object ID.
81183
If you have a large number of LUNs active, the event processor might issue a
warning about open file descriptors held by the iSCSI daemon.
You can safely ignore this warning.
79341
On the Cluster Overview page of the OneFS web administration interface, clicking
the ID of a node that requires attention, as indicated by a yellow Status icon, does
not provide details about the status.
Workaround: In the list of events, sort the nodes by the Scope column or by the
Severity column, and then click View details.
Alternatively, run the isi events list --nodes <id> command to view the
events.
For more information, see article 16497 on the EMC Online Support site.
77470
If you run the isi status command, the value displayed for the sum of per-node
throughput might differ from the value displayed for the sum of cluster throughput.
This occurs because some data is briefly cached. The issue is temporary.
Workaround: Re-run the isi status command.
For more information, see article 88690 on the EMC Online Support site.
73554
Reconfiguring aggregate interfaces can leave active events for inactive interfaces.
Workaround: Cancel the events manually.
72200
Event system databases that store historical events might fail to upgrade correctly.
If the databases fail to upgrade, they are replaced by an empty database with a
new format and historical events are lost.
71840
71399
Monitoring with SNMP, InsightIQ, or the isi statistics command can fail
when a cluster is heavily loaded.
68559
While a cluster processes a heavy I/O load, graphs in the OneFS web
administration interface might display the following message:
62736
55247
You might receive an alert that a temporary license is expired even though a
permanent license is installed.
24504
File system
File system known issues
ID
If you create or open an Alternate Data Stream (ADS) with the Permission to
Delete option enabled at open time, a memory resource leak on the virtual file
system can result. This might degrade overall cluster performance.
153312
If a dedupe job is running on a file that is also in the process of being deleted, the
workers for the job can be delayed long enough to generate a hangdump file. The
dedupe job will continue afterwards. If this issue is encountered, messages similar
to the following appear in the /var/log/messages file:
141028
A node might fail to shut down or reboot if the shutdown process is unable to stop
the lwsm process in less than 2 minutes. If this issue occurs, the following error
appears in the /var/log/messages file:
140822
If you encounter this issue, wait 5 minutes and then try to reboot the node by
running the reboot command. If the node fails to reboot, contact EMC Isilon
Technical Support for assistance.
The lwio process might fail while a node is being shut down.
135869
The lwio process might fail while the cluster is under heavy load, causing clients to
become disconnected. If this occurs, the following lines appear in the logs:
134343
/lib/libc.so.7:thr_kill+0xc
/usr/likewise/lib/liblwiocommon.so.0:LwIoAssertionFailed+0xb6
/usr/likewise/lib/libiomgr.so.0:IopFmIrpStateDispatchFsdCleanupDone+0x26
/usr/likewise/lib/libiomgr.so.0:IoFmIrpDispatchContinue+0x36c
/usr/lib/libisi_cpool_rdr.so:_Z16cprdr_pre_createP21_IO_FLT_CALLBACK_DATAP23_IO_FLT_RELATED_OBJECTSPPvPPFvS0_S3_ES4_+0x646
/usr/lib/libisi_cpool_rdr.so:_Z19process_pre_op_itemP13_LW_WORK_ITEMPv+0x54
/usr/likewise/lib/liblwbase.so.0:WorkThread+0x256
/usr/likewise/lib/liblwbase.so.0:LwRtlThreadRoutine+0xee
/lib/libthr.so.3:_pthread_getprio+0x15d
During the upgrade process, an MCP error might prevent the last node on a cluster
from upgrading and corrupt the /etc/mcp/mlist.xml file.
Workaround: Delete the /etc/mcp/mlist.xml file and restart MCP. MCP will
autogenerate a new mlist.xml.
133115
When processing a restart request, MCP service configuration scripts that call isi
services might result in a recursive service stop request, and this second
request might cause MCP to simultaneously stop a service while starting
another that depends upon it. This will result in unnecessary service restarts.
Workaround: Manually stop the processes in the reverse order of their dependency.
131924
If a node crashes on a three-node cluster and it is not re-added to the cluster, and
then you add a node, one of the remaining nodes might unexpectedly reboot. You
might need to wait for a significant amount of time before you can add the node to
the cluster successfully.
Workaround: Add the node to the cluster while no writes are being made to the
cluster. This will prevent the issue from occurring.
124603
LDAP user and group ownership cannot be configured in the OneFS web
administration interface.
Workaround: Use the command-line interface to configure LDAP user and group
ownership.
103983
An Alternate Data Stream (ADS) block-accounting error might cause the Inode
Format Manager (IFM) module to fail, causing the following message to be logged
to the stack trace:
100118
kernel:isi_assert+0xde
kernel:isi_assert_mayhalt+0x70
efs.ko:ifm_compute_new_ads_summary+0x9a
efs.ko:ifm_update_ads_summary+0x15b
efs.ko:ifm_end_operation+0x11ad
efs.ko:txn_i_end_all_inode_ops+0x11d
efs.ko:txn_i_end_operations+0x5e
efs.ko:txn_i_end+0x3d
efs.ko:bam_remove+0x198
efs.ko:ifs_vnop_wrapremove+0x1bf
kernel:VOP_REMOVE_APV+0x33
kernel:kern_unlinkat+0x2a6
kernel:isi_syscall+0x49
kernel:syscall+0x26e
Workaround: Ignore this error message. This is a transient error that OneFS will
recover from automatically.
Nodes without subpools appear in the per-node storage statistics, but are not in
the cluster totals because you cannot write data to unprovisioned nodes.
86328
The OneFS web administration interface does not prevent multiple rolling upgrades
from being started simultaneously. If multiple rolling upgrades are running
simultaneously, the upgrades fail.
84376
74272
available. See Best Practices Guide for Maintaining Enough Free Space on Isilon
Clusters and Pools on the EMC Online Support site.
When you attempt to create a hard link to a file in a WORM (Write Once Read Many)
directory, the following incorrect error message displays:
73790
When FlexProtect is run with verify upgrade check enabled and one or more drives
are down, OneFS occasionally reports false data corruption. If this issue occurs,
contact EMC Isilon Technical Support.
73276
18901
File transfer
ID
129599
By default, the Very Secure FTP Daemon (vsftpd) service supports clear-text
authentication, which is a possible security risk.
127738
Note
For more information about this issue, see the Protocols section of the OneFS 7.2
Security Configuration Guide.
In the OneFS web administration interface, on the Diagnostics > Settings page, if
you enter an invalid address in the HTTP host or FTP host field, Connection
70448
Hardware
Hardware known issues
ID
Note
136915
If the power supply fan in an HD400 node fails, the power supply indicator light
turns yellow, but no alert is sent. If this condition is not addressed, the power
supply will eventually fail and an alert will be sent for the power supply failure.
Contact EMC Isilon Technical Support if you encounter this issue.
135814
If a node encounters a journal error during an initial boot, OneFS allows the user to
continue booting the node through the following text:
135354
If the node is booted in this state, and then joined to a cluster, it will remain in a
down state and might affect cluster quorum.
Workaround: Do not continue booting the node. Contact Isilon Technical Support.
If an SED SSD drive is set to SED_ERROR, and the drive is formatted while L3
cache is enabled on the cluster, the drive will be formatted for storage and will
report a status of HEALTHY.
Workaround: SmartFail the SED SSD that has been formatted for storage and then
format the drive again.
133696
The isi firmware update command might incorrectly report that a firmware
update has failed because OneFS requires nodes to be rebooted after a firmware
update, but the command runs shutdown -p instead.
133606
The isi firmware update command might incorrectly report that a firmware
update has failed on a remote node.
133317
Node firmware updates will fail if HPM downloads return error code D5 during the
upgrade process.
Workaround: Retry updating the node firmware. If this issue persists, contact EMC
Isilon Technical Support.
132523
123303
/usr/bin/isi_ipmicmc -c -a cmc
An internal sensor that monitors components might not correctly detect the source
of a hardware component failure, such as the I2C bus. If this occurs, the wrong
alert or no alert might be generated.
73050
Nodes with invalid system configuration numbers are split from the cluster after
joining.
Workaround: Use smartfail to remove the node from the cluster. Contact Isilon
Technical Support to apply a valid system configuration number to the node and
then add the node to the cluster again.
71354
A newly created cluster might not be visible to unconfigured nodes for up to three
minutes. As a result, nodes will fail to join the cluster during that time period.
69503
67932
There are multiple issues with shutting down a node incorrectly that can potentially
lead to data loss.
35144
Workaround: Follow the documented instructions for shutting down nodes exactly.
For more information, see article 16529 on the EMC Online Support site.
HDFS
HDFS known issues
DataNode connections can potentially experience a memory leak in the data path.
Over time, this can result in an unexpected restart of the HDFS server. As a result,
clients connected to that node are disconnected.
Workaround: No action is necessary. The HDFS server automatically becomes
operational again within a few seconds.
158083
If the Hadoop datanode services are left running on Hadoop clients that are
connected to a cluster, the isi_hdfs_d process will continuously log the following
message to /var/log/messages and /var/log/isi_hdfs_d.log as it
receives the requests:
135993
org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol
from verify_ipc_protocol
(/build/mnt/src/isilon/bin/isi_hdfs_d/protoutil.c:18)
from parse_connection_context
(/build/mnt/src/isilon/bin/isi_hdfs_d/protoutil.c:100)
from ver2_2_parse_connection_context
(/build/mnt/src/isilon/bin/isi_hdfs_d/protocol_v2_2.c:388)
from process_out_of_band_rpc
(/build/mnt/src/isilon/bin/isi_hdfs_d/protocol_v2_2.c:1000)
If the cluster is under heavy HDFS load, the isi_hdfs_d process might restart. If this
occurs, the following lines appear in the stack trace:
123802
/lib/libc.so.7:__sys_kill+0xc
/usr/lib/libisi_util.so.1:isi_assert_halt+0xa0
/usr/lib/libisi_hdfs.so.1:hdfs_enc_mkdirat_p+0x2b1
/usr/lib/libisi_hdfs.so.1:hdfs_mkdir_p+0x41
/usr/bin/isi_hdfs_d:config_init_directory+0x13
iSCSI
iSCSI known issues
The iSCSI protocol can log a data digest error message in the iSCSI log.
83537
74303
For more information, see article 88763 on the EMC Online Support site.
In the OneFS web administration interface, the iSCSI Summary page sometimes
loads slowly. When this occurs, the page might time out and the isi_webui_d
process might be consuming a high percentage of CPU resources on one or more
nodes.
73038
If you create a new target after you move iSCSI shadow clone LUNs, the OneFS web
administration interface might become unresponsive.
71919
Job engine
Job engine known issues
In rare instances, if a drive fails while IntegrityScan is running, the IntegrityScan job
can fail. In addition, if you run the isi job events list --job-type
integrityscan command, a message similar to the following appears on the
console, where <x> is the job ID:
2015-02-12T15:35:31 <x> IntegrityScan 1
State change Failed
139708
134301
/usr/bin/isi_job_d:quotascan_task_next_item+0x4c
/usr/bin/isi_job_d:worker_process_task+0x307
/usr/bin/isi_job_d:worker_main+0x11cd :
/lib/libthr.so.3:_pthread_getprio+0x15d
133771
The MediaScan job reports errors for drives that have been removed from the
cluster.
Workaround: Do not fail a drive while a MediaScan job is running; alternatively,
cancel the MediaScan job before failing the drive.
132083
If the MultiScan, Collect, or Autobalance jobs are disabled before a rolling upgrade,
the jobs automatically become enabled after the rolling upgrade completes.
124744
Workaround: If the MultiScan, Collect, or Autobalance jobs are disabled before a
rolling upgrade, and you want those jobs to remain disabled, manually disable
those jobs again after the rolling upgrade completes.
If a FlexProtect or FlexProtectLin job is started during a rolling upgrade, OneFS
might cancel the job. The job might not complete until after the rolling upgrade is
complete.
123167
Workaround: If OneFS creates a FlexProtect job because a device failed during a
rolling upgrade, pause the upgrade until the job completes. It is recommended that
you pause the rolling upgrade and do not pause the FlexProtect job.
The isi job status command displays jobs in numerical order by running ID
instead of displaying active jobs before inactive jobs.
114802, 114583
The isi job reports view job command sometimes returns reports twice.
112265
The Dedupe and DedupeAssess jobs can only run with a job-impact level of low.
110129
When you run a DomainMark job after taking a snapshot, and then run a
SnapRevert job with a job impact policy set higher than low, the impact policy has
no effect.
For more information, see article 88597 on the EMC Online Support site.
93603
Job engine operations occasionally fail on heavily loaded or busy clusters. When
the command fails, a message similar to the following is displayed:
72109
Workaround: If an operation fails, wait a moment and then retry the operation.
The final phase of the FSAnalyze job runs on one node and can consume excessive
resources on that node.
64854
Migration
Migration known issues
If you migrate ACLs to the cluster through the isi_vol_copy_vnx command and
then attempt to read those ACLs over NFSv4, the read fails with the following
error message:
An NFS server error occurred
131299
If you migrate FIFO files using the isi_vol_copy utility, the following message
displays:
100366
If the isi_vol_copy command is run twice, with different source paths but the
same target path, the second run fails without migrating any files.
100365
Networking
Networking known issues
If a network socket is already closed when sbflush_internal is called, the affected
node might unexpectedly reboot. If a node reboots as a result of this issue, an error
similar to the following appears in the /var/log/messages file:
Software Watchdog failed (userspace is starved!)
150739
In clusters with a large number of nodes, after an InfiniBand switch is rebooted, the
cluster might experience a high level of group change activity for approximately two
hours. Because, by default, a single Device Work Thread (DWT) handles all node
transitions to the new InfiniBand connections, some requests are not handled in a
timely manner. As a result, nodes might not successfully fail over to the new
InfiniBand connection and, in some cases, might fail to rejoin the cluster.
134665
Workaround: To increase the number of DWT threads handling requests to failover
to a new InfiniBand connection, set the following sysctl value:
sysctl efs.rbm.dwt_threads=4
For more information about viewing and setting sysctl options, see article 89232 on
the EMC Online Support site.
Note
Increasing the number of DWT threads might affect CPU performance, depending
on the number of processors in the node.
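The workaround above can be sketched as a shell dry run. The sysctl name comes from this entry; the echo keeps this a preview that only prints the command, so nothing is applied until you remove it on an Isilon node.

```shell
# Dry-run sketch: raise the Device Work Thread (DWT) count so that
# failovers to a new InfiniBand connection are handled in parallel.
# The echo below only prints the command; remove it to apply the
# setting on an Isilon node.
DWT_THREADS=4

echo "sysctl efs.rbm.dwt_threads=$DWT_THREADS"
```

Before changing the value, record the current setting with sysctl efs.rbm.dwt_threads so that it can be restored, and keep in mind the note above about CPU impact.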
The OpenSM process might fail, causing cluster-wide actions to slow for a short
period of time. If this occurs, the following lines appear in the stack trace:
132546
/lib/libc.so.7:thr_kill+0xc
/lib/libc.so.7:__assert+0x35
/usr/lib/libcomplib.so.1:cl_spinlock_acquire+0x53
/usr/libexec/opensm:osm_log+0xef
/usr/libexec/opensm:umad_receiver+0x55b
/usr/lib/libcomplib.so.1:__cl_thread_wrapper+0x18
/lib/libthr.so.3:_pthread_getprio+0x15d
Ixgbe interfaces might report a status of inactive, even if the cable and the port that
the cable is plugged into are functioning correctly.
127706
If a port on an A100 node has IP addresses assigned to it, the port will reinitialize
when the node is booted up.
126464
After a group change, the dnsiq_d process might fail. After this, the following
message is logged to the stack trace:
78588
/usr/sbin/isi_dnsiq_d:vip_configured+0x54
/usr/sbin/isi_dnsiq_d:vip_ifconfig_down+0x18
/usr/sbin/isi_dnsiq_d:apply_flx_subnet+0x7c
/usr/sbin/isi_dnsiq_d:gmp_group_changed+0x122
/usr/sbin/isi_dnsiq_d:main+0x660
/usr/sbin/isi_dnsiq_d:_start1+0x80
/usr/sbin/isi_dnsiq_d:_start+0x15
71687
If an IPv6 subnet includes two or more NICs, one NIC might become unresponsive
over IPv6.
57880
NFS
NFS known issues
If you run the rmdir command to remove a directory from an NFS export that is
configured with character encoding other than the default encoding (for example,
CP932 or ISO-8859-1), and if the name of the directory you want to remove
contains a special character, the directory is not removed and a message similar to
the following appears on the console:
159373
On occasion, when OneFS is shutting down the NFS server, a system call made by
the server does not return a response within the allowed 15-minute grace period.
As a result, the NFS server is forcibly shut down and lines similar to the following
appear in the /var/log/messages file:
136358
/lib/libc.so.7:syscall+0xc
/usr/likewise/lib/lw-svcm/onefs.so:OnefsQuerySetInformationFile
+0xa7
/usr/likewise/lib/lw-svcm/onefs.so:OnefsSetInformationFile+0x3b
/usr/likewise/lib/lw-svcm/onefs.so:OnefsIrpSpark+0x109
/usr/likewise/lib/lw-svcm/onefs.so:OnefsIrpWork+0xfa
/usr/likewise/lib/lw-svcm/onefs.so:OnefsAsyncStart+0x55
/usr/likewise/lib/lw-svcm/onefs.so:OnefsDriverDispatch+0x6f
/usr/likewise/lib/libiomgr.so.0:IopFmIrpStateDispatchFsdExec+0x9d
/usr/likewise/lib/libiomgr.so.0:IoFmIrpDispatchContinue+0x56c
/usr/likewise/lib/libiomgr.so.0:IopIrpDispatch+0x1d0
/usr/likewise/lib/libiomgr.so.0:IopQuerySetInformationFile+0x1fc
/usr/likewise/lib/libiomgr.so.0:IoSetInformationFile+0x44
/usr/likewise/lib/lw-svcm/nfs.so:Nfs4SetattrSetInfoFile+0x5a2
/usr/likewise/lib/lw-svcm/nfs.so:Nfs4Setattr+0x3bd
/usr/likewise/lib/lw-svcm/nfs.so:NfsProtoNfs4ProcSetAttr+0x178
/usr/likewise/lib/lw-svcm/nfs.so:NfsProtoNfs4ProcCompound+0x87e
/usr/likewise/lib/lw-svcm/nfs.so:NfsProtoNfs4Dispatch+0x486
/usr/likewise/lib/lw-svcm/nfs.so:NfsProtoNfs4CallDispatch+0x3e
/usr/likewise/lib/liblwbase.so.0:SparkMain+0xb7
The NFS process might fail if you attempt to shut down the NFS process while the
cluster is under heavy NFS load.
135529
OneFS might report that NFS clients are still connected to the cluster after the
clients have disconnected.
135376
The NFS process might core, causing all NFS clients to be disconnected. If this
occurs, the following lines appear in the stack trace:
129684
/lib/libc.so.7:thr_kill+0xc
/lib/libc.so.7:__assert+0x35
/usr/likewise/lib/libiomgr.so.0:IoFileSetContext+0x32
/usr/likewise/lib/lwio-driver/onefs.so:OnefsStoreCCB+0x20
/usr/likewise/lib/lwio-driver/onefs.so:OnefsNfsCreateFile+0xf4b
/usr/likewise/lib/lwio-driver/onefs.so:OnefsCreateInternal+0x1209
/usr/likewise/lib/lwio-driver/onefs.so:OnefsSemlockAvailableWorker
+0x92
/usr/likewise/lib/lwio-driver/
onefs.so:OnefsAsyncUpcallCallbackWorker+0x1dd
/usr/likewise/lib/lwio-driver/onefs.so:OnefsAsyncUpcallCallback
+0xe8
/usr/lib/libisi_ecs.so.1:oplocks_event_dispatcher+0xb9
/usr/likewise/lib/lwio-driver/onefs.so:OnefsOplockChannelRead+0x56
/usr/likewise/lib/liblwbase.so.0:EventThread+0x6dc
/usr/likewise/lib/liblwbase.so.0:LwRtlThreadRoutine+0xee
/lib/libthr.so.3:_pthread_getprio+0x15d
If an SMB client has an opportunistic lock (oplock) on a file and the file is renamed
or deleted by an NFS client, the SMB client does not relinquish its oplock, and the
file data on the SMB client is not updated. This issue is caused by an extremely rare
race condition that might occur in OneFS 6.0 or later.
94168
For more information, see article 88591 on the EMC Online Support site.
After a node restarts, the mountd process starts before authentication. As a result,
immediately after the node restarts, NFS clients might experience permission
problems or receive the wrong credentials when they mount a directory over NFS.
Workaround: On the NFS client, unmount and remount the directory.
73090
70616
When you add a node to the cluster, the master control program (MCP) loads the
sysctl.conf file after the external interfaces have IP addresses. As a result, NFS
clients that require 32-bit file handles might encounter issues connecting to newly
added nodes.
70413
Workaround: On NFS clients that encounter this issue, unmount and then remount
the directory.
The default number of NFS server threads was changed to address a potential issue
in which the NFS server monopolizes node resources. As a result, NFS performance
might be lower than expected.
69917
Workaround: Adjust the number of nfsd threads by running the following
commands. Modify the minimum number of threads by running the following
command, where <x> is an integer:
isi_sysctl_cluster vfs.nfsrv.rpc.threads_min=<x>
Modify the maximum number of threads by running the following command, where
<x> is an integer:
isi_sysctl_cluster vfs.nfsrv.rpc.threads_max=<x>
We recommend that you set threads_min and threads_max to the same value.
Increasing the number of threads can improve performance, but can also cause
node stability issues.
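The workaround above can be sketched as a small shell dry run. The thread count of 16 is an arbitrary illustration, not a recommendation, and the echoes print the commands instead of running them on the cluster.

```shell
# Dry-run sketch: set the nfsd minimum and maximum thread counts to
# the same value, as recommended above. THREADS=16 is only an example
# value; choose one for your workload and test carefully, because too
# many threads can cause node stability issues.
THREADS=16

# Reject a non-integer value before touching cluster-wide settings.
case "$THREADS" in
  ''|*[!0-9]*) echo "THREADS must be a positive integer" >&2; exit 1 ;;
esac

# Remove the echoes to apply the settings on the cluster.
echo "isi_sysctl_cluster vfs.nfsrv.rpc.threads_min=$THREADS"
echo "isi_sysctl_cluster vfs.nfsrv.rpc.threads_max=$THREADS"
```

Setting both values identically fixes the thread pool at a single size, which avoids the pool resizing under load.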
OneFS API
OneFS API known issues
The lwswift process might fail if a large number of clients retrieve large files that
have not been previously accessed by Swift. If this occurs, the following lines
appear in the stack trace:
135252
/lib/libc.so.7:thr_kill+0xc
/usr/likewise/lib/liblwbase_nothr.so.0:LwRtlMemoryAllocate+0x9e
/usr/likewise/lib/liblwbase.so.0:LwIovecCreateMemoryEntry+0x22
/usr/likewise/lib/liblwbase.so.0:LwIovecPullupCapacity+0x1ae
/usr/likewise/lib/lwio-driver/
lwswift.so:_Z12HttpProtocolPN5swift10_LW_SOCKETEP9_LW_IOVECiPvPj
+0x165
/usr/likewise/lib/liblwswift_utils.so.
0:_ZN5swift12LwSocketTaskEP8_LW_TASKPv19_LW_TASK_EVENT_MASKPS3_Pl
+0x634
/usr/likewise/lib/liblwbase.so.0:EventThread+0x6dc
/usr/likewise/lib/liblwbase.so.0:LwRtlThreadRoutine+0xee
If you attempt to write to a read-only file, OneFS does not log an error message to
the /var/log/lwswift.log file.
134770
In the RESTful Access to the Namespace (RAN) API, when a file is created through
the PUT operation, a temporary file of the same name with a randomly generated
suffix is placed in the target directory. Under normal circumstances, the temporary
file is removed after the operation succeeds or fails. However, the temporary file
may remain in the target directory if the server crashes or is restarted during the
PUT operation.
104388
If you run the isi devices fwupdate command on a node that contains SSDs
configured for use as L3 cache, and that node is in read-only mode, the node might
restart unexpectedly and an error similar to the following appears in
the /var/log/messages file:
155489
If you attempt to upload cluster information through the OneFS web administration
interface and the upload fails, the web interface for uploading information ceases
to function. If you attempt to upload information again, OneFS will display
Gather Succeeded. However, no cluster information will be uploaded.
133974
If you have not uploaded cluster information to Isilon Technical Support yet, on the
Cluster Management > Diagnostics Info page, the Gather Status bar appears
gray or black.
133972
The default SSL port (8080) for the web administration interface cannot be
modified.
For more information, see article 88725 on the EMC Online Support site.
94026
If you use the SmartConnect service IP or hostname to log in to the OneFS web
administration interface, the session fails or returns you to the login page.
Workaround: Connect to the cluster with a static IP address instead of a hostname.
75292
Security
Security known issues
137904
For more information, see ESA-2015-015 on the EMC Online Support site.
SmartQuotas
SmartQuotas known issues
138115
Quota configuration import and export functionality is missing from the isi
quotas command.
Workaround: To export or import quota configuration files, run the isi_classic
quota list --export or the isi_classic quota --import --from-file
<filename> command from the command-line interface, where <filename>
is the name of the file to be imported.
94797
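As an illustration of the workaround above: the file name quotas.cfg is hypothetical, and the redirection assumes that --export writes to standard output, which you should verify on your OneFS version. The echoes keep this a dry run.

```shell
# Dry-run sketch: export the quota configuration to a file, then
# re-import it with isi_classic, per the workaround above.
# QUOTA_FILE is a hypothetical name chosen for illustration.
QUOTA_FILE="quotas.cfg"

# Remove the echoes to run the commands on the cluster. The export is
# redirected to a file on the assumption that --export writes to
# standard output; verify this before relying on it.
echo "isi_classic quota list --export > $QUOTA_FILE"
echo "isi_classic quota --import --from-file $QUOTA_FILE"
```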
To export a file from the OneFS web administration interface, click SmartQuotas >
69816
SMB
SMB known issues
139712
On the Protocols > Windows Sharing > SMB Shares tab in the OneFS web
administration interface, if you click Reset or Cancel in the Add a User or Group
dialog box while adding or viewing an SMB share, the Add a User or Group dialog
becomes inoperable.
Workaround: Refresh the page in the OneFS web administration interface.
If you shut down a node while a cluster is under heavy load, the following lines
might appear in the stack trace:
134661
/lib/libc.so.7:recvfrom+0xc
/usr/lib/libisi_gconfig_c_client.so.1:gconfig_connection_flush
+0x375
/usr/lib/libisi_gconfig_c_client.so.
1:gconfig_connection_read_message+0x47
/usr/lib/libisi_gconfig_c_client.so.
1:gconfig_client_update_entries_count+0x799
/usr/lib/libisi_gconfig_c_client.so.
1:gconfig_client_wait_for_config_change+0x274
/usr/likewise/lib/lwio-driver/
srv.so:StoreChangesWatcherThreadRoutine+0xf3
/lib/libthr.so.3:_pthread_getprio+0x15d
If an application sends OneFS a request for alternate data streams, but specifies a
buffer size that is too small to receive all of the alternate data streams, OneFS will
report that the streams do not exist, instead of reporting that the buffer size was
too small.
134299
Alternate data streams might be inaccessible through Windows PowerShell.
134250
The isi_papi_d process might fail while there is a large amount of SMB traffic and
multiple threads call the same code at the same time. However, in rare cases, the
port can suddenly become inactive.
Workaround: If a port becomes inactive, you must reboot the node to resolve this
issue.
130692
Some SMB 1 clients send a Tree Connect AndX request using ASCII to specify a
path. The cluster rejects the connection with STATUS_DATA_ERROR.
84457
When you add a new Access Control Entry (ACE) that grants run-as-root permissions
to an Access Control List (ACL) on an SMB share, OneFS adds a duplicate ACE if
there is already an entry granting full control to the identity. The extra ACE grants no
extra permissions.
72337
Workaround: Remove the extra ACE by running the isi smb permissions
command.
Upgrade and installation
Upgrade and installation known issues
Beginning in OneFS 7.2.0.1, the network port range used for back-end
communications was changed. As a result, in rare cases, if you perform a rolling
upgrade from a supported version of OneFS to OneFS 7.2.0.1 or later, and if the
upgrade process fails or is paused before all of the nodes in the cluster have been
upgraded, commands sent from nodes that have not yet been upgraded might be
sent to an upgraded node through an unsupported port.
If this issue occurs, affected nodes are not upgraded, the command that was sent
fails, and messages similar to the following might appear on the console:
143408
Note
You can avoid this issue by performing a simultaneous upgrade. If you encounter
this issue, see article 198906 on the EMC Online Support Site.
If you initiate a simultaneous upgrade through the OneFS web administration
interface, OneFS incorrectly reports that a rolling upgrade is occurring through the
following message:
133409
When running the sudo isi update command, you might encounter warnings
that the cluster contains unresolved critical events, that certain drives are ready to
be replaced, or that devices in the carrier boards are not supported boot disks. You
can disregard these messages because they have no adverse effects.
131929
After a rolling upgrade is complete, the OneFS web administration interface might
report that a rolling upgrade is still in progress.
Workaround: Restart the rolling upgrade.
126799
For more information, see article 186845 on the EMC Online Support site.
If a node is rebooted during a rolling upgrade, and the node fails, the upgrade
process might continue to run indefinitely, even after all other nodes have been
upgraded.
125320
If Collect or MultiScan jobs are in progress when either a rolling upgrade or cluster
reboot is initiated, the job will fail instead of being cancelled.
123903
Note
If the Collect or MultiScan jobs continue to fail after the rolling upgrade is
complete, it is unlikely that the failure was caused by this issue.
During a rolling upgrade, if you are logged in to a node that has not been upgraded
yet, and you view job types, the system displays several disabled job types with IDs
of AVScan.
123842
These job types are new to OneFS 7.1.1 and were mislabeled during the rolling
upgrade process. The IDs of the job types will resolve to the correct IDs after the
rolling upgrade is complete.
Jobs that are running when a OneFS upgrade is started might not continue running
after the upgrade completes.
Workaround: Cancel all running jobs before upgrading or manually restart jobs that
did not restart automatically following the upgrade.
98341
98072
Virtual plug-ins
Virtual plug-ins known issues
Adding an Isilon vendor provider might fail when you enable VASA support.
Additionally, the VASA information that appears in vCenter might be incorrect.
These issues can occur if you create a data store or virtual machine through the
VMware vSphere PowerCLI.
Workaround: You can resolve this issue by creating data stores through either the
VMware vCenter graphical user interface or the VMware ESXi command-line
interface.
97735
CHAPTER 9
OneFS Release Resources
Sources for information about and help with the OneFS operating system.
Visit the EMC Isilon OneFS product page on the EMC Online Support site to
download Isilon product documentation and current software releases.
Help on This Page
Select Help on this Page from the Help menu in the OneFS web
administration interface to see information from the OneFS Web
Administration Guide and the OneFS Event Reference. The Help on This Page
option does not require internet connectivity.
Online Help
Select Online Help from the Help menu in the OneFS web administration
interface to see information from the OneFS Web Administration Guide and the
OneFS Event Reference. The Online Help contains the latest available versions
for these guides. The Online Help option requires internet connectivity.
ISI Knowledge
Visit the ISI Knowledge blog weekly for highlights of and links to Isilon support
content, including announcements of newly available content, product tips, and
information about new ID.TV videos.
EMC Isilon YouTube playlist
You can visit the EMC Isilon YouTube playlist on the EMC Corporate YouTube
channel for Isilon how-to videos, information about new features,
information about Isilon hardware, and technical overviews.
Available documentation
OneFS documentation is available across the following channels.
Authentication
This functional area is used to categorize new features, changes, and issues that
affect authentication on the cluster. This includes, but is not limited to:
LDAP
NIS
Backup, recovery, and snapshots
This functional area is used to categorize new features, changes, and issues that
affect backup, recovery, and snapshots of data on the cluster. This includes, but is
not limited to:
NDMP
Snapshots
SyncIQ
Symantec NetBackup
Cluster configuration
This functional area is used to categorize new features, changes, and issues that
affect cluster configuration. This includes, but is not limited to:
CloudPools
Licensing
NTP
SmartPools
Command-line interface
This functional area is used to categorize new features, changes, and issues that
affect the OneFS command-line interface.
Diagnostic tools
This functional area is used to categorize new features, changes, and issues that
affect tools that are used to research and diagnose cluster issues. This includes, but
is not limited to:
Alerts
Protocol auditing
Statistics
Status
File system
This functional area is used to categorize new features, changes, and issues that
affect the OneFS file system. This includes, but is not limited to:
FreeBSD
L3 cache
MCP
OneFS Kernel
File transfer
This functional area is used to categorize new features, changes, and issues that
affect FTP and HTTP connections to the cluster.
Hardware
This functional area is used to categorize new features, changes, and issues that
affect Isilon hardware in a OneFS cluster.
HDFS
This functional area is used to categorize new features, changes, and issues that
affect the HDFS protocol.
iSCSI
This functional area is used to categorize new features, changes, and issues that
affect the iSCSI protocol and iSCSI devices connected to a OneFS cluster.
Job engine
This functional area is used to categorize new features, changes, and issues that
affect the OneFS job engine and deduplication in OneFS.
Migration
This functional area is used to categorize new features, changes, and issues that
affect migration of data from a NAS array or a OneFS cluster to a OneFS cluster
through the isi_vol_copy utility or the isi_vol_copy_vnx utility.
Networking
This functional area is used to categorize new features, changes, and issues that
affect the OneFS external network and the OneFS back-end network. This includes,
but is not limited to:
Fibre Channel
Flexnet
InfiniBand
SmartConnect
TCP/IP
NFS
This functional area is used to categorize new features, changes, and issues that
affect NFS connections to the cluster.
OneFS API
This functional area is used to categorize new features, changes, and issues that
affect the OneFS Platform API and Swift.
OneFS web administration interface
This functional area is used to categorize new features, changes, and issues that
affect the web administration interface.
Performance
This functional area is used to categorize new features, changes, and issues that
affect cluster performance.
Security
This functional area is used to categorize new features, changes, and issues that are
related to security fixes and vulnerabilities.
Security Profiles
This functional area is used to categorize new features, changes, and issues that
affect hardened profiles such as the Security Technical Implementation Guides
(STIGs).
SmartQuotas
This functional area is used to categorize new features, changes, and issues that
affect SmartQuotas.
SMB
This functional area is used to categorize new features, changes, and issues that
affect SMB connections to the cluster.
Upgrade and installation
This functional area is used to categorize new features, changes, and issues that
affect OneFS upgrades, installation of OneFS patches, and the reformatting and
reimaging of Isilon nodes by using a USB flash drive.
Virtual plug-ins
This functional area is used to categorize new features, changes, and issues that
affect virtual plug-ins. This includes, but is not limited to:
OneFS Simulator
vOneFS
This functional area is used to categorize new features, changes, and issues that
affect vOneFS.
Live Chat
Create a Service Request
Telephone Support