A BTI Case Study
Obsession with Quality
at Western Digital Corporation
August 2010
Contents

Company Background
Business Problem
Business Solution
QIS Project
Benefits Realized
Lessons Learned
Synthesis
SIDEBAR: Social Responsibility is Smart Business
References
About the Methodology
About Bolder Technology
About the Sponsor


"Our business model is focused on process efficiency to control, characterize, and tune our processes like a fine watch. Being data driven got baked into our culture early because the thousands of white-collar workers, mostly engineers who understand the data, were obsessed with relentless quality improvement. This relentless quality improvement permeates our entire culture, from manufacturing to HR and Finance. We can't improve our business without having a magnifying glass showing where the opportunities are."
- Gary Meister, CIO and Senior VP of Customer Satisfaction [1]

This is the story of the Quality Information System (QIS) at Western Digital Corporation. It is a story about using data to understand and improve a business. WD's culture continually searches for opportunities and implements the next small improvement, not only within its supply chain but throughout all parts of the business. The approach is to use Information Technology (IT) as a magnifying glass to examine each piece of the business. As one analyst mused, "Sunlight is the best disinfectant. Errors can be fixed once you can see them."
Company Background
Western Digital Corporation (WD) has been a pioneer in hard disk drive manufacturing since the 1980s. Headquartered in Irvine, California, WD is the largest drive supplier by volume in the world. Its product line is quite diverse, spanning both internal and external drives across desktop, mobile, enterprise, AV, network attached storage, and digital home entertainment products, along with emerging solid-state drive units [2].

In its fiscal year 2010, WD achieved $9.8 billion in revenue with $1.4 billion in income. The company has approximately 63,000 employees worldwide.
Consistent Growth
WD has proved itself best in class among drive manufacturers, boasting a track record of customer focus, quality, leadership in technology deployment, operational excellence, sustained profitability, a strong balance sheet, and asset efficiency.

When WD first started manufacturing hard drives, many industry observers believed that WD would not last in the turbulent disk drive industry. WD was late to enter a market already crowded with 55 manufacturers. Yet WD's growth in market share has been consistently positive while other drive manufacturers have had large swings in their growth curves. Other companies have grown mainly through acquisitions; WD's growth has been steadily organic. And during the 2009 recession, WD was the only drive manufacturer that did not lose money.

WD's market share has grown consistently, reaching 31% in fiscal 2010 and exceeding Seagate in the number of units manufactured. WD's leadership in the fast-growing 2.5-inch hard drive market includes WD Scorpio and retail products like the My Passport, capable of storing a terabyte on a single drive.
Obsession with Quality
The pervasive culture throughout WD is an obsession with continuous quality improvement. The term "obsessive" highlights a positive aspect of WD: "a persistent idea or impulse that continually forces its way into consciousness" [3]. With this mindset, WD makes small improvements daily throughout its processes and supply chain. Gary Meister remarked, "The amazing thing for me in my tenure is that, as good as you think you are, not a day goes by that you don't ask yourself, 'Why do we do it that way? Would it be better to do it some other way?'"

Figure 1 - WD Product Line
Quality improvements are driven by complete product and component traceability across the entire life-cycle of every drive unit: from suppliers, to manufacturing, to testing, to shipment, and to customer use. As described later, this traceability is enabled by the centralized capture and storage of highly detailed data.
Corporate Strategies
To complement the quality focus and to execute this constant stream of changes, the culture at WD encourages fast decision-making at many levels using compact management units. The emphasis is on taking corrective action fast. A popular mantra at WD illustrates this: "It is not the big that eat the small. It is the fast that eat the slow." In particular, not only can WD find the root cause of a defect in minutes or hours, but within hours of discovery management will eliminate the source of the defect and minimize its impact.

In addition, WD in recent years has vertically integrated many of the major components of its supply chain through acquisitions and strategic volume purchases of disk heads and media, gaining technology control, supply stability, and cost leverage.
IT and MITECS
Information Technology (IT) is considered an essential and synergistic part of the company. The heart of WD's IT is its manufacturing system MITECS, short for Manufacturing Information Technology Execution Control System. MITECS monitors workflows with statistical process control (SPC) techniques, tracking each stage of the product manufacturing line. If any indicator is trending out of norm, the affected component (such as an assembly machine or clean room) can be halted. And if problems occur on the manufacturing line, MITECS can slow or halt the downstream lines to compensate.
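
Conceptually, an SPC check of this kind compares each new reading against control limits derived from in-control history. Below is a minimal sketch of a simple 3-sigma rule in Python; the data, names, and threshold are illustrative assumptions, not WD's actual MITECS implementation.

```python
# Minimal 3-sigma statistical process control (SPC) check; illustrative only.
from statistics import mean, stdev

def out_of_control(baseline, latest, sigmas=3.0):
    """Flag a reading that falls outside control limits derived
    from a baseline window of in-control measurements."""
    mu, sd = mean(baseline), stdev(baseline)
    return abs(latest - mu) > sigmas * sd

# Example: readings from one assembly-station sensor (invented data).
baseline = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 10.1, 9.9]
if out_of_control(baseline, latest=11.5):
    print("Indicator trending out of norm: halt station, slow downstream lines")
```

Real SPC systems layer more rules on top (trend runs, zone tests), but the design choice is the same: derive limits from the process itself and act the moment a reading escapes them.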
MITECS was developed internally and is more than fifteen years old. The system has continued to prove invaluable over those years. In the late 1990s, WD was building 20,000 drives per day. Today, WD builds approximately 550,000 per day, more than six per second, all managed by the same MITECS architecture.

WD is replicating its manufacturing IT success in back-office functions, like HR, that are not normally data driven. For a culture that embraces data, improvement opportunities surfaced through data analysis are being used in all areas of the company.
Do-It-Yourself Culture
The IT group at WD has a strong do-it-yourself (DIY) culture that favors home-grown tooling and development. For instance, IT uses a home-grown job scheduling system. They do not use a commercial ETL tool, and various BI analysis tools have failed to be widely adopted. The DIY culture is not a cost issue; expensive high-quality software tools are used where warranted, such as SAS for Cox modeling as well as warranty modeling. Often the IT group has the internal talent to develop a solution that satisfies the requirements. A commercial tool, although perhaps more elegant, may not precisely meet users' specific needs. Further, an internally developed solution can be more responsive to changing requirements.
Business Problem
The business problem for WD is a broad spectrum of issues that starts with managing its supply chain from raw materials to components, assemblies, finished goods, and OEM and retail distributors. Because of the unique nature of WD's business, these issues span areas as diverse as:

- Supplier management
- Product and process development
- Asset utilization
- Warranty management
- Customer service/support
- Manufacturing
- Product engineering
- Field quality engineering
Nature of the Business
WD's product is exceedingly complex yet also a mass commodity. To be successful, WD must have both volume and efficiency in the manufacturing and distribution of this complex product. For example, the surface layer in the disk heads is just three or four atoms thick. Yet millions of those disk heads are assembled into millions of disk drives that must work reliably for years.
WD produces about 550,000 disk drives per day, or nearly 200 million drives per year. Nearly all units pass all the tests, are shipped to customers worldwide, and perform flawlessly for years. While this success rate is remarkable for such a complex product, a failure rate of even a fraction of a percent results in over a million defective drives (200 million drives per year × 0.5% ≈ 1 million).

The high-tech manufacturing industry calls this problem Defective Parts per Million, or DPPM. WD manages DPPM continuously in its normal manufacturing and shipping processes. However, defective units occasionally get shipped to customers. The WD process for reacting quickly to this is called Exposure-Containment-Disposition (ECD). With one million disk units produced every two days, WD cannot afford to ship defective units. Receiving a defective product damages customer satisfaction and the customer's willingness to buy what they consider a reliable commodity product.
Most defects are caught and eliminated before anything is sent to the customer. But according to WD executives, this situation is the low-hanging fruit, the easy part. It is a different story when defects are found in units shipped to customers.

Minimizing customer losses is critical. The objective is to minimize the distribution of bad units, either by retrieving units in the field or by halting manufacturing of any units that might exhibit the same defect. Once a product is retrieved from the field, a post-mortem investigation of the failed disk units must ensure that the same defect never occurs again by eliminating the specific combination of factors that produced the failure.
The individual components, like the disk head, are approaching perfection, so correcting defects in those components yields diminishing returns. A single component is thus increasingly unlikely to cause a failure. Failures now result from combinations of several factors: parts from various suppliers, process steps on many manufacturing lines, and a variety of uses in the field. For example, an unpredictable combination of factors might occur when the customer puts the drives into their product, like a PC chassis with poor cooling or a fluctuating power supply, exceeding WD's recommended conditions.

Success depends on coping with, and even thriving upon, this mind-boggling situation. Consider one component of the drive, the disk head, to illustrate.
To create the disk heads that read and write data on the disk platters, WD must fabricate thousands of six-inch silicon wafers, each of which contains fifty thousand potential heads. The manufacturing process takes many weeks to slice the wafers, attach each head to a suspension, cut an air bearing, attach that to a head stack assembly, and finally put it into the drive, which is packaged into a box, which is stacked onto a pallet. Now assume that a certain disk drive fails in testing or in the field. From its serial number, WD is able to trace back to the fabrication site of every part and to every shipping pallet associated with this drive. For every process step, the company must locate every part in the drive, where it came from, and where it went [4].
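
As a sketch of what this traceability implies, the genealogy can be pictured as parent links from the finished drive back through each assembly to the originating lot and site. The records, serial numbers, and site names below are invented for illustration; WD's actual schema is not public.

```python
# Hypothetical component-genealogy lookup: from a drive's serial number,
# walk back through each assembly step to the originating part and site.
genealogy = {
    "DRV123":  {"part": "drive",      "from": "HSA77",   "site": "assembly line 4"},
    "HSA77":   {"part": "head stack", "from": "HEAD901", "site": "HSA plant"},
    "HEAD901": {"part": "head",       "from": "WAFER55", "site": "wafer fab A"},
    "WAFER55": {"part": "wafer",      "from": None,      "site": "wafer fab A"},
}

def trace_back(serial):
    """Print every upstream part and the site it came from."""
    while serial is not None:
        rec = genealogy[serial]
        print(f"{serial}: {rec['part']} from {rec['site']}")
        serial = rec["from"]

trace_back("DRV123")  # drive -> head stack -> head -> wafer
```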

Finding the Root Cause
Finding the root cause of defects is difficult. Multiple departments at WD do a good job isolating the causes of defects from their own perspectives. As a result, the defects that remain arise from complex combinations of events and thresholds. It is in the cracks between the functional organizations that the last mile of defect elimination happens. For example, some recent issues involved questions like:

- Is the problem a combination of components, or is the problem in the customer's machine?
- Is there a test machine in Malaysia that let something marginal through that is affected by the customer's PC motherboard?
- Did the shipping department at the customer's manufacturing site report a dozen cracked shipping boxes on one pallet of drives?
The following figure shows the complexity of the potential root causes. Capturing and organizing data about each of these factors is a huge challenge. Moreover, it is essential to have trusted, unified data that allows these problems to be found and action to be taken quickly.
[Figure 2 diagrams the many factors surrounding defects, grouped into areas such as: component traceability (supplier, vendor, lot, line/field serial number, box, pallet), factory process (line, equipment, tooling, shift, operator, factory technician ID), factory test (test code, test parameters, critical metrics), exposure (pick ticket #, invoice #, carrier, customer), field test (rework history, config history, field test correlation), failure analysis (fail mode, diagnosis/disposition, root cause), and tech support (customer symptoms, corrective actions, DRMs, PCNs).]

Figure 2 - Root Causes of Defects
Warranty Reserves
Another business problem is managing the warranty costs of returned drives and setting warranty reserves. A major cost to WD is shipping, testing, and reworking returned units. Returned drives may be false positives (i.e., returned units that were never defective) and must be accurately identified. Truly defective units must be diagnosed and fixed to make them sellable as refurbished goods. These warranty costs are directly reflected on the bottom line of the quarterly financial statement.

Furthermore, the percentage of returned items must be publicly reported to the Securities and Exchange Commission. Using the data on defect rates and costs, the Finance group estimates the actual costs using Survival Analysis [5], which estimates the predicted life of drives according to various factors. For instance, failure rates and forecasts are done by product line, by platter count, by business unit, and by warranty period. OEM customers have different return profiles depending on their usage patterns. For example, the reliability profile of WD disk drives used in the products of one OEM customer may decline slightly over the course of a year, while another OEM customer may not experience any change in drive reliability.
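
Survival analysis of this kind can be illustrated with a Kaplan-Meier product-limit estimate: the fraction of drives surviving past each observed failure time, with still-running drives treated as censored. The self-contained sketch below uses invented data; the document notes that WD's Finance group does this work with SAS, so this is only a conceptual stand-in.

```python
# Self-contained Kaplan-Meier survival estimate on invented drive data.
import numpy as np

def kaplan_meier(durations, failed):
    """Return (event_times, survival_probabilities).
    durations: months in service; failed: 1 = failed, 0 = censored (still running)."""
    durations, failed = np.asarray(durations), np.asarray(failed)
    times = np.unique(durations[failed == 1])
    surv, s = [], 1.0
    for t in times:
        at_risk = np.sum(durations >= t)               # drives still observed at time t
        deaths = np.sum((durations == t) & (failed == 1))
        s *= 1.0 - deaths / at_risk                    # product-limit update
        surv.append(s)
    return times, np.array(surv)

# Example: months in service for ten drives; 0 means the drive never failed.
t, s = kaplan_meier([3, 6, 6, 9, 12, 12, 12, 18, 24, 24],
                    [1, 1, 0, 1, 0, 0, 1, 0, 0, 0])
print(dict(zip(t.tolist(), s.round(3).tolist())))
# {3: 0.9, 6: 0.8, 9: 0.686, 12: 0.571} -> survival curve by month
```

Running such estimates separately by product line, platter count, business unit, and warranty period is what yields the per-segment return profiles described above.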
Field Support for OEM Customers

WD field quality engineers work onsite at Dell, HP, and other OEM manufacturers. These engineers need to access and analyze quality statistics and share them with the customer. For instance, an OEM manufacturer may be having problems with a batch of disk drives. The WD field engineers can quickly locate the cause of the defect, determine precisely which disk drives are suspect, and replace only those units. Or they can help the OEM customer figure out what the customer is doing that causes the drives to fail. Perhaps the PC manufacturer changed a cooling fan or power supply so that it no longer matches the normal requirements for proper disk drive operation. Sometimes the WD field engineers educate the customer on which type of disk drive might be a better fit for their configuration.

OEM customers of WD now expect access to that detail data. For instance, WD can query specific lots and drives quickly and reliably from the PC manufacturer's site. This has dramatically increased the loyalty of these OEM customers.
Business Solution
The business solution for WD was to develop the Quality Information System (QIS), which creates total product traceability throughout the entire supply chain. QIS centrally captures and analyzes detail data on every drive unit throughout the entire life-cycle: from component suppliers, to manufacturing lines, to testing, to shipment, and finally to customer use in the field. The QIS approach is to:

- Build an enterprise data warehouse (EDW)
- Capture data from every step in the supply chain
- Assure that the data is accurate and complete
- Apply dashboards and predictive analytics
- Remove the majority of defects before shipping
- Act quickly to replace returned HDDs
- Find the root cause to prevent further defects
Bigger Magnifying Glass
The analogy of a magnifying glass is often used to explain the QIS approach. At any point in time, WD will find and correct the current problems and search for opportunities to improve cost/performance. To correct the next problem and to find the next improvement, WD needs a yet bigger magnifying glass. Gary Meister explained the requirements for a bigger magnifying glass:

"To do that, you need more data, data you were not previously collecting. Then you need the tools to convert the data into information, and a database capable enough to hold it all. The database must be queryable so we can get our hands on all the data. We need to be able to add data easily and retire data that's no longer useful. Drilling down, down, and further down with ad hoc complex analysis is key to a lot of what we do. So you need a [database] box that's big and flexible. That isn't easy!"
Data Fragmentation
Like most companies, WD's necessary data is fragmented and found in many places, both internal and external to the company, such as sales, engineering, suppliers, shipping, manufacturing production lines, R&D, and customers. At each point there are hundreds of test points, places where engineers or sensors monitor the quality of goods. On a single drive, as many as 3,000 variables are sampled at each process step. And there are at least six major process steps, each of which breaks down into dozens of smaller steps.

As the quality of individual parts approaches perfection, the interaction between multiple components increasingly defines the quality of the finished product. Therefore, the imperative is to collect all the disparate data into a data warehouse to get a cross-organizational view of the entire supply chain and product quality. Without the integrated cross-organizational data warehouse, the "big magnifying glass" could not find the most complex problems. The goal has always been to remove defects from the product, beginning with suppliers and ending with customers (OEM PC manufacturers, systems integrators, and retailers). But collecting the data is only a means to achieving fast reaction to defects when they are discovered.
QIS Project
The QIS project is not a project in the usual sense but a continuing evolution that unfolded over seven years. Within it there have been dozens of projects, big and small, many of which integrated the various manufacturing systems. Hence, it is difficult to place boundaries on project duration and resources. Over that period, approximately three to four database administrators (DBAs) and architects worked on various aspects of QIS, along with dozens of volunteer self-service engineers.

The project requirements were:

- Provide a simple spreadsheet interface for business users
- Strive for ease of use for different user types so that analyses are easy
- Simplify data mining for the engineers
- Reduce the time to discover the root cause of defects and the time to take action
- Strive for six-hour latency for all data sources
- Complete nightly batch loads by 6:00 am, along with any downstream refreshes
- Provide 24x7 availability at the 99.99% level, with offline maintenance on weekends
- SLA for strategic queries: 10-30 seconds for 20-25 super users running 15-20 concurrent complex queries
- SLA for data mining analytics: less than two hours at any time of day for 10-20 queries
Behind the scenes, a variety of infrastructure components support the QIS system. Central to this infrastructure are two Teradata systems: one for high performance with three months of historical data and another for high capacity with a year of historical data. WD provides multiple venues for performing analytics using tools such as Microsoft Analysis Services, TIBCO Spotfire, SAS, and MATLAB [6]. Developers often use Microsoft Visual Studio, Notepad++, and Perl as their development environments.

The next sections go into detail on these components of the infrastructure.
A Data Warehouse from Teradata
The Teradata system is the main component of the magnifying glass that enables WD to find defects, determine root causes, and suggest actions to improve quality. WD felt that this required a data warehouse platform with both speed and scalability.

Speed is vital to WD because its business moves fast. Within a few hours, 100,000 drives could be manufactured and readied for shipping; at a 0.5% defect rate, roughly 500 of those drives are defective. Finding those drives and preventing them from being shipped is critical. And if defective drives are shipped, the ability to track and proactively recall them reduces the high cost of warranty returns. The Teradata platform is essential to performing the needed analyses rapidly and eliminating delays in taking action.

Gary Meister explained, "Where Teradata shines is the speed of ad hoc analyses. We need answers to the 'what if' questions, and they must be fast. Oracle could not handle the response times we needed. With Teradata, we were able to pull bad devices off of pallets without quarantining the entire shipment. This quickly produced kudos with our customers as well as the internal staff."

Scalability is also vital because WD's business is growing in size and complexity. WD needed a data warehouse platform that could grow along with it. WD has many researchers creating hundreds of sophisticated algorithms to explore new approaches to discovering drive defects. The Teradata engine allows WD to execute those algorithms on massive amounts of data.
The story of a recent acquisition from Teradata is instructive. In 2007, WD was running its enterprise data warehouse on a Teradata model 5255. When daily data loads began requiring 18 hours, Ross Gough and his team knew that the old system was running up against the wall and needed to be replaced. In January 2008, the team started procurement for a new data warehouse platform. The vendors considered were Teradata, Sybase, and Netezza.

The Teradata Data Warehouse Appliance 2500 was selected. When asked about the reasons, Ross Gough stated, "No one else came close to the price/performance or to the maturity. The decision was made easier when Teradata made us an attractive offer." He continued, "Even if you removed migration costs as a factor, Teradata would probably still have been cheaper at the end of the day."

The purchase was approved in June 2008, and the Teradata system was delivered one week later. Ross Gough described that hectic time: "The box was on site by Friday; all the pieces were in place by the next Thursday, and in production by the following Tuesday." The conversion team consisted of only two database administrators along with Ross. The team was pleasantly surprised that it took less than three weeks from purchase to production. It is a testimony to the quality of the DBAs and the system they had architected that hundreds of conversion steps and challenges were handled in literally a week. The old system was phased out by mid-July.
Most engineers need only three months of historical data to analyze recent product changes and shipments. However, many engineers demanded more attributes and wanted to keep all the data forever. A compromise was made to store and maintain one year of historical data. But with all that sensor data and facts from many sources, WD rapidly ran out of storage on the 2500. So in November 2009, WD acquired a Teradata Extreme Data Appliance 1555 to hold the full year of history. Engineers now execute the deep-history, long-elapsed-time queries on that system.

As shown in Figure 3, WD loads the same data into both the Teradata Data Warehouse Appliance 2500 and the Teradata Extreme Data Appliance 1555 each night, and data is aged off each system differently: the 1555 holds 12 months of data whereas the 2500 holds only 3 months. Imagine the amount of data captured from sensors and machines on the manufacturing line needed to fill a 28 TB data warehouse in just two months. WD decided that the cheaper price per terabyte of the 1555 made it the better platform for long-term trend analysis, rather than storing more data on the 2500. Remarkably, only three people now manage both Teradata systems while also developing BI applications and ETL jobs.

Figure 3 - Teradata Configurations
There is no separate development machine, meaning that development work is performed on the production system. Because the users are forgiving, WD has some flexibility to do development and testing on production data, carefully.
Data Integration
A major part of the QIS project is data integration from a variety of sources. This effort is never-ending, as additional sources are continually added. Figure 4 shows the various data sources on top flowing into the QIS data warehouse, which supports various analyses. Without the data integration within the QIS data warehouse, a user would have to log onto several systems to extract and integrate the data themselves; the integration represents a huge savings in time and in the quality of analyses.

Figure 4 - Business View of Data Integration
WD decided on the conventional approach: extract and load into a single centralized data warehouse, then transform the data within the warehouse. This approach is referred to as Extract-Load-Transform (ELT), as opposed to ETL. Data is usually extracted every two hours from production systems, although some data sources, like the Thailand manufacturing plants, arrive every hour.

This extract processing can place too much burden on production systems, up to a 20% overhead. To avoid this, a simple DIY extraction process, using a home-grown job scheduler, executes on a SUSE Linux system using Perl scripts. Every two hours, a script does a bulk copy (BCP [7]) to extract data, compresses it, and records the file name in a log table. A Perl script cycles through the file names in the log table and processes each data file by decompressing and concatenating the data. Teradata MultiLoad places the data into staging tables, which are almost never empty. Some staged rows cannot be immediately applied, so the process often has to wait until the next day to apply a complete transaction. In general, all data is ready the next morning.
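
One cycle of that extract step might look like the following sketch, assuming a SQL Server source reachable via the standard bcp command and using SQLite as a stand-in for the log table. Every path, table name, and server name here is an invented placeholder; WD's actual scripts are Perl on a home-grown scheduler.

```python
# Sketch of one extract cycle: bulk-copy, compress, record in a log table.
import gzip
import shutil
import sqlite3
import subprocess
from datetime import datetime, timezone

def extract_once(table, out_dir="/data/extracts"):
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d%H%M")
    raw = f"{out_dir}/{table.replace('.', '_')}_{stamp}.dat"
    # 1. Bulk-copy the table out of the production system. bcp ships with
    #    SQL Server; -c is character mode, -T trusted connection, -S server.
    subprocess.run(["bcp", table, "out", raw, "-c", "-T", "-S", "prodserver"],
                   check=True)
    # 2. Compress the extract to cut transfer time to the warehouse.
    with open(raw, "rb") as src, gzip.open(raw + ".gz", "wb") as dst:
        shutil.copyfileobj(src, dst)
    # 3. Record the file in a log table so the downstream loader
    #    (MultiLoad into staging, in WD's case) knows what is ready.
    with sqlite3.connect("/data/extract_log.db") as log:
        log.execute("CREATE TABLE IF NOT EXISTS extract_log(fname TEXT, ts TEXT)")
        log.execute("INSERT INTO extract_log VALUES (?, ?)", (raw + ".gz", stamp))

extract_once("mitecs.build_events")  # scheduled every two hours
```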
As shown in Figure 5, the staging area of the QIS data warehouse receives build records (MITECS), shipping records (DLS), field test data (FSPT), returned material authorization (RMA) records, and Failure Analysis (FA) data. There are two versions of the QIS data warehouse: one with 3 months of historical data on the Teradata Data Warehouse Appliance 2500, the other with 12 months on the Teradata Extreme Data Appliance 1555. Data goes into the 2500, which has a unified data model; the 1555 has approximately the same data model, so the ETL jobs put the same data into that machine. The amount of manufacturing data from MITECS and XMMS is between 60 and 100 GB per day.
[Figure 5 depicts the nightly batch loads: source systems (MITECS shop floor management, XMMS manufacturing test results, DLS shipment tracking, customer field tests, Oracle ERP financials, and supplier data) feed staging areas via Perl, FastLoad, MultiLoad, BASH, and BTEQ. The data flows into the QIS 2500 (3 months of history) and the Deep History 1555 (12 months of history), which in turn refresh the MS Analysis Servers and MySQL servers.]

Figure 5 - Nightly Batch Loads
MS Analysis Server is used as a cube performance accelerator. After all the data sources are loaded and transformed into the QIS data warehouse, a process refreshes the MS Analysis Server cubes. Next, data is transferred to the San Jose site using FastExport, along with all the XMMS data, every day. While the majority of users directly access the Teradata Data Warehouse Appliance 2500, a few use the Teradata Extreme Data Appliance 1555 for its long history or the MySQL servers for their analytics.
QIS Analysis Tools
Figure 6 below shows the spectrum of analysis tools supported by the QIS data warehouse. For 85% of users, reporting and casual analyses, such as pivot tables and simple SQL queries, are sufficient. These are delivered through the QIS Detail Tool, Microsoft Analysis Server cubes, dashboards, and good old Excel. For the other 15% of users, more sophisticated tools are available, such as Data Deep Dive (D3), SAS, MATLAB, and TIBCO Spotfire. These tools enable the complex analyses that are the magnifying glasses for finding and correcting defects throughout the supply chain. Although its users are fewer, the top of the pyramid makes the biggest difference in WD's business success.
[Figure 6 shows a pyramid of analysis capabilities. The base, serving 85% of users, covers standard reporting, EIS/executive reporting, casual OLAP, and ad hoc query, delivered through dashboards, Microsoft Analysis Server, the QIS Detail tool, and Excel. The top, serving 15% of users, covers statistics, power OLAP, data mining, capability studies, correlation discovery, parametric correlation, and hypothesis verification, delivered through Spotfire, SAS, MATLAB, and D3.]

Figure 6 - Spectrum of QIS Analysis Tools
Let's examine several of these tools in detail.
QIS Detail Data
The main portal into WD's analytics is a self-service, home-grown query system called QIS Detail Data. This analysis tool was designed to be intuitive and easy to use for the casual user, for whom traditional BI tools would be too complex. The tool is described as follows:

"This tool is heavily used within WD. With a serial number for a disk drive, anyone can get 50+ attributes of data about that drive. They can view the data in its raw form or grab it as an Excel spreadsheet. Examples of parameters include heads per drive, radius of platter, spin speed, build location, genealogy, test results by station, etc. Engineers don't have to understand the data structures or data sources."

Figure 7 shows an example of the QIS Detail Data tool. The initial screen offers various ways of inputting the disk drive serial numbers to be investigated. The engineer can select any number of serial numbers for analysis. There is a limit of 250 serial numbers per request that can be copy-pasted into the query screen. Alternatively, the user can submit a file containing thousands of serial numbers, which is loaded into the data warehouse as a temporary table and joined with the other tables.

Figure 7 - QIS Detail Data Reporting
In the upper left of the figure, commonly used pre-built reports are listed for quick selection. Instead of making the user wait for results, many scripts schedule a batch job to perform the analysis and then deliver the results as Excel files via email. In the lower left, various processing workflows are listed for more complex analyses, such as the Data Deep Dive report (described later).

WD is continually refreshing the tool to give the user more flexibility and function. The key is that users must be able to use the output; it does no good if they are overwhelmed with complex data.

The QIS Detail Data tool is built with Perl [8] and Microsoft .NET for ASP. Most processing is procedural, so Perl is ideal for this application. A user request executes a Perl script that initiates multiple paths to perform the work.
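
The bulk serial-number path mentioned above, loading a file of serial numbers into a temporary table and joining it to the detail tables, might reduce to SQL along these lines. The table and column names are assumptions for illustration; Teradata's session-scoped volatile tables are one plausible way to hold such a list.

```python
# Illustrative SQL for the serial-number file workflow; schema names invented.
LOOKUP_SQL = """
CREATE VOLATILE TABLE sn_list (serial_no VARCHAR(20))
ON COMMIT PRESERVE ROWS;

-- ...bulk-insert the user's file of serial numbers into sn_list here...

SELECT d.serial_no, d.build_location, d.spin_speed, t.station, t.result
FROM   drive_detail d
JOIN   sn_list      s ON s.serial_no = d.serial_no
JOIN   test_results t ON t.serial_no = d.serial_no;
"""
print(LOOKUP_SQL)  # in production, submitted through BTEQ or an ODBC session
```

The design point is that joining a staged list inside the warehouse scales to thousands of serial numbers, where pasting them into a WHERE clause would not.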
Microsoft Analysis Services
WD uses Microsoft Reporting and Analysis Services and predefines numerous reports and cubes. A database administrator remarked:

"We had to find a way to make analytics easier for the user, so we went with Microsoft Analysis Services and pre-computed the cubes. We have built reports that the user can use to drill down into a subject. By using parameterized Reporting Services reports, the engineers get data they can actually use. The report viewer uses simple pull-down lists and parameters. It's real hard for the user to get lost. Even basic product brand families can have a couple dozen product types inside. So with the cubes and parameterized filters, one report captures a wide collection of attributes and can produce 5-7 subordinate reports."
WD took the approach that users should have analysis tools designed for self-service, rather than asking IT staff to run custom reports for them. It was often described as "teach them to fish instead of fishing for them."

Using Microsoft Analysis Services to build cubes, the subject-domain fact tables are created in a unified data model on the Teradata Data Warehouse Appliance 2500. Every night after source data is loaded into the EDW, preprocessing jobs are initiated to load the cubes, followed by Microsoft Reporting Services jobs that take the cubes and build the reports. This processing is scheduled on event completion, rather than by time.
Spotfire for Quick Insights
Another analysis tool used at WD is TIBCO Spotfire, which has outstanding graphic displays and interactivity plus the ability to hold most data sets in memory for fast processing. Once engineers gather the relevant data, they use Spotfire to drill through the numerous factors to find the key patterns. For instance, if they see a combination of head-media types that is intriguing, they can drill down into more facts. Figure 8 shows an example of a Spotfire report.

Figure 8 - Example of Spotfire Report
WD has been using Spotfire for about six months, with seven developers (one in Asia) and about 30 consumers of the Spotfire analyses, including service center providers Teleplan and Selectron. The developers use the Spotfire Professional client application to create analyses for themselves and for wider distribution. These analyses are distributed by publishing via Spotfire Web; using the Spotfire client, WD users worldwide have access to these predefined Spotfire reports.

A WD data analyst reflected on his use of Spotfire: "Once I get the data I am interested in, I take it to Spotfire. The drill-through is the key to Spotfire. I may see a combination of head-media types that I'm intrigued by, so I then drill down into additional detail, such as parametric test results. Spotfire is useful because it handles large data sets and lets me drill down on them."
Spotfire has become so useful that all failure analysis data is now reviewed with it. With simple pull-down menus it is possible to select a product, failure type, customer, date range, and so on. With only a few clicks, users can drill down to the details. Failures like noisy motors, dynamic head slaps, no-fault-found, bad overwrite, firmware problems, cannot-initialize, and data erasure can be drilled into and visualized.

Spotfire reports reside in memory, with many options to pivot or filter the data while instantly refreshing the visualizations. Once a report is defined, it can be published on the Spotfire web server. Anyone who browses the library can view the reports and zoom in on the data of interest to them. Spotfire captures the data in the report when it is stored in the library, so users can go back and revisit an analysis done days earlier; the original data for a set of reports remains available. When the data becomes stale, Spotfire can refresh it from the Teradata data warehouse. Some published reports can be 100 MB or even 1 GB, and since the underlying query could run ten to twenty minutes, caching the results is useful. There is a project in the database administration team to automate the refresh of Spotfire reports on a weekly basis.
Another WD data analyst related an example of the effectiveness of Spotfire. While visiting a factory recently, he arrived in the morning with two hours to compile facts for a management meeting on the causes of certain failures. He quickly created several reports in Spotfire. Halfway through the meeting, product engineering was discussing design and process changes to improve the results instead of debating the data: "They went straight into solving the problems!"
Another Spotfire example is a treemapping analysis [9] used to find incorrect readings in large testing machines. Remember that WD tests 500,000 drives a day, so the workers need tools to help them find bad test slots, the proverbial needle in a haystack. Figure 9 shows the treemap for a large bank of testing machines, with the darker colors indicating unusually high rejection rates for a specific slot into which a disk drive is plugged for testing.

Figure 9 - Spotfire Identifies Bad Test Machines
Here is what is happening. Each machine tests 60 drives at a time and processes a thousand disk drives every day. Each drive is given several hours of testing before it can be sold. If it fails the first round, the drive endures a second test that lasts four solid days. After the second test, the results are compared to determine whether the drive is actually defective or whether there was a problem with the testing machine.
If a testing machine indicates that a drive is defective when it is actually good, this is a false failure, which is expensive in terms of labor, test machines, and drives. It is often caused by a mechanical failure, like bent HDD connector pins, in the primary test machine. Knowing whether there are bad slots in the primary test machines prevents wasted effort in the secondary testing process. With so many devices being tested for several hours in one machine, across dozens of test machines, the technicians who plug drives into slots cannot keep track of failures. Besides, some failure rates in a machine may simply be random, in that a few bad drives happened to be put in the same slot several times.

Using the treemap, the engineers can tell whether testing failures are caused by the testing machine itself. Red indicates high rates of failure for specific slots relative to other slots. Hence, the treemap analyzes the accuracy of the test machine, not the defects in the drives. Requiring over a million rows of data, this scale of analysis is beyond pivot tables in Excel. With a glance at the Spotfire treemap, engineers can quickly spot the testing machines and the specific slots with the highest failure rates.
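
The computation behind such a treemap can be sketched as per-slot rejection rates compared against the fleet-wide rate, flagging slots whose excess failure rate is too large to be random. The counts, slot identifiers, and threshold below are invented for illustration; the real analysis runs over a million rows in Spotfire against the warehouse.

```python
# Flag test slots whose rejection rate is far above the fleet average.
import math

# (machine, slot) -> (drives tested, drives rejected); invented data.
slots = {("M01", 7): (980, 11), ("M01", 8): (975, 64), ("M02", 3): (990, 9)}

total_tested = sum(n for n, _ in slots.values())
total_failed = sum(f for _, f in slots.values())
p = total_failed / total_tested            # fleet-wide rejection rate

for (machine, slot), (n, f) in slots.items():
    se = math.sqrt(p * (1 - p) / n)        # std. error of a proportion
    z = (f / n - p) / se
    if z > 3:                              # far above fleet average: suspect slot
        print(f"{machine} slot {slot}: {f}/{n} rejects (z={z:.1f}) -> inspect")
```

With the invented counts above, only machine M01 slot 8 is flagged; the treemap simply renders the same per-slot statistic as color intensity so a technician can see it at a glance.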
When the treemap was first developed, the CIO immediately asked the manager responsible for all the service centers worldwide to implement it. With only fifteen minutes of training, the staff quickly learned how to use the treemap effectively. In less than a week, all the service centers had changed their business process for testing. It was remarkable that WD was able to move a new analysis tool from initial insights to major business actions in one week. Fast decisions and corporate agility are clearly a trademark of WD. Indeed, "It is not the big that eat the small. It is the fast that eat the slow."
Data Deep Dive
Data Deep Dive (D3) is another analytic query tool, described as "data mining for the masses." Like QIS Detail Data reporting, D3 is user-friendly: engineers need only know the serial numbers of the failed drives. Unlike QIS Detail Data, D3 uses more sophisticated analytics, such as neural networks, logistic regression, and decision trees, to produce various reports, both graphical and tabular.

A D3 request may take two hours before the Excel result file is sent to the engineer. Every D3 request produces a collection of answers in one email to the requester: a user guide with tips and how-to guidelines, a neural net analysis, tribal knowledge, decision trees, failure analysis, and so on. There is a user training component so the user can understand the output; the user guide is sent with every data mining analysis, along with zip files containing the results. The requester sometimes does not understand statistical significance, so the output indicates which parameters in the current analysis are statistically significant, which may not be the column attributes they are most familiar with. Currently, about 20-30 engineers in factory and field quality are using D3.
D3 analytic processing is performed inside the Teradata system. A Perl script generates dynamic SQL, which creates a temporary table to be used for one or two hours. These semi-temporary tables are deleted after 30 days, since the files are not large and the engineer may want to revisit the analysis. For a single D3 request, the Perl scripts can issue an incredible 15,000 queries. Interestingly, FORTRAN is used for some of the analyses.
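
The dynamic-SQL generation can be pictured as a loop over measured parameters, emitting one comparison query per parameter between the failed population (held in a temporary table) and the rest of the drives. This Python sketch stands in for WD's Perl, and all table and column names are invented.

```python
# Hypothetical D3-style dynamic SQL generation: one comparison query per
# parameter; with thousands of parameters this easily reaches thousands of
# queries per request.
PARAMETERS = ["touch_down_power", "fly_height", "servo_gain"]  # ...thousands more

def d3_queries(params):
    for p in params:
        yield f"""
        SELECT '{p}' AS parameter,
               AVG(CASE WHEN f.serial_no IS NOT NULL THEN d.{p} END) AS failed_avg,
               AVG(CASE WHEN f.serial_no IS NULL     THEN d.{p} END) AS healthy_avg
        FROM drive_detail d
        LEFT JOIN failed_sn f ON f.serial_no = d.serial_no;"""

for q in d3_queries(PARAMETERS):
    print(q)  # in production, each query is submitted to the warehouse
```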
As an intense user of D3, one of the data analysts discovers new ways of solving quality problems:

"I mainly do ad hoc analysis. Maybe we roll out a new media type and run evaluations in the factory, but no one has compared media-A versus media-B across the entire manufacturing-test-field process. That's a straightforward analysis.

At a deeper level, an analysis may focus on test data, such as touch-down power during the build process for dynamic fly height adjustment, to find out how high above the media the head flies. There is actually no physical touching of the platter by the heads, although it comes very close. This touch-down parameter is unique to every drive and head combination. We take 80 measurements per drive on an 8-head drive. Low touch-down values reduce reliability.

Finally, there is a variance reduction competency team that predicts quality in the field versus engineering. For each drive, there are variances in the parameters of the thousands of components and in the manufacturing steps. The challenge is to determine which of those variances, and especially which combinations of variances, make a difference in the quality of the drive."
D3 is used to find the data that should be analyzed; that data is then transferred to Spotfire to be transformed into something understandable and useful. In other words, D3 is the starting point for selecting drive serial numbers and faults, and Spotfire generates the report results.

Another data analyst performs "what if" analyses in a big way: "The speed of getting to an answer is critical, because quick time to information allows us to drive a quick action." He has improved that cycle time drastically, by 2x or more.
When the magnifying glass is applied across major business processes, the database administrators may have to change the database design or analysis processes to eliminate confusion and ambiguity. For example, engineers need to compare test results with the build configuration, the list of specific components used in the disk drive. This requires joining data from two applications, one (XMMS) containing the test data and the other (MITECS) containing the build configuration. Both use the serial number of the disk drive as the primary key. Using the serial number works if the manufacturing is sequential; however, if a disk drive is tested and an error condition is detected, the drive must be rebuilt, for instance by replacing a head assembly, and then retested. Figure 10 shows this rework sequence.
[Figure 10 depicts the rework sequence: Build #1 -> Test #1; on error, Rebuild #2 -> Retest #2; on error, rework again; on pass, Ship.]

Figure 10 - Reworking Defective Builds
Using just the serial number, it is ambiguous which test results go with which build configuration, and engineers were spending considerable time resolving this confusion. The solution was for the database administrator to add a date/time stamp to the primary key. Each time the disk drive was reworked, the test results could be uniquely matched with the proper build configuration. Defective components entering the manufacturing line could be detected and corrected more quickly and with less effort.
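
The effect of the composite key can be sketched as follows: with a timestamp alongside the serial number, each test is matched to the most recent build that precedes it, so rework no longer creates ambiguity. The records below are invented; WD's schemas are not public.

```python
# Disambiguating rework with a (serial_no, timestamp) key: match each test
# to the most recent build that precedes it. Invented records.
builds = [("SN1", "2010-05-01T08:00", "config-A"),
          ("SN1", "2010-05-03T09:30", "config-B")]   # rebuild after failed test
tests  = [("SN1", "2010-05-02T10:00", "FAIL"),
          ("SN1", "2010-05-04T11:00", "PASS")]

for sn, t_time, result in tests:
    # Most recent build of this serial number at or before the test time
    # (ISO-8601 strings compare correctly as text).
    config = max((b for b in builds if b[0] == sn and b[1] <= t_time),
                 key=lambda b: b[1])[2]
    print(f"{sn} tested {t_time}: {result} against {config}")
# SN1's FAIL is attributed to config-A, the later PASS to config-B.
```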
The data analyst reviews the data to determine whether specific test software or some man-machine combination may be causing the problem. This data and analysis go to the product engineering teams, who do further analysis and make the changes. They then monitor the results of their changes. The lag time for error detection and correction can be as little as three weeks or as long as six months for problems involving field deployments and buyers.
New Product Design Evaluations
The Servo System Enabler Development team at WD is considered a key consumer of data from the Teradata system. The staff consists of ten PhD researchers, many from the University of California at Berkeley. The focus is new product introduction, supported by an analysis system called NPI Express. The objective is to shorten development time and increase the yield of new products, called EVALs because they are being evaluated for release to manufacturing. These analyses take a micro focus on the internal design and manufacturing processes, in contrast to the previous analyses, which took a macro focus on external end-to-end customer satisfaction. The manager remarked, "The EVAL is the focus! If we add a new gadget into the product, what is its predicted failure rate? This is very R&D-oriented yield management." The objective is to reduce the time spent on building prototypes, which usually consumes 80% of the development time. With NPI Express, development time has been shortened, and defects are reliably removed before the first production run of a new product is scheduled.
[Figure 11 depicts the NPI Express architecture: a nightly FastExport moves roughly 100 GB from the QIS warehouse into 12 MySQL servers; MATLAB PCs and Apache/PHP web servers turn that data into reports for the designers of new products.]

Figure 11 - NPI Express Architecture
As shown in Figure 11, NPI Express starts each night by extracting approximately 100+ GB of data from the QIS data warehouse using FastExport and loading it into an array of MySQL servers. Data is partitioned across servers by product, then further by function: servo, head, media, etc. Apache Tomcat and PHP are used to create and deliver reports. MATLAB generates the reports, which contain tabular data, graphs, trend curves, scatter plots, and so on. MATLAB runs on 50 PCs that obtain data from the MySQL databases, from flat files, or in some cases directly from the Teradata EDW, and then invokes PHP to produce the reports. Because the reports are pre-generated, analysts get instantaneous report delivery along with some drill-down capability. Redundancy is built into the MySQL configuration so that if one server is down, another picks up its report delivery tasks. The group that owns this application is a data mining group that monitors every single drive.
The team was asked whether it would be possible to run the NPI Express system directly on the Teradata system instead of MySQL. The reply was that, when this application was first built, the developers did not know about the QIS data warehouse. They built the NPI Express application by gathering data from many data sources, which was an enormous burden. When they discovered the QIS data warehouse, all that data gathering and cleansing was discarded by simply pulling everything from the Teradata system. With thousands of engineers using this application, there is no motivation to move it onto Teradata. As the manager remarked, "It works, don't fix it." But the QIS database is the system of record they depend on daily.
The manager described the history of NPI Express, going back to 2005. Back then, most data was locked in tens of thousands of static flat-file reports. He started building a database application using MySQL to contain all this data. He was so passionate about the analysis that he started out spending his own money, working on a server in his living room. To produce reports on servomechanisms, he collected data every day from many sources, which was very slow. In 2007, when Ross Gough told him about the QIS database on the Teradata EDW, data collection was reduced from several hours per day to just minutes. By the end of 2007, he had hired a full-time person to manage these servers, extracts, and reloads.

He concluded, "I am only touching a small portion of the Teradata data. There is much more I could use when time is available."
Benefits Realized
For a project like QIS that spans seven years, it is difficult to enumerate all the benefits realized. However, several clearly stand out.
First, WD's market share has grown consistently from 12% in 2001 to over 31% by mid-2010, surpassing long-time market share leader Seagate at about 30%. This growth has been organic, achieved by selling more and more, rather than through acquisitions. WD executives are convinced that the QIS project contributed the enabling tools across the entire company to ensure this market success.

Second, WD's obsession with quality improvement using QIS produced the best warranty accrual rate in the industry (at 1.9%) [10], implying the highest-quality product in the industry. Both shareholders and customers appreciate this.

Third, WD has the lowest rate of warranty-returned units in the industry. In particular, accurate forecasts have reduced its warranty reserves and enabled more accurate reporting to stockholders. Again, the bottom line is improved and customer satisfaction increases, proving that investments in quality pay off.

Fourth, WD's precision in identifying defective units reduces the number of units that need to be returned, increasing the trust and loyalty of its larger customers. This precision also avoids interrupting ongoing shipments of unaffected units.
Finally, there were numerous benefits in the reduction of reaction times, as shown in Figure 12.
Quantify total exposure
  Before QIS: date code range by platter count (weeks)
  After QIS: serial #s of affected and unaffected product within the date range (15 minutes)
  Benefit: avoid recall and rework costs on unaffected products sent to customers

Detect affected product outside of date range
  Before QIS: hand sort (weeks)
  After QIS: serial, package, and pallet #s of affected product (30 seconds)
  Benefit: cleared inventory with speed and accuracy, using a list of locations

Customer identification by box, pallet, invoice, date
  Before QIS: pallets (1 to 2 weeks)
  After QIS: WD and customer serial numbers identified (15 minutes)
  Benefit: ship unaffected product, using customer serial #s

Efficiency of recall
  Before QIS: weekly roll-up, receipt only (weekly)
  After QIS: daily status by serial #, received product, processed returns (daily)
  Benefit: better customer cooperation, lower customer risk, daily exposure insights

Field analysis
  Before QIS: separate incremental from standard returns (weekly)
  After QIS: call center, receipt, and field test data, all by serial number (daily)
  Benefit: predict and manage warranty reserves, reliability, spares, and fraud with accurate data

Data mining
  Before QIS: historical best guess plus intuitive engineering (weeks, months, or never)
  After QIS: supplier data to factory data to physics of failure, serial # x 1000 attributes (daily)
  Benefit: potential discrimination of failure rate per individual drive, % no rework, % left in field, new predictive metrics

Figure 12 - Before/After Comparison of Reaction Times
Lessons Learned
Each person interviewed was asked about lessons learned from the QIS project: what would they share with other IT professionals? Several insightful suggestions follow.
Leverage the Resources of Key Partners
Gary Meister suggested that it is important to get access to a network of people who have struggled with similar problems. Often this networking happens through key partners, such as technology vendors like Teradata. Gary continued with this example:

"Through our Teradata sales guys we talked to Teradata retailers to discuss mass market supply chains. We learned a lot about demand planning from their point of view. If Steve Cobb was any other sales guy, he would have ignored our requests and focused only on selling more Teradata products. We would have proceeded with our plans incorrectly. Access to thought leaders in other industries, thanks to Teradata, is valuable to Western Digital. So Teradata brings more to the table than hardware. We learned things about demand planning and supply planning that we needed to learn."
Sanity Check Your Data
The engineering data analyst emphasized doing a sanity check on your data; data quality is not optional. He continued:

"Don't blindly trust that the data you have is the correct data. Data quality is a high priority for the database administrators. An unexpected lack of data quality is our biggest thorn. I learned this the hard way. Recently two columns in a record started arriving from the source system swapped in position, which caused all kinds of problems. IT can get a black eye for data being wrong even when it came that way from the source. We have a lot of reports to indicate to us when data is missing or incorrect."
Continuously Evolve Processes
The engineering data analyst also suggested that a company must continuously evolve its processes and analyses. What WD did a year ago is different from what it does today. WD is therefore continuously revising everything with the intent of simplifying the use of data and getting more out of it.
Simplify Everything for Users
The engineering data analyst shared that everything must be simplified for the users. WD started by helping users understand simple facts and the related causes of problems. Then users moved on to pull-down parameter reports. Next they started exploring data mining. Don't jump in and throw complex things at them at the start; start users on the easiest things first and then build toward more complex capabilities.
Share Analysis Skills and Experiences
The field quality engineer suggested that the company must share analysis skills and experiences. WD publishes a document called "QIS Tips and Tricks" that describes how to find the data and interpret results. All users have access to this document because it helps them with techniques, definitions, and common tasks.
Need for a Data Strategy
The servo system manager suggested that every company needs a data strategy for centralizing data in one place. No one should have to deal with the mess of extracting from many systems and then cleaning the data to make it usable. One place should contain and manage all the data.
Pay Attention to the Organizational Cracks
The servo system manager also suggested that a company must spend labor and domain knowledge looking carefully at the cracks between organizational units. Every unit is strong-minded and possessive about its own data. Once each unit fixes its own faults, the hardest problems to discover and fix are those that are cross-organizational.
Synthesis
The story of the QIS project has been described through its corporate setting, business problem, business solution, and project details, and then summarized in terms of the benefits realized and the lessons learned by the people involved. What should we take away from this story?
First, WD has cultivated, with much patience and perseverance, a strong culture of discovering actionable information in detail data. Many manufacturing companies pay close attention to continually improving their processes and thereby the quality of their product. What is unique to WD is that this continual improvement mentality is applied first to the data about the manufacturing process. At WD, it is the data that drives the continual improvement of the products.
Second, the pervasive analytic culture at WD comes from management, not technology. Stories abound about how a particular analysis resolved some nasty problem, and when sharing those experiences, everyone repeatedly uses the analogy of the magnifying glass. This attitude is especially reinforced by the top executives, who credit analytics as the enabler for having the lowest defect rate in the industry and for growing from 12% to 31% market share. As mentioned earlier, "Sunlight is the best disinfectant. Errors can be fixed once you can see them."
Third, the discovery of new data sources is also relentless. It is like workers lining up with buckets of coal, eager to throw the coal into the central furnace to heat the building. Engineers are innovative in finding those buckets of data.
Fourth, developing and learning new analysis techniques does not scare people, as it does in many other corporations. Often the statistics course taken in college puts a damper on discussing anything involving mathematics. Not so at WD. References to Six Sigma, Cox proportional hazards models, and chi-square tests are everywhere. When a new analysis technique proves effective, there is no hesitation in making it available throughout the company, to engineers, technicians, and managers alike.
Fifth, WD has a wide variety of analysis tools in use, and the list is continually expanding. Analytics in its many forms is exploding in the IT marketplace, and WD is riding that wave. This is a significant long-term competitive advantage for WD.
Finally, WD made an early commitment to a single integrated data warehouse. This is the most important insight into the success of the QIS project and of the company in general. In its natural form, data is diverse and messy. Whatever and whoever initially collected the data did not have the ability or motivation to transform and enhance it so that it could easily be integrated into a shared data warehouse. This transformation and enhancement is a difficult process and a constant struggle, a cost paid up front before the business benefits can start accruing. WD made that commitment and is now accruing the benefits of the lowest defective parts per million in its market. Shareholders like it. WD customers demand it. The competition may be unhappy.

SIDEBAR: Social Responsibility is Smart Business
Gary Meister shared his experience with the importance of social responsibility for WD. He remarked that WD was doing Six Sigma [11] before there was a name for it; WD was doing business process re-engineering before it was in vogue, too. Last year Gary spent time delving into corporate social responsibility:

"WD was already doing all of the things the social responsibility experts suggested because it helps drive our costs down. We have been reducing our electrical costs for years. Now the rage is carbon footprint reduction. But for us, it was just the right thing to do long ago.

Also, we use a lot of water in our manufacturing processes. Reducing water consumption is another wave of corporate responsibility. But not all the water is used or consumed. What water WD does not use is sent back out cleaner than when we got it. WD is emitting ultra-clean water because purified water is used in fabrication to eliminate contaminants."
References

1. Note the title of Gary Meister: CIO and Senior VP of Customer Satisfaction. Not only does he manage all of IT, but it is a clear message from WD top management that IT services are to be integrated with customer satisfaction objectives.
2. Western Digital Corporation, Company Profile, February 2010.
3. http://www.thefreedictionary.com/obsession
4. See http://en.wikipedia.org/wiki/Hard_disk_drive for a technical overview of HDDs. The photo was extracted from this article.
5. See http://en.wikipedia.org/wiki/Survival_analysis for an overview of Survival Analysis, which predicts failure rates in mechanical systems.
6. http://www.mathworks.com/
7. The bulk copy (bcp) command of Microsoft SQL Server provides the ability to copy large numbers of records in or out of a database directly from the command line.
8. Perl is a high-level dynamic interpreted language with syntax influenced by C. http://en.wikipedia.org/wiki/Perl
9. Treemapping is an analysis technique that displays hierarchical data as a set of nested rectangles. http://en.wikipedia.org/wiki/Treemapping
10. From http://www.warrantyweek.com/
11. Six Sigma was defined by Motorola in 1981 to gauge the maturity of a manufacturing process; a Six Sigma process produces 99.99966% of its product without defects. http://en.wikipedia.org/wiki/Six_Sigma


About the Methodology
The objective of this case study is education: to share insights with other IT professionals so that we can mature as an industry amid escalating business challenges and rapidly evolving technology. An on-site visit involving a full day of interviews was conducted to adequately document the details of this case. Prior to the on-site visit, several telephone discussions narrowed the scope to a specific IT project with its business requirements, timeline, resources, and results. As the material was synthesized from the on-site interviews, several drafts were circulated for review. When the participants were satisfied with its contents, the document was submitted for publication approval by the company. All discussions and collected materials were considered confidential until the company approved the case study for publication.
Richard Hackathorn of Bolder Technology and Dan Graham of Teradata conducted the interviews. The persons interviewed at Western Digital Corporation were:

- Ross Gough, Director, Data Warehousing/Business Intelligence
- Gary Meister, CIO and Senior VP of Customer Satisfaction
- QIS Technical Architect
- QIS Senior Staff Engineer
- Principal Engineer, Field Quality
- Senior Principal Engineer, Advanced Reliability
- Engineering Manager of Servo System Development
About Bolder Technology
Bolder Technology Inc. is a twenty-year-old consultancy focused on Business Intelligence and Data Warehousing. The founder and president is Dr. Richard Hackathorn, who has over thirty years of experience in the Information Technology industry as a well-known industry analyst, technology innovator, and international educator. He has pioneered many innovations in database management, decision support, client-server computing, database connectivity, associative link analysis, data warehousing, and web farming.

Richard was a member of Codd & Date Associates and Database Associates, early pioneers in relational database management systems. In 1982, he founded MicroDecisionware Inc. (MDI), an early vendor of database connectivity products, growing the company to 180 employees; it was acquired by Sybase, now part of SAP, in 1994. He has published numerous articles in DM Review and BeyeNETWORK. He is a member of the IBM Gold Consultants and the Boulder BI Brain Trust. He has written three books and was a professor at the Wharton School and the University of Colorado. He received his degrees from the California Institute of Technology and the University of California, Irvine.
About the Sponsor
Teradata is the global technology leader in enterprise data warehousing, analytic applications, and data warehousing services. Organizations around the world rely on the power of Teradata's award-winning solutions to obtain a single, integrated view of their business to enhance decision-making, customer relationships, and profitability.


EB-6334 > 0311
