01-CIRP Annals - Manufacturing Technology Volume 60 issue 1 2011 [doi 10.1016%2Fj.cirp.2011.03.042] Katja Windt; Marc-Thorsten Hütt -- Exploring due date reliability in production systems using data mining methods ada

Exploring due date reliability in production systems using data mining
methods adapted from gene expression analysis

Katja Windt *, Marc-Thorsten Hütt
Jacobs University Bremen, School of Engineering and Science, Campus Ring 1, 28759 Bremen, Germany
Submitted by Hans-Peter Wiendahl (1), Hannover, Germany
1. Introduction
Technologically highly demanding processes of multi-variant
series production are influenced by different factors, e.g. changing
sequencing rules, high set-up times, and specific customer wishes. In
combination with a network-like material flow between different
manufacturing sites, high due date reliability, as one key logistics
target figure, is often difficult to achieve. The challenge is to
identify the parameters most relevant for due date reliability and
to derive effective counter-measures.
Due date reliability is measured by the parameter lateness L.
Lateness is defined as the difference between the actual delivery date
and the planned delivery date. Positive lateness $L^{+}$ ($L > 0$) means
that the order is delayed, whereas negative lateness $L^{-}$ ($L < 0$)
corresponds to a too-early delivery [1]. Reasons for lateness need to be
explored by feedback data analysis, assuming that the planned
delivery dates are based on realistic planning data. The main
characteristic of lateness is that all possible parameters of either
process or order attributes can in principle exert an influence. In
addition, parameter interdependencies make it more difficult to
identify the most relevant parameters for improving lateness. For
example, a change of an order in the queue at one machine
influences the order sequence at the subsequent machine. Hence,
the reason for lateness at one machine might be attributed to poor
sequence adherence at the previous machine. Even though this
would be a reason for lateness, it does not mean that the main
reason for lateness has been identified. The hypothesis of our
approach is that there is not just one reason for lateness. In fact, a
set of high-dimensional order and process attributes results in
lateness. The aim of this article is to present a set of methods
capable of coping with the complexity of due date behavior and
able to identify the most influential attribute sets for high or low
lateness.
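The lateness definition above can be illustrated with a minimal sketch. The order dates, the use of days as the unit, and the extra category 0 for on-time orders are assumptions made for illustration; the source only distinguishes positive and negative lateness.

```python
from datetime import date

def lateness_days(actual: date, planned: date) -> int:
    """Lateness L = actual delivery date minus planned delivery date (in days)."""
    return (actual - planned).days

def lateness_category(lateness: int) -> int:
    """Map lateness to +1 (delayed, L > 0) or -1 (early, L < 0).

    The value 0 for on-time orders is an assumption of this sketch.
    """
    if lateness > 0:
        return 1
    if lateness < 0:
        return -1
    return 0

# Hypothetical orders: (planned delivery date, actual delivery date)
orders = [
    (date(2011, 3, 1), date(2011, 3, 4)),    # delivered 3 days late
    (date(2011, 3, 10), date(2011, 3, 8)),   # delivered 2 days early
    (date(2011, 3, 15), date(2011, 3, 15)),  # delivered on time
]
latenesses = [lateness_days(actual, planned) for planned, actual in orders]
categories = [lateness_category(value) for value in latenesses]
```

Keeping the sign convention explicit (actual minus planned) matters, because every later step treats a positive value as a delay.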
The article is divided into two parts: the first part describes
extensions of conventional clustering methods and the second part
is devoted to a new method derived from gene expression data
analysis. Gene expression is the process of reading off an entity in
a biological genome, a gene, and then biochemically producing the
corresponding proteins encoded by this gene. Particularly over the
last decade, systems biology has developed a rich set of methods
(see, e.g. [2,3]) for systematically exploring the intricate activity
patterns of genes and relating them to the underlying production
system, i.e. to the networks of interacting genes (see [4,5]). A
fundamental challenge is to understand the origin of the robust
functioning of biological cells from these data. In many respects,
multi-variant series production of technologically challenging
products using network-like material flow layouts is confronted
with a similar complexity. To build a detailed model of such a
manufacturing scenario is nearly impossible, especially when the
numerous order sequencing rules are neither well known nor
understood at all the machines. Understanding lateness in
manufacturing as a result of the contribution of a multitude of
parameters is comparable to understanding the robust functioning
of biological cells from activity patterns of genes.
In a first approach, traditional data mining tools were used to
identify parameter combinations most influential for low or high
lateness, respectively. Each identified parameter combination (or
cluster in parameter space) was discussed and validated by experts
in the underlying manufacturing scenario (Sections 2 and 3). These
analysis techniques were augmented by methods from gene
expression studies (Section 4), namely an enrichment study (see,
e.g. [5,6] for details on the corresponding systems biology
applications). Here, enrichment means the over-representation
of a certain event type (here: positive or negative lateness) in a
cluster compared to a random selection of events. In the analysis of
gene regulatory networks, enrichment denotes, for example, the
over-representation of a certain biological function in a set of
expressed genes. A critical summary is presented in Section 5 along
with future research steps. All results are based on a real scenario
of steel manufacturing.

CIRP Annals - Manufacturing Technology 60 (2011) 473-476
Journal homepage: http://ees.elsevier.com/cirp/default.asp
0007-8506/$ - see front matter © 2011 CIRP.
doi:10.1016/j.cirp.2011.03.042
* Corresponding author.

ARTICLE INFO
Keywords:
Logistics
Pattern recognition
Scheduling

ABSTRACT
Identifying causes of lateness in multistage production systems demands methods for considering a
high-dimensional order and process attribute space. Simultaneous measurement of expression levels of
thousands of genes in a biological cell provides a data set for understanding robust cellular function.
Methods developed in computational systems biology for analyzing gene expression data enable the
identification of the most influential criteria sets. Gene expression is the production process of functional
elements (enzymes, proteins) in a biological cell. Logistics data analysis faces a similar challenge: What
attributes of orders can be associated with high and low punctuality? We combine methods from cluster
analysis and computational systems biology to explore the relationship between order and resource
parameters and lateness. With this novel approach we determine intrinsic interdependencies between
order parameters and process parameters. For the case study described here, this approach has improved
the precision of predicting the lateness of an order by 14% compared to a majority vote among
neighboring orders in parameter space.
© 2011 CIRP.
2. Data mining as an analysis tool for huge data sets
Tools for data mining have the primary goals of revealing
relations (descriptive data mining) and predicting outcomes
(predictive data mining) [7]. In recent years, the use of data
mining in manufacturing and logistics engineering has been growing [8].
In [9], principal components and cluster analysis are used for the
analysis of system performance based on feedback data of wiring
harness manufacturing and assembly stages. Another application
of cluster analysis is demonstrated in [10], where the aim is to
investigate the volumetric errors of a five-axis machine tool.
The general workflow of analysis in this paper is depicted in
Fig. 1. Starting from a large data set of orders (in this case approx.
$10^5$ individual orders), which are characterized in terms of several
tens of attributes, a subset of attributes is selected as potentially
relevant parameters.
Those processed data are then subjected to a traditional
cluster analysis and the biology-motivated enrichment analysis,
leading to transformed data (or, more specifically, the shapes of
clusters in parameter space). Further data mining steps then lead
to patterns that can be interpreted from a systemic perspective,
yielding a deeper understanding of the system at hand.
3. Data mining technique detecting reasons for lateness
3.1. Manufacturing data analysis by use of cluster analysis
The manufacturing scenario of sheet metal production involves
the main processes of steelmaking, continuous casting, adjusting of
slabs, forming, hot roller mill, dimensioning of coils, cold roller
mill, annealing, hot coating, and again adjusting of coils. Fig. 2 is the
schematic rendering of the material flow for the data analyzed
here. The size of the arrows represents the typical size of the
material flow at each point of the production process. While the
overall process architecture is linear, each individual order can
take very different paths through this network. Furthermore, some
of the nodes are distributed over different locations. The dominant
linear arrangement is due to the directedness of the production
along a sequence of technologically determined steps. These
technological and logistical flow restrictions show the real
complexity of the material flow network. As an example, on the
hot roller mill about 200 sequencing rules have to be taken into
consideration. The steel grade, customer-specific dimensions and
other attributes lead to a high product variety. Additional
complexity is created by shortfall quantities, highly prioritized
orders and changing consignments of customer orders to steel
slabs or coils. Overall, lateness was not satisfactory and needs to be
improved, ideally with a few, but very effective counter measures.
3.1.1. First step: Identification of order and process attributes
In close coordination with company representatives, attributes
were identified in categories such as order and production process.
A selection of (numerical and categorial) parameters used in the
cluster analysis is given in Table 1. Further attributes are used but
not listed for confidentiality reasons.
3.1.2. Second step: Cluster analysis
The k-means clustering algorithm is used in the following [11].
To ease the determination of the parameter k (number of clusters), the
sum of squared errors (SSE) was introduced. Since SSE
decreases as k grows, the challenge is to find a suitable k-value (large
distances between cluster centers on the one hand and small distances
between cluster element vectors on the other). With this initial parameter k the
cluster analysis was conducted. Relevant clusters are characterized
by a rather narrow distribution in the very high lateness regime
(cluster $C_2$ in Fig. 3). Each of the clusters $C_j$ has a footprint in each
of the attributes $i$, a distribution $P_i(C_j)$ of values, which in the
following is characterized by its average value $\mu[P_i(C_j)]$ and its
standard deviation $\sigma[P_i(C_j)]$. Fig. 3 summarizes this situation
schematically for one moderately relevant cluster, $C_1$, and one
relevant cluster, $C_2$. For clarity, we also represent the corresponding
distributions for all orders, namely the full lateness distribution
(x-axis) and the full distribution $P_i$ of attribute $i$ (y-axis;
characterized by $\mu[P_i(\mathrm{total})]$ and $\sigma[P_i(\mathrm{total})]$). After selecting the
most relevant clusters (Fig. 3) among all k clusters by means of size
and lateness, the chosen clusters are further analyzed.
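How SSE behaves as a function of k can be sketched with a small, self-contained k-means implementation. This is a minimal illustration on synthetic two-dimensional data, not the authors' tooling: the data points, the names `blob_centers` and `sse_by_k`, and the idea of looking for the k at which the SSE curve flattens are assumptions, since the source does not state how the final k was chosen.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two attribute vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_center(point, centers):
    """Index of the cluster center closest to the given point."""
    return min(range(len(centers)), key=lambda j: squared_distance(point, centers[j]))

def k_means(points, k, iterations=100, seed=0):
    """Plain Lloyd-style k-means; returns (centers, labels, sse)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k randomly chosen points
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest center.
        labels = [nearest_center(p, centers) for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep the center of an empty cluster
        if new_centers == centers:
            break
        centers = new_centers
    labels = [nearest_center(p, centers) for p in points]
    sse = sum(squared_distance(p, centers[lab]) for p, lab in zip(points, labels))
    return centers, labels, sse

# Synthetic data (assumption): three groups of points in a 2-D parameter space.
rng = random.Random(42)
blob_centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 8.0)]
points = [(cx + rng.gauss(0, 1), cy + rng.gauss(0, 1))
          for cx, cy in blob_centers for _ in range(30)]

sse_by_k = {k: k_means(points, k)[2] for k in range(1, 7)}
for k, sse in sse_by_k.items():
    print(f"k={k}: SSE={sse:.1f}")
```

Because SSE keeps shrinking as clusters are added, it cannot be minimized directly; a suitable k has to balance compact clusters against well-separated centers, as described above.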
To measure the relevance of each cluster attribute, we introduce
the parameter distinctness T for numerical and categorial attributes.
For numerical attributes, the distinctness T is the normalized sum of
the differences (cluster vs. total orders) of the means and standard
deviations. For categorial attributes, it is the sum over all categories
of event differences. T can be used to rank the attributes of a certain
cluster and to prioritize them accordingly in the later interpretation
of the results.
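One possible reading of the distinctness T is sketched below. The source does not give the exact normalization, so two assumptions are made and should be treated as such: for numerical attributes the summed differences are divided by the standard deviation of all orders, and for categorial attributes the "event differences" are taken as absolute differences of relative frequencies. The data values are hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev

def distinctness_numerical(cluster_values, total_values):
    """Sum of |mean difference| and |std difference| (cluster vs. all orders).

    ASSUMPTION: normalized by the standard deviation of all orders.
    """
    mean_diff = abs(mean(cluster_values) - mean(total_values))
    std_total = pstdev(total_values)
    std_diff = abs(pstdev(cluster_values) - std_total)
    return (mean_diff + std_diff) / std_total

def distinctness_categorial(cluster_values, total_values):
    """Sum over all categories of the difference in relative frequency.

    ASSUMPTION: 'event differences' are absolute frequency differences.
    """
    freq_cluster = Counter(cluster_values)
    freq_total = Counter(total_values)
    categories = set(freq_cluster) | set(freq_total)
    return sum(
        abs(freq_cluster[c] / len(cluster_values) - freq_total[c] / len(total_values))
        for c in categories
    )

# Hypothetical attribute values for all orders and for one cluster.
total_thickness = [0, 2, 4, 6]
cluster_thickness = [0, 2]
total_quality = ["low", "high", "high", "high"]
cluster_quality = ["low", "low"]

scores = {
    "thickness": distinctness_numerical(cluster_thickness, total_thickness),
    "surface quality": distinctness_categorial(cluster_quality, total_quality),
}
# Rank the attributes of the cluster by decreasing distinctness.
ranking = sorted(scores, key=scores.get, reverse=True)
```

Whatever normalization is chosen, the ranking step is the part that feeds the later interpretation: attributes with the largest T are examined first.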
Fig. 1. Data mining process diagram (adapted from [7]).
Fig. 2. Material flow network of steel manufacturing.

Table 1
Attributes and parameters used in cluster analysis.
Category           | Attribute          | Parameter
Order              | Type of product    | Steel grade, product category
                   | Dimensions         | Width, thickness
                   | Steel casts        | Number of suitable steel casts
                   | Weight             | Coilweight, order weight
Production process | Production depth   | Number of production steps
                   | Change of sites    | Number of location changes
                   | Rework intensity   | Number of rework processes
                   | Quality            | Surface quality, steel quality
                   | Allocation changes | Number of order-product allocation changes
3.2. Results and validation of cluster analysis
Fig. 4 shows the results of the relevance analysis described in
Fig. 3 for all k clusters. In total, 10 clusters (marked in red) are
classified as relevant according to output lateness and cluster size
(grey bars in lower part of Fig. 4).
One example of a detailed analysis of attribute relevance of one
cluster is given in Table 2. Thickness and surface quality were
identified as the most influential attributes towards lateness in this
cluster. The validation of these findings by expert interviews has
shown that the respective steel grades in this cluster were mainly
served by unplanned degraded material from other orders, as a low
surface quality is one key attribute. Degraded steel slabs or coils
are then used for orders with lower quality requirements. As these
types of orders are not planned regularly, a clear reason for lateness
was provided by the experts. The small portion and profit margin of
these steel types do not call for counter measures. But it was
demonstrated that cluster analysis is an effective tool to identify
causes for lateness.
4. Enrichment analysis: high dimensional multi data methods
adapted from gene expression analysis
4.1. Fundamentals of enrichment analysis
The enrichment analysis developed here builds on related
methods for the analysis of gene expression data ([4,6]). Like before,
its first step is a clustering analysis of orders in a parameter space.
The k clusters $C_i$, $i = 1, \ldots, k$, identified in this way are then assessed
with respect to their statistical over- and under-representation of
positive lateness values. This is the enrichment factor $R(C)$ of a
cluster $C$ with size $|C|$. It is defined via an intermediate step
involving the (logarithmic) proportion $X(C)$ of delayed orders
(positive lateness) in the cluster:
$$X(C) = \log \frac{p_{+1}(C)}{p_{+1}}$$
where $p_{+1}(C)$ is the percentage of +1 events in cluster $C$ and $p_{+1}$
denotes the percentage of +1 events across all orders. Starting
from this quantity, the enrichment factor $R(C)$ of the cluster $C$ is
now defined as the z-score of this quantity $X(C)$:
$$R(C) = \frac{X(C)}{\sigma(C)}$$
where $\sigma(C)$ is the standard deviation of the quantity $X$ obtained
from delayed orders for $|C|$ randomly selected events. For
simplicity, the lateness L was mapped onto two categories: early
(negative lateness; category CL = -1) and delayed (positive
lateness; category CL = +1).
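The two definitions above can be turned into a short numerical sketch. It estimates $\sigma(C)$ by repeatedly drawing $|C|$ orders at random from all orders. The label lists, the number of random draws, and the 0.5 pseudocount (added so that a sample without any delayed order does not produce log(0)) are assumptions of this illustration and are not taken from the source.

```python
import math
import random
from statistics import pstdev

def log_proportion(labels, p_all):
    """X = log(p_{+1}(sample) / p_{+1}).

    ASSUMPTION: a 0.5 pseudocount keeps the logarithm finite when a sample
    contains no delayed order.
    """
    delayed = sum(1 for label in labels if label == 1)
    p_sample = (delayed + 0.5) / (len(labels) + 1)
    return math.log(p_sample / p_all)

def enrichment_factor(cluster_labels, all_labels, n_random=1000, seed=0):
    """R(C) = X(C) / sigma(C), with sigma(C) estimated from random draws of size |C|."""
    rng = random.Random(seed)
    p_all = sum(1 for label in all_labels if label == 1) / len(all_labels)
    x_cluster = log_proportion(cluster_labels, p_all)
    x_random = [
        log_proportion(rng.sample(all_labels, len(cluster_labels)), p_all)
        for _ in range(n_random)
    ]
    return x_cluster / pstdev(x_random)

# Hypothetical lateness categories: 40% of all orders are delayed (+1).
all_labels = [1] * 400 + [-1] * 600
late_cluster = [1] * 18 + [-1] * 2    # mostly delayed orders
early_cluster = [1] * 2 + [-1] * 18   # mostly early orders

r_late = enrichment_factor(late_cluster, all_labels)
r_early = enrichment_factor(early_cluster, all_labels)
```

A strongly positive value marks a cluster in which delayed orders are over-represented, a strongly negative value one in which they are under-represented; values near zero are what a random selection of the same size would typically give.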
4.2. Enrichment analysis results
The enrichment analysis identifies clusters with above-random
occurrences of events with high and low lateness, respectively. In a
subsequent step those individual parameters are extracted that
separate high and low lateness cases most strongly.
To this end, we consider the two clusters with the highest and
lowest enrichment factor, respectively, and analyze the distribution
of one of the parameters, say $P_i$, within each of the two clusters.
These distributions are characterized by their respective mean
values $\mu_{+}(P_i)$ and $\mu_{-}(P_i)$, and standard deviations $\sigma_{+}(P_i)$ and $\sigma_{-}(P_i)$.
Whenever the difference of the mean values is larger than the sum
of the standard deviations, this parameter is said to exert a strong
influence on lateness:
$$r(P_i) = \frac{|\mu_{+}(P_i) - \mu_{-}(P_i)|}{\sigma_{+}(P_i) + \sigma_{-}(P_i)} > 1$$
An arrow in Fig. 5 pointing towards lateness indicates such an
influence (thin arrow: $1 < r(P_i) \le 2$, thick arrow: $r(P_i) > 2$). In the
previous analysis, lateness served as the evaluation parameter, i.e.
as an outside label, whose distribution in the parameter space is
evaluated. In principle, we can apply the enrichment analysis also
consecutively to each of the parameters, serving as the outside
label, while the remaining parameters form the parameter space.
In this way, we can identify the network of influences among the
parameters (arrows in Fig. 5), as mapped out by the available data.
This application of the enrichment analysis has been done for all
process parameters.
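The criterion for a strong influence can be sketched as follows. The parameter values are hypothetical, and the use of the population standard deviation is an assumption, since the source does not say which estimator was used.

```python
from statistics import mean, pstdev

def separation_ratio(values_high, values_low):
    """r = |mu_+ - mu_-| / (sigma_+ + sigma_-) for one parameter.

    values_high: parameter values in the cluster with the highest enrichment factor
    values_low:  parameter values in the cluster with the lowest enrichment factor
    """
    mean_gap = abs(mean(values_high) - mean(values_low))
    return mean_gap / (pstdev(values_high) + pstdev(values_low))

def arrow_style(r):
    """Arrow convention of Fig. 5: thin for 1 < r <= 2, thick for r > 2."""
    if r > 2:
        return "thick"
    if r > 1:
        return "thin"
    return "none"

# Hypothetical values of two parameters in the two extreme clusters.
thickness_high, thickness_low = [1.0, 1.2, 0.8, 1.0], [3.0, 3.4, 2.6, 3.0]
width_high, width_low = [1000, 1400], [1100, 1500]

r_thickness = separation_ratio(thickness_high, thickness_low)
r_width = separation_ratio(width_high, width_low)
styles = {"thickness": arrow_style(r_thickness), "width": arrow_style(r_width)}
```

In this toy data the thickness distributions barely overlap, so the ratio is well above 2, while the width distributions overlap almost completely and no arrow would be drawn.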
From validation workshops with experts, it is known that
produced tonnage is the most important target figure. It is clear
that weight and thickness have a strong impact on lateness (the
thinner the material and the lower the coilweight, the higher the
lateness). Consequently, counter measures need to address the
performance target system of the entire company as well. It is also
shown that thickness and coilweight have an indirect influence on
lateness: the higher the thickness and coilweight, the higher the number of
location changes, and the higher the number of location changes
and the greater the number of production steps, the higher the lateness.
It is easy to see that changing manufacturing sites involves
risks of delays, especially if these changes often occur
unplanned. Equally, the likelihood of delays is higher the more
production steps are needed. Knowing about these multi-dimensional
causes and effects makes it possible to derive effective
counter measures in two respects: to avoid a cause (e.g. reduce the
number of location changes, if possible, for those orders of high
coilweight and high thickness) or to consider the needed change-over
time in the respective planning algorithm.
Fig. 3. Identification of relevant clusters and attributes.
Fig. 4. Identification of relevant clusters.
Lastly, it is informative to use the enrichment of clusters with
respect to lateness as a predictor for the lateness of individual
orders. The quality of such a prediction is then given by the number
of successfully predicted lateness types (positive or negative
lateness) compared to the measured false positives and false
negatives. This forecast quality can be directly compared with the
simplest prediction scheme, where lateness labels (positive,
negative) are assigned at random, and a more sophisticated
scheme, where a majority vote among the five nearest neighbors of
an order in parameter space is used as the predictor of the lateness
type.
Fig. 6 shows the quality (given by the percentage of true
positives and the precision, i.e. the ratio of true positives to all
predicted positives) of the enrichment-based prediction scheme (third bar)
compared with such a random prediction scheme (first bar) and
with a neighbor-based prediction scheme (second bar). The
performance increase of the prediction based on the enrichment
analysis compared to the other two schemes is clearly visible.
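The neighbor-based baseline and the two quality measures can be sketched as below. This covers only the five-nearest-neighbor majority vote and the metrics, not the enrichment-based predictor itself. All data points are hypothetical, the attributes are left unscaled for brevity, and reading "percentage of true positives" as the share of actually delayed orders that were predicted as delayed is an assumption.

```python
def knn_majority_vote(query, points, labels, k=5):
    """Predict +1 (delayed) or -1 (early) from the k nearest orders in parameter space."""
    by_distance = sorted(
        range(len(points)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(query, points[i])),
    )
    votes = sum(labels[i] for i in by_distance[:k])  # odd k avoids ties
    return 1 if votes > 0 else -1

def precision_and_true_positive_rate(predicted, actual):
    """Precision = TP / (TP + FP); true-positive rate = TP / (TP + FN)."""
    tp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 1)
    fp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == -1)
    fn = sum(1 for p, a in zip(predicted, actual) if p == -1 and a == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    tp_rate = tp / (tp + fn) if tp + fn else 0.0
    return precision, tp_rate

# Hypothetical orders: (thickness in mm, number of production steps) with known labels.
train_points = [(1.0, 7), (1.2, 8), (0.9, 6), (1.1, 7),
                (3.0, 4), (3.2, 3), (2.8, 5), (3.1, 4)]
train_labels = [1, 1, 1, 1, -1, -1, -1, -1]

test_points = [(1.0, 7.5), (3.0, 3.5), (1.3, 6.5)]
test_labels = [1, -1, -1]

predicted = [knn_majority_vote(q, train_points, train_labels) for q in test_points]
precision, tp_rate = precision_and_true_positive_rate(predicted, test_labels)
```

Here the third test order is thin and needs many production steps but was in fact early, so the vote produces one false positive and the precision drops to 0.5 while every actually delayed order is still found.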
5. Conclusion and outlook
Causes of lateness were identified by conventional cluster
analysis and enrichment analysis for a real manufacturing
scenario. Critical issues in both cluster analysis and enrichment
analysis are the separation of numerical and categorial parameters
and the selection of process and order parameters. Process
parameters can result from order parameters, which means that
in these cases the attributes depend on each other,
whereas independence is needed for proper statistical results.
Furthermore, not all parameters evaluated as important could be
selected for analysis due to lack of data.
In this first work, the cluster analysis as well as the enrichment analysis
has been validated by experts in a real manufacturing
scenario as a promising technique to identify reasons for lateness.
As this method is independent of specific industry sectors, a
common method for analyzing lateness is presented. Further
research will focus on the development of independent
parameter selection criteria and the statistical combination of
numerical and categorial parameters.
Acknowledgements
We are grateful to Michaela Kording for performing several data
mining analyses and to Johannes Nicolas Gebhardt and Ivelin Kolev
for the practical interpretation and expert validation of the analysis
results, as well as to Jörn Eilmann (Institut für Fabrikanlagen und
Logistik, University of Hannover) for contributing to Fig. 2.
The research of Katja Windt is supported by the Alfried Krupp
Prize for Young University Teachers of the Alfried Krupp von
Bohlen und Halbach-Foundation.
References
[1] Nyhuis P, Wiendahl H-P (1999) Fundamentals of Production Logistics. Springer
Verlag. p. 24.
[2] Klipp E, Liebermeister W, Wierling C, Kowald A, Lehrach H, Herwig R (2009)
Systems Biology: A Textbook. Wiley.
[3] Liebovitch LS, Shehadeh LA, Jirsa VK, Hütt M-Th, Marr C (2010) Determining
the Properties of Gene Regulatory Networks from Expression Data. In Das S,
Caragea D, Hsu WH, Welch SM (Eds.) Computational Methodologies in Gene
Regulatory Networks. IGI Global Publishing.
[4] Marr C, Theis FJ, Liebovitch LS, Hütt M-Th (2010) Patterns of Subnet Usage in
the Transcriptional Regulatory Network of Escherichia coli. PLoS Computational
Biology 6:e1000836.
[5] Marr C, Geertz M, Hütt M-Th, Muskhelishvili G (2008) Dissecting the Logical
Types of Network Control in Gene Expression Profiles. BMC Systems Biology 2:18.
[6] Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA,
Paulovich A, Pomeroy SL, Golub TR, Lander ES, Mesirov JP (2005) Gene Set
Enrichment Analysis: A Knowledge-based Approach for Interpreting Genome-wide
Expression Profiles. Proceedings of the National Academy of Sciences
102:15545-15550.
[7] Fayyad U-M, et al. (1996) Advances in Knowledge Discovery and Data Mining.
AAAI/MIT Press.
[8] Choudhary A-K, et al. (2009) Data Mining in Manufacturing: A Review Based
on the Kind of Knowledge. Journal of Intelligent Manufacturing 20:501-521.
[9] Cunha PF (2005) Knowledge Acquisition from Assembly Operational Data
Using Principal Components Analysis and Cluster Analysis. CIRP Annals -
Manufacturing Technology 54(1):27-30.
[10] Erkan T, Mayer JRR (2010) A Cluster Analysis Applied to Volumetric Errors of
Five-axis Machine Tools Obtained by Probing an Uncalibrated Artefact. CIRP
Annals - Manufacturing Technology 59(1):539-542.
[11] Witten I-H, Frank E (2005) Data Mining: Practical Machine Learning Tools and
Techniques. 2nd ed. Morgan Kaufmann.
Fig. 6. Lateness prediction results: true positives (A) and precision (B).
Table 2
Detailed analysis of relevant cluster attributes.
Cluster attributes             | Total orders | Cluster   | Distinctness T
Fraction of orders             | 100%         | 2.0%      |
Lateness [arbitrary units]     | 0.44 ± 1.6   | 0.9 ± 1.9 |
Numerical cluster parameters
Thickness [mm]                 | 2.0 ± 2.6    | 1.0 ± 1.1 | * 1.46
Order weight [t]               | 84 ± 181     | 118 ± 155 | * 0.49
Number of production steps     | 5.3 ± 1.8    | 6.8 ± 1.3 | * 0.57
Number of suitable steel casts | 4.7 ± 9.1    | 4.0 ± 3.5 | 1.07
Categorial cluster parameters
Steel grade: A                 | 2%           | 41%       | * 0.53
Steel grade: B                 | 3%           | 10%       |
Steel grade: C                 | 0%           | 8%        |
Surface quality: High          | 17%          | 2%        | * 0.80
Surface quality: Medium        | 73%          | 0%        |
Surface quality: Low           | 9%           | 96%       |
Product category: A            | 21%          | 58%       | * 0.51
Product category: B            | 8%           | 18%       |
Product category: C            | 1%           | 5%        |
* low; medium; and *high.
Fig. 5. Hierarchy of parameters and influence on lateness.
(Figure labels: order parameters and process parameters, with lateness as the evaluation parameter. Nodes: thickness, width, coilweight, order weight, surface quality, number of suitable steel casts, order-product allocation changes, number of location changes, number of production steps, number of rework processes. Legend: direct relation, inverse relation; influence on lateness, influence on process parameter.)
