(L < 0) is delivered too early [1]. Reasons for lateness need to be
explored by feedback data analysis, assuming that the planned
delivery dates are based on realistic planning data. The main
characteristic of lateness is that all parameters of either
process or order attributes can in principle exert an influence. In
addition, parameter interdependencies make it more difficult to
identify the most relevant parameters for improving lateness. For
example, a change in the position of an order in the queue at one
machine influences the order sequence at the succeeding machine.
Hence, the reason for lateness at one machine might be attributed to
poor sequence adherence at the previous machine. Even if this
is a reason for lateness, it does not mean that the main
reason for lateness has been identified. The hypothesis of our
approach is that there is not just one reason for lateness. Rather, a
set of high-dimensional order and process attributes results in
lateness. The aim of this article is to present a set of methods
capable of coping with the complexity of due date behavior and
able to identify the most influential attribute sets for high or low
lateness.
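The sign convention for lateness used above can be made concrete with a minimal sketch. It assumes the usual definition, lateness L = actual delivery date minus planned delivery date, so that L < 0 is an early delivery. The order names and dates are hypothetical and serve only as illustration.

```python
from datetime import date


def lateness_days(actual: date, planned: date) -> int:
    """Lateness L in days: L > 0 is late, L < 0 is early, L == 0 is on time."""
    return (actual - planned).days


def delivery_type(lateness: int) -> str:
    """Map the sign of the lateness to a label."""
    if lateness > 0:
        return "late"
    if lateness < 0:
        return "early"
    return "on time"


# Hypothetical orders: (actual delivery date, planned delivery date).
orders = {
    "order-1": (date(2011, 3, 10), date(2011, 3, 7)),
    "order-2": (date(2011, 3, 5), date(2011, 3, 7)),
    "order-3": (date(2011, 3, 7), date(2011, 3, 7)),
}

for name, (actual, planned) in orders.items():
    L = lateness_days(actual, planned)
    print(name, L, delivery_type(L))
```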
The article is divided into two parts: The first part describes
extensions of conventional clustering methods and the second part
is devoted to a new method derived from gene expression data
analysis. Gene expression is the process of reading off an entity in
a biological genome, a gene, and then biochemically producing the
corresponding proteins encoded by this gene. Particularly over the
last decade, systems biology has developed a rich set of methods
(see, e.g. [2,3]) for systematically exploring the intricate activity
patterns of genes and relating them to the underlying production
system, i.e. to the networks of interacting genes (see [4,5]). A
fundamental challenge is to understand the origin of the robust
functioning of biological cells from these data. In many respects,
multi-variant series production of technologically challenging
products using network-like material flow layouts is confronted
with a similar complexity. To build a detailed model of such a
manufacturing scenario is nearly impossible, especially when the
numerous order sequencing rules are neither well known nor
understood at all the machines. Understanding lateness in
manufacturing as a result of the contribution of a multitude of
parameters is comparable to understanding the robust functioning
of biological cells from activity patterns of genes.
CIRP Annals - Manufacturing Technology 60 (2011) 473–476
doi:10.1016/j.cirp.2011.03.042
ARTICLE INFO
Keywords:
Logistics
Pattern recognition
Scheduling
ABSTRACT
Identifying causes of lateness in multistage production systems
demands methods for considering a high-dimensional order and process
attribute space. Simultaneous measurement of expression levels of
thousands of genes in a biological cell provides a data set for
understanding robust cellular function. Methods developed in
computational systems biology for analyzing gene expression data
enable the identification of the most influential criteria sets. Gene
expression is the production process of functional elements (enzymes,
proteins) in a biological cell. Logistics data analysis faces a
similar challenge: What attributes of orders can be associated with
high and low punctuality? We combine methods from cluster analysis and
computational systems biology to explore the relationship between
order and resource parameters and lateness. With this novel approach
we determine intrinsic interdependencies between order parameters and
process parameters. For the case study described here, this approach
has improved the precision of predicting the lateness of an order by
14% compared to a majority vote among neighboring orders in parameter
space.
© 2011 CIRP.
In a first approach, traditional data mining tools were used to
identify parameter combinations most influential for low or high
lateness, respectively. Each identified parameter combination (or
cluster in parameter space) was discussed and validated by experts
in the underlying manufacturing scenario (Sections 2 and 3). These
analysis techniques were augmented by methods from gene
expression studies (Section 4), namely an enrichment study (see,
e.g. [5,6] for details on the corresponding systems biology
applications). Here, enrichment means the over-representation
of a certain event type (here: positive or negative lateness) in a
cluster compared to a random selection of events. In the analysis of
gene regulatory networks, enrichment denotes, for example, the
over-representation of a certain biological function in a set of
expressed genes. A critical summary is presented in Section 5 along
with future research steps. All results are based on a real scenario
of steel manufacturing.
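The notion of enrichment, i.e. over-representation of an event type in a cluster compared to a random selection, can be illustrated with a short sketch. The hypergeometric tail probability used here is a common way to quantify over-representation in gene set analysis; the text does not state which statistic the authors applied, so the choice of test, the function names and the counts are assumptions made for illustration.

```python
from math import comb


def enrichment_ratio(n_total, n_late_total, n_cluster, n_late_cluster):
    """Fraction of late orders in the cluster relative to the overall fraction."""
    return (n_late_cluster / n_cluster) / (n_late_total / n_total)


def enrichment_p_value(n_total, n_late_total, n_cluster, n_late_cluster):
    """Hypergeometric tail: probability that a random selection of n_cluster
    orders contains at least n_late_cluster late orders."""
    upper = min(n_cluster, n_late_total)
    favourable = sum(
        comb(n_late_total, k) * comb(n_total - n_late_total, n_cluster - k)
        for k in range(n_late_cluster, upper + 1)
    )
    return favourable / comb(n_total, n_cluster)


# Hypothetical counts: 1000 orders, 300 of them late;
# one cluster holds 50 orders, 30 of them late.
ratio = enrichment_ratio(1000, 300, 50, 30)
p = enrichment_p_value(1000, 300, 50, 30)
print(f"enrichment ratio = {ratio:.2f}, p-value = {p:.2e}")
```

A ratio above 1 together with a small tail probability indicates that late orders are over-represented in the cluster beyond what a random selection of the same size would be expected to show.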
2. Data mining as an analysis tool for huge data sets
Tools for data mining have the primary goals of revealing
relations (descriptive data mining) and predicting outcomes
(predictive data mining) [7]. In recent years the use of data
mining in manufacturing and logistics engineering has been growing [8].
In [9] principal components and cluster analysis are used for the
analysis of system performance based on feedback data from wiring
harness manufacturing and assembly stages. Another application
of cluster analysis is demonstrated in [10], where the aim is to
investigate the volumetric errors of a five-axis machine tool.
The general workflow of analysis in this paper is depicted in
Fig. 1. Starting from a large data set of orders (in this case approx.
$10^5$ individual orders), which are characterized in terms of several
tens of attributes, a subset of attributes is selected as potentially
relevant parameters.
These processed data are then subjected to a traditional
cluster analysis and the biology-motivated enrichment analysis,
leading to transformed data (or, more specifically, the shapes of
clusters in parameter space). Further data mining steps then lead
to patterns that can be interpreted from a systemic perspective,
yielding a deeper understanding of the system at hand.
3. Data mining technique detecting reasons for lateness
3.1. Manufacturing data analysis by use of cluster analysis
The manufacturing scenario of sheet metal production involves
the main processes of steelmaking, continuous casting, adjusting of
slabs, forming, hot roller mill, dimensioning of coils, cold roller
mill, annealing, hot coating, and again adjusting of coils. Fig. 2 is the
schematic rendering of the material flow for the data analyzed
here. The size of the arrows represents the typical size of the
material flow at each point of the production process. While the
overall process architecture is linear, each individual order can
take very different paths through this network. Furthermore, some
of the nodes are distributed over different locations. The dominant
linear arrangement is due to the directedness of the production
along a sequence of technologically determined steps. These
technological and logistical flow restrictions show the real
complexity of the material flow network. As an example, on the
hot roller mill about 200 sequencing rules have to be taken into
consideration. The steel grade, customer-specific dimensions and
other attributes lead to a high product variety. Additional
complexity is created by shortfall quantities, highly prioritized
orders and changing consignments of customer orders to steel
slabs or coils. Overall, lateness was not satisfactory and needs to be
improved, ideally with a few, but very effective countermeasures.
3.1.1. First step: Identification of order and process attributes
In close coordination with company representatives, attributes
were identified in categories such as order and production process.
A selection of (numerical and categorial) parameters used in the
cluster analysis is given in Table 1. Further attributes are used but
not listed for confidentiality reasons.
3.1.2. Second step: Cluster analysis
The k-means clustering algorithm is used in the following [11].
To determine the parameter k (number of clusters), the
sum of squared errors (SSE) was used. Since the SSE
decreases with increasing k, the challenge is to find a suitable
value of k (large distances between cluster centers on the one
hand and small distances between the element vectors within a
cluster on the other). With this initial parameter k the
cluster analysis was conducted. Relevant clusters are characterized
by a rather narrow distribution in the very high lateness regime
(cluster $C_2$ in Fig. 3). Each of the clusters $C_j$ has a footprint
in each of the attributes i, a distribution $P_i(C_j)$ of values,
which in the following is characterized by its average value
$\mu[P_i(C_j)]$ and its standard deviation $\sigma[P_i(C_j)]$. Fig. 3
summarizes this situation schematically for one moderately relevant
cluster, $C_1$, and one relevant cluster, $C_2$. For clarity, we also
represent the corresponding distributions for all orders, namely the
full lateness distribution (x-axis) and the full distribution $P_i$
of attribute i (y-axis; characterized by $\mu[P_i(\mathrm{total})]$
and $\sigma[P_i(\mathrm{total})]$). After selecting the
most relevant clusters (Fig. 3) among all k clusters by means of size
and lateness, the chosen clusters are further analyzed.
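The role of the SSE in choosing k, and the footprint of a cluster in one attribute, can be sketched as follows. This is a plain implementation of the standard k-means iteration on synthetic two-dimensional data, not the authors' tooling; the data, the random seeds and the helper names are assumptions made for illustration.

```python
import random
from statistics import mean, pstdev


def sq_dist(a, b):
    """Squared Euclidean distance between two attribute vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=50, seed=0):
    """Standard k-means; returns (centers, clusters, SSE)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            clusters[j].append(p)
        # Update step: each center moves to the mean of its cluster.
        new_centers = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centers[j]
            for j, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    sse = sum(sq_dist(p, centers[j]) for j, cl in enumerate(clusters) for p in cl)
    return centers, clusters, sse


def footprint(cluster, i):
    """Mean and standard deviation of attribute i within one cluster."""
    values = [p[i] for p in cluster]
    return mean(values), pstdev(values)


# Synthetic data: three groups of 30 points in a 2-D parameter space.
rng = random.Random(1)
blobs = [(0.0, 0.0), (5.0, 5.0), (0.0, 6.0)]
points = [
    (cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
    for cx, cy in blobs
    for _ in range(30)
]

# SSE for a range of k; a suitable k lies where the decrease levels off.
sse_by_k = {k: kmeans(points, k)[2] for k in range(1, 6)}
for k, sse in sse_by_k.items():
    print(k, round(sse, 1))

centers, clusters, _ = kmeans(points, 3)
for cl in clusters:
    if cl:
        print(len(cl), footprint(cl, 0))
```

In practice the attributes would typically be rescaled to comparable ranges before clustering, since k-means is sensitive to the scale of each attribute.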
To measure the relevance of each cluster
attribute, we introduce the parameter distinctness T for numerical
and categorial attributes. For numerical attributes, the distinctness
T is the normalized sum of the differences (cluster vs. total orders)
of the means and standard deviations. For categorial attributes, it is
the sum over all categories of event differences. T can be used to
rank the attributes of a given cluster and to
prioritize them accordingly in the later interpretation of the results.
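A minimal sketch of the distinctness T follows. The text only says that the numerical variant is a "normalized sum" of differences, so normalizing each difference by the corresponding value for all orders is an assumption, as are the function names. The numerical example uses illustrative values; the categorical example uses the product category fractions listed in Table 2, for which the sum of absolute differences comes out at about 0.51.

```python
def distinctness_numerical(mean_cluster, std_cluster, mean_total, std_total):
    """Sum of the relative differences (cluster vs. total orders) of mean and
    standard deviation. The normalization by the total values is an assumption."""
    return (
        abs(mean_cluster - mean_total) / abs(mean_total)
        + abs(std_cluster - std_total) / std_total
    )


def distinctness_categorical(frac_cluster, frac_total):
    """Sum over all categories of the absolute differences in event fractions."""
    categories = set(frac_cluster) | set(frac_total)
    return sum(
        abs(frac_cluster.get(c, 0.0) - frac_total.get(c, 0.0)) for c in categories
    )


# Illustrative numerical attribute: cluster 6.8 +/- 1.3 vs. all orders 5.3 +/- 1.8.
t_num = distinctness_numerical(6.8, 1.3, 5.3, 1.8)

# Product category fractions as listed in Table 2 (categories A, B, C).
t_cat = distinctness_categorical(
    {"A": 0.58, "B": 0.18, "C": 0.05},
    {"A": 0.21, "B": 0.08, "C": 0.01},
)

print(round(t_num, 2), round(t_cat, 2))
```

Sorting the attributes of a cluster by T in descending order then gives the ranking used for prioritization.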
Fig. 1. Data mining process diagram (adapted from [7]).
$(P_i)$, and standard deviations $\sigma^{+}(P_i)$ and $\sigma^{-}(P_i)$.
Whenever the difference of the mean values is larger than the sum
of the standard deviations, this parameter is said to exert a strong
influence on lateness:
$$r(P_i) = \frac{\left|\mu^{+}(P_i) - \mu^{-}(P_i)\right|}{\sigma^{+}(P_i) + \sigma^{-}(P_i)} > 1$$
An arrow in Fig. 5 pointing towards lateness indicates such an
influence (thin arrow: $1 < r(P_i) \le 2$, thick arrow: $r(P_i) > 2$).
In the previous analysis, lateness served as the evaluation
parameter, i.e. as an outside label, whose distribution in the
parameter space is evaluated. In principle, we can also apply the
enrichment analysis consecutively to each of the parameters, with
that parameter serving as the outside label while the remaining
parameters form the parameter space.
In this way, we can identify the network of influences among the
parameters (arrows in Fig. 5), as mapped out by the available data.
This application of the enrichment analysis has been done for all
process parameters.
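The influence criterion can be sketched in a few lines. The sketch assumes that the superscripts + and - refer to the orders with positive and negative lateness, respectively, and that population standard deviations are used; neither is stated explicitly in the surviving text. The arrow thresholds are read as thin for 1 < r <= 2 and thick for r > 2. The attribute values are hypothetical.

```python
from statistics import mean, pstdev


def influence_ratio(values, lateness):
    """r = |mean(late) - mean(early)| / (std(late) + std(early)) for one attribute."""
    late = [v for v, l in zip(values, lateness) if l > 0]
    early = [v for v, l in zip(values, lateness) if l < 0]
    spread = pstdev(late) + pstdev(early)
    if spread == 0:
        raise ValueError("both groups have zero spread; r is undefined")
    return abs(mean(late) - mean(early)) / spread


def arrow(r):
    """Arrow style under the assumed thresholds."""
    if r > 2:
        return "thick"
    if r > 1:
        return "thin"
    return "none"


# Hypothetical thickness values [mm] and lateness values for eight orders.
thickness = [1.0, 1.2, 0.9, 1.1, 2.5, 2.8, 2.6, 2.7]
lateness = [3, 5, 2, 4, -1, -2, -3, -1]

r = influence_ratio(thickness, lateness)
print(round(r, 2), arrow(r))
```

Applying the same function with another parameter in place of lateness as the label (after splitting it into two groups) would give the parameter-to-parameter influences described above.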
From validation workshops with experts, it is known that
tonnage produced is the most important target figure. It is clear
that weight and thickness have a strong impact on lateness (the
thinner the product and the lower the coil weight, the higher the
lateness). Consequently, countermeasures need to address the
performance target system of the entire company as well. It is also
shown that thickness and coil weight have an indirect influence on
lateness. The higher the thickness and coil weight, the higher the
number of location changes. And the higher the number of location
changes and the number of production steps, the higher the lateness.
It is easy to see that changing manufacturing sites involves
risks of delays, especially if these changes often occur
unplanned. Equally, the likelihood of delays is higher the more
production steps are needed. Knowing about these multi-dimensional
causes and effects makes it possible to derive effective
countermeasures in two respects: to avoid a cause (e.g. reduce the
number of location changes, if possible, for orders of high
coil weight and high thickness) or to consider the needed changeover
time in the respective planning algorithm.
Lastly, it is informative to use the enrichment of clusters with
respect to lateness as a predictor for the lateness of individual
orders. The quality of such a prediction is then given by the number
of successfully predicted lateness types (positive or negative
lateness) compared to the measured false positives and false
negatives. This forecast quality can be directly compared with the
simplest prediction scheme, where lateness labels (positive,
Fig. 6. Lateness prediction results: true positives (A) and precision (B).
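The comparison between a prediction and the measured lateness types can be illustrated with a small sketch. It computes precision from true and false positives and implements the simplest baseline mentioned in the abstract, a majority vote among neighboring orders in parameter space. The neighborhood size k, the Euclidean distance and all data are assumptions made for illustration.

```python
from collections import Counter


def precision(predicted, actual, positive="late"):
    """Share of orders predicted as `positive` that really are `positive`."""
    tp = sum(p == positive and a == positive for p, a in zip(predicted, actual))
    fp = sum(p == positive and a != positive for p, a in zip(predicted, actual))
    return tp / (tp + fp) if tp + fp else 0.0


def majority_vote(query, points, labels, k=3):
    """Label carried by most of the k nearest orders in parameter space."""
    nearest = sorted(
        range(len(points)),
        key=lambda i: sum((x - y) ** 2 for x, y in zip(query, points[i])),
    )[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]


# Hypothetical orders in a 2-D parameter space with known lateness type.
points = [(0.9, 1.1), (1.2, 0.8), (5.0, 5.0), (1.1, 1.0)]
labels = ["late", "late", "early", "early"]

vote = majority_vote((1.0, 1.0), points, labels, k=3)

# Hypothetical predictions compared with measured lateness types.
predicted = ["late", "late", "early", "late"]
actual = ["late", "early", "early", "late"]
prec = precision(predicted, actual)

print(vote, round(prec, 2))
```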
Table 2
Detailed analysis of relevant cluster attributes.
Cluster attributes                       Total orders   Cluster      Distinctness T
Fraction of orders                       100%           2.0%
Lateness [arbitrary units]               0.44 ± 1.6     0.9 ± 1.9
Numerical cluster parameters
  Thickness [mm]                         2.0 ± 2.6      1.0 ± 1.1    * 1.46
  Order weight [t]                       84 ± 181       118 ± 155    * 0.49
  Number of production steps             5.3 ± 1.8      6.8 ± 1.3    * 0.57
  Number of suitable steel casts         4.7 ± 9.1      4.0 ± 3.5    1.07
Categorial cluster parameters
  Steel grade          A                 2%             41%          * 0.53
                       B                 3%             10%
                       C                 0%             8%
  Surface quality      High              17%            2%           * 0.80
                       Medium            73%            0%
                       Low               9%             96%
  Product category     A                 21%            58%          * 0.51
                       B                 8%             18%
                       C                 1%             5%
* low; medium; and *high.
[Figure 5: network diagram of order parameters and process parameters
(thickness, width, coilweight, order weight, surface quality, number
of suitable steel casts, order product allocation changes, number of
location changes, number of production steps, number of rework
processes) and the evaluation parameter lateness. Arrows mark direct
or inverse relations and distinguish influence on lateness from
influence on a process parameter.]
Fig. 5. Hierarchy of parameters and influence on lateness.
K. Windt, M.-T. Hütt / CIRP Annals - Manufacturing Technology 60 (2011) 473–476