Documente Academic
Documente Profesional
Documente Cultură
applications to materials
Yousef Saad
Department of Computer Science
and Engineering
University of Minnesota
SIAM 2012 Annual Meeting
Twin Cities, July 13th, 2012
Minneapolis, 07-13-2012
p.
p.
Drowning in data
Dimension
Reduction
PLEASE !
11001010011
1010111001101
110011
110010
110010
110010
0101101001100100 111001
1010110011000111011100
Minneapolis, 07-13-2012
p.
Minneapolis, 07-13-2012
p.
: x Rm y Rd
Rmn, find a
xi
yi
may be linear,
or nonlinear (implicit). Linear case
(W Rmd):
Y = W >X
Minneapolis, 07-13-2012
p.
max
W Rmd
W >W = I
W is computed to maxi-
2
n
d
X
X
1
, y i = W > xi .
yi
y
j
n
j=1
i=1
2
Leads to maximizing
>
>
> >
Tr W (X e )(X e ) W ,
= n1 ni=1xi
Minneapolis, 07-13-2012
p.
= U V >,
X
U >U = I,
V >V = I,
= Diag
X
i
kxi W W xik =
kxi W yik2
p.
10
20
10
20
5
10 15
10
20
5
10 15
10
20
10 15
10
10 15
10
20
10 15
10
10 15
10
20
10 15
10
10 15
10 15
10 15
10 15
10 15
10 15
10 15
10 15
10 15
20
5
10 15
10
20
10 15
10
10
10 15
20
5
10 15
10
20
5
20
5
10 15
10
10
20
5
10 15
10 15
20
5
20
5
10
10
20
5
10
20
5
10 15
10
20
5
10
20
10 15
10 15
20
5
20
5
20
5
10
10
10
20
5
10 15
20
5
10 15
20
5
10
20
5
10
10
20
5
10 15
20
5
10
20
5
10
20
5
10
20
5
10 15
Minneapolis, 07-13-2012
p.
2-D reductions:
PCA digits : 0 4
LLE digits : 0 4
0.15
0
1 0.1
2
0.05
3
0
4
4
2
0
0.05
2
0.1
4
6
10
0.15
5
10
0.2
0.2
KPCA digits : 0 4
0.1
0.2
ONPP digits : 0 4
0.2
0.1
0.05
0
1 0.05
2
0
3
4 0.05
0.1
0.05
0.15
0.1
0.2
0.15
0.1
0.15
0.07
0.1
0.071
0.072
0.073
0.074
0.25
5.4341
5.4341
5.4341
5.4341
5.4341
3
x 10
Minneapolis, 07-13-2012
p.
10
Unsupervised learning
PCA digits : 5 7
5
5
6
7
5
6
Photovoltaic
Superhard
Superconductors
Ferromagnetic
Multiferroics
Catalytic
Thermoelectric
Minneapolis, 07-13-2012
p.
11
Original
matrix
Goal: Find
ordering so
blocks
are
as dense as
possible
Use blocking techniques for sparse matrices
Advantage of this viewpoint: need not know # of clusters.
[data: www-personal.umich.edu/mejn/netdata/]
Minneapolis, 07-13-2012
p.
12
Digit ??
Digit 9
Digit 2
11
00
00
00 0110 11
11
11
00
Digit 9
Digit 2
Digfit 1
Digit ??
Test data
Training data
Digit 0
Dimension reduction
Given: a set of
labeled
samples
(training set), and
an (unlabeled) test
image.
Problem:
find
label of test image
Digfit 1
Digit 0
0100
11 01
p.
13
Minneapolis, 07-13-2012
p.
14
Minneapolis, 07-13-2012
p.
15
Define:
k=1
c
X
X
=
(xi (k))(xi (k))T
SB =
SW
c
X
Where:
nk ((k) )((k) )T ,
= mean (X )
(k) = mean (Xk )
Xk = k-th class
nk = |Xk |
k=1 xi Xk
CLUSTER CENTROIDS
GLOBAL CENTROID
X1
X3
X2
Minneapolis, 07-13-2012
p.
16
Consider 2nd
moments for a vector a:
aT SB a =
aT SW a =
c
X
i=1
c
X
k=1 xi Xk
max
a
aT SB a
aT SW a
p.
17
Tr [U T SB U ]
Tr [U T SW U ]
max
T
U SW U =I
Tr [U T SB U ]
Minneapolis, 07-13-2012
p.
18
max
np
T
U R
Tr [U T AU ]
, U U =I
Tr [U T BU ]
f () = max
Tr [V T (A B)V ]
T
V V =I
p.
19
Define G() A B
P () = V ()V ()T
Then:
Minneapolis, 07-13-2012
p.
20
Recall [e.g.
Kato 65] that:
P () =
1
2i
(G() zI)1 dz
p.
21
new =
Tr [V
()T BV
()]
Tr [V ()T AV ()]
Tr [V ()T BV ()]
iteration with
g() =
Tr [V T ()AV ()]
Tr [V
T ()BV
()]
p.
22
Minneapolis, 07-13-2012
p.
23
Graph-based methods
Start with a graph of data. e.g.: graph
of k nearest neighbors (k-NN graph)
Want: Perform a projection which preserves the graph in some sense
xj
x
L=DW
e.g.,: wij =
1 if j Ni
0 else
yi
D = diag dii =
wij
j6=i
p.
24
xj
x
yi
Minneapolis, 07-13-2012
p.
25
xj
x
Minneapolis, 07-13-2012
p.
26
s.t. V V = I
In LLE replace V >X by Y
Minneapolis, 07-13-2012
p.
27
- %
p.
28
...
=:
...
| {z }
A
p.
29
W
2
W =
...
As before, graph Laplacean:
Lc = D W
Wc
Can be used for ONPP and other graph based methods
Improvement: add repulsion Laplacean [Kokiopoulou, YS
09]
Minneapolis, 07-13-2012
p.
30
Class 1
Class 2
Lc LR
Lc = class-Laplacean,
LR = repulsion Laplacean,
= parameter
Class 3
Minneapolis, 07-13-2012
p.
31
ORL TrainPerClass=5
0.98
0.96
0.94
0.92
0.9
0.88
onpp
pca
olppR
laplace
fisher
onppR
olpp
0.86
0.84
0.82
0.8
10
20
30
40
50
60
70
80
90
100
p.
32
p.
33
Issue 2: databases, and more generally sharing, not too common in materials
The inherently fragmented and multidisciplinary nature of
the materials community poses barriers to establishing
the required networks for sharing results and information.
One of the largest challenges will be encouraging scientists to think of themselves not as individual researchers
but as part of a powerful network collectively analyzing
and using data generated by the larger community.
These barriers must be overcome.
NSTC report to the White House, June 2011.
Materials genome initiative [NSF]
Minneapolis, 07-13-2012
p.
34
Unsupervised learning
p.
35
Question: Can modern data mining achieve a similar diagrammatic separation of structures?
Should use only information from the two constituent atoms
Experiment: 67 binary octets.
Use PCA exploit only data from 2 constituent atoms:
1. Number of valence electrons;
2. Ionization energies of the s-states of the ion core;
3. Ionization energies of the p-states of the ion core;
4. Radii for the s-states as determined from model potentials;
5. Radii for the p-states as determined from model potentials.
Minneapolis, 07-13-2012
p.
36
Result:
2
Z
W
D
R
ZW
WR
1.5
0.5
MgTe AgI
CuI
CdTe MgSe
MgS CuBr
CuCl
0.5
SiC
ZnO
1.5
BN
2
2
Minneapolis, 07-13-2012
p.
37
Minneapolis, 07-13-2012
p.
38
Case
Case 1
Case 2
Case 3
KNN
0.909
0.945
0.945
ONPP
0.945
0.945
0.945
PCA
0.945
1.000
0.982
Minneapolis, 07-13-2012
p.
39
Minneapolis, 07-13-2012
p.
40
P
D
Minneapolis, 07-13-2012
p.
41
10
SP
5
DP
PP
6
0
2
5
2
10
SP
DP
PP
15
6
10
10
10
10
15
Features used: val. elec.; electronegativ.; covalent rad.; density; ionic rad.; chemical scale; + band-gap.
Minneapolis, 07-13-2012
p.
42
12
S
6
10
S
I
4
0
2
2
0
4
2
6
6
20
15
10
10
15
10
10
Minneapolis, 07-13-2012
p.
43
Conclusion
Many, interesting new matrix problems in areas that involve
the effective mining of data
Among the most pressing issues is that of reducing computational cost - [SVD, SDP, ... too costly]
Many online resources available in computer science/ applied math...
.. but there is an inertia to overcome. In particular ...
... data-sharing is not as widespread in materials science
Plus side: Materials informatics will likely be energized by
the materials genome project.
Minneapolis, 07-13-2012
p.
44
Thank you !
Minneapolis, 07-13-2012
p.
45