Oct 29, 2016

© All Rights Reserved

Thomas Degen, PANalytical B.V., Lelyweg 1, Almelo, The Netherlands

Introduction:

high throughput data analysis. Almost all cluster analysis approaches (agglomerative, divisive, k-means, fuzzy, DBSCAN and so on) require a matrix that can be calculated by comparing all involved patterns with each other. However, if the variation between the patterns is not properly extracted into such a correlation matrix, then subsequent methodology will not be able to reveal it. This is because some of the essential information has already been lost in this first comparison step.

One advantage of using a correlation matrix is the severe data reduction that takes place. Techniques like MMS (Metric Multi-dimensional Scaling), PCA (Principal Component Analysis) or Sammon's non-linear mapping are therefore much easier and quicker to use to visualize the systematic and non-systematic data variations.

properly calculate the (dis)-similarity of two X-ray powder diffraction patterns?

Here we compare a probability-based approach (Ref. 7), where the probability is constructed on a point-by-point basis from the signal-to-noise ratio of the supplied profile data, against another approach that does not take into account the counting statistics of the XRDP raw data. We simply perform the PCA (Ref. 5, 6) and/or the cluster analysis directly on a matrix (X) of the raw data, where the individual scans form the rows of the matrix and the individual data points form the columns. This of course requires that all scans are either measured on the same measurement grid (corresponding data points are measured at the same positions) or that the scans are first interpolated onto such a common grid.
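As a minimal sketch of the preparation step described above, the following Python snippet linearly interpolates scans onto a common grid and stacks them as the rows of X. The function names, the clamping at the edges of the measured range, and the sample numbers are illustrative assumptions and are not taken from the original work.

```python
from bisect import bisect_right


def interpolate_scan(positions, counts, grid):
    """Linearly interpolate one scan onto the common 2Theta grid.

    `positions` must be strictly increasing. Grid points outside the
    measured range are clamped to the nearest measured value (assumption).
    """
    result = []
    for x in grid:
        if x <= positions[0]:
            result.append(float(counts[0]))
        elif x >= positions[-1]:
            result.append(float(counts[-1]))
        else:
            j = bisect_right(positions, x)  # positions[j-1] <= x < positions[j]
            x0, x1 = positions[j - 1], positions[j]
            y0, y1 = counts[j - 1], counts[j]
            result.append(y0 + (y1 - y0) * (x - x0) / (x1 - x0))
    return result


def build_data_matrix(scans, grid):
    """Stack scans as rows of X: one row per scan, one column per grid point."""
    return [interpolate_scan(pos, cnt, grid) for pos, cnt in scans]


# Two hypothetical scans measured on different grids (made-up numbers).
scans = [
    ([10.0, 10.2, 10.4, 10.6], [100, 140, 120, 100]),
    ([10.0, 10.3, 10.6], [90, 150, 96]),
]
grid = [10.0, 10.2, 10.4, 10.6]
X = build_data_matrix(scans, grid)
```

Each row of `X` now has the same length, so corresponding columns refer to the same position in every scan.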

Each observation represents one coordinate axis, so each scan can be plotted as a point in m-dimensional space. The entire dataset is then a swarm of points in space. In this point swarm the first principal component PC1 is the line that gives the best approximation to the data, i.e., represents the maximum variance within the data. PC2 is orthogonal to PC1 and again has maximum variance in this direction. Further components are generated accordingly. Thus the PCs are constructed in the order of declining importance. The coordinate of a scan, when projected onto an axis given by a principal component, is called its score.

In matrix notation the PCA approximates the data matrix X, which has n scans and m observations, by two smaller matrices: the score matrix U (n scans and d principal components) and the loadings matrix V (d principal components and m observations/variables), where:

$X = U \cdot V^{T}$

The two matrices V and U are orthogonal. The loadings can be understood as the weights for each original variable when calculating the principal component.
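The decomposition into scores and loadings can be illustrated with a short pure-Python sketch. It assumes column-centered data and uses power iteration with deflation, which is one of several ways to obtain principal components and is not necessarily the method used in the original work. All function names and the small data matrix are hypothetical.

```python
import math


def center_columns(X):
    """Subtract the column mean from every data point (assumed preprocessing)."""
    n, m = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(m)]
    return [[row[j] - means[j] for j in range(m)] for row in X]


def leading_component(R, iterations=200):
    """Power iteration on R^T R: returns unit loading vector v and scores u = R v."""
    n, m = len(R), len(R[0])
    v = [float(j + 1) for j in range(m)]  # arbitrary non-zero start vector
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        u = [sum(R[i][j] * v[j] for j in range(m)) for i in range(n)]
        w = [sum(R[i][j] * u[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance left to explain
            break
        v = [x / norm for x in w]
    u = [sum(R[i][j] * v[j] for j in range(m)) for i in range(n)]
    return v, u


def pca(X, d):
    """Return scores U (n x d), loadings V^T (d x m), explained variance fractions."""
    Xc = center_columns(X)
    n, m = len(Xc), len(Xc[0])
    total = sum(x * x for row in Xc for x in row)
    R = [row[:] for row in Xc]
    U = [[0.0] * d for _ in range(n)]
    Vt, explained = [], []
    for k in range(d):
        v, u = leading_component(R)
        Vt.append(v)
        for i in range(n):
            U[i][k] = u[i]
        explained.append(sum(s * s for s in u) / total)
        # Deflation: remove the component just found before extracting the next.
        R = [[R[i][j] - u[i] * v[j] for j in range(m)] for i in range(n)]
    return U, Vt, explained


# Hypothetical data: 4 scans (rows) with 3 data points each (columns).
X = [[2.0, 0.0, 1.0], [4.0, 0.0, 1.0], [6.0, 1.0, 1.0], [8.0, 1.0, 1.0]]
U, Vt, explained = pca(X, d=2)
```

Here `explained[k]` is the fraction of the total variation carried by component k + 1, which is the quantity shown in an Eigenvalues plot.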

about the possible presence of more groups. But these groups are much more difficult to spot than in Figure 2a. Further, the Eigenvalues plot (Figure 3c) shows that the total variation in the data is almost completely collected in the first principal component (x-direction), which already accounts for 98% of the total variation. This means that PC2 and PC3 (y- and z-direction) are more sensitive to noise and therefore much less reliable.

References:

1) Spearman, C. (1904), Am. J. Psychol. 15, 72-101.
2) Conover, W. J. (1998), Practical Nonparametric Statistics, 3rd ed., John Wiley & Sons, New York.
3) Gilmore et al. (2004), J. Appl. Cryst. 37, 231-242.
4) Pearson, K. (1896), Mathematical Contributions to the Theory of Evolution. III. Regression, Heredity and Panmixia, Philosophical Transactions of the Royal Society of London, 187, 253-318.
5) Malinowski, E. H. & Howery, D. G. (1980), Factor Analysis in Chemistry, John Wiley & Sons, New York.
6) Jolliffe, I. T. (1986), Principal Component Analysis, Springer-Verlag, New York.
7) Unpublished proprietary algorithm.

correlation matrix (disregarding the signal-to-noise ratio), are, for example: the non-parametric Spearman rank order test (Ref. 1, 2, 3) or Pearson's r (Ref. 3, 4), which is a parametric linear correlation coefficient.
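For illustration, the two coefficients named above can be computed for every pair of scans as follows. This is a plain textbook sketch, not the implementation used in any particular software package. The function names and the toy scans are hypothetical, and a scan with constant counts would cause a division by zero.

```python
import math


def pearson_r(x, y):
    """Pearson's linear correlation coefficient of two equally long profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank order correlation: Pearson's r applied to the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


def correlation_matrix(X, measure):
    """Compare every scan (row of X) with every other scan."""
    n = len(X)
    return [[measure(X[i], X[j]) for j in range(n)] for i in range(n)]


# Toy scans (made-up numbers): linear, scaled, quadratic, reversed.
scans = [
    [1, 2, 3, 4, 5],
    [2, 4, 6, 8, 10],
    [1, 4, 9, 16, 25],
    [5, 4, 3, 2, 1],
]
P = correlation_matrix(scans, pearson_r)
S = correlation_matrix(scans, spearman_rho)
```

In this toy data the first and third scans are monotonically but not linearly related, so the Spearman value is 1 while Pearson's r is below 1.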

The sample material used in this example is an industrial, proprietary pre-product. Seven samples were taken from four batches. Each sample was prepared and measured eight times, resulting in a grand total of 56 scans.

The first rough, visual inspection of the surface plot of all 56 scans (Figure 1) leads to the impression that two different groups of scans are present.

correlation matrix calculated by a probability-based comparison algorithm. It clearly shows the presence of four to five different groups of patterns.

the full matrix of observations without probabilities.

[Figure 1: surface plot of all scans. Axes: Position [2Theta] (20 to 120) and Scan number (5 to 55); color scale: Ln(Counts) (4.658 to 9.156).]

Summary:

using counting statistics for the extraction of pattern variation into a correlation matrix, as utilized in our software package X'Pert HighScore (Plus). The outlined method proves to be an effective noise-cancelling approach and therefore has an obvious advantage over directly using the full matrix of observations for PCA and/or cluster analysis, which in fact resembles a pure image comparison approach.

the full matrix of observations. It gives some vague impression that 3 or 4 groups of patterns are present.

inspection reveals two pattern groups.

[Eigenvalues plot; y-axis: Percentage Variation Accounted.]

However, cluster analysis taking into account the signal-to-noise ratio of the XRDP raw data (as used in our search-match-identify algorithm) immediately reveals the presence of at least four or five different groups. Figure 2a presents the PCA score plot calculated from such a correlation matrix. Figure 2b shows the Eigenvalues plot, which indicates that PC3 in z-direction is much less important, because it only accounts for about 2% variation in the data, whereas PC1 in x-direction accounts for 51% and PC2 in y-direction still accounts for 38% variation.

The four clusters (fourth cluster is green and brown together) nicely correspond to the four different batches under investigation. Further manipulation of the cut-off line in the dendrogram allows each of the seven samples to be clearly identified, proving the non-homogeneity of the sampling.
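The effect of moving the cut-off line can be sketched with a small agglomerative clustering routine. The source does not state which linkage method was used, so average linkage is an assumption here, as are the function name and the made-up dissimilarity matrix (for example 1 minus a correlation value).

```python
def agglomerative_clusters(D, cutoff):
    """Average-linkage clustering on a dissimilarity matrix D.

    Clusters are merged pairwise, closest first, until the closest
    remaining pair is further apart than `cutoff`.
    """
    clusters = [[i] for i in range(len(D))]

    def linkage(a, b):
        # Mean dissimilarity between all members of a and all members of b.
        return sum(D[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                dist = linkage(clusters[p], clusters[q])
                if best is None or dist < best[0]:
                    best = (dist, p, q)
        if best[0] > cutoff:
            break
        _, p, q = best
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return [sorted(c) for c in clusters]


# Hypothetical dissimilarities between five scans (made-up numbers).
D = [
    [0.00, 0.05, 0.90, 0.90, 0.90],
    [0.05, 0.00, 0.90, 0.90, 0.90],
    [0.90, 0.90, 0.00, 0.08, 0.30],
    [0.90, 0.90, 0.08, 0.00, 0.30],
    [0.90, 0.90, 0.30, 0.30, 0.00],
]
coarse = agglomerative_clusters(D, cutoff=0.5)  # two groups
fine = agglomerative_clusters(D, cutoff=0.1)    # three groups
```

Lowering the cut-off splits the second coarse group into finer groups, which mirrors how batches can be resolved into individual samples.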

The other approach using only the matrix of raw observations/data points creates a different picture. The 3D PCA score plot (Figure 3a) based on the full matrix of observations shows the separation into two clearly distinguishable groups, but this is in fact no more help than just looking at the surface plot (Figure 1).

[Eigenvalues plot; x-axis: Component Number.]

51% of the variation, PC2 contains another 38% and PC3 only contains 2% of the total variation in the data. 8% of variation is not accounted for in PC1 - PC3.

contains 98% of the variation; PC2 only adds 1% additional variation.
