
Data Analysis

Principal Component Analysis, Factor Analysis, Unsupervised Recognition Methods, Supervised Recognition Methods

Antoneac Raluca

FABBV, series A, group 1548



0. Introduction
This paper analyses a data set using four techniques: principal component analysis, factor analysis, supervised recognition methods and unsupervised recognition methods.

We consider the financial data of 10 firms operating in the dairy industry, with headquarters in several Romanian cities:

Firm name                        Turnover         Inventories  Employees    Profit   Debts      Equity
                                 (Cifra_afaceri)  (Stocuri)    (Salariati)           (Datorii)  (Capitaluri)
Albalact SA Alba Iulia           1104600          836938       650          4754     2635432    50325
Bucovina SA Suceava              27179            9493         345          -16142   43625      -15942
Danone P.D.P.A. SRL Arad         100258           13523        555          -14039   31965      -13839
Delaco Distribution SRL Brasov   73980            0            470          -49755   241021     -68662
Delta Lact SA Tulcea             92873            8400         280          21435    344540     29838
Diana Impex SRL Ramnicu Valcea   23890            0            155          28306    3780       28546
Dorna Branzeturi SA Vatra Dornei 1049060          2135         600          18400    27617      66455
Ecolact SRL Milisauti            402250           90116        50           -95967   331125     -107910
Ladorna Cheese SA Dorna Arini    236703           78535        395          8309     130609     8549
Napolact SA Cluj-Napoca          1484710          95827        638          6463     798486     74443

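We therefore have 10 observations on 6 numeric characteristics (the seventh column is the firm name). In the SAS output below the variables keep their Romanian names: Cifra_afaceri (turnover), Stocuri (inventories), Salariati (employees), Profit, Datorii (debts) and Capitaluri (equity).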

1. Principal component analysis

With this multidimensional analysis technique we aim to reduce the dimensionality of the initial space and obtain a small number of components, on condition that the loss of information is minimal.

Variables 6

Simple Statistics

       Cifra_afaceri  Stocuri      Salariati     Profit       Datorii       Capitaluri
Mean   459550.3000    113496.7000  473.80000000  -8823.60000  -458820.0000  5180.30000
StD    412580.8317    257222.2919  513.38378699  38147.35007  801743.0382   58504.74549
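The Mean row can be cross-checked against the firm table from the introduction. The sketch below is only an illustration in Python (the analysis itself was run in SAS); it recomputes the sample mean for four of the columns, with the values copied by hand from that table and the dictionary name `columns` chosen here for convenience.

```python
from statistics import mean

# Values copied from the firm table in the introduction (one entry per firm).
columns = {
    "Cifra_afaceri": [1104600, 27179, 100258, 73980, 92873,
                      23890, 1049060, 402250, 236703, 1484710],
    "Stocuri": [836938, 9493, 13523, 0, 8400, 0, 2135, 90116, 78535, 95827],
    "Profit": [4754, -16142, -14039, -49755, 21435,
               28306, 18400, -95967, 8309, 6463],
    "Capitaluri": [50325, -15942, -13839, -68662, 29838,
                   28546, 66455, -107910, 8549, 74443],
}

# Sample mean of each column, to be compared with the Mean row of Simple Statistics.
means = {name: mean(values) for name, values in columns.items()}

for name, value in means.items():
    print(f"{name}: {value:.4f}")
```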

Correlation Matrix

               Cifra_afaceri  Stocuri  Salariati  Profit   Datorii  Capitaluri
Cifra_afaceri  1.0000         0.9226   0.3539     -0.0163  -0.9656  0.2294
Stocuri        0.9226         1.0000   0.2872     0.0735   -0.9727  0.2538
Salariati      0.3539         0.2872   1.0000     -0.3351  -0.3343  -0.3326
Profit         -0.0163        0.0735   -0.3351    1.0000   -0.0812  0.9032
Datorii        -0.9656        -0.9727  -0.3343    -0.0812  1.0000   -0.3037
Capitaluri     0.2294         0.2538   -0.3326    0.9032   -0.3037  1.0000

Eigenvalues of the Correlation Matrix

  Eigenvalue Difference Proportion Cumulative

1 3.13616478 1.04063527 0.5227 0.5227

2 2.09552951 1.47947300 0.3493 0.8719

3 0.61605651   0.1027 0.9746

Eigenvectors

  PRIN1 PRIN2 PRIN3

Cifra_afaceri 0.543839 -.096331 -.152962

Stocuri 0.545107 -.045929 -.191679

Salariati 0.203635 -.439491 0.868629

Profit 0.092791 0.640442 0.370721

Datorii -.559311 0.040610 0.112733

Capitaluri 0.210217 0.619389 0.187625


According to the table "Eigenvalues of the Correlation Matrix", reducing the number of variables from 6 to 3 components retains a cumulative 97.46% of the information, so the model is simplified with only a small loss. The correlation matrix, in turn, shows the strength of the links between variables, which helps in organising the principal components.
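The Proportion and Cumulative columns follow directly from the eigenvalues: for a correlation matrix the eigenvalues sum to the number of variables (6 here), so each proportion is the eigenvalue divided by 6. A minimal Python sketch of that arithmetic, using the three eigenvalues reported above (the variable names are illustrative):

```python
from itertools import accumulate

N_VARIABLES = 6  # trace of the correlation matrix = number of standardized variables

# First three eigenvalues from "Eigenvalues of the Correlation Matrix".
eigenvalues = [3.13616478, 2.09552951, 0.61605651]

# Share of total variance carried by each component, and the running total.
proportions = [ev / N_VARIABLES for ev in eigenvalues]
cumulative = list(accumulate(proportions))

for k, (p, c) in enumerate(zip(proportions, cumulative), start=1):
    print(f"PRIN{k}: proportion={p:.4f} cumulative={c:.4f}")
```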

[Figure: Principal Components Plot]

The plots above, whose axes are the principal components themselves, together with the scores obtained, help us classify and rank the 10 firms, a task that was not feasible at the outset because of the relatively large number of variables.
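A component score is the linear combination of a firm's standardized values with the coefficients of one eigenvector. The Python sketch below illustrates this with the PRIN1 and PRIN2 columns of the Eigenvectors table and also checks that the two vectors have unit length and are orthogonal. The standardized observation `z_example` is hypothetical and is not taken from the data set.

```python
# Eigenvector coefficients, in the order Cifra_afaceri, Stocuri, Salariati,
# Profit, Datorii, Capitaluri (copied from the Eigenvectors table).
prin1 = [0.543839, 0.545107, 0.203635, 0.092791, -0.559311, 0.210217]
prin2 = [-0.096331, -0.045929, -0.439491, 0.640442, 0.040610, 0.619389]


def dot(u, v):
    """Plain dot product of two equally long sequences."""
    return sum(a * b for a, b in zip(u, v))


def component_score(z, eigenvector):
    """Score of one standardized observation on one principal component."""
    return dot(z, eigenvector)


# Hypothetical standardized observation (z-scores), for illustration only.
z_example = [1.0, 0.5, 0.2, -0.3, -1.2, 0.1]

print("score on PRIN1:", component_score(z_example, prin1))
print("score on PRIN2:", component_score(z_example, prin2))
print("squared length of PRIN1:", dot(prin1, prin1))
print("PRIN1 . PRIN2:", dot(prin1, prin2))
```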

2. Factor analysis
Factor analysis is a multivariate technique that aims to explain the correlations among a set of variables, called indicators or tests, through a smaller number of ordered and uncorrelated factors, called common factors. The uncorrelatedness of the factors in this definition means that they are defined and determined under the constraint that they carry no redundant information. Similarly, the ordering of the factors means that they are ranked in decreasing order of the variance of each factor.
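As a rough illustration of how ordered factors relate to the eigen-decomposition: under principal-component extraction, each unrotated loading equals an eigenvector coefficient multiplied by the square root of the corresponding eigenvalue, and the variance of a factor is its eigenvalue. The Python sketch below applies this to the first two eigenvalues and eigenvectors reported in section 1; the function name `loadings` is illustrative only.

```python
from math import sqrt

# First two eigenvalues and eigenvectors from section 1
# (variable order: Cifra_afaceri, Stocuri, Salariati, Profit, Datorii, Capitaluri).
eigenvalues = [3.13616478, 2.09552951]
eigenvectors = [
    [0.543839, 0.545107, 0.203635, 0.092791, -0.559311, 0.210217],
    [-0.096331, -0.045929, -0.439491, 0.640442, 0.040610, 0.619389],
]


def loadings(eigenvalue, eigenvector):
    """Unrotated loadings of one factor under principal-component extraction."""
    scale = sqrt(eigenvalue)
    return [scale * coefficient for coefficient in eigenvector]


factor1 = loadings(eigenvalues[0], eigenvectors[0])
factor2 = loadings(eigenvalues[1], eigenvectors[1])

# Factors are ordered by decreasing variance, i.e. by decreasing eigenvalue.
ordered = eigenvalues == sorted(eigenvalues, reverse=True)

print("Factor1 loadings:", [round(x, 5) for x in factor1])
print("Factor2 loadings:", [round(x, 5) for x in factor2])
print("ordered by variance:", ordered)
```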

Prior Communality Estimates: ONE


Eigenvalues of the Correlation Matrix: Total = 6 Average = 1

  Eigenvalue Difference Proportion Cumulative

1 3.13616478 1.04063527 0.5227 0.5227

2 2.09552951 1.47947300 0.3493 0.8719

3 0.61605651 0.51140448 0.1027 0.9746

4 0.10465203 0.06706181 0.0174 0.9921

5 0.03759023 0.02758328 0.0063 0.9983

6 0.01000694   0.0017 1.0000

6 factors will be retained by the NFACTOR criterion.


Factor Pattern

               Factor1   Factor2   Factor3   Factor4   Factor5   Factor6
Cifra_afaceri  0.96310   -0.13945  -0.12006  0.15392   0.11927   0.02584
Stocuri        0.96534   -0.06649  -0.15045  -0.18828  -0.05666  0.04899
Salariati      0.36062   -0.63621  0.68178   0.00956   -0.01562  0.00592
Profit         0.16433   0.92710   0.29098   -0.13972  0.09569   -0.01169
Datorii        -0.99050  0.05879   0.08848   0.00690   0.03415   0.08011
Capitaluri     0.37228   0.89662   0.14727   0.16078   -0.09792  0.01869

Variance Explained by Each Factor

Factor1 Factor2 Factor3 Factor4 Factor5 Factor6

3.1361648 2.0955295 0.6160565 0.1046520 0.0375902 0.0100069

Final Communality Estimates: Total = 6.000000

Cifra_afaceri Stocuri Salariati Profit Datorii Capitaluri

1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
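The two summary tables above can be recovered from the Factor Pattern: the variance explained by a factor is the sum of squared loadings in its column, and the communality of a variable is the sum of squared loadings in its row. Because all six factors are retained, every communality comes out as 1. A short Python sketch of this check, with the loadings copied from the Factor Pattern table:

```python
# Unrotated factor pattern: one row per variable, one column per factor.
pattern = {
    "Cifra_afaceri": [0.96310, -0.13945, -0.12006, 0.15392, 0.11927, 0.02584],
    "Stocuri":       [0.96534, -0.06649, -0.15045, -0.18828, -0.05666, 0.04899],
    "Salariati":     [0.36062, -0.63621, 0.68178, 0.00956, -0.01562, 0.00592],
    "Profit":        [0.16433, 0.92710, 0.29098, -0.13972, 0.09569, -0.01169],
    "Datorii":       [-0.99050, 0.05879, 0.08848, 0.00690, 0.03415, 0.08011],
    "Capitaluri":    [0.37228, 0.89662, 0.14727, 0.16078, -0.09792, 0.01869],
}

# Communality of a variable = sum of squared loadings across its row.
communalities = {name: sum(x * x for x in row) for name, row in pattern.items()}

# Variance explained by a factor = sum of squared loadings down its column.
variance_explained = [
    sum(row[j] ** 2 for row in pattern.values()) for j in range(6)
]

print(communalities)
print(variance_explained)
```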

The FACTOR Procedure


Rotation Method: Varimax

Orthogonal Transformation Matrix

  1 2 3 4 5 6

1 0.95472 0.21122 0.20687 0.01486 0.02959 -0.00214

2 -0.10405 0.89466 -0.43424 -0.00864 0.01095 -0.00213

3 -0.27575 0.39357 0.87620 -0.00632 -0.03542 -0.00690

4 -0.03679 0.00337 0.01791 0.74732 0.65449 -0.10712

5 0.01627 0.00668 -0.02248 0.66372 -0.74287 0.08245

6 -0.00541 0.00493 0.00942 0.02554 0.13242 0.99079

Rotated Factor Pattern

               Factor1   Factor2   Factor3   Factor4   Factor5   Factor6
Cifra_afaceri  0.96324   0.03286   0.15492   0.21113   0.04678   0.01801
Stocuri        0.97577   0.08443   0.09512   -0.16119  -0.04148  0.06315
Salariati      0.22185   -0.22473  0.94882   0.00348   -0.00180  -0.00057
Profit         -0.01305  0.97877   -0.11840  -0.04861  -0.15937  0.00694
Datorii        -0.97629  -0.12115  -0.15279  0.01408   -0.04204  0.08284
Capitaluri     0.21391   0.93874   -0.17805  0.05250   0.19606   -0.01050

Variance Explained by Each Factor

Factor1 Factor2 Factor3 Factor4 Factor5 Factor6

2.9282610 1.9126242 1.0023770 0.0758865 0.0695181 0.0113332

Final Communality Estimates: Total = 6.000000

Cifra_afaceri Stocuri Salariati Profit Datorii Capitaluri

1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
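The rotated loadings are the unrotated loadings multiplied by the Orthogonal Transformation Matrix, and an orthogonal rotation leaves each communality unchanged. The Python sketch below illustrates this for a single variable, Cifra_afaceri, using the numbers printed in the SAS output; the helper name `rotate_row` is an assumption made for this example.

```python
# Unrotated loadings of Cifra_afaceri (row of the Factor Pattern table).
unrotated = [0.96310, -0.13945, -0.12006, 0.15392, 0.11927, 0.02584]

# Orthogonal Transformation Matrix reported for the Varimax rotation.
T = [
    [0.95472, 0.21122, 0.20687, 0.01486, 0.02959, -0.00214],
    [-0.10405, 0.89466, -0.43424, -0.00864, 0.01095, -0.00213],
    [-0.27575, 0.39357, 0.87620, -0.00632, -0.03542, -0.00690],
    [-0.03679, 0.00337, 0.01791, 0.74732, 0.65449, -0.10712],
    [0.01627, 0.00668, -0.02248, 0.66372, -0.74287, 0.08245],
    [-0.00541, 0.00493, 0.00942, 0.02554, 0.13242, 0.99079],
]


def rotate_row(row, matrix):
    """Multiply a row vector of loadings by the transformation matrix."""
    n = len(matrix[0])
    return [sum(row[i] * matrix[i][j] for i in range(len(row))) for j in range(n)]


rotated = rotate_row(unrotated, T)

# Row of the Rotated Factor Pattern table for Cifra_afaceri, for comparison.
reported = [0.96324, 0.03286, 0.15492, 0.21113, 0.04678, 0.01801]

print([round(x, 5) for x in rotated])
print("communality after rotation:", sum(x * x for x in rotated))
```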

The FACTOR Procedure


Rotation Method: Varimax

Scoring Coefficients Estimated by Regression


Squared Multiple Correlations of the Variables with Each Factor

Factor1 Factor2 Factor3 Factor4 Factor5 Factor6

1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000

Standardized Scoring Coefficients

               Factor1     Factor2     Factor3     Factor4     Factor5     Factor6
Cifra_afaceri  0.33740391  -0.0325011  -0.0989654  3.27747099  -1.0373652  2.66301198
Stocuri        0.37968557  -0.0514929  -0.0887268  -2.2134357  0.60792081  4.92056038
Salariati      -0.1771205  0.18867285  1.14184906  -0.1950483  0.40768542  0.53465795
Profit         -0.029399   0.59951765  0.14042683  0.65598641  -2.9299366  -0.8091586
Datorii        -0.3750092  0.0606464   0.10454546  0.85075078  0.41433907  7.99935485
Capitaluri     -0.1061042  0.49893428  0.15187313  -0.5364956  3.18758573  1.46822247

3. Cluster analysis
Cluster analysis is a classification technique in which forms or objects are assigned to clusters or groups progressively and without knowing the number of classes in advance, by checking two fundamental criteria:
- the objects or forms classified in each class should be as similar as possible with respect to certain characteristics;
- the objects classified in one class should differ as much as possible from the objects classified in any of the other classes.
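One common way to quantify the two criteria is the within-cluster sum of squares: small values mean similar objects inside a cluster. Ward's method, whose results follow, merges at each step the pair of clusters for which this quantity increases least. The Python sketch below shows that merge cost on a few hypothetical two-dimensional points (not the firm data) and compares it with the closed form n_a * n_b / (n_a + n_b) * ||c_a - c_b||^2.

```python
def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def within_ss(points):
    """Sum of squared Euclidean distances from each point to the centroid."""
    c = centroid(points)
    return sum(sum((p[d] - c[d]) ** 2 for d in range(len(c))) for p in points)


def ward_cost(a, b):
    """Increase in within-cluster sum of squares caused by merging a and b."""
    return within_ss(a + b) - within_ss(a) - within_ss(b)


def ward_cost_closed_form(a, b):
    """Same quantity via n_a * n_b / (n_a + n_b) * squared centroid distance."""
    ca, cb = centroid(a), centroid(b)
    dist2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * dist2


# Hypothetical clusters, for illustration only.
cluster_a = [(0.0, 0.0), (2.0, 0.0)]
cluster_b = [(10.0, 0.0)]

print(ward_cost(cluster_a, cluster_b))
print(ward_cost_closed_form(cluster_a, cluster_b))
```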

Results produced by Ward's method:

Cluster Analysis Results

The CLUSTER Procedure


Ward's Minimum Variance Cluster Analysis
Eigenvalues of the Covariance Matrix

  Eigenvalue Difference Proportion Cumulative

1 8.00706E11 7.93213E11 0.9826 0.9826

2 7493611884 2997348049 0.0092 0.9918

3 4496263835 2449581912 0.0055 0.9973

4 2046681923 1931729146 0.0025 0.9999

5 114952777 114952719 0.0001 1.0000

6 57.9409324   0.0000 1.0000

Root-Mean-Square Total-Sample Standard Deviation = 368523.6


Root-Mean-Square Distance Between Observations = 1276603
Cluster History

NCL Clusters Joined FREQ SPRSQ RSQ ERSQ CCC PSF PST2 Tie

9 OB2 OB3 2 0.0000 1.00 . . 4098 .  

8 OB6 OB7 2 0.0002 1.00 . . 1335 .  

7 OB4 OB5 2 0.0018 .998 . . 244 .  

6 CL9 CL8 4 0.0027 .995 . . 167 25.4  

5 CL7 OB9 3 0.0031 .992 . . 157 1.7  

4 CL5 OB8 4 0.0051 .987 . . 152 2.1  

3 CL6 CL4 8 0.0294 .958 . . 79.0 13.6  

2 CL3 OB10 9 0.0641 .893 .801 1.82 67.1 10.6  

1 OB1 CL2 10 0.8935 .000 .000 0.00 . 67.1  
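The RSQ column can be read off the SPRSQ column: the R-squared of a solution with k clusters is the sum of the semipartial R-squared values of the merges that are still to come (those listed with NCL below k). The Python sketch below illustrates this with the SPRSQ values from the Cluster History table; because those values are printed with four decimals, agreement with RSQ is only up to rounding.

```python
# Semipartial R-squared (SPRSQ) of each merge, keyed by NCL,
# the number of clusters left after the merge.
sprsq = {
    9: 0.0000, 8: 0.0002, 7: 0.0018, 6: 0.0027, 5: 0.0031,
    4: 0.0051, 3: 0.0294, 2: 0.0641, 1: 0.8935,
}


def r_squared(n_clusters):
    """R-squared of the solution with n_clusters clusters."""
    return sum(value for ncl, value in sprsq.items() if ncl < n_clusters)


for k in range(1, 10):
    print(k, round(r_squared(k), 4))
```

The large jump in the last step (0.8935) shows that most of the variance is lost only when OB1 is merged with the rest.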


[Figures: Cluster Analysis Plots; Cluster Analysis Tree Chart]

The TREE Procedure


Cluster tree data for SASUSER.IMPW7714

The method starts from 10 clusters, one for each observed object, and merges them step by step. We want to reduce their number, therefore:

4. Clustering with the k-means algorithm

Results produced by the k-means method (parameters used: 1 iteration, at most 4 clusters):

The FASTCLUS Procedure


Replace=FULL Radius=0 Maxclusters=4 Maxiter=1
Initial Seeds

Cluster  Cifra_afaceri  Stocuri     Salariati  Profit      Datorii       Capitaluri
1        1104600.000    836938.000  17.000     4754.000    -2635432.000  50325.000
2        484710.000     95827.000   4.000      6463.000    -798486.000   74443.000
3        10258.000      13523.000   1.000      -14039.000  -31965.000    -13839.000
4        402250.000     90116.000   4.000      -95967.000  -331125.000   -107910.000

Criterion Based on Final Seeds = 39752.9

Cluster Summary

Cluster  Frequency  RMS Std    Maximum Distance  Radius    Nearest  Distance Between
                    Deviation  from Seed         Exceeded  Cluster  Cluster Centroids
                               to Observation
1        1          .          0                           2        2075684
2        1          .          0                           4        541313
3        5          47687.8    162613                      4        339707
4        3          57906.9    133725                      3        339707

Statistics for Variables

Variable Total STD Within STD R-Square RSQ/(1-RSQ)

Cifra_afaceri 317844 88020 0.948874 18.559474

Stocuri 257222 39267 0.984464 63.364985

Salariati 10.38375 10.17513 0.359852 0.562138

Profit 38147 37724 0.348038 0.533831

Datorii 801743 51330 0.997267 364.949947

Capitaluri 58505 49555 0.521708 1.090772

OVER-ALL 368524 51321 0.987071 76.345545

Pseudo F Statistic = 152.69

Approximate Expected Over-All R-Squared = .

Cubic Clustering Criterion = .

Cluster Means

Cluster  Cifra_afaceri  Stocuri     Salariati  Profit      Datorii       Capitaluri
1        1104600.000    836938.000  17.000     4754.000    -2635432.000  50325.000
2        484710.000     95827.000   4.000      6463.000    -798486.000   74443.000
3        109624.600     20737.200   2.600      4966.800    -47519.200    14753.800
4        315688.667     32838.667   14.000     -41429.000  -305562.000   -48911.333

Cluster Standard Deviations

Cluster  Cifra_afaceri  Stocuri      Salariati  Profit       Datorii      Capitaluri
1        .              .            .          .            .            .
2        .              .            .          .            .            .
3        93017.61892    32768.64244  2.30217    19641.41855  48657.54817  34137.65793
4        77058.09239    49781.11776  17.32051   59142.19397  56295.16913  70966.15039

In line with the input parameters, the method produces 4 clusters; the assignment of objects to clusters is summarised in the tables above.
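Two figures in the "Statistics for Variables" output can be reproduced by hand: the last column is R-Square divided by (1 - R-Square), and the Pseudo F statistic rescales the overall ratio by the degrees of freedom, (n - k) / (k - 1), with n = 10 observations and k = 4 clusters. A small Python sketch of this arithmetic follows; the function names are illustrative and the inputs are the rounded values printed by SAS, so results agree only up to rounding.

```python
def rsq_ratio(r2):
    """Between-to-within ratio RSQ / (1 - RSQ)."""
    return r2 / (1.0 - r2)


def pseudo_f(r2, n_obs, n_clusters):
    """Pseudo F statistic: (R2 / (k - 1)) / ((1 - R2) / (n - k))."""
    return (r2 / (n_clusters - 1)) / ((1.0 - r2) / (n_obs - n_clusters))


N_OBS = 10       # firms in the data set
N_CLUSTERS = 4   # Maxclusters=4 in the FASTCLUS call

overall_r2 = 0.987071        # OVER-ALL row, R-Square column
cifra_afaceri_r2 = 0.948874  # Cifra_afaceri row, R-Square column

print(rsq_ratio(cifra_afaceri_r2))
print(rsq_ratio(overall_r2))
print(pseudo_f(overall_r2, N_OBS, N_CLUSTERS))
```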

Bibliography

- course and seminar materials;

- SAS documentation;

- SAS Enterprise Guide libraries;

- www.google.com

- http://www.mfinante.ro/

- http://www.indexb.ro/

Appendix:

PCA code
/* Data import task generated by SAS Enterprise Guide */
%LET _CLIENTTASKLABEL=%NRBQUOTE(Import Data);
%LET _EGTASKLABEL=%NRBQUOTE(Import Data);
PROC SQL;
DROP VIEW SASUSER.IMPW5778;
DROP TABLE SASUSER.IMPW5778;
%MACRO _EG_ImportData;
%LET IsMVS=FALSE;
%IF %sysfunc(getoption(filesystem))=MVS %THEN %DO;
    %LET IsMVS=TRUE;
%END;
DATA SASUSER.IMPW5778;
    /* '*'-delimited text file; the first line holds the column names */
    INFILE
        "C:\DOCUME~1\Raluca\LOCALS~1\Temp\EGImport\Local\Date AD9a53d7ad-ed46-42df-9f64-f434c12d8f11.asc"
        DELIMITER='*'
        MISSOVER
        DSD
        %IF &IsMVS=FALSE %THEN %DO;
            LRECL=32767
        %END;
        FIRSTOBS=2
    ;
    LENGTH
        Nume_firma $ 25
        Cifra_afaceri 8
        Stocuri 8
        Salariati 8
        Profit 8
        Datorii 8
        Capitaluri 8
    ;
    INPUT
        Nume_firma $
        Cifra_afaceri
        Stocuri
        Salariati
        Profit
        Datorii
        Capitaluri
    ;
    LABEL
        Nume_firma = "Nume_firma"
        Cifra_afaceri = "Cifra_afaceri"
        Stocuri = "Stocuri"
        Salariati = "Salariati"
        Profit = "Profit"
        Datorii = "Datorii"
        Capitaluri = "Capitaluri"
    ;
RUN;
%MEND;
%_EG_ImportData
RUN; QUIT;
TITLE; FOOTNOTE;
RUN;
%LET _CLIENTTASKLABEL=;
%LET _EGTASKLABEL=;

Factor analysis code
/* Data import task generated by SAS Enterprise Guide */
%LET _CLIENTTASKLABEL=%NRBQUOTE(Import Data);
%LET _EGTASKLABEL=%NRBQUOTE(Import Data);
PROC SQL;
DROP VIEW SASUSER.IMPW5480;
DROP TABLE SASUSER.IMPW5480;
%MACRO _EG_ImportData;
%LET IsMVS=FALSE;
%IF %sysfunc(getoption(filesystem))=MVS %THEN %DO;
    %LET IsMVS=TRUE;
%END;
DATA SASUSER.IMPW5480;
    /* '*'-delimited text file; the first line holds the column names */
    INFILE
        "C:\DOCUME~1\Raluca\LOCALS~1\Temp\EGImport\Local\Date AD962ff262-2f1e-4e8d-a9d1-06944aa6848f.asc"
        DELIMITER='*'
        MISSOVER
        DSD
        %IF &IsMVS=FALSE %THEN %DO;
            LRECL=32767
        %END;
        FIRSTOBS=2
    ;
    LENGTH
        Nume_firma $ 25
        Cifra_afaceri 8
        Stocuri 8
        Salariati 8
        Profit 8
        Datorii 8
        Capitaluri 8
    ;
    INPUT
        Nume_firma $
        Cifra_afaceri
        Stocuri
        Salariati
        Profit
        Datorii
        Capitaluri
    ;
    LABEL
        Nume_firma = "Nume_firma"
        Cifra_afaceri = "Cifra_afaceri"
        Stocuri = "Stocuri"
        Salariati = "Salariati"
        Profit = "Profit"
        Datorii = "Datorii"
        Capitaluri = "Capitaluri"
    ;
RUN;
%MEND;
%_EG_ImportData
RUN; QUIT;
TITLE; FOOTNOTE;
RUN;
%LET _CLIENTTASKLABEL=;
%LET _EGTASKLABEL=;

Cluster code (k-means)
/* Data import task generated by SAS Enterprise Guide */
%LET _CLIENTTASKLABEL=%NRBQUOTE(Import Data);
%LET _EGTASKLABEL=%NRBQUOTE(Import Data);
PROC SQL;
DROP VIEW SASUSER.IMPW3597;
DROP TABLE SASUSER.IMPW3597;
%MACRO _EG_ImportData;
%LET IsMVS=FALSE;
%IF %sysfunc(getoption(filesystem))=MVS %THEN %DO;
    %LET IsMVS=TRUE;
%END;
DATA SASUSER.IMPW3597;
    /* '*'-delimited text file; the first line holds the column names */
    INFILE
        "C:\DOCUME~1\Raluca\LOCALS~1\Temp\EGImport\Local\Date AD271e8db8-009a-4bd5-b048-90da00c42c4f.asc"
        DELIMITER='*'
        MISSOVER
        DSD
        %IF &IsMVS=FALSE %THEN %DO;
            LRECL=32767
        %END;
        FIRSTOBS=2
    ;
    LENGTH
        Nume_firma $ 25
        Cifra_afaceri 8
        Stocuri 8
        Salariati 8
        Profit 8
        Datorii 8
        Capitaluri 8
        Termen_inregistrare $ 21
    ;
    INPUT
        Nume_firma $
        Cifra_afaceri
        Stocuri
        Salariati
        Profit
        Datorii
        Capitaluri
    ;
    LABEL
        Nume_firma = "Nume_firma"
        Cifra_afaceri = "Cifra_afaceri"
        Stocuri = "Stocuri"
        Salariati = "Salariati"
        Profit = "Profit"
        Datorii = "Datorii"
        Capitaluri = "Capitaluri"
    ;
RUN;
%MEND;
%_EG_ImportData
RUN; QUIT;
TITLE; FOOTNOTE;
RUN;
%LET _CLIENTTASKLABEL=;
%LET _EGTASKLABEL=;

Cluster code (Ward)
/* Data import task generated by SAS Enterprise Guide */
%LET _CLIENTTASKLABEL=%NRBQUOTE(Import Data);
%LET _EGTASKLABEL=%NRBQUOTE(Import Data);
PROC SQL;
DROP VIEW SASUSER.IMPW3597;
DROP TABLE SASUSER.IMPW3597;
%MACRO _EG_ImportData;
%LET IsMVS=FALSE;
%IF %sysfunc(getoption(filesystem))=MVS %THEN %DO;
    %LET IsMVS=TRUE;
%END;
DATA SASUSER.IMPW3597;
    /* '*'-delimited text file; the first line holds the column names */
    INFILE
        "C:\DOCUME~1\Raluca\LOCALS~1\Temp\EGImport\Local\Date AD271e8db8-009a-4bd5-b048-90da00c42c4f.asc"
        DELIMITER='*'
        MISSOVER
        DSD
        %IF &IsMVS=FALSE %THEN %DO;
            LRECL=32767
        %END;
        FIRSTOBS=2
    ;
    LENGTH
        Nume_firma $ 25
        Cifra_afaceri 8
        Stocuri 8
        Salariati 8
        Profit 8
        Datorii 8
        Capitaluri 8
    ;
    INPUT
        Nume_firma $
        Cifra_afaceri
        Stocuri
        Salariati
        Profit
        Datorii
        Capitaluri
    ;
    LABEL
        Nume_firma = "Nume_firma"
        Cifra_afaceri = "Cifra_afaceri"
        Stocuri = "Stocuri"
        Salariati = "Salariati"
        Profit = "Profit"
        Datorii = "Datorii"
        Capitaluri = "Capitaluri"
    ;
RUN;
%MEND;
%_EG_ImportData
RUN; QUIT;
TITLE; FOOTNOTE;
RUN;
%LET _CLIENTTASKLABEL=;
%LET _EGTASKLABEL=;
