
Annals of the Constantin Brâncuși University of Târgu Jiu, Economy Series, Issue 1/2010

A FACTOR ANALYSIS METHOD APPLIED IN THE DEVELOPMENT FIELD

Mădălina CĂRBUREANU, Petroleum-Gas University of Ploiești

ABSTRACT
Both the development process and the globalization process concern the economy of a nation and the welfare of people all over the world, and both are subjects of current interest for various organizations. The economic reality and the living standard of people can be described through a set of variables. A problem arises when the number of variables is large and this great volume of information has to be handled. A solution to this problem can be the application of a factor analysis method, the so-called Principal Components Analysis, with the final goal of establishing and analyzing those variables which significantly influence human development.

Key words: Principal Components Analysis, National Institute of Statistics, United Nations Development Program.

Introduction
Nowadays, many organizations are developing programs which offer solutions to global and national development challenges. One such organization is UNDP (United Nations Development Program), which, through its offices all over the world, including Romania, tries through different programs to solve major problems, such as: human and economic development, nation building, sustainable resources, environmental problems, infectious diseases, conflicts and natural disasters, and many others (UNDP, 2007). In our country, the National Institute of Statistics (INSSE) also provides annual reports for all domains.

People's welfare is influenced by many factors, such as: life expectancy, gross domestic product, the poverty level, the average annual change in consumer prices, the degree of income inequality in a society, gender-related development, female economic activity, the human development level, etc. A great number of variables have a certain influence on the quality of people's lives, but some of these variables are more important than others, so it is useful to identify them for a better understanding of the factors which can increase or decrease the quality of life.

Factor analysis is used for solving two types of problems: reducing the number of variables in order to increase data processing speed, and recognizing hidden patterns in the relations among data. Factor analysis refers to a variety of statistical techniques used for representing a set of variables in terms of a reduced number of hypothetical variables, named factors. This type of analysis is used intensively in different domains, such as: psychology, social sciences, production management, operational research, the development domain, etc. Some of the software packages dedicated to this type of analysis are: Statistics, SAS, SPSS.

Principal components analysis (PCA), also known as the Hotelling transform or the Karhunen-Loeve transform, is a factor analysis technique in which the goal is to reduce the number of variables initially used, taking into consideration a reduced number of representative variables (Gorunescu, 2006). PCA is the simplest of the true eigenvector-based multivariate analyses. Because patterns are hard to find in high-dimensional data, the PCA method is very useful: by reducing the number of dimensions, the patterns can be found without an important loss of information (Smith, 2002). PCA has found application in fields such as face recognition and image compression.

The PCA application and results interpretation
The goal of the application is to identify those factors which influence the level of human development, using a factor analysis method called PCA (Principal Components Analysis). The purpose of PCA is to derive a small number of linear combinations (principal components) of a set of variables that retain as much of the information in the original variables as possible.

From the software packages dedicated to this type of analysis, we chose to use the SPSS software. SPSS (Statistical Package for the Social Sciences) is one of the most accessible statistical software suites, used by commercial, government and academic organizations to solve business and research problems.

The input data were provided by the Human Development Report 2007/2008, Fighting climate change: Human solidarity in a divided world, elaborated by the UNDP organization (UNDP, 2007). The SPSS database contains 177 records and the following fields:
o Country_name;
o Life_expectancy_index, representing the values of average life expectancy at birth;
o GDP_Index, representing the values of gross domestic product per inhabitant;
o Poverty_index;
o Average_annual_change_in_consumer_price_index;
o GINI Index, representing the degree of income inequality in a society;
o GDI, being the gender-related development index;
o Female_economic_activity;
o HDI, representing the human development index, which is, alongside the poverty index, one of the most important synthetic parameters of human development.
The SPSS database structure is presented in figure 1.
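The idea of deriving principal components as linear combinations of the original variables can be sketched in a few lines of Python with NumPy. This is only an illustrative sketch, not the SPSS procedure used in the paper: the data are synthetic (177 hypothetical cases, 4 hypothetical variables), the function name `pca_correlation` is an assumption, and the decomposition is assumed to be carried out on the correlation matrix.

```python
import numpy as np

def pca_correlation(data):
    """PCA on the correlation matrix of `data` (rows = cases, columns = variables)."""
    # Standardize each column to zero mean and unit sample variance.
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    corr = np.corrcoef(data, rowvar=False)
    # eigh is meant for symmetric matrices and returns eigenvalues in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(corr)
    order = np.argsort(eigenvalues)[::-1]  # largest eigenvalue first
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    # Each principal component is a linear combination of the standardized variables.
    scores = z @ eigenvectors
    return eigenvalues, eigenvectors, scores

rng = np.random.default_rng(0)
# Synthetic data (an assumption): 4 variables partly driven by one shared latent factor.
latent = rng.normal(size=(177, 1))
data = latent @ np.array([[0.9, 0.8, 0.7, 0.1]]) + 0.5 * rng.normal(size=(177, 4))

eigenvalues, eigenvectors, scores = pca_correlation(data)
explained = 100 * eigenvalues / eigenvalues.sum()
print("Eigenvalues:", np.round(eigenvalues, 3))
print("% of variance:", np.round(explained, 2))
```

Because the variables are standardized, the eigenvalues add up to the number of variables, and the variance of each component score equals its eigenvalue.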


After applying the Principal Components Analysis method, a series of results were obtained, which we will interpret: Descriptive Statistics, Correlation Matrix, KMO and Bartlett's Test, Communalities, Total Variance Explained, Scree Plot, Component Matrix and Rotated Component Matrix. The Descriptive Statistics table supplies the mean, the standard deviation (the level of spread of the data) and the number of analyzed cases for each variable, as we can see in figure 2.

The Correlation Matrix table (similar to a covariance matrix where the columns have been standardized), presented in figure 3, describes the correlation among the analyzed variables. As we can see, there are both positive and negative correlations between variables, which is perfectly normal given the meaning of each analyzed variable.
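The remark that a correlation matrix is a covariance matrix of standardized columns can be checked numerically. The sketch below uses synthetic data on deliberately different scales (an assumption, not the UNDP data) and compares the two matrices.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic data (an assumption): 177 cases, 3 variables on very different scales.
raw = rng.normal(size=(177, 3)) * np.array([1.0, 50.0, 0.01])
raw[:, 1] += 20.0 * raw[:, 0]    # induces a positive correlation with column 0
raw[:, 2] -= 0.005 * raw[:, 0]   # induces a negative correlation with column 0

# Standardize the columns: zero mean, unit sample standard deviation.
standardized = (raw - raw.mean(axis=0)) / raw.std(axis=0, ddof=1)

cov_of_standardized = np.cov(standardized, rowvar=False)
correlation = np.corrcoef(raw, rowvar=False)

print(np.round(correlation, 3))
print("Same matrix:", np.allclose(cov_of_standardized, correlation))
```

Standardizing removes the effect of measurement units, which is why variables such as an index between 0 and 1 and a percentage can be analyzed together.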


The Kaiser-Meyer-Olkin index (0.80) compares the magnitudes of the observed correlation coefficients to the magnitudes of the partial correlation coefficients. For Bartlett's Test of Sphericity (test value 2046.849, Sig = 0.000), the significance level is small enough to reject the hypothesis that the variables are uncorrelated; therefore there is a strong relationship among the variables. These values point to the presence of one or more common factors and motivate the application of a factor reduction procedure, represented by the PCA method.
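Both diagnostics can be computed directly from a correlation matrix. The sketch below uses the usual textbook formulas, which are assumed here rather than taken from the paper: Bartlett's statistic is -(n - 1 - (2p + 5)/6) ln|R| with p(p - 1)/2 degrees of freedom, and the KMO index is the sum of squared correlations divided by the sum of squared correlations plus squared partial correlations (off-diagonal elements only). The data and the function names `bartlett_sphericity` and `kmo_index` are hypothetical.

```python
import numpy as np

def bartlett_sphericity(corr, n_cases):
    """Return Bartlett's chi-square statistic and its degrees of freedom."""
    p = corr.shape[0]
    # slogdet is numerically safer than log(det(...)) for nearly singular matrices.
    _, log_det = np.linalg.slogdet(corr)
    statistic = -(n_cases - 1 - (2 * p + 5) / 6) * log_det
    dof = p * (p - 1) // 2
    return statistic, dof

def kmo_index(corr):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    inverse = np.linalg.inv(corr)
    d = np.sqrt(np.diag(inverse))
    # Partial correlations come from the scaled, sign-flipped inverse matrix.
    partial = -inverse / np.outer(d, d)
    off_diagonal = ~np.eye(corr.shape[0], dtype=bool)
    r2 = np.sum(corr[off_diagonal] ** 2)
    a2 = np.sum(partial[off_diagonal] ** 2)
    return r2 / (r2 + a2)

rng = np.random.default_rng(2)
# Synthetic data (an assumption): 177 cases, 4 variables sharing one latent factor.
latent = rng.normal(size=(177, 1))
data = latent @ np.array([[0.9, 0.8, 0.7, 0.6]]) + 0.5 * rng.normal(size=(177, 4))
corr = np.corrcoef(data, rowvar=False)

statistic, dof = bartlett_sphericity(corr, n_cases=data.shape[0])
kmo = kmo_index(corr)
print(f"Bartlett statistic = {statistic:.3f} on {dof} degrees of freedom")
print(f"KMO = {kmo:.3f}")
```

A large Bartlett statistic corresponds to a small significance level, which is the quantity compared against the usual threshold; the p-value itself would come from the chi-square distribution with `dof` degrees of freedom.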

Communality represents the proportion of a variable's variance explained by a factor structure (Pohlmann); in other words, the communality of a variable is that part of the variable's variance which is shared with the variance of the other variables. Low communality values for certain variables indicate that those variables are not well represented by the applied factor model. In the current situation, the majority of the variables are well represented by the factor model used, as we can see in figure 5.
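As a rough illustration of how communalities arise in a principal components solution, the sketch below computes loadings as eigenvectors scaled by the square roots of their eigenvalues, and then sums the squared loadings of each variable over the retained components. The 3 x 3 correlation matrix and the function names are hypothetical, chosen only to show a weakly correlated variable ending up with a low communality.

```python
import numpy as np

def component_loadings(corr, n_components):
    """Loadings of the first `n_components` principal components of `corr`."""
    eigenvalues, eigenvectors = np.linalg.eigh(corr)
    order = np.argsort(eigenvalues)[::-1][:n_components]
    # A loading is the correlation between a variable and a component.
    return eigenvectors[:, order] * np.sqrt(eigenvalues[order])

def communalities(loadings):
    """Share of each variable's variance reproduced by the retained components."""
    return np.sum(loadings ** 2, axis=1)

# Hypothetical correlation matrix: variables 1 and 2 are strongly related,
# variable 3 is only weakly related to the other two.
corr = np.array([
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.2],
    [0.1, 0.2, 1.0],
])

one_factor = communalities(component_loadings(corr, n_components=1))
all_factors = communalities(component_loadings(corr, n_components=3))
print("One component retained:", np.round(one_factor, 3))
print("All components retained:", np.round(all_factors, 3))
```

When every component is kept, each communality equals 1; dropping components lowers the communalities, most of all for variables that the retained components do not capture.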

Fig. 5. Communalities of the variables

The first information specific to factor analysis is supplied by the Total Variance Explained table, presented in figure 6. Using the Principal Components Analysis (PCA) method, eight principal components, the so-called factors, were generated. As we can observe in figure 6, only the first two factors fulfil the selection criterion (Eigenvalue >= 1).

Fig. 6. The Total Variance Explained table

The Extraction Sums of Squared Loadings columns supply the eigenvalues (Total column), the explained variance (% of Variance column) and the cumulative variance (Cumulative % column) for the initial solution, before rotation. The variance explained by each factor is distributed as follows: the first factor, 62.137%, and the second factor, 13.277%. Together, the two factors explain 75.414% of the variance of the analyzed values. The Rotation Sums of Squared Loadings columns present the values for both factors after the rotation procedure has been applied. For the same total variance (75.414%), a redistribution of the variance explained by each factor can be observed: the first factor, 62.090%, and the second factor, 13.324%. As we can observe, through rotation the first factor loses part of its saturation level in favor of the second one.

In figure 7, the eigenvalues (the sum of squared values in the column of a factor matrix) for all the principal components obtained using the PCA method are plotted in the sequence of the principal factors. The number of factors is chosen where the plot levels off to a linear decreasing pattern. The figure below suggests a two-factor solution, since from that point on the eigenvalues begin a linear decline.
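The percentages reported above can be cross-checked with simple arithmetic. The sketch below assumes that the analysis was run on the correlation matrix of eight standardized variables, so that the total variance equals 8; under that assumption each eigenvalue is the percentage of variance multiplied by 8/100. The helper name `implied_eigenvalues` is hypothetical.

```python
# Assumption: eight standardized variables, hence total variance = 8.
n_variables = 8

# Percentages of variance reported in the text.
extraction_pct = [62.137, 13.277]   # before rotation
rotation_pct = [62.090, 13.324]     # after rotation

def implied_eigenvalues(percentages, n_variables):
    """Convert % of variance into eigenvalues, given total variance = n_variables."""
    return [p * n_variables / 100 for p in percentages]

eigenvalues = implied_eigenvalues(extraction_pct, n_variables)
# Selection criterion used in the text: keep factors with eigenvalue >= 1.
retained = [value >= 1 for value in eigenvalues]

print("Implied eigenvalues:", [round(v, 3) for v in eigenvalues])
print("Meets eigenvalue >= 1:", retained)
print("Cumulative % before rotation:", round(sum(extraction_pct), 3))
print("Cumulative % after rotation:", round(sum(rotation_pct), 3))
```

Under this assumption both retained factors have an implied eigenvalue of at least 1, and the cumulative percentage is the same before and after rotation, consistent with the statement that rotation only redistributes the explained variance.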

Fig. 7. Graphical representation

The Component Matrix table, presented in figure 8, supplies the list of variables and their contribution to the loading of each selected factor, in terms of correlation. The data in this table refer to the initial factor solution, before the rotation procedure is applied.


Fig. 8. The Component Matrix

Figure 9 presents the Rotated Component Matrix table, which contains the data obtained after applying factor rotation (a transformation of the principal factors or components in order to approximate simple structure).
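The text does not state which rotation method was used. As an illustration only, the sketch below assumes an orthogonal varimax rotation, implemented with the commonly used SVD-based iteration and without Kaiser normalization; the loading matrix is hypothetical. It shows that an orthogonal rotation changes how variance is split between factors while leaving each variable's communality and the total explained variance unchanged.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation of a (variables x factors) loading matrix."""
    n_vars, n_factors = loadings.shape
    rotation = np.eye(n_factors)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient of the varimax criterion with respect to the rotation matrix.
        column_ss = np.diag(np.sum(rotated ** 2, axis=0))
        gradient = loadings.T @ (rotated ** 3 - rotated @ column_ss / n_vars)
        u, s, vt = np.linalg.svd(gradient)
        rotation = u @ vt  # closest orthogonal matrix to the gradient
        new_criterion = np.sum(s)
        if new_criterion < criterion * (1 + tol):
            break  # no meaningful improvement
        criterion = new_criterion
    return loadings @ rotation, rotation

# Hypothetical unrotated loadings: 4 variables, 2 factors.
loadings = np.array([
    [0.7, 0.5],
    [0.8, 0.4],
    [0.6, -0.6],
    [0.5, -0.7],
])

rotated, rotation = varimax(loadings)
print("Sums of squared loadings before:", np.round(np.sum(loadings ** 2, axis=0), 3))
print("Sums of squared loadings after: ", np.round(np.sum(rotated ** 2, axis=0), 3))
print("Rotated loadings:")
print(np.round(rotated, 3))
```

SPSS offers several rotation methods and applies Kaiser normalization by default for varimax, so figures produced by this sketch would not necessarily match SPSS output for the same data.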

Fig. 9. The Rotated Component Matrix

The data from figure 9 allow us to draw the final conclusions regarding the factor structure of the analyzed variables:
o The first factor is composed of the Human Development Index (0.990), Gender-related development index (0.987), Life expectancy index (0.936) and Gross domestic product (0.922) variables, which is why we will rename this factor Life_Quality_Index.
o The second factor is composed only of the Female economic activity (0.945) variable and it will keep the same name.
Using this type of factor analysis method, we can obtain useful information about the factors which have a great influence on the quality of people's lives, giving statisticians the possibility to follow its ascending or descending evolution.

Conclusion
This article highlights the utility of applying the principal components analysis method in the human development domain, with the final goal of identifying those factors which significantly influence people's quality of life, in the current context of globalization and accelerated development. The immediate effect of this type of application can be the possibility of analyzing, testing and improving those factors which are directly responsible for the level of quality of life. By knowing and understanding those factors, programs can be initiated for raising people's welfare to a higher level.

Bibliography
1. Gorunescu, F., 2006, Data mining. Concepts, Models and Techniques, (Cluj-Napoca, Romania)
2. UNDP, 2007, Human Development Report 2007/2008, Fighting climate change: Human solidarity in a divided world, 229-329
3. UNDP, 2007, United Nations Development Programme Around the World, http://www.undp.ro, accessed 10 September 2008
4. Smith, L., 2002, A tutorial on Principal Components Analysis, 12-13
5. Pohlmann, J., Factor Analysis glossary, EPSY 580B-Factor Analysis Seminar, http://www.siu.edu/~epse1/pohlmann/factglos/
