Principal Component Analysis with R

Principal Component Analysis (Theory)

The aim of principal component analysis (PCA) is to reduce a set of mutually correlated variables to a smaller set of new, uncorrelated variables while retaining as much of the variance of the original data as possible (a common benchmark is 80%).

Suppose there are 1000 variables. Working with all of them has two drawbacks:

1. The analysis becomes too complicated.
2. The results are hard to interpret.

The data therefore need to be reduced. The prerequisite is that the variables are strongly correlated with one another.

Steps of PCA

Testing the correlation matrix. First check whether the variables are closely correlated, using Bartlett's test:

$H_0: \rho = I$ (all off-diagonal elements are 0, i.e. the variables are uncorrelated)
$H_1: \rho \neq I$ (some off-diagonal elements are not 0, i.e. the variables are correlated)

Bartlett's test uses n = number of observations, p = number of variables, R = the (estimated) correlation matrix, and |R| = the determinant of the correlation matrix. In its usual form the statistic is

$\chi^2_{\text{calc}} = -\left[(n - 1) - \frac{2p + 5}{6}\right] \ln |R|$, with $p(p-1)/2$ degrees of freedom.

Reject $H_0$ if $\chi^2_{\text{calc}} > \chi^2_{\text{table}}$. Because we intend to use PCA, rejecting $H_0$ is the outcome we hope for: it means the original variables are correlated, so the goal of reducing the dimension of the data is achievable.

1. Find the eigenvalues of the covariance matrix (S) or of the correlation matrix (R). If the variables are measured in the same units, use the covariance matrix; if the units differ, use the correlation matrix.
2. Sort the eigenvalues from largest to smallest: $\lambda_1 \geq \lambda_2 \geq \dots \geq \lambda_p \geq 0$.
3. Build the new variables (principal components) as linear combinations of the original variables, using the normalized (orthonormal) eigenvector that corresponds to each eigenvalue:

$Y_1 = e_1'X = e_{11}X_1 + \dots + e_{1p}X_p$
$Y_2 = e_2'X = e_{21}X_1 + \dots + e_{2p}X_p$
$\vdots$
$Y_p = e_p'X = e_{p1}X_1 + \dots + e_{pp}X_p$

   The new variables are uncorrelated with one another and ordered by importance: $Y_1$ is the most important, down to $Y_p$.

4. Reduce the principal components that have been formed. There are three ways:

   1. Proportion of variance (each eigenvalue divided by the sum of all eigenvalues).
   2. Eigenvalues greater than 1.
   3. Scree plot.

   Proportion of variance: if the proportion of variance of the first new variable $Y_1$ is not yet sufficient on its own, the second new variable $Y_2$ is added; in that case the number of PCs retained is 2.

   Eigenvalues: the number of PCs retained is the number of eigenvalues greater than 1.

   Scree plot: look at where the curve changes from steep to flat, and at the size of the eigenvalues. (A scree plot shows the eigenvalues against the number of variables.)

5. Name the PCs that are kept after the reduction. There are two ways:

   1. Correlation between each PC and the original variables: the variables with large correlations are the ones that characterize the PC.
   2. Weights: in $Y_1 = e_{11}X_1 + \dots + e_{1p}X_p$ the weights are the coefficients $e$. The variable with the largest weight characterizes the PC; if several weights are nearly equal, the PC is characterized by all of those variables.
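The steps above can be sketched in base R. This is only a minimal sketch under stated assumptions: the built-in USArrests data stands in for the data, Bartlett's statistic is taken in its usual sphericity form, and the 80% benchmark and the variable names (`chi2`, `k_prop`, `k_kaiser`) are illustrative choices, not part of the notes.

```r
# Assumed example data: USArrests ships with R
X <- USArrests
n <- nrow(X)              # number of observations
p <- ncol(X)              # number of variables
R <- cor(X)               # estimated correlation matrix

# Bartlett's test: H0 says the correlation matrix is the identity
chi2    <- -((n - 1) - (2 * p + 5) / 6) * log(det(R))
df      <- p * (p - 1) / 2
p_value <- pchisq(chi2, df, lower.tail = FALSE)

# Steps 1-2: eigenvalues of R (eigen() returns them sorted, largest first)
eig    <- eigen(R)
lambda <- eig$values

# Step 4: two of the three retention rules
cum_prop <- cumsum(lambda / sum(lambda))
k_prop   <- which(cum_prop >= 0.80)[1]   # proportion-of-variance rule
k_kaiser <- sum(lambda > 1)              # eigenvalue-greater-than-1 rule

print(c(chi2 = chi2, df = df, p_value = p_value))
print(round(cum_prop, 3))
print(c(k_prop = k_prop, k_kaiser = k_kaiser))
```

The two rules need not agree, which is one reason the scree plot is usually inspected as well.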

5 functions to do Principal Components Analysis in R

Posted on June 17, 2012

Principal Component Analysis (PCA) is a multivariate technique that allows us to summarize the systematic patterns of variations in the data. From a data analysis standpoint, PCA is used for studying one table of observations and variables with the main idea of transforming the observed variables into a set of new variables, the principal components, which are uncorrelated and explain the variation in the data. For this reason, PCA allows us to reduce a "complex" data set to a lower dimension in order to reveal the structures or the dominant types of variations in both the observations and the variables.

PCA in R

In R, there are several functions from different packages that allow us to perform PCA. In this post I'll show you 5 different ways to do a PCA using the following functions (with their corresponding packages in parentheses):

• prcomp() (stats)
• princomp() (stats)
• PCA() (FactoMineR)
• dudi.pca() (ade4)
• acp() (amap)

Brief note: It is no coincidence that the three external packages ("FactoMineR", "ade4", and "amap") have been developed by French data analysts, who have a long tradition and preference for PCA and other related exploratory techniques.

No matter what function you decide to use, the typical PCA results should consist of a set of eigenvalues, a table with the scores or Principal Components (PCs), and a table of loadings (or correlations between variables and PCs). The eigenvalues provide information about the variability in the data. The scores provide information about the structure of the observations. The loadings (or correlations) allow you to get a sense of the relationships between variables, as well as their associations with the extracted PCs.

The Data

To make things easier, we'll use the dataset USArrests that already comes with R. It's a data frame with 50 rows (USA states) and 4 columns containing information about violent crime rates by US State. Since most of the time the variables are measured in different scales, the PCA must be performed with standardized data (mean = 0, variance = 1). The good news is that all of the functions that perform PCA come with parameters to specify that the analysis must be applied on standardized data.
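What "standardized data" means here can be checked directly. The following is a small sketch, assuming base R's scale() as the standardization step; the name `Z` is illustrative. It shows that after standardizing, every column has mean 0 and variance 1, and that the covariance matrix of the standardized data equals the correlation matrix of the raw data.

```r
# Standardize each column of USArrests: subtract the mean, divide by the sd
Z <- scale(USArrests)

col_means <- colMeans(Z)        # all (numerically) zero
col_vars  <- apply(Z, 2, var)   # all one

# Covariance of standardized data is the correlation of the raw data
same_matrix <- all.equal(cov(Z), cor(USArrests))

print(round(col_means, 10))
print(col_vars)
print(same_matrix)
```

This is why a PCA on standardized data is often described as a PCA "on the correlation matrix".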

Option 1: using prcomp()

The function prcomp() comes with the default "stats" package, which means that you don't have to install anything. It is perhaps the quickest way to do a PCA if you don't want to install other packages.

    # PCA with function prcomp
    pca1 = prcomp(USArrests, scale. = TRUE)

    # sqrt of eigenvalues
    pca1$sdev
    ## [1] 1.5749 0.9949 0.5971 0.4164

    # loadings
    head(pca1$rotation)
    ##              PC1     PC2     PC3      PC4
    ## Murder   -0.5359  0.4182 -0.3412  0.64923
    ## Assault  -0.5832  0.1880 -0.2681 -0.74341
    ## UrbanPop -0.2782 -0.8728 -0.3780  0.13388
    ## Rape     -0.5434 -0.1673  0.8178  0.08902

    # PCs (aka scores)
    head(pca1$x)
    ##                PC1     PC2      PC3      PC4
    ## Alabama    -0.9757  1.1220 -0.43980  0.15470
    ## Alaska     -1.9305  1.0624  2.01950 -0.43418
    ## Arizona    -1.7454 -0.7385  0.05423 -0.82626
    ## Arkansas    0.1400  1.1085  0.11342 -0.18097
    ## California -2.4986 -1.5274  0.59254 -0.33856
    ## Colorado   -1.4993 -0.9776  1.08400  0.00145

Option 2: using princomp()

The function princomp() also comes with the default "stats" package, and it is very similar to her cousin prcomp(). What I don't like about princomp() is that sometimes it won't display all the values for the loadings, but this is a minor detail.

    # PCA with function princomp
    pca2 = princomp(USArrests, cor = TRUE)

    # sqrt of eigenvalues
    pca2$sdev
    ## Comp.1 Comp.2 Comp.3 Comp.4
    ## 1.5749 0.9949 0.5971 0.4164

    # loadings
    unclass(pca2$loadings)
    ##           Comp.1  Comp.2  Comp.3   Comp.4
    ## Murder   -0.5359  0.4182 -0.3412  0.64923
    ## Assault  -0.5832  0.1880 -0.2681 -0.74341
    ## UrbanPop -0.2782 -0.8728 -0.3780  0.13388
    ## Rape     -0.5434 -0.1673  0.8178  0.08902

    # PCs (aka scores)
    head(pca2$scores)
    ##             Comp.1  Comp.2   Comp.3    Comp.4
    ## Alabama    -0.9856  1.1334 -0.44427  0.156267
    ## Alaska     -1.9501  1.0732  2.04000 -0.438583
    ## Arizona    -1.7632 -0.7460  0.05478 -0.834653
    ## Arkansas    0.1414  1.1198  0.11457 -0.182811
    ## California -2.5240 -1.5429  0.59856 -0.341996
    ## Colorado   -1.5146 -0.9876  1.09501  0.001465
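The scores from prcomp() and princomp() differ slightly even though the standard deviations of the components agree. The sketch below checks one explanation: when standardizing, princomp() uses the divisor n for the variance while prcomp() uses n - 1, so the scores should differ by a constant factor of sqrt(n / (n - 1)). The object names (`pca_a`, `pca_b`, `ratio`) are illustrative, and absolute values are compared because the sign of a component is arbitrary and can differ between functions and R versions.

```r
pca_a <- prcomp(USArrests, scale. = TRUE)
pca_b <- princomp(USArrests, cor = TRUE)

n <- nrow(USArrests)

# Ratio of the first-component scores, ignoring sign
ratio <- abs(pca_b$scores[, 1]) / abs(pca_a$x[, 1])

print(range(ratio))         # the same value for every state
print(sqrt(n / (n - 1)))    # about 1.0102 for n = 50
```

With 50 observations the factor is about 1%, which is why the two score tables look almost, but not exactly, the same.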

Option 3: using PCA()

A highly recommended option, especially if you want more detailed results and assessing tools, is the PCA() function from the package "FactoMineR". It is by far the best PCA function in R and it comes with a number of parameters that allow you to tweak the analysis in a very nice way.

    # PCA with function PCA
    library(FactoMineR)

    # apply PCA
    pca3 = PCA(USArrests, graph = FALSE)

    # matrix with eigenvalues
    pca3$eig
    ##        eigenvalue percentage of variance cumulative percentage of variance
    ## comp 1     2.4802                 62.006                             62.01
    ## comp 2     0.9898                 24.744                             86.75
    ## comp 3     0.3566                  8.914                             95.66
    ## comp 4     0.1734                  4.336                            100.00

    # correlations between variables and PCs
    pca3$var$coord
    ##           Dim.1   Dim.2   Dim.3    Dim.4
    ## Murder   0.8440 -0.4160  0.2038  0.27037
    ## Assault  0.9184 -0.1870  0.1601 -0.30959
    ## UrbanPop 0.4381  0.8683  0.2257  0.05575
    ## Rape     0.8558  0.1665 -0.4883  0.03707

    # PCs (aka scores)
    head(pca3$ind$coord)
    ##              Dim.1   Dim.2    Dim.3     Dim.4
    ## Alabama     0.9856 -1.1334  0.44427  0.156267
    ## Alaska      1.9501 -1.0732 -2.04000 -0.438583
    ## Arizona     1.7632  0.7460 -0.05478 -0.834653
    ## Arkansas   -0.1414 -1.1198 -0.11457 -0.182811
    ## California  2.5240  1.5429 -0.59856 -0.341996
    ## Colorado    1.5146  0.9876 -1.09501  0.001465

Option 4: using dudi.pca()

Another option is to use the dudi.pca() function from the package "ade4", which has a huge amount of other methods as well as some interesting graphics.

    # PCA with function dudi.pca
    library(ade4)

    # apply PCA
    pca4 = dudi.pca(USArrests, nf = 5, scannf = FALSE)

    # eigenvalues
    pca4$eig
    ## [1] 2.4802 0.9898 0.3566 0.1734

    # loadings
    pca4$c1
    ##              CS1     CS2     CS3      CS4
    ## Murder   -0.5359  0.4182 -0.3412  0.64923
    ## Assault  -0.5832  0.1880 -0.2681 -0.74341
    ## UrbanPop -0.2782 -0.8728 -0.3780  0.13388
    ## Rape     -0.5434 -0.1673  0.8178  0.08902

    # correlations between variables and PCs
    pca4$co
    ##            Comp1   Comp2   Comp3    Comp4
    ## Murder   -0.8440  0.4160 -0.2038  0.27037
    ## Assault  -0.9184  0.1870 -0.1601 -0.30959
    ## UrbanPop -0.4381 -0.8683 -0.2257  0.05575
    ## Rape     -0.8558 -0.1665  0.4883  0.03707

    # PCs
    head(pca4$li)
    ##              Axis1   Axis2    Axis3     Axis4
    ## Alabama    -0.9856  1.1334 -0.44427  0.156267
    ## Alaska     -1.9501  1.0732  2.04000 -0.438583
    ## Arizona    -1.7632 -0.7460  0.05478 -0.834653
    ## Arkansas    0.1414  1.1198  0.11457 -0.182811
    ## California -2.5240 -1.5429  0.59856 -0.341996
    ## Colorado   -1.5146 -0.9876  1.09501  0.001465
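Some of the functions report loadings and others report correlations between variables and PCs. For a PCA on standardized data the two are linked: each column of loadings multiplied by the standard deviation of its component gives the correlations. The sketch below checks this with base R only, using prcomp() as a stand-in for the package-specific functions; the object names are illustrative.

```r
pca <- prcomp(USArrests, scale. = TRUE)

# Correlations computed directly from the data and the scores
cor_direct <- cor(USArrests, pca$x)

# Correlations rebuilt from the loadings: scale column k by sdev[k]
cor_from_loadings <- sweep(pca$rotation, 2, pca$sdev, `*`)

same <- all.equal(cor_direct, cor_from_loadings, check.attributes = FALSE)

print(round(cor_direct, 4))
print(same)
```

Signs may be flipped relative to the tables printed by other packages, because the direction of each component is arbitrary.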

Option 5: using acp()

A fifth possibility is the acp() function from the package "amap".

    # PCA with function acp
    library(amap)

    # apply PCA
    pca5 = acp(USArrests)

    # sqrt of eigenvalues
    pca5$sdev
    ## Comp 1 Comp 2 Comp 3 Comp 4
    ## 1.5749 0.9949 0.5971 0.4164

    # loadings
    pca5$loadings
    ##          Comp 1  Comp 2  Comp 3   Comp 4
    ## Murder   0.5359  0.4182 -0.3412  0.64923
    ## Assault  0.5832  0.1880 -0.2681 -0.74341
    ## UrbanPop 0.2782 -0.8728 -0.3780  0.13388
    ## Rape     0.5434 -0.1673  0.8178  0.08902

    # scores
    head(pca5$scores)
    ##             Comp 1  Comp 2   Comp 3   Comp 4
    ## Alabama     0.9757  1.1220 -0.43980  0.15470
    ## Alaska      1.9305  1.0624  2.01950 -0.43418
    ## Arizona     1.7454 -0.7385  0.05423 -0.82626
    ## Arkansas   -0.1400  1.1085  0.11342 -0.18097
    ## California  2.4986 -1.5274  0.59254 -0.33856
    ## Colorado    1.4993 -0.9776  1.08400  0.00145

Of course these are not the only options to do a PCA, but I'll leave the other approaches for another post.

PCA plots

Everybody uses PCA to visualize the data, and most of the discussed functions come with their own plot functions. But you can also make use of the great graphical displays of "ggplot2". Just to show you a couple of plots, let's take the basic results from prcomp().

Plot of observations

    # load ggplot2
    library(ggplot2)

    # create data frame with scores
    scores = as.data.frame(pca1$x)

    # plot of observations
    ggplot(data = scores, aes(x = PC1, y = PC2, label = rownames(scores))) +
      geom_hline(yintercept = 0, colour = "gray65") +
      geom_vline(xintercept = 0, colour = "gray65") +
      geom_text(colour = "tomato", alpha = 0.8, size = 4) +
      ggtitle("PCA plot of USA States - Crime Rates")

Circle of correlations

    # function to create a circle
    circle <- function(center = c(0, 0), npoints = 100) {
      r = 1
      tt = seq(0, 2 * pi, length = npoints)
      xx = center[1] + r * cos(tt)
      yy = center[2] + r * sin(tt)  # y uses the second coordinate of the center
      return(data.frame(x = xx, y = yy))
    }
    corcir = circle(c(0, 0), npoints = 100)

    # create data frame with correlations between variables and PCs
    correlations = as.data.frame(cor(USArrests, pca1$x))

    # data frame with arrows coordinates
    arrows = data.frame(x1 = c(0, 0, 0, 0), y1 = c(0, 0, 0, 0),
                        x2 = correlations$PC1, y2 = correlations$PC2)

    # geom_path will do open circles
    ggplot() +
      geom_path(data = corcir, aes(x = x, y = y), colour = "gray65") +
      geom_segment(data = arrows, aes(x = x1, y = y1, xend = x2, yend = y2),
                   colour = "gray65") +
      geom_text(data = correlations,
                aes(x = PC1, y = PC2, label = rownames(correlations))) +
      geom_hline(yintercept = 0, colour = "gray65") +
      geom_vline(xintercept = 0, colour = "gray65") +
      xlim(-1.1, 1.1) + ylim(-1.1, 1.1) +
      labs(x = "pc1 axis", y = "pc2 axis") +
      ggtitle("Circle of correlations")

© Gaston Sanchez. All contents under (CC) BY-NC-SA license, unless otherwise noted.

Principal Components and Factor Analysis

This section covers principal components and factor analysis. The latter includes both exploratory and confirmatory methods.

Principal Components

The princomp( ) function produces an unrotated principal component analysis.

    # Principal Components Analysis
    # entering raw data and extracting PCs
    # from the correlation matrix
    fit <- princomp(mydata, cor=TRUE)
    summary(fit) # print variance accounted for
    loadings(fit) # pc loadings
    plot(fit,type="lines") # scree plot
    fit$scores # the principal components
    biplot(fit)

Use cor=FALSE to base the principal components on the covariance matrix. Use the covmat= option to enter a correlation or covariance matrix directly. If entering a covariance matrix, include the option n.obs=.

The principal( ) function in the psych package can be used to extract and rotate principal components.

    # Varimax Rotated Principal Components
    # retaining 5 components
    library(psych)
    fit <- principal(mydata, nfactors=5, rotate="varimax")
    fit # print results

mydata can be a raw data matrix or a covariance matrix. Pairwise deletion of missing data is used. rotate can be "none", "varimax", "quartimax", "promax", "oblimin", "simplimax", or "cluster".

Exploratory Factor Analysis

The factanal( ) function produces maximum likelihood factor analysis.

    # Maximum Likelihood Factor Analysis
    # entering raw data and extracting 3 factors,
    # with varimax rotation
    fit <- factanal(mydata, 3, rotation="varimax")
    print(fit, digits=2, cutoff=.3, sort=TRUE)
    # plot factor 1 by factor 2
    load <- fit$loadings[,1:2]
    plot(load,type="n") # set up plot
    text(load,labels=names(mydata),cex=.7) # add variable names

The rotation= options include "varimax", "promax", and "none". Add the option scores="regression" or "Bartlett" to produce factor scores. Use the covmat= option to enter a correlation or covariance matrix directly. If entering a covariance matrix, include the option n.obs=.

The factor.pa( ) function in the psych package offers a number of factor analysis related functions, including principal axis factoring.

    # Principal Axis Factor Analysis
    library(psych)
    fit <- factor.pa(mydata, nfactors=3, rotation="varimax")
    fit # print results

mydata can be a raw data matrix or a covariance matrix. Pairwise deletion of missing data is used. Rotation can be "varimax" or "promax". Note that recent releases of psych provide principal axis factoring through fa( ) with fm="pa", and factor.pa( ) may no longer be available.
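The covmat= and n.obs= options of factanal( ) mentioned above can be illustrated without raw data. This is a minimal sketch that assumes the `ability.cov` object shipped with R (a covariance matrix of six ability tests together with its number of observations) as example input; the choice of two factors is illustrative only.

```r
# ability.cov is a list with components cov, center and n.obs
S <- ability.cov$cov
N <- ability.cov$n.obs

# Maximum likelihood factor analysis from a covariance matrix:
# no raw data are passed, so n.obs is supplied explicitly
fit <- factanal(factors = 2, covmat = S, n.obs = N, rotation = "varimax")

print(fit, digits = 2, cutoff = 0.3, sort = TRUE)
```

Factor scores cannot be produced this way, because scores require the raw observations.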

Determining the Number of Factors to Extract

A crucial decision in exploratory factor analysis is how many factors to extract. The nFactors package offers a suite of functions to aid in this decision. Details on this methodology can be found in a PowerPoint presentation by Raiche, Riopel, and Blais. Of course, any factor solution must be interpretable to be useful.

    # Determine Number of Factors to Extract
    library(nFactors)
    ev <- eigen(cor(mydata)) # get eigenvalues
    ap <- parallel(subject=nrow(mydata),var=ncol(mydata),
      rep=100,cent=.05)
    nS <- nScree(x=ev$values, aparallel=ap$eigen$qevpea)
    plotnScree(nS)

Going Further

The FactoMineR package offers a large number of additional functions for exploratory factor analysis. This includes the use of both quantitative and qualitative variables, as well as the inclusion of supplementary variables and observations. Here is an example of the types of graphs that you can create with this package.

    # PCA Variable Factor Map
    library(FactoMineR)
    result <- PCA(mydata) # graphs generated automatically

The GPArotation package offers a wealth of rotation options beyond varimax and promax.

Principal Component Analysis (PCA)

Introduction

Principal Component Analysis (PCA) is a powerful tool when you have many variables and you want to look into things that these variables can explain. As the name of PCA suggests, PCA finds the combination of your variables which explains the phenomena. In this sense, PCA is useful when you want to reduce the number of the variables. One common scenario of PCA is that you have n variables and you want to combine them and make them 3 or 4 variables without losing much of the information that the original data have. More mathematically, PCA is trying to find some linear projections of your data which preserve the information your data have.

PCA is one of the methods you may want to try if you have lots of Likert data and try to understand what these data tell you. Let's say we asked the participants four 7-scale Likert questions about what they care about when choosing a new computer, and got the results like this.

    Participant  Price  Software  Aesthetics  Brand
    P1           6      5         3           4
    P2           7      3         2           2
    P3           6      4         4           5
    P4           5      7         1           3
    P5           7      7         5           5
    P6           6      4         2           3
    P7           5      7         2           1
    P8           6      5         4           4
    P9           3      5         6           7
    P10          1      3         7           5
    P11          2      6         6           7
    P12          5      7         7           6
    P13          2      4         5           6
    P14          3      5         6           5
    P15          1      6         5           5
    P16          2      3         7           7

Price: A new computer is cheap to you (1: strongly disagree - 7: strongly agree),
Software: The OS on a new computer allows you to use software you want to use (1: strongly disagree - 7: strongly agree),
Aesthetics: The appearance of a new computer is appealing to you (1: strongly disagree - 7: strongly agree),
Brand: The brand of the OS on a new computer is appealing to you (1: strongly disagree - 7: strongly agree)

Now what you want to do is to find what combination of these four variables can explain the phenomena you observed. I will explain this with the example R code.

R code example

Let's prepare the same data shown in the table above.

    Price <- c(6,7,6,5,7,6,5,6,3,1,2,5,2,3,1,2)
    Software <- c(5,3,4,7,7,4,7,5,5,3,6,7,4,5,6,3)
    Aesthetics <- c(3,2,4,1,5,2,2,4,6,7,6,7,5,6,5,7)
    Brand <- c(4,2,5,3,5,3,1,4,7,5,7,6,6,5,5,7)
    data <- data.frame(Price, Software, Aesthetics, Brand)

At this point, data looks pretty much the same as the table above. Now we do PCA. In R, there are two functions for PCA: prcomp() and princomp(). prcomp() works on a singular value decomposition of the centered data (standardized as well when scale. = TRUE), whereas princomp() works on an eigen decomposition of the covariance matrix, or of the correlation matrix when cor = TRUE. With the same scaling the two give essentially the same components, up to sign and a small constant factor in the scores, because princomp() uses the divisor n rather than n - 1. The results gained from princomp() have nice features, so here I use princomp().

    pca <- princomp(data, cor=TRUE)
    summary(pca, loadings=TRUE)

And here is the result of the PCA.

    Importance of components:
                              Comp.1    Comp.2    Comp.3     Comp.4
    Standard deviation     1.5589391 0.9804092 0.6816673 0.37925777
    Proportion of Variance 0.6075727 0.2403006 0.1161676 0.03595911
    Cumulative Proportion  0.6075727 0.8478733 0.9640409 1.00000000

    Loadings:
               Comp.1 Comp.2 Comp.3 Comp.4
    Price      -0.523         0.848
    Software   -0.177  0.977 -0.120
    Aesthetics  0.597  0.134  0.295 -0.734
    Brand       0.583  0.167  0.423  0.674

I will explain how to interpret this result in the next section.

Interpretation of the results of PCA

Let's take a look at the table for loadings, which mean the coefficients for the "new" variables.

                Comp.1  Comp.2  Comp.3  Comp.4
    Price       -0.523           0.848
    Software    -0.177   0.977  -0.120
    Aesthetics   0.597   0.134   0.295  -0.734
    Brand        0.583   0.167   0.423   0.674

From the second table (loadings), PCA found four new variables which can explain the same information as the original four variables (Price, Software, Aesthetics, and Brand), which are Comp.1 to Comp.4. And Comp.1 is calculated as follows (on the standardized variables, since cor=TRUE was used):

    Comp.1 = -0.523 * Price - 0.177 * Software + 0.597 * Aesthetics + 0.583 * Brand

Thus, PCA successfully found a new combination of the variables, which is good. The next thing we want to know is how much each of the new variables has a power to explain the information that the original data have. For this, you need to look at Standard deviation and Cumulative Proportion (of Variance) in the result.

                           Comp.1  Comp.2  Comp.3  Comp.4
    Standard deviation     1.56    0.98    0.68    0.38
    Cumulative Proportion  0.61    0.85    0.96    1.00

Standard deviation means the standard deviation of the new variables. PCA calculates the combination of the variables such that new variables have a large standard deviation. Thus, generally a larger standard deviation means a better variable. A heuristic is that we take all the new variables whose standard deviations are roughly over 1.0 (so we will take Comp.1 and Comp.2).

Another way to determine how many new variables we want to take is to look at the cumulative proportion of variance. This means how much of the information that the original data have can be described by the combination of the new variables. For instance, with only Comp.1, we can describe 61% of the information the original data have. If we use Comp.1 and Comp.2, we can describe 85% of them. Generally, 80% is considered as the percentage which describes the data well. So, in this example, we can take Comp.1 and Comp.2, and ignore Comp.3 and Comp.4.

In this manner, we can decrease the number of the variables (in this example, from 4 variables to 2 variables). Your next task is to understand what the new variable means in the context of your data. As we have seen, the first new variable can be calculated as follows:

    Comp.1 = -0.523 * Price - 0.177 * Software + 0.597 * Aesthetics + 0.583 * Brand

It is a very good idea to plot the data to see what this new variable means. You can use scores to take the values of each variable modeled by PCA.

    plot(pca$scores[,1])
    barplot(pca$scores[,1])

With the graphs (sorry, I was kinda lazy to upload the graph, but you can quickly generate it by yourself), you can see Participant 1 - 8 get negative values, and the other participants get positive values. It seems that this new variable indicates whether a user cares about Price and Software or Aesthetics and Brand for her computer. So, we probably can name this variable as "Feature/Fashion index" or something. There is no definitive answer for this part of PCA. You need to go through your data and make sense of what the new variables mean by yourself.
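The two quantities used in this interpretation can be recomputed by hand. The sketch below is a minimal check, not part of the original analysis: it derives the cumulative proportion of variance from the standard deviations, and rebuilds the Comp.1 scores from the loadings. It assumes that princomp() with cor=TRUE standardizes each variable with the divisor n; the names `cum_var`, `Z` and `manual` are illustrative. The sign of a component may come out flipped relative to the table above, depending on the R version.

```r
Price      <- c(6,7,6,5,7,6,5,6,3,1,2,5,2,3,1,2)
Software   <- c(5,3,4,7,7,4,7,5,5,3,6,7,4,5,6,3)
Aesthetics <- c(3,2,4,1,5,2,2,4,6,7,6,7,5,6,5,7)
Brand      <- c(4,2,5,3,5,3,1,4,7,5,7,6,6,5,5,7)
data <- data.frame(Price, Software, Aesthetics, Brand)

pca <- princomp(data, cor = TRUE)

# Proportion of variance: squared standard deviations over their sum
prop_var <- pca$sdev^2 / sum(pca$sdev^2)
cum_var  <- cumsum(prop_var)

# Comp.1 scores by hand: standardize, then take the weighted sum
n <- nrow(data)
Z <- scale(data) * sqrt(n / (n - 1))   # rescale from divisor n - 1 to divisor n
manual <- as.vector(Z %*% pca$loadings[, 1])

print(round(cum_var, 4))
print(all.equal(manual, as.vector(pca$scores[, 1])))
```

Note that the loadings apply to the standardized variables, not to the raw 1 to 7 answers.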

PCA and logistic regression

Once you have done the analysis with PCA, you may want to look into whether the new variables can predict some phenomena well. This is kinda like machine learning: whether features can classify the data well. Let's say you have asked the participants one more thing, which OS they are using (Windows or Mac), in your survey, and the results are like this.

    Participant  Price  Software  Aesthetics  Brand  OS
    P1           6      5         3           4      0
    P2           7      3         2           2      0
    P3           6      4         4           5      0
    P4           5      7         1           3      0
    P5           7      7         5           5      1
    P6           6      4         2           3      0
    P7           5      7         2           1      0
    P8           6      5         4           4      0
    P9           3      5         6           7      1
    P10          1      3         7           5      1
    P11          2      6         6           7      0
    P12          5      7         7           6      1
    P13          2      4         5           6      1
    P14          3      5         6           5      1
    P15          1      6         5           5      1
    P16          2      3         7           7      1

Here what we are going to do is to see whether the new variables given by PCA can predict the OS people are using. OS is 0 or 1 in our case, which means the dependent variable is binomial. Thus, we are going to do logistic regression. I will skip the details of logistic regression here. If you are interested, the details of logistic regression are available in a separate page.

First, we prepare the data about OS.

    OS <- c(0,0,0,0,1,0,0,0,1,1,0,1,1,1,1,1)

Then, fit the first variable we found through PCA (i.e., Comp.1) to a logistic function.

    model <- glm(OS ~ pca$scores[,1], family=binomial)
    summary(model)

Now you get the logistic function model.

    Call:
    glm(formula = OS ~ pca$scores[, 1], family = binomial)

    Deviance Residuals:
         Min        1Q    Median        3Q       Max
    -2.19746  -0.44586   0.01932   0.60018   1.65268

    Coefficients:
                     Estimate Std. Error z value Pr(>|z|)
    (Intercept)      -0.08371    0.74216  -0.113   0.9102
    pca$scores[, 1]   1.42973    0.62129   2.301   0.0214 *
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    (Dispersion parameter for binomial family taken to be 1)

        Null deviance: 22.181  on 15  degrees of freedom
    Residual deviance: 12.033  on 14  degrees of freedom
    AIC: 16.033

    Number of Fisher Scoring iterations: 5

Let's see how well this model predicts the kind of OS. You can use the fitted() function to see the prediction.

    fitted(model)

The fitted values, shown here rounded to two decimals, are:

       1    2    3    4    5    6    7    8
    0.15 0.04 0.35 0.04 0.26 0.08 0.03 0.22
       9   10   11   12   13   14   15   16
    0.89 0.94 0.91 0.73 0.85 0.76 0.78 0.96

These values represent the probabilities of being 1. For example, we can expect a 15% chance that Participant 1 is using OS 1 based on the variable derived by PCA. Thus, in this case, Participant 1 is more likely to be using OS 0, which agrees with the survey response. In this way, PCA can be used with regression models for calculating the probability of a phenomenon or making a prediction.
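To turn the fitted probabilities into predicted classes, one option is a simple cut-off. The sketch below is an illustration under assumptions rather than the original author's procedure: the 0.5 threshold and the names `pc1`, `pred` and `accuracy` are illustrative choices. The fitted probabilities do not depend on the sign of Comp.1, which can differ between R versions, because the regression coefficient absorbs the sign.

```r
Price      <- c(6,7,6,5,7,6,5,6,3,1,2,5,2,3,1,2)
Software   <- c(5,3,4,7,7,4,7,5,5,3,6,7,4,5,6,3)
Aesthetics <- c(3,2,4,1,5,2,2,4,6,7,6,7,5,6,5,7)
Brand      <- c(4,2,5,3,5,3,1,4,7,5,7,6,6,5,5,7)
OS         <- c(0,0,0,0,1,0,0,0,1,1,0,1,1,1,1,1)

# First principal component of the four survey questions
pc1 <- princomp(data.frame(Price, Software, Aesthetics, Brand),
                cor = TRUE)$scores[, 1]

# Logistic regression of OS on the first component
model <- glm(OS ~ pc1, family = binomial)
prob  <- fitted(model)            # estimated probability that OS == 1

# Classify with an assumed 0.5 cut-off and compare with the survey answers
pred     <- as.integer(prob > 0.5)
accuracy <- mean(pred == OS)

print(round(prob, 2))
print(table(observed = OS, predicted = pred))
print(accuracy)
```

Because the model is evaluated on the same 16 participants it was fitted to, the accuracy is an optimistic estimate; held-out data would be needed for an honest one.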

Factor Analysis

Introduction

Factor Analysis is another powerful tool to understand what your data mean, particularly when you have many variables. What Factor Analysis does is to try to find hidden variables which explain the behavior of your observed variables. Our interests here also lie in reducing the number of variables. So we hope that we can find a smaller number of new variables which explain your data well. In this sense, it sounds very similar to PCA. Although the outcome is very similar in terms of reducing the number of variables, the approach to reduce the number of variables is different. I will explain this in the next section.

If you are a little more knowledgeable, you may have heard of terms like Exploratory Factor Analysis (EFA) and Confirmatory Factor Analysis (CFA). EFA means that you don't really know what hidden variables (or factors) exist and how many there are, so you are trying to find them. CFA means that you already have some guesses or models for your hidden variables (or factors) and you want to check whether your models are correct. In many cases, your Factor Analysis is EFA, and I explain it in this page.

We are going to use a similar example to the one in PCA. Let's say you have some data like this from your survey about what is important when they decide which computer to buy.

    Participant  Price  Software  Aesthetics  Brand  Friend  Family
    P1           6      5         3           4      7       6
    P2           7      3         2           2      2       3
    P3           6      4         4           5      5       4
    P4           5      7         1           3      6       7
    P5           7      7         5           5      2       1
    P6           6      4         2           3      4       5
    P7           5      7         2           1      1       4
    P8           6      5         4           4      7       5
    P9           3      5         6           7      3       4
    P10          1      3         7           5      2       4
    P11          2      6         6           7      6       5
    P12          5      7         7           6      7       7
    P13          2      4         5           6      6       2
    P14          3      5         6           5      2       3
    P15          1      6         5           5      4       5
    P16          2      3         7           7      5       6

Price: A new computer is cheap to you (1: strongly disagree - 7: strongly agree),
Software: The OS on a new computer allows you to use software you want to use (1: strongly disagree - 7: strongly agree),
Aesthetics: The appearance of a new computer is appealing to you (1: strongly disagree - 7: strongly agree),
Brand: The brand of the OS on a new computer is appealing to you (1: strongly disagree - 7: strongly agree),
Friend: Your friends' opinions are important to you (1: strongly disagree - 7: strongly agree),
Family: Your family's opinions are important to you (1: strongly disagree - 7: strongly agree).

For successfully doing Factor Analysis, we need more data than this example. If you want to find n factors, you want to have roughly 3n - 5n dimensions of data, and 5n - 10n samples. And Factor Analysis assumes the normality of the data, so it is not a great tool for ordinal data. However, in practice, we can use Factor Analysis on ordinal data if the scale is 5 or more and the data can be treated as interval data.

Through Factor Analysis, you want to find hidden variables (common factors) which may explain the responses you gained. Before looking at how to do Factor Analysis in R, I would like to briefly explain the difference between PCA and FA.

Difference between Factor Analysis and PCA

The intuition of Principal Component Analysis is to find new combinations of variables which form larger variances. Why are larger variances important? This is a similar concept to entropy in information theory. Let's say you have two variables. One of them (Var 1) forms N(1, 0.01) and the other (Var 2) forms N(1, 1). Which variable do you think has more information? Var 1 is always pretty much 1, whereas Var 2 can take a wider range of values, like 0 or 2. Thus, Var 2 has more chances to have various values than Var 1, which means Var 2's entropy is larger than Var 1's. Thus, we can say Var 2 contains more information than Var 1.

Although the example above just looks at one variable at one time, PCA tries to find linear combinations of the variables which contain much information by looking at the variance. This is why the standard deviation is one of the important metrics to determine the number of new variables in PCA. Another interesting aspect of the new variables derived by PCA is that all new variables are orthogonal. You can think that PCA is rotating and translating the data such that the first axis contains the most information, the second has the second most information, and so forth.

The intuition of Factor Analysis is to find hidden variables which affect your observed variables by looking at the correlation. If one variable is correlated with another variable, we can say that these two variables are generated from one hidden variable, so we can explain the phenomena with that one hidden variable instead of the two variables. Let's take a look at the correlation matrix of the data we have (see the code example below to create the data frame) before doing Factor Analysis.

    cor(data)

And you get the correlation matrix (shown here to three decimals).

                Price  Software  Aesthetics   Brand  Friend  Family
    Price       1.000     0.186      -0.632  -0.580   0.031  -0.062
    Software    0.186     1.000      -0.146  -0.119   0.101   0.177
    Aesthetics -0.632    -0.146       1.000   0.853   0.040  -0.070
    Brand      -0.580    -0.119       0.853   1.000   0.333   0.027
    Friend      0.031     0.101       0.040   0.333   1.000   0.607
    Family     -0.062     0.177      -0.070   0.027   0.607   1.000

So, it looks like Price has strong negative correlations with Aesthetics and Brand, and Friend has a strong correlation with Family. This means that we can expect that we will have two common factors: one will be related to Price, Aesthetics, and Brand, and the other will be related to Friend and Family. Let's move on to Factor Analysis and see what will happen.

R code example

In the following code example, I skipped some details such as using varimax rotation or promax rotation (R uses varimax rotation by default). If you want to know more details, I recommend you to read other books or references for now. I may add these details later, but I am not sure.

First, we prepare the data.

    Price <- c(6,7,6,5,7,6,5,6,3,1,2,5,2,3,1,2)
    Software <- c(5,3,4,7,7,4,7,5,5,3,6,7,4,5,6,3)
    Aesthetics <- c(3,2,4,1,5,2,2,4,6,7,6,7,5,6,5,7)
    Brand <- c(4,2,5,3,5,3,1,4,7,5,7,6,6,5,5,7)
    Friend <- c(7,2,5,6,2,4,1,7,3,2,6,7,6,2,4,5)
    Family <- c(6,3,4,7,1,5,4,5,4,4,5,7,2,3,5,6)
    data <- data.frame(Price, Software, Aesthetics, Brand, Friend, Family)
