
Hierarchical Cluster Analysis | R Tutorial

http://www.r-tutor.com/gpu-computing/clustering/hierarchical-cluster-an...


With the distance matrix found in the previous tutorial, we can use various techniques of cluster analysis for relationship discovery. For example, in the data set mtcars, we can compute the distance matrix, pass it to hclust, and plot a dendrogram that displays a hierarchical relationship among the vehicles.

> d <- dist(as.matrix(mtcars))   # find distance matrix
> hc <- hclust(d)                # apply hierarchical clustering
> plot(hc)                       # plot the dendrogram


Careful inspection of the dendrogram shows that the 1974 Pontiac Firebird and Camaro Z28 are classified as close relatives, as expected.


Similarly, the dendrogram shows that the 1974 Honda Civic and Toyota Corolla are close to each other.
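One way to check such observations numerically rather than by eye is the cophenetic distance, which is the dendrogram height at which two items are first joined into the same cluster. The following is a minimal sketch using only base R; it assumes the row names "Honda Civic", "Toyota Corolla", "Pontiac Firebird" and "Camaro Z28" as they appear in mtcars.

```r
# Sketch: read merge heights off the dendrogram instead of inspecting the plot.
d  <- dist(as.matrix(mtcars))   # Euclidean distance matrix
hc <- hclust(d)                 # complete linkage (the default)

# cophenetic() gives, for every pair, the height at which the pair is first merged
coph <- as.matrix(cophenetic(hc))

coph["Honda Civic", "Toyota Corolla"]    # merge height for the two small cars
coph["Pontiac Firebird", "Camaro Z28"]   # merge height for the two muscle cars
max(coph)                                # height of the final merge, for scale
```

A pair whose merge height is small relative to the height of the final merge sits close together in the tree.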


In general, there are many choices of cluster analysis methodology. The hclust function in R uses the complete linkage method for hierarchical clustering by default. This particular clustering method defines the cluster distance between two clusters to be the maximum distance between their individual components. At every stage of the clustering process, the two nearest clusters are merged into a new cluster. The process is repeated until the whole data set is agglomerated into one single cluster. For the data set of 4,500 vectors in 120 dimensions generated below, it takes hclust about 2 minutes to finish the job on an AMD Phenom II X4 CPU.

> test.data <- function(dim, num, seed=17) {
+     set.seed(seed)
+     matrix(rnorm(dim * num), nrow=num)
+ }
> m <- test.data(120, 4500)
>
> library(rpud)                  # load rpud with rpudplus
> d <- rpuDist(m)                # Euclidean distance
>
> system.time(hclust(d))         # complete linkage
   user  system elapsed
115.765   0.087 115.914
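The complete linkage procedure described above can be spelled out in a few lines of plain R. This is a deliberately naive sketch for illustration only; the function name naive_complete is hypothetical, and it is not how hclust or rpuHclust are implemented. It repeatedly merges the two clusters whose maximum pairwise distance is smallest and records the merge heights, which can then be compared with those reported by hclust.

```r
# Naive complete-linkage agglomeration (illustrative, roughly cubic time).
# naive_complete is a hypothetical name, not part of any package.
naive_complete <- function(d) {
  dm <- as.matrix(d)
  clusters <- as.list(seq_len(nrow(dm)))   # every point starts as its own cluster
  heights <- numeric(0)
  while (length(clusters) > 1) {
    best <- Inf; bi <- NA; bj <- NA
    for (i in seq_len(length(clusters) - 1)) {
      for (j in (i + 1):length(clusters)) {
        # complete linkage: cluster distance is the maximum pairwise distance
        dij <- max(dm[clusters[[i]], clusters[[j]]])
        if (dij < best) { best <- dij; bi <- i; bj <- j }
      }
    }
    # merge the two nearest clusters and record the height of the merge
    clusters[[bi]] <- c(clusters[[bi]], clusters[[bj]])
    clusters[[bj]] <- NULL
    heights <- c(heights, best)
  }
  heights
}

d <- dist(as.matrix(mtcars))
# compare the merge heights with those from hclust's complete linkage
all.equal(naive_complete(d), hclust(d, method = "complete")$height)
```

The nested loops rescan every pair of clusters at each stage, which is why a straightforward approach becomes slow as the number of elements grows.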


Through code optimization, the rpuHclust function in rpud, equipped with the rpudplus add-on, performs much better. As an added bonus, rpuHclust produces cluster analysis output identical to that of the original hclust function in R. Note that the algorithm is mostly CPU based, because the memory access turns out to be too excessive for GPU computing.

> system.time(rpuHclust(d))      # rpuHclust with rpudplus
   user  system elapsed
  0.792   0.104   0.896
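The claim of identical output can be checked by comparing the components of two clustering results. The sketch below uses a hypothetical helper, same_tree, and runs on base R alone by comparing hclust against itself as a stand-in. It assumes that rpuHclust returns an object with the same merge, height and order components as hclust, which the text implies but which is not verified here.

```r
# Hypothetical helper: TRUE when two hclust-style results describe the same tree.
same_tree <- function(a, b) {
  isTRUE(all.equal(a$merge,  b$merge))  &&
    isTRUE(all.equal(a$height, b$height)) &&
    isTRUE(all.equal(a$order,  b$order))
}

d <- dist(as.matrix(mtcars))
reference <- hclust(d)
candidate <- hclust(d)   # stand-in; with rpud and rpudplus installed, try rpuHclust(d)

same_tree(reference, candidate)
```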

Here is a chart that compares the performance of hclust and rpuHclust with rpudplus in R:

Exercises
1. Run the performance test with more vectors in higher dimensions.
2. Compute hierarchical clustering with other linkage methods, such as single, median, average, centroid, Ward's and McQuitty's.
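As a starting point for the second exercise, the linkage method is selected through the method argument of hclust. The sketch below assumes a recent version of R, where Ward's method is spelled "ward.D" (older versions used "ward") and McQuitty's is "mcquitty". Note that the hclust documentation states that the median and centroid methods are meant to be used with squared Euclidean distances, so the plain Euclidean input here is for illustration only.

```r
# Sketch: run hclust on mtcars under several linkage methods.
d <- dist(as.matrix(mtcars))
methods <- c("single", "median", "average", "centroid", "ward.D", "mcquitty")

# one dendrogram per method, stored in a named list
trees <- lapply(methods, function(m) hclust(d, method = m))
names(trees) <- methods

# largest merge height under each method, as a quick summary
sapply(trees, function(hc) max(hc$height))

# plot(trees[["average"]])   # uncomment to draw one of the dendrograms
```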


Copyright 2009 - 2014 Chi Yau. All Rights Reserved.


1/21/2014 1:40 PM
