K-Means Clustering Tutorial: Matlab Code

http://people.revoledu.com/kardi/tutorial/kMean/matlab_kMeans.htm

K-Means Clustering Code in Matlab


By Kardi Teknomo, PhD.


If you use Matlab, the Matlab Statistics Toolbox contains a function named kmeans. If you do not have the Statistics Toolbox, you may use my generic code below. The kMeansCluster and DistMatrix functions can be downloaded as text files. Alternatively, you may simply copy and paste the code below. For more information about what k-means clustering is, how the algorithm works, a numerical example of this code, its application to machine learning, and other resources on k-means clustering, you may visit the Contents of this tutorial.
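For readers who do have the toolbox, the built-in function can be tried on the same small data set used in the example further down. This is only a minimal sketch: it assumes the Statistics Toolbox (called Statistics and Machine Learning Toolbox in newer releases) is installed, and the variable names idx, C and y are chosen here for illustration.

```matlab
% Assumes the Statistics Toolbox is available; kmeans is not part of base Matlab.
m = [1 1; 2 1; 4 3; 5 4];   % objects in rows, attributes in columns
k = 2;                      % number of groups
[idx, C] = kmeans(m, k);    % idx: group index of each row, C: k-by-2 centroids
y = [m, idx];               % data plus a group column, as kMeansCluster returns
disp(y);
```

The group numbers returned by kmeans are arbitrary labels, so the first two objects may be labelled 1 or 2; what matters is that objects in the same group share a label.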

function y=kMeansCluster(m,k,isRand)
%%%%%%%%%%%%%%%%
%
% kMeansCluster - Simple k means clustering algorithm
% Author: Kardi Teknomo, Ph.D.
%
% Purpose: classify the objects in data matrix based on the attributes
% Criteria: minimize Euclidean distance between centroids and object points
% For more explanation of the algorithm, see http://people.revoledu.com/kardi/tutorial/kMean/index.html
% Output: matrix data plus an additional column that represents the group of each object
%
% Example: m = [ 1 1; 2 1; 4 3; 5 4] or in a nice form
%          m = [ 1 1;
%                2 1;
%                4 3;
%                5 4]
%          k = 2
% kMeansCluster(m,k) produces m = [ 1 1 1;
%                                   2 1 1;
%                                   4 3 2;
%                                   5 4 2]
% Input:
% m - required, matrix data: objects in rows and attributes in columns
% k - optional, number of groups (default = 1)

% isRand - optional, set isRand=1 for random initialization; otherwise use 0 (default)
%          and the first k data are assigned as initial centroids
%
% Local Variables
% f - row number of data that belong to group i
% c - centroid coordinate size (1:k, 1:maxCol)
% g - current iteration group matrix size (1:maxRow)
% i - scalar iterator
% maxCol - scalar number of columns in the data matrix m = number of attributes
% maxRow - scalar number of rows in the data matrix m = number of objects
% temp - previous iteration group matrix size (1:maxRow)
% z - minimum value (not needed)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

if nargin<3, isRand=0; end
if nargin<2, k=1; end

[maxRow, maxCol]=size(m);
if maxRow<=k,
    y=[m, (1:maxRow)'];   % no more objects than groups: each object is its own group
else
    % initial value of centroid
    if isRand,
        p = randperm(size(m,1));   % random initialization
        for i=1:k
            c(i,:)=m(p(i),:);
        end
    else
        for i=1:k
            c(i,:)=m(i,:);   % sequential initialization
        end
    end

    temp=zeros(maxRow,1);   % initialize as zero vector

    while 1,
        d=DistMatrix(m,c);   % calculate objects-centroid distances
        [z,g]=min(d,[],2);   % find group matrix g
        if isequal(g,temp),
            break;   % stop the iteration
        else
            temp=g;   % copy group matrix to temporary variable
        end
        for i=1:k
            f=find(g==i);
            if ~isempty(f)   % only compute centroid if f is not empty
                c(i,:)=mean(m(f,:),1);
            end
        end
    end

    y=[m,g];

end

The Matlab function kMeansCluster above calls the function DistMatrix, shown in the code below. It computes the Euclidean distance in any number of dimensions. You can learn about other types of distance in a separate tutorial.

function d=DistMatrix(A,B)
%%%%%%%%%%%%%%%%%%%%%%%%%
% DISTMATRIX return distance matrix between points in A=[x1 y1 ... w1] and in B=[x2 y2 ... w2]
% Copyright (c) 2005 by Kardi Teknomo, http://people.revoledu.com/kardi/
%
% Numbers of rows (represent points) in A and B are not necessarily the same.
% It can be used for distance-in-a-slice (Spacing) or distance-between-slice (Headway),
%
% A and B must contain the same number of columns (represent variables of n dimensions),
% first column is the X coordinates, second column is the Y coordinates, and so on.
% The distance matrix is distance between points in A as rows
% and points in B as columns.
% (dist below is shorthand for DistMatrix)
% example: Spacing= dist(A,A)
%          Headway = dist(A,B), with hA ~= hB or hA=hB
% A=[1 2 3; 4 5 6; 2 4 6; 1 2 3]; B=[4 5 1; 6 2 0]
% dist(A,B)= [ 4.69 5.83;
%              5.00 7.00;
%              5.48 7.48;
%              4.69 5.83]
%
% dist(B,A)= [ 4.69 5.00 5.48 4.69;
%              5.83 7.00 7.48 5.83]
%%%%%%%%%%%%%%%%%%%%%%%%%%%

[hA,wA]=size(A);
[hB,wB]=size(B);
if wA ~= wB, error(' second dimension of A and B must be the same'); end
for k=1:wA
    C{k}= repmat(A(:,k),1,hB);
    D{k}= repmat(B(:,k),1,hA);
end
S=zeros(hA,hB);
for k=1:wA
    S=S+(C{k}-D{k}').^2;
end
d=sqrt(S);
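
The two examples given in the header comments can be reproduced in a single script. The following is a minimal sketch rather than the author's code: pairDist is a hypothetical inline helper that stands in for DistMatrix, so the script runs without either function file on the path, and the loop uses sequential initialization only.

```matlab
% pairDist is a hypothetical inline stand-in for DistMatrix: it returns the
% hA-by-hB matrix of Euclidean distances between rows of A and rows of B.
pairDist = @(A, B) sqrt(sum(bsxfun(@minus, permute(A, [1 3 2]), ...
                                           permute(B, [3 1 2])).^2, 3));

% 1) Distance example from the DistMatrix header comment.
A = [1 2 3; 4 5 6; 2 4 6; 1 2 3];
B = [4 5 1; 6 2 0];
dAB = pairDist(A, B);            % 4-by-2: points of A as rows, B as columns
dBA = pairDist(B, A);            % 2-by-4: the transpose of dAB

% 2) Clustering example from the kMeansCluster header comment.
m = [1 1; 2 1; 4 3; 5 4];        % objects in rows, attributes in columns
k = 2;
c = m(1:k, :);                   % sequential initialization: first k objects
g = zeros(size(m, 1), 1);        % group assignment of the previous iteration
while true
    [~, gNew] = min(pairDist(m, c), [], 2);   % nearest centroid per object
    if isequal(gNew, g), break; end           % no object changed group: stop
    g = gNew;
    for i = 1:k
        f = find(g == i);
        if ~isempty(f)                        % skip empty groups
            c(i, :) = mean(m(f, :), 1);       % move centroid to the group mean
        end
    end
end
y = [m, g];

disp(round(dAB * 100) / 100);    % distances rounded to two decimals
disp(y);
```

Under these assumptions, the rounded dAB agrees with the 4-by-2 matrix in the DistMatrix header, and y is [1 1 1; 2 1 1; 4 3 2; 5 4 2], the result listed in the kMeansCluster header. The loop stops in the third pass, with centroids at (1.5, 1) and (4.5, 3.5).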

For a more interactive example, you may use the k-means program that I made using VB.

2006 Kardi Teknomo. All Rights Reserved. Designed by CNV Media
