
PROCESS OF ML CODE/ALGORITHM

Import  Instantiate Fit  Transform/Predict

 Import the required/necessary libraries/modules


 Instantiate the required variable
 Fit the input/independent variable with the output/dependent variables
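The four-step pattern above can be sketched with scikit-learn's `KNeighborsClassifier`; the toy training data here is made up purely for illustration:

```python
from sklearn.neighbors import KNeighborsClassifier  # 1. Import

# Toy data: independent variable (one feature) and dependent variable (labels)
X_train = [[1.0], [1.2], [3.0], [3.3]]
y_train = [0, 0, 1, 1]

model = KNeighborsClassifier(n_neighbors=3)  # 2. Instantiate
model.fit(X_train, y_train)                  # 3. Fit
pred = model.predict([[3.1]])                # 4. Predict
print(pred[0])  # → 1 (nearest neighbors 3.0, 3.3, 1.2 vote 1, 1, 0)
```

The same Import → Instantiate → Fit → Transform pattern applies to scikit-learn transformers, with `transform` in place of `predict`.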

KNN Algorithm

 Supervised
 Lazy learning algorithm – it builds no explicit model during training; it simply stores the training
data and defers all computation to prediction time
 The KNN classifier often serves as a benchmark/baseline for more complex classifiers such as
artificial neural networks (ANN) and support vector machines (SVM)
 Prediction is slow, since every query must be compared against the whole training set
 A good domain expert on the team is a must in order to select an appropriate set of
features/training set for KNN
 KNN can be used for classification as well as regression
 Whenever we have a new point to classify, we find the K nearest neighbors from the training
data set.
 The distance calculation in KNN can be done in three ways:
 Euclidean distance
 Minkowski distance
 Mahalanobis distance
 Output of KNN is the predicted class
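The first two distance metrics listed above are straightforward to compute by hand; a minimal sketch, assuming NumPy and two made-up points:

```python
import numpy as np

a = np.array([1.0, 2.0])
b = np.array([4.0, 6.0])

# Euclidean: sqrt((1-4)^2 + (2-6)^2) = sqrt(9 + 16) = 5
euclid = np.sqrt(np.sum((a - b) ** 2))

# Minkowski generalizes Euclidean: (sum |a-b|^p)^(1/p)
minkowski_p3 = np.sum(np.abs(a - b) ** 3) ** (1 / 3)

# Minkowski with p=1 is the Manhattan distance: |1-4| + |2-6| = 7
manhattan = np.sum(np.abs(a - b))

print(euclid, manhattan)  # → 5.0 7.0
```

Mahalanobis distance additionally needs the inverse covariance matrix of the data (e.g. via `scipy.spatial.distance.mahalanobis`), so it is omitted from this sketch.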

KNN TYPE I – INPUT TEST SAMPLE METHOD:

The entire dataset is fed to the model as the training dataset → a single test sample/feature vector
provided by the user is given to the model → the test sample is matched against every training data
point → a class is predicted for the test sample based on the comparison and the (nearest) distances
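The matching step described above can be sketched in plain Python with a brute-force nearest-neighbor vote; the training points and labels here are invented for illustration:

```python
import math
from collections import Counter

# Entire (toy) dataset acts as training data; one user-supplied test sample.
train = [([1.0, 1.0], 'A'), ([1.5, 2.0], 'A'),
         ([5.0, 5.0], 'B'), ([6.0, 5.5], 'B')]
test = [5.5, 5.0]
k = 3

# Match the test sample against every training point (Euclidean distance).
dists = sorted((math.dist(x, test), label) for x, label in train)

# Majority vote among the k nearest neighbors decides the class.
votes = Counter(label for _, label in dists[:k])
predicted = votes.most_common(1)[0][0]
print(predicted)  # → 'B' (two of the three nearest neighbors are 'B')
```

This is exactly what `algorithm='brute'` does inside `KNeighborsClassifier`, minus the tree-based speedups.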

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
                     metric_params=None, n_jobs=1, n_neighbors=3, p=2,
                     weights='uniform')
 algorithm : {‘auto’, ‘ball_tree’, ‘kd_tree’, ‘brute’}, optional

Algorithm used to compute the nearest neighbors:

 ‘ball_tree’ will use BallTree
 ‘kd_tree’ will use KDTree
 ‘brute’ will use a brute-force search.
 ‘auto’ will attempt to decide the most appropriate algorithm based on the
values passed to the fit method.
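Whichever algorithm is chosen, the predictions are identical; the options differ only in speed and memory. A quick check on made-up data:

```python
from sklearn.neighbors import KNeighborsClassifier

X = [[0.0], [1.0], [2.0], [10.0]]
y = [0, 0, 1, 1]

# All four options compute the same neighbors, hence the same prediction.
preds = []
for algo in ('brute', 'kd_tree', 'ball_tree', 'auto'):
    m = KNeighborsClassifier(n_neighbors=3, algorithm=algo).fit(X, y)
    preds.append(int(m.predict([[1.5]])[0]))

print(preds)  # → [0, 0, 0, 0]
```

The tree-based options pay an up-front construction cost at fit time to make queries faster on larger datasets.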

 leaf_size : int, optional (default = 30)

Leaf size passed to BallTree or KDTree. This can affect the speed of the
construction and query, as well as the memory required to store the tree. The optimal
value depends on the nature of the problem.

 metric : string or callable, default ‘minkowski’

The distance metric to use for the tree. The default metric is minkowski, and with p=2 is
equivalent to the standard Euclidean metric.

 metric_params : dict, optional (default = None)

Additional keyword arguments for the metric function.

 n_jobs : int, optional (default = 1)

The number of parallel jobs to run for the neighbors search. If -1, then the number of jobs
is set to the number of CPU cores. Does not affect the fit method.

 n_neighbors : int, optional (default = 5)

Number of neighbors to use by default for kneighbors queries.

An odd value is preferred to avoid ties.

 p : integer, optional (default = 2)

Power parameter for the Minkowski metric. When p = 1, this is equivalent to using
manhattan_distance (l1), and euclidean_distance (l2) for p = 2. For arbitrary p,
minkowski_distance (l_p) is used.

 weights : str or callable, optional (default = ‘uniform’)


Weight function used in prediction. Possible values:

 ‘uniform’ : uniform weights. All points in each neighborhood are weighted equally.
 ‘distance’ : weight points by the inverse of their distance. In this case, closer neighbors
of a query point will have a greater influence than neighbors which are farther away.
 [callable] : a user-defined function which accepts an array of distances, and returns an
array of the same shape containing the weights.
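The difference between ‘uniform’ and ‘distance’ weights shows up when one very close neighbor disagrees with the majority; the data below is contrived to make that happen:

```python
from sklearn.neighbors import KNeighborsClassifier

# One class-1 point sits right next to the query; two class-0 points
# are farther away but outnumber it.
X = [[0.0], [0.1], [3.0]]
y = [1, 0, 0]
query = [[0.01]]

uniform = KNeighborsClassifier(n_neighbors=3, weights='uniform').fit(X, y)
distance = KNeighborsClassifier(n_neighbors=3, weights='distance').fit(X, y)

u = int(uniform.predict(query)[0])   # majority vote: 0 beats 1
d = int(distance.predict(query)[0])  # inverse-distance weight 1/0.01 dominates
print(u, d)  # → 0 1
```

With ‘uniform’ weights the two far class-0 neighbors outvote the single near class-1 neighbor; with ‘distance’ weights the near neighbor's 1/0.01 weight dominates and flips the prediction.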
