
Knn classifier implementation in R with caret package
In this article, we are going to build a Knn classifier using the R programming language. We will use the R machine learning caret package to build our Knn classifier. In our previous article, we discussed the core concepts behind the K-nearest neighbor algorithm. If you don’t have a basic understanding of the Knn algorithm, we suggest reading our introduction to k-nearest neighbor article first.
For the Knn classifier implementation in R using the caret package, we are going to examine a wine dataset. Our goal is to predict the origin of the wine. In our Knn implementation in R programming post, we built a Knn classifier in R from scratch, but that approach is not feasible when working with big datasets.
To work on big datasets, we can directly use machine learning packages. The R developer community has built some great packages to make our work easier. The beauty of these packages is that they are well optimized and handle most exceptions for us, so we just need to call the functions that implement the algorithms with the right parameters. For machine learning, caret is a nice package with proper documentation.
The principle behind the KNN (K-Nearest Neighbor) classifier is to find a predefined number K of training samples that are closest in distance to a new point, and to predict a label for the new point using these samples.
Euclidean Distance
The most commonly used distance measure is Euclidean distance. The Euclidean distance is also known simply as distance. Using the Euclidean distance measure is highly recommended when the data is dense or continuous. For such data, Euclidean distance is usually a good default proximity measure.
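To make the distance measure concrete, here is a minimal sketch in base R. The function name euclidean_distance and the two toy points are our own illustration and are not taken from the wine data.

```r
# Euclidean distance between two numeric vectors of equal length.
euclidean_distance <- function(a, b) {
  stopifnot(is.numeric(a), is.numeric(b), length(a) == length(b))
  sqrt(sum((a - b)^2))
}

# Two toy points (illustrative values only).
p <- c(1, 2)
q <- c(4, 6)
euclidean_distance(p, q)  # 5, since sqrt(3^2 + 4^2) = 5
```

The same value can be obtained with the built-in dist() function, which uses the Euclidean method by default.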
The KNN classifier is also considered an instance-based learning / non-generalizing algorithm. It stores the training records in a multidimensional space. For each new sample and a particular value of K, it recalculates the Euclidean distances and predicts the target class. So, it does not create a generalized internal model.
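The instance-based behaviour described above can be sketched in a few lines of base R. This is only an illustration under our own assumptions: the helper name knn_predict_one is hypothetical, the label is chosen by a simple majority vote among the K nearest samples, and the training points are made-up toy values.

```r
# Predict the class of one new point with a from-scratch KNN (illustrative only).
# train_x: numeric matrix, one row per stored training sample
# train_y: vector of class labels, one per row of train_x
# new_x:   numeric vector with one value per column of train_x
knn_predict_one <- function(train_x, train_y, new_x, k = 3) {
  stopifnot(nrow(train_x) == length(train_y), ncol(train_x) == length(new_x))
  # Euclidean distance from new_x to every stored training row
  diffs <- sweep(train_x, 2, new_x)   # subtract new_x from each row
  dists <- sqrt(rowSums(diffs^2))
  # Labels of the k closest training samples
  nearest <- train_y[order(dists)[seq_len(k)]]
  # Majority vote; on a tie, the label that table() lists first wins
  votes <- table(nearest)
  names(votes)[which.max(votes)]
}

# Toy training data: two well-separated groups labelled "1" and "2".
train_x <- rbind(c(1, 1), c(1, 2), c(2, 1), c(8, 8), c(8, 9), c(9, 8))
train_y <- c("1", "1", "1", "2", "2", "2")

knn_predict_one(train_x, train_y, c(2, 2), k = 3)  # "1"
knn_predict_one(train_x, train_y, c(9, 9), k = 3)  # "2"
```

Note that nothing is fitted ahead of time: every prediction recomputes the distances to all stored samples, which is why this approach becomes slow on big datasets.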
Caret Package Installation
The R machine learning caret package (Classification And REgression Training) holds tons of functions that help to build predictive models. It holds tools for data splitting, pre-processing, feature selection, model tuning, and supervised and unsupervised learning algorithms, etc. It is similar to the sklearn library in Python.
To use it, we first need to install it. Open the R console and install it by typing:
    install.packages("caret")
The caret package provides us direct access to various functions for training our model with various machine learning algorithms like Knn, SVM, decision tree, linear regression, etc.
Knn implementation with caret package
Wine Recognition Data Set Description
For this experiment, wines were grown in the same region in Italy but derived from 3 different cultivars. The analysis determined the quantities of 13 constituents found in each of the three types of wines. We have a dataset with 13 attributes having continuous values and one attribute with class labels of wine origin.
Using the wine dataset, our task is to build a model to recognize the origin of the wine. The original owners of this dataset are Forina, M. et al., PARVUS, Institute of Pharmaceutical and Food Analysis and Technologies, Via Brigata Salerno, 16147 Genoa, Italy. This wine dataset is hosted as open data on the UCI machine learning repository.
The 13 attributes of the dataset are:
1. Alcohol
2. Malic acid
3. Ash
4. Alkalinity of ash
5. Magnesium
6. Total phenols
7. Flavanoids
8. Nonflavanoid phenols
9. Proanthocyanins
10. Color intensity
11. Hue
12. OD280/OD315 of diluted wines
13. Proline
The attribute with the class label is at index 1. It consists of 3 values: 1, 2 and 3. These class labels are going to be predicted by our KNN model.
Wine Recognition Problem Statement:
To model a classifier for classifying the origin of the wine. The classifier should predict whether the wine is from origin “1”, “2” or “3”.
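The dataset layout described above (class label in the first column, followed by the 13 attributes) can be written down as column names and a factor label. This is a sketch under our own assumptions: the snake_case names are our choice because the raw file has no header row, and toy_df is a placeholder data frame standing in for the real data.

```r
# Column names for the 14 columns: class label first, then the 13 attributes.
# These names are our own choice; the raw file does not provide them.
wine_columns <- c(
  "class", "alcohol", "malic_acid", "ash", "alkalinity_of_ash", "magnesium",
  "total_phenols", "flavanoids", "nonflavanoid_phenols", "proanthocyanins",
  "color_intensity", "hue", "od280_od315", "proline"
)

# Placeholder for the real data: 3 rows of zeros with 14 unnamed columns.
toy_df <- as.data.frame(matrix(0, nrow = 3, ncol = 14))
toy_df[[1]] <- c(1L, 2L, 3L)   # the class label lives in column 1

# Attach the names and turn the class label into a factor with 3 levels.
names(toy_df) <- wine_columns
toy_df$class <- factor(toy_df$class, levels = c(1, 2, 3))
levels(toy_df$class)           # "1" "2" "3"
```

Storing the label as a factor tells R that we are dealing with a classification problem rather than a regression on the numbers 1 to 3.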
Knn classifier implementation in R with Caret Package
R caret Library:
For implementing Knn in R, we only need to import the caret package. As we mentioned above, it provides functions for the various tasks involved in our machine learning work.
    library(caret)
Data Import:
We are using the wine dataset from the UCI repository. For importing the data and manipulating it, we are going to use data frames. First of all, we need to download the dataset.
    dataurl <- "https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data"
    download.file(url = dataurl, destfile = "wine.data")
    wine_df <- read.csv("wine.data", header = FALSE)
In the dataurl variable, we store the URL of our wine data. Using the download.file() function, we can download the data file from that URL. For downloading, the URL of the data and the destination file name should be passed as the parameters of download.file(). Here, our destfile parameter is set to “wine.data”.
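When a script is run repeatedly, it can be convenient to skip the download if the file is already on disk. The helper below is a hypothetical sketch (download_if_missing is our own name, not part of base R or caret). The demonstration uses a temporary file that already exists and an unused placeholder URL, so no network access takes place.

```r
# Hypothetical helper: download a file only if it is not already on disk.
download_if_missing <- function(url, destfile) {
  if (!file.exists(destfile)) {
    download.file(url = url, destfile = destfile)
  }
  invisible(destfile)
}

# Demonstration without network access: the file exists, so nothing is fetched.
existing <- tempfile(fileext = ".data")
writeLines("placeholder", existing)
download_if_missing("https://example.invalid/unused", existing)
readLines(existing)  # still "placeholder"
```

With the variables used in this article, the call would be download_if_missing(dataurl, "wine.data").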
For importing data into an R data frame, we can use the read.csv() function, passing the file name and whether the first row of our dataset is a header row. If a header row exists, header should be set to TRUE; otherwise it should be set to FALSE.
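The effect of the header parameter is easy to see on a small in-memory example. The CSV text below contains made-up values, not rows from the wine data, and is read through the text argument of read.csv() so that no file is needed.

```r
# Toy CSV text with no header row (made-up values, not from the wine data).
csv_text <- "1,14.5,2.0\n2,12.5,1.5\n3,13.0,3.0"

# header = FALSE: every line is data and R names the columns V1, V2, ...
no_header <- read.csv(text = csv_text, header = FALSE)
names(no_header)   # "V1" "V2" "V3"
nrow(no_header)    # 3

# header = TRUE on the same text wrongly consumes the first data row as names.
with_header <- read.csv(text = csv_text, header = TRUE)
nrow(with_header)  # 2
```

Since wine.data has no header row, header = FALSE is the right choice, which is also why the columns of wine_df are named V1 to V14.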
To check the structure of the data frame, we can call the str() function on wine_df:
    > str(wine_df)
    'data.frame': 178 obs. of 14 variables:
     $ V1 : int 1 1 1 1 1 1 1 1 1 1 ...
     $ V2 : num 14.2 13.2 13.2 14.4 13.2 ...
     $ V3 : num 1.71 1.78 2.36 1.95 2.59 1.76 1.87 2.15 1.64 1.35 ...
     $ V4 : num 2.43 2.14 2.67 2.5 2.87 2.45 2.45 2.61 2.17 2.27 ...
     $ V5 : num 15.6 11.2 18.6 16.8 21 15.2 14.6 17.6 14 16 ...
     $ V6 : int 127 100 101 113 118 112 96 121 97 98 ...
     $ V7 : num 2.8 2.65 2.8 3.85 2.8 3.27 2.5 2.6 2.8 2.98 ...
     $ V8 : num 3.06 2.76 3.24 3.49 2.69 3.39 2.52 2.51 2.98 3.15 ...
     $ V9 : num 0.28 0.26 0.3 0.24 0.39 0.34 0.3 0.31 0.29 0.22 ...
     $ V10: num 2.29 1.28 2.81 2.18 1.82 1.97 1.98 1.25 1.98 1.85 ...
     $ V11: num 5.64 4.38 5.68 7.8 4.32 6.75 5.25 5.05 5.2 7.22 ...
     $ V12: num 1.04 1.05