Naive Bayes is a supervised algorithm, trained with the help of labeled training data. It is a fast classification algorithm compared to most others. Before getting into what exactly naive Bayes does, let us understand what conditional probability is.
P(X|Y) = P(Y|X) * P(X) / P(Y)
P(X) denotes the probability that the hypothesis X is true before seeing any evidence. This is also called the prior probability.
P(Y) denotes the probability of the evidence.
P(Y|X) denotes the probability of the evidence, given that the hypothesis is correct (the likelihood).
P(X|Y) denotes the probability of the hypothesis, given that the evidence has been observed (the posterior).
Bayes' theorem works on conditional probability: the probability of an event X occurring when we already know that event Y has occurred.
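To make this concrete, here is a minimal Python sketch that plugs made-up numbers into Bayes' theorem; the values (a 1% prior and the two test probabilities) are purely hypothetical.

# Hypothetical example: probability a patient has a disease (X) given a positive test (Y)
p_x = 0.01              # P(X): prior probability of the hypothesis (disease prevalence)
p_y_given_x = 0.90      # P(Y|X): probability of a positive test if the disease is present
p_y_given_not_x = 0.05  # probability of a positive test if the disease is absent

# P(Y): total probability of the evidence (a positive test)
p_y = p_y_given_x * p_x + p_y_given_not_x * (1 - p_x)

# Bayes' theorem: P(X|Y) = P(Y|X) * P(X) / P(Y)
p_x_given_y = p_y_given_x * p_x / p_y
print(round(p_x_given_y, 3))  # roughly 0.154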
A classifier is something that assigns objects to classes on the basis of specific features (the independent variables). The naive Bayes classifier is built on Bayes' theorem: for every data point, it computes a membership probability for each class, i.e., the probability that the data point belongs to that class.
The class with the maximum posterior probability is chosen as the predicted class. This rule is called Maximum A Posteriori, or MAP.
MAP(H) = max(P(H|E)) = max(P(E|H) * P(H) / P(E))
P(E) is the probability of the evidence and is only used to normalize the result. Because P(E) is the same for every class, removing it does not change which class has the maximum posterior.
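As a small illustration of the MAP rule, the sketch below picks the class with the largest P(E|H) * P(H); the priors and likelihoods are made-up numbers, not taken from any real dataset.

# Hypothetical priors and likelihoods for two classes given some evidence E
classes = {
    "going-out":    {"prior": 0.6, "likelihood": 0.3},   # P(H) and P(E|H)
    "staying-home": {"prior": 0.4, "likelihood": 0.7},
}

# MAP: choose the class with the largest P(E|H) * P(H); P(E) is dropped
# because it is the same for every class and does not change the argmax
scores = {c: v["likelihood"] * v["prior"] for c, v in classes.items()}
map_class = max(scores, key=scores.get)
print(map_class)  # staying-home (0.28 vs 0.18)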
Naive Bayes classifiers assume that all the independent variables in the dataset are unrelated to each other. The presence or absence of any one independent variable has no impact on the rest of the variables. Let's see a scenario.
A fruit that is round, red, and has a diameter of 4” can be considered an apple.
In the above scenario, these characteristics are actually related to each other, but the naive Bayes classifier will still treat them as independent.
The probabilities that are included in the naïve Bayes model are as follows:
Class Probabilities: The probability of each class present in the training data.
Conditional Probabilities: The conditional probability of each input value given each class value.
A naive Bayes model learns from the training data very quickly, because training only requires calculating the probability of each class and the conditional probability of each input value given each class. Unlike many other algorithms, no coefficients need to be fitted by optimization.
The class probability is calculated as the number of instances belonging to a specific class divided by the total number of instances.
Example: For binary classification this is calculated as:
P(class=1) = count(class=1) / (count(class=1) + count(class=2))
In the above scenario, the probability of each class will be 50% only if the two classes have the same number of instances in the training data.
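A minimal sketch of this count-based class probability, using a hypothetical list of training labels:

# Hypothetical training labels for a binary problem
labels = [1, 2, 1, 2, 1, 2, 1, 2]

count_1 = labels.count(1)
count_2 = labels.count(2)

# P(class=1) = count(class=1) / (count(class=1) + count(class=2))
p_class_1 = count_1 / (count_1 + count_2)
print(p_class_1)  # 0.5, because the two classes are balanced here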
The conditional probability is the count of instances with a given input value in a given class divided by the total number of instances of that class.
Example:
The weather column has values rain and sun, and there is another column called class that has values staying-home and going-out. Let’s calculate the conditional probability of the weather column with respect to the class column.
P(weather = rain | class = staying-home) = count(number of instances with weather = rain and class = staying-home) / count(number of instances with class = staying-home)
P(weather = sun | class = staying-home) = count(number of instances with weather = sun and class = staying-home) / count(number of instances with class = staying-home)
P(weather = rain | class = going-out) = count(number of instances with weather = rain and class = going-out) / count(number of instances with class = going-out)
P(weather = sun | class = going-out) = count(number of instances with weather = sun and class = going-out) / count(number of instances with class = going-out)
Taking the case above, suppose we have an instance where weather = sun; then the calculation is:
Going-out = P(weather = sun | class = going-out) * P(class = going-out)
Staying-home = P(weather = sun | class = staying-home) * P(class = staying-home)
After this, we can normalize these scores into probabilities:
P(going-out | weather = sun) = going-out / (going-out + staying-home)
P(staying-home | weather = sun) = staying-home / (going-out + staying-home)
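Putting the whole weather example together, the following is a minimal Python sketch that counts a small hypothetical dataset, computes the conditional probabilities above, and normalizes the two scores for weather = sun. The six data rows are made up purely for illustration.

# Hypothetical training data: (weather, class)
data = [
    ("rain", "staying-home"), ("rain", "staying-home"), ("sun", "staying-home"),
    ("sun", "going-out"), ("sun", "going-out"), ("rain", "going-out"),
]

def count(weather=None, cls=None):
    # count instances matching the given weather and/or class value
    return sum(1 for w, c in data
               if (weather is None or w == weather) and (cls is None or c == cls))

# Class probabilities
p_out = count(cls="going-out") / len(data)
p_home = count(cls="staying-home") / len(data)

# Conditional probabilities for weather = sun
p_sun_given_out = count("sun", "going-out") / count(cls="going-out")
p_sun_given_home = count("sun", "staying-home") / count(cls="staying-home")

# Un-normalized scores for an instance with weather = sun
going_out = p_sun_given_out * p_out
staying_home = p_sun_given_home * p_home

# Normalize the scores into probabilities
p_out_given_sun = going_out / (going_out + staying_home)
p_home_given_sun = staying_home / (going_out + staying_home)
print(round(p_out_given_sun, 2), round(p_home_given_sun, 2))  # 0.67 0.33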
Naive Bayes can be used to solve prediction problems that have multiple classes.
Naive Bayes performs best when the input data consists of categorical variables.
Naive Bayes only needs a small amount of training data to estimate the probabilities required for classification, so its training time is short.
Naive Bayes assumes that all variables are uncorrelated and mutually independent. In real-life scenarios, this rarely holds.
If the test data contains a categorical value that was not seen in the training data, the model will assign it a zero probability and will not be able to make a sensible prediction. This is known as the zero-frequency problem and is usually handled with Laplace smoothing (see the sketch after these points).
Naive Bayes is a linear classifier, unlike K-NN. It is also faster on big data than K-NN, which has to perform many distance calculations for every prediction.
Logistic regression generally handles collinearity better than naive Bayes, because naive Bayes expects all features to be independent.
It can be used for real-time prediction, as the algorithm is quite fast.
It is used in text classification, weather prediction as in the above example, medical diagnosis, etc.
As it handles problems with multiple classes well, it is also a good fit for sentiment analysis, e.g., identifying positive and negative sentiments.
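To make the zero-frequency point above concrete, here is a minimal sketch of Laplace (add-one) smoothing for one categorical feature; the counts and the extra "snow" category are hypothetical.

# Hypothetical counts of weather values within class = going-out
counts = {"sun": 2, "rain": 1, "snow": 0}   # "snow" never appears with this class
total = sum(counts.values())
k = len(counts)  # number of possible weather values

# Without smoothing, P(snow | going-out) = 0, which wipes out the whole product
unsmoothed = {v: c / total for v, c in counts.items()}

# Laplace (add-one) smoothing: add 1 to every count so no probability is exactly zero
smoothed = {v: (c + 1) / (total + k) for v, c in counts.items()}
print(unsmoothed["snow"], round(smoothed["snow"], 3))  # 0.0 0.167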
Code:
from sklearn import datasets

# load the wine dataset
wine = datasets.load_wine()

# print the names of the features
print("Features: ", wine.feature_names)
Output:
Features: ['alcohol', 'malic_acid', 'ash', 'alcalinity_of_ash', 'magnesium', 'total_phenols', 'flavanoids', 'nonflavanoid_phenols', 'proanthocyanins', 'color_intensity', 'hue', 'od280/od315_of_diluted_wines', 'proline']
Code:
# print the label names of the wine classes
print("Labels: ", wine.target_names)
Output:
Labels: ['class_0' 'class_1' 'class_2']
Code:
# split the data into training and test sets
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(wine.data, wine.target, test_size=0.2, random_state=107)

# import the naive Bayes model
from sklearn.naive_bayes import GaussianNB
gn = GaussianNB()

# train the model on the training data
gn.fit(x_train, y_train)

# predict the response for the test set
y_pred = gn.predict(x_test)

# evaluate the model accuracy
from sklearn import metrics
print("Accuracy: ", metrics.accuracy_score(y_test, y_pred))
Output:
Accuracy: 0.9074074074074074
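As an optional follow-up (not part of the original output), a confusion matrix gives a class-by-class view of the same predictions; the exact numbers will depend on the train/test split above.

from sklearn.metrics import confusion_matrix

# rows are true classes, columns are predicted classes
print(confusion_matrix(y_test, y_pred))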