Importing necessary libraries

Uploading text file

Setting the column names to ID, Reviews and Sentiment, and arranging the data read from the text file uploaded above

Removing all the special characters
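A minimal sketch of this cleaning step, assuming each review is a plain string. The helper name `clean_review` is hypothetical, and lowercasing the text is an added assumption rather than something the notebook states.

```python
import re

def clean_review(text: str) -> str:
    """Lowercase a review and replace every character that is not a
    lowercase letter, digit, or whitespace with a space."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    # Collapse the runs of whitespace left behind by the substitution.
    return re.sub(r"\s+", " ", text).strip()

print(clean_review("Great phone!!! Battery life = 10/10..."))
# great phone battery life 10 10
```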

Plotting a graph with seaborn showing the counts of Positive and Negative reviews

Splitting the data into Train, Test and Development sets. We have 448 in the Train set, 150 in the Test set and 150 in the Development set
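One way such a split could be produced is sketched below. The function name `split_dataset`, the fixed seed, and the synthetic rows are assumptions for illustration; the total of 748 rows is simply the sum of the three set sizes given above.

```python
import random

def split_dataset(rows, n_test=150, n_dev=150, seed=0):
    """Shuffle the rows, then carve off the test and development sets;
    whatever remains becomes the training set."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    test = rows[:n_test]
    dev = rows[n_test:n_test + n_dev]
    train = rows[n_test + n_dev:]
    return train, dev, test

# Placeholder rows in (ID, Reviews, Sentiment) form, not the real data.
rows = [(i, f"review {i}", i % 2) for i in range(748)]
train, dev, test = split_dataset(rows)
print(len(train), len(dev), len(test))  # 448 150 150
```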


Plotting the Negative and Positive sentiment counts for the Train, Test and Development sets

Now we need to create a vocabulary list

Function to omit words having a count less than 5
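A rough sketch of the vocabulary and the filter, assuming "count" means the number of reviews a word appears in. The name `build_vocabulary` and the `min_count` parameter are hypothetical; the later steps speak of dropping words with 5 or fewer reviews, so the threshold may need adjusting to match.

```python
from collections import Counter

def build_vocabulary(reviews, min_count=5):
    """Count in how many reviews each word appears and keep only the
    words that appear in at least `min_count` reviews."""
    doc_freq = Counter()
    for review in reviews:
        # set() so a word repeated inside one review is counted once.
        doc_freq.update(set(review.split()))
    return {word: n for word, n in doc_freq.items() if n >= min_count}

reviews = ["good phone"] * 5 + ["bad phone"] * 3
vocab = build_vocabulary(reviews)
print(sorted(vocab.items()))  # [('good', 5), ('phone', 8)]
```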

Now, counting and displaying negative and positive words:

a) Counting Negative words (Sentiment = 0) and removing words that appear in 5 or fewer reviews

b) Counting Positive words and omitting words that appear in 5 or fewer reviews

Calculating probabilities of positive as well as negative words before manipulations

Printing all words that appear in more than 5 reviews

Probability of occurrence of 'the' in the full word list, the positive list and the negative list
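As an illustration, the probability of a word can be taken as the fraction of reviews that contain it. The sketch below uses a made-up toy dataset and a hypothetical helper `word_probability`; it assumes Sentiment = 1 marks positive reviews, mirroring Sentiment = 0 for negative ones.

```python
def word_probability(word, reviews):
    """Fraction of reviews that contain `word` at least once."""
    if not reviews:
        return 0.0
    return sum(word in review.split() for review in reviews) / len(reviews)

# Toy data for illustration only (text, sentiment).
data = [
    ("the food was good", 1),
    ("the service was bad", 0),
    ("loved it", 1),
    ("not the best", 0),
]
all_reviews = [text for text, _ in data]
positive = [text for text, s in data if s == 1]
negative = [text for text, s in data if s == 0]

print(word_probability("the", all_reviews))  # 0.75
print(word_probability("the", positive))     # 0.5
print(word_probability("the", negative))     # 1.0
```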

Removing stop words like 'these', 'those', 'is', 'for' etc., which have nothing to do with the reviews
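A small sketch of stop-word filtering. The `STOP_WORDS` set here is a short hand-picked assumption that includes the four words named above; the notebook's actual list is not shown and may differ.

```python
# Hypothetical stop-word list; extend or replace it as needed.
STOP_WORDS = {"these", "those", "is", "for", "the", "a", "an", "and", "of", "to"}

def remove_stop_words(text, stop_words=STOP_WORDS):
    """Drop every whitespace-separated token that is a stop word."""
    return " ".join(word for word in text.split() if word not in stop_words)

print(remove_stop_words("these phones are great for the price"))
# phones are great price
```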

Probability of every word in the word list

Conditional Probability of all the Positive words
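The conditional probability P(word | Positive) can be sketched as the share of positive reviews that contain the word. The function name `conditional_probabilities` and the toy data are assumptions for illustration.

```python
from collections import Counter

def conditional_probabilities(reviews, labels, target_label):
    """Return {word: P(word | target_label)}, estimated as the fraction
    of reviews with that label in which the word appears."""
    class_reviews = [r for r, y in zip(reviews, labels) if y == target_label]
    doc_freq = Counter()
    for review in class_reviews:
        doc_freq.update(set(review.split()))
    n = len(class_reviews)
    return {word: count / n for word, count in doc_freq.items()}

reviews = ["good food", "good service", "bad food"]
labels = [1, 1, 0]  # 1 = positive, 0 = negative
p_pos = conditional_probabilities(reviews, labels, target_label=1)
print(sorted(p_pos.items()))
# [('food', 0.5), ('good', 1.0), ('service', 0.5)]
```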

As we have all the information available, we will now perform prediction to check the accuracy
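A hedged sketch of how prediction and accuracy could be computed from the class priors and conditional probabilities. This is a plain unsmoothed Naive Bayes product and not necessarily the notebook's exact method; `predict`, `accuracy`, and all the numbers below are illustrative assumptions.

```python
def predict(review, priors, cond_probs, vocabulary):
    """Pick the label with the largest prior * product of P(word | label),
    using only the review's words that are in the vocabulary."""
    words = set(review.split()) & vocabulary
    scores = {}
    for label, prior in priors.items():
        score = prior
        for word in words:
            # Without smoothing, a word unseen in a class zeroes the score.
            score *= cond_probs[label].get(word, 0.0)
        scores[label] = score
    return max(scores, key=scores.get)

def accuracy(reviews, labels, priors, cond_probs, vocabulary):
    """Fraction of reviews whose predicted label matches the true label."""
    correct = sum(
        predict(r, priors, cond_probs, vocabulary) == y
        for r, y in zip(reviews, labels)
    )
    return correct / len(labels)

# Made-up probabilities for illustration only.
priors = {1: 0.5, 0: 0.5}
cond_probs = {1: {"good": 0.8, "bad": 0.1}, 0: {"good": 0.2, "bad": 0.7}}
vocabulary = {"good", "bad"}

print(predict("good movie", priors, cond_probs, vocabulary))  # 1
print(predict("bad plot", priors, cond_probs, vocabulary))    # 0
```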

Implementing five-fold cross-validation
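Five-fold cross-validation can be sketched as follows: the samples are shuffled, dealt into five folds, and each fold serves once as the validation set while the other four are used for training. The names `k_fold_indices` and `cross_validate`, and the `evaluate` callback, are hypothetical.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # near-equal fold sizes
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]

def cross_validate(n_samples, evaluate, k=5, seed=0):
    """Average the score returned by evaluate(train_idx, val_idx) over k folds."""
    scores = [evaluate(tr, va) for tr, va in k_fold_indices(n_samples, k, seed)]
    return sum(scores) / len(scores)

for train_idx, val_idx in k_fold_indices(10, k=5):
    print(len(train_idx), len(val_idx))  # 8 2 on every fold
```

In practice `evaluate` would train the classifier on the rows in `train_idx` and return its accuracy on the rows in `val_idx`.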

SMOOTHING
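The notebook does not say which smoothing scheme is used; the sketch below assumes add-one (Laplace) smoothing, a common choice that keeps a word unseen in one class from forcing that class's probability to zero. The function name and parameters are hypothetical.

```python
def smoothed_probability(word_count, class_total, alpha=1.0, n_outcomes=2):
    """Laplace-smoothed estimate of P(word | class).

    word_count  : number of reviews in the class that contain the word
    class_total : number of reviews in the class
    alpha       : pseudo-count added to each outcome (1.0 = add-one)
    n_outcomes  : possible outcomes per word (present / absent = 2)
    """
    return (word_count + alpha) / (class_total + alpha * n_outcomes)

# A word never seen in a class of 200 reviews still gets a small
# non-zero probability instead of 0.
print(smoothed_probability(0, 200))   # 1/202
print(smoothed_probability(50, 200))  # 51/202
```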

Now, computing the accuracy after smoothing

Finally, we have an accuracy of 71.11%, which can be further improved by removing stop words.

Top words:

Contribution

Challenges

References