Machine Learning Final Report¶

1. Defining The Problem:¶

The problem this report explores is sentiment analysis of movie reviews: classifying each review as either positive or negative. The idea is to explore and evaluate the dataset before going through the rigorous process of building and testing different models.

The dataset used is the IMDB Movie Review dataset provided by the Keras library. It consists of 50,000 reviews, each paired with a sentiment annotation, which makes it a good fit for this project.

The type of problem explored here is binary classification of movie reviews, with the goal of building an efficient model that can successfully predict whether a review is positive or negative.

The aims and objectives of this exploration are:

  • To split the dataset and hold out a validation set from the training data.
  • To vectorize the dataset and prepare it for the models.
  • To create a baseline model, followed by a model that beats the baseline.
  • To create an overfitting model, then regularize and fine-tune it.
  • Finally, to create a final model with optimal settings, train it from scratch, and test it.

2. Measure of Success:¶

Success will be measured using the following metrics:

  • Accuracy
  • Precision

Tracking both gives more information about each model and allows them to be compared appropriately, while keeping the metrics simple and well suited to binary classification.
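
As a small illustration of what these two metrics capture, here is a toy sketch with made-up labels (illustrative only, not part of the actual evaluation):

In [ ]:
# Illustrative only: accuracy and precision on a toy set of predictions
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 1])      # hypothetical ground-truth labels
y_pred = np.array([1, 0, 1, 0, 1, 1])      # hypothetical model predictions

tp = np.sum((y_pred == 1) & (y_true == 1)) # true positives
fp = np.sum((y_pred == 1) & (y_true == 0)) # false positives

accuracy = np.mean(y_pred == y_true)       # fraction of correct predictions
precision = tp / (tp + fp)                 # fraction of predicted positives that are truly positive

print(accuracy, precision)                 # 0.666..., 0.75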

3. Evaluation Protocol:¶

The evaluation protocol will be a simple hold-out split. Since the dataset has 50,000 reviews, a hold-out set is large enough to be reliable, and it is less computationally expensive than k-fold cross-validation.

I will now start by importing the dataset and preparing it before building the models.

In [ ]:
# Importing the dataset
from keras.datasets import imdb
import numpy as np
from keras import models, layers, optimizers, metrics
import matplotlib.pyplot as plt

# Importing the IMDB data
(train_data, train_labels), (test_data, test_labels) = imdb.load_data(
    num_words=10000)
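
As a quick sanity check on the loaded data, a review can be decoded back into words using the word index that ships with the dataset (an optional sketch, not required for the rest of the pipeline):

In [ ]:
# Optional sanity check: decode the first training review back into words
word_index = imdb.get_word_index()
reverse_word_index = {index: word for (word, index) in word_index.items()}
# Indices 0-2 are reserved for padding/start/unknown markers, hence the offset of 3
decoded_review = ' '.join(
    reverse_word_index.get(i - 3, '?') for i in train_data[0])
print(decoded_review[:200])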

4. Preparing the IMDB Data:¶

Below, I first vectorize the dataset and then hold out some data for validation, as planned.

In [ ]:
# PREPARING THE DATA
# Function for multi-hot vectorizing the data
def vectorize_sequences(sequences, dimension=10000):
  results = np.zeros((len(sequences), dimension))
  for i, sequence in enumerate(sequences):
    results[i, sequence] = 1.
  return results

# Vectorizing the data using the function above
x_train = vectorize_sequences(train_data)
x_test = vectorize_sequences(test_data)
y_train = np.asarray(train_labels).astype('float32')
y_test = np.asarray(test_labels).astype('float32')

# Holding Out Data For Validation
x_val = x_train[:10000]
partial_x_train = x_train[10000:]
y_val = y_train[:10000]
partial_y_train = y_train[10000:]
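
To make the multi-hot encoding concrete, here is a small check on a toy input (illustrative only, not part of the actual preparation):

In [ ]:
# Toy example: a "review" containing word indices 1, 3 and 5
toy = vectorize_sequences([[1, 3, 5]], dimension=8)
print(toy)            # [[0. 1. 0. 1. 0. 1. 0. 0.]]
print(x_train.shape)  # (25000, 10000) multi-hot review vectors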

5. Building the Smallest Model:¶

Now the task is to beat the baseline for the IMDB dataset. Since the classes are balanced, anything better than 0.5 accuracy is enough to beat a naive baseline.
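
The 0.5 figure comes from the training labels being evenly balanced between the two classes, which can be verified directly (a quick check, separate from the modelling):

In [ ]:
# The classes are balanced (12,500 positive and 12,500 negative reviews),
# so always predicting one class only reaches about 50% accuracy.
print(np.bincount(train_labels))  # [12500 12500]
print(train_labels.mean())        # 0.5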

Model:¶

  • The model is kept really simple: just one Dense layer, with a sigmoid activation function.
In [ ]:
# CREATING THE SMALLEST MODEL
# Only 1 Layer model with SIGMOID activation function.
model = models.Sequential()
model.add(layers.Dense(1, activation='sigmoid', input_shape=(10000,)))

Settings:¶

The model uses the following settings:

  • Optimizer: RMSPROP
  • Loss Function: Binary Cross-Entropy
  • Metrics: Accuracy and Precision

RMSprop provides stable results during training; Adam was also tried, but RMSprop gave better results, so it is used here and in the other models as well.

Since this is binary classification, the loss function used is binary cross-entropy.

The metrics are kept as planned and will be the same for the other models.

In [ ]:
# Compiling, Keeping RMSPROP Optimizer, Binary Cross-Entropy Loss-Function
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy', metrics.Precision(name='precision')])

Training the model:¶

As seen below, only three epochs are used and the batch size is kept at 32, a minimal amount of training for this smallest model. As the output shows, the training accuracy reaches about 91% (roughly 89% on the validation set), well above the 50% baseline for the IMDB reviews, so the baseline is beaten.

In [ ]:
# Training the Model, Keeping the EPOCHS to 3 and Batch Size to 32
history = model.fit(partial_x_train,
                    partial_y_train,
                    epochs=3,
                    batch_size=32,
                    validation_data=(x_val, y_val))
Epoch 1/3
469/469 [==============================] - 5s 9ms/step - loss: 0.4795 - accuracy: 0.8355 - precision: 0.8382 - val_loss: 0.3789 - val_accuracy: 0.8709 - val_precision: 0.8576
Epoch 2/3
469/469 [==============================] - 2s 4ms/step - loss: 0.3197 - accuracy: 0.8951 - precision: 0.8883 - val_loss: 0.3176 - val_accuracy: 0.8836 - val_precision: 0.8931
Epoch 3/3
469/469 [==============================] - 2s 4ms/step - loss: 0.2648 - accuracy: 0.9086 - precision: 0.9035 - val_loss: 0.2915 - val_accuracy: 0.8903 - val_precision: 0.8760

Below we plot the training and validation loss, which gives some insight into this smallest model. By the 3rd epoch the model has started to overfit slightly, but it is quite stable and the gap between validation and training loss is small.

This leads to one conclusion: the final model should use fewer epochs, because models with more layers and more capacity start to overfit quickly.

In [ ]:
# Plotting the Loss for Valiation and Training
history_dict_small = history.history
loss_values = history_dict_small['loss']
val_loss_values = history_dict_small['val_loss']
epochs = range(1, len(loss_values) + 1)
plt.plot(epochs, loss_values, 'bo', label='Training loss')
plt.plot(epochs, val_loss_values, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()
[Figure: training and validation loss for the smallest model]

Below are the graphs for accuracy and precision. The two metrics track each other closely and show that even with a single layer the model can be accurate and precise.

In [ ]:
# Plotting the results for Training and Validation Accuracy
plt.clf()
acc = history_dict_small['accuracy']
val_acc = history_dict_small['val_accuracy']
plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()
[Figure: training and validation accuracy for the smallest model]
In [ ]:
# Plotting the results of Training and Validation Precision
plt.clf()
prec = history_dict_small['precision'] 
val_prec = history_dict_small['val_precision']
plt.plot(epochs, prec, 'bo', label='Training Precision')
plt.plot(epochs, val_prec, 'b', label='Validation Precision')
plt.title('Training and validation Precision')
plt.xlabel('Epochs')
plt.ylabel('Precision')
plt.legend()
plt.show()
[Figure: training and validation precision for the smallest model]
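
The loss plot above suggested keeping the number of epochs low. One way to make that choice automatic, rather than picking the epoch count by eye, would be an early-stopping callback; the following is only a sketch of that alternative and is not used in the rest of this notebook:

In [ ]:
# Sketch of an early-stopping alternative to hand-picking the epoch count
from keras.callbacks import EarlyStopping

# Stop once val_loss has not improved for 2 consecutive epochs and
# roll back to the best weights seen so far.
early_stop = EarlyStopping(monitor='val_loss', patience=2,
                           restore_best_weights=True)

# history = model.fit(partial_x_train, partial_y_train,
#                     epochs=20, batch_size=32,
#                     validation_data=(x_val, y_val),
#                     callbacks=[early_stop])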

6. Scaling up - Developing a Model That Overfits:¶

Now the task is to create a model that overfits. The model in this section has the following settings:

  • 4 Dense layers, with the last layer still using a sigmoid activation.
  • The hidden layers use the relu activation.
  • Each hidden layer has 32 units, more capacity than the smallest model.

Relu is chosen because it performs consistently; on this dataset it worked well and proved better than the tanh activation, as sketched below.
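
For reference, the tanh variant that relu was compared against would differ only in the hidden-layer activations (a hypothetical sketch of that alternative, not the model trained below):

In [ ]:
# Hypothetical tanh variant used only for the activation comparison
tanh_model = models.Sequential()
tanh_model.add(layers.Dense(32, activation='tanh', input_shape=(10000,)))
tanh_model.add(layers.Dense(32, activation='tanh'))
tanh_model.add(layers.Dense(32, activation='tanh'))
tanh_model.add(layers.Dense(1, activation='sigmoid'))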

In [ ]:
# Creating an overfit model
overfit_model = models.Sequential()
overfit_model.add(layers.Dense(32, activation='relu', input_shape=(10000,)))
overfit_model.add(layers.Dense(32, activation='relu'))
overfit_model.add(layers.Dense(32, activation='relu'))
overfit_model.add(layers.Dense(1, activation='sigmoid'))

The model is compiled the same way as before.

In [ ]:
# Compiling the Model
overfit_model.compile(optimizer='rmsprop',
                      loss='binary_crossentropy',
                      metrics=['accuracy', metrics.Precision(name='precision')])

The model is now trained with the following settings:

  • 10 epochs, which lets it train for longer.
  • A batch size of 512, which processes more samples per weight update and trains noticeably faster.

This produces an overfitting model: as the output shows, the validation loss starts to increase after the 4th epoch.

In [ ]:
# Training the Model with 10 Epochs and 512 Batch Size
overfit_history = overfit_model.fit(partial_x_train,
                                    partial_y_train,
                                    epochs=10,
                                    batch_size=512,
                                    validation_data=(x_val, y_val))
Epoch 1/10
30/30 [==============================] - 4s 114ms/step - loss: 0.5235 - accuracy: 0.7491 - precision: 0.7444 - val_loss: 0.3596 - val_accuracy: 0.8684 - val_precision: 0.8583
Epoch 2/10
30/30 [==============================] - 4s 128ms/step - loss: 0.3018 - accuracy: 0.8881 - precision: 0.8821 - val_loss: 0.2937 - val_accuracy: 0.8822 - val_precision: 0.8523
Epoch 3/10
30/30 [==============================] - 2s 79ms/step - loss: 0.2173 - accuracy: 0.9212 - precision: 0.9154 - val_loss: 0.2886 - val_accuracy: 0.8861 - val_precision: 0.9173
Epoch 4/10
30/30 [==============================] - 2s 59ms/step - loss: 0.1767 - accuracy: 0.9343 - precision: 0.9328 - val_loss: 0.2802 - val_accuracy: 0.8871 - val_precision: 0.8881
Epoch 5/10
30/30 [==============================] - 2s 52ms/step - loss: 0.1290 - accuracy: 0.9576 - precision: 0.9560 - val_loss: 0.2979 - val_accuracy: 0.8840 - val_precision: 0.8774
Epoch 6/10
30/30 [==============================] - 1s 46ms/step - loss: 0.1182 - accuracy: 0.9588 - precision: 0.9592 - val_loss: 0.3139 - val_accuracy: 0.8831 - val_precision: 0.8829
Epoch 7/10
30/30 [==============================] - 1s 44ms/step - loss: 0.0779 - accuracy: 0.9748 - precision: 0.9733 - val_loss: 0.3726 - val_accuracy: 0.8777 - val_precision: 0.8412
Epoch 8/10
30/30 [==============================] - 2s 52ms/step - loss: 0.0707 - accuracy: 0.9788 - precision: 0.9779 - val_loss: 0.3701 - val_accuracy: 0.8796 - val_precision: 0.8821
Epoch 9/10
30/30 [==============================] - 2s 73ms/step - loss: 0.0541 - accuracy: 0.9843 - precision: 0.9849 - val_loss: 0.3956 - val_accuracy: 0.8799 - val_precision: 0.8757
Epoch 10/10
30/30 [==============================] - 2s 52ms/step - loss: 0.0479 - accuracy: 0.9849 - precision: 0.9848 - val_loss: 0.4201 - val_accuracy: 0.8766 - val_precision: 0.8755

The graph below shows the training and validation loss. The validation loss starts to increase after the 4th epoch and reaches about 0.42 by the 10th, which shows that the model is clearly overfitting.

In [ ]:
# Plotting the results of Validation and Training Loss
history_dict_overfit = overfit_history.history
loss_values = history_dict_overfit['loss']
val_loss_values = history_dict_overfit['val_loss']
epochs = range(1, len(loss_values) + 1)
plt.plot(epochs, loss_values, 'bo', label='Training loss')
plt.plot(epochs, val_loss_values, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()
[Figure: training and validation loss for the overfit model]

We can see the training accuracy and precision keep increasing while the validation values stay roughly flat, which again shows that this model is clearly overfitting. This lays the foundation for what is needed in the final, stable model to get the best accuracy and precision.

In [ ]:
# Plotting the results for Training and Validation Accuracy for the Overfit Model
plt.clf()
acc = history_dict_overfit['accuracy']
val_acc = history_dict_overfit['val_accuracy']
plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()
[Figure: training and validation accuracy for the overfit model]
In [ ]:
# Plotting the results for Training and Validation Precision for the Overfit Model
plt.clf()
prec = history_dict_overfit['precision']
val_prec = history_dict_overfit['val_precision']
plt.plot(epochs, prec, 'bo', label='Training Precision')
plt.plot(epochs, val_prec, 'b', label='Validation Precision')
plt.title('Training and validation Precision')
plt.xlabel('Epochs')
plt.ylabel('Precision')
plt.legend()
plt.show()
[Figure: training and validation precision for the overfit model]

7. Regularizing and Tuning Hyperparameters:¶

Now the next task is to create a model that neither overfits nor underfits. This time I will scale down from the previous model: having seen both extremes, the first one-layer model with very little capacity and the second model that overfits, the goal is to find a middle ground.

For this model I will keep the following settings:

  • 3 Dense layers: two hidden layers with relu activations and a final sigmoid output layer.
  • 2 dropout layers for regularization.
  • The hidden units are decreased to 16, a middle ground compared with the 32 used in the previous model.
In [ ]:
# STABLE MODEL
stable_model = models.Sequential()
stable_model.add(layers.Dense(16, activation='relu', input_shape=(10000,)))
stable_model.add(layers.Dropout(0.4)) # Drop Out Layer
stable_model.add(layers.Dense(16, activation='relu'))
stable_model.add(layers.Dropout(0.4)) # Drop Out Layer
stable_model.add(layers.Dense(1, activation='sigmoid'))

The compilation code is the same as before. I tested a few other configurations before concluding that a custom learning rate is not needed; the optimizer's default values are good enough for the scope of this model (a sketch of the alternative appears after the compile cell below).

In [ ]:
# Compiling the model and setting the optimizer
stable_model.compile(optimizer="rmsprop",
              loss='binary_crossentropy',
              metrics=['accuracy',metrics.Precision(name='precision')])
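
For completeness, if a non-default learning rate had been needed, it could have been passed through an explicit optimizer object; the following is only a sketch of that alternative and is not used here:

In [ ]:
# Sketch: an explicit RMSprop learning rate instead of the string default
custom_rmsprop = optimizers.RMSprop(learning_rate=0.001)
# stable_model.compile(optimizer=custom_rmsprop,
#                      loss='binary_crossentropy',
#                      metrics=['accuracy', metrics.Precision(name='precision')])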

Now, the parameters we have here are the following:

  • 5 epochs, because the previous model began to overfit after around 4–5 epochs.
  • A batch size of 512, because it proved faster than smaller batch sizes.

The results show that the model is not overfitting; this is discussed further with the graphs below.

In [ ]:
# Training the stable model with 5 epochs and 512 batch size
stable_history = stable_model.fit(partial_x_train,
                                  partial_y_train,
                                  epochs=5,
                                  batch_size=512,
                                  validation_data=(x_val, y_val))
Epoch 1/5
30/30 [==============================] - 3s 68ms/step - loss: 0.6078 - accuracy: 0.6857 - precision: 0.6661 - val_loss: 0.4901 - val_accuracy: 0.8542 - val_precision: 0.8550
Epoch 2/5
30/30 [==============================] - 2s 53ms/step - loss: 0.4666 - accuracy: 0.8151 - precision: 0.7954 - val_loss: 0.3777 - val_accuracy: 0.8726 - val_precision: 0.8488
Epoch 3/5
30/30 [==============================] - 1s 38ms/step - loss: 0.3737 - accuracy: 0.8649 - precision: 0.8527 - val_loss: 0.3362 - val_accuracy: 0.8754 - val_precision: 0.8380
Epoch 4/5
30/30 [==============================] - 1s 38ms/step - loss: 0.3127 - accuracy: 0.8926 - precision: 0.8844 - val_loss: 0.2924 - val_accuracy: 0.8891 - val_precision: 0.8773
Epoch 5/5
30/30 [==============================] - 1s 39ms/step - loss: 0.2688 - accuracy: 0.9134 - precision: 0.9115 - val_loss: 0.2855 - val_accuracy: 0.8833 - val_precision: 0.9153

The graph below shows some underfitting at the start, but the two curves come together by the 5th epoch, indicating that the model generalizes well at that point and would likely begin to overfit with further epochs, so stopping at 5 is the most sensible option.

In [ ]:
# Plotting the training and validation Loss results for the stable model
history_dict_stable = stable_history.history
loss_values = history_dict_stable['loss']
val_loss_values = history_dict_stable['val_loss']
epochs = range(1, len(loss_values) + 1)
plt.plot(epochs, loss_values, 'bo', label='Training loss')
plt.plot(epochs, val_loss_values, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()
[Figure: training and validation loss for the stable model]

Looking at the accuracy and precision, we can see the curves converging: the model has extracted most of the useful information from the data and has stayed fairly stable throughout training.

In [ ]:
# Plotting the training and validation accuracy results for the stable model
plt.clf()
acc = history_dict_stable['accuracy']
val_acc = history_dict_stable['val_accuracy']
plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()
[Figure: training and validation accuracy for the stable model]
In [ ]:
#  Plotting the training and validation Precision results for the stable model
plt.clf()
prec = history_dict_stable['precision']
val_prec = history_dict_stable['val_precision']
plt.plot(epochs, prec, 'bo', label='Training Precision')
plt.plot(epochs, val_prec, 'b', label='Validation Precision')
plt.title('Training and validation Precision')
plt.xlabel('Epochs')
plt.ylabel('Precision')
plt.legend()
plt.show()
[Figure: training and validation precision for the stable model]

Testing and Evaluation:¶

In this section, the final step is to retrain the stable model on the whole training set and then evaluate and test it.
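
Note that the cell below continues training the already-fitted stable_model on the full training set rather than literally retraining from scratch as the objectives phrase it. If a true from-scratch retrain were wanted, the model would be rebuilt and recompiled first, roughly as in this sketch (not executed in this notebook):

In [ ]:
# Hypothetical from-scratch rebuild before the final fit (not run here)
final_model = models.Sequential()
final_model.add(layers.Dense(16, activation='relu', input_shape=(10000,)))
final_model.add(layers.Dropout(0.4))
final_model.add(layers.Dense(16, activation='relu'))
final_model.add(layers.Dropout(0.4))
final_model.add(layers.Dense(1, activation='sigmoid'))
final_model.compile(optimizer='rmsprop',
                    loss='binary_crossentropy',
                    metrics=['accuracy', metrics.Precision(name='precision')])
# final_model.fit(x_train, y_train, epochs=5, batch_size=512)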

In [ ]:
# FINAL TRAINING ON ALL OF THE DATA
# With 5 Epochs and 512 Batch Size
stable_history_final = stable_model.fit(x_train,
                                        y_train,
                                        epochs=5,
                                        batch_size=512)
Epoch 1/5
49/49 [==============================] - 2s 30ms/step - loss: 0.2681 - accuracy: 0.9077 - precision: 0.9093
Epoch 2/5
49/49 [==============================] - 1s 29ms/step - loss: 0.2287 - accuracy: 0.9247 - precision: 0.9266
Epoch 3/5
49/49 [==============================] - 1s 30ms/step - loss: 0.2003 - accuracy: 0.9337 - precision: 0.9358
Epoch 4/5
49/49 [==============================] - 2s 34ms/step - loss: 0.1740 - accuracy: 0.9432 - precision: 0.9463
Epoch 5/5
49/49 [==============================] - 1s 30ms/step - loss: 0.1559 - accuracy: 0.9514 - precision: 0.9566

Evaluation:¶

For the evaluation, the model achieves 88.4% accuracy and 88.7% precision with a loss of 0.33 on the test set, which is better than the other iterations I tested.

Other settings, such as more epochs or more layers, resulted in higher loss, which is not what we wanted. Using a few epochs with an ample batch size and a standard optimizer produced a model that is accurate and generalizes better than the earlier models, which either underfit or overfit; testing those gave a clear path to this stable model (for example, a variant trained for 7 epochs gave around 0.44 loss with similar accuracy).

Another variant I tested used L2 regularizers alongside the dropout layers with longer training, but it gave more or less the same results, and in some cases slightly worse, so keeping simple dropout as the regularization method was the better approach (a sketch of that variant follows). The point of this model is to reduce the loss as much as possible while staying accurate and precise, and it achieves that.
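
For reference, the L2 variant that was tested would add a weight penalty to each hidden layer alongside dropout; the following is only a sketch of that alternative, assuming a small penalty such as 0.001, and is not the model evaluated below:

In [ ]:
# Hypothetical L2 + dropout variant that was compared against the final model
from keras import regularizers

l2_model = models.Sequential()
l2_model.add(layers.Dense(16, activation='relu',
                          kernel_regularizer=regularizers.l2(0.001),
                          input_shape=(10000,)))
l2_model.add(layers.Dropout(0.4))
l2_model.add(layers.Dense(16, activation='relu',
                          kernel_regularizer=regularizers.l2(0.001)))
l2_model.add(layers.Dropout(0.4))
l2_model.add(layers.Dense(1, activation='sigmoid'))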

In [ ]:
# Evaluating the stable model
results = stable_model.evaluate(x_test, y_test)
782/782 [==============================] - 2s 3ms/step - loss: 0.3300 - accuracy: 0.8840 - precision: 0.8870
In [ ]:
# Predicting some values
stable_model.predict(x_test)
782/782 [==============================] - 2s 2ms/step
Out[ ]:
array([[0.0795797 ],
       [0.99999917],
       [0.9914555 ],
       ...,
       [0.05077999],
       [0.04882602],
       [0.7836425 ]], dtype=float32)
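
The predictions above are probabilities of the positive class; thresholding them at the standard 0.5 cut-off turns them into class labels, which can be checked against the reported test accuracy (a small follow-up sketch):

In [ ]:
# Converting predicted probabilities into 0/1 labels with a 0.5 threshold
probs = stable_model.predict(x_test)
pred_labels = (probs[:, 0] > 0.5).astype('float32')
print('Test accuracy (manual):', np.mean(pred_labels == y_test))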

References:¶

This notebook adapts code from Chapters 1–4 of the first edition of Chollet's Deep Learning with Python.

  • Chollet, F., Deep Learning with Python, 1st edition, Manning Publications.