As I looked at the options available to me for the capstone project as part of Udacity’s Data Scientist for Enterprise course, one jumped out at me beyond the rest: classifying dog breeds using Convolutional Neural Networks (CNNs).
In my last role, I managed websites that allowed users to upload an image. Our expectation was that the images would be of a human’s face. However, people often uploaded images of animals, objects, full bodies, and so on, which were not what we wanted. These images had to be rejected manually by a human, who then emailed the user asking them to upload a new photo.
We tried implementing an automated rejection mechanism that could give users instant feedback on their image but were not able to get it working well enough. I thought this could be an opportunity for me to see how well I could do with something similar.
The main goal of this project was to build an algorithm that could take in a path to an image of a dog and classify the breed of the dog properly. As an added bit of fun, the algorithm could also determine whether the image was of a dog or a human or something else.
If the image is of a dog, then the algorithm will use a model to classify the breed.
If the image is of a human, then the algorithm will call that out and then still use the model to see what breed of dog the human most resembles.
Finally, if the image is not of a dog or a human, then the algorithm will report an error.
The main metric used to measure the performance of the algorithm is accuracy: is it accurate in determining that the photo is of a dog and, if so, is it then accurate in determining the proper breed of the dog?
This is a good metric to use here for two reasons:
- We want to be able to determine the actual dog breed, so an accurate response is important.
- The training dataset, as shown later in this article, is well balanced across the different dog breeds, so accuracy is not distorted by class imbalance.
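Concretely, accuracy is just the fraction of predictions that match the true labels. A minimal sketch (the breed names are made up for illustration):

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    correct = sum(p == t for p, t in zip(predictions, labels))
    return correct / len(labels)

# Hypothetical example: 3 of 4 breed predictions are correct.
print(accuracy(["Beagle", "Poodle", "Beagle", "Collie"],
               ["Beagle", "Poodle", "Boxer", "Collie"]))  # 0.75
```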
This type of algorithm could be very useful for my old role in classifying uploaded images!
First, as any good computer or data scientist would do, I needed to create the building blocks for this algorithm.
All models need data and the model I built here was no different. Udacity supplied me with two datasets: one for dogs and one for humans.
The dataset for dogs included 8,351 images spanning 133 dog breed categories, and was further broken down into training, validation, and testing sets.
The dataset for humans included 13,233 images.
Once we loaded in our datasets, we could then build functions to detect humans and dogs in images. Udacity provided the majority of the code for both of these functions.
To detect human faces, Udacity used OpenCV’s implementation of Haar feature-based cascade classifiers. Prior to calling it, the images were converted to grayscale. The human detector was able to properly classify 100% of the human images in our data subset. However, it did incorrectly classify 11% of the dog images in our data set as humans, so it is not perfect.
To detect dogs, Udacity used a pre-trained ResNet-50 model with weights trained on ImageNet. Because of this, we would be looking for the model to predict a class index between 151 and 268, inclusive, as that is the range of dog categories in the ImageNet dictionary. The dog detector properly classified 100% of the dog images in our data subset and misclassified none (0%) of the human images as dogs.
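Since ImageNet class indices 151 through 268 (inclusive) all correspond to dog breeds, the dog detector only needs to check whether the model's top prediction falls in that range. A minimal sketch of that check (the ResNet-50 call itself is elided; `predicted_index` stands in for `np.argmax` over the model's output):

```python
def is_dog(predicted_index):
    """True if an ImageNet class index falls in the dog range (151-268)."""
    return 151 <= predicted_index <= 268

# In the full pipeline this would be roughly:
#   is_dog(np.argmax(ResNet50(weights="imagenet").predict(tensor)))
print(is_dog(151), is_dog(268), is_dog(150))  # True True False
```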
It was suggested to me that I visualize the frequency of dog breeds in the training dataset, which you can see in the bar graph to the left. It shows that the training dataset contains a good spread of images across all of the breeds, giving the model plenty of opportunities to learn the differences between them. It also reinforces why accuracy is a metric we can trust for this model: the training data is balanced across the different breeds.
As noted by Udacity, “when using TensorFlow as backend, Keras CNNs require a 4D array (which we’ll also refer to as a 4D tensor) as input, with shape (nb_samples, rows, columns, channels).”
- nb_samples is the total number of images
- rows, columns, and channels correspond to each image’s height, width, and color channels
Additionally, the images needed to be resized to 224 x 224 pixels and the RGB channels needed to be changed to BGR.
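Those steps can be sketched with NumPy. The resize to 224 × 224 is normally handled while loading the image (e.g. with `keras.preprocessing.image.load_img(path, target_size=(224, 224))`); here I start from an already-resized array to keep the sketch self-contained, and the RGB→BGR swap is just a channel reversal:

```python
import numpy as np

def to_4d_tensor(img_rgb):
    """(224, 224, 3) RGB image -> (1, 224, 224, 3) BGR tensor."""
    img_bgr = img_rgb[..., ::-1]            # reverse channel order: RGB -> BGR
    return np.expand_dims(img_bgr, axis=0)  # add the nb_samples dimension

# A stand-in for one resized image.
img = np.zeros((224, 224, 3), dtype=np.float32)
tensor = to_4d_tensor(img)
print(tensor.shape)  # (1, 224, 224, 3)
```

In the actual project, `keras.applications.resnet50.preprocess_input` handles the BGR conversion (it also subtracts the ImageNet channel means).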
In this guided project, I was first asked to build a CNN from scratch. Udacity provided a sample model architecture, and I built my model off of it because I thought it was a good fit for this image classification problem. A CNN is well suited to two-dimensional image data, which is what we have in this situation, so we use Conv2D layers. We also add pooling layers because the feature maps generated by the Conv2D layers “are sensitive to the location of the features in the input” (link below), so we need to downsample.
You can see the details of my model below:
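In case the image of the model summary doesn’t render, a sketch of that pattern in Keras is below: alternating Conv2D and MaxPooling2D layers, then global average pooling into a 133-way softmax. The filter counts here are illustrative, not necessarily the exact values I used.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv2D, MaxPooling2D,
                                     GlobalAveragePooling2D, Dense)

model = Sequential([
    Conv2D(16, 2, activation="relu", input_shape=(224, 224, 3)),
    MaxPooling2D(2),                   # downsample: feature maps are
    Conv2D(32, 2, activation="relu"),  # sensitive to feature location
    MaxPooling2D(2),
    Conv2D(64, 2, activation="relu"),
    MaxPooling2D(2),
    GlobalAveragePooling2D(),
    Dense(133, activation="softmax")   # one output per dog breed
])
model.compile(optimizer="rmsprop", loss="categorical_crossentropy",
              metrics=["accuracy"])
```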
After training this model through 5 epochs I was able to get an accuracy of 1.6746% on the test dataset. Not very promising, which highlights the difficulty of the problem we are trying to solve.
As this is such a difficult problem and really requires the model to have been trained on a lot of data to be successful, I refined my model by creating a CNN using Transfer Learning. I used a pre-trained ResNet-50 model.
This time, as I was using a pre-trained model, my model architecture was much simpler. After deciding on ResNet-50, I copied the architecture set up earlier in this exercise for VGG16 because I think it makes sense here: use the pre-trained ResNet-50 model to extract features, then pool them to downsample. I did change the Dense layer’s activation from ‘softmax’ to ‘sigmoid’, as I thought ‘sigmoid’ was better suited to this image classification problem.
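The transfer-learning model itself is tiny, since ResNet-50 does the heavy lifting: it sits on top of the precomputed ResNet-50 bottleneck features, pools them, and maps to the 133 breeds. A sketch, assuming Keras and the (1, 1, 2048) bottleneck shape from the Udacity-provided features:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import GlobalAveragePooling2D, Dense

transfer_model = Sequential([
    # Input shape matches the precomputed ResNet-50 bottleneck features.
    GlobalAveragePooling2D(input_shape=(1, 1, 2048)),
    Dense(133, activation="sigmoid")  # swapped in for softmax, per the text
])
transfer_model.compile(optimizer="rmsprop",
                       loss="categorical_crossentropy",
                       metrics=["accuracy"])
```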
After creating all of the building blocks above, actually writing the algorithm was very simple.
My algorithm takes in an image path and goes through a simple IF-ELSE-IF structure:
- IF it is identified as an image of a dog, respond with the predicted breed of the dog.
- ELSE IF it is identified as an image of a human, respond that the image appears to be a human and still provide the predicted breed of dog that the human most resembles.
- ELSE, an error message is output asking the user to submit another photo.
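Putting it together, the control flow looks like the sketch below. `dog_detector`, `face_detector`, and `predict_breed` are stand-ins for the detectors and the breed model described above, and the lambdas are hypothetical stubs to show the flow:

```python
def classify_image(img_path, dog_detector, face_detector, predict_breed):
    """Return a human-readable result for an uploaded image."""
    if dog_detector(img_path):
        return f"This looks like a {predict_breed(img_path)}."
    elif face_detector(img_path):
        return (f"This appears to be a human, but they most resemble "
                f"a {predict_breed(img_path)}.")
    else:
        return "Error: no dog or human detected. Please submit another photo."

# Hypothetical stubs to show the flow:
result = classify_image("photo.jpg",
                        dog_detector=lambda p: False,
                        face_detector=lambda p: True,
                        predict_breed=lambda p: "Boykin Spaniel")
print(result)
```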
I tested this algorithm with the example photos provided by Udacity as well as some of my own; the results, discussed in the next section, were pretty impressive.
Model Evaluation and Validation
After training my final CNN model through 20 epochs I was able to get an accuracy of 80.3828% on the test dataset. This was a huge improvement over the model I had built from scratch and significantly better than the 60% required by Udacity on this project!
Beyond just trusting that this accuracy was good, I also ran my algorithm against real-world images, including the sample images provided by Udacity, images of myself and my coworkers, and images of objects. The results were pretty impressive: it classified most images correctly and reliably discerned between humans and dogs. Additionally, when I uploaded a photo of M&M’S set up like Darth Vader, it properly recognized that the image was of neither a dog nor a human.
Overall, the results were very promising and much better than I expected that I could accomplish on my own as someone who is still “new” to data science.
It makes sense that my CNN utilizing the pre-trained ResNet-50 model performed significantly better than the CNN I had built from scratch.
Beyond that, it seems that some of the dog images provided by Udacity were difficult for the model to classify correctly because the breeds are so similar. For example, one dog image I ran through the algorithm was of an American Water Spaniel, and the model predicted a Boykin Spaniel.
I think that the results will continue to get better with more data for the model to learn from. Additionally, I think there are some improvements that could be made to the algorithm overall, which I note below.
Working through this project has given me the confidence that, using neural networks, there is a lot that my old team could do to improve the user experience on the website when uploading photos and decrease the amount of manual work needed in reviewing those photos in the back office.
Additionally, working on this project has given me confidence that I can solve real-world problems using data science even though I am just completing this course within Udacity. I think that is pretty amazing and speaks highly to the quality of the content in this course.
That said, I could not have come anywhere close to the results I achieved if it were not for “standing on the shoulders of giants”. I was able to use a pre-trained model in this project as well as benefit from the genius of others in providing simple tools and libraries for data science work.
If I were to improve upon this algorithm for the purposes of classifying dog breeds I would focus on three main areas:
- Providing a list of the top three breeds the dog might be instead of just one answer. This would help in situations where multiple breeds are very similar and almost impossible for a computer (or a human) to tell apart.
- When the photo does not appear to be a dog or a human, note what the model thinks it might be. This would highlight the intelligence of the model.
- If this was set up within a web application it could provide more user-friendly responses and feedback to the user to improve results. This, in a real-world setting, would enable users to get the results they want quickly and easily.
This was a great initiative and I feel very proud of being able to complete this as my capstone project!