Game of Thrones Season 8 has just come to an end, and it was emotional! How emotional? Using face detection and emotion recognition, we analyzed which scenes evoked which reactions among fans of the series. In this post, we present how to measure the emotions of over 200 viewers without spending any money on surveys. And how to close a project in a week!
You have probably already noticed the wide popularity of Game of Thrones. It has over 1.5 million ratings on IMDb! What is also interesting is that videos of people reacting to the show have gained incredible popularity on the internet: there are even dedicated YouTube channels publishing fans’ reactions, with clips reaching millions of views!
This fascinating phenomenon enabled our AI team to collect a dataset of 18,086 frames of subjects reacting to various moments of the final season of Game of Thrones.
Having the dataset, we began data cleaning — the beloved phase of all data scientists and machine learning engineers.
As one may expect, YouTube videos tend to be quite irregular in their structure. Typically, a reaction video shows viewers sitting on a couch, with a frame from the show of interest overlaid on top.
What is even more challenging, viewers often pause the show to share additional comments. As those were of no interest to us, we decided to ignore such fragments and keep only the parts of the YouTube videos that contain actual Game of Thrones scenes.
The next steps were to cut out the overlaid frame and match it with the corresponding scene in the episode. In parallel, we combined two pretrained solutions: a face detection model and an emotion recognition model.
To sum this up, the established pipeline was as follows:
- GoT frame detection
- Frame matching
- Face detection
- Emotion recognition
GoT frame detection
How difficult can it be to detect rectangles on an image? Well… it can be nearly impossible if there are no actual rectangles at all. Edges between a YouTube video’s background and overlaid GoT scenes are visible only due to the optical illusion called illusory contours. In this task, we had to rely a little bit on the incredible vision of humans and make some annotations. Thankfully, it was enough to manually mark coordinates only on a single frame per YouTube video. It was also a great opportunity to get more familiar with the data.
Having the coordinates, we could use Canny Edge Detection and Hough Line Transform to analyze every video frame by frame and check if we’re dealing with a shape limited by horizontal and vertical lines or a rectangle as some may call it. This way, using classical Computer Vision algorithms, we were able to not only extract GoT scenes but also detect fragments of videos where GoT wasn’t displayed. Perfect!
Frame matching
Are you waiting for Deep Learning? Not yet! We used a difference hash to encode each frame of Game of Thrones as a 64-bit number and map it to the time when it was displayed. We created such a mapping for each episode separately.
Now, to perform the actual matching of a frame extracted from a YouTube video we encode it using the same difference hash function and search for the most similar one in the previously created dictionaries.
If you wonder how we define similarity, it’s the Hamming distance between the bits of the two hashes.
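The whole scheme fits in a few lines. Here is a dependency-free sketch over grayscale NumPy arrays; the nearest-neighbour downscale and the 8×8 hash size are our simplifying assumptions (in practice a library such as ImageHash handles the resizing properly):

```python
import numpy as np

def dhash(gray, hash_size=8):
    """Difference hash: shrink to hash_size x (hash_size + 1), compare each
    pixel to its right neighbour, and pack the 64 bits into one integer."""
    h, w = gray.shape
    rows = np.arange(hash_size) * h // hash_size
    cols = np.arange(hash_size + 1) * w // (hash_size + 1)
    small = gray[np.ix_(rows, cols)]          # crude nearest-neighbour resize
    bits = (small[:, 1:] > small[:, :-1]).flatten()
    return int.from_bytes(np.packbits(bits).tobytes(), "big")

def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")

def match_frame(frame_hash, episode_hashes):
    """episode_hashes maps dhash -> timestamp in the episode; return the
    timestamp of the most similar episode frame."""
    return min(episode_hashes.items(),
               key=lambda kv: hamming(frame_hash, kv[0]))[1]
```

With one hash-to-timestamp dictionary per episode, matching an extracted YouTube frame is a single `match_frame` call.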
Smoothing Time Series Data
Algorithms (like people) are sometimes imperfect. The first two steps produced good results, but they were a bit noisy. Below you can see an example.
On the x axis we plot the time in a YouTube video, and on the y axis the time in an episode of Game of Thrones. As long as the authors of YouTube videos don’t edit them in a crazy manner, we would expect the GoT scenes to appear in the same order as in the show. That said, we knew we needed a straight line here, and we didn’t want to affect correctly matched points. We created a method for smoothing our data by deleting anomalies. Just look at the results!
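The post doesn’t spell out the method, but one simple realization of this idea is a rolling-median filter that drops matches deviating too far from their neighbours. The window size and threshold below are illustrative assumptions:

```python
import numpy as np

def remove_anomalies(t_youtube, t_episode, window=5, threshold=10.0):
    """Drop (YouTube time, episode time) matches whose episode timestamp
    deviates from the local median by more than `threshold` seconds."""
    t_youtube = np.asarray(t_youtube, dtype=float)
    t_episode = np.asarray(t_episode, dtype=float)
    keep = np.ones(len(t_episode), dtype=bool)
    half = window // 2
    for i in range(len(t_episode)):
        lo, hi = max(0, i - half), min(len(t_episode), i + half + 1)
        local_median = np.median(t_episode[lo:hi])
        if abs(t_episode[i] - local_median) > threshold:
            keep[i] = False            # an isolated mismatch: delete it
    return t_youtube[keep], t_episode[keep]
```

Correctly matched points sit close to their local median and pass through untouched, which is exactly the “don’t affect correct matches” property we wanted.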
Face detection
As we wanted to determine the emotions expressed by each person in the video, we actually needed to … inspect their faces. We extracted 18,086 frames with GoT viewers in them, which meant we had to automate the whole face detection process; it was too daunting to do manually. We used an already trained face detection model that allowed us to extract a rectangle with a face from each frame. We used a well-established Python library, which made the whole process fast and seamless. We could choose between a HOG feature-based detector and a Convolutional Neural Network. With 8x 1080Ti GPUs available, we went for deep learning without any hesitation.
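The post doesn’t name the library, but the HOG-versus-CNN choice matches the popular `face_recognition` package (built on dlib). A sketch of the cropping step, with the detector call shown only in a comment since it requires dlib to be installed:

```python
import numpy as np

def extract_faces(frame, face_locations):
    """Crop face rectangles out of a frame.

    face_locations uses the (top, right, bottom, left) convention returned
    by face_recognition.face_locations(frame, model="cnn"); passing
    model="hog" selects the faster CPU detector instead.
    """
    return [frame[top:bottom, left:right]
            for top, right, bottom, left in face_locations]

# With dlib/face_recognition installed, usage would be roughly:
#   import face_recognition
#   locations = face_recognition.face_locations(frame, model="cnn")
#   faces = extract_faces(frame, locations)
```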
Emotion recognition
Once we had detected the faces of people watching the episode, it was finally time to recognize their emotions. We used a model already trained on the popular FER2013 dataset available on Kaggle. It consists of 48×48-pixel grayscale images of faces, each annotated with one of seven emotions: anger, disgust, fear, happiness, sadness, surprise, or neutral. We used a model publicly available on GitHub that was accompanied by a thorough explanation, so we had a lot of trust in the model itself.
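The inference glue around such a model is short. This sketch assumes the FER2013 label order and a Keras-style `model` object loaded elsewhere; both are assumptions, not the exact code we ran:

```python
import numpy as np

# FER2013 class order: 0..6
EMOTIONS = ["anger", "disgust", "fear", "happiness",
            "sadness", "surprise", "neutral"]

def preprocess_face(face_gray48):
    """Scale a 48x48 grayscale crop to [0, 1] and add the batch and
    channel dimensions that typical FER models expect."""
    x = face_gray48.astype("float32") / 255.0
    return x.reshape(1, 48, 48, 1)

def top_emotion(probabilities):
    """Map a 7-way softmax output to its emotion label."""
    return EMOTIONS[int(np.argmax(probabilities))]

# With a loaded model, per-face inference would look roughly like:
#   probs = model.predict(preprocess_face(face))[0]
#   label = top_emotion(probs)
```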
Oh, and if you want to know how AI understands human emotions, you might be interested in this post by Ivona, our AI researcher.
Almost done! The time has come to see what we’ve got. Analyzing the emotional profile of GoT viewers alone would have been great, but we decided to go a step further. We found data on GitHub with the exact times when each character appears in the show. We couldn’t resist!
Viewers’ aversion to Daenerys increased over time
You may get mad from time to time, but you must agree that slaughtering an entire city wouldn’t make you a very likable person. The figure below clearly shows an upward trend in aversion to Dany. If only she had Data Science, she would have known before it was too late…
Chances are the dragons made you happy
Scary or not, for some reason the dragons often appeared in scenes that made people laugh. Was the dragon watching Jon and Dany kissing funny, or was it his sigh of pure disapproval? We don’t know, but dragons appear among the top characters causing happiness. The figures below present the characters that evoked a given emotion the most.
By adding a second dimension, we can see not only how often a character evoked a given emotion but also how “emotionally intense” he or she was overall across the whole season.
So, here we are: we hope you are convinced that it is possible to deliver a successful project in a week, having a lot of fun along the way. If you want to follow in our footsteps, consider these pro tips we learned as we went:
- Use what is already available before working on custom solutions.
- Done is better than perfect – time boxing is king.
- Smart division of tasks and parallelization is a must when you want to deliver something fast, but you need the right team to benefit from that.
- Be realistic and critical. We’re aware that there are many biases in our analysis. For example, people tend to be happier when watching movies with their friends.
- Have fun!
This post was written through the joint efforts of Agnieszka Sitko, Adam Słucki & Dominika Basaj.
Wanna discuss emotion recognition or any other AI-related topic? Let’s talk! 🙂
Read also about Augmenting AI image recognition with partial evidence