Are you about to hire a team of AI specialists for a project, or do you want to learn more about how we deliver AI projects? Either way, this article should answer some of your questions, so keep reading to broaden your knowledge.
In this article you’ll find:
- What an AI project looks like behind the scenes
- General workflow that you can implement in your projects
- A bit of insight into communication with a client
- Hopefully some entertainment 😉
I’m going to walk you through one of our real SPIKEs, which is what we call self-contained mini projects for our clients. It’s not an idealized account of a typical process. Everything here is the naked truth, based on a heartless Jira report. I must warn you that the problem we were solving is domain-specific, but you should still be able to get the general idea. The process itself is quite generic, and we successfully apply it to the majority of our projects.
If you find the story confusing, the following glossary should help. Note that the terms are defined in the context of this article.
Data representation – numerical features extracted from raw data (here images and videos) that can be used for training ML models.
Clustering – grouping individual data points based on their similarity.
ResNet50 – an AI model trained for image classification; it can also be used for feature extraction to represent images (https://arxiv.org/pdf/1512.03385.pdf).
C3D – an AI model trained for scene classification, which can also be used for feature extraction to represent videos (http://vlg.cs.dartmouth.edu/c3d/c3d_video.pdf).
It’s the middle of a famous Polish winter, which makes our New York client curious. We’ve been working for a while on a complex predictive model – pushing the boundaries of its accuracy, solving technical issues and testing the model from different angles. Now we want to check whether a different method of representing the input video data could help us. This is the goal of our SPIKE – to learn whether we should change our model’s input or not.
Phase 0 – Planning
Be on the same page with the rest of the team
Before we even start, our Product Owner asks us what it means to represent a video and why we want to conduct this analysis at all. Although it’s a very reasonable question, I can sense my colleagues’ nervousness and astonishment, resulting in an extended moment of silence. It may even look like we don’t know the fundamental concept we’re about to analyze over the next few days. To be honest, we don’t have a ready answer, even though the concept itself is perfectly clear to all of us, which makes the situation even more tense. We’re so used to hermetic terminology that we assume it’s standard in daily communication. No wonder we end up in awkward situations from time to time. But we constantly strive to improve in this regard.
Simply put, a video is a sequence of still images. Every image consists of pixels represented by numerical values. We could use those raw pixel values as the input, but it wouldn’t be efficient. It would require our model to perform two tasks at once – learning to “understand” raw pixels and predicting the metrics of interest. We want to make the job easier for our model, so we rely on existing pretrained components.
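To get a feel for why raw pixels are inefficient, here is a back-of-envelope comparison. The clip length, frame rate and frame size are illustrative assumptions, not our actual data:

```python
# Rough comparison of input sizes: raw pixels vs. per-frame feature
# vectors, for a hypothetical 10-second clip (all numbers illustrative).

frames_per_second = 25
duration_s = 10
height, width, channels = 224, 224, 3  # a typical CNN input size

n_frames = frames_per_second * duration_s

raw_values = n_frames * height * width * channels
feature_dim = 2048  # size of a ResNet50 penultimate-layer vector
feature_values = n_frames * feature_dim

print(raw_values)      # 37632000 raw pixel values
print(feature_values)  # 512000 feature values, ~73x smaller
```

The model no longer has to learn to interpret tens of millions of raw values per clip; it starts from a far more compact, already meaningful representation.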
Know WHY you want to work on the problem and communicate it to the client
The solution we’re currently using is based on the ResNet50 model. The model was trained on over 14 million images to classify objects into one of 1,000 classes. Although we don’t need to classify any objects, we can use the internal states of the model to extract meaningful information from images. If that model can use them to make predictions, they could be useful for our model as well.
The first solution took us really far, but we started to wonder if we could squeeze more out of our model by processing the videos differently. 3D convolutions are very promising, and there is a model prepared by Facebook’s AI team that we could use. Their paper claims it produces general-purpose representations of videos. We’re excited about it and can’t wait to check whether it works any better than our standard, simpler method.
A great success: we’re on the same page with the rest of the team, the Product Owner thinks it’s a good idea, and the client accepts our plan.
Prepare a more detailed plan to better manage the process
As the planning session was scheduled for the evening, by the time it’s finished it’s almost the end of the day. We’re preparing a list of subtasks to make the progress more transparent for our Scrum Master and to have a better picture of what has to be done for our own reference. We have plenty of time to think on the subway ride home, in our free time and before falling asleep, because this is exactly what happens when you’re working on tasks that genuinely excite you. I’m making the full list of subtasks, noting that more detailed descriptions will be added later.
- Prepare the environment
- Collect video metadata
- Implement video processing
- Extract ResNet features
- Extract C3D features
- Reduce dimensionality and cluster representations
- Analysis – compare representations
- Write the summary
Phase 1 – Actual work
Remember about the reproducibility
We usually start new projects by preparing the environment we’ll be working in. We strive for full reproducibility – or at least for reducing the number of cases where someone tries to run our code and it doesn’t work on their machine. We could probably just use virtual environments, but we prefer Docker, to be sure that others will be able to easily recreate the exact same environment. This part rarely causes any trouble, but when it does, dealing with it can be time-consuming. Sometimes we even have to ask backend engineers for help – fortunately, there are many here at Tooploox eager to do so. Once the setup is done and every line of code we write from now on can be easily executed anywhere, we can move forward.
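A minimal sketch of such a setup might look like this. The base image, requirements file and entry-point script are illustrative assumptions, not our actual configuration:

```dockerfile
# Illustrative Dockerfile for a reproducible analysis environment.
FROM python:3.10-slim

WORKDIR /app

# Pin dependency versions in requirements.txt so every rebuild
# resolves to the exact same packages.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .
CMD ["python", "run_analysis.py"]
```

The key point is pinning dependencies: with fixed versions in `requirements.txt`, anyone on the team can rebuild the image and get an identical environment.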
Something unexpected definitely WILL happen
As in life, some things go smoother than expected, while some seemingly easy things turn out to be major hurdles – like subtask number five, C3D feature extraction. In case you don’t remember, it’s the alternative model we want to compare our current solution with. It was created by the Facebook AI team; we had read about it some time ago and assumed it would be easily accessible, like many other Facebook AI projects. Except that it isn’t!
The official model can be used only with a framework forgotten by humanity a long time ago, and I must have missed its resurrection. We don’t have time to learn the framework to the point where we could comfortably introduce the required modifications to the model. If it were a life-or-death situation, we might come up with a solution subconsciously. But it isn’t, so we have to think hard about what to do.
It takes the whole evening – and it’s winter, so evenings are long – but we’ve got it. The model was serialized in a different format, but at least it wasn’t encrypted, so it was possible to convert it to our framework. Someone had even done it already, hurray! Unfortunately, they didn’t care about reproducibility, so, as we expected, the code didn’t work. But no need to worry – we already know the direction, so the rest is easy. Finally, we have all we need and can proceed with the analysis.
You can’t have it all
It is said that data preparation takes about 80% of the time in ML projects. If there were no deadlines, that would simply mean data preparation is a major part of the project. But deadlines do exist, and data preparation indeed takes a lot of time, so we have to carefully select which experiments to perform. It would be much easier if we were clairvoyants. Instead of trying to predict, we list all the remaining ideas as future work in the final report to let the client know about the other possibilities. Quite often these items become our tasks in the next sprints.
Phase 2 – Communicating the results
It’s not a big deal, right? Just show a chart or two and tell the audience the results are fascinating. Don’t worry, they’ll never ask you about the fascinating dots scattered randomly on a white background. They’ll also never hire you again.
Visualization and communication of results are probably even more important than all the previous steps. There is a whole field of science behind it! For example, you should know that it’s easier to perceive differences when they’re encoded by position rather than color, that one extra line on a chart may clutter the message, and that you can easily lie with your visualizations.
In the SPIKE, we present plots with the results of clustering the obtained data using two different methods. You guessed it – we show scattered dots. But since we don’t want this to be our last project for the client, we also add one chart with the ideal scenario for comparison. Now we have three charts with scattered dots, but we believe everyone knows what’s going on: two experiments and the ideal case. The chart from one of the experiments resembles the ideal case more closely, so that method is better – the client can see it. Success! Of course, we also use metrics like the adjusted Rand score to tell which clustering is better. But oddly enough, no one seems to care about the metrics.
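To show how such a comparison can be scored, here is a toy sketch with scikit-learn. The two feature matrices are synthetic stand-ins for representations like those from ResNet50 and C3D, and `cluster_score` is our own illustrative helper, not part of the original analysis:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)

# 300 synthetic "videos" from 3 known groups; features_a is a tight
# (well-separated) representation, features_b a much noisier one.
labels_true = np.repeat([0, 1, 2], 100)
features_a = rng.normal(labels_true[:, None], 0.5, size=(300, 64))
features_b = rng.normal(labels_true[:, None], 4.0, size=(300, 64))

def cluster_score(features, n_clusters=3):
    """Reduce, cluster, and compare against the known grouping.

    Adjusted Rand score: 1.0 means perfect agreement with the true
    groups, values near 0.0 mean no better than chance.
    """
    reduced = PCA(n_components=8).fit_transform(features)
    pred = KMeans(n_clusters=n_clusters, n_init=10,
                  random_state=0).fit_predict(reduced)
    return adjusted_rand_score(labels_true, pred)

print(cluster_score(features_a))  # close to 1.0: tight clusters
print(cluster_score(features_b))  # noticeably lower: noisy representation
```

The metric makes the “which chart looks more like the ideal case” judgment quantitative: the representation whose clustering scores higher agrees better with the true grouping.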
Phase 3 – Retrospective
Now is the time to meet with the team and share our thoughts about the process: the things that went well and those that went wrong.
What surprised us in this project was the unavailability of one of the models – definitely something to verify up front in future projects. At the same time, we managed to follow all the steps necessary to answer the business questions and communicate the results to the client.