We implement three background subtraction algorithms in MATLAB: frame difference, approximate median, and mixture of Gaussians. We provide m-code and test videos showing how each works.
Part 2 uses Agility MCS to translate our MATLAB models to C. Also see MATLAB to C using MCS: Advanced topics, where we highlight advanced topics using the mixture-of-Gaussians background subtraction method.
Introduction
Many DSP designs start out in MATLAB and are then translated into C. Unfortunately, MATLAB has a number of quirks that make this translation a headache. MATLAB extends arrays without programmer intervention. It organizes matrices in column-major order, while C uses row-major order. It uses vector math, which must be translated into loops. It indexes the first element of an array with '1' instead of '0'. And so on. All this leads to an error-prone, time-consuming translation process that typically doesn't begin until the MATLAB code is 100% verified, lest the C programmers waste valuable time hunting MATLAB bugs.
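To make a few of these quirks concrete, here is a minimal C sketch of what a hand translation has to account for: mapping a 1-based MATLAB subscript to a 0-based offset in column-major versus row-major storage, and unrolling a vector statement into a loop. The function names are hypothetical, chosen only for illustration; they are not part of MCS or of the code that accompanies this article.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of MATLAB element A(r, c) (1-based subscripts) inside a
   column-major buffer for a matrix with `rows` rows. */
size_t colmajor_index(size_t r, size_t c, size_t rows)
{
    assert(r >= 1 && c >= 1 && r <= rows);
    return (c - 1) * rows + (r - 1);
}

/* Offset of the same element if the matrix were stored row-major,
   the usual layout of a C two-dimensional array with `cols` columns. */
size_t rowmajor_index(size_t r, size_t c, size_t cols)
{
    assert(r >= 1 && c >= 1 && c <= cols);
    return (r - 1) * cols + (c - 1);
}

/* Loop equivalent of the MATLAB vector statement  y = a .* x + b;
   where a and b are scalars and x, y are vectors of length n. */
void scale_offset(const double *x, double *y, size_t n, double a, double b)
{
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + b;
    }
}
```

For a 3-row matrix, A(1, 2) sits at offset 3 in column-major storage, whereas in a row-major array with 4 columns the same subscript maps to offset 1. Mixing up the two conventions is exactly the kind of silent bug that makes manual translation slow.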
So when Catalytic (now Agility) released their MATLAB to C tool (MCS) in early 2006, I was intrigued. As were the readers of the DSP DesignLine, where articles on automatic MATLAB to C conversion are some of the site's most popular.
Wanting to see what all the fuss was about, I contacted Agility and obtained an evaluation copy to try it out for myself.
Because MCS targets image/video processing, I chose to implement three types of background subtraction, a common first step in many video processing applications. The algorithms were implemented from scratch in MATLAB, then converted to C using MCS. The C code was then verified in MATLAB using mex files. (All m-files, generated C files, and the test video clip can be downloaded here.)
In Part 1 of this 2-part series, I'll give a brief overview of background subtraction and go into detail on the three methods I chose to implement: frame differencing, approximate median, and mixture of Gaussians. Part 2 illustrates the MATLAB to C conversion process and offers impressions of the tool.
Background Subtraction
As the name suggests, background subtraction is the process of separating out foreground objects from the background in a sequence of video frames. Background subtraction is used in many emerging video applications, such as video surveillance (one of today's hottest applications), traffic monitoring, and gesture recognition for human-machine interfaces, to name a few.
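As a rough sketch of the basic idea, the C fragment below marks a pixel as foreground when it differs from a background estimate by more than a threshold. This is a generic illustration, not any one of the three methods covered in this article; the function name, the 8-bit grayscale buffers, and the threshold parameter are all assumptions made for the example. How the background estimate is obtained and updated is where the individual methods differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: writes 255 to mask[i] where the current frame
   differs from the background estimate by more than `threshold`,
   and 0 elsewhere. All buffers hold num_pixels 8-bit gray values. */
void foreground_mask(const uint8_t *frame, const uint8_t *background,
                     uint8_t *mask, size_t num_pixels, int threshold)
{
    assert(threshold >= 0);
    for (size_t i = 0; i < num_pixels; ++i) {
        /* Widen to int before subtracting so the difference can go negative. */
        int diff = abs((int)frame[i] - (int)background[i]);
        mask[i] = (diff > threshold) ? 255 : 0;
    }
}
```

With a threshold of 25, a pixel that moves from 100 in the background to 200 in the current frame is flagged as foreground, while one that moves from 12 to 10 is not.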
Many methods exist for background subtraction, each with different strengths and weaknesses in terms of performance and computational requirements. Most were developed in university labs over the last couple of decades (I'm guessing 99% of them in MATLAB). Unfortunately, as is often the case in academia, many of these methods are currently impractical for commercial application. Eigen-analysis on 10 seconds of video? May earn you a PhD, but it'll probably never leave the lab. For this evaluation, my goal was to implement three methods that were:
Computationally efficient enough to make the leap from MATLAB to commercial application, and
A good representation of background subtraction implementations in today's video applications.
Since background subtraction is being implemented on a wide range of hardware, and thus within a wide range of computational budgets, I chose to implement methods of varying complexity:
Low complexity, using the frame difference method,
Medium complexity, using the approximate median method, and
High complexity, using the mixture of Gaussians method.
The rest of this article focuses on detailing these three methods. For an overview of other background subtraction methods, this PowerPoint presentation gives a good high-level overview. For a more in-depth review and comparison of techniques, this paper is fairly comprehensive and was a good start for my MATLAB implementations.
Test video
Figure 1. Test video used for background subtraction
I shot the test video at an intersection near my house in San Francisco, California. My aim was not to obtain the most representative test video for a particular use case. Rather, I simply wanted to get the most challenging video I could and see how the different methods handled it. As seen in Figure 1, the video contains many challenging elements, including traffic moving at fast and slow speeds, pedestrians, bicyclists, changes in light levels, and a waving rainbow flag. The video was shot at VGA resolution (640x480), 30 frames per second. Because background subtraction algorithms typically process lower resolution grayscale video, I converted the video to grayscale and scaled it to QVGA (320x240).
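The preprocessing described above can be sketched in C roughly as follows. This is only an illustration under stated assumptions, not the conversion actually used for the test clip: it assumes packed 8-bit RGB input in row-major order, the common BT.601 luma weights (0.299, 0.587, 0.114) for the grayscale step, and plain 2x2 box averaging for the VGA to QVGA downscale. Both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Convert packed 8-bit RGB (R, G, B per pixel) to 8-bit grayscale
   using integer BT.601 luma weights with rounding. */
void rgb_to_gray(const uint8_t *rgb, uint8_t *gray,
                 size_t width, size_t height)
{
    size_t n = width * height;
    for (size_t i = 0; i < n; ++i) {
        unsigned r = rgb[3 * i];
        unsigned g = rgb[3 * i + 1];
        unsigned b = rgb[3 * i + 2];
        gray[i] = (uint8_t)((299u * r + 587u * g + 114u * b + 500u) / 1000u);
    }
}

/* Halve each dimension of a row-major grayscale image by averaging
   every 2x2 block, e.g. 640x480 -> 320x240. */
void downscale_2x(const uint8_t *src, uint8_t *dst,
                  size_t src_width, size_t src_height)
{
    assert(src_width % 2 == 0 && src_height % 2 == 0);
    size_t dst_width = src_width / 2;
    size_t dst_height = src_height / 2;
    for (size_t y = 0; y < dst_height; ++y) {
        for (size_t x = 0; x < dst_width; ++x) {
            /* Top-left pixel of the 2x2 source block. */
            const uint8_t *p = src + (2 * y) * src_width + 2 * x;
            unsigned sum = p[0] + p[1] + p[src_width] + p[src_width + 1];
            dst[y * dst_width + x] = (uint8_t)((sum + 2u) / 4u);
        }
    }
}
```

Calling rgb_to_gray on a 640x480 frame and then downscale_2x on the result yields a 320x240 grayscale image, which cuts the per-frame pixel count by a factor of four before any background subtraction is done.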