Simple computer vision techniques can help us analyze real-time video streams automatically on computers with low processing power, provided the algorithms are implemented with the memory and processor constraints in mind. Computer vision on a Raspberry Pi has become easier than ever, with frame rates as good as 10–12 frames per second while running several filters and dense optical flow algorithm variants on a real-time video stream. If we can achieve good results on a small computing platform like this, there are many potentially useful applications we can build.
How can we utilize the efficiency and mobility of a system designed to do computer vision at low processing power? In this case, however, we did not build the hammer first. There is an exciting frontier in using computer vision and other Internet of Things (IoT) based analytics in urban spaces, shops, and retail stores to inform policy makers, shop owners, and the general public about how people interact with the physical space. In one of our research efforts, we decided to explore these ideas a little deeper.
GoDaddy recently collaborated with the Social Computing group in the MIT Media Lab on one of our projects (Placelet), which aims to understand pedestrian and shop customer foot traffic dynamics. This will eventually shed some light on economic activities in a given area. We also expect that foot traffic analytics will fundamentally change the way small business owners understand and operate their businesses.
Technological tools that help researchers understand urban dynamics in cities have existed for decades. The methods used to understand pedestrian and traffic dynamics in streets range from computer vision and ultrasonic (or other wave-based) Time of Flight (ToF) tracking to simple manual counting of how many pedestrians and cars pass through an intersection. This data has been used to inform city planners about the efficiency of their designs and to suggest possible interventions in the city.
At the same time, there has been a boom in small businesses that operate in the urban landscape. The internet has given small business owners opportunities to have an online identity (through domain names) and to be rated by customers for their service (yelp.com, etc.). A lot of research has been done, and many businesses created, around analytics of website visitors, popularity, and aggregated ratings from different rating sites. However, what small business owners still lack is an analytics platform for the physical space their business resides in.
Google Analytics provides us with data about website traffic, but in our project, we aim to design a system that can provide the equivalent of Google Analytics for the physical space. Just as website visitors leave their mark on different pages by interacting with different HTML elements, customers of many stores (from small grocery shops to retail chains) leave a physical footprint by interacting with the products on display. Before entering the shop, they may slow down and spend some time deciding whether to go in based on what’s displayed outside. After entering the shop, they move around different shelves, stop at certain shelves, and walk past others. It would help business owners immensely to know how customers interact with different shelves in their store, whether there are particularly popular spots, and which factors make a shelf or a spot popular. This kind of data can also help them design their shops better and deploy interventions in the right places (for example, placing advertisements and promotional offers at the popular spots). We may also be able to understand how appealing the store’s current design is to people of different ages.
The analytics part of this process can be accomplished using techniques that already exist in the computer vision community. We have used computer vision to detect customer flow and velocity profiles, while keeping the data anonymized by virtue of the system design. In this blog post, I will describe the system design, show some preliminary data analysis, and present a future direction for this kind of research.
Things computer vision can easily answer
Given a store’s position and some features in a scene shot in an outdoor location, we can measure foot traffic velocity and group dynamics outside that store. Given a store’s floor plan, we can also answer several questions about customers’ interaction dynamics. These include (but are not limited to): What are the popular spots in the store? How many customers visited the store at different times of the day? What were their velocity profiles? In other words, where did they slow down, and which shelves did they pass comparatively quickly?
Customer interaction dynamics questions aren’t limited to the above set. However, for the sake of keeping the article shorter, we will only talk about tackling the above questions.
There are several constraints that need to be taken care of when designing a solution for the above problems. The anonymity of customers needs to be maintained at all times to respect their privacy. At the same time, the business owner should not miss out on detailed information about how customers interact with the store.
Simple algorithms for handling real-time streams
Our solution to the above questions is to build and place mobile video processing units in the stores. The video will be captured and processed in real-time, without saving any video or image data. Specific algorithms that will be used in processing the video are as follows.
Optical flow in a scene. This measures the flow velocity of moving customers in general. Optical flow is a technique that estimates the apparent velocity of each pixel in a scene, that is, the magnitude and direction of its movement between consecutive frames. The first two images show the optical flow method being applied to a traffic video using a simple MATLAB script. Pixels that correspond to moving objects are marked with yellow arrows that point in the direction of movement.
The other pixels in this particular scene remain stationary so they are left as is.
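As a rough illustration of what "magnitude and direction" means here, the sketch below converts a small grid of per-pixel displacements into polar form and flags the pixels that count as moving. It is not the MATLAB script or the deployed code: the function names, the toy 2x2 flow field, and the `min_magnitude` threshold are all assumptions made for the example, and plain Python lists stand in for the image arrays a real pipeline would use.

```python
import math

def flow_to_polar(flow):
    """Convert a grid of (dx, dy) displacements into (magnitude, angle) pairs.

    Angles are in degrees in [0, 360), measured from the positive x axis.
    """
    polar = []
    for row in flow:
        polar_row = []
        for dx, dy in row:
            magnitude = math.hypot(dx, dy)
            angle = math.degrees(math.atan2(dy, dx)) % 360.0
            polar_row.append((magnitude, angle))
        polar.append(polar_row)
    return polar

def moving_mask(flow, min_magnitude=1.0):
    """Mark pixels whose displacement is at least min_magnitude (assumed threshold)."""
    return [[math.hypot(dx, dy) >= min_magnitude for dx, dy in row] for row in flow]

# Hypothetical 2x2 flow field: displacement in pixels per frame.
example_flow = [
    [(0.0, 0.0), (3.0, 4.0)],
    [(0.0, -2.0), (0.1, 0.0)],
]

example_polar = flow_to_polar(example_flow)
example_mask = moving_mask(example_flow)
```

Only the pixels in the mask would get an arrow in a visualization like the one described above; the near-zero displacement at the bottom right is treated as noise.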
Blob detection, based on optical flow. This method will distinguish between moving objects and also find customer group sizes. Based on the optical flow data, we can find pixels that are currently in motion for each frame. After some standard noise reduction and morphological smoothing, we would get labelled ‘blobs’ that represent each customer in a given scene. The rest of the images show reconstruction of blobs from the above optical flow data.
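The labelling step can be sketched as a connected-component search over the mask of moving pixels. This is a minimal, dependency-free version under stated assumptions: 4-connectivity, a `min_size` cutoff standing in for the noise reduction mentioned above, and a hand-made mask. The name `label_blobs` is hypothetical and this is not the project's implementation.

```python
from collections import deque

def label_blobs(mask, min_size=1):
    """Label 4-connected regions of truthy cells in a 2D mask.

    Returns (labels, sizes): labels is a grid where 0 means background and
    1..n identify blobs; sizes maps each label to its pixel count.
    Regions smaller than min_size are discarded as noise.
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    labels = [[0] * cols for _ in range(rows)]
    visited = [[False] * cols for _ in range(rows)]
    sizes = {}
    next_label = 1
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or visited[r][c]:
                continue
            # Breadth-first flood fill from this unvisited moving pixel.
            visited[r][c] = True
            queue = deque([(r, c)])
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not visited[ny][nx]):
                        visited[ny][nx] = True
                        queue.append((ny, nx))
            if len(pixels) >= min_size:
                for y, x in pixels:
                    labels[y][x] = next_label
                sizes[next_label] = len(pixels)
                next_label += 1
    return labels, sizes

# Hypothetical motion mask: two blobs plus one isolated noisy pixel.
example_mask = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1],
]
example_labels, example_sizes = label_blobs(example_mask, min_size=2)
```

With `min_size=2`, the isolated pixel is dropped and two blobs remain. Whether one blob corresponds to one customer or to a group depends on the camera height and the threshold, which is why the size information is kept.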
Contour reconstruction to find the polygonal shape of moving blobs in a scene. Analyzing the polygons can tell us more about group size and proximity between customers. In some cases, this may allow us to correlate the size of a polygon with the age group of the customer. However, that would depend on light, shadow, and occlusion conditions.
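Once a blob has been reduced to a polygon, two simple quantities are its area (a proxy for group size) and the distance between polygon centroids (a proxy for proximity). The sketch below computes both with the shoelace formula. The vertex lists are invented for illustration, and the function name is hypothetical; how the polygons are extracted in the first place is outside this example.

```python
import math

def polygon_area_and_centroid(vertices):
    """Return (area, (cx, cy)) of a simple polygon via the shoelace formula.

    vertices is a list of (x, y) tuples in either winding order.
    """
    twice_signed_area = 0.0
    cx_sum = 0.0
    cy_sum = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        twice_signed_area += cross
        cx_sum += (x0 + x1) * cross
        cy_sum += (y0 + y1) * cross
    if twice_signed_area == 0:
        raise ValueError("degenerate polygon with zero area")
    area = abs(twice_signed_area) / 2.0
    # Centroid = sum / (6 * signed_area) = sum / (3 * twice_signed_area).
    centroid = (cx_sum / (3.0 * twice_signed_area),
                cy_sum / (3.0 * twice_signed_area))
    return area, centroid

# Two hypothetical blob outlines in top-view pixel coordinates.
blob_a = [(0, 0), (4, 0), (4, 4), (0, 4)]
blob_b = [(10, 0), (12, 0), (12, 4), (10, 4)]

area_a, centroid_a = polygon_area_and_centroid(blob_a)
area_b, centroid_b = polygon_area_and_centroid(blob_b)
separation = math.dist(centroid_a, centroid_b)
```

Here the first outline covers twice the area of the second, which might suggest a larger group or simply a person closer to the camera; area alone cannot tell these apart.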
Of the three algorithms described above, the video processing unit runs only the first (optical flow calculation) in real time and saves the optical flow data to disk. The data can be post-processed later with the other two algorithms. We do this to save processing time and to capture information without considerable lag. Also note that customer privacy is maintained by processing the video in real time: no images are saved and no face recognition is performed; only pixel velocity data is stored on the device.
Once the units start collecting data, we will need to provide the user (small business owner) tools to understand the data. There should be a mobile phone app to visualize this data, and also a web interface for completeness.
Video processing unit
The video processing unit we designed is an enclosure that contains a Raspberry Pi and a web camera. There is a USB Wi-Fi antenna to help access the Raspberry Pi through secure internet protocols (for batch transferring data).
The casing is designed so the unit can be attached to the ceiling to get a top view of the store. In case a top view is not possible to attain, or we are deploying the unit outside, we need to project all the blobs acquired from the elevated view to a top view. This can be done by transforming all blob coordinates guided by an appropriately calculated projection matrix.
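The projection step amounts to multiplying each blob coordinate by a 3x3 matrix in homogeneous coordinates and dividing by the third component. The sketch below shows that arithmetic only. The matrix `H_EXAMPLE` is made up for the example and is not a calibrated camera model; in practice it would be estimated from reference points whose positions are known in both the camera view and the floor plan.

```python
def project_points(points, h):
    """Map (x, y) image points to top-view coordinates with a 3x3 matrix h."""
    projected = []
    for x, y in points:
        u = h[0][0] * x + h[0][1] * y + h[0][2]
        v = h[1][0] * x + h[1][1] * y + h[1][2]
        w = h[2][0] * x + h[2][1] * y + h[2][2]
        if w == 0:
            raise ValueError("point maps to infinity under this projection")
        # Dividing by w is what produces the perspective correction.
        projected.append((u / w, v / w))
    return projected

# Hypothetical projection matrix (assumed values, not from a real calibration).
H_EXAMPLE = [
    [1.0, 0.0, 0.0],
    [0.0, 2.0, 0.0],
    [0.0, 0.01, 1.0],
]

# Hypothetical blob centroids in the elevated camera view.
blob_centroids = [(50.0, 0.0), (50.0, 100.0)]
top_view = project_points(blob_centroids, H_EXAMPLE)
```

The two input points share the same x coordinate in the camera view but land at different x positions in the top view, which is the kind of foreshortening correction an elevated, angled camera needs.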
There are two components to the software part. Let me describe them briefly.
Computer vision: The Raspberry Pi has its own operating system called Raspbian, which is a variant of a standard Linux distribution (Debian). We installed OpenCV on the Raspberry Pis and wrote OpenCV code to capture the video stream from the attached webcam. OpenCV provides the optical flow algorithm; we enhanced it by adding some of our own filters to precondition the image. The resulting data (the flow velocity profile of moving pixels) is periodically stored in a set of local files. The figure shows a snapshot of the data. The files contain rows of pixel information. The first two coordinates are the original pixel, and the next coordinates (separated by a dot) represent the direction and magnitude of velocity change from the pixel.
OS maintenance: We wrote bash scripts to execute the computer vision code at certain times of the day. OS scheduler jobs were written for Linux to execute the computer vision scripts on startup (in case of power issues in the store or accidental restart of the Raspberry Pi). Some scripts were also written to securely copy the flow velocity data to a remote server over Wi-Fi.
Preliminary data collection and post-processing
We have collected some pilot data to understand the prospect of this system. In collaboration with GoDaddy, we deployed these units in several shops in the Downtown Crossing area of Boston. These stores are small business customers of GoDaddy (for example, Henry Herrara’s Mexican Grill). The video units were deployed in one outdoor location (to possibly measure foot traffic outside some stores), and three other indoor locations (two restaurants and one crafts shop).
The units were deployed in late October, and we collected them two weeks later, in mid-November. Each unit held approximately 30 GB of pixel velocity data. The first job was to clean and aggregate the data and calculate velocity profiles from it.
The following diagram shows a snapshot of the view from an outdoor location and the approximately projected top view of the optical flow at the intersection.
The optical flow data can now be averaged and interpolated to create a smooth velocity map across many frames. This kind of map can show trends over longer time periods. In this case, the velocity trend plot is based on 5 minutes of optical flow data.
The data can also be used to create maps of averaged velocity magnitudes, accumulated over time. This gives us heatmaps of regions where activities were occurring. The following are such activity maps, each picture representing 5 minutes of activity at the intersection of Summer Street and Winter Street beside Macy’s.
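The two kinds of map described above differ in what is averaged. A velocity trend map averages the flow vectors, so opposite movements cancel out; an activity map averages the magnitudes, so any movement adds heat regardless of direction. The sketch below shows both on a toy sequence of two frames. The function names and the data are assumptions for illustration, and interpolation and smoothing are left out.

```python
import math

def mean_velocity_map(frames):
    """Average the (dx, dy) vectors per pixel over a list of flow frames."""
    rows, cols = len(frames[0]), len(frames[0][0])
    count = len(frames)
    result = []
    for r in range(rows):
        result_row = []
        for c in range(cols):
            mean_dx = sum(frame[r][c][0] for frame in frames) / count
            mean_dy = sum(frame[r][c][1] for frame in frames) / count
            result_row.append((mean_dx, mean_dy))
        result.append(result_row)
    return result

def activity_map(frames):
    """Average the flow magnitude per pixel over a list of flow frames."""
    rows, cols = len(frames[0]), len(frames[0][0])
    count = len(frames)
    return [
        [sum(math.hypot(*frame[r][c]) for frame in frames) / count
         for c in range(cols)]
        for r in range(rows)
    ]

# Two hypothetical 1x2 flow frames. The left pixel sees movement in opposite
# directions; the right pixel moves only in the second frame.
example_frames = [
    [[(3.0, 4.0), (0.0, 0.0)]],
    [[(-3.0, -4.0), (0.0, 2.0)]],
]

example_trend = mean_velocity_map(example_frames)
example_heat = activity_map(example_frames)
```

The left pixel has zero net velocity but the highest activity, which is what one would expect at a spot where people walk in both directions.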
This activity data can be visualized to inform store owners about outdoor and indoor activity hot spots. The same kind of data, collected indoors, can be used to inform store owners about customers’ interaction dynamics.
We are currently working on the third computer vision task: understanding the shape and size of moving polygons in the scene. A detailed analysis using velocity and shape features may reveal more information about customer group sizes. Devising an unsupervised method that requires little to no labelled data is one of our main goals.
Secondly, based on customer behavior, AI-based software may suggest interventions in the shop. These interventions may take the form of advertisement or product placement recommendations. They will require a combination of advanced recommender systems and input from human interior designers.
The system can be used to track product and advertisement life cycle to some extent. An advertisement may initially draw attention but later customers may not be interested in the offer. By analyzing customer movement behavior around the advertisement area over a period of days, the owner may take down or replace the promotion/advertisement offer. The same goes for understanding how/if a new product attracts customer attention.
Finally, a cheap and working technology for this kind of physical space analytics is Bluetooth scanning. Assuming a significant number of customers have Bluetooth discovery enabled on their smartphones, we can build cheap and energy-efficient Bluetooth sniffing devices that can be deployed in stores. Counting the number of Bluetooth devices in a designated space and understanding their movement may give us similar results at a lower price. Proximity tracking based on RSSI (received signal strength indicator) is a way to understand movement patterns in a store if there are enough sniffers sampling at a high time resolution.
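One common way to turn an RSSI reading into a rough distance is the log-distance path loss model. The sketch below is only an illustration of that idea and is not part of the system described in this post: the reference power at 1 m (`rssi_at_1m`) and the path loss exponent are assumed values that would have to be calibrated per device and per store, and indoor readings are noisy enough that the result should be treated as a coarse proximity estimate.

```python
def estimate_distance_m(rssi_dbm, rssi_at_1m=-59.0, path_loss_exponent=2.0):
    """Estimate distance in meters from an RSSI reading in dBm.

    Uses the log-distance path loss model:
        distance = 10 ** ((rssi_at_1m - rssi_dbm) / (10 * path_loss_exponent))
    Both default parameters are assumptions, not measured values.
    """
    return 10 ** ((rssi_at_1m - rssi_dbm) / (10.0 * path_loss_exponent))

# Hypothetical readings from one sniffer for the same device over time.
readings_dbm = [-59.0, -69.0, -79.0]
distances_m = [estimate_distance_m(rssi) for rssi in readings_dbm]
```

A device whose estimated distance from one sniffer grows while its distance from another shrinks is probably moving between the two, which is the kind of movement pattern several sniffers sampling frequently could pick up.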
Interested in joining GoDaddy’s Emerging Products group and working on hard technology problems to help small businesses be more successful? We are constantly growing our team. For example, have a look at the following current job openings for our San Francisco and Sunnyvale offices:
- Principal Engineer, Emerging Products
- Senior Mobile Engineer, Emerging Products
Visit GoDaddy careers to learn about these and all our current openings across the company.