Walking around in a train station or an airport, on public transport, shopping, or just popping into a store for a sandwich? Rest assured, you’re being watched. Security cameras are becoming ubiquitous in modern cities, and the technology behind them is becoming increasingly refined. Because they are so prevalent, law enforcement officials are relying on them more than ever to prevent and solve crimes. According to some estimates, there will be about 26 million Internet-connected IP cameras in Japan by 2020 *1.
*1: Estimate by Yano Research Institute in 2016.
All these cameras mean enormous amounts of footage. The issue then becomes how to manage it all. How do we limit labor while also making efficient use of this footage? There is an obvious limit to how much footage can be monitored simultaneously, as well as a limit to the amount of information that any one security camera can collect. As security cameras become more and more a part of our everyday lives, what we need is a system that can simplify the process of analyzing enormous amounts of footage and organize it systematically.
Toshiba released an AI technology that could track the movements of people in video feeds from multiple cameras in October 2017. We asked Tomoyuki Shibata of the Corporate Research & Development Center Media AI Laboratory, where the technology was developed, how this plays out.
“Let’s say there’s someone suspicious leaving the field of view of a certain camera. Even if that specific camera can’t see him, we can map his movements using footage from a number of different cameras. So in real life, you’d be watching some footage, and then when you click on a person you think seems suspicious, you can see where this person came from, where they went next, across footage from multiple cameras. And the system can do this for any person in the footage,” Shibata explains.
Tomoyuki Shibata, Research Scientist at Media AI Laboratory, Corporate Research & Development Center
Before, AI could only identify specific individuals recorded by one camera. With this new technology, however, the AI has gained an understanding of how video footage taken by multiple cameras relates to one another, and through this understanding is capable of mapping movements across the video footage. But the truly revolutionary aspect of this technology is that it can identify every single person who is recorded on-camera, and track the movements of every single one of these people with up to 88.4% accuracy. This is an extremely high level of accuracy compared with earlier technologies.
What has made this high-precision analysis possible is Toshiba’s communication AI, “SATLYS™.” Shibata tells us how they utilized the AI to overcome the various technological issues they faced.
“Anybody who’s ever been photographed can tell you how different you can look, depending on the camera, the angle, the light, the colors around you, or even your posture. Because of this, we had to develop a process to extract only the specific characteristics that could be used to identify an individual (Image 1), as well as a process to identify a person only when his or her characteristics match in different video footage (Image 2). The real breakthrough though was when we were able to develop an analytical process that could identify the same individual across video feeds from multiple cameras (Image 3).”
Image 1: Process to extract features for similar characteristics exhibited by the same individual across locations and cameras
Image 2: Process to quickly identify the same individual in additional camera footage according to the images (frames) captured in existing camera footage
Image 3: Analytical process that identifies the same individual across video footage from multiple cameras
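The idea behind these three processes can be illustrated with a small sketch. This is not Toshiba's implementation: it assumes each detected person has already been reduced to a short feature vector, and it uses plain cosine similarity with a hypothetical threshold of 0.9 to decide whether two detections show the same individual. The camera names, times, and vectors are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Similarity of two feature vectors, from -1 to 1 (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_across_cameras(query, detections, threshold=0.9):
    """Return detections that resemble the query person, ordered by time."""
    matches = [
        d for d in detections
        if cosine_similarity(query, d["features"]) >= threshold
    ]
    return sorted(matches, key=lambda d: d["time"])


# Hypothetical detections from three cameras (all values are illustrative).
detections = [
    {"camera": "entrance", "time": 10, "features": [0.90, 0.10, 0.30]},
    {"camera": "platform", "time": 42, "features": [0.88, 0.12, 0.31]},
    {"camera": "platform", "time": 40, "features": [0.10, 0.90, 0.20]},
    {"camera": "exit", "time": 75, "features": [0.91, 0.09, 0.28]},
]

# The person the operator clicked on, as seen by the entrance camera.
query = [0.90, 0.10, 0.30]

route = [(d["camera"], d["time"]) for d in match_across_cameras(query, detections)]
print(route)
```

Sorting the matches by time turns a set of separate sightings into a route from camera to camera, which is the kind of result Shibata describes when an operator clicks on a person.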
Footage that previously took a tremendous amount of time to process can now be processed in a fraction of the time, with even higher accuracy. This is because the newly developed processes cut down the computational complexity of the analysis itself.
Searches can also be narrowed down by using certain keywords. If you are looking for a lost child, you can choose the keywords “white dress,” “red backpack,” and “girl,” and the AI will carry out a search and show you where she has been and where she is now.
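A search of this kind can be sketched as a simple filter. The example below is only an illustration, not the product's actual search logic: it assumes every sighting has already been labeled with descriptive attributes by some upstream recognition step, and the function name, camera names, and data are hypothetical.

```python
def search_by_keywords(sightings, keywords):
    """Return sightings whose attributes include every keyword, ordered by time."""
    wanted = {k.lower() for k in keywords}
    hits = [
        s for s in sightings
        if wanted <= {a.lower() for a in s["attributes"]}
    ]
    return sorted(hits, key=lambda s: s["time"])


# Hypothetical sightings, each already labeled with attributes.
sightings = [
    {"camera": "food court", "time": 15,
     "attributes": {"girl", "white dress", "red backpack"}},
    {"camera": "entrance", "time": 5,
     "attributes": {"girl", "white dress", "red backpack"}},
    {"camera": "entrance", "time": 7,
     "attributes": {"man", "navy suit"}},
    {"camera": "toy store", "time": 20,
     "attributes": {"girl", "blue dress", "red backpack"}},
]

results = search_by_keywords(sightings, ["white dress", "red backpack", "girl"])
trail = [s["camera"] for s in results]               # where she has been
current_location = trail[-1] if trail else None      # most recent sighting
print(trail, current_location)
```

Requiring every keyword to match keeps the search narrow: the sighting in a blue dress is excluded even though it shares two of the three keywords.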
As in the example of the lost child, the keywords used for searching can simply describe the person, so you don’t need to know any technical language to use the technology. The interface is also intuitive and easy to use, regardless of how good you are at using a computer. This accessibility will be a great advantage when the technology is adapted for use in commercial facilities.
The environment at the Corporate Research & Development Center, full of specialists at the top of their game, fueled the drive for this project. “I don’t think there’s anywhere else in Japan where there are this many specialists in image recognition technology gathered in one place,” says Shibata. Toshiba’s Research & Development Center is home to all kinds of specialists, in data matching, audio analysis, natural language processing, and more. It was in 1967 that Toshiba first developed an automatic mail-sorting machine that could identify hand-written post codes and categorize mail accordingly. Since then, the company has been developing AI technologies.
“We have the world’s best researchers and have presented at conferences throughout the world. When we hit a wall, like in our research for data-matching, we can just ask the people around us, and they’ll say, well, ‘what about this?’ or ‘what about that?’ Everyone is just full of ideas. We don’t just stay cooped up in the lab either. It’s important to know what customers are actually thinking, so sometimes those on the business side will take us out and give us an opportunity to hear those voices.”
Customers’ opinions allow Toshiba to envision final products from the perspective of the consumer. The ideal R&D environment and open culture helped to customize the technology and make it more user-friendly.
Shibata’s zeal is boundless: “It can be hard even for us to differentiate between people when they’re all wearing the same uniform. This is true for the technology as well. An example is the train station during rush hour, where there are so many people wearing navy, grey, black, mostly suits, and they all look so similar. The data-matching is very accurate even at this stage, but we want to make it even better. Development in the future will focus on improving the capability of identifying people’s characteristics, the processing speed, all that, to make this technology more accessible to more people.”
Many companies have shown interest in this technology since it was announced in August 2017. Verification testing is already underway at certain security companies, and Toshiba has fielded a number of offers from foreign retail giants, with whom discussions are just on the horizon.
As Toshiba’s image recognition technologies evolve, they will find their way into a variety of industries, and be used in a variety of ways. Demand for identifying suspicious individuals and finding lost children has existed since time immemorial. This technology can meet these demands, but it can also carve an opening for new demand, through its cutting-edge use in fields like marketing.
“Toshiba Group offers retail solutions and cloud services. We can see this technology being used for security, of course, but also in a variety of services in different industries and lines of business. What we’re really looking for is to come up with services that will excite our end users, and make them glad they chose our services and our products.”
* SATLYS is a registered trademark and/or trademark of Toshiba Digital Solutions Corporation in Japan and other countries.