VidQ: Video Query Using Optimized Audio-Visual Processing

Abstract	As the number of videos recorded and stored on mobile devices increases, efficient techniques for searching video content become increasingly important, especially for applications such as locating the moment of a crime or other specific actions. When a user issues a query for a specific action in a large amount of data, the goal is to respond accurately and quickly. In this paper, we address the problem of responding in a timely manner to queries that search for specific actions in videos on mobile devices, utilizing both visual and audio content processing. We build a system, called VidQ, which consists of several stages and uses various Convolutional Neural Networks (CNNs) and speech APIs to respond to such queries. Because state-of-the-art computer vision and speech algorithms are computationally intensive, we use servers with GPUs to assist mobile users in the process. After a query has been issued, we first identify the possible processing stages and then determine the order in which these stages are composed into the system. Finally, we distribute the processing among the available network resources to further improve performance by minimizing the processing time. Results show that VidQ reduces the completion time by at least 50% compared to other approaches.
Authors	Noor Felemban (PSU), Fidan Mehmeti (PSU), Tom La Porta (PSU)
Date	Nov-2021
Venue	Proceedings of the 17th ACM Symposium on QoS and Security for Wireless and Mobile Networks, Nov 22, 2021, pp. 51-60.