Joint Data Compression and Caching: Approaching Optimality with Guarantees

Abstract	Ananthram Swami U.S. Army Research Laboratory Adelphi, MD 20783 USA ananthram.swami.civ@mail.mil power supply associated with mobile devices are limited, efficient data communication is critical. In this paper, we consider a network of nodes, each capable of compressing data and caching a constant amount of data. A set of nodes generates real time data and a sink node collects the data from these nodes through fixed paths to serve requests for these data. However, the requests need not reach nodes that generated the data, i.e. request forwarding stops upon reaching a node on the path that has cached the requested data. Upon finding the data, it is sent along the reverse path to the sink node to serve the requests. While each node can cache data to serve future requests so as to reduce access latency and bandwidth requirement, it incurs addi- tional caching costs [9]. Furthermore, data compression reduces the transmission cost at the expense of computation cost [4, 26]. Thus, there is an energy consumption tradeoff among data compression, transmission, and caching to reduce latency. Since bandwidth and energy required for network operation is expensive [26], it is criti- cal to efficiently compress, transmit and cache the data to reduce latency. This raises the following question, what is the right balance between compression and caching to minimize total communica- tion latency for limited energy consumption? Our primary goal is to minimize average network latency (de- lay) due to data transfer across the network, subject to an energy consumption constraint on compression and caching of the data. This problem is naturally abstracted and motivated by many real world applications, including wireless sensor networks (WSNs) [9], peer-to-peer networks [10], content distribution networks (CDNs) [6, 13, 23], Information Centric Networks (ICNs) [18] and so on. For example, in a hierarchical WSN, the sensors generate data, which can be compressed and forwarded to the sink node through fixed paths to serve requests generated from outside the network. These requests can be served from the intermediate nodes along the path that cache the data; if, however, data are not cached on any node along the path, the request can subsequently be forwarded to the edge sensor that generates the requested data. Similarly, in an ICN, requests for data can be served locally from intermediate caches We consider the problem of optimally compressing and caching data across a communication network. Given the data generated at edge nodes and a routing path, our goal is to determine the optimal data compression ratios and caching decisions across the network in order to minimize average latency, which can be shown to be equivalent to maximizing the compression and caching gain under an energy consumption constraint. We show that this problem is NP- hard in general and the hardness is caused by the caching decision subproblem, while the compression sub-problem is polynomial- time solvable. We then propose an approximation algorithm that achieves a (1 − 1/e)-approximation solution to the optimum in strongly polynomial time. We show that our proposed algorithm achieve the near-optimal performance in synthetic-based evalua- tions. In this paper, we consider a tree-structured network as an illustrative example, but our results easily extend to general net- work topology at the expense of more complicated notations.
Authors	Jian Li (UMass) Faheem Zafari (Imperial) Don Towsley (UMass) Kin Leung (Imperial) Ananthram Swami (ARL)
Date	Sep-2018
Venue	2nd Annual Fall Meeting of the DAIS ITA, 2018

Variants	doc-2650