Doc2Img: A New Approach to Vectorization of Documents

Abstract	Vector space representations of text have increased in popularity and are used in various text classification problems. We present Doc2Img, a new approach to creating document vectors that improves upon existing approaches such as Word2Vec and Doc2Vec in capturing both the similarities between words within a document and the differences across documents. We apply this new vector space representation to the problem of deriving the sensor requirements of apps (for smartphones and IoT devices) by learning a classification model from document vectors. We show that this learned model outperforms models based on existing vector space representations (Word2Vec and Doc2Vec) by more than 10%. Further, for 300 different applications, this model predicts sensor requirements with an average accuracy of 75%, and with an accuracy greater than 85% on the top-20 sensor requirements.
Authors	ShreeRanjani SrirangamSridharan (IBM US), Mudhakar Srivatsa (IBM US), Raghu Ganti (IBM US), Chris Simpkin (Cardiff)
Date	Jul-2018
Venue	21st International Conference on Information Fusion

Variants	doc-5783