Doc2Img: A New Approach to Vectorization of Documents

Abstract Vector space representations of text have grown in popularity and are used in a variety of text classification problems. We present Doc2Img, a new approach to creating document vectors that improves upon existing approaches such as Word2Vec and Doc2Vec in capturing both the similarities between words within a document and the differences across documents. We apply this vector space representation to the problem of deriving the sensor requirements of apps (for smartphones and IoT devices) by learning a classification model over the document vectors. We show that the learned model outperforms models built on existing vector space representations (Word2Vec and Doc2Vec) by more than 10%. Further, across 300 different applications, the model predicts sensor requirements with an average accuracy of 75%, and with an accuracy greater than 85% on the top-20 sensor requirements.
Authors
  • ShreeRanjani SrirangamSridharan (IBM US)
  • Mudhakar Srivatsa (IBM US)
  • Raghu Ganti (IBM US)
  • Chris Simpkin (Cardiff)
Date Jul-2018
Venue 21st International Conference on Information Fusion