Dimensionality reduction method for distributional semantics
Random indexing is a dimensionality reduction method and computational framework for distributional semantics. It rests on three observations: implementations of very-high-dimensional vector space models are impractical; a model need not grow in dimensionality when new items (e.g. new terminology) are encountered; and a high-dimensional model can be projected into a space of lower dimensionality without substantially distorting L2 distances, provided the resulting dimensions are chosen appropriately.
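The idea can be illustrated with a minimal Python sketch. It assumes the commonly described formulation in which each item receives a fixed-length sparse random "index vector" of +1/-1 entries, and an item's context vector is the sum of the index vectors of its neighbours. The function names, the window size, the dimensionality and the number of non-zero entries are illustrative choices made here, not values taken from the cited sources.

```python
import numpy as np

def make_index_vector(rng, dim, nonzeros):
    """Sparse ternary index vector: `nonzeros` entries are +1 or -1, the rest 0."""
    vec = np.zeros(dim, dtype=np.int64)
    positions = rng.choice(dim, size=nonzeros, replace=False)
    vec[positions] = rng.choice(np.array([-1, 1]), size=nonzeros)
    return vec

def random_indexing(sentences, dim=512, nonzeros=8, window=2, seed=0):
    """Build context vectors by summing the index vectors of nearby tokens."""
    rng = np.random.default_rng(seed)
    index_vectors = {}    # token -> fixed random index vector
    context_vectors = {}  # token -> accumulated context vector

    def index_of(token):
        # A previously unseen token gets a new index vector of the same
        # length, so the dimensionality of the model never grows.
        if token not in index_vectors:
            index_vectors[token] = make_index_vector(rng, dim, nonzeros)
        return index_vectors[token]

    for tokens in sentences:
        for i, token in enumerate(tokens):
            ctx = context_vectors.setdefault(token, np.zeros(dim, dtype=np.int64))
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    ctx += index_of(tokens[j])  # in-place update of the stored vector
    return context_vectors

def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0

# Toy corpus (made up for illustration).
corpus = [
    ["the", "cat", "drinks", "milk"],
    ["the", "dog", "drinks", "water"],
    ["stocks", "fell", "sharply", "today"],
]
vectors = random_indexing(corpus)
print(cosine(vectors["cat"], vectors["dog"]))     # tokens with shared contexts
print(cosine(vectors["cat"], vectors["stocks"]))  # tokens with unrelated contexts
```

In this toy corpus, "cat" and "dog" share two of their three neighbours, so their context vectors end up far closer to each other than to that of "stocks", even though no co-occurrence matrix of full vocabulary width is ever built.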
It can also be verified that random indexing is a random projection technique for the construction of Euclidean spaces, i.e. L2-normed vector spaces.[7] In Euclidean spaces, random projections are justified by the Johnson–Lindenstrauss lemma.[8]
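For reference, the lemma is usually stated in roughly the following form. The symbols (the point set X, the number of points n, the dimensions d and k, the distortion ε and the constant C) are notation chosen here for illustration and do not come from the cited sources.

```latex
% Johnson–Lindenstrauss lemma (standard textbook form).
Let $0 < \varepsilon < 1$ and let $X \subset \mathbb{R}^{d}$ be a set of $n$ points.
If $k \ge C\,\varepsilon^{-2} \ln n$ for a suitable absolute constant $C$,
then there exists a linear map $f : \mathbb{R}^{d} \to \mathbb{R}^{k}$ such that,
for all $u, v \in X$,
\[
  (1 - \varepsilon)\,\lVert u - v \rVert_2^{2}
  \;\le\; \lVert f(u) - f(v) \rVert_2^{2}
  \;\le\; (1 + \varepsilon)\,\lVert u - v \rVert_2^{2}.
\]
```

The target dimension k depends only on the number of points and the tolerated distortion, not on the original dimension d, which is why the reduced space can stay fixed in size as the vocabulary grows.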
The TopSig technique[9] extends the random indexing model to produce bit vectors that are compared using the Hamming distance. It is used to improve the performance of information retrieval and document clustering. In a similar line of research, Random Manhattan Integer Indexing (RMII)[10] was proposed to improve the performance of methods that employ the Manhattan distance between text units. Many random indexing methods generate similarity primarily from the co-occurrence of items in a corpus. Reflexive Random Indexing (RRI)[11] generates similarity both from co-occurrence and from shared occurrence with other items.
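The two distance functions named above can be sketched in a few lines of Python. The sign-threshold binarization shown here is a simplifying assumption used only to obtain bit vectors for the example; it is not the published TopSig procedure, and neither function reproduces the RMII method itself.

```python
import numpy as np

def to_signature(vector):
    """Binarize a real-valued vector: 1 where a component is positive, else 0.

    Assumption for illustration only; the cited methods may binarize differently.
    """
    return (np.asarray(vector) > 0).astype(np.uint8)

def hamming_distance(sig_a, sig_b):
    """Number of positions at which two equal-length bit vectors differ."""
    return int(np.count_nonzero(np.asarray(sig_a) != np.asarray(sig_b)))

def manhattan_distance(u, v):
    """L1 distance: the sum of absolute component-wise differences."""
    return float(np.abs(np.asarray(u, dtype=float) - np.asarray(v, dtype=float)).sum())

# Two made-up document vectors in a reduced space.
doc_a = [0.5, -1.0, 2.0, 0.0]
doc_b = [0.3, 0.7, -0.2, -1.5]
print(hamming_distance(to_signature(doc_a), to_signature(doc_b)))
print(manhattan_distance(doc_a, doc_b))
```

Bit signatures are compact and can be compared with bitwise operations, which is one reason a Hamming-distance representation is attractive for large retrieval and clustering workloads.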
Sahlgren, Magnus (2005). An Introduction to Random Indexing. In Proceedings of the Methods and Applications of Semantic Indexing Workshop at the 7th International Conference on Terminology and Knowledge Engineering (TKE 2005), August 16, Copenhagen, Denmark.
Qasemi Zadeh, Behrang & Handschuh, Siegfried (2014). Random Manhattan Indexing. In Proceedings of the 25th International Workshop on Database and Expert Systems Applications.
Geva, S. & De Vries, C.M. (2011). TopSig: Topology Preserving Document Signatures. In Proceedings of the Conference on Information and Knowledge Management 2011, 24–28 October 2011, Glasgow, Scotland.