Hypergraph of Text: a Mathematical Structure for Organizing and Analyzing Big Text Data

Published in 2024 IEEE International Conference on Big Data (BigData), 2024

Since the collective knowledge of our world is primarily encoded in massive amounts of text data, people rely on text data to access all kinds of useful knowledge. However, how to organize, navigate, and analyze large amounts of text data remains a difficult open challenge. To address this challenge, we propose the Hypergraph of Text (HoT), a mathematical structure for organizing and analyzing big text data. We discuss how to create a HoT from large text collections and describe various applications of HoT. Experimentally, we show the promise of HoT by creating a HoT on a subset of Wikipedia pages covering topics in philosophy. Experimental results show that the structure created by HoT has many uses, such as facilitating information access by enabling flexible corpus navigation and discovering interesting topical structures.

Recommended citation: D. E. Alvarez and C. Zhai, "Hypergraph of Text: a Mathematical Structure for Organizing and Analyzing Big Text Data," 2024 IEEE International Conference on Big Data (BigData), Washington, DC, USA, 2024, pp. 8605-8607, doi: 10.1109/BigData62323.2024.10824995.