Cluster-discovery of Twitter messages for event detection and trending
thesisposted on 2022-10-20, 15:31 authored by Shakira Banu Kaleel
Social media data carries abundant hidden occurrences of real-time events in the world which raises the demand for efficient event detection and trending system. The Locality Sensitive Hashing (LSH) technique is capable of processing the large-scale big datasets. In this thesis, a novel framework is proposed for detecting and trending events from tweet clusters presence in Twitter1 dataset that are discovered using LSH. The experimental results obtained from this research work showed that the LSH technique took only 12.99% of the running time compared to that required for K-means to find all of the tweet clusters. Key challenges include: 1) construction of dictionary using incremental TF-IDF in high-dimensional data in order to create tweet feature vector 2) leveraging LSH to find truly interesting events 3) trending the behavior of event based on time, geo-locations and cluster size and 4) speed-up the cluster-discovery process while retaining the cluster quality.