The algorithm we developed identifies a hierarchy of densely connected subgraphs within the larger bipartite graph. This can be used to detect fraudsters and bot accounts in social networks by identifying anomalous connectivities. It can be used in content recommendation algorithms and as a subprocess for discovering densely connected communities of vertices in general graphs.
To develop a parallel algorithm for mining bipartite graphs, we needed a deep understanding of parallel computing, general graph mining algorithms, and specific properties of bipartite graphs. Most of these contents are not covered in a traditional computer science curriculum because they are relatively novel or else specific fields. We benefited from the MIT PRIMES program, which matched us with our mentors, Jessica Shi and Prof. Julian Shun from the MIT Computer Science and Artificial Intelligence Lab. Under the guidance of our mentors, we studied parallel computing using lectures from MIT OCW and brought ourselves up to speed with pioneering research in general graph mining algorithms by reading papers. During the pandemic, we divided our work and met through Zoom to synchronize progress.
Our algorithm can process bipartite graphs several times faster than existing algorithms. This will enable researchers to analyze the density hierarchy structure of gene-disease correlation networks, protein-protein interaction networks, and social networks much more quickly. We expect our algorithm, when applied, to be able to speed up research in genetics, biochemistry, and social studies. Our algorithm can also be applied to detect spam bots or fraudsters in social networks quickly. This can conceivably help social media companies improve the user experience of their products. The ideas developed in our work could conceivably also have wider applications to other parallel computing research. The ideas used to parallelize our algorithm could conceivably inspire research to parallelize other similar algorithms. Increasing the parallelism of algorithms is important in the modern era as it becomes difficult to increase the clock speed of CPU cores due to the generated heat. But Moore’s Law permits us to fit ever more transistors in one CPU. So, chip manufacturers fitted the increasing number of transistors into a larger number of cores instead. With the number of cores in a commercial CPU increasing at an exponential rate, it’s critical for algorithms to have significant parallelism to take advantage of a large number of cores, hence the general importance of research that focuses on developing algorithms with high parallelism.
Our work has many specific applications we have described before, such as detecting spammers or better understanding the relationship between people in social networks and different proteins. Beyond that, there are many general applications based on the optimization methods we’ve used for parallelism (i.e., scheduling or multithreading) and graph decomposition beyond just bipartite graphs. This is super useful as the amount of data any piece of software will have to process as an increasing number of people rely on social networks or discover new proteins.