15-418/618 · Parallel Computer Architecture and Programming · Spring 2026

Parallel Betweenness Centrality on
Protein-Protein Interaction Networks

Team: Tanishqa Puhan & Vera Zhong


Summary

We plan to implement and optimize parallel betweenness centrality computation on protein-protein interaction (PPI) graphs. We will develop multiple parallel implementations and analyze how load balancing, memory locality, and synchronization strategy affect performance. Our goal is to understand the scalability limits and bottlenecks of graph algorithms on irregular, real-world biological networks.

Background

Betweenness centrality is a graph metric that measures how often a vertex lies on the shortest path between pairs of other vertices. It is widely used in network analysis to identify important nodes — in the context of protein-protein interaction networks, high-centrality proteins often correspond to biologically significant molecules critical to cellular function or disease pathways.
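Formally, the standard definition (as in Brandes' original paper) can be written as:

```latex
BC(v) \;=\; \sum_{\substack{s, t \in V \\ s \neq v \neq t}} \frac{\sigma_{st}(v)}{\sigma_{st}}
```

where \(\sigma_{st}\) is the number of shortest paths from \(s\) to \(t\), and \(\sigma_{st}(v)\) is the number of those paths that pass through \(v\).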

The standard algorithm for computing betweenness centrality is Brandes' algorithm, which runs a BFS (or Dijkstra's for weighted graphs) from every vertex in the graph and accumulates dependency scores along the resulting shortest-path DAGs. In this project, we will implement Brandes' algorithm on PPI graphs sourced from public databases like BioGRID or STRING. Each BFS traversal produces partial centrality contributions that must be aggregated across all source vertices, making the computation a natural target for parallelization. Unlike regular stencil-style workloads, graph algorithms on biological networks introduce irregular access patterns and skewed degree distributions, allowing us to explore a richer design space for studying parallel performance.
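To make the structure of the computation concrete, here is a minimal serial sketch of Brandes' algorithm for unweighted graphs. The function and variable names (`brandes`, `sigma`, `delta`, etc.) are our own, and the adjacency-list representation is a placeholder for whatever graph structure we end up using:

```cpp
#include <cassert>
#include <queue>
#include <stack>
#include <vector>

// Serial Brandes' algorithm on an unweighted, undirected graph given as
// adjacency lists. Each source s contributes a BFS forward pass (counting
// shortest paths) and a reverse pass (accumulating dependency scores).
std::vector<double> brandes(const std::vector<std::vector<int>>& adj) {
    const int n = (int)adj.size();
    std::vector<double> bc(n, 0.0);
    for (int s = 0; s < n; ++s) {
        std::vector<std::vector<int>> pred(n);  // shortest-path predecessors
        std::vector<long long> sigma(n, 0);     // # shortest paths from s
        std::vector<int> dist(n, -1);
        std::vector<double> delta(n, 0.0);      // dependency scores
        std::stack<int> order;                  // vertices by non-decreasing distance
        std::queue<int> q;
        sigma[s] = 1; dist[s] = 0; q.push(s);
        while (!q.empty()) {                    // forward BFS from s
            int v = q.front(); q.pop();
            order.push(v);
            for (int w : adj[v]) {
                if (dist[w] < 0) { dist[w] = dist[v] + 1; q.push(w); }
                if (dist[w] == dist[v] + 1) { sigma[w] += sigma[v]; pred[w].push_back(v); }
            }
        }
        while (!order.empty()) {                // back-propagate dependencies
            int w = order.top(); order.pop();
            for (int v : pred[w])
                delta[v] += (double)sigma[v] / sigma[w] * (1.0 + delta[w]);
            if (w != s) bc[w] += delta[w];
        }
    }
    return bc;
}
```

Note that the outer loop over sources carries no dependencies between iterations except the final accumulation into `bc`, which is exactly what makes source-level parallelization natural.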

The Challenge

Although betweenness centrality is conceptually parallel across source vertices, achieving high performance is nontrivial due to the irregular structure of PPI graphs. Real biological networks tend to follow power-law degree distributions, meaning a small number of hub proteins have far more connections than most other nodes. This creates significant load imbalance when naively distributing BFS sources across threads.

Memory locality is also a challenge: graph traversal produces unpredictable access patterns that interact poorly with caches. Additionally, accumulating dependency scores into shared centrality arrays requires synchronization or per-thread accumulators, both of which introduce overhead. The central design question is parallel granularity. Parallelizing across source vertices is coarse-grained and simple, but high-degree hub vertices create severe load imbalance; parallelizing within a single BFS offers finer granularity but requires synchronization at each BFS level and introduces frontier management overhead. Finding the right hybrid of the two is the core challenge and goal of this project.
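As a sketch of the coarse-grained end of this design space, the following hypothetical OpenMP version parallelizes over sources and gives each thread a private accumulator, so no synchronization is needed until a single merge at the end. It reconstructs predecessors by scanning neighbors (valid for undirected graphs), which trades the predecessor lists for extra edge scans:

```cpp
#include <cassert>
#include <queue>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#else
// Serial fallback so the sketch compiles without OpenMP.
static int omp_get_max_threads() { return 1; }
static int omp_get_thread_num()  { return 0; }
#endif

// Source-parallel Brandes: each thread runs whole BFS trees and writes into
// its own accumulator, avoiding contention on a shared centrality array.
std::vector<double> brandes_parallel(const std::vector<std::vector<int>>& adj) {
    const int n = (int)adj.size();
    const int nthreads = omp_get_max_threads();
    std::vector<std::vector<double>> local(nthreads, std::vector<double>(n, 0.0));

    // Dynamic scheduling hands out sources as threads free up, mitigating the
    // imbalance caused by expensive hub-rooted BFS trees.
    #pragma omp parallel for schedule(dynamic, 16)
    for (int s = 0; s < n; ++s) {
        std::vector<double>& bc = local[omp_get_thread_num()];
        std::vector<long long> sigma(n, 0);   // per-source state is thread-private
        std::vector<int> dist(n, -1);
        std::vector<double> delta(n, 0.0);
        std::vector<int> order; order.reserve(n);
        std::queue<int> q;
        sigma[s] = 1; dist[s] = 0; q.push(s);
        while (!q.empty()) {                  // forward BFS from s
            int v = q.front(); q.pop();
            order.push_back(v);
            for (int w : adj[v]) {
                if (dist[w] < 0) { dist[w] = dist[v] + 1; q.push(w); }
                if (dist[w] == dist[v] + 1) sigma[w] += sigma[v];
            }
        }
        for (int i = (int)order.size() - 1; i > 0; --i) {  // skip s itself
            int w = order[i];
            for (int v : adj[w])              // predecessors satisfy dist[v]+1 == dist[w]
                if (dist[v] + 1 == dist[w])
                    delta[v] += (double)sigma[v] / sigma[w] * (1.0 + delta[w]);
            bc[w] += delta[w];
        }
    }
    std::vector<double> bc(n, 0.0);           // single-threaded merge at the end
    for (int t = 0; t < nthreads; ++t)
        for (int v = 0; v < n; ++v) bc[v] += local[t][v];
    return bc;
}
```

The per-thread copies cost O(threads × |V|) extra memory, which is one of the trade-offs (versus atomics or locks) we intend to measure.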

Resources

We will use multicore CPU systems available through the GHC clusters and Bridges-2. The project will be implemented in C/C++ using OpenMP for parallelization, and we may explore lock-free or atomic-based accumulation strategies as a stretch goal. We are starting from scratch, using the original Brandes paper and follow-up work on parallel betweenness (such as Madduri et al.) as references. Our primary dataset will be the human PPI network from BioGRID (~20k proteins, ~400k interactions), with potential testing on the larger STRING database or smaller organism-specific networks (e.g., yeast).

Technologies: C/C++, OpenMP, GHC clusters, Bridges-2, BioGRID, STRING

Goals & Deliverables

Plan to Achieve (MVP)

Implement a correct serial Brandes' algorithm and at least three parallel OpenMP versions: (1) static source-parallel, (2) dynamically scheduled source-parallel to mitigate hub-induced imbalance, and (3) a hybrid version that parallelizes within individual BFS traversals for high-degree vertices. Evaluate across graph sizes and thread counts, produce speedup graphs, and analyze bottlenecks including load imbalance, contention, and cache behavior.
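The difference between versions (1) and (2) reduces to the OpenMP `schedule` clause. The toy kernel below (with entirely synthetic per-source costs standing in for BFS work) illustrates why this matters under a skewed workload: with `schedule(static)`, the expensive "hub" sources land on a fixed subset of threads, while `schedule(dynamic)` hands them out as threads become free:

```cpp
#include <cassert>

// Synthetic skewed workload: a few "hub" sources cost 1000x the rest, mimicking
// the power-law degree distribution of PPI graphs. Swapping the schedule clause
// between static and dynamic is the only change between our versions (1) and (2).
long long skewed_work(int n) {
    long long total = 0;
    #pragma omp parallel for schedule(dynamic, 16) reduction(+ : total)
    for (int s = 0; s < n; ++s) {
        long long cost = (s % 100 == 0) ? 100000 : 100;   // hubs vs ordinary sources
        for (long long i = 0; i < cost; ++i) total += 1;  // stand-in for BFS work
    }
    return total;
}
```

The actual chunk size (16 here) is a placeholder; tuning it against the real degree distribution is part of our planned experiments.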

Hope to Achieve (Stretch)

Implement a basic CUDA version to compare GPU performance on irregular graph traversal against our CPU implementations. Explore graph reordering techniques (e.g., reverse Cuthill-McKee) to improve cache locality, and experiment with fine-grained BFS parallelism or lock-free accumulation of dependency scores.
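One concrete form the lock-free accumulation stretch goal could take is a compare-and-swap loop for adding into a shared array of `std::atomic<double>` (C++20 provides `fetch_add` for atomic doubles directly; the CAS form below also works under C++11). This is a sketch of the idea, not a final design:

```cpp
#include <atomic>
#include <cassert>

// Lock-free addition into a shared double, e.g. a slot of a shared betweenness
// array, so multiple threads can accumulate dependency scores without locks.
inline void atomic_add(std::atomic<double>& target, double value) {
    double expected = target.load(std::memory_order_relaxed);
    // On failure, compare_exchange_weak refreshes `expected` with the current
    // value, so we simply retry with the updated sum.
    while (!target.compare_exchange_weak(expected, expected + value,
                                         std::memory_order_relaxed))
        ;
}
```

Whether this beats per-thread accumulators will depend on contention: hub vertices, which appear on many shortest paths, are exactly the slots most likely to be hammered by concurrent updates.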

Platform Choice

We chose multicore CPUs with OpenMP because Brandes' algorithm exposes coarse-grained parallelism across source vertices that maps well to thread-based parallelism. OpenMP allows us to quickly implement and compare different scheduling and decomposition strategies while maintaining control over thread behavior.

CPUs are also well-suited for studying irregular memory access patterns and load imbalance effects, which are central to this project. The GHC cluster machines give us access to a consistent, well-characterized hardware environment with enough cores to observe meaningful scaling behavior across our biological network datasets.

Schedule

Apr 8 – Apr 14: Set up PPI dataset loading and graph data structures. Implement serial Brandes' algorithm and verify correctness against reference implementations on small graphs. Set up benchmarking framework and produce an initial profiling report to evaluate workload characteristics.
Apr 15 – Apr 21: Implement OpenMP source-parallel version with static scheduling. Test correctness and measure speedup against serial baseline. Begin identifying scaling behavior and load imbalance from high-degree hub vertices.
Apr 22 – Apr 25: Implement dynamic or guided scheduling and a load-balanced variant. Compare against static source-parallel implementation and analyze the impact of load balancing on performance. Begin structured data collection for experiments.
Apr 26 – Apr 28: Run full experiments varying graph size, thread count, and dataset. Generate performance graphs and analyze bottlenecks such as load imbalance, contention, and cache behavior.
Apr 29 – Apr 30: Implement stretch goal optimizations if possible — fine-grained BFS parallelism or lock-free accumulation. Evaluate performance improvements and compare against previous implementations.
Apr 30: Finalize analysis, complete report, and prepare poster materials.