GPU Scheduler for De Novo Genome Assembly with Multiple MPI Processes


De novo genome assembly is one of the most important tasks in computational biology. ELBA is the state-of-the-art distributed-memory parallel algorithm for the overlap detection and layout simplification steps of de novo genome assembly, but it has a performance bottleneck in pairwise alignment.

In this work, we introduce three GPU schedulers for ELBA that accommodate multiple MPI processes and multiple GPUs. The schedulers enable multiple MPI processes to perform computation on GPUs in a round-robin fashion. Both strong and weak scaling experiments show that all three schedulers significantly improve on the baseline, although there is a trade-off between parallelism and GPU scheduler overhead. The best-performing implementation, the one-to-one scheduler, achieves a roughly 7-8x speed-up with 25 MPI processes compared with the baseline vanilla ELBA GPU scheduler.
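
To make the round-robin idea concrete, here is a minimal sketch in plain Python. It is not ELBA's scheduler code: the function names `assign_gpus` and `round_robin_turns`, and the choice of mapping a rank to a GPU by `rank % num_gpus`, are assumptions made purely for illustration. It shows only how several MPI ranks could share a smaller number of GPUs and take turns on each one.

```python
from collections import defaultdict


def assign_gpus(num_ranks, num_gpus):
    """Map each MPI rank to a GPU id by cycling through the GPUs.

    Hypothetical helper: the modulo mapping is an assumption for
    illustration, not the mapping used by ELBA.
    """
    if num_ranks <= 0 or num_gpus <= 0:
        raise ValueError("num_ranks and num_gpus must be positive")
    return {rank: rank % num_gpus for rank in range(num_ranks)}


def round_robin_turns(assignment):
    """Group ranks by GPU, in the order they would take turns on it."""
    queues = defaultdict(list)
    for rank in sorted(assignment):
        queues[assignment[rank]].append(rank)
    return dict(queues)


if __name__ == "__main__":
    assignment = assign_gpus(num_ranks=5, num_gpus=2)
    for gpu, ranks in sorted(round_robin_turns(assignment).items()):
        print(f"GPU {gpu}: ranks take turns in order {ranks}")
```

With more ranks than GPUs, each GPU's queue grows longer, which is one way to picture the trade-off between parallelism and scheduling overhead mentioned above.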

Guanghao (Gary) Wei
M.Eng. in Computer Science

My research interests include machine learning systems, AI for science, optimization theory, and high-performance computing.