Complex Genomics Analysis Pipelines Made Simple with NextFlow & AWS Batch

Abstract

The science of genomics is increasingly contributing to the sort of insights that lead to longer and healthier lives for us all. But the computational pipelines needed to support accurate and timely analysis of huge quantities of genomic data are extreme: more than enough to overwhelm even the best-configured on-premises HPC facility. The key factor? A lack of elasticity: fixed infrastructure cannot stretch to accommodate needs that change from minute to minute and simply aren’t predictable. When this happens, whole collections of analysis jobs pile up behind one another, much like a traffic jam on a complex road network.

We can’t fix the traffic on the roads, but for genomics we have tools like NextFlow.io (a popular open source, data-driven tool for orchestrating scientific pipelines) and AWS Batch, which draws on Amazon EC2 and all the elasticity a scientist could want. Thanks to these technologies, our customer QBiC Tübingen is able to put all their efforts into their number one mission: to improve patient health by translating our understanding of the Human Gut Microbiome into knowledge and techniques for creating innovative new medicines that will help us all.

You’ll learn about:
• QBiC’s use of NextFlow to manage complex pipelines and orchestrate interdependent jobs;
• Their reasons for choosing AWS and the steps they took to build trust in us;
• NextFlow’s integration with AWS Batch, which dynamically expands and contracts cloud resources to meet the needs of the pipeline as it works its way through the data.

Suited for: HPC users, medical IT and the engineers who support them.

Date
Event
Invited Talk at AWS Summit 2019 Berlin
Location
Station Berlin, Luckenwalder Str. 4-6, 10963 Berlin