This talk will showcase work done by the bioinformatics team at CSIRO in Sydney, Australia, to make Spark more useful and usable for the bioinformatics community. They have created a custom library, variant-spark, which provides a DSL and a custom, random-forest-based implementation of Spark ML for genomic pipeline processing. We’ve created a demo, using their ‘Hipster-genome’ and a Databricks notebook, to better explain their library to the worldwide bioinformatics community. The notebook also compares results with another popular genomics library (HAIL.io).