Welcome to Reflexiv
Reflexiv is an open-source parallel De novo genome assembler. It addresses the challenge of high memory consumption during De novo genome assembly by leveraging distributed computational resources, and it improves run time with a parallel assembly algorithm. Having trouble fitting a 500 GB De Bruijn graph into memory? Here, you can use ten 64 GB compute nodes to solve the problem, and faster.
How it works
We developed a new data structure called Reflexible Distributed K-mer (RDK). Built on top of the Apache Spark platform, it uses Spark RDDs (resilient distributed datasets) to distribute large amounts of K-mers across the cluster and assembles the genome recursively.
Compared with the conventional (state-of-the-art) De Bruijn graph, RDK stores only the nodes of the graph and discards all the edges. Since the K-mers are distributed across different compute nodes, RDK uses a random K-mer reflecting method to reconnect the nodes across the cluster (a reduce step of the MapReduce paradigm). This method iteratively balances the workload between the nodes and assembles the genome in parallel.
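The sketch below is a minimal, hypothetical illustration of this map/reduce pattern on Spark, not Reflexiv's actual implementation and not its random K-mer reflecting method: K-mers are kept in an RDD as graph nodes with no edges stored, and one extension round reconnects overlapping K-mers through a cluster-wide shuffle (here a join standing in for the reduce step). All names, the toy reads, and the local master setting are illustrative assumptions.

import org.apache.spark.sql.SparkSession

object RdkSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("RDK-sketch")
      .master("local[*]")   // illustrative local run; point to a cluster master in practice
      .getOrCreate()
    val sc = spark.sparkContext

    val k = 5
    // Toy reads, purely for illustration.
    val reads = Seq("ACGTACGTGG", "CGTACGTGGA")

    // Map step: extract K-mers from the reads; the RDD keeps them distributed
    // across the cluster as graph nodes, with no edges stored.
    val kmers = sc.parallelize(reads)
      .flatMap(read => read.sliding(k))
      .distinct()

    // Reduce/shuffle step: key K-mers by their (k-1)-mer suffix and prefix and
    // join them, so overlapping K-mers meet on the same partition and can be
    // merged into longer fragments. Reflexiv repeats such a reconnection step
    // iteratively; its random K-mer reflecting method is not reproduced here.
    val bySuffix = kmers.keyBy(kmer => kmer.takeRight(k - 1))
    val byPrefix = kmers.keyBy(kmer => kmer.take(k - 1))
    val extendedOnce = bySuffix.join(byPrefix)
      .map { case (_, (left, right)) => left + right.last }
      .distinct()

    extendedOnce.collect().foreach(println)
    spark.stop()
  }
}

In Reflexiv the reconnection step is repeated until the fragments cannot be extended further, which is what keeps the assembly workload distributed and balanced across the cluster.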
Getting started
Follow the tutorial to run a simple Reflexiv application on your laptop.
Re-assembly: Extending assembled fragments
Reflexiv can also re-assemble pre-assembled or probe-targeted genome/gene fragments. This is useful for improving the quality of existing assemblies, e.g., completing a gene from a gene domain using whole-genome sequencing data.
Command:
/usr/bin/Reflexiv reassembler \
    --driver-memory 6G \                                     ## Spark parameter
    --executor-memory 60G \                                  ## Spark parameter
    -fastq '/vol/human-microbiome-project/SRS*.tar.bz2' \    # Reflexiv parameter
    -frag /vol/private/gene_fragment.fa \                    # Reflexiv parameter: gene fragments/domains
    -outfile /vol/mybucket/Reflexiv/assembly \               # Reflexiv parameter
    -kmer 31                                                 # Reflexiv parameter

Read more.
Setup cluster
A Spark cluster is essential for scaling out the assembly, i.e., distributing the workload to multiple compute nodes.
Support or Contact
Having trouble using Reflexiv? Please contact the developers or open an issue on the project repository.