GenomePrep: Preprocess Direct-to-Consumer (DTC) Genomes
GenomePrep is a Python library for genetics enthusiasts that prepares direct-to-consumer DNA data for analysis with popular bioinformatics tools. It performs the following tasks:
Parse SNPs from popular DTC providers
Check for missing calls and duplicated SNPs
Determine the genome assembly and sanity-check similarity to the reference genome
Deduce the genotyping array version
Apply a genotyping-array-based SNP sanity filter (optional)
Convert to 23andMe-like format and VCF format for downstream analysis
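The sketch below shows roughly what several of these checks involve. It is a minimal illustration assuming 23andMe-style input (tab-separated rsID, chromosome, position, genotype, with `#` comment lines and `--` for a no-call). The function names `parse_23andme`, `qc_report`, `infer_assembly` and `vcf_genotype` are hypothetical and are not GenomePrep's API. `infer_assembly` assumes the caller supplies per-assembly rsID positions from a reference database. It picks whichever assembly agrees with the most records, a simple stand-in technique rather than the library's actual method.

```python
from collections import Counter

# Hypothetical helpers, NOT GenomePrep's API: a sketch assuming
# 23andMe-style tab-separated raw data.

MISSING_CALL = "--"  # 23andMe's marker for a no-call


def parse_23andme(lines):
    """Return (rsid, chromosome, position, genotype) tuples, skipping '#' comments."""
    records = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        rsid, chrom, pos, genotype = line.split("\t")
        records.append((rsid, chrom, int(pos), genotype))
    return records


def qc_report(records):
    """Count SNPs and list no-calls and rsIDs that appear more than once."""
    missing = [rsid for rsid, _, _, gt in records if gt == MISSING_CALL]
    counts = Counter(rsid for rsid, _, _, _ in records)
    duplicated = sorted(rsid for rsid, n in counts.items() if n > 1)
    return {"n_snps": len(records), "missing": missing, "duplicated": duplicated}


def infer_assembly(records, known_positions):
    """Pick the assembly whose reference positions agree with the most records.

    known_positions maps an assembly name to {rsid: (chromosome, position)};
    these tables are assumed to come from a reference database.
    """
    scores = {
        assembly: sum(
            1 for rsid, chrom, pos, _ in records if table.get(rsid) == (chrom, pos)
        )
        for assembly, table in known_positions.items()
    }
    return max(scores, key=scores.get), scores


def vcf_genotype(genotype, ref, alt):
    """Encode an unphased call such as 'GA' as a VCF GT string ('0/1')."""
    index = {ref: "0", alt: "1"}
    if not genotype or any(base not in index for base in genotype):
        # No-calls or alleles matching neither REF nor ALT become missing.
        return "." if len(genotype) == 1 else "./."
    # Unphased diploid calls are sorted; haploid calls (e.g. chrY) give one index.
    return "/".join(sorted(index[base] for base in genotype))
```

In this sketch, converting to VCF requires a REF allele for each site. That requirement is one reason the assembly must be determined and checked against the reference genome before conversion.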
This project was developed with the goodwill of over 7,000 open genomes made public between 2011 and 2020. It addresses the problem of processing raw DTC DNA data in the context in which that data is produced: genotyping arrays.
More information about genotyping arrays and ancestral relatedness can be found in our paper in CSBJ:
Lu, B. Greshake Tzovaras, J. Gough, A survey of direct-to-consumer genotype data, and quality control tool (GenomePrep) for research, Computational and Structural Biotechnology Journal (2021)
You can upload and process your genome on our server.
You can also download preprocessed open genomes. Check out the Usage section for further information, including how to install the project.
Note
This project is under active development.