DNAnalyzer

DNAnalyzer is a fiscally sponsored 501(c)(3) nonprofit organization (EIN: 81-2908499) dedicated to democratizing access to machine learning-powered DNA analysis through efficient on-device computation and interpretive tools. It was created by Piyush Acharya, who currently leads it together with @LimesKey. The project has 45 contributors, computational biologists and computer scientists from accredited international institutions and corporations such as Microsoft Research, University of Macedonia, and Northeastern University. Its GitHub stargazers include the organizer of the AI Engineer World's Fair (OpenAI, Microsoft, Google DeepMind, Anthropic, etc.) and the CEO of Forem, among others.

Background

Currently, the average cost of accessing an individual's DNA sequence is approximately $100 [1]. On top of that, most personal genomics providers charge up to $600 for insights such as carrier status, health predisposition, wellness reports, and traits analyses [2, 3]. These costs keep valuable health insights from reaching the communities that need them most.

At the same time, there are significant privacy risks associated with direct-to-consumer genetic testing companies. Unlike a credit card number or password, stolen or misused genetic information cannot be changed. A 2018 study by Vanderbilt University found that 78% of DTC genetic testing companies shared genetic information with third parties in de-identified or aggregate forms without additional consumer consent [4]. Few laws regulate how genetic data should be stored and protected, and companies have experienced data breaches. For example, in 2023, 23andMe suffered a data breach where hackers accessed the genetic information of 6.9 million users, demonstrating an urgent need for tools such as DNAnalyzer [5].

By enabling secure, on-device genomic data analysis with no costs for consumers, DNAnalyzer aims to mitigate these risks while making insights more accessible to underserved communities.

Features

Start and Stop Codons
- Codons that mark where the translation of a protein starts and stops. A protein consists of one or more chains of amino acids (called polypeptides) whose sequence is encoded in a gene; there are 20 different amino acids. [2]
High Coverage Regions
- Promoter regions of protein-coding genes tend to have a relatively high proportion of guanine and cytosine among the four nucleotide bases (45-60% GC content). Such GC-rich regions, known as CpG islands, are likely to reveal important information about the genome. [3]
Neurodevelopmental Disorders
- A group of disorders, usually characterized by longer genes, caused by genetic mutations that affect the development of the brain and nervous system. They include autism, attention deficit hyperactivity disorder (ADHD), and schizophrenia. [4]
Core Promoter Elements
- Core promoter elements are short DNA sequence motifs located at or near a gene's transcription start site that are responsible for initiating transcription (e.g. BRE, TATA, INR, and DPE). [5]
FASTA File Support
- Supports multi-line and single-line FASTA database files. Files can either be uploaded or linked to from the web. [7]
Command-Line Interface (CLI)
- The Methionine command-line interface (Met CLI) is a unified tool for running DNAnalyzer services from the command line and for scripting sequences of commands. All core DNAnalyzer features are currently available without logging in, although account support is planned. For installation instructions and the currently supported commands, refer to the Met CLI GitHub repository.
Web UI Coming Soon
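
The sequence-level features listed above can be illustrated with a short sketch. This is not DNAnalyzer's implementation: the function names (`parse_fasta`, `find_orfs`, `gc_content`, `find_tata_boxes`) are hypothetical, and the sketch assumes ATG as the only start codon, scans the forward strand only, and uses the simplified consensus `TATAAA` for the TATA box.

```python
import re

START_CODON = "ATG"                  # assumption: only the canonical start codon
STOP_CODONS = {"TAA", "TAG", "TGA"}  # the three standard stop codons
TATA_MOTIF = "TATAAA"                # assumption: simplified TATA box consensus


def parse_fasta(text):
    """Parse FASTA text (single- or multi-line records) into {header: sequence}."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line.upper())
    return {name: "".join(parts) for name, parts in records.items()}


def find_orfs(seq):
    """Return (start, end) pairs for forward-strand open reading frames.

    Each frame runs from an ATG to the first in-frame stop codon.
    `end` is exclusive and includes the stop codon.
    """
    orfs = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != START_CODON:
            continue
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                orfs.append((start, pos + 3))
                break
    return orfs


def gc_content(seq):
    """Return the fraction of bases that are G or C (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def find_tata_boxes(seq):
    """Return the start index of every non-overlapping TATAAA match."""
    return [match.start() for match in re.finditer(TATA_MOTIF, seq)]


if __name__ == "__main__":
    # Made-up two-line FASTA record, used only to exercise the functions.
    fasta = ">example\nGGTATAAAGC\nATGGCCTGACC\n"
    for name, seq in parse_fasta(fasta).items():
        print(name, find_orfs(seq), round(gc_content(seq), 3), find_tata_boxes(seq))
```

Note that an ATG lying inside a longer reading frame is reported as its own frame; a real analysis would typically also scan the reverse complement and filter frames by a minimum length.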

Quick Introduction to DNA

DNA

DNA, present in most cells of the body, holds the blueprint for creating over 200 distinct cell types. It works much like a programming language, but one exclusive to living organisms. With the aid of ML, we can decode and comprehend DNA, leading to life-saving discoveries and valuable insights.

Databases

A DNA database is crucial for interpreting DNA sequences. Machine learning models trained on the sequences in such a database can make predictions about previously unseen DNA sequences. This is the foundation on which modern DNA analysis programs operate.
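
As a rough illustration of predicting from a database, the sketch below labels an unseen sequence with the label of its most similar database entry, using cosine similarity between k-mer count profiles (a 1-nearest-neighbour classifier). This simple technique stands in for a real model and is not the method DNAnalyzer uses; the function names, the toy database, and the labels are all made up for the example.

```python
from collections import Counter
from math import sqrt


def kmer_profile(seq, k=3):
    """Count every overlapping k-mer in the sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_similarity(a, b):
    """Cosine similarity between two k-mer count profiles."""
    dot = sum(a[kmer] * b[kmer] for kmer in a.keys() & b.keys())
    norm_a = sqrt(sum(count * count for count in a.values()))
    norm_b = sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_label(query, database, k=3):
    """Return the label of the database sequence most similar to the query.

    `database` is a list of (label, sequence) pairs.
    """
    query_profile = kmer_profile(query, k)
    best_label, _ = max(
        database,
        key=lambda entry: cosine_similarity(query_profile, kmer_profile(entry[1], k)),
    )
    return best_label


if __name__ == "__main__":
    # Toy database with invented labels and sequences.
    toy_database = [
        ("gc_rich", "GCGCGCGGCCGCGC"),
        ("at_rich", "ATATTTAAATATTA"),
    ]
    print(predict_label("GCGGCGCC", toy_database))
```

The larger and better annotated the database, the more likely it is that an unseen sequence has an informative close match, which is why the database matters as much as the model.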

Getting Started

Please refer to the Getting Started document for more information on how to use DNAnalyzer.

Future Support and Improvements

Optimized SQL Database for Genomic Data

Our goal is to find the SQL database fork best suited to high performance and vertical scaling. We will use it to store and query genomic data from thousands of species, including their genes and mutations, which will help us train our machine learning model more effectively.
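
To make the idea concrete, here is a minimal sketch of what storing and querying species, genes, and mutations could look like. It uses SQLite from Python's standard library purely as a stand-in, since the project has not yet chosen a database; the table layout, column names, and helper functions are assumptions for illustration, not a planned schema.

```python
import sqlite3

# Hypothetical schema: one row per species, gene, and point mutation.
SCHEMA = """
CREATE TABLE species (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE genes (
    id INTEGER PRIMARY KEY,
    species_id INTEGER NOT NULL REFERENCES species(id),
    symbol TEXT NOT NULL,
    sequence TEXT NOT NULL
);
CREATE TABLE mutations (
    id INTEGER PRIMARY KEY,
    gene_id INTEGER NOT NULL REFERENCES genes(id),
    position INTEGER NOT NULL,
    ref_base TEXT NOT NULL,
    alt_base TEXT NOT NULL
);
CREATE INDEX idx_genes_species ON genes(species_id);
CREATE INDEX idx_mutations_gene ON mutations(gene_id);
"""


def build_database(path=":memory:"):
    """Create the tables and indexes and return the open connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def mutations_for_species(conn, species_name):
    """Return (gene symbol, position, ref base, alt base) rows for one species."""
    rows = conn.execute(
        """
        SELECT g.symbol, m.position, m.ref_base, m.alt_base
        FROM mutations AS m
        JOIN genes AS g ON g.id = m.gene_id
        JOIN species AS s ON s.id = g.species_id
        WHERE s.name = ?
        ORDER BY g.symbol, m.position
        """,
        (species_name,),
    )
    return rows.fetchall()


if __name__ == "__main__":
    conn = build_database()
    # Invented placeholder data, not real genomic records.
    species_id = conn.execute(
        "INSERT INTO species (name) VALUES (?)", ("Example species",)
    ).lastrowid
    gene_id = conn.execute(
        "INSERT INTO genes (species_id, symbol, sequence) VALUES (?, ?, ?)",
        (species_id, "GENE1", "ATGGCC"),
    ).lastrowid
    conn.execute(
        "INSERT INTO mutations (gene_id, position, ref_base, alt_base) VALUES (?, ?, ?, ?)",
        (gene_id, 4, "G", "A"),
    )
    print(mutations_for_species(conn, "Example species"))
```

The indexes on the foreign-key columns are what keep the join fast as the number of genes and mutations grows, which is the kind of query the training pipeline would presumably run most often.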

Improved Neural Network for Genotyped Data

This will make it possible to use genotyped data from third-party DNA testing services with our algorithm. In the future, a simple $99 DNA test will be all you need to use every feature of DNAnalyzer.
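
As a sketch of what ingesting such data might involve, the example below parses a genotype export into structured records. The file layout is an assumption: tab-separated columns for marker ID, chromosome, position, and genotype, with comment lines starting with `#`. The `Genotype` type, the `parse_genotype_file` function, and the sample rows are hypothetical and do not describe DNAnalyzer's planned input format.

```python
from typing import NamedTuple


class Genotype(NamedTuple):
    rsid: str        # marker identifier
    chromosome: str  # kept as text so that "X", "Y", and "MT" are valid
    position: int
    genotype: str    # e.g. "AG"


def parse_genotype_file(text):
    """Parse tab-separated genotype rows, skipping comments and malformed lines."""
    calls = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 4 or not fields[2].isdigit():
            continue  # assumption: silently skip rows that do not fit the layout
        rsid, chromosome, position, genotype = fields
        calls.append(Genotype(rsid, chromosome, int(position), genotype))
    return calls


if __name__ == "__main__":
    # Invented sample rows with placeholder marker IDs.
    sample = (
        "# rsid\tchromosome\tposition\tgenotype\n"
        "rs0000001\t1\t12345\tAG\n"
        "rs0000002\tX\t67890\tCC\n"
    )
    for call in parse_genotype_file(sample):
        print(call)
```

Genotyped data covers only selected marker positions rather than a full sequence, so downstream analysis would work from records like these instead of from FASTA input.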

DIAMOND Implementation, a BLAST fork

This will combine DIAMOND's performance advantage with BLAST's algorithm.

Citations

View our in-line citations in the Citations document.

Contributing

Terms of Use

The use of this application is entirely at your own discretion and responsibility, including all actions and outcomes that may result. While the DNAnalyzer team is committed to addressing significant issues reported by users or identified during ongoing research, we disclaim any liability for losses, damages, or other consequences arising from the use of this application, regardless of the circumstances. For any questions or concerns, please contact us at [email protected].

If you utilize this software in your research, we kindly request that you provide an appropriate citation. You may use the following formats:

APA Citation:

Acharya, P. (2022). DNAnalyzer: ML-Powered DNA Analysis Platform (Version 3.5.0-beta.0) [Computer software]. https://doi.org/10.5281/zenodo.14556577

BibTeX Citation:

@software{Acharya_DNAnalyzer_ML-Powered_DNA_2022,
  author = {Acharya, Piyush},
  doi = {10.5281/zenodo.14556577},
  month = oct,
  title = {{DNAnalyzer: ML-Powered DNA Analysis Platform}},
  url = {https://github.com/VerisimilitudeX/DNAnalyzer},
  version = {3.5.0-beta.0},
  year = {2022}
}

Stars

Please star the repository to show your support!
