MB&B professor Mark Gerstein’s group published a paper titled “Storing and analyzing a genome on a blockchain” on June 29th in Genome Biology. The group explores how blockchain technology, a type of secure database that stores information in connected groups called blocks, can be used to securely store genomic information. Their technology aims to give ownership to the individual to whom the genome belongs, allowing data owners to both protect and share their genetic data.
In this paper, the technology is presented as the first “open-source, proof-of-concept private blockchain network,” comprising two modules: the first stores raw genomic reads and the second stores genetic variant files. To overcome the extensive storage requirements of whole-genome data, the group designed SAMchain, a blockchain file format that contains only the differences between a read and a reference genome, from which the whole genomic data can be reconstructed. The group’s technology is unique because its architecture is optimized for accessing the data once it is stored, even if that makes the storage process itself more memory- and time-intensive.
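To make the idea of storing only differences concrete, here is a minimal sketch in Python. It is not SAMchain's actual format: the record layout and the function names `encode_read` and `decode_read` are hypothetical, and the sketch assumes a read that differs from the reference only by single-base substitutions (no insertions or deletions).

```python
from typing import List, Tuple

# Hypothetical record layout (an assumption, not SAMchain's real schema):
# (start position on the reference, read length, substitutions),
# where each substitution is (offset within the read, base seen in the read).
DiffRecord = Tuple[int, int, List[Tuple[int, str]]]


def encode_read(reference: str, read: str, start: int) -> DiffRecord:
    """Keep only the positions where the read differs from the reference."""
    window = reference[start:start + len(read)]
    if len(window) != len(read):
        raise ValueError("read extends past the end of the reference")
    subs = [(i, b) for i, (r, b) in enumerate(zip(window, read)) if r != b]
    return (start, len(read), subs)


def decode_read(reference: str, record: DiffRecord) -> str:
    """Rebuild the full read from the reference plus the stored differences."""
    start, length, subs = record
    bases = list(reference[start:start + length])
    for offset, base in subs:
        bases[offset] = base
    return "".join(bases)


reference = "ACGTACGTACGT"
read = "GTACCTAC"  # aligns at position 2 with a single mismatch
record = encode_read(reference, read, start=2)
restored = decode_read(reference, record)
```

In this toy case the eight-base read is reduced to a start position, a length, and one substitution, and `decode_read` recovers the original read exactly. Real alignments also carry insertions, deletions, and quality information, which this sketch leaves out.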
Storing genomic data through blockchain technology presents many advantages over the systems currently in use. For example, the 1000 Genomes Project and the Personal Genome Project store their data in open, centralized storage systems, which are susceptible to corruption and tampering. Unlike these systems, once a blockchain data stream is created it is sealed and cannot be altered, which preserves the integrity of the data. As Gerstein is quoted in YaleNews, “As genomic data becomes increasingly integral to our understanding of human health and disease, its integrity and security must be a priority when providing solutions to storage and analysis.” He noted that changes to stored genomes can pose serious problems for future patient care and research integrity.
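The integrity property described above can be illustrated with a generic hash chain, in which each block records the hash of the block before it. The sketch below is a simplified illustration in Python, not the group's network: the block layout and the function names `block_hash`, `build_chain` and `is_valid` are assumptions made for the example. It shows why a later change to stored data is detectable.

```python
import hashlib
from typing import Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(prev_hash: str, payload: str) -> str:
    """Hash a block's payload together with the previous block's hash."""
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def build_chain(payloads: List[str]) -> List[Dict[str, str]]:
    """Link payloads into a chain where each block commits to its predecessor."""
    chain: List[Dict[str, str]] = []
    prev = GENESIS
    for payload in payloads:
        digest = block_hash(prev, payload)
        chain.append({"prev_hash": prev, "payload": payload, "hash": digest})
        prev = digest
    return chain


def is_valid(chain: List[Dict[str, str]]) -> bool:
    """Recompute every hash; any edited payload or broken link fails the check."""
    prev = GENESIS
    for block in chain:
        if block["prev_hash"] != prev:
            return False
        if block_hash(prev, block["payload"]) != block["hash"]:
            return False
        prev = block["hash"]
    return True


chain = build_chain(["read-1", "read-2", "read-3"])
ok_before = is_valid(chain)
chain[1]["payload"] = "tampered"  # simulate an unauthorized edit
ok_after = is_valid(chain)
```

Here `ok_before` is `True` and `ok_after` is `False`: editing one block invalidates its stored hash, so the change cannot go unnoticed without rewriting every later block as well.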
The group explains that genomic data now has, more than ever, the potential to advance medical research, diagnostics, and even personalized medicine. They describe how information stored on SAMchain can provide direct insight into existing genetic diseases and genetic risk factors. They plan to make this technology available open source and free of charge to the researchers, clinicians, and data owners who would use it. Gerstein also hopes that in the future the technology can support gene expression profiles, providing a whole new level of data about an individual.
Yale authors affiliated with MB&B include Gamze Gürsoy, Charlotte M. Brannon, Eric Ni & Mark Gerstein.
By Shravani Balaji