Advancing genomics to better understand and treat disease

Genome sequencing can help us better understand, diagnose and treat disease. For example, healthcare providers are increasingly using genome sequencing to diagnose rare genetic diseases, such as elevated risk for breast cancer or pulmonary arterial hypertension. Taken together, rare genetic diseases are estimated to affect roughly 8% of the population.

At Google Health, we’re applying our technology and expertise to the field of genomics. Here are recent research and industry developments we’ve made to help quickly identify genetic disease and make genomic tests more equitable across ancestries. This includes an exciting new partnership with Pacific Biosciences to further advance genomic technologies in research and the clinic.

Helping identify life-threatening disease when minutes matter

Genetic diseases can cause critical illness, and in many cases, a timely identification of the underlying issue can allow for life-saving intervention. This is especially true in the case of newborns. Genetic or congenital conditions affect nearly 6% of births, but clinical sequencing tests to identify these conditions typically take days or weeks to complete.

We recently worked with the University of California Santa Cruz Genomics Institute to build a method – called PEPPER-Margin-DeepVariant – that can analyze data from Oxford Nanopore sequencers, one of the fastest commercial sequencing technologies used today. This week, the New England Journal of Medicine published a study led by the Stanford University School of Medicine detailing the use of this method to identify suspected disease-causing variants in five critical newborn intensive care unit (NICU) cases.

In the fastest cases, a likely disease-causing variant was identified less than 8 hours after sequencing began, compared to the prior fastest time of 13.5 hours. In five cases, the method influenced patient care. For example, the team quickly turned around a diagnosis of Poirier–Bienvenu neurodevelopmental disorder for one infant, allowing for timely, disease-specific treatment.

Figure: Time required to sequence and analyze individuals in the pilot study. Disease-causing variants were identified in patient IDs 1, 2, 8, 9, and 11.

Applying machine learning to maximize the potential in sequencing data

Looking forward, new sequencing instruments can lead to dramatic breakthroughs in the field. We believe machine learning (ML) can further unlock the potential of these instruments. Our new research partnership with Pacific Biosciences (PacBio), a developer of genome sequencing platforms, is a great example of how Google’s machine learning and algorithm development tools can help researchers extract more information from sequencing data.

PacBio’s long-read HiFi sequencing provides the most comprehensive view of genomes, transcriptomes and epigenomes. Using PacBio’s technology in combination with DeepVariant, our award-winning variant detection method, researchers have been able to accurately identify diseases that are otherwise difficult to diagnose with alternative methods.

Additionally, we developed a new open-source method called DeepConsensus that, in combination with PacBio’s sequencing platforms, creates more accurate reads of sequencing data. This boost in accuracy will help researchers apply PacBio’s technology to more challenges, such as the final completion of the human genome and assembling the genomes of all vertebrate species.

Supporting more equitable genomics resources and methods

Like other areas of health and medicine, the genomics field grapples with health equity issues that, if not addressed, could exclude certain populations. For example, the overwhelming majority of participants in genomic studies have historically been of European ancestry. As a result, the genomics resources that scientists and clinicians use to identify and filter genetic variants and to interpret the significance of these variants are not equally powerful across individuals of all ancestries.

In the past year, we’ve supported two initiatives aimed at improving methods and genomics resources for under-represented populations. We collaborated with 23andMe to develop an improved resource for individuals of African ancestry, and we worked with the UCSC Genomics Institute to develop pangenome methods. The pangenome work was recently published in Science.

In addition, we recently published two open-source methods that improve genetic discovery by more accurately identifying disease labels and improving the use of health measurements in genetic association studies.

We hope that our work developing and sharing these methods with those in the field of genomics will improve overall health and the understanding of biology for everyone. Working together with our collaborators, we can apply this work to real-world applications.