Series

New Genetics—From Sequence to Knowledge

When the first human genome sequence was finished in 2003, it quickly became clear that its seemingly unending stream of letters was not enough to comprehend what makes people tick. All the moving parts that bring the DNA code to life needed to be understood as well. To address this, researchers launched projects such as the Encyclopedia of DNA Elements (ENCODE) and 1000 Genomes. In ENCODE, more than 440 scientists collaborated to identify functional parts of the genome: those regions that regulate how, when, and where different genes are turned on or off. In 2012, the consortium released data in a fusillade of papers, including six in the September 6 Nature. The scientists found that about 80 percent of the genome, much of which was previously considered “junk” DNA, has at least one biochemical function in some cells. Though the scientists have not yet examined every human cell type, this latest effort widens their window into the intricacies of the genetic code. The data could help researchers understand how genetic variants outside of genes can alter risk for disease, including neurodegenerative disorders such as Alzheimer’s. Other genetics initiatives are generating bucketloads of sequence data for researchers to mine.

ENCODE Turns Human Genome From Sequence to Machine

This is Part 1 of a two-part story. See also Part 2.

When the first human genome sequence was finished in 2003, it quickly became clear that its seemingly unending stream of letters was not enough to understand what makes people tick. All the moving parts that bring the DNA code to life needed to be understood as well. To address this, researchers launched the Encyclopedia Of DNA Elements (ENCODE) in 2003. More than 440 scientists collaborated to identify functional parts of the genome: those regions that regulate how, when, and where different genes are turned on or off. This month, the consortium released its latest data in a fusillade of papers, including six in the September 6 Nature. Together, the scientists found that about 80 percent of the genome, much of which was previously considered “junk” DNA, has at least one biochemical function in some cells. Though the scientists have not yet examined every human cell type, this latest effort widens their window into the intricacies of the genetic code. The data could help researchers understand how genetic variants found outside of genes can alter risk for disease, including neurodegenerative disorders such as Alzheimer’s (see Part 2). "This short-circuits a lot of work that you'd have to do to figure out signals from a genetic study," said Gerard Schellenberg, University of Pennsylvania School of Medicine, Philadelphia.

ENCODE is funded by the National Human Genome Research Institute (NHGRI). Its public consortium of scientists, who hail from more than 30 institutions around the world, has so far examined 147 cell types in about 1,640 separate datasets to identify functional elements in human DNA. A Nature special feature helps navigate the findings, allowing users to search by topic thread or by individual paper. In more than 30 published papers, scientists map the genome by regions of transcription, transcription factor binding, histone modification, and methylation. A five-minute Nature YouTube video explains the rationale behind ENCODE.

"At the start of this project, most people thought that if genes make up 2 percent of the genome, maybe another 3 or 4 percent made up the instruction that controls them," said John Stamatoyannopoulos, University of Washington School of Medicine, Seattle, an ENCODE project leader. "It has been surprising just how much of the genome actually is involved."

One interesting map for those studying Alzheimer's disease and other neurodegenerative disorders pinpoints the genome's regulatory sites. These are the enhancers, promoters, insulators, and silencers (see Thurman et al., 2012). It turns out that almost 2.9 million regulatory regions populate the genome, 10 times more than expected at ENCODE's outset. Only 200,000 are active in any given cell at one time.

The researchers discovered the regulatory elements by identifying DNA sequences that had unwound from their histone spools in preparation to affect transcription elsewhere in the genome. By determining which stretches of DNA were active at the same time as certain promoters, researchers matched 20 percent of regulatory elements to specific genes that were regulated by them. Interestingly, non-coding single-nucleotide polymorphisms that turn up in genomewide association studies (GWAS) are often found in these very regions (see Part 2 of this series). In the Alzheimer’s field, researchers have associated non-coding polymorphisms with disease, but are struggling to interpret how they might alter risk (see ARF related news story).

"Understanding the genome sequence got us a long way in being able to do genetic studies of disease, but having this map of genome function is important for understanding the actual biology that leads to disease," said Stamatoyannopoulos.

Mapping regulatory elements will also allow researchers to find potentially important genomic regions for disease treatment, including for Alzheimer's disease (AD), said Schellenberg, who noted, for example, that tinkering with the regulation of β-secretase (BACE1)—one of the enzymes that produces Aβ—is one possible way to treat AD. "You can go into ENCODE and at least begin to see the potential regulatory units and networks to target," he said.

Companion papers offer more insight into the control of gene transcription. For instance, researchers led by Job Dekker and Amartya Sanyal, both of the University of Massachusetts Medical School in Worcester, have begun to map long-range interactions between enhancers and gene promoters that are quite distant from each other in the genome but may come into contact when DNA is folded and coiled around histones (see Sanyal et al., 2012). They have mapped only 1 percent of the genome so far, but in doing so they found, among other things, that both enhancers and promoters can interact with multiple long-range partners.

In another study, scientists including Mark Gerstein of Yale University, New Haven, Connecticut, and Michael Snyder of Stanford University, California, have worked out how and where different combinations of 119 transcription factors bind to DNA to regulate gene activity (see Gerstein et al., 2012). Hundreds of other transcription factors remain to be analyzed; even so, this is one step toward understanding how networks of these regulatory proteins function and contribute to human disease, wrote the authors.

One shortcoming of ENCODE is that it did not sample brain tissue, Schellenberg noted. "There are many different kinds of cell types in the brain, many different kinds of neurons, and each probably has its own gene regulatory network." Stamatoyannopoulos said that researchers hope to look at more cell types associated with human disease, such as those of the brain, in the future.

ENCODE researchers still have a long road ahead before they reach a thorough understanding of how the genome behaves at different stages of development, in different tissues, and within individual cells of a heterogeneous tissue. Phase 3 of the project is dedicated to filling in these blanks. However, firsthand observation may not be necessary or even possible for every conceivable situation, wrote Eran Segal in an accompanying News and Views article. He suggested that when scientists learn enough about the genome, they may be able to predict how it will behave in different circumstances. "We must work towards deriving quantitative models that integrate the relevant protein, RNA, and chromatin components; describe how these components interact with each other; how they bind the genome; and how these binding events regulate transcription," he said. "If successful, such models will be able to predict the genome’s function at times and in settings that have not been directly measured."—Gwyneth Dickey Zakaib

References

News Citations

  1. Newly Mapped DNA Elements Help Interpret GWAS
  2. Barcelona: What Lies Beyond Genomewide Association Studies?

Paper Citations

  1. Thurman et al. The accessible chromatin landscape of the human genome. Nature. 2012 Sep 6;489(7414):75-82. PubMed.
  2. Sanyal et al. The long-range interaction landscape of gene promoters. Nature. 2012 Sep 6;489(7414):109-13. PubMed.
  3. Gerstein et al. Architecture of the human regulatory network derived from ENCODE data. Nature. 2012 Sep 6;489(7414):91-100. PubMed.

External Citations

  1. Encyclopedia Of DNA Elements
  2. special feature
  3. Nature YouTube video

Newly Mapped DNA Elements Help Interpret GWAS

This is Part 2 of a two-part story. See also Part 1.

With the advent of inexpensive genotyping technology, genomewide association studies (GWAS) have turned up thousands of point changes in DNA that can alter risk for disease. Most of those “hits” appear in genomic regions that code for no particular gene, leaving researchers puzzled about how they exert their influence. Now, scientists led by John Stamatoyannopoulos, University of Washington, Seattle, provide some clues. The researchers correlated newly catalogued functional regions in the human genome with published GWAS. They found not only that the majority of reported variants fall in regulatory DNA that controls gene expression, but also that they often do so only in cells linked to pathology. That suggests these variants could indirectly influence coding genes that then go on to affect disease. "It basically says we've only been looking at the tip of the iceberg with these GWAS," said Stamatoyannopoulos. The findings, reported in the September 7 Science, come hot on the heels of a slew of related papers in Nature, revealing new insights into functional elements in the human genome (see Part 1). Together, the work could help researchers interpret some seemingly obscure links between genetic polymorphisms and neurodegenerative diseases.

GWAS of diseases, or clinical traits such as plasma cholesterol, look for genomic changes that could explain the phenotype at hand. Some studies reported GWAS hits in regulatory regions of the genome (see, e.g., Pomerantz et al., 2009 and Musunuru et al., 2010), but it was unclear whether the observation was specific to those particular diseases or true in general. "We showed that it is really across the board. Every single disease or trait we looked at shows the same phenomenon," said Stamatoyannopoulos.

To broadly examine regulatory regions, joint first authors Matthew Maurano, Richard Humbert, and Eric Rynes constructed an overall genomic map by analyzing 349 cell and tissue types (including fetal tissues, tumor cells, pluripotent cells, hematopoietic cells, and cultured cells) as part of the Encyclopedia Of DNA Elements Project (ENCODE) and the Roadmap Epigenomics Program (see ARF related news story). Both of these projects probe the finer points of genomic function. Maurano and colleagues cut the DNA into pieces with the nuclease DNase1. This enzyme preferentially snips DNA that has been exposed, such as when regulatory proteins unwind the nucleic acid to activate gene transcription. After the DNA had been cut, the research group analyzed the hundreds of millions of DNA fragments to pinpoint the DNase1-hypersensitive sites (DHSs).

Almost four million DHSs pervade the genome, the team found. On average, about 200,000 DHSs were active in any one cell. Each type of cell had a unique DHS pattern, depending on which areas of DNA were active. With this DHS map in hand, the team looked at more than 5,500 single-nucleotide polymorphisms (SNPs) found in GWAS of hundreds of diseases and quantitative traits. About 75 percent of these were in or near DHSs, suggesting that a considerable number of non-coding GWAS hits are functional and exert effects on coding genes through regulation, the authors wrote.
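The overlap analysis described above can be illustrated with a toy sketch. The Python below is not the authors' pipeline. The function name, the half-open coordinate convention, the `window` parameter standing in for "near," and the example coordinates are all assumptions made for illustration only.

```python
from bisect import bisect_right


def fraction_in_or_near_dhs(snps, dhs_by_chrom, window=0):
    """Return the fraction of SNPs lying within `window` bases of a DHS.

    snps: list of (chromosome, position) pairs.
    dhs_by_chrom: dict mapping chromosome -> list of (start, end) intervals,
        assumed sorted by start, non-overlapping, and half-open [start, end).
    """
    if not snps:
        return 0.0
    # Precompute the interval start positions once per chromosome.
    starts = {c: [s for s, _ in ivs] for c, ivs in dhs_by_chrom.items()}
    hits = 0
    for chrom, pos in snps:
        intervals = dhs_by_chrom.get(chrom, [])
        # Last interval that begins at or before pos + window.
        i = bisect_right(starts.get(chrom, []), pos + window) - 1
        # Because intervals do not overlap, this one also has the largest end
        # among the candidates, so checking it alone is sufficient.
        if i >= 0 and intervals[i][1] > pos - window:
            hits += 1
    return hits / len(snps)


# Hypothetical coordinates, not real DHS or GWAS data.
toy_dhs = {"chr1": [(100, 200), (500, 650)], "chr2": [(1000, 1100)]}
toy_snps = [("chr1", 150), ("chr1", 300), ("chr1", 660),
            ("chr2", 999), ("chr3", 5)]
strict_fraction = fraction_in_or_near_dhs(toy_snps, toy_dhs)
near_fraction = fraction_in_or_near_dhs(toy_snps, toy_dhs, window=50)
```

Widening `window` counts more SNPs as "near" a site, which is why any such percentage depends on how proximity is defined.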

Probing further, the authors found that specific SNP-containing genome regions often sported DHSs only in disease-relevant cells. As an example, a DHS popped up in a fetal heart cell (but not a brain cell) around a coronary heart disease-associated mutation. In a few hundred cases, DHSs harboring GWAS-related variants seemed to control distant genes, up to 500 kilobases—that is, several genes—away. In addition, seemingly unrelated SNPs that had previously been linked to individual diseases within a family of disorders (such as autoimmune disease) often struck elements that were recognized by the same transcription factor. "[That correlation] allows you to construct relationships among diseases in ways that nobody had previously anticipated," said Stamatoyannopoulos.

About 88 percent of non-coding SNPs lay in DHSs active in early fetal development. Most of the diseases or traits tied to these SNPs are thought to start in the womb or are influenced by development. The remaining SNPs, including some related to Alzheimer's disease, breast cancer, and lupus, occurred in DHSs found only in adult tissues. The majority of those diseases and traits have not previously been associated with any early causation. "This would suggest that pathology likely begins in the adult stage," said Stamatoyannopoulos. Researchers studying AD debate how early in life the disease begins to take hold. Though the researchers possess scant data so far on DHSs encompassing AD-linked SNPs, Stamatoyannopoulos hopes to sample adult brains in the coming year.

"These results are potentially very important to people carrying out GWAS in neurodegenerative disorders," wrote Peter Holmans, Cardiff University School of Medicine, U.K., in an e-mail to Alzforum. Not only will these findings help weed out true GWAS signals, but "they will have profound implications for pathway analyses, both on the way that GWAS SNPs are assigned to genes and how genes are grouped into pathways."

The study adds a level of support to an idea that scientists have held for some years. It is that several genes, each imparting a subtle effect on biology, may cause a disease, rather than one mutation causing its own profound effect, said Julie Williams, also at Cardiff University. "It also tells us that there's yet another layer of complexity, that [DNA] regions may actually affect genes that are some distance away, not necessarily the most proximal gene."

If this study can be validated, researchers may look specifically for mutations in the regulatory DNA in GWAS, increasing statistical power and the ability to detect disease-associated variants, wrote Christiane Reitz, Columbia University, New York, to Alzforum in an e-mail (see full comment below).

In addition to studying the human brain, the Seattle group will sample other tissues associated with human disease, expand the range of GWAS hits they explore, and continue to improve the technology. "The ultimate goal of this line of research is to have a complete map of the regulatory circuitry of the human genome," Stamatoyannopoulos said. With such a map, researchers could understand how the genome controls everything from normal developmental processes to disease.—Gwyneth Dickey Zakaib

References

News Citations

  1. ENCODE Turns Human Genome From Sequence to Machine
  2. Bethesda: "Ome" Sweet "Ome"—Epigenome Joins Genome, Proteome

Paper Citations

  1. Pomerantz et al. The 8q24 cancer risk variant rs6983267 shows long-range interaction with MYC in colorectal cancer. Nat Genet. 2009 Aug;41(8):882-4. PubMed.
  2. Musunuru et al. From noncoding variant to phenotype via SORT1 at the 1p13 cholesterol locus. Nature. 2010 Aug 5;466(7307):714-9. PubMed.

External Citations

  1. Encyclopedia Of DNA Elements Project
  2. Roadmap Epigenomics Program

Genetics Project Update: Over 1,000 Genomes and Counting

Your mama always told you that you were special. Now, an international consortium of researchers has clarified just how unique we all are. Each human being harbors several thousand genetic variants, hundreds of which are rare, according to a progress report from the ongoing 1000 Genomes Project in the November 1 Nature. Despite overshooting the titular quantity—it reports on 1,092 genomes from 14 nations—the sequencing consortium is still collecting and reading DNA in its effort to record 2,500 total genomes from a wider geographical area.

While genomewide association studies (GWAS) help researchers discover common genetic variants, the genome and exome sequencing used by the 1000 Genomes group can pick out uncommon ones. Based on a pilot project, the consortium’s leaders elected to perform both low-coverage whole-genome sequencing, reading each strand a handful of times, and deep sequencing, with up to 100 reads of exomes only (see ARF related news story on 1000 Genomes Project Consortium, 2010). While exomes encode proteins, scientists in another large project, the Encyclopedia of DNA Elements (ENCODE), recently reported that most of the rest of the genome has important functions, too (see ARF related news story on ENCODE Project Consortium et al., 2012).

The consortium sequenced DNA from 14 different populations, ranging from the Yoruba of Ibadan, Nigeria, to Han Chinese in Beijing, to African-Americans in the southwestern United States. The 1,092 genomes included 38 million single nucleotide polymorphisms (SNPs) and 1.4 million insertions and deletions. That accounts for most common variants, as well as 98 percent of the rare SNPs found in one in 100 people. While common variants are global, rare ones are often limited to small populations. Rare variants also tend to be deleterious to the proteins they encode, the authors reported.

“This type of massive resource helps us to improve our disease research,” said Gerard Schellenberg of the University of Pennsylvania in Philadelphia, who was not involved in the consortium. He and others have been using the 1000 Genomes data for imputation, a method of deduction geneticists use to fill in gaps in GWAS datasets. The typical gene chips used in GWAS identify a long list of single nucleotide polymorphisms, but do not cover all possible variant sites. Since nearby sequences are co-inherited in chunks, knowing some of the linked variants typically allows researchers to predict the others. Imputation accuracy, according to the Nature paper, is 90-95 percent for common variants. Researchers can use sequencing to confirm imputed variants, wrote Philippe Amouyel of the Institut Pasteur de Lille, France, in an e-mail to Alzforum (see full comment, below).
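The idea behind imputation can be shown with a deliberately simplified sketch. Real imputation software relies on statistical models of co-inheritance; the Python below instead copies missing alleles from the best-matching reference haplotypes. The function name, the `?` marker for untyped sites, and the toy sequences are assumptions for illustration, not part of the 1000 Genomes methods.

```python
from collections import Counter


def impute_haplotype(target, reference_panel):
    """Fill untyped sites ('?') in `target` from the closest reference haplotypes.

    target: string of alleles, with '?' at sites the chip did not measure.
    reference_panel: non-empty list of fully sequenced haplotype strings,
        each the same length as `target`.
    """
    typed_sites = [i for i, allele in enumerate(target) if allele != "?"]

    def matches(haplotype):
        # Number of typed sites where the reference agrees with the target.
        return sum(haplotype[i] == target[i] for i in typed_sites)

    best = max(matches(h) for h in reference_panel)
    closest = [h for h in reference_panel if matches(h) == best]

    filled = list(target)
    for i, allele in enumerate(target):
        if allele == "?":
            # Majority vote among the best-matching reference haplotypes.
            filled[i] = Counter(h[i] for h in closest).most_common(1)[0][0]
    return "".join(filled)


# Hypothetical five-site panel; sites 1 and 3 are missing from the "chip".
toy_panel = ["ACGTA", "ACGTA", "TCGAA", "TGCAT"]
toy_result = impute_haplotype("A?G?A", toy_panel)
```

A larger and more diverse reference panel gives the matching step more haplotypes to draw on, which is where a resource like 1000 Genomes comes in.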

It is no surprise, Schellenberg said, that the consortium found that rare variants tend to appear only in specific populations. However, the 1000 Genomes findings underscore how careful geneticists will have to be in matching control populations precisely to case populations. Scientists studying a rare variant must take care to show it is truly linked to disease, and not simply to the ethnicity or background of the population they are studying, he said. Large numbers of study samples will also be required to find rare, disease-linked variants, added Amouyel, who was not part of the 1000 Genomes team.

Researchers working on Alzheimer’s and other neurodegenerative diseases are doing just that. While GWAS have uncovered common variants that confer relatively low risk for AD, some researchers believe that rare variants conferring much higher risk remain to be found. Even if these polymorphisms only affect a few people or a single population, they can provide valuable clues about pathological pathways, Schellenberg said. For example, a newly discovered, protective APP variant in Icelanders helped support the β amyloid hypothesis (see ARF related news story on Jonsson et al., 2012).—Amber Dance

References

News Citations

  1. Next-Generation Sequencing: Boldly Going Where No Geneticist...
  2. ENCODE Turns Human Genome From Sequence to Machine
  3. Protective APP Mutation Found—Supports Amyloid Hypothesis

Paper Citations

  1. 1000 Genomes Project Consortium. A map of human genome variation from population-scale sequencing. Nature. 2010 Oct 28;467(7319):1061-73. PubMed.
  2. ENCODE Project Consortium et al. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012 Sep 6;489(7414):57-74. PubMed.
  3. Jonsson et al. A mutation in APP protects against Alzheimer's disease and age-related cognitive decline. Nature. 2012 Aug 2;488(7409):96-9. PubMed.

External Citations

  1. 1000 Genomes Project
  2. Encyclopedia of DNA Elements

Thousands of Whole Genomes to Be Mined for New Clues to AD

With its ever-dropping price tag, whole-genome sequencing is taking off in Alzheimer's research. Several new collaborative projects aim to use this technology to more deeply understand Alzheimer's disease and find genetic clues to better diagnostics and therapeutics. In the latest such venture, the Cure Alzheimer's Fund donated $5.4 million to Massachusetts General Hospital in Boston (see press release) to sequence 1,500 samples from a DNA repository collected from members of families with AD. Scientists from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) have also started sequencing the genomes of all the initiative's participants.

"There are several sequencing projects in progress or in the planning stages, so this is an exciting time for AD genetics research," said Alison Goate, Washington University in St. Louis, Missouri. "These sequencing efforts are important because they represent some of the first to apply whole-genome sequencing approaches in AD."

From the beginning of November this year until about mid-summer 2013, scientists led by Rudolph Tanzi of Mass General will partner with Illumina, Inc., to sequence all 1,500 genomes kept in the National Institute of Mental Health (NIMH) AD family sample. Most DNA in the repository comes from sporadic AD patients and their relations, but a small subset was taken from families who have early onset AD with unknown genetic mutations. In addition to these samples, the team will sequence genomes from a brain repository kept by Bradley Hyman and Teresa Gomez-Isla, also at Mass General. About 100 people who died with normal cognition but turned out to have significant AD pathology donated those tissues.

Comparing whole-genome sequences from those with and without AD or cognitive deficits will allow scientists to find more than just the genetic risk markers that genomewide association studies provide, said Tanzi. "The ultimate goal of any genetics experiment is to find the functional change in the DNA that matters biologically, and that requires whole-genome sequencing," he told Alzforum. "Now, the cost has come down enough that we can afford to sequence the whole genome in these large cohorts." Mass General will manage and analyze the data and provide access to scientists around the globe through NIH public databases.

The Mass General news follows last July’s announcement of a similar push within ADNI to sequence the genomes of all 800-plus participants. Scientists already have longitudinal neuroimaging and fluid biomarkers for these individuals. Researchers have sequenced more than 100 ADNI genomes thus far, and the rest may be completed within the next six months, said Arthur Toga, University of California School of Medicine in Los Angeles. Illumina will help sequence all the genomes, with funding from the Alzheimer's Association and the Brin Wojcicki Foundation, led by Google co-founder Sergey Brin and 23andMe co-founder Anne Wojcicki. After various quality control measures are complete, the data could be publicly available in the next nine months, though the time frame is not certain. Perhaps the greatest challenge is figuring out how to disseminate such large amounts of data, expected to reach nearly 200 terabytes, Toga said.

These large-scale projects add to at least one other large whole-genome sequencing effort afoot. The New York Genome Center and The Feinstein Institute for Medical Research announced in February that they, too, would partner with Illumina to sequence up to 1,000 AD patient genomes over four years. The goal is to compare these genomes to those from a group of healthy elderly controls to investigate genetic risk for AD and find the underlying molecular pathways of neuronal degeneration. Once compiled, data will be freely available to the scientific community.

"The most important aspect of each of these projects is that the data will be shared," said Goate. "It is very likely that we will need to combine sequencing data across multiple studies to have good power to identify rare variants of large effect."

Not only that, but having publicly available data means that many more scientists will be engaged in studies than otherwise, said Toga. "There are a lot of people who may have potentially creative and innovative ways of analyzing the data that maybe we didn't think of," he told Alzforum. "The more people we engage, the better the likelihood that these tough problems will be solved."—Gwyneth Dickey Zakaib

References

External Citations

  1. press release
  2. announcement
  3. announced
