1 October 2006. This week researchers from the Broad Institute at MIT and Harvard unveiled a public database of genomic signatures for 164 drugs and bioactive compounds in human cells. In a paper published in today’s Science, the group, led by Todd Golub and Eric Lander, dubs the data set the “connectivity map,” because the data, along with new pattern-matching software, allow researchers to make connections among genes, diseases, and drugs. Of particular interest, they used the information to show that the gene expression profile associated with an inhibitor of amyloid-β (Aβ) fibril formation was the mirror image of the expression signature of human Alzheimer disease brain. The map, which Golub and coworkers hope to expand to cover more compounds and more cell types, should facilitate the discovery and development of new drugs for many conditions.
In other big-picture studies, a paper this week describes an interesting pyramid scheme: that is the shape of transcription factor regulatory networks as analyzed by Mark Gerstein and Haiyuan Yu of Yale. This week also saw the public release of a huge genome-wide SNP analysis of Parkinson disease from Andrew Singleton and John Hardy at the NIH, and a host of collaborators.
The ultimate goal of the Broad group is to produce a large public database of gene signatures that can be queried to find common patterns associated with diseases, drugs, and physiological processes. To start, the researchers used gene expression in cultured human cells as a model system. Since the variations in cell types and treatment options are limitless, they settled on exposing an easily grown human breast cancer cell line (MCF7) to a selection of 164 different drugs and bioactive compounds for 6-12 hours. Over the course of a year, first author Justin Lamb and colleagues collected 564 sets of genome-wide mRNA expression profiles.
To generate an expression signature for each profile, they ranked the 22,000 genes measured in each sample according to how much their expression varied from control levels. Induced genes got a positive sign, repressed genes a negative one. This unitless scoring system allows comparison of data across experimental platforms. To query the database, they took expression data from published reports, generated a similar signature, and compared it to all the signatures in the database to rank the database profiles by similarity. If many of the query's induced genes appear near the top of a database profile's ranked list, and its repressed genes near the bottom, the researchers called that positive connectivity. On the other hand, if induced genes in the query appear near the bottom of a database profile and repressed genes near the top, there is an inverse relationship between the states, which they call negative connectivity. By assigning a connectivity score between +1 and -1 to the profiles, the researchers ranked the reference sets from the most strongly correlated to the most strongly anti-correlated.
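The ranking-and-matching idea can be illustrated with a short Python sketch. It is a deliberately simplified stand-in, not the statistic used in the Science paper: the function names, the log-ratio ranking, and the averaged rank-position score are all assumptions made for illustration, and the gene and drug names are toy data.

```python
import math
from typing import Dict, List, Set, Tuple


def ranked_signature(treated: Dict[str, float], control: Dict[str, float]) -> List[str]:
    """Order genes from most induced to most repressed relative to control.

    Assumption: expression values are positive, and change is measured as a
    log2 ratio. The article only says genes were ranked by how much they
    varied from control levels.
    """
    change = {g: math.log2(treated[g] / control[g]) for g in treated if g in control}
    return sorted(change, key=lambda g: change[g], reverse=True)


def connectivity_score(ranked_genes: List[str], up: Set[str], down: Set[str]) -> float:
    """Toy connectivity score in [-1, +1] (hypothetical, rank-position based).

    Each position in the reference ranking is mapped linearly from +1 (top,
    most induced) to -1 (bottom, most repressed). The score is half the
    difference between the mean position value of the query's induced genes
    and that of its repressed genes.
    """
    n = len(ranked_genes)
    if n < 2:
        raise ValueError("need at least two ranked genes")
    value = {g: 1.0 - 2.0 * i / (n - 1) for i, g in enumerate(ranked_genes)}

    def mean_value(genes: Set[str]) -> float:
        found = [value[g] for g in genes if g in value]
        return sum(found) / len(found) if found else 0.0

    return (mean_value(up) - mean_value(down)) / 2.0


def rank_profiles(
    profiles: Dict[str, List[str]], up: Set[str], down: Set[str]
) -> List[Tuple[str, float]]:
    """Sort reference profiles from most positively to most negatively connected."""
    scored = [(name, connectivity_score(genes, up, down)) for name, genes in profiles.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy reference database: each profile is a gene list, most induced first.
profiles = {
    "drug_x": ["A", "B", "C", "D", "E"],
    "drug_y": ["E", "D", "C", "B", "A"],
    "drug_z": ["C", "A", "E", "B", "D"],
}

# Toy query signature: gene A is induced, gene E is repressed.
results = rank_profiles(profiles, up={"A"}, down={"E"})
for name, score in results:
    print(f"{name}: {score:+.2f}")
```

In this toy query, the profile that places the induced gene at the top and the repressed gene at the bottom ranks first, and its mirror image ranks last, which is the pattern the article describes as positive and negative connectivity.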
To illustrate how the data can be used, the researchers showed several examples of possible queries and outcomes. Gene expression data from published studies on small molecules, such as histone deacetylase (HDAC) inhibitors, connected positively with database samples treated with other HDAC inhibitors. Query profiles of estrogen-treated cells connected positively to the estrogen-treated samples in the database, and negatively to an anti-estrogenic compound. The researchers also showed that they could detect similar gene profiles for a number of the common anti-psychotic phenothiazines, despite the fact that the data on gene expression was not collected in neurons. Thus, connections could be made despite differences in cell types and drug concentrations between the query and the database. Two papers published concurrently in Cancer Cell show in more detail the use of the database to discover the mechanisms of action of a novel compound and to identify novel actions of known drugs (Hieronymus et al., 2006; Wei et al., 2006).
To look for connections to disease states, the group queried the database with profiles derived from two independent reports of gene expression changes in Alzheimer disease brains. Although the two profiles had no genes in common, they both yielded negative connectivity with profiles of cells treated with the compound 4,5-dianilinophthalimide (DAPH). DAPH was discovered in an in vitro screen for small molecules that could inhibit formation of Aβ fibrils, and analogs have been produced as potential AD treatments (see ARF related news story and Hennessy et al., 2005). This example suggests that the method could be used to discover new drug candidates based on disease-specific perturbations in gene expression in AD and in other neurological diseases.
There are some limitations to the method. The researchers did not find connectivity in their database with some dopamine receptor antagonists, because the test cells (MCF7) lack dopamine receptors. Also, there is as yet no good way to judge the statistical significance of the connectivity scores.
The authors liken the new method to comparing DNA sequences: eventually, they would like to describe all biological states with transcriptional signatures. Calling the database the “first installment” of a reference collection of transcriptional profiles, they propose a community effort to expand the connectivity map to cover more cell types, more compounds, and more treatment variations. The current set of data is available on the Web at www.broad.mit.edu/cmap, where users can query the resource with their own signatures.
Transcriptional profiles are set in place by the combined action of many transcription factors, and the question of how these networks are shaped is the subject of another paper out this week, this one in PNAS Early Edition. Haiyuan Yu and Mark Gerstein from Yale University in New Haven, Connecticut, looked at the gene regulatory networks in Escherichia coli and yeast, and found that transcription factor hierarchies take on a pyramid shape. By analyzing which factors regulate, and are regulated by, other factors, they found there are a few big bosses at the top who get inputs from protein-protein interactions with cell signaling molecules. The top dogs then control a layer of middle managers, who regulate a larger class of low-level factors. In the scheme, the middle managers directly control more targets than the top level. And although the top factors are the most influential, they were often dispensable, whereas the lowest-level factors tended to be essential to cell survival. The authors speculate that this is because the top-level factors modulate pathways, while those at the bottom are solely responsible for turning on specific critical genes. The hierarchy resembled efficient corporate and government structures.
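One way to picture how a pyramid emerges from "who regulates whom" data is the Python sketch below. It is an illustration under stated assumptions, not the procedure from the PNAS paper: the function name, the bottom-up breadth-first layering, and the toy network of bosses, managers, and workers are all hypothetical.

```python
from collections import Counter, deque
from typing import Dict, Set


def hierarchy_levels(regulates: Dict[str, Set[str]]) -> Dict[str, int]:
    """Assign each transcription factor a level, counted from the bottom.

    Assumptions: `regulates` maps a factor to the other factors it controls,
    and self-regulation is ignored. Level 0 holds factors that regulate no
    other factor; a factor's level is its shortest distance to level 0.
    Factors that sit only in regulatory loops receive no level.
    """
    factors = set(regulates)
    for targets in regulates.values():
        factors |= targets

    # Reverse the edges so the search can walk upward from the bottom layer.
    regulators: Dict[str, Set[str]] = {f: set() for f in factors}
    for tf, targets in regulates.items():
        for target in targets:
            if target != tf:
                regulators[target].add(tf)

    bottom = sorted(f for f in factors if not (regulates.get(f, set()) - {f}))
    levels = {f: 0 for f in bottom}
    queue = deque(bottom)
    while queue:
        current = queue.popleft()
        for parent in regulators[current]:
            if parent not in levels:
                levels[parent] = levels[current] + 1
                queue.append(parent)
    return levels


# Toy network: one boss, two middle managers, four low-level factors.
toy_network = {
    "boss": {"manager1", "manager2"},
    "manager1": {"worker1", "worker2"},
    "manager2": {"worker3", "worker4"},
}

levels = hierarchy_levels(toy_network)
level_sizes = Counter(levels.values())
for level in sorted(level_sizes, reverse=True):
    print(f"level {level}: {level_sizes[level]} factor(s)")
```

Counting the factors at each level of the toy network gives one at the top, two in the middle, and four at the bottom, the kind of narrowing toward the top that the article calls a pyramid.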
Last but not least, in other news of the big-picture type, a multi-institute group of researchers, headed up by Andrew Singleton at the National Institute on Aging in Bethesda, Maryland, have published the first genome-wide single nucleotide polymorphism (SNP) map for Parkinson disease. With samples from 267 PD patients and 270 normal controls, the researchers genotyped more than 408,000 unique SNPs, generating 220 million genotypes. The study, which appeared online this week in the Lancet Neurology, did not replicate the SNPs implicated in a smaller study published last year (Maraganore et al., 2005), and indeed the new data identified no SNPs strongly associated with the risk of PD. However, the study did identify loci that would be candidates for further examination in larger studies with additional populations. As the authors conclude, “These data suggest that there is no common genetic variant that exerts a large genetic risk for late-onset Parkinson’s disease in white North Americans. These data are now available for future mining and augmentation to identify common genetic variability that results in minor and moderate risk for the disease.” This genotyping data represents the largest public release of high-density SNP data outside of the International HapMap Project.—Pat McCaffrey.
Lamb J, Crawford ED, Peck D, Modell JW, Blat IC, Wrobel MJ, Lerner J, Brunet JP, Subramanian A, Ross KN, Reich M, Hieronymus H, Wei G, Armstrong SA, Haggarty SJ, Clemons PA, Wei R, Carr SA, Lander ES, Golub TR. The Connectivity Map: using gene-expression signatures to connect small molecules, genes, and disease. Science. 2006 Sep 29;313(5795):1929-35.
Yu H, Gerstein M. Genomic analysis of the hierarchical structure of regulatory networks. Proc Natl Acad Sci U S A. 2006 Sep 26; [Epub ahead of print]
Fung H, Scholz S, Matarin M, Simón-Sánchez J, Hernandez D, Britton A, Gibbs JR, Langefeld C, Stiegert ML, Schymick J, Okun MS, Mandel RJ, Fernandez HH, Foote KD, Rodríguez RL, Peckham E, De Vrieze FW, Gwinn-Hardy K, Hardy JA, Singleton A. Genome-wide genotyping in Parkinson's disease and neurologically normal controls: first stage analysis and public release of data. Lancet Neurology. 2006 Sep 28; published online.
Hieronymus H, Lamb J, Ross KN, Peng XP, Clement C, Rodina A, Nieto M, Du J, Stegmaier K, Raj SM, Maloney KN, Clardy J, Hahn WC, Chiosis G, Golub TR. Gene expression signature-based chemical genomic prediction identifies novel class of HSP90 pathway modulators. Cancer Cell. 2006 Sep 29; immediate early publication.
Wei G, Twomey D, Lamb J, Schlis K, Agarwal J, Stam RW, Opferman JT, Sallan SE, den Boer ML, Pieters R, Golub TR, Armstrong SA. Gene expression-based chemical genomics identifies rapamycin as a modulator of MCL-1 and glucocorticoid resistance. Cancer Cell. 2006 Sep 29; immediate early publication.