Institute for Systems Biology
Data Generation

Analysis of the molecular interactions that underlie biological processes has been accelerated by the development of more comprehensive technologies for collecting data. Some of these technologies (e.g., DNA sequencing and gene chips) have been commercialized into automated, large-scale, high-throughput platforms. Others, such as those emerging in the proteomics field, remain labor-intensive and relatively low-throughput, yet powerful.

Because molecular networks integrate across different data types, there is no one-to-one correspondence among data type, technology, and the type of network analysis for which the data are collected. Molecular interactions are analyzed from a number of different perspectives in light of available experimental technologies, background knowledge, and the range of biological questions. Gene regulation, for example, entails DNA-protein interactions (transcription factors binding to cis-regulatory elements) and protein-protein interactions (transcription factors binding to each other and to cellular context-specific accessory proteins). Metabolic pathway analysis involves enzyme-substrate interactions and protein-protein macromolecular complexes. Cell signaling processes involve a complex interplay of metabolic and protein-protein interaction networks.

The technologies used to generate data for systems biology research can be grouped around different types of analyses.

  • Probing genetic frameworks: What is the genomic parts list of an organism? What genes interact in concert to regulate or create a molecular interaction network? How does genetic variation influence gene expression and protein function?

    Representative technologies: DNA sequencing, genotyping, large-scale gene deletion constructs, RNAi knockouts

  • Probing gene expression patterns: What genes are up-regulated or down-regulated in response to a genetic or environmental perturbation? What genes are expressed in what tissues under what conditions?

    Representative technologies: microarrays and DNA tagging procedures

  • Probing DNA-protein interactions: What genes does a particular transcription factor regulate under defined experimental conditions?

    Representative technology: chromatin immunoprecipitation combined with gene chips to localize binding sites (ChIP-chip)

  • Probing protein-protein interactions: What proteins are present in enzyme complexes, nuclear pore complexes, the cytoskeleton? Which proteins modify other proteins in signaling cascades?

    Representative technologies: two-hybrid-based interactions; affinity purification; mass spectrometry; quantitative proteomics

  • Probing subcellular protein localization: When during development is a protein made and where in the cell does it go?

    Representative technologies: cell sorting, molecular imaging based on reporter genes or antibody staining

Exploring a gene regulatory network, a cell signaling cascade, or a metabolic pathway might involve all of the above data-driven analyses.

For example, Dr. Aitchison's lab at the ISB has observed that yeast make more peroxisomes in the presence of oleic acid. The question is how. What genes are involved in peroxisome biogenesis? To answer this, Dr. Aitchison might use microarrays to examine which genes become more highly expressed when yeast are exposed to oleic acid, and then use deletion mutants to knock out candidate peroxisome genes chosen on the basis of the gene expression results. What regulatory circuits kick in to induce the expression of the relevant genes? Here, Dr. Aitchison might have noticed that the expression of a select group of transcription factors increased when the yeast were fed oleic acid. To see whether the set of genes potentially regulated by these transcription factors corresponds to the genes whose expression increased in the microarray experiments, he might do ChIP-chip and map the genomic locations of the observed transcription factor binding sites.

Suppose that Dr. Aitchison finds that two or three different transcription factors give similar ChIP-chip results, suggesting that perhaps they act in combination to induce peroxisome genes. This hypothesis might be tested by purifying the transcription factor protein complexes using antibody affinity methods and subjecting the complexes to analysis by mass spectrometry. Further experiments would be needed to determine the series of events by which proteins assemble into the macromolecular complexes that make up peroxisomes, and when and where in the cell these proteins are made. To understand what is going on in the cell to create peroxisomes, Dr. Aitchison will take an integrated approach by collecting several different types of data.

An exhaustive list of data generation technologies employed by systems biologists is beyond the scope of this overview; some representative technologies are briefly described below. Data generation technologies used in research performed at the ISB are described in accompanying links and on many of the project description pages provided elsewhere on the website.

DNA sequencing: used to delineate the genetic "parts list" of a species (i.e., genes, regulatory motifs, transposable elements, etc.) and to assess genetic variation within and between species (polymorphisms; comparative genomics). It is also used for constructing catalogues of transcripts (e.g., ESTs or full-length cDNAs) made from mRNAs extracted from various cell types or tissue types. EST/cDNA catalogues are especially useful for revealing the presence of alternative splice forms that might be tissue-specific.

Libraries of deletion mutants and RNAi gene knockouts: used to identify genes whose protein products are involved in specific molecular networks associated with gene regulation, macromolecular complex formation, cell signaling, and metabolic pathways. Synthetic lethal analysis is used to decipher genetic networks by examining the effects on the cell when pairs of genes are knocked out simultaneously. Knocking out either gene alone may have no phenotypic effect because of the robustness provided by genetic redundancy, but knocking out both genes has a severe, possibly lethal, effect.
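
The logic of synthetic lethal analysis can be illustrated with a minimal sketch. Everything in it is assumed for illustration only: the gene names, the relative growth values, the viability thresholds, and the function name `synthetic_lethal_pairs` are hypothetical and do not come from any ISB dataset.

```python
from itertools import combinations

# Hypothetical relative growth (1.0 = wild type) for single knockouts.
single_fitness = {"geneA": 0.95, "geneB": 0.97, "geneC": 0.60}

# Hypothetical relative growth for double knockouts.
double_fitness = {
    frozenset({"geneA", "geneB"}): 0.05,
    frozenset({"geneA", "geneC"}): 0.58,
    frozenset({"geneB", "geneC"}): 0.57,
}


def synthetic_lethal_pairs(single, double, viable=0.8, lethal=0.1):
    """Return pairs where each single knockout is viable but the double is not.

    The `viable` and `lethal` cutoffs are arbitrary illustrative thresholds.
    """
    hits = []
    for a, b in combinations(sorted(single), 2):
        pair = frozenset({a, b})
        if pair not in double:
            continue  # double knockout was not measured
        both_singles_viable = single[a] >= viable and single[b] >= viable
        if both_singles_viable and double[pair] <= lethal:
            hits.append((a, b))
    return hits


print(synthetic_lethal_pairs(single_fitness, double_fitness))
# [('geneA', 'geneB')]
```

In this toy example only the geneA/geneB pair qualifies, because geneC already shows a growth defect when knocked out on its own.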

Gene expression microarrays: used to assess global patterns of gene expression. RNA is extracted from cell or tissue samples and hybridized to oligonucleotides or longer DNA molecules that have been affixed to a chip. The manufacture of "slides" that contain all of the genes in a genome can be multiplexed, thereby allowing "apples-to-apples" comparison of changes in gene expression patterns across experimental conditions. When microarray data are subjected to clustering algorithms, inferences can be drawn regarding groups of genes whose protein products might participate in a molecular interaction network, such as a metabolic pathway.
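
As a rough illustration of how clustering groups genes with similar expression patterns, the sketch below links genes whose profiles are highly correlated. It is one simple approach among many (single-linkage grouping on Pearson correlation), not the method used by any particular lab; the gene names, expression values, threshold, and function names are hypothetical.

```python
import math

# Hypothetical log2 expression ratios across four experimental conditions.
profiles = {
    "gene1": [0.1, 1.2, 2.3, 3.1],
    "gene2": [0.0, 1.0, 2.1, 2.9],
    "gene3": [2.9, 2.0, 1.1, 0.2],
    "gene4": [3.0, 2.2, 0.9, 0.1],
}


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (must not be constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def cluster(profiles, threshold=0.9):
    """Group genes: a gene joins every cluster containing a correlated member."""
    clusters = []
    for gene in sorted(profiles):
        linked = [
            c for c in clusters
            if any(pearson(profiles[gene], profiles[m]) >= threshold for m in c)
        ]
        merged = {gene}
        for c in linked:
            merged |= c          # merge all clusters this gene links together
            clusters.remove(c)
        clusters.append(merged)
    return clusters


print(sorted(sorted(c) for c in cluster(profiles)))
# [['gene1', 'gene2'], ['gene3', 'gene4']]
```

Genes that rise together across conditions fall into one group and genes that fall together into another; whether such a group reflects a shared pathway is an inference that still needs experimental support.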

DNA tagging: sensitive methods used to quantitate gene expression profiles, thereby facilitating comparisons of effects between varying experimental conditions (e.g., normal vs. diseased tissue) or cellular responses to environmental perturbations. These techniques include serial analysis of gene expression (SAGE; see http://www.sagenet.org/findings/index.html) and massively parallel signature sequencing (MPSS; see http://www.lynxgen.com/wt/tert.php3?page_name=mpss).
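
Tag-based expression comparison can be pictured as a counting exercise. The following is a minimal sketch and not the SAGE or MPSS analysis pipeline itself: the tag sequences, the sample contents, and the function name `tags_per_million` are hypothetical, and scaling to tags per million is simply one normalization convention assumed here.

```python
from collections import Counter


def tags_per_million(tags):
    """Count identical tags and scale counts to a library size of one million."""
    counts = Counter(tags)
    total = sum(counts.values())
    return {tag: count * 1_000_000 / total for tag, count in counts.items()}


# Hypothetical sequenced tags from two samples.
normal = ["CATGAAAC"] * 3 + ["CATGTTTG"] * 1
diseased = ["CATGAAAC"] * 1 + ["CATGTTTG"] * 3

normal_tpm = tags_per_million(normal)
diseased_tpm = tags_per_million(diseased)

# Fold change of each tag in the diseased sample relative to normal.
for tag in sorted(normal_tpm):
    fold = diseased_tpm.get(tag, 0.0) / normal_tpm[tag]
    print(tag, round(fold, 2))
# CATGAAAC 0.33
# CATGTTTG 3.0
```

Normalizing by library size matters because two samples are rarely sequenced to the same depth; real analyses also need a statistical test, which this sketch omits.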

ChIP-chip microarrays: used to identify DNA-protein binding interactions that might be indicative of regulatory circuits. Typically, non-coding and non-repetitive genomic DNA in the vicinity of genes (e.g., 5' of the transcriptional start site, or 3' of the first exon) is affixed to a slide ("chip"). In the experimental cell or tissue sample, the proteins that are bound to the DNA are reversibly cross-linked to preserve the binding profiles, and the chromosomal DNA is fragmented by a procedure such as sonication. Antibody to a specific protein (e.g., a transcription factor that regulates a battery of genes) is added to the extract and the antibody-protein-DNA complex is precipitated ("chromatin immunoprecipitation", or "ChIP"). After the cross-links are reversed and the protein is removed, the precipitated DNA is hybridized to the gene chip, yielding a set of genes that is inferred to be regulated by the transcription factor examined under the experimental conditions assayed.
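
The set of genes inferred from ChIP-chip is often compared with the genes whose expression changes in a microarray experiment, as in the peroxisome example earlier on this page. A minimal sketch of that comparison follows; the transcription factor labels, gene names, and function names are hypothetical placeholders, not experimental results.

```python
# Hypothetical genes whose expression increased in a microarray experiment.
induced_genes = {"gene1", "gene2", "gene3", "gene5"}

# Hypothetical ChIP-chip target sets for two transcription factors.
chip_targets = {
    "TF_A": {"gene1", "gene2", "gene4"},
    "TF_B": {"gene2", "gene3"},
}


def supported_targets(chip_targets, induced):
    """For each factor, keep only bound genes that were also induced."""
    return {tf: sorted(targets & induced) for tf, targets in chip_targets.items()}


def shared_targets(chip_targets):
    """Genes bound by every factor, hinting at possible combinatorial control."""
    return sorted(set.intersection(*chip_targets.values()))


print(supported_targets(chip_targets, induced_genes))
# {'TF_A': ['gene1', 'gene2'], 'TF_B': ['gene2', 'gene3']}
print(shared_targets(chip_targets))
# ['gene2']
```

Overlap of this kind is suggestive rather than conclusive: binding near a gene does not by itself show that the factor regulates it.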

Yeast two-hybrid system: used to identify interacting partners by fusing a protein of interest to a DNA-binding domain and transforming a yeast host cell bearing a reporter gene controlled by this DNA-binding domain. Provided that this fusion protein cannot activate transcription on its own, it can be used as "bait" or as a "target" to screen a library of cDNA clones that are fused to an activation domain. The cDNA clones within the library that encode proteins capable of interacting with the bait are identified by virtue of their ability to cause activation of the reporter gene.

Affinity capture methods: used to purify protein complexes by taking advantage of a feature that allows the complex to be sequestered from a larger mixture, such as binding to specific antibodies or DNA. Glycocapture methods are currently being developed.

Mass spectrometry: used to identify peptide components of protein complexes by determining signatures based on their mass. Peptides are identified by searching the mass spectrometry fragmentation patterns against peptide/protein databases for which the predicted fragmentation patterns of amino acid sequences have been pre-computed (e.g., using SEQUEST).
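
To give a feel for mass-based identification, the sketch below computes the monoisotopic mass of candidate peptides and matches an observed mass within a tolerance. This is a deliberately simplified intact-mass lookup, not the fragmentation-pattern search that SEQUEST performs; the candidate list, the tolerance, and the function names are assumptions made for illustration.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added for the free N- and C-termini


def peptide_mass(sequence):
    """Monoisotopic mass of an unmodified, uncharged peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def match_mass(observed, candidates, tol_ppm=10.0):
    """Return candidates whose mass is within tol_ppm of the observed mass."""
    hits = []
    for seq in candidates:
        mass = peptide_mass(seq)
        if abs(mass - observed) / mass * 1e6 <= tol_ppm:
            hits.append(seq)
    return hits


# Hypothetical candidate peptides standing in for a sequence database.
candidates = ["PEPTIDE", "SAMPLER", "GELATIN"]
print(round(peptide_mass("PEPTIDE"), 4))
print(match_mass(799.3600, candidates))
# ['PEPTIDE']
```

Intact mass alone cannot distinguish peptides with the same composition (leucine and isoleucine, for instance, have identical masses), which is one reason fragmentation spectra are searched in practice.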

Quantitative proteomics: used to compare protein compositions in different samples by incorporating isotope tags that allow ratios to be measured on the mass spectrometer.
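
The ratio measurement can be sketched as follows, assuming each peptide is observed as a pair of peaks, one carrying a light tag (first sample) and one a heavy tag (second sample). The protein names, intensities, and the choice of a per-protein median of log2 ratios are illustrative assumptions, not a description of a specific ISB workflow.

```python
import math
from statistics import median

# Hypothetical (protein, light intensity, heavy intensity) peptide measurements.
peptide_pairs = [
    ("proteinX", 1000.0, 2000.0),
    ("proteinX", 500.0, 1000.0),
    ("proteinY", 800.0, 800.0),
]


def protein_log2_ratios(pairs):
    """Median log2(heavy / light) ratio per protein across its peptides."""
    by_protein = {}
    for protein, light, heavy in pairs:
        by_protein.setdefault(protein, []).append(math.log2(heavy / light))
    return {protein: median(values) for protein, values in by_protein.items()}


print(protein_log2_ratios(peptide_pairs))
# {'proteinX': 1.0, 'proteinY': 0.0}
```

A log2 ratio of 1.0 means the protein is twice as abundant in the heavy-labeled sample, and 0.0 means no change; using the median limits the influence of a single poorly measured peptide.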

Cell sorting: used to separate and isolate cells based on features such as size, shape, DNA content, and cell surface markers. This can be done using high-speed cell sorters or microfluidic devices.

Molecular imaging: used to localize proteins within a cell by antibody staining or to determine regulatory networks underlying gene expression by fusing a reporter gene to the promoter of a gene whose regulation is being investigated.

Alan Aderem


© 2007, Institute for Systems Biology, All Rights Reserved