Significance of Research
Recent advances in genetic technologies and the completion of the Human Genome Project have generated a wealth of information that is essential to understanding specific aspects of human biology and medicine. Alongside the goals of sequencing the entire human genome and identifying the genes associated with disease, there has been a strong drive to design and optimize technologies that could have a profound impact on drug discovery, development, and therapy within the pharmaceutical industry. Automated platforms such as the DNA microarray have enabled economical, high-throughput gene expression profiling, genotyping, and gene mapping for genomic research (Satoh, 2008). Site-directed mutagenesis has allowed protein scientists to identify the key amino acids that confer activity in specific proteins. In addition, several databases have been created to hold the protein and nucleic acid sequences reported by scientists around the world. Among these are the databases maintained by the National Center for Biotechnology Information (NCBI), the Protein Data Bank (PDB), and Ensembl. This proposal aims to model, analyze, and simulate complex biological networks using available software and database information.
Review of Background of the Study
Computational data and bioinformatics tools have been used in biology for the last few decades, and their use has led to the establishment of the specialty field of systems biology. This field uses computational analyses of the relationships between sequences to predict the functions of specific genes and proteins that have been isolated and sequenced. Such efforts have covered a significant portion of the human genome, and the use of model organisms has also facilitated the understanding of the processes that occur within biological systems. Genetic comparisons have revealed that the genomes of different species have undergone significant modifications as those species adapted to ever-changing environments while still transmitting their own genetic material. Computational analysis of different biological systems thus allows the genomes of different species to be compared, providing evidence of the events that are likely to have occurred during the course of evolution.
Models of evolution have fascinated scientists for almost a century, and a number of mechanisms have been proposed to date. One theory holds that chromosomal rearrangements have resulted in speciation. The most widely cited observation is the fusion of two ancestral chromosomes in the human lineage: the human genome is approximately 98% similar in nucleic acid sequence to that of the chimpanzee, yet it has one fewer chromosome pair. The fusion occurred telomere to telomere between the counterparts of chimpanzee chromosomes 12 and 13 (now designated 2A and 2B), which are represented in the human genome as chromosome 2. Another mechanism that has recently caught the attention of computational biologists is segmental duplication, in which segments of DNA sequence are copied and transposed to another location within the genome (Perry et al., 2008). Segmental duplications are evolutionarily recent, having arisen within the last few tens of millions of years, and the copies show high sequence homology, typically more than 90% sequence identity. The aim of this Ph.D. proposal is thus to analyze sequences deposited in nucleic acid and protein databases in order to propose mechanisms that may have been responsible for biological events such as evolution, speciation, and differentiation.
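The 90% similarity criterion for segmental duplications can be illustrated with a short calculation. The following Python sketch is only an illustration under stated assumptions: it takes two sequences that have already been aligned (with "-" marking gaps), counts identity over ungapped columns only, and ignores any length requirement. The function names and the default threshold parameter are hypothetical and are not taken from any published tool.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for base_a, base_b in zip(aligned_a.upper(), aligned_b.upper()):
        if base_a == "-" or base_b == "-":
            continue  # skip columns that contain a gap
        compared += 1
        if base_a == base_b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def meets_identity_threshold(aligned_a: str, aligned_b: str,
                             min_identity: float = 90.0) -> bool:
    """True if the pair reaches the assumed identity threshold (in percent)."""
    return percent_identity(aligned_a, aligned_b) >= min_identity


if __name__ == "__main__":
    # Toy sequences, far shorter than any real duplication.
    print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))
    print(meets_identity_threshold("ACGTACGTAC", "ACGTACGTAA"))
```

In practice the alignment itself would come from dedicated alignment software, and candidate duplications would also be filtered by length; this sketch covers only the identity calculation.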
The variation in genome size among species has raised questions regarding genome reduction. It was postulated earlier that the genomes of more recent species are remnants of the genomes of earlier species. More interestingly, certain organelles have been determined to be cellular inclusions that were introduced during the course of evolution. For example, the chloroplast has been proposed to be an endosymbiotic organelle that originated from a cyanobacterium which entered a eukaryotic cell lacking the machinery to generate its own nutrients, and it is thus under the control of the nucleus (Larkum et al., 2007). A couple of decades after it was put forward, this hypothesis was further strengthened by sequence data generated from genomes extracted from chloroplasts. In addition, the regulation of chloroplast activities by the dominant nucleus has been observed in most eukaryotic cells, suggesting that the nucleus is inherently equipped as a diploid genome with two alleles representing two copies of a single gene. This arrangement provides more complex machinery for handling the activities of the entire cell. With the nucleus carrying two homologous chromosomes per pair, either homolog can be passed on during meiosis through independent segregation. Nuclear genes therefore follow the simple Mendelian laws of assortment when gametes unite at fertilization.
However, although the chloroplast genome is much smaller than the nuclear genome, these endosymbiotic organelles contribute a significant amount of DNA to the eukaryotic cell (Lawrence 708). The exact contribution of the chloroplast to the total cellular genome depends on the plant species, ranging from at least 1,000 to almost 2,000 chloroplast genomes per cell. It has been estimated that this variation may be influenced by the genome content of the nucleus of the host eukaryotic cell (Millen et al., 2001). The ploidy of the eukaryotic cell also influences the size of the chloroplast genome, as well as the number of chloroplasts present in the cell. A single chloroplast may also contain several genome copies, so the number of copies of each gene may vary depending on several factors inherent to the cell. One example of such variation involves the genes encoding the enzyme ribulose bisphosphate carboxylase/oxygenase (RuBisCO). The enzyme is composed of two types of subunit, one large and one small. The large subunit is encoded by a gene that is present in several copies in the endosymbiotic organelle, while the small subunit is encoded by a gene located in the nucleus of the eukaryotic cell. Well-coordinated machinery is thus necessary to produce the complete enzyme for plant biological activities.
Specific Aims for Research
The goal of this proposal is to stimulate the use of systems biology concepts in addressing specific conditions and phenomena in nature. This proposal seeks to answer the following questions: How do biological patterns and components interact with other biological activities? How do gene-environment interactions influence the rate and occurrence of physiological activities?
A scientific investigation of the relationship between computational biology and biological events is proposed. These events include, but are not limited to, the occurrence of genomic rearrangements in a particular species. Protein and nucleic acid sequences will be extracted from existing genome and protein databases and analyzed using available software in order to model the dynamics of biological networks (Brown et al., 2008). A particular focus will be signaling pathways, such as those triggered by growth factors and transcription factors, as well as the estimation of the parameters that govern the initiation of such biochemical cascades, including association and dissociation constants. The research also aims to construct unique pathways by analyzing structural and sequence information. Experimental evidence will also be generated to further support the computational results.
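To make the role of association and dissociation constants concrete, the following Python sketch simulates a single reversible binding step, L + R <-> C, of the kind that could start a signaling cascade. It is a minimal illustration and not the method this proposal commits to: the mass-action rate law, the forward-Euler integrator, the function names, and all numerical values (rate constants, concentrations, time step) are assumptions chosen for demonstration.

```python
def dissociation_constant(k_on: float, k_off: float) -> float:
    """Equilibrium dissociation constant, K_d = k_off / k_on."""
    return k_off / k_on


def association_constant(k_on: float, k_off: float) -> float:
    """Equilibrium association constant, K_a = k_on / k_off = 1 / K_d."""
    return k_on / k_off


def simulate_binding(k_on, k_off, ligand0, receptor0, dt=1e-3, steps=20_000):
    """Forward-Euler integration of L + R <-> C under mass-action kinetics.

    Returns the final (ligand, receptor, complex) concentrations.
    """
    ligand, receptor, complex_ = ligand0, receptor0, 0.0
    for _ in range(steps):
        # Net rate of complex formation: binding minus unbinding.
        rate = k_on * ligand * receptor - k_off * complex_
        ligand -= rate * dt
        receptor -= rate * dt
        complex_ += rate * dt
    return ligand, receptor, complex_


if __name__ == "__main__":
    # Hypothetical values: concentrations in uM, k_on in 1/(uM*s), k_off in 1/s.
    k_on, k_off = 1.0, 0.5
    ligand, receptor, complex_ = simulate_binding(k_on, k_off, 2.0, 1.0)
    print("K_d from rate constants:", dissociation_constant(k_on, k_off))
    print("[L][R]/[C] at end of run:", ligand * receptor / complex_)
```

Once the simulated system has relaxed to equilibrium, the ratio [L][R]/[C] should agree with k_off / k_on, which gives a simple consistency check. Estimating these constants from experimental data would require fitting such a model to measurements, which this sketch does not attempt.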
Computational biology techniques will thus be employed in the development, understanding, and elucidation of the mechanisms behind specific biological outcomes. This rapidly developing field of molecular biology, in association with sequence informatics, has the potential to clarify the behavior of biological systems, particularly for phenomena in which protein or nucleic acid sequences are involved. The study designs will eventually be applied to in vivo biological systems involving model organisms such as the nematode (Caenorhabditis elegans), the fruit fly (Drosophila melanogaster), or the mouse (Mus musculus).
The interaction between nucleic acid sequences and proteins in biological systems is an understudied area of biology. This is surprising because, although both classes of biomolecule are complex, they hold the key to a better understanding of biological function. In addition, the interactions between proteins and nucleic acids, as well as those between proteins, have yet to be analyzed in enough depth to generate mechanisms that may explain the pathways influencing proper biological function, from survival to evolution. This research proposal aims to promote research excellence in integrative biology in order to provide answers in these understudied areas.
References
Brown M, He F and Yeung LF (2008): Robust measurement selection for biochemical pathway experimental design. Int J Bioinform Res Appl 4(4):400–416.
Larkum AWD, Lockhart PJ and Howe CJ (2007): Shopping for plastids. Trends in Plant Science 12:189–195.
Millen RS, Olmstead RG, Adams KL, Palmer JD and Lao NT (2001): Many parallel losses of infA from chloroplast DNA during angiosperm evolution with multiple independent transfers to the nucleus. Plant Cell 13:645–658.
Perry GH, Yang F, Marques-Bonet T, Murphy C, Fitzgerald T, Lee AS, Hyland C, Stone AC, Hurles ME, Tyler-Smith C, Eichler EE, Carter NP, Lee C and Redon R (2008): Copy number variation and evolution in humans and chimpanzees. Genome Res 18(11):1698–1710.
Satoh J (2008): Recent progress in bioinformatics for microarray analysis. Yakugaku Zasshi 128(11):1537–1545.