What is Bioinformatics?

The genomic era has seen an explosion in the amount of biological information available, driven by enormous advances in molecular biology and genomics. As a result, computers are now used to gather, store, analyze and integrate biological data. Bioinformatics is the application of computer science and statistics to the management and analysis of these data, particularly in molecular biology. It is an interdisciplinary research area at the interface between the biological and computational sciences. The primary goal of bioinformatics is to uncover the wealth of biological information hidden in the mass of data and to obtain a clearer insight into the fundamental biology of organisms. This new knowledge could have profound impacts on fields as varied as human health, agriculture, the environment, energy and biotechnology.


Bioinformatics now entails the creation and advancement of databases, algorithms, computational and statistical techniques, and theory to solve formal and practical problems arising from the management and analysis of biological data. Common activities in bioinformatics include mapping and analyzing DNA and protein sequences, aligning different DNA and protein sequences to compare them, and creating and viewing 3-D models of protein structures. Major research efforts in the field include sequence alignment (of DNA, RNA and protein sequences), gene finding, genome assembly, drug design, drug discovery, protein structure alignment, protein structure prediction, prediction of gene expression and protein-protein interactions, genome-wide association studies and the modeling of evolution.


Who introduced the term Bioinformatics?

The term bioinformatics was coined in 1978 by Paulien Hogeweg, a Dutch theoretical biologist, and her colleague Ben Hesper to describe the study of informatic processes in biotic systems. Since at least the late 1980s, its main use has been in genomics and genetics, particularly in those areas of genomics involving large-scale DNA sequencing.


What are the vital sub-disciplines within Bioinformatics and Computational biology?

The field of bioinformatics has evolved such that the most pressing task now involves the analysis and interpretation of various types of data, including nucleotide and amino acid sequences, protein domains, and protein structures. The actual process of analyzing and interpreting data is referred to as ‘computational biology’. Important sub-disciplines within bioinformatics and computational biology include:

  • Development and implementation of tools that enable efficient access to, and use and management of, various types of information.
  • Development of new algorithms and statistical methods with which to assess relationships among members of large data sets, such as methods to locate a gene within a sequence, predict protein structure and/or function, and cluster protein sequences into families of related sequences.


There are two basic ways of modeling a biological system (e.g., a living cell), both of which fall under the area of Bioinformatics:
I. Static:

  • Sequences - Proteins, Nucleic acids and Peptides.
  • Structures - Proteins, Nucleic acids, Ligands (including metabolites and drugs) and Peptides.
  • Interaction data among the above entities, including microarray data and networks of proteins and metabolites.

II. Dynamic:

  • Systems Biology falls under this category; it deals with reaction fluxes and the changing concentrations of metabolites.
  • Multi-agent based modeling approaches that capture cellular events such as signaling, transcription and reaction dynamics.
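
As a rough illustration of what a dynamic model tracks, the following Python sketch simulates the concentrations in a single first-order reaction A → B, where the reaction flux is k·[A]. The reaction itself, the rate constant `k`, the step size and the function name `simulate_first_order` are assumptions made up for this example; real systems biology models involve many coupled reactions and more careful numerical solvers than the simple Euler method used here.

```python
def simulate_first_order(a0, k, dt, steps):
    """Euler simulation of the reaction A -> B with flux = k * [A].

    a0    : initial concentration of A (B starts at 0)
    k     : rate constant (hypothetical value chosen by the caller)
    dt    : time step
    steps : number of Euler steps to take
    Returns a list of (time, [A], [B]) tuples.
    """
    a, b = float(a0), 0.0
    trajectory = [(0.0, a, b)]
    for step in range(1, steps + 1):
        flux = k * a          # reaction flux at the current concentrations
        a -= flux * dt        # A is consumed
        b += flux * dt        # B is produced at the same rate
        trajectory.append((step * dt, a, b))
    return trajectory


if __name__ == "__main__":
    for t, a, b in simulate_first_order(a0=1.0, k=0.5, dt=0.1, steps=100)[::20]:
        print(f"t={t:4.1f}  [A]={a:.4f}  [B]={b:.4f}")
```

A static description would record only what the molecules are (their sequences or structures); the trajectory returned here records how their amounts change over time.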


What are the key research areas in Bioinformatics?

  • Sequence analysis: The sequence or primary structure of a nucleic acid is the composition of atoms that make up the nucleic acid and the chemical bonds that connect those atoms; in practice it is written as the order of its nucleotides. Since 1977, the DNA sequences of thousands of organisms have been decoded and stored in databases. This sequence information is analyzed to determine genes that encode polypeptides (proteins), RNA genes, regulatory sequences, structural motifs, and repetitive sequences. A comparison of genes within a species or between different species can show similarities between protein functions, or relations between species. With the growing amount of data, it long ago became impractical to analyze DNA sequences manually. Today, computer programs such as “BLAST” are used daily to search sequences from more than 260,000 organisms, containing over 190 billion nucleotides (the basic structural units of the nucleic acids DNA and RNA). These programs can compensate for mutations (exchanged, deleted or inserted bases) in the DNA sequence, in order to identify sequences that are related but not identical. Sequence analysis also covers genome assembly. In shotgun sequencing, a genome is broken into many short fragments that are sequenced separately and then assembled computationally from their overlaps. For a genome as large as the human genome, it may take many days of CPU time on large-memory, multiprocessor computers to assemble the fragments, and the resulting assembly will usually contain numerous gaps that have to be filled in later. Shotgun sequencing is the method of choice for virtually all genomes sequenced today, and genome assembly algorithms are a critical area of bioinformatics research.
  • Genome annotation: In the context of genomics, annotation is the process of marking the genes and other biological features in a DNA sequence. The first genome annotation software system was designed in 1995 by Dr. Owen White. Dr. White built a software system to find the genes (places in the DNA sequence that encode a protein), the transfer RNA, and other features, and to make initial assignments of function to those genes. Most current genome annotation systems work similarly, but the programs available for analysis of genomic DNA are constantly changing and improving.
  • Computational evolutionary biology: Evolutionary biology is the study of the origin and descent of species, as well as their change over time. Informatics has assisted evolutionary biologists in several key ways. It has enabled researchers to trace the evolution of a large number of organisms by measuring changes in their DNA; more recently, to compare entire genomes, which permits the study of more complex evolutionary events such as gene duplication and horizontal gene transfer (the process in which an organism incorporates genetic material from another organism without being its offspring); and to track and share information on an increasingly large number of species and organisms.
  • Analysis of mutations in cancer: In cancer, the genomes of affected cells are rearranged in complex or even unpredictable ways. Massive sequencing efforts are used to identify previously unknown point mutations (changes to a single nucleotide) in a variety of genes in cancer. Bioinformaticians continue to produce specialized automated systems to manage the sheer volume of sequence data produced, and they create new algorithms and software to compare the sequencing results to the growing collection of human genome sequences and ‘germline polymorphisms’. The germline of a mature or developing individual is the line (sequence) of germ cells whose genetic material may be passed to a child. New physical detection technologies are also employed, such as ‘oligonucleotide’ microarrays to identify chromosomal gains and losses (a technique called comparative genomic hybridization), and single-nucleotide polymorphism arrays to detect known point mutations. These detection methods simultaneously measure several hundred thousand sites throughout the genome.
  • Virtual evolution: Artificial life or virtual evolution attempts to understand evolutionary processes via the computer simulation of simple (artificial) life forms.
  • High-throughput image analysis: Computational technologies are used to accelerate or fully automate the processing, quantification and analysis of large amounts of high-information-content biomedical imagery. Modern image analysis systems augment an observer's ability to make measurements from a large or complex set of images, by improving accuracy, objectivity, or speed. A fully developed analysis system may completely replace the observer.
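
The idea that alignment programs can compensate for exchanged, deleted or inserted bases can be sketched with a small global alignment routine. The Python example below uses the classic Needleman-Wunsch dynamic programming method, which is a simpler, exhaustive technique and not the heuristic search that BLAST itself performs. The scoring values (match +1, mismatch -1, gap -2) and the function name `global_alignment` are assumptions chosen for illustration.

```python
def global_alignment(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment of two sequences.

    Returns (score, aligned_a, aligned_b), where '-' marks a gap
    (an inserted or deleted base). Scoring values are illustrative.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against nothing costs one gap per character.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def pair(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the table: each cell is the best of substitution/match or a gap.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,             # gap in b
                score[i][j - 1] + gap,             # gap in a
            )

    # Trace back from the bottom-right corner to recover one best alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[len(a)][len(b)], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    s, x, y = global_alignment("ACGT", "AGT")
    print(s)
    print(x)
    print(y)
```

With these scores, the two sequences in the usage example are related by a single deleted base, and the alignment places a gap at that position instead of reporting a run of mismatches. Database search tools trade this exhaustive table for faster heuristics because the table grows with the product of the two sequence lengths.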


What are the key applications of Bioinformatics?

  • Prediction of protein structure: Protein structure prediction is another important application of bioinformatics. The amino acid sequence of a protein, the so-called primary structure, can be easily determined from the sequence of the gene that codes for it. In the vast majority of cases, this primary structure uniquely determines the structure the protein adopts in its native environment. One of the key ideas in bioinformatics is the concept of ‘homology’. In the genomic branch of bioinformatics, homology is used to predict the function of a gene: if the sequence of gene A, whose function is known, is homologous to the sequence of gene B, whose function is unknown, one could infer that B may share A's function. In the structural branch of bioinformatics, homology is used to determine which parts of a protein are important in structure formation and interaction with other proteins. In a technique called homology modeling, this information is used to predict the structure of a protein once the structure of a homologous protein is known. For a long time this was the only reliable way to predict protein structures. One example is the homology between hemoglobin in humans and the hemoglobin in legumes (leghemoglobin). Both serve the same purpose of transporting oxygen in the organism. Although the two proteins have very different amino acid sequences, their structures are virtually identical, which reflects their near-identical functions.
  • Protein-protein docking: A major open question for biologists is whether it is practical to predict possible protein-protein interactions (which occur when two or more proteins bind together) based only on the proteins' 3D shapes, without doing interaction experiments. A variety of methods have been developed to tackle the protein-protein docking problem, though much work remains to be done in this field. Protein-protein complexes are the most commonly attempted targets of such modeling, followed by protein–nucleic acid complexes.
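
The statement that a protein's primary structure can be read from the gene that codes for it can be illustrated by translating a DNA coding sequence codon by codon. The Python sketch below builds the standard genetic code table and translates from the first base until a stop codon. It assumes the input is already an intron-free coding sequence in the correct reading frame containing only the letters A, C, G and T; the names `CODON_TABLE` and `translate` are chosen for this example.

```python
# Standard genetic code, written in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each of the 64 codons to a one-letter amino acid code ('*' = stop).
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna):
    """Translate a coding DNA sequence into a protein sequence.

    Reads codons from the first base, stops at the first stop codon,
    and ignores an incomplete codon at the end. Any base other than
    A, C, G or T raises KeyError in this simplified sketch.
    """
    dna = dna.upper()
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("ATGGCCTAA"))
```

Predicting the three-dimensional structure from the resulting amino acid string is the hard part; this step only recovers the primary structure.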


What web services are available in Bioinformatics?

Basic bioinformatics services are classified by the European Bioinformatics Institute (EBI) into three categories: SSS (Sequence Search Services), MSA (Multiple Sequence Alignment) and BSA (Biological Sequence Analysis). The availability of these service-oriented bioinformatics resources demonstrates the applicability of web-based bioinformatics solutions.
