Clever Geek Handbook

Bioinformatics

Map of the human X chromosome (from the NCBI website). The assembly of the human genome is one of the greatest achievements of bioinformatics.

Bioinformatics is a set of methods and approaches [1], including:

  1. mathematical methods for the computer analysis of data in comparative genomics (genomic bioinformatics);
  2. the development of algorithms and programs for predicting the spatial structure of biopolymers (structural bioinformatics);
  3. the study of strategies and appropriate computational methodologies, as well as the general management of the information complexity of biological systems [2].

Bioinformatics uses methods of applied mathematics, statistics and computer science. It is used in biochemistry, biophysics, ecology and other fields.

The most commonly used tools and technologies in this field are the programming languages Java, C#, Perl, C, C++, Python and R; the markup language XML; the database language SQL; the parallel computing hardware and software architecture CUDA; MATLAB, an application package for technical computing together with the programming language of the same name; and spreadsheets.

Content

  • 1 Introduction
    • 1.1 History
    • 1.2 Goals
  • 2 Major research areas
    • 2.1 Genetic sequence analysis
    • 2.2 Genome annotation
    • 2.3 Computational evolutionary biology
    • 2.4 Biodiversity assessment
    • 2.5 Basic bioinformatics programs
  • 3 Bioinformatics and computational biology
  • 4 Structural bioinformatics
  • 5 See also
  • 6 Notes
  • 7 Literature

Introduction

Bioinformatics has become an important part of many areas of biology. In experimental molecular biology, bioinformatics techniques such as image and signal processing make it possible to extract useful results from large amounts of raw data. In genetics and genomics, bioinformatics helps with sequencing and annotating genomes and their observed mutations. It plays a role in text mining of the biological literature and in developing biological and gene ontologies for organizing and querying biological data. It also plays a role in the analysis of gene and protein expression and regulation. Bioinformatics tools help compare genetic and genomic data and, more generally, help in understanding the evolutionary aspects of molecular biology. At a more integrative level, it helps analyze and catalog the biological pathways and networks that are an important part of systems biology. In structural biology, it helps in the simulation and modeling of DNA, RNA and protein structures, as well as molecular interactions.

History

Recognizing the important role of information transfer, storage and processing in biological systems, Paulien Hogeweg and Ben Hesper introduced the term “bioinformatics” in 1970, defining it as the study of information processes in biotic systems [3] [4]. This definition places bioinformatics alongside biophysics (the study of physical processes in biological systems) and biochemistry (the study of chemical processes in biological systems) [3].

At the beginning of the "genomic revolution", the term "bioinformatics" was rediscovered and came to mean the creation and maintenance of databases for storing biological information.

Sequences. Computers became essential in molecular biology once protein sequences became available, after Frederick Sanger determined the sequence of insulin in the early 1950s. Comparing many sequences by hand proved impractical. A pioneer in this area was Margaret Oakley Dayhoff; David Lipman (director of the National Center for Biotechnology Information) called her "the mother and father of bioinformatics." Dayhoff compiled one of the first protein sequence databases, originally published in book form, and pioneered methods of sequence alignment and molecular evolution.

Genomes. As complete genome sequences became available, again thanks to the pioneering work of Frederick Sanger, the term “bioinformatics” was rediscovered to refer to the creation and maintenance of databases for storing biological information, such as nucleotide sequences (the GenBank database, 1982). Building such databases involved not only design issues but also the development of a convenient interface that lets researchers query existing data and submit new data. Once the data were publicly available, tools for processing them were quickly developed and described in journals such as Nucleic Acids Research, which published special issues on bioinformatics tools as early as 1982.

Goals

The main goal of bioinformatics is to improve the understanding of biological processes. What sets bioinformatics apart from other approaches is its focus on developing and applying computationally intensive methods to achieve this goal. Examples of such methods include pattern recognition, data mining, machine learning algorithms and visualization of biological data. Researchers' main efforts are directed at problems such as sequence alignment, gene finding (locating the regions of DNA that encode genes), genome assembly, drug design, drug discovery, protein structure alignment, protein structure prediction, prediction of gene expression and protein-protein interactions, genome-wide association studies and modeling of evolution.

Bioinformatics today involves the creation and improvement of databases, algorithms, computational and statistical methods and theories to solve practical and theoretical problems that arise in the management and analysis of biological data.

Major research areas

Genetic Sequence Analysis

 
Processing the huge amount of data produced by sequencing is one of the most important tasks of bioinformatics.

Since the genome of bacteriophage Φ-X174 was sequenced in 1977, the DNA sequences of a growing number of organisms have been determined and stored in databases. These data are used to infer protein sequences and regulatory sites. Comparing genes within a species or between species can reveal similarities in protein function or relationships between species (which is how phylogenetic trees can be built). As the volume of data grew, manual sequence analysis long ago became impossible. Today, computer programs are used to search the genomes of thousands of organisms, comprising billions of nucleotide pairs. Programs can match (align) similar DNA sequences in the genomes of different species; such sequences often have similar functions, and the differences between them result from small mutations such as substitutions of individual nucleotides, insertions of nucleotides, and their loss (deletions).

One variant of this alignment is used in the sequencing process itself. The so-called “shotgun sequencing” technique (used, for example, by The Institute for Genomic Research to sequence the first bacterial genome, Haemophilus influenzae) yields, instead of a complete nucleotide sequence, the sequences of short DNA fragments (each about 600-800 nucleotides long). The ends of the fragments overlap, and when properly aligned they give the complete genome. This method produces sequencing results quickly, but assembling the fragments can be quite challenging for large genomes: in the project to sequence the human genome, assembly took several months of computer time. The method is now used for almost all genomes, and genome assembly algorithms remain one of the most pressing problems in bioinformatics.
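As a rough illustration of the two computations described above, the Python sketch below counts the minimum number of substitutions, insertions and deletions separating two sequences (the Levenshtein edit distance) and merges overlapping shotgun reads greedily by their longest exact suffix-prefix overlap. The function names, the `min_len` threshold and the sample reads are hypothetical. Real aligners use substitution scores and gap penalties, and real assemblers must cope with sequencing errors, repeats and reads contained inside other reads, none of which this sketch handles.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of substitutions, insertions and deletions turning a into b
    (Levenshtein distance), via dynamic programming over prefixes."""
    prev = list(range(len(b) + 1))          # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]                           # i deletions turn a[:i] into ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                 # delete ca
                curr[j - 1] + 1,             # insert cb
                prev[j - 1] + (ca != cb),    # match (cost 0) or substitution (cost 1)
            ))
        prev = curr
    return prev[-1]


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a equal to a prefix of b, or 0 if shorter than min_len."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two fragments with the longest exact overlap.
    Returns the remaining contigs (a single string if every read was joined)."""
    contigs = list(reads)
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)


print(edit_distance("GATTACA", "GATACA"))                      # 1 (one deletion)
print(greedy_assemble(["ATGCGTAC", "CGTACGGA", "ACGGATTC"]))   # ['ATGCGTACGGATTC']
```

Greedy merging is only one assembly strategy; it is used here because it maps directly onto the "overlapping ends" idea in the text.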

Another application of computational sequence analysis is the automatic search for genes and regulatory sequences in a genome. Not all nucleotides in a genome encode protein sequences. For example, in the genomes of higher organisms, large segments of DNA do not explicitly encode proteins, and their functional role is in many cases unknown. Developing algorithms for identifying the protein-coding regions of a genome is an important task of modern bioinformatics.
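A very simplified sketch of gene finding in Python: it scans the three forward reading frames for open reading frames (ORFs), stretches that start with ATG and run to the first in-frame stop codon. The `min_codons` cutoff and the function name are assumptions for illustration. The sketch ignores the reverse strand, introns and the statistical models (such as those in GeneMark) that real gene finders rely on.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 30) -> list[tuple[int, int, int]]:
    """Find forward-strand ORFs: an ATG followed by the first in-frame stop codon.

    Returns (frame, start, end) tuples with 0-based, end-exclusive coordinates;
    the stop codon is included. Reverse strand, introns and alternative start
    codons are deliberately ignored in this sketch.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None                                    # position of the open ATG, if any
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:  # length in codons, stop included
                    orfs.append((frame, start, i + 3))
                start = None                            # look for the next ATG
    return orfs


print(find_orfs("CCATGAAATTTTAGGG", min_codons=2))  # [(2, 2, 14)]
```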

Bioinformatics helps link genomic and proteomic projects, for example by using DNA sequences to help identify proteins.
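One concrete link between DNA and protein data is translation by the genetic code. The sketch below, assuming the standard genetic code and a sequence that already starts in frame, builds the 64-codon table and translates a coding sequence. `X` marks codons it cannot read, and `to_stop` (a hypothetical parameter name) controls whether translation halts at the first stop codon.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...; "*" = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str, to_stop: bool = True) -> str:
    """Translate an in-frame coding sequence (DNA or RNA) into one-letter amino acids."""
    seq = dna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(seq) - 2, 3):        # a trailing partial codon is ignored
        aa = CODON_TABLE.get(seq[i:i + 3], "X")  # "X" for ambiguous codons such as "ANG"
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"))  # MAIVMGR
```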

Genome Annotation

In the context of genomics, annotation is the process of marking up genes and other features in a DNA sequence. The first genome annotation software system was created in 1995 by Owen White, a member of the team at The Institute for Genomic Research that sequenced and analyzed the first genome of a free-living organism to be decoded, that of the bacterium Haemophilus influenzae. White built a system for locating genes (stretches of DNA that determine the sequence of a particular polypeptide or functional RNA), tRNAs and other DNA features, and made the first functional assignments for these genes. Most modern genome annotation systems work in a similar way, but the programs available for genomic DNA analysis, such as GeneMark, which was trained and used to find protein-coding genes in Haemophilus influenzae, are constantly changing and improving.
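Annotations are commonly exchanged as tab-separated GFF3 records (nine columns: seqid, source, type, start, end, score, strand, phase, attributes, with 1-based inclusive coordinates). The sketch below writes a few annotated features in that layout. The `Feature` class, the `sketch` source name and the feature IDs are hypothetical, and attribute values are not percent-encoded as the GFF3 specification requires for special characters.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    """One annotated feature; field names mirror the nine GFF3 columns (hypothetical class)."""
    seqid: str
    source: str
    type: str              # feature type, e.g. "gene" or "tRNA"
    start: int             # 1-based, inclusive
    end: int               # 1-based, inclusive
    strand: str            # "+", "-" or "." if unstranded
    attributes: dict[str, str] = field(default_factory=dict)
    score: str = "."
    phase: str = "."       # must be 0, 1 or 2 for CDS features; "." otherwise


def to_gff3(features: list[Feature]) -> str:
    """Render features as GFF3 text, sorted by sequence and start position.
    Attribute values are written as-is (no percent-encoding)."""
    lines = ["##gff-version 3"]
    for feat in sorted(features, key=lambda x: (x.seqid, x.start)):
        attrs = ";".join(f"{key}={value}" for key, value in feat.attributes.items()) or "."
        lines.append("\t".join([
            feat.seqid, feat.source, feat.type, str(feat.start), str(feat.end),
            feat.score, feat.strand, feat.phase, attrs,
        ]))
    return "\n".join(lines) + "\n"


# Hypothetical features on a made-up sequence "contig1".
features = [
    Feature("contig1", "sketch", "tRNA", 500, 572, "-", {"ID": "trna1"}),
    Feature("contig1", "sketch", "gene", 100, 400, "+", {"ID": "gene1", "Name": "hypothetical_protein"}),
]
print(to_gff3(features), end="")
```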

Computational Evolutionary Biology

Evolutionary biology studies the origin and descent of species, as well as their change over time. Computer science helps evolutionary biologists in several ways:

  • tracing the evolution of a large number of organisms by measuring changes in their DNA, and not only in their anatomy or physiology;
  • comparing whole genomes (see BLAST), which makes it possible to study more complex evolutionary events such as gene duplication and horizontal gene transfer, and to predict factors important in bacterial speciation;
  • building computer models of populations to predict how the system behaves over time;
  • keeping track of publications containing information on a large number of species.
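Measuring change in DNA usually starts from an alignment. A minimal sketch, assuming two sequences already aligned to equal length (gaps written as `-`): it computes the proportion of differing sites (the p-distance) and the Jukes-Cantor correction d = -3/4 ln(1 - 4p/3), which accounts for multiple substitutions at the same site under that simple model. Other substitution models exist; this one is chosen only for brevity.

```python
import math


def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of differing sites between two aligned sequences, skipping gapped columns."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper()) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor(seq1: str, seq2: str) -> float:
    """Jukes-Cantor distance: estimated number of substitutions per site."""
    p = p_distance(seq1, seq2)
    if p >= 0.75:
        return math.inf          # saturated: the log argument would be <= 0
    return -0.75 * math.log(1 - 4 * p / 3)


print(p_distance("ACGTACGTAC", "ACGTACGTAA"))    # 0.1
print(jukes_cantor("ACGTACGTAC", "ACGTACGTAA"))  # about 0.107
```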

The area of computer science that uses genetic algorithms is sometimes confused with computational evolutionary biology, but the two are not necessarily related. Work in that area uses specialized software to improve algorithms and computations and is based on evolutionary principles such as replication, diversification through recombination or mutation, and survival through natural selection.
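For contrast with computational evolutionary biology, here is a minimal genetic algorithm in Python on a toy problem: maximizing the number of 1-bits in a bit string (often called OneMax). It shows the principles named above: replication (the best individual is copied unchanged), recombination (one-point crossover), mutation (random bit flips) and selection (tournaments). The objective and all parameter values are arbitrary illustrative choices.

```python
import random


def fitness(genome: list[int]) -> int:
    """Toy objective (OneMax): the number of 1-bits."""
    return sum(genome)


def crossover(a: list[int], b: list[int], rng: random.Random) -> list[int]:
    """One-point recombination of two parents."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(genome: list[int], rate: float, rng: random.Random) -> list[int]:
    """Flip each bit independently with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in genome]


def select(population: list[list[int]], rng: random.Random, k: int = 3) -> list[int]:
    """Tournament selection: the fittest of k randomly drawn individuals."""
    return max(rng.sample(population, k), key=fitness)


def evolve(n_bits=20, pop_size=30, generations=50, mutation_rate=0.02, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = [max(map(fitness, population))]
    for _ in range(generations):
        children = [max(population, key=fitness)]   # elitism: replicate the best unchanged
        while len(children) < pop_size:
            child = crossover(select(population, rng), select(population, rng), rng)
            children.append(mutate(child, mutation_rate, rng))
        population = children
        history.append(max(map(fitness, population)))
    return max(population, key=fitness), history


best, history = evolve()
print(fitness(best), history[0])
```

Because the best individual is always carried over, the best fitness per generation never decreases.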

Biodiversity Assessment

The biological diversity of an ecosystem can be defined as the complete genetic makeup of a particular environment, comprising all living species, whether in a biofilm in an abandoned mine, a drop of seawater, a handful of soil or the entire biosphere of the Earth. Databases are used to collect species names, descriptions, distribution ranges and genetic information. Specialized software is used to search, visualize and analyze this information and, more importantly, to share it with others. Computer simulators model phenomena such as population dynamics, or estimate the overall genetic health of a crop in agronomy. One of the most important prospects of this field is the analysis of DNA sequences, or complete genomes, of entire endangered species, which would make it possible to preserve the results of nature's genetic experiments in a computer and possibly reuse them in the future, even if those species become completely extinct.
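As one illustration of quantifying genetic diversity, the sketch below computes the expected heterozygosity H = 1 - Σ p_i² from allele observations at a locus and averages it over loci. The text does not name a specific metric; this common index, used here without the small-sample correction, is an assumption chosen for simplicity, and the allele labels are made up.

```python
from collections import Counter


def expected_heterozygosity(alleles: list[str]) -> float:
    """H = 1 - sum(p_i^2) over allele frequencies p_i at one locus
    (no small-sample correction)."""
    counts = Counter(alleles)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("no alleles observed")
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def mean_heterozygosity(loci: list[list[str]]) -> float:
    """Average H over several loci, a simple summary of a population's genetic diversity."""
    return sum(expected_heterozygosity(locus) for locus in loci) / len(loci)


# Hypothetical allele observations for two loci in one sample.
locus_1 = ["A", "A", "B", "B"]   # two equally common alleles -> 0.5
locus_2 = ["A", "A", "A", "A"]   # monomorphic locus -> 0.0
print(mean_heterozygosity([locus_1, locus_2]))  # 0.25
```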

Methods for assessing the other components of biodiversity, namely taxa (primarily species) and ecosystems, often fall outside the scope of bioinformatics. At present, the mathematical foundations of methods for analyzing taxa are developed within phenetics, or numerical taxonomy. Methods for analyzing ecosystem structure are the domain of specialists in fields such as systems ecology and biocenometry.
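Numerical taxonomy compares taxa by coding their characters and computing overall similarity. A minimal sketch, assuming binary characters coded as strings of 0 and 1: it computes the simple matching coefficient (the fraction of characters in which two taxa agree) for every pair of taxa. The taxon names and character codes are hypothetical; in practice such a matrix would feed a clustering method such as UPGMA.

```python
from itertools import combinations


def simple_matching(a: str, b: str) -> float:
    """Fraction of characters in which two taxa have the same state."""
    if len(a) != len(b):
        raise ValueError("taxa must be scored for the same characters")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def similarity_matrix(taxa: dict[str, str]) -> dict[tuple[str, str], float]:
    """Simple matching coefficient for every unordered pair of taxa."""
    return {
        (p, q): simple_matching(taxa[p], taxa[q])
        for p, q in combinations(sorted(taxa), 2)
    }


# Hypothetical taxa scored for six binary characters (1 = present, 0 = absent).
taxa = {"taxon_A": "110101", "taxon_B": "110100", "taxon_C": "001011"}
matrix = similarity_matrix(taxa)
print(max(matrix, key=matrix.get))  # ('taxon_A', 'taxon_B')
```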

Basic bioinformatics programs

  • ACT (Artemis Comparison Tool) - comparative genome analysis
  • Arlequin - analysis of population genetic data
  • Bioconductor - a large FLOSS project providing many separate packages for bioinformatics research; written in R
  • BioEdit - editor for multiple alignments of nucleotide and amino acid sequences
  • BioNumerics - commercial general-purpose software package
  • BLAST - search for related sequences in nucleotide and amino acid sequence databases
  • Clustal - multiple alignment of nucleotide and amino acid sequences
  • DnaSP - analysis of DNA sequence polymorphism
  • FigTree - phylogenetic tree editor
  • Genepop - population genetic analysis
  • Genetix - population genetic analysis (the program is available only in French)
  • JalView - editor for multiple alignments of nucleotide and amino acid sequences
  • MacClade - commercial program for interactive analysis of evolutionary data
  • MEGA - Molecular Evolutionary Genetics Analysis
  • Mesquite - comparative biology program written in Java
  • MUSCLE - multiple alignment of nucleotide and amino acid sequences; faster and more accurate than ClustalW
  • PAUP - phylogenetic analysis using parsimony (and other methods)
  • PHYLIP - phylogenetic software package
  • Phylo_win - phylogenetic analysis with a graphical interface
  • PopGene - analysis of population genetic diversity
  • Populations - population genetic analysis
  • PSI Protein Classifier - summarizes results obtained with PSI-BLAST
  • Seaview - phylogenetic analysis (with a graphical interface)
  • Sequin - submission of sequences to GenBank, EMBL and DDBJ
  • SPAdes - assembler of bacterial genomes
  • SplitsTree - program for building phylogenetic trees
  • T-Coffee - multiple progressive alignment of nucleotide and amino acid sequences; more sensitive than ClustalW/ClustalX
  • UGENE - free Russian-language tool for multiple alignment of nucleotide and amino acid sequences, phylogenetic analysis, annotation and work with databases
  • Velvet - genome assembler
  • ZENBU - generalization of results
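Tools such as BLAST begin a database search by finding short exact word matches (seeds) shared by the query and the database before extending and scoring them. The sketch below shows only this seeding step, with a hypothetical word length `k=4` and made-up sequences. It is not BLAST's implementation, which adds neighborhood words, ungapped and gapped extension, substitution matrices and significance statistics.

```python
from collections import defaultdict


def build_index(database: str, k: int) -> dict[str, list[int]]:
    """Map every k-letter word in the database sequence to its start positions."""
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(database) - k + 1):
        index[database[i:i + k]].append(i)
    return index


def find_seeds(query: str, database: str, k: int = 4) -> list[tuple[int, int]]:
    """Return (query_pos, database_pos) pairs where a k-letter word matches exactly."""
    index = build_index(database, k)
    return [
        (q, d)
        for q in range(len(query) - k + 1)
        for d in index.get(query[q:q + k], [])   # .get does not add missing keys
    ]


print(find_seeds("ACGT", "TTACGTGGACGT"))  # [(0, 2), (0, 8)]
```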

Bioinformatics and Computational Biology

In the broad sense, bioinformatics means any use of computers to process biological information. In practice, the definition is sometimes narrower: the use of computers to process experimental data on the structure of biological macromolecules (proteins and nucleic acids) in order to obtain biologically meaningful information. Following a change in the codes of scientific specialties in Russia (03.00.28 “Bioinformatics” became 03.01.09 “Mathematical Biology, Bioinformatics”), the scope of the term “bioinformatics” has expanded to include all applications of mathematical algorithms to biological objects.

The terms "bioinformatics" and "computational biology" are often used as synonyms, although the latter more often refers to the development of algorithms and specific computational methods. It is generally held that not every use of computational methods in biology is bioinformatics; for example, mathematical modeling of biological processes is not.

Bioinformatics uses methods of applied mathematics, statistics and computer science. Research in computational biology often overlaps with systems biology. The main efforts of researchers in this area are aimed at studying genomes, analyzing and predicting protein structure, analyzing and predicting the interactions of protein molecules with each other and with other molecules, and reconstructing evolution.

Bioinformatics and its methods are also used in biochemistry, biophysics, ecology and other fields. The common thread of bioinformatics projects is the use of mathematical tools to extract useful information from “noisy” or overly voluminous experimental data on the structure of DNA and proteins.

Structural Bioinformatics

Structural bioinformatics includes the development of algorithms and programs for predicting the spatial structure of proteins. Research topics in structural bioinformatics include:

  • X-ray diffraction analysis (XRD) of macromolecules
  • Quality indicators for macromolecular models built from XRD data
  • Algorithms for computing macromolecular surfaces
  • Algorithms for finding the hydrophobic core of a protein molecule
  • Algorithms for finding structural domains of proteins
  • Spatial alignment of protein structures
  • Structural classifications of protein domains: SCOP and CATH
  • Molecular dynamics
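Spatial alignment of protein structures is usually judged by the root-mean-square deviation (RMSD) between corresponding atoms. The sketch below performs only the translational part of superposition (moving both coordinate sets to their centroids) and then computes the RMSD. A full structural alignment would also find the optimal rotation, for example with the Kabsch algorithm, and would first establish which atoms correspond. The coordinates are made up.

```python
import math

Coord = tuple[float, float, float]


def centered(coords: list[Coord]) -> list[Coord]:
    """Translate a coordinate set so that its centroid is at the origin."""
    n = len(coords)
    cx = sum(x for x, _, _ in coords) / n
    cy = sum(y for _, y, _ in coords) / n
    cz = sum(z for _, _, z in coords) / n
    return [(x - cx, y - cy, z - cz) for x, y, z in coords]


def rmsd_after_centering(a: list[Coord], b: list[Coord]) -> float:
    """RMSD between corresponding atoms after removing translation only.
    A full superposition would also apply the optimal rotation (e.g. Kabsch)."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty sets of corresponding atoms")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(centered(a), centered(b))
    )
    return math.sqrt(squared / len(a))


# Made-up coordinates: b is a shifted by (5, 5, 5), so the RMSD is close to 0.
a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
b = [(x + 5, y + 5, z + 5) for x, y, z in a]
print(rmsd_after_centering(a, b))
```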

See also

  • Computational Biology
  • Mathematical biology
  • Chemoinformatics
  • International Society for Computational Biology
  • Gene ontology
  • Pangenome
  • List of Bioinformatics Scientific Journals

Notes

  1. E. Koonin. Nail Soup. A leading evolutionist spoke about the Multiverse and the anthropic principle // Lenta.ru, December 1, 2012.
  2. Ivan Y. Torshin. Bioinformatics in the Post-Genomic Era: The Role of Biophysics. Nova Publishers, 2006. ISBN 1-60021-048-1.
  3. Hogeweg P. The roots of bioinformatics in theoretical biology // PLoS Computational Biology. 2011. Vol. 7, no. 3. P. e1002021. DOI: 10.1371/journal.pcbi.1002021. PMID 21483479.
  4. Hesper B., Hogeweg P. Bioinformatica: een werkconcept // Kameleon. 1970. Vol. 1, no. 6. P. 28-29.

Literature

  • Jonathan Pevsner (2013) Bioinformatics and Functional Genomics
  • Jean-Michel Claverie Ph.D. (2007) Bioinformatics For Dummies. 2nd edition.
  • Durbin R., Eddy S., Krogh A., Mitchison G. "Analysis of biological sequences." Moscow-Izhevsk: Research Center “Regular and Chaotic Dynamics”, 2006. 480 p. ISBN 5-93972-559-7
  • Borodovsky M., Ekisheva S. "Tasks and solutions for the analysis of biological sequences." Moscow-Izhevsk: Research Center “Regular and Chaotic Dynamics”, 2008. 420 p. ISBN 978-5-93972-644-3
  • Setubal J., Meidanis J. "Introduction to Computational Molecular Biology." Moscow-Izhevsk: Research Center “Regular and Chaotic Dynamics”, 2007. 420 p. ISBN 978-5-93972-623-8
  • V. A. Talanov, Mathematical models for the synthesis of peptide chains and methods of graph theory in decoding genetic texts
Source - https://ru.wikipedia.org/w/index.php?title=Bioinformatics&oldid=101928101



Clever Geek | 2019