Chemoinformatics (chemical informatics, molecular informatics) is the use of informatics methods to solve chemical problems.
Areas of application of chemoinformatics include the prediction of physicochemical properties of chemical compounds (in particular, lipophilicity and water solubility), properties of materials, toxicological and biological activity, ADME/T and ecotoxicological properties, and the development of new drugs and materials.
The definition of chemoinformatics
The term "chemoinformatics" was coined by F. K. Brown [1] [2] in 1998:
Chemoinformatics is the combination of information resources to transform data into information and information into knowledge, with the aim of making better decisions faster in the identification and optimization of drug lead compounds.
This definition was later broadened by J. Gasteiger [3] [4]:
Chemoinformatics is the application of computer science methods to solve chemical problems.
G. Paris of Novartis gave the following definition of chemoinformatics [5]:
Chemoinformatics is a scientific discipline that encompasses the design, creation, organization, management, search, analysis, distribution, visualization and use of chemical information.
According to the definition given by A. Varnek and I. Baskin [6] [7]:
Chemoinformatics is a part of theoretical chemistry based on its own molecular model. Unlike quantum chemistry, in which molecules are represented as ensembles of electrons and nuclei, and unlike force-field-based molecular modeling, which deals with classical "atoms" and "bonds", chemoinformatics considers molecules as objects in chemical space.
The most complete and detailed definition of chemoinformatics as a scientific discipline is contained in the Obernai Declaration [8] :
Chemoinformatics is a scientific discipline that has arisen over the past 40 years in the boundary field between chemistry and computational mathematics. It was realized that in many areas of chemistry, the vast amount of information accumulated during chemical research can be processed and analyzed only using computers. Moreover, many of the problems in chemistry are so complex that they require new approaches based on the application of computer science methods. Based on this, methods have been developed for building databases of chemical compounds and reactions, for predicting the physical, chemical and biological properties of compounds and materials, for finding new drugs, analyzing spectral information, for predicting the course of chemical reactions and planning for organic synthesis.
Chemoinformatics and other sciences
- Chemoinformatics, along with quantum chemistry and molecular modeling, is a branch of theoretical chemistry and a field of computational chemistry.
- Chemoinformatics is closely related to bioinformatics, and there is no clear boundary between them. Bioinformatics can be considered a special case of chemoinformatics for biological macromolecules, and chemoinformatics an extension of bioinformatics to non-biological molecules. A number of areas, for example chemogenomics, relate equally to bioinformatics and chemoinformatics.
- At the intersection of chemoinformatics and pharmacology lies medicinal (pharmaceutical) chemistry.
- At the intersection of chemoinformatics and analytical chemistry lies chemometrics.
- The mathematical foundations of chemoinformatics that concern the representation of chemical compounds as molecular graphs are the subject of mathematical chemistry.
Basics
Chemoinformatics lies at the intersection of chemistry and computer science. It is based on the concept of chemical space: the totality of all available chemical objects (chemical compounds, reactions, mixtures, solutions, catalytic systems, materials, etc.). A distinctive feature of chemoinformatics is that it predicts the properties of chemical objects by transferring (interpolating) known property values from similar chemical objects. In most cases, chemical objects can be represented as molecular graphs, and graph theory methods are therefore widely used in chemoinformatics. The traditional approach to processing chemical information, however, is to map chemical space onto a descriptor space formed by vectors of molecular descriptors, that is, numerical characteristics calculated for each chemical object (in particular, for each molecular graph). This makes it possible to apply the methods of mathematical statistics and machine learning (including data mining) to chemical objects.
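The idea of transferring known property values from similar objects in descriptor space can be illustrated with a minimal nearest-neighbour sketch. The descriptor vectors, property values and the function name `predict_property` are hypothetical, and averaging over the k nearest neighbours is only one simple way to interpolate.

```python
import math


def predict_property(query, known, k=2):
    """Average the property over the k known objects nearest to the query."""
    # Rank known objects by Euclidean distance in descriptor space.
    ranked = sorted(known, key=lambda item: math.dist(query, item[0]))
    nearest = ranked[:k]
    return sum(value for _, value in nearest) / len(nearest)


# Hypothetical data: (descriptor vector, measured property value).
known = [
    ((1.0, 0.0), 10.0),
    ((2.0, 0.0), 20.0),
    ((9.0, 5.0), 90.0),
]

estimate = predict_property((1.5, 0.0), known, k=2)
print(estimate)  # 15.0
```

The query lies between the first two objects, so the distant third object does not influence the estimate.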
The basics of chemoinformatics are presented in textbooks [3] [9] [10] [11] [12] [13], monographs [4] [5] [14] [15] and review articles [1] [2] [7].
Main sections
Computer representation of chemical information
In chemoinformatics, molecular graphs are usually used for the internal representation of the structures of chemical compounds, which, if necessary, can be supplemented with information on the three-dimensional coordinates of atoms, as well as the dynamics of their change in time. Long-term storage of chemical information and its exchange between applications is carried out using files organized in accordance with the types of external representation of chemical information.
The simplest type of external representation of chemical structures is line notation, a string of characters. Historically, the first line notation was the Wiswesser Line Notation (WLN). Currently, the most common line notation is SMILES. Also in use are SLN (Sybyl Line Notation, Tripos, Inc.), which can additionally specify Markush structures; SMARTS, an extension of SMILES for search queries against chemical databases; and ROSDAL. IUPAC has proposed InChI as a universal line notation for encoding chemical structures.
The second type of external representation of chemical compounds and the reactions between them is based on direct encoding of the adjacency matrix of the molecular graph. Widely used formats such as MOL, SDF and RDF, which are currently standard for the exchange of chemical information, can be regarded as ways of representing the adjacency matrix of the molecular graph as a text file. Specialized formats such as MOL2, HIN and PCM, designed to work with common molecular modeling programs, serve the same purpose.
Finally, the third type of external representation of chemical compound structures is based on XML technology. The most common chemical information description language based on these principles is CML.
Computer representation of chemical information is discussed in detail in the manual [10].
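The adjacency-matrix view of a molecular graph mentioned above can be sketched as follows. The bond list is written by hand for the heavy atoms of acetic acid, the 0-based indexing is a choice made for this example, and the helper name `adjacency_matrix` is hypothetical.

```python
def adjacency_matrix(n_atoms, bonds):
    """Build a symmetric matrix of bond orders from (i, j, order) triples."""
    matrix = [[0] * n_atoms for _ in range(n_atoms)]
    for i, j, order in bonds:
        # A bond is undirected, so it is stored in both directions.
        matrix[i][j] = order
        matrix[j][i] = order
    return matrix


# Heavy-atom graph of acetic acid, CH3-C(=O)-OH; hydrogens are left implicit.
atoms = ["C", "C", "O", "O"]
bonds = [(0, 1, 1), (1, 2, 2), (1, 3, 1)]  # (atom index, atom index, bond order)

matrix = adjacency_matrix(len(atoms), bonds)
for symbol, row in zip(atoms, matrix):
    print(symbol, row)
```

Formats such as MOL store essentially the same information as a list of atoms followed by a list of bonds, rather than as a full matrix.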
Creation and management of chemistry databases
A distinctive feature of chemical database management is support for the following types of search specific to chemical information [10]:
- Exact structure search and duplicate control
- Substructure search
- Molecular similarity search
- Pharmacophore search
- Markush structure search
Software for working with databases of chemical structures (storage, search):
- ISIS/Host, ISIS/Base (www.mdli.com)
- ChemFinder, ChemOffice (www.cambridgesoft.com)
- JChem (www.chemaxon.com)
- THOR (www.daylight.com)
- MOE (www.chemcomp.com)
- ICM Pro (under MySQL) (www.molsoft.com)
- CheD (Sergey Trepalin)
- UNITY (www.tripos.com)
- OrChem (orchem.sourceforge.net)
- Bingo (ggasoftware.com/opensource/bingo)
- pgchem::tigress (pgfoundry.org/projects/pgchem)
Public databases containing chemical information:
- PubChem (pubchem.ncbi.nlm.nih.gov)
- ZINC (zinc.docking.org)
- NCI (129.43.27.140/ncidb2)
- DrugBank (www.drugbank.ca)
- BindingDB (www.bindingdb.org)
- DUD (dud.docking.org)
- ChemSpider (www.chemspider.com)
- ChEMBL (www.ebi.ac.uk)
- ChEBI (www.ebi.ac.uk)
Chemical databases are discussed in detail in the manual [11].
Prediction of the properties of chemical compounds and materials
Property prediction in chemoinformatics uses mathematical statistics and machine learning to build models that predict the properties of chemical compounds (physical, chemical, biological activity) from descriptions of their structures. Models that predict quantitative characteristics of biological activity have historically been called Quantitative Structure-Activity Relationship (QSAR) models. The acronym QSAR is often interpreted broadly to refer to any structure-property model.
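As a minimal sketch of a structure-property model, the following fits a straight line relating a single descriptor to an activity value by ordinary least squares. The training data are invented for illustration, the function name `fit_line` is hypothetical, and real QSAR models typically use many descriptors and more elaborate statistical or machine learning methods.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training set: one descriptor value and one activity per compound.
descriptor = [1.0, 2.0, 3.0, 4.0]
activity = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(descriptor, activity)

# Predict the activity of a new compound whose descriptor value is 5.0.
predicted = slope * 5.0 + intercept
print(slope, intercept, predicted)
```

With these invented data the activity is exactly linear in the descriptor, so the fitted line reproduces the training values.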
Pharmacophores and pharmacophore search
A pharmacophore is the set of spatial and electronic features necessary to ensure optimal supramolecular interactions with a specific biological target and to trigger (or block) its biological response. A pharmacophore search looks for a match between the pharmacophore description and the features of database molecules in their allowed conformations.
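One simple way to picture a pharmacophore search is as a check of pairwise distances between typed features of a single conformer. The sketch below makes several assumptions that are not part of the text above: the query is a list of feature types plus target distances, a fixed tolerance is used, and all names, coordinates and distances are hypothetical.

```python
import math
from itertools import permutations


def matches(pharmacophore, features, tolerance=0.5):
    """Return True if some assignment of conformer features fits the query.

    pharmacophore: {"types": [...], "distances": {(i, j): target distance}}
    features: list of (feature type, (x, y, z)) for one conformer
    """
    types = pharmacophore["types"]
    for candidate in permutations(features, len(types)):
        # Feature types must agree position by position.
        if any(feature[0] != wanted for feature, wanted in zip(candidate, types)):
            continue
        # Every required distance must be met within the tolerance.
        if all(
            abs(math.dist(candidate[i][1], candidate[j][1]) - target) <= tolerance
            for (i, j), target in pharmacophore["distances"].items()
        ):
            return True
    return False


query = {
    "types": ["donor", "acceptor", "aromatic"],
    "distances": {(0, 1): 3.0, (1, 2): 4.0, (0, 2): 5.0},
}
folded = [
    ("aromatic", (0.0, 4.0, 0.0)),
    ("donor", (3.0, 0.0, 0.0)),
    ("acceptor", (0.0, 0.0, 0.0)),
]
extended = [
    ("aromatic", (0.0, 8.0, 0.0)),
    ("donor", (3.0, 0.0, 0.0)),
    ("acceptor", (0.0, 0.0, 0.0)),
]

print(matches(query, folded), matches(query, extended))
```

In a database search the same check would be repeated over the allowed conformations of every molecule; the exhaustive permutation loop is only practical for the handful of features used here.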
Molecular Similarity and Search for Molecular Similarity
Molecular similarity (or chemical similarity) is the closeness of the structures of chemical compounds. A quantitative measure of molecular similarity is usually a quantity that increases as the distance between chemical compounds in descriptor space decreases. A chemical similarity search rests on the assumption that similar compounds have similar biological or catalytic activity.
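A widely used similarity measure, chosen here as an illustration and not named in the text above, is the Tanimoto coefficient between binary structural fingerprints. In this sketch each fingerprint is a set of "on" bit positions; the fingerprints and molecule names are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bits."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(fp_a & fp_b) / union


# Hypothetical fingerprints: each integer is the position of a set bit.
query = {1, 4, 7, 9}
database = {
    "mol_a": {1, 4, 7, 8},
    "mol_b": {2, 3},
    "mol_c": {1, 4, 7, 9},
}

# Similarity search: rank database entries by similarity to the query.
ranked = sorted(database, key=lambda name: tanimoto(query, database[name]), reverse=True)
print(ranked)  # ['mol_c', 'mol_a', 'mol_b']
```

The coefficient ranges from 0 (no bits in common) to 1 (identical fingerprints), so sorting in descending order places the most similar compounds first.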
Virtual Screening
Virtual screening is a computational procedure that includes an automated scan of a database of chemical compounds and selection of those for which the desired properties are predicted. Most often, virtual screening is used in the development of new drugs for the search for chemical compounds with the desired type of biological activity.
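The procedure described above can be sketched as a loop over a compound library that keeps the entries whose predicted score passes a threshold. The library, the scoring function `toy_model` (a stand-in for a trained model) and the threshold are all hypothetical.

```python
def virtual_screen(library, predict, threshold):
    """Return (name, score) pairs whose predicted score meets the threshold."""
    hits = []
    for name, descriptors in library.items():
        score = predict(descriptors)
        if score >= threshold:
            hits.append((name, score))
    # Best-scoring compounds first.
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


def toy_model(descriptors):
    """Placeholder for a trained model: a fixed weighted sum of two descriptors."""
    return 2.0 * descriptors[0] - 1.0 * descriptors[1]


# Hypothetical library: compound name -> descriptor vector.
library = {
    "cmpd_1": (3.0, 1.0),
    "cmpd_2": (1.0, 2.0),
    "cmpd_3": (4.0, 0.0),
}

hits = virtual_screen(library, toy_model, threshold=4.0)
print(hits)  # [('cmpd_3', 8.0), ('cmpd_1', 5.0)]
```

Passing the model in as a function keeps the screening loop independent of how the prediction is made, so a QSAR model, a similarity score or a pharmacophore match could be used in its place.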
Computer-aided synthesis design
Computer-aided synthesis design is a field of chemoinformatics that covers the methods, algorithms and computer programs that assist a chemist in planning the synthesis of organic compounds, predicting reaction outcomes, and designing new types of organic reactions by generalizing data on known synthetic transformations.
Visualization and study of chemical space
One of the central tasks of chemoinformatics is the visualization and mapping of chemical space, navigation within it, and identification of its unexplored regions [7]. The analysis of chemical space is usually based either on the representation of chemical objects (structures and reactions) as descriptor vectors of fixed size, or on the description of chemical objects by molecular graphs. In the latter case, scaffold trees (trees of molecular frameworks) are often used to represent chemical space.
Molecular design of chemical compounds with desired properties
One of the most important tasks of chemoinformatics is the molecular design of chemical compounds with desired properties. This means the directed generation of structures of chemical compounds (molecular graphs) that, according to various models, should possess one or several predetermined properties. When QSAR and QSPR models obtained by searching for quantitative structure-property relationships are used for this purpose, one speaks of "inverse QSAR", "inverse QSPR", or solving the inverse problem of the structure-property relationship [16]. These approaches rely on molecular graph generators. When a physical model describing ligand-protein interactions is used instead, one speaks of de novo design of chemical structures.
Scientific journals
- Journal of Chemical Information and Modeling
- Molecular Informatics
- Journal of Cheminformatics
- Journal of Computer-Aided Molecular Design
- SAR & QSAR in Environmental Research
See also
- Computational chemistry
- Medicinal chemistry
- Mathematical statistics
- Chemometrics
- Bioinformatics
- Machine learning
- Blue Obelisk
Notes
- [1] F. K. Brown. Chapter 35. Chemoinformatics: What is it and How does it Impact Drug Discovery // Annual Reports in Med. Chem. 1998. Vol. 33. P. 375. DOI: 10.1016/S0065-7743(08)61100-8.
- [2] Brown, Frank. Editorial Opinion: Chemoinformatics - a ten year update // Current Opinion in Drug Discovery & Development. 2005. Vol. 8, no. 3. P. 296-302.
- [3] Gasteiger J. (Editor), Engel T. (Editor): Chemoinformatics: A Textbook. John Wiley & Sons, 2003, ISBN 3-527-30681-1
- [4] Gasteiger, Johann (ed.) Handbook of Chemoinformatics. From Data to Knowledge. Wiley-VCH, Weinheim, 2003, in 4 volumes, ISBN 3-527-30680-3
- [5] Varnek A., Tropsha A. Chemoinformatics Approaches to Virtual Screening, RSC Publishing, 2008, ISBN 978-0-85404-144-2
- [6] Varnek, A. Chemoinformatics: recognition through teaching. Presented at 235th ACS National Meeting. New Orleans, Louisiana, April 6-10, 2008
- [7] Alexandre Varnek and Igor Baskin. Chemoinformatics as a Theoretical Chemistry Discipline // Molecular Informatics. 2011. Vol. 30, no. 1. P. 20-32.
- [8] Obernai Declaration
- [9] A. R. Leach, V. J. Gillet: An Introduction to Chemoinformatics. Springer, 2003, ISBN 1-4020-1347-7
- [10] Majidov T.I., Baskin I.I., Antipin I.S., Varnek A.A. Introduction to chemoinformatics: a training manual. Part 1. Computer representation of chemical structures, Kazan: Kazan University, 2013, ISBN 978-5-00019-131-6
- [11] Majidov T.I., Baskin I.I., Varnek A.A. Introduction to chemoinformatics: a training manual. Part 2. Chemical databases, Kazan: Kazan University, 2015, ISBN 978-5-00019-429-4
- [12] Baskin I.I., Majidov T.I., Varnek A.A. Introduction to chemoinformatics: a training manual. Part 3. Modeling structure-property, Kazan: Kazan University, 2015, ISBN 978-5-00019-442-3
- [13] Baskin I.I., Majidov T.I., Varnek A.A. Introduction to chemoinformatics: a training manual. Part 4. Methods of machine learning, Kazan: Kazan University, 2016, ISBN 978-5-00019-695-3
- [14] J. Bajorath, Chemoinformatics: Concepts, Methods, and Tools for Drug Discovery, Humana Press: Totowa, New Jersey, 2004, ISBN 1-58829-261-4
- [15] T. I. Oprea, Chemoinformatics in Drug Discovery, Wiley-VCH, 2005, ISBN 3-527-30753-2
- [16] I. I. Baskin, E. V. Gordeeva, R. O. Devdariani, N. S. Zefirov, V. A. Palyulin, M. I. Stankevich. Methodology for solving the inverse problem in the problem of the "structure-property" relationship for the case of topological indices (in Russian) // DAN SSSR. 1989. Vol. 307, no. 3. P. 613-616.
Links
- Repository of databases and network resources for chemoinformatics
- International Society of Chemoinformatics and QSAR
- Russian section of the International Society for Chemoinformatics and Analysis of Quantitative Structure-Activity Relationships
- Russian section of the International Society for Chemoinformatics and Analysis of Quantitative Structure-Activity Relationships - on SciPeople
- Chemoinformatics and QSAR UK
- French Society for Chemoinformatics
- QSAR World - Global QSAR Network Resource
- QSAR History (PDF)
- Institute of Chemoinformatics (India)
- Chemoinformatics Education Article at QSAR World
- "Chemoinformatics and molecular modeling"
- Chemoinformatics Facebook Group
- Interview with Timur Majidov: "Chemoinformatics is a breakthrough into the future"