Gene Ontology

Gene Ontology (GO) is a bioinformatics project dedicated to creating a unified terminology for annotating genes and gene products across all biological species [1].

The project aims to maintain and extend a controlled list of attributes of genes and their products, to compile annotations of genes and gene products, and to develop tools for working with the project database and for analyzing new experimental data, in particular for testing whether functional groups of genes are over-represented. Notably, the GO project created a markup language for classifying data (information about genes and their products, that is, RNA and proteins, and about their functions), which makes it possible to find systematic information about gene products quickly [2] [3] [4].

Gene Ontology is part of a larger classification project, Open Biomedical Ontologies (OBO) [5].

History and Current Status

In computer science, ontologies are used to formalize areas of knowledge by describing real-world objects and the relationships between them (a so-called knowledge base). Biology and related disciplines lacked a universal standard terminology. Terms for similar concepts could have fundamentally different meanings when used for different species, in different fields of research, or even by different groups of scientists, which made data exchange difficult. The task of the Gene Ontology project was therefore to create an ontology of terms that describe the properties of genes and their products and apply to any organism [2] [3] [4].

Gene Ontology was created in 1998 by a consortium of scientists studying the genomes of three model organisms: Drosophila melanogaster (fruit fly), Mus musculus (mouse) and Saccharomyces cerevisiae (baker's yeast) [6]. Databases for many other model organisms later joined the GO Consortium, expanding the annotation database and also building services for viewing and using the data.

The GO Consortium (GOC) is a group of biological databases and research groups actively involved in the Gene Ontology project [7]. It includes several model organism databases, general protein databases, software development groups, and the editors of the Gene Ontology.

Gene Ontology is a large-scale and rapidly growing project. As of September 2011, it contained more than 33 thousand terms and about 12 million gene product annotations covering more than 360 thousand organisms [2]. By the end of 2016, the number of terms exceeded 44 thousand, and the number of organisms annotated in the knowledge base exceeded 460 thousand [3].

In recent years, the GO Consortium has made a number of changes to the ontology to increase the number, quality, and specificity of GO annotations. By 2013, the number of annotations exceeded 96 million. Annotation quality was improved through automated quality control, and new terms were added [4]. In 2007, the InterMine service was created [8] to integrate genomic data from a large number of disparate sources and to simplify computational tasks such as searching for specific genomic regions and performing statistical tests. InterMine was originally built to integrate Drosophila data but now covers many model organisms. More recently, LEGO (Linked Expressions using the Gene Ontology) has been under development; it makes it possible to study how different annotations in the GO database relate to one another by combining them into larger models of genes and their functions [3].

Structure and Terms

Note that Gene Ontology describes complex biological phenomena rather than specific biological objects. The Gene Ontology database consists of three independent vocabularies (namespaces) [1] [9]:

  • Molecular function: classification by the specific activity of a gene product (protein or RNA) at the molecular level, for example, carbohydrate binding or ATPase activity.
  • Biological process: classification by a larger process, usually essential for the life of an organism, that is carried out through a series of molecular events, for example, mitosis or purine biosynthesis.
  • Cellular component: classification by the part of the cell or extracellular space where the gene product performs its function, for example, the nucleus or the ribosome.

Each GO term has a number of attributes: a unique numeric identifier, a name, the vocabulary to which the term belongs, and a definition. Terms can have synonyms, which are classified as exact, broad, narrow, or related. Terms may also carry references to sources, cross-references to other databases, and comments on the meaning and use of the term [1] [9].

The ontology is structured as a directed acyclic graph: each term is linked to one or more other terms by relationships of different types. The following relationship types are distinguished [1]:

  • "A is a B": A is a special case (subtype) of B,
  • "A part of B": A is a part of B,
  • "B has part A": B includes A,
  • "A regulates B": A regulates B,
  • "A positively regulates B": A positively regulates B,
  • "A negatively regulates B": A negatively regulates B,
  • "A occurs in B": A takes place in B.

An example of a GO term [10]:

  id: GO:0043417
  name: negative regulation of skeletal muscle tissue regeneration
  namespace: biological_process
  def: "Any process that stops, prevents, or reduces the frequency, rate or extent of skeletal muscle regeneration." [GOC:jl]
  synonym: "down regulation of skeletal muscle regeneration" EXACT []
  synonym: "down-regulation of skeletal muscle regeneration" EXACT []
  synonym: "downregulation of skeletal muscle regeneration" EXACT []
  synonym: "inhibition of skeletal muscle regeneration" NARROW []
  is_a: GO:0043416 ! regulation of skeletal muscle tissue regeneration
  is_a: GO:0048640 ! negative regulation of developmental growth
  relationship: negatively_regulates GO:0043403 ! skeletal muscle tissue regeneration
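The stanza above is plain `tag: value` text, so its fields can be read with a few lines of code. The following is a minimal sketch, assuming only the tags that appear in the example (`id`, `name`, `namespace`, `def`, `synonym`, `is_a`, `relationship`). It ignores the rest of the OBO grammar, such as escape sequences, trailing qualifiers, and multiple stanzas per file. The function name `parse_term_stanza` is hypothetical.

```python
import re

def parse_term_stanza(text):
    """Parse one OBO-style term stanza into a dict (simplified, illustrative only)."""
    term = {"synonyms": [], "is_a": [], "relationships": []}
    for raw_line in text.strip().splitlines():
        line = raw_line.strip()
        if not line or line.startswith("["):
            continue  # skip blank lines and stanza headers such as [Term]
        tag, _, value = line.partition(": ")
        value = value.split(" ! ")[0].strip()  # drop the trailing "! name" comment
        if tag == "synonym":
            # synonym: "text" SCOPE [xrefs], where SCOPE is EXACT, BROAD, NARROW or RELATED
            match = re.match(r'"(.*)"\s+(\w+)', value)
            term["synonyms"].append((match.group(1), match.group(2)))
        elif tag == "def":
            term["def"] = re.match(r'"(.*)"', value).group(1)  # keep the quoted text only
        elif tag == "is_a":
            term["is_a"].append(value)
        elif tag == "relationship":
            rel_type, target = value.split()
            term["relationships"].append((rel_type, target))
        else:
            term[tag] = value  # id, name, namespace, ...
    return term

STANZA = """
id: GO:0043417
name: negative regulation of skeletal muscle tissue regeneration
namespace: biological_process
def: "Any process that stops, prevents, or reduces the frequency, rate or extent of skeletal muscle regeneration." [GOC:jl]
synonym: "down regulation of skeletal muscle regeneration" EXACT []
synonym: "down-regulation of skeletal muscle regeneration" EXACT []
synonym: "downregulation of skeletal muscle regeneration" EXACT []
synonym: "inhibition of skeletal muscle regeneration" NARROW []
is_a: GO:0043416 ! regulation of skeletal muscle tissue regeneration
is_a: GO:0048640 ! negative regulation of developmental growth
relationship: negatively_regulates GO:0043403 ! skeletal muscle tissue regeneration
"""

term = parse_term_stanza(STANZA)
print(term["id"], term["namespace"], term["is_a"])
```

For real files, a dedicated OBO parsing library is a safer choice than a hand-written reader like this one.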

The Gene Ontology database is continually amended and expanded, both by the GO curators and by other researchers. Amendments proposed by users are reviewed by the project editors and applied if approved [9].

A file containing the entire database [10] can be downloaded in various formats from the official Gene Ontology website, and the terms can be browsed online with the AmiGO Gene Ontology browser. AmiGO can also retrieve the set of gene products associated with a particular term. The site also provides mappings of GO terms to other classification systems [11].

Annotations

Genome annotation aims to describe the properties of gene products, and GO annotations use Gene Ontology terms for this purpose. Members of the GO Consortium submit their annotations to the Gene Ontology website, where they are available for direct download or for browsing in AmiGO [12].

An annotation contains the following data: the name and identifier of the gene product; the corresponding GO term; the type of evidence on which the annotation is based (evidence code); a reference to the source; and the creator and creation date of the annotation. Evidence codes, which indicate how reliable an annotation is, have their own ontology within the OBO project [13]. It covers both manual and automatic annotation methods, for example [1]:

  • IDA (Inferred from Direct Assay): experimental data.
  • TAS (Traceable Author Statement): a statement in a scientific publication that can be traced to its source.
  • IMP (Inferred from Mutant Phenotype): based on a mutant phenotype.
  • IGI (Inferred from Genetic Interaction): based on genetic interaction.
  • IPI (Inferred from Physical Interaction): based on physical interaction.
  • RCA (Inferred from Reviewed Computational Analysis): based on reviewed computational analysis.
  • ISS (Inferred from Sequence Similarity): based on sequence similarity.
  • IGC (Inferred from Genomic Context): based on genomic context.
  • IEP (Inferred from Expression Pattern): based on the expression pattern.
  • NAS (Non-traceable Author Statement): based on an author statement that cannot be traced to a primary source.
  • IEA (Inferred from Electronic Annotation): based on automatic transfer of annotations from other databases.
  • IC (Inferred by Curator): inferred by a curator.
  • ND (No biological Data available): no biological data are available.

As of September 2012, more than 99% of all GO annotations had been obtained automatically [4]. Because such annotations are not checked manually, the GO Consortium considers them less reliable, and only some of them are available in the AmiGO browser. The complete annotation database can be downloaded from the Gene Ontology website.
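Every annotation carries its evidence code, so separating automatic IEA annotations from the rest only requires checking one column. The sketch below assumes annotations in the tab-separated GO Annotation File layout (GAF 2.x). In that layout, lines starting with `!` are headers, column 5 holds the GO ID, and column 7 holds the evidence code. The two sample records and the helper names are made up for illustration.

```python
import io

# Two made-up GAF 2.2 records (tab-separated); only columns 2, 5 and 7 are used below.
GAF_SAMPLE = (
    "!gaf-version: 2.2\n"
    "UniProtKB\tEXAMPLE1\tgeneA\tlocated_in\tGO:0005634\tPMID:0000001\tIDA\t\tC"
    "\t\t\tprotein\ttaxon:9606\t20120901\tExampleDB\n"
    "UniProtKB\tEXAMPLE2\tgeneB\tenables\tGO:0003824\tGO_REF:0000002\tIEA\t\tF"
    "\t\t\tprotein\ttaxon:9606\t20120901\tExampleDB\n"
)

GO_ID_COL, EVIDENCE_COL = 4, 6  # 0-based positions of GAF columns 5 (GO ID) and 7 (evidence)

def read_gaf(handle):
    """Yield (gene product ID, GO ID, evidence code) per data row, skipping '!' header lines."""
    for line in handle:
        if not line.strip() or line.startswith("!"):
            continue
        fields = line.rstrip("\n").split("\t")
        yield fields[1], fields[GO_ID_COL], fields[EVIDENCE_COL]

def split_manual_and_automatic(records):
    """Separate IEA (automatic, not manually reviewed) annotations from all others."""
    manual = [r for r in records if r[2] != "IEA"]
    automatic = [r for r in records if r[2] == "IEA"]
    return manual, automatic

records = list(read_gaf(io.StringIO(GAF_SAMPLE)))
manual, automatic = split_manual_and_automatic(records)
print(f"{len(automatic) / len(records):.0%} of {len(records)} annotations are IEA")
```

Applied to a downloaded annotation file instead of the inline sample, the same filter yields the manually supported subset.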

AmiGO

AmiGO [9] is a web application (a GO service) that lets users query, search, and visualize GO terms and gene product annotations. It also includes the BLAST tool (available in AmiGO 1, removed in AmiGO 2), tools for analyzing large data sets, and an interface for querying the GO database directly [14]. AmiGO can be used online on the Gene Ontology website to access data provided by the GO Consortium, or it can be downloaded and installed locally for use with any database built on GO principles. AmiGO 2 is open-source, free software.

Data Exploration

Visualization

Visualization lets the user build a graph showing the ontology around a specific GO term. Two input formats are supported [15]:

  • Standard format: a list of GO term identifiers (e.g. GO:1234567) separated by spaces.
  • Advanced format: a description of the graph nodes in JSON (JavaScript Object Notation). The properties given for a node change how it is displayed (additional annotations, colours, etc.).

JSON input example:

  {"GO:0002244": {"title": "foo",
                  "body": "bar",
                  "fill": "#ccccff",
                  "font": "#0000ff",
                  "border": "red"},
   "GO:0005575": {"title": "alone",
                  "body": ""},
   "GO:0033060": {}}
 
Visualization of a GO term
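The advanced input format is ordinary JSON keyed by GO identifier, so it can be generated instead of typed by hand. The following is a minimal sketch using Python's standard `json` module. The property names (`title`, `body`, `fill`, `font`, `border`) are copied from the JSON example. The helper `node_style` is hypothetical.

```python
import json

def node_style(title=None, body=None, fill=None, font=None, border=None):
    """Build one node description, keeping only the properties that were actually given."""
    props = {"title": title, "body": body, "fill": fill, "font": font, "border": border}
    return {key: value for key, value in props.items() if value is not None}

nodes = {
    "GO:0002244": node_style(title="foo", body="bar", fill="#ccccff",
                             font="#0000ff", border="red"),
    "GO:0005575": node_style(title="alone", body=""),
    "GO:0033060": node_style(),  # no properties: default appearance
}

payload = json.dumps(nodes, indent=2)
print(payload)
```

Generating the input this way guarantees well-formed JSON. Typos such as a stray space inside a colour code are easy to introduce by hand.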

Colour coding of relationships:

  Relationship            Colour
  is_a                    blue
  part_of                 light blue
  develops_from           brown
  regulates               black
  negatively_regulates    red
  positively_regulates    green

Visualizing a term means building a graph from the node of the selected GO term up to the root node, which is one of the three top-level vocabularies: biological process, molecular function, or cellular component [1] [9].
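Walking from a term up to its root is a traversal of the directed acyclic graph described under Structure and Terms. The following is a minimal sketch that collects every upward edge with breadth-first search and tags it with a colour from the relationship colour table. The three edges leaving GO:0043417 come from the example term. The links from its parents straight to the biological_process root GO:0008150 are simplified shortcuts that skip intermediate terms, for illustration only.

```python
from collections import deque

EDGE_COLOUR = {
    "is_a": "blue",
    "part_of": "lightblue",
    "develops_from": "brown",
    "regulates": "black",
    "negatively_regulates": "red",
    "positively_regulates": "green",
}

# child -> [(relationship, parent)]. Edges to GO:0008150 are simplified shortcuts.
PARENTS = {
    "GO:0043417": [("is_a", "GO:0043416"), ("is_a", "GO:0048640"),
                   ("negatively_regulates", "GO:0043403")],
    "GO:0043416": [("is_a", "GO:0008150")],
    "GO:0048640": [("is_a", "GO:0008150")],
    "GO:0043403": [("is_a", "GO:0008150")],
}

def edges_to_root(term):
    """Return (child, relationship, parent, colour) for every edge reachable upward from term."""
    seen, queue, edges = {term}, deque([term]), []
    while queue:
        child = queue.popleft()
        for relation, parent in PARENTS.get(child, []):
            edges.append((child, relation, parent, EDGE_COLOUR[relation]))
            if parent not in seen:  # visit each ancestor once, even if reached by several paths
                seen.add(parent)
                queue.append(parent)
    return edges

for edge in edges_to_root("GO:0043417"):
    print(*edge)
```

The `seen` set matters because, in a DAG, one ancestor can be reached along several paths. Here the root is reached three times but expanded only once.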

Data Overview

Besides building graphs of the ontology around a GO term, AmiGO provides several tools that give the user an overview of the GO data, including [14]:

  • Basic statistics: information about GO data in the form of histograms (for example, the distribution of annotations, experimental or non-experimental, across different organisms). Implemented with the Plotly service.
  • Drill-down browser: lets the user explore ontologies and annotations by moving down the hierarchy from the top level. Various filters can be applied.
  • Search templates: an interface of input fields for running typical queries against the GO database.

GOOSE

GOOSE [16] is an online SQL query environment that AmiGO users can use to build data sets. It accepts SQL queries against the GO database. To reduce the load on the system, mirrors are also available: EBI (Cambridge, UK), and Berkeley BOP and Berkeley BOP (lite), both in Berkeley, California.

Besides writing a query by hand, you can start from a template. A typical query is shown below; it finds the maximum depth of the cellular component tree [16]:

  -- Maximum depth of the cellular_component tree
  SELECT graph_path.distance AS max_distance
  FROM graph_path, term
  WHERE graph_path.term2_id = term.id
    AND term.term_type = 'cellular_component'
  ORDER BY graph_path.distance DESC
  LIMIT 1;
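The query can be tried offline against a toy copy of the two tables it uses. The following is a minimal sketch using Python's built-in `sqlite3` module. It assumes only the columns the query names (`term.id`, `term.term_type`, `graph_path.term2_id`, `graph_path.distance`), plus an assumed `acc` label and a `term1_id` ancestor column. The rows are invented; the real GO tables have more columns and far more data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Toy versions of the two tables used by the query (illustrative schema and rows).
    CREATE TABLE term (id INTEGER PRIMARY KEY, acc TEXT, term_type TEXT);
    CREATE TABLE graph_path (term1_id INTEGER, term2_id INTEGER, distance INTEGER);

    INSERT INTO term VALUES
        (1, 'cc_root', 'cellular_component'),
        (2, 'cc_child', 'cellular_component'),
        (3, 'cc_grandchild', 'cellular_component'),
        (4, 'bp_root', 'biological_process'),
        (5, 'bp_deep_term', 'biological_process');

    -- (ancestor, descendant, number of edges between them)
    INSERT INTO graph_path VALUES
        (1, 1, 0), (1, 2, 1), (2, 3, 1), (1, 3, 2),
        (4, 4, 0), (4, 5, 5);
""")

MAX_DEPTH_SQL = """
    SELECT graph_path.distance AS max_distance
    FROM graph_path, term
    WHERE graph_path.term2_id = term.id
      AND term.term_type = 'cellular_component'
    ORDER BY graph_path.distance DESC
    LIMIT 1
"""

(max_depth,) = conn.execute(MAX_DEPTH_SQL).fetchone()
print(max_depth)  # 2: the deeper biological_process path is filtered out
```

The deliberately deeper biological_process path (distance 5) shows that the `term_type` filter, not the ordering alone, determines the answer.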

The GO database has a complex structure and consists of many tables. The main databases are [16]:

  • termdb: GO terms and the relationships between them.
  • assocdb: the GO vocabulary plus annotations linking GO terms to gene products. Depends on termdb.
  • seqdb: GO terms, gene products, and the sequences of the annotated gene products. Depends on termdb and assocdb. A seqdblite database, which contains no IEA annotations, is also provided.

Query results can be exported in the following formats [16]:

  • RDF-XML
  • OBO-XML
  • OWL
  • tables
  • SQL

Data Analysis

PANTHER

PANTHER (Protein ANalysis THrough Evolutionary Relationships) is a large database of gene and protein families and their functionally related subfamilies, which can be used to classify gene products by function [17]. PANTHER is part of the GO project, and its main purpose is to classify proteins and their genes.

The PANTHER database is curated not only by project staff but also by classification algorithms. Proteins are classified by family and subfamily, molecular function, and biological process [17].

The main use of PANTHER is to infer the functions of uncharacterized genes in any organism from their evolutionary relationships with genes whose functions are recorded in the database. By combining gene function data, the ontology, and statistical methods, PANTHER lets biologists analyze large data sets, such as whole genomes obtained by sequencing or gene expression studies [18].

The main tools available on the PANTHER website are [18]:

  • Gene list analysis:
    • Functional classification of genes: information about the family and subfamily of the genes, their molecular functions, the biological processes they take part in, and the cellular components where they are found. The results can be shown as a list or as a pie chart.
    • Statistical tests (overrepresentation test and enrichment test): designed to identify biological functions shared by the genes submitted by the user.
  • Browsing the ontologies and the annotations linking terms to PANTHER families and subfamilies.
  • Searching for protein sequences in the PANTHER libraries.
  • Coding SNP (cSNP) analysis: estimating the probability that a non-synonymous single-nucleotide substitution changes the functional activity of the gene product.
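An overrepresentation test asks whether a GO term annotates more genes in the user's list than chance would predict. The following is a minimal sketch of one common formulation, the one-sided hypergeometric test, which gives the same p-value as a one-sided Fisher's exact test. It does not reproduce PANTHER's exact procedure, reference lists, or multiple-testing correction. The numbers in the example are made up.

```python
from math import comb

def overrepresentation_p(k, n, K, N):
    """One-sided p-value P(X >= k) for X ~ Hypergeometric(N, K, n).

    k: genes in the user's list annotated to the GO term
    n: size of the user's list
    K: genes in the reference list annotated to the term
    N: size of the reference list
    """
    hits = range(k, min(n, K) + 1)  # every outcome at least as extreme as the observed one
    return sum(comb(K, i) * comb(N - K, n - i) for i in hits) / comb(N, n)

# Made-up numbers: 3 of 4 submitted genes carry a term that covers 5 of 20 reference genes.
expected = 4 * 5 / 20  # 1.0 gene expected by chance
p = overrepresentation_p(k=3, n=4, K=5, N=20)
print(f"expected {expected}, observed 3, p = {p:.4f}")  # p = 155/4845, about 0.0320
```

In practice such a test is run for every term, so the resulting p-values need correction for multiple testing before any term is called enriched.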

GO Slimmer

GO Slimmer [19] is a tool that maps the detailed annotations of a set of genes to one or more higher-level parent terms (GO slim terms). A GO slim is a cut-down version of the ontology that contains a subset of the full GO terms and omits the detailed low-level terms.

GO Slimmer is useful for summarizing the GO annotations of a genome or for analyzing the results of expression microarrays or cDNA collections when a broad overview of gene product functions is needed [19].

The tool's output consists of three columns [19]:

  • The GO slim term.
  • The number of gene products in the query annotated to that slim term.
  • The ontology the term belongs to: biological process (P), cellular component (C), or molecular function (F).
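Mapping to a slim amounts to replacing each annotated term with the slim terms among its ancestors, then counting distinct gene products per slim term. The following is a minimal sketch of that idea on a hypothetical ontology fragment. The terms `T1` to `T5` and the gene names are placeholders, not GO data. The sketch is not the map2slim script, whose exact mapping rules may differ. It returns rows in the three-column layout described above.

```python
from collections import defaultdict

# Hypothetical ontology fragment: term -> parent terms, plus each term's aspect letter.
PARENTS = {"T1": [], "T2": [], "T3": ["T1"], "T4": ["T3"], "T5": ["T2"]}
ASPECT = {"T1": "P", "T3": "P", "T4": "P", "T2": "F", "T5": "F"}
SLIM = {"T1", "T2"}  # the higher-level slim terms
ANNOTATIONS = [("gene1", "T4"), ("gene1", "T5"), ("gene2", "T3"),
               ("gene3", "T5"), ("gene4", "T1")]

def ancestors_or_self(term, parents):
    """All terms reachable upward from term, including term itself."""
    found, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(parents.get(current, []))
    return found

def map_to_slim(annotations, slim_terms, parents, aspect):
    """Return (slim term, number of distinct gene products, aspect) rows, sorted by term."""
    products = defaultdict(set)
    for gene, term in annotations:
        for slim in ancestors_or_self(term, parents) & slim_terms:
            products[slim].add(gene)  # a set, so each gene product counts once per slim term
    return sorted((slim, len(genes), aspect[slim]) for slim, genes in products.items())

for row in map_to_slim(ANNOTATIONS, SLIM, PARENTS, ASPECT):
    print(*row, sep="\t")
```

Counting distinct gene products rather than annotations keeps a gene with several detailed terms under the same slim term from being counted twice.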

The AmiGO version of this tool is based on the map2slim Perl script [19]. The project curators note that the GO Slimmer service is heavily loaded and that very large inputs can degrade its performance. Processing time per request is limited.

BLAST

BLAST (Basic Local Alignment Search Tool) is a family of programs that use sequence alignment to find homologs of proteins or nucleic acids with known sequences. With BLAST, a researcher can compare a query sequence against the sequences in a database and find the most similar ones, which are putative homologs.

In AmiGO 1, this tool is implemented with the WU-BLAST package developed at Washington University in St. Louis [20].

AmiGO 2 no longer includes this tool (GO BLAST), but the search is still available in AmiGO 1. The tool lets users filter search results by gene product, database, taxon, GO vocabulary, and OBO annotation.

Term Matrix

Term Matrix [21] is an AmiGO tool for examining how many gene products GO terms share. Its output is a matrix whose elements are the numbers of gene products annotated to each pair of GO terms. To use it [21], enter a list of GO identifiers to see their co-annotations, that is, the number of gene products annotated to both terms of each pair. Results can be restricted to particular species or taxa. The heat map can be coloured either as a black-to-white gradient or with the standard map palette.
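Each cell of the matrix is the size of the intersection of two terms' gene product sets. The following is a minimal sketch, assuming annotations are already available as a mapping from term ID to the set of annotated gene products. The term IDs and gene products below are placeholders. Whether annotations inherited from descendant terms should be counted is left out of this sketch.

```python
# Placeholder term IDs and gene products, for illustration only.
ANNOTATED = {
    "TERM_A": {"g1", "g2", "g3"},
    "TERM_B": {"g2", "g3", "g4"},
    "TERM_C": {"g5"},
}
TERMS = ["TERM_A", "TERM_B", "TERM_C"]

def term_matrix(term_ids, annotated):
    """Cell [i][j] = number of gene products annotated to both term_ids[i] and term_ids[j]."""
    return [[len(annotated.get(a, set()) & annotated.get(b, set())) for b in term_ids]
            for a in term_ids]

for term_id, row in zip(TERMS, term_matrix(TERMS, ANNOTATED)):
    print(term_id, row)
```

The diagonal holds each term's own gene product count, and the matrix is symmetric. Restricting the analysis to one species would amount to filtering the gene product sets before the intersections are taken.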

OBO-Edit

OBO-Edit [22] is an open-source ontology editor developed and maintained by the GO Consortium. It is written in Java and uses a graph-based approach to visualizing and editing ontologies. OBO-Edit has a convenient search and filtering interface for visualizing and isolating subsets of GO terms, and the interface can be customized to user preferences. OBO-Edit can also automatically create new links based on existing relationships and their properties. Although OBO-Edit was designed for biomedical ontologies, it can be used to view and edit any ontology.

PAINT

PAINT [23] (Phylogenetic Annotation and INference Tool) is a Java application that is part of the Reference Genome Annotation Project and is based on the principle of "transitive annotation". Transitive annotation means assigning the experimentally established function of one gene to another gene because their nucleotide sequences are similar.

With PAINT, a user can examine the experimental annotations of genes in a single family and use this information to infer new annotations for members of the family that have not yet been sufficiently studied [3]. PAINT can build a model that explains the inheritance or loss of a particular gene function along individual branches of a phylogenetic tree. New annotations obtained from such a model are called IBA (Inferred from Biological aspect of Ancestor) annotations [1].
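The inference step can be pictured on a toy gene tree. A function is treated as present at a chosen ancestral node, and every descendant leaf inherits it, except on branches marked as having lost the function. The sketch below works under those simplifying assumptions. The tree, the gene names, and the `lost` set are hypothetical. The choice of ancestor and of lost branches is given as input, whereas in PAINT a curator makes these decisions.

```python
# Hypothetical gene tree: internal node -> children; names without children are genes.
TREE = {"anc": ["clade1", "geneD"], "clade1": ["geneA", "geneB", "geneC"]}

def leaves_under(node, tree, lost):
    """Leaves below node, skipping any subtree whose branch is marked as having lost the function."""
    if node in lost:
        return []
    children = tree.get(node)
    if not children:
        return [node]
    result = []
    for child in children:
        result.extend(leaves_under(child, tree, lost))
    return result

def infer_iba(ancestor, tree, experimental, lost=frozenset()):
    """Genes that inherit the ancestor's function and lack experimental support (new IBA calls)."""
    return [gene for gene in leaves_under(ancestor, tree, lost) if gene not in experimental]

# geneA has experimental support; the function is placed at "anc" and lost on the geneD branch.
print(infer_iba("anc", TREE, experimental={"geneA"}, lost={"geneD"}))
```

Marking a branch as lost removes the whole subtree below it, which mirrors the idea that a function lost in an ancestor is not inherited by its descendants.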

The application is freely available for download on GitHub.

See also

  • Ontology (computer science)
  • Open Biomedical Ontologies

Notes

  1. du Plessis L., Skunca N., Dessimoz C. The what, where, how and why of gene ontology: a primer for bioinformaticians. Brief Bioinform. 2011 Nov;12(6):723–735. DOI: 10.1093/bib/bbr002. PMID 21330331.
  2. The Gene Ontology Consortium. The Gene Ontology: enhancements for 2011. Nucleic Acids Res. 2012 Jan;40(Database issue):D559–D564. DOI: 10.1093/nar/gkr1028. PMID 22102568.
  3. The Gene Ontology Consortium. Expansion of the Gene Ontology knowledgebase and resources. Nucleic Acids Res. 2017 Jan;45(D1):D331–D338. DOI: 10.1093/nar/gkw1108.
  4. The Gene Ontology Consortium. Gene Ontology annotations and resources. Nucleic Acids Res. 2013 Jan;41(Database issue):D530–D535. DOI: 10.1093/nar/gks1050. PMID 23161678.
  5. Smith B., Ashburner M., Rosse C., Bard J., Bug W., Ceusters W., Goldberg L.J., Eilbeck K., Ireland A., Mungall C.J., Leontis N., Rocca-Serra P., Ruttenberg A., Sansone S.A., Scheuermann R.H., Shah N., Whetzel P.L., Lewis S. The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nature Biotechnology (Nature Publishing Group). 2007 Nov;25(11):1251–1255. DOI: 10.1038/nbt1346. PMID 17989687.
  6. Ashburner M., Ball C.A., Blake J.A., Botstein D., Butler H., Cherry J.M., Davis A.P., Dolinski K., Dwight S.S., Eppig J.T., Harris M.A., Hill D.P., Issel-Tarver L., Kasarskis A., Lewis S., Matese J.C., Richardson J.E., Ringwald M., Rubin G.M., Sherlock G. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000 May;25(1):25–29. DOI: 10.1038/75556. PMID 10802651.
  7. The GO Consortium (web page).
  8. Smith R.N., Aleksic J., Butano D., Carr A., Contrino S. InterMine: a flexible data warehouse system for the integration and analysis of heterogeneous biological data. Bioinformatics. 2012 Dec 1;28(23):3163–3165. ISSN 1367-4803. DOI: 10.1093/bioinformatics/bts577.
  9. Carbon S., Ireland A., Mungall C.J., Shu S., Marshall B., Lewis S.; AmiGO Hub; Web Presence Working Group. AmiGO: online access to ontology and annotation data. Bioinformatics. 2009 Jan;25(2):288–289. DOI: 10.1093/bioinformatics/btn615. PMID 19033274.
  10. The GO Consortium. The Gene Ontology database in .obo format (OBO 1.2 flat file).
  11. The GO Consortium. Mappings of External Classification Systems to GO. Link no longer accessible. Accessed 9 May 2014. Archived 25 June 2014.
  12. The GO Consortium. Search annotations.
  13. The Open Biological and Biomedical Ontologies: Evidence Codes. Archived 26 November 2009.
  14. AmiGO user manual.
  15. The GO Consortium. Manual: Visualization.
  16. The GO Consortium. Manual: GOOSE.
  17. Mi H., Huang X., Muruganujan A., Tang H., Mills C., Kang D., Thomas P.D. PANTHER version 11: expanded annotation data from Gene Ontology and Reactome pathways, and data analysis tool enhancements. Nucleic Acids Res. 2016 Nov 28;45(Database issue):D183–D189. DOI: 10.1093/nar/gkw1138.
  18. The GO Consortium. Manual: PANTHER.
  19. The GO Consortium. Manual: GO Slimmer.
  20. The GO Consortium. Manual: GO BLAST.
  21. Gene Ontology Consortium. AmiGO 2: Matrix. amigo2.berkeleybop.org. Accessed 4 April 2018.
  22. Day-Richter J., Harris M.A., Haendel M., Gene Ontology OBO-Edit Working Group, Lewis S. OBO-Edit: an ontology editor for biologists. Bioinformatics. 2007 Aug;23(16):2198–2200. DOI: 10.1093/bioinformatics/btm112. PMID 17545183.
  23. The GO Consortium. Manual: PAINT.

Links

  • The Gene Ontology: official project website.
  • AmiGO: the Gene Ontology browser.
  • PAINT: free application on GitHub.
  • Term Matrix: AmiGO tool.
  • BLAST: AmiGO tool.
  • GO Slimmer: AmiGO tool.
  • map2slim: GO Slimmer script.
  • GO data scheme: GO database schema.
  • Plotly: infographics service.
  • Visualization: AmiGO tool.
  • Annotation Database: complete annotation database.
Source: https://ru.wikipedia.org/w/index.php?title=Генная_онтология&oldid=101239683

