Encyclopedia of DNA Elements

The Encyclopedia of DNA Elements (ENCODE) is an international research consortium created in September 2003. It was organized and is funded by the US National Human Genome Research Institute (NHGRI) [2] [3] [4]. Conceived as a continuation of the Human Genome Project, ENCODE aims to carry out a comprehensive analysis of the functional elements of the human genome. All results obtained in the course of the project are published in publicly accessible databases.

ENCODE
Content
Description: Whole-genome database
Contacts
Research center: University of California, Santa Cruz
Laboratory: Center for Biomolecular Science and Engineering
Authors: Brian J. Raney [1]
Original publication: PMID 21037257
Release date: 2010
Availability
Website: encodeproject.org

On September 5, 2012, the first results of the project were released as 30 interrelated publications on the websites of the journals Nature, Genome Biology and Genome Research [5] [6]. These publications report that at least 80% of the human genome is biologically active; previously, the prevailing view was that most of the DNA was “junk”. However, many scientists criticize these conclusions as hasty, pointing to the lack of evidence that the elements in question are actually functional [7].

Relevance

According to rough estimates, the human genome contains about 20,000 protein-coding genes (collectively they make up the exome), and they account for only about 1.5% of the DNA of the human genome. The primary goal of the ENCODE project is to determine the function of the rest of the genome, most of which has traditionally been considered “junk” (for example, DNA that is not transcribed).

Approximately 90% of the single nucleotide polymorphisms in the human genome that genome-wide association studies have linked to various diseases lie outside protein-coding regions. [8]

The activity and expression of protein-coding genes can be regulated by the regulome: various DNA elements such as promoters, regulatory sequences and chromatin regions, as well as histone modifications. Changes in regulatory regions are thought to interfere with protein expression and cell function and thus lead to disease ( ENCODE Project Background ). By determining the location of regulatory elements and their effect on transcription, it becomes possible to relate changes in the expression levels of specific genes to the development of diseases. [9]

ENCODE is conceived as a comprehensive resource that will allow the scientific community to better understand how the genome can affect human health, and that will stimulate the development of new methods for preventing and treating diseases. [10]

To date, the project has aided the discovery of new regulatory DNA elements, giving new insight into the organization and regulation of our genes and genome and into how changes in the DNA sequence can affect the development of diseases. [8] One of the main results of the project is that at least one biochemical function has now been shown for 80% of the human genome. [11] [12] Most of this non-coding DNA is involved in regulating the expression of coding genes. [13] In addition, the expression of each coding gene is controlled by many regulatory regions located both close to and far from the gene. These results demonstrate that gene regulation is much more complex than previously thought. [14]

ENCODE Project

The ENCODE project is organized in three stages: the pilot (initial) phase, the technology development phase and the production phase.

During the pilot phase, the ENCODE consortium evaluated strategies for identifying various types of genome elements. The purpose of this stage was to determine a set of procedures that together would characterize large regions of the human genome accurately and as completely as possible, while remaining cost-effective and high-throughput. The pilot phase was meant to reveal gaps in the set of tools for determining functional sequences and to show whether any of the methods used were ineffective or unsuitable for large-scale application. Some of these problems were to be solved in the ENCODE technology development phase (which ran in parallel with the pilot phase), whose purpose was to develop new laboratory and computational methods that would improve the identification of known functional sequences or the discovery of new functional elements of the genome. The results of the first two stages, obtained on 1% of the human genome, were used to determine the best way to analyze the remaining 99% with maximum efficiency and at the lowest cost during the production phase. [10]

Phase I of the ENCODE Project: Pilot Phase

During the pilot phase, existing methods for the thorough analysis of a defined part of the human genome sequence were studied and compared. The work was organized as an open consortium and brought together researchers with different backgrounds and expertise to evaluate the merits of each technique, technology and strategy in a diverse set. At the same time, the goal of the technology development phase was to develop new high-throughput methods for identifying functional elements. The aim of this work was to find a set of approaches that would allow all functional elements in the human genome to be identified as accurately as possible. The pilot phase assessed how well the various methods would scale to the analysis of the entire human genome and revealed gaps in the ability to identify functional elements in the genome sequence.

The pilot phase of the project was carried out in close cooperation between experimental and computational researchers, which made it possible to evaluate a number of methods for annotating the human genome. A set of regions comprising approximately 1% (30 Mb) of the human genome was chosen as the target for the pilot phase and was analyzed by all of its participants. All data on these regions obtained by ENCODE members was rapidly released to public databases. [15] [16]
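The proportions quoted in this article can be put on a common scale with a little arithmetic. The sketch below is illustrative only: it takes the article's rounded figures (1% of the genome is about 30 Mb, about 1.5% is protein-coding, 80% is reported to have a biochemical function) at face value, and the function and variable names are hypothetical, chosen for this example.

```python
# Approximate figures quoted in this article.
PILOT_SIZE_MB = 30.0    # size of the pilot-phase target regions
PILOT_PERCENT = 1.0     # share of the genome those regions represent
CODING_PERCENT = 1.5    # share occupied by protein-coding genes
ACTIVE_PERCENT = 80.0   # share with at least one reported biochemical function


def implied_genome_size_mb(part_mb: float, part_percent: float) -> float:
    """Scale a part of known size up to the whole it is a percentage of."""
    return part_mb * 100.0 / part_percent


def share_in_mb(genome_mb: float, percent: float) -> float:
    """Convert a percentage of the genome into megabases."""
    return genome_mb * percent / 100.0


genome_mb = implied_genome_size_mb(PILOT_SIZE_MB, PILOT_PERCENT)
coding_mb = share_in_mb(genome_mb, CODING_PERCENT)
active_mb = share_in_mb(genome_mb, ACTIVE_PERCENT)
remaining_mb = genome_mb - PILOT_SIZE_MB  # left for the production phase

print(f"Implied genome size:          {genome_mb:.0f} Mb")
print(f"Protein-coding share:         {coding_mb:.0f} Mb")
print(f"Reported biochemical activity: {active_mb:.0f} Mb")
print(f"Left after the pilot phase:   {remaining_mb:.0f} Mb")
```

Because the inputs are rounded, the outputs are order-of-magnitude estimates rather than measured sizes.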

Phase II of the ENCODE Project: Production Phase

 
Image: ENCODE data in the UCSC Genome Browser. Several tracks show information on the regulation of gene expression. The gene on the left (ATP2B4) is expressed in many cell types. The gene on the right is expressed in only a few cell types, including embryonic stem cells.

In September 2007, funding began for the production phase of the ENCODE project. At this stage, the goal was to analyze the entire genome and to conduct additional studies at production scale. [17]

As in the pilot phase, the work of the production phase was organized as an open consortium. In October 2007, the National Human Genome Research Institute awarded it grants totaling more than $80 million over four years. [18] During the production phase, the project included a Data Coordination Center, a Data Analysis Center and a Technology Development Center. [19] By this time the project had become a truly large-scale enterprise involving 440 scientists from 32 laboratories around the world. After the pilot phase was completed in 2007, the project's capacity grew considerably, largely thanks to next-generation sequencing. A very large volume of data was processed: the researchers obtained about 15 terabytes of raw data.

By 2010, the ENCODE project had produced more than 1,000 genome-wide data sets. Taken together, these data show which regions are transcribed, which regions appear to control the expression of genes used in cells of a particular type, and which regions interact with a large set of proteins. The main assays used in ENCODE are ChIP-seq, mapping of DNase I hypersensitive sites, RNA-seq and DNA methylation analysis.

ENCODE Consortium

The ENCODE Consortium is composed primarily of scientists funded by the US National Human Genome Research Institute. Other contributors to the project join the Consortium or its Analysis Working Group.

The pilot phase of the project involved eight research groups, with a further twelve groups taking part in the technology development phase ( ENCODE Pilot Project: Participants and Projects ). By the end of 2007, when the pilot phase officially ended, the number of participants had grown to 440 scientists from 32 laboratories around the world. Currently, the consortium consists of several centers that perform different tasks ( ENCODE Participants and Projects ):

  1. Production Centers (ENCODE Production Centers)
  2. Data Coordination Center (ENCODE Data Coordination Center)
  3. Data Analysis Center (ENCODE Data Analysis Center)
  4. Computational Analysis Awards (ENCODE Computational Analysis Awards)
  5. Technology Development (ENCODE Technology Development Effort)

Criticism

Although the consortium states that the ENCODE project is far from complete, many reactions to the published articles and the accompanying press coverage were positive. The editors of Nature and the authors of the ENCODE project “... collaborated over many months to make the biggest splash possible and capture the attention of not only the research community but also of the public at large”. [20] The project's statement that 80% of the human genome has a biochemical function [11] was quickly picked up by popular science publications, which described the results as the death of “junk” DNA. [21] [22]

However, the conclusion that most of the genome is “functional” was criticized on the grounds that the ENCODE project defines “functionality” too broadly, treating anything that is transcribed in the cell as having a function. This conclusion was drawn despite the generally accepted view that many transcribed DNA elements, such as pseudogenes, are nevertheless non-functional. Moreover, the ENCODE project favors sensitivity over specificity, which can lead to many false-positive results. [23] [24] [25] The somewhat arbitrary choice of cell lines and transcription factors, together with the lack of appropriate control experiments, drew further serious criticism, since random DNA can mimic “functional” behavior as ENCODE defines it. [26]
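The trade-off between sensitivity and specificity mentioned above can be made concrete with a small worked example. The counts below are entirely hypothetical (1,000 candidate regions, of which 100 are assumed to be truly functional) and are not taken from ENCODE data; the class and function names are likewise made up for this illustration. The sketch only shows how a permissive detection rule can achieve high sensitivity while most of its positive calls are false.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Outcome counts for a detector evaluated against a known truth set."""
    true_positives: int
    false_positives: int
    true_negatives: int
    false_negatives: int


def sensitivity(c: ConfusionCounts) -> float:
    """Fraction of truly functional regions that the detector calls."""
    return c.true_positives / (c.true_positives + c.false_negatives)


def specificity(c: ConfusionCounts) -> float:
    """Fraction of non-functional regions that the detector rejects."""
    return c.true_negatives / (c.true_negatives + c.false_positives)


def false_discovery_rate(c: ConfusionCounts) -> float:
    """Fraction of positive calls that are wrong."""
    return c.false_positives / (c.true_positives + c.false_positives)


# Hypothetical detectors over the same 1,000 regions (100 functional, 900 not).
permissive = ConfusionCounts(true_positives=95, false_positives=450,
                             true_negatives=450, false_negatives=5)
strict = ConfusionCounts(true_positives=60, false_positives=18,
                         true_negatives=882, false_negatives=40)

for name, counts in (("permissive", permissive), ("strict", strict)):
    print(f"{name:>10}: sensitivity={sensitivity(counts):.2f} "
          f"specificity={specificity(counts):.2f} "
          f"FDR={false_discovery_rate(counts):.2f}")
```

With these made-up numbers the permissive rule recovers almost all functional regions, yet most of what it reports is a false positive, which is the pattern the critics describe.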

In response to this criticism, it was argued that the widespread transcription and splicing observed in the human genome are a more accurate indicator of genetic function than sequence conservation. In addition, it was argued that most of the “junk” DNA is involved in epigenetic regulation and was a necessary prerequisite for the development of complex organisms. [27] In response to comments about the definition of the word “functional”, many noted that the dispute concerns a difference in definitions rather than the substance of the project, which is to provide data for subsequent studies of the biochemical activity of non-protein-coding DNA. Although definitions are important and science is bounded by language, ENCODE appears to have achieved its goal, because a large number of research articles now use the data obtained by the project without debating the definition of “functionality”. [28] Ewan Birney, one of the ENCODE researchers, commented on some of the reactions to the project. He notes that the word “function” was used pragmatically to mean “specific biochemical activity”, which shows up in different ways in different classes of experiments: the presence of RNA, histone modifications, DNase I hypersensitive regions, ChIP-seq peaks of transcription factors, DNA footprints, transcription factor binding sites and exons. [29]

In addition, the project was criticized for its high budget (about $400 million in total) and for favoring so-called “big science”, which takes funding away from more productive investigator-initiated research. [30] The cost of the pilot phase of the ENCODE project was estimated at $55 million, its expansion cost about $130 million, and the US National Human Genome Research Institute was prepared to allocate up to $123 million for the next phase of the project. Some researchers argue that the investment has not yet produced a proper return. An attempt to count all publications since 2012 in which ENCODE plays a significant role identified 300 such articles, 110 of which came from laboratories without ENCODE funding. An additional difficulty was that ENCODE is not a unique name: the word “encode” appears throughout the literature on genetics and genome research. [8]

Another major criticism is that the results do not justify the amount of time spent and that the project is, in principle, open-ended. Although it is compared to the Human Genome Project and is even called its continuation, the Human Genome Project had a clear endpoint, which ENCODE currently lacks.

The authors of the project appear to share the concerns of the scientific community and do not deny that problems exist, but at the same time they try to justify their efforts, explaining the details of the project in interviews not only to the scientific community but also to the media. They say that it took more than half a century to get from the realization that DNA is the material basis of heredity to the sequence of the human genome, so their plan for the next century is to understand that sequence. [8]

Other projects

Currently, the ENCODE consortium is participating in several additional projects with similar goals. Some of these projects were part of the second phase of ENCODE.

modENCODE

By analogy with the ENCODE project, the Model Organism ENCyclopedia Of DNA Elements (modENCODE) project was launched to map the functional elements of the genomes of two major model organisms, Drosophila melanogaster and Caenorhabditis elegans. The advantage of this project is that model organisms allow experiments that are difficult or impossible to carry out in humans. [31]

The project was founded in 2007 by the US National Institutes of Health (NIH). [32] [33] In 2010, the modENCODE consortium published a series of articles in Science on the annotation and analysis of the distribution of functional elements in the genomes of Drosophila melanogaster and Caenorhabditis elegans. Data from these publications is available on the modENCODE website. [34]

At present, modENCODE is a research network consisting of 11 primary projects, divided between studies of D. melanogaster and C. elegans. The project covers research in the following areas:

  • Gene structure
  • Profiling of mRNA and ncRNA expression
  • Transcription factor binding sites
  • Histone modifications and replacement
  • Chromatin structure
  • Initiation and timing of DNA replication
  • Copy number variation. [35]

modERN

modERN (model organism Encyclopedia of Regulatory Networks) is an offshoot of modENCODE. The project combines the work of the C. elegans and D. melanogaster groups and focuses on identifying additional transcription factor binding sites. The project was launched at the same time as the third phase of ENCODE, with completion planned for 2017. So far, modERN has published the results of 198 experiments; another 500 have been submitted and are being processed by the ENCODE Data Coordination Center.

Genomics of Gene Regulation

The Genomics of Gene Regulation (GGR) program was launched in early 2015 by the US National Institutes of Health and is planned to run for three years. The aim of the program is to study gene networks and pathways in various body systems in order to deepen understanding of the mechanisms that control gene expression. Although the ENCODE project is run separately from GGR, the ENCODE Data Coordination Center hosts GGR data on its portal.

Roadmap

In 2008, the US National Institutes of Health established the Roadmap Epigenomics Mapping Consortium, whose goal was to develop a public resource of human epigenomic data for biological and medical research. In February 2015, the consortium published its results in the article “Integrative analysis of 111 reference human epigenomes”. The consortium collected and annotated regulatory elements in 127 reference epigenomes, 16 of which were part of the ENCODE project. Roadmap project data is available on the Roadmap and ENCODE portals.

fruitENCODE

The fruitENCODE project is an encyclopedia of DNA elements for ripening fruits that is part of ENCODE. The aim of the project is to generate datasets on DNA methylation sites, histone modifications, DNase I hypersensitive chromatin regions, gene expression and transcription factor binding sites for fleshy fruits of all kinds at different stages of development. A preliminary publication date for the results is available on the fruitENCODE portal.

FactorBook

The data obtained by ENCODE on transcription factor binding is currently available at Factorbook.org [36], a database built on a wiki engine. The first release of Factorbook contains:

  • 457 ChIP-seq datasets for 119 transcription factors in some human cell cultures
  • Averaged profiles of histone modifications and nucleosome positioning around transcription factor binding sites
  • Motifs enriched in the binding sites, as well as the distances between them and their orientation. [37]

See also

  • Next-generation sequencing methods
  • RNA sequencing
  • SIMAP@home
  • Human Genome Project
  • GENCODE

Notes

  1. Raney BJ, Cline MS, Rosenbloom KR, Dreszer TR, Learned K., Barber GP, Meyer LR, Sloan CA, Malladi VS, Roskin KM, Suh BB, Hinrichs AS, Clawson H., Zweig AS, Kirkup V., Fujita PA, Rhead B., Smith KE, Pohl A., Kuhn RM, Karolchik D., Haussler D., Kent WJ. ENCODE whole-genome data in the UCSC genome browser (2011 update) // Nucleic Acids Res. - 2011. - January (vol. 39, no. Database issue). - P. D871-5. - DOI: 10.1093/nar/gkq1017. - PMID 21037257.
  2. Raney BJ, Cline MS, Rosenbloom KR, Dreszer TR, Learned K., Barber GP, Meyer LR, Sloan CA, Malladi VS, Roskin KM, Suh BB, Hinrichs AS, Clawson H., Zweig AS, Kirkup V., Fujita PA, Rhead B., Smith KE, Pohl A., Kuhn RM, Karolchik D., Haussler D., Kent WJ. ENCODE whole-genome data in the UCSC genome browser (2011 update) // Nucleic Acids Res. - 2011. - January (vol. 39, no. Database issue). - P. D871-5. - DOI: 10.1093/nar/gkq1017. - PMID 21037257.
  3. EGASP: the human ENCODE Genome Annotation Assessment Project. PubMed.
  4. Kleshenko E. DNA without trash // The New Times. - 2012. - Issue 29 (256).
  5. ENCODE project at UCSC (inaccessible link). ENCODE Consortium. Accessed September 5, 2012. Archived September 10, 2012.
  6. Walsh, Fergus. Detailed map of genome function (September 5, 2012). Archived September 5, 2012. Accessed September 6, 2012.
  7. Dan Graur's blog.
  8. Maher B. ENCODE: The human encyclopaedia // Nature. - 2012. - September (vol. 489, no. 7414). - P. 46-8. - DOI: 10.1038/489046a. - PMID 22962707.
  9. Saey, Tina Hesman. Team releases sequel to the human genome. Society for Science & the Public (October 6, 2012). Accessed October 18, 2012.
  10. The ENCODE Project Consortium. The ENCODE (ENCyclopedia Of DNA Elements) Project. Science (2004).
  11. Bernstein BE, Birney E., Dunham I., Green ED, Gunter C., Snyder M. An integrated encyclopedia of DNA elements in the human genome // Nature. - 2012. - September (vol. 489, no. 7414). - P. 57-74. - DOI: 10.1038/nature11247. - PMID 22955616.
  12. Timmer J. Most of what you read was wrong: how press releases rewrote scientific history. Staff / From the Minds of Ars. Ars Technica (September 10, 2012). Accessed September 10, 2012.
  13. Bernstein BE, Birney E., Dunham I., Green ED, Gunter C., Snyder M. An integrated encyclopedia of DNA elements in the human genome // Nature. - 2012. - September (vol. 489, no. 7414). - P. 57-74. - DOI: 10.1038/nature11247. - PMID 22955616.
  14. Pennisi E. Genomics. ENCODE project writes eulogy for junk DNA // Science. - 2012. - September (vol. 337, no. 6099). - P. 1159, 1161. - DOI: 10.1126/science.337.6099.1159. - PMID 22955811.
  15. Birney E., Stamatoyannopoulos JA, Dutta A. et al. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project // Nature. - 2007. - Vol. 447, no. 7146. - P. 799-816. - DOI: 10.1038/nature05874. - PMID 17571346.
  16. ENCODE Program Staff. ENCODE: Pilot Project: overview. National Human Genome Research Institute (October 18, 2012).
  17. Genome.gov | ENCODE and modENCODE Projects. The ENCODE Project: ENCyclopedia Of DNA Elements. United States National Human Genome Research Institute (August 1, 2011). Accessed August 5, 2011.
  18. National Human Genome Research Institute - Organization. The NIH Almanac. United States National Institutes of Health. Accessed August 5, 2011.
  19. Genome.gov | ENCODE Participants and Projects. The ENCODE Project: ENCyclopedia Of DNA Elements. United States National Human Genome Research Institute (August 1, 2011). Accessed August 5, 2011.
  20. Maher B. Fighting about ENCODE and junk. News Blog. Nature Publishing Group (September 6, 2012).
  21. Kolata G. Far From 'Junk,' DNA Dark Matter Proves Crucial to Health. The New York Times (September 5, 2012).
  22. Gregory TR. The ENCODE media hype machine. Genomicron (September 6, 2012).
  23. Graur D., Zheng Y., Price N., Azevedo RB, Zufall RA, Elhaik E. On the immortality of television sets: "function" in the human genome according to the evolution-free gospel of ENCODE // Genome Biol Evol. - 2013. - Vol. 5, no. 3. - P. 578-590. - DOI: 10.1093/gbe/evt028. - PMID 23431001.
  24. Moran LA. Sandwalk: On the Meaning of the Word "Function". Sandwalk (March 15, 2013).
  25. Gregory TR. Critiques of ENCODE in peer-reviewed journals. Genomicron (April 11, 2013).
  26. White MA, Myers CA, Corbo JC, Cohen BA. Massively parallel in vivo enhancer assay reveals that highly local features determine the cis-regulatory function of ChIP-seq peaks // Proceedings of the National Academy of Sciences of the United States of America. - 2013. - July (vol. 110, no. 29). - P. 11952-11957. - DOI: 10.1073/pnas.1307449110. - PMID 23818646.
  27. Mattick JS, Dinger ME. The extent of functionality in the human genome // The HUGO Journal. - 2013. - Vol. 7, no. 1. - P. 2. - DOI: 10.1186/1877-6566-7-2.
  28. Nature Editorial. Form and function // Nature. - 2013. - March 14 (vol. 495). - P. 141-142. - DOI: 10.1038/495141b.
  29. ENCODE: My own thoughts. Ewan's Blog: Bioinformatician at large (September 5, 2012).
  30. Timpson T. Debating ENCODE: Dan Graur, Michael Eisen. Mendelspod (March 5, 2013).
  31. The modENCODE Project: Model Organism ENCyclopedia Of DNA Elements (modENCODE). NHGRI website. Accessed November 13, 2008.
  32. modENCODE Participants and Projects. NHGRI website. Accessed November 13, 2008.
  33. Berkeley Lab Life Sciences Awarded NIH Grants for Fruit Fly, Nematode Studies. Lawrence Berkeley National Laboratory website (May 14, 2007). Accessed November 13, 2008.
  34. modENCODE. The National Human Genome Research Institute.
  35. Celniker S. Unlocking the secrets of the genome. Nature (June 11, 2009).
  36. FactorBook.
  37. Wang J. Factorbook.org: a Wiki-based database for transcription factor-binding data generated by the ENCODE consortium. Nucleic Acids Research (November 29, 2012).

Links

  • Official site of the Encyclopedia of DNA Elements
  • ENCODE Project at NHGRI
  • ENCODE / GENCODE Project at the Wellcome Trust Sanger Institute
  • Interactive scheme bringing together all published materials of the ENCODE project participants
  • ENCODE: The human encyclopaedia
  • I. Yakutenko. Slightly exaggerated
  • modENCODE official site
  • Encyclopedia of DNA Elements at the UCSC Genome Browser
  • Factorbook
  • GENCODE
Source - https://ru.wikipedia.org/w/index.php?title=DNA_elements_encyclopedia&oldid=101046038

