FOCUS ON METAGENOMICS

COMMUNITY GENOMICS IN MICROBIAL ECOLOGY AND EVOLUTION

Eric E. Allen* and Jillian F. Banfield*‡

Abstract | It is possible to reconstruct near-complete, and possibly complete, genomes of the dominant members of microbial communities from DNA that is extracted directly from the environment. Genome sequences from environmental samples capture the aggregate characteristics of the strain population from which they were derived. Comparison of the sequence data within and among natural populations can reveal the evolutionary processes that lead to genome diversification and speciation. Community genomic datasets can also enable subsequent gene expression and proteomic studies to determine how resources are invested and functions are distributed among community members. Ultimately, genomics can reveal how individual species and strains contribute to the net activity of the community.
*Department of Environmental Science, Policy, and Management, ‡Department of Earth and Planetary Sciences, University of California, Berkeley, Berkeley, California 94720, USA.
Correspondence to J.F.B.
e-mail: jill@eps.berkeley.edu
doi:10.1038/nrmicro1157

Microbial genomics has, until recently, been confined to individual, isolated microbial strains. Genome sequence information for isolates from phylogenetically diverse lineages has had a marked impact on our understanding of microbial physiology, biochemistry, genetics, ecology and evolution. However, this approach is limited because we do not know how to cultivate most microorganisms1. Consequently, many questions about the roles of uncharacterized organisms in natural ecosystems remain.

Our ability to survey the resident microbiota in a given community has been greatly expanded by various cultivation-independent methodologies, which include 16S rRNA gene CLONE LIBRARY collections and group-specific fluorescence in situ hybridization (FISH)2. Although the description and quantitation of the phylogenetic diversity of microbial communities is an important first step, linking these organisms to their ecological roles remains a significant challenge.

CLONE LIBRARY
A collection of targeted DNA sequences, such as the 16S rRNA gene, most often derived from PCR amplification and subsequent cloning into a vector. Specifically, 16S rRNA gene clone libraries are often used in surveys of microbial diversity from environmental samples.

In the natural environment, individual organisms do not exist in isolation. Rather, microbial communities are dynamic CONSORTIA of microbial species populations. The understanding of consortia function will benefit from genomic information from all coexisting members. This cannot be adequately addressed by focused isolation and individual genome sequencing efforts, as isolates might not be representative of the full genetic and metabolic potential of their associated natural populations. Moreover, artificial cultivation conditions often do not replicate those found in nature. Therefore, there is a compelling impetus to move beyond the culture-centric realm of microbial sequencing and to begin focusing sequencing efforts on microbial communities en masse.

The analysis of genome sequence data that has been recovered directly from the environment is motivated by many objectives, which include the establishment of gene inventories and natural product discovery3,4. This approach is often referred to as metagenomics, which is defined as the functional and sequence-based analysis of the collective microbial genomes that are contained in an environmental sample3. Recent reviews have covered environmental and functional metagenomics3,5–8.

Here we centre our discussion on the opportunities for analysis of ecological and evolutionary processes in natural microbial consortia using environmentally-derived genome sequence data. We
                NATURE REVIEWS | MICROBIOLOGY                                          © 2005 Nature Publishing Group                                              VOLUME 3 | JUNE 2005 | 489
                           
                REVIEWS
Box 1 | Acid mine drainage community genomics

A decade of research on the biogeochemistry78 and microbiology79 of the Richmond Mine at Iron Mountain, California, provided the important scientific foundation for the acid mine drainage (AMD) community genome sequencing project. Initial work determined the types of organism that were present and correlated community membership with geochemical conditions80. In 2002–2003, 76 Mb of environmental sequence data were obtained from a small-insert library from a single biofilm sample9. Using these data, it was possible to reconstruct the genomes of the dominant bacterium, Leptospirillum group II (10X coverage) and the dominant archaeon, Ferroplasma type II (10X coverage). Partial reconstruction was also possible for the bacterial genomes of Leptospirillum group III (3X coverage) and a Sulfobacillus species (0.5X coverage) that is closely related to Sulfobacillus thermosulfidooxidans. Archaeal genomes that were partially reconstructed include Ferroplasma acidarmanus Type I (very closely related to F. acidarmanus fer 1; 4X coverage) and G-plasma (3X coverage), a novel group within the Thermoplasmatales.

The sequencing allowed metabolic reconstructions of these organisms based on genome annotations and an analysis of functional partitioning among community members9. Importantly, it was revealed that a relatively minor community component, Leptospirillum group III, possessed the sole complement of nitrogen fixation (nif) genes. This subsequently led to the design of a selective isolation strategy to successfully cultivate this organism using genome sequence data43. Furthermore, carbon fixation pathways and gene products that are possibly involved in iron oxidation were revealed, which provided important insights into the intricate metabolism of these chemolithoautotrophic communities.

Genomic analyses also provided insights into population structure. These included evidence for genetic recombination among archaeal populations, which revealed a high degree of genome mosaicism in these species. Furthermore, comprehensive population genomic information has allowed analysis of factors that contribute to genomic heterogeneity within species populations and the ability to assess evidence of selection based on the analysis of nucleotide substitutions (E.E.A. et al., unpublished observations). Finally, the community genomic dataset has provided the foundation for performing environmental proteomic surveys from a natural biofilm sample10. These studies have revealed the complement of genes that are expressed in situ, and therefore go beyond inferences based on genome-annotation gazing to provide insights into how functions are distributed and which functions are important in natural microbial consortia.
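
To give a rough sense of what the coverage values quoted in Box 1 imply, the sketch below applies the idealized Lander–Waterman (Poisson) sampling model. This model is not part of the analysis described in the box and is assumed here purely for illustration: if reads fall uniformly at random on a genome, the expected fraction of bases sampled at least once at average coverage c is 1 − e^(−c). The function name is hypothetical, and real shotgun libraries depart from the model because of cloning bias, repeats and strain-level heterogeneity.

```python
import math


def expected_fraction_covered(coverage: float) -> float:
    """Expected fraction of bases sampled at least once.

    Assumes reads fall uniformly at random on the genome (idealized
    Poisson model); this is an illustrative assumption, not a method
    taken from the review.
    """
    if coverage < 0:
        raise ValueError("coverage must be non-negative")
    return 1.0 - math.exp(-coverage)


# Average coverages reported in Box 1 for the AMD community genomes.
box1_coverage = {
    "Leptospirillum group II": 10.0,
    "Ferroplasma type II": 10.0,
    "Ferroplasma acidarmanus Type I": 4.0,
    "Leptospirillum group III": 3.0,
    "G-plasma": 3.0,
    "Sulfobacillus sp.": 0.5,
}

for organism, cov in box1_coverage.items():
    fraction = expected_fraction_covered(cov)
    print(f"{organism}: {cov}X -> {fraction:.4f} of bases sampled")
```

Under this simplified model, 0.5X coverage corresponds to roughly 39% of bases being sampled at least once, 3X to about 95%, and 10X to more than 99.99%, which is in line with the distinction the box draws between partial and near-complete reconstruction. Assembly additionally requires overlapping reads, so the fraction of a genome that ends up in assembled contigs would be expected to be lower than these figures.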
                                                      focus on community genomics, which emphasizes                                 Insights into the metabolic functions of uncultivated 
                                                      the analysis of species populations and their interac-                      microorganisms have been facilitated by exploiting 
                                                      tions, recognizing that both species composition and                        phylogenetic anchors that are contained in environ-
                                                      interactions change over time, and in response to envi-                     mental libraries BOX 2. For example, in large-insert 
                                                      ronmental stimuli. This requires that the system under                      environmental libraries, contiguous DNA that flanks 
                                                      investigation can be sampled repeatedly, and defined                        taxonomic-specific markers such as 16S rRNA genes 
                                                      well enough to enable in situ ecological studies and the                    can provide a glimpse into the genetic potential of sam-
                                                                                                                                                      11–15
                                                      analysis of adaptive processes. Genomics can resolve                        pled organisms           . Alternatively, random clones from 
                                                      the genetic and metabolic potential of communities                          shotgun libraries can be sequenced. In this review, we 
                                                      and establish how functions are partitioned in and                          focus primarily on the shotgun sequencing method, 
                                                      among populations, reveal how genetic diversity is cre-                     which represents a relatively unbiased, non-directed 
                                                      ated and maintained, and identify the primary drivers                       approach to survey the structure and metabolic capacity 
                                                      of genome evolution and speciation.                                         of a community.
                                                          We draw upon experiences from our ongoing                                   As a first step, consideration should be given to 
                                                      analyses of an extreme acid mine drainage (AMD)                             the community chosen for investigation. On the 
                                                      ecosystem9,10 BOX 1. We discuss the challenges that                       one hand, simple communities with low species 
                                                      are associated with the assembly of near-complete,                          diversity can be characterized thoroughly with mod-
                                                      and potentially complete, genomes of uncultivated                           est sequencing effort. On the other hand, complex 
                                                      organisms, the documentation of genomic heterogene-                         communities are more representative of most natu-
                CONSORTIUM                            ity in populations and the use of these data to enable                      ral microbial assemblages, but their characteriza-
                Physical association between          comprehensive functional studies.                                           tion presents myriad challenges that require special 
                cells of two or more types                                                                                        consideration. For example, it is necessary to address 
                of microorganism. Such                Approaches to community genomics                                            gaps in knowledge owing to incomplete sequence 
                an association might be               Community genomics provides a platform to assess                            COVERAGE, and limitations that might arise owing to a 
                advantageous to at least one          natural microbial phenomena that include biogeo-                            lack of reproducibility that results from community 
                of the microorganisms.
                                                      chemical activities, population ecology, evolutionary                       heterogeneity.
                COVERAGE                              processes such as lateral gene transfer (LGT) events,                           Currently, both the cost of sequencing and the 
                The average number of times           and microbial interactions. Only by placing these                           challenges that are associated with the management of 
                a nucleotide is represented           processes in their environmental context can we                             vast datasets precludes comprehensive genomic stud-
                by a high-quality base in the         begin to understand complex community structure                             ies of highly complex communities. Consequently, we 
                sequence data; full genome            and functions, and the evolutionary constraints that                        favour an initial approach that is based on the analysis 
                coverage is usually attained 
                at 8–10X coverage.                    define and sustain them.                                                    of simpler model communities. The technical and 
scientific lessons that have been gleaned from these studies can then be extended to more complex systems and their generality evaluated.

Box 2 | Environmental libraries

The extraction of high-quality DNA is central to the success of any sequencing project. In the case of environmental samples that contain mixed consortia of organism types, the objective is to obtain a quantitatively accurate representation of all community members during extraction and subsequent construction of shotgun sequencing libraries. Realizing this importance, microbial ecologists have invested substantial effort in optimizing DNA extraction procedures for various environmental samples15,81.

The advantage of large-insert libraries (for example, ~40 kb for fosmids) is that they provide substantial contiguous genomic information that is representative of individual community members15,82,83. For community genome sequencing and assembly, paired-end sequences from large-insert libraries are particularly useful as they provide valuable linking information for orientation and scaffolding of assembled genome fragments. Furthermore, the complete sequences of large-insert clones can be used as reference sequences for the assembly and statistical analysis of environmental shotgun sequence data22.

Despite the obvious utility of large-insert libraries, certain environments present a considerable challenge to obtaining the high-molecular-weight DNA that is suitable for large-insert library construction. For example, small-insert shotgun libraries (3- to 4-kb insert size) might be the only viable option for the acid mine drainage (AMD) biofilm community from Iron Mountain, as the many steps that are required for DNA purification result in excessive DNA shearing. Nevertheless, small-insert DNA libraries alone have been used in environmental genomic studies with considerable success9,36.

System tractability. Extreme geological environments, such as acidic geothermal hotsprings, highly acidic, or hypersaline habitats, provide important geochemical and selective constraints on species diversity, which makes them ideal for high-resolution studies of microbial ecology and evolution. There are other system attributes that can enhance our ability to learn about the structure of communities and the degree to which they function as integrated synergistic assemblages. These include: first, self-sustaining systems, in which all essential metabolic functions are carried out in situ and which therefore represent a complete ecosystem microcosm; second, systems that are characterized by strong and clearly defined geochemical–microbiological feedbacks, which enables analysis of the interplay between organism function and environmental conditions; third, systems that are characterized by systematic fluctuations in environmental conditions, and that can be sampled over space and time, to understand how the community-level metabolic networks change during colonization and as a function of community membership and geochemical conditions; fourth, systems that are defined by well-established species interactions, as expected in extreme environments and other specialized habitats, such as host–pathogen and host–symbiont relationships, in which organisms have co-evolved over extended evolutionary time periods; and finally, systems that have sufficient biomass for post-genomic functional assays (such as proteomic surveys).

Sampling and defining the biogeochemical framework. To understand the ecology of a community, it is necessary to describe the associations of organisms with each other and with their environment. Characterization of ABIOTIC system attributes is important to understand the factors that control community membership. Spatial and temporal environmental heterogeneity is inextricably linked to successional changes in community composition and diversity16–18. Therefore, it is important to define physicochemical gradients such as pH, temperature, osmotic strength, mineralogy and nutrient levels, and to identify sources of energy, nutrient fluxes and feedbacks owing to microorganism–mineral interactions. For instance, geochemical patterns19 might indicate important metabolic functions in the system. In combination with genomic information, the assessment of environmental conditions that contribute to spatial or temporal heterogeneity in species composition might enable the identification of traits that are important to microbial adaptation in the community.

ABIOTIC
The non-living physical and chemical attributes of a system, which include pH, temperature, pressure, osmotic strength, and chemical composition.

The biological attributes of the system are also an important consideration. For example, the presence of a microbial species might depend upon the presence (or absence) of another species. This might be due to a metabolic dependence and is often suggested to be a phenomenon that limits the success of cultivation endeavours20. The interdependence of community members might also take the form of thermodynamic control, such as that observed in microbial consortia that can couple methane oxidation to sulphate reduction21,22. Biotic features, such as grazing and phage predation, also impact community structure. Grazing pressure that is imposed by eukaryotic protozoa, such as flagellates and ciliates, is one example of a top–down control23–25. Perhaps more important, however, is the well-documented contribution of phage to microbial mortality. The efficacy of phage predation can have profound effects on the composition of microbial assemblages by controlling dominant groups26,27. Phage-induced cell lysis can also release cellular contents into the environment, thereby influencing microbial food-web dynamics and biogeochemical processes28. Furthermore, the capacity for phage-mediated DNA transfer (transduction), or the direct release of free DNA during virus-induced host-cell lysis29, can contribute to the overall mobile gene pool in natural communities. Laterally transferred genes and genome fragments can alter the metabolic properties of the host30 and represent a primary driving factor that contributes to genomic heterogeneity, and therefore evolution, in natural species populations (REF. 31 and E.E.A. et al., unpublished observations).

Estimating the community sequencing endeavour. It is possible to predict the amount of sequencing that will be needed to analyse a given community based on the desired degree of coverage of genomes and the available information about species number, relative species abundance and genome sizes. An approximation of community diversity can be made through the analysis
                 NATURE REVIEWS | MICROBIOLOGY                                                © 2005 Nature Publishing Group                                                  VOLUME 3 | JUNE 2005 | 491
                           
of 16S rRNA gene libraries, together with a quantitative assessment of relative species richness (number of species) and evenness (relative abundance of each species) using FISH. However, diversity estimates are complicated by PCR bias, rrn (ribosomal RNA gene) copy number per genome, and the fact that libraries are rarely sequenced to completion. Genome sizes can be estimated from known sizes of related species, if available, or approximated using the average prokaryotic genome size (~3.16 Mb ± 1.79 Mb; calculated from 215 prokaryotic genomes published in the Genomes Online Database at the time of writing; see the Online links box). Such estimates can prove imperfect, however, owing to marked variation in genome size in a microbial species (REF. 32) and the fact that current genome databases are biased towards pathogens and symbionts, which often have reduced genomes. Correlations that have been drawn between the ecology of an organism and its genome size might provide a more refined estimate of genomic complexity for community members (REF. 33).

To predict the amount of sequencing that will be required for community coverage, estimates of species richness and the abundance of the dominant organism(s) can be used with statistical methods to describe the species abundance distribution (REF. 34). If the abundance of a given organism is 1%, with a genome size of 3 Mb, then 2.4 Gb of sequence would be required to obtain 8X (near complete) genome coverage of that organism. A sequencing effort of this magnitude would vastly over-sample the genomes of more dominant community members. Therefore, directed strategies to target low abundance organisms may be advantageous (see below).

It is likely that sequencing projections will be imprecise. For example, although species abundance impacts on the relative proportion of DNAs that are present in sequencing libraries, cloning bias might skew species representation. Furthermore, there might be multiple genome types per species. Therefore, predictions should be refined after the assembly of an initial sequencing increment. One simple approach is to use the coverage statistics of the assembly based on a version of the Lander–Waterman equation (REF. 35) that is modified to take into consideration the relative abundance of species in the community. If the equation predicts fewer contigs than are observed, the representation of organisms in the library or effective genome sizes can be refined. The prediction should be performed iteratively as more sequence data is analysed. This approach was used to successfully predict the outcome of the community sequencing project undertaken by Tyson et al. (REF. 9). Specifically, estimates based solely on community characterization by 16S rRNA gene library and quantitative FISH analyses, with an average genome size of ~2 Mb, estimated that ~80 Mb of sequence would be sufficient to cover the five dominant genome types, with individual genome coverages ranging from 0.4 to 30X. Analyses post-assembly of sequencing increments (2, 10, 15 and 25 Mb) revealed that cloning bias in sequencing libraries resulted in the significant overrepresentation of the Archaea, which prompted a reappraisal of actual genome coverages (REF. 9).

Community genomics. Perhaps the primary challenge of any community genomic study that aims to extract ecological insights is to correctly assign genome fragments to organism types. In our experience, the weight of this requirement falls most heavily on genome assembly. Various genome assembly programmes are currently available (ARACHNE, CAP, CELERA, EULER, JAZZ, PHRAP and TIGR assemblers, to name but a few). However, the relative efficacy with which most of these programs handle mixed community DNAs has yet to be determined (JAZZ, PHRAP and CELERA have been used previously; REFS 9,36).

Conventional shotgun sequencing of microbial isolates is simplified by the fact that the sequenced clones are derived from organisms with a single genome type. In environmental samples, however, each clone represents a unique sequence that is probably derived from an individual in the community, and the genomes that are sampled come from a pool of both distinct and related genome types. This might pose challenges that currently available genome assembly programs are not designed to deal with. So far, studies have revealed that the genomes of different species have sufficient nucleotide-level sequence divergence (as well as changes in gene order) to prevent co-assembly (REFS 9,36). Hurdles do arise, however, owing to genomic variation within species populations.

The resolution of strain-level differences is a fundamental goal of community genomic analysis (FIG. 1). Although many comparative genomic studies of strain variants indicate a highly conserved gene order (REFS 37–39), extensive genome rearrangements in members of the same species (REFS 40,41) will confuse genome assembly and can preclude the assembly of environmental shotgun sequence data (REF. 22). In the AMD community, genome rearrangements that involve more than two genes were extremely rare in archaeal populations, and breakdown of conserved SYNTENY occurs primarily after species divergence (E.E.A. et al., unpublished observations). In regions where single-nucleotide polymorphisms are the predominant form of genomic heterogeneity, it is possible to define composite species genome sequences (that is, an aggregate sequence comprised of multiple strain sequence types). However, assembly is problematic in regions where members of the strain population have different gene contents (FIG. 2).

It is important to develop mixed genome assembly methods to deal with differences in gene content and gene sequence, because these phenomena can artificially terminate SCAFFOLDS or separate sequencing reads into multiple scaffolds at regions of strain genome confusion. This results in separate, but homologous, DNA fragments that can be mapped onto the composite species genome dataset (FIG. 1). Other complications owing to strain assembly include inaccurate (over)estimation of genome sizes and artificial duplication of open reading frames (ORFs) in community genome datasets. If assembly heuristics can overcome complications owing

SYNTENY
Refers to the presence of two or more genes on the same chromosome. However, the term is often used to refer to the shared colinearity in orthologous gene content and gene order between genomes.

SCAFFOLD
A genome fragment constructed by the ordering and orienting of sets of unlinked contigs generated from raw shotgun sequence data by using additional information (such as paired-end sequence information or homology data) to determine proper contig linkage and placement along the chromosome. Scaffolds can be comprised of multiple contigs.
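
The sequencing-effort figure quoted under "Estimating the community sequencing endeavour" (an organism at 1% abundance with a 3 Mb genome needs 2.4 Gb of community sequence for 8X coverage) follows from simple proportionality. The sketch below reproduces that arithmetic. It assumes that the share of sequence drawn from an organism equals its relative abundance, ignoring cloning bias and genome-size weighting; the function names are illustrative and do not come from the article.

```python
def required_sequencing_bp(genome_size_bp, relative_abundance, target_coverage):
    """Total community sequence (bp) needed for one organism to reach target_coverage.

    Assumption: the fraction of sequenced DNA that comes from the organism
    equals relative_abundance (no cloning bias).
    """
    if not 0 < relative_abundance <= 1:
        raise ValueError("relative_abundance must be in (0, 1]")
    return genome_size_bp * target_coverage / relative_abundance


def coverage_at_effort(total_sequence_bp, genome_size_bp, relative_abundance):
    """Expected fold coverage of one organism for a given total sequencing effort."""
    return total_sequence_bp * relative_abundance / genome_size_bp


# Worked example from the text: 1% abundance, 3 Mb genome, 8X coverage.
effort = required_sequencing_bp(3e6, 0.01, 8)
print(f"Required effort: {effort / 1e9:.2f} Gb")

# The same effort applied to a hypothetical dominant member (50% abundance,
# 3 Mb genome) illustrates the over-sampling the text warns about.
print(f"Dominant member coverage: {coverage_at_effort(effort, 3e6, 0.5):.0f}X")
```

Under these assumptions, the hypothetical dominant member is covered hundreds of times over, which is why the text suggests directed strategies for low-abundance organisms.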
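
The text describes checking an assembly against a version of the Lander–Waterman equation that is modified for the relative abundance of species, but it does not give the modified form. The sketch below is one plausible reading, not the authors' formula: it applies the standard Lander–Waterman expectation for the number of contigs, N·exp(−cσ), separately to each genome type, using only the share of reads assumed to come from that genome. The parameter names and the example community are hypothetical.

```python
import math


def expected_contigs(total_reads, read_length_bp, genome_size_bp,
                     read_fraction, min_overlap_bp=0):
    """Return (coverage, expected contig count) for one genome type.

    Standard Lander-Waterman expectation applied per genome:
      n     = total_reads * read_fraction      (reads from this genome)
      c     = n * read_length / genome_size    (fold coverage)
      sigma = 1 - min_overlap / read_length
      contigs ~ n * exp(-c * sigma)
    read_fraction is an assumed input, e.g. from 16S rRNA and FISH estimates.
    """
    n_reads = total_reads * read_fraction
    coverage = n_reads * read_length_bp / genome_size_bp
    sigma = 1.0 - min_overlap_bp / read_length_bp
    contigs = n_reads * math.exp(-coverage * sigma)
    return coverage, contigs


# Hypothetical community: (name, genome size in bp, assumed fraction of reads).
community = [
    ("dominant", 2_000_000, 0.60),
    ("intermediate", 2_000_000, 0.30),
    ("rare", 2_000_000, 0.10),
]

for name, size, fraction in community:
    cov, contigs = expected_contigs(total_reads=100_000, read_length_bp=700,
                                    genome_size_bp=size, read_fraction=fraction,
                                    min_overlap_bp=40)
    print(f"{name:>12}: coverage {cov:5.1f}X, expected contigs {contigs:8.1f}")
```

If an assembly yields more contigs for a genome type than this expectation, the assumed read fraction or effective genome size can be adjusted and the calculation repeated, in line with the iterative refinement the text recommends.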