
Pan-genome sequence analysis using Panseq: an online tool for the rapid analysis of core and accessory genomic regions

BMC Bioinformatics 11: 461

The pan-genome of a bacterial species consists of a core gene pool and an accessory gene pool. The accessory genome is thought to be an important source of genetic variability in bacterial populations; it is acquired through lateral gene transfer and allows subpopulations of bacteria to adapt better to specific niches. Low-cost, high-throughput sequencing platforms have produced an exponential increase in genome sequence data, creating an opportunity to study the pan-genomes of many bacterial species. In this study, we describe Panseq, a new online program for pan-genome sequence analysis.

Panseq was used to identify genomic islands in Escherichia coli O157:H7 and E. coli K-12. In a population of 60 E. coli O157:H7 strains, PCR confirmed the existence of 65 accessory genomic regions identified by Panseq. For six Listeria monocytogenes strains, Panseq extracted the accessory genome with its binary presence/absence data and the core genome with its single nucleotide polymorphisms (SNPs); these data were then hierarchically clustered and visualized. The nucleotide core data and binary accessory data were also used to construct maximum parsimony (MP) trees, which were compared with the MP tree generated by multi-locus sequence typing (MLST). The accessory and core trees had identical topologies, but both differed from the tree produced from seven MLST loci. The Loci Selector module found the most variable and discriminatory combinations of four loci within a set of 100 loci among 10 strains in 1 s, compared with the 449 s required to exhaustively search all possible combinations; it also found the 20 most discriminatory loci in a 96-locus E. coli O157:H7 SNP dataset.

Panseq determines the core and accessory regions among a collection of genomic sequences based on user-defined parameters.
It readily extracts regions unique to a genome or group of genomes, identifies SNPs within shared core genomic regions, builds input files for phylogeny programs from both the presence/absence of accessory regions and the SNPs within core regions, and produces a graphical overview of the output. Panseq also includes a loci selector that identifies the most variable and discriminatory loci among sets of accessory loci or core-gene SNPs. Panseq is written in Perl and is freely available online.
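Panseq's split into core and accessory regions depends on user-defined parameters. The Python sketch below illustrates only the general idea and is not Panseq's own (Perl) implementation. It assumes per-genome percent-identity hits for each genome fragment have already been computed by some similarity search; the function names, the 85% identity cutoff, and the core threshold are hypothetical parameters chosen for illustration. The last function turns the accessory loci into one binary character string per genome, the kind of presence/absence data a parsimony program can take as input.

```python
from typing import Dict, List, Tuple

# Hypothetical input layout: for each locus (e.g. a genome fragment), the best
# percent identity observed in each genome. Genomes with no hit are simply
# missing from the inner dict. Values here are made up for illustration.
Hits = Dict[str, Dict[str, float]]
Matrix = Dict[str, List[int]]


def presence_absence(hits: Hits, genomes: List[str], pid_cutoff: float = 85.0) -> Matrix:
    """Return a 0/1 row per locus (in `genomes` order): 1 if identity >= cutoff."""
    return {
        locus: [1 if by_genome.get(g, 0.0) >= pid_cutoff else 0 for g in genomes]
        for locus, by_genome in hits.items()
    }


def split_core_accessory(matrix: Matrix, core_threshold: int) -> Tuple[List[str], List[str]]:
    """Core loci are present in at least `core_threshold` genomes; the rest are accessory."""
    core = [locus for locus, row in matrix.items() if sum(row) >= core_threshold]
    accessory = [locus for locus, row in matrix.items() if sum(row) < core_threshold]
    return core, accessory


def binary_characters(matrix: Matrix, loci: List[str], genomes: List[str]) -> Dict[str, str]:
    """Transpose the matrix into one 0/1 character string per genome."""
    return {
        g: "".join(str(matrix[locus][i]) for locus in loci)
        for i, g in enumerate(genomes)
    }


if __name__ == "__main__":
    genomes = ["strainA", "strainB", "strainC"]
    hits = {
        "frag1": {"strainA": 100.0, "strainB": 99.2, "strainC": 98.7},
        "frag2": {"strainA": 100.0, "strainB": 40.0},
        "frag3": {"strainA": 100.0, "strainC": 91.0},
    }
    matrix = presence_absence(hits, genomes)
    core, accessory = split_core_accessory(matrix, core_threshold=len(genomes))
    print(core)  # ['frag1']
    print(binary_characters(matrix, accessory, genomes))
    # {'strainA': '11', 'strainB': '00', 'strainC': '01'}
```

Setting `core_threshold` below the number of genomes would admit "soft core" loci missing from a few genomes; which threshold suits a study is a user decision, as the abstract notes.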

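Identifying SNPs within a shared core region amounts to scanning aligned sequences column by column. The following minimal Python sketch assumes each core locus has already been aligned so that all sequences have the same length; skipping columns that contain a gap or an ambiguous base is an assumption made here, not a rule stated in the abstract. The helper names `find_snps` and `snp_profiles` are hypothetical.

```python
from typing import Dict, List

# Only unambiguous nucleotides count. Columns containing gaps or ambiguity
# codes (e.g. '-' or 'N') are skipped; this filtering rule is an assumption.
NUCLEOTIDES = frozenset("ACGT")


def find_snps(alignment: Dict[str, str]) -> List[int]:
    """Return 0-based alignment columns that carry more than one nucleotide."""
    seqs = [s.upper() for s in alignment.values()]
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must all have the same length")
    snps = []
    for col in range(length):
        bases = {s[col] for s in seqs}
        # Keep the column only if every base is A/C/G/T and at least two differ.
        if bases <= NUCLEOTIDES and len(bases) > 1:
            snps.append(col)
    return snps


def snp_profiles(alignment: Dict[str, str], columns: List[int]) -> Dict[str, str]:
    """Concatenate each genome's bases at the SNP columns into one string."""
    return {name: "".join(seq[c].upper() for c in columns) for name, seq in alignment.items()}


if __name__ == "__main__":
    core_locus = {
        "strainA": "ACGTACGT",
        "strainB": "ACGTTCGT",
        "strainC": "ACG-ACGA",
    }
    cols = find_snps(core_locus)
    print(cols)  # [4, 7]
    print(snp_profiles(core_locus, cols))
    # {'strainA': 'AT', 'strainB': 'TT', 'strainC': 'AA'}
```

Column 3 differs only by a gap, so it is not reported; the concatenated per-genome strings are one way to feed core SNPs into a phylogeny program.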

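The abstract contrasts the Loci Selector's 1 s run with a 449 s exhaustive search. Choosing 4 of 100 loci exhaustively means scoring comb(100, 4) = 3,921,225 combinations. The abstract does not give the selector's exact criterion, so the Python sketch below assumes a greedy forward selection that maximizes the number of distinct strain profiles, a simple stand-in for "most discriminatory". The names `n_profiles`, `greedy_select`, and `exhaustive_select` are hypothetical.

```python
from itertools import combinations
from math import comb
from typing import Dict, List, Sequence

# Hypothetical layout: data[strain][locus] holds an allele call, a SNP base,
# or a 0/1 presence/absence value.
Data = Dict[str, Dict[str, object]]


def n_profiles(data: Data, loci: Sequence[str]) -> int:
    """Number of distinct strain profiles over `loci` (a simple discrimination score)."""
    return len({tuple(calls[locus] for locus in loci) for calls in data.values()})


def greedy_select(data: Data, loci: List[str], k: int) -> List[str]:
    """Repeatedly add the locus that yields the most distinct profiles (ties: first listed)."""
    chosen: List[str] = []
    remaining = list(loci)
    for _ in range(min(k, len(remaining))):
        best = max(remaining, key=lambda locus: n_profiles(data, chosen + [locus]))
        chosen.append(best)
        remaining.remove(best)
    return chosen


def exhaustive_select(data: Data, loci: List[str], k: int) -> List[str]:
    """Score every k-locus combination; the cost grows as comb(len(loci), k)."""
    return list(max(combinations(loci, k), key=lambda combo: n_profiles(data, combo)))


if __name__ == "__main__":
    data = {
        "s1": {"L1": 0, "L2": 0, "L3": 1},
        "s2": {"L1": 0, "L2": 1, "L3": 1},
        "s3": {"L1": 1, "L2": 0, "L3": 1},
        "s4": {"L1": 1, "L2": 1, "L3": 1},
    }
    loci = ["L1", "L2", "L3"]
    print(greedy_select(data, loci, 2))      # ['L1', 'L2']
    print(exhaustive_select(data, loci, 2))  # ['L1', 'L2']
    print(comb(100, 4))                      # 3921225
```

For 4 of 100 loci the greedy loop evaluates 100 + 99 + 98 + 97 = 394 candidate sets instead of 3,921,225. The trade-off is that greedy selection is not guaranteed to find the optimal combination in general.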


PMID: 20843356

DOI: 10.1186/1471-2105-11-461

Related references

Combined Analysis of Variation in Core, Accessory and Regulatory Genome Regions Provides a Super-Resolution View into the Evolution of Bacterial Populations. PLoS Genetics 12(9): e1006280, 2016

Piggy: a rapid, large-scale pan-genome analysis tool for intergenic regions in bacteria. GigaScience 7(4): 1-11, 2018

Single-read sequence tags of a limited number of genomic DNA fragments provide an inexpensive tool for comparative genome analysis. Yeast 14(14): 1327-1332, 1998

Rapid genome evolution revealed by comparative sequence analysis of orthologous regions from four Triticeae genomes. Plant Physiology 135(1): 459-470, 2004

Identification of Salmonella enterica Species- and Subgroup-specific Genomic Regions Using Panseq 2.0. 2011

Analysis of multiple genomic sequence alignments: a web resource, online tools, and lessons learned from analysis of mammalian SCL loci. Genome Research 14(2): 313-318, 2004

Identification of Salmonella enterica species- and subgroup-specific genomic regions using Panseq 2.0. Infection, Genetics and Evolution 11(8): 2151-2161, 2012

CHESS (CgHExpreSS): a comprehensive analysis tool for the analysis of genomic alterations and their effects on the expression profile of the genome. BMC Bioinformatics 10: 424, 2010

Sequence analysis of the genome of the unicellular cyanobacterium Synechocystis sp. strain PCC6803. II. Sequence determination of the entire genome and assignment of potential protein-coding regions (supplement). DNA Research 3(3): 185-209, 1996

Genomic sequence analysis and SNP identification in genomic regions surrounding genes involved in carcinogenesis. American Journal of Human Genetics 65(4): A129, 1999

ClustAGE: a tool for clustering and distribution analysis of bacterial accessory genomic elements. BMC Bioinformatics 19(1): 150, 2018

Genome-wide functional analysis using the barcode sequence alignment and statistical analysis (Barcas) tool. BMC Bioinformatics 17(Suppl 17): 475, 2017

GenomePeek-an online tool for prokaryotic genome and metagenome analysis. PeerJ 3: e1025, 2015

Comparative genomic analysis of a neurotoxigenic Clostridium species using partial genome sequence: Phylogenetic analysis of a few conserved proteins involved in cellular processes and metabolism. Anaerobe 16(2): 147-154, 2010

Genome-wide analysis of the Hsp20 gene family in soybean: comprehensive sequence, genomic organization and expression profile analysis under abiotic and biotic stresses. BMC Genomics 14: 577, 2014