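Recently, the well-known international journal Evolutionary Bioinformatics published new research from the Key Laboratory of Genome Sciences and Information at the Beijing Institute of Genomics, Chinese Academy of Sciences: "LCGbase: A Comprehensive Database for Lineage-Based Co-regulated Genes." The researchers built a database of lineage-based co-regulated genes in vertebrates.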
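In vertebrates, and especially in mammals, genomic sequence features and gene positions show good collinearity across species. These complex and dynamic gene arrangements and chromosomal structures are important for maintaining body development and cell differentiation. Several basic questions have nevertheless long puzzled researchers. What is the minimal unit of conserved and variable gene clusters in the genomes of species from different lineages (primates, rodents, carnivores, artiodactyls and others)? How do these gene clusters relate to nucleosome positioning and chromosome folding? Is the clustering of these genes random or biased? Which clusters are random, and which are functionally related?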
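To address these questions, a group comprising Dr. Dapeng Wang, Yubin Zhang and Zhonghua Fan, working under the guidance of Professor Jun Yu, deputy director of the Beijing Institute of Genomics and director of the Key Laboratory of Genome Sciences and Information, collected genome annotation for a wide range of species, including mammals, birds, reptiles, amphibians and fish, and chose representative insects, nematodes and fungi as outgroups. Taking the human genome as the reference, the study mapped the genomes of the other species onto it on the basis of homologous genes, using a conserved pair of two "core genes" as the unit (a pair whose transcriptional orientation is conserved as "head-to-head", "tail-to-tail" or "head-to-tail"). The study also provides several tools for investigating co-regulation mechanisms, including modules for co-evolution, co-expression, gene function enrichment and promoter analysis.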
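The three orientation classes of a "core gene" pair can be illustrated with a short sketch. This is not LCGbase's own code: the `Gene` record, its field names and the `classify_pair` function are hypothetical, and the sketch assumes the usual convention in which the two genes are ordered by chromosomal position and compared by strand. Under that convention, head-to-head corresponds to a divergent pair, tail-to-tail to a convergent pair, and head-to-tail to a co-directional pair.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    """Hypothetical minimal gene record (not the LCGbase schema)."""
    name: str
    chrom: str
    start: int   # leftmost coordinate on the chromosome
    end: int     # rightmost coordinate on the chromosome
    strand: str  # "+" or "-"


def classify_pair(a: Gene, b: Gene) -> str:
    """Classify two genes on the same chromosome by relative orientation."""
    if a.chrom != b.chrom:
        raise ValueError("genes must lie on the same chromosome")
    for g in (a, b):
        if g.strand not in ("+", "-"):
            raise ValueError(f"unknown strand for {g.name}: {g.strand!r}")

    # Order the pair by position so that 'left' precedes 'right'.
    left, right = sorted((a, b), key=lambda g: g.start)

    if left.strand == right.strand:
        return "head-to-tail"   # co-directional: both transcribed the same way
    if left.strand == "-" and right.strand == "+":
        return "head-to-head"   # divergent: the two 5' ends face each other
    return "tail-to-tail"       # convergent: the two 3' ends face each other


if __name__ == "__main__":
    g1 = Gene("geneA", "chr1", 1000, 2000, "-")
    g2 = Gene("geneB", "chr1", 2500, 4000, "+")
    print(classify_pair(g1, g2))
```

A pair counts as conserved in the sense described above only if the same class is recovered for the homologous pair in the other species; that comparison is outside the scope of this sketch.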
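The database and its associated tools provide strong support for dissecting genomic variation events, such as gene duplication, loss, insertion, inversion and chromosome-level polyploidization, that relate to the conservation of a given gene or set of genes within a lineage and its variation between lineages. (Bioon.com)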
doi:10.4137/EBO.S8540
LCGbase: A Comprehensive Database for Lineage-Based Co-regulated Genes
Dapeng Wang, Yubin Zhang, Zhonghua Fan, Guiming Liu and Jun Yu
Animal genes of different lineages, such as vertebrates and arthropods, are well-organized and blended into dynamic chromosomal structures that represent a primary regulatory mechanism for body development and cellular differentiation. The majority of genes in a genome are actually clustered, and these clusters are evolutionarily stable to different extents and biologically meaningful when evaluated among genomes within and across lineages. Until now, many questions concerning gene organization, such as what the minimal number of genes in a cluster is and what driving force leads to gene co-regulation, remain to be addressed. Here, we provide a user-friendly database, LCGbase (a comprehensive database for lineage-based co-regulated genes), hosting information on the evolutionary dynamics of gene clustering and ordering within the animal kingdom in two different lineages: vertebrates and arthropods. The database is built on a web-based Linux-Apache-MySQL-PHP framework and offers an effective interactive user-inquiry service. Compared to other gene annotation databases with similar purposes, our database has three clear advantages. First, it is inclusive, covering all high-quality genome assemblies of vertebrates and representative arthropod species. Second, it is human-centric, since we map all gene clusters from other genomes in an order of lineage ranks (such as primates, mammals, warm-blooded, and reptiles) onto the human genome and build the database up from well-defined gene pairs (a minimal cluster in which the two adjacent genes are oriented as a co-directional, convergent, or divergent pair) to large gene clusters. Furthermore, users can search for any adjacent genes and their detailed annotations. Third, the database provides flexible parameter definitions, such as the distance between the transcription start sites of two adjacent genes, which is extendable to genes flanking the cluster across species.
We also provide useful tools for sequence alignment, gene ontology (GO) annotation, promoter identification, gene expression (co-expression), and evolutionary analysis. This database not only provides a way to define lineage-specific and species-specific gene clusters but also facilitates future studies on gene co-regulation, epigenetic control of gene expression (DNA methylation and histone marks), and chromosomal structures in the context of gene clusters and species evolution. LCGbase is freely available at http://lcgbase.big.ac.cn/LCGbase.
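The transcription-start-site (TSS) distance parameter mentioned in the abstract can be made concrete with a small sketch. Everything here is an assumption for illustration, not the LCGbase implementation or its query interface: the tuple layout `(name, chrom, start, end, strand)`, the function names, and the rule that the TSS is the start coordinate on the "+" strand and the end coordinate on the "-" strand.

```python
def tss(start, end, strand):
    """Return the assumed TSS: 5' end of the gene on its own strand."""
    return start if strand == "+" else end


def adjacent_pairs_within(genes, max_tss_distance):
    """List adjacent gene pairs whose TSS distance is <= max_tss_distance.

    genes: iterable of (name, chrom, start, end, strand) tuples
           (a hypothetical layout chosen for this example).
    Returns a list of (left_name, right_name, tss_distance) tuples.
    """
    # Sort by chromosome, then by leftmost coordinate, so that
    # consecutive entries on the same chromosome are neighbours.
    ordered = sorted(genes, key=lambda g: (g[1], g[2]))
    pairs = []
    for left, right in zip(ordered, ordered[1:]):
        if left[1] != right[1]:
            continue  # neighbours in the list but on different chromosomes
        distance = abs(tss(*left[2:]) - tss(*right[2:]))
        if distance <= max_tss_distance:
            pairs.append((left[0], right[0], distance))
    return pairs


if __name__ == "__main__":
    example = [
        ("geneA", "chr1", 100, 500, "-"),
        ("geneB", "chr1", 700, 1200, "+"),
        ("geneC", "chr1", 5000, 6000, "+"),
        ("geneD", "chr2", 100, 200, "+"),
    ]
    for pair in adjacent_pairs_within(example, max_tss_distance=1000):
        print(pair)
```

Loosening or tightening `max_tss_distance` changes which neighbouring genes are treated as a candidate pair, which is the kind of flexibility the abstract attributes to the database's parameter definitions.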