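Professor Cannon and members of the Ecological Evolution Group at Xishuangbanna Tropical Botanic Garden have developed a new research method that requires no prior assembly. It explores sequence differences among a set of focal genomes by checking whether DNA fragments that reach a certain level of "complexity" are present in the sequencing data and how often they occur.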
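The tabulation idea can be illustrated with a minimal sketch. The article does not say how "complexity" is measured or how fragments are cut, so the fixed fragment length `k`, the Shannon-entropy filter, and the threshold `min_entropy` below are illustrative assumptions, not the authors' definitions.

```python
from collections import Counter
import math


def shannon_entropy(fragment):
    """Base composition entropy in bits (0.0 for a single repeated base)."""
    counts = Counter(fragment)
    n = len(fragment)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def tabulate_complex_fragments(reads, k=8, min_entropy=1.5):
    """Count every length-k fragment in the reads that passes the filter.

    Assumption: entropy of the base composition stands in for the
    paper's complexity criterion, which the article does not specify.
    """
    table = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            frag = read[i:i + k]
            # Skip fragments with ambiguous bases such as N.
            if not set(frag) <= set("ACGT"):
                continue
            if shannon_entropy(frag) >= min_entropy:
                table[frag] += 1
    return table


if __name__ == "__main__":
    reads = ["AAAAAAAAAA", "ACGTACGTAC", "ACGTNCGTAC"]
    for frag, count in sorted(tabulate_complex_fragments(reads).items()):
        print(frag, count)
```

No reference sequence is involved at any point: the table is built directly from the raw reads, which is what makes this kind of analysis assembly free.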
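In earlier work, most comparative genomic analyses of short-read sequence (SRS) data required a previously assembled DNA sequence as a reference, which to some extent limited the use of such data in bioinformatics research.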
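The study by Professor Cannon and colleagues compares large volumes of genomic diversity data for nine tree species, from the population level up to the family level. As a control, it uses simulated SRS data for three known plant genomes to assess the quality and distributional bias of the sequencing reaction.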
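A control of this kind can be sketched by drawing simulated short reads from a known sequence. The function `simulate_reads` and its parameters (`read_length`, `coverage`, `error_rate`, `seed`) are hypothetical, and uniform random sampling with independent substitution errors is a simplifying assumption rather than the simulation procedure used in the paper.

```python
import random


def simulate_reads(genome, read_length=36, coverage=5.0, error_rate=0.0, seed=0):
    """Sample fixed-length reads uniformly at random from a known sequence.

    Assumptions: uniform start positions, substitution errors only,
    and the same error probability at every base.
    """
    if len(genome) < read_length:
        raise ValueError("genome is shorter than the read length")
    rng = random.Random(seed)
    # Number of reads needed to reach the requested average coverage.
    n_reads = int(coverage * len(genome) / read_length)
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(genome) - read_length + 1)
        read = list(genome[start:start + read_length])
        for i, base in enumerate(read):
            if rng.random() < error_rate:
                # Replace the base with one of the other three.
                read[i] = rng.choice([b for b in "ACGT" if b != base])
        reads.append("".join(read))
    return reads


if __name__ == "__main__":
    toy_genome = "ACGTTGCAAGGCTTAACCGG" * 10
    for read in simulate_reads(toy_genome, read_length=20, coverage=1.0)[:3]:
        print(read)
```

Because the source sequence is known, any fragment in the simulated reads that is absent from it can be attributed to the simulated errors, which gives a baseline for judging the real data.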
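The method defines three main classes of information-rich complex DNA fragments, each with its own statistical properties. Type I fragments are unique to a single genome but carry a high false-positive rate and depend strongly on sequencing coverage and read distribution. Type II fragments are shared by two genomes and can reveal potential copy-number differences between them. Type III fragments are shared only by a subset of genomes and can be associated with phenotypic and geographic differences among species. Because the method analyzes massive data sets without prior assembly, it greatly advances the use of short-read sequencing in non-model organisms and offers a new way to directly pick out the most informative genetic elements for further genome assembly and detailed study. The study also demonstrates a practical application: for an endangered timber species, the authors identified a large number of population-level genetic markers that could potentially determine the geographic origin of individual trees and help regulate the international timber trade.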
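The three classes can be illustrated with a small sketch that sorts fragments by the set of genomes in which they occur. The cut-offs used here (one genome for Type I, exactly two for Type II, more than two but not all for Type III) are a simplifying assumption for illustration. The article does not give the exact criteria, and the function name `classify_fragments` is hypothetical.

```python
from collections import Counter


def classify_fragments(tables):
    """Label each fragment by how many genomes carry it.

    tables maps a genome name to a Counter of fragment counts.
    Assumed labels: "I" (one genome), "II" (exactly two),
    "III" (a larger subset, but not all), "shared_by_all".
    """
    n_genomes = len(tables)
    all_fragments = set().union(*tables.values())
    classified = {}
    for frag in all_fragments:
        counts = {name: t.get(frag, 0) for name, t in tables.items()}
        carriers = sorted(name for name, c in counts.items() if c > 0)
        if not carriers:
            continue  # key stored with a zero count
        if len(carriers) == 1:
            label = "I"
        elif len(carriers) == 2:
            label = "II"
        elif len(carriers) < n_genomes:
            label = "III"
        else:
            label = "shared_by_all"
        classified[frag] = {
            "type": label,
            "carriers": carriers,
            # Per-genome counts; for Type II their ratio hints at copy number.
            "counts": {name: counts[name] for name in carriers},
        }
    return classified


if __name__ == "__main__":
    tables = {
        "A": Counter({"ACGTACGT": 3, "TTGCAAGC": 1, "AACCGGTT": 2}),
        "B": Counter({"ACGTACGT": 6, "GGATCCTA": 2, "AACCGGTT": 2}),
        "C": Counter({"GGATCCTA": 1, "AACCGGTT": 1}),
        "D": Counter({"GGATCCTA": 1, "AACCGGTT": 3}),
    }
    for frag, info in sorted(classify_fragments(tables).items()):
        print(frag, info["type"], info["carriers"], info["counts"])
```

A fragment seen in only one genome may simply have been missed in the others at low coverage, which is why Type I calls are described as prone to false positives. A real analysis would need a coverage-aware threshold rather than the bare presence test used here.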
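Breakthroughs in next-generation DNA sequencing technology provide a new platform for studying the ecology and evolution of tropical forests. The work by Professor Cannon and colleagues is an important step for Xishuangbanna Tropical Botanic Garden toward applying genomics to plant functional adaptation and evolution under climate change, to species diversification and coexistence, and to the conservation of the critically endangered natural resources of Asian tropical forests. (Bioon.com)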
Original source recommended by Bioon:
Molecular Ecology Volume 19 Issue s1, Pages 147 - 161
Assembly free comparative genomics of short-read sequence data discovers the needles in the haystack
CHARLES H. CANNON*†, CHAI-SHIAN KUA*, D. ZHANG* and J.R. HARTING
*Ecological Evolution Group, Xishuangbanna Tropical Botanic Garden, Chinese Academy of Sciences, Menglun, Mengla 666303, China, †Department of Biological Sciences, Texas Tech University, Lubbock, TX 79409, USA
Most comparative genomic analyses of short-read sequence (SRS) data rely upon the prior assembly of a reference sequence. Here, we present an assembly free analysis of SRS data that discovers sequence variants among focal genomes by tabulating the presence and frequency of 'complex' fragments in the data. Using data from nine tree species, we compare genomic diversity from populations to families. As a control, we simulated SRS data for three known plant genomes. The results provide insight into the quality and distributional bias of the sequencing reaction. Three main types of informative complexmers were identified, each possessing unique statistical properties. Type I complexmers are unique to a genome but suffer from a high false positive rate, being highly dependent on read coverage and distribution. Type II complexmers are shared between two genomes and can highlight potential copy-number differences. Type III complexmers are exclusive to a subset of genomes and can be useful for associating genetic differences with phenotypic or geographic variation. At the population level in an endangered timber species, numerous markers were identified that could potentially determine geographic origin of individuals and regulate international trade. We observed that the genomic data for the four fig species were more divergent than for stone oak species, possibly due to their complex pollination syndrome and high rates of gene flow. Our approach greatly enhances the application of SRS technology to the study of non-model organisms and directly identifies the most informative genetic elements for more detailed study and assembly.