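Bioon (生物谷) report: Researchers from the J. Craig Venter Institute in the United States (established from TIGR), the University of Toronto in Canada, the University of California San Diego, and the Universitat de Barcelona in Spain have recently published the diploid genome sequence of a single individual, opening the door to future genome comparisons and ushering in a new era of individual genomic information.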
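Each person's genomic information is normally packaged into 23 pairs of chromosomes, with one set of 23 inherited from each parent, whose own DNA is in turn a mixture of their ancestors' genes. The human genome therefore functions as a diploid, and complex interactions between alleles and/or their non-coding regulatory elements can give rise to new phenotypes.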
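More than 40 years ago, scientists first observed the diploid nature of the human genome in chromosomes, and clinical laboratories still use karyotyping as the standard for whole-genome examination. As molecular biology advanced, other techniques such as chromosomal fluorescence in situ hybridization (FISH) and microarray-based genetic analysis also contributed considerably to genetic analysis. Even with these techniques, however, scientists suspect that only a small fraction of the genetic variation in an experimental sample is actually being observed.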
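Over the past decade, the development of high-throughput DNA sequencing and advanced bioinformatic analysis has made it possible to sequence most of the human genome. Two versions of the human genome sequence are now available: one produced by the International Human Genome Sequencing Consortium (HGSC) using a clone-based approach, and the other produced by random whole-genome shotgun sequencing.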
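In this paper, researchers at the J. Craig Venter Institute analyzed 19 million gene sequences and 13 million non-coding sequences, using the latest methods to examine in detail the sequences of the two copies of each chromosome. They found more than 4 million variants, including single-nucleotide differences, sequence insertions and deletions, and differences in the copy number of individual genes.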
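Their approach was based mainly on whole-genome shotgun sequencing, combined with advanced genome assembly strategies and software, which allowed them to assemble large segments (>200 kilobases) of the diploid genome. Compared with earlier human genome sequences, most of the variation they found consisted of SNPs, a well-studied class of variant. The sequencing also revealed less-studied kinds of variation, such as insertions and deletions. Non-SNP variants made up a minority (22%) of all variant events, although according to the abstract they involve 74% of all variant bases.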
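These data depict the molecular portrait of a diploid human genome, opening the door to future genome comparisons and ushering in a new era of individual genomic information.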
Original source:
PLoS Biology
Received: May 9, 2007; Accepted: July 30, 2007; Published: September 4, 2007
The Diploid Genome Sequence of an Individual Human
Samuel Levy1*, Granger Sutton1, Pauline C. Ng1, Lars Feuk2, Aaron L. Halpern1, Brian P. Walenz1, Nelson Axelrod1, Jiaqi Huang1, Ewen F. Kirkness1, Gennady Denisov1, Yuan Lin1, Jeffrey R. MacDonald2, Andy Wing Chun Pang2, Mary Shago2, Timothy B. Stockwell1, Alexia Tsiamouri1, Vineet Bafna3, Vikas Bansal3, Saul A. Kravitz1, Dana A. Busam1, Karen Y. Beeson1, Tina C. McIntosh1, Karin A. Remington1, Josep F. Abril4, John Gill1, Jon Borman1, Yu-Hui Rogers1, Marvin E. Frazier1, Stephen W. Scherer2, Robert L. Strausberg1, J. Craig Venter1
1 J. Craig Venter Institute, Rockville, Maryland, United States of America, 2 Program in Genetics and Genomic Biology, The Hospital for Sick Children, and Molecular and Medical Genetics, University of Toronto, Toronto, Ontario, Canada, 3 Department of Computer Science and Engineering, University of California San Diego, La Jolla, California, United States of America, 4 Genetics Department, Facultat de Biologia, Universitat de Barcelona, Barcelona, Catalonia, Spain
Presented here is a genome sequence of an individual human. It was produced from 32 million random DNA fragments, sequenced by Sanger dideoxy technology and assembled into 4,528 scaffolds, comprising 2,810 million bases (Mb) of contiguous sequence with approximately 7.5-fold coverage for any given region. We developed a modified version of the Celera assembler to facilitate the identification and comparison of alternate alleles within this individual diploid genome. Comparison of this genome and the National Center for Biotechnology Information human reference assembly revealed more than 4.1 million DNA variants, encompassing 12.3 Mb. These variants (of which 1,288,319 were novel) included 3,213,401 single nucleotide polymorphisms (SNPs), 53,823 block substitutions (2–206 bp), 292,102 heterozygous insertion/deletion events (indels)(1–571 bp), 559,473 homozygous indels (1–82,711 bp), 90 inversions, as well as numerous segmental duplications and copy number variation regions. Non-SNP DNA variation accounts for 22% of all events identified in the donor, however they involve 74% of all variant bases. This suggests an important role for non-SNP genetic alterations in defining the diploid genome structure. Moreover, 44% of genes were heterozygous for one or more variants. Using a novel haplotype assembly strategy, we were able to span 1.5 Gb of genome sequence in segments >200 kb, providing further precision to the diploid nature of the genome. These data depict a definitive molecular portrait of a diploid human genome that provides a starting point for future genome comparisons and enables an era of individualized genomic information.
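The 22% figure for non-SNP variation can be checked roughly against the counts listed in the abstract. The sketch below is only an illustration: it assumes that the five enumerated categories are the "events" being counted, and it leaves out segmental duplications and copy number variation regions, for which the abstract gives no count. The names `variant_counts` and `non_snp_share` are hypothetical and are not taken from the paper.

```python
# Variant counts as quoted in the abstract. Assumption: these five
# categories are the "events" behind the 22% figure; segmental
# duplications and copy number variants have no stated count and are omitted.
variant_counts = {
    "snp": 3_213_401,
    "block_substitution": 53_823,
    "heterozygous_indel": 292_102,
    "homozygous_indel": 559_473,
    "inversion": 90,
}


def non_snp_share(counts):
    """Return the fraction of variant events that are not SNPs."""
    total = sum(counts.values())
    non_snp = total - counts["snp"]
    return non_snp / total


total_events = sum(variant_counts.values())
share = non_snp_share(variant_counts)

# Prints: 4,118,889 events, non-SNP share 22.0%
print(f"{total_events:,} events, non-SNP share {share:.1%}")
```

Under these assumptions the total comes to just over 4.1 million events, which agrees with the abstract's "more than 4.1 million DNA variants", and the non-SNP share rounds to the reported 22%.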