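Large-scale duplications and deletions occur in the human genome and are associated with human genomic variation and polymorphism, and gene copy number is an important factor in this. However, the ability to determine genomic DNA copy number at high resolution has been limited, so no technique existed for scanning the entire genome for such copy number polymorphisms (CNPs). As a result, little was known about how much CNPs contribute to human genetic variation and polymorphism.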
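On 23 July 2004, Science published a new study from Cold Spring Harbor Laboratory led by Michael Wigler. Using a new technique, representational oligonucleotide microarray analysis (ROMA), the study revealed strikingly large differences between the DNA of normal cells from different people.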
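ROMA hybridizes a reduced-complexity "representation" of the genome rather than total genomic DNA; in this study the representation was prepared with the restriction enzyme Bgl II. The toy sketch below, written for illustration only, performs an in-silico Bgl II digest (recognition site AGATCT, cut between the first A and the G) and keeps only fragments inside a size window, on the assumption that short fragments are the ones efficiently amplified when a representation is made. The 200 to 1,200 bp window, the function names `bglii_fragments` and `representation`, and the random test sequence are assumptions, not parameters taken from the study.

```python
import random
import re

BGLII_SITE = "AGATCT"  # Bgl II recognition sequence (A^GATCT)
CUT_OFFSET = 1         # Bgl II cuts between the first A and the G


def bglii_fragments(seq: str) -> list[int]:
    """Lengths of the fragments left by a complete Bgl II digest of `seq`."""
    seq = seq.upper()
    # AGATCT is palindromic, so scanning one strand finds every site,
    # and it cannot overlap itself, so finditer misses no site.
    cuts = [m.start() + CUT_OFFSET for m in re.finditer(BGLII_SITE, seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


def representation(seq: str, min_len: int = 200, max_len: int = 1200):
    """Keep fragments inside an assumed amplifiable size window.

    Returns the kept fragment lengths and the fraction of `seq` they cover.
    The default 200-1200 bp window is an illustrative assumption.
    """
    kept = [f for f in bglii_fragments(seq) if min_len <= f <= max_len]
    return kept, sum(kept) / len(seq)


if __name__ == "__main__":
    random.seed(0)
    genome = "".join(random.choice("ACGT") for _ in range(1_000_000))  # toy sequence
    kept, fraction = representation(genome)
    print(f"{len(kept)} fragments kept, covering {fraction:.1%} of the sequence")
```

Only a small fraction of the toy sequence survives the size filter, which is the sense in which a representation "greatly reduces sample complexity" before hybridization.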
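The researchers collected blood samples and several types of tissue samples from individuals of diverse geographic origin. Using ROMA, they then measured the relative concentrations of chromosomal DNA purified from the samples by differential labeling and hybridization to a set of probes. In brief, Bgl II genomic representations were used to greatly reduce sample complexity; the oligonucleotide microarray probes were derived from analysis of the assembled human chromosome sequence, designed onto the array, and further optimized empirically; and the hybridization data were analyzed with a hidden Markov model (HMM).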
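The study analyzed the hybridization data with a hidden Markov model, but the authors' exact model is not given here. The following is therefore a generic sketch, not the study's implementation: a three-state HMM (loss, normal, gain) with Gaussian emissions over per-probe log2 ratios, decoded with the Viterbi algorithm. The state means, noise level, and self-transition probability are assumed values chosen for illustration, and the input is assumed to be a list of log2 ratios ordered by genomic position.

```python
import math

# Hypothetical model parameters, chosen for illustration only.
STATES = ("loss", "normal", "gain")
MEANS = {"loss": -0.5, "normal": 0.0, "gain": 0.5}  # expected log2 ratio per state
SIGMA = 0.2    # assumed per-probe noise (standard deviation)
P_STAY = 0.99  # assumed probability of staying in the same state


def log_emission(x: float, state: str) -> float:
    """Gaussian log-density of log2 ratio x under the given state."""
    z = (x - MEANS[state]) / SIGMA
    return -0.5 * z * z - math.log(SIGMA * math.sqrt(2 * math.pi))


def log_transition(prev: str, cur: str) -> float:
    """Log probability of moving from state `prev` to state `cur`."""
    p = P_STAY if prev == cur else (1 - P_STAY) / (len(STATES) - 1)
    return math.log(p)


def viterbi(ratios: list[float]) -> list[str]:
    """Most likely state for each probe; `ratios` must be in genomic order."""
    if not ratios:
        return []
    # Uniform initial distribution over the states.
    score = {s: -math.log(len(STATES)) + log_emission(ratios[0], s) for s in STATES}
    backptrs = []
    for x in ratios[1:]:
        new_score, ptr = {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: score[p] + log_transition(p, s))
            ptr[s] = best
            new_score[s] = score[best] + log_transition(best, s) + log_emission(x, s)
        backptrs.append(ptr)
        score = new_score
    # Trace back from the highest-scoring final state.
    state = max(STATES, key=score.get)
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    return path[::-1]


def segments(path: list[str]) -> list[tuple[int, int, str]]:
    """Collapse a state path into (first_probe, last_probe, state) runs."""
    runs, start = [], 0
    for i in range(1, len(path) + 1):
        if i == len(path) or path[i] != path[start]:
            runs.append((start, i - 1, path[start]))
            start = i
    return runs
```

For example, `segments(viterbi([0.02, -0.05, 0.01, 0.55, 0.48, 0.6, 0.52, 0.03, -0.02, 0.0]))` labels probes 3 to 6 as a gain run. The high self-transition probability is what makes the model prefer contiguous segments over isolated noisy probes; converting runs into genomic intervals would additionally require each probe's coordinates.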
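In total, they analyzed blood and tissue samples from 20 subjects. Across all of these volunteers, they identified 221 copy number differences representing 76 unique CNPs, which appear as deletions or duplications of large DNA segments. The CNP intervals contain 70 genes whose copy number varies between individuals: some are involved in neurological function, some in the regulation of cell growth, some in the regulation of metabolism, and several are already known to be associated with disease.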
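The gap between 221 copy number differences and 76 unique CNPs arises because the same polymorphic locus can be called in several individuals. One plausible way to collapse per-individual calls into unique loci, assumed here rather than taken from the paper, is to merge calls whose intervals overlap on the same chromosome. The function names `merge_calls` and `mean_length` and the toy calls below are hypothetical and are not data from the study.

```python
from collections import defaultdict


def merge_calls(calls):
    """Collapse per-individual calls into unique loci (hypothetical rule).

    `calls` holds (individual, chromosome, start, end) tuples with half-open
    coordinates. Calls on the same chromosome whose intervals overlap are
    merged into one locus, returned as (chromosome, start, end, individuals).
    """
    by_chrom = defaultdict(list)
    for individual, chrom, start, end in calls:
        by_chrom[chrom].append((start, end, individual))
    loci = []
    for chrom in sorted(by_chrom):
        intervals = sorted(by_chrom[chrom])
        cur_start, cur_end, members = intervals[0][0], intervals[0][1], {intervals[0][2]}
        for start, end, individual in intervals[1:]:
            if start < cur_end:  # overlaps the open locus: extend it
                cur_end = max(cur_end, end)
                members.add(individual)
            else:  # disjoint: close the open locus and start a new one
                loci.append((chrom, cur_start, cur_end, members))
                cur_start, cur_end, members = start, end, {individual}
        loci.append((chrom, cur_start, cur_end, members))
    return loci


def mean_length(loci):
    """Average interval length of the merged loci, in bases."""
    return sum(end - start for _, start, end, _ in loci) / len(loci)


if __name__ == "__main__":
    toy_calls = [  # hypothetical calls for three individuals
        ("A", "chr1", 100_000, 600_000),
        ("B", "chr1", 150_000, 550_000),
        ("C", "chr1", 2_000_000, 2_300_000),
        ("A", "chr7", 500_000, 900_000),
    ]
    loci = merge_calls(toy_calls)
    print(len(toy_calls), "calls ->", len(loci), "unique loci;",
          "mean length", mean_length(loci), "bp")
```

Here four calls collapse into three loci because individuals A and B share an overlapping interval on chr1. A real analysis would also have to decide how much overlap counts as "the same" CNP, which this sketch reduces to any overlap at all.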
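These results are striking, and they show that ROMA is a powerful technique. Several features of ROMA, including the reduced complexity of the genomic representation, give it a higher signal-to-noise ratio than hybridizing whole-genome DNA to BAC arrays. The researchers are continuing to refine ROMA in the hope of uncovering more about large-scale polymorphism in the human genome.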
Large-Scale Copy Number Polymorphism in the Human Genome
The extent to which large duplications and deletions contribute to human genetic variation and diversity is unknown. Here, we show that large-scale copy number polymorphisms (CNPs) (about 100 kilobases and greater) contribute substantially to genomic variation between normal humans. Representational oligonucleotide microarray analysis of 20 individuals revealed a total of 221 copy number differences representing 76 unique CNPs. On average, individuals differed by 11 CNPs, and the average length of a CNP interval was 465 kilobases. We observed copy number variation of 70 different genes within CNP intervals, including genes involved in neurological function, regulation of cell growth, regulation of metabolism, and several genes known to be associated with disease.
Fig. 1. Genome-wide map of CNPs identified by ROMA. The position of all CNPs (excluding somatic differences) is shown. CNPs identified in multiple individuals (by Bgl II–ROMA) are indicated in yellow, and CNPs observed in only one individual are indicated in red. Additional CNPs identified by one Hind III–ROMA experiment are indicated in blue. Symbols denoting CNPs are not drawn to scale. Genome assembly gaps in pericentromeric and satellite regions are indicated by gray boxes. Genomic regions where recurring de novo rearrangements cause the developmental disorders Prader-Willi and Angelman syndromes, cat eye syndrome, DiGeorge/velocardiofacial syndrome, and spinal muscular atrophy are labeled A, B, C, and D, respectively.
Fig. 2. Validation of ROMA results by FISH. (A), (C), (E), and (G) show CNPs identified by ROMA and include the CNP identification number, the name of one gene located entirely within the interval, and the experiment name. (B), (D), (F), (H), and (I) show cytogenetic analyses of one or both individuals with probes that target the same CNP intervals. In all panels, the polymorphic probe is labeled red. In interphase cells [(B), (D), and (F)], a control probe (labeled green) was also included to confirm that cells were diploid. (B) CNP15 probe in GM11322 cells; (D) CNP56 probe in GM10470 cells; (F) CNP21 probe in GM10470 cells; (H) CNP32 probe in GM10540 cells; (I) CNP32 probe in SKN1 cells. In (I), one parental copy of chromosome 16 in SKN1 lacks the duplication (arrow).