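Researchers at Duke University have created the first map of imprinted genes in the human genome. They say the key to their success was a form of artificial intelligence called machine learning, which served as a modern-day Rosetta Stone. The study identified four times as many imprinted genes as had previously been known, and it is set to appear on the cover of the December 3 issue of Genome Research.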
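Imprinted genes are genes whose expression depends on whether a given allele sits on the chromosome inherited from the father or on the one inherited from the mother. Paternally and maternally inherited imprinted genes differ, and when sperm and egg unite, the imprinted genes of both parents must be present; otherwise development is abnormal. Genomic imprinting is therefore an unusual, non-Mendelian form of inheritance in which an allele's expression depends on the sex of the parent who passed it on, and abnormal regulation of imprinting can cause a number of genetic diseases.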
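In classical genetics, offspring inherit two copies of each gene, one from the father and one from the mother, and the active forms of both copies influence the offspring's development. With an imprinted gene, however, one of the two copies is switched off by molecular regulation originating from either the mother or the father, so the offspring relies on the information in only one copy. Such offspring are more vulnerable to environmental stress: if the single functional copy is damaged or lost, there is no backup to take its place.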
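To make the "no backup" argument concrete, here is a toy model, purely illustrative. The `Allele` class and `expressed_copies` function are hypothetical names, and real imprinting control is far more complex than an on/off switch per parent. The sketch counts how many copies of a gene are both intact and not silenced, for an ordinary gene and for an imprinted gene whose maternal copy is switched off.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Allele:
    parent: str          # parent of origin: "maternal" or "paternal"
    intact: bool = True  # False models a damaged or lost copy


def expressed_copies(alleles: Iterable[Allele], silenced_parent: Optional[str] = None) -> int:
    """Count copies that are intact and not switched off by the imprint.

    silenced_parent=None models an ordinary gene (both copies expressed);
    "maternal" or "paternal" models an imprinted gene whose copy from that
    parent is silenced. Hypothetical toy model, not a biological simulator.
    """
    return sum(1 for a in alleles if a.intact and a.parent != silenced_parent)


healthy = [Allele("maternal"), Allele("paternal")]
paternal_damaged = [Allele("maternal"), Allele("paternal", intact=False)]

print(expressed_copies(healthy))                                       # 2
print(expressed_copies(healthy, silenced_parent="maternal"))           # 1
print(expressed_copies(paternal_damaged))                              # 1: the maternal copy is the backup
print(expressed_copies(paternal_damaged, silenced_parent="maternal"))  # 0: no backup left
```

The last two lines show the asymmetry described above: losing the paternal copy leaves an ordinary gene with one working copy, but leaves the imprinted gene with none.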
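"Imprinted genes have always been something of a mystery, partly because they don't follow the traditional rules of inheritance," said Randy Jirtle, PhD, a geneticist in Duke's Departments of Radiation Oncology and Pathology. "We hope this new roadmap will help us and other researchers learn more about how these genes affect our health."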
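Working with co-authors Alexander Hartemink and Philippe Luedi, Jirtle and colleagues fed a computer the sequence data for two classes of genes, known imprinted genes and genes known not to be imprinted, and used a program to find the differences between them. This machine-learning analysis produced an algorithm that, like the original Rosetta Stone, decodes seemingly inscrutable data: in this case, the specific DNA sequence features that point to imprinted genes.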
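The training step described above (learning what separates labeled imprinted genes from labeled non-imprinted genes, then scoring new genes) can be sketched with a simple classifier. This is a minimal sketch under stated assumptions, not the authors' method: the article does not say which algorithms or features were used, so plain logistic regression trained by gradient descent is swapped in, and the two features and all numbers below are hypothetical, made-up values rather than data from the study.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    # Estimated probability that a gene with feature vector x is imprinted.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(X: List[Sequence[float]], y: List[int],
                   lr: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    # Batch gradient descent on the mean log-loss.
    # y[i] is 1 for a known imprinted gene, 0 for a known non-imprinted gene.
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Hypothetical, made-up feature vectors: two per-gene sequence features
# scaled to [0, 1]. These are NOT values from the study.
X = [[0.9, 0.8], [0.8, 0.9], [0.85, 0.7],   # labeled imprinted
     [0.1, 0.2], [0.2, 0.1], [0.15, 0.3]]   # labeled not imprinted
y = [1, 1, 1, 0, 0, 0]

w, b = train_logistic(X, y)
print(predict_proba(w, b, [0.88, 0.82]))  # unlabeled gene resembling the imprinted class
print(predict_proba(w, b, [0.12, 0.18]))  # unlabeled gene resembling the non-imprinted class
```

Logistic regression is used here only because it fits in a few dozen lines of standard-library Python; a real genome-wide screen would use many more features and a properly validated model.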
"We can't say with complete certainty that we have identified all of the imprinted genes, but we believe we have found most of them," Hartemink said.
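Jirtle, who has studied imprinting for many years, explains that imprinting is an epigenetic event, meaning that a gene's function can be changed without altering its DNA sequence. "Imprinted genes are vulnerable to environmental assault, even from what we eat and breathe. And importantly, epigenetic changes can be inherited. I don't think people have realized that yet."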
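Imprinted genes are estimated to make up about 1% of the human genome, and only some of them had been found so far. Using the new "Rosetta Stone" approach from this study, Jirtle and Hartemink identified 156 new candidate imprinted genes. Two of them lie on chromosome 8, which had not previously been suspected to contain imprinted genes. The first, KCNK9, is highly active in the brain, is a known oncogene, and may be involved in bipolar disorder and epilepsy; the second, DLGAP2, is a candidate tumor suppressor in bladder cancer.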
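The abstract of the original paper states that these 156 genes were identified from the predictions of multiple classification algorithms. The source does not say how those predictions were combined, so the sketch below simply assumes one common rule, a vote threshold: a gene becomes a candidate if at least `min_votes` classifiers call it imprinted. The function `consensus_candidates`, the three threshold-rule "classifiers", and the gene names and feature values are all hypothetical illustrations.

```python
from typing import Callable, Dict, List, Sequence

# A "classifier" here is any function mapping a gene's feature vector to
# True (predicted imprinted) or False.
Classifier = Callable[[Sequence[float]], bool]


def consensus_candidates(
    genes: Dict[str, Sequence[float]],
    classifiers: List[Classifier],
    min_votes: int,
) -> List[str]:
    """Return genes predicted imprinted by at least `min_votes` classifiers."""
    hits = []
    for name, features in genes.items():
        votes = sum(1 for clf in classifiers if clf(features))
        if votes >= min_votes:
            hits.append(name)
    return sorted(hits)


# Three hypothetical stand-in classifiers (simple threshold rules).
classifiers: List[Classifier] = [
    lambda f: f[0] > 0.5,
    lambda f: f[1] > 0.5,
    lambda f: f[0] + f[1] > 1.2,
]

# Hypothetical genes with made-up feature values.
genes = {"gene_a": [0.9, 0.8], "gene_b": [0.6, 0.2], "gene_c": [0.1, 0.3]}

print(consensus_candidates(genes, classifiers, min_votes=2))  # ['gene_a']
print(consensus_candidates(genes, classifiers, min_votes=1))  # ['gene_a', 'gene_b']
```

Raising `min_votes` trades sensitivity for confidence: fewer genes pass, but each one is supported by more of the independent predictors.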
Original source:
Cover: Just as the discovery of the Rosetta Stone by Napoleon's troops in 1799 led to the deciphering of Egyptian hieroglyphics, computational machine learning techniques have recently been used to decipher the imprint status of a gene from nearby genomic sequence features. These techniques permit the genome-wide identification of human genes that have a high probability of being imprinted. These candidate imprinted genes are in turn linked to complex human conditions where parent-of-origin inheritance is involved. (Cover design by James V. Jirtle, Webwiz Design, www.webwizdesign.com. Photograph of the Rosetta Stone used with permission © The Trustees of the British Museum.)
Published online before print November 30, 2007, doi: 10.1101/gr.6584707
Genome Res. 17:1723-1730, 2007
Computational and experimental identification of novel human imprinted genes
Philippe P. Luedi (1), Fred S. Dietrich (2, 3), Jennifer R. Weidman (4), Jason M. Bosko (5), Randy L. Jirtle (4, 6), and Alexander J. Hartemink (1, 5, 6)
1 Center for Bioinformatics and Computational Biology, Duke University, Durham, North Carolina 27708, USA; 2 Institute for Genome Sciences & Policy, Duke University, Durham, North Carolina 27708, USA; 3 Department of Molecular Genetics and Microbiology, Duke University Medical Center, Durham, North Carolina 27710, USA; 4 Department of Radiation Oncology, Duke University Medical Center, Durham, North Carolina 27710, USA; 5 Department of Computer Science, Duke University, Durham, North Carolina 27708, USA
Imprinted genes are essential in embryonic development, and imprinting dysregulation contributes to human disease. We report two new human imprinted genes: KCNK9 is predominantly expressed in the brain, is a known oncogene, and may be involved in bipolar disorder and epilepsy, while DLGAP2 is a candidate bladder cancer tumor suppressor. Both genes lie on chromosome 8, not previously suspected to contain imprinted genes. We identified these genes, along with 154 others, based on the predictions of multiple classification algorithms using DNA sequence characteristics as features. Our findings demonstrate that DNA sequence characteristics, including recombination hot spots, are sufficient to accurately predict the imprinting status of individual genes in the human genome.
6 Corresponding authors.
E-mail amink@cs.duke.edu ; fax (919) 660-6519.
E-mail jirtle@radonc.duke.edu ; fax (919) 684-5584.