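Recently, the international journal Human Molecular Genetics published online the latest research from Professor Li Jin and colleagues at the computational biology institute of the Shanghai Institutes for Biological Sciences, titled "A Systematic Characterization of Genes Underlying both Complex and Mendelian Diseases". The study systematically analyzed the characteristics of genes associated with human disease and compared them comprehensively with other classes of genes. The results are of theoretical importance for understanding the properties of human disease genes, for understanding how pathogenic genetic variants arise, how they are distributed across the genome, and how natural selection and evolution act on them, and for understanding the expression and regulatory network patterns of disease genes. They also offer reference and guidance for experimental design in complex disease research.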
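Traditionally, genetic diseases are divided into rare Mendelian diseases and the more common complex diseases (also called common diseases). Mendelian diseases are usually controlled by a single gene, have a very low incidence in the population, and show strong familial aggregation; examples include sickle-cell anemia, albinism, color blindness, phenylketonuria, hemophilia and brachydactyly. The single-gene variants that cause Mendelian diseases have strong effects, and these diseases follow Mendel's laws of inheritance as they are transmitted through families. Nearly all genes for such diseases were identified by linkage analysis of family data. Complex diseases are usually controlled by multiple genes and have a high incidence in the population, which is why they are also called common diseases; examples include cancer, hypertension, diabetes, asthma and schizophrenia. Complex diseases arise from the combined action of many factors. Their inheritance patterns are complex and do not follow typical Mendelian laws, each gene variant has only a weak effect, and they are strongly influenced by the overall genetic background of the population and the individual. Genes for these diseases are identified mainly through association studies. In recent years, a large number of genome-wide association studies (GWAS) have uncovered many genes and genetic variants associated with complex diseases.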
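However, after comparing the most recent Mendelian disease gene database with the complex disease gene database, the study found that Mendelian disease genes and complex disease genes are not as sharply separated as the usual classification implies. On the contrary, the two classes overlap extensively: many genes are associated with both types of disease (provisionally called "dual-association genes" here; the abstract calls them MC genes), and there are eight times more of them than would be expected under a statistical model of random overlap. To clarify the characteristics of these dual-association genes, the study classified known human genes by functional importance and by their relationship to disease. Besides dual-association genes, the categories were essential genes, monogenic disease genes (excluding genes associated with complex diseases), complex disease genes (excluding genes associated with monogenic diseases) and other genes, and the categories were compared systematically. The study found that both dual-association genes and complex disease genes have undergone recent positive natural selection, whereas essential genes and monogenic disease genes have been under relatively strong negative selection. Analysis of between-species divergence data showed that essential genes are consistently the most conserved, which supports the view that they have been under the strongest negative selection throughout long-term evolution. Dual-association genes ranked second in conservation, suggesting that they too have been under relatively strong negative selection during evolution. The study also compared the gene categories in terms of gene expression patterns, gene structure, protein-protein interactions and population differentiation. Based on these analyses, the authors infer that many characteristics of dual-association genes are related to their dual role in complex and monogenic diseases. This is the first study to analyze the characteristics of dual-association genes systematically, and its results also shed new light on the other four gene categories. For example, the study found that many complex disease genes fall within copy number variation regions (Copy Number Variations, CNVs), indicating that copy number variation may play an important role in the genetic basis of many complex diseases. Enrichment analysis of copy number variation across the gene categories also supports the conclusion that dual-association genes have been subject to both relatively strong positive selection and negative selection.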
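The comparison against a random expectation mentioned above can be illustrated with a small sketch. It computes the overlap expected if two gene sets were drawn independently from the same pool, the fold enrichment of an observed overlap, and a hypergeometric tail probability. The numbers are toy values chosen for readability and are not taken from the study; the function names are hypothetical, and the article does not say which statistical test the authors actually used.

```python
from math import comb


def expected_overlap(total, size_a, size_b):
    """Overlap expected by chance if set B were a random draw from `total` genes."""
    return size_a * size_b / total


def hypergeom_tail(total, size_a, size_b, observed):
    """P(overlap >= observed) under the hypergeometric (random draw) model."""
    upper = min(size_a, size_b)
    favourable = sum(
        comb(size_a, k) * comb(total - size_a, size_b - k)
        for k in range(observed, upper + 1)
    )
    return favourable / comb(total, size_b)


# Toy numbers (assumptions for illustration only, NOT the study's data).
TOTAL_GENES = 1000   # size of the gene pool
SET_A = 100          # e.g. genes linked to one disease class
SET_B = 50           # e.g. genes linked to the other disease class
OBSERVED = 20        # genes found in both sets

expected = expected_overlap(TOTAL_GENES, SET_A, SET_B)
fold = OBSERVED / expected
p_value = hypergeom_tail(TOTAL_GENES, SET_A, SET_B, OBSERVED)

print(f"expected overlap: {expected}")
print(f"fold enrichment:  {fold}")
print(f"P(overlap >= {OBSERVED}): {p_value:.3g}")
```

With these toy values the expected overlap is 5 genes, so an observed overlap of 20 is a four-fold enrichment. Applying the same calculation to real data would require the actual pool size and set sizes, which the article does not give.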
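The work was carried out jointly by doctoral students Wenfei Jin, Pengfei Qin and Haiyi Lou under the supervision of Professor Li Jin and Researcher Shuhua Xu. The research was supported by grants from the National Natural Science Foundation of China, the Science and Technology Commission of Shanghai Municipality, the Chinese Academy of Sciences, the Max Planck Society of Germany, the K. C. Wong Education Foundation of Hong Kong, and other funders. (Bioon.com)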
doi:10.1093/hmg/ddr599
A Systematic Characterization of Genes Underlying both Complex and Mendelian Diseases
Wenfei Jin, Pengfei Qin, Haiyi Lou, Li Jin* and Shuhua Xu*
Traditionally, genetic disorders have been classified as either Mendelian or complex diseases. This nosology has greatly benefited genetic counseling and the development of gene mapping strategies. However, based on two well-established databases, we identified that 54% (524 of 968) of the Mendelian diseases genes were also involved in complex diseases, and this kind of genes has not been systematically analyzed. Here, we classified human genes into five categories: Mendelian and complex diseases (MC) genes, Mendelian but not complex diseases (MNC) genes, complex but not Mendelian diseases (CNM) genes, essential genes and OTHER genes. Firstly, we found that MC genes were associated with more diseases and phenotypes, and were involved in more complex protein-protein interaction network than MNC or CNM genes on average. Secondly, MC genes encoded the longest proteins and had the highest transcript count among all gene categories. Especially, tissue specificity of MC genes was much higher than that of any other gene categories (P < 7.5×10⁻⁵), although their expression level was similar to that of essential genes. Thirdly, evidences from different aspects supported that MC genes have been subjected to both purifying and positive selection. Interestingly, functions of some human disease genes might be different from those of their orthologous genes in non-primate-mammalians since they were even less conserved than OTHER genes. The significant over-representation of CNVs in CNM genes suggested the important roles of CNVs in complex diseases. In brief, our study not only revealed the characteristics of MC genes, but also provided new insights into the other four gene categories.