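On March 27, the international journal BMC Genomics published online a collaborative research paper, "Prediction of constitutive A-to-I editing sites from human transcriptomes in the absence of genomic sequences", from Li Yang's group at the Institute of Computational Biology and Ling-Ling Chen's group at the Institute of Biochemistry and Cell Biology, both of the Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences. The study developed a new computational pipeline that needs no matched genomic information: by comparing RNA transcriptome data from multiple samples alone, it uncovered a large number of novel clustered RNA editing sites in vivo, together with their tissue-specific regulation.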
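In higher organisms, the predominant form of RNA editing is the A (adenosine)-to-I (inosine) modification. It is catalyzed by the ADAR (adenosine deaminases that act on RNA) enzymes, which convert adenosine (A) into inosine (I). During translation, inosine (I) is read as guanosine (G), so editing at such a site is equivalent to an A-to-G substitution. In coding regions this can change the encoded amino acid, increasing the diversity and versatility of gene products. RNA editing in non-coding regions can also alter the function and fate of RNA molecules, for example by affecting alternative splicing and subcellular localization. Regulation of RNA editing is therefore critical to post-transcriptional RNA diversity and function. In recent years, high-throughput sequencing has been widely applied to the prediction of RNA editing sites and has greatly advanced the field. However, because of the limitations of high-throughput technologies and of the downstream computational analysis, accurate prediction of RNA editing at the whole-transcriptome level remains a major challenge.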
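The study developed an efficient computational pipeline and applied it to the prediction of RNA editing sites, identifying more than 600 novel clustered A-to-I RNA editing sites in human tissues along with their differential regulation across tissues. Importantly, it also found clustered RNA editing sites in non-repetitive sequences and characterized their sequence and structural features. The pipeline and the findings it produced add to our understanding of RNA editing and open new directions for studying its function. Unlike earlier methods for detecting RNA editing, this pipeline does not require sequencing the genomic DNA of the same sample to exclude background noise; it only compares RNA transcriptome data across multiple samples to obtain highly accurate A-to-I RNA editing predictions. Notably, while this work was under review, a Nat Methods paper (Ramaswami, et al, Nat Methods, 2013, 10: 128-132) reported another method that predicts A-to-I RNA editing from transcriptome RNA data alone. This suggests that similar approaches can be applied to more transcriptome datasets in future work to further investigate the role of RNA editing in the regulation of gene expression.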
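The work was carried out jointly by Dr. Shanshan Zhu of the Institute of Computational Biology, graduate student Jian-Feng Xiang of the Institute of Biochemistry and Cell Biology, and colleagues, with funding from the Chinese Academy of Sciences, the National Natural Science Foundation of China, and the Science and Technology Commission of Shanghai Municipality. (Bioon.com)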
doi:10.1186/1471-2164-14-206
Prediction of constitutive A-to-I editing sites from human transcriptomes in the absence of genomic sequences
Shanshan Zhu, Jian-Feng Xiang, Tian Chen, Ling-Ling Chen and Li Yang
Background: Adenosine-to-inosine (A-to-I) RNA editing is recognized as a cellular mechanism for generating both RNA and protein diversity. Inosine base pairs with cytidine during reverse transcription and therefore appears as guanosine during sequencing of cDNA. Current approaches of RNA editing identification largely depend on the comparison between transcriptomes and genomic DNA (gDNA) sequencing datasets from the same individuals, and it has been challenging to identify editing candidates from transcriptomes in the absence of gDNA information.
Results: We have developed a new strategy to accurately predict constitutive RNA editing sites from publicly available human RNA-seq datasets in the absence of relevant genomic sequences. Our approach establishes new parameters to increase the ability to map mismatches and to minimize sequencing/mapping errors and unreported genome variations. We identified 695 novel constitutive A-to-I editing sites that appear in clusters (named "editing boxes") in multiple samples and which exhibit spatial and dynamic regulation across human tissues. Some of these editing boxes are enriched in non-repetitive regions lacking inverted repeat structures and contain an extremely high conversion frequency of As to Is. We validated a number of editing boxes in multiple human cell lines and confirmed that ADAR1 is responsible for the observed promiscuous editing events in non-repetitive regions, further expanding our knowledge of the catalytic substrate of A-to-I RNA editing by ADAR enzymes.
Conclusions: The approach we present here provides a novel way of identifying A-to-I RNA editing events by analyzing only RNA-seq datasets. This method has allowed us to gain new insights into RNA editing and should also aid in the identification of more constitutive A-to-I editing sites from additional transcriptomes.
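As a rough illustration of the idea summarized in the abstract (keep A-to-G mismatches that recur across several RNA-seq samples and that fall into clusters), the following Python sketch may help. It is not the authors' pipeline. The function name, the input format, and every threshold (`min_samples`, `window`, `min_cluster_size`) are hypothetical, and the sketch assumes that mismatch calls have already been produced upstream, oriented to the transcribed strand, and filtered against known genome variants.

```python
from collections import defaultdict


def constitutive_clustered_sites(sample_calls, min_samples=2, window=50,
                                 min_cluster_size=3):
    """Return clusters of A-to-G sites that recur across samples.

    sample_calls: dict mapping a sample name to an iterable of
    (chrom, pos, ref, alt) mismatch calls (hypothetical input format).
    All thresholds are illustrative, not values from the paper.
    """
    # Record which samples support each A-to-G mismatch position.
    support = defaultdict(set)
    for sample, calls in sample_calls.items():
        for chrom, pos, ref, alt in calls:
            if (ref, alt) == ("A", "G"):
                support[(chrom, pos)].add(sample)

    # "Constitutive" here simply means: seen in at least min_samples samples.
    recurrent = sorted(site for site, samples in support.items()
                       if len(samples) >= min_samples)

    # Group recurrent sites on the same chromosome whose neighbours
    # lie within `window` bases of each other.
    clusters, current = [], []
    for site in recurrent:
        if current and (site[0] != current[-1][0]
                        or site[1] - current[-1][1] > window):
            if len(current) >= min_cluster_size:
                clusters.append(current)
            current = []
        current.append(site)
    if len(current) >= min_cluster_size:
        clusters.append(current)
    return clusters


if __name__ == "__main__":
    # Toy data: three nearby A-to-G sites shared by both samples.
    toy = {
        "tissue_1": [("chr1", 100, "A", "G"), ("chr1", 110, "A", "G"),
                     ("chr1", 130, "A", "G"), ("chr1", 500, "A", "G")],
        "tissue_2": [("chr1", 100, "A", "G"), ("chr1", 110, "A", "G"),
                     ("chr1", 130, "A", "G"), ("chr2", 100, "C", "T")],
    }
    for cluster in constitutive_clustered_sites(toy):
        print(cluster)
```

Requiring support from several samples is what stands in for the missing genomic DNA in this sketch: a mismatch seen in only one sample is more likely to be a private variant or a sequencing error, while the clustering step reflects the observation that the reported sites occur in groups.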