[Repost] Scientists weigh the strengths and weaknesses of current whole-transcriptome RNA sequencing tools

Two studies recently published in Nature Methods compare different computational tools for analyzing sequencing data obtained from all of the RNA in a cell.

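The DNA in every cell of an organism is the same, but the portion of the genome that is transcribed into RNA differs between cell types. These differences go a long way toward determining how cells take on different functions, so understanding them is important.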

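High-throughput sequencing of the full set of RNA in a cell (known as the transcriptome) relies on a method called RNA-seq, which has contributed to our understanding of the function of many genes. However, reconstructing entire transcripts, which can be thousands of base pairs long, from short sequence reads of 75 base pairs requires more sophisticated computational tools.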

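The researchers compared more than 20 state-of-the-art computational methods for transcript analysis, each applied to one of two key steps in the RNA-seq analysis workflow:

The first study examined which methods are best suited to mapping sequence reads to a reference genome;

the other study focused on the methods needed to reconstruct transcripts from the mapped reads.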

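Both studies highlight the strengths of existing computational methods while also pointing out their weaknesses and the areas that need improvement. Most transcript reconstruction methods handle some tasks well, such as reconstructing parts of a transcript, but none of them can accurately reconstruct complete RNA molecules. The studies provide useful benchmarks for the future development of these methods.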

Assessment of transcript reconstruction methods for RNA-seq

We evaluated 25 protocol variants of 14 independent computational methods for exon identification, transcript reconstruction and expression-level quantification from RNA-seq data. Our results show that most algorithms are able to identify discrete transcript components with high success rates but that assembly of complete isoform structures poses a major challenge even when all constituent elements are identified. Expression-level estimates also varied widely across methods, even when based on similar transcript models. Consequently, the complexity of higher eukaryotic genomes imposes severe limitations on transcript recall and splice product discrimination that are likely to remain limiting factors for the analysis of current-generation RNA-seq data.

Systematic evaluation of spliced alignment programs for RNA-seq data

High-throughput RNA sequencing is an increasingly accessible method for studying gene structure and activity on a genome-wide scale. A critical step in RNA-seq data analysis is the alignment of partial transcript reads to a reference genome sequence. To assess the performance of current mapping software, we invited developers of RNA-seq aligners to process four large human and mouse RNA-seq data sets. In total, we compared 26 mapping protocols based on 11 programs and pipelines and found major performance differences between methods on numerous benchmarks, including alignment yield, basewise accuracy, mismatch and gap placement, exon junction discovery and suitability of alignments for transcript reconstruction. We observed concordant results on real and simulated RNA-seq data, confirming the relevance of the metrics employed. Future developments in RNA-seq alignment methods would benefit from improved placement of multimapped reads, balanced utilization of existing gene annotation and a reduced false discovery rate for splice junctions.
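As a rough illustration of what a benchmark such as "exon junction discovery" and a "false discovery rate for splice junctions" could measure, here is a minimal Python sketch. It is not the evaluation code used in either study: the function name `score_junctions`, the tuple layout of a junction, and the toy coordinates are all assumptions made for this example.

```python
from typing import NamedTuple, Set, Tuple

# Hypothetical representation of a splice junction:
# (chromosome, intron_start, intron_end, strand).
Junction = Tuple[str, int, int, str]


class JunctionScore(NamedTuple):
    true_positives: int
    recall: float                # fraction of true junctions that were reported
    false_discovery_rate: float  # fraction of reported junctions that are not true


def score_junctions(reported: Set[Junction], truth: Set[Junction]) -> JunctionScore:
    """Compare junctions reported by an aligner against a trusted reference set."""
    tp = len(reported & truth)
    recall = tp / len(truth) if truth else 0.0
    fdr = (len(reported) - tp) / len(reported) if reported else 0.0
    return JunctionScore(tp, recall, fdr)


# Toy data (made up): four known junctions, four reported, three in common.
truth = {
    ("chr1", 100, 200, "+"),
    ("chr1", 300, 400, "+"),
    ("chr2", 50, 150, "-"),
    ("chr2", 500, 900, "-"),
}
reported = {
    ("chr1", 100, 200, "+"),
    ("chr1", 300, 400, "+"),
    ("chr2", 50, 150, "-"),
    ("chr3", 10, 20, "+"),
}

score = score_junctions(reported, truth)
print(score)
```

With this toy input, three of the four known junctions are recovered and one of the four reported junctions is spurious. On simulated reads the reference set is known exactly; on real data a reference set would presumably have to come from gene annotation, which is itself incomplete.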