Converting Between FASTQ Quality Variants

  1. Code

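  2. #!/usr/bin/perl
    use Bio::SeqIO::fastq;    # FASTQ parser; also loads its base class Bio::SeqIO
     
    my $in = Bio::SeqIO->new(
     -format  => 'fastq',
     -variant => 'illumina',    # quality encoding of the input file
     
     #-variant => 'solexa',     # alternative: use this line for Solexa input
     -file => 'in.fq'
    );
     
    my $out = Bio::SeqIO->new(
     -format  => 'fastq',
     -variant => 'sanger',      # quality encoding to write
     -file    => '>out.fq'      # leading '>' opens the file for writing
    );
    # Read each record and write it back out in the target variant
    while ( my $seq = $in->next_seq ) {
     $out->write_seq($seq);
    }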
  3. Annotations

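  4. Line 2: load the Bio::SeqIO::fastq module;
    Line 5 (-format): set the input file format to FASTQ;
    Lines 6-8 (-variant): set the quality variant of the input FASTQ file ('illumina' here; line 8 is a commented-out alternative for 'solexa');
    Line 9 (-file): set the input file name;
    Line 13 (-format): set the output file format to FASTQ;
    Line 14 (-variant): set the quality variant of the output FASTQ file;
    Line 15 (-file): set the output file name.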

  5. FASTQ quality variants

    • sanger: original; ASCII encoding from 33-126, PHRED quality score from 0 to 93
    • solexa: Solexa, Inc. (2004), aka Illumina 1.0; ASCII encoding from 59-104, SOLEXA quality score from -5 to 40
    • illumina: Illumina 1.3; ASCII encoding from 64-104, PHRED quality score from 0 to 40
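
The 'illumina' and 'sanger' variants both store PHRED scores and differ only in the ASCII offset (64 versus 33), so converting between them amounts to shifting each quality character. The following is a minimal sketch of that idea in plain Perl, without BioPerl. The helper name illumina_to_sanger is hypothetical and is not part of Bio::SeqIO::fastq; it is not a description of how the module does the conversion internally.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): re-encode a quality string
# from the 'illumina' variant (ASCII offset 64) to 'sanger' (ASCII offset 33).
sub illumina_to_sanger {
    my ($qual) = @_;
    my $out = '';
    for my $char ( split //, $qual ) {
        my $phred = ord($char) - 64;    # decode the PHRED score
        die "Score $phred is outside the illumina range 0-40\n"
            if $phred < 0 || $phred > 40;
        $out .= chr( $phred + 33 );     # re-encode with the sanger offset
    }
    return $out;
}

# '@', 'T' and 'h' encode PHRED 0, 20 and 40 in the illumina variant.
print illumina_to_sanger('@Th'), "\n";    # prints "!5I"
```

For real files the BioPerl script above is the safer route, since it also handles record parsing and validation.
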
  6. Note

  7. Variants can be converted back and forth from one another; however, due to the difference in scaling for solexa quality reads, converting from ‘illumina’ or ‘sanger’ FASTQ to solexa is not recommended.
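
The scaling difference mentioned above can be made concrete with the commonly used mapping between the two scales, Q_phred = 10 * log10(10^(Q_solexa / 10) + 1) and its inverse. The sketch below assumes that mapping and rounds to the nearest integer. All helper names are hypothetical, and BioPerl's own handling may differ in detail.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(floor);

# Hypothetical helpers for illustration; they are not BioPerl functions.

# Base-10 logarithm built from Perl's natural log.
sub log10_of {
    my ($x) = @_;
    return log($x) / log(10);
}

# Solexa score -> PHRED score (real-valued).
sub solexa_to_phred {
    my ($q_solexa) = @_;
    return 10 * log10_of( 10**( $q_solexa / 10 ) + 1 );
}

# PHRED score -> Solexa score (real-valued).
# PHRED 0 would need log10(0), so it has no finite Solexa counterpart.
sub phred_to_solexa {
    my ($q_phred) = @_;
    die "PHRED score must be greater than 0\n" if $q_phred <= 0;
    return 10 * log10_of( 10**( $q_phred / 10 ) - 1 );
}

# Round to the nearest integer; also works for negative scores.
sub round_score {
    my ($x) = @_;
    return floor( $x + 0.5 );
}

# Show how the lowest Solexa scores map onto PHRED after rounding.
for my $q_solexa ( -5 .. 0 ) {
    printf "solexa %2d -> phred %d\n", $q_solexa,
        round_score( solexa_to_phred($q_solexa) );
}
```

Under these assumptions the two scales agree closely at high quality (Solexa 40 rounds to PHRED 40) but diverge at the low end, where scores do not map one-to-one and PHRED 0 cannot be expressed as a Solexa score at all.
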

  8. Further reading

  9. Bio::SeqIO::fastq (CPAN)