单机版Blat的匹配结果中没有percent identity(网络版的Blat输出结果中有)。为了筛选输出结果,常常需要计算每个匹配的percent identity;UCSC的Blat—FAQ中虽然给出了相应的解决办法(http://genome.ucsc.edu/FAQ/FAQblat#blat4),但其代码是用C写的,不利于使用Perl的生物信息学工作人员调用。此处给出相应的perl代码,如下所示(非原创,版权归原作者所有):
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 | while () { chomp $_; my @v = split( /t/, $_ ); get_pid(@v); } sub get_pid { my @line = @_; my $pid = ( 100.0 - ( &pslCalcMilliBad(@line) * 0.1 ) ); print "The percentage: $pidn"; #return $pid; } sub pslCalcMilliBad { my @cols = @_; # sizeNul depens of dna/Prot my $sizeMul = 1; #if ($option{p}) { # $sizeMul = 3; #} else { # $sizeMul = 1; #} # cols[0] matches # cols[1] misMatches # cols[2] repMaches # cols[4] qNumInsert # cols[6] tNumInsert # cols[11] qStart # cols[12] qEnd # cols[15] tStart # cols[16] tEnd my $qAliSize = $sizeMul * ( $cols[12] - $cols[11] ); my $tAliSize = $cols[16] - $cols[15]; # I want the minimum of qAliSize and tAliSize my $aliSize; $qAliSize < $tAliSize ? $aliSize = $qAliSize : $aliSize = $tAliSize; # return 0 is AliSize == 0 return 0 if ( $aliSize <= 0 ); # size diff my $sizeDiff = $qAliSize - $tAliSize; if ( $sizeDiff < 0 ) { $sizeDiff = 0; #if ($option{m}) { #$sizeDiff = 0; #} else { #$sizeDiff = -($sizeDiff); #} } # insert Factor my $insertFactor = $cols[4]; #$insertFactor += $cols[6] unless ($option{m}); my $milliBad = ( 1000 * ( $cols[1] * $sizeMul + $insertFactor + &round( 3 * log( 1 + $sizeDiff ) ) ) ) / ( $sizeMul * ( $cols[0] + $cols[2] + $cols[1] ) ); return $milliBad; } sub round { my $number = shift; return int( $number + .5 ); } |
默认的psl输入没有 但是其他格式有identify
blast8- NCBI blast tabular format
blast9 – NCBI blast tabular format with comments
请看https://cgwb.nci.nih.gov/goldenPath/help/blatSpec.html
你就贴了这个代码出来,里面是有错误的。看不下去我重新写了下http://xiaofeng1982.blog.163.com/blog/static/31572458201432034241989/测试通过
非常感谢!貌似是我直接抄过来的,没有进行测试,非常抱歉!