-
缘起
模仿lh3开发的网页版工具SeqTools
-
功能
1.格式化FASTA文件。
2.反向互补FASTA序列。
3.获取FASTA序列的长度。
4.计算GC含量并对ATGC计数。
5.搜索模式(子序列、motif等)。
-
改进
1.添加了GC含量的计算和ATGC的计数。
2.扩展了search功能(得益于Perl的正则表达式)。
-
不足
没有网页界面。
(有熟悉perl与网页编程的朋友可以改写一下:只要在浏览器中打开就可以使用)
-
源码
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 | #!/usr/bin/perl #Please replace all "《" and "》" with "",respectively. use strict; use warnings; use Getopt::Std; use Bio::SeqIO; &main; exit; sub main { &usage if ( @ARGV 《 1 ); my $command = shift(@ARGV); my %function = ( format =》 &format, revcom =》 &revcom, length =》 &length, content =》 &content, search =》 &search ); die("Unknown command "$command".n") if ( !defined( $function{$command} ) ); }; } sub format { my %opts = ( w =》 60 ); getopts( 'i:o:w:', %opts ); die( qq/ Usage: seqTools.pl format [options] Options: -i STR Input file in FASTA format. -o STR Output file in FASTA format. -w INT The line width for FASTA output. [$opts{w}] n/ ) unless ( exists $opts{i} && exists $opts{o} ); my ( $fi, $fo, $width ) = ( $opts{i}, $opts{o}, $opts{w} ); my $in = Bio::SeqIO-》new( -file =》 "$fi", -format =》 'fasta' ); my $out = Bio::SeqIO-》new( -file =》 "》$fo", -format =》 'fasta' ); while ( my $seq = $in-》next_seq() ) { $out-》width($width); $out-》write_seq($seq); } } sub revcom { my %opts = ( w =》 60 ); getopts( 'i:o:', %opts ); die( qq/ Usage: seqTools.pl revcom [options] Options: -i STR Input file in FASTA format. -o STR Output file in FASTA format. -w INT The line width for FASTA output. [$opts{w}] n/ ) unless ( exists $opts{i} && exists $opts{o} ); my ( $fi, $fo, $width ) = ( $opts{i}, $opts{o}, $opts{w} ); my $in = Bio::SeqIO-》new( -file =》 "$fi", -format =》 'fasta' ); my $out = Bio::SeqIO-》new( -file =》 "》$fo", -format =》 'fasta' ); while ( my $seq = $in-》next_seq() ) { my $rc_seq = $seq-》revcom; $out-》width($width); $out-》write_seq($rc_seq); } } sub length { my %opts; getopts( 'i:', %opts ); die( qq/ Usage: seqTools.pl length [options] Options: -i STR Input file in FASTA format. n/ ) unless ( exists $opts{i} ); my $fi = $opts{i}; my $in = Bio::SeqIO-》new( -file =》 "$fi", -format =》 'fasta' ); while ( my $seq = $in-》next_seq() ) { print $seq-》id, "t", $seq-》length, "n"; } } sub content { my %opts; getopts( 'i:p:', %opts ); die( qq/ Usage: seqTools.pl content [options] Options: -i STR Input file in FASTA format. n/ ) unless ( exists $opts{i} ); my $fi = $opts{i}; my $in = Bio::SeqIO-》new( -file =》 "$fi", -format =》 'fasta' ); while ( my $seq = $in-》next_seq() ) { print $seq-》id, "t"; my $length = $seq-》length; my $sequence = $seq-》seq; my $num_a = $sequence =~ tr/A//; my $num_c = $sequence =~ tr/C//; my $num_g = $sequence =~ tr/G//; my $num_t = $sequence =~ tr/T//; my $num_o = $length - $num_a - $num_c - $num_g - $num_t; my $gc = sprintf( "%.2f", ( $num_c + $num_g ) / $length * 100 ); print "Length=$lengthtGC=${gc}%tA=$num_a,C=$num_c,G=$num_g,T=$num_t,Others=$num_on"; } } sub search { my %opts; getopts( 'i:p:', %opts ); die( qq/ Usage: seqTools.pl search [options] Options: -i STR Input file in FASTA format. -p STR Pattern you want to search. Notes: 1. You can use RegExp supported by Perl. 2. It uses case-sensitive mode. n/ ) unless ( exists $opts{i} && $opts{p} ); my ( $fi, $pattern ) = ( $opts{i}, $opts{p} ); my $in = Bio::SeqIO-》new( -file =》 "$fi", -format =》 'fasta' ); while ( my $seq = $in-》next_seq() ) { print $seq-》id, "t"; my @starts; my $sequence = $seq-》seq; while ( $sequence =~ m/($pattern)/g ) { my $start = pos($sequence) - CORE::length($1) + 1; pos($sequence) = $start; push @starts, $start; } my $num = @starts; my $str = join ",", @starts; print "$numt$strn"; } } sub usage { die( qq/ Usage: seqTools.pl 《command》 [《arguments》]n Command: format Format the FASTA record(s). revcom Reverse and complement the FASTA record(s). length Get the length of FASTA record(s). content Calculate the GC content of FASTA record(s). search Find subseq in the FASTA record(s). Author: Yixf, yixf1986@gmail.com Version: 20110601 Notes: 1. A similar webtool developed by lh3: http://lh3lh3.users.sourceforge.net/fasta.shtml 2. When you search, you can use RegExp supported by Perl. 3. The search function use case-sensitive mode. n/ ); } |
main函数~很有C的风格
这个程序是我模仿lh3老师的一个Perl程序写的,我自己一点不懂C。
PS:你的BioPerl教程啥时候继续呀?俺还一直期待着呢。
现在国内基本上没有BioPerl的系统教程,希望你能尽快完工,填补此项空白。
PS2:如果你不抓紧时间,等7.7《仙剑奇侠传5》上市了,估计忙着玩游戏,更没时间了。嘿嘿。
呵呵,我没那么痴迷的。一般每周就玩一两小时左右吧。
BioPerl的其他内容涉及得比较深,不是很容易写出来。而且我发现以前写的内容还有些问题。现正在整理中……很感谢你的鼓励!
建议你给博客加一个评论回复和邮件通知功能,否则我根本看不到回复呢~