去除序列比对之后的连续空位

缘起

示例

去除序列比对结果中的连续空位（此处以不少于2个空位为例）。

比对结果1

>one
ATGC---ATGCA
>two
A---ATG--GC-

处理结果1

>one
AGCA
>two
AGC-

比对结果2

>one
ATGC---ATGCAATTCGC
>two
A---ATG--GC-
>three
A--------G--CT

处理结果2

>one
AGATTCGC
>two
AG
>three
AGCT

程序

我写的程序

#!/usr/bin/perl
 
use strict;
use warnings;
use Bio::SeqIO;
use Set::Infinite;
 
my $file_in  = $ARGV[0];
my $file_out = $ARGV[1];
my ( @aa, %hash, @ab );
 
my $in = Bio::SeqIO->new( -file => "$file_in", '-format' => 'Fasta' );
while ( my $seq = $in->next_seq() ) {
    my $sequence = $seq->seq;
    while ( $sequence =~ m/(-{2,})/g ) {
        my $end   = pos($sequence);
        my $start = $end - length($1);
        push @aa, [ $start, $end ];
    }
}
 
my $set = Set::Infinite->new;
$set = $set->union(@$_) for @aa;
@ab = split /,/, $set;
foreach my $range (@ab) {
    if ( $range =~ m/[(d+)..(d+)]/ ) {
        my $length = $2 - $1;
        $hash{$1} = $length;
    }
}
 
my $in2 = Bio::SeqIO->new( -file => "$file_in", '-format' => 'Fasta' );
open my $OUT, '>', $file_out
    or die "$0 : failed to open  output file '$file_out' : $!n";
select $OUT;
while ( my $seq = $in2->next_seq() ) {
    my $id       = $seq->id;
    my $sequence = $seq->seq;
    foreach my $key ( sort { $b <=> $a } keys %hash ) {
        substr( $sequence, $key, $hash{$key}, "" );
    }
    print ">$idn$sequencen";
}
close $OUT or warn "$0 : failed to close output file '$file_out' : $!n";

师弟snowhawkyrf写的程序

#!perl -w
use strict;
 
if ( !open FILE,   "<$ARGV[0]" ) { die "Cannot open file:$!"; }
if ( !open FILE2,  "<$ARGV[0]" ) { die "Cannot open file:$!"; }
if ( !open RESULT, ">result" )   { die "Cannot open result:$!"; }
my @n; #用于标志序列连续-的状态，后面定义：1为连续-出现的地方，0为正常的序列
 
while (<FILE>) {
chomp;
    if (/^>/) { next; }
    for ( my $t = 10 ; $t >= 2 ; $t-- ) #长度从10到2递减，将匹配的“--”的地方都定义为“11”，可以按照需要，设置最高-的个数为100等数值
    {
        my $temp = 1 x $t;
        s/(-){$t}/$temp/;
    }
    my @a = split //, $_;
    for ( my $i = 0 ; $i <= $#a ; $i++ ) {
        if ( $a[$i] eq "1" ) {
        $n[$i] = 1;
        }
    }
}
 
while (<FILE2>) {
    chomp;
    if (/^>/) { print RESULT "$_n"; next; }
    my @b = split //, $_;
    for ( my $j = 0 ; $j <= $#b ; $j++ ) {
        if ( !defined $n[$j] ) {
        $n[$j] = 0;    #将@n中不是1的地方定义为0
    }
    if ( $n[$j] == 1 ) {
        $b[$j] = "";    #忽略序列中对应@n为1的位置
    }
}
    my $out = join "", @b;
    print RESULT "$outn";
}
 
close FILE;
close FILE2;
close RESULT;

Yixf's blog

Face to Fate in Dream.

去除序列比对之后的连续空位

缘起

示例

比对结果1

处理结果1

比对结果2

处理结果2

程序

我写的程序

师弟snowhawkyrf写的程序

相关

Follow Yixf's blog