[Recommended] Some useful bioinformatics programs and scripts

The Stothard Research Group homepage at the University of Alberta hosts a number of very good bioinformatics programs and scripts, such as:

  • The Sequence Manipulation Suite – a collection of simple programs for generating, formatting, and analyzing short DNA and protein sequences.
  • annotate_SNPs.pl – this Perl script annotates SNPs identified by the next-generation sequencing of genomic DNA or transcripts.
  • backup.sh – this shell script archives directories of interest on a Linux-based system.
  • genome_pattern_search.pl – a Perl program that reads a genomic sequence in FASTA format and searches for the patterns you specify using regular expressions.
  • get_cds.pl – this Perl script accepts a GenBank or EMBL file and extracts the protein translations or the DNA coding sequences and writes them to a new file in FASTA format.
  • get_genes_in_area.pl – this Perl script accepts as input a position or list of positions in a genome and returns descriptions of nearby genes.
  • get_orfs.pl – this Perl script accepts a sequence file as input and extracts the open reading frames (ORFs) greater than or equal to the size you specify.
  • get_snps_by_gene_ontology.pl – this Perl script accepts a species name and a Gene Ontology (GO) accession number, and returns a list of SNPs located in or near genes associated with the GO accession.
  • maq_pipeline.sh – this bash script processes short sequence reads from Illumina’s Genome Analyzer (Solexa) system, using the Maq package.
  • md5_sums.pl – this Perl script accepts a list of directories and recursively generates a list of the files in the directories and their MD5 values.
  • NGS-SNP – this collection of scripts annotates raw SNP lists returned from programs such as Maq.
  • space_check.sh – this shell script monitors hard drive space and sends an email when space becomes scarce.
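
To give a feel for what a tool like genome_pattern_search.pl does, here is a minimal Python sketch of the same idea: read FASTA text, then report every regular-expression match with its coordinates. This is not the Stothard group's code and does not reproduce its options or output format. The function names (parse_fasta, search_pattern), the 1-based inclusive coordinates, and the forward-strand-only search are all assumptions made for illustration.

```python
import re
from typing import Dict, Iterator, List, Tuple


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA-formatted text into {record_id: uppercase sequence}."""
    records: Dict[str, List[str]] = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumption: the first whitespace-delimited token is the record ID.
            tokens = line[1:].split()
            current = tokens[0] if tokens else ""
            records[current] = []
        elif current is not None:
            records[current].append(line.upper())
    return {name: "".join(parts) for name, parts in records.items()}


def search_pattern(
    records: Dict[str, str], pattern: str
) -> Iterator[Tuple[str, int, int, str]]:
    """Yield (record_id, start, end, matched_text) for each match.

    Coordinates are 1-based and inclusive (a choice made for this sketch).
    Only the forward strand is searched, and re.finditer reports
    non-overlapping matches only.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    for name, seq in records.items():
        for match in regex.finditer(seq):
            yield name, match.start() + 1, match.end(), match.group()


if __name__ == "__main__":
    # Hypothetical two-line record containing EcoRI (GAATTC) and BamHI (GGATCC) sites.
    demo = ">chr_demo example record\nACGTGAATTCAA\nGGATCCGAATTC\n"
    for hit in search_pattern(parse_fasta(demo), "GAATTC|GGATCC"):
        print(*hit, sep="\t")
```

Run on the demo record, this prints three hits: GAATTC at 5-10, GGATCC at 13-18, and GAATTC at 19-24. For real genomes you would read the file in chunks rather than hold it all in memory, and search the reverse complement as well if strand matters.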