The Stothard Research Group at the University of Alberta hosts a number of useful bioinformatics programs and scripts on its home page, for example:
- The Sequence Manipulation Suite – a collection of simple programs for generating, formatting, and analyzing short DNA and protein sequences.
- annotate_SNPs.pl – this Perl script annotates SNPs identified by next-generation sequencing of genomic DNA or transcripts.
- backup.sh – this shell script archives directories of interest on a Linux-based system.
- genome_pattern_search.pl – a Perl program that reads a genomic sequence in FASTA format and searches for the patterns you specify using regular expressions.
- get_cds.pl – this Perl script accepts a GenBank or EMBL file and extracts the protein translations or the DNA coding sequences and writes them to a new file in FASTA format.
- get_genes_in_area.pl – this Perl script accepts as input a position or list of positions in a genome and returns descriptions of nearby genes.
- get_orfs.pl – this Perl script accepts a sequence file as input and extracts the open reading frames (ORFs) greater than or equal to the size you specify.
- get_snps_by_gene_ontology.pl – this Perl script accepts a species name and a Gene Ontology (GO) accession number, and returns a list of SNPs located in or near genes associated with the GO accession.
- maq_pipeline.sh – this bash script processes short sequence reads from Illumina’s Genome Analyzer (Solexa) system, using the Maq package.
- md5_sums.pl – this Perl script accepts a list of directories and recursively generates a list of the files in the directories and their MD5 values.
- NGS-SNP – this collection of scripts annotates raw SNP lists returned from programs such as Maq.
- space_check.sh – this shell script monitors hard drive space and sends an email when space becomes scarce.
These are really nice and save a lot of time.
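
To give a feel for what a tool like genome_pattern_search.pl does, here is a minimal Python sketch of the same idea: read FASTA-formatted text and report where a regular expression matches. This is not the Stothard group's code. The function names `read_fasta` and `find_pattern`, the 1-based coordinates, and the forward-strand-only search are all my own assumptions for illustration.

```python
import re
from typing import Dict, List, Tuple


def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA-formatted text into {record_id: sequence}.

    Hypothetical helper: the record ID is taken to be the first
    whitespace-delimited token of the header line.
    """
    chunks: Dict[str, List[str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            tokens = line[1:].split()
            current = tokens[0] if tokens else ""
            chunks[current] = []
        elif current is not None:
            # Sequence lines are joined and upper-cased.
            chunks[current].append(line.upper())
    return {name: "".join(parts) for name, parts in chunks.items()}


def find_pattern(records: Dict[str, str], pattern: str) -> List[Tuple[str, int, int, str]]:
    """Return (record_id, start, end, matched_text) for each match.

    Coordinates are 1-based and inclusive (an assumption of this sketch).
    re.finditer reports non-overlapping matches only, and only the
    forward strand is searched.
    """
    regex = re.compile(pattern)
    hits = []
    for name, seq in records.items():
        for m in regex.finditer(seq):
            hits.append((name, m.start() + 1, m.end(), m.group()))
    return hits


if __name__ == "__main__":
    demo = ">chr_demo example record\nACGTATAAAGG\nCCTATATAGC\n"
    for hit in find_pattern(read_fasta(demo), "TATA[AT]A"):
        print(*hit, sep="\t")
```

For a real genome you would read the file in pieces rather than hold it all in one string, and you would probably also search the reverse complement; the sketch leaves both out to stay short.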