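Oct 13

Gene Density, Feature Length, and SNP Density in the Human Genome

I. Background

This week, while designing a bioinformatics lab exercise for undergraduates, I had the idea of using Galaxy to compute some basic facts about the human genome: the gene density of each chromosome; the average length of features such as exons and introns; and the density of SNPs in different feature regions such as UTRs, coding regions, and introns.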

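II. Tools

Platform: Linux (Ubuntu 12.04, AMD64).
UCSC Table Browser: downloading genome data to the local machine.
Galaxy: processing genome data online.
BEDTools (v2.16.2): processing genome data locally.
R (v2.15.1): drawing charts.
Other: Vim (v7.3.429).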
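
As a rough illustration of the kind of local processing listed above, the following Python sketch counts genes per chromosome from BED-formatted lines. It is not the Galaxy or BEDTools workflow used in this post. The function name `count_genes_per_chromosome` and the sample records are hypothetical, and it assumes that column 4 of each BED line holds the gene name and that a repeated name on the same chromosome is the same gene.

```python
from collections import defaultdict


def count_genes_per_chromosome(bed_lines):
    """Count distinct names (BED column 4) seen on each chromosome."""
    names = defaultdict(set)
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and the header lines a BED file may carry.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 4:
            continue  # no name column, so nothing to count
        names[fields[0]].add(fields[3])
    return {chrom: len(seen) for chrom, seen in names.items()}


# Hypothetical records: chrom, start, end, name (tab-separated).
sample_bed = [
    "chr1\t11873\t14409\tgeneA",
    "chr1\t14361\t29370\tgeneB",
    "chr1\t11873\t14409\tgeneA",  # duplicate record, counted once
    "chr2\t38813\t46588\tgeneC",
]

counts = count_genes_per_chromosome(sample_bed)
print(counts)
```

Using a set per chromosome means duplicate records (for example, several transcripts exported under one gene name) do not inflate the count.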

III. Databases

Human genome: hg19
dbSNP: build 135

IV. Results

1. Number and density of genes on each chromosome.

  • Data table
chromosome  length.bp   length.100kb  geneNumber  geneDensity.numberPer100Mb
chr1        249250621   2492.50621    4177        1675.82330718
chr2        243199373   2431.99373    2563        1053.86784858
chr3        198022430   1980.2243     2251        1136.73991376
chr4        191154276   1911.54276    1592        832.835149343
chr5        180915260   1809.1526     1739        961.223503203
chr6        171115067   1711.15067    2071        1210.29669468
chr7        159138663   1591.38663    1940        1219.06264853
chrX        155270560   1552.7056     2083        1341.5292635
chr8        146364022   1463.64022    1437        981.79865541
chr9        141213431   1412.13431    1573        1113.91670669
chr10       135534747   1355.34747    1749        1290.4439922
chr11       135006516   1350.06516    2485        1840.65189861
chr12       133851895   1338.51895    2102        1570.39241021
chr13       115169878   1151.69878    711         617.348921738
chr14       107349540   1073.4954     1334        1242.66950748
chr15       102531392   1025.31392    1357        1323.49710028
chr16       90354753    903.54753     1600        1770.79782399
chr17       81195210    811.9521      2331        2870.85900757
chr18       78077248    780.77248     599         767.188925511
chr20       63025520    630.2552      1167        1851.63089491
chrY        59373566    593.73566     347         584.435167664
chr19       59128983    591.28983     2716        4593.34807095
chr22       51304566    513.04566     924         1801.00929028
chr21       48129895    481.29895     534         1109.49753786
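
The density column in the table is consistent with dividing the gene count by the chromosome length in bp and scaling to 100 Mb. A minimal Python sketch of that arithmetic follows; the post itself used Galaxy, BEDTools, and R, so the function name `gene_density_per_100mb` is hypothetical, and the three rows are copied from the table.

```python
def gene_density_per_100mb(gene_number, length_bp):
    """Genes per 100 Mb: gene count divided by length in bp, times 1e8."""
    return gene_number / length_bp * 100_000_000


# (length.bp, geneNumber) for three chromosomes, copied from the table.
rows = {
    "chr1": (249250621, 4177),
    "chr13": (115169878, 711),
    "chr19": (59128983, 2716),
}

densities = {
    chrom: gene_density_per_100mb(genes, length)
    for chrom, (length, genes) in rows.items()
}

# List chromosomes from most to least gene-dense.
for chrom, density in sorted(densities.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{chrom}\t{density:.2f}")
```

Normalizing to a fixed length is what makes chromosomes of very different sizes comparable: chr19 is one of the shortest chromosomes in the table yet has the highest density.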

 

  • Bar chart

[Figure: Number and density of genes on each chromosome]

Mar 16

Bioinformatics Links Directory

The Bioinformatics Links Directory features curated links to molecular resources, tools and databases. The links listed in this directory are selected on the basis of recommendations from bioinformatics experts in the field. We also rely on input from our community of bioinformatics users for suggestions. Since 2003, we have also listed all links contained in the NAR Web Server issue.

Jun 21

Programs for Generating Simulated Deep-Sequencing Data

  1. dwgsim, part of DNAA

     DNAA is a DNA analysis package for analyzing next-generation, post-alignment whole-genome resequencing data. Specifically, DNAA can find structural variation, SNP and indel variants, and evaluate mapping and data quality.
     Links: homepage, download, wiki, help …

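Apr 14

[Repost] Programming and Women (Four Sayings)

No. 1

Programmers are like men and languages are like women: every man wants many women, yet few men ever truly understand one.

No. 2

Men are like Python: they always prefer the more beautiful (Beautiful is better than ugly);
yet they also hope to meet a woman like Perl, because she offers many ways of doing things (There is more than one way to do it).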