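Gene Density, Feature Lengths, and SNP Density in the Human Genome

I. Motivation

This week, while designing a bioinformatics lab for undergraduates, I had the idea of using Galaxy to compute some basic facts about the human genome, such as the gene density on each chromosome, the average length of features such as exons and introns, and the SNP density in different feature regions such as UTRs, coding regions, and introns.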

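II. Tools

Platform: Linux (Ubuntu 12.04, AMD64).
UCSC Table Browser: download genome data to the local machine.
Galaxy: process genome data online.
BEDTools (v2.16.2): process genome data locally.
R (v2.15.1): draw the figures.
Other: Vim (v7.3.429).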

III. Databases

Human genome: hg19
dbSNP: build 135

IV. Results

1. Gene count and density on each chromosome.

  • Data table
chromosome	length.bp	length.100kb	geneNumber	geneDensity.numberPer100Mb
chr1	249250621	2492.50621	4177	1675.82330718
chr2	243199373	2431.99373	2563	1053.86784858
chr3	198022430	1980.2243	2251	1136.73991376
chr4	191154276	1911.54276	1592	832.835149343
chr5	180915260	1809.1526	1739	961.223503203
chr6	171115067	1711.15067	2071	1210.29669468
chr7	159138663	1591.38663	1940	1219.06264853
chrX	155270560	1552.7056	2083	1341.5292635
chr8	146364022	1463.64022	1437	981.79865541
chr9	141213431	1412.13431	1573	1113.91670669
chr10	135534747	1355.34747	1749	1290.4439922
chr11	135006516	1350.06516	2485	1840.65189861
chr12	133851895	1338.51895	2102	1570.39241021
chr13	115169878	1151.69878	711	617.348921738
chr14	107349540	1073.4954	1334	1242.66950748
chr15	102531392	1025.31392	1357	1323.49710028
chr16	90354753	903.54753	1600	1770.79782399
chr17	81195210	811.9521	2331	2870.85900757
chr18	78077248	780.77248	599	767.188925511
chr20	63025520	630.2552	1167	1851.63089491
chrY	59373566	593.73566	347	584.435167664
chr19	59128983	591.28983	2716	4593.34807095
chr22	51304566	513.04566	924	1801.00929028
chr21	48129895	481.29895	534	1109.49753786
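
The density column is consistent with geneNumber / length.bp × 10^8, that is, genes per 100 Mb. The following is a minimal Python sketch of that arithmetic using three rows from the table; the function name is hypothetical, and the author's own calculation was presumably done in Galaxy or R.

```python
# Three rows from the table: (chromosome, length.bp, geneNumber).
rows = [
    ("chr1", 249250621, 4177),
    ("chr19", 59128983, 2716),
    ("chrY", 59373566, 347),
]


def genes_per_100mb(length_bp, gene_number):
    """Gene density expressed as genes per 100 Mb (hypothetical helper)."""
    return gene_number / length_bp * 1e8


densities = {chrom: genes_per_100mb(length, n) for chrom, length, n in rows}

for chrom, density in densities.items():
    print(f"{chrom}\t{density:.2f}")
```

Among these three rows, chr19 is the densest and chrY the sparsest, which matches the table.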

 

  • Bar chart

Gene count and density on each chromosome

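2. Length statistics for different features of the human genome.

The features used are: gene, intergenic region, exon, intron, 5' UTR exon, coding exon, and 3' UTR exon.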
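
Intergenic regions are not annotated directly; one way to obtain them is to merge the gene intervals and take the gaps between them. The sketch below illustrates that idea in Python on made-up coordinates. It assumes BED-style 0-based, half-open intervals on a single chromosome; the function names are hypothetical, and whether the author derived intergenic regions this way (for example with BEDTools or Galaxy) is not stated in the post.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous interval: extend it if needed.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def gaps(merged, chrom_length):
    """Return the regions of [0, chrom_length) not covered by merged intervals."""
    out = []
    prev_end = 0
    for start, end in merged:
        if start > prev_end:
            out.append((prev_end, start))
        prev_end = end
    if prev_end < chrom_length:
        out.append((prev_end, chrom_length))
    return out


# Made-up gene coordinates on a toy chromosome of 2500 bp.
genes = [(100, 500), (400, 900), (1500, 2000)]
merged_genes = merge_intervals(genes)
intergenic = gaps(merged_genes, chrom_length=2500)
intergenic_lengths = [end - start for start, end in intergenic]
print(merged_genes)
print(intergenic, intergenic_lengths)
```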

  • Basic statistics
#Summary of gene length:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
     20    6442   20250   56380   57220 2305000 
#Summary of intergenic length:
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
       1     4359    18160    90600    58490 31220000 
#Summary of exon length:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    2.0    93.0   133.0   307.4   199.0 91670.0 
#Summary of intron length:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      1     473    1516    6127    4228 1044000 
#Summary of utr5 length:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    1.0    63.0   114.0   203.3   204.0 37030.0 
#Summary of coding length:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    1.0    85.0   122.0   165.3   169.0 21690.0 
#Summary of utr3 length:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    1.0   128.0   395.0   989.5  1318.0 91670.0
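
The output above has the shape of R's `summary()`. For readers working in Python, the sketch below produces the same six numbers with the standard library; the `inclusive` quantile method interpolates the same way as R's default. The input lengths are made up for illustration and are not taken from the hg19 data.

```python
import statistics


def six_number_summary(values):
    """Min, quartiles, mean and max, in the order R's summary() reports them."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "Min.": min(values),
        "1st Qu.": q1,
        "Median": median,
        "Mean": statistics.fmean(values),
        "3rd Qu.": q3,
        "Max.": max(values),
    }


# Hypothetical exon lengths in bp, for illustration only.
toy_lengths = [90, 120, 133, 150, 200, 1500]
summary = six_number_summary(toy_lengths)
for name, value in summary.items():
    print(f"{name:>8}: {value}")
```

As in the real data, a single long feature pulls the mean well above the median.
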
  • Box plot

Length statistics for different features of the human genome

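3. SNP density on each chromosome and in different features.

The features used are: gene, intergenic region, 200 bp upstream of the gene, 200 bp downstream of the gene, exon, intron, 5' UTR exon, coding exon, and 3' UTR exon.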
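
The upstream and downstream 200 bp windows can be derived from gene coordinates. The Python sketch below shows one way to do it. It assumes BED-style 0-based, half-open coordinates and treats "upstream" relative to the gene's strand; the post does not say whether the original windows were strand-aware, and the function name and parameters are hypothetical.

```python
def flanking_windows(start, end, strand, size=200, chrom_length=None):
    """Return (upstream, downstream) windows of `size` bp around a gene.

    Coordinates are 0-based, half-open. Windows are clipped at the
    chromosome start, and at the chromosome end when chrom_length is given.
    """
    left = (max(0, start - size), start)
    right_end = end + size
    if chrom_length is not None:
        right_end = min(right_end, chrom_length)
    right = (end, right_end)
    # On the minus strand, transcription runs right to left,
    # so the upstream window lies to the right of the gene.
    if strand == "+":
        return left, right
    return right, left


plus_windows = flanking_windows(1000, 5000, "+")
minus_windows = flanking_windows(1000, 5000, "-")
print(plus_windows)
print(minus_windows)
```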

  • Data table
chromosome	length.feature	number.snp	density.snp	density.snp.deviationFromMean	feature
chr1	213911052	761239	3.55867	-0.102761224115423	gene
chr10	149514361	587986	3.93264	0.271208775884577	gene
chr11	118226679	446851	3.77961	0.118178775884576	gene
chr12	125921633	467697	3.71419	0.0527587758845764	gene
chr13	57288717	219129	3.82499	0.163558775884577	gene
chr14	76719243	291346	3.79756	0.136128775884576	gene
chr15	73034780	266568	3.64988	-0.0115512241154234	gene
chr16	66082724	313525	4.74443	1.08299877588458	gene
chr17	82934662	305530	3.68398	0.0225487758845766	gene
chr18	59512691	226227	3.80132	0.139888775884577	gene
chr19	52373248	220849	4.21683	0.555398775884576	gene
chr2	193485650	695370	3.59391	-0.0675212241154233	gene
chr20	52658205	202952	3.85414	0.192708775884577	gene
chr21	31028365	144481	4.65642	0.994988775884576	gene
chr22	37155901	152010	4.09114	0.429708775884577	gene
chr3	175007776	645859	3.69046	0.0290287758845764	gene
chr4	128246414	491009	3.82864	0.167208775884577	gene
chr5	125381870	433527	3.45765	-0.203781224115423	gene
chr6	128486960	514399	4.00351	0.342078775884577	gene
chr7	145574888	541402	3.71906	0.0576287758845764	gene
chr8	108462645	431213	3.97568	0.314248775884577	gene
chr9	92206982	359917	3.90336	0.241928775884577	gene
chrX	118936354	206613	1.73717	-1.92426122411542	gene
chrY	26795282	4338	0.161894	-3.49953722411542	gene
chr1	142348117	480587	3.37614	-0.233800339848831	intergenic
chr10	73986912	313907	4.24274	0.632799660151169	intergenic
chr11	75974507	318865	4.197	0.587059660151169	intergenic
chr12	74944890	308142	4.11158	0.501639660151169	intergenic
chr13	80027662	264360	3.30336	-0.306580339848831	intergenic
chr14	71735267	228874	3.19054	-0.419400339848831	intergenic
chr15	61859579	172907	2.79515	-0.814790339848831	intergenic
chr16	54941077	191431	3.4843	-0.125640339848831	intergenic
chr17	38206463	142810	3.73785	0.127909660151169	intergenic
chr18	48128078	198986	4.13451	0.524569660151168	intergenic
chr19	30201709	124720	4.12957	0.519629660151169	intergenic
chr2	143672524	560801	3.90333	0.293389660151169	intergenic
chr20	35578133	143678	4.03838	0.428439660151169	intergenic
chr21	34366932	100481	2.92377	-0.686170339848831	intergenic
chr22	32785971	78250	2.38669	-1.22325033984883	intergenic
chr3	109260742	449832	4.11705	0.507109660151169	intergenic
chr4	123988589	537758	4.33716	0.727219660151169	intergenic
chr5	115934956	470533	4.05859	0.448649660151168	intergenic
chr6	101899062	465443	4.56769	0.957749660151169	intergenic
chr7	86317978	357387	4.14035	0.530409660151169	intergenic
chr8	89280197	376829	4.22075	0.610809660151169	intergenic
chr9	91692703	283428	3.09106	-0.518880339848831	intergenic
chrX	106492843	205906	1.93352	-1.67642033984883	intergenic
chrY	54878730	5371	0.0978703	-3.51207003984883	intergenic
chr1	835400	3463	4.14532	0.0605169074655709	up200
chr10	350800	1448	4.12771	0.0429069074655715	up200
chr11	497000	2321	4.67002	0.585216907465571	up200
chr12	420400	1714	4.07707	-0.007733092534429	up200
chr13	142200	569	4.00141	-0.0833930925344291	up200
chr14	266800	1024	3.83808	-0.246723092534429	up200
chr15	271400	1090	4.01621	-0.0685930925344289	up200
chr16	320000	1345	4.20312	0.118316907465571	up200
chr17	466200	1859	3.98756	-0.0972430925344288	up200
chr18	119800	528	4.40735	0.322546907465571	up200
chr19	543200	2544	4.68336	0.598556907465571	up200
chr2	512600	2127	4.14943	0.0646269074655708	up200
chr20	233400	947	4.05741	-0.027393092534429	up200
chr21	107000	567	5.29907	1.21426690746557	up200
chr22	184800	951	5.1461	1.06129690746557	up200
chr3	450200	1750	3.88716	-0.197643092534429	up200
chr4	318600	1244	3.90458	-0.180223092534429	up200
chr5	348000	1291	3.70977	-0.375033092534429	up200
chr6	414200	2542	6.13713	2.05232690746557	up200
chr7	388000	1497	3.85825	-0.226553092534429	up200
chr8	287400	1174	4.0849	9.69074655712276e-05	up200
chr9	314600	1273	4.04641	-0.0383930925344291	up200
chrX	416600	536	1.28661	-2.79819309253443	up200
chrY	69400	10	0.144092	-3.94071109253443	up200
chr1	835400	3155	3.77663	0.0788769195457841	down200
chr10	350800	1355	3.8626	0.164846919545784	down200
chr11	497000	1880	3.7827	0.0849469195457844	down200
chr12	420400	1497	3.56089	-0.136863080454216	down200
chr13	142200	492	3.45992	-0.237833080454216	down200
chr14	266800	995	3.72939	0.0316369195457842	down200
chr15	271400	1014	3.73618	0.0384269195457843	down200
chr16	320000	1247	3.89688	0.199126919545784	down200
chr17	466200	1682	3.60789	-0.089863080454216	down200
chr18	119800	467	3.89816	0.200406919545784	down200
chr19	543200	2269	4.1771	0.479346919545784	down200
chr2	512600	1740	3.39446	-0.303293080454216	down200
chr20	233400	917	3.92888	0.231126919545784	down200
chr21	107000	544	5.08411	1.38635691954578	down200
chr22	184800	896	4.84848	1.15072691954578	down200
chr3	450200	1587	3.5251	-0.172653080454216	down200
chr4	318600	1193	3.74451	0.0467569195457842	down200
chr5	348000	1316	3.78161	0.0838569195457843	down200
chr6	414200	2205	5.32352	1.62576691954578	down200
chr7	388000	1382	3.56186	-0.135893080454216	down200
chr8	287400	1073	3.73347	0.0357169195457843	down200
chr9	314600	1182	3.75715	0.0593969195457844	down200
chrX	416600	517	1.241	-2.45675308045422	down200
chrY	69400	5	0.0720461	-3.62570698045422	down200
chr1	12660160	38512	3.04198	0.019269884972156	exon
chr10	5621527	17326	3.08208	0.0593698849721558	exon
chr11	7073434	21703	3.06824	0.0455298849721557	exon
chr12	6869799	19605	2.8538	-0.168910115027844	exon
chr13	2212460	6447	2.91395	-0.108760115027844	exon
chr14	3912101	12014	3.07098	0.0482698849721559	exon
chr15	4099640	11383	2.77659	-0.246120115027844	exon
chr16	4539134	14600	3.21647	0.193759884972156	exon
chr17	6744232	20834	3.08916	0.066449884972156	exon
chr18	2177258	6513	2.99138	-0.0313301150278442	exon
chr19	6462276	25009	3.87	0.847289884972156	exon
chr2	9173129	24809	2.70453	-0.318180115027844	exon
chr20	3188751	10342	3.24328	0.220569884972156	exon
chr21	1396473	5449	3.90197	0.879259884972156	exon
chr22	2733051	10098	3.69477	0.672059884972156	exon
chr3	7504631	21055	2.8056	-0.217110115027844	exon
chr4	5140247	15248	2.96639	-0.0563201150278441	exon
chr5	5935961	17200	2.89759	-0.125120115027844	exon
chr6	6264781	25638	4.0924	1.06968988497216	exon
chr7	5951915	18436	3.09749	0.0747798849721559	exon
chr8	4476053	14421	3.22181	0.199099884972156	exon
chr9	4822501	14241	2.95303	-0.0696801150278441	exon
chrX	5621251	8240	1.46587	-1.55684011502784	exon
chrY	968489	376	0.388234	-2.63447611502784	exon
chr1	201250892	722730	3.59119	-0.104952486392963	intron
chr10	143892834	570675	3.96597	0.269827513607037	intron
chr11	111153245	425149	3.82489	0.128747513607037	intron
chr12	119051834	448094	3.76386	0.0677175136070374	intron
chr13	55076257	212682	3.86159	0.165447513607037	intron
chr14	72807142	279332	3.8366	0.140457513607037	intron
chr15	68935140	255210	3.70218	0.00603751360703697	intron
chr16	61543590	298926	4.85714	1.16099751360704	intron
chr17	76190430	284701	3.7367	0.0405575136070371	intron
chr18	57335433	219714	3.83208	0.135937513607037	intron
chr19	45910972	195843	4.26571	0.569567513607038	intron
chr2	184312521	670562	3.63818	-0.0579624863929626	intron
chr20	49469454	192610	3.89351	0.197367513607037	intron
chr21	29631892	139038	4.69217	0.996027513607037	intron
chr22	34422850	141914	4.12267	0.426527513607037	intron
chr3	167503145	624810	3.73014	0.0339975136070372	intron
chr4	123106167	475761	3.86464	0.168497513607037	intron
chr5	119445909	416340	3.48559	-0.210552486392963	intron
chr6	122222179	488769	3.99902	0.302877513607037	intron
chr7	139622973	522980	3.74566	0.0495175136070372	intron
chr8	103986592	416792	4.00813	0.311987513607038	intron
chr9	87384481	345681	3.95586	0.259717513607037	intron
chrX	113315103	198373	1.75063	-1.94551248639296	intron
chrY	25826793	3962	0.153407	-3.54273548639296	intron
chr1	1501412	5467	3.64124	0.0411945865830874	utr5
chr10	655271	2453	3.74349	0.143444586583088	utr5
chr11	801123	2720	3.39523	-0.204815413416912	utr5
chr12	765747	2796	3.65134	0.0512945865830874	utr5
chr13	261673	937	3.58081	-0.0192354134169124	utr5
chr14	474814	1768	3.72356	0.123514586583088	utr5
chr15	554992	1497	2.69734	-0.902705413416912	utr5
chr16	499164	1687	3.37965	-0.220395413416913	utr5
chr17	777051	3063	3.94183	0.341784586583088	utr5
chr18	210888	831	3.94048	0.340434586583088	utr5
chr19	732863	3185	4.34597	0.745924586583088	utr5
chr2	927321	3094	3.33649	-0.263555413416912	utr5
chr20	408422	1505	3.68491	0.0848645865830875	utr5
chr21	214951	901	4.19165	0.591604586583088	utr5
chr22	395782	1556	3.93146	0.331414586583088	utr5
chr3	835162	3070	3.67593	0.0758845865830877	utr5
chr4	566024	2169	3.83199	0.231944586583087	utr5
chr5	634733	2157	3.39828	-0.201765413416912	utr5
chr6	704241	3917	5.56202	1.96197458658309	utr5
chr7	747133	2604	3.48532	-0.114725413416912	utr5
chr8	497581	1917	3.85264	0.252594586583088	utr5
chr9	554969	1992	3.58939	-0.0106554134169126	utr5
chrX	659875	976	1.47907	-2.12097541341691	utr5
chrY	151958	58	0.381684	-3.21836141341691	utr5
chr1	5648070	15077	2.66941	0.0520799475195366	coding
chr10	2458928	6376	2.593	-0.0243300524804635	coding
chr11	3250750	9041	2.7812	0.163869947519537	coding
chr12	3047119	7065	2.31858	-0.298750052480464	coding
chr13	968554	2221	2.29311	-0.324220052480463	coding
chr14	1711626	4461	2.60629	-0.0110400524804635	coding
chr15	1860352	4322	2.32322	-0.294110052480463	coding
chr16	2262655	6356	2.80909	0.191759947519536	coding
chr17	3235121	8608	2.6608	0.0434699475195366	coding
chr18	888606	2253	2.53543	-0.0819000524804636	coding
chr19	3431310	12051	3.51207	0.894739947519537	coding
chr2	4531228	10513	2.32012	-0.297210052480463	coding
chr20	1366887	3955	2.89344	0.276109947519537	coding
chr21	610222	1926	3.15623	0.538899947519536	coding
chr22	1190786	3784	3.17773	0.560399947519536	coding
chr3	3347788	7766	2.31974	-0.297590052480464	coding
chr4	2345492	5416	2.30911	-0.308220052480463	coding
chr5	2620918	6490	2.47623	-0.141100052480463	coding
chr6	2863757	10749	3.75346	1.13612994751954	coding
chr7	2588695	6762	2.61213	-0.0052000524804634	coding
chr8	1915541	5190	2.70942	0.0920899475195367	coding
chr9	2293613	5755	2.50914	-0.108190052480464	coding
chrX	2569877	3686	1.43431	-1.18302005248046	coding
chrY	289808	144	0.496881	-2.12044905248046	coding
chr1	5510678	17968	3.26058	-0.0385129867774729	utr3
chr10	2507328	8507	3.39285	0.0937570132225272	utr3
chr11	3021561	9942	3.29035	-0.00874298677747287	utr3
chr12	3056933	9744	3.18751	-0.111582986777473	utr3
chr13	982233	3289	3.34849	0.049397013222527	utr3
chr14	1725661	5785	3.35234	0.0532470132225269	utr3
chr15	1684296	5564	3.30346	0.00436701322252686	utr3
chr16	1777315	6557	3.68927	0.390177013222527	utr3
chr17	2732060	9163	3.35388	0.0547870132225272	utr3
chr18	1077764	3429	3.18159	-0.117502986777473	utr3
chr19	2298103	9773	4.25264	0.953547013222527	utr3
chr2	3714580	11202	3.01568	-0.283412986777473	utr3
chr20	1413442	4882	3.45398	0.154887013222527	utr3
chr21	571300	2622	4.58953	1.29043701322253	utr3
chr22	1146483	4758	4.15008	0.850987013222527	utr3
chr3	3321681	10219	3.07645	-0.222642986777473	utr3
chr4	2228731	7663	3.43828	0.139187013222527	utr3
chr5	2680310	8553	3.19105	-0.108042986777473	utr3
chr6	2696783	10972	4.06855	0.769457013222527	utr3
chr7	2616087	9070	3.46701	0.167917013222527	utr3
chr8	2062931	7314	3.54544	0.246347013222527	utr3
chr9	1973919	6494	3.2899	-0.00919298677747316	utr3
chrX	2391499	3578	1.49613	-1.80296298677747	utr3
chrY	526723	174	0.330344	-2.96874898677747	utr3
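
The density.snp values are consistent with number.snp / length.feature × 1000, that is, SNPs per kb. For the up200 rows, the deviation column matches the per-chromosome density minus the pooled density (total SNPs over total window length across all 24 chromosomes). The Python sketch below recomputes both from the up200 rows. That the same rule holds for the other features, and that the author computed it this way, are assumptions; the variable names are hypothetical.

```python
# (chromosome, length.feature in bp, number.snp) for the up200 rows.
UP200 = [
    ("chr1", 835400, 3463), ("chr10", 350800, 1448), ("chr11", 497000, 2321),
    ("chr12", 420400, 1714), ("chr13", 142200, 569), ("chr14", 266800, 1024),
    ("chr15", 271400, 1090), ("chr16", 320000, 1345), ("chr17", 466200, 1859),
    ("chr18", 119800, 528), ("chr19", 543200, 2544), ("chr2", 512600, 2127),
    ("chr20", 233400, 947), ("chr21", 107000, 567), ("chr22", 184800, 951),
    ("chr3", 450200, 1750), ("chr4", 318600, 1244), ("chr5", 348000, 1291),
    ("chr6", 414200, 2542), ("chr7", 388000, 1497), ("chr8", 287400, 1174),
    ("chr9", 314600, 1273), ("chrX", 416600, 536), ("chrY", 69400, 10),
]


def snps_per_kb(length_bp, snp_count):
    """SNP density expressed as SNPs per kb."""
    return snp_count / length_bp * 1000


# Pooled density: all up200 SNPs over all up200 bases.
total_length = sum(length for _, length, _ in UP200)
total_snps = sum(snps for _, _, snps in UP200)
pooled_density = snps_per_kb(total_length, total_snps)

# Per-chromosome density and its deviation from the pooled value.
density = {chrom: snps_per_kb(length, snps) for chrom, length, snps in UP200}
deviation = {chrom: d - pooled_density for chrom, d in density.items()}

print(f"pooled up200 density: {pooled_density:.5f} SNPs/kb")
for chrom in ("chr1", "chr6", "chrX", "chrY"):
    print(f"{chrom}\t{density[chrom]:.5f}\t{deviation[chrom]:+.5f}")
```

Small differences in the last digits relative to the table are expected, because the table's deviations appear to have been computed from densities already rounded to six significant digits.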

 

  • Bar charts

snpDensity1
snpDensity2
snpDensity3

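V. Further Reading

Designing a bioinformatics lab course with EMBOSS and Galaxy

VI. Download

Download all data (statistics, scripts, and result figures) as one package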

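PS: The genome-wide SNP density (SNP count / genome length), covering the sex chromosomes and the mitochondrial genome but excluding the chr*_* sequences, is 11329891/3095693983*1000 = 3.659887, that is, roughly 3 to 4 SNPs per kb.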
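
The same arithmetic as a short Python check, using the two totals quoted above (the variable names are hypothetical):

```python
genome_snp_count = 11329891      # total SNPs quoted in the post
genome_length_bp = 3095693983    # genome length quoted in the post

# Multiply by 1000 to express the density per kb rather than per bp.
genome_snps_per_kb = genome_snp_count / genome_length_bp * 1000
print(f"{genome_snps_per_kb:.6f} SNPs per kb")
```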