A Practical Shell Script

Posted on March 8, 2011 by Yixf

The Problem

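Suppose you have hundreds or thousands of GEO data files, each containing many rows of gene expression data, with the NCBI GeneID in the first column. The goal is to pull out the expression values for each GeneID across all of these GSE files, that is, to group the raw data by GeneID. How can this be done? Below is one solution using the shell.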

The Shell Script

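#!/bin/bash

# Concatenate every GSE file into one temporary file.
echo "merging datafiles..."
cat GSE* > dump

# Collect the distinct GeneIDs from the first (tab-separated) column.
echo "get unique GeneIDs..."
listID=$(cut -f 1 dump | sort -u)

# Write all rows for each GeneID to a file named after that ID.
echo "exporting gene expression data to GeneID files..."
for id in $listID
do
    # Require whitespace after the ID so that "12" does not also match "123".
    grep "^${id}[[:space:]]" dump > "$id"
done
rm dump
echo "done."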
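
The script above scans the whole dump once per GeneID, which can become slow when there are many distinct IDs. As a rough sketch of an alternative, and not part of the original post, the same grouping can be done in a single pass with sort and awk. The function name split_by_geneid and the output directory argument are hypothetical, and the sketch assumes tab-separated files whose GeneIDs contain no "/" characters.

```shell
#!/bin/bash

# Hypothetical helper (not from the original post): group rows by the
# GeneID in column 1 and write one file per GeneID into an output directory.
# Usage: split_by_geneid OUTDIR FILE...
split_by_geneid() {
    local outdir="$1"
    shift
    mkdir -p "$outdir"

    # A stable sort on column 1 makes all rows for a GeneID contiguous,
    # so awk only needs one output file open at a time.
    cat "$@" | sort -s -t $'\t' -k1,1 | awk -F '\t' -v dir="$outdir" '
        $1 == "" { next }                      # skip rows without a GeneID
        $1 != prev {                           # first row of a new GeneID
            if (out != "") close(out)
            out = dir "/" $1
            prev = $1
        }
        { print > out }
    '
}
```

It could be called as split_by_geneid by_gene GSE*, which would leave the per-gene files in a by_gene directory instead of mixing them with the input files in the current directory.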

Source

Code more shell scripts (Part. I)
