iT邦幫忙

2019 iT 邦幫忙鐵人賽

DAY 20
0

經過了將近一週

我自己大概地將CNV-calling的方法快速瀏覽過一遍了,今天先花了點時間找了一下看有哪些工具我自己不需要重寫過一次,歸納如下:

  • 讀寫及操作BAM/SAM檔: 我可以使用BioJulia底下的BioAlignments.jl
  • 讀寫BED檔:我有BioJulia底下的BED.jlBigBed.jl可使用

但說到要有像Python底下pybedtools近乎是bedtools那種程度的操作,在Julia裡頭我倒是還沒有看到,所以今天的目標是先用Julia寫個bedtools裡頭makewindows的功能。

閹割版的BEDTools MakeWindows

為什麼要說是閹割版的makewindows呢?首先得先來看一下bedtools makewindows的說明:

Tool: bedtools makewindows
Version: v2.23.0
Summary: Makes adjacent or sliding windows across a genome or BED file.

Usage: bedtools makewindows [OPTIONS] [-g <genome> OR -b <bed>]
 [ -w <window_size> OR -n <number of windows> ]

Input Options: 
	-g <genome>
		Genome file size (see notes below).
		Windows will be created for each chromosome in the file.

	-b <bed>
		BED file (with chrom,start,end fields).
		Windows will be created for each interval in the file.

Windows Output Options: 
	-w <window_size>
		Divide each input interval (either a chromosome or a BED interval)
		to fixed-sized windows (i.e. same number of nucleotide in each window).
		Can be combined with -s <step_size>

	-s <step_size>
		Step size: i.e., how many base pairs to step before
		creating a new window. Used to create "sliding" windows.
		- Defaults to window size (non-sliding windows).

	-n <number_of_windows>
		Divide each input interval (either a chromosome or a BED interval)
		to fixed number of windows (i.e. same number of windows, with
		varying window sizes).

	-reverse
		 Reverse numbering of windows in the output, i.e. report 
		 windows in decreasing order

ID Naming Options: 
	-i src|winnum|srcwinnum
		The default output is 3 columns: chrom, start, end .
		With this option, a name column will be added.
		 "-i src" - use the source interval's name.
		 "-i winnum" - use the window number as the ID (e.g. 1,2,3,4...).
		 "-i srcwinnum" - use the source interval's name with the window number.
		See below for usage examples.

在這邊我們可以看到,bedtools windows所支援分割windows的方式還挺多樣化的,我實在是沒有時間可以花一整天的時間來把所有的功能都寫出來,所以就先以我會在CNV-calling當中用到的切割分式為主—也就是設定window size來切,這樣每條染色體就會根據我們設定的這個window size被切成固定的大小。

我的Julia實踐

#!/usr/bin/env julia              
# Implementation of BEDTools makewindows in Julia
"""            
    makeWindows(genomeFileInfo::DataFrame, windowSize::Int)    
Arguments      
---------      
- `genomeFileInfo`: a DataFrame object by reading genome file with CSV.read
- `windowSize`: set windown size (bp)
"""

using DataFrames
using CSV

function makeWindows(df::DataFrame, winSize::Int)
    for c = 1:nrow(df)
        chr = df.chr[c]
        chrLen = df.chrlen[c]
        winNum = Int(floor(chrLen/winSize))
        restLen = chrLen % winSize
        for i = 0:winNum
            startpos = i * winSize
            if i != winNum
                endpos = startpos + winSize
            else
                endpos = startpos + restLen
            end
            window = join([chr, startpos, endpos], " ")
            println(window)
        end      
    end        
end            


genomeFile = ARGS[1]
windowSize = parse(Int,ARGS[2])
df = CSV.read(genomeFile, header = ["chr", "chrlen"], delim = "\t")
makeWindows(df, windowSize)

使用方式也很容易,首先要準備一個genome file,裡頭從放著每個chromosome的長度,像這樣子

chr1	249250621
chr2	243199373
...
chr18_gl000207_random	4262

執行makeWindowsForBED.jl

./makeWindowsForBED.jl hg19.txt 1000000 > mywindowns.bed

大概就是這樣子 @@>


上一篇
[Day 19] 分析Copy number variation系列(肆)
下一篇
[Day 21] 回頭看一下Julia裡頭的一些規範
系列文
When Bioinfo met Julia: Bioinformatician的30天Julia學習之路32

1 則留言

0
杜岳華
iT邦新手 5 級 ‧ 2018-10-21 23:45:06

有程式了!!

不開始動作也不行啦~ 資料都到手上了 XD

我要留言

立即登入留言