record.description contains:
'NM_001001182.3 Mus musculus bromodomain adjacent to zinc finger domain, 2B (Baz2b), mRNA'
How can I get the string inside the last pair of parentheses? There may be two or more pairs.
I tried p1 = re.compile(r'([)]', re.S)
print(re.findall(p1,record.description))
def insidelastparentheses(text):
    # Take everything after the last '(', then cut it off at the next ')'.
    return text.rpartition('(')[2].partition(')')[0]

lst = ['NM_001001182.3 Mus musculus bromodomain adjacent to zinc finger domain, 2B (Baz2b), mRNA',
       'xxx(12) yyy(second)',
       'yyy(12) zzz(second) aaa(last)',
       'yyy(12) zzz(second) aaa(3) bb(4)',
       'yyy(12) zzz(second) aaa(3) bb(4) cc(5) haha',
       ]
for s in lst:
    print(insidelastparentheses(s))
Output:
Baz2b
second
last
4
5
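To see why the one-liner works, it may help to look at the intermediate values. The sketch below only unpacks the same rpartition/partition calls; the variable names are mine, not part of the answer above.

```python
s = 'yyy(12) zzz(second) aaa(last)'

# rpartition splits at the LAST '(' and returns (before, separator, after).
before, sep, after = s.rpartition('(')
print(before)   # yyy(12) zzz(second) aaa
print(after)    # last)

# partition splits `after` at its FIRST ')'.
inside, sep2, rest = after.partition(')')
print(inside)   # last

# When there is no '(' at all, rpartition puts the whole string in the
# third slot, so the one-liner would return the input unchanged.
no_paren = 'no parentheses here'.rpartition('(')
print(no_paren)  # ('', '', 'no parentheses here')
```

If the description might contain no parentheses, check for '(' first so the whole string is not returned by accident.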
Here is a more brute-force approach:
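s = 'NM_001001182.3 Mus musculus bromodomain adjacent to zinc finger domain, 2B (Baz2b), mRNA'
start = -1
end = -1
# Scan backward from the end of the string.
for index in range(len(s) - 1, -1, -1):
    if s[index] == ')' and end == -1:
        end = index      # position of the last ')'
    elif s[index] == '(' and end != -1:
        start = index    # the '(' that opens the last pair
        break            # stop here, otherwise earlier pairs overwrite the result
result = s[start + 1:end] if start != -1 else ''
print(result)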
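The same backward scan can probably be written more compactly with str.rfind. This is only a sketch: the function name last_parenthesized_rfind is made up for illustration, and it assumes the parentheses are not nested.

```python
def last_parenthesized_rfind(text):
    """Return the text inside the last '(...)' pair, or None if there is none."""
    end = text.rfind(')')             # index of the last ')'
    if end == -1:
        return None
    start = text.rfind('(', 0, end)   # last '(' that comes before that ')'
    if start == -1:
        return None
    return text[start + 1:end]


print(last_parenthesized_rfind('yyy(12) zzz(second) aaa(3) bb(4) cc(5) haha'))  # 5
print(last_parenthesized_rfind('no parentheses'))                               # None
```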
A regular-expression solution:
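import re
x = 'NM_00100(1182.3) Mus musculus (bromodomain) adjacent to zinc finger domain, 2B (Baz2b), mRNA'
# [^()]* instead of \w* so that groups containing '.', spaces, etc. also match
p1 = re.compile(r'\([^()]*\)')
list1 = p1.findall(x)
# last one
if len(list1) > 0:
    print(list1[-1][1:-1])  # strip the surrounding parentheses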
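import re
s = 'NM_001001182.3 Mus musculus bromodomain adjacent to zinc finger domain, 2B (Baz2b), mRNA'
# Raw string so that \( is not treated as an invalid string escape;
# the capture group collects everything after '(' up to the next ')'.
p1 = re.findall(r'\(([^)]+)', s)
print(p1[-1])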
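Note that p1[-1] raises IndexError when the string has no parentheses. One way to guard against that is to wrap the lookup in a small helper. This is a sketch rather than a definitive version; the names last_parenthesized and _PAREN_GROUP are hypothetical.

```python
import re

# One capture group: the text between '(' and the matching ')'.
_PAREN_GROUP = re.compile(r'\(([^()]*)\)')


def last_parenthesized(text):
    """Return the contents of the last '(...)' pair, or None if none is found."""
    matches = _PAREN_GROUP.findall(text)  # list of captured group strings
    return matches[-1] if matches else None


print(last_parenthesized('NM_00100(1182.3) Mus musculus (bromodomain) adjacent to zinc finger domain, 2B (Baz2b), mRNA'))  # Baz2b
print(last_parenthesized('no parentheses'))  # None
```

With record.description from the question, the call would be last_parenthesized(record.description).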