DAY 21 : python3 pandas data processing

11th iThome Ironman Contest

AI & Data

蟲王養成 (Raising a Crawler King) - scrapy series, part 21

kevin8701111

Team NUTC_IMAC_GREEN

2019-10-07 20:59:32

Previous posts
DAY 01 : Contest goals and planning
DAY 02 : python3 virtualenv setup
DAY 03 : python3 request
DAY 04 : Using beautifulsoup4 and lxml
DAY 05 : Grabbing tags with select and find
DAY 06 : Getting values from lists after soup parsing
DAY 07 : request_header_cookie to pass a site's over-18 restriction
DAY 08 : Crawling ppt post content
DAY 09 : Data processing with split, replace, strip
DAY 10 : python csv writing and dict merging
DAY 11 : python class function
DAY 12 : Using the scrapy crawling framework
DAY 13 : scrapy architecture
DAY 14 : scrapy pipeline data insert mongodb
DAY 15 : scrapy middleware proxy
DAY 16 : scrapy selenium
DAY 17 : scrapy crawling JS-rendered page data (2)
DAY 18 : scrapy splash crawling JS-rendered page data (3)
DAY 19 : Using .env with python
DAY 20 : python chartify data visualization package
DAY 21 : python3 pandas data processing

pip3 install pandas
pip3 install mlxtend xlrd

# -*- coding: utf-8 -*-

import pandas as pd
import numpy as np  # not used in this snippet
from mlxtend.frequent_patterns import apriori  # not used in this snippet
from mlxtend.frequent_patterns import association_rules  # not used in this snippet

# The input files have no header row, so the column names are supplied here
columns = ["Sequence", "Start", "End", "Coverage"]

# Read each file in chunks of 1000 rows. With chunksize set, read_csv
# returns an iterator of DataFrames instead of a single DataFrame.
chunks1 = pd.read_csv('data/test1.txt', sep=",", chunksize=1000, header=None, names=columns)
chunks2 = pd.read_csv('data/test2.txt', sep=",", chunksize=1000, header=None, names=columns)

# Concatenate every chunk of each file into one DataFrame
df3 = pd.concat(chunks1, ignore_index=True)
df4 = pd.concat(chunks2, ignore_index=True)

print(type(df3))
print(type(df4))

# Stack the rows of both files and add a count column set to 1
res = pd.concat([df3, df4], axis=0)
res['count'] = 1
print(res)
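
The chunked-read pattern above can be tried without any files on disk. The following is a minimal sketch, assuming a few made-up rows in place of data/test1.txt; the sample values and the helper name read_in_chunks are hypothetical and only serve to illustrate that pd.read_csv with chunksize yields DataFrame chunks that pd.concat can join back together.

```python
import io

import pandas as pd

# Hypothetical sample rows standing in for data/test1.txt (no header row)
sample = "chr1,0,100,5\nchr1,100,200,7\nchr2,0,50,3\n"
columns = ["Sequence", "Start", "End", "Coverage"]


def read_in_chunks(text, chunksize):
    """Read CSV text in chunks and join all chunks into one DataFrame."""
    reader = pd.read_csv(
        io.StringIO(text),
        sep=",",
        chunksize=chunksize,  # makes read_csv return an iterator of DataFrames
        header=None,
        names=columns,
    )
    # pd.concat accepts any iterable of DataFrames, including the chunk reader
    return pd.concat(reader, ignore_index=True)


df = read_in_chunks(sample, chunksize=2)  # 3 rows are read as chunks of 2 and 1
df["count"] = 1  # same constant column as in the post
print(df)
```

Passing the reader straight to pd.concat keeps every chunk, whereas concatenating each chunk onto an empty DataFrame inside a loop would leave only the last chunk in the result.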
