[Day 23] Deploying an Instagram Bot

12th iThome Ironman Contest

Self-Challenge Group

Part 23 of the 30-day series "Data Collection and Distributed Computing"

Walter

Team: Outcome First

2020-10-06 23:35:12

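Welcome to Day 23. The final stretch of the countdown is always the hardest, so let's push through together!

Today we combine the material from the past few days into one complete Instagram bot.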

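Workflow

An Instagram bot is expected to provide the following features:

Automatically search for post material (crawler)
Automatically write the post content (text-generation API)
Post automatically on a schedule (deploy to a server, simulate a human posting)

Automatically search for post material (crawler)

Remember the IG image crawler from Days 18 and 19? We can find material by searching a hashtag. Logging in has to happen first, so the login is written directly into the __init__ of a class, and a new method, hastag_search, saves the images it finds.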

import os
import datetime
import requests
import json
from time import sleep
from selenium import webdriver

class IG_BOT():
    def __init__(self):
        # Selenium 3 style API, as used throughout this series
        driver = webdriver.Chrome(executable_path = 'Path to webdriver')
        driver.get("https://www.instagram.com/?hl=zh-tw")
        inputs = driver.find_elements_by_xpath("//input")
        btn = driver.find_element_by_xpath("//button[@class='sqdOP  L3NKy   y3zKF     ']")
        # Keep the input elements and the credential strings under different names
        username_input = inputs[0]
        password_input = inputs[1]
        account = "YOUR_ACCOUNT"    # consider reading this from an environment variable for better security
        password = "YOUR_PASSWORD"  # consider reading this from an environment variable for better security

        username_input.send_keys(account)
        sleep(2) # a random delay can be used here instead of a fixed one
        password_input.send_keys(password)
        sleep(2) # a random delay can be used here instead of a fixed one
        btn.click()
        # Override the user agent with an iPhone one, then reload the page
        driver.execute_cdp_cmd('Network.setUserAgentOverride', {"userAgent":"Mozilla/5.0 (iPhone; CPU iPhone OS 13_2_3 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.3 Mobile/15E148 Safari/604.1"})
        driver.refresh()
        self.driver = driver

    def hastag_search(self):
        self.driver.get("https://www.instagram.com/tags/shiba")
        self.driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        photo_tag = self.driver.find_elements_by_xpath("//div[@class='KL4Bh']/img")
        # Create the output folder once, before downloading anything
        os.makedirs("pic", exist_ok=True)
        timestamp = datetime.datetime.now().strftime("%Y%m%d%H%M%S")
        for i, photo in enumerate(photo_tag):
            # Add the index so images saved within the same second do not overwrite each other
            with open(f'pic/img_{timestamp}_{i}.png','wb') as f:
                f.write(requests.get(photo.get_attribute("src")).content)
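
The comments in the login code mention two improvements: reading the credentials from environment variables and randomizing the pauses. The sketch below shows one way to do both. The variable names IG_ACCOUNT and IG_PASSWORD and the helpers load_credentials and human_pause are hypothetical and do not come from the original code.

```python
import os
import random
from time import sleep


def load_credentials(env=os.environ):
    """Read the login from environment variables instead of hard-coding it.

    IG_ACCOUNT and IG_PASSWORD are hypothetical names chosen for this sketch.
    """
    try:
        return env["IG_ACCOUNT"], env["IG_PASSWORD"]
    except KeyError as missing:
        raise RuntimeError(f"Missing environment variable: {missing}") from None


def human_pause(low=1.5, high=3.5):
    """Sleep for a random number of seconds between low and high."""
    delay = random.uniform(low, high)
    sleep(delay)
    return delay
```

Inside __init__, the two hard-coded strings could then be replaced with `account, password = load_credentials()`, and each `sleep(2)` with `human_pause()`.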

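Automatically write the post content (text-generation API)

For the text of the post, this example calls the API of the Bullshit Generator (唬爛產生器). Any paid or more advanced AI text-generation API could be plugged in instead.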

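def post_content(self):
    # Method of IG_BOT: ask the API for a passage of at least 30 characters about "柴犬" (Shiba Inu)
    response = requests.post("https://api.howtobullshit.me/bullshit",data=json.dumps({"Topic":"柴犬","MinLen":30}))
    # Remove the spaces from the returned text
    return response.text.replace(" ","")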
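
A network call like this can hang or return an error page, and that would end up in the caption. The following is a minimal sketch of a more defensive variant. It swaps requests for the standard library's urllib so the example has no third-party dependency, and the names build_payload, clean_caption and generate_caption, as well as the 10-second timeout, are assumptions made for illustration.

```python
import json
import urllib.request

# Endpoint taken from the article
BULLSHIT_API = "https://api.howtobullshit.me/bullshit"


def build_payload(topic, min_len=30):
    """Build the JSON body using the same two fields as the article."""
    return json.dumps({"Topic": topic, "MinLen": min_len})


def clean_caption(raw):
    """Drop spaces, as the article does, plus leading and trailing whitespace."""
    return raw.replace(" ", "").strip()


def generate_caption(topic, min_len=30, timeout=10):
    """POST the payload and return the cleaned text.

    urlopen raises an exception on HTTP error statuses and on timeout,
    so an error page is never returned as a caption.
    """
    request = urllib.request.Request(
        BULLSHIT_API,
        data=build_payload(topic, min_len).encode("utf-8"),
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return clean_caption(response.read().decode("utf-8"))
```

Splitting the payload and the cleanup into their own functions keeps them testable without touching the network.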

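Post automatically on a schedule (deploy to a server, simulate a human posting)

Finally comes yesterday's highlight, simulating a human posting, in the new_post function. A get_img function is added alongside it to check whether a downloaded image is available and, if not, to run hastag_search first.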

def new_post(self):
    # Open the "new post" dialog
    post = self.driver.find_element_by_xpath("//div[@class='q02Nz _0TPg']")
    post.click()
    # Hand the image path to the file input
    input_tags = self.driver.find_elements_by_xpath("//input[@class= 'tb_sK']")
    input_tags[3].send_keys(self.get_img())
    full_size = self.driver.find_element_by_xpath("//button[@class='pHnkA']")
    full_size.click()
    next_step = self.driver.find_element_by_xpath("//button[@class='UP43G']")
    next_step.click()
    # The string predicate is always true, so this matches the first textarea on the page
    text_box = self.driver.find_element_by_xpath("//textarea['_472V_']")
    text_box.send_keys(self.post_content())
    share = self.driver.find_element_by_xpath("//button[@class='UP43G']")
    share.click()

def get_img(self):
    # Download images first if the folder is missing or empty
    if "pic" not in os.listdir() or len(os.listdir(f"{os.getcwd()}/pic")) == 0:
        self.hastag_search()
    pic_name = os.listdir(f"{os.getcwd()}/pic").pop()
    # Return the absolute path of the chosen image
    return f"{os.getcwd()}/pic/{pic_name}"
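
As written, get_img takes whichever file os.listdir happens to return last and never removes it, so the same image may be picked again on the next run. One possible refinement, sketched here with pathlib, is to pick the oldest image and delete it once it has been posted. The helpers next_image and mark_posted are hypothetical and are not part of the original bot.

```python
from pathlib import Path


def next_image(pic_dir="pic"):
    """Return the absolute path of the oldest .png in pic_dir, or None if there is none."""
    folder = Path(pic_dir)
    if not folder.is_dir():
        return None
    # Sort by modification time so the earliest download is used first
    images = sorted(folder.glob("*.png"), key=lambda p: p.stat().st_mtime)
    return images[0].resolve() if images else None


def mark_posted(image_path):
    """Delete an image after it has been posted so it is not reused."""
    Path(image_path).unlink()
```

A caller would run hastag_search whenever next_image returns None, pass str(next_image()) to the file input, and call mark_posted after the share button has been clicked.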

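Finally, combine this with the deployment steps from [Day 13] Dynamic Crawler - 5 to run it on a server, and you have an Instagram bot that posts on a fixed schedule every day!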
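
The Day 13 deployment steps are not repeated here. As a rough illustration of the "fixed daily post" idea, the sketch below computes the next posting time and adds a few random minutes so the posts do not land on exactly the same second every day. The function names next_post_time and run_forever, the 20:00 posting hour and the 30-minute jitter are all assumptions for this example, and `bot` is assumed to be an object with the new_post method shown earlier.

```python
import datetime
import random
import time


def next_post_time(now, hour=20, jitter_minutes=30, rng=random):
    """Return the next posting time after `now`.

    The base time is `hour`:00 today, or tomorrow if that has already passed,
    shifted later by a random 0 to jitter_minutes minutes.
    """
    target = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if target <= now:
        target += datetime.timedelta(days=1)
    offset = rng.randint(0, jitter_minutes)
    return target + datetime.timedelta(minutes=offset)


def run_forever(bot):
    """Sleep until the next posting time, post, and repeat."""
    while True:
        now = datetime.datetime.now()
        wait_seconds = (next_post_time(now) - now).total_seconds()
        time.sleep(max(wait_seconds, 0))
        bot.new_post()
```

A system scheduler such as cron could replace run_forever. The loop is shown only because it keeps the whole example in Python.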