Day 21: Reading Investor Sentiment -- Sentiment Analysis and Natural Language Processing

2024 iThome Ironman Contest

DAY 22

AI/ ML & Data

Part 22 of the series 打開就會 AI 與數據分析的投資理財術 (AI and Data Analysis for Investing, Ready Out of the Box)

Tags: 16th Ironman Contest, nltk, tweepy, sentiment intensity analyzer, tradebot

zivzhong

2024-10-06 23:13:49

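Besides current and historical stock prices, market sentiment is another very useful reference for trading decisions. For example, news coverage can hint at whether the market is in a panic, or whether an economic downturn is weakening market confidence. In this lesson, we explore how to use natural language processing (NLP) techniques to analyze text data, in particular how to extract market sentiment from news and social media. We will learn how to use libraries such as NLTK and SpaCy for text analysis and see why sentiment analysis matters in finance. A Colab notebook accompanies today's lesson.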

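I. Introduction

1. Why does sentiment analysis matter for financial markets?

Market sentiment affects price movements: investor sentiment often shifts the balance of supply and demand, which in turn affects asset prices.
Diverse information sources: with the growth of the internet, investors can get information from news, social media, and many other channels.
The need for large-scale data analysis: faced with large volumes of unstructured text, NLP techniques can extract useful information efficiently.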

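II. Environment Setup

1. Install the required libraries

We will use the following Python libraries:

NLTK: a natural language processing toolkit.
SpaCy: an efficient natural language processing library.
TextBlob: a simple, easy-to-use text processing library with built-in sentiment analysis.
tweepy: a library for accessing the Twitter API.
Newspaper3k: a library for extracting articles from news sites.
WordCloud: a library for generating word clouds.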

!pip install nltk
!pip install spacy
!pip install textblob
!pip install tweepy
!pip install newspaper3k
!pip install wordcloud
!python -m spacy download en_core_web_sm

2. Import the libraries

import nltk
from nltk.sentiment import SentimentIntensityAnalyzer
import spacy
from textblob import TextBlob
import tweepy
from newspaper import Article
import pandas as pd
import matplotlib.pyplot as plt
from wordcloud import WordCloud

III. Text Analysis with NLTK

1. Download the NLTK resources

nltk.download('vader_lexicon')
nltk.download('punkt')

2. Create a SentimentIntensityAnalyzer for sentiment analysis

sia = SentimentIntensityAnalyzer()

3. Example: sentiment analysis of a sentence

sentence = "Apple's new product has received rave reviews from critics. Everyone loves the new apple product! Great job, Apple!"
scores = sia.polarity_scores(sentence)
print(scores)

The resulting scores show that this sentence ranges from neutral to positive: the market loves this news about Apple!
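
The dictionary returned by `polarity_scores` has four keys: `neg`, `neu`, and `pos` describe how much of the text falls into each category, and `compound` is a single normalized score between -1 and 1. A minimal sketch of turning that dictionary into a label is shown here. The cut-off of 0.05 is the convention commonly used with VADER; the helper name `label_sentiment` and the sample dictionary are assumptions for illustration, not the actual output for the sentence above.

```python
def label_sentiment(scores, threshold=0.05):
    """Map a VADER-style score dict to 'positive', 'negative', or 'neutral'."""
    compound = scores["compound"]
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Hypothetical score dict with the same keys that polarity_scores returns.
example_scores = {"neg": 0.0, "neu": 0.6, "pos": 0.4, "compound": 0.8}
print(label_sentiment(example_scores))
```

In the lesson's code, the same helper could be called as `label_sentiment(sia.polarity_scores(sentence))`.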

IV. Extracting Market Sentiment from News

1. Fetch a news article with Newspaper3k

We use a CNBC article about Apple as the example.

url = 'https://www.cnbc.com/2024/10/04/apple-is-turning-to-its-army-of-developers-for-an-edge-in-the-ai-race.html?&qsearchterm=apple'
article = Article(url)

# Download and parse the article
article.download()
article.parse()

2. Extract the article text

text = article.text
print(text)

3. Run sentiment analysis on the article

scores = sia.polarity_scores(text)
print(scores)
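
VADER was designed around short, social-media-style text, so scoring a whole article in one call tends to push the compound score toward -1 or 1. One possible alternative, sketched here under stated assumptions, is to score each sentence separately and average the results. The regex-based splitter is deliberately crude, `score_fn` is a hypothetical parameter standing in for the analyzer, and `toy_score` exists only so the sketch runs without NLTK installed.

```python
import re
from statistics import mean


def split_sentences(text):
    """Crude sentence splitter: break after ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def average_sentence_score(text, score_fn):
    """Score each sentence with score_fn and return the mean (0.0 if empty)."""
    sentences = split_sentences(text)
    if not sentences:
        return 0.0
    return mean(score_fn(s) for s in sentences)


# Toy scorer used only for demonstration; it is not a sentiment model.
def toy_score(sentence):
    return 1.0 if "great" in sentence.lower() else 0.0


sample = "Apple had a great quarter. Analysts are watching supply chains."
print(average_sentence_score(sample, toy_score))
```

With the lesson's analyzer, `score_fn` could be `lambda s: sia.polarity_scores(s)['compound']`, and `text` would be the article text.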

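V. Text Processing with SpaCy

Besides NLTK, we can also use SpaCy for NLP.

1. Load the SpaCy model

nlp = spacy.load('en_core_web_sm')

2. Process the text

We use the same Apple article.

doc = nlp(text)

3. Named Entity Recognition (NER)

NER finds the named entities in the text and labels each one with a type. For example, Apple is labeled as an organization (a company).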

for ent in doc.ents:
    print(ent.text, ent.label_)
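
Printing every entity can produce a long list for a full article. A small sketch of summarizing the entities follows: it counts how often each entity of a given label appears. The helper name `top_entities` and the sample pairs are assumptions for illustration; the input is shaped like `[(ent.text, ent.label_) for ent in doc.ents]`.

```python
from collections import Counter


def top_entities(entity_pairs, label="ORG", n=5):
    """Return the n most frequent entity texts that carry the given label."""
    counts = Counter(text for text, ent_label in entity_pairs if ent_label == label)
    return counts.most_common(n)


# Hypothetical pairs, shaped like [(ent.text, ent.label_) for ent in doc.ents].
pairs = [("Apple", "ORG"), ("Tim Cook", "PERSON"), ("Apple", "ORG"), ("CNBC", "ORG")]
print(top_entities(pairs))
```

Filtering on `ORG` is one way to see which companies an article mentions most often.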

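VI. Collecting Social Media Data from Twitter

1. Set up Twitter API credentials

You need to apply for API credentials on the Twitter Developer platform.

consumer_key = 'YOUR_CONSUMER_KEY'
consumer_secret = 'YOUR_CONSUMER_SECRET'
access_token = 'YOUR_ACCESS_TOKEN'
access_token_secret = 'YOUR_ACCESS_TOKEN_SECRET'

2. Authenticate and create the API object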

auth = tweepy.OAuth1UserHandler(consumer_key, consumer_secret, access_token, access_token_secret)
api = tweepy.API(auth)

3. Search for relevant tweets

tweets = api.search_tweets(q='Apple', lang='en', count=100)

4. Run sentiment analysis on the tweets

tweet_texts = [tweet.text for tweet in tweets]
scores = [sia.polarity_scores(text)['compound'] for text in tweet_texts]

5. Visualize the results

plt.hist(scores, bins=20)
plt.title('Sentiment distribution of tweets about Apple')
plt.xlabel('Sentiment score')
plt.ylabel('Number of tweets')
plt.show()

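VII. Visualizing Text Data: Generating a Word Cloud

1. Generate the word cloud

We use the tweet texts collected from Twitter to generate a word cloud that shows the most frequent words at a glance.

# Combine all tweet texts
text_combined = ' '.join(tweet_texts)

# Create the word cloud object
wordcloud = WordCloud(width=800, height=400, background_color='white').generate(text_combined)

# Draw the word cloud
plt.figure(figsize=(15, 7.5))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.title('Word cloud of tweets about Apple', fontsize=20)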
plt.show()

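2. Explanation

Combining the text: all tweet texts are joined into one large string so the word cloud can be generated from it.
WordCloud parameters:
- width and height: set the width and height of the image.
- background_color: sets the background color; white is used here.
Generating and displaying:
- The generate method builds the word cloud.
- Matplotlib draws and displays the image.

3. Interpreting the result

The word cloud shows at a glance which words appear most often in the tweets. This helps us understand the topics the market is focused on and the prevailing sentiment.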
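
A word cloud shows relative frequency visually but not the actual counts. A minimal sketch of computing those counts with the standard library follows. The tiny `STOP_WORDS` set, the helper name `word_frequencies`, and the sample texts are assumptions for illustration.

```python
import re
from collections import Counter

# Small illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "is", "to", "and", "of", "in", "for", "on"}


def word_frequencies(texts, n=10):
    """Count lowercase word tokens across texts, skipping stop words."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(n)


# Hypothetical tweet texts; in the lesson this would be tweet_texts.
sample_texts = ["Apple is launching the new iPhone", "The new iPhone is great"]
print(word_frequencies(sample_texts, n=3))
```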

VIII. Combining News and Social Media Sentiment

1. Build a DataFrame

data = pd.DataFrame({
    'Text': tweet_texts,
    'Sentiment': scores
})

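2. Compute the average sentiment score

average_sentiment = data['Sentiment'].mean()
print(f"Average sentiment score of tweets about Apple: {average_sentiment}")

3. Apply the result to investment decisions

Positive sentiment: a positive average score may indicate bullish sentiment toward the stock.
Negative sentiment: a negative average score may indicate bearish sentiment toward the stock.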

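IX. Case Study: Building a Sentiment-Driven Trading Strategy

1. Define the trading strategy

Strategy rule: buy when the market sentiment indicator rises above a positive threshold, sell when it falls below the corresponding negative threshold, and hold otherwise.

2. Implement the trading strategy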

def trading_strategy(sentiment_score):
    if sentiment_score > 0.05:
        action = 'Buy'
    elif sentiment_score < -0.05:
        action = 'Sell'
    else:
        action = 'Hold'
    return action

3. Test the trading strategy

action = trading_strategy(average_sentiment)
print(f"Based on current market sentiment, the suggested action is: {action}")
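
The strategy above hard-codes a threshold of 0.05. As a sketch of how the threshold could be made adjustable and applied to more than one reading, the variant here takes it as a parameter. The function name `trading_signal` and the daily values are hypothetical and serve only as illustration.

```python
def trading_signal(sentiment_score, threshold=0.05):
    """Return 'Buy', 'Sell', or 'Hold' for one sentiment score."""
    if sentiment_score > threshold:
        return "Buy"
    if sentiment_score < -threshold:
        return "Sell"
    return "Hold"


# Hypothetical daily average sentiment scores, for illustration only.
daily_sentiment = {"Mon": 0.12, "Tue": 0.01, "Wed": -0.20}

for day, score in daily_sentiment.items():
    # Compare the default threshold with a stricter one.
    print(day, trading_signal(score), trading_signal(score, threshold=0.15))
```

A wider threshold produces fewer Buy and Sell signals, which may be preferable when the sentiment estimate is noisy.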

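X. Assignments

Expand the data sources: try collecting data from other news sites or social media platforms (such as Reddit).
Improve the sentiment analysis: use TextBlob or another sentiment analysis tool, compare the results, and discuss the differences.
Multilingual support: process Chinese text and analyze sentiment in the domestic market.
Sentiment versus stock price: combine the sentiment results with historical price data and explore how well the sentiment indicator predicts price movements.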
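
For the last assignment, one simple starting point is a lagged correlation: compare the sentiment on day t with the return on a later day. The sketch here uses only the standard library. The function names are hypothetical, and the data is synthetic: the returns were constructed to follow the previous day's sentiment exactly, so real data will not look like this.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def lagged_correlation(sentiment, returns, lag=1):
    """Correlate sentiment on day t with the return on day t + lag."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    return pearson(sentiment[:len(sentiment) - lag], returns[lag:])


# Synthetic series: each return equals 0.1 times the previous day's sentiment.
sentiment = [0.1, -0.2, 0.3, 0.0, 0.2]
returns = [0.0, 0.01, -0.02, 0.03, 0.0]
print(lagged_correlation(sentiment, returns, lag=1))
```

A correlation on a handful of points says little; a longer history and an out-of-sample check would be needed before drawing conclusions.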

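Tips

API limits: watch each platform's API rate limits to avoid being blocked.
Data cleaning: preprocess the text data, for example by removing URLs, punctuation, and emoji.
Threshold tuning: adjust the sentiment score thresholds to fit the actual data in order to improve the strategy's accuracy.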
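
The data-cleaning tip can be sketched with a few regular expressions. The helper name `clean_tweet` is hypothetical. One assumption to note: this sketch keeps basic sentence punctuation rather than removing it all, because VADER uses cues such as exclamation marks when scoring; it also drops all non-ASCII characters, which suits English tweets only.

```python
import re


def clean_tweet(text):
    """Strip URLs, @mentions, and non-ASCII symbols, then tidy whitespace."""
    text = re.sub(r"https?://\S+", " ", text)          # remove URLs
    text = re.sub(r"@\w+", " ", text)                  # remove @mentions
    text = text.encode("ascii", "ignore").decode()     # drop emoji and other non-ASCII
    text = re.sub(r"[^A-Za-z0-9\s'!?.,]", " ", text)   # drop remaining symbols such as '#'
    return re.sub(r"\s+", " ", text).strip()           # collapse repeated whitespace


raw = "Loving the new iPhone! 🚀 https://t.co/abc @Apple #AAPL"
print(clean_tweet(raw))
```

In the lesson's pipeline, this could be applied as `tweet_texts = [clean_tweet(t) for t in tweet_texts]` before scoring.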

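Notes

Compliance: follow the terms of use of each data source and all applicable laws and regulations.
Ethics: do not use scrapers or similar means to obtain data illegally; make sure the data source is legitimate.
Risk warning: sentiment analysis is only a supporting tool and should not be the sole basis for investment decisions.

After this lesson, you should have a basic understanding of how to use natural language processing techniques to extract market sentiment from text data. We also learned how to generate a word cloud to show the key words in text data visually. We hope these skills help you make progress in financial data analysis.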

Complete Code

Below is the complete code used in this lesson. You can run it directly in Jupyter Notebook or Google Colab.

# Install the required libraries
!pip install nltk
!pip install spacy
!pip install textblob
!pip install tweepy
!pip install newspaper3k
!pip install wordcloud
!python -m spacy download en_core_web_sm

# Import the libraries
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer
import spacy
from textblob import TextBlob
import tweepy
from newspaper import Article
import pandas as pd
import matplotlib.pyplot as plt
from wordcloud import WordCloud

# Download the NLTK resources
nltk.download('vader_lexicon')
nltk.download('punkt')

# Create the NLTK sentiment analyzer
sia = SentimentIntensityAnalyzer()

# Fetch an article from a news site
url = 'https://www.bloomberg.com/news/articles/2021-09-14/apple-unveils-iphone-13-new-ipads-and-watches-at-big-event'
article = Article(url)
article.download()
article.parse()
text = article.text

# Run sentiment analysis on the article
scores = sia.polarity_scores(text)
print("Sentiment analysis result for the news article:")
print(scores)

# Named entity recognition with SpaCy
nlp = spacy.load('en_core_web_sm')
doc = nlp(text)
print("\nNamed entities in the article:")
for ent in doc.ents:
    print(ent.text, ent.label_)

# Set up Twitter API credentials
consumer_key = 'YOUR_CONSUMER_KEY'
consumer_secret = 'YOUR_CONSUMER_SECRET'
access_token = 'YOUR_ACCESS_TOKEN'
access_token_secret = 'YOUR_ACCESS_TOKEN_SECRET'

# Authenticate and create the API object
auth = tweepy.OAuth1UserHandler(consumer_key, consumer_secret, access_token, access_token_secret)
api = tweepy.API(auth)

# Search for tweets
tweets = api.search_tweets(q='Apple', lang='en', count=100)
tweet_texts = [tweet.text for tweet in tweets]

# Run sentiment analysis on the tweets
scores = [sia.polarity_scores(text)['compound'] for text in tweet_texts]

# Visualize the sentiment distribution
plt.hist(scores, bins=20)
plt.title('Sentiment distribution of tweets about Apple')
plt.xlabel('Sentiment score')
plt.ylabel('Number of tweets')
plt.show()

# Generate the word cloud
text_combined = ' '.join(tweet_texts)
wordcloud = WordCloud(width=800, height=400, background_color='white').generate(text_combined)
plt.figure(figsize=(15, 7.5))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.title('Word cloud of tweets about Apple', fontsize=20)
plt.show()

# Build the DataFrame and compute the average sentiment score
data = pd.DataFrame({
    'Text': tweet_texts,
    'Sentiment': scores
})
average_sentiment = data['Sentiment'].mean()
print(f"Average sentiment score of tweets about Apple: {average_sentiment}")

# Define the trading strategy
def trading_strategy(sentiment_score):
    if sentiment_score > 0.05:
        action = 'Buy'
    elif sentiment_score < -0.05:
        action = 'Sell'
    else:
        action = 'Hold'
    return action

# Test the trading strategy
action = trading_strategy(average_sentiment)
print(f"Based on current market sentiment, the suggested action is: {action}")
