iT邦幫忙


Word cloud question


Why does every word in the generated word cloud have a ' symbol after it?

https://drive.google.com/file/d/1WZ3xl37KX-HM71z_LVi1HIuPtuzTVX2h/view?usp=sharing

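# Data preparation
import jieba
import matplotlib.pyplot as plt
import pandas as pd
from wordcloud import WordCloud

comment = pd.read_csv('drink留言內容.csv')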

# Substrings to strip from the comments (commas added between 'bbs','html' and '作者','標題', which were silently concatenated)
removeword = ['span','class','f3','https','imgur','h1','_   blank','href','rel','nofollow','target','cdn','cgi','b4','jpg','hl','b1','f5','f4','goo.gl','f2','email','map','f1','f6','__cf___','data','bbs','html','cf','f0','b2','b3','b5','b6','原文內容','原文連結','作者','標題','時間','看板','<','>',',','。','?','—','閒聊','・','/','=','\"','\n','」','「','!','[',']',':','‧','╦','╔','╗','║','╠','╬','╬',':','╰','╩','╯','╭','╮','│','╪','─','《','》','_','.','、','(',')',' ','*','※','~','○','”','“','~','@','+','\r','▁',')','(','-','═','?',',','!','…','&',';','『','』','#','=',"'"
,'\\l','的','了','也','就','在','以','會','都','XD','不是','覺得','沒','喔','知道','店','可能','說','看到','感覺','應該']

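# Remove each entry as a literal substring (regex=False keeps '.', '(' etc. from being read as patterns)
comment["content"] = comment["content"].astype(str)
for word in removeword:
    comment["content"] = comment["content"].str.replace(word, "", regex=False)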

# Join all comments into one string, then segment it with jieba (accurate mode)
sentence = ''.join(comment['content'])
seg_list = jieba.lcut(sentence, cut_all=False)

def remove_stop_words(file_name,seg_list):
  # Read one stop word per line; a set makes the membership test fast
  with open(file_name,'r',encoding='utf-8') as f:
    stop_words = {line.rstrip() for line in f}

  # Keep only the segments that are not stop words
  return [seg for seg in seg_list if seg not in stop_words]


# Remove stop words, then count word frequencies
file_name = 'stopwords.txt'
seg_list = remove_stop_words(file_name,seg_list)

def count_segment_freq(seg_list):
  # One row per segment, then sum the rows per distinct word
  seg_df = pd.DataFrame(seg_list,columns=['seg'])
  seg_df['count'] = 1
  seg_freq = seg_df.groupby('seg')['count'].sum().sort_values(ascending=False)
  seg_freq = pd.DataFrame(seg_freq)
  return seg_freq
seg_freq = count_segment_freq(seg_list)
seg_freq.head()


font_path = r'msjh.ttc'
wc = WordCloud(background_color='black',font_path=font_path)
wc.generate(str(seg_list))
plt.imshow(wc)
plt.axis("off")
plt.show()

https://ithelp.ithome.com.tw/upload/images/20220609/20139313l84GXNdmhe.jpg
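A likely cause of the trailing ' characters is the call `wc.generate(str(seg_list))`. `str()` on a Python list returns its repr, quotes and brackets included, and `WordCloud.generate` then splits that text with a token pattern that allows apostrophes inside a word (documented default `r"\w[\w']+"`; the exact pattern may vary by version). The sketch below reproduces the effect with only the standard library. The sample words and the helper name `tokens_seen_by_wordcloud` are made up for illustration, and the helper only approximates what the library does.

```python
import re

# Assumption: WordCloud's `regexp` argument is left at its documented default.
DEFAULT_PATTERN = r"\w[\w']+"

def tokens_seen_by_wordcloud(text):
    """Approximate how generate() splits its input text into words."""
    return re.findall(DEFAULT_PATTERN, text)

# Hypothetical sample standing in for the jieba output in the question.
seg_list = ["珍珠", "奶茶", "好喝"]

as_repr = str(seg_list)       # list repr: brackets, commas and quotes are part of the text
as_text = " ".join(seg_list)  # plain words separated by spaces

print(tokens_seen_by_wordcloud(as_repr))  # every token keeps a trailing '
print(tokens_seen_by_wordcloud(as_text))  # tokens without quotes
```

Under these assumptions, changing the call in the question to `wc.generate(' '.join(seg_list))` should produce a cloud without the quote marks.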


2 answers

海綿寶寶
iT邦大神 Level 1 ‧ 2022-06-09 22:22:18

I found an article dated 2022/03/16 via Google; offered here for reference.
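As a related sketch: the question's code already counts word frequencies, so another option is to give the counts to the word cloud directly instead of a text string, which skips re-tokenization altogether. The helper `build_frequencies` and the sample data below are hypothetical. `WordCloud.generate_from_frequencies` is the library method that accepts such a word-to-count dict.

```python
from collections import Counter

def build_frequencies(seg_list, stop_words=()):
    """Count each segmented word, skipping stop words and blank segments."""
    stop_words = set(stop_words)
    counts = Counter(
        seg for seg in seg_list
        if seg.strip() and seg not in stop_words
    )
    return dict(counts)

# Hypothetical sample standing in for the jieba output.
seg_list = ["奶茶", "珍珠", "奶茶", "的", "好喝", "奶茶", "珍珠"]
frequencies = build_frequencies(seg_list, stop_words=["的"])
print(frequencies)

# With the wordcloud package installed, the counts could then be used as:
# wc = WordCloud(background_color='black', font_path='msjh.ttc')
# wc.generate_from_frequencies(frequencies)
```

Because the words never pass through `str()` or the text tokenizer, no quote characters can be attached to them on this path.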


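Actually, I think this has more to do with the rules applied during encoding. I have only just started studying a course this advanced; it covers a lot of ground and is quite difficult.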
