iT邦幫忙

2022 iThome Ironman Contest

AI & Data

Build Your Own Recommender System series, Part 27

Day 27 - Anime recommendations with item2vec using Gensim - Build Your Own Recommender System

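Yesterday I covered what item2vec is; today we implement it. The dataset this time is the anime recommendation dataset.

item2vec fits recommendations of the form: people who watched this title also watched the following.

Results first

What gets recommended to people who watched Fullmetal Alchemist: Brotherhood?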

[('1575', 'Code Geass: Hangyaku no Lelouch'),
 ('6746', 'Durarara!!'),
 ('1535', 'Death Note'),
 ('3588', 'Soul Eater'),
 ('121', 'Fullmetal Alchemist'),
 ('2904', 'Code Geass: Hangyaku no Lelouch R2'),
 ('9253', 'Steins;Gate'),
 ('6547', 'Angel Beats!'),
 ('2001', 'Tengen Toppa Gurren Lagann'),
 ('16498', 'Shingeki no Kyojin')]

They all look like the same kind of show.

What about people who watched Nana?

[('2034', 'Lovely★Complex'),
 ('322', 'Paradise Kiss'),
 ('4722', 'Skip Beat!'),
 ('6045', 'Kimi ni Todoke'),
 ('1698', 'Nodame Cantabile'),
 ('9656', 'Kimi ni Todoke 2nd Season'),
 ('1222', 'Bokura ga Ita'),
 ('3731', 'Itazura na Kiss'),
 ('4477', 'Nodame Cantabile: Paris-hen'),
 ('1562', 'Yamato Nadeshiko Shichihenge♥')]

As expected, these are all shoujo romance anime.

And what about people who like Doraemon?

[('501', 'Doraemon'),
 ('2116', 'Captain Tsubasa'),
 ('1614', 'Captain Tsubasa: Road to 2002'),
 ('516', 'Keroro Gunsou'),
 ('1744', 'Wagamama☆Fairy Mirumo de Pon!'),
 ('1663', 'Haha wo Tazunete Sanzenri'),
 ('3545', 'Kochira Katsushikaku Kameari Kouenmae Hashutsujo (TV)'),
 ('4936', 'Ninja Hattori-kun'),
 ('350', 'Ojamajo Doremi'),
 ('1668', 'Bakuten Shoot Beyblade G Revolution')]

Out comes a batch of children's anime.

Training stage

Load the data

import pandas as pd
import numpy as np
from google.colab import drive
drive.mount('/content/drive')
animes = pd.read_csv("/content/drive/MyDrive/Colab Notebooks/data/dataset/anime recommendation data/anime.csv", engine='python')
ratings = pd.read_csv("/content/drive/MyDrive/Colab Notebooks/data/dataset/anime recommendation data/rating.csv", engine='python')

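For convenience, we drop the rows whose rating is -1 and set the liked column to 1 for ratings of 7 or higher (0 otherwise).

# Keep only rows with a real rating; .copy() lets us add columns without pandas warnings
ratings_train = ratings[ratings['rating'] != -1].copy()
ratings_train['liked'] = np.where(ratings_train['rating'] >= 7, 1, 0)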

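Build anime_groups, a list of lists of anime IDs. Because the grouping key is (liked, user_id), each user contributes one list of the anime they rated and liked, plus a separate list of the anime they rated but did not like.

pd.options.mode.chained_assignment = None  # silence pandas' SettingWithCopyWarning (no effect on speed)
ratings_train['anime_id_str'] = ratings_train['anime_id'].astype('str')
# One list of anime IDs per (liked, user_id) pair
gp_user_like = ratings_train.groupby(['liked','user_id'])
anime_groups = gp_user_like['anime_id_str'].apply(list).tolist()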
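To make the grouping step concrete, here is a minimal pure-Python sketch of the same idea on a few made-up rows. The rows and the helper name `build_sentences` are illustrative assumptions, not part of the dataset or the notebook. It skips the -1 ratings, labels each remaining rating as liked or not, and collects one list of anime IDs per (liked, user_id) pair.

```python
from collections import defaultdict

def build_sentences(rows, like_threshold=7):
    """Group anime IDs into one list per (liked, user_id) pair.

    rows: iterable of (user_id, anime_id, rating) tuples.
    Ratings of -1 are skipped, mirroring the filter above.
    """
    groups = defaultdict(list)
    for user_id, anime_id, rating in rows:
        if rating == -1:
            continue
        liked = 1 if rating >= like_threshold else 0
        groups[(liked, user_id)].append(str(anime_id))
    # Sort by key so the output order is deterministic.
    return [groups[key] for key in sorted(groups)]

# Hypothetical toy ratings: (user_id, anime_id, rating)
toy_rows = [
    (1, 5114, 10), (1, 1575, 9), (1, 501, 4), (1, 20, -1),
    (2, 877, 8), (2, 2034, 7),
]
sentences = build_sentences(toy_rows)
print(sentences)
```

User 1 ends up with two lists (liked and not liked), user 2 with one, and the -1 row disappears. Each list then plays the role of a "sentence" for word2vec.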

Shuffle each list so that the order of items within a list carries no meaning.

import random
for gp in anime_groups:
  random.shuffle(gp)

Install gensim

!pip install gensim

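Train the model

from gensim.models import word2vec

# Train the model (parameter names follow gensim 4.x; gensim 3.x called them `iter` and `size`)
model = word2vec.Word2Vec(
    sentences = anime_groups,
    epochs = 5, # number of passes over the data
    min_count = 10, # ignore anime that appear fewer than 10 times
    vector_size = 100, # embedding dimension
    workers = 12, # number of worker threads
    sg = 1, # use skip-gram
    hs = 0, # no hierarchical softmax
    negative = 5, # negative sampling with 5 noise samples
    window = 999999 # skip-gram context range; set very large so it covers a user's whole list
  )
model.save("/content/drive/MyDrive/Colab Notebooks/data/word2vec_model")  # save the trained model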
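Why such a huge window? In skip-gram, every item inside the window around a target item is used as a context item. With a window larger than any user's list, every pair of items in the same list can become a training pair, which matches the "watched together" idea. A rough sketch of that pairing follows; the helper `skipgram_pairs` is hypothetical and ignores gensim internals such as random window shrinking and negative sampling.

```python
def skipgram_pairs(sentence, window):
    """Return (target, context) pairs for one list of item IDs."""
    pairs = []
    for i, target in enumerate(sentence):
        lo = max(0, i - window)
        hi = min(len(sentence), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, sentence[j]))
    return pairs

# Hypothetical list of anime IDs liked by one user
liked_by_one_user = ['5114', '1575', '1535', '9253']
narrow = skipgram_pairs(liked_by_one_user, window=1)
wide = skipgram_pairs(liked_by_one_user, window=999999)
print(len(narrow), len(wide))
```

With window=1 only neighbouring items are paired, so the first and last items of the list never meet. With the oversized window, all ordered pairs in the list are produced.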

Recommendation stage

import pandas as pd
import numpy as np
from gensim.models import word2vec
from google.colab import drive
drive.mount('/content/drive')
animes = pd.read_csv("/content/drive/MyDrive/Colab Notebooks/data/dataset/anime recommendation data/anime.csv", engine='python')
item2vec_model = word2vec.Word2Vec.load("/content/drive/MyDrive/Colab Notebooks/data/word2vec_model")

Set the index so lookups are easy later.

animes['anime_id_str'] = animes['anime_id'].astype('str')
animes.set_index('anime_id_str', inplace=True)

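The recommendation function: pass in an anime name and it returns the anime to recommend.

def get_recommend(anime_name, animes, item2vec_model, topn=10):
  # Look up the string ID (the index) of the anime with this exact name
  anime_id_str = animes.index[animes['name'] == anime_name].tolist()[0]
  # Nearest neighbours of that ID in the embedding space
  recommendations = item2vec_model.wv.most_similar_cosmul(positive=[anime_id_str], topn=topn)
  # Map each recommended ID back to its name
  similar_animes = [(rec_id, animes.loc[rec_id, 'name']) for (rec_id, _) in recommendations]
  return similar_animes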
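Under the hood, this lookup is a nearest-neighbour search over the learned vectors. As far as I understand, with a single positive item and no negatives, `most_similar_cosmul` ranks candidates in the same order as plain cosine similarity. Here is a minimal pure-Python sketch with made-up 2-dimensional vectors; the vectors and the helper `top_n_similar` are assumptions for illustration, and real vectors would come from the trained model.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def top_n_similar(query_id, vectors, topn=2):
    """Rank every other item by cosine similarity to the query item."""
    scores = [(item_id, cosine(vectors[query_id], vec))
              for item_id, vec in vectors.items() if item_id != query_id]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:topn]

# Hypothetical item vectors
toy_vectors = {
    'A': [1.0, 0.0],
    'B': [0.9, 0.1],
    'C': [0.0, 1.0],
    'D': [-1.0, 0.0],
}
nearest = top_n_similar('A', toy_vectors, topn=2)
print([item_id for item_id, _ in nearest])
```

Item B points in almost the same direction as A, so it ranks first; D points the opposite way and ranks last.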

What else do people who watched Fullmetal Alchemist: Brotherhood watch?

similar_animes = get_recommend('Fullmetal Alchemist: Brotherhood', animes, item2vec_model, topn=10)
similar_animes
[('1575', 'Code Geass: Hangyaku no Lelouch'),
 ('6746', 'Durarara!!'),
 ('1535', 'Death Note'),
 ('3588', 'Soul Eater'),
 ('121', 'Fullmetal Alchemist'),
 ('2904', 'Code Geass: Hangyaku no Lelouch R2'),
 ('9253', 'Steins;Gate'),
 ('6547', 'Angel Beats!'),
 ('2001', 'Tengen Toppa Gurren Lagann'),
 ('16498', 'Shingeki no Kyojin')]

What else do people who watched Nana watch?

similar_animes = get_recommend('Nana', animes, item2vec_model, topn=10)
similar_animes
[('2034', 'Lovely★Complex'),
 ('322', 'Paradise Kiss'),
 ('4722', 'Skip Beat!'),
 ('6045', 'Kimi ni Todoke'),
 ('1698', 'Nodame Cantabile'),
 ('9656', 'Kimi ni Todoke 2nd Season'),
 ('1222', 'Bokura ga Ita'),
 ('3731', 'Itazura na Kiss'),
 ('4477', 'Nodame Cantabile: Paris-hen'),
 ('1562', 'Yamato Nadeshiko Shichihenge♥')]

What else do people who watched Doraemon watch?

similar_animes = get_recommend('Doraemon (1979)', animes, item2vec_model, topn=10)
similar_animes
[('501', 'Doraemon'),
 ('2116', 'Captain Tsubasa'),
 ('1614', 'Captain Tsubasa: Road to 2002'),
 ('516', 'Keroro Gunsou'),
 ('1744', 'Wagamama☆Fairy Mirumo de Pon!'),
 ('1663', 'Haha wo Tazunete Sanzenri'),
 ('3545', 'Kochira Katsushikaku Kameari Kouenmae Hashutsujo (TV)'),
 ('4936', 'Ninja Hattori-kun'),
 ('350', 'Ojamajo Doremi'),
 ('1668', 'Bakuten Shoot Beyblade G Revolution')]
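One caveat: because of min_count = 10, anime that appear in fewer than 10 lists get no vector, and looking them up raises a KeyError. A defensive wrapper could check membership first. The sketch below is an assumption of mine, not part of the original notebook: `safe_recommend` is a hypothetical helper, and `FakeVectors` is a stand-in for `item2vec_model.wv` so the example runs without a trained model (gensim's keyed vectors support the same `in` test).

```python
def safe_recommend(item_id, vectors, topn=10):
    """Return similar item IDs, or an empty list if the item has no vector."""
    if item_id not in vectors:
        return []
    neighbours = vectors.most_similar_cosmul(positive=[item_id], topn=topn)
    return [other_id for other_id, _ in neighbours]

class FakeVectors:
    """Hypothetical stand-in for item2vec_model.wv, backed by a plain dict."""
    def __init__(self, neighbours):
        self._neighbours = neighbours

    def __contains__(self, item_id):
        return item_id in self._neighbours

    def most_similar_cosmul(self, positive, topn=10):
        return self._neighbours[positive[0]][:topn]

# Made-up neighbour lists with made-up scores
fake = FakeVectors({'5114': [('1575', 0.9), ('1535', 0.8), ('121', 0.7)]})
known = safe_recommend('5114', fake, topn=2)
unknown = safe_recommend('99999', fake, topn=2)
print(known, unknown)
```

In the real notebook, one would pass `item2vec_model.wv` instead of the stand-in.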

Summary

If you want to implement ItemCF, you can use gensim to build item2vec: it is simple, quick to set up, and the results look good.

