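Yesterday I covered what item2vec is; today let's implement it! The dataset this time is an anime recommendation dataset.
item2vec is a good fit for recommendations of the form "people who watched this also watched the following":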
[('1575', 'Code Geass: Hangyaku no Lelouch'),
('6746', 'Durarara!!'),
('1535', 'Death Note'),
('3588', 'Soul Eater'),
('121', 'Fullmetal Alchemist'),
('2904', 'Code Geass: Hangyaku no Lelouch R2'),
('9253', 'Steins;Gate'),
('6547', 'Angel Beats!'),
('2001', 'Tengen Toppa Gurren Lagann'),
('16498', 'Shingeki no Kyojin')]
These all look like the same kind of show.
[('2034', 'Lovely★Complex'),
('322', 'Paradise Kiss'),
('4722', 'Skip Beat!'),
('6045', 'Kimi ni Todoke'),
('1698', 'Nodame Cantabile'),
('9656', 'Kimi ni Todoke 2nd Season'),
('1222', 'Bokura ga Ita'),
('3731', 'Itazura na Kiss'),
('4477', 'Nodame Cantabile: Paris-hen'),
('1562', 'Yamato Nadeshiko Shichihenge♥')]
Sure enough, these are all shoujo romance anime.
[('501', 'Doraemon'),
('2116', 'Captain Tsubasa'),
('1614', 'Captain Tsubasa: Road to 2002'),
('516', 'Keroro Gunsou'),
('1744', 'Wagamama☆Fairy Mirumo de Pon!'),
('1663', 'Haha wo Tazunete Sanzenri'),
('3545', 'Kochira Katsushikaku Kameari Kouenmae Hashutsujo (TV)'),
('4936', 'Ninja Hattori-kun'),
('350', 'Ojamajo Doremi'),
('1668', 'Bakuten Shoot Beyblade G Revolution')]
And this one returns a batch of children's anime.
Start by loading the data.
import pandas as pd
import numpy as np
from google.colab import drive
drive.mount('/content/drive')
animes = pd.read_csv("/content/drive/MyDrive/Colab Notebooks/data/dataset/anime recommendation data/anime.csv", engine='python')
ratings = pd.read_csv("/content/drive/MyDrive/Colab Notebooks/data/dataset/anime recommendation data/rating.csv", engine='python')
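For convenience, we drop the rows whose rating is -1 and set the liked column to 1 for ratings of 7 or higher.
ratings_train = ratings[ratings['rating'] != -1].copy()  # drop rating == -1; copy so new columns can be added safely
ratings_train['liked'] = np.where(ratings_train['rating'] >= 7, 1, 0)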
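Build anime_groups. Each entry is a list of anime that one user has rated, grouped by the liked flag, so the anime a user rated and thought were good end up together in one list.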
ratings_train['anime_id_str'] = ratings_train['anime_id'].astype('str')
gp_user_like = ratings_train.groupby(['liked','user_id'])
anime_groups = [gp_user_like.get_group(gp)['anime_id_str'].tolist() for gp in gp_user_like.groups]
pd.options.mode.chained_assignment = None # suppress pandas' SettingWithCopyWarning
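To make the grouping step easier to picture, here is a minimal sketch on a tiny made-up table. The names `toy_ratings` and `toy_groups` and all the values are hypothetical; only the column names follow the dataset. It uses `groupby(...).agg(list)` as a compact alternative to calling `get_group` for every key, and it should give the same lists.

```python
import pandas as pd

# Hypothetical toy data: (user_id, anime_id, rating); -1 is dropped below.
toy_ratings = pd.DataFrame({
    'user_id':  [1, 1, 1, 2, 2, 3],
    'anime_id': [10, 20, 30, 10, 30, 20],
    'rating':   [9, 8, 3, -1, 7, 10],
})

# Drop the -1 rows, then flag ratings of 7 or higher as liked.
toy_ratings = toy_ratings[toy_ratings['rating'] != -1].copy()
toy_ratings['liked'] = (toy_ratings['rating'] >= 7).astype(int)
toy_ratings['anime_id_str'] = toy_ratings['anime_id'].astype('str')

# One list of anime ids per (liked, user_id) pair.
toy_groups = (
    toy_ratings.groupby(['liked', 'user_id'])['anime_id_str']
    .agg(list)
    .tolist()
)
print(toy_groups)
```

Each inner list plays the role of a "sentence" for word2vec, and each anime id plays the role of a "word".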
Shuffle the items inside each group.
import random
for gp in anime_groups:
    random.shuffle(gp)
Install gensim.
!pip install gensim
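Start training.
from gensim.models import word2vec
# Train the model (parameter names follow gensim 4.x; gensim 3.x called them `iter` and `size`)
model = word2vec.Word2Vec(
    sentences=anime_groups,
    epochs=5,         # number of passes over the data
    min_count=10,     # an anime must appear at least 10 times
    vector_size=100,  # embedding dimension
    workers=12,       # number of worker threads
    sg=1,             # use skip-gram
    hs=0,             # no hierarchical softmax
    negative=5,       # negative sampling with 5 noise samples
    window=999999     # skip-gram context range; set very large so it spans the whole list
)
model.save("/content/drive/MyDrive/Colab Notebooks/data/word2vec_model")  # save the trained model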
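Why such a huge window? A user's list has no meaningful order, so every other item in the list should count as context. The sketch below is a simplified illustration with a fixed window, not gensim's actual sampling code (gensim also applies things like random window shrinking and negative sampling). The helper `skipgram_pairs` and the id list are made up for the example.

```python
def skipgram_pairs(items, window):
    """Return (target, context) pairs for one list, using a fixed window."""
    pairs = []
    for i, target in enumerate(items):
        lo = max(0, i - window)
        hi = min(len(items), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # an item is not its own context
                pairs.append((target, items[j]))
    return pairs

# Hypothetical list of anime ids liked by one user.
liked = ['121', '1535', '1575', '9253']

narrow = skipgram_pairs(liked, window=1)       # only direct neighbours
wide = skipgram_pairs(liked, window=999999)    # every other item in the list

print(len(narrow))  # 6
print(len(wide))    # 12
```

With the small window, which pairs get trained depends on the arbitrary order of the list; with the large window, every pair of items in the same list is covered.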
import pandas as pd
import numpy as np
from gensim.models import word2vec
from google.colab import drive
drive.mount('/content/drive')
animes = pd.read_csv("/content/drive/MyDrive/Colab Notebooks/data/dataset/anime recommendation data/anime.csv", engine='python')
item2vec_model = word2vec.Word2Vec.load("/content/drive/MyDrive/Colab Notebooks/data/word2vec_model")
Set the index so lookups are easier later.
animes['anime_id_str'] = animes['anime_id'].astype('str')
animes.set_index('anime_id_str', inplace=True)
The recommendation function: pass in an anime name and it returns the anime to recommend.
def get_recommend(anime_name, animes, item2vec_model, topn=10):
    # Look up the string id of the anime by its name
    anime_id_str = animes.index[animes['name'] == anime_name].tolist()[0]
    # Find the most similar items in the embedding space
    recommendations = item2vec_model.wv.most_similar_cosmul(positive=[anime_id_str], topn=topn)
    # Map each recommended id back to its name
    similar_animes = [(rec_id, animes.loc[rec_id]['name']) for (rec_id, _) in recommendations]
    return similar_animes
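Conceptually, the lookup is a nearest-neighbour search over the item vectors. Here is a minimal pure-Python sketch of that idea using plain cosine similarity. The vectors and the names `toy_vectors`, `cosine`, and `top_n_similar` are invented for illustration; this is not gensim's implementation. As far as I understand, with a single positive item and no negative items, the cosmul score ranks items in the same order as cosine similarity.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def top_n_similar(query_id, vectors, topn=10):
    """Rank every other item by cosine similarity to the query item."""
    query = vectors[query_id]
    scored = [
        (other_id, cosine(query, vec))
        for other_id, vec in vectors.items()
        if other_id != query_id
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]

# Hypothetical 2-dimensional item vectors.
toy_vectors = {
    'A': [1.0, 0.0],
    'B': [0.9, 0.1],
    'C': [0.0, 1.0],
    'D': [-1.0, 0.0],
}

ranked = top_n_similar('A', toy_vectors, topn=3)
print([item_id for item_id, _ in ranked])  # ['B', 'C', 'D']
```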
What else do people who watch Fullmetal Alchemist watch?
similar_animes = get_recommend('Fullmetal Alchemist: Brotherhood', animes, item2vec_model, topn=10)
similar_animes
[('1575', 'Code Geass: Hangyaku no Lelouch'),
('6746', 'Durarara!!'),
('1535', 'Death Note'),
('3588', 'Soul Eater'),
('121', 'Fullmetal Alchemist'),
('2904', 'Code Geass: Hangyaku no Lelouch R2'),
('9253', 'Steins;Gate'),
('6547', 'Angel Beats!'),
('2001', 'Tengen Toppa Gurren Lagann'),
('16498', 'Shingeki no Kyojin')]
What else do people who watch Nana watch?
similar_animes = get_recommend('Nana', animes, item2vec_model, topn=10)
similar_animes
[('2034', 'Lovely★Complex'),
('322', 'Paradise Kiss'),
('4722', 'Skip Beat!'),
('6045', 'Kimi ni Todoke'),
('1698', 'Nodame Cantabile'),
('9656', 'Kimi ni Todoke 2nd Season'),
('1222', 'Bokura ga Ita'),
('3731', 'Itazura na Kiss'),
('4477', 'Nodame Cantabile: Paris-hen'),
('1562', 'Yamato Nadeshiko Shichihenge♥')]
What else do people who watch Doraemon watch?
similar_animes = get_recommend('Doraemon (1979)', animes, item2vec_model, topn=10)
similar_animes
[('501', 'Doraemon'),
('2116', 'Captain Tsubasa'),
('1614', 'Captain Tsubasa: Road to 2002'),
('516', 'Keroro Gunsou'),
('1744', 'Wagamama☆Fairy Mirumo de Pon!'),
('1663', 'Haha wo Tazunete Sanzenri'),
('3545', 'Kochira Katsushikaku Kameari Kouenmae Hashutsujo (TV)'),
('4936', 'Ninja Hattori-kun'),
('350', 'Ojamajo Doremi'),
('1668', 'Bakuten Shoot Beyblade G Revolution')]
If you want to implement ItemCF, building item2vec with gensim is quick and simple, and the results are pretty good.