iT邦幫忙

Python Algorithms Learning Log, Day 4

Unsupervised learning: works on unlabeled data

1. Clustering:

https://ithelp.ithome.com.tw/upload/images/20210621/20138527Xbu6ZOJKEY.png

https://en.proft.me/2015/12/24/types-machine-learning-algorithms/

Example 1 below: CLV (customer lifetime value), clustered with K-Means:
https://ithelp.ithome.com.tw/upload/images/20210621/20138527QNJfwXIom0.png

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

ds = pd.read_csv('CLV.csv')
print(ds.describe().T)

https://ithelp.ithome.com.tw/upload/images/20210621/20138527Wibajolmlm.png

X = ds.iloc[:, [0, 1]].values   # features only; there is no y (no labels)

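# 1. Manual selection: try k = 1 to 10 and compute the within-cluster sum of squares (WCSS).
#    WCSS keeps falling as k grows, so the elbow method picks the k where the curve flattens, not the minimum.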
from sklearn.cluster import KMeans
wcss = []
for i in range(1,11):      
    km=KMeans(n_clusters=i, init='k-means++', max_iter=300, n_init=10, random_state=0)
    km.fit(X)
    wcss.append(km.inertia_)
plt.plot(range(1,11),wcss)
plt.title('Elbow Method')
plt.xlabel('Number of clusters')
plt.ylabel('wcss')
plt.show()

Result:
https://ithelp.ithome.com.tw/upload/images/20210621/20138527VU4GTs5LQO.png
Conclusion: the candidate choices are 2, 4, or 10 clusters.
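
The elbow reading can also be checked numerically by looking at how much WCSS drops with each extra cluster. The sketch below is only an illustration: it uses synthetic data from make_blobs instead of CLV.csv, and the helper name wcss_curve is made up for this example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for the CLV data (assumption: 4 blobs, not the real dataset)
X_demo, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.6, random_state=0)

def wcss_curve(X, k_values):
    """Return the K-Means inertia (WCSS) for each k in k_values."""
    return [
        KMeans(n_clusters=k, init='k-means++', n_init=10, random_state=0).fit(X).inertia_
        for k in k_values
    ]

k_values = list(range(1, 11))
wcss_demo = wcss_curve(X_demo, k_values)

# Reduction in WCSS gained by each extra cluster; the elbow is where it becomes small
drops = -np.diff(wcss_demo)
for k, d in zip(k_values[1:], drops):
    print(f"k={k}: WCSS drops by {d:.1f}")
```

A large drop followed by a run of small ones marks the elbow; when no such break is visible, the silhouette coefficient is a useful second opinion.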

# 2. Automatic selection: compute the silhouette coefficient for each k
from sklearn.metrics import silhouette_score
from sklearn.cluster import KMeans

for n_cluster in range(2, 11):
    kmeans = KMeans(n_clusters=n_cluster).fit(X)
    label = kmeans.labels_
    sil_coeff = silhouette_score(X, label, metric='euclidean')
    print(f"n_clusters={n_cluster}, Silhouette Coefficient is {sil_coeff:.4}")

Result:
n_clusters=2, Silhouette Coefficient is 0.4401
n_clusters=3, Silhouette Coefficient is 0.3596
n_clusters=4, Silhouette Coefficient is 0.3721
n_clusters=5, Silhouette Coefficient is 0.3617
n_clusters=6, Silhouette Coefficient is 0.3632
n_clusters=7, Silhouette Coefficient is 0.3629
n_clusters=8, Silhouette Coefficient is 0.3538
n_clusters=9, Silhouette Coefficient is 0.3441
n_clusters=10, Silhouette Coefficient is 0.3477
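Conclusion: 2 clusters score highest (0.4401), with 4 clusters next (0.3721). A higher silhouette coefficient is better.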
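
The silhouette coefficient ranges from -1 to 1 and higher is better, so the choice can be read off programmatically. A small sketch using the scores printed above (the variable names are only illustrative):

```python
# Silhouette scores copied from the printed results above
sil_scores = {2: 0.4401, 3: 0.3596, 4: 0.3721, 5: 0.3617, 6: 0.3632,
              7: 0.3629, 8: 0.3538, 9: 0.3441, 10: 0.3477}

# Pick the k with the largest score, and rank all k from best to worst
best_k = max(sil_scores, key=sil_scores.get)
ranked = sorted(sil_scores, key=sil_scores.get, reverse=True)

print(f"best k = {best_k}, score = {sil_scores[best_k]}")
print("top 3:", ranked[:3])
```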

# Visualize the clustering result
# Fit K-Means to the dataset (k=8 here)
km8 = KMeans(n_clusters=8, init='k-means++', max_iter=300, n_init=10, random_state=0)
y_means = km8.fit_predict(X)

# Visualising the clusters for k=8: one scatter call per cluster label
colors = ['purple', 'blue', 'green', 'cyan', 'yellow', 'black', 'brown', 'red']
for i, color in enumerate(colors):
    plt.scatter(X[y_means == i, 0], X[y_means == i, 1], s=50, c=color, label=f'Cluster{i + 1}')

# Cluster centers drawn as large squares
plt.scatter(km8.cluster_centers_[:, 0], km8.cluster_centers_[:, 1], s=200, marker='s', c='red', alpha=0.7, label='Centroids')
plt.title('Customer segments')
plt.xlabel('Annual income of customer')
plt.ylabel('Annual spend from customer on site')
plt.legend()
plt.show()

https://ithelp.ithome.com.tw/upload/images/20210621/20138527tIrBcBAA6o.png

Note: customer analysis commonly uses RFM (Recency, Frequency, Monetary) analysis.
This belongs to the third step of machine learning: feature engineering.
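
As a rough illustration of what RFM feature engineering looks like, the sketch below derives the three features from a transaction log with pandas. The table, its column names (customer_id, order_date, amount), and the snapshot date are all hypothetical and are not taken from CLV.csv.

```python
import pandas as pd

# Hypothetical transaction log; column names are assumptions for illustration
tx = pd.DataFrame({
    'customer_id': [1, 1, 2, 2, 2, 3],
    'order_date': pd.to_datetime(['2021-01-05', '2021-06-01', '2021-03-10',
                                  '2021-04-15', '2021-06-10', '2020-12-20']),
    'amount': [120.0, 80.0, 40.0, 60.0, 100.0, 300.0],
})
snapshot = pd.Timestamp('2021-06-21')  # assumed reference date for recency

rfm = tx.groupby('customer_id').agg(
    recency=('order_date', lambda d: (snapshot - d.max()).days),  # days since last order
    frequency=('order_date', 'count'),                            # number of orders
    monetary=('amount', 'sum'),                                   # total spend
)
print(rfm)
```

The resulting recency, frequency, and monetary columns could then be passed to KMeans in place of the INCOME and SPEND columns, usually after scaling.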

Visualization:

# 1. Plot styling
import seaborn as sns; sns.set()  # for plot styling
plt.rcParams['figure.figsize'] = (16, 9)
plt.style.use('ggplot')

# 2. Visualising the data
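# sns.distplot is deprecated since seaborn 0.11; histplot with kde=True gives a comparable plot
plot_income = sns.histplot(ds["INCOME"], kde=True, stat="density")
plot_spend = sns.histplot(ds["SPEND"], kde=True, stat="density")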
plt.xlabel('Income / spend')
plt.show()

https://ithelp.ithome.com.tw/upload/images/20210621/20138527F6R08YGruo.png

f, axes = plt.subplots(1,2, figsize=(12,6), sharex=True, sharey=True)
v1 = sns.violinplot(data=ds, x='INCOME', color="skyblue",ax=axes[0])
v2 = sns.violinplot(data=ds, x='SPEND',color="lightgreen", ax=axes[1])
v1.set(xlim=(0,420))
plt.show()

https://ithelp.ithome.com.tw/upload/images/20210621/20138527IAWTesOgbL.png

# 3. Plotting the values to understand the spread
Income = ds['INCOME'].values
Spend = ds['SPEND'].values
X = np.array(list(zip(Income, Spend)))
plt.scatter(Income, Spend, c='black', s=100)
plt.show()

https://ithelp.ithome.com.tw/upload/images/20210621/20138527S1rUcWkvAM.png

# 4. plot in 3D space
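from mpl_toolkits.mplot3d import Axes3D  # only needed to register the 3D projection on older Matplotlib

fig = plt.figure()
# Axes3D(fig) no longer attaches itself to the figure in recent Matplotlib, so create the axes via add_subplot
ax = fig.add_subplot(projection='3d')
ax.scatter(X[:, 0], X[:, 1])  # only two features, so every point is drawn at z = 0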
plt.show()

https://ithelp.ithome.com.tw/upload/images/20210621/20138527Dgrr8P48Ch.png

2. Dimensionality reduction: e.g. stock price prediction

3. Reinforcement learning: e.g. AlphaGo
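
For item 2, one common dimensionality reduction technique is PCA. The sketch below is only an illustration on synthetic data (an assumption, not stock prices): five correlated features generated from two hidden factors are compressed back down to two components.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Synthetic data (assumption): 5 correlated features driven by 2 hidden factors plus small noise
factors = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
features = factors @ mixing + 0.05 * rng.normal(size=(200, 5))

# Project the 5 features onto the 2 directions of largest variance
pca = PCA(n_components=2)
reduced = pca.fit_transform(features)

print(reduced.shape)
print(pca.explained_variance_ratio_.sum())
```

The explained variance ratio shows how much of the original spread the reduced features keep; the reduced array can then be fed to a downstream model.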

