Day 27 sklearn

11th iThome Ironman Contest

AI & Data

Predicting Inter Bus Arrival Times, part 27 of the series

阿瑜

2019-10-12 00:04:02

Scikit-learn (formerly scikits.learn) is a free software machine learning library for the Python programming language. It features various classification, regression and clustering algorithms including support vector machines, random forests, gradient boosting, k-means and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy.

Generalized Linear Models

Sample Code: [taken (and adapted) from the link above]

>>> from sklearn import linear_model     # import the library
>>> reg = linear_model.LinearRegression() # create the linear model
>>> reg.fit([[0, 0], [1, 1], [2, 2]], [0, 1, 2]) # train on the data: X, y
...                                       
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None,
                 normalize=False)
>>> reg.coef_ # inspect the coefficients (weights)
array([0.5, 0.5])
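
The sample above stops at `fit` and `coef_`. As a minimal sketch of the remaining steps, the same toy model can be used to predict new points and to check the bias and R^2. The inputs in `new_points` are hypothetical and not from the original sample.

```python
from sklearn import linear_model

# Same toy data as the sample above: two identical features, y equals either one.
X = [[0, 0], [1, 1], [2, 2]]
y = [0, 1, 2]

reg = linear_model.LinearRegression()
reg.fit(X, y)

# predict() expects a 2-D array-like: one row per sample.
new_points = [[3, 3], [4, 4]]   # hypothetical inputs, not from the original post
predictions = reg.predict(new_points)

print('weights:', reg.coef_)
print('bias:', reg.intercept_)
print('predictions:', predictions)
print('R^2 on training data:', reg.score(X, y))
```

Because the two features are identical, the weight is split evenly between them, and the predictions follow the same line as the training data.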

MyCode - dynamic model / final stage

import pandas as pd
from sklearn import linear_model

from sklearn import preprocessing                # scaling option 1: standardization
from sklearn.preprocessing import MinMaxScaler   # scaling option 2: min-max
scaler = MinMaxScaler()                          # scaling option 2: min-max

import time
today = time.strftime("%Y-%m-%d",time.localtime())
filename = '/home/turningpoint1125/'+today+'.csv'
df = pd.read_csv(filename)
print('Number of rows:',len(df))


#df_normalize = preprocessing.scale(df.drop(['Time','GPSTime'],axis='columns'))  # scaling option 1
#df_normalize = scaler.fit_transform(df.drop(['Time','GPSTime'],axis='columns')) # scaling option 2
#print(df_normalize)

reg = linear_model.LinearRegression()
reg.fit(df.drop(['Time','GPSTime'],axis='columns'),df.Time)
#reg.fit(df_normalize,df.Time)  # scaling options 1 and 2
#print('R^2:',reg.score(df_normalize,df.Time))

print('R^2:',reg.score(df.drop(['Time','GPSTime'],axis='columns'),df.Time))
print('weight:',reg.coef_ )
print('bias',reg.intercept_ )

import csv
with open('/home/turningpoint1125/daily_log.csv','a',encoding='utf8',newline='') as fd :
    writer = csv.writer(fd)
    # Log the four weights, the bias and R^2 as plain floats (index instead of slice to get scalars).
    writer.writerow([float(reg.coef_[0]),float(reg.coef_[1]),float(reg.coef_[2]),float(reg.coef_[3]),float(reg.intercept_),float(reg.score(df.drop(['Time','GPSTime'],axis='columns'),df.Time))])
    
import smtplib
from email.mime.text import MIMEText
gmail_user = 'turningpoint1125@gmail.com'
gmail_password = 'XXX' # your gmail password

LatW =  str(reg.coef_[0:1])
LonW =  str(reg.coef_[1:2])
DisW =  str(reg.coef_[2:3])
SpeW =  str(reg.coef_[3:4])
context = 'Lat: '+ LatW + '\n' + 'Lon: '+LonW+'\n'+'Dis: '+DisW+'\n'+'Speed: '+SpeW+'\n'+'Bias: '+str(reg.intercept_)+'\n'+'R^2: '+str(reg.score(df.drop(['Time','GPSTime'],axis='columns'),df.Time))+'\n'
 
msg = MIMEText(context)
msg['Subject'] = 'Good Night!'
msg['From'] = 'turningpoint1125@gmail.com'
msg['To'] = 'turningpoint1125@gmail.com'

server = smtplib.SMTP_SSL('smtp.gmail.com', 465)
server.ehlo()
server.login(gmail_user, gmail_password)
server.send_message(msg)
server.quit()

print('Email sent!')
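
The script above leaves both scaling options commented out. The sketch below may help show what they would change. It runs on synthetic data, because the real daily CSV is not shown. The column names `Lat`, `Lon`, `Dis` and `Speed` are assumptions borrowed from the email labels, and the values are made up. For ordinary least squares with an intercept, scaling the features should change the weights but leave R^2 the same.

```python
import numpy as np
import pandas as pd
from sklearn import linear_model, preprocessing
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the daily CSV; column names and values are hypothetical.
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    'Lat': 25.0 + 0.05 * rng.random(n),
    'Lon': 121.5 + 0.05 * rng.random(n),
    'Dis': 5000.0 * rng.random(n),
    'Speed': 60.0 * rng.random(n),
})
y = 0.1 * X['Dis'] - 2.0 * X['Speed'] + rng.normal(0.0, 10.0, n)

def fit_and_score(features):
    """Fit ordinary least squares and return (model, R^2 on the same data)."""
    model = linear_model.LinearRegression()
    model.fit(features, y)
    return model, model.score(features, y)

raw_model, raw_r2 = fit_and_score(X)
std_model, std_r2 = fit_and_score(preprocessing.scale(X))          # option 1: standardization
mm_model, mm_r2 = fit_and_score(MinMaxScaler().fit_transform(X))   # option 2: min-max

print('R^2 raw / standardized / min-max:', raw_r2, std_r2, mm_r2)
print('raw weights:         ', raw_model.coef_)
print('standardized weights:', std_model.coef_)
print('min-max weights:     ', mm_model.coef_)
```

If the goal is to compare how much each feature matters, the scaled weights are easier to read side by side. If the goal is only the fit quality, the unscaled model used in the script gives the same R^2.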

Training: model.fit
Prediction: model.predict
Parameters: