Today we're playing with "Reusing a pipeline component" from AI Hub. Since I'm really not familiar with Python, this took me quite a while.
For this we need a TensorFlow Docker container running Jupyter:
docker pull tensorflow/tensorflow:latest-py3-jupyter
docker run -it --rm -v $(realpath ~/notebooks):/tf/notebooks -p 8888:8888 --name jupyter tensorflow/tensorflow:latest-py3-jupyter
First, we go to AI Hub and find the Kubeflow pipeline component we want to use; the documentation recommends the Scikit-learn Trainer.
Here we simply download component.yaml, since I haven't yet figured out the right way to use the Copy link and Download options.
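If you open the downloaded file, it is an ordinary Kubeflow Pipelines component specification. The sketch below only shows the general shape of such a spec; the names, image, and arguments are placeholders for illustration, not the actual contents of the Scikit-learn Trainer's component.yaml:
name: Example trainer          # placeholder, not the real component name
description: Trains a model
inputs:
- {name: Training data path, type: String}
- {name: Output dir, type: String}
outputs:
- {name: Model path, type: String}
implementation:
  container:
    image: gcr.io/example/trainer:latest   # placeholder image
    command: [python, /trainer/task.py]    # placeholder entrypoint
    args:
    - --training-data
    - {inputValue: Training data path}
    - --output-dir
    - {inputValue: Output dir}
    - --model-path
    - {outputPath: Model path}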
Next, we follow the Kubeflow Pipelines tutorial and install the SDK step by step.
Here we need
docker exec -it jupyter /bin/bash
to get a shell inside the container, then:
apt-get update; apt-get install -y wget bzip2
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
# because we are running inside Docker
export PATH=/root/miniconda3/bin:$PATH
conda create --name mlpipeline python=3.7
conda activate mlpipeline
pip install https://storage.googleapis.com/ml-pipeline/release/latest/kfp.tar.gz --upgrade
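As a quick sanity check (not part of the official steps), you can confirm the SDK is importable and print its version:
python -c "import kfp; print(kfp.__version__)"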
Back to the documentation, it's time to build pipeline.zip:
from kfp import compiler
import kfp.dsl as dsl
import kfp.components as comp

scikit_learn_train = comp.load_component_from_file('component.yaml')

@dsl.pipeline(
    name='Scikit-learn Trainer',
    description='Trains a Scikit-learn model')
# Use a function to define the pipeline.
def scikit_learn_trainer(
        training_data_path='gs://cloud-samples-data/ml-engine/iris/classification/train.csv',
        test_data_path='gs://cloud-samples-data/ml-engine/iris/classification/evaluate.csv',
        output_dir='/tmp',
        estimator_name='GradientBoostingClassifier',
        hyperparameters='n_estimators 100 max_depth 4'):
    # Use the component you loaded in the previous step to create a pipeline task.
    sklearn_op = scikit_learn_train(training_data_path, test_data_path, output_dir,
                                    estimator_name, hyperparameters)

compiler.Compiler().compile(scikit_learn_trainer, './pipeline.zip')
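To actually run the snippet above, I save it as a file, say pipeline.py (the filename is just my own choice), and execute it inside the mlpipeline environment:
python pipeline.py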
After that, you can see the generated pipeline.zip.
As you can see, we took a pipeline component someone else provided and used it to build our own pipeline.zip, which we can then upload and test through Kubeflow Pipelines. That saves a lot of steps, and this is exactly what plug-and-play AI components means.
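For reference, the upload can also be done from the SDK instead of the web UI. This is only a minimal sketch, assuming your Kubeflow Pipelines API endpoint is reachable at the placeholder host below:
import kfp

# Placeholder endpoint; replace with your own Kubeflow Pipelines host.
client = kfp.Client(host='http://<your-kfp-host>/pipeline')

# Register the compiled package under a pipeline name of your choice.
client.upload_pipeline(pipeline_package_path='pipeline.zip',
                       pipeline_name='scikit-learn-trainer')

# Or start a run directly from the package, using the default parameter
# values defined in the pipeline function.
client.create_run_from_pipeline_package('pipeline.zip', arguments={},
                                        experiment_name='aihub-test')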
OK, that's it for today's post. Thanks for reading.