In the ML/MLOps world we often need to deploy a trained model to a production environment so that other people can use it through an API. Ray Serve is a framework that lets us deploy ML models quickly.
As the previous post showed, running code with ray.remote lets us execute it across multiple nodes and gain a significant speedup. Ray Serve's goal is to bring that same distributed computing capability to ML model deployment.
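As a quick reminder of that idea, here is a minimal sketch of ray.remote (the square function is invented purely for illustration, it is not part of this post):

import ray

ray.init()  # connect to (or start) a local Ray cluster

@ray.remote
def square(x: int) -> int:
    # An ordinary function becomes a remote task that Ray can
    # schedule on any available node in the cluster.
    return x * x

# .remote() returns object refs immediately; ray.get() collects the results.
futures = [square.remote(i) for i in range(4)]
print(ray.get(futures))  # [0, 1, 4, 9]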
First, install Ray Serve:
pip install "ray[serve]"
With that in place, a minimal HTTP server looks like this:
import requests
from starlette.requests import Request
from typing import Dict

from ray import serve


# 1: Define a Ray Serve application.
@serve.deployment(route_prefix="/")
class MyModelDeployment:
    def __init__(self, msg: str):
        # Initialize model state: could be very large neural net weights.
        self._msg = msg

    def __call__(self, request: Request) -> Dict:
        return {"result": self._msg}


app = MyModelDeployment.bind(msg="Hello world!")

# 2: Deploy the application locally.
serve.run(app)

# 3: Query the application and print the result.
print(requests.get("http://localhost:8000/").json())
# {'result': 'Hello world!'}
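Because every deployment is backed by Ray actors, scaling it out is just a deployment option. A minimal sketch, reusing the MyModelDeployment class defined above (the replica and CPU numbers are illustrative assumptions, not from the original example):

from ray import serve

# Ask Ray Serve for two replicas of the deployment, each reserving one CPU.
scaled_app = MyModelDeployment.options(
    num_replicas=2,
    ray_actor_options={"num_cpus": 1},
).bind(msg="Hello world!")

# Requests to http://localhost:8000/ are load-balanced across the replicas.
serve.run(scaled_app)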
FastAPI is a framework for building API servers quickly, and Ray Serve integrates with it directly, so we can define the HTTP routes with FastAPI while Ray Serve handles deployment.
import requests
from fastapi import FastAPI
from ray import serve

# 1: Define a FastAPI app and wrap it in a deployment with a route handler.
app = FastAPI()


@serve.deployment(route_prefix="/")
@serve.ingress(app)
class FastAPIDeployment:
    # FastAPI will automatically parse the HTTP request for us.
    @app.get("/hello")
    def say_hello(self, name: str) -> str:
        return f"Hello {name}!"


# 2: Deploy the deployment.
serve.run(FastAPIDeployment.bind())

# 3: Query the deployment and print the result.
print(requests.get("http://localhost:8000/hello", params={"name": "Theodore"}).json())
# "Hello Theodore!"
Ray Serve also supports model composition, where several deployments call each other through DeploymentHandles. The Ingress deployment below fans a request out to two Adder models and averages their results with a Combiner:

import requests
import starlette
from typing import Dict

from ray import serve
from ray.serve.handle import DeploymentHandle


# 1: Define the models in our composition graph and an ingress that calls them.
@serve.deployment
class Adder:
    def __init__(self, increment: int):
        self.increment = increment

    def add(self, inp: int):
        return self.increment + inp


@serve.deployment
class Combiner:
    def average(self, *inputs) -> float:
        return sum(inputs) / len(inputs)


@serve.deployment
class Ingress:
    def __init__(self, adder1, adder2, combiner):
        self._adder1: DeploymentHandle = adder1.options(use_new_handle_api=True)
        self._adder2: DeploymentHandle = adder2.options(use_new_handle_api=True)
        self._combiner: DeploymentHandle = combiner.options(use_new_handle_api=True)

    async def __call__(self, request: starlette.requests.Request) -> Dict[str, float]:
        input_json = await request.json()

        # Each .remote() call goes to a separate deployment; the Combiner
        # receives both Adder results and computes the average.
        final_result = await self._combiner.average.remote(
            self._adder1.add.remote(input_json["val"]),
            self._adder2.add.remote(input_json["val"]),
        )
        return {"result": final_result}


# 2: Build the application consisting of the models and ingress.
app = Ingress.bind(Adder.bind(increment=1), Adder.bind(increment=2), Combiner.bind())
serve.run(app)

# 3: Query the application and print the result.
print(requests.post("http://localhost:8000/", json={"val": 100.0}).json())
# {"result": 101.5}
These patterns can be combined: for example, use FastAPI as the API server while Ray Serve deploys the ML models behind it, and serve several different models side by side. Pretty convenient, isn't it?
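As a rough illustration of that combination, a FastAPI ingress can hold handles to several model deployments and route each endpoint to a different one. This is only a hedged sketch: the SentimentModel and SummaryModel classes, their routes, and their placeholder logic are invented here, not part of the original post.

from fastapi import FastAPI
from ray import serve
from ray.serve.handle import DeploymentHandle

api = FastAPI()


@serve.deployment
class SentimentModel:
    def predict(self, text: str) -> str:
        # Placeholder for real model inference.
        return "positive" if "good" in text else "negative"


@serve.deployment
class SummaryModel:
    def predict(self, text: str) -> str:
        # Placeholder for real model inference.
        return text[:20]


@serve.deployment
@serve.ingress(api)
class ApiIngress:
    def __init__(self, sentiment, summary):
        self._sentiment: DeploymentHandle = sentiment.options(use_new_handle_api=True)
        self._summary: DeploymentHandle = summary.options(use_new_handle_api=True)

    @api.get("/sentiment")
    async def sentiment(self, text: str) -> dict:
        return {"label": await self._sentiment.predict.remote(text)}

    @api.get("/summary")
    async def summary(self, text: str) -> dict:
        return {"summary": await self._summary.predict.remote(text)}


combined_app = ApiIngress.bind(SentimentModel.bind(), SummaryModel.bind())
# serve.run(combined_app)
# GET /sentiment?text=... and GET /summary?text=... now hit different models.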