In the ML/MLOps world we often need to deploy a trained model to a production environment so that other people can use it through an API. Ray Serve is a framework that lets us deploy ML models quickly.
As the previous post showed, running code with ray.remote lets us execute it across multiple nodes and gain a significant speedup. Ray Serve's goal is to bring that same distributed computing capability to ML model deployment.
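As a quick reminder of that idea, here is a minimal sketch of ray.remote (the square function is invented purely for illustration, it is not part of this post):

import ray

ray.init()  # connect to (or start) a local Ray cluster

@ray.remote
def square(x: int) -> int:
    # An ordinary function becomes a remote task that Ray can
    # schedule on any available node in the cluster.
    return x * x

# .remote() returns object refs immediately; ray.get() collects the results.
futures = [square.remote(i) for i in range(4)]
print(ray.get(futures))  # [0, 1, 4, 9]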
First, install Ray Serve:
pip install "ray[serve]"
With that in place, a minimal HTTP server looks like this:
import requests
from starlette.requests import Request
from typing import Dict

from ray import serve


# 1: Define a Ray Serve application.
@serve.deployment(route_prefix="/")
class MyModelDeployment:
    def __init__(self, msg: str):
        # Initialize model state: could be very large neural net weights.
        self._msg = msg

    def __call__(self, request: Request) -> Dict:
        return {"result": self._msg}


app = MyModelDeployment.bind(msg="Hello world!")

# 2: Deploy the application locally.
serve.run(app)

# 3: Query the application and print the result.
print(requests.get("http://localhost:8000/").json())
# {'result': 'Hello world!'}
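Because every deployment is backed by Ray actors, scaling it out is just a deployment option. A minimal sketch, reusing the MyModelDeployment class defined above (the replica and CPU numbers are illustrative assumptions, not from the original example):

from ray import serve

# Ask Ray Serve for two replicas of the deployment, each reserving one CPU.
scaled_app = MyModelDeployment.options(
    num_replicas=2,
    ray_actor_options={"num_cpus": 1},
).bind(msg="Hello world!")

# Requests to http://localhost:8000/ are load-balanced across the replicas.
serve.run(scaled_app)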
FastAPI is a framework for building API servers quickly, and Ray Serve integrates with it directly, so we can define the HTTP routes with FastAPI while Ray Serve handles deployment.
import requests
from fastapi import FastAPI
from ray import serve

# 1: Define a FastAPI app and wrap it in a deployment with a route handler.
app = FastAPI()


@serve.deployment(route_prefix="/")
@serve.ingress(app)
class FastAPIDeployment:
    # FastAPI will automatically parse the HTTP request for us.
    @app.get("/hello")
    def say_hello(self, name: str) -> str:
        return f"Hello {name}!"


# 2: Deploy the deployment.
serve.run(FastAPIDeployment.bind())

# 3: Query the deployment and print the result.
print(requests.get("http://localhost:8000/hello", params={"name": "Theodore"}).json())
# "Hello Theodore!"
Ray Serve also supports model composition, where several deployments call each other through DeploymentHandles. The Ingress deployment below fans a request out to two Adder models and averages their results with a Combiner:

import requests
import starlette
from typing import Dict

from ray import serve
from ray.serve.handle import DeploymentHandle


# 1: Define the models in our composition graph and an ingress that calls them.
@serve.deployment
class Adder:
    def __init__(self, increment: int):
        self.increment = increment

    def add(self, inp: int):
        return self.increment + inp


@serve.deployment
class Combiner:
    def average(self, *inputs) -> float:
        return sum(inputs) / len(inputs)


@serve.deployment
class Ingress:
    def __init__(self, adder1, adder2, combiner):
        self._adder1: DeploymentHandle = adder1.options(use_new_handle_api=True)
        self._adder2: DeploymentHandle = adder2.options(use_new_handle_api=True)
        self._combiner: DeploymentHandle = combiner.options(use_new_handle_api=True)

    async def __call__(self, request: starlette.requests.Request) -> Dict[str, float]:
        input_json = await request.json()

        # Each .remote() call goes to a separate deployment; the Combiner
        # receives both Adder results and computes the average.
        final_result = await self._combiner.average.remote(
            self._adder1.add.remote(input_json["val"]),
            self._adder2.add.remote(input_json["val"]),
        )
        return {"result": final_result}


# 2: Build the application consisting of the models and ingress.
app = Ingress.bind(Adder.bind(increment=1), Adder.bind(increment=2), Combiner.bind())
serve.run(app)

# 3: Query the application and print the result.
print(requests.post("http://localhost:8000/", json={"val": 100.0}).json())
# {"result": 101.5}
These patterns can be combined: for example, use FastAPI as the API server while Ray Serve deploys the ML models behind it, and serve several different models side by side. Pretty convenient, isn't it?
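As a rough illustration of that combination, a FastAPI ingress can hold handles to several model deployments and route each endpoint to a different one. This is only a hedged sketch: the SentimentModel and SummaryModel classes, their routes, and their placeholder logic are invented here, not part of the original post.

from fastapi import FastAPI
from ray import serve
from ray.serve.handle import DeploymentHandle

api = FastAPI()


@serve.deployment
class SentimentModel:
    def predict(self, text: str) -> str:
        # Placeholder for real model inference.
        return "positive" if "good" in text else "negative"


@serve.deployment
class SummaryModel:
    def predict(self, text: str) -> str:
        # Placeholder for real model inference.
        return text[:20]


@serve.deployment
@serve.ingress(api)
class ApiIngress:
    def __init__(self, sentiment, summary):
        self._sentiment: DeploymentHandle = sentiment.options(use_new_handle_api=True)
        self._summary: DeploymentHandle = summary.options(use_new_handle_api=True)

    @api.get("/sentiment")
    async def sentiment(self, text: str) -> dict:
        return {"label": await self._sentiment.predict.remote(text)}

    @api.get("/summary")
    async def summary(self, text: str) -> dict:
        return {"summary": await self._summary.predict.remote(text)}


combined_app = ApiIngress.bind(SentimentModel.bind(), SummaryModel.bind())
# serve.run(combined_app)
# GET /sentiment?text=... and GET /summary?text=... now hit different models.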