Day 30: 使用Amazon Bedrock的AI21 Labs Jurassic-2 model分析sql query

2024 iThome 鐵人賽

DAY 30

DevOps

探索亞馬遜雨林30天系列第 30 篇

16th鐵人賽 aws bedrock ai21 labs jurassic-2

Judy Huang

2024-10-11 09:05:47

499 瀏覽

分享至

最後一天，來說說現在最紅的AI。最近在學習如何使用postgre的Explain Analyze分析sql語法，靈機一動想說何不問問AI怎麼根據分析的結果去優化資料庫，於是就用Amazon Bedrock的生成式AI服務，怎麼優化資料庫。

要在Amazon Bedrock使用Jurassic-2，需要先去Model access那裡打開model的使用權限。

然後就可以透過API去使用Model。如果是使用Golang，可以用官方提供的sdk。指定Model Id，並給定該model的參數，像是maxTokens和temperature，然後準備好prompt就可以讓AI幫忙囉！API的使用方法可以參考下面的範例。
aws-doc-sdk-examples

package main

import (
    "context"
    "encoding/json"
    "fmt"
    "github.com/spf13/viper"
    "log"
    "os"

    "github.com/aws/aws-sdk-go-v2/aws"
    "github.com/aws/aws-sdk-go-v2/config"
    "github.com/aws/aws-sdk-go-v2/service/bedrockruntime"
)

type Jurassic2Request struct {
    Prompt      string  `json:"prompt"`
    MaxTokens   int     `json:"maxTokens,omitempty"`
    Temperature float64 `json:"temperature,omitempty"`
}

type Jurassic2Response struct {
    Completions []Completion `json:"completions"`
}
type Completion struct {
    Data Data `json:"data"`
}
type Data struct {
    Text string `json:"text"`
}

type InvokeModelWrapper struct {
    BedrockRuntimeClient *bedrockruntime.Client
}

// Invokes AI21 Labs Jurassic-2 on Amazon Bedrock to run an inference using the input
// provided in the request body.
func (wrapper InvokeModelWrapper) InvokeJurassic2(prompt string) (string, error) {
    modelId := "ai21.j2-mid-v1"

    body, err := json.Marshal(Jurassic2Request{
        Prompt:      prompt,
        MaxTokens:   1000,
        Temperature: 0.5,
    })

    if err != nil {
        log.Fatal("failed to marshal", err)
    }

    output, err := wrapper.BedrockRuntimeClient.InvokeModel(context.TODO(), &bedrockruntime.InvokeModelInput{
        ModelId:     aws.String(modelId),
        ContentType: aws.String("application/json"),
        Body:        body,
    })

    if err != nil {
        fmt.Println(err, modelId)
    }

    var response Jurassic2Response
    if err := json.Unmarshal(output.Body, &response); err != nil {
        log.Fatal("failed to unmarshal", err)
    }

    return response.Completions[0].Data.Text, nil
}

func main() {
    region := "us-east-1"
    sdkConfig, err := config.LoadDefaultConfig(context.Background(), config.WithRegion(region))
    if err != nil {
        log.Fatal(err)
    }

    client := bedrockruntime.NewFromConfig(sdkConfig)
    wrapper := InvokeModelWrapper{client}
    prompt := "You are an experienced backend engineer and DBA with many years of experience. Please provide optimization suggestions."
    fmt.Println(wrapper.InvokeJurassic2(prompt))
}

而在優化資料庫的部分，我選擇在prompt提供db schema和Explain Analyze的結果讓AI參考，並讓AI給建議。Explain Analyze分析的sql query如下：

SELECT *
FROM public.person
join city on city.city_id = person.city_id
ORDER by city.city_id ASC

prompt先指定AI的身份，然後請他根據給的db schema和Explain Analyze進行分析。

You are an experienced backend engineer and DBA with many years of experience. 
Please provide optimization suggestions based on the following DB schema and 
EXPLAIN analysis.\n-- Table: public.country

-- DROP TABLE IF EXISTS public.country;

CREATE TABLE IF NOT EXISTS public.country
(
    country_id integer NOT NULL DEFAULT nextval('country_country_id_seq'::regclass),
    country character varying(50) COLLATE pg_catalog."default" NOT NULL,
    last_update timestamp without time zone NOT NULL DEFAULT now(),
    CONSTRAINT country_pkey PRIMARY KEY (country_id)
)

TABLESPACE pg_default;

ALTER TABLE IF EXISTS public.country
    OWNER to postgres;

-- Trigger: last_updated

...
...
...
"QUERY PLAN"
"Gather Merge  (cost=1505813.23..3158730.36 rows=14166872 width=55) (actual time=9621.824..13488.480 rows=16971667 loops=1)"
"  Workers Planned: 2"
"  Workers Launched: 2"
"  ->  Sort  (cost=1504813.20..1522521.79 rows=7083436 width=55) (actual time=9492.646..10397.541 rows=5657222 loops=3)"
"        Sort Key: person.city_id"
"        Sort Method: external merge  Disk: 386240kB"
"        Worker 0:  Sort Method: external merge  Disk: 376264kB"
"        Worker 1:  Sort Method: external merge  Disk: 372944kB"
"        ->  Hash Join  (cost=18.50..214634.22 rows=7083436 width=55) (actual time=146.802..3947.501 rows=5657222 loops=3)"
"              Hash Cond: (person.city_id = city.city_id)"
"              ->  Parallel Seq Scan on person  (cost=0.00..195890.36 rows=7083436 width=32) (actual time=0.368..2631.384 rows=5666667 loops=3)"
"              ->  Hash  (cost=11.00..11.00 rows=600 width=23) (actual time=146.369..146.379 rows=600 loops=3)"
"                    Buckets: 1024  Batches: 1  Memory Usage: 43kB"
"                    ->  Seq Scan on city  (cost=0.00..11.00 rows=600 width=23) (actual time=0.019..0.062 rows=600 loops=3)"
"Planning Time: 2.772 ms"
"JIT:"
"  Functions: 30"
"  Options: Inlining true, Optimization true, Expressions true, Deforming true"
"  Timing: Generation 7.985 ms, Inlining 195.804 ms, Optimization 125.889 ms, Emission 117.084 ms, Total 446.762 ms"
"Execution Time: 14095.377 ms"

實測結果每次的response都不太一樣，不知道是不是temperature設0.5太高了XD。有一些建議確實還滿有道理，這個範例是隨意想的，所以兩個table關聯的欄位沒有建foreign key，很細心地點出一些問題，是個可以幫忙注意一些細節的小夥伴>///////<。

完整的範例可以參考下面的Repository
https://gitlab.com/hjoruhjoru/aws-ai