An AI Fluent in Both Arts and Sciences: Integrating RAG and Judge0 and Enforcing Structured JSON Output

2025 iThome Ironman Contest

DAY 14

Modern Web

AI Application Development in Practice for Frontend Engineers: 30 Days from Prompt to Production, Building an AI Frontend Interviewer (Part 14 of the series)

17th Ironman Contest next.js gemini

windate3411

2025-09-28 23:15:04

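Introduction

Welcome to the wrap-up of week two! To say it again: if you feel the articles from the past two days have been watered down, as watered down as episode 11 of The Water Magician, you are absolutely right. Family matters have made it hard to find uninterrupted time to write. I expect to regain some freedom around Day 15, after which I will revisit and flesh out these articles. Please bear with me until then, and thank you!

Yesterday we used the standard Judge0 workflow to build a basic code evaluation system: we can submit the user's code and get back the execution result. That means we can now handle at least basic algorithm problems among the coding questions. (Frontend-specific tasks such as React or CSS implementation are beyond the current design; we will tackle them as a challenge in week four.) Together with the RAG system we built earlier on Supabase, we can also give solid feedback on conceptual questions. The two key puzzle pieces are now in hand, and what remains is fitting them into our existing framework. As we prepared to combine them, however, a crucial design detail surfaced: not every question needs both tools. The two rely on entirely different systems to judge correctness, so we need to be deliberate about what material we hand to the AI, or we will end up cutting pork with a fruit knife!

Today we will build the "brain" of this system: an orchestrator API. It identifies the question type and assigns the most suitable evaluation tool to each one. Finally, it instructs the AI to consolidate the analysis into a structured JSON report with a predictable shape, giving our application a solid data foundation.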

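Today's Goals

Settle the design pattern: dispatch RAG or Judge0 according to the question type, using each tool only where it fits.
Design a unified JSON Schema: define a feedback structure that adapts to both question types.
Write two specialized prompts: one tailored to concept questions and one to code questions.
Enable the SDK's JSON mode: use the built-in feature of @google/genai so the AI returns syntactically valid JSON.
Build the orchestrator API: implement /api/interview/evaluate to automate dispatching across the whole evaluation flow.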

Step 1: Rethinking the Design - The Right Tool for Each Job

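Our current design has two kinds of questions, "coding" and "conceptual". Each uses its own fields and its own evaluation tool, roughly as follows:

Concept questions (concept):

The evaluation focuses on how well the answer covers the key knowledge points. We therefore use RAG to retrieve the relevant keyPoints, which serve as the gold standard for the AI's scoring.

Code questions (code):

The evaluation focuses on the correctness and quality of the code. We therefore use Judge0 to run the code and its test cases, taking the objective stdout and stderr as the primary evidence.

Our orchestrator API acts as an intelligent router. Its workflow is as follows: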

graph TD
    A[User submits an answer] --> B{"/api/interview/evaluate"};
    B --> C{Determine question type};
    C -- "Concept question (concept)" --> D[Path A];
    C -- "Code question (code)" --> E[Path B];

    subgraph "Path A: concept question evaluation"
        D --> D1[1. Run RAG];
        D1 --> D2[Supabase: fetch relevant keyPoints];
        D2 --> D3[2. Assemble the concept prompt];
        D3 --> D4[3. Call Gemini];
    end

    subgraph "Path B: code question evaluation"
        E --> E1[1. Run Judge0];
        E1 --> E2[Get stdout/stderr];
        E2 --> E3[2. Assemble the code prompt];
        E3 --> E4[3. Call Gemini];
    end

    D4 --> F[Return structured JSON];
    E4 --> F;

This design ensures we always use the most suitable tool for the task, which keeps the evaluation both efficient and precise.
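
The routing decision above can be sketched as a small pure function. This is only an illustration under assumptions: the names Question, EvaluationPlan, and planEvaluation are hypothetical and do not appear in the project; only the 'concept' / 'code' split comes from the article.

```typescript
// Hypothetical types: only the 'concept' | 'code' distinction comes from the article.
type Question =
  | { id: string; type: 'concept' }
  | { id: string; type: 'code' };

type EvaluationPlan = {
  tool: 'rag' | 'judge0';
  promptKind: 'concept' | 'code';
};

// Decide which tool and which prompt a question should use.
function planEvaluation(question: Question): EvaluationPlan {
  switch (question.type) {
    case 'concept':
      return { tool: 'rag', promptKind: 'concept' };
    case 'code':
      return { tool: 'judge0', promptKind: 'code' };
    default: {
      // Exhaustiveness check: adding a new question type without handling it
      // here becomes a compile-time error.
      const unreachable: never = question;
      throw new Error(`Unsupported question: ${JSON.stringify(unreachable)}`);
    }
  }
}
```

Keeping the decision separate from the network calls makes the branching easy to test without touching Supabase, Judge0, or Gemini.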

Step 2: Designing a Unified Yet Flexible JSON Structure

To serve both evaluation paths, our JSON Schema needs a single consistent structure while staying flexible.

{
  "summary": "string",
  "score": "number (1-5)",
  "grounded_evidence": { // objective evidence from Judge0
    "tests_passed": "number",
    "tests_failed": "number",
    "stderr_excerpt": "string|null"
  } | null, // for concept questions this field is null
  "pros": ["string"],
  "cons": ["string"],
  "next_practice": ["string"]
}

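The key design point is the grounded_evidence field. For code questions it is an object containing the execution results; for concept questions it is simply null. The frontend can use this to decide whether to render the test-result UI.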
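
One way to express this schema on the TypeScript side is an interface plus a runtime type guard. The sketch below is an assumption rather than the project's code: the names Feedback, GroundedEvidence, isFeedback, and shouldRenderTestResults are hypothetical, while the field names follow the schema above.

```typescript
interface GroundedEvidence {
  tests_passed: number;
  tests_failed: number;
  stderr_excerpt: string | null;
}

interface Feedback {
  summary: string;
  score: number; // expected range: 1 to 5
  grounded_evidence: GroundedEvidence | null; // null for concept questions
  pros: string[];
  cons: string[];
  next_practice: string[];
}

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((item) => typeof item === 'string');
}

function isGroundedEvidence(value: unknown): value is GroundedEvidence {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  const stderr = v.stderr_excerpt;
  return (
    typeof v.tests_passed === 'number' &&
    typeof v.tests_failed === 'number' &&
    (typeof stderr === 'string' || stderr === null)
  );
}

// Runtime check that an unknown value matches the feedback schema.
function isFeedback(value: unknown): value is Feedback {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  const score = v.score;
  const evidence = v.grounded_evidence;
  return (
    typeof v.summary === 'string' &&
    typeof score === 'number' &&
    score >= 1 &&
    score <= 5 &&
    (evidence === null || isGroundedEvidence(evidence)) &&
    isStringArray(v.pros) &&
    isStringArray(v.cons) &&
    isStringArray(v.next_practice)
  );
}

// The frontend only renders the test-result UI when evidence is present.
function shouldRenderTestResults(feedback: Feedback): boolean {
  return feedback.grounded_evidence !== null;
}
```

JSON mode only guarantees syntactically valid JSON, so a guard like this is one possible way to confirm the fields are actually present before the UI relies on them.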

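Step 3: Writing Two Specialized Scripts (Prompts)

We write a dedicated prompt for each path.

Option A: Concept Question Prompt (RAG-focused)

The goal of this prompt is to evaluate the candidate's written explanation; at its core it compares the answer against the keyPoints.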

<role>
You are a senior frontend technical interviewer evaluating a candidate's answer to a conceptual question.
</role>
<task>
Evaluate the <candidate_answer> based on whether it covers the concepts in <rag_context>.
Your response MUST be a single, valid JSON object that adheres to the provided schema. In this conceptual evaluation, the "grounded_evidence" field MUST be null.
You MUST answer in Traditional Chinese.
</task>
<json_schema>
{
  "summary": "string",
  "score": "number (1-5)",
  "grounded_evidence": null,
  "pros": ["string"],
  "cons": ["string"],
  "next_practice": ["string"]
}
</json_schema>
<rag_context>
\${ragContext}
</rag_context>
<candidate_answer>
\${userAnswer}
</candidate_answer>

Option B: Code Question Prompt (Judge0-focused)

The goal of this prompt is to perform a code review; at its core it analyzes the Judge0 execution result.

<role>
You are a world-class senior frontend technical interviewer providing a comprehensive code review based strictly on the execution result and the code itself.
</role>
<task>
Evaluate the <user_code> based on the objective <judge0_result>. Analyze the code for quality, correctness, and best practices.
Your response MUST be a single, valid JSON object that adheres to the provided schema.
You MUST answer in Traditional Chinese.
</task>
<json_schema>
{
  "summary": "string",
  "score": "number (1-5)",
  "grounded_evidence": { "tests_passed": "number", "tests_failed": "number", "stderr_excerpt": "string|null" },
  "pros": ["string"],
  "cons": ["string"],
  "next_practice": ["string"]
}
</json_schema>
<judge0_result>
\${judge0Result}
</judge0_result>
<user_code>
\${userCode}
</user_code>
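
Filling the ${...} placeholders in these templates deserves some care, because the candidate's answer is arbitrary text. The helper below is a minimal sketch, and fillTemplate is a hypothetical name, not part of the project. It replaces every placeholder in a single pass with a replacer function, so sequences such as $& in the submitted code are inserted literally, and placeholder-like text inside an answer is not substituted a second time.

```typescript
// Replace every ${name} placeholder in one pass.
// A replacer function is used on purpose: with a plain string replacement,
// String.prototype.replace would interpret patterns like "$&" or "$1".
function fillTemplate(template: string, values: Record<string, string>): string {
  return template.replace(/\$\{(\w+)\}/g, (placeholder: string, key: string) => {
    const value: unknown = values[key];
    // Unknown placeholders are left untouched rather than replaced with "undefined".
    return typeof value === 'string' ? value : placeholder;
  });
}

// Single-quoted strings keep ${...} as literal text, just like the escaped
// \${...} inside a template literal.
const exampleTemplate = '<user_code>\n${userCode}\n</user_code>';
const filled = fillTemplate(exampleTemplate, {
  userCode: "const price = '$&'; // dollar signs survive unchanged",
});
console.log(filled);
```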

Step 4: Building the Orchestrator API and Enabling JSON Mode

Now let's implement /api/interview/evaluate and enable the JSON mode built into the @google/genai SDK for stability.

// app/api/interview/evaluate/route.ts
import { NextResponse } from 'next/server';
import { createClient } from '@supabase/supabase-js';
import { GoogleGenAI, Content } from '@google/genai';
import questions from '@/data/questions.json';

// --- Initialize clients ---
const supabase = createClient(
  process.env.SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_KEY!
);
const GEMINI_API_KEY = process.env.GEMINI_API_KEY;
const genAI = new GoogleGenAI({ apiKey: GEMINI_API_KEY });

// --- Prompt templates ---
const conceptPromptTemplate = `<role>
You are a senior frontend technical interviewer evaluating a candidate's answer to a conceptual question.
</role>
<task>
Evaluate the <candidate_answer> based on whether it covers the concepts in <rag_context>.
Your response MUST be a single, valid JSON object that adheres to the provided schema. In this conceptual evaluation, the "grounded_evidence" field MUST be null.
You MUST answer in Traditional Chinese.
</task>
<json_schema>
{
  "summary": "string",
  "score": "number (1-5)",
  "grounded_evidence": null,
  "pros": ["string"],
  "cons": ["string"],
  "next_practice": ["string"]
}
</json_schema>
<rag_context>
\${ragContext}
</rag_context>
<candidate_answer>
\${userAnswer}
</candidate_answer>`;

const codePromptTemplate = `<role>
You are a world-class senior frontend technical interviewer providing a comprehensive code review based strictly on the execution result and the code itself.
</role>
<task>
Evaluate the <user_code> based on the objective <judge0_result>. Analyze the code for quality, correctness, and best practices.
Your response MUST be a single, valid JSON object that adheres to the provided schema.
You MUST answer in Traditional Chinese.
</task>
<json_schema>
{
  "summary": "string",
  "score": "number (1-5)",
  "grounded_evidence": { "tests_passed": "number", "tests_failed": "number", "stderr_excerpt": "string|null" },
  "pros": ["string"],
  "cons": ["string"],
  "next_practice": ["string"]
}
</json_schema>
<judge0_result>
\${judge0Result}
</judge0_result>
<user_code>
\${userCode}
</user_code>`;

export async function POST(request: Request) {
  try {
    const { questionId, answer } = await request.json();

    const question = questions.find((q) => q.id === questionId);
    if (!question) {
      return NextResponse.json(
        { error: 'Question not found' },
        { status: 404 }
      );
    }

    let finalPrompt = '';

    if (question.type === 'concept') {
      // --- Concept question path (RAG) ---
      const embeddingResponse = await genAI.models.embedContent({
        model: 'gemini-embedding-001',
        contents: answer,
        config: {
          outputDimensionality: 768,
        },
      });
      if (
        !embeddingResponse.embeddings ||
        embeddingResponse.embeddings.length === 0
      ) {
        return NextResponse.json(
          { error: 'Embedding response is empty' },
          { status: 500 }
        );
      }
      const answerEmbedding = embeddingResponse.embeddings[0].values;

      const { data: ragData, error: ragError } = await supabase.rpc(
        'match_documents',
        {
          query_embedding: JSON.stringify(answerEmbedding),
          match_threshold: 0.7,
          match_count: 5,
          question_id: questionId,
        }
      );

      const ragContext =
        !ragError && ragData?.length > 0
          ? ragData.map((d: any) => `- ${d.content}`).join('\n')
          : 'No relevant context found.';

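      // Use replacer functions so `$` sequences (e.g. "$&") in the inserted
      // text are treated literally instead of as replacement patterns.
      finalPrompt = conceptPromptTemplate.replace(
        /\${ragContext}/g,
        () => ragContext
      );
      finalPrompt = finalPrompt.replace(/\${userAnswer}/g, () => answer);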

      console.log('finalPrompt', finalPrompt);
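    } else if (question.type === 'code') {
      // --- Code question path (Judge0) ---
      const judge0Response = await fetch(
        `${process.env.NEXT_PUBLIC_APP_URL}/api/judge0/execute`,
        {
          method: 'POST',
          headers: { 'Content-Type': 'application/json' },
          body: JSON.stringify({ source_code: answer }),
        }
      );

      // Stop early if the execution endpoint itself failed.
      if (!judge0Response.ok) {
        return NextResponse.json(
          { error: 'Code execution failed' },
          { status: 502 }
        );
      }

      const judge0Result = await judge0Response.json();
      const judge0ResultText = `Status: ${
        judge0Result.status?.description ?? 'Unknown'
      }\nStdout: ${judge0Result.stdout || 'N/A'}\nStderr: ${
        judge0Result.stderr || 'N/A'
      }`;

      // Replacer functions keep `$` sequences in the submitted code literal.
      finalPrompt = codePromptTemplate.replace(
        /\${judge0Result}/g,
        () => judge0ResultText
      );

      finalPrompt = finalPrompt.replace(/\${userCode}/g, () => answer);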
    }

    if (!finalPrompt) {
      return NextResponse.json(
        { error: 'Invalid question type' },
        { status: 400 }
      );
    }

    const contents: Content[] = [{ parts: [{ text: finalPrompt }] }];

    const result = await genAI.models.generateContent({
      model: 'gemini-2.5-flash',
      contents: contents,
      config: {
        responseMimeType: 'application/json',
      },
    });

    const responseText = result.text;

    try {
      // Parse inside the try block so malformed JSON is caught here
      // and reported with a dedicated error message.
      const jsonResponse = JSON.parse(responseText || '{}');
      return NextResponse.json(jsonResponse);
    } catch (e) {
      console.error('Gemini did not return valid JSON:', responseText);
      return NextResponse.json(
        { error: 'AI response is not valid JSON' },
        { status: 500 }
      );
    }
  } catch (error) {
    console.error('Error in evaluation API:', error);
    if (error instanceof Error) {
      return NextResponse.json({ error: error.message }, { status: 500 });
    }
    return NextResponse.json(
      { error: 'Internal Server Error' },
      { status: 500 }
    );
  }
}
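
The route above falls back to an empty object when the model returns no text, which the frontend would then treat as a successful response with no summary. As a hedged alternative, the parsing step could return an explicit result instead. The sketch below is illustrative only: parseModelJson and ParseResult are hypothetical names, and the error messages are placeholders.

```typescript
type ParseResult =
  | { ok: true; data: Record<string, unknown> }
  | { ok: false; error: string };

// Parse the model's text output and report failures explicitly
// instead of silently falling back to an empty object.
function parseModelJson(text: string | undefined): ParseResult {
  if (!text || text.trim() === '') {
    return { ok: false, error: 'Model returned an empty response' };
  }
  try {
    const parsed: unknown = JSON.parse(text);
    if (typeof parsed !== 'object' || parsed === null || Array.isArray(parsed)) {
      return { ok: false, error: 'Model response is not a JSON object' };
    }
    return { ok: true, data: parsed as Record<string, unknown> };
  } catch {
    return { ok: false, error: 'Model response is not valid JSON' };
  }
}

// Usage: map a failed parse to an error response, a successful one to the payload.
const outcome = parseModelJson('{"summary": "Good answer", "score": 4}');
console.log(outcome.ok ? outcome.data.summary : outcome.error);
```

In the route handler, a failed result would map to the 500 response and a successful one to NextResponse.json(outcome.data).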

Step 5: Frontend Integration and Verification

Finally, we only need a small change to the handleSubmit function in the interview page (app/(main)/interview/[sessionId]/page.tsx).

// ...
  const handleSubmit = async () => {
    if (!answer || !currentQuestion) return;

    const newHistory: ChatMessage[] = [
      ...chatHistory,
      { role: 'user', content: answer },
    ];
    setChatHistory(newHistory);
    setAnswer('');
    setIsLoading(true);

    try {
      const response = await fetch('/api/interview/evaluate', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({
          questionId: currentQuestion.id,
          answer: answer,
          userId: 'anonymous-user',
        }),
      });

      if (!response.ok) throw new Error('API request failed');

      const data = await response.json();
      console.log('Structured AI Feedback:', data);
      const aiResponse: ChatMessage = {
        role: 'ai',
        content: data.summary,
        evaluation: data,
      };
      setChatHistory([...newHistory, aiResponse]);
    } catch (error) {
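      console.error('Error:', error);
      const errorResponse: ChatMessage = {
        role: 'ai',
        // User-facing message, kept in Traditional Chinese to match the app:
        // "Sorry, I can't provide feedback right now. Please try again later."
        content: '抱歉，我現在無法提供回饋，請稍後再試。',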
      };
      setChatHistory([...newHistory, errorResponse]);
    } finally {
      setIsLoading(false);
    }
  };
// ...

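After submitting an answer, open the Console in your browser's developer tools and you will see a tidy, structured JSON object. Our AI interviewer has finally learned to think and express itself with clear logic!

Today's Recap

Today our project took a big step forward, evolving from a fun toy into an application prototype with a solid architecture.

✅ We settled on a routing design: RAG for concept questions, Judge0 for code questions.
✅ We designed a unified JSON Schema that adapts to both scenarios.
✅ We wrote a dedicated prompt for each question type to make the feedback more precise.
✅ We learned to use the SDK's built-in JSON mode, a key step toward a production-grade application.
✅ We built an orchestrator API that automates the entire evaluation flow.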