Ah, Yet Another RAG :: 2025 iThome Ironman Contest

poyuanchih (poyuanchih)

Generative AI

Ah, Yet Another RAG (series)

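This series carries RAG in its name, but most of it will probably be spent on handling data. Planned content includes:
(1) Three ways to obtain (context, question, answer) tuples
(2) Building ground truth with Label Studio
(3) Setting up a RAG baseline
(4) Exploring evaluation frameworks
(5) Hands-on tests of various methodologies
I will measure retrieval recall, answer faithfulness, and the performance of LLM as a judge,
to see the capabilities and limits of modern LLM / RAG / Agent on a self-made question set.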

Ironman completed | 30 articles in total | 1 subscriber

DAY 21

Day20: Structured Output Challenge! Results of the Five-Contender Showdown

Situation: In Day18: structured output challenge we ran inference for 5 contenders. They are: ...

2025-10-05 ‧ Shared by poyuanchih

DAY 22

Day21: Evaluating the correctness evaluator

Situation: Yesterday, in Day20: Structured Output Preliminary Check! Results of the Five-Contender Showdown, we used normalized_exact_match for a first...

2025-10-06 ‧ Shared by poyuanchih

DAY 23

Day22: Evaluating Semantic Similarity and P-R Curve

tl;dr Today we put two embedding models (text-embedding-3-small and text-embedding-ada-002) to a hands-on test...

2025-10-07 ‧ Shared by poyuanchih

DAY 24

Day23: RAG as workflow

Intro: Today we implement the RAG baseline, using llama-index's workflow, of course. If, when it comes to llama-index's...

2025-10-08 ‧ Shared by poyuanchih

DAY 25

Day24: Evaluation of the Baseline RAG

Intro: Yesterday we built our Baseline RAG with a workflow and generated the corresponding answers. Today we have three requirements. First, we need an End-to-E...

2025-10-09 ‧ Shared by poyuanchih

DAY 26

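Day25: Hands-on: MCQ Data Challenge

Intro: In this final chapter we add Evaluating to the problem-solving process. The system retrieves on its own, checks for itself whether it found anything, then answers on its own, and (maybe) checks for itself whether it made things up...

2025-10-10 ‧ Shared by poyuanchih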

DAY 27

Day26: Online Retriever Evaluator

Intro: Continuing yesterday's discussion: the main problems currently concentrate in the Retriever stage. We can use Context_Relevancy to validate the Retrieval...

2025-10-11 ‧ Shared by poyuanchih

DAY 28

Day27: An Open-Source Annotation Tool: Label-Studio

Intro: Today's post is relatively self-contained. We want to introduce the annotation tool Label-Studio and, while we are at it, our...

2025-10-12 ‧ Shared by poyuanchih

DAY 29

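Day28: Cleaning Up the Retrieved Documents

Intro: Today we build the most complex workflow since the series began. It looks like this: Yet in testing, this workflow's answer rate is still only 7/10. What is going on? We...

2025-10-13 ‧ Shared by poyuanchih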

DAY 30

Day29: multi step workflow and Day30: Summary

Intro: First, the number of questions answered correctly by the workflow we set up today: print(f"{...

2025-10-14 ‧ Shared by poyuanchih
