Day 02: Boolean Model

12th Ironman Contest · Boolean model · Information retrieval and extraction

WenTingTseng

2020-09-02 16:12:35


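Before introducing the Boolean model, we first need to define one term: the index term. Every document is made up of a set of index terms, also called keywords, and each index term is either a single word or a group of consecutive words.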

What is the Boolean model?

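The Boolean model was proposed in 1973. It is the simplest retrieval model and is based on set theory and Boolean algebra. The central idea is to represent the whole collection as an index-term-by-document matrix, and then use Boolean operations to work out which documents are the ones the user wants.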
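As a rough formalization of that idea, the matrix and the two Boolean operations can be written as follows. The symbols $M$, $t_i$, $d_j$, $a$, and $b$ are notation introduced here for illustration and are not taken from the original formulation.

$$
M_{ij} =
\begin{cases}
1 & \text{if index term } t_i \text{ occurs in document } d_j \\
0 & \text{otherwise}
\end{cases}
$$

$$
(t_a \wedge t_b)_j = \min(M_{aj}, M_{bj}), \qquad
(t_a \vee t_b)_j = \max(M_{aj}, M_{bj})
$$

Document $d_j$ is returned for a query exactly when the $j$-th entry of the resulting vector is 1.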

An example

Suppose we have the following collection of documents:

$d_1$ = This is an apple
$d_2$ = I like to go to school
$d_3$ = Apple is a kind of fruit and I like to eat it

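First, build an index-term-by-document matrix. Each row corresponds to a term that appears somewhere in the collection, and each column corresponds to a document; there are three documents here, so the matrix has three columns. An entry is 1 if the term occurs in that document and 0 otherwise.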
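The matrix for these three documents can be built with a few lines of Python. This is only a minimal sketch: the helper names `tokenize` and `build_incidence_matrix` are hypothetical, and it assumes that every lowercased, whitespace-separated word counts as an index term, which is a simplification (no stop-word removal, stemming, or multi-word terms).

```python
# Toy corpus from the example above.
docs = {
    "d1": "This is an apple",
    "d2": "I like to go to school",
    "d3": "Apple is a kind of fruit and I like to eat it",
}


def tokenize(text):
    """Lowercase and split on whitespace (a simplifying assumption)."""
    return text.lower().split()


def build_incidence_matrix(docs):
    """Return (terms, doc_ids, matrix).

    matrix[i][j] is 1 if terms[i] occurs in document doc_ids[j], else 0.
    """
    doc_ids = list(docs)
    token_sets = [set(tokenize(docs[d])) for d in doc_ids]
    terms = sorted(set().union(*token_sets))
    matrix = [[1 if term in tokens else 0 for tokens in token_sets]
              for term in terms]
    return terms, doc_ids, matrix


terms, doc_ids, matrix = build_incidence_matrix(docs)

# Print one row per term; columns are d1, d2, d3.
for term, row in zip(terms, matrix):
    print(f"{term:<8}{row}")
```

With this tokenization the row for `apple` is `[1, 0, 1]` and the row for `school` is `[0, 1, 0]`, matching the vectors used in the queries.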
If the query is school, then school = [0 1 0], so the answer is $d_2$.
If the query is "school" AND "go", then school ∧ go = [0 1 0] ∧ [0 1 0] = [0 1 0], so the answer is $d_2$.
If the query is "apple" OR "school", then apple ∨ school = [1 0 1] ∨ [0 1 0] = [1 1 1], so the answer is $d_1$, $d_2$, and $d_3$.
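The three queries can be checked with a short script. It is a sketch under the same simplifying assumption that a term occurs in a document when it matches one of the document's lowercased, whitespace-separated words; the function names `term_vector`, `boolean_and`, `boolean_or`, and `matching_docs` are made up for this illustration.

```python
# The three documents d1, d2, d3 from the example.
DOCS = [
    "This is an apple",
    "I like to go to school",
    "Apple is a kind of fruit and I like to eat it",
]


def term_vector(term, docs=DOCS):
    """Return the matrix row for `term`: one 0/1 entry per document."""
    term = term.lower()
    return [1 if term in doc.lower().split() else 0 for doc in docs]


def boolean_and(a, b):
    """Element-wise AND of two 0/1 vectors."""
    return [x & y for x, y in zip(a, b)]


def boolean_or(a, b):
    """Element-wise OR of two 0/1 vectors."""
    return [x | y for x, y in zip(a, b)]


def matching_docs(vector):
    """Translate a 0/1 vector into document names d1, d2, ..."""
    return [f"d{i}" for i, bit in enumerate(vector, start=1) if bit]


school = term_vector("school")
go = term_vector("go")
apple = term_vector("apple")

print(school, matching_docs(school))                    # [0, 1, 0] ['d2']
both = boolean_and(school, go)
print(both, matching_docs(both))                        # [0, 1, 0] ['d2']
either = boolean_or(apple, school)
print(either, matching_docs(either))                    # [1, 1, 1] ['d1', 'd2', 'd3']
```

Because each term is just a row of bits, a compound query reduces to element-wise operations on those rows, and the positions that end up as 1 identify the matching documents.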