額外小知識
void element
有些 HTML tag 是不需要結尾的,也會被自動認定成 tagSelfClose,像是 <img> <br> <hr> 等等,這種 tag 就稱為 void element。
那 void element 有那些呢 ?
可以參考 w3.org - 4.3. Elements 來查看有哪些 void element。
昨天我們分析了 HTML 的狀態圖,今天我們來實作一下 HTML 加上 attr 的 Tokenizer 吧!
預計輸入 - sample.html
<div id="app">
    <p class="text-red">Vue</p>
    <input type="text" id="username" placeholder="請輸入姓名" disabled/>
    <img src="https://ithelp.ithome.com.tw/storage/image/fight.svg"
         alt='"圖片"'>
    <p style="margin-top: 3px">Template</p>
</div>
預期輸出 Tokens
const tokens = [
    {"type": "tagStart", "name": "div", attrStr: `id="app"`},
    {"type": "tagStart", "name": "p", attrStr: `class="text-red"`},
    {"type": "text", "content": "Vue"},
    {"type": "tagEnd", "name": "p"},
    {
        "type": "tagSelfClose",
        "name": "input",
        isVoidElement: true,
        attrStr: `type="text" id="username" placeholder="請輸入姓名" disabled`
    },
    {
        "type": "tagSelfClose",
        "name": "img",
        isVoidElement: true,
        attrStr: `src="https://ithelp.ithome.com.tw/storage/image/fight.svg" \n alt='"圖片"'`
    },
    {"type": "tagStart", "name": "p", attrStr: `style="margin-top: 3px"`},
    {"type": "text", "content": "Template"},
    {"type": "tagEnd", "name": "p"},
    {"type": "tagEnd", "name": "div"},
]
try {
    while (charList.length > 0) {
        const current = charList.shift();
        if (CURR_STATUS === STATUS.INITIAL) handle_INITIAL(current);
        if (CURR_STATUS === STATUS.IN_TAG) handle_IN_TAG(current);
        if (CURR_STATUS === STATUS.IN_TAG_END) handle_IN_TAG_END(current);
    }
} catch (e) {
    console.log('e=', e);
}
console.log('tokens=', tokens);
const STATUS = {
    INITIAL: 0,
    IN_TAG: 1,
    IN_TAG_END: 2,
    IN_ATTR: 3,
}
利用 void-elements 套件的資料來判斷是否為 void element。
const voidElements = require('void-elements');
const voidElementChecker = tagName => voidElements[tagName];
handle_IN_TAG_END 跟 handle_INITIAL 根據昨天的分析,這兩個狀態我們不用動。
4.3 - IN_TAG 的狀態變化
const handle_IN_TAG = current => {
    // 遇到空格變成狀態 IN_ATTR
    if (current === ' ') {
        CURR_STATUS = STATUS.IN_ATTR;
        const tagName = collected;
        const isVoidElement = voidElementChecker(tagName);
        const token = isVoidElement ? {type: 'tagSelfClose', name: tagName, isVoidElement} :  {type: 'tagStart', name: tagName};
        tokens.push(token);
        resetCollect();
        return;
    }
    // 跟 [ day-12 ] 的實作相同
}
4.4 - IN_ATTR 的狀態變化
const handle_IN_ATTR = current => {
    const next = charList[0];
    if (current === '>' || (current === '/' && next === '>')) {
        CURR_STATUS = STATUS.INITIAL;
        const attrStr = collected;
        tokens[tokens.length - 1].attrStr = attrStr; // 設定最後一個 token 的 attrStr
        resetCollect();
        return;
    }
    // 如果不是上述的特殊字元,則收集起來
    if (isAlphabet(current)) {
        collected += current;
    }
}
將上面的區塊做整合,就可以得到 完整程式碼 htmlTokenizer.js
完整程式碼 htmlTokenizer.js 請到 github 上查看