[Day 07] The HTML Structure to Understand Before Scraping

2019 iT 邦幫忙 Ironman Contest

DAY 7

AI & Data

Scrapy Crawling and Data Processing: 30 Days of Notes, part 7

plusone

Team NUTC_imac

2018-10-22 15:30:21

Hi, it's day 7. With requests covered, let's move on to HTML.

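HTML is a markup language, not a general-purpose programming language. It tells the browser how to render a web page. An HTML document consists of a series of elements, and each element is made up of tags and content.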

For example:

<p class="hello-title"> hello world </p>

Opening tag: <p> (written here with its attribute, as <p class="hello-title">)
Closing tag: </p>
Content: hello world
Attribute: class="hello-title"
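To see these four pieces from the scraper's side, here is a minimal sketch using Python's standard-library html.parser module. The class name ElementParts is hypothetical and only serves this illustration. It feeds the one-line example to the parser and records each piece as it is encountered.

```python
from html.parser import HTMLParser


class ElementParts(HTMLParser):
    """Record the pieces of each element as the parser meets them."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs taken from the opening tag
        self.parts.append(("opening tag", tag, dict(attrs)))

    def handle_data(self, data):
        # keep only non-blank text found between the tags
        if data.strip():
            self.parts.append(("content", data.strip()))

    def handle_endtag(self, tag):
        self.parts.append(("closing tag", tag))


parser = ElementParts()
parser.feed('<p class="hello-title"> hello world </p>')
parser.close()

for part in parser.parts:
    print(part)
# ('opening tag', 'p', {'class': 'hello-title'})
# ('content', 'hello world')
# ('closing tag', 'p')
```

The attribute arrives together with the opening tag, which matches the breakdown above: it belongs to the tag, not to the content.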

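Attributes are not rendered on the page themselves, but they carry extra information about an element that we can use to control how the page is presented. An attribute is written as follows:

A space separates the element name from the attribute (a tag may carry several attributes, each separated by a space)
The attribute name is followed by an = sign
The attribute sits inside the opening tag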
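As a small illustration of these rules, the sketch below parses an opening tag that carries three attributes and turns them into a dictionary. It again relies on the standard-library html.parser; the class name AttributeReader and the sample tag are assumptions chosen for the example.

```python
from html.parser import HTMLParser


class AttributeReader(HTMLParser):
    """Collect the attributes of the first opening tag seen."""

    def __init__(self):
        super().__init__()
        self.attributes = None

    def handle_starttag(self, tag, attrs):
        # each attribute arrives as a (name, value) pair
        if self.attributes is None:
            self.attributes = dict(attrs)


reader = AttributeReader()
reader.feed('<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>')
reader.close()

print(reader.attributes)
# {'href': 'http://example.com/elsie', 'class': 'sister', 'id': 'link1'}
```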

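Main structure of an HTML document:

<!DOCTYPE html> : the document type (doctype) declaration.
<html></html> : the <html> element is the root element and wraps all the content of the page.
<head></head> : holds important information you want to include that is not displayed on the page.
<body></body> : contains all the content shown to visitors of the page.
<title></title> : the page title, shown in the browser tab rather than in the page body.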
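The nesting described above can be made visible with a short sketch that prints each tag indented by its depth. It is a simplified illustration, not a full HTML tree builder: the class name Outline and the tiny sample page are hypothetical, and the depth counter assumes every element is explicitly closed (a void element such as <br> would throw it off).

```python
from html.parser import HTMLParser


class Outline(HTMLParser):
    """Build an indented outline of the tags in a document."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.lines = []

    def handle_decl(self, decl):
        # called for the doctype declaration, e.g. "DOCTYPE html"
        self.lines.append("<!" + decl + ">")

    def handle_starttag(self, tag, attrs):
        self.lines.append("  " * self.depth + tag)
        self.depth += 1

    def handle_endtag(self, tag):
        # assumes well-formed input where every element is closed
        self.depth -= 1


page = ("<!DOCTYPE html><html><head><title>Demo</title></head>"
        "<body><p>Hi</p></body></html>")

outline = Outline()
outline.feed(page)
outline.close()
print("\n".join(outline.lines))
# <!DOCTYPE html>
# html
#   head
#     title
#   body
#     p
```

The outline shows <head> and <body> as siblings inside <html>, with <title> nested inside <head>.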

Here is an HTML example:

<!DOCTYPE html>
<html>
<head>
    <title>The Dormouse's story</title>
</head>
<body>
    <p class="title"><b>The Dormouse's story</b></p>
    <p class="story">Once upon a time there were three little sisters; and their names were
        <a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
        <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
        <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
        and they lived at the bottom of a well.</p>
    <p class="story">...</p>
</body>
</html>
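Knowing this structure is what lets a scraper pick out the parts it wants. As a rough sketch, the code below pulls the three links out of the example above by checking the tag name and the class attribute. It uses only the standard-library html.parser; the class name SisterLinks and the variable HTML_DOC are hypothetical, and a real project would more likely use a dedicated parsing library.

```python
from html.parser import HTMLParser

HTML_DOC = """<!DOCTYPE html>
<html>
<head>
    <title>The Dormouse's story</title>
</head>
<body>
    <p class="title"><b>The Dormouse's story</b></p>
    <p class="story">Once upon a time there were three little sisters; and their names were
        <a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
        <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
        <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
        and they lived at the bottom of a well.</p>
    <p class="story">...</p>
</body>
</html>"""


class SisterLinks(HTMLParser):
    """Collect id, href and text of every <a class="sister"> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current = None  # the link being read, or None outside <a>

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and attributes.get("class") == "sister":
            self._current = {
                "id": attributes.get("id"),
                "href": attributes.get("href"),
                "text": "",
            }

    def handle_data(self, data):
        # text counts as link text only while inside a matching <a>
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self.links.append(self._current)
            self._current = None


parser = SisterLinks()
parser.feed(HTML_DOC)
parser.close()

for link in parser.links:
    print(link["id"], link["text"], link["href"])
# link1 Elsie http://example.com/elsie
# link2 Lacie http://example.com/lacie
# link3 Tillie http://example.com/tillie
```

The tag name and the class attribute together select the elements, while href and id supply the data, which is the same pattern most scraping selectors build on.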