Syntax for keeping a web page from being cached

browser, cache, html, web design, search engines

逮丸逮丸 2008-10-30 17:07:55 ‧ 78846 views

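Some dynamic pages may be cached by the browser, so their changes never show up. The following syntax asks the browser not to cache the page, so that every request fetches it from the website again.
Syntax for ordinary HTML: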

<META HTTP-EQUIV="PRAGMA" CONTENT="NO-CACHE">
<!-- May not always be honored by IE -->
<META HTTP-EQUIV="EXPIRES" CONTENT="0">
<!-- Expires immediately -->
<META HTTP-EQUIV="CACHE-CONTROL" CONTENT="NO-CACHE">
<!-- Serves the same purpose as the first line -->
# Its value can be set in these ways:
#HTTP 1.1. Allowed values = PUBLIC | PRIVATE | NO-CACHE | NO-STORE.
#Public - may be cached in public shared caches
#Private - may only be cached in private cache
#no-Cache - may be stored, but must be revalidated with the server before each reuse
#no-Store - must not be stored by any cache
<META HTTP-EQUIV="EXPIRES" CONTENT="Mon, 22 Jul 2002 11:12:01 GMT">
<!-- A commonly seen form: a fixed date in the past -->
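
To check which of these tags a page actually carries, the following is a minimal sketch using Python's standard `html.parser` module. The class name `HttpEquivCollector` and the function `http_equiv_directives` are hypothetical names chosen for this illustration. It only lists the tags; it does not tell you whether a given browser honors them.

```python
from html.parser import HTMLParser


class HttpEquivCollector(HTMLParser):
    """Collect (http-equiv, content) pairs from META tags in a page."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <META HTTP-EQUIV=...>
        # arrives here as tag "meta" with attribute "http-equiv".
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("http-equiv")
        if name:
            self.directives.append((name.lower(), attributes.get("content") or ""))


def http_equiv_directives(html):
    """Return the list of (http-equiv, content) pairs found in the HTML text."""
    collector = HttpEquivCollector()
    collector.feed(html)
    collector.close()
    return collector.directives


if __name__ == "__main__":
    page = (
        '<META HTTP-EQUIV="PRAGMA" CONTENT="NO-CACHE">\n'
        '<META HTTP-EQUIV="EXPIRES" CONTENT="0">\n'
    )
    for name, content in http_equiv_directives(page):
        print(name, "=", content)
```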

In a Perl CGI script:

# Print the response headers; the blank line (\n\n) after the last one ends the header block.
print "Content-type: text/html; charset=big5\n";
print "Pragma: no-cache\n";
print "Expires: Mon, 22 Jul 2002 11:12:01 GMT\n\n";
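
For comparison, here is a rough Python sketch of the same idea: build the no-cache response headers and print them CGI-style. The helper names `no_cache_headers` and `render_cgi_header_block` are hypothetical. The `utf-8` charset and the extra `Cache-Control` header are assumptions of this sketch rather than part of the Perl snippet above. Instead of a hard-coded past date, it sets `Expires` to the current time, which also marks the response as already stale.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def no_cache_headers(now=None):
    """Build response headers that ask clients not to reuse a cached copy."""
    if now is None:
        now = datetime.now(timezone.utc)
    return [
        ("Content-Type", "text/html; charset=utf-8"),  # assumed charset
        ("Pragma", "no-cache"),         # understood by HTTP/1.0 clients
        ("Cache-Control", "no-cache"),  # understood by HTTP/1.1 clients
        # An Expires date that is not in the future marks the page as stale.
        ("Expires", format_datetime(now, usegmt=True)),
    ]


def render_cgi_header_block(headers):
    """Join headers the way a CGI script prints them, ending with a blank line."""
    return "".join(f"{name}: {value}\n" for name, value in headers) + "\n"


if __name__ == "__main__":
    print(render_cgi_header_block(no_cache_headers()), end="")
    print("<html><body>dynamic content</body></html>")
```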

Controlling whether Google or other spiders may crawl the page:

<META NAME="ROBOTS" CONTENT="ALL">
<META NAME="ROBOTS" CONTENT="INDEX,NOFOLLOW">
<META NAME="ROBOTS" CONTENT="NOINDEX,FOLLOW">
<META NAME="ROBOTS" CONTENT="NONE"> 
# Available values:
#CONTENT="ALL | NONE | NOINDEX | INDEX | NOFOLLOW | FOLLOW | NOARCHIVE"
#default = empty = "ALL"
#"NONE" = "NOINDEX, NOFOLLOW"
#
#The CONTENT field is a comma separated list:
#INDEX: search engine robots should include this page.
#FOLLOW: robots should follow links from this page to other pages.
#NOINDEX: links can be explored, although the page is not indexed.
#NOFOLLOW: the page can be indexed, but no links are explored.
#NONE: robots can ignore the page.
#NOARCHIVE: Google uses this to prevent archiving of the page. See http://www.google.com/bot.html 
<META NAME="GOOGLEBOT" CONTENT="NOARCHIVE">
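
The way these values combine can be sketched in Python as below. The table `DIRECTIVES` and the function `parse_robots_content` are hypothetical names for this illustration, and the logic only follows the rules listed above (empty means "ALL", "NONE" means "NOINDEX, NOFOLLOW"). Real crawlers may resolve conflicting values differently.

```python
# Effect of each CONTENT value on the resulting policy.
DIRECTIVES = {
    "ALL": {"index": True, "follow": True},
    "NONE": {"index": False, "follow": False},
    "INDEX": {"index": True},
    "NOINDEX": {"index": False},
    "FOLLOW": {"follow": True},
    "NOFOLLOW": {"follow": False},
    "NOARCHIVE": {"archive": False},
}


def parse_robots_content(content=""):
    """Turn a ROBOTS meta CONTENT value into a dict of booleans."""
    # Default when nothing is specified: same as "ALL".
    policy = {"index": True, "follow": True, "archive": True}
    for token in content.split(","):
        # Unknown or empty tokens are ignored in this sketch.
        policy.update(DIRECTIVES.get(token.strip().upper(), {}))
    return policy


if __name__ == "__main__":
    for value in ("ALL", "INDEX,NOFOLLOW", "NOINDEX,FOLLOW", "NONE", "NOARCHIVE"):
        print(value, "->", parse_robots_content(value))
```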

The simplest way is to put a robots.txt file in the website's root directory to keep all spiders from crawling it.
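
As a minimal sketch, a robots.txt that turns every spider away from the whole site can be tried out with Python's standard `urllib.robotparser`. The helper name `is_allowed`, the user agent string, and the `www.example.com` URL are placeholders assumed for this example. Note that robots.txt is only a request: well-behaved spiders follow it, but nothing enforces it.

```python
from urllib.robotparser import RobotFileParser

# Contents of a robots.txt that asks every spider to stay out of the whole site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(is_allowed(ROBOTS_TXT, "Googlebot", "http://www.example.com/page.html"))
```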