iT邦幫忙

Syntax for keeping web pages from being cached

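Some dynamic pages may be cached by the browser, so their changes never show up. The following syntax prevents caching, so that every request fetches the page from the site again.
Plain HTML syntax: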

<META HTTP-EQUIV="PRAGMA" CONTENT="NO-CACHE">
<!-- May not be effective in IE -->
<META HTTP-EQUIV="EXPIRES" CONTENT="0">
<!-- Expire the page immediately -->
<META HTTP-EQUIV="CACHE-CONTROL" CONTENT="NO-CACHE">
<!-- Same purpose as the first line -->
# Its value can be set as follows:
# HTTP 1.1. Allowed values = PUBLIC | PRIVATE | NO-CACHE | NO-STORE.
# Public - may be stored in public shared caches
# Private - may only be stored in a private (single-user) cache
# no-cache - may be stored, but must be revalidated with the server before each reuse
# no-store - must not be stored in any cache
<META HTTP-EQUIV="EXPIRES" CONTENT="Mon, 22 Jul 2002 11:12:01 GMT">
<!-- A commonly seen form: an expiry date that is already in the past -->
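
To make the four Cache-Control values more concrete, here is a minimal Python sketch of how a cache might decide whether it may keep a copy of a response. The helper names `parse_cache_control`, `may_store` and `must_revalidate_before_reuse`, and the `shared` flag, are hypothetical and exist only for this illustration; real caches apply many more rules than this.

```python
def parse_cache_control(value):
    """Split a Cache-Control value into a set of lowercase directive names."""
    directives = set()
    for part in value.split(","):
        # Drop any "=argument" suffix, e.g. "max-age=60" -> "max-age".
        name = part.strip().split("=", 1)[0].lower()
        if name:
            directives.add(name)
    return directives


def may_store(value, shared):
    """Return True if a cache may keep a copy of the response.

    shared=True models a proxy used by many users; shared=False models
    a single user's browser cache. Simplified: freshness rules are ignored.
    """
    directives = parse_cache_control(value)
    if "no-store" in directives:
        return False  # nothing may be written to any cache
    if "private" in directives and shared:
        return False  # only the end user's own cache may keep it
    return True       # public, no-cache, or no relevant directive


def must_revalidate_before_reuse(value):
    """no-cache permits storing, but the copy must be checked with the server first."""
    return "no-cache" in parse_cache_control(value)
```

For example, `may_store("private", shared=True)` is `False`, while `may_store("no-cache", shared=True)` is `True` because `no-cache` only forces revalidation.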

In a Perl CGI script:

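# Send the response headers; the blank line (\n\n) ends the header block.
print "Content-type: text/html; charset=big5\n";
print "Pragma: no-cache\n";         # HTTP/1.0 clients
print "Cache-Control: no-cache\n";  # HTTP/1.1 counterpart of Pragma
print "Expires: Mon, 22 Jul 2002 11:12:01 GMT\n\n";  # a date already in the past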
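
The same idea can be sketched in Python: send the anti-caching directives as real HTTP response headers, which browsers and proxies tend to honor more consistently than META tags. The function names `no_cache_headers` and `render` are hypothetical, and the `Cache-Control` line is an assumption carried over from the META example rather than part of the Perl snippet.

```python
from email.utils import formatdate


def no_cache_headers(now=None):
    """Return (name, value) pairs asking clients not to reuse the page.

    `now` is a Unix timestamp; None means the current time.
    """
    return [
        ("Content-Type", "text/html; charset=big5"),
        ("Pragma", "no-cache"),          # HTTP/1.0 clients
        ("Cache-Control", "no-cache"),   # HTTP/1.1 clients (assumed addition)
        # An Expires value equal to the send time makes the page stale on arrival.
        ("Expires", formatdate(now, usegmt=True)),
    ]


def render(headers):
    """Format the header block the way a CGI script prints it."""
    return "".join(f"{name}: {value}\n" for name, value in headers) + "\n"


if __name__ == "__main__":
    print(render(no_cache_headers()), end="")
```

Computing `Expires` at send time avoids hard-coding a date, although a fixed date in the past works just as well.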

Controlling whether Google or other spiders crawl the page:

<META NAME="ROBOTS" CONTENT="ALL">
<META NAME="ROBOTS" CONTENT="INDEX,NOFOLLOW">
<META NAME="ROBOTS" CONTENT="NOINDEX,FOLLOW">
<META NAME="ROBOTS" CONTENT="NONE"> 
# Available values:
#CONTENT="ALL | NONE | NOINDEX | INDEX | NOFOLLOW | FOLLOW | NOARCHIVE"
#default = empty = "ALL"
#"NONE" = "NOINDEX, NOFOLLOW"
#
#The CONTENT field is a comma separated list:
#INDEX: search engine robots should include this page.
#FOLLOW: robots should follow links from this page to other pages.
#NOINDEX: links can be explored, although the page is not indexed.
#NOFOLLOW: the page can be indexed, but no links are explored.
#NONE: robots can ignore the page.
#NOARCHIVE: Google uses this to prevent archiving of the page. See http://www.google.com/bot.html 
<META NAME="GOOGLEBOT" CONTENT="NOARCHIVE"> 
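
As a rough illustration of how these values combine, the following Python sketch turns a ROBOTS `CONTENT` string into three flags. The function name `parse_robots_meta` is hypothetical, and letting the restrictive value win when directives conflict is an assumption made here, not something the list above specifies.

```python
def parse_robots_meta(content):
    """Interpret a ROBOTS meta CONTENT value as index/follow/archive flags."""
    tokens = {t.strip().upper() for t in content.split(",") if t.strip()}
    return {
        # An empty value behaves like "ALL": everything is permitted.
        "index": not ({"NOINDEX", "NONE"} & tokens),
        "follow": not ({"NOFOLLOW", "NONE"} & tokens),
        "archive": "NOARCHIVE" not in tokens,
    }


if __name__ == "__main__":
    for value in ("ALL", "INDEX,NOFOLLOW", "NOINDEX,FOLLOW", "NONE"):
        print(value, parse_robots_meta(value))
```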

The simplest approach is to add a robots.txt to the site's root directory that keeps every spider from crawling it.

User-agent: *
Disallow: /
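
One way to check what such a file permits is Python's standard `urllib.robotparser` module. In this sketch the helper `is_allowed` is hypothetical and the `example.com` URL is only a placeholder.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(is_allowed(ROBOTS_TXT, "Googlebot", "http://example.com/index.html"))
```

Keep in mind that robots.txt is advisory: well-behaved spiders obey it, but it does not technically block access.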

2 comments

jamesjan
iT邦 Expert, Level 1 ‧ 2008-10-31 08:32:05

Thanks! Very useful.

Thanks for sharing!
