标点符(钱魏 Way)

Google Search Appliance: Do Not Crawl URLs in These Formats

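The following is the default configuration that ships with the Google Search Appliance's "Do Not Crawl URLs with the Following Patterns" setting. Knowing which URLs a search engine refuses to crawl helps you avoid generating URLs that it will not accept. Conversely, if you want to keep certain pages out of its index, you can give their URLs one of these patterns, such as a particular file extension or special character.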
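To make the listing easier to read, here is a minimal Python sketch of how a rule list like this might be evaluated against a URL. It is not the appliance's implementation, and it rests on assumptions the listing does not spell out: lines starting with `#` are comments; `regexp:` and `regexpIgnoreCase:` introduce case-sensitive and case-insensitive regular expressions searched anywhere in the URL; `contains:` is a plain substring test in which `\NNN` is an octal character code; any other line is literal text where a leading `^` anchors the start and a trailing `$` anchors the end, and an unanchored plain line matches as a substring; and the listing writes each regex backslash twice. The names `matches` and `is_excluded` are hypothetical.

```python
import re

REGEX_FLAGS = {"regexp:": 0, "regexpIgnoreCase:": re.IGNORECASE}

def unescape(body: str) -> str:
    # Assumption: the listing writes each regex backslash twice, so "\\1"
    # becomes the backreference "\1" and "\\." becomes an escaped dot.
    return body.replace("\\\\", "\\")

def decode_octal(body: str) -> str:
    # Turn three-digit octal escapes (a backslash followed by e.g. 001)
    # into the single character they name.
    return re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), body)

def matches(url: str, pattern: str) -> bool:
    """Test one rule from the listing against a URL (simplified sketch)."""
    for prefix, flags in REGEX_FLAGS.items():
        if pattern.startswith(prefix):
            regex = unescape(pattern[len(prefix):])
            return re.search(regex, url, flags) is not None
    if pattern.startswith("contains:"):
        return decode_octal(pattern[len("contains:"):]) in url
    # Plain rule: literal text, "^" anchors the start and "$" anchors the end.
    starts, ends = pattern.startswith("^"), pattern.endswith("$")
    body = pattern[1:] if starts else pattern
    body = body[:-1] if ends else body
    if starts and ends:
        return url == body
    if starts:
        return url.startswith(body)
    if ends:
        return url.endswith(body)
    return body in url  # assumption: unanchored plain rules are substrings

def is_excluded(url: str, config: str) -> bool:
    """True if any non-blank, non-comment line of `config` matches `url`."""
    for line in config.splitlines():
        rule = line.strip()
        if rule and not rule.startswith("#") and matches(url, rule):
            return True
    return False

CONFIG = r"""
# comment lines are ignored
regexpIgnoreCase:\\.doc$
regexp:/([^/]*)/\\1/\\1/
contains:_vti_bin
.gif$
"""

print(is_excluded("http://example.com/files/REPORT.DOC", CONFIG))  # True
print(is_excluded("http://example.com/img/logo.GIF", CONFIG))      # False
```

Under these assumptions the second URL is crawled because the plain `.gif$` rule is a case-sensitive literal suffix, while the `regexpIgnoreCase` rule catches `.DOC`.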

# The following are popular filetype extensions - uncomment the lines to
# disable crawling them
# Microsoft Word
#regexpIgnoreCase:\\.doc$
# Microsoft Excel
#regexpIgnoreCase:\\.xls$
#regexpIgnoreCase:\\.xlw$
# Microsoft PowerPoint
#regexpIgnoreCase:\\.ppt$
# Microsoft Access
#regexpIgnoreCase:\\.mdb$
# DBase / Xbase
#regexpIgnoreCase:\\.dbf$
# Adobe Portable Document Format (PDF)
#regexpIgnoreCase:\\.pdf$
# Rich Text Format (RTF)
#regexpIgnoreCase:\\.rtf$
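These document-type rules ship commented out, so by default such files are crawled. The sketch below shows what enabling the PDF rule would do, using Python's `re` as a stand-in for the appliance's regex engine and assuming the listing's `\\.` means an escaped dot; `would_skip` is a hypothetical helper. The match is case-insensitive, and because `$` anchors the end of the whole URL, a PDF served through a query string is not caught.

```python
import re

# "regexpIgnoreCase:\\.pdf$" with the doubled backslash collapsed (assumption).
PDF = re.compile(r"\.pdf$", re.IGNORECASE)

def would_skip(url: str) -> bool:
    return PDF.search(url) is not None

for url in [
    "http://example.com/docs/manual.pdf",               # True
    "http://example.com/docs/MANUAL.PDF",               # True
    "http://example.com/download?file=manual.pdf&v=2",  # False: URL ends in "&v=2"
    "http://example.com/docs/pdf-guide.html",           # False
]:
    print(url, would_skip(url))
```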
# The following are some typical Microsoft SharePoint patterns that you
# may not want to crawl - uncomment the lines to disable crawling them.
#contains:_layouts
#contains:_vti_bin
#WebFldr.aspx$
#Upload.aspx$
#EditForm.aspx$
# These patterns prevent crawling of repetitive URLs
# prevents http://example.com/foo/foo/foo/...
regexp:/([^/]*)/\\1/\\1/
# prevents http://example.com/foo/bar/foo/bar/...
regexp:/([^/]*)/([^/]*)/\\1/\\2/
# prevents http://example.com/foo?bar=1&bar=1&bar=1...
regexp:&([^&]*)&\\1&\\1
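The three loop rules rely on regex backreferences: `\1` must repeat exactly what the first group captured. A quick check with Python's `re`, assuming `\\1` in the listing stands for the backreference `\1` (the name `looks_like_loop` is hypothetical):

```python
import re

# The three repetitive-URL rules, with "\\1" read as the backreference "\1".
LOOP_RULES = [
    re.compile(r"/([^/]*)/\1/\1/"),          # /foo/foo/foo/
    re.compile(r"/([^/]*)/([^/]*)/\1/\2/"),  # /foo/bar/foo/bar/
    re.compile(r"&([^&]*)&\1&\1"),           # &bar=1&bar=1&bar=1
]

def looks_like_loop(url: str) -> bool:
    return any(rule.search(url) for rule in LOOP_RULES)

print(looks_like_loop("http://example.com/foo/foo/foo/page.html"))  # True
print(looks_like_loop("http://example.com/foo/foo/page.html"))      # False
```

A single segment has to appear three times in a row (or a pair of segments twice) before a rule fires, so `/foo/foo/` on its own is still crawled.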
##############################################
# Filetypes we don't crawl
# Images
.gif$
.jpg$
.jpeg$
.png$
# Used instead of jpeg sometimes
.jpe$
.pcx$
.tif$
.tiff$
.bmp$
# Binaries/Executables
regexpIgnoreCase:\\.dll$
regexpIgnoreCase:\\.exe$
.a$
.o$
.so$
.bin$
.class$
.jnilib$
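This section mixes two styles: `regexpIgnoreCase:` rules for `.dll` and `.exe`, and plain entries such as `.a$`. Assuming a plain entry is a literal, case-sensitive suffix (the listing itself does not say so), the dot in `.a$` is a real dot, whereas the same text read as a regex would match any URL ending in some character followed by `a`. The names below are hypothetical.

```python
import re

PLAIN_SUFFIXES = (".a", ".o", ".so", ".bin", ".class", ".jnilib")
IGNORE_CASE_RULES = [re.compile(r"\.dll$", re.I), re.compile(r"\.exe$", re.I)]

def blocked(url: str) -> bool:
    # Assumption: a plain "xxx$" entry is a literal, case-sensitive suffix.
    if url.endswith(PLAIN_SUFFIXES):
        return True
    return any(rule.search(url) for rule in IGNORE_CASE_RULES)

print(blocked("http://example.com/lib/libfoo.a"))          # True
print(blocked("http://example.com/data"))                  # False: no literal ".a"
print(bool(re.search(".a$", "http://example.com/data")))   # True if read as a regex
print(blocked("http://example.com/setup.EXE"))             # True (case-insensitive rule)
print(blocked("http://example.com/libfoo.SO"))             # False (plain rule is case-sensitive)
```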
# Font types
# TrueType font
.ttf$
.pfb$
.pfm$
.afm$
# Mac files
.hqx$
.sea$
.dmg$
# Adobe
#.ps$
#.ps.gz$
#.ps.Z$
.eps$
.ai$
# Media
.ram$
.wav$
.avi$
.mid$
.mov$
.mpg$
.mpeg$
.mp3$
.ogg$
.3gp$
.m4a$
.m4v$
.wma$
.wmv$
.wrl$
# Databases
.dat$
.dta$
.log$
.lst$
# Archives, except .ps.gz, .ps.Z
.bz2$
.jar$
.arj$
.cab$
.rar$
.rpm$
.tar$
.zip$
.tar.gz$
.upp$
.tgz$
.sdd$
regexpIgnoreCase:([^.]..|[^p].|[^s])[.]z$
regexpIgnoreCase:([^.]..|[^p].|[^s])[.]gz$
.lzh$
.msi$
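The two `regexpIgnoreCase` rules in this section implement the "except .ps.gz, .ps.Z" note in its header: the group `([^.]..|[^p].|[^s])` matches unless the three characters right before the final `.z` or `.gz` are exactly `.ps`. A small Python check, with `re.IGNORECASE` standing in for the appliance's engine and a hypothetical `archive_rule_hits` helper:

```python
import re

# The two archive rules from this section, compiled case-insensitively.
Z_RULE = re.compile(r"([^.]..|[^p].|[^s])[.]z$", re.IGNORECASE)
GZ_RULE = re.compile(r"([^.]..|[^p].|[^s])[.]gz$", re.IGNORECASE)

def archive_rule_hits(url: str) -> bool:
    # Each alternative fails only when the characters before ".z"/".gz"
    # spell ".ps", so compressed PostScript stays crawlable.
    return bool(Z_RULE.search(url) or GZ_RULE.search(url))

for url in [
    "http://example.com/src/project.tar.gz",   # True
    "http://example.com/old/notes.Z",          # True
    "http://example.com/papers/figure.ps.gz",  # False
    "http://example.com/papers/FIGURE.PS.Z",   # False
]:
    print(url, archive_rule_hits(url))
```

This is consistent with the commented-out `#.ps.gz$` and `#.ps.Z$` lines under "# Adobe": PostScript files are crawled by default.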
# Linux distribution files
.hdr$
.iso$
.img$
.gpg$
# Google
.gg$
.kml$
.kmz$
.skb$
.skp$
# Others
.gbk$
.fac$
.ghg$
.mdic$
.chm$
.mht$
# Apache directory listings
/?S=A$
/?S=D$
/?D=A$
/?D=D$
/?M=A$
/?M=D$
/?N=A$
/?N=D$
/?C=N&O=A$
/?C=M&O=A$
/?C=S&O=A$
/?C=D&O=A$
/?C=N&O=D$
/?C=M&O=D$
/?C=S&O=D$
/?C=D&O=D$
/?C=N;O=A$
/?C=M;O=A$
/?C=S;O=A$
/?C=D;O=A$
/?C=N;O=D$
/?C=M;O=D$
/?C=S;O=D$
/?C=D;O=D$
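Apache's `mod_autoindex` adds column-sort links to its directory listings, and each link returns the same files in a different order. The 24 entries above are every combination of column (N name, M last modified, S size, D description) and order (A ascending, D descending), in the older `?N=A` form and the newer `?C=N;O=A` form with either `;` or `&` as separator. A Python sketch that regenerates them (names hypothetical):

```python
from itertools import product

COLUMNS = "NMSD"  # name, last modified, size, description
ORDERS = "AD"     # ascending, descending

def autoindex_suffixes() -> list[str]:
    # Older form: the column letter carries the order, e.g. "/?N=D".
    old = [f"/?{c}={o}" for c, o in product(COLUMNS, ORDERS)]
    # Newer form: separate C= and O= arguments joined by "&" or ";".
    new = [f"/?C={c}{sep}O={o}" for sep in "&;" for c, o in product(COLUMNS, ORDERS)]
    return old + new

SUFFIXES = autoindex_suffixes()

def is_sorted_listing(url: str) -> bool:
    return url.endswith(tuple(SUFFIXES))

print(len(SUFFIXES))                                          # 24
print(is_sorted_listing("http://example.com/pub/?C=M;O=D"))   # True
print(is_sorted_listing("http://example.com/pub/"))           # False
```

The unsorted listing at `/pub/` is still crawled once; only its re-sorted duplicates are skipped.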
# Invalid characters
contains:\001
contains:\002
contains:\003
contains:\004
contains:\005
contains:\006
contains:\007
contains:\010
contains:\011
contains:\012
contains:\013
contains:\014
contains:\015
contains:\016
contains:\017
contains:\020
contains:\021
contains:\022
contains:\023
contains:\024
contains:\025
contains:\026
contains:\027
contains:\030
contains:\031
contains:\032
contains:\033
contains:\034
contains:\035
contains:\036
contains:\037
contains:\040
contains:\177
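The octal codes `\001` through `\040` are the ASCII control characters plus the space (decimal 32), and `\177` is DEL, so a URL containing any of them unencoded is skipped; NUL (`\000`) is not in the list. A sketch of the same check in Python with a hypothetical `invalid_chars` helper. Note that a percent-encoded space (`%20`) passes, because only the raw character is listed.

```python
# Octal codes from the listing: \001 through \040, plus \177.
INVALID = {chr(code) for code in range(0o001, 0o040 + 1)} | {chr(0o177)}

def invalid_chars(url: str) -> list[str]:
    """Return the listed characters found in `url`, as sorted hex codes."""
    return sorted({f"0x{ord(ch):02x}" for ch in url if ch in INVALID})

print(invalid_chars("http://example.com/a b"))    # ['0x20']
print(invalid_chars("http://example.com/a%20b"))  # []
```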
# Invalid endings
.html/$
.htm/$
.phtml/$
.ghtml/$
.asp/$
.jsp/$
.shtml/$
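A URL ending in `.html/` treats a page as if it were a directory. One way such URLs can appear, assumed here purely for illustration: a server answers `page.html/` with the page itself, and a relative link on that page then resolves to an ever-longer path. Python's `urljoin` shows the growth; `bad_ending` is a hypothetical helper that applies the suffix rules above.

```python
from urllib.parse import urljoin

INVALID_ENDINGS = (".html/", ".htm/", ".phtml/", ".ghtml/", ".asp/", ".jsp/", ".shtml/")

def bad_ending(url: str) -> bool:
    return url.endswith(INVALID_ENDINGS)

# Hypothetical page reached as ".../page.html/" that links to "page.html/".
base = "http://example.com/docs/page.html/"
link = urljoin(base, "page.html/")

print(link)              # http://example.com/docs/page.html/page.html/
print(bad_ending(link))  # True
print(bad_ending("http://example.com/docs/page.html"))  # False
```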
# Invalid endings
!/
"/
$/
%/
&/
'/
(/
)/
+/
,/
./
</
=/
>/
{/
|/
}/
~/
[/
\\/
]/
^/
`/

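Writing this took real effort. If you repost it, please credit 标点符 and the original post, "Google Search Appliance: Do Not Crawl URLs in These Formats".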
