windtear 追求完美 (Pursuing Perfection): Python code for filtering search engine user agents

July 8, 2006

Python code for filtering search engine user agents

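Copyright notice: this article may be freely reproduced, provided that the reproduction gives the original source, the author information, and this notice in the form of a hyperlink.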
https://windtear.net/archives/2006/07/08/001024.html

Inside one big loop, with the User-Agent string of the current record in `agent`:

	# deal with useragent: skip records that come from search engine spiders
	if 'Yahoo! Slurp' in agent:
		continue
	if 'Baiduspider' in agent:
		continue
	if 'Googlebot' in agent:
		continue

The User-Agent strings sent by these search engine crawlers/spiders are:
Mozilla/5.0 (compatible; Yahoo! Slurp China; http://misc.yahoo.com.cn/help.html)
Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)
Baiduspider+(+http://www.baidu.com/search/spider.htm)
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
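
The loop fragment above can also be packaged as a small standalone script. The following is only a minimal sketch, not the code the site actually runs: the names `SPIDER_MARKERS`, `is_spider` and `filter_agents` are made up for illustration, and the last sample string is an invented browser User-Agent added so that something survives the filter.

```python
# Substrings that identify the three spiders named in this post.
SPIDER_MARKERS = ('Yahoo! Slurp', 'Baiduspider', 'Googlebot')


def is_spider(agent):
    """Return True if the User-Agent string contains one of the markers."""
    return any(marker in agent for marker in SPIDER_MARKERS)


def filter_agents(agents):
    """Return the User-Agent strings that do not match any listed spider."""
    return [agent for agent in agents if not is_spider(agent)]


if __name__ == '__main__':
    sample = [
        'Mozilla/5.0 (compatible; Yahoo! Slurp China; http://misc.yahoo.com.cn/help.html)',
        'Baiduspider+(+http://www.baidu.com/search/spider.htm)',
        'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)',
        # Invented browser string, for illustration only.
        'Mozilla/5.0 (Windows; U) Firefox/1.5',
    ]
    for agent in filter_agents(sample):
        print(agent)
```

As in the original fragment, the match is a case-sensitive substring test, so a marker such as `Yahoo! Slurp` covers both the China and the US variant of that spider.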

Posted by windtear at July 8, 2006 11:48 PM

For any problems using this site, please contact windtear @ windtear.net
Copyright© 1999-2024 Windtear. All rights reserved.