Published 2016-12-07 02:38:46 | Source: reader submission
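Scrapy: a crawler framework for Python
Scrapy is a fast, high-level screen scraping and web crawling framework written in Python, used to crawl websites and extract structured data from their pages. It serves a wide range of purposes, including data mining, monitoring, and automated testing.
Scrapy 1.2.2 has been released.
Changes:
Bug fixes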
Fix a cryptic traceback when a pipeline fails on open_spider() (issue 2011)
Fix embedded IPython shell variables (fixing issue 396 that re-appeared in 1.2.0, fixed in issue 2418)
A couple of patches when dealing with robots.txt:
handle (non-standard) relative sitemap URLs (issue 2390)
handle non-ASCII URLs and User-Agents in Python 2 (issue 2373)
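To illustrate what handling a relative sitemap URL involves, here is a minimal sketch that resolves Sitemap entries against the location of the robots.txt file itself. It is not Scrapy's actual implementation; the function name sitemap_urls and the example.com URLs are hypothetical, and only the Python 3 standard library is used.

```python
from urllib.parse import urljoin


def sitemap_urls(robots_url, robots_body):
    """Yield absolute sitemap URLs found in the text of a robots.txt file.

    Hypothetical helper for illustration only, not part of Scrapy.
    """
    for line in robots_body.splitlines():
        # Split only on the first colon so that "http://..." values survive.
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap":
            # urljoin leaves absolute URLs untouched and resolves
            # relative ones against the robots.txt location.
            yield urljoin(robots_url, value.strip())


body = "User-agent: *\nSitemap: /sitemap.xml\nSitemap: http://example.com/other.xml\n"
print(list(sitemap_urls("http://example.com/robots.txt", body)))
```

The standard expects absolute sitemap URLs, so resolving relative ones against the robots.txt URL is a lenient reading of non-standard input.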
Documentation
Document the "download_latency" key in Request's meta dict (issue 2033)
Remove page on (deprecated & unsupported) Ubuntu packages from ToC (issue 2335)
A few fixed typos (issue 2346, issue 2369, issue 2380) and clarifications (issue 2354, issue 2325, issue 2414)
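As a small illustration of the "download_latency" meta key mentioned above, the sketch below formats the value for logging. It assumes the key holds the download time in seconds and may be missing from the dict; the helper name format_latency is hypothetical and not part of Scrapy.

```python
def format_latency(meta):
    """Return the download latency as a readable string, or None if absent.

    Assumes meta["download_latency"] is a number of seconds.
    Hypothetical helper for illustration only, not part of Scrapy.
    """
    latency = meta.get("download_latency")
    if latency is None:
        # The key is only present once a response has been downloaded.
        return None
    return "%.0f ms" % (latency * 1000)


print(format_latency({"download_latency": 0.25}))  # prints "250 ms"
print(format_latency({}))                          # prints "None"
```

In a spider callback, the dict passed in would typically be response.meta.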
Other changes
Advertise conda-forge as Scrapy's official conda channel (issue 2387)
More helpful error messages when trying to use .css() or .xpath() on non-Text Responses (issue 2264)
startproject command now generates a sample middlewares.py file (issue 2335)
Add more dependencies' version info in scrapy version verbose output (issue 2404)
Remove all *.pyc files from source distribution (issue 2386)
Download