Published 2015-12-31 00:22:59 | Source: reader submission
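Scrapy: a crawling framework for Python
Scrapy is a fast, high-level screen-scraping and web-crawling framework written in Python. It crawls websites and extracts structured data from their pages, and it can be used for data mining, monitoring, and automated testing.
Scrapy 1.0.4 has been released with the following changes: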
Ignoring xlib/tx folder, depending on Twisted version. (:commit:`7dfa979`)
Run on new travis-ci infra (:commit:`6e42f0b`)
Spelling fixes (:commit:`823a1cc`)
escape nodename in xmliter regex (:commit:`da3c155`)
test xml nodename with dots (:commit:`4418fc3`)
TST don't use broken Pillow version in tests (:commit:`a55078c`)
disable log on version command. closes #1426 (:commit:`86fc330`)
disable log on startproject command (:commit:`db4c9fe`)
Add PyPI download stats badge (:commit:`df2b944`)
don't run tests twice on Travis if a PR is made from a scrapy/scrapy branch (:commit:`a83ab41`)
Add Python 3 porting status badge to the README (:commit:`73ac80d`)
fixed RFPDupeFilter persistence (:commit:`97d080e`)
TST a test to show that dupefilter persistence is not working (:commit:`97f2fb3`)
explicit close file on file:// scheme handler (:commit:`d9b4850`)
Disable dupefilter in shell (:commit:`c0d0734`)
DOC: Add captions to toctrees which appear in sidebar (:commit:`aa239ad`)
DOC Removed pywin32 from install instructions as it's already declared as dependency. (:commit:`10eb400`)
Added installation notes about using Conda for Windows and other OSes. (:commit:`1c3600a`)
Fixed minor grammar issues. (:commit:`7f4ddd5`)
fixed a typo in the documentation. (:commit:`b71f677`)
Version 1 now exists (:commit:`5456c0e`)
fix another invalid xpath error (:commit:`0a1366e`)
fix ValueError: Invalid XPath: //div/[id="not-exists"]/text() on selectors.rst (:commit:`ca8d60f`)
Typos corrections (:commit:`7067117`)
fix typos in downloader-middleware.rst and exceptions.rst, middlware -> middleware (:commit:`32f115c`)
Add note to ubuntu install section about debian compatibility (:commit:`23fda69`)
Replace alternative OSX install workaround with virtualenv (:commit:`98b63ee`)
Reference Homebrew's homepage for installation instructions (:commit:`1925db1`)
Add oldest supported tox version to contributing docs (:commit:`5d10d6d`)
Note in install docs about pip being already included in python>=2.7.9 (:commit:`85c980e`)
Add non-python dependencies to Ubuntu install section in the docs (:commit:`fbd010d`)
Add OS X installation section to docs (:commit:`d8f4cba`)
DOC(ENH): specify path to rtd theme explicitly (:commit:`de73b1a`)
minor: scrapy.Spider docs grammar (:commit:`1ddcc7b`)
Make common practices sample code match the comments (:commit:`1b85bcf`)
nextcall repetitive calls (heartbeats). (:commit:`55f7104`)
Backport fix compatibility with Twisted 15.4.0 (:commit:`b262411`)
pin pytest to 2.7.3 (:commit:`a6535c2`)
Merge pull request #1512 from mgedmin/patch-1 (:commit:`8876111`)
Merge pull request #1513 from mgedmin/patch-2 (:commit:`5d4daf8`)
Typo (:commit:`f8d0682`)
Fix list formatting (:commit:`5f83a93`)
fix scrapy squeue tests after recent changes to queuelib (:commit:`3365c01`)
Merge pull request #1475 from rweindl/patch-1 (:commit:`2d688cd`)
Update tutorial.rst (:commit:`fbc1f25`)
Merge pull request #1449 from rhoekman/patch-1 (:commit:`7d6538c`)
Small grammatical change (:commit:`8752294`)
Add openssl version to version command (:commit:`13c45ac`)
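To illustrate why an entry such as "escape nodename in xmliter regex" matters, here is a minimal sketch. It is not Scrapy's actual code: the helper `node_pattern` and the sample XML are hypothetical, and the sketch only assumes that the node name is interpolated into a regular expression. An unescaped dot in a node name like `a.b` acts as a wildcard, so the pattern can also match unrelated nodes.

```python
import re


def node_pattern(nodename, escape=True):
    """Build a regex that matches <nodename ...>...</nodename> blocks.

    Hypothetical helper for illustration only; Scrapy's own pattern may differ.
    """
    name = re.escape(nodename) if escape else nodename
    return re.compile(r"<%(n)s[\s>].*?</%(n)s>" % {"n": name}, re.DOTALL)


# Sample document with a dotted node name and a look-alike node.
xml = "<root><a.b>keep</a.b><aXb>other</aXb></root>"

# Without escaping, "." matches any character, so <aXb> is picked up too.
unescaped = [m.group() for m in node_pattern("a.b", escape=False).finditer(xml)]

# With re.escape, only the literal node name "a.b" matches.
escaped = [m.group() for m in node_pattern("a.b").finditer(xml)]

print(unescaped)
print(escaped)
```

Calling `re.escape` on any user-supplied name before building a pattern is the general remedy. The companion entry "test xml nodename with dots" suggests the fix was covered by a test for this case.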
For more details, see: news.rst
Download: 1.0.4
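Scrapy is an asynchronous crawling framework built on Twisted and implemented in pure Python. Users only need to customize a few modules to build a crawler that fetches web page content and images, which makes it very convenient.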