Published 2016-05-12 00:21:46 | Source: reader submission


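Scrapy: a crawling framework for Python

Scrapy is a fast, high-level screen-scraping and web-crawling framework written in Python, used to crawl websites and extract structured data from their pages. It serves a wide range of purposes, including data mining, monitoring, and automated testing.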


Scrapy 1.1.0 has been released.

The changes are as follows:

  • Scrapy 1.1 has beta Python 3 support (requires Twisted >= 15.5). See :ref:`news_betapy3` for more details and some limitations.

  • Hot new features:

  • These bug fixes may require your attention:

    • Don't retry bad requests (HTTP 400) by default (:issue:`1289`). If you need the old behavior, add 400 to :setting:`RETRY_HTTP_CODES`.

    • Fix shell files argument handling (:issue:`1710`, :issue:`1550`). If you run scrapy shell index.html, it will try to load the URL http://index.html; use scrapy shell ./index.html to load a local file.

    • Robots.txt compliance is now enabled by default for newly created projects (:issue:`1724`). Scrapy will also wait for robots.txt to be downloaded before proceeding with the crawl (:issue:`1735`). If you want to disable this behavior, update :setting:`ROBOTSTXT_OBEY` in the settings.py file after creating a new project.

    • Exporters now work on unicode, instead of bytes by default (:issue:`1080`). If you use PythonItemExporter, you may want to update your code to disable binary mode which is now deprecated.

    • Accept XML node names containing dots as valid (:issue:`1533`).

    • When uploading files or images to S3 (with FilesPipeline or ImagesPipeline), the default ACL policy is now "private" instead of "public". Warning: backwards incompatible! You can use :setting:`FILES_STORE_S3_ACL` to change it.

    • We've reimplemented canonicalize_url() for more correct output, especially for URLs with non-ASCII characters (:issue:`1947`). This could change link extractors' output compared to previous Scrapy versions. This may also invalidate some cache entries you could still have from pre-1.1 runs. Warning: backwards incompatible!
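
Several of the fixes above point to a setting for projects that want the pre-1.1 behavior back. A minimal sketch of how those overrides might look in a project's settings.py follows. The setting names come from the notes above; the base list of retry codes and the 'public-read' ACL value are assumptions, not taken from this announcement, so check them against the documentation for your installed Scrapy version and for S3.

```python
# settings.py (sketch): opt back into selected pre-1.1 behaviors.

# Retry HTTP 400 again. The first five codes are an ASSUMED default list
# for Scrapy 1.1; only the trailing 400 is the change described above.
RETRY_HTTP_CODES = [500, 502, 503, 504, 408, 400]

# New projects obey robots.txt by default; set to False to disable that.
ROBOTSTXT_OBEY = False

# Uploads to S3 are now private by default. 'public-read' is ASSUMED here
# as the S3 canned ACL corresponding to the old "public" behavior.
FILES_STORE_S3_ACL = 'public-read'
```

Each line is independent, so a project can keep only the overrides it actually needs and leave the new defaults in place for the rest.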

