Published 2016-01-15 03:34:02 | Source: PHPERZ


Scrapy: a crawler framework for Python

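Scrapy is a fast, high-level screen-scraping and web-crawling framework written in Python. It crawls web sites and extracts structured data from their pages. Scrapy has a wide range of uses, including data mining, monitoring, and automated testing.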


import re

import scrapy


class TestSmtSpider(scrapy.Spider):
    name = "test_smt"
    # allowed_domains takes bare domain names, not full URLs
    # allowed_domains = ["www.maiziedu.com"]
    start_urls = ('http://www.maiziedu.com/',)

    def parse(self, response):
        # Post the credentials directly to the login endpoint
        return scrapy.FormRequest(url='http://www.maiziedu.com/user/login/',
                                  formdata={'account_l': '******',
                                            'password_l': '****'},
                                  callback=self.after_l)

    def after_l(self, response):
        print(response.body)
        # Request a page that is only available after logging in
        return scrapy.Request(url='http://www.maiziedu.com/user/center/?source=login',
                              callback=self.after_lo)

    def after_lo(self, response):
        rel = r'class="dt-username"([\s\S]*?)v5-icon v5-icon-rd'
        # re_fuc is not defined in the original post; re.findall is assumed
        # here to be an equivalent stand-in
        my_name = re.findall(rel, response.text)[0]
        print('*******************', my_name)
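The extraction step in after_lo can be tried without running the spider. The following is a minimal sketch, assuming the undefined re_fuc helper behaves like re.findall. The HTML fragment and the function name extract_username are invented for illustration, and the real page markup may differ.

```python
import re

# Pattern copied from the spider: capture everything between the
# dt-username class attribute and the next "v5-icon v5-icon-rd" marker.
PATTERN = r'class="dt-username"([\s\S]*?)v5-icon v5-icon-rd'

# Hypothetical fragment standing in for the user-center page.
SAMPLE_HTML = '<a class="dt-username">alice<i class="v5-icon v5-icon-rd"></i></a>'


def extract_username(html):
    """Return the first raw capture, or None when the pattern does not match."""
    matches = re.findall(PATTERN, html)
    # Indexing [0] on an empty list raises IndexError, which is what would
    # happen in the spider if the login had failed, so check first.
    return matches[0] if matches else None


print(repr(extract_username(SAMPLE_HTML)))
print(extract_username("<p>not logged in</p>"))
```

Note that the lazy group captures the markup between the two markers as well as the name, so the result usually needs further cleaning. Returning None on a miss gives a simple way to detect a failed login.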

Another approach:

It is usual for web sites to provide pre-populated form fields through <input type="hidden"> elements, such as session related data or authentication tokens (for login pages). When scraping, you’ll want these fields to be automatically pre-populated and only override a couple of them, such as the user name and password. You can use the FormRequest.from_response() method for this job. Here’s an example spider which uses it:

import scrapy


class LoginSpider(scrapy.Spider):
    name = 'example.com'
    start_urls = ['http://www.example.com/users/login.php']

    def parse(self, response):
        return scrapy.FormRequest.from_response(
            response,
            formdata={'username': 'john', 'password': 'secret'},
            callback=self.after_login
        )

    def after_login(self, response):
        # check that login succeeded before going on
        # response.body is bytes, so compare against a bytes literal
        if b"authentication failed" in response.body:
            self.logger.error("Login failed")
            return
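To make the idea of pre-populated hidden fields concrete, here is a standard-library sketch of collecting them and merging in overrides. It is not Scrapy's implementation: FormRequest.from_response() does more, such as choosing which form to use and handling other input types. The page markup, the csrf_token field, and the names HiddenInputCollector and build_form_data are all assumptions made for this example.

```python
from html.parser import HTMLParser


class HiddenInputCollector(HTMLParser):
    """Collect name/value pairs from every <input type="hidden"> element."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def build_form_data(html, overrides):
    """Merge the page's hidden fields with caller-supplied values."""
    collector = HiddenInputCollector()
    collector.feed(html)
    data = dict(collector.fields)
    data.update(overrides)  # explicit values win over pre-populated ones
    return data


# Hypothetical login page; the field names are invented.
LOGIN_PAGE = """
<form action="/users/login.php" method="post">
  <input type="hidden" name="csrf_token" value="abc123">
  <input type="text" name="username">
  <input type="password" name="password">
</form>
"""

form_data = build_form_data(LOGIN_PAGE, {"username": "john", "password": "secret"})
print(form_data)
# → {'csrf_token': 'abc123', 'username': 'john', 'password': 'secret'}
```

The token from the page is carried along without the caller knowing its name. Posting to the login URL with a plain FormRequest, as in the first approach, omits any such hidden token.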


