If you are working on a UNIX platform, it is best to install IPython; Scrapy will use it automatically when it is available. If IPython cannot be used, you can fall back to bpython. To force a particular shell, set the SCRAPY_PYTHON_SHELL environment variable, or define it in your scrapy.cfg:

[settings]
shell = bpython

To launch the Scrapy shell, run the following command, where <url> is the URL you want to scrape:

scrapy shell <url>
The Scrapy shell provides the following shortcuts (a quick illustration follows the table):

| S.N. | Shortcut & Description |
|---|---|
| 1 | shelp() — Prints help about the available objects and shortcuts |
| 2 | fetch(request_or_url) — Fetches a new response from the given request or URL and updates the related objects accordingly |
| 3 | view(response) — Opens the response for the given request in your local browser for inspection; it appends a base tag to the response body so that external links display correctly |
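As a quick illustration of these shortcuts, here is a sketch of a shell session that fetches a page from an explicit Request object and then opens the result in a browser (the custom User-Agent value is just an arbitrary example, and the output shown is illustrative):

>>> from scrapy import Request
>>> fetch(Request('http://scrapy.org', headers={'User-Agent': 'my-test-agent'}))
>>> response.status
200
>>> view(response)
True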
Besides these shortcuts, the shell also exposes the following objects (a short settings example follows the table):

| S.N. | Object & Description |
|---|---|
| 1 | crawler — The current Crawler object |
| 2 | spider — The spider handling the current URL; if there is no spider defined for the URL, a default Spider object is used |
| 3 | request — The Request object of the last fetched page |
| 4 | response — The Response object of the last fetched page |
| 5 | settings — The current Scrapy settings |
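For instance, the settings object behaves like any Scrapy Settings instance, so you can query individual options directly from the shell (the values shown here are only illustrative and depend on your project configuration):

>>> settings.get('USER_AGENT')
'Scrapy/1.1 (+http://scrapy.org)'
>>> settings.getbool('ROBOTSTXT_OBEY')
False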
For example, start the shell for the scrapy.org site with logging suppressed:

scrapy shell 'http://scrapy.org' --nolog

The shell prints a banner listing the available objects and shortcuts (the exact object addresses will differ):

[s] Available Scrapy objects:
[s]   crawler    <scrapy.crawler.Crawler object at 0x...>
[s]   item       {}
[s]   request    <GET http://scrapy.org>
[s]   response   <200 http://scrapy.org>
[s]   settings   <scrapy.settings.Settings object at 0x...>
[s]   spider     <Spider 'default' at 0x...>
[s] Useful shortcuts:
[s]   shelp()           Shell help (print this help)
[s]   fetch(req_or_url) Fetch request (or URL) and update local objects
[s]   view(response)    View response in a browser
>>> response.xpath('//title/text()').extract_first()
u'Scrapy | A Fast and Powerful Scraping and Web Crawling Framework'
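The same title can also be extracted with a CSS selector; response.css() works in the shell exactly like response.xpath():

>>> response.css('title::text').extract_first()
u'Scrapy | A Fast and Powerful Scraping and Web Crawling Framework'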
Without leaving the shell, you can fetch another page; the request and response objects are updated accordingly:

>>> fetch("http://reddit.com")
[s] Available Scrapy objects:
[s]   crawler    <scrapy.crawler.Crawler object at 0x...>
[s]   item       {}
[s]   request    <GET http://reddit.com>
[s]   response   <200 https://www.reddit.com/>
[s]   settings   <scrapy.settings.Settings object at 0x...>
[s]   spider     <Spider 'default' at 0x...>
[s] Useful shortcuts:
[s]   shelp()           Shell help (print this help)
[s]   fetch(req_or_url) Fetch request (or URL) and update local objects
[s]   view(response)    View response in a browser
>>> response.xpath('//title/text()').extract()
[u'reddit: the front page of the internet']
You can also modify the last request, for example replaying it as a POST:

>>> request = request.replace(method="POST")
>>> fetch(request)
[s] Available Scrapy objects:
[s]   crawler    <scrapy.crawler.Crawler object at 0x...>
...
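Replaying the request as a POST keeps the original (empty) body. To send actual form data you would normally build a FormRequest yourself and hand it to fetch(); a minimal sketch, using httpbin.org purely as a hypothetical test endpoint:

>>> from scrapy import FormRequest
>>> req = FormRequest('http://httpbin.org/post', formdata={'q': 'scrapy'})
>>> fetch(req)
>>> response.status
200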
You can also invoke the shell from inside a spider, to inspect a particular response while the crawl is running, using scrapy.shell.inspect_response:

import scrapy

class SpiderDemo(scrapy.Spider):
    name = "spiderdemo"
    start_urls = [
        "http://yiibai.com",
        "http://yiibai.org",
        "http://yiibai.net",
    ]

    def parse(self, response):
        # Inspect one specific response in the Scrapy shell
        if ".org" in response.url:
            from scrapy.shell import inspect_response
            inspect_response(response, self)
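To see inspect_response in action you have to actually run the spider. Inside a Scrapy project that is simply scrapy crawl spiderdemo; as a standalone sketch (assuming the SpiderDemo class above is defined in the same file), you can also drive it with CrawlerProcess:

# Standalone runner sketch for the SpiderDemo spider defined above
from scrapy.crawler import CrawlerProcess

process = CrawlerProcess()
process.crawl(SpiderDemo)
process.start()  # blocks; when the ".org" response arrives, inspect_response drops you into a shell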
When the spider is run and reaches the matching URL, scrapy.shell.inspect_response opens a shell for that response, and you will see output similar to the following (the GET lines correspond to the three start URLs):

2016-02-08 18:15:20-0400 [scrapy] DEBUG: Crawled (200) <GET http://yiibai.com> (referer: None)
2016-02-08 18:15:20-0400 [scrapy] DEBUG: Crawled (200) <GET http://yiibai.net> (referer: None)
2016-02-08 18:15:20-0400 [scrapy] DEBUG: Crawled (200) <GET http://yiibai.org> (referer: None)
[s] Available Scrapy objects:
[s]   crawler    <scrapy.crawler.Crawler object at 0x...>
...
Inside this shell you can check which response you are looking at, test your extraction code, and open the page in a browser:

>>> response.url
'http://yiibai.org'
>>> response.xpath('//div[@class="val"]')
[]
>>> view(response)
True

The empty list shows that the XPath does not match anything on this page, while view(response) returns True after opening the response in your browser.
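From the same shell you can also check what the server actually returned before adjusting your selectors; for example (hypothetical output):

>>> response.status
200
>>> response.headers['Content-Type']
'text/html; charset=utf-8'

When you are done inspecting, exit the shell with Ctrl-D (Ctrl-Z on Windows) and the crawl resumes where it stopped; note that the fetch() shortcut is not available here, because the Scrapy engine is blocked while the shell is open.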