Processing Web Data in R

Published 2016-01-02 09:41:28 | Source: compiled from the web

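Many websites publish data for their users to consume. For example, the World Health Organization (WHO) provides reports on health and medical information as CSV, TXT, and XML files. With an R program, we can extract specific data from such websites programmatically. Several R packages are used to collect data from the web: "RCurl", "XML", and "stringr". They are used to connect to URLs, identify the links to the required files, and download those files to the local environment.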

Installing the R Packages

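The following packages are required for processing URLs and links to files. If they are not available in your R environment, install them with the commands below.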

install.packages("RCurl")
install.packages("XML")
install.packages("stringr")
install.packages("plyr")
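
Calling install.packages() unconditionally reinstalls packages that are already present. A minimal sketch of installing only what is missing is shown here. The helper name missing_packages is hypothetical and not part of any package, and the install line is left commented out so the snippet runs without network access.

```r
# Packages used in this tutorial.
required <- c("RCurl", "XML", "stringr", "plyr")

# Hypothetical helper: return the subset of `pkgs` that is not installed yet.
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

to_install <- missing_packages(required)
print(to_install)

# Uncomment to install whatever is missing (needs network access).
# if (length(to_install) > 0) install.packages(to_install)
```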

Input Data

We will visit the weather data URL (http://www.geos.ed.ac.uk/~weather/jcmb_ws/) and use R to download the CSV files for the year 2015.

Example

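We will use the function getHTMLLinks() to gather the URLs of the files. Then we will use the function download.file() to save the files to the local system. Because the same code has to be applied again and again to download multiple files, we will create a function that can be called multiple times. The file names are passed to this function as parameters in the form of an R list object.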
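
The link-gathering and filtering steps can be tried offline before touching the live site. The sketch below assumes the XML and stringr packages are installed. The HTML snippet is made up for illustration and is not the real page content.

```r
library(XML)      # htmlParse(), getHTMLLinks()
library(stringr)  # str_detect()

# Made-up HTML standing in for the directory listing page (an assumption).
html_text <- paste0(
  "<html><body>",
  "<a href=\"JCMB_2014.csv\">2014</a>",
  "<a href=\"JCMB_2015.csv\">2015</a>",
  "<a href=\"JCMB_2015_Jan.csv\">Jan 2015</a>",
  "</body></html>"
)

# Parse the text as HTML and collect every href found in it.
doc <- htmlParse(html_text, asText = TRUE)
links <- getHTMLLinks(doc)

# Keep only the links whose names contain "JCMB_2015".
filenames <- links[str_detect(links, "JCMB_2015")]
print(filenames)
```

The same two calls, getHTMLLinks() followed by str_detect(), are what the full example applies to the real URL.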

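# Load the packages used below.
library(XML)      # getHTMLLinks()
library(stringr)  # str_detect(), str_c()
library(plyr)     # l_ply()

# Read the URL.
url <- "http://www.geos.ed.ac.uk/~weather/jcmb_ws/"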

# Gather the html links present in the webpage.
links <- getHTMLLinks(url)

# Identify only the links which point to the JCMB 2015 files. 
filenames <- links[str_detect(links, "JCMB_2015")]

# Store the file names as a list.
filenames_list <- as.list(filenames)

# Create a function to download one file, given the base URL and a file name.
downloadcsv <- function(mainurl, filename) {
  filedetails <- str_c(mainurl, filename)
  download.file(filedetails, filename)
}

# Apply the function to every file name with l_ply() and save the files
# into the current R working directory.
l_ply(filenames_list, downloadcsv, mainurl = url)
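
A single failed download stops the loop above. The following is a minimal sketch of a more defensive variant in base R. The helper name safe_download and its dest_dir parameter are hypothetical, and the demonstration uses a file:// URL pointing at a temporary folder (assuming a Unix-like path layout) so that it runs without network access.

```r
# Hypothetical helper: download one file, skipping files that already exist
# and reporting failures instead of stopping the whole run.
safe_download <- function(mainurl, filename, dest_dir = ".") {
  destfile <- file.path(dest_dir, filename)
  if (file.exists(destfile)) {
    return(TRUE)  # already downloaded, nothing to do
  }
  tryCatch({
    status <- download.file(paste0(mainurl, filename), destfile,
                            mode = "wb", quiet = TRUE)
    status == 0   # download.file() returns 0 on success
  }, warning = function(w) {
    message("Warning for ", filename, ": ", conditionMessage(w))
    FALSE
  }, error = function(e) {
    message("Failed to download ", filename, ": ", conditionMessage(e))
    FALSE
  })
}

# Offline demonstration: a temporary folder plays the role of the website.
src_dir <- tempfile("src")
dir.create(src_dir)
dest_dir <- tempfile("dest")
dir.create(dest_dir)
writeLines(c("a,b", "1,2"), file.path(src_dir, "JCMB_2015_demo.csv"))

demo_url <- paste0("file://", normalizePath(src_dir), "/")
ok <- safe_download(demo_url, "JCMB_2015_demo.csv", dest_dir)
print(ok)
```

Against the live site, the helper could be applied to each file name in turn, for example with vapply(filenames, function(f) safe_download(url, f), logical(1)), which also yields a success flag per file.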

Verifying the File Download

After running the code above, the following files can be found in the current R working directory.

"JCMB_2015.csv"     "JCMB_2015_Apr.csv" "JCMB_2015_Feb.csv" "JCMB_2015_Jan.csv" "JCMB_2015_Mar.csv"
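
The check can also be done from within R rather than by browsing the folder. A minimal sketch, assuming the files follow the JCMB_2015 naming seen above: the helper name list_jcmb_files is hypothetical, and the demonstration uses empty placeholder files in a temporary folder.

```r
# Hypothetical helper: list the downloaded JCMB 2015 CSV files in a folder.
list_jcmb_files <- function(dir = ".") {
  list.files(dir, pattern = "^JCMB_2015.*\\.csv$")
}

# Offline demonstration with empty placeholder files.
demo_dir <- tempfile("jcmb")
dir.create(demo_dir)
file.create(file.path(demo_dir,
                      c("JCMB_2015.csv", "JCMB_2015_Jan.csv", "notes.txt")))

found <- list_jcmb_files(demo_dir)
print(found)
```

Calling list_jcmb_files() with no argument inspects the current working directory, which is where the example saves its downloads.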
