Published 2016-10-24 02:17:33 | Source: reader submission
jsoup HTML parser
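jsoup is an HTML parser for Java that can parse HTML directly from a URL or from a string of HTML text. It provides a very convenient API for extracting and manipulating data through DOM traversal, CSS selectors, and jQuery-like methods.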
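As a rough illustration of the API described above, the following minimal sketch parses an HTML string and pulls out link targets with a CSS selector. The class name `JsoupSelectExample`, the helper `linkTargets`, and the sample markup are made up for this example; only `Jsoup.parse`, `Document.select`, and `Element.attr` come from jsoup itself.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.List;

public class JsoupSelectExample {
    // Hypothetical helper: returns the href attribute of every element
    // matched by the given CSS query.
    static List<String> linkTargets(String html, String cssQuery) {
        Document doc = Jsoup.parse(html);        // build a DOM from an HTML string
        Elements links = doc.select(cssQuery);   // jQuery-like CSS selection
        List<String> hrefs = new ArrayList<>();
        for (Element link : links) {
            hrefs.add(link.attr("href"));        // read the raw attribute value
        }
        return hrefs;
    }

    public static void main(String[] args) {
        // Sample markup invented for the example.
        String html = "<div id=nav><a href='/home'>Home</a><a href='/docs'>Docs</a></div>"
                + "<p><a href='/other'>Other</a></p>";
        // Only links inside the #nav element are selected.
        System.out.println(linkTargets(html, "#nav a[href]"));
    }
}
```

To fetch and parse a live page instead of a string, `Jsoup.connect(url).get()` returns the same kind of `Document`.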
jsoup 1.10.1 has been released. The changes are as follows:
Improvements
Improved support for extended HTML entities, including supplemental characters and multiple character references. Also reduced memory consumption of the entity tables.
Added support for *|E wildcard namespace selectors.
Added support for setting multiple connection headers in Jsoup.connect at once with Connection.headers(Map).
Added support for setting/overriding the response character set in Connection.Response, for cases where the charset is not defined by the server, or is defined incorrectly.
Improved the performance of class selectors by reducing memory allocation and garbage collection.
Improved performance of HTML output by reducing the creation of temporary attribute list iterators.
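Two of the additions listed above, the `*|E` wildcard namespace selector and `Connection.headers(Map)`, can be sketched roughly as follows. This is an illustrative sketch rather than code from the release: the class name `Jsoup1101Example`, both helper methods, the sample markup, and the `example.com` URL are assumptions made for the example. No network request is sent, because the connection is only configured and never executed.

```java
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

import java.util.LinkedHashMap;
import java.util.Map;

public class Jsoup1101Example {
    // Hypothetical helper: counts elements named <prefix:localName>
    // for any namespace prefix, using the *|E selector.
    static int countAnyNamespace(String html, String localName) {
        Document doc = Jsoup.parse(html);
        return doc.select("*|" + localName).size();
    }

    // Hypothetical helper: configures a connection with several headers
    // in one call. The request is NOT executed here.
    static Connection withHeaders(String url, Map<String, String> headers) {
        return Jsoup.connect(url).headers(headers);
    }

    public static void main(String[] args) {
        // Sample markup invented for the example: two prefixed <name> tags.
        String html = "<div><fb:name>Alice</fb:name><og:name>Bob</og:name></div>";
        System.out.println(countAnyNamespace(html, "name"));

        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Accept-Language", "en");
        headers.put("X-Requested-With", "XMLHttpRequest");
        Connection con = withHeaders("http://example.com/", headers);
        // Inspect the configured request without sending it.
        System.out.println(con.request().header("Accept-Language"));
    }
}
```

Before this release the same headers would have needed one `header(name, value)` call each; passing a map is mainly a convenience when headers are built up elsewhere.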
Fixes
Fixed an issue when converting to the W3CDom XML, where valid (but ugly) HTML attribute names containing characters like " could not be converted into valid XML attribute names. These attribute names are now normalized if possible, or not added to the XML DOM.
Fixed an OOB exception when loading an empty-body URL and parsing with the XML parser.
Fixed an issue where attribute names starting with a slash would be parsed incorrectly.
Charset encoders from OutputSettings are no longer reused, which makes output thread-safe.
Fixed an issue in connections with a requestBody where a custom content-type header could be ignored.
Download: