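Let's Talk About Jsoup: The Jsoup Parser

11th iThome Ironman Contest

DAY 24

Self-Challenge Category

Part 24 of the series "Talking About Android Components with Kotlin, Maybe Some Applications, Maybe Some Tech Chit-Chat"

Tags: 11th Ironman Contest, jsoup, kotlin

larsnoya

Team 好想工作室 v3.0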

2019-10-09 09:55:00

As mentioned before, the Jsoup parser gives you two ways to find elements in a parsed document: DOM methods and Select.

DOM

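    import java.io.File;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    // Parse a local file (throws IOException); the base URI is used to resolve relative links
    File input = new File("/tmp/input.html");
    Document doc = Jsoup.parse(input, "UTF-8", "http://example.com/");
    Element content = doc.getElementById("content"); // null if there is no id="content"
    Elements links = content.getElementsByTag("a");

    for (Element link : links) {
        String linkHref = link.attr("href"); // raw value of the href attribute
        String linkText = link.text();       // combined text of the link
    }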

To look up elements, you can use:

getElementById(String id)
getElementsByTag(String tag)
getElementsByClass(String className)
getElementsByAttribute(String key) (and related methods)
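Here is a minimal sketch that uses these lookup methods side by side. It assumes the jsoup library (org.jsoup:jsoup) is on the classpath. The class name DomLookupDemo, its helper methods, and the sample HTML are hypothetical and exist only for illustration. The sketch parses a String with Jsoup.parse(html, baseUri) instead of a file, so it runs without /tmp/input.html.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class DomLookupDemo {
    // Hypothetical sample markup, parsed from a String instead of a file
    static final String HTML =
        "<div id=\"content\">"
        + "<p class=\"intro\">Hello</p>"
        + "<a href=\"/a\">A</a>"
        + "<a class=\"intro\" href=\"/b\">B</a>"
        + "<span title=\"tip\">S</span>"
        + "</div>";

    // The base URI lets absUrl() resolve relative hrefs
    static Document parse() {
        return Jsoup.parse(HTML, "http://example.com/");
    }

    // getElementById + getElementsByTag: absolute URLs of the links inside #content
    static List<String> linkUrls() {
        Element content = parse().getElementById("content"); // null if the id is missing
        List<String> urls = new ArrayList<>();
        for (Element link : content.getElementsByTag("a")) {
            urls.add(link.absUrl("href"));
        }
        return urls;
    }

    // getElementsByClass: every element carrying the given class
    static int countByClass(String className) {
        return parse().getElementsByClass(className).size();
    }

    // getElementsByAttribute: every element that has the attribute, whatever its value
    static int countByAttribute(String key) {
        return parse().getElementsByAttribute(key).size();
    }

    public static void main(String[] args) {
        System.out.println(linkUrls());                // [http://example.com/a, http://example.com/b]
        System.out.println(countByClass("intro"));     // 2
        System.out.println(countByAttribute("title")); // 1
    }
}
```

absUrl("href") resolves each relative link against the base URI passed to parse(). Calling attr("href") instead would return the raw values /a and /b.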

Select

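    import java.io.File;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    File input = new File("/tmp/input.html");
    Document doc = Jsoup.parse(input, "UTF-8", "http://example.com/");

    Elements links = doc.select("a[href]");                // a elements that have an href attribute
    Elements pngs = doc.select("img[src$=.png]");          // img elements whose src ends with .png
    Element masthead = doc.select("div.masthead").first(); // first div with class "masthead", or null
    Elements resultLinks = doc.select("h3.r > a");         // a elements that are direct children of h3.r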
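You can try the selectors from this snippet against a small inline document. This sketch assumes jsoup is on the classpath. The names SelectDemo, texts(), attrs(), and the markup are hypothetical. The last two queries contrast ">" (direct child only) with a plain space (any descendant).

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class SelectDemo {
    // Hypothetical markup exercising the selectors from the snippet above
    static final String HTML =
        "<div class=\"masthead\"><a href=\"/home\">Home</a></div>"
        + "<img src=\"logo.png\"><img src=\"photo.jpg\">"
        + "<h3 class=\"r\"><a href=\"/r1\">R1</a><span><a href=\"/r2\">R2</a></span></h3>"
        + "<a name=\"anchor\">no href</a>";

    static Document parse() {
        return Jsoup.parse(HTML, "http://example.com/");
    }

    // Collects the text of every element matched by a CSS query
    static List<String> texts(String cssQuery) {
        List<String> out = new ArrayList<>();
        for (Element e : parse().select(cssQuery)) {
            out.add(e.text());
        }
        return out;
    }

    // Collects one attribute value from every matched element
    static List<String> attrs(String cssQuery, String key) {
        List<String> out = new ArrayList<>();
        for (Element e : parse().select(cssQuery)) {
            out.add(e.attr(key));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(texts("a[href]"));               // [Home, R1, R2]; the <a name> without href is skipped
        System.out.println(attrs("img[src$=.png]", "src")); // [logo.png]
        System.out.println(texts("div.masthead"));          // [Home]
        System.out.println(texts("h3.r > a"));              // [R1]; direct children only
        System.out.println(texts("h3.r a"));                // [R1, R2]; any descendant
    }
}
```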

select() can find elements using many kinds of selector rules. For example, given this markup:

<div id="im" name="diva"><a href>this is an example</a></div>

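tagname: find elements by tag name, e.g. doc.select("div").first()
#id: find an element by its ID, e.g. doc.select("#im").first()
[attribute]: find elements that have a given attribute, e.g. doc.select("[name]").first()
[attr=value]: find elements whose attribute has a given value, e.g. doc.select("[name=diva]").first()
ancestor child: find elements nested anywhere under another element, e.g. doc.select("div a") returns every a element inside a div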
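A short sketch can check the rules above against the sample div. It assumes jsoup is on the classpath. SelectorRulesDemo and firstTag() are hypothetical helpers: firstTag() returns the tag name of the first match, or "none" when select() finds nothing.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class SelectorRulesDemo {
    // The sample markup from the text
    static final String HTML =
        "<div id=\"im\" name=\"diva\"><a href>this is an example</a></div>";

    // Tag name of the first element matching the query, or "none" if there is no match
    static String firstTag(String cssQuery) {
        Document doc = Jsoup.parse(HTML);
        Element e = doc.select(cssQuery).first();
        return e == null ? "none" : e.tagName();
    }

    public static void main(String[] args) {
        System.out.println(firstTag("div"));         // tagname        -> div
        System.out.println(firstTag("#im"));         // #id            -> div
        System.out.println(firstTag("[name]"));      // [attribute]    -> div
        System.out.println(firstTag("[name=diva]")); // [attr=value]   -> div
        System.out.println(firstTag("div a"));       // ancestor child -> a
        System.out.println(firstTag(".div a"));      // a leading dot means class "div", which does not exist -> none
    }
}
```

The last query shows why the ancestor-child rule is written as "div a". A leading dot turns ".div" into a class selector, and the sample has no element with class "div".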