This Jsoup example shows how to select and iterate over all elements of an HTML document. It also shows how to iterate over only the elements of the HTML body.
How to iterate all elements of HTML using Jsoup?
Jsoup provides the select method, which accepts CSS-style selectors to select HTML elements. To select all the elements of an HTML page, use “*” as the selector, as given below.
    document.select("*");
The “*” selector selects all the elements of the HTML document. You can then iterate over the elements using a for loop, as given below.
    package com.javacodeexamples.libraries.jsoup;

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    public class JsoupIterateAllElementsExample {

        public static void main(String[] args) {

            String strHTML = "<html>"
                    + "<head>"
                    + "<title>Page Title</title>"
                    + "</head>"
                    + "<body bgcolor=\"ffffff\">"
                    + "<a href=\"http://www.google.com\">Google</a>"
                    + "<h1>Heading 1</h1>"
                    + "<div>div tag <strong>content</strong></div>"
                    + "</body>"
                    + "</html>";

            //parse HTML
            Document document = Jsoup.parse(strHTML);

            //select all elements
            Elements elements = document.select("*");

            //iterate elements using enhanced for loop
            for(Element e : elements)
                System.out.println( e.tagName() + ": " + e.text());
        }
    }
Output
    #root: Page Title Google Heading 1 div tag content
    html: Page Title Google Heading 1 div tag content
    head: Page Title
    title: Page Title
    body: Google Heading 1 div tag content
    a: Google
    h1: Heading 1
    div: div tag content
    strong: content
As you can see from the output, printing the text of an element also prints the text contained in all of its child elements. The first entry, #root, is the Document object itself, which the “*” selector also matches.
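If the repeated text is not what you want, one option is the ownText method of the Element class, which returns only the text that sits directly inside an element. The sketch below is a minimal illustration of that idea. The class name JsoupOwnTextExample and the helper describeOwnText are made up for this example and are not part of Jsoup.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class JsoupOwnTextExample {

    // Hypothetical helper (not a Jsoup API): builds one "tag: own text" line per element
    static List<String> describeOwnText(String html) {
        Document document = Jsoup.parse(html);
        List<String> lines = new ArrayList<>();

        for (Element e : document.select("*")) {
            // ownText() returns only the text directly inside this element,
            // leaving out the text of its child elements
            lines.add(e.tagName() + ": " + e.ownText());
        }
        return lines;
    }

    public static void main(String[] args) {
        String strHTML = "<html><head><title>Page Title</title></head>"
                + "<body><h1>Heading 1</h1>"
                + "<div>div tag <strong>content</strong></div>"
                + "</body></html>";

        for (String line : describeOwnText(strHTML))
            System.out.println(line);
    }
}
```

With this approach the div element is reported with “div tag” only, while “content” is reported once, under the strong element.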
If you want only the elements of the HTML body, you can call the select method on the body element, as given below.
    Elements elements = document.body().select("*");
Output
    body: Google Heading 1 div tag content
    a: Google
    h1: Heading 1
    div: div tag content
    strong: content
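Note that the body element itself is part of the result of document.body().select("*"). If you want only the elements nested inside the body, one possible approach is the CSS descendant selector “body *”. The sketch below assumes that approach. The class name JsoupBodyDescendantsExample and the helper bodyDescendantTags are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class JsoupBodyDescendantsExample {

    // Hypothetical helper (not a Jsoup API): tag names of all elements inside <body>
    static List<String> bodyDescendantTags(String html) {
        Document document = Jsoup.parse(html);
        List<String> tags = new ArrayList<>();

        // "body *" matches every element nested inside <body> at any depth,
        // but not the <body> element itself
        for (Element e : document.select("body *"))
            tags.add(e.tagName());

        return tags;
    }

    public static void main(String[] args) {
        String strHTML = "<html><head><title>Page Title</title></head>"
                + "<body bgcolor=\"ffffff\">"
                + "<a href=\"http://www.google.com\">Google</a>"
                + "<h1>Heading 1</h1>"
                + "<div>div tag <strong>content</strong></div>"
                + "</body></html>";

        System.out.println(bodyDescendantTags(strHTML));
    }
}
```

For the sample HTML used in this example, the list contains a, h1, div and strong, in document order.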
If you want only the direct children of the body tag, you can use the children method, as given below.
    Elements elements = document.body().children();
Output
    a: Google
    h1: Heading 1
    div: div tag content
As you can see from the output, the “<strong>” element was not returned because it is not a direct child of the HTML body tag.
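Since children returns only one level at a time, you can call it recursively when you need to know how deeply each element is nested. The sketch below is one way to do that, not the only one. The class name JsoupRecursiveChildrenExample and the helper walk are hypothetical names introduced for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class JsoupRecursiveChildrenExample {

    // Hypothetical helper (not a Jsoup API): adds one line per element,
    // indented by two spaces for each nesting level
    static void walk(Element element, int depth, List<String> out) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < depth; i++)
            line.append("  ");
        line.append(element.tagName());
        out.add(line.toString());

        // children() returns only the direct child elements,
        // so recurse into each child to reach the deeper levels
        for (Element child : element.children())
            walk(child, depth + 1, out);
    }

    public static void main(String[] args) {
        String strHTML = "<html><body>"
                + "<a href=\"http://www.google.com\">Google</a>"
                + "<h1>Heading 1</h1>"
                + "<div>div tag <strong>content</strong></div>"
                + "</body></html>";

        Document document = Jsoup.parse(strHTML);

        List<String> lines = new ArrayList<>();
        walk(document.body(), 0, lines);

        for (String line : lines)
            System.out.println(line);
    }
}
```

Here the strong element is listed one level deeper than the div element, which shows why it is missing from the result of body().children().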
Please also visit how to remove HTML tags from a string using Jsoup in Java.
This example is a part of the Jsoup tutorial with examples.