Jsoup is a Java library that allows users to parse HTML documents and extract data from them. The document discusses how to install Jsoup using Maven or by downloading the Jsoup JAR file. It then provides examples of using Jsoup to extract the title from a URL or HTML file, get links and images from a URL, and retrieve form parameters from HTML.
2. Introduction to Jsoup Tutorial
➢ Jsoup is a Java HTML parser.
➢ It is a Java library for parsing HTML documents.
➢ Jsoup uses DOM, CSS, and jQuery-like methods for extracting and
manipulating data.
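The three access styles named above can be seen side by side in a small sketch. It parses an in-memory HTML string, reads text with a CSS selector, looks up an element DOM-style, and changes an attribute with a jQuery-like setter. The class name `IntroSketch`, the markup, and the `intro` CSS class are invented for illustration, and the sketch assumes jsoup is already on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical demo class; the HTML below is invented for illustration.
class IntroSketch {
    static final String HTML =
        "<html><head><title>Demo</title></head><body>"
        + "<p class=\"intro\">Hello <b>jsoup</b></p>"
        + "<a id=\"home\" href=\"/index.html\">Home</a>"
        + "</body></html>";

    // CSS-style extraction: text of the first <p class="intro">.
    static String introText() {
        Document doc = Jsoup.parse(HTML);
        return doc.select("p.intro").first().text();
    }

    // DOM-style lookup followed by a jQuery-like attribute setter.
    static String rewrittenHref() {
        Document doc = Jsoup.parse(HTML);
        Element link = doc.getElementById("home");
        link.attr("href", "/start.html"); // overwrite the attribute
        return link.attr("href");         // read it back
    }

    public static void main(String[] args) {
        System.out.println(introText());
        System.out.println(rewrittenHref());
    }
}
```

No network access is needed here, because `Jsoup.parse(String)` works on a string that is already in memory.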
3. How to install Jsoup?
To run Jsoup code, the Jsoup library must first be installed.
There are two ways to install Jsoup:
1. By Maven pom.xml
2. By jsoup.jar file
4. Install by Maven pom.xml
To install Jsoup using Maven, add the following dependency to pom.xml:
<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.8.1</version>
</dependency>
5. Install by jsoup.jar file
To install Jsoup from the jsoup.jar file:
1. Download the jsoup.jar file.
2. Add the jsoup.jar file to the classpath.
3. To do so, run the following command in the console (Windows syntax):
set classpath=jsoup-1.8.1.jar;.;%classpath%
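After either installation route, it can help to confirm that the JVM actually sees the library. The following is a minimal sketch using only the Java standard library; the class name `ClasspathCheck` and its helper method are hypothetical, while `org.jsoup.Jsoup` is the entry-point class used in the examples of this tutorial.

```java
// Hypothetical helper; uses only the Java standard library.
class ClasspathCheck {
    // Returns true if the named class can be loaded from the current classpath.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        boolean found = isOnClasspath("org.jsoup.Jsoup");
        System.out.println(found
            ? "jsoup found on classpath"
            : "jsoup NOT found; check the classpath setting");
    }
}
```

If the check reports that jsoup is missing, the usual cause is that the JAR name in the classpath setting does not match the downloaded file.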
6. Jsoup Example
The following Jsoup examples are covered:
1. Get the title of a URL
2. Get the title from an HTML file
3. Get all links of a URL
4. Get meta information of a URL
5. Get all images of a URL
6. Get form parameters
7. Jsoup Example: print title of a URL
Let’s see an example that prints the title of a URL:
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class FirstJsoupExample {
    public static void main(String[] args) throws IOException {
        // Fetch the page over HTTP and parse it into a DOM tree
        Document doc = Jsoup.connect("http://www.javatpoint.com").get();
        // title() returns the text of the <title> element
        String title = doc.title();
        System.out.println("title is: " + title);
    }
}
Output:
title is: Javatpoint- A solution of all Technology
8. Jsoup Example: get title from HTML file
Let’s see an example that gets the title from an HTML file:
import java.io.File;
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class JsoupPrintTitlefromHtml {
    public static void main(String[] args) throws IOException {
        // Parse a local file, decoding it as UTF-8
        Document doc = Jsoup.parse(new File("e:\\register.html"), "utf-8");
        String title = doc.title();
        System.out.println("title is: " + title);
    }
}
Output: title is: Please Register
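The file example above depends on a file that exists only on the author's machine. A self-contained variant, sketched below, writes a small HTML file to a temporary location first and then parses it with the same `Jsoup.parse(File, String)` call. The class name `TitleFromTempFile` and the file contents are assumptions made for illustration; only the title text is taken from the output above.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Hypothetical demo class; the HTML content is invented for illustration.
class TitleFromTempFile {
    static String titleOfTempFile() throws IOException {
        Path path = Files.createTempFile("register", ".html");
        try {
            String html = "<html><head><title>Please Register</title></head>"
                        + "<body><form></form></body></html>";
            Files.write(path, html.getBytes(StandardCharsets.UTF_8));

            // Same call as in the example above, applied to the temp file
            File file = path.toFile();
            Document doc = Jsoup.parse(file, "UTF-8");
            return doc.title();
        } finally {
            Files.deleteIfExists(path); // clean up the temporary file
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("title is: " + titleOfTempFile());
    }
}
```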
9. Jsoup Example: get the links of a URL
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class JsoupPrintLinks {
    public static void main(String[] args) throws IOException {
        Document doc = Jsoup.connect("http://www.javatpoint.com").get();
        // Select every <a> element that has an href attribute
        Elements links = doc.select("a[href]");
        for (Element link : links) {
            System.out.println("\nlink : " + link.attr("href"));
            System.out.println("text : " + link.text());
        }
    }
}
10. Output: get links of a URL
Output:
link : http://www.javatpoint.com/contribute-us
text : Contribute Us
link : http://www.javatpoint.com/asknewquestion.jsp
text : Ask Question
link : http://www.javatpoint.com/login.jsp
text : login
.....
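Note that `attr("href")` returns the attribute exactly as written in the page, so relative links stay relative. Where absolute URLs are wanted, jsoup can resolve them against the document's base URI with `absUrl("href")`. The sketch below shows this on an in-memory string; the class name `AbsoluteLinks`, the helper method, and the sample markup are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical demo class; the sample markup is invented for illustration.
class AbsoluteLinks {
    // Returns every link target in the HTML, resolved against baseUri.
    static List<String> absoluteHrefs(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        List<String> result = new ArrayList<>();
        for (Element link : doc.select("a[href]")) {
            result.add(link.absUrl("href"));
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<a href=\"/login.jsp\">login</a>"
                    + "<a href=\"http://www.javatpoint.com/contribute-us\">Contribute Us</a>";
        for (String href : absoluteHrefs(html, "http://www.javatpoint.com/")) {
            System.out.println("link : " + href);
        }
    }
}
```

A document fetched with `Jsoup.connect(...).get()` already carries the fetched URL as its base URI, so `absUrl` can be used there without passing a base explicitly.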
11. Jsoup Example: get the meta data of a URL
Let’s see an example that gets the meta data of a URL:
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class JsoupPrintMetadata {
    public static void main(String[] args) throws IOException {
        Document doc = Jsoup.connect("http://www.javatpoint.com").get();

        // first() and get(0) both pick the first matching <meta> element
        String keywords = doc.select("meta[name=keywords]").first().attr("content");
        System.out.println("Meta keyword : " + keywords);
        String description = doc.select("meta[name=description]").get(0).attr("content");
        System.out.println("Meta description : " + description);
    }
}
12. Output: get meta data of a URL
Output:
Meta keyword : jsoup, tutorial, beginners, professionals, introduction, example, java, html, parser
Meta description : Jsoup tutorial for beginners and professionals provides html parsing facility in java with examples of printing title, links, images, form elements from url.
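The meta example above assumes that both tags are present: `first()` returns null and `get(0)` throws when the selector matches nothing. One way to avoid that, sketched here under the assumption that an empty string is an acceptable fallback, is to call `attr` on the `Elements` result itself, which returns an empty string when there is no match. The class name `SafeMeta`, the helper method, and the sample markup are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Hypothetical demo class; the sample markup is invented for illustration.
class SafeMeta {
    static final String HTML =
        "<html><head>"
        + "<meta name=\"description\" content=\"Jsoup tutorial\">"
        + "</head><body></body></html>";

    // Returns the content of <meta name="..."> or "" if the tag is absent.
    static String metaContent(String html, String name) {
        Document doc = Jsoup.parse(html);
        // Elements.attr reads from the first match and yields "" for no match
        return doc.select("meta[name=" + name + "]").attr("content");
    }

    public static void main(String[] args) {
        System.out.println("Meta description : " + metaContent(HTML, "description"));
        System.out.println("Meta keyword : " + metaContent(HTML, "keywords"));
    }
}
```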
13. Jsoup Example: get images of a URL
Example of getting the images of a URL:
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class JsoupPrintImages {
    public static void main(String[] args) throws IOException {
        Document doc = Jsoup.connect("http://www.javatpoint.com").get();
        // Match <img> tags whose src contains .png, .jpg, .jpeg or .gif (case-insensitive)
        Elements images = doc.select("img[src~=(?i)\\.(png|jpe?g|gif)]");
        for (Element image : images) {
            System.out.println("src : " + image.attr("src"));
            System.out.println("height : " + image.attr("height"));
            System.out.println("width : " + image.attr("width"));
            System.out.println("alt : " + image.attr("alt"));
        }
    }
}
14. Output: get images of a URL
src : http://www.javatpoint.com/images/social/r.png
height :
width :
alt : RSS Feed
src : http://www.javatpoint.com/images/social/m.png
height :
width :
alt : Subscribe to Get Email Alerts
src : http://www.javatpoint.com/images/social/f.png
height :
width :
alt : Facebook Page
src : http://www.javatpoint.com/images/social/g.png
height :
width :
alt : Google Page
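The image selector can be tried without a network connection. The sketch below applies the same attribute-regex selector to an in-memory string and collects the matching `src` values; the class name `ImageFilter`, the helper method, and the sample markup are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Hypothetical demo class; the sample markup is invented for illustration.
class ImageFilter {
    // Returns the src of every <img> whose src contains .png, .jpg, .jpeg or .gif.
    static List<String> rasterImageSources(String html) {
        Document doc = Jsoup.parse(html);
        // (?i) makes the match case-insensitive; \\. is a literal dot
        Elements images = doc.select("img[src~=(?i)\\.(png|jpe?g|gif)]");
        List<String> sources = new ArrayList<>();
        for (Element image : images) {
            sources.add(image.attr("src"));
        }
        return sources;
    }

    public static void main(String[] args) {
        String html = "<img src=\"a.png\" alt=\"A\"><img src=\"b.JPG\">"
                    + "<img src=\"c.svg\"><img src=\"d.jpeg\">";
        for (String src : rasterImageSources(html)) {
            System.out.println("src : " + src);
        }
    }
}
```

`attr` returns an empty string for an attribute that is not present, which is consistent with the blank `height` and `width` values in the output above.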