Jauntium Tutorial - Quickstart

Overview of Browser Automation with Jauntium

Jauntium is a Java library that allows the user to automate Chrome, Firefox, Safari, Internet Explorer, and other modern browsers in order to easily perform web scraping or other automation tasks, such as automated testing of websites. One of the most important classes in the Jauntium library is Browser, which represents a browser window. When the browser loads an HTML page, it creates a Document object. The Document object exposes the content as a searchable tree of Nodes, such as Elements, comments, and text. For example, an HTML document has the following tree structure: it begins with the <html> Element, whose child nodes are the <head> and <body> Elements. Each Element contains zero or more attributes (such as class in <body class='foo'>) and zero or more child nodes. The children can be Text Nodes, Comment Nodes, or other Elements.

In addition to exposing the DOM, Jauntium also provides utility classes for web scraping and automated testing. For example, the Form and Table classes provide convenience methods for submitting forms and extracting data from tables. Class Browser also exposes the Selenium WebDriver (via Browser.driver), which supports JavaScript execution, XPath queries, CSS selector queries, multiple windows/frames, screenshot capture, and all the other features that are familiar to Selenium users.

To begin using Jauntium, please visit the download page and follow the installation instructions, at which point you will be able to run the examples below.

Example 1: Create a (Chrome) browser window, visit a url, print the HTML.

Example1.java:

//Add system property for chromedriver 	
System.setProperty("webdriver.chrome.driver", "path/to/chromedriver.exe");   //setup (edit the path)

Browser browser = new Browser(new ChromeDriver());   //create new browser window
browser.visit("https://gmail.com");                  //visit a url 
System.out.println(browser.doc.outerHTML());	     //print the HTML            
browser.quit();                                      //terminate browser

This example illustrates creating a browser object, visiting a webpage, and printing the HTML. Before running this example, be sure to install Chrome if you have not already done so. It's also necessary to download the ChromeDriver executable for your platform.

On line 2, a System property is set that specifies the path to ChromeDriver. You will need to edit this line so that the path points to the location of the ChromeDriver executable that you downloaded. On line 4, the Browser object is constructed using the ChromeDriver object. When you run the program, this is the point where a Chrome browser window will appear.

When the browser visits a url (line 5), a Document object (browser.doc) is created to represent the HTML content. On line 6, we call the Document's outerHTML() method to retrieve the HTML as a String, which we then print. An alternative way to access the HTML source is to call the browser's getSource() method. Finally, on line 7 we call browser.quit(), which terminates the browser and disposes of all its resources (this step is omitted from some examples, but is recommended as a final step).

Note that we are not restricted to using Chrome as the browser. The constructor for class Browser (line 4) will accept any WebDriver object (ChromeDriver, FirefoxDriver, SafariDriver, InternetExplorerDriver, EdgeDriver, etc.). The Selenium documentation covers how to create each of these WebDriver objects.

Example 2a: Searching using findFirst.

Example2a.java:

System.setProperty("webdriver.chrome.driver", "path/to/chromedriver.exe");    //setup
try{
  Browser browser = new Browser(new ChromeDriver());               //create chrome browser
  
  browser.visit("https://heroku.com");                             //visit a url.
  String title = browser.doc.findFirst("<title>").getChildText();  //get child text of title element.
  System.out.println("Heroku's website title: " + title);          //print the title 

  browser.visit("https://reddit.com");                             //visit another url.
  title = browser.doc.findFirst("<title>").getChildText();         //get child text of title element.
  System.out.println("Reddit's website title: " + title);          //print the title  
}
catch(JauntiumException e){   //if title element isn't found, handle JauntiumException.
  System.err.println(e);          
}

This example illustrates visiting two websites, in each case extracting and printing the title of the webpage.

The document's findFirst(String) method (lines 6 and 10) accepts a tagQuery that (in simple cases) resembles an HTML tag, and searches the document tree until it finds a matching element. Note that the tagQuery "<title>" will match any Element whose tagname is title (case-insensitive), whether or not the element has additional attributes. As we'll see in later examples, the tagname portion of the query is actually a regular expression, which provides a powerful syntax for pattern matching. For example, the tagQuery "<h(1|2)>" would match any h1 or h2 tag. Example 11 provides a full account of the tagQuery syntax.
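To make the tagname rule concrete, here is a small sketch that models it with java.util.regex, assuming (as described here and in Example 11) that the tagname regex is case-insensitive and must match the entire tag name. The class TagNameMatch is hypothetical and is not part of Jauntium; it only illustrates the matching rule, not how Jauntium implements it.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not part of Jauntium) modelling the documented rule:
// the tagname part of a tagQuery is a case-insensitive regex that must match
// the entire tag name.
class TagNameMatch {

    static boolean matches(String tagnameRegex, String tagName) {
        return Pattern.compile(tagnameRegex, Pattern.CASE_INSENSITIVE)
                .matcher(tagName)
                .matches(); // matches() succeeds only if the WHOLE name matches
    }

    public static void main(String[] args) {
        System.out.println(matches("title", "TITLE"));  // true: case is ignored
        System.out.println(matches("h(1|2)", "h2"));    // true
        System.out.println(matches("h(1|2)", "h3"));    // false
        System.out.println(matches("b", "body"));       // false: no partial-name matches
    }
}
```

Because the whole name must match, a query such as "<b>" selects only b elements, never body or br.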

Note that in this example we catch JauntiumException, which is the superclass of all other Jauntium-related checked Exceptions.

Example 2b: Headless browser mode and other Chrome options.

Example2b.java:

System.setProperty("webdriver.chrome.driver", "path/to/chromedriver.exe");    //setup
try{
  ChromeOptions options = new ChromeOptions();                //create chrome options object
  options.addArguments("--headless");                         //specify headless mode (no GUI)
  
  Browser browser = new Browser(new ChromeDriver(options));   //create headless browser
  browser.visit("http://northernbushcraft.com");              //visit a url.
  System.out.println(browser.doc.findFirst("<meta>"));        //find & print first meta tag
}
catch(JauntiumException e){ 
  System.err.println(e);          
}

This example illustrates visiting a website in 'headless' mode (no GUI), and then searching for and printing the first meta tag in the document.

On lines 3-4, a Chrome options object is created with an argument that places Chrome into headless mode, which means that the browser window will not be visible when it opens. Chrome supports many other option settings, including incognito mode, disabling popups, ignoring SSL certificate errors, specifying HTTP/HTTPS proxies, and much more.

On lines 7-8, the browser visits a url, then searches for and prints the first meta tag in the document, using steps similar to the previous example.

Example 3: Opening HTML from a String and retrieving an Element's text.

Example3.java:

System.setProperty("webdriver.chrome.driver", "path/to/chromedriver.exe");
try{
  Browser browser = new Browser(new ChromeDriver()); 
  browser.openContent("<html><body>WebPage <div>Hobbies:<p>beer<p>skiing</div> Copyright 2018</body></html>");
  Element body = browser.doc.findFirst("<body>");
  Element div = body.findFirst("<div>");

  System.out.println("body's child text: " + body.getChildText());//joins child text of body element
  System.out.println("-----------");
  System.out.println("body's text: " + body.getTextContent());   //joins all text within body element
  System.out.println("-----------");
  System.out.println("body's visible text: " + body.getText());  //joins all visible text within body element
  System.out.println("-----------");

  System.out.println("div's child text: " + div.getChildText()); //joins child text of div element
  System.out.println("-----------");
  System.out.println("div's text: " + div.getTextContent());     //joins all text within the div element
  System.out.println("-----------");
  System.out.println("div's visible text: " + div.getText());    //joins all visible text within div element
}
catch(JauntiumException e){
  System.err.println(e);
}

This example illustrates opening HTML content from a String and printing the text content of specific Elements.

On line 6, we see that the findFirst(String) method can be invoked on an Element (or on the Document as on the previous line). When invoked on an Element, the search is restricted to that Element's descendants.

On lines 8-10, we see that the getChildText() method returns the concatenation of the element's Text children, whereas the getTextContent() method returns the concatenation of all of its Text descendants. If an element does not contain any text, either method will return an empty String. In both methods, any entity references (such as &nbsp;) are included as their single-character equivalents.

The overload getTextContent(String, boolean, boolean) accepts additional parameters: the first is a separator String to insert between the joined pieces of text, and the two boolean values indicate whether or not to include text from within HTML comments and from within script tags, respectively.
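The difference between child text and text content can be modelled with a tiny hand-built tree. The classes below (TextModel, Text, Elem) are hypothetical stand-ins, not Jauntium classes. The tree mirrors the body from this example, assuming the browser parses the two unclosed <p> tags as sibling paragraphs inside the div (standard HTML parsing behavior), and textContent(String) imitates only the separator parameter, not the comment/script flags.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical minimal DOM model (NOT Jauntium's classes), used only to show
// how "child text" differs from "text content".
class TextModel {

    abstract static class Node {}

    static class Text extends Node {
        final String value;
        Text(String value) { this.value = value; }
    }

    static class Elem extends Node {
        final List<Node> children;
        Elem(Node... children) { this.children = Arrays.asList(children); }

        // Joins only the Text nodes that are direct children of this element.
        String childText() {
            StringBuilder sb = new StringBuilder();
            for (Node n : children) {
                if (n instanceof Text) sb.append(((Text) n).value);
            }
            return sb.toString();
        }

        // Joins every Text descendant in document order, with separator between pieces.
        String textContent(String separator) {
            List<String> parts = new ArrayList<>();
            collect(this, parts);
            return String.join(separator, parts);
        }

        private static void collect(Node node, List<String> out) {
            if (node instanceof Text) {
                out.add(((Text) node).value);
            } else {
                for (Node child : ((Elem) node).children) collect(child, out);
            }
        }
    }

    // Hand-built copy of:
    // <body>WebPage <div>Hobbies:<p>beer</p><p>skiing</p></div> Copyright 2018</body>
    static Elem exampleBody() {
        Elem div = new Elem(new Text("Hobbies:"),
                            new Elem(new Text("beer")),
                            new Elem(new Text("skiing")));
        return new Elem(new Text("WebPage "), div, new Text(" Copyright 2018"));
    }

    public static void main(String[] args) {
        Elem body = exampleBody();
        System.out.println("[" + body.childText() + "]");       // [WebPage  Copyright 2018]
        System.out.println("[" + body.textContent("") + "]");   // [WebPage Hobbies:beerskiing Copyright 2018]
        System.out.println("[" + body.textContent("|") + "]");  // [WebPage |Hobbies:|beer|skiing| Copyright 2018]
    }
}
```

Note how the body's child text skips everything inside the div, while the text content walks into it; the separator makes the boundaries between joined pieces visible.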

On line 12, the getText() method is invoked, which returns a String concatenation of all visible Text descendants and omits those that are not visible (for example as a result of CSS styling).

Example 4: Accessing an Element's attributes/properties.

Example4.java:

System.setProperty("webdriver.chrome.driver", "path/to/chromedriver.exe");
try{
  Browser browser = new Browser(new ChromeDriver());   
  browser.visit("http://intel.com");                       //visit intel.com
  
  Element anchor = browser.doc.findFirst("<a href>");      //find 1st anchor element with href attribute
  System.out.println("anchor element: " + anchor);                         //print anchor element
  System.out.println("anchor's tagname: " + anchor.getTagName());          //print anchor's tagname
  System.out.println("anchor's href attribute: " + anchor.getAt("href"));  //print anchor's href attribute
  System.out.println("anchor's parent Element: " + anchor.getParent());    //print anchor's parent element
			   
  Element meta = browser.doc.findFirst("<head>").findFirst("<meta>");      //find 1st meta element in head 
  System.out.println("meta element: " + meta);                             //print meta element
  System.out.println("meta's tagname: " + meta.getTagName());              //print meta's tagname
  System.out.println("meta's parent Element: " + meta.getParent());        //print meta's parent element	
}
catch(JauntiumException e){              
  System.err.println(e);          
}

This example illustrates visiting a website, searching for specific elements, then accessing various attributes and properties of those elements.

The tagQuery "<a href>" on line 6 specifies not only the tagname but also that the Element must contain an href attribute. As previously noted, the tagname portion of the query is a regular expression. The attribute name, however, is not; it is matched as a case-insensitive String.

On lines 7, 10, 13 and 15, an Element's toString() method is implicitly called, which returns a String representation of the Element excluding its children. See Example 5 for how to obtain a String representation of an Element that does include its children.

On line 9, the getAt(String) method is called to retrieve the value of the attribute with the (case-insensitive) name href. If the anchor tag did not have an href attribute, calling getAt(String) would throw a NotFound Exception. The related method getAtStr(String) (not shown) differs in that it returns an empty String rather than throwing a NotFound Exception if the attribute does not exist. Both methods automatically convert relative urls to absolute urls before returning them as attribute values.
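A minimal sketch of the two lookup styles, assuming that attribute names are compared case-insensitively and that href and src are the URL-valued attributes to resolve. AttributeLookup, getOrThrow and getOrEmpty are hypothetical names; NoSuchElementException stands in for Jauntium's NotFound, and relative urls are resolved with java.net.URI.resolve, which may differ in edge cases from what Jauntium does.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical sketch (NOT Jauntium's implementation) of a throwing lookup
// and a lenient lookup that returns "" for a missing attribute.
class AttributeLookup {

    private final URI pageUrl;                              // url of the current page
    private final Map<String, String> attributes = new HashMap<>();

    AttributeLookup(String pageUrl) { this.pageUrl = URI.create(pageUrl); }

    void put(String name, String value) {
        attributes.put(name.toLowerCase(Locale.ROOT), value);  // store names case-insensitively
    }

    // Throwing variant: NoSuchElementException plays the role of NotFound.
    String getOrThrow(String name) {
        String value = attributes.get(name.toLowerCase(Locale.ROOT));
        if (value == null) throw new NoSuchElementException("no attribute: " + name);
        return isUrlAttribute(name) ? pageUrl.resolve(value).toString() : value;
    }

    // Lenient variant: returns "" instead of throwing.
    String getOrEmpty(String name) {
        try {
            return getOrThrow(name);
        } catch (NoSuchElementException e) {
            return "";
        }
    }

    // Assumption for this sketch: only href and src are treated as urls.
    private static boolean isUrlAttribute(String name) {
        String n = name.toLowerCase(Locale.ROOT);
        return n.equals("href") || n.equals("src");
    }

    public static void main(String[] args) {
        AttributeLookup a = new AttributeLookup("http://intel.com/products/index.html");
        a.put("HREF", "specs.html");
        System.out.println(a.getOrThrow("href"));               // http://intel.com/products/specs.html
        System.out.println("[" + a.getOrEmpty("title") + "]");  // []
    }
}
```

The relative value specs.html is resolved against the directory of the page url, while a value beginning with / is resolved against the site root.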

An example of chaining search methods can be seen on line 12, where the document is searched for the first head Element, which is subsequently searched for the first meta Element. In this case, the same result would be obtained by simply calling browser.doc.findFirst("<meta>"). However, the latter search would be slower if no meta tag were present, since it would search the entire document rather than only the head section.

Example 5: Opening HTML from a file, accessing innerHTML and outerHTML.

colors.htm:

<html>
   <div class='colors'>redgreen</div> 
   <p>visit again soon!</p>
</html>

Example5.java:

System.setProperty("webdriver.chrome.driver", "path/to/chromedriver.exe");
try{
  Browser browser = new Browser(new ChromeDriver());   
  browser.open(new File("path/to/colors.htm"));                   //open local file (edit this path)
   
  Element div = browser.doc.findFirst("<div class=colors>");      //find div whose class matches 'colors'  
  System.out.println("div's outerHTML():\n" + div.outerHTML());   //no extra indenting
  System.out.println("-------------");
  System.out.println("div's outerHTML(2):\n" + div.outerHTML(2)); //two extra spaces used per indent
  System.out.println("-------------");             
  System.out.println("div's innerHTML():\n" + div.innerHTML());   //no extra indenting
  System.out.println("-------------");
  System.out.println("div's innerHTML(3):\n" + div.innerHTML(3)); //three extra spaces used per indent   	
  System.out.println("-------------");
  
  //make some changes
  div.innerHTML("Presto!");       //replace div's content with different elements.
  System.out.println("Altered document:\n" + browser.doc.innerHTML());  //print the altered document.
}
catch(JauntiumException e){
   System.err.println(e);
}

This example illustrates opening HTML content from a local file, searching for specific elements, printing those elements, then altering the HTML content of an element, and finally printing the entire document. Before running this example, you will need to edit line 4 to point to the location of colors.htm on your local filesystem.

The query used on line 6 can be read as "find the first element whose tagname matches the case-insensitive regular expression div and which has an attribute whose name matches the case-insensitive String class, where the value of the attribute matches the case-insensitive regular expression colors." Note that the attribute value in the query on line 6 is unquoted; quotes are optional.

When using the indenting options, as on lines 9 and 13, extra whitespace characters are printed to indent each node, so this indenting whitespace appears in addition to any whitespace that was already present.

Example 6: Searching by attribute value using regular expressions and downloading files.

Example6.java:

System.setProperty("webdriver.chrome.driver", "path/to/chromedriver.exe");
try{
  Browser browser = new Browser(new ChromeDriver());   
  browser.visit("http://northernbushcraft.com");                   //visit website
			
  Element imgElement = browser.doc.findFirst("<img src='.*jpg'>"); //find first jpg image
  String url = imgElement.getAt("src");                            //extract image url (src attribute)
  System.out.println("downloading: " + url);                       //print image url
  browser.download(url, new File("result.jpg"));                   //download image
}
catch(JauntiumException e){                         
  System.err.println(e);         
}

This example illustrates using a regular expression in a tagQuery in order to find an element with a matching attribute value. It also illustrates downloading a file.

On line 6 the findFirst method is called. It specifies the regular expression .*jpg for matching the src attribute of an image tag. This regular expression can be read as "match any string that begins with zero or more of any character and terminates with the characters jpg". If you are unfamiliar with regular expressions, see the regular expressions tutorial.
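The following sketch models this rule with java.util.regex, assuming (per Example 11) that the attribute name is compared as a plain case-insensitive String and that the attribute value regex is case-insensitive and must match the entire value. The image paths are made up for illustration, and AttributeValueMatch is hypothetical. One consequence of whole-value matching is that a src ending in a query string such as ?w=200 is not matched by .*jpg, although .*jpg.* would match it.

```java
import java.util.regex.Pattern;

// Sketch of the documented attribute rules using java.util.regex
// (an illustration, not Jauntium's code):
//  - the attribute NAME is compared as a plain, case-insensitive string;
//  - the attribute VALUE regex is case-insensitive and must match the whole value.
class AttributeValueMatch {

    static boolean nameMatches(String queryName, String actualName) {
        return queryName.equalsIgnoreCase(actualName);   // literal comparison, no regex
    }

    static boolean valueMatches(String valueRegex, String actualValue) {
        return Pattern.compile(valueRegex, Pattern.CASE_INSENSITIVE)
                .matcher(actualValue)
                .matches();                              // whole-value match
    }

    public static void main(String[] args) {
        // Hypothetical src values
        String[] srcs = { "/img/canoe.jpg", "/img/CANOE.JPG", "/img/canoe.jpg?w=200", "/img/canoe.png" };
        for (String src : srcs) {
            System.out.println(src + " -> " + valueMatches(".*jpg", src));
        }
    }
}
```

Running main reports true for the first two paths and false for the last two. The same whole-value rule means that the query <div class=colors> from Example 5 would not match an element with class='colors big'.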

The result of the search is an image Element, from which the src is extracted on line 7 to yield the url of the image. The url is then used to download the file on line 9. Note that the download(String, File) method of class Browser can be used to download not only images but other content as well, including HTML files, CSS files, JavaScript files, etc.

Tip: In some scenarios, it may be useful to download an HTML file in order to access the original page source, rather than using Browser.getSource() or Document.outerHTML(), which return the page source after it has been loaded and potentially altered by the browser.

Example 7: Searching by child text using regular expressions, and following a hyperlink.

links.htm:

<html>
   <a href='http://intelligent.com'>visit intelligent</a>

   <a href='http://intel.com'>visit intel</a>

</html>

Example7.java:

System.setProperty("webdriver.chrome.driver", "path/to/chromedriver.exe");
try{
  Browser browser = new Browser(new ChromeDriver());  
  browser.visit("http://jauntium.com/examples/links.htm");     
  
  Element link_A = browser.doc.findFirst("<a>visit intelligent"); //returns first link
  System.out.println("link_A: " + link_A.outerHTML());  
  
  Element link_B = browser.doc.findFirst("<a>visit intel");       //returns first link
  System.out.println("link_B: " + link_B.outerHTML());    
  
  Element link_C = browser.doc.findFirst("<a>^visit intel$");     //returns second link
  System.out.println("link_C: " + link_C.outerHTML());  

  link_C.click();                                             //click link
  System.out.println("location: " + browser.getLocation());   //print browser location
}
catch(JauntiumException e){                         
  System.err.println(e);         
}

This example illustrates searching for <a> tags (hyperlinks) on the basis of their child text*, and then clicking one of the hyperlinks.

After visiting links.htm on line 4, we use the tagQuery "<a>visit intelligent" on line 6. This tagQuery can be read as "find the first <a> tag whose child text* matches the regular expression 'visit intelligent'". Note that when searching by child text, the regular expression will accept a substring match (any string that contains 'visit intelligent') as well as a whole-string match.

The important difference between substring matching and whole-string matching is illustrated by the next two searches, on lines 9 and 12. On line 9, the tagQuery contains the regular expression "visit intel". This regular expression actually matches the text "visit intelligent" of the first link, since it contains 'visit intel' as a substring. To restrict the search to whole-string matching, as on line 12, it's necessary to denote the beginning and end of the string in the regular expression. This is achieved with the ^ and $ characters, respectively. The tagQuery on line 12 contains the regular expression ^visit intel$, which can be read as "match any string that starts with a v, followed by i, s, i, t, [space], i, n, t, e, l, followed by the end of the string." This search matches only the second link.
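This sketch reproduces the three searches with java.util.regex, assuming (per Example 11) that the childTextRegex is case-sensitive and may match any substring, which corresponds to Matcher.find(). The class ChildTextMatch is hypothetical, and the loop only mimics findFirst by stopping at the first link whose text matches.

```java
import java.util.regex.Pattern;

// Sketch of the documented child-text rule using java.util.regex (not Jauntium's code):
// the childTextRegex is case-SENSITIVE and may match any substring (Matcher.find),
// so ^ and $ are needed to force a whole-string match.
class ChildTextMatch {

    static boolean childTextMatches(String childTextRegex, String childText) {
        return Pattern.compile(childTextRegex).matcher(childText).find();
    }

    public static void main(String[] args) {
        String[] linkTexts = { "visit intelligent", "visit intel" };   // the two links in links.htm
        String[] queries = { "visit intelligent", "visit intel", "^visit intel$" };
        for (String q : queries) {
            for (String text : linkTexts) {
                if (childTextMatches(q, text)) {
                    System.out.println(q + " -> first match: " + text);
                    break;   // like findFirst: stop at the first matching link
                }
            }
        }
    }
}
```

Running main shows that "visit intelligent" and "visit intel" both select the first link, while "^visit intel$" selects the second.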

On line 15 the browser performs a click action on the hyperlink, which sends the browser to a new location. On line 16 we print the new location (url) of the browser. An alternative approach for following a hyperlink is to extract the href value (i.e., the url) and then call Browser.visit(url). The advantage of performing a click instead is that the click may trigger JavaScript code that is important to how the page functions.

* - 'child text' is the concatenation of all child text nodes.

Example 8: Searching using findEach and iterating through search results.

Example8.java:

System.setProperty("webdriver.chrome.driver", "path/to/chromedriver.exe");

Browser browser = new Browser(new ChromeDriver());
browser.visit("https://amazon.com");   

Elements tables = browser.doc.findEach("<table>");         //find non-nested table elements 
System.out.println("Found " + tables.size() + " tables:"); //print number of search results
for(Element table : tables){                               //iterate through results
  System.out.println(table.outerHTML() + "\n----\n");      //print each table 
}

This example demonstrates the findEach(tagQuery) method.

On line 6, the findEach method is invoked on the document, so it walks the document tree searching for any Elements that match the query "<table>". Any such elements are returned in an Elements object, which is a container for search results. The defining feature of the findEach search is that when it finds an element that matches the query, it does not search further into that element. So in this example, the findEach method only returns non-nested tables (i.e., it does not include tables that occur within other tables).

Class Elements has convenience methods that make the search results themselves easily searchable. The search methods are similar to those already covered in class Element (e.g., findFirst, findEach, etc.).

One benefit of class Elements itself being searchable is that it allows searches to be easily chained together. A good way of thinking about class Elements is as an <#elements> tag whose children are the individual search results.

If the findEach(String) method does not locate any Elements that match the tagQuery, an empty Elements container is returned.

Example 9: Searching using findEvery vs. findEach

food.htm:

<html>
  <body>
    <div>vegetables</div>
    <div>fruits</div>
    <div class='meat'>
      Meats
      <div>chicken</div>
      <div>beef</div>
    </div>
    <div class='nut'>
      Nuts
      <div>peanuts</div>
      <div>walnuts</div>
    </div>
  </body>
</html>

Example9.java:

System.setProperty("webdriver.chrome.driver", "path/to/chromedriver.exe");
try{
  Browser browser = new Browser(new ChromeDriver());
  browser.visit("http://jauntium.com/examples/food.htm");  
  
  Elements elements = browser.doc.findEvery("<div>");               //find all divs in the document
  System.out.println("Every div: " + elements.size() + " results"); //report number of search results.
   
  elements = browser.doc.findEach("<div>");                         //find all non-nested divs
  System.out.println("Each div: " + elements.size() + " results");  //report number of search results.
                                                                    //find non-nested divs within <div class='meat'>
  elements = browser.doc.findFirst("<div class=meat>").findEach("<div>"); 
  System.out.println("Meat search: " + elements.size() + " results");//report number of search results.
}
catch(JauntiumException e){
  System.err.println(e);
}

The findEvery method operates by examining all the descendants of an element (or of a document). Every Element that matches the tagQuery is added to the Elements container, which is returned by the method. As discussed in the previous example, class Elements is a container for search results that is itself searchable.

On line 6, the findEvery search is invoked on the document, so it retrieves every div Element in the document (eight divs). The findEach method on line 9 retrieves only four divs from the document, since it does not search inside a div it has already matched and therefore skips the nested divs. The last findEach method (line 12) is invoked not on the document object but on a particular Element. It retrieves the two divs that are children of <div class='meat'>.
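The two traversal strategies can be sketched on a hand-built copy of the food.htm body. SearchModel and its El class are hypothetical (text nodes are omitted, and tags are compared with equalsIgnoreCase rather than a regex); the sketch only illustrates the difference between continuing to descend after a match (findEvery) and stopping at a match (findEach).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical element tree (NOT Jauntium's classes) modelling the two
// traversal strategies described above.
class SearchModel {

    static class El {
        final String tag;
        final List<El> children;
        El(String tag, El... children) {
            this.tag = tag;
            this.children = Arrays.asList(children);
        }
    }

    // findEvery-style: collect every matching descendant, at any depth.
    static List<El> findEvery(El root, String tag) {
        List<El> out = new ArrayList<>();
        for (El child : root.children) {
            if (child.tag.equalsIgnoreCase(tag)) out.add(child);
            out.addAll(findEvery(child, tag));   // keep descending even after a match
        }
        return out;
    }

    // findEach-style: collect matching descendants, but never descend into a match.
    static List<El> findEach(El root, String tag) {
        List<El> out = new ArrayList<>();
        for (El child : root.children) {
            if (child.tag.equalsIgnoreCase(tag)) {
                out.add(child);                  // matched: skip its subtree
            } else {
                out.addAll(findEach(child, tag));
            }
        }
        return out;
    }

    // The <body> of food.htm, built by hand (text nodes omitted).
    static El foodBody() {
        return new El("body",
            new El("div"),                                 // vegetables
            new El("div"),                                 // fruits
            new El("div", new El("div"), new El("div")),   // meat: chicken, beef
            new El("div", new El("div"), new El("div")));  // nut: peanuts, walnuts
    }

    public static void main(String[] args) {
        El body = foodBody();
        System.out.println("every: " + findEvery(body, "div").size());  // 8
        System.out.println("each: " + findEach(body, "div").size());    // 4
        El meat = body.children.get(2);
        System.out.println("meat: " + findEach(meat, "div").size());    // 2
    }
}
```

With this tree, the sketch reproduces the counts described above: eight divs for findEvery, four for findEach, and two for findEach started at the meat div.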

As with the findEach method, if the findEvery method does not find any Elements that match the tagQuery, an empty <#elements> container is returned (no Exception is thrown).

Example 10: Searching using getElement and getEach

food.htm:

<html>
  <body>
    <div>vegetables</div>
    <div>fruits</div>
    <div class='meat'>
      Meats
      <div>chicken</div>
      <div>beef</div>
    </div>
    <div class='nut'>
      Nuts
      <div>peanuts</div>
      <div>walnuts</div>
    </div>
  </body>
</html>

Example10.java:

System.setProperty("webdriver.chrome.driver", "path/to/chromedriver.exe");
try{ 
  Browser browser = new Browser(new ChromeDriver());
  browser.visit("http://jauntium.com/examples/food.htm");  
  
  Element body = browser.doc.findFirst("<body>");                      //find body element
  Element element = body.getElement(2);                                //retrieve 3rd child element within the body.      
  System.out.println("result1: " + element);                           //print the element
   
  String text = body.getElement(3).getElement(0).getChildText();       //get text of 1st child of 4th child of body.
  System.out.println("result2: " + text);                              //print the text
   
  element = body.findFirst("<div class=meat>").getElement(1);          //retrieve 2nd child element of div
  System.out.println("result3: " + element.outerHTML());               //print the element and its content
   
  Elements elements = body.getEach("<div>");                           //get body's child divs
  System.out.println("result4 has " + elements.size() + " divs:\n");   //print the search results
  System.out.println(elements.innerHTML(2));                           //print elements, indenting by 2
}
catch(JauntiumException e){
  System.err.println(e);
}

This example illustrates a variety of search methods whose names begin with 'get', which indicates that they search only children (as opposed to 'find' methods, which search all descendants).

The getElement(int) method on line 7 retrieves the third child element of the body element (the index is zero-based), which is <div class='meat'>. On line 10, several getElement(int) methods are chained together to create a path to the <div>peanuts</div> element. On lines 13-14 a similar technique is used to retrieve and print <div>beef</div>. On line 16 the getEach(String) method searches the child elements of <body> for div elements. The results of the search (four divs) are then printed. When reviewing the output, remember that each child of <#elements> constitutes a single search result.
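A sketch of child-only access on the same food.htm structure, assuming zero-based indexing over child elements as in this example. ChildAccessModel and its El class are hypothetical; getElement and getEach are named after the Jauntium methods only for readability. In this sketch an out-of-range index simply throws IndexOutOfBoundsException, which says nothing about how Jauntium handles that case.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical element tree (NOT Jauntium's classes) illustrating
// 'get'-style access, which looks only at direct children.
class ChildAccessModel {

    static class El {
        final String tag;
        final String text;
        final List<El> children;
        El(String tag, String text, El... children) {
            this.tag = tag;
            this.text = text;
            this.children = Arrays.asList(children);
        }

        // Zero-based index into this element's child elements.
        El getElement(int index) { return children.get(index); }

        // Children only: grandchildren are never examined.
        List<El> getEach(String tag) {
            List<El> out = new ArrayList<>();
            for (El c : children) {
                if (c.tag.equalsIgnoreCase(tag)) out.add(c);
            }
            return out;
        }
    }

    // Hand-built copy of the food.htm body.
    static El foodBody() {
        return new El("body", "",
            new El("div", "vegetables"),
            new El("div", "fruits"),
            new El("div", "Meats", new El("div", "chicken"), new El("div", "beef")),
            new El("div", "Nuts", new El("div", "peanuts"), new El("div", "walnuts")));
    }

    public static void main(String[] args) {
        El body = foodBody();
        System.out.println(body.getElement(2).text);                // Meats
        System.out.println(body.getElement(3).getElement(0).text);  // peanuts
        System.out.println(body.getElement(2).getElement(1).text);  // beef
        System.out.println(body.getEach("div").size());             // 4
    }
}
```

getEach finds four divs here, not eight, because the nested divs are grandchildren of body.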

Search Method Summary: a table of search methods

The following table summarizes the most important search methods covered in previous examples.

Prefix | First                   | Each                   | Every                   | Scope
get    | getFirst(String query)  | getEach(String query)  | --                      | searches children only
find   | findFirst(String query) | findEach(String query) | findEvery(String query) | searches children/descendants to any depth

First: searches for the first Element that matches the query; returns the Element or throws NotFound.
Each: searches for matching, non-nested Elements, which are returned in an Elements container.
Every: searches for all matching Elements, which are returned in an Elements container.

Example 11: More searching with regular expressions.

hello.htm:

<html>
  <body>
    <p id='1'>hi</p>
    <span id='2'>bonjour</span>
    <div id='3'>hola</div>
    <p id='4'>ahoj</p>
  </body>
</html>

Example11.java:

System.setProperty("webdriver.chrome.driver", "path/to/chromedriver.exe");
Browser browser = new Browser(new ChromeDriver());
browser.visit("http://jauntium.com/examples/hello.htm");

Elements elements = browser.doc.findEvery("<div|span>");    //find every element whose tagname is div or span
System.out.println("results1:\n" + elements.innerHTML());   //print the search results

elements = browser.doc.findEvery("<p id=1|4>");             //find every p element whose id is 1 or 4
System.out.println("results2:\n" + elements.innerHTML());   //print the search results

elements = browser.doc.findEvery("< id=[2-6]>");            //find every element (any name) with id from 2-6
System.out.println("results3:\n" + elements.innerHTML());   //print the search results

elements = browser.doc.findEvery("<p>ho");                  //find every p whose child text contains 'ho'
System.out.println("results4:\n" + elements.innerHTML());   //print the search results

elements = browser.doc.findEvery("<p|div>^ho");    //find every p or div whose child text starts with 'ho'
System.out.println("results5:\n" + elements.innerHTML());  //print the search results

elements = browser.doc.findEvery("<p>^(hi|ahoj)"); //find every p whose child text starts with 'hi' or 'ahoj'
System.out.println("results6:\n" + elements.innerHTML());  //print the search results

This example illustrates using regular expressions within tagQueries. [Note that in Java string literals, an escape sequence in a regular expression must be written with a double backslash rather than a single backslash, for example "\\d" rather than "\d".] A tagQuery has the general form:

<tagnameRegex attributeName='attributeValueRegex'>childTextRegex

where multiple attributes are allowed. For an element to match the query, every part of the query that is specified (i.e., the tagnameRegex, attributeName, attributeValueRegex and childTextRegex) must match.

tagnameRegex:: If tagnameRegex is a whitespace character, it will match any tagname. Otherwise, the tagnameRegex will be treated as case-insensitive and evaluated against entire tagnames (i.e., it will not match substrings). The tagnameRegex must begin with either an alphabetical character or an opening parenthesis, and may not contain whitespace (though it may contain \\s, which matches any whitespace character).
attributeName:: If no attributes are included in the query, the query will match a candidate element regardless of its attributes (including an element with no attributes). Otherwise, the attributeName in the query is matched as a case-insensitive string, not as a regular expression.
attributeValueRegex:: If attributeValueRegex is not present, the attributeName in the query will be matched against candidate attributeNames irrespective of their attributeValues. If attributeValueRegex is present, it will be treated as case-insensitive and evaluated against the entire corresponding attribute value (i.e., it will not match substrings).
childTextRegex:: If childTextRegex is not present, the query will match any child text (including no text at all). Otherwise, childTextRegex will be evaluated against the concatenation of the Element's Text children. Note that the childTextRegex is case-sensitive and will match against substrings.
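To illustrate the note about escape sequences, the sketch below shows that the Java literal "<p>\\d{4}" becomes the eight-character tagQuery <p>\d{4} at runtime, whose childTextRegex \d{4} finds a four-digit year in a case-sensitive substring search. The helper childTextPart is hypothetical and handles only simple queries whose tag portion contains no '>'; the query itself is an illustration built from the general form above, not one of the tutorial's examples.

```java
import java.util.regex.Pattern;

// Illustrates writing regex escapes inside Java String literals.
class RegexEscapes {

    // Hypothetical helper: returns the childTextRegex part of a simple tagQuery,
    // i.e. everything after the first '>' (assumes the tag part contains no '>').
    static String childTextPart(String tagQuery) {
        return tagQuery.substring(tagQuery.indexOf('>') + 1);
    }

    public static void main(String[] args) {
        String query = "<p>\\d{4}";              // backslash doubled in Java source
        System.out.println(query);               // prints <p>\d{4} (a single backslash at runtime)
        System.out.println(query.length());      // 8

        String regex = childTextPart(query);     // \d{4}
        boolean found = Pattern.compile(regex)   // case-sensitive substring search
                .matcher("Copyright 2018")
                .find();
        System.out.println(found);               // true
    }
}
```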

Example 12: Filling-out form fields in sequence using Document.apply().

signup.htm:

Example12.java:

System.setProperty("webdriver.chrome.driver", "path/to/chromedriver.exe");
try{
  Browser browser = new Browser(new ChromeDriver());
  browser.visit("http://jauntium.com/examples/signup.htm");
		   
  browser.doc.apply(       //fill-out the form by applying a sequence of inputs
    "tom@mail.com",        //apply text to textfield 
    "advanced",            //select a menu item by label
    "no comment",          //apply text to textarea
    "no thanks"            //select radiobutton by label 
  ).submit("create trial account");           //click submit button specified by label
  System.out.println(browser.getLocation());  //print the current location (url)	
}
catch(JauntiumException e){
  System.out.println(e);			
}

This example illustrates using the Document.apply(...) method to fill out a sequence of form fields. The apply(...) method allows the user to fill out editable fields by specifying a sequence of input values. The input values are applied starting at the first field in the form (or at whichever field currently has focus).

On lines 6-10, the apply(...) method is called. It can be used to fill out any sequence of editable and visible textfields, password fields, textareas, radiobuttons, checkboxes, or menus. In this case, the sequence of inputs has the following effect: it fills out the textfield with tom@mail.com, selects the menu option that matches the case-insensitive string advanced, fills out the textarea with the text no comment, and finally selects the radiobutton whose label matches the case-insensitive string 'no thanks'. Although not shown in this example, boolean values (true/false) can be applied in order to check/uncheck checkboxes, and the string "\t" can be applied in order to skip to the next field.

On line 11, the submit button is pressed by invoking the submit method of the Form object, which is returned by Document.apply(...). The parameter ("create trial account") is used to target the submit button of the form that has matching* text. The Form object is discussed in more detail in the next example. On line 12, the url of the follow-up page is retrieved using getLocation() and printed.

It is worth noting that like the Document object, the Form object also has an apply(...) method. So in a similar fashion, a sequence of inputs can be applied to a specific form, which is useful when targeting a specific form in a document that has multiple forms.

*matched as a case-insensitive String.

Example 13: Filling-out form fields by label with the Form object (textfields, password fields, checkboxes).

Example13.java:

System.setProperty("webdriver.chrome.driver", "path/to/chromedriver.exe");
try{
  Browser browser = new Browser(new ChromeDriver());
  browser.visit("http://jauntium.com/examples/login.htm");
  Form form = browser.doc.getForm(0);           //get a Form object for the first form in the document
			
  form.filloutField("Username:", "tom");        //fill out the form field labelled 'Username:' with "tom"
  form.filloutField("Password:", "secret");     //fill out the form field labelled 'Password:' with "secret"
  form.chooseCheckBox("Remember me");           //choose the checkbox right-labelled 'Remember me'.
  form.submit();                                //submit the form
  System.out.println(browser.getLocation());    //print the current location (url)
}
catch(JauntiumException e){
  System.out.println(e);			
}

Using the Form component is a convenient way to fill out a specific field in a particular form. This example illustrates using the Form component to fill out and manipulate text fields and checkboxes on the basis of how they are visibly labelled (i.e., based on the text that appears adjacent to the input field, or the 'placeholder' label that appears within the field itself).

On line 5 the Document.getForm(int) method is used to create a Form component for the first form in the document, by specifying its index (starting at index 0 for the first form). It is also possible to use a tagQuery to target a specific form, using Document.getForm(String tagQuery). See class Document for several other options as well.

On lines 7 and 8, the filloutField(String, String) method of the Form is called, which can be used for filling out either textfields, password fields, or textarea fields. The first argument is a String used to match* the text label that appears to the left of the field. The second argument is the value to be entered into the field.

On line 9, the chooseCheckBox(String) method is called. The parameter is a String used to match* the label that occurs on the right side of the checkbox. On line 10, the form is submitted and on the following line the location of the browser (current url) is printed.

Be aware that each time a field has its value set, the focus shifts to the next field in the form. Knowing which field has focus can be useful since, as previously mentioned, the Form method apply(Object... args) can be called to continue filling out the next/remaining fields, starting from the field that currently has focus.

* - String matching is performed in a case-insensitive and whitespace-insensitive manner.
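The footnote does not spell out exactly how whitespace is treated. One plausible reading, sketched below purely as an assumption, is that labels are compared as whole strings after lowercasing and removing all whitespace. The normalize and labelMatches helpers are hypothetical and are not part of the Jauntium API.

```java
import java.util.Locale;

// Hypothetical label matcher: case-insensitive and whitespace-insensitive,
// ASSUMING "whitespace-insensitive" means all whitespace is ignored and the
// whole label (not a substring) must match.
public class LabelMatchSketch {

    // Lowercase and drop every whitespace character (\s covers space, tab, newline, etc.).
    static String normalize(String s) {
        return s.replaceAll("\\s+", "").toLowerCase(Locale.ROOT);
    }

    static boolean labelMatches(String query, String label) {
        return normalize(query).equals(normalize(label));
    }

    public static void main(String[] args) {
        System.out.println(labelMatches("Username:", "  username : "));
        System.out.println(labelMatches("Remember me", "REMEMBER\tME"));
        System.out.println(labelMatches("Password:", "Username:"));
    }
}
```

Note that Java's \s does not include the non-breaking space (U+00A0), which HTML labels sometimes contain via &nbsp;; a real matcher may need to handle it separately.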

Example 14: Filling-out form fields by label with the Form object (menus, textareas, radiobuttons)

signup.htm:

Example14.java:

System.setProperty("webdriver.chrome.driver", "path/to/chromedriver.exe");
try{
  Browser browser = new Browser(new ChromeDriver());
  browser.visit("http://jauntium.com/examples/signup.htm");
			
  Form form = browser.doc.getForm(0);   		
  form.filloutField("E-mail:", "tom@mail.com");     //fillout the textfield labelled "E-mail:"
  form.chooseMenuItem("Account Type:", "advanced"); //choose "advanced" from the menu labelled "Account Type:"
  form.filloutField("Comments:", "no comment");     //fill out the textarea labelled "Comments:"
  form.chooseRadioButton("No thanks");              //choose the radiobutton labelled "No thanks"
  form.submit("create trial account");              //click the submit button labelled 'create trial account'
  System.out.println(browser.getLocation());        //print the current location (url)
}
catch(JauntiumException e){
  System.out.println(e);			
}

As with the previous example, this example illustrates using the Form component to fill out form fields by label.

The filloutField(String, String) method, seen on lines 7 and 9, was covered in example 13. On line 8, the chooseMenuItem(String, String) method is used to select a menuitem of a particular menu. The first parameter is a String used to locate a particular menu by matching* its (left-side) text label. The second parameter is a String that is matched* against menuitem text to select the first one that matches.

The chooseRadioButton(String) method (line 10) is used to select a radio button. The String parameter is used to match* the label that occurs on the right side of the radiobutton. See also the Form method chooseRadioButton(String, LabelSide) which provides the ability to select a radiobutton by matching text on either the left or right side of the radiobutton.

On line 11, the form is submitted by specifying a String to match* the text of a particular submit button. If no matching button label is found, a NotFound exception is thrown; NotFound is a subclass of JauntiumException, which is caught on line 14. The ability to select a button by its text is useful when the form has more than one submit button. Otherwise it may be sufficient to use the Form method submit(), which is equivalent to calling the JavaScript submit() method on the form.

* - String matching is performed in a case-insensitive and whitespace-insensitive manner.

Example 15: Filling-out a form by manipulating input Elements.

signup2.htm:

<html>
Sign up:<br>
<form name="signup" action="http://jauntium.com/examples/signup2Response.htm">
  E-mail:<input type="text" name="email"><br>
  Password:<input type="password" name="pw"><br>
  Remember me <input type="checkbox" name="remember"><br>
  Account Type:<select name="account"><option>regular<option>advanced</select><br>
  Comments:<br><textarea name='comment'></textarea><br>
  <input type="radio" name="inform" value="yes" checked>Inform me of updates<br>
  <input type="radio" name="inform" value="no">No thanks<br>
  <input type="submit" name="action" value="create account">
  <input type="submit" name="action" value="create trial account">
</form>
</html>

Example15.java:

System.setProperty("webdriver.chrome.driver", "path/to/chromedriver.exe");
try{
  Browser browser = new Browser(new ChromeDriver());
  Document doc = browser.visit("http://jauntium.com/examples/signup2.htm");
			
  doc.findFirst("<input name=email>").setAttribute("value", "tom@mail.com");
  doc.findFirst("<input name=pw>").setAttribute("value", "abc123");
  doc.findFirst("<input name=remember>").setAttribute("checked", "true");
  doc.findFirst("<option>advanced").setAttribute("selected", "true");
  doc.findFirst("<textarea name=comment>").innerHTML("no comment at this time");
  doc.findFirst("<input name=inform value=no>").click();
  doc.findFirst("<input type=submit value='create trial account'>").click();
  System.out.println(browser.getLocation());//print the current location (url)
}
catch(JauntiumException e){
  System.out.println(e);			
}

This example illustrates filling out and submitting a form by manipulating its Elements directly, which is a relatively low-level technique of form manipulation compared to using Document.apply(...), Form.apply(...) or using the setter methods of the Form component (discussed in examples 12-14). The techniques used in this example are not new material; searching using findFirst and setting attributes are covered in earlier examples.

On lines 6-11, various input elements are located by using findFirst(String). Each input element is then modified to change its default/blank value to the intended value. The form is then submitted on line 12 by invoking the click() method on the submit button. The location (url) of the follow-up page is then printed on line 13.

Example 16: Traversing nodes to access elements, text and comments.

goodbye.htm:

<html>
  <body>
    Goodbye Cruel World
	<br>	
	<!--html comment-->
	&copy; 2018
  </body>
</html>

Example16.java:

System.setProperty("webdriver.chrome.driver", "path/to/chromedriver.exe");
try{
  Browser browser = new Browser(new ChromeDriver());
  browser.visit("http://jauntium.com/examples/goodbye.htm"); 
 
  Node bodyNode = browser.doc.findFirst("<body>").toNode();   //get body node
  List<Node> childNodes = bodyNode.getChildNodes();     //get list of child nodes
  for(Node node : childNodes){                          //print each child node
    System.out.println("NODE:");
    System.out.println(node.toString());
    System.out.println("---");
  }
}
catch(JauntiumException e){
  System.out.println(e);			
}

This example illustrates how to access HTML elements, comments, and text as Node objects.

On line 6, the body element is located by using the tagQuery <body> and is then converted to a Node by calling the toNode() method. On the following line, the child nodes of the body element are retrieved as a List by calling the method getChildNodes(). On lines 8-12 we iterate through each Node in the List and print it by calling the Node's toString() method.

Note that once an element has been converted to a Node, it can be converted back to type Element by calling Node.toElement(). Node objects also have the methods nextNodeSibling() and previousNodeSibling() for moving to the next and previous node sibling, respectively.
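The same element/text/comment distinction exists in the W3C DOM API that ships with the JDK, which makes it easy to experiment without a browser. This sketch uses org.w3c.dom rather than Jauntium, so getNextSibling() and getPreviousSibling() stand in for nextNodeSibling() and previousNodeSibling(). The page is a well-formed XML variant of goodbye.htm, since an XML parser rejects the unclosed <br> and the HTML-only &copy; entity.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

// Uses the JDK's org.w3c.dom API (not Jauntium) to show that an element's
// children are a mix of text, element and comment nodes.
public class NodeTraversalSketch {

    // XML variant of goodbye.htm: <br/> is self-closed, &copy; becomes &#169;,
    // and whitespace between tags is removed so no whitespace-only text nodes appear.
    static final String PAGE =
        "<html><body>Goodbye Cruel World<br/><!--html comment-->&#169; 2018</body></html>";

    static String describe(Node node) {
        switch (node.getNodeType()) {
            case Node.ELEMENT_NODE:
                return "ELEMENT " + node.getNodeName();
            case Node.COMMENT_NODE:
                return "COMMENT " + node.getNodeValue();
            case Node.TEXT_NODE:
                return "TEXT " + node.getNodeValue().trim();
            default:
                return "OTHER " + node.getNodeName();
        }
    }

    // Parses PAGE and returns its <body> element.
    static Node parseBody() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(PAGE.getBytes(StandardCharsets.UTF_8)));
            return doc.getElementsByTagName("body").item(0);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse PAGE", e);
        }
    }

    // Forward walk over the children (compare nextNodeSibling()).
    static List<String> childrenForward(Node parent) {
        List<String> out = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            out.add(describe(n));
        }
        return out;
    }

    // Backward walk over the children (compare previousNodeSibling()).
    static List<String> childrenBackward(Node parent) {
        List<String> out = new ArrayList<>();
        for (Node n = parent.getLastChild(); n != null; n = n.getPreviousSibling()) {
            out.add(describe(n));
        }
        return out;
    }

    public static void main(String[] args) {
        Node body = parseBody();
        childrenForward(body).forEach(System.out::println);
        System.out.println("--- reversed ---");
        childrenBackward(body).forEach(System.out::println);
    }
}
```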

Example 17: Table traversal

stocks.htm:

<html>
  <table class="stocks" border="1">
    <tr><td>MSFT</td><td>GOOG</td><td>APPL</td></tr>
    <tr><td>$31.58</td><td>$896.57</td><td>$465.25</td></tr>
  </table>
</html>

Example17.java:

System.setProperty("webdriver.chrome.driver", "path/to/chromedriver.exe");
try{
  Browser browser = new Browser(new ChromeDriver());
  browser.visit("http://jauntium.com/examples/stocks.htm");
			   
  Element table = browser.doc.findFirst("<table class=stocks>");    //find table element
  Elements tds = table.findEach("<td|th>");                         //find non-nested td/th elements
  for(Element td: tds){                                             //iterate through td/th's
    System.out.println(td.outerHTML());                             //print each td/th element
  }
}
catch(JauntiumException e){
  System.out.println(e);
}

This example does not introduce any new concepts. Rather, it illustrates a technique for traversing a table by searching for and navigating through the elements that constitute each cell (<td> or <th>).

On line 7, the findEach(String) method is used to collect every non-nested td/th descendant of the table element. The parameter "<td|th>" is a tagQuery that uses the regular expression td|th to match the tagname.
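The difference between matching a whole tag name and merely finding td or th inside it is easy to check with java.util.regex. Whether Jauntium's tagQuery applies its regular expression as a whole-string match is an assumption here (the Table methods are described in Example 18 as using Matcher.matches()); the sketch only contrasts the two behaviours, and the helper names are hypothetical.

```java
import java.util.List;
import java.util.regex.Pattern;

// Contrasts whole-string matching with substring matching for the regex td|th.
public class TagRegexSketch {

    // Assumption: tag names are compared case-insensitively, as HTML tag names are.
    static final Pattern TD_OR_TH = Pattern.compile("td|th", Pattern.CASE_INSENSITIVE);

    // Whole-string match: the entire tag name must be "td" or "th".
    static boolean wholeMatch(String tagName) {
        return TD_OR_TH.matcher(tagName).matches();
    }

    // Substring match, shown only for contrast: "thead" contains "th".
    static boolean substringMatch(String tagName) {
        return TD_OR_TH.matcher(tagName).find();
    }

    public static void main(String[] args) {
        for (String tag : List.of("td", "TH", "thead", "tr", "tbody")) {
            System.out.println(tag + ": matches()=" + wholeMatch(tag) + ", find()=" + substringMatch(tag));
        }
    }
}
```

With whole-string matching, a <thead> element is not picked up by td|th, whereas a substring search would match it.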

Example 18: Table text extraction using the Table component.

schedule.htm:

Example18.java:

System.setProperty("webdriver.chrome.driver", "path/to/chromedriver.exe");
try{
  Browser browser = new Browser(new ChromeDriver());
  browser.visit("http://jauntium.com/examples/schedule.htm");
  Element tableElement = browser.doc.findFirst("<table class=schedule>");   //find table Element
  Table table = new Table(tableElement);                    //create Table component 

  System.out.println("\nText of first column:");                            
  List<String> results = table.getTextFromColumn(0);       //get text from first column
  for(String text : results) System.out.println(text);     //iterate through results & print     
      
  System.out.println("\nText of column containing 'Mon':");
  results = table.getTextFromColumn("Mon");                //get text from column containing 'Mon'
  for(String text : results) System.out.println(text);     //iterate through results & print 
		   
  System.out.println("\nText of first row:");
  results = table.getTextFromRow(0);                       //get text from first row
  for(String text : results) System.out.println(text);     //iterate through results & print 
    
  System.out.println("\nText of row containing '2:00pm':");
  results = table.getTextFromRow("2:00pm");                //get text from row containing '2:00pm'
  for(String text : results) System.out.println(text);     //iterate through results & print
      
  System.out.println("\nCreate Map of text from first two columns:");  
  Map<String, String> map = table.getTextFromColumns(0, 1);//create map containing text from cols 0 and 1
  for(String key : map.keySet()){                          //print keys (from col 0) and values (from col 1) 
    System.out.println(key + ":" + map.get(key));           
  }
}
catch(JauntiumException e){
  System.out.println(e);
}

This example illustrates the text extraction methods of the Table component, which is a utility object that makes it easy to extract the text content of a particular row, column, or cell of an HTML table. It can also be used to extract text from two columns into a Map, where one column constitutes the keys of the Map and the second column constitutes the values.
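The row, column and two-column Map extractions can be pictured as operations on a grid of cell text. The sketch below is a plain-Java approximation under stated assumptions: the grid is rectangular (no colspan/rowspan), the timetable data is invented because schedule.htm is not reproduced here, and a repeated key overwrites the earlier value. It does not call the Table class, and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical approximation of row/column/Map text extraction over a grid.
public class TableTextSketch {

    // Invented timetable (rows of cell text); NOT the contents of schedule.htm.
    static final List<List<String>> GRID = List.of(
        List.of("", "Mon", "Tue"),
        List.of("10:00am", "Math", "Art"),
        List.of("2:00pm", "Gym", "Music"));

    // Text of every cell in column col, top to bottom.
    static List<String> column(List<List<String>> grid, int col) {
        List<String> out = new ArrayList<>();
        for (List<String> cells : grid) {
            if (col < cells.size()) {
                out.add(cells.get(col));
            }
        }
        return out;
    }

    // Text of every cell in row r, left to right.
    static List<String> row(List<List<String>> grid, int r) {
        return new ArrayList<>(grid.get(r));
    }

    // Keys from keyCol, values from valueCol, kept in row order.
    // A repeated key overwrites the earlier value.
    static Map<String, String> columnsToMap(List<List<String>> grid, int keyCol, int valueCol) {
        Map<String, String> map = new LinkedHashMap<>();
        for (List<String> cells : grid) {
            if (keyCol < cells.size() && valueCol < cells.size()) {
                map.put(cells.get(keyCol), cells.get(valueCol));
            }
        }
        return map;
    }

    public static void main(String[] args) {
        System.out.println(column(GRID, 0));          // [, 10:00am, 2:00pm]
        System.out.println(row(GRID, 1));             // [10:00am, Math, Art]
        System.out.println(columnsToMap(GRID, 0, 1)); // {=Mon, 10:00am=Math, 2:00pm=Gym}
    }
}
```

A LinkedHashMap is used so that iterating the map follows the table's row order; whether Table.getTextFromColumns preserves order is not stated in this section.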

On line 5, the target table is located using the tagQuery <table class=schedule>; on line 6, the table Element is passed into the constructor of a Table component. Note that the Document object provides a number of alternative ways to create a Table component in a single step, including Document.getTable(String tagQuery) and Document.getTableByText(String... regex).

As you peruse each data extraction method, note that several of them accept a regular expression for matching the text within a particular cell (td/th element). These regular expressions are matched case-insensitively against the visible text of the td/th elements (see Element.getText()). Matching uses Matcher.matches(), which performs whole-string matching as opposed to substring matching. When more than one td/th matches the regular expression, the first matching cell is used, where the table is processed row by row, left to right, top to bottom.
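The matching rules in the previous paragraph (case-insensitive, whole-string, first match in row-by-row order) can be sketched with java.util.regex over the cell text of stocks.htm from Example 17. The findFirstCell helper is hypothetical; it illustrates the rules rather than reproducing the Table component's code.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical cell lookup following the stated rules.
public class CellMatchSketch {

    // Cell text from stocks.htm (Example 17), row by row.
    static final List<List<String>> STOCKS = List.of(
        List.of("MSFT", "GOOG", "APPL"),
        List.of("$31.58", "$896.57", "$465.25"));

    // Returns {row, col} of the first cell whose WHOLE text matches regex
    // case-insensitively, scanning top to bottom and left to right;
    // returns null when no cell matches.
    static int[] findFirstCell(List<List<String>> grid, String regex) {
        Pattern pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
        for (int r = 0; r < grid.size(); r++) {
            List<String> row = grid.get(r);
            for (int c = 0; c < row.size(); c++) {
                if (pattern.matcher(row.get(c)).matches()) {
                    return new int[] {r, c};
                }
            }
        }
        return null;
    }

    static void print(String regex, int[] cell) {
        System.out.println(regex + " -> "
            + (cell == null ? "no match" : "row " + cell[0] + ", col " + cell[1]));
    }

    public static void main(String[] args) {
        print("goog", findFirstCell(STOCKS, "goog"));     // case-insensitive whole match
        print("GO", findFirstCell(STOCKS, "GO"));         // only a substring, so no match
        print("\\$4.*", findFirstCell(STOCKS, "\\$4.*")); // the only price starting with $4
        print("\\$.*", findFirstCell(STOCKS, "\\$.*"));   // several match; the first one wins
    }
}
```

Because matching is whole-string, a partial label such as "GO" has to be written as a pattern like "go.*" to match "GOOG".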

Example 19: Table cell extraction using the Table component.

schedule.htm:

Example19.java:

System.setProperty("webdriver.chrome.driver", "path/to/chromedriver.exe");
try{
  Browser browser = new Browser(new ChromeDriver());
  browser.visit("http://jauntium.com/examples/schedule.htm");
  Element tableElement = browser.doc.findFirst("<table class=schedule>");   //find table Element
  Table table = new Table(tableElement);                    //create Table component 

  System.out.println("\nCell at position 3,3:");
  Element element = table.getCell(3,3);                    //get element at col index 3, row index 3
  System.out.println(element.outerHTML());                 //print element         

  System.out.println("\nCell for Fri at 10:00am:"); 
  element = table.getCell("Fri", "10:00am");               //get element at intersection of 'fri' and '10:00am'
  System.out.println(element.outerHTML());                 //print element
}
catch(JauntiumException e){
  System.out.println(e);
}

This example illustrates the cell-extraction methods of the Table component. The Table component makes it easy to extract the td/th elements for a particular cell.

On lines 5-6, a Table component is acquired by passing in a table Element found via an element query (queries are covered in the examples on search methods). A Table component can also be acquired by calling Document's getTable(int) method, or by using the other variations of that method mentioned in the previous example.

The getCell method on line 13 accepts two parameters that are regular expressions for matching the text within particular cells (td/th elements). These regular expressions are matched case-insensitively against the innerText() of the td/th elements. When more than one td/th matches a regular expression, the first cell encountered constitutes the match, where the table is processed row by row, left to right, top to bottom.
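One way to picture the intersection lookup in getCell("Fri", "10:00am") is to take the column of the first cell matching the first regex and the row of the first cell matching the second. That interpretation, the (column, row) argument order (taken from the comments in Example 19), and the invented timetable data are all assumptions of this hypothetical sketch; it does not reproduce the Table component's code.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical intersection lookup over a rectangular grid of cell text.
public class GetCellSketch {

    // Invented timetable; the real schedule.htm is not reproduced in this tutorial.
    static final List<List<String>> GRID = List.of(
        List.of("", "Mon", "Fri"),
        List.of("10:00am", "Math", "Art"),
        List.of("2:00pm", "Gym", "Music"));

    // {row, col} of the first cell whose whole text matches regex case-insensitively.
    static int[] first(List<List<String>> grid, String regex) {
        Pattern pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
        for (int r = 0; r < grid.size(); r++) {
            for (int c = 0; c < grid.get(r).size(); c++) {
                if (pattern.matcher(grid.get(r).get(c)).matches()) {
                    return new int[] {r, c};
                }
            }
        }
        return null;
    }

    // Column index comes from the first cell matching colRegex,
    // row index from the first cell matching rowRegex.
    static String getCell(List<List<String>> grid, String colRegex, String rowRegex) {
        int[] colCell = first(grid, colRegex);
        int[] rowCell = first(grid, rowRegex);
        if (colCell == null || rowCell == null) {
            throw new IllegalArgumentException("no cell matches " + (colCell == null ? colRegex : rowRegex));
        }
        return grid.get(rowCell[0]).get(colCell[1]);
    }

    public static void main(String[] args) {
        System.out.println(getCell(GRID, "fri", "10:00AM")); // Art
        System.out.println(getCell(GRID, "mon", "2:00pm"));  // Gym
    }
}
```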

Example 20: Pagination Discovery

Google's pagination:

Example20.java:

System.setProperty("webdriver.chrome.driver", "path/to/chromedriver.exe");  //setup
try{
  Browser browser = new Browser(new ChromeDriver());    
  browser.visit("https://google.com");              //visit google.com
  browser.doc.apply("seashells").submit();          //apply search term and submit form

  String nextPageUrl = browser.doc.nextPageUrl();   //extract url to next page of results
  browser.visit(nextPageUrl);                       //visit next page (p 2).
}  
catch(JauntiumException e){
  System.err.println(e);
}

This example illustrates using the pagination discovery feature of class Document in order to navigate through a series of paginated web pages, such as those produced by search engines or database-type web interfaces.

On line 4 the browser visits google.com, and on the following line a search for "seashells" is applied to the textfield and then submitted. At this point in the program the browser window will display a page of search results with pagination links visible at the bottom of the screen.

On line 7 the method nextPageUrl() is invoked; it returns the url of the next page of results, which in this case is page 2. On the next line, the browser visits the url for page 2.
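Following more than one page usually means repeating the visit-then-nextPageUrl() step in a loop. The sketch below shows such a loop in plain Java, with a stand-in function and invented urls instead of a browser. It assumes, since this section does not say, that a null url signals the last page; the page cap and the repeated-url check are defensive choices of this sketch, not Jauntium features. In a real program, the stand-in function would roughly correspond to calling browser.visit(url) and then browser.doc.nextPageUrl().

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.UnaryOperator;

// Hypothetical pagination loop; followPages and the fake links are illustrative only.
public class PaginationSketch {

    // Follows "next page" links from startUrl. nextUrl stands in for
    // "visit the page, then ask it for its next-page url"; a null result is
    // ASSUMED to mean there is no further page. Stops after maxPages pages,
    // or when a url repeats, so a bad link cannot cause an endless loop.
    static List<String> followPages(String startUrl, UnaryOperator<String> nextUrl, int maxPages) {
        Set<String> visited = new LinkedHashSet<>();
        String url = startUrl;
        while (url != null && visited.size() < maxPages && visited.add(url)) {
            url = nextUrl.apply(url);
        }
        return new ArrayList<>(visited);
    }

    public static void main(String[] args) {
        // Invented link structure standing in for real result pages.
        Map<String, String> links = Map.of(
            "results?page=1", "results?page=2",
            "results?page=2", "results?page=3");
        System.out.println(followPages("results?page=1", links::get, 10));
        // prints [results?page=1, results?page=2, results?page=3]
    }
}
```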