Category: Web Technologies

Jericho Selector, a.k.a. jQuery for Java

I’m pleased to announce the first stable release of Jericho Selector.

Jericho Selector is an extension to the well-known Jericho HTML Parser library that lets you select elements from an HTML document using CSS selectors, just like you do with jQuery.

Why Jericho? Unlike jsoup, it lets you modify the document while keeping the original formatting. Other libraries rewrite the entire document. Anyway, I like to have a choice! 😉

Jericho Selector is completely free. It is released under the MIT license.

How to use

Jericho Selector is available from the Maven Central Repository, so you just need to add the following dependency to your project:

<dependency>
    <groupId>br.com.starcode.jerichoselector</groupId>
    <artifactId>jericho-selector</artifactId>
    <version>1.0.1-RELEASE</version>
</dependency>

Import the static method $, which is the entry point for Jericho Selector:

import static br.com.starcode.jerichoselector.jerQuery.$;

Then you can query HTML elements just like jQuery:

$(html, "p.my-text")

What has been done

Before implementing Jericho Selector, I had to implement a full CSS parser. To do that, I created another library called parCCser. It is based on the official W3C CSS3 specification and covers almost all of it, except for some details that only make sense in the context of a browser and some UTF-8 related support that I considered not entirely necessary. It’s also released under the MIT license.

Jericho Selector then uses the object tree generated by parCCser, together with the Jericho HTML Parser API, to query the HTML document for the elements matching a given CSS selector.

Both implementations have unit test coverage above 90%, not counting exceptional cases that the coverage plugin is not able to analyze.

What is coming

Over the next few weeks I plan to add some fluent API features to Jericho Selector, similar to jQuery, so you can perform operations using lambdas, for example. Methods like closest, parentsUntil, find, and each are my priorities.

Performance is another point to improve. Specific selectors can be optimized using a cache or specific Jericho HTML Parser methods like getAllElementsByClass.

What you can do

Report any problem and suggest new features!

Source code

You can check out the source code from my GitHub account:

https://github.com/utluiz/jericho-selector/

Running Python (or any language) in your browser

Some time ago a very interesting question came up on Stack Overflow in Portuguese about running an arbitrary programming language in the browser, just like Javascript does.

How could we run our own language in a web page? Let’s face the challenge and implement it!

Requirements

There are two ways to include Javascript in HTML documents:

Inside a <script> tag:

<script type="text/javascript">
alert('javascript running');
</script>

From an external source:

<script type="text/javascript" src="script.js"></script>

We want to use our language in the same fashion.

Inside a <script> tag:

<script type="text/ourlanguage">
...
</script>

From an external source:

<script type="text/ourlanguage" src="our-script.abc"></script>

We won’t consider inline code, as in onclick, onchange, and other HTML tag attributes.

Keep in mind that <script> tags have certain characteristics we want to preserve:

  • They’re executed sequentially, in the order they appear in HTML.
  • If a script has the src attribute defined (and no async or defer attribute), it has to be downloaded and executed before the browser continues to parse the rest of the page or execute any other script. This is why we should include scripts at the end of the source (close to </body>) whenever possible.

Also, our new language should coexist with Javascript code in the web page, as in the example below:

<script type="text/javascript">
var a = 6;
</script>
<script type="text/ourlanguage">
print(a) # should print 6
</script>

Working the solution

Since browsers ignore scripts whose type they don’t recognize, there is no problem in using the <script> tag with an arbitrary type attribute. But how can we make the browser run it?

At first, I thought the only solution would be a generic loader in Javascript, able to load the entire page and process its content, adding all the HTML tags programmatically to the DOM and running scripts as they were found.

That would be nothing more and nothing less than a Javascript version of PHP or some template processor, wouldn’t it? It looks too complicated to be useful.

However, I found out that modern browsers provide an API to observe changes to a web page, including new tags being added to the DOM during HTML parsing. We’re talking about MutationObserver.

The possibilities are extraordinary!

Using MutationObserver

To use MutationObserver, we instantiate it, passing a listener function to the constructor. The function receives the changes to the page as an argument. Then we call the observe method on the instance, passing the element we want to monitor. In this case, it’s the entire document.

Let’s see a basic example:

// creates the observer instance
var mutationObserver = new MutationObserver(function(mutations) {

    // iterates over the changes in the document
    mutations.forEach(function(mutation) {

        // gets all added nodes
        var addedNodes = mutation.addedNodes;

    });

});

// starts to monitor for changes in any level of the document
mutationObserver.observe(document, { childList: true, subtree: true });

Now we can check whether any of the added nodes are scripts in our language and then delegate their execution to its interpreter.
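The check on the added nodes can be sketched as a small helper. This is only an illustration: the function name findAddedScripts is hypothetical, the type text/ourlanguage is the placeholder used earlier, and plain objects stand in for DOM nodes so the snippet also runs outside a browser.

```javascript
// Collects, from a list of mutation records, every added <script> node
// whose type matches the given (placeholder) type.
function findAddedScripts(mutations, scriptType) {
    var found = [];
    mutations.forEach(function (mutation) {
        // addedNodes is array-like (a NodeList in browsers), so borrow forEach
        Array.prototype.forEach.call(mutation.addedNodes, function (node) {
            if (node.tagName === 'SCRIPT' && node.type === scriptType) {
                found.push(node);
            }
        });
    });
    return found;
}

// Plain objects standing in for mutation records and DOM nodes.
var sampleMutations = [
    { addedNodes: [{ tagName: 'DIV' }, { tagName: 'SCRIPT', type: 'text/ourlanguage' }] },
    { addedNodes: [{ tagName: 'SCRIPT', type: 'text/javascript' }] }
];
console.log(findAddedScripts(sampleMutations, 'text/ourlanguage').length); // prints 1
```

Inside a real observer callback, the mutations argument could be passed straight to such a helper.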

Choosing a language: Python

Which language to use? It should be a high-level, powerful, productive language when compared to Javascript.

Python looks like a great choice!

Building (or borrowing) an interpreter

The first step would be to build a Javascript interpreter for our language. However, we won’t reinvent the wheel! There are implementations of the language for various platforms already. In Javascript we have a Python 3 implementation called Brython.

This project partially meets our requirements: it executes code in <script> tags and can access Javascript objects and variables. But Python scripts run only after the web page is fully loaded, in the onload event of body. We want them to run like movie credits, that is, in the order the <script> tags appear, remember?

Look at the original Brython use case:

<body onload="brython({debug:0, cache:'none'})">

But we can use MutationObserver to delegate Python scripts to Brython as they are found in the web page, offering better integration and meeting our requirements.

Delegating the execution to Brython

Our language depends on the Brython interpreter, so we need to know how to delegate scripts to it.

The bad news is that Brython doesn’t provide a public API to do that. The good news is that we’re dealing with an open-source project, so we can inspect the source to find out how it does its magic!

All we need is in the following excerpt:

//defines the module name as '__main__' because it's in the main page
__BRYTHON__.$py_module_path['__main__'] = window.location.href;

//runs the analyzer/converter to Javascript
var $root = __BRYTHON__.py2js($python_source, '__main__');

//gets the actual Javascript code
var $javascript = $root.to_js();

Choosing the name for our scripting language: Pyscript

Our scripting language deserves a good name. As luizscript is too ugly, I decided to calculate:

Javascript – Java + Python = Pyscript

So, the tag will be like this:

<script type="text/pyscript"></script>

Adding support for the Pyscript language

With all we have learned so far, let’s implement a solution using MutationObserver to add support for Pyscript in a web page.

This is the main implementation in the file pyscript.js:

//inits brython 
brython();

//creates observer instance
var mutationObserver = new MutationObserver(function(mutations) {

    //iterates over document changes
    mutations.forEach(function(mutation) {

        //a single change may add several nodes, so check each of them
        Array.prototype.forEach.call(mutation.addedNodes, function(node) {

            //is it the type we want?
            if (node.tagName !== 'SCRIPT' || node.type !== 'text/pyscript') {
                return;
            }

            //test log
            console.log('Pyscript found!');

            //python source
            var $src;

            //if src attribute is found, do a synchronous ajax to get
            //the code in order to execute it immediately
            if (node.src !== '') {

                var $xmlhttp;
                if (window.XMLHttpRequest) {
                    // for IE7+, Firefox, Chrome, Opera, Safari
                    $xmlhttp = new XMLHttpRequest();
                } else {
                    // for IE6, IE5
                    $xmlhttp = new ActiveXObject("Microsoft.XMLHTTP");
                }
                $xmlhttp.open('GET', node.src, false);
                $xmlhttp.send();
                if ($xmlhttp.readyState === 4 && $xmlhttp.status === 200) {
                    $src = $xmlhttp.responseText;
                }
                if ($src === undefined) { // houston, we have a problem!!!
                    console.log('Error loading pyscript: ' + node.src);
                    return;
                }

            } else {

                //without src, source is tag content
                $src = node.textContent || node.innerText;

            }

            //python -> javascript
            __BRYTHON__.$py_module_path['__main__'] = window.location.href;
            var $root = __BRYTHON__.py2js($src, '__main__');
            var $jssrc = $root.to_js();

            //eval in global scope: execScript where available (old IE),
            //indirect eval everywhere else
            if (window.execScript) {
                window.execScript($jssrc);
            } else {
                window.eval.call(window, $jssrc);
            }

        });

    });

});

//init observer, monitoring changes in all nodes of any level
mutationObserver.observe(document, { childList: true, subtree: true });

The code above creates an observer (MutationObserver) for all elements loaded in the web page. If a <script> tag is found whose type attribute equals text/pyscript, its code is converted by Brython and executed. If the tag has the src attribute, we make a synchronous Ajax call to get the source and then run it immediately.
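The “eval in global scope” step matters because variables declared by the translated code must be visible to later scripts. A minimal sketch of the idea, separate from pyscript.js and using hypothetical names (evalInGlobalScope, runTranslatedCode, pyscriptDemoValue): when eval is called through another reference, a so-called indirect eval, var declarations in the evaluated code become global instead of local to the calling function.

```javascript
// Evaluates code in the global scope. Calling eval through another reference
// (an "indirect" eval) makes `var` declarations in the code global, instead
// of local to the function that called eval.
function evalInGlobalScope(code) {
    var indirectEval = eval;
    return indirectEval(code);
}

function runTranslatedCode() {
    // hypothetical variable name, used only for this demonstration
    evalInGlobalScope('var pyscriptDemoValue = 6;');
}

runTranslatedCode();
console.log(typeof pyscriptDemoValue); // prints number: the variable is global
```

With a direct eval call inside runTranslatedCode, the variable would have stayed local to that function.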

Using in a web page

To use the Pyscript language in any web page, we need to include both Brython and our MutationObserver implementation, in this order, at the beginning of the HTML code.

See this example:

<html> 
<head> 
<script type="text/javascript" src="brython.js"></script> 
<script type="text/javascript" src="pyscript.js"></script>

And – voilà – we have our own language running in a web page! 😀

A practical example

<!DOCTYPE html>  
<html> 
<head> 
<meta charset="UTF-8"> 
<title>Pyscript Test Page</title>

<!-- init brython and pyscript --> 
<script type="text/javascript" src="brython.js"></script> 
<script type="text/javascript" src="pyscript.js"></script>

<!-- set javascript variable --> 
<script type="text/javascript"> 
var value = 1; 
</script>

<!-- python: print text and javascript variable --> 
<script type="text/pyscript"> 
print('Print Test!!!') 
print(value) 
</script>

</head>

<body> 
    <div id="content">Content Here</div> 
</body>

<!-- python: browser interaction --> 
<script type="text/pyscript"> 
from _browser import doc, alert

alert('Variable value: ' + str(value))

mylist = ['items', 'of', 'mylist'] 
doc['content'].text = 'Content from python: ' + ' '.join(mylist) 
</script>

<!-- python: execute external script --> 
<script type="text/pyscript" src="myscript.py"></script>

</html>

In the page above, after the two dependencies at the beginning, there are several script tags. I’ll explain what each one does:

  • The first one runs Javascript and sets the value variable.
  • The second runs Python and prints the value variable in the browser console.
  • The third runs Python and interacts with the browser, showing an alert box and manipulating the text of the <div id="content"> element.
  • The fourth runs Python from an external file.

The content of the myscript.py file is very simple. Have a look at it:

d = { '11': 'one', '22': 'two' }
for i in d: print(i, '=', d[i])

The script above creates a dictionary d (dict) and then outputs its keys and values in a for loop.

If you look back at the third script, you’ll notice the _browser library. It’s from Brython and gives us access to browser features in a simple and straightforward manner.

Notice that Brython doesn’t bundle all its libraries in the main script we included in the HTML. For each import it downloads the corresponding library, unless it was already loaded by a previous import or included in a <script> tag.

Check it out

Do you want to see a working example? It’s available on my GitHub site: http://utluiz.github.io/pyscript/

Note: do not forget to open the developer tools of your browser (F12) and check the console output!

Want to see the source code? It’s available on my GitHub account: https://github.com/utluiz/utluiz.github.io

Performance issues

At this point, good engineers should be very worried about the performance of this solution.

Well, you probably know about the many initiatives from browser vendors (Google, Mozilla, Microsoft) to improve Javascript performance. Today, running code in a browser is not a problem anymore. But is it enough to support something like a new language?

Obviously the conversion from Python to Javascript is very, very slow! On the other hand, Brython caches the translated code. And it gets even better. Looking through the Brython source code, I found that it uses the HTML5 local storage API to cache translated scripts, so the cache persists across page reloads.
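To illustrate the general idea of caching translated code, here is a minimal sketch. It is not Brython’s actual implementation: translateWithCache, the key scheme, and the fake translator are all hypothetical, and an in-memory object with getItem and setItem stands in for window.localStorage so the snippet runs outside a browser.

```javascript
// Returns the translated code for `source`, reusing a cached copy when the
// same source was translated before. `storage` only needs getItem/setItem,
// like window.localStorage; `translate` is whatever does the slow conversion.
function translateWithCache(source, translate, storage) {
    var key = 'translated:' + source; // hypothetical key scheme
    var cached = storage.getItem(key);
    if (cached !== null) {
        return cached;
    }
    var translated = translate(source);
    storage.setItem(key, translated);
    return translated;
}

// In-memory stand-in for localStorage.
function createMemoryStorage() {
    var items = {};
    return {
        getItem: function (key) {
            return Object.prototype.hasOwnProperty.call(items, key) ? items[key] : null;
        },
        setItem: function (key, value) {
            items[key] = String(value);
        }
    };
}

var calls = 0;
function fakeTranslate(source) { // placeholder for the real Python -> Javascript step
    calls += 1;
    return '/* js for: ' + source + ' */';
}

var storage = createMemoryStorage();
translateWithCache('print(1)', fakeTranslate, storage);
translateWithCache('print(1)', fakeTranslate, storage);
console.log(calls); // prints 1: the second call was served from the cache
```

In a browser, passing window.localStorage as the storage argument would make the cache survive page reloads.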

“And what about the runtime?”, you may ask. The final code is pure Javascript, so the browser can apply the same optimizations it applies to native Javascript code! Of course, that doesn’t mean it’ll be as efficient as native code: there is a natural overhead due to instruction “adaptation”, since each Python instruction can be translated into several Javascript commands.

Limitations

Brython has some limitations.

For instance, Javascript code can’t access objects and variables defined in Python. But I don’t think that’s a big need anyway.

Another point is inline code, as I mentioned at the beginning. For now, we can’t do something like this:

<button onclick="python: print(1)">Button</button>

It’s not impossible, though. We could process the onclick attribute in our MutationObserver implementation, replacing its value with a Javascript function that delegates the execution to Brython!

Well, this article is a bit long already, so I’ll leave it to the reader as homework. 😉
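As a starting point for that homework, here is a minimal sketch of the first step only: recognizing an attribute value that carries Python code. The python: prefix comes from the example above; the function name extractInlinePython is hypothetical, and wiring the result to Brython is left out.

```javascript
// Extracts the Python source from an attribute value such as
// "python: print(1)". Returns null when the value does not use the prefix.
function extractInlinePython(attributeValue) {
    var match = /^\s*python:\s*([\s\S]*)$/.exec(attributeValue || '');
    return match ? match[1] : null;
}

console.log(extractInlinePython('python: print(1)')); // prints print(1)
console.log(extractInlinePython('alert(1)'));         // prints null
```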

Anyway, you should know that it’s much better not to use event attributes. It’s good practice to write unobtrusive Javascript using listeners. The Brython docs tell us how to do that:

btn.bind('click', show)

Browser compatibility

According to the documentation, MutationObserver is compatible with the following browsers:

  • Google Chrome: version 26 or greater
  • Mozilla Firefox: version 14 or greater
  • Internet Explorer: version 11 or greater
  • Opera: version 15 or greater
  • Safari: version 6 or greater

As always, Internet Explorer is late. However, we’re looking to the future now. In a few years the majority of IE users will be able to run this solution without any problem.

The Brython docs tell us nothing about compatibility, but in its source code I found specific implementations for IE 6 and 7. Therefore the limiting factor is MutationObserver.
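Since MutationObserver is the limiting factor, a page could check for it before loading anything else. A minimal sketch, with the hypothetical function name supportsMutationObserver; the global object is passed in as a parameter so the check can be tried outside a browser too.

```javascript
// Returns true when the given global object exposes a MutationObserver
// constructor. Passing the global object in keeps the check testable.
function supportsMutationObserver(globalObject) {
    return typeof globalObject.MutationObserver === 'function';
}

console.log(supportsMutationObserver({ MutationObserver: function () {} })); // prints true
console.log(supportsMutationObserver({}));                                   // prints false
```

In a web page, the argument would be window.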

Final thoughts

In practice, I believe this kind of implementation is not viable for real applications. However, it was a great exercise!

What to expect in the future? Many developers are anxious to run their favorite language!

Today, we have the JVM already supporting many languages like Scala, Ruby, Python, PHP, and so forth. Perhaps, in the near future, we’ll witness the freedom to run any language in a web browser!

The State of the Art blog by Luiz Ricardo is licensed under a Creative Commons License. Copy, share, and modify; just cite the source.