• W^X JIT-code enabled in Firefox

    Back in June, I added an option to SpiderMonkey to enable W^X protection of JIT code. The past weeks I've been working on fixing the remaining performance issues and yesterday I enabled W^X on the Nightly channel, on all platforms. What this means is that each page holding JIT code is either executable or writable, never both at the same time.


    Almost all JITs (including the ones in Firefox until now) allocate memory pages for code with RWX (read-write-execute) permissions. JITs typically need to patch code (for inline caches, for instance) and with writable memory they can do that with no performance overhead. RWX memory introduces some problems though:

    • Security: RWX pages make it easier to exploit certain bugs. As a result, all modern operating systems store code in executable but non-writable memory, and data is usually not executable, see W^X and DEP. RWX JIT-code is an exception to this rule and that makes it an interesting target.
    • Memory corruption: I've seen some memory dumps for crashes in JIT-code that might have been caused by memory corruption elsewhere. All memory corruption bugs are serious, but if it happens for whatever reason, it's much better to crash immediately.

    How It Works

    With W^X enabled, all JIT-code pages are non-writable by default. When we need to patch JIT-code for some reason, we use a RAII-class, AutoWritableJitCode, to make the page(s) we're interested in writable (RW), using VirtualProtect on Windows and mprotect on other platforms. The destructor then toggles this back from RW to RX when we're done with it.

    (As an aside, an alternative to W^X is a dual-mapping scheme: pages are mapped twice, once as RW and once as RX. In 2010, some people wrote patches to implement this for TraceMonkey, but this work never landed. This approach avoids the mprotect overhead, but for this to be safe, the RW mapping should be in a separate process. It's also more complicated and introduces IPC overhead.)


    Last week I fixed implicit interrupt checks to work with W^X, got rid of some unnecessary mprotect calls, and optimized code poisoning to be faster with W^X.

    After that, the performance overhead was pretty small on all benchmarks and websites I tested: Kraken and Octane are less than 1% slower with W^X enabled. On (ancient) SunSpider the overhead is bigger, because most tests finish in a few milliseconds, so any compile-time overhead is measurable. Still, it's less than 3% on Windows and Linux. On OS X it's less than 4% because mprotect is slower there.

    I think W^X works well in SpiderMonkey for a number of reasons:

    • We run bytecode in the interpreter before Baseline-compiling it. On the web, most functions have less than ~10 calls or loop iterations, so we never JIT those and we don't have any memory protection overhead.
    • The Baseline JIT uses IC stubs for most operations, but we use indirect calls here, so we don't have to make code writable when attaching stubs. Baseline stubs also share code, so only the first time we attach a particular stub we compile code for it. Ion IC stubs do require us to make memory writable, but Ion doesn't use ICs as much as Baseline.
    • For asm.js (and soon WebAssembly!), we do AOT-compilation of the whole module. After compilation, we need only one mprotect call to switch everything from RW to RX. Furthermore, this code is only modified on some slow paths, so there's basically no performance overhead for asm.js/WebAssembly code.


    I've enabled W^X protection for all JIT-code in Firefox Nightly. Assuming we don't run into bugs or serious performance issues, this will ship in Firefox 46.

    Last but not least, thanks to the OpenBSD and HardenedBSD teams for being brave enough to flip the W^X switch before we did!

  • Testing Math.random(): Crushing the browser

    (For tl;dr, see the Conclusion.)

    A few days ago, I wrote about Math.random() implementations in Safari and (older versions of) Chrome using only 32 bits of precision. As I mentioned in that blog post, I've been working on upgrading Math.random() in SpiderMonkey to XorShift128+. V8 has been using the same algorithm since last week. (Update Dec 1: WebKit is now also using XorShift128+!)

    The most extensive RNG test is TestU01. It's a bit of a pain to run: to test a custom RNG, you have to compile the library and then link it to a test program. I did this initially for the SpiderMonkey shell but after that I thought it'd be more interesting to use Emscripten to compile TestU01 to asm.js so we can easily run it in different browsers.

    Today I tried this and even though I had never used Emscripten before, I had it running in the browser in less than an hour. Because the tests can take a long time, it runs in a web worker. You can try it for yourself here.

    I also wanted to test window.crypto.getRandomValues() but unfortunately it's not available in workers.

    Disclaimer: browsers implement Math functions like Math.sin differently and this can affect their precision. I don't know if TestU01 uses these functions and whether it affects the results below, but it's possible. Furthermore, some test failures are intermittent so results can vary between runs.


    TestU01 has three batteries of tests: SmallCrush, Crush, and BigCrush. SmallCrush runs only a few tests and is very fast. Crush and especially BigCrush have a lot more tests so they are much slower.


    Running SmallCrush takes about 15-30 seconds. It runs 10 tests with 15 statistics (results). Here are the number of failures I got:

    Browser Number of failures
    Firefox Nightly 1: BirthdaySpacings
    Firefox with XorShift128+ 0
    Chrome 48 11
    Safari 9 1: RandomWalk1 H
    Internet Explorer 11 1: BirthdaySpacings
    Edge 20 1: BirthdaySpacings

    Chrome/V8 failing 11 out of 15 is not too surprising. Again, the V8 team fixed this last week and the new RNG should pass SmallCrush.


    The Crush battery of tests is much more time consuming. On my MacBook Pro, it finishes in less than an hour in Firefox but in Chrome and Safari it can take at least 2 hours. It runs 96 tests with 144 statistics. Here are the results I got:

    Browser Number of failures
    Firefox Nightly 12
    Firefox with XorShift128+ 0
    Chrome 48 108
    Safari 9 33
    Internet Explorer 11 14

    XorShift128+ passes Crush, as expected. V8's previous RNG fails most of these tests and Safari/WebKit isn't doing too great either.


    BigCrush didn't finish in the browser because it requires more than 512 MB of memory. To fix that I probably need to recompile the asm.js code with a different TOTAL_MEMORY value or with ALLOW_MEMORY_GROWTH=1.

    Furthermore, running BigCrush would likely take at least 3 hours in Firefox and more than 6-8 hours in Safari, Chrome, and IE, so I didn't bother.

    The XorShift128+ algorithm being implemented in Firefox and Chrome should pass BigCrush (for Firefox, I verified this in the SpiderMonkey shell).

    About IE and Edge

    I noticed Firefox (without XorShift128+) and Internet Explorer 11 get very similar test failures. When running SmallCrush, they both fail the same BirthdaySpacings test. Here's the list of Crush failures they have in common:

    • 11 BirthdaySpacings, t = 2
    • 12 BirthdaySpacings, t = 3
    • 13 BirthdaySpacings, t = 4
    • 14 BirthdaySpacings, t = 7
    • 15 BirthdaySpacings, t = 7
    • 16 BirthdaySpacings, t = 8
    • 17 BirthdaySpacings, t = 8
    • 19 ClosePairs mNP2S, t = 3
    • 20 ClosePairs mNP2S, t = 7
    • 38 Permutation, r = 15
    • 40 CollisionPermut, r = 15
    • 54 WeightDistrib, r = 24
    • 75 Fourier3, r = 20

    This suggests the RNG in IE may be very similar to the one we used in Firefox (imported from Java decades ago). Maybe Microsoft imported the same algorithm from somewhere? If anyone on the Chakra team is reading this and can tell us more, it would be much appreciated :)

    IE 11 fails 2 more tests that pass in Firefox. Some failures are intermittent and I'd have to rerun the tests to see if these failures are systematic.

    Based on the SmallCrush results I got with Edge 20, I think it uses the same algorithm as IE 11 (not too surprising). Unfortunately the Windows VM I downloaded to test Edge shut down for some reason when it was running Crush so I gave up and don't have full results for it.


    I used Emscripten to port TestU01 to the browser. Results confirm most browsers currently don't use very strong RNGs for Math.random(). Both Firefox and Chrome are implementing XorShift128+, which has no systematic failures on any of these tests.

    Furthermore, these results indicate IE and Edge may use the same algorithm as the one we used in Firefox.

  • Math.random() and 32-bit precision

    Last week, Mike Malone, CTO of Betable, wrote a very insightful and informative article on Math.random() and PRNGs in general. Mike pointed out V8/Chrome used a pretty bad algorithm to generate random numbers and, since this week, V8 uses a better algorithm.

    The article also mentioned the RNG we use in Firefox (it was copied from Java a long time ago) should be improved as well. I fully agree with this. In fact, the past days I've been working on upgrading Math.random() in SpiderMonkey to XorShift128+, see bug 322529. We think XorShift128+ is a good choice: we already had a copy of the RNG in our repository, it's fast (even faster than our current algorithm!), and it passes BigCrush (the most complete RNG test available).

    While working on this, I looked at a number of different RNGs and noticed Safari/WebKit uses GameRand. It's extremely fast but very weak. (Update Dec 1: WebKit is now also using XorShift128+, so this doesn't apply to newer Safari/WebKit versions.)

    Most interesting to me, though, was that, like the previous V8 RNG, it has only 32 bits of precision: it generates a 32-bit unsigned integer and then divides that by UINT_MAX + 1. This means the result of the RNG is always one of about 4.2 billion different numbers, instead of 9007199 billion (2^53). In other words, it can generate 0.00005% of all numbers an ideal RNG can generate.

    I wrote a small testcase to visualize this. It generates random numbers and plots all numbers smaller than 0.00000131072.

    Here's the output I got in Firefox (old algorithm) after generating 115 billion numbers:

    And a Firefox build with XorShift128+:

    In Chrome (before Math.random was fixed):

    And in Safari:

    These pics clearly show the difference in precision.


    Safari and older Chrome versions both generate random numbers with only 32 bits of precision. This issue has been fixed in Chrome, but Safari's RNG should probably be fixed as well. Even if we ignore its suboptimal precision, the algorithm is still extremely weak.

    Math.random() is not a cryptographically-secure PRNG and should never be used for anything security-related, but, as Mike argued, there are a lot of much better (and still very fast) RNGs to choose from.

  • Making `this` a real binding in SpiderMonkey

    Last week I landed bug 1132183, a pretty large patch rewriting the implementation of this in SpiderMonkey.

    How this Works In JS

    In JS, when a function is called, an implicit this argument is passed to it. In strict mode, this inside the function just returns that value:

    function f() { "use strict"; return this; }
    f.call(123); // 123

    In non-strict functions, this always returns an object. If the this-argument is a primitive value, it's boxed (converted to an object):

    function f() { return this; }
    f.call(123); // returns an object: new Number(123)

    Arrow functions don't have their own this. They inherit the this value from their enclosing scope:

    function f() {
        "use strict";
        () => this; // `this` is 123

    And, of course, this can be used inside eval:

    function f() {
        "use strict";
        eval("this"); // 123

    Finally, this can also be used in top-level code. In that case it's usually the global object (lots of hand waving here).

    How this Was Implemented

    Until last week, here's how this worked in SpiderMonkey:

    • Every stack frame had a this-argument,
    • Each this expression in JS code resulted in a single bytecode op (JSOP_THIS),
    • This bytecode op boxed the frame's this-argument if needed and then returned the result.

    Special case: to support the lexical this behavior of arrow functions, we emitted JSOP_THIS when we defined (cloned) the arrow function and then copied the result to a slot on the function. Inside the arrow function, JSOP_THIS would then load the value from that slot.

    There was some more complexity around eval: eval-frames also had their own this-slot, so whenever we did a direct eval we'd ensure the outer frame had a boxed (if needed) this-value and then we'd copy it to the eval frame.

    The Problem

    The most serious problem was that it's fundamentally incompatible with ES6 derived class constructors, as they initialize their 'this' value dynamically when they call super(). Nested arrow functions (and eval) then have to 'see' the initialized this value, but that was impossible to support because arrow functions and eval frames used their own (copied) this value, instead of the updated one.

    Here's a worst-case example:

    class Derived extends Base {
        constructor() {
            var arrow = () => this;
            // Runtime error: `this` is not initialized inside `arrow`.
            // Call Base constructor, initialize our `this` value.
            // The arrow function now returns the initialized `this`.

    We currently (temporarily!) throw an exception when arrow functions or eval are used in derived class constructors in Firefox Nightly.

    Boxing this lazily also added extra complexity and overhead. I already mentioned how we had to compute this whenever we used eval.

    The Solution

    To fix these issues, I made this a real binding:

    • Non-arrow functions that use this or eval define a special .this variable,
    • In the function prologue, we get the this-argument, box it if needed (with a new op, JSOP_FUNCTIONTHIS) and store it in .this,
    • Then we simply use that variable each time this is used.

    Arrow functions and eval frames no longer have their own this-slot, they just reference the .this variable of the outer function. For instance, consider the function below:

    function f() {
        return () => this.foo();

    We generate bytecode similar to the following pseudo-JS:

    function f() {
        var .this = BoxThisIfNeeded(this);
        return () => (.this).foo();

    I decided to call this variable .this, because it nicely matches the other magic 'dot-variable' we already had, .generator. Note that these are not valid variable names so JS code can't access them. I only had to make sure with-statements don't intercept the .this lookup when this is used inside a with-statement...

    Doing it this way has a number of benefits: we only have to check for primitive this values at the start of the function, instead of each time this is accessed (although in most cases our optimizing JIT could/can eliminate these checks, when it knows the this-argument must be an object). Furthermore, we no longer have to do anything special for arrow functions or eval; they simply access a 'variable' in the enclosing scope and the engine already knows how to do that.

    In the global scope (and in eval or arrow functions in the global scope), we don't use a binding for this (I tried this initially but it turned out to be pretty complicated). There we emit JSOP_GLOBALTHIS for each this-expression, then that op gets the this value from a reserved slot on the lexical scope. This global this value never changes, so the JITs can get it from the global lexical scope at compile time and bake it in as a constant :) (Well.. in most cases. The embedding can run scripts with a non-syntactic scope chain, in that case we have to do a scope walk to find the nearest lexical scope. This should be uncommon and can be optimized/cached if needed.)

    The Debugger

    The main nuisance was fixing the debugger: because we only give (non-arrow) functions that use this or eval their own this-binding, what do we do when the debugger wants to know the this-value of a frame without a this-binding?

    Fortunately, the debugger (DebugScopeProxy, actually) already knew how to solve a similar problem that came up with arguments (functions that don't use arguments don't get an arguments-object, but the debugger can request one anyway), so I was able to cargo-cult and do something similar for this.

    Other Changes

    Some other changes I made in this area:

    • In bug 1125423 I got rid of the innerObject/outerObject/thisValue Class hooks (also known as the holy grail). Some scope objects had a (potentially effectful) thisValue hook to override their this behavior, this made it hard to see what was going on. Getting rid of that made it much easier to understand and rewrite the code.
    • I posted patches in bug 1227263 to remove the this slot from generator objects, eval frames and global frames.
    • IonMonkey was unable to compile top-level scripts that used this. As I mentioned above, compiling the new JSOP_GLOBALTHIS op is pretty simple in most cases; I wrote a small patch to fix this (bug 922406).


    We changed the implementation of this in Firefox 45. The difference is (hopefully!) not observable, so these changes should not break anything or affect code directly. They do, however, pave the way for more performance work and fully compliant ES6 Classes! :)

  • Bye Wordpress, hello Jekyll!

    This week I migrated this blog from Wordpress to Jekyll, a popular static site generator. This post explains why and how I did this, maybe it will be useful to someone.


    Wordpress powers thousands of websites, is regularly updated, has a ton of features. Why did I abandon it?

    I still think Wordpress is great for a lot of websites and blogs, but I felt it was overkill for my simple website. It had so many features I never used and this came at a price: it was hard to understand how everything worked, it was hard to make changes and it required regular security updates.

    This is what I like most about Jekyll compared to Wordpress:

    • Maintainance, security: I don't blog often, yet I still had to update Wordpress every few weeks or months. Even though the process is pretty straight-forward, it got cumbersome after a while.
    • Setup: Setting up a local Wordpress instance with the same content and configuration was annoying. I never bothered so the little development I did was directly on the webserver. This didn't feel very good or safe. Now I just have to install Jekyll, clone my repository and generate plain HTML files. No database to setup. No webserver to install (Jekyll comes with a little webserver, see below).
    • Transparency: With Wordpress, the blog posts were stored somewhere in a MySQL database. With Jekyll, I have Markdown files in a Git repository. This makes it trivial to backup, view diffs, etc.
    • Customizability: After I started using Jekyll, customizing this blog (see below) was very straight-forward. It took me less than a few hours. With Wordpress I'm sure it'd have taken longer and I'd have introduced a few security bugs in the process.
    • Performance: The website is just some static HTML files, so it's fast. Also, when writing a blog post, I like to preview it after writing a paragraph or so. With Wordpress it was always a bit tedious to wait for the website to save the blog post and reload the page. With Jekyll, I save the markdown file in my text editor and, in the background, jekyll serve immediately updates the site, so I can just refresh the page in the browser. Everything runs locally.
    • Hosting: In the future I may move this blog to GitHub Pages or another free/cheaper host.

    Why Jekyll?

    I went with Jekyll because it's widely used, so there's a lot of documentation and it'll likely still be around in a year or two. Octopress is also popular but under the hood it's just Jekyll with some plugins and changes, and it seems to be updated less frequently.


    I decided to use the default template and customize it where needed. I made the following changes:

    • Links to previous/next post at the end of each post, see post.html
    • Pagination on the homepage, based on the docs. I also changed the home page to include the contents instead of just the post title.
    • Archive page, a list of posts grouped by year, see archive.html
    • Category pages. I wrote a small plugin to generate a page + feed for each category. This is based on the example in the plugin documentation. See _plugins/category-generator.rb and _layouts/category.html
    • List of categories in the header of each post (with a link to the category page), see post.html
    • Disqus comments and number of comments in the header of each post, based on the docs, see post.html. I was able to export the Wordpress comments to Disqus.
    • In _config.yml I changed the post URL format ("permalink" option) to not include the category names. This way links to my posts still work.
    • Some minor tweaks here and there.

    I still want to change the code highlighting style, but that can wait for now.


    After using Jekyll for a few hours, I'm a big fan. It's simple, it's fun, it's powerful. If you're tired of Wordpress and Blogger, or just want to experiment with something else, I highly recommend giving it a try.