Transpilers: How They Work and How To Build Your Own JS Transpiler


The JavaScript ecosystem has been bubbling with new frameworks, modules, and much more in recent years, driven by the arrival of Node.js and npm. JS, once thought to be on the brink of extinction, has evolved into one of the most sought-after programming languages in the world.

In the JS world, the syntax has shifted a lot (and is still shifting), and much has been added. We have:

  • ES5
  • ES6
  • ES7
  • ES8

Each new ES* release comes with new features, and older browsers can't run websites that rely on features newer than the browser itself.

For example, suppose Firefox v1 supports feature A1 from ES5. When ES6 is released with feature A2, Firefox v1 won't support A2; a new Firefox v2 will be released to support it. So what happens to the poor users still on Firefox v1?

This is where transpilers come to help.

What are Transpilers?

Compilers translate code from one language to another, e.g. Java to bytecode. Transpilers (source-to-source compilers) translate code into another form of the same language: a Java transpiler, for example, turns one form of Java code into another form of Java code.

So a JavaScript transpiler converts one form of JS code into another form of JS. Several transpilers can translate ES6 code to ES5:

  • Babel
  • TypeScript
  • Traceur

One of the additions in ES6 is the class syntax:


class Book {
  addBook() {}
  removeBook() {}
  static getOneBook() {}
}

An old browser that doesn't support classes will throw a SyntaxError when it reaches this code. The same class can be written with a constructor function and its prototype property:


function Book() {}

Book.prototype.addBook = function () {};

Book.prototype.removeBook = function () {};

Book.getOneBook = function () {};

So when the browsers you are targeting don't support classes, the transpiler converts the Book class into its function version: it parses the code and outputs the function form.
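The two forms are close but not identical, which is worth knowing before trusting a transpiler's output. The following sketch checks what they share and where plain JavaScript semantics differ. The classes are renamed BookClass and BookFn only so that both can exist in one file.

```javascript
// The ES6 class form from above (two methods, for brevity).
class BookClass {
  addBook() {}
  static getOneBook() {}
}

// The ES5 constructor-function form of the same class.
function BookFn() {}
BookFn.prototype.addBook = function () {};
BookFn.getOneBook = function () {};

// Both give instances an inherited addBook method and a static getOneBook.
console.log(typeof new BookClass().addBook, typeof new BookFn().addBook); // function function
console.log(typeof BookClass.getOneBook, typeof BookFn.getOneBook); // function function

// Difference 1: class methods are non-enumerable; assigned prototype methods are enumerable.
console.log(Object.keys(BookClass.prototype)); // []
console.log(Object.keys(BookFn.prototype)); // [ 'addBook' ]

// Difference 2: a class constructor cannot be called without `new`.
let errorName = "none";
try {
  BookClass();
} catch (err) {
  errorName = err.name;
}
console.log(errorName); // TypeError
```

A transpiler that aims to match class semantics exactly has to handle these differences too, for example by defining methods with Object.defineProperty. The simple version built in this article does not.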

In short, a transpiler scans through the source code and generates tokens, then an AST (abstract syntax tree) from them. Finally, an output interpreter walks the AST and prints out the transformed source.

The difference between an Interpreter and a Compiler

An interpreter scans through the code, generates syntax trees and executes the instructions one after another.

For example, we have an array of statements:


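// interpreter.js
// FuncDecl, VarDecl, Literal, Identifier, CallStmt, Assign and Visitor are
// illustrative AST/interpreter classes; they are not defined in this article.

const codeBody = [
  new FuncDecl('isNan', [
    new VarDecl('ii', new Literal(0)),
    new CallStmt('log', [new Identifier('ii')]), // reads the variable ii
    new Assign('ii', new Literal(3)),
    new CallStmt('log', [new Identifier('ii')]),
  ]),
  new CallStmt('isNan', []),
];

const visitor = new Visitor();
codeBody.forEach((stmt) => stmt.visit(visitor)); // executes each statement in order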

Output:


0
3
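Here is a minimal runnable sketch of such an interpreter. The node class names are assumptions for illustration, not a real library: FuncDecl is the function declaration and Identifier is a variable read. The single flat scope is also a simplification. The built-in log prints each value and records it so the result can be checked.

```javascript
// Illustrative AST node classes; each visit() dispatches to the matching visitor method.
class FuncDecl {
  constructor(name, body) { this.name = name; this.body = body; }
  visit(v) { return v.visitFuncDecl(this); }
}
class VarDecl {
  constructor(name, init) { this.name = name; this.init = init; }
  visit(v) { return v.visitVarDecl(this); }
}
class Assign {
  constructor(name, value) { this.name = name; this.value = value; }
  visit(v) { return v.visitAssign(this); }
}
class CallStmt {
  constructor(callee, args) { this.callee = callee; this.args = args; }
  visit(v) { return v.visitCallStmt(this); }
}
class Literal {
  constructor(value) { this.value = value; }
  visit(v) { return v.visitLiteral(this); }
}
class Identifier {
  constructor(name) { this.name = name; }
  visit(v) { return v.visitIdentifier(this); }
}

// Values printed by the built-in `log`, kept so they can be inspected afterwards.
const printed = [];

// Executes statements immediately, one after another, in a single flat scope.
class Visitor {
  constructor() {
    this.env = new Map([["log", (...args) => { printed.push(...args); console.log(...args); }]]);
  }
  visitFuncDecl(node) {
    // Store a JS closure that runs the function body when it is called.
    this.env.set(node.name, () => node.body.forEach((stmt) => stmt.visit(this)));
  }
  visitVarDecl(node) { this.env.set(node.name, node.init.visit(this)); }
  visitAssign(node) {
    if (!this.env.has(node.name)) throw new ReferenceError(`${node.name} is not defined`);
    this.env.set(node.name, node.value.visit(this));
  }
  visitCallStmt(node) {
    const fn = this.env.get(node.callee);
    if (typeof fn !== "function") throw new TypeError(`${node.callee} is not a function`);
    return fn(...node.args.map((arg) => arg.visit(this)));
  }
  visitLiteral(node) { return node.value; }
  visitIdentifier(node) {
    if (!this.env.has(node.name)) throw new ReferenceError(`${node.name} is not defined`);
    return this.env.get(node.name);
  }
}

const codeBody = [
  new FuncDecl("isNan", [
    new VarDecl("ii", new Literal(0)),
    new CallStmt("log", [new Identifier("ii")]),
    new Assign("ii", new Literal(3)),
    new CallStmt("log", [new Identifier("ii")]),
  ]),
  new CallStmt("isNan", []),
];

const visitor = new Visitor();
codeBody.forEach((stmt) => stmt.visit(visitor)); // prints 0, then 3
```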

A compiler, on the other hand, translates code into another language, for example C/C++ into machine code, or JS into bytecode. Consider this JS:


var d = 90;
d = 900;
d += 3;

This JS could be compiled into something like the following simplified pseudo-assembly:


mov d, 90
mov d, 900
add d, 3
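To contrast with the interpreter, here is a minimal sketch of the compiling approach. Instead of executing the statements, it walks a hand-built AST for the three JS lines above and emits pseudo-assembly. The node shapes (plain objects with a kind field) and the instruction names are illustrative assumptions, not a real instruction set.

```javascript
// Hand-built AST for: var d = 90; d = 900; d += 3;
const program = [
  { kind: "VarDecl", name: "d", value: 90 },
  { kind: "Assign", name: "d", op: "=", value: 900 },
  { kind: "Assign", name: "d", op: "+=", value: 3 },
];

// Translates each statement into one pseudo-assembly instruction; nothing is executed.
function compile(statements) {
  return statements.map((stmt) => {
    switch (stmt.kind) {
      case "VarDecl":
        return `mov ${stmt.name}, ${stmt.value}`;
      case "Assign":
        if (stmt.op === "=") return `mov ${stmt.name}, ${stmt.value}`;
        if (stmt.op === "+=") return `add ${stmt.name}, ${stmt.value}`;
        throw new Error(`Unsupported operator: ${stmt.op}`);
      default:
        throw new Error(`Unsupported statement: ${stmt.kind}`);
    }
  });
}

const instructions = compile(program);
console.log(instructions.join("\n"));
// mov d, 90
// mov d, 900
// add d, 3
```

The interpreter produces values at run time, while the compiler produces a program that still has to be run later.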

How Transpilers Work

The first step is generating tokens from the source code (scanning). Next, the parser turns the tokens into a syntax tree called an AST (abstract syntax tree). Finally, the AST is fed to the output interpreter, which prints out the corresponding form of the code.

To begin, I'll skip the token-generating and AST-generating stages and start with the output interpreter. We'll come back and build the scanner and parser later.

Using our earlier example, we want to transform this:


class Book {
  addBook() {}
  removeBook() {}
  static getOneBook() {}
}

into this:

function Book() {}

Book.prototype.addBook = function () {};

Book.prototype.removeBook = function () {};

Book.getOneBook = function () {};

Okay, let's build a transpiler that transforms the class into its function form.

First, we represent a class declaration as an AST node. It has two properties: clsName, which holds the name of the class, and methods, an array holding the methods of the class.


class ClassDecl {
  constructor(clsName, methods) {
    this.clsName = clsName; // name of the class
    this.methods = methods; // array of Method nodes
  }
}

Each method in the class is represented by a Method node. It has a name property that holds the name of the method, a type that records whether the method is static ("static") or not (null), and a body, an array holding the statements of the method.


class Method {
  constructor(name, type, body) {
    this.name = name;
    this.type = type;
    this.body = body;
  }
}

So our Book class:


class Book {
  addBook() {}
  removeBook() {}
  static getOneBook() {}
}

would be represented like this:

const bookClassDecl = new ClassDecl("Book", [
  new Method("addBook", null, []),
  new Method("removeBook", null, []),
  new Method("getOneBook", "static", []),
]);

Now we will write the output interpreter. It will be based on the Visitor pattern.

The Visitor pattern lets us put the implementation for every AST type into a single class, the visitor, and hand that class to the ASTs. Each AST class has a common method, visit, that the interpreter calls with the visitor instance. The AST then calls back the visitor method that corresponds to its own type, and that method evaluates it. This double dispatch keeps the AST classes small and lets us add new operations by writing new visitors.
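To see why this is useful, here is a minimal, self-contained sketch. The node and visitor names are illustrative, not the article's classes. The same AST is walked by two different visitors, one that counts static methods and one that lists method names, without changing the node classes.

```javascript
// Two tiny AST node types; each visit() calls back the visitor method for its own type.
class ClassNode {
  constructor(name, methods) { this.name = name; this.methods = methods; }
  visit(visitor) { return visitor.visitClass(this); }
}
class MethodNode {
  constructor(name, isStatic) { this.name = name; this.isStatic = isStatic; }
  visit(visitor) { return visitor.visitMethod(this); }
}

// Visitor 1: counts static methods.
class StaticCounter {
  visitClass(node) { return node.methods.reduce((sum, m) => sum + m.visit(this), 0); }
  visitMethod(node) { return node.isStatic ? 1 : 0; }
}

// Visitor 2: lists qualified method names.
class NameLister {
  visitClass(node) { return node.methods.map((m) => `${node.name}.${m.visit(this)}`); }
  visitMethod(node) { return node.name; }
}

const ast = new ClassNode("Book", [
  new MethodNode("addBook", false),
  new MethodNode("removeBook", false),
  new MethodNode("getOneBook", true),
]);

console.log(ast.visit(new StaticCounter())); // 1
console.log(ast.visit(new NameLister())); // [ 'Book.addBook', 'Book.removeBook', 'Book.getOneBook' ]
```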

We add a visit method to our ClassDecl class. It takes a Visitor instance as a parameter and calls the visitor method that implements the algorithm for ClassDecl.

Let's create the Visitor class first and add a method to evaluate the ClassDecl AST:


class Visitor {
  visitClassDecl(clsDecl) {
    // no implementation
  }
}

Now we add the visit method to ClassDecl class:


class ClassDecl {
  constructor(clsName, methods) {
    this.clsName = clsName;
    this.methods = methods;
  }
  visit(visitor) {
    return visitor.visitClassDecl(this);
  }
}

class Method {
  constructor(name, type, body) {
    this.name = name;
    this.type = type;
    this.body = body;
  }
}

Now we flesh out our methods in the Visitor class:


class Visitor {
  // Emits the constructor function, then one assignment per method.
  visitClassDecl(clsDecl) {
    let ctx = "function ";
    ctx += clsDecl.clsName;
    ctx += "() {} \n";
    const methods = clsDecl.methods;
    for (const mth of methods) {
      ctx += this._visitMethod(mth, clsDecl);
    }
    return ctx;
  }

  // Static methods go on the function itself; instance methods go on its prototype.
  _visitMethod(method, cls) {
    let ctx = cls.clsName;
    if (method.type === "static") {
      ctx += ".";
      ctx += method.name;
    } else {
      ctx += ".prototype.";
      ctx += method.name;
    }
    ctx += " = function() {";
    // Method bodies are not emitted yet: method.body is ignored,
    // so every transpiled method comes out empty.
    ctx += "} \n";
    return ctx;
  }
}

Now let’s run the interpreter:


// js_emitter.js
// Assumes ClassDecl, Method and Visitor from above are defined in this file.
const log = console.log;

const bookClassDecl = new ClassDecl("Book", [
  new Method("addBook", null, []),
  new Method("removeBook", null, []),
  new Method("getOneBook", "static", []),
]);

log(new Visitor().visitClassDecl(bookClassDecl));


$ node js_emitter.js

Output:


function Book() {}
Book.prototype.addBook = function() {}
Book.prototype.removeBook = function() {}
Book.getOneBook = function() {}

Yay!! We have transpiled


class Book {
  addBook() {}
  removeBook() {}
  static getOneBook() {}
}

to


function Book() {}
Book.prototype.addBook = function () {};
Book.prototype.removeBook = function () {};
Book.getOneBook = function () {};
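As a quick sanity check, we can evaluate the emitted string with the Function constructor and confirm that the result exposes the same methods as the class. This is a sketch: compact copies of this section's ClassDecl, Method and Visitor are included so the snippet runs on its own. Evaluating generated code like this is fine in a test, but don't do it with untrusted input.

```javascript
class ClassDecl {
  constructor(clsName, methods) { this.clsName = clsName; this.methods = methods; }
  visit(visitor) { return visitor.visitClassDecl(this); }
}
class Method {
  constructor(name, type, body) { this.name = name; this.type = type; this.body = body; }
}
// Compact copy of the Visitor above (method bodies are not emitted).
class Visitor {
  visitClassDecl(clsDecl) {
    let ctx = `function ${clsDecl.clsName}() {} \n`;
    for (const mth of clsDecl.methods) ctx += this._visitMethod(mth, clsDecl);
    return ctx;
  }
  _visitMethod(method, cls) {
    const target = method.type === "static" ? cls.clsName : `${cls.clsName}.prototype`;
    return `${target}.${method.name} = function() {} \n`;
  }
}

const code = new Visitor().visitClassDecl(
  new ClassDecl("Book", [
    new Method("addBook", null, []),
    new Method("removeBook", null, []),
    new Method("getOneBook", "static", []),
  ])
);

// Run the generated source in a fresh function scope and return the Book it defines.
const Book = new Function(`${code}\nreturn Book;`)();

console.log(typeof new Book().addBook); // function
console.log(typeof new Book().removeBook); // function
console.log(typeof Book.getOneBook); // function
console.log("getOneBook" in Book.prototype); // false: static methods live on Book itself
```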

Building the Scanner and Parser

We now have the AST classes and the output interpreter that transpiles classes to functions. In this section, we will write the scanner and the parser.

The scanner turns the text of the JS code into tokens and feeds them to the parser, which turns the tokens into a syntax tree. That syntax tree is what we built by hand above, and it is what gets fed into the output interpreter to produce the JS code.

Tokenizing


// isNum, isAlphaNumeric, isKeyword and isOp are helper functions from util.js
// (not shown here; see the GitHub repository linked later in this article).
class Scanner {
  constructor() {
    this.tokens = [];
  }

  // Returns a shared Scanner instance (a simple singleton).
  static getInst() {
    if (!this.inst) this.inst = new Scanner();
    return this.inst;
  }

  tokenize(str) {
    this.tokens = []; // reset so repeated calls on the singleton don't accumulate tokens
    let s = ""; // characters collected for the token being built

    for (let index = 0; index < str.length; index++) {
      s += str[index];
      s = s.trim(); // drop whitespace between tokens
      const peek = str[index + 1]; // next character, or undefined at the end

      // Number: emit once the next character is no longer a digit.
      if (isNum(s) && !isNum(peek)) {
        this.tokens.push({ type: "NUM", value: s });
        s = "";
        continue;
      }

      if (s === "(" || s === ")") {
        this.tokens.push({ type: s === "(" ? "LPAREN" : "RPAREN" });
        s = "";
        continue;
      }

      if (s === "{" || s === "}") {
        this.tokens.push({ type: s === "{" ? "LBRACE" : "RBRACE" });
        s = "";
        continue;
      }

      // Word: emitted at the end of the word as a keyword or an identifier.
      if (isAlphaNumeric(s) && !isAlphaNumeric(peek)) {
        this.tokens.push({ type: isKeyword(s) ? "KEYWORD" : "IDENTIFIER", value: s });
        s = "";
        continue;
      }

      // Operator: emit once the next character is no longer an operator character.
      if (isOp(s) && !isOp(peek)) {
        this.tokens.push({ type: "OP", value: s });
        s = "";
        continue;
      }

      // Statement end. Newlines were already trimmed away above, so only ";" reaches here.
      if (s === ";") {
        this.tokens.push({ type: "EOL" });
        s = "";
      }
    }

    this.tokens.push({ type: "EOF" });
    return this.tokens;
  }
}

The Scanner class scans through the string passed in as str and produces tokens. The tokenize method accumulates characters in s and runs them through a series of checks:

  • a number check
  • a parenthesis check
  • a brace check
  • an alphanumeric check, which then decides between a keyword and an identifier
  • an operator check
  • a statement-end check

When a check passes, it emits a token for the accumulated string and starts collecting the next one.
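The Scanner relies on four helpers that live in util.js and are not shown in this article: isNum, isAlphaNumeric, isKeyword and isOp. The following is a hypothetical sketch of what they might look like. The keyword list and operator characters are assumptions, and the repository's versions may differ. Each helper returns false for undefined, which matters because peek is undefined at the end of the input.

```javascript
// Keywords the scanner should tag as KEYWORD (an illustrative subset).
const KEYWORDS = new Set(["class", "static", "function", "return", "var", "let", "const", "new"]);

// True for a non-empty string made only of digits.
function isNum(str) {
  return typeof str === "string" && /^[0-9]+$/.test(str);
}

// True for a non-empty string of letters, digits, `_` or `$`.
function isAlphaNumeric(str) {
  return typeof str === "string" && /^[A-Za-z0-9_$]+$/.test(str);
}

function isKeyword(str) {
  return KEYWORDS.has(str);
}

// True for a non-empty string made only of operator characters.
function isOp(str) {
  return typeof str === "string" && /^[+\-*\/%=<>!&|^]+$/.test(str);
}

module.exports = { isNum, isAlphaNumeric, isKeyword, isOp };
```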

If we pass in:


class Book {
  addBook() {}
  removeBook() {}
  static getOneBook() {}
}

Running the Scanner:


const str = `
   class Book {
       addBook() {}
       removeBook() {}
       static getOneBook() {}
   }
`;

const scanner = Scanner.getInst();
const tokens = scanner.tokenize(str);
log(tokens);

will output:


[ { type: 'KEYWORD', value: 'class' },
  { type: 'IDENTIFIER', value: 'Book' },
  { type: 'LBRACE' },
  { type: 'IDENTIFIER', value: 'addBook' },
  { type: 'LPAREN' },
  { type: 'RPAREN' },
  { type: 'LBRACE' },
  { type: 'RBRACE' },
  { type: 'IDENTIFIER', value: 'removeBook' },
  { type: 'LPAREN' },
  { type: 'RPAREN' },
  { type: 'LBRACE' },
  { type: 'RBRACE' },
  { type: 'KEYWORD', value: 'static' },
  { type: 'IDENTIFIER', value: 'getOneBook' },
  { type: 'LPAREN' },
  { type: 'RPAREN' },
  { type: 'LBRACE' },
  { type: 'RBRACE' },
  { type: 'RBRACE' },
  { type: 'EOF' } ]

We have the tokens, the tokens will now be passed to the parser which will produce the AST.

Parsing


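// Turns Scanner tokens into AST nodes. Only class declarations are supported.
// ClassDecl and Method are the AST classes defined earlier.
class Parser {
  constructor() {
    this.index = 0;
    this.tokens = null;
    this.expr = [];
  }

  // Returns a shared Parser instance (a simple singleton).
  static getInst() {
    if (!this.inst) this.inst = new Parser();
    return this.inst;
  }

  advance() {
    this.index++;
  }

  // Looks at the next token without consuming it.
  peep() {
    return this.tokens[this.index + 1];
  }

  current() {
    return this.tokens[this.index];
  }

  parse(tokens) {
    // Reset state so the shared instance can parse more than one token list.
    this.tokens = tokens;
    this.index = 0;
    this.expr = [];

    while (this.current().type !== "EOF") {
      const stmt = this.statements();
      // Stop on unsupported input instead of looping forever on the same token.
      if (!stmt) throw new SyntaxError(`Unsupported token: ${JSON.stringify(this.current())}`);
      this.expr.push(stmt);
    }
    return this.expr;
  }

  // Returns an AST node, or undefined for statements that aren't supported yet.
  statements() {
    if (this.current().value === "class") {
      return this.classDeclaration();
    }
    return undefined;
  }

  classDeclaration() {
    this.advance(); // skip "class"
    const className = this.current().value;

    // Skip ahead to the "{" that opens the class body.
    while (this.current().type !== "LBRACE") {
      this.advance();
    }
    this.advance();

    const methods = [];
    while (this.current().type !== "RBRACE" && this.peep().type !== "EOF") {
      methods.push(this._classMethods());
    }
    this.advance(); // skip the "}" that closes the class

    return new ClassDecl(className, methods);
  }

  _classMethods() {
    let type = null;
    if (this.current().value === "static") {
      type = this.current().value;
      this.advance();
    }

    const methodName = this.current().value;
    this.advance();

    // Skip the parameter list up to the "{" that opens the method body.
    while (this.current().type !== "LBRACE") {
      this.advance();
    }

    return new Method(methodName, type, this.blockStatement());
  }

  blockStatement() {
    this.advance(); // skip "{"
    const statements = [];

    while (this.current().type !== "RBRACE" && this.peep().type !== "EOF") {
      const stmt = this.statements();
      if (stmt) statements.push(stmt);
      else this.advance(); // tokens of unsupported statements are skipped for now
    }
    this.advance(); // skip "}"

    return statements; // a flat array of statement nodes
  }
}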

Parsing starts in the parse method. It takes the tokens produced by the Scanner class, walks through them, and generates the corresponding AST. For this article, we only support parsing class declarations.
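The parser skips ahead with loops like while (this.current().type !== "LBRACE"), so malformed input is silently accepted. One common refinement is to consume a token only if it has the expected type and throw a descriptive error otherwise. Below is a sketch using a hypothetical TokenCursor class with an expect helper; neither is part of the article's Parser.

```javascript
// A minimal token cursor with an `expect` helper (hypothetical; not part of the article's Parser).
class TokenCursor {
  constructor(tokens) { this.tokens = tokens; this.index = 0; }
  current() { return this.tokens[this.index]; }
  // Consumes and returns the current token if it has the given type; otherwise throws.
  expect(type) {
    const token = this.current();
    if (!token || token.type !== type) {
      const found = token ? token.type : "end of input";
      throw new SyntaxError(`Expected ${type} at token ${this.index}, found ${found}`);
    }
    this.index++;
    return token;
  }
}

// Parses `name() {}` strictly, using the token shapes the Scanner produces.
function parseEmptyMethod(cursor) {
  const name = cursor.expect("IDENTIFIER").value;
  cursor.expect("LPAREN");
  cursor.expect("RPAREN");
  cursor.expect("LBRACE");
  cursor.expect("RBRACE");
  return name;
}

const ok = new TokenCursor([
  { type: "IDENTIFIER", value: "addBook" },
  { type: "LPAREN" }, { type: "RPAREN" }, { type: "LBRACE" }, { type: "RBRACE" },
]);
console.log(parseEmptyMethod(ok)); // addBook

const bad = new TokenCursor([{ type: "IDENTIFIER", value: "addBook" }, { type: "LBRACE" }]);
try {
  parseEmptyMethod(bad);
} catch (err) {
  console.log(err.message); // Expected LPAREN at token 1, found LBRACE
}
```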

So when we pass in the tokens we generated earlier:


const str = `
   class Book {
       addBook() {}
       removeBook() {}
       static getOneBook() {}
   }
`;

const scanner = Scanner.getInst();
const tokens = scanner.tokenize(str);
log(tokens);

const parser = Parser.getInst();
const asts = parser.parse(tokens);
log(asts);

will output:


...

[ ClassDecl { clsName: 'Book', methods: [ [Method], [Method], [Method] ] } ]

Then we pass the AST generated here to the Visitor class we implemented in the previous section:


const str = `
   class Book {
       addBook() {}
       removeBook() {}
       static getOneBook() {}
   }
`;

const scanner = Scanner.getInst();
const tokens = scanner.tokenize(str);

log(tokens);

const parser = Parser.getInst();
const asts = parser.parse(tokens);

log(asts);

const result = new Visitor().visitStatements(asts);

Here we use a new Visitor method, visitStatements, which takes an array of statements, loops through them, and calls each statement's visit method, passing the visitor itself:


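class Visitor {
  // ...visitClassDecl and _visitMethod as before

  // Visits each top-level AST node and concatenates the generated code.
  visitStatements(asts) {
    let ctx = "";
    for (const ast of asts) {
      ctx += ast.visit(this);
    }
    return ctx;
  }
}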

The parser returns the ASTs in an array, which is why we call visitStatements and pass the array to it.

Let’s print the result:


// ...
const parser = Parser.getInst();
const asts = parser.parse(tokens);

log(asts);

const result = new Visitor().visitStatements(asts);
log(result);

Output:


function Book() {}
Book.prototype.addBook = function() {}
Book.prototype.removeBook = function() {}
Book.getOneBook = function() {}

Now let’s print everything to see the transition:


const str = `
   class Book {
       addBook() {}
       removeBook() {}
       static getOneBook() {}
   }
`;

log("Source: ", str);

const scanner = Scanner.getInst();
const tokens = scanner.tokenize(str);

log("Tokens: ", tokens);

const parser = Parser.getInst();
const asts = parser.parse(tokens);

log("ASTs: ", asts);

const result = new Visitor().visitStatements(asts);
log("Result: ", result);

Output:


Source:
class Book {
   addBook() {}
   removeBook() {}
   static getOneBook() {}
}

Tokens:
[ { type: 'KEYWORD', value: 'class' },
  { type: 'IDENTIFIER', value: 'Book' },
  { type: 'LBRACE' },
  { type: 'IDENTIFIER', value: 'addBook' },
  { type: 'LPAREN' },
  { type: 'RPAREN' },
  { type: 'LBRACE' },
  { type: 'RBRACE' },
  { type: 'IDENTIFIER', value: 'removeBook' },
  { type: 'LPAREN' },
  { type: 'RPAREN' },
  { type: 'LBRACE' },
  { type: 'RBRACE' },
  { type: 'KEYWORD', value: 'static' },
  { type: 'IDENTIFIER', value: 'getOneBook' },
  { type: 'LPAREN' },
  { type: 'RPAREN' },
  { type: 'LBRACE' },
  { type: 'RBRACE' },
  { type: 'RBRACE' },
  { type: 'EOF' } ]

ASTs:
[ ClassDecl { clsName: 'Book', methods: [ [Method], [Method], [Method] ] } ]

Result:
function Book() {}
Book.prototype.addBook = function() {}
Book.prototype.removeBook = function() {}
Book.getOneBook = function() {}

You can see each transition: from the source to tokens, to the AST, to the final output. We have successfully transpiled JS code into another form of JS code.

Bringing all the files together

Let's make our implementation run like Babel. Instead of passing it a string, we should be able to pass it the file we want to transpile, like this:


node . test/cls.js

to produce the output file test/cls-transpiled.js.

Let’s start.

Create a project, cd into it, and initialize a Node environment:


mkdir transpiler
cd transpiler
npm init -y

Create files:


touch index.js
touch ast.js     # holds our AST classes
touch parser.js  # holds our Parser
touch scanner.js # holds our Scanner
touch util.js    # holds our utility functions
touch visitor.js # holds our Visitor

See the full code on GitHub:

https://github.com/philipszdavido/transpiler

Now, in our index.js file, we need to get the inputs passed when running our project, using the process object. First, we get the file path from process.argv. Then we use the filesystem module fs to read the file into a string and pass that string to the Scanner to start transpiling. When done, the result is written to the file test/cls-transpiled.js.


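// index.js
const log = console.log;
const fs = require("fs");
// Each module must export its class, e.g. module.exports = { Scanner } in scanner.js.
const { Scanner } = require("./scanner.js");
const { Parser } = require("./parser.js");
const { Visitor } = require("./visitor.js");

// process.argv is [node, script, ...args]; the file to transpile is the first user argument.
const args = process.argv[2];
if (!args) {
  console.error("Usage: node . <file.js>");
  process.exit(1);
}
const buffer = fs.readFileSync(args, "utf8"); // read the source file as a string

const scanner = Scanner.getInst();
const tokens = scanner.tokenize(buffer);

const parser = Parser.getInst();
const asts = parser.parse(tokens);
const result = new Visitor().visitStatements(asts);

fs.writeFileSync("test/cls-transpiled.js", result);
log(`${args} successfully transpiled!!`);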
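index.js always writes to test/cls-transpiled.js. You may want the output name to follow the input instead, so that test/cls.js becomes test/cls-transpiled.js and src/app.js becomes src/app-transpiled.js. A small helper built on Node's path module could compute it; outputPathFor is a hypothetical name, not part of the repository.

```javascript
const path = require("path");

// Builds "<dir>/<name>-transpiled<ext>" next to the input file.
function outputPathFor(inputPath) {
  const { dir, name, ext } = path.parse(inputPath);
  return path.join(dir, `${name}-transpiled${ext}`);
}

console.log(outputPathFor("test/cls.js")); // test/cls-transpiled.js (on POSIX systems)
```

In index.js you would then call fs.writeFileSync(outputPathFor(args), result) instead of using the hard-coded path.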

Before we run this, let's create a test folder and, inside it, a file named cls.js:


mkdir test
touch test/cls.js


Now we open the file and add the following contents to it:


// test/cls.js

class Book {
  addBook() {}
  removeBook() {}
  static getOneBook() {}
}

Now we run our project:


$ node . test/cls.js

We should see this logged in our console:


$ node . test/cls.js
test/cls.js successfully transpiled!!

Also, cls-transpiled.js will be created in our test folder. If we open it we will see:


function Book() {}
Book.prototype.addBook = function() {}
Book.prototype.removeBook = function() {}
Book.getOneBook = function() {}

Full-code

See the full code of our transpiler on GitHub: https://github.com/philipszdavido/transpiler

Conclusion

We saw what transpilers are and what they do. We also demonstrated in practice how they work by writing a transpiler!!

So whenever you use Babel, TypeScript, or Traceur, you have a good idea of what happens under the hood.

As always, I'm an advocate of practical work. To really understand how transpilers work, I urge you to extend the project from this article. We made a transpiler that supports classes in JS; you can add features of your own to JS and write a transpiler for them. For example, there was a proposal to add private fields and methods to JS classes (since standardized in ES2022). You could have supported it yourself by writing a transpiler for it, without waiting for the official implementation :) Imagine being able to add anything you want to your favorite language. It's god-like.

Features you may want to add to this:

  • Decorators
  • Spread and Rest
  • Destructuring
  • Arrow functions
  • Getters and Setters
  • JSX
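As an example of a first step for one of these, here is a hedged sketch of how the visitor might emit getters and setters. It assumes the parser would record them as Method nodes with type "get" or "set"; the article's parser does not do this yet. It emits Object.defineProperty calls so that a getter and a setter with the same name share one property. As in the article's visitor, method bodies are not emitted.

```javascript
// Minimal Method node; type is "get", "set", "static" or null (assumption for this sketch).
class Method {
  constructor(name, type, body) { this.name = name; this.type = type; this.body = body; }
}

// Emits accessors on ClassName.prototype, grouping get/set pairs by property name.
function emitAccessors(clsName, methods) {
  const byName = new Map();
  for (const m of methods) {
    if (m.type !== "get" && m.type !== "set") continue;
    if (!byName.has(m.name)) byName.set(m.name, []);
    byName.get(m.name).push(`${m.type}: function () {}`);
  }
  let out = "";
  for (const [name, parts] of byName) {
    out += `Object.defineProperty(${clsName}.prototype, ${JSON.stringify(name)}, ` +
      `{ ${parts.join(", ")}, configurable: true });\n`;
  }
  return out;
}

const code = "function Book() {}\n" + emitAccessors("Book", [
  new Method("title", "get", []),
  new Method("title", "set", []),
]);
console.log(code);
// function Book() {}
// Object.defineProperty(Book.prototype, "title", { get: function () {}, set: function () {}, configurable: true });
```

Leaving enumerable out means it defaults to false, which matches how class accessors behave.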

If you have any questions regarding this or anything I should add, correct or remove, feel free to comment, email, or DM me.

Thanks !!!

Appreciation

  • Angular Compiler: I got inspiration from the Angular compiler's JS emitter on how to write mine for this article.
  • Crafting Interpreters: taught me how to write tokenizers and parsers. After going through a series of compiler and interpreter tutorials on the web, I finally grasped the ideas and concepts when I landed on craftinginterpreters.com. Other resources skipped so many things and showed very complex code without explaining how it was built. Thank you.