Deceiving Test Coverage

Contents

What’s your test coverage?
Why we test
Quality of tests
Test coverage
Summary

What’s your test coverage?

If you are a developer and write tests probably you already heard people around you talking about test coverage a lot. Usually it is something like: “we do not have this module covered yet”, “we have 90% test coverage”, “this new class is covered 100%”. It is implied that the higher the test coverage is, the better it is: we are more confident in the code, there are fewer bugs, and the code is well-documented. And you may even get asked from time to time by other developers or wonder yourself what is the test coverage for the code you are currently working on.

Test coverage then seems like a pretty important metric that describes how well our code is tested. At the same time we use this term so often that inevitably we gloss over some important details and can even forget what we are trying to achieve with test coverage to begin with.

This article tries to revisit some of the reasoning behind the test coverage metric and provide a few simple examples that illustrate how we can use this metric and why we should better understand what it really means.

Hopefully, after reading this article you will be a bit more sceptical when somebody claims that the code is “100% tested”, “everything is covered”, “we should try to reach 100% coverage here”, etc. Moreover, you will be able to offer some good arguments and maybe convince others to also be more practical and reasonable when it comes to this metric. And also you will be hopefully having much more meaningful discussions about the quality of the code than just:

– “What’s our test coverage?”
– “It is 100%”
– “Good, it should definitely work”

Why we test

Probably, you were already doing some testing and the topic is quite familiar to you, but still, let’s start again from the very beginning in small baby steps in order to better understand how exactly do we come up with such a metric as test coverage.

We test our code because we want to be sure, at least to some degree, that it works. Every software developer knows that a just written piece of code rarely works as expected when it is first run. Getting from the first version of the code to the working one can involve a few cycles of applying some changes and observing the results in the working code, that is testing the code. If everything is good, than the code is tested and works in the end.

Well, almost, because it turns out that due to the subjectivity of testing by the same person(s) who develop the code they can sometime miss a few errors (known as “bugs”). That’s where testers come in and try to catch those bugs in the code which developers inadvertently left there.

If the testers miss bugs, then it is the turn of users to catch errors in software. Hopefully the users get well-tested and high-quality software, but, as we all know, unfortunately all too often the software we use still has quite a few bugs.

As we want our code to work correctly, ideally we would like to have no bugs in it when we make it available to the users. But this turns out to be a quite challenging task.

For some areas, the cost of error is so high and prohibiting, that finding all the errors and fixing them becomes a must. A good example of this can be software that is installed on a space ship that can cost a lot of money and requires everything to work precisely and perfectly.

For other applications, such as, for example, a video hosting service, the errors are still undesirable, but cost less.

It is all about making a trade-off: more testing means more effort, the cost of the left bugs should be less than the cost of additional testing required to find them.

Another approach is formally proving that some code works and implements specification which in its turn does not have bugs in it to begin with. Testing is quite different. Tests give us confidence, but do not prove completely that the code works, and this is the first important point we want to make in this article and that is often missed by some developers. There is a separate field in Computer Science that focuses on formally proving that programs are correct and some of its methods are even applied practically. We will not be talking about formally proving programs here, but for more information and additional links feel free to start with the following Wikipedia article “Formal verification” in case you have interest.

Often tests are written in the form of executable specifications that can be run by computer which saves a lot of precious human time from doing tedious, repetitive and error-prone work and leaves it to machines that are well-suited precisely for such things.

If tests are written together with the code and before the actual code is written we are talking about test-driven development, or test-first development. This has become quite a popular technique in the last years, but still many people write tests after the code has already been written or, unfortunately, do not write any tests at all. Guess why? Yes, “there is no time for that”, although, surprisingly, it is somehow always possible to later find quite a lot of time to fix all those numerous issues that were missed because of poor testing.

Some articles and practical experience suggests that writing tests for the code, ideally doing test-driven development, in the end saves time and development effort by reducing the number of defects dramatically. Although some bugs still escape and need to be caught by QA later, their number is then not as large as it would have been without tests.

So writing tests and testing code well pays off and makes sense because of the better quality of the resulting software, less stress for developers, more satisfied users and less development effort required to develop and fix the issues.

Quality of tests

Tests are good, as we just have seen and we need them. But there are so many ways to write tests for some piece of code, then how do we know if some set of tests is better than another? Can we somehow measure the quality of our tests and compare different sets of tests with each other?

Good tests are the ones after running which we are quite sure that the code works and the major usage scenarios have been tested or we say “covered”. Good tests should not be more complicated and take more effort to support than the code they test. They should be as independent as possible and allow to trace down a bug from a test failure in a matter of seconds without long debugging sessions. There should not be a lot of duplication, the test code should be well-structured. The tests should be fast and lightweight. etc.

We see that there are quite a few metrics and criteria that we can come up with that will tell us whether our tests are good enough.

Test coverage

Test coverage is just one such simple metric, although probably the most popular one, that allows us to judge whether our “tests are good”. Quite often, and, as we are already starting to see, a bit simplistically, people make this synonymous to “test coverage is good”. Moreover, “test coverage” often means the “line coverage of code by tests” which is quite different from, for example, “scenario coverage by tests” as it can be observed in the following simple example.

The example will be in JavaScript and will use a particular testing framework Jasmine but is not really JavaScript-specific. We just use JavaScript here to make our example real and executable in some environment, and you are free to port it to your favorite language and testing framework as you wish.

Let’s say we want to write a simple function that will return values from some dictionary based on the set of keys that are passed to it.

  var dictionary = {
    "key1": "value1",
    "key2": "value2",
    "key3": "value3"
  };

Let’s write some tests first that specify the desired behavior:

  describe("getValues - full line coverage", function() {

    it("should provide values for requested keys", function() {
      expect(getValues(["key1", "key2", "key3"]))
          .toEqual(["value1", "value2", "value3"]);
    });
  });

The implementation is pretty simple:

  function getValues(keys) {
    return keys.map(function(key) {
      return dictionary[key];
    });
  }

We just convert a list of keys into the list of values with map by replacing each key with a corresponding value from dictionary.

As we can see, each line of our function getValues is executed once and we have the so longed for 100% test coverage. Good job?

Well, let’s pause for a moment and look at an alternative implementation that will make our test pass and will have the same 100% test coverage.

  function getValues(keys) {
    return ["value1", "value2", "value3"];
  }

The implementation is obviously not what we want, because it always returns the same values no matter which keys we pass to the function. Something obviously went wrong with our test coverage metric here because it gave us false confidence in a broken code. And also then our tests are still not good enough as we did not recognize the wrong implementation. Let’s add more tests and be more specific for what behavior we wish.

  describe("getValues", function() {

    it("should provide values for requested keys", function() {
      expect(getValues(["key1", "key2", "key3"]))
          .toEqual(["value1", "value2", "value3"]);
    });

    it("should provide values for single key", function() {
      expect(getValues(["key1"])).toEqual(["value1"]);
    });
  });

Now we have failing test (and 100% test coverage at the same time ;)) and our implementation should definitely be fixed.

But, if we are really stubborn or have no clue how to implement this required function we can fix it like this.

  function getValues(keys) {
    return (keys.length == 1) ? ["value1"]
        : ["value1", "value2", "value3"];
  }

which is still wrong, although we already have two passing tests and our test coverage is still 100%.

Then we have no choice but to test all the possible combinations of the keys present in the dictionary.

  describe("getValues", function() {

    sets(["key1", "key2", "key3"]).forEach(function(testInput) {
      var testCaseName = "should provide values for '" 
          + testInput.join(",") + "'";

      it(testCaseName, function() {
        var expectedOutput = testInput.map(function(key) {
          return key.replace("key", "value");
        });
        expect(getValues(testInput)).toEqual(expectedOutput);
      });
    });
  });

Here we are using some function sets that generates all the possible sets from the given arguments. Its implementation is left out of scope of this article.

It looks like our test suite is now more complicated then the code we test. It has 8 test cases which all pass and provides 100% test coverage. But now it seems that the only valid implementation to make all those tests pass is the original one.

  function getValues(keys) {
    return keys.map(function(key) {
      return dictionary[key];
    });
  }

OK. So, are we finished yet?

Well, not quite. There are still many scenarios for our code that were not covered but which we may in fact expect to occur when our code is used.

What if we provide some unexpected input or a key that is not present in the dictionary? What should the behavior be? Let’s add a couple more tests.

  describe("getValues", function() {

    sets(["key1", "key2", "key3"]).forEach(function(testInput) {
      var testCaseName = "should provide values for '" 
          + testInput.join(",") + "'";

      it(testCaseName, function() {
        var expectedOutput = testInput.map(function(key) {
          return key.replace("key", "value");
        });
        expect(getValues(testInput)).toEqual(expectedOutput);
      });
    });

    it("should handle invalid input properly", function() {
      expect(getValues(null)).toEqual([]);
    });

    it("should return no value for unknown key", function() {
      expect(getValues(["key1", "unknown", "key3"]))
          .toEqual(["value1", undefined, "value3"]);
    });
  });

Unfortunately our implementation does not pass the test for invalid input and we should correct it a bit. Let’s do it.

  function getValues(keys) {
    return keys ? keys.map(function(key) {
      return dictionary[key];
    }) : [];
  }

OK, now we have 10 test cases and still 100% test coverage which, by the way, it seems, we had all along in this example. Should we stop at this point?

If we look formally, there are still lots of untested scenarios, for example, what if dictionary contains other keys or no keys at all, will our code work? So we can get really paranoid about our code, do not believe that it works and add more and more tests. But, rather than wasting time and doing that, let’s just say that now it just works because we tested basic scenarios well and even some invalid inputs.

Yes, that’s right, as much as our tests give us confidence, without strict mathematical proof of correctness at some point we should just say “we believe that it works now”. This is just how it is, tests are a tool not a means. When our tool gives us enough confidence and lets us explore the code enough in the end it is still our judgement whether the code works or not. By the way, this is precisely why our code still needs to be tested or looked at by other people afterwards. As good as we hope it is, our judgement may be flawed and we can miss some important scenarios in which our code will be used.

Given this simple example, let’s move on directly to the final part of this article and formulate some of the conclusions that now should be more or less evident.

Summary

  • Test coverage is just one of many metrics that can be used to judge how good our tests are
  • Name “test coverage” is misleading, it should be more like “line coverage”
  • Line coverage is a simplistic metric that measures not the percentage of possible scenarios that were covered, but rather that of executed lines of code
  • Line coverage is being measured primarily because it is easy to measure
  • Reaching 100% line coverage does not guarantee that code works
  • For strictly proving that code is correct a whole different approach is needed
  • Don’t be deceived by nice line coverage metric values
  • Spending a lot of effort on reaching 100% line coverage may not be worth it, because scenario coverage may still be well below 100%
  • Consider other metrics in combination with line coverage
  • Computing scenario coverage is more difficult than line coverage
  • Tests do not prove that code works, only give more confidence
  • Avoid excessive confidence in tests
  • Quality of the code depends on the quality of tests
  • Automated testing and TDD are great tools that improve the quality of software dramatically
  • It is difficult or not feasible to cover all possible scenarios for code
  • Some leap of faith is still needed that the code works
  • There is no exact recipe when we have written enough tests, not too many not too few
  • Avoid being too speculative about code and adding too many extra tests, added value per test will be less
  • Thoroughly testing invalid inputs is fine for publicly exposed APIs, but may make little sense for internal library code, but, again, no exact recipe

Links

Code from the article
Formal Verification
Test-Driven Development
Jasmine (JavaScript testing framework)

Advertisements

Eloquent JavaScript with Underscore.js

Contents

We need a library for that…
Underscore.js to the rescue
Performance
Language independent concepts
Functional programming
Underscore.js: under the hood
Alternatives
Summary

We need a library for that…

Before looking at Underscore.js more closely let’s first discuss the context in which it can be useful. We will see what problems we may have and that it actually solves them well. Because, who knows maybe we don’t need Underscore.js at all and you can already stop reading the article?

Here is our problem. Let’s say that as part of some other larger project we would like to write code that analyzes text and outputs some important information like the list of all the used words in the alphabetical order, the top 10 most frequently used words, the total number of words, etc.

This is what our first attempt at solving the problem might look like if we never heard of functional programming in general and Underscore.js and corresponding JavaScript APIs in particular.

The text we would like to analyze:

var text = "Alice was beginning to get very tired of sitting \
by her sister on the bank, and of having nothing to do: once \
or twice she had peeped into the book her sister was reading, \
but it had no pictures or conversations in it, \
'and what is the use of a book,' thought Alice \
'without pictures or conversations?'\
\
So she was considering in her own mind (as well as she could, \
for the hot day made her feel very sleepy and stupid), whether \
the pleasure of making a daisy-chain would be worth the trouble \
of getting up and picking the daisies, when suddenly a White \
Rabbit with pink eyes ran close by her.";

And the code that analyzes the text:

function textWords(text) {
    var words = text.match(/[a-zA-Z\-]+/g);

    for (var i = 0; i < words.length; i++) {
        words[i] = words[i].toLowerCase();
    }
    return words;
}

function wordsFrequencies(words) {
    var frequencies = {},
        currentWord = null;
    
    for(var i = 0; i < words.length; i++) {
        currentWord = words[i];
        frequencies[currentWord] =
            (frequencies[currentWord] || 0) + 1;
    }
    return frequencies;
}

function sortedListOfWords(wordsFrequencies) {
    var words = [];

    for (var key in wordsFrequencies) {
        if (wordsFrequencies.hasOwnProperty(key)) {
            words.push(key);
        }
    }
    return words.sort();
}

function topTenWords(wordsFrequencies) {
    var frequencies = [],
        result = [];

    for (var key in wordsFrequencies) {
        if (wordsFrequencies.hasOwnProperty(key)) {
            frequencies.push([key, wordsFrequencies[key]]);
        }
    }

    frequencies = frequencies.sort(function(freq1, freq2) {
        return (freq1[1] < freq2[1])
            ? 1
            : (freq1[1] > freq2[1] ? -1 : 0);
    });

    for (var i = 0; i < 10; i++) {
        result[i] = frequencies[i];
    }
    return result;
}

function analyzeText(text) {
    var words = textWords(text),
        frequencies = wordsFrequencies(words),
        used = sortedListOfWords(frequencies),
        topTen = topTenWords(frequencies);

    console.log("Word count = ", words.length);
    console.log("List of used words = ", used);
    console.log("Top 10 most used words = ", topTen);
}

analyzeText(text);

This should be self explanatory, but still let’s look at one of the methods more closely, in particular we will be interested in low level implementation details:

function wordsFrequencies(words) {
    var frequencies = {},
        currentWord = null;
    
    for(var i = 0; i < words.length; i++) {
        currentWord = words[i];
        frequencies[currentWord] =
            (frequencies[currentWord] || 0) + 1;
    }
    return frequencies;
}

Here we just iterate through the list of all words and for each word increment its frequency value stored in frequencies. Note, that this implementation contains the details how we exactly increment the index when getting the next word and this is pretty low level. It is first not important to our implementation and second very likely can be reused in many other use cases. This is precisely the reason why JavaScript has it already abstracted into a separate function Array.prototype.forEach. Let’s rewrite wordsFrequencies using this function instead:

function wordsFrequencies(words) {
    var frequencies = {};
    
    words.forEach(function(word) {
        frequencies[word] = (frequencies[word] || 0) + 1;
    });
    return frequencies;
}

No doubt the implementation became somewhat clearer without an extra index variable i and currentWord. But actually even this version can be made shorter and the low level implementation details of how exactly frequencies changes with each processed word can also be partially hidden. For that we will use the Array.prototype.reduce function:

function wordsFrequencies(words) {
    return words.reduce(function(frequencies, word) {
        frequencies[word] = (frequencies[word] || 0) + 1;
        return frequencies;
    }, {});
}

It seems like we improved this part quite a bit. But now we notice that in another place of our code we read the list of all the keys of an object, and there is also a separate Object.keys function provided by JavaScript to do that, and instead of:

    for (var key in wordsFrequencies) {
        if (wordsFrequencies.hasOwnProperty(key)) {
            frequencies.push([key, wordsFrequencies[key]]);
        }
    }

we can write:

    frequencies = Object.keys(wordsFrequencies).map(function(key) {
        return [key, wordsFrequencies[key]];
    });

Much more readable. And here we also used the Array.prototype.map function.

Now below is the whole version of our program rewritten using the JavaScript standard methods we mentioned above:

function textWords(text) {
    return text.match(/[a-zA-Z\-]+/g).map(function(word) {
        return word.toLowerCase();
    });
}

function wordsFrequencies(words) {
    return words.reduce(function(frequencies, word) {
        frequencies[word] = (frequencies[word] || 0) + 1;
        return frequencies;
    }, {});
}

function sortedListOfWords(wordsFrequencies) {
    return Object.keys(wordsFrequencies).sort();
}

function topTenWords(wordsFrequencies) {
    var frequencies = [],
        result = [];

    frequencies = Object.keys(wordsFrequencies)
        .map(function(key) {
            return [key, wordsFrequencies[key]];
        }).sort(function(freq1, freq2) {
            return (freq1[1] < freq2[1])
                ? 1
                : (freq1[1] > freq2[1] ? -1 : 0);
        });

    for (var i = 0; i < 10; i++) {
        result[i] = frequencies[i];
    }
    return result;
}

function analyzeText(text) {
    var words = textWords(text),
        frequencies = wordsFrequencies(words),
        used = sortedListOfWords(frequencies),
        topTen = topTenWords(frequencies);

    console.log("Word count = ", words.length);
    console.log("List of used words = ", used);
    console.log("Top 10 most used words = ", topTen);
}

analyzeText(text);

The program became shorter and much more clear. However, if we look carefully, we can notice that there are still a few places with lots of low-level details, for example:

function topTenWords(wordsFrequencies) {
    var frequencies = [],
        result = [];

    frequencies = Object.keys(wordsFrequencies)
        .map(function(key) {
            return [key, wordsFrequencies[key]];
        }).sort(function(freq1, freq2) {
            return (freq1[1] < freq2[1])
                ? 1
                : (freq1[1] > freq2[1] ? -1 : 0);
        });

    for (var i = 0; i < 10; i++) {
        result[i] = frequencies[i];
    }
    return result;
}

still has low level details related to sorting and getting the first 10 elements.

Moreover this function clearly does too much at once and almost asks to split it in two functions. And having 10 hardcoded in the body of the function is clearly far from perfect. Notice that this became much more evident only after we partially re-factored the code of the original function and abstracted away some of the implementation details. For now we will stop here with our re-factoring and return to this function later when we will use Underscore.js in the process.

Unfortunately the standard JavaScrit, while providing a number of useful functions such as map, reduce, filter, etc. still lacks some other functions and the API is a bit limited.

For example in the case of topTenWords function above we miss the function for taking first n elements from an array which is quite common to many languages, see take from the standard Clojure library or take from the standard Ruby library.

This is where Underscore.js comes to the rescue.

Underscore.js to the rescue

Underscore.js can be installed as a Node module:

npm install underscore

Or you can just include the minified source on your page:

<script
src="http://documentcloud.github.com/underscore/underscore-min.js">
</script>

The source code for the library can be found on Github github.com/jashkenas/underscore

Underscore.js does just what we need: it provides a large number of convenient functions that we can use to get rid of low level implementation details in our programs. There is some intersection with the standard JavaScript API, but Underscore.js provides far more useful functions than we can find in the standard JavaScript. For example, the last function we considered for getting the top 10 frequent words can be rewritten as follows:

function wordsAndFrequenciesDescending(wordsFrequencies) {
    return _.sortBy(_.map(_.keys(wordsFrequencies),
        function(key) {
            return [key, wordsFrequencies[key]];
        }),
        _.property("1")).reverse();
}

function topWords(wordsFrequencies, number) {
    return _.take(
        wordsAndFrequenciesDescending(wordsFrequencies),
        number
    );
}

Notice that it is now more terse and expressive. _.keys corresponds to Object.keys, _.map to Array.prototype.map we encountered before, etc. Also we see that Underscore.js does include the take method we mentioned before and which was missing from the standard JavaScript API. The function by which we sort became much shorter and clearer as well thanks to using the _.property function provided by Underscore.js.

Another thing that we immediately notice about Underscore.js is that rather than adding methods to JavaScript objects, Underscore.js provides external methods that take an object as a parameter.

Underscore.js also allows to use a more object-oriented style by wrapping an object and making its functions available on the resulting wrapper object as methods. When invoked each such method returns another wrapper object that contains the intermediate computation result and on which we can again invoke Underscore.js methods. This way we can chain computations.

For example, compare:

var object = {
    "key1": "value1",
    "key2": "value2",
    "key3": "value3"
};

_(object).keys();
_.keys(object);

Both ways of calling keys in this case are valid. You can read more about wrapping objects with Underscore.js and chaining in the documentation.

And here is the full version of our initial program rewritten using Underscore.js functions.

function textWords(text) {
    return _.map(text.match(/[a-zA-Z\-]+/g), function(word) {
        return word.toLowerCase();
    });
}

function wordsFrequencies(words) {
    return _.reduce(words, function(frequencies, word) {
        frequencies[word] = (frequencies[word] || 0) + 1;
        return frequencies;
    }, {});
}

function sortedListOfWords(wordsFrequencies) {
    return _.sortBy(_.keys(wordsFrequencies));
}

function wordsAndFrequenciesDescending(wordsFrequencies) {
    return _.sortBy(_.map(_.keys(wordsFrequencies),
        function(key) {
            return [key, wordsFrequencies[key]];
        }),
        _.property("1")).reverse();
}

function topWords(wordsFrequencies, number) {
    return _.take(
        wordsAndFrequenciesDescending(wordsFrequencies),
        number
    );
}

function analyzeText(text) {
    var words = textWords(text),
        frequencies = wordsFrequencies(words),
        used = sortedListOfWords(frequencies),
        topTen = topWords(frequencies, 10);

    console.log("Word count = ", words.length);
    console.log("List of used words = ", used);
    console.log("Top 10 most used words = ", topTen);
}

analyzeText(text);

The functions became much more succinct and clear and the low level implementation details are now well hidden.

Actually, it is possible to make (thanks for pointing this out, rooktakesqueen) the code even more expressive. Using the _.chain function we can re-write wordsAndFrequenciesDescending as follows:

function wordsAndFrequenciesDescending(wordsFrequencies) {
    return _.chain(wordsFrequences)
        .keys()
        .map(function(key) {
            return [key, wordsFrequencies[key]];
        })
        .sortBy(_.property('1'))
        .reverse()
        .value();
}

And… That’s it, if you have managed to follow the article this far, you already have a good grasp of what Underscore.js is and how it can help you with writing your code. You can always get more information about Underscore.js by reading its annotated source code or documentation. The number of provided functions is considerably larger than what we discussed so far, in particular there are separate functions for objects, arrays, collections and functions, as well as a few generic utility functions.

Even more functions can be found in underscore-contrib. Although the functions there may be less generic and suitable only for a particular type of problems. If you cannot find some function in Underscore.js, check it out, it may already contain what you need. But this part is out of the scope of the present article.

What follows below is more detailed discussion about the style of programming promoted by Underscore.js, performance, implementation details of the library, etc. Read it if you would like to have more advanced understanding of the library.

Performance

Underscore.js provides us high level abstract functions that make our code clearer and shorter, that is it makes us as developers more productive, but what about the computer productivity? Is our code performant enough? Let’s use jsperf to test that in our case and see some results for the three implementations above:

underscore_vs_map_reduce_keys_vs_loops

Here is the link to the test cases and the full results http://jsperf.com/underscore-js-vs-map-reduce-keys-vs-low-level

We immediately see that using the native map, reduce and keys is from 1.5 to 2 times faster than the Underscore.js version.

Suprisingly enough in the case of our program for Chrome and IE10 using native map, reduce and keys is even faster than the low level implementation. But actually it does make sense, because those native methods are implemented as native code which is somewhat faster than trying to write the same logic in JavaScript.

Although this does not always seem to be true, and the low level implementation can still be much faster than the native forEach and reduce if the arrays are large enough and contain only numbers as another benchmark demonstrates http://jsperf.com/foreach-vs-reduce-vs-for-loop

The main result here is that the performance of all the three approaches is of the same order, i.e. is comparable. Then if we get more readability with Underscore.js or native functions we probably should use them as we do not loose much in performance and in the case of Chrome we actually win.

Based on this benchmark the recommendation will be consider avoiding writing your own loops, as they are much less expressive, more complicated and do not always guarantee better performance.

If you still have doubts, in general, when performance considerations enter your decision making, please, remember, as a rule we should optimize only the 10% of the code that takes 90% of the execution time, otherwise we are wasting our development time, do not win much performance-wise and make our code unnecessary complicated.

Language independent concepts

If you are familiar with any other languages than JavaScript you might have seen already something similar to the functions we used above. Ruby has its Enumerable module which defines many of the similar methods, Java has Guava and in fact almost everybody in the Java world tries to develop their own version of Underscore.js without even sometimes realizing it, Scala has many of these methods included into the standard library, etc.

If you look carefully at what we discussed so far, you will see that there is not much that is JavaScript specific. We just talked about collections, functions, passing functions into other functions and creating higher level functions, that is functions that can accept other functions as arguments. In fact, we can define functions like map, filter, reduce, etc. in any other language that has the concepts mentioned above.

A large part of the philosophy and ideas behind Underscore.js is thus language independent and quite generic. But of course, in order to be practically useful it also has a sizable JavaScript-specific part which is still needed by JavaScript developers, for example, take a look at the _.toArray and _.isDate functions.

That ideas and philosophy are in large part inspired by Functional Programming (FP), which is a development paradigm that views functions as the primary building block of programs.

For some languages that currently do not treat functions as important enough, for example, Java, it is stil possible to emulate passing functions into functions and create something similar to Underscore.js but it will look much uglier and will be quite cumbersome to use, for example, compare the different versions of a program that converts a collection of strings into a collection of lengths of each string:

the Java version (using the Functional Java library):

Array.<String, Integer>map().f(new F<String, Integer>() {

    @Override
    public Integer f(String arg) {
        return arg.length();
    }
}).f(array("a", "ab", "abc")).array(Integer[].class));

the Scala version:

Array("a", "ab", "abc").map(_.length())

the Ruby version:

["a", "ab", "abc"].map {|str| str.length}

and the JavaScript one:

["a", "ab", "abc"].map(function(str) {
    return str.length;
});

Now it is clearly time for a bit of critism in the direction of JavaScript. You can see that while JavaScript version is comparable to the Scala one it is still a bit more wordy and includes syntactic noise such as function and return keywords. But here Underscore.js will not help us much, as we cannot easily circumvent the core language limitations. And we can only feel sorry for poor Java developers until they finally get lambdas in Java 8.

Functional programming

This is a generic programming paradigm in which the primary (or at least one of the primary) building blocks that are used to create new programs are functions.

Another things often talked about in relation to Functional Programming (FP) are immutability and referential transparency.
We will not go into many details what those mean, and provide only a brief explanation, but please, feel free to take a deep dive and explore more on your own.

In a couple of words the basic principles of FP can be summed up as follows:

  • You can substitute a function invocation with the body of the function withouth changing the results of the computation (referential transparency)
  • There is no shared state in the program, functions accept as arguments and produce immutable objects
  • There is no notion of time, the actual scheduling of when various parts of the program will be evaluated does not affect the results of the computation as long as each function gets all of its arguments before its invocation
  • Functions can be passed as arguments into other functions and can be returned from functions (in other words, functions are “first-class citizens”)

The programs in FP are more like mathematical statements or formulas then a set of imperative instructions that should be executed one by one.

Many of the languages considered “functional” do not follow all of the outlined principles to begin with. For example, even in Scala it is possible to have mutable shared state although there is considerable language support for immutability. Quite often the theoretical beauty of FP has to make some room for practical concerns: writing something to disk or sending a network message obviously do require to have the notion of time in programs.

It is relatively clear that JavaScript only satisfies the last requirement. We can of course try to create programs that have no shared state and will be referentially transparent in JavaScript but it will require additional efforts from our side as the developers and we are not restricted to violate those principles by the language itself.

That said, having functions as “first-class citizens” in JavaScript is actually quite a powerful feature already. It moves us considerably in the direction of FP, and we can always try to follow the rest of the principles of FP when it suits us best. Definitely Underscore.js provides a great deal of help here.

Underscore.js: under the hood

Let’s now take a look at the implementation details of Underscore.js. There we can find some good examples of functional style of programming but normally the library tries to avoid using its own functions when implementing other functions because of the performance concerns, so some of the examples we will see will be what we call here “low-level”. And this is perfectly acceptable because many libraries depend on Underscore.js and here the performance actually does play an important role, it is even probably more important than readability.

The full source code of Underscore.js is available in the github repository https://github.com/jashkenas/underscore

One example of the functional approach used in Underscore.js is the union function:

  _.union = function() {
    return _.uniq(_.flatten(arguments, true));
  };

The implementation clearly explains what the function does: given a few arrays it first concatenates those arrays with _.flatten and then removes non-unique elements with _.uniq.

Another good example of the functional style is a number of functions that deal with grouping, let’s look at the code for _.groupBy:

  // An internal function used for aggregate "group by" operations.
  var group = function(behavior) {
    return function(obj, iterator, context) {
      var result = {};
      iterator = lookupIterator(iterator);
      each(obj, function(value, index) {
        var key = iterator.call(context, value, index, obj);
        behavior(result, key, value);
      });
      return result;
    };
  };

  // Groups the object's values by a criterion. Pass either 
  //a string attribute to group by, or a function that 
  //returns the criterion.
  _.groupBy = group(function(result, key, value) {
    _.has(result, key)
        ? result[key].push(value) 
        : result[key] = [value];
  });

group is quite an interesting function. It accepts another function that alters its behavior and then returns a function that accepts an object and an iterator. The third argument context is optional, it can specify on which object the iterator should be invoked, i.e. the value of this inside the iterator function. This function returned from group when it is invoked calls the provided iterator to get the key for each value it gets from the target object on which it was invoked. If object is an array, then value will be an element of that array. Once we have key and value we use the provided behavior to process them and maybe alter the result which we first initialize to an empty object. In the end we return result.

We call functions that accept other functions as arguments “higher-order” functions. Then, speaking in the language of Functional Programming, group is just a higher-order function that returns another higher-order function once it is invoked.

This may sound too complicated, but once we look at a concrete example of how _.groupBy is defined and used, everything becomes much clearer.

We see that _.groupBy is just a function returned by the group function which we customize with a specific behavior that for each key and value pushes that value into the array of values stored in result[key]. This way we get all the values stored in the result grouped into corresponding arrays which can be accessed as properties on the result object.

An example of how the _.groupBy function is used:

  //Result: {"even":[0,2,4],"odd":[1,3]}
  _.groupBy([0, 1, 2, 3, 4], function(value) {
    return (value % 2 == 0) ? "even" : "odd";
  });

Here we provide the concrete iterator function that specifies how to map each value to a key. In this case this is a function that determines for each value whether it is odd or even.

The rest of the grouping functions _.indexBy and _.countBy are defined in a very similar fashion:

  // Indexes the object's values by a criterion, similar
  // to `groupBy`, but for when you know that your index
  // values will be unique.
  _.indexBy = group(function(result, key, value) {
    result[key] = value;
  });

  // Counts instances of an object that group by
  // a certain criterion. Passeither a string attribute
  //to count by, or a function that returns the criterion.
  _.countBy = group(function(result, key) {
    _.has(result, key) ? result[key]++ : result[key] = 1;
  });

You can actually see what they do by doing an analysis of code similar to what we have just done with _.groupBy.

But as we mentioned not everything in Underscore.js is written in a functional style. An example of this is the _.object function that allows to create an object from provided arrays of properties in two ways:

  //Result: {"key1": "value1",
  //  "key2": "value2", "key3": "value3"}
  _.object([
    ["key1", "value1"],
    ["key2", "value2"],
    ["key3", "value3"]
  ]);

  //Result: {"key1": "value1",
  //  "key2": "value2", "key3": "value3"}
  _.object(["key1", "key2", "key3"],
    ["value1", "value2", "value3"]);

The current implementation is as follows:

  _.object = function(list, values) {
    if (list == null) return {};
    var result = {};
    for (var i = 0, length = list.length; i < length; i++) {
      if (values) {
        result[list[i]] = values[i];
      } else {
        result[list[i][0]] = list[i][1];
      }
    }
    return result;
  };

Which is quite low level and complicated. If we were to rewrite this using the functional style, we would get a shorter and cleaner version:

  _.object = function(list, values) {
    if (list == null) return {};
    var pairs = values 
      ? _.zip(list, _.take(values, list.length)) 
      : list;
    return _.reduce(pairs, function(result, pair) {
      result[pair[0]] = pair[1];
      return result;
    }, {});
  };

But this version turns out to be almost 10 times slower because of using of the _.zip and _.take functions which are themselves a bit slow. In the end we sacrifice the readability and functional purity for the performance and choose to leave the current implementation in place. The corresponding performance test that demonstrates the comparative performance of the two implementations
can be found here http://jsperf.com/underscore-object-implementations

Here we will stop with exploring the internal workings of Underscore.js as the parts of the library that we reviewed are already quite representative, but feel free to review the rest of the code.

Alternatives

There are a number of alternative helper libraries or APIs that you can use, but Underscore.js is by far the most popular one and the one where you can count on good support and the availability of the need features. De-facto this library has become a standard, for example, it is the most dependent upon Node module.

But as we have seen among the drawbacks may be a bit of a loss of the functional purity in the implementation, large API and slightly worse than native performance. Also not everybody can like the default Underscore.js style when we have to execute external functions on each object. And surely there are always alternatives.

One of them is the Lo-Dash library (thanks jcready for the reference). Personally I did not use it much, but it looks quite interesting and seems to do all of the things that Underscore.js does and sometimes even a bit more. A quick benchmark http://jsperf.com/reduce-underscore-js-vs-lo-dash shows that _.reduce is 2 times faster in Lo-Dash than in Underscore.js when the benchmark is run in Chrome, but at the same time in Firefox the two libraries do not have noticable performance differences.

Yet another alternative you may consider is Lazy.js (thanks brtt3000), it seems to be the fastest library available based on the benchmarks provided on its site comparing it both with Underscore.js and Lo-dash. If your main concern is performance you may give this library a try.

Also you can always just use the native methods analogous to those provided by Underscore.js: map, reduce, filter, etc. Modern browsers support them well and Underscore.js itself falls back to the native implementations for its functions when it is possible for performance reasons. But be mindful that the performance win will not be more than 50% – 100% which can be relatively insignificant if the hot spots of your application lie elsewhere.

If you want to have a small core of well-defined functions and more of a functional paradigm, then consider using, for example, fn.js https://github.com/eliperelman/fn.js, a “JavaScript library built to encourage a functional programming style & strategy.”

Summary

We have seen that:

  • Writing our own loops and low level code makes our programs less readable and not always faster
  • Instead the native higher-order functions should be used or a specialized library like Underscore.js
  • With Underscore.js the resulting code is much more maintainable, as a result the development productivity grows
  • The performance cost of using Underscore.js is acceptable in most cases
  • It would be good to include functions similar to some of the Underscore.js functions into the language standard and support them natively

If you find yourself writing many loops and iterating over arrays a lot in your project you should definitely give Underscore.js a try and see how your code improves.

The code examples from this article can be found on github github.com/antivanov/misc/tree/master/JavaScript/Underscore.js

Links

Underscore.js
Lo-Dash
Lazy.js
“Functional JavaScript: Introducing Functional Programming with Underscore.js” book

JavaScript and Friends: CoffeeScript, Dart and TypeScript

Contents

Why JavaScript Isn’t Enough?
Example JavaScript Program: Dijkstra’s Algorithm
CoffeeScript
TypeScript
Dart
Web Application Development
ECMAScript 6
Conclusions

Why JavaScript Isn’t Enough?

This article assumes that the reader has a good knowledge of JavaScript and has done at least some development in it, but if this is not about you, you can just first refer to one of the beginner’s JavaScript books like Eloquent JavaScript.

JavaScript is an amazing, often underappreciated and misunderstood language. It has some really powerful concepts like functions as first-class citizens (see, for example, JavaScript: The World’s Most Misunderstood Programming Language), flexible prototypal inheritance and is a powerful generic programming language that can be used not only in browsers.

Despite all its power and flexibility the language has some well-known design shortcomings such as global variables, cumbersome emulation of lexical scoping, non-intuitive implicit conversions, etc. In fact, there are parts of the language that you better avoid using at all, as it is advised in JavaScript: The Good Parts. Let us also note that from the beginning JavaScript was not specifically designed for developing applications with large code bases and many developers involved.

Nonetheless, it is increasingly used for developing precisely such applications. In such cases in order for the code to be maintainable it should have a well-defined structure and developers should adhere to a strict coding discipline in order to avoid producing a festering pile of messy code where everything depends on everything and no boundaries can be found between modules. Unfortunately, in JavaScript it is all too easy to transform your code base into an abomination, this is somewhat similar to Ruby, but actually even worse, since unlike in Ruby there is no standard mechanism for emulating class-based inheritance and separating your code into modules and packages. Having no types specified in the source code does not help either.

The point is that for large applications JavaScript developers need to be extra careful and disciplined as they are not particularly restricted from producing a horrible pile of code. The most recent trend is exactly towards complex applications being written in JavaScript. For example, for the client side it was well understood for quite some time that moving a lot of presentational logic to the browser makes much more sense than keeping it away from the client on the server. The clients have become thicker and more complex, MVC frameworks for the client side appeared, one of them Backbone.js, etc. Of course, JavaScript is not limited to just the client side and there appeared large server-side applications as well, see Node.js It looks like now there is time for the language to be updated to be better fit for the new challenges and cases that were likely just not foreseen by its creators from the very beginning.

There are quite a few changes coming to JavaScript in the recent versions of the language standard, for example, see ECMAScript 6, but also simultaneously a number of languages started to appear near JavaScript that try to address the described issues. The present article is just a brief overview of the most well-known of these language and a discussion how they relate to each other and JavaScript. This is not a thorough research of all of the mentioned languages, but rather an attempt to get a feeling of what those languages are and why we should care about them.

Example JavaScript Program: Dijkstra’s Algorithm

For our discussion and comparison to be more tangible and less abstract let’s consider some program in JavaScript and see to what it translates in the selected languages while highlighting the most interesting parts and differences between those languages.

As an example, let’s take an implementation of Dijkstra’s algorithm for finding the shortest path between two nodes in a directed graph. For the purposes of the present article it is actually not important to understand how the algorithm works, but in case you are interested in the algorithm itself, you can just read the Wiki article.

So, here is the implementation in JavaScript:

Array.prototype.remove = Array.prototype.remove || function(element) {
	var index = this.indexOf(element);

	if (index >= 0) {
	    this.splice(index, 1);
	};
};

Array.prototype.contains = Array.prototype.contains || function(element) {
	return this.indexOf(element) >= 0;
};

var graphs = {};

(function(host) {
	function Graph(nodesNumber, edges) {
		this.nodesNumber = nodesNumber;
		this.initEdges(edges);
	};

	Graph.prototype.initEdges = function(edges) {
		var oThis = this,
			i = 0;

		this.edges = [];
		for (; i < this.nodesNumber; i++) {
			this.edges[i] = [];
		};		
		if (edges) {
			edges.forEach(function (edge) {
				oThis.edge(edge[0], edge[1], edge[2]);
			});
		};
	};

	Graph.prototype.edge = function(from, to, weight) {
		this.edges[from - 1][to - 1] = weight;
		return this;
	};
	
	Graph.prototype._constructShortestPath = function(distances, previous, unvisited, to) {
		var vertex = to,
		path = [];

		while (undefined != vertex) {
			path.unshift(vertex + 1);
			vertex = previous[vertex];
		};
			
		return {
			path: path,
			length: distances[to]
		};
	};

	Graph.prototype._getUnvisitedVertexWithShortestPath = function(distances, previous, unvisited) {
		var minimumDistance = Number.MAX_VALUE,
			vertex = null;
			
		unvisited.forEach(function (unvisitedVertex) {
			if (distances[unvisitedVertex] < minimumDistance) {
				vertex = unvisitedVertex;
				minimumDistance = distances[vertex];
			};
		});
		return vertex;
	};
	
	Graph.prototype._updateDistancesForCurrent = function(distances, previous, unvisited, current) {	
		for (var i = 0; i < this.edges[current].length; i++) {
			var currentEdge = this.edges[current][i];
			
			if ((undefined != currentEdge) &amp;&amp; unvisited.contains(i)) {
				if (distances[current] + currentEdge < distances[i]) {
					distances[i] = distances[current] + currentEdge;
					previous[i] = current;
				};
			};			
		};
	};

	//Dijkstra algorithm http://en.wikipedia.org/wiki/Dijkstra's_algorithm
	Graph.prototype.getShortestPath = function(from, to) {
		var unvisited = [],
		    current = null,
		    distances = [],
		    previous = [];

		from = from - 1;		
		to = to - 1;
		//Initialization
		for (var i = 0; i < this.nodesNumber; i++) {
			unvisited.push(i);
			//Infinity
			distances.push(Number.MAX_VALUE);
		};
		distances[from] = 0;
		
		while (true) {
			if (!unvisited.contains(to)) {
				return this._constructShortestPath(distances, previous, unvisited, to);
			};
			current = this._getUnvisitedVertexWithShortestPath(distances, previous, unvisited);
		
			//No path exists
			if ((null == current) || (Number.MAX_VALUE == distances[current])) {
				return {
		    		path: [],
		    		length: Number.MAX_VALUE
				};
			};
			this._updateDistancesForCurrent(distances, previous, unvisited, current);			
			unvisited.remove(current);
		};
	};

	host.Graph = Graph;
})(graphs);

var graph = new graphs.Graph(8, [
	[1, 2, 5], [1, 3, 1], [1, 4, 3],
	[2, 3, 2], [2, 5, 2],
	[3, 4, 1], [3, 5, 8],
	[4, 6, 2],
	[5, 7, 1],
	[6, 5, 1]
]);

var shortestPath = graph.getShortestPath(1, 7);

console.log("path = ", shortestPath.path.join(","));
console.log("length = ", shortestPath.length);

//No shortest path to the vertex '8'
console.log(graph.getShortestPath(1, 8));

This example demonstrates emulating classes with prototypes, extending the core language objects (Array), and working with data structures. All this should already be familiar to you, otherwise, please, refer to some JavaScript resource to quickly get up to speed with the language.

CoffeeScript

CoffeeScript addresses some of the JavaScript issues described above. It introduces classes, shortcuts for most common JavaScript boiler-plate code, such as @ for this and :: for prototype, saves on number of code lines, gets rid of curly braces and actively uses indentation to give structure to your program.

Here is the algorithm rewritten in CoffeeScript:

Array::remove = Array::remove || (element) ->
    index = @indexOf(element)
    @splice(index, 1) if index >= 0

graphs = {}

graphs.Graph = class Graph

    constructor: (@nodesNumber, edges) ->
        @initEdges(edges)

    initEdges: (edges) ->
        @edges = []
        @edges[i] = [] for i in [0..@nodesNumber]
        @edge edge... for edge in edges if edges
    
    edge: (from, to, weight) ->
        @edges[from - 1][to - 1] = weight

    _constructShortestPath = (distances, previous, unvisited, to) ->
        vertex = to
        path = []

        while vertex?
            path.unshift(vertex + 1);
            vertex = previous[vertex];

        path: path
        length: distances[to]
        
    _getUnvisitedVertexWithShortestPath = (distances, previous, unvisited) ->
        minimumDistance = Number.MAX_VALUE

        for unvisitedVertex in unvisited
            if (distances[unvisitedVertex] < minimumDistance)
                vertex = unvisitedVertex
                minimumDistance = distances[vertex]

        vertex

    _updateDistancesForCurrent: (distances, previous, unvisited, current) ->
        for edge, i in @edges[current]
            if ((undefined != edge) &amp;&amp; edge >= 0 &amp;&amp; i in unvisited &amp;&amp; (distances[current] + edge < distances[i]))
                distances[i] = distances[current] + edge
                previous[i] = current

    #Dijkstra algorithm http://en.wikipedia.org/wiki/Dijkstra's_algorithm
    getShortestPath: (from, to) ->
        unvisited = []
        current = null
        distances = []
        previous = []

        from = from - 1        
        to = to - 1

        #Initialization
        for i in [0..@nodesNumber]
            unvisited.push(i)
            #Infinity
            distances.push(Number.MAX_VALUE)

        distances[from] = 0
        
        while (true)
            if (not (to in unvisited))
                return _constructShortestPath(distances, previous, unvisited, to);

            current = _getUnvisitedVertexWithShortestPath(distances, previous, unvisited)
        
            #No path exists
            if ((null == current) || (undefined == current) || (Number.MAX_VALUE == distances[current]))
                return {
                    path: []
                    length: Number.MAX_VALUE
                }

            @_updateDistancesForCurrent(distances, previous, unvisited, current)            
            unvisited.remove(current)

        return

graph = new graphs.Graph(8, [
    [1, 2, 5], [1, 3, 1], [1, 4, 3],
    [2, 3, 2], [2, 5, 2],
    [3, 4, 1], [3, 5, 8],
    [4, 6, 2],
    [5, 7, 1],
    [6, 5, 1]
]);

shortestPath = graph.getShortestPath(1, 7)

console.log("path = ", shortestPath.path.join(","))
console.log("length = ", shortestPath.length)

#No shortest path to the vertex '8'
console.log(graph.getShortestPath(1, 8))

As it can be seen CoffeeScript is a completely different language that can be then (usually) compiled into JavaScript or executed outright. It is not a superset or subset of JavaScript and you cannot just freely mix JavaScript and CoffeeScript constructs in one program. There is still a mechanism for executing JavaScript embedded inside your CoffeeScript, although normally you would not use it.

The CoffeeScript version is somewhat shorter: the number of lines of the CoffeeScript code in our example is approximately 2/3 of the number of lines of the JavaScript version.

From my brief experience with it I can say that programming in CoffeeScript requires advanced JavaScript knowledge and you often have to think about what exact JavaScript code will be produced from your CoffeeScript code.

Tooling support also lags behind so if something goes wrong in your code you will have to debug the resulting JavaScript while simultaneously matching it with the original CoffeeScript source. The amount of time it took me to write the CoffeeScript version of the Dijkstra’s algorithm was actually the same or larger than what I spent on writing the JavaScript version, but then again, I am not an experienced CoffeeScript programmer and maybe as time goes, the speed of development will increase.

Apart from the lack of good development tools for CoffeeScript (at least I did not find any) the resulting code is a bit cryptic and terse, a bit like Bash. Overall, the language somewhat reminds of Ruby and Python.

The main plus of CoffeeScript is getting rid of the syntactic noise that can often be encountered in JavaScript, like Graph.prototype. or .forEach(function() {…}), and as a result you start to see the core logic of the program better. Also, it is nice that CoffeeScript introduces classes, it potentially gives large programs more structure in a more unified way than in JavaScript where everybody devises their own way to emulate class-based inheritance.

TypeScript

TypeScript is a superset of JavaScript enhanced with type annotations, classes and modules. It demonstrates a rather conservative approach to addressing the many issues of JavaScript compared to other languages in this article as TypeScript does not try to replace JavaScript, in fact, any valid JavaScript code is valid TypeScript code.

Here is the TypeScript version:

interface Array {
    remove(element: any): void;
    contains(element: any): bool;
}

Array.prototype.remove = Array.prototype.remove || function(element) {
    var index = this.indexOf(element);

    if (index >= 0) {
        this.splice(index, 1);
    };
};

Array.prototype.contains = Array.prototype.contains || function(element) {
    return this.indexOf(element) >= 0;
};

module graphs {
    export class Graph {
    
        edges: number[][];
    
        constructor(public nodesNumber: number, edges: number[][]) {
            this.initEdges(edges);
        }
        
        initEdges(edges: number[][]): void {
            var oThis = this,
            i = 0;

            this.edges = [];
            for (; i < this.nodesNumber; i++) {
                this.edges[i] = [];
            };        
            if (edges) {
                edges.forEach(function (edge) {
                    oThis.edge.apply(oThis, edge);
                });
            };
        }
        
        edge(from: number, to: number, weight: number): Graph {
            this.edges[from - 1][to - 1] = weight;
            return this;
        }

        //Dijkstra algorithm http://en.wikipedia.org/wiki/Dijkstra's_algorithm
        getShortestPath(from: number, to: number): {path: number[]; length: number;} {
            var unvisited = [],
                current = null,
                distances = [],
                previous = [];

            from = from - 1;        
            to = to - 1;
            //Initialization
            for (var i = 0; i < this.nodesNumber; i++) {
                unvisited.push(i);
                //Infinity
                distances.push(Number.MAX_VALUE);
            };
            distances[from] = 0;
        
            while (true) {
                if (!unvisited.contains(to)) {
                    return this._constructShortestPath(distances, previous, unvisited, to);
                };
                current = this._getUnvisitedVertexWithShortestPath(distances, previous, unvisited);
        
                //No path exists
                if ((null == current) || (Number.MAX_VALUE == distances[current])) {
                    return {
                        path: [],
                        length: Number.MAX_VALUE
                    };
                };
                this._updateDistancesForCurrent(distances, previous, unvisited, current);            
                unvisited.remove(current);
            };
        }

        private _constructShortestPath(distances: number[], previous: number[],
             unvisited: number[], to: number): { path: number[]; length: number; } {
            var vertex = to,
            path = [];

            while (undefined != vertex) {
                path.unshift(vertex + 1);
                vertex = previous[vertex];
            };
            
            return {
                path: path,
                length: distances[to]
            };
        }

        private _getUnvisitedVertexWithShortestPath(distances: number[], previous: number[], unvisited: number[]): number {
            var minimumDistance = Number.MAX_VALUE,
                vertex = null;
            
            unvisited.forEach(function (unvisitedVertex) {
                if (distances[unvisitedVertex] < minimumDistance) {
                    vertex = unvisitedVertex;
                    minimumDistance = distances[vertex];
                };
            });
            return vertex;
        }

        private _updateDistancesForCurrent(distances: number[], previous: number[], unvisited: number[], current: number): void {    
            for (var i = 0; i < this.edges[current].length; i++) {
                var currentEdge = this.edges[current][i];
            
                if ((undefined != currentEdge) &amp;&amp; unvisited.contains(i)) {
                    if (distances[current] + currentEdge < distances[i]) {
                        distances[i] = distances[current] + currentEdge;
                        previous[i] = current;
                    };
                };            
            };
        }
    }
}

var graph = new graphs.Graph(8, [
    [1, 2, 5], [1, 3, 1], [1, 4, 3],
    [2, 3, 2], [2, 5, 2],
    [3, 4, 1], [3, 5, 8],
    [4, 6, 2],
    [5, 7, 1],
    [6, 5, 1]
]);

var shortestPath = graph.getShortestPath(1, 7);

console.log("path = ", shortestPath.path.join(","));
console.log("length = ", shortestPath.length);

//No shortest path to the vertex '8'
console.log(graph.getShortestPath(1, 8));

The main focus of the language is not minimizing the number of lines of the resulting code as in CoffeeScript, but rather making JavaScript friendlier for external tools, static analysis and large project development. As it can be seen, the number of lines is roughly the same as in the JavaScript version.

Converting the existing JavaScript code into more idiomatic TypeScript code is very fast and simple, and can be done seamlessly and gradually for an existing code base, which is a huge plus when deciding to migrate an existing JavaScript project to TypeScript.

TypeScript is also compiled to JavaScript with the command tsc just like the CoffeeScript code in the previous section. However, unlike CoffeeScript, TypeScript has a good tool support in the form of a Visual Studio editor and potentially plugins for other IDEs. For example, the latest version of WebStorm will include support for TypeScript. Developing in TypeScript feels a lot like developing in JavaScript but you feel yourself a bit safer thanks to the type annotations.

The TypeScript language looks very promising. Some of its features have analogs in the recent version of the JavaScript standard ECMAScript 6. The main plus of the language is that it does not reject JavaScript outright but rather tries to improve on the existing language while avoiding introducing too many new concepts for a JavaScript developer.

Dart

Dart, like CoffeeScript, is a separate language, although there are considerably more similarities with JavaScript syntaxwise. Sometimes it feels a bit like trying to bring some Java or C# into JavaScript: it has classes, generics, lists, maps, etc. Dart can be compiled into JavaScript or run directly on a Dart VM.

The Dart version of the Dijkstra’s algorithm:

library Graphs;

class Graph {
  num nodesNumber;
  List<List<num>> edges;
  
  Graph(num nodesNumber, List<List<num>> edges) {
      this.nodesNumber = nodesNumber;
      initEdges(edges);
  }

  void initEdges(List<List<num>> edges) {
      this.edges = new List<List<num>>();
      for (int i = 0; i < nodesNumber; i++) {
          List<num> row = new List<num>();

          for (int j = 0; j < nodesNumber; j++) {            
              row.add(null);
          }
          this.edges.add(row);
      }
      if (!edges.isEmpty) {
          edges.forEach((e) {
              edge(e[0], e[1], e[2]);
          });
      }
  }
  
  void edge(num from, num to, num weight) {
      edges[from - 1][to - 1] = weight;
  }
 
  Map _constructShortestPath(List<num> distances, List<num> previous, List<num> unvisited, num to) {
      num vertex = to;
      List<num> path = new List<num>();

      while (null != vertex) {
          path.add(vertex + 1);
          vertex = previous[vertex];
      };
      
      return {
         'path': path,
         'length': distances[to]
      };
  }
  
  num _getUnvisitedVertexWithShortestPath(List<num> distances, List<num> previous, List<num> unvisited) {
    num minimumDistance = 1/0;
    num vertex = null;
      
    unvisited.forEach((unvisitedVertex) {
      if (distances[unvisitedVertex] < minimumDistance) {
        vertex = unvisitedVertex;
        minimumDistance = distances[vertex];
      };
    });
    return vertex;
  }
  
  void _updateDistancesForCurrent(List<num> distances, List<num> previous, List<num> unvisited, num current) {  
    for (num i = 0; i < edges[current].length; i++) {
      num currentEdge = edges[current][i];
      
      if ((null != currentEdge) &amp;&amp; unvisited.contains(i)) {
        if (distances[current] + currentEdge < distances[i]) {
          distances[i] = distances[current] + currentEdge;
          previous[i] = current;
        };
      };
    };
  }
  
  //Dijkstra algorithm http://en.wikipedia.org/wiki/Dijkstra's_algorithm
  Map getShortestPath(num from, num to) {  
      List<num> unvisited = new List<num>();
      num current = null;
      List<num> distances = new List<num>();
      List<num> previous = new List<num>();

      from = from - 1;    
      to = to - 1;
      //Initialization
      for (num i = 0; i < nodesNumber; i++) {
          unvisited.add(i);
          //Infinity
          distances.add(1/0);
          previous.add(null);
      };
      distances[from] = 0;
    
      while (true) {
          if (!unvisited.contains(to)) {
              return _constructShortestPath(distances, previous, unvisited, to);
          };
          current = _getUnvisitedVertexWithShortestPath(distances, previous, unvisited);
    
          //No path exists
          if ((null == current) || (1/0 == distances[current])) {
              return {
                  'path': [],
                  'length': 1/0
              };
          };
          this._updateDistancesForCurrent(distances, previous, unvisited, current);     
          unvisited.remove(current);
      };
  }
}

void main() {
    Graph graph = new Graph(8, [
        [1, 2, 5], [1, 3, 1], [1, 4, 3],
        [2, 3, 2], [2, 5, 2],
        [3, 4, 1], [3, 5, 8],
        [4, 6, 2],
        [5, 7, 1],
        [6, 5, 1]
    ]);
  
    Map shortestPath = graph.getShortestPath(1, 7);

    print("path = ");
    print(shortestPath['path'].join(","));
    print("length = ");
    print(shortestPath['length']);
  
    //No shortest path to the vertex '8'
    print(graph.getShortestPath(1, 8));
}

From what I read and heard about Dart it seems that its main goal is to enable development of faster web applications. Using raw JavaScript may often lead to suboptimal performance, when using Dart on the other hand, you leverage the power of its compiler that will likely produce more optimal JavaScript code than what you would write yourself or you leverage the engineering effort invested into the Dart VM.

It would be actually interesting to look at the benchmarks how Dart, TypeScript, CoffeeScript and JavaScript applications perform compared to each other but I have a strong suspicion that Dart code will be the fastest one.

Like in TypeScript, there is support for better structuring of your application with libraries, classes and types. The brevity of code is not a goal like in CoffeeScript, instead the language tries to achieve better speed and maintainability.

A bit about tools and editors. There is a special Dart editor which has code assistance and highlighting and makes development much more pleasant. In the future there can also appear plugins for popular IDEs like, Eclipse.

The greatest disadvantage of Dart is the need to learn a new language, however, the language is quite simple, has good online documentation and learning it is still faster than learning CoffeeScript.

Another impression I got from using the language and its library is that it is a bit raw, for example, some features that you would expect to find are sometimes missing: there are no varargs, print accepts only one argument, etc. But this is still much better than Java where you constantly see missing things.

Web Application Development

The main area where JavaScript is used is Web application development, both on the client side and on the server side. The considered languages are generic programming languages that can easily be used in this domain area combined with the existing libraries, please, refer to the documentation on the sites dedicated to a specific language.

ECMAScript 6

The recent version of standard for JavaScript includes support for classes and modules similar to what we considered in other JavaScript related languages here, which should make creating large complex applications easier. See, for example, the following article

Conclusions

  • There is a huge need for JavaScript language improvements due to the changed landscape of JavaScript development, as a result a number of languages has appeared
  • Main improvements come in the areas of maintainability for large complex projects (classes, modules, types), code brevity and performance
  • Introduction of classes, modules (libraries) and types in Dart and TypeScript allows to implement better IDE support
  • Considered languages are generic and can be used both on the client and server side
  • All considered languages can be compiled into JavaScript which plays here a role of a low level assembly language, although it is itself a high-level programming language
  • Some languages are still fast moving and changing and have only their first versions available for use (Dart, TypeScript)
  • All languages preserve functions as first-class citizens which seems to be a very powerful feature in JavaScript

And here is a small matrix that compares the languages against each other:

js_langs

Legend:

Better Structure – it is easy to structure programs into modules
Editor Friendly – it is possible to write editors with re-factoring and autocomplete
Better Speed – improvement in the performance of the code
Large Applications – complex applications with large teams involved are easier to develop
Brevity of Code – if the number of lines of code is reduced
Ease of Learning – whether it is easy to start programming in a language for a JavaScript developer
Easy Debugging – it is easy to find problems in code and debug

There are still quite a few languages that compile to JavaScript but were not covered here. Please, see the following list for more details in case you are interested. For example, there is ClojureScript: a Lisp dialect that can be compiled to JavaScript and many other interesting languages. Unfortunately, I did not yet have time to look at some of them and might as well have missed something that is worth attention.

Links

Eloquent JavaScript (book)
JavaScript: The World’s Most Misunderstood Programming Language
Dijkstra’s algorithm
CoffeeScript language
The Little Book on CoffeeScript
TypeScript language
Goto 2012: TypeScript Keynote
Dart language
Google I/O 2012: Dart – A Modern Web Language
ECMAScript 6
A Few New Things Coming To JavaScript

Producing Code for Machines and Humans

Contents

Code
Efficiency
Abstraction
Requirements for languages
Future

Code

Code (or “source code”) is a set of instructions that can be executed automatically by a machine. It is usually written in a programming language and stored in source files in the textual form. Before it is automatically executed it may be first transformed (compiled) into a format convenient for the target machine or it can be executed right away (interpreted).

Efficiency

Efficient code from the point of view of a machine is code that uses as few resources as possible to execute: memory, network bandwidth, processor time, battery, etc. When deployed inefficient code often costs extra money because it consumes unnecessary resources. Hence we should strive to write efficient code.

But costs associated with code are not only limited to consumed machine resources. Code needs to be maintained and changed throughout its lifetime by humans. And humans have a rather different set of requirements for code than machines. Efficient code for humans is the code that can be quickly understood and easily changed.

Problems of efficiency may arise when a program gets really large and complex with many concepts, rules and flows, because humans have a rather limited attention span, short-term memory, typing speed, etc. In order for a program to be maintainable at all it needs to be adapted to these limitations.

Abstraction

Another area in which humans are remarkably different than machines is the ability to abstract and categorize things and work simultaneously at different levels of abstraction. For example, when we drive a car we do not need to know how the engine works, everything we need is just a steering wheel, a dashboard with measurements, two pedals, etc. In other words, we only need to have an interface through which we can interact with a car. Likewise, when we see a traffic jam on a map we are not interested in how a car is actually driven by a driver, we are interested in how many cars are now standing in our way and do not allow us to get where we want to. That’s an example of abstraction: unnecessary details are stripped out and disregarded in order to get the whole picture in a given context to solve specific problems.

Machines are not really good at this type of thinking. It turns out to be relatively hard to automatically decide what to abstract away for a given task. Ultimately machines always deal with memory bits and electrical signals and only with those abstractions that we explicitly supply them with via the source code.

Requirements for languages

What implications do the outlined above human limitations and the extra ability for abstraction have for code?

To match human perception and the way of thinking code should also be written at a specific level of abstraction that is necessary for solving a particular problem. For example, in a web store application an engineer would like to work with such entities as “shopping cart”, to be able to do a “check out”, etc. The programming language should allow to build such abstractions in the code, otherwise the software will be really hard to maintain or it will be unmaintainable at all.

Building such abstractions means additional resources required from a machine to deal with them, and hence additional machine-related costs. Of course, these costs are now balanced by lowered human-related costs which can be quite high because it takes a lot of money to pay software engineers. And we would always like to lower the overall costs, not just the machine-related costs.

This leads to new requirements for programming languages, especially as the problems being solved and applications being developed get more and more complex and machines more powerful. Languages should allow for humans to build abstractions in code easily and in a manner efficient for a machine.

If we look at the history of programming languages then we see a clear trend: as the time goes the languages become more high-level and more powerful from the perspective of solving a problem fast, but less efficient from the perspective of a machine. First it was machine code, then Assembler, then high-level languages such as Algol, then C++ and object-oriented languages, then Java and garbage collection, then Ruby and domain-specific languages, and the trend still continues to the present moment.

Nowadays building better abstractions with a language means building domain-specific languages better. Domain-specific language is just a small language built to solve a particular problem. In the case with a web store this may be a language that has such notions as “payment method”, “order”, “discount”, etc. which allow you to easily build rules and flows involving them. By they way, the main advantage of Ruby, while it may be slow to execute and more resource-consuming, is the ease of creating domain-specific languages. One such language, Ruby on Rails was very successful precisely because it allowed to build web applications faster and easier than its lower-level competitors.

Future

Now and in the foreseeable future in order to solve some relatively complex problem first a domain-specific language for solving this problem should be established in which this problem is easy to address. Hence the trend is to more machine-inefficient but more abstract and high level applications, ones that are built using domain-specific languages.

And now here are some of the yet not fully answered questions:

  • What are most appropriate building block for abstractions, objects or functions? This is where the whole debate of object-oriented programming vs. functional programming comes in.
  • Is it possible to build self-evolving domain-specific languages and automatically generate domain-specific languages like Ruby on Rails for a specific problem? Various computer-aided software engineering tools and UML tried to solve this problem but none of them managed to completely do it. And a lot of human participation is still usually needed. Moreover, sometimes generated code is really hard or impossible to support for humans.
  • Next machine performance improvements will likely come from the increase in the number of parallel processors and distributed computations. How to best abstract away for humans the complex details underneath such computations? Clearly most of the present-day programming languages were not developed for such an abstraction. This makes programming using them in the future potentially very complex and unproductive. As one such example Java concurrency API is really difficult to use and easy to misuse. There are attempts to address this problem, for example, take a look at the Clojure programming language.
  • What other formats for storing and viewing code can be there? Is it really best to store it as text files? It is hard to make purely visual languages as they often do not easily allow the desired level of customization and flexibility in solving a problem. Normally we would also sometimes access the text representation of the code being generated from the visual representation.

All the above being said, it really comes down to this: we just have to remember that the code is written both for machines and humans, and they have different requirements for efficiency which can be in conflict with each other. We have to balance these requirements: the right level of abstraction for humans and the hardware resource consumption for machines.

Links

Interpreted languages
Compiled languages
Interface
Machine code
Assembler
ALGOL
C++
Java
Ruby
Ruby on Rails
UML
Computer-Aided Software Engineering
Are we there yet? (Rich Hickey)
Clojure programming language
Human-computer interaction

Measuring Language Popularity is Harder Than Many Think

Contents

What languages are most popular?
Measurement method
Technical details
Comparing with other measurements
No comprehensive research anywhere

What languages are most popular?

Obviously it is interesting to know which programming languages are most popular. If some language is popular, this means that you will have plenty of resources such as books and articles to learn it, also, probably, good community support and you will be able to find a lot of ready libraries and tools for that language as well just because so many other developers are already using it. Also it may be interesting to see how our most popular programming language, which we all tend to have once in a while, scores relative to other languages.

Measurement method

But how to measure the language popularity on the Web? The first thing that came to mind was to use different search engines and compare numbers of results for different languages. This seemed like the simplest and the most obvious thing to do. But not so fast! Unfortunately, it turns out, that search counts returned by most popular search engines are just rough estimates of the number of the results they think they should give to you based on your search query, not all the possible results. More details are explained in this article. In other words, search engines are doing well what they are supposed to do: context based search. Nobody designed them to compute exact aggregations over huge amounts of data and they do not usually do this well.

What other options can be there? For once, we can select a few sites with job postings, such as monster.com, or with software development articles and presentations like infoq.com, various forums for software engineers, etc. On these sites we can search for certain programming languages, and by comparing the results we can estimate the relative popularity of the languages.

However, searching just one such resource may not be enough, for example, Java developers can really like one site and Ruby developers at the same time can like completely another site. As later we will see this is actually the case with github.com, which is really popular with JavaScript developers and with stackoverflow.com, which has a large number of C# related questions. But at least we can try to search one of such sites and compare the results with the data we already have from other sources to be more sure in our measurements.

I chose stackoverflow.com as it is a really good and popular site with questions and answers on every software development topic you can think about.

Technical details

So, now I will take the list of all the programming languages from somewhere and search for them on stackoverflow.com. Let’s take, for example, the list of all the languages that are used on github.com. Then we would have to search for each language and write down the number of search results for each one of them. But since it is a really boring and time consuming task and computers were invented a long time ago, let’s write a simple script that will help us do the mundane work and execute around 90 searches. By automating a bit we will also have more confidence in the results as manually doing something is usually more error-prone.

For automation we will use a headless WebKit browser PhantomJS and will generate an HTML report right from our PhantomJS script. The result will be just a simple bar chart rendered with Google Chart Tools.

The complete version of the code for the script is available here.

Some of the code highlights from the key parts of the script are listed below.

Getting the list of all the languages from github.com


function getAllGithubLanguages(callback) {
    page.open("https://github.com/languages", function (status) {
        var allLanguages = page.evaluate(function() {
            var links = document.querySelectorAll(".all_languages a");
            return Array.prototype.slice.call(links, 0).map(function(link) {
                return link.innerHTML;
            });
        });
        callback(allLanguages);
    });
};

By the way, notice how easy it is to work with DOM in JavaScript: all the API specifically adapted for this is already there, so PhantomJS allows us to use querySelectorAll and CSS selectors.

Getting the number of results once they are displayed in the browser.


function getSummaryCount() {
    var resultStats = document.querySelector("div.summarycount"),                   
        regex = /[\d,.]+/g,
        resultsNumber = -1;
                
    if (resultStats) {
        resultsNumber = regex.exec(resultStats.innerHTML)[0];
        resultsNumber = resultsNumber.replace(/[,\.]/g, "");
    };
    return parseInt(resultsNumber);
};

Searching for each language with two URLs in case the first URL produces no results.


function openResultsURL(url, callback) {
    page.open(url, function (status) {                 
        callback(page.evaluate(getSummaryCount));
    });    
};

function search(term, callback) {
    var urls = [
        "http://stackoverflow.com/search?q=" + encodeURIComponent(term),
        "http://stackoverflow.com/tags/" + encodeURIComponent(term)
    ];

    openResultsURL(urls[0], function(resultsCount) {
        if (resultsCount > 0) {
            callback(term, resultsCount);
        } else {
            openResultsURL(urls[1], function(resultsCount) {
                callback(term, resultsCount);
            });
        }
    });
};

Also you may notice how we pass callbacks everywhere. This may seem a bit strange at first if you have not programmed in JavaScript a lot, but this is actually the most common style of programming in JavaScript both on the client and the server side. Here PhantomJS encourages asynchronous programming as well because interaction with the browser is also asynchronous. Each callback is executed once the results are ready at an appropriate point in time. This provides for a more declarative style of programming too.

The entry point into the script, collecting all the search results and saving a generated report.


function saveReport(html) {
    fs.write("top_languages_report.html", html, "w");
};

getAllGithubLanguages(function (languages) {
    var statistics = {},
        int;
    
    languages = Array.prototype.slice.call(languages, 0);
    console.log("Number of languages = ", languages.length);
    int = setInterval(function waitForStatistics() {
        if (0 == activeSearches) {
            if (languages.length > 0) {
                activeSearches++;                
                search(languages.shift(), function (term, count) {
                    console.log(term + " found " + count + " times");               
                    statistics[term] = count;
                    activeSearches--;
                });
            } else {
                console.log("Finished all searches!");
                clearInterval(int);
                saveReport(generateReport(statistics));
                phantom.exit();
            };
        };
    }, TIMEOUT);
});

The code for generating reports that is also in the script is a bit awkward, largely due to the lack of a standard JavaScript library for working efficiently with data structures. It takes quite a bit of effort to transform the results to the format needed for rendering the chart, but the code for doing this is not really what this script is all about, it is just some utility boiler plate that unfortunately cannot be avoided here.So let’s just omit this part.

And, voilà the final chart with the top 30 most popular on stackoverflow.com programming languages.

However, we cannot reach any conclusions based just on the results from one site. Hardly C# is the most popular language, this must be a stackoverflow.com thing.

We can go one step further and search some other sites like Amazon.com but will it give us any more confidence? Let’s stop at this point and compare our results with the results of other similar researches obtained by using slightly different methods.

Comparing with other results

So the top ten languages we got are: C#, Java, PHP, JavaScript, C++, Python, Objective-C, C, Ruby, and Perl.

First, let’s look at TIOBE which uses search engine result counts and we discussed above why it may be not the best idea. It is basically the same 10 languages, but instead of JavaScript it is Visual Basic. Well, maybe Visual Basic has a very lively online community? I doubt it somehow, probably, it is just a lot of Visual Basic books and articles that make it so popular in this index, but everything is possible.

OK, what about the number of projects in different languages at github.com? The following statistics is provided on the site. Also very close to the list that we obtained but instead of C# there is Shell which, probably, can be explained by a lot of people with Linux background who use github.com. It also seems that C# developers do not favor github.com for some reason.

I would say we have a good correlation between the results for the top languages. Still I will be very careful with saying how much the top languages are popular relative to each other since different sources yielded very different results. But, at least, we get the following 12 most popular at the moment programming languages:

C#, Java, PHP, JavaScript, C++, Python, Objective-C, C, Ruby, Perl, Shell, Visual Basic

No comprehensive research anywhere

The problem with measuring the popularity of programming languages seems to be more complex than we initially thought. Developers of a specific language tend to be concentrated around a few sites and different sites give very different results like stackoverflow.com and github.com in the present article. In a way the problem is similar to measuring the popularity of human languages by randomly going to different large world cities. After visiting a couple dozen of cities we may start to get an idea which human languages are popular, but we will have hard time measuring the relative popularity of these languages, and especially hard time with less popular languages: we may just not visit a single large city where those languages are spoken. So, in order to make a good research we should statistically compare results from many different cities (sites), and know the structure of the world (Web) well. Unfortunately, I could not find any such research on the Web and doing it myself requires much more effort than just a weekend project. But even then, is the language popularity only limited to its online popularity?

Links

Git-based collaboration in the cloud github.com
Software development Q&A stackoverflow.com
PhantomJS
Google Chart Tools
Google Chart Tools: Bar Chart
Count results
Github.com top languages
TIOBE programming community index

Smarter Search Engines

Having written a number of simple scripts that aggregate information that can be found on various Internet sites I have a feeling that I am doing some repetitive, useless and low-level work. It is like writing a classic “Hello, world!” program in machine codes. Clearly there is something missing here, something that could have helped […]

Software Development is Just Translation

Writing software is essentially translating descriptions of what a system should do into an exact specification a machine can understand. Descriptions usually come in a human language and are written in terms specific to a problem area. Exact specifications are usually source files that then will be automatically translated into machine codes by compilers.

Translation from one lanuage into another is at the core of software development. What are some of the implications of this view point?

  • Software Engineers are people who translate technical requirements into source code which is essentially just a technical blue print for a machine to execute
  •  Product Managers are persons who gather and systematize the descriptions of a system and can help translate them into technical requirements
  • Bug is a misunderstanding, either a developer did not get the blue print for the machine right from the technical requirements or the technical requirements were wrong, contradictory or incomplete
  •  Software Testers are people who make sure that no misunderstanding occurs during the translation, that is, the system does what it is asked to do and in an expected manner
  • To do software development better means faster and cheaper translation and fewer misunderstandings along the way, that is overall communication should be better and occur at appropriate times, hence various Agile processes make perfect sense, since a lot of communication is between humans
  • In order to translate better and faster the system blue prints should reflect the problem domain, i.e. if we have the notion of ‘shipment’ in the problem domain language we better have something corresponding to it in the system blue prints, say, a class ‘Shipment’, hence domain-driven design is a good idea
  • The higher the level of abstraction in the system blue prints software developers operate with when they are translating, the more productive they are, that is why it is important to have a DSL (domain specific language) emerge over time as the system is developed, and this is also why Ruby and specific problem oriented languages such as BPMN are so popular as they allow for more effective development by either providing effective means for constructing DSLs or making such DSLs available out of the box