JavaScript and Friends: CoffeeScript, Dart and TypeScript

Contents

Why JavaScript Isn’t Enough?
Example JavaScript Program: Dijkstra’s Algorithm
CoffeeScript
TypeScript
Dart
Web Application Development
ECMAScript 6
Conclusions

Why JavaScript Isn’t Enough?

This article assumes that the reader has a good knowledge of JavaScript and has done at least some development in it, but if this is not about you, you can just first refer to one of the beginner’s JavaScript books like Eloquent JavaScript.

JavaScript is an amazing, often underappreciated and misunderstood language. It has some really powerful concepts like functions as first-class citizens (see, for example, JavaScript: The World’s Most Misunderstood Programming Language), flexible prototypal inheritance and is a powerful generic programming language that can be used not only in browsers.

Despite all its power and flexibility the language has some well-known design shortcomings such as global variables, cumbersome emulation of lexical scoping, non-intuitive implicit conversions, etc. In fact, there are parts of the language that you better avoid using at all, as it is advised in JavaScript: The Good Parts. Let us also note that from the beginning JavaScript was not specifically designed for developing applications with large code bases and many developers involved.

Nonetheless, it is increasingly used for developing precisely such applications. In such cases in order for the code to be maintainable it should have a well-defined structure and developers should adhere to a strict coding discipline in order to avoid producing a festering pile of messy code where everything depends on everything and no boundaries can be found between modules. Unfortunately, in JavaScript it is all too easy to transform your code base into an abomination, this is somewhat similar to Ruby, but actually even worse, since unlike in Ruby there is no standard mechanism for emulating class-based inheritance and separating your code into modules and packages. Having no types specified in the source code does not help either.

The point is that for large applications JavaScript developers need to be extra careful and disciplined as they are not particularly restricted from producing a horrible pile of code. The most recent trend is exactly towards complex applications being written in JavaScript. For example, for the client side it was well understood for quite some time that moving a lot of presentational logic to the browser makes much more sense than keeping it away from the client on the server. The clients have become thicker and more complex, MVC frameworks for the client side appeared, one of them Backbone.js, etc. Of course, JavaScript is not limited to just the client side and there appeared large server-side applications as well, see Node.js It looks like now there is time for the language to be updated to be better fit for the new challenges and cases that were likely just not foreseen by its creators from the very beginning.

There are quite a few changes coming to JavaScript in the recent versions of the language standard, for example, see ECMAScript 6, but also simultaneously a number of languages started to appear near JavaScript that try to address the described issues. The present article is just a brief overview of the most well-known of these language and a discussion how they relate to each other and JavaScript. This is not a thorough research of all of the mentioned languages, but rather an attempt to get a feeling of what those languages are and why we should care about them.

Example JavaScript Program: Dijkstra’s Algorithm

For our discussion and comparison to be more tangible and less abstract let’s consider some program in JavaScript and see to what it translates in the selected languages while highlighting the most interesting parts and differences between those languages.

As an example, let’s take an implementation of Dijkstra’s algorithm for finding the shortest path between two nodes in a directed graph. For the purposes of the present article it is actually not important to understand how the algorithm works, but in case you are interested in the algorithm itself, you can just read the Wiki article.

So, here is the implementation in JavaScript:

Array.prototype.remove = Array.prototype.remove || function(element) {
	var index = this.indexOf(element);

	if (index >= 0) {
	    this.splice(index, 1);
	};
};

Array.prototype.contains = Array.prototype.contains || function(element) {
	return this.indexOf(element) >= 0;
};

var graphs = {};

(function(host) {
	function Graph(nodesNumber, edges) {
		this.nodesNumber = nodesNumber;
		this.initEdges(edges);
	};

	Graph.prototype.initEdges = function(edges) {
		var oThis = this,
			i = 0;

		this.edges = [];
		for (; i < this.nodesNumber; i++) {
			this.edges[i] = [];
		};		
		if (edges) {
			edges.forEach(function (edge) {
				oThis.edge(edge[0], edge[1], edge[2]);
			});
		};
	};

	Graph.prototype.edge = function(from, to, weight) {
		this.edges[from - 1][to - 1] = weight;
		return this;
	};
	
	Graph.prototype._constructShortestPath = function(distances, previous, unvisited, to) {
		var vertex = to,
		path = [];

		while (undefined != vertex) {
			path.unshift(vertex + 1);
			vertex = previous[vertex];
		};
			
		return {
			path: path,
			length: distances[to]
		};
	};

	Graph.prototype._getUnvisitedVertexWithShortestPath = function(distances, previous, unvisited) {
		var minimumDistance = Number.MAX_VALUE,
			vertex = null;
			
		unvisited.forEach(function (unvisitedVertex) {
			if (distances[unvisitedVertex] < minimumDistance) {
				vertex = unvisitedVertex;
				minimumDistance = distances[vertex];
			};
		});
		return vertex;
	};
	
	Graph.prototype._updateDistancesForCurrent = function(distances, previous, unvisited, current) {	
		for (var i = 0; i < this.edges[current].length; i++) {
			var currentEdge = this.edges[current][i];
			
			if ((undefined != currentEdge) &amp;&amp; unvisited.contains(i)) {
				if (distances[current] + currentEdge < distances[i]) {
					distances[i] = distances[current] + currentEdge;
					previous[i] = current;
				};
			};			
		};
	};

	//Dijkstra algorithm http://en.wikipedia.org/wiki/Dijkstra's_algorithm
	Graph.prototype.getShortestPath = function(from, to) {
		var unvisited = [],
		    current = null,
		    distances = [],
		    previous = [];

		from = from - 1;		
		to = to - 1;
		//Initialization
		for (var i = 0; i < this.nodesNumber; i++) {
			unvisited.push(i);
			//Infinity
			distances.push(Number.MAX_VALUE);
		};
		distances[from] = 0;
		
		while (true) {
			if (!unvisited.contains(to)) {
				return this._constructShortestPath(distances, previous, unvisited, to);
			};
			current = this._getUnvisitedVertexWithShortestPath(distances, previous, unvisited);
		
			//No path exists
			if ((null == current) || (Number.MAX_VALUE == distances[current])) {
				return {
		    		path: [],
		    		length: Number.MAX_VALUE
				};
			};
			this._updateDistancesForCurrent(distances, previous, unvisited, current);			
			unvisited.remove(current);
		};
	};

	host.Graph = Graph;
})(graphs);

var graph = new graphs.Graph(8, [
	[1, 2, 5], [1, 3, 1], [1, 4, 3],
	[2, 3, 2], [2, 5, 2],
	[3, 4, 1], [3, 5, 8],
	[4, 6, 2],
	[5, 7, 1],
	[6, 5, 1]
]);

var shortestPath = graph.getShortestPath(1, 7);

console.log("path = ", shortestPath.path.join(","));
console.log("length = ", shortestPath.length);

//No shortest path to the vertex '8'
console.log(graph.getShortestPath(1, 8));

This example demonstrates emulating classes with prototypes, extending the core language objects (Array), and working with data structures. All this should already be familiar to you, otherwise, please, refer to some JavaScript resource to quickly get up to speed with the language.

CoffeeScript

CoffeeScript addresses some of the JavaScript issues described above. It introduces classes, shortcuts for most common JavaScript boiler-plate code, such as @ for this and :: for prototype, saves on number of code lines, gets rid of curly braces and actively uses indentation to give structure to your program.

Here is the algorithm rewritten in CoffeeScript:

Array::remove = Array::remove || (element) ->
    index = @indexOf(element)
    @splice(index, 1) if index >= 0

graphs = {}

graphs.Graph = class Graph

    constructor: (@nodesNumber, edges) ->
        @initEdges(edges)

    initEdges: (edges) ->
        @edges = []
        @edges[i] = [] for i in [0..@nodesNumber]
        @edge edge... for edge in edges if edges
    
    edge: (from, to, weight) ->
        @edges[from - 1][to - 1] = weight

    _constructShortestPath = (distances, previous, unvisited, to) ->
        vertex = to
        path = []

        while vertex?
            path.unshift(vertex + 1);
            vertex = previous[vertex];

        path: path
        length: distances[to]
        
    _getUnvisitedVertexWithShortestPath = (distances, previous, unvisited) ->
        minimumDistance = Number.MAX_VALUE

        for unvisitedVertex in unvisited
            if (distances[unvisitedVertex] < minimumDistance)
                vertex = unvisitedVertex
                minimumDistance = distances[vertex]

        vertex

    _updateDistancesForCurrent: (distances, previous, unvisited, current) ->
        for edge, i in @edges[current]
            if ((undefined != edge) &amp;&amp; edge >= 0 &amp;&amp; i in unvisited &amp;&amp; (distances[current] + edge < distances[i]))
                distances[i] = distances[current] + edge
                previous[i] = current

    #Dijkstra algorithm http://en.wikipedia.org/wiki/Dijkstra's_algorithm
    getShortestPath: (from, to) ->
        unvisited = []
        current = null
        distances = []
        previous = []

        from = from - 1        
        to = to - 1

        #Initialization
        for i in [0..@nodesNumber]
            unvisited.push(i)
            #Infinity
            distances.push(Number.MAX_VALUE)

        distances[from] = 0
        
        while (true)
            if (not (to in unvisited))
                return _constructShortestPath(distances, previous, unvisited, to);

            current = _getUnvisitedVertexWithShortestPath(distances, previous, unvisited)
        
            #No path exists
            if ((null == current) || (undefined == current) || (Number.MAX_VALUE == distances[current]))
                return {
                    path: []
                    length: Number.MAX_VALUE
                }

            @_updateDistancesForCurrent(distances, previous, unvisited, current)            
            unvisited.remove(current)

        return

graph = new graphs.Graph(8, [
    [1, 2, 5], [1, 3, 1], [1, 4, 3],
    [2, 3, 2], [2, 5, 2],
    [3, 4, 1], [3, 5, 8],
    [4, 6, 2],
    [5, 7, 1],
    [6, 5, 1]
]);

shortestPath = graph.getShortestPath(1, 7)

console.log("path = ", shortestPath.path.join(","))
console.log("length = ", shortestPath.length)

#No shortest path to the vertex '8'
console.log(graph.getShortestPath(1, 8))

As it can be seen CoffeeScript is a completely different language that can be then (usually) compiled into JavaScript or executed outright. It is not a superset or subset of JavaScript and you cannot just freely mix JavaScript and CoffeeScript constructs in one program. There is still a mechanism for executing JavaScript embedded inside your CoffeeScript, although normally you would not use it.

The CoffeeScript version is somewhat shorter: the number of lines of the CoffeeScript code in our example is approximately 2/3 of the number of lines of the JavaScript version.

From my brief experience with it I can say that programming in CoffeeScript requires advanced JavaScript knowledge and you often have to think about what exact JavaScript code will be produced from your CoffeeScript code.

Tooling support also lags behind so if something goes wrong in your code you will have to debug the resulting JavaScript while simultaneously matching it with the original CoffeeScript source. The amount of time it took me to write the CoffeeScript version of the Dijkstra’s algorithm was actually the same or larger than what I spent on writing the JavaScript version, but then again, I am not an experienced CoffeeScript programmer and maybe as time goes, the speed of development will increase.

Apart from the lack of good development tools for CoffeeScript (at least I did not find any) the resulting code is a bit cryptic and terse, a bit like Bash. Overall, the language somewhat reminds of Ruby and Python.

The main plus of CoffeeScript is getting rid of the syntactic noise that can often be encountered in JavaScript, like Graph.prototype. or .forEach(function() {…}), and as a result you start to see the core logic of the program better. Also, it is nice that CoffeeScript introduces classes, it potentially gives large programs more structure in a more unified way than in JavaScript where everybody devises their own way to emulate class-based inheritance.

TypeScript

TypeScript is a superset of JavaScript enhanced with type annotations, classes and modules. It demonstrates a rather conservative approach to addressing the many issues of JavaScript compared to other languages in this article as TypeScript does not try to replace JavaScript, in fact, any valid JavaScript code is valid TypeScript code.

Here is the TypeScript version:

interface Array {
    remove(element: any): void;
    contains(element: any): bool;
}

Array.prototype.remove = Array.prototype.remove || function(element) {
    var index = this.indexOf(element);

    if (index >= 0) {
        this.splice(index, 1);
    };
};

Array.prototype.contains = Array.prototype.contains || function(element) {
    return this.indexOf(element) >= 0;
};

module graphs {
    export class Graph {
    
        edges: number[][];
    
        constructor(public nodesNumber: number, edges: number[][]) {
            this.initEdges(edges);
        }
        
        initEdges(edges: number[][]): void {
            var oThis = this,
            i = 0;

            this.edges = [];
            for (; i < this.nodesNumber; i++) {
                this.edges[i] = [];
            };        
            if (edges) {
                edges.forEach(function (edge) {
                    oThis.edge.apply(oThis, edge);
                });
            };
        }
        
        edge(from: number, to: number, weight: number): Graph {
            this.edges[from - 1][to - 1] = weight;
            return this;
        }

        //Dijkstra algorithm http://en.wikipedia.org/wiki/Dijkstra's_algorithm
        getShortestPath(from: number, to: number): {path: number[]; length: number;} {
            var unvisited = [],
                current = null,
                distances = [],
                previous = [];

            from = from - 1;        
            to = to - 1;
            //Initialization
            for (var i = 0; i < this.nodesNumber; i++) {
                unvisited.push(i);
                //Infinity
                distances.push(Number.MAX_VALUE);
            };
            distances[from] = 0;
        
            while (true) {
                if (!unvisited.contains(to)) {
                    return this._constructShortestPath(distances, previous, unvisited, to);
                };
                current = this._getUnvisitedVertexWithShortestPath(distances, previous, unvisited);
        
                //No path exists
                if ((null == current) || (Number.MAX_VALUE == distances[current])) {
                    return {
                        path: [],
                        length: Number.MAX_VALUE
                    };
                };
                this._updateDistancesForCurrent(distances, previous, unvisited, current);            
                unvisited.remove(current);
            };
        }

        private _constructShortestPath(distances: number[], previous: number[],
             unvisited: number[], to: number): { path: number[]; length: number; } {
            var vertex = to,
            path = [];

            while (undefined != vertex) {
                path.unshift(vertex + 1);
                vertex = previous[vertex];
            };
            
            return {
                path: path,
                length: distances[to]
            };
        }

        private _getUnvisitedVertexWithShortestPath(distances: number[], previous: number[], unvisited: number[]): number {
            var minimumDistance = Number.MAX_VALUE,
                vertex = null;
            
            unvisited.forEach(function (unvisitedVertex) {
                if (distances[unvisitedVertex] < minimumDistance) {
                    vertex = unvisitedVertex;
                    minimumDistance = distances[vertex];
                };
            });
            return vertex;
        }

        private _updateDistancesForCurrent(distances: number[], previous: number[], unvisited: number[], current: number): void {    
            for (var i = 0; i < this.edges[current].length; i++) {
                var currentEdge = this.edges[current][i];
            
                if ((undefined != currentEdge) &amp;&amp; unvisited.contains(i)) {
                    if (distances[current] + currentEdge < distances[i]) {
                        distances[i] = distances[current] + currentEdge;
                        previous[i] = current;
                    };
                };            
            };
        }
    }
}

var graph = new graphs.Graph(8, [
    [1, 2, 5], [1, 3, 1], [1, 4, 3],
    [2, 3, 2], [2, 5, 2],
    [3, 4, 1], [3, 5, 8],
    [4, 6, 2],
    [5, 7, 1],
    [6, 5, 1]
]);

var shortestPath = graph.getShortestPath(1, 7);

console.log("path = ", shortestPath.path.join(","));
console.log("length = ", shortestPath.length);

//No shortest path to the vertex '8'
console.log(graph.getShortestPath(1, 8));

The main focus of the language is not minimizing the number of lines of the resulting code as in CoffeeScript, but rather making JavaScript friendlier for external tools, static analysis and large project development. As it can be seen, the number of lines is roughly the same as in the JavaScript version.

Converting the existing JavaScript code into more idiomatic TypeScript code is very fast and simple, and can be done seamlessly and gradually for an existing code base, which is a huge plus when deciding to migrate an existing JavaScript project to TypeScript.

TypeScript is also compiled to JavaScript with the command tsc just like the CoffeeScript code in the previous section. However, unlike CoffeeScript, TypeScript has a good tool support in the form of a Visual Studio editor and potentially plugins for other IDEs. For example, the latest version of WebStorm will include support for TypeScript. Developing in TypeScript feels a lot like developing in JavaScript but you feel yourself a bit safer thanks to the type annotations.

The TypeScript language looks very promising. Some of its features have analogs in the recent version of the JavaScript standard ECMAScript 6. The main plus of the language is that it does not reject JavaScript outright but rather tries to improve on the existing language while avoiding introducing too many new concepts for a JavaScript developer.

Dart

Dart, like CoffeeScript, is a separate language, although there are considerably more similarities with JavaScript syntaxwise. Sometimes it feels a bit like trying to bring some Java or C# into JavaScript: it has classes, generics, lists, maps, etc. Dart can be compiled into JavaScript or run directly on a Dart VM.

The Dart version of the Dijkstra’s algorithm:

library Graphs;

class Graph {
  num nodesNumber;
  List<List<num>> edges;
  
  Graph(num nodesNumber, List<List<num>> edges) {
      this.nodesNumber = nodesNumber;
      initEdges(edges);
  }

  void initEdges(List<List<num>> edges) {
      this.edges = new List<List<num>>();
      for (int i = 0; i < nodesNumber; i++) {
          List<num> row = new List<num>();

          for (int j = 0; j < nodesNumber; j++) {            
              row.add(null);
          }
          this.edges.add(row);
      }
      if (!edges.isEmpty) {
          edges.forEach((e) {
              edge(e[0], e[1], e[2]);
          });
      }
  }
  
  void edge(num from, num to, num weight) {
      edges[from - 1][to - 1] = weight;
  }
 
  Map _constructShortestPath(List<num> distances, List<num> previous, List<num> unvisited, num to) {
      num vertex = to;
      List<num> path = new List<num>();

      while (null != vertex) {
          path.add(vertex + 1);
          vertex = previous[vertex];
      };
      
      return {
         'path': path,
         'length': distances[to]
      };
  }
  
  num _getUnvisitedVertexWithShortestPath(List<num> distances, List<num> previous, List<num> unvisited) {
    num minimumDistance = 1/0;
    num vertex = null;
      
    unvisited.forEach((unvisitedVertex) {
      if (distances[unvisitedVertex] < minimumDistance) {
        vertex = unvisitedVertex;
        minimumDistance = distances[vertex];
      };
    });
    return vertex;
  }
  
  void _updateDistancesForCurrent(List<num> distances, List<num> previous, List<num> unvisited, num current) {  
    for (num i = 0; i < edges[current].length; i++) {
      num currentEdge = edges[current][i];
      
      if ((null != currentEdge) &amp;&amp; unvisited.contains(i)) {
        if (distances[current] + currentEdge < distances[i]) {
          distances[i] = distances[current] + currentEdge;
          previous[i] = current;
        };
      };
    };
  }
  
  //Dijkstra algorithm http://en.wikipedia.org/wiki/Dijkstra's_algorithm
  Map getShortestPath(num from, num to) {  
      List<num> unvisited = new List<num>();
      num current = null;
      List<num> distances = new List<num>();
      List<num> previous = new List<num>();

      from = from - 1;    
      to = to - 1;
      //Initialization
      for (num i = 0; i < nodesNumber; i++) {
          unvisited.add(i);
          //Infinity
          distances.add(1/0);
          previous.add(null);
      };
      distances[from] = 0;
    
      while (true) {
          if (!unvisited.contains(to)) {
              return _constructShortestPath(distances, previous, unvisited, to);
          };
          current = _getUnvisitedVertexWithShortestPath(distances, previous, unvisited);
    
          //No path exists
          if ((null == current) || (1/0 == distances[current])) {
              return {
                  'path': [],
                  'length': 1/0
              };
          };
          this._updateDistancesForCurrent(distances, previous, unvisited, current);     
          unvisited.remove(current);
      };
  }
}

void main() {
    Graph graph = new Graph(8, [
        [1, 2, 5], [1, 3, 1], [1, 4, 3],
        [2, 3, 2], [2, 5, 2],
        [3, 4, 1], [3, 5, 8],
        [4, 6, 2],
        [5, 7, 1],
        [6, 5, 1]
    ]);
  
    Map shortestPath = graph.getShortestPath(1, 7);

    print("path = ");
    print(shortestPath['path'].join(","));
    print("length = ");
    print(shortestPath['length']);
  
    //No shortest path to the vertex '8'
    print(graph.getShortestPath(1, 8));
}

From what I read and heard about Dart it seems that its main goal is to enable development of faster web applications. Using raw JavaScript may often lead to suboptimal performance, when using Dart on the other hand, you leverage the power of its compiler that will likely produce more optimal JavaScript code than what you would write yourself or you leverage the engineering effort invested into the Dart VM.

It would be actually interesting to look at the benchmarks how Dart, TypeScript, CoffeeScript and JavaScript applications perform compared to each other but I have a strong suspicion that Dart code will be the fastest one.

Like in TypeScript, there is support for better structuring of your application with libraries, classes and types. The brevity of code is not a goal like in CoffeeScript, instead the language tries to achieve better speed and maintainability.

A bit about tools and editors. There is a special Dart editor which has code assistance and highlighting and makes development much more pleasant. In the future there can also appear plugins for popular IDEs like, Eclipse.

The greatest disadvantage of Dart is the need to learn a new language, however, the language is quite simple, has good online documentation and learning it is still faster than learning CoffeeScript.

Another impression I got from using the language and its library is that it is a bit raw, for example, some features that you would expect to find are sometimes missing: there are no varargs, print accepts only one argument, etc. But this is still much better than Java where you constantly see missing things.

Web Application Development

The main area where JavaScript is used is Web application development, both on the client side and on the server side. The considered languages are generic programming languages that can easily be used in this domain area combined with the existing libraries, please, refer to the documentation on the sites dedicated to a specific language.

ECMAScript 6

The recent version of standard for JavaScript includes support for classes and modules similar to what we considered in other JavaScript related languages here, which should make creating large complex applications easier. See, for example, the following article

Conclusions

  • There is a huge need for JavaScript language improvements due to the changed landscape of JavaScript development, as a result a number of languages has appeared
  • Main improvements come in the areas of maintainability for large complex projects (classes, modules, types), code brevity and performance
  • Introduction of classes, modules (libraries) and types in Dart and TypeScript allows to implement better IDE support
  • Considered languages are generic and can be used both on the client and server side
  • All considered languages can be compiled into JavaScript which plays here a role of a low level assembly language, although it is itself a high-level programming language
  • Some languages are still fast moving and changing and have only their first versions available for use (Dart, TypeScript)
  • All languages preserve functions as first-class citizens which seems to be a very powerful feature in JavaScript

And here is a small matrix that compares the languages against each other:

js_langs

Legend:

Better Structure – it is easy to structure programs into modules
Editor Friendly – it is possible to write editors with re-factoring and autocomplete
Better Speed – improvement in the performance of the code
Large Applications – complex applications with large teams involved are easier to develop
Brevity of Code – if the number of lines of code is reduced
Ease of Learning – whether it is easy to start programming in a language for a JavaScript developer
Easy Debugging – it is easy to find problems in code and debug

There are still quite a few languages that compile to JavaScript but were not covered here. Please, see the following list for more details in case you are interested. For example, there is ClojureScript: a Lisp dialect that can be compiled to JavaScript and many other interesting languages. Unfortunately, I did not yet have time to look at some of them and might as well have missed something that is worth attention.

Links

Eloquent JavaScript (book)
JavaScript: The World’s Most Misunderstood Programming Language
Dijkstra’s algorithm
CoffeeScript language
The Little Book on CoffeeScript
TypeScript language
Goto 2012: TypeScript Keynote
Dart language
Google I/O 2012: Dart – A Modern Web Language
ECMAScript 6
A Few New Things Coming To JavaScript

Advertisements

A Gentle Introduction to Routine Task Automation with Ruby

Contents

Learn You a Bit of Ruby
Simple Routine
Automation
Alternatives to Ruby
More Examples of Automation
Studying Ruby Further

Learn You a Bit of Ruby

Even if you are not a software developer you absolutely have to sometime give Ruby a try. It is a nice language geared towards increasing programmer’s productivity and is a lot like a Swiss Army Knife of the world of programming: in many situations you will find that Ruby is the right tool for achieving what you need.

I mainly use Ruby for rapid prototyping of new features, automation of routine administrative tasks and for test automation but Ruby is not limited to this. For example, a very popular area for Ruby is Web development, where the Ruby-on-Rails framework is widely used.

There are, of course, as it is the case with any language, areas for which Ruby and associated with it libraries and frameworks are not the best choice. Examples of this are performance-critical low level applications such as drivers or highly scalable powerful server-side architectures. We just have to keep in mind that not just Ruby but any tool is best used in a particular context and has its natural limitations and imperfections as well as advantages.

Learning Ruby is quite pleasant and surprisingly easy and you will quickly be able to achieve in 10 lines of code what seasoned Java developers cann’t do in 100 lines. Chances are that Ruby will also give you a better understanding of programming languages and algorithms in general.

Simple Routine

My first experience with Ruby was with automating tasks that I had to routinely perform on my machine. Let’s now consider one such simple task:

On my computer I have quite a few directories with photos but they are a bit unorganized. For each event (like “New Year Celebration”) I have a directory with photos which in turn can contain subdirectories with photos that can contain other subdirectories with photos, etc. Mainly this is because I make pictures from different devices but also sometimes I select a few photos and put them in a separate subdirectory so that I can send them to a friend.

Now I would like to organize these photos in such a way that the directory structure is flattened: only the top level directories corresponding to events are left, all the photos from the nested subdirectories are copied to these top level directories and then the nested subdirectories themselves are removed.

So, how do we go about such a routine task? Of course, we can do all the actions manually and perform roughly the same routine procedure involving moving files and removing directories for 30 or so top level event directories. However, this will be extremely boring, tiresome, unproductive, error-prone and wrong. Such routines should not be left to humans, they should be automated and done by a machine. Let’s use the Swiss Army Knife – Ruby to automate this.

Automation

Some familiarity with Ruby would not be excessive here, but it’s OK if you don’t understand everything in the code listings as I will try to explain the code as we go. However, if you are new to programming in general or to Ruby, feel free to follow the following nice 20 minutes Ruby tutorial. At this point you can also try to install Ruby and familiarize yourself with how to create and run simple scripts if you want to follow along and play with the provided code a bit.

All begins fairly straightforward. We import the libraries that we will need for our simple script.

require 'find'
require 'fileutils'
require 'pathname'

Then we write a procedure that will perform some action passed as a block for each first level subdirectory in a given directory:

def for_each_child_directory(root_dir)
  return if !block_given?
  old_dir = Dir.pwd
  Dir.chdir root_dir
  Dir.glob("*") do |file_path|
    if File.directory?(file_path)
      yield file_path
    end
  end
  Dir.chdir old_dir
end

This procedure can now be called as follows:

  for_each_child_directory("/home/anton") do |dir|
    puts "Found first level directory #{dir}"
  end

where

  do |dir|
    puts "Found first level directory #{dir}"
  end

is a block of code that is passed to the method for_each_child_directory in addition to the explicit parameter root_dir. You can think of this block as a special kind of hidden parameter which contains something that can be executed and to which we can pass parameters of its own with the following construct:

  yield file_path

Now, let’s implement other procedures.

def move_all_new_files_from_subdirectories_to(dir)
  Find.find(dir) do |file_path|
    already_exists = File.exist?(File.join(dir, Pathname.new(file_path).basename))
    if !File.directory?(file_path) && !already_exists
      FileUtils.mv(file_path, dir) 
    end
  end
end

This procedure moves all the files from the subdirectories to the root directory which is specified as the parameter dir. For each file in a subdirectory we move it to the root directory only if it is not a directory and it does not already exist there.

Once the files have been moved to the root we can remove the empty subdirectories of the root directory dir:

def remove_all_subdirectories_in(dir)
  for_each_child_directory(dir) do |dir|
    FileUtils.rm_rf dir
  end
end

And now we bind the procedures that we defined so far together and make sure that we can call the script from the console:

def flatten_directory(dir)
  move_all_new_files_from_subdirectories_to(dir)
  remove_all_subdirectories_in(dir)
end

def flatten_child_directories(root_dir)
  for_each_child_directory(root_dir) do |dir|
    flatten_directory dir
  end
end

flatten_child_directories(ARGV[0] || ".")

ARGV[0] is the first argument that we pass to the script from the console when we call it like this:

ruby flatten_directories.rb “/home/anton/photos”

The whole script can be found here

Alternatives to Ruby

There are, of course, a number of alternatives for automating the task above or similar tasks. For example, we could have used Bash for Linux or PowerShell for Windows. The advantage of such tools is that they are native to the platform for which you implement automation. However, their huge disadvantage is poor scalability (it is hard to write large scripts) and maintainability. Bash has a very terse and cryptic syntax that is hard to learn and use unless you write Bash scripts every day. Another thing is that what you learn about Bash is specific for Bash and you cannot really reuse it anywhere else outside of Bash scripting. So you invest quite a lot of time into learning a programming language that has a very limited usage: administrative scripting on a specific platform. Often the Bash scripts written some time ago are hard to read and understand, especially this is true for large scripts (>= 100 lines of code). My advice would be to use Bash only for really simple scripts and do not allow your scripts grow to more than 10-15 lines. Otherwise Ruby (often with some Bash or PowerShell commands mixed into the script) will do a much better job.

Let’s list the main advantages of using Ruby to automate routine tasks:

  • Availability of many good libraries for easy networking and file system access

    The used ‘fileutils’ library provides a good example of this.

  • Minimum of dependency on the underlying platform

    The written script is cross-platform, it can be executed both on Windows and Linux. In most scripts, however, there may be calls specific for the underlying operating system, for example:

    def download(file_URL, file_name)
      system("wget -c \"#{file_URL}\" -O \"#{file_name}\"")
    end
    

    Here we are using the wget utility to download a file. Unless there is a wget available the script will not run. So the cross-platformness has its limits.

  • Seamless integration with the native commands

    It is possible to resort to Bash in a Ruby script by using the methods system or IO.popen. However, it is much nicer to glue the low level commands with Ruby rather than with pure Bash.

  • Ruby allows to write procedural style simple scripts, but you can also introduce objects and additional structure to your scripts as you go.

    The script above is written in the procedural style in which Ruby allows you to program, but should you find yourself in a need to write a larger script you can easily use objects and modules as in any Ruby program and this will give you more structure and make it easier to maintain and understand the script.

More Examples of Automation

Here are a few other examples of automation:

  • CD grabber

    Grabs CD tracks into MP3 with an option to fetch information about tracks from Amazon.com or from the local file system.

  • Apartment watcher

    Checks the availability of apartments and notifies via e-mail when a new apartment with the specified search options becomes available.

  • Podcast client

    Downloads podcasts from a given URL to the local machine.

Studying Ruby Further

You will not need to know advanced Ruby to write simple scripts like the one in the present article and Everyday Scripting with Ruby: for Teams, Testers, and You will provide a nice start if you are not a programmer.

But if you are a novice Ruby programmer and would like to study Ruby more like a programming language from a perspective of a developer you can read Beginning Ruby.

Ruby is a generic programming language well-suited for a number of tasks. One of them is automation. So, next time when you are faced with some routine administrative task on your machine why don’t you learn a bit of Ruby and give it a try to help you automate it?

Links

Ruby language
Ruby in 20 minutes (tutorial)
Everyday Scripting with Ruby: for Teams, Testers, and You (book)
Beginning Ruby (book)
Pro Bash Programming: Scripting the Linux Shell (book)
Windows PowerShell
Ruby-on-Rails

Producing Code for Machines and Humans

Contents

Code
Efficiency
Abstraction
Requirements for languages
Future

Code

Code (or “source code”) is a set of instructions that can be executed automatically by a machine. It is usually written in a programming language and stored in source files in the textual form. Before it is automatically executed it may be first transformed (compiled) into a format convenient for the target machine or it can be executed right away (interpreted).

Efficiency

Efficient code from the point of view of a machine is code that uses as few resources as possible to execute: memory, network bandwidth, processor time, battery, etc. When deployed inefficient code often costs extra money because it consumes unnecessary resources. Hence we should strive to write efficient code.

But costs associated with code are not only limited to consumed machine resources. Code needs to be maintained and changed throughout its lifetime by humans. And humans have a rather different set of requirements for code than machines. Efficient code for humans is the code that can be quickly understood and easily changed.

Problems of efficiency may arise when a program gets really large and complex with many concepts, rules and flows, because humans have a rather limited attention span, short-term memory, typing speed, etc. In order for a program to be maintainable at all it needs to be adapted to these limitations.

Abstraction

Another area in which humans are remarkably different than machines is the ability to abstract and categorize things and work simultaneously at different levels of abstraction. For example, when we drive a car we do not need to know how the engine works, everything we need is just a steering wheel, a dashboard with measurements, two pedals, etc. In other words, we only need to have an interface through which we can interact with a car. Likewise, when we see a traffic jam on a map we are not interested in how a car is actually driven by a driver, we are interested in how many cars are now standing in our way and do not allow us to get where we want to. That’s an example of abstraction: unnecessary details are stripped out and disregarded in order to get the whole picture in a given context to solve specific problems.

Machines are not really good at this type of thinking. It turns out to be relatively hard to automatically decide what to abstract away for a given task. Ultimately machines always deal with memory bits and electrical signals and only with those abstractions that we explicitly supply them with via the source code.

Requirements for languages

What implications do the outlined above human limitations and the extra ability for abstraction have for code?

To match human perception and the way of thinking code should also be written at a specific level of abstraction that is necessary for solving a particular problem. For example, in a web store application an engineer would like to work with such entities as “shopping cart”, to be able to do a “check out”, etc. The programming language should allow to build such abstractions in the code, otherwise the software will be really hard to maintain or it will be unmaintainable at all.

Building such abstractions means additional resources required from a machine to deal with them, and hence additional machine-related costs. Of course, these costs are now balanced by lowered human-related costs which can be quite high because it takes a lot of money to pay software engineers. And we would always like to lower the overall costs, not just the machine-related costs.

This leads to new requirements for programming languages, especially as the problems being solved and applications being developed get more and more complex and machines more powerful. Languages should allow for humans to build abstractions in code easily and in a manner efficient for a machine.

If we look at the history of programming languages then we see a clear trend: as the time goes the languages become more high-level and more powerful from the perspective of solving a problem fast, but less efficient from the perspective of a machine. First it was machine code, then Assembler, then high-level languages such as Algol, then C++ and object-oriented languages, then Java and garbage collection, then Ruby and domain-specific languages, and the trend still continues to the present moment.

Nowadays building better abstractions with a language means building domain-specific languages better. Domain-specific language is just a small language built to solve a particular problem. In the case with a web store this may be a language that has such notions as “payment method”, “order”, “discount”, etc. which allow you to easily build rules and flows involving them. By they way, the main advantage of Ruby, while it may be slow to execute and more resource-consuming, is the ease of creating domain-specific languages. One such language, Ruby on Rails was very successful precisely because it allowed to build web applications faster and easier than its lower-level competitors.

Future

Now and in the foreseeable future in order to solve some relatively complex problem first a domain-specific language for solving this problem should be established in which this problem is easy to address. Hence the trend is to more machine-inefficient but more abstract and high level applications, ones that are built using domain-specific languages.

And now here are some of the yet not fully answered questions:

  • What are most appropriate building block for abstractions, objects or functions? This is where the whole debate of object-oriented programming vs. functional programming comes in.
  • Is it possible to build self-evolving domain-specific languages and automatically generate domain-specific languages like Ruby on Rails for a specific problem? Various computer-aided software engineering tools and UML tried to solve this problem but none of them managed to completely do it. And a lot of human participation is still usually needed. Moreover, sometimes generated code is really hard or impossible to support for humans.
  • Next machine performance improvements will likely come from the increase in the number of parallel processors and distributed computations. How to best abstract away for humans the complex details underneath such computations? Clearly most of the present-day programming languages were not developed for such an abstraction. This makes programming using them in the future potentially very complex and unproductive. As one such example Java concurrency API is really difficult to use and easy to misuse. There are attempts to address this problem, for example, take a look at the Clojure programming language.
  • What other formats for storing and viewing code can be there? Is it really best to store it as text files? It is hard to make purely visual languages as they often do not easily allow the desired level of customization and flexibility in solving a problem. Normally we would also sometimes access the text representation of the code being generated from the visual representation.

All the above being said, it really comes down to this: we just have to remember that the code is written both for machines and humans, and they have different requirements for efficiency which can be in conflict with each other. We have to balance these requirements: the right level of abstraction for humans and the hardware resource consumption for machines.

Links

Interpreted languages
Compiled languages
Interface
Machine code
Assembler
ALGOL
C++
Java
Ruby
Ruby on Rails
UML
Computer-Aided Software Engineering
Are we there yet? (Rich Hickey)
Clojure programming language
Human-computer interaction

Measuring Language Popularity is Harder Than Many Think

Contents

What languages are most popular?
Measurement method
Technical details
Comparing with other measurements
No comprehensive research anywhere

What languages are most popular?

Obviously it is interesting to know which programming languages are most popular. If some language is popular, this means that you will have plenty of resources such as books and articles to learn it, also, probably, good community support and you will be able to find a lot of ready libraries and tools for that language as well just because so many other developers are already using it. Also it may be interesting to see how our most popular programming language, which we all tend to have once in a while, scores relative to other languages.

Measurement method

But how to measure the language popularity on the Web? The first thing that came to mind was to use different search engines and compare numbers of results for different languages. This seemed like the simplest and the most obvious thing to do. But not so fast! Unfortunately, it turns out, that search counts returned by most popular search engines are just rough estimates of the number of the results they think they should give to you based on your search query, not all the possible results. More details are explained in this article. In other words, search engines are doing well what they are supposed to do: context based search. Nobody designed them to compute exact aggregations over huge amounts of data and they do not usually do this well.

What other options can be there? For once, we can select a few sites with job postings, such as monster.com, or with software development articles and presentations like infoq.com, various forums for software engineers, etc. On these sites we can search for certain programming languages, and by comparing the results we can estimate the relative popularity of the languages.

However, searching just one such resource may not be enough, for example, Java developers can really like one site and Ruby developers at the same time can like completely another site. As later we will see this is actually the case with github.com, which is really popular with JavaScript developers and with stackoverflow.com, which has a large number of C# related questions. But at least we can try to search one of such sites and compare the results with the data we already have from other sources to be more sure in our measurements.

I chose stackoverflow.com as it is a really good and popular site with questions and answers on every software development topic you can think about.

Technical details

So, now I will take the list of all the programming languages from somewhere and search for them on stackoverflow.com. Let’s take, for example, the list of all the languages that are used on github.com. Then we would have to search for each language and write down the number of search results for each one of them. But since it is a really boring and time consuming task and computers were invented a long time ago, let’s write a simple script that will help us do the mundane work and execute around 90 searches. By automating a bit we will also have more confidence in the results as manually doing something is usually more error-prone.

For automation we will use a headless WebKit browser PhantomJS and will generate an HTML report right from our PhantomJS script. The result will be just a simple bar chart rendered with Google Chart Tools.

The complete version of the code for the script is available here.

Some of the code highlights from the key parts of the script are listed below.

Getting the list of all the languages from github.com


function getAllGithubLanguages(callback) {
    page.open("https://github.com/languages", function (status) {
        var allLanguages = page.evaluate(function() {
            var links = document.querySelectorAll(".all_languages a");
            return Array.prototype.slice.call(links, 0).map(function(link) {
                return link.innerHTML;
            });
        });
        callback(allLanguages);
    });
};

By the way, notice how easy it is to work with DOM in JavaScript: all the API specifically adapted for this is already there, so PhantomJS allows us to use querySelectorAll and CSS selectors.

Getting the number of results once they are displayed in the browser.


function getSummaryCount() {
    var resultStats = document.querySelector("div.summarycount"),                   
        regex = /[\d,.]+/g,
        resultsNumber = -1;
                
    if (resultStats) {
        resultsNumber = regex.exec(resultStats.innerHTML)[0];
        resultsNumber = resultsNumber.replace(/[,\.]/g, "");
    };
    return parseInt(resultsNumber);
};

Searching for each language with two URLs in case the first URL produces no results.


function openResultsURL(url, callback) {
    page.open(url, function (status) {                 
        callback(page.evaluate(getSummaryCount));
    });    
};

function search(term, callback) {
    var urls = [
        "http://stackoverflow.com/search?q=" + encodeURIComponent(term),
        "http://stackoverflow.com/tags/" + encodeURIComponent(term)
    ];

    openResultsURL(urls[0], function(resultsCount) {
        if (resultsCount > 0) {
            callback(term, resultsCount);
        } else {
            openResultsURL(urls[1], function(resultsCount) {
                callback(term, resultsCount);
            });
        }
    });
};

Also you may notice how we pass callbacks everywhere. This may seem a bit strange at first if you have not programmed in JavaScript a lot, but this is actually the most common style of programming in JavaScript both on the client and the server side. Here PhantomJS encourages asynchronous programming as well because interaction with the browser is also asynchronous. Each callback is executed once the results are ready at an appropriate point in time. This provides for a more declarative style of programming too.

The entry point into the script, collecting all the search results and saving a generated report.


function saveReport(html) {
    fs.write("top_languages_report.html", html, "w");
};

getAllGithubLanguages(function (languages) {
    var statistics = {},
        int;
    
    languages = Array.prototype.slice.call(languages, 0);
    console.log("Number of languages = ", languages.length);
    int = setInterval(function waitForStatistics() {
        if (0 == activeSearches) {
            if (languages.length > 0) {
                activeSearches++;                
                search(languages.shift(), function (term, count) {
                    console.log(term + " found " + count + " times");               
                    statistics[term] = count;
                    activeSearches--;
                });
            } else {
                console.log("Finished all searches!");
                clearInterval(int);
                saveReport(generateReport(statistics));
                phantom.exit();
            };
        };
    }, TIMEOUT);
});

The code for generating reports that is also in the script is a bit awkward, largely due to the lack of a standard JavaScript library for working efficiently with data structures. It takes quite a bit of effort to transform the results to the format needed for rendering the chart, but the code for doing this is not really what this script is all about, it is just some utility boiler plate that unfortunately cannot be avoided here.So let’s just omit this part.

And, voilà the final chart with the top 30 most popular on stackoverflow.com programming languages.

However, we cannot reach any conclusions based just on the results from one site. Hardly C# is the most popular language, this must be a stackoverflow.com thing.

We can go one step further and search some other sites like Amazon.com but will it give us any more confidence? Let’s stop at this point and compare our results with the results of other similar researches obtained by using slightly different methods.

Comparing with other results

So the top ten languages we got are: C#, Java, PHP, JavaScript, C++, Python, Objective-C, C, Ruby, and Perl.

First, let’s look at TIOBE which uses search engine result counts and we discussed above why it may be not the best idea. It is basically the same 10 languages, but instead of JavaScript it is Visual Basic. Well, maybe Visual Basic has a very lively online community? I doubt it somehow, probably, it is just a lot of Visual Basic books and articles that make it so popular in this index, but everything is possible.

OK, what about the number of projects in different languages at github.com? The following statistics is provided on the site. Also very close to the list that we obtained but instead of C# there is Shell which, probably, can be explained by a lot of people with Linux background who use github.com. It also seems that C# developers do not favor github.com for some reason.

I would say we have a good correlation between the results for the top languages. Still I will be very careful with saying how much the top languages are popular relative to each other since different sources yielded very different results. But, at least, we get the following 12 most popular at the moment programming languages:

C#, Java, PHP, JavaScript, C++, Python, Objective-C, C, Ruby, Perl, Shell, Visual Basic

No comprehensive research anywhere

The problem with measuring the popularity of programming languages seems to be more complex than we initially thought. Developers of a specific language tend to be concentrated around a few sites and different sites give very different results like stackoverflow.com and github.com in the present article. In a way the problem is similar to measuring the popularity of human languages by randomly going to different large world cities. After visiting a couple dozen of cities we may start to get an idea which human languages are popular, but we will have hard time measuring the relative popularity of these languages, and especially hard time with less popular languages: we may just not visit a single large city where those languages are spoken. So, in order to make a good research we should statistically compare results from many different cities (sites), and know the structure of the world (Web) well. Unfortunately, I could not find any such research on the Web and doing it myself requires much more effort than just a weekend project. But even then, is the language popularity only limited to its online popularity?

Links

Git-based collaboration in the cloud github.com
Software development Q&A stackoverflow.com
PhantomJS
Google Chart Tools
Google Chart Tools: Bar Chart
Count results
Github.com top languages
TIOBE programming community index

Sorting Algorithms in Haskell

Contents

Why Haskell?
Quicksort
Mergesort
Bubble Sorting

Why Haskell?

Recently I decided to learn a bit of Haskell. Having programmed a bit in Clojure and having some familiarity with Common Lisp and Scheme I always wanted to take a closer look at Haskell.

What distinguishes Haskell is that it is a purely functional language, without state and variables. As a result statements in it are very close to mathematical expressions.

While reading Learn You a Haskell book I sat down the second day’s evening behind my computer to write some simple sorting algorithms and was pleasantly surprised with the result: it was really easy and fast to implement these algorithms in Haskell, and the code reads almost like the definitions of the algorithms itself.

While I am still learning and my Haskell code may still be far from perfect I just wanted to share these first results and maybe convince you to also take a look at Haskell. If you are not comfortable with the syntax, please, refer to the first chapters of the Learn You a Haskell book. Some prior Haskell knowledge may be beneficial for reading this article although I try to explain the syntax and the language constructs a bit in the context of the provided examples.

So, let’s go over some of the best known classical sorting algorithms and try to use Haskell to implement them.

Quicksort

Let’s define the function quicksort that will implement the Quicksort algorithm.

Haskell is a statically typed language. When a function library is compiled, compiler tries to infer types where it can and we can also help it by specifying them explicitly.

quicksort :: (Ord a) => [a] -> [a]

The function quicksort takes a list [a] of some type a, such that elements of a can be compared with each other (this is specified by using the (Ord a) guard). And then the function returns a list of the same type [a].

The quicksort algorithm itself is really simple:

1. We pick some element x in the list

2. The rest of the elements in the list are separated into two lists: elements smaller than x and elements greater than x

3. The algorithm is applied recursively to these lists and then the list with smaller elements, the selected element and the list of greater elements are concatenated together and the sorting is done

Here is how we would implement this in Haskell:

quicksort (x:xs) = quicksort [y | y <- xs, y <= x] ++ [x] ++ quicksort [y | y <- xs, y > x]

quicksort takes an argument (x:xs) which is a list consisting of the first element x and the rest of the list xs. Then we apply list comprehension [y | y <- xs, y <= x] to get a list of all the elements in the list xs that are smaller or equal than x. Then we concatenate the resulting list with a single element list [x] and the list of elements that are greater than x.

So the recursion on which the Quicksort algorithm is built is now defined by the function quicksort, but we still need to finish the recursion at some point, so we need to specify the condition when the recursion ends. This is also easily done in Haskell by augmenting the definition of the function quicksort with one more rule:

quicksort [] = []

Thus the algorithm applied on an empty list will return an empty list.

Combining everything together we get the complete Quicksort implementation in Haskell:

quicksort :: (Ord a) => [a] -> [a]
quicksort [] = []
quicksort (x:xs) = quicksort [y | y <- xs, y <= x] ++ [x] ++ quicksort [y | y <- xs, y > x]

Note, how mathematically pure this implementation looks. No variables, no order in which the steps of the algorithm should be performed, just the specification of how the algorithm should work.

Mergesort

Mergesort is a little more complicated to implement. The algorithm as follows:

1. List is split into two parts

2. Two parts are sorted by the algorithm

3. The sorted parts are merged by a special merging procedure for sorted lists

Let’s first define how we split a list into two parts:

mergesort'splitinhalf :: [a] -> ([a], [a])
mergesort'splitinhalf xs = (take n xs, drop n xs)
    where n = (length xs) `div` 2 

The function mergesort’splitinhalf returns a pair of arrays into which the original array was split.
n is equal to the half of the length of the array, and then we use the standard functions take and drop
to get the first n elements of the list take n xs and the rest of the list after those first elements drop n xs.

Let’s now define a function for merging two sorted arrays:

mergesort'merge :: (Ord a) => [a] -> [a] -> [a]
mergesort'merge [] xs = xs
mergesort'merge xs [] = xs
mergesort'merge (x:xs) (y:ys)
    | (x < y) = x:mergesort'merge xs (y:ys)
    | otherwise = y:mergesort'merge (x:xs) ys

The function receives two arrays and produces one array of the same type. The algorithm for merging:

1. If the first list is empty [] then the result of the merge is the second list xs

2. If the second list is empty [] then the result of the merge is the first list xs

3. Otherwise we compare the first elements of the lists and append with the colon : function the least of them to the new list which is the result of merging the remaining two lists

Now that we have defined the functions mergesort’splitinhalf and mergesort’merge we can easily define the function mergesort.

mergesort :: (Ord a) => [a] -> [a]
mergesort xs 
    | (length xs) > 1 = mergesort'merge (mergesort ls) (mergesort rs)
    | otherwise = xs
    where (ls, rs) = mergesort'splitinhalf xs

If the length of the list is greater than 1 then we do the standard step of the algorithm. Otherwise the list with the length of 1 is already sorted (the condition for ending the recursion).

The code reads exactly the same as the textual description of the algorithm given earlier but now this is a formal and shorter specification.

The complete code for Mergesort:

mergesort'merge :: (Ord a) => [a] -> [a] -> [a]
mergesort'merge [] xs = xs
mergesort'merge xs [] = xs
mergesort'merge (x:xs) (y:ys)
    | (x < y) = x:mergesort'merge xs (y:ys)
    | otherwise = y:mergesort'merge (x:xs) ys

mergesort'splitinhalf :: [a] -> ([a], [a])
mergesort'splitinhalf xs = (take n xs, drop n xs)
    where n = (length xs) `div` 2 

mergesort :: (Ord a) => [a] -> [a]
mergesort xs 
    | (length xs) > 1 = mergesort'merge (mergesort ls) (mergesort rs)
    | otherwise = xs
    where (ls, rs) = mergesort'splitinhalf xs

Bubble Sorting

And now the Bubble sorting algorithm: we change places in pairs of elements while we can do so, that is, while there is still a pair or elements (x, y) such as x > y.

Let’s first define the function that will go through all the elements in a list and exchange pairs of elements
when it sees that the sorting order is wrong.

bubblesort'iter :: (Ord a) => [a] -> [a]
bubblesort'iter (x:y:xs)
    | x > y = y : bubblesort'iter (x:xs)
    | otherwise = x : bubblesort'iter (y:xs)
bubblesort'iter (x) = (x)

Then we just need to apply this function n times – the length of the list that should be sorted.

bubblesort' :: (Ord a) => [a] -> Int -> [a]
bubblesort' xs i 
    | i == (length xs) = xs
    | otherwise = bubblesort' (bubblesort'iter xs) (i + 1) 

bubblesort :: (Ord a) => [a] -> [a]
bubblesort xs = bubblesort' xs 0

We do this by defining a function bubblesort’ that takes two arguments: the current list and the number of the current iteration i.

Essentially what we are doing here, is transforming iteration into a recursion, so that Bubble sorting becomes a recursive algorithm.

Links

Source code
Haskell
Learn You a Haskell book
Bubble sorting
Mergesort
Quicksort

Into the Land of Functional Programming with JavaScript

Contents

Enter JavaScript…
Running examples
Functions are “first-class citizens”
Closures and scopes
Partial Application
Memoization
Lazy evaluation
Not a functional language but

Enter JavaScript…

JavaScript is an interesting language. Syntactically it resembles C, C++ and Java, but it was also inspired by a functional language: Scheme, which is a dialect of Lisp. That JavaScript has a C-like syntax is largely due to the hype around Java at the time JavaScript was introduced. The language was rushed to the market, leaving in it a few bad design decisions like global variables, ‘with’ construct, etc. The name of the language itself is a bit misleading: it has nothing to do with Java except for a slight syntactic similarity.

JavaScript still has somewhat bad reputation with some people who do not know the language well, but encountered a few problems when programming for the browser or heard some unfavorable opinions from the people who did so. The language itself, however, had nothing to do with the inconsistent client-side API implementations in different browsers. By the way, there are plenty of client-side libraries such as JQuery that hide the browser differences behind their API and the developer usually does not have to deal with browser specific issues.

Contrary to the popular misconception JavaScript is not used only in browser, it became quite popular recently on the server-side. Why does the language continue to evolve and be successful both on the client and server side? Why more and more people choose JavaScript as the primary language in which they develop software? The examples below will provide a few of the answers to these questions. There are a few very nice design concepts in the language that have been there from the very beginning. JavaScript is flexible and powerful when it comes to working with functions, and this is what we would like to explore here.

Running examples

To execute examples you can use Firefox and the JavaScript console of the Firebug extension for Firefox The code makes use of “console.log” but there is nothing browser-specific in the examples and with minor modifications they can be run on every JavaScript implementation. You may take your time to set up the development environment so that you can play with examples in this article which is the best way to learn.

Functions are “first-class citizens”

In JavaScript functions are “first-class citizens”: they can be passed as arguments and returned from other functions. You do not have to wrap your function in an anonymous class like in Java to do so. To illustrate this, let’s add a few useful methods for working with arrays.

function forEach(arr, callback) {
    for (var i = 0; i < arr.length; i++) {
        callback(arr[i], i);
    };
};

function map(arr, callback) {
    var result = [];
    for (var i = 0; i < arr.length; i++) {
        result.push(callback(arr[i]));
    };
    return result;
};

function reduce(arr, initial, callback) {
    var accumulated = initial;
    for (var i = 0; i < arr.length; i++) {
        accumulated = callback(accumulated, arr[i]);
    };
    return accumulated;
};

//Examples
var x = [1, 2, 3, 4, 5];

console.log("x = ");
forEach(x, function (el) {
    console.log(el);
});

console.log("squares of x = ");
forEach(map(x, function (el) {
    return el * el;
}), function (el) {
    console.log(el);
});

console.log("sum of elements of x = ");
console.log(reduce(x, 0, function (sum, el) {
    return sum + el;
}));

console.log("product of elements of x = ");
console.log(reduce(x, 1, function (sum, el) {
    return sum * el;
}));

forEach performs an action for each element, map transforms each element and reduce computes an aggregate value for a given array. The action, transformation or aggregation are specified by the callback function.

Even in this simple example it is already worth noting how we can combine two different callbacks with reduce and get completely different results. The code present in reduce is easy to reuse and we reused it twice: the action performed on the element is abstracted away from the way we iterate over the elements.

But this still looks a bit ugly: we have to always pass an array as an argument. In JavaScript it is easy to fix that by adding the methods forEach, map and reduce onto the Array class. To do that we will add functions to the prototype property of the Array. The prototype here is just a special kind of object, properties of which will be available in every created Array instance. For more details look at this explanation of prototypal inheritance in JavaScript but in this post you can also view this just like a small magic trick.

Array.prototype.forEach = function(callback) {
    for (var i = 0, length = this.length; i < length; i++) {
        callback(this[i], i);
    };
};

Array.prototype.map = function(callback) {
    var result = [];
    for (var i = 0, length = this.length; i < length; i++) {
        result.push(callback(this[i]));
    };
    return result;
};

Array.prototype.reduce = function(initial, callback) {
    var accumulated = initial;
    for (var i = 0, length = this.length; i < length; i++) {
        accumulated = callback(accumulated, this[i]);
    };
    return accumulated;
};

//Examples
var x = [1, 2, 3, 4, 5];

console.log("x = ");
x.forEach(function (el) {
    console.log(el);
});

console.log("squares of x = ");
x.map(function (el) {
    return el * el;
}).forEach(function (el) {
    console.log(el);
});

console.log("sum of elements of x = ");
console.log(x.reduce(0, function (sum, el) {
    return sum + el;
}));

console.log("product of elements of x = ");
console.log(x.reduce(1, function (sum, el) {
    return sum * el;
}));

The latest version of JavaScript already includes the methods forEach, map and reduce for arrays similar to what we just implemented and it would be wise not to override these methods. We will only define them on Array.prototype in case they are not yet there (maybe, for some older browser versions).

if (!Array.prototype.forEach) {
    Array.prototype.forEach = function(callback) {
        ...
    };
};
if (!Array.prototype.map) {
    Array.prototype.map = function(callback) {
        ...
    };
};
if (!Array.prototype.reduce) {
    Array.prototype.reduce = function(initial, callback) {
        ...
    };
};

This shows that we can treat functions as values and use them in conditional statements. Besides this simple example passing functions as arguments is also widely used in client-side JavaScript programming for registering event listeners. We can, for example, add a click listener to body of the current document.

document.body.addEventListener("click", function (event) {
    console.log("Click handled", event);
}, false);

Not only can we pass functions as arguments to other functions it is also possible to return a function from another function.

function op(str) {
    switch (str) {
        case '+': return function(x, y) {
            return x + y;
        };
        case '+': return function(x, y) {
            return x + y;
        };
        case '-': return function(x, y) {
            return x - y;
        };
        case '*': return function(x, y) {
            return x * y;
        };
        case '/': return function(x, y) {
            return x / y;
        };
    };
};

console.log("op('+')(1, 2) = ", op('+')(1, 2));
console.log("op('-')(5, 3) = ", op('-')(5, 3));
console.log("op('*')(4, 5) = ", op('*')(4, 5));
console.log("op('/')(12, 3) = ", op('/')(12, 3));

This is a bit artificial example, but it illustrates well the general idea that a function can be considered a value.

Closures and scopes

There is a scope associated with each function invocation. In fact in JavaScript until the latest versions there were no other scopes, that is, a pair of brackets {} did not define a scope like in other languages such as Java or C. In the latest version it is possible to use let to define a scope, but this will not be covered in the present article.

At the time of its invocation each function captures the variables in the enclosing scope in which this function has been invoked. We say that the function “closes over” the values of the variables in the enclosing scope. This is called “closure.” The following counter example demonstrates how the counter variable is “living” in a closure:

function getCounter() {
    var counter = 0;
    return {
        increment: function() {
            return counter++;
        },
        reset: function() {
            counter = 0;
        }
    };
};

//Getting the counter object
var counter = getCounter();

//Executing its methods
console.log(counter.increment());
console.log(counter.increment());
console.log(counter.increment());
counter.reset();
console.log(counter.increment());
console.log(counter.increment());

We return an object from the getCounter method and the variable counter remains accessible for the functions defined in this object.

If we have a few function invocations then we can talk about a chain of scopes formed by a chain of function invocations much like in Scheme.

Partial Application

Partial application is converting a function of multiple arguments into a function with a fewer number of arguments. Simple example with multiplication:

function multiply(x, y) {
    return x * y;
};

function twice(x) {
    return multiply(x, 2);
};

console.log("multiply(2, 3) = ", multiply(2, 3));
console.log("twice(3) = ", twice(3));

In twice the second argument 2 is captured and we get a function of one variable x rather than two variables x and y. It is easy to build a generic solution for partially applying a function.

if (!Function.prototype.partial) {
    Function.prototype.partial = function(argTransformer) {
        //The current function that we partially apply
        var f = this;
        return function() {
            //Need to convert the function arguments into an array
            var args = Array.prototype.slice.call(arguments, 0);

            /*
             * Transforming the arguments and calling the initial function
             * with the transformed arguments. 'this' here is determined by the context
             * of invocation of the partially applied function and is not 'f'
             */
            return f.apply(this, argTransformer(args));
        };
    };
};

var multiply = function(x, y) {
    return x * y;
};
var double = multiply.partial(function (args) {
    args.push(2);
    return args;
});
var triple = multiply.partial(function (args) {
    args.push(3);
    return args;
});

console.log("multiply(2, 5) = ", multiply(2, 5));
console.log("double(3) = ", double(3));
console.log("triple(4) = ", triple(4));

Initially we keep the reference to this which references the function for which partial has been invoked. We use this reference later in the anonymous function that we return from partial to execute the original function together with the captured arguments and the arguments passed to the anonymous function at the point of its invocation. argsTransformer, which was the argument to the original partial invocation, combines the passed and captured arguments inside the returned anonymous function. Also, note that this in the anonymous function returned from partial is different from what is stored in the f variable, it is now the object on which the anonymous function was invoked.

Memoization

Memoization is an optimization technique for avoiding expensive calculations in repeated function calls. Simple example:

function expensiveComputation(x) {
    return x * x;
};

var cache = {};

function memoizedExpensiveComputation(x) {
    var result = cache[x];

    if (!result) {
        result = expensiveComputation(x);
        cache[x] = result;
    };
    return result;
};

console.log("expensiveComputation(5)", expensiveComputation(5));
console.log("memoizedExpensiveComputation(5)", memoizedExpensiveComputation(5));
console.log("memoizedExpensiveComputation(5)", memoizedExpensiveComputation(5));

memoizedExpensiveComputation caches the values that were already computed for particular arguments and returns these values directly from the cache avoiding calling expensiveComputation.

With JavaScript it is easy to build a generic solution for function memoization.

function memoize(func, host, hash) {
    //By default memoize a function on the window object
    var host = host || window,
        hash = hash || {},
        original = host[func];
    //Only functions can be memoized
    if (!host[func] || !(host[func] instanceof Function)) {
        throw "Can memoize only a function or function is not defined in host";
    };
    //Redefine the function on the host object
    host[func] = function() {
        //The key in the cash is a JSON representation of arguments
        var jsonArguments = JSON.stringify(arguments);
        //If the value has not yet been computed
        if (!hash[jsonArguments]) {
            //Calling the original function with the arguments provided to host[func],
            //'this' in the original function will also be the same as in the redefined function in order to handle
            //host[func].call and host[func].apply
            hash[jsonArguments] = original.apply(this, Array.prototype.slice.call(arguments, 0));
        };
        return hash[jsonArguments];
    };
};

function fib(num) {
    return (num < 2) ? num : this.fib(num - 1) + this.fib(num - 2);
};
memoize("fib");

console.log("fib(5) =", fib(5));
console.log("fib(10) =", fib(10));
console.log("fib(11) =", fib(11));

First we check that host contains the function, name of which was passed as the first argument to memoize. If it does, we keep the reference to this function and then redefine the function host[func] just like in the simple example before. If the computed value can be looked up in the cache, than we return it without actually calling the original function, otherwise we call the original function and store the result in the cache. The key in the cache is the JSON representation of the arguments passed to the redefined host function. We have to call Array.prototype.slice on the arguments to convert them into an array object (a flaw in the design of JavaScript – arguments passed to the function is an array-like object, not an array) and then we call the original function original with apply on this which is defined by the invocation context of the redefined host[func]. We pass to the original function all the arguments that were passed to the redefined version of it. For user of the API using the redefined function is as transparent as using the original one. The call to memoize just redefines the function so that it starts returning already computed values from a cache.

Lazy evaluation

With JavaScript it is also relatively easy to implement lazy evaluation and lazy streams. Each stream consists of two parts: the head and the tail. The tail can be an actual element or a promise to compute the element. Execution of this promise to compute an element can be omitted at the point of defining the stream and can be done later. In JavaScript such a promise can be implemented as a function.

function node(head, tail) {
    return [head, tail];
};

function head(stream) {
    return stream[0];
};

function tail(stream) {
    var tail = stream[stream.length - 1];
    return (tail instanceof Function) ? tail() : tail;
};

function drop(stream) {
    var h = head(stream);
    var t = tail(stream);
    stream[0] = t ? t[0] : null;
    stream[1] = t ? t[1] : null;
    return h;
};

function iterate(stream, callback, limit) {
    while (head(stream) && ((undefined == limit) || (limit > 0))) {
        limit && limit--;
        callback(drop(stream));
    };
};

function show(stream, limit) {
    iterate(stream, function (x) {
        console.log(x);
    }, limit);
};

//Examples
function upto(from, to) {
    return (from > to) ? null : node(from, function() {
        return upto(from + 1, to);
    });
};
function upfrom(start) {
    return node(start, function() {
        return upfrom(start + 1);
    });
};

console.log("upto:");
show(upto(3, 6));

console.log("upfrom:");
show(upfrom(7), 10);

The key part in this code is the tail function where we check whether tail is a promise rather than an actual element (that is, a function) and execute this promise if needed. Then defining each particular lazy stream is as easy as recursively defining a promise for the next element.

It is further possible to “objectify” the lazy stream code so that each stream represents an object and add the ability to filter, transform and unite streams. The full implementation is available here

Not a functional language but

Despite all the flexibility and ease of working with functions JavaScript is not a functional language. It has mutable shared state and the notion of the time of execution: if one statement precedes another it is executed earlier. Also there is no tail call recursion optimization in JavaScript, so the following code will not be optimized:

function factorial(number) {
    if (0 == number) {
        return 1;
    };
    return number * factorial(number - 1);
};

console.log("factorial(10) = ", factorial(10));

While JavaScript is still not functional, the excellent support for functions helps a lot by adding the necessary flexibility and power to the language and in part explains the popularity of JavaScript both on the client and the server side.

More working examples like in this post can be found on github.com at “Higher-Order JavaScript” The code for this project was inspired by the blog series about “Higher-Order Ruby” and the “Higher-Order Perl” book. If you also like Ruby, please, visit the blog http://blog.grayproductions.net and consider buying the book http://hop.perl.plover.com/ if you want to learn some good Perl.

Links

1. Scheme programming language
2. Firebug
3. JavaScript prototypal inheritance
4. addEventListener
5. Partial Application
6. Memoization
7. Tail call
8. Lazy evaluation
9. “Higher-Order JavaScript”
10. “Higher-Order Ruby” blog series
11. “Higher-Order Perl” book

Smarter Search Engines

Having written a number of simple scripts that aggregate information that can be found on various Internet sites I have a feeling that I am doing some repetitive, useless and low-level work. It is like writing a classic “Hello, world!” program in machine codes. Clearly there is something missing here, something that could have helped […]