Producing Code for Machines and Humans

Contents

Code
Efficiency
Abstraction
Requirements for languages
Future

Code

Code (or “source code”) is a set of instructions that can be executed automatically by a machine. It is usually written in a programming language and stored in source files in the textual form. Before it is automatically executed it may be first transformed (compiled) into a format convenient for the target machine or it can be executed right away (interpreted).

Efficiency

Efficient code from the point of view of a machine is code that uses as few resources as possible to execute: memory, network bandwidth, processor time, battery, etc. When deployed inefficient code often costs extra money because it consumes unnecessary resources. Hence we should strive to write efficient code.

But costs associated with code are not only limited to consumed machine resources. Code needs to be maintained and changed throughout its lifetime by humans. And humans have a rather different set of requirements for code than machines. Efficient code for humans is the code that can be quickly understood and easily changed.

Problems of efficiency may arise when a program gets really large and complex with many concepts, rules and flows, because humans have a rather limited attention span, short-term memory, typing speed, etc. In order for a program to be maintainable at all it needs to be adapted to these limitations.

Abstraction

Another area in which humans are remarkably different than machines is the ability to abstract and categorize things and work simultaneously at different levels of abstraction. For example, when we drive a car we do not need to know how the engine works, everything we need is just a steering wheel, a dashboard with measurements, two pedals, etc. In other words, we only need to have an interface through which we can interact with a car. Likewise, when we see a traffic jam on a map we are not interested in how a car is actually driven by a driver, we are interested in how many cars are now standing in our way and do not allow us to get where we want to. That’s an example of abstraction: unnecessary details are stripped out and disregarded in order to get the whole picture in a given context to solve specific problems.

Machines are not really good at this type of thinking. It turns out to be relatively hard to automatically decide what to abstract away for a given task. Ultimately machines always deal with memory bits and electrical signals and only with those abstractions that we explicitly supply them with via the source code.

Requirements for languages

What implications do the outlined above human limitations and the extra ability for abstraction have for code?

To match human perception and the way of thinking code should also be written at a specific level of abstraction that is necessary for solving a particular problem. For example, in a web store application an engineer would like to work with such entities as “shopping cart”, to be able to do a “check out”, etc. The programming language should allow to build such abstractions in the code, otherwise the software will be really hard to maintain or it will be unmaintainable at all.

Building such abstractions means additional resources required from a machine to deal with them, and hence additional machine-related costs. Of course, these costs are now balanced by lowered human-related costs which can be quite high because it takes a lot of money to pay software engineers. And we would always like to lower the overall costs, not just the machine-related costs.

This leads to new requirements for programming languages, especially as the problems being solved and applications being developed get more and more complex and machines more powerful. Languages should allow for humans to build abstractions in code easily and in a manner efficient for a machine.

If we look at the history of programming languages then we see a clear trend: as the time goes the languages become more high-level and more powerful from the perspective of solving a problem fast, but less efficient from the perspective of a machine. First it was machine code, then Assembler, then high-level languages such as Algol, then C++ and object-oriented languages, then Java and garbage collection, then Ruby and domain-specific languages, and the trend still continues to the present moment.

Nowadays building better abstractions with a language means building domain-specific languages better. Domain-specific language is just a small language built to solve a particular problem. In the case with a web store this may be a language that has such notions as “payment method”, “order”, “discount”, etc. which allow you to easily build rules and flows involving them. By they way, the main advantage of Ruby, while it may be slow to execute and more resource-consuming, is the ease of creating domain-specific languages. One such language, Ruby on Rails was very successful precisely because it allowed to build web applications faster and easier than its lower-level competitors.

Future

Now and in the foreseeable future in order to solve some relatively complex problem first a domain-specific language for solving this problem should be established in which this problem is easy to address. Hence the trend is to more machine-inefficient but more abstract and high level applications, ones that are built using domain-specific languages.

And now here are some of the yet not fully answered questions:

  • What are most appropriate building block for abstractions, objects or functions? This is where the whole debate of object-oriented programming vs. functional programming comes in.
  • Is it possible to build self-evolving domain-specific languages and automatically generate domain-specific languages like Ruby on Rails for a specific problem? Various computer-aided software engineering tools and UML tried to solve this problem but none of them managed to completely do it. And a lot of human participation is still usually needed. Moreover, sometimes generated code is really hard or impossible to support for humans.
  • Next machine performance improvements will likely come from the increase in the number of parallel processors and distributed computations. How to best abstract away for humans the complex details underneath such computations? Clearly most of the present-day programming languages were not developed for such an abstraction. This makes programming using them in the future potentially very complex and unproductive. As one such example Java concurrency API is really difficult to use and easy to misuse. There are attempts to address this problem, for example, take a look at the Clojure programming language.
  • What other formats for storing and viewing code can be there? Is it really best to store it as text files? It is hard to make purely visual languages as they often do not easily allow the desired level of customization and flexibility in solving a problem. Normally we would also sometimes access the text representation of the code being generated from the visual representation.

All the above being said, it really comes down to this: we just have to remember that the code is written both for machines and humans, and they have different requirements for efficiency which can be in conflict with each other. We have to balance these requirements: the right level of abstraction for humans and the hardware resource consumption for machines.

Links

Interpreted languages
Compiled languages
Interface
Machine code
Assembler
ALGOL
C++
Java
Ruby
Ruby on Rails
UML
Computer-Aided Software Engineering
Are we there yet? (Rich Hickey)
Clojure programming language
Human-computer interaction

Measuring Language Popularity is Harder Than Many Think

Contents

What languages are most popular?
Measurement method
Technical details
Comparing with other measurements
No comprehensive research anywhere

What languages are most popular?

Obviously it is interesting to know which programming languages are most popular. If some language is popular, this means that you will have plenty of resources such as books and articles to learn it, also, probably, good community support and you will be able to find a lot of ready libraries and tools for that language as well just because so many other developers are already using it. Also it may be interesting to see how our most popular programming language, which we all tend to have once in a while, scores relative to other languages.

Measurement method

But how to measure the language popularity on the Web? The first thing that came to mind was to use different search engines and compare numbers of results for different languages. This seemed like the simplest and the most obvious thing to do. But not so fast! Unfortunately, it turns out, that search counts returned by most popular search engines are just rough estimates of the number of the results they think they should give to you based on your search query, not all the possible results. More details are explained in this article. In other words, search engines are doing well what they are supposed to do: context based search. Nobody designed them to compute exact aggregations over huge amounts of data and they do not usually do this well.

What other options can be there? For once, we can select a few sites with job postings, such as monster.com, or with software development articles and presentations like infoq.com, various forums for software engineers, etc. On these sites we can search for certain programming languages, and by comparing the results we can estimate the relative popularity of the languages.

However, searching just one such resource may not be enough, for example, Java developers can really like one site and Ruby developers at the same time can like completely another site. As later we will see this is actually the case with github.com, which is really popular with JavaScript developers and with stackoverflow.com, which has a large number of C# related questions. But at least we can try to search one of such sites and compare the results with the data we already have from other sources to be more sure in our measurements.

I chose stackoverflow.com as it is a really good and popular site with questions and answers on every software development topic you can think about.

Technical details

So, now I will take the list of all the programming languages from somewhere and search for them on stackoverflow.com. Let’s take, for example, the list of all the languages that are used on github.com. Then we would have to search for each language and write down the number of search results for each one of them. But since it is a really boring and time consuming task and computers were invented a long time ago, let’s write a simple script that will help us do the mundane work and execute around 90 searches. By automating a bit we will also have more confidence in the results as manually doing something is usually more error-prone.

For automation we will use a headless WebKit browser PhantomJS and will generate an HTML report right from our PhantomJS script. The result will be just a simple bar chart rendered with Google Chart Tools.

The complete version of the code for the script is available here.

Some of the code highlights from the key parts of the script are listed below.

Getting the list of all the languages from github.com


function getAllGithubLanguages(callback) {
    page.open("https://github.com/languages", function (status) {
        var allLanguages = page.evaluate(function() {
            var links = document.querySelectorAll(".all_languages a");
            return Array.prototype.slice.call(links, 0).map(function(link) {
                return link.innerHTML;
            });
        });
        callback(allLanguages);
    });
};

By the way, notice how easy it is to work with DOM in JavaScript: all the API specifically adapted for this is already there, so PhantomJS allows us to use querySelectorAll and CSS selectors.

Getting the number of results once they are displayed in the browser.


function getSummaryCount() {
    var resultStats = document.querySelector("div.summarycount"),                   
        regex = /[\d,.]+/g,
        resultsNumber = -1;
                
    if (resultStats) {
        resultsNumber = regex.exec(resultStats.innerHTML)[0];
        resultsNumber = resultsNumber.replace(/[,\.]/g, "");
    };
    return parseInt(resultsNumber);
};

Searching for each language with two URLs in case the first URL produces no results.


function openResultsURL(url, callback) {
    page.open(url, function (status) {                 
        callback(page.evaluate(getSummaryCount));
    });    
};

function search(term, callback) {
    var urls = [
        "http://stackoverflow.com/search?q=" + encodeURIComponent(term),
        "http://stackoverflow.com/tags/" + encodeURIComponent(term)
    ];

    openResultsURL(urls[0], function(resultsCount) {
        if (resultsCount > 0) {
            callback(term, resultsCount);
        } else {
            openResultsURL(urls[1], function(resultsCount) {
                callback(term, resultsCount);
            });
        }
    });
};

Also you may notice how we pass callbacks everywhere. This may seem a bit strange at first if you have not programmed in JavaScript a lot, but this is actually the most common style of programming in JavaScript both on the client and the server side. Here PhantomJS encourages asynchronous programming as well because interaction with the browser is also asynchronous. Each callback is executed once the results are ready at an appropriate point in time. This provides for a more declarative style of programming too.

The entry point into the script, collecting all the search results and saving a generated report.


function saveReport(html) {
    fs.write("top_languages_report.html", html, "w");
};

getAllGithubLanguages(function (languages) {
    var statistics = {},
        int;
    
    languages = Array.prototype.slice.call(languages, 0);
    console.log("Number of languages = ", languages.length);
    int = setInterval(function waitForStatistics() {
        if (0 == activeSearches) {
            if (languages.length > 0) {
                activeSearches++;                
                search(languages.shift(), function (term, count) {
                    console.log(term + " found " + count + " times");               
                    statistics[term] = count;
                    activeSearches--;
                });
            } else {
                console.log("Finished all searches!");
                clearInterval(int);
                saveReport(generateReport(statistics));
                phantom.exit();
            };
        };
    }, TIMEOUT);
});

The code for generating reports that is also in the script is a bit awkward, largely due to the lack of a standard JavaScript library for working efficiently with data structures. It takes quite a bit of effort to transform the results to the format needed for rendering the chart, but the code for doing this is not really what this script is all about, it is just some utility boiler plate that unfortunately cannot be avoided here.So let’s just omit this part.

And, voilĂ  the final chart with the top 30 most popular on stackoverflow.com programming languages.

However, we cannot reach any conclusions based just on the results from one site. Hardly C# is the most popular language, this must be a stackoverflow.com thing.

We can go one step further and search some other sites like Amazon.com but will it give us any more confidence? Let’s stop at this point and compare our results with the results of other similar researches obtained by using slightly different methods.

Comparing with other results

So the top ten languages we got are: C#, Java, PHP, JavaScript, C++, Python, Objective-C, C, Ruby, and Perl.

First, let’s look at TIOBE which uses search engine result counts and we discussed above why it may be not the best idea. It is basically the same 10 languages, but instead of JavaScript it is Visual Basic. Well, maybe Visual Basic has a very lively online community? I doubt it somehow, probably, it is just a lot of Visual Basic books and articles that make it so popular in this index, but everything is possible.

OK, what about the number of projects in different languages at github.com? The following statistics is provided on the site. Also very close to the list that we obtained but instead of C# there is Shell which, probably, can be explained by a lot of people with Linux background who use github.com. It also seems that C# developers do not favor github.com for some reason.

I would say we have a good correlation between the results for the top languages. Still I will be very careful with saying how much the top languages are popular relative to each other since different sources yielded very different results. But, at least, we get the following 12 most popular at the moment programming languages:

C#, Java, PHP, JavaScript, C++, Python, Objective-C, C, Ruby, Perl, Shell, Visual Basic

No comprehensive research anywhere

The problem with measuring the popularity of programming languages seems to be more complex than we initially thought. Developers of a specific language tend to be concentrated around a few sites and different sites give very different results like stackoverflow.com and github.com in the present article. In a way the problem is similar to measuring the popularity of human languages by randomly going to different large world cities. After visiting a couple dozen of cities we may start to get an idea which human languages are popular, but we will have hard time measuring the relative popularity of these languages, and especially hard time with less popular languages: we may just not visit a single large city where those languages are spoken. So, in order to make a good research we should statistically compare results from many different cities (sites), and know the structure of the world (Web) well. Unfortunately, I could not find any such research on the Web and doing it myself requires much more effort than just a weekend project. But even then, is the language popularity only limited to its online popularity?

Links

Git-based collaboration in the cloud github.com
Software development Q&A stackoverflow.com
PhantomJS
Google Chart Tools
Google Chart Tools: Bar Chart
Count results
Github.com top languages
TIOBE programming community index

Sorting Algorithms in Haskell

Contents

Why Haskell?
Quicksort
Mergesort
Bubble Sorting

Why Haskell?

Recently I decided to learn a bit of Haskell. Having programmed a bit in Clojure and having some familiarity with Common Lisp and Scheme I always wanted to take a closer look at Haskell.

What distinguishes Haskell is that it is a purely functional language, without state and variables. As a result statements in it are very close to mathematical expressions.

While reading Learn You a Haskell book I sat down the second day’s evening behind my computer to write some simple sorting algorithms and was pleasantly surprised with the result: it was really easy and fast to implement these algorithms in Haskell, and the code reads almost like the definitions of the algorithms itself.

While I am still learning and my Haskell code may still be far from perfect I just wanted to share these first results and maybe convince you to also take a look at Haskell. If you are not comfortable with the syntax, please, refer to the first chapters of the Learn You a Haskell book. Some prior Haskell knowledge may be beneficial for reading this article although I try to explain the syntax and the language constructs a bit in the context of the provided examples.

So, let’s go over some of the best known classical sorting algorithms and try to use Haskell to implement them.

Quicksort

Let’s define the function quicksort that will implement the Quicksort algorithm.

Haskell is a statically typed language. When a function library is compiled, compiler tries to infer types where it can and we can also help it by specifying them explicitly.

quicksort :: (Ord a) => [a] -> [a]

The function quicksort takes a list [a] of some type a, such that elements of a can be compared with each other (this is specified by using the (Ord a) guard). And then the function returns a list of the same type [a].

The quicksort algorithm itself is really simple:

1. We pick some element x in the list

2. The rest of the elements in the list are separated into two lists: elements smaller than x and elements greater than x

3. The algorithm is applied recursively to these lists and then the list with smaller elements, the selected element and the list of greater elements are concatenated together and the sorting is done

Here is how we would implement this in Haskell:

quicksort (x:xs) = quicksort [y | y <- xs, y <= x] ++ [x] ++ quicksort [y | y <- xs, y > x]

quicksort takes an argument (x:xs) which is a list consisting of the first element x and the rest of the list xs. Then we apply list comprehension [y | y <- xs, y <= x] to get a list of all the elements in the list xs that are smaller or equal than x. Then we concatenate the resulting list with a single element list [x] and the list of elements that are greater than x.

So the recursion on which the Quicksort algorithm is built is now defined by the function quicksort, but we still need to finish the recursion at some point, so we need to specify the condition when the recursion ends. This is also easily done in Haskell by augmenting the definition of the function quicksort with one more rule:

quicksort [] = []

Thus the algorithm applied on an empty list will return an empty list.

Combining everything together we get the complete Quicksort implementation in Haskell:

quicksort :: (Ord a) => [a] -> [a]
quicksort [] = []
quicksort (x:xs) = quicksort [y | y <- xs, y <= x] ++ [x] ++ quicksort [y | y <- xs, y > x]

Note, how mathematically pure this implementation looks. No variables, no order in which the steps of the algorithm should be performed, just the specification of how the algorithm should work.

Mergesort

Mergesort is a little more complicated to implement. The algorithm as follows:

1. List is split into two parts

2. Two parts are sorted by the algorithm

3. The sorted parts are merged by a special merging procedure for sorted lists

Let’s first define how we split a list into two parts:

mergesort'splitinhalf :: [a] -> ([a], [a])
mergesort'splitinhalf xs = (take n xs, drop n xs)
    where n = (length xs) `div` 2 

The function mergesort’splitinhalf returns a pair of arrays into which the original array was split.
n is equal to the half of the length of the array, and then we use the standard functions take and drop
to get the first n elements of the list take n xs and the rest of the list after those first elements drop n xs.

Let’s now define a function for merging two sorted arrays:

mergesort'merge :: (Ord a) => [a] -> [a] -> [a]
mergesort'merge [] xs = xs
mergesort'merge xs [] = xs
mergesort'merge (x:xs) (y:ys)
    | (x < y) = x:mergesort'merge xs (y:ys)
    | otherwise = y:mergesort'merge (x:xs) ys

The function receives two arrays and produces one array of the same type. The algorithm for merging:

1. If the first list is empty [] then the result of the merge is the second list xs

2. If the second list is empty [] then the result of the merge is the first list xs

3. Otherwise we compare the first elements of the lists and append with the colon : function the least of them to the new list which is the result of merging the remaining two lists

Now that we have defined the functions mergesort’splitinhalf and mergesort’merge we can easily define the function mergesort.

mergesort :: (Ord a) => [a] -> [a]
mergesort xs 
    | (length xs) > 1 = mergesort'merge (mergesort ls) (mergesort rs)
    | otherwise = xs
    where (ls, rs) = mergesort'splitinhalf xs

If the length of the list is greater than 1 then we do the standard step of the algorithm. Otherwise the list with the length of 1 is already sorted (the condition for ending the recursion).

The code reads exactly the same as the textual description of the algorithm given earlier but now this is a formal and shorter specification.

The complete code for Mergesort:

mergesort'merge :: (Ord a) => [a] -> [a] -> [a]
mergesort'merge [] xs = xs
mergesort'merge xs [] = xs
mergesort'merge (x:xs) (y:ys)
    | (x < y) = x:mergesort'merge xs (y:ys)
    | otherwise = y:mergesort'merge (x:xs) ys

mergesort'splitinhalf :: [a] -> ([a], [a])
mergesort'splitinhalf xs = (take n xs, drop n xs)
    where n = (length xs) `div` 2 

mergesort :: (Ord a) => [a] -> [a]
mergesort xs 
    | (length xs) > 1 = mergesort'merge (mergesort ls) (mergesort rs)
    | otherwise = xs
    where (ls, rs) = mergesort'splitinhalf xs

Bubble Sorting

And now the Bubble sorting algorithm: we change places in pairs of elements while we can do so, that is, while there is still a pair or elements (x, y) such as x > y.

Let’s first define the function that will go through all the elements in a list and exchange pairs of elements
when it sees that the sorting order is wrong.

bubblesort'iter :: (Ord a) => [a] -> [a]
bubblesort'iter (x:y:xs)
    | x > y = y : bubblesort'iter (x:xs)
    | otherwise = x : bubblesort'iter (y:xs)
bubblesort'iter (x) = (x)

Then we just need to apply this function n times – the length of the list that should be sorted.

bubblesort' :: (Ord a) => [a] -> Int -> [a]
bubblesort' xs i 
    | i == (length xs) = xs
    | otherwise = bubblesort' (bubblesort'iter xs) (i + 1) 

bubblesort :: (Ord a) => [a] -> [a]
bubblesort xs = bubblesort' xs 0

We do this by defining a function bubblesort’ that takes two arguments: the current list and the number of the current iteration i.

Essentially what we are doing here, is transforming iteration into a recursion, so that Bubble sorting becomes a recursive algorithm.

Links

Source code
Haskell
Learn You a Haskell book
Bubble sorting
Mergesort
Quicksort