Engagement Trap: Why Content Quality Matters


This post is a reaction to the notion of the “content trap” (http://www.thecontenttrap.com/), to the avalanche of clickbait links now visible even in the more respectable media outlets, and to the whole idea of focusing on capturing attention while neglecting the underlying content. Here we will try to rectify some of the misconceptions and show that, not surprisingly, content is always important, no matter what medium it is communicated through.

Lately, the ubiquity of different information channels, the ease with which information proliferates through them, and the sheer abundance of available information have made the attention of potential recipients all the more valuable.

For a publisher of information this means competing with many other publishers over the scarce attention of the target audience. For consumers, on the other hand, it often means being overwhelmed by streams of data that try to reach them from all possible directions.

This leads to a paradox: when information is more abundant, it becomes harder, not easier, to retrieve just the right kind of information one may be looking for. It is also harder to make sure that high-quality information reaches its intended audience, as it may now easily be drowned in a sea of irrelevant content. For producers it therefore becomes harder to be heard and noticed. For consumers it is now more difficult to concentrate their precious attention on the information that is really worth it, and to discern whether the content that reaches them is true or useful in the first place. Probably almost everybody can relate to this problem from their own experience of the endless clickbait links that pop up on their radar every day.

The phenomenon in which the abundance of information leads to the degradation of its average quality and to less efficient communication between its producers and consumers is not limited to news sites or the Web. The same dynamic can be observed in the book publishing industry, on TV, and in academia. The rapid increase in the amount of information, and the speed and ease with which it spreads, seem to be inherent in modern technological civilization itself and inevitably lead to the aforementioned paradox.

What is even more interesting is that this explosion of information did not even start with computers and the Internet, as some might be inclined to think. In fact, probably the first invention on the path to making information more abundant and accessible was the introduction of the first writing system, long before the first computers appeared. Ever since then the present discussion has remained relevant.

Nowadays many seem to simply embrace the brave new world of online noise and distraction, shortened to messages under 255 symbols each, as something inevitable, and even claim that what really matters is not content itself but ‘engagement’, ‘clicks’ and ‘experience’, meaning how long somebody pays attention to a given source of information and in what exact way. Content is thereby pushed into the background and considered less important than the manner or timing of its presentation. Somehow, in this simplified view, it is the attention and time spent by the target audience that counts, not the utility gained by both the producer and the consumer of information.

Have we started valuing presentation more than content? Or appearance more than substance? Does clicking through a dozen “10 things you cannot miss that change everything and whoever did it is a genius” lists, or completing another “Financial management of big data mobile machine learning blockchain” course, deliver much utility for either the publisher or the consumers of the information? The answer to these questions is often a resounding “no”.

Those who claim that digital technologies and the Web radically change everything are missing the point. The dynamic of making information and knowledge accessible to ever broader audiences has been at work for thousands of years, and to suggest that the particular moment at which we happen to live is somehow very different and special may be a bit arrogant and ignorant. In many ways it is indeed special, but we should not miss the obvious commonalities with the past and should always try to see things in a much longer perspective than our personal experience normally allows. Surprisingly often things do not change all that much with the passage of time; they just take on different forms and manifest themselves in new ways.

What has always mattered, and still matters in the end, is the mutual utility derived from the engagement and the information exchange. It can be a product bought after seeing a relevant ad, knowledge acquired, payment for a subscription, or relevant and useful data given back in return. The gained utility is the most important thing to note and understand when analyzing information exchange, as opposed to merely counting the expenditures incurred by one of the participating parties (such as attention and time spent). Of course, the really difficult and challenging problem is how to measure this utility and optimize for its increase, but a more detailed discussion of that topic is left outside the scope of this short post.

In fact, one of the greatest inventions on the Web, the search engine, tries to address exactly this problem of information quality and relevance. The sheer volume of utility a search engine derives from routing users to the content that gives them the most value, and in turn routing advertisers to the users who may be at least slightly interested in their products, shows how real the problem of irrelevant, low-quality information is and how valuable solving it can be.

Unsurprisingly, content quality and relevance still mean, and probably always will mean, a lot, so the traditional news sites, radio, TV, educational institutions and book publishers will not be out of business in any foreseeable future as long as they continue to provide high-quality content. Content matters in itself, not just the particular channel or the peculiar way in which it is communicated. Content can and certainly should be appealing and presented in an engaging, recipient-friendly manner; however, there is really no substitute for its quality. Merely increasing the number of channels through which content is communicated will not automatically increase the utility gained from the information exchange once the target audience has already been reached, although it may indeed capture more attention. Similarly, the sheer number of links in a social network does not matter as much as the total quality and utility of those links, that is, of the network itself.

While we are still searching for better ways to increase the utility of our interactions with the deluge of information surrounding us, both as its producers and as its consumers, here are a few basic hints on how to deal with the increased amount of low-quality content from the consumer’s point of view:

  • Try to focus on the information that you really need and avoid information sources that try to capture your attention all too obviously (for example, clickbait links on news sites)
  • Consider valuing high quality information sources more and giving them preference
  • Use filtering and try to limit the background noise and irrelevant information, ‘blacklist’ bad sources and ‘whitelist’ the good ones based on your previous experience
  • Ask the question “What is the utility gained from giving my attention to this particular piece of information?”
  • Focus. Consider periodically shutting down access to sources of information and working with the already received information in isolation in order to reduce distractions and make sure that there is enough attention for the information that requires it most at a particular moment
  • If some piece of information is still relatively hard to find, try skimming through the available information sources quickly and study the sources deeper only when they seem to be really worth your attention

Hopefully these tips will help you manage your most precious resource, attention, and the post itself will make us think more about substance rather than appearance, and about the importance of content quality.

Illustration: Kish tablet image, as provided by the Ashmolean Museum for Wikipedia: https://en.wikipedia.org/wiki/Kish_tablet


Measuring Language Popularity is Harder Than Many Think

Contents

What languages are most popular?
Measurement method
Technical details
Comparing with other results
No comprehensive research anywhere

What languages are most popular?

Obviously it is interesting to know which programming languages are the most popular. If a language is popular, you will have plenty of resources such as books and articles for learning it, probably good community support, and a lot of ready-made libraries and tools, simply because so many other developers are already using it. It may also be interesting to see how one’s current favorite language, which most of us tend to have at any given moment, scores relative to the others.

Measurement method

But how do we measure language popularity on the Web? The first thing that comes to mind is to use different search engines and compare the numbers of results for different languages. This seems like the simplest and most obvious thing to do. But not so fast! Unfortunately, it turns out that the result counts returned by the most popular search engines are just rough estimates of the number of results they think they should give you for your query, not of all the possible results. More details are explained in this article. In other words, search engines do well what they were designed to do: context-based search. Nobody designed them to compute exact aggregations over huge amounts of data, and they do not usually do this well.

What other options are there? For one, we can select a few sites with job postings, such as monster.com, or with software development articles and presentations, like infoq.com, or various forums for software engineers, and so on. On these sites we can search for particular programming languages and, by comparing the results, estimate the relative popularity of the languages.

However, searching just one such resource may not be enough: for example, Java developers may really like one site while Ruby developers prefer a completely different one. As we will see later, this is actually the case with github.com, which is really popular with JavaScript developers, and stackoverflow.com, which has a large number of C#-related questions. But at least we can search one of these sites and compare the results with data from other sources to be more confident in our measurements.

I chose stackoverflow.com as it is a really good and popular site with questions and answers on every software development topic you can think of.

Technical details

So now I need a list of all the programming languages to search for on stackoverflow.com. Let’s take, for example, the list of all the languages used on github.com. Then we would have to search for each language and write down the number of search results for each of them. But since that is a boring and time-consuming task, and computers were invented a long time ago, let’s write a simple script that will do the mundane work and execute around 90 searches for us. Automating also gives us more confidence in the results, as doing things manually is usually more error-prone.

For automation we will use PhantomJS, a headless WebKit browser, and will generate an HTML report right from the PhantomJS script. The result will be a simple bar chart rendered with Google Chart Tools.

The complete version of the code for the script is available here.

Some of the code highlights from the key parts of the script are listed below.
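The snippets are excerpts from a larger script, so they rely on a small preamble that is not shown here. A minimal sketch of what such a preamble might look like is given below; the exact timeout value and variable names are assumptions, not code from the original script.


// PhantomJS module imports used throughout the script:
// a web page instance for loading URLs and the file system module for saving the report.
var page = require("webpage").create(),
    fs = require("fs");

// Shared state referenced by the snippets below (assumed names and values):
// TIMEOUT is how often (in milliseconds) the polling loop checks whether the
// previous search has finished; activeSearches counts searches currently in flight.
var TIMEOUT = 1000,
    activeSearches = 0;
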

Getting the list of all the languages from github.com


// Opens the github.com languages page and extracts the names of all listed languages.
function getAllGithubLanguages(callback) {
    page.open("https://github.com/languages", function (status) {
        // page.evaluate runs the passed function inside the loaded page,
        // so it has direct access to the page's DOM.
        var allLanguages = page.evaluate(function () {
            var links = document.querySelectorAll(".all_languages a");
            return Array.prototype.slice.call(links, 0).map(function (link) {
                return link.innerHTML;
            });
        });
        callback(allLanguages);
    });
}

By the way, notice how easy it is to work with the DOM in JavaScript: all the APIs designed for this are already there, and PhantomJS lets us use querySelectorAll and CSS selectors directly inside page.evaluate.

Getting the number of results once they are displayed in the browser.


// Extracts the number of search results shown on the stackoverflow.com results page.
// This function is evaluated inside the page, so it works with the page's DOM directly.
function getSummaryCount() {
    var resultStats = document.querySelector("div.summarycount"),
        regex = /[\d,.]+/g,
        resultsNumber = -1;

    if (resultStats) {
        // Take the first number in the summary and strip the thousands separators.
        resultsNumber = regex.exec(resultStats.innerHTML)[0];
        resultsNumber = resultsNumber.replace(/[,\.]/g, "");
    }
    return parseInt(resultsNumber, 10);
}

Searching for each language with two URLs in case the first URL produces no results.


// Opens a URL and passes the number of results found on that page to the callback.
function openResultsURL(url, callback) {
    page.open(url, function (status) {
        callback(page.evaluate(getSummaryCount));
    });
}

// Searches for a term: first the full-text search, then the tag page
// if the full-text search returns no results.
function search(term, callback) {
    var urls = [
        "http://stackoverflow.com/search?q=" + encodeURIComponent(term),
        "http://stackoverflow.com/tags/" + encodeURIComponent(term)
    ];

    openResultsURL(urls[0], function (resultsCount) {
        if (resultsCount > 0) {
            callback(term, resultsCount);
        } else {
            openResultsURL(urls[1], function (resultsCount) {
                callback(term, resultsCount);
            });
        }
    });
}

You may also notice how we pass callbacks everywhere. This may seem a bit strange at first if you have not programmed much in JavaScript, but it is actually the most common style of programming in JavaScript, both on the client and the server side. PhantomJS encourages asynchronous programming as well, because interaction with the browser is itself asynchronous: each callback is executed once its results are ready, at the appropriate point in time. This also makes for a more declarative style of programming.

The entry point into the script, collecting all the search results and saving a generated report.


// Writes the generated HTML report to a file next to the script.
function saveReport(html) {
    fs.write("top_languages_report.html", html, "w");
}

getAllGithubLanguages(function (languages) {
    var statistics = {},
        interval;

    languages = Array.prototype.slice.call(languages, 0);
    console.log("Number of languages = ", languages.length);
    // Poll until the previous search has finished, then start the next one;
    // once no languages are left, generate the report and exit PhantomJS.
    interval = setInterval(function waitForStatistics() {
        if (0 === activeSearches) {
            if (languages.length > 0) {
                activeSearches++;
                search(languages.shift(), function (term, count) {
                    console.log(term + " found " + count + " times");
                    statistics[term] = count;
                    activeSearches--;
                });
            } else {
                console.log("Finished all searches!");
                clearInterval(interval);
                saveReport(generateReport(statistics));
                phantom.exit();
            }
        }
    }, TIMEOUT);
});

The report-generation code, which is also in the script, is a bit awkward, largely due to the lack of a standard JavaScript library for working efficiently with data structures. It takes quite a bit of effort to transform the results into the format needed for rendering the chart, but that code is not really what this script is about; it is just utility boilerplate that unfortunately cannot be avoided here. So let’s just omit this part.
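Still, for the curious, here is a minimal sketch of what such a generateReport function might look like, assuming the classic Google Chart Tools loader and a simple top-30 bar chart. It is an illustration of the idea, not the code from the original script.


// Turns the collected { language: count } map into a self-contained HTML page
// that draws a bar chart with Google Chart Tools when opened in a browser.
function generateReport(statistics) {
    // Sort the languages by result count and keep the top 30 for the chart.
    var rows = Object.keys(statistics).map(function (language) {
        return [language, statistics[language]];
    }).sort(function (a, b) {
        return b[1] - a[1];
    }).slice(0, 30);

    // The collected counts are embedded as a JavaScript array literal,
    // so the report can be opened later without re-running the searches.
    return "<html><head>" +
        "<script src='https://www.google.com/jsapi'></script>" +
        "<script>" +
        "google.load('visualization', '1', {packages: ['corechart']});" +
        "google.setOnLoadCallback(function () {" +
        "    var data = google.visualization.arrayToDataTable(" +
        "        [['Language', 'Results']].concat(" + JSON.stringify(rows) + "));" +
        "    var chart = new google.visualization.BarChart(document.getElementById('chart'));" +
        "    chart.draw(data, {title: 'Top languages on stackoverflow.com', height: 800});" +
        "});" +
        "</script></head>" +
        "<body><div id='chart'></div></body></html>";
}
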

And voilà, the final chart with the 30 most popular programming languages on stackoverflow.com.

However, we cannot reach any conclusions based on the results from just one site. C# is hardly the most popular language overall; this must be a stackoverflow.com thing.

We could go one step further and search some other sites, like Amazon.com, but would that give us much more confidence? Let’s stop at this point and compare our results with the results of similar studies that used slightly different methods.

Comparing with other results

So the top ten languages we got are: C#, Java, PHP, JavaScript, C++, Python, Objective-C, C, Ruby, and Perl.

First, let’s look at TIOBE, which uses search engine result counts; we discussed above why that may not be the best idea. It lists basically the same ten languages, but with Visual Basic instead of JavaScript. Well, maybe Visual Basic has a very lively online community? I somehow doubt it; more likely it is the large number of Visual Basic books and articles that makes it score so high in that index, but everything is possible.

OK, what about the number of projects in different languages on github.com? The site provides its own statistics. They are also very close to the list we obtained, but instead of C# there is Shell, which can probably be explained by the many people with a Linux background who use github.com. It also seems that C# developers do not favor github.com for some reason.

I would say the results for the top languages correlate well. Still, I would be careful about saying how popular the top languages are relative to each other, since different sources yield very different numbers. But, at least, we get the following 12 currently most popular programming languages:

C#, Java, PHP, JavaScript, C++, Python, Objective-C, C, Ruby, Perl, Shell, Visual Basic

No comprehensive research anywhere

The problem of measuring the popularity of programming languages turns out to be more complex than it initially seems. Developers of a given language tend to concentrate around a few sites, and different sites give very different results, as stackoverflow.com and github.com do in the present article. In a way the problem is similar to measuring the popularity of human languages by randomly visiting large cities around the world. After visiting a couple dozen cities we may start to get an idea of which human languages are popular, but we will have a hard time measuring their relative popularity, and an especially hard time with the less popular ones: we may simply never visit a large city where they are spoken. So, to do a good study, we would have to statistically compare results from many different cities (sites) and know the structure of the world (Web) well. Unfortunately, I could not find any such study on the Web, and doing it myself would require much more effort than a weekend project. But even then, is language popularity limited only to online popularity?

Links

Git-based collaboration in the cloud github.com
Software development Q&A stackoverflow.com
PhantomJS
Google Chart Tools
Google Chart Tools: Bar Chart
Count results
Github.com top languages
TIOBE programming community index

Smarter Search Engines

Having written a number of simple scripts that aggregate information that can be found on various Internet sites I have a feeling that I am doing some repetitive, useless and low-level work. It is like writing a classic “Hello, world!” program in machine codes. Clearly there is something missing here, something that could have helped […]