Information, Language, and Intelligence
Why probability, non-determinism, time and causality may not be universal notions, why it is hard to write in functional programming languages, why more sophisticated and less generally applicable theories are needed to get more insight into a particular problem, why modern computers and “artificial intelligence” are not intelligent, why philosophy rarely uses the language of mathematics, what is common between art, literature, natural languages and computer programs, what intelligence may really mean and the central place of language in understanding it.
We will define information quite informally, as although it is something quite intuitive and commonplace, it is hard to give an exact definition without imposing unnecessarily strict limitations on what the notion of information can encompass. Let’s also note that quite often information is confused with a particular way to represent it, for example, zeros and ones in a computer memory are just a representation of information and are not information themselves. At the same time one of the most important properties of information is that it can be encoded and represented. Information is a very generic notion, and, for example, everything perceivable by a human is just an encoding of some information. It is a somewhat elusive notion as we really do not perceive information in its original form, if indeed there is such a form, but only do that through various means in which traces of the original information can be observed. In short, information is something that can be encoded and interpreted.
By language we understand a particular way to encode information. The notion of language as we define it is a much wider and more universal notion than that of just a human or programming language, or that of a mathematical notation. For example, music and various art forms can also be viewed as languages for expressing information. In general everything which is perceivable by an agent is information encoded by means of multiple languages, and human speech or executable machine instructions form just a couple of particular classes of languages through which information can be perceived.
Simplicity of a language is determined by the ability required from an agent trying to perceive the information encoded in this language.
For example, zeros and ones stored in a computer memory may not be a simple enough language for humans to understand without some help from a computer. In case of humans an example of a simpler language is the language of numbers when it is used to describe the quantity of some type of objects.
Simplicity is therefore a relative and subjective notion for a particular agent, as what can seem like a simple language to one agent can be prohibitively difficult to another.
Formality is the quality of preserving the original information being encoded.
For example, intuitively it is clear that the symbolic language of propositional logic is more formal than any spoken human language when dealing with describing properties of objects and relationships between them. Incidentally, the language of propositional logic is also simpler for this particular type of information.
But how formal are symbolic languages really? It may at first be a bit surprising that even such a formal way to represent information as mathematical notation may not be “formal enough”, as some information may be lost when we try to encode it as symbols. In fact, this is what one of the most important results in the field of mathematical logic, Gödel’s theorem on the incompleteness of axiomatic arithmetic, shows: the language of general relational logic is not formal enough to formally build arithmetic from axioms http://plato.stanford.edu/entries/goedel-incompleteness/
Applicability of a language can be judged from its ability to encode information in general. For example, a spoken human language can be used to encode poetry or write the present essay, while the language of mathematics is much less suitable for the same purposes. In this way the spoken language is more generally applicable than the language of mathematics.
Information encoded in one particular language can then be encoded again in another language, and we can then talk about translation from the first language to the second one. Languages can also be used to define other languages and we can reason about relationships between different languages as well. But in order to stay focused we will largely omit this discussion in the present essay.
Formality and applicability as we defined them are intrinsic properties of a particular language and do not change depending on the agent trying to interpret the language, but simplicity is a subjective quality attributed to the language by a particular agent.
Informally intelligence is the ability to interpret information by a given agent. In order to do that the agent should be able to construct and understand different encodings of information (different languages). These languages should be simple enough for the agent and at the same time remain formal and applicable enough for the given subset of information. Then intelligence can be defined as the ability to construct and understand languages and is fully defined by a set of all such possible languages.
It is quite interesting to note that the given definition of intelligence does not include any mention of agents and is done purely in terms of languages.
In the case of human intelligence having the ability to build new sufficiently simple, formal and applicable languages leads to the ability to categorize things, abstract away non-essential details and reason about the same set of information at different levels. So called “abstract thinking” can then be considered as one of the indications of human intelligence.
In the context of intelligence, language simplicity can also be understood as the amount of memory and computational resources required to interpret the encoded information. For example, in the case of human intelligence, the language of normal form games in Game Theory may not be simple enough in all cases, as a large amount of storage may be required. Determining satisfiability of expressions in propositional logic is yet another example: in the worst case all known general methods require exponential time, so in some cases propositional logic is also not a simple enough language for humans.
As we did when discussing languages in general, let’s also remark that we can further consider groups of agents, agents created by other agents in some language, relationships between different agents, etc. and also talk about intelligence in every one of these cases, but these discussions will remain out of the scope of this short essay. We can also consider whether the intelligence of a population of agents is greater than the intelligence of every individual agent, or whether we can define the average intelligence of a given population. Further in this essay, by human intelligence we simply mean the average intelligence attainable by a human agent, and we do not talk about the social environment or discuss how intelligence is affected by the development of technology and many other things. These are all interesting topics on which separate research can be done.
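To make the resource cost of propositional satisfiability a little more concrete, here is a minimal Python sketch of the most naive approach: trying every row of the truth table. The function name `satisfying_assignment` and the sample formula are illustrative assumptions, not something taken from the essay. With n variables the loop may have to visit 2**n rows, which is exactly the kind of growth that quickly stops being "simple" for a human.

```python
from itertools import product


def satisfying_assignment(formula, variables):
    """Return the first assignment (as a dict) that makes formula true, or None.

    formula is any callable taking a dict {variable name: bool}.
    This is a brute-force search over the whole truth table, so it may
    evaluate the formula up to 2 ** len(variables) times.
    """
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if formula(assignment):
            return assignment
    return None


# Hypothetical sample formula: (a or b) and (not a or c) and (not b or not c)
def example(v):
    return (v["a"] or v["b"]) and (not v["a"] or v["c"]) and (not v["b"] or not v["c"])


# A formula that can never be satisfied: a and not a
def contradiction(v):
    return v["a"] and not v["a"]


result = satisfying_assignment(example, ["a", "b", "c"])
no_result = satisfying_assignment(contradiction, ["a"])
```

Three variables mean at most 8 rows, which a person can still check by hand; 40 variables already mean about a trillion rows for this naive method.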
Hypothesis 1. (connection between applicability, simplicity and formality)
For a given intelligence the corresponding set of interpretable languages is such that as the applicability tends to increase, the formality and perceived simplicity tend to decrease and vice versa.
- In the case of mathematics, propositional logic is less applicable but more formal and simpler in its resolution rules than the general relational logic.
- Philosophical texts tend to use spoken language as opposed to strict symbolic notation as it allows them to be more applicable and less narrow. Unfortunately this also means being less formal and more difficult. In fact, this was likely well understood by Plato, as he deliberately chose dialogs as the form of his philosophical writing, which makes it even less formal than the average plain philosophical text while increasing applicability and allowing one to reason about things in a more general way.
- Computing integrals and derivatives requires a simple and formal language of mathematical analysis to be developed. Yet this language is not generic enough to write general purpose novels in it or use it for communication between people and is highly specialized.
- Domain specific languages in software engineering are well suited for solving some particular problem and are built using a general purpose programming language. Being as formal as the general purpose programming language they are simpler to grasp and less applicable in general.
- Many art forms such as painting or music allow for different interpretations in a human language (i.e. different translations) and present an example of more applicable but less simple and less formal languages.
From this hypothesis it follows that at a given intelligence level trying to use simpler and more formal languages will lead to these languages becoming ever narrower. This in fact can be observed in many fields of science, where more sophisticated and formal theories are built to describe the growing number of specialized fields. The main reason for this apparent fragmentation and specialization of science is the limitation of human intelligence and its inability to build sufficiently applicable and at the same time formal and simple enough languages.
Human intelligence and its limitations
One particular case of intelligence, probably most familiar to us but still not understood well enough, is human intelligence. As we discussed before, like any other intelligence it is also defined by the set of languages that can be constructed and understood, in this case, by an average human.
From observing a number of still not fully solved or well understood problems in the perceivable world in general and in mathematics in particular we can guess that the human intelligence is actually quite limited by its nature. For example, this intelligence is apparently not able to build a universally applicable formal and simple language.
Moreover, many of the very core concepts and notions that humans routinely reason with turn out, at a closer look, to be not all that universal but specific to just some of the languages that are constructed and understood by humans. These core concepts are seemingly not inherent in the information itself but are just artifacts of the human mind and its perception of the world. Let’s discuss a few examples in more detail.
Thinking in terms of objects, actions and changes is a common human way to imagine the world as if it indeed consists of objects that can have mutable state and can change after some action is applied to them. Trees, animals, cars, houses and many other things are all perceived as objects: an object can be both a source of action for other objects or can itself experience actions from other objects. The effect of a particular action is then some change of the state of this object.
But do we really always need concepts of “object”, “action” and “change” in order to encode information? The answer we get from pure mathematics, geometry and functional programming languages is clearly – “no”. There are multiple examples that it is possible to do just fine with an abstract symbolic notation that is not concerned with objects, actions or changes, and at the same time is formal and simple. Unfortunately it turns out that such a symbolic notation will not always be applicable widely enough and might be too overspecialized for solving some particular problem, but this has nothing to do with such a notation itself, rather it indicates the limitations of the human intelligence.
Symbolic notation in general is also usually perceived by humans as being “more difficult”. For example, software engineers know well that for many algorithms it is possible to create purely functional stateless versions, although it is much harder and less obvious to do so. Even mathematics itself is often perceived as being “difficult”.
Similar reasoning can be applied to time, non-determinism and causality: it is possible to build formal and simple languages that do not involve any of these concepts. An interesting example of how time, causality and non-determinism are artificial and non-essential is provided by Game Theory: every normal form game can be rewritten as an imperfect information extensive form game which, unlike the original game, operates with the concept of time and sequences of actions and may potentially simplify the subsequent analysis. Yet another example can be found in statistical methods and probability theory: when it is “difficult” to compute some values for a given model we can usually make a few assumptions and then deal with a less formal “non-deterministic” model that is much easier to analyze, although interestingly enough we still try to reason about the resulting simplified model in a relatively formal way.
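As a small illustration of the remark about stateless algorithms, the following Python sketch shows the same insertion sort written twice: once in the familiar style that changes a list step by step, and once as a pure function that never modifies anything. The function names are hypothetical and chosen only for this example. The second version is arguably less obvious to most readers, even though both describe the same result.

```python
from functools import reduce


def insertion_sort_in_place(items):
    """Imperative version: shifts elements inside the list it is given."""
    for i in range(1, len(items)):
        current = items[i]
        j = i - 1
        # Move larger elements one slot to the right to make room.
        while j >= 0 and items[j] > current:
            items[j + 1] = items[j]
            j -= 1
        items[j + 1] = current
    return items


def insert_sorted(sorted_items, value):
    """Pure helper: build a new sorted tuple containing value; nothing is mutated."""
    if not sorted_items or value <= sorted_items[0]:
        return (value,) + sorted_items
    return (sorted_items[0],) + insert_sorted(sorted_items[1:], value)


def insertion_sort_pure(items):
    """Stateless version: fold every element into an initially empty sorted tuple."""
    return reduce(insert_sorted, items, ())


original = (3, 1, 2, 5, 4)
mutable_copy = list(original)
sorted_in_place = insertion_sort_in_place(mutable_copy)
sorted_pure = insertion_sort_pure(original)
```

After running this, `mutable_copy` has been changed, while `original` is untouched and `sorted_pure` is a new value; there is no notion of "before" and "after" in the pure version.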
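The Game Theory example can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the payoff numbers (a Prisoner's Dilemma), the dictionary layout, and the names `to_extensive_form` and `play`. The "timeless" table is turned into a tree in which player 1 moves "first" and player 2 moves "second", with all of player 2's nodes sharing one information set so that player 2 cannot observe the earlier move. The payoffs are unchanged; only the language describing them has acquired a notion of sequence.

```python
# Hypothetical 2x2 normal form game: (row move, column move) -> (payoff 1, payoff 2)
NORMAL_FORM = {
    ("cooperate", "cooperate"): (3, 3),
    ("cooperate", "defect"): (0, 5),
    ("defect", "cooperate"): (5, 0),
    ("defect", "defect"): (1, 1),
}


def to_extensive_form(normal_form):
    """Build a two-level game tree equivalent to the given payoff table."""
    row_moves = sorted({row for row, _ in normal_form})
    col_moves = sorted({col for _, col in normal_form})
    return {
        "player": 1,
        "info_set": "P1",
        "children": {
            row: {
                "player": 2,
                # Same label on every player 2 node: player 2 does not
                # know which branch player 1 has taken.
                "info_set": "P2",
                "children": {
                    col: {"payoffs": normal_form[(row, col)]} for col in col_moves
                },
            }
            for row in row_moves
        },
    }


def play(tree, moves):
    """Follow a sequence of moves from the root and return the payoffs at the leaf."""
    node = tree
    for move in moves:
        node = node["children"][move]
    return node["payoffs"]


tree = to_extensive_form(NORMAL_FORM)
```

Calling `play(tree, ["defect", "cooperate"])` walks the tree in two steps and lands on the same payoffs that the table gives for that pair of moves.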
It seems that such concepts as time, non-determinism, causality, object, action and change are merely mental devices that humans tend to incorporate in most of the languages understood by them and are not essential for representing information and perceiving the world.
Having discovered this, it is interesting to recall Plato’s famous dialog Euthyphro http://plato.stanford.edu/entries/plato-ethics-shorter/#13 which discusses causality and shows that often we cannot easily determine which of several concepts is the primary one causing and defining the others (in the dialog the discussion is about how piety can be defined). Now we understand why we cannot always decide which of several given concepts is the primary one: causality is deeply flawed, not universal and not applicable in this and many other cases. So it is not always correct to ask to define one thing through another. The same, by the way, can be observed in many axiomatic systems in mathematics: there are always concepts that are not caused by or do not follow from other concepts, which is yet another indication that causality is not universal.
Hypothesis 2. (non-causal determinism)
The concepts of time, non-determinism, causality, object, action and change are specific to a certain class of languages that can be understood by humans and are not essential for encoding information and perceiving the world. The information and the world themselves are non-causal and deterministic.
Interestingly enough, many of the things commonly observable by humans seem to be cyclical in nature: these are usually either simple cycles or superpositions of many cycles. According to the last hypothesis the cyclical nature of things is just a peculiarity of human perception and an illusion, as in reality things are not cyclical, although they are often perceived as such due to the limitations of human intelligence.
While discussing cycles we can also recall the modern view on the dual nature of particles in physics: according to this view it can be convenient to imagine particles as waves. Based on the hypothesis we should not be surprised if the further progress of particle physics eventually will show that modeling particles as waves or even imagining particles themselves is just an oversimplification and a mere mental device, and that the reality is different, non-changing and deterministic. Although in this particular case the true understanding of the nature of things also might be beyond any human intelligence.
How can human intelligence be expanded or improved? There are a few obvious ways that follow from the definition of intelligence.
In order to be able to construct more languages humans may need to perceive more, have better measurement tools and information gathering devices and this is one direction for improvement.
Another way is to try to make the constructed languages appear simple enough with the help of appropriate tools. It can be that some particular language is already formal and applicable in the desired domain but at the same time is not simple enough and requires a lot of computational resources or memory that an average human does not possess. Historically, for this purpose humans have used computers of different kinds, starting with the abacus a few thousand years ago and ending at present with the digital machines built according to the architecture suggested by von Neumann.
Beyond human intelligence
The ability to understand languages is not unique to humans and there are other intelligences of which we can find multiple examples. There are much simpler ones created by humans themselves such as, for example, an executable computer program that can understand only a very limited set of inputs. Every such program can also be viewed as a mere piece of information encoded by humans using some of the languages that humans understand but the resulting computer program obviously does not.
We can also notice that there are languages that humans cannot interpret well. This can be an indication of far more advanced intelligences to which humans may be related in much the same way as a computer program is related to the person who wrote it. Due to the already discussed limitations of human intelligence we, unfortunately, may not be able to reason well about the intelligences that are more advanced than our own, as by definition we cannot fully understand all the languages understood by them. However, as hypothesis 1 (connection between applicability, simplicity and formality) suggests, it may be possible to get a short glimpse by trying to create more formal and simple languages. Metaphorically it may be a bit similar to observing what is happening outside through a very narrow keyhole which significantly limits the perspective. We can only guess that these higher languages that we cannot fully understand may be similar in their precision to the formal languages of the many specialized fields of mathematics and physics, with the difference being that they are much more generic and applicable. Still, it may be quite hard to understand higher languages, just as it is impossible for a program that was written by a person to understand that it was in fact written by somebody.
Hypothesis 3. (language continuum)
For any set of information a formal language can be found that is simple for some intelligence.
Inability to understand or find such a language may merely indicate the limitations of a certain intelligence, and, as we noted before, perceiving a formal applicable language for a given set of information as being “difficult” is subjective and is not a property of the language itself.
Therefore a simple solution can be found for any problem once an appropriate formal and simple language is found. According to this hypothesis there are no inherently difficult problems, only not capable enough intelligences that cannot understand them. Solving any problem is equivalent to finding a corresponding formal and simple language.
Hypothesis 4. (absolute intelligence)
There exists an absolute intelligence which is capable of encoding any information in a language that is formal and, for that intelligence, simple.
It may be that all the possible information can be encoded using a simple and formal language which in this case we will call universal language. The intelligence able to understand and use this language is absolute.
Other languages for other subsets of information and the corresponding intelligences themselves may then just be represented in the same universal language and fully understood by this absolute intelligence. In between any given intelligence such as the human one and the absolute intelligence there can exist many other higher intelligences which will be somewhat similar to the human intelligence in the sense that they are not absolute.
The Turing test is one of the suggested criteria for determining whether an intelligence created by humans can itself be judged to be comparable to the human intelligence. In the test a human person and either a created intelligence or another human person communicate with each other, and the human tries to judge whether they are communicating with another human or with the created intelligence. If no distinction can be made, then the created intelligence passes the Turing test and is considered “intelligent enough”.
If some intelligence passes the test it means that it can construct and understand at least as many sufficiently simple, formal and applicable languages as an average human can. Then according to the definition given in the present essay this constructed intelligence is comparable to the average human intelligence. Therefore, it is easy to see that the linguistic definition of human intelligence given in the present essay is equivalent to the definition of intelligence suggested earlier by Turing. In this sense the present essay expands on the ideas of Alan Turing as the validity of the Turing test follows directly from our definitions.
Modern “Artificial Intelligence” and “Machine Learning”
By “artificial” intelligence we usually understand any intelligence that has been manufactured by humans.
Especially lately there has been a lot of talk of artificial intelligence and machine learning in the software industry. At a closer look, however, given the linguistic definition of intelligence we notice that in this particular case the term “intelligence” may be a bit misplaced. Indeed, modern adaptive and categorization algorithms are not able to construct or understand many basic languages understood by humans, and are quite limited to solving very specific problems. Then based on our definition of intelligence it may be more appropriate to talk about adaptive and categorization algorithms or finding solutions to optimization problems rather than about “intelligence”.
Unfortunately the linguistic nature of intelligence remains little understood, and Turing’s research is often misinterpreted as well, so for such algorithms and techniques we continue to use the terms “intelligence” and “machine learning”, which only causes further confusion and misunderstanding.
Another interesting observation is that, in fact, the very modern computer architecture itself may be limiting the applicability of the languages that can be potentially constructed by programs running on such computers. One possible indication of this seems to be the observable “difficulty” of NP-complete problems that cannot be solved in a reasonable amount of time on such architectures. And yet these are precisely the problems that are often encountered in modern artificial intelligence research which in itself does not look like a mere “coincidence”.
Let’s note that according to the language continuum hypothesis NP-complete problems can easily be “solved” while satisfying the perceived complexity requirements. The key to solving them is using an appropriate language, but the real problem here seems to be the inaccessibility of such a formal and simple language to human intelligence and inability to find it, although it must exist.
In general singularity is some abrupt qualitative change when the usual laws and assumptions stop working and the new ones are needed.
In the context of artificial intelligence, singularity means a potential breakthrough that would allow the creation of an intelligence that will equal or exceed that of an average human and will potentially be able to create an even more advanced intelligence or replicate itself. It is also implied that the process of creating such a breakthrough intelligence should be fully understood and controlled by humans.
Is singularity possible or even inevitable? It is hard to find a definite answer. Let’s try nonetheless to define a few necessary conditions for reaching the singularity. Note, however, that, like the Turing test itself, these conditions are not sufficient to judge whether the created intelligence is already advanced to such a degree that the singularity has been reached, and as a consequence the very fact that the singularity has been reached can remain largely unnoticed.
At the point of singularity the created intelligence will have to be able to:
- pass the Turing test
- create new languages and categories for encoding information it had not yet encountered when it was created
- produce artifacts expressed in visual art, literature, and music comparable to those made by humans
- learn and proficiently use any human language
- attain perfect knowledge of self so that it can create a new intelligence comparable to that of a human or replicate itself
An intelligence is considered to be “greater” than some other intelligence if all the languages understood by the other intelligence are also understood by it and it also understands some additional languages that the other intelligence is not capable of understanding. Most likely the following hypothesis also holds true:
Hypothesis 5. (impossibility to construct greater intelligence)
Intelligence cannot construct a greater intelligence that will be able to construct a richer set of languages.
Given our definition of intelligence intuitively it is quite clear why this hypothesis may be true. Indeed, if some intelligence is able to construct another intelligence, then indirectly the original intelligence is also able to construct all the languages that the created intelligence is able to construct and the constructed intelligence is not greater than the original one.
Note that at the same time this hypothesis allows for constructing an intelligence that will be exactly the same as the intelligence that constructed it or will be a lesser intelligence.
Another aspect that is often omitted or misunderstood when talking about singularity is that human intelligence as we perceive it “at present” may be underestimated, and “in the future” humans might be able to construct much more advanced languages with the help of computing devices or other not yet discovered technologies. Even if it is not possible for humans to create a greater or equal intelligence, it is still possible to try to further develop the human intelligence itself. In fact the boundaries and inner workings of human intelligence and the types of languages that can be constructed by it are still not understood well enough by modern science.
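The “greater” relation defined above can be pictured, under a strong simplifying assumption, by modelling each intelligence as nothing more than a finite set of language names. The sets and names below are purely hypothetical. In this toy model “greater” is the proper superset relation, which also makes it visible that two intelligences may be incomparable: neither is greater if each understands a language the other does not.

```python
def is_greater(a, b):
    """True if a understands every language b does, plus at least one more.

    Both arguments are frozensets of language names; '>' on sets is the
    proper superset test.
    """
    return a > b


# Hypothetical toy intelligences, each reduced to the set of languages it understands.
human = frozenset({"speech", "arithmetic", "music"})
calculator = frozenset({"arithmetic"})
chess_program = frozenset({"arithmetic", "chess notation"})

human_vs_calculator = is_greater(human, calculator)
calculator_vs_human = is_greater(calculator, human)
human_vs_chess = is_greater(human, chess_program)
chess_vs_human = is_greater(chess_program, human)
```

In this model an intelligence is never greater than itself, which matches the remark that a constructed intelligence may be exactly the same as the one that constructed it without being greater.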
Directions for further research
The present essay tries to give a linguistic definition of intelligence and expand on the ideas first suggested by Alan Turing.
However, we have barely scratched the surface of many of the topics, and the discussion itself was quite informal and far from rigorous. As we noted in the essay, some of the concepts we constantly operate with in human languages such as objects, actions and causality may not be universal and yet, paradoxically enough, these very concepts are heavily used by the essay itself to reason about more abstract and universal concepts such as information, language and intelligence. It might be that the language of the essay and of any following research can be revised and refined in this regard.
Another interesting question is whether we can use a formal symbolic notation to try to define the concepts discussed and to build a formal mathematical theory that will model them. This surely can be done, but we should be mindful of hypothesis 1 (connection between applicability, simplicity and formality) and understand that our discussion may be unnecessarily narrowed down because of the choice of a more formal language. Also, when striving for ultimate formality and strictness we should remember the Gödel theorem, which shows that even the language of general relational logic is not formal enough and that ultimate formality likely cannot be attained anyway. These are some of the reasons why we chose not to use any formal symbolic notation in the present essay, although this direction can certainly be researched further and we may be able to get more insight for some particular cases, similar to how it has been done in many other fields of mathematics and physics.
Historically and especially in the last few centuries the research in many fields including linguistics and intelligence has been too centered on humans, which probably is no surprise, given that all such research was done by humans themselves. As we saw in the essay human intelligence is just one particular case of intelligence in general and it may be wrong to focus exclusively on it when trying to understand intelligence in general. Even the question whether “the present” human level of intelligence can be exceeded by some intelligent machine may not be all that important as it seems at first and instead we should focus more on other interesting related questions in the field of intelligence. But still it is possible to do further research on the intelligence of groups of people, how intelligence seems to evolve and what role the surrounding conditions can play.
Equally it seems to be wrong to consider machine intelligence separate from human intelligence or other kinds of possible intelligence. In particular we discussed that intelligence is defined by the set of languages that are understood by it and not by an individual agent or whether this agent has a silicon-based, biological or some other nature.
Yet another possible research direction involves studying the relationship between different languages, what translation between languages can really mean, what metrics we can introduce for the way information is encoded by a particular language, and what classes of languages there are. An important thing is that in any such research languages should be understood in the broader sense that we used here.
We can also try to define the concept of computation in the linguistic terms and try to reason about time and storage complexity.
[Credits: Oak image from © Free Clip Art Now http://www.freeclipartnow.com/nature/plants/trees/oak.jpg.html ]
“Computing Machinery and Intelligence” by A.M. Turing (1950), Mind, 59, 433-460. http://mind.oxfordjournals.org/content/LIX/236/433.full.pdf+html