Many people believe AI (Artificial Intelligence research) started quite recently, like five years ago. But in fact the field has already had 70 years of fascinating history. It all began in the nineteen-fifties when the potential power of information technology was becoming clear, at least to a small group of far-sighted thinkers including Alan Turing and Norbert Wiener. They started to dream in earnest of building machines that exhibit some form of intelligence. Initially, there was a strong interest in constructing small cybernetic robots now called animats. For example, Grey Walter built two robotic tortoises called Elmer and Elsie in 1950 and showed how they could roam around a living room, find a charging station and recharge themselves. In the same year, Claude Shannon demonstrated an electronic mouse, Theseus, that could learn to find its way in a maze. These animat builders were strongly influenced by the behaviorist school that dominated psychology at the time. Among the main ideas coming out of their work were models of neural networks to implement associative and reinforcement learning, which they demonstrated in behavioral conditioning scenarios similar to those used by Skinner and others to train animals.
The neural network models coming out of this work were initially applied primarily to problems in pattern recognition, image classification and systems control. In subsequent decades, neural networks and their applications multiplied, although the general idea always remained the same: Neural networks are numerical decision-makers. A network consists of several layers of nodes (loosely inspired by biological neurons) that make a weighted decision to produce a numerical output given a set of numerical inputs. For example, a node might produce a control signal to increase the speed of the left motor if a light sensor mounted on the right side of the robot captures light, so that the robot turns towards the light. More generally, neural networks implement dynamical systems that map vectors of numbers (for example a sequence of values produced by a digital camera) to other vectors (for example a stream of signals controlling the operation of robot actuators), possibly with extra layers of decision-making in between. The intermediary layers may extract additional information from the sensory inputs or coordinate different aspects of the output.
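To make the idea of a node as a numerical decision-maker concrete, here is a minimal sketch of the light-seeking robot described above. The function names, the weights and the bias value are illustrative assumptions, not taken from any historical system.

```python
def node_output(inputs, weights, bias=0.0):
    """Weighted sum of numerical inputs, clipped to the range [0, 1]."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return max(0.0, min(1.0, total))


def motor_speeds(left_light, right_light):
    """Map two light-sensor readings to (left_motor, right_motor) speeds."""
    sensors = [left_light, right_light]
    # Crossed wiring (assumed): the right sensor drives the left motor and
    # vice versa, so light on the right speeds up the left wheel.
    left_motor = node_output(sensors, weights=[0.0, 1.0], bias=0.2)
    right_motor = node_output(sensors, weights=[1.0, 0.0], bias=0.2)
    return left_motor, right_motor


if __name__ == "__main__":
    left, right = motor_speeds(left_light=0.0, right_light=0.6)
    print("left motor:", left, "right motor:", right)
```

With light only on the right, the left motor comes out faster than the right one, so the robot veers towards the light. The whole decision is nothing more than a weighted sum of numbers.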
The most remarkable property of these neural networks is that they learn autonomously based on a procedure, called a learning algorithm, that gradually changes the weights of the various decision nodes in order to minimize a decision error. Neural networks are therefore no longer programmed as is the case for ordinary computer programs. The 'trainer' only needs to provide a very large set of input-output pairs or reinforcement signals and - if all goes well - the weights then get progressively pushed in the right direction given an adequate learning algorithm.
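As an illustration of such a learning algorithm, the following sketch trains a single decision node with the classic perceptron rule, one of the simplest procedures of this kind. The task (the logical AND of two inputs), the learning rate and the number of passes are assumptions chosen only to keep the example small.

```python
def predict(inputs, weights, bias):
    """The node fires (1) when the weighted sum exceeds zero, else 0."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if total > 0 else 0


def train_node(examples, epochs=20, learning_rate=0.5):
    """Perceptron rule: nudge each weight in proportion to the decision error."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in examples:
            error = target - predict(inputs, weights, bias)
            # No change when the decision was right (error == 0).
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


# Input-output pairs supplied by the 'trainer': the logical AND function.
EXAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

if __name__ == "__main__":
    weights, bias = train_node(EXAMPLES)
    for inputs, target in EXAMPLES:
        print(inputs, "->", predict(inputs, weights, bias), "expected", target)
```

Nobody writes down the rule for AND here. The weights start at zero and are pushed towards values that reproduce the examples, which is the sense in which the node is trained rather than programmed.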
Around 1955 a group of adventurous researchers including Herbert Simon (who later got a Nobel Prize in economics), Allen Newell, Marvin Minsky, and John McCarthy opened a second thread in AI research. They focused on human mental tasks, rather than animal behavior, and started to use the term 'artificial intelligence' for their work. At first they were particularly interested in mathematical theorem proving, problem solving, board games and puzzles. By the end of the nineteen-fifties they already showed impressive demonstrations of computer programs capable of excellent performance in these domains. For example, Newell, Simon and Shaw already demonstrated around 1958 a system that could prove most of the theorems contained in the Principia Mathematica of Bertrand Russell and Alfred North Whitehead.
The basic idea behind these achievements is that human intelligence is based on the creation and manipulation of symbolic structures. Symbolic structures are graphs where the nodes and links between nodes are labelled. For example, the problem of finding a path in a city is handled by representing streets, buildings and other landmarks as symbol nodes and the locations and spatial relations of these entities as labelled links between these nodes. Finding a path then consists in traversing this network to search for connections between an initial starting point and a goal destination. Playing chess is done by representing the pieces of chess and their positions as symbols, and defining symbolically the possible moves that each piece can make on the chess board. To decide on the next move, the player generates a search space which considers the different possible moves from the current board position and evaluates whether they will give an advantage or create dangerous conditions that might lead to checkmate. Because the search space of possible chess moves is very large, human players bring heuristics to bear. Heuristics are strategies to minimize search by applying more knowledge, for example, knowledge about typical openings or end-game solutions. In early AI, research into heuristics and how they could be learned was one of the main topics.
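The path-finding example can be sketched in a few lines. The small city below is invented for illustration, and breadth-first search stands in for whatever search strategy a particular system might use. A heuristic would typically reorder or prune the frontier that this sketch explores exhaustively.

```python
from collections import deque

# A symbolic structure: landmarks are nodes, streets are labelled links.
# All place and street names are made up for this example.
CITY = {
    "station": [("main_street", "market"), ("canal_road", "bridge")],
    "market": [("church_lane", "town_hall")],
    "bridge": [("river_walk", "museum")],
    "town_hall": [("park_avenue", "museum")],
    "museum": [],
}


def find_path(graph, start, goal):
    """Breadth-first search; returns a list of (from, street, to) steps."""
    frontier = deque([(start, [])])
    visited = {start}
    while frontier:
        place, steps = frontier.popleft()
        if place == goal:
            return steps
        for street, neighbour in graph.get(place, []):
            if neighbour not in visited:
                visited.add(neighbour)
                frontier.append((neighbour, steps + [(place, street, neighbour)]))
    return None  # no connection between start and goal


if __name__ == "__main__":
    for origin, street, destination in find_path(CITY, "station", "museum"):
        print(f"from {origin} take {street} to {destination}")
```

Every intermediate result is a readable symbol, so the route the program found can be inspected and explained step by step.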
The first difference with the earlier neural network models is that this kind of AI uses symbolic representations and operations rather than vectors of numbers and numerical operations over them. It is therefore also called symbolic AI as opposed to the numerical AI of neural networks. A second difference is that this kind of AI took from the beginning the side of cognitive psychology, in opposition to the behaviourist psychology that inspired the neural network pioneers. Behaviourists argued against complex mental processing, claiming that competence, even for language or problem solving, was based on fairly superficial stimulus-response associations learned through associative or reinforcement learning. Cognitive psychologists, by contrast, were not averse to complex internal models (like a graph of streets representing the geography of a city), rich knowledge representations (for example, semantic networks representing the common sense implications of basic concepts), or sophisticated syntactic and semantic processing (as needed in parsing and producing language). They argued that the stimulus-response associations implemented by neural networks were too superficial to implement reasoning, language processing or other tasks we consider to require intelligence. In AI we similarly have a dichotomy between behaviourist AI, which rests its hope on associations implemented through neural networks, and cognitivist AI, which works with complex symbolic representations.
A third distinction between the two schools of thought concerns learning. Whereas neural network enthusiasts emphasize statistical induction, i.e. progressively generalizing from many experiences, the symbolic models primarily emphasize 'learning by being told' and constructivist learning. Learning by being told means that the learner is able to comprehend instructions or advice and incorporate and use that in subsequent problem solving. Constructivist learning means that the learner uses their available knowledge to construct new distinctions or to formulate sensible hypotheses and then test them out against reality. A single exposure is then often enough to acquire a significant piece of new knowledge, contrasting with the massive amount of data that is needed to implement the statistical induction which neural networks rely on.
By the early nineteen-sixties several laboratories exploring symbolic AI had sprung up and already very significant technical advances had been made, particularly in how to handle symbolic computation. Soon many more areas of human intelligence were explored: medical diagnosis, scientific discovery, intelligent scheduling, legal argumentation, technical design, language processing, common sense reasoning, even artistic creativity. In the decades that followed, all this research led to industrially usable expert systems assisting human experts in problem solving. It also led to the construction of very large knowledge bases such as the knowledge graphs that underlie today's search engines, and to natural language processing technologies that could power computer-assisted translation or text editing tools.
Fast forward to more recent times.
By the beginning of the 21st century both the numerical/behaviorist AI tradition and the symbolic/cognitivist AI tradition had reached maturity. AI was no longer in the spotlight and became an accepted branch of software engineering and computer science. The field had developed a well-established set of tools and practices for building intelligent systems and they were used in a wide range of industrial and commercial applications. Meanwhile fundamental AI research continued, exploring both neural network models and symbolic methods. More fundamental research was necessary, partly because the difference between human intelligence and machine intelligence was still very significant - and it still is today.
But around 2010, a remarkable surge in the availability of data due to the deepening penetration of information technology in human activities and a considerable jump in the power of computers caused a rather sudden growth in enthusiasm for AI, specifically for the numerical AI techniques pioneered by neural network researchers, such as deep learning and convolutional networks. Earlier on these techniques were not applicable to realistic problems due to a lack of data and computing power. But now they were. The renewed enthusiasm caught on and spread rapidly throughout the world. Management consulting companies promoted (numerical) AI as the next frontier for industry and as an essential skill if companies wanted to remain competitive in today's world. Governments drew up strategic plans for AI and new start-ups and laboratories sprang up like mushrooms. The enthusiasm was not only due to the use of neural network methods. Many existing techniques of numerical and statistical analysis (such as regression, clustering, principal component analysis, optimization techniques, etc.) were now also promoted as being part of AI, thus rapidly increasing the scope of the field to encompass a far larger range of techniques and applications, beyond neural networks and symbolic methods.
But the growing reach of numerical methods and the fact that they were now labelled as AI came with a catch. The symbolic/cognitivist AI approach advocates starting from human expertise. It tries to model human reasoning, human knowledge and human forms of communication so that the decision making by a system can be followed by a human, an explanation in human terms can be provided easily, and the system can accept advice from a human in a symbolic form (i.e. in human language). This kind of AI is therefore human-centered. It attempts to empower humans rather than replace them. In contrast, the numerical/behaviorist AI tradition, including the recent addition of statistical numerical methods, focuses on building systems by finding the right weight parameters that (ideally) give adequate performance, but the basis of their decision-making is hidden in millions of numerical parameters that are entirely incomprehensible to a human observer, even to the designer of the network or the trainer. Such systems are forever black boxes.
A black-box approach is all right for domains where a human-centered approach is not required, for example, for a controller of a complex technical device. But it is another matter if these numerical methods are used for domains that touch on human concerns, for example, to decide whether a prisoner gets parole, a citizen gets social housing, a consumer gets more credit, or a candidate gets an interview for a job. In those cases, the black-box approach of numerical AI becomes problematic and those who are affected by these decisions rightly feel helpless and treated unfairly. Of course, numerical methods have been used for a long time (such as in operations research) but the systems built on this basis were not called intelligent. Nobody was expecting an explanation and nobody was claiming that they were as good as or better than human experts. However, if you call such systems 'intelligent', the expectations of users increase drastically and they expect functions similar to those we find in human intelligence, in particular transparency, consistency, and the capacity to explain how a decision was made or to accept counterarguments.
An additional problem of statistical methods is that they do not give the robustness and reliability that we normally expect from engineered systems. If decisions are based on statistical grounds, there are always going to be outlier cases which do not fall in the most common range. There is always going to be a bias in the data that is used for training. A decision can only be based on the features that were available for training, which might not include crucial properties of the context that a human expert would effortlessly take into account or aspects of reality which cannot be measured easily but are nevertheless important. For example, a legal advisory system built using deep learning will perform induction over a large number of cases to build statistical models of how cases have been handled in the past. A new case is handled by comparing it to these models, but the system has no explicit notion of the underlying law or common custom and can therefore not justify its decisions in terms that would stand up in court. In contrast, a symbolic legal expert system will be based on a codification and implementation of the law and it will handle new cases through logical inference based on the implementation of these legal rules. This is not without its problems either, because, even in the case of codified law, there is always an interpretation step that relies on human empathy and common sense knowledge, which is very hard to capture in explicit rules.
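The symbolic side of this contrast can be illustrated with a toy forward-chaining rule engine. The two rules below are invented placeholders and do not codify any actual law. The sketch only shows how each conclusion is derived from explicit rules and can therefore be traced back to them.

```python
# Each rule: (name, set of conditions, conclusion). Purely hypothetical rules.
RULES = [
    ("R1", {"contract_signed", "party_is_adult"}, "contract_valid"),
    ("R2", {"contract_valid", "payment_missed"}, "breach_of_contract"),
]


def infer(facts, rules):
    """Forward chaining: apply rules until no new conclusion can be added.

    Returns the set of known facts and a trace that justifies each step.
    """
    known = set(facts)
    trace = []
    changed = True
    while changed:
        changed = False
        for name, conditions, conclusion in rules:
            if conditions <= known and conclusion not in known:
                known.add(conclusion)
                reasons = ", ".join(sorted(conditions))
                trace.append(f"{name}: {reasons} => {conclusion}")
                changed = True
    return known, trace


if __name__ == "__main__":
    case = {"contract_signed", "party_is_adult", "payment_missed"}
    conclusions, justification = infer(case, RULES)
    for line in justification:
        print(line)
```

The trace is the explanation: every conclusion names the rule and the facts it rests on. What the sketch cannot capture is the interpretation step mentioned above, for instance deciding whether a fact such as a missed payment really applies to the case at hand.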
So AI finds itself in an impasse. Numerical AI has caused great enthusiasm lately but, because it is not human-centered, it has raised a wide range of ethical and legal considerations and has generated justified worries among those caring about the rights of citizens. Particularly in Europe, this has led to calls for developing trustworthy AI, although it is far from clear how this can be done for AI systems built by using statistical numerical methods on big data. On the other hand, we do not want to forego the obvious power that these statistical numerical methods provide either. They have proven their worth in many areas, particularly in pattern recognition and systems control. So how to resolve this paradox?
My feeling is that we should do two things.
As a start, we need to develop hybrid AI which uses both numerical approaches and symbolic approaches. Indeed, this is already happening in a number of innovative projects. For example, numerical AI is useful for learning heuristic decision rules in tandem with a symbolic system that creates search spaces using an accurate model of the domain. Numerical methods are useful for quickly retrieving information from very large knowledge bases but the knowledge bases themselves are symbolic and the application of information to a concrete case is done with symbolic inference. Numerical techniques are effective in pattern recognition, for example for image processing and interpretation, but these techniques only give reliable results when complemented by common sense knowledge and inference to interpret the hierarchical structures and activities of a scene.
Second, fundamental AI research has to go back to the drawing board. So far both the numerical and the symbolic approach have always tried to circumvent meaning and understanding, even though meaning is central to humans as persons. A judicial decision on parole is not just a matter of statistics or the cold application of logical rules. A human judge will try to understand the social context of the offender, the motivation for the crime, the psychology and attitudes of the offender, and so on. When we send in a CV for a job, we expect that the recruiter will go beyond superficial features of the CV and build up a total picture, which includes social skills, history of achievements (even if they have nothing to do with the job itself), respect for human values, motivations, fluency in other languages, fit with other members of the team, etc. AI is not at all capable today of constructing the kind of narratives that humans make all the time in order to interpret the world and the behaviour of others. As long as that is the case, we should not throw AI into society for applications that touch on human life.