A Brief History of Artificial Intelligence
Artificial Intelligence (AI) has become a common phrase, and it no longer surprises us what some of the AI-based systems used in our daily lives can achieve. However, this is the result of decades of progress. In this article, I present a brief history of AI. Pretty much all of it is a summary of a chapter in the book “Artificial Intelligence: A Modern Approach” by Stuart J. Russell and Peter Norvig. It can be found here.
What is AI? Before we define artificial intelligence, let us define intelligence. There are many ways to define intelligence. It involves the ability to perceive your surroundings, self-awareness, reasoning, logic, creative thinking, planning, and problem solving. Broadly, it is the ability to receive information from the surroundings, interpret it, remember it, draw conclusions, and then apply them to problem solving. To us humans, it comes instinctively.
The best way to define AI, in my opinion, is that it is when a non-living entity imitates the behavior of a human; in other words, it performs tasks that would normally require the logical reasoning abilities of a human. The field of AI deals with understanding how human intelligence works as well as translating that understanding into systems that follow those principles and imitate human intelligence.
Russell and Norvig, in their famous book “Artificial Intelligence: A Modern Approach”, note that historically there have been many definitions of AI, which can be divided into four categories:
1. Thinking Humanly
2. Thinking Rationally
3. Acting Humanly
4. Acting Rationally
As they mention, the first two relate to thought processes and reasoning, and the last two deal with actions or applications. Modern AI systems implement all four components: interpreting data, learning from it, and then devising a plan of action, like a human would.
The Conception of AI:
The first work in AI is generally believed to have been done by Warren McCulloch and Walter Pitts in 1943. They proposed a model of artificial neurons in which each neuron is either on or off. A neuron switches on in response to stimulation by a sufficient number of neighboring neurons. They showed that any computable function can be represented by a network of connected neurons and that all logical connectives (and, or, not, etc.) can be implemented by simple net structures. They further suggested that suitably defined networks can learn too. In 1949, Donald Hebb demonstrated a simple rule for modifying the connection strengths between neurons, now known as the Hebbian rule. In 1950, two Harvard University undergraduates, Marvin Minsky and Dean Edmonds, built the first neural network computer. Alan Turing gave lectures on the topic as early as 1947 and presented a persuasive agenda in his famous 1950 article “Computing Machinery and Intelligence”, in which he introduced the Turing Test, machine learning, genetic algorithms, and reinforcement learning. He came up with the remarkably interesting idea that instead of making programs that simulate an adult mind, we should focus on making ones that simulate a child’s mind.
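To make the McCulloch and Pitts idea a little more concrete, here is a minimal sketch of an on/off threshold neuron wired up as the logical connectives mentioned above. The function names, weights and thresholds are my own illustrative choices, and using a negative weight for NOT is a simplification of how inhibitory connections worked in the original model.

```python
def threshold_neuron(inputs, weights, threshold):
    """Return 1 (on) if the weighted input reaches the threshold, else 0 (off)."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0


def and_gate(a, b):
    # Both neighbors must be on to reach a threshold of 2.
    return threshold_neuron([a, b], [1, 1], threshold=2)


def or_gate(a, b):
    # A single active neighbor is enough to reach a threshold of 1.
    return threshold_neuron([a, b], [1, 1], threshold=1)


def not_gate(a):
    # Assumption: a negative weight stands in for an inhibitory connection.
    return threshold_neuron([a], [-1], threshold=0)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND:", and_gate(a, b), "OR:", or_gate(a, b))
print("NOT 0:", not_gate(0), "NOT 1:", not_gate(1))
```

Since these three gates can be chained into larger circuits, this gives some intuition for why networks of such simple units are so expressive.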
John McCarthy, along with other influential AI figures like Minsky, Claude Shannon and Nathaniel Rochester, organized a two-month workshop at Dartmouth in the summer of 1956 to bring together US researchers interested in automata theory, neural nets, and the study of intelligence. This can be considered the official birth of artificial intelligence as a field of its own. The workshop was attended by researchers from IBM, MIT and Carnegie Tech. It did not lead to any new breakthroughs, but it introduced all the influential researchers to each other, and for the next 20 years the field of AI was dominated by these figures and their students and colleagues at MIT, CMU, Stanford and IBM.
The early years of AI were full of achievements. Considering that computing power was limited, anything new in this field felt like a big breakthrough. Newell and Simon came up with the General Problem Solver, a program that imitated human problem-solving protocols. At IBM, Herbert Gelernter constructed the Geometry Theorem Prover in 1959, which was able to prove geometry theorems that many students found challenging. Starting in 1952, Arthur Samuel built a series of programs that played checkers (draughts) and eventually built one that could learn quickly and play better than its creator. This was featured on television in 1956 and created a great buzz.
In 1958, John McCarthy defined the programming language Lisp at the MIT AI Lab; it remained the dominant AI programming language for the next 30 years. Also in 1958, McCarthy published a paper, “Programs with Common Sense”, in which he described a hypothetical program, the Advice Taker, which can be seen as the first complete AI system. It embodied general knowledge of the world, and its applications were not limited to any one domain. It could therefore perform search and logical deduction on any provided knowledge to solve a problem without having to be reprogrammed.
In 1963, McCarthy started the AI lab at Stanford. Work at Stanford emphasized general-purpose methods for logical reasoning. Applications of logic included Cordell Green’s question-answering and planning systems and the Shakey robotics project at the Stanford Research Institute. The latter project was the first to demonstrate the complete integration of logical reasoning and physical activity. Minsky supervised several of his students, who chose limited problem domains that appeared to require intelligence to solve. These domains were termed microworlds. Perhaps the most famous of these was the blocks world, which consisted of geometric blocks set on a tabletop and required a robotic hand to arrange the blocks in a certain way. The blocks world was home to various other projects, including a vision project (1971), vision and constraint-propagation work (1975), a learning theory (1970), a natural language understanding program (1972) and a planner (1974).
A scene from Blocks World (image source)
In 1963, Winograd and Cowan showed how a large number of elements could collectively represent an individual concept, with a corresponding increase in robustness and parallelism, advancing the concept of neural networks. Hebb’s learning methods were enhanced by Bernie Widrow, who called his networks adalines, and by Frank Rosenblatt, who introduced perceptrons in 1962. The perceptron convergence theorem says that the learning algorithm can adjust the connection strengths of a perceptron to match any input data, provided such a match exists.
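As a rough illustration of what the perceptron convergence theorem promises, the sketch below trains a single perceptron on the logical AND function, for which a matching set of weights does exist. The function names, the learning rate, the epoch limit and the choice of AND as training data are all my own assumptions for this example, not details taken from Rosenblatt's work.

```python
def predict(weights, bias, x):
    """Fire (1) if the weighted sum plus bias is positive, else stay off (0)."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(samples, epochs=100, lr=1):
    """Perceptron learning rule: nudge the weights toward each misclassified sample."""
    weights, bias = [0] * len(samples[0][0]), 0
    for _ in range(epochs):
        errors = 0
        for x, target in samples:
            delta = target - predict(weights, bias, x)
            if delta != 0:
                errors += 1
                weights = [w + lr * delta * xi for w, xi in zip(weights, x)]
                bias += lr * delta
        if errors == 0:
            # A full pass with no mistakes: the weights now match the data.
            return weights, bias, True
    return weights, bias, False


# Illustrative training data: the logical AND function.
AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias, converged = train_perceptron(AND_SAMPLES)
print("converged:", converged)
print([predict(weights, bias, x) for x, _ in AND_SAMPLES])
```

The epoch limit is only a safety net; for data that a perceptron can represent, the loop stops as soon as one full pass produces no errors.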
Challenges and setbacks:
In the early era, AI created a lot of enthusiasm, and scientists were not shy about making predictions. Some of these eventually did come true, but there were setbacks along the way.
Most early programs were not based on knowledge of their subject matter but succeeded mainly through syntactic manipulation. For example, early machine translation efforts were generously funded by the US National Research Council. These efforts were directed towards simple syntactic transformation of Russian grammar into English and replacing Russian words with their English meanings using an electronic dictionary. This failed badly, and in 1966 an advisory committee reported that “there has been no machine translation of general scientific text, and none is in immediate prospect”. After this, all US government funding for academic translation projects was cancelled.
Another problem was that the AI systems of that time mainly found solutions by trying out different combinations of steps until one worked. This worked very well in the microworlds, where the number of possible combinations was small, but it did not scale to larger problems. Early experiments in machine evolution were based on the belief that an appropriate series of small changes to a program’s code (mutations) could produce a program that solves any given problem. The idea was to try random mutations in the code and preserve the ones that worked well. However, despite large amounts of computation, no significant progress was demonstrated. In 1973, this failure was heavily criticized in the Lighthill report, after which the UK government stopped funding AI research in all but two universities.
There were also fundamental limitations in the structures of early neural networks. For example, perceptrons could learn anything that they could represent, but they could represent very little. This did not apply to more complex, multi-layered neural networks, but funding for neural network research soon dwindled to almost nothing.
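A classic example of something a single perceptron cannot represent is the XOR (exclusive or) function. The sketch below searches a grid of weights for a single threshold unit that computes XOR and finds none, then shows a small two-layer network of the same units that does. The grid, the hand-picked weights and the function names are my own assumptions for illustration, and the grid search is only a demonstration, not a proof.

```python
from itertools import product

# Truth table for XOR: on when exactly one input is on.
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}


def threshold_unit(x, w, b):
    """A single perceptron-style unit with two inputs."""
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0


def single_unit_solutions(grid):
    """Return every (w1, w2, b) on the grid for which one unit computes XOR."""
    return [
        (w1, w2, b)
        for w1, w2, b in product(grid, repeat=3)
        if all(threshold_unit(x, (w1, w2), b) == y for x, y in XOR.items())
    ]


def two_layer_xor(x):
    """XOR built from three units: AND(OR(x), NAND(x)). Weights are hand-picked."""
    h_or = threshold_unit(x, (1, 1), -0.5)
    h_nand = threshold_unit(x, (-1, -1), 1.5)
    return threshold_unit((h_or, h_nand), (1, 1), -1.5)


# Illustrative search grid: weights and bias from -5 to 5 in steps of 0.5.
GRID = [i / 2 for i in range(-10, 11)]

print("single-unit solutions found:", len(single_unit_solutions(GRID)))
print("two-layer outputs:", [two_layer_xor(x) for x in XOR])
```

The search comes up empty because no single straight line separates the inputs where XOR is on from those where it is off, while adding one hidden layer removes that restriction.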
Introduction of knowledge-based systems:
To overcome the limitations of early AI systems, new systems were built with domain-specific knowledge. An early example is the Dendral program, developed at Stanford in 1969 by Ed Feigenbaum and Joshua Lederberg. The input consisted of the elementary formula of a molecule and the mass spectrum giving the masses of various fragments of the molecule. The output was the structure of the molecule. The naïve version of the program would generate all possible structures consistent with the formula, predict the mass spectrum for each, and compare it with the observed spectrum. This meant an enormous amount of processing because of the large number of possible structures. The intelligent version, however, used knowledge that certain peaks in the spectrum indicate the presence of certain substructures in the molecule, which reduced the number of candidate structures significantly. This was the first successful knowledge-intensive system.
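The difference between the naïve and the knowledge-guided approach can be sketched with a toy analogy. This is not Dendral's actual chemistry: here a "molecule" is just a linear chain of atoms, its "spectrum" is the list of masses of its prefixes, and the single knowledge rule (the smallest peak reveals the first atom) is invented for this example. All names and masses are assumptions of mine.

```python
from itertools import permutations

# Toy integer masses (assumption for this example).
MASS = {"C": 12, "N": 14, "O": 16}


def spectrum(chain):
    """Toy 'mass spectrum': the mass of every prefix fragment of the chain."""
    total, peaks = 0, []
    for atom in chain:
        total += MASS[atom]
        peaks.append(total)
    return tuple(peaks)


def naive_search(formula, observed):
    """Generate every distinct chain for the formula and test each one."""
    candidates = set(permutations(formula))
    matches = ["".join(c) for c in candidates if spectrum(c) == observed]
    return matches, len(candidates)


def guided_search(formula, observed):
    """Use one piece of 'domain knowledge' to prune before generating."""
    # Invented rule: the smallest peak is the mass of the first atom.
    first = next(atom for atom, mass in MASS.items() if mass == observed[0])
    rest = list(formula)
    rest.remove(first)
    candidates = {(first,) + p for p in permutations(rest)}
    matches = ["".join(c) for c in candidates if spectrum(c) == observed]
    return matches, len(candidates)


formula = "CCCNOO"                 # hypothetical formula: three C, one N, two O
observed = spectrum("NCOCCO")      # pretend this came from an instrument

print("naive :", naive_search(formula, observed))
print("guided:", guided_search(formula, observed))
```

Both searches find the same structure, but the guided one tests far fewer candidates. With realistic molecules and many more rules, that gap is what made the knowledge-based approach practical.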
With this in mind, the Heuristic Programming Project was started at Stanford to investigate the extent to which this methodology could be applied to other areas requiring human expertise. The researchers developed a system called MYCIN to diagnose blood infections. It was based on about 450 rules, and it performed as well as some experts and considerably better than some junior doctors. The importance of domain-specific knowledge was also apparent in natural language understanding. Researchers suggested that good language understanding requires general knowledge about the world and a general method for using that knowledge. Roger Schank, a linguist-turned-AI-researcher at Yale, built several systems with his students for the task of understanding language (1977–1983). The emphasis was less on language itself and more on the problems of representing and reasoning with the knowledge required for language understanding.
The widespread growth of applications to real-world problems increased the demand for workable knowledge representation schemes, and a large number of different representation and reasoning languages were developed.
AI as an industry:
Overall, the AI industry grew from a few million dollars in 1980 to billions of dollars in 1988. The first successful commercial expert system, R1, began operating at the Digital Equipment Corporation in the early 1980s, helping the company configure orders for new computer systems; by 1986 it was saving the company an estimated 40 million USD annually! By 1988, nearly every major US corporation had AI systems deployed or was looking into them.
This period was followed by what is known as the AI winter, in which many companies failed to deliver on their big promises. In the mid-1980s, the back-propagation learning algorithm was reinvented, having first been found in 1969 by Bryson and Ho. Parallel Distributed Processing (Rumelhart and McClelland, 1986) presented neural network models within a general mathematical framework for researchers to work in. The algorithm was applied to various learning problems in computer science and psychology and generated a lot of excitement.
With time, it became more common to build on existing theories and models rather than invent entirely new ones, and to demonstrate them on real-world applications rather than toy examples. AI was founded in part as a rebellion against the limitations of fields like control theory and statistics, but with time it has embraced those fields and adopted the scientific method. To be accepted, a hypothesis must be subjected to empirical experiments, and the results must be statistically analyzed for their importance (Cohen, 1995). Experiments can now be replicated by using shared repositories of test data and code. This pattern is evident in various fields.
One example is the field of speech recognition. In the 1970s, various methodologies were tried, but results were limited to a few selected samples. More recently, Hidden Markov Models (HMMs) have come to dominate the field. They are based on rigorous mathematical theory and are generated by training on large amounts of real speech data. Similarly, machine translation was initially based on sequences of words, with models learned according to the principles of information theory. This approach fell out of favor in the 1960s but was adopted again in the 1990s. Neural networks followed the same trend. In the 1980s, the focus was on understanding neural networks and how they differ from “traditional” methodologies, but now they can be compared with corresponding techniques from statistics, pattern recognition and machine learning.
The Birth of Intelligent agents/Artificial General Intelligence (1995-present):
Since domain-specific AI has been doing well and has progressed a lot, the focus is once again shifting towards intelligent agents, or Artificial General Intelligence (AGI): AI that is not limited to a specific domain but can perform tasks requiring human intelligence in any environment.
The emphasis has also shifted from the algorithm to the amount of data available. Many recent papers in AI (like Banko and Brill in 2001 and Hays and Efros in 2007) suggest that we should worry more about the data available to train the model and less about which algorithm to apply. This is especially true now because of the very large amounts of data available these days, for example on the internet. These results suggest that learning methods give better performance than hand-coded knowledge, provided there is sufficient data available.
AI is the next revolution in pretty much every industry. It is so deeply embedded (and going deeper) that it is no longer limited to computer science; many individuals from other industries are jumping in to learn about AI and how to apply it in their fields. We already have self-driving cars on the road, conversational agents (text-based chat as well as audio and video) are replacing customer service agents, medical imaging is being processed by AI systems for diagnosis, and the list goes on. We have also gotten close to Artificial General Intelligence in the form of models like GPT-3, which seems to be able to understand language and generate articles, answers and even code for websites and other applications.