A long-awaited success for Go

 In recent years, for most games, programs won against the best human players. However, one game was still dominated by human experts: Go.

Go has a unique place among games: the number of legal moves is around 250, and the game length is near 150. If one wants to consider all the possible moves, the tree is so large that it is infeasible. For such games, a solution is to have an evaluation function, that predicts the value of the corresponding terminal position. Go is so complex that researchers were not able to find a satisfactory function. Winning against the best Go players was the holy grail of AI.

Therefore, for a long time, human Go players made fun of programs, which did not even have the level of a beginner. In the 70s, Chess players had the same contempt for Chess programs.

Google Deep Mind has just developed an extraordinary system, which has won 4-1 against one of the best Go players in the world: Lee Sedol. For this achievement, AlphaGo received an honorary “ninth dan” professional ranking, the highest possible level. This citation was given in recognition of AlphaGo’s “sincere efforts” to master Go’s Taoist foundations and reach a level “close to the territory of divinity”.

AlphaGo is based on general purpose AI methods, which have already been used for Go programs, but with far less success. A paper, published in Nature, gives a lot of information on this system: Mastering the Game of Go with Deep Neural Networks and Tree Search.

It uses neural networks for learning from a huge set of results: the more results, the better. In that way, it builds several functions: one of them estimates the value of a position, and another predicts good moves in this position. For one of these functions, it has used a database with 30 millions moves. It is also able to use the moves generated when AlphaGo plays against itself. It is extraordinary that AlphaGo already plays well using only the neural networks, without developing a tree.

However, it plays better when it develops a tree, while always using its neural networks. For considering the future possible moves, AlphaGo uses random Go, that has been introduced in most-recent Go programs, after Bruno Bouzy shown that it gave a strategic vision to the program. For evaluating a move, one plays it; from this position, one develops thousands of paths. At each step, one move is chosen randomly among the best moves found by the neural network. When this is completed, it chooses the move that has the highest mean of all the terminal positions of its paths. Naturally, parallelism is very useful for this kind of search: At Seoul, AlphaGo had 1202 CPU and 176 GPU. However, if it develops a very large tree, it is significantly smaller than the trees developed by Deep Blue against Kasparov. This method was very efficient in the end game, where it was playing perfectly, because the number of legal moves is not high. Its opponent had no chance of winning the game, if it had not an advantage at the end of the middle game.

The authors of this program have masterfully coordinated all these methods for developing a program with outstanding performances. I believed that some day Go programs would be stronger than the best human players; I did not expect that it would happen so fast.

It is interesting to note that an industrial team achieved this success story. Google provided appropriate means to face this challenge, and the researchers could dedicate all their energies to this goal. On the contrary, a university researcher has many additional time-consuming tasks: teaching, writing papers, administration, seeking funds for his projects, etc. At least 50% of our time must be affected in the development of a complex computer system: without this, it is impossible to control anything. It is exceptional that a university researcher has this possibility. Probably, the future of AI is not in the Universities, but in large industrial companies such as Google, IBM, Facebook, etc.

I am slightly less enthusiastic on a minor point. In a preceding post, I have said that AI researchers have two flaws: they are too clever, and they work too much. Therefore, their systems include too much human intelligence, and not enough artificial intelligence: modules designed by humans must be replaced by meta-modules that create these modules. My criticism may be a bit exaggerated, since many important components such as the functions used by neural networks have been automatically learned. However, the Nature paper was co-authored by 20 researchers, and that makes a lot of human work and intelligence.

A negative consequence: AI has no longer a holy grail! We need a new spectacular task that is not too easy, but not impossible. Google, which just removed the preceding one, could use its search engines to find a new holy grail from its massive amounts of data.

To conclude, one cannot help but admire this result, which happened long before the most optimistic researchers expected it.

Should we disclose everything? Part I

 When a human or an artificial being has solved a problem, must he indicate all that he has done? Obviously, one must not describe everything, too much information would be worthless. However, is it sufficient to indicate only what is enough to check that a solution is correct? For my part, I don’t think so; we will see some aspects of this problem in the following posts. Here, we will only show that it is possible and useful to explain why one has chosen the steps taken when solving a problem, even those that are not necessary to justify the solution.

Usually, people tell nothing about this kind of knowledge, often because the solver does not know it: these mechanisms are largely unconscious. A chess player knows what moves he has considered, but he rarely knows why he has considered them. A pity, because de Groot’s work has shown that the strength of the strongest chess players derives from their ability to consider the best moves. In an experiment, world champion Alekhine had to think aloud while he was finding the best move in a complex position; he considered only four moves, among them the two best ones. Unfortunately, he did not say why he thought to consider them, probably because he did not know it: looking at a chess board automatically gave him the moves to consider. On the contrary, a good player had considered nine moves for the same position, but the best one was not among them.

Likewise, mathematics teachers rarely indicate why the steps of a proof have been chosen, and also why ways that seem promising do not lead toward the solution. This is unfortunate, this cause some misgivings about the method used by mathematicians: it is normal to wander, the best mathematicians have not directly found the results that have made them famous. Some students do not even try to find the solution of a problem because they believe that they are too incompetent: indeed, it is impossible to find only the good steps, in the way the teacher presents the proof of a theorem. What is important is to try, to fail, and to try again.

I was most impressed by the description, given by Martin Gardner, of the stages that have resulted in the resolution of the squaring the square problem by four maths students. The goal was to find a square covered with squares, the sizes of the small squares being all different. When we are looking for a solution, it is easy to check whether it is correct; this does not indicate how this square has been found. In fact, the students began to find a rectangle covered by small squares, all different. They enjoyed it a lot, and they built a rectangular box containing all these wooden squares. One of them, who lived at his parents, let it on his desk, and his mother was not entitled to clean this desk. Naturally, she could not resist cleaning it and, in doing so, the puzzle fell to the ground. Her disobedience being obvious, she tried to put the squares back into the box, and she succeeded, hoping to escape her son’s reproaches. However, her son discovered that his mother has found a solution different from the preceding one. Immediately, he called his friends, and they tried to understand why this puzzle had several solutions. That gave them the idea of an analogy with Kirchoff’s laws, the size of a square corresponding to the value of a resistance in an electrical circuit. They realized that the circuit must have some characteristics, and that led them to the discovery of the first squared square.

Knowing why some rules have or have not been executed would also be useful for CAIA: it could be used to improve its meta-knowledge used for selecting, prohibiting, or ordering the possible actions. In its explanation of a solution, as CAIA can completely observe any of its actions, it knows the reasons for the elimination of a result, for its choice, for the priority allocated to each action. Unfortunately, if it can know how meta-knowledge was used, it is not so easy to improve it: one must define meta-meta-knowledge, which is always very difficult to create and to use.

When an AI system uses an algorithm that tries everything possible, there is no need to explain the choice of its steps. However, CAIA dynamically determines whether it will keep or eliminate a result, or whether it will use it immediately or save it for difficult times. It could produce an interesting explanation of its actions such as: it did not keep this new constraint because it contained many unknowns, or because it was an inequality and it already had too many of them, or it delayed using this rule because its execution is time-consuming, and so on.

For an explanation to a human being, it is possible to include some indications on the reasons why CAIA has chosen to try an action; the difficulty is to model this human for giving him only what he needs to know. On the other hand, such explanations given by CAIA to itself would allow it to improve its own meta-knowledge. Both changes could be made; unfortunately, as they are difficult to implement, so far they are only in my to-do list.

Superintelligence

 Superintelligence is the title of a book written by Nick Bostrom. This book is enjoyable to read, the author is well aware of what has been done in AI. For instance, he considers bootstrap as one of the possible ways for creating superintelligence, he mentions the crossover point where the intelligence of the improving system will become greater than human intelligence. Moreover, quoting Yudkowski, he notices that our intelligence is in a very narrow range, in spite of “the human tendency to think of “village idiot” and “Einstein” as the extreme ends of the intelligence scale, instead of nearly indistinguishable points on the scale of minds in general.”

I completely agree with the author on the possibility of a superintelligence. However, I do not see the future as he does; in fact, I do not see the future at all. While reading this book, I had the feeling that the author had never realized any AI system; from his biography, it seems that it is true. He presents the future of AI in a very loose and undefined way: he does not give specific information about how to do it. I am very well placed to know that it is not enough to say that one must bootstrap AI, it needs to be indicated how to do this effectively, and it is not that easy. I have plenty of ideas for advancing CAIA, but I have not a clue about their final consequences.

One cannot fully appreciate what will become a particular AI system, especially if one has never realized such a system. In its final stage, it bears little resemblance with its initial version. We must be humble about predicting future consequences of a good idea.

In order to understand better the difficulty of predicting the future of a field of research, just look at computer science. I began working in this domain in 1958; at that time, people thought that it was excellent for computing trajectories or for issuing payrolls (incidentally, my first program solved an integral equation). In these years, I had never read a paper foreseeing computer omnipresence in the future world, where everybody uses several computers, even in the phones, the cars and the washing machines, and where a thing such as the web could exist. In the same way, we have no idea of AI future application: we are not smart enough, and AI is still in a rudimentary state. This does not allow us to have a reasonable idea of superintelligence.

Many are concerned that machines will size power over man because we assume that they will behave as ourselves. However, our intelligence was built for our hunter-gatherer ancestors by evolution, which promotes procreation and search for food. Sizing power is an important step for these activities, no wonder that it is widely displayed among living beings, including aliens if they exist, who are improved by evolution. The goals of super-intelligent beings are not necessarily the same as ours; therefore, the search for power is not always essential for their activity. Perhaps, there will be artificial super-mathematicians, quite happy to develop complicated theories; to do that, there is no need to rule the world and to enslave us.

In his book, Bostrom considers various methods for not being dominated by artificial beings, such as boxing methods that confine the system, or tripwires detecting signs of dangerous activity. However, if we confine the system, it will never become super-intelligent; furthermore, if it is super-intelligent, it will detect and bypass our tripwires. In the March 2015 issue of the AI Journal, Ernest Davis analyses this book, and suggests that the issue will be settled by “an off switch that it cannot block.” This is a variant of the idea of unplugging the machine from the mains; Science Fiction has long shown that it does not work. Even for AI specialists, it is difficult to imagine the immense superiority of superintelligence!

Let’s go back to the general scale of minds, where animals are well below us, even those rather intelligent such as dogs. Therefore, if we are able to realize a super-intelligent system, the difference between ourselves and this system could be as large as the difference between a dog and ourselves. For understanding our position vis-à-vis super-intelligent systems, we can get inspired by the position of a dog vis-à-vis ourselves. Considering again the efficiency of tripwires, might a dog invent an off switch that we could not bypass? How could a dog imagine many of our activities, such as writing, mathematics, astronomy, AI, and so on? No doubt that we also have no idea about many essential activities, as well as the ancient Greeks had no idea about Computer Science. Long into the future, super-intelligent systems will possibly discover new applications; we will be able to use them because it is much easier to use than to find.

Who knows, perhaps some day our descendants will harmoniously live with super-intelligent systems, and they will be very satisfied with this. When we see how the world is ruled, we can hope better for future generations. This may happen if super-intelligent beings do not want to control the world, but to help us. As we have said before, this is possible if superintelligence is not created by a kind of evolutionary process, based on the pursuit of power and control.

Anyway, we are very far away from this distant future. I am not sure that our intelligence is sufficient for allowing us to create superintelligence; however, if we succeed, the result will certainly be very different from all our predictions. Only when we have made further progress, we will be able to predict the consequences of super-intelligent beings accurately. In the meantime, we are playing at frightening ourselves.

Poor Artificial Intelligence: little known and little liked

 I have the sad honor of working in AI, a domain that is frowned upon by many remarkable people. We are dealing with two main categories of opponents. I have already spoken of the first ones, such as Stephen Hawking, who think that AI is potentially very dangerous: the emergence of super-intelligent artificial beings will lead to the extinction of the human species. On the contrary, for others it is impossible to create artificial beings more intelligent than ourselves. A recent interview shows that Gérard Berry belongs to this last category.

Gérard Berry is a computer scientist who has made outstanding works, deservedly recognized by the scientific community: he is a member of the Académie des Sciences, professor at the Collège de France, and he has received the CNRS gold medal, France’s most prestigious scientific distinction, the second time it honored a computer scientist. For me, AI is not a part of computer science, but both are very close: computer is an essential tool for implementing our ideas. Therefore, when a top-level computer scientist criticizes AI, this must be taken very seriously.

AI occupies only a small part of this interview, but he is not mincing his words in these few sentences. Here are two excerpts:

“I was never disappointed by Artificial Intelligence because I did not even believe for a second in Artificial Intelligence. Never.”

“Fundamentally, man and computer are the most different opposites that exist. Man is slow, not particularly rigorous, and highly intuitive. Computer is super-fast, very rigorous, and an absolute ass-hole.”

Firstly, I will consider three points mentioned in this interview, where I agree with him:

Intuition is certainly an essential characteristic of human intelligence. In a recent blog, I have looked at how AI systems could also have this capacity.

Chess programs are a splendid realization, but their intelligence is mainly the intelligence of their developers. They used the combinatorial aspects of this game, enormous for human beings, but manageable by fast computers. It would have been much more interesting to speak of another outstanding IBM achievement: Watson, an intuitive system, won again the best human players for the game Jeopardy!

For the present time, AI byproducts are the most useful results of our discipline: they allowed to discover important concepts in computer science.

Let us talk about our disagreements. Up to now, nobody has proven that man could have an intellectual activity that no artificial being could ever have. As long as we are in this situation, we must assume that all our activities can be mechanized. Then, we are in a win-win situation. If we show that we are wrong, a significant progress would be made by a reduction ad absurdum argument. On the contrary, if we are right, we will create very helpful artificial beings, which will not be restrained by our intellectual limits.

The main argument that appears in this interview is that computers are absolutely stupid. I am way back in 1960! At that time, many people already wanted to show the impossibility of intelligent artificial beings. Their argument was: computers only do what they are told to do. Naturally, this is true, but this proves nothing: the problem is to write programs, and to gather data, so that the computer will behave intelligently. One can develop programs that do other things than running as fast as possible on their data. Programs can analyze the description of a problem, and then write and execute efficient programs well adapted to a particular problem and its data. Moreover, in a bootstrap, the existing system works with its author for creating an improvement of the system itself. Computer users strongly believe in the usefulness of bootstrap: without computers, it would have been impossible to design the current computers! Hawking’s extraordinary intuition had seen the efficiency of bootstrapping AI; this is why he was afraid of its future.

If one has never proven that man can do something that no computer could ever do, many things can be done by a computer while no human being could ever do them. For instance, our reflexive consciousness is very limited: most of the processes in our brain are unconscious. On the contrary, it is possible to realize a computer system that can observe any of its processes if it wants to; moreover, it can have access to all of its data. As consciousness is an essential part of intelligence, this will have tremendous consequences. Unfortunately, we are not yet able to make full use of this capacity because man’s consciousness is too limited for being a useful model.

AI is probably the only scientific domain with so many staunch opponents, although they do not know it. This is not surprising: man has always rejected what would remove him from the center of the world. We all know the difficulties encountered by those who said that the earth revolved around the sun, or that apes were our cousins. Naturally, the idea that artificial beings could be more intelligent than ourselves is particularly unpleasant for the most intelligent among us. Less intelligent people are used to live with beings more intelligent than themselves, but geniuses are not. Of course, some of them are strongly against AI.

I believe in the possible existence of artificial beings much more intelligent than ourselves. On the other hand, I am not sure that we are intelligent enough to achieve this goal. We need help, and for this reason, since many years, I am trying to bootstrap AI: as of now, my colleague CAIA, an Artificial Artificial Intelligence Researcher, gives me a substantial assistance for its own realization. AI is probably the most difficult task ever undertook by mankind: it is no wonder that progress is very slow.

55 years of Artificial Intelligence

It was in October 1960 that I started to work on my thesis. One year ago, I was appointed deputy chief of a computer department, which existed since almost ten years, in a military laboratory. It had an impressive documentation on computers, and it was a very supportive environment: the same year, the head of the department started working on automatic translation from Russian to French. I was thrilled by the first AI realizations, such as Newell and Simon’s Logic Theorist, Samuel‘s machine learning using the game of checkers, Gelertner‘s geometry-theorem proving machine, and so on.

For my thesis, I wanted to implement a general theorem prover that received as data any of the many propositional logic axiomatizations. It had to discover interesting theorems in each theory, without any information on its existing theorems.

The first difficulty to overcome was to find a thesis director: at that time, Artificial Intelligence, and even computer science, were not taught at Paris University. Luckily, Professor Jean Ville was interested in computers, although he essentially worked on probability, statistics, and economics. He was very kind to accept that I registered at the University for this thesis.

Looking at the results of the initial version of my program, I was surprised to see that it had discovered proofs different from those given in logic books. These original proofs showed me that it could be interesting to use meta-theorems, that is new ways for proving theorems. Therefore, I gave my program the possibility to prove meta-theorems, and the modified program found more results, and also proofs that were easier to understand. The results of this program were not bad; for a particular axiomatization, it proved results for which Lukasiewicz said: “One must be very expert in performing such proofs.” Results found for one of these axiomatizations can be found at page 134 of Laurière’s book (page 125 for the French version).

I was greatly impressed by these results: since then, I have always tried to realize systems that have the ability to work at the meta-level. This is a challenging task, since their results are compared with systems where the work at the meta-level has been made by their author, and not by the system itself. For the present time, the performances of a system working at the meta-level are not better than those found by other programs, but human intelligence is less important, they have a larger degree of autonomy. The primacy of men over animals mainly comes from our capacity to work at a meta-level, consciousness is one example of this superiority. I cannot believe that it is possible to create a superintelligence without this ability.

Moreover, this ability allows to bootstrap AI: existing AI systems will help us to implement more powerful systems. I believe that AI is the most difficult science, perhaps far too complex for human intelligence. The ideal would be to have an Artificial Artificial Intelligence Researcher; this is CAIA’s long-term goal.

Since 30 years, I am developing CAIA. At the moment, it has 13,000 rules that transform themselves into 500,000 lines of C; I have not written a single line of the present programs, and many rules have also be written by CAIA. I continue to replace the expertises created by myself, by meta-expertises that create the preceding expertises. The bootstrap will be completed when CAIA includes a set of meta-expertises that could generate all its expertises and meta-expertises, including themselves.

I am trying to create more and more declarative, and more and more general knowledge: I prefer to give CAIA meta-knowledge for creating expertise E rather than writing myself expertise E. It is difficult but, when I succeed, CAIA’s version is usually better than my initial version.

There is still a very long way before this bootstrap is completed. I have not 55 more years for completing it, but I hope that other researchers will pursue this tremendously difficult task.

Is it possible to define a problem?

 We do not pay enough attention to the definition of the problems given to our solvers. Firstly, by selecting a tailored definition to our system, we can give it an unreasonable help. More importantly, ambiguities in the definition may lead to a different problem although we believe that they are always identical.

This is often due to the ambiguities of natural language. However, we will see that different problems come from the same definition because some people add constraints that they feel evident; nevertheless, we have the right to reject these constraints.

The definition of Sudoku does not seem to pose difficulties, it is very straightforward: one must complete a 9×9 grid with numbers from 1 to 9 so that each column, each row, and each of the nine 3×3 subregions contain all the digits from 1 to 9. This definition appears in the magazines that publish these problems. Moreover, a few add another constraint: there is only one solution for this puzzle.

The question is whether one can use this uniqueness constraint for solving Sudoku puzzles. This constraint is respected in all the published puzzles: they always have exactly one solution. However, this can be considered as a constraint only for those who create new puzzles. The requirement of a unique solution exist for many problems; for instance, a chess problem is cooked if there are several solutions. Even then, very difficult problems, such as the magic cube, described in a preceding post, has millions of solutions; as each one is very hard to find, the problem is interesting.

If this constraint must be respected by the author of a Sudoku puzzle, does a human or machine solver have the right to use it? I have never seen a chess problem where the solution uses this uniqueness constraint. Some constraints are necessary for creating a beautiful problem; this does not mean that they can be used for solving it. However, the first question that needs to be asked is: can this constraint be useful for solving Sudoku puzzles?

Of course, knowing the number of solutions of a problem may be very helpful: as soon as all the solutions have been found, one can stop the process. Indeed, no more solutions can be found if the author has not made a mistake. It is very easy to implement this possibility. For instance, a parameter indicates to CAIA the maximum number of solutions that it can find, so that it avoids wasting a lot of time when there are millions of solutions. The default value for this parameter is 50, but one has just to instantiate it to 1: CAIA will stop as soon as it has found one solution.

It is less obvious to use this restriction even before finding a solution. I will illustrate this point in a problem published in the Figaro Magazine. Far too often, after a newspaper has published a Sudoku problem, it only gives the solution: it does not indicate the steps that were taken for finding this solution. Fortunately, the Figaro Magazine also gives an excellent description of the main steps; therefore, it is possible to see which rules have been used. Usually, there are only the constraints given with the definition of the problem, but not the one stating the uniqueness of the solution. However, a problem stated in the 17 July 2015 issue uses it. The following diagram describes the situation after finding the value of three squares. The lines are numbered 1 to 9 from the top.

It is easy to see that the possible values for B2 and C2 are 7 and 9. In the same way: 1, 7, and 9 are the possible values for B5 and C5. (5 cannot be in C5 because in column C, 5 can must be in C7 or C9, the only squares where 5 can be put in the lower left subregion). If 1 were neither in B5, nor in C5, each solution with B5=7 and C5=9 would give a symmetrical solution where 7 and 9 are switched in squares B2, C2, B5, and C5. If there are N solutions with B5=7 in this hypothesis, there would be also N solutions with B5=9. If K is the number of solutions when 1 is in B5 or C5, the total number of solutions would be K+2*N. As the uniqueness constraint states that it is equal to 1, this means that K=1, and N=0: there is no solution without 1 in B5 or C5. This is very useful, that means that D5 is not 1. As in the center subregion 1 is either in D5 or F6, we have F6=1, and it is easy to complete the Sudoku with this information.

                                      A            B         C          D          E          F         G          H          I

 CAIA does not make this deduction: it has not received the uniqueness constraint. For this problem, it finds the same possible values for squares B2, C2, B5, and C5; unfortunately, it cannot deduce the following steps of the Figaro solution. In its solution, it sees that the only values for E1 are 2 and 5. 2 leads to a contradiction, and it directly finds the solution with E1=5. Incidentally, this proves that the value of N is actually zero. However, contrary to the solution given in the Figaro, it has to backtrack once.

I understand that many people want to use the uniqueness constraint, although I do not agree with them for several reasons:

*Real problems may have none or many solutions, and it is interesting to solve them. Why are they giving a special status to artificial problems?

*Even when there is certainly one solution, why transform the search for a solution into a game of chance. The first to find the solution is the lucky one who considers firstly the good value for the unknowns.

*For other problems that have only one solution, as Chess problems, one never uses this uniqueness for finding this solution.

*In many cases, including the Figaro, the given Sudoku definition does not include this constraint.

*If the problem is cooked, nobody will see it.

*The author of the problem must check that it has just one solution: the uniqueness constraint is certainly necessary for creating a puzzle. Naturally, for checking it, he cannot use the fact that the problem has just one solution! Therefore, one must also have a solver that does not use this constraint. Is it necessary to have two solvers for the same problem?

All this shows that, even when the definition of a problem looks evident at first sight, it may be difficult to agree on this definition: the solvers cannot help but interpret this formulation. In this case, the solutions and the solving methods may not be the same as those given in the initial formulation: they do not solve the same problem. Therefore, if it is possible to define a problem, it is certainly not easy. Unfortunately, AI researchers have to give unambiguous definitions to their problem solvers.

Jeopardy!

 For the present time, AI systems are worse than ourselves for some applications, and I believed that the reason was that they have not an associative memory as good as ours; its efficient use is essential for our intuition. In a fraction of a second, it gives us an answer, or allows us to remove an ambiguity. We have already considered this problem in the preceding blog. AI systems rarely use a large corpus of English texts effectively, and don’t do it quickly.

Therefore, I was impressed by the results achieved by Watson, an AI system developed by an IBM team. It can play a Jeopardy! competition, a television game very popular in the States. Several competitors must find as fast as possible the person or the object corresponding to a clue, which is an English sentence in any domain. Watson must first understand the sentence, then find candidate answers in a huge amount of English texts, and finally choose the best match with the clue. All this has to be done in a short time, a few seconds.

Let us consider an example from the match that showed Watson’s superiority. For the clue It’s a poor workman who blames these.“, Watson was the first to find the good answer: tools.

The difficulty comes for three main factors: questions come from a broad domain, the answer must be given with high confidence, and it must be done very fast. Watson does not use Internet, but it has a great deal of English knowledge, including several encyclopedias, among them the full text of Wikipedia; the whole takes four terabytes of storage.

Watson competed against two top champions: Brad Rutter, a 20-time champion, and Ken Jennings, the best player in Jeopardy! history, famous for winning 74 games in a row. Watson won, and won big: $77.147, when the other two only won $21.600 and $24.000.

As Watson includes many modules, I will briefly speak of those that seem the most important for an AI point of view.

In the first step, Watson analyses the clue and extracts keywords and sentence fragments used for finding possible answers from the breadth of its knowledge. This mechanism is a little like our intuition: it is using many heuristics. Its main advantage is that it is very fast. Unfortunately, for us, this often leads to mistakes, because we often merely keep the first response, for lack of time, laziness, or unawareness of the knowledge leading to the result.

For Watson, this step usually gives several possible candidates. Then, far better than us, it will spend a lot of time (that means a few seconds for a computer!) for choosing the most reliable one. It ranks the candidates, and it chooses the first one, provided that it has a sufficient level of confidence. If no one is satisfactory enough, it does not answer.

Many methods enable it to measure its confidence in a particular result. If a result appears several times, Watson will be much more confident in this result. It also tries to evaluate the reliability of a candidate. If the clue is Chile shares its longest land border with this country, it is easy to remove China, which does not share borders with Chile. Two serious candidates are Argentina and Bolivia. The media often speak of the border dispute between Chile and Bolivia; this will tend to favor Bolivia. On the other hand, some results give the lengths of the borders. As these lengths are very different, and these results are rather reliable, Argentina will finally be chosen by the geographic module. However, even when a candidate has most of the wanted characteristics, it will not be chosen if one reliable result forbids it.

Watson may analyze several results and compare their reliability because it is aware of the information used for finding a particular answer. This is a huge advantage for artificial cognition, usually we do not know why our intuition gives a particular result.

Watson correctly answered several difficult clues, for instance clock for Even a broken one of these on your wall is right twice a day.“, and escalator for This clause in a union contract says that wages will rise or fall depending on a standard such as cost of living.

Nevertheless, as human beings, Watson sometimes makes mistakes. The audience had a good laugh when it answered Toronto to the following clue on US towns: Its largest airport is named for a World War II hero; its second largest for a World War II battle.

Realizing a system playing Jeopardy! has shown the possibility of using a huge memory efficiently. Now, the authors of this system want to adapt their methods to the resolution of another problem: medical diagnosis. In that domain, it is also important to give the right answer by reasoning over an unstructured natural language content; moreover, it will have much more useful consequences.

Cows are thirsty

 Several years ago, I was in the subway with one of my children. For keeping him entertained, I asked him a question: “what are the cows drinking?” Knowing me, he did not answer immediately: he sniffed out a trap, it was too easy to answer this question! A passenger, whom I did not know, was surprised by this total ignorance, and he could not refrain from saying: “They are drinking milk, my little child!” This mistake is natural: this is a hoax. It is due to a very useful mechanism, that often enables us to give a correct answer quickly. The question was on a beverage and on cows; as cows are associated with milk, and milk is a beverage, we have the answer immediately.

It is better to avoid systems giving incorrect results. However, if the result is that one gives no answer at all, this is not satisfactory. Many AI researchers have an excellent mathematical training. Therefore they overwhelmingly reject all possibilities of error: if one accepts a contradiction, one can prove everything. Luckily, human beings survive their contradictions.

Methods that can lead to erroneous results may be useful when we do not know a perfect method, or when it is too costly over time. It may be better to have a result quickly rather than waiting centuries for a perfect result. For instance, current Chess programs play very well, better than the world champion, but they certainly do not always play the best move. This was an issue for Edgar Poe, when he was asserting that Maelzel’s Chess Player was not an automaton. He was right: a human chess player was hidden into the machine. Nevertheless, one of his arguments was incorrect: he believed that it was not a machine, because sometimes it lost the game. Sorry, Edgar, a machine may be imperfect. However, we know a perfect method for playing the best chess move: one fully develops the tree of all the possible moves for each player. As the depth of the tree is finite, it can be done in a finite time, and one can find the perfect solution from this tree. Unfortunately, the time necessary for generating it goes beyond anything we can imagine.

Fast mechanisms are used by our unconscious; we call them intuition. Unfortunately, sometimes they are making mistakes. They are also useful for resolving ambiguities in natural language texts. It may be dangerous to use the results found by our intuition: we must cautiously check that they are correct when the consequences could be serious. To do that, our conscious mechanisms verify the results given by our intuition. However, calling them is not automatic: for each result, we must decide whether we will check it.

Intuition is an essential mechanism for human beings, and especially for animals: this is the only way enabling them to take decisions that are not genetically programmed. It always gives results quickly; moreover, we can use it in situations where we do not know other methods. As living beings, artificial beings will have to use similar methods. Nothing prevents us to give them such capabilities. Unfortunately, this is not so easy because we do not know them, since they are unconscious; we have to discover and experiment them in our systems. Another difficulty is that our love of correct methods will reject the bunch of heuristics that makes up intuition.

Fortunately, we have now a clearer idea of the mechanisms called ‘intuition’. For Herbert Simon: “The situation has provided a cue; this cue has given the expert access to information stored in memory, and the information provides the answer. Intuition is nothing more and nothing less than recognition.” I cannot see why AI systems could not use this mechanism.

The horror of error is dangerous: some research directions are neglected because they can lead to dubious results. This favors rigorous mathematical methods in AI, even when their application is limited because they are too time-consuming.

If artificial beings were never making mistakes, that would mean that they would be too restricted. They must not be refrained from using methods that can be very helpful when carefully used. However, we must also give them the possibility to find out when they are wrong: we have to add modules verifying the results. For instance, CAIA checks that all the solutions it finds satisfy all the constraints. Unfortunately, it is possible that it misses solutions, but we have already seen that, sometimes, great mathematicians also miss some of the solutions.

We will progress when our systems believe that cows drink milk, see that this is wrong, then find the right answer, as my embarrassed fellow traveler finally did.

When a Belgium canon meets a French mathematician

 Let us consider again magic squares, I want to show that it is possible to find solutions for very large squares, using methods completely different from the improved combinatorial method used by CAIA in the preceding post.

 In 1703, canon Poignard, from Brussels, found an algorithm that generates solutions for magic squares when N is odd. This method was improved by mathematician Philippe de la Hire, who issued in 1705 a communication to the Académie Royale des Sciences. This method is well known; it was already described in Diderot’s Encyclopedie, entry ‘quarrés magiques’.

 For describing this method, I will consider N=5. Let us choose a permutation of the numbers from 1 to N, for instance: 2,3,5,1,4. We write it on the first line of the square. For the second line, we put this permutation shifted by P elements; P is a number less than N such that P and P-1 must be relatively prime numbers with N. This implies that the method cannot be used for even squares: as either P or P-1 is even, N cannot be even. If one chooses P=3, the second line will be 5,1,4,2,3. The process continues, and we get the square just down left.

In the same way, we consider a permutation of the integers from 0 to N-1, multiplied by N, for instance: 5,20,0,15,10 which is written on the first line of another square. This new square is completed as the preceding one, with a shift Q, which must also be different from P. With Q=4, we generate the square on the right above.

 Both squares have exactly once each number on each line, each column, and each diagonal; moreover, the sums on these lines are the same, since we have the same permutation. Naturally, there is no bijection, since there are only N different numbers for the N2 little squares of each square.

We create the magic square by adding these two squares. It has always all the preceding characteristics, and each value between 1 and N2 is present exactly once.

 An interesting remark: if opposite sides of the square are considered as next to one another, all the diagonals have also the common value for their sum. We do not generate only magic squares, but a kind of ‘hyper-magic’ square that has 4N constraints, including N diagonals to the right and N diagonals to the left. The new problem is much more constrained than the original one. This explains why there is no hyper-magic square for N=3, where one cannot find a couple P and Q.

This way of generating magic squares is extraordinary: we do not need to write the constraints, the way the solution is created automatically satisfies them. It is no more necessary to find the value of SUM. It is also unnecessary to backtrack.

 Without using a computer, I found a magic square for N=11 in 16 minutes. Furthermore, I wrote a 25 lines C program that generates such squares. In only 333 seconds, it generated (but not stored) a solution for N=100,001. It had to find a value for 10,000,200,001 little squares! It is unthinkable to solve such a problem with the methods used by CAIA, and many other constraint satisfaction systems: there are 200,002 constraints, each one has 100,002 unknowns, and the number of possible values for each unknown is greater than ten billion. Studying the formulation of this problem, canon Poignard transformed a constraint satisfaction problem into an algorithm building a large permutation from simpler permutations.

 CAIA’s problem solver is very far from such performances. This does not mean that it is an impossible task for an AI system, but that our systems must have a greater capacity for working at the meta level, studying and transforming the formulation of the problem: they must be able to do what canon Poignard did more than 300 years ago.

CAIA is not a good magician

 Too often, IA researchers describe the successes of their system, but they keep a lower profile when they get poor outcomes. I do not want to comply to this bad habit. I describe here CAIA’s behavior when success does not come, in the present case, the magic square problem. An order N Magic Square has N horizontal lines, and N vertical lines. Therefore, there are N2 squares, and each one contains a number between 1 and N2, so that each number appears only once. Moreover, the sum of the verticals, the horizontals, and the two diagonals, must have the same unknown value, SUM.

First, CAIA finds symmetries, which are permutations of the unknowns which lead to an identical set of constraints. Seven symmetries are evident for all magic squares: three rotations and four reflections with respect to horizontal, vertical, and both diagonals. CAIA knows nothing about geometry, but it can find all symmetries, both geometric and non-geometric. To do that, it uses the general component that solves constraint satisfaction problems. I had only to specify one particular problem: finding all the permutations of the unknowns leading to the initial set of constraints.

I hoped that it would find the seven geometrical symmetries. However, CAIA was far beyond any of my expectations: it found a huge number of symmetries that I never imagined existed. Even for N=4, it finds 31 symmetries! Some are completely unexpected, for instance, one of them transforms the first horizontal into the second vertical: both contain the same values, naturally not in the same order.

 

Finding this is not evident: almost 400 years ago, Bernard Frénicle de Bessy performed an outstanding study of the N=4 magic square. In particular, he defined a standard representation of the solutions, which is always used. However, he has found 880 basic solutions; multiplied by 8, this gives a total of 7040 solutions. CAIA found only 220 basic solutions; multiplied by 32, we find again 7040. When N grows, the number of symmetries increases at a startling rate: for N=10, CAIA finds 15,359 symmetries! On this point, CAIA’s behavior is flawless, although a lot of computer time is needed: 31,193 seconds for all the symmetries when N=10.

In the following step, CAIA tries to find other constraints. For the magic problem, it adds that the sum of the unknowns is equal to the sum of the first N2 integers, since there is a bijection. After that, it can find the value of SUM. For us, humans, this is easy because we are seeing the square: it is clear that the sum of the N horizontal lines is N*SUM. It is also evident that this sum contains all the squares, so its value is the sum of all the numbers between 1 et N2, that is N2*(N2+1)/2. Therefore, we have SUM=N*(N2+1)/2.

I did not give CAIA this formula, it has to find the value of SUM from the set of constraints. For this task, it is rather good: for N=16, it finds SUM=2056, after many substitutions in the 34 initial constraints, plus the bijection constraint. It would have been better to prove the general formula, but it is difficult for a blind system, which cannot see the square as we do. It does not find other constraints, and I do not believe that constraints with less than N unknowns exist for this problem. In total, for this task, CAIA has done well overall, although it could not set out the general formula.

The last part is to find the solutions, and this is the point where CAIA fails: it has been designed for finding all the solutions of a problem. For this reason, it is very interesting to find symmetries. However, it makes no sense to find all the solutions when there are billions of them. In its normal working, it considers the formulation of the problem, and particularly it finds new constraints, easier to use than the initial ones. In some cases, (as for N=3), it directly finds the solution. When there is no new constraint, it is often better to stop trying to be clever, but to write cleverly a combinatorial program that generates all the solutions.

For a magic square, this works well for the small values of N: for N=8, it generates the first 1,000 solutions in 4 seconds. Unfortunately, the search space dramatically increases with N: the number of unknowns grows as N2, and the number of possible values for each unknown also grows as N2. Very soon CAIA is completely overcome. The method still works for N=9: 10 seconds for the first 1,000 solutions. Unfortunately, there is a sharp discontinuity for N=10: after 14 hours, it has not yet found a solution.

The reason for this failure is that it is unthinkable that, given the huge number of solutions, one can ever find all of them when N is not small. It is sufficient to find one of them. We could think that all we have to do is to launch the search for solutions, and stop when one is found. In reality, it is often better to use other methods for scanning the search space, pinpointing the areas where the probability of finding a solution is high. For instance, one can find solutions for the set of equations when the unknowns may have continuous values. Then, one can look for the combinations of integer values close to the values of these solutions.

Another heuristic may be used, such as to make the initial problem more constrained, adding new constraints so that it would be easier to find solutions for this new problem. If it has solutions, they would also be solutions of the initial problem. When there are zillions of solutions, we can add constraints that remove some of them. In a future post, we will see how this could be very efficient for finding solutions for some magic square problems.

To sum up, although a problem may have many solutions, CAIA experiences serious difficulties when all its constrains has many unknowns, each unknown having many possible values; this is normal, the search space is huge. In this situation, it can, nonetheless, have excellent results when it can generate new constraints with few unknowns; this happened with the magic cube, where it found constraints with only three unknowns. If this is impossible, it must use completely different methods, good for finding a needle in a huge haystack. CAIA improves the combinatorial search with a meta-combinatorial search generating new constraints; this is sometimes not enough for solving this kind of problem, some modules must be added to CAIA.