Bootstrapping CAIA Part I: The initial domain

I can’t just say “one must bootstrap AI”; I must also explain how to do it. Naturally, I will draw on my own experience of CAIA’s development, which started in 1985.

I shall quickly go through the first step, where I defined a language, together with knowledge written in that language that translates itself into C programs. This is familiar to anyone who has written a compiler in the language it compiles. Defining a new language rather than using an existing one was worthwhile for two reasons:

1. This language must change over time, so that it becomes more and more declarative. At the beginning, in order to facilitate compilation, it has many procedural aspects. They are gradually replaced by more declarative possibilities. Declarativity is essential because it is easier for the system to create declarative knowledge than procedural knowledge; it is also easier for the system to study its own knowledge when that knowledge is given in a declarative formalism. Very important elements of this language are sets and bags, which, unlike lists, do not imply an order to be followed. Expertises are sets of rules, and rules have sets of clauses.

2. It is important that CAIA and I can thoroughly examine any module of CAIA while it is executing. Unrestricted access by CAIA itself to its present state and its knowledge is essential if we want to give it a kind of consciousness. It is easier to get something that suits us when the system is specially built for this purpose. Black boxes are great foes of intelligence: they restrict consciousness, since one does not know what happens when one executes them. There are still two main black boxes for CAIA: the operating system and the C compiler.
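
To make the first point concrete, here is a minimal sketch in Python of what "expertises are sets of rules, and rules have sets of clauses" can look like. It is not CAIA's formalism, which this post does not show: the names Rule, fire and rules_mentioning are hypothetical, and clauses are reduced to plain strings. It only illustrates that unordered rules give the same result whatever the firing order, and that declarative rules can be inspected as data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: an unordered set of conditions and one conclusion."""
    name: str
    conditions: frozenset  # a set of clauses: no order is implied
    conclusion: str


def fire(expertise, facts):
    """Apply the rules of an expertise (a set of rules) until nothing new is derived."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for rule in expertise:  # iteration order over a set is irrelevant here
            if rule.conditions <= facts and rule.conclusion not in facts:
                facts.add(rule.conclusion)
                changed = True
    return facts


def rules_mentioning(expertise, clause):
    """Because rules are data, the system can examine its own knowledge."""
    return {r.name for r in expertise
            if clause in r.conditions or r.conclusion == clause}


expertise = {
    Rule("r1", frozenset({"a", "b"}), "c"),
    Rule("r2", frozenset({"c"}), "d"),
}
derived = fire(expertise, {"a", "b"})
```

With a list of clauses, a rule written as (a, b) would differ from the same rule written as (b, a); with a set, both are the same object of study, which is the property the paragraph above relies on.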

Using its knowledge, CAIA translates all its knowledge, whether given in the initial formalism or in the more declarative formalisms that I introduced later. For thirty years, CAIA has not contained a single line of C that I have written. All in all, there are 500,000 lines of C, and 13,500 rules. Many rules were not created by me, but by rules that create rules.

AI’s essential goal is to create a general problem-solving system; all human activities are, in fact, problems that we have to solve. The most important problem for AI researchers is to realize a system that could solve every problem, including this last one. It is foolish to begin with the most difficult problem; it is better to consider simpler problems, then to extend this domain gradually. This is one of the two main directions of a bootstrap, the other being to improve performance.

It is important to choose the initial domain well. CAIA started by solving problems defined by a set of constraints. This domain is interesting because many problems may be defined in that way; but it is even more interesting because some of these problems may be used by the solver itself. Long ago, I added two such problems:

1. Finding symmetries by studying the formulation of a problem can itself be stated as a constraint satisfaction problem. This is useful because it reduces the size of the search space, it facilitates the search for new constraints, and it makes it possible to decompose the search space so that proofs are easier in each area. I will explain this last point in a future post; I had completely underestimated it, and it was a pleasant surprise.

2. Experimenting with a system requires many problems. Often, there are not enough problems in the literature, or I have to provide them myself; moreover, there is a lack of very, very difficult problems, which are far too hard for human beings. However, finding new problems is a problem that can be defined with constraints; therefore, many of CAIA’s problems have been found by CAIA itself.
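
As an illustration of the first of these two problems, the following Python sketch treats symmetry finding as a small constraint problem of its own. It is not CAIA's method, which is not described here: the function name variable_symmetries is hypothetical, the constraints are restricted to unordered pairs of variables that must take different values, and the search is a brute-force enumeration. The unknowns are the images of the variables; the constraints are that the images form a permutation and that the permuted constraint set equals the original one.

```python
from itertools import permutations


def variable_symmetries(variables, constraints):
    """Return every permutation of the variables that maps the set of
    constraints onto itself.

    constraints: a set of frozensets, each holding the two variables of one
    "must be different" constraint (an assumption made for this sketch).
    """
    variables = sorted(variables)
    found = []
    for image in permutations(variables):  # "all images different" is built in
        mapping = dict(zip(variables, image))
        mapped = {frozenset(mapping[v] for v in c) for c in constraints}
        if mapped == constraints:  # the formulation is left unchanged
            found.append(mapping)
    return found


# Four variables constrained in a cycle: a != b, b != c, c != d, d != a.
cycle = {frozenset(p) for p in [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]}
cycle_symmetries = variable_symmetries("abcd", cycle)

# Three variables constrained in a chain: a != b, b != c.
chain = {frozenset(p) for p in [("a", "b"), ("b", "c")]}
chain_symmetries = variable_symmetries("abc", chain)
```

The cycle has eight symmetries (four rotations and four reflections), while the chain has only two: the identity and the exchange of a and c. A solver that knows these symmetries only needs to explore one representative of each class of equivalent assignments.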

In the next post, we will see how a first step has been made to extend this initial domain. When CAIA is able to solve any problem, one direction of the bootstrap will be completed; however, it will also be necessary to solve these problems more and more efficiently.

One thought on “Bootstrapping CAIA Part I: The initial domain”

There are still two main black boxes for CAIA: the operating system and the C compiler

You did not define (in your blog) what exactly the operating system is for you (and it might be that CAIA doesn’t know what an OS is). For some people, the C compiler is a utility program that is part of the OS (viewed as the set of software installed on your laptop). For other people, the OS is only its kernel.

And there are many more than two “black boxes” required to run CAIA on MacOSX (or on Linux). You depend on a lot of other software components (the desktop environment, the terminal emulator, the linker, the Unix shell, …). Try to explicitly list all the software components you need to run CAIA (and show that list to others; it is likely that you’ll forget some items in it).

All in all, there are 500,000 lines of C [in CAIA]

But 500KLOC is a very small piece of software by current standards (even if it is huge for a single-person project). Your C compiler (GCC or Clang) is about ten million lines of code (and CAIA needs most of them -I measured that more than two thirds of the GCC code is useful to compile all the *.c files of CAIA, because you want the C compiler to optimize). CAIA probably needs a significant portion (my guess is at least a third) of the 20 million lines in a typical Unix kernel (be it Darwin on MacOSX or the Linux kernel). And you also need terminal emulators, computer fonts and the software libraries supporting them, etc. Your Apple computer is a lot more complex than the IBM 7090 you started coding on.
In other words, most of the code (probably 50 million lines of C, so 99%) running on your “CAIA” machine is not code that CAIA has generated, and you need all of it. So your computer is a MacOSX box with one (or a few) percent being CAIA code. I can’t understand how one could practically dream of replacing all of it with CAIA expertise.

Black boxes are great foes of intelligence

In an ideal world (where you could afford having a team of hundreds of artificial intelligence and computer science researchers -you need both of them!- working for you), you might in principle be right (but I am not sure of that). In our practical world, all the AI software projects (at least those from the 21st century) that you have mentioned on this blog are heavily based on (and take advantage of, i.e. leverage) a lot of “black boxes”, and they are right in doing so for pragmatic reasons: nobody can afford to give all the declarative knowledge contained in all the software components used by current AI systems (about operating systems alone, you would need to give CAIA more than all the knowledge in an Operating Systems course; and there are also large pieces of knowledge related to computer architecture, compilers, algorithms, etc. that CAIA would need in order to avoid “black boxes”).

A possible approach might be to give CAIA some strong abilities in program comprehension, which is a skill that all software developers and artificial intelligence researchers have (they are routinely able to select existing software components and learn how to use them by studying both software documentation and source code). With the success of free software, one could imagine an AI system downloading all the source code (and other software artifacts) and studying them. The Software Heritage project is then highly relevant. To go down that very difficult route, CAIA would need to use big data and deep learning techniques. As far as I know, CAIA has no expertise in these domains (and even worse, since CAIA does not do any dynamic heap memory allocation today, it simply cannot handle a lot of data as it is now). BTW, if you (J. Pitrat) wanted to add such big data or deep learning abilities to CAIA, it would be pragmatically a lot simpler to choose some existing big data/deep learning software components (there are many of them, available as free software), and to explain to CAIA how to use such an additional “black box”. The alternative (teaching CAIA all the theory and know-how related to big data or machine learning) would be a lot more difficult for you.

So I guess that understanding software code will be (or should be) a future domain for CAIA, and I am very interested in how you would approach that very difficult issue. For what it is worth, many reports on AI (from a social and economic perspective), like Artificial Intelligence and Life in 2030, mention software development as one of the human tasks that is less likely to be automated by AI systems, even twenty years from now. I hope to read your insights on software comprehension by AI systems soon!

Merry Christmas 2016 to all.

My view on Artificial Intelligence

