2009-06-24

Illustrating SBCL's build process

A while back I read Christophe Rhodes's paper “SBCL: A Sanely-Bootstrappable Common Lisp” which describes SBCL's bootstrap procedures.

The paper includes a bunch of diagrams for each build stage. These were pretty helpful in improving my understanding of the build process. So, I tried to take them a step further and create a single diagram that provides a global overview of the build process:

I'm interested in hearing any comments you might have. If you already know how the build process works, does it make you cringe? If you are vaguely familiar with (parts of) the process, does it provide you with some sort of new insight? Given that I haven't included a legend, does it make any sense at all?

14 comments:

Anonymous said...

Thanks - I think this is a much needed diagram. And it's pretty :)

Anton Vodonosov said...

> Given that I haven't included a legend, does it make any sense at all?

Without a legend, and as a person who is not familiar with the SBCL build process, I have some questions:

- Why do some arcs have two arrows?
- What is XC?

charlie mac said...

This is a nice diagram. I can't speak to its accuracy; however, the meaning of the varying arrow styles is not obvious to me just from looking. What do the double arrows mean? What about filled versus non-filled?

Luís said...

Yeah, that seems to be the least clear part of the diagram.

White double arrows: "executes". Black double arrows: "generates an executable". White single arrows: "is compiled/loaded by". Black single arrows: "generates". Something like that. Does that help?

XC = cross-compiler. It's the SBCL compiler running as an application inside the "host". The host can be SBCL, CMUCL, CLISP, CCL, etc...

Anton Vodonosov said...

> White double arrows: "executes". Black double arrows: "generates an executable". White single arrows: "is compiled/loaded by". Black single arrows: "generates". Something like that. Does that help?

I must admit that by the time I scroll up from that explanation to the picture, I have forgotten the meaning of the arrows (((

Luís said...

You know, it's probably not very important. I think I'll just draw all the arrows alike.

Anton Vodonosov said...

After some concentration I think I am starting to understand...

Anton Vodonosov said...

> You know, it's probably not very important. I think I'll just draw all the arrows alike.

I think that if we write A -> B, it is important to distinguish between "A is input data for B, where B is the active thing" and "A is the active thing and produces B".

But maybe the difference between "generates" and "generates an executable" is not so big?

I.e. reducing to 3 types of arrows looks like a good refactoring step.

Anton Vodonosov said...

Let me try to explain in text how I read the diagram.

1. The host Lisp (SBCL, CMUCL, CLISP, ...) loads code from src/code/, src/compiler/, src/assembly/ that forms a cross-compiler.
The cross-compiler is a thing that can compile Lisp code into a file format that is understood by SBCL.
I suppose the files produced by the cross-compiler contain machine code, but the file layout and the
format of the metainformation about objects (their location in the file, names of symbols, number of function arguments)
are specific to SBCL.

And that is the responsibility of the C runtime: to load such files and arrange for the machine code in those
files to be executed (e.g. ensure that a reference to an object uses the address where the object
really resides in memory, etc.).

Not sure how much lisp-obj files differ from the core file (sbcl-cold.core). How much processing is needed
to turn lisp-obj files into a core?

I assume the cross-compiler can compile only a limited subset of Common Lisp (no CLOS, I think, and maybe a lot of
standard functions are absent; maybe even the collection functions are absent?)

2. Compile the C runtime. The input for this step is:
- some headers produced by the code loaded in step 1 (the produced headers are placed in genesis/*.h files);
- [.chS], which means .c, .h, and .S files (I had to check the SBCL source to understand that; I
started by searching for *.chS, then for *.ch files). .c and .h are the usual C files, but what is .S?
It looks like they are compiled by the C compiler too, but these files use the preprocessor very extensively (or not?)

The result of this step is the C runtime (represented on the diagram by the red "SBCL" node) and the
grovelled constants. Grovelled constants are constants extracted from C source files and transformed into
Lisp definitions. I.e. the grovelled constants artifact is embodied as Lisp source files, right?

How does the grovelling work? From the diagram I assume the [host] Lisp is not used for this at all?

3. Compile the same Lisp sources used in step 1 (the cross-compiler source) with the
cross-compiler. I.e. the cross-compiler recompiles itself so that it can be loaded into the C runtime
produced by step 2.

The result here is the lisp-obj files of the cross-compiler (lisp-obj files are in SBCL format, right?
Does the final SBCL produce the same files when it compiles Lisp?).

4. Produce an SBCL-format image file (sbcl-cold.core) of the cross-compiler (from the lisp-obj files).

5. Load the cross-compiler (sbcl-cold.core) into the C runtime and compile the full compiler.
I assume the full compiler source code uses the limited language subset of the cross-compiler
to implement more language constructs, then uses those to extend the language further,
and so on. The full compiler source also includes the type inference engine, a
more optimized code generator, etc. (because these components are more convenient
to implement in the full language than in the limited language subset of the cross-compiler).
Or does the cross-compiler already have all this, and the last recompilation is necessary just to
produce more optimized code for those components, to make the final SBCL run faster?

Of course, if I read the Christophe Rhodes paper linked from your post, I'll get the answers to my questions.
The above text is just an illustration of how I understand the diagram: what I can understand, and what I can only
speculate about.

IMO it's a good idea to represent the build process in a form as short as a diagram. The amount
of text that we all read daily is huge, and laconic explanations are very welcome.

Anton Vodonosov said...

BTW, what diagramming software do you use?

Luís said...

.S are assembly files.

Consider a situation where you need to figure out what is the value of some constant, e.g. INT_MAX. So you generate a C program that does printf("(defconstant +int-max+ %d)\n", INT_MAX) and output that to a .lisp file. That's grovelling. You can do similar stuff for figuring out structure layouts, types, sizes etc.

I believe you got things right, except the compiler is not compiled a third time in step 5. The main course in step 5 is compiling PCL, SBCL's implementation of CLOS. Oh, and before that, this step needs to create the infrastructure for running Lisp code: packages have to be created, which requires creating new hashtables, etc.

I used OmniGraffle to draw this diagram.

Anton Vodonosov said...

You mean the cross-compiler is not a limited compiler, but the full SBCL?

Luís said...

The compiler is not limited, IIUC.

Anonymous said...

Those are beautiful little pictures :)
