| 1261 | | \iffinal |
|---|
| | 1263 | \chapter{The Algol 60 GCC Frontend} |
|---|
| | 1264 | |
|---|
| | 1265 | The \Algol 60 GCC front end has been written in an attempt to get a |
|---|
| | 1266 | first hand experience on work with GCC. \Algol 60 itself turned out |
|---|
| | 1267 | to be a kinky language, and as of now, is not yet fully implemented. |
|---|
| | 1268 | This chapter presents overview of the compiler architecture and |
|---|
| | 1269 | implementation, and ways it handles certain interesting \Algol 60 |
|---|
| | 1270 | constructs. |
|---|
| | 1271 | |
|---|
| | 1272 | |
|---|
| | 1273 | \section[gcc-algol the Project]{{\tt gcc-algol} the Project} |
|---|
| | 1274 | |
|---|
| | 1275 | The programming language \Algol 60 was picked purposefuly, because |
|---|
| | 1276 | language such as this is best fit for GCC. It is structured, with |
|---|
| | 1277 | static typing, has arrays and functions, and will profit from support |
|---|
| | 1278 | of runtime library. Lastly it has a rigorous standard, which is not |
|---|
| | 1279 | that important in and on itself, but why invent new language? |
|---|
| | 1280 | |
|---|
| | 1281 | The \Algol 60 compiler was started as a project independent on |
|---|
| | 1282 | GCC\footnote{Despite the language being picked to suit GCC!}. I knew |
|---|
| | 1283 | I would plug it in GCC one day, but I specifically avoided any |
|---|
| | 1284 | relation to GCC before the time came. This helped me simulate the |
|---|
| | 1285 | situation where the company or an individual wants to port existing |
|---|
| | 1286 | work to GCC back end\footnote{In the sense of GCC itself being the |
|---|
| | 1287 | back end.}. |
|---|
| | 1288 | |
|---|
| | 1289 | The front end is written in a language C. As mentioned before, C is |
|---|
| | 1290 | natural fit for GCC. If you have the choice, choose either C, or |
|---|
| | 1291 | other GCC language that can compile transitively from C. Front end |
|---|
| | 1292 | written in such a way is simpler to distribute and build at user |
|---|
| | 1293 | sites. |
|---|
| | 1294 | |
|---|
| | 1295 | I strived to avoid any extra dependencies of the front end, to the end |
|---|
| | 1296 | of writing linked list and generic string modules from scratch just |
|---|
| | 1297 | for the purpose of \Algol parser. These two modules are the only |
|---|
| | 1298 | library functionality beyond pure \literal{libc}---even symbol tables |
|---|
| | 1299 | are stored in linked lists \footnote{I will fix this, I promise.}. |
|---|
| | 1300 | |
|---|
| | 1301 | |
|---|
| | 1302 | \section{Overall Architecture} |
|---|
| | 1303 | |
|---|
| | 1304 | From the bird's eye view, \Algol compiler is a straightforward |
|---|
| | 1305 | compiler: it has Flex-based lexical analyser, Yacc-based parser, and |
|---|
| | 1306 | several C modules with AST and semantic analysis. |
|---|
| | 1307 | |
|---|
| | 1308 | Algol compiler uses encapsulation heavily. There is no (official) way |
|---|
| | 1309 | to get inside structures, everything is passed as a pointer to |
|---|
| | 1310 | undefined structure. Only in the module that handles that particular |
|---|
| | 1311 | type is the structure actually defined, so that the implementation has |
|---|
| | 1312 | full, unrestricted access to bits. |
|---|
| | 1313 | |
|---|
| | 1314 | The encapsulation zeal is exposed by use of visitors. There are |
|---|
| | 1315 | certain polymorphic structures in \Algol AST. For example expressions |
|---|
| | 1316 | may be numbers, variable references, unary and binary expressions, |
|---|
| | 1317 | function calls, etc. All expression kinds are passed around as the |
|---|
| | 1318 | same opaque \variable{expression\_t} pointer. To do any action |
|---|
| | 1319 | depending on exact type of expression, one has to define a {\em |
|---|
| | 1320 | visitor} over the expression structure, and let the visitor dispatch |
|---|
| | 1321 | on the desired expression object. |
|---|
| | 1322 | |
|---|
| | 1323 | Other means of achieving safety include strong trend towards static. |
|---|
| | 1324 | Whatever can be checked by compiler giving a warning is checked so. |
|---|
| | 1325 | If the warning can be turned to full error with some extra coding, so |
|---|
| | 1326 | much the better. If compiler can't check that, runtime will. |
|---|
| | 1327 | |
|---|
| | 1328 | |
|---|
| | 1329 | \subsection{Visitors} |
|---|
| | 1330 | |
|---|
| | 1331 | The figure \ref{Figure++Visitors} presents an example of visitor |
|---|
| | 1332 | creation and usage. You can see that the expression visitor is |
|---|
| | 1333 | created by a dedicated function \function{new\_visitor\_expr}, which |
|---|
| | 1334 | accepts one argument for each expression kind. The arguments have to |
|---|
| | 1335 | be \function{callback\_t} pointers: to convert function pointer to |
|---|
| | 1336 | callback, you have to pass it through appropriate callback builder, |
|---|
| | 1337 | \function{a60\_expr\_callback} in this case. Callback builder makes |
|---|
| | 1338 | sure that your callback function takes \literal{expression\_t*} as a |
|---|
| | 1339 | first argument. All is checked during compilation: that you didn't |
|---|
| | 1340 | forget one of the callbacks, that all callbacks are typed right, etc. |
|---|
| | 1341 | |
|---|
| | 1342 | In addition, the visitor builder knows it builds visitor for |
|---|
| | 1343 | expression type, and when in debug mode, it {\em dynamically} checks |
|---|
| | 1344 | that that visitor is dispatched over expression, and not |
|---|
| | 1345 | e.g. statement. In release mode, the check goes away. |
|---|
| | 1346 | |
|---|
| | 1347 | \begin{figure} |
|---|
| | 1348 | \begin{verbatim} |
|---|
| | 1349 | /// Excerpt from compiler context. |
|---|
| | 1350 | struct struct_al60l_bind_state_t |
|---|
| | 1351 | { |
|---|
| | 1352 | visitor_t * expression_build_generic; |
|---|
| | 1353 | }; |
|---|
| | 1354 | |
|---|
| | 1355 | /// An example callback function. |
|---|
| | 1356 | /// Builds GENERIC for integer constant node. |
|---|
| | 1357 | void * |
|---|
| | 1358 | expr_int_build_generic (expression_t * self, void * state) |
|---|
| | 1359 | { |
|---|
| | 1360 | tree ret = build_int_cst (integer_type_node, expr_int_value (self)); |
|---|
| | 1361 | return ret; |
|---|
| | 1362 | } |
|---|
| | 1363 | |
|---|
| | 1364 | /// Creates new compiler context. |
|---|
| | 1365 | al60l_bind_state_t * |
|---|
| | 1366 | new_bind_state (void) |
|---|
| | 1367 | { |
|---|
| | 1368 | al60l_bind_state_t * ret = xmalloc (sizeof (al60l_bind_state_t)); |
|---|
| | 1369 | ret->expression_build_generic = new_visitor_expr ( |
|---|
| | 1370 | a60_expr_callback (expr_int_build_generic), |
|---|
| | 1371 | a60_expr_callback (expr_real_build_generic), |
|---|
| | 1372 | a60_expr_callback (expr_string_build_generic), |
|---|
| | 1373 | a60_expr_callback (expr_bool_build_generic), |
|---|
| | 1374 | a60_expr_callback (expr_idref_build_generic), |
|---|
| | 1375 | a60_expr_callback (expr_if_build_generic), |
|---|
| | 1376 | a60_expr_callback (expr_binary_build_generic), |
|---|
| | 1377 | a60_expr_callback (expr_unary_build_generic), |
|---|
| | 1378 | a60_expr_callback (expr_call_build_generic), |
|---|
| | 1379 | a60_expr_callback (expr_subscript_build_generic) |
|---|
| | 1380 | ); |
|---|
| | 1381 | return ret; |
|---|
| | 1382 | } |
|---|
| | 1383 | |
|---|
| | 1384 | /// Wrapper function, uses visitor to build GENERIC from given expression. |
|---|
| | 1385 | tree |
|---|
| | 1386 | expr_build_generic (expression_t * expression, al60l_bind_state_t * state) |
|---|
| | 1387 | { |
|---|
| | 1388 | return (tree)a60_visitor_dispatch (state->expression_build_generic, |
|---|
| | 1389 | expression, expression, state); |
|---|
| | 1390 | } |
|---|
| | 1391 | |
|---|
| | 1392 | \end{verbatim} |
|---|
| | 1393 | \caption{Example use of visitors in \Algol 60 front end.} |
|---|
| | 1394 | \label{Figure++Visitors} |
|---|
| | 1395 | \end{figure} |
|---|
| | 1396 | |
|---|
| | 1397 | |
|---|
| | 1398 | \subsection{Dynamic Typing} |
|---|
| | 1399 | |
|---|
| | 1400 | To implement rigorous runtime checking, each polymorphic objects |
|---|
| | 1401 | carries around two identifiers: a {\em kind}, which identifies subtype |
|---|
| | 1402 | (e.g. ``number''), and {\em signature}, which identifies type |
|---|
| | 1403 | (e.g. ``expression''). Kind is actually used to model RTTI, but not |
|---|
| | 1404 | so signature. Given a \literal{void*}, signature makes it possible to |
|---|
| | 1405 | decide whether it points to (e.g.) expression or not. This property |
|---|
| | 1406 | is not used to emulate RTTI, but rather to provide additional type |
|---|
| | 1407 | safety, for example in a code like this: |
|---|
| | 1408 | |
|---|
| | 1409 | \begin{verbatim} |
|---|
| | 1410 | void * mystery = slist_front (state->for_statements); |
|---|
| | 1411 | statement_t * parent_for_stmt = a60_as_statement (mystery); |
|---|
| | 1412 | //statement_t * parent_for_stmt = (statement_t *)mystery; |
|---|
| | 1413 | \end{verbatim} |
|---|
| | 1414 | |
|---|
| | 1415 | On first line, mysterious object is extracted from singly-linked list. |
|---|
| | 1416 | Instead of typecasting it like in commented-out code, dedicated |
|---|
| | 1417 | function is called, that {\em actually checks} exact type, and if it |
|---|
| | 1418 | doesn't match, aborts. Thus you can't conditionalize on the result of |
|---|
| | 1419 | typecheck. The mechanism is here to {\em prevent errors}, not to |
|---|
| | 1420 | emulate dynamic typing systems. This is C, you have to know what |
|---|
| | 1421 | pointers you're passing around! |
|---|
| | 1422 | |
|---|
| | 1423 | In release mode, the checks go away, and \function{a60\_as\_statement} |
|---|
| | 1424 | ends up being a macro that expands to direct C typecast. |
|---|
| | 1425 | |
|---|
| | 1426 | This same dynamic typing mechanism is used for visitors. Visitor |
|---|
| | 1427 | knows which object type it should dispatch on, and given a visitor and |
|---|
| | 1428 | an object (passed in as \literal{void*}), it can decided whether the |
|---|
| | 1429 | visitor matches the object type. |
|---|
| | 1430 | |
|---|
| | 1431 | Dynamic typing is used for implementation of {\em typed slists}, |
|---|
| | 1432 | i.e. singly-linked lists with associated predicate that is guaranteed |
|---|
| | 1433 | to hold for its elements. For example, in \Algol 60, each assignment |
|---|
| | 1434 | statement is made up from a vector of left hand sides, which have to |
|---|
| | 1435 | be lvalues, and one right hand side. To express this property, code |
|---|
| | 1436 | such as the one at figure \ref{Figure++TypedSlist} can be written. |
|---|
| | 1437 | |
|---|
| | 1438 | \begin{figure} |
|---|
| | 1439 | \begin{verbatim} |
|---|
| | 1440 | static void * |
|---|
| | 1441 | private_check_expr_lvalue (void * ptr, void * data ATTRIBUTE_UNUSED) |
|---|
| | 1442 | { |
|---|
| | 1443 | expression_t * expr = a60_as_expression (ptr); |
|---|
| | 1444 | if (expr && !expr_is_lvalue (expr)) |
|---|
| | 1445 | expr = NULL; |
|---|
| | 1446 | return expr; |
|---|
| | 1447 | } |
|---|
| | 1448 | |
|---|
| | 1449 | statement_t * |
|---|
| | 1450 | new_stmt_assign (cursor_t * cursor, slist_t * lhss, expression_t * rhs) |
|---|
| | 1451 | { |
|---|
| | 1452 | statement_t * ret = private_new_statement (sk_assign, cursor, NULL); |
|---|
| | 1453 | slist_set_type (lhss, private_check_expr_lvalue, NULL); |
|---|
| | 1454 | ret->assign.lhss = lhss; |
|---|
| | 1455 | ret->assign.rhs = rhs; |
|---|
| | 1456 | return ret; |
|---|
| | 1457 | } |
|---|
| | 1458 | \end{verbatim} |
|---|
| | 1459 | \caption{Example use of typed slists in \Algol 60 front end.} |
|---|
| | 1460 | \label{Figure++TypedSlist} |
|---|
| | 1461 | \end{figure} |
|---|
| | 1462 | |
|---|
| | 1463 | A function \function{private\_check\_expr\_lvalue} is a predicate, |
|---|
| | 1464 | that, given a \literal{void*} pointer, can decided whether the given |
|---|
| | 1465 | pointer points to lvalue expression. It does so by first checking |
|---|
| | 1466 | that \literal{void*} is expression at all, and then asks expression |
|---|
| | 1467 | module (because it's safe by now) on lvalue status. This predicate is |
|---|
| | 1468 | attached as a type to the passed-in list of left hand sides |
|---|
| | 1469 | (\literal{lhss}), thus providing additional checking in case anything |
|---|
| | 1470 | slipped between fingers of yacc parser. Slist module would abort if |
|---|
| | 1471 | one of the elements didn't pass the test, or an element that doesn't |
|---|
| | 1472 | pass was added. |
|---|
| | 1473 | |
|---|
| | 1474 | As is the case for other dynamic features, in release mode this one |
|---|
| | 1475 | goes completely away, so there is no performance cost. |
|---|
| | 1476 | |
|---|
| | 1477 | |
|---|
| | 1478 | \subsection{Implementation} |
|---|
| | 1479 | |
|---|
| | 1480 | Visitors and dynamic typing go hand in hand, and are both implemented |
|---|
| | 1481 | by the same module. Each module, that wants to participate in this |
|---|
| | 1482 | scheme, has to list the field \literal{visitable\_t} (from |
|---|
| | 1483 | \file{visitor-impl.h}) as the first member of its data structure: |
|---|
| | 1484 | |
|---|
| | 1485 | \begin{verbatim} |
|---|
| | 1486 | struct struct_expression_t |
|---|
| | 1487 | { |
|---|
| | 1488 | visitable_t base; |
|---|
| | 1489 | // rest of structure as usual |
|---|
| | 1490 | }; |
|---|
| | 1491 | \end{verbatim} |
|---|
| | 1492 | |
|---|
| | 1493 | The \literal{base} structure contains a signature (only in debug |
|---|
| | 1494 | mode), and a kind determinator. |
|---|
| | 1495 | |
|---|
| | 1496 | Kind is simple \literal{int}, and its values would typically be |
|---|
| | 1497 | determined by those of some internal \literal{enum} values. Signature |
|---|
| | 1498 | is \literal{char const*}, and is used as follows: |
|---|
| | 1499 | |
|---|
| | 1500 | \begin{verbatim} |
|---|
| | 1501 | static char const * const private_expression_signature = "expression"; |
|---|
| | 1502 | static expression_t * |
|---|
| | 1503 | private_new_expr (cursor_t * location, expr_kind_t kind) |
|---|
| | 1504 | { |
|---|
| | 1505 | expression_t * ret = calloc (1, sizeof (expression_t)); |
|---|
| | 1506 | #ifndef NDEBUG |
|---|
| | 1507 | ret->base.signature = private_expression_signature; |
|---|
| | 1508 | #endif |
|---|
| | 1509 | // ... |
|---|
| | 1510 | } |
|---|
| | 1511 | \end{verbatim} |
|---|
| | 1512 | |
|---|
| | 1513 | This is extremely useful for debugging and diagnostic messages. All |
|---|
| | 1514 | you have to do to find out the type of mysterious object is to |
|---|
| | 1515 | dereference the pointer and interpret it as a string. |
|---|
| | 1516 | |
|---|
| | 1517 | When creating a visitor, not a signature, but address of signature |
|---|
| | 1518 | variable is used. When doing a dispatch, the visitor performs simple |
|---|
| | 1519 | check: |
|---|
| | 1520 | |
|---|
| | 1521 | \begin{verbatim} |
|---|
| | 1522 | #ifndef NDEBUG |
|---|
| | 1523 | /* check signature */ |
|---|
| | 1524 | char * obj_sig = *(char const* const*)object; |
|---|
| | 1525 | char * vis_sig = **(char const* const* *)visitor; |
|---|
| | 1526 | if (obj_sig != vis_sig) |
|---|
| | 1527 | { |
|---|
| | 1528 | fprintf (stderr, |
|---|
| | 1529 | "error: visitor %p for `%s' dispatches over `%s'.\n", |
|---|
| | 1530 | (void*)visitor, vis_sig, obj_sig); |
|---|
| | 1531 | abort (); |
|---|
| | 1532 | } |
|---|
| | 1533 | #endif |
|---|
| | 1534 | \end{verbatim} |
|---|
| | 1535 | |
|---|
| | 1536 | As you can see, this very typing mechanism gives us ability to deduce |
|---|
| | 1537 | what's wrong, and inform a user in meaningful way: |
|---|
| | 1538 | |
|---|
| | 1539 | \begin{verbatim} |
|---|
| | 1540 | error: visitor 0x0804d200 for `expression' dispatches over `statement'. |
|---|
| | 1541 | \end{verbatim} |
|---|
| | 1542 | |
|---|
| | 1543 | |
|---|
| | 1544 | %=====================================================================% |
|---|
| | 1545 | |
|---|