Changeset 171

Show
Ignore:
Timestamp:
05/15/07 06:53:33 (2 years ago)
Author:
ant_39
Message:
  • Work on thesis.
Files:

Legend:

Unmodified
Added
Removed
Modified
Copied
Moved
  • trunk/doc/dp.tex

    r158 r171  
    2222\title{Construction of GNU Compiler Collection Front End} 
    2323 
    24 \newif\iffinal\finalfals
     24\newif\iffinal\finaltru
    2525\newif\ifnfinal\iffinal\nfinalfalse\else\nfinaltrue\fi 
    2626 
     
    11181118\literal{\$(MAKE) -C} should be enough, but you will have to take care 
    11191119of passing necessary build/host/target trichotomy 
    1120 \footnote{See section \ref{CrossCompilationIssues} for details.} 
    11211120variables and paths over to your build system, and respecting them 
    11221121there.  This in fact holds for all various variables that are passed 
     
    11311130C-like preprocessing or inline assembly.  These themes are covered in 
    11321131dedicated chapters \ref{GCCServices} and \ref{LanguageExtenstions}, 
    1133 respectively. \NWY 
     1132respectively. 
    11341133 
    11351134For special-casing the code that's built as part of GCC, you can use 
     
    11461145 
    11471146Hand-managed code is OK with GCC, so it's best to keep the collector 
    1148 off already written codebase. 
     1147off already written codebase.  On the other hand, if you know you are 
     1148writing what will become part of GCC, it is perhaps better to use 
     1149garbage collector from the beginning.  It saves some nerves. 
    11491150 
    11501151\subsection{Versioning Issues} 
     
    11711172\file{trunk/gcc/gcc/testsuite/} in their respective sections, 
    11721173\ref{RuntimeLibraries} and \ref{TestSuite}. For now, you can skip 
    1173 their description, if you so wish. \NWY 
     1174their description, if you so wish. 
    11741175 
    11751176The toplevel split to \file{trunk/doc/} and \file{trunk/gcc/} is here 
     
    12031204But for test suite, you will have to add two files into subdirectory 
    12041205\file{gcc/gcc/testsuite/lib/}, and you have to symlink those two files 
    1205 explicitly: 
     1206explicitly (we need to ``merge'' the two directory subtrees, which is 
     1207not simply done in most OS's): 
    12061208 
    12071209\begin{verbatim} 
     
    12201222directory, and symlink there all the files from front end's 
    12211223\file{gcc/libga60/}.  You will have to either take care and symlink 
    1222 back by hand when someone adds new files into version system, or store 
    1223 all source files in subdirectory e.g. \file{gcc/libga60/src/}, symlink 
    1224 that, and only keep in \file{gcc/libga60/} the build machinery files 
    1225 that need to be there. 
     1224back by hand when someone adds new files into version system (maybe 
     1225with support of nice VCS hook), or store all source files in 
     1226subdirectory e.g. \file{gcc/libga60/src/}, symlink that, and only keep 
     1227in \file{gcc/libga60/} the build machinery files that need to be 
     1228there. 
    12261229 
    12271230\section{Summary} 
     
    12561259\end{itemize} 
    12571260 
    1258  
    12591261%=====================================================================% 
    12601262 
    1261 \iffinal 
     1263\chapter{The Algol 60 GCC Frontend} 
     1264 
     1265The \Algol 60 GCC front end has been written in an attempt to get a 
     1266first hand experience on work with GCC.  \Algol 60 itself turned out 
     1267to be a kinky language, and as of now, is not yet fully implemented. 
     1268This chapter presents overview of the compiler architecture and 
     1269implementation, and ways it handles certain interesting \Algol 60 
     1270constructs. 
     1271 
     1272 
     1273\section[gcc-algol the Project]{{\tt gcc-algol} the Project} 
     1274 
     1275The programming language \Algol 60 was picked purposefuly, because 
     1276language such as this is best fit for GCC.  It is structured, with 
     1277static typing, has arrays and functions, and will profit from support 
     1278of runtime library.  Lastly it has a rigorous standard, which is not 
     1279that important in and on itself, but why invent new language? 
     1280 
     1281The \Algol 60 compiler was started as a project independent on 
     1282GCC\footnote{Despite the language being picked to suit GCC!}.  I knew 
     1283I would plug it in GCC one day, but I specifically avoided any 
     1284relation to GCC before the time came.  This helped me simulate the 
     1285situation where the company or an individual wants to port existing 
     1286work to GCC back end\footnote{In the sense of GCC itself being the 
     1287  back end.}. 
     1288 
     1289The front end is written in a language C.  As mentioned before, C is 
     1290natural fit for GCC.  If you have the choice, choose either C, or 
     1291other GCC language that can compile transitively from C.  Front end 
     1292written in such a way is simpler to distribute and build at user 
     1293sites. 
     1294 
     1295I strived to avoid any extra dependencies of the front end, to the end 
     1296of writing linked list and generic string modules from scratch just 
     1297for the purpose of \Algol parser.  These two modules are the only 
     1298library functionality beyond pure \literal{libc}---even symbol tables 
     1299are stored in linked lists \footnote{I will fix this, I promise.}. 
     1300 
     1301 
     1302\section{Overall Architecture} 
     1303 
     1304From the bird's eye view, \Algol compiler is a straightforward 
     1305compiler: it has Flex-based lexical analyser, Yacc-based parser, and 
     1306several C modules with AST and semantic analysis. 
     1307 
     1308Algol compiler uses encapsulation heavily.  There is no (official) way 
     1309to get inside structures, everything is passed as a pointer to 
     1310undefined structure.  Only in the module that handles that particular 
     1311type is the structure actually defined, so that the implementation has 
     1312full, unrestricted access to bits. 
     1313 
     1314The encapsulation zeal is exposed by use of visitors.  There are 
     1315certain polymorphic structures in \Algol AST.  For example expressions 
     1316may be numbers, variable references, unary and binary expressions, 
     1317function calls, etc.  All expression kinds are passed around as the 
     1318same opaque \variable{expression\_t} pointer.  To do any action 
     1319depending on exact type of expression, one has to define a {\em 
     1320  visitor} over the expression structure, and let the visitor dispatch 
     1321on the desired expression object. 
     1322 
     1323Other means of achieving safety include strong trend towards static. 
     1324Whatever can be checked by compiler giving a warning is checked so. 
     1325If the warning can be turned to full error with some extra coding, so 
     1326much the better.  If compiler can't check that, runtime will. 
     1327 
     1328 
     1329\subsection{Visitors} 
     1330 
     1331The figure \ref{Figure++Visitors} presents an example of visitor 
     1332creation and usage.  You can see that the expression visitor is 
     1333created by a dedicated function \function{new\_visitor\_expr}, which 
     1334accepts one argument for each expression kind.  The arguments have to 
     1335be \function{callback\_t} pointers: to convert function pointer to 
     1336callback, you have to pass it through appropriate callback builder, 
     1337\function{a60\_expr\_callback} in this case.  Callback builder makes 
     1338sure that your callback function takes \literal{expression\_t*} as a 
     1339first argument.  All is checked during compilation: that you didn't 
     1340forget one of the callbacks, that all callbacks are typed right, etc. 
     1341 
     1342In addition, the visitor builder knows it builds visitor for 
     1343expression type, and when in debug mode, it {\em dynamically} checks 
     1344that that visitor is dispatched over expression, and not 
     1345e.g. statement.  In release mode, the check goes away. 
     1346 
     1347\begin{figure} 
     1348\begin{verbatim} 
     1349/// Excerpt from compiler context. 
     1350struct struct_al60l_bind_state_t 
     1351
     1352  visitor_t * expression_build_generic; 
     1353}; 
     1354 
     1355/// An example callback function. 
     1356/// Builds GENERIC for integer constant node. 
     1357void * 
     1358expr_int_build_generic (expression_t * self, void * state) 
     1359
     1360  tree ret = build_int_cst (integer_type_node, expr_int_value (self)); 
     1361  return ret; 
     1362
     1363 
     1364/// Creates new compiler context. 
     1365al60l_bind_state_t * 
     1366new_bind_state (void) 
     1367
     1368  al60l_bind_state_t * ret = xmalloc (sizeof (al60l_bind_state_t)); 
     1369  ret->expression_build_generic = new_visitor_expr ( 
     1370      a60_expr_callback (expr_int_build_generic), 
     1371      a60_expr_callback (expr_real_build_generic), 
     1372      a60_expr_callback (expr_string_build_generic), 
     1373      a60_expr_callback (expr_bool_build_generic), 
     1374      a60_expr_callback (expr_idref_build_generic), 
     1375      a60_expr_callback (expr_if_build_generic), 
     1376      a60_expr_callback (expr_binary_build_generic), 
     1377      a60_expr_callback (expr_unary_build_generic), 
     1378      a60_expr_callback (expr_call_build_generic), 
     1379      a60_expr_callback (expr_subscript_build_generic) 
     1380  ); 
     1381  return ret; 
     1382
     1383 
     1384/// Wrapper function, uses visitor to build GENERIC from given expression. 
     1385tree 
     1386expr_build_generic (expression_t * expression, al60l_bind_state_t * state) 
     1387
     1388  return (tree)a60_visitor_dispatch (state->expression_build_generic, 
     1389                                     expression, expression, state); 
     1390
     1391 
     1392\end{verbatim} 
     1393\caption{Example use of visitors in \Algol 60 front end.} 
     1394\label{Figure++Visitors} 
     1395\end{figure} 
     1396 
     1397 
     1398\subsection{Dynamic Typing} 
     1399 
     1400To implement rigorous runtime checking, each polymorphic objects 
     1401carries around two identifiers: a {\em kind}, which identifies subtype 
     1402(e.g. ``number''), and {\em signature}, which identifies type 
     1403(e.g. ``expression'').  Kind is actually used to model RTTI, but not 
     1404so signature.  Given a \literal{void*}, signature makes it possible to 
     1405decide whether it points to (e.g.) expression or not.  This property 
     1406is not used to emulate RTTI, but rather to provide additional type 
     1407safety, for example in a code like this: 
     1408 
     1409\begin{verbatim} 
     1410  void * mystery = slist_front (state->for_statements); 
     1411  statement_t * parent_for_stmt = a60_as_statement (mystery); 
     1412  //statement_t * parent_for_stmt = (statement_t *)mystery; 
     1413\end{verbatim} 
     1414 
     1415On first line, mysterious object is extracted from singly-linked list. 
     1416Instead of typecasting it like in commented-out code, dedicated 
     1417function is called, that {\em actually checks} exact type, and if it 
     1418doesn't match, aborts.  Thus you can't conditionalize on the result of 
     1419typecheck.  The mechanism is here to {\em prevent errors}, not to 
     1420emulate dynamic typing systems.  This is C, you have to know what 
     1421pointers you're passing around! 
     1422 
     1423In release mode, the checks go away, and \function{a60\_as\_statement} 
     1424ends up being a macro that expands to direct C typecast. 
     1425 
     1426This same dynamic typing mechanism is used for visitors.  Visitor 
     1427knows which object type it should dispatch on, and given a visitor and 
     1428an object (passed in as \literal{void*}), it can decided whether the 
     1429visitor matches the object type. 
     1430 
     1431Dynamic typing is used for implementation of {\em typed slists}, 
     1432i.e. singly-linked lists with associated predicate that is guaranteed 
     1433to hold for its elements.  For example, in \Algol 60, each assignment 
     1434statement is made up from a vector of left hand sides, which have to 
     1435be lvalues, and one right hand side.  To express this property, code 
     1436such as the one at figure \ref{Figure++TypedSlist} can be written. 
     1437 
     1438\begin{figure} 
     1439\begin{verbatim} 
     1440static void * 
     1441private_check_expr_lvalue (void * ptr, void * data ATTRIBUTE_UNUSED) 
     1442
     1443  expression_t * expr = a60_as_expression (ptr); 
     1444  if (expr && !expr_is_lvalue (expr)) 
     1445    expr = NULL; 
     1446  return expr; 
     1447
     1448 
     1449statement_t * 
     1450new_stmt_assign (cursor_t * cursor, slist_t * lhss, expression_t * rhs) 
     1451
     1452  statement_t * ret = private_new_statement (sk_assign, cursor, NULL); 
     1453  slist_set_type (lhss, private_check_expr_lvalue, NULL); 
     1454  ret->assign.lhss = lhss; 
     1455  ret->assign.rhs = rhs; 
     1456  return ret; 
     1457
     1458\end{verbatim} 
     1459\caption{Example use of typed slists in \Algol 60 front end.} 
     1460\label{Figure++TypedSlist} 
     1461\end{figure} 
     1462 
     1463A function \function{private\_check\_expr\_lvalue} is a predicate, 
     1464that, given a \literal{void*} pointer, can decided whether the given 
     1465pointer points to lvalue expression.  It does so by first checking 
     1466that \literal{void*} is expression at all, and then asks expression 
     1467module (because it's safe by now) on lvalue status.  This predicate is 
     1468attached as a type to the passed-in list of left hand sides 
     1469(\literal{lhss}), thus providing additional checking in case anything 
     1470slipped between fingers of yacc parser.  Slist module would abort if 
     1471one of the elements didn't pass the test, or an element that doesn't 
     1472pass was added. 
     1473 
     1474As is the case for other dynamic features, in release mode this one 
     1475goes completely away, so there is no performance cost. 
     1476 
     1477 
     1478\subsection{Implementation} 
     1479 
     1480Visitors and dynamic typing go hand in hand, and are both implemented 
     1481by the same module.  Each module, that wants to participate in this 
     1482scheme, has to list the field \literal{visitable\_t} (from 
     1483\file{visitor-impl.h}) as the first member of its data structure: 
     1484 
     1485\begin{verbatim} 
     1486struct struct_expression_t 
     1487
     1488  visitable_t base; 
     1489  // rest of structure as usual 
     1490}; 
     1491\end{verbatim} 
     1492 
     1493The \literal{base} structure contains a signature (only in debug 
     1494mode), and a kind determinator. 
     1495 
     1496Kind is simple \literal{int}, and its values would typically be 
     1497determined by those of some internal \literal{enum} values.  Signature 
     1498is \literal{char const*}, and is used as follows: 
     1499 
     1500\begin{verbatim} 
     1501static char const * const private_expression_signature = "expression"; 
     1502static expression_t * 
     1503private_new_expr (cursor_t * location, expr_kind_t kind) 
     1504
     1505  expression_t * ret = calloc (1, sizeof (expression_t)); 
     1506#ifndef NDEBUG 
     1507  ret->base.signature = private_expression_signature; 
     1508#endif 
     1509  // ... 
     1510
     1511\end{verbatim} 
     1512 
     1513This is extremely useful for debugging and diagnostic messages. All 
     1514you have to do to find out the type of mysterious object is to 
     1515dereference the pointer and interpret it as a string. 
     1516 
     1517When creating a visitor, not a signature, but address of signature 
     1518variable is used.  When doing a dispatch, the visitor performs simple 
     1519check: 
     1520 
     1521\begin{verbatim} 
     1522#ifndef NDEBUG 
     1523  /* check signature */ 
     1524  char * obj_sig = *(char const* const*)object; 
     1525  char * vis_sig = **(char const* const* *)visitor; 
     1526  if (obj_sig != vis_sig) 
     1527    { 
     1528      fprintf (stderr, 
     1529               "error: visitor %p for `%s' dispatches over `%s'.\n", 
     1530               (void*)visitor, vis_sig, obj_sig); 
     1531      abort (); 
     1532    } 
     1533#endif 
     1534\end{verbatim} 
     1535 
     1536As you can see, this very typing mechanism gives us ability to deduce 
     1537what's wrong, and inform a user in meaningful way: 
     1538 
     1539\begin{verbatim} 
     1540error: visitor 0x0804d200 for `expression' dispatches over `statement'. 
     1541\end{verbatim} 
     1542 
     1543 
     1544%=====================================================================% 
     1545 
    12621546\chapter{Targeting GCC} 
    12631547\label{TargetingGCC} 
    12641548 
    1265 This chapter should describe how to generate GCC intermediate language 
    1266 called GENERIC.  Also use hooks if possible. 
    1267  
    1268 \section{Symbol Handling} 
     1549This chapter describes how to generate GCC intermediate language 
     1550called GENERIC.  It is organized as a catalogue of typical constructs 
     1551found in programming language, and typical tasks performed by a 
     1552compiler, with explanation how to handle these in context of GCC. 
     1553 
     1554\section{Variables, Types, Symbols} 
     1555 
     1556A fundamental part of any programming language processing is dealing 
     1557with {\em symbols}, ways of typing and typechecking them, and handling 
     1558scoping rules. 
     1559 
     1560 
    12691561 
    12701562Symbol handling: probably has to be front end specific, although GCC 
    12711563can store declarations.  See how other frontends do their thing.  Name 
    12721564mangling, C and C++-compatible interfacing (extern "C"). 
    1273  
    1274 \section{Variables and Types} 
    12751565 
    12761566How to do typechecking, representing various types (including arrays, 
     
    13491639 
    13501640Look into \cite{GDS:2005:Dvorak}.  Maybe find better source. 
    1351 \fi 
    13521641 
    13531642%=====================================================================%