137x Filetype PDF File size 0.04 MB Source: courses.cs.washington.edu
CSE 401: Introduction to Compiler Construction Course Outline Text:ModernCompilerImplementationinJava,SecondEdition, Compiler front-ends: by Appel, with Palsberg • lexical analysis (scanning): characters → tokens Suggested: Compilers - Principles, Techniques, and Tools, by • syntactic analysis (parsing): tokens → abstract syntax trees Ahoet al. (the "Dragon Book") • semantic analysis (typechecking): annotate ASTs Goals: Midterm • learn principles & practice of language implementation • brings together theory & pragmatics of previous courses Compiler back-ends: • understand compile-time vs. run-time processing • intermediate code generation: ASTs → intermediate code • study interactions among: • target code generation: intermediate code → target code • language features • implementation efficiency • run-time storage layout • compiler complexity • target instruction selection • architectural features • register allocation • gain more experience with object-oriented design & Java • optimizations • gain more experience working on a team Final Prerequisites: 322, 326, 341, 378 Sign up on course mailing list! Craig Chambers 1 CSE 401 Craig Chambers 2 CSE 401 Project Grading Start with compiler for MiniJava, written in Java Project: 40% total Homework: 20% total Add: Midterm: 15% • comments Final: 25% • floating-point values • arrays • static (class) variables Homework & projects due at the start of class • for loops • break statements 3 free late days, per person, for the whole quarter • and more • thereafter, 25% off per calendar day late Completed in stages over the quarter Strongly encourage working in a 2-person team on project • but only if joint work, not divided work Grading based on: • correctness • clarity of design & implementation • quality of test cases Craig Chambers 3 CSE 401 Craig Chambers 4 CSE 401 An example compilation First step: lexical analysis Sample (extended) MiniJava program: Factorial.java “Scanning”, “tokenizing” // Computes 10! and prints it out Read in characters, clump into tokens class Factorial { • strip out whitespace & comments in the process public static void main(String[] a) { System.out.println( new Fac().ComputeFac(10)); } } class Fac { // the recursive helper function public int ComputeFac(int num) { int numAux; if (num < 1) numAux = 1; else numAux = num * this.ComputeFac(num-1); return numAux; } } Craig Chambers 5 CSE 401 Craig Chambers 6 CSE 401 Specifying tokens: regular expressions Second step: syntactic analysis Example: “Parsing” Ident ::= Letter AlphaNum* Read in tokens, turn into a tree based on syntactic structure Integer ::= Digit+ • report any errors in syntax AlphaNum::= Letter | Digit Letter ::= 'a' | ... | 'z' | 'A' | ... | 'Z' Digit ::= '0' | ... | '9' Craig Chambers 7 CSE 401 Craig Chambers 8 CSE 401 Specifying syntax: context-free grammars Third step: semantic analysis EBNF is a popular notation for CFG’s “Name resolution and typechecking” Example: Given AST: Stmt ::= if ( Expr ) Stmt [else Stmt] • figure out what declaration each name refers to | while ( Expr ) Stmt • perform typechecking and other static consistency checks | ID = Expr; | ... Key data structure: symbol table Expr ::= Expr + Expr | Expr < Expr | ... • maps names to info about name derived from declaration | ! Expr • tree of symbol tables corresponding to nesting of scopes | Expr . ID ( [Expr {, Expr}] ) | ID Semantic analysis steps: | Integer | ... 1. Process each scope, top down | (Expr) | ... 2. Process declarations in each scope into symbol table for scope EBNF specifies concrete syntax of language 3. Process body of each scope in context of symbol table Parser usually constructs tree representing abstract syntax of language Craig Chambers 9 CSE 401 Craig Chambers 10 CSE 401 Fourth step: intermediate code generation Example Given annotated AST & symbol tables, int Fac.ComputeFac(*? this, int num) { translate into lower-level intermediate code int T1, numAux, T8, T3, T7, T2, T6, T0; T0 := 1; Intermediate code is a separate language T1 := num < T0; • Source-language independent ifnonzero T1 goto L0; • Target-machine independent T2 := 1; T3 := num - T2; T6 := Fac.ComputeFac(this, T3); Intermediate code is simple and regular T7 := num * T6; ⇒ good representation for doing optimizations numAux := T7; goto L2; Mightbeareasonabletargetlanguageitself,e.g.Javabytecode label L0; T8 := 1; numAux := T8; label L2; return numAux; } Craig Chambers 11 CSE 401 Craig Chambers 12 CSE 401 Fifth step: target (machine) code generation Summary of compiler phases Analysis Synthesis Translate intermediate code into target code of input program of output program (front-end) (back-end) Need to do: character • instruction selection: choose target instructions for stream intermediate (subsequences of) intermediate code instructions form • register allocation: allocate intermediate code variables to Lexical Analysis machine registers, spilling excess to stack token Optimization • compute layout of each procedure’s stack frame & stream other run-time data structures Syntactic Analysis • emit target code intermediate abstract form syntax tree Code Generation Semantic Analysis target language annotated AST Intermediate Code Generation Ideal: many front-ends, many back-ends sharing one intermediate language Craig Chambers 13 CSE 401 Craig Chambers 14 CSE 401 Other language processing tools Engineering issues Compilers translate the input language into Compilers are hard to design so that they are a different, usually lower-level, target language • fast • highly optimizing Interpreters directly execute the input language • extensible & evolvable • same front-end structure as a compiler • correct • then evaluate the annotated AST, or translate to intermediate code and evaluate that Some parts of compilers can be automatically generated from specifications, e.g., scanners, parsers, & target code Software engineering tools can resemble compilers generators • same front-end structure as a compiler • generated parts are fast & correct • then: • specifications are easily evolvable • pretty-print/reformat/colorize (Some of my current research is on generating fast, correct • analyze to compute relationships like declarations/uses, optimizations from specifications.) calls/callees, etc. • analyze to find potential bugs Need good management of software complexity • aid in refactoring/restructuring/evolving programs Craig Chambers 15 CSE 401 Craig Chambers 16 CSE 401
no reviews yet
Please Login to review.