首页
编程语言
数据库
网络开发
Algorithm算法
移动开发
系统相关
金融统计
人工智能
其他
首页
>
> 详细
program讲解 、辅导 C/C++设计程序
Assignment 1: Micro Language Compiler
1 Introduction
In this assignment, you are required to design and implement a compiler frontend for Micro
language which transforms the Micro Program into corresponding LLVM Intermediate Representation (IR) and finally translated to RISC-V assembly code and executable with the help of
LLVM optimizer and its RISC-V backend. After that, we can execute the compiled program on our
RISC-V docker container to verify the correctness of your compiler.
Since it is a senior major elective course, we don’t want to set any limitation for you. You are strongly
recommended to use Lex/Flex and Yacc/Bison taught in tutorial 3 to design your compiler frontend,
but it is not forcible. You can choose Any Programming Language you like for this assignment,
but the RISC-V container we use only has C/C++ toolchain installed and you need to provide me a
Dockerfile to run your compiler and execute the RISC-V program, which may need some extra effort.
Some languages also provide tools like Lex and Yacc, and you are free to use them. It is also OK if
you want to design the scanner and parser by hand instead of using tools.
2 Micro Language
Before we move on to the compiler design, it is necessary to have an introduction to Micro Language,
that serves as the input of our compiler. Actually, it is a very simple language, with limited number
of tokens and production rules of context-free grammar (CFG):
• Only integers(i32); No float numbers
• No Declarations
• Variable consists of a-z, A-Z, 0-9, at most 32 characters long, must start with character and are
initialized as 0
• Comments begin with ”−−” and end with end-of-line(EOL)
• Three kind of statements:
– assignments, e.g. a:=b+c
– read(list of IDs), e.g. read(a, b)
– write(list of Expressions), e.g. write (a, b, a+b)
• BEGIN, END, READ, WRITE are reserved words
• Tokens may not extend to the following line
1
2.1 Tokens & Regular Expression
Micro Language has 14 Tokens, and the regular expression for each token is listed below. Since BEGIN
is a reserved word in C/C++, we need to use the alternative BEGIN as the token class.
1. BEGIN : begin
2. END: end
3. READ: read
4. WRITE: write
5. LPAREN: (
6. RPAREN: )
7. SEMICOLON: ;
8. COMMA: ,
9. ASSIGNOP: :=
10. PLUSOP: +
11. MINUSOP: −
12. ID: [a−zA−Z][a−zA−Z0−9 ]{0,31}
13. INTLITERAL: −?[0−9]+
14. SCANEOF: <
>
2.2 Context Free Grammar
Here is the extended context-free grammar (CFG) of Micro Language:
1.
→
SCANEOF
2.
→ BEGIN
END
3.
→
{
}
4.
→ ID ASSIGNOP
;
5.
→ READ LPAREN
RPAREN;
6.
→ WRITE LPAREN
RPAREN;
7.
→ ID {COMMA ID}
8.
→
{COMMA
}
9.
→
{
}
10.
→ LPAREN
RPAREN
11.
→ ID
12.
→ INTLITERAL
13.
→ PLUSOP
14.
→ MINUSOP
Note: {} means the content inside can appear 0, 1 or multiple times.
2.3 How to Run Micro Compiler
Here is a very simple Micro program that we are going to use as the sample program throughout this
instruction. SCANEOF is the end of line and we do not need to explicitly write it in the program.
−− Expected Output: 30
begin
A := 10;
B := A + 20;
write (B);
end
We can use our compiler to compile, optimize and execute this program to get expected output 30.
Note: The exact command to run your compiler is up to you. Just specify out in your report how
to compile and execute your compiler to get the expected output.
118010200@c2d52c9b1339:˜/A1$ ./compiler ./testcases/test0.m
118010200@c2d52c9b1339:˜/A1$ llc −march=riscv64 ./program.ll −o ./program.s
118010200@c2d52c9b1339:˜/A1$ riscv64−unknown−linux−gnu−gcc ./program.s −o ./program
118010200@c2d52c9b1339:˜/A1$ qemu−riscv64 −L /opt/riscv/sysroot ./program
30
2
3 Compiler Design
Figure 1 shows the overall structure of a compiler. In this assignment, your task is to implement the
frontend only, which contains scanner, parser and intermediate code generator and form a whole →
compiler with LLVM for Micro language.
Figure 1: Compiler Structure
3.1 Scanner
Scanner takes input character stream and extracts out a series of tokens, and your scanner should
be able to print out both token class and lexeme for each token. Figure ?? shows an example of the
sample Micro program.
118010200@c2d52c9b1339:˜/A1$ ./compiler ./testcases/test0.m −−scan−only
BEGIN begin
ID A
ASSIGNOP :=
INTLITERAL 10
SEMICOLON ;
ID B
ASSIGNOP :=
ID A
PLUOP +
INTLITERAL 20
SEMICOLON ;
WRITE write
LPAREN (
ID B
RPAREN )
SEMICOLON ;
END end
SCANEOF
3
3.2 Parser
Parser receives the tokens extracted from scanner, and generates a parse tree (or concrete syntax tree),
and futhermore, the abstract syntax tree (AST) based on the CFG. Your compiler should be able to
print out both the parse tree and the abstract syntax tree, and visualize them with graphviz.
3.2.1 Parse Tree (Concrete Syntax Tree)
Figure 2 shows an example of the concrete syntax tree generated from the sample program:
Figure 2: Concrete Syntax Tree of Sample Program
3.2.2 Abstract Syntax Tree
Figure 3 shows an example of the abstract syntax tree generated from the sample program:
Figure 3: Abstract Syntax Tree of Sample Program
4
3.3 Intermediate Code Generator
For all the assignments in this course, the intermediate representation (IR) of the compiler should be
the LLVM IR. There are mainly two reasons why LLVM IR is chosen. One is that LLVM IR can take
advantage of the powerful LLVM optimizer and make up for the missing backend part of compiler in
this course. The other is that LLVM IR can be easier translated into assembly code for any target
machine, no matter MIPS, x86 64, ARM, or RISC-V. This functionality can make our compiler more
compatible to machines with different architecture. The following shows the LLVM IR generated for
the sample Micro program:
; Declare printf
declare i32 @printf (i8 ∗, ...)
; Declare scanf
declare i32 @scanf(i8 ∗, ...)
define i32 @main() {
% ptr0 = alloca i32
store i32 10, i32∗ % ptr0
%A = load i32, i32∗ % ptr0
% 1 = add i32 %A, 20
store i32 % 1, i32∗ % ptr0
%B = load i32, i32∗ % ptr0
% scanf format0 = alloca [4 x i8 ]
store [4 x i8 ] c”%d\0A\00”, [4 x i8]∗ % scanf format0
% scanf str0 = getelementptr [4 x i8 ], [4 x i8]∗ % scanf format0, i32 0, i32 0
call i32 (i8 ∗, ...) @printf (i8∗ % scanf str0 , i32 %B)
ret i32 0
}
3.4 Bonus (Extra Credits 10%)
If you are interested and want to make your compiler better, you may try the following options:
• Add robust syntax error report
• Add a symbol table to make your compiler more complete
• Generate LLVM IR with LLVM C/C++ API instead of simply generating the string
• Optimize the LLVM IR generation plan for more efficient IR
• Any other you can think about...
4 Submission and Grading
4.1 Grading Scheme
• Scanner: 20%
• Parser: 40% (20% for parse tree generator and 20% for AST generation)
• Intermediate Code Generator: 30%
We have prepared 10 test cases, and the points for each section will be graded according to the
number of testcases you passed.
• Technical Report: 10%
If your report properly covers the three required aspects and the format is clean, you will get 10
points.
5
• Bonus: 10%
Refer to section 3.4 for more details. The grading of this part will be very flexible and highly
depend on the TA’s own judgement. Please specify clearly what you have done for the bonus
part so that he do not miss anything.
4.2 Submission with Source Code
If you want to submit source C/C++ program that is executable in our RISC-V docker container,
your submission should look like:
csc4180−a1−118010200.zip
|−
|−−− csc4180−a1−118010200−report.pdf
|−
|−−− testcases
|−
|−−− src
|−
|−−−Makefile
|−−−ir generator.cpp
|−−−ir generator.hpp
|−−−node.cpp
|−−−node.hpp
|−−−scanner.l
|−−−parser.y
|−−−Other possible files
4.3 Submission with Docker
If you want to submit your program in a docker, your submission should look like:
csc4180−a1−118010200.zip
|−
|−−− csc4180−a1−118010200.Dockerfile
|−
|−−− csc4180−a1−118010200−report.pdf
|−
|−−− src
|−
|−−−Makefile
|−
|−−−run compiler.sh
|−
|−−−Your Code Files
4.4 Technical Report
Please answer the following questions in your report:
• How to execute your compiler to get expected output?
• How do you design the Scanner?
• How do you design the Parser?
• How do you design the Intermediate Code Generator?
• Some other things you have done in this assignment?
The report doesn’t need to be very long, but the format should be clean. As long as the three questions
are clearly answered and the format is OK, your report will get full mark.
6
联系我们
QQ:99515681
邮箱:99515681@qq.com
工作时间:8:00-21:00
微信:codinghelp
热点文章
更多
program讲解 、辅导 c++,java...
2024-05-15
program讲解 、辅导 c/c++,ja...
2024-05-15
辅导 cpt206、java程序设计讲解...
2024-05-15
讲解 infs 2042、java程序设计...
2024-05-15
辅导 csci-ga、讲解 java编程语...
2024-05-15
program辅导 、讲解 matlab语言...
2024-05-15
讲解 econ20120 homework 1讲解...
2024-05-15
辅导 bpln0025: business case...
2024-05-15
讲解 600543 current issues i...
2024-05-15
辅导 lh physical iiib / 03 3...
2024-05-15
讲解 ecs776p image processin...
2024-05-15
讲解 bit233: network design ...
2024-05-15
辅导 eecs 111: system softwa...
2024-05-15
辅导 ematm0067 text analytic...
2024-05-15
讲解 projectile motion assig...
2024-05-15
辅导 finance 251: financial ...
2024-05-15
讲解 acc202 literature revie...
2024-05-15
讲解 econ20120 homework 2讲解...
2024-05-15
讲解 instruction for ca3 gro...
2024-05-15
讲解 comp814 text mining讲解...
2024-05-15
热点标签
engi4547
econ1048
fit1008
7033mkt
ec2066
cct380h5f
man00019m
fit9137
n1542
csc4140
math6119
comp1710
mech265001
fin2020
eengm2510
fina864
csys5020
busi4412
math5007
2702ict
econ2101
econ0051
dts204tc
fit2004
fit3152
cosc2673
mec208
ecmt2150
comp2003j
econ20120
cpt304
bff3121–
comu7000
stat6118
comp814
acc202
ematm0067
bit233
ecs776p
600543
bpln0025
comp3400
econ7030
159.342 ‐ operating
mang6134
math1005/math6005
geog5404m
comp1710/6780
infs 2042
inf6028
bman30702
math0002
msci242l
mgt11001
com00177m
bman71282
fit2001
cpt210
159.341
econ7310
comp3221
comp10002
cpt206
ecmt1010
finm081
econ2005
cpt202
fit3094
socs0030
data7201
data2x01
mn-3507
mat246h1
ib2d90
ib3j80
acc207
comp90007
compx518-24a
fit1050
info1111
acct2201
buad801
compsci369
cse 332s
info1110
math1033
scie1000
eeee2057
math4063
cmt219
econ5074
eng5009
csse2310/csse7231
ec333
econ0001
cpt204
elec4630
ma117
dts104tc
comp2017
640481
csit128
eco000109m
finc5090
ggr202h5f
nbs8295
4ssmn902
chc6171
dsa1002
ebu6304
comp1021
csci-ua.202
com6511
ma416
mec206
iom209
bism7202
idepg001
comp1212
ecom209
cpt106
math1062
mn-3526
fnce3000
fmhu5002
psyc10003
fina2222
be631-6-sp/1
finc2011
37989
5aaob204
citx1401
econ0028
bsan3204
comp9123
cmt218
itp122
qbus6820
ecmt1020
bus0117
soft3202/comp9202
basc0057
mecm30013
aem4060
acb1120
comp2123
econ2151
ecmt6006
inmr77
com 5140
ocmp5328
comp1039
had7002h
cmt309
asb-3715
elec373
cpt204-2324
be631-6-sp
econ3016
mast10007
buss6002
comp4403
comp30023
finm1416
csc-30002
6qqmn971
fin668
mnfg309
inft2031
cits1402
comp2011
eecs 3221
ebu4201
ct60a9600
com336
8pro102
econ7300
comp8410
comp3425
comp222
finm8007
comp2006
comp26020
comp1721
eeen3007j
cis432
csci251
comp5125m
com398sust
finm7405
econ7021
fin600
infs4205/7205
mktg2510-
32022
mth6158
comp328
finn41615
2024
mec302
联系我们
- QQ: 99515681 微信:codinghelp
© 2024
www.7daixie.com
站长地图
程序辅导网!