Introduction
In this assignment, you will write a command line utility to translate MIPS machine code between binary and human-readable mnemonic form. The goal of this homework is to familiarize yourself with C programming, with a focus on input/output, strings in C, and the use of pointers.
You MUST write your helper functions in a file separate from main.c. The main.c file MUST ONLY contain #includes, local #defines and the main function. This is the only requirement for project structure. Beyond this, you may have as many or as few additional .c files in the src directory as you wish. Also, you may declare as many or as few headers as you wish. In this document, we use hw1.c as our example file containing helper functions.
Getting Started
Fetch base code for hw1 as described in hw0.
Both repos will probably have a file named .gitlab-ci.yml with different contents. Simply merging these files will cause a merge conflict. To avoid this, we will merge the repos using a flag so that the .gitlab-ci.yml found in the hw1 repo will be the file that is preserved. To merge, use this command:
git merge -m "Merging HW1_CODE" HW1_CODE/master --strategy-option theirs
Note: All commands from here on are assumed to be run from the hw1 directory.
A Note about Program Output
What a program does and does not print is VERY important. In the UNIX world stringing together programs with piping and scripting is commonplace. Although combining programs in this way is extremely powerful, it means that each program must not print extraneous output. For example, you would expect ls to output a list of files in a directory and nothing else. Similarly, your program must follow the specifications for normal operation. One part of our grading of this assignment will be to check whether your program produces EXACTLY the specified output. If your program produces output that deviates from the specifications, even in a minor way, or if it produces extraneous output that was not part of the specifications, it will adversely impact your grade in a significant way, so pay close attention.
Use the debug macro debug (described in the 320 reference document in the Piazza resources section) for any other program output or messages you may need while coding (e.g. debugging output).
Part 1: Program Operation and Argument Validation
In this part, you will write a function to validate the arguments passed to your program via the command line. Your program will support the following flags:
- If no flags are provided, you will display the usage and return with an EXIT_FAILURE return code
- If the -h flag is provided, you will display the usage for the program and exit with an EXIT_SUCCESS return code
- If the -a flag is provided, you will perform text-to-binary conversion (i.e. “assembly”), reading text from stdin and writing binary to stdout.
- If the -d flag is provided, you will perform binary-to-text conversion (i.e. “disassembly”), reading binary from stdin and writing text to stdout.
- The -a and -d flags are not allowed to be used in combination with each other
- EXIT_SUCCESS and EXIT_FAILURE are macros defined in <stdlib.h> which represent success and failure return codes respectively.
- stdin, stdout, and stderr are special files that are opened upon execution for all programs and do not need to be reopened.
Some of these operations will also need other command line arguments which are described in each part of the assignment. The two usages for this program are:
usage: ./hw1 -h [any other number or type of arguments]
usage: bin/hw1 [-h] -a|-d [-b BASEADDR] [-e ENDIANNESS]
    -a       Assemble: convert mnemonics to binary code
    -d       Disassemble: convert binary code to mnemonics
       Additional parameters: [-b BASEADDR] [-e ENDIANNESS]
          -b            BASEADDR is the starting memory address for the code
                        It must be a hexadecimal number of 8 digits or less
          -e            ENDIANNESS specifies the byte order of the binary code
                        It must be a single character: b for big-endian, or l for little-endian
    -h       Display this help menu.
A valid invocation of the program implies that the following hold about the command-line arguments:
- All positional arguments (-a|-d) come before any optional arguments (-b and -e). The optional arguments may come in any order after the positional ones.
- If the -h flag is provided, it is the first positional argument after the program executable.
- If an option requires a parameter, the corresponding parameter must be provided (e.g. -e must always be followed by an ENDIANNESS specification).
- If -b is given, the BASEADDR argument will be given as a hexadecimal number in which, in addition to the digits (‘0’-‘9’), either upper-case letters (‘A’-‘F’) or lower-case letters (‘a’-‘f’) may be used, in any combination.
- If -e is given, then the ENDIANNESS argument will be a single word (i.e. will have no whitespace).
- You may only use argc and argv for argument parsing and validation. Using any libraries that parse command line arguments (e.g. getopt) is prohibited.
- Any libraries that help you parse strings are prohibited as well (string.h, ctype.h, etc). This is intentional and will help you practice parsing strings and manipulate pointers.
- You MAY NOT use dynamic memory allocation in this assignment (i.e. malloc, realloc, calloc, mmap, etc)
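Since string.h is off limits, flag matching has to be done by walking the argument strings directly. The following is a minimal sketch of how that might look; the helper names str_equal and is_flag are hypothetical and are not part of the base code.

```c
/* Hypothetical helper: returns 1 if the two NUL-terminated strings are
 * identical and 0 otherwise. Uses only pointer stepping, no <string.h>. */
int str_equal(const char *a, const char *b) {
    while (*a != '\0' && *a == *b) {
        a++;
        b++;
    }
    /* The strings are equal only if both ended at the same position. */
    return *a == *b;
}

/* Hypothetical helper: returns 1 if arg is exactly '-' followed by the
 * given flag character and nothing else (e.g. "-a" for flag 'a'). */
int is_flag(const char *arg, char flag) {
    return arg[0] == '-' && arg[1] == flag && arg[2] == '\0';
}
```

With helpers like these, a check such as is_flag(argv[1], 'h') could decide whether the first argument is -h without calling any library routine.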
For example, the following are a subset of the possible valid argument combinations:
- $ bin/hw1 -h …
- $ bin/hw1 -a
- $ bin/hw1 -a -e b
- $ bin/hw1 -d -b D000d000 -e l
Some examples of invalid orderings would be:
- $ bin/hw1 -e b -d
- $ bin/hw1 -b D000d000 -a -e b
- The … means that all arguments, if any, are to be ignored; e.g. the usage bin/hw1 -h -a -b D00D000 -e b is equivalent to bin/hw1 -h
NOTE: The makefile compiles the hw1 executable into the bin folder. Assume all commands in this doc are run from the hw1 directory of your repo.
Required Validate Arguments Function
In const.h, you will find the following function prototype (function declaration) already declared for you. You MUST implement this function as part of the assignment.
/**
 * @brief Validates command line arguments passed to the program.
 * @details This function will validate all the arguments passed to the
 * program, returning 1 if validation succeeds and 0 if validation fails.
 * Upon successful return, the selected program options will be set in the
 * global variable "global_options", where they will be accessible
 * elsewhere in the program.
 *
 * @param argc The number of arguments passed to the program from the CLI.
 * @param argv The argument strings passed to the program from the CLI.
 * @return 1 if validation succeeds and 0 if validation fails.
 * Refer to the homework document for the effects of this function on
 * global variables.
 * @modifies global variable "global_options" to contain a bitmap representing
 * the selected options.
 */
int validargs(int argc, char **argv);
This function must be implemented as specified, as it will be tested and graded independently. It must always return to its caller; the USAGE macro should never be called from validargs.
The validargs function should return 0 if there is any form of failure. This includes, but is not limited to:
- Invalid number of arguments (too few or too many)
- Invalid ordering of arguments
- A missing parameter to an option that requires one (e.g. -e with no ENDIANNESS specification).
- Invalid base address (if one is specified). A base address is invalid if it contains characters other than the digits (‘0’-‘9’), upper-case letters (‘A’-‘F’), and lower-case letters (‘a’-‘f’), if it is more than 8 digits in length, or if it is not a multiple of 4096 (i.e. the twelve least-significant bits of its value are not all zero).
- Invalid endianness (if one is specified). An endianness is invalid if either it does not consist of a single character or that single character is not either ‘b’ or ‘l’.
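The base address and endianness rules above can be checked by hand, without ctype.h or string.h. The sketch below is one possible approach, not the required one. The names parse_base_addr and valid_endianness are hypothetical, and treating an empty string as invalid is an assumption, since the rules above do not mention that case. It also assumes unsigned int is 32 bits wide.

```c
/* Hypothetical helper: parses a BASEADDR string into *out.
 * Returns 1 on success. Returns 0 if the string contains a character that
 * is not a hexadecimal digit, has more than 8 digits, or does not denote
 * a multiple of 4096. */
int parse_base_addr(const char *s, unsigned int *out) {
    unsigned int value = 0;
    int digits = 0;

    for (; *s != '\0'; s++) {
        unsigned int d;
        if (*s >= '0' && *s <= '9')
            d = (unsigned int)(*s - '0');
        else if (*s >= 'a' && *s <= 'f')
            d = (unsigned int)(*s - 'a') + 10;
        else if (*s >= 'A' && *s <= 'F')
            d = (unsigned int)(*s - 'A') + 10;
        else
            return 0;               /* not a hexadecimal digit */
        if (++digits > 8)
            return 0;               /* more than 8 digits */
        value = (value << 4) | d;
    }
    if (digits == 0)
        return 0;                   /* assumption: empty string is invalid */
    if ((value & 0xFFFu) != 0)
        return 0;                   /* low 12 bits must all be zero */
    *out = value;
    return 1;
}

/* Hypothetical helper: returns 1 for exactly "b" or "l", 0 otherwise. */
int valid_endianness(const char *s) {
    return (s[0] == 'b' || s[0] == 'l') && s[1] == '\0';
}
```

For instance, parse_base_addr accepts BaB000 (value 0xBAB000) but rejects 1234, because 0x1234 is not a multiple of 4096.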
The global_options variable of type unsigned int is used to record the mode of operation (i.e. assemble/disassemble) of the program, as well as any selected flags and base address. This is done as follows:
- If the -h flag is specified, the least significant bit is 1
- The second least significant bit is 0 if -a is passed (i.e. the user wants assembly mode) and 1 if -d is passed (i.e. the user wants disassembly mode)
- The third least significant bit is 1 if -e b is passed (i.e. the user wants big-endian byte ordering) and 0 otherwise.
- If the -b option was specified, then the base address is given by taking the value of global_options and clearing the 12 least significant bits. If the -b option was not specified, then the 20 most significant bits of global_options should all be 0 (i.e. the default base address is 0).
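The bit layout described above can be captured with a few masks. The following sketch shows one way to build and read such a value; the macro and function names are hypothetical (const.h may or may not define something similar), and unsigned int is assumed to be 32 bits wide.

```c
/* Hypothetical names for the bits described above. */
#define OPT_HELP        0x1u          /* bit 0: -h was given */
#define OPT_DISASSEMBLE 0x2u          /* bit 1: 0 means -a, 1 means -d */
#define OPT_BIG_ENDIAN  0x4u          /* bit 2: -e b was given */
#define BASE_ADDR_MASK  0xFFFFF000u   /* bits 31:12 hold the base address */

/* Combines the individual settings into one options word. base_addr is
 * expected to be a multiple of 4096; its low 12 bits are masked off. */
unsigned int pack_options(int help, int disassemble, int big_endian,
                          unsigned int base_addr) {
    unsigned int options = base_addr & BASE_ADDR_MASK;
    if (help)
        options |= OPT_HELP;
    if (disassemble)
        options |= OPT_DISASSEMBLE;
    if (big_endian)
        options |= OPT_BIG_ENDIAN;
    return options;
}

/* Recovers the base address by clearing the 12 least significant bits. */
unsigned int base_addr_of(unsigned int options) {
    return options & BASE_ADDR_MASK;
}
```

Because a valid base address always has its low 12 bits clear, the flag bits and the address can share one word without interfering with each other.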
If validargs returns 0 indicating failure, your program must print USAGE(program_name, return_code) and return EXIT_FAILURE. Once again, validargs must always return, and therefore it must not call the USAGE(program_name, return_code) macro itself. That should be done in main.
If validargs sets the least significant bit of global_options to 1 (i.e. the -h flag was passed), your program must print USAGE(program_name, return_code) and return EXIT_SUCCESS.
- The USAGE(program_name, return_code) macro is already defined for you in const.h.
If validargs returns 1 and the least significant bit of global_options is 0, your program must perform assembly or disassembly accordingly and return EXIT_SUCCESS upon successful completion, or EXIT_FAILURE in case of an error.
If -b is provided, you must check to confirm that the specified base address is valid.
If -e is provided, you must check that the specified endianness is either the single character b or the single character l.
- Remember EXIT_SUCCESS and EXIT_FAILURE are defined in <stdlib.h>. Also note, EXIT_SUCCESS is 0 and EXIT_FAILURE is 1.
- We suggest that you create functions for each of the operations defined in this document. Writing modular code will help you isolate and fix problems.
Sample validargs Execution
The following are examples of global_options settings for given inputs. Each input is a bash command that can be used to run the program. In the examples, all don’t care bits (bits 3-11, where the least significant bit is numbered 0 and the most significant bit is numbered 31) have been set to 0.
- Input: bin/hw1 -h. Setting: 0x1 (help bit is set. All other bits are don’t cares.)
- Input: bin/hw1 -d. Setting: 0x2 (disassemble bit is set).
- Input: bin/hw1 -d -e b. Setting: 0x6 (disassemble and big endian bits are set).
- Input: bin/hw1 -d -e b -b BaB000. Setting: 0xBAB006 (disassemble and big endian bits are set, base address is 0xBAB000).
- Input: bin/hw1 -e b -d -b BaB000. Setting: 0x0. This is an error case because the argument ordering is invalid (-e is before -d). In this case validargs returns 0, leaving global_options unset.
Presumably you learned something about the MIPS processor and its instruction set in CSE 220. If you need to, review the materials used for that course. You might also find useful information via this link or this one. Below we summarize the information about the MIPS instruction format that will be needed to do the assignment.
Each MIPS instruction consists of one 32-bit word. We will number the bits from 0 (least significant bit) to 31 (most significant bit) and we will think of bit 31 as being “leftmost”. To indicate a particular bit field from the instruction word we will use a notation like 31:26, which indicates bits 31 down to 26; that is, the 6 “leftmost”, or most significant bits.
In every MIPS instruction, bit field 31:26 is used as a 6-bit opcode. Most instructions are directly identified by one of the 64 possible values of this field, but as we will see there are some special cases. There are three types of MIPS instructions: R, I, and J. Instructions of type R take up to three registers as arguments. Instructions of type I take up to two registers and a 16-bit immediate value (obtained from the 16 least significant bits of the instruction word). Instructions of type J take a jump target from the 26 least significant bits of the instruction word. The MIPS processor has 32 registers, which means that it takes 5 bits to specify a register. The registers are specified by the contents of bit fields 25:21 (called RS), 20:16 (called RT), and 15:11 (called RD), or, in some cases, bit field 10:6.
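Pulling a bit field such as 25:21 out of an instruction word takes one shift and one mask. Here is a small sketch of that idea; the function names are hypothetical and are not taken from instruction.h.

```c
#include <stdint.h>

/* Hypothetical helper: extracts bit field hi:lo (inclusive) from a word,
 * using the same numbering as the text (bit 0 is least significant). */
uint32_t bits(uint32_t word, int hi, int lo) {
    int width = hi - lo + 1;
    uint32_t mask = (width >= 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return (word >> lo) & mask;
}

/* Convenience wrappers for the fields named in the text. */
uint32_t field_opcode(uint32_t w) { return bits(w, 31, 26); }
uint32_t field_rs(uint32_t w)     { return bits(w, 25, 21); }
uint32_t field_rt(uint32_t w)     { return bits(w, 20, 16); }
uint32_t field_rd(uint32_t w)     { return bits(w, 15, 11); }
uint32_t field_funct(uint32_t w)  { return bits(w, 5, 0); }
```

For the word 0x00c72820, these helpers give an opcode field of 0, RS = 6, RT = 7, RD = 5, and a value of 32 in bits 5:0.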
In the files instruction.h and instr_table.c you have been provided with a set of tables that can be used to decode MIPS binary instruction words. Rather than going through full details of the MIPS instruction format, we will just go through the procedure for decoding an instruction using the tables. The type Opcode is an enumerated type that assigns to integer values in the range 0 to 63 the names of MIPS instructions, and in addition defines names for three additional values SPECIAL (64), BCOND (65), and ILLEGL (66). Opcode values in the range 0 to 63 serve as indices into the instruction table instrTable. Each entry in this table uniquely identifies a particular type of MIPS instruction and provides further information about it. Our first objective in decoding an instruction is to determine the proper Opcode value (in the discussion below we refer to this as “the Opcode”), thereby obtaining access to the proper entry from the instruction table.
The starting point for obtaining the Opcode is the value in bits 31:26 of the instruction word. This value is used as an index into opcodeTable and the value (of type Opcode) at that index in the table is retrieved.
- If the value obtained from opcodeTable is neither SPECIAL nor BCOND, then it is the Opcode.
- If the value obtained from opcodeTable is SPECIAL (this occurs when the value of bits 31:26 is 000000), then the value in bits 5:0 of the instruction word is used as an index into the table specialTable to obtain the Opcode.
- If the value obtained from opcodeTable is BCOND, then the value in bits 20:16 is examined. If the value is 00000, 00001, 10000, or 10001, then the Opcode is OP_BLTZ, OP_BGEZ, OP_BLTZAL, or OP_BGEZAL, respectively, otherwise it is an error.
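The three cases above translate into a short lookup routine. The sketch below is only an illustration. The Opcode enum here is a cut-down stand-in with placeholder values for the OP_ names (the real definitions are in instruction.h), the tables are passed in as parameters so that the block stands alone, and returning ILLEGL to signal the BCOND error case is an assumption.

```c
#include <stdint.h>

/* Stand-in for the provided Opcode type. Only a few instruction names are
 * listed and their numeric values are placeholders; only SPECIAL, BCOND
 * and ILLEGL use the values given in the text. */
typedef enum {
    OP_ADD, OP_BEQ, OP_J, OP_LW,
    OP_BLTZ, OP_BGEZ, OP_BLTZAL, OP_BGEZAL,
    SPECIAL = 64, BCOND = 65, ILLEGL = 66
} Opcode;

/* Hypothetical helper: determines the Opcode of an instruction word.
 * opcode_table and special_table play the roles of opcodeTable and
 * specialTable and must each have 64 entries. */
Opcode find_opcode(uint32_t word, const Opcode *opcode_table,
                   const Opcode *special_table) {
    Opcode op = opcode_table[(word >> 26) & 0x3Fu];   /* bits 31:26 */

    if (op == SPECIAL)
        return special_table[word & 0x3Fu];           /* bits 5:0 */

    if (op == BCOND) {
        switch ((word >> 16) & 0x1Fu) {               /* bits 20:16 */
        case 0x00: return OP_BLTZ;                    /* 00000 */
        case 0x01: return OP_BGEZ;                    /* 00001 */
        case 0x10: return OP_BLTZAL;                  /* 10000 */
        case 0x11: return OP_BGEZAL;                  /* 10001 */
        default:   return ILLEGL;                     /* assumption: error marker */
        }
    }
    return op;
}
```

In your own code you would presumably index the provided tables directly rather than pass them as arguments; the parameters are used here only to keep the example self-contained.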
Having determined the Opcode, it is then used as an index into instrTable and the corresponding Instr_info structure is retrieved. What happens next depends on the value of the type field. This value can be NTYP (which occurs in a few entries of the table that do not correspond to actual instructions), RTYP, which indicates an instruction of type R, ITYP, which indicates an instruction of type I, and JTYP, which indicates an instruction of type J.
The next task is to determine the sources of the instruction arguments. For this, the information in the srcs field of the Instr_info structure is used. This field consists of an array of three values of type Source. The first entry in this array specifies the source of the first instruction argument, the second entry specifies the source of the second instruction argument, and the third entry specifies the source of the third argument. There are five possible source values: RS, RT, RD, EXTRA, and NSRC. The value RS indicates that the argument source is the register specified by the RS field of the instruction word. Similarly, the values RT and RD indicate that the argument source is the register specified by the RT or RD field of the instruction word, respectively. The value EXTRA indicates that the argument value has to be decoded from the instruction word in a way that depends on the particular type of instruction. The value NSRC is used as a place-holder value for instructions that take fewer than three arguments.
For arguments with source EXTRA, the actual argument is determined as follows:
- If the Opcode is OP_BREAK, then the argument consists of the 20-bit value in bits 25:6 of the instruction word.
- For instructions of type R, the argument consists of the 5-bit value in bits 10:6 of the instruction word.
- For instructions of type I, the argument is obtained by extracting the 16-bit value in bits 15:0, treating bit 15 as a sign bit, and performing sign-extension to a 32-bit signed integer. For non-branch instructions of type I (such as ADDI), this 32-bit signed integer is the immediate argument to the instruction.
For the conditional branch instructions BEQ, BGEZ, BGEZAL, BGTZ, BLEZ, BLTZ, BLTZAL, BNE, the 32-bit signed integer value is further processed by shifting it left by two bits (which amounts to multiplication by 4) and then treating it as a PC-relative branch offset. It is added to the current value of the PC register (this will be the memory address at which the instruction “lives”, plus 4) to obtain an absolute address which is the branch target.
- For instructions of type J, the argument is obtained by extracting the 26-bit value in bits 25:0 of the instruction word and treating it as an unsigned integer. This value is shifted left by two bits and then added to the value obtained from the PC by zeroing the 28 least significant bits, to obtain an absolute address that is the jump target. (As above, the PC value is given by the memory address of the instruction, plus 4.)
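The EXTRA decoding rules above come down to a few shifts, masks, and one sign extension. The sketch below shows one way to write them; all function names are hypothetical, and addr stands for the memory address at which the instruction lives.

```c
#include <stdint.h>

/* BREAK: the 20-bit value in bits 25:6. */
uint32_t extra_break(uint32_t word) {
    return (word >> 6) & 0xFFFFFu;
}

/* Type R: the 5-bit value in bits 10:6. */
uint32_t extra_rtype(uint32_t word) {
    return (word >> 6) & 0x1Fu;
}

/* Type I: bits 15:0, sign-extended to a 32-bit signed integer. */
int32_t imm16(uint32_t word) {
    uint32_t low = word & 0xFFFFu;
    return (low & 0x8000u) ? (int32_t)low - 0x10000 : (int32_t)low;
}

/* Conditional branches: the immediate times 4, added to PC = addr + 4.
 * Unsigned arithmetic is used so that negative offsets wrap correctly. */
uint32_t branch_target(uint32_t word, uint32_t addr) {
    uint32_t offset = (uint32_t)imm16(word) << 2;
    return addr + 4u + offset;
}

/* Type J: bits 25:0 times 4, added to the PC with its 28 least
 * significant bits zeroed. */
uint32_t jump_target(uint32_t word, uint32_t addr) {
    uint32_t pc = addr + 4u;
    return (pc & 0xF0000000u) + ((word & 0x03FFFFFFu) << 2);
}
```

As a quick check, an immediate field of 0xfc1f sign-extends to -993, so a branch located at address 0x1000 has target 0x1000 + 4 - 3972 = 128.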
Example 1
The instruction word is 0x00c72820, which when written in binary is:
0000 0000 1100 0111 0010 1000 0010 0000
OOOO OOSS SSST TTTT DDDD D      FF FFFF
The letters written underneath the bits indicate the various bit fields: O for the opcode field in bits 31:26, S for the RS field in bits 25:21, T for the RT field in bits 20:16, D for the RD field in bits 15:11, and F for the function code in bits 5:0. The value in bits 31:26 is 000000; i.e. 0. Using this as an index into opcodeTable yields SPECIAL, so it is then necessary to use the value in bits 5:0 as an index into specialTable. This index is 100000, or 32 in decimal, and the entry at that index is OP_ADD. The corresponding entry in instrTable indicates that the ADD instruction is of type R, and that the three arguments are given by RD, RS, and RT. The value of RD (in bits 15:11) is 00101 indicating that the first argument is register 5. The value of RS (in bits 25:21) is 00110 indicating that the second argument is register 6. The value of RT (in bits 20:16) is 00111 indicating that the third argument is register 7. So the mnemonic form of this instruction is add $5,$6,$7.
Example 2
The instruction word is 0x8cc50007, which when written in binary is:
1000 1100 1100 0101 0000 0000 0000 0111
OOOO OOSS SSST TTTT XXXX XXXX XXXX XXXX
The value in bits 31:26 is 100011, or 35. Using this as an index into opcodeTable yields OP_LW. The corresponding entry from instrTable indicates that the instruction is of type I, with first argument source RT, second argument source EXTRA, and the third argument source RS. RT is 00101 so the first argument is register 5. RS is 00110 so the third argument is register 6. The second argument is obtained from bits 15:0, which have the value 7. So the mnemonic form of this instruction is lw $5,7($6).
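Once the arguments are known, producing the text form is a formatting step. The sketch below reproduces the text of this example with snprintf from stdio.h, assuming that stdio formatting functions are acceptable for output. The function name format_mem_instr and the format string are hypothetical; the provided instruction table may already describe the output format in its own way.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: writes the text form of a type I instruction whose
 * arguments are RT, EXTRA, RS (in that order), such as "lw $5,7($6)".
 * name is the mnemonic to print. Returns the value returned by snprintf. */
int format_mem_instr(char *buf, size_t size, const char *name,
                     uint32_t word) {
    unsigned int rs = (word >> 21) & 0x1Fu;   /* bits 25:21 */
    unsigned int rt = (word >> 16) & 0x1Fu;   /* bits 20:16 */
    uint32_t low = word & 0xFFFFu;            /* bits 15:0 */
    /* Sign-extend the 16-bit immediate. */
    int imm = (low & 0x8000u) ? (int)low - 0x10000 : (int)low;

    return snprintf(buf, size, "%s $%u,%d($%u)", name, rt, imm, rs);
}
```

Calling format_mem_instr(buf, sizeof buf, "lw", 0x8cc50007) on a sufficiently large buffer leaves the string lw $5,7($6) in buf.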
Example 3
The instruction word is 0x10effc1f, which when written in binary is:
0001 0000 1110 1111 1111 1100 0001 1111
OOOO OOSS SSST TTTT XXXX XXXX XXXX XXXX
The value in bits 31:26 is 000100, or 4. Using this as an index into opcodeTable yields OP_BEQ. The corresponding entry from instrTable indicates that the instruction is of type I, with first argument source RS, second argument source RT, and the third argument source EXTRA. RS is 00111 so the first argument is register 7. RT is 01111 so the second argument is register 15. The third argument is obtained from bits 15:0, which is fc1f in hex. This 16-bit value is sign-extended to the 32-bit signed value fffffc1f, which is then shifted left two bits to obtain fffff07c, or -3972 in decimal. This is the PC-relative branch offset. This offset is added to the current value of the PC (i.e. the memory address of the instruction, plus 4) to obtain the final absolute address that is the branch target. Assuming the memory address of this instruction is 1000 in hex, or 4096 in decimal, the branch target is 4096 + 4 - 3972, or 128 in decimal. So the mnemonic form of this instruction is beq $7,$15,128.
Example 4
The instruction word is 0x08000400, which when written in binary is:
0000 1000 0000 0000 0000 0100 0000 0000
OOOO OOXX XXXX XXXX XXXX XXXX XXXX XXXX
The value in bits 31:26 is 000010, or 2. Using this as an index into opcodeTable yields OP_J. The value in bits 25:0 is 0000400, which is shifted left two bits to obtain 00001000 in hex. Assuming that the memory address of the instruction is 40000000 in hex, the PC value at the time of execution would be 40000004. Clearing the 28 least-significant bits yields 40000000, and adding this to 00001000 yields 40001000 in hex. So the mnemonic form of this instruction is j 0x40001000.
The MIPS instruction set does not support jumps to addresses whose four most-significant bits differ from those of the current PC value. Consequently, an attempt to assemble a jump instruction (i.e. J or JAL) whose target address differs in its four most-significant bits from the base address supplied with -b should be treated as an error.
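When assembling, the restriction above can be checked before the 26-bit field is computed. The sketch below is one way to do this; the function name encode_jump_field is hypothetical. It silently drops the two least significant bits of the target, so whether a target that is not a multiple of 4 should also be rejected is left open here.

```c
#include <stdint.h>

/* Hypothetical helper: computes the 26-bit target field of a J or JAL
 * instruction. Returns 1 and stores the field in *field on success.
 * Returns 0 if the four most significant bits of target differ from
 * those of base_addr (the address given with -b). */
int encode_jump_field(uint32_t target, uint32_t base_addr, uint32_t *field) {
    if ((target & 0xF0000000u) != (base_addr & 0xF0000000u))
        return 0;
    /* Undo the left shift by two that is applied when decoding. */
    *field = (target >> 2) & 0x03FFFFFFu;
    return 1;
}
```

With a base address of 0x40000000, the target 0x40001000 yields the field 0x400, and placing the value 2 in bits 31:26 gives the word 0x08000400. A target of 0x50001000 with the same base address is rejected.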