CSE 320 hw2

3/2/2019 README.md · master · CSE 320 / hw2-doc · GitLab

Bring in assignment documents from the development repo.

Gene Stark authored 2 weeks ago

26d64187

In this assignment you are tasked with updating an old piece of software, making sure it compiles, and works properly in your VM environment.

Maintaining old code is a chore and an often hated part of software engineering. It is definitely one of the aspects which are seldom discussed or

thought about by aspiring computer science students. However, it is prevalent throughout industry and a worthwhile skill to learn. Of course, this

homework will not give you a remotely realistic experience in maintaining legacy code or code left behind by previous engineers but it still

provides a small taste of what the experience may be like. You are to take on the role of an engineer whose supervisor has asked you to correct

all the errors in the program, plus add additional functionality.

By completing this homework you should become more familiar with the C programming language and develop an understanding of:

How to use tools such as gdb and valgrind for debugging C code.

Modifying existing C code.

C memory management and pointers.

Working with files and the C standard I/O library.

Your goal will be to debug and extend an old version of a program called ged2html , which I (Prof. Stark) wrote in 1994-1995. The original version

ran on BSD Unix systems and MS-DOS. The version I am handing out is very close to the original version, except that I have made a few changes

for this assignment. Specifically, I introduced a few bugs here and there to make things more interesting and educational for you 😉, and I

updated a few things myself that would have been tedious and not very educational to ask you to do. Aside from these changes and the

introduced bugs, which only involve a few lines here and there, the code is identical to the original, functioning version. There is also a bug or two

that were undetected in the original program, but which I detected while working on this assignment, and I have left them for you to find.

The purpose of the ged2html program is to help people who are interested in researching their family history present their results on the Web as

a set of interlinked HTML files. The doc directory included with the assignment basecode contains the original documentation files that were

distributed with the program. As its input, the program reads a "GEDCOM" file that contains data on (among other things) sets of "individuals"

and "families". GEDCOM, which stands for "GEnealogical Data Communication", is a file format developed by the Family History Department of

the Church of Jesus Christ of Latter-day Saints (the "Mormons") to allow genealogical information to be shared between software applications.

The ged2html program builds a data structure that represents the information contained in the GEDCOM file, and then it traverses this data

structure to create a set of interlinked HTML output files that can be viewed with a Web browser.

Though some additional details on the functioning of the ged2html program will be given in a later section, you are not expected to have to fully

understand what it does or how it works in order to be able to complete the assignment. This is pretty realistic as far as working on legacy

software is concerned: often one would be given code that needs to be updated, ported, or have some bugs fixed, and this has to be done

without (at least initially) having a full understanding of structure and function of the code. In this kind of situation, you have to be careful not to

make arbitrary changes to the code that might impact things that you don't fully understand. Limit your changes to the minimum necessary to

achieve the specified objectives.

Perhaps the easiest way to get an idea of what the program is supposed to do is to look at some sample input and output. The tests/rsrc

directory contains some sample GEDCOM files (which end with the .ged extension). One of these is royal92.ged , which is a 30682-line

GEDCOM file containing information on 3010 individuals and 1422 families of European royalty. You can read it into a text editor to get an idea of

what it looks like. The following link: royal92/INDEX.html will take you to the HTML files produced by running the command:

bin/ged2html -i tests/rsrc/royal92.ged

You can browse these HTML files (and view their source, if you like) to get an idea of the output.

Fetch base code for hw2 as you did for the previous assignments. You can find it at this link: https://gitlab02.cs.stonybrook.edu/cse320/hw2.

 README.md 23.1 KB

Homework 2 Debugging and Fixing - CSE 320 - Fall 2019

Professor Eugene Stark

Due Date: Friday 3/8/2019 @ 11:59pm

Introduction

The Existing Program

Getting Started - Obtain the Base Code

3/2/2019 README.md · master · CSE 320 / hw2-doc · GitLab

https://gitlab02.cs.stonybrook.edu/cse320/hw2-doc/blob/master/README.md 2/6

Once again, to avoid a merge conflict with respect to the file .gitlab-ci.yml , use the following command to merge the commits:

git merge -m "Merging HW2_CODE" HW2_CODE/master --strategy-option=theirs

Here is the structure of the base code:

hw2

├── doc

│ ├── gedstand.t82

│ ├── MSDOS

│ ├── OPTIONS

│ └── README

├── include

│ ├── database.h

│ ├── index.h

│ ├── node.h

│ ├── output.h

│ ├── read.h

│ └── tags.h

├── Makefile

├── rsrc

│ ├── index.tpl

│ └── indiv.tpl

├── src

│ ├── database.c

│ ├── index.c

│ ├── main.c

│ ├── output.c

│ ├── read.c

│ └── tags.c

└── tests

├── ged2html_tests.c

└── rsrc

├── garbage.ged

├── royal92.ged

├── testall

│ ├── INDEX.html

│ ├── PERSON1.html

│ ├── PERSON2.html

│ ├── PERSON3.html

│ ├── PERSON4.html

│ ├── PERSON5.html

│ ├── PERSON6.html

│ ├── PERSON7.html

│ └── PERSON8.html

└── testall.ged

Before you begin work on this assignment, you should read the rest of this document. In addition, we additionally advise you to read the

Debugging Document.

The command line arguments and expected operation of the program are described by the following "usage" message, which is printed within

main() in main.c :

Usage: bin/ged2html [-Hciv][-d ][-s ...][-u ][-f ][-t ][-T ] [-- ...]

-v Print version information.

-c Disable automatic capitalization of surnames.

-d max_per_directory Specify number of individuals per subdirectory

(0 means no subdirectories)

-i Cause an index file to be generated containing

all the individuals in the input.

-s individuals ... Limit the production of output files to a specified

set of zero or more selected individuals.

-u url_template Specify a template string for the URL's used

in anchors in the output files (default '%s.html').

-f file_template Specify a template string for the names of the

HTML files (default '%s.html').

-t individual_template Specify an HTML template file for individuals.

-T index_template Specify an HTML template file for the index.

-H Print a brief message listing the available options.

Part 1: Debugging and Fixing

3/2/2019 README.md · master · CSE 320 / hw2-doc · GitLab

https://gitlab02.cs.stonybrook.edu/cse320/hw2-doc/blob/master/README.md 3/6

The -H and -v options cause the program to print a message and terminate without processing any input. If neither of those options is given,

then the program performs its normal function of translating GEDCOM to HTML. The other options can be used to modify the behavior of the

program in various ways.

Under Linux (you only need to worry about making the program compile and run under Linux, though you will see some #ifdef ed code related

to MS-DOS) option processing in main() is performed with the help of the GNU getopt library. This library supports a flexible syntax for

command-line arguments, including support for traditional single-character options (prefixed by '-') and "long-form" options (prefixed by '--'),

which need not be single characters. The library also takes care of some of the "grunt work" in parsing option arguments and producing error

messages.

You can modify anything you want in the assignment, but limit your changes to the minimum necessary to restore functionality to the program.

Assume that the program is essentially correct -- it just has a few lingering bugs that need to be fixed.

Complete the following steps:

1. Clean up the code; fixing any compilation issues, so that it compiles without error using the compiler options that have been set for you in

the Makefile .

2. Fix bugs. To get started, try running bin/ged2html royal92.ged . The program will crash. Track down and fix the problem. Repeat until the

program runs without crashing. Check the functionality of various option settings and try the other data files provided. You should also use

the provided Criterion unit tests to help point the way.

3. Use valgrind to identify any memory leaks or other memory access errors. Fix any errors you find.

Run valgrind using the following command:

valgrind --leak-check=full --show-leak-kinds=all [GED2HTML PROGRAM AND ARGS]

Note that the bugs that are present will all manifest themselves in some way either as program crashes or as memory errors that can be detected

by valgrind . It is not necessary to go hunting for obscure issues with the program output. Also, do not make gratuitous changes to the program

output, as this will interfere with our ability to test your code.

😱 You are NOT allowed to share or post on PIAZZA solutions to the bugs in this program, as this defeats the point of the assignment.

You may provide small hints in the right direction, but nothing more.

The original program uses getopt() to do options processing, but it only understands traditional, single-character options. For this part of the

assignment, rewrite the options processing so that it uses getopt_long() instead of getopt() , and it understands the following alternative

forms for the various options (as well as the original short forms):

--help as equivalent to -H

--version as equivalent to -v

--index as equivalent to -i

--select as equivalent to -s

--no-surname-caps as equivalent to -c

--individual-template as equivalent to -t

--index-template as equivalent to -T

--files-per-directory as equivalent to -d

--url-template as equivalent to -u

--filename-template as equivalent to -f

You will probably need to read the Linux "man page" on the getopt package. This can be accessed via the command man 3 getopt . If you need

further information, search for "GNU getopt documentation" on the Web.

😱 You MUST use the getopt_long() function to process the command line arguments passed to the program. Your program should

be able to handle cases where the (non-positional) flags are passed IN ANY order.

In addition, add the following new option, which has only a long form:

--change-directory

This option takes a single argument, which is assumed to be the name of an existing directory. If this option has been specified, then instead

of creating the HTML output files in the current directory, before the output files are produced, the program should use the chdir() system

call to change its current working directory to be the specified directory. The result will be that the HTML files are created in the specified

directory. If the chdir() call fails, then instead of producing output files, an error message should be printed and the program should exit

with status EXIT_FAILURE .

Part 2: Adding Features

Unit Testing

3/2/2019 README.md · master · CSE 320 / hw2-doc · GitLab

https://gitlab02.cs.stonybrook.edu/cse320/hw2-doc/blob/master/README.md 4/6

For this assignment, you have been provided with a basic set of Criterion unit tests to help you debug the program. We encourage you to write

your own as well as it can help to quickly test inputs to and outputs from functions in isolation.

In the tests/ged2html_tests.c file, there are four unit test examples. You can run these with the bin/ged2html_tests command. The tests have

been constructed so that they will point you at most of the problems with the program. They are not true "unit tests", because they all run the

program as a black box using system() ; the ged2html program was unfortunately not designed in a way that makes it very conducive to using

unit testing on individual program modules. Each test has one or more assertions to make sure that the code functions properly. If there was a

problem before an assertion, such as a SEGFAULT, the test will print the error to the screen and continue to run the rest of the tests.

To obtain more information about each test run, you can use the verbose print option: bin/ged2html_tests --verbose=1 .

One of the tests uses valgrind to verify that no memory errors are found. If errors are found, then you can look at the valgrind.out file that is

left behind by the test code. Alternatively, you can better control the information that valgrind provides if you run it manually.

You may write more of your own tests if you wish. Criterion documentation for writing your own tests can be found here. It would be a good idea

to have tests that exercise non-default program options to make sure that the program does not crash when they are used.

Ensure that all files you expect to be on your remote repository are pushed prior to submission.

This homework's tag is: hw2

$ git submit hw2

This section provides some additional details on the GEDCOM format and how it is is processed by the ged2html program. Though it should not

be necessary for this assignment, if you are interested in many more details about GEDCOM you can refer to the file doc/gedstand.t82 that is

included with the basecode.

A GEDCOM file consists of a sequence of lines of input, each of which starts with a level number, followed by an optional cross-reference ID, then

a keyword called a tag, and finally data associated with that tag. Lines with level number 0 introduce records, and lines with level numbers higher

than zero define fields or substructures of the record to which they belong. Each record starts with a level-0 line and continues up to either the

end of the file or the next level-0 line. Level numbers are permitted to decrease arbitrarily or stay the same from one line to the next, but if they

increase, they are only allowed to do so by an increment of one.

For example, the tests/rsrc/royal92.ged file starts as follows:

0 HEAD

1 SOUR PAF 2.2

1 DEST PAF

1 DATE 20 NOV 1992

1 FILE ROYALS.GED

1 CHAR ANSEL

0 @S1@ SUBM

1 NAME Denis R. Reid

1 ADDR 149 Kimrose Lane

2 CONT Broadview Heights, Ohio 44147-1258

2 CONT Internet Email address: ah189@cleveland.freenet.edu

1 PHON (216) 237-5364

1 COMM >> In a message to Cliff Manis (cmanis@csoftec.csf.com)

2 CONT >> Denis Reid wrote the following:

2 CONT >> Date: Fri, 25 Dec 92 14:12:32 -0500

The first line has level number 0, has no cross-reference ID, has tag HEAD , and has no additional data. This line introduces the beginning of a

header record. Every GEDCOM starts out with a header record in this way. The subsequent lines with level number 1 define fields of the header

record. The next record has cross reference ID S1 (enclosed in @ @ ) and has tag SUBM , which defines a submitter record. A submitter record

gives information about the creator or submitter of the GEDCOM file.

Later on in the file you will see:

0 @I1@ INDI

1 NAME Victoria /Hanover/

1 TITL Queen of England

1 SEX F

1 BIRT

2 DATE 24 MAY 1819

2 PLAC Kensington,Palace,London,England

1 DEAT

2 DATE 22 JAN 1901

2 PLAC Osborne House,Isle of Wight,England

Hand-in Instructions

Additional Information

3/2/2019 README.md · master · CSE 320 / hw2-doc · GitLab

https://gitlab02.cs.stonybrook.edu/cse320/hw2-doc/blob/master/README.md 5/6

1 BURI

2 PLAC Royal Mausoleum,Frogmore,Berkshire,England

1 REFN 1

1 FAMS @F1@

1 FAMC @F42@

The tag INDI defines an individual record, which gives information about an individual person. The cross-reference ID I1 serves as a unique

identifier within this GEDCOM for this individual, and it allows this record to be linked to by other records. This INDI record has level-1

substructures introduced by BIRT and DEAT , which give information about birth and death events for this individual. For example, at level 2

within the BIRT substructure are fields that give the date and place of birth. The FAMS and FAMC tags define links to family records related to

this individual record. The FAMS record defines a spouse family link, which points to a family in which this individual is a spouse (husband or wife).

The FAMC record defines a child family link, which points to a family in which this individual is a child.

Still later in the file, you will see examples of family records:

0 @F1@ FAM

1 HUSB @I2@

1 WIFE @I1@

1 CHIL @I3@

1 CHIL @I4@

1 CHIL @I5@

1 CHIL @I6@

1 CHIL @I7@

1 CHIL @I8@

1 CHIL @I9@

1 CHIL @I10@

1 CHIL @I11@

1 DIV N

1 MARR

2 DATE 10 FEB 1840

2 PLAC Chapel Royal,St. James Palace,England

This record defines a family with unique identifier F1 , in which the husband was the individual with identifier I2 , the wife ws the individual with

identifier I1 , and there were children with identifiers I3 , I4 , and so on. The MARR substructure at level 1 gives information about the event of

marriage by which this family was created.

The end of a GEDCOM file is indicated by a trailer record, which looks simply as follows:

0 TRLR

The ged2html program reads a GEDCOM file line-by-line and builds a data structure. For each line read, a node is created to contain the

information from that line:

typedef struct node {

long lineno; /* Source line number */

int level; /* Level number at which line appears */

char *xref; /* Cross-reference ID */

struct tag *tag; /* tag structure for tag */

char *line; /* The whole line of input */

char *rest; /* The rest of the line after the tag */

void *hook; /* Hook to corresponding database node */

struct node *children; /* List of subsidiary nodes */

struct node *siblings; /* List of siblings */

} NODE;

The siblings pointer in a node structure is used to chain together nodes at the same level within the "parent" enclosing record or structure

having a level number of one less. The children pointer in a node structure is used to point to the immediately following node, if that node has

a level number that is one greater than that of the current node. The effect of these pointers is to organize the nodes into a tree as defined by

their level numbers. So for example, if we have a pointer to the INDI record I1 shown above, then we could traverse the list of substructures at

level 1 by first following the children pointer and then following siblings pointers until NULL is reached. As the GEDCOM lines are read and

the tree of nodes is constructed, the cross-reference IDs encountered for level 0 records are entered into a hash table, together with pointers to

the corresponding nodes. This will enable the nodes to be looked up by ID in later phases.

Once a complete GEDCOM has been read, the list of all level 0 records is traversed by starting from the header record and following siblings

pointers. For each record, a structure is allocated of a type that depends on the records tag, and various fields of this structure are filled in using

the sub-nodes of that record. For example, when a record with tag INDI is encountered, a structure of type individual_record is allocated (see

the database.h header file for definitions of all the various structures that may be created):

struct individual_record {

int serial;

char *xref;

struct name_structure *personal_name;

3/2/2019 README.md · master · CSE 320 / hw2-doc · GitLab

https://gitlab02.cs.stonybrook.edu/cse320/hw2-doc/blob/master/README.md 6/6

char *title;

char sex;

char *refn;

char *rfn;

char *afn;

struct xref *fams, *lastfams;

struct xref *famc, *lastfamc;

struct xref *sources, *lastsource;

struct note_structure *notes, *lastnote;

struct event_structure *events, *lastevent;

struct individual_record *next;

};

The children of the INDI node are processed recursively, and the results of processing are used to fill in the fields of the individual_record

structure. Some child nodes, such as a node with tag SEX , just result in simple values to be stored in a field. Other child nodes, such as those with

tag BIRT and DEAT , are process to produce sub-structures that are linked together in lists. For example, BIRT and DEAT are examples of

individual events, which yield structures of type struct event_structure that are linked into a list whose head is pointed to by events and

whose tail is pointed to by lastevent . Tags such as FAMS and FAMC yield structures of type struct xref , which are linked into the

corresponding list. The definition of struct xref is as follows:

struct xref {

char *id;

union {

struct individual_record *individual;

struct family_record *family;

struct source_record *source;

} pointer;

struct xref *next;

};

The purpose of a struct xref is to provide a link to a level-0 record. The id field contains a pointer to a string that gives the cross-reference ID

of the pointed-at record. The next field allows the struct xref structures to be chained into a list within an enclosing structure. The

individual , family , or source pointers are initially NULL, but they will get filled in during the next phase.

When all of the level-0 records have been processed and the corresponding structures have been filled in, the data is traversed again in a

"linking" phase, whose purpose is to fill in all the struct xref structures. This is done by using the hash table to look up the identifier stored in

the id field of the struct xref , obtaining a pointer to the corresponding level-0 record, and storing this pointer in one of the fields of the

union { ... } pointer in the struct xref . This will enable, for example, links from individuals to families, and families to individuals, to be

followed conveniently and efficiently during the output phase.

Once all the linking has been done, the output phase can begin. For each level-0 record, HTML code is created and output to a file. The HTML

code is created by following a template, which is essentially a program, in a very ugly programming language, that describes how to emit HTML

code that is conditionalized on the content of the GEDCOM. The file output.c contains the interpreter for this ugly programming language. In

the simplest output model, the HTML code for each level-0 record is output to a separate file, which is named after the cross-reference ID for that

record. There are also provisions for outputting the HTML code for multiple records to a single file, and for placing files in subdirectories. These

options were provided because many systems at the time could not efficiently store very large numbers of small files, or very large numbers of

files in a single directory.

Besides the HTML created from the level-0 records, there is also a provision for creating HTML for a top-level index, which can be browsed to

identify and navigate to particular records of interest.

联系我们

QQ：99515681
邮箱：99515681@qq.com
工作时间：8:00-21:00
微信：codinghelp

热点文章

辅导 comm2000 creating socia... 2026-01-08
讲解 isen1000 – introductio... 2026-01-08
讲解 cme213 radix sort讲解 c... 2026-01-08
辅导 csc370 database讲解迭代 2026-01-08
讲解 ca2401 a list of colleg... 2026-01-08
讲解 nfe2140 midi scale play... 2026-01-08
讲解 ca2401 the universal li... 2026-01-08
辅导 engg7302 advanced compu... 2026-01-08
辅导 comp331/557 – class te... 2026-01-08
讲解 soft2412 comp9412 exam辅... 2026-01-08
讲解 scenario # 1 honesty讲解... 2026-01-08
讲解 002499 accounting infor... 2026-01-08
讲解 comp9313 2021t3 project... 2026-01-08
讲解 stat1201 analysis of sc... 2026-01-08
辅导 stat5611: statistical m... 2026-01-08
辅导 mth2010-mth2015 - multi... 2026-01-08
辅导 eeet2387 switched mode ... 2026-01-08
讲解 an online payment servi... 2026-01-08
讲解 textfilter辅导 r语言 2026-01-08
讲解 rutgers ece 434 linux o... 2026-01-08

热点标签

msinm014/msing014/msing014b

联系我们 - QQ: 99515681 微信：codinghelp

程序辅导网！