Software Development: Strings and File I/O
1 The Problem
The processing of files is critical to most of the applications computers are put to.
Whether we are searching for patterns while sequencing the genome, analysing images
from the Hubble space telescope, screening mammograms (as in last week’s problem set!)
or, more prosaically, storing our emails, Word documents, family photos or the databases
behind Facebook, it’s all just files and file input/output.
Your task today is to take on the role of a computer security and forensics expert
and to write a program that can analyse a log of people accessing a web site. We
have provided you with an ‘access.log’ file from a web server. Your task is to read in
each line of the file (corresponding to a ‘hit’ on a page of the web site) and to gather
some statistics by looking for patterns in this data.
More specifically, you should answer:
1. How many ‘hits’ are there in total in the log? (Hint: there is one ‘hit’ per line of
the file)
2. We think our attacker may be using the ‘Mozilla browser’. How many lines contain
the identifying fingerprint ‘Mozilla’?
3. We’re seeing a lot of ‘possible attacks’ coming from addresses (lines) starting
with 66.249.66. Can you print out every page they access? (You need only print
the line, unless you want to go further to test yourself.) How many accesses do
they make in total?
Optional:
How many unique addresses do we see? (Hint: look at the IP address at the
start of each line; these look like ‘148.88.61.11’.)
Can you tabulate how many hits come from each address? (Hint: you’ll need an
array for this.) Optional: use a dynamic/resizing array for this.
Based on this, should we fear hits from 66.249.66.*?
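The resizing array mentioned in the hint could look something like the sketch below. It is one possible approach, not a model answer: the names `address_count`, `tally`, `tally_add` and `ADDRESS_LENGTH` are invented for illustration, and the starting capacity of 8 and the doubling strategy are arbitrary choices.

```c
#include <stdlib.h>
#include <string.h>

#define ADDRESS_LENGTH 64   /* assumed to be ample for one IP address */

/* One row of the table: an address and how many hits it made. */
struct address_count {
    char address[ADDRESS_LENGTH];
    long hits;
};

/* A resizing array of rows. Start it off as {NULL, 0, 0}. */
struct tally {
    struct address_count *entries;
    size_t used;       /* rows currently filled in */
    size_t capacity;   /* rows currently allocated */
};

/* Hypothetical helper: record one hit for 'address'.
 * Returns 1 on success, 0 if the address is too long or memory runs out. */
int tally_add(struct tally *t, const char *address)
{
    size_t i;

    /* Linear search: have we seen this address before? */
    for (i = 0; i < t->used; i++) {
        if (strcmp(t->entries[i].address, address) == 0) {
            t->entries[i].hits++;
            return 1;
        }
    }

    if (strlen(address) >= ADDRESS_LENGTH) {
        return 0;
    }

    /* Array full: ask realloc for twice the space. */
    if (t->used == t->capacity) {
        size_t new_capacity = (t->capacity == 0) ? 8 : t->capacity * 2;
        struct address_count *grown =
            realloc(t->entries, new_capacity * sizeof *grown);
        if (grown == NULL) {
            return 0;   /* the old array is still valid */
        }
        t->entries = grown;
        t->capacity = new_capacity;
    }

    strcpy(t->entries[t->used].address, address);
    t->entries[t->used].hits = 1;
    t->used++;
    return 1;
}
```

After every line has been processed, `used` is the number of unique addresses and each entry holds that address’s hit count. Remember to `free` the `entries` pointer when you are finished with it.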
2 Resources
As part of today’s problem set you will get more familiar with how to do file processing
in C. You will need to use the following functions from #include <stdio.h>:
FILE *fopen(const char *filename, const char *mode): Open a file with mode ‘mode’.
See man fopen, the example in Section 3, or search online for more
details and examples!
char *fgets(char *str, int size, FILE *stream): Get a line from the file
(stream) and copy it into ‘str’. At most ‘size’ - 1 characters are read, so take
care that you’ve set ‘size’ to be large enough to read the whole line of a ‘hit’
in the access log. A very common error is to declare an array that is too small
(and thus to allocate too little space) for the resulting data!
int fclose(FILE *stream): Close file ‘stream’. This is a file handle, not a
filename!
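To see why the buffer size matters, here is a small sketch of counting lines with a buffer that is deliberately too small. The function name `count_lines` and the 16-character buffer are assumptions made for illustration. The idea relies on documented fgets behaviour: if the text it stored does not end in a newline, the rest of the line is still waiting to be read.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: count the lines in 'stream', even when a line is
 * longer than the buffer. A line is only counted once its newline has
 * been seen (or the file ends part-way through a line). */
long count_lines(FILE *stream)
{
    char chunk[16];            /* deliberately small, to show the effect */
    long lines = 0;
    int in_partial_line = 0;   /* 1 while we are in the middle of a line */

    while (fgets(chunk, sizeof chunk, stream) != NULL) {
        size_t length = strlen(chunk);

        if (length > 0 && chunk[length - 1] == '\n') {
            lines++;               /* reached the end of a line */
            in_partial_line = 0;
        } else {
            in_partial_line = 1;   /* buffer filled before the newline */
        }
    }

    /* A last line with no trailing newline still counts as a line. */
    if (in_partial_line) {
        lines++;
    }
    return lines;
}
```

Counting every successful call to fgets instead would over-count here, because a long line comes back in several pieces. With a buffer large enough for every line, the two counts agree.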
You may also find it useful to explore #include <string.h> in more detail:
int strncmp(const char *s1, const char *s2, size_t n): Compare the first ‘n’
characters of strings ‘s1’ and ‘s2’ (returns 0 if they match).
char *strchr(const char *s, int c): Find character ‘c’ in string ‘s’ and return a
pointer to it (or NULL if it’s not found).
char *strstr(const char *s1, const char *s2): Find string ‘s2’ in string ‘s1’ and
return a pointer to it (or NULL if it’s not found).
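As an illustration of strchr, the sketch below copies everything before the first space in a line into a separate buffer. It assumes that the address at the start of each log line is followed by a space, so check this against your own access.log. The helper name `extract_address` is invented for this example.

```c
#include <string.h>

/* Hypothetical helper: copy the text before the first space in 'line'
 * into 'address'. Returns 1 on success, or 0 if the line contains no
 * space or the text does not fit in 'address_size' bytes. */
int extract_address(const char *line, char *address, size_t address_size)
{
    const char *first_space = strchr(line, ' ');
    size_t length;

    if (first_space == NULL) {
        return 0;   /* no space anywhere in the line */
    }

    /* Pointer subtraction gives the number of characters before the space. */
    length = (size_t)(first_space - line);
    if (length >= address_size) {
        return 0;   /* no room for the text plus the terminating '\0' */
    }

    memcpy(address, line, length);
    address[length] = '\0';
    return 1;
}
```

For a line beginning ‘148.88.61.11 ’, this leaves the string ‘148.88.61.11’ in ‘address’.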
Please also revisit this week’s lecture covering arrays and strings.
3 Example of reading a file using fopen
#include <stdio.h>

int main(void)
{
    FILE *myFileHandle;

    /* Open the file 'access.log' for reading */
    myFileHandle = fopen("access.log", "r");
    if (myFileHandle != NULL) {
        /* A bit of space for a line of text; increase this if your lines are longer */
        char lineOfText[80];

        /* fgets returns NULL at the end of the file (or on error) */
        while (fgets(lineOfText, sizeof lineOfText, myFileHandle) != NULL) {
            printf("Line read is: %s", lineOfText);
        }
        fclose(myFileHandle);
    }
    return 0;
}
4 Example of using string compare and search functions
if (strncmp(lineOfText, "127.0.", 6) == 0) {
    /* the first 6 characters are '127.0.' */
}
if (strstr(lineOfText, "Google") != NULL) {
    /* the line contains 'Google' */
}