CS 214: Systems Programming, Spring 2020
Assignment vLast: Where’s the file?
Warning: This is a complex assignment that will take time to code and test. Be sure to define modules and give
it its due time. Make sure to read the assignment carefully!
Note: Due to the recent suspension of in-person contact you may complete this assignment
either with a partner or solo (which I do not recommend). You must sign up/register your
group in either case.
0. Abstract
Git is a popular current iteration of a series of version control systems. Most large, complex software projects
are coded using version control. Version control can be very helpful when working on a single set of source code
that multiple people are contributing to by making sure that everyone is working on the same version of the
code. If people are working on code in physically separate locations, it is entirely possible that two different
people have edited the same original code in two ways that are incompatible with each other. A versioning
control system would not allow two different versions of the same file to exist in its central repository, enforcing
that any changes made to a file are seen by everyone before they can submit additional changes to the repository.
1. Introduction
You will write a version control system for this assignment. You first need a primer for the vocabulary:
project: some collection of code that is being maintained
also, the directory under which a collection of code resides
repository: the union of all the canonical copies of all projects being managed as well as
metadata and backups/historical data
also, the directory under which all managed project directories are located
.Manifest: a metadata file listing all paths to all files in a project, their state (version), a
hash of all the contents of each file and a project version (which increments any
time there is any change to any part of the project)
history: a list of all updates/changes made to a given project since creation
(is maintained at the repository, but can be requested by clients)
roll back: change the current version of a project in the repository to be a previous version
commit/push: upload changes you made to a project locally to the repository, updating its current
version
check out: download the current version of a project in the repository
update: download from the repository only the files in a project that are newer than your local
copy
A version control system consists of some remote server that holds the repository and manages changes to it, and
any number of user clients that can fetch projects from the repository and push changes to projects they have
checked out. The clients have a local copy of the project they can make changes to, while the server holds the
canonical or definitive version of the project.
The overriding mandate of a version control system is to make sure no changes are made to a project in the
repository unless the user making the changes to that project has its current version. This can ease difficulties
when working remotely on a shared code base with other people. Rather than, for instance, emailing around
copies of code as it is changed, a versioning system will maintain a single, canonical version. A version control
system enforces synchronization by requiring that you have the current version of a project before you can
submit a change to it. This means it is not possible for two people to check out the current version of a project,
make different changes to the same file, and then both submit those changes. The first one to submit their
changes will alter the current version of the repository, so the next person who tries to submit changes will need
to update to the current version. On updating, they would see the file they have been editing has been altered and
would have to integrate their changes with the current version of the file before submitting.
A version control system can also protect a team from development mishaps. If you accidentally delete a file,
you can just update from the repository and get it restored. If you have an odd bug and just want to start fresh,
you can delete your whole project directory and check out a fresh copy. The version control system also saves all
versions of a project as changes are committed to it. Every new version of a project is saved separately in the
repository so it is possible to fetch old versions. Given the scope of this assignment, your version control system
will have limited functionality and will only be able to roll back (most version control systems can fork a
repository to have different development paths, and can roll forward undoing previous roll backs). This would
allow a group to start coding, realize that the current design or direction is flawed and roll back to a previous
version of the repository. The next check out done would then restore an older version of the project that the
version control system had saved.
The server will then necessarily need to support multiple client connections, perhaps simultaneous ones, be able
to automatically scan recursively through a project directory and compare files to determine similarity. The
server will also need to keep multiple versions of a project so that it can roll back the current version to a
previous and send those old files to a client. The client will need to parse commands from the user, scan through
a project directory to build a local manifest and know how to communicate with the server to either commit
changes made by the user to the local copy up to the repository, or fetch from the server files that it has that are
newer versions of files in the user's local copy of the project.
2. Implementation
You will need to write two programs: a “Where's The File” server and client. The server will maintain a
repository of projects. A client can check out a project from the server's repository, then commit changes to it
and fetch updates from it.
This functionality is mostly provided by the .Manifest file. Every project has a .Manifest file in its root directory.
The .Manifest file contains:
the current version of the project
for each file in the project:
that file's path/name
that file's current version
a stored hash of that file
The project's version is incremented any time any change is pushed to the repository. The files that hold the
changes have their specific versions incremented as well (if they were not removed). The Manifest file is
discussed more in 4.4, below.
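As a rough illustration of the contents listed above, one way to hold a .Manifest in memory is sketched below. This is only a sketch under assumptions: the type and function names (Manifest, ManifestEntry, manifest_add, manifest_find, manifest_free) are hypothetical, and the 64-character hash buffer assumes a hex digest no longer than that. The on-disk format is not shown here because it is covered in 4.4.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define HASH_HEX_LEN 64  /* assumption: hex digest fits in 64 characters */

/* One tracked file: its path, its own version, and its stored hash. */
typedef struct {
    char *path;
    int   version;
    char  hash[HASH_HEX_LEN + 1];
} ManifestEntry;

/* A whole .Manifest: the project version plus a growable entry array. */
typedef struct {
    int            project_version;
    ManifestEntry *entries;
    size_t         count;
    size_t         capacity;
} Manifest;

/* Append an entry; returns 0 on success, -1 on allocation failure. */
int manifest_add(Manifest *m, const char *path, int version, const char *hash)
{
    if (m->count == m->capacity) {
        size_t new_cap = m->capacity ? m->capacity * 2 : 8;
        ManifestEntry *grown = realloc(m->entries, new_cap * sizeof *grown);
        if (grown == NULL) return -1;
        m->entries = grown;
        m->capacity = new_cap;
    }
    ManifestEntry *e = &m->entries[m->count];
    e->path = malloc(strlen(path) + 1);
    if (e->path == NULL) return -1;
    strcpy(e->path, path);
    e->version = version;
    snprintf(e->hash, sizeof e->hash, "%s", hash);  /* truncates if too long */
    m->count++;
    return 0;
}

/* Linear search by path; returns NULL if the file is not tracked. */
ManifestEntry *manifest_find(const Manifest *m, const char *path)
{
    for (size_t i = 0; i < m->count; i++)
        if (strcmp(m->entries[i].path, path) == 0)
            return &m->entries[i];
    return NULL;
}

/* Release everything the manifest owns and reset it to empty. */
void manifest_free(Manifest *m)
{
    for (size_t i = 0; i < m->count; i++)
        free(m->entries[i].path);
    free(m->entries);
    m->entries = NULL;
    m->count = m->capacity = 0;
}
```

A zero-initialized Manifest (Manifest m = {0};) is a valid empty manifest for these functions.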
The client and server programs can be invoked in any order. Client processes that cannot find the server should
repeatedly try to connect every 3 seconds until killed or exited with a SIGINT (Ctrl+C).
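The retry behavior described above could be sketched roughly as follows. The function names and the max_attempts parameter are assumptions made for illustration (the assignment itself asks for retries until the client is interrupted, which corresponds to passing 0 here). The SIGINT handler only sets a flag so that the loop can exit cleanly.

```c
#define _POSIX_C_SOURCE 200809L
#include <netdb.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#define RETRY_SECONDS 3

static volatile sig_atomic_t interrupted = 0;

static void on_sigint(int signo) { (void)signo; interrupted = 1; }

/* Try every address for host:port once; returns a socket fd or -1. */
static int try_connect_once(const char *host, const char *port)
{
    struct addrinfo hints, *res, *p;
    int fd = -1;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;      /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;  /* TCP */
    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;

    for (p = res; p != NULL; p = p->ai_next) {
        fd = socket(p->ai_family, p->ai_socktype, p->ai_protocol);
        if (fd < 0) continue;
        if (connect(fd, p->ai_addr, p->ai_addrlen) == 0) break;
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);
    return fd;
}

/* Retry every RETRY_SECONDS until connected or interrupted by SIGINT.
 * max_attempts <= 0 means "no limit" (hypothetical knob for testing). */
int connect_with_retry(const char *host, const char *port, int max_attempts)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGINT, &sa, NULL);  /* no SA_RESTART: sleep() returns early */

    for (int attempt = 1; !interrupted; attempt++) {
        int fd = try_connect_once(host, port);
        if (fd >= 0) return fd;
        if (max_attempts > 0 && attempt >= max_attempts) break;
        sleep(RETRY_SECONDS);
    }
    return -1;
}
```

A client would call connect_with_retry(host, port, 0) and print its "connected" message once a non-negative descriptor comes back.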
Minimally, your code should produce the following messages:
- Client announces completion of connection to server.
- Server announces acceptance of connection from client.
- Client disconnects (or is disconnected) from the server.
- Server disconnects from a client.
- Client displays error messages.
- Client displays informational messages about the status of an operation
(another operation is required, aborted and reason)
- Client displays successful command completion messages.
2.0 WTF client:
The WTF client program will take as command line arguments a WTF command and a number of arguments.
Most commands require a project name first, some need only the project name. The first command the client is
given should always be a configure command where the hostname or IP address of the machine on which the
server is located as well as the port number where it is listening are command-line arguments. After a configure
is done, based on the command it is given, the client may send different things to the WTF server. The client
program can take only one command per invocation. The client's job is to maintain a Manifest; a list of all files
currently considered to be part of the project and to verify that list with the server when asked. Most commands
that result in files being sent or received to or from the server have two halves; a command of preparation to get
ready for an operation and then a command of execution to do the operation:
update - get the server's .Manifest and compare all entries in it with the client's .Manifest and see
what changes need to be made to the client's files to bring them up to the same version as the
server, and write out a .Update file recording all those changes that need to be made.
upgrade - make all the changes listed in the .Update to the client side
commit - get the server's .Manifest and compare all entries in it with the client's .Manifest and find out
which files the client has that are newer versions than the ones on the server, or the server does
not have, and write out a .Commit recording all the changes that need to be made.
push - make all the changes listed in the .Commit to the server side
All other commands are fairly direct; they create or destroy a project, add or remove a file to or from a project,
fetch the current version of the entire project from the server or change the current version of the project, or get
metadata about the project.
2.1 WTF server:
The WTF server program first needs to be multithreaded, as it needs to serve potentially any number of clients
at once. It should spawn a new client service thread whenever it gets a new connection request. It should not do
any client communication in the same execution context that listens for new connections.
Since there will be multiple threads trying to access the files in the repository at the same time, you should have
a mutex per project to control access to it. Be sure to lock the mutex whenever reading or writing information or
files from or to a project. You do not want to, for instance, send a .Manifest to a client while you're adding a file
to it so that the .Manifest sent would be out of date the moment it is sent. Be careful not to deadlock your server's
mutexes.
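One possible shape for the per-project mutexes is a small registry keyed by project name, sketched below. All names here (ProjectLock, project_lock_get, project_lock, project_unlock) are hypothetical. The registry has its own mutex, which is released before any project mutex is taken, so a thread never waits on a project while holding the registry lock; that ordering is one simple way to avoid the deadlock mentioned above.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* One node per project, kept in a singly linked list. */
typedef struct ProjectLock {
    char               *name;
    pthread_mutex_t     mutex;
    struct ProjectLock *next;
} ProjectLock;

static ProjectLock    *lock_list  = NULL;
static pthread_mutex_t list_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Find the lock for a project, creating it on first use.
 * Returns NULL only if memory allocation fails. */
ProjectLock *project_lock_get(const char *project)
{
    ProjectLock *node;

    pthread_mutex_lock(&list_mutex);
    for (node = lock_list; node != NULL; node = node->next)
        if (strcmp(node->name, project) == 0)
            break;

    if (node == NULL) {
        node = malloc(sizeof *node);
        if (node != NULL) {
            node->name = malloc(strlen(project) + 1);
            if (node->name == NULL) {
                free(node);
                node = NULL;
            } else {
                strcpy(node->name, project);
                pthread_mutex_init(&node->mutex, NULL);
                node->next = lock_list;
                lock_list = node;
            }
        }
    }
    pthread_mutex_unlock(&list_mutex);  /* released before any project lock */
    return node;
}

/* Lock a project before reading or writing its files; 0 on success. */
int project_lock(const char *project)
{
    ProjectLock *l = project_lock_get(project);
    if (l == NULL) return -1;
    return pthread_mutex_lock(&l->mutex) == 0 ? 0 : -1;
}

/* Unlock a project previously locked by this thread; 0 on success. */
int project_unlock(const char *project)
{
    ProjectLock *l = project_lock_get(project);
    if (l == NULL) return -1;
    return pthread_mutex_unlock(&l->mutex) == 0 ? 0 : -1;
}
```

A client service thread would wrap each repository operation as project_lock(name); ...work...; project_unlock(name); and make sure every error path also unlocks.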
When started, the server takes a single command line argument, a port number to listen on:
./WTFserver 9123
The server can be quit with a SIGINT (Ctrl+C) in the foreground of its process. You should however make sure
that you catch the exit signal (atexit()) and nicely shut down all threads, close all sockets and file descriptors and
free() all memory before allowing the process to terminate.
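A minimal sketch of one way to arrange this shutdown is shown below. Functions registered with atexit() run only on a normal exit (a return from main or a call to exit()), not when a signal kills the process, so this sketch has the SIGINT handler set a flag that the accept loop can check before returning normally. The names install_shutdown_handlers, shutdown_requested and listen_fd are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t stop_requested = 0;
static int listen_fd = -1;  /* assumed to be set after bind()/listen() */

/* Signal handler: only sets a flag, which is async-signal-safe. */
static void on_sigint(int signo) { (void)signo; stop_requested = 1; }

/* Runs at normal process exit via atexit(). */
static void cleanup(void)
{
    if (listen_fd >= 0) {
        close(listen_fd);
        listen_fd = -1;
    }
    /* A full server would also join its threads and free() its data here. */
}

/* Install the SIGINT handler and the exit hook; 0 on success, -1 on error. */
int install_shutdown_handlers(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);  /* no SA_RESTART, so accept() gets EINTR */
    if (sigaction(SIGINT, &sa, NULL) != 0)
        return -1;
    return atexit(cleanup) == 0 ? 0 : -1;
}

/* The accept loop checks this and leaves the loop when it returns 1. */
int shutdown_requested(void) { return stop_requested != 0; }
```

The main loop would then look like while (!shutdown_requested()) { accept ... }, followed by a normal return from main so that cleanup() runs.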
3. WTF Client Commands
The client process will send commands to the server, and the server will send responses back to the client. The
server will send back error messages, confirmation messages, and/or files for each command. All messages sent to the
server should result in a response to the client. The client program will take one command at a time and can only
perform one command per execution/invocation.
3.0 ./WTF configure
The configure command will save the IP address (or hostname) and port of the server for use by later
commands. This command will not attempt a connection to the server, but instead saves the IP and port number
so that they are not needed as parameters for all other commands. The IP (or hostname) and port should be
written out to a ./.configure file. All commands that need to communicate with the server should first try to get
the address information and port from the ./.configure file and must fail if configure wasn’t run before they were
called. All other commands must also fail if a connection to the server cannot be established.
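Saving and reloading the configuration could look roughly like the sketch below. The two-line layout (host on the first line, port on the second) and the function names are assumptions, since the assignment does not fix a format for ./.configure. The path is a parameter only to make the functions easy to test; a client would pass "./.configure".

```c
#include <stdio.h>
#include <string.h>

/* Write "host\nport\n" to path; returns 0 on success, -1 on failure. */
int write_configure(const char *path, const char *host, const char *port)
{
    FILE *f = fopen(path, "w");
    if (f == NULL) return -1;
    int ok = fprintf(f, "%s\n%s\n", host, port) > 0;
    if (fclose(f) != 0) ok = 0;
    return ok ? 0 : -1;
}

/* Read one non-empty line into buf, without its trailing newline. */
static int read_line(FILE *f, char *buf, size_t len)
{
    if (len == 0 || fgets(buf, (int)len, f) == NULL) return -1;
    buf[strcspn(buf, "\n")] = '\0';
    return buf[0] != '\0' ? 0 : -1;
}

/* Load host and port; returns -1 if the file is missing or malformed,
 * which is how a command can tell that configure was never run. */
int read_configure(const char *path, char *host, size_t host_len,
                   char *port, size_t port_len)
{
    FILE *f = fopen(path, "r");
    if (f == NULL) return -1;
    int ok = read_line(f, host, host_len) == 0 &&
             read_line(f, port, port_len) == 0;
    fclose(f);
    return ok ? 0 : -1;
}
```

Every command other than configure would call read_configure first and print an error and stop if it returns -1.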
Note: if you can write out to an environment variable that persists between processes, feel free to do so, but all
recent feedback has been that security upgrades to the iLabs seem to have obviated this option.
3.1 ./WTF checkout
The checkout command will fail if the project name doesn’t exist on the server, the client can't communicate
with the server, if the project name already exists on the client side or if configure was not run on the client side.
If it does run it will request the entire project from the server, which will send over the current version of the
project .Manifest as well as all the files that are listed in it. The client will be responsible for receiving the
project, creating any subdirectories under the project and putting all files in to place as well as saving the
.Manifest.
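Since checkout has to create subdirectories before it can write files into them, a helper along the following lines may be useful. It is a sketch only: make_parent_dirs is a hypothetical name, the 4096-byte path limit and the 0755 mode are assumptions, and it creates every directory component of a path except the final file name.

```c
#include <errno.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Create each missing directory leading up to file_path.
 * Returns 0 on success, -1 on error (errno is set by mkdir). */
int make_parent_dirs(const char *file_path)
{
    char buf[4096];  /* assumption: paths are shorter than this */
    size_t len = strlen(file_path);

    if (len == 0 || len >= sizeof buf) return -1;
    memcpy(buf, file_path, len + 1);

    /* Walk the path; at each '/', temporarily end the string there
     * and create that prefix as a directory if it does not exist. */
    for (char *p = buf + 1; *p != '\0'; p++) {
        if (*p != '/') continue;
        *p = '\0';
        if (mkdir(buf, 0755) != 0 && errno != EEXIST)
            return -1;
        *p = '/';
    }
    return 0;
}
```

For example, make_parent_dirs("proj/src/util/file.c") would create proj, proj/src and proj/src/util, after which the client can open the file for writing.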
3.2 ./WTF update
The update command will fail if the project name doesn’t exist on the server or if the client cannot contact the
server. The update command is rather complex since it is where lots of things are compared in order to maintain
proper versioning. If update doesn't work correctly, almost nothing else will.
Update's purpose is to fetch the server's .Manifest for the specified project, compare every entry in it to the
client's .Manifest and see if there are any changes on the server side for the client. If there are, it adds a line to
a .Update file to reflect the change and outputs some information to STDOUT to let the user know what needs to
change/will be changed. This is done for every difference discovered. If there is an update but the user changed
the file that needs to be updated, update should write instead to a .Conflict file and delete any .Update file (if
there is one). If the server has no changes for the client, update can stop and does not have to do a line-by-line
analysis of the .Manifest files, and should blank the .Update file and delete any .Conflict file (if there is one),
since there are no server updates.
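The comparison described above might be sketched as follows for a single file that appears in both .Manifests. This is a partial illustration under assumptions: the names are hypothetical, hashes are treated as NUL-terminated hex strings, and files that exist on only one side are not handled here. "Live hash" means the hash of the file as it currently sits on the client's disk.

```c
#include <string.h>

typedef enum {
    ENTRY_UNCHANGED,  /* nothing to record for this file             */
    ENTRY_MODIFY,     /* server has a change; record it in .Update   */
    ENTRY_CONFLICT    /* server has a change, but the user also
                         edited the file; record it in .Conflict     */
} UpdateAction;

/* Same project version on both sides means update can stop at once. */
int manifests_up_to_date(int client_project_version,
                         int server_project_version)
{
    return client_project_version == server_project_version;
}

/* Classify one file that is listed in both .Manifests. */
UpdateAction classify_entry(int client_version, const char *client_hash,
                            int server_version, const char *server_hash,
                            const char *live_hash)
{
    /* The server has a change when both the file version and the
     * stored hash differ from the client's .Manifest entry. */
    int server_changed = client_version != server_version &&
                         strcmp(client_hash, server_hash) != 0;
    if (!server_changed)
        return ENTRY_UNCHANGED;

    /* If the file on disk no longer matches the client's stored hash,
     * the user has edited it locally, so the update would clobber it. */
    if (strcmp(live_hash, client_hash) != 0)
        return ENTRY_CONFLICT;

    return ENTRY_MODIFY;
}
```

If any entry comes back as ENTRY_CONFLICT, the .Update file should be deleted and only the .Conflict file left behind, as described above.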
There is one full success case, three partial success cases and one failure case:
Full success case: (client won't have to download anything)
Update code: (server has no updates for client and client may or may not have updates for the server)
criteria: the server and client .Manifests are the same version ... can stop immediately!
action: Write a blank .Update file because everything is awesome!
(*dum*dum*dum*dum*dum*... everything is great when you're part of a team! ...)
Delete .Conflict if it exists
Output 'Up To Date' to STDOUT
Partial success cases: (client will have to download some things)
Modify code: (server has modifications for the client)
criteria: the server and client .Manifest are different versions, and the client's .Manifest:
- has files whose version and stored hash are different than the server's,
and the live hash of those files match the hash in the client's .Manifest
action: Append 'M