Assignment 1 - File Encryption with the Java
Cryptography Architecture
This assignment will make use of the theory taught in the first four weeks' lectures
and will teach you how file encryption is performed in reality using the standard
cryptoprimitives provided in the Java Cryptography Architecture. Along the way,
you'll explore issues such as padding, block cipher modes and how to generate
large, reasonably random keys from passwords (or, better still, passphrases). You
can split this task into three easy stages to allow you to check your work, and as
you proceed, these instructions ask you several questions - you should record your
answers because they are part of your assignment submission.
Download the file FileEncryptor.java from iLearn and import it into an Eclipse
Java project called FileEncryptor so that you can work on it.
FileEncryptor.java is the skeleton of a simple file encrypt/decrypt program. On the
command line, the first parameter is "E" or "D" for encrypt or decrypt,
respectively, while the second and third parameters are the input file and output
file, as you'd expect. However, the fourth parameter is not a key, but a password
or passphrase. If a passphrase which includes spaces is used, it must be surrounded
by quotes to stop the shell parsing it as multiple parameters, e.g. "My Super Secret
Passphrase".
Your mission, should you choose to accept it - and you don't really have a lot of
choice - is to complete the program by filling in the missing code which
instantiates objects of the right classes and then calls the appropriate methods.
I created the skeleton by taking the completed, working version, and deleting
code while leaving the explanatory comments - but I've been reasonably careful to
replace multi-line function calls and blocks of code with the same number of
blank lines (there's usually a blank line below the excised code). So if you see a
two-line gap, you can reasonably assume that one line of code will provide the
missing functionality - and if you think a single line of code will fill a 15-line gap,
you should wonder whether you're missing something.
Along the way, you will need to refer to the online documentation for the Java
Cryptography Architecture, which you will find
at http://docs.oracle.com/javase/8/docs/technotes/guides/security/crypto/CryptoSpec.html .
The JavaDoc for the various javax.crypto classes is
at http://docs.oracle.com/javase/8/docs/api/javax/crypto/package-summary.html while the javax.crypto.spec JavaDoc is
at http://docs.oracle.com/javase/8/docs/api/javax/crypto/spec/package-summary.html .
You may also need to refer to the Standard Algorithm Names
at http://docs.oracle.com/javase/8/docs/technotes/guides/security/StandardNames.html
and the Oracle Providers documentation
at http://docs.oracle.com/javase/8/docs/technotes/guides/security/SunProviders.html .
Anything else can be found under the main Security Documentation
at http://docs.oracle.com/javase/8/docs/technotes/guides/security/index.html .
This is a shameless ploy to get you to become at least slightly familiar with the
JCA reference documentation - you are quite likely to need it in the real world, as
well as for this assignment. However, I will provide some overview guidance in
this article.
I've also moved the declarations of the required variables and objects to the
beginning of the main() method - looking at these will give you some valuable
clues.
If you are not a strong Java programmer, you might want to review the skeleton
code first, while referring to the Notes on Coding Style at the end of this
document. However, it should be possible for a non-programmer to work out the
required methods and their arguments like algorithm names and transformations
from the lectures and the write-up that follows.
The Java Cryptography Architecture
The JCA provides a standard interface which Java programmers can use both to
consume cryptographic functionality and to implement it. Notice
the latter point: anybody can develop cryptoprimitives that conform to the JCA
and package them, as JAR files, into what are called providers. In this exercise, we
will be using one of the default providers from the Oracle Java SE SDK - the
SunJCE provider. Others exist and may be preferable if you require some advanced
functionality - a good example is the Australian-developed Bouncy Castle package
found at https://www.bouncycastle.org/.
Because the JCA is highly generic and algorithm-independent - you can utilise
DES, Blowfish or AES with almost-identical code - in many cases you do not
directly instantiate a particular class of cipher. Instead, you call
a generator or factory method to get an instance of the required cipher or other
cryptoprimitive. So the various base classes
- Cipher, SecretKeyFactory, KeyPairGenerator, etc. - all provide a
static getInstance() method which you should call, usually with the name of
the required algorithm as the first parameter.
This technique allows the JCA runtime to search multiple different providers in
order to instantiate the required algorithm, rather than tying your code to a
specific provider.
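To see which providers the runtime will search, a minimal sketch along the following lines can help. The class name ProviderList is made up for this illustration, and the names printed depend entirely on your JDK installation.

```java
import java.security.Provider;
import java.security.Security;
import java.util.ArrayList;
import java.util.List;

class ProviderList {
    // Names of the installed providers, in the order the JCA searches them.
    static List<String> providerNames() {
        List<String> names = new ArrayList<>();
        for (Provider provider : Security.getProviders()) {
            names.add(provider.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (String name : providerNames()) {
            System.out.println(name);
        }
    }
}
```

On an Oracle or OpenJDK runtime you would normally expect SunJCE to appear somewhere in the list, but that is an assumption about your installation rather than a guarantee.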
In practice, just the algorithm name alone is insufficient, as discussed in the
lectures - we also need to specify what mode the cipher will be used in, as well as
a padding mechanism. These options are concatenated with "/" characters as a
delimiter, so we arrive at strings like "AES/GCM/NoPadding", which the JCA
documentation calls transformations.
One of the benefits of strongly typed languages like Java is that many errors can be
discovered either by the compiler, at compile time, or even by the editor of your
IDE. However, in the JCA many different cipher implementations share the same
class or interface - and the actual cipher implementation to be used is specified by
a string parameter specifying the required transformation.
This means non-existent cipher implementations cannot be discovered at compile
time, but only at run time, and so many of the getInstance() calls will need
to be surrounded by try/catch blocks. Fortunately, Eclipse will take care of most of
that work for you.
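As a rough illustration of that point, the sketch below compiles happily whatever string is passed in, and only finds out at run time whether a provider can supply the transformation. The class and method names are hypothetical, and the second transformation in main() is just a deliberately doubtful spelling to try out.

```java
import java.security.NoSuchAlgorithmException;
import javax.crypto.Cipher;
import javax.crypto.NoSuchPaddingException;

class TransformationCheck {
    // Returns true if some installed provider can supply the transformation.
    static boolean isAvailable(String transformation) {
        try {
            Cipher.getInstance(transformation);
            return true;
        } catch (NoSuchAlgorithmException | NoSuchPaddingException e) {
            // The compiler could not have caught this: the name is just a string.
            System.err.println("Cannot create cipher '" + transformation + "': " + e);
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isAvailable("AES/CBC/PKCS5Padding"));
        System.out.println(isAvailable("AES/CBC/PKCS7Padding"));
    }
}
```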
The various cryptoprimitives require different sets of their corresponding
parameters, and so there are supporting classes such
as AlgorithmParameters, KeySpec and its
derivatives SecretKeySpec, PBEKeySpec, RSAPrivateKeySpec, etc. that
allow the programmer to fully specify how they want ciphers configured, keys
generated, etc.
In general, the various algorithms will provide default values if a parameter is not
specified. However, be aware that the various APIs don't like null pointers. So, for
example, if you don't want to specify a salt value for a KeySpec, you can't say
salt = null;
but must write:
salt = new byte[20];
in order to avoid an exception being thrown at run time.
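A small sketch of that behaviour, using PBEKeySpec as the example KeySpec: the class name SaltDemo and the passphrase, iteration count and key length are arbitrary choices for illustration only.

```java
import javax.crypto.spec.PBEKeySpec;

class SaltDemo {
    // Returns true if PBEKeySpec is willing to be constructed with this salt.
    static boolean acceptsSalt(byte[] salt) {
        try {
            new PBEKeySpec("passphrase".toCharArray(), salt, 1000, 128);
            return true;
        } catch (NullPointerException | IllegalArgumentException e) {
            // A null salt raises NullPointerException; an empty one is also rejected.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("null salt accepted:     " + acceptsSalt(null));
        System.out.println("all-zero salt accepted: " + acceptsSalt(new byte[20]));
    }
}
```

Bear in mind that an all-zero salt only keeps the API happy - it provides none of the protection a real salt is meant to give.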
SunJCE
For this exercise, we'll use the standard SunJCE provider - the original Sun Java
Cryptography Engine. All its documentation is in the standard JCA documentation
and JavaDoc linked above, and no installation or configuration is required.
Converting a Passphrase to a Key
We're going to be encrypting and decrypting using AES with a 128-bit key - but
humans are really bad at remembering 128-bit binary strings, so we're letting the
user enter a passphrase instead. As a result, we'll need an algorithm that converts
an arbitrary-length string into a fixed-length binary value.
That really ought to ring bells in your head; we spent a week discussing this type
of algorithm.
Right: we need a hash function. Fortunately, there's an algorithm that is
purpose-built for this task - taking a passphrase and turning it into a key. It's called
PBKDF2, which stands for "Password-Based Key Derivation Function 2", and it's
based on repeated hashing of a single block of input text, typically using SHA-1
(in its keyed form, HMAC-SHA-1) thousands of times over - the first time to hash
the input text, and then to hash the
previous hash value. It can also add in a little salt (something we'll discuss in a
later lecture, in connection with password hashing and storage).
You may have already used PBKDF2 - it's the function that VeraCrypt uses to turn
a passphrase into a key, and it's also the way Wi-Fi access points and routers create
a 256-bit AES key for use in WPA2. When setting up WPA2, you could enter a
64-digit hex string as a specific 256-bit key - or a string up to 63 characters long,
which will be passed through PBKDF2, using 4096 iterations to produce what is
called the Pairwise Master Key. In addition, the network SSID is used as salt, so
that the same passphrase produces different key values on different networks.
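The WPA2 derivation described above can be sketched roughly as follows. This is an illustration only: the class name Wpa2Pmk, the passphrase and the SSIDs are invented, and the algorithm name "PBKDF2WithHmacSHA1" is the one listed on the Standard Algorithm Names page for PBKDF2 with HMAC-SHA-1.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

class Wpa2Pmk {
    // Derive a 256-bit key from a passphrase, using the SSID as salt
    // and 4096 iterations, as described for WPA2 above.
    static byte[] derive(String passphrase, String ssid) {
        try {
            PBEKeySpec spec = new PBEKeySpec(passphrase.toCharArray(),
                    ssid.getBytes(StandardCharsets.UTF_8), 4096, 256);
            return SecretKeyFactory.getInstance("PBKDF2WithHmacSHA1")
                    .generateSecret(spec).getEncoded();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("PBKDF2 not available", e);
        }
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Same passphrase, different (made-up) network names.
        System.out.println(hex(derive("My Super Secret Passphrase", "HomeNet")));
        System.out.println(hex(derive("My Super Secret Passphrase", "CafeNet")));
    }
}
```

Running it should print two different 64-digit hex strings, which is the effect of using the SSID as salt.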
Here's an outline of the missing code:
To create a PBKDF2 key factory, call the getInstance() method
of SecretKeyFactory with the right algorithm name (which you get
from the Standard Algorithm Names page linked above).
Next, you'll need to create a key specification, which sets up the number of
iterations, the required key size, the salt string, and - very importantly - the
passphrase the key will be derived from. There are various types of KeySpec, but
you'll need one for Password-Based Encryption (this is a big hint - I've already
mentioned this type of KeySpec).
Finally, you get the key by calling your key
factory's generateSecret() method with the KeySpec object as parameter.
Notice that the key is returned as a SecretKey, which is an interface that wraps
around the actual key value in order to provide type safety and also deal with
various types of key encodings - it is awkward to do things like attaching a key to
an email or pasting it into a web form if it is in a purely binary representation, and
so it may be saved in formats like ASN.1 DER (Abstract Syntax Notation One,
Distinguished Encoding Rules). To get the actual key as an array of bytes, call
the getEncoded() method on the SecretKey object.
Finally, you've got a key!
Performing Symmetric Encryption
The SunJCE's Cipher implementations support a wide variety of cryptoprimitives,
including DES, RC4, Blowfish and AES along with various modes and padding
types.
To use a Cipher, the general sequence of events is this:
• Create a Cipher of the required type by
calling Cipher.getInstance() while specifying the appropriate
transformation.
• Initialise the Cipher object by calling its init() method, specifying the
encryption mode (encrypt or decrypt), a key spec and optionally
an IVSpec (the concrete class is IvParameterSpec).
• Loop around, calling the cipher object's update() method which returns
blocks of ciphertext.
• Finally, call the cipher object's doFinal() method, which allows it to
perform padding if it has a partial block left over from the previous update.
Note that the JCA Cipher interface does not have
separate encrypt() and decrypt() methods - instead, the
same update() and doFinal() methods are used to both encrypt and decrypt
and the operating mode is set in the init() method call. You can see the static
final constants that are passed to init() in the skeleton code.
The other thing that is passed to init() is a KeySpec - this is a wrapper that goes
around the raw bytes of the key. In this case, you'll need to set up
a SecretKeySpec - take a look at its constructor in the JCA documentation.
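The sequence of getInstance(), init(), update() and doFinal() might look something like the sketch below. It works on byte arrays rather than files, the class name CipherSequence and the 10-byte chunk size are arbitrary, and the all-zero key is a placeholder for demonstration - in the assignment the key bytes come from the passphrase.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

class CipherSequence {
    // mode is Cipher.ENCRYPT_MODE or Cipher.DECRYPT_MODE; rawKey is 16 bytes.
    static byte[] process(int mode, byte[] rawKey, byte[] input) {
        try {
            Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding");
            cipher.init(mode, new SecretKeySpec(rawKey, "AES"));
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            // Feed the data in small chunks, as a file-reading loop would.
            for (int off = 0; off < input.length; off += 10) {
                int len = Math.min(10, input.length - off);
                byte[] chunk = cipher.update(input, off, len);
                if (chunk != null) {   // may be null if no full block is ready yet
                    out.write(chunk, 0, chunk.length);
                }
            }
            byte[] last = cipher.doFinal();   // handles the padded final block
            out.write(last, 0, last.length);
            return out.toByteArray();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("cipher failure", e);
        }
    }

    public static void main(String[] args) {
        byte[] key = new byte[16];   // placeholder key, for demonstration only
        byte[] plain = "The quick brown fox jumps".getBytes(StandardCharsets.UTF_8);
        byte[] crypt = process(Cipher.ENCRYPT_MODE, key, plain);
        byte[] back = process(Cipher.DECRYPT_MODE, key, crypt);
        System.out.println(new String(back, StandardCharsets.UTF_8));
    }
}
```

Notice that the same method serves for both directions; only the mode constant passed to init() changes.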
ECB mode (Version 1)
For the first version of your program, you should implement the encryption using
AES with 128-bit keys in ECB mode and PKCS #5 padding. This means that no
IVSpec will be required.
PKCS #5 padding is discussed in the lecture - it basically means that if the final
block falls n bytes short of the AES 128-bit block size, we fill that space
with n bytes, each with the value n. There's a curious Java weirdness here: PKCS
#5 padding is only defined for ciphers which use an 8-byte - that's 64 bits, of
course - block size, which means DES. PKCS #7 defines this style of padding for
arbitrary block sizes, but the JCA won't accept that as part of a transformation
string, and so we're forced to call it PKCS #5 padding, even though it's really #7.
In any case, the good news is that you don't have to write any code to do the
padding manually, because the Cipher will take care of this for you, if you
specify it as part of the transformation string - that's the point of
the doFinal() method.
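If you would like to see the padding bytes for yourself, one trick (a sketch only, not part of the skeleton) is to encrypt with padding and then decrypt with "NoPadding", so that the pad is left in place. The class name PaddingPeek and the all-zero key are assumptions made for the demonstration.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

class PaddingPeek {
    // Returns the final 16-byte block of the plaintext, with its padding intact.
    static byte[] paddedLastBlock(byte[] plaintext) {
        try {
            SecretKeySpec key = new SecretKeySpec(new byte[16], "AES"); // demo key
            Cipher enc = Cipher.getInstance("AES/ECB/PKCS5Padding");
            enc.init(Cipher.ENCRYPT_MODE, key);
            byte[] ciphertext = enc.doFinal(plaintext);

            // Decrypting without padding leaves the pad bytes visible.
            Cipher dec = Cipher.getInstance("AES/ECB/NoPadding");
            dec.init(Cipher.DECRYPT_MODE, key);
            byte[] padded = dec.doFinal(ciphertext);
            return Arrays.copyOfRange(padded, padded.length - 16, padded.length);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("cipher failure", e);
        }
    }

    public static void main(String[] args) {
        byte[] text = "Hello, world!".getBytes(StandardCharsets.UTF_8); // 13 bytes
        System.out.println(Arrays.toString(paddedLastBlock(text)));
    }
}
```

With a 13-byte input the block falls 3 bytes short, so the last three values printed should each be 3.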
Once you've written your code, test your program, using increasingly long text
files. You may need to insert some statements to print values to System.err so
that you can see what's going on.
Once your program is working, encrypt the file infile20.txt, using a
passphrase of your choosing:
java FileEncryptor E infile20.txt crypto.bin passphrase
The ciphertext is written to the file crypto.bin - open it in Notepad++ or a
similar editor. What do you notice about the ciphertext? Why do you think this
is?
Experiment with encrypting small files of various sizes - say, 50 bytes, 60 bytes
and 64 bytes - and comparing the size of the generated ciphertext file. Record
your results. What do you notice?
CBC Mode With Default IV (Version 2)
Now edit your code to operate in Cipher Block Chaining mode. You will need to
change the transformation string, and also create an IVSpec. Once again, check
the JavaDoc for IVSpec (the class is named IvParameterSpec) - especially its
constructors. Notice that if you do not provide an initialization vector when
encrypting, the Cipher will provide one of its own. However, remember the note about default
values above - passing a null value will generally cause an exception, so for
this version create an explicit all-zero IV as:
byte[] iv = new byte[16];
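A minimal sketch of how such an array is handed to the Cipher follows. It is not the skeleton's code: the class name CbcZeroIv is invented, the key is an all-zero placeholder, and the whole input is processed in a single doFinal() call for brevity.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

class CbcZeroIv {
    // mode is Cipher.ENCRYPT_MODE or Cipher.DECRYPT_MODE; rawKey is 16 bytes.
    static byte[] run(int mode, byte[] rawKey, byte[] data) {
        try {
            byte[] iv = new byte[16];   // Java zero-fills new arrays
            Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
            cipher.init(mode, new SecretKeySpec(rawKey, "AES"), new IvParameterSpec(iv));
            return cipher.doFinal(data);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("cipher failure", e);
        }
    }

    public static void main(String[] args) {
        byte[] key = new byte[16];   // placeholder key, for demonstration only
        byte[] plain = "attack at dawn".getBytes(StandardCharsets.UTF_8);
        byte[] crypt = run(Cipher.ENCRYPT_MODE, key, plain);
        byte[] back = run(Cipher.DECRYPT_MODE, key, crypt);
        System.out.println(new String(back, StandardCharsets.UTF_8));
    }
}
```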
Once you have completed your program and tested it with some small text files,
try encrypting infile20.txt once again and viewing the ciphertext in
Notepad++. Does it look different?
Once again, try encrypting small files and comparing the size of the input
plaintext and generated ciphertext. Record your results - what do you find?
Obviously, decryption has to be performed with the same key (i.e. the same
passphrase) but also with the same initialization vector. How is the initialization
vector being passed between encryption and decryption? Use the getIV() method
of the Cipher interface to get the actual default initialization vector being used and
print it - do you think this is particularly secure?
Final Challenge: CBC Mode With Random IV (Version 3)
The answer to the last question above should suggest a further weakness with
using a default IV - setting aside the obviously problematic value, every file is
being encrypted using the same IV value and quite possibly the same passphrase,
leaving us increasingly open to a known-ciphertext attack as we encrypt more and
more files.
We really need a randomly-generated IV. Fortunately, the JCA provides a
cryptographically adequate pseudo-random number generator class,
in SecureRandom. Once you've created a 16-byte array, iv, as above, getting it
filled in with a random value is as easy as:
SecureRandom random = new SecureRandom();
random.nextBytes(iv);
Now modify your CBC-mode file encryption program so that, instead of using the
same constant value for an IV, it uses a "genuinely" random IV.
But before you copy and paste the two lines above into your program and set
about compiling it, consider that question asked in the previous task: How is the
initialization vector being passed between encryption and decryption? Whenever
you encrypt a file, a new, random value for the IV is generated - and the file has to
be decrypted using the same IV value. If this is going to work, you're going to
have to persistently save the IV somewhere.
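One common convention - and it is only one possibility, the choice is yours - is to store the IV in front of the ciphertext, since the IV does not need to be kept secret. The sketch below does this with byte arrays rather than files; the class name IvPrefix and the all-zero demonstration key are assumptions for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

class IvPrefix {
    private static final int IV_LEN = 16;   // AES block size in bytes

    // Output layout: 16 bytes of IV, followed by the ciphertext.
    static byte[] encrypt(byte[] rawKey, byte[] plaintext) {
        try {
            byte[] iv = new byte[IV_LEN];
            new SecureRandom().nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(rawKey, "AES"),
                    new IvParameterSpec(iv));
            byte[] ciphertext = cipher.doFinal(plaintext);
            byte[] message = new byte[IV_LEN + ciphertext.length];
            System.arraycopy(iv, 0, message, 0, IV_LEN);
            System.arraycopy(ciphertext, 0, message, IV_LEN, ciphertext.length);
            return message;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("cipher failure", e);
        }
    }

    // Reads the IV back from the first 16 bytes, then decrypts the rest.
    static byte[] decrypt(byte[] rawKey, byte[] message) {
        try {
            Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
            cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(rawKey, "AES"),
                    new IvParameterSpec(message, 0, IV_LEN));
            return cipher.doFinal(message, IV_LEN, message.length - IV_LEN);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("cipher failure", e);
        }
    }

    public static void main(String[] args) {
        byte[] key = new byte[16];   // placeholder key, for demonstration only
        byte[] message = encrypt(key, "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(decrypt(key, message), StandardCharsets.UTF_8));
    }
}
```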
Test your program by encrypting infile20.txt and decrypting the resultant
ciphertext - obviously, you should get back the same file. Now, repeat the test of
encrypting some small files and comparing the sizes of the generated ciphertext
files. Your ciphertext should be a bit larger - and you should know exactly how
much larger and why.
Hopefully, you can see that initialization vectors aren't as straightforward as
people often naively assume. In fact, many secure systems have failed, not because
DES was cracked or AES was cracked, but because a naive programmer chose a
weak way of generating initialization vectors. And you should also be able to see
why initialization vectors for encrypting disk sectors are particularly challenging,
as discussed in the lecture, and we have to come up with schemes like ESSIV and
XTS.
Notes on Coding Style
Some general comments on coding style - for this demo program, I opted to let
Eclipse generate try/catch blocks around various method calls, then replaced
the generated printStackTrace() with a call to a routine which prints a more
meaningful error message - in particular, a message that gives a clue as to where
things went pear-shaped. This is usually followed by a call to System.exit() to
exit the program, setting the errorlevel to an appropriate value. I use this
technique a lot in "exploratory" programming, but for larger and more polished
programs I tend to use larger try/catch blocks and throw to a higher-level
error handler.
The main processing loop is written in a slightly verbose style, just to make what
is happening clear to the less experienced programmers. The paradigm used is:
read_from_file()
while (something was successfully read) {
encrypt_what_was_read()
write_what_was_encrypted()
read_from_file()
}
perform_final_processing()
write_final_block()
However, it's quite common for C/C++/Java programmers to combine the
read_from_file() with the while loop test. In this example, it would read like this:
while ((bytesRead = in.read(inputBuffer)) > 0) {
This paradigm would make the program a few lines shorter, as both calls
to in.read() are consolidated into a single call.
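For reference, here is a small self-contained sketch of the combined form, copying bytes unchanged from one stream to another so that only the loop structure is on show. The class name ReadLoop and its helper methods are made up for the illustration; in the real program the encryption step would sit inside the loop body.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

class ReadLoop {
    // Copies everything from in to out and returns the number of bytes copied.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] inputBuffer = new byte[128];
        long total = 0;
        int bytesRead;
        // The read and the loop test are combined into one expression.
        while ((bytesRead = in.read(inputBuffer)) > 0) {
            out.write(inputBuffer, 0, bytesRead);   // only the bytes actually read
            total += bytesRead;
        }
        return total;
    }

    // Convenience wrapper for in-memory data.
    static byte[] copyBytes(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            copy(new ByteArrayInputStream(data), out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(copyBytes(new byte[300]).length + " bytes copied");
    }
}
```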
Notice that the file I/O reads binary blocks of data, rather than using a stream
reader to read lines of text. There are two reasons for this: firstly, the program
cannot be restricted to simply encrypting text files - you might want to encrypt an
Excel spreadsheet, a JPEG image, or any other kind of data. And secondly, the
ciphertext written to the encrypted file will definitely not be text - and the
program needs to be able to read it back in to decrypt it.
You may have noticed that I've used buffered streams for file input and output.
This means that the block size for reading and writing has virtually no impact on
performance: reading lots of small blocks of data will not cause lots of slow disk
accesses because the buffered streams hold the data in memory between calls. The
buffer size of 128 bytes
set at the beginning of the program can be changed arbitrarily.
Please upload your Java source code file, FileEncryptor.java, along with a text file
called Assignment1.txt containing the answers to the following questions:
1. For version 1 of the program, operating in ECB mode:
a) What did you notice about the ciphertext, and why do you think that was?
b) What did you notice about the size of the generated ciphertext files? Why was
this?
2. For version 2 of the program, operating in CBC mode with default IV:
a) What did you notice about the size of the generated ciphertext files?
b) How is the initialization vector being passed between encryption and
decryption?
c) Is this particularly secure?
3. For version 3 of the program, operating in CBC mode with a random IV:
a) What did you notice about the size of the generated ciphertext files?
The upload dialog will only accept filetypes of .java and .txt - no Word documents
or PDFs please!