HW 4 - My Computer is Now Better at Blackjack Than I Am
ENGR 3H
Due 11:59 PM, Friday, December 6
1 Deal or No Deal? (40 Points)
A quick refresher on the rules of blackjack for those who might not be familiar:
the goal is to get a total value on your cards that is as close to 21 as possible
without going over 21. Cards have the same value as the number shown on the
card, except for face cards (jacks, queens, and kings), which are all worth 10,
and aces, which are worth either 1 or 11, depending on what’s better for the
player. Play starts with the player and the dealer each receiving 2 cards; the
player can only see one of the dealer’s cards. The player then chooses whether
to receive another card or keep their current cards; they continue to do this
until they decide to keep their cards or exceed 21, at which point the dealer’s
other card is revealed. The dealer then takes another card while their total is
under 17. If the dealer’s score is 17 or more, the dealer stops taking cards and
compares scores with the player. If the player has a total that is higher than
the dealer’s total (or the dealer exceeded 21), they win! If it’s less, they lose. A
tie is called a push, and no one makes any money.
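The ace rule above (worth 1 or 11, whichever is better for the player) is easy to compute: count every ace as 1, then upgrade one ace to 11 when that doesn't bust the hand. A minimal MATLAB sketch (the function name handValue and the 1..13 card encoding are my own choices, not part of the assignment):

```matlab
% Hand total under the rules above. cards is a vector of values 1..13
% (1 = ace, 11/12/13 = jack/queen/king). Faces count 10; one ace may
% count as 11 when that doesn't push the total past 21 (two aces at 11
% would already be 22, so at most one ever upgrades).
function total = handValue(cards)
    total = sum(min(cards, 10));          % faces -> 10, aces -> 1
    if any(cards == 1) && total + 10 <= 21
        total = total + 10;               % upgrade one ace to 11
    end
end
```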
We’re going to use Q-learning to try to teach our computers to play blackjack.
1.1 Choose Your Weapon
We can make the Q-learning as complicated or as simple as we like; for this assignment, you are only required to implement a simplified version of blackjack, but you are allowed and encouraged to try for better performance by increasing
the complexity once you’ve got the basics working. Blackjack is a great game
for simple Q-learning because its action space is only two actions: stay, meaning
the player keeps their current card total, and hit, meaning the player gets an
additional card. We can define the state space in a number of different ways,
with increasing complexity:
1. The player’s current point total;
2. The player’s current point total and the number of aces the player has
(since this can affect the point total);
3. The player’s current point total, the number of aces the player has, and
the card that is visible from the dealer’s hand;
4. The specific cards the player has as well as what is visible from the dealer’s
hand.
NOTE: Unless you’re feeling really comfortable, I wouldn’t try the last of these;
since the number of cards the player has changes, this can get very messy to deal
with. Not impossible, but not fun. Everyone is responsible for implementing
Q-learning using at least the first option (the player’s current point total), but
you may try to implement any of the other options instead.
Now that we’ve defined our state-action space, let’s look at how we’re going
to put together our code.
1.2 Initialize, Shuffle, and Deal (15 Points)
We need to start by initializing a bunch of variables and setting up our iteration loop. We want to train our Q-learning agent by playing N games of blackjack. Initially, we’re going to favor exploration; we’ll set our initial exploration likelihood ε = 1, but each game we’re going to adjust it by a factor of β as ε_new = βε. I recommend choosing a value of β very close to 1 (> .99)
in the interest of making sure you fully explore the space. We also need to set
our discounting value γ; this should be between 0 and .99. We also need an
update rate, α, which should be between 0 and 1. We also need to initialize our
Q-Table; we’ll set all the initial values to 0.
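Concretely, the initialization described above might look like the following MATLAB sketch for state option 1 (point totals only). The specific values of N, β, γ, and α are illustrative choices within the ranges given, not required ones:

```matlab
N       = 100000;   % number of training games (your choice)
epsilon = 1;        % initial exploration likelihood
beta    = 0.9999;   % per-game decay for epsilon, very close to 1 (> .99)
gamma   = 0.95;     % discount, between 0 and .99
alpha   = 0.1;      % update rate, between 0 and 1
% Q-table for state option 1: rows = player totals 1..21,
% columns = actions (1 = stay, 2 = hit), all initialized to 0
Q = zeros(21, 2);
for game = 1:N
    % ... play one game (dealer's turn, then player's turn) ...
    epsilon = beta * epsilon;   % decay exploration after each game
end
```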
For simplicity, we’re going to start each game with the dealer’s turn (this
will allow us to immediately check the value when we stay on the player’s turn).
First, shuffle your deck; you will find the MATLAB function randperm useful
here. Now deal 2 cards to the dealer, compute their total, and decide if the
dealer will take the next card in the deck or stay. Continue to do this until the
dealer has a total of 17 or more, remembering they bust if the total is above 21. You
may wish to set values above 21 to -1 for easier comparison later.
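The shuffle-and-dealer step above could be sketched as follows; handValue stands for whatever hand-total helper you write that applies the ace rule (not shown here), and deckPos tracks the next undealt card:

```matlab
% One suit repeated 4 times: 1 = ace, 11-13 = face cards
deck = repmat(1:13, 1, 4);
deck = deck(randperm(52));              % shuffle the 52-card deck
dealerCards = deck(1:2);                % deal 2 cards to the dealer
deckPos = 3;                            % next card to be dealt
dealerTotal = handValue(dealerCards);   % ace-aware hand total (your helper)
while dealerTotal < 17                  % dealer hits while under 17
    dealerCards(end+1) = deck(deckPos);
    deckPos = deckPos + 1;
    dealerTotal = handValue(dealerCards);
end
if dealerTotal > 21
    dealerTotal = -1;                   % bust flag for easy comparison later
end
```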
1.3 Time to Learn (25 Points)
Now the player needs to play the game and learn from it. The initial state the
player is in is defined by the first two cards they are dealt. From there, the
player selects an action either randomly (with probability ε) or based on the
best available action in their current state. Then, we update the reward table
using the Q-learning update equation:
Q(s_curr, a_curr) = (1 − α) Q(s_curr, a_curr) + α [ r + γ max_a Q(s_new, a) ]. (1)
Here, r is the reward acquired by taking the action a_curr from the state s_curr
and we consider the estimated best action from the resulting state, discounted
by a factor of γ. The reward for going bust is -1, while the reward for staying
and winning is +1. The reward for staying and losing is also -1, while every
other action has a reward of 0. An example update proceeds as follows:
1. We begin with a 5 and an 8, so our player’s score is 13.
2. We randomly select to take another card instead of staying. Our current
state-action space coordinates are (13, hit).
3. The new card’s value is 9.
4. Our new total is 22, which is greater than 21. As a result, we’ve lost. The
reward associated with taking the action we took, therefore, was -1. Since
there are no more actions to take, we don’t need to worry about our new
position in the state-action space; we just update with r = −1. So we
have Q(13, hit) = (1 − α) Q(13, hit) + α(−1).
5. Suppose instead of a 9, we had gotten an 8. Then, we’d have reached 21.
At 21, staying always produces a better value than hitting (you always
lose by hitting, you either win or push by staying), so the action with the
best value from the new state is to stay. Then, our update would have
looked like Q(13, hit) = (1 − α) Q(13, hit) + α (0 + γ Q(21, stay)).
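Putting the ε-greedy choice and update (1) together, the player's turn might look like the sketch below. It assumes the variables set up earlier (Q, epsilon, alpha, gamma, the shuffled deck with deckPos pointing at the next undealt card, and dealerTotal with -1 marking a dealer bust), plus an ace-aware hand-total helper, here called handValue:

```matlab
playerCards = deck(deckPos:deckPos+1);  % deal the player 2 cards
deckPos = deckPos + 2;
s = handValue(playerCards);             % current state: point total
done = false;
while ~done
    if rand < epsilon
        a = randi(2);                   % explore: random action
    else
        [~, a] = max(Q(s, :));          % exploit: best known action
    end
    if a == 1                           % stay: compare with the dealer
        if dealerTotal == -1 || s > dealerTotal
            r = 1;                      % dealer busted, or we beat them
        elseif s < dealerTotal
            r = -1;                     % stayed and lost
        else
            r = 0;                      % push
        end
        Q(s, a) = (1 - alpha) * Q(s, a) + alpha * r;   % terminal update
        done = true;
    else                                % hit: draw another card
        playerCards(end+1) = deck(deckPos);
        deckPos = deckPos + 1;
        sNew = handValue(playerCards);
        if sNew > 21                    % bust: terminal, r = -1
            Q(s, a) = (1 - alpha) * Q(s, a) + alpha * (-1);
            done = true;
        else                            % r = 0 plus discounted lookahead
            Q(s, a) = (1 - alpha) * Q(s, a) ...
                      + alpha * (gamma * max(Q(sNew, :)));
            s = sNew;
        end
    end
end
```

Note that terminal updates (staying or busting) drop the γ max term, since there is no new state to look ahead from, exactly as in the worked example above.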
2 Let’s Play a Game (10 Points)
Now that you’ve completed teaching your computer to play blackjack, let’s see
how it does. Either in a separate script or after training in the same script you
used for Q-learning, simulate playing 100 games of blackjack using your learned
policy. Compute and display your total winnings, remembering you lose 1 if
you lose and gain 1 if you win.
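Scoring the learned policy might look like the sketch below; playGreedyGame is a hypothetical helper that reuses your training-loop game code (shuffle, dealer's turn, player's turn) but always picks the greedy action with [~, a] = max(Q(s, :)) instead of exploring, and returns +1 for a win, -1 for a loss, and 0 for a push:

```matlab
winnings = 0;
for game = 1:100
    winnings = winnings + playGreedyGame(Q);  % hypothetical helper, see above
end
fprintf('Total winnings over 100 games: %+d\n', winnings);
```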