Tic Tac Toe notes

Checkpoint 1 Notes

Which class is responsible for each of the following behaviors? For each, explain how the behavior is accomplished.

Checking to see whether the game is over

The game class checks if the game is over using the is_over method.

Determining which actions are available at a particular state

The game class also does this, through the get_actions method.
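As a rough sketch of how these two game-class behaviors could work: the code below assumes the state is a dictionary whose "board" entry is a list of nine strings, with "-" marking an empty space. The "-" marker and the helper names WIN_LINES and get_winner are assumptions made for illustration, not the lab's actual code.

```python
# All eight lines of three spaces that win the game (rows, columns, diagonals).
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),
    (0, 3, 6), (1, 4, 7), (2, 5, 8),
    (0, 4, 8), (2, 4, 6),
]

def get_winner(state):
    """Return "X" or "O" if that player has three in a row, else None."""
    board = state["board"]
    for a, b, c in WIN_LINES:
        if board[a] != "-" and board[a] == board[b] == board[c]:
            return board[a]
    return None

def get_actions(state):
    """Return the indexes of the empty spaces, which are the legal moves."""
    return [i for i, mark in enumerate(state["board"]) if mark == "-"]

def is_over(state):
    """The game ends when someone has won or no empty spaces remain."""
    return get_winner(state) is not None or not get_actions(state)

# Assumed example position: X in the top left, O next to it.
state = {"board": ["X", "O", "-",
                   "-", "-", "-",
                   "-", "-", "-"]}
print(get_actions(state))  # [2, 3, 4, 5, 6, 7, 8]
print(is_over(state))      # False
```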

Showing the board

The view class does this through the print_board method.

Choosing which action to play on a turn

The player class contains the choose_action method, which lets the player choose which space to play.

Checkpoint 2 Notes

I was able to get the game to correctly determine who the winner was. I understand that I'm accessing an array, and that indexes 0 through 8 correspond to the spaces on the board, starting at the top left, going across the row, then continuing from the left of the next row, and so on. What I do not understand is what state is and why we need to use state["board"][index] to access something on the board.
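One way to picture state, offered as a guess rather than the lab's actual definition: it is a dictionary that bundles everything about the game at one moment, and the board is only one entry in it. The "player_x" key and the "-" empty marker below are assumptions for illustration.

```python
# Hypothetical state: a dictionary holding everything about the current position.
state = {
    "board": ["X", "O", "-",
              "-", "X", "-",
              "-", "-", "O"],
    "player_x": True,  # assumed extra entry: whose turn it is
}

board = state["board"]      # first look up the list stored under the "board" key
top_left = board[0]         # then index into that list
center = state["board"][4]  # both steps written as one expression

print(top_left, center)  # X X
```

Under this picture, state["board"] picks the board out of the dictionary by name, and [index] then picks one space out of that list. Indexing the dictionary directly with a number, as in state[4], would fail because the dictionary has no entry stored under the key 4.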

TTT Strategy

For each of the following board states, if you are playing as X and it's your turn, which action would you take? Why?

     0 | O | O      0 | 1 | O      0 | X | 2      X | O | 2
    ---+---+---    ---+---+---    ---+---+---    ---+---+---
     X | X | 5      3 | X | 5      X | O | O      3 | 4 | 5
    ---+---+---    ---+---+---    ---+---+---    ---+---+---
     6 | 7 | 8      6 | 7 | O      6 | 7 | 8      6 | 7 | 8

1 - You should play in space 5, since you'll win the game.

2 - You should play in space 5, since that will block the O player from winning and also give you two in a row.

3 - You should play in space 0, since that will give you two chances to win on your next move and the other player can only block one of them.

4 - Play in space 4. This blocks the only line the other player can still complete with the O they placed (spaces 1, 4, and 7).
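The reasoning in the first three answers can be checked with a short sketch. It assumes a board stored as a list of nine strings with "-" for an empty space; the function names winning_moves and fork_moves are made up for this example.

```python
# All eight lines of three spaces that win the game.
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),
    (0, 3, 6), (1, 4, 7), (2, 5, 8),
    (0, 4, 8), (2, 4, 6),
]

def play(board, index, player):
    """Return a copy of the board with `player` placed at `index`."""
    return board[:index] + [player] + board[index + 1:]

def winning_moves(board, player):
    """Empty spaces where `player` would complete three in a row."""
    moves = []
    for i in range(9):
        if board[i] != "-":
            continue
        trial = play(board, i, player)
        if any(all(trial[j] == player for j in line) for line in WIN_LINES):
            moves.append(i)
    return moves

def fork_moves(board, player):
    """Empty spaces that leave `player` with two or more winning moves."""
    return [i for i in range(9)
            if board[i] == "-"
            and len(winning_moves(play(board, i, player), player)) >= 2]

board1 = ["-", "O", "O", "X", "X", "-", "-", "-", "-"]
board2 = ["-", "-", "O", "-", "X", "-", "-", "-", "O"]
board3 = ["-", "X", "-", "X", "O", "O", "-", "-", "-"]

print(winning_moves(board1, "X"))  # [5]  X wins immediately
print(winning_moves(board2, "O"))  # [5]  the space X has to block
print(fork_moves(board3, "X"))     # [0]  creates two threats at once
```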

Initial game state

You can get the initial game state using game.get_initial_state(). What is the current and future reward for this state? What does this mean?

I think the current reward is 1, meaning X wins, and the future reward is 0. This means that at the beginning of the game, X, who is the first player, is likely to win, but after the computer tests all of its cases, it's likely that the game will end in a draw. This makes sense to me. It wasn't too difficult to tie with the computer; however, it was difficult to beat the computer.
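The idea of a future reward can be illustrated with a small exhaustive search (minimax). This is a sketch under assumptions, not the lab's code: the scoring convention (+1 when X has won, -1 when O has won, 0 otherwise), the tuple board with "-" for empty spaces, and the names reward and future_reward are all made up for this example.

```python
from functools import lru_cache

# All eight lines of three spaces that win the game.
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),
    (0, 3, 6), (1, 4, 7), (2, 5, 8),
    (0, 4, 8), (2, 4, 6),
]

def reward(board):
    """Assumed scoring: +1 if X has three in a row, -1 if O does, else 0."""
    for a, b, c in WIN_LINES:
        if board[a] != "-" and board[a] == board[b] == board[c]:
            return 1 if board[a] == "X" else -1
    return 0

@lru_cache(maxsize=None)  # remember positions that were already searched
def future_reward(board, x_to_move=True):
    """Reward at the end of the game if both players always pick their best move."""
    current = reward(board)
    empties = [i for i, mark in enumerate(board) if mark == "-"]
    if current != 0 or not empties:
        return current  # game over: someone won or the board is full
    mark = "X" if x_to_move else "O"
    outcomes = [
        future_reward(board[:i] + (mark,) + board[i + 1:], not x_to_move)
        for i in empties
    ]
    # X tries to maximize the final reward, O tries to minimize it.
    return max(outcomes) if x_to_move else min(outcomes)

empty_board = ("-",) * 9
print(future_reward(empty_board))  # 0
```

The search returns 0 for the empty board, meaning that best play from both sides ends in a draw. That matches the experience of being able to tie the computer but not beat it.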
