generated from mwc/lab_tic_tac_toe
Checkpoint 3

What I changed: I changed the computer to use the lookahead strategy rather than the random picker strategy.

Why I changed it: It's boring to play against the computer when all it does is pick a random available spot. When the computer plays intelligently, it's more interesting and difficult.

Estimate for remaining time to finish assignment: 30 minutes to an hour
parent 0dbc59d637
commit c2ae97588a

notes.md | 16

@@ -21,20 +21,24 @@ an array that I'm accessing, and that indexes 0 through 9 correspond to the diff
board starting on the top left and going across the row, then down to the left of the next row, etc.
What I do not understand is what state is and why we need to use state["board"][index] to access
something on the board.
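
One way to picture that lookup is a dict holding a flat list. The sketch below is only an illustration, assuming state is a dict whose "board" entry is a list of nine cells; the exact keys and the use of None for empty squares are assumptions, not the lab's actual code.

```python
# A guess at the shape of `state`, just for illustration (the real lab code may
# use different keys or a different empty-cell marker):
state = {
    "board": [None] * 9,  # nine squares, indexed 0-8: top row 0-2, middle 3-5, bottom 6-8
    "player_x": True,     # assumed extra key: whose turn it is
}

# state["board"] pulls the list out of the dict; [4] then picks one square of it.
state["board"][4] = "X"

# Printing row by row shows why index 4 is the centre square.
for row_start in (0, 3, 6):
    print(state["board"][row_start:row_start + 3])
```

The dict wrapper would let the game track more than just the board (at least whose turn it is), which is presumably why the lab passes a whole state through game.get_next_state rather than a bare board.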

### TTT Strategy

For each of the following board states, if you are playing as X
and it's your turn, which action would you take? Why?

 0 | O | O    0 | 1 | O    0 | X | 2    X | O | 2
---+---+---  ---+---+---  ---+---+---  ---+---+---
 X | X | 5    3 | X | 5    X | O | O    3 | 4 | 5
---+---+---  ---+---+---  ---+---+---  ---+---+---
 6 | 7 | 8    6 | 7 | O    6 | 7 | 8    6 | 7 | 8

1 - You should play in space 5, since you'll win the game.
2 - You should play in space 5, since that will block the O player from winning and also give you two in a row.
3 - You should play in space 0, since that will give you two chances to win on your next move and the other player can only block one of them.
4 - Play in space 4. This blocks the other player from using the O they placed.

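Answers 1 and 2 come down to the same mechanical check: is there a square that completes three in a row for you, or would complete it for the opponent? Here is a rough sketch of that check, assuming (as above) a flat-list board indexed 0-8 with None in the empty squares; this is my illustration, not the lab's code.

```python
LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
    (0, 4, 8), (2, 4, 6),              # diagonals
]

def winning_square(board, player):
    "Return an index that would complete three in a row for `player`, or None."
    for a, b, c in LINES:
        cells = [board[a], board[b], board[c]]
        if cells.count(player) == 2 and cells.count(None) == 1:
            return (a, b, c)[cells.index(None)]
    return None

# Board 1 from above: X at 3 and 4, so space 5 wins outright.
board1 = [None, "O", "O", "X", "X", None, None, None, None]
print(winning_square(board1, "X"))   # -> 5

# Board 2: X has no winning square, but O would win at 5, so X must take it.
board2 = [None, None, "O", None, "X", None, None, None, "O"]
print(winning_square(board2, "X"))   # -> None
print(winning_square(board2, "O"))   # -> 5
```

The fork in answer 3 is the same idea one move deeper: choose a square that creates two of these winning squares at once, so the opponent can only block one.
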
### Initial game state

You can get the initial game state using game.get_initial_state().
What is the current and future reward for this state? What does this mean?

I think the current reward is 1, meaning X wins, and the future reward is 0. This means that at the beginning of the game, X, who is the first player, is likely to win, but after the computer tests all of its cases, it's likely that the game will end in a draw. This makes sense to me. It wasn't too difficult to tie with the computer; however, it was difficult to beat the computer.

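To make the "future reward" part concrete: the future reward of a state is the result the side to move can force if both players play as well as possible. The sketch below is my own illustration, not the lab's LookaheadStrategy; the flat-list board and the +1/0/-1 scoring from X's point of view are assumptions about its conventions. On the empty starting board it returns 0, matching the "best play ends in a draw" conclusion above.

```python
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board):
    "Return 'X' or 'O' if someone has three in a row, else None."
    for a, b, c in LINES:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a]
    return None

def future_reward(board, to_move="X"):
    "Value of the position for X (+1 X wins, -1 O wins, 0 draw) with best play."
    won = winner(board)
    if won is not None:
        return 1 if won == "X" else -1
    open_squares = [i for i, cell in enumerate(board) if cell is None]
    if not open_squares:
        return 0                                   # board full, nobody won: a draw
    next_player = "O" if to_move == "X" else "X"
    values = []
    for square in open_squares:
        board[square] = to_move                    # try the move...
        values.append(future_reward(board, next_player))
        board[square] = None                       # ...then undo it
    # X picks the best value for X, O picks the worst value for X.
    return max(values) if to_move == "X" else min(values)

# Takes a few seconds: it walks the whole game tree from the empty board.
print(future_reward([None] * 9))                   # -> 0
```

An exhaustive try-every-move search like this is essentially why it is easy to tie with the lookahead computer but hard to beat it.
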

@@ -3,7 +3,7 @@ from ttt.view import TTTView
 from ttt.player import TTTHumanPlayer, TTTComputerPlayer
 
 player0 = TTTHumanPlayer("Player 1")
-player1 = TTTHumanPlayer("Player 2")
+player1 = TTTComputerPlayer("Player 2")
 game = TTTGame()
 view = TTTView(player0, player1)
 
@@ -12,4 +12,4 @@ view.greet()
 while not game.is_over(state):
     action = view.get_action(state)
     state = game.get_next_state(state, action)
-    view.conclude(state)
+view.conclude(state)

@@ -1,5 +1,6 @@
 from click import Choice, prompt
 from strategy.random_strategy import RandomStrategy
+from strategy.lookahead_strategy import LookaheadStrategy
 from ttt.game import TTTGame
 import random
 
@@ -24,7 +25,7 @@ class TTTComputerPlayer:
     def __init__(self, name):
         "Sets up the player."
         self.name = name
-        self.strategy = RandomStrategy(TTTGame())
+        self.strategy = LookaheadStrategy(TTTGame())
 
     def choose_action(self, state):
         "Chooses a random move from the moves available."
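
Why a one-line change in __init__ is enough to change how the computer plays: the player just holds a strategy object and delegates move selection to it, so constructing a different strategy swaps the whole decision procedure. Below is a rough sketch of that pattern with illustrative names only; the real mwc strategies are constructed with a game object (as the diff shows) and presumably work with the full game state rather than a bare board.

```python
import random

class RandomPicker:
    "Stand-in for RandomStrategy: any open square is as good as another."
    def choose_action(self, board):
        return random.choice([i for i, cell in enumerate(board) if cell is None])

class ComputerPlayerSketch:
    def __init__(self, name, strategy):
        self.name = name
        self.strategy = strategy   # inject RandomPicker() or a lookahead object here

    def choose_action(self, board):
        # The player never changes; only the injected strategy decides differently.
        return self.strategy.choose_action(board)

player = ComputerPlayerSketch("Player 2", RandomPicker())
print(player.choose_action([None, "X", None, "O", None, None, None, None, None]))
```

That separation is why nothing else in TTTComputerPlayer or the game loop had to change.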