Game playing
What can machine learning do?
What is (still) hard?
Various types of games
• Board games
• Card games
• Real-time games
Some historical developments
Why Games?
Games are an ideal environment for testing AI/ML systems
• Progress / performance can easily be measured
• Environment can easily be controlled
Machine Learning for Game Playing
A long history, almost as old
as AI itself
Arthur Samuel
• Playing checkers
• (late 1950s, early 1960s)
• Several interesting ideas
and techniques
• Now, Chinook (without
learning) is world champion
State of the art
Solved
• Tic-tac-toe, Connect Four, Go-Moku
• Endgames: chess (5 pieces), checkers (8 pieces)
Modelling the opponent
• Given the optimal strategy
• Find a strategy that better exploits the opponent
MENACE (Michie, 1963)
Learns Tic-Tac-Toe
• 287 boxes
(one for every board position)
• 9 bead colours (one for each square)
Example board:
O X O
  X
Algorithm:
• Choose the box matching the current position
• Draw a bead at random from the box
• Make the corresponding move
Learning:
• Lost game -> beads that were played are not returned (negative reinforcement)
• Won game -> add an extra bead to each box from which
a bead was taken (positive reinforcement)
[Flow diagram: X to move → look up the box for the current board → select a bead → make the corresponding move]
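The algorithm and learning rules above can be sketched in code. A minimal sketch: the state encoding, initial bead counts, and the floor of one bead per move (so a box never empties) are illustrative assumptions, not part of Michie's original design.

```python
import random
from collections import defaultdict

class Menace:
    """Matchbox-style learner for Tic-Tac-Toe (a sketch of MENACE).

    Each board state maps to a "box" of beads, one colour per legal
    move; a move is drawn with probability proportional to its beads.
    """

    def __init__(self, initial_beads=3):
        self.boxes = defaultdict(lambda: defaultdict(lambda: initial_beads))
        self.history = []  # (state, move) pairs played this game

    def choose_move(self, state, legal_moves):
        box = self.boxes[state]
        weights = [box[m] for m in legal_moves]
        move = random.choices(legal_moves, weights=weights)[0]
        self.history.append((state, move))
        return move

    def learn(self, won):
        for state, move in self.history:
            if won:
                self.boxes[state][move] += 1  # add an extra bead
            else:
                # beads played in a lost game are not returned
                self.boxes[state][move] = max(1, self.boxes[state][move] - 1)
        self.history = []
```

Usage: call `choose_move` for each position during a game, then `learn(won=True)` or `learn(won=False)` once the outcome is known.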
Arthur Samuel’s Checkers Player
Rote learning
• Learning by heart: memorizing positions and their values
• Minimax search with alpha-beta pruning
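Minimax with alpha-beta pruning can be sketched generically. The `successors` and `evaluate` callbacks are assumed to be supplied by the game: the first returns the child positions, the second scores a leaf from the maximizing player's point of view.

```python
def alphabeta(state, depth, alpha, beta, maximizing, successors, evaluate):
    """Minimax search with alpha-beta pruning (generic sketch)."""
    children = successors(state)
    if depth == 0 or not children:
        return evaluate(state)
    if maximizing:
        value = float("-inf")
        for child in children:
            value = max(value, alphabeta(child, depth - 1, alpha, beta,
                                         False, successors, evaluate))
            alpha = max(alpha, value)
            if alpha >= beta:   # beta cutoff: opponent will avoid this branch
                break
        return value
    else:
        value = float("inf")
        for child in children:
            value = min(value, alphabeta(child, depth - 1, alpha, beta,
                                         True, successors, evaluate))
            beta = min(beta, value)
            if alpha >= beta:   # alpha cutoff
                break
        return value
```

The cutoffs prune branches that cannot influence the root value, which is what makes deep search in games like checkers and chess feasible.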
Minimax Search / KnightCap
Temporal difference learning
Backgammon
Elements of chance
TD-Gammon (Tesauro)
Plays at a very high level
Has changed the strategies of human players
Why does it work?
• Deep search does not seem to be very useful (due
to the element of chance introduced by the dice)
• Positions can be compactly represented by a
neural net over a reasonable set of features
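TD-Gammon trains its evaluation by temporal-difference learning. A minimal tabular TD(0) update sketches the core idea of learning from successive position values; TD-Gammon itself used TD(λ) with a neural network rather than a table, so this is only an illustration of the update rule.

```python
def td0_update(values, state, next_state, reward, alpha=0.1, gamma=1.0):
    """One tabular TD(0) step: V(s) += alpha * (r + gamma*V(s') - V(s)).

    `values` is a dict mapping states to estimated values; unseen
    states default to 0.0.
    """
    td_error = (reward + gamma * values.get(next_state, 0.0)
                - values.get(state, 0.0))
    values[state] = values.get(state, 0.0) + alpha * td_error
    return values[state]
```

Each move in a self-play game yields one such update: the value of the previous position is nudged toward the value of the position that followed it.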
KnightCap (Baxter et al. 2000)
Learns chess
• From 1650 Elo (beginner)
to 2150 Elo (master level)
in about 300 Internet games
Improvements over TD-Gammon:
• Integration of TD-learning with search
• Training against real opponents instead of against
itself
Discovering patterns
Database endgames
• Enormous endgame databases exist
• For certain combinations of pieces:
Optimal moves are known (computed by brute force)
It is known whether a position is won, lost, or drawn, and in how many moves
• Can they be compressed?
Are rules + exceptions more compact than the database?
• Can they be turned into simple rules?
• Can we turn complex optimal strategies into simple but
effective ones?
• Which properties of boards to take into account ?
Relational representations / engineering
• E.g., Quinlan, Alan Shapiro, Fuernkranz, …
KRK: simplest endgame
• 25620 positions
• Won in 0-16 moves
2796 distinct positions
18 classes
Relational / Logical representations
krk(-1,d,4,h,5,g,5)
Use information such as:
• samediagonal
• samerow
• samecolumn
• attacks(…)
• etc.
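The relational predicates listed above can be sketched as simple Python functions. The `(file, rank)` square encoding is a hypothetical choice for illustration, and the attack test is simplified to pure alignment, ignoring blocking pieces.

```python
def samerow(sq1, sq2):
    """True if two squares, given as (file, rank), share a rank."""
    return sq1[1] == sq2[1]

def samecolumn(sq1, sq2):
    """True if two squares share a file."""
    return sq1[0] == sq2[0]

def samediagonal(sq1, sq2):
    """True if two squares lie on a common diagonal."""
    f1, r1 = sq1
    f2, r2 = sq2
    return abs(ord(f1) - ord(f2)) == abs(r1 - r2)

def attacks(rook_sq, target_sq):
    """Simplified rook attack: same row or column (ignores blockers)."""
    return (samerow(rook_sq, target_sq) or samecolumn(rook_sq, target_sq)) \
        and rook_sq != target_sq
```

Relational learners (ILP systems) induce rules over exactly such predicates instead of over raw board coordinates, which is what makes the learned endgame rules compact and readable.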
Discovering strategies
Endgames are solved but hard to
understand
• Hard even for grandmasters (e.g., KQKR)
• Many books written on endgames
Goal
• Find easy-to-understand strategies
• Perhaps not optimal, but easy to recall and
follow
Difficult games for computers
Go?
• Too many possible moves
• Search would need to be far too deep
• Intractable (a big reward awaits whoever solves it)
What about endgames?
• Simplified Go endgames have been
studied (e.g., by Jan Ramon)
Modelling the opponent
Key problem in games such as poker, bridge, …
For simple games such as rock-paper-scissors, the optimal strategy is known (Nash equilibrium)
• Optimal: play uniformly at random
• But this is not optimal against a player who always plays rock
• Trying to predict the opponent's next move
• Or predicting which move the opponent thinks you will play
Key to success for some games
• Cf. Poker (Jonathan Schaeffer)
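The rock-paper-scissors point can be made concrete with a minimal, hypothetical opponent model: instead of playing the uniform Nash strategy, count the opponent's moves and best-respond to the most frequent one.

```python
import random
from collections import Counter

# which move beats which
BEATS = {"rock": "paper", "paper": "scissors", "scissors": "rock"}

class FrequencyOpponentModel:
    """Best-responds to the opponent's empirical move distribution.

    Exploits a biased opponent (e.g. one who always plays rock),
    whereas the uniform Nash strategy would only break even.
    """

    def __init__(self):
        self.counts = Counter()

    def observe(self, opponent_move):
        self.counts[opponent_move] += 1

    def act(self):
        if not self.counts:
            return random.choice(list(BEATS))  # no data yet: play uniformly
        predicted = self.counts.most_common(1)[0][0]
        return BEATS[predicted]                # counter the predicted move
```

Against an always-rock opponent this model converges immediately to playing paper; real poker opponent models are far richer, but the principle (predict, then best-respond) is the same.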
Other types of games
Adventure games, interactive games,
current computer games
Let’s look at some examples
Digger
(learning to survive)
A key problem: representing the states; the use
of relations is necessary
Real time games
Robocup
• Components can be learned
using RL, e.g. the goalie
How to tackle these?
• Problems:
Degrees of freedom
Varying number of objects
Continuous positions …
Learning to fly
Work by Claude Sammut et al.
• Behavioural cloning
Trying to imitate the human pilot
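Behavioural cloning learns a controller from logged (state, action) demonstrations. A toy sketch, assuming a user-supplied discretisation of the continuous state; Sammut et al. actually induced decision trees from flight-simulator logs, so the majority-vote table below is only a stand-in for that learner.

```python
from collections import Counter, defaultdict

class BehaviouralClone:
    """Imitates a demonstrator: maps each discretised state to the
    action the human most often took there."""

    def __init__(self, discretise):
        self.discretise = discretise          # state -> hashable key
        self.actions = defaultdict(Counter)

    def record(self, state, action):
        """Store one logged (state, action) demonstration."""
        self.actions[self.discretise(state)][action] += 1

    def act(self, state):
        """Replay the majority action for this region of state space."""
        counts = self.actions.get(self.discretise(state))
        if not counts:
            return None                       # unseen state: no opinion
        return counts.most_common(1)[0][0]
```

The choice of discretisation (or, in the original work, the decision-tree features) is what determines whether the clone generalises beyond the exact states the human visited.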