Tagai

2018 Winter AI Project – Connect 5 Minimax

After finishing my previous project, I still found myself with plenty of time in-between classes (thankfully I’m only taking 4 courses this semester as opposed to my usual 6, so hopefully this trend continues throughout the semester). Also, as I recently took up a UTA position for CS2505 at VT (Intro to Comp Org), I found myself with plenty of down time on-shift as the early assignments are easy and most students don’t need help as of yet. Thankfully, some friends come and spend time with me to make the time go by faster. One of these friends typically brings a Go board along with her and loves to play connect 5 with us (think tic-tac-toe, but on a Go board, and you have to connect 5 instead of 3). After countless games, I thought back to my minimax algorithm for tic-tac-toe (here), and wondered if it applied smoothly to connect 5. And so, I decided to start working on this project.

First was to create the game. I was drawn to Java because I’m most comfortable working with object-heavy projects in Java (as opposed to something like Python). Coding the basic game was simple. The board itself is basically an 18×18 tic-tac-toe board. As such, I just used a 2D integer array size 18×18 and used constants for black, white, and empty spaces. I used swing for rendering, and KeyListeners for user input. Arrow keys navigate and enter places a piece. Upon placing a piece, control is handed over to the AI to decide the move they are going to take.

Screen Shot 2018-02-05 at 2.24.53 PM.png
Empty gameboard. The cursor is indicated with the red square (moves with arrow key input). As you’ll see later, the blue cursor (not shown here) indicates the previous move.

Next came calculating the heuristics for scoring any given board. I used this post by Ofek Gila as a foundation and built on that. Essentially, the board is traversed in every direction (horizontal, vertical, topleft->bottomright diagonal, topright->bottomleft diagonal) and consecutive pieces are counted and scored, with the sum total being the overall score of the current board state. The scores for consecutive pieces are based on the perceived value of that shape (i.e. a 5 in a row is worth much more than a single piece). I wanted this version to play very defensively (an annoying strategy for veterans of the game), so there are noticeable score differences based on what the color of the consecutive pieces is. Afterward, I began writing the minimax algorithm. The basic jist of minimax is to look a set number of moves into the future, score the current board state, and play off of the assumption that your opponent is going to make the best move for them at their given board state (hence the name minimax — you’re trying to maximize your gains and the opponent is trying to minimize them). I settled on looking 2 moves into the future, as it fit the balance of running quickly and decent decision making. If you want to learn more about minimax, feel free to look at the code for this project (link at the bottom).

Screen Shot 2018-02-05 at 2.23.14 PM.png
Using minimax, the AI (white) was able to defeat the user (black) here, while also stopping a win for black (if the white piece at the upper-right was not placed, black would have won)

Using this method, I ran into a couple of speed bumps. The major issue was performance. Especially at the beginning of the game, even after the user’s move, there are still 323 possible moves (18^2=324, -1 for the placed piece). That in addition to looking n moves into the future, meant that there are about 323^n checks that the algorithm needs to make (the number is actually slightly less than 323^n, as it doesn’t factor in placing new pieces and repeating the process, but it is essentially that value for small values of n, so we will ignore it for this basic analysis). To solve this, I cropped the board down to only ‘relevant’ spaces (I defined relevant here as any space around all of the current pieces in play with a buffer of 2 spaces on every side). Using this method, I was able to reduce the number of checks for 2 moves from 323^2=104,329 to 25^2=625 (placing one piece and creating a buffer of size 2 on every side creates a 5×5 grid, resulting in the 25 seen there (5^2 = 25)). Other than that, there is a bug in which the AI would let the player win if it won later down the decision-tree (in other words, it detected a win but didn’t take into account the opponent winning sooner). I expect this to be an error with scoring, but as I want to continue working on other projects, I will leave this bug in for now.

Screen Shot 2018-02-05 at 2.21.13 PM.png
You can see that the AI (white) detected a win, so it ignored black’s 3 in a row, ensuring a win for black.

Testing this out on some of my friends, I found that it played fairly decently. However, due to some minor bugs in the score evaluation (detailed at the end of the previous paragraph), veterans could consistently beat the bot (albeit after hard, long-fought games). This project was a fun one that I could share with friends, so I definitely enjoyed working on it and seeing it get better along the way. I encourage you to download the project and try playing against it yourself. You can find the github for this project here.

2017 Spring AI Project – PoNNg

Towards the end of my first year at Virginia Tech, I oddly found myself with more free time than usual. Throughout the year I tried to focus on my neural network study when I could, but sadly, little time arose for me to pursue what I love. However, I always worked on small projects when I could. The majority of this time was spent geared towards learning Linux and Python, both of which I have become somewhat proficient in. Using these newfound skills, I looked back to my roots to see what improvements I could make on my original code from high school and to get back into the field of artificial intelligence in general. The main goal for this project was to create a short, dependable neural network algorithm for personal use in Python. I also wanted to create a system in which 2 neural networks were pitted against each other in some form of competition. I thought pong was a suitable game for this purpose because of its simplicity and competitive nature. My version of it stays, for the most part, true to the original game. 2 paddles, 1 ball, reset and score increase upon the ball exiting the left or right screen. After each bounce, the ball’s y velocity is randomly chosen between [2, 5] to add some randomness to where the ball will be when it reaches the left or right side of the screen. Pygame was used to render the components onto a window.

The general frame of mind I had while creating the neural network algorithm for this project was brevity and simplicity. One of the main issues with my Java implementation from a year ago was the fact that all of the weights were stored in a single 1D array, which isn’t ideal for many reasons. Mainly, it’s confusing to look at and understand the math behind extracting the correct weight at the correct time. To correct this, I switched it from a 1D array to a 3D array, in the format of [layer][input][output], layer being the current index in regards to the hidden layer, input being the “left” set, and output being the “right” set (i.e. input -> hidden, input is the “left” set and hidden is the “right” set). Although confusing at first due to my lack of experience with 3D arrays, I eventually came up with a good system that works just as fine as the original and is slightly easier to understand from an outsiders point of view. e

Screenshot from 2017-04-09 19-13-31

Example of a 3D weight storage system for a neural network with 2 input nodes, 1 hidden layer of 3 nodes, and 1 output node. The number of rows corresponds to the “input” layer (input -> hidden; input is the input layer, hidden -> output; hidden is the input layer) and the number of columns corresponds to the “output” layer (hidden and output in the above example).

In this project, I experimented around with neural networks of varying sizes before settling on a simple 2 node input, 3 node hidden layer, and 1 node output network. The input was the vertical and horizontal distance between the paddle and the ball over the total height/width of the game window. When given more input nodes like the x and y velocity of the ball, it was observed that the neural networks were able to accurately predict where the ball would hit the paddle when it reached the side of the window (as opposed to just following the ball’s y component), but the time it took to reach this point was incredibly long and results varied over different situations (i.e. the angle the ball was approaching the paddle, etc.). Fitness was given based on how many matches the neural network won, but was punished twice as hard for every match (first to 3 points) lost (+1 for win, -2 for loss). A standard breeding process was performed after each match in which the parents were the 2 best performing networks in a population and the child was the worst, with the child weights having 2/3 chance of being from either of the parents and 1/3 chance of being the average of the parent weights. There was a 6% chance of any weight being erased and assigned a random value between (-1, 1). After each match, the loser was replaced by a randomly selected network from the population (it was ensured that the two networks playing each other were not the same one). Each paddle’s neural network can be seen displayed on its side of the board — each circle being a node and each line being a weight. A blue weight signifies a negative value and a red weight signifies a positive value.

Getting on to the actual project, the rate at which the neural networks became “good” at the game varied on each run. In my experience, it could be as short as 100 breeds, and as long as around 2000. However, for each run, the neural networks followed the same general pattern every time — make no attempt to move for the ball, show some attempt at chasing the ball or “reacting” the ball when it came close, and successfully chasing the ball.

before

Right when the program started. You can see the paddles move to either the top or the bottom of the screen, paying no attention to the ball.

during.gif

Networks show some learning — they now react to the ball when it approaches them, though they do not move to the correct y value as of yet.

after

The end of the pattern. Networks demonstrate ability to chase after the ball and hit it every time. At this point, the match time is indefinite.

Although this project seemed simple at first, there were many different stages it went through while in development. At first, the input size for the neural networks was complex (including paddle location, ball location, ball velocity). However, after a lot of playing around, consistent improvement was shown when this list was refined to just the distance between the paddle and the ball. In retrospect, this makes sense, as decreasing the complexity of any problem makes it easier to solve. Although, maybe with more patience, a more unique strategy could have been developed with more inputs. I encourage you to download my source code and play around with yourself if you are at all interested in doing so (here).

2016 Spring AI Project – Floppo Bordo

As my senior year of high school began to slow down and the college hype began to increase, I looked for simple projects to work on while in the midst of the craziness. I spent a while sitting back and looking at the projects I’ve finished this past school year and remembered a couple of other projects from other people that served as inspiration for my personal projects. Since I learned everything I know about AI from watching lectures and demonstrations online, it was a trip down memory lane to go back and watch those videos in particular. One of the projects I looked back at was a machine learning algorithm that used reinforcement learning for Flappy Bird (A popular mobile game), and I remember being amazed at the actual process of learning that the algorithm went through (see that video here). I decided to challenge myself by making my own Flappy Bird AI to benchmark my current progress, especially in comparison to where I was 6 months ago.

First, however, I had to make a Flappy Bird clone. Making it was surprisingly easy, as it has simple rules and doesn’t require a lot of complex algorithms. Flappy Bird is essentially an endless runner, but the player must tap the bird to fly precariously in between pipes (if the player touches one of these pipes they die and have to start over). Since this was just a quick project, I didn’t bother with making images to go over each of the objects, instead using colored rectangles. The blue rectangle is the player, the black rectangles are pipes, and the green rectangle is the ground.

I used my standard Neural Network algorithm with a genetic algorithm for the machine learning AI (read more about Neural Networks here). It uses 4 inputs – the player’s y position (height from the ground), the y positions of the opening between the closest pipes (the lowest point on the top pipe and the highest point on the bottom pipe), and the horizontal distance between the player and the closest pipe. There are 2 hidden layers of 5 nodes each, and 1 output node (if the output node > 0.5, the bird jumps). I used a sigmoid function (1 / (1 + e^-x)) as my activation function. Reward was based on the score at the end of the run and how far the AI moved the bird to the right.

Now onto the results. It actually learned to jump between pipes fairly quickly, but difficulty came when it needed to jump with precision between the pipes. However, after a little bit of time, it mastered the game perfectly.

start

Start of the simulation. You can see the AI has trouble handling its own jump height.

stend

After about 10/15 minutes of runtime, the AI has mastered the art of jumping in between the pipes.

I enjoyed making this project and seeing how far I’ve come. A while ago, I never would have thought I would have been able to make something like this. If you want to see the code for this project, you can get it here.

2016 Spring AI Project – TicTacToe & Brute Search

As my senior year of high school began to slow down and the college hype began to increase, I looked for simple projects to work on while in the midst of the craziness. Looking at the history of Artificial Intelligence, I began to read up on the different methods AI developers used before modern AI existed (Neural Networks would be an example of a modern day technique). I also decided to code this project in Python in order to broaden my skills with other languages (and because Python is pretty fun to use).

Artificial Intelligence for games like TicTacToe and Minimax are prime examples of early stage AI – programs that brute search their way through an entire list of possible situations in order to find the best action to take. However, brute searching only works on games with a small amount of game boards. For instance, Minimax is a really simple game because the amount of choices and paths players can take is very limited. TicTacToe ramps it up a little (about 9! or 362,880 different ways to play the game), but is still feasible with a little bit of load time. Games like chess, however, are almost impossible to brute search just from the sheer amount of games possible (about 10^10^50 games, in fact, there are more games of chess than grains of sand on the Earth). This means that although brute searching is an alright method to use when creating game AI, it isn’t always the most efficient, and other options should be considered the more complex the game is.

One of the main roadblocks I hit along the way on this project was dealing with recursion. I am familiar with recursion and its usefulness, but for more complicated tasks like this, it can become confusing at times (especially writing in an unfamiliar language, might I add). For those who don’t know much about recursion, it’s essentially a function that calls itself in order to simplify a problem. It’s complicated in that its formation is very abstract and you have to keep some key issues in mind while creating recursive functions (i.e., making sure you don’t accidentally create a function that loops indefinitely). After some frustration / hair pulling out, I was able to overcome the challenge and write a program that performs pretty well. Basically, the AI looks at each possible choice it has and finds the probability it will win if it chooses that path taking into consideration all possible game boards extending from that board. After probabilities for each possible spot to play are calculated, the AI chooses the option with the highest probability of success.

Screen Shot 2016-03-31 at 10.24.41 PM

Here the AI is controlling Xs and I am controlling Os. You can see the AI was successfully able to perform a double attack, forcing a win for Xs.

This project was fun to make. Comparing it with the Neural Networks I’ve made this year, It’s amazing to see how far Computer Scientists have come in terms of Artificial Intelligence and the ability for computers to recognize patterns and solve problems. If you want to see the code for this project, you can get it here: https://github.com/mccloskeybr/tictactoe

2016 Spring AI Project – LD33 & Neural Networks

As my senior year of high school began to slow down and the college hype began to increase, I looked for simple projects to work on while in the midst of the craziness. Thinking that it had been a while since I had made anything relating to artificial intelligence, I wanted to refresh my mind on Neural Networks and machine learning in general. Since I needed a game to attach an AI to, I naturally thought back to my Ludum Dare 33 game (linked here), as it was a rather simple game that wouldn’t be hard for a computer to master.

I used a Neural Network (read more about Neural Networks here) with 2 input nodes (one to tell when an arrow was close and another to tell when a house was close), 2 hidden layers of 5 nodes each, and 4 output nodes (controlled each of the 4 keys required to play the game: punch, block, jump, move right). The activation function was again a sigmoid (1 / (1 + e^-4.9x)). I used a genetic algorithm with a 10% mutation rate on each gene passed down in order to weed out bad Networks and incentivize growth of good Networks.

Screen Shot 2016-03-25 at 11.11.00 PM

Graphic displaying the structure of the Neural Network I used for this project. Inputs were either a -1 or a +1 depending on whether or not its specific object was close (-1 for false, +1 for true). A key was pressed if its respective output value was > 0.5 (the range for the sinusoid I used was (-1, 1)).

Despite my belief that my game was easy, initial runs of the simulation proved that although scoring points was easy, there were too many ways to earn points, which confused the networks. The Networks easily figured out that moving right was a good way to earn points as points were given based on how many civilians are squished, but it had a hard time figuring out that it could squish civilians longer if it blocked arrows. However, after many many iterations, Networks could generally figure out a better strategy for scoring (whether it be jumping on houses to destroy them or blocking arrows). Despite the Networks learning how to score points better, I have yet to see a perfect strategy formed (one that jumps on houses AND blocks arrows), which may be solved if I ran more iterations of the simulation.

ld33aistart.gif

Starting out you can see that it doesn’t do anything. Usually, if it does anything, it holds down random keys, causing its actions to be sporadic at best.

ezgif-2212419532.gif

After ~5 minutes of the simulation running, the AI finds out that walking right is a good strategy for getting points.

ld33aiBLOCK.gif

After a considerably long time, the AI has figured out blocking is a good combo with walking, as it can walk longer distances than if it wasn’t blocking. However, it hasn’t found out that destroying houses is a good source of points.

All in all I think this was a great refresher for AI. Since I’ve been mostly working on games these past few months, it was also a good break from the grind that asset creation is becoming. Also, learning that machines aren’t always as bright as I may think they are is a great lesson to take going forward in my AI career. If you want to see the code for this you can see it here: https://github.com/mccloskeybr/LD33

2015 Winter AI/Arduino Project – Brainy Crawler

Having taken a general engineering course my senior year of High school at the same time I was researching Neural Networks (read more about Neural Networks here), I decided it was time to make a physical agent – a tangible robot controlled by a Neural Network. My background in robotics helped in this venture as well, as I knew before hand how servos and ultrasonic sensors worked.

Before getting to the hard part of writing the code to control the robot, I had to make the agent itself first. I knew I wanted to make a robot that learned how to move, but I didn’t have 2 motors. Instead, I decided to use servos, as it not only allows movement, but the motion itself is complex and warrants a fair amount of learning in order to master it. For parts, I used an Arduino Uno with 2 Parallax standard servos and 1 HC-SR04 Ultrasonic sensor. A friend of mine, Clay Busbey, helped out by 3D printing parts for me. I had 3 parts printed, the main chassis and 2 essentially straight bars used as 1 arm with a joint. I put the Arduino Uno and a half+ breadboard onto the chassis and hooked everything up from there.

crawler_png

Fritzing diagram of how I hooked up all of the components. It doesn’t show the bars used for arms, but you’ll see how that works in the next gif. The red and green LEDs indicate whether the Arduino is switching between Networks, red meaning it’s switching, and green meaning movements are controlled by a Neural Network.

In terms of the Neural Network, I used 3 input nodes, 1 hidden layer of 10 nodes, and 2 output nodes. The input values are the current position of each servo, along with the current distance away from the object detected in the ultrasonic sensor. The output values decide whether the servo should increase or decrease its current angle (> or < 0.5). I used a sigmoid function (1 / (1 + e^(-x))) as my activation function. Each Neural Network controls one motion, with a genetic algorithm that replaces the least successful Network (2 best networks make a crossover/mutation child that replaces worst network). Fitness is distributed based on how far the robot moved while under control of each Neural Network.

brainy crawler

Timelapse of the agent moving across the floor (sped up 3x). You can see that although it has a good start, it hiccups due to either a bad breed or mutation.

I enjoyed working on this project. It’s really one thing to watch a program learn in the real world as opposed to in the virtual world.

If you want to get the Arduino script and/or the .stl files for 3D printing the parts, you can get them here (.zip, 15 kb): https://drive.google.com/file/d/0Bz_0wgRmDpKqdGdGUS1YMk9oN1E/view?usp=sharing

2015 Winter AI Project – Evolution Simulator

The 2015 school year was when I began dabbling around AI. The idea always amazed me. Everything from movie depictions of them to the real practical uses they have in modern day life was interesting, and I wanted to know more about it.

Wanting to explore Neural Networks deeper (read more about Neural Networks here), I decided to make an evolution simulator. The aim of the project was to create a population of creatures that, starting with knowing nothing about their environment, evolved to chase after food and run away from predators. The creatures can also evolve different values of attack and defense, of which determines if that creature can eat other creatures. A set of 12 creatures is allowed to survive for as long as it can (must eat food to replenish a diminishing energy value, if it gets to 0 the creature dies), and at the end of the generation, the bottom half is eliminated, the top half reproduces sexually once (2 children) with each other (first mates with second, third mates with fourth, and so on), and asexually once as well. Reproduction is accomplished with crossover/mutation, that takes into account the weights for each creatures Neural Network and the attack value for each creature. Fitness is distributed based on how many times the creature eats.

The Neural Network itself was fun to make. I decided I wanted to make an adaptable system, that is, one in which I can change whenever I want it to. This includes the size of each individual layer and the total amount of hidden layers. For this project, I got the most success using one and two hidden layers with different amount of nodes for each (18 nodes for 1 layer, 5 nodes for 2 layers).

The input for each creature is as follows: 2 ‘eyes’ that take up 4 input nodes, as well as one node expressing the current angle the creature is facing. Each eye detects the angle from the creature to the closest entity (or second closest, for the second eye) and the r, g, and b values of that entity, expressed as a fraction over 255 (the range a color value can have is [0, 255]), allowing the creature to evolve decision making based on whether the entity it sees is edible or not. The output nodes control whether it turns left or right, and its velocity forwards or backwards (move more slowly while going backwards). I used a sigmoid function (1 / (1 + e^(-x))) as my activation function.

Screen Shot 2016-01-05 at 10.12.19 AM

Neural Network that has 9 input nodes, 1 hidden layer with 18 nodes, and 2 output nodes. The color for each weight is explained below.

Before I go over examples of the simulation, I’ll explain what each of the graphics in the frame represent. The creatures are represented by shades of purple (creatures that have a high defense are more blue while creatures that have a high attack are more red), with the current best creature (highest fitness) is outlined in yellow. The line extending from the center from each creature to its outline is the direction in which the creature is currently facing. Food is represented by the green squares. Moving onto the panels outside the main frame, the one in the top right shows the current generation, the amount of creatures alive (from a set max population size of 12), the best fitness value from the overall simulation, the best fitness value from the previous generation, and the best fitness for the current generation (represented with O, LR, and C respectively). The graphic below that panel displays the Neural Network of the creature that garnered the best fitness, where connections that are mostly red are between (0, 1), and connections that are mostly blue are between (-1, 0). Finally, the graph displayed on the bottom of the frame illustrates the best fitness for each round and shows overall growth of the population.

beginning

At the beginning, you can observe how most creatures are spinning out of control and lack in objective.

structured

After some time, you can see that although the creatures have not evolved the ability to go directly towards food, they have formed some structure in the way they collect it. You can also see that this population has evolved to contain creatures that have a high defense, as creatures that had a high attack in previous generations did not perform as well.

end usual

The usual ending. Creatures have evolved to not only go towards food, but to distinguish food from other creatures. That means that although an entity might be closer and thus more efficient to go towards, the creatures recognizes that entity as another creature and decides to go towards an entity that is further away because it recognizes that as food.

ending weird

Although creatures tend to evolve the ability to go straight for food, there are other strategies that have the possibility to emerge. For instance, in the simulation above, the creatures have come up with the strategy to propel themselves vertically, turn horizontally, and propel themselves the left or right whenever they detect food.

This project was incredibly fun to make. Although I had some frustration at the start while learning how to make Neural Networks of varying size, it  was fun finding solutions to my problems. I will probably keep working on this project for weeks or months to come, and I will definitely be utilizing my adaptive Neural Network on future projects.

If you want the source code, you can get it here (Java, 35 kb): https://drive.google.com/file/d/0Bz_0wgRmDpKqaXV0TVp0aXhIcTA/view?usp=sharing

2015 Winter AI Project – Wall Avoider

The 2015 school year was when I began dabbling around AI. The idea always amazed me. Everything from movie depictions of them to the real practical uses they have in modern day life was interesting, and I wanted to know more about it.

During the holiday break, I decided to dedicate my programming efforts towards learning Neural Networks. I knew a little bit about them before hand, like the architecture and the theory of how they worked, but had no idea how to implement them into my own projects. The wall avoider project was dedicated towards making a very, very simple Neural Network that progressively got better at avoiding walls using a genetic algorithm.

But first, I will briefly explain what Neural Networks are. In short, they are systems modeled after the human brain – that is, they use a set of ‘neurons and synapses’ that manipulate a given set of input data so that a desired output is reached. The issue is, however, that we do not know the correct weights and values necessary so that the network will give us our desired output. There are several ways of determining error/finding the best set of values so that a desired output is reached. For this project, I’ll be using a genetic algorithm to essentially determine the best network, and ‘breed’ it with other good networks, resulting in an equally if not better Neural Network. This occurs by choosing 2 Neural Networks that result in a long run (doesn’t hit the wall) and randomly choosing weights from both networks to result in a new Neural Network.

neural-network

Anatomy of a Neural Network.

My project’s Neural Network has 3 input nodes and 1 output node. The value of the input nodes is either a 1 or a 0, depending on whether or not its respective ‘eye’ (line extending from the center of the runner) hits a wall or not. These values are each multiplied by their respective weights (random at first, then altered through the genetic algorithm) and summed together, then put through an activation function. In my example I used a variation of a sigmoid function (1 / (1 + e^(-x)) as an activation function. Sigmoid functions are nice because their ranges are typically (0,1), resulting in either an ‘on’ or an ‘off’ setting. If the a(x) > 0.5, it turned right. If a(x) < 0.5, it turned left. If a(x) = 0.5, it didn’t turn at all.

Screen Shot 2015-12-23 at 12.47.08 PM

Graphic of my Neural Network. i = input (eyes), W = weights, a = activation

After constructing this system, I began running tests. As the objective wasn’t a hard one to learn, the networks involved mastered the challenge quickly. In the gifs, you can see on the left the trial number and the fitness value (how well the network is doing) for each member of the population. p# = parent, c = child, and ^ indicates current best network.

trial 1

Trial 1, struggling.

trial 7

Trial 7, a pretty decent network is bred, but still room for improvement.

trial 25

Trial 25, all networks are doing very well.

This was a great intro to Neural Networks. I definitely want to take this idea further and create deeper, more complex networks for harder tasks.

If you want the source code, you can get it here (Java, 26kb): https://drive.google.com/file/d/0Bz_0wgRmDpKqSVRvbXpzcm1VRVE/view?usp=sharing

2015 Winter AI Project – Maze Runner

The 2015 school year was when I began dabbling around AI. The idea always amazed me. Everything from movie depictions of them to the real practical uses they have in modern day life was interesting, and I wanted to know more about it.

After my Chess project, I wanted to become a little more serious in my research surrounding artificial intelligence. I looked up everything from examples of how it was used in day to day life to videos of lectures by college professors. I learned a lot about the concept of reinforcement learning through these lectures and even watching college Thesis projects, making me want to program something to test out the idea.

First, however, a very brief overview of my current understanding of reinforcement learning. Reinforcement learning is a system in which the AI learns about the environment it’s put in through a system of actions and states. The ai is rewarded or punished through the various states (depending on whether you want to maximize utility or minimize cost, either way is fine, but for my example I will be using a reward system) for good actions versus bad actions, but the AI doesn’t know which actions are good or bad until it experiences them. Thus, after a long time, the AI should begin to gravitate towards actions that garner it the largest sum reward.

Alright, now to my project. I decided to go with something simple and intuitive to explore the concept of reinforcement learning through creating an artificial intelligence that would find the quickest path to an exit in a predefined maze. As mentioned before, the AI started with knowing absolutely nothing about the maze it was placed in, and must learn everything from scratch.

Screen Shot 2015-12-18 at 2.09.06 PM

Original board displaying reward values for each tile (currently 0 because the AI doesn’t know which tiles are good and bad yet). The blue tile is the current position of the runner (controlled by the AI) and the green tile is the exit.

The reward system for the AI essentially takes in all of the rewards of the tiles it visited and sums it together, divides that number by the amount of tiles visited, and uses that value to update the reward value for all of the tiles it visited. The tiles takes that value and averages it into all of its previous rewards given from previous trials. The AI gets a heavy reward boost for completing the maze faster than before, and a heavy punishment if it was slower (increase/decrease overall reward by a factor of 2). Thus, over time, the AI slowly becomes better and better at solving the maze.

10

The 10th trial. You can see the general path the AI takes (tiles with the highest reward) is rather sporadic.

50

The 50th trial. The AI has begun to iron out all the kinks in the path, but it isn’t yet the most efficient.

100

The 100th trial. You can see the AI has almost found the most efficient path.

I heavily enjoyed this project, as watching something learn, especially a program, is really interesting. I’m really excited to see where I can take this in the future.

If you want to play around with this, you can get the source code here (Java, 24kb): https://drive.google.com/file/d/0Bz_0wgRmDpKqNUVpTFFQdUNaaUk/view?usp=sharing

2015 Winter Game/AI Project – Chess

The 2015 school year was when I began dabbling around AI. The idea always amazed me. Everything from movie depictions of them to the real practical uses they have in modern day life was interesting, and I wanted to know more about it.

I had been wanting to make chess for a while, as I’ve been a fan of the sport. What better way to make it than to explore simple self learning artificial intelligence at the same time? However, before getting to the AI, I first had to make the game. The idea of generating a list of possible moves was was simple enough, but the real challenge was implementing more complex moves like castling, checks, stalemates, etc.

Screen Shot 2015-12-11 at 6.49.45 PM

Final game showing all possible moves.

After spending a while messing around with the game itself, time came to begin working on artificial intelligence. Having known nothing about the subject (I started learning about concepts like neural networks and genetic algorithms shortly after), I decided on a simple concept. The artificial intelligence would save all the games it had previously played and look back on them, determine the statistical probability that it would win if it did that move (based on the outcome of the games that had the same move), and decide whether or not to proceed from there. If it decided that the move was bad, it would instead do a random move and add that to the database.

There were a few errors in my method, however, that presented themselves as I neared the end of writing the program. Mainly, it would take ages before the AI became even decent at the game, because, no matter how fast it played the game, it would almost always play a new game every single iteration (Hardy’s estimate of possible chess games is roundabout 10^(10^(50)), more on the subject here), causing the file that contained all of the previous games it had played to become unbelievably massive before it became good at the game.

Screen Shot 2015-12-11 at 8.26.02 PM

A fraction of a single game stored in a .txt file.

Despite all short comings, this project was still a simple and a fun introduction into the world of self-learning artificial intelligence.

© 2024 Brendan McCloskey

Theme by Anders NorénUp ↑