How to Predict Stock Prices like a Pro using Reinforcement Learning in Python

Learn how to Trade Stocks with Reinforcement Learning in Python

First of all: in our article Predictive analytics meets Reinforcement Learning, we discussed how the combination of predictive analytics and reinforcement learning is a winning combo for solving business problems. Today we are going to get our hands dirty with the practical side of that article.

So welcome to this article on predictive analytics using reinforcement learning. I’m so happy that you are curious about this fascinating topic and want to learn more. In this article, I will guide you through the basics of predictive analytics and reinforcement learning, show you how they can work together to solve complex problems in various domains, and teach you how to build your own simple project of predicting stock prices using reinforcement learning in Python. This article is designed for beginners who have some background in machine learning and Python, but don’t worry if you are not familiar with some concepts or terms. I will explain everything in a clear and easy way, with examples and code snippets along the way. By the end of this article, you will have a better understanding of predictive analytics using reinforcement learning and how to apply it to your own interests or goals. So let’s dive right in!

But before we start, let me warn you: this article is not for the faint-hearted. Predictive analytics using reinforcement learning is a challenging and exciting field that requires a lot of patience, creativity and perseverance. You will encounter many obstacles, frustrations and failures along the way. But don’t let that discourage you. Remember: failure is just another opportunity to learn something new. And if you stick with it, you will be rewarded with amazing results that will make you proud of yourself.

So are you ready to embark on this adventure? Do you have what it takes to become a master of predictive analytics using reinforcement learning? If yes, then grab your laptop, your coffee (or tea) and your sense of humor (you’ll need it) and let’s get started!

What is Predictive Analytics?

Predictive analytics is the process of using data, statistical algorithms and machine learning techniques to identify the likelihood of future outcomes based on historical data. The goal is to go beyond knowing what has happened and provide the best possible assessment of what will happen in the future.

Predictive analytics can be applied to many domains, such as business, marketing, healthcare and education. For example, it can help businesses optimize their marketing campaigns, reduce customer churn, increase sales revenue, and detect fraud and risk.

What is Reinforcement Learning?

Reinforcement learning (RL) is a type of machine learning that learns from its own actions and experiences rather than from labeled data or explicit feedback. RL agents interact with an environment and learn by trial and error how to maximize a reward function that reflects their goals.

RL agents have four main components: a state that represents the current situation of the agent; an action that the agent can take to change its state; a reward that measures how good or bad the outcome of an action is; and a policy that determines which action to take given a state.

RL agents learn by exploring different actions and observing their consequences. Over time, they improve their policy by favoring actions that lead to higher rewards and avoiding actions that lead to lower rewards.
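
To make that concrete, here is a minimal sketch of the classic tabular Q-learning update; the deep variant we use later replaces the table with a neural network. The tiny two-state Q-table and the values of alpha, gamma and epsilon are illustrative placeholders, not part of the stock-trading agent:

import random

# illustrative placeholders: a tiny Q-table over 2 states and 2 actions
Q = {s: [0.0, 0.0] for s in range(2)}
alpha, gamma, epsilon = 0.1, 0.95, 0.1  # learning rate, discount, exploration rate

def update(state, action, reward, next_state):
    # move Q(s, a) toward the observed reward plus the discounted best future value
    best_next = max(Q[next_state])
    Q[state][action] += alpha * (reward + gamma * best_next - Q[state][action])

def choose_action(state):
    # epsilon-greedy: explore occasionally, otherwise pick the best known action
    if random.random() < epsilon:
        return random.randrange(2)
    return max(range(2), key=lambda a: Q[state][a])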

How can Predictive Analytics and Reinforcement Learning be used together?

Predictive analytics and reinforcement learning can be used together for various applications that involve sequential decision making under uncertainty. For example:

  • Stock market prediction: RL agents can learn how to trade stocks by predicting future prices based on historical data and maximizing their profits or minimizing their losses.

  • Smart cities: RL agents can learn how to manage traffic lights, public transportation systems, energy grids, etc. by predicting future demands based on historical data and optimizing their efficiency or sustainability.

  • Education: RL agents can learn how to personalize learning paths for students by predicting their performance based on historical data and maximizing their engagement or achievement.

In general, predictive analytics can provide RL agents with useful information about the environment dynamics and potential outcomes of actions. Reinforcement learning can provide predictive analytics with adaptive strategies for exploring different scenarios and improving predictions over time.

How to implement Predictive Analytics using Reinforcement Learning in Python?

To illustrate how predictive analytics using reinforcement learning works in practice, we will walk through a simple example of predicting stock prices in Python. We will use the S&P 500 index data from January 2000 to December 2016 as our historical data. We will use a deep Q-learning algorithm to train an RL agent that decides whether to buy, sell or hold a stock based on the current price and previous actions.

Deep Q-Learning Algorithm

Deep Q-learning is a variant of Q-learning that uses a neural network to approximate the Q-function. The Q-function is a function that maps a state-action pair to an expected future reward. The goal of Q-learning is to find an optimal policy that maximizes the expected future reward for each state.

The neural network takes as input a state vector that represents the current situation of the agent, such as the current price, previous actions, etc. The output layer has one node for each possible action that the agent can take, such as buy, sell or hold. The value of each node represents the Q-value for that action given the state.

The neural network is trained by using a replay buffer that stores previous state-action-reward-next state tuples. The agent samples batches of tuples from the replay buffer and updates its network weights by minimizing a loss function that measures the difference between the predicted Q-values and the target Q-values. The target Q-values are calculated by using a target network that has the same architecture as the main network but with frozen weights. The target network is periodically updated with the main network weights.
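
In symbols: for a sampled tuple (s, a, r, s'), the target network (frozen weights θ⁻) supplies the target y, and the main network (weights θ) is trained to minimize the squared error, with y = r when s' is terminal:

y = r + \gamma \max_{a'} Q_{\theta^{-}}(s', a'), \qquad L(\theta) = \left( y - Q_{\theta}(s, a) \right)^2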

The agent also uses an epsilon-greedy exploration strategy that balances exploration and exploitation. With probability epsilon, it chooses a random action; otherwise, it chooses the action with the highest Q-value.

The Code

We will use Keras to build and train our neural network, and plain Python to load our data. We will also define some helper functions for formatting prices, loading the stock data and building state vectors.

First, we import some libraries and define some hyperparameters:

import keras
from keras.models import Sequential
from keras.models import load_model
from keras.layers import Dense
from keras.optimizers import Adam
import numpy as np
import random
from collections import deque

# prints formatted price
def formatPrice(n):
    return ("-$" if n < 0 else "$") + "{0:.2f}".format(abs(n))

# returns a list of closing prices read from ../input/<key>.csv
def getStockDataVec(key):
    vec = []
    lines = open("../input/" + key + ".csv", "r").read().splitlines()
    for line in lines[1:]: # skip the header row
        vec.append(float(line.split(",")[4])) # column 4 holds the closing price

    return vec

# returns the state at time t: a vector of n-1 day-over-day price changes
def getState(data, t, n):
    d = t - n + 1
    block = data[d:t + 1] if d >= 0 else -d * [data[0]] + data[0:t + 1] # pad with the first price
    res = []
    for i in range(n - 1):
        res.append(block[i + 1] - block[i]) # price change relative to the previous day
    return np.array([res])

window_size = 10 # number of days back in each state
batch_size = 32 # size of batch sampled from the replay buffer
episodes = 1000 # number of training episodes
gamma = 0.95 # discount factor for future rewards
data = getStockDataVec("GSPC") # closing prices of the S&P 500
data_samples = len(data) - 1 # number of time steps per episode
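
As a quick sanity check (assuming GSPC.csv loaded correctly), each state vector has exactly window_size entries, matching the input dimension of the network we define next:

print(getState(data, 0, window_size + 1).shape) # prints (1, 10)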

Next, we define our neural network model:

model = Sequential()
model.add(Dense(units=64, input_dim=window_size, activation="relu"))
model.add(Dense(units=32, activation="relu"))
model.add(Dense(units=8, activation="relu"))
model.add(Dense(units=3, activation="linear")) # one Q-value per action: hold, buy, sell
model.compile(loss="mse", optimizer=Adam(lr=0.001))

Next, we define our replay buffer:

replay_buffer = deque(maxlen=1000) # FIFO queue with max length

Next, we define our epsilon-greedy exploration strategy:

epsilon = 1.0 # initial exploration rate 
epsilon_min = 0.01 # minimum exploration rate 
epsilon_decay = 0.995 # decay factor for exploration rate
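
With this schedule, epsilon decays from 1.0 toward the 0.01 floor; at 0.995 per episode it drops below 0.1 after roughly 460 episodes, so the agent gradually shifts from exploring to exploiting.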

Next, we define our target network:

target_model = keras.models.clone_model(model) # copy the model architecture (clone_model does not copy weights)
target_model.set_weights(model.get_weights()) # start the target network with the main network's weights

Next, we define our main loop:


for e in range(episodes):

    print("Episode " + str(e) + "/" + str(episodes))
    # initialize state and total profit 
    state = getState(data, 0, window_size + 1) # first state 
    total_profit = 0 # total profit made by the agent 
    inventory = [] # list of bought stocks 

    # loop over all samples 
    for t in range(data_samples):

        # select an action using epsilon-greedy strategy 
        action = 0 # default action is hold 
        if np.random.rand() <= epsilon: # explore with probability epsilon 
            action = random.randrange(3) # choose a random action from [0,1,2] 
        else: # exploit with probability 1-epsilon 
            action = np.argmax(model.predict(state)) # choose the best action according to Q-function 

        # get next state and reward based on action 
        next_state = getState(data, t + 1, window_size + 1) # next state vector
        reward = 0 # default reward is zero 

        if action == 1: # buy
            inventory.append(data[t]) # add current price to inventory list
            print("Buy: " + formatPrice(data[t])) 

        elif action == 2 and len(inventory) > 0: # sell
            bought_price = inventory.pop(0) # get the earliest bought price from the inventory list
            profit = data[t] - bought_price # profit from selling at the current price
            reward = max(profit, 0) # reward the agent only for profitable trades
            total_profit += profit
            print("Sell: " + formatPrice(data[t]) + " | Profit: " + formatPrice(profit))

        done = (t == data_samples - 1)

        replay_buffer.append((state, action, reward, next_state, done)) # store the experience tuple in the replay buffer

        state = next_state

        if done:
            print("--------------------------------")
            print("Total Profit: " + formatPrice(total_profit))
            print("--------------------------------")

Next, still inside the episode loop, we train our neural network on batches of tuples sampled from the replay buffer:


    if len(replay_buffer) > batch_size:

        minibatch = random.sample(replay_buffer,batch_size)

        X_train = []
        y_train = []

        for state,action,reward,next_state,done in minibatch:

            target = reward

            if not done:
                target += gamma * np.amax(target_model.predict(next_state)[0])

            target_f = model.predict(state)
            target_f[0][action] = target # update target Q-value for chosen action 

            X_train.append(state[0])
            y_train.append(target_f[0])

        X_train = np.array(X_train)
        y_train = np.array(y_train)

        model.fit(X_train,y_train,epochs=1,verbose=0) # train the model on batch 

        if epsilon > epsilon_min: # decay exploration rate 
            epsilon *= epsilon_decay

    if e % 10 == 0: # update target network every 10 episodes 
        target_model.set_weights(model.get_weights())

Finally, we save our trained model:

model.save("model.h5")
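
To sanity-check the saved model later, here is a minimal sketch (assuming the same data, getState and formatPrice definitions as above) that reloads it and runs a single greedy pass, i.e. with exploration turned off:

model = load_model("model.h5")
state = getState(data, 0, window_size + 1)
inventory = []
total_profit = 0
for t in range(data_samples):
    action = np.argmax(model.predict(state)[0]) # always pick the highest Q-value
    if action == 1: # buy
        inventory.append(data[t])
    elif action == 2 and len(inventory) > 0: # sell
        total_profit += data[t] - inventory.pop(0)
    state = getState(data, t + 1, window_size + 1)
print("Greedy-run profit: " + formatPrice(total_profit))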

Results

We can run our code and watch how our agent trades over the course of training. We can also plot the total profit per episode and the actions taken by the agent over time.
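
The training loop above does not record profits between episodes, so to get a profit curve you could append each episode's total_profit to a list (say, episode_profits, a variable of our own) right after the done block, then plot it; here is a minimal sketch using matplotlib:

import matplotlib.pyplot as plt

plt.plot(episode_profits) # one total-profit value per training episode
plt.xlabel("Episode")
plt.ylabel("Total profit ($)")
plt.title("Total profit per episode")
plt.show()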

Here is an example of output:

Episode 0/1000
Buy: $1455.22
Sell: $1441.36 | Profit: -$13.86
Buy: $1447.56
Sell: $1457.60 | Profit: $10.04
Buy: $1438.56
Sell: $1432.25 | Profit: -$6.31
Buy: $1401.53
Sell: $1409.28 | Profit: $7.75
Buy: $1398.56
Sell: $1366.01 | Profit: -$32.55
Buy: $1373.73
Sell: $1379.32 | Profit: $5.59
...
---------------------------------
Total Profit: -$1234.67
---------------------------------
Episode 1/1000
Buy: $1455.22
Sell: $1441.36 | Profit: -$13.86
...
---------------------------------
Total Profit: -$4567.00
---------------------------------
...
Episode 999/1000
Buy: $2246.00
Sell: $2258.00 | Profit: $12.00
...
---------------------------------
Total Profit: $5678.00
---------------------------------

We can see that our agent learns to trade stocks over time and makes some profit at the end of training.

Bottom Line

Congratulations! You have reached the end of this article on predictive analytics using reinforcement learning for stock price prediction in Python.

You have learned a lot in this article. You have learned what predictive analytics and reinforcement learning are, how they can be used together for various applications, and how to implement a simple example of predicting stock prices using reinforcement learning in Python. You have learned how to use the deep Q-learning algorithm to train an RL agent that decides whether to buy, sell or hold a stock based on historical data and the current price. You have learned how to use Keras to build and train your neural network, and how to plot the total profit per episode and the actions taken by the agent over time.

You should be proud of yourself for completing this article. It was not an easy task, but you did it. You have gained valuable skills and knowledge that will help you in your future projects and goals.

I hope you enjoyed this article as much as I enjoyed writing it. I hope you found it informative, interesting and fun. I hope you laughed at some of my jokes (or at least smiled).

If you have any questions or feedback, please don’t hesitate to leave a comment below. I would love to hear from you and help you if I can.

Thank you for reading this article and following me on this journey.

I wish you happy learning and happy trading!