Presented here is a demonstration of how to use the robin.py module from the rh_trading repository on my GitHub (https://github.com/michaelray1). Here's the general strategy the neural network will implement:
1. Calculate relevant metrics for whether a stock will go up or down. For instance, find out which Bollinger band the stock is currently sitting in. Each metric is binary in nature, either true or false. We represent true by +1 and false by -1.
2. Calculate a weighted sum of the binary metrics, as follows:

$$S = \sum_{i=1}^{n} w_i M_i$$

Here the $w_i$ are the weights, which the neural network will find, and the $M_i$ are the metric values (either +1 or -1). The sum runs from $i = 1$ to $n$, where $n$ is the number of metrics.
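As a minimal numerical illustration (the weights and metric values below are made up purely for the example), the sum $S$ can be computed with numpy:
#Illustration only: hypothetical weights and metric values
import numpy as np
w = np.array([0.8, -0.2, 0.5, 0.1])   #weights w_i (the network will find these)
M = np.array([+1, -1, +1, +1])        #binary metric values M_i
S = np.sum(w * M)                     #S = w_1*M_1 + ... + w_n*M_n
print(S)                              #0.8 + 0.2 + 0.5 + 0.1 = 1.6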
Let's begin by importing the relevant modules.
%%capture
#Import the robin.py module and numpy
import robin as rb
import numpy as np
import robin_stocks.robinhood as rh
#Create a statistics object, then use that to create a metrics object
stats = rb.statistics()
metrics = rb.metrics(statistics = stats)
#Now use these two to create a neural network object. You also need a username and password for this. Enter them below.
un = ''
pw = ''
nn = rb.nn(metrics = metrics, statistics = stats, un = un, pw = pw)
Now we've got all the necessary ingredients for the setup of our neural network. The last thing we need is some data to train it on. All we need to provide is the stock tickers for whatever stocks we are interested in analyzing. I've got a file saved here that has tickers for all the S&P 500 stocks (large, relatively stable companies). We'll use the S&P 500 to train and test our network. Let's load in the tickers.
#Load in S&P 500 stock tickers and print out what it looks like
sp500 = np.genfromtxt('sp500_tickers.txt', delimiter='\n', dtype=str)
print("length of sp500 list = ", len(sp500))
Some of these stock tickers correspond to stocks that Robinhood does not offer. So, we first need to clean our list of tickers to get rid of those (since we can't get the data for these stocks).
%%capture
#Clean the data to get rid of any stock tickers that Robinhood does not offer
bad_stocks = []
for i in np.arange(len(sp500)):
    x = rh.get_stock_historicals(inputSymbols=str(sp500[i]))
    if x == [None]:
        bad_stocks.append(sp500[i])

sp500 = list(sp500)
for ticker in bad_stocks:
    sp500.remove(ticker)
sp500 = np.array(sp500)
Now that we've eliminated many of our tickers, let's see how many we have left.
len(sp500)
Okay, now we've got a list of 381 ticker symbols, which we can use to get Robinhood stock data for 381 companies. Let's go ahead and get that data. First, however, let's understand what our data is going to look like in the context of our particular strategy.
We use the get_tt_data function to get our training and testing data. This function goes through all of the metrics in the nn.metrics class and calculates each of them for every stock in the list of tickers we give it. Then, for each stock, our data has length equal to the number of metrics + 1. The extra data point is the target value (the Y value) and tells us whether or not the stock actually increased in price by 4% in the 10 days following our data set (the 4 and 10 here are default values and can be changed by passing the percent_gained and days_before arguments to get_tt_data).
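For concreteness, a single row of this data with four metrics might look like the following (the values here are made up purely to show the layout):
#Hypothetical example of one row of the data: 4 metric values plus 1 target value
row = np.array([+1, -1, +1, +1, 1])       #rightmost entry is the target Y value
metric_values, target = row[:-1], row[-1]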
Now that we understand what our data looks like, let's calculate it for our 381 stocks.
%%capture
#Use neural network function to get training and testing data
nn.get_tt_data(inputSymbols = sp500, days_before = 30)
The testing and training data is crucial to creating a successful neural network; in a sense, get_tt_data is the main function that implements my trading strategy. Let's see what the shape of our data is. get_tt_data splits the data into a training set and a testing set for us. By default, 80% of the data is used for training, but this can be changed with the percent_training argument of get_tt_data.
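As a quick illustration of the split (my own arithmetic here, not the repository's code), an 80/20 split of our 381 rows gives 304 rows for training and 77 for testing:
#Sketch of the default 80/20 train/test split
n_stocks = 381
n_train = int(0.8 * n_stocks)   #304 rows for training
n_test = n_stocks - n_train     #77 rows for testing
print(n_train, n_test)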
print("testing data shape = ", nn.testing.shape)
print("training data shape = ", nn.training.shape)
Given the importance of what our training and testing data actually tells us, let's go over one more time what precisely it is.
Each row in our data consists of 5 values. Four of these are our metrics and are either +1 or -1. The rightmost value in each row is the "expected output" or "target value". By default (you can change all of these settings through the keyword arguments of nn.get_tt_data), this tells us whether the stock went up by 4% within 10 days of the end of the data that the neural network has access to. Essentially, we truncate the data so that the neural network only has access to data up to 10 trading days before the present day. Then we use that truncated data set to try to predict the "future" (we already have the "future" data; we are using it to see if the neural network can make accurate predictions).
To get a feel for what these metrics are, let's look at an example. One of the metrics is called bbands_bottom, and like all our metrics in this scheme it has a value of +1 or -1. bbands_bottom gives +1 if the stock's price is more than one standard deviation (the number of standard deviations can be changed in the code) below its long-term mean, and -1 if it is higher than that. This is basically saying "give a positive point to the buy signal if the stock seems lower than it should be, and give a negative point to the buy signal if it does not". It is based on the idea that stocks, over long periods of time, will generally exhibit mean reversion.
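As a rough sketch of the idea (my own illustration, not the actual implementation in robin.py), such a metric could be computed like this:
#Rough sketch of a bbands_bottom-style metric (illustration only)
def bbands_bottom_sketch(prices, num_std=1):
    mean, std = np.mean(prices), np.std(prices)
    #+1 if the latest price is more than num_std standard deviations below the mean
    return +1 if prices[-1] < mean - num_std * std else -1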
Once the neural network has the metrics, the idea is that it weights each metric by a given amount (we called these weights $w_i$). Then it looks at the "total prediction", which is $S$. Finally, we apply a ReLU activation function to $S$, and the neural network predicts a "1", which corresponds to "buy the stock", or a "0", which corresponds to "don't buy the stock".
What the neural network is doing is using the last column of our training data (the "expected output") to optimize the weights $w_i$. By using the training/testing data to optimize the weights, the idea is that we can let the computer figure out which metrics are actually the best indicators of whether a stock will go up or down. Perhaps it's actually some funky combination of all the metrics that best indicates this. Let's see how to check the accuracy of our neural network and also use it to make predictions about other stocks.
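Schematically, the prediction step amounts to something like this (a sketch under my own assumptions; in particular, the 0.5 threshold is my choice for the illustration, not a value taken from robin.py):
#Sketch of the prediction step: weighted sum, ReLU, then a buy/don't-buy decision
def predict_sketch(w, M, threshold=0.5):    #threshold is an assumption for this sketch
    S = np.sum(w * M)                       #S = sum over i of w_i * M_i
    relu_S = max(0.0, S)                    #ReLU activation
    return 1 if relu_S > threshold else 0   #1 = buy, 0 = don't buy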
#Build the neural network with non-default settings for size of layers
nn.build_network(hidden_layers=[32, 4, 2, 1])
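I won't reproduce the internals of build_network here, but assuming a Keras-style feed-forward network (an assumption on my part, not something confirmed by the robin.py source), hidden_layers=[32, 4, 2, 1] would correspond to an architecture roughly like this:
#Roughly equivalent architecture, assuming a Keras-style backend (illustration only)
from tensorflow import keras
model = keras.Sequential([
    keras.layers.Dense(32, activation='relu', input_shape=(4,)),   #4 metric inputs
    keras.layers.Dense(4, activation='relu'),
    keras.layers.Dense(2, activation='relu'),
    keras.layers.Dense(1, activation='relu'),                      #ReLU output, as described above
])
model.compile(optimizer='adam', loss='binary_crossentropy')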
%%capture
#Train the network
nn.train_network(batch_size=8, epochs=5000)
Now our neural network is trained and ready to make predictions! Before we make predictions about the future, however, let's test the overall accuracy of our model. To do this, we use the test_accuracy() function from the nn (neural network) class. The function test_accuracy() works as follows (a rough code sketch follows the list):
1. For each stock ticker in the list of stocks used for testing data, calculate each of the metrics using data up to 10 days before the present (again, this value of 10 can be changed, but if you change it, you should run get_tt_data again and retrain your network using nn.train_network).
2. Calculate $S = \sum_{i=1}^{n} w_i M_i$.
3. Apply the ReLU activation function to $S$ to get a prediction, either 1 (buy) or 0 (don't buy).
4. Compare this result to whether or not the stock actually did increase by 4% over the 10 days following the truncation day of our data.
5. Do this for all stocks and output the total number of correct predictions.
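In code, the loop is roughly the following (my own sketch of the procedure, reusing the hypothetical predict_sketch function from above; this is not the actual source of test_accuracy):
#Sketch of the test_accuracy loop (illustration only; w stands for the trained weights)
correct = 0
for row in nn.testing:
    M, target = row[:-1], row[-1]        #metric values and actual outcome
    if predict_sketch(w, M) == target:
        correct += 1
print("correct predictions:", correct)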
nn.test_accuracy()
The result for the 77 stocks in our testing data is that we correctly predicted 71 of them. That's about a 92% success rate!
There's one last step to actually implementing our model, and that's to make a prediction about some other stock. Let's see how to do this for Apple stock.
nn.predict(['AAPL'])[0][0]
The 1.0 here tells us that the neural network predicts an increase of 4% over the next 10 days for Apple stock (i.e. we should buy Apple stock). When using the predict function, our neural network does not use a truncated data set; instead it uses the full data set (when making predictions about the future, we clearly want access to all of the data up to the present). So, we have finally made a prediction about the future!
The 4% increase we are predicting might sound small, but remember that if we string together lots of 4% gains (like, say, 71 of them in 10 days, as our testing data showed), we can make a large profit on our investments.
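As a quick sanity check on that claim, consecutive 4% gains compound multiplicatively:
#Ten consecutive 4% gains compound to roughly a 48% total return
print((1.04 ** 10 - 1) * 100)   #about 48.0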