In this project, I want to implement an LSTM to predict stock prices at minute-level frequency. The idea is to have the LSTM learn from a fixed-size window of data (for example, stock returns over the last ten minutes) so that it can predict the next N steps in the series.
This project is divided into five parts:
- data processing
- feature selection
- model building
- model evaluation
- backtesting
Stock selection criteria:
- high realized volatility
  - 5/15 minute frequency pct data.
  - Implemented by `realized_volatility.py`, with results saved to `F:/data/rv.h5`
- high turnover rate: volume/market
- no large jump at opening: `open_price/last_day_close_price`
- get rid of new stocks
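As a rough illustration of two of the criteria above, the sketch below computes a daily realized volatility from intraday percent changes and the opening-jump ratio. The function names, the timestamp-indexed layout, and the definition of realized volatility as the square root of the sum of squared intraday returns are assumptions; `realized_volatility.py` may define things differently.

```python
import pandas as pd


def realized_volatility(pct: pd.Series) -> pd.Series:
    """Daily realized volatility: sqrt of the sum of squared intraday returns.

    `pct` is assumed to be a 5- or 15-minute percent-change series
    indexed by timestamp (hypothetical layout).
    """
    squared = pct.pow(2)
    # Group the squared returns by calendar day, sum them, take the root.
    return squared.groupby(squared.index.date).sum().pow(0.5)


def opening_jump(open_price: pd.Series, last_day_close_price: pd.Series) -> pd.Series:
    """Ratio open_price / last_day_close_price; values near 1 mean no large jump."""
    return open_price / last_day_close_price


# Toy data: two trading days with two 5-minute percent changes each.
idx = pd.to_datetime(
    ["2020-01-02 09:35", "2020-01-02 09:40", "2020-01-03 09:35", "2020-01-03 09:40"]
)
pct = pd.Series([3.0, 4.0, 0.0, 1.0], index=idx)
rv = realized_volatility(pct)
jump = opening_jump(pd.Series([10.5, 9.9]), pd.Series([10.0, 10.0]))
```

Stocks could then be kept or dropped by comparing `rv` and `jump` against thresholds, which are left open here.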
To test the effectiveness of the LSTM for stock prediction, start with some very simple features:
- percent change
- jump change between minutes
- volume
- high/low (amplitude)
- pct with moving average
- pct of the market and industry
- pct of related stock
- mean pct
- realized volatility
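A few of the features listed above could be derived from minute bars roughly as follows. This is only a sketch: the column names (`close`, `high`, `low`, `volume`), the percent units, and the reading of "pct with moving average" as the deviation of the close from its moving average are all assumptions.

```python
import pandas as pd


def simple_features(bars: pd.DataFrame, ma_window: int = 5) -> pd.DataFrame:
    """Build a few simple features from minute bars.

    `bars` is assumed to have columns close, high, low, volume
    (hypothetical names), one row per minute.
    """
    feats = pd.DataFrame(index=bars.index)
    # Percent change of the close from the previous minute.
    feats["pct"] = bars["close"].pct_change() * 100
    # Amplitude: high/low range within the minute, relative to the low.
    feats["amplitude"] = (bars["high"] / bars["low"] - 1) * 100
    feats["volume"] = bars["volume"]
    # Percent deviation of the close from its moving average.
    ma = bars["close"].rolling(ma_window).mean()
    feats["pct_vs_ma"] = (bars["close"] / ma - 1) * 100
    # Mean of the most recent percent changes.
    feats["mean_pct"] = feats["pct"].rolling(ma_window).mean()
    return feats


# Toy bars for three consecutive minutes.
bars = pd.DataFrame(
    {
        "close": [10.0, 11.0, 12.1],
        "high": [10.5, 11.0, 12.1],
        "low": [10.0, 10.0, 11.0],
        "volume": [100, 200, 300],
    }
)
feats = simple_features(bars, ma_window=2)
```

The first rows contain NaN wherever a feature needs history that is not yet available, so they would have to be dropped before training.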
Why standardize?
- makes training faster
- less likely to get stuck in local optima
- gradients are less likely to explode or become too small (both are more likely when feature values are large)
Easy since there's a limit (20), so use Min-Max normalization.
Method:
- Use z-score with mean and std from the previous 5/10 days
- Use quantile compared with history/recent_days
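The z-score option above might look like the following minimal sketch. It assumes a timestamp-indexed series with unique timestamps and takes the statistics only from earlier days, so the current day does not leak into its own scaling. The function name and the toy data are hypothetical.

```python
import pandas as pd


def zscore_prev_days(values: pd.Series, n_days: int = 5) -> pd.Series:
    """Z-score each observation with the mean/std of the previous `n_days` days.

    Days without enough history are left as NaN.
    """
    # Calendar day (midnight timestamp) of every observation.
    days = pd.Series(values.index.normalize(), index=values.index)
    out = pd.Series(float("nan"), index=values.index)
    unique_days = days.drop_duplicates().tolist()
    for i, day in enumerate(unique_days):
        if i < n_days:
            continue  # not enough history yet
        history = values[days.isin(unique_days[i - n_days:i])]
        today = days == day
        out[today] = (values[today] - history.mean()) / history.std()
    return out


# Toy data: three days with two observations each.
idx = pd.to_datetime(
    [
        "2020-01-02 09:31", "2020-01-02 09:32",
        "2020-01-03 09:31", "2020-01-03 09:32",
        "2020-01-06 09:31", "2020-01-06 09:32",
    ]
)
values = pd.Series([1.0, 3.0, 1.0, 3.0, 2.0, 4.0], index=idx)
scaled = zscore_prev_days(values, n_days=2)
```

The quantile variant could follow the same pattern, replacing the mean/std step with the rank of each value within the history window.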
- SelectKBest algorithm, f_regression, F statistic
- Autocorrelation/cross-correlation
- sklearn
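As a small illustration of these selection tools, the sketch below runs scikit-learn's `SelectKBest` with `f_regression` on synthetic data where only one feature drives the target, and adds a simple lagged correlation helper. The data, the choice of `k`, and the `lagged_corr` helper are assumptions made for the example.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression

rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 4))                    # four synthetic candidate features
y = 2.0 * X[:, 1] + 0.1 * rng.normal(size=n)   # only feature 1 drives the target

# Keep the k features with the highest F statistic against the target.
selector = SelectKBest(score_func=f_regression, k=1)
X_selected = selector.fit_transform(X, y)
selected = selector.get_support(indices=True)


def lagged_corr(x: np.ndarray, y: np.ndarray, lag: int) -> float:
    """Correlation between x[t] and y[t + lag] for lag >= 1.

    With y = x this is the autocorrelation of x at that lag.
    """
    return float(np.corrcoef(x[:-lag], y[lag:])[0, 1])


series = rng.normal(size=200)
shifted = np.roll(series, 1)                   # shifted[t + 1] == series[t]
cross = lagged_corr(series, shifted, lag=1)
```

The per-feature F statistics are available in `selector.scores_` if a ranking is more useful than a hard cut at `k`.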
- target: next-minute percent change (up, down or stay) from 9:31 to 11:30 and 13:01 to 14:45
- input: minute pct from 9:30 to 11:29 and 13:00 to 14:44
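The input/target pairing above can be sketched as a sliding window over the minute pct series. The window length, the `threshold` that defines "stay", and the integer label encoding are assumptions chosen for illustration.

```python
import numpy as np


def make_windows(pct, window=10, threshold=0.0):
    """Turn a 1-D minute pct series into (inputs, labels).

    Each input is `window` consecutive pct values; the label is the
    direction of the following minute: 0 = down, 1 = stay, 2 = up.
    """
    pct = np.asarray(pct, dtype=float)
    X, y = [], []
    for end in range(window, len(pct)):
        X.append(pct[end - window:end])
        nxt = pct[end]
        y.append(2 if nxt > threshold else 0 if nxt < -threshold else 1)
    # Shape (samples, window, 1): the 3-D layout LSTM layers typically expect.
    return np.array(X)[..., None], np.array(y)


X_win, y_win = make_windows([0.1, -0.2, 0.0, 0.3, -0.1], window=2)
```

Calling this once per trading session (morning and afternoon separately) would keep a window from spanning the lunch break or the overnight gap.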
Different stocks may behave differently, so it may be more suitable to build separate models for different categories of stocks (for example, by first grouping the stocks with unsupervised learning).