Over the last few years we’ve looked at various tools to help us identify exploitable patterns in asset prices. In particular we have considered basic econometrics, statistical machine learning and Bayesian statistics.
While these are all great modern tools for data analysis, the vast majority of asset modeling in the industry still makes use of statistical time series analysis. In this article we are going to examine what time series analysis is, outline its scope and learn how we can apply the techniques to various frequencies of financial data.
Firstly, a time series is defined as some quantity that is measured sequentially in time over some interval.
In its broadest form, time series analysis is about inferring what has happened to a series of data points in the past and attempting to predict what will happen to it in the future.
However, we are going to take a quantitative statistical approach to time series, by assuming that our time series are realisations of sequences of random variables. That is, we are going to assume that there is some underlying generating process for our time series based on one or more statistical distributions from which these variables are drawn.
Such a sequence of random variables is known as a discrete-time stochastic process (DTSP). In quantitative trading we are concerned with attempting to fit statistical models to these DTSPs to infer underlying relationships between series or predict future values in order to generate trading signals.
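To make the idea of a "realisation" concrete, here is a minimal sketch of one of the simplest discrete-time stochastic processes, a Gaussian random walk with optional drift. It is written in plain Python using only the standard library. The function name `simulate_random_walk` and its parameters are illustrative assumptions, not part of any library or of the models discussed later in this series.

```python
import random


def simulate_random_walk(n_steps, drift=0.0, sigma=1.0, seed=None):
    """Return one realisation x_0, ..., x_n of the process
    x_t = x_{t-1} + drift + w_t, where w_t ~ N(0, sigma^2) and x_0 = 0.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)  # local generator so results are reproducible
    path = [0.0]
    for _ in range(n_steps):
        shock = rng.gauss(0.0, sigma)  # the random variable drawn at this step
        path.append(path[-1] + drift + shock)
    return path


if __name__ == "__main__":
    # Two seeds give two different realisations of the same underlying process.
    first = simulate_random_walk(250, seed=1)
    second = simulate_random_walk(250, seed=2)
    print(len(first), first[-1], second[-1])
```

Each call with a different seed produces a different path, yet both are drawn from the same generating process. Fitting a statistical model to an observed series amounts to asking which process could plausibly have produced it.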
Time series in general, including those outside of the financial world, often contain the following features:
Trends - A trend is a consistent directional movement in a time series. These trends will either be deterministic or stochastic. The former allows us to provide an underlying rationale for the trend, while the latter is a random feature of a series that we will be unlikely to explain. Trends often appear in financial series, particularly commodities prices, and many Commodity Trading Advisor (CTA) funds use sophisticated trend identification models in their trading algorithms.
Seasonal Variation - Many time series contain seasonal variation. This is particularly true in series representing business sales or climate levels. In quantitative finance we often see seasonal variation in commodities, particularly those related to growing seasons or annual temperature variation (such as natural gas).
Serial Dependence - One of the most important characteristics of time series, particularly financial series, is that of serial correlation. This occurs when time series observations that are close together in time tend to be correlated. Volatility clustering is one aspect of serial correlation that is particularly important in quantitative trading.
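As a rough illustration of the three features above, the following sketch builds a synthetic series from a linear trend, a sinusoidal seasonal component and Gaussian noise, then measures serial correlation with the usual sample autocorrelation estimator. It uses plain Python with the standard library only. The names `synthetic_series` and `sample_autocorrelation`, and all parameter values, are assumptions chosen for the example rather than anything prescribed by this article.

```python
import math
import random


def synthetic_series(n, slope=0.05, period=12, amplitude=1.0, sigma=0.5, seed=0):
    """Hypothetical series: deterministic trend + seasonal cycle + random noise."""
    rng = random.Random(seed)
    return [
        slope * t                                         # trend
        + amplitude * math.sin(2.0 * math.pi * t / period)  # seasonal variation
        + rng.gauss(0.0, sigma)                           # noise
        for t in range(n)
    ]


def sample_autocorrelation(series, lag):
    """Sample autocorrelation at the given lag: the lagged sum of
    cross-products of deviations from the mean, divided by the
    sum of squared deviations."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(series)")
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    if denom == 0.0:
        raise ValueError("autocorrelation is undefined for a constant series")
    num = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return num / denom


if __name__ == "__main__":
    x = synthetic_series(240)
    for k in (1, 6, 12):
        print(k, sample_autocorrelation(x, k))
```

Applying the same estimator to squared returns rather than to price levels is a common first check for volatility clustering.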
Our goal as quantitative researchers is to identify trends, seasonal variations and correlation using statistical time series methods, and ultimately generate trading signals or filters based on inference or predictions.
Our approach will be to:
In addition we can apply standard (classical/frequentist or Bayesian) statistical tests to our time series models in order to justify certain behaviours, such as regime change in equity markets.
To date we have almost exclusively made use of C++ and Python for our trading strategy implementation. Both of these languages are “first class environments” for writing an entire trading stack. They both contain many libraries and allow an “end-to-end” construction of a trading system solely within that language.
Unfortunately, the statistical libraries available for C++ and Python are less extensive than those of a dedicated statistical environment, particularly for time series work. For this reason we will be using the R statistical environment as a means of carrying out time series research. R is well-suited for the job due to the availability of time series libraries, statistical methods and straightforward plotting capabilities.
We will learn R in a problem-solving fashion, whereby new commands and syntax will be introduced as needed. Fortunately, there are plenty of extremely useful tutorials for R available on the internet and I will point them out as we go through the sequence of time series analysis articles.
Previous articles on statistical learning, econometrics and Bayesian analysis have mostly been introductory in nature and haven’t considered applications of such techniques to modern, high-frequency pricing information.
In order to apply some of the above techniques to higher frequency data we need a mathematical framework in which to unify our research. Time series analysis provides such a unification and allows us to discuss separate models within a statistical setting.
Eventually we will utilise Bayesian tools and machine learning techniques in conjunction with the following methods in order to forecast price level and direction, act as filters and determine “regime change”, that is, determine when our time series have changed their underlying statistical behaviour.
Our time series roadmap is as follows. Each of the topics below will form its own article or set of articles. Once we’ve examined these methods in depth, we will be in a position to create some sophisticated modern models for examining high-frequency data.