Last time we learned about ways in which a time series can be non-stationary, and how we can identify it by plotting.
However, there are more formal ways of accomplishing this task, with statistical tests.
There are also ways to transform non-stationary time series into stationary ones.
We’ll address both of these in this lesson and then you’ll be ready to start modeling.
The most common test for identifying whether a time series is non-stationary is the augmented Dickey-Fuller test.
This is a statistical test, where the null hypothesis is that your time series is non-stationary due to trend.
We can implement the augmented Dickey-Fuller test using statsmodels. First we import the adfuller function as shown, then we can run it on our time series.
The results object is a tuple. The zeroth element is the test statistic, which in this case is -1.34.
The more negative this number is, the more likely that the data is stationary.
The next item in the results tuple is the test p-value. Here it’s 0.6. If the p-value is smaller than 0.05, we reject the null hypothesis and assume our time series must be stationary.
The item at index 4 of the tuple is a dictionary. This stores the critical values of the test statistic which equate to different p-values. In this case, if we wanted a p-value of 0.05 or below, our test statistic needed to be below -2.91.
We will ignore the rest of the tuple items for now, but you can find out more about them in the statsmodels documentation for adfuller.
Remember that it is always worth plotting your time series as well as doing the statistical tests. These tests are very useful but sometimes they don’t capture the full picture.
Remember that Dickey-Fuller only tests for trend stationarity.
In this example, although the behavior of the time series clearly changes and it is non-stationary, it passes the Dickey-Fuller test.
So let’s say we have a time series that is non-stationary. We need to transform the data into a stationary form before we can model it. You can think of this a bit like feature engineering in classic machine learning.
Let’s start with a non-stationary dataset. Here is an example of the population of a city. One very common way to make a time series stationary is to take its difference.
This is where, from each value in our time series, we subtract the previous value. We can do this using the .diff() method of a pandas DataFrame. Notice that this gives us one NaN value at the start, since there is no previous value to subtract from the first one. We can get rid of this using the .dropna() method.
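As a small sketch of differencing, assume a DataFrame called city with a single population column. The name and the figures are invented for illustration and are not the lesson's dataset.

```python
import pandas as pd

# Hypothetical city population figures, indexed by year
city = pd.DataFrame(
    {"population": [1000.0, 1010.0, 1025.0, 1045.0, 1070.0]},
    index=[2000, 2001, 2002, 2003, 2004],
)

# Subtract the previous value from each value; the first row becomes NaN
city_diff = city.diff()

# Drop the leading NaN so the result can be tested or modeled
city_stationary = city_diff.dropna()

print(city_diff)
print(city_stationary)
```

The differenced frame has one row fewer than the original, because the first observation has nothing before it to subtract.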
Here is the time series after differencing. This time, taking the difference was enough to make it stationary, but for other time series we may need to take the difference more than once.
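To see why one difference is sometimes not enough, here is a toy sketch with a made-up series that grows quadratically. The first difference still trends upward, while the second difference is flat. Real data will be noisier than this, so in practice we would rerun the Dickey-Fuller test after each round of differencing.

```python
import pandas as pd

# Hypothetical series with a quadratic trend: 0, 1, 4, 9, ...
series = pd.Series([float(t ** 2) for t in range(8)])

# First difference: 1, 3, 5, ... which still has a trend
first_diff = series.diff().dropna()

# Second difference: constant, so the trend is gone
second_diff = first_diff.diff().dropna()

print(first_diff.tolist())
print(second_diff.tolist())
```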
Sometimes we will need to perform other transformations to make the time series stationary.
This could be to take the log, or the square root of a time series, or to calculate the proportional change.
It can be hard to decide which of these to do, but often the simplest solution is the best one.
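The transformations mentioned above can be sketched as follows. The DataFrame df and its values are invented for illustration, and using pct_change for the proportional change is one reasonable reading of that step, not necessarily the exact formula used in the course.

```python
import numpy as np
import pandas as pd

# Hypothetical series growing by 10% each step
df = pd.DataFrame({"value": [100.0, 110.0, 121.0, 133.1]})

# Take the log of the time series
log_df = np.log(df)

# Take the square root of the time series
sqrt_df = np.sqrt(df)

# Proportional change relative to the previous value; first row is NaN
prop_change = df.pct_change().dropna()

print(log_df)
print(sqrt_df)
print(prop_change)
```

For this toy series the proportional change is the same at every step, which is the kind of flat behavior we are hoping a transformation will produce.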
You’ve learned how to test for stationarity and make time series stationary. Now let’s practice!