
A simple grid search to find (p,d,q)x(P,D,Q,s) in the SARIMAX forecasting algorithm

I’m not a data scientist and I don’t pretend to be one. I’m a data analyst focused on business, with a big love for IT and programming, that’s it. But I’m, let’s say, quite curious and, like a driver who doesn’t know all the details of a car engine, I can still drive the car when needed (a more in-depth article of what I can do). Moreover, you need to keep learning so you don’t get bored.

The Internet has democratized access to knowledge and sometimes, to keep learning and improving my work at the office, I try new things, such as SARIMAX for forecasting in this case. I generally use Prophet, which was developed by the Core Data Science team at Facebook, because:

  1. It’s quite simple and straightforward to use
  2. The MAPE score and the whole back-test analysis used to assess the model performance are really impressive
  3. It works pretty well on daily-level datasets

Sometimes on this blog, for no other reason than to be able to find them again later, I will post pieces of code that helped me and that I developed or curated while working on an analysis.

#code found here: https://www.bounteous.com/insights/2020/09/15/forecasting-time-series-model-using-python-part-two/


import itertools
import warnings

import statsmodels.api as sm  # provides sm.tsa.statespace.SARIMAX

# silence the many warnings raised while the grid is fitted
warnings.filterwarnings("ignore")

def sarima_grid_search(y, seasonal_period):
    # every (p, d, q) combination with values 0, 1 and 2
    p = d = q = range(0, 3)
    pdq = list(itertools.product(p, d, q))
    # same combinations for the seasonal part, plus the seasonal period
    seasonal_pdq = [(x[0], x[1], x[2], seasonal_period) for x in pdq]

    # lowest AIC seen so far and the parameters that produced it
    mini = float('+inf')
    param_mini = None
    param_seasonal_mini = None

    for param in pdq:
        for param_seasonal in seasonal_pdq:
            try:
                mod = sm.tsa.statespace.SARIMAX(y,
                                                order=param,
                                                seasonal_order=param_seasonal,
                                                enforce_stationarity=False,
                                                enforce_invertibility=False)

                results = mod.fit()

                if results.aic < mini:
                    mini = results.aic
                    param_mini = param
                    param_seasonal_mini = param_seasonal

                # print('SARIMA{}x{} - AIC:{}'.format(param, param_seasonal, results.aic))
            except Exception:
                # skip combinations that cannot be fitted
                continue
    print('The set of parameters with the minimum AIC is: SARIMA{}x{} - AIC:{}'.format(param_mini, param_seasonal_mini, mini))

How does it work?

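This piece of code first generates all the possible combinations of p, d and q (p for the auto-regression order, d for the degree of differencing and q for the moving-average order), then the same combinations for the seasonal part (P, D, Q), to which the seasonal period s is appended. I modified the original code to include a broader search: since the upper bound of Python's range function is exclusive, range(0,3) yields the values 0, 1 and 2, which gives 27 combinations for each part and up to 27 x 27 = 729 models to fit.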
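To get a feel for the size of the search, here is a small standalone sketch of the combination step only. The seasonal period of 12 is just an assumption for the example (monthly data with a yearly cycle), it is not something imposed by the function.

```python
import itertools

p = d = q = range(0, 3)  # 0, 1, 2: the upper bound of range is excluded
pdq = list(itertools.product(p, d, q))

seasonal_period = 12  # assumption for this example: monthly data, yearly cycle
seasonal_pdq = [(P, D, Q, seasonal_period) for P, D, Q in itertools.product(p, d, q)]

# number of models the double loop will try to fit
n_models = len(pdq) * len(seasonal_pdq)
print(len(pdq), len(seasonal_pdq), n_models)  # 27 27 729
```

With 729 candidate models, the search can take a while on a long series, so it is worth keeping the ranges small.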

It then sets a variable named mini to positive infinity. This is the starting value to beat: inside the loop, the code fits a model for each combination and compares its AIC (Akaike’s Information Criterion) score with the value stored in mini. Whenever the new score is lower, mini and the corresponding parameters are updated, so the loop ends up with the combination that achieves the lowest AIC.
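The "keep the lowest score seen so far" logic can be tried on its own, without fitting anything. In the sketch below the AIC values are made up purely for illustration, they do not come from a real model.

```python
# Hypothetical AIC scores, invented for this illustration only
scores = {
    ((0, 1, 1), (0, 1, 1, 12)): 1020.4,
    ((1, 1, 1), (0, 1, 1, 12)): 1012.9,
    ((2, 1, 0), (1, 1, 0, 12)): 1017.3,
}

mini = float("inf")  # any real score is lower than this starting value
best = None
for params, aic in scores.items():
    if aic < mini:  # a better (lower) score: remember it and its parameters
        mini = aic
        best = params

print(best, mini)
```

Starting from infinity means the first model that fits successfully always becomes the first "best" candidate.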

The combination stored when the loop ends (which is not necessarily the last one generated) is the optimal set of parameters, according to the AIC, to use for fitting the SARIMAX forecasting algorithm. This function only prints the result, but instead of displaying the information, these parameters could / should be stored in a dictionary to be used automatically later.
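The dictionary idea could look something like the sketch below. This is my own variation, not the code from the original article: the names sarimax_aic, sarima_grid_search_dict, fit_aic and max_order are hypothetical, and the fitting step is passed in as a function so that the search logic can be tried without statsmodels installed.

```python
import itertools

def sarimax_aic(y, order, seasonal_order):
    """Fit one SARIMAX model and return its AIC (requires statsmodels)."""
    import statsmodels.api as sm  # imported lazily, only needed for real fits
    model = sm.tsa.statespace.SARIMAX(
        y,
        order=order,
        seasonal_order=seasonal_order,
        enforce_stationarity=False,
        enforce_invertibility=False,
    )
    return model.fit(disp=False).aic

def sarima_grid_search_dict(y, seasonal_period, fit_aic=sarimax_aic, max_order=3):
    """Return the lowest-AIC combination as a dict, or None if every fit failed."""
    orders = list(itertools.product(range(max_order), repeat=3))
    best = None
    best_aic = float("inf")
    for order in orders:
        for P, D, Q in orders:
            seasonal_order = (P, D, Q, seasonal_period)
            try:
                aic = fit_aic(y, order, seasonal_order)
            except Exception:
                continue  # skip combinations that cannot be fitted
            if aic < best_aic:
                best_aic = aic
                best = {"order": order, "seasonal_order": seasonal_order, "aic": aic}
    return best
```

The result can then be reused directly, for example with best = sarima_grid_search_dict(y, 12) followed by SARIMAX(y, order=best["order"], seasonal_order=best["seasonal_order"]), after checking that best is not None.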

I really advise disabling the warnings when running this grid search since, from experience, the loop can otherwise print quite a lot of warning lines while it runs, which is really noisy. This is why I added to the original version of the code the 2 lines that disable the warnings.
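If you would rather not silence warnings for the whole script or notebook, a possible alternative is to mute them only while the search runs, using the context manager from the standard warnings module. The helper name run_quietly is hypothetical, it is just a sketch of the idea.

```python
import warnings

def run_quietly(func, *args, **kwargs):
    """Call func with warnings silenced, then restore the previous filters."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return func(*args, **kwargs)
```

It would be called as run_quietly(sarima_grid_search, y, 12). Warnings raised elsewhere in the code stay visible.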