Nowcasting Inflation In Brazil Using Web Data

J. Renato Leripio

Kapitalo Investimentos

Outline

  1. Motivation (financial market point of view).
  2. In highly competitive environments, details matter a lot.
  3. New data can be more important than just employing state-of-the-art methods.
  4. Data from websites provide an edge for forecasting inflation (and other variables).

Motivation

  • Inflation expectations have a direct effect on the yield curve and stocks.
  • Large surprises in monthly releases can and do shift expectations accordingly.
  • The presence of some very volatile items with a high share in the consumption basket makes short-term inflation particularly hard to forecast.
  • Special events such as out-of-season sales, supply disruptions and tax changes pose an additional challenge that statistical models fail to capture.

What the literature says

  • Stock and Watson (2010): “It is exceedingly difficult to improve systematically upon simple univariate forecasting models”.
  • Medeiros et al. (2021): “Our results show that it is possible to consistently beat univariate benchmarks”.
    • Random Forest using a high dimensional (publicly available) data set.
    • Greater gains only for horizons of 3 to 5 months ahead.
  • Similar results can be found for Brazil.
    • Usually making use of market expectations on the RHS.
    • Cut-off dates mean that market participants' incentives to reveal their true expectations are unevenly distributed over time.

The real horse race in Brazil

  • Dozens of qualified market participants improving their forecasts on a daily basis as new information arrives.
  • Highly accurate short-term forecasts: deeply disaggregated approach, high quality data and judgment.
  • Beating univariate benchmarks is not enough to beat the market (Garcia, Medeiros, and Vasconcelos (2017)).
  • Improving a few basis points over the market requires tackling specific items.

Market accuracy (BCB Survey)

Market movements

  • Market participants move in a very synchronized and similar way.
    • No significant spike in SD over the forecast horizon.
  • This is due to a few reasons:
    • The incorporation of new publicly available information (both hard and soft data, news, etc.).
    • Incoming data from the FGV Monitor, which is a private tracking of every CPI item collected at brick-and-mortar stores.
  • Usually, one item (or a couple of items) is responsible for large forecast errors among market participants.
    • Recent episode: large forecast errors for Ind. Goods arguably due to erroneous prediction of perfume prices.

Perfume over-representation

  • Perfume makes up ~ 4% of Ind. Goods (out of 69 items) but accounts for a large portion of its variance.

The recent case of Industrial Goods

  • The FGV Monitor was unable to capture these large movements. More on this soon.

We do need new information

  • Online prices are available at a much lower cost.
  • We can collect data on a daily basis:
    • This allows us to optimize the predictor (e.g., is data collected on a specific day more informative?).
    • We can have a broader sense of uncertainty (e.g., will short-lived sales have an impact or not?).
  • However, both greater product availability and higher frequency data require spending more time on selection tasks.
  • In addition, the way we summarize information to compute the indicator is item-specific and may require solid field knowledge. For example:
    • Perfume: a panel of the same products over time.
    • Electronic devices: constant features (for some time), but products may vary.
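
For the "panel of the same products" case, a minimal sketch of a matched-product index is given here. The Jevons formula (geometric mean of price relatives), the function name and the prices are assumptions for illustration only; the slides do not state which formula is used.

```python
import math


def matched_jevons_index(prices_prev, prices_curr):
    """Geometric mean of price relatives over products seen in both periods.

    Products that appear in only one period are ignored, so the index
    compares like with like (a fixed panel of products).
    """
    matched = [p for p in prices_prev if p in prices_curr]
    if not matched:
        raise ValueError("no product is present in both periods")
    log_relatives = [math.log(prices_curr[p] / prices_prev[p]) for p in matched]
    return math.exp(sum(log_relatives) / len(log_relatives))


# Hypothetical prices: product C drops out and D enters, so only A and B count.
prev = {"A": 100.0, "B": 200.0, "C": 50.0}
curr = {"A": 110.0, "B": 200.0, "D": 80.0}
index = matched_jevons_index(prev, curr)
```

For items such as electronic devices, where products change but features stay constant, the matching key would presumably be a feature bundle rather than a product identifier.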

Perfume indicator pipeline

Small sample (temporary) solution:

  1. Data collected from the three most representative brands.
  2. For each brand, we kept the 10 most popular products.
  3. Then, we created K random baskets of N products accounting for the brand’s market share.
  4. The point forecast is the median of this distribution. The shaded area spans the minimum and maximum (we could use quantiles instead).
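
The four steps above can be sketched as follows. This is only one possible reading: the sampling scheme (brand drawn in proportion to market share, then a product drawn uniformly with replacement), the function name, the market shares and the synthetic price changes are all assumptions, not the actual implementation.

```python
import random
import statistics


def basket_distribution(changes_by_brand, shares, k=1000, n=10, seed=0):
    """Return the average price change of K random baskets of N products.

    Assumption: each slot in a basket is filled by drawing a brand with
    probability proportional to its market share, then drawing one of that
    brand's products uniformly (with replacement).
    """
    rng = random.Random(seed)
    brands = list(changes_by_brand)
    weights = [shares[b] for b in brands]
    results = []
    for _ in range(k):
        picked = rng.choices(brands, weights=weights, k=n)
        basket = [rng.choice(changes_by_brand[b]) for b in picked]
        results.append(statistics.fmean(basket))
    return results


# Synthetic monthly price changes (%) for the 10 most popular products of
# three hypothetical brands; the shares are made up for illustration.
data_rng = random.Random(42)
changes_by_brand = {
    brand: [data_rng.gauss(1.0, 3.0) for _ in range(10)]
    for brand in ("brand_a", "brand_b", "brand_c")
}
shares = {"brand_a": 0.5, "brand_b": 0.3, "brand_c": 0.2}

dist = basket_distribution(changes_by_brand, shares, k=1000, n=10)
point_forecast = statistics.median(dist)  # step 4: median of the distribution
band = (min(dist), max(dist))             # shaded area; quantiles also work
```

Replacing `min` and `max` with quantiles (for instance via `statistics.quantiles`) would make the band less sensitive to a single extreme basket.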

Results for Perfume

Results for Ind. Goods

Results for other items are also promising

  • We currently collect website prices for ~ 65% of CPI items.

  • Some of them show great adherence without much effort.

  • Some are useful to catch large movements only.

  • Some are useful to reveal trends.

  • Some are not good at all (to the best of our knowledge).

  • A few examples next (short time, sorry).

Results for Milk

  • A couple of representative brands with a single product each.
  • Online prices were able to better capture the disinflation following a supply disruption in the milk market.

Results for PC

  • Major brands and products sorted by specific features.
  • As with perfume, online prices were also better at capturing the movements of Black Friday.

Used cars

  • More challenging since we measure the listing price, not the actual deal price.
  • State-space model to extract a common trend from several websites.
  • During most of the recent period, the benchmark showed prices moving up rather than down.
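
As an illustration of the state-space idea, the sketch below filters a common trend out of several listing-price series using a local-level model (random-walk trend, independent noise per website). The model choice, the variances `q` and `r`, the function name and the data are assumptions; the slides do not specify the model actually used.

```python
def common_trend(observations, q=0.01, r=1.0):
    """Kalman filter for a local-level model shared by several websites.

    observations: one list per period with one price per website
                  (None marks a missing quote).
    q: variance of the trend's random-walk innovation (assumed).
    r: variance of each website's observation noise (assumed equal and
       independent across sites, which allows one-at-a-time updates).
    """
    mean, var = 0.0, 1e6  # vague prior on the initial trend level
    trend = []
    for period in observations:
        var += q  # prediction step: the trend follows a random walk
        for y in period:
            if y is None:
                continue  # skip missing quotes
            gain = var / (var + r)
            mean += gain * (y - mean)
            var *= 1.0 - gain
        trend.append(mean)
    return trend


# Hypothetical listing-price indices from three websites over four periods.
listings = [
    [100.0, 101.0, 99.5],
    [100.5, None, 100.0],
    [101.0, 102.0, 100.5],
    [101.5, 102.5, None],
]
trend = common_trend(listings)
```

This sketch does nothing about the gap between listing and deal prices; it only pools noisy listing series into one signal.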

Future developments

  • The national statistical bureau collects and summarizes each CPI item using different methodologies. A great deal of effort then goes into trying to reproduce the items accordingly.
  • As samples grow larger, optimization techniques will become more reliable.
  • Ensemble techniques (online indicator + benchmark + …) are promising too.
  • In addition to goods prices, online prices can also be used to forecast services prices.
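
One simple way to combine an online indicator with a benchmark is to weight each forecast by the inverse of its past mean squared error. The sketch below assumes this weighting scheme; the names, past errors and forecasts are hypothetical, and other combination methods would fit the ensemble idea just as well.

```python
def inverse_mse_weights(past_errors):
    """Weights proportional to 1 / MSE of each forecaster's past errors.

    Assumes every forecaster has at least one non-zero past error.
    """
    mse = {
        name: sum(e * e for e in errors) / len(errors)
        for name, errors in past_errors.items()
    }
    inverse = {name: 1.0 / m for name, m in mse.items()}
    total = sum(inverse.values())
    return {name: v / total for name, v in inverse.items()}


def combine(forecasts, weights):
    """Weighted average of the individual forecasts."""
    return sum(weights[name] * forecasts[name] for name in forecasts)


# Hypothetical past forecast errors (percentage points) and new forecasts.
past_errors = {"online": [0.1, -0.1], "benchmark": [0.2, -0.2]}
weights = inverse_mse_weights(past_errors)
combined = combine({"online": 0.50, "benchmark": 0.30}, weights)
```

With these made-up numbers the online indicator receives the larger weight because its past errors were smaller.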

Main Takeaways

  • It’s very hard to outperform the (true) market using only publicly available structured data.
  • There is great potential for improvement by more accurately predicting specific items.
  • Short-term forecasts can benefit enormously from unstructured data, and websites provide a good source of information.
  • Transforming these data into reliable indicators is not trivial, but it surely pays off (I hope my employer feels the same).
  • The boundary between forecasting and data science is getting even thinner, requiring forecasters to either learn data science skills themselves or work alongside data scientists.

Thank you!

Contact:

Personal website: http://rleripio.com

Book: R for Economic Research (online & free) http://book.rleripio.com

References

Garcia, Márcio G. P., Marcelo C. Medeiros, and Gabriel F. R. Vasconcelos. 2017. “Real-Time Inflation Forecasting with High-Dimensional Models: The Case of Brazil.” International Journal of Forecasting 33 (3): 679–93. https://doi.org/10.1016/j.ijforecast.2017.02.002.
Medeiros, Marcelo C., Gabriel F. R. Vasconcelos, Álvaro Veiga, and Eduardo Zilberman. 2021. “Forecasting Inflation in a Data-Rich Environment: The Benefits of Machine Learning Methods.” Journal of Business & Economic Statistics 39 (1): 98–119. https://doi.org/10.1080/07350015.2019.1637745.
Stock, James H., and Mark W. Watson. 2010. “Modeling Inflation After the Crisis.” Working Paper 16488. Working Paper Series. National Bureau of Economic Research. https://doi.org/10.3386/w16488.