AutoBNN: Probabilistic time series forecasting with compositional Bayesian neural networks


Time series problems are ubiquitous, from forecasting weather and traffic patterns to understanding economic trends. Bayesian approaches start with an assumption about the data's patterns (a prior probability), collect evidence (e.g., new time series data), and continuously update that assumption to form a posterior probability distribution. Traditional Bayesian approaches like Gaussian processes (GPs) and Structural Time Series are extensively used for modeling time series data, e.g., the commonly used Mauna Loa CO2 dataset. However, they often rely on domain experts to painstakingly select appropriate model components and may be computationally expensive. Alternatives such as neural networks lack interpretability, making it difficult to understand how they generate forecasts, and don't produce reliable confidence intervals.
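In symbols, this prior-to-posterior update is just Bayes' rule (a standard identity, stated here for reference rather than taken from the original post):

p(theta | D) ∝ p(D | theta) · p(theta),   i.e.,   posterior ∝ likelihood × prior,

where theta denotes the model's parameters (for example, kernel hyperparameters or network weights) and D is the observed time series.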

To that end, we present AutoBNN, a new open-source package written in JAX. AutoBNN automates the discovery of interpretable time series forecasting models, provides high-quality uncertainty estimates, and scales effectively for use on large datasets. We describe how AutoBNN combines the interpretability of traditional probabilistic approaches with the scalability and flexibility of neural networks.


AutoBNN

AutoBNN is based on a line of research that over the past decade has yielded improved predictive accuracy by modeling time series using GPs with learned kernel structures. The kernel function of a GP encodes assumptions about the function being modeled, such as the presence of trends, periodicity or noise. With learned GP kernels, the kernel function is defined compositionally: it is either a base kernel (such as Linear, Quadratic, Periodic, Matérn or ExponentiatedQuadratic) or a composite that combines two or more kernel functions using operators such as Addition, Multiplication, or ChangePoint. This compositional kernel structure serves two related purposes. First, it is simple enough that a user who is an expert about their data, but not necessarily about GPs, can construct a reasonable prior for their time series. Second, techniques like Sequential Monte Carlo can be used for discrete searches over small structures and can output interpretable results.
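As a minimal sketch of what this compositionality means in practice (plain NumPy with hypothetical kernel definitions; this is not the AutoBNN or TensorFlow Probability API):

import numpy as np

def linear(x1, x2, bias=1.0):
    # Linear base kernel: k(x1, x2) = bias + x1 * x2
    return bias + x1 * x2

def periodic(x1, x2, period=12.0, length=1.0):
    # Periodic (ExpSinSquared-style) base kernel
    return np.exp(-2.0 * np.sin(np.pi * np.abs(x1 - x2) / period) ** 2 / length ** 2)

def add(k1, k2):
    # Addition operator: the sum of two kernels is again a valid kernel
    return lambda x1, x2: k1(x1, x2) + k2(x1, x2)

def multiply(k1, k2):
    # Multiplication operator: the product of two kernels is again a valid kernel
    return lambda x1, x2: k1(x1, x2) * k2(x1, x2)

# "A linear trend plus a seasonal component whose amplitude grows over time"
kernel = add(linear, multiply(linear, periodic))

xs = np.arange(36.0)
gram = np.array([[kernel(a, b) for b in xs] for a in xs])  # GP covariance matrix

A domain expert can read such a composite directly: each base kernel names a pattern they believe is in the data, and the operators say how those patterns combine.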

AutoBNN improves upon these ideas, replacing the GP with Bayesian neural networks (BNNs) while retaining the compositional kernel structure. A BNN is a neural network with a probability distribution over its weights rather than a fixed set of weights. This induces a distribution over outputs, capturing uncertainty in the predictions. BNNs bring the following advantages over GPs: First, training large GPs is computationally expensive, and traditional training algorithms scale as the cube of the number of data points in the time series. In contrast, for a fixed width, training a BNN will often be approximately linear in the number of data points. Second, BNNs lend themselves better to GPU and TPU hardware acceleration than GP training operations. Third, compositional BNNs can be easily combined with traditional deep BNNs, which have the ability to do feature discovery. One could imagine "hybrid" architectures, in which users specify a top-level structure of Add(Linear, Periodic, Deep), and the deep BNN is left to learn the contributions from potentially high-dimensional covariate information.
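To make the "distribution over weights induces a distribution over outputs" point concrete, here is a toy NumPy sketch of a one-hidden-layer network sampled from a standard-normal prior (an illustration only, not AutoBNN code):

import numpy as np

rng = np.random.default_rng(0)
width = 50
xs = np.linspace(0.0, 10.0, 200)[:, None]              # (200, 1) time inputs

def sample_function(rng, xs, width):
    # Draw one network from the prior over weights.
    w1 = rng.normal(size=(1, width))
    b1 = rng.normal(size=(width,))
    w2 = rng.normal(size=(width, 1)) / np.sqrt(width)   # 1/sqrt(width) output scaling
    hidden = np.maximum(xs @ w1 + b1, 0.0)              # ReLU hidden layer
    return (hidden @ w2).squeeze(-1)

# Each weight sample gives a different function; the spread across samples is
# the (prior) predictive uncertainty. Posterior inference would reweight these draws.
draws = np.stack([sample_function(rng, xs, width) for _ in range(100)])
mean, std = draws.mean(axis=0), draws.std(axis=0)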

How might one translate a GP with compositional kernels into a BNN, then? A single-layer neural network will typically converge to a GP as the number of neurons (or "width") goes to infinity. More recently, researchers have discovered a correspondence in the other direction: many popular GP kernels (such as Matern, ExponentiatedQuadratic, Polynomial or Periodic) can be obtained as infinite-width BNNs with appropriately chosen activation functions and weight distributions. Furthermore, these BNNs remain close to the corresponding GP even when the width is far from infinite. For example, the figures below show the difference in the covariance between pairs of observations, and regression results of the true GPs and their corresponding width-10 neural network versions.

Comparison of Gram matrices between true GP kernels (top row) and their width-10 neural network approximations (bottom row).
Comparison of regression results between true GP kernels (top row) and their width-10 neural network approximations (bottom row).
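One way to see this finite-width behavior numerically is to Monte Carlo estimate the covariance Cov[f(x), f(x')] of a one-hidden-layer ReLU network under its weight prior, and compare a small width against a much larger one that stands in for the infinite-width GP limit. This is an illustrative sketch, not the procedure used to generate the figures above:

import numpy as np

rng = np.random.default_rng(1)
xs = np.linspace(-2.0, 2.0, 5)[:, None]                 # a few input locations

def empirical_gram(width, n_samples=2000):
    # Monte Carlo estimate of Cov[f(x), f(x')] under a standard-normal weight
    # prior, with 1/sqrt(width) scaling on the output layer.
    w1 = rng.normal(size=(n_samples, 1, width))
    b1 = rng.normal(size=(n_samples, 1, width))
    w2 = rng.normal(size=(n_samples, width, 1)) / np.sqrt(width)
    hidden = np.maximum(xs[None] @ w1 + b1, 0.0)         # (n_samples, 5, width)
    f = (hidden @ w2).squeeze(-1)                        # (n_samples, 5)
    return np.einsum('ni,nj->ij', f, f) / n_samples

gram_small = empirical_gram(width=10)
gram_large = empirical_gram(width=500)                   # proxy for the GP limit
print(np.abs(gram_small - gram_large).max())             # small, but not zero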

Finally, the translation is completed with BNN analogues of the Addition and Multiplication operators over GPs, and input warping to produce periodic kernels. BNN addition is straightforwardly given by adding the outputs of the constituent BNNs. BNN multiplication is achieved by multiplying the activations of the hidden layers of the BNNs and then applying a shared dense layer. We are therefore limited to only multiplying BNNs with the same hidden width.
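AutoBNN uses the flax.linen library (described below), so the wiring of these two operators can be sketched roughly as follows. This is a schematic with point-estimate weights and made-up module names; the actual AutoBNN modules place priors over the weights and handle many details omitted here:

from typing import Sequence

import jax
import jax.numpy as jnp
import flax.linen as nn

class OneLayerNet(nn.Module):
    # Stand-in for a base-kernel BNN: one hidden layer plus a linear output.
    width: int

    def setup(self):
        self.hidden_layer = nn.Dense(self.width)
        self.output_layer = nn.Dense(1)

    def hidden(self, x):
        return nn.relu(self.hidden_layer(x))

    def __call__(self, x):
        return self.output_layer(self.hidden(x))

class Add(nn.Module):
    # BNN addition: sum the outputs of the constituent networks.
    bnns: Sequence[nn.Module]

    def __call__(self, x):
        return sum(bnn(x) for bnn in self.bnns)

class Multiply(nn.Module):
    # BNN multiplication: multiply the hidden activations elementwise,
    # then apply a shared dense output layer (so hidden widths must match).
    bnns: Sequence[nn.Module]

    @nn.compact
    def __call__(self, x):
        h = self.bnns[0].hidden(x)
        for bnn in self.bnns[1:]:
            h = h * bnn.hidden(x)
        return nn.Dense(1)(h)

model = Add(bnns=(OneLayerNet(width=50),
                  Multiply(bnns=(OneLayerNet(width=50), OneLayerNet(width=50)))))
params = model.init(jax.random.PRNGKey(0), jnp.ones((8, 1)))
y = model.apply(params, jnp.ones((8, 1)))                # shape (8, 1)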


Using AutoBNN

The AutoBNN package is available within TensorFlow Probability. It is implemented in JAX and uses the flax.linen neural network library. It implements all of the base kernels and operators discussed so far (Linear, Quadratic, Matern, ExponentiatedQuadratic, Periodic, Addition, Multiplication) plus one new kernel and three new operators:

  • a OneLayer kernel, a single hidden layer ReLU BNN,
  • a ChangePoint operator that allows smoothly switching between two kernels,
  • a LearnableChangePoint operator, which is the same as ChangePoint except that the position and slope are given prior distributions and can be learnt from the data, and
  • a WeightedSum operator.

WeightedSum combines two or more BNNs with learnable mixing weights, where the learnable weights follow a Dirichlet prior. By default, a flat Dirichlet distribution with concentration 1.0 is used.

WeightedSums allow a "soft" version of structure discovery, i.e., training a linear combination of many possible models at once. In contrast to structure discovery with discrete structures, such as in AutoGP, this allows us to use standard gradient methods to learn structures, rather than using expensive discrete optimization. Instead of evaluating potential combinatorial structures in series, WeightedSum allows us to evaluate them in parallel.
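The idea can be illustrated with a toy JAX sketch in which three fixed basis functions stand in for base-kernel BNNs and the mixing weights are learned by plain gradient descent. This illustrates the "soft" search only and is not the AutoBNN implementation; the data is made up:

import jax
import jax.numpy as jnp

xs = jnp.linspace(0.0, 6.0, 200)
ys = 0.5 * xs + jnp.sin(2.0 * jnp.pi * xs)               # toy series: trend + seasonality

# Three fixed "leaf" predictors standing in for base-kernel BNNs.
leaves = jnp.stack([xs, jnp.sin(2.0 * jnp.pi * xs), jnp.cos(2.0 * jnp.pi * xs)])

def loss(logits):
    # Softmax keeps the mixing weights on the simplex; a flat Dirichlet prior
    # (concentration 1.0, the default noted above) contributes only a constant,
    # so it is omitted from the objective.
    weights = jax.nn.softmax(logits)
    pred = weights @ leaves                               # soft combination of all leaves
    return jnp.mean((pred - ys) ** 2)

logits = jnp.zeros(3)
grad_fn = jax.jit(jax.grad(loss))
for _ in range(500):                                      # ordinary gradient descent
    logits = logits - 0.1 * grad_fn(logits)
print(jax.nn.softmax(logits))                             # weight concentrates on trend + sine

All candidate leaves are trained together in a single gradient-based fit, rather than being evaluated one discrete structure at a time.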

To easily enable exploration, AutoBNN defines a number of model structures that contain either top-level or internal WeightedSums. The names of these models can be used as the first parameter in any of the estimator constructors, and include things like sum_of_stumps (the WeightedSum over all the base kernels) and sum_of_shallow (which adds all possible combinations of base kernels with all operators).
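For instance, passing one of these names in place of a hand-built model might look like the sketch below. It mirrors the constructor signature of the code example at the end of this post, while the data and its monthly periodicity are made-up placeholders; check the library for the exact interface:

import jax
import numpy as np
import autobnn as ab

xs = np.arange(120.0)[:, None]                            # placeholder: 10 years of monthly data
ys = 0.05 * xs[:, 0] + np.sin(2.0 * np.pi * xs[:, 0] / 12.0)

estimator = ab.estimators.AutoBnnMapEstimator(
    'sum_of_stumps',                                      # named structure instead of a model object
    'normal_likelihood_logistic_noise',
    jax.random.PRNGKey(0),
    periods=[12])
estimator.fit(xs, ys)
low, mid, high = estimator.predict_quantiles(xs)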

Illustration of the sum_of_stumps model. The bars in the top row show the amount by which each base kernel contributes, and the bottom row shows the function represented by each base kernel. The resulting weighted sum is shown on the right.

The figure below demonstrates the technique of structure discovery on the N374 series (a time series of yearly financial data starting from 1949) from the M3 dataset. The six base structures were the ExponentiatedQuadratic (which is the same as the Radial Basis Function kernel, or RBF for short), Matern, Linear, Quadratic, OneLayer and Periodic kernels. The figure shows the MAP estimates of their weights over an ensemble of 32 particles. All of the high-likelihood particles gave a large weight to the Periodic component, low weights to Linear, Quadratic and OneLayer, and a large weight to either RBF or Matern.

Parallel coordinates plot of the MAP estimates of the base kernel weights over 32 particles. The sum_of_stumps model was trained on the N374 series from the M3 dataset (inset in blue). Darker lines correspond to particles with higher likelihoods.

By using WeightedSums as the inputs to other operators, it is possible to express rich combinatorial structures while keeping models compact and the number of learnable weights small. As an example, we include the sum_of_products model (illustrated in the figure below), which first creates a pairwise product of two WeightedSums, and then a sum of the two products. By setting some of the weights to zero, we can create many different discrete structures. The total number of possible structures in this model is 2^16, since there are 16 base kernels that can be turned on or off. All these structures are explored implicitly by training just this one model.

Illustration of nan "sum_of_products" model. Each of nan 4 WeightedSums person nan aforesaid building arsenic nan "sum_of_stumps" model.

We have found, however, that certain combinations of kernels (e.g., the product of Periodic and either Matern or ExponentiatedQuadratic) lead to overfitting on many datasets. To prevent this, we have defined model classes like sum_of_safe_shallow that exclude such products when performing structure discovery with WeightedSums.

For training, AutoBNN provides AutoBnnMapEstimator and AutoBnnMCMCEstimator to perform MAP and MCMC inference, respectively. Either estimator can be combined with any of the six likelihood functions, including four based on normal distributions with different noise characteristics for continuous data and two based on the negative binomial distribution for count data.

Result from running AutoBNN on the Mauna Loa CO2 dataset in our example colab. The model captures the trend and seasonal component in the data. Extrapolating into the future, the mean prediction slightly underestimates the actual trend, while the 95% confidence interval gradually widens.

To fit a model like the one in the figure above, all it takes is about 10 lines of code, using the scikit-learn–inspired estimator interface:

import autobnn as ab
import jax

model = ab.operators.Add(
    bnns=(ab.kernels.PeriodicBNN(width=50),
          ab.kernels.LinearBNN(width=50),
          ab.kernels.MaternBNN(width=50)))

estimator = ab.estimators.AutoBnnMapEstimator(
    model, 'normal_likelihood_logistic_noise',
    jax.random.PRNGKey(42), periods=[12])

estimator.fit(my_training_data_xs, my_training_data_ys)
low, mid, high = estimator.predict_quantiles(my_training_data_xs)

Conclusion

AutoBNN provides a powerful and flexible framework for building sophisticated time series prediction models. By combining the strengths of BNNs and GPs with compositional kernels, AutoBNN opens a world of possibilities for understanding and forecasting complex data. We invite the community to try the colab, and leverage this library to innovate and solve real-world challenges.


Acknowledgements

AutoBNN was written by Colin Carroll, Thomas Colthurst, Urs Köster and Srinivas Vasudevan. We would like to thank Kevin Murphy, Brian Patton and Feras Saad for their advice and feedback.
