learn.probabilistic.init_force_matching


init_force_matching(energy_param_prior, energy_fn_template, nbrs_init, init_params, position_data, energy_data=None, energy_scale=None, force_data=None, force_scale=None, virial_data=None, virial_scale=None, kt_data=None, box_tensor=None, train_ratio=0.7, val_ratio=0.1, likelihood_distribution=<function logpdf>, prior_scale_distribution=<function logpdf>, prior_scale_init_multiple=1.0, shuffle=False)

Initializes a compatible set of prior, likelihood and initial MCMC samples, as well as train, validation and test loaders, for learning probabilistic potentials via force matching.

Data scales are used to parametrize the exponential prior distributions for the standard deviations of the energy, force and/or virial likelihood components. This accounts for the different scales of energies, forces and virials, similar to the loss weights in standard force matching. Additionally, the scales of the likelihood components are normalized to be on the same scale as energy_params to facilitate learning.
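A minimal sketch of how per-component likelihood scales can act like loss weights, assuming a Gaussian likelihood and a hypothetical parametrization in which each standard deviation is a dimensionless, order-one parameter multiplied by the corresponding data scale. The names `log_likelihood`, `scale_params` and `data_scales` are illustrative and not part of chemtrain's API; the normalization actually used by init_force_matching may differ.

```python
import numpy as np


def gaussian_logpdf(x, mean, std):
    """Elementwise Gaussian log-density log N(x | mean, std**2)."""
    return -0.5 * ((x - mean) / std) ** 2 - np.log(std) - 0.5 * np.log(2.0 * np.pi)


def log_likelihood(pred, data, scale_params, data_scales):
    """Sum Gaussian log-likelihoods over the available data components.

    Hypothetical parametrization: std = scale_param * data_scale, so the
    sampled scale_params stay of order one regardless of physical units.
    """
    total = 0.0
    for key, observed in data.items():
        std = scale_params[key] * data_scales[key]
        total += np.sum(gaussian_logpdf(observed, pred[key], std))
    return total


# Toy snapshots: energies and forces with very different typical spreads.
data = {"energy": np.array([1.0, 2.0]), "force": np.zeros((2, 3, 3))}
pred = {"energy": np.array([1.1, 1.9]), "force": np.full((2, 3, 3), 0.01)}
scale_params = {"energy": 1.0, "force": 1.0}  # order-one sampled parameters
data_scales = {"energy": 0.5, "force": 0.05}  # stand-ins for energy_scale, force_scale
print(log_likelihood(pred, data, scale_params, data_scales))
```

Because each residual enters as residual / std, a smaller data scale weights that component more strongly, which mirrors the role of loss weights in standard force matching.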

Note that scale = 1 / lambda for the common parametrization of the exponential distribution via the rate parameter lambda. See the scipy.stats.expon documentation for more details.
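The equivalence of the two parametrizations can be checked numerically. The sketch below writes out the exponential log-density in both forms using only the standard library; it mirrors the density of scipy.stats.expon with loc=0, and the scale value is a placeholder rather than a recommended setting. Since the mean of an exponential distribution equals its scale, a data scale is also the prior mean of the corresponding standard deviation.

```python
import math


def expon_logpdf_scale(x, scale):
    """Log-density of the exponential distribution with the given scale (x >= 0)."""
    if x < 0:
        return -math.inf
    return -math.log(scale) - x / scale


def expon_logpdf_rate(x, lam):
    """Same density in the rate parametrization, lam = 1 / scale."""
    if x < 0:
        return -math.inf
    return math.log(lam) - lam * x


force_scale = 0.05  # placeholder prior scale for force components
lam = 1.0 / force_scale  # corresponding rate parameter
print(expon_logpdf_scale(0.1, force_scale), expon_logpdf_rate(0.1, lam))
```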

Parameters:
  • energy_param_prior – Prior function for energy_params, e.g. as generated by 'init_elementwise_prior_fn'.

  • energy_fn_template – Energy function template

  • nbrs_init – Initial neighbor list

  • init_params – Initial energy params

  • position_data – (N_snapshots x N_particles x dim) array of particle positions

  • energy_data – (N_snapshots,) array of corresponding energy values, if applicable

  • energy_scale – Prior scale of energy data.

  • force_data – (N_snapshots x N_particles x dim) array of corresponding forces acting on particles, if applicable.

  • force_scale – Prior scale of force components.

  • virial_data – (N_snapshots,) or (N_snapshots, dim, dim) array of corresponding virial (tensor) values (without kinetic contribution), if applicable.

  • virial_scale – Prior scale of virial components.

  • kt_data – Temperature (kT) corresponding to each data point. For learning temperature-dependent (coarse-grained) models.

  • box_tensor – Box tensor, only needed if virial_data is used.

  • train_ratio – Fraction of the dataset used for training. The remaining data is split into validation and test sets according to val_ratio.

  • val_ratio – Fraction of the dataset used for validation. The remaining 1 - train_ratio - val_ratio of the data is used for testing.

  • likelihood_distribution – Log-likelihood function. Defaults to the Gaussian log-likelihood.

  • prior_scale_distribution – Log-prior distribution of the likelihood scale parameters. Defaults to the exponential distribution.

  • prior_scale_init_multiple – Initial value of the prior scale multiple. Can be used to initialize the prior scales larger or smaller than the prior mean, or to counteract the gamma distribution's scale being smaller than its mean.

  • shuffle – Whether to shuffle the data before splitting it into train, validation and test sets.
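The interplay of train_ratio, val_ratio and shuffle can be sketched as an index split over snapshots. split_indices is a hypothetical helper, not a chemtrain function; it assumes the test set receives the remaining 1 - train_ratio - val_ratio of the snapshots and that set sizes are rounded down, which may differ from the library's exact rounding and random-number handling.

```python
import numpy as np


def split_indices(n_samples, train_ratio=0.7, val_ratio=0.1, shuffle=False, seed=0):
    """Return (train, val, test) index arrays over n_samples snapshots.

    Hypothetical helper: the test set gets whatever remains after the
    train and validation sets, and sizes are rounded down.
    """
    if train_ratio + val_ratio > 1.0:
        raise ValueError("train_ratio + val_ratio must not exceed 1")
    idx = np.arange(n_samples)
    if shuffle:
        # Random permutation so that correlated consecutive snapshots are spread out.
        idx = np.random.default_rng(seed).permutation(n_samples)
    n_train = int(train_ratio * n_samples)
    n_val = int(val_ratio * n_samples)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


train, val, test = split_indices(10, shuffle=True, seed=42)
print(len(train), len(val), len(test))
```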

Returns:

A tuple (prior_fn, likelihood_fn, init_samples, train_loader, val_loader, test_loader, test_set). The prior and likelihood can be used to construct the potential function for Hamiltonian MCMC formulations. init_samples is a list of initial values for multiple MCMC chains, e.g. for SGMCMCTrainer. The data loaders are jax-sgmc NumpyDataLoader instances used for training, validation and testing. The test_set can be used for further analysis of the trained model on unseen data.
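As an illustration of how a prior and a likelihood combine into a potential for (stochastic-gradient) Hamiltonian MCMC, the sketch below forms U(θ) = -log p(θ) - (N/n) Σ_i log p(y_i | θ) over a minibatch of size n drawn from N training samples. It is a library-free toy: the call signatures prior_fn(sample) and likelihood_fn(sample, observation) are assumptions for illustration, not the signatures of the functions returned by init_force_matching, and jax-sgmc provides its own machinery for building such potentials.

```python
import math


def make_potential(prior_fn, likelihood_fn, n_total):
    """Return U(sample, batch) = -(log prior + N / n * sum of batch log-likelihoods).

    Assumed (hypothetical) signatures: prior_fn(sample) -> float and
    likelihood_fn(sample, observation) -> float, both log-densities.
    For uniformly drawn minibatches, the N / n factor makes the batch term
    an unbiased estimate of the full-data log-likelihood.
    """
    def potential(sample, batch):
        log_prior = prior_fn(sample)
        log_lik = sum(likelihood_fn(sample, obs) for obs in batch)
        return -(log_prior + n_total / len(batch) * log_lik)
    return potential


def normal_logpdf(x, mean=0.0, std=1.0):
    """Gaussian log-density used for the toy prior and likelihood."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std) - 0.5 * math.log(2.0 * math.pi)


# Toy model: scalar parameter theta with a standard normal prior, y_i ~ N(theta, 1).
observations = [0.9, 1.1, 1.3, 0.7]
potential = make_potential(
    prior_fn=lambda theta: normal_logpdf(theta),
    likelihood_fn=lambda theta, y: normal_logpdf(y, mean=theta),
    n_total=len(observations),
)
print(potential(1.0, observations))      # full batch
print(potential(1.0, observations[:2]))  # rescaled minibatch estimate
```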