learn.probabilistic.init_force_matching
- init_force_matching(energy_param_prior, energy_fn_template, nbrs_init, init_params, position_data, energy_data=None, energy_scale=None, force_data=None, force_scale=None, virial_data=None, virial_scale=None, kt_data=None, box_tensor=None, train_ratio=0.7, val_ratio=0.1, likelihood_distribution=<function logpdf>, prior_scale_distribution=<function logpdf>, prior_scale_init_multiple=1.0, shuffle=False)
Initializes a compatible set of prior, likelihood and initial MCMC samples, as well as train, validation and test loaders, for learning probabilistic potentials via force matching.
Data scales are used to parametrize the exponential prior distributions over the standard deviations of the (energy, force and/or virial) likelihood distributions. This accounts for the different scales of energies, forces and virials, similar to the loss weights in standard force matching. Additionally, the scales of the likelihood components are normalized to be on the same scale as energy_params to facilitate learning.
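The role of a data scale as the parameter of an exponential prior on a likelihood standard deviation can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the library's code: `exponential_logpdf` and `exponential_logpdf_rate` are hypothetical helpers, and the sketch assumes the supplied scale (e.g. force_scale) is used directly as the exponential's scale, i.e. its mean. The `scale` convention matches `scipy.stats.expon(scale=...)`.

```python
import numpy as np

def exponential_logpdf(sigma, scale):
    """Hypothetical helper: log-density of an exponential distribution with mean `scale`.

    Same convention as scipy.stats.expon(scale=scale):
    p(sigma) = exp(-sigma / scale) / scale for sigma >= 0.
    """
    sigma = np.asarray(sigma, dtype=float)
    logp = -np.log(scale) - sigma / scale
    # Standard deviations cannot be negative: zero density outside the support.
    return np.where(sigma >= 0.0, logp, -np.inf)

def exponential_logpdf_rate(sigma, rate):
    """Same distribution parametrized by the rate lambda = 1 / scale."""
    sigma = np.asarray(sigma, dtype=float)
    logp = np.log(rate) - rate * sigma
    return np.where(sigma >= 0.0, logp, -np.inf)

# Hypothetical data scales: force residuals are expected to be much larger
# than energy residuals in this toy setting.
energy_scale, force_scale = 0.1, 5.0
sigmas = np.array([0.05, 1.0, 10.0])

logp_energy = exponential_logpdf(sigmas, energy_scale)
logp_force = exponential_logpdf(sigmas, force_scale)
```

A small energy_scale concentrates prior mass on small energy standard deviations, while a larger force_scale tolerates larger force residuals, which is how the scales act like relative loss weights.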
Note that scale = 1 / lambda for the common parametrization of the exponential distribution via the rate parameter lambda. See the scipy.stats.expon documentation for more details.
- Parameters:
energy_param_prior – Prior function for the energy params, e.g. as generated from 'init_elementwise_prior_fn'.
energy_fn_template – Energy function template
nbrs_init – Initial neighbor list
init_params – Initial energy params
position_data – (N_snapshots x N_particles x dim) array of particle positions
energy_data – (N_snapshots,) array of corresponding energy values, if applicable
energy_scale – Prior scale of energy data.
force_data – (N_snapshots x N_particles x dim) array of corresponding forces acting on particles, if applicable.
force_scale – Prior scale of force components.
virial_data – (N_snapshots,) or (N_snapshots, dim, dim) array of corresponding virial (tensor) values (without kinetic contribution), if applicable.
virial_scale – Prior scale of virial components.
kt_data – Temperature corresponding to each data point. For learning temperature-dependent (coarse-grained) models.
box_tensor – Box tensor, only needed if virial_data is used.
train_ratio – Ratio of the dataset to be used for training. The remaining data is used for validation and testing (see val_ratio).
val_ratio – Ratio of the dataset to be used for validation. The remaining data is used for testing.
likelihood_distribution – Log-likelihood distribution, defaults to Gaussian log-likelihood.
prior_scale_distribution – Log-prior distribution of the likelihood scale parameter. Defaults to exponential distribution.
prior_scale_init_multiple – Initial value of the prior scale multiple. Can be used to initialize prior scales larger or smaller than the prior mean, or to counteract the smaller scale of the gamma distribution compared to its mean.
shuffle – Whether to shuffle data before splitting into train-val-test.
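How train_ratio, val_ratio and shuffle partition the snapshots can be illustrated with the sketch below, which also builds dummy arrays with the documented shapes. The helper `split_indices`, its floor rounding and its seeded NumPy shuffle are assumptions made for illustration; the library's own splitting routine may round or shuffle differently.

```python
import numpy as np

def split_indices(n_snapshots, train_ratio=0.7, val_ratio=0.1, shuffle=False, seed=0):
    """Hypothetical helper: partition snapshot indices into train/val/test."""
    if train_ratio + val_ratio > 1.0:
        raise ValueError("train_ratio + val_ratio must not exceed 1.")
    indices = np.arange(n_snapshots)
    if shuffle:
        # Shuffle once before splitting so every subset samples the whole dataset.
        indices = np.random.default_rng(seed).permutation(indices)
    n_train = int(np.floor(train_ratio * n_snapshots))
    n_val = int(np.floor(val_ratio * n_snapshots))
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # whatever remains is used for testing
    return train, val, test

# Dummy data with the documented shapes: 1000 snapshots, 50 particles, 3D.
n_snapshots, n_particles, dim = 1000, 50, 3
position_data = np.zeros((n_snapshots, n_particles, dim))
force_data = np.zeros((n_snapshots, n_particles, dim))
energy_data = np.zeros((n_snapshots,))

train, val, test = split_indices(n_snapshots)
train_positions = position_data[train]
train_forces = force_data[train]
```

Without shuffling, the split follows the order of the data, so for correlated trajectory data the validation and test sets come from the end of the trajectory.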
- Returns:
A tuple (prior_fn, likelihood_fn, init_samples, train_loader, val_loader, test_loader, test_set). The prior and likelihood can be used to construct the potential function for Hamiltonian MCMC formulations. init_samples is a list of initial values for multiple MCMC chains, e.g. for SGMCMCTrainer. The data loaders are jax-sgmc NumpyDataLoader instances used for training, validation and testing. The test_set can be used for further analyses of the trained model on unseen data.
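A prior and a per-snapshot likelihood combine into the Hamiltonian MCMC potential energy U(theta) = -[log p(theta) + sum_i log p(y_i | theta)], the negative unnormalized log-posterior. The NumPy sketch below shows that construction with hypothetical stand-ins: `log_prior` (a standard normal on toy weights plus an exponential prior on a force standard deviation) and `log_likelihood` (a Gaussian likelihood for a toy linear force model). These are assumptions for illustration and do not reproduce the signatures of the returned prior_fn and likelihood_fn.

```python
import numpy as np

def gaussian_logpdf(x, mean, std):
    """Elementwise Gaussian log-density."""
    return -0.5 * ((x - mean) / std) ** 2 - np.log(std) - 0.5 * np.log(2.0 * np.pi)

PRIOR_FORCE_SCALE = 1.0  # hypothetical prior scale of the force standard deviation

def log_prior(sample):
    """Hypothetical prior: N(0, 1) on weights, exponential on force_std."""
    w, sigma = sample["weights"], sample["force_std"]
    if sigma <= 0.0:
        return -np.inf
    return (np.sum(gaussian_logpdf(w, 0.0, 1.0))
            - np.log(PRIOR_FORCE_SCALE) - sigma / PRIOR_FORCE_SCALE)

def log_likelihood(sample, features, forces):
    """Hypothetical Gaussian likelihood of one snapshot's forces under a
    toy linear model: predicted forces = features @ weights."""
    predicted = features @ sample["weights"]
    return np.sum(gaussian_logpdf(forces, predicted, sample["force_std"]))

def potential(sample, dataset):
    """HMC potential energy: negative unnormalized log-posterior."""
    log_post = log_prior(sample)
    if not np.isfinite(log_post):
        return np.inf  # sample lies outside the prior support
    for features, forces in dataset:
        log_post += log_likelihood(sample, features, forces)
    return -log_post

# Toy dataset: 5 snapshots, 10 force components each, generated from true_w.
rng = np.random.default_rng(0)
true_w = np.array([0.5, -1.0])
dataset = []
for _ in range(5):
    feats = rng.normal(size=(10, 2))
    dataset.append((feats, feats @ true_w + 0.1 * rng.normal(size=10)))

good = {"weights": true_w, "force_std": 0.1}
bad = {"weights": np.zeros(2), "force_std": 0.1}
```

In stochastic-gradient MCMC the sum over snapshots is typically estimated from a mini-batch and rescaled by N_snapshots / batch_size instead of being evaluated over the full dataset.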