The Ohio State University
Large-scale structure
Credit: Illustris
Cosmic Reionization
Credit: ESO
Motions of a billion stars
Credit: ESA
P(physics | observation) ∝ P(observation | physics) × P(physics)
P(price | car) ∝ P(car | price) × P(price)
P(physics | observation) ∝ P(observation | physics) × P(physics)
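Read concretely, the proportionality above is just "multiply and normalize". A minimal sketch, assuming a one-dimensional toy "physics" parameter with a Gaussian prior and a Gaussian measurement model (all names and numbers here are illustrative, not from the talk):

```python
import numpy as np

theta = np.linspace(0.0, 1.0, 501)                        # grid over a toy "physics" parameter
prior = np.exp(-0.5 * ((theta - 0.5) / 0.2) ** 2)         # toy Gaussian prior, P(physics)
prior /= prior.sum()

observation, sigma = 0.62, 0.05                           # toy measurement and its noise
likelihood = np.exp(-0.5 * ((observation - theta) / sigma) ** 2)   # P(observation | physics)

posterior = likelihood * prior                            # the proportionality above
posterior /= posterior.sum()                              # normalize on the grid

print(theta[np.argmax(posterior)])                        # maximum a posteriori estimate
```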
Very high dimensional
Credit: arxivsorter.org
Map of astronomy research topics (exoplanets, pulsars, plasma, turbulence, cosmology) with associated Nobel Prizes (1974, 1993, 2011, 2019)
Star Formation, Galaxy Evolution, Black Hole Physics
Complexity
Stochasticity
My niece
E.g., orientation
projecting out uninformative variability
Sérsic profile
(or "structures")
("uninformative" variances)
Complexity
Stochasticity
Cosmic Microwave Background
Reionization
Intergalactic Medium Tomography
Large-Scale Structure
(or "structures")
("uninformative" variances)
Simple
Complex
Power spectrum
Power spectrum
Complexity
Stochasticity
(or "structures")
("uninformative" variances)
(or "activation function")
e.g., ReLU
e.g., tanh, sigmoid
Consider a single random variable
In 1D, the power spectrum is equivalent to taking the second moment
Variance
Skewness
The "tail" differentiates these two patterns
Consider a single random variable
Variance
Skewness
Skewness defines locality
Classical lore: characterizing a random variable with all its moments
Heavy tail
(Non-Gaussian)
Depends critically on the "outliers"; subject to sampling noise
The estimates are noisy
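A small illustration of why moment estimates are noisy for heavy-tailed data, assuming a Student-t toy distribution (purely synthetic numbers): repeated kurtosis estimates from samples of the same distribution scatter widely because each is dominated by a few outliers.

```python
import numpy as np

rng = np.random.default_rng(1)

def excess_kurtosis(x):
    x = x - x.mean()
    return np.mean(x ** 4) / np.mean(x ** 2) ** 2 - 3.0

# Student-t with 5 degrees of freedom: heavy-tailed, true excess kurtosis = 6
estimates = [excess_kurtosis(rng.standard_t(df=5, size=2000)) for _ in range(200)]
print(np.mean(estimates), np.std(estimates))   # estimates scatter widely from sample to sample
```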
"Folding"
"Folding" = non-linear operation + averaging
"Folding"
"Folding" = non-linear operation + averaging
"Folding"
Linear order with respect to
stable and robust summary statistics
(or "activation function")
e.g., ReLU
e.g., tanh, sigmoid
A necessary set of operations to capture high-order moments robustly
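A minimal sketch of one "folding" step, assuming a 1D field and a simple Morlet-like filter chosen only for illustration: convolve, take the modulus (the non-linear operation), then average. Iterating this recipe is the basic structure of the scattering transform.

```python
import numpy as np

def wavelet(n, scale):
    # A crude Morlet-like filter: an oscillation under a Gaussian envelope
    t = np.arange(-n // 2, n // 2)
    return np.exp(1j * t / scale) * np.exp(-0.5 * (t / (4 * scale)) ** 2)

def fold(x, scale):
    w = wavelet(x.size, scale)
    conv = np.fft.ifft(np.fft.fft(x) * np.fft.fft(np.fft.ifftshift(w)))  # circular convolution
    return np.abs(conv)                        # the non-linear operation (modulus) ...

rng = np.random.default_rng(2)
x = rng.normal(size=1024)
s1 = [fold(x, scale).mean() for scale in (2, 4, 8, 16)]   # ... followed by averaging
print(s1)                                      # first-order scattering-like coefficients
```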
Credit: Wikipedia
Background galaxies
Foreground dark matter
Unlensed
Lensed
Vary cosmological parameters
Dark Matter Density (0.25–0.40) vs. Growth Amplitude (0.7–0.9)
Power spectrum fails to distinguish the intricate differences between the two maps
Power spectrum constraints in the Dark Matter Density (0.25–0.40) vs. Growth Amplitude (0.7–0.9) plane
Scattering Transform
International Astrostatistics Association Award
Vary cosmological parameters: Dark Matter Density vs. Growth Amplitude
Power spectrum fails to distinguish the intricate differences between the two maps
Parity violating vs. non-violating
Detection significance vs. number of training patches of the sky: Scattering Transform vs. Convolutional Neural Network
P(physics | observation) ∝ P(observation | physics) × P(physics)
Describing the high-dimensional likelihood as it is
Text Prompt: "Kangaroo operating a telescope, looking into space"
Images
All possible images
Cosmological parameters
Images
Text Prompt
Generator (machine 1)
Discriminator (machine 2)
Neural Network
Truth (simulations)
Fréchet distance: ~0.2 (truth × truth), ~2–2.5 (truth × GAN, StyleGAN 2), ~0.4 (truth × Diffusion Models)
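For reference, the Fréchet distance quoted above compares two sets of feature vectors through Gaussians fitted to each set. A minimal sketch, with placeholder feature arrays standing in for the network features used on the slide:

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(feats_a, feats_b):
    mu_a, mu_b = feats_a.mean(0), feats_b.mean(0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    covmean = sqrtm(cov_a @ cov_b).real          # matrix square root of the covariance product
    return np.sum((mu_a - mu_b) ** 2) + np.trace(cov_a + cov_b - 2 * covmean)

rng = np.random.default_rng(3)
truth = rng.normal(size=(2000, 16))              # placeholder "truth" features
model = rng.normal(loc=0.1, size=(2000, 16))     # a slightly biased "generated" set
print(frechet_distance(truth, model))
```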
Constraints on Dark Matter Density (0.2–0.5) vs. Growth Amplitude (0.7–0.9)
Power spectrum
Normalizing Flow
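A minimal sketch of the normalizing-flow machinery, assuming PyTorch and a single affine coupling layer trained by exact maximum likelihood on a 2D toy density; a real application stacks many such layers and conditions the flow on the data so that it models the posterior over cosmological parameters.

```python
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(1, 64), nn.ReLU(), nn.Linear(64, 2))

    def forward(self, x):                        # x -> z, also returns log|det J|
        x1, x2 = x[:, :1], x[:, 1:]
        s, t = self.net(x1).chunk(2, dim=1)
        z2 = x2 * torch.exp(s) + t
        return torch.cat([x1, z2], dim=1), s.squeeze(1)

flow = AffineCoupling()
base = torch.distributions.Normal(0.0, 1.0)
opt = torch.optim.Adam(flow.parameters(), lr=1e-3)

for _ in range(200):
    x = torch.randn(256, 2)
    x[:, 1] += x[:, 0] ** 2                      # toy correlated target samples
    z, logdet = flow(x)
    loglike = base.log_prob(z).sum(1) + logdet   # change of variables
    loss = -loglike.mean()
    opt.zero_grad(); loss.backward(); opt.step()
```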
A shameless promotion of 1990s Japanese anime
Image = fixed dimension
Graph = varying dimension
Black hole mass
Black hole accretion
Gas metallicity
Stellar metallicity
Stellar mass
Rotation curve vmax
Velocity dispersion
Stellar mass
For individual nodes:
Merging redshift
Pairwise distances
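A minimal sketch of the graph representation and one round of message passing, in plain NumPy with hypothetical feature names; the key point is that the number of nodes (progenitors) can differ from system to system, unlike a fixed-size image.

```python
import numpy as np

def message_pass(node_feats, edges):
    # node_feats: (n_nodes, n_features); edges: list of (i, j) pairs
    out = node_feats.copy()
    for i in range(node_feats.shape[0]):
        neighbors = [j for a, j in edges if a == i]
        if neighbors:
            out[i] += node_feats[neighbors].mean(axis=0)   # aggregate neighbor information
    return out

# Two systems with different numbers of progenitors (varying dimension is fine)
system_a = np.random.default_rng(4).normal(size=(5, 3))    # e.g., mass, metallicity, redshift
system_b = np.random.default_rng(5).normal(size=(8, 3))
edges_a = [(i, j) for i in range(5) for j in range(5) if i != j]   # fully connected graph
print(message_pass(system_a, edges_a).shape, system_b.shape)
```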
Stellar Metallicity, Velocity Dispersion, Black Hole Mass
Merging redshift (0.2–2.0); log likelihood vs. displacement [100 kpc]: normal vs. outlier (simulated system)
Cosmic Reionization
Length [Mpc] (0–250)
Power spectrum
Scattering transform
Fraction of gas in galaxies, and its dependence on the halo mass
Ionizing photon escape fraction, and its dependence on the halo mass
Galaxy mass below which star formation is suppressed
Star formation evolution time scale
X-ray emissivity
X-ray escape threshold
P(physics | observation) ∝ P(observation | physics) × P(physics)
Very high dimensional
Observation
Physics
Summary Statistics
Theory
Frequency
Power Spectral Density
Oscillation
Granulation
P(physics | observation) ∝ P(observation | physics) × P(physics)
Observation
Physics
Summary Statistics
Theory
Power Spectrum
Convolutional Neural Networks
Scattering Transform
Figure of merit vs. galaxy number density (10–1000): noiseless, DES, Rubin, Euclid / CSST / Roman; 15 vs. 37 coefficients
Complexity
Stochasticity
Human heuristics (power spectrum)
(or "structures")
("uninformative" variances)
Scattering Transform
Simple
Complex
Log Planet Mass [inferred − truth], Log Viscosity [inferred − truth]: Zhang+22 vs. this study (Scattering Transform); 0.11, 0.14
Observation
The ML tool with the relevant symmetries
Blindly applying machine learning
Classical tool
(e.g., power spectrum)
stellar interior,
surface gravity
Local receptive field
Long-range information
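A small sketch of the contrast, assuming PyTorch: a 1D convolution mixes only a local window of the time series per layer, whereas a single self-attention layer lets every time step attend to every other.

```python
import torch
import torch.nn as nn

series = torch.randn(1, 1, 1000)                       # (batch, channels, time)
conv = nn.Conv1d(1, 8, kernel_size=9, padding=4)       # local receptive field of 9 samples
local = conv(series)

attn = nn.MultiheadAttention(embed_dim=8, num_heads=1, batch_first=True)
tokens = local.transpose(1, 2)                         # (batch, time, channels)
global_mix, _ = attn(tokens, tokens, tokens)           # every step attends to all steps
print(local.shape, global_mix.shape)
```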
Inferred vs. ground-truth surface gravity (0–4): state-of-the-art vs. this study
Neural scaling laws also apply to astronomical time series data
"Model Size"
Performance
Generative residual [dex] vs. wavelength [Å] (4500–5000 Å)
Transformers
Convolutional Neural Networks
Multi-Layer Perceptron
The more computing power we can allocate, the more the models will continue to improve
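A minimal sketch of how such a scaling law is usually quantified: assume error ≈ a · N^(−b) in model size (or training steps) and fit the exponent on log-log axes. The numbers below are placeholders, not results from the talk.

```python
import numpy as np

model_size = np.array([1e5, 1e6, 1e7, 1e8])          # hypothetical model sizes
mae = np.array([0.12, 0.07, 0.045, 0.028])           # hypothetical mean absolute errors

b, log_a = np.polyfit(np.log(model_size), np.log(mae), 1)   # straight line in log-log space
print(f"error ~ {np.exp(log_a):.3g} * N^({b:.2f})")  # negative exponent: still improving with scale
```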
Mean absolute error (10^-2 to 10^-1) vs. number of training steps (10^3 to 10^6): Transformers, CNN, Multilayer Perceptron
This Study
Classical Pipeline
[Mg/Fe] vs. [Fe/H] plane: Thick Disk, Thin Disk, Gaia-Sausage-Enceladus
Fails to extract detailed structures
Distribution Function of 6D phase space
Gravitational Potential
(1) Liouville's Theorem
(2) Stationary
i.e., the system is collisionless
"Penalty" function: minimize
Inferring potential from the distribution function of phase-space tracers
(1) Liouville's Theorem
(2) Stationary
i.e., the system is collisionless
Gravitational potential Φ(r) (represented by a neural network)
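A minimal sketch of the kind of penalty this setup implies, assuming PyTorch: with a (placeholder) phase-space density f(x, v) and a neural-network potential, stationarity of the collisionless Boltzmann equation requires v·∂f/∂x − ∂Φ/∂x·∂f/∂v = 0, and the penalty is the mean squared residual evaluated with automatic differentiation. In practice f would be a learned density estimate of the observed tracers, not the toy Gaussian used here.

```python
import torch
import torch.nn as nn

phi_net = nn.Sequential(nn.Linear(3, 64), nn.Tanh(), nn.Linear(64, 1))   # Phi(x)

def log_f(x, v):
    # Placeholder phase-space density (an isotropic Gaussian); in the real problem this
    # is a learned density estimate of the phase-space tracers.
    return -0.5 * (x ** 2).sum(-1) - 0.5 * (v ** 2).sum(-1)

def stationarity_penalty(x, v):
    x = x.clone().requires_grad_(True)
    v = v.clone().requires_grad_(True)
    f = log_f(x, v).exp()
    df_dx = torch.autograd.grad(f.sum(), x, create_graph=True)[0]
    df_dv = torch.autograd.grad(f.sum(), v, create_graph=True)[0]
    phi = phi_net(x).sum()
    dphi_dx = torch.autograd.grad(phi, x, create_graph=True)[0]
    residual = (v * df_dx).sum(-1) - (dphi_dx * df_dv).sum(-1)   # v.df/dx - dPhi/dx.df/dv
    return (residual ** 2).mean()

x = torch.randn(512, 3)
v = torch.randn(512, 3)
print(stationarity_penalty(x, v))   # minimize this with respect to the potential's parameters
```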
True solution vs. recovery: maps in the (x, y) plane (−4 to 4) and profiles of v and the potential as functions of r
Input: Phase-space tracers
Output: Gravitational potential
while satisfying the Liouville theorem and being stationary
N-body Simulations
Deep Potential Recovery
x [kpc] vs. y [kpc]
Observation
Physics
Summary Statistics
Theory
Simulation
Overcoming the curse of dimensionality
- (1) Symmetries
- (2) Physics-Inspired NN
Generative Models
"interpretable knowledge"
"interpretable knowledge"
"interpretable knowledge"
"interpretable knowledge"
Neural ODE as Normalizing Flow
Graph Neural Network as the integrand
- Can vary in dimension
The message-passing function satisfies E(n) equivariance
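A minimal sketch of the neural-ODE-as-normalizing-flow idea, assuming PyTorch: integrate dz/dt = f(z) and track the log-density with d(log p)/dt = −tr(∂f/∂z) (the instantaneous change of variables). Here f is a small MLP for brevity; on the slide the integrand is a graph neural network so the flow handles a varying number of nodes.

```python
import math
import torch
import torch.nn as nn

dim = 2
f = nn.Sequential(nn.Linear(dim, 32), nn.Tanh(), nn.Linear(32, dim))   # the integrand

def step(z, logp, dt):
    z = z.detach().requires_grad_(True)
    dz = f(z)
    # exact trace of the Jacobian d(dz)/dz, one output dimension at a time
    tr = sum(torch.autograd.grad(dz[:, i].sum(), z, retain_graph=True)[0][:, i]
             for i in range(dim))
    return (z + dt * dz).detach(), logp - dt * tr.detach()

z = torch.randn(128, dim)                               # samples from the base Gaussian
logp = -0.5 * (z ** 2).sum(1) - 0.5 * dim * math.log(2 * math.pi)

for _ in range(20):                                     # simple forward-Euler integration
    z, logp = step(z, logp, dt=0.05)
print(z.shape, logp.shape)                              # samples and their log-densities
```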
Graph Neural Networks
Normalizing Flow
Physics INN
Equivariances
SBI
Main Progenitor
1st Merger
2nd Merger
3rd Merger
4th Merger
Stellar mass
Distance from the main progenitor
Merging redshift
Graphs generated from the TNG 300 simulations
Not possible by looking only at the most similar galaxies in the simulation
Correlations smeared out
The Milky Way
All the progenitors that can make the Milky Way
Correlation matrix of progenitor masses (M0–M4), merging redshifts (z1–z4), and pairwise separations (d01–d34); correlation scale −0.5 to 0.5
A massive main progenitor requires a later merger to compensate
If the first massive merger happens earlier, the other mergers have to be more massive to compensate
Performance relative to the human baseline, 2013–2023: image classification, basic-level reading comprehension, multitask language understanding, mathematical reasoning
Domain adaptation between observed data and synthetic models
Normalized spectrum vs. wavelength [Å] (15900–15950 Å): APOGEE vs. Kurucz model
M-dwarfs are hard!
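One common domain-adaptation ingredient, sketched here only as an illustration (the specific method behind the slide is not spelled out): a maximum mean discrepancy (MMD) penalty that pulls the feature distributions of observed and synthetic spectra together during training.

```python
import torch

def mmd(a, b, bandwidth=1.0):
    # Gaussian-kernel maximum mean discrepancy between two feature sets
    def kernel(x, y):
        d2 = torch.cdist(x, y) ** 2
        return torch.exp(-d2 / (2 * bandwidth ** 2))
    return kernel(a, a).mean() + kernel(b, b).mean() - 2 * kernel(a, b).mean()

observed_feats = torch.randn(256, 32)            # placeholder network features (observed domain)
synthetic_feats = torch.randn(256, 32) + 0.3     # shifted features: a "domain gap"
print(mmd(observed_feats, synthetic_feats))      # add this term to the training loss
```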