The Ohio State University
~ 2 centers
~ 2 centers
Physical Sciences
~ 3 centers
7 centers x 15M ~ 100M
Environmental Sciences
Biological Sciences
YST, Annual Review of Astronomy and Astrophysics, arXiv: 2510.10713
[Figure: axes Dark Matter Density (0.25 to 0.40) and Growth Amplitude (0.7 to 0.9)]
Sihao Cheng, YST+, 2020
My niece
Highly non-Gaussian
Weakly non-Gaussian
Cosmic large-scale structure
Data / Observation
Theory / Hypothesis
Analysis Pipelines
True
False
AlphaFold
Data / Observation
Theory / Hypothesis
LambdaCDM
True
False
Data
Theory
State of the research
Making "plans"
Harness reasoning
Pinheiro, ..., YST+, 2025
Sun, YST+, 2024, 2025
Can A.I. agents understand spectral data (spectral energy distribution) from JWST?
A default fit with an SED model
Extinction model?
Young stellar population?
* In the classic tale of Faust, Mephisto is a demon who tempts the scholar Faust with knowledge and power in exchange for his soul.
Proposing actions
Execute actions
State evaluation
Knowledge distillation
Knowledge base
1. Proposing Actions - e.g., different physical models / parameter ranges
2. Execute Actions - write configuration files and run the codes autonomously
3. State Evaluation - evaluate the results (beyond a single error metric)
4. Knowledge Distillation - summarise useful actions given the previous state
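The four steps above can be sketched as a loop around a knowledge base. This is a minimal illustration under stated assumptions, not the Mephisto implementation: the toy constant-flux model, the helper names (`propose`, `execute`, `evaluate`, `distill`, `run_agent`), and the random proposal rule standing in for an LLM are all hypothetical.

```python
import random

def propose(knowledge, rng):
    """Step 1: propose an action. A random draw stands in for an LLM (assumption)."""
    if knowledge:
        # Use distilled knowledge: search near the best action found so far.
        best_level = knowledge[-1]["action"]["level"]
        return {"level": best_level + rng.uniform(-0.5, 0.5)}
    return {"level": rng.uniform(0.0, 10.0)}

def execute(action, data):
    """Step 2: 'run the code'. Here the model is a toy constant-flux prediction."""
    return [action["level"]] * len(data)

def evaluate(prediction, data, sigma=1.0):
    """Step 3: evaluate with more than one number: chi-square plus the worst band."""
    residuals = [(p - d) / sigma for p, d in zip(prediction, data)]
    chi2 = sum(r * r for r in residuals)
    worst_band = max(range(len(residuals)), key=lambda i: abs(residuals[i]))
    return {"chi2": chi2, "worst_band": worst_band}

def distill(knowledge, action, state):
    """Step 4: keep an action only if it improved on the best fit so far."""
    if not knowledge or state["chi2"] < knowledge[-1]["state"]["chi2"]:
        knowledge.append({"action": action, "state": state})

def run_agent(data, n_iterations=30, seed=0):
    rng = random.Random(seed)
    knowledge = []  # the knowledge base, ordered from worst to best kept fit
    for _ in range(n_iterations):
        action = propose(knowledge, rng)
        prediction = execute(action, data)
        state = evaluate(prediction, data)
        distill(knowledge, action, state)
    return knowledge

knowledge = run_agent(data=[4.8, 5.1, 5.0, 5.3])
best = knowledge[-1]
```

Because `distill` only stores improvements, the last entry of `knowledge` is always the best fit found, and later proposals are drawn near it. A real agent would replace the random draw with a language-model call and store text summaries rather than raw parameter values.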
[Figure: Chi-Square of the Fit (axis ticks 5.1 to 6.4) vs. Number of Learning Iterations (0 to 30), fitting JWST JADES data. Curves: Mephisto and the GPT-4o baseline ("think without knowledge"). Sun, YST+, 2024]
"If the fit is overestimated in the UV and optical bands, increasing the E_BV_lines parameter may lead to a better fit by accounting for more dust attenuation in these bands."
Sun, YST+, 2025
With COSMOS2020 SEDs
Mephisto finds better solutions using only 1% of the trials that brute force methods require
YST+, 2025g
Liu, YST+ 2024
Pesta & YST, in prep.
[Figure: two light curves, Magnitude vs. Phase (-0.5 to 0.5); magnitude axes span 11 to 15 and 12 to 16]
Caught in the brief, unstable evolutionary semi-detached phase
A rare alignment of a massive supergiant in a 13-year orbit
P=13 years
P=2.3 days
Graduate student / Postdoc
Princeton Language and Intelligence Lab, June 2024
Human accuracy: ~80%
GPT-4o: ~47%
ARC Prize Foundation (ARC-AGI-2, 2025)
Human panel: ~100%
GPT-5: ~10%
Complex calculations
Logical inference (?)
Memorizing information
Language
Coding
Spatial reasoning
Common sense physics (water flows downhill)
Basic motor skills
Visual reasoning
Understanding context
Pinheiro, ..., YST+, 2025
Yang,... YST+, 2025, ICCV
YST+, 2025d
Sun, YST+, 2024b
de Haan, YST+, 2025
[Figure: Score (%) vs. Cost per 1 SED Source (USD)]
AstroSage-8B (de Haan, YST+ 2025a)
AstroSage-70B (de Haan, YST+ 2025b)
de Haan, YST+, 2024, 2025
de Haan, YST+, 2025
Supported by the Alfred P. Sloan Foundation and CCAPP / OSU
YST+ 2026, Nature Astronomy, in-press.
Ernest Hemingway's six-word story
For sale:
Baby shoes,
Never worn.
Annotated
Labelled Data
Unlabelled Data
Interacting with "physical" world
"If there is a gross underestimation in the MWIR bands, consider exploring a wider range of fracAGN values in the agn module to improve the fit in these bands."
[Figure, built up over three slides: Chi-Square of the Fit (axis ticks 5.1 to 6.4) vs. Number of Learning Iterations (0 to 30), with a second curve for the number of photometry bands fitted within 1σ; phases labelled "Exploration" and "Exploitation". Sun, YST+, 2024]
Why this plateau?
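The two quantities tracked here, the chi-square of the fit and the number of photometry bands fitted within 1σ, can disagree, which is one reason to look beyond a single error metric. The sketch below uses made-up fluxes and a hypothetical helper `fit_metrics`; it is an illustration, not the evaluation code behind the plot.

```python
def fit_metrics(observed, model, sigma):
    """Return (chi-square, number of bands whose residual is within 1 sigma)."""
    chi2 = 0.0
    n_within = 0
    for obs, mod, err in zip(observed, model, sigma):
        z = (obs - mod) / err  # residual in units of the measurement error
        chi2 += z * z
        if abs(z) <= 1.0:
            n_within += 1
    return chi2, n_within

# Illustrative fluxes only (not real photometry).
observed = [1.0, 2.0, 3.0, 4.0]
sigma = [0.5, 0.5, 0.5, 0.5]
fit_a = [1.0, 2.0, 3.0, 6.0]  # three bands exact, one band off by 4 sigma
fit_b = [2.0, 3.0, 4.0, 5.0]  # every band off by 2 sigma

chi2_a, bands_a = fit_metrics(observed, fit_a, sigma)
chi2_b, bands_b = fit_metrics(observed, fit_b, sigma)
```

Both fits have the same chi-square, yet the first matches three bands within 1σ and the second matches none, so the band count carries information the chi-square alone does not.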
[Figure: Flux vs. Wavelength [micron]. Sun, YST+, 2025]
Learn from the data
Summarize "knowledge"
Examine and include prior knowledge
Expedite discovery
Use the learned knowledge as context
[Figure: Count [thousands] vs. Year (2000 to 2020) for scientific concepts (axis ticks 7 to 11). Sun, YST+, 2024b]
[Figure, built up over several slides: Count [thousands] vs. Year (2000 to 2020) for technical concepts (axis ticks 0.3 to 1.5): Numerical simulation, Statistics, and Machine learning (Linear Regression, Gaussian Process, Random Forest, ...). Values annotated: 152, 210, 230. Sun, YST+, 2024b]
[Diagram: papers (e.g., Ting et al., Einstein et al.) contain concepts and are linked to one another by citation. Example concepts: Concept A: Dark Matter; Concept B: Plasmon; Technical concept: Neural Networks; Scientific concept: Large-Scale Structure]
Distance from concept A to concept B = (expression given in the diagram), averaged over all papers containing concept A
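The exact distance expression is shown only in the diagram, so the sketch below is one plausible reading and not the authors' definition. It assumes the per-paper distance is the fewest citation hops from a paper containing concept A to any paper containing concept B, then averages over all papers containing A. The paper IDs, the tiny graph, and the function names are hypothetical.

```python
from collections import deque

def hops_to_concept(start, target, concepts, citations):
    """Breadth-first search: fewest citation hops from `start` to a paper containing `target`."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        paper, hops = queue.popleft()
        if target in concepts[paper]:
            return hops
        for neighbour in citations.get(paper, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, hops + 1))
    return None  # no connected paper contains the target concept

def concept_distance(a, b, concepts, citations):
    """Average hop count over all papers containing `a` (unreachable papers are skipped)."""
    hops = [hops_to_concept(p, b, concepts, citations) for p in concepts if a in concepts[p]]
    hops = [h for h in hops if h is not None]
    return sum(hops) / len(hops) if hops else float("inf")

# Hypothetical toy graph: which concepts each paper contains, and citation links.
concepts = {
    "P1": {"Dark Matter"},
    "P2": {"Dark Matter", "Plasmon"},
    "P3": {"Plasmon"},
    "P4": {"Dark Matter"},
}
citations = {"P1": ["P4"], "P4": ["P1", "P3"], "P3": ["P4"], "P2": []}

a_to_b = concept_distance("Dark Matter", "Plasmon", concepts, citations)
b_to_a = concept_distance("Plasmon", "Dark Matter", concepts, citations)
```

Because the average runs over papers containing the first concept, the measure is not symmetric under this reading: the distance from A to B need not equal the distance from B to A.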
[Figure: Log Average Linkage (-4.6 to -4.0) vs. Year (2000 to 2020), numerical simulation x scientific concepts; phase marked "Technology development". Sun, YST+, 2024b]
[Diagram: concepts and papers, with Scientific Concept: Large-Scale Structure and Numerical Simulations. Simulations being developed; linkage decoupled]
[Figure: Log Average Linkage (-4.6 to -4.0) vs. Year (2000 to 2020), numerical simulation x scientific concepts; phases marked "Technology development" and "Technology deployment". Sun, YST+, 2024b]
[Diagram: concepts and papers, with Scientific Concept: Large-Scale Structure and Numerical Simulations. Simulations being deployed to sciences; linkage increases]
[Figure: Log Average Linkage (-4.6 to -4.0) vs. Year (2000 to 2020), numerical simulation x scientific concepts; curves: N-body simulation and Hydrodynamical simulation. Sun, YST+, 2024b]
[Figure: Log Average Linkage (-4.6 to -4.0) vs. Year (2000 to 2020), ML x scientific concepts; curves: Gaussian process and multi-layer perceptron]
AI is still 20-50 points worse than humans
Brute force fine-tuning can close the gap in simple descriptive tasks, but not in visual reasoning tasks
Yang,... YST+, 2025, ICCV
YST+, 2025d
[Figure: research areas on both axes: Cosmology, Galaxy, High-energy, Sun/Star, Exoplanet, Simulation, Instrument, AI/Stat. Highlighted: Applications of AI in Stats]
YST+, 2025d
e.g., GPT-5
In the SED case study, we need ~0.1M tokens per source
= USD 1 per source ...
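The per-source figure scales linearly with catalogue size. The sketch below only restates the two numbers above; the implied price per million tokens is derived from them rather than quoted, and the function name `survey_cost_usd` is hypothetical.

```python
TOKENS_PER_SOURCE = 100_000   # ~0.1M tokens per source (SED case study)
USD_PER_SOURCE = 1.0          # ~USD 1 per source

# Implied price per million tokens (derived from the two numbers above).
usd_per_million_tokens = USD_PER_SOURCE * 1_000_000 / TOKENS_PER_SOURCE

def survey_cost_usd(n_sources, usd_per_source=USD_PER_SOURCE):
    """Total cost of running the agent once on every source in a catalogue."""
    return n_sources * usd_per_source

cost_million = survey_cost_usd(1_000_000)       # a million-source catalogue
cost_billion = survey_cost_usd(1_000_000_000)   # a billion-source survey
```

At these rates a billion-source survey would cost on the order of a billion US dollars, which is why the per-source cost matters for surveys of that size.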
e.g., Roman Space Telescope, Euclid Space Telescope
Natural Language Processing experts
Oak Ridge National Lab
Argonne National Lab
Harvard-Smithsonian ADS
U. Illinois Urbana-Champaign
Knowledge Recall
YST+, 2025a
Nguyen, YST+ 2023
What is the primary reason for the decline in the number density of luminous quasars at redshifts greater than 5?
(A) A decrease in the overall star formation rate, leading to fewer potential host galaxies for quasars.
(B) An increase in the neutral hydrogen fraction in the intergalactic medium, which obscures the quasars’ light.
(C) A decrease in the number of massive black hole seeds that can form and grow into supermassive black holes.
(D) An increase in the average metallicity of the Universe, leading to a decrease in the efficiency of black hole accretion.
LLaMA-3.1 70B throughput on four H100 GPUs = ~100 tokens / second
1 SED source = 15 GPU minutes
1B sources = 10M GPU days
A cluster with 10,000 H100 GPUs running for 3 years
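The chain of estimates above can be checked with a few lines of arithmetic. The sketch takes the ~0.1M tokens per source from the SED case study and uses the slide's rounded 15 minutes per source. It treats the four-GPU node as the unit of "GPU time", which is an assumption made to reproduce the slide's numbers; counting each of the four GPUs separately would quadruple them.

```python
TOKENS_PER_SOURCE = 100_000     # ~0.1M tokens per source (SED case study)
TOKENS_PER_SECOND = 100         # LLaMA-3.1 70B on four H100 GPUs
MINUTES_PER_SOURCE = 15         # the slide's rounded figure
N_SOURCES = 1_000_000_000       # 1B sources
N_GPUS = 10_000                 # size of the cluster

# Time to generate one source's tokens at the quoted throughput (~16.7 minutes).
wall_clock_minutes = TOKENS_PER_SOURCE / TOKENS_PER_SECOND / 60

# Total time for the whole survey, in days (~1.04e7, i.e. about 10M).
gpu_days = N_SOURCES * MINUTES_PER_SOURCE / (60 * 24)

# How long the cluster would have to run (~2.85 years, i.e. about 3).
cluster_years = gpu_days / N_GPUS / 365
```

The unrounded throughput estimate is slightly above 15 minutes, so the three-year figure is if anything a mild underestimate under these assumptions.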
= 0.03 USD
= 40 USD
[Figure: Compute Power vs. Year]
CPU Moore's Law is plateauing
GPU is picking up the pace
The price drop has an e-folding time of approximately 3 months
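An e-folding time of about 3 months means the price falls by a factor of e every 3 months. The sketch below assumes a pure exponential decline, which is an idealisation of the trend; the function names are hypothetical.

```python
import math

E_FOLDING_MONTHS = 3.0  # from the slide: approximately 3 months

def relative_price(months_elapsed, e_folding_months=E_FOLDING_MONTHS):
    """Price relative to today, assuming a pure exponential decline."""
    return math.exp(-months_elapsed / e_folding_months)

def months_until_factor(factor, e_folding_months=E_FOLDING_MONTHS):
    """Months needed for the price to fall by `factor` (e.g. 10 for a tenfold drop)."""
    return e_folding_months * math.log(factor)

after_one_year = relative_price(12)      # exp(-4), roughly 2% of today's price
tenfold_drop = months_until_factor(10)   # roughly 7 months
```

Under this assumption a task costing USD 1 per source today would cost roughly 2 cents a year later, provided the trend continues.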
[Figure, built up over several slides: Score (%) vs. Cost per 1 SED Source (USD), starting with models released before July 2024 and advancing in steps of + 3 months. Legend: Open-Weight; Proprietary; Proprietary (Experimental / Not Released)]
Models labelled: Google Gemma-2, Google Gemini-1.5, DeepSeek v2, Alibaba Qwen-2.5, Meta LLaMA 3, Yi 01, X's Grok, Stepfun, Microsoft Phi-3.5, Nvidia's Nemotron, DeepSeek v3 / R1, OpenAI (o3), Google Gemini-2.0, Microsoft Phi-4, MiniMax 01, Gemini-2.5-Pro, Claude-3.7-Sonnet, Meta LLaMA 4
Data-poor, Theory-rich
Collecting more data
???
Data-rich, Theory-poor
Roman, HSC, Euclid, DESI, SDSS, PFS
Data-poor, Theory-rich
JWST SED Fitting