SciML Agents: Write the Solver, Not the Solution
MATH-AI: The 5th Workshop on Mathematical Reasoning and AI at NeurIPS 2025.
arXiv: arXiv:2509.09936, 2025.
Abstract
Recent work in scientific machine learning aims to tackle scientific tasks directly by
predicting target values with neural networks (e.g., physics-informed neural networks,
neural ODEs, neural operators, etc.), but attaining high accuracy and robustness has
been challenging. We explore an alternative view: use LLMs to write code that leverages
decades of numerical algorithms. This shifts the burden from learning a solution
function to making domain-aware numerical choices. We ask whether LLMs can act as SciML
agents that, given a natural-language ODE description, generate runnable code that is
scientifically appropriate, selecting suitable solvers (stiff vs. non-stiff), and
enforcing stability checks. There is currently no benchmark to measure this kind of
capability for scientific computing tasks. As such, we first introduce two new datasets:
a diagnostic dataset of adversarial "misleading" problems; and a large-scale benchmark
of 1,000 diverse ODE tasks. The diagnostic set contains problems whose superficial
appearance suggests stiffness, and that require algebraic simplification to demonstrate
non-stiffness; and the large-scale benchmark spans stiff and non-stiff ODE regimes. We
evaluate open- and closed-source LLMs along two axes: (i) unguided versus guided
prompting with domain-specific knowledge; and (ii) off-the-shelf versus fine-tuned
variants. Our evaluation measures both executability and numerical validity against
reference solutions. We find that with sufficient context and guided prompts, newer
instruction-following models achieve high accuracy on both criteria. In many cases,
recent open-source systems perform strongly without fine-tuning, while older or smaller
models still benefit from fine-tuning. Overall, our preliminary results indicate that
careful prompting and fine-tuning can yield a specialized LLM agent capable of reliably
solving simple ODE problems.
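
To make the stiff versus non-stiff choice concrete, the following is a minimal sketch of the kind of check a generated script might perform. It is not taken from the paper or its datasets. It assumes a linear 2x2 system, uses the eigenvalue stiffness ratio as a heuristic, and picks an arbitrary threshold of 1e3. All function names and both example systems are hypothetical. The returned strings "Radau" and "RK45" match method names accepted by SciPy's solve_ivp, but the sketch itself uses only the standard library.

```python
import cmath


def eigenvalues_2x2(a, b, c, d):
    """Eigenvalues of [[a, b], [c, d]] from the characteristic polynomial."""
    trace = a + d
    det = a * d - b * c
    disc = cmath.sqrt(trace * trace - 4 * det)
    return (trace + disc) / 2, (trace - disc) / 2


def stiffness_ratio(eigs):
    """max |Re(lambda)| / min |Re(lambda)| over eigenvalues with nonzero real part."""
    mags = [abs(lam.real) for lam in eigs if abs(lam.real) > 1e-12]
    return max(mags) / min(mags) if mags else 1.0


def choose_method(jacobian, threshold=1e3):
    """Pick an implicit method for stiff systems, an explicit one otherwise.

    The threshold is an assumption made for this sketch, not a standard value.
    """
    (a, b), (c, d) = jacobian
    ratio = stiffness_ratio(eigenvalues_2x2(a, b, c, d))
    return "Radau" if ratio >= threshold else "RK45"


def looks_stiff_naively(raw_terms, threshold=1e3):
    """Naive check that only compares raw coefficient magnitudes as written."""
    coeffs = [abs(t) for row in raw_terms for entry in row for t in entry if t != 0]
    return max(coeffs) / min(coeffs) >= threshold


def simplify(raw_terms):
    """Combine like terms: sum the raw coefficients in each Jacobian entry."""
    return [[sum(entry) for entry in row] for row in raw_terms]


# Hypothetical "misleading" problem, written as it might appear in a prompt:
#   x' = -100000*x + 99999*x,   y' = -2*y
# Each Jacobian entry lists its raw coefficients before simplification.
misleading_terms = [[[-100000.0, 99999.0], [0.0]],
                    [[0.0], [-2.0]]]

# Hypothetical genuinely stiff problem: x' = -10000*x, y' = -y
stiff_jacobian = [[-10000.0, 0.0], [0.0, -1.0]]

naive_flag = looks_stiff_naively(misleading_terms)
misleading_choice = choose_method(simplify(misleading_terms))
stiff_choice = choose_method(stiff_jacobian)

print("naive check flags misleading problem as stiff:", naive_flag)
print("method after simplification:", misleading_choice)
print("method for stiff problem:", stiff_choice)
```

In this sketch the large coefficients in the first system cancel to x' = -x, so the simplified Jacobian has eigenvalues of similar magnitude and an explicit method is selected, while the second system keeps widely separated time scales and is routed to an implicit method. For nonlinear problems the Jacobian depends on the state, so a check like this would be at best a local indication.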