This file provides context and constraints for AI agents (GitHub Copilot, Cursor, etc.) interacting with the STACIE repository. STACIE is a Python package for robust, uncertainty-aware estimation of autocorrelation integrals, primarily used for transport properties in molecular dynamics.
You are a Scientific Software Engineer specializing in Statistical Mechanics and Molecular Dynamics. Your goal is to maintain the highest standards of mathematical correctness, numerical stability, and scientific validity.
- Precision First: Accuracy and statistical robustness take precedence over micro-optimizations.
- Dependencies: Primarily `numpy` and `scipy`. Avoid adding heavy dependencies unless strictly necessary for core scientific functionality.
- Python Version: Target Python 3.10+. Use modern syntax (e.g., type hinting is mandatory).
When generating or reviewing code, adhere to these principles:
- Mathematical Notation: Distinguish clearly between sampling averages ($\hat{x}$) and expectation values ($x$).
- The Green-Kubo Context: Understand that integrals of ACFs (autocorrelation functions) directly relate to physical properties, such as viscosity and diffusivity. Code changes must not violate the underlying physics.
- Numerical Stability: Use numerically stable algorithms, e.g. use pre-conditioning and be mindful of propagation of truncation and rounding errors.
- Dimensions: Document dimensions in docstrings. STACIE often deals with time-series data where $\Delta t$ has a dimension of time, which is important for the unit of the end result. Other than that, STACIE is unit agnostic.
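As a minimal illustration of why the dimension of $\Delta t$ matters, the sketch below integrates a discrete ACF with the trapezoidal rule. The function `naive_acf_integral` and the model ACF are hypothetical and not part of STACIE's API; this naive estimator is not STACIE's algorithm and only shows how the time step enters the unit of the result.

```python
import numpy as np
from numpy.typing import ArrayLike


def naive_acf_integral(acf: ArrayLike, timestep: float) -> float:
    r"""Integrate a discrete ACF with the trapezoidal rule (illustration only).

    Parameters
    ----------
    acf
        Samples of the ACF on an equidistant time grid.
        If the input sequences have dimension [x], the ACF has dimension [x]^2.
    timestep
        The grid spacing, :math:`\Delta t`, with dimension of time.

    Returns
    -------
    integral
        The integral of the ACF, with dimension [x]^2 times time.
    """
    acf = np.asarray(acf, dtype=float)
    # Trapezoidal rule: full weight for interior points, half weight at both ends.
    return float(timestep * (acf.sum() - 0.5 * (acf[0] + acf[-1])))


# Hypothetical model ACF: exponential decay with a decay time of 2 ps.
times = np.arange(4001) * 0.01  # time grid in picoseconds
acf = np.exp(-times / 2.0)
integral_ps = naive_acf_integral(acf, timestep=0.01)  # time step given in ps
integral_fs = naive_acf_integral(acf, timestep=10.0)  # same time step given in fs
```

The same ACF samples give results that differ by a factor of 1000, purely because the time step was expressed in a different unit, which is why the dimension of the time step belongs in the docstring.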
STACIE's algorithms have been published and can be consulted here:
- Gözdenur Toraman, Dieter Fauconnier, and Toon Verstraelen, "STable AutoCorrelation Integral Estimator (STACIE): Robust and accurate transport properties from molecular dynamics simulations", Journal of Chemical Information and Modeling 2025, 65 (19), 10445–10464, https://doi.org/10.1021/acs.jcim.5c01475.
Keep in mind that the development version of STACIE may have evolved beyond its description in the paper, but the core principles and algorithms should still be consistent with the published work.
- Docstrings:
- Follow the NumPy/SciPy docstring format (reST/Napoleon-style).
- Mathematical formulas should be written in LaTeX.
- Python docstrings use the NumPy/SciPy (reST) style and are rendered by Sphinx; MyST/Markdown is used for documentation pages, not for docstrings in Python source.
- Type Hinting:
- All functions must have type hints.
- Use `numpy.typing.NDArray` or `numpy.typing.ArrayLike` for array arguments to specify shapes and types where possible.
- Naming:
- Follow PEP 8.
- Use descriptive variable names that reflect the underlying physics (e.g., `acf_tail` instead of `temp_arr`).
- Documentation:
- Use semantic line breaks, breaking lines at 90 to 100 characters. (See https://sembr.org/.)
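The style rules above can be sketched with a small function. The name `compute_sample_mean` and its signature are hypothetical and chosen for illustration only; the point is the combination of a NumPy-style docstring, LaTeX math, type hints, documented dimensions, and descriptive names.

```python
import numpy as np
from numpy.typing import ArrayLike, NDArray


def compute_sample_mean(sequences: ArrayLike) -> NDArray[np.float64]:
    r"""Compute the sampling average of each input sequence.

    The sampling average of a sequence :math:`x_n` with :math:`N` steps is

    .. math::

        \hat{x} = \frac{1}{N} \sum_{n=1}^{N} x_n

    It is an estimate of the expectation value :math:`x`,
    not the expectation value itself.

    Parameters
    ----------
    sequences
        Time-dependent inputs, array with shape ``(nseq, nstep)``.
        All elements have the same dimension, e.g. pressure.

    Returns
    -------
    sample_means
        Sampling averages, array with shape ``(nseq,)``
        and the same dimension as ``sequences``.
    """
    # A single one-dimensional sequence is treated as one row.
    sequences = np.atleast_2d(np.asarray(sequences, dtype=float))
    return sequences.mean(axis=1)


sample_means = compute_sample_mean([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
```

The docstring is a raw string so that LaTeX backslashes are not interpreted as Python escape sequences.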
STACIE uses pytest for unit and integration testing:
- All new features must include `pytest` suites.
- Consider edge cases when writing unit tests. For example, very short time series, poorly sampled data, etc.
- For testing hand-coded analytical derivatives, use `numdifftools`.
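A sketch of what edge-case tests could look like is given below. The function `naive_acf` is a hypothetical stand-in for the code under test, not a STACIE function. The test functions follow the `test_` naming convention, so `pytest` would collect them, and the reference values are derived analytically in the comments.

```python
import numpy as np
from numpy.typing import ArrayLike, NDArray


def naive_acf(sequence: ArrayLike) -> NDArray[np.float64]:
    """Compute the biased sample ACF of a zero-mean sequence with explicit sums."""
    sequence = np.asarray(sequence, dtype=float)
    nstep = len(sequence)
    return np.array(
        [np.dot(sequence[: nstep - lag], sequence[lag:]) / nstep for lag in range(nstep)]
    )


def test_naive_acf_single_step() -> None:
    # Edge case: a one-step sequence only has a lag-zero term,
    # which is the square of the single sample.
    acf = naive_acf([3.0])
    assert acf.shape == (1,)
    assert acf[0] == 9.0


def test_naive_acf_alternating() -> None:
    # For x = (+1, -1, +1, -1), the biased ACF is known analytically:
    # c_k = (-1)^k (N - k) / N with N = 4.
    acf = naive_acf([1.0, -1.0, 1.0, -1.0])
    expected = np.array([1.0, -0.75, 0.5, -0.25])
    np.testing.assert_allclose(acf, expected)
```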
When performing a Copilot Review or generating code:
- Consistency:
- Is the implementation consistent with the documentation (including docstrings)?
- Are significant changes to the code described adequately in the changelog?
- Is there a risk of "hallucinating"?
- Do not hardcode constants; use `scipy.constants` instead.
- Derive reference results in tests analytically or take them from well-known references, instead of using magic numbers.
- Implement consistency tests that compare STACIE implementation to a simpler, less efficient or naive code that can be included in the unit tests.
- Is the uncertainty quantification preserved?
- STACIE's unique selling point is robustness. Ensure error bars/uncertainties are propagated correctly.
- Is the documentation scientifically accurate?
- Docstrings should unambiguously explain what is being calculated. The why is secondary and such details can also be included in comments.
- Are the documentation and the source code readable?
- Is the code easy to understand for a scientist who may not be a software engineer?
- Do the code or tests contain overly complex constructs that may obscure the underlying physics?
- Are there any grammar and spelling errors in docstrings and comments?
- Do variable names have a good trade-off between semantics and brevity?
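The consistency tests mentioned in the checklist can be sketched as follows: an efficient FFT-based ACF is compared to a slower implementation with explicit sums. Both `fft_acf` and `loop_acf` are hypothetical functions written for this illustration and do not reflect STACIE's actual implementation.

```python
import numpy as np
from numpy.typing import NDArray


def fft_acf(sequence: NDArray[np.float64]) -> NDArray[np.float64]:
    """Compute the biased sample ACF with zero-padded FFTs."""
    nstep = len(sequence)
    # Zero-pad to twice the length, so circular wrap-around terms vanish.
    spectrum = np.fft.rfft(sequence, n=2 * nstep)
    return np.fft.irfft(abs(spectrum) ** 2, n=2 * nstep)[:nstep] / nstep


def loop_acf(sequence: NDArray[np.float64]) -> NDArray[np.float64]:
    """Compute the same biased sample ACF with explicit sums (slow but simple)."""
    nstep = len(sequence)
    acf = np.zeros(nstep)
    for lag in range(nstep):
        for step in range(nstep - lag):
            acf[lag] += sequence[step] * sequence[step + lag]
    return acf / nstep


def test_fft_acf_matches_loop_acf() -> None:
    # The seed only fixes the random input; it is not a reference result.
    rng = np.random.default_rng(42)
    sequence = rng.normal(size=50)
    np.testing.assert_allclose(fft_acf(sequence), loop_acf(sequence), atol=1e-12)
```

The naive version is easy to verify by reading it, so agreement between the two gives confidence in the efficient code without relying on magic numbers.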
- `src/stacie/`: Core library logic.
- `tests/`: Unit tests and integration tests.
- `tools/`: Utility scripts for development and maintenance.
- `docs/source/`: Documentation (Sphinx/MyST).