From 83a78ebd3819319becffe9f06c7069c2856fe2a9 Mon Sep 17 00:00:00 2001
From: "pre-commit-ci[bot]" <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Date: Mon, 4 Nov 2024 21:05:15 +0000
Subject: [PATCH 1/4] [pre-commit.ci] pre-commit autoupdate
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

updates:
- [github.com/kynan/nbstripout: 0.7.1 → 0.8.0](https://github.com/kynan/nbstripout/compare/0.7.1...0.8.0)
---
 .pre-commit-config.yaml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
index 8a82e585..7e5f4fe1 100644
--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@@ -16,7 +16,7 @@ repos:
         exclude: pysr/test/test_nb.ipynb
   # Stripping notebooks
   - repo: https://github.com/kynan/nbstripout
-    rev: 0.7.1
+    rev: 0.8.0
     hooks:
       - id: nbstripout
         exclude: pysr/test/test_nb.ipynb

From e8bbc5c3555a162a3d13ea8e1f7f4cb927f3de87 Mon Sep 17 00:00:00 2001
From: Ilya Orson
Date: Tue, 19 Nov 2024 21:45:21 +0000
Subject: [PATCH 2/4] docs: Add another paper using PySR (#741)

* Update papers.yml

* Add files via upload

* move image to other repo

---------

Co-authored-by: Miles Cranmer
---
 docs/papers.yml | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/docs/papers.yml b/docs/papers.yml
index b7911103..862624f9 100644
--- a/docs/papers.yml
+++ b/docs/papers.yml
@@ -245,3 +245,18 @@ papers:
     abstract: "How can we find interpretable, domain-appropriate models of natural phenomena given some complex, raw data such as images? Can we use such models to derive scientific insight from the data? In this paper, we propose some methods for achieving this. In particular, we implement disentangled representation learning, sparse deep neural network training and symbolic regression, and assess their usefulness in forming interpretable models of complex image data. We demonstrate their relevance to the field of bioimaging using a well-studied test problem of classifying cell states in microscopy data. We find that such methods can produce highly parsimonious models that achieve ~98% of the accuracy of black-box benchmark models, with a tiny fraction of the complexity. We explore the utility of such interpretable models in producing scientific explanations of the underlying biological phenomenon."
     image: https://raw.githubusercontent.com/MilesCranmer/PySR_Docs/master/images/cell_state_classification.jpg
     date: 2024-02-05
+  - title: "The automated discovery of kinetic rate models – methodological frameworks"
+    authors:
+      - Miguel Ángel de Carvalho Servia (1)
+      - Ilya Orson Sandoval (1)
+      - King Kuok (Mimi) Hii (1)
+      - Klaus Hellgardt (1)
+      - Dongda Zhang (2)
+      - Ehecatl Antonio del Rio Chanona (1)
+    affiliations:
+      1: Imperial College London
+      2: University of Manchester
+    link: https://arxiv.org/abs/2301.11356
+    abstract: "The industrialization of catalytic processes requires reliable kinetic models for their design, optimization and control. Mechanistic models require significant domain knowledge, while data-driven and hybrid models lack interpretability. Automated knowledge discovery methods, such as ALAMO (Automated Learning of Algebraic Models for Optimization), SINDy (Sparse Identification of Nonlinear Dynamics), and genetic programming, have gained popularity but suffer from limitations such as needing model structure assumptions, exhibiting poor scalability, and displaying sensitivity to noise. To overcome these challenges, we propose two methodological frameworks, ADoK-S and ADoK-W (Automated Discovery of Kinetic rate models using a Strong/Weak formulation of symbolic regression), for the automated generation of catalytic kinetic models using a robust criterion for model selection. We leverage genetic programming for model generation and a sequential optimization routine for model refinement. The frameworks are tested against three case studies of increasing complexity, demonstrating their ability to retrieve the underlying kinetic rate model with limited noisy data from the catalytic systems, showcasing their potential for chemical reaction engineering applications."
+    image: https://raw.githubusercontent.com/MilesCranmer/PySR_Docs/refs/heads/master/images/adok_s_results.jpg
+    date: 2024-03-22

From 3d361f20e7fded28a26b6637e753e00bbbd65fe1 Mon Sep 17 00:00:00 2001
From: Ho Fung Tsoi
Date: Tue, 19 Nov 2024 16:51:20 -0500
Subject: [PATCH 3/4] docs: add symbolfit paper (#750)

* add symbolfit paper

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* docs: move image to other repo

* docs: fix yaml syntax

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Miles Cranmer
---
 docs/papers.yml | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/docs/papers.yml b/docs/papers.yml
index 862624f9..11ccfb07 100644
--- a/docs/papers.yml
+++ b/docs/papers.yml
@@ -245,6 +245,30 @@ papers:
     abstract: "How can we find interpretable, domain-appropriate models of natural phenomena given some complex, raw data such as images? Can we use such models to derive scientific insight from the data? In this paper, we propose some methods for achieving this. In particular, we implement disentangled representation learning, sparse deep neural network training and symbolic regression, and assess their usefulness in forming interpretable models of complex image data. We demonstrate their relevance to the field of bioimaging using a well-studied test problem of classifying cell states in microscopy data. We find that such methods can produce highly parsimonious models that achieve ~98% of the accuracy of black-box benchmark models, with a tiny fraction of the complexity. We explore the utility of such interpretable models in producing scientific explanations of the underlying biological phenomenon."
     image: https://raw.githubusercontent.com/MilesCranmer/PySR_Docs/master/images/cell_state_classification.jpg
     date: 2024-02-05
+  - title: "SymbolFit: Automatic Parametric Modeling with Symbolic Regression"
+    authors:
+      - Ho Fung Tsoi (1)
+      - Dylan Rankin (1)
+      - Cecile Caillol (2)
+      - Miles Cranmer (3)
+      - Sridhara Dasu (4)
+      - Javier Duarte (5)
+      - Philip Harris (6, 7)
+      - Elliot Lipeles (1)
+      - Vladimir Loncar (6, 8)
+    affiliations:
+      1: University of Pennsylvania
+      2: European Organization for Nuclear Research (CERN)
+      3: University of Cambridge
+      4: University of Wisconsin-Madison
+      5: University of California San Diego
+      6: Massachusetts Institute of Technology
+      7: Institute for Artificial Intelligence and Fundamental Interactions
+      8: Institute of Physics Belgrade
+    link: https://arxiv.org/abs/2411.09851
+    abstract: "We introduce SymbolFit, a framework that automates parametric modeling by using symbolic regression to perform a machine-search for functions that fit the data, while simultaneously providing uncertainty estimates in a single run. Traditionally, constructing a parametric model to accurately describe binned data has been a manual and iterative process, requiring an adequate functional form to be determined before the fit can be performed. The main challenge arises when the appropriate functional forms cannot be derived from first principles, especially when there is no underlying true closed-form function for the distribution. In this work, we address this problem by utilizing symbolic regression, a machine learning technique that explores a vast space of candidate functions without needing a predefined functional form, treating the functional form itself as a trainable parameter. Our approach is demonstrated in data analysis applications in high-energy physics experiments at the CERN Large Hadron Collider (LHC). We demonstrate its effectiveness and efficiency using five real proton-proton collision datasets from new physics searches at the LHC, namely the background modeling in resonance searches for high-mass dijet, trijet, paired-dijet, diphoton, and dimuon events. We also validate the framework using several toy datasets with one and more variables."
+    image: https://raw.githubusercontent.com/MilesCranmer/PySR_Docs/refs/heads/master/images/symbolfit_sampling.png
+    date: 2024-11-15
   - title: "The automated discovery of kinetic rate models – methodological frameworks"
     authors:
       - Miguel Ángel de Carvalho Servia (1)

From e8f1c70ebceb315c1caeecf82061d974bb975af1 Mon Sep 17 00:00:00 2001
From: LionessOfCintra <92221853+LionessOfCintra@users.noreply.github.com>
Date: Tue, 19 Nov 2024 22:52:43 +0100
Subject: [PATCH 4/4] docs: add paper on acoustic transmission modelling (#751)

* Update papers.yml

Added new paper on acoustic transmission modelling

* Add image sonic_crystals.png for paper

Analytical formulae for design of one-dimensional sonic crystals with smooth geometry based on symbolic regression

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* docs: move image to other repo

* fix: papers list

* docs: fix date

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Miles Cranmer
---
 docs/papers.yml | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/docs/papers.yml b/docs/papers.yml
index 11ccfb07..b6b80fec 100644
--- a/docs/papers.yml
+++ b/docs/papers.yml
@@ -245,6 +245,17 @@ papers:
     abstract: "How can we find interpretable, domain-appropriate models of natural phenomena given some complex, raw data such as images? Can we use such models to derive scientific insight from the data? In this paper, we propose some methods for achieving this. In particular, we implement disentangled representation learning, sparse deep neural network training and symbolic regression, and assess their usefulness in forming interpretable models of complex image data. We demonstrate their relevance to the field of bioimaging using a well-studied test problem of classifying cell states in microscopy data. We find that such methods can produce highly parsimonious models that achieve ~98% of the accuracy of black-box benchmark models, with a tiny fraction of the complexity. We explore the utility of such interpretable models in producing scientific explanations of the underlying biological phenomenon."
     image: https://raw.githubusercontent.com/MilesCranmer/PySR_Docs/master/images/cell_state_classification.jpg
     date: 2024-02-05
+  - title: Analytical formulae for design of one-dimensional sonic crystals with smooth geometry based on symbolic regression
+    authors:
+      - Viktor Hruška (1)
+      - Aneta Furmanová (1)
+      - Michal Bednařík (1)
+    affiliations:
+      1: Czech Technical University in Prague, Faculty of Electrical Engineering
+    link: https://doi.org/10.1016/j.jsv.2024.118821
+    abstract: Even though locally periodic structures have been studied for more than three decades, the known analytical expressions relating the waveguide geometry and the acoustic transmission are limited to a few special cases. Having an access to numerical model is a great opportunity for data-driven discovery. Our choice of cubic splines to parametrize the waveguide unit cell geometry offers enough variability for waveguide design. Using Webster equation for unit cell and Floquet–Bloch theory for periodic structures, a dataset of numerical solutions was prepared. Employing the methods of physics-informed machine learning, we have extracted analytical formulae relating the waveguide geometry and the corresponding dispersion relation or directly the bandgap widths. The results contribute to the overall readability of the system and enable a deeper understanding of the underlying principles. Specifically, it allows for assessing the influence of the waveguide geometry, offering more efficient alternative to computationally demanding numerical optimization.
+    image: https://raw.githubusercontent.com/MilesCranmer/PySR_Docs/refs/heads/master/images/sonic_crystals.jpg
+    date: 2024-11-15
   - title: "SymbolFit: Automatic Parametric Modeling with Symbolic Regression"
     authors:
       - Ho Fung Tsoi (1)