This section covers research at the intersection of physics and data analysis, a fast-moving area in which complex datasets are mined for patterns in physical systems. From tracking particle collisions to modeling cosmic structures, these studies rely on advanced statistical methods to turn raw measurements into physical insight.

Gist.Science tracks every new preprint in this category as it appears on arXiv. Each entry comes with a plain-language overview for general readers and a detailed technical summary for experts, bridging the gap between dense research and clear comprehension.

Below are the latest papers in physics and data analysis, organized for easy reading and discovery.

Time delay embeddings to characterize the timbre of musical instruments using Topological Data Analysis: a study on synthetic and real data

This study demonstrates that applying Topological Data Analysis to time delay embeddings of audio signals, specifically using delays related to fractions of the fundamental period, effectively characterizes musical timbre by revealing harmonic structures and distinguishing between instruments in both synthetic and real data.

Gakusei Sato, Hiroya Nakao, Riccardo Muolo | 2026-02-05 | 🌀 nlin
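
For readers unfamiliar with the time delay embedding named in the entry above, the following is a minimal sketch of the general technique, not the authors' code. The 220 Hz test tone, the quarter-period delay, the embedding dimension of 3, and the helper name `delay_embed` are all illustrative assumptions; the summary only says that the delays are related to fractions of the fundamental period.

```python
import numpy as np

def delay_embed(signal, dim, delay):
    """Return the delay-coordinate embedding of a 1-D signal.

    Row i is (x[i], x[i + delay], ..., x[i + (dim - 1) * delay]).
    """
    x = np.asarray(signal, dtype=float)
    n_points = x.size - (dim - 1) * delay
    if n_points <= 0:
        raise ValueError("signal too short for this dim and delay")
    return np.stack(
        [x[k * delay : k * delay + n_points] for k in range(dim)], axis=1
    )

# Hypothetical test tone: 220 Hz fundamental plus a weaker second harmonic.
sample_rate = 44_100
f0 = 220.0
t = np.arange(2048) / sample_rate
tone = np.sin(2 * np.pi * f0 * t) + 0.5 * np.sin(2 * np.pi * 2 * f0 * t)

# Assumed choice: delay of about a quarter of the fundamental period, in samples.
period_samples = sample_rate / f0
delay = int(round(period_samples / 4))

# Each row of `cloud` is one point in 3-D; the resulting point cloud is the
# kind of object a topological analysis would then be applied to.
cloud = delay_embed(tone, dim=3, delay=delay)
```

The topological step itself (for example persistent homology of `cloud`) would need a dedicated library and is deliberately left out of this sketch.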

Automated Extraction of Multicomponent Alloy Data Using Large Language Models for Sustainable Design

This paper presents an LLM-based pipeline that accurately extracts multicomponent alloy data from both text and tables to create the largest publicly available database of its kind, enabling sustainable materials design by identifying high-performance alloy candidates for lightweighting, soft magnetic, and corrosion-resistant applications.

Aravindan Kamatchi Sundaram, Mohit Chakraborty, Sai Mani Kumar Devathi, B. Pabitramohan Prusty, Rohit Batra | 2026-02-05 | 🔬 cond-mat.mtrl-sci

Link Statistics of Dislocation Network during Strain Hardening

This study analyzes Discrete Dislocation Dynamics simulations of fcc Cu and finds that dislocation link lengths on active slip systems follow a double-exponential distribution due to stress-induced bowing, whereas those on inactive systems follow a single-exponential distribution. The distinction is explained by modeling the network as a one-dimensional Poisson process with super-linear growth rates for long links.

Sh. Akhondzadeh, Hanfeng Zhai, Wurong Jian, Ryan B. Sills, Nicolas Bertin, Wei Cai | 2026-02-03 | 🔬 cond-mat.mtrl-sci
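
As background for the entry above, the sketch below illustrates only the textbook baseline: points placed by a homogeneous one-dimensional Poisson process are separated by exponentially distributed gaps, which is the single-exponential case. It is not the authors' model and does not attempt the double-exponential distribution or the length-dependent growth rates. The density `rate` and line length `length` are arbitrary illustrative values.

```python
import numpy as np

rng = np.random.default_rng(0)

rate = 2.0          # hypothetical number of junctions per unit length
length = 50_000.0   # hypothetical total line length

# Homogeneous Poisson process on [0, length]: the number of points is
# Poisson-distributed and, given that number, positions are uniform.
n_points = rng.poisson(rate * length)
positions = np.sort(rng.uniform(0.0, length, size=n_points))

# "Links" are the segments between neighbouring points.
links = np.diff(positions)

mean_link = links.mean()
std_link = links.std()

# For an exponential distribution the mean is 1 / rate, the standard
# deviation equals the mean, and a fraction exp(-1) of links exceed the mean.
frac_longer_than_mean = np.mean(links > mean_link)
```

Comparing `std_link / mean_link` with 1 and `frac_longer_than_mean` with `np.exp(-1)` gives a quick check of how close a sample of link lengths is to a single exponential.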

Multimodal Machine Learning for Integrating Heterogeneous Analytical Systems

This paper presents an interpretable multimodal machine learning framework that integrates heterogeneous analytical data from SEM, Raman, gas adsorption, and electrical measurements to characterize carbon nanotube films, demonstrating that nonlinear models like XGBoost can accurately predict material properties while providing physically meaningful insights into the underlying structure-property relationships.

Shun Muroga, Hideaki Nakajima, Taiyo Shimizu, Kazufumi Kobashi, Kenji Hata | 2026-02-03 | 🔬 cond-mat.mtrl-sci

Phase Transitions in Unsupervised Feature Selection

This paper presents a theoretical analysis demonstrating that unsupervised feature selection for proteins using Differentiable Information Imbalance reveals a phase transition between glass-like and liquid-like states, where the critical number of physico-chemical features coincides with the saturation of downstream classification performance, offering a principled criterion for identifying minimal feature sets.

Jonathan Fiorentino, Michele Monti, Dimitrios Miltiadis-Vrachnos, Vittorio Del Tatto, Alessandro Laio, Gian Gaetano Tartaglia | 2026-02-03 | 🧬 q-bio