cs.LG - arXiv Academic Archive

cs.CL 2026-06-17

Freeing the Law with LOCUS: A Local Ordinance Corpus for the United States

Progress in legal AI increasingly depends on access to authoritative legal text at scale. Yet one of the most consequential layers of American law remains largely absent from existing machine-readable...

Denis Peskoff, Joe Barrow, Christopher Vu et al.

astro-ph.IM 2026-06-17

The Chandra-Gaia Catalog of Counterparts: Resolving ambiguous Gaia matches to X-ray sources in the Chandra Source Catalo...

We present a framework to cross-match sources from the Chandra Source Catalog (CSC v2.1) with optical sources from Gaia Data Release 3. Unlike purely spatial approaches, we use source properties such ...

V. Samuel Pérez-Díaz, Vinay L. Kashyap, Joshua D. Ingram et al.

cs.LG 2026-06-17

UBP2: Uncertainty-Balanced Preference Planning for Efficient Preference-based Reinforcement Learning

Preference-based RL provides an approach to learning reward models from pairwise comparisons of behaviors, bypassing the need for explicit reward design. However, existing methods typically rely on pa...

Mohamed Nabail, Leo Cheng, Jingmin Wang et al.

cs.LG 2026-06-17

Explaining Attention with Program Synthesis

A longstanding goal of research on interpretable deep learning is to replace opaque neural computations with human-meaningful symbolic descriptions. In this paper, we propose an approach for approxima...

Amiri Hayes, Belinda Li, Jacob Andreas

cs.LG 2026-06-17

Diffusion-Proof: Recipe for Formal Theorem Proving Beyond Auto-Regressive Generation

Enhancing the formal math reasoning capabilities of Large Language Models (LLMs) has become a key focus in both mathematical and computer science communities in recent years. While significant progres...

Ruida Wang, Rui Pan, Pengcheng Wang et al.

cs.LG 2026-06-17

P-K-GCN: Physics-augmented Koopman-enhanced Graph Convolutional Network for Deep Spatiotemporal Super-resolution

High-fidelity simulation of spatiotemporal dynamics is computationally prohibitive, necessitating efficient super-resolution techniques to reconstruct high-resolution data from coarse-grained inputs. ...

Xizhuo, Zhang, Zekai Wang et al.

physics.ao-ph 2026-06-17

Optimal scenario design for climate emulation

As deep learning for physical systems continues to grow in popularity, efforts to improve generalizability have primarily focused on designing architectures that embed physical constraints. However, f...

Christopher B. Womack, Shahine Bouabid, Andrei Sokolov et al.

cs.CV 2026-06-17

Confidence is Not Reliability: Rethinking MC Dropout in Brain Tumour Segmentation

Glioma segmentation in multiparametric MRI is a critical component of treatment planning. A segmentation model that fails silently on treatment-critical sub-regions represents a patient safety risk th...

Xin Ci Wong, Duygu Sarikaya, Kieran Zucker et al.

cs.LG 2026-06-17

Does VLA Even Know the Basics? Measuring Commonsense and World Knowledge Retention in Vision-Language-Action Models

Embodied Vision-Language-Action (VLA) models are typically obtained by fine-tuning powerful pretrained VLMs on robotics data, yet it is unclear how much commonsense and factual knowledge they retain a...

Nikita Kachaev, Andrey Moskalenko, Matvey Skripkin et al.

cs.LG 2026-06-17

Risk Stratification for ICU Delirium using Pervasive Ambient Sensing Information

Delirium is a common and serious complication in the Intensive Care Unit (ICU), associated with increased morbidity, prolonged hospital stays, and higher healthcare costs. Despite its prevalence, earl...

Jiaqing Zhang, Sabyasachi Bandyopadhyay, Miguel Contreras et al.

cs.AI 2026-06-17

NeSyCat Torch: A Differentiable Tensor Implementation of Categorical Semantics for Neurosymbolic Learning

Neurosymbolic semantics is fragmented: classical, fuzzy, probabilistic and neural systems each define truth by their own inductive rules. NeSyCat, extending ULLER, subsumes them under a single inducti...

Daniel Romero Schellhorn, Till Mossakowski, Björn Gehrke

eess.IV 2026-06-17

Beyond Algorithms: Conceptual Innovation in Medical Imaging AI

Artificial intelligence has driven rapid progress in medical imaging research, producing increasingly sophisticated algorithms and steady improvements on benchmark tasks. However, this algorithm-centr...

Mark A. Anastasio

cs.LG 2026-06-17

Structured Inference with Large Language Gibbs

The knowledge encoded in large language models (LLMs) can serve as a substrate for structured reasoning over variables describing a complex world, but accessing this knowledge in a probabilistically c...

Sanghyeok Choi, Henry Gouk, Esmeralda S. Whitammer

cs.LG 2026-06-17

Detecting Hidden ML Training With Zero-Overhead Telemetry

Hardware-enabled monitoring of GPU workloads underpins many proposals for AI compute governance, but if developers can defeat monitoring mechanisms, such schemes are unworkable. We evaluate the advers...

Robi Rahman, Sabiha Tajdari

cs.LG 2026-06-17

SCAN: Enhance Time Series Anomaly Detection via Multi-Scale Neighborhood-Centered Clustering

Time series anomaly detection plays a crucial role in a wide range of real-world applications. Reconstruction-based methods have become the mainstream paradigm, but they suffer from over-generalizatio...

Xingze Zheng, Hanyin Cheng, Siyuan Wang et al.

cs.CV 2026-06-17

OneCanvas: 3D Scene Understanding via Panoramic Reprojection

Existing approaches to 3D scene understanding in Vision-Language Models (VLMs) either rely on complex, model-specific geometry encoders or large training budgets in pursuit of spatial reasoning. Inste...

Bartłomiej Baranowski, Dave Zhenyu Chen, Matthias Nießner

cs.AI 2026-06-17

TxBench-PP: Analyzing AI Agent Performance on Small-Molecule Preclinical Pharmacology

Artificial intelligence (AI) agents promise to accelerate drug discovery by compressing interpretation and decision-making loops, but practical deployment requires trusted evaluation on realistic prog...

Hannah Le, Ramesh Ramasamy, Alex Urrutia et al.

cs.LG 2026-06-17

STARE: Surprisal-Guided Token-Level Advantage Reweighting for Policy Entropy Stability

Reinforcement Learning with Verifiable Rewards algorithms like GRPO have emerged as the dominant post-training paradigm for complex reasoning in LLMs, yet commonly suffer from policy entropy collapse ...

Haipeng Luo, Qingfeng Sun, Songli Wu et al.

cs.LG 2026-06-17

A Human-in-the-Loop Bayesian Optimization Framework for Constraint-Aware Bioprocess Development

This work presents an extension to Pareto Front Guided Sampling (PFGS), a Human-in-the-Loop (HitL) Bayesian Optimization (BO) framework in which Gaussian process (GP) surrogate-derived quantities are ...

Samuel Stricker, Claus Wirnsperger, Alessandro Butté et al.

cs.LG 2026-06-17

Mechanism-Guided Selective Unlearning for RLVR-Induced Reasoning

We propose MAST (Mechanism-Aligned Selective Targeting), a mechanism-guided method for unlearning RLVR-induced reasoning with substantially lower collateral damage than standard full-parameter updates...

Chenyu Zhou, Qiliang Jiang, Shuning Wu et al.

cs.LG 2026-06-17

Machine Unlearning for the XGBoost Model with Network Intrusion Datasets

Machine Unlearning (MU) has emerged as an important technique for removing specific data points from trained models without requiring full retraining. However, most existing MU research focuses on dee...

Diana Magalhães, Eva Maia, João Vitorino et al.

stat.ML 2026-06-17

Generalised Eigenvalue Geometry of Semantic Adversarial Attacks

Recent empirical work shows that semantically equivalent paraphrases can fool financial sentiment classifiers: although a paraphrase remains close to the original under a strong reference embedding, i...

Martin Anthony, Kaveh Salehzadeh Nobari

cs.LG 2026-06-17

Compute Efficiency and Serial Runtime Tradeoffs for Stochastic Momentum Methods

Stochastic momentum methods such as heavy ball (HB), Nesterov momentum, and variants of Accelerated SGD (ASGD) [Kidambi et al., 2018] are widely used in modern training, but their stochastic benefits ...

Depen Morwani, Alexandru Meterez, Pranav Nair et al.

stat.ML 2026-06-17

On Local Population-Risk Certificates

This paper develops local certificates for population-risk increments around a current model. For a local candidate set \(\mathcal D\), the certificate is a two-sided confidence band for \(P({\ell_{θ+...

Mingzhi Song

cs.LG 2026-06-17

INDEQS: Informed Neural controlled Differential EQuationS

Neural Controlled Differential Equations (NCDE) provide a powerful continuous-time framework for forecasting time series, but standard graph-based extensions typically learn spatial structure purely f...

Michael Detzel, Gabriel Nobis, Kristiyan Blagov et al.

stat.ME 2026-06-17

Wasserstein Policy Learning for Distributional Outcomes

Offline policy learning has received growing attention in causal inference. The primary objective is to learn a policy (individualized treatment rule) as a mapping from covariates to treatment that ma...

Yiyan Huang, Cheuk Hang Leung, Qi Wu et al.

cs.LG 2026-06-17

Smoothness-Based Derandomization of PAC-Bayes Bounds

We study PAC-Bayes derandomization for smooth loss functions. Our goal is to obtain generalization bounds that hold with high probability for deterministic predictors by exploiting smoothness properti...

Alexandre Lemire Paquin, Brahim Chaib-Draa, Philippe Giguère

stat.ML 2026-06-17

Quantifying and Auditing LLM Evaluation via Positive-Unlabeled Learning

Large Language Models (LLMs) are increasingly used as judges for scalable evaluation, yet such LLM-as-a-Judge systems exhibit systematic biases that are decoupled from semantic quality, most notabl...

Zilong Zhang, Yi-Ting Hung, Lei Ding et al.

stat.ML 2026-06-17

Sequential Kernel-based Conditional Independence Testing via Adaptive Betting

Testing conditional independence is fundamental yet intrinsically difficult: without additional assumptions, Type I error control is impossible in general. The "Model-X" paradigm addresses this diffi...

Zheng He, Danica J. Sutherland

stat.ML 2026-06-17

FOSC-X: An Extended Framework for Optimal Local Cuts and Non-Horizontal Cluster Selection from Clustering Hierarchies

Extracting a flat clustering solution from a hierarchy is a common task in practical cluster analysis and can be formulated as an optimisation problem. Existing approaches focus on finding a single op...

Connor Simpson, Ricardo J. G. B. Campello

cs.LG 2026-06-17

Strategic Feature Selection

When algorithmic predictors inform resource allocation in high-stakes domains such as healthcare, these predictors must account for strategic manipulation of input features. The typical solution is to...

Jivat Neet Kaur, Pratik Patil, Divya Shanmugam et al.

stat.ML 2026-06-17

Kernel of Partition Paths: A Unified Representation for Tree Ensembles

A recent line of work has reframed individual decision trees as linear models on engineered features associated with their splits, opening routes for oracle inequalities and feature-importance reinter...

Nicolas Mahler

cs.LG 2026-06-17

Online Distributional Prediction via Latent Cluster Geometry Under Drift and Corruption

Online learning in non-stationary streams is often formulated as tracking a point estimate, but many applications require predicting the full data-generating distribution. We study online distribution...

Navyansh Mahla, Prateek Chanda, Ganesh Ramakrishnan

stat.ML 2026-06-17

TimeLAVA: Learning-Agnostic Data Valuation for Time Series

Data valuation quantifies the intrinsic quality of individual samples to enable principled data curation, quality control, and robust learning. For time series in critical domains such as healthcare, ...

Wenqin Liu, Weizhi Quan, Aoqi Zuo et al.
