arXiv
From the wire
Original publisher articles citing arXiv. Broadside has not yet written editorial coverage for these.
PASC: Pipeline-Aware Conformal Prediction with Joint Coverage Guarantees for Multi-Stage NLP and LLM Pipelines
Beyond One-Size-Fits-All: Adaptive Subgraph Denoising for Zero-Shot Graph Learning with Large Language Models
Benchmarking and Improving Monitors for Out-Of-Distribution Alignment Failure in LLMs
The Scaling Laws of Skills in LLM Agent Systems
From Flat Language Labels to Typological Priors: Structured Language Conditioning for Multilingual Speech-to-Speech Translation
DeepSlide: From Artifacts to Presentation Delivery
Generalization or Memorization? Brittleness Testing for Chess-Trained Language Models
LISTEN to Your Preferences: An LLM Framework for Multi-Objective Selection
Learning Bilevel Policies over Symbolic World Models for Long-Horizon Planning
Vector Policy Optimization: Training for Diversity Improves Test-Time Search
DrugSAGE: Self-evolving Agent Experience for Efficient State-of-the-Art Drug Discovery
SWE-Mutation: Can LLMs Generate Reliable Test Suites in Software Engineering?
Teaching Language Models to Forecast Research Success Through Comparative Idea Evaluation
SpecBench: Measuring Reward Hacking in Long-Horizon Coding Agents
The Annotation Scarcity Paradox in Low-Resource NLP Evaluation: A Decade of Acceleration and Emerging Constraints
Polar probe linearly decodes semantic structures from LLMs
Responsible Federated LLMs via Safety Filtering and Constitutional AI
Readers make targeted regressions to plausible errors in reanalysis of "noisy-channel garden-path" sentences
MeMo: Memory as a Model
Herculean: An Agentic Benchmark for Financial Intelligence