What are AI researchers worried about?

Based on a survey of over 4,000 AI researchers, this paper reveals that, contrary to public and media narratives dominated by existential threats, researchers prioritize immediate sociotechnical risks and show significant convergence with public opinion on risk assessment, suggesting a need for collaborative dialogue focused on mitigating present-day harms rather than speculating on future catastrophes.

Cian O'Donovan, Sarp Gurakan, Ananya Karanam, Xiaomeng Wu, Jack Stilgoe · Mon, 09 Ma · cs

Measuring Perceptions of Fairness in AI Systems: The Effects of Infra-marginality

This paper presents a user study demonstrating that human perceptions of fairness in AI systems are shaped not only by statistical parity or outcomes but also, significantly, by beliefs about the underlying causes of disparities, specifically how infra-marginality and differences in data distribution influence judgments in medical decision-making scenarios.

Schrasing Tong, Minseok Jung, Ilaria Liccardi, Lalana Kagal · Mon, 09 Ma · cs

From Risk Avoidance to User Empowerment: Reframing Safety in Generative AI for Mental Health Crises

This paper argues that current generative AI chatbots' risk-avoidant responses to mental health crises can harm users and proposes shifting toward empowerment-oriented design principles that enable AI to act as a supportive bridge for de-escalation and connection to professional care.

Benjamin Kaveladze, Arka Ghosh, Leah Ajmani, Denae Ford, Peter M Gutierrez, Jetta E Hanson, Eugenia Kim, Keertana Namuduri, Theresa Nguyen, Ebele Okoli, Teresa Rexin, Jessica L Schleider, Hongyi Shen, Jina Suh · Mon, 09 Ma · cs

Biometric-enabled Personalized Augmentative and Alternative Communications

This study proposes a roadmap for integrating biometric technologies into personalized Augmentative and Alternative Communication (AAC) systems, introducing concepts such as the AAC biometric register, showing through case studies that current AI accuracy in gesture and sign language recognition remains insufficient for practical applications, and offering recommendations to bridge this gap.

S. Yanushkevich, E. Berepiki, P. Ciunkiewicz, V. Shmerko, G. Wolbring, R. Guest · Mon, 09 Ma · cs

Mind the Gap: Pitfalls of LLM Alignment with Asian Public Opinion

This paper presents a multilingual audit revealing that while contemporary large language models generally align with public opinion on broad social issues across Asian regions, they consistently fail to accurately represent diverse religious viewpoints—particularly those of minority groups—and often amplify negative stereotypes, a problem that persists despite lightweight prompting interventions and remains undetected by standard bias benchmarks.

Hari Shankar, Vedanta S P, Sriharini Margapuri, Debjani Mazumder, Ponnurangam Kumaraguru, Abhijnan Chakraborty · Mon, 09 Ma · cs.CL

Towards Autonomous Mathematics Research

This paper introduces Aletheia, an autonomous AI research agent powered by advanced reasoning models and tool use that successfully generates, verifies, and revises mathematical proofs from Olympiad problems to PhD-level research, achieving milestones such as fully AI-generated papers and the autonomous solution of open problems while proposing new frameworks for quantifying AI autonomy and transparency.

Tony Feng, Trieu H. Trinh, Garrett Bingham, Dawsen Hwang, Yuri Chervonyi, Junehyuk Jung, Joonkyung Lee, Carlo Pagano, Sang-hyun Kim, Federico Pasqualotto, Sergei Gukov, Jonathan N. Lee, Junsu Kim, Kaiying Hou, Golnaz Ghiasi, Yi Tay, YaGuang Li, Chenkai Kuang, Yuan Liu, Hanzhao Lin, Evan Zheran Liu, Nigamaa Nayakanti, Xiaomeng Yang, Heng-Tze Cheng, Demis Hassabis, Koray Kavukcuoglu, Quoc V. Le, Thang Luong · Mon, 09 Ma · cs.AI

AdAEM: An Adaptively and Automated Extensible Measurement of LLMs' Value Difference

This paper introduces AdAEM, a novel self-extensible evaluation framework that automatically generates adaptive test questions by probing the internal value boundaries of diverse LLMs to overcome the limitations of static benchmarks and provide more informative, distinguishable insights into models' value differences and alignment dynamics.

Jing Yao, Shitong Duan, Xiaoyuan Yi, Dongkuan Xu, Peng Zhang, Tun Lu, Ning Gu, Zhicheng Dou, Xing Xie · Mon, 09 Ma · cs.AI

The Malicious Technical Ecosystem: Exposing Limitations in Technical Governance of AI-Generated Non-Consensual Intimate Images of Adults

This paper adopts a survivor-centered approach to expose how a "malicious technical ecosystem" of accessible tools enables the creation of AI-generated non-consensual intimate images, while demonstrating that current governance frameworks, such as the NIST AI 100-4 report, fail to effectively regulate this landscape due to flawed underlying assumptions.

Michelle L. Ding, Harini Suresh · Mon, 09 Ma · cs.AI

The Consensus Trap: Dissecting Subjectivity and the "Ground Truth" Illusion in Data Annotation

This systematic literature review critiques the "ground truth" paradigm in machine learning as a positivistic fallacy that misinterprets human disagreement as noise, arguing instead for pluralistic annotation infrastructures that treat diverse subjective perspectives as high-fidelity signals essential for building culturally competent models.

Sheza Munir, Benjamin Mah, Krisha Kalsi, Shivani Kapania, Julian Posada, Edith Law, Ding Wang, Syed Ishtiaque Ahmed · Mon, 09 Ma · cs.AI

The DSA's Blind Spot: Algorithmic Audit of Advertising and Minor Profiling on TikTok

This paper presents an algorithmic audit of TikTok revealing that while the platform technically complies with the Digital Services Act's ban on profiled advertising to minors, it effectively circumvents this protection by delivering highly personalized, often undisclosed influencer marketing content to adolescents, thereby highlighting the urgent need to expand the regulatory definition of "advertisement" to cover such commercial practices.

Sara Solarova, Matej Mosnar, Matus Tibensky, Jan Jakubcik, Adrian Bindas, Simon Liska, Filip Hossner, Matúš Mesarčík, Ivan Srba · Mon, 09 Ma · cs.AI

Exploring Human-in-the-Loop Themes in AI Application Development: An Empirical Thematic Analysis

This paper presents a multi-source qualitative study that identifies four key themes—AI Governance and Human Authority, Human-in-the-Loop Iterative Refinement, AI System Lifecycle and Operational Constraints, and Human-AI Team Collaboration and Coordination—to address the fragmented operational guidance for structuring human roles and oversight in AI application development.

Parm Suksakul, Nathan Kittichaikoonkij, Nakhin Polthai, Aung Pyae · Mon, 09 Ma · cs.AI