Imagine a Security Operations Center (SOC) as a massive, high-tech control room for a city's defense system. It's where teams of "security guards" (analysts) sit 24/7, watching thousands of screens, trying to spot intruders, stop hackers, and keep the lights on.
For years, these guards have been drowning in a sea of false alarms. It's like a fire station where the phone rings 10,000 times a day, but 9,900 of those calls are just someone burning toast. The guards are exhausted, stressed, and burning out.
Enter the Large Language Model (LLM). You can think of an LLM as a super-smart, incredibly fast, but occasionally hallucinating intern. This paper is like a report card on how real security guards are actually using this new intern, based on thousands of conversations they've had on Reddit.
Here is the breakdown of what the paper found, using simple analogies:
1. The "Hammer" Analogy: What Can It Do?
The title says, "Like a Hammer, It Can Build, It Can Break."
- Building (The Good): The intern is amazing at the boring stuff. It can write code scripts, summarize long reports, and explain complex technical jargon in plain English. It's like having a personal assistant who can draft your emails in seconds.
- Breaking (The Bad): If you ask this intern to make a critical decision on its own, it might confidently tell you that "The sky is green" because it read a weird comic book once. In security, a confident wrong answer can be disastrous.
2. What Tools Are They Actually Using?
You might think security guards are using fancy, expensive, military-grade "AI Security Robots."
- The Reality: They are mostly using general-purpose tools like ChatGPT or Microsoft Copilot. It's like the guards using a Swiss Army Knife instead of a specialized laser cutter.
- The "Long Tail": There are dozens of specialized security AI tools on the market, but most guards haven't tried them yet. They are sticking to the tools they already know, even if those tools weren't built specifically for security.
3. How Are They Using It? (The "Trainee" vs. The "General")
The paper found a clear pattern in how the guards use this new tech:
- Low-Risk Tasks (The Intern's Playground): Guards are happy to let the AI handle the "grunt work." They use it to write scripts, draft reports, or explain what a confusing error message means. This is like letting the intern organize the filing cabinet.
- High-Risk Tasks (The General's Domain): When it comes to actually stopping a hacker or shutting down a server, the guards do not trust the AI to act alone. They treat it as a "decision support" tool: they ask, "Hey AI, what do you think?" and then double-check the answer before doing anything (a sketch of this pattern appears after this list).
- Analogy: You might follow the GPS without a second thought on an open highway (low risk), but you double-check its directions through a tricky construction zone, hands firmly on the wheel (high risk).
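The paper itself contains no code, but a minimal, hypothetical Python sketch can make this low-risk/high-risk split concrete. Everything here is illustrative: `ask_llm` stands in for a call to a general-purpose chat model, and `isolate_host` stands in for a real containment action (an EDR API call, for instance). The point is the shape of the workflow: the model summarizes and recommends, but only a human can approve the destructive step.

```python
# Hypothetical sketch of the "decision support" pattern: the LLM proposes,
# the human disposes. All names here are illustrative, not a real product API.

def ask_llm(prompt: str) -> str:
    # Stand-in for a real chat-completion call; returns a canned reply
    # so this sketch runs without any API key.
    return f"(model reply to: {prompt[:60]}...)"

def isolate_host(hostname: str) -> None:
    # Stand-in for a real containment action, e.g. an EDR isolation call.
    print(f"Isolating {hostname}...")

def triage_alert(alert: dict) -> None:
    # Low-risk: let the model do the grunt work of summarizing the alert.
    print(ask_llm(f"Summarize this security alert in plain English: {alert}"))

    # High-risk: the model may only *recommend* an action...
    print("LLM recommendation:",
          ask_llm(f"Should we isolate the affected host? Answer with reasons: {alert}"))

    # ...and a human analyst must explicitly approve before anything happens.
    if input("Isolate host? [y/N] ").strip().lower() == "y":
        isolate_host(alert["host"])

if __name__ == "__main__":
    triage_alert({"host": "web-01", "rule": "excessive failed logins"})
```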
4. The Three Big Problems (Why They Don't Trust It Yet)
Even though the AI is fast, the guards have three major reservations:
- The "Confident Liar" (Reliability): The AI sometimes makes things up (hallucinations) but says them with 100% confidence. In a courtroom, a confident liar gets you sent to jail. In security, a confident liar gets your company hacked.
- The "Glass House" (Privacy): If you ask a public AI, "How do I fix this specific error in my company's database?", you might accidentally tell the AI your company's secrets. The guards are worried that by using these tools, they are handing their blueprints to the enemy.
- The "Price Tag" (Cost): Running these AI models costs a lot of money. Some guards feel that for the price of one AI subscription, they could hire a real human analyst who won't make up facts.
5. The Big Irony: The "Experience Trap"
This is the most interesting part of the paper.
- The Cycle: To become a senior security expert, you usually have to start as a junior guard, sorting through thousands of small, boring alerts to learn the ropes.
- The Problem: The AI is now doing all those boring, entry-level tasks.
- The Crisis: If the AI does all the "learning" work, how will the next generation of experts learn their job? It's like if a robot did all the practice drills for a football team; the players would never learn how to actually play the game.
The Bottom Line
The paper concludes that LLMs are a powerful tool, but not a magic wand.
Security professionals are using them like a powerful flashlight: great for seeing in the dark, but the flashlight does not decide where to walk. They use the AI to get faster at the boring stuff, and they keep a very tight grip on the critical decisions, because they know that if the AI slips up, the whole building could burn down.
In short: The AI is a great intern, but it's not ready to be the boss.