AI for Threat Intelligence: What Actually Works
An honest breakdown of AI-powered threat intelligence — what the platforms actually automate, what still needs a human analyst, and where the accuracy claims fall apart.
Every vendor in the threat intelligence space is pitching AI as the reason you should buy their platform. Recorded Future has AI. Mandiant Advantage has AI. ThreatConnect has AI. The pitch decks all say the same thing: automated threat intelligence that turns raw data into actionable insights at machine speed.
Some of that is real. Some of it is a language model bolted onto a feed aggregator with a marketing budget. After running red team engagements against organizations using these platforms, here is what AI-powered CTI actually delivers in production — and where it still falls flat.
What AI CTI Platforms Actually Automate Well
Three capabilities consistently work as advertised across the major platforms. These aren’t aspirational features. They’re operational today and measurably reduce analyst workload.
IOC Enrichment
This is the strongest use case for AI in threat intelligence. When your SIEM or SOAR ingests a raw indicator — an IP address, a file hash, a domain — AI-powered platforms cross-reference it against multiple intelligence sources in seconds rather than the 15-30 minutes it takes an analyst to pivot across VirusTotal, Shodan, passive DNS, and WHOIS manually.
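The manual pivot workflow described above is easy to sketch. This is a minimal illustration of the fan-out pattern — one raw indicator queried against several sources concurrently — using hypothetical placeholder lookup functions, not real VirusTotal or Shodan client calls:

```python
# Fan one raw indicator out to multiple lookup sources at once,
# instead of pivoting tab-by-tab. The lookup functions here are
# hypothetical stand-ins for real enrichment source clients.

from concurrent.futures import ThreadPoolExecutor

def lookup_passive_dns(ioc: str) -> dict:
    # Placeholder: a real client would query a passive DNS source.
    return {"source": "passive_dns", "ioc": ioc, "resolutions": 3}

def lookup_whois(ioc: str) -> dict:
    # Placeholder: a real client would query WHOIS registration data.
    return {"source": "whois", "ioc": ioc, "registrar": "unknown"}

def enrich(ioc: str) -> list[dict]:
    """Run all source lookups concurrently and collect the context."""
    sources = (lookup_passive_dns, lookup_whois)
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        futures = [pool.submit(fn, ioc) for fn in sources]
        return [f.result() for f in futures]

context = enrich("203.0.113.7")
print(len(context))  # one context record per source
```

The point is the shape of the automation, not the sources: each added feed is one more function in the fan-out, and total latency is bounded by the slowest source rather than the sum of all of them.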
Recorded Future does this particularly well. Its Intelligence Cloud scores IOCs on a 1-100 risk scale by aggregating data from open-source feeds, dark web monitoring, paste sites, and proprietary collection. When a suspicious IP hits your SIEM, Recorded Future’s API returns the risk score, associated malware families, known threat actor links, and historical context in under two seconds.
Mandiant Advantage takes a different approach. Its enrichment draws heavily from Mandiant’s incident response casework, which means the context you get is weighted toward APT and nation-state activity. If an IOC shows up in Mandiant’s data, it’s because a real IR team encountered it in a real breach investigation. The coverage is narrower than Recorded Future’s but the signal quality for advanced threats is higher.
ThreatConnect’s enrichment model is more customizable. It lets you define scoring rules that weight different intelligence sources based on your own priorities — if you trust your industry ISAC feed more than open-source aggregators, you can configure that. The tradeoff: more configuration effort up front.
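The source-weighted scoring idea can be sketched in a few lines. This is an illustrative toy in the spirit of ThreatConnect-style custom scoring rules — the source names, weights, and record format are assumptions, not a real platform configuration:

```python
# Combine per-source risk scores (0-100) into one weighted score,
# trusting some sources more than others. Weights are illustrative.

SOURCE_WEIGHTS = {
    "industry_isac": 1.0,      # trusted sector-specific sharing group
    "commercial_feed": 0.7,
    "osint_aggregator": 0.4,   # open-source feed aggregators
}

def weighted_score(sightings: list[dict]) -> float:
    """Weighted average of per-source scores for one indicator."""
    total, total_weight = 0.0, 0.0
    for s in sightings:
        w = SOURCE_WEIGHTS.get(s["source"], 0.2)  # default for unknown sources
        total += w * s["score"]
        total_weight += w
    return total / total_weight if total_weight else 0.0

sightings = [
    {"source": "industry_isac", "score": 90},
    {"source": "osint_aggregator", "score": 40},
]
print(round(weighted_score(sightings), 1))  # ISAC sighting dominates
```

The configuration effort the platforms require is essentially deciding that weight table: which sources you trust, by how much, and what the default is for everything else.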
Accuracy in practice: IOC enrichment scores are useful for prioritization, not for verdicts. In our testing, Recorded Future’s risk scores above 75 correlated with confirmed malicious activity roughly 80-85% of the time. Scores from 50 to 75 were a coin flip — roughly a 50% true positive rate. Below 50, you’re mostly looking at noise. Mandiant’s data showed higher precision on scored indicators (around 90% for high-confidence attributions) but lower coverage overall; many IOCs simply aren’t in their dataset.
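Those score bands translate directly into triage routing. A minimal sketch, using the bands from our testing — the thresholds are assumptions to tune per environment, not vendor defaults:

```python
# Route enrichment risk scores to analyst workflows based on the
# observed precision bands: >75 correlated strongly with confirmed
# malicious activity, 50-75 was roughly a coin flip, <50 was noise.

def triage(risk_score: int) -> str:
    """Map an enrichment risk score (1-100) to a workflow decision."""
    if risk_score > 75:
        return "escalate"       # high-precision band: alert and act
    if risk_score >= 50:
        return "analyst_queue"  # coin-flip band: human review before action
    return "log_only"           # noise band: retain for correlation, no alert

print(triage(82))  # escalate
print(triage(60))  # analyst_queue
print(triage(20))  # log_only
```

The middle band is the important one: auto-blocking on a 50% true positive rate burns trust with the business, but discarding those hits throws away half the real threats. Queueing them for a human is the honest answer.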
Report Summarization
Threat intelligence analysts spend a significant portion of their time reading. Vendor advisories, OSINT reports, government bulletins, dark web posts — the volume is crushing. A mid-size SOC might process 50-100 intelligence reports per week.
AI summarization is genuinely useful here. Recorded Future’s AI-generated summaries condense multi-page threat reports into structured briefs: affected industries, IOCs mentioned, TTPs mapped to MITRE ATT&CK, and recommended defensive actions. Mandiant Advantage does the same for its threat briefs and campaign tracking reports.
The quality is good enough for initial triage. An analyst can read a 200-word AI summary, decide if the full report is relevant to their environment, and either deep-dive or move on. This replaces the “skim 30 reports to find the 3 that matter” workflow that eats hours every week.
Where summarization breaks down: AI summaries miss nuance. A human analyst reading a Mandiant report on a Chinese APT campaign will pick up on hedging language (“we assess with moderate confidence”) and contextual clues that inform how much weight to give the findings. The AI summary flattens that nuance into a bullet point. For strategic intelligence — the kind that informs defensive architecture decisions — always read the full report.
Feed Deduplication and Correlation
Every CTI platform ingests multiple intelligence feeds: commercial feeds, open-source (AlienVault OTX, Abuse.ch), ISACs, government feeds (CISA, US-CERT), and internal indicators from your own investigations. The overlap between feeds is substantial. The same IP address showing up in five different feeds isn’t five separate threats — it’s one indicator reported five times with slightly different metadata.
AI-driven dedup and correlation across these feeds is table stakes for all three major platforms. ThreatConnect’s Threat Intelligence Platform (TIP) is arguably strongest here — it was purpose-built for feed aggregation and correlation, and its AI layer identifies when indicators from different sources refer to the same campaign or actor even when the raw data doesn’t include explicit links.
This matters for alert volume. Without dedup, your SIEM correlates against every raw indicator from every feed, generating duplicate alerts for the same underlying threat. With AI dedup, you get one correlated alert with context from all sources attached.
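The core of the dedup step is mechanical and worth seeing concretely. This toy sketch collapses sightings of the same indicator from multiple feeds into one merged record — the feed names, tag values, and record fields are illustrative assumptions:

```python
# Collapse duplicate sightings of the same indicator across feeds
# into one record with merged metadata, so the SIEM correlates
# against one enriched indicator instead of several near-duplicates.

def dedupe(sightings: list[dict]) -> list[dict]:
    merged: dict[str, dict] = {}
    for s in sightings:
        key = s["value"].strip().lower()  # normalize the raw indicator
        rec = merged.setdefault(key, {"value": key, "sources": set(), "tags": set()})
        rec["sources"].add(s["feed"])
        rec["tags"].update(s.get("tags", []))
    return list(merged.values())

feeds = [
    {"value": "198.51.100.7", "feed": "otx", "tags": ["scanner"]},
    {"value": "198.51.100.7", "feed": "isac", "tags": ["cobalt-strike-c2"]},
    {"value": "evil.example", "feed": "abuse_ch", "tags": ["malware-dist"]},
]
deduped = dedupe(feeds)
print(len(deduped))  # 2 indicators, not 3 raw sightings
```

What the platforms' AI layer adds on top of this exact-match collapse is the fuzzy part: recognizing that two *different* indicators belong to the same campaign or actor, which a key-based merge like this cannot do.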
What Still Requires Human Analysts
This is the section vendors skip in their pitch decks. AI automates the mechanical parts of threat intelligence. The analytical parts — the work that turns data into decisions — still require humans.
Attribution
Attributing a cyberattack to a specific threat actor or nation-state is one of the hardest problems in intelligence analysis. AI can identify overlaps in infrastructure, tooling, and TTPs that suggest links to known threat groups. It cannot make the final attribution call.
Why not? Attribution requires weighing evidence that’s inherently uncertain. False flag operations exist. Threat actors share tools. Infrastructure gets reused, sold, or compromised. A cluster of IOCs that looks like APT29 could be APT29, or it could be a different group using leaked APT29 tooling, or it could be a red team mimicking APT29 TTPs for a purple team exercise.
Mandiant Advantage assigns uncategorized threat clusters (UNC groups) to track activity before formal attribution. That “uncategorized” label exists precisely because their analysts — among the best in the industry — won’t attribute activity until the evidence meets their confidence threshold. No AI model has the judgment to make that call.
Geopolitical Context
Threat intelligence doesn’t exist in a vacuum. A spike in scanning activity from Chinese IP ranges means something different during a Taiwan Strait crisis than it does on a random Tuesday. Ransomware campaigns targeting healthcare spike during open enrollment periods. Nation-state activity shifts in response to sanctions, diplomatic events, and military operations.
AI models trained on historical threat data can identify patterns, but they can’t incorporate breaking geopolitical context that wasn’t in the training data. When Russia invaded Ukraine in 2022, the surge in wiper malware targeting Ukrainian infrastructure was predictable to any analyst following the situation. No ML model predicted it from the IOC data alone.
Strategic intelligence — the kind that tells a CISO “we should expect increased targeting from X because of Y” — requires analysts who understand the threat actor’s motivations, capabilities, and operating environment. This is political science and international relations work, not data science.
Contextualizing Threats to Your Specific Environment
A high-severity vulnerability in Apache Struts is a critical finding for organizations running Struts. It’s completely irrelevant to an organization that doesn’t have Struts anywhere in its stack. AI platforms can tell you a vulnerability exists and that it’s being exploited in the wild. They can’t reliably map that to your specific asset inventory, business processes, and risk tolerance.
This gap is narrowing. CrowdStrike Falcon and Recorded Future both offer integrations that correlate external threat intelligence with internal asset data. But “narrowing” isn’t “closed.” The mapping between “this threat exists” and “this threat matters to us” still requires an analyst who knows the environment.
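The simplest version of that mapping is a join between external advisories and your asset inventory. A minimal sketch — the inventory format, advisory fields, and product names are assumptions for illustration:

```python
# Keep only advisories whose affected products actually exist in our
# environment. A real implementation would pull the inventory from a
# CMDB and normalize product identifiers; this is the core idea only.

ASSET_INVENTORY = {  # product -> install count (illustrative)
    "apache-struts": 0,
    "openssl": 412,
    "nginx": 87,
}

def relevant(advisories: list[dict]) -> list[dict]:
    """Filter advisories to products we actually run."""
    return [
        a for a in advisories
        if any(ASSET_INVENTORY.get(p, 0) > 0 for p in a["affected_products"])
    ]

advisories = [
    {"id": "ADV-1", "affected_products": ["apache-struts"]},  # zero installs: drop
    {"id": "ADV-2", "affected_products": ["openssl"]},        # 412 installs: keep
]
print([a["id"] for a in relevant(advisories)])  # ['ADV-2']
```

The hard part in practice isn't this filter — it's keeping the inventory accurate and the product names normalized, which is exactly where the analyst's knowledge of the environment comes back in.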
False Positive Rates: The Number Nobody Wants to Publish
Here is the uncomfortable truth about AI-powered threat intelligence: false positive rates vary wildly depending on the indicator type, and vendors don’t publish these numbers.
From our operational experience across engagements:
- IP reputation scores: 15-30% false positive rate at default thresholds. IP addresses rotate. CDN and cloud provider ranges get flagged because a previous tenant was malicious. Legitimate services (Cloudflare Workers, AWS Lambda endpoints, Azure Functions) appear on blocklists because other customers used the same infrastructure for attacks.
- Domain reputation: 10-20% false positive rate. Newly registered domains get flagged as suspicious by default. Domain generation algorithms (DGA) detection has improved but still flags legitimate short-lived domains (marketing campaign trackers, A/B testing domains).
- File hash reputation: 5-10% false positive rate. This is the most reliable indicator type because hashes are deterministic. False positives come from hash collisions in fuzzy hashing (ssdeep) and from legitimate software that shares code with malicious tools (Cobalt Strike is the classic example — the legitimate penetration testing tool and the cracked copies used by criminals produce overlapping signatures).
- Behavioral indicators: 20-40% false positive rate. “Suspicious PowerShell execution” is the most over-fired alert in modern security. Behavioral detection is where AI adds the most value (baseline comparison) and also where it produces the most noise.
These numbers shift based on your environment, your tuning, and your baseline period. An organization that runs its baselines for six months will see lower false positive rates than one that deploys and starts alerting immediately. But the ranges above are representative of what we see in practice.
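One common tuning step behind the IP-reputation numbers above is suppressing hits inside known shared infrastructure before alerting. A sketch using Python's standard `ipaddress` module — the ranges here are documentation prefixes standing in for real published provider ranges (for example, AWS's ip-ranges.json):

```python
# Downgrade IP-reputation hits that fall inside known CDN/cloud
# ranges, where a "malicious" score often reflects a previous tenant
# rather than the current one. Ranges below are placeholders.

import ipaddress

SHARED_INFRA = [
    ipaddress.ip_network(n)
    for n in ("203.0.113.0/24", "198.51.100.0/24")  # stand-in CDN/cloud prefixes
]

def suppress_shared_infra(ip: str) -> bool:
    """True if the hit should be downgraded for review, not alerted outright."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in SHARED_INFRA)

print(suppress_shared_infra("203.0.113.50"))  # True: shared infrastructure
print(suppress_shared_infra("192.0.2.10"))    # False: alert normally
```

Note this downgrades rather than drops: attackers host C2 on the same cloud providers everyone else uses, so shared-infrastructure hits belong in the analyst queue, not the bin.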
Practical Recommendations
If you’re evaluating AI-powered CTI platforms, here’s what matters:
If your primary need is feed aggregation and IOC enrichment: Recorded Future or ThreatConnect. Both handle the mechanical intelligence work well. Recorded Future has broader coverage; ThreatConnect gives you more control over scoring logic.
If your primary concern is APT and nation-state threats: Mandiant Advantage. The intelligence quality from their IR casework is unmatched for advanced threat tracking. The coverage for commodity threats is thinner.
If you’re a Microsoft-stack shop: Microsoft’s Defender Threat Intelligence (MDTI) integrates natively with Sentinel and the rest of the security suite. The intelligence quality is improving but still trails the dedicated CTI vendors for depth.
Regardless of platform: Budget for at least one full-time intelligence analyst who reads reports, understands your business context, and makes the judgment calls that AI cannot. A $200K/year CTI platform with no analyst to interpret the output is an expensive alerting engine. A $200K platform with a skilled analyst is a force multiplier.
The AI in these platforms is real and it works — for the mechanical, high-volume tasks that burned out analysts before. It doesn’t replace the analytical tradecraft that makes threat intelligence actionable. Any vendor who tells you otherwise is selling you a product, not a capability.