Dealing with AI Risk¶
Every few years, something new comes along and the infosec community collectively loses its mind. AI is no different. Businesses are moving fast, GRC teams are scrambling to catch up, and everyone is asking the same question - how do we manage this risk?
I've been in this space long enough to know that the answer is usually simpler than it looks.
It's not that new¶
Here's the thing - an LLM isn't actually a new type of risk. It's a combination of two things your organisation already knows how to deal with:
- An API - something that takes input, does something with it, and spits out a result
- A contractor acting on your behalf - someone you've given access and trust to, who makes decisions and takes actions in your name
The tricky part is that an LLM is both of these things at the same time. Your AppSec team has always managed APIs. Your GRC team has always managed human agent risk. They've just never had to manage the same thing, together, at the same time.
That's where the confusion comes from. Not because AI is fundamentally different, but because it breaks down a boundary that's always existed between two separate disciplines.
Mapping the risk¶
If you think about it that way, the risk categories start to look pretty familiar:
| Risk | As an API | As a Human Agent | As an LLM |
|---|---|---|---|
| Injection | SQL/code injection | Social engineering | Prompt injection - same input, both problems |
| Impersonation | Spoofed auth tokens | Fake instructions from "management" | Compromised system prompt |
| Scope Violation | Privilege escalation | Insider threat | Excessive agency - it reasons its way into things |
| Audit Evasion | Log tampering | Hiding actions | Generates plausible justifications that look clean |
| Denial of Function | DoS, resource exhaustion | Poor decisions under pressure | Context stuffing, token exhaustion |
| Data Exposure | Sensitive data in API responses | Oversharing | System prompt leakage, training data exposure |
| Manipulation | Malformed inputs | Phishing, coercion | Jailbreaking - manipulating reasoning, not code |
| Scale | Automated exploit scripts | A rogue employee leaves a trail | A manipulated LLM acts at API speed, with a straight face |
You already have the controls¶
This is the part no one wants to hear, because it means the problem is on us. If your organisation has a mature API security practice and a halfway decent approach to managing third-party and human agent risk, you've already got most of what you need.
| Control | API Context | Human Agent Context | Applied to LLMs |
|---|---|---|---|
| Least Privilege | Scoped API keys | Role-based access, need-to-know | Scoped tool access, action whitelisting |
| Audit Trail | Immutable request/response logging | Action logs, four-eyes principle | Log inputs and reasoning, not just outputs |
| Input Validation | Sanitise all inputs | Verify instructions through proper channels | Prompt filtering, instruction hierarchy |
| Segregation of Duties | Separate services for sensitive operations | No single person end-to-end | Model reasons, tool layer executes, human approves irreversible actions |
| Rate Limiting | Throttle requests, detect anomalies | Workload monitoring | Token budgets, query limits |
| Escalation Paths | Alert on anomalous behaviour | Escalate unusual requests | Hard limits on action scope, human-in-the-loop for high-risk actions |
What about the frameworks?¶
There are a few worth knowing about:
- OWASP LLM Top 10 - probably the most practical starting point, written from an application security angle
- MITRE ATLAS - good on the technical attack side, modelled on the same approach as MITRE ATT&CK
- NIST AI RMF - governance heavy, maps reasonably well to ISO 27001 thinking
- EU AI Act - regulatory, useful for compliance conversations, not really a security framework
None of them do a great job of bridging the API security world and the human agent trust model. That's the gap where most organisations are currently exposed.
The bottom line¶
I'm not going to dress this up. Your LLM is an API that can also be talked into doing things it shouldn't. You can exploit it technically, or you can manipulate it contextually - and sometimes the same single input does both.
The good news is you don't need to invent anything new. Apply your API security controls to the endpoint. Apply your human agent trust model to the actions it takes. Figure out where those two things overlap, and make sure someone owns that gap.
That's not a new discipline. That's just GRC doing its job.