
Why Code Scanning Alone Isn't Enough for AI-Generated Code


Scanning finds what's wrong with the code. It doesn't answer how the code got there.


The Assumption That No Longer Holds

Every code scanning tool — Semgrep, Snyk, CodeQL, GitHub Advanced Security — was built on the same assumption:

Code enters the repository through a trusted process (developer → git commit → push → review → merge), and the scanner's job is to find bugs in that code.

For decades, this assumption held. Developers wrote code on company machines, committed through company Git, and reviewers checked it before merging.

AI coding tools broke this assumption.


What Changed

1. Code Now Originates Outside the Network

When a developer uses Claude Code or Gemini CLI, the code is generated on a machine with internet access — often outside the corporate network entirely. It then needs to be transferred into the internal network.

This transfer step has no standard protocol. Developers use:

  • Email attachments

  • Personal cloud storage

  • Chat messages

  • USB drives (where allowed)

  • Copy-paste from personal devices

None of these have signature verification, integrity checks, or audit trails.

2. Volume Exceeds Human Review Capacity

AI tools can generate hundreds of files in a single session. A developer might push an entire module — authentication, database layer, API endpoints — all generated in one afternoon.

Traditional code review was designed for human-written diffs of 50-200 lines. It doesn't scale to AI-generated codebases.

3. Identity Is No Longer Implicit

In a traditional workflow, the git author is the person who sat at the keyboard. It's not cryptographically verified, but it's generally trustworthy because the developer is on a managed machine with SSO.

With AI-generated code transferred from external environments, the git author name is just a string someone typed. There's no mathematical proof of who actually produced or sent the code.

4. AI Agents Can Act Autonomously

Modern AI coding tools can execute shell commands. An AI agent could theoretically:

  • Generate code

  • Run leeh push (or any transfer mechanism)

  • Repeat indefinitely

Without a human-in-the-loop gate, there's no guarantee a person reviewed what was sent.
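
One common way to build that gate is to make the transfer step refuse to run without an interactive, explicit human confirmation. The Python sketch below is illustrative only; the function name and prompt are assumptions, not leeh's actual mechanism:

  import sys

  def require_human_confirmation(summary: str) -> None:
      """Illustrative human-in-the-loop gate: proceed only if a person at an
      interactive terminal explicitly approves the transfer."""
      if not sys.stdin.isatty():
          # An autonomous agent running headless or piping input fails here.
          raise RuntimeError("transfer requires an interactive human confirmation")
      print(summary)
      if input("Approve this transfer? Type 'yes' to continue: ").strip().lower() != "yes":
          raise RuntimeError("transfer not approved by a human")

  # A scripted agent calling this without a terminal is blocked before anything is sent.
  require_human_confirmation("12 files, 3,400 lines, destination: internal Git, module: auth")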


What Scanning Catches — and What It Misses

Scanning catches:

  • SQL injection patterns

  • Hardcoded secrets (API keys, passwords)

  • Known vulnerable dependencies

  • Insecure cryptographic usage

  • XSS and CSRF patterns

Scanning misses:

  • Who sent this code? — No scanner verifies the sender's identity

  • Was it modified in transit? — No scanner checks integrity between origin and repository

  • Was it quarantined? — Scanners run after code is already in the repo

  • Did a human approve the transfer? — No scanner enforces human attestation

  • What was the intent? — Pattern matching finds known-bad code, but a novel exfiltration technique written by an AI might not match any existing rule


The Missing Layer: Inbound Verification

Code scanning is post-entry security. It analyzes what's already inside.

What's missing is pre-entry security — a controlled checkpoint that code must pass through before it reaches the repository.

Pre-entry (missing in most enterprises):
  Identity    → Who sent this? (cryptographic proof)
  Integrity   → Was it tampered with? (hash verification)
  Quarantine  → Is it isolated until verified? (3-state)
  Attestation → Did a human approve it? (optional gate)

Post-entry (already solved):
  SAST        → Semgrep, CodeQL
  Secrets     → gitleaks, TruffleHog
  Dependencies → Snyk, Dependabot
  Review      → Pull request review

You need both layers. Most enterprises only have the second one.
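
To make the pre-entry layer concrete, here is a minimal sketch of the identity and integrity checks using Ed25519 signatures and SHA-256 hashes (Python's cryptography and hashlib libraries). The payload shape and function names are assumptions for illustration, not leeh's wire protocol:

  import hashlib
  from cryptography.exceptions import InvalidSignature
  from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

  # Sender side (outside the network): declare a hash, sign the payload.
  sender_key = Ed25519PrivateKey.generate()            # in practice a long-lived developer key
  payload = b"...archive of AI-generated files..."
  declared_hash = hashlib.sha256(payload).hexdigest()  # integrity claim
  signature = sender_key.sign(payload)                 # identity claim, non-repudiable

  # Gateway side (pre-entry checkpoint): verify both claims before anything is committed.
  def verify_inbound(payload: bytes, declared_hash: str, signature: bytes, sender_public_key) -> bool:
      # Integrity: does the payload still match the hash the sender declared?
      if hashlib.sha256(payload).hexdigest() != declared_hash:
          return False
      # Identity: was the payload signed by the key registered to this developer?
      try:
          sender_public_key.verify(signature, payload)
      except InvalidSignature:
          return False
      return True

  assert verify_inbound(payload, declared_hash, signature, sender_key.public_key())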


A Practical Example

Without inbound verification:

Developer generates auth module with Claude Code
  → Emails zip file to work address
  → Extracts on work laptop
  → git add, commit, push
  → CI runs Semgrep → no findings
  → Merged to main
  
Who sent it? Unknown (email is not identity verification)
Was it modified? Unknown (no integrity check)
Was it quarantined? No
Was it approved? Only the code review, not the transfer

With inbound verification (leeh):

Developer generates auth module with Claude Code
  → leeh push (Ed25519 signed, SHA-256 hashed)
  → Gateway verifies signature → valid
  → Gateway verifies hash → intact
  → Semgrep + gitleaks scan → clean
  → Spec Contract check → paths allowed
  → Quarantine → accepted
  → Committed to internal Git
  
Who sent it? alice (Ed25519 signature, non-repudiable)
Was it modified? No (SHA-256 verified)
Was it quarantined? Yes (pending → accepted)
Was it approved? Yes (scan passed, optionally human-approved)
Full audit trail? Yes (every step logged)
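
The quarantine step above behaves like a small state machine: a submission starts as pending and moves to accepted only after every check passes, and each transition is logged. A minimal sketch, assuming a third rejected state and an invented record shape (neither is taken from leeh):

  from dataclasses import dataclass, field
  from datetime import datetime, timezone
  from enum import Enum

  class QuarantineState(Enum):
      PENDING = "pending"    # received, nothing trusted yet
      ACCEPTED = "accepted"  # every check passed; safe to commit to internal Git
      REJECTED = "rejected"  # assumed third state: some check failed

  @dataclass
  class Submission:
      sender: str
      sha256: str
      state: QuarantineState = QuarantineState.PENDING
      audit_log: list = field(default_factory=list)

      def record(self, event: str) -> None:
          # Append-only trail: every step is timestamped, nothing is overwritten.
          self.audit_log.append(f"{datetime.now(timezone.utc).isoformat()} {event}")

      def apply_checks(self, checks: dict) -> None:
          """checks maps a check name (signature, hash, scan, ...) to its result."""
          for name, passed in checks.items():
              self.record(f"{name}: {'pass' if passed else 'fail'}")
              if not passed:
                  self.state = QuarantineState.REJECTED
                  return
          self.state = QuarantineState.ACCEPTED

  sub = Submission(sender="alice", sha256="<sha-256 of the archive>")
  sub.apply_checks({"ed25519 signature": True, "sha-256 hash": True,
                    "semgrep + gitleaks": True, "spec contract": True})
  assert sub.state is QuarantineState.ACCEPTED  # only now may it touch internal Git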

The Argument for Both

This isn't about replacing your scanner. It's about acknowledging that scanning alone is incomplete when code originates outside your network.

Security Layer          What It Answers                          Tools
Inbound verification    Should this code be allowed in?          leeh
Static analysis         Does this code have vulnerabilities?     Semgrep, CodeQL
Secret detection        Does this code contain secrets?          gitleaks, TruffleHog
Dependency scanning     Are the dependencies safe?               Snyk, Dependabot
Code review             Does a human approve the logic?          Pull requests

Each layer answers a different question. Skip one, and you have a gap.


What To Do Next

  1. Keep your scanners. They're essential. Don't remove them.

  2. Add an inbound layer. Verify identity, integrity, and intent before code enters your repository.

  3. Require signatures. Git author names are theater. Ed25519 signatures are math.

  4. Quarantine first. Don't let code touch internal Git until all checks pass.

  5. Log everything. When an auditor asks "how did this code get into your system?", have an answer.
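
That answer is easiest to give when every inbound transfer leaves a structured, append-only record. One possible shape, written as JSON Lines; the field names are illustrative, not a prescribed leeh format:

  import json
  from datetime import datetime, timezone

  # One event per line, appended to a log that is never rewritten.
  event = {
      "timestamp": datetime.now(timezone.utc).isoformat(),
      "sender": "alice",  # taken from the verified signature, not the git author string
      "payload_sha256": "<sha-256 of the archive>",
      "checks": {"signature": "pass", "hash": "pass", "scan": "pass"},
      "quarantine": "pending -> accepted",
      "committed_as": "<internal commit id>",
  }
  with open("inbound_audit.jsonl", "a", encoding="utf-8") as log:
      log.write(json.dumps(event) + "\n")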


leeh — LLM Escrow & Entry Hub. One-way secure inbound pipeline for AI-generated code.

GitHub · leeh.io
