Code Analyzer: Verifying Code Functionality with LLM + Docker

A technical implementation, from static analysis to dynamic verification

January 23, 2026·7 min read·Yimin

#AI#LLM#Docker#Code Analysis#Testing

The ultimate goal of code analysis is not to "understand the code" but to "verify that the code works".

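🎯 The Problem: The Limits of Static Analysis

Traditional code analysis tools (such as ESLint and SonarQube) can tell you:

Whether the code has syntax errors
Whether it follows coding conventions
Whether it has potential security issues

But they cannot answer one key question: does this code actually run?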

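For example, given a zip archive of a NestJS project, you want to know:

Where is the message-sending feature implemented?
Does that feature actually work?

Static analysis can answer the first question, but the second requires actually running the code.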

🏗️ Architecture: Three Layers of Verification

Code Analyzer uses a three-layer architecture to verify code end to end:

┌────────────────────────────────────────────────────────────┐
│  Layer 1: Code understanding (LLM)                         │
│  └── Analyze code structure and locate features            │
├────────────────────────────────────────────────────────────┤
│  Layer 2: Runtime (Docker sandbox)                         │
│  └── Start the project in an isolated environment          │
├────────────────────────────────────────────────────────────┤
│  Layer 3: Verification (LLM + runtime)                     │
│  └── Generate tests, run them, verify the results          │
└────────────────────────────────────────────────────────────┘

Core Flow

User uploads code + feature description
              │
              ▼
┌───────────────────────────┐
│ 1. Analyze code structure │ ──▶ LLM analyzes source files, outputs feature_analysis
└───────────────────────────┘
              │
              ▼
┌───────────────────────────┐
│ 2. Analyze startup method │ ──▶ LLM analyzes package.json/README to decide how to start
└───────────────────────────┘
              │
              ▼
┌───────────────────────────┐
│ 3. Start Docker sandbox   │ ──▶ docker run node:20 "npm install && npm start"
└───────────────────────────┘
              │
              ▼
┌───────────────────────────┐
│ 4. Generate test code     │ ──▶ LLM generates tests from feature_analysis
└───────────────────────────┘
              │
              ▼
┌───────────────────────────┐
│ 5. Run tests              │ ──▶ docker exec runs the tests inside the sandbox
└───────────────────────────┘
              │
              ▼
       Return results
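The five stages above can be sketched as a small orchestrator. This is only a minimal sketch under stated assumptions, not the project's actual code: `run_pipeline`, `PipelineResult`, and the stage names are hypothetical, and the lambdas are stubs standing in for the real LLM and Docker calls.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional

# Hypothetical stage signature: each stage reads the shared context
# and returns the new keys it contributes.
Stage = Callable[[dict], dict]


@dataclass
class PipelineResult:
    completed: list = field(default_factory=list)
    context: dict = field(default_factory=dict)
    error: Optional[str] = None


def run_pipeline(stages: list, context: dict) -> PipelineResult:
    """Run (name, stage) pairs in order; stop at the first failure."""
    result = PipelineResult(context=dict(context))
    for name, stage in stages:
        try:
            result.context.update(stage(result.context))
        except Exception as exc:  # record which stage failed, then stop
            result.error = f"{name}: {exc}"
            break
        result.completed.append(name)
    return result


# Stubs standing in for the real LLM and Docker calls (assumptions).
stages = [
    ("analyze_code", lambda ctx: {"feature_analysis": [{"feature_description": "stub"}]}),
    ("analyze_startup", lambda ctx: {"start_command": "npm install && npm start"}),
    ("start_sandbox", lambda ctx: {"container_name": "sandbox-stub"}),
    ("generate_tests", lambda ctx: {"test_code": f"// {len(ctx['feature_analysis'])} feature(s)"}),
    ("run_tests", lambda ctx: {"passed": True}),
]
result = run_pipeline(stages, {"problem_description": "Analyze the message sending flow"})
```

Passing everything through one context dict keeps the dependency visible: stage 4 can only run because stage 1 put `feature_analysis` there.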

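🔑 Key Design: feature_analysis

feature_analysis is the bridge between "code understanding" and "test generation".

What is feature_analysis?

A user request such as "analyze the message sending flow" is vague natural language. The LLM needs to turn it into a structured description of the feature: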

{
  "feature_analysis": [
    {
      "feature_description": "Message creation - GraphQL mutation entry point",
      "implementation_location": [
        {
          "file": "src/modules/message/message.resolver.ts",
          "function": "createMessage",
          "lines": "10-15"
        }
      ]
    },
    {
      "feature_description": "Message creation - business logic and data validation",
      "implementation_location": [
        {
          "file": "src/modules/message/message.service.ts",
          "function": "create",
          "lines": "24-38"
        }
      ]
    }
  ]
}
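Since the rest of the pipeline depends on this structure, it is worth validating the LLM's output before using it. The following is a minimal sketch that parses the JSON shape shown above; the `Location` and `Feature` classes and the `parse_feature_analysis` function are illustrative names, not part of the project.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    file: str
    function: str
    start_line: int
    end_line: int


@dataclass(frozen=True)
class Feature:
    description: str
    locations: tuple


def parse_feature_analysis(raw: str) -> list:
    """Parse the feature_analysis JSON; raises KeyError/ValueError on malformed input."""
    features = []
    for item in json.loads(raw)["feature_analysis"]:
        locations = []
        for loc in item["implementation_location"]:
            # "lines" is a "start-end" string such as "10-15"
            start, end = (int(n) for n in loc["lines"].split("-"))
            if start > end:
                raise ValueError(f"invalid line range: {loc['lines']}")
            locations.append(Location(loc["file"], loc["function"], start, end))
        features.append(Feature(item["feature_description"], tuple(locations)))
    return features


raw = json.dumps({
    "feature_analysis": [{
        "feature_description": "Message creation - GraphQL mutation entry point",
        "implementation_location": [{
            "file": "src/modules/message/message.resolver.ts",
            "function": "createMessage",
            "lines": "10-15",
        }],
    }]
})
features = parse_feature_analysis(raw)
```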

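Why does it matter?

Without feature_analysis:

User: "Test message sending"
LLM: "Hmm... message sending... maybe POST /messages? Let me guess..."

With feature_analysis:

LLM: "According to the analysis, message creation is implemented by the GraphQL mutation createMessage,
      and the channelId, title, and content fields need testing. I will generate targeted tests..."

feature_analysis turns test generation from guesswork into precise targeting.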
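One way to picture the "with feature_analysis" case is a prompt builder that lists the located entry points next to the user's request. This is a hedged sketch: `build_test_prompt` and the prompt wording are assumptions, not the project's actual prompt.

```python
def build_test_prompt(user_request: str, features: list) -> str:
    """Render feature_analysis entries into a grounded test-generation prompt."""
    lines = [f"User request: {user_request}", "Located implementation:"]
    for feature in features:
        lines.append(f"- {feature['feature_description']}")
        for loc in feature["implementation_location"]:
            lines.append(f"  * {loc['file']}::{loc['function']} (lines {loc['lines']})")
    lines.append("Write tests that exercise only the entry points listed above.")
    return "\n".join(lines)


features = [{
    "feature_description": "Message creation - GraphQL mutation entry point",
    "implementation_location": [{
        "file": "src/modules/message/message.resolver.ts",
        "function": "createMessage",
        "lines": "10-15",
    }],
}]
prompt = build_test_prompt("Test message sending", features)
```

The point is that the model no longer has to guess between REST and GraphQL: the file, function, and line range are stated in the prompt.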

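🐳 Docker-in-Docker: The Biggest Technical Challenge

Code Analyzer itself runs in a Docker container, but it needs to start another Docker container (the sandbox) to run the project under analysis. This post calls that Docker-in-Docker (DinD). Strictly speaking, because the daemon runs on the host, the sandbox is a sibling container rather than a nested one, a setup often called Docker-outside-of-Docker.

First, a quick word on the Docker daemon: Docker is split into a client (the docker command) and a server (the Docker daemon). The daemon is a background service that does the actual work of creating and managing containers. When you type docker run nginx, the client only sends a request to the daemon; the daemon then pulls the image and creates the container.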

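Challenge 1: File Path Translation

Scenario: after the user uploads code, Code Analyzer needs to mount that code into the sandbox container to run it.

With Docker volume mounts, the same files can be visible in several places: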

Host
└── ./tmp/session-abc/project/           ◀── Path A (actual storage location)
    ├── package.json  ──────────────────────┐
    └── src/                                │
                                            │ the same files
Code Analyzer container                     │ (mounted via -v ./tmp:/tmp/code-analyzer)
└── /tmp/code-analyzer/session-abc/project/ │◀── Path B
    ├── package.json  ◀─────────────────────┤
    └── src/                                │
                                            │
Sandbox container (node:20)                 │ (needs to be mounted)
└── /app/                                   │◀── Path C
    ├── package.json  ◀─────────────────────┘
    └── src/

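Problem: when the Code Analyzer container creates the sandbox, the command it runs is:

docker run -v /tmp/code-analyzer/session-abc/project:/app node:20

But the Docker daemon runs on the host, so it looks for /tmp/code-analyzer/session-abc/project on the host, and that path does not exist there. It exists only inside the Code Analyzer container.

Solution: path translation (B → A)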

# settings.py
def get_host_path(self, container_path: str) -> str:
    """Convert container path to host path for Docker volume mounts"""
    # No host directory configured: not running inside Docker, nothing to translate
    if not self.host_upload_dir:
        return container_path
    # Swap the container-side prefix (path B) for the host-side prefix (path A)
    if container_path.startswith(self.upload_dir):
        return container_path.replace(self.upload_dir, self.host_upload_dir, 1)
    return container_path

# Example conversion:
# /tmp/code-analyzer/session-abc/project
#        ↓
# /Users/you/code-analyzer/tmp/session-abc/project

# docker-compose.yml
volumes:
  - ./tmp:/tmp/code-analyzer    # map the host directory into the container
environment:
  - HOST_UPLOAD_DIR=${PWD}/tmp  # tell the container what the real host path is
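A prefix check with `startswith` would also match a sibling directory such as `/tmp/code-analyzer-old`. As a minimal sketch of a stricter variant, the standalone function below compares whole path components with `pathlib`; `to_host_path` and its parameters are illustrative names, and the paths are the ones used in the example above.

```python
from pathlib import PurePosixPath


def to_host_path(container_path: str, upload_dir: str, host_upload_dir: str) -> str:
    """Map a path under upload_dir (path B) to its host location (path A)."""
    if not host_upload_dir:
        return container_path  # not running inside Docker: nothing to translate
    try:
        # relative_to compares whole components, so /tmp/code-analyzer-old
        # is NOT treated as being under /tmp/code-analyzer
        relative = PurePosixPath(container_path).relative_to(upload_dir)
    except ValueError:
        return container_path  # outside the mounted directory: leave untouched
    return str(PurePosixPath(host_upload_dir) / relative)


host_path = to_host_path(
    "/tmp/code-analyzer/session-abc/project",
    "/tmp/code-analyzer",
    "/Users/you/code-analyzer/tmp",
)
```

The resulting `host_path` is what would go on the left-hand side of the `-v` flag when the sandbox is created.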

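Challenge 2: Network Connectivity

Scenario: the sandbox container starts the NestJS service on port 3000, and Code Analyzer needs to run a health check against it.

Problem: each container has its own independent localhost, so one container cannot reach another through it.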

┌─────────────────────────────────────────────────────────┐
│                          Host                           │
│                   0.0.0.0:3000 ◀── port mapping         │
│                        ▲                                │
│  ┌─────────────────┐   │   ┌─────────────────┐          │
│  │ Code Analyzer   │   │   │ Sandbox         │          │
│  │                 │   │   │                 │          │
│  │ localhost:3000  │   │   │ localhost:3000  │          │
│  │ (nothing here)  │   │   │ (NestJS server) │──────────┘
│  └────────┬────────┘   │   └─────────────────┘   -p 3000:3000
│           │            │
│           │ curl localhost:3000 ──▶ ❌ reaches itself (nothing there)
│           │
│           │ curl host.docker.internal:3000 ──▶ ✅ reaches the host
│           └──────────────────────────────────────▶ forwarded to the sandbox
└─────────────────────────────────────────────────────────┘

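host.docker.internal is a special hostname provided by Docker; from inside a container it resolves to the host. It works out of the box on Docker Desktop, while on Linux it usually has to be mapped explicitly, for example with --add-host=host.docker.internal:host-gateway.

Solution: health check URL rewriting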

# project_runner.py
async def _wait_for_health(self, project: RunningProject) -> None:
    health_url = config.health_check_url  # http://localhost:3000/graphql

    # When running inside Docker, rewrite localhost to host.docker.internal
    if health_url and settings.host_upload_dir:
        health_url = health_url.replace("localhost", "host.docker.internal")
        # becomes: http://host.docker.internal:3000/graphql
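A plain string replace rewrites every occurrence of "localhost", including one that happens to appear in the path. As a hedged alternative, the sketch below touches only the hostname using the standard library's `urllib.parse`; `rewrite_health_url` and its `in_docker` flag are illustrative names, not the project's API.

```python
from urllib.parse import urlsplit, urlunsplit

LOOPBACK_HOSTS = {"localhost", "127.0.0.1"}


def rewrite_health_url(url: str, in_docker: bool,
                       host_alias: str = "host.docker.internal") -> str:
    """Swap a loopback hostname for the host alias; keep scheme, port, path, query."""
    parts = urlsplit(url)
    if not in_docker or parts.hostname not in LOOPBACK_HOSTS:
        return url
    # Rebuild only the network location, preserving an explicit port if present
    netloc = host_alias if parts.port is None else f"{host_alias}:{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))


rewritten = rewrite_health_url("http://localhost:3000/graphql", in_docker=True)
```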

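Challenge 3: Where Tests Execute

The first implementation tried to run node test.mjs inside the Code Analyzer container, but Node.js is not installed there:

FileNotFoundError: [Errno 2] No such file or directory: 'node'

Solution: run the tests inside the sandbox container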

# test_runner.py
async def _execute_tests(self, test_code: str, container_name: str) -> dict:
    # 1. Copy the test code into the sandbox container
    copy_cmd = f"docker cp {local_test_file} {container_name}:/tmp/test.mjs"

    # 2. Execute it inside the sandbox container
    exec_cmd = f"docker exec {container_name} node /tmp/test.mjs"

This way the test code runs inside the sandbox and can reach the service directly at localhost:3000.
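The copy-then-exec step can be sketched with argument lists instead of formatted shell strings, so that a file or container name containing spaces or shell metacharacters is passed through literally. This is a minimal sketch, not the project's implementation: `build_test_commands` and `run_in_sandbox` are hypothetical helpers, and only `docker cp` and `docker exec` come from the text above.

```python
import subprocess


def build_test_commands(local_test_file: str, container_name: str,
                        remote_path: str = "/tmp/test.mjs") -> list:
    """Return the docker cp / docker exec invocations as argv lists (no shell)."""
    return [
        ["docker", "cp", local_test_file, f"{container_name}:{remote_path}"],
        ["docker", "exec", container_name, "node", remote_path],
    ]


def run_in_sandbox(local_test_file: str, container_name: str,
                   timeout: float = 120.0) -> dict:
    """Run both commands in order; stop at the first non-zero exit code."""
    completed = None
    for argv in build_test_commands(local_test_file, container_name):
        completed = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
        if completed.returncode != 0:
            break
    return {
        "passed": completed.returncode == 0,
        "stdout": completed.stdout,
        "stderr": completed.stderr,
    }


commands = build_test_commands("/tmp/code-analyzer/session-abc/test.mjs", "sandbox-abc")
```

`run_in_sandbox` needs a running Docker daemon and an existing container, so only the command construction is exercised here.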

📡 Implementation Detail: Startup Method Analysis

Startup methods vary widely between projects:

Project type	Start command	Runtime
Node.js + npm	`npm install && npm start`	node:20
Python + pip	`pip install -r requirements.txt && python app.py`	python:3.11
Go	`go build -o app && ./app`	golang:1.21
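For the common cases in the table, a deterministic lookup keyed on marker files can serve as a fallback or a sanity check on the LLM's answer. The sketch below is an assumption-laden illustration: `StartupPreset`, `PRESETS`, and `guess_preset` are hypothetical names, and treating `requirements.txt` and `go.mod` as markers is a convention I am assuming rather than something the project states.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StartupPreset:
    runtime: str          # base image for the sandbox
    install_command: str
    start_command: str


# Marker file -> preset. The Go preset passes -o so the binary name is predictable.
PRESETS = {
    "package.json": StartupPreset("node:20", "npm install", "npm start"),
    "requirements.txt": StartupPreset("python:3.11", "pip install -r requirements.txt", "python app.py"),
    "go.mod": StartupPreset("golang:1.21", "go build -o app", "./app"),
}


def guess_preset(filenames) -> Optional[StartupPreset]:
    """Return the first preset whose marker file is present, else None."""
    for marker, preset in PRESETS.items():
        if marker in filenames:
            return preset
    return None


preset = guess_preset({"package.json", "README.md", "src"})
```

Anything the lookup cannot classify (`None`) is exactly the case where an LLM reading the README earns its keep.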

Let the LLM analyze the project structure to determine how to start it:

# startup_analyzer.py
async def analyze_startup_method(project_dir: str) -> StartupConfig:
    # Read the project files
    package_json = read_file(f"{project_dir}/package.json")
    readme = read_file(f"{project_dir}/README.md")

    # Ask the LLM to analyze them
    prompt = f"""
    Analyze how this project should be started:

    package.json: {package_json}
    README: {readme}

    Return:
    - start_method: npm/pip/go/docker
    - runtime: base image
    - install_command: command that installs dependencies
    - start_command: command that starts the service
    - health_check_url: health check URL
    - service_port: service port
    """

    return await llm.analyze(prompt)
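Because the answer comes from an LLM, it helps to validate it before handing it to Docker. The sketch below assumes the model replies with a JSON object whose keys match the fields listed in the prompt; the `StartupConfig` shape shown here is my guess at those fields, not the project's actual class, and `parse_startup_config` is a hypothetical helper.

```python
import json
from dataclasses import dataclass
from typing import Optional

ALLOWED_METHODS = {"npm", "pip", "go", "docker"}
REQUIRED_KEYS = ("start_method", "runtime", "install_command", "start_command", "service_port")


@dataclass
class StartupConfig:
    start_method: str
    runtime: str
    install_command: str
    start_command: str
    service_port: int
    health_check_url: Optional[str] = None


def parse_startup_config(reply: str) -> StartupConfig:
    """Validate an LLM reply (assumed to be JSON) and build a StartupConfig."""
    data = json.loads(reply)
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if data["start_method"] not in ALLOWED_METHODS:
        raise ValueError(f"unknown start_method: {data['start_method']!r}")
    port = int(data["service_port"])
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return StartupConfig(
        start_method=data["start_method"],
        runtime=data["runtime"],
        install_command=data["install_command"],
        start_command=data["start_command"],
        service_port=port,
        health_check_url=data.get("health_check_url"),
    )


reply = json.dumps({
    "start_method": "npm", "runtime": "node:20",
    "install_command": "npm install", "start_command": "npm start",
    "health_check_url": "http://localhost:3000/graphql", "service_port": 3000,
})
config = parse_startup_config(reply)
```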

🧪 Test Code Generation

With feature_analysis in hand, the LLM can generate precisely targeted test code:

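// Example of LLM-generated test code (ES module; needs Node 18+ for the built-in fetch)
import assert from 'node:assert/strict';

const BASE_URL = 'http://localhost:3000/graphql';

// Minimal GraphQL helper (assumed; not shown in the original snippet):
// POST the query and return the `data` payload, throwing on GraphQL errors
async function graphql(query) {
  const res = await fetch(BASE_URL, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ query }),
  });
  const { data, errors } = await res.json();
  if (errors) throw new Error(JSON.stringify(errors));
  return data;
}

async function runTests() {
  // Test 1: Create Channel (precondition)
  const { createChannel: channel } = await graphql(`
    mutation {
      createChannel(input: { name: "Test Channel" }) { id }
    }
  `);

  // Test 2: Create Message (core feature)
  // JSON.stringify quotes string IDs and leaves numeric IDs bare
  const { createMessage: message } = await graphql(`
    mutation {
      createMessage(input: {
        channelId: ${JSON.stringify(channel.id)},
        title: "Test",
        content: "Hello"
      }) { id title content }
    }
  `);

  // Test 3: Query Message (verify persistence)
  const { message: queried } = await graphql(`
    query { message(id: ${JSON.stringify(message.id)}) { id title } }
  `);

  assert.equal(queried.title, 'Test', 'Message should be persisted');
}

await runTests();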

📊 Full Example

Input

curl -X POST http://localhost:3006/analyze/stream \
  -F "problem_description=Analyze the message sending flow" \
  -F "code_zip=@nestjs-messenger.zip" \
  --no-buffer

Output (SSE stream)

data: {"stage": "extracting", "message": "Extracting code archive..."}
data: {"stage": "analyzing_code", "message": "Analyzing code structure with AI..."}
data: {"stage": "analyzing_startup", "message": "Using npm with node:20..."}
data: {"stage": "starting_project", "message": "Starting project in Docker sandbox..."}
data: {"stage": "waiting_health", "message": "Service is running on port 3000"}
data: {"stage": "generating_tests", "message": "Generated 7987 bytes of test code"}
data: {"stage": "running_tests", "message": "Tests passed"}
data: {"stage": "cleanup", "message": "Cleanup complete"}
data: {"stage": "complete", "data": {...}}
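A client consuming this stream only needs to pick out the `data:` lines and decode each one. The following is a minimal sketch that assumes one complete JSON object per `data:` line, as in the output above; `iter_sse_events` is a hypothetical helper, and the sample lines are copied from that output (the final `complete` event is left out because its payload is elided).

```python
import json
from typing import Iterable, Iterator


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the JSON payload of each `data:` line; skip blanks and other SSE fields."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue
        yield json.loads(line[len("data:"):].strip())


sample = [
    'data: {"stage": "extracting", "message": "Extracting code archive..."}',
    "",
    'data: {"stage": "running_tests", "message": "Tests passed"}',
    "",
    'data: {"stage": "cleanup", "message": "Cleanup complete"}',
]
stages = [event["stage"] for event in iter_sse_events(sample)]
```

In practice the `lines` iterable would come from a streaming HTTP response rather than a list.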

🔧 Deployment

git clone https://github.com/Wangggym/code-analyzer.git
cd code-analyzer

cp .env.example .env
# Edit .env and fill in ANTHROPIC_API_KEY

docker compose up -d --build

# Test
curl http://localhost:3006/health

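💡 Summary

Code Analyzer's core innovations:

LLM-driven code understanding: no reliance on fixed rules, so it can adapt to arbitrary project structures
Real dynamic verification: instead of guessing whether the code runs, it actually runs the code to verify it
Docker isolation: the project under test runs in a sandbox, safely and reproducibly
feature_analysis as a bridge: a structured feature description makes test generation precise and effective

Static analysis tells you the code "looks" fine; dynamic verification tells you the code "actually" works.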

📚 Related Links

GitHub: Wangggym/code-analyzer
Docker-in-Docker official documentation: Using Docker-in-Docker