Operations Runbook: Accepted Risk Monitoring & Response (Community, Pro)
This runbook covers two accepted risks that require ongoing operational awareness: DataGuard connection termination and AI prompt injection residual risk. Both are intentional design decisions with known trade-offs.
DataGuard Connection Termination
What is it
DataGuard monitors per-query response size and rolling-window transfer volume. When a limit is exceeded, the connection is permanently blocked for the remainder of its lifetime. The client receives:
    FATAL: Querycop: data guard: response size NNN bytes exceeds limit NNN bytes (reconnect to continue)
This is not a bug. It is a security-first design to prevent data exfiltration via repeated large queries within a single connection.
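The behaviour described above can be sketched as a small per-connection tracker. This is an illustration only, not Querycop's implementation: the class name `ConnectionGuard`, its method names, and the deque-based rolling window are assumptions. The defaults mirror the documented 100 MB per-response and 500 MB per-60-second limits.

```python
from collections import deque

MB = 1024 * 1024


class ConnectionGuard:
    """Hypothetical per-connection tracker; not Querycop's actual code."""

    def __init__(self, max_response_mb=100, max_window_mb=500, window_seconds=60):
        self.max_response = max_response_mb * MB
        self.max_window = max_window_mb * MB
        self.window_seconds = window_seconds
        self.transfers = deque()  # (timestamp, bytes) pairs still inside the window
        self.window_total = 0
        self.blocked = False

    def record(self, now, response_bytes):
        """Return True if the response may be sent, False once the connection is blocked."""
        if self.blocked:
            return False  # a tripped connection never recovers
        # Drop transfers that have aged out of the rolling window.
        while self.transfers and now - self.transfers[0][0] >= self.window_seconds:
            _, old_bytes = self.transfers.popleft()
            self.window_total -= old_bytes
        over_single = response_bytes > self.max_response
        over_window = self.window_total + response_bytes > self.max_window
        if over_single or over_window:
            self.blocked = True
            return False
        self.transfers.append((now, response_bytes))
        self.window_total += response_bytes
        return True
```

In this sketch a new connection would start with a fresh tracker, which matches the "reconnect to continue" hint in the error message.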
Monitoring
Metrics to watch:
- Audit log events with event_type = "data_guard_violation"
- WebSocket events of type query.data_guard_violation
- Sudden increase in connection count (reconnect storms)
Alerting thresholds (suggested):
- More than 5 violations per hour from the same db_user -> investigate
- More than 20 violations per hour across all users -> possible misconfiguration
- Reconnect rate > 10x normal -> possible reconnect loop
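A minimal sketch of how these suggested thresholds might be evaluated over one hour of audit events. The event shape (a dict with a `db_user` key) and both function names are assumptions, and each threshold is treated as "more than", which is also an assumption.

```python
from collections import Counter


def evaluate_violations(events, per_user_limit=5, global_limit=20):
    """Return alert strings for one hour of data_guard_violation events.

    Each event is assumed to be a dict with a "db_user" key (hypothetical shape).
    """
    alerts = []
    per_user = Counter(event["db_user"] for event in events)
    for user, count in sorted(per_user.items()):
        if count > per_user_limit:
            alerts.append(f"investigate {user}: {count} violations in the last hour")
    if len(events) > global_limit:
        alerts.append(
            f"possible misconfiguration: {len(events)} violations across all users"
        )
    return alerts


def reconnect_loop_suspected(current_rate, baseline_rate, factor=10):
    """True when the reconnect rate is more than `factor` times the baseline."""
    return baseline_rate > 0 and current_rate > factor * baseline_rate
```

The baseline reconnect rate has to come from your own metrics; the runbook does not define what "normal" is.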
Triage procedure
- Check the violating user and query
      GET /audit?type=data_guard_violation&limit=10
- Determine if it is legitimate usage or exfiltration
  - Legitimate: analytics export, large JOIN, reporting query
  - Suspicious: SELECT * without WHERE, bulk dump pattern, unfamiliar user
- If legitimate:
  - Increase GATEKEEPER_MAX_RESPONSE_MB for the specific workload
  - Consider per-role DataGuard overrides (future feature)
  - Advise the application to use pagination (LIMIT/OFFSET)
- If suspicious:
  - Do NOT increase limits
  - Check if the user should have access to this data
  - Review RBAC policy for the user’s role
  - Consider temporary Break-Glass revocation or policy tightening
  - Escalate to security team if bulk data access is confirmed
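As a rough aid for the "legitimate or suspicious" step, the "SELECT * without WHERE" signal can be checked mechanically. The helper below is a hypothetical triage hint, not part of Querycop. It uses regular expressions rather than a SQL parser, so a string literal containing the word "where" can fool it.

```python
import re

SELECT_STAR = re.compile(r"^\s*select\s+\*\s+from\b", re.IGNORECASE)
HAS_WHERE = re.compile(r"\bwhere\b", re.IGNORECASE)


def looks_like_bulk_dump(sql):
    """Rough triage hint: a SELECT * statement with no WHERE clause."""
    if not SELECT_STAR.search(sql):
        return False
    return not HAS_WHERE.search(sql)
```

A positive result is only a prompt to look closer. Whether the user is familiar and should have access to the data still needs a human decision.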
Configuration reference
| Variable | Default | Description |
|---|---|---|
| GATEKEEPER_MAX_RESPONSE_MB | 100 | Max single query response (MB) |
| GATEKEEPER_MAX_WINDOW_MB | 500 | Max transfer per 60-second window (MB) |
Escalation
- If violation count exceeds alerting threshold: page on-call DBA
- If suspected exfiltration: escalate to security incident process
- If legitimate workload consistently hits limits: file capacity planning ticket
AI Prompt Injection Monitoring
What is it
SQL queries are sent to an LLM for risk scoring. An attacker who controls SQL content (e.g., via application-level SQL injection) may attempt to manipulate the LLM’s analysis. Querycop has multiple defense layers, but prompt injection is inherently unsolvable at the LLM level.
Current defenses
| Layer | Mechanism | What it prevents |
|---|---|---|
| Comment stripping | SanitizeSQLForAnalysis | -- Ignore instructions |
| System prompt | Anti-injection instructions | {"score":0} embedded in SQL |
| Server-side override | Destructive keyword check | DELETE/DROP with score < 10 -> force score 50 |
| Threshold enforcement | ShouldAutoApprove | Server decides, not AI text |
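The last two layers can be illustrated as follows. This is a sketch under assumptions, not Querycop's code: the function names, appending the marker to the reason text, and the strict "score below threshold" approval rule are all hypothetical. The rule was chosen because it makes a threshold of 0 approve nothing, which matches how the runbook uses auto_approve_threshold.

```python
import re

DESTRUCTIVE = re.compile(r"\b(DELETE|DROP)\b", re.IGNORECASE)
OVERRIDE_MARKER = "[score overridden by safety check]"


def apply_safety_override(sql, ai_score, ai_reason):
    """Raise implausibly low AI scores for destructive statements to 50."""
    if DESTRUCTIVE.search(sql) and ai_score < 10:
        return 50, f"{ai_reason} {OVERRIDE_MARKER}"
    return ai_score, ai_reason


def should_auto_approve(final_score, auto_approve_threshold):
    """Assumed rule: approve only when the final score is below the threshold."""
    return final_score < auto_approve_threshold
```

The point of the design is that the approval decision depends only on the numeric score after the override, never on text the model produced.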
Monitoring
Metrics to watch:
- AI score override events: search audit log for [score overridden by safety check] in risk_reason
- Suspiciously low scores for destructive queries (< 10 for DELETE/UPDATE/DDL)
- AI error rate (provider timeouts, parse failures)
- Unusual risk_reason text patterns (very long, contains JSON-like content, contains English instructions)
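The "unusual risk_reason" signal can be approximated with a few heuristics. Everything in this sketch is an assumption made for illustration: the 500-character cutoff, the regular expressions, and the flag names. Expect false positives and tune against what normal reasons look like in your audit log.

```python
import re

MAX_REASON_LENGTH = 500  # assumed cutoff, not a Querycop setting
JSON_LIKE = re.compile(r'\{\s*"[^"]+"\s*:')
INSTRUCTION_LIKE = re.compile(
    r"\b(ignore (all|any|previous)|disregard|you must|system prompt)\b",
    re.IGNORECASE,
)


def reason_flags(risk_reason):
    """Return the names of the suspicious patterns found in a risk_reason string."""
    flags = []
    if len(risk_reason) > MAX_REASON_LENGTH:
        flags.append("very_long")
    if JSON_LIKE.search(risk_reason):
        flags.append("json_like")
    if INSTRUCTION_LIKE.search(risk_reason):
        flags.append("instruction_like")
    return flags
```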
Alerting thresholds (suggested):
- Score override > 3 per hour -> investigate SQL content
- AI error rate > 10% -> check provider status
- Same query pattern receiving wildly different scores -> possible adversarial probing
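Spotting "the same query pattern with wildly different scores" requires grouping queries that differ only in literals. A minimal sketch, assuming audit entries are available as (sql, risk_score) pairs. The fingerprinting rules and the spread limit of 30 are illustrative choices, not Querycop behaviour.

```python
import re
from collections import defaultdict


def fingerprint(sql):
    """Collapse literals and whitespace so similar queries share one key."""
    text = re.sub(r"'[^']*'", "?", sql)   # string literals
    text = re.sub(r"\b\d+\b", "?", text)  # numeric literals
    return re.sub(r"\s+", " ", text).strip().lower()


def unstable_patterns(entries, max_spread=30):
    """Return fingerprints whose risk_score range exceeds max_spread.

    entries: iterable of (sql, risk_score) pairs taken from the audit log.
    """
    scores = defaultdict(list)
    for sql, score in entries:
        scores[fingerprint(sql)].append(score)
    return sorted(
        key for key, values in scores.items()
        if max(values) - min(values) > max_spread
    )
```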
Triage procedure
- Check recent AI analysis results
      GET /audit?limit=20
  Look for entries with risk_score and risk_reason.
- If score override is firing frequently:
  - The override means the AI returned a low score for a destructive query
  - This could be prompt injection or just an AI misjudgment
  - Review the actual SQL queries that triggered the override
  - If queries contain natural language text mixed with SQL: likely injection attempt
- If AI provider is returning errors:
  - Check provider status page (OpenAI, Anthropic, etc.)
  - Queries will pass through without AI scoring when AI is unavailable
  - Consider temporarily setting auto_approve_threshold: 0 to require human approval for all destructive queries
- If you suspect active adversarial probing:
  - Lower auto_approve_threshold to 0 (all destructive queries require human approval)
  - Review Slack/webhook notifications for unusual patterns
  - Check if the SQL source application has a SQL injection vulnerability
  - The attacker may be exploiting the application, not Querycop directly
Risk reason trust boundary
The risk_reason field from AI analysis is untrusted text. It appears in:
- Slack notifications (escaped via escapeSlackMrkdwn)
- Dashboard UI (escaped via escH)
- Audit log (stored as-is)
Operators should treat risk_reason as advisory context, not as a reliable
classification. The authoritative signal is the numeric risk_score after
server-side override.
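Because the audit log stores risk_reason as-is, any tool that forwards it elsewhere needs its own escaping. For Slack, the documented minimum is to convert &, < and > to HTML entities. The helper below is a Python illustration of that rule only. It is not the source of escapeSlackMrkdwn, whose exact behaviour is not shown in this runbook, and the name `escape_for_slack` is hypothetical.

```python
import html


def escape_for_slack(text):
    """Escape &, < and > so untrusted text cannot form Slack links or mentions."""
    return html.escape(text, quote=False)
```

For example, a reason containing `<!channel>` would be sent as `&lt;!channel&gt;` and shown as literal text instead of triggering a channel-wide mention.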
Fallback: disable AI and require human approval
If AI scoring becomes unreliable:
    # Remove AI provider (disables AI scoring entirely)
    unset AI_API_KEY
    # Set all destructive queries to require human approval
    # (via policy: set auto_approve_threshold to 0 for all roles)
Without AI, Querycop still blocks destructive queries and requires human approval. AI scoring is an additional signal, not the sole gate.
Escalation
- AI scoring consistently wrong: file issue with AI provider and adjust thresholds
- Active adversarial probing confirmed: escalate to security incident
- Provider outage > 1 hour: consider switching to backup provider or disabling AI
Change History
| Date | Change |
|---|---|
| 2026-04-01 | Initial runbook creation |