Jeremy Longshore

The Challenge: Integration Works, But Not Well Enough

I integrated my AI agent platform (Bob’s Brain) with Slack for a client project. The webhook verified successfully, messages flowed through, and responses came back, but every message triggered six duplicate responses.

This wasn’t acceptable for production. Users would receive the same answer six times, creating confusion and making the system appear broken.

The Investigation Process

Step 1: Establish a Stable Foundation

Initial problem: The public tunnel service (localhost.run) kept changing URLs every few hours, requiring manual Slack configuration updates each time.

Decision: Migrate to Cloudflare Tunnel for stability.

Implementation:

# Deployed cloudflared daemon
nohup cloudflared tunnel --url http://localhost:8080 > /tmp/cloudflared.log 2>&1 &

# Stable URL acquired
https://editor-steering-width-innovation.trycloudflare.com

Result: Tunnel remained stable for the entire debugging session and beyond.

Step 2: Eliminate Noise (LlamaIndex Migration)

Observation: The Knowledge Orchestrator was throwing deprecation warnings that cluttered the logs during debugging.

Action: Migrated the knowledge integration layer from LlamaIndex’s deprecated ServiceContext API to its newer Settings API.

Impact: Clean logs made it easier to identify the actual Slack integration issue.

Step 3: Analyze the Duplicate Response Pattern

Data collected:

Hypothesis: Slack was retrying webhook events that weren’t being acknowledged fast enough.
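One way to test this hypothesis from the logs is to look at the retry headers Slack attaches to redelivered events (`X-Slack-Retry-Num` and `X-Slack-Retry-Reason`). The helper below is a minimal sketch, not code from Bob’s Brain; the function name `describe_slack_retry` is hypothetical, and it accepts any header mapping, such as Flask’s `request.headers`.

```python
from typing import Mapping, Optional


def describe_slack_retry(headers: Mapping[str, str]) -> Optional[str]:
    """Return a log line if the request is a Slack retry, else None.

    Hypothetical helper: Slack sets X-Slack-Retry-Num on redelivered
    events and X-Slack-Retry-Reason to explain why it retried.
    """
    retry_num = headers.get("X-Slack-Retry-Num")
    if retry_num is None:
        return None  # No retry header: this is the first delivery attempt
    reason = headers.get("X-Slack-Retry-Reason", "unknown")
    return f"Slack retry #{retry_num} (reason: {reason})"
```

Logging this line at the top of the webhook handler would show whether the duplicates arrive as retries and, if so, whether the stated reason is a timeout.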

Step 4: Measure Response Times

LLM processing times observed:

Slack’s timeout: 3 seconds

Root cause confirmed: Our webhook was processing the entire LLM query synchronously before returning HTTP 200, exceeding Slack’s timeout window and triggering automatic retries.
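The comparison behind this conclusion can be sketched as a small timing wrapper. This is an illustrative sketch only: `timed_call` and `would_trigger_retry` are hypothetical names, and the 3-second constant is the Slack timeout quoted above.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")

SLACK_ACK_TIMEOUT_S = 3.0  # Slack's acknowledgment window, as stated above


def timed_call(fn: Callable[[], T]) -> Tuple[T, float]:
    """Run fn and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


def would_trigger_retry(elapsed_s: float,
                        timeout_s: float = SLACK_ACK_TIMEOUT_S) -> bool:
    """True if a synchronous handler this slow would miss the ack window."""
    return elapsed_s >= timeout_s
```

Wrapping the LLM call in `timed_call` and passing the elapsed time to `would_trigger_retry` gives a direct yes/no answer for each request.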

The Solution Architecture

Design Principles

  1. Immediate acknowledgment - Return HTTP 200 within 100ms
  2. Asynchronous processing - Handle LLM query in background thread
  3. Idempotent handling - Deduplicate retries using event IDs
  4. Graceful degradation - Cache responses for instant replies to repeated questions
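The fourth principle is not shown in code in this post. A minimal sketch of such a response cache might look like the following, assuming questions are matched by normalized text; the class name `ResponseCache`, the 300-second lifetime, and the normalization rule are all assumptions made for illustration.

```python
import threading
import time
from typing import Dict, Optional, Tuple


class ResponseCache:
    """Hypothetical cache of answers keyed by normalized question text."""

    def __init__(self, ttl_seconds: float = 300.0) -> None:
        self._ttl = ttl_seconds
        self._entries: Dict[str, Tuple[float, str]] = {}
        self._lock = threading.Lock()  # Background threads share the cache

    @staticmethod
    def _key(question: str) -> str:
        # Lowercase and collapse whitespace so trivial variants match
        return " ".join(question.lower().split())

    def put(self, question: str, answer: str,
            now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        with self._lock:
            self._entries[self._key(question)] = (now, answer)

    def get(self, question: str, now: Optional[float] = None) -> Optional[str]:
        now = time.monotonic() if now is None else now
        key = self._key(question)
        with self._lock:
            entry = self._entries.get(key)
            if entry is None:
                return None
            stored_at, answer = entry
            if now - stored_at > self._ttl:
                del self._entries[key]  # Expired: treat as a miss
                return None
            return answer
```

The optional `now` argument exists only to make expiry easy to exercise without sleeping.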

Implementation

Event deduplication layer:

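_slack_event_cache = {}  # In-memory cache of processed event IDs

# A Slack retry carries the same event_id as the original delivery
if event_id and event_id in _slack_event_cache:
    return jsonify({"ok": True})  # Already processing/processed

# Only record real IDs so a missing event_id never becomes a cache key
if event_id:
    _slack_event_cache[event_id] = True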

Background processing:

# Spawn daemon thread for LLM processing
thread = threading.Thread(
    target=_process_slack_message,
    args=(text, channel, user, event_id),
    daemon=True
)
thread.start()

# Return immediately
return jsonify({"ok": True})
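The worker function itself is not shown here. As a rough sketch of its shape, the version below takes the LLM call and the Slack reply as injected callables so it can be run in isolation; `query_llm` and `post_to_slack` are hypothetical stand-ins, not the real Bob’s Brain functions. The main point is the `try`/`except`: an unhandled exception in a background thread ends that thread without the user ever getting a reply, so the failure should at least be logged.

```python
import logging
from typing import Callable

logger = logging.getLogger("slack_worker")


def process_slack_message(
    text: str,
    channel: str,
    user: str,
    event_id: str,
    query_llm: Callable[[str], str],            # hypothetical LLM call
    post_to_slack: Callable[[str, str], None],  # hypothetical reply sender
) -> bool:
    """Answer one Slack message; return True on success, False on failure."""
    try:
        answer = query_llm(text)
        post_to_slack(channel, answer)
        return True
    except Exception:
        # Log with traceback so a silent thread death is still visible
        logger.exception(
            "Failed to handle Slack event %s from user %s", event_id, user
        )
        return False
```

To use it as a `threading.Thread` target, the two callables could be bound in advance with `functools.partial`.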

Cleanup mechanism:

# Remove from cache after 60 seconds (prevents memory leak)
threading.Timer(60, lambda: _slack_event_cache.pop(event_id, None)).start()
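The check, the insert, and the cleanup can also be combined into one lock-protected structure, which closes the small window in which two near-simultaneous deliveries of the same event could both pass the check. The following is a sketch of that variant, not the code running in production; the class name `EventDeduplicator` is hypothetical, and expiry is handled lazily on each call rather than with a timer per event.

```python
import threading
import time
from typing import Dict, Optional


class EventDeduplicator:
    """Hypothetical thread-safe record of recently seen event IDs."""

    def __init__(self, ttl_seconds: float = 60.0) -> None:
        self._ttl = ttl_seconds
        self._seen: Dict[str, float] = {}
        self._lock = threading.Lock()

    def first_time(self, event_id: str, now: Optional[float] = None) -> bool:
        """Return True exactly once per event_id within the TTL window."""
        now = time.monotonic() if now is None else now
        with self._lock:
            # Lazily drop entries older than the TTL (bounds memory use)
            expired = [eid for eid, ts in self._seen.items()
                       if now - ts > self._ttl]
            for eid in expired:
                del self._seen[eid]
            if event_id in self._seen:
                return False  # Duplicate delivery: caller should just ack
            self._seen[event_id] = now
            return True
```

In the webhook handler, a `False` result would map to returning the acknowledgment immediately without starting a new thread.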

Why This Approach

Alternative considered: Queue-based processing (Redis, Celery)

Why I chose threading:

When I’d use queues: High-volume production (>100 msg/sec), or when messages must be delivered even if the server restarts mid-processing.
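Between one thread per message and an external queue there is a middle option in the standard library: a fixed-size thread pool, which caps how many LLM queries run at once while staying in-process. The sketch below is an illustration of that option, not something this project used; the pool size of 4 and the name `submit_message` are assumptions.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable

# Assumed pool size; extra messages wait in the executor's internal queue
_executor = ThreadPoolExecutor(max_workers=4, thread_name_prefix="slack-llm")


def submit_message(
    handler: Callable[[str, str], None], text: str, channel: str
) -> Future:
    """Schedule handler(text, channel) on the pool and return its Future."""
    return _executor.submit(handler, text, channel)
```

Like plain threads, this keeps pending work only in memory, so it does not provide the restart durability that Redis or Celery would.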

Results and Validation

Performance metrics:

User experience:

User: "Hey Bob, what is DiagPro?"
Bob: [Single comprehensive response]

Production readiness achieved: System now handles concurrent users without duplicate responses.

Additional Work: Knowledge Integration

During this integration, I also:

  1. Created a comprehensive customer avatar (19,000 words) for the DiagPro automotive diagnostic platform
  2. Trained the AI agent on domain-specific knowledge using the /learn endpoint
  3. Verified knowledge retrieval through the multi-source query system (653MB knowledge base + analytics database + research index)

This demonstrates the full integration capability: not just connecting systems, but making them intelligently context-aware.

Technical Skills Demonstrated

Lessons Applied to Future Projects

  1. Always measure before optimizing - I confirmed Slack’s timeout was the bottleneck before changing architecture
  2. Simple solutions first - Threading solved the problem without adding Redis/Celery complexity
  3. Design for retries - External services WILL retry; handle it gracefully
  4. Stable foundations matter - Switching to Cloudflare Tunnel eliminated one entire class of debugging complexity

Jeremy Longshore | Email: jeremy@intentsolutions.io | GitHub | LinkedIn

Solving complex integration challenges with systematic debugging and production-ready solutions.

#Production-Debugging #Api-Integration #Problem-Solving #Slack #System-Architecture