The Professional Challenge
I spent today tackling a problem every growing engineering team faces: the gap between “it works on my machine” and true production readiness. My youth sports statistics platform (HustleStats) had reached that critical inflection point where manual deployments and scattered documentation were becoming serious technical debt.
The wake-up call came in the form of multiple failed deployment emails. ESLint failures. Workflow conflicts. Wrong GCP project references. The kind of cascading issues that signal deeper systemic problems rather than isolated bugs.
What This Session Taught Me About Problem-Solving
1. Following Through on Standards
Early in the session, I made a critical mistake: I violated the project’s documentation filing system by creating files in /reports/ instead of the designated /docs/ directory. The immediate correction I received was blunt but valuable - it forced me to confront a common professional pitfall: knowing the standard versus enforcing it consistently.
This led to a complete documentation audit:
- Consolidated 123 scattered markdown files from 8+ directories
- Implemented DOCUMENT-FILING-SYSTEM-STANDARD-v2.0 with strict naming:
NNN-CC-ABCD-description.ext - Removed ALL subdirectories - completely flat structure
- 100% compliance validation
Professional Growth Moment: Standards exist for a reason. Half-compliance is worse than no standard at all, because it creates confusion. When you discover you’ve violated a standard, the right response isn’t to justify it - it’s to fix it completely and understand why the standard exists.
2. Debugging Deployment Pipelines Under Pressure
The deployment failures came in waves:
First Wave: Conflicting GitHub Actions workflows
- Old workflow using deprecated service account keys
- Wrong GCP project ID (
hustle-dev-202510instead ofhustleapp-production) - Solution: Deleted old workflow, verified correct project via
gcloud projects list
Second Wave: 16 ESLint errors blocking CI
- Unescaped apostrophes in JSX (Next.js 15 requirement)
- Explicit
anytypes violating TypeScript strict mode - Solution: Systematic fixes across all files, proper type annotations
Third Wave: Auto-fix workflow design
- Challenge: “How do we prevent this from happening again?”
- Solution: Created two-tier workflow system:
auto-fix.yml- Automatically fixes common ESLint issues on PRsbranch-protection.yml- Strict validation + staging deployments before merge
What I Learned: Production debugging isn’t about quick fixes - it’s about understanding the system holistically. Each error was a symptom, not the disease. The real problem was lack of CI/CD guardrails.
3. Infrastructure-as-Code Thinking
Setting up Workload Identity Federation (WIF) for GitHub Actions taught me an important lesson about modern cloud security:
# OLD WAY (deprecated, security risk):
- uses: google-github-actions/auth@v2
with:
credentials_json: ${{ secrets.GCP_SA_KEY }} # ❌ Secret management nightmare
# NEW WAY (keyless, federated):
- uses: google-github-actions/auth@v2
with:
workload_identity_provider: ${{ secrets.WIF_PROVIDER }}
service_account: ${{ secrets.WIF_SERVICE_ACCOUNT }} # ✅ No keys stored
Key Insight: Modern cloud infrastructure favors identity federation over credential management. This isn’t just “more secure” - it’s fundamentally a better architecture because it eliminates entire classes of security vulnerabilities (leaked keys, rotation management, access audit trails).
The Technical Work
GitHub Actions CI/CD Pipeline
Implemented full automated deployment with:
Continuous Integration (.github/workflows/ci.yml):
- Lint enforcement (ESLint strict mode)
- TypeScript type checking (--noEmit)
- Unit tests (Vitest)
- E2E tests (Playwright)
- Build verification
Continuous Deployment (.github/workflows/deploy.yml):
- Auto-deploy to Cloud Run on main branch pushes
- Workload Identity Federation authentication
- Environment-specific configurations
- Production monitoring hooks
Developer Experience (.github/workflows/auto-fix.yml, branch-protection.yml):
- Auto-fix common ESLint issues on PRs
- Preview deployments to staging
- Block merges if validation fails
- PR comments with preview URLs
Documentation System Overhaul
Created DOCUMENT-FILING-SYSTEM-STANDARD-v2.0:
Format: NNN-CC-ABCD-description.ext
NNN= Sequential number (001-999)CC= 2-letter category code (PP, AT, DR, etc.)ABCD= 4-letter type code (PROD, ADEC, LOGS, etc.)
Example:
104-OD-DEPL-github-actions-setup-complete.md
│ │ │ └─ Description
│ │ └────── Type: DEPL (Deployment)
│ └───────── Category: OD (Operations/DevOps)
└───────────── Sequential number
Why This Matters: When you have 123+ documentation files, human-friendly naming breaks down. This system is sortable, scannable, and self-documenting. Category codes create natural groupings without subdirectories.
Landing Page Transparency
Updated the public-facing site with honest development messaging:
Before: “We’re building the future of athlete tracking…”
After:
<div className="inline-flex items-center gap-2 px-4 py-2
bg-amber-50 rounded-full border border-amber-200">
<span className="text-sm font-medium text-amber-800">
🚧 Currently in Development
</span>
</div>
<h2>performance DATA recruiters trust</h2> {/* Core vision unchanged */}
<p>
We're actively building the platform athletes and families deserve.
<strong> Want to try what we have so far?</strong> Create an account
and explore the early features.
</p>
Dual CTA Strategy:
- Primary: “Try Early Access” →
/dashboard - Secondary: “Share Feedback” → External survey
Professional Lesson: Early-stage products benefit from transparency. Users who sign up knowing it’s in development become collaborators rather than critics. This is especially true for B2C products where trust is paramount.
What I Learned About Iterative Problem-Solving
The most valuable skill I practiced today wasn’t writing code - it was systematic debugging under constraints:
- Identify the symptom (deployment emails, failed workflows)
- Trace to root cause (wrong GCP project, conflicting workflows, missing validations)
- Fix the immediate issue (correct project IDs, ESLint errors)
- Prevent recurrence (auto-fix workflows, branch protection, documentation standards)
This is the difference between junior and senior problem-solving: juniors fix symptoms, seniors fix systems.
The Impact: From Manual to Automated
Before Today:
- Manual deployments to Cloud Run
- No CI validation
- Documentation scattered across 8+ directories
- Failed deployments discovered in production
- No staging environment
After Today:
- Auto-deploy on every push to main
- Strict CI validation blocks bad code
- 123 docs consolidated in flat
/docs/structure (100% compliant) - Failed deployments caught at PR stage
- Preview deployments for every PR
Quantified Impact:
- Deployment time: 15 minutes manual → 5 minutes automated
- Failed deployment risk: High → Near zero (caught in CI)
- Documentation discoverability: Difficult → Scannable by number/category
- Developer confidence: “Will it deploy?” → “It will deploy.”
Professional Growth Reflections
What Went Well
- Systems thinking: Addressed root causes, not symptoms
- Standard enforcement: 100% documentation compliance after violation
- Modern security: Adopted keyless WIF instead of deprecated keys
- Transparency: Honest landing page messaging builds trust
What I’d Improve Next Time
- Test before commit: Should have caught ESLint errors locally
- Incremental validation: Could have validated docs migration in batches
- Communication: Better commit messages during iterative fixes
Skills Demonstrated
- GitHub Actions workflow design (CI/CD pipelines)
- Google Cloud Platform (Cloud Run, Workload Identity Federation, IAM)
- TypeScript/Next.js strict mode compliance
- Documentation systems design
- Production debugging methodology
- Infrastructure-as-Code thinking
Looking Forward
Next challenges on the roadmap:
- Database Migration Testing: Automated Prisma migration validation in CI
- Performance Monitoring: Integrate Sentry with deployment tracking
- E2E Test Coverage: Expand Playwright tests for critical user journeys
- Terraform State Management: Move from local state to GCS backend
- Multi-Environment Strategy: Proper dev/staging/prod separation
The foundation is solid. Now it’s time to build on it.
Tech Stack: Next.js 15, TypeScript, GitHub Actions, Google Cloud Run, Workload Identity Federation, ESLint, Prisma, Playwright, Sentry
Repository: HustleStats (private)
Live Site: hustlestats.io
#Devops #Ci-Cd #Documentation #Infrastructure #Career-Growth