Overview
Executive summary: We should keep Strava as the canonical activity source in Data Hub and treat Garmin Connect ingest as a secondary/recovery path. Garmin can provide rich activity data (including splits and downloadable activity files), but unofficial ingestion through garminconnect/garth introduces auth fragility, MFA operational friction, and policy risk. The official Garmin Connect Developer Program is the more robust option, but it is business-gated and is not the current path for this project.
This spike evaluated whether and how to ingest Garmin Connect activity data given an existing Strava ingest. The output is a recommendation, risk profile, and data model guidance for dedupe and source precedence.
Deliverables
| ID | Title | Effort | Status | Owner |
|---|---|---|---|---|
| SPIKE-008 | Explore Garmin Connect activities ingest vs Strava | S | Done | Codex |
Key Decisions
- Canonical source for activity entities remains Strava.
- Garmin ingest is optional and secondary, focused on reconciliation and selective enrichment.
- If Garmin ingest is added in the short term, treat it as a non-critical connector with explicit failure tolerance.
- Do not request or transmit credentials in chat; use secure local secret handling only.
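The last two decisions can be combined into one connector wrapper. This is a minimal sketch rather than the Data Hub implementation: it reads Garmin credentials from the local environment (the variable names `GARMIN_EMAIL` and `GARMIN_PASSWORD` are hypothetical), skips the run when they are absent, and turns any exception into a status record so a Garmin failure never aborts core ingest. `run_optional_connector`, `ConnectorResult`, and the injected `pull` callable are illustrative names.

```python
import logging
import os
from dataclasses import dataclass
from typing import Callable, Mapping, Optional, Sequence

log = logging.getLogger("datahub.connectors")

# Hypothetical variable names; a local secret store or keyring would work the same way.
GARMIN_ENV_VARS = ("GARMIN_EMAIL", "GARMIN_PASSWORD")


@dataclass
class ConnectorResult:
    name: str
    status: str  # "ok", "skipped", or "failed"
    records: int = 0
    error: Optional[str] = None


def load_local_secrets(
    names: Sequence[str], env: Mapping[str, str] = os.environ
) -> Optional[dict]:
    """Return the named secrets from the local environment, or None if any is missing."""
    values = {n: env.get(n) for n in names}
    if any(not v for v in values.values()):
        return None
    return values


def run_optional_connector(
    name: str,
    pull: Callable[[dict], int],
    secret_names: Sequence[str] = GARMIN_ENV_VARS,
    env: Mapping[str, str] = os.environ,
) -> ConnectorResult:
    """Run a non-critical connector; its failure is reported, never propagated."""
    secrets = load_local_secrets(secret_names, env)
    if secrets is None:
        log.warning("%s: credentials not configured locally; skipping", name)
        return ConnectorResult(name, "skipped")
    try:
        count = pull(secrets)
    except Exception as exc:  # deliberate catch-all: the connector is optional
        # Log only the exception type so secrets echoed in messages are not leaked.
        log.error("%s failed with %s; core ingest continues", name, type(exc).__name__)
        return ConnectorResult(name, "failed", error=type(exc).__name__)
    return ConnectorResult(name, "ok", records=count)
```

The Strava pipeline would call this after its own ingest and only record the returned status, so a broken Garmin login shows up as `failed` in monitoring instead of as a failed run.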
Technical Notes
Options considered
- Option A (recommended): Strava-only canonical ingest. Lowest complexity and best reliability, backed by official OAuth, webhook, and rate-limit guidance.
- Option B: Add Garmin unofficial ingest (garminconnect) as a secondary source. Gains recovery of activities missing from Strava plus file-level detail, but raises auth/MFA/TOS and maintenance risk.
- Option C: Pursue Garmin official Activity API access. Best long-term Garmin path, but requires business program approval and comes with commercial constraints.
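Option A's rate-limit guidance is concrete enough to act on: Strava reports its limits and current usage in the `X-RateLimit-Limit` and `X-RateLimit-Usage` response headers, each holding two comma-separated integers (the 15-minute window, then the daily window). The parsing sketch below takes the headers as a plain dict; the helper names and the `headroom` pause policy are assumptions, not part of Strava's API.

```python
from dataclasses import dataclass
from typing import Dict, Mapping


@dataclass(frozen=True)
class RateWindow:
    limit: int
    usage: int

    @property
    def remaining(self) -> int:
        return max(self.limit - self.usage, 0)


def parse_strava_rate_headers(headers: Mapping[str, str]) -> Dict[str, RateWindow]:
    """Parse the "short,long" header pairs into a 15-minute and a daily window."""
    limits = [int(x) for x in headers["X-RateLimit-Limit"].split(",")]
    usage = [int(x) for x in headers["X-RateLimit-Usage"].split(",")]
    return {
        "15min": RateWindow(limits[0], usage[0]),
        "daily": RateWindow(limits[1], usage[1]),
    }


def should_pause(windows: Dict[str, RateWindow], headroom: float = 0.1) -> bool:
    """Hypothetical policy: pause when any window has under `headroom` of its budget left."""
    return any(w.remaining < w.limit * headroom for w in windows.values())
```

For example, usage `190,500` against limits `200,2000` leaves 10 requests in the 15-minute window, so the job would pause before Strava starts answering with HTTP 429.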
Data model implications for Data Hub
- Add source-aware dedupe keys: start_time_utc, elapsed_time, distance_m, sport_type, and an optional file checksum.
- Preserve per-source external IDs to map Strava activity IDs and Garmin activity IDs without collision.
- Use source precedence rules at field level: Strava for canonical public activity metadata; Garmin for raw file artifact and supplemental metrics when available.
- Support a reconciliation window so Garmin-only records can be merged into later Strava arrivals instead of creating permanent duplicates.
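A minimal matching sketch for the dedupe keys and reconciliation window above, assuming both sources have already been normalized to the same units and `sport_type` vocabulary. The tolerances (120 s on start time, 5% on elapsed time, 3% on distance) and the 7-day window are hypothetical starting values, not tuned numbers. An identical file checksum counts as a definitive match; otherwise the fuzzy keys decide. Promoting unmatched records once the window expires is likewise an assumed policy.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, str]  # (source, external_id), so Strava and Garmin IDs never collide


@dataclass
class ActivityRecord:
    source: str               # "strava" or "garmin"
    external_id: str          # per-source activity ID
    start_time_utc: datetime  # timezone-aware UTC start time
    elapsed_time: int         # seconds
    distance_m: float
    sport_type: str           # assumed normalized to one shared vocabulary
    file_checksum: Optional[str] = None


def file_checksum(data: bytes) -> str:
    """SHA-256 hex digest of a raw activity file, usable as an exact-match key."""
    return hashlib.sha256(data).hexdigest()


def _close(x: float, y: float, rel_tol: float) -> bool:
    """True if x and y differ by at most rel_tol of the larger magnitude."""
    return abs(x - y) <= rel_tol * max(abs(x), abs(y), 1.0)


def is_probable_match(
    a: ActivityRecord,
    b: ActivityRecord,
    start_tol_s: float = 120.0,
    elapsed_tol: float = 0.05,
    distance_tol: float = 0.03,
) -> bool:
    """Decide whether two records from different sources describe the same activity."""
    if a.file_checksum and a.file_checksum == b.file_checksum:
        return True  # identical raw file
    if a.sport_type != b.sport_type:
        return False
    if abs((a.start_time_utc - b.start_time_utc).total_seconds()) > start_tol_s:
        return False
    return _close(a.elapsed_time, b.elapsed_time, elapsed_tol) and _close(
        a.distance_m, b.distance_m, distance_tol
    )


def reconcile(
    pending_garmin: List[ActivityRecord],
    strava: List[ActivityRecord],
    now: datetime,
    window: timedelta = timedelta(days=7),
) -> Tuple[Dict[Key, Key], List[ActivityRecord], List[ActivityRecord]]:
    """Match Garmin-only records to Strava before any canonical record is created.

    Returns (links, still_pending, promote).
    """
    links: Dict[Key, Key] = {}
    still_pending: List[ActivityRecord] = []
    promote: List[ActivityRecord] = []
    for g in pending_garmin:
        match = next((s for s in strava if is_probable_match(g, s)), None)
        if match is not None:
            links[(g.source, g.external_id)] = (match.source, match.external_id)
        elif now - g.start_time_utc > window:
            promote.append(g)  # window expired: keep as a Garmin-only record
        else:
            still_pending.append(g)  # wait for a later Strava arrival
    return links, still_pending, promote
```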
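Field-level precedence can be expressed as a small merge function. This sketch assumes a hypothetical `STRAVA_CANONICAL_FIELDS` set standing in for "canonical public activity metadata": Strava wins those fields, Garmin wins everything else (raw file references and supplemental metrics) when it has a non-null value, and each side falls back to the other when its value is missing. External IDs are kept per source so they cannot collide.

```python
from typing import Any, Dict, Optional

# Hypothetical list; the real split belongs in the common activity schema.
STRAVA_CANONICAL_FIELDS = frozenset(
    {"name", "sport_type", "start_time_utc", "elapsed_time", "distance_m"}
)


def _pick(primary: Dict[str, Any], fallback: Dict[str, Any], key: str) -> Any:
    """Return primary[key] unless it is missing or None, else fallback[key]."""
    value = primary.get(key)
    return value if value is not None else fallback.get(key)


def merge_activity(
    strava: Optional[Dict[str, Any]],
    garmin: Optional[Dict[str, Any]],
    strava_id: Optional[str] = None,
    garmin_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Strava wins canonical metadata; Garmin wins files and supplemental metrics."""
    strava, garmin = strava or {}, garmin or {}
    merged: Dict[str, Any] = {}
    for key in sorted(set(strava) | set(garmin)):
        if key in STRAVA_CANONICAL_FIELDS:
            merged[key] = _pick(strava, garmin, key)
        else:
            merged[key] = _pick(garmin, strava, key)
    # Per-source IDs sit side by side instead of sharing one ID column.
    merged["external_ids"] = {
        src: ext_id
        for src, ext_id in (("strava", strava_id), ("garmin", garmin_id))
        if ext_id
    }
    return merged
```

A Garmin-only record (no Strava match yet) still produces a usable row, with `external_ids` holding only the Garmin ID until reconciliation links it.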
Comparison snapshot: Garmin vs Strava
| Category | Garmin | Strava |
|---|---|---|
| API posture | Official API is business-gated; unofficial libs are common for personal access | Official public API with OAuth and docs |
| Activity coverage | Summary/details/splits and activity files (FIT/TCX/GPX in official program) | Summary/details/laps/streams/zones |
| Reliability | Official path: strong; unofficial path: medium due to backend/auth drift risk | High for supported endpoints |
| Rate limits | Program throttling; unofficial limits opaque | Published limits and response headers |
| Historical access | Available in official program tooling; backfills via unofficial libs also work in practice | Available via before/after parameters and pagination |
Outcomes
Recommendation: Keep Strava as canonical, add Garmin only as an optional secondary ingest path for reconciliation and raw-file enrichment.
Why: This keeps the current pipeline stable while still capturing Garmin-derived value where it is uniquely useful (recovering activities that never synced to Strava, and detailed file artifacts). It also avoids making the unofficial Garmin auth flow a critical dependency for core activity ingestion.
Next steps:
- Define source precedence and dedupe rules in the common activity schema.
- Prototype a non-production Garmin pull job that stores raw responses/files only, with strict retry/backoff and clear failure isolation.
- Add reconciliation logic to match Garmin records to existing Strava activities before any canonical record creation.
- If Garmin coverage becomes strategic, evaluate Garmin Connect Developer Program onboarding as a formal path.
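For the prototype pull job's strict retry/backoff, capped exponential backoff with full jitter is one common choice. In this sketch only exceptions listed in `retry_on` are retried (`TransientError` is a hypothetical stand-in for timeouts and HTTP 429/5xx); anything else, including authentication failures, propagates immediately, which fits the lockout concern under Open Issues. `with_backoff` is an illustrative name and `sleep` is injectable for testing.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for retryable failures such as timeouts or HTTP 429/5xx."""


def with_backoff(
    call: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 2.0,
    max_delay: float = 60.0,
    retry_on: Tuple[Type[BaseException], ...] = (TransientError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying only `retry_on` errors with capped, fully jittered backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last transient error
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))  # full jitter spreads out concurrent retries
    raise AssertionError("unreachable")
```

With the defaults, a call that keeps failing is attempted four times, sleeping a random amount up to 2 s, 4 s, and 8 s between attempts before the last error is re-raised.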
Open Issues
- Auth fragility: unofficial login/token internals can break without notice when Garmin changes backend behavior.
- MFA friction: scheduled automation may fail during re-auth challenges.
- TOS/compliance ambiguity: unofficial endpoint usage may create legal/policy uncertainty.
- Account lockouts: repeated failed login attempts can trigger protective controls.
- Operational complexity: dual-source dedupe and reconciliation increases connector maintenance burden.
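The lockout and MFA risks argue for treating login failures differently from transient fetch errors: stop trying rather than back off. A minimal circuit-breaker sketch with hypothetical names (`LoginGuard`, `MfaRequired`) and placeholder thresholds (two failures, six-hour cooldown): after `max_failures` consecutive failures it refuses further attempts until the cooldown expires, and an MFA challenge blocks automation entirely until an operator completes a login and `record_success` is called.

```python
import time
from dataclasses import dataclass
from typing import Callable


class MfaRequired(Exception):
    """Hypothetical signal that the login flow demanded an MFA code."""


@dataclass
class LoginGuard:
    """Circuit breaker for automated logins, to avoid tripping account lockouts."""

    max_failures: int = 2
    cooldown_s: float = 6 * 3600
    clock: Callable[[], float] = time.time
    failures: int = 0
    blocked_until: float = 0.0
    needs_human: bool = False

    def allow_attempt(self) -> bool:
        """Whether a scheduled job may try to log in right now."""
        if self.needs_human:
            return False  # MFA pending: never retry on a schedule
        return self.clock() >= self.blocked_until

    def record_success(self) -> None:
        """Call after any successful login, including an operator's interactive one."""
        self.failures = 0
        self.blocked_until = 0.0
        self.needs_human = False

    def record_failure(self, exc: BaseException) -> None:
        if isinstance(exc, MfaRequired):
            self.needs_human = True
            return
        self.failures += 1
        if self.failures >= self.max_failures:
            # Open the breaker; one more failure after the cooldown re-opens it.
            self.blocked_until = self.clock() + self.cooldown_s
```

A scheduled job would check `allow_attempt()` before logging in and report `skipped` when it returns False, so the Garmin account never sees a burst of failed logins.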
References