Takeaway: Build for polite persistence—bounded jittered retries, idempotency, back‑pressure, and first‑class observability—so third‑party API failures never cascade.

Problem Framing

Third‑party APIs fail in ways your test suite won’t: throttling (429), flaky 5xx, long latency tails, partial truths, silent contract drift.

Constraints:

  • You don’t control capacity or deploy cadence on the other side.
  • You must avoid duplicated financial actions (payments, invoices, journals).
  • You must meet your SLAs without DDoSing partners.
  • OAuth lifetimes + clock skew complicate token refresh.

Architecture Sketch

[Producers] → (Command messages) → [Worker Service]
                │                    │
                │                    ├─ Resilient HttpClient (timeouts + retries + idempotency)
                │                    └─ Schedules retryable work

        [Azure Service Bus Queue] ⇄ (Scheduled Retries / Backpressure)

                          └─ [Third‑Party API]

Key elements: resilient outbound client, queue‑mediated backpressure, idempotency layer, structured telemetry.


Step‑by‑Step

1. Start with Correct HttpClient Defaults

// Program.cs
builder.Services.AddHttpClient("ExternalApi", client =>
{
    // Bounds the whole SendAsync call, including every RetryHandler attempt and delay.
    client.Timeout = TimeSpan.FromSeconds(15);
    client.DefaultRequestHeaders.UserAgent.ParseAdd("integrations/1.0");
})
.ConfigurePrimaryHttpMessageHandler(() => new SocketsHttpHandler
{
    PooledConnectionLifetime = TimeSpan.FromMinutes(5),
    PooledConnectionIdleTimeout = TimeSpan.FromMinutes(2),
    MaxConnectionsPerServer = 8 // cap host concurrency
})
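// In-process retries must fit inside client.Timeout; longer waits belong on the queue (step 4).
.AddHttpMessageHandler(() => new RetryHandler(
    maxAttempts: 3,
    baseDelay: TimeSpan.FromMilliseconds(500),
    maxDelay: TimeSpan.FromSeconds(5)));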

2. Retry Handler (Jitter + Retry-After)

Retry only idempotent operations (GET/HEAD or those explicitly marked). Honor Retry-After for 429/503.

using System.Net; // HttpStatusCode

public sealed class RetryHandler : DelegatingHandler
{
    private readonly int _maxAttempts;
    private readonly TimeSpan _baseDelay;
    private readonly TimeSpan _maxDelay;
    private static readonly HttpStatusCode[] Transient =
    {
        HttpStatusCode.RequestTimeout,
        HttpStatusCode.TooManyRequests,
        HttpStatusCode.BadGateway,
        HttpStatusCode.ServiceUnavailable,
        HttpStatusCode.GatewayTimeout
    };

    public RetryHandler(int maxAttempts, TimeSpan baseDelay, TimeSpan maxDelay)
        => (_maxAttempts, _baseDelay, _maxDelay) = (maxAttempts, baseDelay, maxDelay);

    protected override async Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken ct)
    {
        // GET and HEAD are safe to repeat; other methods must opt in via request.Options.
        bool idempotent = request.Method == HttpMethod.Get || request.Method == HttpMethod.Head ||
            (request.Options.TryGetValue(new HttpRequestOptionsKey<bool>("Idempotent"), out var idem) && idem);

        for (int attempt = 1; ; attempt++)
        {
            var response = await base.SendAsync(request, ct);
            if (!ShouldRetry(response, idempotent) || attempt >= _maxAttempts)
                return response;

            var delay = ComputeDelay(response, attempt);
            response.Dispose(); // release the connection before waiting
            await Task.Delay(delay, ct);
        }
    }

    private static bool ShouldRetry(HttpResponseMessage resp, bool idempotent)
        => idempotent && Transient.Contains(resp.StatusCode);

    private TimeSpan ComputeDelay(HttpResponseMessage resp, int attempt)
    {
        if (resp.Headers.RetryAfter is { } ra)
        {
            if (ra.Delta is not null) return Cap(ra.Delta.Value);
            if (ra.Date is not null)
            {
                var delta = ra.Date.Value - DateTimeOffset.UtcNow;
                if (delta > TimeSpan.Zero) return Cap(delta);
            }
        }
        var max = Math.Min(_maxDelay.TotalMilliseconds, _baseDelay.TotalMilliseconds * Math.Pow(2, attempt));
        return TimeSpan.FromMilliseconds(Random.Shared.NextDouble() * max); // full jitter
    }

    private TimeSpan Cap(TimeSpan v) => v > _maxDelay ? _maxDelay : v;
}

For POSTs that are idempotent on your side (e.g. payment with deterministic key), set Idempotent=true via request.Options.
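
To make the backoff concrete, here is a minimal sketch that computes the upper bound of the jitter window per attempt, using the same formula as ComputeDelay. BackoffWindow is a hypothetical helper name, not part of the handler above.

```csharp
using System;

public static class BackoffWindow
{
    // Upper bound of the full-jitter window for a 1-based attempt number,
    // mirroring ComputeDelay: min(maxDelay, baseDelay * 2^attempt).
    public static TimeSpan UpperBound(TimeSpan baseDelay, TimeSpan maxDelay, int attempt)
    {
        var ms = Math.Min(
            maxDelay.TotalMilliseconds,
            baseDelay.TotalMilliseconds * Math.Pow(2, attempt));
        return TimeSpan.FromMilliseconds(ms);
    }
}
```

With an example base delay of 1 s and a cap of 2 min, the window grows 2 s, 4 s, 8 s, 16 s and is capped from the seventh attempt on. The actual delay is drawn uniformly between zero and that bound, which is what spreads retrying clients apart.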

3. Idempotency End‑to‑End

Outbound: send a deterministic key so the partner can deduplicate.

using var request = new HttpRequestMessage(HttpMethod.Post, "/payments");
request.Content = JsonContent.Create(body);
request.Headers.Add("Idempotency-Key", $"{companyId}:{invoiceId}:{amountCents}");
request.Options.Set(new HttpRequestOptionsKey<bool>("Idempotent"), true);

Inbound: persist a hash of (companyId, invoiceId, amount) mapped to the canonical result; on replay, return the stored result instead of executing again.
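
The inbound side could be sketched as follows. This is an in-memory illustration under stated assumptions: IdempotencyKeys and IdempotencyStore are hypothetical names, and a production version would more likely sit on a durable store with a unique constraint on the key.

```csharp
using System;
using System.Collections.Concurrent;
using System.Security.Cryptography;
using System.Text;
using System.Threading.Tasks;

public static class IdempotencyKeys
{
    // Hashes the business identifiers into a fixed-length hex key.
    public static string Hash(string companyId, string invoiceId, long amountCents)
    {
        var raw = $"{companyId}:{invoiceId}:{amountCents}";
        return Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(raw)));
    }
}

public sealed class IdempotencyStore<TResult>
{
    // Key -> the single in-flight or completed execution for that key.
    private readonly ConcurrentDictionary<string, Lazy<Task<TResult>>> _results = new();

    // Runs the action at most once per key; replays receive the stored task.
    public Task<TResult> ExecuteOnceAsync(string key, Func<Task<TResult>> action)
    {
        var lazy = _results.GetOrAdd(key, _ => new Lazy<Task<TResult>>(action));
        return lazy.Value;
    }
}
```

Note that this sketch also remembers failed executions; a real store would probably evict or expire those so a later retry can succeed.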

4. Queue‑Based Backpressure (Azure Service Bus)

public async Task HandleAsync(ServiceBusReceivedMessage msg, CancellationToken ct)
{
    try
    {
        await _processor.ProcessAsync(msg, ct);
        await _receiver.CompleteMessageAsync(msg, ct);
    }
    catch (TooManyRequestsException ex)
    {
        var delay = ex.RetryAfter ?? TimeSpan.FromMinutes(5);
        await _sender.ScheduleMessageAsync(new ServiceBusMessage(msg.Body)
        {
            Subject = msg.Subject,
            CorrelationId = msg.CorrelationId,
            ApplicationProperties = { ["retry"] = Increment(msg) }
        }, DateTimeOffset.UtcNow.Add(delay), ct);
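        // Complete the original only after the copy is scheduled. The two calls are not atomic,
        // so a crash in between can produce a duplicate, which the idempotency layer absorbs.
        await _receiver.CompleteMessageAsync(msg, ct);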
    }
}
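
The handler above relies on an Increment helper that is not shown, and it reschedules without an upper bound. One possible shape for that helper, with a cap, is sketched below. RetryBudget and its members are hypothetical names; the only assumption about Service Bus is that the message's application properties are readable as a string-to-object dictionary.

```csharp
using System;
using System.Collections.Generic;

public static class RetryBudget
{
    public const string RetryProperty = "retry"; // same property name the handler writes

    // Returns the retry count to stamp on the rescheduled copy (1 on first reschedule).
    public static int Next(IReadOnlyDictionary<string, object> properties)
    {
        var current = properties.TryGetValue(RetryProperty, out var value)
            ? value switch { int i => i, long l => (int)l, _ => 0 }
            : 0;
        return current + 1;
    }

    // True once the message has used up its reschedules and should be parked instead.
    public static bool Exhausted(int retryCount, int maxRetries) => retryCount > maxRetries;
}
```

In the catch block, Increment(msg) could then become RetryBudget.Next(msg.ApplicationProperties), and an exhausted message could be dead-lettered rather than rescheduled so that a permanently throttled partner does not keep work circulating forever.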

5. OAuth Tokens Without Drama

Refresh under a single lock so concurrent callers trigger only one refresh, and treat tokens as expired two minutes early to absorb clock skew.

public sealed class OAuthTokenCache
{
    private readonly SemaphoreSlim _refreshLock = new(1,1);
    private Token? _cached;

    public async ValueTask<string> GetAsync(Func<Task<Token>> refresh, CancellationToken ct)
    {
        var now = DateTimeOffset.UtcNow;
        if (_cached is { ExpiresAtUtc: var exp } && exp - now > TimeSpan.FromMinutes(2))
            return _cached.AccessToken;

        await _refreshLock.WaitAsync(ct);
        try
        {
            if (_cached is { ExpiresAtUtc: var exp2 } && exp2 - DateTimeOffset.UtcNow > TimeSpan.FromMinutes(2))
                return _cached.AccessToken;
            _cached = await refresh();
            return _cached.AccessToken;
        }
        finally { _refreshLock.Release(); }
    }

    public sealed record Token(string AccessToken, DateTimeOffset ExpiresAtUtc);
}

6. Prevent Silent Contract Drift

  • Request/response validators (light schema assertions).
  • Partner‑scoped feature flags for version pins & optional fields.
  • Canary subset on new versions + sample payload archiving for diff.
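
A light schema assertion can be as small as checking that required properties exist with the expected JSON kind. The sketch below uses System.Text.Json; ContractCheck and the property names in the usage note are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

public static class ContractCheck
{
    // Returns a list of violations: missing properties or unexpected JSON kinds.
    public static IReadOnlyList<string> Validate(
        string json, IReadOnlyDictionary<string, JsonValueKind> required)
    {
        var problems = new List<string>();
        using var doc = JsonDocument.Parse(json);
        if (doc.RootElement.ValueKind != JsonValueKind.Object)
        {
            problems.Add("root is not an object");
            return problems;
        }

        foreach (var (name, kind) in required)
        {
            if (!doc.RootElement.TryGetProperty(name, out var value))
                problems.Add($"missing '{name}'");
            else if (value.ValueKind != kind)
                problems.Add($"'{name}' is {value.ValueKind}, expected {kind}");
        }
        return problems;
    }
}
```

For a payment response you might require "id" as String and "amountCents" as Number, then log violations and count them per partner rather than failing the call outright, so drift shows up in telemetry before it breaks processing.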

7. Defensive Timeouts & Budgets

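// ct is the caller's token (e.g. from the message handler); cancelling it also cancels this unit of work.
using var cts = CancellationTokenSource.CreateLinkedTokenSource(ct);
cts.CancelAfter(TimeSpan.FromSeconds(30)); // budget for this unit of work
await _invoiceGenerator.RunAsync(companyId, cts.Token);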

Budget each unit of work; propagate a linked cancellation token.


Failure Modes & Observability

Dimension     Metric / Signal            Purpose
Throttling    429 rate per partner       Alert on sustained spikes
Availability  5xx rate, p95 latency      Detect degradation
Retries       Attempts histogram         Tune backoff / spot storms
Queue         Age, scheduled backlog     Identify backpressure saturation
Tokens        Refresh count / failures   OAuth health
Idempotency   Cache hit ratio            Validate retry correctness

Alerts: 429 rate > X% for Y minutes; backlog age threshold; token refresh failures > N/5m; circuit open > 2m.

Logs: CorrelationId, attempt, computed delay (ms), reason (429/503/timeout), idempotency key (hashed), redacted payload identifiers only.

Traces: span name external.api/<provider>/<operation> + attributes: http.status_code, retry_attempt, retry_after_ms, idempotency_key (hashed, as in the logs).
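
The span naming and attributes above could be emitted with System.Diagnostics.ActivitySource, roughly as below. ExternalApiTelemetry and the source name "Integrations.ExternalApi" are hypothetical; the attribute names follow the list above.

```csharp
using System;
using System.Diagnostics;

public static class ExternalApiTelemetry
{
    public static readonly ActivitySource Source = new("Integrations.ExternalApi");

    // Starts a client span per attempt; returns null when nothing is listening.
    public static Activity? StartCall(string provider, string operation, int attempt, string hashedKey)
    {
        var activity = Source.StartActivity($"external.api/{provider}/{operation}", ActivityKind.Client);
        activity?.SetTag("retry_attempt", attempt);
        activity?.SetTag("idempotency_key", hashedKey); // pass the hash, never the raw key
        return activity;
    }

    // Records the outcome once the response (or Retry-After hint) is known.
    public static void RecordOutcome(Activity? activity, int statusCode, TimeSpan? retryAfter)
    {
        if (activity is null) return;
        activity.SetTag("http.status_code", statusCode);
        if (retryAfter is { } ra)
            activity.SetTag("retry_after_ms", (long)ra.TotalMilliseconds);
    }
}
```

Wrapping each attempt in its own span keeps the attempts histogram and the trace view consistent: one span per try, with the retry metadata on the span that caused the wait.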


What I’d Do Differently Next Time

  • Introduce a token broker early (central OAuth for all finance connectors).
  • Add a lightweight circuit breaker for noisy neighbours to protect shared pools.
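
For a sense of scale, a lightweight circuit breaker need not be much code. The sketch below is one possible shape, not something this system shipped: SimpleCircuitBreaker is a hypothetical name, it opens after a run of consecutive failures, and it has no dedicated half-open probe state.

```csharp
using System;

public sealed class SimpleCircuitBreaker
{
    private readonly int _failureThreshold;
    private readonly TimeSpan _openDuration;
    private readonly object _gate = new();
    private int _consecutiveFailures;
    private DateTimeOffset _openUntil = DateTimeOffset.MinValue;

    public SimpleCircuitBreaker(int failureThreshold, TimeSpan openDuration)
        => (_failureThreshold, _openDuration) = (failureThreshold, openDuration);

    // True when calls may proceed; false while the circuit is open.
    public bool AllowRequest(DateTimeOffset now)
    {
        lock (_gate) return now >= _openUntil;
    }

    public void RecordSuccess()
    {
        lock (_gate) _consecutiveFailures = 0;
    }

    // Opens the circuit once the threshold of consecutive failures is reached.
    public void RecordFailure(DateTimeOffset now)
    {
        lock (_gate)
        {
            if (++_consecutiveFailures >= _failureThreshold)
            {
                _openUntil = now + _openDuration;
                _consecutiveFailures = 0;
            }
        }
    }
}
```

One instance per partner keeps a noisy neighbour from consuming the shared connection pool: while the circuit is open, work for that partner can be rescheduled on the queue instead of attempted.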

Checklist to Apply Tomorrow

  • Cap MaxConnectionsPerServer & set client timeouts.
  • Jittered retries that honor Retry-After (only idempotent ops).
  • Send Idempotency-Key + dedupe on our side.
  • Schedule retries (Service Bus) instead of hot loops.
  • Cache OAuth tokens (single‑flight refresh + skew slack).
  • Emit metrics: 429/5xx, retries, backlog age, token refresh.
  • Correlate logs + annotate traces with retry metadata.
  • Add schema checks & sample payload archiving.

Resources