Takeaway: Build for polite persistence—bounded jittered retries, idempotency, back‑pressure, and first‑class observability—so third‑party API failures never cascade.
Problem Framing
Third‑party APIs fail in ways your test suite won't: throttling (429), flaky 5xx, long latency tails, partial or inconsistent data, and silent contract drift.
Constraints:
- You don’t control capacity or deploy cadence on the other side.
- You must avoid duplicated financial actions (payments, invoices, journals).
- You must meet your SLAs without DDoSing partners.
- OAuth lifetimes + clock skew complicate token refresh.
Architecture Sketch
[Producers] → (Command messages) → [Worker Service]
     │                                    │
     │                                    ├─ Resilient HttpClient (timeouts + retries + idempotency)
     │                                    └─ Schedules retryable work
     ↓
[Azure Service Bus Queue] ⇄ (Scheduled Retries / Backpressure)
     │
     └─ [Third‑Party API]
Key elements: resilient outbound client, queue‑mediated backpressure, idempotency layer, structured telemetry.
Step‑by‑Step
1. Start with Correct HttpClient Defaults
// Program.cs
builder.Services.AddHttpClient("ExternalApi", client =>
{
    // Overall budget for one SendAsync call. RetryHandler runs inside this client's pipeline,
    // so the timeout spans every attempt and back-off wait, not a single attempt.
    client.Timeout = TimeSpan.FromSeconds(15);
    client.DefaultRequestHeaders.UserAgent.ParseAdd("integrations/1.0");
})
.ConfigurePrimaryHttpMessageHandler(() => new SocketsHttpHandler
{
    PooledConnectionLifetime = TimeSpan.FromMinutes(5),   // recycle connections so DNS changes are picked up
    PooledConnectionIdleTimeout = TimeSpan.FromMinutes(2),
    MaxConnectionsPerServer = 8                           // cap concurrency per host
})
.AddHttpMessageHandler(() => new RetryHandler(
    maxAttempts: 5,
    baseDelay: TimeSpan.FromSeconds(1),
    maxDelay: TimeSpan.FromMinutes(2)));
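Because HttpClient.Timeout bounds the whole call including retries, one slow attempt can consume the entire budget. A minimal sketch of a per-attempt timeout, assuming a hypothetical PerAttemptTimeoutHandler registered after RetryHandler (handlers added later sit closer to the network, so it wraps each attempt individually). As a design choice it turns its own expiry into a synthetic 408, which the status-code-based retry logic already treats as transient; StubHandler is a test double included only for the usage example.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical handler: gives each attempt its own timeout. Register it AFTER RetryHandler
// so it is the inner handler and RetryHandler sees one result per attempt.
public sealed class PerAttemptTimeoutHandler : DelegatingHandler
{
    private readonly TimeSpan _timeout;

    public PerAttemptTimeoutHandler(TimeSpan timeout) => _timeout = timeout;

    protected override async Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken ct)
    {
        // Linked token: the caller (or HttpClient.Timeout) can still cancel the whole call.
        using var cts = CancellationTokenSource.CreateLinkedTokenSource(ct);
        cts.CancelAfter(_timeout);
        try
        {
            return await base.SendAsync(request, cts.Token);
        }
        catch (OperationCanceledException) when (!ct.IsCancellationRequested)
        {
            // Only our per-attempt timer fired: report a synthetic 408 so the retry layer can classify it.
            return new HttpResponseMessage(HttpStatusCode.RequestTimeout) { RequestMessage = request };
        }
    }
}

// Test double used only by the usage example: answers every request with the supplied delegate.
public sealed class StubHandler : HttpMessageHandler
{
    private readonly Func<HttpRequestMessage, CancellationToken, Task<HttpResponseMessage>> _send;

    public StubHandler(Func<HttpRequestMessage, CancellationToken, Task<HttpResponseMessage>> send) => _send = send;

    protected override Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken ct)
        => _send(request, ct);
}
```

The trade-off: callers can no longer tell a partner-sent 408 from a local timeout by status code alone, so log which one it was.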
2. Retry Handler (Jitter + Retry-After)
Retry only idempotent operations (GET/HEAD, or requests explicitly marked idempotent). Honor Retry-After for 429/503.
using System;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

public sealed class RetryHandler : DelegatingHandler
{
    // Request option callers set to opt a non-GET/HEAD request into retries.
    public static readonly HttpRequestOptionsKey<bool> IdempotentKey = new("Idempotent");

    private readonly int _maxAttempts;
    private readonly TimeSpan _baseDelay;
    private readonly TimeSpan _maxDelay;

    private static readonly HttpStatusCode[] Transient =
    {
        HttpStatusCode.RequestTimeout,     // 408
        HttpStatusCode.TooManyRequests,    // 429
        HttpStatusCode.BadGateway,         // 502
        HttpStatusCode.ServiceUnavailable, // 503
        HttpStatusCode.GatewayTimeout      // 504
    };

    public RetryHandler(int maxAttempts, TimeSpan baseDelay, TimeSpan maxDelay)
        => (_maxAttempts, _baseDelay, _maxDelay) = (maxAttempts, baseDelay, maxDelay);

    protected override async Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken ct)
    {
        bool idempotent = request.Method == HttpMethod.Get || request.Method == HttpMethod.Head ||
            (request.Options.TryGetValue(IdempotentKey, out var idem) && idem);

        for (int attempt = 1; ; attempt++)
        {
            var response = await base.SendAsync(request, ct);
            if (!ShouldRetry(response, idempotent) || attempt >= _maxAttempts)
                return response;

            // null => the server asked for a longer wait than we will block in-process;
            // hand the response back so the caller can reschedule (e.g. via Service Bus).
            if (ComputeDelay(response, attempt) is not { } delay)
                return response;

            response.Dispose(); // release the pooled connection before waiting
            await Task.Delay(delay, ct);
        }
    }

    private static bool ShouldRetry(HttpResponseMessage resp, bool idempotent)
        => idempotent && Transient.Contains(resp.StatusCode);

    private TimeSpan? ComputeDelay(HttpResponseMessage resp, int attempt)
    {
        if (resp.Headers.RetryAfter is { } ra)
        {
            TimeSpan? requested = ra.Delta ?? (ra.Date - DateTimeOffset.UtcNow);
            if (requested > TimeSpan.Zero)
                return requested <= _maxDelay ? requested : null; // never retry earlier than asked
        }

        // Full jitter: uniform in [0, min(maxDelay, baseDelay * 2^attempt)).
        var max = Math.Min(_maxDelay.TotalMilliseconds, _baseDelay.TotalMilliseconds * Math.Pow(2, attempt));
        return TimeSpan.FromMilliseconds(Random.Shared.NextDouble() * max);
    }
}
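The full-jitter formula is easier to reason about as a schedule. A minimal sketch, using a hypothetical BackoffSchedule helper, that mirrors the jitter ceiling min(maxDelay, baseDelay · 2^attempt) and sums the worst case over the waits RetryHandler can perform (attempts 1 to maxAttempts − 1 sleep; the last attempt does not). Retry-After waits are ignored here.

```csharp
using System;

// Hypothetical helper mirroring the full-jitter ceiling used by RetryHandler.
public static class BackoffSchedule
{
    // Upper bound of the jitter window for one attempt; the real wait is uniform in [0, ceiling).
    public static TimeSpan Ceiling(TimeSpan baseDelay, TimeSpan maxDelay, int attempt) =>
        TimeSpan.FromMilliseconds(Math.Min(maxDelay.TotalMilliseconds,
            baseDelay.TotalMilliseconds * Math.Pow(2, attempt)));

    // Worst-case total sleep across one logical call.
    public static TimeSpan WorstCaseTotal(TimeSpan baseDelay, TimeSpan maxDelay, int maxAttempts)
    {
        var total = TimeSpan.Zero;
        for (int attempt = 1; attempt < maxAttempts; attempt++)
            total += Ceiling(baseDelay, maxDelay, attempt);
        return total;
    }
}
```

With the Program.cs values (maxAttempts: 5, 1 s base, 2 min cap) the four waits have ceilings of 2, 4, 8 and 16 s, a 30 s worst case that already exceeds the 15 s HttpClient.Timeout, so the overall timeout and the retry budget should be sized together.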
For POSTs the provider deduplicates (e.g., a payment carrying a deterministic Idempotency-Key), opt in by setting Idempotent = true via request.Options.
3. Idempotency End‑to‑End
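Outbound: deterministic key.
using var request = new HttpRequestMessage(HttpMethod.Post, "/payments");
request.Content = JsonContent.Create(body);
// Same business operation => same key on every retry, so the provider can deduplicate.
request.Headers.Add("Idempotency-Key", $"{companyId}:{invoiceId}:{amountCents}");
// Opt this POST into retries; the option name must match the one RetryHandler reads ("Idempotent").
request.Options.Set(new HttpRequestOptionsKey<bool>("Idempotent"), true);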
Inbound: persist a hash of (companyId, invoiceId, amount) mapped to the canonical result; on replay, return the cached result.
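A minimal in-memory sketch of the inbound side, assuming a single process; a real deployment would persist to a durable table with a unique index on the key. IdempotencyKey and IdempotencyStore are hypothetical names. The key is a SHA-256 hash of the (companyId, invoiceId, amount) tuple in a fixed format, and concurrent replays share one execution through Lazy<Task<T>>.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;
using System.Threading.Tasks;

// Hypothetical: canonical key = SHA-256 of the business tuple in a fixed order and format.
public static class IdempotencyKey
{
    public static string For(string companyId, string invoiceId, long amountCents) =>
        Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes($"{companyId}|{invoiceId}|{amountCents}")));
}

// In-memory stand-in for a durable idempotency table.
public sealed class IdempotencyStore<TResult>
{
    private readonly ConcurrentDictionary<string, Lazy<Task<TResult>>> _results = new();

    // Runs `execute` at most once per key; concurrent and later replays share the cached result.
    public async Task<TResult> GetOrExecuteAsync(string key, Func<Task<TResult>> execute)
    {
        var entry = _results.GetOrAdd(key, _ => new Lazy<Task<TResult>>(execute));
        try
        {
            return await entry.Value;
        }
        catch
        {
            // Don't cache failures: remove this exact entry so a later retry can run again.
            _results.TryRemove(new KeyValuePair<string, Lazy<Task<TResult>>>(key, entry));
            throw;
        }
    }
}
```

Dropping failed entries is only safe because the outbound call also carries an Idempotency-Key: if a failure was ambiguous (e.g. a timeout after the partner already charged), the re-run relies on the provider's deduplication.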
4. Queue‑Based Backpressure (Azure Service Bus)
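public async Task HandleAsync(ServiceBusReceivedMessage msg, CancellationToken ct)
{
    try
    {
        await _processor.ProcessAsync(msg, ct);
        await _receiver.CompleteMessageAsync(msg, ct);
    }
    catch (TooManyRequestsException ex) // app-defined exception carrying the partner's Retry-After
    {
        int reschedules = msg.ApplicationProperties.TryGetValue("retry", out var r) ? Convert.ToInt32(r) : 0;
        if (reschedules >= MaxReschedules) // app-defined cap so a throttled message can't loop forever
        {
            await _receiver.DeadLetterMessageAsync(msg, "ThrottledTooLong", ex.Message, ct);
            return;
        }

        var delay = ex.RetryAfter ?? TimeSpan.FromMinutes(5);
        await _sender.ScheduleMessageAsync(new ServiceBusMessage(msg.Body)
        {
            Subject = msg.Subject,
            CorrelationId = msg.CorrelationId,
            ApplicationProperties = { ["retry"] = reschedules + 1 }
        }, DateTimeOffset.UtcNow.Add(delay), ct);

        // Schedule + complete is not atomic: a crash in between duplicates the command,
        // which the idempotency layer absorbs.
        await _receiver.CompleteMessageAsync(msg, ct);
    }
}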
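The reschedule decision can be pulled out into a pure function, which makes it testable and leaves room for jitter so messages throttled together don't all wake at once. A sketch; ReschedulePolicy, the cap of 10 reschedules and the 30 s jitter window are assumptions, while the 5-minute fallback matches the handler above.

```csharp
using System;

// Hypothetical policy: whether, and when, to reschedule a throttled message.
public static class ReschedulePolicy
{
    public const int MaxReschedules = 10;                                    // assumption
    private static readonly TimeSpan DefaultDelay = TimeSpan.FromMinutes(5); // same fallback as the handler
    private static readonly TimeSpan MaxJitter = TimeSpan.FromSeconds(30);   // assumption

    // Returns null when the message should be dead-lettered instead of rescheduled.
    public static DateTimeOffset? NextAttemptAt(DateTimeOffset now, TimeSpan? retryAfter, int reschedules, Random rng)
    {
        if (reschedules >= MaxReschedules)
            return null;

        // Jitter is only ever ADDED, so the partner's Retry-After is still honored,
        // while messages throttled together stop waking up in the same second.
        var jitter = TimeSpan.FromMilliseconds(rng.NextDouble() * MaxJitter.TotalMilliseconds);
        return now + (retryAfter ?? DefaultDelay) + jitter;
    }
}
```

In a handler, a null result would map to dead-lettering (ServiceBusReceiver.DeadLetterMessageAsync) and a value to the scheduledEnqueueTime passed to ScheduleMessageAsync.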
5. OAuth Tokens Without Drama
Single‑flight refresh + slack for clock skew.
public sealed class OAuthTokenCache
{
    private static readonly TimeSpan SkewSlack = TimeSpan.FromMinutes(2); // refresh this long before expiry
    private readonly SemaphoreSlim _refreshLock = new(1, 1);
    private volatile Token? _cached;

    public async ValueTask<string> GetAsync(Func<Task<Token>> refresh, CancellationToken ct)
    {
        // Fast path: read the field once so the check and the return see the same token.
        if (_cached is { } token && IsFresh(token))
            return token.AccessToken;

        await _refreshLock.WaitAsync(ct);
        try
        {
            // Another caller may have refreshed while we waited (single-flight).
            if (_cached is { } current && IsFresh(current))
                return current.AccessToken;

            var fresh = await refresh();
            _cached = fresh;
            return fresh.AccessToken;
        }
        finally { _refreshLock.Release(); }
    }

    private static bool IsFresh(Token t) => t.ExpiresAtUtc - DateTimeOffset.UtcNow > SkewSlack;

    public sealed record Token(string AccessToken, DateTimeOffset ExpiresAtUtc);
}
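The cache is only as good as the ExpiresAtUtc it is given. A sketch of deriving it from a standard OAuth 2.0 token response (RFC 6749 §5.1 fields access_token and expires_in), anchoring expiry to when the request was sent so latency can only cause an early refresh. TokenResponse and the 300 s fallback for a missing expires_in (the RFC only recommends that field) are assumptions.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper: turns an OAuth 2.0 token response into cacheable fields.
public static class TokenResponse
{
    // Assumption: fallback lifetime when the optional expires_in field is absent.
    private const int DefaultExpiresInSeconds = 300;

    // Anchor expiry to when the request was SENT: latency then only makes us refresh early, never late.
    public static (string AccessToken, DateTimeOffset ExpiresAtUtc) Parse(string json, DateTimeOffset requestSentUtc)
    {
        using var doc = JsonDocument.Parse(json);
        var root = doc.RootElement;
        string accessToken = root.GetProperty("access_token").GetString()
            ?? throw new FormatException("access_token is null");
        int expiresIn = root.TryGetProperty("expires_in", out var e) && e.ValueKind == JsonValueKind.Number
            ? e.GetInt32()
            : DefaultExpiresInSeconds;
        return (accessToken, requestSentUtc.AddSeconds(expiresIn));
    }
}
```

The tuple maps directly onto the cache's record: new OAuthTokenCache.Token(t.AccessToken, t.ExpiresAtUtc).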
6. Prevent Silent Contract Drift
- Request/response validators (light schema assertions).
- Partner‑scoped feature flags for version pins & optional fields.
- Canary subset on new versions + sample payload archiving for diff.
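A sketch of a light schema assertion with System.Text.Json: it checks only that required fields exist with the expected JSON kind, which is enough to flag a renamed or retyped field before it corrupts downstream data. ContractCheck and the field names in the usage are illustrative.

```csharp
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical light schema assertion: required fields must exist with the expected JSON kind.
// Unknown extra fields are ignored on purpose, so additive partner changes don't trip it.
public static class ContractCheck
{
    public static List<string> Violations(string json, IReadOnlyDictionary<string, JsonValueKind> required)
    {
        var problems = new List<string>();
        using var doc = JsonDocument.Parse(json);
        if (doc.RootElement.ValueKind != JsonValueKind.Object)
        {
            problems.Add($"root is {doc.RootElement.ValueKind}, expected Object");
            return problems;
        }

        foreach (var (name, kind) in required)
        {
            if (!doc.RootElement.TryGetProperty(name, out var value))
                problems.Add($"missing '{name}'");
            else if (value.ValueKind != kind)
                problems.Add($"'{name}' is {value.ValueKind}, expected {kind}");
        }
        return problems;
    }
}
```

Emitting the violation count as a metric per partner, rather than failing the request, keeps drift visible without turning a cosmetic change into an outage.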
7. Defensive Timeouts & Budgets
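using var cts = CancellationTokenSource.CreateLinkedTokenSource(ct); // the caller can still cancel
cts.CancelAfter(TimeSpan.FromSeconds(30));                           // plus this unit's own 30 s budget
await _invoiceGenerator.RunAsync(companyId, cts.Token);
Budget each unit of work and pass the linked token down, so either the caller or the budget can stop it.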
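Callers usually need to tell "our budget ran out" apart from "the caller gave up". A minimal sketch, with a hypothetical Budget.RunAsync and BudgetOutcome enum, that links the caller's token, applies the budget, and classifies the outcome:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public enum BudgetOutcome { Completed, BudgetExceeded }

public static class Budget
{
    // Runs `work` until it finishes, the caller cancels, or the budget expires, whichever comes first.
    // Caller cancellation still surfaces as OperationCanceledException; only budget expiry is reported.
    public static async Task<BudgetOutcome> RunAsync(
        Func<CancellationToken, Task> work, TimeSpan budget, CancellationToken ct)
    {
        using var cts = CancellationTokenSource.CreateLinkedTokenSource(ct);
        cts.CancelAfter(budget);
        try
        {
            await work(cts.Token);
            return BudgetOutcome.Completed;
        }
        catch (OperationCanceledException) when (cts.IsCancellationRequested && !ct.IsCancellationRequested)
        {
            return BudgetOutcome.BudgetExceeded;
        }
    }
}
```

Assuming RunAsync on the invoice generator returns a Task, the snippet above becomes await Budget.RunAsync(t => _invoiceGenerator.RunAsync(companyId, t), TimeSpan.FromSeconds(30), ct).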
Failure Modes & Observability
Dimension | Metric / Signal | Purpose |
---|---|---|
Throttling | 429 rate per partner | Alert on sustained spikes |
Availability | 5xx rate, p95 latency | Detect degradation |
Retries | Attempts histogram | Tune backoff / spot storms |
Queue | Age, scheduled backlog | Identify backpressure saturation |
Tokens | Refresh count / failures | OAuth health |
Idempotency | Cache hit ratio | Validate retry correctness |
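Several rows of the table map onto System.Diagnostics.Metrics instruments, which the OpenTelemetry .NET SDK can export once the meter is registered. A sketch covering the throttling/availability and retry rows; the meter name, instrument names and the partner tag are assumptions, not an established convention.

```csharp
using System.Collections.Generic;
using System.Diagnostics.Metrics;

// Hypothetical meter and instrument names; keep them stable once dashboards depend on them.
public static class IntegrationMetrics
{
    public const string MeterName = "Integrations.ExternalApi";

    private static readonly Meter Source = new(MeterName);

    // One increment per HTTP response; 429 and 5xx rates per partner are derived from the tags.
    private static readonly Counter<long> Responses = Source.CreateCounter<long>("external_api.responses");

    // Attempts per logical call (1 = no retry), for the attempts histogram.
    private static readonly Histogram<int> Attempts = Source.CreateHistogram<int>("external_api.attempts");

    public static void RecordResponse(string partner, int statusCode) =>
        Responses.Add(1,
            new KeyValuePair<string, object?>("partner", partner),
            new KeyValuePair<string, object?>("http.status_code", statusCode));

    public static void RecordAttempts(string partner, int attempts) =>
        Attempts.Record(attempts, new KeyValuePair<string, object?>("partner", partner));
}
```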
Alerts: 429 rate > X% for Y minutes; backlog age above a threshold; token refresh failures > N per 5 minutes; circuit open > 2 minutes.
Logs: CorrelationId, attempt, computed delay (ms), reason (429/503/timeout), idempotency key (hashed), redacted payload identifiers only.
Traces: span name external.api/<provider>/<operation>, with attributes http.status_code, retry_attempt, retry_after_ms and idempotency_key (hashed, as in logs).
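In .NET, spans are System.Diagnostics.Activity objects created from an ActivitySource, which OpenTelemetry exports when the source is registered. A sketch of the span shape described above; the source name and the truncation of the hashed key to 16 hex characters are assumptions.

```csharp
using System;
using System.Diagnostics;
using System.Security.Cryptography;
using System.Text;

public static class ExternalApiTracing
{
    public const string SourceName = "Integrations.ExternalApi"; // hypothetical; register it with OpenTelemetry

    private static readonly ActivitySource Source = new(SourceName);

    // Starts a client span named external.api/<provider>/<operation>; returns null when nothing listens.
    public static Activity? StartCall(string provider, string operation, string idempotencyKey)
    {
        var activity = Source.StartActivity($"external.api/{provider}/{operation}", ActivityKind.Client);
        activity?.SetTag("idempotency_key", Hash(idempotencyKey)); // never the raw key
        return activity;
    }

    // Annotates the span after an attempt with the retry metadata.
    public static void RecordAttempt(Activity? activity, int attempt, int statusCode, TimeSpan? retryAfter)
    {
        if (activity is null) return;
        activity.SetTag("http.status_code", statusCode);
        activity.SetTag("retry_attempt", attempt);
        if (retryAfter is { } ra)
            activity.SetTag("retry_after_ms", (long)ra.TotalMilliseconds);
    }

    // Short, stable fingerprint: enough to correlate spans without putting raw identifiers in traces.
    private static string Hash(string value) =>
        Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(value)))[..16];
}
```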
What I’d Do Differently Next Time
- Introduce a token broker early (central OAuth for all finance connectors).
- Add a lightweight circuit breaker for noisy neighbours to protect shared pools.
Checklist to Apply Tomorrow
- Cap MaxConnectionsPerServer and set client timeouts.
- Use jittered retries that honor Retry-After (idempotent operations only).
- Send an Idempotency-Key and dedupe on our side.
- Schedule retries (Service Bus) instead of hot loops.
- Cache OAuth tokens (single‑flight refresh + skew slack).
- Emit metrics: 429/5xx, retries, backlog age, token refresh.
- Correlate logs + annotate traces with retry metadata.
- Add schema checks & sample payload archiving.
Resources
- Azure Service Bus scheduled messages: https://learn.microsoft.com/azure/service-bus-messaging/message-scheduling
- OpenTelemetry for .NET (getting started): https://opentelemetry.io/docs/languages/net/
- Idempotency patterns (Stripe docs): https://stripe.com/docs/idempotency
- Idempotent API design (AWS architecture blog): https://aws.amazon.com/blogs/architecture/best-practices-for-building-resilient-idempotent-apis/