When talking about observability in ASP.NET Core, the risk is almost always the same: teams focus on just one piece of the puzzle, usually logs or traces, and end up with an incomplete setup that is only partially useful. In practice, though, a solid baseline should cover at least four areas: tracing, metrics, structured logging and health checks.
That is also the approach we recommend in real-world projects: OpenTelemetry for traces and metrics, Serilog for logging, separate endpoints for liveness and readiness, along with a few details that are often overlooked, such as correlating TraceId with application logs, setting up sampling, and adding a minimum amount of custom telemetry for the operations that actually matter.
The benefit is tangible and becomes clear very quickly. You get a modern, standardized observability platform that is easy to export via OTLP to collectors, Grafana, Tempo or other backends; at the same time, you avoid the classic situation where an application “has logs”, but when things start to degrade, it does not return anything truly useful.
A sensible baseline: tracing, metrics, logs and HTTP probes
The starting model is simple and, in my view, should become almost universal in ASP.NET Core 10 projects:
- OpenTelemetry tracing to follow requests end to end;
- OpenTelemetry metrics to measure throughput, errors, latency and runtime signals;
- Serilog for structured, queryable logs aligned with the application context;
- Health Checks with separate /health/live and /health/ready endpoints.
This architecture has a very practical advantage: it keeps signals cleanly separated, without forcing logs to do the job of metrics or health probes to replace tracing. Each component stays within its own scope, and the final result is much easier to read both locally and in production.
OpenTelemetry with OTLP exporter in ASP.NET Core 10
In .NET, integrating OpenTelemetry is quite natural, also because the platform already exposes the primitives this model relies on: Activity and ActivitySource for tracing, Meter for metrics, and ILogger for logging. OpenTelemetry takes care of collecting these signals and sending them to a backend through an exporter, with OTLP being the most flexible and portable option.
A typical configuration in Program.cs might look like this:
```csharp
using OpenTelemetry.Metrics;
using OpenTelemetry.Resources;
using OpenTelemetry.Trace;
using Serilog;
using Serilog.Context;
using System.Diagnostics;
using System.Diagnostics.Metrics;

var builder = WebApplication.CreateBuilder(args);

builder.Host.UseSerilog((context, services, logger) => logger
    .ReadFrom.Configuration(context.Configuration)
    .Enrich.FromLogContext()
    .Enrich.WithProperty("Application", "MyApi")
    .WriteTo.Console());

builder.Services.AddOpenTelemetry()
    .ConfigureResource(resource => resource
        .AddService(
            serviceName: "MyApi",
            serviceVersion: typeof(Program).Assembly.GetName().Version?.ToString()))
    .WithTracing(tracing =>
    {
        tracing
            .AddAspNetCoreInstrumentation()
            .AddHttpClientInstrumentation()
            .AddSqlClientInstrumentation()
            .AddSource("MyApi")
            .SetSampler(new ParentBasedSampler(new TraceIdRatioBasedSampler(0.25)))
            .AddOtlpExporter();
    })
    .WithMetrics(metrics =>
    {
        metrics
            .AddAspNetCoreInstrumentation()
            .AddHttpClientInstrumentation()
            .AddRuntimeInstrumentation()
            .AddMeter("MyApi")
            .AddOtlpExporter();
    });

builder.Services.AddHealthChecks()
    .AddCheck("self",
        () => Microsoft.Extensions.Diagnostics.HealthChecks.HealthCheckResult.Healthy(),
        tags: new[] { "live" })
    .AddCheck<SqlServerReadinessHealthCheck>("sql", tags: new[] { "ready" })
    .AddCheck<RedisReadinessHealthCheck>("redis", tags: new[] { "ready" });

var app = builder.Build();

// Push trace context into the Serilog LogContext for every request.
app.Use(async (context, next) =>
{
    var traceId = Activity.Current?.TraceId.ToString();
    var spanId = Activity.Current?.SpanId.ToString();

    using (LogContext.PushProperty("TraceId", traceId))
    using (LogContext.PushProperty("SpanId", spanId))
    {
        await next();
    }
});

app.MapHealthChecks("/health/live", new Microsoft.AspNetCore.Diagnostics.HealthChecks.HealthCheckOptions
{
    Predicate = check => check.Tags.Contains("live")
});

app.MapHealthChecks("/health/ready", new Microsoft.AspNetCore.Diagnostics.HealthChecks.HealthCheckOptions
{
    Predicate = check => check.Tags.Contains("ready")
});

app.MapGet("/", () => "OK");

app.Run();
```
The interesting part is that, with a configuration like this, you already get a serious observability baseline: incoming requests traced, outgoing HTTP calls observable, SQL queries intercepted, runtime and application metrics exported via OTLP, structured logs written to console or external sinks, plus two clear endpoints for orchestrators and load balancers.
OTLP exporter: is it always worth it? (Spoiler: YES)
OTLP is, today, the most reasonable choice if you do not want to tie yourself too early to a specific vendor or stack. It allows you to send traces and metrics to an OpenTelemetry collector, to an Aspire Dashboard locally, or to platforms such as Grafana through dedicated pipelines. During development, you can also keep instrumentation active without necessarily having a collector always running: the code will continue to produce local Activity instances and metrics, which are still useful for diagnostics and testing.
In practical terms, the OTLP exporter is not just an “enterprise” choice: it is also the cleanest way to avoid redesigning your whole telemetry setup every time you change backend or add a new environment.
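When you do need to point the exporter at a specific collector, the endpoint can be set either through the standard OTEL_EXPORTER_OTLP_ENDPOINT environment variable or explicitly in code. A minimal sketch (the localhost address and gRPC port 4317 are just the conventional defaults, to be adjusted per environment):

```csharp
tracing.AddOtlpExporter(options =>
{
    // Explicit configuration; without this, the exporter falls back to
    // OTEL_EXPORTER_OTLP_ENDPOINT and related environment variables.
    options.Endpoint = new Uri("http://localhost:4317");
});
```

Keeping the endpoint out of code and in environment variables is usually the more portable choice, since the same build can then target the Aspire Dashboard locally and a real collector in production.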
Tracing: enabling what you actually need
One of the most common mistakes is thinking you “have tracing” just because you added OpenTelemetry. In reality, the quality of the outcome depends on the instrumentations you enable and on the custom sources you add for your application domain.
In an ASP.NET Core Web API, the three minimum activations I consider truly important are these:
- ASP.NET Core / Kestrel to trace incoming HTTP requests;
- HttpClient to observe external dependencies and outbound calls;
- EF Core or SqlClient to understand where time is being spent on the database.
It is worth being clear here. If your application runs on ASP.NET Core, the server-side instrumentation covers incoming requests handled by the web stack, which in practice also includes traffic managed by Kestrel. For the data layer, you can stop at SqlClient if observing raw SQL queries is enough, or add dedicated EF Core instrumentation when you want visibility closer to the ORM layer.
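As a sketch, swapping in ORM-level visibility means referencing the EF Core instrumentation package (OpenTelemetry.Instrumentation.EntityFrameworkCore on NuGet, still prerelease at the time of writing) and registering it in the tracing pipeline, alongside or instead of SqlClient:

```csharp
.WithTracing(tracing =>
{
    tracing
        .AddAspNetCoreInstrumentation()
        .AddHttpClientInstrumentation()
        // Raw ADO.NET command visibility...
        .AddSqlClientInstrumentation()
        // ...and/or ORM-level operations; requires the
        // OpenTelemetry.Instrumentation.EntityFrameworkCore package.
        .AddEntityFrameworkCoreInstrumentation()
        .AddOtlpExporter();
});
```

Enabling both at once is possible, but be aware that you will then see two layers of spans for the same database work.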
When a project includes critical operations, it is almost always worth adding a custom ActivitySource as well. That is where you start seeing the domain itself, not just the infrastructure.
```csharp
public static class Observability
{
    public const string ActivitySourceName = "MyApi";

    public static readonly ActivitySource ActivitySource = new(ActivitySourceName);
}

app.MapPost("/orders", async (OrderRequest request, ILogger<Program> logger) =>
{
    using var activity = Observability.ActivitySource.StartActivity("orders.create");
    activity?.SetTag("order.customer_id", request.CustomerId);
    activity?.SetTag("order.items_count", request.Items.Count);

    logger.LogInformation("Creating order for customer {CustomerId}", request.CustomerId);

    // business logic...

    return Results.Accepted();
});
```
This is one of those steps that really makes a difference in Grafana Tempo or in any serious trace backend: seeing an HTTP trace is useful, but seeing an application span with domain tags is much more useful.
Custom metrics with Meter and Counter<T>
The built-in metrics from ASP.NET Core, HttpClient and the runtime are extremely useful, but almost never enough. Sooner or later you need to measure something specific: processed orders, completed jobs, failed webhooks, discarded documents, denied logins, retries performed. That is where Meter and Counter<T> come into play.
The pattern is simple: define an application meter and one or more metric instruments, then increment them at the points in the flow that have operational or diagnostic value.
```csharp
using System.Diagnostics.Metrics;

public static class AppMetrics
{
    public const string MeterName = "MyApi";

    public static readonly Meter Meter = new(MeterName);

    public static readonly Counter<int> OrdersCreated = Meter.CreateCounter<int>(
        "orders.created",
        unit: "{order}",
        description: "Number of successfully created orders");

    public static readonly Counter<int> OrdersFailed = Meter.CreateCounter<int>(
        "orders.failed",
        unit: "{order}",
        description: "Number of failed order creations");
}
```
Usage:
```csharp
AppMetrics.OrdersCreated.Add(1,
    new KeyValuePair<string, object?>("channel", "api"),
    new KeyValuePair<string, object?>("tenant_id", tenantId));

AppMetrics.OrdersFailed.Add(1,
    new KeyValuePair<string, object?>("reason", "validation"),
    new KeyValuePair<string, object?>("tenant_id", tenantId));
```
This is the kind of telemetry that, in production, is worth much more than a large number of verbose logs. A counter with well-chosen tags allows you to spot anomalies immediately, build meaningful Grafana dashboards and configure alerts on real trends or spikes. The important thing is not to overdo tag cardinality: unique identifiers, email addresses, GUIDs or values that vary too much are almost always a bad idea.
Serilog: structured, readable and queryable logs
OpenTelemetry does not make logs useless. It simply puts them back in their proper place. Logs are still needed, very much so, but they must be structured, not written as indistinct free text. In an ASP.NET Core 10 project, Serilog remains an excellent choice for exactly this reason.
The minimum configuration I usually recommend is very simple:
- ReadFrom.Configuration(...) to centralize levels and sinks;
- Enrich.FromLogContext() to collect contextual properties;
- JSON output, or at least structured output, to console, files, Seq, Loki or other sinks;
- noise reduction for framework namespaces such as Microsoft and System.
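As a sketch, an appsettings.json section covering those points might look like this; it assumes the Serilog.Settings.Configuration and Serilog.Formatting.Compact packages, and the levels shown are illustrative starting points, not recommendations for every project:

```json
{
  "Serilog": {
    "MinimumLevel": {
      "Default": "Information",
      "Override": {
        "Microsoft": "Warning",
        "System": "Warning"
      }
    },
    "WriteTo": [
      {
        "Name": "Console",
        "Args": {
          "formatter": "Serilog.Formatting.Compact.CompactJsonFormatter, Serilog.Formatting.Compact"
        }
      }
    ]
  }
}
```

The compact JSON formatter keeps console output machine-parseable, which matters once a log shipper such as Promtail or an agent is reading it.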
The real quality leap, though, happens when logs speak the same language as traces: the same business identifiers, the same conceptual tags, the same consistent property names. At that point, correlation stops being theoretical and becomes genuinely usable.
TraceId-to-Serilog correlation through LogContext
This is one of those details many teams skip at the beginning and then find themselves chasing in production. If you want to move quickly from a trace to a log entry, or from logs back to the full trace, you need to include at least the TraceId in logging, and ideally the SpanId as well.
Serilog can include this information when rendering logs, but in practice it is also worth pushing it explicitly into the LogContext inside the HTTP pipeline, so it is always available in structured sinks and in Loki queries or similar tools.
```csharp
app.Use(async (context, next) =>
{
    var activity = Activity.Current;

    using (LogContext.PushProperty("TraceId", activity?.TraceId.ToString()))
    using (LogContext.PushProperty("SpanId", activity?.SpanId.ToString()))
    using (LogContext.PushProperty("RequestPath", context.Request.Path.Value))
    {
        await next();
    }
});
```
Then, at the important points in the code:
```csharp
logger.LogInformation(
    "Order {OrderId} created for tenant {TenantId}",
    orderId,
    tenantId);
```
This way, logs stay clean while still carrying the context needed to move from Grafana Loki to Tempo and back. It is a small addition, but one that pays off almost immediately when you need to reconstruct a real incident.
Sampling: should you trace everything? (Spoiler: NO)
At the beginning, it is tempting to collect 100% of traces. Locally, that can make sense. In production, almost never. Volume grows fast, costs go up, and the signal-to-noise ratio gets worse. That is why sampling should be planned early, not afterwards.
In most cases, the most sensible combination is parent-based + ratio-based. In practice:
- if a parent trace is already sampled, child spans follow that decision to preserve end-to-end consistency;
- for new traces, a percentage is applied, such as 10%, 25% or 50%, depending on load and diagnostic value.
A reasonable example:
```csharp
tracing.SetSampler(
    new ParentBasedSampler(
        new TraceIdRatioBasedSampler(0.10)));
```
This setting avoids a fairly common mistake: using only a ratio-based sampler and ending up with broken or inconsistent distributed traces across services. In a system with multiple hops, preserving the parent decision is almost always the correct choice.
Health Checks: /health/live and /health/ready
Health checks are often treated as a formality, but they are not. Separating /health/live from /health/ready makes a fundamental distinction explicit:
- liveness tells you whether the process is alive;
- readiness tells you whether the application is actually ready to receive traffic.
They may look similar, but in practice they answer very different questions. An application can be alive and not ready, for example because initialization is not complete yet, because it cannot connect to the database, or because a critical dependency is degraded.
The structure I usually recommend is this:
```csharp
builder.Services.AddHealthChecks()
    .AddCheck("self",
        () => Microsoft.Extensions.Diagnostics.HealthChecks.HealthCheckResult.Healthy(),
        tags: new[] { "live" })
    .AddCheck<SqlServerReadinessHealthCheck>("sql", tags: new[] { "ready" })
    .AddCheck<RedisReadinessHealthCheck>("redis", tags: new[] { "ready" })
    .AddCheck<RabbitMqReadinessHealthCheck>("rabbitmq", tags: new[] { "ready" });

app.MapHealthChecks("/health/live", new Microsoft.AspNetCore.Diagnostics.HealthChecks.HealthCheckOptions
{
    Predicate = check => check.Tags.Contains("live")
});

app.MapHealthChecks("/health/ready", new Microsoft.AspNetCore.Diagnostics.HealthChecks.HealthCheckOptions
{
    Predicate = check => check.Tags.Contains("ready")
});
```
This setup works well in Kubernetes, behind load balancers, in containers and, more generally, anywhere you want to avoid an instance being considered available just because it is still responding at process level. In practical terms: it is not enough to be running, you also need to be ready.
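In Kubernetes, the two endpoints map directly onto pod probes. A hedged sketch, where the port number and all timing values are placeholders to tune for your service:

```yaml
livenessProbe:
  httpGet:
    path: /health/live
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 15
readinessProbe:
  httpGet:
    path: /health/ready
    port: 8080
  periodSeconds: 10
  failureThreshold: 3
```

A failing readiness probe removes the pod from service endpoints without restarting it; a failing liveness probe restarts the container. That asymmetry is exactly why the two checks must not share the same predicate.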
Grafana dashboards and tracing in Tempo
Once traces and metrics have been exported via OTLP, the next step is visualizing them in a meaningful way. Grafana and Tempo work very well here, especially when telemetry has been designed with at least some discipline.
Some panels I find genuinely useful:
1. HTTP overview
- request rate by endpoint;
- p50, p95 and p99 latency;
- 4xx/5xx error rate;
- top endpoints by average duration.
2. External dependencies
- HttpClient call latency by remote host;
- error rate by external provider;
- slow traces filtered by dependency.
3. Database
- SQL query or EF Core operation duration;
- number of database errors;
- traces with the highest impact on the data access layer.
4. Custom domain metrics
- orders.created and orders.failed by tenant or channel;
- completed/failed jobs by type;
- operational trends over short and long time windows.
5. Health and availability
- readiness state over time;
- critical check failure counts;
- correlation between readiness flapping, application errors and latency spikes.
In Tempo, on the other hand, the most useful part is almost always the ability to navigate slow or anomalous traces and correlate them with logs. If you configured TraceId correctly in Serilog logs and linked Tempo to Loki in Grafana, you can move from a span to the related logs with one or two clicks. That is the point where the system stops being merely “nice to look at” and starts becoming genuinely useful for troubleshooting.
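The Loki-to-Tempo jump is typically wired up through a derived field on the Loki data source, which extracts the TraceId from each log line and turns it into a Tempo link. A provisioning sketch, where the regex assumes structured JSON logs carrying a TraceId property as shown earlier, and the datasourceUid value is a placeholder for your actual Tempo data source uid:

```yaml
apiVersion: 1
datasources:
  - name: Loki
    type: loki
    url: http://loki:3100
    jsonData:
      derivedFields:
        - name: TraceID
          # Matches a "TraceId":"..." property in JSON-formatted log lines.
          matcherRegex: '"TraceId"\s*:\s*"(\w+)"'
          # $$ escapes the variable so provisioning does not expand it.
          url: '$${__value.raw}'
          datasourceUid: tempo
```

The same link can also be configured interactively in the Grafana UI under the Loki data source settings, which is often easier for a first experiment.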
Naming, tags and cardinality
It is worth closing with a very practical observation: a large part of observability quality comes down to details. Emitting telemetry is not enough; you need to emit it well.
So:
- use clear names for meters, activity sources and metric instruments;
- keep tags consistent across traces, metrics and logs;
- avoid excessively high cardinality in metric tags;
- do not log everything: log what actually helps you understand the system;
- do not use health probes as a substitute for metrics and tracing.
These are simple rules, but they make a huge difference over time. And, as often happens, the hard part is not adding a library: it is maintaining discipline as the project grows.
Conclusions
The overall picture is quite clear: OpenTelemetry for traces and metrics, Serilog for structured logging, separate health checks for liveness and readiness, plus a handful of custom metrics and solid correlation between signals.
In ASP.NET Core 10, this is a mature, practical combination that works well both for new projects and for existing ones that want to raise the bar on observability. The interesting part is that you do not need to build a huge platform to get value out of it: you just need to start with the right components and make them work well together.
When you are looking at a slow trace in Tempo, the related TraceId in Serilog logs, a custom metric flagging an anomaly and a /health/ready endpoint starting to degrade, you realize that you are no longer just “collecting data”: you are finally observing the real behavior of your application.
