Centralized Logs
In distributed systems, logs are the lifeblood of understanding system behavior. However, when your system scales, logs become scattered across multiple nodes, making it difficult to get a unified view. This is where centralized logging comes in — a critical practice for observability that aggregates logs from all parts of your system into a single, accessible location. Without it, debugging becomes a nightmare: imagine manually checking logs from 100 different servers to trace a single error. Centralized logging transforms this chaos into actionable insights.
Why Centralized Logs Matter
Centralized logging solves the fundamental problem of distributed log fragmentation. In microservices architectures, each service writes logs to its own file, creating a fragmented landscape. When an incident occurs, you need to correlate logs across services, time, and users — something impossible without a unified view.
For example, consider a payment system where:
- Service A (user authentication) logs `user_id=12345`
- Service B (payment processing) logs `transaction_id=67890`
- Service C (notification) logs `status=failure`
Without centralized logging, you’d have to manually cross-reference these logs across three different files. With centralized logs, you can instantly see the chain of events: `user_id=12345` → `transaction_id=67890` → `status=failure` — all in one query. This is the power of centralized logging.
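Once all three services' entries land in one store, tracing that chain is a simple join on shared fields. A minimal sketch in plain Node.js (the log entries here are hypothetical stand-ins for what a centralized store would return):

```javascript
// Hypothetical entries as they would appear in a centralized log store.
const logs = [
  { service: 'auth-service',         user_id: '12345' },
  { service: 'payment-service',      user_id: '12345', transaction_id: '67890' },
  { service: 'notification-service', transaction_id: '67890', status: 'failure' },
];

// Trace the chain: user -> transaction -> final status.
function traceUser(entries, userId) {
  const payment = entries.find(e => e.user_id === userId && e.transaction_id);
  if (!payment) return null;
  const outcome = entries.find(
    e => e.transaction_id === payment.transaction_id && e.status
  );
  return {
    user_id: userId,
    transaction_id: payment.transaction_id,
    status: outcome ? outcome.status : undefined,
  };
}

console.log(traceUser(logs, '12345'));
// → { user_id: '12345', transaction_id: '67890', status: 'failure' }
```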
Core Components of a Centralized Logging Solution
A robust centralized logging solution requires five interconnected components:
| Component | Purpose | Example Tool |
|---|---|---|
| Log Aggregation | Collects logs from all sources | Fluent Bit, Logstash |
| Log Storage | Durably stores logs for long-term access | Elasticsearch, S3 |
| Log Processing | Enriches and transforms logs | Logstash, OpenTelemetry Collector |
| Log Querying | Allows efficient log searches | Elasticsearch Query DSL |
| Alerting | Triggers notifications for critical events | PagerDuty, Slack |
Let’s explore each component with practical examples.
Log Aggregation
The first step is aggregating logs from distributed services. Fluent Bit is a lightweight, high-performance log shipper ideal for this purpose.
Example configuration for a Node.js application:
```ini
# Fluent Bit configuration (fluent-bit.conf)
[INPUT]
    Name  tail
    Path  /var/log/myapp.log
    Tag   myapp

[OUTPUT]
    Name             es
    Match            myapp
    Host             elasticsearch
    Port             9200
    Logstash_Format  On
    Logstash_Prefix  logs
```
This config tells Fluent Bit to:
- Read logs from `/var/log/myapp.log`
- Send them to Elasticsearch with a date-based index
- Maintain the `myapp` tag for service identification
Log Storage
After aggregation, logs need durable storage. Elasticsearch is a popular choice because it:
- Indexes logs for fast searching
- Handles large volumes of structured data
- Supports real-time analytics
Example index creation for structured logs:
```json
PUT /logs-2024-05-01
{
  "mappings": {
    "properties": {
      "timestamp": { "type": "date" },
      "service":   { "type": "keyword" },
      "level":     { "type": "keyword" },
      "message":   { "type": "text" },
      "user_id":   { "type": "keyword" }
    }
  }
}
```
Log Processing and Enrichment
Raw logs often lack context. Enrichment adds critical metadata (like user IDs) to make logs actionable. This is typically done via pipelines.
Example enriched log entry:
```json
{
  "timestamp": "2024-05-01T12:34:56Z",
  "service": "auth-service",
  "level": "info",
  "message": "User logged in",
  "user_id": "12345",
  "ip_address": "192.168.1.100"
}
```
This structured format makes every field individually queryable, and sensitive fields (like `ip_address`) easy to identify for access control.
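In code, enrichment can be a small, pure transform applied before logs are shipped. A minimal sketch (the field names mirror the example above; the request context object is a hypothetical stand-in for whatever your framework provides):

```javascript
// Enrich a raw log event with request context before shipping it.
function enrich(event, ctx) {
  return {
    timestamp: new Date().toISOString(),
    service: ctx.service,
    ...event,              // level, message, and any event-specific fields
    user_id: ctx.userId,
    ip_address: ctx.ip,
  };
}

const enriched = enrich(
  { level: 'info', message: 'User logged in' },
  { service: 'auth-service', userId: '12345', ip: '192.168.1.100' }
);
console.log(JSON.stringify(enriched));
```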
Log Querying
With logs indexed, you can query them in real time. Elasticsearch’s query language allows complex searches.
Example: Find failed authentication attempts from the last 5 minutes:
```json
GET /logs-2024-05-01/_search
{
  "query": {
    "bool": {
      "must": [
        { "range": { "timestamp": { "gte": "now-5m", "lte": "now" } } },
        { "term": { "service": "auth-service" } },
        { "term": { "level": "error" } }
      ]
    }
  }
}
```
This query returns all authentication errors within the specified timeframe — critical for incident response.
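To make the `bool`/`must` semantics concrete: every clause must match, like a chain of `&&` filters. The same logic, reproduced locally over hypothetical in-memory entries:

```javascript
// Local equivalent of the query above: a time-range filter plus two term filters.
const now = Date.now();
const entries = [
  { timestamp: now - 2 * 60 * 1000, service: 'auth-service',    level: 'error' },
  { timestamp: now - 4 * 60 * 1000, service: 'auth-service',    level: 'info'  },
  { timestamp: now - 9 * 60 * 1000, service: 'auth-service',    level: 'error' }, // outside the window
  { timestamp: now - 1 * 60 * 1000, service: 'payment-service', level: 'error' },
];

const fiveMinAgo = now - 5 * 60 * 1000;
const hits = entries.filter(e =>
  e.timestamp >= fiveMinAgo &&      // range: now-5m .. now
  e.service === 'auth-service' &&   // term: service
  e.level === 'error'               // term: level
);
console.log(hits.length); // → 1
```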
Alerting
Proactive alerts prevent minor issues from becoming major problems. Set up rules to trigger notifications when specific patterns occur.
Example alert rule for payment failures:
```yaml
# Prometheus alerting rule
groups:
  - name: payment-alerts
    rules:
      - alert: PaymentFailure
        expr: rate(payment_errors_total[5m]) > 0.1
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Payment error rate above 0.1/s over the last 5 minutes"
          description: "Service: payment-service | Error rate: {{ $value }}"
```
This rule fires when the payment error rate, averaged over a 5-minute window, stays above 0.1 errors per second for two consecutive minutes.
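The rule has two parts — a rate threshold (`expr`) and a hold-down period (`for:`) that suppresses flapping. That evaluation logic can be sketched locally (the sample rates are hypothetical):

```javascript
// Fire only if the rate exceeds the threshold continuously for the
// hold-down period (expressed here as a number of consecutive samples).
function shouldFire(ratesPerSecond, threshold, holdDownSamples) {
  // ratesPerSecond: one rate sample per evaluation interval, oldest first.
  const recent = ratesPerSecond.slice(-holdDownSamples);
  return recent.length === holdDownSamples && recent.every(r => r > threshold);
}

// With 1-minute samples, "for: 2m" means two consecutive breaches.
console.log(shouldFire([0.05, 0.2, 0.3], 0.1, 2));  // → true
console.log(shouldFire([0.3, 0.05, 0.3], 0.1, 2));  // → false (not sustained)
```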
Implementing Centralized Logging: Step-by-Step
Here’s a practical implementation guide for a small distributed system:
1. Install a log shipper: Use Fluent Bit for lightweight deployment
2. Configure log collection: Add your services to the shipper
3. Set up storage: Create an Elasticsearch cluster
4. Enrich logs: Add context to logs (user IDs, IPs)
5. Build queries: Start with simple searches before complex analytics
Real-world example: A payment microservice using centralized logging
```javascript
// Node.js service with Winston logger
const winston = require('winston');

const logger = winston.createLogger({
  level: 'info',
  transports: [
    new winston.transports.File({ filename: 'logs/error.log' }),
    new winston.transports.Http({ host: 'central-logger', port: 8080, path: '/log' })
  ]
});

// Log with context
logger.info('Payment processed', {
  user_id: '12345',
  transaction_id: 'PAY-67890',
  service: 'payment-service'
});
```
This code sends structured logs to your centralized logger. Winston transports write asynchronously, so the Http transport forwards logs to the aggregation layer without blocking request handling.
Best Practices for Centralized Logging
To avoid common pitfalls and maximize value:
- Use structured logs: Always log in JSON format (not plain text) for machine readability. Example:

  ```json
  {
    "timestamp": "2024-05-01T12:34:56Z",
    "service": "payment-service",
    "level": "info",
    "message": "Payment processed",
    "transaction_id": "PAY-67890",
    "user_id": "12345"
  }
  ```
- Rotate logs: Implement log rotation (e.g., with logrotate) to prevent file bloat. Fluent Bit's tail input follows rotated files (see its `Rotate_Wait` setting).
- Secure logs: Encrypt logs in transit and at rest. Use IAM roles for access control.
- Monitor your pipeline: Track shipper health metrics (Fluent Bit exposes Prometheus metrics such as `fluentbit_output_errors_total`).
- Start small: Begin with one service before scaling to the entire system.
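The "use structured logs" practice above can be sketched as a tiny helper that emits one JSON object per line, so downstream tools can parse every field (field names follow the examples in this section):

```javascript
// Emit one JSON log line per event for machine readability.
function logJSON(service, level, message, fields = {}) {
  const entry = {
    timestamp: new Date().toISOString(),
    service,
    level,
    message,
    ...fields,
  };
  console.log(JSON.stringify(entry));
  return entry;
}

logJSON('payment-service', 'info', 'Payment processed', {
  transaction_id: 'PAY-67890',
  user_id: '12345',
});
```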
Summary
Centralized logging transforms fragmented logs into a unified observability layer. By aggregating, storing, processing, querying, and alerting on logs from every part of your system, you can quickly diagnose issues, understand system behavior, and make data-driven decisions — turning logs from a burden into your most valuable source of system intelligence.
🌟 Remember: In the world of distributed systems, centralized logs are your eyes and ears.