CodeWithAbdessamad

Logging

Centralized Logs

In distributed systems, logs are the lifeblood of understanding system behavior. However, when your system scales, logs become scattered across multiple nodes, making it difficult to get a unified view. This is where centralized logging comes in — a critical practice for observability that aggregates logs from all parts of your system into a single, accessible location. Without it, debugging becomes a nightmare: imagine manually checking logs from 100 different servers to trace a single error. Centralized logging transforms this chaos into actionable insights.

Why Centralized Logs Matter

Centralized logging solves the fundamental problem of distributed log fragmentation. In microservices architectures, each service writes logs to its own file, creating a fragmented landscape. When an incident occurs, you need to correlate logs across services, time, and users — something impossible without a unified view.

For example, consider a payment system where:

  • Service A (user authentication) logs user_id=12345
  • Service B (payment processing) logs transaction_id=67890
  • Service C (notification) logs status=failure

Without centralized logging, you’d have to manually cross-reference these logs across three different files. With centralized logs, you can instantly see the chain of events: userid=12345transactionid=67890status=failure — all in one query. This is the power of centralized logging.

Core Components of a Centralized Logging Solution

A robust centralized logging solution requires five interconnected components:

Component Purpose Example Tool
Log Aggregation Collects logs from all sources Fluent Bit, Logstash
Log Storage Durably stores logs for long-term access Elasticsearch, S3
Log Processing Enriches and transforms logs Kibana, OpenTelemetry
Log Querying Allows efficient log searches Elasticsearch Query DSL
Alerting Triggers notifications for critical events PagerDuty, Slack

Let’s explore each component with practical examples.

Log Aggregation

The first step is aggregating logs from distributed services. Fluent Bit is a lightweight, high-performance log shipper ideal for this purpose.

Example configuration for a Node.js application:

<code class="language-bash"># Fluent Bit configuration (fluent-bit.conf)
<p>[INPUT]</p>
<p>    Name tail</p>
<p>    Path /var/log/myapp.log</p>
<p>    Tag myapp</p>

<p>[OUTPUT]</p>
<p>    Name elasticsearch</p>
<p>    Hosts ["elasticsearch:9200"]</p>
<p>    Index "logs-%Y-%m-%d"</code>

This config tells Fluent Bit to:

  1. Read logs from /var/log/myapp.log
  2. Send them to Elasticsearch with a date-based index
  3. Maintain the myapp tag for service identification

Log Storage

After aggregation, logs need durable storage. Elasticsearch is a popular choice because it:

  • Indexes logs for fast searching
  • Handles large volumes of structured data
  • Supports real-time analytics

Example index creation for structured logs:

<code class="language-bash"># Create index with proper mapping
<p>PUT /logs-2024-05-01/_mapping</p>
<p>{</p>
<p>  "properties": {</p>
<p>    "timestamp": { "type": "date" },</p>
<p>    "service": { "type": "keyword" },</p>
<p>    "level": { "type": "keyword" },</p>
<p>    "message": { "type": "text" },</p>
<p>    "user_id": { "type": "keyword" }</p>
<p>  }</p>
<p>}</code>

Log Processing and Enrichment

Raw logs often lack context. Enrichment adds critical metadata (like user IDs) to make logs actionable. This is typically done via pipelines.

Example enriched log entry:

<code class="language-json">{
<p>  "timestamp": "2024-05-01T12:34:56Z",</p>
<p>  "service": "auth-service",</p>
<p>  "level": "info",</p>
<p>  "message": "User logged in",</p>
<p>  "user_id": "12345",</p>
<p>  "ip_address": "192.168.1.100"</p>
<p>}</code>

This structured format enables powerful queries while preserving security.

Log Querying

With logs indexed, you can query them in real time. Elasticsearch’s query language allows complex searches.

Example: Find failed authentication attempts from the last 5 minutes:

<code class="language-json">GET /logs-2024-05-01/_search
<p>{</p>
<p>  "query": {</p>
<p>    "bool": {</p>
<p>      "must": [</p>
<p>        { "range": { "timestamp": { "gte": "now-5m", "lte": "now" } } },</p>
<p>        { "term": { "service": "auth-service" } },</p>
<p>        { "term": { "level": "error" } }</p>
<p>      ]</p>
<p>    }</p>
<p>  }</p>
<p>}</code>

This query returns all authentication errors within the specified timeframe — critical for incident response.

Alerting

Proactive alerts prevent minor issues from becoming major problems. Set up rules to trigger notifications when specific patterns occur.

Example alert rule for payment failures:

<code class="language-yaml"># Alert rule configuration (using Prometheus)
<ul>
<li>alert: PaymentFailure</li>
<p></ul>  expr: rate(payment<em>errors</em>total[5m]) > 0.1</p>
<p>  for: 2m</p>
<p>  labels:</p>
<p>    severity: critical</p>
<p>  annotations:</p>
<p>    summary: "Payment failures exceeding 10% in last 5 minutes"</p>
<p>    description: "Service: payment-service | Error rate: {{ $value }}"</code>

This rule triggers an alert if payment errors exceed 10% within a 5-minute window.

Implementing Centralized Logging: Step-by-Step

Here’s a practical implementation guide for a small distributed system:

  1. Install a log shipper: Use Fluent Bit for lightweight deployment
  2. Configure log collection: Add your services to the shipper
  3. Set up storage: Create an Elasticsearch cluster
  4. Enrich logs: Add context to logs (user IDs, IPs)
  5. Build queries: Start with simple searches before complex analytics

Real-world example: A payment microservice using centralized logging

<code class="language-javascript">// Node.js service with Winston logger
<p>const winston = require('winston');</p>

<p>const logger = winston.createLogger({</p>
<p>  level: 'info',</p>
<p>  transports: [</p>
<p>    new winston.transports.File({ filename: 'logs/error.log' }),</p>
<p>    new winston.transports.Http({ url: 'http://central-logger:8080/log' })</p>
<p>  ]</p>
<p>});</p>

<p>// Log with context</p>
<p>logger.info('Payment processed', {</p>
<p>  user_id: '12345',</p>
<p>  transaction_id: 'PAY-67890',</p>
<p>  service: 'payment-service'</p>
<p>});</code>

This code sends structured logs to your centralized logger. The Http transport ensures logs reach the aggregation layer without service overhead.

Best Practices for Centralized Logging

To avoid common pitfalls and maximize value:

  1. Use structured logs: Always log in JSON format (not plain text) for machine readability. Example:
<code class="language-json">   {</p>
<p>     "timestamp": "2024-05-01T12:34:56Z",</p>
<p>     "service": "payment-service",</p>
<p>     "level": "info",</p>
<p>     "message": "Payment processed",</p>
<p>     "transaction_id": "PAY-67890",</p>
<p>     "user_id": "12345"</p>
<p>   }</code>

  1. Rotate logs: Implement log rotation to prevent file bloat. Fluent Bit supports this via rotate settings.
  1. Secure logs: Encrypt logs in transit and at rest. Use IAM roles for access control.
  1. Monitor your pipeline: Track shipper health metrics (e.g., fluent-bit-healthy in Prometheus).
  1. Start small: Begin with one service before scaling to the entire system.

Summary

Centralized logging transforms fragmented logs into a unified observability layer. By aggregating, storing, processing, querying, and alerting logs, you gain the ability to quickly diagnose issues, understand system behavior, and make data-driven decisions. Implementing a well-structured centralized logging solution with components like aggregation, storage, processing, querying, and alerting will transform your observability capabilities — turning logs from a burden into your most valuable system intelligence.

🌟 Remember: In the world of distributed systems, centralized logs are your eyes and ears.