Serverless computing has revolutionized how we build and deploy applications. The promise of infinite scalability, pay-per-use pricing, and reduced operational overhead is compelling. But as many developers have discovered, with great power comes a significant challenge: when things go wrong, how do you debug a system that's distributed, ephemeral, and seemingly opaque?
Wading through mountains of logs from dozens of different function invocations is a frustrating and time-consuming process. You're looking for a needle in a haystack, without a clear picture of how one event led to another.
This is where distributed tracing comes in. It transforms the "black box" of serverless into a transparent, observable system. And with a tool like trace.do, implementing it is easier than you think.
Traditional debugging methods fall short in a serverless world. The core challenges are:

- Ephemeral execution: functions spin up, run, and disappear in milliseconds, leaving no long-lived server to inspect.
- Distributed flows: a single request can fan out across many functions, queues, and managed services.
- Fragmented logs: each invocation writes its own logs, with no built-in thread connecting one event to the next.
These issues lead to longer resolution times, frustrated developers, and unhappy users. The solution is to shift from reactive log-diving to proactive observability with distributed tracing.
Imagine you're tracking a package from a warehouse to your doorstep. You can see when it left the warehouse, when it arrived at the local distribution center, when it was loaded onto the truck, and when it was delivered.
Distributed tracing does the same thing for your application requests.
It provides a unified view of a request's entire lifecycle as it travels through your services. This complete journey is called a trace. Each step in the journey—an API call, a database query, or a function execution—is a span.
By connecting these spans, trace.do builds a detailed, chronological map of your request flow. This allows you to instantly pinpoint errors, identify performance bottlenecks, and understand the real-world behavior of your system.
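To make the model concrete, here is a minimal sketch of a span as a data structure. This is an illustration of the general tracing model, not the actual types from the trace.do SDK:

// A simplified mental model of a span (illustrative only, not the @do/trace types)
interface Span {
  traceId: string; // shared by every span in the same request
  spanId: string; // unique ID for this individual step
  parentSpanId?: string; // links this step to the one that caused it
  name: string; // e.g. 'processOrderHandler' or 'completePayment'
  startTime: number; // when the step began
  endTime: number; // when it finished
  attributes: Record<string, string>; // custom context, like an order ID
}

Every span carries the same traceId, and the parentSpanId links form the chronological map described above.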
Let's see how trace.do brings effortless observability to a common serverless pattern: an order processing workflow.
Imagine we have a function, processOrder, that is triggered when a customer places an order. This function then calls two other services (which could be other functions or microservices): one to handle payment and another to dispatch the shipment.
Our initial AWS Lambda handler might look something like this:
// handler.ts
import { completePayment, dispatchShipment } from './downstream-services';

export async function handler(event: any) {
  const { orderId } = JSON.parse(event.body);
  console.log(`Starting to process order: ${orderId}`);

  try {
    const payment = await completePayment(orderId);
    console.log(`Payment processed for order: ${orderId}`);

    await dispatchShipment(orderId);
    console.log(`Shipment dispatched for order: ${orderId}`);

    return { statusCode: 200, body: JSON.stringify({ success: true }) };
  } catch (error) {
    console.error(`Failed to process order ${orderId}:`, error);
    return { statusCode: 500, body: JSON.stringify({ success: false }) };
  }
}
If an order fails, we have to sift through CloudWatch logs to piece together what happened. If it's just slow, we have no easy way of knowing whether the delay was in the payment service or the shipping service.
Now, let's add trace.do. Our approach becomes one of Observability as Code. We use a simple trace.span() wrapper to automatically capture the entire execution and add meaningful context.
// handler.ts
// Import the trace.do SDK
import { trace } from '@do/trace';
import { completePayment, dispatchShipment } from './downstream-services';

export async function handler(event: any) {
  const { orderId } = JSON.parse(event.body);

  // Automatically trace the entire handler execution
  return trace.span('processOrderHandler', async (span) => {
    // Add custom attributes to the span for easy filtering and analysis
    span.setAttribute('order.id', orderId);
    span.setAttribute('cloud.region', process.env.AWS_REGION);

    try {
      // The trace context is automatically propagated to downstream calls
      const payment = await completePayment(orderId);
      // Add events to mark specific points in time within the span
      span.addEvent('Payment processed', { paymentId: payment.id });

      await dispatchShipment(orderId);
      span.addEvent('Shipment dispatched');

      return { statusCode: 200, body: JSON.stringify({ success: true }) };
    } catch (error) {
      // Errors are automatically captured on the span
      span.recordException(error);
      span.setStatus({ code: 'error', message: error.message });
      throw error; // Re-throw to let Lambda handle the failure
    }
  });
}
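Because the trace context flows to downstream calls, those services can participate in the same trace. Here is a sketch of what completePayment might look like, assuming it uses the same trace.span wrapper; the payment client is a hypothetical stub standing in for a real API:

// downstream-services.ts — an illustrative sketch, not the real implementation
import { trace } from '@do/trace';

// Hypothetical payment client stub, standing in for a real payment API
const paymentApi = {
  charge: async (orderId: string) => ({ id: `pay_${orderId}` }),
};

export async function completePayment(orderId: string) {
  // Runs as a child span of 'processOrderHandler' within the same trace
  return trace.span('completePayment', async (span) => {
    span.setAttribute('order.id', orderId);
    const payment = await paymentApi.charge(orderId);
    span.setAttribute('payment.id', payment.id);
    return payment;
  });
}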
With just a few lines of code, we've gained complete clarity:

- The entire handler execution is captured as a single parent span.
- Custom attributes like order.id and cloud.region make traces easy to filter and analyze.
- Events mark exactly when the payment was processed and the shipment dispatched.
- Exceptions are recorded on the span automatically, with the span marked as failed.
Now, instead of guessing, you can look at the trace and see exactly how long each step took. Suppose the handler span shows two seconds of total time, with nearly all of it spent inside the completePayment call and only a sliver in dispatchShipment. The bottleneck is obvious. You know exactly where to focus your optimization efforts.
Worrying about vendor lock-in? Don't be. trace.do is built on OpenTelemetry (OTel), the open-source industry standard for observability. This means you get:

- Instrumentation that follows an open, widely adopted standard rather than a proprietary format.
- Compatibility with the broader OTel ecosystem of libraries and auto-instrumentation.
- The freedom to export your trace data to any OTel-compatible backend.
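For example, because the data follows the OTel standard, a vanilla OpenTelemetry setup can ship traces to any OTLP-compatible collector. Here is a minimal sketch using the standard OpenTelemetry Node SDK; the service name and collector URL are placeholders:

// tracing.ts — a minimal, standard OpenTelemetry Node setup (independent of trace.do)
import { NodeSDK } from '@opentelemetry/sdk-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';

const sdk = new NodeSDK({
  serviceName: 'order-processing', // placeholder service name
  traceExporter: new OTLPTraceExporter({
    // Placeholder endpoint: any OTLP-compatible backend will do
    url: 'https://your-collector.example.com/v1/traces',
  }),
});

sdk.start();

The same spans your code emits today would flow to whichever backend you point the exporter at, which is exactly what the open standard buys you.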
Stop flying blind. Serverless applications demand a modern approach to debugging and performance monitoring. By embracing distributed tracing with trace.do, you turn complex, opaque workflows into clear, actionable insights.
The code-driven, agentic workflow of trace.do integrates observability directly into your development process, making it a natural part of building resilient, high-performance applications.
Ready to bring complete clarity to your services? Explore trace.do today and transform your debugging workflow.