Running two tracer_providers causes CPU to spike #3694

twiclo · 2024-02-21T00:02:44Z


I have an application that takes HTTP calls and converts the information into a RabbitMQ publish. I use OpenTelemetry to report traces into Jaeger. When I run my application and blast it with HTTP calls, it consumes all the CPU on its VM after about 6 minutes, and the process has to be force-killed. With all of my OTel code commented out, I couldn't get the application to crash.

Because this application publishes to RMQ on behalf of multiple services, I couldn't rely on a single Resource; I needed each service to report into Jaeger as its own resource. Here's my code for setting this up:

import logging
import uuid
from abc import ABC
from os import environ as env  # assumed: `env` in the original is os.environ
from typing import Optional, Union

from aio_pika import Exchange, Message  # assumed: Exchange/Message come from aio_pika
from opentelemetry import trace, context
from opentelemetry.propagate import inject, extract
from opentelemetry.trace.status import Status, StatusCode
from opentelemetry.context import attach, detach
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.exporter.jaeger.thrift import JaegerExporter
from opentelemetry.sdk.trace.export import BatchSpanProcessor

log = logging.getLogger(__name__)  # assumed: `log` is a standard logger

class BasePublisher(ABC):
    name: str
    tracer_provider: Optional[TracerProvider] = None

    def set_tracer_provider(self):
        # Builds a new provider, Jaeger exporter and batch processor for this service
        tracer_provider = TracerProvider(resource=Resource.create({"service.name": self.name}))
        jaeger_exporter = JaegerExporter(agent_host_name=env['JAEGER_ADDRESS'], agent_port=int(env['JAEGER_PORT']))
        tracer_provider.add_span_processor(BatchSpanProcessor(jaeger_exporter))
        self.tracer_provider = tracer_provider

    async def publish(self, exchange: Exchange, msg: Union[str, bytes], routing_key: str, correlation_id: Optional[str] = None):
        # A default of str(uuid.uuid4()) is evaluated once, when the method is defined,
        # so every call would share one ID; generate a fresh one per call instead.
        correlation_id = correlation_id or str(uuid.uuid4())

        if self.tracer_provider is None:
            self.set_tracer_provider()

        tracer = trace.get_tracer(self.name, tracer_provider=self.tracer_provider)

        message = Message(...)

        log.info(f"Publishing message with routing key `{routing_key}`")
        with tracer.start_as_current_span("publish") as span:
            span.set_attribute("rmq.message.routing_key", routing_key)
            span.set_attribute("rmq.message.correlation_id", correlation_id)
            span.set_attribute("rmq.message.timestamp", str(message.timestamp))

            # My application also handles consuming. The context is injected into my RMQ message so it can be used
            # if a consume is a part of the trace
            inject(message.headers, context=context.get_current())

            await exchange.publish(message, routing_key, mandatory=False)
        log.info("Message published")

        handler.flush()  # `handler` is defined elsewhere in the full application
        return "Message published successfully"
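One likely reason `set_tracer_provider()` runs on every call: `tracer_provider` is declared with a class-level default of `None`, but `self.tracer_provider = ...` stores the provider on the instance, not the class. If a new publisher object is created for each request (an assumption; the stripped-down sample doesn't show how instances are made), every new object still sees `None`. This minimal standard-library sketch uses a hypothetical `Publisher` class, with `object()` standing in for a real `TracerProvider`:

```python
# Hypothetical stand-in: `object()` replaces a real TracerProvider so the
# example runs with only the standard library.
from typing import Optional


class Publisher:
    # Class-level default, mirroring `tracer_provider: Optional[TracerProvider] = None`.
    provider: Optional[object] = None

    def ensure_provider(self) -> object:
        if self.provider is None:
            # Assigning through `self` creates an *instance* attribute;
            # the class-level default stays None for every other instance.
            self.provider = object()
        return self.provider


first, second = Publisher(), Publisher()
p1 = first.ensure_provider()
p2 = second.ensure_provider()

print(p1 is p2)                       # False: each instance built its own provider
print(first.ensure_provider() is p1)  # True: reused only on the same instance
print(Publisher.provider is None)     # True: the class default never changed
```

So the "create once" check only holds per object; it does not make the provider a per-application singleton.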

Each service implements the BasePublisher class. I'm new to Python, but I think my issue is that I'm creating a lot of tracer_providers: after the 10 millionth one is created, things start to crash. Interestingly enough, there is no memory spike when the crash happens. My reason for thinking this is that every time I call MyService.publish(), I can see that it has to call self.set_tracer_provider(). I would expect that to be called only once per application run.

This is a stripped down sample of my code. I can provide more context if needed. Any help would be appreciated. Thanks.

twiclo · 2024-02-23T19:17:18Z


#3706
