Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Propagated traces have incorrect start time #698

Open
omikader opened this issue Oct 1, 2024 · 3 comments
Open

Propagated traces have incorrect start time #698

omikader opened this issue Oct 1, 2024 · 3 comments
Labels
bug Report a bug

Comments

@omikader
Copy link

omikader commented Oct 1, 2024

Description

We are seeing strange delays in the start time for propagated traces emitted from our Flask API. Notice how the HTTP GET trace, which is emitted by our React SPA, starts and finishes before the traces emitted by the Flask API server seem to even start.

Image

Steps to reproduce

  • Add Faro to the project and enable propagateTraceHeaderCorsUrls to ensure that the Traceparent header is sent to the backend server
  • Instrument a Flask server using OpenTelemetry's zero-code autoinstrumentation which automatically creates traces, metrics, and logs for popular Python libraries like flask and mysql-connector-python
  • Click around in the React app and verify that a value for Traceparent is also set in the request headers to the API server. This allows us to successfully correlate the traces emitted in Flask with the original web request from the React app.

Expected behavior

  • I expected the reported time in Grafana for this trace to be identical to the time reported by the network tab in the browser
  • I expected any propagated traces to start + finish within the timing of the top level trace associated with the web request from the dor-omar-frontend app

Actual behavior

  • The event sent to the collector suggests that the request took 66ms

Image

  • The network tab suggests that the request took 28ms + 29ms = 57ms. This is inline with the 56ms reported by Grafana for the light-blue trace

Image

  • The top-level suggests that it took 391.32ms which I believe to be due to the delayed reporting of the propagated traces.

Image

Environment

  • SDK version: 1.10.0
  • SDK instrumentations: Web SDK, React, React Router, Web Tracing
  • Device type: laptop
  • Device name: MacBook Air 13"
  • OS: macOS Sonoma 14.6.1
  • Browser: Chrome 128.0.6613.120
@omikader omikader added the bug Report a bug label Oct 1, 2024
@JordiPolo
Copy link

JordiPolo commented Oct 2, 2024

I think this happens because the clock in the laptop and the clock in the server are 200ms apart.
Consumer laptops are not too hardcore about syncing continuously their clocks, being a couple of seconds up or down compared with some atomic clock in the atmosphere or something, does not make a difference to them.
Servers tend to be perfectly synced within the datacenter so this effect does not happen among services.

@omikader
Copy link
Author

omikader commented Oct 2, 2024

Hi @JordiPolo, thanks for the quick response! Given that this is the case, would you say it is not valuable to propagate traces from a client's device to actions taken on the server? Or perhaps is there a canonical strategy for resetting the start time for the client-side trace once it is captured by the Faro receiver?

@JordiPolo
Copy link

Maybe at some point there is a way to know the diff in clocks an a solution. Right now this is what you will get. To me still makes sense to have this information as long as the people who would be reading the trace understand why this looks like that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Report a bug
Projects
None yet
Development

No branches or pull requests

2 participants