Use download thread to speed up result retrieval #280
base: master
Conversation
The branch was force-pushed from 3a5e182 to 9b3fb71, and later from 9b3fb71 to 15de687.
Performance tests:
- TPC - current prod
- TPC - Download thread
@@ -684,6 +686,27 @@ def _verify_extra_credential(self, header):
            raise ValueError(f"only ASCII characters are allowed in extra credential '{key}'")


class ResultDownloader():
    def __init__(self):
        self.queue: queue.Queue = queue.Queue()
According to the docs, the default `maxsize` is zero, which means unbounded. When I was testing similar changes in the Go driver, it turned out that the queue size didn't matter, so even a queue of size 1 is fine. Did you test different queue sizes, and if not, can you check whether a `maxsize` greater than 1 makes a notable difference?
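For reference, here is a minimal, self-contained sketch of how different bounds could be compared; the simulated latencies and the `maxsize` values tried are purely illustrative and not taken from this PR:

```python
import queue
import threading
import time

def run(maxsize: int, items: int = 100) -> float:
    """Time a simple producer/consumer handoff for a given queue bound."""
    q: queue.Queue = queue.Queue(maxsize=maxsize)

    def producer() -> None:
        for i in range(items):
            time.sleep(0.001)          # simulated download latency
            q.put(i)                   # blocks when a bounded queue is full
        q.put(None)                    # sentinel: no more data

    start = time.monotonic()
    threading.Thread(target=producer, daemon=True).start()
    while (item := q.get()) is not None:
        time.sleep(0.001)              # simulated parsing work
    return time.monotonic() - start

for size in (0, 1, 10):               # 0 means unbounded
    print(f"maxsize={size}: {run(size):.3f}s")
```

As long as download and parsing overlap, the bound mostly affects how much unparsed data can pile up in memory, not throughput.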
There is a lot of noise in the benchmark results. Ignoring that, the results seem like a mixed bag: there is a noticeable improvement for some data types/queries with legacy type mapping, but the same types show regressions as soon as type mapping is introduced, which seems a bit unintuitive. There are some regressions we should look into. I think the ones with type mapping suggest that doing the type mapping on the main thread introduces contention/waiting; the regressions without type mapping are the more interesting results.
As a side note, there's no point in using the TPCH connector to benchmark; it's often bottlenecked by data generation. Using the memory connector is the correct way.
Oh, one more thing - are you testing with the server running locally or somewhere remote? Because the local network is so fast that you probably won't be able to measure any benefits of faster downloads.
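A rough sketch of the kind of harness these two comments point at, assuming a remote cluster with a pre-populated `memory` catalog; the host, user, and table name below are placeholders:

```python
import time
import trino

# Connection details are placeholders; point this at a remote cluster so
# network latency is actually part of the measurement.
conn = trino.dbapi.connect(
    host="trino.example.com",
    port=8080,
    user="benchmark",
    catalog="memory",
    schema="default",
)

cur = conn.cursor()
# The memory connector serves pre-loaded rows, so the server side is not
# bottlenecked by data generation the way the tpch connector can be.
start = time.monotonic()
cur.execute("SELECT * FROM benchmark_table")
rows = cur.fetchall()
print(f"fetched {len(rows)} rows in {time.monotonic() - start:.2f}s")
```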
Python's Queue is deeply flawed, and the queue raising Empty is unreliable (source: I've done tons of multi-threaded programming in Python). Also, the semaphore that the Queue class uses internally is slow. Queue is designed to protect the programmer from themselves when unknown objects are being synchronized. Because of the Global Interpreter Lock, most methods of a list are already thread safe.
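To illustrate that point, here is a sketch of a handoff built on `collections.deque`, whose `append()`/`popleft()` are atomic under the GIL; this is just an alternative shape to consider, not what the PR implements:

```python
import threading
import time
from collections import deque

buffer: deque = deque()          # append()/popleft() are atomic under the GIL
done = threading.Event()

def downloader() -> None:
    for i in range(5):
        time.sleep(0.01)         # simulated network fetch
        buffer.append({"page": i})
    done.set()

threading.Thread(target=downloader, daemon=True).start()

# Consumer: poll the deque instead of relying on Queue's internal semaphore.
while not done.is_set() or buffer:
    try:
        page = buffer.popleft()
    except IndexError:           # buffer momentarily empty
        time.sleep(0.001)
        continue
    print("parsing", page)
```

The trade-off is that the consumer has to poll (or sleep briefly) when the buffer is empty, instead of blocking on `Queue.get()`.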
Description
Use a separate thread for downloading result pages while parsing stays on the main thread, so downloading and type mapping can overlap; a sketch of that shape follows.
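A minimal sketch of the producer/consumer structure described above; the names `fetch_next_page` and `parse_rows` are placeholders for whatever fetch and mapping callables the client uses, not its actual API:

```python
import queue
import threading

_SENTINEL = object()

def fetch_results(fetch_next_page, parse_rows):
    """Download result pages on a background thread; parse them on the caller's thread."""
    pages: queue.Queue = queue.Queue()

    def download() -> None:
        # Producer: keep pulling raw pages off the wire and hand them over.
        while (page := fetch_next_page()) is not None:
            pages.put(page)
        pages.put(_SENTINEL)         # signal end of results

    threading.Thread(target=download, daemon=True).start()

    rows = []
    # Consumer: type mapping stays on the main thread while the next page downloads.
    while (page := pages.get()) is not _SENTINEL:
        rows.extend(parse_rows(page))
    return rows
```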
Non-technical explanation
Release notes
( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(x) Release notes are required, with the following suggested text:
* Improved result retrieval performance by using a download thread to allow for simultaneous mapping and downloading of results