Add metrics GpuPartitioning.CopyToHostTime #11882

Merged

sperlingxx merged 5 commits into NVIDIA:branch-25.02 from sperlingxx:add_h2d_metrics

Dec 19, 2024

Collaborator

sperlingxx commented Dec 17, 2024

Closes #11878

This PR adds the GpuMetric GpuPartitioning.CopyToHostTime. Because GpuPartitioning is a GpuExpression rather than a GpuPlan, a dedicated method, GpuPartitioning.setupMetrics, was added to set up the detailed GpuPartitioning metrics at planning time.

In local testing, the newly added metric worked as expected.
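The planning-time wiring described above can be sketched in plain Scala. This is a minimal illustration and not the spark-rapids implementation: `SimpleMetric`, `MetricKeys`, `PartitioningSketch`, `ExchangeSketch`, `recordCopy` and the description string are hypothetical stand-ins. Only the key value `d2hMemCpyTime` and the method name `setupMetrics` come from this PR.

```scala
// Hypothetical stand-in for a metric; not the real GpuMetric class.
final class SimpleMetric(val description: String) {
  private var total: Long = 0L
  def add(v: Long): Unit = total += v
  def value: Long = total
}

// Hypothetical shared registry for metric keys and descriptions.
object MetricKeys {
  val CopyToHostTime: String = "d2hMemCpyTime"
  // Assumed wording, for illustration only.
  val CopyToHostTimeDesc: String = "device to host memory copy time"
}

// Expression-like node: it is not a plan, so it starts without any metric.
trait PartitioningSketch {
  protected var memCopyTime: Option[SimpleMetric] = None

  // Called once at planning time by the plan node that owns the metrics.
  def setupMetrics(metrics: Map[String, SimpleMetric]): Unit = {
    memCopyTime = metrics.get(MetricKeys.CopyToHostTime)
  }

  // Adds to the metric if one was wired in; otherwise does nothing.
  def recordCopy(nanos: Long): Unit = memCopyTime.foreach(_.add(nanos))
}

// Plan-like node: creates the metrics and hands them to its partitioning.
final class ExchangeSketch(val partitioning: PartitioningSketch) {
  val metrics: Map[String, SimpleMetric] = Map(
    MetricKeys.CopyToHostTime -> new SimpleMetric(MetricKeys.CopyToHostTimeDesc))
  partitioning.setupMetrics(metrics)
}
```

Holding the metric as an `Option` means a partitioning that was never wired up simply skips the update instead of failing.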

sperlingxx requested review from jlowe, abellina, liurenjie1024 and revans2

December 17, 2024 03:49


add metrics GpuPartitioning.CopyToHostTime

967d345

Signed-off-by: sperlingxx <lovedreamf@gmail.com>

sperlingxx force-pushed the add_h2d_metrics branch from db876e4 to 967d345 Compare

December 17, 2024 03:53

Collaborator Author

sperlingxx commented Dec 17, 2024

build

fix

00580a0

Collaborator Author

sperlingxx commented Dec 17, 2024

build

sperlingxx requested review from winningsix and binmahone

December 17, 2024 13:23

jlowe reviewed

sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuPartitioning.scala Outdated

Comment on lines 36 to 37

// The SQLMetric key for MemoryCopyFromDeviceToHost
val CopyToHostTime: String = "d2hMemCpyTime"

Member

jlowe Dec 17, 2024

This should be in GpuMetric along with the description. Copy to host time is not a metric specific to partitioning, and we should be consistent about it.

Collaborator Author

sperlingxx Dec 18, 2024

Moved it into GpuMetric

sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuPartitioning.scala

+                        new NvtxRange("PartitionD2H", NvtxColor.CYAN))
+                    // Wait for copyToHostAsync
+                    withResource(memCpyNvtxRange) { _ =>
+                      Cuda.DEFAULT_STREAM.sync()

Member

jlowe Dec 17, 2024

This is not the only time spent on the copy to host. The copyToHostAsync calls above are not guaranteed to be asynchronous (for example, when the copy is from pageable memory, and we're not guaranteed to be using pinned memory). Therefore the metric and the NVTX range need to cover the copyToHostAsync calls above as well.

Collaborator Author

sperlingxx Dec 18, 2024

I refined the code to wrap them all.
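The difference between timing only the sync and timing the whole copy can be shown with a small deterministic sketch. Everything here is hypothetical: `timed`, `blockingCopy` and `streamSync` are stand-ins driven by a fake clock, and they do not call cudf or CUDA. `blockingCopy` models a nominally asynchronous copy that in fact blocks the caller for 40 ticks.

```scala
object CopyTimingSketch {
  // Runs `body` and reports the elapsed ticks of `clock` to `sink`.
  def timed[T](clock: () => Long, sink: Long => Unit)(body: => T): T = {
    val start = clock()
    try body finally sink(clock() - start)
  }

  // Returns (ticks measured around the sync only, ticks measured around copy + sync).
  def demo(): (Long, Long) = {
    var now = 0L
    val clock = () => now
    // Stand-in for an "async" copy that actually blocks the caller.
    def blockingCopy(): Unit = now += 40
    // Stand-in for waiting on the stream.
    def streamSync(): Unit = now += 1

    // Narrow scope: the copy runs outside the timed region.
    var syncOnly = 0L
    blockingCopy()
    timed(clock, d => syncOnly += d) { streamSync() }

    // Wide scope: the timed region covers the copy calls and the sync.
    var wholeCopy = 0L
    timed(clock, d => wholeCopy += d) {
      blockingCopy()
      streamSync()
    }
    (syncOnly, wholeCopy)
  }
}
```

With these stand-ins, `CopyTimingSketch.demo()` returns `(1, 41)`: the narrow scope misses the 40 ticks spent inside the blocking copy, which is the gap the review comment points out.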

revans2 reviewed

Collaborator

revans2 left a comment

Mostly the same comments as Jason

sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuPartitioning.scala Outdated

@@ -132,7 +135,15 @@ trait GpuPartitioning extends Partitioning {
                     }
                   }
                   withResource(hostPartColumns) { _ =>
-                    Cuda.DEFAULT_STREAM.sync()
+                    lazy val memCpyNvtxRange = memCopyTime.map(
+                        new NvtxWithMetrics("PartitionD2H", NvtxColor.CYAN, _))

Collaborator

revans2 Dec 17, 2024

NvtxWithMetrics has an apply that already does this for you.

withResource(NvtxWithMetrics("PartitionD2H", NvtxColor.CYAN, memCopyTime)) { _ =>
...
}

Collaborator Author

sperlingxx Dec 18, 2024

Fixed.
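The pattern suggested here, a factory `apply` that accepts an optional metric and falls back to a plain range, can be sketched as follows. This is a self-contained illustration under stated assumptions: `Counter`, `PlainRange`, `MeteredRange` and this `withResource` are hypothetical stand-ins written for the example, not the spark-rapids or cudf classes, and the injectable `clock` parameter exists only to make the sketch testable.

```scala
object RangeSketch {
  // Hypothetical metric that accumulates elapsed time.
  final class Counter { var nanos: Long = 0L }

  // Stand-in for a plain profiler range that only opens and closes.
  class PlainRange(val name: String) extends AutoCloseable {
    var closed: Boolean = false
    override def close(): Unit = closed = true
  }

  // A range that also adds its lifetime to a metric when first closed.
  final class MeteredRange(name: String, metric: Counter, clock: () => Long)
      extends PlainRange(name) {
    private val start = clock()
    override def close(): Unit = {
      if (!closed) metric.nanos += clock() - start
      super.close()
    }
  }

  object MeteredRange {
    // Callers pass the Option straight through; no mapping at the call site.
    def apply(
        name: String,
        metric: Option[Counter],
        clock: () => Long = () => System.nanoTime()): PlainRange =
      metric match {
        case Some(m) => new MeteredRange(name, m, clock)
        case None => new PlainRange(name)
      }
  }

  // Closes the resource even if `body` throws.
  def withResource[R <: AutoCloseable, T](r: R)(body: R => T): T =
    try body(r) finally r.close()
}
```

Because the factory handles the `None` case itself, the call site stays a single `withResource(...)` expression whether or not a metric was set up.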

sperlingxx and others added 3 commits

December 18, 2024 09:24


Update sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuPartitioning.scala

85e84eb

Co-authored-by: Jason Lowe <jlowe@nvidia.com>


Update sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuPartitioning.scala

377f87c

Co-authored-by: Jason Lowe <jlowe@nvidia.com>


refine

505b7e8

Signed-off-by: sperlingxx <lovedreamf@gmail.com>

Collaborator Author

sperlingxx commented Dec 18, 2024

build

revans2 approved these changes

abellina approved these changes

sperlingxx merged commit 231a9c6 into NVIDIA:branch-25.02

50 checks passed

sperlingxx deleted the add_h2d_metrics branch

December 19, 2024 00:06

Reviewers

jlowe jlowe left review comments

revans2 revans2 approved these changes

abellina abellina approved these changes

liurenjie1024 Awaiting requested review from liurenjie1024

winningsix Awaiting requested review from winningsix

binmahone Awaiting requested review from binmahone
