[Improvement] Improve the performance of ranger access requests #6754

Open
3 of 4 tasks
wankunde opened this issue Oct 18, 2024 · 2 comments · May be fixed by #6758

Comments

@wankunde

Code of Conduct

Search before asking

  • I have searched in the issues and found no similar issues.

What would you like to be improved?

Currently, RuleAuthorization collects access requests in an ArrayBuffer. This is slow for plans with many privilege objects, because each new PrivilegeObject has to be compared with every access request already in the buffer, so collecting n requests takes O(n²) comparisons.
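A simplified sketch of the pattern described above. It is not the actual Kyuubi code: AccessRequest is a hypothetical stand-in for the Ranger request, and deduplication is assumed to be plain equality. The counter shows how the number of comparisons grows with the number of distinct requests.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for a Ranger access request; the real class differs.
final case class AccessRequest(resource: String, accessType: String)

object LinearDedup {
  // Collects requests into an ArrayBuffer, skipping duplicates with a linear scan.
  // Returns the collected requests and the number of equality comparisons made.
  def collect(candidates: Seq[AccessRequest]): (Seq[AccessRequest], Long) = {
    val requests = ArrayBuffer.empty[AccessRequest]
    var comparisons = 0L
    candidates.foreach { req =>
      // exists() walks the buffer until it finds a match, so a new, unique
      // request is compared with every request collected so far.
      val seen = requests.exists { r => comparisons += 1; r == req }
      if (!seen) requests += req
    }
    (requests.toSeq, comparisons)
  }

  def main(args: Array[String]): Unit = {
    val candidates = (0 until 1000).map(i => AccessRequest(s"file_$i.parquet", "select"))
    val (collected, comparisons) = collect(candidates)
    println(s"collected=${collected.size} comparisons=$comparisons")
  }
}
```

With n distinct requests the scan performs n(n-1)/2 comparisons, so main prints collected=1000 comparisons=499500.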

How should we improve?

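We can use a HashMap instead, so that checking whether an equivalent access request has already been collected is a constant-time lookup on average rather than a scan over the whole buffer.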
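A minimal sketch of the HashMap-based approach, under the same assumptions: AccessRequest is hypothetical, and the (resource, accessType) key is an illustrative choice, not necessarily what the fix in #6758 uses. mutable.LinkedHashMap is used so requests come back in first-seen order; a plain mutable.HashMap works the same way if order does not matter.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a Ranger access request; the real class differs.
final case class AccessRequest(resource: String, accessType: String)

object HashedDedup {
  // Collects requests keyed on (resource, accessType). Each duplicate check is a
  // hash lookup (O(1) on average), so collecting n requests is O(n) on average.
  def collect(candidates: Seq[AccessRequest]): Seq[AccessRequest] = {
    val requests = mutable.LinkedHashMap.empty[(String, String), AccessRequest]
    candidates.foreach { req =>
      val key = (req.resource, req.accessType)
      // getOrElseUpdate only inserts when the key is absent, so the first
      // occurrence of each (resource, accessType) pair wins.
      requests.getOrElseUpdate(key, req)
    }
    requests.values.toList
  }

  def main(args: Array[String]): Unit = {
    val candidates = Seq(
      AccessRequest("db.t1", "select"),
      AccessRequest("db.t2", "select"),
      AccessRequest("db.t1", "select"), // duplicate, skipped
      AccessRequest("db.t1", "update")
    )
    collect(candidates).foreach(println)
  }
}
```

Because the key here covers every field, a mutable.LinkedHashSet[AccessRequest] would behave the same; the map form is shown in case deduplication should key on only some fields of the request.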

Are you willing to submit PR?

  • Yes. I would be willing to submit a PR with guidance from the Kyuubi community to improve.
  • No. I cannot submit a PR at this time.

Hello @wankunde,
Thanks for finding the time to report the issue!
We really appreciate the community's efforts to improve Apache Kyuubi.

@wankunde
Author

Tested locally with 50,000 Parquet files:

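import org.apache.ranger.plugin.policyengine.RangerAccessRequest
import org.mockito.ArgumentMatchers.any
import org.mockito.Mockito.when

// `spark` and `mock` are assumed to be provided by the enclosing test suite
test("KYUUBI #6754: improve the performance of ranger access requests") {
  // Machine-specific local directory holding the 50,000 test Parquet files
  val outputPath = "/private/var/folders/tr/scn8dgl13_l6_sh17bghtln1b35kn1/T/kyuubi-test-5492934124608743789/"
  println("output path: " + outputPath)

  // Mocked Ranger plugin whose verify() call is stubbed to do nothing
  val plugin = mock[SparkRangerAdminPlugin.type]
  when(plugin.verify(Seq(any[RangerAccessRequest]), any[SparkRangerAuditHandler]))
    .thenAnswer(_ => ())

  // outputPath already ends with "/", so the glob is appended without one
  val df = spark.read.parquet(outputPath + "*/*.parquet")
  val plan = df.queryExecution.optimizedPlan

  // Time only the privilege check on the optimized plan
  val start = System.currentTimeMillis()
  RuleAuthorization(spark).checkPrivileges(spark, plan)
  val end = System.currentTimeMillis()
  println(s"Time elapsed: ${end - start} ms")
}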

Before / After: [images not preserved]
