Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parallelize expensive projects that are not under an exchange #766

Open
erizocosmico opened this issue Feb 22, 2019 · 2 comments
Open

Parallelize expensive projects that are not under an exchange #766

erizocosmico opened this issue Feb 22, 2019 · 2 comments
Assignees

Comments

@erizocosmico
Copy link
Contributor

SELECT uast_extract(
    uast(blob_content, 'csharp', "(//csharp:BinaryExpression_AddExpression/Left/uast:String | //csharp:InterpolatedStringExpression//csharp:InterpolatedStringTextToken[1])[starts-with(normalize-space(@Value), 'SELECT') or starts-with(normalize-space(@Value), 'select') or starts-with(normalize-space(@Value), 'UPDATE') or starts-with(normalize-space(@Value), 'update') or starts-with(normalize-space(@Value), 'DELETE') or starts-with(normalize-space(@Value), 'delete') or starts-with(normalize-space(@Value), 'INSERT') or starts-with(normalize-space(@Value), 'insert') or starts-with(normalize-space(@Value), 'CREATE') or starts-with(normalize-space(@Value), 'create') or starts-with(normalize-space(@Value), 'ALTER') or starts-with(normalize-space(@Value), 'alter') or starts-with(normalize-space(@Value), 'DROP') or starts-with(normalize-space(@Value), 'drop')]"),
    '@pos') AS positions,
    repository_id,
    file_path
FROM (
    SELECT f.repository_id,
        f.file_path,
        b.blob_content
    FROM (
        SELECT *
        FROM refs r
        NATURAL JOIN commit_blobs cb
        NATURAL JOIN blobs
        WHERE r.ref_name = 'HEAD'
            AND NOT IS_BINARY(blob_content)
    ) b
    INNER JOIN (
        SELECT repository_id, file_path, blob_hash
        FROM refs r
        NATURAL JOIN commit_files cf
        WHERE r.ref_name = 'HEAD'
    ) f
    ON b.blob_hash = f.blob_hash
        AND b.repository_id = f.repository_id
    WHERE language(f.file_path, b.blob_content) = 'C#'
) t
WHERE positions IS NOT NULL

This could be parallelized adding an exchange over the topmost projection. Instead, we do this serially, causing extremely low performance on queries using uast functions on the topmost projects under certain conditions.

@erizocosmico erizocosmico transferred this issue from src-d/gitbase Feb 22, 2019
@erizocosmico
Copy link
Contributor Author

erizocosmico commented Mar 19, 2019

This cannot be done right now (see src-d/go-mysql-server#621).

What we can do (and would solve the problem for this query) is execute Projections in parallel when they contain UDFs if there is not an exchange higher up the tree.

WDYT @ajnavarro?

@erizocosmico erizocosmico self-assigned this Mar 27, 2019
@erizocosmico erizocosmico changed the title Subqueries should be parallelized too Parallelize expensive projects that are not under an exchange Mar 27, 2019
@erizocosmico
Copy link
Contributor Author

This is actually slower than serially computing project, even with UDFs:

goos: darwin
goarch: amd64
pkg: gopkg.in/src-d/go-mysql-server.v0/sql/plan
BenchmarkProject/no_parallelism-4         	   10000	    168082 ns/op	   67199 B/op	     958 allocs/op
BenchmarkProject/parallelism-4            	    5000	    275574 ns/op	   67821 B/op	     967 allocs/op
PASS
ok  	gopkg.in/src-d/go-mysql-server.v0/sql/plan	3.131s

I think the only case that might benefit from this is actually bblfsh parsing. So if we enable this, it would need to be a gitbase rule, not a go-mysql-server one, because for builtin UDFs it's twice as slow.

@erizocosmico erizocosmico transferred this issue from src-d/go-mysql-server Mar 28, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant