Generate Interactive update streams #402

Open
szarnyasg opened this issue Jul 1, 2022 · 1 comment

Comments

@szarnyasg
Member

szarnyasg commented Jul 1, 2022

For SNB Interactive, updates are continuous and not bound to daily batches. Therefore, inserts and deletes go into files sorted by creationDate and deletionDate, respectively.
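The ordering described above can be sketched in a few lines of Python. This is only an illustration, not the existing scripts: the sample records, the `split_and_sort` helper, and the convention that a missing `deletionDate` means "never deleted" are all assumptions, and timestamps are assumed to be ISO 8601 strings so that they sort lexicographically.

```python
from operator import itemgetter


def split_and_sort(records):
    """Return (inserts, deletes), each ordered by its own timestamp.

    Hypothetical helper: assumes ISO 8601 timestamp strings and that a
    record with deletionDate None is never deleted.
    """
    inserts = sorted(records, key=itemgetter("creationDate"))
    deletes = sorted(
        (r for r in records if r.get("deletionDate") is not None),
        key=itemgetter("deletionDate"),
    )
    return inserts, deletes


# Made-up sample records, for illustration only.
records = [
    {"id": 1, "creationDate": "2012-01-03T10:00:00", "deletionDate": "2012-02-01T00:00:00"},
    {"id": 2, "creationDate": "2012-01-01T09:00:00", "deletionDate": None},
    {"id": 3, "creationDate": "2012-01-02T12:00:00", "deletionDate": "2012-01-15T00:00:00"},
]

inserts, deletes = split_and_sort(records)
```

Here `inserts` follows creation order and `deletes` contains only the records that carry a deletion timestamp, in deletion order.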

For the inserts, each line should define one complete operation, which requires merging all outgoing edges (not just the 1-N ones but also the N-M ones) and attributes into the file. For example, Forum.csv should have the Forum_hasTag_Tag edges merged as tagIds.
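A minimal sketch of the Forum example, using only the Python standard library. The column names (`id`, `ForumId`, `TagId`), the `|` field delimiter, the `;` list separator, and the sample rows are assumptions made for illustration; the actual file layout may differ.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data; column names and the '|' delimiter are assumptions.
FORUM_CSV = """creationDate|id|title
2012-01-01T09:00:00|10|Wall of Alice
2012-01-02T12:00:00|20|Group for Databases
"""

FORUM_HAS_TAG_CSV = """creationDate|ForumId|TagId
2012-01-01T09:00:00|10|100
2012-01-01T09:00:00|10|101
2012-01-02T12:00:00|20|100
"""


def merge_tag_ids(forum_text, edge_text, list_sep=";"):
    """Attach the N-M hasTag edges to each Forum row as a tagIds column."""
    # Group the target Tag ids by their source Forum id.
    tags_by_forum = defaultdict(list)
    for edge in csv.DictReader(io.StringIO(edge_text), delimiter="|"):
        tags_by_forum[edge["ForumId"]].append(edge["TagId"])

    # Emit one row per Forum, so each line is a self-contained insert.
    merged = []
    for forum in csv.DictReader(io.StringIO(forum_text), delimiter="|"):
        forum["tagIds"] = list_sep.join(tags_by_forum.get(forum["id"], []))
        merged.append(forum)
    return merged


rows = merge_tag_ids(FORUM_CSV, FORUM_HAS_TAG_CSV)
```

Each resulting row carries everything needed to replay the insert of that Forum, including its tags, without consulting a second file.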

These features are currently implemented as Python scripts in https://github.com/ldbc/ldbc_snb_interactive_driver/tree/main/scripts

szarnyasg added this to the Milestone 4 milestone on Jul 1, 2022
szarnyasg changed the title from "Generate Interactive updatestreams" to "Generate Interactive inserts update streams" on Jul 2, 2022
@szarnyasg
Member Author

szarnyasg commented Jul 2, 2022

The inserts use the same set of 8 files as the deletes, but the remaining dynamic files (e.g. Forum_hasTag_Tag) have to be joined and aggregated into them as nested attributes. This is currently performed by a Python/DuckDB script, but ultimately it should be implemented by Datagen.
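The join-and-aggregate step can also be expressed as a single SQL query. The sketch below is not the script mentioned above: it uses `sqlite3` from the Python standard library as a stand-in for DuckDB so that it runs without extra dependencies, and the table schemas, sample rows, and `;` separator are assumptions.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Hypothetical schemas and rows, for illustration only.
con.executescript("""
CREATE TABLE Forum (creationDate TEXT, id INTEGER, title TEXT);
CREATE TABLE Forum_hasTag_Tag (creationDate TEXT, ForumId INTEGER, TagId INTEGER);
INSERT INTO Forum VALUES
  ('2012-01-02T12:00:00', 20, 'Group for Databases'),
  ('2012-01-01T09:00:00', 10, 'Wall of Alice'),
  ('2012-01-03T08:00:00', 30, 'Empty forum');
INSERT INTO Forum_hasTag_Tag VALUES
  ('2012-01-01T09:00:00', 10, 100),
  ('2012-01-01T09:00:00', 10, 101),
  ('2012-01-02T12:00:00', 20, 100);
""")

# LEFT JOIN keeps Forums without tags; group_concat nests the Tag ids
# into one attribute; ORDER BY produces the insert stream order.
insert_stream = con.execute("""
SELECT f.creationDate, f.id, f.title,
       group_concat(t.TagId, ';') AS tagIds
FROM Forum f
LEFT JOIN Forum_hasTag_Tag t ON t.ForumId = f.id
GROUP BY f.creationDate, f.id, f.title
ORDER BY f.creationDate
""").fetchall()
```

Note that SQLite does not guarantee the order of values inside `group_concat`, and a Forum with no tags gets `NULL` (Python `None`) for `tagIds`. A DuckDB version would likely use its own aggregate functions, but the overall query shape would be similar.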

szarnyasg removed this from the Milestone 4 milestone on Oct 10, 2022
szarnyasg changed the title from "Generate Interactive inserts update streams" to "Generate Interactive update streams" on Nov 3, 2022