[DAPHNE-#529] WIP: translator for dml to daphneDSL #576

Open: wants to merge 18 commits into base: main

aski02

@aski02 aski02 commented Jul 21, 2023

WIP: Script for translating scripts from SystemDS's DML to DaphneDSL.

Tests can be run via "python3 test_shortestPath.py" or "python3 test_sigmoid.py"

  • .csv output is then saved in output/

The translator itself can be run via "python3 dml2daph.py <dml_file>" (replace <dml_file> with the path to the .dml script)

  • translated script is saved to tools/translated_files
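
The translator invocation above could be wrapped in a small Python helper, for instance to translate several scripts in a row. This is only a sketch: the function names build_translate_command and translate are made up for illustration, and it assumes the command is run from the directory that contains dml2daph.py. The name of the generated file inside tools/translated_files is not stated here, so the helper does not try to guess it.

```python
import subprocess
import sys
from pathlib import Path


def build_translate_command(dml_file, python=sys.executable):
    """Return the argv list equivalent to `python3 dml2daph.py <dml_file>`."""
    return [python, "dml2daph.py", str(Path(dml_file))]


def translate(dml_file, cwd="."):
    """Run the translator on one DML script.

    Assumption: `cwd` is the directory that contains dml2daph.py.
    Raises subprocess.CalledProcessError if the translator exits non-zero.
    """
    return subprocess.run(build_translate_command(dml_file), cwd=cwd, check=True)
```

Calling translate() with the path to a .dml script then behaves like the command line shown above.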

pdamme and others added 3 commits July 11, 2023 00:42
- So far, user-defined functions in DaphneDSL could return either no values or exactly one value.
- However, there are situations when it is convenient to return multiple values from a UDF.
  - These values may even have different data types (e.g., scalar and matrix), such that wrapping them into a matrix or frame is not a general solution.
- This commit introduces support for DaphneDSL UDFs with multiple return values.
- Adapted the DaphneDSL grammar and parser accordingly.
- Added script-level test cases.
- Updated the DaphneDSL language reference (documentation).
- This commit is just an intermediate development state, it will be split into several meaningful commits before it is merged into main.
- Manual translation of decision trees and random forests scripts from Apache SystemDS's DML to DAPHNE's DaphneDSL.
- More useful error messages.
- Several small things.
@aski02 aski02 changed the title WIP: initial translator for dml to Daphne [DAPHNE-#529] WIP: translator for dml to daphneDSL Jul 22, 2023
@pdamme pdamme self-requested a review July 29, 2023 11:34
@pdamme pdamme added the LDE summer 2023 Student project in the course Large-scale Data Engineering at TU Berlin (summer 2023). label Sep 13, 2023
- Consistently use space (not mixed space and tab) for indentation.
- Removed trailing whitespaces.
…rests' into 529-dml2daph"

This reverts commit e7a762a, reversing
changes made to 9ecf1e1.

- This is done to simplify merging upstream/main into this branch.
- Meanwhile, a lot has happened on upstream/main and that is the state of the code we need.
- Added license header.
- Fixed various small issues.
- Fixed/harmonized formatting and wording in comments and error messages.
- Fixed wording in identifiers (e.g., "if_body" to "then_body", "data type" to "value type").
- Simplified the code at some points.
- Harmonized indentation of generated DaphneDSL code.
- And several more minor things.
@pdamme
Collaborator

pdamme commented Oct 4, 2024

Thanks for this contribution and your pioneering efforts back then for a (semi-)automatic translation of SystemDS's DML scripts to DAPHNE's DaphneDSL scripts, @aski02! This dml2daph tool will enable us to quickly build up a hierarchy of data science primitives in DAPHNE, which users can then use like library functions. Offering such primitives will greatly improve user productivity.

As we discussed back then, the code looks very good overall. Nevertheless, some tidy-up is required before we can merge it in. I've already tidied up the main dml2daph.py script as well as the documentation/tutorial a bit (see the commits I added to your branch).

I will continue handling this PR and address the required points to merge it in.

Required changes before we can merge it:

  1. Remove the ANTLR-generated files DmlLexer.py, DmlParser.py, and DmlVisitor.py from the PR. Back then, you told me that some manual changes were required, since the code generated by ANTLR contained some fragments of Java, which made the files unusable (perhaps a bug in ANTLR), and that a newer version of ANTLR could fix this. If that is the case, we should rather switch to a newer version of ANTLR and generate these files as part of the DAPHNE build rather than pushing thousands of lines of generated code. I will investigate this issue.
  2. Remove files that are not required anymore. We need to have a closer look at the test and experiments files again. These were highly valuable for your project work in the "Large-scale Data Engineering" module, but may not be required in the DAPHNE repository. I will take care of it.
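
If regenerating the parser files does become part of the build, the ANTLR step could look roughly like the sketch below. It only assembles the command line. The flags -Dlanguage=Python3, -visitor, -no-listener, and -o are standard options of the ANTLR 4 tool; the jar path, the grammar path, and the output directory are parameters to be filled in, and the helper names are hypothetical. Whether a newer ANTLR version actually removes the need for manual fixes is still to be investigated.

```python
from pathlib import Path


def build_antlr_command(antlr_jar, grammar, out_dir):
    """Assemble an ANTLR 4 invocation that emits a Python 3 lexer, parser, and visitor.

    All three arguments are placeholders chosen by the caller (assumptions,
    not paths taken from this PR).
    """
    return [
        "java", "-jar", str(antlr_jar),
        "-Dlanguage=Python3",  # generate Python 3 code
        "-visitor",            # also generate <Grammar>Visitor.py
        "-no-listener",        # skip <Grammar>Listener.py
        "-o", str(out_dir),
        str(grammar),
    ]


def expected_outputs(grammar, out_dir):
    """List the Python files ANTLR is expected to write for a combined grammar."""
    stem = Path(grammar).stem
    return [Path(out_dir) / f"{stem}{suffix}.py" for suffix in ("Lexer", "Parser", "Visitor")]
```

For a grammar file named Dml.g4, expected_outputs yields DmlLexer.py, DmlParser.py, and DmlVisitor.py, the three files that should leave the PR.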

Hints on using the dml2daph translator before the merge:

  • The source code of Apache SystemDS must be cloned into thirdparty/systemds/. We will make this more flexible in the future.
  • The ANTLR4 Python runtime must be installed, which can be done by pip install antlr4-python3-runtime.
  • Additional information on using the translator can be found in doc/tutorial/Translator.md contained in this PR.
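
The two prerequisites above can be checked up front with a few lines of Python. This is a minimal sketch, not part of the PR: the function name check_prerequisites is hypothetical, and it assumes that the antlr4-python3-runtime package is importable as the module antlr4 and that repo_root points at the DAPHNE checkout.

```python
import importlib.util
from pathlib import Path


def check_prerequisites(repo_root="."):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    # The SystemDS sources are expected in thirdparty/systemds/ under the repo root.
    if not (Path(repo_root) / "thirdparty" / "systemds").is_dir():
        problems.append("SystemDS sources not found in thirdparty/systemds/")
    # The ANTLR4 Python runtime provides the top-level module `antlr4`.
    if importlib.util.find_spec("antlr4") is None:
        problems.append("ANTLR4 Python runtime missing (pip install antlr4-python3-runtime)")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("missing prerequisite:", problem)
```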

@aski02
Author

aski02 commented Oct 4, 2024

Hi @pdamme, I apologize for not having completed this. If there are any tasks I can assist with to help resolve the issues, I'll be glad to do it!

@pdamme
Collaborator

pdamme commented Oct 4, 2024

Dear @aski02, no worries! You did a very good job on this task last year. I just haven't found the time to finalize this PR so far, but I'll take care of that now; no additional work is required from your side. After the merge, we will continue developing the dml2daph tool and start actually using it (we're finally ready for that now).


2 participants