Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat/add include_slide_notes parameter #455

Merged
merged 7 commits into from
Sep 9, 2024

Conversation

mackurzawa
Copy link
Contributor

Description

  • Added include_slide_notes parameter, default is True. Works for .ppt and .pptx file extensions.
  • Added two new files in sample-docs: sample-docs/notes.ppt, sample-docs/notes.pptx that include notes on their slides. This is to easily test the functionality, as there are no existing PowerPoint files that include slide notes.

Testing

#  using default value (True) returns additional NarrativeText element that contains notes
curl -X 'POST'   'http://localhost:8000/general/v0/general'   -H 'accept: application/json'   -H 'Content-Type: multipart/form-data'   -F 'files=@sample-docs/notes.pptx'   -F 'output_format="text/csv"' 

# explicit include_slide_notes=True returns additional NarrativeText element that contains notes
curl -X 'POST'   'http://localhost:8000/general/v0/general'   -H 'accept: application/json'   -H 'Content-Type: multipart/form-data'   -F 'files=@sample-docs/notes.pptx'   -F 'output_format="text/csv"' -F 'include_slide_notes=True'

# explicit include_slide_notes=False returns no NarrativeText element 
curl -X 'POST'   'http://localhost:8000/general/v0/general'   -H 'accept: application/json'   -H 'Content-Type: multipart/form-data'   -F 'files=@sample-docs/notes.pptx'   -F 'output_format="text/csv"' -F 'include_slide_notes=False'

Same with file notes.ppt

@mackurzawa mackurzawa marked this pull request as ready for review August 22, 2024 17:43
)
df = pd.read_csv(io.StringIO(response.text))

if include_slide_notes or test_default:
Copy link

@plutasnyy plutasnyy Aug 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is heavy nit pick and don't feel obligated to update this test, but consider this a hint for the future ;)

There is a good practice in writing unit tests, or at least convention how unit tests can be named called given-when-then (much more popular in java where long function names are more natural).
check eg URL

So instead of putting logic in the test when something is expected I would create 2/3 tests like:
test_given_note_in_slide_when_variable_not_set_then_note_extracted

(avoid keywords, and you will be able create reasonable unit test name under 100 chars ;) )

this way you won't have to guess which part of the unit test was executed (avoid if/else), avoid putting strange logic parameteres inside test parametrization, won't do mistake in logic and the test will be much more self explaining ;)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, thank you so much for that detailed advice! Next time, I will consider splitting these kinds of tests rather than parameterizing one to contain so many cases. I will also work on improving the naming conventions. Great idea!

@mackurzawa mackurzawa requested a review from awalker4 August 22, 2024 21:13
@mackurzawa
Copy link
Contributor Author

Hey @awalker4,
Do you think it is worth mentioning this change in the changelog as a breaking change due to the different default behavior? Here’s why I’m wondering: if a user calls the API with a PowerPoint file that includes slide notes, they will receive additional elements related to those notes. While the previous elements will remain unchanged, their order might be different. I think it’s pretty rare for a user to care about the element order, but still, it might be worth noting

Copy link
Collaborator

@awalker4 awalker4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! On the element order differences, I think it's worth adding another sentence in the changelog, but that's probably all I'd do in this case. Thanks!

@mackurzawa mackurzawa merged commit 3c3b75a into main Sep 9, 2024
6 checks passed
@mackurzawa mackurzawa deleted the feat/add-include_slide_notes-parameter branch September 9, 2024 13:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants