
fix: Fixing Handling of Pictures PowerPoint Backend #1263


Open: wants to merge 3 commits into main

@benichou commented Mar 30, 2025

Bug Fix

Handle pictures in PPTX files that have no "image" attribute, or whose image part has an "emf" or "wmf" extension.

Change

The change is minimal: when handling pictures, the mspowerpoint backend now only processes pictures that actually have an image attribute (some slide decks contain shapes that look like images and are of Picture type but have no image attribute) and whose image part extension is not "emf" or "wmf".
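A minimal sketch of the filter described above, assuming only what the description states: a picture shape is processed when it exposes an `image` attribute and that image's `ext` is neither `emf` nor `wmf`. `FakeImage`, `FakePicture` and `should_process_picture` are hypothetical names; the two classes stand in for python-pptx objects so the snippet runs without a `.pptx` file.

```python
from dataclasses import dataclass
from typing import Any, Optional

# Extensions the PR skips (vector metafile formats).
SKIPPED_EXTENSIONS = {"emf", "wmf"}


@dataclass
class FakeImage:
    """Hypothetical stand-in for a python-pptx image part; only `ext` is modelled."""
    ext: str


class FakePicture:
    """Hypothetical stand-in for a Picture shape; `image` exists only when one is given."""

    def __init__(self, image: Optional[FakeImage] = None) -> None:
        if image is not None:
            self.image = image


def should_process_picture(shape: Any) -> bool:
    """Mirror the PR's check: require an image attribute and a non-EMF/WMF extension."""
    if not hasattr(shape, "image"):  # Picture-typed shape without image data
        return False
    image_part = shape.image
    return image_part.ext not in SKIPPED_EXTENSIONS


# Usage: only the PNG picture passes the filter.
for pic in (FakePicture(FakeImage("png")), FakePicture(FakeImage("emf")), FakePicture()):
    print(should_process_picture(pic))
```

One design note: `hasattr` only swallows `AttributeError`. If the real `image` property raised a different exception for a picture without embedded data, the check would need an explicit `try`/`except` instead.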

Issue resolved:

1242

Areas that have been changed:

MsPowerBackend: slight change in handle_shapes

1242: https://github.com//issues/1242

Checklist:
Minimal change, so no documentation updates, examples, or tests were needed.

  • [x] Documentation has been updated, if necessary.
  • [x] Examples have been added, if necessary.
  • [x] Tests have been added, if necessary.

…n image attribute and image part of all extensions except with emf or wmf extensions to avoid bug in adding picture to doc

mergify bot commented Mar 30, 2025

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:
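mergify's `~=` operator applies the regular expression above to the PR title, which is why the original title failed and the renamed "fix: ..." title passes. A small sketch approximating that check locally with Python's `re` module; `is_conventional_title` is a hypothetical helper, and mergify's own regex handling may differ in edge cases.

```python
import re

# Pattern copied from the mergify rule; re.match anchors at the start of the title.
CONVENTIONAL_TITLE = re.compile(
    r"^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:"
)


def is_conventional_title(title: str) -> bool:
    """Return True if the title starts with a Conventional Commits type prefix and a colon."""
    return CONVENTIONAL_TITLE.match(title) is not None


# The first title (as originally submitted) fails; the renamed one passes.
for title in (
    "bug fix to ensure handling of pictures only applies to picture with a…",
    "fix: Fixing Handling of Pictures PowerPoint Backend",
):
    print(is_conventional_title(title), "-", title)
```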

…e attribute and image part of all extensions except with emf or wmf extensions to avoid bug in adding picture to doc
@benichou benichou changed the title bug fix to ensure handling of pictures only applies to picture with a… fix: bug fix to ensure handling of pictures only applies to picture with a… Mar 30, 2025
@benichou benichou changed the title fix: bug fix to ensure handling of pictures only applies to picture with a… fix: Fixing Handling of Pictures PowerPoint Backend Mar 30, 2025
…e attribute and image part of all extensions except with emf or wmf extensions to avoid bug in adding picture to doc (just added ny signoff) Signed-off-by: Franck Benichou franck.benichou@sciencespo.fr
@cau-git (Contributor) left a comment


@benichou Thanks for this contribution, can you please review below suggestions?

Also, please make sure to sign off all commits so the DCO check passes. See the actions recommended by the DCO CI check.
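The DCO check looks for a `Signed-off-by:` trailer on each commit; `git commit -s` (`--signoff`) appends one built from the configured `user.name` and `user.email`. A rough sketch of the trailer test, assuming the conventional `Signed-off-by: Name <email>` form. `has_dco_signoff` is a hypothetical helper, not the DCO app's actual rule set; for instance, it does not compare the trailer against the commit author, which a real DCO check may also do.

```python
import re

# Conventional DCO trailer on its own line: "Signed-off-by: Full Name <user@host>".
SIGNOFF_RE = re.compile(r"^Signed-off-by: .+ <[^<>@\s]+@[^<>\s]+>\s*$", re.MULTILINE)


def has_dco_signoff(message: str) -> bool:
    """Return True if any line of the commit message is a well-formed sign-off trailer."""
    return SIGNOFF_RE.search(message) is not None


# Usage with a hypothetical commit message.
msg = "fix: handle pictures\n\nSigned-off-by: Jane Doe <jane@example.com>"
print(has_dco_signoff(msg))
```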

```python
if hasattr(shape, "image"):  # make sure the Picture shape has an image attribute
    image_part = shape.image  # get the image part
    if image_part.ext not in ["emf", "wmf"]:  # all extensions except emf and wmf
        ...  # picture handling continues here (not shown in this review excerpt)
```
Contributor:

Instead of excluding EMF and WMF pictures, this PR should follow the approach we took in the MS Word backend: https://github.com/docling-project/docling/blob/main/docling/backend/msword_backend.py#L671-L686

On Windows, EMF and WMF images actually do work with PIL.
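The linked MS Word backend lines are not reproduced here. As an illustration only, assuming that approach amounts to "try to decode every picture, and if decoding fails keep the picture without pixel data instead of filtering it out by extension", a sketch could look like this. `PictureItem`, `load_picture` and `fake_decoder` are hypothetical names. The decoder is injected as a callable so the snippet runs without Pillow; in the backend it would presumably wrap Pillow (opening a `BytesIO` of the blob), whose `UnidentifiedImageError` is a subclass of `OSError`.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class PictureItem:
    """Hypothetical stand-in for a picture added to the output document."""
    image: Optional[Any]  # decoded image, or None when decoding failed
    warning: Optional[str] = None


def load_picture(blob: bytes, decode: Callable[[bytes], Any]) -> PictureItem:
    """Try to decode the picture instead of skipping EMF/WMF up front.

    Where the decoder supports EMF/WMF (e.g. PIL on Windows), the picture keeps
    its image; elsewhere it is still recorded, just without image data.
    """
    try:
        return PictureItem(image=decode(blob))
    except OSError as exc:  # covers Pillow's UnidentifiedImageError
        return PictureItem(image=None, warning=f"image could not be decoded: {exc}")


def fake_decoder(blob: bytes) -> str:
    """Hypothetical decoder that only understands blobs with a PNG signature."""
    if blob.startswith(b"\x89PNG"):
        return "decoded-png"
    raise OSError("cannot identify image")


print(load_picture(b"\x89PNG\r\n\x1a\n", fake_decoder))
print(load_picture(b"\x01\x00\x00\x00 EMF", fake_decoder))
```

The design difference from the PR is that the decision moves from a fixed extension list to whatever the decoder can handle on the current platform, so EMF/WMF pictures are not dropped where PIL can render them.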

Author:

Sounds great, @cau-git. Let me try to implement your suggestion after work. And thanks so much for pointing me to the DCO convention; I really appreciate the team's quick response!

Contributor:

@benichou Let us know when you expect this to be done! Love the work!

Author:

Thanks @PeterStaar-IBM for the gentle reminder! I have been busier than usual at work, but I prepared the code this weekend and am working on it today, so you should have it today!
