fix: Fixing Handling of Pictures PowerPoint Backend #1263
…n image attribute and image part of all extensions except with emf or wmf extensions to avoid bug in adding picture to doc
Merge Protections
Your pull request matches the following merge protections and will not be merged until they are valid.
🟢 Enforce conventional commit
Wonderful, this rule succeeded.
Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/
…e attribute and image part of all extensions except with emf or wmf extensions to avoid bug in adding picture to doc (just added my sign-off)
Signed-off-by: Franck Benichou <franck.benichou@sciencespo.fr>
)
if hasattr(shape, "image"):  # make sure the Picture shape has an image attribute
    image_part = shape.image  # get the image part
    if image_part.ext not in ["emf", "wmf"]:  # all extensions except emf and wmf
Instead of excluding EMF and WMF pictures, this PR should adopt the approach we used in the MS Word backend: https://github.com/docling-project/docling/blob/main/docling/backend/msword_backend.py#L671-L686
On Windows platforms, EMF and WMF actually work with PIL.
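To illustrate the reviewer's point, here is a minimal sketch of a "try Pillow first, fall back on failure" pattern. It is not a copy of the linked MS Word backend code. The helper name `try_load_image` is hypothetical, and returning `None` for undecodable images is an assumption made for illustration.

```python
from io import BytesIO
from typing import Optional

from PIL import Image, UnidentifiedImageError


def try_load_image(blob: bytes) -> Optional[Image.Image]:
    """Hypothetical helper: return a PIL image, or None if Pillow cannot decode it."""
    try:
        image = Image.open(BytesIO(blob))
        image.load()  # force decoding now so loader errors surface here
        return image
    except (UnidentifiedImageError, OSError):
        # Covers unknown formats and formats whose loader is unavailable
        # on the current platform.
        return None
```

With this shape, no extension is excluded up front: an EMF or WMF picture is kept wherever Pillow can decode it, and the caller decides what to do when the helper returns `None`.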
Sure, sounds awesome @cau-git, let me try to implement your suggestion after work. And thanks so much for directing me to the DCO convention. Very much appreciate the team's quick response!
@benichou Let us know when you expect this to be done! Love the work!
Thanks @PeterStaar-IBM for the gentle reminder! I have been busier than usual at work, but I prepared the code this weekend and am working on it today. You should have it today!
Bug Fix
Handling Pictures in PPTX with no "image" attributes or "emf"/"wmf" extensions in the Image parts.
Change
The change is minimal: when handling pictures, the mspowerpoint backend now only processes Picture shapes that actually have an image attribute (some slide decks contain shapes that look like images and are of Picture type but have no image attribute) and whose image part extension is not "emf" or "wmf".
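The check described above can be sketched in isolation as follows. This is an illustrative stand-alone version, not the backend code itself: `should_process_picture` and `UNSUPPORTED_EXTS` are hypothetical names, and `SimpleNamespace` objects stand in for real slide shapes.

```python
from types import SimpleNamespace

# Extensions that this PR skips (taken from the description above).
UNSUPPORTED_EXTS = {"emf", "wmf"}


def should_process_picture(shape) -> bool:
    """Hypothetical predicate: True only for shapes with a usable image part."""
    if not hasattr(shape, "image"):
        return False  # Picture-typed shape without an image attribute
    return shape.image.ext not in UNSUPPORTED_EXTS


# Stand-in shapes: a PNG picture, an EMF picture, and a shape with no image.
shapes = [
    SimpleNamespace(image=SimpleNamespace(ext="png")),
    SimpleNamespace(image=SimpleNamespace(ext="emf")),
    SimpleNamespace(),
]
results = [should_process_picture(s) for s in shapes]
```

Only the first stand-in shape passes the filter. The other two are skipped rather than raising an error while the picture is added to the document.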
Issue resolved:
1242
Areas that have been changed:
MsPowerBackend Slight Change in handle_shapes
1242: https://github.com//issues/1242
Checklist:
The change is minimal, so there is no need to update the documentation, add examples, or add tests.