
w3cDom.asString: Namespace for prefix 'xxx' has not been declared #2087

Closed
richardmorleysmith opened this issue Dec 19, 2023 · 7 comments

@richardmorleysmith

I noticed this crash when calling w3CDom.asString() on a site built with Vue.js. The site used "v-bind" where the test below uses "xxx" (e.g. v-bind:class).

To reproduce, run the following test case:

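```java
import static org.junit.jupiter.api.Assertions.assertDoesNotThrow;

import org.jsoup.Jsoup;
import org.jsoup.helper.W3CDom;
import org.jsoup.nodes.Document;
import org.junit.jupiter.api.Test;

class W3CDomNamespaceTest {
    @Test
    void testNameSpaceCrash() {
        // Reporter's setup: namespace awareness turned off
        final W3CDom w3CDom = new W3CDom().namespaceAware(false);
        final String html = """
            <html>
            <body>
            <div xxx:class="test"></div>
            </body>
            </html>""";
        final Document jSoupDoc = Jsoup.parse(html);
        final org.w3c.dom.Document w3CDoc = w3CDom.fromJsoup(jSoupDoc);

        // Fails with "Namespace for prefix 'xxx' has not been declared"
        assertDoesNotThrow(() -> w3CDom.asString(w3CDoc));
    }
}
```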
@jhy
Owner

jhy commented Dec 23, 2023

I'm not sure what the best way to handle this is. The exception comes from the JDK's XML serializer (com.sun.org.apache.xml.internal.serializer), which will always throw if an attribute has an undeclared prefix.
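
To see that the failure comes from the JDK rather than from jsoup, here is a minimal sketch that uses only the JDK: it builds a non-namespace-aware DOM containing xxx:class and serializes it with the default identity Transformer. It assumes the default TransformerFactory is the JDK's built-in one and that a plain DOM Level 1 document is an adequate stand-in for the one in the report; the class and method names are hypothetical.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class: reproduces the serializer error with the JDK alone.
class UndeclaredPrefixRepro {

    // Builds <div xxx:class="test"> with a non-namespace-aware (DOM Level 1) builder.
    static Document buildDoc() {
        try {
            Document doc = DocumentBuilderFactory.newInstance() // namespaceAware defaults to false
                    .newDocumentBuilder().newDocument();
            Element div = doc.createElement("div");
            div.setAttribute("xxx:class", "test"); // prefix "xxx" is never declared
            doc.appendChild(div);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Serializes with the default identity Transformer.
    static String serialize(Document doc) throws TransformerException {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    // Returns whatever was thrown while serializing, or null if serialization succeeded.
    static Throwable tryUndeclaredPrefix() {
        Document doc = buildDoc();
        try {
            serialize(doc);
            return null;
        } catch (TransformerException | RuntimeException e) {
            return e;
        }
    }
}
```

Registering an ErrorListener does not turn this into a recoverable warning, which matches the later comment in this thread.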

A couple of options:

  1. When creating the attribute, jsoup could set an arbitrary namespace URI for an undeclared prefix. The output would be something like: <div xmlns:xxx="undefined" xxx:class="test"></div>
  2. Or, we could escape the ":" in the attribute key, so the output would be: <div xxxU00003Aclass="test"></div>
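
For comparison, both proposed outputs can be produced with the plain JDK DOM API. This sketch builds option 1 (prefix bound to the placeholder URI "undefined" and declared on the element) and option 2 (":" replaced with "U00003A"), then serializes each with the default identity Transformer. The class, its helpers, and escapeColon are hypothetical illustrations, not jsoup API or jsoup's actual fix.

```java
import java.io.StringWriter;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class: the two proposed outputs, built with the JDK DOM API.
class PrefixOptions {

    static Document newNamespaceAwareDoc() {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            return f.newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    static String serialize(Document doc) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Option 1: bind "xxx" to the placeholder URI "undefined" and declare it on the element.
    static String optionOne() {
        Document doc = newNamespaceAwareDoc();
        Element div = doc.createElementNS(null, "div");
        div.setAttributeNS(XMLConstants.XMLNS_ATTRIBUTE_NS_URI, "xmlns:xxx", "undefined");
        div.setAttributeNS("undefined", "xxx:class", "test");
        doc.appendChild(div);
        return serialize(doc);
    }

    // Option 2: escape ':' so the key no longer has a prefix (hypothetical escaping helper).
    static String escapeColon(String key) {
        return key.replace(":", "U00003A");
    }

    static String optionTwo() {
        Document doc = newNamespaceAwareDoc();
        Element div = doc.createElementNS(null, "div");
        div.setAttributeNS(null, escapeColon("xxx:class"), "test");
        doc.appendChild(div);
        return serialize(doc);
    }
}
```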

Option 1 is probably more compatible: in this Vue case, the JS would still execute.

Can you add some detail about your use case? What are you trying to do with the W3C interface and its serialization, as opposed to jsoup's own document serialization?

@jhy jhy changed the title from "Namespace for prefix 'xxx' has not been declared" to "w3cDom.asString: Namespace for prefix 'xxx' has not been declared" Dec 23, 2023
@jhy jhy added the needs-more-info label ("More information is needed from the reporter to progress the issue") Dec 30, 2023
@richardmorleysmith
Author

Hi @jhy,

I'm using a JavaFX WebView to load web pages. The WebView gives me a W3C Document, which I then convert into a String using jsoup's W3CDom class.

@richardmorleysmith
Author

Hey @jhy, just wondering if there were any updates on this one? Is there still more info you need? :)

@jhy
Owner

jhy commented Jul 1, 2024

Sorry for the late reply, and thanks for the use-case info. I think my suggested option 1 would be best; do you agree, or do you have another suggestion?

@richardmorleysmith
Author

Hi @jhy, I agree that option 1 would work best!

@jhy jhy removed the needs-more-info label ("More information is needed from the reporter to progress the issue") Mar 5, 2025
@jhy jhy closed this as completed in 3082a4f Mar 10, 2025
@jhy
Owner

jhy commented Mar 10, 2025

I have fixed this for W3C DOM objects created by jsoup - as in your test case.

However, for W3C DOMs created outside jsoup (e.g. via JavaFX, as you mentioned), this won't help. I tried adding an error listener to the transformer but could not catch the error, so I'll treat that as out of scope for now. As a workaround, you could make a pass over the document and declare the missing prefixes before passing it to asString().
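
A minimal sketch of that workaround, using only the org.w3c.dom API: walk every element in document order and, for each attribute prefix not declared on the element or an ancestor, add an xmlns:prefix declaration. It assumes the placeholder URI "undefined" from option 1, skips the reserved xml/xmlns prefixes, and treats a leading colon (Vue's ":class" shorthand) as no prefix. The class and method names are hypothetical, not jsoup API.

```java
import java.io.StringWriter;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of jsoup: declares undeclared attribute prefixes
// so that the JDK serializer can resolve them.
class DeclareMissingPrefixes {
    // Placeholder namespace URI, mirroring option 1 (an assumption).
    static final String PLACEHOLDER_URI = "undefined";

    static void declareMissingPrefixes(Document doc) {
        // "*" lists elements in document order, so ancestors are handled before
        // descendants and their new declarations are visible further down.
        NodeList elements = doc.getElementsByTagName("*");
        for (int i = 0; i < elements.getLength(); i++) {
            Element el = (Element) elements.item(i);
            Set<String> missing = new LinkedHashSet<>();
            NamedNodeMap attrs = el.getAttributes();
            for (int j = 0; j < attrs.getLength(); j++) {
                String name = attrs.item(j).getNodeName();
                int colon = name.indexOf(':');
                if (colon <= 0) continue; // no prefix, or Vue's ":class" shorthand
                String prefix = name.substring(0, colon);
                if (prefix.equals("xml") || prefix.equals("xmlns")) continue; // reserved
                if (!isDeclared(el, prefix)) missing.add(prefix);
            }
            // Add declarations after the scan so the attribute map isn't modified mid-iteration.
            for (String prefix : missing) {
                el.setAttribute("xmlns:" + prefix, PLACEHOLDER_URI);
            }
        }
    }

    // True if el or one of its ancestor elements carries an xmlns:prefix attribute.
    static boolean isDeclared(Element el, String prefix) {
        for (Node n = el; n != null && n.getNodeType() == Node.ELEMENT_NODE; n = n.getParentNode()) {
            if (((Element) n).hasAttribute("xmlns:" + prefix)) return true;
        }
        return false;
    }

    // Non-namespace-aware document, as the default DocumentBuilderFactory produces.
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    static String serialize(Document doc) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With a JavaFX WebView, the idea would be to run declareMissingPrefixes on the document returned by WebEngine.getDocument() before passing it to W3CDom.asString(). That path is untested here, and the pass mutates the document in place, which for a WebView is the live page.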

@jhy jhy added this to the 1.19.2 milestone Mar 10, 2025
@richardmorleysmith
Author

Hi @jhy,

Thanks for the update, looks great!
