Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: unevaluatedProperties doesn't correctly handle subschema validation #423

Conversation

tobz
Copy link
Contributor

@tobz tobz commented Mar 24, 2023

Fixes #421.

Context

As described in #421, my initial attempt at adding subschema validation handling in unevaluatedProperties was a brute-force approach that sufficed to pass the JSON Schema test suite, but didn't correctly evaluate real-world schemas/data.

Instead of only considering properties evaluated if the subschema they were part of was valid, taking into account the behavior of the validator keyword (i.e. oneOf can't have multiple matching subschemas), it took an approach of simply finding the first subschema which seemed to favorably evaluate/validate the property.

Solution

SubschemaSubvalidator has been changed in two significant ways:

  • it now properly considers the validation type (i.e. allOf vs oneOf)
  • it validates the entire instance before validating the property

In practice, this means that we generate the SchemaNode for each subschema, as well as our stripped-down unevaluated properties validator based on the subschema, used for evaluating properties themselves. Using oneOf as an example:

  • for each subschema in the oneOf:
    • validate the property against the property validator (UnevaluatedPropertiesValidator) and the property's parent instance against the node validator (SchemaNode)
  • for each validation result pair:
    • if the instance result was valid, store the property validation result in a local
    • if the local was already populated, return None overall: more than one valid subschema in a oneOf is not valid
  • return the local property validation result

This isn't particularly novel or anything, as it's just following the existing behavior for the oneOf/allOf/anyOf validators, with a little extra logic to thread through the evaluation status of the specific property.

Testing

I've added three new unit tests -- one for each subschema validation mode -- with fairly simple examples, but they initially did not pass, and now, with this change, they do. 😂

I've also created a simple CLI program to use jsonschema with for taking a schema and input data, and have used it successfully on a large schema document (~800KB, 200+ instances of unevaluatedProperties: false, 400-500 schema definitions, etc) where it properly detects... well, if a property is unevaluated or not.

It's still a little weird that the official JSON Schema test suite doesn't have a test for this, but I'm guessing it's based on the fact that their unofficial recommendation for implementing unevaluatedProperties is based on checking the output annotations from all of the other keywords, whereas we do it all ourselves in the unevaluatedProperties validator. That might be worth thinking about in the future, in terms of refactoring the validator to make it more robust... but for now, I'm just trying to get this working correctly. :)

@codecov
Copy link

codecov bot commented Mar 24, 2023

Codecov Report

Merging #423 (50242d4) into master (4179972) will decrease coverage by 0.02%.
The diff coverage is 80.85%.

@@            Coverage Diff             @@
##           master     #423      +/-   ##
==========================================
- Coverage   79.40%   79.38%   -0.02%     
==========================================
  Files          56       56              
  Lines        5015     5133     +118     
==========================================
+ Hits         3982     4075      +93     
- Misses       1033     1058      +25     
Impacted Files Coverage Δ
jsonschema/src/lib.rs 66.66% <ø> (ø)
jsonschema/src/keywords/unevaluated_properties.rs 85.52% <80.85%> (-1.60%) ⬇️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@Stranger6667
Copy link
Owner

Hi @tobz

The description is impeccable! Thank you very much :) Indeed, I agree that this is an improvement over the status quo and we can make more changes in the future :)

Maybe we can upstream some of the tests to the official test suite? Also, I guess that @Julian might be interested in reusing some of the logical parts we have in this PR in some other JSON Schema related projects? :) or there could be some useful changes for other implementations too

P.S. I hope to find more bandwidth to work on this myself too, especially on those $ref issues found by bowtie

@@ -1094,6 +1329,11 @@ pub(crate) fn compile<'a>(
}
}

fn boxed_errors<'a>(errors: Vec<ValidationError<'a>>) -> ErrorIterator<'a> {
let boxed_errors: ErrorIterator<'a> = Box::new(errors.into_iter());
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it work without binding? I.e. immediately return Box::new(errors.into_iter())

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't, no. The compiler doesn't want to infer that std::vec::IntoIter<...> is dyn Iterator<...> without it, which is why I made this small helper method in the first place.

Silly compiler. 😠

// evaluated by at least one subschema. We group the errors for the property itself
// across all subschemas, though.
SubschemaBehavior::All => {
let results = mapped.collect::<Vec<_>>();
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we reuse mapped below without collecting it into a Vec?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P.S. There is a similar case above

@tobz
Copy link
Contributor Author

tobz commented Mar 30, 2023

Maybe we can upstream some of the tests to the official test suite? Also, I guess that @Julian might be interested in reusing some of the logical parts we have in this PR in some other JSON Schema related projects? :) or there could be some useful changes for other implementations too

I actually realized that the commit that the test suite submodule is pointed at is actually fair old, and if I bring it up to the latest commit... a lot of new unevaluatedProperties tests are added that don't pass, as well as other, unrelated new tests that also don't pass.

I had the idea that I would get this merged, and then try and open another PR that updates the test suite checkout and fixes whatever needs to be fixed, assuming the fixes are small enough, all in the same PR.

@tobz
Copy link
Contributor Author

tobz commented Mar 30, 2023

As a separate note, using this approach for unevaluatedProperties in practice has shown me that it can be quite slow when there is a lot of subschema validation due to the double validation: we run subschema validation ourselves, and then the actual validators for those keywords run outside of unevaluatedProperties.

This is likely why most implementations actually go with the annotation-based approach, examining what properties got evaluated by successful schema nodes, in addition to simplifying the logic for unevaluatedProperties itself. I do plan to pursue a refactor to do it with the annotation-based approach, but it would involve -- I believe -- deeper changes to all other property-related keywords to emit the right annotations, so I was going to, again, hold off until I got this implementation working as correctly as possible, and then focus on the "make it fast" part.

@Julian
Copy link

Julian commented Apr 2, 2023

Hey! Have been a bit behind, sorry. Definitely if there's anything more widely usable I'm interested :)

And yeah certainly so if it turns out there are test gaps even after updating!

@tobz
Copy link
Contributor Author

tobz commented May 31, 2023

Really delayed response here (back after an extended work leave) but I'm definitely interested in trying to finally get this merged.

I'll still keep thinking about the aforementioned rework to deal with the performance problems, but this PR is still strictly better from a correctness standpoint, at least. :)

@Stranger6667
Copy link
Owner

@tobz Sorry for the delay from my side too! I agree with you, this PR is a strict improvement over the status quo. I am going to re-review it once more and will work on releasing this :)

@Stranger6667 Stranger6667 merged commit fde9c44 into Stranger6667:master Jul 5, 2023
@tobz tobz deleted the fix-unevaluated-props-subschema-correctness branch July 5, 2023 18:30
@tobz
Copy link
Contributor Author

tobz commented Jul 5, 2023

Makes sense, and glad to see it merged. 👍🏻

Like I said, I'm still thinking about reworking the support to be in line with other validator implementations, to simplify the code and improve the performance. Hopefully I can get to that in the next few weeks, but, we'll see. :)

@tobz tobz restored the fix-unevaluated-props-subschema-correctness branch July 6, 2023 13:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improper handling of subschema validation in unevaluatedProperties.
3 participants