fix: `unevaluatedProperties` doesn't correctly handle subschema validation #423

tobz · 2023-03-24T20:59:31Z

Fixes #421.

Context

As described in #421, my initial attempt at adding subschema validation handling in unevaluatedProperties was a brute-force approach that sufficed to pass the JSON Schema test suite, but didn't correctly evaluate real-world schemas/data.

Instead of only considering properties evaluated if the subschema they were part of was valid, taking into account the behavior of the validator keyword (i.e. oneOf can't have multiple matching subschemas), it took an approach of simply finding the first subschema which seemed to favorably evaluate/validate the property.

Solution

SubschemaSubvalidator has been changed in two significant ways:

it now properly considers the validation type (i.e. allOf vs oneOf)
it validates the entire instance before validating the property

In practice, this means that we generate the SchemaNode for each subschema, as well as our stripped-down unevaluated properties validator based on the subschema, used for evaluating properties themselves. Using oneOf as an example:

for each subschema in the oneOf:
- validate the property against the property validator (UnevaluatedPropertiesValidator) and the property's parent instance against the node validator (SchemaNode)
for each validation result pair:
- if the instance result was valid, store the property validation result in a local
- if the local was already populated, return None overall: more than one valid subschema in a oneOf is not valid
return the local property validation result

This isn't particularly novel or anything, as it's just following the existing behavior for the oneOf/allOf/anyOf validators, with a little extra logic to thread through the evaluation status of the specific property.

Testing

I've added three new unit tests -- one for each subschema validation mode -- with fairly simple examples, but they initially did not pass, and now, with this change, they do. 😂

I've also created a simple CLI program to use jsonschema with for taking a schema and input data, and have used it successfully on a large schema document (~800KB, 200+ instances of unevaluatedProperties: false, 400-500 schema definitions, etc) where it properly detects... well, if a property is unevaluated or not.

It's still a little weird that the official JSON Schema test suite doesn't have a test for this, but I'm guessing it's based on the fact that their unofficial recommendation for implementing unevaluatedProperties is based on checking the output annotations from all of the other keywords, whereas we do it all ourselves in the unevaluatedProperties validator. That might be worth thinking about in the future, in terms of refactoring the validator to make it more robust... but for now, I'm just trying to get this working correctly. :)

codecov · 2023-03-24T21:09:50Z

Codecov Report

Merging #423 (50242d4) into master (4179972) will decrease coverage by 0.02%.
The diff coverage is 80.85%.

@@            Coverage Diff             @@
##           master     #423      +/-   ##
==========================================
- Coverage   79.40%   79.38%   -0.02%     
==========================================
  Files          56       56              
  Lines        5015     5133     +118     
==========================================
+ Hits         3982     4075      +93     
- Misses       1033     1058      +25

Impacted Files	Coverage Δ
jsonschema/src/lib.rs	`66.66% <ø> (ø)`
jsonschema/src/keywords/unevaluated_properties.rs	`85.52% <80.85%> (-1.60%)`	⬇️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

Stranger6667 · 2023-03-30T21:14:23Z

Hi @tobz

The description is impeccable! Thank you very much :) Indeed, I agree that this is an improvement over the status quo and we can make more changes in the future :)

Maybe we can upstream some of the tests to the official test suite? Also, I guess that @Julian might be interested in reusing some of the logical parts we have in this PR in some other JSON Schema related projects? :) or there could be some useful changes for other implementations too

P.S. I hope to find more bandwidth to work on this myself too, especially on those $ref issues found by bowtie

Stranger6667 · 2023-03-30T21:19:14Z

jsonschema/src/keywords/unevaluated_properties.rs

@@ -1094,6 +1329,11 @@ pub(crate) fn compile<'a>(
    }
 }

+fn boxed_errors<'a>(errors: Vec<ValidationError<'a>>) -> ErrorIterator<'a> {
+    let boxed_errors: ErrorIterator<'a> = Box::new(errors.into_iter());


Would it work without binding? I.e. immediately return Box::new(errors.into_iter())

It doesn't, no. The compiler doesn't want to infer that std::vec::IntoIter<...> is dyn Iterator<...> without it, which is why I made this small helper method in the first place.

Silly compiler. 😠

Stranger6667 · 2023-03-30T21:27:45Z

jsonschema/src/keywords/unevaluated_properties.rs

+            // evaluated by at least one subschema. We group the errors for the property itself
+            // across all subschemas, though.
+            SubschemaBehavior::All => {
+                let results = mapped.collect::<Vec<_>>();


Can we reuse mapped below without collecting it into a Vec?

P.S. There is a similar case above

tobz · 2023-03-30T21:42:57Z

Maybe we can upstream some of the tests to the official test suite? Also, I guess that @Julian might be interested in reusing some of the logical parts we have in this PR in some other JSON Schema related projects? :) or there could be some useful changes for other implementations too

I actually realized that the commit that the test suite submodule is pointed at is actually fair old, and if I bring it up to the latest commit... a lot of new unevaluatedProperties tests are added that don't pass, as well as other, unrelated new tests that also don't pass.

I had the idea that I would get this merged, and then try and open another PR that updates the test suite checkout and fixes whatever needs to be fixed, assuming the fixes are small enough, all in the same PR.

tobz · 2023-03-30T21:47:00Z

As a separate note, using this approach for unevaluatedProperties in practice has shown me that it can be quite slow when there is a lot of subschema validation due to the double validation: we run subschema validation ourselves, and then the actual validators for those keywords run outside of unevaluatedProperties.

This is likely why most implementations actually go with the annotation-based approach, examining what properties got evaluated by successful schema nodes, in addition to simplifying the logic for unevaluatedProperties itself. I do plan to pursue a refactor to do it with the annotation-based approach, but it would involve -- I believe -- deeper changes to all other property-related keywords to emit the right annotations, so I was going to, again, hold off until I got this implementation working as correctly as possible, and then focus on the "make it fast" part.

Julian · 2023-04-02T14:14:27Z

Hey! Have been a bit behind, sorry. Definitely if there's anything more widely usable I'm interested :)

And yeah certainly so if it turns out there are test gaps even after updating!

tobz · 2023-05-31T13:35:40Z

Really delayed response here (back after an extended work leave) but I'm definitely interested in trying to finally get this merged.

I'll still keep thinking about the aforementioned rework to deal with the performance problems, but this PR is still strictly better from a correctness standpoint, at least. :)

Stranger6667 · 2023-07-05T08:50:18Z

@tobz Sorry for the delay from my side too! I agree with you, this PR is a strict improvement over the status quo. I am going to re-review it once more and will work on releasing this :)

tobz · 2023-07-05T18:31:43Z

Makes sense, and glad to see it merged. 👍🏻

Like I said, I'm still thinking about reworking the support to be in line with other validator implementations, to simplify the code and improve the performance. Hopefully I can get to that in the next few weeks, but, we'll see. :)

fix: unevaluatedProperties doesn't correctly handle subschema validation

548907b

tobz added 4 commits March 27, 2023 10:33

fix clippy lints and feature flag stuff

09d6733

trying to optimize slightly

fbe26e0

tweak

95b8c2a

make additionalProperties a first-class subvalidator

50242d4

Stranger6667 reviewed Mar 30, 2023

View reviewed changes

Stranger6667 approved these changes Jul 5, 2023

View reviewed changes

Stranger6667 merged commit fde9c44 into Stranger6667:master Jul 5, 2023

tobz deleted the fix-unevaluated-props-subschema-correctness branch July 5, 2023 18:30

tobz restored the fix-unevaluated-props-subschema-correctness branch July 6, 2023 13:02

uncenter mentioned this pull request Oct 30, 2023

Discrepancy in validation with JSON Schema draft 2020-12 tamasfe/taplo#497

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: `unevaluatedProperties` doesn't correctly handle subschema validation #423

fix: `unevaluatedProperties` doesn't correctly handle subschema validation #423

tobz commented Mar 24, 2023

codecov bot commented Mar 24, 2023 •

edited

Loading

Stranger6667 commented Mar 30, 2023

Stranger6667 Mar 30, 2023

tobz Mar 30, 2023

Stranger6667 Mar 30, 2023

Stranger6667 Mar 30, 2023

tobz commented Mar 30, 2023

tobz commented Mar 30, 2023 •

edited

Loading

Julian commented Apr 2, 2023

tobz commented May 31, 2023

Stranger6667 commented Jul 5, 2023

tobz commented Jul 5, 2023

fix: unevaluatedProperties doesn't correctly handle subschema validation #423

fix: unevaluatedProperties doesn't correctly handle subschema validation #423

Conversation

tobz commented Mar 24, 2023

Context

Solution

Testing

codecov bot commented Mar 24, 2023 • edited Loading

Codecov Report

Stranger6667 commented Mar 30, 2023

Stranger6667 Mar 30, 2023

Choose a reason for hiding this comment

tobz Mar 30, 2023

Choose a reason for hiding this comment

Stranger6667 Mar 30, 2023

Choose a reason for hiding this comment

Stranger6667 Mar 30, 2023

Choose a reason for hiding this comment

tobz commented Mar 30, 2023

tobz commented Mar 30, 2023 • edited Loading

Julian commented Apr 2, 2023

tobz commented May 31, 2023

Stranger6667 commented Jul 5, 2023

tobz commented Jul 5, 2023

fix: `unevaluatedProperties` doesn't correctly handle subschema validation #423

fix: `unevaluatedProperties` doesn't correctly handle subschema validation #423

codecov bot commented Mar 24, 2023 •

edited

Loading

tobz commented Mar 30, 2023 •

edited

Loading