-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid re-resolving column names after analysis #25240
base: master
Are you sure you want to change the base?
Conversation
898190f
to
ef453bb
Compare
Record the fields directly and extract the column name from them when needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This definitely looks like a better solution and one small step towards #17, thanks :) In terms of performance overhead, though, it looks like a wash - one linear algorithm gets replaced by some other linear processing paths. But I guess we can't avoid that.
Set<Field> fields = metadata.getTableSchema(session, tableHandle) | ||
.columns().stream() | ||
.map(column -> Field.newQualified( | ||
node.getTableName(), | ||
Optional.of(column.getName()), | ||
column.getType(), | ||
column.isHidden(), | ||
Optional.of(tableName), | ||
Optional.of(column.getName()), | ||
false)) | ||
.collect(Collectors.toSet()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, that PR is what I started doing, but this is the place where I got into trouble, because I didn't know where to get a Field
instance that I would put into the collection of referenced columns.
.map(Expression::toString))) | ||
.distinct() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why distinct was moved?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously .distinct()
worked on String
, so it made sense to do it early. Now we're processing Field
instances, which are not comparable, so .distinct()
is moved to a later stage, where we convert them to ColumnInfo
, which are comparable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This distinct
is actually pretty important and is the reason this test is failing:
at io.trino.execution.TestEventListenerBasic.testReferencedTablesWithColumnMask(TestEventListenerBasic.java:799):
Expecting actual:
["test_varchar", "test_varchar", "test_bigint"]
to contain exactly (and in same order):
["test_varchar", "test_bigint"]
but some elements were not expected:
["test_varchar"]
Previously we got the test_varchar
field twice, but it was just a name, so it could easily be disambiguated. Now it's a Field
, and Field
instances are always distinct. Thing is that this field is referenced once as a target of SELECT
and another time as the field that has a column mask. In StatementAnalyzer
we create a new distinct Field
instance each time and it makes it a distinct instance to the one used to register a column mask. So when we are processing the references, only one registered Field
instance has a column mask associated, so of the resulting ColumnInfo
objects only one has a column mask and the two objects for the field are distinct as well and are not disambiguated.
columns.stream() | ||
.map(Field::getOriginColumnName) | ||
.map(Optional::get) | ||
.collect(Collectors.toSet())))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
static import, immutableSet?
@@ -46,4 +47,19 @@ public Optional<String> getMask() | |||
{ | |||
return mask; | |||
} | |||
|
|||
@Override |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we migrate this class to record?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an SPI class, although marked as @Unstable
, so we can migrate it to record, but still we should do it carefully
Optional.of(tableName), | ||
Optional.of(column.getName()), | ||
false)) | ||
.collect(Collectors.toSet()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
immutable set and static import?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems this is a convention is this class, not sure we should be changing that (at this time)
@ksobolew what do you mean? I see |
Now we linearly converts columns to |
Yes, but now you are doing this loop only once, and then any lookup in the analysis is constant. |
I guess you're right |
Record the fields directly and extract the column name from them when needed.
Follow up to https://github.com/trinodb/trino/pull/24055/files#r1952735457
cc @kokosing @ksobolew
Release notes
(x) This is not user-visible or is docs only, and no release notes are required.