Avoid re-resolving column names after analysis #25240

martint · 2025-03-06T20:31:19Z

Record the fields directly and extract the column name from them when needed.

Follow up to https://github.com/trinodb/trino/pull/24055/files#r1952735457

Release notes

(x) This is not user-visible or is docs only, and no release notes are required.


ksobolew

This definitely looks like a better solution and one small step towards #17, thanks :) In terms of performance overhead, though, it looks like a wash - one linear algorithm gets replaced by some other linear processing paths. But I guess we can't avoid that.

ksobolew · 2025-03-07T08:15:35Z

core/trino-main/src/main/java/io/trino/sql/analyzer/StatementAnalyzer.java

+            Set<Field> fields = metadata.getTableSchema(session, tableHandle)
+                    .columns().stream()
+                    .map(column -> Field.newQualified(
+                            node.getTableName(),
+                            Optional.of(column.getName()),
+                            column.getType(),
+                            column.isHidden(),
+                            Optional.of(tableName),
+                            Optional.of(column.getName()),
+                            false))
+                    .collect(Collectors.toSet());


In general, that PR is what I started doing, but this is the place where I got into trouble, because I didn't know where to get a Field instance that I would put into the collection of referenced columns.

kokosing · 2025-03-07T10:30:52Z

core/trino-main/src/main/java/io/trino/sql/analyzer/Analysis.java

                                            .map(Expression::toString)))
+                            .distinct()


Why was distinct moved?

Previously .distinct() worked on String, so it made sense to do it early. Now we're processing Field instances, which are not comparable, so .distinct() is moved to a later stage, where we convert them to ColumnInfo, which are comparable.

This distinct is actually pretty important and is the reason this test is failing:

at io.trino.execution.TestEventListenerBasic.testReferencedTablesWithColumnMask(TestEventListenerBasic.java:799): Expecting actual: ["test_varchar", "test_varchar", "test_bigint"] to contain exactly (and in same order): ["test_varchar", "test_bigint"] but some elements were not expected: ["test_varchar"]

Previously we got the test_varchar field twice, but it was just a name, so the duplicates could easily be collapsed. Now it's a Field, and Field instances are always distinct. The thing is that this field is referenced once as a target of SELECT and another time as the field that has a column mask. In StatementAnalyzer we create a new Field instance each time, so it is a different instance from the one used to register the column mask. When we process the references, only one of the registered Field instances has a column mask associated with it. As a result, only one of the resulting ColumnInfo objects has a column mask, so the two objects for this field are not equal either and are not deduplicated.
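A minimal, self-contained sketch of the failure mode described in this comment. It does not use Trino's classes: `FieldLike` and `ColumnLike` are hypothetical stand-ins for `Field` (identity equality) and `ColumnInfo` (value equality), and the mask expression is made up. It only illustrates why `.distinct()` on the converted objects still leaves `test_varchar` twice when the mask is attached to just one of the two instances.

```java
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;
import java.util.stream.Collectors;

public class DistinctFieldsSketch
{
    // Hypothetical stand-in for Field: no equals/hashCode, so every instance is distinct.
    static final class FieldLike
    {
        final String name;

        FieldLike(String name)
        {
            this.name = name;
        }
    }

    // Hypothetical stand-in for ColumnInfo: equality is based on column name and mask.
    static final class ColumnLike
    {
        final String column;
        final Optional<String> mask;

        ColumnLike(String column, Optional<String> mask)
        {
            this.column = column;
            this.mask = mask;
        }

        @Override
        public boolean equals(Object o)
        {
            if (this == o) {
                return true;
            }
            if (!(o instanceof ColumnLike)) {
                return false;
            }
            ColumnLike other = (ColumnLike) o;
            return column.equals(other.column) && mask.equals(other.mask);
        }

        @Override
        public int hashCode()
        {
            return Objects.hash(column, mask);
        }
    }

    static List<String> referencedColumnNames()
    {
        // Two separate instances for the same column, as if created at two call sites.
        FieldLike selected = new FieldLike("test_varchar");
        FieldLike masked = new FieldLike("test_varchar");
        FieldLike other = new FieldLike("test_bigint");

        // The mask is registered against one instance only (identity-keyed map).
        Map<FieldLike, String> masks = new IdentityHashMap<>();
        masks.put(masked, "hypothetical_mask_expression");

        return List.of(selected, masked, other).stream()
                .map(field -> new ColumnLike(field.name, Optional.ofNullable(masks.get(field))))
                .distinct() // the two test_varchar entries differ by mask, so both survive
                .map(column -> column.column)
                .collect(Collectors.toList());
    }
}
```

In this sketch `referencedColumnNames()` yields three names with `test_varchar` repeated, which has the same shape as the assertion failure quoted above.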

kokosing · 2025-03-07T10:32:00Z

core/trino-main/src/main/java/io/trino/sql/analyzer/Analyzer.java

+                                    columns.stream()
+                                            .map(Field::getOriginColumnName)
+                                            .map(Optional::get)
+                                            .collect(Collectors.toSet()))));


static import, immutableSet?

kokosing · 2025-03-07T10:32:40Z

core/trino-spi/src/main/java/io/trino/spi/eventlistener/ColumnInfo.java

@@ -46,4 +47,19 @@ public Optional<String> getMask()
    {
        return mask;
    }
+
+    @Override


Should we migrate this class to record?

This is an SPI class, although marked as @Unstable, so we can migrate it to record, but still we should do it carefully
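For illustration only, here is roughly what a record-based version could look like. The type name `ColumnInfoSketch` is hypothetical, the component list is an assumption based on the `mask` field visible in the diff, and the bean-style getters are an assumed compatibility shim (only `getMask()` is visible in the diff; `getColumn()` is a guess). A record generates `equals` and `hashCode` from its components, but its default accessors are `column()` and `mask()`, which is one reason the migration of an SPI class needs care.

```java
import java.util.Objects;
import java.util.Optional;

public class ColumnInfoRecordSketch
{
    // Hypothetical record; not the actual io.trino.spi.eventlistener.ColumnInfo.
    public record ColumnInfoSketch(String column, Optional<String> mask)
    {
        public ColumnInfoSketch
        {
            Objects.requireNonNull(column, "column is null");
            Objects.requireNonNull(mask, "mask is null");
        }

        // Assumed shim: keeps getter-style callers compiling after the migration.
        public String getColumn()
        {
            return column;
        }

        public Optional<String> getMask()
        {
            return mask;
        }
    }
}
```

With this shape, two instances with the same column and mask compare equal without any hand-written `equals` or `hashCode`.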

kokosing · 2025-03-07T10:57:51Z

core/trino-main/src/main/java/io/trino/sql/analyzer/StatementAnalyzer.java

+                            Optional.of(tableName),
+                            Optional.of(column.getName()),
+                            false))
+                    .collect(Collectors.toSet());


immutable set and static import?

It seems this is a convention in this class, so I'm not sure we should be changing that (at this time).

kokosing · 2025-03-07T11:01:32Z

> In terms of performance overhead, though, it looks like a wash - one linear algorithm gets replaced by some other linear processing paths.

@ksobolew what do you mean? I see resolveColumnMask is gone now so there is no linear search anymore. What am I missing?

ksobolew · 2025-03-07T11:03:35Z

> @ksobolew what do you mean? I see resolveColumnMask is gone now so there is no linear search anymore. What am I missing?

Now we linearly convert columns to Field in Analyzer and StatementAnalyzer instead.

kokosing · 2025-03-07T11:18:40Z

Yes, but now you are doing this loop only once, and then any lookup in the analysis is constant.

ksobolew · 2025-03-07T11:33:03Z

> Yes, but now you are doing this loop only once, and then any lookup in the analysis is constant.

I guess you're right
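The trade-off in this exchange can be sketched generically. This is not the Trino code: `Column`, `findMaskByScan`, and `indexMasks` are hypothetical names, and the example only contrasts a linear scan on every lookup with a single linear pass that builds a hash index, after which each lookup takes expected constant time.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class LookupSketch
{
    // Hypothetical column descriptor with a non-null mask expression.
    static final class Column
    {
        final String name;
        final String mask;

        Column(String name, String mask)
        {
            this.name = name;
            this.mask = mask;
        }
    }

    // Linear search: every lookup walks the list again.
    static Optional<String> findMaskByScan(List<Column> columns, String name)
    {
        return columns.stream()
                .filter(column -> column.name.equals(name))
                .map(column -> column.mask)
                .findFirst();
    }

    // One linear pass up front; later lookups are hash lookups.
    static Map<String, String> indexMasks(List<Column> columns)
    {
        Map<String, String> index = new HashMap<>();
        for (Column column : columns) {
            index.put(column.name, column.mask);
        }
        return index;
    }
}
```

Both approaches touch every column at least once, but the indexed version pays that cost a single time no matter how many lookups follow.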

cla-bot bot added the cla-signed label Mar 6, 2025

martint force-pushed the masks branch 2 times, most recently from 898190f to ef453bb Compare March 7, 2025 00:49

Avoid re-resolving column names after analysis

9fb7cf3

Record the fields directly and extract the column name from them when needed.

martint force-pushed the masks branch from ef453bb to 9fb7cf3 Compare March 7, 2025 02:17

ksobolew reviewed Mar 7, 2025


kokosing reviewed Mar 7, 2025
