Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid re-resolving column names after analysis #25240

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
35 changes: 8 additions & 27 deletions core/trino-main/src/main/java/io/trino/sql/analyzer/Analysis.java
Original file line number Diff line number Diff line change
@@ -110,7 +110,6 @@
import java.util.Optional;
import java.util.OptionalLong;
import java.util.Set;
import java.util.stream.Stream;

import static com.google.common.base.MoreObjects.toStringHelper;
import static com.google.common.base.Preconditions.checkArgument;
@@ -119,7 +118,6 @@
import static com.google.common.collect.ImmutableList.toImmutableList;
import static com.google.common.collect.ImmutableMap.toImmutableMap;
import static com.google.common.collect.ImmutableSet.toImmutableSet;
import static com.google.common.collect.MoreCollectors.toOptional;
import static io.trino.sql.analyzer.QueryType.DESCRIBE;
import static io.trino.sql.analyzer.QueryType.EXPLAIN;
import static java.lang.Boolean.FALSE;
@@ -156,7 +154,7 @@ public class Analysis
private final Map<NodeRef<Expression>, ResolvedField> columnReferences = new LinkedHashMap<>();

// a map of users to the columns per table that they access
private final Map<AccessControlInfo, Map<QualifiedObjectName, Set<String>>> tableColumnReferences = new LinkedHashMap<>();
private final Map<AccessControlInfo, Map<QualifiedObjectName, Set<Field>>> tableColumnReferences = new LinkedHashMap<>();

// Record fields prefixed with labels in row pattern recognition context
private final Map<NodeRef<Expression>, Optional<String>> labels = new LinkedHashMap<>();
@@ -961,10 +959,10 @@ public UnnestAnalysis getUnnest(Unnest node)
return unnestAnalysis.get(NodeRef.of(node));
}

public void addTableColumnReferences(AccessControl accessControl, Identity identity, Multimap<QualifiedObjectName, String> tableColumnMap)
public void addTableColumnReferences(AccessControl accessControl, Identity identity, Multimap<QualifiedObjectName, Field> tableColumnMap)
{
AccessControlInfo accessControlInfo = new AccessControlInfo(accessControl, identity);
Map<QualifiedObjectName, Set<String>> references = tableColumnReferences.computeIfAbsent(accessControlInfo, k -> new LinkedHashMap<>());
Map<QualifiedObjectName, Set<Field>> references = tableColumnReferences.computeIfAbsent(accessControlInfo, k -> new LinkedHashMap<>());
tableColumnMap.asMap()
.forEach((key, value) -> references.computeIfAbsent(key, k -> new HashSet<>()).addAll(value));
}
@@ -1092,7 +1090,7 @@ public JsonTableAnalysis getJsonTableAnalysis(JsonTable jsonTable)
return jsonTableAnalyses.get(NodeRef.of(jsonTable));
}

public Map<AccessControlInfo, Map<QualifiedObjectName, Set<String>>> getTableColumnReferences()
public Map<AccessControlInfo, Map<QualifiedObjectName, Set<Field>>> getTableColumnReferences()
{
return tableColumnReferences;
}
@@ -1188,11 +1186,11 @@ public List<TableInfo> getReferencedTables()
.map(tablesToColumns -> tablesToColumns.get(tableName))
.filter(Objects::nonNull)
.flatMap(Collection::stream)
.distinct()
.map(fieldName -> new ColumnInfo(
fieldName,
resolveColumnMask(table.getNode().getName(), fieldName, columnMasks.getOrDefault(table, ImmutableMap.of()))
.map(field -> new ColumnInfo(
field.getOriginColumnName().get(),
Optional.ofNullable(columnMasks.getOrDefault(table, ImmutableMap.of()).get(field))
.map(Expression::toString)))
.distinct()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why distinct was moved?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously .distinct() worked on String, so it made sense to do it early. Now we're processing Field instances, which are not comparable, so .distinct() is moved to a later stage, where we convert them to ColumnInfo, which are comparable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This distinct is actually pretty important and is the reason this test is failing:

	at io.trino.execution.TestEventListenerBasic.testReferencedTablesWithColumnMask(TestEventListenerBasic.java:799):

Expecting actual:
  ["test_varchar", "test_varchar", "test_bigint"]
to contain exactly (and in same order):
  ["test_varchar", "test_bigint"]
but some elements were not expected:
  ["test_varchar"]

Previously we got the test_varchar field twice, but it was just a name, so it could easily be disambiguated. Now it's a Field, and Field instances are always distinct. Thing is that this field is referenced once as a target of SELECT and another time as the field that has a column mask. In StatementAnalyzer we create a new distinct Field instance each time and it makes it a distinct instance to the one used to register a column mask. So when we are processing the references, only one registered Field instance has a column mask associated, so of the resulting ColumnInfo objects only one has a column mask and the two objects for the field are distinct as well and are not disambiguated.

.collect(toImmutableList());

TableEntry info = entry.getValue();
@@ -1212,23 +1210,6 @@ public List<TableInfo> getReferencedTables()
.collect(toImmutableList());
}

private static Optional<Expression> resolveColumnMask(QualifiedName tableName, String fieldName, Map<Field, Expression> expressions)
{
QualifiedName qualifiedFieldName = concatIdentifier(tableName, fieldName);
return expressions.entrySet().stream()
.filter(fieldExpression -> fieldExpression.getKey().canResolve(qualifiedFieldName))
.collect(toOptional())
.map(Map.Entry::getValue);
}

private static QualifiedName concatIdentifier(QualifiedName tableName, String fieldName)
{
return QualifiedName.of(Stream.concat(
tableName.getOriginalParts().stream(),
Stream.of(new Identifier(fieldName)))
.collect(toImmutableList()));
}

public List<RoutineInfo> getRoutines()
{
return resolvedFunctions.values().stream()
Original file line number Diff line number Diff line change
@@ -34,6 +34,8 @@

import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

import static io.trino.spi.StandardErrorCode.EXPRESSION_NOT_SCALAR;
import static io.trino.sql.analyzer.ExpressionTreeUtils.extractAggregateFunctions;
@@ -105,7 +107,10 @@ public Analysis analyze(Statement statement, QueryType queryType)
accessControlInfo.getAccessControl().checkCanSelectFromColumns(
accessControlInfo.getSecurityContext(session.getRequiredTransactionId(), session.getQueryId(), session.getStart()),
tableName,
columns)));
columns.stream()
.map(Field::getOriginColumnName)
.map(Optional::get)
.collect(Collectors.toSet()))));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

static import, immutableSet?

}

return analysis;
Original file line number Diff line number Diff line change
@@ -328,7 +328,7 @@ public class ExpressionAnalyzer
// For lambda argument references, maps each QualifiedNameReference to the referenced LambdaArgumentDeclaration
private final Map<NodeRef<Identifier>, LambdaArgumentDeclaration> lambdaArgumentReferences = new LinkedHashMap<>();
private final Set<NodeRef<FunctionCall>> windowFunctions = new LinkedHashSet<>();
private final Multimap<QualifiedObjectName, String> tableColumnReferences = HashMultimap.create();
private final Multimap<QualifiedObjectName, Field> tableColumnReferences = HashMultimap.create();

// Track referenced fields from source relation node
private final Multimap<NodeRef<Node>, Field> referencedFields = HashMultimap.create();
@@ -578,7 +578,7 @@ public Set<NodeRef<FunctionCall>> getWindowFunctions()
return unmodifiableSet(windowFunctions);
}

public Multimap<QualifiedObjectName, String> getTableColumnReferences()
public Multimap<QualifiedObjectName, Field> getTableColumnReferences()
{
return tableColumnReferences;
}
@@ -766,7 +766,7 @@ private Type handleResolvedField(Expression node, ResolvedField resolvedField, C
}

if (field.getOriginTable().isPresent() && field.getOriginColumnName().isPresent()) {
tableColumnReferences.put(field.getOriginTable().get(), field.getOriginColumnName().get());
tableColumnReferences.put(field.getOriginTable().get(), field);
}

sourceFields.add(field);
Original file line number Diff line number Diff line change
@@ -886,12 +886,24 @@ protected Scope visitAnalyze(Analyze node, Optional<Scope> scope)

analysis.setAnalyzeMetadata(metadata.getStatisticsCollectionMetadata(session, tableHandle, analyzeProperties));

Set<Field> fields = metadata.getTableSchema(session, tableHandle)
.columns().stream()
.map(column -> Field.newQualified(
node.getTableName(),
Optional.of(column.getName()),
column.getType(),
column.isHidden(),
Optional.of(tableName),
Optional.of(column.getName()),
false))
.collect(Collectors.toSet());
Comment on lines +889 to +899
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, that PR is what I started doing, but this is the place where I got into trouble, because I didn't know where to get a Field instance that I would put into the collection of referenced columns.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

immutable set and static import?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems this is a convention is this class, not sure we should be changing that (at this time)


// user must have read and insert permission in order to analyze stats of a table
analysis.addTableColumnReferences(
accessControl,
session.getIdentity(),
ImmutableMultimap.<QualifiedObjectName, String>builder()
.putAll(tableName, metadata.getColumnHandles(session, tableHandle).keySet())
ImmutableMultimap.<QualifiedObjectName, Field>builder()
.putAll(tableName, fields)
.build());
try {
accessControl.checkCanInsertIntoTable(session.toSecurityContext(), tableName);
@@ -3856,7 +3868,7 @@ private void recordColumnAccess(Field field)
analysis.addTableColumnReferences(
accessControl,
session.getIdentity(),
ImmutableMultimap.of(field.getOriginTable().get(), field.getOriginColumnName().get()));
ImmutableMultimap.of(field.getOriginTable().get(), field));
}
}

Original file line number Diff line number Diff line change
@@ -17,6 +17,7 @@
import com.fasterxml.jackson.annotation.JsonProperty;
import io.trino.spi.Unstable;

import java.util.Objects;
import java.util.Optional;

/**
@@ -46,4 +47,19 @@ public Optional<String> getMask()
{
return mask;
}

@Override
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we migrate this class to record?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an SPI class, although marked as @Unstable, so we can migrate it to record, but still we should do it carefully

public boolean equals(Object o)
{
if (!(o instanceof ColumnInfo that)) {
return false;
}
return Objects.equals(column, that.column) && Objects.equals(mask, that.mask);
}

@Override
public int hashCode()
{
return Objects.hash(column, mask);
}
}

Unchanged files with check annotations Beta

assertAccessDenied("SHOW STATS FOR (SELECT *, nationkey FROM nation)", "Cannot select from columns \\[nationkey, regionkey, name, comment] in table or view .*.nation", privilege("nation.nationkey", SELECT_COLUMN));
assertAccessDenied("SHOW STATS FOR (SELECT *, * FROM nation)", "Cannot select from columns \\[nationkey, regionkey, name, comment] in table or view .*.nation", privilege("nation.nationkey", SELECT_COLUMN));
assertAccessDenied("SHOW STATS FOR (SELECT linenumber, orderkey FROM lineitem)", "Cannot select from columns \\[linenumber, orderkey] in table or view .*.lineitem.*", privilege("lineitem", SELECT_COLUMN));
assertAccessDenied("SHOW STATS FOR (SELECT linenumber, orderkey, quantity FROM lineitem)", "Cannot select from columns \\[linenumber, orderkey, quantity] in table or view .*.lineitem.*", privilege("lineitem.linenumber", SELECT_COLUMN), privilege("lineitem.orderkey", SELECT_COLUMN));

Check failure on line 366 in testing/trino-tests/src/test/java/io/trino/security/TestAccessControl.java

GitHub Actions / test (testing/trino-tests)

TestAccessControl.testAccessControl

[Query: SHOW STATS FOR (SELECT linenumber, orderkey, quantity FROM lineitem)] Expecting message: "Access Denied: Cannot select from columns [linenumber, quantity, orderkey] in table or view blackhole.default.lineitem" to match regex: ".*Access Denied: Cannot select from columns \[linenumber, orderkey, quantity] in table or view .*.lineitem.*" but did not. Throwable that failed the check: io.trino.testing.QueryFailedException: Access Denied: Cannot select from columns [linenumber, quantity, orderkey] in table or view blackhole.default.lineitem at io.trino.testing.AbstractTestingTrinoClient.execute(AbstractTestingTrinoClient.java:138) at io.trino.testing.DistributedQueryRunner.executeInternal(DistributedQueryRunner.java:565) at io.trino.testing.DistributedQueryRunner.execute(DistributedQueryRunner.java:548) at io.trino.testing.AbstractTestQueryFramework.lambda$executeExclusively$11(AbstractTestQueryFramework.java:486) at io.trino.testing.AbstractTestQueryFramework.executeExclusively(AbstractTestQueryFramework.java:607) at io.trino.testing.AbstractTestQueryFramework.executeExclusively(AbstractTestQueryFramework.java:483) at io.trino.testing.AbstractTestQueryFramework.lambda$assertException$12(AbstractTestQueryFramework.java:519) at org.assertj.core.api.ThrowableAssert.catchThrowable(ThrowableAssert.java:63) at org.assertj.core.api.AssertionsForClassTypes.catchThrowable(AssertionsForClassTypes.java:904) at org.assertj.core.api.Assertions.catchThrowable(Assertions.java:1472) at org.assertj.core.api.Assertions.assertThatThrownBy(Assertions.java:1315) at io.trino.testing.AbstractTestQueryFramework.assertException(AbstractTestQueryFramework.java:519) at io.trino.testing.AbstractTestQueryFramework.assertAccessDenied(AbstractTestQueryFramework.java:505) at io.trino.testing.AbstractTestQueryFramework.assertAccessDenied(AbstractTestQueryFramework.java:496) at io.trino.security.TestAccessControl.testAccessControl(TestAccessControl.java:366) at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103) at java.base/java.lang.reflect.Method.invoke(Method.java:580) at org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:775) at org.junit.platform.commons.support.ReflectionSupport.invokeMethod(ReflectionSupport.java:479) at org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60) at org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131) at org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:161) at org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:152) at org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:91) at org.junit.jupiter.engine.execution.InterceptingExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(InterceptingExecutableInvoker.java:112) at org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.lambda$invoke$0(InterceptingExecutableInvoker.java:94) at org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106) at org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64) at org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45) at org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37) at org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:93) at org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:87) at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$7(TestMethodTestDescriptor.java:216) at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73) at org.junit.jupi
assertAccessDenied("SHOW STATS FOR (SELECT nationkey FROM nation)", "Cannot select from columns \\[nationkey] in table or view .*.nation.*", privilege("nation", SELECT_COLUMN));
assertAccessDenied("SHOW STATS FOR (SELECT * FROM nation)", "Cannot select from columns \\[nationkey, regionkey, name, comment] in table or view .*.nation.*", privilege("nation", SELECT_COLUMN));
}
.hasCatalogSchemaTable("tpch", "tiny", "nation")
.hasAuthorization("alice")
.isNotDirectlyReferenced()
.hasColumnsWithoutMasking("nationkey", "regionkey", "name", "comment")

Check failure on line 513 in testing/trino-tests/src/test/java/io/trino/execution/TestEventListenerBasic.java

GitHub Actions / test (testing/trino-tests)

TestEventListenerBasic.testReferencedTablesWithMaterializedViewsFresh

Expecting actual: ["nationkey", "name", "regionkey", "comment"] to contain exactly (and in same order): ["nationkey", "regionkey", "name", "comment"] but there were differences at these indexes: - element at index 1: expected "regionkey" but was "name" - element at index 2: expected "name" but was "regionkey"
.hasNoRowFilters()
.hasTableReferencesSatisfying(tableRef -> assertThat(tableRef).asMaterializedViewInfo().hasCatalogSchemaView("mock", "default", "test_materialized_view_fresh"));
.hasCatalogSchemaTable("tpch", "tiny", "nation")
.hasAuthorization("user")
.isDirectlyReferenced()
.hasColumnsWithoutMasking("nationkey", "regionkey", "name", "comment")

Check failure on line 581 in testing/trino-tests/src/test/java/io/trino/execution/TestEventListenerBasic.java

GitHub Actions / test (testing/trino-tests)

TestEventListenerBasic.testReferencedTablesInCreateView

Expecting actual: ["regionkey", "nationkey", "comment", "name"] to contain exactly (and in same order): ["nationkey", "regionkey", "name", "comment"] but there were differences at these indexes: - element at index 0: expected "nationkey" but was "regionkey" - element at index 1: expected "regionkey" but was "nationkey" - element at index 2: expected "name" but was "comment" - element at index 3: expected "comment" but was "name"
.hasNoRowFilters()
.hasNoTableReferences();
}
.hasCatalogSchemaTable("tpch", "tiny", "nation")
.hasAuthorization("user")
.isDirectlyReferenced()
.hasColumnsWithoutMasking("nationkey", "regionkey", "name", "comment")

Check failure on line 609 in testing/trino-tests/src/test/java/io/trino/execution/TestEventListenerBasic.java

GitHub Actions / test (testing/trino-tests)

TestEventListenerBasic.testReferencedTablesInCreateMaterializedView

Expecting actual: ["name", "nationkey", "regionkey", "comment"] to contain exactly (and in same order): ["nationkey", "regionkey", "name", "comment"] but there were differences at these indexes: - element at index 0: expected "nationkey" but was "name" - element at index 1: expected "regionkey" but was "nationkey" - element at index 2: expected "name" but was "regionkey"
.hasNoRowFilters()
.hasNoTableReferences();
}
.hasCatalogSchemaTable("mock", "default", "test_table_with_column_mask")
.hasAuthorization("user")
.isDirectlyReferenced()
.hasColumnNames("test_varchar", "test_bigint")

Check failure on line 799 in testing/trino-tests/src/test/java/io/trino/execution/TestEventListenerBasic.java

GitHub Actions / test (testing/trino-tests)

TestEventListenerBasic.testReferencedTablesWithColumnMask

Expecting actual: ["test_varchar", "test_varchar", "test_bigint"] to contain exactly (and in same order): ["test_varchar", "test_bigint"] but some elements were not expected: ["test_varchar"]
.hasColumnMasks("(SELECT CAST(max(orderkey) AS varchar(15)) FROM orders)", null)
.hasNoRowFilters()
.hasNoTableReferences();