Consider Adding Cache Layer to Reduce Requests to Trino and Improve Response Time #606
Comments
Actually, just two weeks ago I built something like this into the Gateway as a hackday project :) I think it's a great idea -- at LinkedIn we see the same situation as what you described, where many instances of duplicate queries do redundant computation. Dashboarding solutions hooked up to Trino definitely cause this -- maybe a bunch of different people are loading the same dashboard, or you've edited one chart on a dashboard and when it refreshes it reloads all of the charts even though most of them haven't changed.

Given that Trino queries typically have very small output result set sizes, the cache should be able to be fairly small. In my POC, I stored the results in the same DB that the GW uses for all of its other operational data. We could consider doing this only for queries with very small outputs, and use a different, more scalable storage for queries with larger result sets. One area I'm also super interested in is using the new spooling client protocol to implement this cache in a zero-copy manner, at least for clients which leverage the spooled protocol.

I think the hardest part of this is probably determining how long to cache the results for and when to invalidate the cache. In my POC I just had a fixed TTL and used the exact query text as the cache key. But I think to make this work well, we'd have to add much smarter invalidation logic. We could look for nondeterministic functions (e.g. now()) and treat those queries as uncacheable, for example.

As an even further extension, what really gets me excited is that once we're persisting the query results, we could use them to implement query-level retries in the Gateway, similar to Trino FTE's retry functionality.
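To make the shape of that POC concrete, here is a minimal sketch of a fixed-TTL result cache keyed by the exact query text, as described above. This is not the actual POC code -- the class and method names are hypothetical, and the POC stores pages in the GW's backend DB rather than in memory:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: fixed-TTL cache of serialized response pages,
// keyed by the exact query text, mirroring the approach described above.
public class QueryResultCache
{
    private record Entry(Instant storedAt, List<String> resultPages) {}

    private final ConcurrentHashMap<String, Entry> cache = new ConcurrentHashMap<>();
    private final Duration ttl;

    public QueryResultCache(Duration ttl)
    {
        this.ttl = ttl;
    }

    // Store the serialized response pages for a completed query.
    public void put(String queryText, List<String> resultPages)
    {
        cache.put(queryText, new Entry(Instant.now(), List.copyOf(resultPages)));
    }

    // Return cached pages if present and not expired; expired entries are evicted lazily.
    public Optional<List<String>> get(String queryText)
    {
        Entry entry = cache.get(queryText);
        if (entry == null) {
            return Optional.empty();
        }
        if (Instant.now().isAfter(entry.storedAt().plus(ttl))) {
            cache.remove(queryText);
            return Optional.empty();
        }
        return Optional.of(entry.resultPages());
    }
}
```

A smarter version would replace the fixed TTL check with the invalidation logic discussed above.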
@xkrogen just out of curiosity, could you share the code?
I think I should be able to share a branch, though be warned that it is veryyyy far from being production-ready :) I will try to share it next week.
@xkrogen thanks! I was thinking about implementing this as part of the routing rules engine (before stumbling upon this issue). Would it be acceptable to implement this as part of the actual gateway service? I was also leaning towards using Redis as the caching DB...
I would prefer implementing caching in Trino and making the gateway cache-aware over having the gateway implement caching directly. This would have the advantage of supporting data caching (like https://trino.io/docs/current/object-storage/file-system-cache.html) and subquery caching as well as result set caching. This paper has some interesting ideas: https://storage.googleapis.com/gweb-research2023-media/pubtools/5018.pdf

Caching at the gateway layer introduces security concerns that we do not have today.
Agreed, though doing auth in the GW seems reasonable.
Yes, agreed that a shared cache introduces a lot of complexity. In my POC I have limited the cache in scope to be per-user to sidestep such issues; a user can only access their own cached results. I'm sure other environments will vary, but I found that in our environment this actually only hurts the cache efficacy/hit-rate by a few %.
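As a concrete illustration of that per-user scoping, the cache key could simply incorporate the authenticated username alongside the query text. The sketch below is hypothetical -- the helper name and hashing choice are mine, not from the POC:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

// Hypothetical helper: scope cache entries to a single user by hashing the
// authenticated username together with the exact query text.
public final class CacheKeys
{
    private CacheKeys() {}

    public static String forUserQuery(String authenticatedUser, String queryText)
    {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            digest.update(authenticatedUser.getBytes(StandardCharsets.UTF_8));
            digest.update((byte) 0); // separator so user/query boundaries cannot collide
            digest.update(queryText.getBytes(StandardCharsets.UTF_8));
            return HexFormat.of().formatHex(digest.digest());
        }
        catch (NoSuchAlgorithmException e) {
            throw new AssertionError("SHA-256 is always available", e);
        }
    }
}
```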
I would agree with you except for this point I called out in my earlier comment:
This one is really the motivating factor for me.
We discussed this in the dev sync as well. I strongly agree with @xkrogen .. a per-user cache will already solve a lot of dashboard and service user refresh scenarios and is worth doing for that use case alone. Roey will write up more details and potentially work on a PoC .. but we should totally collaborate on that. At the same time we do need to be aware of the security issues. We can probably grow the functionality of it all over time.

And lastly ... we know from trino-lb that issuing the query id in the lb, holding on to the user while clusters scale up or start up, and holding on to the request until they are ready works .. we should eventually get there, and we have the db backend to store the queue, so we should be able to move towards that as well.
@RoeyoOgen here's my branch. Lots of room for improvement!! https://github.com/xkrogen/trino-gateway/tree/xkrogen/resultset-caching-poc
Before I forget .. we should use an open source db for caching - so probably Valkey rather than Redis.
Thanks! Taking all your comments on board and will start to tinker with a design on my end :)
So, a bit of an update. Regarding the point we made about doing so just for dashboard refreshes:

@mosabua do you still think this should be solved at the GW level and not the cluster one? Also @willmostly:

Why does the GW need to be "cache-aware"? Will the Trino cluster not just return the cached result immediately? Or do we want this just to minimise the number of queries arriving at the cluster?
For future readers, here is a sequence diagram showing the communication flow with the POC from my branch. The same query is submitted twice; the first populates the cache, and the second returns responses from the cache. Note that the first flow (cache population) is unchanged from the GW's current behavior except for the addition of the database writes (the INSERT steps below). As @RoeyoOgen notes, for cached responses, the Trino backend is completely out of the loop; responses are returned directly from the GW itself.

```mermaid
sequenceDiagram
User->>+Client: "SELECT a FROM tbl"
Client->>+GW: POST /v1/statement "SELECT a FROM tbl"
GW->>+Backend: POST /v1/statement "SELECT a FROM tbl"
Backend->>-GW: {"id": "123", "state": "QUEUED", "nextUri": "/v1/statement/queued/123/xxx/1"}
GW->>Database: INSERT {"queryID": "123", pageID: "POST", content: "..."}
GW->>-Client: {"id": "123", "state": "QUEUED", "nextUri": "/v1/statement/queued/123/xxx/1"}
Client->>+GW: GET /v1/statement/queued/123/xxx/1
GW->>+Backend: GET /v1/statement/queued/123/xxx/1
Backend->>-GW: {"id": "123", "state": "RUNNING", "nextUri": "/v1/statement/executing/123/xxx/0"}
GW->>Database: INSERT {"queryID": "123", pageID: "queued/1", content: "..."}
GW->>-Client: {"id": "123", "state": "RUNNING", "nextUri": "/v1/statement/executing/123/xxx/0"}
Client->>+GW: GET /v1/statement/executing/123/xxx/0
GW->>+Backend: GET /v1/statement/executing/123/xxx/0
Backend->>-GW: {"id": "123", "state": "RUNNING", "nextUri": "/v1/statement/executing/123/xxx/1", "data": ["a", "b"]}
GW->>Database: INSERT {"queryID": "123", pageID: "executing/0", content: "..."}
GW->>-Client: {"id": "123", "state": "RUNNING", "nextUri": "/v1/statement/executing/123/xxx/1", "data": ["a", "b"]}
Client->>+GW: GET /v1/statement/executing/123/xxx/1
GW->>+Backend: GET /v1/statement/executing/123/xxx/1
Backend->>-GW: {"id": "123", "state": "FINISHED", "nextUri": "/v1/statement/executing/123/xxx/2", "data": ["c", "d"]}
GW->>Database: INSERT {"queryID": "123", pageID: "executing/1", content: "..."}
GW->>-Client: {"id": "123", "state": "FINISHED", "nextUri": "/v1/statement/executing/123/xxx/2", "data": ["c", "d"]}
Client->>+GW: GET /v1/statement/executing/123/xxx/2
GW->>+Backend: GET /v1/statement/executing/123/xxx/2
Backend->>-GW: {"id": "123", "state": "FINISHED"}
GW->>Database: INSERT {"queryID": "123", pageID: "executing/2", content: "..."}
GW->>Database: INSERT {"queryID": "123", completed: TRUE}
GW->>-Client: {"id": "123", "state": "FINISHED"}
Client->>-User: ["a", "b", "c", "d"]
User->>+Client: "SELECT a FROM tbl"
Client->>+GW: POST /v1/statement "SELECT a FROM tbl"
GW->>+Database: SELECT {"queryID": "123", pageID: "POST"}
Database->>-GW: content: {...}
GW->>-Client: {"id": "123", "state": "QUEUED", "nextUri": "/v1/statement/queued/123/xxx/1"}
Client->>+GW: GET /v1/statement/queued/123/xxx/1
GW->>+Database: SELECT {"queryID": "123", pageID: "queued/1"}
Database->>-GW: content: {...}
GW->>-Client: {"id": "123", "state": "RUNNING", "nextUri": "/v1/statement/executing/123/xxx/0"}
Client->>+GW: GET /v1/statement/executing/123/xxx/0
GW->>+Database: SELECT {"queryID": "123", pageID: "executing/0"}
Database->>-GW: content: {...}
GW->>-Client: {"id": "123", "state": "RUNNING", "nextUri": "/v1/statement/executing/123/xxx/1", "data": ["a", "b"]}
Client->>+GW: GET /v1/statement/executing/123/xxx/1
GW->>+Database: SELECT {"queryID": "123", pageID: "executing/1"}
Database->>-GW: content: {...}
GW->>-Client: {"id": "123", "state": "FINISHED", "nextUri": "/v1/statement/executing/123/xxx/2", "data": ["c", "d"]}
Client->>+GW: GET /v1/statement/executing/123/xxx/2
GW->>+Database: SELECT {"queryID": "123", pageID: "executing/2"}
Database->>-GW: content: {...}
GW->>-Client: {"id": "123", "state": "FINISHED"}
Client->>-User: ["a", "b", "c", "d"]
```
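For anyone who wants the gist of the second flow in code, here is a rough, hypothetical sketch of the replay side: the GW keeps the stored pages keyed by query ID and page ID, and on a hit it returns the stored response verbatim instead of proxying to a backend. Names and structure are illustrative, not the actual POC implementation:

```java
import java.util.Map;
import java.util.Optional;

// Illustrative sketch of the replay path: each stored page is keyed by the
// (queryId, pageId) pair shown in the diagram, and cached responses are
// returned verbatim without contacting any Trino backend.
public class CachedResponseReplayer
{
    private final Map<String, Map<String, String>> pagesByQueryId;

    public CachedResponseReplayer(Map<String, Map<String, String>> pagesByQueryId)
    {
        this.pagesByQueryId = pagesByQueryId;
    }

    // pageId examples from the diagram: "POST", "queued/1", "executing/0", ...
    public Optional<String> replay(String queryId, String pageId)
    {
        Map<String, String> pages = pagesByQueryId.get(queryId);
        if (pages == null) {
            return Optional.empty(); // not cached: fall through to normal routing
        }
        return Optional.ofNullable(pages.get(pageId));
    }
}
```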
Can you elaborate more on your opinion here, @RoeyoOgen?
@xkrogen I think you may have copied the first query's drawing for the 2nd query as well, but I get your point anyway :)

Let's focus on the way the gateway works today at a high level (from my understanding and testing): when the client (user) submits the query, the Gateway routes it to the correct cluster, but the Trino cluster itself returns the response to the client (the "current state"). In the current suggestion of adding caching, the response is returned via the gateway. This can be problematic for various reasons, with the main one being that the GW cluster will need to have enough resources to save the responses and return them to the users not only from one cluster but essentially from all of them. Meaning it needs to be as big as all the coordinators combined, becoming this "mega coordinator".
Some points:
Both of these aspects mean, in my opinion, that caching at the Trino Gateway layer does in fact make sense, even if we just implement a naive, per-user cache. As long as it's optional we can disable it, and we can also see which usage scenarios it works well in and which it does not. Caching at the Trino cluster level still makes sense as well, in parallel. I think they can be complementary, and as long as both are configurable, users can adjust to what works for them.

Now to one more point that does change things quite a bit - the new spooling protocol. We have not really tested this much, but it will most likely work just fine. The protocol has multiple operating modes and potentially reduces the time to query completion from the coordinator's point of view quite a lot. It also reduces the load on the coordinator a lot. Potentially it also requires client tools to have access to the object storage used for spooling .. or the workers. So depending on the scenario we might have to adjust some aspects in Trino Gateway .. all that work still has to be started .. well .. in fact .. testing and playing around with it has to be started. However I don't think it should stop us from implementing a cache for Trino Gateway.
Also .. to be explicit: what @RoeyoOgen is describing is NOT the suggested config as documented .. but yes .. it is also possible to do that.
@mosabua
In our case it's actually a big necessity. Since we have on-prem Kubernetes clusters, we don't mind that the client has knowledge of the Trino backends; it is, however, important that the query results are not returned via the GW.
I'm not sure what you mean. The 2nd query follows a different flow. The communication between client and GW is identical, which is intentional -- the responses are replayed. But the communication between GW and backend has been replaced with communication between GW and DB.

Anyway, +1 to @mosabua's commentary above. A lot of functionality, including caching, can be added by using the GW as a full proxy. I agree that if you cannot leverage this setup, then caching in the GW won't work for you.
@RoeyoOgen I agree that we also want to continue to allow the usage mode you mention, where follow-ups go to the cluster .. but it should not be the default .. basically just what we have now. Spooling also puts another wrinkle on that, since there are modes where it goes through the coordinator, through the workers, or straight to storage .. all of them are reasonable in some scenarios and all should ultimately work with Trino Gateway. I think the main thing we are lacking for all that is documentation to explain and help with configuration for the different scenarios.
We’ve noticed that multiple requests are being made to Trino, which can sometimes result in slow response times due to the need to recalculate results repeatedly. To improve performance, we would like to suggest implementing a cache layer to store intermediate results and avoid unnecessary requests to Trino for the same data.
Would love to hear if this is something that could be implemented or if there are any existing solutions in place for this. Do you have any recommendations or suggestions on how we might approach this?