-
Notifications
You must be signed in to change notification settings - Fork 14.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AIP-44 Make standalone dag file processor works in DB isolation mode #40916
AIP-44 Make standalone dag file processor works in DB isolation mode #40916
Conversation
3d1ab99
to
815b517
Compare
There are some improvements here in InternalApiConfig implementation which is much simpler now @jscheffl and @dstandish. So now choosing whether to use Internal API (trusted components only) or DB (if DB configuration is available in DB isolation mode - like it is in Breeze environment) is purely done in main() + we seet the DB configuration to "none://" - which is handled in case there are forks that are re-inititalizing ORM. |
7589da4
to
a89b5fd
Compare
There is a chance It's going to be green @jscheffl @vincbeck @dstandish -> I have to do a few more "manual" tests but I think that one is a complete "Internal_api" implementation for standalone dag file processor. It can be easily tested with breeze:
Also without isolation:
When you run airflow in isolation mode, it should nicely log what "mode" each component works in - in isolation mode, scheduler and webserver (and generally all regular airflow commands like |
a89b5fd
to
cb45fb4
Compare
Should be merged after #40929 -> found it during testing standalone dag processor case. |
cb45fb4
to
5b919a8
Compare
5b919a8
to
48fc039
Compare
48fc039
to
d1dce60
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Except the comments from @vincbeck I have no other questions/objections. Assume manual tests not needed as pipeline is checking all changes to make it green.
Still working on best way of solving it. |
d1dce60
to
d631feb
Compare
There were a few missing DB operations in DAGFileProcessor that prevented it to run in DB isolation mode. Those have been refactored and exposed as internal API calls. A bug was fixed in scheduler_job_runner that caused using of next_event before it has been declared (which occured when standalone dag processor is used and db isolation mode. The DB retry will now correctly use logger when it is used as decorator on class method. The "main" code that removes DB connection from configuration (mostly in case of Breeze) when untrusted components are used has been improved to handle the case where DAGFile Processor forks parsing subprocesses. Tmux configuration got improved so that both non-isolation and isolation mode distribute panels better. Simplified InternalApiConfig - "main" directly sets db/internal use in db_isolation mode depending on the component.
d631feb
to
458d8c6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Allright @jscheffl @dstandish @vincbeck (and @mhenc) - it looks like we have it. Once i have some one more pair of eyes I think this one concludes the main part (and I think we made it before 2.10 cut-off @ephraimbuddy ?). What nicely work there (at least manually checks): a) airflow in breeze works out-of-the-box with any combination of:
b) Authentication works nicely using separate internal_api_secret_key and JWT signer. c) Performance does not look bad actually at least on local machine - but I am sure it can be optimized further. d) we do not use https:// - but I think this should be described in the documentation that SSL terminatinb proxies shoudl be used in front of internal-api to add SSL (usual practice). We might want to give a big WARNING if someone uses HTTP:// in the client What we will need more is: a) special tests for isolatin mode But those can be even added after the cut-off for 2.10 branch and cherry-picked potentially. I think particularly the Dag File Processor was an interesting one in lights of AIP-72 - I think it can be much more easily mapped almost 1-1 to whatever AIP-72 brings. And nice thing about it that we can tell adventorous users to use the internal API and report any issues they encounter with lack of DB access on the side of worker and parser long before AIP-72 will be out) |
ccL @ashb -> might be useful for AIP-72 ^^ |
@potiuk @kosteev FYI I have run these changes with |
Yeah. The whole DAG parsing is now "outside" of the context manager. And I am quite baffled to why the session would be used even if it is not passed down. It should get fixed if we wrap DAG parsing with the session (again conditionally on internal_api) and possibly even more accurate if that wrapping is done with the engine. |
There were a few missing DB operations in DAGFileProcessor that prevented it to run in DB isolation mode. Those have been refactored and exposed as internal API calls.
A bug was fixed in scheduler_job_runner that caused using of next_event before it has been declared (which occured when standalone dag processor is used and db isolation mode.
The DB retry will now correctly use logger when it is used as decorator on class method.
The "main" code that removes DB connection from configuration (mostly in case of Breeze) when untrusted components are used has been improved to handle the case where DAGFile Processor forks parsing subprocesses.
Tmux configuration got improved so that both non-isolation and isolation mode distribute panels better.
Simplified InternalApiConfig - "main" directly sets db/internal use in db_isolation mode depending on the component.
^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named
{pr_number}.significant.rst
or{issue_number}.significant.rst
, in newsfragments.