(Destination) - Couchbase connector 🎉 #2

Open
wants to merge 448 commits into base: master

teetangh
Collaborator

What

Implementation of a Couchbase destination connector that enables Airbyte to write data to Couchbase databases. This connector:

  • Supports all Airbyte sync modes (append, append-dedup, overwrite)
  • Handles schema validation and data type conversions
  • Implements efficient batch writing with retry mechanisms
  • Provides robust error handling and reporting

How

The implementation achieves its goals through several key mechanisms:

  1. Connection Management:

    • Uses PasswordAuthenticator for secure connections
    • Implements connection testing with temporary collection creation
    • Configures appropriate timeouts for operations
  2. Data Handling:

    • Implements batch processing with configurable batch sizes (max 1000)
    • Generates unique document IDs based on sync mode and primary keys
    • Validates records against JSON schema before writing
    • Handles null values intelligently based on schema requirements
  3. Sync Modes:

    • Append: Generates UUID-based document IDs
    • Append-dedup: Uses primary key fields for document IDs
    • Overwrite: Clears existing collection data before sync
  4. Error Handling:

    • Implements exponential backoff retry mechanism
    • Handles document conflicts based on sync mode
    • Provides detailed error reporting via AirbyteTraceMessages
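
The document ID rules listed under "Data Handling" and "Sync Modes" could be sketched roughly as follows. This is an illustrative sketch, not the code in this PR: the function name `make_document_id`, the `stream::` prefix, the SHA-256 hashing of key values, and the choice to give overwrite mode random IDs are all assumptions made for the example.

```python
import hashlib
import json
import uuid
from typing import Any, Mapping, Sequence


def make_document_id(
    stream: str,
    record: Mapping[str, Any],
    sync_mode: str,
    primary_key: Sequence[str] = (),
) -> str:
    """Return a document ID for `record` (hypothetical helper, not the PR's code)."""
    if sync_mode == "append_dedup" and primary_key:
        # Deterministic ID: the same primary key values always map to the
        # same document, so a repeated record targets the existing document.
        key_values = [record.get(field) for field in primary_key]
        key_material = json.dumps(key_values, default=str)
        digest = hashlib.sha256(key_material.encode("utf-8")).hexdigest()[:32]
        return f"{stream}::{digest}"
    # Append gets a fresh random ID per record; treating overwrite the same
    # way is an assumption of this sketch.
    return f"{stream}::{uuid.uuid4()}"
```

With this shape, two records that share primary key values produce the same ID in append-dedup mode, while every call in append mode produces a new one.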

Review guide

  1. destination.py:
    • check(): Connection testing and validation
    • write(): Main data writing implementation
    • _prepare_record(): Record validation and document preparation
    • _flush_buffer(): Batch writing with retry logic
    • _setup_collection(): Collection management
    • Helper methods for ID generation, null handling, etc.
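
For reviewers looking at `_flush_buffer()`, the general pattern of batched writes with exponential backoff might look like the sketch below. It is not taken from the PR: `flush_with_retry`, `write_batch`, `max_attempts`, and `base_delay` are hypothetical names, and the 1000-document cap is the only value drawn from the description above.

```python
import time
from typing import Any, Callable, Dict

MAX_BATCH_SIZE = 1000  # upper bound stated in the PR description


def flush_with_retry(
    buffer: Dict[str, Any],
    write_batch: Callable[[Dict[str, Any]], None],
    batch_size: int = 1000,
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Write `buffer` in batches, retrying a failed batch with exponential backoff.

    `write_batch` is a hypothetical callable that writes one batch of
    {document_id: document} pairs and raises on failure.
    """
    batch_size = max(1, min(batch_size, MAX_BATCH_SIZE))
    items = list(buffer.items())
    written = 0
    for start in range(0, len(items), batch_size):
        batch = dict(items[start:start + batch_size])
        for attempt in range(max_attempts):
            try:
                write_batch(batch)
                written += len(batch)
                break
            except Exception:
                if attempt == max_attempts - 1:
                    raise  # out of retries: surface the error to the caller
                # Delay doubles on each retry: base_delay, 2x, 4x, ...
                sleep(base_delay * (2 ** attempt))
    return written
```

Passing `sleep` as a parameter keeps the retry path testable without real waiting. A production version would likely catch only transient error types rather than every `Exception`.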

User Impact

Users gain the ability to:

  • Write Airbyte data to Couchbase databases with configurable sync modes
  • Validate data against schemas before writing
  • Configure batch sizes for optimal performance
  • Get detailed error reporting for troubleshooting

Side effects to consider:

  • Overwrite mode clears existing collection data
  • In append-dedup mode, a new record whose primary key matches an existing document is skipped
  • Schema validation may reject invalid records
  • Performance depends on batch size configuration
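
The skip-on-conflict side effect can be illustrated with a small in-memory stand-in. Nothing here is the connector's code or the Couchbase SDK: `InMemoryCollection`, `DocumentExists`, and `write_record` are hypothetical names, and the assumption is that the real connector reacts in a comparable way to the conflict error raised by an insert against an existing document ID.

```python
from typing import Any, Dict


class DocumentExists(Exception):
    """Stand-in for the conflict error raised on a duplicate insert."""


class InMemoryCollection:
    """Tiny stand-in for a collection; not the Couchbase SDK."""

    def __init__(self) -> None:
        self.docs: Dict[str, Any] = {}

    def insert(self, doc_id: str, doc: Any) -> None:
        # Insert semantics: fail if the ID is already present.
        if doc_id in self.docs:
            raise DocumentExists(doc_id)
        self.docs[doc_id] = doc


def write_record(collection: InMemoryCollection, doc_id: str, doc: Any, sync_mode: str) -> bool:
    """Insert `doc`; return False when append-dedup skips a conflicting record."""
    try:
        collection.insert(doc_id, doc)
        return True
    except DocumentExists:
        if sync_mode == "append_dedup":
            return False  # existing document wins, the new record is dropped
        raise  # in other modes a conflict is unexpected, so surface it
```

The practical consequence is that the first version of a record is the one kept: a later record with the same primary key does not update the stored document.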

Can this PR be safely reverted and rolled back?

  • YES 💚

The implementation is self-contained within the destination-couchbase connector and doesn't modify any shared infrastructure. Rolling back would only affect this specific connector's functionality.

@teetangh changed the title from "Da 547 airbyte couchbase destination" to "(Destination) - Couchbase connector 🎉" on Oct 30, 2024
Comment on lines 23 to 56
  name: Connector Pre-Release Checks
  runs-on: linux-20.04-large # Custom runner, defined in GitHub org settings
- if: github.event.pull_request.head.repo.fork != true
+ if: >
+   github.event.pull_request.head.repo.fork != true &&
+   github.event.pull_request.draft == false
  timeout-minutes: 22
  steps:
    - name: Checkout Airbyte
      uses: actions/checkout@v4
    - name: Check PAT rate limits
      run: |
        ./tools/bin/find_non_rate_limited_PAT \
          ${{ secrets.GH_PAT_BUILD_RUNNER_OSS }} \
          ${{ secrets.GH_PAT_BUILD_RUNNER_BACKUP }}
    - name: Fetch last commit id from remote branch [PULL REQUESTS]
      if: github.event_name == 'pull_request'
      id: fetch_last_commit_id_pr
      run: echo "commit_id=$(git ls-remote --heads origin refs/heads/${{ github.head_ref }} | cut -f 1)" >> $GITHUB_OUTPUT
    - name: Test connectors [PULL REQUESTS]
      if: github.event_name == 'pull_request'
      uses: ./.github/actions/run-airbyte-ci
      with:
        context: "pull_request"
        dagger_cloud_token: ${{ secrets.DAGGER_CLOUD_TOKEN_CACHE_2 }}
        docker_hub_password: ${{ secrets.DOCKER_HUB_PASSWORD }}
        docker_hub_username: ${{ secrets.DOCKER_HUB_USERNAME }}
        gcp_gsm_credentials: ${{ secrets.GCP_GSM_CREDENTIALS }}
        sentry_dsn: ${{ secrets.SENTRY_AIRBYTE_CI_DSN }}
        git_branch: ${{ github.head_ref }}
        git_revision: ${{ steps.fetch_last_commit_id_pr.outputs.commit_id }}
        github_token: ${{ env.PAT }}
        s3_build_cache_access_key_id: ${{ secrets.SELF_RUNNER_AWS_ACCESS_KEY_ID }}
        s3_build_cache_secret_key: ${{ secrets.SELF_RUNNER_AWS_SECRET_ACCESS_KEY }}
-       subcommand: "connectors --modified test --only-step=version_inc_check --global-status-check-context='Version increment check for Java connectors' --global-status-check-description='Checking if java connectors modified in this PR got their version bumped'"
+       subcommand: "connectors --modified test --only-step=version_inc_check --only-step=qa_checks --global-status-check-context='Connectors Pre-Release Check' --global-status-check-description='Checking if connectors modified in this PR are ready for release'"

Check warning

Code scanning / CodeQL

Workflow does not contain permissions (severity: Medium)

Actions Job or Workflow does not set permissions
airbyteio and others added 27 commits March 2, 2025 10:09
3 participants