Azure Data Lake Storage
Control geo failover for ADLS and SFTP with unplanned failover
We are excited to announce the General Availability of customer managed unplanned failover for Azure Data Lake Storage and for storage accounts with SSH File Transfer Protocol (SFTP) enabled.

What is Unplanned Failover?

With customer managed unplanned failover, you are in control of initiating your failover. Unplanned failover allows you to switch your storage endpoints from the primary region to the secondary region. During an unplanned failover, write requests are redirected to the secondary region, which then becomes the new primary region.

Because an unplanned failover is designed for scenarios where the primary region is experiencing an availability issue, it happens without the primary region fully completing replication to the secondary region. As a result, an unplanned failover can incur data loss. The amount of loss depends on how much data has yet to be replicated from the primary region to the secondary region. Each storage account has a ‘last sync time’ property, which indicates the last time a full synchronization between the primary and the secondary region was completed. Any data written between the last sync time and the current time may only be partially replicated to the secondary region, which is why unplanned failover may incur data loss.

Unplanned failover is intended to be used during a true disaster where the primary region is unavailable. Once it completes, the data in the original primary region is erased, the account is changed to locally redundant storage (LRS), and your applications can resume writing data to the storage account. If the previous primary region becomes available again, you can convert your account back to geo-redundant storage (GRS). Migrating the account from LRS to GRS initiates a full data replication from the new primary region to the secondary, which incurs geo-bandwidth costs.

If your scenario involves failing over while the primary region is still available, consider planned failover instead. Planned failover can be used in scenarios such as planned disaster recovery testing or recovering from a non-storage-related outage. Unlike unplanned failover, the storage service endpoints must be available in both the primary and secondary regions before a planned failover can be initiated, because planned failover is a three-step process: (1) make the current primary read-only, (2) sync all the data to the secondary (ensuring no data loss), and (3) swap the primary and secondary regions so that writes go to the new primary. In contrast with unplanned failover, planned failover maintains the geo-redundancy of the account, so a planned failback does not require a full data copy.

To learn more about planned failover and how it works, see Public Preview: Customer Managed Planned Failover for Azure Storage | Microsoft Community Hub. To learn more about each failover option and its primary use cases, see Azure storage disaster recovery planning and failover - Azure Storage | Microsoft Learn.

How to get started?

Getting started is simple. For the step-by-step process to initiate an unplanned failover, review the documentation: Initiate a storage account failover - Azure Storage | Microsoft Learn.

Feedback

If you have questions or feedback, reach out at storagefailover@service.microsoft.com.
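For readers who prefer to script this sequence, the sketch below shows roughly how the last-sync-time check, the unplanned failover itself, and the later LRS-to-GRS conversion could be driven with the Azure management SDK for Python (azure-mgmt-storage). The subscription, resource group, and account names are placeholders, and this is an illustrative outline of the flow described above rather than the official procedure; the linked Microsoft Learn documentation remains the authoritative guide.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient
from azure.mgmt.storage.models import Sku, StorageAccountUpdateParameters

# Placeholder values; substitute your own subscription, resource group, and account.
subscription_id = "<subscription-id>"
resource_group = "my-resource-group"
account_name = "mystorageaccount"

client = StorageManagementClient(DefaultAzureCredential(), subscription_id)

# 1. Check the last sync time to estimate how much unreplicated data is at risk.
props = client.storage_accounts.get_properties(
    resource_group, account_name, expand="geoReplicationStats"
)
print("Last sync time:", props.geo_replication_stats.last_sync_time)

# 2. Initiate the customer managed unplanned failover (a long-running operation).
poller = client.storage_accounts.begin_failover(resource_group, account_name)
poller.result()

# 3. Later, once the old primary region is healthy again, convert the now-LRS
#    account back to GRS (this starts a full, billable geo-replication).
client.storage_accounts.update(
    resource_group,
    account_name,
    StorageAccountUpdateParameters(sku=Sku(name="Standard_GRS")),
)
```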
Dremio Cloud on Microsoft Azure enables customers to drive value from their data more easily

Dremio Cloud on Microsoft Azure helps customers drive value from their data more easily by overcoming the challenges of organically grown data lake and database landscapes. Particularly in hybrid environments, it allows organizations to shield the business from change while at the same time tightening security and easing application integration.
Copy Dataverse data from ADLS Gen2 to Azure SQL DB leveraging Azure Synapse Link

A new template has been added to the ADF and Azure Synapse Pipelines template gallery. This template allows you to copy data from an Azure Data Lake Storage (ADLS) Gen2 account to an Azure SQL Database.
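Purely as an illustration (not part of the announcement), the sketch below shows how a pipeline created from such a gallery template might be triggered and monitored with the Azure Data Factory management SDK for Python. The factory name, pipeline name, and parameters are hypothetical and depend entirely on how the template was set up in your environment.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

# Placeholder values; substitute your own subscription, resource group, and factory.
subscription_id = "<subscription-id>"
resource_group = "my-resource-group"
factory_name = "my-data-factory"
pipeline_name = "CopyDataverseToAzureSqlDb"  # hypothetical name given to the pipeline from the template

client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)

# Trigger a run of the pipeline; parameter names are hypothetical and depend on the template.
run = client.pipelines.create_run(
    resource_group,
    factory_name,
    pipeline_name,
    parameters={"windowStart": "2023-01-01T00:00:00Z"},
)

# Check the status of the run.
status = client.pipeline_runs.get(resource_group, factory_name, run.run_id).status
print(f"Pipeline run {run.run_id} is {status}")
```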
Unable to load large Delta table in Azure ML Studio

I am writing to report an issue I am currently experiencing while trying to read a Delta table from Azure ML. I have already created data assets to register the Delta table, which is located in an ADLS location. However, when attempting to load the data, I have noticed that for large data sizes it takes an exceedingly long time. I have confirmed that for small data sizes the data is returned within a few seconds, which leads me to believe there may be an issue with the scalability of the data loading process. I would greatly appreciate it if you could investigate this issue and provide me with any recommendations or solutions. I can provide additional details such as the size of the data, the steps I am taking to load the data, and any error messages if required.

I'm following this document: https://learn.microsoft.com/en-us/python/api/mltable/mltable.mltable?view=azure-ml-py#mltable-mltable-from-delta-lake

I am using this command to read the Delta table from the data asset URI:

from mltable import from_delta_lake
mltable_ts = from_delta_lake(delta_table_uri=<DATA ASSET URI>, timestamp_as_of="2999-08-26T00:00:00Z", include_path_column=True)
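For context, a complete load along the lines described in the question might look like the sketch below; the heavy work typically happens only when the lazily defined MLTable is materialized (for example into a pandas DataFrame), which is where long load times for large tables would show up. The URI placeholder and timestamp are taken from the question, and the final materialization step is an assumption about how the data is ultimately consumed.

```python
from mltable import from_delta_lake

# Placeholder from the question; in practice this is the registered data asset / ADLS URI.
delta_table_uri = "<DATA ASSET URI>"

# Define the table (lazy; no data is read yet).
tbl = from_delta_lake(
    delta_table_uri=delta_table_uri,
    timestamp_as_of="2999-08-26T00:00:00Z",
    include_path_column=True,
)

# Materialize the data; for large Delta tables this is the step that takes a long time.
df = tbl.to_pandas_dataframe()
print(df.shape)
```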
Azure Function with Blob Output Binding returning 404 on GetProperties check before writing the Blob

Hi. This question is similar: https://stackoverflow.com/questions/64546302/how-to-disable-blob-existence-check-in-azure-function-output-binding. But I'm wondering if there are other, more recent answers or comments out there.

I have an Azure Function with an HTTP trigger input binding and a Blob Storage output binding. On every execution, the output binding appears to try to get the blob properties first, resulting in a 404. Quite rightly, as the data to be written is going to a new blob. But this check will always fail, and in this case it is redundant. It takes time to go through these steps (admittedly milliseconds, but still). Presumably it's also logging somewhere, so that's a storage cost; it might be negligible now, but it shouldn't be ignored. I'm also not 100% sure where that logging would be stored, so I can't go and manage it.

The positive is that the overall function execution is fine. But it's still recording all these failures, and we're putting tens of thousands of executions through it a day. Is there a way to use the concise output binding code but skip this prior if-exists/get-properties check? My options seem to be to live with it, or to rewrite using BlobContainerClient, BlobClient and so on instead of the Blob attribute output binding. Anyone got some clever ideas?
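The question doesn't show the function code or say which language it uses, but purely as a sketch of the "explicit client" alternative mentioned above, here is roughly what that could look like in a Python Azure Function using azure-storage-blob: the blob is written directly with upload_blob(..., overwrite=True), so no separate existence/GetProperties check is issued from user code. The container name, blob-naming scheme, and the AzureWebJobsStorage connection setting are assumptions for illustration.

```python
import os

import azure.functions as func
from azure.storage.blob import BlobClient

app = func.FunctionApp()


@app.route(route="ingest", methods=["POST"])
def ingest(req: func.HttpRequest) -> func.HttpResponse:
    # Write the request body straight to a new blob with an explicit client,
    # instead of relying on the declarative blob output binding.
    blob = BlobClient.from_connection_string(
        conn_str=os.environ["AzureWebJobsStorage"],  # assumes the default storage app setting
        container_name="incoming",                   # placeholder container name
        blob_name=f"payload-{req.headers.get('x-request-id', 'unknown')}.json",  # placeholder naming scheme
    )
    blob.upload_blob(req.get_body(), overwrite=True)
    return func.HttpResponse(status_code=201)
```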
The Dremio Open Lakehouse Platform and Microsoft Provide a Solution for Cloud Data Analytics

Cloud data lakes represent the primary storage destination for a growing volume and variety of data. For Microsoft customers, Azure Data Lake Storage (ADLS) provides a flexible, scalable, cost-effective, secure, cloud-native analytics file system for a variety of data sources. The challenge for many organizations is making that data available for Business Intelligence (BI) and reporting. In this article, I'll share how the Dremio Open Lakehouse Platform simplifies data architectures, accelerates access to insights on ADLS, and enables ad hoc analysis and exploration with Power BI.