Tuesday 6 September 2016

Azure Data Lake vs Azure Blob Storage

References:
https://azure.microsoft.com/en-us/pricing/details/data-lake-store/
https://azure.microsoft.com/en-us/pricing/details/storage/
https://azure.microsoft.com/en-gb/documentation/articles/data-lake-store-comparison-with-blob-storage/

I was going through the pricing structure of Azure Data Lake and was wondering how it stacks up against the pricing of Azure Blob Storage.

As it turns out, there is a significant difference, and for very good reasons.

First of all, storage in Azure Data Lake is a little more expensive than in Azure Blob Storage. That is expected, as Azure Data Lake is designed for storing massive amounts of unstructured and semi-structured data and has no practical limit on the size of the data that can be stored.

Azure Data Lake:

Azure Blob Storage:


So the Data Lake storage cost is about 10 times as much per GB as ZRS, and comparable to the more expensive options like GRS and RA-GRS. If you are keeping data in a single zone and you don't plan to store the whole internet (what?), Azure Blob Storage is definitely cheaper.

Now, let us look at transaction costs:

Azure Data Lake:

Point to note: 1 transaction means reading/writing a 128 KB chunk.

Azure Blob Storage:
        $0.0036 per 100,000 transactions for Block Blobs. Transactions include both read and write operations to storage.
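
To make the transaction arithmetic concrete, here is a minimal Python sketch. The 128 KB chunk size and the $0.0036 per 100,000 block blob transactions come from the figures above. The function names are made up for illustration, the Data Lake per-transaction price is left as a parameter because it is not quoted in the text, and the sketch assumes a partial chunk is billed as a full transaction.

```python
ADL_CHUNK_BYTES = 128 * 1024   # one Data Lake transaction = one 128 KB chunk (from the post)
BLOB_PRICE_PER_100K = 0.0036   # USD per 100,000 block blob transactions (from the post)


def adl_transactions(size_bytes: int) -> int:
    """Transactions needed to read or write size_bytes once.

    Assumption: a partial chunk counts as one full transaction.
    """
    # Integer ceiling division avoids float rounding on very large sizes.
    return (size_bytes + ADL_CHUNK_BYTES - 1) // ADL_CHUNK_BYTES


def adl_transaction_cost(transactions: int, price_per_transaction: float) -> float:
    """Data Lake transaction cost in USD.

    price_per_transaction is a hypothetical input: look it up on the
    pricing page, it is not given in this post.
    """
    return transactions * price_per_transaction


def blob_transaction_cost(transactions: int) -> float:
    """Block blob transaction cost in USD at the rate quoted in the post."""
    return transactions / 100_000 * BLOB_PRICE_PER_100K


if __name__ == "__main__":
    one_gib = 1024 ** 3
    print("Data Lake transactions to read 1 GiB once:", adl_transactions(one_gib))
    print("Blob cost for 1,000,000 transactions (USD):", blob_transaction_cost(1_000_000))
```

Note that the two transaction counts are not directly comparable: a Data Lake transaction is tied to a 128 KB chunk, while a blob transaction is a read or write operation, so the same workload can produce very different counts on each service.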

So how do you choose?

1. If you are just piling up unstructured data with the requirement of frequent and fast retrieval, go for Azure Blob Storage.
2. If you want to run analytics (Azure Data Lake Analytics, or ADLA, jobs) on stored data, go for Azure Data Lake.
3. If you want both frequent, fast data retrieval and analytics, duplicate the data in both stores. There is no either/or scenario here. With the current state of PaaS services, it seems to be the only way right now.

7 comments:

  1. Hi, just wanted to tell you, I enjoyed this blog post. It was funny. Keep on posting! Such a lovely blog you have shared here with us. Really nice.

  2. Very informative blog and useful article. Thank you for sharing with us, keep posting.

  3. I found this post when trying to understand why Azure Data Lake Gen. 2 prices are about two times cheaper compared to Azure Blob Storage v2. Do you know why it might be so?

  4. At its core, Azure Data Lake is a data storage service, so you might be wondering what it has to do with Azure Blob Storage. Well, Data Lake Service is definitely a lot more than just storage, as it can perform all sorts of data processing, including ETL, and it supports all of the popular data formats, including JSON, CSV, and Apache Parquet. In fact, Azure Data Lake has its own open source SDK that lets you build applications that can store and access data directly from the blob storage layer, meaning you can use the same tools to access cloud and on-premises storage.

  5. Over the last few years, the world has seen a meteoric rise in the amount of data we capture about our lives. From simple social media posts to the thousands of hours of video uploaded to YouTube, billions of people around the world now share an unprecedented amount of data about who we are, what we do, and what we think. This explosion of data is now being used by businesses and governments alike to make better decisions, and has driven the rise of a new industry focused on turning Big Data into actionable insights. As a result, there are now more ways to store, process, and analyze data than ever before.
