I was talking to a fellow architect who wanted to know how to store data for analysis. I walked him through the two standard Azure solutions, one based on SQL DW and the other on the new Azure Data Lake Storage Gen2 (ADLS Gen2). The architecture for ingesting data into SQL DW is well tested and well defined.
However, with the introduction of ADLS Gen2, the obvious question is how to actually work with the data. ADLS Gen2 is essentially a storage layer, so you need other technologies to connect to the data, read it and work on it. There are quite a few options:
- Azure Databricks – using Spark SQL, Hive or Python. See this article for a step-by-step approach: https://docs.databricks.com/data/data-sources/azure/azure-datalake-gen2.html
- Azure HDInsight – supports accessing data using multiple open source technologies such as Hadoop and Spark. Read more here
- Azure SQL DW – wait, what? Yep. Azure SQL DW supports an exciting feature called PolyBase, which allows data external to SQL DW to be accessed through SQL DW (wow!!)
- Azure Data Factory – ADF provides a connector for ADLS Gen2 storage and helps customers transform data and orchestrate data movement
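To make the Databricks option a bit more concrete, here is a minimal Python sketch of how a Spark job might address a file in ADLS Gen2. It assumes access via a storage account key, which is only one of several authentication options. The container name `raw`, the account name `mydatalake`, the file path and the helper function names are all made up for illustration. The `abfss://` URI format and the `fs.azure.account.key...` setting are the ones used by the ABFS driver.

```python
def abfss_uri(container: str, account: str, path: str = "") -> str:
    """Build an abfss:// URI for a path inside an ADLS Gen2 container."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


def account_key_conf(account: str) -> str:
    """Name of the Spark config entry that holds the storage account key."""
    return f"fs.azure.account.key.{account}.dfs.core.windows.net"


def read_sales(spark, account: str, account_key: str):
    """Read a CSV file from ADLS Gen2 into a Spark DataFrame.

    `spark` is an existing SparkSession (predefined in a Databricks notebook).
    The container "raw" and the path "sales/2019.csv" are hypothetical.
    """
    # Assumption: account-key authentication; a service principal also works.
    spark.conf.set(account_key_conf(account), account_key)
    return spark.read.csv(abfss_uri("raw", account, "sales/2019.csv"), header=True)


# Only the URI helper runs without a Spark session:
print(abfss_uri("raw", "mydatalake", "/sales/2019.csv"))
# abfss://raw@mydatalake.dfs.core.windows.net/sales/2019.csv
```

In a real notebook you would pull the account key from a secret store rather than passing it around in plain text.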
Access to data in ADLS Gen2 is limited less by the technology than by the developer's skills. Whatever skills the developer has, there is at least one technology for accessing ADLS Gen2 data.