Featured

Cosmos DB – Operations using Azure CLI

If you have deployed Cosmos DB and provisioned a database and collection, you will want to know how to operate it using the Azure CLI. The Azure Cosmos DB command-line interface provides commands to create the database and the collection and to update throughput, based on how the Cosmos DB account has been set up (i.e., as SQL API, MongoDB API, and so on).

I was creating a MongoDB database in Cosmos DB, and the examples below follow that walkthrough.

Creating a new Cosmos DB MongoDB Database

I provisioned my Cosmos DB account, and after provisioning I ran the following CLI command from PowerShell (or Azure Cloud Shell) to create a new MongoDB database:
az cosmosdb mongodb database create --account-name mongodb123a --name newdb1 --resource-group ADFLAB

Here "mongodb123a" is the new account name I had created in my subscription, "newdb1" is the database we just created, and it belongs to the resource group called "ADFLAB".

Displaying an existing Database using Az CLI

Once the Cosmos DB MongoDB database has been created, or if you want to check the settings of an existing database, you can use the database's show command:
az cosmosdb mongodb database show --account-name mongodb123a --name newdb1 --resource-group ADFLAB


Listing the collection within the Database

Once you have created the database and collections (either through the CLI or the portal), you can list the collections within the database with the collection list command:
az cosmosdb mongodb collection list  --account-name mongodb123a --database-name newdb1  --resource-group ADFLAB

Command group 'cosmosdb mongodb' is in preview. It may be changed/removed in a future release.
[
  {
    "id": null,
    "indexes": [
      {
        "key": {
          "keys": [
            "_id"
          ]
        },
        "options": {
          "expireAfterSeconds": null,
          "unique": null
        }
      },
      {
        "key": {
          "keys": [
            "DocumentDBDefaultIndex"
          ]
        },
        "options": {
          "expireAfterSeconds": null,
          "unique": null
        }
      },
      {
        "key": {
          "keys": [
            "DocumentDBDefaultIndex"
          ]
        },
        "options": {
          "expireAfterSeconds": null,
          "unique": null
        }
      }
    ],
    "location": null,
    "mongoDbCollectionId": "col001",
    "name": null,
    "shardKey": {
      "zipcode": "Hash"
    },
    "tags": null,
    "type": null
  }
]

Listing the Throughput values at the collection level

If you would like to review the throughput for each collection within the database, use the collection throughput entity:
az cosmosdb mongodb collection throughput show --account-name mongodb123a --database-name newdb1 --name col001 --resource-group ADFLAB
{
  "id": null,
  "location": null,
  "name": null,
  "tags": null,
  "throughput": 400,
  "type": null
}

Changing the Throughput values at the collection level

To update the throughput of the collection, we can use the throughput entity, which is a sub-entity of the cosmosdb mongodb collection:
az cosmosdb mongodb collection throughput update --account-name mongodb123a --database-name newdb1 --name col001 --resource-group ADFLAB --throughput 500
{
  "id": null,
  "location": null,
  "name": null,
  "tags": null,
  "throughput": 500,
  "type": null
}
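Since manual throughput is provisioned in increments of 100 RU/s, a small script can work out the next valid value before calling the update command. A minimal sketch in plain shell arithmetic (the starting value and the 25% growth factor are just illustrative):

```shell
# Current throughput, e.g. as returned by
#   az cosmosdb mongodb collection throughput show ... --query throughput -o tsv
current=500

# Grow by roughly 25%, then round UP to the next multiple of 100,
# because Cosmos DB throughput is provisioned in 100 RU/s increments.
target=$(( current + current / 4 ))
rounded=$(( (target + 99) / 100 * 100 ))

echo "$rounded"   # pass this value to `... collection throughput update --throughput`
```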

SQL MI Service-aided Subnet configuration

Azure SQL Managed Instance is transitioning from manual to service-aided subnet configuration.

With service-aided subnet configuration, the user remains in control. The network requirements to deploy a managed instance are as follows:

Dedicated subnet: The managed instance’s subnet can’t contain any other cloud service that’s associated with it, and it can’t be a gateway subnet. The subnet can’t contain any resource but the managed instance, and you can’t later add other types of resources in the subnet.

Subnet delegation: The managed instance’s subnet needs to be delegated to Microsoft.Sql/managedInstances resource provider.

Network security group (NSG): An NSG needs to be associated with the managed instance's subnet. You can use an NSG to control access to the managed instance's data endpoint by filtering traffic on port 1433, and on ports 11000-11999 when the managed instance is configured for redirect connections. The service automatically adds the rules required to allow an uninterrupted flow of management traffic.

User-defined route (UDR) table: A UDR table needs to be associated with the managed instance's subnet. You can add entries to the route table to route traffic that has on-premises private IP ranges as a destination through the virtual network gateway or a network virtual appliance (NVA). The service automatically adds the entries required to allow an uninterrupted flow of management traffic.

Service endpoints: Service endpoints can be used to configure virtual network rules on the storage accounts that store backups and audit logs.

Sufficient IP addresses: The managed instance subnet must have at least 16 IP addresses. The recommended minimum is 32 IP addresses. For more information, see Determine the size of the subnet for managed instances. You can deploy managed instances in the existing network after you configure it to satisfy the networking requirements for managed instances. Otherwise, create a new network and subnet.
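As a quick sanity check on the sizing requirement: Azure reserves 5 addresses in every subnet, so the usable count for a given prefix length is 2^(32 − prefix) − 5. A throwaway shell sketch (the prefix values are illustrative):

```shell
# Usable addresses in an Azure subnet = total addresses for the prefix,
# minus the 5 addresses Azure reserves in every subnet.
usable() {
  prefix=$1
  echo $(( (1 << (32 - prefix)) - 5 ))
}

usable 28   # /28 -> 16 total, 11 usable: meets the bare minimum, tight in practice
usable 27   # /27 -> 32 total, 27 usable: matches the recommended minimum
```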

Azure SQL Failover Strategy – I

Azure SQL has multiple failover strategies, and the amount of content needed to understand the different strategies can seem overwhelming unless you are a database administrator who has been working on HA/DR for quite a while.

I am going to attempt to make it simpler and consolidate some of the strategies so they are easier to understand. Since the strategies differ for each type of cloud computing (IaaS and PaaS), the first part focuses only on IaaS and the next part on PaaS; that way it is easy to look back and appreciate the magnitude of the differences in Azure SQL.

Azure VM based Failover Strategy

There are three important failover strategies that we will cover for SQL Server on Azure VMs:

Option 1 – Availability Groups

Also known as Always On Availability Groups, AOAG, AG, ... you get the picture.

  • Always On availability groups on Azure Virtual Machines are similar to Always On availability groups on premises.
  • Requires an internal load balancer in front of the two SQL VMs.
  • The load balancer holds the IP addresses for the availability group listener. If you have more than one availability group, each group requires a listener. One load balancer can support multiple listeners.
  • Best practice is to use an availability set for the two SQL VMs and the domain controllers to ensure redundancy and high availability.

Availability Set:

Azure makes sure that the VMs you place within an availability set run across multiple physical servers, compute racks, storage units, and network switches within the same region. If a hardware or software failure happens, only a subset of your VMs is impacted and your overall solution stays operational.

Pricing

To use BYOL with a SQL Server VM, you must have a license for SQL Server Standard or Enterprise and Software Assurance.

Cost savings: The Azure Hybrid Benefit offers up to 55% savings. For more information, see Switch licensing model.
Free passive secondary replica: Another benefit of bringing your own license is free licensing for one passive secondary replica per SQL Server for high-availability purposes. This cuts the licensing cost of a highly available SQL Server deployment (for example, using Always On Availability Groups) in half. The rights to run the passive secondary are provided through the Fail-Over Servers Software Assurance benefit.
Windows license: The Windows license allows dual use, which means you don't have to pay for the Windows license if you are bringing your own license and carry an EA.
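To make the licensing benefit concrete, here is some back-of-the-envelope arithmetic; the hourly rates below are made-up placeholders, not real Azure prices:

```shell
# Placeholder hourly costs, in cents (NOT real Azure prices).
compute_rate=100   # base VM compute per node
sql_license=120    # SQL Server license portion per node

# PAYG: a two-node availability group pays compute + SQL license on both nodes.
payg=$(( 2 * (compute_rate + sql_license) ))

# BYOL with Software Assurance: no per-hour SQL license charge, and the
# passive secondary's license is covered by the Fail-Over Servers benefit.
byol=$(( 2 * compute_rate ))

echo "PAYG: ${payg}c/hr  BYOL: ${byol}c/hr"
```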

Recovery Objective

RTO: The time taken to make the secondary replica the primary.

RPO: If the availability mode is set to synchronous, committed data is replicated, so only data that was in transit needs to be sent again (retried from the application).

More info: https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sql/virtual-machines-windows-portal-sql-availability-group-overview

Option 2 – Availability Group across Multiple Regions

This is an extension of Option 1: instead of the SQL VMs being replicated only within a single region, we replicate SQL VMs across multiple regions. This provides higher redundancy and increased resiliency. This architecture helps when one region is completely down for some reason (think fire, earthquake, human error); customers use this strategy to keep the business afloat.

In this deployment, the SQL VMs in the availability group span two regions. The availability group replicas can have synchronous commit with automatic failover on SQL-1 and SQL-2, while SQL-3 is in a different Azure region. SQL-3 is added to the Windows Server Failover Cluster and can host an availability group replica. In this architecture, the replica in the remote region is normally configured with asynchronous-commit availability mode and manual failover mode.

When availability group replicas are on Azure virtual machines in different Azure regions, each region requires:

  • A virtual network gateway
  • A virtual network gateway connection
Location   Server Instance   Role        Availability Mode   Failover Mode
Region 1   SQL-1             Primary     Synchronous         Automatic
Region 1   SQL-2             Secondary   Synchronous         Automatic
Region 2   SQL-3             Secondary   Asynchronous        Manual

Pricing

To use BYOL with a SQL Server VM, you must have a license for SQL Server Standard or Enterprise and Software Assurance.

Cost savings: The Azure Hybrid Benefit offers up to 55% savings. For more information, see Switch licensing model.
Free passive secondary replica: Another benefit of bringing your own license is free licensing for one passive secondary replica per SQL Server for high-availability purposes. This cuts the licensing cost of a highly available SQL Server deployment (for example, using Always On Availability Groups) in half. The rights to run the passive secondary are provided through the Fail-Over Servers Software Assurance benefit.
Paid third replica: SQL licensing covers only one passive replica; the third replica will be a paid replica, based on the configuration.
Windows license: The Windows license allows dual use, which means you don't have to pay for the Windows license if you are bringing your own license and carry an EA.

Recovery Objective

RTO: The time taken to make the secondary replica the primary.

RPO: If the availability mode is set to synchronous, committed data is replicated, so only data that was in transit needs to be sent again (retried from the application). For the asynchronous replica, however, there could be data loss of roughly five minutes, since this scenario covers a region outage.

Option 3 – Windows Failover Cluster

Failover Cluster Instances (FCIs), which require shared storage, can be created in four different ways:

1. A two-node failover cluster running in Azure VMs with attached storage using Windows Server 2016 Storage Spaces Direct (S2D) to provide a software-based virtual SAN.

2. A two-node failover cluster running in Azure VMs using Premium File Share. Premium file shares are SSD-backed consistently-low-latency file shares that are fully supported for use with Failover Cluster Instance.

3. A two-node failover cluster running in Azure VMs with storage supported by a third-party clustering solution. For a specific example that uses SIOS DataKeeper, see High availability for a file share using failover clustering and 3rd party software SIOS DataKeeper.

4. A two-node failover cluster running in Azure VMs with remote iSCSI Target shared block storage via ExpressRoute. For example, NetApp Private Storage (NPS) exposes an iSCSI target via ExpressRoute with Equinix to Azure VMs.

Pricing

On Azure virtual machines, you can license SQL Server using pay-as-you-go (PAYG) or bring-your-own-license (BYOL) VM images.

The type of image you choose affects how you’re charged.

  • With pay-as-you-go licensing, a failover cluster instance (FCI) of SQL Server on Azure virtual machines incurs charges for all nodes of the FCI, including the passive nodes
  • If you are using your Enterprise Agreement with Software Assurance, you can use one free passive FCI node for each active node. To take advantage of this benefit in Azure, use BYOL VM images, and use the same license on both the active and passive nodes of the FCI.

Recovery Objective

RTO: The time taken to fail over between the nodes.

RPO: Zero data loss. Because Always On FCI uses shared storage, the same view of the storage instance is available on failover.

Azure Big Data Architecture

I was talking to a fellow architect who wanted to know how to store data for analysis. I presented and walked him through the two standard Azure solutions, one based on SQL DW and the other using the new Azure Data Lake Storage Gen2. The architecture for ingesting data into SQL DW is well tested and defined.

However, with the introduction of ADLS Gen2 the obvious question arises: ADLS Gen2 is essentially storage, and you need other technologies to connect to the data, read it, and work on it. There are quite a few options:

  • HDInsight – see the HDInsight ETL at scale overview for transforming data at scale
  • Azure SQL DW – wait, what? Yep. Azure SQL DW supports an exciting feature called PolyBase, which allows data external to SQL DW to be accessible through SQL DW (wow!!)
  • Azure Data Factory – ADF provides a connector to ADLS Gen2 storage and helps customers transform and orchestrate data movement

Accessing data from ADLS Gen2 is limited not so much by the technologies as by developer skills: there is at least one technology for accessing ADLS Gen2 data whatever skills the developer has.

Cosmos DB – Provisioning throughput

Before we look at provisioning throughput, the first step is to understand Cosmos DB pricing, which is based on Request Units.

The cost of all database operations is normalized by Azure Cosmos DB and is expressed by Request Units (or RUs, for short).

You can think of RUs per second as the currency for throughput. RUs per second is a rate-based currency. It abstracts the system resources such as CPU, IOPS, and memory that are required to perform the database operations supported by Azure Cosmos DB.
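To make RUs concrete: a point read of a 1-KB item costs about 1 RU, and a 1-KB write costs roughly 5 RU, so required throughput can be estimated from the expected operation rates. A rough sizing sketch (the workload numbers are illustrative assumptions):

```shell
# Rough RU/s estimate for a mixed workload (illustrative numbers).
reads_per_sec=400     # 1-KB point reads, ~1 RU each
writes_per_sec=100    # 1-KB writes, assumed ~5 RU each

read_cost=1
write_cost=5

rus=$(( reads_per_sec * read_cost + writes_per_sec * write_cost ))
echo "$rus RU/s"      # 900 RU/s
```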

Cosmos DB provides more than one way to provision throughput

  1. Dedicated collection level throughput
  2. Shared Database level throughput

Collection-level throughput provides dedicated Request Units per second for that collection. For example, if you have data coming from multiple IoT devices and one of them sends data at, say, 500 data points per second, then you need to provision dedicated capacity for that collection.

Database Throughput = sum( dedicated throughput for each collection within the database)

Each collection gets its own dedicated capacity, and the max throughput for each collection is determined by the dedicated capacity provisioned.

Shared database-level throughput is used when you have a few collections that are not accessed at high frequency; for example, reference data that is read every time another transaction occurs. If you have multiple reference-data collections that do not require high dedicated throughput, you can use shared database throughput.

The biggest difference between collection-level and database-level provisioned throughput is how the throughput aggregates at the margin. Mathematically,

Collection throughput = Database throughput – sum(current throughput used by other collections)

Each collection can consume up to the maximum available throughput provisioned at the database level.
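A worked example of the two formulas above, with made-up numbers: under dedicated (collection-level) throughput, the database total is the sum of the collections; under shared (database-level) throughput, a collection can use whatever the other collections are not currently consuming.

```shell
# Dedicated (collection-level) model: database total = sum of collections.
col1=500; col2=400; col3=100
db_dedicated=$(( col1 + col2 + col3 ))
echo "dedicated database total: $db_dedicated RU/s"      # 1000 RU/s

# Shared (database-level) model: a collection can use what is left over.
db_shared=1000
used_by_others=600
col_available=$(( db_shared - used_by_others ))
echo "collection can use up to: $col_available RU/s"     # 400 RU/s
```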

Provisioning throughput using Collection

The RUs are set up while the DB and the collections are provisioned and can be scaled up or down as needed.

Creating a collection within a new or existing DB

Alternatively, Cosmos DB provides the Azure CLI to create collections along with the throughput option.
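Continuing with the hypothetical names from the earlier examples, creating a collection with a shard key and dedicated throughput might look like the following; `--shard` and `--throughput` correspond to the shard key and RU/s values seen in the outputs above:

```shell
az cosmosdb mongodb collection create \
  --account-name mongodb123a \
  --database-name newdb1 \
  --name col001 \
  --resource-group ADFLAB \
  --shard "zipcode" \
  --throughput 400
```

This requires a live subscription, so it is a sketch to adapt rather than run as-is.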

Provisioning throughput using Database

Alternatively, Cosmos DB supports provisioning throughput at the database level using the Azure CLI.

Scaling the throughput at the database level works just like scaling the collection-level throughput.

In the Azure CLI, the throughput sub-entity supports two methods, viz. show and update: show returns the current throughput, and update scales the throughput at the database level.

The throughput show method displays the current throughput provisioned at the database level.
The throughput update method scales the throughput provisioned at the database level.
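Using the same hypothetical account and database names as before, the two methods mirror their collection-level counterparts:

```shell
# Show the throughput currently provisioned at the database level
az cosmosdb mongodb database throughput show \
  --account-name mongodb123a --name newdb1 --resource-group ADFLAB

# Scale the database-level throughput (here to an assumed 600 RU/s)
az cosmosdb mongodb database throughput update \
  --account-name mongodb123a --name newdb1 --resource-group ADFLAB \
  --throughput 600
```

As with the other CLI examples, these run against your own subscription, so substitute your account, database, and resource group names.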

Next Steps

After starting to write this article, I realized I should cover migration from collection-level throughput to database-level throughput. I will try to write up the migration steps in my next article.