We have compiled a list of best practices for Azure architecture. The list draws on our experience migrating hundreds of on-premises applications to Azure and building dozens of new Azure applications. The applications required a variety of assets, such as databases, data warehouses, websites, data streaming services, and machine learning models. Depending on functionality, scalability, and security needs, we relied on Azure Virtual Machines (VMs) or on a serverless architecture built from Azure services. We look forward to sharing this guide and improving it based on feedback from fellow Azure architects.
Architecture Practices Guide - Exeliq
To Optimize Performance & Costs
To reduce Azure costs, turn off VMs at set times
Switching off resources when they are not in use reduces overall subscription costs. There are several ways to schedule automatic shutdowns for Azure VMs. When you provision a new VM, Azure offers an auto-shutdown setting that takes a scheduled shutdown time and a time zone.
In our DevOps workflow, developers typically maintain four parallel environments: Development, Testing, UAT, and Production. We use the Azure Resource Optimization (ARO) Toolkit and runbooks to automatically shut down all non-production environments at the end of business hours.
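The scheduling rule described here can be sketched as a small decision function. This is only an illustration of the logic a runbook might apply, not the toolkit's own implementation. The business hours, the weekday rule, and the function name `should_be_running` are assumptions made for the example.

```python
from datetime import datetime, time

# Hypothetical values: adjust to your own environments and business hours.
NON_PRODUCTION = {"Development", "Testing", "UAT"}
BUSINESS_START = time(8, 0)
BUSINESS_END = time(18, 0)


def should_be_running(environment: str, now: datetime) -> bool:
    """Return True if VMs in this environment should be on at `now` (local time)."""
    if environment not in NON_PRODUCTION:
        return True  # Production is never shut down by this schedule.
    is_weekday = now.weekday() < 5  # Monday=0 ... Friday=4
    return is_weekday and BUSINESS_START <= now.time() < BUSINESS_END
```

A scheduled job could call this for each environment and stop or start the matching VMs when the result changes.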
To efficiently resolve failure scenarios, implement checkpoints in the Azure Data Factory (ADF) v2 pipeline
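Handling failure scenarios across multiple pipelines can be challenging in ADF. When many pipelines run together, identifying the precise failure point is difficult, and when you cannot determine the failure point, all pipelines must be retriggered. Checkpoints record which pipelines have completed, so a rerun can resume from the point of failure instead of starting over.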
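The checkpoint idea can be sketched outside ADF as follows. The guide does not say how its checkpoints are stored, so this example assumes a local JSON file as the checkpoint store and plain Python callables standing in for pipelines. The name `run_with_checkpoints` is hypothetical.

```python
import json
from pathlib import Path
from typing import Callable, List, Tuple


def run_with_checkpoints(
    stages: List[Tuple[str, Callable[[], None]]],
    checkpoint_file: Path,
) -> List[str]:
    """Run stages in order, skipping any already recorded as complete.

    Returns the names of the stages executed during this call.
    """
    if checkpoint_file.exists():
        done = set(json.loads(checkpoint_file.read_text()))
    else:
        done = set()
    executed = []
    for name, stage in stages:
        if name in done:
            continue  # Completed in an earlier run; do not retrigger it.
        stage()  # If this raises, the checkpoint still holds the last success.
        done.add(name)
        checkpoint_file.write_text(json.dumps(sorted(done)))
        executed.append(name)
    # Every stage succeeded: clear the checkpoint for the next cycle.
    checkpoint_file.unlink(missing_ok=True)
    return executed
```

In an ADF setting the same record would more likely live in a control table or similar store that each pipeline updates on success. That choice is also an assumption here.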
Use Query Replicas for Azure Analysis Services (AAS) synchronization
To load-balance queries against AAS, use query replicas and synchronize the AAS model between the processing node and the read-only query replicas. The replicas serve multiple concurrent connections in parallel, which significantly improves the responsiveness of the models. The model also remains available for queries while it is being processed.
To optimize query execution time in Azure SQL Data Warehouse (ADW), use appropriate resource classes
Resource classes help manage workloads by setting limits on the number of queries that run concurrently and on the compute resources assigned to each query. Smaller resource classes reduce the maximum memory per query but increase concurrency. Larger resource classes increase the maximum memory per query but reduce concurrency.
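The trade-off can be illustrated with a simplified model in which the warehouse has a fixed pool of concurrency slots and each query consumes a number of slots set by its resource class. The slot counts below are invented for illustration and are not the values for any particular service level.

```python
# Illustrative numbers only; real slot counts depend on the service level.
TOTAL_SLOTS = 40
SLOTS_BY_CLASS = {"small": 1, "medium": 8, "large": 16}


def max_concurrent_queries(total_slots: int, slots_per_query: int) -> int:
    """Number of queries that fit in the slot pool at the same time."""
    return total_slots // slots_per_query


for name, slots in SLOTS_BY_CLASS.items():
    # A larger class takes more slots (more memory) and leaves room for fewer queries.
    print(name, max_concurrent_queries(TOTAL_SLOTS, slots))
```

Under these assumed numbers, moving a workload from the small class to the large class gives each query 16 times the resources and cuts the number of simultaneous queries from 40 to 2.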
To process large volumes of data, dynamically scale Azure Analysis Services (AAS)
To improve processing performance, schedule automatic scale-up of AAS through a runbook immediately before processing large volumes of data. To optimize costs, schedule scale-down after processing.
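One way to express the scale-up, process, scale-down sequence is a context manager that always scales back down, even if processing fails. This is a sketch. `set_tier` is a hypothetical callable standing in for whatever call the runbook uses to change the AAS pricing tier, and the tier names in the usage note are example values.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def scaled_up(set_tier: Callable[[str], None], high: str, low: str) -> Iterator[None]:
    """Scale to `high` for the duration of the block, then return to `low`."""
    set_tier(high)
    try:
        yield
    finally:
        # Runs on success and on failure, so the higher tier is never left on.
        set_tier(low)
```

Usage might look like `with scaled_up(set_tier, "S4", "S1"): process_model()`, where `process_model` is also a placeholder.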
To check available and expected roles in AAS, use Azure Functions
To secure data in AAS, we create roles. Often, however, roles are deleted or go missing in subsequent deployments. Use Azure Functions to periodically compare the roles available in the model with the roles you expect, and send an alert to administrators if the two lists differ.
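The comparison at the heart of such a check is a set difference. The sketch below covers only that step. Reading the roles from the model and sending the alert are left out, and the function name `compare_roles` is hypothetical.

```python
from typing import Dict, Iterable, List


def compare_roles(expected: Iterable[str], available: Iterable[str]) -> Dict[str, List[str]]:
    """Report roles that are expected but absent, and roles present but not expected."""
    expected_set, available_set = set(expected), set(available)
    return {
        "missing": sorted(expected_set - available_set),
        "unexpected": sorted(available_set - expected_set),
    }
```

A scheduled function could raise an alert whenever either list in the result is non-empty.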
To reduce latency, use partition-specific processing for AAS
Use Azure Functions to process only specific partitions in the AAS model. This reduces processing time and shortens the delay before the latest data appears.
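A partition-level refresh is typically expressed as a TMSL `refresh` command that lists the partitions to process. The sketch below only builds that command as a Python dictionary. How it is submitted to the server is not shown, and the database, table, and partition names in the usage note are placeholders.

```python
import json
from typing import Any, Dict, List


def build_partition_refresh(
    database: str, table: str, partitions: List[str], refresh_type: str = "full"
) -> Dict[str, Any]:
    """Build a TMSL refresh command that targets only the named partitions."""
    return {
        "refresh": {
            "type": refresh_type,
            "objects": [
                {"database": database, "table": table, "partition": partition}
                for partition in partitions
            ],
        }
    }
```

For example, `json.dumps(build_partition_refresh("SalesModel", "FactSales", ["2024-01"]))` yields a command that refreshes one partition instead of the whole table.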
To increase security, use Service Principal Identity and Azure Key Vault
Store credentials for data stores and compute resources in Azure Key Vault. Azure Data Factory retrieves the credentials when it executes an activity that uses the data store or compute resource.
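In a linked service definition, this means the connection string is a reference to a Key Vault secret instead of a literal value. The sketch below builds such a definition as a Python dictionary, following the shape ADF uses for Key Vault references as far as we are aware. Treat the exact fields as an assumption to verify against the ADF documentation. All names passed in are placeholders.

```python
from typing import Any, Dict


def key_vault_secret_reference(vault_linked_service: str, secret_name: str) -> Dict[str, Any]:
    """Reference a secret held in Key Vault instead of embedding its value."""
    return {
        "type": "AzureKeyVaultSecret",
        "store": {
            "referenceName": vault_linked_service,
            "type": "LinkedServiceReference",
        },
        "secretName": secret_name,
    }


def sql_linked_service(name: str, vault_linked_service: str, secret_name: str) -> Dict[str, Any]:
    """Linked service whose connection string is resolved from Key Vault at run time."""
    return {
        "name": name,
        "properties": {
            "type": "AzureSqlDatabase",
            "typeProperties": {
                "connectionString": key_vault_secret_reference(
                    vault_linked_service, secret_name
                )
            },
        },
    }
```

Because the definition holds only the secret's name, it can be kept in source control without exposing the credential.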
To avoid performance issues, restrict activities in single Azure Data Factory (ADF) pipelines
Limit each ADF pipeline to 40 activities or fewer to avoid performance issues and resource contention.
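A simple check against this limit can run over exported pipeline definitions. The sketch below assumes the pipeline JSON keeps its activities under `properties.activities` and that container activities nest their inner activities under the `typeProperties` keys listed in `NESTED_KEYS`. Both are assumptions to confirm against your own pipeline files.

```python
from typing import Any, Dict, List

MAX_ACTIVITIES = 40  # Limit recommended in this guide.
# Assumed keys under which container activities hold their inner activities.
NESTED_KEYS = ("activities", "ifTrueActivities", "ifFalseActivities")


def count_activities(activities: List[Dict[str, Any]]) -> int:
    """Count activities, including those nested inside container activities."""
    total = 0
    for activity in activities:
        total += 1
        type_properties = activity.get("typeProperties", {})
        for key in NESTED_KEYS:
            total += count_activities(type_properties.get(key, []))
    return total


def exceeds_limit(pipeline: Dict[str, Any], limit: int = MAX_ACTIVITIES) -> bool:
    """Return True if the pipeline has more activities than the limit."""
    return count_activities(pipeline["properties"]["activities"]) > limit
```

Pipelines that exceed the limit are candidates for splitting into smaller pipelines.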
References
The following documents provide a high-level framework for best practices. We strongly encourage you to review them:
- Azure Security Best Practices – MAQ Software, published August 2, 2018
- Azure Reference Architectures – Microsoft Corporation
- Workload management with resource classes in Azure SQL Data Warehouse – Microsoft Corporation, published April 26, 2018