How we DevOps’d our Microsoft SQL Servers

Vinay Kulkarni
THE ICONIC Tech
Published in
6 min readApr 6, 2020

--

Tl;dr We deployed a clustered instance of Microsoft SQL Server with Always On Availability group through a Bamboo CICD build pipeline on AWS

Only a few years ago, using DevOps and Microsoft SQL Server in the same sentence would have raised a few (or many) eyebrows. All of that changed when Microsoft introduced Microsoft Azure Cloud Platform and along came Visual Studio Team Services (now called the Azure DevOps Server) which made it really easy to build CICD pipelines for the Microsoft stack. Meanwhile, SQL Server can now be deployed as On-premise, Managed Cloud Instance, Managed database service, on UNIX based operating systems, on Docker and on Kubernetes too!!! Today, Azure is the second biggest cloud player by market share, and it is growing!

Now, if Azure has services that can DevOps your Microsoft stack, then why this blog post? Well, at THE ICONIC, our cloud platform of choice has been AWS and that is not changing anytime soon. The growth that this company has seen over the years, means that we now have more servers and microservices being deployed, growing infrastructure complexity and essentially a lot of moving pieces. This is why, following and enforcing best practices in our build pipelines eliminates dependencies on people, reduces human error, standardized our tech stack and makes it easily maintainable.

The Challenge

AWS. Microsoft Windows. Microsoft SQL Server. High Availability. Get it?

Following best practices means that not all is easy, at least not in the initial bits of development and deployment. It is especially not an easy task when you have to develop the solution for the first time ever within the company. We have been doing Infrastructure-as-Code (CloudFormation, Terraform) and CICD (using Atlassian Bamboo) for many years now for all of our infra deployments, but not for the windows based operating systems.

Well, then…challenge accepted!

The Candidate

Not that we had a choice, our mission-critical ERP system, Microsoft Business Central (formerly known as Dynamics 365 Business Central) was due for a major upgrade. While the engineers worked on the migration path for the entire system, I was tasked to build the infrastructure with a focus on end-to-end automation of the entire build process.

The Scope

Our ERP system helps us operate our Fulfillment Centre (our huge warehouse), tracks finances and essentially records every activity occurring in the business. Such a data system is heavily used across Tech — for integrating with APIs, data analysis, supply chain management, etc. Our aim was to reduce direct database connections to the database. It was proposed to set up a SQL Server with a read replica, which could potentially be used as a high availability and disaster recovery instance. It would allow us to offload all reads to the Replica while the Primary served the mission-critical operations for our business.

The Implementation

Since we were deploying our resources on AWS infrastructure and due to the fact that earlier Windows servers were built using a CloudFormation stack, we had templates from previous builds available to us. We thought that we could use these as references for building a standalone Windows server with SQL server installed. However, our new requirements needed a more complex setup that can be represented as below:

The efforts were made easier when I found that AWS offered sample templates for such a setup as aws-quickstart templates. But it was not exactly the solution we needed. Using the templates out of the box would deploy AWS Directory Domain Service, AWS Secrets Manager and AWS Systems Manager. We did not need these. We also needed to further customize our server configurations, SQL installations, Active Directory Setup — all of which were required for Business Central itself. Thus began the journey of digging through 2.5k lines of code and customizing it to suit our needs.

AMI

We started with choosing an appropriate AMI (Amazon Machine Images) to use for our environments. The requirements were:

We picked two AMIs, WS2019FULLBASE for our non-production servers on which a custom PowerShell script would be executed for a custom installation of SQL Server Developer Edition, and WS2019FULLSQL2017ENT for the production server, which was a fully licensed version of Windows Server 2019 and SQL Server 2017 Enterprise.

To Witness or Not to Witness?

Next decision to make was more an architectural one. Windows Server Failover Cluster (WSFC) allows one to setup a Witness server which can be a small EC2 instance acting as a file server. Its purpose is to monitor our superstar SQL Server instances and when one of them stops responding or is in an unhealthy state, it will initiate a failover to the second node, so the application can remain functional.

We had the option to set up High Availability as an Active-Active two-node cluster with a Witness node or an Active-Active-Passive three-node cluster. With the mounting licensing costs and additional resources, not to mention the maintenance efforts involved with a three-node cluster, we decided to settle with a two-node cluster and a witness server.

Initialization and Quickstart Scripts

When CloudFormation installs Windows on a new EC2 instance, it calls an init function to initialize the server. The next few lines of code copy all the Quickstart and custom scripts into a folder and start executing them. This basically consists of the server being renamed, setting up the DNS address, joining Active Directory, adding users to admin groups, and creating shares. After a reboot in between, it goes on to enable Windows Failover Cluster and installs SQL Server services using a service account supplied. It took us several builds to finally get them all to work together.

Volumes

This was a harder decision to make in my opinion. We definitely wanted the best IO performance for our Primary (master) server. However, disks with dedicated IOPS provisioned are very, very expensive! We needed 5TB (for hosting data files, log files and a TempDB database) on each node with an expected data growth of 0.5TB every 6 months. After much deliberation, we opted for IO1 volumes, but with a trade-off of allocating a lower dedicated IOPS to the Secondary Replica.

By this point, we had already built and destroyed our instances several times on our staging environment. While at it, we also reported and fixed a bug on aws-quickstart GitHub repository.

WSFC, SQL Server & Always On

The final hurdle. It was a struggle configuring multiple servers to be added into Windows Server Failover Cluster (WSFC). Because we were deploying the three nodes in 3 Availability Zones, we had to troubleshoot issues with CredSSP authentication. This is basically a function that delegates the user credentials from local computers to remote computers. After that, we encountered issues setting up the Always-On feature for SQL Server. We resolved those using another PowerShell script.

So….

5 CloudFormation templates,

25 PowerShell scripts further customized to suit our needs,

15 attempts to deploy through CloudFormation stack,

50+ attempts to deploy through Bamboo CICD pipeline,

1 Contribution to aws-quickstart github repository,

and ….

many sleepless nights of deployments later…

What does this mean for us?

- Easy to deploy production-ready instances when needed

- Our mission-critical database now conforms to company best practices

- Easier to modify/add resources as required

- Possibility of setting up an auto-scaling group if required

- Increased transparency and accountability

Conclusion

This was a personal milestone for me. From not knowing enough about AWS services to setting up EC2 instances, deploying in multiple availability zones within a VPC, on different private subnets, using CloudFormation stacks and automating the complete flow of Windows and SQL Server Always On installation through Bamboo CICD, I had come a long way. It was a massive learning opportunity and I have many at THE ICONIC to thank for it.

After all of this, if there is one thing I would tell myself, again and again, it would be:

Never Settle.

--

--