Azure Analysis Services

Automating Azure Analysis Services

Last year I got the opportunity to speak at a local SQL Saturday. I used the opportunity to really stretch myself and research some creative ways of spinning up, spinning down, and automating Azure Analysis Services. When I wrote the abstract and submitted the session I wasn’t even sure these things were possible (and after you watch the video you may feel a few of my demo hiccups prove some of these ideas weren’t possible). The code samples are posted on GitHub and the slides are available here.

I also got a chance to talk to Shawn Weisfeld of UserGroup.tv who provides an amazing free service and publishes lots of user group session recordings. The video below is courtesy of UserGroup.tv.

The elasticity of the cloud lets you provide a better user experience instead of breaking the bank running your solution at the highest scale 24 hours a day. This demo-heavy session will show how to automate Azure Analysis Services. We will look at solutions to the following questions: How do you schedule a refresh of the data in your Azure Analysis Services model? How do you pause the server on the weekends? Better yet, how can you pause it when the last user disconnects and start it automatically when the first user connects in the morning? How do you scale to meet peak demand each day? How can Azure Analysis Services load balance across servers in multiple countries so that the user connects to the geographically closest server? How do you extend your on-premises SSAS solution into Azure Analysis Services?

How to Make DAX Distinct Counts 100x Faster with Exact Match Aggs

My wife says that if I wait long enough, some of my clothing will come back into fashion. In the data analytics world, apparently we have waited long enough because an old Analysis Services Multidimensional concept called aggregations has come back into fashion. You may have seen announcements that Power BI supports its own flavor of aggregations now. This blog post is not about Power BI aggs. However, I’ll explain why we didn’t use them and then I’ll describe the approach we did take.

The Business Scenario

To provide some context for the real client scenario where we created this exact match agg approach: the client is a retailer, and like many retailers they tend to measure themselves on a per-store, per-day scale. If one region has 50 stores and another region has 80 stores, it’s difficult to compare those two regions head-to-head unless you normalize the sales amounts to the scale of a single store and day. Similarly, if new stores are being opened halfway through the year and other stores are being closed, then it’s difficult to compare apples to apples without normalizing to a per-store, per-day scale. While this approach is very useful for analytics, it also has a downside: nearly every metric must now be divided by a distinct count of StoreKey and DateKey. As you know, distinct counts are very expensive relative to sum or count aggregations.
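
To make that concrete, here is a minimal sketch of the kind of normalized measure involved. The table and column names (FactSales, SalesAmount, StoreDayKey) are hypothetical rather than taken from the client’s model, and the sketch assumes the fact table carries a composite store-day key column for the distinct count.

// Hypothetical names; StoreDayKey is assumed to be a composite StoreKey + DateKey column.
Store Day Count := DISTINCTCOUNT ( FactSales[StoreDayKey] )

Sales Amount := SUM ( FactSales[SalesAmount] )

// Normalize any metric to the scale of a single store and day.
Sales per Store per Day := DIVIDE ( [Sales Amount], [Store Day Count] )

Every report visual that shows a normalized metric ends up evaluating that distinct count, which is why its cost matters so much.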

I was presented with this very situation where a distinct count over an 18 billion row table was the denominator for most calculations and performance was terrible. They wanted dashboards to render in under 10 seconds even when looking at sales change year-over-year for the entire company. This blog post summarizes the approach we took to optimize their Analysis Services Tabular model distinct counts to meet those performance expectations.

Could Power BI Aggregations Help?

We chose not to use Power BI aggregations for a few reasons. First, the data was in an Analysis Services Tabular model which was hundreds of GB in size and running on a 128 v-core VM with 4TB of RAM. Unfortunately, Power BI aggs aren’t supported on top of a live connection to SSAS currently. Second, though Power BI does support distinct count aggregations, the agg table must contain the distinct count key and it must be marked as a GroupBy column. In the real client scenario above, the 18 billion row fact table had one row per ProductKey per StoreKey per DateKey. Aggregating it to a product subcategory grain but keeping StoreKey and DateKey in the table only shrunk the table to 4 billion rows. That’s too large to be imported into Power BI at the current time.

We actually did take the approach of creating a product subcategory agg table in the SSAS Tabular model and it did make reports 3x faster. The nice thing about this approach was that 90% of reporting is done at the product subcategory and above so this agg table was hit on most reports. The other nice thing about this approach was that because StoreKey and DateKey were in the agg table, we could slice by any rollup in the Store dimension and any rollup in the Date dimension and hit the agg as long as the report was at the product subcategory level or above. Unfortunately, 3x faster wasn’t nearly fast enough; a distinct count over a 4 billion row table was still much too slow.

How to Make Distinct Counts an Order of Magnitude Faster

I have written in the past about optimizing large distinct counts by separating out the portion of the data which isn’t a distinct count and turning it into a sum. That approach wasn’t applicable to the store-day distinct count in this scenario. But the solution we came up with did take a page out of that book by turning the entire distinct count into a sum. How do you change a distinct count into a sum? By pre-aggregating the data. When I say pre-aggregating, I don’t mean the approach described above where you create a 4 billion row table but still perform the DISTINCTCOUNT at query time. I mean that you perform the distinct count ahead of time so that at query time it is a sum.
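
To illustrate what performing the distinct count ahead of time means, the following DAX query is a conceptual sketch only: the real pre-aggregation was done outside the model in Azure SQL DW, and the FactSales names here are hypothetical. For one fixed date-range rollup it produces one row per store with the day count already resolved, so anything built on top of it only needs to sum.

// Conceptual sketch only; the real pre-aggregation was performed in Azure SQL DW.
// For one fixed date range (here Feb 1-4, 2019), resolve the distinct day count
// per store ahead of time so that query-time logic can simply SUM the result.
EVALUATE
SUMMARIZECOLUMNS (
    FactSales[StoreKey],
    FILTER ( ALL ( FactSales[DateKey] ), FactSales[DateKey] >= 20190201 && FactSales[DateKey] <= 20190204 ),
    "StoreDayCount", DISTINCTCOUNT ( FactSales[DateKey] )
)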

Why Call this an Exact Match Agg?

I can pre-aggregate an additive measure like summing up sales amounts to the product subcategory grain and storing it in a FactProductSubCategorySalesAgg table. When the user queries by product category, I can simply further summarize the FactProductSubCategorySalesAgg numbers to answer the query. However, if I’m dealing with distinct counts then that doesn’t work. You can’t roll up a subcategory agg to the category level because a distinct count is nonadditive. So distinct count agg tables (which truly pre-aggregate the distinct count rather than store the distinct count key in the table) only help queries which are an exact match. This means two things. First, I have to anticipate ahead of time the common rollups that users will request. Second, I have to find a way to store thousands of different grain rollups, and I have to be able to determine at query time in DAX calculations whether I have already pre-aggregated the rollup the user is requesting. Since there are dozens of columns in the Product, Store, and Date dimensions that users put on reports, when you work out all the combinations of filter grains that are commonly seen, you quickly realize there are thousands of rollup grains.

As a side note, in Power BI aggs each column in the agg table must be either a GroupBy column you grouped by to create the aggregation or a pre-aggregated sum/min/max/count column. Distinct count isn’t in the list. (Currently, you have to mark the distinct count column as “GroupBy” to support distinct count over the agg table. But that leads to a large agg table.) Though it’s possible the Power BI team could add “Distinct Count” to the list in the future, it wouldn’t be very useful here: each Power BI agg table represents one grain of pre-aggregated data which can’t be further aggregated up, and since typical scenarios require thousands of grains, it immediately becomes untenable to create thousands of distinct count agg tables. So we can’t really use the Power BI agg approach for our exact match pre-aggregated distinct counts. We have to come up with a different approach.

How to Store Thousands of Aggregation Grains

For this agg table we decided to pre-aggregate up the Product dimension and up the Date dimension but keep StoreKey in the agg table. So any filter on the Store dimension can hit the agg table (since StoreKey is kept in the agg table), but only pre-aggregated rollups on the Product and Date dimensions can hit it. How do we store thousands of different Product and Date rollups without making a mess of the model?

Dates are easiest, so we will start with those. Users commonly look at month-to-date and prior year month-to-date. Assuming today is February 4, 2019, we can represent that month-to-date aggregation as DateRangeKey=20190204004, where the first eight digits (20190204) are the ending date of the range and the last three digits (004) are the duration of the date range in days. Prior year month-to-date would be 20180204004. Year-to-date through February 4, 2019 would be 20190204035. So we import a hidden DimDateRange table into the Tabular model with the following columns (example rows shown):

DateRangeKey    MinDateKey    MaxDateKey
20190204004     20190201      20190204
20180204004     20180201      20180204
20190204035     20190101      20190204
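
The real DimDateRange table was imported into the model, but as a minimal sketch of the encoding, those three example rows could also be expressed as a DAX calculated table like this:

// Sketch only; the real DimDateRange table was imported rather than calculated.
DimDateRange =
DATATABLE (
    "DateRangeKey", INTEGER,
    "MinDateKey", INTEGER,
    "MaxDateKey", INTEGER,
    {
        { 20190204004, 20190201, 20190204 },  // MTD through Feb 4, 2019 (4 days)
        { 20180204004, 20180201, 20180204 },  // prior year MTD (4 days)
        { 20190204035, 20190101, 20190204 }   // YTD through Feb 4, 2019 (35 days)
    }
)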

How do we determine in DAX whether we have pre-aggregated the date range the user requested? I will show a full implementation later, but the following DAX snippet returns the DateRangeKey if we have it and returns BLANK() if we don’t have it pre-aggregated.

LOOKUPVALUE (
    DateRange[DateRangeKey],
    DateRange[MinDateKey], MIN ( 'Date'[DateKey] ),
    DateRange[MaxDateKey], MAX ( 'Date'[DateKey] ),
    DateRange[DateRangeKey], MAX ( 'Date'[DateKey] ) * 1000 + COUNTROWS ( 'Date' )
)

Products are quite a bit harder. We decided to define each Product dimension rollup as a set of the ProductKeys that are in it. We called this the ProductAggSignature. Next, we looked for a way to represent that signature with some inexpensive DAX functions like SUM, MIN, MAX, and COUNT. For example, the first row represents ProductKeys 1, 3, 5, 7, and 9. The second row represents ProductKeys 1, 4, 5, 6, and 9.

ProductAggSignatureKey    CountProductKey    MinProductKey    MaxProductKey    SumProductKey    SumExponentModProductKey
998                       5                  1                9                25               176
999                       5                  1                9                25               192

To define each of those columns:

  • ProductAggSignatureKey – an identity column

  • CountProductKey – the row count of DimProduct which is included in the agg

  • MinProductKey – the lowest ProductKey included in the agg

  • MaxProductKey – the highest ProductKey included in the agg

  • SumProductKey – the sum of the ProductKey values included in the agg

  • SumExponentModProductKey – the sum of a new column or calculation in the Product table which is: (ProductKey*ProductKey*100)%77

That last column is certainly odd but as you can see in the example above, it was necessary to distinguish the first and second row.
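
For reference, here is how that column could be defined as a DAX calculated column on the Product table (a sketch; in the real solution it may have been computed in the source data instead):

// Implements (ProductKey * ProductKey * 100) % 77 from the definition above.
ExponentModProductKey =
MOD ( 'Product'[ProductKey] * 'Product'[ProductKey] * 100, 77 )

For the first example row (ProductKeys 1, 3, 5, 7, and 9) the column values are 23, 53, 36, 49, and 15, which sum to 176; for the second row (ProductKeys 1, 4, 5, 6, and 9) they are 23, 60, 36, 58, and 15, which sum to 192. That difference is what distinguishes the two otherwise identical signatures.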

After gathering business requirements and studying the product rollups and date rollups used in several key dashboards, we chose the rollups to pre-aggregate in Azure SQL DW.

How do we determine at query time in DAX whether we have pre-aggregated the product dimension rollup the user requested? I will show a full implementation later, but the following DAX snippet returns the ProductAggSignatureKey if we have it and returns BLANK() if we don’t have it pre-aggregated.

LOOKUPVALUE (
    ProductAggSignature[ProductAggSignatureKey],
    ProductAggSignature[CountProductKey], COUNTROWS ( 'Product' ),
    ProductAggSignature[MinProductKey], MIN ( 'Product'[ProductKey] ),
    ProductAggSignature[MaxProductKey], MAX ( 'Product'[ProductKey] ),
    ProductAggSignature[SumProductKey], SUM ( Product[ProductKey] ),
    ProductAggSignature[SumExponentModProductKey], SUM ( 'Product'[ExponentModProductKey] )
)

Now that we’ve explained the DateRangeKey and the ProductAggSignatureKey, the new agg table is pretty straightforward.

DateRangeKey    ProductAggSignatureKey    StoreKey    StoreDayCount
20190204004     998                       101         4
20190204004     998                       102         4

In this scenario, we have pre-aggregated MTD February 4, 2019 for a product rollup I described above. We store one row per StoreKey in the rollup. The StoreDayCount is the number of days that store had sales of a product in that product rollup. Therefore, I can sum up 4 + 4 to get the correct “Store Day Count” number if I ask for this exact date range and product rollup.

The following diagram shows the tables and relationships in the Tabular model. The yellow highlighted tables and columns are the new ones added to implement this exact match agg optimization.


How do we do that in DAX? If both the DateRangeKey and ProductAggSignatureKey LOOKUPVALUE calls return a number, then we return SUM ( StoreDayExactMatchAgg[StoreDayCount] ) from the agg table rows matching those keys. Otherwise, we fall back to our regular slow DISTINCTCOUNT measure. Here’s the full DAX measure:

Store Day Count (Optimized):=
// Does the current Date filter context exactly match a pre-aggregated date range?
VAR DateRangeKey =
    LOOKUPVALUE (
        DateRange[DateRangeKey],
        DateRange[MinDateKey], MIN ( 'Date'[DateKey] ),
        DateRange[MaxDateKey], MAX ( 'Date'[DateKey] ),
        DateRange[DateRangeKey], MAX ( 'Date'[DateKey] ) * 1000 + COUNTROWS ( 'Date' )
    )
RETURN
    IF (
        NOT ( ISBLANK ( DateRangeKey ) ),
        // The date range is pre-aggregated; does the Product filter match a known signature?
        VAR ProductAggSignatureKey =
            LOOKUPVALUE (
                ProductAggSignature[ProductAggSignatureKey],
                ProductAggSignature[CountProductKey], COUNTROWS ( 'Product' ),
                ProductAggSignature[MinProductKey], MIN ( 'Product'[ProductKey] ),
                ProductAggSignature[MaxProductKey], MAX ( 'Product'[ProductKey] ),
                ProductAggSignature[SumProductKey], SUM ( Product[ProductKey] ),
                ProductAggSignature[SumExponentModProductKey], SUM ( 'Product'[ExponentModProductKey] )
            )
        RETURN
            IF (
                NOT ( ISBLANK ( ProductAggSignatureKey ) ),
                // Exact match: the distinct count becomes a cheap SUM over the agg table.
                CALCULATE (
                    SUM ( StoreDayExactMatchAgg[StoreDayCount] ),
                    StoreDayExactMatchAgg[DateRangeKey] = DateRangeKey,
                    StoreDayExactMatchAgg[ProductAggSignatureKey] = ProductAggSignatureKey
                ),
                // No matching product signature: fall back to the slow distinct count.
                [Slow DISTINCTCOUNT Measure]
            ),
        // No matching date range: fall back to the slow distinct count.
        [Slow DISTINCTCOUNT Measure]
    )

For the sample code, I created this as a new “Store Day Count (Optimized)” measure so that I could easily compare the performance of the slow approach and the optimized approach. In a real model, I would have renamed the existing “Store Day Count” measure, hidden it, and replaced the “Store Day Count” measure with the DAX shown above.

Why Did I Not Use the Typical DAX Approach?

If I had been able to use Power BI agg tables, then there would have been no additional DAX changes necessary for Power BI to use the agg tables when possible. (The “typical DAX approach” in Power BI is no additional DAX.) However, we couldn’t use Power BI aggs for reasons stated above.

In SSAS Tabular, Marco Russo describes in this video (around the 1:02:00 mark) the typical DAX approach to detecting what rollups the user chose. Typically, you use ISFILTERED or ISCROSSFILTERED. The DAX query optimizer handles this very efficiently and this works well when you are detecting one fairly simple filter context like “did I filter by product subcategory?” However, when you are detecting thousands of different grain rollups, this approach will be impossible to code and maintain. That’s why we chose to perform a LOOKUPVALUE against the DateRange and ProductAggSignature tables to make this determination.
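
For comparison, here is a minimal sketch of that typical approach for a single subcategory-grain agg table. The measure and detail names (Sales Amount, FactSales, SalesAmount, ProductName) are hypothetical; only FactProductSubCategorySalesAgg comes from the scenario described earlier.

// Sketch of the typical routing pattern for one simple agg grain (hypothetical names).
// In reality you would need to test every column below the agg grain, for every agg
// table, which is exactly what becomes unmanageable with thousands of rollup grains.
Sales Amount :=
IF (
    ISFILTERED ( 'Product'[ProductKey] )
        || ISFILTERED ( 'Product'[ProductName] ),
    SUM ( FactSales[SalesAmount] ),                        // below the agg grain: use the detail table
    SUM ( FactProductSubCategorySalesAgg[SalesAmount] )    // at or above the agg grain: use the agg table
)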

How Much Faster Is It?

For the real client scenario, the distinct count got about 23x faster. Reports that looked at large date ranges like YTD got approximately 4x faster (keeping in mind we were only optimizing the distinct count denominator of the calculations, not the numerator), and we were able to get reports to render in under 10 seconds. Reports that looked at one day or one week were approximately 1.4x faster.

For the sample I published to GitHub (see below), the core distinct count measure became 210x faster and reporting queries that contained sales amounts and year-over-year comparisons became approximately 24x faster. Obviously, your results will vary depending on your data and hardware, but clearly this is order-of-magnitude faster performance.

I like that we can deploy this optimization such that it is completely transparent to the user since the DAX code automatically falls back to the slower distinct count if we don’t have the rollup they want pre-aggregated. It’s complex to implement, but on large data volumes where interactive distinct count query performance is needed, it’s definitely an option to consider.

Try It Out and Take the Challenge

Realizing this is a rather complex solution, I published the code here on GitHub with sample data. It’s only 400 million rows, not 18 billion, but it should be enough to demonstrate the performance issue and show how the solution is an order of magnitude faster. Please give it a try and use the Issues tab for any questions you have.

I would also like to throw down the gauntlet and challenge you to come up with a better solution! If you can make the solution faster or less complex to implement, I would love to hear it. Start a thread on the Issues tab to share your approach. Certainly, further optimizations of the solution are welcome, and so is out-of-the-box thinking like changing the technologies involved.

Sending Users to the Geographically Nearest Azure Analysis Services


I’ve been preparing my Automating Azure Analysis Services talk for the May 19, 2018 SQL Saturday in Dallas. The session abstract is:

The elasticity of the cloud lets you provide a better user experience instead of breaking the bank running your solution at the highest scale 24 hours a day. This demo-heavy session will show how to automate Azure Analysis Services. We will look at solutions to the following questions: How do you schedule a refresh of the data in your Azure Analysis Services model? How do you pause the server on the weekends? Better yet, how can you pause it when the last user disconnects and start it automatically when the first user connects in the morning? How do you scale to meet peak demand each day? How can Azure Analysis Services load balance across servers in multiple countries so that the user connects to the geographically closest server? How do you extend your on-premises SSAS solution into Azure Analysis Services?

This blog post is about one part of that abstract: how users from a multi-national corporation can connect automatically to the geographically nearest Azure Analysis Services instance. Why? Performance. High availability. Cost optimization. Even data security involving national borders.


The Ingredients

Most Azure solutions are a bit like a soup pot where you toss in various Azure technologies as ingredients to create something greater than the sum of their parts. In this instance, we’ll mince up some Azure Traffic Manager, some Azure Web Apps, and some Azure Analysis Services to create a pretty tasty solution.

[Diagram: worldwide users routed through Azure Traffic Manager to Azure Web Apps and Azure Analysis Services instances in South Central US and Japan East]


How Users Connect

In the diagram above, we have some worldwide users. Everyone will connect to Azure Analysis Services via the server name they were provided: link://gregazureas.trafficmanager.net/. That server name could easily have been a vanity URL such as link://azureas.artisconsulting.com/ using custom DNS. This uses the server aliases feature of Azure Analysis Services and the latest drivers. (Notice Christian Wade mentioned the idea of geographic redirection in that article!)


When a user in North America connects, Azure Traffic Manager sends them to the Azure Analysis Services instance in South Central US because it determined that was geographically closest. But when a user in Asia connects, it determines that the Japan East instance of Azure AS is closest. This all happens automatically and transparently to the user. And when the Japan Azure AS instance and Japan web app are paused for the night, any Japanese users who are working late will end up connecting to the South Central US instance. So this solution can also add an extra layer of high availability or cost optimization.


How does this work?

Azure Traffic Manager is just a DNS-level service which understands user geography based on IP address and understands the health of your backend services. It pairs those two pieces of information and decides where to route a user by returning the appropriate DNS response. In this instance, we tell Azure Traffic Manager to use the “Performance” routing method, which sends users to the geo with the lowest latency for them. However, review the other routing methods to determine which one matches your needs best.

[Screenshot: Azure Traffic Manager routing method configuration]

You set up endpoints in Azure Traffic Manager to tell it where it can route traffic.

[Screenshot: Azure Traffic Manager endpoint configuration]

Azure Traffic Manager will monitor those two websites to ensure they are up.

You can use dnschecker.org to resolve the DNS name from servers around the world and confirm that your Azure Traffic Manager routing method is working as expected.

[Screenshot: dnschecker.org results showing worldwide DNS resolution of the trafficmanager.net name]

You can see that users in Malaysia, Japan, and China would get routed to the Japan East server while other users would end up in the South Central US server.

Azure Traffic Manager configuration is so easy I literally set it up from a web browser on my phone while I was at a dead stop in traffic.


Setting up the Web Apps

How were the web apps from above set up? In this instance, the web apps are very simple. Create two web apps. In order to integrate easily with Azure Traffic Manager, you need to choose at least the S1 Standard tier, as lower tiers don’t support Azure Traffic Manager.

[Screenshot: App Service pricing tier selection]

Next, edit the website files via a Visual Studio project or other methods such as the in-browser App Service Editor which I will show below.

[Screenshot: editing the web app files in the App Service Editor]


Unless you have more advanced needs, the code can be very simple. The Japan East web app should return the string asazure://japaneast.asazure.windows.net/<yourservername> while the South Central US web app should return asazure://southcentralus.asazure.windows.net/<yourservername>.


The URL https://<yourwebsite>.azurewebsites.net will work over SSL properly, but if you try to connect via https://gregazureas.trafficmanager.net then you will get an SSL certificate error since the SSL cert doesn’t match the URL. You need to obtain an SSL certificate for your trafficmanager.net URL (or your vanity URL) and apply it to the website. Since this was just a demo, not a production environment, I used Let’s Encrypt to create a free SSL cert. Why do we need SSL? The link:// syntax uses HTTPS under the covers and will not work if there’s an invalid certificate.


What if I’m using Analysis Services on a VM?

Connecting to Azure Analysis Services is a little different because Azure AS connects over HTTPS and supports the link:// syntax. So how do you accomplish the same solution with Analysis Services on a VM? If you have already enabled msmdpump.dll for HTTP connectivity to Analysis Services, then just point Traffic Manager at those URLs. If you haven’t enabled HTTP connectivity or don’t want to, you can eliminate the web apps, point Azure Traffic Manager at the public IP of the SSAS server, and have it monitor port 2383 over TCP.

[Screenshot: Azure Traffic Manager endpoint monitoring configuration]


What if I Want To Load Balance Across On-Premises SSAS and Azure Analysis Services?

Let’s say you have an on-premises SSAS Tabular installation which is heavily used by users in your corporate office. But what if you need to burst to the cloud during busy month-end reporting? Or what if you need to put Azure Analysis Services in Japan for Japanese users but allow your corporate users to continue using their on-premises SSAS server? Well, I suspect this isn’t a supported scenario since the versions of the SSAS engine on-premises vs. in Azure Analysis Services will be slightly different. But supportability doesn’t quash my curiosity (even if it’s not wise).

If Excel users are your target audience, then have the website return a dynamic ODC file that points to either the on-premises server or the cloud asazure:// server name depending on the location of the user.

As for other client tools, I haven’t figured out an elegant way. Maybe you will!


Conclusion

I have personally experienced on-premises global projects where users that were near the corporate office experienced great SSAS performance while users in Europe or Asia experienced annoyingly sluggish performance due to the network latency and bandwidth to get to US servers. This solution is a perfect fit for such global rollouts of Azure Analysis Services.