SSAS

Automating Azure Analysis Services

Last year I got the opportunity to speak at a local SQL Saturday. I used the opportunity to really stretch myself and research some creative ways of spinning up, spinning down, and automating Azure Analysis Services. When I wrote the abstract and submitted the session I wasn’t even sure these things were possible (and after you watch the video you may feel a few of my demo hiccups prove some of these ideas weren’t possible). The code samples are posted on GitHub and the slides are available here.

I also got a chance to talk to Shawn Weisfeld of UserGroup.tv who provides an amazing free service and publishes lots of user group session recordings. The video below is courtesy of UserGroup.tv.

The elasticity of the cloud lets you provide a better user experience instead of breaking the bank running your solution at the highest scale 24 hours a day. This demo-heavy session will show how to automate Azure Analysis Services. We will look at solutions to the following questions: How do you schedule a refresh of the data in your Azure Analysis Services model? How do you pause the server on the weekends? Better yet, how can you pause it when the last user disconnects and start it automatically when the first user connects in the morning? How do you scale to meet peak demand each day? How can Azure Analysis Services load balance across servers in multiple countries so that the user connects to the geographically closest server? How do you extend your on-premises SSAS solution into Azure Analysis Services?

How to Make DAX Distinct Counts 100x Faster with Exact Match Aggs

My wife says that if I wait long enough, some of my clothing will come back into fashion. In the data analytics world, apparently we have waited long enough because an old Analysis Services Multidimensional concept called aggregations has come back into fashion. You may have seen announcements that Power BI supports its own flavor of aggregations now. This blog post is not about Power BI aggs. However, I’ll explain why we didn’t use them and then I’ll describe the approach we did take.

The Business Scenario

To provide some context for the real client scenario where we created this exact match agg approach: the client is a retailer, and like many retailers they measure themselves on a per-store, per-day scale. If one region has 50 stores and another region has 80 stores, it's difficult to compare those two regions head-to-head unless you normalize the sales amounts to the scale of a single store and day. Similarly, if new stores are being opened halfway through the year and other stores are being closed, it's difficult to compare apples to apples without normalizing to a per-store, per-day scale. While this approach is very useful for analytics, it also has a downside: nearly every metric must now be divided by a distinct count of StoreKey and DateKey. As you know, distinct counts are very expensive relative to sum or count aggregations.
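To illustrate the denominator that makes this expensive, here is a minimal T-SQL sketch of a per-store, per-day sales metric over an assumed FactSales table (the table and column names are illustrative, not the client's actual schema):

-- Sales normalized to a per-store, per-day scale: the denominator is a
-- distinct count of StoreKey/DateKey combinations (store-days).
SELECT
    SUM(f.SalesAmount) * 1.0
        / COUNT(DISTINCT CAST(f.StoreKey AS varchar(20)) + '|' + CAST(f.DateKey AS varchar(20)))
        AS SalesPerStorePerDay
FROM dbo.FactSales f
WHERE f.DateKey BETWEEN 20190101 AND 20190204;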

I was presented with this very situation where a distinct count over an 18 billion row table was the denominator for most calculations and performance was terrible. They wanted dashboards to render in under 10 seconds even when looking at sales change year-over-year for the entire company. This blog post summarizes the approach we took to optimize their Analysis Services Tabular model distinct counts to meet those performance expectations.

Could Power BI Aggregations Help?

We chose not to use Power BI aggregations for a few reasons. First, the data was in an Analysis Services Tabular model which was hundreds of GB in size and running on a 128 v-core VM with 4TB of RAM. Unfortunately, Power BI aggs aren’t supported on top of a live connection to SSAS currently. Second, though Power BI does support distinct count aggregations, the agg table must contain the distinct count key and it must be marked as a GroupBy column. In the real client scenario above, the 18 billion row fact table had one row per ProductKey per StoreKey per DateKey. Aggregating it to a product subcategory grain but keeping StoreKey and DateKey in the table only shrunk the table to 4 billion rows. That’s too large to be imported into Power BI at the current time.

We actually did take the approach of creating a product subcategory agg table in the SSAS Tabular model and it did make reports 3x faster. The nice thing about this approach was that 90% of reporting is done at the product subcategory and above so this agg table was hit on most reports. The other nice thing about this approach was that because StoreKey and DateKey were in the agg table, we could slice by any rollup in the Store dimension and any rollup in the Date dimension and hit the agg as long as the report was at the product subcategory level or above. Unfortunately, 3x faster wasn’t nearly fast enough; a distinct count over a 4 billion row table was still much too slow.
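For reference, the product subcategory agg table we did build was conceptually something like the following hedged T-SQL sketch (table and column names are assumptions): keeping StoreKey and DateKey is what lets any Store or Date rollup still hit the agg, but it is also why the table only shrank to 4 billion rows.

-- Pre-aggregate the 18B-row fact to product subcategory grain,
-- but keep StoreKey and DateKey so DISTINCTCOUNT still works at query time.
SELECT
    p.ProductSubcategoryKey,
    f.StoreKey,
    f.DateKey,
    SUM(f.SalesAmount) AS SalesAmount
FROM dbo.FactSales f
JOIN dbo.DimProduct p
    ON p.ProductKey = f.ProductKey
GROUP BY
    p.ProductSubcategoryKey,
    f.StoreKey,
    f.DateKey;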

How to Make Distinct Counts an Order of Magnitude Faster

I have written in the past about optimizing large distinct counts by separating out the portion of the data which isn’t a distinct count and turning it into a sum. That approach wasn’t applicable to the store-day distinct count in this scenario. But the solution we came up with did take a page out of that book by turning the entire distinct count into a sum. How do you change a distinct count into a sum? By pre-aggregating the data. When I say pre-aggregating, I don’t mean the approach described above where you create a 4 billion row table but still perform the DISTINCTCOUNT at query time. I mean that you perform the distinct count ahead of time so that at query time it is a sum.

Why Call this an Exact Match Agg?

I can pre-aggregate an additive measure like summing up sales amounts to the product subcategory grain and storing it in a FactProductSubCategorySalesAgg table. When the user queries by product category, I can simply further summarize the FactProductSubCategorySalesAgg numbers to answer the query. However, if I'm dealing with distinct counts, that doesn't work. You can't roll up a subcategory agg to the category level because a distinct count is nonadditive. So distinct count agg tables (which truly pre-aggregate the distinct count rather than store the distinct count key in the table) only help queries which are an exact match. This means two things. First, I have to anticipate ahead of time the common rollups that users will request. Second, I have to find a way to store thousands of different grain rollups, and I have to be able to determine at query time in DAX calculations whether I have already pre-aggregated the rollup the user is requesting. Since there are dozens of columns in the Product, Store, and Date dimensions which users put in reports, when you work out all the combinations of filter grains that are commonly seen, you quickly realize there are thousands of rollup grains.

As a side note, in Power BI aggs each column in the agg table must be either a GroupBy column you grouped by to create the aggregation or a pre-aggregated sum/min/max/count column. Distinct count isn't in the list. (Currently, you have to mark the distinct count column as "GroupBy" to support distinct count over the agg table, but that leads to a large agg table.) Though it's possible the Power BI team could add "Distinct Count" to the list in the future, it wouldn't be very useful here: each Power BI agg table represents one grain of pre-aggregated data which can't be further aggregated up, and since typical scenarios require thousands of grains, it immediately becomes untenable to create thousands of distinct count agg tables. So we can't really use the Power BI agg approach for our exact match pre-aggregated distinct counts. We have to come up with a different approach.

How to Store Thousands of Aggregation Grains

For this agg table we decided to pre-aggregate up the Product dimension and pre-aggregate up the Date dimension but keep the StoreKey in the agg table. So any filter on the Store dimension can hit the agg table (since StoreKey is still present at its original grain), but only pre-aggregated rollups on the Product and Date dimensions can hit the agg table. How do we store thousands of different Product and Date rollups without making a mess of the model?

Dates are easiest, so we will start with those. Users commonly look at month-to-date and prior year month-to-date. Assuming today is February 4, 2019, we can represent that month-to-date aggregation as DateRangeKey=20190204004, where the first eight digits (20190204) are the ending date of the range and the last three digits (004) are the duration of the date range in days. Prior year month-to-date would be 20180204004. Year-to-date through February 4, 2019 would be 20190204035. So we import a hidden DimDateRange table into the Tabular model which has the following columns:

DateRangeKey | MinDateKey | MaxDateKey
------------ | ---------- | ----------
20190204004  | 20190201   | 20190204
20180204004  | 20180201   | 20180204
20190204035  | 20190101   | 20190204
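As an illustration of how such a table might be built, here is a hedged T-SQL sketch that generates the month-to-date DateRangeKey for every date in an assumed DimDate table (prior-year month-to-date and year-to-date ranges would follow the same pattern of ending date * 1000 + number of days):

-- Minimal sketch (table and column names are assumptions):
-- DateRangeKey = ending DateKey * 1000 + number of days in the range.
SELECT
    CAST(d.DateKey AS bigint) * 1000
        + DATEDIFF(day, DATEFROMPARTS(YEAR(d.[Date]), MONTH(d.[Date]), 1), d.[Date]) + 1
        AS DateRangeKey,
    YEAR(d.[Date]) * 10000 + MONTH(d.[Date]) * 100 + 1 AS MinDateKey,
    d.DateKey                                           AS MaxDateKey
FROM dbo.DimDate d;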

How do we determine in DAX whether we have pre-aggregated the date range the user requested? I will show a full implementation later, but the following DAX snippet returns the DateRangeKey if we have it and returns BLANK() if we don’t have it pre-aggregated.

LOOKUPVALUE (
    DateRange[DateRangeKey],
    DateRange[MinDateKey], MIN ( 'Date'[DateKey] ),
    DateRange[MaxDateKey], MAX ( 'Date'[DateKey] ),
    DateRange[DateRangeKey], MAX ( 'Date'[DateKey] ) * 1000 + COUNTROWS ( 'Date' )
)

Products are quite a bit harder. We decided to define each Product dimension rollup as a set of the ProductKeys that are in it. We called this the ProductAggSignature. Next, we looked for a way to represent that signature with some inexpensive DAX functions like SUM, MIN, MAX, and COUNT. For example, the first row represents ProductKeys 1, 3, 5, 7, and 9. The second row represents ProductKeys 1, 4, 5, 6, and 9.

ProductAggSignatureKey | CountProductKey | MinProductKey | MaxProductKey | SumProductKey | SumExponentModProductKey
---------------------- | --------------- | ------------- | ------------- | ------------- | ------------------------
998                    | 5               | 1             | 9             | 25            | 176
999                    | 5               | 1             | 9             | 25            | 192

To define each of those columns:

  • ProductAggSignatureKey – an identity column

  • CountProductKey – the row count of DimProduct which is included in the agg

  • MinProductKey – the lowest ProductKey included in the agg

  • MaxProductKey – the highest ProductKey included in the agg

  • SumProductKey – the sum of the ProductKey values included in the agg

  • SumExponentModProductKey – the sum of a new column or calculation in the Product table which is: (ProductKey*ProductKey*100)%77

That last column is certainly odd, but as you can see in the example above, it was necessary to distinguish the first and second rows.
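To make the signature computation concrete, here is a hedged T-SQL sketch of how one rollup's signature row could be calculated, using a single product subcategory as the example rollup (the subcategory key value and table names are illustrative assumptions):

-- Signature columns for one candidate rollup (here: one product subcategory).
-- The modulo expression matches the ExponentModProductKey definition above:
-- (ProductKey * ProductKey * 100) % 77
SELECT
    COUNT(*)                                                      AS CountProductKey,
    MIN(p.ProductKey)                                             AS MinProductKey,
    MAX(p.ProductKey)                                             AS MaxProductKey,
    SUM(CAST(p.ProductKey AS bigint))                             AS SumProductKey,
    SUM((CAST(p.ProductKey AS bigint) * p.ProductKey * 100) % 77) AS SumExponentModProductKey
FROM dbo.DimProduct p
WHERE p.ProductSubcategoryKey = 37;  -- example rollup; one row like this per rollup grain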

After gathering business requirements and studying the product rollups and date rollups used in several key dashboards, we chose the rollups to pre-aggregate in Azure SQL DW.

How do we determine at query time in DAX whether we have pre-aggregated the product dimension rollup the user requested? I will show a full implementation later, but the following DAX snippet returns the ProductAggSignatureKey if we have it and returns BLANK() if we don’t have it pre-aggregated.

LOOKUPVALUE (
    ProductAggSignature[ProductAggSignatureKey],
    ProductAggSignature[CountProductKey], COUNTROWS ( 'Product' ),
    ProductAggSignature[MinProductKey], MIN ( 'Product'[ProductKey] ),
    ProductAggSignature[MaxProductKey], MAX ( 'Product'[ProductKey] ),
    ProductAggSignature[SumProductKey], SUM ( Product[ProductKey] ),
    ProductAggSignature[SumExponentModProductKey], SUM ( 'Product'[ExponentModProductKey] )
)

Now that we’ve explained the DateRangeKey and the ProductAggSignatureKey, the new agg table is pretty straightforward.

DateRangeKey | ProductAggSignatureKey | StoreKey | StoreDayCount
------------ | ---------------------- | -------- | -------------
20190204004  | 998                    | 101      | 4
20190204004  | 998                    | 102      | 4

In this scenario, we have pre-aggregated MTD February 4, 2019 for a product rollup I described above. We store one row per StoreKey in the rollup. The StoreDayCount is the number of days that store had sales of a product in that product rollup. Therefore, I can sum up 4 + 4 to get the correct “Store Day Count” number if I ask for this exact date range and product rollup.
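To illustrate how rows like these could be produced during ETL, here is a hedged T-SQL sketch that pre-aggregates the store-day count for one DateRangeKey and one ProductAggSignatureKey (FactSales and the ProductAggSignatureMember bridge table are assumed names, not the client's actual schema):

-- One row per store: the number of distinct days the store sold
-- any product in rollup 998 during MTD Feb 4, 2019.
SELECT
    20190204004               AS DateRangeKey,
    998                       AS ProductAggSignatureKey,
    f.StoreKey,
    COUNT(DISTINCT f.DateKey) AS StoreDayCount
FROM dbo.FactSales f
WHERE f.DateKey BETWEEN 20190201 AND 20190204
  AND f.ProductKey IN (
        SELECT m.ProductKey
        FROM dbo.ProductAggSignatureMember m
        WHERE m.ProductAggSignatureKey = 998
      )
GROUP BY f.StoreKey;

Because the distinct count over DateKey is performed here, at query time the DAX measure only has to sum the StoreDayCount column for the matching keys.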

The following diagram shows the tables and relationships in the Tabular model. The yellow highlighted tables and columns are the new ones added to implement this exact match agg optimization.


How do we do that in DAX? If both of the DateRangeKey and ProductAggSignatureKey LOOKUPVALUE functions return a number, then we return SUM([StoreDayCount]) from rows matching those keys. Otherwise, we fall back to our regular slow DISTINCTCOUNT measure. Here’s the full DAX measure:

Store Day Count (Optimized):=
VAR DateRangeKey =
    LOOKUPVALUE (
        DateRange[DateRangeKey],
        DateRange[MinDateKey], MIN ( 'Date'[DateKey] ),
        DateRange[MaxDateKey], MAX ( 'Date'[DateKey] ),
        DateRange[DateRangeKey], MAX ( 'Date'[DateKey] ) * 1000 + COUNTROWS ( 'Date' )
    )
RETURN
    IF (
        NOT ( ISBLANK ( DateRangeKey ) ),
        VAR ProductAggSignatureKey =
            LOOKUPVALUE (
                ProductAggSignature[ProductAggSignatureKey],
                ProductAggSignature[CountProductKey], COUNTROWS ( 'Product' ),
                ProductAggSignature[MinProductKey], MIN ( 'Product'[ProductKey] ),
                ProductAggSignature[MaxProductKey], MAX ( 'Product'[ProductKey] ),
                ProductAggSignature[SumProductKey], SUM ( Product[ProductKey] ),
                ProductAggSignature[SumExponentModProductKey], SUM ( 'Product'[ExponentModProductKey] )
            )
        RETURN
            IF (
                NOT ( ISBLANK ( ProductAggSignatureKey ) ),
                CALCULATE (
                    SUM ( StoreDayExactMatchAgg[StoreDayCount] ),
                    StoreDayExactMatchAgg[DateRangeKey] = DateRangeKey,
                    StoreDayExactMatchAgg[ProductAggSignatureKey] = ProductAggSignatureKey
                ),
                [Slow DISTINCTCOUNT Measure]
            ),
        [Slow DISTINCTCOUNT Measure]
    )

For this sample code I created this as a new “Store Day Count (Optimized)” measure in order to be able to easily compare the performance of the slow approach and the optimized approach. In a real model, I would have renamed the existing “Store Day Count” measure, hidden it, and replaced the “Store Day Count” measure with the DAX shown above.

Why Did I Not Use the Typical DAX Approach?

If I had been able to use Power BI agg tables, then there would have been no additional DAX changes necessary for Power BI to use the agg tables when possible. (The “typical DAX approach” in Power BI is no additional DAX.) However, we couldn’t use Power BI aggs for reasons stated above.

In SSAS Tabular, Marco Russo describes in this video (around the 1:02:00 mark) the typical DAX approach to detecting which rollups the user chose. Typically, you use ISFILTERED or ISCROSSFILTERED. The DAX query optimizer handles this very efficiently, and it works well when you are detecting one fairly simple filter context like "did I filter by product subcategory?" However, when you are detecting thousands of different grain rollups, that approach becomes impractical to code and maintain. That's why we chose to perform a LOOKUPVALUE against the DateRange and ProductAggSignature tables to make this determination.

How Much Faster Is It?

For the real client scenario, the distinct count got about 23x faster. Reports that looked at large date ranges like YTD got approximately 4x faster (keep in mind we were only optimizing the distinct count denominator of the calculations, not the numerator), and we were able to get reports to render in under 10 seconds. Reports that looked at one day or one week were approximately 1.4x faster.

For the sample I published to GitHub (see below), the core distinct count measure became 210x faster and reporting queries that contained sales amounts and year-over-year comparisons became approximately 24x faster. Obviously, your results will vary depending on your data and hardware, but clearly this is an order-of-magnitude performance improvement.

I like that we can deploy this optimization such that it is completely transparent to the user since the DAX code automatically falls back to the slower distinct count if we don’t have the rollup they want pre-aggregated. It’s complex to implement, but on large data volumes where interactive distinct count query performance is needed, it’s definitely an option to consider.

Try It Out and Take the Challenge

Realizing this is a rather complex solution, I published the code here on GitHub with sample data. It's only 400 million rows, not 18 billion, but it should be enough to demonstrate a bit of a performance issue and show how the solution is an order of magnitude faster. Please give it a try and use the Issues tab for any questions you have.

I would also like to throw down the gauntlet and challenge you to come up with a better solution! If you can make the solution faster or less complex to implement, I would love to hear about it. Start a thread on the Issues tab to share your approach. Certainly, further optimizations of the solution are welcome. Thinking out of the box, such as changing the technologies involved, is also welcome.

Sending Users to the Geographically Nearest Azure Analysis Services


I’ve been preparing my Automating Azure Analysis Services talk for the May 19, 2018 SQL Saturday in Dallas. The session abstract is:

The elasticity of the cloud lets you provide a better user experience instead of breaking the bank running your solution at the highest scale 24 hours a day. This demo-heavy session will show how to automate Azure Analysis Services. We will look at solutions to the following questions: How do you schedule a refresh of the data in your Azure Analysis Services model? How do you pause the server on the weekends? Better yet, how can you pause it when the last user disconnects and start it automatically when the first user connects in the morning? How do you scale to meet peak demand each day? How can Azure Analysis Services load balance across servers in multiple countries so that the user connects to the geographically closest server? How do you extend your on-premises SSAS solution into Azure Analysis Services?

This blog post is about one part of that abstract: users from a multi-national corporation connecting automatically to the geographically nearest Azure Analysis Services instance. Why? Performance. High availability. Cost optimization. Even data security involving national borders.

 

The Ingredients

Most Azure solutions are a bit like a soup pot where you toss in various Azure technologies as ingredients to create something greater than the sum of their parts. In this instance, we’ll mince up some Azure Traffic Manager, some Azure Web Apps, and some Azure Analysis Services to create a pretty tasty solution.

[Architecture diagram: worldwide users routed by Azure Traffic Manager, through regional web apps, to Azure Analysis Services instances in South Central US and Japan East]

 

How Users Connect

In the diagram above, we have some worldwide users. Everyone will connect to Azure Analysis Services via the server name they were provided: link://gregazureas.trafficmanager.net/. That server name could easily have been a vanity URL such as link://azureas.artisconsulting.com/ using custom DNS. This uses the server aliases feature of Azure Analysis Services and the latest client drivers. (Notice Christian Wade mentioned the idea of geographic redirection in that article!)


When a user in North America connects, Azure Traffic Manager sends them to the Azure Analysis Services instance in South Central US because it determined that was geographically closest. But when a user in Asia connects, it determines that the Japan East instance of Azure AS is closest. This all happens automatically and transparently to the user. And when the Japan Azure AS instance and Japan web app are paused for the night, any Japanese users who are working late will end up connecting to the South Central US instance. So this solution can also add an extra layer of high availability or cost optimization.


How does this work?

Azure Traffic Manager is just a DNS-level service which understands user geography based upon IP addresses and also understands the health of your backend services. It pairs these two pieces of information together and decides where to route a user by returning the appropriate DNS response. In this instance, we tell Azure Traffic Manager to route based upon the "Performance" routing method, which redirects each user to the region with the lowest latency for them. However, review the other routing methods to determine which one matches your needs best.


You set up endpoints in Azure Traffic Manager to tell it where it can route traffic.


Azure Traffic Manager will monitor those two websites to ensure they are up.

You can use dnschecker.org to resolve the DNS name from servers around the world to test that your Azure Traffic Manager routing method is working as expected.

[Screenshot: dnschecker.org resolving gregazureas.trafficmanager.net from servers around the world]

You can see that users in Malaysia, Japan, and China would get routed to the Japan East server while other users would end up in the South Central US server.

Azure Traffic Manager configuration is so easy I literally set it up from a web browser on my phone while I was at a dead stop in traffic.


Setting up the Web Apps

How were the web apps from above set up? In this instance, the web apps are very simple. Create two web apps. In order to integrate easily with Azure Traffic Manager, you need to choose at least the S1 Standard tier, as lower tiers don't support Azure Traffic Manager.


Next, edit the website files via a Visual Studio project or other methods such as the in-browser App Service Editor which I will show below.

[Screenshot: the in-browser App Service Editor]

 

Unless you have more advanced needs, the code can be very simple. The Japan East web app should return the string asazure://japaneast.asazure.windows.net/<yourservername> while the South Central US web app should return asazure://southcentralus.asazure.windows.net/<yourservername>.

 

The URL https://<yourwebsite>.azurewebsites.net will work over SSL properly, but if you try to connect via https://gregazureas.trafficmanager.net then you will get an SSL certificate error since the SSL cert doesn't match the URL. You need to obtain an SSL certificate for your trafficmanager.net URL (or your vanity URL) and apply it to the website. Since this was just a demo and not a production environment, I used Let's Encrypt to create a free SSL cert. Why do we need SSL? The link:// syntax uses HTTPS under the covers and will not work if there's an invalid certificate.


What if I’m using Analysis Services on a VM?

Connecting to Azure Analysis Services is a little different because it connects over HTTPS and supports the link:// syntax. So how do you accomplish the same solution with Analysis Services on a VM? If you have already enabled msmdpump.dll for HTTP connectivity to Analysis Services, then just point Traffic Manager at those URLs. Assuming you haven't enabled HTTP connectivity or don't want to, you can eliminate the web apps entirely, point Azure Traffic Manager at the public IP of the SSAS server, and have it monitor TCP port 2383.


 

What if I Want To Load Balance Across On-Premises SSAS and Azure Analysis Services?

Let's say you have an on-premises SSAS Tabular installation which is heavily used by users in your corporate office. But what if you need to burst to the cloud during busy month-end reporting? Or what if you need to put Azure Analysis Services in Japan for Japanese users but allow your corporate users to continue using their on-premises SSAS server? Well, I suspect this isn't a supported scenario since the versions of the SSAS engine on-premises vs. in Azure Analysis Services will be slightly different. But supportability doesn't quash my curiosity (even if it's not wise).

If Excel users are your target audience, then have the website return a dynamic ODC file that points to either the on-premises server or the cloud azureas:// server name depending on the location of the user.

As for other client tools, I haven’t figured out an elegant way. Maybe you will!


Conclusion

I have personally experienced on-premises global projects where users that were near the corporate office experienced great SSAS performance while users in Europe or Asia experienced annoyingly sluggish performance due to the network latency and bandwidth to get to US servers. This solution is a perfect fit for such global rollouts of Azure Analysis Services.

 

Optimizing Large Distinct Counts in Analysis Services Tabular

The most glaring hole in the Tabular Performance Guide is information about tuning the performance of distinct counts. In fact I don’t believe distinct counts are even mentioned once in the whole whitepaper. (Since I helped author that whitepaper, visualize me hanging my head in shame and hoping this blog post will serve as my redemption.)

Thankfully others have picked up the slack and have published some great research on optimizing distinct count performance. Alex Whittles did some tests about the impact of DefaultSegmentRowCount on distinct counts. Alberto Ferrari wrote a nice whitepaper all about different approaches to distinct counts and understanding distinct count query plans and he also wrote a nice case study about InContact which suggested if you never need to report across business units you can separate business units into multiple SSAS databases to dramatically improve distinct count query performance. Marco Russo reminded us of an important performance improvement for large distinct counts that came out in SQL2012 SP1 CU9 and SQL2014 RTM CU1.


The Best Way to Optimize Distinct Counts

However, the best way to optimize a distinct count is to change it into a regular count. This blog post documents one way of doing this when there are many rows where the value you are distinct counting appears only once in the table. For example, in the following table we are distinct counting CustomerKey. Customers 99 and 88 each have only one order:

OrderKey | CustomerKey
-------- | -----------
1111     | 99
2222     | 88
3333     | 77
4444     | 77
5555     | 66
6666     | 66

So we could transform this table to the following:

OrderKey | CustomerKey | CustomerKeyMulti | CustomerKeySingle
-------- | ----------- | ---------------- | -----------------
1111     | 99          |                  | 1
2222     | 88          |                  | 1
3333     | 77          | 77               |
4444     | 77          | 77               |
5555     | 66          | 66               |
6666     | 66          | 66               |

Now instead of DISTINCTCOUNT(FactOrder[CustomerKey]) we can improve performance by doing:

DISTINCTCOUNT(FactOrder[CustomerKeyMulti])
+ SUM(FactOrder[CustomerKeySingle])

That calculation will actually be 1 too high because a distinct count on a column containing blanks counts the blank as a value. So to fix that bug we need to make the calculation slightly more complex:

DISTINCTCOUNT(FactOrder[CustomerKeyMulti])
+ IF(
   SUM(FactOrder[CustomerKeySingle])>=1,
   SUM(FactOrder[CustomerKeySingle])-1
  )

How much faster is this optimized version than a plain distinct count? The following tests were done on a 200 million row table on my laptop, which has an Intel i7-4702HQ processor. I did some similar tests on Azure D-series and G-series virtual machines and saw relatively similar numbers. As you can see, if 195 million rows are customers who only placed one order, then an 18 second distinct count becomes half a second once optimized. As a smaller percentage of rows are customers who only placed one order, the improvement after optimization shrinks to the point that it is not worth the data preparation effort.

As with most things in Tabular, column cardinality largely determines performance. Drilling into these numbers in more detail, look at the first row, which took 18 seconds to run a distinct count across 200 million rows. Optimized, it took 0.5 seconds, because the optimized version only had to distinct count 5 million values (200M - 195M = 5M distinct values in the CustomerKeyMulti column).

Rows with Unique Distinct Count Value | CustomerKey Cardinality | Regular Query Time (ms) | CustomerKeyMulti Cardinality | Optimized Query Time (ms)
------------------------------------- | ----------------------- | ----------------------- | ---------------------------- | -------------------------
195M                                  | 200M                    | 18,480                  | 5M                           | 500
160M                                  | 180M                    | 17,093                  | 20M                          | 1,858
120M                                  | 160M                    | 13,003                  | 40M                          | 3,528
80M                                   | 140M                    | 12,134                  | 60M                          | 5,620
40M                                   | 120M                    | 11,632                  | 80M                          | 6,789
10M                                   | 105M                    | 10,939                  | 95M                          | 9,523


Data Preparation for this Optimization

The ETL to calculate these two new columns is fairly expensive, but depending on your data, the resulting Analysis Services query performance improvement may be worth it. The most straightforward approach to building the Multi and Single columns is:

select OrderKey
,CustomerKey
-- keep the key only for customers with more than one order
,CustomerKeyMulti = case when CustomerKeyCount > 1 then CustomerKey end
-- flag customers with exactly one order so they can be summed
,CustomerKeySingle = case when CustomerKeyCount = 1 then 1 end
from (
 select OrderKey
 ,CustomerKey
 ,CustomerKeyCount = count(*) over (PARTITION BY CustomerKey)
 from FactOrder
) f

You could also split your single fact table into two fact tables to potentially optimize it even further. I briefly tested this and at least on my particular dataset this optimization didn’t make a significant difference, but your mileage may vary. If you don’t have any degenerate dimension columns, you can split the one optimized table into:

OrderKey | CustomerKey | CustomerKeyMulti
-------- | ----------- | ----------------
3333     | 77          | 77
4444     | 77          | 77
5555     | 66          | 66
6666     | 66          | 66

OrderKey | CustomerKey | CustomerKeySingle
-------- | ----------- | -----------------
1111     | 99          | 1
2222     | 88          | 1
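If you try the two-table split just shown, a hedged T-SQL sketch of the ETL (building on the query above, with the same assumed FactOrder table) might be:

-- Table 1: customers with more than one order (keep the key for DISTINCTCOUNT)
select OrderKey, CustomerKey, CustomerKeyMulti = CustomerKey
from (
 select OrderKey, CustomerKey
 ,CustomerKeyCount = count(*) over (PARTITION BY CustomerKey)
 from FactOrder
) f
where CustomerKeyCount > 1

-- Table 2: customers with exactly one order (just a 1 to SUM)
select OrderKey, CustomerKey, CustomerKeySingle = 1
from (
 select OrderKey, CustomerKey
 ,CustomerKeyCount = count(*) over (PARTITION BY CustomerKey)
 from FactOrder
) f
where CustomerKeyCount = 1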

 

Conclusion

Why perform an expensive distinct count if it’s not needed? If you can transform even a portion of your distinct count into a regular count, your query performance gain inside Analysis Services Tabular may be tremendous.

Setting up a Demo of the Power BI Analysis Services Connector

Note: This post is written as of December 2014 and represents functionality at this time.

Shameless Plug for Spatula City

I fear those who have never seen UHF and its famous Spatula City commercial will worry about my mental health. But compared to Adventure Works or Contoso, I feel that Spatula City makes a legitimate demo company. So let's say I wanted to set up a demo of the new Power BI Analysis Services Connector for less than the cost of a spatula… how could I do that? I don't want to purchase the spatulacity.com domain (because some loser already beat me to it and because that would cost more than a spatula) and I don't want to bother federating a domain to Azure. Here's a trick for just such a scenario. First, a little history lesson on Power BI.

 

Old Power BI = Self Service Cloud BI

Prior to this month, Power BI could be neatly summarized as Microsoft’s self-service cloud BI offering. You could share online whatever data you could fit inside a single Excel workbook up to 250MB (which with columnar compression may be a couple GB of data). You could refresh daily against on prem sources using the Data Management Gateway. Your subject matter expert could share their Power Queries with the rest of the company enabling others to do self-service mashups. You could share your workbook with others via SharePoint Online granting them the ability to read all the data in the Excel workbook. You could even programmatically generate many clones of a workbook, each with a different subset of data, each shared with a different subset of users.

 

New Power BI = Enterprise Ready Cloud BI

However, this month the Power BI Public Preview released a new connector for Analysis Services which changes the game completely by enabling a great hybrid cloud scenario. Here's how it works in a real enterprise (i.e., not Spatula City). You leverage your existing Analysis Services Tabular model, which has role-based security, and leave it on premises. Then you connect to it through the Power BI site. For that connection to work you must install the Analysis Services Connector and configure it to connect to your Analysis Services instance using myawesomecompany\ssasadmin (a service account which is an Analysis Services admin). Your IT department has federated your myawesomecompany domain to Azure, so you can log in to https://app.powerbi.com as bob@myawesomecompany.com and open a report connected live to a 200GB Tabular model (much more than the 250MB Excel file limit mentioned above). Per the role-based security in the Tabular model, Bob is only allowed to see sales from his territory. Under the covers, the Analysis Services Connector opens a connection as myawesomecompany\ssasadmin and adds EffectiveUserName=bob@myawesomecompany.com to the connection string. Since the SSAS server is joined to the myawesomecompany domain (or to a domain that trusts myawesomecompany), it impersonates Bob and he only sees data from his territory in Power BI. The model stays on premises, and live DAX queries are run against it, so only the summarized query results displayed on the screen are transferred to the cloud.

 

An Error You May Hit With the Analysis Services Connector in a Demo Environment

All of that works great in real enterprise deployments of Power BI where you have federated your domain to Azure. However, for my Spatula City demo, I don't want to buy the spatulacity.com domain or federate my demo domain with Azure. I want to do a quick proof of concept, so I will spin up a Power BI tenant called spatulacity.onmicrosoft.com. That's Azure Active Directory; it's not an Active Directory domain you can join your virtual machine to directly. (Update: Now you can join your Azure VM to Azure Active Directory Domain Services. I describe how that works with Power BI here.) So when you log in to Power BI as fred@spatulacity.onmicrosoft.com and try to connect to Analysis Services, you will get an error since Analysis Services can't impersonate fred@spatulacity.onmicrosoft.com. If your SSAS server is joined to a domain, the error you see in SQL Server Profiler connected to SSAS will read "The following system error occurred: The user name or password is incorrect." but if your SSAS server isn't on a domain, then your error may read "The following system error occurred: The name provided is not a properly formed account name."

 

How to Setup a Demo Environment for the Analysis Services Connector

Here's my setup and the trick I used to make this work in a demo environment. I created a virtual network in Azure and added two virtual machines. The first VM I added was a small VM to serve as a domain controller for a new Active Directory domain called gregazuredomain.local. I added a second VM as an Analysis Services server and joined it to the domain. I installed the Power BI Analysis Services Connector. Next, I needed to set up my users.

Creating users like fred@gregazuredomain.local won't help me. I need to set up a fred@spatulacity.onmicrosoft.com user in my Active Directory domain so that I can impersonate that user in Analysis Services. (To reiterate, in a real production scenario, I would just federate my domain with Azure Active Directory and log onto Power BI as fred@spatulacity.com, but I'm not doing a real production deployment, just a demo.) So I need to fake out Active Directory so it lets me create fred@spatulacity.onmicrosoft.com as a user. I open up the Active Directory Domains and Trusts app, right-click on the top node, and choose Properties. Then I add spatulacity.onmicrosoft.com as an alternate UPN suffix:

 

Now when I go into the Active Directory Administrative Center to add a new user, I can create a user with the desired UPN (userPrincipalName) fred@spatulacity.onmicrosoft.com. It's not actually the same account I used on the Power BI site; it just has the same UPN. The password I enter doesn't matter and doesn't necessarily have to match the password used when signing into Power BI. Power BI will connect to Analysis Services with the service account you set up when you installed the connector (an Analysis Services admin) and then do EffectiveUserName impersonation. EffectiveUserName impersonation doesn't require knowing the user's password; it just requires being an Analysis Services admin.

Now I can setup role-based security in Analysis Services to grant access to Fred:

Now I can login to my dashboard and see how Spatula City is revolutionizing the kitchen accessories industry:
 

Under the covers, you can see in Profiler connected to SSAS that it’s setting the EffectiveUserName property:

In addition to Power BI using EffectiveUserName, I have blogged in the past about EffectiveUserName and Reporting Services. EffectiveUserName is also built-in to SharePoint 2013 Excel Services and PerformancePoint. If it’s not obvious, I’m a fan because you can avoid the hassle of Kerberos setup for common on prem and cloud BI scenarios.

To summarize, editing the UPN of a user in a demo Active Directory domain to match your Power BI login is a quick way of enabling a Power BI Analysis Services Connector demo environment. Just make sure that your demo covers a topic more interesting than spatula sales.