SSAS

Sending Users to the Geographically Nearest Azure Analysis Services

image001.png

I’ve been preparing my Automating Azure Analysis Services talk for the May 19, 2018 SQL Saturday in Dallas. The session abstract is:

The elasticity of the cloud lets you provide a better user experience instead of breaking the bank running your solution at the highest scale 24 hours a day. This demo-heavy session will show how to automate Azure Analysis Services. We will look at solutions to the following questions: How do you schedule a refresh of the data in your Azure Analysis Services model? How do you pause the server on the weekends? Better yet, how can you pause it when the last user disconnects and start it automatically when the first user connects in the morning? How do you scale to meet peak demand each day? How can Azure Analysis Services load balance across servers in multiple countries so that the user connects to the geographically closest server? How do you extend your on-premises SSAS solution into Azure Analysis Services?

This blog post is about one part of that abstract: load balancing across servers in multiple countries so that the user connects to the geographically closest server. In other words, it is about users from a multinational corporation connecting automatically to the geographically nearest Azure Analysis Services instance. Why? Performance. High availability. Cost optimization. Even data security involving national borders.

 

The Ingredients

Most Azure solutions are a bit like a soup pot where you toss in various Azure technologies as ingredients to create something greater than the sum of their parts. In this instance, we’ll mince up some Azure Traffic Manager, some Azure Web Apps, and some Azure Analysis Services to create a pretty tasty solution.

image002.png

 

How Users Connect

In the diagram above, we have some worldwide users. Everyone will connect to Azure Analysis Services via the server name they were provided: link://gregazureas.trafficmanager.net/. That server name could easily have been a vanity URL such as link://azureas.artisconsulting.com/ using custom DNS. This uses the server aliases feature of Azure Analysis Services and the latest client drivers. (Notice Christian Wade mentioned the idea of geographic redirection in that article!)

image003.png

When a user in North America connects, Azure Traffic Manager sends them to the Azure Analysis Services instance in South Central US because it determined that was geographically closest. But when a user in Asia connects, it determines that the Japan East instance of Azure AS is closest. This all happens automatically and transparently to the user. And when the Japan Azure AS instance and Japan web app are paused for the night, any Japanese users who are working late will end up connecting to the South Central US instance. So this solution can also add an extra layer of high availability or cost optimization.


How does this work?

Azure Traffic Manager is just a DNS-level service which understands user geography based upon IP addresses and which understands the health of your backend services. It pairs those two pieces of information and decides where to route a user by returning the DNS name of the appropriate endpoint. In this instance, we tell Azure Traffic Manager to use the “Performance” routing method, which directs each user to the geo with the lowest network latency for them. However, review the other routing methods to determine which one best matches your needs.

image004.png

You set up endpoints in Azure Traffic Manager to tell it where it can route traffic.

image005.png

Azure Traffic Manager will monitor those two websites to ensure they are up.

You can use dnschecker.org to resolve the DNS name from servers around the world and verify that your Azure Traffic Manager routing method is working as expected.
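If you prefer to script that check, here is a minimal sketch using the dnspython package (my assumption for illustration; the resolver IPs and the gregazureas name are just examples). Traffic Manager answers based on the location of the resolver that asks, so querying resolvers in different regions approximates what users in those regions would get back.

import dns.resolver

# Public resolvers to ask; swap in resolvers located in the regions you care about.
RESOLVERS = {
    "Google (8.8.8.8)": "8.8.8.8",
    "Cloudflare (1.1.1.1)": "1.1.1.1",
}

for label, ip in RESOLVERS.items():
    resolver = dns.resolver.Resolver(configure=False)
    resolver.nameservers = [ip]
    # Traffic Manager returns a CNAME pointing at whichever endpoint it chose.
    answer = resolver.resolve("gregazureas.trafficmanager.net", "CNAME")
    for record in answer:
        print(f"{label} -> {record.target}")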

image006.png

You can see that users in Malaysia, Japan, and China would get routed to the Japan East server while other users would end up in the South Central US server.

Azure Traffic Manager configuration is so easy I literally set it up from a web browser on my phone while I was at a dead stop in traffic.


Setting up the Web Apps

How were the web apps from above set up? In this instance, the web apps are very simple. Create two web apps. To integrate easily with Azure Traffic Manager, choose at least the S1 Standard tier, as lower tiers don’t support Azure Traffic Manager.

image007.png

Next, edit the website files via a Visual Studio project or other methods such as the in-browser App Service Editor which I will show below.

image008.png

 

Unless you have more advanced needs, the code can be very simple. The Japan East web app should return the string asazure://japaneast.asazure.windows.net/<yourservername> while the South Central US web app should return asazure://southcentralus.asazure.windows.net/<yourservername>.
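The web app really can be that trivial. Here is a minimal sketch in Python with Flask (Flask is my choice for illustration; the region and server names are placeholders you would set per deployment). Each regional web app simply returns its own region’s asazure:// server name:

from flask import Flask

app = Flask(__name__)

# Japan East deployment; the South Central US copy of this app would return
# asazure://southcentralus.asazure.windows.net/<yourservername> instead.
REGION_SERVER = "asazure://japaneast.asazure.windows.net/<yourservername>"

@app.route("/")
def server_name():
    # The link:// alias resolution just needs this string back over HTTPS.
    return REGION_SERVER

if __name__ == "__main__":
    app.run()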

 

The URL https://<yourwebsite>.azurewebsites.net will work properly over SSL, but if you try to connect via https://gregazureas.trafficmanager.net then you will get an SSL certificate error since the SSL cert doesn’t match the URL. You need to purchase an SSL certificate for that URL and apply it to the websites behind your trafficmanager.net URL (or your vanity URL). Since this was just a demo and not a production environment, I used Let’s Encrypt to create a free SSL cert. Why do we need SSL? The link:// syntax uses HTTPS under the covers and will not work if the certificate is invalid.


What if I’m using Analysis Services on a VM?

Connecting to Azure Analysis Services is a little different because it connects over HTTPS and supports the link:// syntax. So how do you accomplish the same solution with Analysis Services on a VM? If you have already enabled msmdpump.dll for HTTP connectivity to Analysis Services, then just point Traffic Manager at those URLs. If you haven’t enabled HTTP connectivity or don’t want to, you can eliminate the web apps entirely and point Azure Traffic Manager at the public IP of the SSAS server and have it monitor TCP port 2383.
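As a quick sanity check before wiring that up, you can confirm from a remote machine that the SSAS port is reachable, which is essentially the same TCP probe Traffic Manager performs. A minimal sketch (the host name below is a placeholder):

import socket

HOST = "my-ssas-vm.example.com"  # placeholder: public IP or DNS name of the SSAS VM
PORT = 2383                      # default Analysis Services port

try:
    # Traffic Manager's TCP monitoring just needs the connection to succeed.
    with socket.create_connection((HOST, PORT), timeout=5):
        print(f"{HOST}:{PORT} is reachable")
except OSError as error:
    print(f"{HOST}:{PORT} is NOT reachable: {error}")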

image010.png

 

What if I Want To Load Balance Across On-Premises SSAS and Azure Analysis Services?

Let’s say you have an on-premises SSAS Tabular installation which is heavily used by users in your corporate office. But what if you need to burst to the cloud during busy month-end reporting? Or what if you need to put Azure Analysis Services in Japan for Japanese users but allow your corporate users to continue using their on-premises SSAS server? I suspect this isn’t a supported scenario since the versions of the SSAS engine on-premises vs. in Azure Analysis Services will be slightly different. But supportability doesn’t quash my curiosity (even if it’s not wise).

If Excel users are your target audience, then have the website return a dynamic ODC file that points to either the on-premises server or the cloud asazure:// server name depending on the location of the user.
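Here is a rough sketch of that idea, again with Flask. The ODC markup below is abbreviated and written from memory, so treat it as a starting point rather than a complete file; the server name, catalog, and cube names are placeholders:

from flask import Flask, Response

app = Flask(__name__)

# Placeholder: each regional deployment sets its own server. The corporate
# deployment would use the on-premises server name instead of asazure://.
SERVER = "asazure://japaneast.asazure.windows.net/<yourservername>"

ODC_TEMPLATE = """<html xmlns:o="urn:schemas-microsoft-com:office:office"
      xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/x-ms-odc; charset=utf-8">
<xml id="msodc">
<odc:OfficeDataConnection xmlns:odc="urn:schemas-microsoft-com:office:odc">
 <odc:Connection odc:Type="OLEDB">
  <odc:ConnectionString>Provider=MSOLAP;Data Source={server};Initial Catalog=MyModel</odc:ConnectionString>
  <odc:CommandType>Cube</odc:CommandType>
  <odc:CommandText>Model</odc:CommandText>
 </odc:Connection>
</odc:OfficeDataConnection>
</xml>
</head>
</html>"""

@app.route("/connect.odc")
def connect_odc():
    # Excel opens the downloaded ODC and connects to whichever server it names.
    body = ODC_TEMPLATE.format(server=SERVER)
    return Response(
        body,
        mimetype="text/x-ms-odc",
        headers={"Content-Disposition": "attachment; filename=connect.odc"},
    )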

As for other client tools, I haven’t figured out an elegant way. Maybe you will!


Conclusion

I have personally experienced on-premises global projects where users near the corporate office got great SSAS performance while users in Europe or Asia suffered annoyingly sluggish performance due to the network latency and limited bandwidth between them and the US servers. This solution is a perfect fit for such global rollouts of Azure Analysis Services.

 

Optimizing Large Distinct Counts in Analysis Services Tabular

The most glaring hole in the Tabular Performance Guide is information about tuning the performance of distinct counts. In fact I don’t believe distinct counts are even mentioned once in the whole whitepaper. (Since I helped author that whitepaper, visualize me hanging my head in shame and hoping this blog post will serve as my redemption.)

Thankfully others have picked up the slack and have published some great research on optimizing distinct count performance. Alex Whittles did some tests about the impact of DefaultSegmentRowCount on distinct counts. Alberto Ferrari wrote a nice whitepaper all about different approaches to distinct counts and understanding distinct count query plans and he also wrote a nice case study about InContact which suggested if you never need to report across business units you can separate business units into multiple SSAS databases to dramatically improve distinct count query performance. Marco Russo reminded us of an important performance improvement for large distinct counts that came out in SQL2012 SP1 CU9 and SQL2014 RTM CU1.


The Best Way to Optimize Distinct Counts

However, the best way to optimize a distinct count is to change it to a regular count. This blog post documents one way of doing that when, for many rows, the value you are distinct counting appears only once in the table. For example, in the following table we are distinct counting CustomerKey. Customers 99 and 88 each have only one order:

OrderKey | CustomerKey
-------- | -----------
1111     | 99
2222     | 88
3333     | 77
4444     | 77
5555     | 66
6666     | 66

So we could transform this table to the following:

OrderKey | CustomerKey | CustomerKeyMulti | CustomerKeySingle
-------- | ----------- | ---------------- | -----------------
1111     | 99          |                  | 1
2222     | 88          |                  | 1
3333     | 77          | 77               |
4444     | 77          | 77               |
5555     | 66          | 66               |
6666     | 66          | 66               |

Now instead of DISTINCTCOUNT(FactOrder[CustomerKey]) we can improve performance by doing:

DISTINCTCOUNT(FactOrder[CustomerKeyMulti])
+ SUM(FactOrder[CustomerKeySingle])

That calculation will actually be 1 too high whenever CustomerKeyMulti contains blanks, because a distinct count on a column with blanks counts the blank as one of the distinct values. So to fix that bug we need to make the calculation slightly more complex:

DISTINCTCOUNT(FactOrder[CustomerKeyMulti])
+ IF(
   SUM(FactOrder[CustomerKeySingle])>=1,
   SUM(FactOrder[CustomerKeySingle])-1
  )
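To make the bookkeeping concrete, here is a small sketch (using pandas, which is purely my choice for illustration) that rebuilds the sample table above and checks that the optimized formula returns the same answer as a plain distinct count:

import pandas as pd

fact = pd.DataFrame({
    "OrderKey":    [1111, 2222, 3333, 4444, 5555, 6666],
    "CustomerKey": [  99,   88,   77,   77,   66,   66],
})

# Derive the helper columns the same way the ETL later in this post does.
order_count = fact.groupby("CustomerKey")["CustomerKey"].transform("size")
fact["CustomerKeyMulti"] = fact["CustomerKey"].where(order_count > 1)               # blank for one-order customers
fact["CustomerKeySingle"] = pd.Series(1, index=fact.index).where(order_count == 1)  # 1 for one-order customers

plain = fact["CustomerKey"].nunique()                            # 4 distinct customers

# DISTINCTCOUNT counts the blank as a distinct value, hence dropna=False here
# and the -1 correction in the measure above.
multi_distinct = fact["CustomerKeyMulti"].nunique(dropna=False)  # 3: {77, 66, blank}
single_sum = fact["CustomerKeySingle"].sum()                     # 2
optimized = multi_distinct + (single_sum - 1 if single_sum >= 1 else 0)

assert plain == optimized == 4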

How much faster is this optimized version than a plain distinct count? The following tests were done on a 200 million row table on my laptop, which has an Intel i7-4702HQ processor. I did similar tests on Azure D-series and G-series virtual machines and saw relatively similar numbers. As you can see, if 195 million rows belong to customers who placed only one order, then an 18-second distinct count becomes half a second once optimized. As a smaller percentage of rows belong to customers who placed only one order, the improvement after optimization shrinks to the point that the optimization is not worth the data preparation effort.

As with the performance of most things in Tabular, the cardinality of columns largely determines performance. Drilling into these numbers in more detail, study the first row, which took 18 seconds to run a distinct count across 200 million rows. Optimized, it took 0.5 seconds because the optimized query only had to distinct count 5 million values (200M - 195M = 5M distinct values in the CustomerKeyMulti column).

Rows with Unique Distinct Count Value | CustomerKey Cardinality | Regular Query Time (ms) | CustomerKeyMulti Cardinality | Optimized Query Time (ms)
------------------------------------- | ----------------------- | ----------------------- | ---------------------------- | -------------------------
195M                                  | 200M                    | 18,480                  | 5M                           | 500
160M                                  | 180M                    | 17,093                  | 20M                          | 1,858
120M                                  | 160M                    | 13,003                  | 40M                          | 3,528
80M                                   | 140M                    | 12,134                  | 60M                          | 5,620
40M                                   | 120M                    | 11,632                  | 80M                          | 6,789
10M                                   | 105M                    | 10,939                  | 95M                          | 9,523


Data Preparation for this Optimization

The ETL to calculate these two new columns is fairly expensive but depending on your data the resulting Analysis Services query performance enhancement may be worth it. The most straightforward approach to building the Multi and Single columns is:

select OrderKey
,CustomerKey
,CustomerKeyMulti = case when CustomerKeyCount > 1 then CustomerKey end
,CustomerKeySingle = case when CustomerKeyCount = 1 then 1 end
from (
 select OrderKey
 ,CustomerKey
 ,CustomerKeyCount = count(*) over (partition by CustomerKey)
 from FactOrder
) f

You could also split your single fact table into two fact tables to potentially optimize it even further. I briefly tested this and at least on my particular dataset this optimization didn’t make a significant difference, but your mileage may vary. If you don’t have any degenerate dimension columns, you can split the one optimized table into:

OrderKey | CustomerKey | CustomerKeyMulti
-------- | ----------- | ----------------
3333     | 77          | 77
4444     | 77          | 77
5555     | 66          | 66
6666     | 66          | 66

OrderKey | CustomerKey | CustomerKeySingle
-------- | ----------- | -----------------
1111     | 99          | 1
2222     | 88          | 1

 

Conclusion

Why perform an expensive distinct count if it’s not needed? If you can transform even a portion of your distinct count into a regular count, your query performance gain inside Analysis Services Tabular may be tremendous.

Setting up a Demo of the Power BI Analysis Services Connector

Note: This post is written as of December 2014 and represents functionality at this time.

Shameless Plug for Spatula City

I fear those who have never seen UHF and its famous Spatula City commercial will worry about my mental health. But compared to Adventure Works or Contoso, I feel that Spatula City makes a legitimate demo company. So let’s say I wanted to set up a demo of the new Power BI Analysis Services Connector for less than the cost of a spatula… how could I do that? I don’t want to purchase the spatulacity.com domain (because some loser already beat me to it and because that would cost more than a spatula) and I don’t want to bother federating a domain to Azure. Here’s a trick for just such a scenario. First, a little history lesson on Power BI.

 

Old Power BI = Self Service Cloud BI

Prior to this month, Power BI could be neatly summarized as Microsoft’s self-service cloud BI offering. You could share online whatever data you could fit inside a single Excel workbook up to 250MB (which with columnar compression may be a couple GB of data). You could refresh daily against on prem sources using the Data Management Gateway. Your subject matter expert could share their Power Queries with the rest of the company enabling others to do self-service mashups. You could share your workbook with others via SharePoint Online granting them the ability to read all the data in the Excel workbook. You could even programmatically generate many clones of a workbook, each with a different subset of data, each shared with a different subset of users.

 

New Power BI = Enterprise Ready Cloud BI

However, this month the Power BI Public Preview released a new connector for Analysis Services which changes the game completely by enabling a great hybrid cloud scenario. Here’s how it works in a real enterprise (i.e. not Spatula City). You leverage your existing Analysis Services Tabular model, which has role-based security, and leave it on premises. Then you connect to it through the Power BI site. For that connection to work you must install the Analysis Services Connector and configure it to connect to your Analysis Services instance using myawesomecompany\ssasadmin (a service account which is an Analysis Services admin). Your IT department has federated your myawesomecompany domain to Azure, so you can log in to https://app.powerbi.com as bob@myawesomecompany.com and open a report connected live to a 200GB Tabular model (much more than the 250MB Excel file limit mentioned above). Per the role-based security in the Tabular model, Bob is only allowed to see sales from his territory. Under the covers, the Analysis Services Connector opens a connection as myawesomecompany\ssasadmin and adds EffectiveUserName=bob@myawesomecompany.com to the connection string. Since the SSAS server is joined to the myawesomecompany domain (or to a domain that trusts myawesomecompany), it impersonates Bob and he only sees data from his territory in Power BI. The model stays on premises, and live DAX queries are run against your model, so only the summarized query results displayed on the screen are transferred to the cloud.
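Conceptually (this is just an illustration of the mechanism, not the connector’s actual code, and the server and database names are placeholders), the connection the connector opens on Bob’s behalf looks like the following. SSAS honors EffectiveUserName only because the connecting account is a server administrator:

# Illustration only: the shape of the connection string the connector builds.
# It authenticates as the admin service account and asks SSAS to evaluate
# role security as if the named user were the one connected.
def effective_username_connection_string(server: str, database: str, user_upn: str) -> str:
    return (
        f"Data Source={server};"
        f"Initial Catalog={database};"
        f"EffectiveUserName={user_upn}"  # honored only when the caller is an SSAS admin
    )

print(effective_username_connection_string(
    "myssasserver", "MyTabularModel", "bob@myawesomecompany.com"))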

 

An Error You May Hit With the Analysis Services Connector in a Demo Environment

All of that works great in real enterprise deployments of Power BI where you have federated your domain to Azure. However, for my Spatula City demo, I don’t want to buy the spatulacity.com domain or federate my demo domain with Azure. I want to do a quick proof of concept, so I will spin up a Power BI tenant called spatulacity.onmicrosoft.com. That’s Azure Active Directory; it’s not an Active Directory domain you can join your virtual machine to directly. (Update: now you can join your Azure VM to Azure Active Directory Domain Services. I describe how that works with Power BI here.) So when you log in to Power BI as fred@spatulacity.onmicrosoft.com and try to connect to Analysis Services, you will get an error since Analysis Services can’t impersonate fred@spatulacity.onmicrosoft.com. If your SSAS server is joined to a domain, the error you see in SQL Server Profiler connected to SSAS will read “The following system error occurred: The user name or password is incorrect.” If your SSAS server isn’t on a domain, the error may instead read “The following system error occurred: The name provided is not a properly formed account name.”

 

How to Setup a Demo Environment for the Analysis Services Connector

Here’s my setup and the trick I used to make this work in a demo environment. I created a virtual network in Azure and added two virtual machines. The first VM was a small VM serving as the domain controller for a new Active Directory domain called gregazuredomain.local. The second VM was an Analysis Services server, which I joined to the domain and on which I installed the Power BI Analysis Services Connector. Next I needed to set up my users.

Creating users like fred@gregazuredomain.local won’t help me. I need to set up a fred@spatulacity.onmicrosoft.com user in my Active Directory domain so that I can impersonate that user in Analysis Services. (To reiterate: in a real production scenario, I would just federate my domain with Azure Active Directory and log onto Power BI as fred@spatulacity.com, but I’m not doing a real production deployment, just a demo.) So I need to fake out Active Directory so it lets me create fred@spatulacity.onmicrosoft.com as a user. I open the Active Directory Domains and Trusts app, right-click the top node, and choose Properties. Then I add spatulacity.onmicrosoft.com as an alternate UPN suffix:

 

Now when I go into the Active Directory Administrative Center to add a new user, I can create a user with the desired UPN (userPrincipalName) fred@spatulacity.onmicrosoft.com. It’s not actually the same account I used in the Power BI site; it just has the same UPN. The password I enter doesn’t matter and doesn’t have to match the password used when signing into Power BI. Power BI connects to Analysis Services with the service account you set up when you installed the connector (an Analysis Services admin) and then does EffectiveUserName impersonation. EffectiveUserName impersonation doesn’t require knowing the user’s password; it just requires being an Analysis Services admin.

Now I can setup role-based security in Analysis Services to grant access to Fred:

Now I can login to my dashboard and see how Spatula City is revolutionizing the kitchen accessories industry:
 

Under the covers, you can see in Profiler connected to SSAS that it’s setting the EffectiveUserName property:

In addition to Power BI using EffectiveUserName, I have blogged in the past about EffectiveUserName and Reporting Services. EffectiveUserName is also built in to SharePoint 2013 Excel Services and PerformancePoint. If it’s not obvious, I’m a fan because you can avoid the hassle of Kerberos setup for common on-prem and cloud BI scenarios.

To summarize, editing the UPN of a user in a demo Active Directory domain to match your Power BI login is a quick way of enabling a Power BI Analysis Services Connector demo environment. Just make sure that your demo covers a topic more interesting than spatula sales.

Tabular Performance Guide Speaking Tour

Some of the most fun I’ve had this year was spent collaborating with some friends at Microsoft to publish an important guide on performance tuning Tabular models in Analysis Services. I am very proud to have helped co-author the paper which represents hundreds of hours of work by many people.

Performance Tuning of Tabular Models in SQL Server 2012 Analysis Services

http://aka.ms/ASTabPerf2012

The reaction from the community has been great (here and here for example). Chris Webb even noted that there is some content in the whitepaper relevant to Multidimensional models, too. Specifically, he pointed out the whitepaper’s note that SQL 2012 SP1 CU4 (though you should just skip to CU6, which is the latest at the moment) included an enhancement so that multi-select filters no longer prevent caching in most scenarios. While that is a huge deal for Multidimensional models, it’s also very helpful for Tabular models if you’re querying them in a client tool like Excel which runs MDX queries. For those who are optimizing Multidimensional models, I’d also point you to a summary of SSAS enhancements made in SSAS 2012 on page 144, a warning about very wide models and heap fragmentation on page 67, and a discussion of updated SSAS drivers and performance on page 69.


PASS Summit 2013

John Sirmon and I had the privilege of summarizing 147 pages of content into a 75 minute presentation at PASS Summit 2013 in Charlotte, NC recently. The room was pretty packed and there were some great questions afterward.

SQL Saturday #255 – Dallas

For the locals who weren’t able to make it to Charlotte, I’m going to be doing the talk in my hometown for SQL Saturday Dallas on November 2, 2013. Here’s the abstract:

Gain the knowledge, tips, and techniques you need to tune your Tabular models in SQL Server Analysis Services 2012. This session will begin by exploring the Tabular model architecture and explaining the factors that affect performance. We’ll then walk through tuning techniques and strategies specific to querying, processing, partitioning, and server configuration and demonstrate optimizing performance in some common scenarios. This session is based on the recently released white paper “Performance Tuning of Tabular Models in SQL Server 2012 Analysis Services.”


Tabular Performance Event

It’s hard to do justice to this content in an hour, so I am looking forward to speaking at an all-day event at our local Microsoft office in Irving, TX on November 12, 2013. If you are interested, please reach out to Bryan Smith.

N/A For Subtotals

Adventure Works bashing is quite a fun sport. It’s unfortunate Adventure Works makes such an easy target. In this post I want to pick on how the Adventure Works cube shows N/A for the Statistical Account subtotal.

Obviously it makes no sense to sum together Headcount and Units and Average Unit Price. Setting the Statistical Accounts subtotal to NA does make sense. Adventure Works uses the following code to do this:

// Assignment of constant for Statistical Accounts account member.
// This will allow parent member to be visible by default.
( [Account].[Accounts].&[95], [Measures].[Amount] ) = "NA" ;

Unfortunately, this is not the best way for several reasons.


Strings Break Aggregations

First, assigning a string will null out any subtotals and grand totals above where you assigned the string. This problem isn’t visible in Adventure Works since IsAggregatable is set to false on the [Account].[Accounts] hierarchy which removes the grand total. But if we do the following assignment on a hierarchy with a grand total, we see the problem:

( [Product].[Product Categories].[Category].[Accessories], [Measures].[Internet Sales Amount] ) = "NA";

      before assignment

       after assignment

You can see that when we assigned a string to the Accessories subtotal, it nulled out the grand total making it disappear. And null grand totals are a major usability problem, as I’ll explain in a second.


Preserving Sparsity

The second reason the Adventure Works NA approach is flawed is that it eliminates the sparsity in the cube for the Statistical Accounts subtotal. For example, if we put [Date].[Date] on columns, a bunch of dates with no data other than NA show up. This hurts performance tremendously and makes it more difficult to find meaningful data.

 

Typically, fixing this problem involves a conditional assignment which ensures empty cells stay empty:

( [Account].[Accounts].&[95], [Measures].[Amount] ) = IIf(IsEmpty([Measures].[Amount]),null,"NA");

But in Adventure Works, all the Statistical Accounts have a ~ unary operator, which means the Statistical Accounts subtotal is always null. So to preserve sparsity and fill in the Statistical Accounts subtotal with NA only where the statistical accounts have data, we would have to do something like:

( [Account].[Accounts].&[95], [Measures].[Amount] ) =
IIf(
    IsEmpty(
        Sum(
            [Account].[Accounts].&[95].Children,
            [Measures].[Amount]
        )
    ),
    null,
    "NA"
);


Strings Are Less Efficient

The third reason the Adventure Works NA approach is flawed is the degradation in performance due to the use of strings. In talking with the SSAS team, Marius Dumitru summarized this problem by saying, “strings are larger than other data types, harder to manipulate, introduce data structure indirections and memory copies/allocations.”

Mosha Pasumansky mentioned how strings and sparsity impact performance in the discussion of the Ratio calculation in this old blog post.


Null Grand Totals

Recently some co-workers were building a PivotTable and no matter what they did, they could never get anything more than a blank PivotTable. After a bit of head scratching, I realized what was happening was that the grand total was nulled out, so they were never even able to start the PivotTable properly.

To demonstrate this problem, let’s pretend we want to null out the grand total on the Product Categories hierarchy:

( [Product].[Product Categories].[All Products], [Measures].[Internet Sales Amount] ) = null;

Now if we start a PivotTable and put something other than the Product Categories hierarchy on rows, you’ll see this null grand total prevents us from starting the PivotTable since no rows even show up:

       before assignment

      after assignment

Incidentally, you can tell from the “this will allow parent member to be visible” comment in the code at the top of this blog post that the developers of Adventure Works had this very thing in mind. Since the ~ unary operator caused the Statistical Accounts subtotal to always be null, that subtotal would never appear, making it impossible to expand.

Of course, your users could check the “Show items with no data on rows” option under PivotTable Options… Display. But you can’t expect your users to know to do that. PivotTables are designed for top down analysis. And if the top is null, your PivotTable is not user friendly.


My Solution

So how do we avoid all three Adventure Works problems created by assigning strings? And how do we avoid the problem of null subtotals and null grand totals?

You will note Mosha Pasumansky’s approach in the blog post above was to use NULL and a four-part format string like '#.#;;;NA'. The problem with that approach is that it doesn’t work in Excel PivotTables. Excel retrieves the VALUE and the FORMAT_STRING cell properties then does the formatting itself. But the formatting code in Excel ignores the fourth part, unfortunately. (As a reminder, format strings are semicolon separated. The first part is for positive numbers, the second part is for negative numbers, the third part is for zero, and the fourth part is for nulls.) So our solution will have to limit itself to three-part format strings which work in Excel.

My approach of choice looks like the following:

scope( [Account].[Accounts].&[95], [Measures].[Amount] );
 this = IIf(IsEmpty(Sum([Account].[Accounts].&[95].Children, [Measures].[Amount])),null,0);
 Format_String(this) = ";;"; //format zeros as blank
end scope;

It assigns a zero to the cell if any statistical accounts have data in that slice; otherwise it assigns null. Then it uses a custom format string just for the Statistical Accounts subtotal. The format string is just two semicolons, a three-part format string that formats zero as blank (and since the cell only ever contains zero or null, the zero part is the only one that matters). Since the Format_String assignment is inside the scope statement, we’re only customizing the format string for the Statistical Accounts subtotal.

If you prefer NA, then you would use the following format string code inside the scope statement:

 Format_String(this) = ";;\N\A";

And as discussed above, typically IsEmpty(Sum([Account].[Accounts].&[95].Children, [Measures].[Amount])) would just be IsEmpty([Measures].[Amount]) but we had to use the more complex expression here because of the ~ unary operators in play.

This approach solves all the problems we were trying to avoid. It doesn’t null out the grand total. It preserves sparsity in the cube. It avoids using expensive strings. It avoids null subtotals and null grand totals. And it is usable in Excel.