Power BI Analysis Services Connector and Azure Active Directory Domain Services

Note: This post represents functionality in Power BI and Azure Active Directory as of October 2015.

Leveraging the new Azure Active Directory Domain Services feature with the Power BI Analysis Services Connector works! While it doesn’t provide any functionality on top of what you could already do with a full Active Directory domain controller in an Azure VM, it does simplify the setup. Pricing depends on your directory size, but if you have a small number of users/groups/servers, AAD Domain Services may be only $37/month, compared to about $134/month for a pair of A1 Azure VMs running highly available domain controllers.

Previously I’ve blogged about the Power BI Analysis Services Connector and how it uses Azure Active Directory (AAD) authentication and EffectiveUserName. In that post I described a UPN suffix trick to set up a demo environment. One bit of information from it is now out of date:

“I want to do a quick proof-of-concept, so I will spin up a Power BI tenant called spatulacity.onmicrosoft.com. That’s Azure Active Directory; that’s not an Active Directory domain you can join your virtual machine to directly.”

The AAD team has been busy and has released a new feature called AAD Domain Services. When you turn this feature on, you can join an Azure VM directly to a domain-controller-as-a-service. You don’t have to build your own VM to serve as a domain controller. In this blog post I show that it works with the Power BI Analysis Services Connector. I also discuss a reference architecture for using the Power BI Analysis Services Connector in an extranet scenario.

High Level Logical Architecture

You log into Power BI with Azure Active Directory authentication. Period. Inside Power BI, one type of data source is live connectivity to Analysis Services through the Analysis Services connector. That connector must be installed on a domain-joined server. If you’re doing this in Azure and don’t already have domain controller VMs built out, then AAD Domain Services is a perfect fit. You put your Analysis Services VM in the virtual network that is associated with AAD Domain Services and then domain join it.

To drill into a bit more detail on the setup of AAD Domain Services, in the old Azure portal:

  • Configure the directory and create an “AAD DC Administrators” group

  • Enable Domain Services

  • Add the Domain Services IP addresses as DNS servers on your virtual network

  • Domain join your Azure VM

The full instructions can be found here.

 


AAD Domain Services and Power BI Extranets

AAD can have internal users and external users. An internal user would be joe@spatulacity.onmicrosoft.com. An external user would be a user from a different AAD. For example, if a supplier of Spatula City is PlasticParts.com, then an external user might be zane@plasticparts.onmicrosoft.com or zane@plasticparts.com (depending on their AAD setup). We could add Zane to the Spatula City AAD and grant him access to resources.

But how do you share Power BI reports with external users? (This is the top requested Power BI feature at the moment.) If you build a report and try to share it with an external user in Power BI, you will get an error saying you can’t share outside of your organization.

So sharing a dashboard itself with an external user isn’t supported in Power BI at the moment. One solution would be for Spatula City to create a public content pack that would appear in the Get Data… Services screen next to the likes of the Google Analytics and Salesforce content packs. But content packs currently use periodic refresh, and the data must compress down to 250MB or less. So content packs aren’t a good fit at the moment if you need live data or need to allow external reporting on terabytes of data.

The following reference architecture for external users to access Analysis Services in Power BI can be successful (with some caveats):


VM #1 has the Analysis Services models. It also has the Analysis Services connector that’s signed into the Spatula City Power BI tenant. It is domain joined to Spatula City AAD Domain Services.

VM #2 is just a small VM that has another Analysis Services connector but this one is signed into the PlasticParts.com Power BI tenant. The reason for the separate VM is that you can only have one Analysis Services Connector installed per VM. Spatula City IT will install the Analysis Services Connector and work together with Plastic Parts IT to get the connector signed into the Plastic Parts Power BI tenant. This VM is also domain joined to the same Spatula City AAD Domain Services.

When zane@plasticparts.com signs into Power BI (the Plastic Parts Power BI tenant) and starts a report connected live to Analysis Services, the Analysis Services connector on VM #2 opens a connection to Analysis Services on VM #1 and passes EffectiveUserName=zane@plasticparts.com on the connection string. And because zane@plasticparts.com is an external user that has been previously added to the Spatula City AAD, this all works.
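To make that mechanism concrete, here is a rough sketch of the kind of connection string the connector builds. The server, instance, and database names below are hypothetical, but EffectiveUserName is the actual connection string property being set:

Data Source=vm1\tabular; Initial Catalog=SpatulaSales; EffectiveUserName=zane@plasticparts.com

The connector authenticates with its own service account (which must be an Analysis Services admin), and Analysis Services then applies role-based security as if zane@plasticparts.com had connected directly.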

Though I describe this in an AAD Domain Services blog post, you could also make this reference architecture work with Active Directory VMs by using the alternate UPN suffix trick for those external users.


Pros and Cons of External Users and AAD Domain Services

Pro #1: Both Spatula City and Plastic Parts can control Zane’s access. If Zane leaves Plastic Parts, Plastic Parts can disable the user and he won’t have access to Power BI. Or if Spatula City stops working with Plastic Parts as a supplier, Spatula City can remove Plastic Parts users from its AAD to remove access.

Con #1: The main downside of the above scenario is that external users are a very limited fit with AAD Domain Services currently. For example, if there’s a joe@spatulacity.com and a joe@plasticparts.com, then currently only one of the two users will work inside AAD Domain Services. Why? Internally, both of these users get converted to GREGDS.LOCAL\joe, and only one of the two (non-deterministically) will work. (The same can actually happen with internal users if you have a couple of domains like joe@spatulacity.com and joe@spatulacitycanada.com.) Hopefully this is a limitation that will be fixed inside AAD Domain Services in the coming months.

Con #2: Prior to the B2B feature that came out in September 2015, external users were a pain to provision. You couldn’t add the first @plasticparts.com user to the @spatulacity.com AAD without adding a Microsoft account (e.g. a LiveID like spatulajoe@outlook.com) as an admin of both AADs, and there was no API or PowerShell way of adding external users. Luckily this recently changed with the new AAD B2B invitation feature. Upload a CSV of the external users’ email addresses, the users get an email invitation, and you’re done. No need to make a Microsoft account an admin of both AADs.

Con #3: This setup requires Spatula City IT to have a working session with Plastic Parts IT so that Plastic Parts can provide the login credentials needed to sign the second connector into the PlasticParts.com Power BI tenant.

Con #4: If you build out an Active Directory VM and an Analysis Services VM for demo purposes, you can stop both VMs after the demo is over so you only pay for VHD storage, not compute. AAD Domain Services can’t currently be paused, and disabling Domain Services deletes the domain, so you have to domain join your computers again from scratch once you enable Domain Services again.

Con #5: The user who creates the dashboard in the Spatula City Power BI tenant can’t just share it with an external PlasticParts.com user. However, he can email a Power BI Desktop file or an Excel workbook with Power View sheets to the PlasticParts.com user. If that emailed file is connected to Analysis Services, then when that user uploads it to the PlasticParts.com tenant, Power BI will connect the report to the Analysis Services connector in the PlasticParts.com tenant. Then the user can pin the appropriate tiles and it will all be connected live to Analysis Services.


Conclusions

In summary, AAD Domain Services works great with the Power BI Analysis Services Connector. AAD Domain Services doesn’t really offer any additional functionality over full Active Directory domain controller VMs, but it does simplify the setup. It’s a great fit for a Power BI demo environment or for an Azure-only business that wants to use Analysis Services inside Power BI. While external users work, there are serious limitations around username clashes that will hopefully be resolved in the future. Given the pent-up demand for sharing Power BI content with third parties, I decided that publishing this admittedly imperfect reference architecture was worthwhile.

Optimizing Large Distinct Counts in Analysis Services Tabular

The most glaring hole in the Tabular Performance Guide is the lack of information about tuning the performance of distinct counts. In fact, I don’t believe distinct counts are even mentioned once in the whole whitepaper. (Since I helped author that whitepaper, visualize me hanging my head in shame and hoping this blog post will serve as my redemption.)

Thankfully others have picked up the slack and have published some great research on optimizing distinct count performance. Alex Whittles did some tests on the impact of DefaultSegmentRowCount on distinct counts. Alberto Ferrari wrote a nice whitepaper about different approaches to distinct counts and understanding distinct count query plans. He also wrote a nice case study about InContact, which suggested that if you never need to report across business units, you can separate business units into multiple SSAS databases to dramatically improve distinct count query performance. Marco Russo reminded us of an important performance improvement for large distinct counts that came out in SQL 2012 SP1 CU9 and SQL 2014 RTM CU1.


The Best Way to Optimize Distinct Counts

However, the best way to optimize a distinct count is to change it to a regular count. This blog post documents one way of doing this when many rows contain a value that appears only once in the table for the column you are distinct counting. For example, in the following table we are distinct counting CustomerKey. Customers 99 and 88 each have only one order:

OrderKey    CustomerKey
1111        99
2222        88
3333        77
4444        77
5555        66
6666        66

So we could transform this table to the following:

OrderKey    CustomerKey    CustomerKeyMulti    CustomerKeySingle
1111        99             (blank)             1
2222        88             (blank)             1
3333        77             77                  (blank)
4444        77             77                  (blank)
5555        66             66                  (blank)
6666        66             66                  (blank)

Now instead of DISTINCTCOUNT(FactOrder[CustomerKey]) we can improve performance by doing:

DISTINCTCOUNT(FactOrder[CustomerKeyMulti])
+ SUM(FactOrder[CustomerKeySingle])

That calculation will actually be 1 too high because a distinct count on a column containing blanks (nulls) counts the blank as a distinct value. So to fix that bug we need to make the calculation slightly more complex:

DISTINCTCOUNT(FactOrder[CustomerKeyMulti])
+ IF(
   SUM(FactOrder[CustomerKeySingle])>=1,
   SUM(FactOrder[CustomerKeySingle])-1
  )
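As a quick sanity check against the six-row sample table above, here is the optimized calculation written as a standalone DAX query. This is just a sketch you could run in SSMS or DAX Studio, and it assumes the sample table is named FactOrder:

-- On the sample data: DISTINCTCOUNT(FactOrder[CustomerKeyMulti]) = 3 ({blank, 77, 66})
-- and SUM(FactOrder[CustomerKeySingle]) = 2 (orders 1111 and 2222),
-- so the naive formula returns 3 + 2 = 5, one too high (only 4 distinct customers exist),
-- while the IF below subtracts 1 and returns the correct 4.
EVALUATE
ROW (
    "Optimized Distinct Customers",
    DISTINCTCOUNT ( FactOrder[CustomerKeyMulti] )
        + IF (
            SUM ( FactOrder[CustomerKeySingle] ) >= 1,
            SUM ( FactOrder[CustomerKeySingle] ) - 1
        )
)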

How much faster is this optimized version than a plain distinct count? The following tests were done on a 200 million row table on my laptop, which has an Intel i7-4702HQ processor. I did some similar tests on Azure D-series and G-series virtual machines and saw relatively similar numbers. As you can see, if 195 million rows belong to customers who only placed one order, then an 18 second distinct count becomes half a second once optimized. As a smaller percentage of rows belong to customers who only placed one order, the improvement after optimization shrinks to the point that it is not worth the data preparation effort.

As with most things in Tabular, column cardinality largely determines performance. Drilling into these numbers in more detail, study the first row, where a distinct count across 200 million rows took 18 seconds. Optimized, it took 0.5 seconds because the optimized measure only had to distinct count 5 million values (200M distinct CustomerKey values minus 195M single-order customers = 5M distinct values in the CustomerKeyMulti column).

Rows with Unique Distinct Count Value   CustomerKey Cardinality   Regular Query Time (ms)   CustomerKeyMulti Cardinality   Optimized Query Time (ms)
195M                                    200M                      18,480                    5M                             500
160M                                    180M                      17,093                    20M                            1,858
120M                                    160M                      13,003                    40M                            3,528
80M                                     140M                      12,134                    60M                            5,620
40M                                     120M                      11,632                    80M                            6,789
10M                                     105M                      10,939                    95M                            9,523


Data Preparation for this Optimization

The ETL to calculate these two new columns is fairly expensive, but depending on your data, the resulting Analysis Services query performance enhancement may be worth it. The most straightforward approach to building the Multi and Single columns is:

select OrderKey
,CustomerKey
,CustomerKeyMulti = case when CustomerKeyCount > 1 then CustomerKey end
,CustomerKeySingle = case when CustomerKeyCount = 1 then 1 end
from (
 select OrderKey
 ,CustomerKey
 ,CustomerKeyCount = count(*) over (PARTITION BY CustomerKey)
 from FactOrder
) f
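If changing the ETL isn’t convenient, a rough alternative is to build the same two columns as DAX calculated columns in the model. This is only a sketch using the table and column names from the example above, and calculated columns on a very large fact table add processing time and memory, so test whether the query-time gain is worth it:

-- Counts the rows sharing the current row's CustomerKey, then keeps the key only for repeat customers
CustomerKeyMulti =
IF (
    CALCULATE ( COUNTROWS ( FactOrder ), ALLEXCEPT ( FactOrder, FactOrder[CustomerKey] ) ) > 1,
    FactOrder[CustomerKey]
)

-- Flags rows belonging to customers with exactly one order
CustomerKeySingle =
IF (
    CALCULATE ( COUNTROWS ( FactOrder ), ALLEXCEPT ( FactOrder, FactOrder[CustomerKey] ) ) = 1,
    1
)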

You could also split your single fact table into two fact tables to potentially optimize it even further. I briefly tested this, and at least on my particular dataset this optimization didn’t make a significant difference, but your mileage may vary. If you don’t have any degenerate dimension columns, you can split the one optimized table into the following two tables (a sketch of the corresponding measure follows them):

OrderKey    CustomerKey    CustomerKeyMulti
3333        77             77
4444        77             77
5555        66             66
6666        66             66

OrderKey    CustomerKey    CustomerKeySingle
1111        99             1
2222        88             1
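With the fact table split this way, the measure gets simpler: the Multi table has no blanks (so no -1 adjustment is needed), and every row of the Single table represents a distinct customer. Here is a sketch of what that measure might look like, assuming the two tables above are named FactOrderMulti and FactOrderSingle (hypothetical names) and both are related to the same dimensions:

Distinct Customers :=
DISTINCTCOUNT ( FactOrderMulti[CustomerKeyMulti] )
    + COUNTROWS ( FactOrderSingle )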

 

Conclusion

Why perform an expensive distinct count if it’s not needed? If you can transform even a portion of your distinct count into a regular count, your query performance gain inside Analysis Services Tabular may be tremendous.

Azure SQL Data Warehouse Announcement and Analytics Platform System Speaking Tour

I could not have picked a better time to do a speaking tour about APS (the Analytics Platform System, which is Microsoft’s data warehouse and big data appliance). On April 29 at the Build conference, Microsoft made an incredibly exciting announcement: Azure SQL Data Warehouse.

The video announcement from Build about Azure SQL Data Warehouse, a fully managed relational data warehouse-as-a-service

Artis has been doing a ton of work recently with the APS appliance, and I have really enjoyed boiling down some of the lessons learned into presentation form. Speaking on a technology like APS is incredibly rewarding when you see everyone's jaw drop at the query performance and the hardware stats.

SQL Saturday #396 - Dallas - BI Edition

I recently had the pleasure of presenting a session entitled “One is the Loneliest Number: Scaling out your Data Warehouse” to a local SQL Saturday crowd. The slides can be found here.

If your current data warehouse is overworked and underpaid, maybe it needs a friend to share the load. Do you want to be able to scan your 1TB fact table in 20 seconds? Do you want to maintain performance even if data volumes triple without rewriting any code? Do you want a data warehouse which extends to the cloud?

This demo-heavy session will be a deep dive into scaling out your data warehouse onto the Analytics Platform System appliance from Microsoft and querying it with familiar T-SQL. You will learn performance tuning best practices for taking advantage of a distributed data warehouse. See demos of hybrid connectivity to Azure Blob Storage through a technology called Polybase and learn how a modern data warehouse can handle structured data, semi-structured data, and streaming data.

In case you're wondering about the title, it's a Three Dog Night song, so of course we had to start the session by watching the rather silly video.

Three Dog Night's "One"

 

Adopting a Modern Hybrid Data Warehouse

In the next few weeks I will be speaking in Houston and Dallas:

Houston - May 14, 2015
Dallas - May 20, 2015

Do you want to be able to get results much faster from really large data sets? Are you and your team tired of waiting for slow reports to complete? Do you want to be able to scan your 1TB fact table in 20 seconds?

Join Microsoft and Artis Consulting to learn about the Modern Data Warehouse and how it can handle structured data, semi-structured data, and streaming data with the Analytics Platform System (APS) appliance and the recently announced Azure SQL Data Warehouse from Microsoft.  In this interactive lunch session, you will:

  • See demos of hybrid connectivity to Azure Blob Storage through a technology called Polybase

  • Learn how to scale out your data warehouse onto the APS and query it with familiar T-SQL

  • Learn the underpinnings of a distributed data warehouse which can scale to satisfy even the most demanding workloads

  • Hear a case study of a company that is revolutionizing their business and their industry with a modern hybrid data warehouse

Together we will answer the question of whether migrating to the APS appliance is the next step for your overworked data warehouse.

Setting up a Demo of the Power BI Analysis Services Connector

Note: This post is written as of December 2014 and represents functionality at this time.

Shameless Plug for Spatula City

I fear those who have never seen UHF and its famous Spatula City commercial will worry about my mental health. But compared to Adventure Works or Contoso, I feel that Spatula City makes a legitimate demo company. So let’s say I wanted to set up a demo of the new Power BI Analysis Services Connector for less than the cost of a spatula… how could I do that? I don’t want to purchase the spatulacity.com domain (because some loser already beat me to it and because that would cost more than a spatula), and I don’t want to bother federating a domain to Azure. Here’s a trick for just such a scenario. But first, a little history lesson on Power BI.

 

Old Power BI = Self Service Cloud BI

Prior to this month, Power BI could be neatly summarized as Microsoft’s self-service cloud BI offering. You could share online whatever data you could fit inside a single Excel workbook up to 250MB (which with columnar compression may be a couple GB of data). You could refresh daily against on prem sources using the Data Management Gateway. Your subject matter expert could share their Power Queries with the rest of the company enabling others to do self-service mashups. You could share your workbook with others via SharePoint Online granting them the ability to read all the data in the Excel workbook. You could even programmatically generate many clones of a workbook, each with a different subset of data, each shared with a different subset of users.

 

New Power BI = Enterprise Ready Cloud BI

However, this month the Power BI Public Preview released a new connector for Analysis Services which changes the game completely by enabling a great hybrid cloud scenario. Here’s how it works in a real enterprise (i.e. not Spatula City). You leverage your existing Analysis Services Tabular model, which has role-based security, and leave it on premises. Then you connect to it through the Power BI site. For that connection to work you must install the Analysis Services Connector and configure it to connect to your Analysis Services instance using a service account such as myawesomecompany\ssasadmin (which must be an Analysis Services admin). Your IT department has federated your myawesomecompany domain to Azure, so you can log in to https://app.powerbi.com as bob@myawesomecompany.com and open a report connected live to a 200GB Tabular model (much more than the 250MB Excel file limit mentioned above). Per the role-based security in the Tabular model, Bob is only allowed to see sales from his territory. Under the covers, the Analysis Services Connector opens a connection as myawesomecompany\ssasadmin and adds EffectiveUserName=bob@myawesomecompany.com to the connection string. Since the SSAS server is joined to the myawesomecompany domain (or to a domain that trusts myawesomecompany), Analysis Services impersonates Bob and he only sees data from his territory in Power BI. The model stays on premises and live DAX queries are run against your model, so only the summarized query results displayed on the screen are transferred to the cloud.

 

An Error You May Hit With the Analysis Services Connector in a Demo Environment

All of that works great in real enterprise deployments of Power BI where you have federated your domain to Azure. However, for my Spatula City demo, I don’t want to buy the spatulacity.com domain or federate my demo domain with Azure. I want to do a quick proof-of-concept, so I will spin up a Power BI tenant called spatulacity.onmicrosoft.com. That’s Azure Active Directory; that’s not an Active Directory domain you can join your virtual machine to directly. (Update: Now you can join your Azure VM to Azure Active Directory Domain Services. I describe how that works with Power BI here.) So when you login to Power BI as fred@spatulacity.onmicrosoft.com and try to connect to Analysis Services, you will get an error in Analysis Services since it can’t impersonate fred@spatulacity.onmicrosoft.com. If your SSAS server is joined to a domain the error you see in SQL Server Profiler connected to SSAS will read “The following system error occurred: The user name or password is incorrect.” but if your SSAS server isn’t on a domain, then your error may read “The following system error occurred:  The name provided is not a properly formed account name.”

 

How to Setup a Demo Environment for the Analysis Services Connector

Here’s my setup and the trick I used to make this work in a demo environment. I created a virtual network in Azure and added two virtual machines. The first VM I added was a small VM to serve as a domain controller for a new Active Directory domain called gregazuredomain.local. I added a second VM as an Analysis Services server and joined it to the domain. I installed the Power BI Analysis Services Connector. Next I needed to set up my users.

Creating users like fred@gregazuredomain.local won’t help me. I need to set up a fred@spatulacity.onmicrosoft.com user in my Active Directory domain so that I can impersonate that user in Analysis Services. (To reiterate, in a real production scenario, I would just federate my domain with Azure Active Directory and log onto Power BI as fred@spatulacity.com, but I’m not doing a real production deployment, just a demo.) So I need to fake out Active Directory so it lets me create fred@spatulacity.onmicrosoft.com as a user. I open the Active Directory Domains and Trusts app, right-click the top node, and choose Properties. Then I add spatulacity.onmicrosoft.com as an alternate UPN suffix:

 

Now when I go into the Active Directory Administrative Center to add a new user, I can create a user with the desired UPN (userPrincipalName) of fred@spatulacity.onmicrosoft.com. It’s not actually the same account I used in the Power BI site; it just has the same UPN. The password I enter doesn’t matter and doesn’t have to match the password used when signing into Power BI. Power BI will connect to Analysis Services with the service account you set up when you installed the connector (an Analysis Services admin) and then do EffectiveUserName impersonation. EffectiveUserName impersonation doesn’t require knowing the user’s password; it just requires being an Analysis Services admin.

Now I can setup role-based security in Analysis Services to grant access to Fred:

Now I can login to my dashboard and see how Spatula City is revolutionizing the kitchen accessories industry:
 

Under the covers, you can see in Profiler connected to SSAS that it’s setting the EffectiveUserName property:
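If you don’t have Profiler handy, another way to double-check which identity Analysis Services is impersonating is to run a small DAX query while connected to the instance as an Analysis Services admin with EffectiveUserName added to the connection (in SSMS, the Additional Connection Parameters tab works for this). This is just a quick diagnostic sketch:

-- Returns the identity Analysis Services is evaluating security for
EVALUATE
ROW ( "Effective User", USERNAME() )

It should return the impersonated Windows account rather than the admin account you connected with.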

In addition to Power BI using EffectiveUserName, I have blogged in the past about EffectiveUserName and Reporting Services. EffectiveUserName is also built into SharePoint 2013 Excel Services and PerformancePoint. If it’s not obvious, I’m a fan because you can avoid the hassle of Kerberos setup for common on-prem and cloud BI scenarios.

To summarize, editing the UPN of a user in a demo Active Directory domain to match your Power BI login is a quick way of enabling a Power BI Analysis Services Connector demo environment. Just make sure that your demo covers a topic more interesting than spatula sales.

Wait, Power Pivot! Don’t Refresh Yet! I’ve Got More Changes!

My Tuesday morning was an exercise in patience waiting on Excel 2013. I was working on a complex workbook with a complex Power Pivot model with half a dozen complex PivotTables. After every minor change in Power Pivot, all the reports refreshed. Hide a column… wait 2 minutes… change a calculation… wait 2 minutes. I finally got fed up with it, had a brain spark, and then my Tuesday afternoon was very productive.

Excel 2013 in most ways is a great leap over Excel 2010. Baking Power View directly into Excel was such a smart move for Microsoft to have made. Including Power Pivot in Excel in the form of the Data Model was clever because a user can leverage all that power without thinking about modeling or add-ins. However, in one way Excel 2013 was a major step backwards. Unlike Power Pivot in Excel 2010, which assumed the user was intelligent enough to make a batch of changes to the model and then manually refresh the PivotTable reports, Excel 2013 tries to make this process more automatic. With every minor change to the model, all PivotTable reports refresh. What’s worse is that if I want to move a calculated measure from one table to another, Excel 2013 decides that I want to remove the measure from all reports.

Fortunately, we can do something about that…

Let’s say you are about to make a large batch of Power Pivot changes and don’t want PivotTables to refresh after each small change.

Install the latest version of OLAP PivotTable Extensions. Then use the Disable Auto Refresh feature by right clicking on a PivotTable connected to your Power Pivot model:

That command will disable auto refresh on all PivotTables connected to your Power Pivot model.

Now proceed with your Power Pivot changes and none of the PivotTables will refresh after every change. What’s even better is that you can make major changes like moving a calculated measure from one table to another without that measure being deleted from all the PivotTables.

When your batch of changes is done, right click on a PivotTable connected to your model and Enable Auto Refresh:

On the topic of refresh, don’t forget that you can also speed up model changes by switching the calculation mode within Power Pivot (under the Design tab, in the Calculation Options dropdown) to Manual Calculation Mode and choosing Calculate Now only as necessary:

The real question now is this… what are you going to do with all that time you got back!?