Azure SQL Data Warehouse Announcement and Analytics Platform System Speaking Tour

I could not have picked a better time to do a speaking tour about the APS (Analytics Platform System, Microsoft's data warehouse and big data appliance). On April 29 at the Build conference, Microsoft made an incredibly exciting announcement: Azure SQL Data Warehouse.

The video announcement from Build about Azure SQL Data Warehouse, a fully managed relational data warehouse-as-a-service

Artis has been doing a ton of work recently with the APS appliance, and I have really enjoyed boiling down some of the lessons learned into presentation form. Speaking on a technology like APS is incredibly rewarding when you see everyone's jaw drop at the query performance and the hardware stats.

SQL Saturday #396 - Dallas - BI Edition

I recently had the pleasure of presenting a session entitled "One is the Loneliest Number: Scaling out your Data Warehouse" to a local SQL Saturday crowd. The slides can be found here.

If your current data warehouse is overworked and underpaid, maybe it needs a friend to share the load. Do you want to be able to scan your 1TB fact table in 20 seconds? Do you want to maintain performance even if data volumes triple without rewriting any code? Do you want a data warehouse which extends to the cloud?

This demo-heavy session will be a deep dive into scaling out your data warehouse onto the Analytics Platform System appliance from Microsoft and querying it with familiar T-SQL. You will learn performance tuning best practices for taking advantage of a distributed data warehouse. See demos of hybrid connectivity to Azure Blob Storage through a technology called Polybase and learn how a modern data warehouse can handle structured data, semi-structured data, and streaming data.

In case you're wondering about the title, it's a Three Dog Night song, so of course we had to start the session by watching the rather silly video.

Three Dog Night's "One"

 

Adopting a Modern Hybrid Data Warehouse

In the next few weeks I will be speaking in Houston and Dallas:

Houston - May 14, 2015
Dallas - May 20, 2015

Do you want to be able to get results much faster from really large data sets? Are you and your team tired of waiting for slow reports to complete? Do you want to be able to scan your 1TB fact table in 20 seconds?

Join Microsoft and Artis Consulting to learn about the Modern Data Warehouse and how it can handle structured data, semi-structured data, and streaming data with the Analytics Platform System (APS) appliance and the recently announced Azure SQL Data Warehouse from Microsoft.  In this interactive lunch session, you will:

  • See demos of hybrid connectivity to Azure Blob Storage through a technology called Polybase

  • Learn how to scale out your data warehouse onto the APS and query it with familiar T-SQL

  • Learn the underpinnings of a distributed data warehouse which can scale to satisfy even the most demanding workloads

  • Hear a case study of a company that is revolutionizing their business and their industry with a modern hybrid data warehouse

Together we will answer the question of whether migrating to the APS appliance is the next step for your overworked data warehouse.

Setting up a Demo of the Power BI Analysis Services Connector

Note: This post was written in December 2014 and describes the functionality available at that time.

Shameless Plug for Spatula City

I fear those who have never seen UHF and its famous Spatula City commercial will worry about my mental health. But compared to Adventure Works or Contoso, I feel that Spatula City makes a legitimate demo company. So let's say I wanted to set up a demo of the new Power BI Analysis Services Connector for less than the cost of a spatula… how could I do that? I don't want to purchase the spatulacity.com domain (because some loser already beat me to it and because that would cost more than a spatula) and I don't want to bother federating a domain to Azure. Here's a trick for just such a scenario. First, a little history lesson on Power BI.

 

Old Power BI = Self Service Cloud BI

Prior to this month, Power BI could be neatly summarized as Microsoft’s self-service cloud BI offering. You could share online whatever data you could fit inside a single Excel workbook up to 250MB (which with columnar compression may be a couple GB of data). You could refresh daily against on prem sources using the Data Management Gateway. Your subject matter expert could share their Power Queries with the rest of the company enabling others to do self-service mashups. You could share your workbook with others via SharePoint Online granting them the ability to read all the data in the Excel workbook. You could even programmatically generate many clones of a workbook, each with a different subset of data, each shared with a different subset of users.

 

New Power BI = Enterprise Ready Cloud BI

However, this month the Power BI Public Preview released a new connector for Analysis Services which changes the game completely by enabling a great hybrid cloud scenario. Here's how it works in a real enterprise (i.e. not Spatula City). You leverage your existing Analysis Services Tabular model, which has role-based security, and leave it on premises. Then you connect to it through the Power BI site. For that connection to work you must install the Analysis Services Connector and configure it to connect to your Analysis Services instance as myawesomecompany\ssasadmin (a service account which is an Analysis Services admin). Your IT department has federated your myawesomecompany domain to Azure, so you can log in to https://app.powerbi.com as bob@myawesomecompany.com and open a report connected live to a 200GB Tabular model (much more than the 250MB Excel file limit mentioned above). Per the role-based security in the Tabular model, Bob is only allowed to see sales from his territory. Under the covers, the Analysis Services Connector opens a connection as myawesomecompany\ssasadmin and adds EffectiveUserName=bob@myawesomecompany.com to the connection string. Since the SSAS server is joined to the myawesomecompany domain (or to a domain that trusts myawesomecompany), it impersonates Bob and he only sees data from his territory in Power BI. The model stays on premises, and live DAX queries are run against your model, so only the summarized query results displayed on the screen are transferred to the cloud.
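
To make this concrete, the connection the connector opens ends up looking roughly like the following (a sketch only; the server and database names are placeholders rather than values from a real deployment):

Provider=MSOLAP;Data Source=<your SSAS server>;Initial Catalog=<your Tabular database>;EffectiveUserName=bob@myawesomecompany.com

The connection itself authenticates as the ssasadmin service account; the EffectiveUserName property is what tells Analysis Services to evaluate the role-based security as if Bob were running the query.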

 

An Error You May Hit With the Analysis Services Connector in a Demo Environment

All of that works great in real enterprise deployments of Power BI where you have federated your domain to Azure. However, for my Spatula City demo, I don't want to buy the spatulacity.com domain or federate my demo domain with Azure. I want to do a quick proof-of-concept, so I will spin up a Power BI tenant called spatulacity.onmicrosoft.com. That's Azure Active Directory; that's not an Active Directory domain you can join your virtual machine to directly. So when you log in to Power BI as fred@spatulacity.onmicrosoft.com and try to connect to Analysis Services, you will get an error in Analysis Services since it can't impersonate fred@spatulacity.onmicrosoft.com. If your SSAS server is joined to a domain, the error you see in SQL Server Profiler connected to SSAS will read "The following system error occurred: The user name or password is incorrect." but if your SSAS server isn't on a domain, then your error may read "The following system error occurred: The name provided is not a properly formed account name."

 

How to Setup a Demo Environment for the Analysis Services Connector

Here's my setup and the trick I used to make this work in a demo environment. I created a virtual network in Azure and added two virtual machines. The first was a small VM to serve as a domain controller for a new Active Directory domain called gregazuredomain.local. The second was an Analysis Services server, which I joined to the domain. I installed the Power BI Analysis Services Connector. Next I needed to set up my users.

Creating users like fred@gregazuredomain.local won't help me. I need to set up a fred@spatulacity.onmicrosoft.com user in my Active Directory domain so that I can impersonate that user in Analysis Services. (To reiterate, in a real production scenario, I would just federate my domain with Azure Active Directory and log onto Power BI as fred@spatulacity.com, but I'm not doing a real production deployment, just a demo.) So I need to fake out Active Directory so it lets me create fred@spatulacity.onmicrosoft.com as a user. I open up the Active Directory Domains and Trusts app, right-click on the top node, and choose Properties. Then I add spatulacity.onmicrosoft.com as an alternate UPN suffix:

 

Now when I go into the Active Directory Administrative Center to add a new user, I can create a user with the desired UPN (userPrincipalName) fred@spatulacity.onmicrosoft.com. It's not actually the same account I used in the Power BI site; it just has the same UPN. The password I enter doesn't matter and doesn't have to match the password used when signing in to Power BI. Power BI will connect to Analysis Services with the service account you set up when you installed the connector (an Analysis Services admin) and then do EffectiveUserName impersonation. EffectiveUserName impersonation doesn't require knowing the user's password. It just requires being an Analysis Services admin.

Now I can set up role-based security in Analysis Services to grant access to Fred:

Now I can log in to my dashboard and see how Spatula City is revolutionizing the kitchen accessories industry:
 

Under the covers, you can see in Profiler connected to SSAS that it’s setting the EffectiveUserName property:

In addition to Power BI using EffectiveUserName, I have blogged in the past about EffectiveUserName and Reporting Services. EffectiveUserName is also built into SharePoint 2013 Excel Services and PerformancePoint. If it's not obvious, I'm a fan because you can avoid the hassle of Kerberos setup for common on-prem and cloud BI scenarios.

To summarize, editing the UPN of a user in a demo Active Directory domain to match your Power BI login is a quick way of enabling a Power BI Analysis Services Connector demo environment. Just make sure that your demo covers a topic more interesting than spatula sales.

Wait, Power Pivot! Don’t Refresh Yet! I’ve Got More Changes!

My Tuesday morning was an exercise in patience waiting on Excel 2013. I was working on a complex workbook with a complex Power Pivot model with half a dozen complex PivotTables. After every minor change in Power Pivot, all the reports refresh. Hide a column… wait 2 minutes… change a calculation… wait 2 minutes. I finally got fed up with it, had a brain spark, and then my Tuesday afternoon was very productive.

Excel 2013 in most ways is a great leap over Excel 2010. Baking Power View directly into Excel was such a smart move for Microsoft to have made. Including Power Pivot in Excel in the form of the Data Model was clever because a user can leverage all that power without thinking about modeling or add-ins. However, in one way Excel 2013 was a major step backwards. Unlike Power Pivot in Excel 2010, which assumed the user was intelligent enough to make a batch of changes to the model and then manually refresh the PivotTable reports, Excel 2013 tries to make this process more automatic. With every minor change to the model, all PivotTable reports refresh. What's worse is that if I want to move a calculated measure from one table to another, Excel 2013 decides that I want to remove the measure from all reports.

Fortunately, we can do something about that…

Let’s say you are about to make a large batch of Power Pivot changes and don’t want PivotTables to refresh after each small change.

Install the latest version of OLAP PivotTable Extensions. Then use the Disable Auto Refresh feature by right-clicking on a PivotTable connected to your Power Pivot model:

That command will disable auto refresh on all PivotTables connected to your Power Pivot model.

Now proceed with your Power Pivot changes and none of the PivotTables will refresh after every change. What’s even better is that you can make major changes like moving a calculated measure from one table to another without that measure being deleted from all the PivotTables.

When you are done with your changes, right-click on a PivotTable connected to your model and choose Enable Auto Refresh:

On the topic of refresh, don't forget that you can also speed up model changes by switching Power Pivot to Manual Calculation Mode (on the Design tab, under the Calculation Options dropdown) and choosing Calculate Now only as necessary:

The real question now is this… what are you going to do with all that time you got back!?

Tabular Performance Guide Speaking Tour

Some of the most fun I've had this year was spent collaborating with some friends at Microsoft to publish an important guide on performance tuning Tabular models in Analysis Services. I am very proud to have helped co-author the paper, which represents hundreds of hours of work by many people.

Performance Tuning of Tabular Models in SQL Server 2012 Analysis Services

http://aka.ms/ASTabPerf2012

The reaction from the community has been great (here and here for example). Chris Webb even noted that there is some content in the whitepaper relevant to Multidimensional models, too. Specifically, he pointed out our note in the whitepaper that SQL Server 2012 SP1 CU4 (though you should just skip to CU6, which is the latest at the moment) included an enhancement so that multi-select filters no longer prevent caching in most scenarios. While that is a huge deal for Multidimensional models, it's also very helpful for Tabular models if you're querying them in a client tool like Excel, which runs MDX queries. For those who are optimizing Multidimensional models, I'd also point you to a summary of enhancements made in SSAS 2012 on page 144, a warning about very wide models and heap fragmentation on page 67, and a discussion of updated SSAS drivers and performance on page 69.


PASS Summit 2013

John Sirmon and I had the privilege of summarizing 147 pages of content into a 75-minute presentation at PASS Summit 2013 in Charlotte, NC recently. The room was pretty packed and there were some great questions afterward.

SQL Saturday #255 – Dallas

For the locals who weren’t able to make it to Charlotte, I’m going to be doing the talk in my hometown for SQL Saturday Dallas on November 2, 2013. Here’s the abstract:

Gain the knowledge, tips, and techniques you need to tune your Tabular models in SQL Server Analysis Services 2012. This session will begin by exploring the Tabular model architecture and explaining the factors that affect performance. We’ll then walk through tuning techniques and strategies specific to querying, processing, partitioning, and server configuration and demonstrate optimizing performance in some common scenarios. This session is based on the recently released white paper “Performance Tuning of Tabular Models in SQL Server 2012 Analysis Services.”


Tabular Performance Event

It’s hard to do justice to this content in an hour, so I am looking forward to speaking at an all-day event at our local Microsoft office in Irving, TX on November 12, 2013. If you are interested, please reach out to Bryan Smith.

N/A For Subtotals

Adventure Works bashing is quite a fun sport. It's unfortunate Adventure Works makes such an easy target. In this post I want to pick on how the Adventure Works cube shows N/A for the Statistical Accounts subtotal.

Obviously it makes no sense to sum together Headcount and Units and Average Unit Price. Setting the Statistical Accounts subtotal to NA does make sense. Adventure Works uses the following code to do this:

// Assignment of constant for Statistical Accounts account member.
// This will allow parent member to be visible by default.
( [Account].[Accounts].&[95], [Measures].[Amount] ) = "NA" ;

Unfortunately, this is not the best way for several reasons.


Strings Break Aggregations

First, assigning a string will null out any subtotals and grand totals above where you assigned the string. This problem isn't visible in Adventure Works since IsAggregatable is set to false on the [Account].[Accounts] hierarchy, which removes the grand total. But if we do the following assignment on a hierarchy with a grand total, we see the problem:

( [Product].[Product Categories].[Category].[Accessories], [Measures].[Internet Sales Amount] ) = "NA";

      before assignment

       after assignment

You can see that when we assigned a string to the Accessories subtotal, it nulled out the grand total making it disappear. And null grand totals are a major usability problem, as I’ll explain in a second.


Preserving Sparsity

The second reason the Adventure Works NA approach is flawed is that it eliminates the sparsity in the cube for the Statistical Accounts subtotal. For example, if we put [Date].[Date] on columns, a bunch of dates with no data other than NA show up. This hurts performance tremendously and makes it more difficult to find meaningful data.

 

Typically, fixing this problem involves a conditional assignment which ensures empty cells stay empty:

( [Account].[Accounts].&[95], [Measures].[Amount] ) = IIf(IsEmpty([Measures].[Amount]),null,"NA");

But in Adventure Works, all the Statistical Accounts have a ~ unary operator, which means that the Statistical Accounts subtotal is always null. So to preserve sparsity and fill in the Statistical Accounts subtotal with NA only where any statistical accounts have data, we would have to do something like:

( [Account].[Accounts].&[95], [Measures].[Amount] ) =
    IIf(
        IsEmpty(
            Sum(
                [Account].[Accounts].&[95].Children,
                [Measures].[Amount]
            )
        ),
        null,
        "NA"
    );


Strings Are Less Efficient

The third reason the Adventure Works NA approach is flawed is the degradation in performance due to the use of strings. When I talked with the SSAS team about this, Marius Dumitru summarized the problem by saying, "strings are larger than other data types, harder to manipulate, introduce data structure indirections and memory copies/allocations."

Mosha Pasumansky mentioned how strings and sparsity impact performance in the discussion of the Ratio calculation in this old blog post.


Null Grand Totals

Recently some co-workers were building a PivotTable and no matter what they did, they could never get anything more than a blank PivotTable. After a bit of head scratching, I realized what was happening was that the grand total was nulled out, so they were never even able to start the PivotTable properly.

To demonstrate this problem, let’s pretend we want to null out the grand total on the Product Categories hierarchy:

( [Product].[Product Categories].[All Products], [Measures].[Internet Sales Amount] ) = null;

Now if we start a PivotTable and put something other than the Product Categories hierarchy on rows, you’ll see this null grand total prevents us from starting the PivotTable since no rows even show up:

       before assignment

      after assignment

Incidentally, you can tell from the "this will allow parent member to be visible" comment in the code at the top of this blog post that the developers of Adventure Works had this very thing in mind. Since the ~ unary operator caused the Statistical Accounts subtotal to always be null, it would never appear, making it impossible to expand.

Of course, your users could check the “Show items with no data on rows” option under PivotTable Options… Display. But you can’t expect your users to know to do that. PivotTables are designed for top down analysis. And if the top is null, your PivotTable is not user friendly.


My Solution

So how do we avoid all three Adventure Works problems created by assigning strings? And how do we avoid the problem of null subtotals and null grand totals?

You will note Mosha Pasumansky's approach in the blog post above was to use NULL and a four-part format string like '#.#;;;NA'. The problem with that approach is that it doesn't work in Excel PivotTables. Excel retrieves the VALUE and the FORMAT_STRING cell properties and then does the formatting itself. But the formatting code in Excel ignores the fourth part, unfortunately. (As a reminder, format strings are semicolon-separated. The first part is for positive numbers, the second part is for negative numbers, the third part is for zero, and the fourth part is for nulls.) So our solution will have to limit itself to three-part format strings which work in Excel.
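
For illustration, here is a minimal sketch of what that format-string-only approach would look like applied to the Statistical Accounts subtotal (my adaptation, not code from Mosha's post or from Adventure Works):

scope( [Account].[Accounts].&[95], [Measures].[Amount] );
 // Leave the cell value null and rely on the fourth part of the format string
 // (positive;negative;zero;null) to render nulls as NA.
 Format_String(this) = "#.#;;;NA";
end scope;

In clients that honor the fourth part, the null subtotal would display as NA, but an Excel PivotTable would still render it as blank, which is why we need a different approach.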

My approach of choice looks like the following:

scope( [Account].[Accounts].&[95], [Measures].[Amount] );
 this = IIf(IsEmpty(Sum([Account].[Accounts].&[95].Children, [Measures].[Amount])),null,0);
 Format_String(this) = ";;"; //format zeros as blank
end scope;

The assignment puts a zero in the cell if any statistical accounts have data in this slice; otherwise it assigns null. Then it uses a custom format string just for the Statistical Accounts subtotal. The format string is just two semicolons. That three-part format string says to format zeros as blank. Since the format string is inside the scope statement, we're only customizing the format string for the Statistical Accounts subtotal.

If you prefer NA, then you would use the following format string code inside the scope statement:

 Format_String(this) = ";;\N\A";

And as discussed above, typically IsEmpty(Sum([Account].[Accounts].&[95].Children, [Measures].[Amount])) would just be IsEmpty([Measures].[Amount]), but we had to use the more complex expression here because of the ~ unary operators in play.
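
For reference, in a model without the ~ unary operator complication the same pattern would typically simplify to something like this (a sketch reusing the same member purely for illustration; it would not behave correctly in Adventure Works itself because of the unary operators):

scope( [Account].[Accounts].&[95], [Measures].[Amount] );
 this = IIf(IsEmpty([Measures].[Amount]), null, 0); // keep empty cells empty
 Format_String(this) = ";;\N\A"; // format the zero as NA
end scope;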

This approach solves all the problems we were trying to avoid. It doesn’t null out the grand total. It preserves sparsity in the cube. It avoids using expensive strings. It avoids null subtotals and null grand totals. And it is usable in Excel.