Dec 19
Setting up a Demo of the Power BI Analysis Services Connector

Note: This post is written as of December 2014 and represents functionality at this time.

clip_image002
Shameless Plug for Spatula City

I fear those who have never seen UHF and its famous Spatula City commercial will worry about my mental health. But compared to Adventure Works or Contoso, I feel that Spatula City makes a legitimate demo company. So let’s say I wanted to set up a demo of the new Power BI Analysis Services Connector for less than the cost of a spatula… how could I do that? I don’t want to purchase the spatulacity.com domain (because some loser already beat me to it and because that would cost more than a spatula) and I don’t want to bother federating a domain to Azure. Here’s a trick for just such a scenario. First, a little history lesson on Power BI.

Old Power BI = Self Service Cloud BI

Prior to this month, Power BI could be neatly summarized as Microsoft’s self-service cloud BI offering. You could share online whatever data you could fit inside a single Excel workbook of up to 250MB (which, with columnar compression, may represent a couple GB of raw data). You could refresh daily against on-premises sources using the Data Management Gateway. Your subject matter expert could share their Power Queries with the rest of the company, enabling others to do self-service mashups. You could share your workbook with others via SharePoint Online, granting them the ability to read all the data in the Excel workbook. You could even programmatically generate many clones of a workbook, each with a different subset of data, each shared with a different subset of users.

New Power BI = Enterprise Ready Cloud BI

However, this month the Power BI Public Preview released a new connector for Analysis Services which changes the game completely by enabling a great hybrid cloud scenario. Here’s how it works in a real enterprise (i.e. not Spatula City). You leverage your existing Analysis Services Tabular model, which has role-based security, and leave it on premises. Then you connect to it through the Power BI site. For that connection to work you must install the Analysis Services Connector and configure it to connect to your Analysis Services instance using a service account such as myawesomecompany\ssasadmin (which is an Analysis Services admin). Your IT department has federated your myawesomecompany domain to Azure so you can log in to https://preview.powerbi.com as bob@myawesomecompany.com and open a report connected live to a 200GB Tabular model (much more than the 250MB Excel file limit mentioned above). Per the role-based security in the Tabular model, Bob is only allowed to see sales from his territory. Under the covers, the Analysis Services Connector opens a connection as myawesomecompany\ssasadmin and adds EffectiveUserName=bob@myawesomecompany.com to the connection string. Since the SSAS server is joined to the myawesomecompany domain (or to a domain that trusts myawesomecompany), it impersonates Bob, and he only sees data from his territory in Power BI. The model stays on premises, and live DAX queries are run against it, so only the summarized query results displayed on the screen are transferred to the cloud.
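To make that concrete, the connection the connector opens under the covers ends up looking roughly like the following (the server and database names here are placeholders for illustration, not anything the connector actually shows you):

Data Source=YourSSASServer;Initial Catalog=YourTabularDB;EffectiveUserName=bob@myawesomecompany.com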

clip_image002   

An Error You May Hit With the Analysis Services Connector in a Demo Environment

All of that works great in real enterprise deployments of Power BI where you have federated your domain to Azure. However, for my Spatula City demo, I don’t want to buy the spatulacity.com domain or federate my demo domain with Azure. I want to do a quick proof-of-concept, so I will spin up a Power BI tenant called spatulacity.onmicrosoft.com. That’s Azure Active Directory; that’s not an Active Directory domain you can join your virtual machine to directly. So when you log in to Power BI as fred@spatulacity.onmicrosoft.com and try to connect to Analysis Services, you will get an error since Analysis Services can’t impersonate fred@spatulacity.onmicrosoft.com. If your SSAS server is joined to a domain, the error you see in SQL Server Profiler connected to SSAS will read “The following system error occurred: The user name or password is incorrect.” but if your SSAS server isn’t on a domain, then your error may read “The following system error occurred: The name provided is not a properly formed account name.”

image

How to Set Up a Demo Environment for the Analysis Services Connector

Here’s my setup and the trick I used to make this work in a demo environment. I created a virtual network in Azure and added two virtual machines. The first VM I added was a small VM to serve as a domain controller for a new Active Directory domain called gregazuredomain.local. I added a second VM as an Analysis Services server and joined it to the domain. I installed the Power BI Analysis Services Connector. Next I needed to set up my users.

Creating users like fred@gregazuredomain.local won’t help me. I need to set up a fred@spatulacity.onmicrosoft.com user in my Active Directory domain so that I can impersonate that user in Analysis Services. (To reiterate, in a real production scenario, I would just federate my domain with Azure Active Directory and log onto Power BI as fred@spatulacity.com, but I’m not doing a real production deployment, just a demo.) So I need to fake out Active Directory so it lets me create fred@spatulacity.onmicrosoft.com as a user. I open the Active Directory Domains and Trusts app, right-click on the top node, and choose Properties. Then I add spatulacity.onmicrosoft.com as an alternate UPN suffix:

image

Now when I go into the Active Directory Administrative Center to add a new user, I can create a user with the desired UPN (userPrincipalName) fred@spatulacity.onmicrosoft.com. It’s not actually the same account I used in the Power BI site; it just has the same UPN. The password I enter doesn’t matter and doesn’t have to match the password I used when signing into Power BI. Power BI will connect to Analysis Services with the service account you set up when you installed the connector (an Analysis Services admin) and then do EffectiveUserName impersonation. EffectiveUserName impersonation doesn’t require knowing the user’s password; it just requires being an Analysis Services admin.

image

Now I can set up role-based security in Analysis Services to grant access to Fred:

image

Now I can log in to my dashboard and see how Spatula City is revolutionizing the kitchen accessories industry:

image

Under the covers, you can see in Profiler connected to SSAS that it’s setting the EffectiveUserName property:

ProfilerEffectiveUserName

In addition to Power BI using EffectiveUserName, I have blogged in the past about EffectiveUserName and Reporting Services. EffectiveUserName is also built into SharePoint 2013 Excel Services and PerformancePoint. If it’s not obvious, I’m a fan because you can avoid the hassle of Kerberos setup for common on-premises and cloud BI scenarios.

To summarize, editing the UPN of a user in a demo Active Directory domain to match your Power BI login is a quick way of enabling a Power BI Analysis Services Connector demo environment. Just make sure that your demo covers a topic more interesting than spatula sales.

Apr 02
Wait, Power Pivot! Don’t Refresh Yet! I’ve Got More Changes!

My Tuesday morning was an exercise in patience waiting on Excel 2013. I was working on a complex workbook with a complex Power Pivot model with half a dozen complex PivotTables. After every minor change in Power Pivot, all the reports refresh. Hide a column… wait 2 minutes… change a calculation… wait 2 minutes. I finally got fed up with it, had a brain spark, and then my Tuesday afternoon was very productive.

Excel 2013 in most ways is a great leap over Excel 2010. Baking Power View directly into Excel was such a smart move for Microsoft to have made. Including Power Pivot in Excel in the form of the Data Model was clever because a user can leverage all that power without thinking about modeling or add-ins. However, in one way Excel 2013 was a major step backwards. Unlike Power Pivot in Excel 2010, which assumed the user was intelligent enough to make a batch of changes to the model and then manually refresh the PivotTable reports, Excel 2013 tries to make this process more automatic. With every minor change to the model, all PivotTable reports refresh. What’s worse is that if I want to move a calculated measure from one table to another, Excel 2013 decides that I want to remove the measure from all reports.

Fortunately, we can do something about that…

DisableAutoRefreshHappyHeader 

Let’s say you are about to make a large batch of Power Pivot changes and don’t want PivotTables to refresh after each small change.

Install the latest version of OLAP PivotTable Extensions. Then use the Disable Auto Refresh feature by right-clicking on a PivotTable connected to your Power Pivot model:

DisableAutoRefresh 

That command will disable auto refresh on all PivotTables connected to your Power Pivot model.

Now proceed with your Power Pivot changes and none of the PivotTables will refresh after every change. What’s even better is that you can make major changes like moving a calculated measure from one table to another without that measure being deleted from all the PivotTables.

When you’re done making changes, right-click on a PivotTable connected to your model and choose Enable Auto Refresh:

EnableAutoRefresh 

On the topic of refresh, don’t forget that you can also speed up model changes by switching Power Pivot to Manual Calculation Mode (on the Design tab, under the Calculation Options dropdown) and choosing Calculate Now only as necessary:

image 

The real question now is this… what are you going to do with all that time you got back!?

Oct 21
Tabular Performance Guide Speaking Tour

Some of the most fun I’ve had this year was spent collaborating with some friends at Microsoft to publish an important guide on performance tuning Tabular models in Analysis Services. I am very proud to have helped co-author the paper which represents hundreds of hours of work by many people.

Performance Tuning of Tabular Models in SQL Server 2012 Analysis Services
http://aka.ms/ASTabPerf2012

image

The reaction from the community has been great (here and here for example). Chris Webb even noted that there is some content in the whitepaper relevant to Multidimensional models, too. Specifically, he mentioned the whitepaper’s note that SQL 2012 SP1 CU4 (though you should just skip to CU6, which is the latest at the moment) included an enhancement so that multi-select filters no longer prevent caching in most scenarios. While that is a huge deal for Multidimensional models, it’s also very helpful for Tabular models if you’re querying them in a client tool like Excel which runs MDX queries. For those who are optimizing Multidimensional models, I’d also point you to a summary of SSAS enhancements made in SSAS 2012 on page 144, a warning about very wide models and heap fragmentation on page 67, and a discussion of updated SSAS drivers and performance on page 69.


PASS Summit 2013

John Sirmon and I had the privilege of summarizing 147 pages of content into a 75-minute presentation at PASS Summit 2013 in Charlotte, NC recently. The room was pretty packed and there were some great questions afterward.

photo


SQL Saturday #255 – Dallas

For the locals who weren’t able to make it to Charlotte, I’m going to be doing the talk in my hometown for SQL Saturday Dallas on November 2, 2013. Here’s the abstract:

Gain the knowledge, tips, and techniques you need to tune your Tabular models in SQL Server Analysis Services 2012. This session will begin by exploring the Tabular model architecture and explaining the factors that affect performance. We’ll then walk through tuning techniques and strategies specific to querying, processing, partitioning, and server configuration and demonstrate optimizing performance in some common scenarios. This session is based on the recently released white paper “Performance Tuning of Tabular Models in SQL Server 2012 Analysis Services.”


Tabular Performance Event

It’s hard to do justice to this content in an hour, so I am looking forward to speaking at an all-day event at our local Microsoft office in Irving, TX on November 12, 2013. If you are interested, please reach out to Bryan Smith.

Jun 08
N/A For Subtotals

Adventure Works bashing is quite a fun sport. It’s unfortunate Adventure Works makes such an easy target. In this post I want to pick on how the Adventure Works cube shows N/A for the Statistical Account subtotal.

clip_image002 

Obviously it makes no sense to sum together Headcount and Units and Average Unit Price. Setting the Statistical Accounts subtotal to NA does make sense. Adventure Works uses the following code to do this:

// Assignment of constant for Statistical Accounts account member.
// This will allow parent member to be visible by default.

( [Account].[Accounts].&[95], [Measures].[Amount] ) = "NA" ;

Unfortunately, this is not the best way for several reasons.


Strings Break Aggregations

First, assigning a string will null out any subtotals and grand totals above where you assigned the string. This problem isn’t visible in Adventure Works since IsAggregatable is set to false on the [Account].[Accounts] hierarchy which removes the grand total. But if we do the following assignment on a hierarchy with a grand total, we see the problem:

( [Product].[Product Categories].[Category].[Accessories], [Measures].[Internet Sales Amount] ) = "NA";

image image

before assignment

after assignment

You can see that when we assigned a string to the Accessories subtotal, it nulled out the grand total making it disappear. And null grand totals are a major usability problem, as I’ll explain in a second.


Preserving Sparsity

The second reason the Adventure Works NA approach is flawed is that it eliminates the sparsity in the cube for the Statistical Accounts subtotal. For example, if we put [Date].[Date] on columns, a bunch of dates with no data (except for NA) show up. This hurts performance tremendously and makes it more difficult to find meaningful data.

image 

Typically, fixing this problem involves a conditional assignment which ensures empty cells stay empty:

( [Account].[Accounts].&[95], [Measures].[Amount] ) = IIf(IsEmpty([Measures].[Amount]),null,"NA");

But in Adventure Works, all the statistical accounts have a ~ unary operator, which means the Statistical Accounts subtotal is always null. So to preserve sparsity and fill in the Statistical Accounts subtotal with NA only where any statistical accounts have data, we would have to do something like:

( [Account].[Accounts].&[95], [Measures].[Amount] ) =
    IIf(
     IsEmpty(
      Sum(
       [Account].[Accounts].&[95].Children
       ,[Measures].[Amount]
      )
     )
     ,null
     ,"NA"
    );

image 


Strings Are Less Efficient

The third reason the Adventure Works NA approach is flawed is the degradation in performance due to the use of strings. In talking with the SSAS team, Marius Dumitru summarized this problem by saying, “strings are larger than other data types, harder to manipulate, introduce data structure indirections and memory copies/allocations.”

Mosha Pasumansky mentioned how strings and sparsity impact performance in the discussion of the Ratio calculation in this old blog post.


Null Grand Totals

Recently some co-workers were building a PivotTable and no matter what they did, they could never get anything more than a blank PivotTable. After a bit of head scratching, I realized what was happening was that the grand total was nulled out, so they were never even able to start the PivotTable properly.

To demonstrate this problem, let’s pretend we want to null out the grand total on the Product Categories hierarchy:

( [Product].[Product Categories].[All Products], [Measures].[Internet Sales Amount] ) = null;

Now if we start a PivotTable and put something other than the Product Categories hierarchy on rows, you’ll see this null grand total prevents us from starting the PivotTable since no rows even show up:

image image

before assignment

after assignment

Incidentally, you can tell from the “this will allow parent member to be visible” comment in the code at the top of this blog post that the developers of Adventure Works had this very thing in mind. Since the ~ unary operators caused the Statistical Accounts subtotal to always be null, it would never appear, making it impossible to expand.

Of course, your users could check the “Show items with no data on rows” option under PivotTable Options… Display. But you can’t expect your users to know to do that. PivotTables are designed for top down analysis. And if the top is null, your PivotTable is not user friendly.


My Solution

So how do we avoid all three Adventure Works problems created by assigning strings? And how do we avoid the problem of null subtotals and null grand totals?

You will note Mosha Pasumansky’s approach in the blog post above was to use NULL and a four-part format string like '#.#;;;NA'. The problem with that approach is that it doesn’t work in Excel PivotTables. Excel retrieves the VALUE and the FORMAT_STRING cell properties then does the formatting itself. But the formatting code in Excel ignores the fourth part, unfortunately. (As a reminder, format strings are semicolon separated. The first part is for positive numbers, the second part is for negative numbers, the third part is for zero, and the fourth part is for nulls.) So our solution will have to limit itself to three-part format strings which work in Excel.

My approach of choice looks like the following:

scope( [Account].[Accounts].&[95], [Measures].[Amount] );
    this = IIf(IsEmpty(Sum([Account].[Accounts].&[95].Children, [Measures].[Amount])),null,0);
    Format_String(this) = ";;"; //format zeros as blank
end scope;

It assigns a zero to a cell if any statistical accounts have data in this slice; otherwise, it assigns null. Then it uses a custom format string just for the Statistical Accounts subtotal. The format string is just two semicolons. That three-part format string says to format zero as blank. Since the format string is inside the scope statement, we’re only customizing the format string for the Statistical Accounts subtotal.

If you prefer NA, then you would use the following format string code inside the scope statement:

    Format_String(this) = ";;\N\A";

And as discussed above, typically IsEmpty(Sum([Account].[Accounts].&[95].Children, [Measures].[Amount])) would just be IsEmpty([Measures].[Amount]) but we had to use the more complex expression here because of the ~ unary operators in play.
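Putting those pieces together, in a cube without the ~ unary operators the NA flavor of this approach would reduce to something like the following sketch:

scope( [Account].[Accounts].&[95], [Measures].[Amount] );
    this = IIf(IsEmpty([Measures].[Amount]), null, 0);
    Format_String(this) = ";;\N\A"; //format the zero placeholder as NA
end scope;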

This approach solves all the problems we were trying to avoid. It doesn’t null out the grand total. It preserves sparsity in the cube. It avoids using expensive strings. It avoids null subtotals and null grand totals. And it is usable in Excel.

Mar 02
Tooting My Own Horn: Speaking at SQL Rally Dallas

SQL Rally Logo
You may have noticed I never post business intelligence industry news on this blog. That was a conscious decision on my part when I started this blog nearly five years ago. Why? Because so many others cover that much better than I ever could dream of.

I never really post news about myself either. Why? Because I doubt you really care, and I’d rather blog about BI. (I have to say it was difficult not to mention how proud I am to have achieved the Analysis Services Maestro designation since, at the moment, there are only 11 in the world and 3 in the United States.)

All that is to say that I’m breaking with tradition to announce that I’m speaking at SQL Rally in Dallas this May. Why? Because I love meeting other SSAS geeks in the area (and hopefully one or two of you are actually reading this blog).

99 Tips for Tuning and Enhancing Analysis Services
All-day preconference session on Wednesday, May 09, 2012
Come and hear Greg Galloway share all his best tips for optimizing performance of Analysis Services and improving the user experience and reach of your cubes. This session is targeted to BI developers responsible for maintaining Analysis Services cubes and to DBAs who are tasked with monitoring and troubleshooting Analysis Services issues. Expect to leave with an actionable checklist of ideas to try on Monday when you return to work, with a deeper grasp of the internals of Analysis Services, with MDX optimization experience, and with best practices that ensure your projects succeed. This preconference session will focus on Multidimensional cubes and MDX but will briefly discuss the advantages and limitations of new Tabular models and DAX in Analysis Services 2012.

Making Heads or Tails of Analysis Services Storage
Regular 75-minute session during the conference May 10-11, 2012
If you are about to start a new Microsoft business intelligence project, you have more Analysis Services choices and flexibility than ever before. Come learn what’s new in Analysis Services 2012 and learn the advantages and limitations of the new Tabular models compared with classic Multidimensional models. In this session we will take a deep dive into storage modes in Analysis Services. We’ll discuss the two storage modes a Multidimensional model can use: MOLAP and ROLAP. We’ll discuss the storage modes in new Tabular models: VertiPaq and DirectQuery. We’ll preview the new columnstore index in SQL Server 2012 and discuss how it impacts Analysis Services. And we’ll take a deep dive into two successful Analysis Services deployments at two Fortune 500 companies and discuss why one chose ROLAP and the other chose MOLAP.

I will do my best to avoid much overlap between those sessions, so I look forward to seeing you in both of them. Register for SQL Rally soon since prices go up after March 14th. The pre-conference session is only $219, which I think is quite a bargain.

Artis is a silver sponsor and will definitely have a big presence at the conference. And my Artis co-worker and fellow MVP Tim Mitchell is pulling double duty in helping organize the conference and speaking on Data Quality Services. (As a side note, I have to applaud how the program committee excluded their own speaking sessions from selection and simply put them up for community vote. Tim and others on the committee are fantastic speakers, so the community rightly chose many of their sessions.)

Jan 10
Excel Writeback – Security and Leveraging the UPDATEABLE Cell Property

Most of my blog posts chronicle some lesson learned in the heat of battle at a client site. This blog post is a bit different since the lesson learned came from an Artis internal project we started and completed last year. At Artis, our time entry system feeds into an Artis data warehouse and an Analysis Services cube for reporting. We also support writeback against the cube for project managers to enter forecasts for projects. That way, at the weekly staffing meeting, it’s pretty simple to see that Greg is forecasted to work 95 hours between four projects next week. At that point, preventative measures can be taken… like alerting local emergency rooms that I may be making a visit.

Seriously though, Artis is a great place to work since I get to work with people like Drew Jones and Cindy Liang (who actually implemented this resource planning cube), and I’ve never once gotten close to working 95 hours. I’ve got an MDX query to prove it! But I digress…

Writeback Cube Security
Since forecasting and resource planning is the goal, we use role-based security in the cube to prevent project managers from setting a forecast for a prior week (since it’s already too late to plan resources at that point). A typical industry scenario where write security is needed would be making sure each regional manager can only set the budget for his region. Setting up this security was a bit tricky, so I’ll describe it step-by-step.

The first step is to create a ReadAll role. In this case, the ReadAll role grants members access to read the whole cube. I’ve definitely seen a number of clients make mistakes here and grant extra permissions that are not required. Nothing needs to be checked on the General tab (not even Read Definition, which grants access to see the metadata in tools like SSMS). On the Cubes tab you need Read and optionally Drillthrough access. If you want a user to be able to read the whole cube from a tool like Excel, that’s all you need to do.

image 

Next, build a separate Writeback role. On the Cubes tab, grant Read/Write, check Process, and (optionally) Drillthrough permissions. I’m frankly not sure why you need Process permissions to perform ROLAP writeback, but it appears you need Process permissions for ROLAP or MOLAP writeback:

image 

In the Data Sources tab, make sure to grant Read permissions on the data source used for writeback. For example, when I created my writeback partition, I used the Greg Test data source to direct that writeback table to live in another database (such as a database with Full recovery model):

image 

Finally, if you want to let them write back only to certain cells, control this on the Cell Data tab. Since I was prototyping against Adventure Works data, the expression below returns true when the current year is later than 2003. In your cube, the expression might look at a current week flag in the Date dimension or something like that.

image 
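In case the screenshot is hard to read, here is a rough sketch of that kind of read/write expression; the exact attribute name and member keys are assumptions based on the older Adventure Works Date dimension and will differ in your cube:

NOT (   [Date].[Calendar Year].CurrentMember IS [Date].[Calendar Year].&[2001]
     OR [Date].[Calendar Year].CurrentMember IS [Date].[Calendar Year].&[2002]
     OR [Date].[Calendar Year].CurrentMember IS [Date].[Calendar Year].&[2003] )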

Am I Seriously Recommending Cell Security?!?
Wait! Cell security! Won’t that kill performance? Analysis Services best practices definitely call for avoiding cell security in favor of dimension data security because cell security is a performance killer. The reason I use it here is that I don’t know of a way to accomplish the writeback security without cell security. The question is whether cell security used purely as write security impacts read performance. I verified this security only impacts the performance of writes with a simple trick. In the read/write permissions box above, I put in the following expression to prove this:

ASSP.FireTraceEventAndReturnValue(99) < 100

Let me explain what that’s doing. This FireTraceEventAndReturnValue function was designed for the very purpose of being able to detect via Profiler when Analysis Services evaluates it. So it simply fires a “User Defined” event in Profiler and then returns the integer you pass into the function. That way, you can watch Profiler and verify cell security isn’t being evaluated during read-only queries. It only appears in Profiler when you query the UPDATEABLE property or when you perform a write. So just building a PivotTable in Excel isn’t slowed down by the above cell security since Excel currently doesn’t query the UPDATEABLE cell property. But I’m getting ahead of myself as I’ll describe the UPDATEABLE cell property below.
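If you want to see what the function itself does, a quick test (a sketch, assuming the ASSP assembly is deployed and you have the Adventure Works cube handy) is to run a query like the following from an MDX window while watching Profiler; the User Defined event fires each time the function is evaluated:

WITH MEMBER [Measures].[Trace Test] AS ASSP.FireTraceEventAndReturnValue(99)
SELECT [Measures].[Trace Test] ON COLUMNS
FROM [Adventure Works]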

Why didn’t cell security impact read performance in this case? Note that we used two roles. One role grants read permissions which let you read the whole cube. Another role grants write permissions but then says you can only write to recent years. If we had just created the Writeback role, users would not have been able to read 2003 data and cell security would impact read performance, too. By creating two roles, the read and write permissions are managed separately.

As a side note, we have a support case open to fix a bug where attempting to write data to 2003 sometimes causes SSAS to crash. It appears this may happen when the user is in two roles, as described above. Hopefully that bug will be fixed soon. (Update: It looks like it will be fixed in SQL 2008 R2 SP1 CU6.) Coincidentally, the VBA described below provides a workaround to the bug.


Testing Out Writeback in Excel 2010
Now that we’ve set up our role-based security, let’s see what the user experience is in Excel 2010. We start a PivotTable as usual. But since I’m an administrator on my Analysis Services dev server, I need to tweak the connection string to pretend to be a lower-privileged user. Go to the Data tab, click Connections, and click Properties for the connection. Then append either of the following snippets to the end of the connection string:

;Roles=ReadAll,Writeback

;EffectiveUserName=DOMAIN\UserWithReadWritePermissions

The first pretends to be a member of the ReadAll and Writeback roles. This works fine as long as the security is hardcoded (i.e. not dynamic security which uses the UserName() function in MDX expressions in the role). The EffectiveUserName option instead impersonates a specific user, and I’ve blogged about it before.

Now I’ll build a simple PivotTable then enable What-If analysis. What happens when I edit a 2003 cell and publish my changes? I get this error message:

image 

The error message is at least helpful in describing what went wrong. But it’s not helpful in describing which cell was secured. Also, it’s not helpful that it waited until I was finished entering my data before giving me an error during the Publish. Some immediate feedback would have been nice.

So how do you tell in advance which cells are secured and which cells are updateable? Fortunately, there’s a handy cell property called UPDATEABLE which does just that. Unfortunately, Excel 2010 what-if analysis doesn’t leverage it, so I have proposed this feature. For now we can use a little VBA to roll our own.


Using VBA to Query the UPDATEABLE Cell Property
The following VBA code catches the Worksheet’s PivotTableAfterValueChange event. I could have taken a number of approaches, and I’ll discuss the pros and cons below, but I chose this one for the prototype because it was the most straightforward. When you type in a new value for a cell in a PivotTable, this event fires. This code borrows the ADO connection from the PivotTable and runs a simple query that retrieves the UPDATEABLE cell property for that cell’s coordinates in the cube. I definitely haven’t explored all the possible values of the UPDATEABLE cell property, so you’ll need to verify this code with your cube. But for the simple example in this blog post, I’ve seen two possible values. If UPDATEABLE is CELL_UPDATE_ENABLED then writeback is allowed and the cell is not secured. If UPDATEABLE is CELL_UPDATE_NOT_ENABLED_SECURE then the cell is secured and will produce an error upon writeback, so we discard that change and pop up the error message:

image 

In my mind, that’s a better user experience. You get immediate feedback that tells you which cell is the problem. So without further ado, here’s the code:

Private Sub Worksheet_PivotTableAfterValueChange(ByVal TargetPivotTable As PivotTable, ByVal TargetRange As Range)

    On Error GoTo ErrHandler

    Application.ScreenUpdating = False ' turns off screen updating

    'from http://msdn.microsoft.com/en-us/library/ms145573.aspx
    Const MD_MASK_ENABLED = &H0
    Const MD_MASK_NOT_ENABLED = &H10000000
    Const CELL_UPDATE_ENABLED = &H1
    Const CELL_UPDATE_ENABLED_WITH_UPDATE = &H2
    Const CELL_UPDATE_NOT_ENABLED_FORMULA = &H10000001
    Const CELL_UPDATE_NOT_ENABLED_NONSUM_MEASURE = &H10000002
    Const CELL_UPDATE_NOT_ENABLED_NACELL_VIRTUALCUBE = &H10000003
    Const CELL_UPDATE_NOT_ENABLED_SECURE = &H10000005
    Const CELL_UPDATE_NOT_ENABLED_CALCLEVEL = &H10000006
    Const CELL_UPDATE_NOT_ENABLED_CANNOTUPDATE = &H10000007
    Const CELL_UPDATE_NOT_ENABLED_INVALIDDIMENSIONTYPE = &H10000009

    Dim pcache As PivotCache
    Set pcache = TargetPivotTable.PivotCache

    If Not pcache.IsConnected Then
        pcache.MakeConnection
    End If

    Dim oCellset As New ADOMD.Cellset
    oCellset.ActiveConnection = pcache.ADOConnection

    Dim sError As String
    Dim oPivotTableCell As Range
    For Each oPivotTableCell In TargetRange.Cells

        Dim oPivotCell As PivotCell
        Set oPivotCell = oPivotTableCell.PivotCell

        'ask the cube for just the UPDATEABLE cell property at this cell's coordinates
        Dim sMDX As String
        sMDX = "select from [" & pcache.CommandText & "] WHERE " _
         & oPivotCell.MDX & " CELL PROPERTIES UPDATEABLE"

        oCellset.Source = sMDX
        oCellset.Open

        Dim iUPDATEABLE As Long
        iUPDATEABLE = oCellset(0).Properties("UPDATEABLE")

        If iUPDATEABLE = CELL_UPDATE_ENABLED Then
            'update allowed
        ElseIf iUPDATEABLE = CELL_UPDATE_NOT_ENABLED_SECURE Then
            'the cell is secured, so discard the change and note the cell address
            sError = sError & "Cell " & Replace(oPivotTableCell.Address, "$", "") _
             & " is secured." & vbCrLf
            oPivotCell.DiscardChange
        End If

        oCellset.Close

    Next

    If sError <> "" Then
        sError = "The following cells do not allow writeback, " _
         & "so their values were discarded." _
         & vbCrLf & vbCrLf & sError
        MsgBox sError
    End If

ErrHandler:
    Application.ScreenUpdating = True ' turns on screen updating

End Sub

There is one main downside to this approach. It can slow down the data entry process since, after each cell is changed, we execute one MDX query per cell to retrieve the UPDATEABLE cell property. This is generally fairly snappy: running 100 of these MDX queries only took a second on the local network and several seconds over a slower VPN connection. But when you attempt to update a bunch of secured cells, discarding each change one cell at a time takes a couple of seconds, which adds up. Another downside to consider: if you enter a change on Monday and save the workbook without publishing those changes, your writeback permissions in the cube may have changed by the time you open the workbook on Tuesday, yet this VBA code doesn’t rerun for changes you entered but never published.

Another approach would be to catch the PivotTableUpdate event, grab the coordinates of all the cells, and run one MDX query to retrieve the UPDATEABLE property for all cells in the PivotTable, caching this information for later when the user updates cells. The reason this approach would be more complex is that the .MDX property of a subtotal PivotCell returns a tuple with one fewer member than that of a detail-level PivotCell, since the All member isn’t mentioned in the subtotal’s MDX. And you can’t easily combine two tuples with different dimensionality. So I went with the more straightforward approach for this prototype.

Note in the VBA above the use of the ADOMD.Cellset object. This requires a reference to any version of Microsoft ActiveX Data Objects (Multi-dimensional), and you can add that reference from within the VBA editor from the Tools… References menu:

image 

What’s He Not Telling Me?

This post focused on role-based security for writeback and how to improve that user experience. I didn’t talk about setting up a writeback partition or whether to choose MOLAP or ROLAP writeback. I didn’t discuss how allocation method impacts writeback performance. I didn’t discuss how to writeback to non-leaf cells. I didn’t discuss the necessity of periodically moving data from the writeback table into the regular fact table. Maybe I’ll blog about those in the future.

I also didn’t discuss the pros and cons of the different architectures we could have chosen. This project lent itself to extending our existing time entry reporting cube with forecast writeback, but we could have also chosen to implement a good budgeting and planning tool like deFacto. That decision process is a blog post for another time.

Conclusion
To summarize, be sure to grant read permissions and write permissions in separate roles if the user should be able to read more of the cube than they can write to. The UPDATEABLE cell property can come in handy if you want to know in advance which cells a user has permissions to update. With a little VBA, it’s pretty straightforward to improve the writeback experience where writeback security is involved by querying the UPDATEABLE cell property.

Updates
1. Fixing some VBA formatting/syntax issues that happened when I pasted the code into my blog
2. Updating the status of a hotfix for the SSAS bug I mentioned which will be fixed in SQL 2008 R2 SP1 CU6.

Dec 29
Analysis Services and the Case of the Standby Cache

As an homage to my two favorite detectives, Nero Wolfe and Gregory House, here’s an Analysis Services detective story for you. If you enjoy the thrill of the chase, read on. If you take yourself too seriously and just want the facts, skip to the Conclusions section at the end.

Our story begins on a busy summer day in which our hero (me) finds himself covering for a co-worker on vacation. I get a call that queries are running slow on the Analysis Services box, so I go investigate. The first thing I notice is that Task Manager shows zero free memory...

Windows Server 2008 Task Manager showing zero free RAM
Image 1: Windows Server 2008 Task Manager showing zero free RAM


Chapter 1 (Or How I Discovered Analysis Services Didn’t Want More Memory)

Honestly, my first instinct was that Analysis Services thought it was in a low memory situation, and I suspected that might explain the performance issues. So I decided to investigate what was consuming that memory. You could look at msmdsrv.exe memory usage in Task Manager, but there are times when it doesn’t show the right number. So I checked the “MSOLAP$instance:Memory:Memory Usage KB” perfmon counter, and it told me SSAS was only using 1GB of memory. Considering this was a dedicated SSAS box, and considering LowMemoryLimit was set at 6GB and TotalMemoryLimit was set at 8GB, either SSAS didn’t need the memory, or worse, it did need it but didn’t think there was enough memory available.

Akshai Mirchandani on the SSAS dev team was kind enough to tell me that SSAS uses the GlobalMemoryStatusEx Windows API to determine how much available memory there is at the moment. I wrote a little EXE to pull that number, and I found that it told me there were 8GB available. That number matched the “Memory : Available MBytes” perfmon counter, as it should. I learned the Free number showing in Task Manager just means zeroed out RAM, so it’s nothing to be concerned with as long as there’s still enough Available memory. In short, ignore the “Free” number in Task Manager. The confusion caused by showing Free but not Available in Windows 2008 Task Manager is probably the reason they added Available back into Task Manager in Windows 2008 R2.

Akshai also suggested I double check it was 64-bit SSAS installed since 32-bit SSAS is limited to 2GB memory usage. So I ran the following from an XMLA window in Management Studio, looked for the <Edition> tag in the result, and verified it ended in 64 (i.e. <Edition>Enterprise64</Edition> means 64-bit enterprise edition).

<Discover xmlns="urn:schemas-microsoft-com:xml-analysis">
  <RequestType>DISCOVER_XML_METADATA</RequestType>
  <Restrictions>
    <RestrictionList>
      <ObjectExpansion>ObjectProperties</ObjectExpansion>
    </RestrictionList>
  </Restrictions>
  <Properties>
    <PropertyList>
    </PropertyList>
  </Properties>
</Discover>

My conclusion was that SSAS knew there was memory available to it and it could have grabbed more memory if it needed it, so it must not have needed the memory.


Chapter 2 (Or How I Discovered the Standby Cache)

Even though there was 8GB available RAM, I still wanted to track down what was consuming the memory such that Windows reported there was 0MB free. Knowing that when SSAS reads a file off disk, it uses a buffered read which is able to cache the file in the Windows System File Cache, I went to check the “Memory : Cache Bytes” perfmon counter. It said the Windows System File Cache was using 3GB currently. Since the cube in question was 30GB, and since I had set the LimitSystemFileCacheSizeMB setting to 3GB to make sure the system file cache didn’t get out of hand, I wasn’t surprised it was using exactly 3GB. (3GB was the value I had chosen on this server after studying usage patterns during querying and processing. This is just a dev server, so it doesn’t have as much memory as it really needed to perform optimally. For at 30GB cube, we would recommend at least 30GB of RAM usually. But for this dev server, we have to make due with what we have.)

Recapping, SSAS was using 1GB, and the system file cache was using 3GB. The server had 12GB total. Do the math. Something else is using 8GB of memory on this dedicated SSAS box. There weren’t any processes in Task Manager that were using significant chunks of RAM.

At this point, the Cached number in Task Manager (about 10GB) gave me my next hint. I began researching Windows 2008 caching and made one of my most significant discoveries of 2010. The discovery was this: there’s a standby cache. There were several articles that helped me figure this out, but one of the key articles was the Too Much Cache blog post which mentions the standby cache briefly. Let me explain how it works. Let’s say you let your active system file cache grow to 4GB, then you set the LimitSystemFileCacheSizeMB setting to 3GB. That extra 1GB doesn’t get zeroed; it just gets recategorized as “Standby Cache” rather than “Active Cache” and is now counted as “available,” so it can be quickly zeroed and given to other processes if needed. Or, if the active system file cache needs it again, it can be quickly soft-faulted back into the active system file cache.

Why did I say this was such an important discovery? As Somak noted in the Too Much Cache blog post, grabbing a page off disk takes about 10 milliseconds, whereas grabbing a page out of RAM takes about 10 nanoseconds… that’s 1 million times faster. So obviously the system file cache is crucial to SSAS query performance. And accounting for the system file cache is key to repeatability in testing MDX query performance—you will never be able to compare two executions of an MDX query unless you can ensure the system file cache is clear before both executions. I’ll get to how you can clear the active and standby cache in a second.

At this point, I discovered a killer Sysinternals tool called RAMMap which gives you great insight into the system file cache. I ran a few queries against Adventure Works before the following screenshot. Note the Internet Sales 2004 partition is in the active system file cache, as shown on the File Summary tab:

RAMMap showing active system file cache
Image 2: RAMMap showing active system file cache

Now I clear the active system file cache using ASSP (which I’ll mention later). Note the cached files just got moved into the standby cache:

RAMMap showing standby system file cache
Image 3: RAMMap showing standby system file cache

Now if I clear the standby cache using ASSP (which I’ll mention later) or using RAMMap’s “Empty… Empty Standby List” menu, you’ll note those Adventure Works files drop out of the standby cache.

Anyway, back to our detective story. Poor performance on a query had been reported, but I couldn’t reproduce it. I thought I was testing that query on a cold cache because I had cleared the active system file cache and the Analysis Services caches, but I hadn’t accounted for the standby cache. So the query was performing OK for me because many of the files were in the standby cache. Once I cleared the standby cache, I could reproduce the poorly performing query, and I ended up building a new aggregation that solved the problem.

The lesson learned was to pay attention to the standby cache (the “Memory : Standby Cache Normal Priority Bytes” perfmon counter) in addition to the active cache (the “Memory : Cache Bytes” perfmon counter).


Chapter 3 (Or How ASSP Got Its Groove Back and Cleared the Standby Cache)

Being able to use the “Empty… Empty Standby List” menu item in RAMMap is nice, but wrapping that Windows API call into an Analysis Services stored procedure lets you clear the standby cache right from a Management Studio query window (or with an automated testing tool). So I added this functionality to the ASSP project. Shout out to Tim Mangan and Marco Russo who both helped track down that Windows API call.

About a year and a half ago I added some code into the ASSP project to clear the system file cache. I thought it would help MDX query test repeatability, but it strangely had little impact. Now I know why: it was only clearing the active system file cache (which simply shuffled those pages into the standby cache). Now ASSP has its groove back because it clears both the active and standby cache with the ClearFileSystemCache sproc. So if you want to test on a true cold cache without rebooting the server, the best practice is to run the ClearAllCaches sproc (which internally clears the Analysis Services cache with the ClearCache sproc and clears the active and standby system file cache with the ClearFileSystemCache sproc), then run your MDX query, all from an MDX window in Management Studio:

call ASSP.ClearAllCaches()
GO
<your MDX query here>

Chapter 4 (Or How I Stand on my Soapbox and Condemn Paging)

During the discussion above, I have only been speaking of memory in terms of physical RAM (rather than virtual memory). I can hear Marco Russo chiding me and whispering in my ear, “Programs request virtual memory from Windows, and Windows allocates RAM or pages memory to disk as it sees fit.” While he is right, physical RAM is all that’s relevant in practice. If there’s much paging going on, performance will be terrible. And this brings me to a favorite soapbox.

I can’t count the number of times I’ve seen a new client’s server in the following state. The server has SQL and SSAS on the same box. Both are set to the default memory limits (which is no memory cap for SQL and 80% of server memory for SSAS). By the time I see the server, SQL and SSAS are fighting each other for memory, causing each other to page out, and performance degrades dramatically. (Remember what I said about disk being a million times slower than RAM?)

What’s the solution? Cap all the large consumers of memory such that they don’t step on each others’ toes. That’s an easy principle, but it gets very difficult when you have to go through the testing iterations to prove your server performs better overall with memory limits for SQL and SSAS. For example, I have observed one particular stored procedure take 12 hours to complete if SQL was capped at 6GB of memory, but that same stored procedure would complete in 3 hours if SQL was capped at 14GB of memory. The point is that you have to thoroughly test your memory limit changes. But in my experience, a server with a well-tested memory usage plan will perform better and more consistently because it largely avoids paging.

An example memory usage plan on a 24GB RAM server with SQL and SSAS might be similar to the following. SQL will be capped at 10GB using the “maximum server memory” setting. SSAS will be capped at 8GB using TotalMemoryLimit/HardMemoryLimit settings (in combination with LowMemoryLimit). The active system file cache will be capped at 4GB using the LimitSystemFileCacheSizeMB setting. And we’ll assume the OS will use about 1GB, and programs like SSMS or BIDS will use about 1GB. Note the standby system file cache doesn’t need to be figured into your memory usage plan because that memory is categorized by Windows as Available.

Memory Consumer              GB
SQL                          10
SSAS                          8
Active System File Cache      4
OS                            1
Programs like SSMS            1
Total                        24


Even better, if you determine that during the ETL, SQL needs more memory and SSAS needs less, it is easy to build a SQL script and an XMLA script to programmatically change those memory limits before and after the ETL.

Another best practice to consider is using the “lock pages in memory” setting to prevent paging. This KB article describes how to enable this setting for SQL Server. For Analysis Services, once you’ve set that group policy, use the PreAllocate setting to tell Analysis Services to grab a chunk of RAM that won’t be paged out. (Even without “lock pages in memory” PreAllocate helps SSAS memory allocation performance on Windows 2003. On Windows 2008, the main benefit of PreAllocate is to lock pages in memory.) Just be aware (and don’t panic) that Analysis Services will take quite a while to start and will pretty much lock up the server during that period.


Conclusion (Or How I Summarize Seven Pages In One Paragraph)

Ignore the “Free” number in Task Manager and focus on the “Memory : Available MBytes” perfmon counter. Be aware of the active system file cache (the “Memory : Cache Bytes” perfmon counter) and the standby system file cache (the “Memory : Standby Cache Normal Priority Bytes” perfmon counter) as they both impact MDX query performance drastically. Use RAMMap to see what’s in the system file cache. To test MDX query performance on a cold cache, use ASSP.ClearFileSystemCache. Avoid paging by capping the memory usage of large consumers of memory and by using the “lock pages in memory” group policy. Test before and after.

Apr 01
Using EffectiveUserName To Impersonate in SSAS

It's common to implement your role-based security in your Analysis Services cube. That way Fred can only see the sales from the east region, while Frank can only see sales from the west region. Role-based security requires that connections to your cube identify the end user so the cube can secure itself. One way of tackling this is to implement Kerberos so that the credentials of the end user pass from the end user's laptop, to the report server, to the cube.

image

But what if your end user's laptop is on another company's domain? Or what if your end user's laptop is on a home workgroup? That laptop can't start a Kerberos ticket that will be valid on your domain. One option is to throw ISA Server 2006 into this mix, use it to present the user with a login form over HTTPS, then let ISA start the Kerberos ticket:

image

For either of the above diagrams, you would set up your data source in SSRS as follows:

image

We would definitely recommend the above approach, but we realize that ISA and Kerberos add a good deal of complexity to your infrastructure. So the point of this blog post is to mention a good trick for getting role-based security without requiring your end users' laptops to be able to start a Kerberos ticket.

Before I get to the main trick that's the focus of this post, I should mention that enabling basic auth over HTTPS in SSRS is one possible solution to this problem. It's not as good of a user experience as the single sign-on login form you can customize with ISA Server, but it works.


A Checkbox in SSRS Many People Misunderstand

The following setup of a data source in SSRS will allow you to pass the end user's credentials to the cube without having a Kerberos ticket start at the end user's laptop, and without having ISA Server.

image

Because you stored credentials, SSRS will open the connection to SSAS using those domain\SSASAdminUser credentials. Because you checked the "Impersonate the authenticated user after a connection has been made to the data source" checkbox, behind the scenes SSRS is going to append the EffectiveUserName property to the connection string, specifying the end user who is running the report. The EffectiveUserName property tells SSAS to pretend to be the specified user so that the results of MDX queries use the role-based security set up for that end user.
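In other words, the connection string SSRS builds behind the scenes looks roughly like the following (the server, database, and user names are placeholders):

Data Source=YourSSASServer;Initial Catalog=YourDB;EffectiveUserName=DOMAIN\enduser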

Only SSAS administrators can use the EffectiveUserName property, so the domain\SSASAdminUser must be an SSAS admin. The other requirement is that you must set up Kerberos on the SSAS server. To be clear, you need to set up the SPN on the SSAS server and you need to set up the AD configuration for delegation, but you don't need to set up Kerberos on any other servers besides the SSAS server. So this is a fairly quick way to get up and running without having your whole network set up for Kerberos.

Since you're using a service account which is an SSAS admin, carefully validate that you've set up the data source correctly to use EffectiveUserName impersonation. You wouldn't want to show your users more data than they're allowed to see just because you forgot to check a box!


SSRS Subscriptions

What happens when an end user tries to create a new subscription on a report with the above data source setup? You get this annoying error message: "Subscriptions cannot be created because the credentials used to run the report are not stored, or if a linked report, the link is no longer valid." It says the credentials have to be stored. Uh. They are. But it's just complaining that we've checked the "impersonate the authenticated user" checkbox. My personal opinion is that this is an oversight. Feel free to vote for this Connect suggestion.

So we have to go for a workaround. The best workaround I could find is to use an expression-based connection string. So edit your report in BIDS, edit the data source, click the fx (expression) button, and enter the following expression-based connection string:

="Data Source=SSAS;Initial Catalog=YourDB;EffectiveUserName=" & User!UserID

After you deploy it, check the data source has stored credentials. Don't check the "Impersonate the authenticated user" checkbox.

image

Now users can create their own subscriptions. When the subscriptions run, the user receives report data according to his role-based security.


Data-Driven Subscriptions

Now with your expression-based connection string, you try to create a data-driven subscription. And you get this error message: "Data-driven subscriptions cannot be created because the credentials used to run the report are not stored, the report is using user-defined parameter values, or if a linked report, the link is no longer valid." I believe it's complaining about the use of User!UserID.

You need to create a separate report for data-driven subscriptions. The expression-based connection string in the report needs to be changed to:

="Data Source=SSAS;Initial Catalog=YourDB;EffectiveUserName=" & Parameters!User.Value

Then you need to create a hidden parameter called User. Then the data-driven subscription needs to pass the appropriate domain\username into that parameter.

Make sure end users don't have access to this report designed for data-driven subscriptions because if they figure out how to set the User parameter, they could run the report and see someone else's data.


The Dark Side of Expression-Based Connection Strings

Note expression-based connection strings can't be used in shared data sources. So you have to manage this one report at a time. Plus you can't even set the expression-based connection string from Report Manager. So you have to set the expression-based connection string in the report at development time, and this makes it a pain to deploy the same report to dev and prod without changing the RDL. Also, once you have changed your report to use an expression-based connection string, you can't open the query designer. So expression-based connection strings really get in the way of the proper development, deployment, and manageability experience.

So you might consider developing your reports with regular shared data sources. Then after you deploy the reports to the server, you can run .NET code like the RSExpressionBasedConnectionString sample code I worked up. Once you've got the sample code, edit the app.config pointing it at a folder in SSRS. It loops through all the reports in that folder, finds SSAS data sources, and changes them all to expression-based connection strings. You'll obviously need to tweak the code a good deal to have it do exactly what you want in your environment, but hopefully that's a start. I might know a few good consultants if you happen to need help customizing this for your needs. :-)

Teo Lachev also suggested to me that you could use an RDCE to accomplish the same thing as that .NET code above. But just like with the .NET code above which has to run after the report has been deployed, you'd have to set each report to use the RDCE after it is deployed. Neither option is ideal, but both would work.


Conclusion

Knowing about that "Impersonate the authenticated user" checkbox is a handy trick to have in your bag. Knowing that it's using EffectiveUserName under the covers is very helpful because it lets you use an expression-based connection string to enable subscriptions. But expression-based connection strings are almost more pain than they are worth. So you might consider automating the setup of those expression-based connection strings with some code like what I have mentioned above.

Sep 22
Self-Documenting Cubes in Excel PivotTables

image
If I were the supreme leader of the Excel team (and I've had trouble convincing them of that), one of the first things I would do is add tooltips in the PivotTable field list so that when you mouse over a field, it would describe that field. I've asked for that feature, but so far, it doesn't look like it will make Excel 2010.

So... plan B. But let me first give some background on self-documenting cubes.


Self-Documenting Cubes

I'm big on self-documenting code because I've had too many experiences where separate documentation gets stale and irrelevant. Plus, I'm lazy. Thus, anything that makes it easier for me to keep documentation up-to-date and relevant is a good thing in my book.

Having descriptive names for measures and dimensions is one aspect of self-documenting cubes. You'll see descriptive calculation names like "Internet Sales Amount Variance to Annual Quota" in my cubes. But what I really want to focus on in this post is the Description property which appears on measures, dimensions, and all other objects in Analysis Services.

A few years ago at a user group, I was speaking with one prominent member of the Analysis Services team (who shall remain unnamed), and when I mentioned the Description property, he asked incredulously whether anybody ever filled those in. If you're wondering the same thing, I fill in every single one because it's a cinch to generate documentation that's relevant and up-to-date. You can use various tools that are out there, or you can easily write a simple custom .NET app which loops through measures and dimension attributes and writes the descriptions to an HTML document. Or you can make use of descriptions with the following idea.
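For example, here is a minimal sketch of that kind of documentation generator using AMO (Microsoft.AnalysisServices.dll). The server name, database name, and output path are placeholders, and calculated measures defined in the MDX script would need extra handling that isn't shown here.

// A minimal sketch of a cube documentation generator using AMO.
// Server name, database name, and output path are placeholders.
using System.IO;
using Microsoft.AnalysisServices;

class CubeDocumenter
{
    static void Main()
    {
        Server srv = new Server();
        srv.Connect("YourSsasServer"); // placeholder server name

        Database db = srv.Databases.GetByName("YourDB"); // placeholder database name

        using (StreamWriter html = new StreamWriter(@"C:\Documentation\CubeDocumentation.html"))
        {
            html.WriteLine("<html><body><h1>" + db.Name + "</h1>");

            // Physical measures and their Description property
            foreach (Cube cube in db.Cubes)
            {
                html.WriteLine("<h2>Cube: " + cube.Name + "</h2><ul>");
                foreach (MeasureGroup mg in cube.MeasureGroups)
                    foreach (Measure m in mg.Measures)
                        html.WriteLine("<li><b>" + m.Name + "</b>: " + m.Description + "</li>");
                html.WriteLine("</ul>");
            }

            // Dimension attributes and their Description property
            foreach (Dimension dim in db.Dimensions)
            {
                html.WriteLine("<h2>Dimension: " + dim.Name + "</h2><ul>");
                foreach (DimensionAttribute attr in dim.Attributes)
                    html.WriteLine("<li><b>" + attr.Name + "</b>: " + attr.Description + "</li>");
                html.WriteLine("</ul>");
            }

            html.WriteLine("</body></html>");
        }

        srv.Disconnect();
    }
}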


Plan B

Since we don't get tooltips on the field list in PivotTables for now, what's plan B? I'll credit my co-worker Drew Jones with this idea. He suggested using actions in cubes to approximate this as much as possible. Here's what we came up with.

Right-click on any numeric cell in a PivotTable, and you can see the first 100 characters of the description property of that measure under Additional Actions...

[screenshot: the Additional Actions menu in Excel showing the measure's definition]

Here's how you do that. First of all, you'll need the latest version of the ASSP assembly which includes the DiscoverSingleValue functions. Then you'll need to create an action in your cube with the following setup:

Target type: Cells
Target object: All cells
Condition:

Len(
ASSP.DiscoverSingleValue(
  "Description"
  , "MDSCHEMA_MEASURES"
  , "<MEASURE_NAME>" + [Measures].CurrentMember.Name + "</MEASURE_NAME>"
    + "<CUBE_NAME>" + [Measures].CurrentMember.Properties("CUBE_NAME") + "</CUBE_NAME>"
)
)>0

Action Content Type: URL
Action Expression: "http://intranet/Documentation/CubeDocumentation.html"
Caption:

"Definition: "
+ Left(
   ASSP.DiscoverSingleValue(
    "Description"
    , "MDSCHEMA_MEASURES"
    , "<MEASURE_NAME>" + [Measures].CurrentMember.Name + "</MEASURE_NAME>"
      + "<CUBE_NAME>" + [Measures].CurrentMember.Properties("CUBE_NAME") + "</CUBE_NAME>"
   )
  ,100
  )
+ iif(
   Len(
    ASSP.DiscoverSingleValue(
     "Description"
     , "MDSCHEMA_MEASURES"
     , "<MEASURE_NAME>" + [Measures].CurrentMember.Name + "</MEASURE_NAME>"
       + "<CUBE_NAME>" + [Measures].CurrentMember.Properties("CUBE_NAME") + "</CUBE_NAME>"
    )
   )>100
   ,"..."
   ,""
  )

Caption is MDX: True


The action looks like the following in BIDS when you're done:

[screenshot: the cell action as it appears in BIDS]

The only downside is that Excel 2007 doesn't request cell actions at all when a multi-select is applied to a PivotTable filter. But other than that limitation, it's a pretty good workaround.

Feb 19
LastNonEmpty Gotchas

I have been collecting a few advanced tidbits about semi-additive measures over the last few months, and it's time to share them.

Role-Playing Time Dimensions

Semi-additive aggregation functions like LastNonEmpty work just like Sum on all dimensions except for the Time dimension. How does Analysis Services determine which dimension is "the Time dimension"? It looks for the dimension marked Type=Time:

[screenshot: the dimension's Type property set to Time]

But what happens when, as in most cubes, you have multiple role-playing versions of that dimension in your cube? Which one does it choose? Flip to the Dimension Usage tab in BIDS. In the column representing the measure group with the semi-additive measure, find the role-playing date dimensions (circled in green below). Then find the topmost one that's connected to the measure group (circled in orange below). That's the semi-additive dimension for this measure group. A LastNonEmpty measure in this measure group will look for the most recent day with data in that Date dimension.

[screenshot: the Dimension Usage tab with the role-playing date dimensions circled in green and the topmost connected one circled in orange]

If you discover that the wrong dimension is the semi-additive dimension, it's very easy to fix: make the correct dimension the topmost connected role-playing date dimension by reordering the cube dimensions. Flip to the Cube Structure tab, look at the list of cube dimensions in the bottom left, then drag and drop them into the correct order, similar to the following:

[screenshot: reordering the cube dimensions on the Cube Structure tab]

I suggest you arrange the cube so that the semi-additive dimension is the same for every measure group. This is probably a topic for another post, but I'm a big proponent of having a dimension simply called Date which is connected to every single measure group as the most common meaning of date. This dimension should be the semi-additive dimension.

I was warned that there have been some code changes in this area of Analysis Services at some point, so I tested this behavior in AS2005 RTM, SP1, SP2, SP3, and AS2008 RTM and found it worked as described above in all of those versions. But test your cube thoroughly.

On a related note, the BIDS Helper Validate Aggs feature understands semi-additive measures and role-playing dimensions correctly. If you’ve got a measure group with only semi-additive measures, and if there are any aggs that don’t include the granularity attribute of the correct semi-additive dimension, then it warns you that those aggs will not be used in most situations.

Performance of LastNonEmpty vs. LastChild

On my largest cube, we decided to go with LastChild over LastNonEmpty. The driving reason was that LastNonEmpty hit more partitions than it needed to. To be specific, if we asked for a LastNonEmpty measure for 2008, it would hit every partition in 2008, not just the latest. LastChild only hits the latest partition. I would like to credit Robert Skoglund of Microsoft Consulting Services with making this observation and suggesting the solution.

The only hurdle was that the current month/quarter/year weren't complete yet, so LastChild returned no data for them. That was easy to fix with an addition to the MDX script like the following:

// the most recent day with data
CREATE HIDDEN SET [Yesterday]
AS Exists([Date].[Calendar].[Date].Members, [Date].[Yesterday Flag].&[True]);

// The Inventory measures are AggregateFunction=LastChild.
// When we are in a current period, use the most recent day to
// get the inventory counts.
SCOPE(MeasureGroupMeasures("Inventory"));

  SCOPE(Exists([Date].[Calendar].[Month].Members, [Date].[Yesterday Flag].&[True]));
    this = [Yesterday].Item(0).Item(0);
  END SCOPE;

  SCOPE(Exists([Date].[Calendar].[Quarter].Members, [Date].[Yesterday Flag].&[True]));
    this = [Yesterday].Item(0).Item(0);
  END SCOPE;

  SCOPE(Exists([Date].[Calendar].[Year].Members, [Date].[Yesterday Flag].&[True]));
    this = [Yesterday].Item(0).Item(0);
  END SCOPE;

  [Date].[Calendar].[All] = [Yesterday].Item(0).Item(0);

END SCOPE;


Another reason for going with LastChild over LastNonEmpty is that the results are often more intuitive in situations where not every single item is snapshotted every single period. Teo Lachev explained this problem very well in his post Last Non Empty Affairs. Depending on the business requirements, you could work around the problem as he did with MDX that looks at each Customer, or you could change the ETL to snapshot every single item every period, or you could use LastChild to ensure that your results come from a consistent time period. The LastChild trick won't work in every situation, but it is a handy trick to have in your bag.
