My Tuesday morning was an exercise in patience waiting on Excel 2013. I was working on a complex workbook with a complex Power Pivot model with half a dozen complex PivotTables. After every minor change in Power Pivot, all the reports refresh. Hide a column… wait 2 minutes… change a calculation… wait 2 minutes. I finally got fed up with it, had a brain spark, and then my Tuesday afternoon was very productive.
Excel 2013 in most ways is a great leap over Excel 2010. Baking Power View directly into Excel was such a smart move for Microsoft to have made. Including Power Pivot in Excel in the form of the Data Model was clever because a user can leverage all that power without thinking about modeling or add-ins. However, in one way Excel 2013 was a major step backwards. Unlike Power Pivot in Excel 2010 where it assumed the user was intelligent enough to make a batch of changes to the model and then manually refresh the PivotTable reports, Excel 2013 tries to make this process more automatic. With every minor change to the model, all PivotTable reports refresh. What’s worse is that if I want to move a calculated measure from one table to another, Excel 2013 decides that I want to remove the measure from all reports.
Fortunately, we can do something about that…
Let’s say you are about to make a large batch of Power Pivot changes and don’t want PivotTables to refresh after each small change.
Install the latest version of OLAP PivotTable Extensions. Then use the Disable Auto Refresh feature by right clicking on a PivotTable connected to your Power Pivot model:
That command will disable auto refresh on all PivotTables connected to your Power Pivot model.
Now proceed with your Power Pivot changes and none of the PivotTables will refresh after every change. What’s even better is that you can make major changes like moving a calculated measure from one table to another without that measure being deleted from all the PivotTables.
Right click on a PivotTable connected to your model and Enable Auto Refresh:
On the topic of refresh, don’t forget that you can also speed up model changes by changing the calculation mode within Power Pivot under the Design tab and the Calculation Options dropdown to Manual Calculation Mode and choosing Calculate Now only as necessary:
The real question now is this… what are you going to do with all that time you got back!?
Some of the most fun I’ve had this year was spent collaborating with some friends at Microsoft to publish an important guide on performance tuning Tabular models in Analysis Services. I am very proud to have helped co-author the paper which represents hundreds of hours of work by many people.
Performance Tuning of Tabular Models in SQL Server 2012 Analysis Services
The reaction from the community has been great (here and here for example). Chris Webb even noted that there is some content in the whitepaper relevant to Multidimensional models, too. Specifically he mentioned we noted in the whitepaper that SQL 2012 SP1 CU4 (though you should just skip to CU6 which is the latest at the moment) included an enhancement so that multi-select filters no longer prevent caching in most scenarios. While that is a huge deal for Multidimensional models, it’s also very helpful for Tabular models if you’re querying them in a client tool like Excel which runs MDX queries. For those who are optimizing Multidimensional models, I’d also point you to a summary of SSAS enhancements made in SSAS 2012 on page 144, a warning about very wide models and heap fragmentation on page 67, and a discussion of updated SSAS drivers and performance on page 69.
PASS Summit 2013
John Sirmon and I had the privilege of summarizing 147 pages of content into a 75 minute presentation at PASS Summit 2013 in Charlotte, NC recently. The room was pretty packed and there were some great questions afterward.
SQL Saturday #255 – Dallas
For the locals who weren’t able to make it to Charlotte, I’m going to be doing the talk in my hometown for SQL Saturday Dallas on November 2, 2013. Here’s the abstract:
Gain the knowledge, tips, and techniques you need to tune your Tabular models in SQL Server Analysis Services 2012. This session will begin by exploring the Tabular model architecture and explaining the factors that affect performance. We’ll then walk through tuning techniques and strategies specific to querying, processing, partitioning, and server configuration and demonstrate optimizing performance in some common scenarios. This session is based on the recently released white paper “Performance Tuning of Tabular Models in SQL Server 2012 Analysis Services.”
Tabular Performance Event
It’s hard to do justice to this content in an hour, so I am looking forward to speaking at an all-day event at our local Microsoft office in Irving, TX on November 12, 2013. If you are interested, please reach out to Bryan Smith.
Adventure Works bashing is quite a fun sport. It’s unfortunate Adventure Works makes such an easy target. In this post I want to pick on how the Adventure Works cube shows N/A for the Statistical Account subtotal.
Obviously it makes no sense to sum together Headcount and Units and Average Unit Price. Setting the Statistical Accounts subtotal to NA does make sense. Adventure Works uses the following code to do this:
// Assignment of constant for Statistical Accounts account member.
// This will allow parent member to be visible by default.
( [Account].[Accounts].&, [Measures].[Amount] ) = "NA" ;
Unfortunately, this is not the best way for several reasons.
Strings Break Aggregations
First, assigning a string will null out any subtotals and grand totals above where you assigned the string. This problem isn’t visible in Adventure Works since IsAggregatable is set to false on the [Account].[Accounts] hierarchy which removes the grand total. But if we do the following assignment on a hierarchy with a grand total, we see the problem:
( [Product].[Product Categories].[Category].[Accessories], [Measures].[Internet Sales Amount] ) = "NA";
| || |
You can see that when we assigned a string to the Accessories subtotal, it nulled out the grand total making it disappear. And null grand totals are a major usability problem, as I’ll explain in a second.
The second reason the Adventure Works NA approach is flawed is that it eliminates the sparsity in the cube for the Statistical Accounts subtotal. For example, let’s say we put [Date].[Date] on columns, note a bunch of dates with no data except for NA show up. This hurts performance tremendously and makes it more difficult to find meaningful data.
Typically, fixing this problem involves a conditional assignment which ensures empty cells stay empty:
( [Account].[Accounts].&, [Measures].[Amount] ) = IIf(IsEmpty([Measures].[Amount]),null,"NA");
But in Adventure Works, all the Statistical Accounts have a ~ unary operator which means that Statistical Accounts subtotal is always null. So to preserve sparsity and fill in the Statistical Accounts subtotal with NA only where any statistical accounts have data, we would have to do something like:
( [Account].[Accounts].&, [Measures].[Amount] ) =
Strings Are Less Efficient
The third reason the Adventure Works NA approach is flawed is the degradation in performance due to the use of strings. In talking with the SSAS team, Marius Dumitru summarized this problem by saying, “strings are larger than other data types, harder to manipulate, introduce data structure indirections and memory copies/allocations.”
Mosha Pasumansky mentioned how strings and sparsity impact performance in the discussion of the Ratio calculation in this old blog post.
Null Grand Totals
Recently some co-workers were building a PivotTable and no matter what they did, they could never get anything more than a blank PivotTable. After a bit of head scratching, I realized what was happening was that the grand total was nulled out, so they were never even able to start the PivotTable properly.
To demonstrate this problem, let’s pretend we want to null out the grand total on the Product Categories hierarchy:
( [Product].[Product Categories].[All Products], [Measures].[Internet Sales Amount] ) = null;
Now if we start a PivotTable and put something other than the Product Categories hierarchy on rows, you’ll see this null grand total prevents us from starting the PivotTable since no rows even show up:
| || |
Incidentally, you can tell from the “this will allow parent member to be visible” comment in the code at the top of this blog post that the developers of Adventure works had this very thing in mind. Since the ~ unary operator caused Statistical Accounts to be null always, it made it impossible to expand that subtotal since it would never appear.
Of course, your users could check the “Show items with no data on rows” option under PivotTable Options… Display. But you can’t expect your users to know to do that. PivotTables are designed for top down analysis. And if the top is null, your PivotTable is not user friendly.
So how do we avoid all three Adventure Works problems created by assigning strings? And how do we avoid the problem of null subtotals and null grand totals?
You will note Mosha Pasumansky’s approach in the blog post above was to use NULL and a four-part format string like '#.#;;;NA'. The problem with that approach is that it doesn’t work in Excel PivotTables. Excel retrieves the VALUE and the FORMAT_STRING cell properties then does the formatting itself. But the formatting code in Excel ignores the fourth part, unfortunately. (As a reminder, format strings are semicolon separated. The first part is for positive numbers, the second part is for negative numbers, the third part is for zero, and the fourth part is for nulls.) So our solution will have to limit itself to three-part format strings which work in Excel.
My approach of choice looks like the following:
scope( [Account].[Accounts].&, [Measures].[Amount] );
this = IIf(IsEmpty(Sum([Account].[Accounts].&.Children, [Measures].[Amount])),null,0);
Format_String(this) = ";;"; //format zeros as blank
It is assigning a zero to a cell if any statistical accounts have data in this slice, otherwise it’s assigning null. Then it’s using a custom format string just for the Statistical Accounts subtotal. The format string is just two semicolons. That three-part format string says to format zero as blank. Since the format string is inside the scope statement, we’re only customizing the format string for the Statistical Accounts subtotal.
If you prefer NA, then you would use the following format string code inside the scope statement:
Format_String(this) = ";;\N\A";
And as discussed above, typically IsEmpty(Sum([Account].[Accounts].&.Children, [Measures].[Amount])) would just be IsEmpty([Measures].[Amount]) but we had to use the more complex expression here because of the ~ unary operators in play.
This approach solves all the problems we were trying to avoid. It doesn’t null out the grand total. It preserves sparsity in the cube. It avoids using expensive strings. It avoids null subtotals and null grand totals. And it is usable in Excel.
You may have noticed I never post business intelligence industry news on this blog. That was a conscious decision on my part when I started this blog nearly five years ago. Why? Because so many others cover that much better than I ever could dream of.
I never really post news about myself either. Why? Because I doubt you really care, and I’d rather blog about BI. (I have to say it was difficult not to mention how proud I am to have achieved the Analysis Services Maestro designation since, at the moment, there are only 11 in the world and 3 in the United States.)
All that is to say that I’m breaking with tradition to announce that I’m speaking at SQL Rally in Dallas this May. Why? Because I love meeting other SSAS geeks in the area (and hopefully one or two of you are actually reading this blog).
99 Tips for Tuning and Enhancing Analysis Services
All-day preconference session on Wednesday, May 09, 2012
Come and hear Greg Galloway share all his best tips for optimizing performance of Analysis Services and improving the user experience and reach of your cubes. This session is targeted to BI developers responsible for maintaining Analysis Services cubes and to DBAs who are tasked with monitoring and troubleshooting Analysis Services issues. Expect to leave with an actionable checklist of ideas to try on Monday when you return to work, with a deeper grasp of the internals of Analysis Services, with MDX optimization experience, and with best practices that ensure your projects succeed. This preconference session will focus on Multidimensional cubes and MDX but will briefly discuss the advantages and limitations of new Tabular models and DAX in Analysis Services 2012.
Making Heads or Tails of Analysis Services Storage
Regular 75 minute session during the conference May 10-11, 2012
If you are about to start a new Microsoft business intelligence project, you have more Analysis Services choices and flexibility than ever before. Come learn what’s new in Analysis Services 2012 and learn the advantages and limitations of the new Tabular models compared with classic Multidimensional models. In this session we will take a deep dive into storage modes in Analysis Services. We’ll discuss the two storage modes a Multidimensional model can use: MOLAP and ROLAP. We’ll discuss the storage modes in new Tabular models: VertiPaq and DirectQuery. We’ll preview the new columnstore index in SQL Server 2012 and discuss how it impacts Analysis Services. And we’ll take a deep dive into two successful Analysis Services deployments at two Fortune 500 companies and discuss why one chose ROLAP and the other chose MOLAP.
I will do my best to avoid much overlap between those sessions, so I look forward to seeing you in both of them. Register for SQL Rally soon since prices go up after March 14th. The pre-conference session is only $219, which I think is quite a bargain.
Artis is a silver sponsor and will definitely have a big presence at the conference. And my Artis co-worker and fellow MVP Tim Mitchell is pulling double duty in helping organize the conference and speaking on Data Quality Services. (As a side note, I have to applaud how the program committee excluded their own speaking sessions from selection and simply put them up for community vote. Tim and others on the committee are fantastic speakers, so the community rightly chose many of their sessions.)
Most of my blog posts chronicle some lesson learned in the heat of battle at a client site. This blog post is a bit different since the lesson learned came from an Artis internal project we started and completed last year. At Artis, our time entry system feeds into an Artis data warehouse and an Analysis Services cube for reporting. We also support writeback against the cube for project managers to enter forecasts for projects. That way, at the weekly staffing meeting, it’s pretty simple to see that Greg is forecasted to work 95 hours between four projects next week. At that point, preventative measures can be taken… like alerting local emergency rooms that I may be making a visit.
Seriously though, Artis is a great place to work since I get to work with people like Drew Jones and Cindy Liang (who actually implemented this resource planning cube), and I’ve never once gotten close to working 95 hours. I’ve got an MDX query to prove it! But I digress…
Writeback Cube Security
Since forecasting and resource planning is the goal, we use role-based security in the cube to prevent project managers from setting a forecast for a prior week (since it’s already too late to plan resources at that point). A typical scenario in industry where write security would be needed would be making sure each regional manager can only set the budget for his region. Setting up this security was a bit tricky, so I’ll describe it step-by-step.
The first step is to create a ReadAll role. In this case, the ReadAll role grants members access to read the whole cube. I’ve definitely seen a number of clients make mistakes here and grant extra permissions that are not required. Nothing needs to be checked on the General tab (not even Read definition which grants them access to see the metadata in tools like SSMS). On the Cubes tab you need Read and optionally Drillthrough access. If you want a user to be able to read the whole cube from a tool like Excel, that’s all you need to do.
Next, build a separate Writeback role. On the Cubes tab, grant Read/Write, check Process, and (optionally) Drillthrough permissions. I’m frankly not sure why you need Process permissions to perform ROLAP writeback, but it appears you need Process permissions for ROLAP or MOLAP writeback:
In the Data Sources tab, make sure to grant Read permissions on the data source used for writeback. For example, when I created my writeback partition, I used the Greg Test data source to direct that writeback table to live in another database (such as a database with Full recovery model):
Finally, if you want to only let them writeback to certain cells, control this on the Cell Data tab. Since I was prototyping against Adventure Works data, the expression below returns true when the current year is later than 2003. In your cube, the expression might look at a current week flag in the Date dimension or something like that.
Am I Seriously Recommending Cell Security?!?
Wait! Cell security! Won’t that kill performance? Definitely Analysis Services best practices call to avoid cell security in favor of dimension data security because cell security is a performance killer. The reason I use it here is because I don’t know of a way to accomplish the writeback security without cell security. The question is whether this particular combination of cell security designed to be write security impacts read performance. I verified this security only impacts the performance of writes with a simple trick. In the read/write permissions box above, I put in the following expression to prove this:
ASSP.FireTraceEventAndReturnValue(99) < 100
Let me explain what that’s doing. This FireTraceEventAndReturnValue function was designed for the very purpose of being able to detect via Profiler when Analysis Services evaluates it. So it simply fires a “User Defined” event in Profiler and then returns the integer you pass into the function. That way, you can watch Profiler and verify cell security isn’t being evaluated during read-only queries. It only appears in Profiler when you query the UPDATEABLE property or when you perform a write. So just building a PivotTable in Excel isn’t slowed down by the above cell security since Excel currently doesn’t query the UPDATEABLE cell property. But I’m getting ahead of myself as I’ll describe the UPDATEABLE cell property below.
Why is this the case that cell security didn’t impact read performance? Note that we used two roles. One role grants read permissions which let you read the whole cube. Another role grants write permissions but then says you can only write to recent years. If we had just created the Writeback role, users would not have been able to read 2003 data and cell security would impact read performance, too. By creating two roles, the read and write permissions are managed separately.
As a side note, we have a support case open to fix a bug where attempting to write data to 2003 sometimes causes SSAS to crash. It appears this may happen when the user is in two roles, as described above. Hopefully that bug will be fixed soon. (Update:It looks like it will be fixed in SQL 2008 R2 SP1 CU6.) Coincidentally, the VBA described below provides a workaround to the bug.
Testing Out Writeback in Excel 2010
Now that we’ve setup our role-based security, let’s see what the user experience is in Excel 2010. We start a PivotTable as usual. But since I’m an administrator on my Analysis Services dev server, I need to tweak the connection string to pretend to be a lower privileged user. Go to the Data tab, click Connections, and click Properties for the connection. Then append either of the following snippets to the end of the connection string:
The first pretends to be a member of the ReadAll and the Writeback roles. This works fine as long as the security is hardcoded (i.e. not dynamic security which uses the UserName() function in MDX expressions in the role). The EffectiveUserName option is another option which impersonates a specific user, and I’ve blogged about it before.
Now I’ll build a simple PivotTable then enable What-If analysis. What happens when I edit a 2003 cell and publish my changes? I get this error message:
The error message is at least helpful in describing what went wrong. But it’s not helpful in describing which cell was secured. Also, it’s not helpful that it waited until I was finished entering my data before giving me an error during the Publish. Some immediate feedback would have been nice.
So how do you tell in advance which cells are secured and which cells are updateable. Fortunately, there’s a handy cell property called UPDATEABLE which does just that. Unfortunately, Excel 2010 what-if analysis doesn’t leverage it, so I have proposed this feature. For now we can use a little VBA to roll our own.
Using VBA to Query the UPDATEABLE Cell Property
The following VBA code catches the Worksheet’s PivotTableAfterValueChange event. I could have taken a number of approaches, and I’ll discuss the pros and cons of this approach below, but I chose this approach for this prototype because it was the most straightforward. When you type in a new value for a cell in a PivotTable, this event fires. This code borrows the ADO connection from the PivotTable and runs a simple query that retrieves the UPDATEABLE cell property for that cell’s coordinates in the cube. I definitely haven’t explored all the possible values of the UPDATEABLE cell property, so you’ll need to verify this code with your cube. But for the simple example in this blog post, I’ve seen two possible values. If UPDATEABLE is CELL_UPDATE_ENABLED then writeback is allowed and the cell is not secured. If UPDATEABLE is CELL_UPDATE_NOT_ENABLED_SECURE then the cell is secured and will produce an error upon writeback, so we discard that change and popup the error message:
In my mind, that’s a better user experience. You get immediate feedback that tells you which cell is the problem. So without further ado, here’s the code:
Private Sub Worksheet_PivotTableAfterValueChange(ByVal TargetPivotTable As PivotTable, ByVal TargetRange As Range)
On Error GoTo ErrHandler
Application.ScreenUpdating = False ' turns off screen updating
Const MD_MASK_ENABLED = &H0
Const MD_MASK_NOT_ENABLED = &H10000000
Const CELL_UPDATE_ENABLED = &H1
Const CELL_UPDATE_ENABLED_WITH_UPDATE = &H2
Const CELL_UPDATE_NOT_ENABLED_FORMULA = &H10000001
Const CELL_UPDATE_NOT_ENABLED_NONSUM_MEASURE = &H10000002
Const CELL_UPDATE_NOT_ENABLED_NACELL_VIRTUALCUBE = &H10000003
Const CELL_UPDATE_NOT_ENABLED_SECURE = &H10000005
Const CELL_UPDATE_NOT_ENABLED_CALCLEVEL = &H10000006
Const CELL_UPDATE_NOT_ENABLED_CANNOTUPDATE = &H10000007
Const CELL_UPDATE_NOT_ENABLED_INVALIDDIMENSIONTYPE = &H10000009
Dim pcache As PivotCache
Set pcache = TargetPivotTable.PivotCache
If Not pcache.IsConnected Then
Dim oCellset As New ADOMD.cellset
oCellset.ActiveConnection = pcache.ADOConnection
Dim sError As String
Dim oPivotTableCell As Range
For Each oPivotTableCell In TargetRange.Cells
Dim oPivotCell As PivotCell
Set oPivotCell = oPivotTableCell.PivotCell
Dim sMDX As String
sMDX = "select from [" & pcache.CommandText & "] WHERE " _
& oPivotCell.MDX & " CELL PROPERTIES UPDATEABLE"
oCellset.Source = sMDX
Dim iUPDATEABLE As Long
iUPDATEABLE = oCellset(0).Properties("UPDATEABLE")
If iUPDATEABLE = CELL_UPDATE_ENABLED Then
ElseIf iUPDATEABLE = CELL_UPDATE_NOT_ENABLED_SECURE Then
sError = sError & "Cell " & Replace(oPivotTableCell.Address, "$", "") _
& " is secured." & vbCrLf
If sError <> "" Then
sError = "The following cells do not allow writeback, " _
& "so their values were discarded." & vbCrLf & vbCrLf & sError
Application.ScreenUpdating = True ' turns on screen updating
There is one main downside to this approach. It can slow down the data entry process since after each cell is changed, we execute one MDX query per cell to retrieve the UPDATEABLE cell property. This is generally fairly snappy as running 100 of these MDX queries only took a second against the local network and several seconds over a slower VPN connection. But when there are a bunch of secured cells you attempt to update, discarding each change one cell at a time takes a couple of seconds, which adds up. Another downside to consider is that if you enter a change on Monday then save the workbook without publishing those changes, when you open the workbook on Tuesday your writeback permissions in the cube may have changed yet this VBA code doesn’t rerun for previous changes you’ve entered but not published.
Another approach would be to catch the PivotTableUpdate event and grab the coordinates of all the cells and run one MDX query to retrieve the UPDATEABLE property for all cells in the PivotTable, caching this information for later when the user updates cells. The reason this approach would be more complex is because the .MDX property of a subtotal PivotCell returns a tuple with one less member than a PivotCell that’s a detail level cell since the All member isn’t mentioned in the subtotal’s MDX. And you can’t easily combine two tuples with different dimensionality. So I went with the more straightforward approach for this prototype.
Note in the VBA above the use of the ADOMD.Cellset object. This requires a reference to any version of Microsoft ActiveX Data Objects (Multi-dimensional), and you can add that reference from within the VBA editor from the Tools… References menu:
What’s He Not Telling Me?
This post focused on role-based security for writeback and how to improve that user experience. I didn’t talk about setting up a writeback partition or whether to choose MOLAP or ROLAP writeback. I didn’t discuss how allocation method impacts writeback performance. I didn’t discuss how to writeback to non-leaf cells. I didn’t discuss the necessity of periodically moving data from the writeback table into the regular fact table. Maybe I’ll blog about those in the future.
I also didn’t discuss the pros and cons of the different architectures we could have chosen. This project lent itself to extending our existing time entry reporting cube with forecast writeback, but we could have also chosen to implement a good budgeting and planning tool like deFacto. That decision process is a blog post for another time.
To summarize, be sure to grant read permissions and write permissions in separate roles if the user should be able to read more of the cube than they can write to. The UPDATEABLE cell property can come in handy if you want to know in advance which cells a user has permissions to update. With a little VBA, it’s pretty straightforward to improve the writeback experience where writeback security is involved by querying the UPDATEABLE cell property.
1. Fixing some VBA formatting/syntax issues that happened when I pasted my the code into my blog
2. Updating the status of a hotfix for the SSAS bug I mentioned which will be fixed in SQL 2008 R2 SP1 CU6.
As an homage to my two favorite detectives, Nero Wolfe and Gregory House, here’s an Analysis Services detective story for you. If you enjoy the thrill of the chase, read on. If you take yourself too seriously and just want the facts, skip to the Conclusions section at the end.
Our story begins on a busy summer day in which our hero (me) finds himself covering for a co-worker on vacation. I get a call that queries are running slow on the Analysis Services box, so I go investigate. The first thing I noticed was that Task Manager shows zero free memory...
Image 1: Windows Server 2008 Task Manager showing zero free RAM
Chapter 1 (Or How I Discovered Analysis Services Didn’t Want More Memory)
Honestly, my first instinct was that Analysis Services thought it was in a low memory situation, and I suspected that might explain the performance issues. So I decided to investigate what was consuming that memory. You could look at msmdsrv.exe memory usage in Task Manager, but there are times when it doesn’t show the right number. So I checked the “MSOLAP$instance:Memory:Memory Usage KB” perfmon counter, and it told me SSAS was only using 1GB of memory. Considering this was a dedicated SSAS box, and considering LowMemoryLimit was set at 6GB and TotalMemoryLimit was set at 8GB, either SSAS didn’t need the memory, or worse, it did need it but didn’t think there was enough memory available.
Akshai Mirchandani on the SSAS dev team was kind enough to tell me that SSAS uses the GlobalMemoryStatusEx Windows API to determine how much available memory there is at the moment. I wrote a little EXE to pull that number, and I found that it told me there were 8GB available. That number matched the “Memory : Available MBytes” perfmon counter, as it should. I learned the Free number showing in Task Manager just means zeroed out RAM, so it’s nothing to be concerned with as long as there’s still enough Available memory. In short, ignore the “Free” number in Task Manager. The confusion caused by showing Free but not Available in Windows 2008 Task Manager is probably the reason they added Available back into Task Manager in Windows 2008 R2.
Akshai also suggested I double check it was 64-bit SSAS installed since 32-bit SSAS is limited to 2GB memory usage. So I ran the following from an XMLA window in Management Studio, looked for the <Edition> tag in the result, and verified it ended in 64 (i.e. <Edition>Enterprise64</Edition> means 64-bit enterprise edition).
My conclusion was that SSAS knew there was memory available to it and it could have grabbed more memory if it needed it, so it must not have needed the memory.
Chapter 2 (Or How I Discovered the Standby Cache)
Even though there was 8GB available RAM, I still wanted to track down what was consuming the memory such that Windows reported there was 0MB free. Knowing that when SSAS reads a file off disk, it uses a buffered read which is able to cache the file in the Windows System File Cache, I went to check the “Memory : Cache Bytes” perfmon counter. It said the Windows System File Cache was using 3GB currently. Since the cube in question was 30GB, and since I had set the LimitSystemFileCacheSizeMB setting to 3GB to make sure the system file cache didn’t get out of hand, I wasn’t surprised it was using exactly 3GB. (3GB was the value I had chosen on this server after studying usage patterns during querying and processing. This is just a dev server, so it doesn’t have as much memory as it really needed to perform optimally. For at 30GB cube, we would recommend at least 30GB of RAM usually. But for this dev server, we have to make due with what we have.)
Recapping, SSAS was using 1GB, and the system file cache was using 3GB. The server had 12GB total. Do the math. Something else is using 8GB of memory on this dedicated SSAS box. There weren’t any processes in Task Manager that were using significant chunks of RAM.
At this point, the Task Manager number saying Cached was about 10GB gave me my next hint. I began researching Windows 2008 caching and made one of my most significant discoveries of 2010. The discovery was this: there’s a standby cache. There were several articles that helped me figure this out, but one of the key articles was the Too Much Cache blog post which mentions the standby cache briefly. Let me explain how it works. Let’s say you let your active system file cache grow to 4GB, then you set the LimitSystemFileCacheSizeMB setting to 3GB. That extra 1GB doesn’t get zeroed; it just gets recategorized as “Standby Cache” rather than “Active Cache” and is now categorized as “available” so it can be quickly zeroed and given to other processes if needed. Or if the active system file cache needs it again, it can be quickly soft faulted back into active system file cache.
Why did I say this was such an important discovery? As Somak noted in the Too Much Cache blog post, grabbing a page off disk takes about 10 milliseconds, whereas grabbing a page out of RAM takes about 10 nanoseconds… that’s 1 million times faster. So obviously the system file cache is crucial to SSAS query performance. And accounting for the system file cache is key to repeatability in testing MDX query performance—you will never be able to compare two executions of an MDX query unless you can ensure the system file cache is clear before both executions. I’ll get to how you can clear the active and standby cache in a second.
At this point, I discovered a killer Sysinternals tool called RAMMap which gives you great insight into the system file cache. I ran a few queries against Adventure Works before the following screenshot. Note the Internet Sales 2004 partition is in the active system file cache, as shown on the File Summary tab:
Image 2: RAMMap showing active system file cache
Now I clear the active system file cache using ASSP (which I’ll mention later). Note the cached files just got moved into the standby cache:
Image 3: RAMMap showing standby system file cache
Now if I clear the standby cache using ASSP (which I’ll mention later) or using RAMMap’s “Empty… Empty Standby List” menu, you’ll note those Adventure Works files drop out of the standby cache.
Anyway, back to our detective story. Poor performance on a query had been reported, but I couldn’t reproduce it. I thought I was testing that query on a cold cache because I had cleared the active system file cache and the Analysis Services caches, but I hadn’t accounted for the standby cache. So the query was performing OK for me because many of the files were in the standby cache. Once I cleared the standby cache, I could reproduce the poor performing query, and I ended up building a new aggregation that solved the problem.
The lesson learned was to pay attention to the standby cache (the “Memory : Standby Cache Normal Priority Bytes” perfmon counter) in addition to the active cache (the “Memory : Cache Bytes” perfmon counter).
Chapter 3 (Or How ASSP Got Its Groove Back and Cleared the Standby Cache)
Being able to use the “Empty… Empty Standby List” menu item in RAMMap is nice, but wrapping that Windows API call into an Analysis Services stored procedure lets you clear the standby cache right from a Management Studio query window (or with an automated testing tool). So I added this functionality to the ASSP project. Shout out to Tim Mangan and Marco Russo who both helped track down that Windows API call.
About a year and a half ago I added some code into the ASSP project to clear the system file cache. I thought it would help MDX query test repeatability, but it strangely had little impact. Now I know why—it was only clearing the active system file cache (which simply shuffled those pages into the standby cache). Now ASSP has its groove back because it clears both the active and standby cache with the ClearFileSystemCache sproc. So if you want to test on a true cold cache without rebooting the server, the best practice is to run run the ClearAllCaches sproc (which internally clears both the Analysis Services cache with the ClearCache sproc and clears the active and standby system file cache with the ClearFileSystemCache sproc), then run your MDX query, all from an MDX window in Management Studio: call ASSP.ClearAllCaches()
<your MDX query here>
Chapter 4 (Or How I Stand on my Soapbox and Condemn Paging)
During the discussion above, I have only been speaking of memory in terms of physical RAM (rather than virtual memory). I can hear Marco Russo chiding me and whispering in my ear, “Programs request virtual memory from Windows, and Windows allocates RAM or pages memory to disk as it sees fit.” While he is right, physical RAM is all that’s relevant in practice. If there’s much paging going on, performance will be terrible. And this brings me to a favorite soapbox.
I can’t count the number of times I’ve seen a new client’s server in the following state. The server has SQL and SSAS on the same box. Both are set to the default memory limits (which is no memory cap for SQL and 80% of server memory for SSAS). By the time I see the server, SQL and SSAS are fighting each other for memory, causing each other to page out, and performance degrades dramatically. (Remember what I said about disk being a million times slower than RAM?)
What’s the solution? Cap all the large consumers of memory such that they don’t step on each others’ toes. That’s an easy principle, but it gets very difficult when you have to go through the testing iterations to prove your server performs better overall with memory limits for SQL and SSAS. For example, I have observed one particular stored procedure take 12 hours to complete if SQL was capped at 6GB of memory, but that same stored procedure would complete in 3 hours if SQL was capped at 14GB of memory. The point is that you have to thoroughly test your memory limit changes. But in my experience, a server with a well-tested memory usage plan will perform better and more consistently because it largely avoids paging.
An example memory usage plan on a 24GB RAM server with SQL and SSAS might be similar to the following. SQL will be capped at 10GB using the “maximum server memory” setting. SSAS will be capped at 8GB using TotalMemoryLimit/HardMemoryLimit settings (in combination with LowMemoryLimit). The active system file cache will be capped at 4GB using the LimitSystemFileCacheSizeMB setting. And we’ll assume the OS will use about 1GB, and programs like SSMS or BIDS will use about 1GB. Note the standby system file cache doesn’t need to be figured into your memory usage plan because that memory is categorized by Windows as Available.
|Memory Consumer ||GB |
|SQL ||10 |
|SSAS ||8 |
|Active System File Cache ||4 |
|OS ||1 |
|Programs like SSMS ||1 |
| ||24 |
Even better, if you determine that during the ETL, SQL needs more memory and SSAS needs less, it is easy to build a SQL script and an XMLA script to programmatically change those memory limits before and after the ETL.
Another best practice to consider is using the “lock pages in memory” setting to prevent paging. This KB article describes how to enable this setting for SQL Server. For Analysis Services, once you’ve set that group policy, use the PreAllocate setting to tell Analysis Services to grab a chunk of RAM that won’t be paged out. (Even without “lock pages in memory” PreAllocate helps SSAS memory allocation performance on Windows 2003. On Windows 2008, the main benefit of PreAllocate is to lock pages in memory.) Just be aware (and don’t panic) that Analysis Services will take quite a while to start and will pretty much lock up the server during that period.
Conclusion (Or How I Summarize Seven Pages In One Paragraph)
Ignore the “Free” number in Task Manager and focus on the “Memory : Available MBytes” perfmon counter. Be aware of the active system file cache (the “Memory : Cache Bytes” perfmon counter) and the standby system file cache (the “Memory : Standby Cache Normal Priority Bytes” perfmon counter) as they both impact MDX query performance drastically. Use RAMMap to see what’s in the system file cache. To test MDX query performance on a cold cache, use ASSP.ClearFileSystemCache. Avoid paging by capping the memory usage of large consumers of memory and by using the “lock pages in memory” group policy. Test before and after.
It's common to implement your role-based security in your Analysis Services cube. That way Fred can only see the sales from the east region, but Frank can only see sales from the west region. Role-based security requires connections to your cube identify the end user so the cube can secure itself. One way of tackling this is to implement Kerberos so that the credentials of the end user pass from the end user's laptop, to the report server, to the cube.
But what if your end user's laptop is on another company's domain? Or what if your end user's laptop is on a home workgroup? That laptop can't start a Kerberos ticket that will be valid on your domain. One option is to throw ISA Server 2006 into this mix, use it to present the user with a login form over HTTPS, then let ISA start the Kerberos ticket:
For either of the above diagrams, you would setup your data source in SSRS as follows:
We would definitely recommend the above approach, but we realize that ISA and Kerberos add a good deal of complexity to your infrastructure. So the point of this blog post is to mention a good trick for getting role-based security without requiring your end-users' laptop to be able to start a Kerberos ticket.
Before I mention the main trick that's the focus of this post, I should mention that enabling basic auth over HTTPS in SSRS is one possibility to solve this problem. It's not as good of a user experience as the single sign-on login form you can customize with ISA Server, but it works.
A Checkbox in SSRS Many People Misunderstand
The following setup of a data source in SSRS will allow you to pass the end user's credentials to the cube without having a Kerberos ticket start at the end user's laptop, and without having ISA Server.
Because you stored credentials, SSRS will open the connection to SSAS using those domain\SSASAdminUser credentials. Because you checked the "Impersonate the authenticated user after a connection has been made to the data source" checkbox, behind the scenes SSRS is going to append the EffectiveUserName property into the connection string, specifying the end user who is running the report. The EffectiveUserName property tells SSAS to pretend to be the specified user so that the results of MDX queries use the role-based security setup for that end user.
Only SSAS administrators can use the EffectiveUserName property, so the domain\SSASAdminUser must be an SSAS admin. The other requirement is that you must setup Kerberos on the SSAS server. To be clear, you need to setup the SPN on the SSAS server and you need to setup the AD configurations for delegation, but you don't need to setup Kerberos on any other servers besides the SSAS server. So this is a fairly quick way to get up in running without having your whole network setup for Kerberos.
Since you're using a service account which is an SSAS admin, carefully validate that you've setup the data source correctly to use EffectiveUserName impersonation. You wouldn't want to show your users more data than they're allowed to see just because you forgot to check a box!
What happens when an end user tries to create a new subscription on a report with the above data source setup? You get this annoying error message: "Subscriptions cannot be created because the credentials used to run the report are not stored, or if a linked report, the link is no longer valid." It says the credentials have to be stored. Uh. They are. But it's just complaining that we've checked the "impersonate the authenticated user" checkbox. My personal opinion is that this is an oversight. Feel free to vote for this Connect suggestion.
So we have to go for a workaround. The best workaround I could find is to use an expression-based connection string. So edit your report in BIDS, edit the data source, click the fx (expression) button, and enter the following expression-based connection string:
="Data Source=SSAS;Initial Catalog=YourDB;EffectiveUserName=" & User!UserID
After you deploy it, check the data source has stored credentials. Don't check the "Impersonate the authenticated user" checkbox.
Now users can create their own subscriptions. When the subscriptions run, the user receives report data according to his role-based security.
Now with your expression-based connection string, you try to create a data-driven subscription. And you get this error message: "Data-driven subscriptions cannot be created because the credentials used to run the report are not stored, the report is using user-defined parameter values, or if a linked report, the link is no longer valid." I believe it's complaining about the user of User!UserID.
You need to create a separate report for data-driven subscriptions. The expression-based connection string in the report needs to be changed to:
="Data Source=SSAS;Initial Catalog=YourDB;EffectiveUserName=" & Parameters!User.Value
Then you need to create a hidden parameter called User. Then the data-driven subscription needs to pass in the appropriate domain\username into that parameter.
Make sure end users don't have access to this report designed for data-driven subscriptions because if they figure out how to set the User parameter, they could run the report and see someone else's data.
The Dark Side of Expression-Based Connection Strings
Note expression-based connection strings can't be used in shared data sources. So you have to manage this one report at a time. Plus you can't even set the expression-based connection string from Report Manager. So you have to set the expression-based connection string in the report at development time, and this makes it a pain to deploy the same report to dev and prod without changing the RDL. Also, once you have changed your report to use an expression-based connection string, you can't open the query designer. So expression-based connection strings really get in the way of the proper development, deployment, and manageability experience.
So you might consider developing your reports with regular shared data sources. Then after you deploy the reports to the server, you can run .NET code like the RSExpressionBasedConnectionString sample code I worked up. Once you've got the sample code, edit the app.config pointing it at a folder in SSRS. It loops through all the reports in that folder, finds SSAS data sources, and changes them all to expression-based connection strings. You'll obviously need to tweak the code a good deal to have it do exactly what you want in your environment, but hopefully that's a start. I might know a few good consultants if you happen to need help customizing this for your needs. :-)
Teo Lachev also suggested to me that you could use an RDCE to accomplish the same thing as that .NET code above. But just like with the .NET code above which has to run after the report has been deployed, you'd have to set each report to use the RDCE after it is deployed. Neither option is ideal, but both would work.
Knowing about that "Impersonate the authenticated user" checkbox is a handy trick to have in your bag. Knowing that it's using EffectiveUserName under the covers is very helpful because it lets you use an expression-based connection string to enable subscriptions. But expression-based connection strings are almost more pain than they are worth. So you might consider automating the setup of those expression-based connection strings with some code like what I have mentioned above.
If I were the supreme leader of the Excel team (and I've had trouble convincing them of that), one of the first things I would do is add tooltips in the PivotTable field list so that when you mouse over a field, it would describe that field. I've asked for that feature, but so far, it doesn't look like it will make Excel 2010.
So... plan B. But let me first give some background on self-documenting cubes.
I'm big on self-documenting code because I've had too many experiences where separate documentation gets stale and irrelevant. Plus, I'm lazy. Thus, anything that makes it easier for me to keep documentation up-to-date and relevant is a good thing in my book.
Having descriptive names for measures and dimensions is one aspect of self-documenting cubes. You'll see descriptive calculation names like "Internet Sales Amount Variance to Annual Quota" in my cubes. But what I really want to focus on in this post is the Description property which appears on measures, dimensions, and all other objects in Analysis Services.
A few years ago at a user group, I was speaking with one prominent member of the Analysis Services team (who shall remain unnamed), and when I mentioned the Description property, he asked incredulously whether anybody ever filled those in. If you're wondering the same thing, the reason I fill in every single one is because it's a cinch to generate documentation that's relevant and up-to-date. You can use various tools that are out there. Or you can easily write a simple custom .NET app which loops through measures and dimension attributes and writes the descriptions to an HTML document. Or you can make use of descriptions with the following idea.
Since we don't get tooltips on the field list in PivotTables for now, what's plan B? I'll credit my co-worker Drew Jones with this idea. He suggested using actions in cubes to approximate this as much as possible. Here's what we came up with.
Right-click on any numeric cell in a PivotTable, and you can see the first 100 characters of the description property of that measure under Additional Actions...
Here's how you do that. First of all, you'll need the latest version of the ASSP assembly which includes the DiscoverSingleValue functions. Then you'll need to create an action in your cube with the following setup:
|Target type ||Cells |
|Target object ||All cells |
|Condition || |
, "<MEASURE_NAME>" + [Measures].CurrentMember.Name + "</MEASURE_NAME>"
+ "<CUBE_NAME>" + [Measures].CurrentMember.Properties("CUBE_NAME") + "</CUBE_NAME>"
|Action Content Type ||URL |
|Action Expression ||"http://intranet/Documentation/CubeDocumentation.html" |
|Caption || |
, "<MEASURE_NAME>" + [Measures].CurrentMember.Name + "</MEASURE_NAME>"
+ "<CUBE_NAME>" + [Measures].CurrentMember.Properties("CUBE_NAME") + "</CUBE_NAME>"
, "<MEASURE_NAME>" + [Measures].CurrentMember.Name + "</MEASURE_NAME>"
+ "<CUBE_NAME>" + [Measures].CurrentMember.Properties("CUBE_NAME") + "</CUBE_NAME>"
|Caption is MDX ||True |
The action looks like the following in BIDS when you're done:
The only downside is that Excel 2007 doesn't even ask for actions on cells any time there is a multi-select in a PivotTable filter. But other than that limitation, it's a pretty good workaround.
I have been collecting a few advanced tidbits about semi-additive measures over the last few months, and it's time to share them.
Role-Playing Time Dimensions
Semi-additive aggregation functions like LastNonEmpty work just like Sum on all dimensions except for the Time dimension. How does it determine which is "the Time dimension"? It looks for the dimension marked Type=Time:
But what happens when, like most cubes, you have multiple role-playing versions of that dimension in your cube? Which does it choose? Flip to the Dimension Usage tab in BIDS. For the column representing that measure group with the semi-additive measure, find the role-playing date dimensions. I've circled them in green below. Then find the topmost one that's connected to the measure group. I've circled that in orange below. That's the semi-additive dimension for this measure group. A LastNonEmpty measure in this measure group will look for the most recent day with data in that Date dimension.
If you discover that the wrong dimension is the semi-additive dimension, it's very easy to fix. You just need to make the correct dimension the topmost connected role-playing date dimension by reordering the cube dimensions. To reorder the cube dimensions, flip to the Cube Structure tab, look at the list of cube dimensions in the bottom left, then drag and drop them into the correct order, similar to the following:
I suggest you arrange the cube so that the semi-additive dimension be the same for every measure group. This is probably a topic for a post another time, but I'm a big proponent for having a dimension simply called Date which is connected to every single measure group as the most common meaning of date. This dimension should be the semi-additive dimension.
I was warned that there have been some code changes in the Analysis Services product in this area at some point. So I tested this behavior in AS2005 RTM, SP1, SP2, SP3, and AS2008 RTM. I found it worked as I described above in all those versions. But test your cube thoroughly.
On a related note, the BIDS Helper Validate Aggs feature understands semi-additive measures and role-playing dimensions correctly. If you’ve got a measure group with only semi-additive measures, and if there are any aggs that don’t include the granularity attribute of the correct semi-additive dimension, then it warns you that those aggs will not be used in most situations.
Performance of LastNonEmpty vs. LastChild
On my largest cube, we decided to go with LastChild over LastNonEmpty. The driving reason was that LastNonEmpty hit more partitions than it needed to. To be specific, if we asked for a LastNonEmpty measure for 2008, it would hit every partition in 2008, not just the latest. LastChild only hits the latest partition. I would like to credit Robert Skoglund of Microsoft Consulting Services with making this observation and suggesting the solution.
The only hurdle was that the current month/quarter/year weren't complete, so the LastChild didn't have any data. That was easy to fix with an addition to the MDX script like the following:
// the most recent day with data
CREATE HIDDEN SET [Yesterday]
AS Exists([Date].[Calendar].[Date].Members, [Date].[Yesterday Flag].&[True]);
// The Inventory measures are AggregateFunction=LastChild.
// When we are in a current period, use the most recent day to
// get the inventory counts.
SCOPE(Exists([Date].[Calendar].[Month].Members, [Date].[Yesterday Flag].&[True]));
this = [Yesterday].Item(0).Item(0);
SCOPE(Exists([Date].[Calendar].[Quarter].Members, [Date].[Yesterday Flag].&[True]));
this = [Yesterday].Item(0).Item(0);
SCOPE(Exists([Date].[Calendar].[Year].Members, [Date].[Yesterday Flag].&[True]));
this = [Yesterday].Item(0).Item(0);
[Date].[Calendar].[All] = [Yesterday].Item(0).Item(0);
Another reason for going with LastChild over LastNonEmpty is that the results are often more intuitive in situations where not every single item is snapshotted every single period. Teo Lachev explained this problem very well in his post Last Non Empty Affairs. Depending on the business requirements, you could workaround the problem as he did with MDX that looks at each Customer, or you could change the ETL to snapshot every single item every period, or you could use LastChild to ensure that your results come from a consistent time period. The LastChild trick won't work in every situation, but it is a handy trick to have in your bag.
We have chosen to use relative paths in Integration Services on several projects now, and since it has worked well, it is time to share this tip with the community. The whole purpose of using relative paths is to make packages more portable, as opposed to packages and dtsConfig files having to live in a specific directory structure or drive letter on servers and developers’ laptops. As evidenced by the volume of posts in the forums about problems with deploying SSIS packages, anything that makes packages more portable will make deployment less prone to error.
When I say “relative paths in SSIS” I mean the following:
- .dtsx files and .dtsConfig files are all inside one directory
- Execute Package tasks refer to the package to execute using a File connection manager, and that connection manager refers to the .dtsx file with a relative path (i.e. just the filename)
- Package Configurations are of type “XML configuration file” and the path to the .dtsConfig file is just a relative path (i.e. just the filename)
- dtexec executes packages from disk, not from SQL Server or from the SSIS package store
- When dtexec is run, the working directory has to be set to the directory with all your files. This is accomplished by running dtexec from a .bat file which uses the CD command to set the working directory.
- When designing packages, Visual Studio must be started with the appropriate working directory which is the directory with all the packages. This can be accomplished by double clicking the .sln file if it is in the same directory as the packages. Or you can write a .bat file that runs the CD command to change the working directory before it starts Visual Studio.
When setting up an Execute Package task, mimic the following screenshots to use a relative path:
When setting up a package configuration, use the relative path:
Unfortunately, in SSIS 2008, this Package Configuration Wizard screen does not allow you to enter a plain relative path due to a bug. However, there is a BIDS Helper feature which allows you to workaround this bug.
Setting Working Directory
An example .bat file which runs dtexec with the correct working directory would look like the following. The first line changes to the drive of the working directory. The second line actually changes to that directory. The third line runs dtexec.
dtexec /FILE "master.dtsx" /MAXCONCURRENT " -1 " /CHECKPOINTING OFF /REPORTING EW
When setting up a SQL Agent job step, you cannot use the “SQL Server Integration Services Package” job step type in a SQL Agent job since it does not let you set the working directory. But you can use the “Operating System (CmdExec)” job step type which lets you run the .bat file. (To go further, some will argue here and here that you should always use the “Operating System (CmdExec)” even when you’re not needing to set the working directory for relative paths.) To run this bat file, set the “command” property to the full path to the bat file in quotes:
An example .bat file which runs Visual Studio with the correct working directory would look like the following. The first line changes to the drive of the working directory. The second line actually changes to that directory. The third line runs dtexec.
"C:\Program Files\Microsoft Visual Studio 8\Common7\IDE\devenv.exe" "D:\MyPackages\test.sln"
If you forget and start Visual Studio from the Start Menu (such that the working directory is not set to your packages directory), the .dtsConfig file it creates for you from your relative path will end up in the “C:\Program Files\Microsoft Visual Studio 8\Common7\IDE” directory.
Adding .dtsConfig Files to Your Visual Studio Solution
You can include .dtsConfig files in your solution in Visual Studio. Move the .dtsConfig out of the directory with your .dtsx files, then in Windows Explorer copy the .dtsConfig file to your clipboard, then left click on the Miscellaneous folder in Solution Explorer in Visual Studio, then type Ctrl-V to paste it into your SSIS project. The .dtsConfig file is now in the same directory as your .dtsx files (which is important for relative paths, as discussed above). And your .dtsConfig file can now be checked into source control with your .dtsx files. And the .dtsConfig file will be readily accessible in Visual Studio. Opening that .dtsConfig by double clicking it, you can pretty-print the XML by pressing Ctrl-K then Ctrl-D. (Or if you have BIDS Helper installed, it will automatically pretty-print it for you upon opening it.)
As a side note, I should mention in this context that whenever I’m working with package configurations, I use the Expression and Configuration Highlighter feature of BIDS Helper which gives you a little teal triangle on anything controlled by a package configuration. I always look to see that every connection manager is controlled by an expression or a configuration.
There are several other blog posts I know of (here and here) around this topic, but they suggest solutions that are a bit more intrusive than the one we prefer which I’ve outlined in this post.
This relative paths strategy has saved us some headaches several times when moving the deployment destination for packages to different environments. So I would recommend you consider using the relative paths trick when deciding on the deployment and configuration strategy for your ETL project.