Home
About Us
News
Solutions
Experience
Blogs
Resources
Greg Galloway
Using EffectiveUserName To Impersonate in SSAS

It's common to implement your role-based security in your Analysis Services cube. That way Fred can only see the sales from the east region, but Frank can only see sales from the west region. Role-based security requires connections to your cube identify the end user so the cube can secure itself. One way of tackling this is to implement Kerberos so that the credentials of the end user pass from the end user's laptop, to the report server, to the cube.

image

But what if your end user's laptop is on another company's domain? Or what if your end user's laptop is on a home workgroup? That laptop can't start a Kerberos ticket that will be valid on your domain. One option is to throw ISA Server 2006 into this mix, use it to present the user with a login form over HTTPS, then let ISA start the Kerberos ticket:

image

For either of the above diagrams, you would setup your data source in SSRS as follows:

image

We would definitely recommend the above approach, but we realize that ISA and Kerberos add a good deal of complexity to your infrastructure. So the point of this blog post is to mention a good trick for getting role-based security without requiring your end-users' laptop to be able to start a Kerberos ticket.

Before I mention the main trick that's the focus of this post, I should mention that enabling basic auth over HTTPS in SSRS is one possibility to solve this problem. It's not as good of a user experience as the single sign-on login form you can customize with ISA Server, but it works.


A Checkbox in SSRS Many People Misunderstand

The following setup of a data source in SSRS will allow you to pass the end user's credentials to the cube without having a Kerberos ticket start at the end user's laptop, and without having ISA Server.

image

Because you stored credentials, SSRS will open the connection to SSAS using those domain\SSASAdminUser credentials. Because you checked the "Impersonate the authenticated user after a connection has been made to the data source" checkbox, behind the scenes SSRS is going to append the EffectiveUserName property into the connection string, specifying the end user who is running the report. The EffectiveUserName property tells SSAS to pretend to be the specified user so that the results of MDX queries use the role-based security setup for that end user.

Only SSAS administrators can use the EffectiveUserName property, so the domain\SSASAdminUser must be an SSAS admin. The other requirement is that you must setup Kerberos on the SSAS server. To be clear, you need to setup the SPN on the SSAS server and you need to setup the AD configurations for delegation, but you don't need to setup Kerberos on any other servers besides the SSAS server. So this is a fairly quick way to get up in running without having your whole network setup for Kerberos.

Since you're using a service account which is an SSAS admin, carefully validate that you've setup the data source correctly to use EffectiveUserName impersonation. You wouldn't want to show your users more data than they're allowed to see just because you forgot to check a box!


SSRS Subscriptions

What happens when an end user tries to create a new subscription on a report with the above data source setup? You get this annoying error message: "Subscriptions cannot be created because the credentials used to run the report are not stored, or if a linked report, the link is no longer valid." It says the credentials have to be stored. Uh. They are. But it's just complaining that we've checked the "impersonate the authenticated user" checkbox. My personal opinion is that this is an oversight. Feel free to vote for this Connect suggestion.

So we have to go for a workaround. The best workaround I could find is to use an expression-based connection string. So edit your report in BIDS, edit the data source, click the fx (expression) button, and enter the following expression-based connection string:

="Data Source=SSAS;Initial Catalog=YourDB;EffectiveUserName=" & User!UserID

After you deploy it, check the data source has stored credentials. Don't check the "Impersonate the authenticated user" checkbox.

image

Now users can create their own subscriptions. When the subscriptions run, the user receives report data according to his role-based security.


Data-Driven Subscriptions

Now with your expression-based connection string, you try to create a data-driven subscription. And you get this error message: "Data-driven subscriptions cannot be created because the credentials used to run the report are not stored, the report is using user-defined parameter values, or if a linked report, the link is no longer valid." I believe it's complaining about the user of User!UserID.

You need to create a separate report for data-driven subscriptions. The expression-based connection string in the report needs to be changed to:

="Data Source=SSAS;Initial Catalog=YourDB;EffectiveUserName=" & Parameters!User.Value

Then you need to create a hidden parameter called User. Then the data-driven subscription needs to pass in the appropriate domain\username into that parameter.

Make sure end users don't have access to this report designed for data-driven subscriptions because if they figure out how to set the User parameter, they could run the report and see someone else's data.


The Dark Side of Expression-Based Connection Strings

Note expression-based connection strings can't be used in shared data sources. So you have to manage this one report at a time. Plus you can't even set the expression-based connection string from Report Manager. So you have to set the expression-based connection string in the report at development time, and this makes it a pain to deploy the same report to dev and prod without changing the RDL. Also, once you have changed your report to use an expression-based connection string, you can't open the query designer. So expression-based connection strings really get in the way of the proper development, deployment, and manageability experience.

So you might consider developing your reports with regular shared data sources. Then after you deploy the reports to the server, you can run .NET code like the RSExpressionBasedConnectionString sample code I worked up. (You'll need to request a login to download that sample code.) Once you've got the sample code, edit the app.config pointing it at a folder in SSRS. It loops through all the reports in that folder, finds SSAS data sources, and changes them all to expression-based connection strings. You'll obviously need to tweak the code a good deal to have it do exactly what you want in your environment, but hopefully that's a start. I might know a few good consultants if you happen to need help customizing this for your needs. :-)

Teo Lachev also suggested to me that you could use an RDCE to accomplish the same thing as that .NET code above. But just like with the .NET code above which has to run after the report has been deployed, you'd have to set each report to use the RDCE after it is deployed. Neither option is ideal, but both would work.


Conclusion

Knowing about that "Impersonate the authenticated user" checkbox is a handy trick to have in your bag. Knowing that it's using EffectiveUserName under the covers is very helpful because it lets you use an expression-based connection string to enable subscriptions. But expression-based connection strings are almost more pain than they are worth. So you might consider automating the setup of those expression-based connection strings with some code like what I have mentioned above.

Self-Documenting Cubes in Excel PivotTables

imageIf I were the supreme leader of the Excel team (and I've had trouble convincing them of that), one of the first things I would do is add tooltips in the PivotTable field list so that when you mouse over a field, it would describe that field. I've asked for that feature, but so far, it doesn't look like it will make Excel 2010.

So... plan B. But let me first give some background on self-documenting cubes.


Self-Documenting Cubes

I'm big on self-documenting code because I've had too many experiences where separate documentation gets stale and irrelevant. Plus, I'm lazy. Thus, anything that makes it easier for me to keep documentation up-to-date and relevant is a good thing in my book.

Having descriptive names for measures and dimensions is one aspect of self-documenting cubes. You'll see descriptive calculation names like "Internet Sales Amount Variance to Annual Quota" in my cubes. But what I really want to focus on in this post is the Description property which appears on measures, dimensions, and all other objects in Analysis Services.

imageA few years ago at a user group, I was speaking with one prominent member of the Analysis Services team (who shall remain unnamed), and when I mentioned the Description property, he asked incredulously whether anybody ever filled those in. If you're wondering the same thing, the reason I fill in every single one is because it's a cinch to generate documentation that's relevant and up-to-date. You can use various tools that are out there. Or you can easily write a simple custom .NET app which loops through measures and dimension attributes and writes the descriptions to an HTML document. Or you can make use of descriptions with the following idea.


Plan B

Since we don't get tooltips on the field list in PivotTables for now, what's plan B? I'll credit my co-worker Drew Jones with this idea. He suggested using actions in cubes to approximate this as much as possible. Here's what we came up with.

Right-click on any numeric cell in a PivotTable, and you can see the first 100 characters of the description property of that measure under Additional Actions...

image

Here's how you do that. First of all, you'll need the latest version of the ASSP assembly which includes the DiscoverSingleValue functions. Then you'll need to create an action in your cube with the following setup:

Target type Cells
Target object All cells
Condition

Len(
ASSP.DiscoverSingleValue(
  "Description"
  , "MDSCHEMA_MEASURES"
  , "<MEASURE_NAME>" + [Measures].CurrentMember.Name + "</MEASURE_NAME>"
    + "<CUBE_NAME>" + [Measures].CurrentMember.Properties("CUBE_NAME") + "</CUBE_NAME>"
)
)>0

Action Content Type URL
Action Expression "http://intranet/Documentation/CubeDocumentation.html"
Caption

"Definition: "
+ Left(
   ASSP.DiscoverSingleValue(
    "Description"
    , "MDSCHEMA_MEASURES"
    , "<MEASURE_NAME>" + [Measures].CurrentMember.Name + "</MEASURE_NAME>"
      + "<CUBE_NAME>" + [Measures].CurrentMember.Properties("CUBE_NAME") + "</CUBE_NAME>"
   )
  ,100
  )
+ iif(
   Len(
    ASSP.DiscoverSingleValue(
     "Description"
     , "MDSCHEMA_MEASURES"
     , "<MEASURE_NAME>" + [Measures].CurrentMember.Name + "</MEASURE_NAME>"
       + "<CUBE_NAME>" + [Measures].CurrentMember.Properties("CUBE_NAME") + "</CUBE_NAME>"
    )
   )>100
   ,"..."
   ,""
  )

Caption is MDX True


The action looks like the following in BIDS when you're done:

image

The only downside is that Excel 2007 doesn't even ask for actions on cells any time there is a multi-select in a PivotTable filter. But other than that limitation, it's a pretty good workaround.

LastNonEmpty Gotchas

I have been collecting a few advanced tidbits about semi-additive measures over the last few months, and it's time to share them.

Role-Playing Time Dimensions

Semi-additive aggregation functions like LastNonEmpty work just like Sum on all dimensions except for the Time dimension. How does it determine which is "the Time dimension"? It looks for the dimension marked Type=Time:

TypeTime

But what happens when, like most cubes, you have multiple role-playing versions of that dimension in your cube? Which does it choose? Flip to the Dimension Usage tab in BIDS. For the column representing that measure group with the semi-additive measure, find the role-playing date dimensions. I've circled them in green below. Then find the topmost one that's connected to the measure group. I've circled that in orange below. That's the semi-additive dimension for this measure group. A LastNonEmpty measure in this measure group will look for the most recent day with data in that Date dimension.

DimensionUsage

If you discover that the wrong dimension is the semi-additive dimension, it's very easy to fix. You just need to make the correct dimension the topmost connected role-playing date dimension by reordering the cube dimensions. To reorder the cube dimensions, flip to the Cube Structure tab, look at the list of cube dimensions in the bottom left, then drag and drop them into the correct order, similar to the following:

ReorderCubeDimensions

I suggest you arrange the cube so that the semi-additive dimension be the same for every measure group. This is probably a topic for a post another time, but I'm a big proponent for having a dimension simply called Date which is connected to every single measure group as the most common meaning of date. This dimension should be the semi-additive dimension.

I was warned that there have been some code changes in the Analysis Services product in this area at some point. So I tested this behavior in AS2005 RTM, SP1, SP2, SP3, and AS2008 RTM. I found it worked as I described above in all those versions. But test your cube thoroughly.

On a related note, the BIDS Helper Validate Aggs feature understands semi-additive measures and role-playing dimensions correctly. If you’ve got a measure group with only semi-additive measures, and if there are any aggs that don’t include the granularity attribute of the correct semi-additive dimension, then it warns you that those aggs will not be used in most situations.

Performance of LastNonEmpty vs. LastChild

On my largest cube, we decided to go with LastChild over LastNonEmpty. The driving reason was that LastNonEmpty hit more partitions than it needed to. To be specific, if we asked for a LastNonEmpty measure for 2008, it would hit every partition in 2008, not just the latest. LastChild only hits the latest partition. I would like to credit Robert Skoglund of Microsoft Consulting Services with making this observation and suggesting the solution.

The only hurdle was that the current month/quarter/year weren't complete, so the LastChild didn't have any data. That was easy to fix with an addition to the MDX script like the following:

// the most recent day with data
CREATE HIDDEN SET [Yesterday]
AS Exists([Date].[Calendar].[Date].Members, [Date].[Yesterday Flag].&[True]);

// The Inventory measures are AggregateFunction=LastChild.
// When we are in a current period, use the most recent day to
// get the inventory counts.
SCOPE(MeasureGroupMeasures("Inventory"));

 
SCOPE(Exists([Date].[Calendar].[Month].Members, [Date].[Yesterday Flag].&[True]));
    this = [Yesterday].Item(0).Item(0);
  END SCOPE;
 
  SCOPE(Exists([Date].[Calendar].[Quarter].Members, [Date].[Yesterday Flag].&[True]));
    this = [Yesterday].Item(0).Item(0);
  END SCOPE;
 
  SCOPE(Exists([Date].[Calendar].[Year].Members, [Date].[Yesterday Flag].&[True]));
    this = [Yesterday].Item(0).Item(0);
  END SCOPE;
 
  [Date].[Calendar].[All] = [Yesterday].Item(0).Item(0);
 
END SCOPE;


Another reason for going with LastChild over LastNonEmpty is that the results are often more intuitive in situations where not every single item is snapshotted every single period. Teo Lachev explained this problem very well in his post Last Non Empty Affairs. Depending on the business requirements, you could workaround the problem as he did with MDX that looks at each Customer, or you could change the ETL to snapshot every single item every period, or you could use LastChild to ensure that your results come from a consistent time period. The LastChild trick won't work in every situation, but it is a handy trick to have in your bag.

Analysis Services Project Configurations

If you’ve regularly found yourself changing the target server property and the data source connection string before you deploy your cube to another environment, it’s simply because you’re not aware of a feature that hasn’t been publicized well. (I’m just judging based on my own ignorance of this feature until recently. However, now that I go look, both this article and Teo Lachev’s book mention Configuration Manager, though they don’t mention the exact list of settings that are configurable.)

Visual Studio Configurations
Visual Studio has a configurations feature so that you can specify settings for different build types. In .NET projects, it’s usually used to compile debug versus release builds. For business intelligence projects, it can be used to setup a configuration per deployment target environment:
ConfigurationsDropdown

What settings can be saved per configuration? Obviously, all the settings in the Project Properties dialog:
PropertyPages

What I Didn’t Know…
What I didn’t know until recently was that there are other settings (like the data source connection strings) in the Analysis Services project which can be controlled per configuration. Basically, any setting which is environment specific is “configurable.” I’ll enumerate those settings below.

The other tidbit to discuss is where these settings are stored. Some are stored in the .dwproj file and others are stored in the .dwproj.user file. The settings in the dwproj file can be considered “shared” because this file is checked into source control and shared among team members. The settings in the dwproj.user file can be considered “private” as this file often not checked into source control or shared. The counterargument is that the dwproj.user file holds the TargetServer setting which is a crucial part of setting up separate Visual Studio configurations per environment, so you may want to consider checking in that file.

Configurable Settings
The following table enumerates all the settings controlled per configuration, so you should be sure to review each of them when you setup a new deployment environment.

Setting Name

Storage File

Where To Edit Setting

ConnectionMappings

dwproj

Double click data source… edit connection string. The connection string is saved with the SQL security username and password removed.

ConnectionProviderMappings

dwproj

Double click data source… change provider

ConnectionSecurityMappings

dwproj

Double click data source…  this setting indicates whether SQL security username and password was removed when saved to disk. The username and passwords are stored in the dwproj.user file (see the UserIDs and UserPasswords setting below.)

CubeKeyErrorLogFiles

dwproj

Cube properties… Advanced… ErrorConfiguration… (custom)… KeyErrorLogFile

CubeStorageLocations

dwproj

Cube properties… Configurable… StorageLocation

DeploymentMode

dwproj.user

Project Properties dialog

DeploymentServerEdition

dwproj

Project Properties dialog

DimensionKeyErrorLogFiles

dwproj

Dimension Properties… Advanced… ErrorConfiguration… (custom)… KeyErrorLogFile

MeasureGroupKeyErrorLogFiles

dwproj

Measure Group Properties… Advanced… ErrorConfiguration… (custom)… KeyErrorLogFile

MiningStructureKeyErrorLogFiles

dwproj

Mining Structure Properties… Advanced… ErrorConfiguration… (custom)… KeyErrorLogFile

OutputPath

dwproj

Project Properties dialog

PartitionKeyErrorLogFiles

dwproj

Partition Properties… Advanced… ErrorConfiguration… (custom)… KeyErrorLogFile

PartitionRemoteServers

dwproj

Only applicable when using remote partitions

PartitionStorageLocations

dwproj

Partition properties… Configurable… StorageLocation

ProcessingOption

dwproj.user

Project Properties dialog

RemovePasswords

dwproj

Project Properties dialog

ReportActionPaths

dwproj

Cube Actions tab… properties on a report action… Path property

ReportActionServers

dwproj

Cube Actions tab… properties on a report action… ReportServer property

StartObject

dwproj.user

Project Properties dialog

TargetDatabase

dwproj.user

Project Properties dialog

TargetServer

dwproj.user

Project Properties dialog

TransactionalDeployment

dwproj.user

Project Properties dialog

UserIDs

dwproj.user

Double click data source…  set the SQL security username

UserPasswords

dwproj.user

Double click data source…  set the SQL security password


Visual Studio configurations are very helpful for SSAS. In fact, I would call it a best practice to use them.

Configurations vs. Deployment Wizard
When is it appropriate to use Visual Studio configurations, and when is it appropriate to use the Analysis Services Deployment Wizard? I would recommend you always use Visual Studio configurations, and if you’re doing a more advanced deployment where the Deployment Wizard would help, then you can use it, too. (The Deployment Wizard has some advanced deployment options like retaining partitions on the server, or retaining security roles on the server, etc.)

The Deployment Wizard works very well together with Visual Studio configurations. When you build or deploy your cube from Visual Studio, it writes files to the bin directory. The “bin\.configsettings” file contains all your Analysis Services project configurations, and that file drives the defaults on the “Specify Configuration Properties” screen in the Deployment Wizard. If you’ve properly setup your Visual Studio configurations, then you’ll need to do less work in the Deployment Wizard.

Visual Studio Configurations for SSIS and SSRS
Integration Services has a project properties dialog that lets you configure settings regarding debugging and regarding the building of the deployment manifest for use with the Package Installation Wizard. (And thanks to BIDS Helper, it has settings regarding deploying packages directly from BIDS.) And obviously those settings from the project properties dialog can be stored per Visual Studio configuration. SSIS projects do also have a Data Sources folder in Solution Explorer, and the connection strings are stored per Visual Studio configuration. However, the Data Sources folder is more trouble than it is worth. (See the comments from June 22, 2007 at the bottom of this post of Jamie Thompson’s.) You should use package configurations to control connection strings, and package configurations are unrelated to Visual Studio configurations.

Reporting Services has a project properties dialog that lets you configure settings regarding deploying of reports. These settings can be stored per Visual Studio configuration. However there are no other settings that are stored per configuration. Unfortunately, connection strings of the data sources in the Shared Data Sources folder are not controlled per configuration. (But the OverwriteDataSources setting in the project properties dialog prevents this from hurting you much. You just have to manually configure the data source in Report Manager the first time you deploy that data source, and then it will not be overwritten on subsequent deployments.)

Incremental Many-to-Many Matrix Package

I was privileged to be able to work with Carl Rabeler from the SQLCAT team on a good whitepaper which performance tests several optimization techniques for many-to-many dimensions in Analysis Services. The conclusion was that a technique they call Matrix Relationship Optimization with the appropriate aggs results in the best MDX query performance. In a nutshell, the idea is that there are recurring patterns or signatures among m2m dimensions. For instance, in the Adventure Works database, every sales order in the database shares one of twelve common sales reason signatures. Query performance can be improved by reusing these shared signatures.

This optimization technique pushes some of the work to the ETL layer. Along with the whitepaper, they distributed sample Integration Services packages to load the matrix relationship tables. Those packages were, I thought, rather ingenious in their use of the script transforms to pivot and unpivot the sales reasons data and accomplish everything in one dataflow task. The packages they distributed are coded to do a full refresh during each run. I created a spinoff of their sample package that does an incremental load, so I thought I would post it in case it helps others get started. Download IncrementalSalesReasonMatrixLoad.zip and run against a fresh copy of AdventureWorksDW.

I would recommend using the incremental load package if (a) you have the ability to easily detect which rows in the source system have changed since the last successful run and (b) if the incremental load will not produce many unused signatures over time.

The whitepaper also discusses aggregations for m2m dimensions in detail. Carl’s experience with real customers suggested that aggs on the intermediate measure group are usually very helpful (i.e. are considerably smaller than the intermediate measure group itself). Thankfully, these aggs can be built as normal since the Query Subcube event in Profiler accurately reports the subcube vector.

On the other hand, he found that in real life, aggs on the data measure group to help with queries against m2m dimensions often end up being about the same size as the measure group itself, thus they are not helpful. It all just depends on your data. An agg is smaller on disk not only because it has a lower rowcount, but because it has fewer dimension keys. (It’s not as tall and not as wide.) Therefore, data measure groups with a large number of dimensions unused in queries benefit from aggs, even if the rowcount of the resulting agg alone isn’t much less than the measure group rowcount itself. So it is helpful to know how to build m2m aggs on the data measure group. Unfortunately, these aggs must be built by hand using Agg Manager, and they are rather tricky to build. Let me suggest the following steps for building them:

  1. Turn on Query Logging and set QueryLogSampling to 1.
  2. Install ASSP and run the following before each MDX query: CALL ASSP.ClearCache()
  3. Run your MDX queries which use the m2m dimension.
  4. Turn off Query Logging by setting QueryLogSampling to 0.
  5. Install BIDS Helper and launch Agg Manager. Right click on the agg design for your data measure group (e.g. Internet Sales) and choose “Add Aggregations from Query Log.” Click the “Execute SQL” button and then click OK after the query returns. Right click on the agg design and choose Edit.
  6. (Instead of using query logging, you can just run profiler and manually cut and paste the subcube vector from the Query Subcube event if you prefer.)
  7. Click the “Eliminate Redundancy” and “Eliminate Duplicates” buttons as usual.
  8. Click the “Validate Aggregations” button. It should prompt you with a message similar to: “This aggregation contains many-to-many dimension [Sales Reason]. It will not be used unless it also contains [Internet Sales Order Details].[Internet Sales Order]. The many-to-many dimension [Sales Reason] itself should not be included in the aggregation to workaround a bug.” For each agg with such a warning, edit that agg, then double-click the attribute in the m2m dimension which is already selected. This will unselect then attempt to reselect that attribute. At this point, Agg Manager will prompt you with a message similar to: “This aggregation will not be used when querying many-to-many dimension [Sales Reason] unless it also contains [Internet Sales Order Details].[Internet Sales Order]. The many-to-many dimension [Sales Reason] itself should not be included in the aggregation to workaround a bug. Would you like BIDS Helper to fix this for you?” Click Yes.

The other idea around many-to-many dimension optimization that I would like to mention is adding extra dimensions (particularly the time dimension) to the intermediate measure group and partitioning by them. Chris Webb explained this idea, too. To put it in my terms, if there’s a good deal of turnover among m2m signatures over time (i.e. signatures are retired over time) then this is a strategy to consider. In the whitepaper there are only a few tests in which the Matrix Relationship Optimization technique doesn’t perform as well as the other design alternatives. (The best example is on page 33.) I suspect that adding the time dimension to the intermediate measure group will help (though possibly at the expense of performance on other queries). As for the whitepaper, we agreed this situation is a bit more of a fringe scenario because retiring m2m signatures doesn’t happen as frequently in the real world, so we didn’t complicate the whitepaper by mentioning it. However, it is something to be aware of as you’re analyzing your m2m dimension data.

I’m glad to see that Erik Veerman’s m2m optimization ideas (#1 and #2) have now been fully fleshed out and publicized.

Long Navigation Parameters

I was shocked recently when my nice report that was executing in 2 seconds suddenly took 22 seconds to finish after I enabled navigation (i.e. drillthrough links) in every textbox in the report. The point of this post is to explain what’s going on and suggest workarounds to improve performance.


The Symptoms

This particular report was an MDX report in Reporting Services 2005. If your MDX reports are anything like mine, you end up having an obscene number of multi-select parameters to give the end user maximum flexibility. This report was no exception. In SQL reports, the parameter values are often integers. However, in MDX reports, the parameter values are usually MDX unique member names such as [Product].[Category].&[Bikes]. You can already see the problem. If you have many large multi-select parameters, the concatenated list of selected parameter values becomes large. In fact, for this report, that list ended up being about 30KB long. Multiply that times 3,000 textboxes in the report, and your speedy report is now a 90MB beast.

Another interesting thing I noticed is that of the TimeDataRetrieval, TimeProcessing, and TimeRendering columns in ReportServer.dbo.ExecutionLog, the only one that changed when I turned on navigation was the TimeProcessing column.

Also, you might guess shortening the name your “Geography” report parameter to “G” would help, but in fact, the length of the report parameter name does not matter (just the length of the selected parameter values).

The final interesting thing I noticed is that the Drillthrough chunk in ReportServerTempDB was massive. Clearly, part of the expense of lots of huge drillthrough links was in constructing and saving that chunk to ReportServerTempDB. The following query shows you the chunks and their sizes:

select sd.SnapshotDataID
,sd.CreatedDate
,sd.ExpirationDate
,s.ReportPath
,s.Expiration
,s.CreationTime
,c.ChunkName
,datalength(c.content) as ChunkBytes
from ReportServerTempDB.dbo.SnapshotData sd
join ReportServerTempDB.dbo.SessionData s
on s.SnapshotDataID = sd.SnapshotDataID
join ReportServerTempDB.dbo.ChunkData c
on c.SnapshotDataID = sd.SnapshotDataID
order by sd.CreatedDate, c.ChunkName

I did a little research into how drillthrough works in SSRS 2005. Here is my understanding based solely on observation of the HTML in Report Manager and watching SQL profiler events against the ReportServer database. When a report is rendered in Report Manager, it does not pass the entire URL for each drillthrough link to the web browser. All it does is capture the cell coordinates you clicked when you click a drillthrough link. Then it posts that event back to the server. Then the server runs a few SQL queries against ReportServerTempDB to lookup info from that ChunkData table. I was rather impressed with how efficient this lookup was as it did not require reading the entire chunk. From what I can tell, that drillthrough chunk is some sort of binary structure that is internally indexed somehow. When the server goes to lookup the full drillthrough URL for the link you just clicked, it grabs the first few segments of that drillthrough chunk (which I’m guessing is loading the “index” for the entire chunk) and then it skips right to the segment of that chunk which has the info it needs. So my conclusion is that in SSRS 2005, they have done a good job optimizing the click of a drillthrough link. However, the up-front processing required to optimized that click is rather expensive.

Indications so far are that SSRS 2008 will improve this scenario. I tried this in the November CTP of SSRS 2008 and the report with the navigation rendered the first page 2x faster than on SSRS 2005 SP2. Interestingly enough, the time difference between a report with navigation and a report without navigation now shows up in SSRS 2008 under the TimeRendering column, not the TimeProcessing column. Also, the chunks are stored differently in SSRS 2008. Hopefully someone from Microsoft will fill us in on how they were able to achieve this performance improvement. (Update: Robert Bruckner from Microsoft responded here.)


Workaround #1

Short of waiting for SSRS 2008, what is the workaround? I wrote an MDX sproc called ASSP.KeysStrToSet. You can refer to all the technical details, get some usage examples, and download some sample reports at that link. But let me explain the workaround at a high level. There are two parts.

First, you need to find a way to shorten the parameter values. In general, this is pretty easy as you can just use the surrogate key (i.e. use “1” as the key instead of “[Product].[Product].&[1]”). For optimal performance, change the parameter datatype to Integer if possible. However, there are a couple of more advanced scenarios such as composite keys or nullable keys which add some complexity. The link to the Analysis Services Stored Procedure project above explains all these advanced scenarios and the solution.

Second, you need to figure out a way to let MDX accept those shortened parameter values. That’s where the ASSP.KeysStrToSet function comes into play. Here’s an example. The following two expressions return the same set:

StrToSet("{ [Product].[Product].&[1],[Product].[Product].&[2] }")
ASSP.KeysStrToSet("[Product].[Product]", "1,2")

How much improvement should you expect from this workaround? Our tests on reports with lots of navigation-enabled textboxes have shown anywhere from a 2x to an 8x performance improvement in the time it takes to render to the browser in Report Manager. Of course, that depends upon how many textboxes have navigation enabled and on how large the drillthrough links are for each textbox.

I suspect utilizing this workaround with SSRS 2008 will make the performance improvement even greater. (Unfortunately, I can’t test that theory on the November CTP at the moment because of a bug.)

If you agree with this approach, please vote for this feature suggestion so that KeysStrToSet functionality becomes available in a built-in MDX function.


Workaround #2

Another possible workaround is to add a “Show Drillthrough?” parameter that controls whether or not to enable drillthrough on the textboxes. Then in the “Jump to Report” property, put the following expression:
=IIf(Parameters!ShowDrill.Value,"MyDrillReportName",Nothing)

In SSRS 2005 (it’s not necessary in SSRS 2008 apparently), you also have to wrap each parameter you hook up to the drill report in that way:
=IIf(Parameters!ShowDrill.Value,Parameters!City.Value,Nothing)

You should probably use workaround #2 in conjunction with workaround #1 to achieve good performance when they flip the “Show Drillthrough?” parameter to True.

Note you may have to make that “Show Drillthrough?” parameter a string to workaround a bug.


Conclusions

Enabling drillthrough on every cell in a report can drastically slow report rendering if the drillthrough parameters are very long. However, you can adequately workaround this problem by shortening the parameter values and dynamically disabling drillthrough when it is not needed.

Retrieving Cell Properties

The point of this post is to discuss how to retrieve cell properties. Specifically, I want to call attention to the ReturnCellProperties setting on an Analysis Services connection and command. I will also talk about how you can use cell properties in Reporting Services reports.

It must be said that Teo Lachev beat me on publishing this tip. I had already drafted most of this post and had a couple of extra things to say on the topic, so I’m publishing it anyway.

In an MDX query, you’ve got dimension properties and cell properties. Dimension properties are intrinsic member properties such as MEMBER_CAPTION in addition to member properties you define in your dimension (such as the Customer attribute having a member property called Email). Cell properties are properties such as VALUE, FORMATTED_VALUE, and FORMAT_STRING which are simply extra information about a cell.

In most cubes, an effort has been made to properly set the FORMAT_STRING for all measures so that percents are formatted as percents and currency is formatted as currency. (Sometimes, the calc script even customizes the FORMAT_STRING for particular slices of the cube.) Therefore, it’s important to know how to retrieve the FORMATTED_VALUE cell property so that you can fully leverage all the cube provides you.

When executing an MDX query, you can retrieve the results in two-dimensional (DataReader/DataSet) or multi-dimensional (CellSet) format. The CellSet object lets you easily access any cell properties the query returns. On the other hand, it is not obvious how it is possible to retrieve any cell property other than VALUE in a DataReader/DataSet.

Putting the ReturnCellProperties=True property on the connection string changes the columns you get in a DataReader/DataSet:

AdomdConnection conn = new AdomdConnection();
conn.ConnectionString =
"Data Source=(local);Initial Catalog=YourDB;ReturnCellProperties=True";

Similarly, setting the ReturnCellProperties property on a command object will produce the same effect:

AdomdCommand cmd = new AdomdCommand();
cmd.Properties.Add("ReturnCellProperties", true);

This is obviously relevant if you’ve got a custom .NET application that executes MDX queries because it allows you to retrieve cell properties. It’s also relevant if you’re using MDX queries in Reporting Services. If you’re using the OLE DB provider to connect to Analysis Services (which is less common in SSRS2005), you can retrieve cell properties by adding ReturnCellProperties=True to the connection string which causes the extra columns to show up. However, a “Microsoft SQL Server Analysis Services” data source in Reporting Services automatically sets the ReturnCellProperties property (and the DbpropMsmdFlattened2 property, which I won’t discuss) to true on the underlying command object. That data source also hides the extra columns from you when you’re previewing your dataset, so you just have to trust that they are there.

select [Measures].[Internet Sales Amount] on 0,
[Product].[Category].Members on 1
from [Adventure Works]

If your MDX query for your report does not specify a CELL PROPERTIES clause (as in the query above), then you get the VALUE and FORMATTED_VALUE properties by default in Reporting Services. Obviously, =Fields!FieldName.Value retrieves the VALUE cell property. You can retrieve the FORMATTED_VALUE cell property with =Fields!FieldName.FormattedValue expression. More information on all the field properties you can use in SSRS expressions is available here.

select [Measures].[Internet Sales Amount] on 0,
[Product].[Category].Members on 1
from [Adventure Works]
CELL PROPERTIES VALUE, FORMATTED_VALUE, FORMAT_STRING

If your MDX query for your report does specify a CELL PROPERTIES clause (as in the query above), you override the defaults. This is useful for instructing Reporting Services to retrieve fewer properties for performance reasons. (For instance, a best practice is to specify “CELL PROPERTIES VALUE” so you don’t retrieve FORMATTED_VALUE if you’re not going to use the FormattedValue expression in your report.) Overriding the default CELL PROPERTIES is also useful to pull back extra properties such as FORMAT_STRING. Then in a report you can retrieve the FORMAT_STRING for a cell by using the =Fields!FieldName("FORMAT_STRING") expression.

There is a known limitation such that you cannot use the SSRS Aggregate expression and FormattedValue together. This limits the usefulness of the FormattedValue property. A workaround is to format the textbox in Reporting Services instead of using the FormattedValue property. This formatting could be hardcoded, but sometimes measures have different format strings for different slices of the cube. So if the format string is not constant, you can make the SSRS formatting dynamic by putting a =Fields!FieldName("FORMAT_STRING") expression on the Format property of a textbox. For footers, it would need to be =First(Fields!FieldName("FORMAT_STRING")). Note that .NET format strings (which SSRS uses) aren’t quite the same as SSAS format strings, so you may need to write a small function in the Code tab of your report and do replacements such as changing Currency to c0.

As a side note, Darren Gosbell blogged about an easy way to see the connection string properties for SSAS, but unfortunately, ReturnCellProperties is one of several properties that don’t appear in that list. You have to look at this page to see all the available properties. Mosha says that you can use all the XMLA properties on the connection string.

ProcessAdd Examples

The concept of incremental processing is well documented by T.K. Anand's fabulous whitepaper on processing (which you should probably digest before continuing), but there is a severe lack of full working example XMLA commands. The purpose of this post is to provide full working examples of ProcessAdd against Adventure Works.

1. ProcessAdd Partition (download ProcessAdd Partition.xmla)
This example uses an out-of-line query binding to run a ProcessAdd on the 2004 partition of the Internet Sales measure group. Real-life use for such an XMLA command would require that you specify an out-of-line SQL query which retrieves rows that currently do not exist in the cube. No such rows were available in AdventureWorksDW, so I just specified a query which would add duplicate rows to the partition for illustration purposes. This XMLA command can be created in Management Studio by right-clicking on a partition, choosing Process, changing to Process Incremental, clicking Configure, setting a query, then clicking the Script button:
ProcessIncremental

2. ProcessAdd Dimension (download ProcessAdd Dimension.xmla)
This example uses an out-of-line data source view to run a ProcessAdd on the Sales Territory dimension. Because dimensions can be built off of multiple tables from the DSV, you cannot use an out-of-line query binding like what was used for example #1. Instead, you have to use an out-of-line DSV. Real-life use for such an XMLA command would entail specifying a named query in the out-of-line DSV which returned only rows which did not exist in the Analysis Services dimension. Rows which already exist will be ignored, so it is wasteful to allow your query to return them. No such rows were available in AdventureWorksDW, so I just specified a query which returned rows that already exist in the dimension for illustration purposes. With the example out-of-line DSV using the query it does, the Sales Territory dimension will not actually get any extra members.

There is no UI for creating this XMLA command in Management Studio. So I recommend the following steps if you wish to build your own ProcessAdd command for a dimension.

  1. Right-click on the dimension in Management Studio, choose Process, change to ProcessUpdate, and click the Script button. Change ProcessUpdate to ProcessAdd in the XMLA script.
  2. In Business Intelligence Development Studio (BIDS), right-click in Solution Explorer on the Data Source object that currently provides the connection for processing this dimension, and choose View Code. Copy that XML. Paste it into the <Process> tag. Remove all the properties of the DataSource except for ID, Name, and ConnectionString. Note that even if you're not changing the connection string, it is still necessary to include an out-of-line data source with the out-of-line DSV. This is a known issue that will hopefully be fixed sometime after SP2.
  3. In BIDS, open the data source view which your dimension is currently processed off of. Right-click on the table(s) your dimension currently uses, and choose "Replace Table... With New Named Query..." Then edit the query specifying the appropriate where clause. Do not save the DSV changes. Right-click on the .dsv file in Solution Explorer and choose View Code. Copy the entire XML and paste it into the <Process> tag. Remove the CreatedTimestamp and LastSchemaUpdate properties (to avoid a warning message). There are other large sections (such as the Annotations) you can remove if you like, but they will not hurt anything.

Note that if you have more than one <Process> command, the out-of-line DSV must contain the tables required for all dimensions or partitions being processed which use that DSV normally. Here is such an example: Process Multiple Objects With Out-of-Line DSV.xmla

Normal processing involves Analysis Services querying a SQL database for the necessary data. Analysis Services also supports push mode processing where the data itself is provided in the XMLA command. Integration Services uses push mode processing whenever you have a Partition or Dimension Destination:
PartitionDestination
DimensionDestination

3. Push Mode ProcessAdd for Partition (download Push Mode ProcessAdd for Partition.xmla)
This example shows how to process a partition using push mode by specifying an out-of-line rowset. Generally push mode processing is done using Integration Services. If you need to do it without the aid of SSIS, I still recommend you build a simple package in SSIS, then run Profiler while you run the SSIS package. That XMLA query you will see in Profiler will not have the rowset schema or the actual data, but it will get you started. Note that in SSIS when you're using a Partition or Dimension destination, if a column is used twice (such as in the key and the name), you need to duplicate that column and put it in the pipeline twice. This is only a restriction of the UI in SSIS, not a restriction of push mode processing.

4. Push Mode ProcessAdd for Dimension (download Push Mode ProcessAdd for Dimension.xmla)
This example shows how to process a dimension using push mode by specifying an out-of-line rowset. See above for more information on push mode processing.

Miscellaneous Notes

  • If a dimension or partition is unprocessed, you can't do ProcessAdd on it. So make sure your processing procedures check this and do a ProcessFull the first time.
  • The MemoryUsage class of ASSP uses push mode processing to load a cube incrementally without reading from a SQL database.
  • This post goes hand-in-hand with another post which examines the performance of ProcessAdd.
  • Push mode processing is only allowed on partitions or dimensions which use one table from the DSV. So that eliminates snowflaked dimensions or measure groups with reference dimensions.
ProcessAdd on Large Dimensions

ProcessFull, ProcessUpdate, and ProcessAdd are the three processing options for keeping data fresh in dimensions in Analysis Services 2005. This post takes a look at these options as they relate to processing very large dimensions.


Background

When considering the different possibilities for a nightly Analysis Services processing strategy, we usually recommend the following evolution which attempts to weigh maintenance complexity versus processing performance. The first processing scheme is low complexity. The other partitioning schemes add complexity in order to improve performance.

Processing Scheme #1: ProcessFull entire database
If doing a ProcessFull on the entire database completes within the allowed time, then this is obviously the most straightforward method of keeping your cubes fresh. If ProcessFull takes too long, then...

Processing Scheme #2: ProcessUpdate dimensions and process fewest partitions possible
Next we recommend they look into partitioning their measure groups such that they only have to process a small subset of the partitions every night. For instance, if you partition by month, you may be able to just process the current month's partition nightly. Since not all partitions will be reprocessed nightly, dimension processing must be done so as not to unprocess the existing partitions. The easiest way is to accomplish this is to ProcessUpdate all dimensions, then ProcessData the partitions that have changed data (i.e. the current month), then ProcessDefault each measure group. (ProcessDefault will process aggs and indexes on the partitions you just processed. In addition, it will reprocess the flexible aggs which were dropped on old partitions during ProcessUpdate of dimensions.) This strategy is moderate complexity and requires custom coding to process your cube database. If this strategy isn't fast enough, it may be that there is an opportunity to optimize the processing of huge dimensions...

Processing Scheme #3: ProcessAdd dimensions instead of ProcessUpdate
Next we recommend looking into whether your huge dimensions allow updates or just inserts in your SQL data warehouse. If you can guarantee there will be only inserts and no updates, then you can speed up the processing of these huge dimensions by doing a ProcessAdd on them. This option adds more complexity on top of #2, so the purpose of this post is to discuss whether this option is worth it.

We have previously encountered a massive dimension which adds 20 million rows per month. At the time, it was being maintained using processing scheme #2, but the nightly processing performance needed to be tuned further. Since you can't partition dimensions in Analysis Services 2005, ProcessAdd is the option you're left with. I did the following research to decide whether it was worth the added complexity to recommend processing scheme #3. We ended up deciding it was worth it, but in order to do it, we had to move the columns from the large dimension which allowed updates and put them in a smaller, separate dimension. This seemed a reasonable compromise.

This post goes hand-in-hand with another post which gives full working XMLA examples of various ProcessAdd commands.


Setup for the Performance Test

I created a simple table:

create table LargeDimension (
ID int not null primary key clustered
,NumberAttribute int not null
,StringAttribute varchar(10) not null
,Subcategory varchar(10) not null
,Category varchar(10) not null
)

And built an Analysis Services database with nothing but the following dimension:

 LargeDimensionAttributeLattice

created with BIDS Helper

The only column that has high cardinality is the key (which is ID column in the table and is called Large Dimension in the attribute lattice picture above). Every other column has under 100 distinct values.


The Performance Test

The tests were done on an x64 server with 16GB of RAM (plenty for this test). I did two flavors of each test. On the first (the "Ordered" column below) I went with the defaults. All dimension attributes had AttributeHierarchyOptimizedState=FullyOptimized and AttributeHierarchyOrdered=true. On the second (the "Unordered" column below) I set AttributeHierarchyOrdered=false just on the key attribute. (I did try some other combinations but they didn't make a significant difference in terms of dimensions processing time. The AttributeHierarchyOptimizedState setting will be more significant if there are non-key attributes which are high cardinality. It will also impact performance of partition processing, but that's outside the scope of this post.)

Test Case Description

Ordered
Minutes

  Unordered
Minutes

A. ProcessFull 10 million rows

1:30

1:06

B. ProcessUpdate 10 million rows with zero rows changed

1:46

1:30

C. ProcessUpdate 10 million rows with 10% of rows changed

2:01

1:28

D. ProcessAdd one new row to the existing 10 million rows

0:54

0:21

E. ProcessAdd 1 million new rows to the existing 10 million rows

1:06

0:41

F. ProcessAdd 90 million new rows to the existing 10 million rows

17:36

12:09

G. ProcessFull 100 million rows

19:04

13:23

H. ProcessAdd one new row to the existing 100 million rows

11:32

5:04

I. ProcessAdd 10 million new rows to the existing 100 million rows

22:50

17:08

J. ProcessUpdate 100 million rows with no rows changed

34:15

29:02

K. ProcessUpdate 100 million rows with 10% of rows changed

34:24

29:08


Conclusions

  • It is clear that ProcessAdd is worth the extra effort if you need to further optimize ProcessUpdate dimension processing.
  • It is clear that AttributeHierarchyOrdered=false on high cardinality attributes makes a significant difference to dimension processing. Test case H shows that AttributeHierarchyOrdered=false on high cardinality attributes can double performance of ProcessAdd.
  • ProcessUpdate is by far the most expensive option available. It is so much more expensive than ProcessFull that you should definitely verify whether processing scheme #1 actually performs slower than processing scheme #2 on your dataset.

Other Notes

  • Test case H unordered took about 5 minutes which is about exactly the time it took to duplicate the entire SSAS dimension directory to the same drive via copy/paste. (Those SSAS dimension files for 100 million rows took up about 8GB on disk.) So a good way to estimate your best-case ProcessAdd performance is to manually duplicate the dimension files to the same drive. This makes sense because during processing, Analysis Services works on a duplicate set of files until finished, at which point it swaps in the new files and drops the old files. This is done so there will be no cube downtime during processing.
  • Chris Webb discusses an alternate way of dealing with large dimensions by breaking them up. If your dimension ID was 123456789 then you could break that into three dimensions with keys 123, 456, and 789. While this makes end user ad-hoc analysis more clumsy, it certainly gets around the problem of large dimensions being expensive to process.
  • I also tried processing the dimension I tested above using the ProcessingGroup=ByTable setting, and the server ran out of memory by the time it had read in 60 million rows. Incidentally, page 68 of the performance guide has some helpful information on deciding when ByTable may perform better than ByAttribute. But bottom line, you won't know until you test it.
  • Test ProcessAdd thoroughly to make sure it is stable on your hardware and dataset. I've had sporatic trouble with it getting into an infinite loop and running out of disk space. If I had to guess, it appears to happen when your dimension can't be loaded into memory. I've reported this bug here.
  • I also researched the possibility of making the large dimension a ROLAP dimension, but I was disappointed with query performance.