Search  
  July 5, 2008
Blog List

BI Quick Start

Raise your BI awareness with our special BI Quick Start offering. Learn more ...


  You are here:  Blogs   
Most recent blog entries
Analysis Services Project Configurations
Greg Galloway By Greg Galloway on 3/19/2008 6:12 PM

If you’ve regularly found yourself changing the target server property and the data source connection string before you deploy your cube to another environment, it’s simply because you’re not aware of a feature that hasn’t been publicized well. (I’m just judging based on my own ignorance of this feature until recently. However, now that I go look, both this article and Teo Lachev’s book mention Configuration Manager, though they don’t mention the exact list of settings that are configurable.)


Visual Studio Configurations
Visual Studio has a configurations feature so that you can specify settings for different build types. In .NET projects, it’s usually used to compile debug versus release builds. For business intelligence projects, it can be used to setup a configuration per deployment target environment:
Configurations Dropdown

What settings can be saved per configuration? Obviously, all the settings in the Project Properties dialog:
Property Pages 


What I Didn’t Know…
What I didn’t know until recently was that there are other settings (like the data source connection strings) in the Analysis Services project which can be controlled per configuration. Basically, any setting which is environment specific is “configurable.” I’ll enumerate those settings below.

The other tidbit to discuss is where these settings are stored. Some are stored in the .dwproj file and others are stored in the .dwproj.user file. The settings in the dwproj file can be considered “shared” because this file is checked into source control and shared among team members. The settings in the dwproj.user file can be considered “private” as this file often not checked into source control or shared. The counterargument is that the dwproj.user file holds the TargetServer setting which is a crucial part of setting up separate Visual Studio configurations per environment, so you may want to consider checking in that file.


Configurable Settings
The following table enumerates all the settings controlled per configuration, so you should be sure to review each of them when you setup a new deployment environment.

Setting Name

Storage File

Where To Edit Setting

ConnectionMappings

dwproj

Double click data source… edit connection string. The connection string is saved with the SQL security username and password removed.

ConnectionProviderMappings

dwproj

Double click data source… change provider

ConnectionSecurityMappings

dwproj

Double click data source…  this setting indicates whether SQL security username and password was removed when saved to disk. The username and passwords are stored in the dwproj.user file (see the UserIDs and UserPasswords setting below.)

CubeKeyErrorLogFiles

dwproj

Cube properties… Advanced… ErrorConfiguration… (custom)… KeyErrorLogFile

CubeStorageLocations

dwproj

Cube properties… Configurable… StorageLocation

DeploymentMode

dwproj.user

Project Properties dialog

DeploymentServerEdition

dwproj

Project Properties dialog

DimensionKeyErrorLogFiles

dwproj

Dimension Properties… Advanced… ErrorConfiguration… (custom)… KeyErrorLogFile

MeasureGroupKeyErrorLogFiles

dwproj

Measure Group Properties… Advanced… ErrorConfiguration… (custom)… KeyErrorLogFile

MiningStructureKeyErrorLogFiles

dwproj

Mining Structure Properties… Advanced… ErrorConfiguration… (custom)… KeyErrorLogFile

OutputPath

dwproj

Project Properties dialog

PartitionKeyErrorLogFiles

dwproj

Partition Properties… Advanced… ErrorConfiguration… (custom)… KeyErrorLogFile

PartitionRemoteServers

dwproj

Only applicable when using remote partitions

PartitionStorageLocations

dwproj

Partition properties… Configurable… StorageLocation

ProcessingOption

dwproj.user

Project Properties dialog

RemovePasswords

dwproj

Project Properties dialog

ReportActionPaths

dwproj

Cube Actions tab… properties on a report action… Path property

ReportActionServers

dwproj

Cube Actions tab… properties on a report action… ReportServer property

StartObject

dwproj.user

Project Properties dialog

TargetDatabase

dwproj.user

Project Properties dialog

TargetServer

dwproj.user

Project Properties dialog

TransactionalDeployment

dwproj.user

Project Properties dialog

UserIDs

dwproj.user

Double click data source…  set the SQL security username

UserPasswords

dwproj.user

Double click data source…  set the SQL security password

Visual Studio configurations are very helpful for SSAS. In fact, I would call it a best practice to use them.


Configurations vs. Deployment Wizard
When is it appropriate to use Visual Studio configurations, and when is it appropriate to use the Analysis Services Deployment Wizard? I would recommend you always use Visual Studio configurations, and if you’re doing a more advanced deployment where the Deployment Wizard would help, then you can use it, too. (The Deployment Wizard has some advanced deployment options like retaining partitions on the server, or retaining security roles on the server, etc.)

The Deployment Wizard works very well together with Visual Studio configurations. When you build or deploy your cube from Visual Studio, it writes files to the bin directory. The “bin\.configsettings” file contains all your Analysis Services project configurations, and that file drives the defaults on the “Specify Configuration Properties” screen in the Deployment Wizard. If you’ve properly setup your Visual Studio configurations, then you’ll need to do less work in the Deployment Wizard.


Visual Studio Configurations for SSIS and SSRS
Integration Services has a project properties dialog that lets you configure settings regarding debugging and regarding the building of the deployment manifest for use with the Package Installation Wizard. (And thanks to BIDS Helper, it has settings regarding deploying packages directly from BIDS.) And obviously those settings from the project properties dialog can be stored per Visual Studio configuration. SSIS projects do also have a Data Sources folder in Solution Explorer, and the connection strings are stored per Visual Studio configuration. However, the Data Sources folder is more trouble than it is worth. (See the comments from July 21, 2007 at the bottom of this post of Jamie Thompson’s.) You should use package configurations to control connection strings, and package configurations are unrelated to Visual Studio configurations.

Reporting Services has a project properties dialog that lets you configure settings regarding deploying of reports. These settings can be stored per Visual Studio configuration. However there are no other settings that are stored per configuration. Unfortunately, connection strings of the data sources in the Shared Data Sources folder are not controlled per configuration. (But the OverwriteDataSources setting in the project properties dialog prevents this from hurting you much. You just have to manually configure the data source in Report Manager the first time you deploy that data source, and then it will not be overwritten on subsequent deployments.)

Comments (1)

Incremental Many-to-Many Matrix Package
Greg Galloway By Greg Galloway on 1/9/2008 1:47 PM

I was privileged to be able to work with Carl Rabeler from the SQLCAT team on a good whitepaper which performance tests several optimization techniques for many-to-many dimensions in Analysis Services. The conclusion was that a technique they call Matrix Relationship Optimization with the appropriate aggs results in the best MDX query performance. In a nutshell, the idea is that there are recurring patterns or signatures among m2m dimensions. For instance, in the Adventure Works database, every sales order in the database shares one of twelve common sales reason signatures. Query performance can be improved by reusing these shared signatures.

This optimization technique pushes some of the work to the ETL layer. Along with the whitepaper, they distributed sample Integration Services packages to load the matrix relationship tables. Those packages were, I thought, rather ingenious in their use of the script transforms to pivot and unpivot the sales reasons data and accomplish everything in one dataflow task. The packages they distributed are coded to do a full refresh during each run. I created a spinoff of their sample package that does an incremental load, so I thought I would post it in case it helps others get started. Download IncrementalSalesReasonMatrixLoad.zip and run against a fresh copy of AdventureWorksDW.

Incremental Sales Reason Matrix Load Package Screenshot

I would recommend using the incremental load package if (a) you have the ability to easily detect which rows in the source system have changed since the last successful run and (b) if the incremental load will not produce many unused signatures over time.

The whitepaper also discusses aggregations for m2m dimensions in detail. Carl’s experience with real customers suggested that aggs on the intermediate measure group are usually very helpful (i.e. are considerably smaller than the intermediate measure group itself). Thankfully, these aggs can be built as normal since the Query Subcube event in Profiler accurately reports the subcube vector.

On the other hand, he found that in real life, aggs on the data measure group to help with queries against m2m dimensions often end up being about the same size as the measure group itself, thus they are not helpful. It all just depends on your data. An agg is smaller on disk not only because it has a lower rowcount, but because it has fewer dimension keys. (It’s not as tall and not as wide.) Therefore, data measure groups with a large number of dimensions unused in queries benefit from aggs, even if the rowcount of the resulting agg alone isn’t much less than the measure group rowcount itself. So it is helpful to know how to build m2m aggs on the data measure group. Unfortunately, these aggs must be built by hand using Agg Manager, and they are rather tricky to build. Let me suggest the following steps for building them:

  1. Turn on Query Logging and set QueryLogSampling to 1.
  2. Install ASSP and run the following before each MDX query: CALL ASSP.ClearCache()
  3. Run your MDX queries which use the m2m dimension.
  4. Turn off Query Logging by setting QueryLogSampling to 0.
  5. Install BIDS Helper and launch Agg Manager. Right click on the agg design for your data measure group (e.g. Internet Sales) and choose “Add Aggregations from Query Log.” Click the “Execute SQL” button and then click OK after the query returns. Right click on the agg design and choose Edit.
  6. (Instead of using query logging, you can just run profiler and manually cut and paste the subcube vector from the Query Subcube event if you prefer.)
  7. Click the “Eliminate Redundancy” and “Eliminate Duplicates” buttons as usual.
  8. Click the “Validate Aggregations” button. It should prompt you with a message similar to: “This aggregation contains many-to-many dimension [Sales Reason]. It will not be used unless it also contains [Internet Sales Order Details].[Internet Sales Order]. The many-to-many dimension [Sales Reason] itself should not be included in the aggregation to workaround a bug.” For each agg with such a warning, edit that agg, then double-click the attribute in the m2m dimension which is already selected. This will unselect then attempt to reselect that attribute. At this point, Agg Manager will prompt you with a message similar to: “This aggregation will not be used when querying many-to-many dimension [Sales Reason] unless it also contains [Internet Sales Order Details].[Internet Sales Order]. The many-to-many dimension [Sales Reason] itself should not be included in the aggregation to workaround a bug. Would you like BIDS Helper to fix this for you?” Click Yes.

The other idea around many-to-many dimension optimization that I would like to mention is adding extra dimensions (particularly the time dimension) to the intermediate measure group and partitioning by them. Chris Webb explained this idea, too. To put it in my terms, if there’s a good deal of turnover among m2m signatures over time (i.e. signatures are retired over time) then this is a strategy to consider. In the whitepaper there are only a few tests in which the Matrix Relationship Optimization technique doesn’t perform as well as the other design alternatives. (The best example is on page 33.) I suspect that adding the time dimension to the intermediate measure group will help (though possibly at the expense of performance on other queries). As for the whitepaper, we agreed this situation is a bit more of a fringe scenario because retiring m2m signatures doesn’t happen as frequently in the real world, so we didn’t complicate the whitepaper by mentioning it. However, it is something to be aware of as you’re analyzing your m2m dimension data.

I’m glad to see that Erik Veerman’s m2m optimization ideas (#1 and #2) have now been fully fleshed out and publicized.

Comments (0)

Long Navigation Parameters
Greg Galloway By Greg Galloway on 11/21/2007 5:28 PM

I was shocked recently when my nice report that was executing in 2 seconds suddenly took 22 seconds to finish after I enabled navigation (i.e. drillthrough links) in every textbox in the report. The point of this post is to explain what’s going on and suggest workarounds to improve performance.


The Symptoms

This particular report was an MDX report in Reporting Services 2005. If your MDX reports are anything like mine, you end up having an obscene number of multi-select parameters to give the end user maximum flexibility. This report was no exception. In SQL reports, the parameter values are often integers. However, in MDX reports, the parameter values are usually MDX unique member names such as [Product].[Category].&[Bikes]. You can already see the problem. If you have many large multi-select parameters, the concatenated list of selected parameter values becomes large. In fact, for this report, that list ended up being about 30KB long. Multiply that times 3,000 textboxes in the report, and your speedy report is now a 90MB beast.

Another interesting thing I noticed is that of the TimeDataRetrieval, TimeProcessing, and TimeRendering columns in ReportServer.dbo.ExecutionLog, the only one that changed when I turned on navigation was the TimeProcessing column.

Also, you might guess shortening the name your “Geography” report parameter to “G” would help, but in fact, the length of the report parameter name does not matter (just the length of the selected parameter values).

The final interesting thing I noticed is that the Drillthrough chunk in ReportServerTempDB was massive. Clearly, part of the expense of lots of huge drillthrough links was in constructing and saving that chunk to ReportServerTempDB. The following query shows you the chunks and their sizes:

select sd.SnapshotDataID
,sd.CreatedDate
,sd.ExpirationDate
,s.ReportPath
,s.Expiration
,s.CreationTime
,c.ChunkName
,datalength(c.content) as ChunkBytes
from ReportServerTempDB.dbo.SnapshotData sd
join ReportServerTempDB.dbo.SessionData s
on s.SnapshotDataID = sd.SnapshotDataID
join ReportServerTempDB.dbo.ChunkData c
on c.SnapshotDataID = sd.SnapshotDataID
order by sd.CreatedDate, c.ChunkName

I did a little research into how drillthrough works in SSRS 2005. Here is my understanding based solely on observation of the HTML in Report Manager and watching SQL profiler events against the ReportServer database. When a report is rendered in Report Manager, it does not pass the entire URL for each drillthrough link to the web browser. All it does is capture the cell coordinates you clicked when you click a drillthrough link. Then it posts that event back to the server. Then the server runs a few SQL queries against ReportServerTempDB to lookup info from that ChunkData table. I was rather impressed with how efficient this lookup was as it did not require reading the entire chunk. From what I can tell, that drillthrough chunk is some sort of binary structure that is internally indexed somehow. When the server goes to lookup the full drillthrough URL for the link you just clicked, it grabs the first few segments of that drillthrough chunk (which I’m guessing is loading the “index” for the entire chunk) and then it skips right to the segment of that chunk which has the info it needs. So my conclusion is that in SSRS 2005, they have done a good job optimizing the click of a drillthrough link. However, the up-front processing required to optimized that click is rather expensive.

Indications so far are that SSRS 2008 will improve this scenario. I tried this in the November CTP of SSRS 2008 and the report with the navigation rendered the first page 2x faster than on SSRS 2005 SP2. Interestingly enough, the time difference between a report with navigation and a report without navigation now shows up in SSRS 2008 under the TimeRendering column, not the TimeProcessing column. Also, the chunks are stored differently in SSRS 2008. Hopefully someone from Microsoft will fill us in on how they were able to achieve this performance improvement. (Update: Robert Bruckner from Microsoft responded here.)


Workaround #1

Short of waiting for SSRS 2008, what is the workaround? I wrote an MDX sproc called ASSP.KeysStrToSet. You can refer to all the technical details, get some usage examples, and download some sample reports at that link. But let me explain the workaround at a high level. There are two parts.

First, you need to find a way to shorten the parameter values. In general, this is pretty easy as you can just use the surrogate key (i.e. use “1” as the key instead of “[Product].[Product].&[1]”). For optimal performance, change the parameter datatype to Integer if possible. However, there are a couple of more advanced scenarios such as composite keys or nullable keys which add some complexity. The link to the Analysis Services Stored Procedure project above explains all these advanced scenarios and the solution.

Second, you need to figure out a way to let MDX accept those shortened parameter values. That’s where the ASSP.KeysStrToSet function comes into play. Here’s an example. The following two expressions return the same set:

StrToSet("{ [Product].[Product].&[1],[Product].[Product].&[2] }")
ASSP.KeysStrToSet("[Product].[Product]", "1,2")

How much improvement should you expect from this workaround? Our tests on reports with lots of navigation-enabled textboxes have shown anywhere from a 2x to an 8x performance improvement in the time it takes to render to the browser in Report Manager. Of course, that depends upon how many textboxes have navigation enabled and on how large the drillthrough links are for each textbox.

I suspect utilizing this workaround with SSRS 2008 will make the performance improvement even greater. (Unfortunately, I can’t test that theory on the November CTP at the moment because of a bug.)

If you agree with this approach, please vote for this feature suggestion so that KeysStrToSet functionality becomes available in a built-in MDX function.


Workaround #2

Another possible workaround is to add a “Show Drillthrough?” parameter that controls whether or not to enable drillthrough on the textboxes. Then in the “Jump to Report” property, put the following expression:
=IIf(Parameters!ShowDrill.Value,"MyDrillReportName",Nothing)

In SSRS 2005 (it’s not necessary in SSRS 2008 apparently), you also have to wrap each parameter you hook up to the drill report in that way:
=IIf(Parameters!ShowDrill.Value,Parameters!City.Value,Nothing)

You should probably use workaround #2 in conjunction with workaround #1 to achieve good performance when they flip the “Show Drillthrough?” parameter to True.

Note you may have to make that “Show Drillthrough?” parameter a string to workaround a bug.


Conclusions

Enabling drillthrough on every cell in a report can drastically slow report rendering if the drillthrough parameters are very long. However, you can adequately workaround this problem by shortening the parameter values and dynamically disabling drillthrough when it is not needed.

Comments (0)

Oracle Drivers on an x64 Box for SSIS Development
Greg Galloway By Greg Galloway on 10/19/2007 10:32 AM

As Steve McHugh describes very well, getting Oracle drivers to work on an x64 box for use with Integration Services 2005 development is a challenge. We are indebted to Steve for explaining the underlying problems, and you should review his article before proceeding. The following instructions are, we feel, an improvement over Steve’s instructions for the best way to get Oracle drivers working on an x64 box on which you wish to develop packages.

If all you want to do is run scheduled packages under the x64 version of SSIS, you can just do step 2. If you want to develop packages on this box or run the SQL Server Import/Export Wizard, then you need to do all the steps.

  1. Download the latest 32-bit drivers and install them. We would suggest Oracle10g Release 2 ODAC.
  2. Download the latest 64-bit drivers and install them. We would suggest Oracle10g Release 2 ODAC (64-bit) 10.2.0.3 for Windows x64.
  3. Because you have several drivers installed, we suggest you add a TNS_ADMIN environment variable which says “C:\TNS” and then move your tnsnames.ora and sqlnet.ora files to that one “C:\TNS” directory. You can add a TNS_ADMIN environment variable by right clicking My Computer on your desktop, choosing Properties, flipping to the Advanced tab, clicking the Environment Variables button, and adding that variable.
  4. Note: During one install of the drivers, it failed because it couldn’t find gacutil.exe. If you get that failure, uninstall the Oracle driver you just tried to install, then copy gacutil.exe and gacutil.exe.config from <C:\program files\Microsoft Visual Studio 8\SDK\v2.0\Bin> to <C:\program files (x86)\Microsoft Visual Studio 8\SDK\v2.0\Bin>. The most recent time we did the installs, this was not a problem, so maybe Oracle has fixed this issue.
  5. All 32-bit SQL Server applications start under the “Program Files (x86)” directory. You need to fix them so they start in the “Progra~2” directory. The following steps will need to be rerun after every subsequent SQL service pack is installed.
    1. To fix the SQL Server Import/Export Wizard so you can launch it from Management Studio, fix the reference to it in the registry by running the “SQL import-export wizard on 64-bit server.reg” file that I have included in the ZIP file below. (That registry file fixes the HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\MSDTS\Setup\WizardPath registry path.)
    2. The shortcuts to SQL Server Management Studio and SQL Server Business Intelligence Development Studio need to be fixed. Unfortunately, you can’t just change the shortcut, because Windows automatically expands the “Progra~2”. Instead, create a bat file that launches the EXE, then change the shortcut to point to the bat file. These bat files and shortcuts are included in the ZIP file below. Basically, those bat files look like:
      start /B "C:\Progra~2\Microsoft Visual Studio 8\Common7\IDE" "C:\Progra~2\Microsoft Visual Studio 8\Common7\IDE\devenv.exe"
    3. When you double-click a .sln file, the path it uses for Visual Studio needs to be fixed. If you only have Visual Studio 2005 (i.e. SQL Server 2005) installed on this server, you can make the following change. Open the C drive… Go to the Tools menu… Folder Options… Flip to the File Types tab… Type in SLN to skip down to the SLN file type… Click the Advanced button… Highlight the “Open” action… Click Edit… Change the path to say:
      "C:\Progra~2\Microsoft Visual Studio 8\Common7\IDE\devenv.exe" "%1"
      Consider fixing any other file extensions you wish to double click which should launch 32-bit processes.
    4. Fix the PATH environment variable by changing any reference that says “C:\Program Files (x86)\Microsoft SQL Server\” to “C:\Progra~2\Microsoft SQL Server\”. And change any reference to “C:\Program Files (x86)\Microsoft Visual Studio 8” to “C:\Progra~2\Microsoft Visual Studio 8”. Environment variables can be edited by right clicking My Computer on your desktop, choosing Properties, flipping to the Advanced tab, clicking the Environment Variables button, choosing the Path system variable, and clicking Edit.
    5. Fix the shortcuts for other 32-bit applications that need to connect to Oracle (such as Toad) as outlined in step B above.
  6. We had trouble when our master SSIS package launched child packages out-of-process if those child packages needed to connect to Oracle. We decided to change those Execute Package Tasks to run in-process, and we didn’t investigate further.
  7. You will probably have to set the AlwaysUseDefaultCodePage property to True on the OLE DB Source components in your data flow tasks that pull from Oracle.

Download a oracle x64 fix.zip which contains a couple of shortcuts and the .reg file used in step 5a above.

Miscellaneous Note: To setup a linked server to Oracle, review this post by Max Oleznyuk.

Comments (6)

Retrieving Cell Properties
Greg Galloway By Greg Galloway on 8/23/2007 5:08 PM

The point of this post is to discuss how to retrieve cell properties. Specifically, I want to call attention to the ReturnCellProperties setting on an Analysis Services connection and command. I will also talk about how you can use cell properties in Reporting Services reports.

It must be said that Teo Lachev beat me on publishing this tip. I had already drafted most of this post and had a couple of extra things to say on the topic, so I’m publishing it anyway.

In an MDX query, you’ve got dimension properties and cell properties. Dimension properties are intrinsic member properties such as MEMBER_CAPTION in addition to member properties you define in your dimension (such as the Customer attribute having a member property called Email). Cell properties are properties such as VALUE, FORMATTED_VALUE, and FORMAT_STRING which are simply extra information about a cell.

In most cubes, an effort has been made to properly set the FORMAT_STRING for all measures so that percents are formatted as percents and currency is formatted as currency. (Sometimes, the calc script even customizes the FORMAT_STRING for particular slices of the cube.) Therefore, it’s important to know how to retrieve the FORMATTED_VALUE cell property so that you can fully leverage all the cube provides you.

When executing an MDX query, you can retrieve the results in two-dimensional (DataReader/DataSet) or multi-dimensional (CellSet) format. The CellSet object lets you easily access any cell properties the query returns. On the other hand, it is not obvious how it is possible to retrieve any cell property other than VALUE in a DataReader/DataSet.

Putting the ReturnCellProperties=True property on the connection string changes the columns you get in a DataReader/DataSet:

AdomdConnection conn = new AdomdConnection();
conn.ConnectionString =
 "Data Source=(local);Initial Catalog=YourDB;ReturnCellProperties=True";

Similarly, setting the ReturnCellProperties property on a command object will produce the same effect:

AdomdCommand cmd = new AdomdCommand();
cmd.Properties.Add("ReturnCellProperties", true);

This is obviously relevant if you’ve got a custom .NET application that executes MDX queries because it allows you to retrieve cell properties. It’s also relevant if you’re using MDX queries in Reporting Services. If you’re using the OLE DB provider to connect to Analysis Services (which is less common in SSRS2005), you can retrieve cell properties by adding ReturnCellProperties=True to the connection string which causes the extra columns to show up. However, a “Microsoft SQL Server Analysis Services” data source in Reporting Services automatically sets the ReturnCellProperties property (and the DbpropMsmdFlattened2 property, which I won’t discuss) to true on the underlying command object. That data source also hides the extra columns from you when you’re previewing your dataset, so you just have to trust that they are there.

select [Measures].[Internet Sales Amount] on 0,
[Product].[Category].Members on 1
from [Adventure Works]

If your MDX query for your report does not specify a CELL PROPERTIES clause (as in the query above), then you get the VALUE and FORMATTED_VALUE properties by default in Reporting Services. Obviously, =Fields!FieldName.Value retrieves the VALUE cell property. You can retrieve the FORMATTED_VALUE cell property with =Fields!FieldName.FormattedValue expression. More information on all the field properties you can use in SSRS expressions is available here.

select [Measures].[Internet Sales Amount] on 0,
[Product].[Category].Members on 1
from [Adventure Works]
CELL PROPERTIES VALUE, FORMATTED_VALUE, FORMAT_STRING

If your MDX query for your report does specify a CELL PROPERTIES clause (as in the query above), you override the defaults. This is useful for instructing Reporting Services to retrieve fewer properties for performance reasons. (For instance, a best practice is to specify “CELL PROPERTIES VALUE” so you don’t retrieve FORMATTED_VALUE if you’re not going to use the FormattedValue expression in your report.) Overriding the default CELL PROPERTIES is also useful to pull back extra properties such as FORMAT_STRING. Then in a report you can retrieve the FORMAT_STRING for a cell by using the =Fields!FieldName("FORMAT_STRING") expression.

There is a known limitation such that you cannot use the SSRS Aggregate expression and FormattedValue together. This limits the usefulness of the FormattedValue property. A workaround is to format the textbox in Reporting Services instead of using the FormattedValue property. This formatting could be hardcoded, but sometimes measures have different format strings for different slices of the cube. So if the format string is not constant, you can make the SSRS formatting dynamic by putting a =Fields!FieldName("FORMAT_STRING") expression on the Format property of a textbox. For footers, it would need to be =First(Fields!FieldName("FORMAT_STRING")). Note that .NET format strings (which SSRS uses) aren’t quite the same as SSAS format strings, so you may need to write a small function in the Code tab of your report and do replacements such as changing Currency to c0.

As a side note, Darren Gosbell blogged about an easy way to see the connection string properties for SSAS, but unfortunately, ReturnCellProperties is one of several properties that don’t appear in that list. You have to look at this page to see all the available properties. Mosha says that you can use all the XMLA properties on the connection string.


Comments (1)

ProcessAdd on Large Dimensions
Greg Galloway By Greg Galloway on 4/20/2007 6:55 PM

ProcessFull, ProcessUpdate, and ProcessAdd are the three processing options for keeping data fresh in dimensions in Analysis Services 2005. This post takes a look at these options as they relate to processing very large dimensions.


Background

When considering the different possibilities for a nightly Analysis Services processing strategy, we usually recommend the following evolution which attempts to weigh maintenance complexity versus processing performance. The first processing scheme is low complexity. The other partitioning schemes add complexity in order to improve performance.

Processing Scheme #1: ProcessFull entire database
If doing a ProcessFull on the entire database completes within the allowed time, then this is obviously the most straightforward method of keeping your cubes fresh. If ProcessFull takes too long, then...

Processing Scheme #2: ProcessUpdate dimensions and process fewest partitions possible
Next we recommend they look into partitioning their measure groups such that they only have to process a small subset of the partitions every night. For instance, if you partition by month, you may be able to just process the current month's partition nightly. Since not all partitions will be reprocessed nightly, dimension processing must be done so as not to unprocess the existing partitions. The easiest way is to accomplish this is to ProcessUpdate all dimensions, then ProcessData the partitions that have changed data (i.e. the current month), then ProcessDefault each measure group. (ProcessDefault will process aggs and indexes on the partitions you just processed. In addition, it will reprocess the flexible aggs which were dropped on old partitions during ProcessUpdate of dimensions.) This strategy is moderate complexity and requires custom coding to process your cube database. If this strategy isn't fast enough, it may be that there is an opportunity to optimize the processing of huge dimensions...

Processing Scheme #3: ProcessAdd dimensions instead of ProcessUpdate
Next we recommend looking into whether your huge dimensions allow updates or just inserts in your SQL data warehouse. If you can guarantee there will be only inserts and no updates, then you can speed up the processing of these huge dimensions by doing a ProcessAdd on them. This option adds more complexity on top of #2, so the purpose of this post is to discuss whether this option is worth it.

We have previously encountered a massive dimension which adds 20 million rows per month. At the time, it was being maintained using processing scheme #2, but the nightly processing performance needed to be tuned further. Since you can't partition dimensions in Analysis Services 2005, ProcessAdd is the option you're left with. I did the following research to decide whether it was worth the added complexity to recommend processing scheme #3. We ended up deciding it was worth it, but in order to do it, we had to move the columns from the large dimension which allowed updates and put them in a smaller, separate dimension. This seemed a reasonable compromise.

This post goes hand-in-hand with another post which gives full working XMLA examples of various ProcessAdd commands.


Setup for the Performance Test

I created a simple table:

create table LargeDimension (
ID int not null primary key clustered
,NumberAttribute int not null
,StringAttribute varchar(10) not null
,Subcategory varchar(10) not null
,Category varchar(10) not null
)

And built an Analysis Services database with nothing but the following dimension:
created with BIDS Helper

The only column that has high cardinality is the key (which is ID column in the table and is called Large Dimension in the attribute lattice picture above). Every other column has under 100 distinct values.


The Performance Test

The tests were done on an x64 server with 16GB of RAM (plenty for this test). I did two flavors of each test. On the first (the "Ordered" column below) I went with the defaults. All dimension attributes had AttributeHierarchyOptimizedState=FullyOptimized and AttributeHierarchyOrdered=true. On the second (the "Unordered" column below) I set AttributeHierarchyOrdered=false just on the key attribute. (I did try some other combinations but they didn't make a significant difference in terms of dimensions processing time. The AttributeHierarchyOptimizedState setting will be more significant if there are non-key attributes which are high cardinality. It will also impact performance of partition processing, but that's outside the scope of this post.)

Test Case Description Ordered
Minutes
  Unordered
Minutes
A. ProcessFull 10 million rows 1:30 1:06
B. ProcessUpdate 10 million rows with zero rows changed 1:46 1:30
C. ProcessUpdate 10 million rows with 10% of rows changed 2:01 1:28
D. ProcessAdd one new row to the existing 10 million rows 0:54 0:21
E. ProcessAdd 1 million new rows to the existing 10 million rows 1:06 0:41
F. ProcessAdd 90 million new rows to the existing 10 million rows 17:36 12:09
G. ProcessFull 100 million rows 19:04 13:23
H. ProcessAdd one new row to the existing 100 million rows 11:32 5:04
I. ProcessAdd 10 million new rows to the existing 100 million rows 22:50 17:08
J. ProcessUpdate 100 million rows with no rows changed 34:15 29:02
K. ProcessUpdate 100 million rows with 10% of rows changed 34:24 29:08


Conclusions

  • It is clear that ProcessAdd is worth the extra effort if you need to further optimize ProcessUpdate dimension processing.
  • It is clear that AttributeHierarchyOrdered=false on high cardinality attributes makes a significant difference to dimension processing. Test case H shows that AttributeHierarchyOrdered=false on high cardinality attributes can double performance of ProcessAdd.
  • ProcessUpdate is by far the most expensive option available. It is so much more expensive than ProcessFull that you should definitely verify whether processing scheme #1 actually performs slower than processing scheme #2 on your dataset.

Other Notes

  • Test case H unordered took about 5 minutes which is about exactly the time it took to duplicate the entire SSAS dimension directory to the same drive via copy/paste. (Those SSAS dimension files for 100 million rows took up about 8GB on disk.) So a good way to estimate your best-case ProcessAdd performance is to manually duplicate the dimension files to the same drive. This makes sense because during processing, Analysis Services works on a duplicate set of files until finished, at which point it swaps in the new files and drops the old files. This is done so there will be no cube downtime during processing.
  • Chris Webb discusses an alternate way of dealing with large dimensions by breaking them up. If your dimension ID was 123456789 then you could break that into three dimensions with keys 123, 456, and 789. While this makes end user ad-hoc analysis more clumsy, it certainly gets around the problem of large dimensions being expensive to process.
  • I also tried processing the dimension I tested above using the ProcessingGroup=ByTable setting, and the server ran out of memory by the time it had read in 60 million rows. Incidentally, page 68 of the performance guide has some helpful information on deciding when ByTable may perform better than ByAttribute. But bottom line, you won't know until you test it.
  • Test ProcessAdd thoroughly to make sure it is stable on your hardware and dataset. I've had sporatic trouble with it getting into an infinite loop and running out of disk space. If I had to guess, it appears to happen when your dimension can't be loaded into memory. I've reported this bug here.
  • I also researched the possibility of making the large dimension a ROLAP dimension, but I was disappointed with query performance.
Comments (0)

ProcessAdd Examples
Greg Galloway By Greg Galloway on 4/20/2007 6:25 PM

The concept of incremental processing is well documented by T.K. Anand's fabulous whitepaper on processing (which you should probably digest before continuing), but there is a severe lack of full working example XMLA commands. The purpose of this post is to provide full working examples of ProcessAdd against Adventure Works.

1. ProcessAdd Partition (download ProcessAdd Partition.xmla)
This example uses an out-of-line query binding to run a ProcessAdd on the 2004 partition of the Internet Sales measure group. Real-life use for such an XMLA command would require that you specify an out-of-line SQL query which retrieves rows that currently do not exist in the cube. No such rows were available in AdventureWorksDW, so I just specified a query which would add duplicate rows to the partition for illustration purposes. This XMLA command can be created in Management Studio by right-clicking on a partition, choosing Process, changing to Process Incremental, clicking Configure, setting a query, then clicking the Script button:

2. ProcessAdd Dimension (download ProcessAdd Dimension.xmla)
This example uses an out-of-line data source view to run a ProcessAdd on the Sales Territory dimension. Because dimensions can be built off of multiple tables from the DSV, you cannot use an out-of-line query binding like what was used for example #1. Instead, you have to use an out-of-line DSV. Real-life use for such an XMLA command would entail specifying a named query in the out-of-line DSV which returned only rows which did not exist in the Analysis Services dimension. Rows which already exist will be ignored, so it is wasteful to allow your query to return them. No such rows were available in AdventureWorksDW, so I just specified a query which returned rows that already exist in the dimension for illustration purposes. With the example out-of-line DSV using the query it does, the Sales Territory dimension will not actually get any extra members.

There is no UI for creating this XMLA command in Management Studio. So I recommend the following steps if you wish to build your own ProcessAdd command for a dimension.

  1. Right-click on the dimension in Management Studio, choose Process, change to ProcessUpdate, and click the Script button. Change ProcessUpdate to ProcessAdd in the XMLA script.
  2. In Business Intelligence Development Studio (BIDS), right-click in Solution Explorer on the Data Source object that currently provides the connection for processing this dimension, and choose View Code. Copy that XML. Paste it into the <Process> tag. Remove all the properties of the DataSource except for ID, Name, and ConnectionString. Note that even if you're not changing the connection string, it is still necessary to include an out-of-line data source with the out-of-line DSV. This is a known issue that will hopefully be fixed sometime after SP2.
  3. In BIDS, open the data source view which your dimension is currently processed off of. Right-click on the table(s) your dimension currently uses, and choose "Replace Table... With New Named Query..." Then edit the query specifying the appropriate where clause. Do not save the DSV changes. Right-click on the .dsv file in Solution Explorer and choose View Code. Copy the entire XML and paste it into the <Process> tag. Remove the CreatedTimestamp and LastSchemaUpdate properties (to avoid a warning message). There are other large sections (such as the Annotations) you can remove if you like, but they will not hurt anything.

Note that if you have more than one <Process> command, the out-of-line DSV must contain the tables required for all dimensions or partitions being processed which use that DSV normally. Here is such an example: Process Multiple Objects With Out-of-Line DSV.xmla


Normal processing involves Analysis Services querying a SQL database for the necessary data. Analysis Services also supports push mode processing where the data itself is provided in the XMLA command. Integration Services uses push mode processing whenever you have a Partition or Dimension Destination:

3. Push Mode ProcessAdd for Partition (download Push Mode ProcessAdd for Partition.xmla)
This example shows how to process a partition using push mode by specifying an out-of-line rowset. Generally push mode processing is done using Integration Services. If you need to do it without the aid of SSIS, I still recommend you build a simple package in SSIS, then run Profiler while you run the SSIS package. That XMLA query you will see in Profiler will not have the rowset schema or the actual data, but it will get you started. Note that in SSIS when you're using a Partition or Dimension destination, if a column is used twice (such as in the key and the name), you need to duplicate that column and put it in the pipeline twice. This is only a restriction of the UI in SSIS, not a restriction of push mode processing.

4. Push Mode ProcessAdd for Dimension (download Push Mode ProcessAdd for Dimension.xmla)
This example shows how to process a dimension using push mode by specifying an out-of-line rowset. See above for more information on push mode processing.


Miscellaneous Notes

  • If a dimension or partition is unprocessed, you can't do ProcessAdd on it. So make sure your processing procedures check this and do a ProcessFull the first time.
  • The MemoryUsage class of ASSP uses push mode processing to load a cube incrementally without reading from a SQL database.
  • This post goes hand-in-hand with another post which examines the performance of ProcessAdd.
  • Push mode processing is only allowed on partitions or dimensions which use one table from the DSV. So that eliminates snowflaked dimensions or measure groups with reference dimensions.

Comments (11)

Reporting Services SP2 and MDX Summary Rows
Greg Galloway By Greg Galloway on 3/27/2007 10:18 PM

If you have built any Reporting Services reports which use an Analysis Services datasource, SQL Server 2005 SP2 contains a fairly major change to be aware of. I’m going to describe the way it worked before SP2, then I’ll describe the change.

Prior to SP2

There’s been a good deal of frustration about how not all rows from an MDX query are shown as detail rows when used in an SSRS report. For instance, the following query returns two rows.

select [Measures].[Internet Sales Amount] on 0,
{[Product].[Category].[Bikes], [Product].[Category].[All]} on 1
from [Adventure Works]

But if used in an SSRS report, you only get one detail row because SSRS ignores any rows which contain an All member in the tuple defining that row. To explain that differently, the leftmost columns in the dataset originate from the “on rows” or “on 1” axis of your MDX query. If any of those contain an All member, that row will not be shown as a detail row. Notice how the All member is rendered as a null when previewing the dataset:

Of course, you can still use the summary rows from the MDX query by putting the following expression in a table/matrix group or in the header/footer:

=Aggregate(Fields!Internet_Sales_Amount.Value)

This behavior is sometimes helpful and sometimes annoying, especially for experienced MDX writers who expected the report to match their MDX query. It turns out that you now have an option with SP2.

With SP2

If you reference any field from that dataset as the parameter for the Aggregate function in an expression, then the old behavior continues. But if you don’t, then all rows from the MDX query become detail rows in the report. This is on a per dataset basis, not a per table/matrix basis.

If your reports were built with correct usage of the Aggregate expression, they will work fine in SP2. The only way SP2 could break your report is if your MDX dataset contains summary rows which you never intended to be used. Obviously, that should never be the case, but it is easy to see how a developer who didn’t understand how it worked before SP2 could build a report that might break with SP2. For instance, let’s say you built a report off the query mentioned above and setup the Layout tab as follows:

When this report is run prior to SP2 it looks like:

When this same report is run with SP2 you get totally bogus numbers:

It’s worth carefully testing before you apply SP2 to your production servers.

Comments (0)


Search Blogs

Blog Calendar
Archive
<July 2008>
SunMonTueWedThuFriSat
293012345