Incremental Many-to-Many Matrix Package

I was privileged to be able to work with Carl Rabeler from the SQLCAT team on a good whitepaper which performance tests several optimization techniques for many-to-many dimensions in Analysis Services. The conclusion was that a technique they call Matrix Relationship Optimization with the appropriate aggs results in the best MDX query performance. In a nutshell, the idea is that there are recurring patterns or signatures among m2m dimensions. For instance, in the Adventure Works database, every sales order in the database shares one of twelve common sales reason signatures. Query performance can be improved by reusing these shared signatures.

This optimization technique pushes some of the work to the ETL layer. Along with the whitepaper, they distributed sample Integration Services packages to load the matrix relationship tables. Those packages were, I thought, rather ingenious in their use of the script transforms to pivot and unpivot the sales reasons data and accomplish everything in one dataflow task. The packages they distributed are coded to do a full refresh during each run. I created a spinoff of their sample package that does an incremental load, so I thought I would post it in case it helps others get started. Download and run against a fresh copy of AdventureWorksDW.

I would recommend using the incremental load package if (a) you have the ability to easily detect which rows in the source system have changed since the last successful run and (b) if the incremental load will not produce many unused signatures over time.

The whitepaper also discusses aggregations for m2m dimensions in detail. Carl’s experience with real customers suggested that aggs on the intermediate measure group are usually very helpful (i.e. are considerably smaller than the intermediate measure group itself). Thankfully, these aggs can be built as normal since the Query Subcube event in Profiler accurately reports the subcube vector.

On the other hand, he found that in real life, aggs on the data measure group to help with queries against m2m dimensions often end up being about the same size as the measure group itself, thus they are not helpful. It all just depends on your data. An agg is smaller on disk not only because it has a lower rowcount, but because it has fewer dimension keys. (It’s not as tall and not as wide.) Therefore, data measure groups with a large number of dimensions unused in queries benefit from aggs, even if the rowcount of the resulting agg alone isn’t much less than the measure group rowcount itself. So it is helpful to know how to build m2m aggs on the data measure group. Unfortunately, these aggs must be built by hand using Agg Manager, and they are rather tricky to build. Let me suggest the following steps for building them:

  1. Turn on Query Logging and set QueryLogSampling to 1.
  2. Install ASSP and run the following before each MDX query: CALL ASSP.ClearCache()
  3. Run your MDX queries which use the m2m dimension.
  4. Turn off Query Logging by setting QueryLogSampling to 0.
  5. Install BIDS Helper and launch Agg Manager. Right click on the agg design for your data measure group (e.g. Internet Sales) and choose “Add Aggregations from Query Log.” Click the “Execute SQL” button and then click OK after the query returns. Right click on the agg design and choose Edit.
  6. (Instead of using query logging, you can just run profiler and manually cut and paste the subcube vector from the Query Subcube event if you prefer.)
  7. Click the “Eliminate Redundancy” and “Eliminate Duplicates” buttons as usual.
  8. Click the “Validate Aggregations” button. It should prompt you with a message similar to: “This aggregation contains many-to-many dimension [Sales Reason]. It will not be used unless it also contains [Internet Sales Order Details].[Internet Sales Order]. The many-to-many dimension [Sales Reason] itself should not be included in the aggregation to workaround a bug.” For each agg with such a warning, edit that agg, then double-click the attribute in the m2m dimension which is already selected. This will unselect then attempt to reselect that attribute. At this point, Agg Manager will prompt you with a message similar to: “This aggregation will not be used when querying many-to-many dimension [Sales Reason] unless it also contains [Internet Sales Order Details].[Internet Sales Order]. The many-to-many dimension [Sales Reason] itself should not be included in the aggregation to workaround a bug. Would you like BIDS Helper to fix this for you?” Click Yes.

The other idea around many-to-many dimension optimization that I would like to mention is adding extra dimensions (particularly the time dimension) to the intermediate measure group and partitioning by them. Chris Webb explained this idea, too. To put it in my terms, if there’s a good deal of turnover among m2m signatures over time (i.e. signatures are retired over time) then this is a strategy to consider. In the whitepaper there are only a few tests in which the Matrix Relationship Optimization technique doesn’t perform as well as the other design alternatives. (The best example is on page 33.) I suspect that adding the time dimension to the intermediate measure group will help (though possibly at the expense of performance on other queries). As for the whitepaper, we agreed this situation is a bit more of a fringe scenario because retiring m2m signatures doesn’t happen as frequently in the real world, so we didn’t complicate the whitepaper by mentioning it. However, it is something to be aware of as you’re analyzing your m2m dimension data.

I’m glad to see that Erik Veerman’s m2m optimization ideas (#1 and #2) have now been fully fleshed out and publicized.