ProcessFull, ProcessUpdate, and ProcessAdd are the three processing options for keeping dimension data fresh in Analysis Services 2005. This post takes a look at these options as they relate to processing very large dimensions.
When considering the different possibilities for a nightly Analysis Services processing strategy, we usually recommend the following evolution, which weighs maintenance complexity against processing performance. The first processing scheme is low complexity; each subsequent scheme adds complexity in order to improve performance.
Processing Scheme #1: ProcessFull entire database
If doing a ProcessFull on the entire database completes within the allowed processing window, then this is obviously the most straightforward method of keeping your cubes fresh.
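For concreteness, here is a minimal sketch of the XMLA for this scheme (MyDatabase is a placeholder ID, not a real database name):

<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Process>
    <Object>
      <DatabaseID>MyDatabase</DatabaseID> <!-- placeholder; substitute your database ID -->
    </Object>
    <Type>ProcessFull</Type>
  </Process>
</Batch>

If ProcessFull takes too long, then...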
Processing Scheme #2: ProcessUpdate dimensions and process fewest partitions possible
Next we recommend partitioning your measure groups such that you only have to process a small subset of the partitions every night. For instance, if you partition by month, you may be able to process just the current month's partition nightly. Since not all partitions will be reprocessed nightly, dimension processing must be done in a way that does not unprocess the existing partitions. The easiest way to accomplish this is to ProcessUpdate all dimensions, then ProcessData the partitions that have changed data (i.e. the current month), then ProcessDefault each measure group. (ProcessDefault builds the aggregations and indexes on the partitions you just processed. In addition, it rebuilds the flexible aggregations which were dropped on old partitions during the ProcessUpdate of dimensions.) This strategy is of moderate complexity and requires custom coding to process your cube database.
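A sketch of such a nightly batch follows; the object IDs (MyDatabase, MyCube, MyMeasureGroup, CurrentMonthPartition) are placeholders, and in practice you would repeat the ProcessUpdate command for each dimension and the ProcessData command for each changed partition:

<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <!-- Step 1: ProcessUpdate each dimension (repeat per dimension) -->
  <Process>
    <Object>
      <DatabaseID>MyDatabase</DatabaseID>
      <DimensionID>LargeDimension</DimensionID>
    </Object>
    <Type>ProcessUpdate</Type>
  </Process>
  <!-- Step 2: ProcessData each partition with changed data (i.e. the current month) -->
  <Process>
    <Object>
      <DatabaseID>MyDatabase</DatabaseID>
      <CubeID>MyCube</CubeID>
      <MeasureGroupID>MyMeasureGroup</MeasureGroupID>
      <PartitionID>CurrentMonthPartition</PartitionID>
    </Object>
    <Type>ProcessData</Type>
  </Process>
  <!-- Step 3: ProcessDefault each measure group to build indexes and aggregations,
       including the flexible aggregations dropped by step 1 -->
  <Process>
    <Object>
      <DatabaseID>MyDatabase</DatabaseID>
      <CubeID>MyCube</CubeID>
      <MeasureGroupID>MyMeasureGroup</MeasureGroupID>
    </Object>
    <Type>ProcessDefault</Type>
  </Process>
</Batch>

If this strategy isn't fast enough, it may be that there is an opportunity to optimize the processing of huge dimensions...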
Processing Scheme #3: ProcessAdd dimensions instead of ProcessUpdate
Next we recommend looking into whether your huge dimensions allow updates, or just inserts, in your SQL data warehouse. If you can guarantee there will be only inserts and no updates, then you can speed up the processing of these huge dimensions by doing a ProcessAdd on them. This option adds more complexity on top of scheme #2, so the purpose of this post is to discuss whether it is worth it.
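In its simplest form, a dimension ProcessAdd is just a different Type on the same Process command (placeholder IDs again; the companion post mentioned below has full working examples, including variations with out-of-line bindings that limit the relational query to just the new rows):

<Process xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Object>
    <DatabaseID>MyDatabase</DatabaseID>
    <DimensionID>LargeDimension</DimensionID>
  </Object>
  <Type>ProcessAdd</Type>
</Process>

Note that in Analysis Services 2005 this processing type is available for dimensions through XMLA or AMO, not through the processing dialog in the management tools.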
We previously encountered a massive dimension which adds 20 million rows per month. At the time, it was being maintained using processing scheme #2, but the nightly processing performance needed to be tuned further. Since you can't partition dimensions in Analysis Services 2005, ProcessAdd is the option you're left with. I did the following research to decide whether processing scheme #3 was worth the added complexity. We ended up deciding it was, but in order to do it, we had to move the updatable columns out of the large dimension and into a smaller, separate dimension. This seemed a reasonable compromise.
This post goes hand-in-hand with another post which gives full working XMLA examples of various ProcessAdd commands.
Setup for the Performance Test
I created a simple table:
create table LargeDimension (
ID int not null primary key clustered
,NumberAttribute int not null
,StringAttribute varchar(10) not null
,Subcategory varchar(10) not null
,Category varchar(10) not null
)
Then I built an Analysis Services database containing nothing but the following dimension: