SSIS

Relative Paths in SSIS

We have chosen to use relative paths in Integration Services on several projects now, and since it has worked well, it is time to share this tip with the community. The whole purpose of using relative paths is to make packages more portable, as opposed to packages and dtsConfig files having to live in a specific directory structure or drive letter on servers and developers’ laptops. As evidenced by the volume of posts in the forums about problems with deploying SSIS packages, anything that makes packages more portable will make deployment less prone to error.

When I say “relative paths in SSIS” I mean the following:

  • .dtsx files and .dtsConfig files are all inside one directory
  • Execute Package tasks refer to the package to execute using a File connection manager, and that connection manager refers to the .dtsx file with a relative path (i.e. just the filename)
  • Package Configurations are of type “XML configuration file” and the path to the .dtsConfig file is just a relative path (i.e. just the filename)
  • dtexec executes packages from disk, not from SQL Server or from the SSIS package store
  • When dtexec is run, the working directory has to be set to the directory with all your files. This is accomplished by running dtexec from a .bat file which uses the CD command to set the working directory.
  • When designing packages, Visual Studio must be started with the appropriate working directory which is the directory with all the packages. This can be accomplished by double clicking the .sln file if it is in the same directory as the packages. Or you can write a .bat file that runs the CD command to change the working directory before it starts Visual Studio.

Screenshots

When setting up an Execute Package task, mimic the following screenshots to use a relative path:

When setting up a package configuration, use the relative path:

 

Unfortunately, in SSIS 2008, this Package Configuration Wizard screen does not allow you to enter a plain relative path due to a bug. However, there is a BIDS Helper feature which allows you to workaround this bug.

Setting Working Directory

An example .bat file which runs dtexec with the correct working directory would look like the following. The first line changes to the drive of the working directory. The second line actually changes to that directory. The third line runs dtexec.

D: 
cd "D:\MyPackages\" 
dtexec /FILE "master.dtsx" /MAXCONCURRENT " -1 " /CHECKPOINTING OFF  /REPORTING EW 

When setting up a SQL Agent job step, you cannot use the “SQL Server Integration Services Package” job step type in a SQL Agent job since it does not let you set the working directory. But you can use the “Operating System (CmdExec)” job step type which lets you run the .bat file. (To go further, some will argue here and here that you should always use the “Operating System (CmdExec)” even when you’re not needing to set the working directory for relative paths.) To run this bat file, set the “command” property to the full path to the bat file in quotes:

"D:\MyPackages\execute master.bat"

An example .bat file which runs Visual Studio with the correct working directory would look like the following. The first line changes to the drive of the working directory. The second line actually changes to that directory. The third line runs dtexec.

D: 
cd "D:\MyPackages\" 
"C:\Program Files\Microsoft Visual Studio 8\Common7\IDE\devenv.exe" "D:\MyPackages\test.sln" 

If you forget and start Visual Studio from the Start Menu (such that the working directory is not set to your packages directory), the .dtsConfig file it creates for you from your relative path will end up in the “C:\Program Files\Microsoft Visual Studio 8\Common7\IDE” directory.

Adding .dtsConfig Files to Your Visual Studio Solution

You can include .dtsConfig files in your solution in Visual Studio. Move the .dtsConfig out of the directory with your .dtsx files, then in Windows Explorer copy the .dtsConfig file to your clipboard, then left click on the Miscellaneous folder in Solution Explorer in Visual Studio, then type Ctrl-V to paste it into your SSIS project. The .dtsConfig file is now in the same directory as your .dtsx files (which is important for relative paths, as discussed above). And your .dtsConfig file can now be checked into source control with your .dtsx files. And the .dtsConfig file will be readily accessible in Visual Studio. Opening that .dtsConfig by double clicking it, you can pretty-print the XML by pressing Ctrl-K then Ctrl-D. (Or if you have BIDS Helper installed, it will automatically pretty-print it for you upon opening it.)

As a side note, I should mention in this context that whenever I’m working with package configurations, I use the Expression and Configuration Highlighter feature of BIDS Helper which gives you a little teal triangle on anything controlled by a package configuration. I always look to see that every connection manager is controlled by an expression or a configuration.

Conclusion

There are several other blog posts I know of (here and here) around this topic, but they suggest solutions that are a bit more intrusive than the one we prefer which I’ve outlined in this post.

This relative paths strategy has saved us some headaches several times when moving the deployment destination for packages to different environments. So I would recommend you consider using the relative paths trick when deciding on the deployment and configuration strategy for your ETL project.

Incremental Many-to-Many Matrix Package

I was privileged to be able to work with Carl Rabeler from the SQLCAT team on a good whitepaper which performance tests several optimization techniques for many-to-many dimensions in Analysis Services. The conclusion was that a technique they call Matrix Relationship Optimization with the appropriate aggs results in the best MDX query performance. In a nutshell, the idea is that there are recurring patterns or signatures among m2m dimensions. For instance, in the Adventure Works database, every sales order in the database shares one of twelve common sales reason signatures. Query performance can be improved by reusing these shared signatures.

This optimization technique pushes some of the work to the ETL layer. Along with the whitepaper, they distributed sample Integration Services packages to load the matrix relationship tables. Those packages were, I thought, rather ingenious in their use of the script transforms to pivot and unpivot the sales reasons data and accomplish everything in one dataflow task. The packages they distributed are coded to do a full refresh during each run. I created a spinoff of their sample package that does an incremental load, so I thought I would post it in case it helps others get started. Download IncrementalSalesReasonMatrixLoad.zip and run against a fresh copy of AdventureWorksDW.

I would recommend using the incremental load package if (a) you have the ability to easily detect which rows in the source system have changed since the last successful run and (b) if the incremental load will not produce many unused signatures over time.

The whitepaper also discusses aggregations for m2m dimensions in detail. Carl’s experience with real customers suggested that aggs on the intermediate measure group are usually very helpful (i.e. are considerably smaller than the intermediate measure group itself). Thankfully, these aggs can be built as normal since the Query Subcube event in Profiler accurately reports the subcube vector.

On the other hand, he found that in real life, aggs on the data measure group to help with queries against m2m dimensions often end up being about the same size as the measure group itself, thus they are not helpful. It all just depends on your data. An agg is smaller on disk not only because it has a lower rowcount, but because it has fewer dimension keys. (It’s not as tall and not as wide.) Therefore, data measure groups with a large number of dimensions unused in queries benefit from aggs, even if the rowcount of the resulting agg alone isn’t much less than the measure group rowcount itself. So it is helpful to know how to build m2m aggs on the data measure group. Unfortunately, these aggs must be built by hand using Agg Manager, and they are rather tricky to build. Let me suggest the following steps for building them:

  1. Turn on Query Logging and set QueryLogSampling to 1.
  2. Install ASSP and run the following before each MDX query: CALL ASSP.ClearCache()
  3. Run your MDX queries which use the m2m dimension.
  4. Turn off Query Logging by setting QueryLogSampling to 0.
  5. Install BIDS Helper and launch Agg Manager. Right click on the agg design for your data measure group (e.g. Internet Sales) and choose “Add Aggregations from Query Log.” Click the “Execute SQL” button and then click OK after the query returns. Right click on the agg design and choose Edit.
  6. (Instead of using query logging, you can just run profiler and manually cut and paste the subcube vector from the Query Subcube event if you prefer.)
  7. Click the “Eliminate Redundancy” and “Eliminate Duplicates” buttons as usual.
  8. Click the “Validate Aggregations” button. It should prompt you with a message similar to: “This aggregation contains many-to-many dimension [Sales Reason]. It will not be used unless it also contains [Internet Sales Order Details].[Internet Sales Order]. The many-to-many dimension [Sales Reason] itself should not be included in the aggregation to workaround a bug.” For each agg with such a warning, edit that agg, then double-click the attribute in the m2m dimension which is already selected. This will unselect then attempt to reselect that attribute. At this point, Agg Manager will prompt you with a message similar to: “This aggregation will not be used when querying many-to-many dimension [Sales Reason] unless it also contains [Internet Sales Order Details].[Internet Sales Order]. The many-to-many dimension [Sales Reason] itself should not be included in the aggregation to workaround a bug. Would you like BIDS Helper to fix this for you?” Click Yes.

The other idea around many-to-many dimension optimization that I would like to mention is adding extra dimensions (particularly the time dimension) to the intermediate measure group and partitioning by them. Chris Webb explained this idea, too. To put it in my terms, if there’s a good deal of turnover among m2m signatures over time (i.e. signatures are retired over time) then this is a strategy to consider. In the whitepaper there are only a few tests in which the Matrix Relationship Optimization technique doesn’t perform as well as the other design alternatives. (The best example is on page 33.) I suspect that adding the time dimension to the intermediate measure group will help (though possibly at the expense of performance on other queries). As for the whitepaper, we agreed this situation is a bit more of a fringe scenario because retiring m2m signatures doesn’t happen as frequently in the real world, so we didn’t complicate the whitepaper by mentioning it. However, it is something to be aware of as you’re analyzing your m2m dimension data.

I’m glad to see that Erik Veerman’s m2m optimization ideas (#1 and #2) have now been fully fleshed out and publicized.

Oracle Drivers on an x64 Box for SSIS Development

As Steve McHugh describes very well, getting Oracle drivers to work on an x64 box for use with Integration Services 2005 development is a challenge. We are indebted to Steve for explaining the underlying problems, and you should review his article before proceeding. The following instructions are, we feel, an improvement over Steve’s instructions for the best way to get Oracle drivers working on an x64 box on which you wish to develop packages.

If all you want to do is run scheduled packages under the x64 version of SSIS, you can just do step 2. If you want to develop packages on this box or run the SQL Server Import/Export Wizard, then you need to do all the steps.

  1. Download the latest 32-bit drivers and install them. We would suggest Oracle10g Release 2 ODAC.
  2. Download the latest 64-bit drivers and install them. We would suggest Oracle10g Release 2 ODAC (64-bit) 10.2.0.3 for Windows x64.
  3. Because you have several drivers installed, we suggest you add a TNS_ADMIN environment variable which says “C:\TNS” and then move your tnsnames.ora and sqlnet.ora files to that one “C:\TNS” directory. You can add a TNS_ADMIN environment variable by right clicking My Computer on your desktop, choosing Properties, flipping to the Advanced tab, clicking the Environment Variables button, and adding that variable.
  4. Note: During one install of the drivers, it failed because it couldn’t find gacutil.exe. If you get that failure, uninstall the Oracle driver you just tried to install, then copy gacutil.exe and gacutil.exe.config from <C:\program files\Microsoft Visual Studio 8\SDK\v2.0\Bin> to <C:\program files (x86)\Microsoft Visual Studio 8\SDK\v2.0\Bin>. The most recent time we did the installs, this was not a problem, so maybe Oracle has fixed this issue.
  5. All 32-bit SQL Server applications start under the “Program Files (x86)” directory. You need to fix them so they start in the “Progra~2” directory. The following steps will need to be rerun after every subsequent SQL service pack is installed.
    • To fix the SQL Server Import/Export Wizard so you can launch it from Management Studio, fix the reference to it in the registry by running the “SQL import-export wizard on 64-bit server.reg” file that I have included in the ZIP file below. (That registry file fixes the HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\MSDTS\Setup\WizardPath registry path.)
    • The shortcuts to SQL Server Management Studio and SQL Server Business Intelligence Development Studio need to be fixed. Unfortunately, you can’t just change the shortcut, because Windows automatically expands the “Progra~2”. Instead, create a bat file that launches the EXE, then change the shortcut to point to the bat file. These bat files and shortcuts are included in the ZIP file below. Basically, those bat files look like:
      start /B "C:\Progra~2\Microsoft Visual Studio 8\Common7\IDE" "C:\Progra~2\Microsoft Visual Studio 8\Common7\IDE\devenv.exe"
    • When you double-click a .sln file, the path it uses for Visual Studio needs to be fixed. If you only have Visual Studio 2005 (i.e. SQL Server 2005) installed on this server, you can make the following change. Open the C drive… Go to the Tools menu… Folder Options… Flip to the File Types tab… Type in SLN to skip down to the SLN file type… Click the Advanced button… Highlight the “Open” action… Click Edit… Change the path to say:
      "C:\Progra~2\Microsoft Visual Studio 8\Common7\IDE\devenv.exe" "%1"
      Consider fixing any other file extensions you wish to double click which should launch 32-bit processes.
    • Fix the PATH environment variable by changing any reference that says “C:\Program Files (x86)\Microsoft SQL Server\” to “C:\Progra~2\Microsoft SQL Server\”. And change any reference to “C:\Program Files (x86)\Microsoft Visual Studio 8” to “C:\Progra~2\Microsoft Visual Studio 8”. Environment variables can be edited by right clicking My Computer on your desktop, choosing Properties, flipping to the Advanced tab, clicking the Environment Variables button, choosing the Path system variable, and clicking Edit.
    • Fix the shortcuts for other 32-bit applications that need to connect to Oracle (such as Toad) as outlined in step B above.
  6. We had trouble when our master SSIS package launched child packages out-of-process if those child packages needed to connect to Oracle. We decided to change those Execute Package Tasks to run in-process, and we didn’t investigate further.
  7. You will probably have to set the AlwaysUseDefaultCodePage property to True on the OLE DB Source components in your data flow tasks that pull from Oracle.

Download a oracle x64 fix.zip which contains a couple of shortcuts and the .reg file used in step 5a above.

Miscellaneous Note: To setup a linked server to Oracle, review this post by Max Oleznyuk.