Microsoft SQL Server provides a completely new enterprise extraction, transformation, and loading (ETL) platform called SQL Server Integration Services (SSIS) that ships with the features, tools, and functionality to build both classic and innovative kinds of ETL-based applications. This article examines some of the exciting SSIS features that you can use to build ETL applications. Along the way, you will also learn how to build a simple package using the new Business Intelligence Development Studio, which is a key component of SSIS features and can be seen as sophisticated Visual Studio.
Previous Versions of SQL Server:
Back in the days when Microsoft SQL Server 6.5 was still a hot product, the bulk copy program (BCP command utility) was the only way to back up or export databases from SQL Server to other data sources, such as other database servers or text files. However, in subsequent SQL Server versions, Microsoft added a large number of features that were aimed at improving the productivity of SQL Server developers and administrators. One such feature is DTS (Data Transformation Services). By creating DTS packages, you can combine several tasks into one process and use any programming language that supports automation (such as Visual Basic.NET, Visual C#, or Managed C++) to execute these packages and monitor their progress for errors.
With the upcoming SQL Server 2005 release, Microsoft has raised the bar by introducing SSIS, the brand new ETL tool. Similar to DTS, it provides functions for copying data from one place to another and manipulating that data at run time. However, SSIS has been completely redesigned from scratch to be an enterprise ETL platform. SSIS provides the breadth of features—and very high-level performance—necessary to build enterprise-class ETL applications. The performance comes from the integration of SQL server into .Net CLR.
SSIS is fully programmable, embeddable, and extensible, which makes it an ideal ETL platform. It has a great development environment hosted in a Visual Studio shell with cool capabilities for building workflows and pipelines through a rich set of pre-built or custom components. SQL Server 2005 ships with the following development and management environments for designing and managing packages:
- The SQL Server Management Studio is an environment for managing the storage and execution of deployed packages. It has special features for doing this, including integration with the DTS Service, and the ability to enumerate packages on remote servers. But it is not a design environment.
- The Business Intelligence Design Studio is an environment for designing packages, organizing them in Solutions and Projects, debugging them, and managing source and version control for multi-user projects.
These two environments offer powerful facilities for deploying, debugging, and monitoring deployed packages. In addition, SQL Server 2005 provides rich workflow capabilities that you can use for performing sophisticated data manipulations.
There are too many SSIS features to cover in a single article, so this piece highlights some of the important ones and then moves on to creating a simple package with Business Intelligence Development Studio. The following are the most notable features:
- You can use SSIS to transfer millions of rows of data to and from heterogeneous data sources, but SSIS functionality doesn't stop there. The tool leverages the end-to-end BI suite by offering complete data integration, movement, and shaping, which means that SSIS provides data cleansing, extensibility, and interoperability.
- SSIS comes with a lot of pre-built data-cleansing functionality, including completely integrated functions, such as fuzzy matching and fuzzy grouping, which use algorithms to match or group disparate data to a configurable degree of accuracy.
- SSIS offers broad extensibility points for third-party component vendors, meaning that if functionality is not available out of the box, SSIS lets you build your own components or add a third-party component to solve your problem.
- SSIS also has greatly improved performance and scalability that allow you to host complex, high-volume ETL applications on lightweight servers, enabling you to scale down.
- SSIS can also help reduce ETL data staging areas and help minimize performance costs associated with data staging (disk I/O and serial processing). This is made possible by the ability to perform complex data transformations, data cleansing, and high-volume lookups—all inline from source to destination.
- SSIS also provides a new feature, the Slowly Changing Dimension (SCD) wizard. Through the SCD interface, you can rapidly generate all the steps and required code to add unique handling of history to multiple attributes in a given dimension.
- The development environment for SSIS, known as Business Intelligence Development Studio, is hosted in Visual Studio, enabling scripting and other programming tasks that take advantage of that enterprise development environment.
- SSIS now fully supports the Microsoft .NET Framework, allowing developers to program SSIS in their choice of .NET-compliant languages, as well as native code.
- The Data Transformation run-time engine is exposed both as a native COM object model and as an entirely managed object model. Although the Data Transformation engine is written in native code, it is available though a signed Primary Interop Assembly (PIA) that enables full managed access to it.
Now that you have a general understanding of the SSIS features, consider an example that shows the steps involved in constructing and executing a package using the Business Intelligence Development Studio environment.
Creating a Simple Package
Consider a simple example wherein you transfer data from one table to another table by executing a stored procedure. For each row in the source table, you will invoke the stored procedure to load the data into the destination table. Before looking at the package design, create the required tables and the stored procedure. You will create these database objects in the AdventureWorks database, which is one of the sample databases that ships with SQL Server 2005.
Using the package, you will retrieve the FirstName and LastName columns from the Person.Contact table and load them into the dbo.Contact Table in other database. Using the derived column transformation given in SSIS, you will concatenate the FirstName and LastName columns and assign the concatenated value to the FullName column of the Emp table. Now that you understand the package functionality, take a look at the design of the package, as shown in the next section.
Creating a Package Using Business Intelligence Studio
Before creating the package, you first need to create a project that can host the package. Open Business Intelligence Development Studio from the Start->Programs menu. Select File->New Project from the menu and you will see the dialog box in Figure 1.
Figure 1:New Project Dialog Box
As Figure 1 shows, select Integration Services Project as the project template and specify the name of the project as PracticeSSIS. Once the project is created, you will see the package designer window of the default package Package.dtsx.
For the requirements we've outlined, the package we are going to create is very simple and straightforward. It consists of one Data Flow task with two components inside the data flow: an OLE DB Source and an OLE DB Destination. To start, bring up the toolbox by selecting View->Toolbox from the menu. You will see the dialog box in Figure 2.
Figure 2:Toolbox Dialog Box
As you can see, the toolbox offers a number of options that you can leverage to construct your packages. Drag and drop a Data Flow Task from the toolbar onto the package designer. Next, double-click on the Data Flow Task and you will see the designer window in Figure 3, which you can use to build sophisticated data flows.
Figure 3:Data Flow Task Designer Window
Now, bring up the toolbox again by selecting View->Toolbox from the menu. From the toolbox, drag and drop an OLE DB Source onto the designer and double-click on the object to bring up the window in Figure 4, 5, 6. The window allows you to specify the data source as well as the name of the source table to use for transformation.
Figure 4:Designer View of OLE DB Source
Figure 5:Specify Connection and source table.
Figure 6:Specify columns you are interested in.
Click on OK in the Figure 6 dialog box.
Now, drag and drop a Derived Column Transformation onto the designer and connect the OLE DB Source object to it as shown in Figure 6.
Figure 7:Derived column transformation.
Double-click on the Derived Column object to bring up its properties window. Select <add as new column> under the derived column. Type FullName in Derived column name and put expression as FirstName + “ “ + LastName.
Figure 7:Setting properties for Derived column transformation.
Now, drag and drop an OLE DB destination object onto the designer and connect the Derived column transformation object to the OLE DB destination object. Double-click on the OLE DB destination object to bring up its properties window. Modify the Connection Managers tab of the properties window to look like Figure 8.
Figure 8:OLEDB destination transformation.
Figure 9:Setting properties for OLEDB destination transformation.
Now start the package execution by Debug->Start Debugging or pressing F5.
Figure 10:Output of a Successful Package Execution
Background Color of object:
Green: Success, Yellow: In progress, Red: Error occurred
If the package executes successfully, you will see an output that is somewhat similar to Figure 10, which shows the objects on a green background. The above screenshot also shows the number of rows transferred between the source and destination tables.
A powerful base for Creating Packages
With the release of SQL Server Integration Services, Microsoft now has a powerful ETL tool that is not only enterprise class but can also go a long way in increasing the productivity of developers. Its feature set makes it extremely easy and seamless to build sophisticated, high-performance ETL applications. This article has only scratched the surface of what is possible with SSIS, but the simple example it gave should provide a powerful base for creating packages with SSIS.