Microsoft SSIS or SQL Server Integration Services is a data migration and integration tool that comes with the Microsoft SQL Server database that can be used to extract, integrate, and transform data. SSIS is an Extract, Transform and Load (ETL) solution.
SSIS is an upgrade of Data Transformation Services (DTS), which was an old data transformation solution included with SQL Server. The tool comes with the Microsoft SQL Server database and is used to extract data from relational databases and files so that it can be transformed.
File types supported by SSIS include XML, flat files, and Microsoft Excel. Many enterprises use the software as a data warehousing solution. Some of the tasks you can do with SSIS include:
- 1 SSIS Version History
- 2 Why is SSIS Important?
- 3 The Main SSIS Component
- 4 SSIS Packages
- 5 SSIS Tasks
- 6 Advantages and Disadvantages of SSIS
- 7 The main advantages of SSIS
- 8 SSIS is a ‘must-have’ ETL tool
SSIS Version History
SSIS may have been around for many years but it’s actually the successor to another program called Data Transformation Services (DTS). DTS was a component of SQL Server 7 and 2000 before Microsoft decided to upgrade the platform into the SSIS platform enterprises use today. The versions of SSIS are as follows:
|SSIS 2005||The first release of SSIS that replaced DTS.|
|SSIS 2008 / 2008 R2||Improved the performance of the previous version and introduced new sources.|
|SSIS 2012||Introduced the project deployment model, enabling users to deploy entire projects and packages to a server rather than being limited to individual packages.|
|SSIS 2014||No changes but added new sources and transformations through CodePlex and the SQL Server Feature Pack.|
|SSIS 2016||Updated SSIS so that the user can deploy entire packages and added additional cloud and big data sources.|
Why is SSIS Important?
SSIS is important because it provides the user with a platform to move data from one source to another. The user can extract data from sources like SQL databases, Oracle databases, DB2 databases, and Microsoft Excel files before exporting them to another location. Other data integration and extraction features make the transition more manageable.
One of the reasons why SSIS is so widely used is because of its versatility. The program’s support for a range of databases and its simple graphical user interface (GUI) help new users to complete transformations and data warehousing tasks. Users have complete control to design an ETL process around their own requirements.
Related Post: SQL Server Monitoring Tools
The Main SSIS Component
There are a number of core components that make up the SSIS GUI in the SSIS Designer. These include:
- Control Flowtab – Control flows are parts of a package that provide structure, tasks, and precedence constraints.
- Data Flow tab – Pull and transform data from a variety of sources on the device. Data flow components include sources, transformations, and destinations.
- Event Handlerstab – Create tasks in response to events at run-time.
- Package Explorer tab – View a top-down perspective of package elements.
The control flow is where you control the workflow of tasks inside a package and the order in which those tasks will be executed. You can use this through the Control Flow tab. In SSIS, control flows are part of packages and enable the user to execute different tasks.
Types of tasks you can execute include executeSQL tasks, data flow tasks, execute package tasks, script tasks, send mail tasks, and more (we’ve discussed tasks in further detail below). Control flows also include precedence constraints that connect executables, containers, and tasks together while dictating which order to complete the tasks.
A data flow is essentially an element that allows you to take data from a source, transform it, and place it into another destination. For example, you could use a data flow task to take data from a database and transform it into a Microsoft Excel file. You can manage data flows via the Data Flow tab and add them to the control flow.
There are three components that make up the data flow:
- Sources – Extract data from sources including files, databases, and other locations.
- Transformations – Process the data after it has been extracted from the source.
- Destinations – Write the data in its end destination.
As part of the data flow, you can choose the transformations you want to prepare the data to reach its end location and complete the ETL cycle. Transformations handle tasks like data cleansing, merging, sorting, joining, and distributing so that you can use the data where you need it.
Executables create events at run time and you can use the events handler to perform tasks in response. Tasks you can perform include cleaning up temporary data storage, retrieving system information, refreshing data in a table, and sending emails to notify you when there is an error or when a task fails.
Event handlers are similar to packages in the sense that they have tasks and containers. You can create custom event handlers in the Event Handlers tab and then add them to a package.
In the Package Explorer tab, you can view the hierarchy of elements of a package including connection managers, log providers, executables, precedence constraints, parameters, event handlers, variables, tasks, and containers. The package explorer is important for checking the contents of a package.
Related post: SQL Performance Tuning
One of the fundamental tasks you will have to learn when using SSIS is how to create a package. A package is a collection of connections, control flow elements, data flow elements, event handlers, parameters, and configurations that you use to process data. The contents of a package can be broken down into three components:
- Control flow (Tasks and containers) – The control flow includes tasks and containers that execute when the package is run.
- Data flow (Data sources and destinations) – The data flow determines how data is processed and transformed from source to destination.
- Connection managers – The link between the package and a data source that defines the connection string other components of the package use to interact with the source data.
In SSIS you can add tasks to the control flow. There are a range of different tasks that you can configure. Some of the main tasks you can complete are:
- Execute SQL Task – Allows you to run one or more SQL statements from a package.
- Data Flow Task – Extracts, transforms and loads data from a single source or multiple sources. Cleans and modifies data in transit.
- Analysis Services Processing Task – Processes Analysis Services objects (tabular models, cubes, dimensions, etc.). Processes one or multiple objects at a time.
- Execute Package Task – Allows a package to run another package as part of the workflow. Often used to breakdown complex workflows into multiple packages.
- Execute Process Task – Used to run an application or batch file as part of an SSIS package.
- FTP Task – Downloads and uploads data files with FTP. Also allows the user to copy files to a destination directory.
- Script Task – Enter code to perform a task that isn’t included in SSIS. Can combine multiple tasks in one script.
- Send Mail Task – Sends emails to the user if a package workflow succeeds, fails or an event (e.g. if a database backup fails) takes place.
- XML Task – Access, merge, validate and modify XML files. Can be used to merge multiple XML data sources into one document.
Advantages and Disadvantages of SSIS
Like any other tool, SSIS has its own set of advantages and disadvantages as an ETL tool. Ultimately the significance of these depends on how you wish to use the tool within your organization.
The main advantages of SSIS
SSIS brings to the table a high degree of flexibility as an ETL and data transformation tool. The user can import data from heterogeneous data sources and then transform it how they see fit. Configuration options like control flows and data flows gives the user complete control over the process.
Ease of Use
Coming to grips with the user interface and configuring SSIS is simple. Everything you need is easy to find, and you can click through to the control flow and data flow tabs without searching endlessly. If you’re ever unsure about structure, you can check through the Package Explorer.
SSIS offers a number of features that enable you to create documentation as you transform your data. You can add annotated packages, control flows, and data flows. Writing descriptions for SSIS elements allows you to make a record as you go.
The disadvantages of SSIS
Limited Operating System Support
One of the biggest limitations of SSIS is that it only supports Microsoft Windows. Even with integrations, SSIS’s lack of support for other operating systems limits its data integration capabilities in many environments. If your environment is primarily based on Windows then this drawback will be less significant.
Lack of integration Options
The limited OS support is made worse by SSIS’s limited integration with other tools. While there are extensions you can use these pale in comparison to the versatility of open-source tools like Python. SSIS isn’t necessarily a fit for every environment, particularly if you don’t want to be confined to Windows services.
SSIS is a ‘must-have’ ETL tool
If you’re looking to put data through an ETL process then SSIS is a must-have. Extracting data from sources and transforming it so it can be loaded into another destination is SSIS’s bread and butter. The software is suitable for companies regardless of whether they want to integrate one or multiple data sources.
While creating tasks to cleanse and process data can be a challenge initially but the benefits of adopting the tool are also steep. Taking the time to learn SSIS will make sure that you can prepare your data to be available where you need it when you need it.