Tuesday, January 25, 2011

Informatica Pushdown Optimization


What is Pushdown Optimization and things to consider

Pushdown Optimization is the process by which the Informatica Integration Service pushes transformation logic to the source or target database. When a session is configured for pushdown optimization, the Integration Service translates the transformation logic into SQL queries and sends them to the database. The source or target database then executes the SQL queries to process the transformations.

How Does Pushdown Optimization (PO) Work?

The Integration Service generates database-specific SQL statements when a native database driver is used. With ODBC drivers, the Integration Service cannot detect the database type and generates ANSI SQL instead. As a result, the Integration Service can usually push more transformation logic to a database through a native driver than through an ODBC driver.
For any SQL override, the Integration Service creates a view (PM_*) in the database while executing the session task and drops the view after the task completes. Similarly, it also creates sequences (PM_*) in the database.
The database schema used by the connections (SQ connection, LKP connection) must have the Create View and Create Sequence privileges; otherwise the session fails.

A Few Benefits of Using PO

  • No memory or disk space is required on the Informatica server to manage the cache for Aggregator, Lookup, Sorter and Joiner transformations, because the transformation logic is pushed to the database.
  • The SQL generated by the Integration Service can be viewed before running the session through the Pushdown Optimization Viewer, which makes debugging easier.
  • When inserting into targets, the Integration Service processes rows one by one using bind variables (soft parse only: execution time, no parsing time). With pushdown optimization, the statement is executed once.
Without pushdown optimization:
INSERT INTO EMPLOYEES(ID_EMPLOYEE, EMPLOYEE_ID, FIRST_NAME, LAST_NAME, EMAIL,
PHONE_NUMBER, HIRE_DATE, JOB_ID, SALARY, COMMISSION_PCT,
MANAGER_ID, MANAGER_NAME,
DEPARTMENT_ID) VALUES (:1, :2, :3, :4, :5, :6, :7, :8, :9, :10, :11, :12, :13) -- executes 7012352 times
With pushdown optimization:
INSERT INTO EMPLOYEES(ID_EMPLOYEE, EMPLOYEE_ID, FIRST_NAME, LAST_NAME, EMAIL,
PHONE_NUMBER, HIRE_DATE, JOB_ID, SALARY, COMMISSION_PCT,
MANAGER_ID, MANAGER_NAME, DEPARTMENT_ID)
SELECT CAST(PM_SJEAIJTJRNWT45X3OO5ZZLJYJRY.NEXTVAL AS NUMBER(15, 2)),
EMPLOYEES_SRC.EMPLOYEE_ID, EMPLOYEES_SRC.FIRST_NAME, EMPLOYEES_SRC.LAST_NAME,
CAST((EMPLOYEES_SRC.EMAIL || '@gmail.com') AS VARCHAR2(25)),
EMPLOYEES_SRC.PHONE_NUMBER, CAST(EMPLOYEES_SRC.HIRE_DATE AS date),
EMPLOYEES_SRC.JOB_ID, EMPLOYEES_SRC.SALARY, EMPLOYEES_SRC.COMMISSION_PCT,
EMPLOYEES_SRC.MANAGER_ID, NULL, EMPLOYEES_SRC.DEPARTMENT_ID
FROM (EMPLOYEES_SRC LEFT OUTER JOIN EMPLOYEES PM_Alkp_emp_mgr_1
ON (PM_Alkp_emp_mgr_1.EMPLOYEE_ID = EMPLOYEES_SRC.MANAGER_ID))
WHERE ((EMPLOYEES_SRC.MANAGER_ID = (SELECT PM_Alkp_emp_mgr_1.EMPLOYEE_ID
FROM EMPLOYEES PM_Alkp_emp_mgr_1
WHERE (PM_Alkp_emp_mgr_1.EMPLOYEE_ID = EMPLOYEES_SRC.MANAGER_ID)))
OR (0=0)) -- executes 1 time
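
The contrast between the two statements above can be tried on a small scale. The sketch below is only an illustration and does not use Informatica: it assumes an in-memory SQLite database with two made-up tables (employees_src and employees). It loads the same rows twice, once with one bound INSERT per row and once with a single INSERT ... SELECT, and counts the statements sent in each case.

```python
import sqlite3


def load_row_by_row(conn):
    """Mimic a load without pushdown: fetch the rows, then send one bound INSERT per row."""
    rows = conn.execute("SELECT employee_id, email FROM employees_src").fetchall()
    statements = 0
    for employee_id, email in rows:
        conn.execute(
            "INSERT INTO employees (employee_id, email) VALUES (?, ?)",
            (employee_id, email + "@gmail.com"),  # transformation done outside the database
        )
        statements += 1
    return statements


def load_pushdown(conn):
    """Mimic full pushdown: one INSERT ... SELECT, transformation done inside the database."""
    conn.execute(
        "INSERT INTO employees (employee_id, email) "
        "SELECT employee_id, email || '@gmail.com' FROM employees_src"
    )
    return 1


def demo():
    # Hypothetical tables, created only for this illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees_src (employee_id INTEGER, email TEXT)")
    conn.execute("CREATE TABLE employees (employee_id INTEGER, email TEXT)")
    conn.executemany(
        "INSERT INTO employees_src VALUES (?, ?)",
        [(i, f"user{i}") for i in range(1, 6)],
    )
    check = "SELECT employee_id, email FROM employees ORDER BY employee_id"

    row_statements = load_row_by_row(conn)
    first_result = conn.execute(check).fetchall()

    conn.execute("DELETE FROM employees")
    pushdown_statements = load_pushdown(conn)
    second_result = conn.execute(check).fetchall()

    conn.close()
    return row_statements, pushdown_statements, first_result == second_result
```

Calling demo() returns the number of INSERT statements issued by each approach and whether both loads produced the same target rows. With millions of source rows, as in the example above, the difference in statement count is what matters.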

Things to note when using PO

There are cases where the Integration Service and Pushdown Optimization can produce different result sets for the same transformation logic. This can happen during data type conversion, handling null values, case sensitivity, sequence generation, and sorting of data.
The database and Integration Service produce different output when the following settings and conversions are different:
  • Nulls treated as the highest or lowest value: When sorting data, the Integration Service can be configured to treat null values as the lowest value, whereas the database may treat null values as the highest value in the sort order (Oracle does so by default).
  • SYSDATE built-in variable: The built-in variable SYSDATE in the Integration Service returns the current date and time for the node running the service process. In the database, however, SYSDATE returns the current date and time for the machine hosting the database. If the time zone of the machine hosting the database is not the same as the time zone of the machine running the Integration Service process, the results can vary.
  • Date Conversion: The Integration Service converts all dates before pushing transformations to the database. If the format is not supported by the database, the session fails.
  • Logging: When the Integration Service pushes transformation logic to the database, it cannot trace all the events that occur inside the database server. The statistics the Integration Service can trace depend on the type of pushdown optimization. When the Integration Service runs a session configured for full pushdown optimization and an error occurs, the database handles the errors. When the database handles errors, the Integration Service does not write reject rows to the reject file.
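
The null-ordering difference in the list above is easy to picture with a tiny example. This is a hedged sketch in plain Python, not Informatica or database code; the function name sort_with_nulls and its nulls parameter are made up for the illustration. It sorts the same values twice, once with nulls treated as the lowest value and once with nulls treated as the highest.

```python
def sort_with_nulls(values, nulls="low"):
    """Sort ascending, placing None first (nulls="low") or last (nulls="high")."""
    present = sorted(v for v in values if v is not None)
    missing = [None] * sum(1 for v in values if v is None)
    return missing + present if nulls == "low" else present + missing


salaries = [3000, None, 1200, None, 2500]

# Nulls as the lowest value, as a Sorter can be configured to do.
nulls_lowest = sort_with_nulls(salaries, nulls="low")

# Nulls as the highest value, as some databases do for an ascending sort.
nulls_highest = sort_with_nulls(salaries, nulls="high")
```

Any downstream logic that depends on row order (for example, picking the first row per group) can therefore return a different row once the sort is pushed to the database.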

Monday, January 24, 2011

Xcelsius Dashboards – Integration with SQL Server Reporting Services


Prerequisites
  • The Xcelsius Reporting Services (XRS) gateway must be installed on a web server where IIS, the .NET Framework and SQL Server are installed and configured.
  • SSRS (SQL Server Reporting Services) reports must be deployed to the SQL Server so that they are accessible from the dashboard.
Xcelsius Connector to be used
The Reporting Services Button can be used to get data from a report that is deployed on the SQL Server. The deployed reports can be accessed through the following URL:
http://servername/xrs/xrs.asmx/GetReports
Here, servername is the SQL Server name.
Building the Dashboard
  • Create a report using SQL Server Reporting Services and deploy it to the server.
  • While creating the dashboard, use the Reporting Services Button to connect to the SSRS data source.
  • In the URL box, enter the path where the reports are deployed. On clicking the Submit button, the reports on the server are listed.
  • Select the target report and do the necessary data mapping in the underlying Excel spreadsheet.
  • If the report contains prompts, they are listed under the report parameters, which users can use at run time to pass values.
  • Load the data into the dashboard using any one of the following options, as required:
      • Refresh on Load: Loads the data into the dashboard as soon as it is opened.
      • Refresh on Interval: Reloads the data at periodic intervals.
      • Trigger Behavior: Loads the data based on an action in the dashboard.
  • Generate the Flash file (.swf) from the dashboard and deploy it to the portal for users to view the dashboard.
Hope you will be able to leverage your SQL Server environment effectively for integrating with Xcelsius dashboards.  Please get back to me for any queries.  Have an enjoyable 2011!  Happy year ahead friends!

Monday, January 17, 2011

Business Objects Query Builder


Accessing Query Builder

To access the Query Builder, point your web browser to your BusinessObjects server.  Query Builder can be found at the following URL:  http://[server]:[port]/AdminTools/.


Log on as an Administrator to get full access to all the repository objects. From here you can start writing your query. There are three InfoObjects tables that you can query:
  • CI_INFOOBJECTS
    Contains objects that are often used to build the user desktop, such as favorites folders and reports.
  • CI_SYSTEMOBJECTS
    Contains objects that are often used to build the admin desktop and internal system objects, such as servers, connections, users, and user groups.
  • CI_APPOBJECTS
    Contains objects that represent BusinessObjects Enterprise Solutions. For example, the InfoView and Desktop Intelligence objects are stored in this table.
The following columns are frequently used from the above repository tables:

Column: Description
SI_ID: Identifies each InfoObject instance uniquely in the database. However, this is not a primary key. If the instance is deleted, the value may later be reassigned to a new instance.
SI_NAME: Name of the InfoObject instance.
SI_KIND: Identifies each row by a particular InfoObject extended class type.
    SI_KIND for CI_INFOOBJECTS includes Webi, Pdf, Excel, Folder, FullClient, FavoritesFolder, Inbox, PersonalCategory, Shortcut, MyInfoView
    SI_KIND for CI_APPOBJECTS includes Universe, Universe Folder, MetaData.DataConnection, ReportConvTool, WebIntelligence, Discussions, InfoView, CMC, busobjReporter, Designer, AdHoc
    SI_KIND for CI_SYSTEMOBJECTS includes User, UserGroup, Connection, secWinAD, secLDAP, secWindowsNT
SI_OWNERID: User ID of the owner.
SI_OWNER: User name of the owner.
SI_CHILDREN: Number of children of the InfoObject.
SI_CUID: CUIDs are Cluster Unique Identifiers that uniquely identify an InfoObject within a given cluster and also identify replicas or copies of an object across multiple CMS clusters. Because CUIDs are moderately lengthy strings, they are less efficient to use and slower to query for.
SI_UNIVERSE: Universes used by the document. A document may use multiple universes, in which case a list of the universes' SI_ID values is attached to the property.
SI_PARENTID: Identifies the InfoObject instance that operates in a parent relationship to the current InfoObject. Typically, a report that is configured to be scheduled is a parent, and each report that is copied and stored when scheduled will view the source report as its parent.
SI_INSTANCE: Identifies whether the item that is stored in the database row is an InfoObject that was created through scheduling (such as a nightly report) and is therefore an 'instance'.


Relationship between InfoObjects
CMS InfoObjects are organized into hierarchies based on the relationships between them. A hierarchy can be based on folders or on user groups.
InfoObjects relate to each other not only through the folder hierarchy; they may have other relationships as well. For example, the SI_OWNERID property identifies which user owns a document.

Sample Queries

SELECT * FROM CI_INFOOBJECTS
Returns the details for all the ‘InfoObjects’ (documents, folders, and other content) in your repository; you can filter this list using a WHERE clause.

SELECT * FROM CI_INFOOBJECTS WHERE SI_KIND='CrystalReport'
Returns all ‘Crystal Reports’.

SELECT * FROM CI_INFOOBJECTS WHERE SI_KIND='Webi'
Returns all ‘Web Intelligence documents’.

SELECT * FROM CI_APPOBJECTS WHERE SI_KIND='Universe'
Returns all ‘Universes’ in the BOE Repository.

SELECT * FROM CI_SYSTEMOBJECTS WHERE SI_KIND='User'
Returns all Users in the BOE Repository.

Improving Query Performance in Query Builder


1. For improved performance, use the following indexed properties in the query's WHERE clause wherever possible.

SI_CUID
SI_GUID
SI_HIDDEN_OBJECT
SI_ID
SI_INSTANCE_OBJECT
SI_KIND
SI_NAME
SI_NAMEDUSER
SI_NEXTRUNTIME
SI_OWNERID
SI_PARENTID
SI_PLUGIN_OBJECT
SI_RECURRING
SI_RUID
SI_RUNNABLE_OBJECT
SI_SCHEDULE_STATUS
SI_UPDATE_TS
SI_INSTANCE


2. The order of the above properties in the WHERE clause also affects query performance, because the Query Builder processes queries from top to bottom and left to right. The selection criteria should therefore be ordered from the most restrictive to the least restrictive.

For example, SI_NAME = 'Test Report' should be placed before SI_KIND = 'Webi' in the query.
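
As a rough sketch of the ordering rule above, the helper below assembles a Query Builder statement with the most restrictive condition first. The function name build_cms_query and the estimated match counts are assumptions made up for the illustration; in practice you would judge restrictiveness from what you know about your own repository.

```python
def build_cms_query(table, predicates):
    """Build a CMS query with conditions ordered from most to least restrictive.

    predicates is a list of (condition, estimated_matches) pairs; a smaller
    estimate means a more restrictive condition. The estimates are guesses
    supplied by the caller, not values read from the repository.
    """
    ordered = sorted(predicates, key=lambda pair: pair[1])
    where = " AND ".join(condition for condition, _ in ordered)
    return f"SELECT SI_ID, SI_NAME, SI_KIND FROM {table} WHERE {where}"


query = build_cms_query(
    "CI_INFOOBJECTS",
    [
        ("SI_KIND = 'Webi'", 5000),      # assumed: many Web Intelligence documents
        ("SI_NAME = 'Test Report'", 1),  # assumed: a single report with this name
    ],
)
```

The resulting string places SI_NAME = 'Test Report' before SI_KIND = 'Webi' and can be pasted into the Query Builder as is.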

I will discuss a few more queries in the next post, which will be followed by details of the File Repository Server.

Happy blogging!  Have a good year ahead!

Monday, January 3, 2011

Informatica Performance Improvement Tips


We often come across situations where the Data Transformation Manager (DTM) takes a long time to read from a source or to write to a target. The following standards and guidelines can improve overall performance.
  • Use a Source Qualifier to join the source tables if they reside in the same schema.
  • Make use of the Source Qualifier “Filter” property if the source type is relational.
  • If subsequent sessions do a lookup on the same table, use a persistent cache in the first session. The data remains in the cache and is available to the subsequent sessions.
  • Use integers for flags, as integer comparison is faster than string comparison.
  • Use the table with fewer records as the master table for joins.
  • While reading from flat files, define the appropriate data type instead of reading as String and converting.
  • Connect only the ports that are required by subsequent transformations, and check whether the remaining ports can be removed.
  • Suppress the generated ORDER BY by placing '--' at the end of the override query in Lookup transformations.
  • Minimize the number of Update Strategy transformations.
  • Group by simple columns in transformations such as Aggregator and Source Qualifier.
  • Use a Router transformation in place of multiple Filter transformations.
  • Turn off verbose logging when moving mappings to the UAT/Production environment.
  • For large volumes of data, drop indexes before loading and recreate them after the load.
  • For large volumes of records, use bulk load and increase the commit interval to a higher value.
  • Set ‘Commit on Target’ in the sessions.
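
The ORDER BY tip in the list above works because a trailing '--' turns whatever is appended after it into an SQL comment. The sketch below imitates this with SQLite rather than Informatica. The dept table, the helper names, and the assumption that the tool appends an ORDER BY over all lookup ports to the override text are all illustrative.

```python
import sqlite3


def with_order_by_suppressed(override_sql, order_by):
    """Add the caller's own ORDER BY, then '--' so that anything appended later is a comment."""
    return f"{override_sql.rstrip()} ORDER BY {order_by} --"


def append_generated_order_by(sql, ports):
    """Stand-in for a tool appending ORDER BY over every lookup port (assumed behaviour)."""
    return f"{sql} ORDER BY {', '.join(ports)}"


def demo():
    # Hypothetical lookup table, created only for this illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dept (dept_id INTEGER, dept_name TEXT, location TEXT)")
    conn.executemany(
        "INSERT INTO dept VALUES (?, ?, ?)",
        [(2, "Sales", "NY"), (1, "HR", "LA")],
    )
    override = with_order_by_suppressed(
        "SELECT dept_id, dept_name, location FROM dept", "dept_id"
    )
    final_sql = append_generated_order_by(override, ["dept_id", "dept_name", "location"])
    rows = conn.execute(final_sql).fetchall()  # the appended ORDER BY is ignored
    conn.close()
    return final_sql, rows
```

The statement that reaches the database sorts on dept_id only; the longer generated ORDER BY sits behind the comment marker and costs nothing.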