Tuesday, January 25, 2011

Informatica Pushdown Optimization


What is Pushdown Optimization and things to consider

The process of pushing transformation logic to the source or target database by Informatica Integration service is known as Pushdown Optimization. When a session is configured to run for Pushdown Optimization, the Integration Service translates the transformation logic into SQL queries and sends the SQL queries to the database. The Source or Target Database executes the SQL queries to process the transformations.

How does Pushdown Optimization (PO) Works?

The Integration Service generates SQL statements when native database driver is used. In case of ODBC drivers, the Integration Service cannot detect the database type and generates ANSI SQL.  The Integration Service can usually push more transformation logic to a database if a native driver is used, instead of an ODBC driver.
For any SQL Override, Integration service creates a view (PM_*) in the database while executing the session task and drops the view after the task gets complete. Similarly it also create sequences (PM_*) in the database.
Database schema (SQ Connection, LKP connection), should have the Create View / Create Sequence Privilege, else the session will fail.

Few Benefits in using PO

  • There is no memory or disk space required to manage the cache in the Informatica server for Aggregator, Lookup, Sorter and Joiner Transformation, as the transformation logic is pushed to database.
  • SQL Generated by Informatica Integration service can be viewed before running the session through Optimizer viewer, making easier to debug.
  • When inserting into Targets, Integration Service do row by row processing using bind variable (only soft parse – only processing time, no parsing time). But In case of Pushdown Optimization, the statement will be executed once.
Without Using Pushdown optimization:
INSERT INTO EMPLOYEES(ID_EMPLOYEE, EMPLOYEE_ID, FIRST_NAME, LAST_NAME, EMAIL,
PHONE_NUMBER, HIRE_DATE, JOB_ID, SALARY, COMMISSION_PCT,
MANAGER_ID,MANAGER_NAME,
DEPARTMENT_ID) VALUES (:1, :2, :3, :4, :5, :6, :7, :8, :9, :10, :11, :12, :13) –executes 7012352 times
With Using Pushdown optimization
INSERT INTO EMPLOYEES(ID_EMPLOYEE, EMPLOYEE_ID, FIRST_NAME, LAST_NAME, EMAIL, PHONE_NUMBER, HIRE_DATE, JOB_ID, SALARY, COMMISSION_PCT, MANAGER_ID, MANAGER_NAME, DEPARTMENT_ID) SELECT CAST(PM_SJEAIJTJRNWT45X3OO5ZZLJYJRY.NEXTVAL AS NUMBER(15, 2)), EMPLOYEES_SRC.EMPLOYEE_ID, EMPLOYEES_SRC.FIRST_NAME, EMPLOYEES_SRC.LAST_NAME, CAST((EMPLOYEES_SRC.EMAIL || ‘@gmail.com’) AS VARCHAR2(25)), EMPLOYEES_SRC.PHONE_NUMBER, CAST(EMPLOYEES_SRC.HIRE_DATE AS date), EMPLOYEES_SRC.JOB_ID, EMPLOYEES_SRC.SALARY, EMPLOYEES_SRC.COMMISSION_PCT, EMPLOYEES_SRC.MANAGER_ID, NULL, EMPLOYEES_SRC.DEPARTMENT_ID FROM (EMPLOYEES_SRC LEFT OUTER JOIN EMPLOYEES PM_Alkp_emp_mgr_1 ON (PM_Alkp_emp_mgr_1.EMPLOYEE_ID = EMPLOYEES_SRC.MANAGER_ID)) WHERE ((EMPLOYEES_SRC.MANAGER_ID = (SELECT PM_Alkp_emp_mgr_1.EMPLOYEE_ID FROM EMPLOYEES PM_Alkp_emp_mgr_1 WHERE (PM_Alkp_emp_mgr_1.EMPLOYEE_ID = EMPLOYEES_SRC.MANAGER_ID))) OR (0=0)) –executes 1 time

Things to note when using PO

There are cases where the Integration Service and Pushdown Optimization can produce different result sets for the same transformation logic. This can happen during data type conversion, handling null values, case sensitivity, sequence generation, and sorting of data.
The database and Integration Service produce different output when the following settings and conversions are different:
  • Nulls treated as the highest or lowest value: While sorting the data, the Integration Service can treat null values as lowest, but database treats null values as the highest value in the sort order.
  • SYSDATE built-in variable: Built-in Variable SYSDATE in the Integration Service returns the current date and time for the node running the service process. However, in the database, the SYSDATE returns the current date and time for the machine hosting the database. If the time zone of the machine hosting the database is not the same as the time zone of the machine running the Integration Service process, the results can vary.
  • Date Conversion: The Integration Service converts all dates before pushing transformations to the database and if the format is not supported by the database, the session fails.
  • Logging: When the Integration Service pushes transformation logic to the database, it cannot trace all the events that occur inside the database server. The statistics the Integration Service can trace depend on the type of pushdown optimization. When the Integration Service runs a session configured for full pushdown optimization and an error occurs, the database handles the errors. When the database handles errors, the Integration Service does not write reject rows to the reject file.

Monday, January 24, 2011

Xcelsius Dashboards – Integration with SQL Server Reporting Services


Pre – requisite
  • Xcelsius Reporting Services (XRS) gateway needs to be installed on a web server where IIS, .NET framework and SQL Server are installed and configured.
  • SSRS (SQL Server Reporting Services) reports need to be deployed in the SQL server, so that it can be accessible in the dashboard.
Xcelsius Connector to be used
Reporting Services Button can be used to get data from the SQL Server report which is deployed in the SQL server.   The deployed report can be accessed through the below URL
http://servername/xrs/xrs.asmx/GetReports”
Here the servername is the SQL Server Name.
Building the Dashboard
  • Create a report using the SQL Server reporting services and deploy it to the server.
  • While creating the dashboard, use the Reporting Services Button to connect to the SSRS data source as below:
  • In the URL box, enter the path where the reports are deployed.   On clicking the submit button, the reports in the server are listed.
  • Select the target report and do the necessary data mapping in the underlying excel.
  • If the report contains prompts, they would be listed in the report parameters that can be used by users at run time to pass values.
  • Load the data into the dashboard based on the any one of the options available given below as per the requirement:
  • Refresh on Load: Loads the data to the dashboard as soon as it is opened.
  • Refresh on Interval: Loads the dashboard in periodic intervals.
  • Trigger Behavior: Loads the dashboard based on an action in the dashboard.
  • Generate the flash file (.swf) from the dashboard and deploy it portal for users to view the dashboard.
Hope you will be able to leverage your SQL Server environment effectively for integrating with Xcelsius dashboards.  Please get back to me for any queries.  Have an enjoyable 2011!  Happy year ahead friends!

Monday, January 17, 2011

Business Objects Query Builder


Accessing Query Builder

To access the Query Builder, point your web browser to your BusinessObjects server.  Query Builder can be found at the following URL:  http://[server]:[port]/AdminTools/.


Log on as an Administrator to get full access to all the repository objects.  From here you can start writing your query.  There are three Info objects tables that you can query:
  • CI_INFOOBJECTS
    Contains objects that are often used to build the user desktop, such as favorites folders and reports.
  • CI_SYSTEMOBJECTS
    Contains objects that are often used to build the admin desktop and internal system objects, such as servers, connections, users, and user groups.
  • CI_APPOBJECTS
    Contains objects that represent BusinessObjects Enterprise Solutions. For example, the InfoView and Desktop Intelligence objects are stored in this table.
Following columns are the frequently used from the above repository tables

Column Description
SI_IDIdentifies each InfoObject instance uniquely in the database. But, this is not a primary key. If the instance is deleted, the value may later be reassigned to a new instance.
SI_NAMEName of the InfoObject instance.
SI_KINDIdentifies each row by a particular InfoObject extended class type.
SI_KIND for CI_INFOOBJECTS includes Webi, Pdf, Excel, Folder, FullClient, FavoritesFolder, Inbox, PersonalCategory, Shortcut, MyInfoView
SI_KIND for CI_APPOBJECTS includes Universe, Universe Folder, MetaData.DataConnection,ReportConvTool, WebIntelligence, Discussions, InfoView, CMC, busobjReporter, Designer, AdHoc
SI_KIND for CI_SYSTEMOBJECTS includes User, UserGroup,Connection,secWinAD, secLDAP, secWindowsNT
SI_OWNERIDUser ID of the owner
SI_OWNERUser name of the owner
SI_CHILDRENNumber of children for the Infoobject
SI_CUIDCUIDs are Cluster Unique Identifiers that uniquely identify an InfoObject, within a given cluster and also identify replicas or copies of an object across multiple CMS clusters. Because CUIDs are moderately lengthy strings they are less efficient to use and slower to query for.
SI_UNIVERSEUniverses used by the document, there might be multiple universes used in one document; you may see a list of universes’ SI_ID attached to the property.
SI_PARENTIDIdentifies the InfoObject instance that operates in a parent relationship to the current InfoObject. Typically, a report that is configured to be scheduled is a parent, and each report that is copied and stored when scheduled will view the source report as its parent.
SI_INSTANCEIdentifies whether the item that is stored in the database row is an InfoObject that was created through scheduling (such as a nightly report) and is therefore an ‘instance‘.


Relationship between InfoObjects
CMS InfoObjects are organized into hierarchies based on the relationship between them. The hierarchy could be based on folder based or user group.
From above diagram, the InfoObjects relate to each other not only by folder hierarchy, they may have other relationships. For example, the SI_OWNERID is the property to identify the ownership from the user to the document.

Sample Queries

SELECT * FROM CI_INFOOBJECTS
Returns the details for all the ‘InfoObjects’ (documents, folders, and other content) in your repository; you can filter this list using a WHERE clause.

SELECT * FROM CI_INFOOBJECTS WHERE SI_KIND=’CrystalReport’
Returns all ‘Crystal Reports’.

SELECT * FROM CI_INFOOBJECTS WHERE SI_KIND=’Webi’
Returns all ‘Web Intelligence documents’.

SELECT * FROM CI_APPOBJECTS WHERE SI_KIND=’Universe’
Returns all ‘Universes’ in the BOE Repository.

SELECT * FROM CI_SYSTEMOBJECTS WHERE SI_KIND=’User’
Returns all Users in the BOE Repository.

Improving Query Performance in Query Builder


1. For improved performance use the below Indexed properties in query’s WHERE clause wherever required.

SI_CUID
SI_GUID
SI_HIDDEN_OBJECT
SI_ID
SI_INSTANCE_OBJECT
SI_KIND
SI_NAME
SI_NAMEDUSER
SI_NEXTRUNTIME
SI_OWNERID
SI_PARENTID
SI_PLUGIN_OBJECT
SI_RECURRING
SI_RUID
SI_RUNNABLE_OBJECT
SI_SCHEDULE_STATUS
SI_UPDATE_TS
SI_INSTANCE


2. Order of the above properties in WHERE clause also improves the Query performance as the Query Builder processes queries from top to bottom and left to right. So the selection criteria should be ordered from the most restrictive to the least restrictive.

For example, SI_NAME = ‘Test Report’ should be placed before SI_KIND = ‘WebI’ in the query.

I will discuss on few more queries in the next blog that will be followed by the File Repository Server details.

Happy blogging!  Have a good year ahead!

Monday, January 3, 2011

Informatica Performance Improvement Tips


We often come across situations where Data Transformation Manager (DTM) takes more time to read from Source or when writing in to a Target. Following standards/guidelines can improve the overall performance.
  • Use Source Qualifier if the Source tables reside in the same schema
  • Make use of Source Qualifer  “Filter” Properties if the Source type is Relational.
  • If the subsequent sessions are doing lookup on the same table, use persistent cache in the first session. Data remains in the Cache and available for the subsequent session for usage.
  • Use flags as integer, as the integer comparison is faster than the string comparison.
  • Use tables with lesser number of records as master table for joins.
  • While reading from Flat files, define the appropriate data type instead of reading as String and converting.
  • Have all Ports that are required connected to Subsequent Transformations else check whether we can remove these ports
  • Suppress ORDER BY using the ‘–‘ at the end of the query in Lookup Transformations
  • Minimize the number of Update strategies.
  • Group by simple columns in transformations like Aggregate, Source Qualifier
  • Use Router transformation in place of multiple Filter transformations.
  • Turn off the Verbose Logging while moving the mappings to UAT/Production environment.
  • For large volume of data drop index before loading and recreate indexes after load.
  • For large of volume of records Use Bulk load Increase the commit interval to a higher value large volume of data
  • Set ‘Commit on Target’ in the sessions

Thursday, December 23, 2010

Leveraging Metadata in Informatica Workflow-Session/Analysis

We can leverage the metadata collected in the Informatica repository for many interesting analysis, few of the scenarios where I have leveraged the Informatica Metadata are as following.
This SQL Queries can be executed in Oracle database with no changes and requires little modification with other databases.
Failed Sessions
The following query lists the failed sessions. To make it work for the last ‘n’ days, replace SYSDATE-1 with SYSDATE – n
QUERY:
SELECT SUBJECT_AREA AS FOLDER_NAME,
SESSION_NAME,
LAST_ERROR AS ERROR_MESSAGE,
DECODE (RUN_STATUS_CODE,3,’Failed’,4,’Stopped’,5,’Aborted’) AS STATUS,
ACTUAL_START AS START_TIME,
SESSION_TIMESTAMP
FROM REP_SESS_LOG
WHERE RUN_STATUS_CODE != 1
AND TRUNC(ACTUAL_START) BETWEEN TRUNC(SYSDATE -1) AND TRUNC(SYSDATE)
RESULT:
Long running Sessions
The following query lists long running sessions. To make it work for the last ‘n’ days, replace SYSDATE-1 with SYSDATE – n
QUERY:
SELECT SUBJECT_AREA AS FOLDER_NAME,
SESSION_NAME,
SUCCESSFUL_SOURCE_ROWS AS SOURCE_ROWS,
SUCCESSFUL_ROWS AS TARGET_ROWS,
ACTUAL_START AS START_TIME,
SESSION_TIMESTAMP
FROM REP_SESS_LOG
WHERE RUN_STATUS_CODE = 1
AND TRUNC(ACTUAL_START) BETWEEN TRUNC(SYSDATE -1) AND TRUNC(SYSDATE)
AND (SESSION_TIMESTAMP – ACTUAL_START) > (10/(24*60))
ORDER BY SESSION_TIMESTAMP
RESULT:
Invalid Tasks
The following query lists folder names and task name, version number, and last saved for all invalid tasks.
QUERY:
SELECT SUBJECT_AREA AS FOLDER_NAME,
DECODE(IS_REUSABLE,1,’Reusable’,’ ‘) || ‘ ‘ ||TASK_TYPE_NAME AS TASK_TYPE,
TASK_NAME AS OBJECT_NAME,
VERSION_NUMBER,
LAST_SAVED
FROM REP_ALL_TASKS
WHERE IS_VALID=0
AND IS_ENABLED=1
ORDER BY SUBJECT_AREA,TASK_NAME
RESULT:
Thanks for reading, do you have other scenarios where Workflow Metadata has been effective …wish you a very happy new year 2011.

Friday, December 17, 2010

The Central Management Server (CMS) Repository


The content of the Business Objects Enterprise (BOE) system consists of the physical files and the metadata information about the physical files.
For a Crystal Report, the physical file as well as the metadata about the file should exist in the BOE system. The Crystal report is stored as a file on File Repository Server (FRS) with an extension of .rpt . The Metadata information such as report name, type, report ID, path, etc is stored as an InfoObject in the CMS Repository.
I will discuss about the CMS repository in this article.
The CMS Repository Database Tables
The CMS metadata is physically stored on a database as InfoObjects.   There are six tables, the purpose of which is given below.
SnoPhysical Table NamePurpose
1CMS_VersionInfoContains the current version of BOE.
2CMS_InfoObjects6Each row in this table stores a single InfoObject.  This is the main table in the repository.
3CMS_Aliases6Maps the user alias(es) to the corresponding user ID. For example, a user may have both a Win NT alias and an LDAP alias. Regardless of the number of aliases a user may have, in the Business Intellengence
.Platform each user has only one user ID. The map is stored in a separate table to enable fast logins.
4CMS_IdNumbers6The CMS uses this table to generate unique Object IDs and Type IDs. It has only two rows: an Object ID row and a Type ID row. The CMSs in a cluster use this table when generating unique ID numbers.
5CMS_Relationships6Relationship tables are used to store the relations between InfoObjects. Each row in the table stores one edge in the relation. For example, the relation between a Web Intelligence document and a Universe would be stored in a row in the WebI – Universe Relation table. Each relationship table has these columns: Parent Object ID, Child Object ID, Relationship InfoObject ID, member, version, ordinal, data.
6CMS_LOCKS6This is an auxiliary table of CMS_RELATIONS6.

The Central Management Server(CMS) repository tables cannot be queried directly. Query builder is the tool to be used to retrieve Business Objects metadata information using the virtual tables such as ci_systemobjects, ci_infoobjects and ci_appobjects.
In the forthcoming posts, I will discuss on query builder and file repository server.

Thursday, December 9, 2010

Building dashboards using Xcelsius with SharePoint data


Do you know – Xcelsius dashboards can be built using SharePoint data?

Xcelsius dashboard built using SharePoint data
Figure 1: Xcelsius dashboard built using SharePoint data
Pre-requisites
  • Xcelsius for SharePoint server (XSP) installs a web part container into the Sharepoint web part library.
  • After installation and configuration, the web part container can be added to the Web Part Gallery within SharePoint and any user with the appropriate rights can add the web part container to a page within the site.
Xcelsius Web connectivity Parameters to use
  • SharePoint Parameter: Used to define parameters or properties in Crystal Xcelsius models that a user can change in the SharePoint environment
  • SharePoint Provider: Used to give data to another Web Part published in SharePoint
  • SharePoint Consumer: Used to get data from another published Web Part in SharePoint (a SharePoint List or another Crystal Xcelsius model)
  • Use the File -> Export -> SharePoint option to generate the .swf file with SharePoint functionality.
  • Once generated, you will be able to add and configure the SharePoint Web Part with the .swf file to publish the model
Building the Dashboard
  • Add the Xcelsius document to the SharePoint document list
  • Copy the shortcut to the uploaded document
  • Go the SharePoint page
  • Click on Modify Shared web part
  • Add a web part
    • From the web part-gallery list select Crystal Xcelsius web-part
    • Use Modify Shared Web Part option
    • In the Xcelsius Visualization source, specify URL (paste the shortcut you copied from the document list) and apply
  • It is possible to customize models Real-time – Change chart type, title etc when defined as SharePoint Parameter
    • Use Modify Shared web part at run time
  • Add a SharePoint List, say “Sample” as a web part in the same page
  • In the Xcelsius web part configuration menu, “connections” option is enabled once a list is added in the page
  • In the connections, specify consume list from “Sample”
  • All Sharepoint lists and Providers in the same page will be listed in the connections option.
  • In order for a Xcelsius model to consume data from Sharepoint Provider, there should be two models deployed in the same page – one for the provider and one for the consumer
  • The visuals in the swf file with the consumer will change based on the data from the provider once connection is set