Monday, August 30, 2010

Informatica Development Best Practice – Workflow


Workflow Manager default properties can be modified to improve overall performance; a few of them are listed below. These properties directly impact ETL run time and need to be configured based on:

i)  Source Database
ii) Target Database
iii) Data Volume


Category: Session Properties

  • While loading staging tables for FULL loads, the “Truncate target table” option should be checked. Based on the target database and whether a primary key is defined, the Integration Service fires a TRUNCATE or a DELETE statement:

      Database      Primary Key Defined          No Primary Key
      DB2           TRUNCATE                     TRUNCATE
      INFORMIX      DELETE                       DELETE
      ODBC          DELETE                       DELETE
      ORACLE        DELETE UNRECOVERABLE         TRUNCATE
      MSSQL         DELETE                       TRUNCATE
      SYBASE        TRUNCATE                     TRUNCATE

  • The “Commit Interval” property (default value: 10,000) should be increased for volumes of more than 1 million records. The database rollback segment size should also be increased when raising the “Commit Interval”.
  • Insert/Update/Delete options should be set as determined by the target population method:

      Target Option         Integration Service Behavior
      Insert                Uses the target update option
                            (Update as Update / Update as Insert / Update else Insert)
      Update as Update      Updates all rows as updates
      Update as Insert      Inserts all rows
      Update else Insert    Updates existing rows, otherwise inserts them
Category: Partition

  • The maximum number of partitions for a session should be 1.5 times the number of processors on the Informatica server, i.e. 1.5 x 4 processors = 6 partitions.
  • Key-value partitions should be used only when an even distribution of data can be obtained. In other cases, pass-through partitions should be used.
  • A source filter should be added to distribute the data evenly between pass-through partitions. The key value should contain only numeric values: MOD(NVL(<numeric key value>, 0), <number of partitions defined>), e.g. MOD(NVL(product_sys_no, 0), 6). A sketch of the per-partition filters follows this list.
  • If a session contains “N” partitions, increase the DTM buffer size to at least “N” times the value used for the session with one partition.
  • If the source or target database is an MPP (Massively Parallel Processing) system, enable Pushdown Optimization. With this enabled, the Integration Service pushes as much transformation logic as possible to the source database, the target database, or both (Full), based on the settings. This property can be ignored for conventional databases.
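
A minimal sketch of per-partition source filters built on the MOD() expression above, assuming six pass-through partitions, the illustrative product_sys_no column, and an Oracle-style source where MOD and NVL are available; each partition reads one remainder bucket so the rows are spread evenly.

    -- Source filter for pass-through partition 1
    MOD(NVL(product_sys_no, 0), 6) = 0
    -- Source filter for pass-through partition 2
    MOD(NVL(product_sys_no, 0), 6) = 1
    -- ... and so on, up to pass-through partition 6
    MOD(NVL(product_sys_no, 0), 6) = 5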

Thursday, August 19, 2010

Informatica Development Best Practices – Mapping


The following are generally accepted “Best Practices” for Informatica PowerCenter ETL development; if implemented, they can significantly improve overall performance.


Category: Source Extracts

  • Loading data from fixed-width files takes less time than from delimited files, since delimited files require extra parsing. With fixed-width files, the Integration Service knows the start and end position of each column upfront, which reduces processing time. (Benefit: Performance Improvement)
  • Flat files located on the server machine load faster than a database located on the server machine. (Benefit: Performance Improvement)

Category: Mapping Designer

  • There should be a placeholder transformation (Expression) immediately after the source and one before the target. Data type and data width changes are bound to happen during the development phase, and these placeholder transformations preserve the port links between transformations. (Benefit: Best Practice)
  • Connect only the ports that are required in targets to subsequent transformations. Also, active transformations that reduce the number of records should be used as early in the mapping as possible. (Benefit: Code Optimization)
  • If a join must be used in the mapping, select the appropriate driving/master table; the table with the smaller number of rows should be the driving/master table. (Benefit: Performance Improvement)

Category: Transformations

  • If there are multiple lookup conditions, place the condition with the “=” sign first to optimize lookup performance. Also, indexes on the database table should include every column used in the lookup condition. (Benefit: Code Optimization)
  • Persistent caches should be used if the lookup data is not expected to change often. These cache files are saved and can be reused for subsequent runs, eliminating the need to query the database. (Benefit: Performance Improvement)
  • The Integration Service processes numeric operations faster than string operations. For example, if a lookup is done on a large amount of data using two columns, EMPLOYEE_NAME and EMPLOYEE_ID, configuring the lookup on EMPLOYEE_ID improves performance. (Benefit: Code Optimization)
  • Replace a complex filter expression with a flag (Y/N). The complex logic should be moved to an Expression transformation and the result stored in an output port; the Filter transformation then evaluates this flag port, which takes less time than executing the entire logic in the filter expression. A sketch follows this list. (Benefit: Best Practice)
  • The PowerCenter Server automatically converts between compatible data types, which can slow performance considerably. For example, if a mapping moves data from an Integer port to a Decimal port and then back to an Integer port, the conversion may be unnecessary. (Benefit: Performance Improvement)
  • Assigning default values to a port and writing transformation errors to the session log will always slow down session performance. Try removing default values and eliminating transformation errors. (Benefit: Performance Improvement)
  • Complex joins in Source Qualifiers should be replaced with database views. There won’t be any performance gain, but readability improves greatly, and any new condition can be evaluated easily by just changing the view’s WHERE clause. (Benefit: Best Practice)
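
A minimal sketch of the flag approach mentioned above, using a hypothetical ORDER_STATUS/ORDER_AMOUNT check; the port and column names are illustrative only and not from the original post.

    -- Expression transformation: output port FLG_LOAD (string) holds the pre-computed result
    IIF(ORDER_STATUS = 'OPEN' AND ORDER_AMOUNT > 1000 AND NOT ISNULL(CUSTOMER_ID), 'Y', 'N')

    -- Filter transformation: the filter condition evaluates only the flag
    FLG_LOAD = 'Y'

The Filter transformation now performs a simple string comparison per row, while the heavier logic is evaluated once in the Expression transformation.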

Thursday, August 12, 2010

Change Data Capture in Informatica


Change data capture (CDC) is an approach, or a technique, to identify only the changes in the source. I have seen applications that were built without CDC and were later mandated to implement CDC at a higher cost. Building an ETL application without CDC is a costly miss and usually a backtracking step. In this article we discuss different methods of implementing CDC.


Scenario #01: Change detection using timestamp on source rows
In this typical scenario the source rows have two extra columns, say row_created_time & last_modified_time. Row_created_time: the time at which the record was first created; Last_modified_time: the time at which the record was last modified.
  1. In the mapping, create a mapping variable $$LAST_ETL_RUN_TIME of datetime data type
  2. Evaluate the condition SetMaxVariable ($$LAST_ETL_RUN_TIME, SessionStartTime); this step stores the time at which the session was started into $$LAST_ETL_RUN_TIME
  3. Use $$LAST_ETL_RUN_TIME in the ‘where’ clause of the source SQL (see the sketch after this list). During the first run or initial seed the mapping variable has its default value and pulls all the records from the source, like: select * from employee where last_modified_date > ’01/01/1900 00:00:000’
  4. Now let us assume the session is run on ’01/01/2010 00:00:000’ for the initial seed
  5. When the session is executed on ’02/01/2010 00:00:000’, the SQL would be: select * from employee where last_modified_date > ’01/01/2010 00:00:000’, thereby pulling only the records that changed between successive runs
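A minimal sketch of how the mapping variable might be referenced in the Source Qualifier SQL override, assuming an Oracle source and the employee/last_modified_date names from the example above; the date format mask is an assumption.

    -- Source Qualifier SQL override; $$LAST_ETL_RUN_TIME is expanded by the Integration Service at run time
    SELECT *
    FROM   employee
    WHERE  last_modified_date > TO_DATE('$$LAST_ETL_RUN_TIME', 'MM/DD/YYYY HH24:MI:SS')

When the session completes successfully, the value evaluated by SetMaxVariable is persisted to the repository and becomes the lower bound for the next run.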
Scenario #02: Change detection using load_id or Run_id
Under this scenario the source rows have a column, say load_id, a positive running number. The load_id is updated as and when the record is updated.
  1. In the mapping, create a mapping variable $$LAST_READ_LOAD_ID of integer data type
  2. Evaluate the condition SetMaxVariable ($$LAST_READ_LOAD_ID, load_id); the maximum load_id read is stored into the mapping variable
  3. Use $$LAST_READ_LOAD_ID in the ‘where’ clause of the source SQL (see the sketch after this list). During the first run or initial seed the mapping variable has its default value and pulls all the records from the source, like: select * from employee where load_id > 0. Assuming all records during the initial seed have load_id = 1, the mapping variable stores ‘1’ in the repository.
  4. Now let us assume the session is run after five loads into the source; the SQL would be select * from employee where load_id > 1, thereby limiting the source read to only the records that changed after the initial seed
  5. Consecutive runs take care of updating the load_id & pulling the delta in sequence
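A minimal sketch of the equivalent override for the load_id approach, again assuming the employee/load_id names used in the example; since the variable is numeric, no format mask is needed.

    -- Source Qualifier SQL override; $$LAST_READ_LOAD_ID is expanded at run time
    SELECT *
    FROM   employee
    WHERE  load_id > $$LAST_READ_LOAD_ID

An Expression transformation calling SetMaxVariable ($$LAST_READ_LOAD_ID, load_id) keeps the highest load_id seen during the run, and that value is written back to the repository on successful completion.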
In the next blog we will see how to implement CDC when reading from Salesforce.com.

Peoplesoft Connectors for Oracle Identity Manager – Part I


Introduction

A couple of weeks ago, I attended an Oracle webcast titled “Introducing Oracle Identity Management 11g”. That webcast introduced the remaining components of the Oracle Identity Management product suite, which is part of Oracle Fusion Middleware 11g (we can call it a second set of product releases!).

During the first-phase release of the Oracle Fusion Middleware components, Oracle released a few components such as Oracle Internet Directory (OID) and Oracle Virtual Directory (OVD). Along with a couple of other components, the following are the major software releases (as part of the second release) of the new Oracle Identity Management 11g product suite:
  • Oracle Identity Manager
  • Oracle Access Manager
  • Oracle Identity Analytics
  • … and few others …
The Oracle Identity Management 11g product suite provides Identity and Access Management (IAM) functions along with compliance and security related solutions. As usual, more features have been added in 11g, such as a security development platform and integration with Fusion Middleware.
In this blog series, I am going to talk more about the Oracle Identity Manager (OIM) product. Let us first understand the Oracle Identity Manager product and its features, and then we will talk about the various options available for integrating Peoplesoft systems with it. I used my personal experience with the product and referred to the Oracle Identity Manager 11g Release 1 documentation, which consists of various guides available as part of the Oracle Fusion Middleware documentation. If you need in-depth knowledge about this product, you should refer to these manuals. Let’s understand the OIM product first.

About Oracle Identity Manager

One of the major and most important Oracle Identity Management components is Oracle Identity Manager (OIM). Earlier, this product was called Xellerate Provisioning (by a company called Thor Technologies). OIM provides a central repository to store user and group information for any organization. One of its important features is that it can integrate with various target systems (such as Peoplesoft HRMS, SAP, Active Directory, Siebel, etc.). Various other Oracle products, such as JD Edwards, EBS and Oracle Retail, have connectors as well.
I like the OIM Connectors page on the Oracle website; you should visit it once. There are connectors for most commonly used products in the market (such as Sun Java Directory, Novell eDirectory, SAP products, databases, Siebel, etc.). In this post, I want to explore the Peoplesoft connectors and how we can deploy them in an enterprise implementing OIM. I am going to provide a conceptual understanding only; for more details on the connectors, you should refer to the connector documentation (search for “oracle identity manager connector documentation” to find the connector documentation page). Also, other products (that have no connectors) can be integrated with OIM using Generic Technology Connectors (GTC), which are delivered as part of the OIM product. We will talk more about GTC in later posts.

Integrating Peoplesoft HRMS system with OIM

Peoplesoft HRM (or HRMS) systems are ERP systems deployed in many enterprises across the world. Hexaware handles the implementation and support of many such Peoplesoft HRMS systems across the globe. There are two Peoplesoft connectors available for the OIM product. They are:
  • PSFT Employee Reconciliation Connector
  • PSFT User Management Connector
These two connectors serve different purposes in a Peoplesoft based environment. Let’s explore their use with an architecture diagram; I created the following diagram to show the integration and the use of the PSFT connectors.
In this high-level architecture, I used an existing Peoplesoft HRMS system as a trusted source for OIM. OIM plays the role of a central repository that stores user and group information, and user provisioning happens to the multiple target systems shown in the diagram.
The PSFT Employee Reconciliation connector is used to perform trusted source reconciliation with the Peoplesoft HRMS system. In this scenario, the Peoplesoft HRMS system is the source for all user or employee related information during the entire user management lifecycle (user add, user delete, user modification, etc.). There are two versions of the PSFT Employee Recon connector.
  • Version 9.0.4.x
  • Version 9.1.x
If you are on Peopletools 8.48 or an earlier release, then you should opt for 9.0.4.x. For a detailed list of supported releases, refer to the connector documentation.
Both Version 9.0.4.x and Version 9.1.x use the Integration Broker architecture to integrate with OIM. As you are aware, the IB architecture changed considerably starting with Peopletools 8.48, and new features were added in Peopletools 8.49. For Version 9.1.x, the supported Peoplesoft HRMS systems are 8.9, 9.0 and 9.1 with Peopletools 8.49 and 8.50.
Let’s explore these two Peoplesoft connectors for OIM in future posts. I would really like to share and learn more about these connectors, mainly for two reasons: I worked as a Peoplesoft admin for many years, and I also learned some OIM basics recently. Let’s meet in the next post. Until then!