Tuesday, September 30, 2008

Business Intelligence – The Reusability Gene

One issue that confronts me time and again while executing BI projects is “reusability”, or rather the lack of it. Let me give an example.
In the many migration and upgrade projects that Hexaware (my company) has executed, I always find that the number of reports finally migrated/upgraded to the new environment is only 40-50% of the number initially provided to us by the customer. Report Rationalization has become such a critical step that we have developed many specific metadata tools that help rationalize the reporting environment. Coming back to the topic: the reason for such a divergence between the final number of reports and the initial number is lack of ‘reusability’. Business users have their own versions of standardized (?) reports stored on their desktops, which are nothing but small variations (usually with a new filter added) of an already existing report.
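To make the "small variation" point concrete, here is a minimal sketch of how near-duplicate reports could be spotted. It is only an illustration and not one of the metadata tools mentioned above; the report dictionaries, their keys and the sample names are all hypothetical. Reports that share a data source and column set are treated as variants of one another, and filters are ignored.

```python
from collections import defaultdict


def group_report_variants(reports):
    """Group reports that share a data source and column set, ignoring filters."""
    groups = defaultdict(list)
    for report in reports:
        # Column order does not matter, so use a frozenset in the signature.
        signature = (report["source"], frozenset(report["columns"]))
        groups[signature].append(report["name"])
    # Only groups with more than one report are rationalization candidates.
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Hypothetical report metadata, for illustration only.
reports = [
    {"name": "Sales by Region", "source": "sales_mart",
     "columns": ["region", "revenue"], "filters": []},
    {"name": "Sales by Region (2008)", "source": "sales_mart",
     "columns": ["revenue", "region"], "filters": ["year = 2008"]},
    {"name": "Headcount", "source": "hr_mart",
     "columns": ["dept", "headcount"], "filters": []},
]

variants = group_report_variants(reports)
```

Each group returned is a candidate for being replaced by one parameterized report.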
A similar example on the data integration side is the creation of ad-hoc ETL routines as and when required. This duplicates ETL jobs and leads to a non-standard BI environment.
Lack of re-use causes two major problems:
1) The BI environment becomes bloated with unwanted components that use valuable computing resources, delaying the availability of more important information.
2) Any attempt at upgrading/re-engineering the existing system results in high costs and undesirable heartburn among business users.
The Prescription:
1) Establish a corporate-level BI team whose primary responsibility is to ensure that any component addition (ETL, reports, models, etc.) is justified by its purpose. This team has to ensure that existing standards and components are reused to the maximum extent.
2) Strengthen the “Business Metadata” architecture within the organization. In one of my earlier posts, I explained my view of BI metadata, which is very relevant to the task of improving reusability.
Basically, the “Reusability gene” seems to be a little muted in its functioning among BI practitioners. It is time that BI teams within organizations and system integrators like Hexaware look at reusability as a critical parameter while developing and deploying BI solutions.

Thursday, September 25, 2008

Informatica 8.6 Enhancements for Developers – 1


Informatica has released its latest version, 8.6, which covers all the hot fixes released for the prior version 8.5 and includes a few new features. Since version 8, a unified Administration Console has been available for managing Integration and Repository services. These were discussed in earlier posts.
What is new in PowerCenter 8.6 for developers? Let us discuss the PowerCenter 8.6 Client enhancements that developers will find useful.
1. Creating Targets from Transformations
We can create targets based on transformations in the workspace or navigator.
To create a target:
1. Right-click the transformation in the workspace and select the Create and Add Target option.
2. Alternatively, drag and drop the transformation into the Target Designer.
The target that is created has the same port definitions as the transformation from which it was created. We can edit the target definitions later. In addition, the target type is the same as that of the repository used.
2. Invalid/Invalidated renamed
In PowerCenter 7, the two states of objects were known as Invalid and Invalidated.
The exact meaning of these states is as follows:
Invalid – an object will not run,
Invalidated – an object may be invalid or may not run.
The difference between the two terms was not very clear. Therefore, to avoid any confusion, in PowerCenter 8.6, the two states have been renamed as Invalid and Impacted. While the Invalid state still implies that an object will not run, Impacted means that an object is affected by a change, and therefore, may not run.
Apart from the names, the icons have also changed in PowerCenter 8.
3. Propagating Port Descriptions
In the Designer, in addition to the other properties of port propagation, we can edit a port description and propagate the description to other transformations in the mapping.
4. Environment SQL Enhancements
In PowerCenter 8, environment SQL can be used to execute an SQL statement at the beginning of each transaction: the Integration Service runs the transaction environment SQL every time a transaction starts. Environment SQL can still be used to execute an SQL statement at each connection to the database.
Use SQL commands that depend upon a transaction being opened during the entire read or write process. For example, the following SQL command modifies how the session handles characters:
ALTER SESSION SET NLS_LENGTH_SEMANTICS=CHAR

5. Flat File Enhancements
PowerCenter 8 includes enhancements for handling flat files. Some of these improve performance.
Flat files can now use Integer or Double data types.
In addition, target partitions can be merged. The flat file target merge options include:

Wednesday, September 17, 2008

Informatica PowerCenter 8x Key Concepts -3


Administration Console
The Administration Console is a web application that we use to administer the PowerCenter domain and PowerCenter security. The console has two pages: the Domain page and the Security page.
We can do the following in the Domain page:
o Create & manage application services like Integration Service and Repository Service
o Create and manage nodes, licenses and folders
o Restart and shutdown nodes
o View log events
o Other domain management tasks like applying licenses and managing grids and resources
We can do the following in the Security page:
o Create, edit and delete native users and groups
o Configure a connection to an LDAP directory service. Import users and groups from the LDAP directory service
o Create, edit and delete Roles (Roles are collections of privileges)
o Assign roles and privileges to users and groups
o Create, edit, and delete operating system profiles. An operating system profile is a level of security that the Integration Service uses to run workflows
4. PowerCenter Client
Designer, Workflow Manager, Workflow Monitor, Repository Manager and Data Stencil are the five client tools used to design mappings and mapplets, create sessions to load data, and manage the repository.
A mapping is ETL code that pictorially depicts the logical data flow from source to target, including the transformations applied to the data. Designer is the tool used to create mappings.
Designer has five window panes: Source Analyzer, Warehouse Designer, Transformation Developer, Mapping Designer and Mapplet Designer.
Source Analyzer:
Allows us to import source table metadata from relational databases, flat files, XML and COBOL files. Note that the Source Analyzer imports only the source definition, not the source data itself. The Source Analyzer also allows us to define our own source data definitions.
Warehouse Designer:
Allows us to import target table definitions from relational databases, flat files, XML and COBOL files. We can also create target definitions manually and group them into folders. Unlike the Source Analyzer, it offers an option to create the tables physically in the database. The Warehouse Designer does not allow two tables with the same name, even if their column names differ or they come from different databases/schemas.
Transformation Developer:
Transformations such as Filters, Lookups and Expressions that have scope for re-use are developed in this pane. Alternatively, a transformation developed in the Mapping Designer can be made reusable by checking the ‘re-use’ option, after which it is displayed under the Transformation Developer folders.
Mapping Designer:
This is the place where we actually depict our ETL process; we bring in source definitions, target definitions, transformations like filter, lookup, aggregate and develop a logical ETL program. In this place it is only a logical program because the actual data load can be done only by creating a session and workflow.
Mapplet Designer:
We create a set of transformations to be used and re-used across mappings.

Business Intelligence Challenge – Product Updates and Migration-I

Product Upgrades are situations where we are moving from one version of the product to the latest version of the same product. Upgrades happen
  • to ensure support from the product vendor
  • to leverage new features provided by the latest version in terms of performance and user experience
  • because some other new product being added to the architecture does not work with the existing version
Product Migrations are situations where we are moving from a platform of one vendor to another vendor’s platform. Migrations happen
  • as ‘BI Standardization’ initiatives drive organizations to move towards a common platform to operate BI systems at a lower cost and provide uniform user experience
  • because of bad experience with the current product not meeting the business needs in terms of performance or usability or product support or license cost
  • because recent mergers and acquisitions lead organizations to look for a ‘safer’ platform
Is an upgrade a challenge? Major products such as Business Objects and Cognos change so rapidly that newer versions of the same product come out on a different architecture with an entirely new set of components. Upgrades are no longer simple upgrades; they have become effort-intensive product migrations, almost like moving from one BI vendor to another.
Let us refer to both upgrades and migrations as ‘Upgrade’, since any such initiative aims at a better, upgraded experience for the business and for IT.
“Can we do this upgrade next year?” is a common response when an IT team requests a Business Intelligence product upgrade. Upgrades are one of the key items that come up for discussion during BI budget allocation in every organization. The business fears that upgrade projects will take up many of their hours without much benefit to them. For IT, an upgrade is a bigger challenge because of the unpredictability of the problems they may face during the project and the need to ensure minimal disturbance to the business team. Hence BI initiatives related to product upgrades go through multiple rounds of scrutiny before budget approval. Such projects are seen as IT initiatives, and a clear definition of business benefits is difficult to build.

Tuesday, September 9, 2008

Business Intelligence – The Unconquered Territories

Bill Bryson, one of my favorite authors, writes this way in the book “A Short History of Nearly Everything” and I quote:
“As the nineteenth century drew to a close, scientists could reflect with satisfaction that they had pinned down most of the mysteries of the physical world: electricity, magnetism, gases, optics, kinetics, and statistical mechanics, to name just a few. If a thing could be oscillated, accelerated, perturbed, distilled, combined, weighed or made gaseous they had done it, and in the process produced a body of universal laws so weighty and majestic that we still tend to write them out in capitals. The whole world clanged and chuffed with the machinery and instruments that their ingenuity had produced. Many wise people believed that there was nothing much left for science to do”
We all know how much science went on to invent and discover in the 20th century.
Sitting now in 2008, when I hear people speak about BI, I sometimes get the feeling that we are on the verge of accomplishing everything in this space. Alas! That is “as far as it gets” from the truth. There are so many “unconquered territories” in BI that if you thought the past was challenging enough, it is time to get rejuvenated for wrestling with bigger challenges in the future.
My top ten “Unconquered Territories” for BI Practitioners are:
1) The majority of BI decision making is geared towards analysis of structured data. Usage of unstructured data is minimal at best and non-existent in many cases.
2) There is still a lot of work to be done in integrating the process rigor of Six Sigma or a quality management methodology (say CMMI) into the BI paradigm. Unless that is done, BI will not be sustainable in the long run.
3) Lack of valuation techniques. BI systems are corporate assets like Human Resources, Brands etc., and there have to be concrete models for valuing them.
4) Predictive Analytics / Data Mining are used effectively by only a handful of organizations. There is no shortage of techniques, but the world is probably short of people who can apply high-end analytical techniques to solve “common-sense”, real world business problems.
5) Let’s face it – There are technology limitations. Operational BI (Lack of real-time data access), Guided analytics (Lack of comprehensive business metadata), Information as a Service (Lack of SOA based BI architecture) are some of those technology limitations that come to my mind.
6) Data Quality is a nightmare in most organizations. Either the data is already ‘dirty’, or there is no real governance process, in which case the data will inevitably become ‘dirty’.
7) Here is a mindset challenge – BI Practitioners, in my view, need to develop a higher level of “business process” oriented thinking that seems to be lacking given the ever increasing technology complexity of BI tools.
8) Simulations!! – Businesses run with a lot of interdependent variables. Unless a simulation model of the business is built into the analytical landscape, there is really no way of having a handle on the future state of business. Of course, ‘Black Swans’ will continue to exist but that’s a different subject matter altogether.
9) On demand analytics – I accept that I am being a little unfair here in expecting BI to catch up with the nascent world of “cloud” computing so early. But the fact remains that much work can be done in this area of “Cloud Analytics”.
10) Packaged analytics is a step in the right direction – Organizations can quickly deploy analytical packages and spend more time on how to optimize business decisions. Having said that, the implementation difficulty and the lack of flexibility in packages remain concerns to be addressed.
Each one of us will have our own list of “unconquered territories”. Probably it is worthwhile to put everything down on paper and nudge your BI environments towards conquering all those areas and beyond.

Monday, September 1, 2008

Business Intelligence Challenge – Understanding Requirements, System Object Analysis

In the earlier discussion we looked at understanding BI requirements through User Object Analysis. Now let us look at another aspect.
What makes building BI systems unique compared to other systems is that BI systems are built over the data collected by transaction (source) systems, for effective data analysis. In principle a BI system should enable any kind of analysis on the data from the source(s), but in many cases we initially pull only the required elements into the data warehouse, based on predefined analysis, and get the BI system up. The requirements for a BI system therefore define the scope: which business processes, scenarios and data are of immediate need and must be made available for analysis.
Even though system owners or functional experts provide details of the transaction system, many data elements and relationships remain unreachable through inputs from the business. We have all seen the business point out new scenarios such as ‘this data element should not be updated’ or ‘we need the value to be populated based on a certain flag’. Such things emerge during the testing phase or in production. These surprises occur not because the requirements keep changing, but because the scenarios implied by the data in the source system were not clearly understood.
The means of understanding the business process and the system functions of a source system by looking at its data elements and their values is called ‘System Object Analysis’.
Following are the steps in ‘System Object Analysis’:
1. Collect all tables from the source system, along with physical structure metadata such as table name, column name and data type.
2. Describe the kind of data each of these tables stores.
3. Group the tables by function, using either the descriptions or the naming conventions present among the tables. Certain tables or groups can be eliminated here through interaction with the users. A table can also belong to multiple groups.
4. Reverse engineer the underlying data model, which is useful as well.
5. Perform data profiling for each of the tables.
6. Understand the domain values, when each value can occur, and the relationships between tables.
7. Determine the different scenarios by which data arrives in each table.
8. Determine the facts, dimensions and dimension attributes within each functional area/group.
9. With clear details on each group and the facts and dimensions it contributes, prepare questions that the business can get answered within and across the functional areas (groups). Validate the questions and, if possible, collect more questions from the business.
10. Present to the business what can be done on the system, then prioritize and prepare the implementation plan.
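The profiling and domain-value steps (5 and 6) can be sketched in a few lines of Python. This is a minimal illustration under assumptions rather than a prescribed tool: the table is represented as a list of dictionaries, and the `orders` sample with its `status` and `cancel_flag` columns is hypothetical.

```python
from collections import Counter


def profile_column(rows, column):
    """Summarise one column of a table given as a list of dict rows."""
    values = [row.get(column) for row in rows]
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    return {
        "rows": len(values),                   # total rows inspected
        "nulls": len(values) - len(non_null),  # rows where the column is empty
        "distinct": len(counts),               # size of the value domain
        "domain": counts.most_common(5),       # most frequent values with counts
    }


# Hypothetical source table extract, for illustration only.
orders = [
    {"order_id": 1, "status": "OPEN", "cancel_flag": None},
    {"order_id": 2, "status": "CLOSED", "cancel_flag": "N"},
    {"order_id": 3, "status": "CLOSED", "cancel_flag": "Y"},
    {"order_id": 4, "status": "OPEN", "cancel_flag": None},
]

status_profile = profile_column(orders, "status")
flag_profile = profile_column(orders, "cancel_flag")
```

A column such as `cancel_flag` that is empty for some statuses and populated for others is exactly the kind of finding that turns into a question for the business in step 9.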
Based on the analysis of the tables, the groups or functional areas defined initially can change in terms of the tables within a group, or a new group can even come up. Regular interaction with the business users happens during the above steps, and the requirements of the BI system get defined.
Benefits of System Object Analysis
Ensures a complete understanding of the process by which data gets modified in the source system, enabling us to deliver more than what the business asks for
Helps group and prioritize requirements, build the case for dependencies and prepare the rollout plan
Provides a means to trigger requirements definition with the users through an interactive process, and prompts us to raise many questions to the business about their system and process
Many a time the requirement defined by the business is to build an ad-hoc query environment for a transaction system. System Object Analysis, which lets users navigate the requirements with inputs from the technical team, then becomes almost mandatory for building an effective BI system.

A few tips related to the Informatica 8.x environment


Let us discuss some special scenarios that we might face in Informatica 8.x environments.
A. In Informatica 8.x, multiple Integration Services can be enabled under one node. If there is a need to determine the process associated with an Integration Service or Repository Service, it can be done as follows.
If multiple Integration Services are enabled on a node, multiple pmserver processes run on the same machine. In PowerCenter 8.x, the process list alone does not tell us which process belongs to which Integration Service, unlike in 7.x, where every pmserver process is associated with a specific pmserver.cfg file. Likewise, if multiple Repository Services are enabled on a node, multiple pmrepagent processes run on the same machine, and the process list alone does not correlate them to a particular Repository Service.
To identify the process in 8.x, do the following:
1.      Log on to the Administration Console
2.      Click on Logs > Display Settings.
3.      Add Process to the list of columns to be displayed in the Log Viewer.
4.      Refresh the log display.
5.      Use the PID from this column to identify the process as follows:
UNIX:
Run the following command:
ps -ef | grep pid
Where pid is the process ID of the service process.
Windows:
    1. Run Task Manager.
    2. Select the Processes tab.
    3. Scroll to the value in the PID column that matches the PID displayed in the PowerCenter Administration Console.
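On UNIX, the lookup can also be scripted. The sketch below is only an illustration: it parses text in the usual `ps -ef` column layout (UID, PID, PPID, C, STIME, TTY, TIME, CMD) and returns the command for a given PID. The sample output, the user name and the PIDs are made up, and the simple whitespace split assumes the STIME column contains no spaces.

```python
def find_process(ps_output, pid):
    """Return the CMD column of the `ps -ef` line whose PID matches, or None."""
    for line in ps_output.splitlines()[1:]:  # skip the header line
        fields = line.strip().split(None, 7)
        if len(fields) == 8 and fields[1] == str(pid):
            return fields[7]
    return None


# Made-up `ps -ef` output, for illustration only.
SAMPLE_PS = """\
UID        PID  PPID  C STIME TTY          TIME CMD
infa      4711     1  0 09:12 ?        00:00:07 pmserver
infa      4820     1  0 09:13 ?        00:00:03 pmserver
infa      4903     1  0 09:13 ?        00:00:01 pmrepagent
"""

# PID 4820 stands in for the value shown in the Process column of the Log Viewer.
command = find_process(SAMPLE_PS, 4820)
```

The point is the same as with the manual steps: the process names are identical, so the PID taken from the Log Viewer is what ties a process to a service.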
B. Sometimes, the PowerCenter Administration Console URL is inaccessible from some machines even when the Informatica services are running. The following error is displayed on the browser:
“The page cannot be displayed”
This is caused by an invalid or missing entry in the hosts file on the client machine.
To resolve this error, do the following:
  1. Edit the hosts file located in the windows/system32/drivers/etc folder on the machine from which the Administration Console is being accessed.
  2. Add the host IP address and the host name (for the host where the PowerCenter services are installed).
Example
10.1.2.10 ha420f3
  3. Launch the Administration Console and access the login page by typing the URL http://<host>:<port>/adminconsole in the browser address bar.
Note that the host name in the URL must match the host entry in the hosts file.
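Before editing anything, it can help to check whether the hosts file already has an entry for the PowerCenter host. The following is a minimal sketch that works on the text of a hosts file; the function name is hypothetical, and the sample reuses the example entry from above.

```python
def hosts_lookup(hosts_text, host_name):
    """Return the IP address mapped to host_name in hosts-file text, or None."""
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        ip, *names = line.split()
        # Host names are compared case-insensitively.
        if host_name.lower() in (name.lower() for name in names):
            return ip
    return None


# Sample hosts-file content using the example entry from this post.
SAMPLE_HOSTS = """\
# local entries
127.0.0.1   localhost
10.1.2.10   ha420f3   # PowerCenter host
"""

ip_address = hosts_lookup(SAMPLE_HOSTS, "ha420f3")
```

If the lookup returns None for the host name used in the Administration Console URL, the entry is missing and needs to be added as described in step 2.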
