Monday, December 22, 2008

Business Intelligence Challenge – Product Upgrades & Migrations, Validation – 5

Once the code has been moved to the target platform (Moving the Code), whether it’s an upgrade to a newer version or migration to another newer platform, the next step is to validate the objects moved.
The validation process involves verifying or testing the objects in the target platform to ensure that they deliver the same output as the older objects in the source platform.
Validation is the key process by which a migration or upgrade is certified as successful, and it is usually laborious and time consuming. Let us see how the validation process can be broken into steps and automated to save time and improve accuracy. We can look at the validation process as encompassing three steps:
  • Metadata Validation
  • Run Validation
  • Output Validation
Metadata Validation involves comparison of the metadata definitions between the existing source environment and the target environment. This requires that the metadata of the source and the target environment be captured for the comparison.
Steps Involved:
  • Capture the source metadata into a relational structure; we would already have captured the source metadata as part of Object Consolidation
  • Capture the target platform metadata in a similar way into a relational structure
  • Run SQL queries to automate the metadata comparison process
Metadata comparison would be done at the level of semantic layer definitions and individual reports. Let us take the case of metadata comparison between two semantic layers. In Business Objects, the Universe is the semantic layer definition. After an upgrade from an older version of Business Objects to a newer one, the first level of metadata validation between the universes is to check whether the object counts match (the classes, the objects and the filters), followed by a comparison of their definitions.
If comparing the definitions shows differences, and those differences fall within the known differences between the two versions (source and target), then they are acceptable; otherwise the upgraded object requires a code fix.
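The two levels of comparison described above can be sketched with plain SQL. This is only an illustration: the table `universe_metadata` and its columns are hypothetical names for the relational structure the metadata was captured into, the sample rows are made up, and SQLite is used purely so that the example runs anywhere.

```python
import sqlite3

# Hypothetical structure: one row per universe object, per environment.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE universe_metadata ("
    " env TEXT, object_type TEXT, object_name TEXT, definition TEXT)"
)
conn.executemany(
    "INSERT INTO universe_metadata VALUES (?, ?, ?, ?)",
    [
        ("source", "class", "Customer", ""),
        ("source", "object", "Revenue", "SUM(sales.amount)"),
        ("source", "filter", "Current Year", "sales.year = 2008"),
        ("target", "class", "Customer", ""),
        ("target", "object", "Revenue", "SUM(sales.amt)"),
    ],
)

# Level 1: object counts per type that do not match between environments.
count_mismatches = conn.execute(
    """
    SELECT object_type,
           SUM(CASE WHEN env = 'source' THEN 1 ELSE 0 END) AS source_count,
           SUM(CASE WHEN env = 'target' THEN 1 ELSE 0 END) AS target_count
    FROM universe_metadata
    GROUP BY object_type
    HAVING SUM(CASE WHEN env = 'source' THEN 1 ELSE 0 END)
        <> SUM(CASE WHEN env = 'target' THEN 1 ELSE 0 END)
    ORDER BY object_type
    """
).fetchall()

# Level 2: source objects that are missing in the target or defined differently.
definition_mismatches = conn.execute(
    """
    SELECT s.object_type, s.object_name, s.definition, t.definition
    FROM universe_metadata s
    LEFT JOIN universe_metadata t
      ON t.env = 'target'
     AND t.object_type = s.object_type
     AND t.object_name = s.object_name
    WHERE s.env = 'source'
      AND (t.object_name IS NULL OR t.definition <> s.definition)
    ORDER BY s.object_type, s.object_name
    """
).fetchall()

print(count_mismatches)
print(definition_mismatches)
```

Rows returned by the second query would still need to be checked against the list of known differences between the two versions before being treated as defects.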
Since we usually validate reports by the output they give, the validation process is limited by the data fed in; we could miss scenarios such as a filter clause never being exercised. Metadata Validation can overcome this limitation in preparing data for different test scenarios. If a report meets the Metadata Validation expectations, we can say with much greater confidence that it has been upgraded or migrated correctly, because every definition is compared and not just the paths the test data happens to exercise.
Benefits:
  • Sets up a strong base of metadata understanding, as the objects of the different platforms have to be mapped and the gaps identified in order to run automated metadata validation
  • Improved accuracy in the validation process, overcomes the limitation in data preparation
  • Enables determining issues without running the report against the data
Run Validation performs a dry run of the reports in an automated way to determine whether the reports run (open) successfully or not.
When we give a report to a tester, the first thing the tester does is run the report; if it doesn’t go through, the problem is reported or analysed further. We try to catch this problem ahead of time in an automated way.
Steps Involved:
  • Have scripts that invoke the reports in batch mode; as soon as the objects are upgraded, invoke (open) all the upgraded reports in batch mode
  • Capture the errors while opening/running the report into a log
  • Classify them into two categories ‘reports that ran’ and ‘reports that failed’
Some reports could fail to open because of incorrect connection details, some because an object was not found, and so on. This quick automated run locates the failing reports immediately and also helps determine the reasons for the failures in one go. Consider limiting the data input while invoking the reports, so that the dry run stays quick.
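A batch runner of this kind can be sketched as follows. The function that actually opens a report depends on the vendor's SDK or command-line tool, so here it is passed in as a parameter; `fake_open_report` and the report names are hypothetical stand-ins used only to make the sketch runnable.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def run_validation(
    report_names: Iterable[str],
    open_report: Callable[[str], None],
) -> Tuple[List[str], Dict[str, str]]:
    """Open every report once; return (reports that ran, {failed report: error})."""
    ran: List[str] = []
    failed: Dict[str, str] = {}
    for name in report_names:
        try:
            open_report(name)
            ran.append(name)
        except Exception as exc:  # any failure is captured into the log
            failed[name] = f"{type(exc).__name__}: {exc}"
    return ran, failed


def group_by_error(failed: Dict[str, str]) -> Dict[str, List[str]]:
    """Consolidate failures by error text so common causes can be fixed together."""
    groups: Dict[str, List[str]] = {}
    for name, error in failed.items():
        groups.setdefault(error, []).append(name)
    return groups


def fake_open_report(name: str) -> None:
    """Hypothetical stand-in for the vendor call that opens a report."""
    if name == "Sales Summary":
        raise ConnectionError("incorrect connection details")
    if name == "Churn Detail":
        raise LookupError("object not found")


ran, failed = run_validation(
    ["Sales Summary", "Inventory", "Churn Detail"], fake_open_report
)
print("reports that ran:", ran)
print("reports that failed:", failed)
print("failures by cause:", group_by_error(failed))
```

Grouping the failures by error text is one simple way to give the code fixing team a consolidated view of the run errors.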
Benefits:
  • Saves time in determining errors due to report opening or running
  • Enables building a common solution for the code fixing team, as the ‘run errors’ are consolidated
Output Validation validates the output delivered by the reports. There are two levels of output validation: Format Validation and Data Validation.
Format Validation checks the format in which the data is presented (font size, colour, bold, label location and so on), which doesn’t relate to the data values.
Data Validation compares the data values of the two reports cell by cell.
Steps:
  • Run the source report and export the output data to Excel/Word
  • Run the target report and export the output data to Excel/Word
  • Compare the outputs for the format and the data
The best means of comparing the output of two reports is to export them to Excel and then compare the two Excel files. If we can export the reports to Word format, we can leverage the Word compare utility; even an export to XML would let us use readily available comparison utilities. In the case of Excel, we would need to build a utility that can compare the two sheets.
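As a rough idea of what such a comparison utility does, here is a minimal cell-by-cell sketch. It assumes the two reports were exported to CSV rather than Excel (an assumption made so that only the standard library is needed), and it covers Data Validation only, not Format Validation. The sample data is invented.

```python
import csv
import io
from itertools import zip_longest
from typing import List, Tuple


def compare_exports(source_csv: str, target_csv: str) -> List[Tuple[int, int, str, str]]:
    """Return (row, column, source value, target value) for every differing cell.

    Rows and columns are 1-based. A cell missing on one side is treated
    as an empty string.
    """
    source_rows = list(csv.reader(io.StringIO(source_csv)))
    target_rows = list(csv.reader(io.StringIO(target_csv)))
    differences = []
    for r, (s_row, t_row) in enumerate(
        zip_longest(source_rows, target_rows, fillvalue=[]), start=1
    ):
        for c, (s_val, t_val) in enumerate(
            zip_longest(s_row, t_row, fillvalue=""), start=1
        ):
            if s_val != t_val:
                differences.append((r, c, s_val, t_val))
    return differences


source = "Region,Revenue\nEast,100\nWest,250\n"
target = "Region,Revenue\nEast,100\nWest,205\nNorth,40\n"
diffs = compare_exports(source, target)
for row, col, old, new in diffs:
    print(f"row {row}, column {col}: source={old!r} target={new!r}")
```

The comparison is positional, so both reports need to be exported with the same sort order for the result to be meaningful.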
The above three validations are some of the key aspects in validating the objects of semantics and reports; let me know your thoughts on the other means of validation …

Monday, December 15, 2008

The Esoteric World of Predictive Analytics

Let me start with the definition of Predictive Analytics as used in literature – “The nontrivial extraction of implicit, previously unknown and potentially useful information from data”. If that doesn’t sound esoteric enough, you are probably more advanced than this post gives you credit for!
For a BI practitioner, it is important to get an understanding of Predictive Analytics (also known as Data Mining) as this subject definitely deserves a place in the wide spectrum of Business Intelligence disciplines. BI at a broad level is about optimizing business through “Hindsight, Insight and Foresight”. Predictive analytics adds the powerful “Foresight” part to business decision making.
Most BI practitioners tend to equate statistics with predictive analytics and this post explains why such a view is inaccurate. To understand this let’s start at the very beginning (a la Alice in Wonderland). Broadly, this world is divided into 2 types of systems:
  • Physical Systems – Has causality and hence can be modeled mathematically with relative ease
  • Human Behavioral Systems – Lacks causality and can be modeled only with specialized techniques
Predictive analytics for business decision making is all about modeling human behavioral systems.
Why is Traditional Statistics Insufficient?
Though the entry into predictive analytics requires that we understand the implications of traditional statistical analysis, statistics by itself is insufficient in the business context. Traditional statistical analysis allows us to understand the general group behavior and is primarily concerned with common behavior within the group – the central tendencies.
In business we generally develop models to anticipate human behavior of some type. Human behavior is inconsistent, lacks causality and distributions based on human behavior almost always violate the assumptions of traditional statistical analysis (like normal distribution of data, stability of mean and standard deviation etc). The strength of data mining comes from the ability of the associated techniques to deal with the tails of the distributions, rather than the central tendencies, and from the techniques’ ability to deal with the realities of the data in a more precise manner.
In the realm of predictive analytics, we are concerned with modeling human behavior and hence are interested in the tail of our distribution: the small percentage of the population that responds to a campaign, commits fraud, leaves our business or purchases the next service.
Though there are specialized techniques used for Predictive Analytics (viz. Non-linear statistics, Induction Algorithms, Cluster Analysis, Neural Networks to name a few), a BI practitioner is only expected to appreciate their usage in different business situations, prepare and model data as required by the tools and interpret the results correctly (a much less daunting task indeed!).
Typically the model development process involves the following steps – a) Define Project, b) Select Data, c) Prepare Data, d) Transform Variables, e) Process Model, f) Validate Model, g) Implement Model. I will explain these steps in more detail in subsequent posts.
Fundamentally, an end-to-end BI view requires the practitioner to learn the concepts around statistics and predictive analytical techniques as available in tools (like say SQL Server Analysis Services) in addition to their technology bag of tricks around data integration, data modeling and OLAP.

Wednesday, December 10, 2008

Business Objects Security

In the current business scenario, it is very important to secure the data and to control which rows and columns of data each user can and cannot see. We can secure the rows of data with row level security; some people call this ‘Fine grained access control’. We can secure the columns of data with column level security; in Business Objects this is popularly called ‘Object level security’.
ROW LEVEL SECURITY
There are various ways through which the row level security can be implemented in a Business Objects environment.
One way is to secure the datamart itself, meaning the security policies and rules are written in the datamart. Technically, a security table can be created and maintained that holds the users / groups with their corresponding access rights. Security policies can then compare the active logged-in user against the security table. All users accessing the datamart get access to their data only after the security policies have been executed. We can also embed the security policies and rules in a view. A good example of row level security: non-managers cannot see the data of their co-workers, whereas managers can see the data of their subordinates. In Oracle (for example), we can create non-manager and manager views with a security rule such as (<security_table.user> = USER), where USER returns the name of the session user. The security views are imported into the Business Objects (BO) universe, and the reports use these security views through the universe. The main ADVANTAGE of securing the datamart is that the security rules can also be used by other BI tools (Cognos, Microstrategy), as the rules are built in the datamart and NOT in Business Objects.
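The security table idea can be illustrated with a small sketch. All table, column and user names here are hypothetical, and SQLite stands in for the datamart; since SQLite has no notion of a session user, the logged-in user is passed in as a query parameter, whereas in a real datamart the comparison would sit inside a view or policy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employee_sales (employee TEXT, amount INTEGER);
    CREATE TABLE security_table (user_name TEXT, visible_employee TEXT);

    INSERT INTO employee_sales VALUES ('alice', 100), ('bob', 200), ('carol', 300);

    -- Everyone sees their own rows; carol manages alice and bob.
    INSERT INTO security_table VALUES
        ('alice', 'alice'), ('bob', 'bob'),
        ('carol', 'carol'), ('carol', 'alice'), ('carol', 'bob');
    """
)


def visible_rows(logged_in_user: str):
    """Return only the rows the security table grants to this user."""
    sql = """
        SELECT s.employee, s.amount
        FROM employee_sales s
        JOIN security_table sec ON sec.visible_employee = s.employee
        WHERE sec.user_name = ?
        ORDER BY s.employee
    """
    return conn.execute(sql, (logged_in_user,)).fetchall()


print(visible_rows("alice"))  # a non-manager sees only her own data
print(visible_rows("carol"))  # a manager also sees her subordinates' data
```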
The second way is to build the security rules in Business Objects. Here the security rules comparing the logged-in user and the security data can be written in a virtual table in Business Objects. These virtual tables are nothing but universe derived tables. BO reports use the derived table to access the datamart tables. Alternatively, we can define security filters in a BO universe; these filters are called condition / filter objects in the BO universe world. With this approach you can take maximum ADVANTAGE of the BO features; the disadvantage is that when you move to a different BI tool like Cognos, you need to rewrite the business security rules in the new tool.
In projects that migrate Peoplesoft transactional reporting to Business Objects analytical reporting, we can potentially reuse / import some security tables and security policies from Peoplesoft into our analytical datamart. These reusable components can save time in building the secured datamart and reporting environment.
COLUMN LEVEL SECURITY
As with row level security, we can implement column level security either in the datamart or in Business Objects. In the financial industry, business users do not want revenue amounts, social security numbers, tax id numbers and other sensitive columns to be shown to unauthorized users. In this case, we can mask the sensitive columns by showing a restricted tag in their place. Non-sensitive columns like first name, last name, gender and age can be shown as they are to the end business user. This logic can be implemented in a Business Objects universe derived table or in datamart views using decode / ‘if then else’ / case statements.
Alternatively, we can use the universe object restriction feature in the BO designer to define restrictions on universe objects. Whenever a business user tries to drag a restricted object from the universe, the restriction rules are invoked and access to the object is given only if the user is authorized to use it.
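The masking logic can be sketched with a case statement. The table, columns, sample row and the ‘RESTRICTED’ tag are all hypothetical, SQLite stands in for the datamart, and the authorization flag is passed in as a parameter where a real view would derive it from the logged-in user.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (first_name TEXT, tax_id TEXT, revenue INTEGER);
    INSERT INTO customer VALUES ('Asha', '123-45-6789', 5000);
    """
)


def customer_rows(is_authorized: bool):
    """Show sensitive columns only to authorized users; mask them otherwise."""
    sql = """
        SELECT first_name,
               CASE WHEN :auth = 1 THEN tax_id
                    ELSE 'RESTRICTED' END AS tax_id,
               CASE WHEN :auth = 1 THEN CAST(revenue AS TEXT)
                    ELSE 'RESTRICTED' END AS revenue
        FROM customer
    """
    return conn.execute(sql, {"auth": 1 if is_authorized else 0}).fetchall()


print(customer_rows(True))
print(customer_rows(False))
```

The non-sensitive column (first name) is returned unchanged in both cases; only the sensitive columns are replaced by the tag.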
I’m signing off this BO security blog for now. The contents are based on my knowledge and BO experience in various projects.  Thanks for reading.  Please share your thoughts on this blog. Also, please let me know your project experiences pertaining to row and column level security in Business Objects.

Tuesday, December 9, 2008

Informatica PowerCenter 8x Key Concepts – 4


PowerCenter Client (contd)
Workflow Manager: In the Workflow Manager, we define a set of instructions called a workflow to execute the mappings we build in the Designer. Generally, a workflow contains a session and any other task we may want to perform when we run a session. Tasks can include a session, email notification, or scheduling information.
A set of tasks grouped together becomes a worklet. After we create a workflow, we run it in the Workflow Manager and monitor it in the Workflow Monitor. The Workflow Manager has the following three window panes:
  • Task Developer – Create the tasks we want to accomplish in the workflow.
  • Worklet Designer – Create a worklet. A worklet is an object that groups a set of tasks. It is similar to a workflow, but without scheduling information. We can nest worklets inside a workflow.
  • Workflow Designer – Create a workflow by connecting tasks with links. We can also create tasks in the Workflow Designer as we develop the workflow.
The ODBC connection details are defined in the Workflow Manager “Connections” menu.
Workflow Monitor: We can monitor workflows and tasks in the Workflow Monitor. We can view details about a workflow or task in Gantt Chart view or Task view. We can run, stop, abort, and resume workflows from the Workflow Monitor. We can view session and workflow log events in the Workflow Monitor Log Viewer.
The Workflow Monitor displays workflows that have run at least once. The Workflow Monitor continuously receives information from the Integration Service and Repository Service. It also fetches information from the repository to display historic information.
The Workflow Monitor consists of the following windows:
Navigator window – Displays monitored repositories, servers, and repository objects.
Output window – Displays messages from the Integration Service and Repository Service.
Time window – Displays progress of workflow runs.
Gantt chart view – Displays details about workflow runs in chronological format.
Task view – Displays details about workflow runs in a report format.
Repository Manager
We can navigate through multiple folders and repositories and perform basic repository tasks with the Repository Manager. We use the Repository Manager to complete the following tasks:
1. Add domain connection information. We can configure domain connection information.
2. Add and connect to a repository. We can add repositories to the Navigator window and client registry and then connect to the repositories.
3. Work with PowerCenter domain and repository connections. We can edit or remove domain connection information. We can connect to one repository or multiple repositories. We can export repository connection information from the client registry to a file. We can import the file on a different machine and add the repository connection information to the client registry.
4. Change the password. We can change the password for our user account.
5. Search for repository objects or keywords. We can search for repository objects containing specified text. If we add keywords to target definitions, we can use a keyword to search for a target definition.
6. View object dependencies. Before we remove or change an object, we can view dependencies to see the impact on other objects.
7. Compare repository objects. In the Repository Manager, we can compare two repository objects of the same type to identify differences between the objects.
8. Truncate session and workflow log entries. We can truncate the list of session and workflow logs that the Integration Service writes to the repository. We can truncate all logs, or truncate all logs older than a specified date.

Monday, November 24, 2008

Business Intelligence Challenge – Product Upgrades & Migrations, Moving the Code – 4

Last time we discussed Impact Assessment; the next logical step is to perform the actual upgrade or migration of the code.
Moving the Code: Performing Upgrade or Migration of the Objects
When we talk about product upgrades, the product vendor almost always provides tools with which the objects in the earlier version can be upgraded to the latest version. Yes, we will see some objects fail while using such tools; these are the ones that need rework after the upgrade process.
When we talk about product migration, like moving from Cognos to Business Objects or from Business Objects to Cognos, there is good scope for automating the code migration. Earlier discussions were about how to leverage the metadata to understand the environment; now we are looking at how to manipulate or transform the metadata so that an object in platform ‘A’ becomes compliant with platform ‘B’.
Steps involved in building an automated product migration process
Perform metadata level object mapping between the two platforms, determine the gaps. This would actually be a ‘by product’ of ‘Step 2’ in Impact Assessment
Build individual components that would
  • Read the metadata from the source platform and prepare a repository
  • Hold the knowledge of the matches and gaps between the platforms, for example as reference tables
  • Transform the ‘source’ metadata and write out as understood by the ‘target’ platform by using the reference tables
Benefits of Automated Migration
  • Helps avoid creation of objects from scratch
  • Ensures time is available for testing (the core task) rather than being spent on code development
  • Enables team to have a flexible skillset
  • A faster way of delivering things when a ‘one to one’ migration from the source platform is seen as a must
Automated Migration Challenges
Transforming the source metadata to the target platform would be a challenge, especially with data manipulation functions. Having a good understanding of the gaps will help; a reference table mapping the functions between the platforms would be useful. In scenarios where a function cannot be converted to the target platform, a comment can be written into a log file enabling quicker attention.
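A reference table of function mappings, together with a log for functions that cannot be converted, can be sketched as below. The mapping entries and the list of functions assumed to exist on the target are purely illustrative and are not taken from any particular pair of platforms.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical reference table: source-platform function -> target-platform function.
FUNCTION_MAP: Dict[str, str] = {
    "SUBSTR": "SUBSTRING",
    "NVL": "COALESCE",
}

# Matches an identifier that is immediately followed by an opening parenthesis.
_CALL = re.compile(r"\b([A-Za-z_][A-Za-z0-9_]*)\s*\(")


def translate_expression(
    expr: str, known_target_functions: Set[str]
) -> Tuple[str, List[str]]:
    """Rename mapped functions and log the ones that have no known equivalent."""
    log: List[str] = []

    def swap(match: "re.Match[str]") -> str:
        name = match.group(1)
        upper = name.upper()
        if upper in FUNCTION_MAP:
            return FUNCTION_MAP[upper] + "("
        if upper not in known_target_functions:
            log.append(f"No mapping for function {name!r} in: {expr}")
        return match.group(0)  # leave the call untouched

    return _CALL.sub(swap, expr), log


converted, issues = translate_expression("NVL(SUBSTR(name, 1, 3), 'n/a')", {"SUM"})
print(converted, issues)

unconverted, warnings = translate_expression("RANK_PCT(amount)", {"SUM"})
print(unconverted, warnings)
```

This only renames functions. Differences in argument order or semantics between platforms, and function-like text inside string literals, would need extra handling, which is where most of the real effort tends to go.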
I have seen good, though not 100%, success in writing such automated migration components. With almost every product providing a good SDK for reading as well as writing metadata, along with support for XML structures, writing such bridges for object migration is getting easier.
Whether or not the objects in a product are migrated/upgraded in an automated way, the subsequent activity of ‘Validation’ plays a key role in ensuring the final quality. Next time, let us discuss some of the means for effective validation ….

Thursday, November 20, 2008

Zachman Framework for BI Assessments

The Zachman Framework for Enterprise Architecture has become the model around which major organizations view and communicate their enterprise information infrastructure. Enterprise Architecture provides the blueprint, or architecture, for the organization’s information infrastructure. More information on the Zachman Framework can be obtained at www.zifa.com.
For BI practitioners, the Zachman Framework provides a way of articulating the current state of the BI infrastructure in the organization. Ralph Kimball in his eminently readable book “The Data Warehouse Lifecycle Toolkit” illustrates how the Zachman Framework can be adapted to the Business Intelligence context.
Given below is a version of the Zachman Framework that I have used in some of my consulting engagements. This is just one way of using this framework but does illustrate the power of this model in some measure.
[Figure: Zachman Framework as adapted for BI assessments]
Some Salient Points with respect to the above diagram are:
  • The framework answers the basic questions of “What”, “How”, “Who” and “Where” across 4 important dimensions – Business Requirements, Conceptual Model, Logical/Physical Model and Actual Implementation.
  • Zachman Framework reinforces the fact that a successful enterprise system combines the ingredients of business, process, people and technology in proper measure.
  • It is typically used to assess the current state of the BI infrastructure in any organization
  • Each of the cells that lies at the intersection of the rows and columns (Ex: Information Requirements of Business) has to be documented in detail as part of the assessment document
  • Information on each cell is gathered through subjective and objective questionnaires.
  • Scoring Models can be developed to provide an assessment score for each of the cells. Based on the scores, a set of recommendations can be provided to achieve the intended goals.
  • Another interesting thought is to create an As-Is Zachman framework and overlay it with a To-Be one in situations where re-engineering of a BI environment is undertaken. This helps us provide a transition path from the current state to the future state.
Thanks for reading. If you have used the Zachman framework differently in your environment, please do share your thoughts.

Monday, November 10, 2008

Valuing your Business Intelligence System – Part 1

Sample these statements:
  • Dow Jones Industrial Average jumped 200 points today, a 2% increase from the previous close
  • The carbon footprint of an average individual in the world is about 4 tonnes per year which is a 3% increase over last year
  • The number of unique URLs as of July 2008 in the World Wide Web is 1 trillion. The previous landmark of 1 billion was reached in 2000
  • One day 5% VaR (Value at Risk) for the portfolio is $ 1 Million as compared to the VaR of $ 1.3 Million a couple of weeks back
Most of us buy into the idea of having a single number that encapsulates complex phenomena. Though the details of the underlying processes are important, the single number (and the trend) does act like a bellwether of sorts helping us quickly get a feel of the current situation.
As a BI practitioner, I feel that it is about time that we formulated a way for valuing the BI infrastructure in organizations. Imagine a scenario where the Director of BI in company X can announce thus: “The value of the BI system in this organization has grown 15% over the past 1 year to touch $50 Million” (substitute your appropriate currencies here!).
The core idea of this post is to find a way to “scientifically put a number to your data warehouse”. Here are a few level setting points:
  1. Valuation of BI systems is different from computing the Return on Investment (ROI) for BI initiatives. ROI calculations are typically done using Discounted Cash Flow techniques and are used in organizations to some extent
  2. More than the absolute number, the trends are important which means that the BI system has to be valued using the same norms at different points in time. Scientific / Mathematical rigor helps in bringing the consistency aspect.
My perspective on valuation is based on the “Outside-in” logic, where the fundamental premise is that the value of the BI infrastructure is completely determined by its consumption. In other words, if there are no consumers for your data warehouse, the value of such a system is zero. One simple, yet powerful technique in the “Outside-in” category is RFM Analysis. RFM stands for Recency, Frequency and Monetary and is very popular in the direct marketing world. My 2-step hypothesis for BI system valuation using the RFM technique is:
  • Step 1: Value of BI system = Sum of the values of individual BI consumers
  • Step 2: Value of each individual consumer = Function (Recency, Frequency, Monetary parameters)
Qualitatively speaking, from the business user standpoint, one who has accessed information from the BI system more recently, has been using data more frequently and uses that information to make decisions that are critical to the organization will be given a higher value. A calibration chart will provide the specific value associated with RFM parameters based on the categories within them. For example: For the Recency parameter, usage of information within the last 1 day can be fixed at 10 points while access 10 days back will fetch 1 point. I will explain my version of the calibration chart in detail in subsequent posts. (Please note that the conversion of points to dollar values is also an interesting, non-trivial exercise)
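To make the two-step hypothesis concrete, here is a rough sketch in points (not dollars). Only the two Recency end points (1 day = 10 points, 10 days = 1 point) come from the example above; the linear scale in between, the Frequency and Monetary scales, and the choice of simply adding the three scores are placeholder assumptions and not the calibration chart promised for later posts.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Consumer:
    name: str
    days_since_last_access: int
    accesses_per_month: int
    decision_criticality: int  # 1 (low) to 3 (high); stands in for "Monetary"


def recency_points(days: int) -> int:
    # From the post: within 1 day -> 10 points, 10 days back -> 1 point.
    # The straight line between those two points is an assumption.
    if days <= 1:
        return 10
    if days >= 10:
        return 1
    return 11 - days


def frequency_points(accesses_per_month: int) -> int:
    # Assumed scale: one point per monthly access, capped at 10.
    return min(10, max(0, accesses_per_month))


def monetary_points(decision_criticality: int) -> int:
    # Assumed scale: 3 points per criticality level.
    return 3 * decision_criticality


def consumer_value(c: Consumer) -> int:
    """Step 2: value of one consumer = Function(Recency, Frequency, Monetary)."""
    return (
        recency_points(c.days_since_last_access)
        + frequency_points(c.accesses_per_month)
        + monetary_points(c.decision_criticality)
    )


def system_value(consumers: Iterable[Consumer]) -> int:
    """Step 1: value of the BI system = sum of the values of its consumers."""
    return sum(consumer_value(c) for c in consumers)


consumers = [
    Consumer("analyst", days_since_last_access=1, accesses_per_month=20, decision_criticality=2),
    Consumer("manager", days_since_last_access=10, accesses_per_month=2, decision_criticality=3),
]
print(system_value(consumers))
```

With no consumers at all the sum is zero, which matches the “Outside-in” premise that an unused data warehouse has no value.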
I am sure people acknowledge that valuing data assets is difficult, and tricky at best. But then, far more difficult questions on nature and behavior have been reduced to mathematical equations. The day on which BI practitioners can apply standardized techniques to value their BI infrastructure is probably not too far off.