- Need direction to access the required analytical content
Example: Where can I get Sales by Product for different locations over the last two years?
- I am interested in customer-related analytics. Where do I access it?
- Once the content is retrieved, need guidance on how to make sense of it
Example: The report shows Forecasted Sales for the next quarter in the chart. How is this value calculated?
- Does the total inventory value displayed in the report include the raw material inventory or does it exclude it?
Monday, March 24, 2008
Metadata 101 – For BI Practitioners
Tuesday, March 18, 2008
Data Integration Challenge – Initial Load – I
In a data warehouse, tables usually go through two phases of the data load process: the initial load and the incremental load. ‘History Load’ or ‘Initial Seeding/Load’ involves a one-time load of the past years of source transaction system data into the data management system. The process of adding only the new records (updates or inserts) to the data warehouse tables, either daily or on a predefined frequency, is called ‘Incremental Load’. Certain tables that are small in size, largely independent, and receive full data (current data + history data) as input are instead loaded by means of a ‘Full Refresh’; this involves a complete delete and reload of data.
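As a simple illustration of the difference (all table and column names here are hypothetical), an incremental load is often expressed in SQL as a MERGE that inserts new records and updates changed ones, while a full refresh is a straight delete and reload:
-- Incremental load: apply only new or changed source rows to the warehouse table
-- (stg_customer and dw_customer are assumed staging and target tables)
MERGE INTO dw_customer tgt
USING stg_customer src
   ON (tgt.customer_id = src.customer_id)
 WHEN MATCHED THEN
   UPDATE SET tgt.customer_name = src.customer_name,
              tgt.region        = src.region,
              tgt.last_update   = src.last_update
 WHEN NOT MATCHED THEN
   INSERT (customer_id, customer_name, region, last_update)
   VALUES (src.customer_id, src.customer_name, src.region, src.last_update);

-- Full refresh: complete delete and reload of a small, independent table
TRUNCATE TABLE dw_country_code;
INSERT INTO dw_country_code SELECT * FROM stg_country_code;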
Code tables in particular usually undergo only a one-time initial load and may not require a regular incremental load, whereas incremental loads are common for fact tables. The initial load of a data warehouse system is quite a challenge in terms of getting it completed successfully within the planned timeframe. Some of the surprises or challenges faced in completing the history load are
- Handling invalid records
- Data Reconciliation
- System performance
- Catching up
Handling Invalid Records:
Invalid records become much more prominent when we process history data, which was collected in the source system long ago and might not fit the current business rules. Records from a source system can become invalid in the data warehouse for multiple reasons, such as an invalid domain value for a column, a null value in a non-nullable field, or aggregate data not matching the detail data. The ways of handling this problem effectively are
- Determine the years of data to be loaded into the data warehouse early on, and ensure that data profiling is performed on sample data for every year that has to be loaded. This ensures that most of the data validation rules are identified up front and built into the ETL process. In certain situations we may have to build separate data validation and transformation logic based on the year and the data
- Especially in situations like re-platforming or migrating an existing data warehouse to a new platform, even before running the data through the regular ETL process we might need to load the old data into a data validation (staging) area, where the data is analyzed and cleaned before being loaded into the data warehouse through the regular ETL process
- Design the ETL process to divert the key values of all invalid records to a separate set of error tables (a simple SQL sketch follows this list). At some sites the customer just needs to be made aware of the error records and is fine if they never get aligned into the warehouse, but at other times the invalid records are corrected and reloaded
- For certain scenarios, like aggregate data not matching detail data, we would usually derive the aggregate from the detail data, but at times we might also generate detail data to match the aggregate data
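As a rough sketch of the diversion approach mentioned above (the staging, error and fact table names and the validation rules here are hypothetical), invalid rows can be routed to an error table before the clean rows are loaded:
-- Divert rows failing basic validation rules (null key or invalid domain value)
-- into an error table; stg_sales_history, err_sales_history and dw_sales_fact are assumed names
INSERT INTO err_sales_history (order_id, customer_id, order_amount, error_reason)
SELECT order_id, customer_id, order_amount,
       CASE
         WHEN customer_id IS NULL THEN 'NULL CUSTOMER KEY'
         WHEN order_amount < 0    THEN 'INVALID DOMAIN VALUE'
       END
  FROM stg_sales_history
 WHERE customer_id IS NULL
    OR order_amount < 0;

-- Load only the valid rows into the warehouse fact table
INSERT INTO dw_sales_fact (order_id, customer_id, order_amount, load_date)
SELECT order_id, customer_id, order_amount, SYSDATE
  FROM stg_sales_history
 WHERE customer_id IS NOT NULL
   AND order_amount >= 0;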
Data Reconciliation:
Once the initial load from the source system into the data warehouse has been completed, we have to validate that the data has been moved in correctly.
- Having a means of loading records in groups, separated by year or by a logical grouping such as customer or product, gives better control over data validation. Typical validations are counts and sums, tied to business-specific rules such as "all customers from region 'A' belonging to division '1' in the source should be classified under division '3' in the warehouse"
- All the validations that need to be performed after the initial load for each data group have to be prepared and verified with the business team. Many times the data is validated by the business through ad hoc queries, though the same checks can be automated as an ETL process by the data warehouse team (a sample reconciliation query follows this list)
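As a minimal sketch of such an automated check (the staging and warehouse table names and the 2006 load group are assumptions), count and sum reconciliation for one data group can be expressed as a single comparison query:
-- Compare row counts and amount totals for one load group; a row returned
-- indicates a count or sum mismatch that needs investigation
SELECT src.row_cnt AS src_cnt, dw.row_cnt AS dw_cnt,
       src.total_amt AS src_amt, dw.total_amt AS dw_amt
  FROM (SELECT COUNT(*) AS row_cnt, SUM(order_amount) AS total_amt
          FROM stg_sales_history
         WHERE order_year = 2006) src,
       (SELECT COUNT(*) AS row_cnt, SUM(order_amount) AS total_amt
          FROM dw_sales_fact
         WHERE order_year = 2006) dw
 WHERE src.row_cnt <> dw.row_cnt
    OR src.total_amt <> dw.total_amt;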
We shall discuss further on the other challenges in Part II.
Read More About: Data Integration
Monday, March 17, 2008
Scripting Limitations in Siebel CRM
There are nearly 200 different types of objects that can be configured in Siebel to reflect changes in the User Interface layer, Business Objects layer and Data Objects layer. Being a Siebel professional, I often come across complex business requirements that cannot be met solely by configuration. Siebel scripting is great for handling these complex needs, but there are a few limitations to consider while using scripting in Siebel development.
- 16 KB Size limitation
In Siebel eScript, a single method can contain only 16 KB of code. But in real life, one may have to go in for heavy customizations of Siebel objects for which 16 KB is just not enough. My workaround in such cases is to split large methods into smaller methods.
- Cannot mix scripting languages in an object
Most scripting in Siebel is done in eScript. But sometimes we need to use third-party systems/objects like ActiveX/COM, where VBScript is the preferred language. So how do we get to external objects like ActiveX/COM objects from a Siebel object coded in eScript, given that we are not allowed to mix scripting languages in an object? The solution is to call another Siebel object scripted in VBScript, through which we can finally access the external system/object. This solution works fine, but it does lead to performance issues. Better ideas are welcome.
- Dynamic switching between server and browser side script is not possible
We can call server script from browser script. But once server script starts executing, it is not possible to call browser script. So it is not possible to do things like pop up a message on the browser side after executing, say, 10 lines of code on the server side. If anyone has any workarounds, please do let me know.
- Object has to be compiled after every modification
Until Siebel 7.8, developers had to compile and re-compile modified objects at every minor change in code. This is a big pain while debugging and a major time-waster. But in the latest version, 8.0, Siebel provides a "Fix and Go" option with the ST engine which addresses this limitation. Thank you Siebel!!!
- Scripts are not always upgradeable
Scripts often give errors while upgrading a Siebel Application from one version to another. At times we have had to comment out entire sections so that the technical upgrade can complete. Finding all these problem areas in the script and fixing them is a big challenge during upgrades.
Feel free to comment/discuss/debate!
Read More about Siebel CRM
Wednesday, March 12, 2008
Analyzing Analytics
Dashboard Power
The other day, I was ‘Googling’ for a Siebel Analytics document when I stumbled upon this article, which is a good example of how the prowess of analytics is making life easier for commuters. Siebel Analytics, now rechristened Oracle BI, helped officials at Manchester Airport cope with the extra pressure of screening additional bags by identifying future trends.
It provided Manchester Airport’s management team with interactive dashboards of flight, passenger and baggage movements. As a result, Manchester Airport lifted the ‘One Hand Baggage Rule’. The full article can be found at the following link: http://www.computerweekly.com/Articles/2008/01/07/228771/manchester-airport-lifts-one-bag-rule-with-oracle.htm
It’s a bit of wishful thinking, but imagine the ease of travelling if every airport harnessed the capabilities of Siebel Analytics and lifted the above-mentioned rule.
This first post of mine in the ‘Analyzing Analytics’ series is an attempt at exposing readers, be they neo-techies or seasoned experts, to the vast capabilities of the end product of Siebel Analytics: the Intelligence Dashboards, and to the concept of dashboard-centric business operation and management. So let’s discuss what these ‘Dashboards’ are all about.
Case 1: What are the sales for this quarter? What’s the comparison of Sales for this quarter with year ago sales? How many customer accounts are in the red? Such kinds of questions are very common in day to day business analysis. The answer to these and a plethora of similar process oriented questions helps in running a business efficiently.
Case 2: Organizations often have data fragmented across multiple sources like spreadsheets, department-specific systems, etc. Often critical data is buried deep somewhere in some file or system in the organization.
Keeping the above two cases in mind, a unified view of the organization is the need of the hour. An analytical dashboard is just what any organization would desire. It helps in giving a holistic view of business operations, most often using data that has been gathered into a warehouse and occasionally working from operational data. Prerequisites to a healthy dashboard are clean data, a good data model, sturdy data mappings and neat analytics.
Essentially, a dashboard is the final outcome of a Siebel Analytics operation. Based on the metaphor of the instrument panel in a car, the Siebel Analytics version of a dashboard provides its users, mostly business managers, with the input necessary to “drive” the business. A Siebel Analytics Interactive Dashboard is a web-based page on which real-time or warehoused information (reports) from the various sources used in the business is embedded.
As seen in the above example of interactive dashboards, business data is displayed with an array of user-friendly views like pie charts, drill-downs, navigations, pivot table summaries, bar graphs, meters and gauges. These dashboards are often role-driven and customizable. They can be modeled to show historical, current and predictive data, which leads to measure-based decision making. The biggest advantage of an interactive dashboard is that it makes decision making more fact-based and less intuitive.
Dashboard-centric management is definitely going to be the next big thing in business analysis. The idea of viewing and analyzing the organization’s facts and figures through superb layouts on your computer screen is absolutely fantastic. In short, dashboards are all about infusing ‘intelligence’ into the business of the organization, and they are a wonderful apparatus for any organization to possess.
Read More about Analyzing Analytics
Monday, March 3, 2008
PeopleSoft Fine Grained Auditing – Part II
Now, let’s test this policy. Log on to the PeopleSoft environment using the browser and create a PRIVATE query referring to the above table. The query will not have any criteria and will fetch all rows (this table only had 1002 rows).
After executing the query, the audit data is populated in DBA_FGA_AUDIT_TRAIL.
select timestamp, db_user, client_id, object_name from dba_fga_audit_trail where object_name = 'PS_ABC_COMPANY_TBL'
/
TIMESTAMP DB_USER  CLIENT_I OBJECT_NAME
--------- -------- -------- ---------------------
21-MAY-07 SYSADM   NPAI     PS_ABC_COMPANY_TBL
21-MAY-07 SYSADM   NPAI     PS_ABC_COMPANY_TBL
We can also see the actual SQL text executed by the user by including the SQL_TEXT column in a query against the same data dictionary view.
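For example, the same view and object name as above can be queried with the SQL_TEXT column included:
select timestamp, db_user, sql_text from dba_fga_audit_trail where object_name = 'PS_ABC_COMPANY_TBL'
/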
Summary:
I have shown a small example of utilizing FGA to audit the PeopleSoft database. FGA is a neat feature that allows us to audit a specific rowset instead of auditing all the rows in a table. This is very useful when a table holds both sensitive and non-sensitive information and you want to audit any unauthorized access to the sensitive column or rowset.
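As a hedged illustration (the audit condition and column below are assumptions for this example, not the actual policy created earlier), a policy that audits only a sensitive column and rowset could be defined with DBMS_FGA.ADD_POLICY along these lines:
begin
  dbms_fga.add_policy (
    object_schema   => 'SYSADM',
    object_name     => 'PS_ABC_COMPANY_TBL',
    policy_name     => 'ABC_COMPANY_TBL_ACCESS',
    audit_condition => 'annual_revenue > 1000000',  -- hypothetical sensitive rowset
    audit_column    => 'ANNUAL_REVENUE',            -- hypothetical sensitive column
    statement_types => 'SELECT'                     -- statement_types is available from 10g
  );
end;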
Note 1:
- As of 9i, the FGA feature only allows auditing SELECT statements; 10g supports SELECT, DELETE, UPDATE and INSERT.
Note 2:
If you need to drop the policy, use the SQL below:
begin
  dbms_fga.drop_policy (
    object_schema => 'SYSADM',
    object_name   => 'PS_ABC_COMPANY_TBL',
    policy_name   => 'ABC_COMPANY_TBL_ACCESS'
  );
end;
Bug Note:
Do not forget to check out the bug related to FGA
http://www.red-database-security.com/advisory/oracle-fine-grained-auditing-issue.html
Read More About PeopleSoft Fine Grained Auditing