Friday, June 29, 2007

Data Integration Challenge – Facts Arrive Earlier than Dimension


Fact transactions that arrive earlier than the dimension (master) records are not bad data; such fact records need to be handled in the ETL process as a special case. Facts arriving before dimensions is quite common, for example when a customer opens a bank account and his transactions start flowing into the data warehouse immediately, while the customer id creation in the Customer Reconciliation System gets delayed and the customer data reaches the data warehouse only a few days later.
How we handle this scenario differs based on the business process being addressed; there are two common requirements:
  • Make the fact available and report it under an “In Process” category; commonly followed in financial reporting systems to enable reconciliation
  • Make the fact available only when the dimension is present; commonly followed in status reporting systems
Requirement 1: Make the fact available and report under “In Process” category
For this requirement, follow the steps below (a SQL sketch follows the list):
  1. Insert into the dimension table a record that represents a default or ‘In Process’ status. In the banking example, the Customer Dimension would have a ‘default record’ inserted to indicate that the customer detail has not yet arrived

  2. In the ETL process that populates the Fact table, assign the default dimension key to transactions that do not have a corresponding entry in the dimension table and insert them into the Fact. In the same process, insert the dimension lookup values into a ‘temporary’ or ‘error’ table

  3. Build an ETL process that checks the new records inserted into the dimension table, queries the temporary table to identify the fact records whose dimension key has to be updated, and updates the respective facts’ dimension keys
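A minimal SQL sketch of steps 2 and 3, with illustrative table and column names that are assumptions, not from the original post (STG_TXN for the incoming transactions, CUSTOMER_DIM with surrogate key CUST_KEY and natural key CUST_ID, TXN_FACT for the fact, DIM_LOOKUP_ERR as the temporary/error table, a default dimension record with CUST_KEY = -1, and the natural key carried on the fact to allow the later update):

-- Step 2: load facts, substituting the default key when the customer is unknown,
-- and park the unresolved natural keys for later reprocessing
INSERT INTO TXN_FACT (CUST_KEY, TXN_DT, TXN_AMT, CUST_ID)
SELECT NVL(D.CUST_KEY, -1), S.TXN_DT, S.TXN_AMT, S.CUST_ID
  FROM STG_TXN S
  LEFT JOIN CUSTOMER_DIM D ON D.CUST_ID = S.CUST_ID;

INSERT INTO DIM_LOOKUP_ERR (CUST_ID, TXN_DT)
SELECT S.CUST_ID, S.TXN_DT
  FROM STG_TXN S
 WHERE NOT EXISTS (SELECT 1 FROM CUSTOMER_DIM D WHERE D.CUST_ID = S.CUST_ID);

-- Step 3: once the customer record arrives, repoint the parked facts
-- from the default key to the real surrogate key
UPDATE TXN_FACT F
   SET F.CUST_KEY = (SELECT D.CUST_KEY FROM CUSTOMER_DIM D WHERE D.CUST_ID = F.CUST_ID)
 WHERE F.CUST_KEY = -1
   AND F.CUST_ID IN (SELECT E.CUST_ID
                       FROM DIM_LOOKUP_ERR E
                       JOIN CUSTOMER_DIM D ON D.CUST_ID = E.CUST_ID);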

Requirement 2: Make the fact available only when the dimension is present
For this requirement, follow the steps below (a SQL sketch follows the list):
  1. Build an ETL process that populates the fact into a staging table

  2. Build an ETL process that pushes only the records that have a dimension value to the data warehouse tables

  3. At the end of the ETL process, delete all the processed records from the staging table so that the unprocessed records remain available to be picked up in the next run
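A minimal sketch of the same flow in SQL, again with illustrative names that are assumptions (STG_TXN_FACT as the staging table, plus a PROCESSED_FLAG column assumed here to mark the rows pushed in the current run):

-- Step 2: push only the staged rows whose dimension has arrived,
-- flagging them as processed in the same run
UPDATE STG_TXN_FACT S
   SET S.PROCESSED_FLAG = 'Y'
 WHERE EXISTS (SELECT 1 FROM CUSTOMER_DIM D WHERE D.CUST_ID = S.CUST_ID);

INSERT INTO TXN_FACT (CUST_KEY, TXN_DT, TXN_AMT)
SELECT D.CUST_KEY, S.TXN_DT, S.TXN_AMT
  FROM STG_TXN_FACT S
  JOIN CUSTOMER_DIM D ON D.CUST_ID = S.CUST_ID
 WHERE S.PROCESSED_FLAG = 'Y';

-- Step 3: clear the processed rows; the rest wait for their dimensions
DELETE FROM STG_TXN_FACT WHERE PROCESSED_FLAG = 'Y';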

You can read more about Data Integration Challenge.

Monday, June 25, 2007

Business Intelligence: Gazing at the Crystal Ball


Circa 2015 – 8 years from now
The CEO of a multinational organization enters the corner office overlooking the busy city below. On flicking a switch near the seat, the wall in front lights up with a colorful dashboard, known in CEO circles by then as the Rainbow Chart.
The Rainbow Chart is the CEO’s lifeline: it gives a snapshot of the current business position (the left portion) and also figures/colors that serve as a premonition of the company’s future (the right portion).
The current state/left portion of the dashboard, on closer examination, reveals 4 sub-parts. On the extreme left is the Balance Sheet of the business and next to it is the Income statement. The Income statement has more colors that are changing dynamically as compared to the Balance sheet. Each line item has links to it, using which the CEO can drill down further to specific geographies, business units and even further to individual operating units. The third part has the cash flow details (the colors are changing far more rapidly here) and the fourth one gives the details on inventory, raw materials position and other operational details.
The business future state/right portion of the dashboard has a lot of numbers that fall into two categories. The first category is specific to the business – Sales in pipeline, Revenue & Cost projections, Top 5 initiatives, Strategy Maps etc. – and the second category is the macroeconomic indicators from across the world. At the bottom of the dashboard is a stock ticker (what else?) with the company’s stock price shown in bold.
All these numbers & colors change in real-time and the CEO can drill up/down/across/through all the line items. Similar such dashboards are present across the organization and each one covers details that are relevant for the person’s level and position in the company.
This in essence is the real promise of BI.
Whether it happens in 2015 or earlier (hopefully not later!) is open to speculation, but the next few blogs from my side will zero in on some of the prerequisites for such a scenario – the Business Intelligence Utopia!

Business Intelligence @ Crossroads


Business Intelligence (BI) is well & truly at the crossroads, and so are BI practitioners like me. On one hand there is tremendous improvement in BI tools & techniques almost on a daily basis; on the other hand there is still a big expectation gap among business users about BI’s usage and value in driving core business decisions.

This ensures that every Business Intelligence (BI) practitioner develops a ‘split’ personality – a la Jekyll and Hyde – fascinated one moment by the awesome power of databases and the smart techniques in data integration tools, and the very next moment in trouble with a business user over why ‘that’ particular metric cannot be captured in an analytical report.
For the BI technologists, there is never going to be a dull moment in the near future. With all the big product vendors like Microsoft, Oracle, SAP etc. throwing their might behind BI and with all the specialty BI product vendors showing no signs of slowing down, just get ready to join the big swinging party.
For the business users, the promise of BI is still very enticing – ‘Data to Information to Knowledge to Actions that drive business decisions’. But they are not ready to give a verdict just yet. Operational folks are really not getting much out of BI right now (wait for BI 2.0), and the strategic thinkers are not completely satisfied with what they get to see.
The techno-functional managers – the split-personality types – are the ones in the middle, grappling with increasing complexity on the technology side and the ever-increasing clamor for insights from the business side.
Take sides right away – there is more coming from this space on the fascinating world of Business Intelligence.

Perils of DataMover Access – 3

The story is a bit different in a non-production environment. Depending on your organization’s change control policies, developers might have Data Mover access in non-production. In this case, we might want to prevent the OPRID from exploiting Data Mover access to perform undesired DDL on the database.
To tackle this problem, you can create a DDL trigger as shown below. This will ensure that no DDL operations are performed from Data Mover.
CREATE OR REPLACE TRIGGER DATAMOVER_PREVENT_DDL
BEFORE CREATE OR ALTER OR DROP OR GRANT OR RENAME OR REVOKE ON SCHEMA
DECLARE
   VAR_DDLEVENT  VARCHAR2(25);
   VAR_OBJ_NAME  VARCHAR2(128);
   V_AUDIT_OPRID VARCHAR2(32);
BEGIN
   -- Data Mover sets the Oracle client info to the PeopleSoft OPRID
   DBMS_APPLICATION_INFO.READ_CLIENT_INFO(V_AUDIT_OPRID);
   SELECT ORA_SYSEVENT, ORA_DICT_OBJ_NAME, GET_PS_OPRID(V_AUDIT_OPRID)
     INTO VAR_DDLEVENT, VAR_OBJ_NAME, V_AUDIT_OPRID
     FROM DUAL;
   -- Block DDL only when the session carries a PeopleSoft OPRID (i.e. Data Mover)
   IF (VAR_DDLEVENT IN ('CREATE', 'ALTER', 'DROP', 'GRANT', 'RENAME', 'REVOKE')
       AND V_AUDIT_OPRID != '!NoOPRID')
   THEN
      RAISE_APPLICATION_ERROR(-20001, '**** THIS OPERATION IS NOT ALLOWED ****');
   END IF;
END;
/
Now, if a developer with Data Mover access tries to grant his Oracle id the DBA role, he will get the message below.
[Screenshot: Data Mover error message – **** THIS OPERATION IS NOT ALLOWED ****]
Conclusion
We have seen how dangerous Data Mover access can be if controls are not in place. This access is often overlooked and can have serious implications. The best approach starts with cleaning up the privileges assigned to the PeopleSoft access id: the access id should have only the access it requires. Do not go overboard and assign the DBA role to the access id.
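As a quick starting point, here is a sketch of the checks a DBA could run to see what the access id currently holds (SYSADM is used only as an illustrative access id; substitute your own):

-- Roles granted to the PeopleSoft access id (SYSADM is illustrative)
SELECT GRANTED_ROLE, ADMIN_OPTION
  FROM DBA_ROLE_PRIVS
 WHERE GRANTEE = 'SYSADM';

-- System privileges granted directly to the access id
SELECT PRIVILEGE, ADMIN_OPTION
  FROM DBA_SYS_PRIVS
 WHERE GRANTEE = 'SYSADM';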

Monday, June 18, 2007

Perils of DataMover Access – Part 2b

Continuing from my previous post – “Perils of DataMover Access – Part 2a”.
Let’s look at a scenario where the security was modified to enable Data Mover access.
[Screenshot: query results from the PS_AUDIT_AUTHITEM audit table]
The results above show that oprid NPAI ‘Added’ the security to enable Data Mover access.
It is possible that the OPRIDs who bypassed controls to modify Data Mover access are smart enough to also delete the rows in the audit table using Data Mover. This might go undetected unless you have some additional monitoring in place.
Ideally, you might want to create a trigger that fires on any INSERT into sensitive audit records. The trigger should call a procedure that sends an email notification alerting the group managing the sensitivity of the environment (a sketch of such a procedure follows the trigger).
CREATE TRIGGER PSAUTHITEM_AUDIT_EMAIL
AFTER INSERT ON PS_AUDIT_AUTHITEM FOR EACH ROW
DECLARE
   VAR_JOB_ID NUMBER;
BEGIN
   -- Queue an asynchronous job so the email does not hold up the inserting session
   DBMS_JOB.SUBMIT(VAR_JOB_ID, 'SEND_EMAIL( JOB );');
   -- Record what triggered the alert so the job can pick up the details
   INSERT INTO PS_AUDIT_EMAIL
   VALUES (VAR_JOB_ID, :NEW.AUDIT_OPRID, :NEW.AUDIT_STAMP, :NEW.AUDIT_ACTN);
END;
/
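The SEND_EMAIL procedure itself is not shown here; below is a minimal sketch of what it might look like, assuming UTL_MAIL is installed and smtp_out_server is configured, that PS_AUDIT_EMAIL has columns JOB_ID, AUDIT_OPRID, AUDIT_STAMP and AUDIT_ACTN, and that the sender/recipient addresses are placeholders:

CREATE OR REPLACE PROCEDURE SEND_EMAIL (P_JOB_ID IN NUMBER)
AS
   V_OPRID VARCHAR2(64);
   V_STAMP DATE;
   V_ACTN  VARCHAR2(1);
BEGIN
   -- Pick up the row queued by the PSAUTHITEM_AUDIT_EMAIL trigger for this job
   SELECT AUDIT_OPRID, AUDIT_STAMP, AUDIT_ACTN
     INTO V_OPRID, V_STAMP, V_ACTN
     FROM PS_AUDIT_EMAIL
    WHERE JOB_ID = P_JOB_ID;

   -- Requires UTL_MAIL to be installed and smtp_out_server to be set
   UTL_MAIL.SEND(
      sender     => 'psoft-audit@example.com',
      recipients => 'security-team@example.com',
      subject    => 'PSAUTHITEM change detected',
      message    => 'OPRID ' || V_OPRID || ' performed action ' || V_ACTN ||
                    ' on ' || TO_CHAR(V_STAMP, 'DD-MON-YYYY HH24:MI:SS'));

   -- Remove the processed queue row
   DELETE FROM PS_AUDIT_EMAIL WHERE JOB_ID = P_JOB_ID;
END SEND_EMAIL;
/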
Alternatively, we can create triggers to prevent execution of any DELETE or DROP SQL on the audit table.
  • Prevent delete
CREATE OR REPLACE TRIGGER PSAUTHITEM_AUDIT_PREVENT_DEL
AFTER DELETE ON PS_AUDIT_AUTHITEM FOR EACH ROW
BEGIN
   RAISE_APPLICATION_ERROR(-20001, '**** DELETE NOT ALLOWED ****');
END;
/
  • Prevent DROP or TRUNCATE
CREATE OR REPLACE TRIGGER PSAUTHITEM_AUDIT_PREVENT_DDL
BEFORE DROP OR TRUNCATE ON SCHEMA
DECLARE
   VAR_DDLEVENT VARCHAR2(25);
   VAR_OBJ_NAME VARCHAR2(128);
BEGIN
   SELECT ORA_SYSEVENT, ORA_DICT_OBJ_NAME
     INTO VAR_DDLEVENT, VAR_OBJ_NAME
     FROM DUAL;
   IF (VAR_DDLEVENT IN ('DROP', 'TRUNCATE')
       AND VAR_OBJ_NAME IN ('PS_AUDIT_AUTHITEM', 'PSAUTHITEM_AUDIT_PREVENT_DEL'))
   THEN
      RAISE_APPLICATION_ERROR(-20001, '**** DROP OR TRUNCATE NOT ALLOWED ****');
   END IF;
END;
/

Here is what you get when you try to drop the table or delete rows from Data Mover.
[Screenshot: Data Mover output showing the ORA-20001 errors raised by the triggers]
Note:
- TRUNCATE is not a valid command in Data Mover.
- You can also consider creating the DDL event trigger in another schema so that it fires at the DATABASE level instead of the SCHEMA level. This ensures that the OPRID cannot drop the trigger itself from Data Mover (see the sketch below).
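A minimal sketch of that variant, assuming it is created from a separate, locked-down schema (SECADM is illustrative) that holds the ADMINISTER DATABASE TRIGGER privilege, and that SYSADM is the PeopleSoft schema owner:

CREATE OR REPLACE TRIGGER SECADM.PSAUTHITEM_PREVENT_DDL_DB
BEFORE DROP OR TRUNCATE ON DATABASE
DECLARE
   VAR_DDLEVENT  VARCHAR2(25);
   VAR_OBJ_NAME  VARCHAR2(128);
   VAR_OBJ_OWNER VARCHAR2(30);
BEGIN
   SELECT ORA_SYSEVENT, ORA_DICT_OBJ_NAME, ORA_DICT_OBJ_OWNER
     INTO VAR_DDLEVENT, VAR_OBJ_NAME, VAR_OBJ_OWNER
     FROM DUAL;
   -- Protect the audit table and its DELETE-prevention trigger in the PeopleSoft schema
   IF (VAR_DDLEVENT IN ('DROP', 'TRUNCATE')
       AND VAR_OBJ_OWNER = 'SYSADM'
       AND VAR_OBJ_NAME IN ('PS_AUDIT_AUTHITEM', 'PSAUTHITEM_AUDIT_PREVENT_DEL'))
   THEN
      RAISE_APPLICATION_ERROR(-20001, '**** DROP OR TRUNCATE NOT ALLOWED ****');
   END IF;
END;
/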

Thursday, June 14, 2007

DI Challenge – Handling Files of different format with same subject content


In a data integration environment with multiple OLTP systems serving the same business functionality, one scenario that occurs quite commonly is that of these systems ‘providing files of different formats with the same subject content’.
Different OLTP systems with the same functionality may arise in organizations, for example when a bank runs its core banking on different products due to an acquisition or merger, or simply when the same application has multiple instances with country-specific customizations.
For example, data about the same subject, like ‘loan payment details’, would be received on a monthly basis from different OLTP systems in different layouts and formats. These files might arrive at different frequencies and may be incremental or full files.
Files with the same subject content always reach the same set of target tables in the data warehouse.
How do we handle such scenarios effectively and build a scalable Data Integration process?
The following steps help in handling such situations effectively:
• Since all the files provide data related to one common subject content, prepare a Universal Set of fields that represents that subject. For example, for any loan payment subject we would have a set of fields identified as a Universal Set representing details about the guarantors, borrower, loan account etc. This Universal Field list is called the Common Standard Layout (CSL)
• Define the CSL fields with a Business Domain specialist and mark certain fields in the CSL as mandatory or NOT NULL fields, which all source files should provide
• Build a set of ETL processes that process the data based on the CSL layout and populate the target tables. The CSL layout could be a table or a flat file; if the CSL is a table, define the fields as character (see the sketch after this list). All validations that are common to that subject are built in this layer
• Build an individual ETL process for each file that maps the source file’s fields to the CSL structure. All file-specific validations are built in this layer
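A minimal sketch of such a CSL table for the loan payment subject, with illustrative field names and widths (assumptions, not from the original post), and every field kept as character as recommended above:

CREATE TABLE CSL_LOAN_PAYMENT (
   SOURCE_SYSTEM_CD  VARCHAR2(10)  NOT NULL,  -- which OLTP system supplied the row
   FILE_RECEIVED_DT  VARCHAR2(10)  NOT NULL,  -- kept as character; typed in the common layer
   LOAN_ACCOUNT_NBR  VARCHAR2(30)  NOT NULL,  -- mandatory field per the CSL definition
   BORROWER_ID       VARCHAR2(30)  NOT NULL,
   GUARANTOR_ID      VARCHAR2(30),
   PAYMENT_AMT       VARCHAR2(20),
   PAYMENT_DT        VARCHAR2(10),
   LOAD_BATCH_ID     VARCHAR2(20)
);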
Benefits of this approach
• Converting all source file formats to the CSL ensures that all the common rules are developed as reusable components
• Adding a new file that provides the same subject content is easier; we just need to build a process to map the new file to the CSL structure
Read more about Data Integration Challenge.

Monday, June 11, 2007

First Step in Knowing your Data – ‘Profile It’


The Chief Data Officer (CDO), the protagonist introduced earlier on this blog, has the unenviable task of understanding the data that lies within the organization’s boundaries. Having categorized the data into 6 MECE sets (read the post dated May 29 on this blog), the data reconnaissance team starts its mission with the first step – ‘Profiling’.
Data Profiling at the most fundamental level involves understanding of:
1) How is the data defined?
2) What is the range of values that the data element can take?
3) How is the data element related to others?
4) What is the frequency of occurrence of certain values, etc.
A slightly more sophisticated definition of Data Profiling would include analysis of data elements in terms of the following (a short SQL sketch of the most basic checks follows the list):
  • Basic statistics, frequencies, ranges and outliers
  • Numeric range analysis
  • Identification of duplicate name/address and non-name/address information
  • Identification of multiple spellings of the same content
  • Identification and validation of redundant data and primary/foreign key relationships across data sources
  • Validation of data-specific business rules within a single record or across sources
  • Discovery and validation of data patterns and formats
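A minimal sketch of the most basic of these checks in SQL, using a hypothetical CUSTOMER table with illustrative columns (CUST_ID, CUST_STATUS, EMAIL_ADDR):

-- Value frequencies and outliers for a coded column
SELECT CUST_STATUS, COUNT(*) AS ROW_CNT
  FROM CUSTOMER
 GROUP BY CUST_STATUS
 ORDER BY ROW_CNT DESC;

-- Completeness: how often is a critical attribute missing?
SELECT COUNT(*) AS TOTAL_ROWS,
       SUM(CASE WHEN EMAIL_ADDR IS NULL THEN 1 ELSE 0 END) AS MISSING_EMAIL
  FROM CUSTOMER;

-- Candidate-key check: does CUST_ID really identify a single row?
SELECT CUST_ID, COUNT(*)
  FROM CUSTOMER
 GROUP BY CUST_ID
HAVING COUNT(*) > 1;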
Armed with statistical information about critical data present in enterprise-wide systems, the CDO’s team can devise specific strategies to improve the quality of data, and hence the quality of information and business decisioning.

To add more variety to your thoughts, you can read more about Data Profiling.


Thursday, June 7, 2007

How to recreate Audit triggers when performing security restores/imports?


In response to Charles’ comment (shown below), I thought it would be nice to respond with a post that provides detailed instructions.
“In your example for PSAUTHITEM, does the trigger get stored as part of the tools? In other words, if I run a security export from this instance to another one, will the trigger get migrated too, or should I run the create trigger SQL in the target system?”
Here are the steps.
1. When you generate the trigger SQL (step 6 in my previous post), modify the SQL (as shown in step 7 of that post) and click on the Save button.
2. This will create an entry in PSTRIGGERDEFN.
3. Now, every time you copy or restore security, perform the following additional tasks:
a. Go to PeopleTools –> Utilities –> Audit –> Perform Database Level Audit
b. Create a new run control id
c. Select the Create all Triggers checkbox
d. Run the process
4. The AE program will create a trgcode1.sql file in $PS_SRVRDIR. You can check the trace file and search for this file name.
5. Run this SQL to recreate the triggers.

Tuesday, June 5, 2007

Perils of DataMover Access – Part 2a

PeopleSoft provides trigger-based auditing functionality as an alternative to the record-based auditing that PeopleSoft Application Designer provides. Perform the following steps to set up trigger-based auditing for PSAUTHITEM.
1.  Create a custom table to store the audit data for PSAUTHITEM and build the record in the database.
2.  Navigate to PeopleTools –> Utilities –> Audit –> Update Database Level Auditing.
3.  Add a New Value and select Record Name PSAUTHITEM
4.  Select the record AUDIT_AUTHITEM (we created in step 1) as the Audit Record.
5.  Check all the audit options.
6.  Click on the Generate Code button. This will generate the SQL for creating the trigger.
7.  Modify the script as below to include the MENUNAME.
CREATE OR REPLACE TRIGGER PSAUTHITEM_TR
AFTER INSERT OR UPDATE OR DELETE ON PSAUTHITEM
FOR EACH ROW
DECLARE
   V_AUDIT_OPRID VARCHAR2(64);
BEGIN
   DBMS_APPLICATION_INFO.READ_CLIENT_INFO(V_AUDIT_OPRID);
   IF INSERTING THEN
      INSERT INTO PS_AUDIT_AUTHITEM
      VALUES (GET_PS_OPRID(V_AUDIT_OPRID), SYSDATE, 'A', :NEW.CLASSID,
              :NEW.MENUNAME, :NEW.BARITEMNAME, :NEW.PNLITEMNAME,
              :NEW.DISPLAYONLY, :NEW.AUTHORIZEDACTIONS);
   ELSE
      IF DELETING THEN
         INSERT INTO PS_AUDIT_AUTHITEM
         VALUES (GET_PS_OPRID(V_AUDIT_OPRID), SYSDATE, 'D', :OLD.CLASSID,
                 :OLD.MENUNAME, :OLD.BARITEMNAME, :OLD.PNLITEMNAME,
                 :OLD.DISPLAYONLY, :OLD.AUTHORIZEDACTIONS);
      ELSE
         INSERT INTO PS_AUDIT_AUTHITEM
         VALUES (GET_PS_OPRID(V_AUDIT_OPRID), SYSDATE, 'K', :OLD.CLASSID,
                 :OLD.MENUNAME, :OLD.BARITEMNAME, :OLD.PNLITEMNAME,
                 :OLD.DISPLAYONLY, :OLD.AUTHORIZEDACTIONS);
         INSERT INTO PS_AUDIT_AUTHITEM
         VALUES (GET_PS_OPRID(V_AUDIT_OPRID), SYSDATE, 'N', :NEW.CLASSID,
                 :NEW.MENUNAME, :NEW.BARITEMNAME, :NEW.PNLITEMNAME,
                 :NEW.DISPLAYONLY, :NEW.AUTHORIZEDACTIONS);
      END IF;
   END IF;
END PSAUTHITEM_TR;
/
8.  Ensure that the GET_PS_OPRID function exists (a minimal sketch is shown below). Copy the SQL and execute it in SQL*Plus.
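In case it does not exist yet in your environment, here is a minimal sketch of what such a function might look like, assuming the PeopleSoft client info string starts with the OPRID followed by a comma; the delivered version created by the Database Level Audit process should be preferred where available:

CREATE OR REPLACE FUNCTION GET_PS_OPRID (V_CLIENT_INFO IN VARCHAR2)
RETURN VARCHAR2
IS
   V_OPRID VARCHAR2(64);
BEGIN
   -- No client info set: the session did not come through PeopleSoft
   IF V_CLIENT_INFO IS NULL THEN
      RETURN '!NoOPRID';
   END IF;
   -- Client info is expected to look like 'OPRID,host,...'; take the first token
   V_OPRID := SUBSTR(V_CLIENT_INFO, 1, INSTR(V_CLIENT_INFO, ',') - 1);
   IF V_OPRID IS NULL THEN
      -- No comma found: treat the whole string as the OPRID
      RETURN V_CLIENT_INFO;
   END IF;
   RETURN V_OPRID;
END GET_PS_OPRID;
/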
Now we are ready to audit any changes to PSAUTHITEM.
Summary
In my next post, I will demonstrate a test scenario and also provide insights into monitoring the audit results.
Read more about DataMover Access.