Wednesday, May 30, 2007

New Face on the Corporate Board – The CDO


People who read the last post on this blog, “What is a Data Warehouse”, would probably accept my view that for an organization to get better at anything worthwhile, “data” is everything. If you accept this notion, I propose the immediate creation of a new ‘C’ level organizational position – Chief Data Officer (CDO).

To me, the CDO is a more important position than the more glamorous CIO (Chief Information Officer). After all, the input to any strategic information is raw data, and many organizations don’t have a comprehensive focus on the data present within their boundaries. It is important to realize that good data, not just any data, is a source of competitive advantage.
Let us for a moment assume that there is an organization with the CDO structure in place. The next question is – How should the CDO go about doing the job, given the massive amount of data generated by organizations? – Answer: Divide & Conquer!
The 6 mutually exclusive, collectively exhaustive (MECE) types of organizational data are given below:
Type 1) Transaction Structure Data – Business processes are a series of never-ending transactions. Each of these transactions has a context, and that context is defined by this category of data. Examples are: Products, Customers, Departments etc.
Type 2) Transaction Activity Data – These are the transactions themselves. Ex: Purchase Order data, Sales Invoice data etc.
Type 3) Enterprise Structure Data – These data elements are unique to each organization and the inter-relationships between data elements are important. Ex: Chart of Accounts, Org Structure, Bill of materials, etc.
Type 4) Reference Data – Set of codes, typically name-value pairs that drives business rules. Ex: Region Codes, Customer Types etc.
Type 5) Metadata – Data that defines other data, thus making the collection a self-defining entity.
Type 6) Audit Data – With so much focus on regulatory compliance, this is the data that tracks all the operations within a data store.
Types 1, 3 & 4 together are defined as Master Data, and their management is the subject of numerous BI articles and white papers.
Our CDO would do well to understand all six types of data in the organization and have specific strategies to improve their quality. This and many other data management strategies will be the focus of this blog – please do keep reading.

Inter-Process Communication on the Unix Operating System – How Does PeopleSoft Use It?


First of all, I am going to tell you something about the IPC system on the Unix operating system and how it relates to managing a PeopleSoft-based environment.
I took this topic first since PeopleSoft uses IPC at the application server and Process Scheduler server level; we need to know the IPC system to understand how these PeopleSoft server processes work. We often face issues related to this while supporting a live PeopleSoft environment. A process, from the Unix perspective, is a running program, created by the fork() system call. That’s all.
We can say processes are created by ‘forking’. Forking means a new process is getting created. Obviously, it also means we have a parent process that creates a child process. So, there exists a process, a running program in the Unix system, which creates a sub-process. These two are the parent and child processes.
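The parent/child relationship above can be sketched in a few lines of shell, where ‘&’ makes the shell fork() a child and ‘wait’ reaps it (a minimal sketch; ‘sleep’ stands in for any child workload):

```shell
# '&' makes the shell fork() and run the command in a child process
sleep 1 &
child=$!             # PID of the forked child, as seen by the parent
echo "parent $$ forked child $child"
wait "$child"        # parent blocks until the child exits, like waitpid()
status=$?            # wait reports the child's exit status
```

When the child finishes, the parent picks up its exit status, which is exactly the lifecycle PeopleSoft server processes go through under the hood.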
I am trying to explain the Unix IPC system in relation to PeopleSoft server components.
The PeopleSoft Application Server has many processes. You probably know most of them, for example:
BBL, PSAPPSRV, PSSAMSRV, PSQRYSRV, PSQCKSRV, JSL, WSL, Integration Broker server processes etc.
The PeopleSoft Application Server is said to be the core of the PeopleSoft Internet Architecture (PIA). It has two components: PeopleSoft services and PeopleSoft server processes.
The PeopleSoft server processes need to communicate among themselves using the IPC system on the Unix operating system.
That’s why understanding the IPC system in Unix is one of the important items to be familiar with. Basically, Unix processes communicate using IPC mechanisms. Some of the important ones are (we are talking about System V IPC here):
Shared Memory
Message Queues
Semaphores
Named Pipes, etc. (strictly speaking, named pipes are a filesystem-based mechanism rather than System V IPC)
Problem Situation:
Whenever you boot PeopleSoft server processes, if it complains that the “Server Exists” already, most likely some IPC resources were not cleanly closed. In that case, it is better to list all the IPC resources with “ipcs” and then clean them up using the PeopleSoft shell scripts.
Peoplesoft provides a shell script for handling IPC in Unix Operating System.
The script is called “ipcrmall.sh”. This script needs to be run as below:
cd $PS_HOME
. ./psconfig.sh
cd appserv
./ipcrmall.sh psoft psoftgrp
Here ‘psoft’ is the user account that PeopleSoft runs under, and ‘psoftgrp’ is its group name.
IMPORTANT Note:
Here I assume the following: you have only ONE PeopleSoft domain running under the “psoft” account. For multiple domains running on the same user account (which is not a good practice!), you need to follow a different approach (I will tell you later about this!). After we run this command, a new shell script called “killipc.sh” gets created; you need to run this script to remove the IPC resources that PeopleSoft uses.
If you open the “ipcrmall.sh” shell script, you can very easily see that the following three commands do the real work:
“ipcs -m” – Lists all the Shared Memory segments in the system
“ipcs -s” – Lists all the Semaphores in the system
“ipcs -q” – Lists all the Message Queues in the system
Tuxedo uses these three IPC mechanisms for inter-process communication.
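Those listings can be combined into a small cleanup sketch of the kind ipcrmall.sh performs. The helper below pulls out resource IDs owned by a given user from `ipcs`-style output; the exact column layout varies by platform, so the field numbers here are an assumption to verify on your own system:

```shell
# Print IPC resource ids owned by a given user from `ipcs`-style output.
# Assumes the id is in column 2 and the owner in column 3 (verify on your platform,
# since ipcs formatting differs between Solaris, AIX, HP-UX and Linux).
list_ipc_ids() {
  awk -v u="$1" '$3 == u { print $2 }'
}
```

Example usage on a live system (run this only when the domain is fully shut down!): `ipcs -m | list_ipc_ids psoft | while read -r id; do ipcrm -m "$id"; done`, and similarly with `-s`/`-q` for semaphores and message queues.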

Tuesday, May 29, 2007

Perils of DataMover Access – Part 1

Did you know that users connected to DataMover have database access similar to the access ID?
What does this mean for you in a production environment? This can be really scary if you do not have security in place to ensure that DataMover access is restricted and controlled. Use the SQL below to determine who has access to DataMover, and through which permission list and role.
SELECT DISTINCT A.CLASSID, B.ROLENAME, C.ROLEUSER
FROM PSAUTHITEM A, PSROLECLASS B, PSROLEUSER C, PSOPRDEFN D
WHERE A.CLASSID = B.CLASSID
AND B.ROLENAME = C.ROLENAME
AND A.MENUNAME = 'DATA_MOVER'
AND D.OPRID = C.ROLEUSER
AND D.ACCTLOCK = 0
Ensure that the above SQL does not fetch any surprises.
What can a user with access to Datamover do?
The user can, for example, create an Oracle user and grant the DBA role to that user. And this is just one example of the access that is available to that OPRID after logging on to DataMover. Basically, all access available to the access ID is now available to the OPRID.
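For illustration, a hypothetical DataMover session like the one below (the user name and password are made up) would succeed on an Oracle database, because DataMover passes such statements straight through with the access ID’s privileges:

```
-- Hypothetical illustration: these statements run with the access ID's privileges
CREATE USER eviluser IDENTIFIED BY changeme;
GRANT DBA TO eviluser;
```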
Summary:
1. Ensure that only authorized OPRIDs have access to DataMover in PeopleSoft
2. Audit changes to PSAUTHITEM
3. Control the privileges granted to the access ID. Do not go overboard and assign the DBA role to the access ID.
Next Steps:
In Part 2, I will provide some tips on auditing online security to ensure that any online changes to DataMover access are audited.
In Part 3, I will cover tips for DataMover security in non-production environment and conclude the post on ‘Perils of DataMover Access’.

Thursday, May 24, 2007

What is a Data Warehouse (DW) ?


To define the term Data Warehouse (DW), especially to software developers who are new to the industry, I have tried asking them a few simple questions before getting to the classic definition in the words of Bill Inmon. Some of the questions that lead to defining a Data Warehouse are:

Q: What is Data?
A: ‘Data’ is a collection of facts which are captured as they happen.
E.g., the content present in a Survey Sheet is ‘Data’.

Q: What is information?
A: The details that are derived by processing the ‘Data’ are called Information.
E.g., the details derived from the survey data, like totals, averages, etc., are called Information.

Q: What is a system that collects ‘Data‘ called?
A: A computer system that collects ‘Data’ is usually called an OLTP (Online Transaction Processing) system. This system is designed to collect data rapidly.
E.g., the survey data could be captured into a laptop using a software application, an ATM machine, or a core banking system for deposit/debit interactions.

Q: How is ‘Information’ derived from ‘Data’?
A: The ‘Data’ is pulled out from the OLTP system and moved to a separate data store/ system and then processed to derive Information. A computer system that acts as a platform for processing the ‘Data’ to derive ‘Information’ is called a Data warehouse.

The ‘Information’ gathered from the DW system helps an Organization gain more Knowledge about its business. This gained Knowledge helps the Organization in Decision making; hence the DW system, which supports decision making, is part of the “Decision Support System”.

Q: What are the key characteristics of a Data Warehouse?
A: A DW is designed to
1. store large quantities of data across years
2. push ‘Data’ out faster from its storage to the information processing engine

Q: Why is a Data Warehouse required?
A: The OLTP system is usually used by many people to collect (push) data from the outside world into its storage, whereas the DW system is usually used by a few people to pull data out of its storage. The volume of data lying inside a DW system is very much higher than that in an OLTP system. The purpose of each system is different, so designing separate OLTP and DW systems to cater to their unique requirements became imperative.

But this segregation between OLTP and DW has happened gradually. During the initial years, DW-related activities were mostly done on OLTP systems, and this still happens until an organization or department feels the need for a DW system.

The need for a DW system is felt due to issues related to
1. Performance
2. Maintenance
3. Data Integration
To add more variety to your thoughts on Operational BI, you can read more about Data Warehousing.


-Pandian C M

My journey – From Web Development into the world of Unix and PeopleSoft!


Long, long ago, I started my software career as a web developer who wrote Perl CGI code and ran it using Apache….. Good or bad, I have forgotten most of the coding now, except that if I see part of the code, I am able to differentiate whether it is a shell script or a Perl script. And one more skill (really?) that I developed was to tell from the extension whether a script is a shell, Perl, or Python script (.sh, .ksh, .csh, .pl, .py, sometimes .bash)…
We were using a Linux distribution at that time (I believe it was Red Hat 6.x or 7.x), which is one more reason why I always talk about Perl, shell, or Python here.. All I want to say is, I am not a developer, and only for a small portion of my life was I associated with coding… that’s all.
Believe me, I really do not know the ‘exact’ reason why I moved to Unix administration after a short ‘webo’ experience… I believe there may be reasons; one of them was my engineering background (Chemical Engineering – don’t ask me why I moved to computers, that is another good topic for research!), and I got bored with coding work. Moving to Unix full time was purely a decision I took myself, based on my personal liking.
I used HP-UX during my college days and developed a huge interest in Unix at that time. Then I got Sun Certified (both System Admin and Network Admin for the Solaris 8 Operating Environment). I tried to become Red Hat certified (and failed with a 25% score; I didn’t retry!). I did a lot of setup, configuration, etc. on Solaris. After that I became certified in HP-UX System Administration as well. Just after that, one more miracle happened: I moved to a new project as a PeopleSoft Administrator. This time it was not my decision; it was a decision, whether rational or irrational I do not know, taken based on organizational requirements (Herbert Simon might call it an “organizationally rational decision” in his decision-making model). And I still continue doing Unix, PeopleSoft, and support for a bunch of other software.
Nobody is an expert in every area; especially if you take Unix and PeopleSoft, they are like an ocean, but there is always an end to what we know. We need constant updates from the industry about the latest progress, new releases, etc. Some of the operating systems that PeopleSoft runs on are Sun Solaris OE (version 10 being the latest), IBM AIX 5L, HP-UX 11i, and Microsoft’s OSs. I have worked mostly on Sun Solaris, IBM AIX, and HP-UX during my career so far, along with the PeopleSoft infrastructure involved, so most of the experiences I am going to write about here will relate to this cluster of software.
If you provide PeopleSoft infrastructure support and want to be an expert, there are obviously two non-PeopleSoft components that you should know well: the operating system and the DBMS. These are like two eyes for anyone who provides infrastructure or architectural support for a PeopleSoft environment. In most of my posts I will talk about Unix and PeopleSoft (I know only about these). I will write more on this in the coming days on this blog. Hope you will enjoy reading about them. :)

Tuesday, May 22, 2007

Is this the right choice???

When I began writing this blog, I wondered why I even thought of broaching the topic of change in the aviation sector. I am not an expert in this area, so to say. But I am not a novice either. Around 4 years in this area has made me realize that this is a domain incorporating change on a regular basis. Change is inevitable and has a deep influence on a domain that has been technology driven for ages. But the important factor is that when change is ‘properly guided’, it can yield excellent results in every aspect of this area.
Airlines today are not just about size, stature, or time span of operation, but more about adaptability to change. For ages, it was an industry dominated by IBM (mostly) and Unisys machines. No longer seeing these as feasible to continue with, the industry is looking for alternatives in the ‘Open Systems’ domain. There is no doubt that this will take time to stabilize, but there is no denying the eventuality taking place some time soon in the future.
So what do we (neither experts nor novices) do in such a scenario? My personal take on this is that we let ourselves amass as much knowledge as we can. The reason is that whatever the technology used, the industry would not give up its current practices/functions entirely. It might keep changing them from time to time to suit its needs. It takes months to build a software product, but years to build practices and turn them into standards. We need to understand that domain expertise is something that takes time to grasp and is an added boon. To add to it should be our ability to translate it into the technology desired by the end user.
It may not be as easy as this all the time. But change is what one needs to keep looking out for. For instance, today airlines are not just interested in plain reservations or check-ins. They are also interested in knowing where they are going wrong, what they are doing right, and where they can improve. And this is what we can also focus on. Data Warehousing, for instance, is today a necessity for airlines rather than a liability. Web-based services that enhance passenger–airline associations are of considerable importance too.
These are just a few examples that I am citing, but the crux of it is still that people who have a broader picture in mind are ready to look at the old system in a new way. It would be foolhardy to ignore the changing trends. But it cannot be achieved in a day either. It is for each individual to analyze his/her own personal strengths in the domain (apart from programming). The reason is that every one of us has a different way of interpreting information. That is what separates an analyst from a programmer. And to understand this subtle difference, one needs to look not just at the current generation of work but at what can take its role in the future. This is what I personally believe is the right recipe to bring about the guided change.