Monday, June 2, 2008

Data Integration Challenge – Parent-Child Record Sets, Child Updates

Some records come in special sets, such as a Loan and its Guarantor details in a banking system, where each Loan record can have one or more Guarantor records. Similarly, in a services-based industry, Contracts and their Contract Components exist. These sets can be called parent-child records: for one parent record, such as a Loan, we might have zero to many child records, such as Guarantors.
During data modeling we would have one table for the parent-level records and their attributes, and a separate table for the child records and their attributes.
As part of the data load process, we have seen situations where a complete refresh (delete and insert) of the child records is required whenever certain attributes of a parent record change. This requirement can be implemented in different ways; here we look at one approach that has worked well for us.
The following steps are involved in the ETL process:
  1. Read the parent-child record
  2. Determine whether the incoming parent record has changed
  3. If a change has occurred, issue a delete for that parent's set of child records
  4. Write the corresponding incoming new child records to a flat file
  5. Once steps 1 to 4 are completed for all parent records, have another ETL flow bulk load the records from the flat file into the child table
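The steps above can be sketched roughly as follows. This is only an illustration, not the flow as built in any particular ETL tool: it assumes a SQLite database, hypothetical `loan` and `guarantor` tables, a CSV staging file, and that a "change" means a difference in the hypothetical `amount` or `status` columns. It also brings the parent row up to date, which the steps above do not spell out.

```python
import csv
import os
import sqlite3
import tempfile


def refresh_children(conn, incoming, staging_path):
    """Steps 1-4: detect changed parents, delete their children, stage new ones."""
    changed = 0
    with open(staging_path, "w", newline="") as f:
        writer = csv.writer(f)
        for parent in incoming:                      # step 1: read parent-child record
            key = parent["loan_id"]
            current = conn.execute(
                "SELECT amount, status FROM loan WHERE loan_id = ?", (key,)
            ).fetchone()
            # step 2: unchanged parents are skipped (a new parent counts as changed)
            if current == (parent["amount"], parent["status"]):
                continue
            # step 3: delete the whole child set of this parent, no commit yet
            conn.execute("DELETE FROM guarantor WHERE loan_id = ?", (key,))
            # assumption: the parent row itself is refreshed as well
            conn.execute(
                "INSERT OR REPLACE INTO loan (loan_id, amount, status) VALUES (?, ?, ?)",
                (key, parent["amount"], parent["status"]),
            )
            # step 4: stage the new children in the flat file instead of inserting
            for name in parent["guarantors"]:
                writer.writerow([key, name])
            changed += 1
    conn.commit()                                    # one commit for all deletes
    return changed


def bulk_load(conn, staging_path):
    """Step 5: a separate flow loads the staged child records in one batch."""
    with open(staging_path, newline="") as f:
        rows = [(int(loan_id), name) for loan_id, name in csv.reader(f)]
    conn.executemany("INSERT INTO guarantor (loan_id, name) VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE loan (loan_id INTEGER PRIMARY KEY, amount REAL, status TEXT);
        CREATE TABLE guarantor (loan_id INTEGER, name TEXT);
        INSERT INTO loan VALUES (1, 1000, 'OPEN'), (2, 500, 'OPEN');
        INSERT INTO guarantor VALUES (1, 'Ann'), (1, 'Bob'), (2, 'Cy');
    """)
    incoming = [
        {"loan_id": 1, "amount": 1200, "status": "OPEN", "guarantors": ["Ann", "Dee"]},
        {"loan_id": 2, "amount": 500, "status": "OPEN", "guarantors": ["Cy"]},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "guarantor_stage.csv")
        refresh_children(conn, incoming, path)
        bulk_load(conn, path)
    return conn.execute(
        "SELECT loan_id, name FROM guarantor ORDER BY loan_id, name"
    ).fetchall()
```

In `demo()`, only loan 1 has changed, so only its guarantors are deleted and reloaded from the staging file; loan 2 and its child record are left untouched.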
We didn't insert the new incoming child records immediately after the delete because the delete would not yet have been committed, and an insert at that point can lock the table. We could issue a commit after every delete and then follow it with the insert, but a commit after each delete would be costlier; writing the inserts to a file handles this situation cleanly.
Another option is to insert first with a different key and then delete the older records, but this would be costlier in terms of locating the records that need to be deleted.
We could also have updated the records in place instead of deleting them, but then we would at times end up with dead records in the child table: records that have been deleted in the source would still exist in the target child table. Updating a record in place can also disturb contiguous storage, whereas a delete followed by an insert keeps the pages intact.
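The dead-record problem can be shown with a small sketch. It assumes SQLite and a hypothetical `guarantor` table keyed on `(loan_id, name)` with an illustrative `share` column; the function names are made up for this example.

```python
import sqlite3


def make_db():
    """Target child table holding two guarantors for loan 1."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE guarantor ("
        "loan_id INTEGER, name TEXT, share REAL, PRIMARY KEY (loan_id, name))"
    )
    conn.executemany(
        "INSERT INTO guarantor VALUES (?, ?, ?)",
        [(1, "Ann", 50.0), (1, "Bob", 50.0)],
    )
    return conn


def update_in_place(conn, loan_id, incoming):
    """Update or insert each incoming child by key; nothing is ever removed."""
    conn.executemany(
        "INSERT OR REPLACE INTO guarantor VALUES (?, ?, ?)",
        [(loan_id, name, share) for name, share in incoming],
    )


def delete_and_insert(conn, loan_id, incoming):
    """Full refresh: drop the parent's child set, then load the incoming set."""
    conn.execute("DELETE FROM guarantor WHERE loan_id = ?", (loan_id,))
    conn.executemany(
        "INSERT INTO guarantor VALUES (?, ?, ?)",
        [(loan_id, name, share) for name, share in incoming],
    )


def names(conn):
    return [row[0] for row in conn.execute("SELECT name FROM guarantor ORDER BY name")]
```

If the source now sends only `[("Ann", 100.0)]` for loan 1, because Bob was removed there, `update_in_place` leaves both Ann and Bob in the target, while `delete_and_insert` leaves only Ann.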