Sunday, May 26, 2013

Data Governance as a part of the SDLC | LinkedIn Group: Data Governance & Stewardship

I agree with most of what has already been said in preceding comments:
Standardization of the SDLC and of the related artifacts such as data models, process models and data flow diagrams definitely contributes to transparency, which is a basic demand of any Data Governance endeavor, regardless of industry-specific compliance requirements.

However, Data Governance primarily demands traceability of the production data itself, i.e. transparent data lineage is a major prerequisite so that consuming applications and users can judge the reliability and trustworthiness of the data.

Consequently, the SDLC of applications that create, update or delete governance-sensitive data will need to include logbook tables in the application data models and subsequently in the application databases. Such logbook tables record each modification event of an application database row (and possibly even of an individual column of that row). They comprise e.g. the following columns (and their related trigger functions):
  • Timestamp
  • Actor (e.g. staff member, batch process, third-party)
  • Physical source (e.g. third-party self-service (Web) application form, postal code verification from external reference, MDM hub, migrated database, merger / acquisition database)
  • Status (e.g. active, inactive because customer passed away, inactive as being a duplicate entry)
  • Quality indicator, i.e. flagged if incomplete and/or incorrect (NOT NULL columns empty, or filled with semantically incorrect values or meaningless defaults); flagged if referential integrity is violated (e.g. not every customer has an address).
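To make the idea more tangible, the logbook columns listed above could be sketched roughly as follows, using Python's built-in sqlite3 module. All table, column and trigger names (customer, customer_logbook, session_context and so on) are my own assumptions for illustration, as is the single-row context table through which the application tells the triggers who the actor is and where the data comes from. A real DBMS would more likely use session variables for that, and a delete trigger would work analogously with the OLD row.

```python
import sqlite3

# Hypothetical schema: every name below is illustrative, not a standard.
SCHEMA = """
CREATE TABLE customer (
    id          INTEGER PRIMARY KEY,
    name        TEXT,
    postal_code TEXT,
    status      TEXT NOT NULL DEFAULT 'active'
);

-- Single-row table the application fills in before each modification,
-- so that the triggers know the actor and the physical source.
CREATE TABLE session_context (
    id              INTEGER PRIMARY KEY CHECK (id = 1),
    actor           TEXT,
    physical_source TEXT
);
INSERT INTO session_context (id, actor, physical_source)
VALUES (1, 'unknown', 'unknown');

CREATE TABLE customer_logbook (
    log_id          INTEGER PRIMARY KEY AUTOINCREMENT,
    customer_id     INTEGER NOT NULL,
    event           TEXT NOT NULL,
    logged_at       TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    actor           TEXT,
    physical_source TEXT,
    status          TEXT,
    quality_flag    TEXT
);

CREATE TRIGGER customer_after_insert AFTER INSERT ON customer
BEGIN
    INSERT INTO customer_logbook
        (customer_id, event, actor, physical_source, status, quality_flag)
    SELECT NEW.id, 'INSERT', actor, physical_source, NEW.status,
           CASE WHEN NEW.name IS NULL OR NEW.postal_code IS NULL
                     OR NEW.postal_code = ''
                THEN 'incomplete' ELSE 'ok' END
    FROM session_context WHERE id = 1;
END;

CREATE TRIGGER customer_after_update AFTER UPDATE ON customer
BEGIN
    INSERT INTO customer_logbook
        (customer_id, event, actor, physical_source, status, quality_flag)
    SELECT NEW.id, 'UPDATE', actor, physical_source, NEW.status,
           CASE WHEN NEW.name IS NULL OR NEW.postal_code IS NULL
                     OR NEW.postal_code = ''
                THEN 'incomplete' ELSE 'ok' END
    FROM session_context WHERE id = 1;
END;
"""


def open_db():
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def set_context(conn, actor, physical_source):
    """Record who is about to modify data and where the data comes from."""
    conn.execute(
        "UPDATE session_context SET actor = ?, physical_source = ? WHERE id = 1",
        (actor, physical_source),
    )


def demo():
    """Insert an incomplete row, then complete it; return the logbook."""
    conn = open_db()
    set_context(conn, "web_user_42", "self-service web form")
    conn.execute(
        "INSERT INTO customer (id, name, postal_code) VALUES (1, 'Ada', NULL)"
    )
    set_context(conn, "batch_postal_check", "external postal code reference")
    conn.execute("UPDATE customer SET postal_code = '10115' WHERE id = 1")
    return conn.execute(
        "SELECT event, actor, quality_flag FROM customer_logbook ORDER BY log_id"
    ).fetchall()
```

Calling demo() leaves two logbook entries for the same customer row: the initial insert flagged as incomplete, and the later update by the batch process flagged as ok. The quality rule here only checks for empty columns; semantic checks and referential integrity checks would need additional logic.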
Data extraction mechanisms for data warehouse / BI purposes will need to be able to filter data based on its logbook information (and to trace the data lineage backwards) to make sure that only reliable data contributes to a decision process (or that the user is warned accordingly about the related risk).
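A minimal sketch of such an extraction step, again with Python's sqlite3 module, might look like the following. The schema, the sample rows and the reliability rule (latest logbook entry has status 'active' and quality flag 'ok') are assumptions of mine for illustration only; an actual rule set would come out of the governance policy.

```python
import sqlite3


def build_sample():
    """Create a tiny, made-up customer table with logbook entries."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE customer_logbook (
            log_id          INTEGER PRIMARY KEY,
            customer_id     INTEGER NOT NULL,
            actor           TEXT,
            physical_source TEXT,
            status          TEXT,
            quality_flag    TEXT
        );
        INSERT INTO customer VALUES (1, 'Ada'), (2, 'Bob'), (3, 'Cy');
        INSERT INTO customer_logbook VALUES
            (1, 1, 'web_user',      'web form',          'active',             'incomplete'),
            (2, 1, 'batch_check',   'postal reference',  'active',             'ok'),
            (3, 2, 'migration_job', 'migrated database', 'active',             'incomplete'),
            (4, 3, 'clerk_7',       'MDM hub',           'inactive_duplicate', 'ok');
    """)
    return conn


# Join each customer with its most recent logbook entry and derive a
# reliability flag from that entry (assumed rule, see lead-in).
EXTRACT_SQL = """
SELECT c.id, c.name,
       (l.status = 'active' AND l.quality_flag = 'ok') AS reliable
FROM customer AS c
JOIN customer_logbook AS l ON l.customer_id = c.id
WHERE l.log_id = (SELECT MAX(l2.log_id)
                  FROM customer_logbook AS l2
                  WHERE l2.customer_id = c.id)
ORDER BY c.id
"""


def extract(conn, include_unreliable=False):
    """Return (id, name, reliable) tuples for the warehouse load.

    By default only reliable rows are returned; with include_unreliable=True
    all rows are returned so that the consumer can be warned about the risk.
    """
    rows = [(cid, name, bool(flag)) for cid, name, flag in conn.execute(EXTRACT_SQL)]
    if include_unreliable:
        return rows
    return [row for row in rows if row[2]]


def lineage(conn, customer_id):
    """Trace a row's history backwards, newest logbook entry first."""
    return conn.execute(
        "SELECT log_id, actor, physical_source, quality_flag "
        "FROM customer_logbook WHERE customer_id = ? ORDER BY log_id DESC",
        (customer_id,),
    ).fetchall()
```

With this sample data, extract(conn) passes only the first customer on to the warehouse, while extract(conn, include_unreliable=True) returns all three with a flag that a report could use to display a warning. lineage(conn, 1) shows that the row started out as an incomplete web form entry and was later completed by a batch check.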