Geeks With Blogs

News Google

Nick Harrison Blog<Nick>.Next()

Thoughts on a Data Dictionary

I have often thought about what would make a good data dictionary.   Usually I have lamented the absence of any meaningful data dictionary beyond what the DBMS itself keeps on its own.   If everyone would simply add comments everywhere that the database natively allows, we would all be oh so much better off.   But alas we don’t.

As is so often the case, if we don’t have to do something, we probably won’t.    Often times, it’s a matter of meeting deadlines, sometimes it’s a matter of shear laziness, but sometimes, it may just be a case of not understanding the need or not knowing what is intrinsically available.

Let’s tackle the need first…

Database objects rarely have meaningful names.   Database objects with meaningful names to one group may not be meaningful to another group.   When you are trying to learn a new data model, the history of this data model is useful to understand the conventions adopted or ignored.

In its simplest form, I have several goals for a data dictionary:

·        Define the naming conventions followed
·        Identify known exceptions to these conventions
·        Provide an English description for each database object (table, column, stored proc, trigger, etc)
·        Document the accepted data types and their purposes
·        Identify known exceptions to the data type rules
In Oracle or SQL Server, this can all be accomplished with a relatively small word document and the native data dictionary.

I think though that to actually enforce these conventions, we need a little extra support:

·        A Data Thesaurus
·        A rule validator similar to fxCop
The data thesaurus would be used to ensure that appropriate naming conventions are used.   Proposed object names would be split on a case change and run through the thesaurus.   If the individual words are found in main line entries, you are good to go.   If an individual word is found as an alternate, replace it with the main line entry and proceed.   If an individual word is not found in the thesaurus, flag it for review.

Such a process would eliminate such confusing names as:

·        ProcessDate
·        ProcessDte
·        Process_Date
The first name would work as entered.   The second one would be converted to the first one, and the third entry would be rejected since Process_ was not found in the thesaurus.   The key thing here is that it tracks and enforces the convention at the time of object creation not at the end of the process.

The above is one rule based on the data thesaurus.   Other rules that may be useful might include:

·        Insure that object names are not plural.   Each record in the “Members” table is a Member
·        Insure that each table name is a noun
·        Ensure that every column is either a noun or an adjective.   Columns should not be verbs
·        Ensure that every table has an associated comment
·        Ensure that every column has an associated comment
·        Ensure that every stored procedure has an associated comment
·        Ensure that stored procedures pass the thesaurus test
·        Ensure that stored procedures have the form ActionObject, such as InsertMember or SelectMembersByGroupID

·        Ensure that stored procedures have the form NounAction such as MemberInsert or MemberSelectByGroupID



What are your thoughts on a data dictionary?   Has anyone incorporated similar ideas into one?   How did it work out?   Have you thought about and decided against?

Posted on Thursday, October 16, 2008 8:21 PM | Back to top

Comments on this post: Thoughts on a Data Dictionary

# re: Thoughts on a Data Dictionary
Requesting Gravatar...
usually you do not get this period. you either infer it on the job or have to justify it (which will FAIL due to COST for the MAN HOURS necessary).

what I usually see is that you are asking for something they've already tried and know it gets out of date rather quickly.
Left by Rod on Oct 27, 2008 3:41 PM

comments powered by Disqus

Copyright © Nick Harrison | Powered by: