An Abstract Data Model
This is post 3 in a 7-part series entitled Technical Achievements in my Last Project.

Overview

Normally, when I build a new system, I design the new data model based on the requirements, and build my business objects and data access based primarily on that data model. The remainder of the application is built on the components beneath it, so when you change something at the bottom, like the data model, changes ripple throughout the application. The data model serves as the foundation of my application.

Now, as far as this project goes, one of the important requirements was to deliver the new system incrementally, while leaving the older system to run in parallel until it was completely replaced.

Parallel Data Models

This presented a bit of a dilemma for me, since the current database was … well … lacking, and I was planning to refactor it enough to make it a very unstable foundation for the old system. I wanted to refactor it for a number of reasons, including: missing primary keys, no foreign keys, no constraints, data fields which were required but not there, data fields which were there but not used, data fields containing 2 or more pieces of information, and tables which should have been multiple tables. Not to mention the desire to achieve a consistent naming convention without the insane column names using characters like ‘/’ and ‘?’ … seriously.

However, the parallel systems requirement caused a bit of a dilemma: how do you manage parallel systems when one needs a stable foundation and the other is so temperamental that you don’t want to touch it? My options, as I saw them, were something like:

1. Scrap the data model refactoring. This really didn’t get much thought. Well, it did, but the thought was: is this the best route for the client? And if so, should I offer to help them find my replacement, or just leave? I definitely wasn’t up for replacing one unmaintainable piece of junk with another.

2. New data model and refactor the existing app. The existing application was a total nightmare built in classic Access spaghetti-code fashion. Just touching that looked like going down a rabbit hole of certain doom.

3. New application on the old data model and refactor the data model later. This would have caused a real disconnect between the data model and the application. I’m not sure if the data model and application ever…
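To make the kind of cleanup concrete, here is a purely illustrative T-SQL sketch; the table, column, and constraint names are made up for the example and are not from the actual system: rename an awkwardly named column, add a missing primary key, and add a missing foreign key.

-- Illustrative only: hypothetical table/column names, not the real schema.
-- Rename a column whose name contains awkward characters.
EXEC sp_rename 'dbo.Orders.[Ship/To?]', 'ShipTo', 'COLUMN';

-- Add the missing primary key.
ALTER TABLE dbo.Orders
    ADD CONSTRAINT PK_Orders PRIMARY KEY (OrderId);

-- Add a missing foreign key.
ALTER TABLE dbo.Orders
    ADD CONSTRAINT FK_Orders_Customers
    FOREIGN KEY (CustomerId) REFERENCES dbo.Customers (CustomerId);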
10 things to review after finishing your data model

When I finish modeling my database, I like to just let it sit a couple of days, then spend some time reviewing it to check for inconsistencies. The kind of inconsistencies you never notice when you’re up to your eyeballs in details, but that drive you up the wall after 5 years of maintenance. When I finished my database design a few days ago, I jotted a few things down to remember to do. But when I transcribed them into my bug tracker, where I manage all my tasks, 3 things turned into 5, then 7, and I realized that if I could just add a few more, I’d have an infamous ‘Top 10’ list. Anyway, here it is. It’s not exactly comprehensive, but it’s a start.

10 things to review after finishing your data model:

1. Naming consistencies
2. Column default consistencies
3. Identity / auto-number technique exists
4. Constraint consistencies
5. Foreign Key relationships exist
6. Indexes on Foreign Key columns (see the sketch after this list)
7. Abbreviations are consistent
8. Abbreviations are documented
9. Data type and size consistencies. (For example: TableA.UserName is nvarchar(N), so TableB.UserName should be nvarchar(N) as well, not nvarchar(N±X).)
10. Review all requirements again to reconfirm everything was…
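As a quick aid for item 6, here is a rough sketch (assuming SQL Server and its catalog views; this query is not part of the original list) that finds foreign key columns with no index keyed on them. It doesn’t check whether the column leads the index, but it catches the obvious gaps.

-- Rough check: foreign key columns that are not a key column of any index.
select
    object_name(fkc.parent_object_id) as table_name,
    col_name(fkc.parent_object_id, fkc.parent_column_id) as column_name
from sys.foreign_key_columns as fkc
where not exists (
    select 1
    from sys.index_columns as ic
    where ic.object_id = fkc.parent_object_id
      and ic.column_id = fkc.parent_column_id
      and ic.key_ordinal > 0
);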
How to enforce a foreign key constraint against multiple tables

I am building a web app with Ben Alabaster, and one of the requirements is for the user to be able to flag items for moderators. So the user can flag entity A, entity B, entity C, etc… So I created a single flag table, which I then tried to tie to the entity tables, hoping for a setup where all the foreign key relationships went from [flag].[entity_id] to [EntityX].[id].

Then, when I wanted the most-flagged records for a particular entity (B in this case), I could run a query like:

select top 20 e.[name], count(*) "count"
from entityB as e
left join flag as f on f.entity_id = e.id
where f.entity_type = 'B'
group by e.[name]
order by count(*) desc

Unfortunately, if you were to create the above table relationships and run the following inserts:

insert into EntityA (id, name) values (1, 'EntityA');
insert into EntityB (id, name) values (2, 'EntityB');
insert into EntityC (id, name) values (3, 'EntityC');
insert into EntityD (id, name) values (4, 'EntityD');

then the following statement

insert into flag (entity_id, flag_reason) values (5, 'Testing without a valid FK value.');

would fail, as expected, with the following error:

“The INSERT statement conflicted with the FOREIGN KEY constraint “FK_flag_EntityA”. The conflict occurred in database “test”, table “dbo.EntityA”, column ‘id’.”

But

insert into flag (entity_id, flag_reason) values (1, 'Testing the FK to entity A.');

would also fail, which was undesired, with the following error:

“The INSERT statement conflicted with the FOREIGN KEY constraint “FK_flag_EntityB”. The conflict occurred in database “test”, table “dbo.EntityB”, column ‘id’.”

So, my options with regard to referential integrity are:

1. Ditch the referential integrity, which I am vehemently opposed to.
2. Create multiple flag tables, each with the exact same schema but a different foreign key relationship, which just seems wrong.
3. Manage referential integrity via triggers.

While I’m not a big fan of triggers, the ‘manage referential integrity via triggers’ option seems like the only tolerable one. So I added the [entity_type] column to my flag table, removed the relationships, and wrote the following trigger to manage the foreign key relationship.

-- =============================================
-- Description: maintain referential integrity on
-- a column which is a FK for different tables
-- =============================================
CREATE TRIGGER flag_entity_id_fk ON flag
AFTER INSERT, UPDATE
AS
BEGIN
    declare @entity_type char(1);
    declare @entity_id int;
    declare @cnt int;

    -- SET NOCOUNT ON added to prevent extra result sets from
    -- interfering with SELECT statements.
    SET NOCOUNT ON;

    -- get info
    select @entity_type = entity_type, @entity_id = entity_id, …
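The trigger above is cut off, but to give a rough idea of where it is headed, here is a minimal sketch of one way such a check could look. The table names follow the EntityA through EntityD example above; unlike the single-row variables in the snippet, this version is set-based over the inserted pseudo-table, so it also covers multi-row inserts. It is a sketch, not the post’s actual implementation.

-- Rough, illustrative sketch only (not the full trigger from above):
-- enforce that flag.entity_id points at an existing row in the table
-- named by flag.entity_type.
CREATE TRIGGER flag_entity_id_fk_sketch ON flag
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    IF EXISTS (
        SELECT 1
        FROM inserted AS i
        WHERE (i.entity_type = 'A' AND NOT EXISTS (SELECT 1 FROM EntityA WHERE id = i.entity_id))
           OR (i.entity_type = 'B' AND NOT EXISTS (SELECT 1 FROM EntityB WHERE id = i.entity_id))
           OR (i.entity_type = 'C' AND NOT EXISTS (SELECT 1 FROM EntityC WHERE id = i.entity_id))
           OR (i.entity_type = 'D' AND NOT EXISTS (SELECT 1 FROM EntityD WHERE id = i.entity_id))
    )
    BEGIN
        RAISERROR('flag.entity_id does not reference an existing row for its entity_type.', 16, 1);
        ROLLBACK TRANSACTION;
    END
END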
How To Get The Most Frequently Used Column Values

Whenever I import external data, integrate with another database, or am new to a project, I need to get familiar with the database. The table schemas, relational integrity, and constraints are the first things I look at, and they take me a long way, but soon I need to know what the data looks like.
In an ideal world, relational integrity and database constraints would control this, and all I’d really need to do is look at those. But the reality is, in 15 years of working in this industry, most of the databases I’ve worked on that I didn’t design have barely used constraints, and some haven’t even used relational integrity fully!
The need to get a good feel for the data is even greater when working with dirty data, or when refactoring poorly written applications, to ensure any refactoring doesn’t introduce other issues. I will usually wind up writing the following query repeatedly:
….
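The query itself is cut off above, but the general shape is a simple frequency count. As a stand-in sketch (hypothetical table and column names, not the post’s actual query), it lists the most frequently used values in a column:

-- Sketch only: count how often each value appears in a column.
select top 20
    SomeColumn,
    count(*) as value_count
from dbo.SomeTable
group by SomeColumn
order by count(*) desc;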