What are the pros and cons of Anchor Modeling?
I am currently trying to create a database where a very large percentage of the data is temporal. After reading through many techniques for doing this (most involving 6nf normalization) I ran into Anchor Modeling.
The schema that I was developing strongly resembled the Anchor Modeling model, especially since the use case (Temporal Data + Known Unknowns) is so similar, that I am tempted to embrace it fully.
The two biggest problem I am having is that I can find nothing detailing the negatives of this approach, and I cannot find any references to organizations that have used it in production for war-stories and gotchas that I need to be aware of.
I am wondering if anyone here is familiar enough with to briefly expound on some of the negatives (since the positives are very well advertized in research papers and their site), and any experiences with using it in a production environment.
In reference to the anchormodeling.com
Here are a few points I am aware of
The number of DB-objects is simply too large to maintain manually, so make sure that you use designer all the time to evolve the schema.
Currently, designer supports fully MS SQL Server, so if you have to port code all the time, you may want to wait until your target DB is fully supported. I know it has Oracle in dropdown box, but ...
Do not expect (nor demand) your developers to understand it, they have to access the model via 5NF views -- which is good. The thing is that tables are loaded via (instead-of-) triggers on views, which may (or may not) be a performance issue.
Expect that you may need to write some extra maintenance procedures (for each temporal attribute) which are not auto-generated (yet). For example, I often need a prune procedure for temporal attributes -- to delete same-value-records for the same ID on two consecutive time-events.
Generated views and queries-over-views resolve nicely, and so will probably anything that you write in the future. However, "other people" will be writing queries on views-over-views-over-views -- which does not always resolve nicely. So expect that you may need to police queries more than usual.
Having sad all that, I have recently used the approach to refactor a section of my warehouse, and it worked like a charm. Admittedly, warehouse does not have most of the problems outlined here.
I would suggest that it is imperative to create a demo-system and test, test, test ..., especially point No 3 -- loading via triggers.
With respect to point number 4 above. Restatement control is almost finished, such that you will be able to prevent two consecutive identical values over time.
And a general comment, joins are not necessarily a bad thing. Read: Why joins are a good thing.
One of the great benefits of 6NF in Anchor Modeling is non-destructive schema evolution. In other words, every previous version of the database model is available as a subset in the current model. Also, since changes are represented by extensions in the schema (new tables), upgrading a database is almost instantanous and can safely be done online (even in a production environment). This benefit would be lost in 5NF.
I haven't read any papers on it, but since it's based on 6NF, I'd expect it to suffer from whatever problems follow 6NF.
6NF requires each table consist of a candidate key and no more than one non-key column. So, in the worst case, you'll need nine joins to produce a 10-column result set. But you can also design a database that uses, say, 200 tables that are in 5NF, 30 that are in BCNF, and only 5 that are in 6NF. (I think that would no longer be Anchor Modeling per se, which seems to put all tables in 6NF, but I could be wrong about that.)
The Mythical Man-Month is still relevant here.
The management question, therefore, is not whether to build a pilot system and throw it away. You will do that. The only question is whether to plan in advance to build a throwaway, or to promise to deliver the throwaway to customers.
Fred Brooks, Jr., in The Mythical Man-Month, p 116.
How cheaply can you build a prototype to test your expected worst case?
In this post I will present a large part of the real business that belong to databases. Database's solutions in this big business area can not be solved by using „Anchor modeling" , at all. In the real business world this case is happening on a daily basis. That is the case when data entry person, enters a wrong data.
In real-world business, errors happen frequently at data entry level. It often happens that data entry generates large amounts of erroneous data. So this is a real and big problem. "Anchor modeling" can not solve this problem.
Anyone who uses the "Anchor Modeling" database can enter incorrect data. This is possible because the authors of "Anchor modeling" have written that the erroneous data can be deleted.
Let me explain this problem by the following example: A profesor of mathematics gave the best grade to the student who had the worst grade. In this high school, professors enter grades in the corressponding database. This student gave money to the professor for this criminal service. The student managed to enroll at the university using this false grade. After a summer holiday, the professor of mathematics returned to school. After deleting the wrong grade from the database, the professor entered the correct one in the database. In this school they use "Anchor Modeling" db. So the math profesor deleted false data as it is strictly suggested by authors of "Anchor modeling". Now, this professor of mathematics who did this criminal act is clean, thanks to the software "Anchor modeling".
This example says that using "Anchor Modeling," you can do crime with data just by applying „Anchor modeling technology“
In section 5.4 the authors of „Anchor modeling“ wrote the following: „Delete statements are allowed only when applied to remove erroneous data.“ . You can see this text at the paper „ An agile modeling technique using sixth normal form for structurally evolving data“ written by authors of „Anchor modeling“.
Please note that „Anchor modeling“ was presented at the 28th International Conference on Conceptual Modeling and won the best paper award?!
Authors of "Anchor Modeling" claim that their data model can maintain a history! However this example shoes that „Anchor modeling“ can not maintain the history at all.
As „Anchor modeling“ allows deletion of data, then "Anchor modeling" has all the operations with the data, that is: adding new data, deleting data and update. Update can be obtained by using two operations: first delete the data, then add new data.
This further means that Anchor modeling has no history, because it has data deletion and data update.
I would like to point out that in "Anchor modeling" each erroneous data "MUST" be deleted. In the "Anchor modeling" it is not possible to keep erroneous data and corrected data.
"Anchor modeling" can not maintain history of erroneous data.
In the first part of this post, I showed that by using "Anchor Modeling" anyone can do crime with data. This means "Anchor Modeling" runs the business of a company, right into a disaster.
I will give one example so that professionals can see on real and important example, how bad "anchor modeling" is.
Example People who are professionals in the business of databases, know that there are thousands and thousands of international standards, which have been used successfully in databases as keys.
International standards: All professionals know what is "VIN" for cars, "ISBN" for books, and thousands of other international standards.
National standards: All countries have their own standards for passports, personal documents, bank cards, bar codes, etc
Local standards: Many companies have their own standards. For example, when you pay something, you have an invoice with a standard key written and that key is written in the database, also.
All the above mentioned type of keys from this example can be checked by using a variety of institutions, police, customs, banks credit card, post office, etc. You can check many of these "keys" on the internet or by using a phone.
I believe that percent of these databases, which have entities with standard keys, and which I have presented in this example, is more than 95%.
For all the above cases the "anchor surrogate key" is nonsense. "Anchor modeling" exclusively uses "anchor-surrogate key"
In my solution, I use all the keys that are standard on a global or local level and are simple.