Tuesday, June 21, 2005
Natural vs Synthetic keys
Primary keys should be:
1. Unique
2. Never changed (or very infrequently)
This is because they are used in other tables to reference a single row.
(Definitions from www.orafaq.com)
Natural Key: A key made from existing attributes.
Surrogate Key: A system generated key with no business value. Usually implemented with database generated sequences.
This topic has been discussed at great length many times. We recently re-hashed it on the Dizwell Forum.
Disadvantages:
EDIT: Please note: There are some potentially important corrections and clarifications to the below. Please see Howard's comments and follow the links to his blog or the forum discussion.
Natural keys:
- May require the concatentation of many columns, so it gets very long
- Sometimes a natural key can be hard to find (in example: names can change)
- In a sense, you are duplicating data
- Can lead to future conflicts when the database expands beyond original requirements
- Care must be taken to avoid block-splitting
- Care must be taken to avoid index range scans
- Tend to change, albeit infrequently
"You nearly always end up having to append a number to guarantee both uniqueness and existence. If I always have to append a guaranteed unique existing number to ensure I meet my requirements, why not make it the key itself."
- Niall Litchfield
Synthetic/Surrogate keys:
- Meaningless by itself (ie. no relationship to the data to which it is assigned)
- Implementation is database-dependent
- Care must be taken to avoid contention (monotically increasing and smaller indexes)
- Similar rows would not have similar keys
Gauss suggested considering using a GUID for the surrogate key. No trigger is required, uniqueness is guaranteed over space and time, but it is 16 bytes.
1. Unique
2. Never changed (or very infrequently)
This is because they are used in other tables to reference a single row.
(Definitions from www.orafaq.com)
Natural Key: A key made from existing attributes.
Surrogate Key: A system generated key with no business value. Usually implemented with database generated sequences.
This topic has been discussed at great length many times. We recently re-hashed it on the Dizwell Forum.
Disadvantages:
EDIT: Please note: There are some potentially important corrections and clarifications to the below. Please see Howard's comments and follow the links to his blog or the forum discussion.
Natural keys:
- May require the concatentation of many columns, so it gets very long
- Sometimes a natural key can be hard to find (in example: names can change)
- In a sense, you are duplicating data
- Can lead to future conflicts when the database expands beyond original requirements
- Care must be taken to avoid block-splitting
- Care must be taken to avoid index range scans
- Tend to change, albeit infrequently
"You nearly always end up having to append a number to guarantee both uniqueness and existence. If I always have to append a guaranteed unique existing number to ensure I meet my requirements, why not make it the key itself."
- Niall Litchfield
Synthetic/Surrogate keys:
- Meaningless by itself (ie. no relationship to the data to which it is assigned)
- Implementation is database-dependent
- Care must be taken to avoid contention (monotically increasing and smaller indexes)
- Similar rows would not have similar keys
Gauss suggested considering using a GUID for the surrogate key. No trigger is required, uniqueness is guaranteed over space and time, but it is 16 bytes.
SQL> CREATE TABLE atable (pk RAW(16) DEFAULT SYS_GUID() PRIMARY KEY,
2 a_str VARCHAR2(32));
Table created.
SQL> INSERT INTO atable (a_str) VALUES ('One');
1 row created.
SQL> INSERT INTO atable (a_str) VALUES ('Two');
1 row created.
SQL> SELECT * FROM atable;
PK A_STR
-------------------------------- --------------------------------
5C3BCF77D55B41E78DE4016DFBE25FFA One
D3B959F3010745D3854F8FC2B09A18F3 Two
Comments:
<< Home
Howard Rogers responded to this discussion on his blog, mostly to assert that there is legitimacy to the "natural keys" opinion (some people felt there wasn't).
http://www.dizwell.com/2005/06/natural-or-synthetic-primary-keys.html
http://www.dizwell.com/2005/06/natural-or-synthetic-primary-keys.html
Thanks for the corrections, Howard, Your clarifications are very welcome.
Rather than try to explain them myself, I will add a note to the blog pointing people to your comments, and your related blog (there is already a pointer to the forum discussion).
There is already a summation of the disadvantages of synthetic/surrogate keys immediately following Niall's quote. You probably got too hot to read that far. :)
This is what I meant by duplication of data:
Last_name: Gretzky
First_name: Wayne
DOB: 26-Jan-61
ID: Gretzky_Wayne_260161
The ID just duplicated 3 columns worth of data.
Rather than try to explain them myself, I will add a note to the blog pointing people to your comments, and your related blog (there is already a pointer to the forum discussion).
There is already a summation of the disadvantages of synthetic/surrogate keys immediately following Niall's quote. You probably got too hot to read that far. :)
This is what I meant by duplication of data:
Last_name: Gretzky
First_name: Wayne
DOB: 26-Jan-61
ID: Gretzky_Wayne_260161
The ID just duplicated 3 columns worth of data.
I really opened the pickle jar this time!
David Aldridge has a contrasting viewpoint to Howard's. Here is his blog article:
http://oraclesponge.blogspot.com/2005/06/natural-and-synthetic-keys.html
David Aldridge has a contrasting viewpoint to Howard's. Here is his blog article:
http://oraclesponge.blogspot.com/2005/06/natural-and-synthetic-keys.html
I think that the topics of the size of the required index and the rate of block splits are related, are they not? Whether the index is built on a natural key or on a synthetic key (RKI there for the purpose of this, of course) you increase the chances of block splits by increasing the amount of free space per index block, either by reducing the size of the indexed values or by spreading them over more blocks (hence the index itself is larger).
Or have I been thinking about this too hard?
Or have I been thinking about this too hard?
Here is Pete's take on the topic, and look in the comments for a link to Tom Kyte's.
http://pjs-random.blogspot.com/2005/06/surrogate-keys.html
http://pjs-random.blogspot.com/2005/06/surrogate-keys.html
One disadvantage of synthetic keys I don't see mentioned here: In many cases they require an extra join. Suppose, for example, I have a table of U.S. states, where a natural key would be two-letter state abbreviation. I have another table of tax rates with a foreign key that ties it back to the state. If I want a query that gives information about each tax rate including the state it applies to, but I don't care about any other information from the state table, with a natural key I would just display the key because it is already a human-comprehensible identification of the state. With a synthetic key I would have to join on the state table to find the state abbreviation. For production queries this can be a major performance hit. For ad hoc queries it can be a big hassle to have to construct more complex queries and make sure they're correct.
Another advantage of natural keys is that they make it much easier to resolve database corruption problems. Suppose in the above example I used a synthetic key, and we find that due to database corruption state records 17, 19, and 24 have been lost. When we try to repair the problem, we can't just add back in those three records in any order. We have to study the records with the state-id posted as a foreign key to figure out which state was #17, which was #19, and which was #24, so that we can keep everything pointed to the correct record. It may not be at all obvious looking at the tax rate table what state any given record should apply to. But if we use a natural key, this would be zero problem.
And I must object to the unfairness of your "Gretzky_Wayne_260161" example. If this was a political debate I think I'd be screaming "propaganda tactics". If the question is, "Do database designers who prefer natural keys sometimes do things that are totally stupid and pointless in a vain attempt to avoid having to create a synthetic key in situations when there is no natural key available?" then okay, this is an example that proves that the answer is "yes". But the fact that some people attempt to apply a principle in an incompetent way doesn't mean the principle is flawed. I think every introductory database design book ever written points out that a name is not a suitable primary key because we cannot be sure it is unique: we might well have many people named "Jim Smith" in our database. And about 90% of them point out that concatenating multiple fields in order to create some combination that is unlikely to be duplicated is inadequate, because "unlikely" isn't good enough in database design. Even if we were talking about a set of fields that would make a suitable key, duplicating data by concatenating several fields into one is just ... why would you want to do this? If your answer is, "So I have a single key field rather than having to mess with multiple fields", then I think you failed the class on data normalization. I tend to prefer natural keys, but I readily recognize that there are cases where there is no natural key, or where the only available natural key is impractical to use (like, it requires six fields totalling 500 bytes) and so the only practical choice is a synthetic key.
Another advantage of natural keys is that they make it much easier to resolve database corruption problems. Suppose in the above example I used a synthetic key, and we find that due to database corruption state records 17, 19, and 24 have been lost. When we try to repair the problem, we can't just add back in those three records in any order. We have to study the records with the state-id posted as a foreign key to figure out which state was #17, which was #19, and which was #24, so that we can keep everything pointed to the correct record. It may not be at all obvious looking at the tax rate table what state any given record should apply to. But if we use a natural key, this would be zero problem.
And I must object to the unfairness of your "Gretzky_Wayne_260161" example. If this was a political debate I think I'd be screaming "propaganda tactics". If the question is, "Do database designers who prefer natural keys sometimes do things that are totally stupid and pointless in a vain attempt to avoid having to create a synthetic key in situations when there is no natural key available?" then okay, this is an example that proves that the answer is "yes". But the fact that some people attempt to apply a principle in an incompetent way doesn't mean the principle is flawed. I think every introductory database design book ever written points out that a name is not a suitable primary key because we cannot be sure it is unique: we might well have many people named "Jim Smith" in our database. And about 90% of them point out that concatenating multiple fields in order to create some combination that is unlikely to be duplicated is inadequate, because "unlikely" isn't good enough in database design. Even if we were talking about a set of fields that would make a suitable key, duplicating data by concatenating several fields into one is just ... why would you want to do this? If your answer is, "So I have a single key field rather than having to mess with multiple fields", then I think you failed the class on data normalization. I tend to prefer natural keys, but I readily recognize that there are cases where there is no natural key, or where the only available natural key is impractical to use (like, it requires six fields totalling 500 bytes) and so the only practical choice is a synthetic key.
Isn't the two-letter abbreviation for a state a synthetic key?
Isn't it, too, a quasi-duplication of data since it is based on the state's name (which would be a natural key)?
Isn't it, too, a quasi-duplication of data since it is based on the state's name (which would be a natural key)?
"Isn't the two-letter abbreviation for a state a synthetic key?"
No, not in this context. A natural key isn't necessarily all that natural: it's a value in a domain defined and controlled outside the boundaries of the system we are designing.
No, not in this context. A natural key isn't necessarily all that natural: it's a value in a domain defined and controlled outside the boundaries of the system we are designing.
If you have any problems or just want to deal then only text loans urgent tax work can really help in such situations, is to give from 100 to 1000 pounds of funding to clarify their needs quick funds. To help find some quick money, you can get free on these loans to borrow money urgently to 100-1000 pounds for the repayment tenure of a couple of days, which can range from 7 days to 30 days. Mobile Text loans are a division of payday loans that is provided by you make less effort.
http://uk6monthloans.co.uk/
http://6monthtextloansnoguarantor.co.uk/
http://www.elongtermpaydayloansuk.co.uk/small-cash-loans-online.html
Post a Comment
http://uk6monthloans.co.uk/
http://6monthtextloansnoguarantor.co.uk/
http://www.elongtermpaydayloansuk.co.uk/small-cash-loans-online.html
<< Home