Tuesday, September 12, 2006

Using Numerical Fields

What is a number? The definition is one of the longest I've seen, but generally it refers to a quantity (or sum, total, count) of units. Quantities can be subjected to all sorts of calculations, such as addition, subtraction, multiplication and division.

Sometimes numbers are used for ordering, as well. So it makes sense to compare one number to another, not only as being greater or lesser than another, but also by what number of units it is greater.

Which brings me to my point. I've been asked on more than one occasion why I sometimes favour the use of VARCHAR2 in situations where others are using NUMBER. Take, for example, my earlier post on creating an appropriate Boolean type.

Some developers think I should have used NUMBER(1) as the type, denote True as 1, False as 0, and have a check to make sure the value is either 0 or 1.

That solution also works. But there is a reason I don't use it.

It is true that this method has its advantages, not the least of which is comfort to programmers who are used to that (most programming languages make that conversion implicitly). They also claim that you can use multiplication for AND operations, and addition for OR operations.

Or can you? True * False = False, so ok, multiplication seems to work fine. But addition? True + True = 2. 2 is not a valid value. And what about subtraction and division? "Oh those don't apply" Then it's not an appropriate use of NUMBER, in my mind. What about square root? What is True to the exponent False? And so on.

I don't mean to pick on this one example. I've seen cases where NUMBER is used for all sorts of things: dates, credit card numbers, social insurance/security numbers, street address house numbers, zip codes, and so on. In some of these cases (though not necessarily all), it would have been more appropriate to create a type: a VARCHAR2 restricted to numerical characters, and then use that one type for all such instances.

No, this is not a critical point. NUMBER will work just fine, and I have no stories about how people got into trouble by using NUMBER for something that wasn't. But consider the importance of having precision and accuracy in your data. Shouldn't we be equally precise in your description of the data?

That is why I generally prefer using NUMBER only in cases where the field actually is by definition a number, a quantity of units, and eligible for all numerical operations. I'd love to hear your thoughts on this matter too.

To simulate a Boolean datatype in SQL, I usually use 0 for false and 1 for true. It does not matter to me whether I store the 0 or the 1 in a number or a varchar2 datatype, but since 0 and 1 are "numbers" I usually use the number datatype.

I prefer 0/1 over y/n and true/false just to avoid questions like is it Y or y?, No or no?, yes or YES? true or TRUE...
I completely agree with you. How hard could it possibly be to add a boolean data type to the database?

I also agree with only using numbers when it's actually a number. One of the problems with storing account numbers is when they have leading zeroes.

00000123 = 123 and that's fine.

But when I print 123 on a check because Oracle got rid of the zeroes, that's a problem. If the account number was stored as a varchar2, that wouldn't be a problem.

Good post,

A vendor-specific language I use keeps the values of yes and no in a parameter table. This way, you can code VARCHAR2(1) yes and no variables in the user's language, and not be stuck with Y and N in a place that uses Asian languages and French, or whatever.

The language also separates out storage from display, so the 0000123 can be kept in a numeric field. Formatting and storage should be separate and in the db (with formatting as a dictionary attribute), don't you think?
Thanks, Robert. I put this in Log Buffer #10.

Log Buffer
To Joel,

I would prefer a true boolean, i.e. TRUE/FALSE. I don't want to say
IF var = 1 THEN
IF UPPER(var) IN ('Y', 'YES', 'TRUE') THEN

I want to say

And I don't think the format of a varchar2 field should be removed from the data unless there is a reason for it. Should we remove all the white space from a text field? How about punctuation? I want the data in my database to accurately represent what the data is and what it means. If there was a GOOD reason to remove the formatting, I would agree. Just because putting a varchar in a numeric does, I wouldn't say that's a good practice.


Regarding TRUE+TRUE=2, the suggested course would simply to check if the results is 0 or not 0. A lot of languages do this in their conditional structures, any non-zero value evaluates to true.
Ok Don, but then you have the case where TRUE does not equal TRUE. Check it out.

a = 1
b = 2
a and b are both TRUE, but they aren't equal!

Instead of
if (a = b)
you have to write
if (a = b OR (a <> FALSE and b <> FALSE)). BAH!

TRUE+TRUE=2 just doesn't make sense.
I use number(1) to store boolean values (it's like bit value for me).

so in my case
true + true is eqvivalent to bitwise addition: 1 or 1 = 1
true * false is eqvivalent to bitwise multiplication: 1 and 0 = 0

I ofen use sys.diutil package to convert between int and boolean..

Boolean is internally represented by number in every language I know. So that's another reason why I think that number is more appropriate.
Post a Comment

<< Home

This page is powered by Blogger. Isn't yours?