Monday, December 04, 2006

REGEXP_LIKE

Recently I invited anyone who has been considering joining the Oracle blogging community to jump right in. Please, help me encourage John H:

Robert,

Way cool blog. I really need to create one too. I like to adhere to best practices and love to read stuff from Tom K. and Steven F.!

I have a question on regular expression check constraints and was wondering if you can help:

alter table logical_address
modify (address_ip check (

regexp_like(address_ip,'((^([2][0-5]{2}\.|[2][0-4][0-9]\.|
([0-1]\d\d)\.|\d\d\.|\d\.))
([2][0-5]{2}\.|[2][0-4][0-9]\.|
([0-1]\d\d)\.|\d\d\.|\d\.){2})(([2][0-5]{2}$)|
([0-1]\d\d$)|(\d\d$)|(\d$))')))

Isn't there a better way to write this? Oh! the pattern works in a regexp_substr but not in the regexp_like. Thanks in advance...


I applaud what John is trying to do: manage the integrity of the data in the database. His request is a common one, and I wish it were even more common. Too many databases are designed to simply trust that whatever application is providing the IP address has already checked it.

Unfortunately, I'm limited in what assistance I can provide with regards to regular expressions. Since I'm still using Oracle 9, I can't take advantage of one of the finest features of Oracle 10. I get a lot of questions about REGEXP in the past year and a half, but I can't offer much more than those links to articles and references.

Wanting to offer John something of value, I wondered how I would address this issue in Oracle 9. I recall coming across this once before, but we actually chose to store the IP address as a 32-bit integer which, by the way, is the same way our OS stored it. I wrote a quick procedure that would convert that 32-bit integer into a readable IP address (or "dotted-quad" notation). The specific details are proprietary to the company for which I worked at the time, but it should be easy enough to reproduce if you went that route.

Of course, this approach didn't check that the 32-bit integer was convertible to an IP address valid to my application. That problem, however, is a specific case of a very common general requirement: write a constraint to check the validity of a string. Briefly, in a case like this, I would convert the IP address to special coded string using TRANSLATE, and then CHECK if that string matched a set pattern. Further, in particular complex cases, I often resort to checking the constraint with a trigger.

In the end, I suggested John post this question in an Oracle forum. There are lots of people there, including things who are familiar with REGEXP (unlike me) who can give him a hand. But I believe he came up with a satisfactory solution on his own. I'm including it here, and I know he would appreciate a critique if anyone familiar with REGEXP has something to offer.

alter table logical_address
add constraint la_address_ip_chk
check (
regexp_like(
address_ip,
'((^([2][0-5]{2}\.|[2][0-4][0-9]\.|([0-1]\d\d)\.|\d\d\.|\d\.))([2][0-5]'||
'{2}\.|[2][0-4][0-9]\.|([0-1]\d\d)\.|\d\d\.|\d\.){2})(([2][0-5]{2}$)|'||
'([0-1]\d\d$)|(\d\d$)|(\d$))'
)
);

alter table logical_address
add constraint la_address_mac_chk
check (
regexp_like(address_mac,
'((([0-9]|[A-Z]){2})(:)){5}([0-9]|[A-Z]){2}|((([0-9]|[A-Z]){2})(-)){5}'||
'([0-9]|[A-Z]){2}'
)
);

alter table logical_address
add constraint la_port_chk
check (
regexp_like(address_port,'[0! -9]{5}| [0-9]{4}')
);

Comments:
it seems very wrong, a mac address is hexadecimal and cannot contain G-Z characters...

ip

regexp_like(address_ip,'(([01]?[0-9]?[0-9]|2[0-4][0-9]|25[0-5])\.){3}([01]?[0-9]?[0-9]|2[0-4][0-9]|25[0-5])')


mac

regexp_like(address_mac,','([0-9A-F]?[0-9A-F]:){5}[0-9A-F]?[0-9A-F]')


for port, no idea what he tried, the most logical could be
to_number(addess_port) between 0 and 65535
 
I noticed performance issues when using regexp_like in 10.2.0.2. It didn't use index on 2.7billion partitioned table.

When using good old like (with or clause to add more patterns) yielded a performance improvement for me. With good old like, my query runs in 2 minutes, with regexp_like, it takes 20+ minutes.

I like regexp_* commands, but I am slightly weary to use them in my code.

raj
 
Post a Comment

<< Home

This page is powered by Blogger. Isn't yours?