Thursday, December 17, 2009

Regular Expressions in Magik

It has been bothering me for years that Magik does not seem to support regular expressions. I have tried to make do with the ro_charindex_mixin.matches?() method, but that method only supports the wildcard characters %*, %? and %\.

So when l_santi asked a question on sw-gis about Magik support for regular expressions, it got me thinking: WWBD (What Would Bhimesh Do)? Bhimesh is the owner of the Magik Fun blog, and many of his posts explore how Magik interacts with various OLE objects. I began to wonder whether there was an OLE object that could support regular expressions. It turns out that vbscript.regexp is exactly what I was looking for.

Here is an example of testing that a telephone number string matches the correct format for North America...


MagikSF> regexp << ole_client.createobject("vbscript.regexp")
$
MagikSF> regexp.pattern << "^[0-9]{3}-[0-9]{3}-[0-9]{4}$"
$
MagikSF> regexp.test("303-555")
$
False
MagikSF> regexp.test("303-555-5555")
$
True
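The ^ and $ anchors in this pattern matter more than they might appear to. The RegExp object's test() method reports whether the pattern is found anywhere in the string, so without the anchors a longer string that merely contains a phone number would also pass. The short sketch below illustrates this using Java's java.util.regex, whose Matcher.find() has the same search behaviour. It is only a stand-in for vbscript.regexp, not a call to it, and the class name PhoneCheck is made up for this example.

```java
import java.util.regex.Pattern;

// Sketch only: java.util.regex is used as a stand-in for vbscript.regexp
// to show what the ^ and $ anchors do under search semantics.
class PhoneCheck {
    static final Pattern ANCHORED = Pattern.compile("^[0-9]{3}-[0-9]{3}-[0-9]{4}$");
    static final Pattern UNANCHORED = Pattern.compile("[0-9]{3}-[0-9]{3}-[0-9]{4}");

    // Matcher.find() reports whether the pattern occurs anywhere in the input.
    static boolean test(Pattern pattern, String input) {
        return pattern.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(test(ANCHORED, "303-555"));             // false
        System.out.println(test(ANCHORED, "303-555-5555"));        // true
        System.out.println(test(ANCHORED, "Tel: 303-555-5555"));   // false
        System.out.println(test(UNANCHORED, "Tel: 303-555-5555")); // true
    }
}
```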

... and another example testing if a string is in a valid e-mail format.

MagikSF> regexp.pattern << "\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}\b"
$
MagikSF> regexp.test("alfred@ifactorconsulting.com")
$
True
MagikSF> regexp.test("alfred!ifactorconsulting.com")
$
False
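If you ever want the same e-mail check outside the OLE object (for instance with the Java approach discussed in the comments below), note that every backslash in the pattern has to be doubled when it is written as a Java string literal. Here is a minimal sketch under that assumption; the class name EmailCheck is hypothetical, and Matcher.find() is used to mirror the search behaviour of test().

```java
import java.util.regex.Pattern;

// Hypothetical Java version of the e-mail check; java.util.regex stands in
// for vbscript.regexp.
class EmailCheck {
    // The pattern from the post, with each backslash doubled for the Java literal.
    static final Pattern EMAIL =
            Pattern.compile("\\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,4}\\b");

    // find() searches the input for the pattern, like RegExp.test().
    static boolean test(String input) {
        return EMAIL.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(test("alfred@ifactorconsulting.com")); // true
        System.out.println(test("alfred!ifactorconsulting.com")); // false
    }
}
```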

5 comments:

Anonymous said...

Thank you Alfred. Another missing piece of the puzzle created!

I jogged down the road a step with your idea and came up with this...

_pragma(classify_level=basic, topic={strings}, usage={subclassable})
_method ro_charindex_mixin.matches_regexp?( a_reg_exp )
    ## Returns true if a match for the regular expression
    ## A_REG_EXP is found in self, else false. On error, returns
    ## unset and the condition that was raised
    ##
    ## Uses the ole object vbscript.regexp to perform the analysis

    # adapted from Alfred Sawatzky
    # http://sworldwatch.blogspot.com/2009/12/regular-expressions-in-magik.html

    _local matches? << _unset
    # Declared up front so the _protection block can test it even
    # if createobject() fails.
    _local regexp << _unset

    _try _with cond
        _protect
            regexp << ole_client.createobject( "vbscript.regexp" )
            regexp.pattern << a_reg_exp

            matches? << regexp.test( _self )
        _protection
            # Always release the OLE object, even if test() raised an error.
            _if regexp _isnt _unset
            _then
                regexp.release_object()
            _endif
        _endprotect
    _when error
        _return matches?, cond
    _endtry

    _return matches?
_endmethod
$

Bruce Morehouse, InMaps
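For comparison, the same contract (a boolean on success, or no result plus the error that was raised) might look like this in Java, assuming the error of interest is an invalid pattern, which java.util.regex reports as a PatternSyntaxException when the pattern is compiled. This is only an analogue, not a port of the method above; RegexWrapper and its Result holder are hypothetical names, and unlike the OLE object a compiled Pattern needs no explicit release.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical Java analogue of matches_regexp?(); names are illustrative.
class RegexWrapper {
    // Mirrors the "matches?, cond" pair: exactly one field is non-null.
    static final class Result {
        final Boolean matches;               // null if the pattern was invalid
        final PatternSyntaxException error;  // null on success

        Result(Boolean matches, PatternSyntaxException error) {
            this.matches = matches;
            this.error = error;
        }
    }

    // find() searches anywhere in the text, like RegExp.test().
    static Result matchesRegexp(String text, String regex) {
        try {
            return new Result(Pattern.compile(regex).matcher(text).find(), null);
        } catch (PatternSyntaxException e) {
            return new Result(null, e);
        }
    }

    public static void main(String[] args) {
        Result ok = matchesRegexp("303-555-5555", "^[0-9]{3}-[0-9]{3}-[0-9]{4}$");
        System.out.println(ok.matches); // true
        Result bad = matchesRegexp("303-555-5555", "[0-9");
        System.out.println(bad.matches); // null
        System.out.println(bad.error.getDescription());
    }
}
```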

Alfred Sawatzky said...

Hi Bruce,

That's a great step! Wrapping it like that and writing it on ro_charindex_mixin as :matches_regexp?() will make it easy to find in the class browser.

Alfred

Bhimu... said...

Hi Alfred

This is really cool. .NET has a built-in regular expression validator, and using a similar validator in a Magik session through OLE Automation is really good.

I think finding the right OLE components is the real challenge. This post may help in finding the available OLE components: http://magikfun.blogspot.com/2009/12/how-do-i-ole.html

What if the user's operating system is not Windows?

Thanks,
Bhimesh

Alfred Sawatzky said...

Hi Bhimesh,

Thanks for the information about the OLE Viewer tool. If the session is not running in Windows, I guess you are out of luck. It would be nice if regular expressions could be incorporated directly into Magik by the core product team.

I always seem to have difficulty finding the exact download page at Microsoft, so I tracked down the OLE2 View download site and created a short link for it here: http://tinyurl.com/ole2view-download

Alfred

Mike Buller said...

If you're looking for platform independence, how about using a Java ACP (Alien Co-Process) to leverage Java's built-in regular expression support? The relevant class is...

java.lang.String

...on which you'll find the method...

matches(java.lang.String regex)
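One behavioural difference is worth keeping in mind when moving patterns between the two approaches: String.matches() (like Pattern.matches()) only succeeds if the entire string matches the regex, whereas the VBScript test() method searches for the pattern anywhere in the string. A small sketch of the difference, with the hypothetical class name MatchesVsFind:

```java
import java.util.regex.Pattern;

// Illustrative comparison of whole-string and search semantics.
class MatchesVsFind {
    static final String PHONE = "[0-9]{3}-[0-9]{3}-[0-9]{4}";

    // Whole-string semantics: String.matches(regex) gives the same result
    // as Pattern.matches(regex, input).
    static boolean wholeString(String input) {
        return input.matches(PHONE);
    }

    // Search semantics, which is how RegExp.test() behaves in the OLE approach.
    static boolean search(String input) {
        return Pattern.compile(PHONE).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(wholeString("303-555-5555"));      // true
        System.out.println(wholeString("Tel: 303-555-5555")); // false
        System.out.println(search("Tel: 303-555-5555"));      // true
    }
}
```

So an unanchored pattern that passes test() in the OLE version may fail in the Java ACP version; anchoring patterns with ^ and $ makes them behave the same way in both.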

I just gave this a shot, and it seems to work pretty nicely.


Here's the main Java code.


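public RegExACP() throws Exception
{
    // Read the request in the order the Magik side writes it:
    // first the string to test, then the regular expression.
    String stringToMatch = this.getString8();
    String regEx = this.getString8();

    // String.matches() requires the entire string to match regEx;
    // an invalid regEx throws PatternSyntaxException here.
    this.putBoolean(stringToMatch.matches(regEx));
    this.flush();
}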


Here's the Magik call from ro_charindex_mixin...


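_method ro_charindex_mixin.matches_regex?(a_regex)
    ## Returns true if the whole of self matches the regular
    ## expression A_REGEX, as decided by Java's String.matches()
    ## running in the regex_acp co-process.
    _local an_acp << regex_acp.new()
    _local matches? << _unset
    _protect
        matches? << an_acp.matches_regex?(_self, a_regex)
    _protection
        # Always shut the co-process down, even after an error.
        _if an_acp _isnt _unset
        _then
            an_acp.close()
        _endif
    _endprotect
    _return matches?
_endmethod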
$


And finally, here's the interesting ACP code.


_method regex_acp.matches_regex?(a_string, a_regex)
    ## Sends A_STRING and then A_REGEX to the Java process and
    ## returns the boolean it sends back.
    _local ok? << _false
    _local matches? << _unset
    _protect _locking _self
        _self.locked_start()
        _self.put_chars8(system.current_text_encoding, a_string)
        _self.put_chars8(system.current_text_encoding, a_regex)
        _self.flush()
        matches? << _self.get_boolean()
        ok? << _true
    _protection
        _self.locked_end(ok?)
    _endprotect
    _return matches?
_endmethod
$

All that's left to write is the ACP plumbing in Java and Magik.
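As a rough idea of what the Java half of that plumbing has to do, here is a sketch of a request loop. It deliberately does not use the Smallworld ACP classes (getString8(), putBoolean() and friends are not reproduced); instead it assumes a hypothetical framing in which each string arrives via DataOutputStream.writeUTF() and each answer goes back via writeBoolean(). RegexServer is an invented name. It keeps the String.matches() semantics from the comment above, caches compiled patterns because every request carries its regex, and, as a simplification, answers false for an invalid pattern.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical stand-in for the ACP plumbing: readUTF/writeBoolean framing,
// NOT the Smallworld ACP wire format.
class RegexServer {
    private final Map<String, Pattern> cache = new HashMap<>();

    // Answers (string, regex) requests until the input stream ends.
    void serve(InputStream in, OutputStream out) throws IOException {
        DataInputStream din = new DataInputStream(in);
        DataOutputStream dout = new DataOutputStream(out);
        while (true) {
            String text;
            try {
                text = din.readUTF();
            } catch (EOFException eof) {
                break; // the client closed its end
            }
            String regex = din.readUTF();
            dout.writeBoolean(matches(text, regex));
            dout.flush();
        }
    }

    // Whole-string match, same semantics as String.matches(regex).
    boolean matches(String text, String regex) {
        try {
            Pattern p = cache.computeIfAbsent(regex, Pattern::compile);
            return p.matcher(text).matches();
        } catch (PatternSyntaxException e) {
            return false; // simplification: invalid pattern reported as no match
        }
    }

    public static void main(String[] args) throws IOException {
        new RegexServer().serve(System.in, System.out);
    }
}
```

A real ACP would also need a way to report pattern errors back to Magik rather than folding them into false.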