Jul 16 2012

Nested strings with regular expressions (similar to recursive regex) in .NET

Published under Development

Parsing nested groups is a common task when parsing code-like strings.

Imagine you have to extract parameters from a function call like this one:

ProcedureName(param1,param2, param3)

Which regex is the right one for the dirty work?

At first glance, to capture "param1,param2, param3", a regular expression like this could work:

(?<Parameters>    # named group; the name "Parameters" is illustrative
\(
[^()]*
\)
)

but this doesn't work with nested parentheses, as in this case:

ProcedureName(param1,subprocedure(param2,param3,param4), param3)
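The failure is easy to reproduce. The sketch below uses Python's `re` module instead of .NET, on the assumption that this flat pattern behaves the same way in both engines; the group name `args` and the helper `first_flat_group` are illustrative names, not part of the original pattern.

```python
import re

# Flat pattern: "(", then any run of non-parenthesis characters, then ")".
# The group name "args" is an illustrative choice.
FLAT = re.compile(r"(?P<args>\([^()]*\))")

def first_flat_group(text):
    """Return the first parenthesised span that contains no inner parentheses."""
    match = FLAT.search(text)
    return match.group("args") if match else None

simple = "ProcedureName(param1,param2, param3)"
nested = "ProcedureName(param1,subprocedure(param2,param3,param4), param3)"

print(first_flat_group(simple))  # (param1,param2, param3)
print(first_flat_group(nested))  # (param2,param3,param4)
```

On the nested call the pattern skips the outer parentheses and returns only the innermost group, which is not the parameter list of ProcedureName.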

It is well known that classic regular expression engines cannot handle nested groups (exceptions exist, for example in Perl). To avoid writing a full syntax parser just to match a function call with nested calls, .NET helps with balancing groups.

The Right Solution for .NET

The balancing groups available in the .NET regex engine are the right tool for this case.

A balancing group acts as a counter, implemented by pushing to and popping from a stack of captures. The regex matches if, and only if, the counter is back to 0 at the end of the match, that is, when the pushes and pops balance each other.

Balancing groups syntax is quite simple:

(?<groupname>)     : push, counter +1
(?<-groupname>)    : pop, counter -1
(?(groupname)(?!)) : fail the match if the counter is not 0

Using these features, counting nested parentheses is simple:

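\(
(?<Parameters>    # named group; the name "Parameters" is illustrative
    (?>
        \( (?<NestedParenthesis>) |
        \) (?<-NestedParenthesis>) |
        [^()]+
    )*
    (?(NestedParenthesis)(?!))
)
\)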

The key elements in this regular expression are:

  • the leading and closing literal parentheses, \( and \)
  • counting up and down on nested parentheses, so that every ( is paired with its matching )
  • matching of non-parenthesis characters with [^()]+
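The counting idea in the list above can also be sketched outside the regex engine. Python's `re` module has no balancing groups, so this sketch swaps in an explicit integer counter; it is not the .NET mechanism itself. The function name `outer_arguments` is hypothetical, and unlike the regex the counter here also counts the outermost parenthesis.

```python
def outer_arguments(text):
    """Return the text between the first "(" and its matching ")", or None if unbalanced."""
    start = text.find("(")
    if start == -1:
        return None
    depth = 0  # plays the role of the capture stack in a balancing group
    for index in range(start, len(text)):
        char = text[index]
        if char == "(":
            depth += 1  # push
        elif char == ")":
            depth -= 1  # pop
            if depth == 0:  # counter back to zero: parentheses are balanced
                return text[start + 1:index]
    return None  # input ended while a parenthesis was still open

call = "ProcedureName(param1,subprocedure(param2,param3,param4), param3)"
print(outer_arguments(call))  # param1,subprocedure(param2,param3,param4), param3
```

Returning None for unbalanced input mirrors the role of the final check in the regex: without it, an unclosed parenthesis would go unnoticed.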

How to manage literal strings

What if the parameters can also be strings?

This string would make the regular expression fail to match the nested parameters:

ProcedureName(param1,subprocedure('Bad work : -(',param3,param4), param3)

The : -( emoticon would be counted as a nested parenthesis and the whole regular expression would fail.

This case can be matched with this modified pattern:

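\(
(?<Parameters>    # named group; the name "Parameters" is illustrative
    (?>
        \( (?<NestedParenthesis>) |
        \) (?<-NestedParenthesis>) |
        [^()']+ |
        ('[^']*')*
    )*
    (?(NestedParenthesis)(?!))
)
\)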

The differences from the previous regex are:

  • Exclusion of the single quote (') from the general character-matching group
  • Addition of a specific group to match literal strings: ('[^']*')*

Note that this pattern works well only if a single quote inside a literal string is escaped by doubling it (Pascal-like).
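The same string-aware logic can be sketched with an explicit counter plus an "inside a string" flag. As before, this is Python standing in for the .NET balancing group, not the regex itself; the function name `outer_arguments_with_strings` is hypothetical, and the sketch assumes Pascal-like literals where a quote is escaped by doubling it.

```python
def outer_arguments_with_strings(text):
    """Like a parenthesis counter, but ignores parentheses inside '...' literals."""
    start = text.find("(")
    if start == -1:
        return None
    depth = 0
    in_string = False
    for index in range(start, len(text)):
        char = text[index]
        if char == "'":
            # A doubled quote ('') toggles twice, so the scan stays inside the literal.
            in_string = not in_string
        elif in_string:
            continue  # parentheses inside a literal are not counted
        elif char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
            if depth == 0:
                return text[start + 1:index]
    return None  # unbalanced parentheses or unterminated literal

call = "ProcedureName(param1,subprocedure('Bad work : -(',param3,param4), param3)"
print(outer_arguments_with_strings(call))
# param1,subprocedure('Bad work : -(',param3,param4), param3
```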


Jun 30 2012

(Italian) Course: “Professional use of social networks” at CIS in Valmadrera

Published under General,Web 2.0

Sorry, this entry is only available in Italian.


May 26 2012

Firebird vs Oracle – First impressions

Published under Database,English,Firebird,Oracle

My team and I recently managed the development of a huge database on Oracle. The same database already exists and runs successfully on Firebird SQL.

The main goal of this development is a cross-platform application, able to work with both Firebird SQL and Oracle at the same time.

We reached the goal after a lot of headaches and frustration due to the subtle, and often incredible, differences between the two platforms. In this post I list the main differences we faced. The table below summarizes the major topics, which will be discussed in more detail in a later post.

This analysis is not intended to be exhaustive or to apply to every type of development. The comparison is based on the experience of a team that achieves great results by following the "best practices" of Firebird SQL design and expects similar results, and possibly similar winning strategies, when developing with Oracle.

| Topic | Oracle 11g | Firebird 2.5 |
|---|---|---|
| Installation | Cumbersome, with a lot of unexpected incidents | Simple and straightforward |
| Resource usage | As much RAM as possible; a lot of services running | Minimal, proportional to DB size; two services running |
| Logs | Spread across the machine; mainly not human readable | One log for all |
| Documentation | Huge | Not well structured |
| Support from the community | A lot of discussion forums and blogs; low quality; difficult to find the right solution due to different approaches and philosophies | Sparse specific forums; great support from Firebird enthusiasts |
| Empty strings | Automatically transformed to NULL | Empty strings and NULL strings are distinct |
| INTEGER | Automatically converted to NUMBER(38) | INTEGER |
| NULLs in expressions | Operators on NULL give NULL results; string concatenation behaves differently: NULL is treated as an empty string | SQL-92 compliant: operators on NULL give NULL results |
| Stored procedures as tables | Possible with the "FROM Table()" construct | Possible with the "FROM" construct |
| SELECT INTO with empty result set | Fails | Processed with NULL |
| IF EXISTS() | Not present; workaround with tricks (for loop) | Present |
| SELECT FIRST SKIP or SELECT ROWS | WHERE ROWNUM; due to the prefiltering effect of the where clause, an odd solution should be used: select * from (select .... order by) where rownum | SQL-92 compliant |
| SELECT COALESCE() with no rows | Returns empty result set | Executes the COALESCE |
| Index on foreign keys | Explicit creation required | Automatically created |
| CONTAINING | Workaround: upper(field) LIKE upper('%pattern%') | Native |
