Archive for the 'Development' Category

Jul 16 2012

Nested strings with regular expressions (similar to recursive regex) in .NET

Published under Development

Matching nested groups is a common task when parsing code-like strings.

Imagine you have to extract parameters from a function call like this one:

ProcedureName(param1,param2, param3)

Which regex does the dirty work?

At first glance, a regular expression like this one could capture "param1,param2, param3":

(?
\(
[^()]*
\)
)

But this fails with nested parentheses, as in this case:

ProcedureName(param1,subprocedure(param2,param3,param4), param3)
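A quick way to see the failure is to run the flat pattern against both calls. The sketch below uses Python's re module rather than .NET, an assumption made only for illustration, since the [^()]* pattern behaves the same way in both engines. A plain capturing group is added around the parameter list.

```python
import re

# Flat pattern: an opening parenthesis, any run of non-parenthesis
# characters (captured), and a closing parenthesis.
FLAT = re.compile(r"\(([^()]*)\)")

flat_call = "ProcedureName(param1,param2, param3)"
nested_call = "ProcedureName(param1,subprocedure(param2,param3,param4), param3)"

flat_match = FLAT.search(flat_call)
nested_match = FLAT.search(nested_call)

print(flat_match.group(1))    # param1,param2, param3
print(nested_match.group(1))  # param2,param3,param4 (inner call only)
```

On the nested call the pattern cannot start at the outer parenthesis, so it silently matches the inner call instead of the full parameter list.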

It is well known that classic regular expression engines cannot handle nested groups (exceptions exist, for example in Perl). To avoid writing a full syntax parser just to match a function call with nested calls, .NET offers balancing groups.

The Right Solution for .NET

The balancing groups available in the .NET regex engine handle this case correctly.

Balancing groups act as a counter, implemented as pushes to and pops from a stack. The match succeeds if, and only if, the counter is back to 0 at the end of the match, that is, every push has been balanced by a pop.

The balancing group syntax is quite simple:

(?<groupname>)      : push, counter +1
(?<-groupname>)     : pop, counter -1
(?(groupname)(?!))  : check that the counter is 0 (fail otherwise)

Using these features, counting nested parentheses is simple:

\(
(?
    (?>
        \( (?<NestedParenthesis>) |
        \) (?<-NestedParenthesis>) |
        [^()]+
    )*
    (?(NestedParenthesis)(?!))
)
\)

The key elements in this regular expression are:

  • the leading and closing literal parentheses \( and \)
  • counting up and down on the nested parentheses, so that each ( is paired with its matching )
  • matching non-parenthesis characters with [^()]
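To make the counting visible, here is a minimal hand-rolled sketch in Python of the idea behind the balancing group. It is not the .NET regex itself (Python's re module has no balancing groups), and it only inspects the first opening parenthesis. The function name outer_arguments is hypothetical, and the depth variable plays the role of the NestedParenthesis counter.

```python
def outer_arguments(text):
    """Return the text between the first "(" and its matching ")".

    Returns None when there is no "(" or the parentheses never balance.
    """
    start = text.find("(")
    if start == -1:
        return None
    depth = 0  # number of nested "(" still open inside the outer pair
    for i in range(start + 1, len(text)):
        ch = text[i]
        if ch == "(":
            depth += 1          # plays the role of the push
        elif ch == ")":
            if depth == 0:      # counter is 0: this closes the outer pair
                return text[start + 1:i]
            depth -= 1          # plays the role of the pop
    return None                 # input ended with the outer pair still open


call = "ProcedureName(param1,subprocedure(param2,param3,param4), param3)"
print(outer_arguments(call))
# param1,subprocedure(param2,param3,param4), param3
```

The regex expresses the same bookkeeping declaratively: the final conditional only lets the closing \) match once every nested opening parenthesis has been popped.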

How to manage literal strings

What if parameters can also be strings?

This string would make the regular expression fail to match the nested parameters:

ProcedureName(param1,subprocedure('Bad work : -(',param3,param4), param3)

The : -( emoticon would be counted as an opening nested parenthesis, and the whole regular expression would fail.

This case can be matched with this modified regex:

\(
(?
    (?>
        \( (?<NestedParenthesis>) |
        \) (?<-NestedParenthesis>) |
        [^()']+ |
        ('[^']*')*
    )*
    (?(NestedParenthesis)(?!))
)
\)

The differences from the previous regex are:

  • Exclusion of the apostrophe (') from the general character-matching group
  • Addition of a specific group to match literal strings: ('[^']*')*

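Note that this works well when an apostrophe inside a literal string is escaped by doubling it (Pascal-like), as in 'It''s': the doubled apostrophe is simply matched as two adjacent literal strings.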
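The same idea can be sketched by hand in Python, again as an illustration rather than the .NET regex: whenever an apostrophe is found, the scan jumps to the next apostrophe, so parentheses inside a literal are never counted. The function name outer_arguments_quoted is hypothetical.

```python
def outer_arguments_quoted(text):
    """Like a parenthesis counter, but skips '...' string literals."""
    start = text.find("(")
    if start == -1:
        return None
    depth = 0
    i = start + 1
    while i < len(text):
        ch = text[i]
        if ch == "'":
            end = text.find("'", i + 1)
            if end == -1:
                return None     # unterminated literal: no match
            i = end             # jump to the closing apostrophe
        elif ch == "(":
            depth += 1
        elif ch == ")":
            if depth == 0:
                return text[start + 1:i]
            depth -= 1
        i += 1
    return None


call = "ProcedureName(param1,subprocedure('Bad work : -(',param3,param4), param3)"
print(outer_arguments_quoted(call))
# param1,subprocedure('Bad work : -(',param3,param4), param3
```

A doubled apostrophe needs no special handling here either: the closing apostrophe of one literal is immediately followed by the opening apostrophe of the next.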

Sep 05 2009

Interesting memo about using Firebird with Solid State Disks (SSD)

Published under Database,Development,Firebird

Poul Dige wrote this interesting memo about his experience with Firebird on an SSD.
He claims a 50% performance improvement under normal conditions and an amazing 93% improvement under heavy load.
With these numbers, SSD-based solutions could be used successfully in heavily loaded environments.

May 05 2009

DokuWiki as documentation repository

DokuWiki is another amazing open source project I cannot do without in my everyday activities.

It has been a long time since I first uploaded DokuWiki to my server and started playing with it.

I started playing with wikis before I realized how they could help me in my daily work. The Web introduced the hyperlink concept; wikis made it simple and offered a fast way to create hyperlinked pages.

Today, thanks to a lot of extensions and plugins, wikis have moved far from their original purpose (see the plugins page in DokuWiki for an idea of how widely a wiki can be used): generic web sites, blogs, documentation repositories, forums, data collections. Their "core business", though, is still sites that require powerful hyperlink management.

In my personal and professional experience I have tried using a wiki for to-do management (quickly abandoned in favor of FreeMind, as I described at length in my previous post), for customer relationship management (quickly abandoned due to little interest from customers), and as a documentation repository. The last was the most successful experience, mainly in collaborative environments.

Before DokuWiki I tested a lot of wiki packages (see WikiMatrix for a real-time comparison with other wiki tools), but none of them caught my attention because each lacked some feature. When I tested DokuWiki I immediately felt that it was the right tool: it was the only one that provided all of the features I was interested in. A short list of them follows:

Simple and effective installation procedure

DokuWiki can be installed on almost any web server; in particular, I tried it on Apache and IIS without any problems.

It doesn't require a database engine, so a test installation can be made in seconds and existing installations can be cloned by simply copying the files.

Light system requirements

DokuWiki is light: it uses few resources and runs even on old PHP releases (>4.3.3).

Extensive documentation

The documentation is extensive and complete. Indeed, it is itself hosted on a DokuWiki, so it is kept up to date and users can collaborate in writing it.

Page storage method

One of the things I appreciate most in DokuWiki is the way it stores pages. DokuWiki doesn't use a database or any fancy repository, only the file system: one page, one file. I like this!

This means simple backup, simple maintenance and, really important to me, data longevity. I like that my data is stored in an open way, and I like knowing that at any time I can get my data out of DokuWiki with a simple file copy, read it with a simple text editor and use it however I want. The multi-file model is also good for safety, because data loss can be limited to single files.
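Because each page is a plain text file, inspecting or exporting the content needs nothing but the file system. As a rough sketch in Python, assuming the usual layout in which pages are stored as .txt files under data/pages and subdirectories correspond to namespaces (the function name list_pages is hypothetical):

```python
from pathlib import Path


def list_pages(pages_dir):
    """Map DokuWiki-style page IDs (namespace:page) to their text files."""
    root = Path(pages_dir)
    pages = {}
    for path in sorted(root.rglob("*.txt")):
        # Subdirectories become namespaces, separated by ":" in the page ID.
        relative = path.relative_to(root).with_suffix("")
        pages[":".join(relative.parts)] = path
    return pages
```

Calling list_pages on the pages directory of an installation returns a dictionary whose keys look like wiki:syntax and whose values are ordinary paths that any text editor or backup tool can open.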

Good user management

DokuWiki has the merit of providing really simple and effective ACL management. ACL support matured over time: the very early implementation wasn't a real ACL, but in recent releases it has become complete and effective.

Complete and Friendly Syntax

The text syntax is really complete and, to a certain extent, intuitive. The visual editor helps with the simpler formatting options, and for the expert user there are plenty of tags. Some good plugins add functionality not covered by the native syntax.

For a complete example see the syntax page on the DokuWiki web site.

I often use the syntax highlighting feature to highlight source code written in various languages, and the math plugin, which can render mathematical expressions written in (La)TeX or MathML.

Plugins

DokuWiki comes with plenty of free plugins that extend the functionality of the engine, the syntax and the way pages are rendered.

Templates

DokuWiki also comes with a lot of templates. I list this as the last topic because I'm not really interested in changing the layout or look and feel of my documentation repository. I think that look-and-feel customization is not as important for technical repositories as it is for blogs or personal web sites. Anyway, the list of templates is long and building a new template is quite simple.
