Automating the Generation of XML Configuration Files

Building the "runtime XML Configuration File" (the one mod_parmguard uses to accept or reject requests) by hand can be a tedious and error-prone task, especially for large web sites.

The package provides two Perl tools that help the Administrator manage the XML Configuration file:

 htmlspider.pl
Description: This tool recursively scans a web site starting from a given URL and extracts information from HTML tags to produce XML Configuration data on stdout.

Here are some of the rules the tool applies when producing the XML:
  • <select> and <input type='radio'...> tags become 'script parameters' of type 'enum'
  • other <input> tags become 'script parameters' of type 'string'
  • the maxlength attribute is mapped to the 'maxlen' attribute
  • for 'string' parameters, the 'charclass' attribute is set to the value of the HTML 'class' attribute, if present
  • HTTP redirections are followed
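The mapping rules above can be illustrated with a minimal Python sketch (an illustration only, not the Perl code shipped with the package) that parses one page with the standard html.parser module and classifies its form fields. The class name FormFieldExtractor and the layout of the result (a dict of parameter name to type plus a list of attribute pairs) are assumptions; only the mapping rules come from the list above. Fetching pages and following redirections are left out.

```python
from html.parser import HTMLParser


class FormFieldExtractor(HTMLParser):
    """Hypothetical sketch: classify form fields following the htmlspider.pl rules.

    self.parms maps a parameter name to {"type": ..., "attrs": [(name, value), ...]},
    mirroring the <type> and <attr> elements of the generated XML.
    """

    def __init__(self):
        super().__init__()
        self.parms = {}
        self._current_select = None  # name of the <select> currently open, if any

    def _parm(self, name, ptype):
        # Reuse the entry if the name was already seen (e.g. a group of radio buttons).
        return self.parms.setdefault(name, {"type": ptype, "attrs": []})

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)  # html.parser lowercases attribute names
        name = a.get("name")
        if tag == "select" and name:
            parm = self._parm(name, "enum")
            if a.get("multiple") is not None:
                parm["attrs"].append(("multiple", a["multiple"]))
            self._current_select = name
        elif tag == "option" and self._current_select and "value" in a:
            # Each <option> of the open <select> becomes an allowed enum value.
            self.parms[self._current_select]["attrs"].append(("option", a["value"]))
        elif tag == "input" and name:
            if (a.get("type") or "text").lower() == "radio":
                parm = self._parm(name, "enum")
                if "value" in a:
                    parm["attrs"].append(("option", a["value"]))
            else:
                parm = self._parm(name, "string")
                if "maxlength" in a:
                    parm["attrs"].append(("maxlen", a["maxlength"]))
                if "class" in a:
                    parm["attrs"].append(("charclass", a["class"]))

    def handle_endtag(self, tag):
        if tag == "select":
            self._current_select = None
```

For the index.php page shown in the example, parms would hold a 'string' entry for id (maxlen and charclass) and an 'enum' entry for v (multiple plus one option per <option>), in the same order as the sample output.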
Syntax: htmlspider.pl [-v] [-u useragent] -h startURL

where:

-v: enable verbose mode
-u useragent: set the User-Agent header sent in HTTP requests (default: mod_parmguard/1.2)
-h startURL: URL at which the scan starts
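The recursive scan can be pictured as a breadth-first crawl from startURL. The Python sketch below is an illustration, not the tool's actual code: it assumes that only <a href> links are followed and that the crawl stays on the start URL's host, and the names crawl, http_fetch and LinkCollector are hypothetical. urllib.request follows HTTP redirections by default and geturl() returns the final URL, so relative links are resolved against the page actually served; the User-Agent header carries the value of the -u option.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import Request, urlopen

DEFAULT_USER_AGENT = "mod_parmguard/1.2"  # documented default of the -u option


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag (assumption: only anchors are followed)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def http_fetch(url, user_agent=DEFAULT_USER_AGENT):
    """Return (final_url, html); urlopen follows redirects, geturl() gives the target."""
    req = Request(url, headers={"User-Agent": user_agent})
    with urlopen(req) as resp:
        # Fall back to ISO-8859-1 when the server sends no charset.
        charset = resp.headers.get_content_charset() or "iso-8859-1"
        return resp.geturl(), resp.read().decode(charset, errors="replace")


def crawl(start_url, fetch=http_fetch):
    """Yield (final_url, html) for each page reachable from start_url on the same host."""
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        final_url, html = fetch(url)
        yield final_url, html
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.links:
            # Resolve relative links and drop "#fragment" parts before deduplicating.
            link, _ = urldefrag(urljoin(final_url, href))
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
```

Passing fetch as a parameter keeps the traversal logic independent of the network, which makes it easy to exercise with canned pages.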

Example: The Administrator runs the following command:

./htmlspider.pl -h http://www.mysite.com/index.php

Suppose the 'index.php' page contains the following HTML code:

<html>
<body>
  <form>
    ID : <input type=text maxlength=10 name=id class=c_string>
    Choose one:
    <select name=v multiple=0>
      <option value=v1>v1</option>
      <option value=v2>v2</option>
    </select>
  </form>
</body>
</html>


The output will then be (the header is omitted here):

...
<parmguard>
  <url>
    <match>^/index.php</match>
    <parm name="id">
      <type setby="auto" name="string"/>
      <attr setby="auto" name="maxlen" value="10"/>
      <attr setby="auto" name="charclass" value="c_string"/>
    </parm>
    <parm name="v">
      <type setby="auto" name="enum"/>
      <attr setby="auto" name="multiple" value="0"/>
      <attr setby="auto" name="option" value="v1"/>
      <attr setby="auto" name="option" value="v2"/>
    </parm>
  </url>
</parmguard>
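Serializing extracted fields into this format can be sketched with Python's xml.etree.ElementTree. The input layout (a mapping of page path to parameters) and the function names build_url_node and build_config are hypothetical; the element and attribute names are taken from the sample output above. Every generated node carries setby="auto", which is what later lets manually written entries take priority when files are merged.

```python
import xml.etree.ElementTree as ET


def build_url_node(path, parms):
    """Build one <url> element; parms maps a name to {"type": ..., "attrs": [(n, v), ...]}."""
    url = ET.Element("url")
    # Same form as the sample output: "^" followed by the page path.
    ET.SubElement(url, "match").text = "^" + path
    for name, info in parms.items():
        parm = ET.SubElement(url, "parm", name=name)
        ET.SubElement(parm, "type", setby="auto", name=info["type"])
        for attr_name, value in info["attrs"]:
            ET.SubElement(parm, "attr", setby="auto", name=attr_name, value=value)
    return url


def build_config(pages):
    """pages: hypothetical mapping of URL path -> parms, one <url> node per page."""
    root = ET.Element("parmguard")
    for path, parms in pages.items():
        root.append(build_url_node(path, parms))
    ET.indent(root)  # pretty-printing, available since Python 3.9
    return ET.tostring(root, encoding="unicode")
```

ElementTree writes empty elements as `<type ... />`, so its output differs from the sample only in whitespace.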
 
 confmerger.pl
Description: This tool takes several XML Configuration files as input and merges them into a single file written to stdout.

Merging files that describe the same web site can lead to conflicts. Conflicts are resolved according to the following rules:
  • Conflicts only arise when the child nodes of <url> nodes differ
  • Nodes whose setby attribute is absent or set to manual take priority over nodes whose setby attribute is set to auto (auto means the value was generated by the htmlspider.pl tool)
  • When two conflicting attributes have the same priority (both manual or both auto), the second one encountered is discarded, so the order of the files given on the command line matters
  • The tool emits a warning or an error message, both on stderr and as an XML comment on stdout, each time it resolves a conflict or detects an inconsistency
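The priority rules can be expressed as a small selection function. The Python sketch below is an illustration, not confmerger.pl's code: how conflicting nodes are detected is not described here, so the function simply receives the competing candidates in command-line order. Candidate, resolve and report are hypothetical names; only the priority and reporting rules come from the list above.

```python
import sys
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    source: str                  # file the node came from
    value: str                   # value of the conflicting node or attribute
    setby: Optional[str] = None  # None (attribute absent), "manual" or "auto"

    @property
    def is_manual(self):
        # An absent setby attribute counts as manual.
        return self.setby in (None, "manual")


def resolve(candidates):
    """Return (winner, messages) for candidates given in command-line order.

    Manual nodes beat auto nodes; among nodes of equal priority the first one
    wins and the later ones are discarded.
    """
    if not candidates:
        raise ValueError("no candidates to resolve")
    manual = [c for c in candidates if c.is_manual]
    winner = (manual or candidates)[0]
    messages = [
        f"conflict: kept {winner.value!r} from {winner.source}, "
        f"discarded {c.value!r} from {c.source}"
        for c in candidates
        if c is not winner and c.value != winner.value
    ]
    return winner, messages


def report(messages, out=None, err=None):
    """Write each message to stderr and as an XML comment to stdout."""
    out = sys.stdout if out is None else out
    err = sys.stderr if err is None else err
    for m in messages:
        print("warning: " + m, file=err)
        # "--" is not allowed inside an XML comment.
        print("<!-- warning: %s -->" % m.replace("--", "- -"), file=out)
```

Because the winner is the first element of the highest-priority group, reordering the file list changes the result only between candidates of equal priority, which matches the rule that command-line order matters.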
Syntax: confmerger.pl [-v] file ...

where:

-v: enable verbose mode
file: one or more XML Configuration files to merge


www.trickytools.com