"""Parse a time or date string.

Converts a strftime/strptime-like format string into a Martel regular
expression (either as a string or an Expression).

Example use:
  >>> from Martel import Time
  >>> from xml.sax import saxutils
  >>> format = Time.make_expression("%(Jan)-%(day)-%(YYYY)\\n")
  >>> parser = format.make_parser()
  >>> parser.setContentHandler(saxutils.XMLGenerator())
  >>> parser.parseString("OCT-31-2021\\n")
  <?xml version="1.0" encoding="iso-8859-1"?>
  <month type="short">OCT</month>-<day type="numeric">31</day>-<year type="long">2021</year>
  >>>


Times and dates come up often in parsing. It's usually pretty easy
to write a syntax for them on the fly. For example, suppose you want
to parse a date in the format
  YYYY-MM-DD

as in "1985-03-26". One pattern for that is
  "\\d{4}-\\d{2}-\\d{2}"

To get the individual fields in Martel requires group names.
  "(?P<year>\\d{4})-(?P<month>\\d{2})-(?P<day>\\d{2})"

If you want some minimal verification (e.g., to help make sure you
haven't accidentally swapped the day and month fields) you need to
tighten down on what values are allowed, as in
  "(?P<year>\\d{4})-(?P<month>0[1-9]|1[012])-(?P<day>0[1-9]|[12][0-9]|3[01])"
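
To see what the tightened pattern buys you, here is a minimal sketch
using Python's standard "re" module rather than Martel. The names DATE
and parse_date are made up for this illustration, and [0-9] stands in
for the escaped digit class.

```python
import re

# Same shape as the tightened pattern above, written for the re module.
DATE = re.compile(
    "(?P<year>[0-9]{4})-"
    "(?P<month>0[1-9]|1[012])-"
    "(?P<day>0[1-9]|[12][0-9]|3[01])"
)

def parse_date(text):
    # Return (year, month, day) as integers, or None if text does not match.
    m = DATE.fullmatch(text)
    if m is None:
        return None
    return int(m.group("year")), int(m.group("month")), int(m.group("day"))
```

A date with the day and month swapped, such as "1985-26-03", is
rejected because "26" is not a valid month.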

The more you write, the greater the likelihood of making a mistake, the
greater the chance that different format definitions use different
patterns, and the harder it is to understand what's going on.

This module helps by providing a set of standard definitions for the
different terms needed in parsing dates, and a way to generate those
definitions from a relatively easy to understand format string.

The syntax of the format string is based on that used by the standard
unix strftime/strptime functions, with terms taken from the POSIX and
GNU documentation plus some experimentation. These terms are in the
form "%c" where "c" is a single character. It's hard to remember
everything through a not always mnemonic single character code, so
Martel.Time adds a new syntax of the form "%(word)" where word can be
one of the single characters, or a multicharacter word. For example,
"%(Mon)" is identical to "%a" but easier to understand.
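
The two term syntaxes can be told apart with a simple scan of the
format string. The following is only a sketch of that idea, not the
module's own parser; split_terms is a hypothetical helper that returns
the literal text and the term names in order.

```python
def split_terms(fmt):
    # Return a list of ("text", literal) and ("term", name) pairs.
    parts = []
    i = 0
    n = len(fmt)
    while i < n:
        start = fmt.find("%", i)
        # No more terms, or a lone "%" at the very end: the rest is text.
        if start == -1 or start == n - 1:
            break
        if start > i:
            parts.append(("text", fmt[i:start]))
        if fmt[start + 1] == "(":
            # Long form: the name is everything up to the closing ")".
            close = fmt.find(")", start + 1)
            if close == -1:
                raise ValueError("found '%(' but no matching ')'")
            parts.append(("term", fmt[start + 2:close]))
            i = close + 1
        else:
            # Short form: the name is the single character after "%".
            parts.append(("term", fmt[start + 1]))
            i = start + 2
    if i < n:
        parts.append(("text", fmt[i:]))
    return parts
```

For example, split_terms("%H:%M") gives the term "H", the text ":" and
the term "M".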

The complete list of definitions is given below.

The lowest-level terms (like "year", but excluding terms like "%D"
which expand to other terms) are inside of named groups, which
generate the element tag and attributes when used for Martel.

For example, "%m" generates the pattern used for a month, from "01" to
"12". The name of the group is "month" and it has a single attribute
named "type" with value "numeric". (All "numeric" types can be parsed
with Python's 'int' function.) The default pattern made from "%m" is

    (?P<month?type=numeric>(0[1-9]|1[012]))

and when parsed against a month value, like "05", produces

    <month type="numeric">05</month>

The "type" attribute is used because values which mean the same thing
can be represented in different formats. The month "January" can be
represented with the word "January" (type = "long"), "Jan" (type =
"short"), "01" (type = "numeric"), "1" (type = "numeric"), or " 1"
(type = "numeric"). [Note: It is possible that subtypes may be added
in the future to distinguish between these different numeric cases.]
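
As a small check of the claim that every "numeric" form can be handed
to int, here is a sketch with a made-up helper name, numeric_value.

```python
def numeric_value(text):
    # int() ignores surrounding whitespace and leading zeros, so the
    # zero-padded, bare and space-padded forms all give the same number.
    return int(text)

# Three ways of writing January with type="numeric".
values = [numeric_value(t) for t in ("01", "1", " 1")]
```

The "long" and "short" types are names rather than digits, so they
need a lookup table instead.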


FUNCTIONS:

There are two public functions -- "make_pattern" and
"make_expression". Both take the same parameters and return a regular
expression.

  make_pattern(format, tag_format = "%s") -- returns the expression
       as a pattern string

  make_expression(format, tag_format = "%s") -- returns the expression
       as a Martel.Expression data structure (which can be used to
       make a parser)

The first parameter, "format", is the time format string already
discussed. Some examples are:

  >>> from Martel import Time
  >>> Time.make_pattern("%y")
  '(?P<year?type=short>\\\\d{2})'
  >>> Time.make_pattern("%H:%M")
  '(?P<hour?type=24-hour>([01][0-9]|2[0-3]))\\\\:(?P<minute?type=numeric>[0-5][0-9])'
  >>>

The second parameter is used if you want to change the tag name. For
example, instead of "year" you may want "year-modified" or
"start-year" -- or you may not want a tag at all.

For each term, the tag name ("year", "month", etc.) is %'ed with the
tag_format string. The default string is "%s" which effectively says
to keep the name unchanged. Here are a couple of examples which use a
different string.

  >>> from Martel import Time
  >>> Time.make_pattern("%(year)", "%s-modified")
  '(?P<year-modified?type=any>([0-9]{2}([0-9]{2})?))'
  >>> Time.make_pattern("%(year)", "start-%s")
  '(?P<start-year?type=any>([0-9]{2}([0-9]{2})?))'
  >>> Time.make_pattern("%(year)", None)
  '([0-9]{2}([0-9]{2})?)'
  >>>

The tag_format is used for every tag name, which lets you modify
several values at once. You can even pass in an object which
implements the __mod__ method to make more drastic changes to the
name.

  >>> Time.make_pattern("%H:%M", "%s-created")
  '(?P<hour-created?type=24-hour>([01][0-9]|2[0-3]))\\\\:(?P<minute-created?type=numeric>[0-5][0-9])'
  >>> class Upcase:
  ...     def __mod__(self, name):
  ...         return name.upper()
  ...
  >>> Time.make_pattern("%H:%M", Upcase())
  '(?P<HOUR?type=24-hour>([01][0-9]|2[0-3]))\\\\:(?P<MINUTE?type=numeric>[0-5][0-9])'
  >>>
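
The renaming step itself needs nothing from Martel, so it can be tried
in isolation. This is a sketch under the assumption that the tag name
is produced by nothing more than "tag_format % name", with None
meaning "no tag"; apply_tag_format is a name invented for the example.

```python
def apply_tag_format(tag_format, name):
    # None means "do not create a named group at all".
    if tag_format is None:
        return None
    # Works for format strings and for any object defining __mod__.
    return tag_format % name

class Upcase:
    # Stands in for a format string; the % operator calls __mod__.
    def __mod__(self, name):
        return name.upper()
```

Because only the % operator is used, a plain string and an Upcase
instance are interchangeable here.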

BUGS:
Only the "C" locale (essentially, US/English) is supported. Field
widths (as in "%-5d") are not supported.

There is no way to change the element attributes. I'm not sure this
is a bug.

==== Table of Date/Time Specifiers ====

%a is replaced by the pattern for the abbreviated weekday name.
  Pattern: (Mon|Tue|Wed|Thu|Fri|Sat|Sun)
           (the real pattern is case insensitive)
  Example: "Wed" "FRI"
  Element name: "weekday"
  Element attributes: "type" = "short"
  Note: %(Mon) is the same as %a
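
One way to make a name pattern case insensitive without relying on a
regular expression flag is to replace each letter with a two-character
class. The sketch below shows that idea for the weekday names; any_case
is a made-up name for this illustration, written for Python 3.

```python
import re
import string

def any_case(pattern):
    # Replace every ASCII letter with a class holding both of its cases.
    out = []
    for ch in pattern:
        if ch in string.ascii_letters:
            out.append("[%s%s]" % (ch.upper(), ch.lower()))
        else:
            out.append(ch)
    return "".join(out)

WEEKDAY = re.compile(any_case("(Mon|Tue|Wed|Thu|Fri|Sat|Sun)"))
```

The expanded pattern is longer but stays self-contained, so it can be
embedded in a larger expression whose other parts are case sensitive.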

%A is replaced by the pattern for the full weekday name.
  Pattern: (Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday)
           (the real pattern is case insensitive)
  Example: "Thursday" "SUNDAY"
  Element name: "weekday"
  Element attributes: "type" = "long"
  Note: %(Monday) is the same as %A

%b is replaced by the pattern for the abbreviated month name.
  Pattern: (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)
           (the real pattern is case insensitive)
  Example: "Oct" "AUG"
  Element name: "month"
  Element attributes: "type" = "short"
  Note: %(Jan) is the same as %b

%B is replaced by the pattern for the full month name.
  Pattern: (January|February|March|April|May|June|July|August|
            September|October|November|December)
           (the real pattern is case insensitive)
  Example: "August", "MAY"
  Element name: "month"
  Element attributes: "type" = "long"
  Note: %(January) is the same as %B

%c is replaced by the pattern for the US 24-hour date and time
 representation.
  Pattern: same as "%a %b %e %T %Y"
  Example: "Wed Dec 12 19:57:22 2001"
  Element: only uses names and attributes of the individual terms

%C is replaced by the pattern for the century number (the year divided
 by 100 and truncated to an integer) as a decimal number [00-99].
  Pattern: "[0-9][0-9]"
  Example: "19" for the years 1900 to 1999
  Element name: "century"
  Element attributes: "type" = "numeric"

%d is replaced by the pattern for a day of the month as a decimal
 number [01,31].
  Pattern: (0[1-9]|[12][0-9]|3[01])
  Example: "01", "12"
  Element name: "day"
  Element attributes: "type" = "numeric"
  Note: "%d" does not include " 1" or "1". If you also want to allow
    those then use "%(day)"

%D same as the pattern for "%m/%d/%y".
  Pattern: see "%m/%d/%y".
  Example: "12/13/01"
  Element: only uses names and attributes of the individual terms
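
Terms like "%D" are defined by other terms, so building the final
pattern is a recursive substitution. Here is a rough sketch of that
expansion using a tiny, hypothetical three-term subset of the table
(TABLE and expand are names invented for the example, and only the
single-character form is handled).

```python
import re

# A small illustrative subset; values containing "%" are composite.
TABLE = {
    "m": "(0[1-9]|1[012])",
    "d": "(0[1-9]|[12][0-9]|3[01])",
    "y": "[0-9]{2}",
    "D": "%m/%d/%y",
    "x": "%D",
}

def expand(fmt):
    # Replace each "%c" term by its pattern, recursing into composites,
    # and escape everything else so it matches literally.
    out = []
    i = 0
    while i < len(fmt):
        ch = fmt[i]
        if ch == "%" and i + 1 < len(fmt):
            pat = TABLE[fmt[i + 1]]
            if "%" in pat:
                pat = expand(pat)
            out.append(pat)
            i += 2
        else:
            out.append(re.escape(ch))
            i += 1
    return "".join(out)
```

With this table "%x" and "%D" expand to the same pattern, which
matches "12/13/99" but not "13/13/99".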

%e is replaced by the pattern for a day of the month as a decimal
 number [1,31]; a single digit is preceded by a space.
  Pattern: "( [1-9]|[12][0-9]|3[01])"
  Example: " 1", "31"
  Element name: "day"
  Element attributes: "type" = "numeric"
  Note: "%e" does not include "01" or "1". If you also want to allow
    those then use "%(day)"

%F same as the pattern for "%Y-%m-%d".
  Pattern: see "%Y-%m-%d".
  Example: "2001-12-21"
  Element: only uses names and attributes of the individual terms

%g ISO 8601 2-digit year (like %G but without the century) (00-99)
  Pattern: [0-9][0-9]
  Example: "00"
  Element name: "century"
  Element attributes: "type" = "ISO8601"

%G Pattern for the ISO 8601 year with century as a decimal number.
 The 4-digit year corresponding to the ISO week number (see %V).
 This has the same format and value as %Y, except that if the ISO
 week number belongs to the previous or next year, that year is
 used instead. (TZ)
  Pattern: [0-9][0-9][0-9][0-9]
  Example: "1954" "2001"
  Element name: "year"
  Element attributes: "type" = "ISO8601"

%h (DEPRECATED) same as %b.
  Pattern: (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)
  Example: "Feb"
  Element name: "month"
  Element attributes: "type" = "short"
  Note: %(Jan) is the same as %b is the same as %h

%H is replaced by the pattern for the hour on a 24-hour clock, as
 a decimal number [00,23].
  Pattern: ([01][0-9]|2[0-3])
  Example: "00", "01", "23"
  Element name: "hour"
  Element attributes: "type" = "24-hour"
  Note: This does not allow single digit hours like "1". If you also
    want to include those, use %(24-hour)

%I is replaced by the pattern for the hour on a 12-hour clock, as
 a decimal number [01,12].
  Pattern: (0[0-9]|1[012])
  Example: "01", "12"
  Element name: "hour"
  Element attributes: "type" = "12-hour"
  Note: This does not allow single digit hours like "1". If you also
    want to include those, use %(12-hour)

%j is replaced by the pattern for day of the year as a decimal
 number. First day is numbered "001" [001,366].
  Pattern: "([12][0-9][0-9]|3([012345][0-9]|6[0-6])|0(0[1-9]|[1-9][0-9]))"
  Example: "001", "092", "362"
  Element name: "year_day"
  Element attributes: "type" = "1"
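
The day-of-year pattern is the least obvious one in the table, so it
is worth checking against every value it is meant to accept. This is a
sketch using the standard "re" and "datetime" modules; YEAR_DAY and
year_day are names made up for the check.

```python
import datetime
import re

# The pattern listed for the day of the year.
YEAR_DAY = re.compile(
    "([12][0-9][0-9]|3([012345][0-9]|6[0-6])|0(0[1-9]|[1-9][0-9]))"
)

def year_day(date):
    # Format a date as its three-digit day of the year.
    return date.strftime("%j")

# Every value from "001" to "366" should match the pattern.
accepted = all(YEAR_DAY.fullmatch("%03d" % n) for n in range(1, 367))
```

The values "000" and "367" fall outside the range and are rejected.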

%k is replaced by the pattern for the hour on a 24-hour clock, as a
 decimal number (range 0 to 23); single digits are preceded by a
 blank.
  Pattern: "( [0-9]|1[0-9]|2[0123])"
  Example: " 1", "10", "23"
  Element name: "hour"
  Element attributes: "type" = "24-hour"
  Note: This does not allow single digit hours like "1" or hours which
    start with a "0" like "03". If you also want to include those,
    use %(24-hour). See also %H.

%l is replaced by the pattern for the hour on a 12-hour clock, as a
 decimal number (range 1 to 12); single digits are preceded by a
 blank.
  Pattern: "( [0-9]|1[012])"
  Example: " 1", "10", "12"
  Element name: "hour"
  Element attributes: "type" = "12-hour"
  Note: This does not allow single digit hours like "1" or hours which
    start with a "0" like "03". If you also want to include those,
    use %(12-hour). See also %I.

%m is replaced by the pattern for the month as a decimal number [01,12].
  Pattern: "(0[1-9]|1[012])"
  Example: "01", "09", "12"
  Element name: "month"
  Element attributes: "type" = "numeric"
  Note: This does not allow single digit months like "1" or months which
    start with a space like " 3". If you also want to include those,
    use %(month). See also %(MM), which is an alias for %m.

%M is replaced by the pattern for the minute as a decimal number [00,59].
  Pattern: "[0-5][0-9]"
  Example: "00", "38"
  Element name: "minute"
  Element attributes: "type" = "numeric"
  Note: this is the same as %(minute)

%n is replaced by the pattern for the newline character.
  Pattern: "\\n"
  Note: you shouldn't need to use this

%p is replaced by the case insensitive pattern for "AM" or "PM"
  Pattern: "([AaPp][Mm])"
  Example: "AM", "pm"
  Element name: "ampm"
  Element attributes: no attributes
  Note: this doesn't allow "a.m." or "P.M."

%P is identical to "%p" (they have slightly different meanings for output)
  Pattern: "([AaPp][Mm])"
  Example: "am", "PM"
  Element name: "ampm"
  Element attributes: no attributes
  Note: this doesn't allow "a.m." or "P.M."

%r is equivalent to "%I:%M:%S %p".
  Pattern: see the patterns for the individual terms
  Example: "07:57:22 PM"
  Element: only uses names and attributes of the individual terms

%R is the pattern for the 24 hour notation "%H:%M".
  Pattern: see the patterns for the individual terms
  Example: "19:57"
  Element: only uses names and attributes of the individual terms

%s is the pattern for a decimal number of seconds (Unix timestamp)
  Pattern: "[0-9]+"
  Example: "1008205042"
  Element name: "timestamp"
  Element attributes: no attributes

%S is replaced by the pattern for the second as a decimal number.
 Can take values from "00" to "61" (includes double leap seconds).
  Pattern: "([0-5][0-9]|6[01])"
  Example: "03", "25"
  Element name: "second"
  Element attributes: "type" = "numeric"
  Note: This is the same as %(second)

%t is replaced by a tab character. (plat-spec)
  Pattern: "\\t"
  Note: You shouldn't need to use this.

%T is identical to the 24-hour time format "%H:%M:%S".
  Pattern: see the patterns for the individual terms
  Example: "19:57:22"
  Element: only uses names and attributes of the individual terms

%u is replaced by the pattern for the weekday as a decimal number
 [1,7], with "1" representing Monday.
  Pattern: "[1-7]"
  Example: "4" (which is Thursday)
  Element name: "weekday"
  Element attributes: "type" = "Monday1"
  Note: See also %w, which has a type of "Sunday0"

%U is replaced by the pattern for the week number of the year (Sunday
 as the first day of the week) as a decimal number [00,53]. In
 other words, this is the number of Sundays seen so far in the year.
  Pattern: "([0-4][0-9]|5[0-3])"
  Example: "04", "26"
  Element name: "week_number"
  Element attributes: "type" = "Sunday_count"
  Note: See also %V and %W

%V is replaced by the pattern for the week number of the year (Monday
 as the first day of the week) as a decimal number [01,53]. This is
 used for week numbers where if the week containing 1 January has four
 or more days in the new year, then it is considered week 1. (Otherwise,
 it is the last week of the previous year, and the next week is week 1.)
  Pattern: "(0[1-9]|[1-4][0-9]|5[0-3])"
  Example: "04", "33"
  Element name: "week_number"
  Element attributes: "type" = "type_V" (Got a better short name?)
  Note: See also %U and %W. I don't know when to use this.
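
The week-1 rule described for %V is the ISO 8601 week numbering, which
Python exposes through date.isocalendar(). The sketch below uses that
standard method to show the year and week that %G and %V refer to;
iso_year_and_week is a name invented for the example.

```python
import datetime

def iso_year_and_week(date):
    # isocalendar() returns (ISO year, ISO week number, ISO weekday).
    iso = date.isocalendar()
    return iso[0], iso[1]

# 1 January 2020 was a Wednesday, so its week has five days in the
# new year and counts as week 1 of 2020.
first_2020 = iso_year_and_week(datetime.date(2020, 1, 1))

# 1 January 2021 was a Friday, so its week has only three days in the
# new year and still belongs to the last week of 2020.
first_2021 = iso_year_and_week(datetime.date(2021, 1, 1))
```

This is also why the ISO year can differ from the calendar year for a
few days around the new year.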

%w is replaced by the pattern for the weekday as a decimal number [0,6],
 with 0 representing Sunday.
  Pattern: "[0-6]"
  Example: "6"
  Element name: "weekday"
  Element attributes: "type" = "Sunday0"
  Note: See also %u, which has a type of "Monday1"

%W is replaced by the pattern for the week number of the year (Monday
 as the first day of the week) as a decimal number [00,53]. All days
 in a new year preceding the first Monday are considered to be in
 week 0. In other words, this is the number of Mondays seen so far
 in the year.
  Pattern: "([0-4][0-9]|5[0-3])"
  Example: "00", "49"
  Element name: "week_number"
  Element attributes: "type" = "Monday_count"
  Note: See also %U and %V.
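
The "number of Sundays (or Mondays) seen so far" reading of %U and %W
can be checked by simply counting. This is a deliberately naive
sketch; count_weekday and week_numbers are names made up here, and the
comparison with strftime assumes the platform's C library follows the
standard definitions.

```python
import datetime

def count_weekday(date, weekday):
    # Count how often weekday (Monday=0 ... Sunday=6) occurs from
    # 1 January of date's year up to and including date.
    day = datetime.date(date.year, 1, 1)
    count = 0
    while day <= date:
        if day.weekday() == weekday:
            count += 1
        day += datetime.timedelta(days=1)
    return count

def week_numbers(date):
    # Return the (%U, %W) values as integers.
    return count_weekday(date, 6), count_weekday(date, 0)
```

For Wednesday 12 December 2001 this gives 49 Sundays and 50 Mondays,
so "%U" is "49" and "%W" is "50".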

%x is the same as "%D", which is "%m/%d/%y".
  Pattern: see the patterns for the individual terms
  Example: "12/13/99"
  Element: only uses names and attributes of the individual terms

%X is the same as "%T", which is "%H:%M:%S".
  Pattern: see the patterns for the individual terms
  Example: "19:57:22"
  Element: only uses names and attributes of the individual terms

%y is replaced by the pattern for the year without century, as a
 decimal number [00,99].
  Pattern: "[0-9][0-9]"
  Example: "89", "01"
  Element name: "year"
  Element attributes: "type" = "short"
  Note: This is the same as %(YY).

%Y is replaced by the pattern for the year, including the century, as a
 decimal number.
  Pattern: "[0-9][0-9][0-9][0-9]"
  Example: "1610", "2002"
  Element name: "year"
  Element attributes: "type" = "long"
  Note: This is the same as %(YYYY).

%z is replaced by the pattern for the time-zone as hour offset from GMT.
 (This is used when parsing RFC822-conformant dates, as in
 "%a, %d %b %Y %H:%M:%S %z", except that %z does not include the
 pattern for a missing timezone -- should I fix that?).
  Pattern: "[-+][0-9][0-9][0-9][0-9]"
  Example: "-0500" (for EST), "+0100" (for CET), "+0530" (somewhere in India)
  Element name: "timezone"
  Element attributes: "type" = "RFC822"
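
Once a "timezone" element of type "RFC822" has been extracted, turning
it into a number is straightforward. The following sketch assumes the
text has the sign-plus-four-digits shape given above; OFFSET and
rfc822_offset_minutes are names invented for the example.

```python
import re

# Sign followed by two digits of hours and two digits of minutes.
OFFSET = re.compile("[-+][0-9][0-9][0-9][0-9]")

def rfc822_offset_minutes(text):
    # Return the offset from GMT in minutes (negative means west).
    if OFFSET.fullmatch(text) is None:
        raise ValueError("not an RFC822 numeric offset: %r" % (text,))
    sign = -1 if text[0] == "-" else 1
    return sign * (int(text[1:3]) * 60 + int(text[3:5]))
```

The three examples above come out as -300, 60 and 330 minutes.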

%Z is replaced by a pattern for a timezone name or abbreviation. (It does
 not allow a missing timezone field.)
  Pattern: "(GMT([+-][0-9][0-9][0-9][0-9])?|[A-Z][a-zA-Z]*( [A-Z][a-zA-Z]*)*)"
       (is there anything better?)
  Example: "MST", "GMT", "Pacific Standard Time", "GRNLNDST", "MET DST",
       "New Zealand Standard Time", "NZST", "SAST", "GMT+0200", "IDT"
  Element name: "timezone"
  Element attributes: "type" = "name"
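
Since the timezone name pattern is admittedly a guess, a quick check
against the listed examples is useful. This sketch uses the standard
"re" module; TIMEZONE_NAME and is_timezone_name are names made up for
the check.

```python
import re

# The pattern listed for a timezone name or abbreviation.
TIMEZONE_NAME = re.compile(
    "(GMT([+-][0-9][0-9][0-9][0-9])?|[A-Z][a-zA-Z]*( [A-Z][a-zA-Z]*)*)"
)

def is_timezone_name(text):
    # True when the whole text is accepted by the pattern.
    return TIMEZONE_NAME.fullmatch(text) is not None

EXAMPLES = [
    "MST", "GMT", "Pacific Standard Time", "GRNLNDST", "MET DST",
    "New Zealand Standard Time", "NZST", "SAST", "GMT+0200", "IDT",
]
```

Names that start with a lowercase letter, such as "est", and offsets
on anything other than GMT, such as "UTC+2", are not accepted.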

%% is replaced by the pattern for "%" (which happens to be "%")
  Pattern: "%"
  Example: "%"
  Element: none

=== Martel specific extensions ===

%(Mon) is the same as "%a".
  Pattern: See the definition for "%a"
  Example: "Wed" "FRI"
  Element name: "weekday"
  Element attributes: "type" = "short"

%(Monday) is the same as "%A".
  Pattern: See the definition for "%A"
  Example: "Thursday" "SUNDAY"
  Element name: "weekday"
  Element attributes: "type" = "long"

%(Jan) is the same as "%b".
  Pattern: See the definition for "%b"
  Example: "Feb"
  Element name: "month"
  Element attributes: "type" = "short"

%(January) is the same as "%B".
  Pattern: See the definition for "%B"
  Example: "August", "MAY"
  Element name: "month"
  Element attributes: "type" = "long"

%(second) is the same as "%S".
  Pattern: See the definition for "%S".
  Example: "03", "25"
  Element name: "second"
  Element attributes: "type" = "numeric"

%(minute) is the same as "%M".
  Pattern: See the definition for "%M"
  Example: "00", "38"
  Element name: "minute"
  Element attributes: "type" = "numeric"

%(12-hour) is replaced by the pattern for a 12 hour clock in any of
 the common formats. (Numeric values from 1 to 12.)
  Pattern: "(0[1-9]|1[012]?|[2-9]| [1-9])"
  Example: "2", "02", " 2", "10"
  Element name: "hour"
  Element attributes: "type" = "12-hour"

%(24-hour) is replaced by the pattern for a 24 hour clock in any
 of the common formats. (Numeric values from 0 to 23.)
  Pattern: "([01][0-9]?|2[0123]?|[3-9]| [0-9])"
  Example: "9", "09", " 9", "00", "0", " 0", "23"
  Element name: "hour"
  Element attributes: "type" = "24-hour"

%(hour) is replaced by the pattern for any hour in either a
 12-hour or 24-hour clock.
  Pattern: "([01][0-9]?|2[0123]?|[3-9]| [0-9])"
       (this happens to be the same as %(24-hour))
  Example: "9", "09", " 9", "00", "0", " 0", "23"
  Element name: "hour"
  Element attributes: "type" = "any"

%(day) is replaced by the pattern for the day of the month as a decimal
 in any of the common day formats.
  Pattern: "(0[1-9]|[12][0-9]?|3[01]?|[4-9]| [1-9])"
  Example: "9", "09", " 9", and "31"
  Element name: "day"
  Element attributes: "type" = "numeric"

%(DD) is the same as "%d", which is the pattern for a day of the month
 as a decimal number [01,31].
  Pattern: See the definition for "%d"
  Example: "09", "31"
  Element name: "day"
  Element attributes: "type" = "numeric"

%(month) is replaced by the pattern for the month as a decimal in any
 of the common month formats.
  Pattern: "(0[1-9]|1[012]?|[2-9]| [1-9])"
  Example: "5", "05", " 5", and "12".
  Element name: "month"
  Element attributes: "type" = "numeric"
  Note: See also "%m" and %(MM).

%(MM) is the same as "%m", which is a two-digit month number [01,12]
  Pattern: See the definition for "%m"
  Example: "05", "01", and "12".
  Element name: "month"
  Element attributes: "type" = "numeric"
  Note: See also %(month).

%(YY) is the same as "%y", which is a two-digit year without century.
  Pattern: "[0-9][0-9]"
  Example: "10"
  Element name: "year"
  Element attributes: "type" = "short"

%(YYYY) is the same as "%Y", which is a four-digit year with century.
  Pattern: "[0-9][0-9][0-9][0-9]"
  Example: "1970"
  Element name: "year"
  Element attributes: "type" = "long"

%(year) is replaced by the pattern accepting 2 digit and 4 digit year formats.
  Pattern: "([0-9]{2}([0-9]{2})?)"
  Example: "2008", "97"
  Element name: "year"
  Element attributes: "type" = "any"
  Note: Need to change this before the year 10,000

"""
import string, Martel, Expression


# Turn a pattern into one which matches its letters in either case.
def _any_case(s):
    t = ""
    for c in s:
        if c in string.letters:
            t = t + "[%s%s]" % (string.upper(c), string.lower(c))
        else:
            t = t + c
    return t

# Each entry is (term, pattern, element name, element attributes).
_time_fields = (
    ("a", _any_case("(Mon|Tue|Wed|Thu|Fri|Sat|Sun)"),
     "weekday", {"type": "short"}),
    ("A", _any_case("(Monday|Tuesday|Wednesday|Thursday|Friday|"
                    "Saturday|Sunday)"),
     "weekday", {"type": "long"}),
    ("b", _any_case("(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)"),
     "month", {"type": "short"}),
    ("B", _any_case("(January|February|March|April|May|June|July|August|"
                    "September|October|November|December)"),
     "month", {"type": "long"}),
    ("C", r"\d\d",
     "century", {"type": "numeric"}),
    ("d", "(0[1-9]|[12][0-9]|3[01])",
     "day", {"type": "numeric"}),
    ("e", "( [1-9]|[12][0-9]|3[01])",
     "day", {"type": "numeric"}),
    ("g", r"\d{2}",
     "century", {"type": "ISO8601"}),
    ("G", r"\d{4}",
     "year", {"type": "ISO8601"}),
    ("H", "([01][0-9]|2[0-3])",
     "hour", {"type": "24-hour"}),
    ("I", "(0[0-9]|1[012])",
     "hour", {"type": "12-hour"}),

    ("j", "([12][0-9][0-9]|3([012345][0-9]|6[0-6])|0(0[1-9]|[1-9][0-9]))",
     "year_day", {"type": "1"}),

    ("k", "( [0-9]|1[0-9]|2[0123])",
     "hour", {"type": "24-hour"}),
    ("l", "( [0-9]|1[012])",
     "hour", {"type": "12-hour"}),
    ("m", "(0[1-9]|1[012])",
     "month", {"type": "numeric"}),
    ("M", "[0-5][0-9]",
     "minute", {"type": "numeric"}),
    ("n", r"\n", None, None),
    ("p", "([AaPp][Mm])",
     "ampm", {}),
    ("P", "[aApP][mM]",
     "ampm", {}),
    ("s", r"\d+",
     "timestamp", {}),
    ("S", "([0-5][0-9]|6[01])",
     "second", {"type": "numeric"}),
    ("t", r"\t", None, None),
    ("u", "[1-7]",
     "weekday", {"type": "Monday1"}),
    ("U", "([0-4][0-9]|5[0-3])",
     "week_number", {"type": "Sunday_count"}),
    ("V", "(0[1-9]|[1-4][0-9]|5[0-3])",
     "week_number", {"type": "type_V"}),
    ("w", "[0-6]",
     "weekday", {"type": "Sunday0"}),
    ("W", "([0-4][0-9]|5[0-3])",
     "week_number", {"type": "Monday_count"}),
    ("y", r"\d{2}",
     "year", {"type": "short"}),
    ("Y", r"\d{4}",
     "year", {"type": "long"}),
    ("z", r"[-+]\d{4}",
     "timezone", {"type": "RFC822"}),

    ("Z", r"(GMT([+-]\d{4})?|[A-Z][a-zA-Z]*( [A-Z][a-zA-Z]*)*)",
     "timezone", {"type": "name"}),
    ("%", "%", None, None),

    # Composite terms, defined by other terms.
    ("D", "%m/%d/%y", None, None),
    ("F", "%Y-%m-%d", None, None),
    ("h", "%b", None, None),
    ("r", "%I:%M:%S %p", None, None),
    ("R", "%H:%M", None, None),
    ("T", "%H:%M:%S", None, None),
    ("x", "%D", None, None),
    ("X", "%T", None, None),
    ("c", "%a %b %e %T %Y", "date", {}),

    # Martel-specific extensions.
    ("Mon", "%a", None, None),
    ("Monday", "%A", None, None),
    ("Jan", "%b", None, None),
    ("January", "%B", None, None),
    ("second", "%S", None, None),
    ("minute", "%M", None, None),
    ("12-hour", r"(0[1-9]|1[012]?|[2-9]| [1-9])",
     "hour", {"type": "12-hour"}),
    ("24-hour", r"([01][0-9]?|2[0123]?|[3-9]| [0-9])",
     "hour", {"type": "24-hour"}),
    ("hour", r"([01][0-9]?|2[0123]?|[3-9]| [0-9])",
     "hour", {"type": "any"}),
    ("day", r"(0[1-9]|[12][0-9]?|3[01]?|[4-9]| [1-9])",
     "day", {"type": "numeric"}),
    ("DD", "%d", None, None),
    ("month", r"(0[1-9]|1[012]?|[2-9]| [1-9])", "month", {"type": "numeric"}),
    ("MM", "%m", None, None),
    ("YY", r"[0-9]{2}", "year", {"type": "short"}),
    ("YYYY", r"[0-9]{4}", "year", {"type": "long"}),
    ("year", r"([0-9]{2}([0-9]{2})?)", "year", {"type": "any"}),
    )
_time_table = {}
for spec, pat, tag, attrs in _time_fields:
    _time_table[spec] = (pat, tag, attrs)
# Sanity check: a pattern containing "|" must be wrapped in parentheses
# so it can be safely concatenated with its neighbors.
for v in _time_table.values():
    v = v[0]
    assert (v[0] == '(' and v[-1] == ')') or '|' not in v, v

def make_pattern(format, tag_format = "%s"):
    """format, tag_format = "%s" -> regular expression pattern string

    Turn the given time format string into the corresponding regular
    expression string. A format term may contain a group name and attribute
    information. If present, the group name is %'ed with the
    tag_format to produce the tag name to use. Use None to specify
    that named groups should not be used.

    >>> from Martel import Time
    >>> print Time.make_pattern("%m-%Y)", "created-%s")
    (?P<created-month?type=numeric>(0[1-9]|1[012]))\\-(?P<created-year?type=long>\\d{4})\\)
    >>>

    See the Time module docstring for more information.

    """
    return _parse_time(format, tag_format,
                       text_to_result = Expression.escape,
                       group_to_result = Expression._make_group_pattern,
                       re_to_result = lambda x: x,
                       t = "")

def make_expression(format, tag_format = "%s"):
    """format, tag_format = "%s" -> Martel Expression

    Turn the given time format string into the corresponding Martel
    Expression. A format term may contain a group name and attribute
    information. If present, the group name is %'ed with the
    tag_format to produce the tag name to use. Use None to specify
    that named groups should not be used.

    >>> from Martel import Time
    >>> from xml.sax import saxutils
    >>> exp = Time.make_expression("%m-%Y\\n", "created-%s")
    >>> parser = exp.make_parser()
    >>> parser.setContentHandler(saxutils.XMLGenerator())
    >>> parser.parseString("05-1921\\n")
    <?xml version="1.0" encoding="iso-8859-1"?>
    <created-month type="numeric">05</created-month>-<created-year type="long">1921</created-year>
    >>>

    See the Time module docstring for more information.

    """
    return _parse_time(format, tag_format,
                       text_to_result = Martel.Str,
                       group_to_result = Martel.Group,
                       re_to_result = Martel.Re,
                       t = Martel.NullOp())

# Apply tag_format to a tag name; None means "no named group".
def _use_tag_format(tag_format, name):
    if tag_format is None:
        return None
    return tag_format % name

# Shared worker for make_pattern and make_expression.
#   text_to_result(text) converts literal text from the format string
#   group_to_result(name, pat, attrs) wraps a result in a named group
#   re_to_result(pattern) converts a pattern string from the table
#   t is the initial (empty) result which the pieces are added to
def _parse_time(s, tag_format, text_to_result, group_to_result,
                re_to_result, t):
    initial_t = t
    n = len(s)
    end = 0
    while end < n:
        prev = end
        # Find the next term
        start = string.find(s, "%", end)
        if start == -1:
            break
        end = start + 1
        # A lone "%" at the end of the string is treated as text
        if end == n:
            end = prev
            break

        c = s[end]
        # Is this a %(word) term?
        if c == '(':
            pos = string.find(s, ")", end)
            if pos != -1:
                # The name is the text between the parentheses
                c = s[end+1:pos]
            else:
                raise TypeError("Found a '%%(' but no matching ')': %s" % \
                                (repr(s[end-1:]),))
            end = pos

        # Composite terms are expanded recursively
        pat, name, attrs = _time_table[c]
        if "%" in pat and pat != "%":
            pat = _parse_time(pat, tag_format,
                              text_to_result, group_to_result,
                              re_to_result, initial_t)
        else:
            pat = re_to_result(pat)
        if name is not None:
            fullname = _use_tag_format(tag_format, name)
            if fullname:
                pat = group_to_result(fullname, pat, attrs)

        # Add any literal text before the term, then the term itself
        if prev + 1 > start:
            t = t + pat
        else:
            t = t + text_to_result(s[prev:start]) + pat
        end = end + 1

    # Add any literal text after the last term
    if end < n:
        t = t + text_to_result(s[end:])
    return t
