Clean up Attributed String

If you simply want to match any <font …> tag, you can use "<font [^>]++>".

For any <font …> tag containing a ‘color=#hhhhhh’ element: "<font [^>]*color=#[[:hex:]]{6}[^>]*+>".

For any <font …> tag specifically ending with such an element: "<font [^>]*color=#[[:hex:]]{6}>".
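For anyone wanting to try the three patterns outside ICU, here is a minimal sketch in Python. It rests on two assumptions that are not part of the patterns above: Python's re module does not recognise POSIX-style bracket expressions, so [0-9A-Fa-f] stands in for [[:hex:]], and the possessive quantifiers (++ and *+) are replaced with ordinary greedy ones, since they only affect backtracking here and older Python versions reject them. The names HEX, ANY_FONT, FONT_WITH_COLOR, FONT_ENDING_WITH_COLOR and classify are purely illustrative.

```python
import re

# Portable stand-in for [[:hex:]] (assumption: Python's re has no POSIX classes).
HEX = "[0-9A-Fa-f]"

# The three patterns from the post, with greedy instead of possessive quantifiers.
ANY_FONT = re.compile(r"<font [^>]+>")
FONT_WITH_COLOR = re.compile(rf"<font [^>]*color=#{HEX}{{6}}[^>]*>")
FONT_ENDING_WITH_COLOR = re.compile(rf"<font [^>]*color=#{HEX}{{6}}>")


def classify(tag):
    """Return which of the three patterns match the whole tag, in order."""
    return (
        bool(ANY_FONT.fullmatch(tag)),
        bool(FONT_WITH_COLOR.fullmatch(tag)),
        bool(FONT_ENDING_WITH_COLOR.fullmatch(tag)),
    )


if __name__ == "__main__":
    for tag in (
        '<font face="Helvetica">',      # no colour element at all
        "<font color=#ff0000 size=3>",  # colour element in the middle
        "<font size=3 color=#00FF7f>",  # colour element at the end
    ):
        print(tag, classify(tag))
```

The first sample matches only the general pattern, the second also matches the "containing" pattern, and the third matches all three.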

Many thanks!

This makes total sense.
One question, though: where would I find out that I can use something like

[[:hex:]]

That’s really odd! :face_with_raised_eyebrow: I used [[:hex:]] from memory and it worked (and still works) when tested. I thought it was one of the POSIX bracket expressions, which are listed in man grep and on the Regular-Expressions.info site and are recognised by ICU regex. But it’s not one of those. The bracket expression for a hexadecimal digit in those places is [[:xdigit:]], which also works here. I must have seen [[:hex:]] somewhere in the past, but I don’t know where now. :confused: Apologies if I’ve given you something which isn’t officially supported. If in doubt, you can use [0-9A-Fa-f] instead.

It’s a legitimate alias for Hex_Digit. See:

http://www.unicode.org/Public/UCD/latest/ucd/PropertyAliases.txt

In non-POSIX terms, you would use \p{Hex_Digit}.

Thanks, Shane.

So taken in conjunction with what it says in the ICU Regex Guide, it’s an alternate [sic] POSIX-like syntax for a set expression matching an acceptable abbreviation of the Unicode property “Hex_Digit”? What a stroke of luck! :wink:

And the syntax is only POSIX-like. It doesn’t necessarily have to be inside a character set, as it would be in a shell script. So [[:hex:]] can be just [:hex:] here. We’re certainly spoiled for choice!

"[:hex:]" or "[[:hex:]]" (“hex” can be written in any combination of cases)
"[:hex_digit:]" (likewise)
"[:xdigit:]" (likewise)
"\\p{hex}" (likewise)
"\\p{hex_digit}" (likewise)
"\\p{xdigit}" (likewise)
"[0-9A-Fa-f]" or "[[0-9][A-F][a-f]]" (case sensitive, but both cases are covered)
"(?i:[0-9a-f])" or "[[0-9](?i:[a-f])]" (case insensitive, but the cases in the given letter range must be the same)
"[\\d(?i:[a-f])]" or "[\\p{nd}(?i:[a-f])]" or "[\\p{number}(?i:[a-f])]" or "[\\p{decimal number}(?i:[a-f])]" or "(?:\\d|(?i:[a-f]))" etc.!
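A quick way to check whether a handful of these alternatives really accept the same characters is to run each one over a fixed alphabet. The sketch below does that in Python rather than ICU, so it only covers the spellings Python's re module understands (explicit ranges, the scoped (?i:…) group, and the \d alternation); the [:hex:] and \p{…} forms are left out because re does not support them. The names CANDIDATES and matched_ascii are made up for the illustration.

```python
import re
import string

# Three of the spellings listed above that Python's re module also accepts.
CANDIDATES = {
    "explicit ranges": r"[0-9A-Fa-f]",
    "scoped case-insensitive": r"(?i:[0-9a-f])",
    "digit or letter": r"(?:\d|(?i:[a-f]))",
}


def matched_ascii(pattern):
    """Return every printable ASCII character the pattern matches on its own."""
    rx = re.compile(pattern)
    return "".join(ch for ch in string.printable if rx.fullmatch(ch))


if __name__ == "__main__":
    for label, pattern in CANDIDATES.items():
        print(f"{label:24} {pattern:22} {matched_ascii(pattern)}")

    # Outside ASCII the spellings can diverge: in Python, \d on a str pattern
    # also matches other Unicode decimal digits, e.g. ARABIC-INDIC DIGIT THREE.
    arabic_three = "\u0663"
    print(bool(re.fullmatch(CANDIDATES["digit or letter"], arabic_three)))
    print(bool(re.fullmatch(CANDIDATES["explicit ranges"], arabic_three)))
```

On ASCII input all three agree, but the last two lines show that the \d-based spelling is broader in Python. It would be worth running a similar check against ICU before treating the \d and \p{nd} variants as exact equivalents of [0-9A-Fa-f].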

The perfect ingredients for write-only code :hole:

From "[[0-9][A-F][a-f]]" on, certainly. The options before that seem pretty self-explanatory and allow writers to use what they happen to know. Even some of the later stuff may make sense in a broader context.

No question. It’s the reader I was thinking about.

Well, all in all, a really detailed answer :wink: