[cvsspam-devel] diffs not character safe
David Holroyd
dave at badgers-in-foil.co.uk
Wed Mar 7 16:05:54 UTC 2007
On Tue, Mar 06, 2007 at 12:35:36PM +0200, Elan Ruusamäe wrote:
> It appears that when --charset utf-8 is passed to collect_diffs, the diffs
> are computed bytewise rather than characterwise
You are correct. The --charset option only sets up the email headers
with the given value; it's not used during processing at all.
> and as cvsspam appears to colour the changed part of a line darker, it breaks
> multibyte characters
>
> so if the diff would be:
> - 'map_tab_label' => '??????????',
> + 'map_tab_label' => '??????????',
>
> cvsspam starts highlighting after the first byte of the letter 'k', because
> the first byte of its encoding is the same in both lines.
I hadn't considered that possibility. Maybe the within-a-line colouring
should be disabled when a multibyte encoding is detected?
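
A minimal sketch of that fallback, assuming a simple "any byte outside
7-bit ASCII" test is an acceptable way to detect multibyte content. The
method name bytewise_highlight_safe? is hypothetical and is not part of
cvsspam:

```ruby
# Hypothetical helper, not cvsspam code: report whether a pair of diff
# lines can be compared byte by byte without splitting a character.
# Assumption: any byte >= 0x80 may belong to a multibyte sequence.
def bytewise_highlight_safe?(old_line, new_line)
  [old_line, new_line].all? do |line|
    line.bytes.all? { |byte| byte < 0x80 }
  end
end

# Within-line colouring would only be attempted when this returns true;
# otherwise the whole line is marked as changed.
```

For example, U+00E4 and U+00F6 are encoded in UTF-8 as C3 A4 and C3 B6,
so a bytewise comparison sees a shared first byte and would start the
highlight in the middle of the character. The check above rejects such
lines.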
I don't know a huge amount about handling multibyte encodings in Ruby,
but have the impression that it's a bit of a black art (until Ruby 2
comes out). Fixing this might require a rewrite of the highlighting
code, and that code is a horrible mess. I am scared of it :(
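
For comparison, a rough sketch of what a character-aware version might
look like. It assumes a Ruby with encoding-aware strings (1.9 or later)
and that the --charset value is a name Ruby recognises. changed_span is
a hypothetical name and this is not how the existing highlighting code
is structured:

```ruby
# Hypothetical sketch, not cvsspam code: split two lines into a common
# prefix, the differing middle of each line, and a common suffix,
# comparing whole characters instead of bytes.
def changed_span(old_line, new_line, charset = "UTF-8")
  # dup so the caller's strings are not re-tagged in place
  a = old_line.dup.force_encoding(charset).chars
  b = new_line.dup.force_encoding(charset).chars
  limit = [a.length, b.length].min

  # Length of the shared prefix, in characters.
  prefix = 0
  prefix += 1 while prefix < limit && a[prefix] == b[prefix]

  # Length of the shared suffix, not overlapping the prefix.
  suffix = 0
  suffix += 1 while suffix < limit - prefix && a[-1 - suffix] == b[-1 - suffix]

  [
    a[0, prefix].join,                            # common prefix
    a[prefix, a.length - prefix - suffix].join,   # changed part, old line
    b[prefix, b.length - prefix - suffix].join,   # changed part, new line
    a[a.length - suffix, suffix].join             # common suffix
  ]
end
```

Only the two middle elements would be wrapped in the darker colour, so a
character that differs in its second byte is highlighted as a whole.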
--
http://david.holroyd.me.uk/