[cvsspam-devel] diffs not character safe

David Holroyd dave at badgers-in-foil.co.uk
Wed Mar 7 16:05:54 UTC 2007


On Tue, Mar 06, 2007 at 12:35:36PM +0200, Elan Ruusamäe wrote:
> It appears that when --charset utf-8 is passed to collect_diffs, the diffs
> are computed bytewise rather than characterwise.

You are correct.  The --charset option only sets up the email headers
with the given value; it's not used during processing at all.


> And as cvsspam appears to colour the changed parts of a modified line
> darker, it breaks multibyte characters.
> 
> So if the diff is:
> -	'map_tab_label'			=> '??????????',
> +	'map_tab_label'			=> '??????????',
> 
> cvsspam starts highlighting after the first byte of the letter 'k', because
> the first byte of its UTF-8 encoding is the same in both lines.
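The failure described above can be reproduced with a byte-oriented common-prefix scan, which is roughly what a highlighter does if it never looks at the charset. This is an illustration only, not cvsspam's code: `common_prefix_bytes` is a hypothetical helper, and the two sample strings are stand-ins because the original characters did not survive the archive. The two letters share their UTF-8 lead byte, so the scan stops one byte into a two-byte character.

```ruby
# Hypothetical illustration of the bug: compare two UTF-8 lines byte by
# byte, as a charset-unaware highlighter would.
def common_prefix_bytes(a, b)
  ab = a.bytes
  bb = b.bytes
  n = 0
  n += 1 while n < ab.size && n < bb.size && ab[n] == bb[n]
  n
end

# Stand-in strings (the originals were lost in the archive):
old_line = "=> '\u0430'"  # Cyrillic small a,  UTF-8 bytes D0 B0
new_line = "=> '\u043a'"  # Cyrillic small ka, UTF-8 bytes D0 BA

cut = common_prefix_bytes(old_line, new_line)
puts cut                                          # 5: four ASCII bytes plus the shared lead byte D0
puts new_line.byteslice(0, cut).valid_encoding?   # false: the highlight boundary is mid-character
```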

I hadn't considered that possibility.  Maybe the within-a-line colouring
should be disabled when a multibyte encoding is detected?
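The workaround suggested above might look something like this sketch. It is hypothetical: `MULTIBYTE_CHARSETS`, `non_ascii?` and `intraline_highlight_ok?` are invented names, the charset list is illustrative and incomplete, and leaving the current behaviour alone when no --charset is given is an assumption. Colouring is only switched off when a line actually contains non-ASCII bytes, so purely ASCII edits keep their within-line highlighting.

```ruby
# Hypothetical sketch: decide whether within-a-line colouring is safe.
# Illustrative, incomplete list of multibyte charsets (not from cvsspam).
MULTIBYTE_CHARSETS = %w[utf-8 euc-jp shift_jis euc-kr gb2312 big5].freeze

# True if any byte is outside 7-bit ASCII, i.e. a multibyte
# sequence could be present and be split by a byte-level highlight.
def non_ascii?(line)
  line.each_byte.any? { |b| b > 0x7F }
end

def intraline_highlight_ok?(charset, old_line, new_line)
  return true if charset.nil?  # assumption: no --charset, keep current behaviour
  return true unless MULTIBYTE_CHARSETS.include?(charset.downcase)
  # Fall back to whole-line colouring only when splitting could hurt.
  !(non_ascii?(old_line) || non_ascii?(new_line))
end
```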

I don't know a huge amount about handling multibyte encodings in Ruby,
but I have the impression that it's a bit of a black art (until Ruby 2
comes out).  Fixing this might require a rewrite of the highlighting
code, and that code is a horrible mess.  I am scared of it :(
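For comparison, a character-aware split of a changed line is sketched below. It assumes a Ruby with encoding-aware strings (1.9 or later, where `String#force_encoding`, `#valid_encoding?` and `#chars` work in terms of the string's encoding). `split_change` is a hypothetical name, not part of cvsspam. It returns the common prefix, the two changed middles and the common suffix, measured in characters of the given charset. If either line does not decode in that charset, it treats the whole line as changed. Note that an unknown charset name makes `force_encoding` raise `ArgumentError`, which a real caller would have to handle.

```ruby
# Hypothetical sketch: split two lines into
# [common prefix, old middle, new middle, common suffix],
# counting characters of `charset` instead of bytes.
def split_change(old_line, new_line, charset)
  a = old_line.dup.force_encoding(charset)
  b = new_line.dup.force_encoding(charset)
  # Undecodable input: no within-line split, whole line counts as changed.
  return ["", old_line, new_line, ""] unless a.valid_encoding? && b.valid_encoding?

  ac = a.chars
  bc = b.chars
  pre = 0
  pre += 1 while pre < ac.size && pre < bc.size && ac[pre] == bc[pre]
  # Common suffix, never overlapping the prefix already consumed.
  suf = 0
  suf += 1 while suf < ac.size - pre && suf < bc.size - pre &&
                 ac[-1 - suf] == bc[-1 - suf]
  [ac[0...pre].join,
   ac[pre...(ac.size - suf)].join,
   bc[pre...(bc.size - suf)].join,
   ac[(ac.size - suf)..-1].join]
end

# Example: the changed letter is isolated as a whole character.
p split_change("=> '\u0430'", "=> '\u043a'", "utf-8")
```

Because the boundaries are character indices, the highlighted middle can never begin or end inside a multibyte sequence, which is the property the byte-oriented scan lacks.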


-- 
http://david.holroyd.me.uk/


