[cvsspam-devel] diffs not character safe

David Holroyd dave at badgers-in-foil.co.uk
Wed Mar 7 23:59:59 UTC 2007


On Wed, Mar 07, 2007 at 09:06:28PM +0200, Elan Ruusamäe wrote:
> On Wednesday 07 March 2007 19:41:45 David Holroyd wrote:
> > On Wed, Mar 07, 2007 at 07:20:19PM +0200, Elan Ruusamäe wrote:
> > > On Wednesday 07 March 2007 18:05:54 David Holroyd wrote:
> > > > > cvsspam hilights after first byte of letter 'k' because it's unicode
> > > > > first part is the same byte.
> > > >
> > > > I hadn't considered that possibility. Maybe the within-a-line
> > > > colouring should be disabled when a multibyte encoding is detected?
> > >
> > > as quick fix, would be nice. but how you detect the charset is
> > > multibyte? just match /utf-?.+/i ?
> >
> > My use of 'detect' was incorrect :)
> >
> > Yeah, a regexp or just a simple list of encodings was about what I had
> > in mind.
> 
> ok. waiting for patch :)

Please test...
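
For anyone wanting to see the underlying problem in isolation, here is a
minimal sketch, assuming UTF-8 input. The helper names and sample strings
are made up for illustration and are not part of cvsspam.rb. It shows how
a common prefix counted in bytes can end in the middle of a character:

```ruby
# Hypothetical helpers (not from cvsspam.rb): length of the common prefix
# of two strings, counted in bytes and in characters respectively.
def common_prefix_bytes(a, b)
  a_bytes = a.bytes
  b_bytes = b.bytes
  n = 0
  n += 1 while n < a_bytes.length && n < b_bytes.length && a_bytes[n] == b_bytes[n]
  n
end

def common_prefix_chars(a, b)
  a_chars = a.chars
  b_chars = b.chars
  n = 0
  n += 1 while n < a_chars.length && n < b_chars.length && a_chars[n] == b_chars[n]
  n
end

# Made-up sample lines: U+00E4 encodes as C3 A4 and U+00F6 as C3 B6 in
# UTF-8, so the two characters share their first byte.
old_line = "k\u00e4si"
new_line = "k\u00f6\u00f6k"

byte_prefix = common_prefix_bytes(old_line, new_line)  # 2: "k" plus the shared C3 byte
char_prefix = common_prefix_chars(old_line, new_line)  # 1: only "k" really matches

# Cutting at the byte-based prefix leaves a dangling lead byte.
broken = old_line.byteslice(0, byte_prefix)
puts broken.valid_encoding?
```

A highlight boundary placed at the byte-based position would split the
character in two, which is why the patch simply skips within-a-line
highlighting for these encodings.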

-- 
http://david.holroyd.me.uk/
-------------- next part --------------
Index: cvsspam.rb
===================================================================
--- cvsspam.rb	(revision 255)
+++ cvsspam.rb	(working copy)
@@ -936,7 +936,10 @@
         addInfixSize = line.length - (prefixLen+suffixLen)
         oversize_change = deleteInfixSize*100/@lineJustDeleted.length>33 || addInfixSize*100/line.length>33
 
-        if prefixLen==1 && suffixLen==0 || deleteInfixSize<=0 || oversize_change
+        # avoid doing 'within-a-line highlighting' if a multibyte encoding
+        # is suspected, as all the suffix/prefix stuff above is byte, not
+        # character based
+        if multibyte_encoding? || prefixLen==1 && suffixLen==0 || deleteInfixSize<=0 || oversize_change
           print(htmlEncode(@lineJustDeleted))
         else
           print(htmlEncode(@lineJustDeleted[0,prefixLen]))
@@ -1297,6 +1300,11 @@
   end
 end
 
+# guess if the user's selected encoding is multibyte, since some CVSspam code
+# isn't multibyte-safe, and needs to be disabled.
+def multibyte_encoding?
+  $charset && ["utf-8", "utf-16"].include?($charset.downcase)
+end
 
 cvsroot_dir = "#{ENV['CVSROOT']}/CVSROOT"
 $config = "#{cvsroot_dir}/cvsspam.conf"
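
As a point of comparison with the list lookup in the patch, here is a
rough sketch of the regexp variant mentioned in the thread. The constant
and method names are hypothetical, and the pattern is only an assumption
about which charset spellings might turn up in a configuration:

```ruby
# Hypothetical alternative to the list lookup: a regexp that also accepts
# spellings such as "UTF8" or "utf-16le". Not part of the patch.
MULTIBYTE_CHARSET = /\Autf-?(8|16|32)/i

# Takes the charset as an argument instead of reading a global, so it is
# easy to try out on its own.
def multibyte_charset?(charset)
  !charset.nil? && MULTIBYTE_CHARSET.match?(charset)
end

puts multibyte_charset?("UTF-8")
puts multibyte_charset?("iso-8859-1")
```

Neither variant recognises other multibyte encodings such as Shift_JIS or
EUC-JP; both are a guess based on the configured name, not real detection.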

