Monthly Archive for May, 2008

[Hint] Discarding non-ASCII characters in Ruby on Rails

My current Ruby on Rails project requires SEO-friendly URLs. I installed the acts_as_slugable plugin, but soon found out that all my slugs still contained Polish characters (in UTF-8). After some googling (mostly on the Polish Ruby on Rails forum) and a few more sips of coffee, I came up with some modifications to acts_as_slugable.

Here’s lib/acts_as_slugable_ascii.rb:

 require 'string'
 
module Multiup
  module Acts
    module Slugable
      module InstanceMethods
        private
          def create_slug
            return if self.errors.length > 0
 
            if self[source_column].nil? or self[source_column].empty?
              return
            end
 
            if self[slug_column].to_s.empty?
              test_string = self[source_column]
 
              proposed_slug = test_string.strip.downcase.gsub(/[\'\"\#\$\,\.\!\?\%\@\(\)]+/, '').to_ascii
              proposed_slug = proposed_slug.gsub(/&/, 'and')
              proposed_slug = proposed_slug.gsub(/[\W^-_]+/, '-')
              proposed_slug = proposed_slug.gsub(/\-{2}/, '-')
 
              suffix = ""
              existing = true
              acts_as_slugable_class.transaction do
                while existing != nil
                  existing = acts_as_slugable_class.find(:first, :conditions => ["#{slug_column} = ? and #{slug_scope_condition}",  proposed_slug + suffix])
                  if existing
                    if suffix.empty?
                      suffix = "-0"
                    else
                      suffix.succ!
                    end
                  end
                end
              end         
              self[slug_column] = proposed_slug + suffix
            end
        end
      end
    end
  end
end
 
ActiveRecord::Base.class_eval do
  include Multiup::Acts::Slugable
end

The only difference from the plugin's original create_slug is the call to the to_ascii method.
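To see what the cleanup steps and the suffix loop do in isolation, here is a minimal sketch in plain Ruby, without ActiveRecord. The names slugify_sketch and unique_slug_sketch are hypothetical, the regular expressions are copied from the listing above, the to_ascii call is left out, and an array of already-used slugs stands in for the database lookup.

```ruby
# Hypothetical helper: applies the same cleanup steps as create_slug,
# minus the to_ascii call.
def slugify_sketch(title)
  # Trim, lowercase and drop punctuation that should simply vanish.
  slug = title.strip.downcase.gsub(/[\'\"\#\$\,\.\!\?\%\@\(\)]+/, '')
  # Spell out ampersands.
  slug = slug.gsub(/&/, 'and')
  # Turn every remaining run of non-word characters into one dash.
  slug = slug.gsub(/[\W^-_]+/, '-')
  # Collapse a doubled dash, as the plugin does.
  slug.gsub(/\-{2}/, '-')
end

# Hypothetical helper: mimics the uniqueness loop, with `taken` (an array
# of existing slugs) standing in for the database query.
def unique_slug_sketch(proposed, taken)
  suffix = ''
  while taken.include?(proposed + suffix)
    # First collision appends "-0"; later ones count up via String#succ.
    suffix = suffix.empty? ? '-0' : suffix.succ
  end
  proposed + suffix
end

puts slugify_sketch('  Hello, World & Friends!  ')
puts unique_slug_sketch('my-post', ['my-post', 'my-post-0'])
```

In the real plugin the lookup happens inside a transaction and also respects the configured scope, which this sketch ignores.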

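Now we need to extend the String class. The to_ascii method converts Polish UTF-8 characters to their ASCII counterparts and discards every character that cannot be converted to CP1250. Save it as lib/string.rb so that the require 'string' line above picks it up: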

 require 'iconv'
 
class String
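  # Converts Polish characters to their ASCII counterparts and drops
  # characters that cannot be represented in CP1250.
  def to_ascii
    ascii = 'acelnoszzACELNOSZZ'
    # CP1250 bytes for the lowercase Polish letters, followed by the uppercase ones
    non_ascii = "\271\346\352\263\361\363\234\277\237\245\306\312\243\321\323\214\257\217"
    to_ascii_string = self
    begin
      result = Iconv.new("CP1250", "UTF-8").iconv(to_ascii_string)
    rescue Iconv::IllegalSequence => e
      # Remove the first character iconv could not convert, then try again
      failed = e.failed.chars.split(//, 2)
      to_ascii_string = to_ascii_string.gsub(failed[0], '')
      retry
    end
    # tr instead of tr!: tr! returns nil when nothing was replaced,
    # which would break slugs built from plain ASCII titles
    result.tr(non_ascii, ascii)
  end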
end
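On Ruby 2.0 and later the iconv library is no longer part of the standard library, so the method above will not load there as written. A rough equivalent can be sketched with String#encode. This is only an illustration under that assumption, not part of the plugin; POLISH_TO_ASCII and to_ascii_sketch are hypothetical names, and the input is assumed to be valid UTF-8.

```ruby
# Hypothetical lookup table: Polish letters and their ASCII counterparts.
POLISH_TO_ASCII = {
  'ą' => 'a', 'ć' => 'c', 'ę' => 'e', 'ł' => 'l', 'ń' => 'n',
  'ó' => 'o', 'ś' => 's', 'ż' => 'z', 'ź' => 'z',
  'Ą' => 'A', 'Ć' => 'C', 'Ę' => 'E', 'Ł' => 'L', 'Ń' => 'N',
  'Ó' => 'O', 'Ś' => 'S', 'Ż' => 'Z', 'Ź' => 'Z'
}.freeze

# Hypothetical helper playing the role of String#to_ascii on newer Rubies.
def to_ascii_sketch(text)
  # Swap each known Polish letter for its ASCII counterpart.
  mapped = text.gsub(/[ąćęłńóśżźĄĆĘŁŃÓŚŻŹ]/, POLISH_TO_ASCII)
  # Drop whatever still has no ASCII representation.
  mapped.encode('US-ASCII', invalid: :replace, undef: :replace, replace: '')
end

puts to_ascii_sketch('Zażółć gęślą jaźń')
```

Unlike the Iconv version, this sketch discards every non-ASCII character it has no mapping for, rather than only those outside CP1250.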

And we’re done!

Just add the following line to your model:

 require 'acts_as_slugable_ascii'

and configure acts_as_slugable as described in the plugin's README file. Now you're ready to face your users.