Discarding non-ASCII characters in Ruby on Rails

My current Ruby on Rails project requires SEO-friendly URLs. I installed the acts_as_slugable plugin, but soon found that my slugs still contained Polish characters (in UTF-8). After some googling (mostly on a Polish Ruby on Rails forum) and a few more sips of coffee, I came up with a small modification to acts_as_slugable.

Here’s lib/acts_as_slugable_ascii.rb:

require 'string'

module Multiup
  module Acts
    module Slugable
      module InstanceMethods
        private
          def create_slug
            return if self.errors.length > 0

            if self[source_column].nil? or self[source_column].empty?
              return
            end

            if self[slug_column].to_s.empty?
              test_string = self[source_column]

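              # Transliterate first, then lowercase: downcase only touches ASCII letters,
              # so accented capitals have to become plain capitals before it runs.
              proposed_slug = test_string.strip.to_ascii.downcase.gsub(/[\'\"\#\$\,\.\!\?\%\@\(\)]+/, '')
              proposed_slug = proposed_slug.gsub(/&/, 'and')
              proposed_slug = proposed_slug.gsub(/[\W^-_]+/, '-')
              proposed_slug = proposed_slug.gsub(/\-{2}/, '-')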

              suffix = ""
              existing = true
              acts_as_slugable_class.transaction do
                while existing != nil
                  existing = acts_as_slugable_class.find(:first, :conditions => ["#{slug_column} = ? and #{slug_scope_condition}",  proposed_slug + suffix])
                  if existing
                    if suffix.empty?
                      suffix = "-0"
                    else
                      suffix.succ!
                    end
                  end
                end
              end
              self[slug_column] = proposed_slug + suffix
            end
          end
      end
    end
  end
end

ActiveRecord::Base.class_eval do
  include Multiup::Acts::Slugable
end

The only difference from the plugin's original create_slug is the call to to_ascii.
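
If you want to see what the clean-up steps and the suffix loop do without touching a database, here is a minimal sketch in plain Ruby. The names slug_steps and unique_slug are made up for this illustration and are not part of the plugin, the taken array stands in for the find(:first, ...) query, and the explicit [^a-z0-9] class is my ASCII-only stand-in for the plugin's regex. The to_ascii call is left out on purpose.

```ruby
# Hypothetical helper: the clean-up steps from create_slug, without to_ascii.
def slug_steps(text)
  # Lowercase and drop punctuation that should simply disappear.
  slug = text.strip.downcase.gsub(/[\'\"\#\$\,\.\!\?\%\@\(\)]+/, '')
  # Spell out ampersands.
  slug = slug.gsub(/&/, 'and')
  # Assumption: ASCII input only, so "anything that is not a letter or a
  # digit" can be written out explicitly. Each run becomes one hyphen.
  slug.gsub(/[^a-z0-9]+/, '-')
end

# Hypothetical helper: the suffix loop from create_slug, with an array of
# already used slugs (taken) instead of a database query.
def unique_slug(base, taken)
  suffix = ""
  while taken.include?(base + suffix)
    # First collision gets "-0", later ones count up: "-1", "-2", ...
    suffix = suffix.empty? ? "-0" : suffix.succ
  end
  base + suffix
end

base = slug_steps("Hello, World!")
puts base                                   # hello-world
puts unique_slug(base, ["hello-world"])     # hello-world-0
```

Note that the first duplicate gets the suffix -0, not -1, because that is how the plugin's loop is written.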

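Now we need to extend the String class. Since the file above starts with require 'string', this code has to live in a file named string.rb on the load path (for example lib/string.rb). The to_ascii method converts Polish UTF-8 characters to their ASCII equivalents and discards every character that cannot be converted: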

require 'iconv'

class String
  def to_ascii
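    # Both strings use the same order: lowercase letters first, then capitals.
    ascii = 'acelnoszzACELNOSZZ'
    # Polish letters as CP1250 bytes, written as octal escapes.
    non_ascii = "\271\346\352\263\361\363\234\277\237\245\306\312\243\321\323\214\257\217"
    to_ascii_string = self
    begin
      result = Iconv.new("CP1250", "UTF-8").iconv(to_ascii_string)
    rescue Iconv::IllegalSequence => e
      # Drop the first character that cannot be converted and try again.
      failed = e.failed.chars.split(//, 2)
      to_ascii_string = to_ascii_string.gsub(failed[0], '')
      retry
    end
    # tr instead of tr!: tr! returns nil when nothing was replaced,
    # which would break plain ASCII input.
    result.tr(non_ascii, ascii)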
  end
end
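
Iconv is no longer bundled with Ruby from version 2.0 on, so the method above will not load there. As a rough sketch of the same idea for newer Rubies, where String#tr works on whole UTF-8 characters, something like the following could do. This is a swapped-in technique, not the method above: polish_to_ascii and the two constants are names I made up, and unlike the Iconv version it throws away every non-ASCII character that is not in the table.

```ruby
# Assumption: the source file and the input strings are UTF-8.
POLISH_LETTERS = 'ąćęłńóśżźĄĆĘŁŃÓŚŻŹ'
ASCII_LETTERS  = 'acelnoszzACELNOSZZ'

# Hypothetical helper, not part of acts_as_slugable.
def polish_to_ascii(text)
  # Map each Polish letter to the ASCII letter at the same position,
  # then discard whatever non-ASCII characters are left.
  text.tr(POLISH_LETTERS, ASCII_LETTERS).gsub(/[^\x00-\x7F]/, '')
end

puts polish_to_ascii("Zażółć gęślą jaźń")   # Zazolc gesla jazn
```

Because tr returns a new string even when nothing matched, plain ASCII input passes through unchanged.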

And we’re done!

Just add the following line to your model:

require 'acts_as_slugable_ascii'

and configure acts_as_slugable as described in the plugin's README file. Now you’re ready to face your users.