If you use Paperclip, you might have
noticed that it doesn’t change any of your file names by default. Some of its
interpolations are useful if you don’t need human-readable file names.
If you do want human-readable file names, you can simply use the
:basename.:extension or :filename interpolations.
However, there is one thing you should keep in mind. Let’s say someone uploaded
“foo bar.jpg”. Yes, with a space in its name. Later on, when your application
builds a URL for that file, that space will be encoded as %20. So when the
user’s browser tries to fetch that file it could fail, because your
application or your CDN provider doesn’t handle such cases.
But we should care, because showing images is important for business and we don’t want to build walls of rules around our users.
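To see the encoding in isolation, here is a minimal sketch using Ruby’s standard ERB::Util.url_encode. It is only an illustration of percent-encoding; the exact helper your application or Paperclip uses to build URLs may differ.

```ruby
require "erb"

# Percent-encode a file name the way it would appear in a URL path segment.
def encoded_file_name(file_name)
  ERB::Util.url_encode(file_name)
end

puts encoded_file_name("foo bar.jpg") # foo%20bar.jpg
```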
One of the possible solutions would be normalization of file names to store
them without any special symbols.
The simplest way is to add your own interpolation using Paperclip’s API. This would still keep the original file name in the database, but change the real file name to what you want, running that interpolation every time you build a URL for a file.
We decided that it’s more useful to have already normalized file names in the database, so the way to achieve that is a little bit different.
I started by designing the class that takes care of file name normalization, along with its spec.
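The original class is not reproduced here, so the following is only a sketch of what such a normalizer could look like. The module name FileNameNormalizer, the choice of “-” as a separator and the “file” fallback are all assumptions made for illustration.

```ruby
# Hypothetical normalizer, not the class from the original post.
module FileNameNormalizer
  # Lowercases the name and replaces every run of characters outside
  # a-z and 0-9 with a single dash, keeping the extension.
  def self.normalize(file_name)
    ext  = File.extname(file_name)
    base = File.basename(file_name, ext)
    base = base.downcase.gsub(/[^a-z0-9]+/, "-").gsub(/\A-+|-+\z/, "")
    base = "file" if base.empty? # fallback when nothing usable is left
    "#{base}#{ext.downcase}"
  end
end

puts FileNameNormalizer.normalize("foo bar.jpg") # foo-bar.jpg
```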
Normalizer is a simple module with a bit of Ruby magic. The self.included(base) method lets you configure the behaviour of a class that includes this module. In this particular case I just create a before_save callback that runs the normalization.
In the normalize_filename method, each_attachment is a Paperclip method that lets you iterate over all has_attached_file definitions. Using the class above and Paperclip’s API, I change the name of the file so that it gets saved in its normalized version.
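The included hook can be sketched without Rails. In the example below FakeModel is a hypothetical stand-in for an ActiveRecord model: its before_save and save are simplified assumptions, and the normalization itself is reduced to replacing spaces. It only shows how including the module registers the callback.

```ruby
module Normalizer
  # Runs when a class includes this module; registers the callback there.
  def self.included(base)
    base.before_save :normalize_filename
  end

  def normalize_filename
    self.file_name = file_name.tr(" ", "-")
  end
end

# Hypothetical stand-in for an ActiveRecord model with one file name column.
class FakeModel
  attr_accessor :file_name

  def self.before_save(method_name)
    callbacks << method_name
  end

  def self.callbacks
    @callbacks ||= []
  end

  # Runs registered callbacks before "persisting" the record.
  def save
    self.class.callbacks.each { |callback| send(callback) }
    true
  end

  include Normalizer
end

record = FakeModel.new
record.file_name = "foo bar.jpg"
record.save
puts record.file_name # foo-bar.jpg
```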
All I have to do now is include this module in the Asset class:

    class Asset
      include Assets::Normalizer
    end
If you use STI as we do, you don’t need to do anything else, because the normalize_filename method will be inherited along with the callback. If you just have attachments in different classes, you can include this module in each of them.
Your interpolations remain untouched.
This approach lets you do anything you want with file names, for example randomize them as in the article mentioned above.
Paperclip is a well-known gem that adds image upload to your application. Many applications use it, and so do we.
In fact, once you have it working according to your business rules, you can forget about it. So did we for two years. Our image upload volume used to be low, but it has increased dramatically recently.
We were using a pretty standard way of storing images, like this:

    "/assets/images/:id/:style/:basename.:extension"
However, this folder structure leads to one issue that is hard to notice in the beginning: some filesystems limit the number of sub-directories in a single directory. So we decided to change it in advance, before we ran into any real problems.
Paperclip actually has a good interpolation for that, however it’s not used by default.
:id_partition is the important piece that won’t let your image directories reach any limits. Given image ID = 25500, this interpolation will create three nested directories, 000/025/500, for every image, so you’ll have at most 1000 directories in any one directory.
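The partitioning for integer IDs can be approximated in plain Ruby. This is a simplified sketch of the idea rather than Paperclip’s own code, and the method name id_partition is reused here only for readability.

```ruby
# Zero-pad the ID to nine digits and split it into three groups of three.
def id_partition(id)
  format("%09d", id).scan(/\d{3}/).join("/")
end

puts id_partition(25500) # 000/025/500
puts id_partition(1)     # 000/000/001
```

Because every path segment has three digits, no directory can hold more than 1000 sub-directories.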
So I came up with a new directory structure like this
Then I needed to figure out how to migrate tons of existing images to the new directory structure. If you simply change your interpolations, Paperclip will start building image paths according to the new rules, but your existing images will stay in the old directory structure; only new uploads will land in the correct directories.
You can write something that moves files into the correct places. However, you can achieve that more easily, and with more flexibility, by using Paperclip in your script.
I came up with the script below. It’s huge, but look through it and I’ll explain some of its parts later on.
You can adjust the time format through the formatter, for example by setting datetime_format on the Logger::Formatter instance.
The second thing is Thread.abort_on_exception = true. This is important for debugging; otherwise your script will fail only at the very end, after waiting for the other threads to finish.
The next thing is to collect the folder names, which are actually the IDs of your assets, into a queue. I use the Queue class here, because it’s a safe way to synchronize a queue among threads. The code is very simple: it just iterates over all directories and puts their names into the queue.
The next thing is threads. The first version of the script didn’t have threads; however, when I ran it and calculated the time to complete, I got 18 hours. This was too long, and CPU and memory utilization was very low. So I introduced some threads to speed up the process. With four threads the estimated time was 8 hours. Not ideal, but this is something you can work with. Unfortunately, using more threads caused deadlocks.
Every thread takes an asset ID from the queue, gets the record from the database, gets the file from the directory where I put all existing assets, assigns it to the record and just saves it. The rest of the job is done by Paperclip. With my configuration, Paperclip was saving files locally, but you can do something similar and save all files to S3 or any other storage you use.
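The queue-and-workers part can be sketched on its own. The helper names collect_ids and process_all are hypothetical, and the block passed to process_all stands in for the real work (loading the record, assigning the file, saving); this is not the original migration script.

```ruby
require "tmpdir"
require "fileutils"

Thread.abort_on_exception = true # fail fast instead of at the final join

# Put the name of every sub-directory (an asset ID) into a thread-safe queue.
def collect_ids(root)
  queue = Queue.new
  Dir.children(root).sort.each do |name|
    queue << name if File.directory?(File.join(root, name))
  end
  queue
end

# Drain the queue with a fixed number of worker threads.
# Returns the IDs that were processed.
def process_all(queue, thread_count = 4)
  processed = Queue.new
  workers = Array.new(thread_count) do
    Thread.new do
      loop do
        id = begin
          queue.pop(true) # non-blocking pop raises ThreadError when empty
        rescue ThreadError
          break
        end
        yield id
        processed << id
      end
    end
  end
  workers.each(&:join)
  Array.new(processed.size) { processed.pop }
end

Dir.mktmpdir do |root|
  %w[1 2 3 4 5].each { |id| FileUtils.mkdir_p(File.join(root, id)) }
  done = process_all(collect_ids(root)) { |id| id.to_i }
  puts done.sort.inspect
end
```

The non-blocking pop lets each worker stop as soon as the queue is empty instead of waiting forever for new items.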
There are a few things you might be interested in. After moving things around many times, we’ve got some inconsistencies between the database and the actual files, so those rescue clauses are a way to deal with them.
rescue ActiveRecord::RecordNotFound skips any files that don’t have records in the database.
rescue ActiveRecord::SubclassNotFound skips any records that don’t have their STI class defined.
To get the correct object with the correct file path with less effort, I just build an attachment object in memory, passing different interpolations for the path, and then assign it to the record’s attachment, which makes Paperclip happily process the old file and save everything to the new directory structure.
And the last piece is the weird preload task. For some reason the first thread couldn’t find any of the classes defined, so I preload them before running the migration.
It looks a bit ugly, but it’s good enough for a one-time migration from one directory structure to another. Moreover, using Paperclip you are flexible enough to upload files to S3, or use a different processor, or… you name it.
I had a task to create a bulk import of images from AWS S3 and attach them to existing records. In our case Paperclip doesn’t use S3 storage, so we use S3 only to import images uploaded by photographers.
Doing any kind of manipulation with files and Paperclip is pretty easy. I use the official AWS SDK to get images from S3, read them into a temporary file and let Paperclip do the rest of the job for me.
This is a simple example of how to attach an S3 file to one of your existing records
    # variables
    # file - S3 object
    # record - AR record with Paperclip's has_attached_file
With this we read the contents of a file from S3 into a temporary file on our server and then just let Paperclip use that file to create images for all sizes and put them into the storage we use.
Easy! However, this code has one small issue which was very important for my task: the file name. We want to retain the original file name.
Let’s say I’m importing unicorn.jpg from S3 and I assume that the final file name will be unicorn.jpg, but it’s not, because of the way Tempfile works
As you see, Tempfile changes the name of the file to keep it unique, and Paperclip of course uses this file name.
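The renaming is easy to reproduce with the standard library alone. The helper name tempfile_basename is hypothetical, and the exact suffix Tempfile appends varies between runs and Ruby versions, so the sketch prints it rather than assuming a value.

```ruby
require "tempfile"

# Create a temp file from the given name and return the name it really got.
def tempfile_basename(file_name)
  temp = Tempfile.new(file_name)
  File.basename(temp.path)
ensure
  temp.close! if temp # close and delete the temporary file
end

name = tempfile_basename("unicorn.jpg")
puts name                  # starts with "unicorn.jpg", followed by a unique suffix
puts name == "unicorn.jpg" # false
```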
After some quick digging into Paperclip’s internals, you may find that if the File object responds to the original_filename method, then Paperclip uses the name provided by this method.
Let’s create that method
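    class ImageTempfile < Tempfile
      attr_reader :original_filename

      def initialize(file_name)
        @original_filename = file_name
        # Split into [basename, extension] so Tempfile keeps the extension;
        # the dot is escaped so that only a real extension is matched.
        super(file_name.split(/(\.\w+)$/))
      end
    end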
and change the code that does the import to use our new class

    temp_file = ImageTempfile.new(file_name)

Paperclip now uses our pretty file names instead of those that Tempfile gives you. That’s it.
The Rails community doesn’t like moving business logic into the database, but in some cases stored procedures are very helpful and many people try to use them in Rails. However, it’s not as easy as you might imagine.
Running ActiveRecord::Base.connection.execute("CALL proc01") will give you a bunch of errors in different cases.
Let’s say your procedure returns some result set. Running that procedure will give you an exception

    ActiveRecord::StatementInvalid: Mysql2::Error: PROCEDURE can't return a result set in the given context

In case your procedure doesn’t return any result set, running it twice will give you another exception

    ActiveRecord::StatementInvalid: Mysql2::Error: Commands out of sync; you can't run this command now

In another case, when the stored procedure doesn’t return any result set at all, you’ll get NoMethodError.
All these issues are well known, however they aren’t fixed yet, even in Rails 3.
Let’s look at the first issue. When MySQL runs a stored procedure, it has to know that the client can handle multiple result sets. By default MySQL assumes that the client cannot handle this unless you set the CLIENT_MULTI_RESULTS flag when establishing the connection to the MySQL server. It’s not a surprise that neither Rails nor MySQL2 does this, because in most projects you don’t need multiple result sets. In the future we’ll probably have an option to configure this, but until then let’s create a workaround.
We use MySQL2. Its latest 0.2.6 gem release is kind of outdated for Ruby 1.9.2, so we forked the edge version at some stable point. MySQL2 defines its own mysql adapter for Rails in lib/active_record/connection_adapters/mysql2_adapter.rb. We’re interested in the method that creates the connection object:
This looks like a good place to put our additional flag for MySQL, but wait! There are other flags already, so let’s just re-use this and let the adapter pass it further.
Create a file in config/initializers with the following content:
So now you can pass any additional options from your database.yml. See that 131072? This is the value of the CLIENT_MULTI_RESULTS constant. It’s not very clear, because you have to know those magic numbers, but it’s OK for a start.
If you want to pass more options, remember that you must combine them with the bitwise OR operator, so in database.yml it will be

database.yml

    flags: <%= 65536 | 131072 %>

where 65536 is the value of the CLIENT_MULTI_STATEMENTS constant. BTW, enabling only CLIENT_MULTI_STATEMENTS will automatically enable CLIENT_MULTI_RESULTS.
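The arithmetic behind those magic numbers is easy to check in plain Ruby. The module ClientFlags below is defined locally for illustration only; it is not a constant set provided by Rails or MySQL2.

```ruby
# Local stand-ins for the MySQL client flag values mentioned in the text.
module ClientFlags
  MULTI_STATEMENTS = 65_536  # 1 << 16
  MULTI_RESULTS    = 131_072 # 1 << 17
end

# Each flag is a single bit, so bitwise OR combines them without overlap.
flags = ClientFlags::MULTI_STATEMENTS | ClientFlags::MULTI_RESULTS
puts flags # 196608

# A flag can be tested again with bitwise AND.
puts (flags & ClientFlags::MULTI_RESULTS) != 0 # true
```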
As you know, all CI tools rely on the command exit status: if you get 0 then your build is OK, and if you get something greater than 0 then your build has failed.
You can easily see that by running any command and, when it’s finished, running

    echo $?

which tells you the last exit code.
Configuring the CI server, I found that all our builds were passing, regardless of the fact that we had some failing specs. Any combination of specs, whether passing or failing, returned exit code 0.
I don’t know how I hadn’t noticed this issue before, so at first I blamed our code, and with the help of my colleague we began digging into it.
There are not too many ways to override the exit code. As the Ruby documentation says, you can intercept the SystemExit exception or define your own finalizers - at_exit and ObjectSpace.define_finalizer.
However, we didn’t find anything in the project, so my next victim was RSpec, and I found that Runner calls at_exit
    def autorun # :nodoc:
      at_exit { exit run unless $! }
    end
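To see how an at_exit handler decides the final exit status, here is a small sketch that runs tiny scripts in a child Ruby process and reads their status. The helper name exit_status_of is hypothetical; it is not part of RSpec.

```ruby
require "rbconfig"

# Run a one-line script in a separate Ruby process and return its exit status.
def exit_status_of(script)
  system(RbConfig.ruby, "-e", script)
  $?.exitstatus
end

puts exit_status_of("exit 1")             # 1
puts exit_status_of("at_exit { exit 3 }") # 3, set by the handler
```

The second script finishes normally, yet the process reports 3, because the handler calls exit itself. This is the same mechanism a test runner uses to report failures, and it is also the place where a wrong status can slip in.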
Trying to find a way to fix this, I went to the RSpec issues, just to see what’s new, and found a month-old issue describing the same problem and a possible solution.
I’ve applied that to RSpec, run its specs, and everything looks good so far. So if you’re struggling without proper exit codes, you can create a monkey patch and put it in your spec/support folder
    module Spec
      module Runner
        # This monkey-patch apply fix to force RSpec return
A few weeks ago I was trying to make Integrity work with Ruby 1.9.2 and Bundler. It’s a well-known CI tool, but kind of abandoned. I thought making it work with Ruby 1.9.2 could be tough, however the real problem was in Bundler.
Integrity uses Bundler to manage its dependencies. When Integrity runs a build, it opens a new subshell where your project is built.
Here is how Integrity does it
    def run(command)
      cmd = normalize(command)

      @logger.debug(cmd)

      output = ""
      IO.popen(cmd, "r") { |io| output = io.read }

      Result.new($?.success?, output.chomp)
    end
The issue arises when your project uses Bundler too. Who doesn’t? In this case your project tries to use Integrity’s Gemfile, which is not what you want. Integrity should use its own Gemfile, and your project should use its own.
This happens because Bundler changes your environment to do what it does. It sets the BUNDLE_GEMFILE variable, which points to Integrity’s Gemfile. Even when Integrity runs your project in a subshell this variable is there, because a subshell inherits its parent’s environment.
Looking for a solution on the web, you can find recommendations to use the Bundler.with_clean_env method; however, I guess this was a working solution only for old Bundler versions. With modern versions it doesn’t help, because this method doesn’t clean up the BUNDLE_GEMFILE variable. Moreover, Bundler sets and doesn’t clean up two more variables - RUBYOPT and BUNDLE_BIN_PATH. So as long as these variables are present in the subshell, you’ll keep using Integrity’s gems.
To avoid this I went almost the same way as with_clean_env does: replace the current environment with the one you want and restore it when the command is finished, removing those three variables at the same time.
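The idea can be sketched as follows. The helper name without_bundler_env and the Gemfile path are hypothetical, and this is not Integrity’s or Bundler’s actual code, only an illustration of removing the three variables around a command and restoring the environment afterwards.

```ruby
require "rbconfig"

# Variables named in the text as being set by Bundler.
BUNDLER_VARS = %w[BUNDLE_GEMFILE RUBYOPT BUNDLE_BIN_PATH].freeze

# Run the block with those variables removed, then restore the environment.
def without_bundler_env
  saved = ENV.to_hash
  BUNDLER_VARS.each { |name| ENV.delete(name) }
  yield
ensure
  ENV.replace(saved) if saved
end

original = ENV["BUNDLE_GEMFILE"]
ENV["BUNDLE_GEMFILE"] = "/path/to/integrity/Gemfile" # placeholder value

# The child process inherits the cleaned environment.
child_view = without_bundler_env do
  IO.popen([RbConfig.ruby, "-e", 'print ENV["BUNDLE_GEMFILE"].inspect'], &:read)
end

puts child_view            # nil
puts ENV["BUNDLE_GEMFILE"] # /path/to/integrity/Gemfile

ENV["BUNDLE_GEMFILE"] = original # assigning nil removes the variable again
```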
Do you use any Git GUI tool? I do. Although the git command line has everything I need for comfortable work, I still prefer GUI tools. Linux users have Gitk - a pretty ugly, but powerful tool. Luckily there is a clone of this tool for Mac OS - Gitx.
Gitx was released in 2008 as a fairly simple but promising clone of Gitk, however a year later its development stopped. That’s not bad news though, because it has a network of more than 100 forks and there is one far more advanced (experimental) version of Gitx.
Personally, I don’t like generating rdoc and ri documentation when using RVM, because it slows down gem installation and I don’t need documentation for the same gem under different versions of Ruby.
I can always disable this manually (which is not convenient)

    gem install some_gem --no-rdoc --no-ri

or do the same by adding

    :gem: --no-ri --no-rdoc

to my ~/.gemrc, however it doesn’t help when I work under RVM. Luckily, the gem command checks for global settings in the /etc/gemrc file, so adding

    gem: --no-ri --no-rdoc

to /etc/gemrc solves this problem. Now every gem installation under any Ruby version will use the --no-rdoc --no-ri options.
Watching Rails 3 Edge commits, I noticed an addition to ActiveSupport - ActiveSupport::Concern, made by Neeraj Singh. In fact, his commit is just a piece of documentation, and ActiveSupport::Concern itself was added by Joshua Peek about a year ago. Shame on me, I didn’t know this.
What does it do? It’s a nice extension of Module that lets you add instance or class methods to a class, call its methods, etc.
The old Ruby way of doing this (and you still have to follow this way in pure Ruby):
    module M
      def self.included(base)
        base.send(:extend, ClassMethods)
        base.send(:include, InstanceMethods)
        # scope has to be called on the including class, not on the module
        base.scope :foo, :conditions => { :created_at => nil }
      end

      module ClassMethods
        def cm; puts 'I am class method'; end
      end

      module InstanceMethods
        def im; puts 'I am instance method'; end
      end
    end
Using the new Rails Edge way, the module above can be rewritten as follows:

    module M
      extend ActiveSupport::Concern

      included do
        scope :foo, :conditions => { :created_at => nil }
      end

      module ClassMethods
        def cm; puts 'I am class method'; end
      end

      module InstanceMethods
        def im; puts 'I am instance method'; end
      end
    end
Let’s say we extend ActiveRecord::Base with module M:
    ActiveSupport.on_load(:active_record) do
      include M
    end
…and as a result we’ll have all ActiveRecord::Base classes with class method cm, instance method im and scope foo.
Working on a Rails project, I got an error that every Ruby developer knows: syntax error, unexpected $end, expecting keyword_end. Usually it means that someone left out an end keyword somewhere in the code. I quickly went through the code and found no sign of that, but Ruby still pointed at the end of a one-hundred-line file.