Topic: batch methods, GC and memory usage

I'm struggling with a method that does a batch import of a CSV file read from the web, creates about 1000 objects, saves them, and then creates about 100,000 additional HABTM join table entries (it's not something that is run frequently, rather only by myself a few times a month).

The problem is that mongrel_cluster grows from an average of 50MB to 300MB, which is more than the 256MB of RAM on the VPS slice I'm using.

I now wonder how garbage collection works, and what I could do to optimize the task. I can only assume that the mongrel garbage collection does not trigger inside a request, but only after the entire (batch) method is completed. If I'm right, this would mean that many thousand objects are kept in memory even if they aren't needed anymore.
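One way to check that assumption outside of Rails is a small standalone script. The following is only a minimal sketch, assuming MRI (the standard Ruby interpreter) in version 1.9 or later, where GC.count reports how many collections have run so far. The loop body is a made-up stand-in for per-line work, not the real import. In MRI, collections are triggered by allocation pressure, so the counter would be expected to rise while the loop is still running:

```ruby
# Assumes MRI, Ruby 1.9 or later (GC.count is not available in 1.8).
runs_before = GC.count

200_000.times do |i|
  # Each iteration allocates a few short-lived strings and an array,
  # loosely mimicking the per-line objects of an import loop.
  line = ["row", i.to_s, "x" * 50]
  line.join(",")
end

runs_during_loop = GC.count - runs_before
puts "GC ran #{runs_during_loop} time(s) while the loop was still running"
```

If the counter rises here, the collector is not waiting for the end of a request, and the growth is more likely caused by objects that are still referenced somewhere.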

Any suggestions on how to deal with the situation? Or, what's the suggested way to keep garbage as small as possible when processing a loop that creates objects? Should I set the objects to nil after saving each of them?

Currently my code is something like:

Parser.read_data_from_web(url).each do |line|
  x =
  x.blah = line[:something]

As the entire batch process runs for a couple of hours (well, the server starts swapping after 10 minutes, and then it grinds itself to death), I can't easily test any changes, so any suggestion on how to find the cause of the massive memory increase would help.
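A possible way to narrow this down without running the full import is to process only a small sample of lines and compare live object counts before and after. This is a rough sketch, assuming Ruby 1.9 or later (for ObjectSpace.count_objects); the object_growth helper and the sample data are hypothetical and only stand in for the real parser and models:

```ruby
# Hypothetical helper: reports which object types a block leaves behind,
# so a small sample of the import can be profiled in isolation.
def object_growth
  GC.start
  before = ObjectSpace.count_objects
  result = yield
  GC.start
  after = ObjectSpace.count_objects

  growth = {}
  after.each do |type, count|
    next if type == :TOTAL || type == :FREE
    delta = count - before.fetch(type, 0)
    growth[type] = delta if delta > 0
  end
  [result, growth]
end

# Stand-in for the first 50 parsed CSV lines (made-up data).
sample_lines = Array.new(50) { |i| { :something => "value #{i}" } }

retained, growth = object_growth do
  sample_lines.map { |line| line[:something].upcase }
end

# Print the types that grew, largest increase first.
growth.sort_by { |_type, delta| -delta }.each do |type, delta|
  puts "#{type}: +#{delta}"
end
```

The counts are approximate, since MRI's collector is conservative, but a type that grows steadily with the sample size points at the part of the code that is holding on to objects.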


Re: batch methods, GC and memory usage

Scrap that. I realized the huge memory spike happens later in the called method, when creating a hash with three arrays as values, each array holding several thousand floats. For some reason the memory isn't released when that hash is no longer needed, so I'll have to dig around a bit more and see how I can solve that problem.
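One pattern that may help here is to build the large hash inside its own method and return only the small result, so that nothing references the hash once the method returns. The sketch below is an illustration under assumptions: the summarize method, the keys, and the averaging step are all invented, since the real method is not shown in the thread.

```ruby
# Hypothetical stand-in for the step that builds the large hash:
# three arrays of floats that are only needed to compute a summary.
def summarize(n)
  data = {
    :xs => Array.new(n) { |i| i * 0.5 },
    :ys => Array.new(n) { |i| i * 1.5 },
    :zs => Array.new(n) { |i| i * 2.5 }
  }

  # Keep only the small result; `data` becomes unreachable on return.
  averages = {}
  data.each { |key, values| averages[key] = values.inject(:+) / values.size }
  averages
end

averages = summarize(5_000)
GC.start # optional: request a collection now instead of waiting for one
puts averages.inspect
```

Even when the hash has been collected, the process size reported by the operating system may not shrink, because the interpreter often keeps freed memory for reuse rather than returning it. If the hash is still alive, it is worth checking whether an instance variable, a constant, or a closure still refers to it.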