Topic: Messy method

I am having some trouble refactoring a method that is causing a huge performance hit.  The method takes an array of traits and an array of objects and works out the average of each trait in that object.

The problem is that the method iterates over the array of objects and also the array of traits.  The objects array can be quite large so I have been trying to find a way to achieve the same result with no looping through either array. 


One solution I thought of was to carry out an SQL query of the form:

BlupData.find_by_sql("SELECT #{trait_col} FROM cattle c, blupdata b" +
"WHERE c.cattleid IN(#{cattle_ids}) AND c.cattleid = b.animalid")

However I still have to loop over the array to find all the id's and the query does look rather messy.  Is there a better way to do this?

Re: Messy method

To clarify my last post, the method in question does something like this:

def find_average(obj_array, column_hash)
  to_return = Object.new
  column_hash.each do | column |
      temp = { column.col_name => 0 }
      column_count = column_count.merge temp
      for obj in obj_array
         column_value = obj[column.col_name]
         if column_value
           to_return[column.col_name] = 0 if to_return[column.col_name] == 0
           to_return[column.col_name] += column_value
           column_count += 1
         end
       end
       unless obj_array.length == 0
         (to_return[column.col_name] = sprintf("%.1f", to_return[column.col_name] / column_count[column.col_name])) if to_return[column.col_name]
   end
   to_return
end

So if there are 5 objects and 5 columns to find the average of then each object will be looped over 25 times to find the average.

The solution I came up with was:
1) Loop through each object and get its id
2) Loop through each column and get its col_name
3) Query the database to return the average for each column in the object array

Is there a better way to do this?

Re: Messy method

I can't understand even what your code is trying to do.  Can you provide some documentation?  eg comments?  Also, can you explain what your variables are?  And, when you access an attribute of the current object (ie the object the method is called on) you should precede it with "self.".

For example you use 'column_count' without defining it anywhere.  Is this an attribute?  A method?  or what?  Since you call .merge on it i'd guess it's a hash but who knows?

Iterating over an array in memory shouldn't be a performance hit unless the size of the arrays is very very large.  What is a performance hit is repeatedly accessing the database.  Normally with something like this you would

a) find some cunning way to do it in a single sql call.  This tends to be the fastest but can be very tricky to set up, and tends to result in hard-to-read code.

b) Load all the data you will need out of the database first, and then do some ruby stuff with it to get the values you want.

I'd do b first, as simply as possible, then a pure sql solution may present itself.

Last edited by Max Williams (2009-06-19 08:52:32)

###########################################
#If i've helped you then please recommend me at Working With Rails:
#http://www.workingwithrails.com/person/ … i-williams