Topic: complex find on large dataset

The scenario. I have one table person, a person can have any combination of 10 qualities such as eye color (blue, brown, hazel, green), height(5-9,5-10,5-11), weight(120-130,130-140), nose length(1 inch. 2 inch), etc.

I have to make a report that has a count of all of the different possibilities and combinations. I do not need to count non-matches. An example: Person count =1 for  Person with blue-eyes, brown-hair, 5-10, 150-160, Person count 5 with green-eyes, red-hair....

So what I did was to create a tables for each quality (EyeColor, HairColor) and then a table that consisted of the 10 types of qualities (eye_color_id, height_id, weight_id) and populated the Quality Table with every possible combination of every type of quality.

It ended up being 137,700 combinations of eye color, nose length, height, etc... Now I am steping through each record in qualities and performing a find on any person that matches those qualities and keeping the count.

A. Is there a better way to do this as it takes forever to run through.

B. Can I perform a Find within a Find. IE if there are not any people with blue eyes why search through those records. I need something like  if found_set = find(blue eyes) >0 do more searching, else go to find(brown_eyes) etc...

I guess something like @people = find(blue-eyes)

if @people then find @people.with_the_next_qualities_without_searching_the_entire_db

Re: complex find on large dataset

This should probably be done with custom SQL, as this is exactly what SQL does best. I'm no SQL expert, but my first stab would be something like this:

SELECT COUNT(p.id), q.eye_color, q.height, q.weight, . . .
    FROM people AS p
        LEFT OUTER JOIN qualities AS q
        ON q.person_id = p.id
    GROUP BY q.eye_color, q.height, q.weight, . . .

I may be way off on that, try it out.

Once you've got the SQL sorted out, you can wrap it in a custom method in your ActiveRecord model for easy access.

NOTE: With this solution, there is no need for the table with all possible combinations and the query would run much quicker than iterating over each of the combinations.