Topic: Best way to structure a database for a large/static dataset

I’m a rails newbie and am looking to create an app to work off a very large dataset. The dataset contains survey data from 1980 through 2011 for about 8,000 respondents. Each year’s survey has between 900 and 7000 variable associated with it.

Basically, the app will be used to view all information for one respondent (seeing all the years at a glance), compare respondents, etc.

It seems like creating a table for each year isn’t necessarily the best strategy. What would be a good way to structure this data for the database? I’ve considered just having a table that’s basically the respondent ID, survey year, variable name, and value name. The difficulty there is that the values would range from simple boolean fields to fields with extensive amounts of text, so standardizing that field would be tricky (and probably very inefficient).

Any thoughts/tips/resources on where to turn?

Thanks in advance.

Re: Best way to structure a database for a large/static dataset

I don't think it would be that inefficient to have a universal model, each record with a space for each type of answer, (boolean, text, date, etc.) .

But before you decide I'd ask yourself what kind of semantics do you want to support, i,e

@answers = Answer.find(...) # find all answers for given respondent, survey and survey_year
@answers.text_answers # return only text answers from selected answers 
@answers.boolean_answers # return only boolean answers from selected answers

or

@answers = Answer.find(...) # find all answers for given respondent, survey and survey_year
@answers.each do |a|  # just plow through all answers, and deal each as needed
  if a.textanswer?
    ...
  elsif a.booleananswer?
    ...
  end
end

That's just an example,  but you get the idea.

There are several approaches, depending on the way you want to use the answers, and how worried you are about sparsely populated tables.  The most efficient would be to end up with separate answer tables, one for boolean answers, one for text answers, etc.  But that may lead you to less elegant programming semantics. 

I'd start by deciding how the semantics will look first, THEN worry about efficiency of the resulting database.  Filesystem storage is cheap, your time spent programming is not!  Make it so it's easy to understand/write the code!

Joe got a job, on the day shift, at the Utility Muffin Research Kitchen, arrogantly twisting the sterile canvas snout of a fully charged icing anointment utensil.