Topic: Designing a data mining schema

A client of mine would like me to implement a data mining form to gather some demographics before releasing a new product. It will be implemented as a single form of questions on the home page of the site. The questions will be pretty standard - age, sex, email, plus some short-answer questions.

I am struggling with how to best model this. Right now, I'm thinking:

Models

Question
  :question => :string
Answer
  :question_id => :integer
  :answer => :string

Associations
Question :has_many Answers

So the views would take care of giving the options for the questions. For example, I would hard code the options for age as an array in the view like:

["under 18", "28-35", "36-45", "46-55", "55+]"

Obviously this would work, but I also want to include some more info to be stored with each answer. For example, I would like to keep all of the answers from one form submittal together. But in order to accomplish this, I would have to make another column in the answers table (:remote_ip => :string). I suppose it would work, but for some reason, it just doesn't feel DRY; I'm thinking that it could be modeled a little bit cleaner/clearer. Any ideas for alternate implementations? Perhaps an implementation that would allow me to push the options for each multiple-choice question into the database, rather than hard coding them into the views?

Last edited by jed.hurt (2007-04-23 00:27:54)

I thought about how mothers feed their babies with tiny little spoons and forks, so I wondered what do Chinese mothers use. Toothpicks?

Re: Designing a data mining schema

Are the questions fairly static or will they need to be dynamically and easily changable? This is a very important question because there are basically two different ways to do this: one easy, static way, and one hard dynamic way.

If you want to go with the easy way, it would be to create one table. Each question would be a column in that table and each answer would be a value.

# migration
create_table :survey_results do |t|
  t.column :age, :integer
  t.column :sex, :string
  t.column :email, :string
  ...
end

If you want to have a multiple choice of ages, you could create a second table: age_ranges. This would relate to the survey_results table by one-to-many. Where a given AgeRange has_many survey_results. This way the age options won't be hard-coded. I recommend doing this for any other multiple choice options as well.

The alternative way to do it is significantly more difficult. This is somewhat of an age old problem of database design because a given question needs to be displayed differently and can have many different forms of input and answers (multiple choice, text field, checkbox, etc.). There was an interesting blog post on o'reilly about this problem, but I'm too lazy to look it up at the moment.

Sometimes you can solve this with 3 tables: questions, answers, and choices. Where question has many answers and choices (for multiple choice, no choices for text fields). The difficult part is recording the entry in the answer because sometimes it's text they type in and sometimes it's a choice they selected. One way to solve this is by having 3 columns in the answers table: question_id, choice_id and content. Then just use the "choice_id" or "content" column depending on the type of question it is.

Not to mention validation. If you want to make sure, for example, that an email address is formatted properly this will be very difficult with the second approach. The first approach is a piece of cake because you can simply do validates_format_of on the email column.

Railscasts - Free Ruby on Rails Screencasts

Re: Designing a data mining schema

ryanb wrote:

There was an interesting blog post on o'reilly about this problem, but I'm too lazy to look it up at the moment.

Is this the post you were referring to?

ryanb wrote:

Are the questions fairly static or will they need to be dynamically and easily changable?

The questions will be static. The survey will probably only be up for about 4 weeks and shouldn't change much, if any.


ryanb wrote:

Each question would be a column in that table and each answer would be a value.

I do like the idea of having a single table, but unfortunately, there are some short-answer questions that would take a full sentence question, so without having column names like what_do_you_think_about_such_and_such, I think I'm going to need another table to hold the questions.

I think I'm still leaning towards hard coding the response options (for multiple-choice questions) straight into the views. This way, each tuple in the answers table holds only a foreign_key to its corresponding question in the questions table, and a second column (string) containing only the answer that the user chose  (rather than making tables for possible answers and juggling foreign_keys to them).

On the other hand, I do really like your idea of having the entire survey be a single tuple (rather than a bunch of tuples being added to the answers table for each survey). This seems like it would make the code cleaner when trying to analyze the data down the line. I suppose the only option for grouping with the multiple tuple scenario is to store a session_id (i.e. request.remote_ip, or or some other GUID) with each tuple in the answers table?


By the way, I'm totally digging RailsCasts. I've watched every one of them, and have learned quite a bit. Never change, ryanb! wink

Last edited by jed.hurt (2007-04-23 21:23:25)

I thought about how mothers feed their babies with tiny little spoons and forks, so I wondered what do Chinese mothers use. Toothpicks?

Re: Designing a data mining schema

jed.hurt wrote:

Is this the post you were referring to?

Yep!

jed.hurt wrote:

I do like the idea of having a single table, but unfortunately, there are some short-answer questions that would take a full sentence question, so without having column names like what_do_you_think_about_such_and_such, I think I'm going to need another table to hold the questions.

I was thinking you would hard-code the question in the view. But if you prefer not to do that then creating a second table seems the best solution.

jed.hurt wrote:

I suppose the only option for grouping with the multiple tuple scenario is to store a session_id (i.e. request.remote_ip, or or some other GUID) with each tuple in the answers table?

You could do that. Or you may want to create a new table to keep track of this which each answer can belong_to.

Railscasts - Free Ruby on Rails Screencasts

Re: Designing a data mining schema

ryanb wrote:

I was thinking you would hard-code the question in the view. But if you prefer not to do that then creating a second table seems the best solution.

Yeah, I think this will be more DRY in the long run. When I create the admin views to analyze the data that was collected, I'll just pull the questions straight out of the questions table in the database (i.e. @answer.question), rather than having to copy and paste the questions from the survey view into the admin view.

Last edited by jed.hurt (2007-04-24 12:01:38)

I thought about how mothers feed their babies with tiny little spoons and forks, so I wondered what do Chinese mothers use. Toothpicks?