Topic: Problem: Normalization vs Performance?

Hey, I need some input for a problem I want to find a nice solution for:

I'm receiving data that looks like the following bundle which describes an 'update':

timestamp, user, [entity1, entity2, entity3], [source1, source2], data

Now I have to somehow create meaningful tables in order to support queries with the following inputs:
- find by timestamp, entity
- find by timestamp source
- find by timestamp, entity, source
....

as I cannot store arrays in the db, I need to split up this single update message
there are 2 possible ways that I think I have to chose from:

1. rows like:

timestamp, user, entity1, data...
timestamp, user, entity2, data...
....

problem: duplicated user, timestamp, etc, multiple rows


2. split into 2 tables:

table entities: timestamp, entity1
table updates: user, data

with a reference from entities to updates

problem: join needed on EVERY request (if I want to list the user and sort timestamp for example)


any suggestions how to solve this? smile

Re: Problem: Normalization vs Performance?

Humble thoughts:

(1) You can store complex objects (such as arrays) in the DB via serialization, it's well supported in AR.  However, it is way not optimal for performing searches.

(2) I wouldn't worry about spreading the stuff over tables because it's the natural relational representation.  RDBMS-es are pretty good at joins.  Create scopes to keep your queries DRY.

(3) Depending on what the whole problem-domain is, you may opt for a non-relational database back-end, such as MongoDB to which this config is more natural.

Re: Problem: Normalization vs Performance?

after some testing .. I think I will go with mongodb smile

Re: Problem: Normalization vs Performance?

as I cannot store arrays in the db, I need to split up this single update message there are 2 possible ways that I think I have to chose from:

Postgresql supports arrays. I'm sure Mongo will be fine, but really how much traffic are you looking at??

Re: Problem: Normalization vs Performance?

it's basically arrays of dateranges, query input is also a daterange - which seems to be working fine

traffic in what sense?