Topic: virtual folders / smart filtering

Having recently started coding Rails, I'm now stuck at drafting something we all know and like: virtual folders (or what's called 'smart playlist' in iTunes). The concept is the following:

Users can create [many] filters. These filters have [many] rules and [many] data sources. Each data source has [many] data.

Here's a real-world example of a personalized news poll: User A likes digg, slashdot and del.icio.us, and dislikes all other (similar) services. He's interested in articles related to technology and/or science, in content with lots of comments or lots of diggs, and in stories by author X. First he tells the new filter to apply only to his favourite services (via multiple checkboxes), then sets it up with the following configuration:

Notify me if at least [2] of the following criteria are met:
- Category is [Technology]
- Category is [Science]
- Has more than [50] comments
- Has more than [100] diggs
- Author is [X]
(e.g. the notification could be a daily email with the aggregated links)
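
To make the configuration above concrete, here is a minimal plain-Ruby sketch of the "at least N of these criteria" check. It deliberately leaves the database out. The names Rule and Filter, the comparison symbols and the article hash keys are all made up for illustration.

```ruby
# Hypothetical names throughout: Rule, Filter and the article hash keys
# are illustrative and not taken from any existing library.
Rule = Struct.new(:attribute, :comparison, :value) do
  # Returns true when the item's attribute satisfies this rule.
  def match?(item)
    actual = item[attribute]
    case comparison
    when :eq then actual == value
    when :gt then !actual.nil? && actual > value
    else raise ArgumentError, "unknown comparison: #{comparison}"
    end
  end
end

Filter = Struct.new(:required_matches, :rules) do
  # Number of rules the item satisfies.
  def match_count(item)
    rules.count { |rule| rule.match?(item) }
  end

  # True once enough rules are satisfied to warrant a notification.
  def notify?(item)
    match_count(item) >= required_matches
  end
end

filter = Filter.new(2, [
  Rule.new(:category, :eq, "Technology"),
  Rule.new(:category, :eq, "Science"),
  Rule.new(:comments, :gt, 50),
  Rule.new(:diggs,    :gt, 100),
  Rule.new(:author,   :eq, "X")
])

article = { category: "Technology", comments: 12, diggs: 340, author: "Y" }
puts filter.notify?(article) # category and diggs match, so 2 of 5 rules hold
```

Keeping the match count separate from the yes/no decision means the same number could later be used to sort results by how many criteria they meet.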

Now the question is how to implement the rules database-wise. Give them a separate table? Probably, but with which columns (I could make this more complex with rules such as 'has more comments than the monthly average comment count')? Plus, do we convert all these rules to ActiveRecord queries on the fly? Won't we run into major problems once all the filters and rules have to be evaluated every time the system polls a new post?

Or should rules and filters have a many-to-many relationship (thereby removing all the duplicate rules), with rules evaluated on each new incoming content snippet and then telling their parent filters whether the specific rule evaluated to true or false? This still leaves the question of how to store the search queries.

Any thoughts/ideas/suggestions highly appreciated.

Re: virtual folders / smart filtering

This is no easy task.

First make sure your users need this amount of control. You may be able to get by with an advanced search. You could create a searches table with comments_count, diggs_count, etc. columns for setting the conditions. You could also create a many-to-many relationship between searches and categories.

Obviously that approach has its limitations. If it doesn't satisfy the requirements you'll have to add another layer of abstraction which is what you were originally thinking (a rules table, etc.). Here's one way to do that:

filters
required_conditions (number of conditions which must be met to succeed)

conditions
filter_id (belongs_to filter)
rule_id (belongs_to rule)
comparison (=, <=, >=, !=, starts with, contains, etc.)
value (user specified number/string/id)

rules
name (Author, Category, etc.)
column_name (categories.id, articles.comments_count, articles.diggs_count, etc.)
type (STI so you can change the behavior for different types of rules)

You will then need to generate the SQL query dynamically for the given filter's conditions.
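
One possible way to generate that SQL, sketched in plain Ruby so it runs without Rails: each condition becomes a CASE expression that yields 1 or 0, and the sum is compared against required_conditions. The COLUMNS and OPERATORS whitelists, the Condition struct and the where_clause helper are assumptions made for this example, not part of Rails. User-supplied values only ever travel as bind parameters.

```ruby
# Hypothetical whitelist: maps a rule name to the column it filters on.
COLUMNS = {
  "Category" => "categories.id",
  "Comments" => "articles.comments_count",
  "Diggs"    => "articles.diggs_count"
}.freeze

# Only these operators may ever be interpolated into the SQL string.
OPERATORS = %w[= != <= >= < >].freeze

Condition = Struct.new(:rule_name, :comparison, :value)

# Returns [sql_fragment, bind_values] for a WHERE clause that is true when
# at least `required` of the given conditions hold.
def where_clause(conditions, required)
  raise ArgumentError, "no conditions given" if conditions.empty?

  binds = []
  terms = conditions.map do |c|
    column = COLUMNS.fetch(c.rule_name) { raise ArgumentError, "unknown rule: #{c.rule_name}" }
    raise ArgumentError, "unknown comparison: #{c.comparison}" unless OPERATORS.include?(c.comparison)

    binds << c.value # the value is passed separately, never interpolated
    "CASE WHEN #{column} #{c.comparison} ? THEN 1 ELSE 0 END"
  end
  ["(#{terms.join(' + ')}) >= #{Integer(required)}", binds]
end

sql, binds = where_clause(
  [Condition.new("Comments", ">", 50), Condition.new("Diggs", ">", 100)],
  1
)
puts sql
p binds
```

The fragment and its binds could then be passed to whatever query interface you use. Conditions on categories.id would also need a join, which this sketch leaves out.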

Hopefully that will get you started.

Railscasts - Free Ruby on Rails Screencasts

Re: virtual folders / smart filtering

Ryan, thanks for your reply. Am I correct that your 'advanced search' suggestion is to have one table with quite a lot of columns storing the user specified searches, e.g.

Searches
user_id
search_type
digg_count_min
digg_count_max
comments_count_min
comments_count_max
author_name
topic_contains
...

plus an n:m relationship to categories? When reducing the logic to a simple AND/OR search this seems viable. The one thing I never liked about search engines, though, is that you cannot specify many conditions and get back a sorted list that starts with the results meeting the most conditions and descends to those meeting only one or two.
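
The kind of ranking I have in mind could be sketched like this in plain Ruby (no database involved). The item hashes, their keys and the rank_by_matches helper are invented for the example:

```ruby
# Each condition is a lambda taking an item hash; the keys are hypothetical.
conditions = [
  ->(item) { item[:category] == "Technology" },
  ->(item) { item[:comments] > 50 },
  ->(item) { item[:diggs] > 100 }
]

items = [
  { title: "a", category: "Technology", comments: 80, diggs: 150 },
  { title: "b", category: "Science",    comments: 60, diggs: 10 },
  { title: "c", category: "Technology", comments: 5,  diggs: 200 },
  { title: "d", category: "Sports",     comments: 1,  diggs: 1 }
]

# Pairs every item with the number of conditions it meets, drops items that
# meet none, and sorts the rest so the best matches come first.
def rank_by_matches(items, conditions)
  scored = items.map { |item| [item, conditions.count { |c| c.call(item) }] }
  scored.reject { |_, score| score.zero? }
        .sort_by { |_, score| -score }
end

rank_by_matches(items, conditions).each do |item, score|
  puts "#{item[:title]}: #{score} condition(s) met"
end
```

Note that sort_by does not promise any particular order for items with equal scores, so a tie-breaker (date, for instance) would have to be added to the sort key.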

As for the second suggestion: am I right that you've put 'comparison' and 'value' into the conditions table to keep the rules table short? Also, is the rules table basically just a list of all available search parameters, each with a human readable name and the relevant table column?

What I still fear is the CPU load when this is implemented in a non-optimized way. Imagine this code being used in a real-world app: this could be anything from a personalized news filter to a vacation planner and hotel search ("show me all available hotels for [date] that ideally have WLAN, a swimming pool, breakfast until 2pm, are close to the city centre etc"). Now if I end up with 1000 users that each have 5 active 'smart searches' plus daily updated hotel stats (.available?) for hundreds of hotels, wouldn't the app have to construct *lots* of SQL queries day after day?

Considering that the 'smart filters' are more static than the data they are filtering, wouldn't it be better to work from the updated data towards the filters and users (bottom-up) instead of top-down? E.g. the app polls a new story on digg, checks which rules are met, finds the parent filters and increments a counter, and once that counter hits a certain level, calls a method that alerts the user? In this case, wouldn't I need a HABTM for incoming data and filters, and store the counter in this table?


PS: yes, I'm aware that this is complicated logic, but I'm learning Rails to accomplish stuff that's not available otherwise, so even if the classic blog/comments or products/cart applications are helpful for learning the language, they are not the reason why I'm doing all this :)

Re: virtual folders / smart filtering

aleco wrote:

Am I correct that your 'advanced search' suggestion is to have one table with quite a lot of columns storing the user specified searches, e.g.

Yep.

aleco wrote:

am I right that you've put 'comparison' and 'value' into the conditions table to keep the rules table short?

Nope, I put this in a separate table because it is user definable and can change depending on the filter. Someone may want to search for a given post which has fewer than 100 comments, or one that has more than 100 comments - it uses the same "rule" (which I guess should be renamed to something more descriptive) but provides a different condition.

aleco wrote:

Also, is the rules table basically just a list of all available search parameters, each with a human readable name and the relevant table column?

Right.

aleco wrote:

What I still fear is the CPU load when this is implemented in a non-optimized way. Imagine this code being used in a real-world app: this could be anything from a personalized news filter to a vacation planner and hotel search ("show me all available hotels for [date] that ideally have WLAN, a swimming pool, breakfast until 2pm, are close to the city centre etc"). Now if I end up with 1000 users that each have 5 active 'smart searches' plus daily updated hotel stats (.available?) for hundreds of hotels, wouldn't the app have to construct *lots* of SQL queries day after day?

The search would be performed every time the user goes to the smart list. So it depends on how often it is visited. There are various ways of caching this returned result, but this causes problems because then it won't be a live result and/or you will have to expire the caches after each change.


aleco wrote:

Considering that the 'smart filters' are more static than the data they are filtering, wouldn't it be better to work from the updated data towards the filters and users (bottom-up) instead of top-down? E.g. the app polls a new story on digg, checks which rules are met, finds the parent filters and increments a counter, and once that counter hits a certain level, calls a method that alerts the user?

If you want to alert the user when a smart filter changes then this will be more difficult because you can't use the live query I mentioned above. Do you really need this requirement?

aleco wrote:

In this case, wouldn't I need a HABTM for incoming data and filters, and store the counter in this table?

I don't know, do you need to store the counter? From my understanding this is a way to optimize things, but I don't recommend worrying about that until you know you need it. It's reasonably easy to cache the count in a column later.


Re: virtual folders / smart filtering

ryanb wrote:

If you want to alert the user when a smart filter changes then this will be more difficult because you can't use the live query I mentioned above. Do you really need this requirement?

Hmm. Well, the concept of the app was to create a smart agent. You tell the app once what you're interested in, and you're done. While I'll initially write it for data I can easily poll from the web, the goal is to create a system that can be used for many different datasets. It should be usable for things like

- "One day I'd like to fly to Paris, London or Rome. Too bad I don't have a travel agent who'd call me if a cheap last minute flight is available."
- "Even if I'm on vacation drinking Caipirinhas on the beach, I'd still like to receive the most important sports news via email, so I can read it on my 3G cell phone. Just don't bug me with ice hockey, football or soccer, give me the stuff *I* like."

As you can see, it's about data snippets knowing who might be interested in it, instead of users actively searching.

Re: virtual folders / smart filtering

I'm unsure if there is anything like an event handling system in Rails (as I'm very new to Rails), but I now think it's easiest if:

1) filters and their respective rules register for specific events they are interested in
2) new data triggers events for the evaluation types (comments, author,...)
3) rules that receive an event increment a counter inside the filter
4) on each counter increment the filter checks if enough conditions are met

This way data and filters don't have to know anything about each other. Data snippets simply send all the information they can provide via events, and filter rules pick them up. Theoretically data snippets could even check if evaluation types for the information they can provide already exist, and if not, trigger an event that will register the new evaluation type, so it can be used in future filters.
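
As a rough plain-Ruby sketch of steps 1 to 4 (nothing Rails-specific): FilterRegistry, register and publish are hypothetical names, and the counters live only for the duration of one incoming snippet.

```ruby
# All class and method names here are hypothetical.
class FilterRegistry
  def initialize
    # event name => list of [filter_id, predicate] subscriptions
    @subscriptions = Hash.new { |hash, key| hash[key] = [] }
    @required = {}
  end

  # Step 1: a filter registers each of its rules for the event it cares
  # about. `rules` is a list of [event, predicate] pairs so that one filter
  # can have several rules on the same event.
  def register(filter_id, required:, rules:)
    @required[filter_id] = required
    rules.each { |event, predicate| @subscriptions[event] << [filter_id, predicate] }
  end

  # Steps 2-4: a new data snippet fires one event per attribute it carries;
  # every rule that matches bumps its filter's counter, and the ids of all
  # filters whose counter reached their threshold are returned.
  def publish(snippet)
    counters = Hash.new(0)
    snippet.each do |event, value|
      next unless @subscriptions.key?(event)

      @subscriptions[event].each do |filter_id, predicate|
        counters[filter_id] += 1 if predicate.call(value)
      end
    end
    counters.select { |filter_id, count| count >= @required[filter_id] }.keys
  end
end

registry = FilterRegistry.new
registry.register(:user_a_news, required: 2, rules: [
  [:category, ->(v) { v == "Technology" }],
  [:category, ->(v) { v == "Science" }],
  [:comments, ->(v) { v > 50 }],
  [:diggs,    ->(v) { v > 100 }],
  [:author,   ->(v) { v == "X" }]
])

p registry.publish({ category: "Science", comments: 75, diggs: 3, author: "Z" })
```

The snippet only needs to expose its attributes, and the registry never looks at a snippet attribute that no rule subscribed to, which is the decoupling described above.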

The main downside of this solution is that one cannot search through 'old' data - but I could live with this. Unfortunately I'm a bit clueless how the models and controllers would look with the above concept. And I'm not even sure if this is the best way to deal with the problem either. Any thoughts?

Re: virtual folders / smart filtering

Perhaps what you want is an observer. You can observe changes to given models. Then you can call the appropriate actions/triggers if the given model meets the filter's criteria.
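
To illustrate the general idea, here is a hand-rolled plain-Ruby stand-in rather than Rails' own observer classes. Article, FilterObserver, the after_save hook name and the comment threshold are all assumptions for the sake of the example.

```ruby
# Hypothetical stand-in for a model; this is not an ActiveRecord class.
class Article
  attr_reader :attributes

  @observers = []
  class << self
    # Observers registered for every Article.
    attr_reader :observers
  end

  def initialize(attributes)
    @attributes = attributes
  end

  # Pretend persistence: after "saving", tell every observer about the record.
  def save
    self.class.observers.each { |observer| observer.after_save(self) }
    true
  end
end

# Watches saved articles and records the ones that meet a filter criterion.
class FilterObserver
  attr_reader :alerts

  def initialize(min_comments)
    @min_comments = min_comments
    @alerts = []
  end

  def after_save(article)
    @alerts << article.attributes[:title] if article.attributes[:comments] > @min_comments
  end
end

observer = FilterObserver.new(50)
Article.observers << observer

Article.new({ title: "Quiet story", comments: 3 }).save
Article.new({ title: "Busy story", comments: 120 }).save
p observer.alerts
```

In a real app the after_save body would run the filter check and queue the notification instead of collecting titles in an array.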
