Topic: Cron to update database

Hi everyone.
I am in the process of porting a site to Rails. I am very happy with the overall result, but I need some pointers on Ruby and Rails to make my application more efficient.

So first things first.
The original site (written in PHP) runs a script as a cron job every night. This script reads a rather large flat file with fields separated by | and imports it into a database table. My first attempt at the Ruby script made use of ActiveRecord, but then I decided not to use it and rewrote the script:

SQL = <<STR
use crumun_development;
truncate originals;
STR

FIELDS = <<STR
insert into originals (hoy,ship_cd,depart...-and a lot more fields here-...promotion) values (
STR

IO.popen("mysql -u root", 'w') {|io| io.puts SQL}
IO.readlines('file.txt').each do |line|
  # chomp strips the trailing newline so the last field is quoted cleanly
  values = line.chomp.split(/\|/).collect {|field| "'#{field}'"}.join(",")
  IO.popen("mysql -u root", 'w') do |io|
    io.puts FIELDS + values + ")"
  end
end

I then realized that opening a pipe and sending login info to mysql for every row in the file was wasteful, so I changed the script to:

mysql = IO.popen("mysql -u root", 'w')
mysql.puts SQL
IO.readlines('rssc2.txt').each do |line|
  # chomp strips the trailing newline so the last field is quoted cleanly
  values = line.chomp.split(/\|/).collect {|field| "'#{field}'"}.join(",")
  mysql.puts FIELDS + values + ")"
end

But then mysql complains: packet bigger than 'max_allowed_packet'. I know I can change this value (at least in theory), but I thought this was an excellent time to get some help.
How would you approach this? How could I improve the script so it takes the least possible time to process?

Re: Cron to update database

I'd try using the MySQL Ruby API rather than IO. You may already have it installed. If the limit you are running into is client side, it might solve the problem.
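Something along these lines, as a sketch using the classic MySQL/Ruby bindings (the `mysql` gem). The connection details, the abbreviated column list, and the helper names here are placeholders, not anything from the original script:

```ruby
# Pure helper: turn one '|'-delimited line into a quoted VALUES list.
# (Naive quoting; on a live connection, db.escape_string is safer.)
def values_clause(line)
  line.chomp.split("|", -1).map { |f| "'#{f.gsub("'", "\\\\'")}'" }.join(",")
end

# One connection, one INSERT per row -- no shelling out to the mysql CLI.
def import_file(path)
  require "mysql"                      # gem install mysql
  db = Mysql.new("localhost", "root", "", "crumun_development")
  db.query("TRUNCATE originals")
  IO.foreach(path) do |line|
    # column list abbreviated as in the original post
    db.query("INSERT INTO originals (hoy, ship_cd, depart, promotion) " \
             "VALUES (#{values_clause(line)})")
  end
  db.close
end
```

The nightly run is then just `import_file('file.txt')`. Since each row is its own statement on one open connection, no single packet gets large enough to hit max_allowed_packet.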

BTW, how did the PHP script solve this? Did it import it all at once like you are doing here?

Last edited by ryanb (2006-08-03 12:49:08)

Railscasts - Free Ruby on Rails Screencasts

Re: Cron to update database

If you're really just taking a simple text file and dumping it into MySQL, you may not need Ruby/PHP at all. MySQL will let you import a file directly, and you can specify what character separates columns.

Re: Cron to update database

Good thinking Danger! If you aren't doing any kind of processing then you can just load the text file using the LOAD DATA command. Something like this (untested):

LOAD DATA INFILE '/path/to/file' INTO TABLE my_table FIELDS TERMINATED BY '|' (col1, col2, col3)


Re: Cron to update database

The PHP script is very similar to the Ruby one. It loops through each line in the file, processes it, creates the SQL query and sends it to the database. I just did not know how to talk to mysql properly from a Ruby program. The MySQL API is just what I needed.
Thanks a bunch for that ryanb.

danger, I am using Ruby/PHP to process the file because I do not need all the fields contained in the text file. I also filter the rows, so there is not one record for every row in the file. I also need to re-format some date fields properly, and then I need to wrap everything in a transaction. The script I posted is overly simplified. There is a lot more processing involved, but I just take one task at a time... otherwise I tend to get lost.

Which leads me to another question: What do you guys think would be faster?

a) Dump the whole text file into a temporary table, extract and process what I need and then insert it into another table, or
b) Have Ruby process the text file before inserting it into a table, so what I do not need gets discarded without ever going into the db

I have the script set up as in option a) but it takes forever to finish. The text file is huge and I only really need about 20% of the rows it contains.

Re: Cron to update database

I would think that modifying the text file before attempting to load it would be faster. This would reduce your result set at the earliest possible point.
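A sketch of that option (b) approach: stream the big file once, keep only the rows and columns you need, and write a smaller file that LOAD DATA INFILE can then slurp. The column positions and the filter rule below are made-up placeholders:

```ruby
KEEP_COLS = [0, 1, 5]                  # hypothetical positions of wanted fields

def keep?(fields)
  !fields[1].to_s.empty?               # placeholder rule -- use your real one
end

# Returns the trimmed-down row, or nil if the row should be discarded.
def filter_line(line)
  fields = line.chomp.split("|", -1)
  return nil unless keep?(fields)
  KEEP_COLS.map { |i| fields[i] }.join("|")
end

def filter_file(infile, outfile)
  File.open(outfile, "w") do |out|
    IO.foreach(infile) do |line|       # streams the file; IO.readlines loads it all
      row = filter_line(line)
      out.puts row if row
    end
  end
end
```

If only ~20% of the rows survive, the subsequent LOAD DATA has a fifth of the work to do, and the huge file never touches the database.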

Re: Cron to update database

I don't know. If you only need 20% of the file, then I would say loading it into a temporary table and then SELECTing the rows you need and working with that would be faster. Then again, it probably depends a lot on what you are doing with each row in ruby. If performance matters a lot, I would try both and see.


Re: Cron to update database

Either of these could be faster - it depends on what you're doing.  If it's worth the time try it both ways - and let us know how it turns out!

Re: Cron to update database

I believe the fastest way to do this is to leave as much work as possible to the database itself.
The solution I came up with based on your suggestions is dead simple. It involves 3 tables:

Originals - is where the text file gets dumped.
Temp - is a (not really) temporary table.
Live - is the table that holds the data accessed by the application. (I needed this table to remain untouched for as long as possible, for obvious reasons, and to reduce the time spent messing around with it.)

I realize this is pretty basic stuff, but if I showed you the script I wrote 8 months ago to handle this... well.

require "mysql"
sql = Mysql::new("localhost", "user", "password", "db")

#dump the whole file into table originals the fastest way. Hands down.
sql.query("LOAD DATA INFILE '/root/file.txt'
                INTO TABLE originals FIELDS TERMINATED BY '|'" )

#create table temp selecting only needed fields and records
sql.query("CREATE TABLE temp SELECT ship_cd , depart , ..and some more columns..  FROM originals WHERE...")

#before dropping table live and renaming temp I need to modify some date fields and do some further processing but
#at this point I got rid of 80% of useless data and processing time is no longer a concern

sql.query("DROP TABLE live")
sql.query("RENAME TABLE temp TO live")
sql.query("TRUNCATE originals")
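One refinement worth considering for the final swap: DROP TABLE followed by RENAME leaves a brief window in which no live table exists. MySQL's RENAME TABLE can rename several tables in one atomic statement, so the swap could instead look like this (untested sketch, using the table names from the script above):

```sql
-- Atomic swap: readers never see a moment without a `live` table.
RENAME TABLE live TO old_live, temp TO live;
DROP TABLE old_live;
TRUNCATE originals;
```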

Last edited by alce (2006-08-03 18:37:13)