Topic: Parsing data with ruby

Hi there,

I'm trying to figure out the right direction to go in regards to learning what I need to create a script that can take numerous files with raw cron data in it and convert it to a human readable format. Each file will have the raw output of crontab -l in it like the following:

10,25,40,55 * * * * /some/cron/here > /dev/null 2>&1
30 */4 * * * /some/cron/here

I need to parse the data in these files and output it in a format similar to the following:

Cronjob | # of Servers | Every minute | Every hour | Every day | Every week | Every month
-----------------------------------------------------------------------------------------
CronHere| 10 | N | N | Y | Y | Y
CronHere| 8 | Y | N | N | Y | Y

I'm fairly new to ruby, and I'm trying to do as much of this as possible myself, so any guidance on what I'd need to be able to do this would be greatly appreciated.

Re: Parsing data with ruby

If your priority is to learn Ruby, that's one answer; if you just want to get it done with as little effort as possible, I'd use awk and call awk from Ruby.

If you want to learn more about Ruby, google

Ruby Awk

You'll find several examples of people who have done things in pure Ruby that they formerly did with AWK, which should give you plenty to learn from.

Joe got a job, on the day shift, at the Utility Muffin Research Kitchen, arrogantly twisting the sterile canvas snout of a fully charged icing anointment utensil.

Re: Parsing data with ruby

Thanks for the reply. I've been looking through the results, and it seems I have a long way to go, as I don't understand how I would apply them to what I want to do.

The biggest thing that confuses me is how I would tell it to add a N or a Y depending on whether there's a * or not, as well as how I would actually compile the list of crons on each server and get a total count on the hits for each cron.

I've always struggled with programming, and I'm hitting a bit of a roadblock here with Ruby, as this is a fairly complicated task.

Re: Parsing data with ruby

OK, if you are that new to programming, forget AWK; it's just another learning curve you don't need right now.

The first thing is to write a Ruby program that opens a file, reads each line, and then just writes that line back out.

file = File.new("crontab.out", "r")
while (line = file.gets)
    puts line
end
file.close

Get that to work and let me know
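
Once that works, here is a slightly more compact way to do the same read-and-print loop, as a sketch only. It uses File.foreach, which closes the file for you when the loop ends. The helper name read_lines is made up for illustration, and the file name crontab.out is assumed to match the code above.

```ruby
# Hypothetical helper (not from the code above): returns every line of the
# file at `path`, with the trailing newline removed.
def read_lines(path)
  lines = []
  # File.foreach opens the file, yields one line at a time, and closes
  # the file again when the block finishes.
  File.foreach(path) do |line|
    lines << line.chomp
  end
  lines
end

# Assumes crontab.out sits in the current directory; nothing is printed
# if the file is not there.
read_lines("crontab.out").each { |line| puts line } if File.exist?("crontab.out")
```

Either form is fine for this job; the explicit File.new / file.close version just makes each step visible.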


Re: Parsing data with ruby

Thanks again, Brad. If we could somehow do this piece by piece that would be awesome, as it'll be a lot easier for me to understand. Here's the output (I just changed the name of the file in your code to cronjob.txt)

# ruby cron.rb
10,25,40,55 * * * * /usr/local/cpanel/whostmgr/bin/dnsqueue > /dev/null 2>&1
30 */4 * * * /usr/bin/test -x /usr/local/cpanel/scripts/update_db_cache && /usr/local/cpanel/scripts/update_db_cache
*/5 * * * * /usr/local/cpanel/bin/dcpumon >/dev/null 2>&1
9 1 * * * /usr/local/cpanel/whostmgr/docroot/cgi/cpaddons_report.pl --notify
45 */8 * * * /usr/bin/test -x /usr/local/cpanel/bin/optimizefs && /usr/local/cpanel/bin/optimizefs
45 2 * * * /usr/local/cpanel/scripts/upcp --cron
0 1 * * * /usr/local/cpanel/scripts/cpbackup
35 * * * * /usr/bin/test -x /usr/local/cpanel/bin/tail-check && /usr/local/cpanel/bin/tail-check
30 */2 * * * /usr/local/cpanel/bin/mysqluserstore >/dev/null 2>&1
15 */2 * * * /usr/local/cpanel/bin/dbindex >/dev/null 2>&1
0 6 * * * /usr/local/cpanel/scripts/exim_tidydb > /dev/null 2>&1
2,58 * * * * /usr/local/bandmin/bandmin
0 0 * * * /usr/local/bandmin/ipaddrmap
0 6 * * * /usr/local/cpanel/scripts/exim_tidydb > /dev/null 2>&1
45 */4 * * * /usr/bin/test -x /usr/local/cpanel/scripts/update_mailman_cache && /usr/local/cpanel/scripts/update_mailman_cache
15 */6 * * * /usr/local/cpanel/scripts/recoverymgmt >/dev/null 2>&1

As you can see, all standard CPanel cron jobs. So, that reads and outputs the file. I guess the next step is to somehow separate the actual times from the command being executed? Or would you do that some other way? Such as, telling it to look for a * or a number and then sorting the crons later?

Re: Parsing data with ruby

file = File.new("crontab.out", "r")
while (line = file.gets)
    parts = line.split(' ')
    puts "min = #{parts[0]}"
    puts "hour = #{parts[1]}"
    ...
    puts "day of week = #{parts[4]}"
end
file.close

Re: Parsing data with ruby

Hmm, that gives me an interesting output.

# ruby cron3.rb
min = 10,25,40,55
hour = *
day of week = *
min = 30
hour = */4
day of week = *
min = */5
hour = *
day of week = *
min = 9
hour = 1
day of week = *
min = 45
hour = */8
day of week = *
min = 45
hour = 2
day of week = *
min = 0
hour = 1
day of week = *
min = 35
hour = *
day of week = *
min = 30
hour = */2
day of week = *
min = 15
hour = */2
day of week = *
min = 0
hour = 6
day of week = *
min = 2,58
hour = *
day of week = *
min = 0
hour = 0
day of week = *
min = 0
hour = 6
day of week = *
min = 45
hour = */4
day of week = *
min = 15
hour = */6
day of week = *

So, thus far I can list the contents of a text file containing cron output, and I can generate a list of the cron execution times based on minutes, hours and day of the week.

Based on the way this is listed, I'm guessing that every 3 rows is a different cron job? I.e. this would be the first two cron jobs in the list:

hour = *
day of week = *
min = 30
hour = */4
day of week = *
min = */5

Or am I interpreting that wrong?

Re: Parsing data with ruby

No, you have it right. I left off two of the array elements, thinking you could fill in the ... with the appropriate code, an exercise left to the reader :>

Let's clean it up so it's complete and easier to read:

file = File.new("crontab.out", "r")
while (line = file.gets)
    parts = line.split(' ')
    puts "min = #{parts[0]}, hour = #{parts[1]}, day of month = #{parts[2]}, month = #{parts[3]}, day of week = #{parts[4]}"
end
file.close

Last edited by BradHodges (2011-11-08 19:45:20)


Re: Parsing data with ruby

OH! I'm sorry, I didn't know you wanted me to fill that in. If you want me to work on something specific just add a comment in the code or let me know in the post and I'll tackle it.

Re: Parsing data with ruby

Not a problem. In my last post there was a minor syntax error; I've since corrected the post, but see if you can figure it out first before you look at the corrected post.

To explain what's going on, the split function of a string will split the string into an array based on the argument, so

line.split(' ')

will split the String 'line' on whitespace (with a single space as the argument, Ruby treats any run of whitespace as one separator),

parts = "1 2 3 4 5".split(' ')

will return an array (zero-based index), so after that call:

parts[0] = "1"
parts[1] = "2"
parts[2] = "3"
...

parts = "1,2,3,4,5".split(',')

would produce the same result
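
One thing to watch for with crontab lines: the command itself contains spaces, so a plain split(' ') chops it into pieces too. A rough sketch of one way around that is to pass a limit as the second argument to split, so everything after the fifth field stays together. The helper name split_cron_line is made up for this example.

```ruby
# Hypothetical helper: splits one crontab line into the five schedule
# fields plus the command.
def split_cron_line(line)
  # strip removes the trailing newline; the limit of 6 makes split stop
  # after five fields, so the sixth element keeps the whole command.
  line.strip.split(' ', 6)
end

parts = split_cron_line("10,25,40,55 * * * * /some/cron/here > /dev/null 2>&1\n")
puts "min     = #{parts[0]}"
puts "command = #{parts[5]}"
```

Here parts[0] through parts[4] are the schedule fields and parts[5] is the full command, redirects included.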


Re: Parsing data with ruby

I actually got the error, and then figured it out! There was a missing } :)

Here's my output now:

$ ruby parse2.rb 
min = 10,25,40,55, hour = *, day of month = *, month = *, day of week = *
min = 30, hour = */4, day of month = *, month = *, day of week = *
min = */5, hour = *, day of month = *, month = *, day of week = *
min = 9, hour = 1, day of month = *, month = *, day of week = *
min = 45, hour = */8, day of month = *, month = *, day of week = *
min = 55, hour = 17, day of month = *, month = *, day of week = *
min = 35, hour = 1, day of month = *, month = *, day of week = *
min = */5, hour = *, day of month = *, month = *, day of week = *
min = 45, hour = 2, day of month = *, month = *, day of week = *
min = 0, hour = 1, day of month = *, month = *, day of week = *
min = 35, hour = *, day of month = *, month = *, day of week = *
min = 30, hour = */2, day of month = *, month = *, day of week = *
min = 15, hour = */2, day of month = *, month = *, day of week = *
min = 0, hour = 6, day of month = *, month = *, day of week = *
min = 2,58, hour = *, day of month = *, month = *, day of week = *
min = 0, hour = 0, day of month = *, month = *, day of week = *
min = 0, hour = 6, day of month = *, month = *, day of week = *
min = 45, hour = */4, day of month = *, month = *, day of week = *
min = 15, hour = */6, day of month = *, month = *, day of week = *

Re: Parsing data with ruby

OK, now it gets tricky: each part can have several formats

*
*/x  where x is a number
x,y,z...   where x, y, z and so on are numbers
x-y  where x and y are numbers


* means the field places no restriction (it matches every value), so if parts[0] = "*", the cronjob is not tied to specific minutes

*/x means repeat at a step of x, so if parts[1] = "*/4", it means every four hours

x,y,z means each listed value, so if parts[4] = "0,2,5" it means every Sunday, Tuesday and Friday

x-y means all in the range, so if parts[1] = "8-18", it means every hour from 8 AM to 6 PM

Do you want to decode and present all of that detail?


Re: Parsing data with ruby

I left off one format

x where x is a number,  so parts[1] = "12" means 12 noon every day
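
If you ever do want to tell those formats apart, here is a minimal sketch of how it could look. The function name field_kind and the symbols it returns are invented for this example, and it only recognizes the five simple formats listed in this thread; anything else (combinations such as 1-5/2, or month and day names) falls through to :other.

```ruby
# Hypothetical classifier for a single crontab field.
def field_kind(field)
  case field
  when "*"               then :any     # no restriction
  when %r{\A\*/\d+\z}    then :step    # */x
  when /\A\d+-\d+\z/     then :range   # x-y
  when /\A\d+(,\d+)+\z/  then :list    # x,y,z
  when /\A\d+\z/         then :single  # x
  else                        :other   # anything this sketch does not cover
  end
end

["*", "*/4", "0,2,5", "8-18", "12"].each do |field|
  puts "#{field} is #{field_kind(field)}"
end
```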

Last edited by BradHodges (2011-11-08 20:09:45)


Re: Parsing data with ruby

Yeah, this is the hard part. The only thing I want to achieve with this script is to have it output a Y if there is anything but a * in that column. If there's a *, it should just be an N.

I want to do this with multiple files simultaneously, as well. The times will vary on each of the servers, but the actual cron job itself and the appropriate columns should be the same (for example, said cron job will always execute at the interval of 10,25,40,55 * * * * except the 10,25,40,55 columns will all be different).

To make this simple, without needing to input specific times, and because getting more complicated than this would probably be over my head, just outputting a Y or a N for an entry in the results is sufficient for my purposes here.

Re: Parsing data with ruby

# Return "N" when the field is "*", and "Y" for anything else.
def doYorN(part)
  if part == "*"
    "N"
  else
    "Y"
  end
end

file = File.new("crontab.out", "r")
puts "Minutes   Hours  DayOfMonth  Month DayOfWeek"
while (line = file.gets)
    parts = line.split(' ')
    puts "#{doYorN(parts[0])} #{doYorN(parts[1])} #{doYorN(parts[2])} #{doYorN(parts[3])} #{doYorN(parts[4])}"
end
file.close

The formatting won't be pretty,  but that is a good exercise for you to figure out.
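
Have a go at it yourself first. For comparison afterwards, here is one possible sketch that lines the flags up under the header using format with fixed column widths. The helper names schedule_flags and format_row are made up, and the widths are simply chosen to match the header string above.

```ruby
# Hypothetical helper: one flag per schedule field, "N" for "*" and "Y"
# for anything else (the same rule as doYorN).
def schedule_flags(line)
  line.split(' ').first(5).map { |part| part == "*" ? "N" : "Y" }
end

# Hypothetical helper: pads each flag so it starts under its header word.
# %-10s means "left-justify in a column 10 characters wide".
def format_row(flags)
  format("%-10s%-7s%-12s%-6s%s", *flags)
end

puts "Minutes   Hours  DayOfMonth  Month DayOfWeek"
puts format_row(schedule_flags("30 */4 * * * /some/cron/here"))
```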


Re: Parsing data with ruby

Thanks! I need a bit of a brain break, but I'll tackle it later tonight or tomorrow and let you know. I really appreciate all your help so far with this project. I realize you're doing the vast majority of the work, so once this is over and done with I'm going to try and code something on my own to apply all that I've learned.

Re: Parsing data with ruby

I am not following the part about multiple servers.

You might want to explain that a bit more.  Are you parsing crontab output from multiple servers?

How are those multiple crontab outputs presented to you?

/whatever/crontabs/server1.txt
/whatever/crontabs/server2.txt
/whatever/crontabs/server3.txt
/whatever/crontabs/server4.txt
/whatever/crontabs/server5.txt

????


Re: Parsing data with ruby

Yeah, that's the idea. Say I have 10 servers with the raw crontab output in each file - all in files named server.log, server2.log, etc. I'm parsing all of them to come up with the output I mentioned in my original post.

If actually counting the number of times a cron job shows up and outputting a total sum is too difficult, I could at least get the raw output from each file and do the addition myself to get the total amount a cron job shows up.

So, say I have 10 servers with:

/some/cron/here/

And 8 servers with:

/another/cron/here/

The final format I'm aiming for would be like so:

Cronjob | # of Servers | Every minute | Every hour | Every day | Every week | Every month
-----------------------------------------------------------------------------------------
CronHere| 10 | N | N | Y | Y | Y
CronHere| 8 | Y | N | N | Y | Y

The only dynamic variable here would be the number of servers that have each cron in their crontab. The other entries of N or Y should be static amongst each individual cron job as they are all setup the same way. The only thing that varies amongst the same cron job on each server is the time it executes, but since we're just adding a Y or N that's of no consequence.
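
To make the counting part concrete, here is a rough sketch of the idea as I understand it, not a finished script. It assumes the whole command string is what identifies a cron job, and it uses a Set per command so a server that lists the same job twice only counts once. The name summarize and the sample data are made up, and the flags are printed in crontab field order (minute, hour, day of month, month, day of week).

```ruby
require "set"

# Hypothetical helper. `crontabs` maps a server name to an array of raw
# crontab lines. Returns a hash of command => { flags:, servers: }.
def summarize(crontabs)
  summary = {}
  crontabs.each do |server, lines|
    lines.each do |line|
      parts = line.strip.split(' ', 6)
      # Skip blank lines, short lines, and comment lines.
      next if parts.length < 6 || parts[0].start_with?("#")
      flags = parts[0, 5].map { |part| part == "*" ? "N" : "Y" }
      entry = (summary[parts[5]] ||= { flags: flags, servers: Set.new })
      entry[:servers] << server
    end
  end
  summary
end

# Made-up sample data standing in for the real log files.
crontabs = {
  "server1" => ["0 6 * * * /some/cron/here", "0 6 * * * /some/cron/here"],
  "server2" => ["15 6 * * * /some/cron/here", "*/5 * * * * /another/cron/here"]
}

summarize(crontabs).each do |command, entry|
  puts "#{command} | #{entry[:servers].size} | #{entry[:flags].join(' | ')}"
end
```

The flags are taken from the first server where a command is seen, which relies on the Y/N pattern being the same on every server.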

Edit:

Here's the output of the latest version of the script with the missing bits added in to get it to display right:

$ ruby parse3.rb 
Minutes   Hours  DayOfMonth  Month DayOfWeek
min = Y, hour = N, day of month = N, month = N, day of week = N
min = Y, hour = Y, day of month = N, month = N, day of week = N
min = Y, hour = N, day of month = N, month = N, day of week = N
min = Y, hour = Y, day of month = N, month = N, day of week = N
min = Y, hour = Y, day of month = N, month = N, day of week = N
min = Y, hour = Y, day of month = N, month = N, day of week = N
min = Y, hour = Y, day of month = N, month = N, day of week = N
min = Y, hour = N, day of month = N, month = N, day of week = N
min = Y, hour = Y, day of month = N, month = N, day of week = N
min = Y, hour = Y, day of month = N, month = N, day of week = N
min = Y, hour = N, day of month = N, month = N, day of week = N
min = Y, hour = Y, day of month = N, month = N, day of week = N
min = Y, hour = Y, day of month = N, month = N, day of week = N
min = Y, hour = Y, day of month = N, month = N, day of week = N
min = Y, hour = N, day of month = N, month = N, day of week = N
min = Y, hour = Y, day of month = N, month = N, day of week = N
min = Y, hour = Y, day of month = N, month = N, day of week = N
min = Y, hour = Y, day of month = N, month = N, day of week = N
min = Y, hour = Y, day of month = N, month = N, day of week = N

And here's the code:

#!/usr/bin/ruby 

def doYorN(part)
  if part == "*"
    "N"
  else
    "Y"
  end
end
file = File.new("cn.log", "r")
puts "Minutes   Hours  DayOfMonth  Month DayOfWeek"
while (line = file.gets)
    parts = line.split(' ')
    puts "min = #{doYorN(parts[0])}, hour = #{doYorN(parts[1])}, day of month = #{doYorN(parts[2])}, month = #{doYorN(parts[3])}, day of week = #{doYorN(parts[4])}"
end
file.close

Last edited by Striketh (2011-11-08 23:42:30)

Re: Parsing data with ruby

OK, so let's start a new development thread: parsing all the server logs. Can we assume there is ONE top-level directory where the crontab outputs of ALL servers reside, and there is nothing but crontab/server outputs under that root directory?

Like:

/allcrontabs/cronhere/server1.log
/allcrontabs/cronhere/server2.log
/allcrontabs/cronthere/server1.log
/allcrontabs/cronthere/server2.log

To keep things easy, you need a dedicated top-level directory where you can be sure of all the subdirectories and file names found within.


Re: Parsing data with ruby

Yep, I've already gotten all of the necessary data in the top level directory under <servername>.log and there's nothing else to interfere with that data. You can assume that the directory we're working in is nothing but individual files, each containing the raw output of crontab -l. There's no sub-directories.
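
With a flat directory like that, I'm guessing the file-gathering step could look something like the sketch below. It uses Dir.glob to find every .log file and File.basename to turn the file name into a server name. The name load_crontabs is made up, and using "." (the current directory) is only an assumption about where the script is run from.

```ruby
# Hypothetical helper: reads every <servername>.log file in `dir` and
# returns a hash of server name => array of lines.
def load_crontabs(dir)
  crontabs = {}
  Dir.glob(File.join(dir, "*.log")).sort.each do |path|
    # "server1.log" becomes "server1".
    server = File.basename(path, ".log")
    crontabs[server] = File.readlines(path).map(&:chomp)
  end
  crontabs
end

# "." is an assumption: run the script from the directory holding the logs.
load_crontabs(".").each do |server, lines|
  puts "#{server}: #{lines.length} lines"
end
```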