Topic: Parsing data: Part 2

So, here I am again. I'm a bit more prepared than I was last time, but I'm now getting into the realm of math and stuff.

The task this time is similar to the last one, except I want to take the raw cron job output and parse it down to find out how many times said cron job would execute in a year.

Based on what I learned last time... I started with this.

 #!/usr/bin/ruby

  def rdcron(line)
  (minutes, hours) = (line.scan(/^\S+ \S+/))[0] # Need to complete regex... not sure what to add though.
  if hours =~ /(.*)\/(\d+)/
    num=$1; dem=$2;
    num=1 if num == "\*"
    hours = Rational(num,dem).to_f
  end
   file = File.new("file.log", "r")
  while (line = file.gets)
   parts = line.split(' ')
    puts "#{rdcron(parts[0])} #{rdcron(parts[1])} #{rdcron(parts[2])}"
end
file.close
end

It's not done, there's parts missing, and I have no idea how to progress or even how much of that is right.

A friend of mine gave me a much cleaner beginning:

#!/usr/bin/ruby

  def parse_cron(line)
  cron = line.scan(/([-0-9*,\/]+)\s([-0-9*,\/]+)\s([-0-9*,\/]+)\s([-0-9*,\/]+)\s([-0-9*,\/]+)\s(.*)/)[0]
  { :min => cron[0], :hour => cron[1], :day => cron[2], :month => cron[3], :week => cron[4], :command => cron[5] }
end

crons = `crontab -l`.split("\n")

# Parses the cron
crons.each do |cron|
  p parse_cron(cron)
end

But, I want to fix up the problems in my script and figure out how to make THAT do what I want, and then go from there.

I know what I want to do, but I'm not sure how to do it. Any help would be appreciated.

Edit:

Would num_string or DateTime be viable to accomplish this?

Last edited by Striketh (2011-11-15 18:41:06)

Re: Parsing data: Part 2

Your friend's code will parse the crontab line and return a hash.

Take your friend's parse_cron routine and replace your rdcron with it.

Then you could do this:

file = File.new("file.log", "r")
while (line = file.gets)
  parts = parse_cron(line)
  puts "#{parts[:command]} occurs every #{parts[:year]} #{pluralize(parts[:year],'year')}"
end

Last edited by BradHodges (2011-11-16 12:58:19)

Joe got a job, on the day shift, at the Utility Muffin Research Kitchen, arrogantly twisting the sterile canvas snout of a fully charged icing anointment utensil.

Re: Parsing data: Part 2

I just read your requirement again. You want to calculate the total number of times a cron job runs in a year, which is quite different from what I posted above :<

What you need will require a bit more programming.

You'd have to identify all the various ways a field can define how many times the cron job runs. For example:

*/4
You'd have to detect the '/' and grab the trailing numeric digit(s), which give the step.

0-4
You'd have to detect the '-' and take the difference between the two numbers, plus 1 (0-4 covers five values).

0,3,4
You'd have to detect the ',' characters, count them, and add 1.

Do you follow so far?
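
The three cases above can be sketched as one small helper. This is only a rough illustration: count_field and its min/max arguments are made-up names, not part of anyone's script in this thread, and it assumes plain numeric fields (no L, W, ? or month/day names).

```ruby
# Hypothetical helper (not from the thread): count how many values one cron
# field matches. min/max are the bounds of that field, e.g. 0 and 59 for minutes.
def count_field(field, min, max)
  values = field.split(',').flat_map do |part|
    range, step = part.split('/')
    step = step ? step.to_i : 1          # "*/4" -> step 4, otherwise step 1
    first, last =
      if range == '*'
        [min, max]                       # whole field
      elsif range.include?('-')
        range.split('-').map(&:to_i)     # "0-4" -> [0, 4]
      else
        [range.to_i, range.to_i]         # single value
      end
    (first..last).step(step).to_a
  end
  values.uniq.length
end

puts count_field('*/4', 0, 23)    # => 6
puts count_field('0-4', 0, 59)    # => 5
puts count_field('0,3,4', 0, 59)  # => 3
```

Expanding each field into the actual list of values keeps the counting honest for mixed forms such as 1-5,30 without needing a separate branch per syntax.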


Re: Parsing data: Part 2

Yes. My friend said that the best way to go about doing this would probably be from using DateTime. I haven't used that before, but would you agree? Or do you have a different method in mind?

My regex is pretty bad, but I can cheat a little bit using the one he provided until I'm able to get the hang of that more. I don't plan on using any of his other code suggestions, though, as that would defeat the purpose of what I'm trying to do here (though it will be a good reference for afterward to see how I could have done it).

I'm stuck at trying to figure out how I would get this portion of it started. I can do the math manually, but I can't think of how I'd tell the computer to do the math. Would you possibly be able to get me started with how it would start out so I can try and see if I can figure out the rest?

Thanks!
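
For reference, the Date/DateTime idea could look roughly like the sketch below: walk every day of one year and ask whether the day fields would match. The name days_matched and its arguments are hypothetical, the lists are assumed to be already expanded into integers (nil standing in for *), and it relies on the usual cron rule that when both day-of-month and day-of-week are restricted, a match on either one is enough.

```ruby
require 'date'

# Hypothetical sketch: count the days of a year on which a cron entry is active.
# doms, months, dows are arrays of integers, or nil for "*".
# Date#wday uses 0 for Sunday, the same as cron (cron's alternative 7 is not handled).
def days_matched(year, doms, months, dows)
  (Date.new(year, 1, 1)..Date.new(year, 12, 31)).count do |d|
    next false if months && !months.include?(d.month)
    if doms && dows
      # both day fields restricted: either one matching is enough
      doms.include?(d.day) || dows.include?(d.wday)
    else
      (doms.nil? || doms.include?(d.day)) && (dows.nil? || dows.include?(d.wday))
    end
  end
end

puts days_matched(2011, nil, nil, nil)   # => 365
puts days_matched(2011, [1], nil, nil)   # => 12
puts days_matched(2011, nil, [2], nil)   # => 28
```

Multiplying that day count by the number of matching hours and minutes gives the yearly total, and leap years or months of different lengths come out right without any special cases.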

Re: Parsing data: Part 2

OK,  here is a modification of his starting point:

  def parse_cron_part(cp)
    if cp == '*'
      0
    elsif cp =~ /\A[0-9]+\z/      # a single number; anchored so "*/4" or "0-4" fall through
      cp.to_i
    elsif cp =~ /\//
      cp.split('/')[1].to_i       # the step value after the '/'
    elsif cp =~ /-/
      cp.split('-')[1].to_i - cp.split('-')[0].to_i + 1   # inclusive range
    elsif cp =~ /,/
      cp.split(',').length        # number of comma-separated values
    else
      -1  # some kind of error
    end
  end
  def parse_cron(line)
    cron = line.scan(/([-0-9*,\/]+)\s([-0-9*,\/]+)\s([-0-9*,\/]+)\s([-0-9*,\/]+)\s([-0-9*,\/]+)\s(.*)/)[0]
    { :min => [cron[0], parse_cron_part(cron[0])],
      :hour => [cron[1], parse_cron_part(cron[1])],
      :day => [cron[2], parse_cron_part(cron[2])],
      :month => [cron[3], parse_cron_part(cron[3])],
      :week => [cron[4], parse_cron_part(cron[4])],
      :command => cron[5]
    }
  end

That will return a hash in which every element except :command is an array: the first item is the original cron definition and the second is an integer.

Now you only have to figure out the math logic. See if you can work it out from this:

file = File.new("file.log", "r")
while (line = file.gets)
  parts = parse_cron(line)
  puts "The cron command is #{parts[:command]}"
  puts "The minutes definition looks like #{parts[:min][0]} and has a value of #{parts[:min][1]}"
end

Re: Parsing data: Part 2

Oops! Logic flaw:

    elsif cp =~ /\A[0-9]+\z/
      1

If the cron definition contains just numeric digit(s), the count should be 1. For example, 12 in the minutes field means "on the 12th minute", which is once per time period (here, once per hour).


Re: Parsing data: Part 2

Thanks Brad! I'll review that as soon as I can and let you know how I do.

Re: Parsing data: Part 2

Ya know what,  it's got bugs,  your problem is so involved that I've decided to actually debug it and make it work 100%

wait till I post a working version.


Re: Parsing data: Part 2

BradHodges wrote:

Ya know what,  it's got bugs,  your problem is so involved that I've decided to actually debug it and make it work 100%

wait till I post a working version.

No problem! I'm not in a huge hurry with this or anything, so just let me know.

Re: Parsing data: Part 2

Here ya go.  I did some testing but there may be some test cases I missed.

It will only parse the following types of column specifications:

*
1
*/2
2,3,4
2-5
?

It doesn't deal with:
L
W
C
#

This was fun!

crons = []
counts = []
cparts = []
basedir = "."
DAYS_IN_MONTHS = [31,28,31,30,31,30,31,31,30,31,30,31]
  def calc_cron_parts(cps)
      # MINUTES
      times_per_hour = 0
      if cps[:min].is_a? Array   # X specific minutes of the hour
        times_per_hour = cps[:min].length
      elsif cps[:min].is_a? Range
        times_per_hour = 1 + cps[:min].last - cps[:min].first
      elsif cps[:min] == 0   # all
        times_per_hour = 60
      elsif cps[:min] >99     #specific minute  
        times_per_hour = 1 
      elsif cps[:min] > 1    # every X minutes
        times_per_hour = (60/cps[:min]).to_i 
      end
      # HOURS
      times_per_day = 0 
      if cps[:hour].is_a? Array
        times_per_day = cps[:hour].length
      elsif cps[:hour].is_a? Range
        times_per_day = 1 + cps[:hour].last - cps[:hour].first
      elsif cps[:hour] == 0 #all
        times_per_day = 24
      elsif cps[:hour] > 99 # specific hour
        times_per_day = 1
      elsif cps[:hour] > 1 # a 4 would mean every 4th hour
        times_per_day = (24/cps[:hour]).to_i
      end
      
      # DAY OF MONTH AND DAY OF WEEK
      days_per_month = 0
      
      # SPECIFIC DAYS OF MONTH
      if cps[:day].is_a? Array and cps[:week].is_a? Array
        days_per_month = cps[:day].length + cps[:week].length
      elsif cps[:day].is_a? Array and cps[:week].is_a? Range
        days_per_month = cps[:day].length + 1 + cps[:week].last - cps[:week].first
      elsif cps[:day].is_a? Array and ( cps[:week] == 99 or cps[:week] == 0 )
        days_per_month = cps[:day].length
      elsif cps[:day].is_a? Array and cps[:week] > 99
        days_per_month = cps[:day].length + 1
      # RANGE of DAYS OF MONTH
      elsif cps[:day].is_a? Range and cps[:week].is_a? Array
        days_per_month = (1 + cps[:day].last - cps[:day].first) + cps[:week].length
      elsif cps[:day].is_a? Range and cps[:week].is_a? Range
        days_per_month = (1 + cps[:day].last - cps[:day].first) + (1 + cps[:week].last - cps[:week].first)
      elsif cps[:day].is_a? Range and ( cps[:week] == 99 or cps[:week] == 0 )
        days_per_month = (1 + cps[:day].last - cps[:day].first) 
      elsif cps[:day].is_a? Range and cps[:week] > 99 
        days_per_month = (1 + cps[:day].last - cps[:day].first) + 1 

      elsif (cps[:day] == 0 or cps[:day] > 99) and cps[:week].is_a? Range
        days_per_month = (1 + cps[:week].last - cps[:week].first) * 4
      elsif cps[:day] == 0  and cps[:week] == 99 # all days in month, ignore Doy of Week
        days_per_month = -1  # means ALL days in month
      elsif cps[:day] == 0 and cps[:week] == 0 # all days in month, all days in week
        days_per_month = -1 # All days in month
      elsif cps[:day] == 0 and cps[:week] > 99   # all days in month, ONE specific week
        days_per_month = 365/12
      # Specific Day of Month
      elsif cps[:day] > 99  and cps[:week] == 99 # specific day of the month ignore week
        days_per_month = 1
      elsif cps[:day] > 99 and cps[:week] > 99 # specific day of month AND  specific day of week
        days_per_month = 5
      elsif cps[:day] > 99 and cps[:week] == 99  # Specific Day of Month, ignore Day of Week
        days_per_month = 12 
      # Every X days of Month
      elsif cps[:day] > 0 and  cps[:day] < 32 and ( cps[:week] == 99 or cps[:week] == 0 )
          days_per_month = (30/cps[:day]).to_i
      elsif cps[:day] > 0 and  cps[:day] < 32 and ( cps[:week] > 0 and cps[:week] < 8 )
          days_per_month = (30/cps[:day]).to_i + (7/cps[:week]).to_i
      elsif cps[:day] > 0 and  cps[:day] < 32 and ( cps[:week] == 0 )
          days_per_month = (30/cps[:day]).to_i        
      end
      
      # MONTH
      times_per_year = 0
      if days_per_month == -1  # All days in month
        if cps[:month].is_a? Array
          cps[:month].each do |m|
            times_per_year += DAYS_IN_MONTHS[cps[:month][cps[:month].index(m)].to_i-1]
           end
        elsif cps[:month] == 0 
            times_per_year = 365
        elsif cps[:month] > 0 and cps[:month] < 100       # this would mean ONLY X months per year
          times_per_year = 365 / (12/cps[:month]).to_i     
        elsif cps[:month] > 99 # this would mean ONLY X months per year
          times_per_year =   (cps[:month].to_i - 100) * 12
        end
      else
        if cps[:month].is_a? Array
          times_per_year = days_per_month * cps[:month].length
        elsif cps[:month].is_a? Range
          times_per_year = days_per_month * (1+ cps[:month].last - cps[:month].first)
        elsif cps[:month] > 0 and cps[:month] < 12
          times_per_year = days_per_month
        elsif cps[:month] == 0
          times_per_year = days_per_month * 12
        end     
      end
      times_per_hour * times_per_day * times_per_year
  end
  def parse_cron_part(cp)
    
    if cp == '*' 
      r = 0
    elsif cp == "?"
      r = 99
    elsif  /,/ =~ cp # returns an Array
      r = cp.split(',')
    elsif  /-/ =~ cp # return a Range
     r = (cp.split('-')[0].to_i .. cp.split('-')[1].to_i) 
    elsif /\// =~ cp 
      r = cp.split('/')[1].to_i
    elsif /[0-9]+/ =~ cp 
      r = 100 + cp.to_i
    end
    return r
  end
  def parse_cron(line)
    cron = line.scan(/([-0-9*,\/]+)\s([-0-9*,\/]+)\s([-0-9*,\/]+)\s([-0-9*,\/]+)\s([-0-9*,\/]+)\s(.*)/)[0]
    return { 
    :min => parse_cron_part(cron[0]),
    :hour => parse_cron_part(cron[1]),
    :day => parse_cron_part(cron[2]),
    :month => parse_cron_part(cron[3]),
    :week => parse_cron_part(cron[4]),
    :command => cron[5]    
    }
  end
  Dir.new(basedir).entries.each do |logfile|
    unless File.directory? logfile
      if logfile.split('.')[1] == 'log'
        file = File.new(logfile, "r")
        while (line = file.gets)
          parts = line.split(' ')
          if parts[5,parts.length-5]
            cmd = parts[5,parts.length-5].join(' ')
            idx = crons.index(cmd)
            if idx
               counts[idx] += 1
            else
              crons << cmd
              idx = crons.index(cmd)
              counts[idx] = 1
              cparts << parse_cron(line)
            end
          else
            puts "Error on: #{line} in file #{logfile}"
          end
        end
        file.close
      end
    end
  end
  puts "# Servers YearCount Cronjob"
  crons.each do |c|
    idx = crons.index(c)
    puts("#{'%9s' % counts[idx].to_s } #{'%9s' % calc_cron_parts(cparts[idx])} #{cparts[idx][:command]}" )  
  end

Re: Parsing data: Part 2

Wow, I didn't expect you to do the whole thing. That is... a lot longer than I thought it'd be!

Unfortunately, there is an error on it right now. I understand the error, but I'm not sure why it's erroring as it looks like it's defined to me? But maybe not. Here's the error:

parse_cron_times.rb:129:in `parse_cron': undefined method `[]' for nil:NilClass (NoMethodError)
from parse_cron_times.rb:152
from parse_cron_times.rb:137

Which is these lines (same order as the error message):

:min => parse_cron_part(cron[0]),
parts << parse_cron(line)
Dir.new(basedir).entries.each do |logfile|

Any hints?

Re: Parsing data: Part 2

First, isolate the crontab.log file that it is barfing on. 

If you leave basedir = ".",  you can move the program and ONE crontab.log file to a new directory,  and run the program.

Keep swapping out crontab.log files and run the program until you find the one that barfs, and post the log file.
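
A note on the error itself: `undefined method '[]' for nil:NilClass` inside parse_cron is what Ruby raises when `scan` finds no match, because `scan(...)[0]` is then nil and `cron[0]` fails. That usually points at a line that isn't a plain five-field entry (a comment, a blank line, a VAR=value line, or a field using characters the pattern doesn't allow). A minimal sketch of a guard follows, assuming it's acceptable to skip such lines; CRON_LINE and safe_parse_cron are made-up names, not part of the posted script.

```ruby
# Same field pattern as parse_cron in this thread, but anchored and tolerant
# of repeated whitespace. CRON_LINE and safe_parse_cron are hypothetical names.
CRON_LINE = /\A\s*([-0-9*,\/]+)\s+([-0-9*,\/]+)\s+([-0-9*,\/]+)\s+([-0-9*,\/]+)\s+([-0-9*,\/]+)\s+(.*)\z/

def safe_parse_cron(line)
  m = CRON_LINE.match(line.chomp)
  return nil if m.nil?   # not a five-field cron entry: let the caller decide
  { :min => m[1], :hour => m[2], :day => m[3],
    :month => m[4], :week => m[5], :command => m[6] }
end

["# nightly jobs", "MAILTO=root", "", "*/5 * * * * /bin/true"].each do |line|
  parsed = safe_parse_cron(line)
  if parsed
    puts "ok: #{parsed[:command]}"
  else
    puts "skipped: #{line.inspect}"
  end
end
```

Printing the skipped line (and the file name, in the real loop) would show exactly which entry the pattern cannot handle, without having to swap files in and out by hand.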


Re: Parsing data: Part 2

here is the test crontab.log file I used

1-20 * * 1 * root run-parts /etc/cron.hourly
*/10 * */2 * * root run-parts /etc/cron.eoday
*/30 */4 * * 0 root run-parts /etc/cron.weekly
* * * */3 * root run-parts /etc/cron.monthly
* * * * * root zero test
20 * * * * 20ith minute
20 10 * 2,12 * 20ith minute 15th and 30th
20 10 * */3 1-5 20ith minute every third month mon thru fri
1-5 10 * * 1-5 1st throug 4th minute every third month mon thru fri

Re: Parsing data: Part 2

Hmm, I copied it over and did one test file at a time, but these are giving a different error.

Error:

# Servers YearCount Cronjob
parse_cron_times.rb:11:in `calc_cron_parts': undefined method `[]' for nil:NilClass (NoMethodError)
from parse_cron_times.rb:165
from parse_cron_times.rb:163

Crontab:

6,21,36,51 * * * * /usr/local/cpanel/whostmgr/bin/dnsqueue > /dev/null 2>&1
30 */4 * * * /usr/bin/test -x /usr/local/cpanel/scripts/update_db_cache && /usr/local/cpanel/scripts/update_db_cache
*/5 * * * * /usr/local/cpanel/bin/dcpumon >/dev/null 2>&1
36 2 * * * /usr/local/cpanel/whostmgr/docroot/cgi/cpaddons_report.pl --notify
42 0 * * * /usr/local/cpanel/scripts/upcp --cron
0 1 * * * /usr/local/cpanel/scripts/cpbackup
35 * * * * /usr/bin/test -x /usr/local/cpanel/bin/tail-check && /usr/local/cpanel/bin/tail-check
30 */2 * * * /usr/local/cpanel/bin/mysqluserstore >/dev/null 2>&1
15 */2 * * * /usr/local/cpanel/bin/dbindex >/dev/null 2>&1
45 */8 * * * /usr/bin/test -x /usr/local/cpanel/bin/optimizefs && /usr/local/cpanel/bin/optimizefs
27 5 * * * cd /usr/local/cpanel/whostmgr/docroot/cgi/fantastico/scripts/ ; /usr/local/cpanel/3rdparty/bin/php cron.php > /dev/null 2>&1
57 3 * * * perl /root/rvadmin/auto_rvskin.pl
*/5 * * * * perl /root/rvadmin/rvmultiupdate.pl >/dev/null 2>&1
2,58 * * * * /usr/local/bandmin/bandmin
0 0 * * * /usr/local/bandmin/ipaddrmap
0 6 * * * /usr/local/cpanel/scripts/exim_tidydb > /dev/null 2>&1
45 */4 * * * /usr/bin/test -x /usr/local/cpanel/scripts/update_mailman_cache && /usr/local/cpanel/scripts/update_mailman_cache
15 */6 * * * /usr/local/cpanel/scripts/recoverymgmt >/dev/null 2>&1

Re: Parsing data: Part 2

That's weird, I pasted that into my test file,  and it ran OK??
here is the output:

# Servers YearCount Cronjob
        1     35040 /usr/local/cpanel/whostmgr/bin/dnsqueue > /dev/null 2>&1
        1      2190 /usr/bin/test -x /usr/local/cpanel/scripts/update_db_cache && /usr/local/cpanel/scripts/update_db_cache
        1    105120 /usr/local/cpanel/bin/dcpumon >/dev/null 2>&1
        1       365 /usr/local/cpanel/whostmgr/docroot/cgi/cpaddons_report.pl --notify
        1       365 /usr/local/cpanel/scripts/upcp --cron
        1       365 /usr/local/cpanel/scripts/cpbackup
        1      8760 /usr/bin/test -x /usr/local/cpanel/bin/tail-check && /usr/local/cpanel/bin/tail-check
        1      4380 /usr/local/cpanel/bin/mysqluserstore >/dev/null 2>&1
        1      4380 /usr/local/cpanel/bin/dbindex >/dev/null 2>&1
        1      1095 /usr/bin/test -x /usr/local/cpanel/bin/optimizefs && /usr/local/cpanel/bin/optimizefs
        1       365 cd /usr/local/cpanel/whostmgr/docroot/cgi/fantastico/scripts/ ; /usr/local/cpanel/3rdparty/bin/php cron.php > /dev/null 2>&1
        1       365 perl /root/rvadmin/auto_rvskin.pl
        1    105120 perl /root/rvadmin/rvmultiupdate.pl >/dev/null 2>&1
        1     17520 /usr/local/bandmin/bandmin
        1       365 /usr/local/bandmin/ipaddrmap
        1       365 /usr/local/cpanel/scripts/exim_tidydb > /dev/null 2>&1
        1      2190 /usr/bin/test -x /usr/local/cpanel/scripts/update_mailman_cache && /usr/local/cpanel/scripts/update_mailman_cache
        1      1460 /usr/local/cpanel/scripts/recoverymgmt >/dev/null 2>&1
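
As a sanity check, the YearCount column is consistent with simply multiplying minutes matched per hour by hours matched per day by days per year. The snippet below is only a hand check under two assumptions: a 365-day year, and day, month and weekday all left as * (true for every entry in that crontab). runs_per_year is a made-up helper name.

```ruby
# Hypothetical helper: yearly runs when every day of the year matches.
def runs_per_year(minutes_per_hour, hours_per_day, days = 365)
  minutes_per_hour * hours_per_day * days
end

puts runs_per_year(4, 24)    # 6,21,36,51 * * * *  => 35040
puts runs_per_year(12, 24)   # */5 * * * *         => 105120
puts runs_per_year(2, 24)    # 2,58 * * * *        => 17520
puts runs_per_year(1, 24)    # 35 * * * *          => 8760
puts runs_per_year(1, 6)     # 30 */4 * * *        => 2190
puts runs_per_year(1, 3)     # 45 */8 * * *        => 1095
puts runs_per_year(1, 1)     # 36 2 * * *          => 365
```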

Are you sure that that is the ONLY file with a .log extension in the directory where you are running the program?


Re: Parsing data: Part 2

Hmm, I think I probably just messed it up while I was trying to fix it. I did a fresh copy/paste and it worked. It still gives the other error when run against the other files, so I'll have to go through them one by one. I won't have time to do that tonight, but I'll let you know as soon as I do!

Re: Parsing data: Part 2

Like I said,  my test cases were minimal.  I'll add some exception handling,  then you won't have to do the tedious job of isolating the specific file that makes it barf.  I'll make it so you can run it against lots of files,  and it will tell YOU which file it barfed on.

Stay tuned


Re: Parsing data: Part 2

Here is a new version with error handling. I also added two optional arguments:

-b basedir - defaults to '.'
-e extension  - defaults to 'log'

crons = []
counts = []
cparts = []
basedir = "."
extn = 'log'

DAYS_IN_MONTHS = [31,28,31,30,31,30,31,31,30,31,30,31]
  def calc_cron_parts(cps)
      # MINUTES
      times_per_hour = 0
    begin
      if cps[:min].is_a? Array   # X specific minutes of the hour
        times_per_hour = cps[:min].length
      elsif cps[:min].is_a? Range
        times_per_hour = 1 + cps[:min].last - cps[:min].first
      elsif cps[:min] == 0   # all
        times_per_hour = 60
      elsif cps[:min] >99     #specific minute  
        times_per_hour = 1 
      elsif cps[:min] > 1    # every X minutes
        times_per_hour = (60/cps[:min]).to_i 
      end
      # HOURS
      times_per_day = 0 
      if cps[:hour].is_a? Array
        times_per_day = cps[:hour].length
      elsif cps[:hour].is_a? Range
        times_per_day = 1 + cps[:hour].last - cps[:hour].first
      elsif cps[:hour] == 0 #all
        times_per_day = 24
      elsif cps[:hour] > 99 # specific hour
        times_per_day = 1
      elsif cps[:hour] > 1 # a 4 would mean every 4th hour
        times_per_day = (24/cps[:hour]).to_i
      end
      
      # DAY OF MONTH AND DAY OF WEEK
      days_per_month = 0
      
      # SPECIFIC DAYS OF MONTH
      if cps[:day].is_a? Array and cps[:week].is_a? Array
        days_per_month = cps[:day].length + cps[:week].length
      elsif cps[:day].is_a? Array and cps[:week].is_a? Range
        days_per_month = cps[:day].length + 1 + cps[:week].last - cps[:week].first
      elsif cps[:day].is_a? Array and ( cps[:week] == 99 or cps[:week] == 0 )
        days_per_month = cps[:day].length
      elsif cps[:day].is_a? Array and cps[:week] > 99
        days_per_month = cps[:day].length + 1
      # RANGE of DAYS OF MONTH
      elsif cps[:day].is_a? Range and cps[:week].is_a? Array
        days_per_month = (1 + cps[:day].last - cps[:day].first) + cps[:week].length
      elsif cps[:day].is_a? Range and cps[:week].is_a? Range
        days_per_month = (1 + cps[:day].last - cps[:day].first) + (1 + cps[:week].last - cps[:week].first)
      elsif cps[:day].is_a? Range and ( cps[:week] == 99 or cps[:week] == 0 )
        days_per_month = (1 + cps[:day].last - cps[:day].first) 
      elsif cps[:day].is_a? Range and cps[:week] > 99 
        days_per_month = (1 + cps[:day].last - cps[:day].first) + 1 

      elsif (cps[:day] == 0 or cps[:day] > 99) and cps[:week].is_a? Range
        days_per_month = (1 + cps[:week].last - cps[:week].first) * 4
      elsif cps[:day] == 0  and cps[:week] == 99 # all days in month, ignore Doy of Week
        days_per_month = -1  # means ALL days in month
      elsif cps[:day] == 0 and cps[:week] == 0 # all days in month, all days in week
        days_per_month = -1 # All days in month
      elsif cps[:day] == 0 and cps[:week] > 99   # all days in month, ONE specific week
        days_per_month = 365/12
      # Specific Day of Month
      elsif cps[:day] > 99  and cps[:week] == 99 # specific day of the month ignore week
        days_per_month = 1
      elsif cps[:day] > 99 and cps[:week] > 99 # specific day of month AND  specific day of week
        days_per_month = 5
      elsif cps[:day] > 99 and cps[:week] == 99  # Specific Day of Month, ignore Day of Week
        days_per_month = 12 
      # Every X days of Month
      elsif cps[:day] > 0 and  cps[:day] < 32 and ( cps[:week] == 99 or cps[:week] == 0 )
          days_per_month = (30/cps[:day]).to_i
      elsif cps[:day] > 0 and  cps[:day] < 32 and ( cps[:week] > 0 and cps[:week] < 8 )
          days_per_month = (30/cps[:day]).to_i + (7/cps[:week]).to_i
      elsif cps[:day] > 0 and  cps[:day] < 32 and ( cps[:week] == 0 )
          days_per_month = (30/cps[:day]).to_i        
      end
      
      # MONTH
      times_per_year = 0
      if days_per_month == -1  # All days in month
        if cps[:month].is_a? Array
          cps[:month].each do |m|
            times_per_year += DAYS_IN_MONTHS[cps[:month][cps[:month].index(m)].to_i-1]
           end
        elsif cps[:month] == 0
          times_per_year = 365
        elsif cps[:month] > 0 and cps[:month] < 100 # every X months: roughly 1/X of the year
          times_per_year = 365 / cps[:month]
        elsif cps[:month] > 99 # one specific month: every day in that month
          times_per_year = DAYS_IN_MONTHS[cps[:month] - 101]
        end
      else
        if cps[:month].is_a? Array
          times_per_year = days_per_month * cps[:month].length
        elsif cps[:month].is_a? Range
          times_per_year = days_per_month * (1+ cps[:month].last - cps[:month].first)
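        elsif cps[:month] > 99 # one specific month (parse_cron_part returns 100 + month)
          times_per_year = days_per_month
        elsif cps[:month] > 0 and cps[:month] < 100 # every X months
          times_per_year = days_per_month * (12 / cps[:month])
        elsif cps[:month] == 0
          times_per_year = days_per_month * 12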
        end     
      end
      times_per_hour * times_per_day * times_per_year
    rescue
      puts("error in file #{cps[:logfile]}")
    end
  end
  def parse_cron_part(cp)
    # parse the column definition
    # most results are integers
    # zero means ALL values
    # an integer in the range of 1 through 98 (realistically 1 through 31) means EVERY N, e.g. every 3 days or every 2 months
    # 99 means the ?, which should only appear in Day of Month or Day of Week
    # 100 or greater means a specific Minute, Hour, Day of Month, Month or Day of Week
    # returning an Array means a series of specific Minutes, Hours, Days of Month, Months or Days of Week
    # returning a Range means a Range of specific  Minutes, Hours, Days of Month, Months or Days of Week
    begin
      if cp == '*' 
        r = 0 
      elsif cp == "?"
        r = 99
      elsif  /,/ =~ cp # returns an Array
        r = cp.split(',')
      elsif  /-/ =~ cp # return a Range
       r = (cp.split('-')[0].to_i .. cp.split('-')[1].to_i) 
      elsif /\// =~ cp 
        r = cp.split('/')[1].to_i
      elsif /[0-9]+/ =~ cp 
        r = 100 + cp.to_i
      end
      return r
    rescue
      # cp is a plain String here, so it has no :lineno or :logfile to report
      puts("error parsing cron field '#{cp}'")
    end    
  end
  def parse_cron(line,logfile,lineno)
    # '?' is allowed in each field so the "?" case in parse_cron_part can actually be reached
    cron = line.scan(/([-0-9*?,\/]+)\s([-0-9*?,\/]+)\s([-0-9*?,\/]+)\s([-0-9*?,\/]+)\s([-0-9*?,\/]+)\s(.*)/)[0]
    begin
      return { 
      :min => parse_cron_part(cron[0]),
      :hour => parse_cron_part(cron[1]),
      :day => parse_cron_part(cron[2]),
      :month => parse_cron_part(cron[3]),
      :week => parse_cron_part(cron[4]),
      :command => cron[5],
      :logfile => logfile,
      :lineno => lineno
      }
    rescue
      puts("error line #{lineno} in file #{logfile}")
      exit
    end
  end
  # BEGIN PROGRAM
  # Parse ANY arguments
  ARGV.each do |a|
    case a
    when '-b' 
      basedir = ARGV[ARGV.index(a)+1]
    when '-e'
      extn = ARGV[ARGV.index(a)+1]
    end
  end
  Dir.new(basedir).entries.each do |logfile|
    path = File.join(basedir, logfile)  # entries are bare names, so prefix the directory
    unless File.directory? path
      if logfile.split('.')[1] == extn
        file = File.new(path, "r")
        lineno = 0
        while (line = file.gets)
          lineno += 1
          parts = line.split(' ')
          if parts[5,parts.length-5]
            cmd = parts[5,parts.length-5].join(' ')
            idx = crons.index(cmd)
            if idx
               counts[idx] += 1
            else
              crons << cmd
              idx = crons.index(cmd)
              counts[idx] = 1
              cparts << parse_cron(line,logfile,lineno)
            end
          else
            puts "Error on line #{lineno}: '#{line.chomp}' in file #{logfile}"
          end
        end
        file.close
      end
    end
  end
  puts "# Servers YearCount Cronjob"
  crons.each do |c|
    idx = crons.index(c)
    puts("#{'%9s' % counts[idx].to_s } #{'%9s' % calc_cron_parts(cparts[idx])} #{cparts[idx][:command]}" )  
  end
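
The branches above estimate the yearly count from rules of thumb (30 days in a month, 4 of each weekday per month). A different approach, shown here only as a rough sketch and not what the script above does, is to expand each cron field into the explicit list of values it matches and then walk every day of a given year. The names expand_field and runs_per_year are made up for illustration. The sketch assumes the plain five-field crontab format with numeric fields only (no names like "mon", no "@daily"), and it assumes the usual cron rule that when both day fields are restricted, a day runs if either one matches.

```ruby
require 'date'

# Hypothetical helper: expand one cron field into the sorted values it matches.
# Supports "*", "?", "*/n", "a-b", "a-b/n", plain numbers and comma lists.
def expand_field(field, min, max)
  values = field.split(',').flat_map do |part|
    range, step = part.split('/')
    step = step ? step.to_i : 1
    first, last =
      if range == '*' || range == '?'
        [min, max]
      elsif range.include?('-')
        range.split('-').map(&:to_i)
      else
        [range.to_i, range.to_i]
      end
    (first..last).step(step).to_a
  end
  values.uniq.sort
end

# Hypothetical helper: count how often a crontab line fires in the given year.
def runs_per_year(cron_line, year)
  min_f, hour_f, dom_f, mon_f, dow_f = cron_line.split(' ', 6)
  minutes = expand_field(min_f, 0, 59)
  hours   = expand_field(hour_f, 0, 23)
  doms    = expand_field(dom_f, 1, 31)
  months  = expand_field(mon_f, 1, 12)
  # cron accepts both 0 and 7 for Sunday, so fold 7 back onto 0
  dows    = expand_field(dow_f, 0, 7).map { |d| d % 7 }.uniq

  days = (Date.new(year, 1, 1)..Date.new(year, 12, 31)).count do |date|
    next false unless months.include?(date.month)
    dom_ok = doms.include?(date.day)
    dow_ok = dows.include?(date.wday)
    if dom_f.start_with?('*', '?') || dow_f.start_with?('*', '?')
      dom_ok && dow_ok   # at most one day field is restricted
    else
      dom_ok || dow_ok   # both restricted: either match is enough
    end
  end

  days * hours.size * minutes.size
end

# Example: an hourly job in 2011 runs 24 times a day for 365 days.
puts runs_per_year('0 * * * * /usr/bin/true', 2011)  # prints 8760
```

Because it counts real calendar days, this version handles leap years and month lengths without a lookup table, at the cost of looping over up to 366 dates per crontab line.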

Last edited by BradHodges (2011-11-19 10:49:33)

Joe got a job, on the day shift, at the Utility Muffin Research Kitchen, arrogantly twisting the sterile canvas snout of a fully charged icing anointment utensil.

Re: Parsing data: Part 2

That's pretty awesome, Brad. I hope my grasp of Ruby becomes as good as yours one of these days.

I'll test it out later tonight when I get off work and let you know how it goes.

Re: Parsing data: Part 2

My grasp of Ruby isn't that great; it's just another language. It's a grasp of programming that is important. I've programmed in so many languages over the years that they all kind of blur in my mind. Starting with an HP-25C calculator programming RPN in 1975, I've programmed in the following additional languages:

Commodore PET BASIC
Radio Shack TRS-80 BASIC
SMC Business BASIC
dBase II, .. IV
C
C++
Smalltalk-80
Lisp
Postscript
Ada
Java
Ruby

Then there are the pseudo languages
LEX/YACC
nroff/troff

And the APIs
BSD Unix
SVR4 Unix
J2EE
RoR

It's the APIs that are the real killer. Rails took longer to master than Ruby.
J2EE took longer to master than Java. (Hell, having programmed in C++ and Ada, Java's learning curve was about 2 days. J2EE's was unbearably long and I never did stick with it; I thought it was a bloated piece of crap!)

Ruby on Rails is by far the best programming environment I've ever used, no contest!
