Topic: File.each_line with a Mac text file


I'm having a weird problem with File.each_line.  If I create a text file from Excel (save as tab-delimited), each_line reads the entire file as a single line.  In contrast, if I create the text file using Emacs, then each_line properly iterates through the file one line at a time.  Is this an end-of-line problem?  What is a reasonable solution?  I need to be able to handle files that were created on Macs, PCs, and/or Linux.  Here's my code:

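contents = ""
# The File.open call was lost from the original post; "path" is a placeholder for the file name.
File.open(path) do |file|
  file.each_line { |line|
    puts "line: " + line
    contents = contents + line
    puts "contents in loop: " + contents
  }
end
puts "contents: " + contents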

Re: File.each_line with a Mac text file

I don't know a lot about Excel, but the fact that you said "tab-delimited" makes me think that is where the problem lies. File.each_line probably doesn't recognize tabs as end-of-line characters. I couldn't find the details with a quick search, but that's my guess...

Re: File.each_line with a Mac text file

No, tabs are not the problem.  Tab-delimited means that the tabs separate the columns.  The tabs are not used to represent end of line.

Mac and Windows use a different way to define end of line.

Re: File.each_line with a Mac text file

Oh, okay, I didn't realize there were still line breaks as well. While they do use different end of line it seems like each_line would recognize both...

Re: File.each_line with a Mac text file

But I guess it wouldn't be too hard to implement your own "eachline" function if you had to. Just read in characters, and have some sort of "if it's end of line" check (here you could account for both Mac and Windows), "then do something".
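For what it's worth, a minimal sketch of that idea might look like the following. The method name each_line_any_eol is made up for illustration (it is not part of Ruby), and StringIO stands in for a real file so the snippet runs on its own. It reads one character at a time and treats "\n", "\r\n", and a lone "\r" as line endings.

```ruby
require "stringio"

# Hypothetical helper (not part of Ruby's standard library): yields each line
# of an IO object without its terminator, accepting "\n", "\r\n", or "\r".
def each_line_any_eol(io)
  return enum_for(:each_line_any_eol, io) unless block_given?

  line = +""
  prev_cr = false
  io.each_char do |ch|
    if ch == "\n" && prev_cr
      # Second half of a Windows "\r\n" pair; the line was already yielded.
      prev_cr = false
      next
    end
    prev_cr = (ch == "\r")
    if ch == "\r" || ch == "\n"
      yield line
      line = +""
    else
      line << ch
    end
  end
  # Emit a final line that has no terminator.
  yield line unless line.empty?
end

# Made-up sample mixing all three line-ending styles.
sample = StringIO.new("mac 1\rmac 2\rdos 1\r\ndos 2\r\nunix 1\nunix 2")
each_line_any_eol(sample) { |line| puts "line: #{line}" }
```

Reading character by character is slow for large files, so this is more a demonstration of the logic than something to use as-is.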

Re: File.each_line with a Mac text file

The problem stems from the new line encoding OS X uses.

Um... I don't know of an easy way to fix this though.

Most programs expect whatever line ending is native to the system they run on. So if your server is Unix, it's probably looking for LF; if it's on Windows, it's looking for CR + LF. If your file was created on OS X, it might have either LF or CR, depending on the program and settings.

That's the problem, but not really a solution, sorry ><
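To see the effect concretely, here is a small sketch (the sample strings are made up for illustration) that feeds the three line-ending styles through each_line with Ruby's default separator. StringIO stands in for a real file so the snippet runs on its own.

```ruby
require "stringio"

# Made-up sample data: the same three lines with different line endings.
samples = {
  "unix (LF)"      => "one\ntwo\nthree\n",
  "windows (CRLF)" => "one\r\ntwo\r\nthree\r\n",
  "mac (CR)"       => "one\rtwo\rthree\r"
}

# each_line splits on "\n" by default, so it never sees a break in CR-only text.
counts = samples.transform_values { |text| StringIO.new(text).each_line.count }
counts.each { |label, n| puts "#{label}: each_line yields #{n} line(s)" }
```

Only the CR-only text comes back as a single line, which matches what the original poster saw with the Excel export.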

Re: File.each_line with a Mac text file

You can do this:

params[:file_input].read.split(/\r?\n|\r(?!\n)/)

where params[:file_input] is either a file you passed via an HTML form or a file handle.

The read command reads the entire file in one big chunk and places it in a string.
".split(/\r?\n|\r(?!\n)/)" splits that string on the various newline characters that different OSes use:

Unix = "\n"
Windows = "\r\n"
Mac = "\r"

If you still want to read the file into your program one line at a time (i.e. using .each), you can change the default line delimiter that Ruby uses. This is stored in the global variable $/ (for a Mac file, set $/ = "\r"). Of course, if you do this you need to know which newline characters your file contains.
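As an alternative to changing $/ globally, each_line also accepts the separator as an argument. The sketch below combines that with a guess at the separator based on the first few kilobytes of the file. The helper name detect_separator and the 4096-byte sample size are assumptions for illustration, and the guess will be wrong for files that mix line-ending styles or whose first line is longer than the sample.

```ruby
require "tempfile"

# Hypothetical helper: guesses the line separator from the start of the file.
# Checks for "\r\n" first so a Windows file is not mistaken for a CR-only one.
def detect_separator(path, sample_size = 4096)
  sample = File.open(path, "rb") { |f| f.read(sample_size) } || ""
  return "\r\n" if sample.include?("\r\n")
  return "\r" if sample.include?("\r")
  "\n"
end

# Demo with a made-up CR-only, tab-delimited file.
Tempfile.create("mac_sample") do |tmp|
  tmp.binmode
  tmp.write("name\tqty\rapple\t3\rpear\t5\r")
  tmp.flush

  sep = detect_separator(tmp.path)
  File.open(tmp.path, "rb") do |file|
    # Passing sep here leaves the global $/ untouched.
    file.each_line(sep) { |line| puts "line: #{line.chomp(sep)}" }
  end
end
```

Opening the file in binary mode ("rb") keeps Ruby from translating line endings on Windows, so the separator that was detected is the one each_line actually sees.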

Re: File.each_line with a Mac text file

The following seems to work, based on a set of files I made up using an editor that can save files in all 3 formats. jdittmar's regex may be better than mine, though...

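files = %w{dos.txt apple.txt unix.txt}

files.each do |f|
   puts "file: #{f}"
   # The read call was garbled in the original post; File.read(f) is a reconstruction.
   data = File.read(f).split(/\r\n|\r|\n/)
   data.each {|line| puts line}
end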

Re: File.each_line with a Mac text file

Seems I've fallen foul of the itinerant watch seller. Should have checked the date on the previous post... :(