A parser for unruly CSVs
Parse CSVs with hierarchical headers and duplicated headers. Skip lines by line number, etc.
Read below to get started, or see the API Documentation for more details.
Add this line to your application's Gemfile:
gem 'uncsv'
And then execute:
bundle
Or install it yourself as:
gem install uncsv
Reading a CSV with Uncsv is similar to using Ruby's built-in CSV class. Create
a new instance of Uncsv and pass it a String or IO. The second argument is an
options hash, see below.
require 'uncsv'
data = "A,B,C\n1,2,3"
csv = Uncsv.new(data, header_rows: 0)
csv.map { |row| row['B'] }
Uncsv can read directly from the filesystem with the open method.
Uncsv.open('my_data.csv')
Uncsv is an Enumerable. All enumerable methods like each, map, reduce, etc.
are supported.
data = "A,B,C\n1,2,3\n4,5,6"
csv = Uncsv.new(data, header_rows: 0)
# Cell values are assumed to be strings here, so convert before summing
c_total = csv.reduce(0) { |sum, row| sum + row['C'].to_i }
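Enumerable support of this kind usually comes from defining each and including Ruby's Enumerable module. The sketch below is not Uncsv's source. RowSource is a hypothetical class, and the rows are plain hashes, used only to show why map, reduce, and the rest become available once each yields rows.

```ruby
# Hypothetical stand-in for a row-yielding parser; not part of Uncsv.
class RowSource
  include Enumerable

  def initialize(rows)
    @rows = rows
  end

  # Enumerable builds map, reduce, select, etc. on top of each.
  def each
    return enum_for(:each) unless block_given?

    @rows.each { |row| yield row }
  end
end

source = RowSource.new([{ 'C' => '3' }, { 'C' => '6' }])
c_values = source.map { |row| row['C'] }
c_total = source.reduce(0) { |sum, row| sum + row['C'].to_i }
```

Passing an initial value of 0 to reduce keeps the accumulator numeric from the first iteration, instead of starting with the first row itself.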
The following options can be passed as a hash to the second argument of the Uncsv constructor, or set inside the constructor block.
Uncsv.new(data, skip_blanks: true)
# Is equivalent to
Uncsv.new(data) do |config|
config.skip_blanks = true
end
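One common way to support both styles is to copy the options hash into a config object and then yield that object to the block. The following is a minimal sketch of that pattern under stated assumptions: HypotheticalParser and ParserConfig are made-up names, only two options are modeled, and none of this is taken from Uncsv's internals.

```ruby
# Hypothetical names throughout; this only illustrates the pattern.
ParserConfig = Struct.new(:skip_blanks, :nil_empty, keyword_init: true)

class HypotheticalParser
  attr_reader :config

  def initialize(data, opts = {})
    @data = data
    # Start from defaults, then apply the options hash.
    @config = ParserConfig.new(skip_blanks: false, nil_empty: true)
    opts.each { |key, value| @config[key] = value }
    # A block, if given, can adjust the same config object.
    yield @config if block_given?
  end
end

from_hash = HypotheticalParser.new("A,B", skip_blanks: true)
from_block = HypotheticalParser.new("A,B") { |config| config.skip_blanks = true }
```

Both objects end up with the same configuration, which is the equivalence described above.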
:expand_headers: Default false. If set to true, blank header row cells will assume the header of the cell to their left. This is useful for hierarchical headers where not all the header cells are filled in. If set to an array of header indexes, only the specified headers will be expanded.
:header_rows: Default []. Can be set to either a single row index or an array of row indexes. For example, it could be set to 0 to indicate a header in the first row. If set to an array of indexes ([1,2]), the header row text will be joined by the :header_separator. For example, if the cell (0,0) had the value "Personal" and cell (1,0) had the value "Name", the header would become "Personal.Name". Any data above the last header row will be ignored.
:header_separator: Default ".". When using multiple header rows, this is a string used to separate the individual header fields.
:nil_empty: Default true. If true, empty cells will be set to nil; otherwise, they are set to an empty string.
:normalize_headers: Default false. If set to true, header field text will be normalized. The text will be lowercased, and non-alphanumeric characters will be replaced with underscores (_). If set to a string, those characters will be replaced with the string instead. If set to a hash, the hash will be treated as options to KeyNormalizer, accepting the :separator and :downcase options. If set to another object, it is expected to respond to the normalize(key) method by returning a normalized string.
:skip_blanks: Default false. If true, rows whose fields are all empty will be skipped.
:skip_rows: Default []. If set to an array of row indexes, those rows will be skipped. This option does not apply to header rows.
:unique_headers: Default false. If set to true, headers will be forced to be unique by appending numbers to duplicates. For example, if two header cells have the text "Name", the headers will become "Name.0" and "Name.1". The separator between the text and the number can be set using the :header_separator option.
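To make the header options more concrete, here is a rough sketch of the transformations they describe, written against plain arrays of strings. The helper names are hypothetical and the details (for example, dropping blank cells before joining) are assumptions drawn from the descriptions above, not from Uncsv's source.

```ruby
# Illustrative helpers only; names and edge-case behavior are assumptions
# based on the option descriptions, not Uncsv's implementation.

# :expand_headers - blank cells take the value of the cell to their left.
def expand_header_row(row)
  last = nil
  row.map do |cell|
    last = cell unless cell.nil? || cell.empty?
    last
  end
end

# :header_rows + :header_separator - join each column's header cells.
# Assumes all header rows have the same number of cells.
def join_header_rows(rows, separator = '.')
  rows.transpose.map do |cells|
    cells.reject { |cell| cell.nil? || cell.empty? }.join(separator)
  end
end

# :unique_headers - append a counter to headers that occur more than once.
def unique_headers(headers, separator = '.')
  totals = headers.tally
  seen = Hash.new(0)
  headers.map do |header|
    next header if totals[header] == 1

    index = seen[header]
    seen[header] += 1
    "#{header}#{separator}#{index}"
  end
end

top = expand_header_row(['Personal', '', 'Work'])
# top is ["Personal", "Personal", "Work"]
headers = join_header_rows([top, %w[Name Age Name]])
# headers is ["Personal.Name", "Personal.Age", "Work.Name"]
deduped = unique_headers(%w[Name Name Id])
# deduped is ["Name.0", "Name.1", "Id"]
```

Expanding the top row first is what lets the second column become "Personal.Age" even though its top header cell was blank.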
See the documentation for Ruby's built-in CSV class for the following options.
:col_sep
:field_size_limit
:quote_char
:row_sep
:skip_blanks
After checking out the repo, run bundle to install dependencies. You can also
run bin/console for an interactive prompt that will allow you to experiment.
To check your work, run bin/rspec to run the tests and bin/rubocop to check
style. To generate a code coverage report, set the COVERAGE environment
variable when running the tests.
COVERAGE=1 bin/rspec
bin/rubocop
Bug reports and pull requests are welcome on GitHub at https://github.com/nullscreen/uncsv.