Dreaming of Code

How to Parse an HTML Table with Nokogiri

August 17, 2015

Here's a simple way to parse an HTML table into an array of Ruby hashes (one hash per row, keyed by column header) using Nokogiri:

# table.html holds the table to parse; its markup is not shown here
require 'nokogiri'

html = File.read('table.html')
doc = Nokogiri::HTML(html)

# get table headers
headers = []
doc.xpath('//*/table/thead/tr/th').each do |th|
  headers << th.text
end

# get table rows, one hash per row keyed by header text
rows = []
doc.xpath('//*/table/tbody/tr').each_with_index do |row, i|
  rows[i] = {}
  row.xpath('td').each_with_index do |td, j|
    rows[i][headers[j]] = td.text
  end
end
p rows
# [{"Foo"=>"foofoo", "Bar"=>"barbar"}, {"Foo"=>"foobar", "Bar"=>"barfoo"}]