Dreaming of Code

How to Parse an HTML Table with Nokogiri

August 17, 2015

Here's a simple way to parse an HTML table into an array of Ruby hashes using Nokogiri:

<!-- table.html -->
<table>
  <thead>
    <tr>
      <th>Foo</th>
      <th>Bar</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>foofoo</td>
      <td>barbar</td>
    </tr>
    <tr>
      <td>foobar</td>
      <td>barfoo</td>
    </tr>
  </tbody>
</table>

# Ruby script that reads table.html from the current directory
require 'nokogiri'

html = File.read('table.html') # File.read closes the file handle for us
doc = Nokogiri::HTML(html)

# get table headers as an array of strings
headers = doc.xpath('//table/thead/tr/th').map(&:text)

# get table rows: one hash per <tr>, keyed by the header in the same column
rows = doc.xpath('//table/tbody/tr').map do |row|
  cells = row.xpath('td').map(&:text)
  headers.zip(cells).to_h
end

p rows
# [{"Foo"=>"foofoo", "Bar"=>"barbar"}, {"Foo"=>"foobar", "Bar"=>"barfoo"}]