
serping/cheerio-tree


Cheerio Tree

What is Cheerio Tree?

Cheerio Tree is a utility built on Cheerio for parsing HTML. It converts HTML into JSON according to a declarative config of CSS selectors. Writing that config in YAML keeps extraction rules short and easy to read and change.

Install

npm install cheerio-tree

# or
yarn add cheerio-tree

# or
pnpm add cheerio-tree

Dependencies

Usage

Easy YAML Config

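A config file looks like this. Each entry under tree.nodes becomes a key in the output and names the CSS selector to read it from: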

# ./config.yml
tree:
  nodes:
    title:
      selector: title
    body:
      selector: body
      attr: html
      to_markdown: true
    footer:
      selector: .footer
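For reference, js-yaml loads the config above into a plain nested object. The sketch below spells that object out in TypeScript. The NodeConfig and TreeConfig interfaces and the describeNodes helper are hypothetical illustrations that only mirror the keys used in this README. The library's own CheerioTreeConfig type may define more fields or name them differently.

```typescript
// Hypothetical types mirroring the YAML keys used above; cheerio-tree's own
// CheerioTreeConfig type may define more (or differently named) fields.
interface NodeConfig {
  selector: string;
  attr?: string;
  to_markdown?: boolean;
}

interface TreeConfig {
  tree: {
    nodes: Record<string, NodeConfig>;
  };
}

// The same structure js-yaml produces when loading ./config.yml above.
const config: TreeConfig = {
  tree: {
    nodes: {
      title: { selector: "title" },
      body: { selector: "body", attr: "html", to_markdown: true },
      footer: { selector: ".footer" },
    },
  },
};

// Hypothetical helper: list each node name with its selector,
// e.g. for logging which selectors a config will use.
function describeNodes(cfg: TreeConfig): string[] {
  return Object.entries(cfg.tree.nodes).map(
    ([name, node]) => `${name} <- ${node.selector}`
  );
}

console.log(describeNodes(config));
// [ 'title <- title', 'body <- body', 'footer <- .footer' ]
```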

TypeScript

import fs from 'fs';
import yaml from 'js-yaml';
import CheerioTree, { type CheerioTreeConfig } from 'cheerio-tree';

// Load the YAML config and parse it into a CheerioTreeConfig object
const config = fs.readFileSync('./config.yml', 'utf-8');
const configYaml = yaml.load(config) as CheerioTreeConfig;

const html = `
<html lang="en">
  <head>
    <title>Cheerio Tree</title>
  </head>
  <body>
    <h1>Cheerio Tree</h1>
    <main>
      <h2>What is Cheerio Tree?</h2>
      <p><b>Cheerio Tree</b> is a powerful utility built on <b>Cheerio</b>, designed for efficient DOM parsing. It enables rapid conversion of HTML data into JSON format. When paired with YAML, it provides an intuitive and streamlined approach to data handling and transformation.</p>
    </main>
  </body>
</html>
`;

const cheerioTree = new CheerioTree({ body: html });
const data = cheerioTree.parse({
  config: configYaml,
  // Runs before extraction: append a footer so the `.footer` node has something to match
  beforeParse: ({ cheerio }) => {
    cheerio('body').append("<footer class='footer'>Append Text..</footer>");
  },
});
console.log(data);

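Output (body is returned as Markdown because of to_markdown: true, and it includes the footer text that beforeParse appended inside <body>):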

{
  "title": "Cheerio Tree",
  "body": "Cheerio Tree\n============\n\nWhat is Cheerio Tree?\n---------------------\n\n**Cheerio Tree** is a powerful utility built on **Cheerio**, designed for efficient DOM parsing. It enables rapid conversion of HTML data into JSON format. When paired with YAML, it provides an intuitive and streamlined approach to data handling and transformation.\n\nAppend Text..",
  "footer": "Append Text.."
}
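Since each key in the output matches a node name in the config, a scraper can check the result for nodes that came back empty, for example when a page's markup changes and a selector stops matching. The missingNodes helper below is a hypothetical sketch and is not part of cheerio-tree. It assumes, without confirmation from this README, that an unmatched node is either absent or an empty string. The sample data reuses the output above with body shortened.

```typescript
// Hypothetical helper, not part of cheerio-tree. It flags configured node names
// whose value is missing, not a string, or blank. This assumes an unmatched
// selector yields an absent key or an empty string.
type ParsedTree = Record<string, unknown>;

function missingNodes(nodeNames: string[], data: ParsedTree): string[] {
  return nodeNames.filter((name) => {
    const value = data[name];
    return typeof value !== "string" || value.trim() === "";
  });
}

// Node names from ./config.yml and the output shown above (body shortened).
const nodeNames = ["title", "body", "footer"];
const data: ParsedTree = {
  title: "Cheerio Tree",
  body: "Cheerio Tree\n============\n\n...\n\nAppend Text..",
  footer: "Append Text..",
};

console.log(missingNodes(nodeNames, data)); // []
```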

Need More YAML Config Examples?

See this demo built on the Cheerio Tree Scraper API:

GitHub: https://github.com/serping/express-scraper

YAML Nodes