[Question] Use Hexo for generate more than 10000 posts #2579

Open
mnlbox opened this issue May 23, 2017 · 19 comments
Labels
enhancement New feature or request #perfmatters

Comments

@mnlbox

mnlbox commented May 23, 2017

Hi guys,
I have a large set of posts that I converted to Markdown files from my previous website. My _posts directory now holds nearly 10000 posts.
I tried hexo server --debug, but it still had not started after 15 minutes of processing on a machine with 32 GB of RAM and an SSD.
I also tried hexo clean && hexo generate, but it had not finished after 25 minutes either.
My questions are:

  1. Is this normal?
  2. How can we speed up Hexo?
  3. Can Hexo be used for a site of this scale, or is it meant for smaller sites?

@NoahDragon
Member

  1. Yes, this is a known performance issue in Hexo, but all static site generators have this kind of issue. We are trying to improve it; see "Speed up generating speed" #550.
  2. Here are several tips:
  • Run hexo clean only when the style or the theme changes in a way that may affect all pages. Old posts are stored in the database, so if there is no global change the generating process will skip them.
  • Use a simple theme that does not contain many widgets.
  • Don't use categories or tags.
  • Remove unused Hexo plugins.
  3. So far, we haven't encountered this many posts to process. Performance varies a lot with the theme, the number of categories/tags, and the renderer engine.
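
To illustrate the first tip (unchanged posts being skipped when nothing global has changed), here is a minimal sketch of change detection between two runs. It is an illustration only and not Hexo's actual implementation: the function name `classifyFiles`, the two Maps, and the fingerprint strings are all assumptions made for this example.

```javascript
// Hypothetical sketch: decide which source files need re-rendering.
// `previous` and `current` are Maps of file path -> content fingerprint
// (for example a hash). Both names are assumptions for illustration.
function classifyFiles(previous, current) {
  const result = { create: [], update: [], skip: [], remove: [] };
  for (const [path, fingerprint] of current) {
    if (!previous.has(path)) {
      result.create.push(path);      // new since the last run
    } else if (previous.get(path) !== fingerprint) {
      result.update.push(path);      // content changed
    } else {
      result.skip.push(path);        // unchanged, no work needed
    }
  }
  for (const path of previous.keys()) {
    if (!current.has(path)) result.remove.push(path); // deleted since the last run
  }
  return result;
}
```

With a scheme like this, wiping the stored state before every build puts every file in the `create` bucket, which is why a clean is only worth it after a global change.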

@NoahDragon NoahDragon added the question Needs help in usage label May 23, 2017
@mnlbox
Author

mnlbox commented May 24, 2017

@NoahDragon Thanks for your reply.
I created a simple Bootstrap-based theme without any widgets. It is very simple and only reads a few properties from the app _config (I put some global settings in the app _config rather than in the theme config). I don't know whether that is right.
I use a lot of front-matter in my Markdown pages (category, tags, and two other custom attributes).
Each post has between 2 and 10 tags and exactly one category.
My renderer engine is https://github.com/hexojs/hexo-renderer-marked

Can you make any other suggestions based on this new information about my app?

@rahil471

I have been through the same problem. In the end, we decided to get rid of some Hexo plugins, such as hexo-multiauthors, hexo-tag-generator, and hexo-archives.
And it's not about the RAM; it has more to do with your CPU. hexo generate is a CPU-intensive task.
@NoahDragon
If Hexo could use multiple cores (assuming it doesn't already), maybe that would reduce the pain a bit.

@mnlbox
Author

mnlbox commented May 24, 2017

@rahil471 Yes, I checked my system monitor and it seems Hexo uses only one CPU core. I have 4 cores; one of them climbed to 100% while the others stayed between 2% and 19%.

@NoahDragon
Member

@mnlbox It's okay to put all configuration into the app _config.yml file; the theme configuration falls back to the app configuration when a value is not set.
As #550 states, categories/tags may dramatically slow down the rendering process. So far, I don't have better suggestions for that unless we improve Hexo's performance.

@rahil471 I believe @tommy351 tried the multi-core approach, but I don't know why he didn't continue. Perhaps creating multiple Hexo rendering instances would increase complexity and maintenance effort. I think it is a good approach, and we may revisit it.
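
As a rough illustration of what a multi-core approach would need first, the sketch below splits a list of posts into one batch per worker using round-robin assignment. It is not taken from Hexo: the function name `partitionRoundRobin` is an assumption, and dispatching the batches to real workers (for example with Node's worker threads) and merging their output is left out entirely.

```javascript
// Hypothetical sketch: split work into `workerCount` batches, round-robin,
// so that each batch could be handed to a separate rendering worker.
function partitionRoundRobin(items, workerCount) {
  if (!Number.isInteger(workerCount) || workerCount < 1) {
    throw new RangeError('workerCount must be a positive integer');
  }
  const batches = Array.from({ length: workerCount }, () => []);
  items.forEach((item, index) => {
    batches[index % workerCount].push(item); // item i goes to worker i mod n
  });
  return batches;
}
```

Splitting posts is the easy part. Pages such as tag, category, and archive listings depend on every post, so they would still need a shared view of the data, which may be part of the complexity mentioned above.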

@mnlbox
Author

mnlbox commented May 26, 2017

Maybe related: #2164

@mnlbox
Author

mnlbox commented Jun 4, 2017

@NoahDragon I can build my site with Hugo in just 25 seconds (with Hexo, my build had not finished after 5 hours).
25 seconds for more than 10000 posts is awesome.
What is the reason for such a big difference?

@NoahDragon
Member

@mnlbox Thanks for the info. I will take a look at Hugo. I assume it uses multiple processors, and the performance difference between JavaScript and Go also plays a part.

@leesei
Member

leesei commented Jun 12, 2017

@mnlbox Did you try the default theme without any plugins (init a new site and copy your posts over)?
An inefficient theme or widget may be the culprit.

See discussion starting from here:
#1769 (comment)

@mnlbox
Author

mnlbox commented Jun 24, 2017

@leesei Yes, I also tried the default theme and removing unused plugins, but it made no difference for this issue.

@stevenjoezhang
Member

stevenjoezhang commented Nov 26, 2022

I tested a site with 2000 posts and found that the following code was executed over 10k times, taking up 20% of the execution time of hexo g:

hexo/lib/models/tag.js

Lines 35 to 43 in 58a8f8c

Tag.virtual('posts').get(function() {
  const PostTag = ctx.model('PostTag');
  const ids = PostTag.find({tag_id: this._id}).map(item => item.post_id);
  return ctx.locals.get('posts').find({
    _id: {$in: ids}
  });
});

One of the reasons is that queries (find in line 38 and 40) are O(n), thus the time consumption is terrible when dealing with a large number of posts
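
To make the cost concrete, here is a minimal sketch that contrasts a per-lookup scan with a one-time index. It uses plain arrays as toy stand-ins for the PostTag join table and the posts collection. It does not use warehouse's API, and the function names `postsForTagByScan` and `buildTagIndex` are assumptions for this example.

```javascript
// Scan approach: every lookup walks the whole join table and the whole posts list.
function postsForTagByScan(tagId, postTags, posts) {
  const ids = postTags.filter(pt => pt.tag_id === tagId).map(pt => pt.post_id);
  return posts.filter(post => ids.includes(post._id));
}

// Index approach: one pass over the data up front, then each lookup is a Map read.
function buildTagIndex(postTags, posts) {
  const postById = new Map(posts.map(post => [post._id, post]));
  const index = new Map(); // tag_id -> array of post objects
  for (const { tag_id, post_id } of postTags) {
    const post = postById.get(post_id);
    if (!post) continue;                       // ignore dangling references
    if (!index.has(tag_id)) index.set(tag_id, []);
    index.get(tag_id).push(post);
  }
  return index;
}
```

With T tags and N posts, the scan version does work proportional to N for each of the T lookups, while the index is built once in roughly linear time. The trade-off is that the index has to be rebuilt or updated whenever posts or tags change.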

Update: with 8000 posts, list_tags is taking 47% of the execution time (21m 50s in total)

[Image: screenshot, 2022-11-27 16:05:12]

Update: I tried disabling external_link and optimizing list_tags; the generation time dropped to ~5.25 min.

CC @hexojs/core

@lorezyra

lorezyra commented Dec 10, 2022

I have a site with over 1400 posts and almost 10K assets. It takes HEXO over 30 minutes to generate if I don't run hexo clean first. However, running hexo clean && hexo gen will generate the site within a minute. This tells me the issue resides with using the db.json file. That db is of no value to me as I only need my website generated and pushed. I don't need it after the site is generated.

@SukkaW
Member

SukkaW commented Dec 10, 2022

One of the reasons is that queries (find in line 38 and 40) are O(n), thus the time consumption is terrible when dealing with a large number of posts

@stevenjoezhang

Tag.virtual().get() defines a getter, and the getter function here will be executed every time the property is accessed. So before we try to optimize the find, is it possible for Hexo to reduce the access to tag.posts with cache?
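
The caching idea can be sketched in plain JavaScript as follows. This is not warehouse's virtual API: the helper name `defineCachedGetter` is an assumption, and the cache here is per object, so a real implementation would need a per-document cache and a way to invalidate it when posts change.

```javascript
// Hypothetical sketch: a getter that runs `compute` once and then reuses the result.
// Returns a function that clears the cache, to be called when the underlying data changes.
function defineCachedGetter(target, name, compute) {
  let cached = false;
  let value;
  Object.defineProperty(target, name, {
    configurable: true,
    get() {
      if (!cached) {
        value = compute.call(this); // the expensive query runs only on a cache miss
        cached = true;
      }
      return value;
    }
  });
  return function invalidate() {
    cached = false;
    value = undefined;
  };
}
```

A template that reads the property many times during one render would then pay for the query once, as long as nothing invalidates the cache in between.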

@yoshinorin
Member

yoshinorin commented Apr 14, 2024

I have a site with over 1400 posts and almost 10K assets. It takes HEXO over 30 minutes to generate if I don't run hexo clean first. However, running hexo clean && hexo gen will generate the site within a minute. This tells me the issue resides with using the db.json file. That db is of no value to me as I only need my website generated and pushed. I don't need it after the site is generated.

I've been deleting db.json before running hexo g and hexo s for quite some time. Today I took the opportunity to capture a flame graph without deleting db.json. (See #5456 (comment))

My environment has 1800 .md files and 1500 image files (jpg, png), and db.json size is 37MB.

As a result, it seems that toObject in the warehouse is taking time.

[Image: flame graph, with db.json]

@SukkaW
Member

SukkaW commented Apr 14, 2024

As a result, it seems that toObject in the warehouse is taking time.

So the culprit is the cloneDeep.

But here is the thing: JSON doesn't support circular references, which is why warehouse uses cloneDeep. We could avoid that by using a JSON-like format that supports circular references.
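
As a small illustration of the limitation and of one way around it, the sketch below encodes an object graph as a flat array of records in which every nested object is replaced by an index reference. It is a toy format invented for this example, not warehouse's storage format and not the format of any particular library. The names `encodeGraph`, `decodeGraph`, and `$ref` are assumptions, and the root value is assumed to be an object or array.

```javascript
// Encode: each object or array becomes one record; nested objects become { $ref: index }.
function encodeGraph(root) {
  const ids = new Map(); // object -> index in `records`
  const records = [];
  function visit(value) {
    if (value === null || typeof value !== 'object') return value; // primitives stay inline
    if (ids.has(value)) return { $ref: ids.get(value) };           // already seen: cycle or shared
    const id = records.length;
    ids.set(value, id);
    const record = Array.isArray(value) ? [] : {};
    records.push(record);
    for (const key of Object.keys(value)) record[key] = visit(value[key]);
    return { $ref: id };
  }
  visit(root);
  return JSON.stringify(records); // records contain no cycles, so plain JSON works
}

// Decode: parse the records, then swap every { $ref } slot for the record it points to.
function decodeGraph(text) {
  const records = JSON.parse(text);
  const resolve = value =>
    value !== null && typeof value === 'object' ? records[value.$ref] : value;
  for (const record of records) {
    for (const key of Object.keys(record)) record[key] = resolve(record[key]);
  }
  return records[0]; // the root was visited first
}
```

Calling `JSON.stringify` directly on a post that points to a tag that points back to the post throws a `TypeError`, while the round trip through `encodeGraph` and `decodeGraph` restores the cycle.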

@stevenjoezhang
Member

I have found some JSON libraries that handle circular references very well, such as flatter. However, switching the implementation of warehouse over to this would require a significant amount of manpower.

@stevenjoezhang
Member

In my own experiments, cloneDeep was not invoked. I'm curious under what circumstances / configurations this would be an issue?

@yoshinorin
Member

In my own experiments, cloneDeep was not invoked. I'm curious under what circumstances / configurations this would be an issue?

Below is my environment. I didn't include _config.yml in this comment because it's lengthy. Is it necessary? If you need any additional information, please let me know and I'll provide it if possible.

Hexo, Node version

$ hexo -v
hexo: 7.1.1
hexo-cli: 4.3.1
os: win32 10.0.22631
node: 20.11.1
...
v8: 11.3.244.8-node.17

Machine Info

# OS
Microsoft Windows [Version 10.0.22631.3447]

# Cpu
AMD Ryzen 7 PRO 4750G with Radeon Graphics

# Memory
Capacity     Name             Tag
17179869184  Physical Memory  Physical Memory 1
17179869184  Physical Memory  Physical Memory 3

Dependencies

// package.json
"dependencies": {
  "hexo": "7.1.1",
  "hexo-filter-nofollow": "2.0.2",
  "hexo-generator-archive": "git+https://github.com/hexojs/hexo-generator-archive.git#master",
  "hexo-generator-category": "git+https://github.com/hexojs/hexo-generator-category.git#master",
  "hexo-generator-feed": "git+https://github.com/yoshinorin/_hexo-generator-feed.git#master",
  "hexo-generator-index": "git+https://github.com/hexojs/hexo-generator-index.git#master",
  "hexo-generator-sitemap": "git+https://github.com/yoshinorin/_hexo-generator-sitemap.git#master",
  "hexo-generator-tag": "git+https://github.com/hexojs/hexo-generator-tag.git#master",
  "hexo-html-minifier": "git+https://github.com/hexojs/hexo-html-minifier.git#master",
  "hexo-pagination": "git+https://github.com/yoshinorin/hexo-pagination.git#my-site",
  "hexo-renderer-ejs": "git+https://github.com/hexojs/hexo-renderer-ejs.git#master",
  "hexo-renderer-markdown-it": "git+https://github.com/hexojs/hexo-renderer-markdown-it#master",
  "hexo-server": "git+https://github.com/hexojs/hexo-server.git#master"
}

Theme

I'm using a theme based on https://github.com/LouisBarranqueiro/hexo-theme-tranquilpeak, from which I've deleted many features.

db.json size

$ dir

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
-a----        2024/04/16     20:21       37507929 db.json

Number of posts, assets etc

See Appendix 1 for how these numbers were obtained.

Number of posts: 1773
Number of post assets: 1784
Avg of post content length: 3645

Number of pages: 23
Number of page assets: 81
Avg of page content length: 4217

Number of tags: 246
Number of categories: 170
Number of routes: 5335

Appendix 1. (How to get number of x)

const Hexo = require('hexo');
const hexo = new Hexo(process.cwd(), {silent: false});

hexo.init().then(() => {
  hexo.load().then(() => {

    const posts = hexo.locals.get('posts').toArray();
    const postAsset = hexo.model('PostAsset');
    let numOfPostAssets = 0;
    let postContentTotalLen = 0;
    for(let post of posts) {
      const dir = post.path.slice(0, post.path.lastIndexOf("/"));
      const assets = postAsset.filter(x => x._id.includes(dir));
      numOfPostAssets = numOfPostAssets + assets.length;
      postContentTotalLen = postContentTotalLen + post.content.length;
    }

    const pages = hexo.locals.get('pages').toArray();
    const pageAsset = hexo.model('Asset');
    let numOfPageAssets = 0;
    let pageContentTotalLen = 0;
    for(let page of pages) {
      const dir = page.path.slice(0, page.path.lastIndexOf("/"));
      const assets = pageAsset.filter(x => x._id.includes(dir));
      numOfPageAssets = numOfPageAssets + assets.length;
      pageContentTotalLen = pageContentTotalLen + page.content.length;
    }

    const tags = hexo.locals.get('tags').toArray();
    const categories = hexo.locals.get('categories').toArray();
    const routes = hexo.route.list();

    console.log(`Number of posts: ${posts.length}`);
    console.log(`Number of post assets: ${numOfPostAssets}`);
    console.log(`Avg of post content length: ${Math.floor(postContentTotalLen / posts.length)}`);

    console.log(`Number of pages: ${pages.length}`);
    console.log(`Number of page assets: ${numOfPageAssets}`);
    console.log(`Avg of page content length: ${Math.floor(pageContentTotalLen / pages.length)}`);

    console.log(`Number of tags: ${tags.length}`);
    console.log(`Number of categories: ${categories.length}`);
    console.log(`Number of routes: ${routes.length}`);
  });
});

Appendix 2. (How to get the flame graph)

$ 0x -D framegraph\\with-dbjson .\\node_modules\\hexo\\bin\\hexo g

@yoshinorin
Member

yoshinorin commented Apr 18, 2024

In my own experiments, cloneDeep was not invoked. I'm curious under what circumstances / configurations this would be an issue?

It is caused by toArray(). This code runs when the post_asset_folder option is enabled:

const post = Post.toArray().find(post => file.source.startsWith(post.asset_dir));
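
To show why a line like this can get expensive when it runs once per asset file, here is a minimal sketch with a toy model whose `toArray()` copies every document and counts how often that happens. It assumes the cost comes from materialising the array on every call. The names `makeCountingModel`, `matchPerFile`, and `matchHoisted` are made up for this example and are not part of Hexo or warehouse.

```javascript
// Toy model: toArray() copies every document, and `copies` counts the calls.
function makeCountingModel(docs) {
  const model = {
    copies: 0,
    toArray() {
      model.copies += 1;
      return docs.map(doc => ({ ...doc }));
    }
  };
  return model;
}

// One full copy of the collection per asset file.
function matchPerFile(model, files) {
  return files.map(file => model.toArray().find(post => file.startsWith(post.asset_dir)));
}

// One copy in total, reused for every asset file.
function matchHoisted(model, files) {
  const posts = model.toArray();
  return files.map(file => posts.find(post => file.startsWith(post.asset_dir)));
}
```

With the figures reported earlier in the thread (1773 posts and 1784 post assets), the per-file variant would copy the whole post collection well over a thousand times, while the hoisted variant copies it once. Whether hoisting is safe inside Hexo depends on whether posts can change while assets are being processed, which this sketch does not model.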
