Growth 1304 fix memory issue #5

sbehak-trr · 2026-02-12T17:33:35Z

Description

Previously, each time we loaded the maximum number of URLs per file (defined by the @max_urls config variable), we generated a new sitemap file and released the memory, then repeated the process until all URLs were processed. This let us handle a very large number of URLs without exhausting memory.
At some point this behavior was lost, and sitemap files started to be generated only at the very end of the process. As a result, memory usage keeps growing for the whole run, and if there are too many URLs to process we get an out-of-memory error.
This PR fixes the issue by restoring the ability to generate a sitemap file each time we reach the limit defined by the @max_urls config variable.
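A minimal Crystal sketch of the flush-at-limit behavior described above. `BatchBuffer`, `add`, `finish` and the `on_flush` callback are hypothetical names, not the actual `StreamBuilder` API, and the callback only stands in for writing one sitemap file. Because the buffer is replaced after every flush, it never holds more than `max_urls` URLs, and `finish` writes the last, partial batch.

```crystal
# Hypothetical sketch: not the real Sitemapper::StreamBuilder API.
class BatchBuffer
  getter flush_count = 0

  # on_flush stands in for "render and write one sitemap file from this batch".
  def initialize(@max_urls : Int32, @on_flush : Proc(Array(String), Nil))
    @urls = [] of String
  end

  def add(url : String) : Nil
    @urls << url
    # Write a file as soon as the limit is reached instead of waiting for the end.
    flush if @urls.size >= @max_urls
  end

  # Flush whatever is left over so the last, partial file is not lost.
  def finish : Nil
    flush unless @urls.empty?
  end

  private def flush : Nil
    @on_flush.call(@urls)
    # Drop the reference to the written batch so the GC can reclaim it.
    @urls = [] of String
    @flush_count += 1
  end
end

sizes = [] of Int32
buffer = BatchBuffer.new(2, ->(batch : Array(String)) { sizes << batch.size; nil })
5.times { |i| buffer.add("https://example.com/page-#{i}") }
buffer.finish
puts sizes # => [2, 2, 1]
```

The important part is that flushing happens inside `add`, not only at the end; a callback that renders and writes the XML file would turn this into the streaming behavior the PR restores.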

src/sitemapper/builder/stream_builder.cr

rgoraya · 2026-02-12T19:38:44Z

src/sitemapper/builder/stream_builder.cr

-    def flush(page)
-      filename = filename_for_page(page)
-      doc = build_xml_for_page(paginator.items(page))
+    def flush


Same here: why not keep using flush(page) and pass current_page as the argument when needed?

I'm not so sure here, because I also need to update @current_page, and it's not a good idea to modify an input parameter. Also, it doesn't make sense to flush any page other than the current one, because each page is dropped as soon as it has been processed.
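The point about input parameters can be illustrated with a small sketch. `PageWriter`, its two `flush` overloads and the `sitemapN.xml` naming are hypothetical stand-ins for the real builder. In Crystal, reassigning a method argument only rebinds the local variable, so a `flush(page)` that advanced `page` would leave the caller's counter unchanged; a parameterless `flush` that reads and advances `@current_page` keeps the page state in one place.

```crystal
# Hypothetical sketch contrasting flush(page) with a parameterless flush.
class PageWriter
  getter current_page = 1
  getter filenames = [] of String

  # Variant 1: the page is passed in. Incrementing the argument only changes
  # the local variable, so the caller still has to advance its own counter.
  def flush(page : Int32) : Int32
    @filenames << "sitemap#{page}.xml"
    page += 1
    page
  end

  # Variant 2: the writer owns the counter and always flushes the current page.
  def flush : Nil
    @filenames << "sitemap#{@current_page}.xml"
    @current_page += 1
  end
end

legacy = PageWriter.new
page = 1
legacy.flush(page)
puts page # => 1

writer = PageWriter.new
writer.flush
writer.flush
puts writer.current_page # => 3
p writer.filenames       # => ["sitemap1.xml", "sitemap2.xml"]
```

With variant 1 every caller must remember to write `page = legacy.flush(page)`; variant 2 removes that bookkeeping, which matches the argument that only the current page is ever flushed.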

johnholdun

Seems good, but unless I'm mistaken, Paginator is no longer needed here. It's recreated every time its collection reaches the limit, so it will only ever return one page of results and could be replaced by an array.
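The reviewer's argument can be checked with a minimal, hypothetical `Paginator` (the project's real class may differ). The sketch assumes 1-based pages and a `per_page` equal to `max_urls`: since the builder flushes as soon as the collection reaches that size, the collection never spans more than one page, so `items(1)` is the whole collection and a plain `Array` carries the same information.

```crystal
# Hypothetical minimal paginator; the project's real Paginator may differ.
class Paginator(T)
  def initialize(@collection : Array(T), @per_page : Int32)
  end

  # Ceiling division: number of pages needed for the collection.
  def total_pages : Int32
    (@collection.size + @per_page - 1) // @per_page
  end

  # Items on a 1-based page.
  def items(page : Int32) : Array(T)
    @collection[(page - 1) * @per_page, @per_page]
  end
end

max_urls = 3
batch = ["a", "b", "c"] # never larger than max_urls when flushed at the limit
paginator = Paginator(String).new(batch, max_urls)
puts paginator.total_pages       # => 1
puts paginator.items(1) == batch # => true
```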

sbehak-trr added 2 commits February 12, 2026 13:41

[GROWTH-1304] fix memory issue when sitemap files are too long

2d5551e

remove page variable

4d35cbf

sbehak-trr marked this pull request as ready for review February 12, 2026 17:33

rgoraya reviewed Feb 12, 2026

src/sitemapper/builder/stream_builder.cr (outdated)

rgoraya reviewed Feb 12, 2026

add last sitemap file to index

6dc8759

sbehak-trr requested a review from a team February 12, 2026 20:48

johnholdun approved these changes Feb 12, 2026

rgoraya approved these changes Feb 12, 2026

sbehak-trr commented Feb 12, 2026 (edited by atlassian bot)

3 participants
