sstadick reviewed May 9, 2025
```mojo
# track length before and after
var before = len(self.seq)
var _want = want
var _total = self.reader.read_bytes(self.seq, _want)
```
Contributor:
What if here we did `var _total = self.reader.read_bytes(self.seq, _want, keep=True)`? You'd have to adjust the byte math to subtract one, but it avoids an extra read call, which might be nice.
Contributor:
That's not `read_until`! So just `_want + 1`?
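If I follow the exchange, the change being floated is roughly the sketch below. It assumes the reply's reading: simply request one extra byte up front so the follow-up read call goes away (the `keep=True` parameter mentioned above may only apply to `read_until`).

```mojo
# Rough sketch of the reply's suggestion, not the actual patch: ask
# read_bytes for one extra byte so a second read call isn't needed; the
# surrounding byte math would then need the corresponding -1 adjustment.
var before = len(self.seq)
var _want = want + 1
var _total = self.reader.read_bytes(self.seq, _want)
```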
# Stabilizing fastxpp Benchmarks
I had AI summarize my messy notes and hyperfine results. Everything seems to be correct.
This is a follow-up to #14, where there were some inconsistent results.
## TL;DR
By holding the benchmarking scaffolding static with `@no_inline` and selectively forcing inlining on the hottest helpers, we get stable numbers for the new read methods relative to the `orig` implementation in the apples-to-apples, separate-executable benchmark, all without algorithmic changes.
## Motivation

The existing benchmark numbers have been noisy, likely because the compiler optimizes the benchmark harness together with the implementation under test. This obscures the real cost of each I/O strategy. We want numbers that isolate the cost of `strip_newline`, `read_byte`, and `read_until`, and that stay stable across benchmark orderings.
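As a rough illustration of the approach (hypothetical function names, not the actual `fastxpp_bench` code): the harness entry points get `@no_inline` so the compiler cannot fuse them with the code under test, while the hot helpers stay `@always_inline` so they still inline into the implementation itself.

```mojo
# Illustrative only: made-up names, not the real fastxpp_bench harness.
# The decorators are the point.

@always_inline
fn read_byte_hot(data: List[UInt8], i: Int) -> UInt8:
    # Hot helper: free to inline into whichever read method calls it.
    return data[i]

@no_inline
fn bench_read_once(data: List[UInt8]) -> Int:
    # Bench wrapper: @no_inline pins the harness so the compiler can't
    # optimize it together with the implementation under test.
    var total = 0
    for i in range(len(data)):
        total += Int(read_byte_hot(data, i))
    return total
```

Compiling each benchmark into its own executable then removes the remaining cross-contamination, as noted under Ordering Sensitivity below.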
## Header field definition (for now)

How do we calculate the last line (if we wanted to)?
## Different read methods for fastxpp
The methods are named terribly; sorry, I'll change them later.
There are 4 key steps:

1. Identify the record start (`>`)
2. Read the header
3. SWAR-decode the header info field
4. Read the sequence bytes
Besides the original (naive) read method, the main difference between the three is how they read the sequence bytes (and the quality scores, if this were FASTQ), especially how they remove newlines within the sequence block of a FASTA record.
- `orig`
- `strip_newline`
- `swar` (see the SWAR sketch after this list)
- `read_once`: only passes over the bytes once
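For the `swar` flavor, the name presumably refers to the classic zero-byte trick for spotting newlines a word at a time; the snippet below is my own generic sketch of that trick, not the fastxpp implementation.

```mojo
# Generic SWAR zero-byte trick for spotting '\n' in an 8-byte chunk.
# Sketch of the well-known technique, not the fastxpp code.
fn word_has_newline(word: UInt64) -> Bool:
    alias ones: UInt64 = 0x0101010101010101
    alias highs: UInt64 = 0x8080808080808080
    # XOR turns every '\n' byte into a zero byte; the classic
    # (v - ones) & ~v & highs test is nonzero iff v contains a zero byte.
    var v = word ^ (ones * 0x0A)
    return ((v - ones) & ~v & highs) != 0

fn main():
    # "ACGT\nACG" packed little-endian: byte 4 is the newline.
    var w: UInt64 = 0x4743410A54474341
    print(word_has_newline(w))  # True
```

Each 8-byte chunk is screened with a handful of ALU ops, and only chunks that report a hit would need a byte-by-byte pass to locate and strip the `\n`.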
## Design of the Experiment
Input: a 2.6 GB uncompressed FASTA file.
| `strip_newline` | `read_byte` | `read_until` |
| --- | --- | --- |
| `@always_inline` | `@no_inline` | `@no_inline` |
| `@always_inline` | `@no_inline` | `@no_inline` |
| `@no_inline` | `@no_inline` | `@no_inline` |
| `@no_inline` | `@always_inline` | `@always_inline` |
| `@always_inline` | `@no_inline` | `@always_inline` |
| `@always_inline` | `@always_inline` | `@no_inline` |
| `@always_inline` | `@always_inline` | `@always_inline` |

All builds used the same `mojo build fastxpp_bench.mojo` invocation and were measured with Hyperfine (`--warmup 3 -r 10`) on an otherwise idle machine.
## Results Snapshot

Hyperfine timings (seconds) were collected for `orig`, `strip_newline`, `swar`, and `read_once` under each of these configurations:

- `orig`
- `@no_inline`
- `@no_inline`, helpers `@no_inline`
- `@no_inline`, helpers `@always_inline`
## Ordering Sensitivity

The last entry in the bench list is the most sensitive to `@no_inline`. Inlining the read-bytes functions eliminates most of the difference, short of compiling each benchmark into its own executable.
## Summary

- `@no_inline` on all bench functions to freeze harness behavior.
- `@always_inline` on `read_byte` and `read_until` because they are hot in both the swar and read_once paths.
- `@no_inline` left on the remaining `read` helpers.

`var lcnt = (slen + (bpl - 2)) // (bpl - 1)`
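A quick sanity check of that formula, assuming `slen` is the sequence length with newlines removed and `bpl` is the bytes per line including its trailing `\n` (my reading; it isn't spelled out above): it is a ceiling division by the per-line payload, which also gives the length of the last line asked about earlier.

```mojo
fn main():
    # Assumed meanings: slen = sequence bytes without newlines,
    # bpl = bytes per line including the trailing '\n'.
    var slen = 1000
    var bpl = 61                                # 60 sequence bytes + '\n'
    var lcnt = (slen + (bpl - 2)) // (bpl - 1)  # ceil(1000 / 60) = 17 lines
    var last = slen - (lcnt - 1) * (bpl - 1)    # 40 bytes on the last line
    print(lcnt, last)
```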
## Next steps