Add script to convert Meta Tectonic traces into object traces (for use with libCacheSim)#8
Add script to convert Meta Tectonic traces into object traces (for use with libCacheSim)#8shermanjlim wants to merge 2 commits intocacheMon:mainfrom
Conversation
Summary of ChangesHello @shermanjlim, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a new C++ utility that facilitates the conversion of Meta Tectonic block storage traces into a Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a C++ script to convert Meta Tectonic traces into a format compatible with libCacheSim, along with corresponding updates to the README. The script is a solid addition, but I've identified some critical robustness issues related to CSV parsing that could cause crashes on malformed input. My review includes suggestions to handle these errors gracefully. I've also provided feedback to improve code maintainability and adhere to standard C++ practices, such as using named constants and directing error output to stderr.
| op_time = std::stoull(cells[0]); | ||
| block_id = cells[1]; | ||
| io_size = std::stoull(cells[3]); | ||
| io_offset = std::stoull(cells[4]); | ||
| rs_shard_id = cells[10]; |
There was a problem hiding this comment.
The CSV parsing logic here is not robust and could cause the program to crash if it encounters malformed data.
- Out-of-bounds access: The code accesses
cells[10]without checking if the line has at least 11 columns. This will lead to undefined behavior on shorter lines. - Unhandled exceptions:
std::stoullwill throw an exception if a string is not a valid number, which will terminate the program.
You should add validation to handle these cases gracefully, for instance by wrapping the parsing in a try-catch block and checking cells.size().
if (cells.size() < 11) {
std::cerr << "Skipping malformed line (not enough columns): " << line << std::endl;
continue;
}
try {
op_time = std::stoull(cells[0]);
block_id = cells[1];
io_size = std::stoull(cells[3]);
io_offset = std::stoull(cells[4]);
rs_shard_id = cells[10];
} catch (const std::exception& e) {
std::cerr << "Skipping line due to conversion error: " << e.what() << " in line: " << line << std::endl;
continue;
}| std::cout << "Usage: " << argv[0] | ||
| << " <output_csv> <input_csv1> [<input_csv2> ...]\n"; |
There was a problem hiding this comment.
It's standard practice to print usage information to std::cerr rather than std::cout. This allows users to redirect standard output (e.g., to a file) without capturing usage or error text.
| std::cout << "Usage: " << argv[0] | |
| << " <output_csv> <input_csv1> [<input_csv2> ...]\n"; | |
| std::cerr << "Usage: " << argv[0] | |
| << " <output_csv> <input_csv1> [<input_csv2> ...]\n"; |
| // open files | ||
| std::ofstream output_csv{argv[1]}; | ||
| if (!output_csv) { | ||
| std::cout << "Error: Could not open file " << argv[1] << std::endl; |
There was a problem hiding this comment.
| for (size_t i = 2; i < argc; ++i) { | ||
| input_csvs.emplace_back(argv[i]); | ||
| if (!input_csvs.back()) { | ||
| std::cout << "Error: Could not open file " << argv[i] << std::endl; | ||
| } | ||
| } |
There was a problem hiding this comment.
The current logic prints an error but continues if an input file fails to open. This could lead to an incomplete conversion without a clear failure signal. It's more robust to exit immediately upon failing to open a required file. Also, error messages should be directed to std::cerr.
| for (size_t i = 2; i < argc; ++i) { | |
| input_csvs.emplace_back(argv[i]); | |
| if (!input_csvs.back()) { | |
| std::cout << "Error: Could not open file " << argv[i] << std::endl; | |
| } | |
| } | |
| for (size_t i = 2; i < argc; ++i) { | |
| input_csvs.emplace_back(argv[i]); | |
| if (!input_csvs.back()) { | |
| std::cerr << "Error: Could not open file " << argv[i] << std::endl; | |
| return EXIT_FAILURE; | |
| } | |
| } |
| size_t op_time; // 0 | ||
| std::string block_id; // 1 | ||
| size_t io_size; // 3 | ||
| size_t io_offset; // 4 | ||
| std::string rs_shard_id; // 10 |
There was a problem hiding this comment.
Using magic numbers for column indices, as indicated by the comments, makes the code harder to read and maintain. If the column order of the input trace changes, you'd have to find all occurrences of these numbers. It would be better to define these as named constants.
For example:
constexpr size_t kColOpTime = 0;
constexpr size_t kColBlockId = 1;
// ...etc.
// Then use them in the loop:
op_time = std::stoull(cells[kColOpTime]);
Description
Fixes #7