Conversation
Tested with https://github.com/nicolargo/glances since it is a relatively complex pyton program that generates a log. Files: * Cargo.toml: Add python tree-sitter crate. * Tasks.md: Use task markers so we can check them off. * lib.rs: Refactor the placeholder regex and add python to SourceLanguage. * source_ref.rs: The `escape_ignore_newlines()` needs to convert octal escapes to hex. Also, handle raw strings since the backslashes need to be escaped.
ttiimm
left a comment
There was a problem hiding this comment.
Would you have time to write the raw string test case coverage? Otherwise, the changes look fine.
| - TSS: Doesn't this work already? I echo | ||
| the body of the log message into log2src | ||
| and it can find the message. |
There was a problem hiding this comment.
I thought I tried it recently, but there was a panic when something was using the log format. Could we remove the log formats from the test code if they're unneeded by the test case?
| fn escape_ignore_newlines(raw: bool, segment: &str) -> String { | ||
| const HEX_CHARS: [char; 16] = [ | ||
| '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'A', 'B', 'C', 'D', 'E', 'F', | ||
| ]; | ||
|
|
||
| let mut result = String::with_capacity(segment.len() * 2); | ||
| for c in segment.chars() { | ||
| match c { | ||
| '\n' => result.push_str(r"\n"), // Use actual newline in regex | ||
| '\r' => result.push_str(r"\r"), // Handle carriage returns too | ||
| // Escape regex special chars | ||
| '.' | '+' | '*' | '?' | '^' | '$' | '(' | ')' | '[' | ']' | '{' | '}' | '|' => { | ||
| let mut last_end = 0; | ||
| let regex = if raw { | ||
| &RAW_ESCAPE_REGEX | ||
| } else { | ||
| &ESCAPE_REGEX | ||
| }; | ||
| for cap in regex.captures_iter(segment) { | ||
| let overall_range = cap.get(0).unwrap().range(); | ||
| result.push_str(segment[last_end..overall_range.start].as_ref()); | ||
| last_end = overall_range.end; | ||
| if let Some(c) = cap.get(1) { | ||
| result.push('\\'); | ||
| result.push_str(c.as_str()); | ||
| } else if let Some(c) = cap.get(2) { | ||
| match c.as_str() { | ||
| "\n" => result.push_str("\\n"), | ||
| "\r" => result.push_str("\\r"), | ||
| "\t" => result.push_str("\\t"), | ||
| _ => unreachable!(), | ||
| } | ||
| } else if let Some(c) = cap.get(3) { | ||
| if raw { | ||
| result.push('\\'); | ||
| result.push_str(c.as_str()); | ||
| } else { | ||
| let c = c.as_str(); | ||
| let c = &c[1..]; | ||
| let c = u8::from_str_radix(c, 8).unwrap(); | ||
| result.push('\\'); | ||
| result.push(c); | ||
| result.push('x'); | ||
| result.push(HEX_CHARS[(c >> 4) as usize]); | ||
| result.push(HEX_CHARS[(c & 0xf) as usize]); | ||
| } | ||
| _ => result.push(c), | ||
| } else if let Some(_c) = cap.get(4) { | ||
| // XXX This is the fancy Python "\N{...}" escape sequence. Ideally, we'd interpret the | ||
| // name of the escape, but that seems like a lot of work. So, we'll just match any | ||
| // character. | ||
| result.push_str("\\w"); | ||
| } else { | ||
| unreachable!(); | ||
| } | ||
| } | ||
| result.push_str(segment[last_end..].as_ref()); |
There was a problem hiding this comment.
I don't see any additional test coverage for the raw strings. Would you add some?
| @@ -0,0 +1,52 @@ | |||
| --- | |||
There was a problem hiding this comment.
What do you think about adding test coverage here vs in the root test directory. I guess this tests the library pretty thoroughly end to end, so maybe those tests should focus more on the CLI features. Does that make sense to you?
There was a problem hiding this comment.
I add tests in lib.rs so that I can use the debugger. The stuff under tests spawns a separate process and I don't think I was able to attach a debugger.
| static ESCAPE_REGEX: LazyLock<Regex> = LazyLock::new(|| { | ||
| Regex::new(r#"([.*+?^${}()|\[\]])|([\n\r\t])|(\\[0-7]{3}|\\0)|(\\N\{[^}]+})"#).unwrap() | ||
| }); | ||
|
|
||
| // A regex for raw strings that doesn't try to interpret escape sequences. | ||
| static RAW_ESCAPE_REGEX: LazyLock<Regex> = LazyLock::new(|| { | ||
| Regex::new(r#"([.*+?^${}()|\[\]])|([\n\r\t])|(\\)"#).unwrap() | ||
| }); | ||
|
|
There was a problem hiding this comment.
These regexes, and the placeholder ones, are hurting my brain. I wish there was a more expressive way to write them. Any ideas?
There was a problem hiding this comment.
I'll add some comments for now.
The |
I was looking for a case in the source_ref module and didn't see any. |
Does this work with Rust raw strings? |
Tested with https://github.com/nicolargo/glances since it is a relatively complex pyton program that generates a log.
Files:
escape_ignore_newlines()needs to convert octal escapes to hex. Also, handle raw strings since the backslashes need to be escaped.