Skip to content

Commit bb576ba

Browse files
authored
Lexer: Fix possible OOB read in quoted strings (#374)
## Summary Fixes an out-of-bounds string access in `WP_MySQL_Lexer::read_quoted_text()` that produces PHP warnings when lexing unclosed quoted strings with trailing backslashes: ``` Uninitialized string offset N in class-wp-mysql-lexer.php on line 2855 ``` This appears to happen in streaming SQL processing (e.g., WordPress Playground's `runSql` blueprint step) when a buffer boundary splits a quoted string literal and a backslash falls at the end of the buffer. Standard MySQL dumps with escaped string literals contain thousands of backslashes, making this likely to hit in practice. ## The bug The backslash-counting loop in `read_quoted_text()` ran **before** the EOF check. When `strcspn()` reached the end of the string without finding a closing quote: 1. `$at` pointed to `strlen($sql)` (one past the last byte). 2. The backslash loop accessed `$this->sql[$at - 1]` — valid, but if it was `\`: 3. The loop treated the absent quote as escaped and did `$at += 1` (now past end). 4. Next iteration: `strcspn` returned 0, `$at` stayed past end. 5. The backslash loop accessed `$this->sql[strlen($sql)]` — **out of bounds**. ## Fix Two changes to the `while (true)` loop body: 1. **Move the EOF check before the backslash-counting loop.** When `strcspn` reaches end-of-string without finding the quote, `$this->sql[$at] ?? null` won't match `$quote`, so we return `null` immediately — the backslash loop is never reached. 2. **Add a lower-bound guard to the backslash loop.** The `for` condition now includes `($at - $i - 1) >= 0` to prevent underflow when a quote appears near the start of the string. Belt-and-suspenders — the EOF reorder already prevents the primary bug. ## Tests - Unclosed strings with odd/even trailing backslashes (single and double quotes). - Regression tests for valid escaped strings, doubled quotes, and backtick identifiers. - Chunk boundary test simulating a streaming SQL processor splitting input at a backslash inside a quoted string. ## Use of AI This was diagnosed and implemented with the help of Claude Code.
1 parent eb3146a commit bb576ba

2 files changed

Lines changed: 115 additions & 8 deletions

File tree

packages/mysql-on-sqlite/src/mysql/class-wp-mysql-lexer.php

Lines changed: 6 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -2825,8 +2825,6 @@ private function read_number(): ?int {
28252825
* Rules:
28262826
* 1. Quotes can be escaped by doubling them ('', "", ``).
28272827
* 2. Backslashes escape the next character, unless NO_BACKSLASH_ESCAPES is set.
2828-
*
2829-
* @param string $quote The quote character - ', ", or `.
28302828
*/
28312829
private function read_quoted_text(): ?int {
28322830
$quote = $this->sql[ $this->bytes_already_read ];
@@ -2842,6 +2840,11 @@ private function read_quoted_text(): ?int {
28422840
while ( true ) {
28432841
$at += strcspn( $this->sql, $quote, $at );
28442842

2843+
// Unclosed string - unexpected EOF.
2844+
if ( ( $this->sql[ $at ] ?? null ) !== $quote ) {
2845+
return null; // Invalid input.
2846+
}
2847+
28452848
/*
28462849
* By default, quotes can be escaped with a "\".
28472850
* When NO_BACKSLASH_ESCAPES SQL mode is active, the "\" treated as
@@ -2852,18 +2855,13 @@ private function read_quoted_text(): ?int {
28522855
* "\\\" is an escaped backslash and an escape sequence, and so on.
28532856
*/
28542857
if ( ! $no_backslash_escapes ) {
2855-
for ($i = 0; '\\' === $this->sql[ $at - $i - 1 ]; $i += 1);
2858+
for ( $i = 0; ( $at - $i - 1 ) >= 0 && '\\' === $this->sql[ $at - $i - 1 ]; $i += 1 );
28562859
if ( 1 === $i % 2 ) {
28572860
$at += 1;
28582861
continue;
28592862
}
28602863
}
28612864

2862-
// Unclosed string - unexpected EOF.
2863-
if ( ( $this->sql[ $at ] ?? null ) !== $quote ) {
2864-
return null; // Invalid input.
2865-
}
2866-
28672865
// Check if the quote is doubled.
28682866
if ( ( $this->sql[ $at + 1 ] ?? null ) === $quote ) {
28692867
$at += 2;

packages/mysql-on-sqlite/tests/mysql/WP_MySQL_Lexer_Tests.php

Lines changed: 109 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -258,6 +258,115 @@ public function data_identifier_or_number(): array {
258258
);
259259
}
260260

261+
/**
262+
* Test that unclosed quoted strings with trailing backslashes do not
263+
* cause out-of-bounds string access in read_quoted_text().
264+
*
265+
* The backslash-counting loop walks backward from the position returned
266+
* by strcspn(). When the closing quote is missing, strcspn() returns
267+
* the remaining string length. If the last byte is a backslash, the
268+
* loop treats the absent quote as escaped and advances past the end
269+
* of the string. On the next iteration, the loop accesses an invalid
270+
* string offset, triggering "Uninitialized string offset" warnings.
271+
*
272+
* @dataProvider data_unclosed_strings_with_backslashes
273+
*/
274+
public function test_unclosed_string_with_trailing_backslash( string $sql ): void {
275+
set_error_handler(
276+
function ( $severity, $message, $file, $line ) {
277+
throw new \ErrorException( $message, 0, $severity, $file, $line );
278+
},
279+
E_WARNING | E_NOTICE
280+
);
281+
282+
try {
283+
$lexer = new WP_MySQL_Lexer( $sql );
284+
while ( $lexer->next_token() ) {
285+
// Consume all tokens.
286+
}
287+
} finally {
288+
restore_error_handler();
289+
}
290+
291+
// If we reach here without an ErrorException, no OOB access occurred.
292+
$this->assertNull( $lexer->get_token() );
293+
}
294+
295+
public function data_unclosed_strings_with_backslashes(): array {
296+
return array(
297+
'single-quoted trailing backslash' => array( "SELECT '\\" ),
298+
'double-quoted trailing backslash' => array( 'SELECT "\\' ),
299+
'even trailing backslashes' => array( "SELECT '\\\\" ),
300+
'odd trailing backslashes' => array( "SELECT '\\\\\\" ),
301+
'backslash-only single-quoted' => array( "'\\" ),
302+
'backslash-only double-quoted' => array( '"\\' ),
303+
);
304+
}
305+
306+
/**
307+
* Regression: valid strings with escapes must still tokenize correctly.
308+
*
309+
* @dataProvider data_valid_escaped_strings
310+
*/
311+
public function test_valid_escaped_string( string $sql, int $expected_token_id ): void {
312+
$lexer = new WP_MySQL_Lexer( $sql );
313+
$this->assertTrue( $lexer->next_token() );
314+
$this->assertSame( $expected_token_id, $lexer->get_token()->id );
315+
}
316+
317+
public function data_valid_escaped_strings(): array {
318+
return array(
319+
'escaped single quote' => array( "'it\\'s'", WP_MySQL_Lexer::SINGLE_QUOTED_TEXT ),
320+
'trailing escaped backslash' => array( "'path\\\\'", WP_MySQL_Lexer::SINGLE_QUOTED_TEXT ),
321+
'doubled single quote' => array( "'it''s'", WP_MySQL_Lexer::SINGLE_QUOTED_TEXT ),
322+
'empty single-quoted string' => array( "''", WP_MySQL_Lexer::SINGLE_QUOTED_TEXT ),
323+
'escaped double quote' => array( '"col\\"name"', WP_MySQL_Lexer::DOUBLE_QUOTED_TEXT ),
324+
'backtick identifier' => array( '`my_column`', WP_MySQL_Lexer::BACK_TICK_QUOTED_ID ),
325+
);
326+
}
327+
328+
/**
329+
* Test that a chunk boundary splitting a quoted string with a trailing
330+
* backslash does not cause an out-of-bounds string access.
331+
*
332+
* This simulates streaming SQL processing where a buffer boundary falls
333+
* inside a string literal right after a backslash escape character.
334+
*/
335+
public function test_chunk_boundary_inside_escaped_string(): void {
336+
set_error_handler(
337+
function ( $severity, $message, $file, $line ) {
338+
throw new \ErrorException( $message, 0, $severity, $file, $line );
339+
},
340+
E_WARNING | E_NOTICE
341+
);
342+
343+
try {
344+
// Build a SQL string where a backslash falls at the chunk boundary.
345+
// The string content before the boundary is padded to place the
346+
// backslash at exactly position $chunk_size - 1.
347+
$chunk_size = 8192;
348+
349+
// "SELECT '" = 8 bytes, so we need chunk_size - 8 - 1 bytes of
350+
// padding before the trailing backslash to place '\' at the last
351+
// byte of the chunk.
352+
$padding = str_repeat( 'A', $chunk_size - 8 - 1 );
353+
$sql = "SELECT '" . $padding . '\\';
354+
355+
// The chunk is exactly $chunk_size bytes. The last byte is '\'.
356+
// The lexer should handle this as an unclosed string without OOB.
357+
$this->assertSame( $chunk_size, strlen( $sql ) );
358+
359+
$lexer = new WP_MySQL_Lexer( $sql );
360+
while ( $lexer->next_token() ) {
361+
// Consume all tokens.
362+
}
363+
} finally {
364+
restore_error_handler();
365+
}
366+
367+
$this->assertNull( $lexer->get_token() );
368+
}
369+
261370
private function get_token_names( array $token_types ): array {
262371
return array_map(
263372
function ( $token_type ) {

0 commit comments

Comments
 (0)