PHP: Why does is_numeric() return false when passed a seemingly numeric string?

I am trying to understand numeric strings in PHP. I have the following code:

var_dump(5 * "10 abc");
var_dump(is_numeric("10 abc"));

Which gives me the output:

int(50)
bool(false)

This confuses me as the string "10 abc" seems to be interpreted as a numeric string in the first expression (hence the int(50) output and no warnings about using a non-numeric value), but when run through the is_numeric() function it returns false, suggesting that it is in fact not a numeric string.

I have spent some time looking through the documentation to understand this behaviour but can't find any concrete answers. Can somebody please help to explain what is causing it?

I am aware PHP 8.0.0 made some changes to what is considered a numeric string, but it is PHP 7.1.33 that I am trying to understand right now.

Answer



Solution:

Author of the "Saner numeric strings" RFC, which was accepted for PHP 8.0, here.

"10 abc" is not a numeric string, but a leading-numeric string, meaning that the beginning of the string looks like a number, but the string as a whole is not one because non-numeric characters follow it (before PHP 8.0, this includes trailing whitespace).

Because is_numeric() checks that a value is considered numeric per PHP's definition (which prior to PHP 8.0 meant optional leading whitespace, then an optional + or - sign, then an integer, a normal decimal number, or a number in exponential notation), it will return false on strings which are only leading-numeric.
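To make that definition concrete, here is a small sketch of what is_numeric() accepts and rejects. The sample strings are my own picks rather than anything from the RFC, and the last call is left without a fixed result because it depends on the PHP version.

```php
<?php
// Accepted on both PHP 7.x and 8.x: optional leading whitespace,
// optional sign, then an integer, decimal, or exponential number.
$numeric = ["10", " 10", "+10", "-1.5", "1e3"];

// Rejected on both: trailing non-numeric characters, or no number at all.
$notNumeric = ["10 abc", "abc", ""];

foreach (array_merge($numeric, $notNumeric) as $s) {
    echo var_export($s, true), ' => ', var_export(is_numeric($s), true), PHP_EOL;
}

// Version-dependent: trailing whitespace is rejected before PHP 8.0
// and accepted from PHP 8.0 onwards.
var_dump(is_numeric("10 "));
```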

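However, arithmetic operations try to convert their operands to a proper number type (int or float), and as such "10 abc" gets converted to 10 because PHP converts a leading-numeric string to its leading numeric value. On PHP 7.1 this conversion still raises an E_NOTICE ("A non well formed numeric value encountered"), which is easy to miss if notices are not displayed.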
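The conversion can be seen in isolation with explicit casts, which apply the same leading-value rule without emitting any diagnostic. This is only an illustrative sketch; the variable names and the extra sample strings are assumptions of mine.

```php
<?php
// Explicit casts keep only the leading numeric part and stay silent.
$asInt   = (int) "10 abc";      // 10
$asFloat = (float) "1.5e3xyz";  // 1500.0
$none    = (int) "abc";         // 0, there is no leading number at all

// Arithmetic performs the same conversion but reports the trailing characters:
// PHP 7.1 to 7.4 raise E_NOTICE "A non well formed numeric value encountered",
// PHP 8.0+ raises E_WARNING "A non-numeric value encountered".
$product = 5 * "10 abc";        // 50

var_dump($asInt, $asFloat, $none, $product);
```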

Many more "fun" details and edge cases can be found in the technical background section of the PHP RFC.

Answer



Solution:

I think the easiest way to understand the behaviour you describe is that just because a string isn't numeric, that does not mean it cannot be coerced or treated as a number.

Your first line of code

var_dump(5 * "10 abc");

treats the string as a number: once it comes across an invalid character, it ignores that character and everything after it.

Your other line of code

var_dump(is_numeric("10 abc"));

actually behaves more intelligently and asks itself, just like a human might, whether we are dealing with a numeric string here; the answer is no (because of those same invalid characters).
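If it helps, the distinction can be sketched as a small classifier. Note that classify_numeric_string() is a hypothetical helper written for this illustration, not a PHP built-in, and its regular expression only approximates PHP's internal rules for what counts as a leading number.

```php
<?php
// Hypothetical helper (not part of PHP): sorts a string into one of three categories.
function classify_numeric_string(string $s): string
{
    // The whole string is a number according to PHP itself.
    if (is_numeric($s)) {
        return 'numeric';
    }
    // Approximation of "starts with a number": optional whitespace, optional sign,
    // then an integer, decimal, or exponential literal, followed by anything.
    if (preg_match('/^\s*[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?/', $s) === 1) {
        return 'leading-numeric';
    }
    return 'non-numeric';
}

foreach (["10", "1e3", "10 abc", "abc"] as $s) {
    echo var_export($s, true), ' => ', classify_numeric_string($s), PHP_EOL;
}
```

Strings in the 'leading-numeric' group are the ones that arithmetic will still coerce while is_numeric() returns false. A string such as "10 " with trailing whitespace lands in a different group depending on the PHP version, because is_numeric() only accepts it from PHP 8.0 onwards.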
