php - How to extract the image paths and their recommended new dimensions for Automated Image Optimisation?
I am creating a PHP script to scrape the images and their recommended dimensions from https://gtmetrix.com/reports/example.com/a_unique_code.
After extracting the image path and the suggested new height and width, I will programmatically optimize my images.
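Before any resizing can happen, each URL from the report has to be mapped to a file on the server. Below is a minimal sketch of that step. It assumes the URL path mirrors the layout under the document root. `local_path_for_url` is a hypothetical helper, and it does not defend against `..` segments.

```php
<?php
// Hypothetical helper: maps an image URL from the GTmetrix report to a file on
// this server, assuming the URL path mirrors the document root layout.
// It does not sanitise ".." segments; validate the result before writing files.
function local_path_for_url(string $url, string $documentRoot): ?string
{
    $path = parse_url($url, PHP_URL_PATH);// e.g. "/Pictures/thumbs/0093.jpg"
    if (!is_string($path) || $path === '') {
        return null;// Malformed URL or no path component
    }
    // rawurldecode() turns "%20" and similar back into the characters used on disk
    return rtrim($documentRoot, '/') . rawurldecode($path);
}

echo local_path_for_url('https://www.example.com/Pictures/thumbs/0093.jpg', '/var/www/html'), PHP_EOL;
```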
The following is the relevant portion of the HTML returned from that URL:
<tr class="rules-details" style="display: none">
<td colspan="4">
<a href="/serve-scaled-images.html" class="rule-help btn hover-tooltip" data-tooltip-interactive data-tooltip-max-width="450" title="<h4>Serve scaled images</h4><p>Serving appropriately-sized images can save many bytes of data and improve the performance of your webpage, especially on low-powered (eg. mobile) devices.</p><p class="rule-help-tooltip-more"><a href="/serve-scaled-images.html">Read more</a></p>"><i class="sprite-question"></i><span class="resp-hidden">What's this mean?</span></a>
<div>
<p>The following images are resized in HTML or CSS. Serving scaled images could save 1.3MiB (45% reduction).
<ul>
<li><a href="https://www.example.com/Pictures/thumbs/0029.jpg" target="_blank" rel="nofollow noopener noreferrer">https://www.example.com/Pictures/thumbs/0029.jpg</a> is resized in HTML or CSS from 300x623 to 123x200. Serving a scaled image could save 51.3KiB (86% reduction).</li>
<li><a href="https://www.example.com/Pictures/thumbs/0133.jpg" target="_blank" rel="nofollow noopener noreferrer">https://www.example.com/Pictures/thumbs/0133.jpg</a> is resized in HTML or CSS from 300x578 to 135x200. Serving a scaled image could save 44.0KiB (84% reduction).</li>
<li><a href="https://www.example.com/Pictures/thumbs/0075.jpg" target="_blank" rel="nofollow noopener noreferrer">https://www.example.com/Pictures/thumbs/0075.jpg</a> is resized in HTML or CSS from 300x390 to 176x200. Serving a scaled image could save 43.2KiB (69% reduction).</li>
<li><a href="https://www.example.com/Pictures/thumbs/0057.jpg" target="_blank" rel="nofollow noopener noreferrer">https://www.example.com/Pictures/thumbs/0057.jpg</a> is resized in HTML or CSS from 300x436 to 174x200. Serving a scaled image could save 35.0KiB (73% reduction).</li>
<li><a href="https://www.example.com/Pictures/thumb/thumb.png" target="_blank" rel="nofollow noopener noreferrer">https://www.example.com/Pictures/thumbs/thumb.png</a> is resized in HTML or CSS from 148x100 to 68x46. Serving a scaled image could save 31.4KiB (78% reduction).</li>
<li><a href="https://www.example.com/Pictures/thumb/thumb.png" target="_blank" rel="nofollow noopener noreferrer">https://www.example.com/Pictures/thumbs/thumb.png</a> is resized in HTML or CSS from 148x100 to 68x46. Serving a scaled image could save 30.9KiB (78% reduction).</li>
<li><a href="https://www.example.com/Pictures/thumb/thumb.png" target="_blank" rel="nofollow noopener noreferrer">https://www.example.com/Pictures/thumbs/thumb.png</a> is resized in HTML or CSS from 148x100 to 68x46. Serving a scaled image could save 30.7KiB (78% reduction).</li>
<li><a href="https://www.example.com/Pictures/thumb/thumb.png" target="_blank" rel="nofollow noopener noreferrer">https://www.example.com/Pictures/thumbs/thumb.png</a> is resized in HTML or CSS from 148x100 to 68x46. Serving a scaled image could save 30.7KiB (78% reduction).</li>
<li><a href="https://www.example.com/Pictures/thumbs/0093.jpg" target="_blank" rel="nofollow noopener noreferrer">https://www.example.com/Pictures/thumbs/0093.jpg</a> is resized in HTML or CSS from 300x458 to 138x200. Serving a scaled image could save 28.9KiB (79% reduction).</li>
</ul>
</p>
</div>
</td>
</tr>
After advice from John Conde to use a DOM parser, here is my coding attempt:
$html = file_get_contents('https://gtmetrix.com/reports/example.com/a_unique_code');
$document = new DOMDocument();
libxml_use_internal_errors(true);// Suppress warnings caused by malformed HTML
$document->loadHTML($html);
$xpath = new DOMXPath($document);
$stack = array();
$expression = './/tr[contains(concat(" ", normalize-space(@class), " "), " rules-details ")]';
foreach ($xpath->evaluate($expression) as $tr)
{
    array_push($stack, $tr->nodeValue);
}
foreach ($stack as $string)
{
    // Only rows that contain savings estimates are of interest
    if (strpos($string, 'reduction') === false)
    {
        continue;
    }
    $string = str_replace("What's this mean?", "", $string);
    $string = trim(preg_replace("/\s+/", " ", $string));
    $string_array = explode(').', $string);
    $array_size = sizeof($string_array);// Fix the bound before unset() shrinks the array
    for ($i = 0; $i < $array_size; $i++)
    {
        if (!isset($string_array[$i]))
        {
            continue;// Already removed
        }
        $search_string = $string_array[$i];
        // Drop the summary sentence that introduces the list
        if (strpos($search_string, 'The following images are resized in HTML or CSS.') !== false)
        {
            unset($string_array[$i]);
            continue;
        }
        // Drop this entry and everything after it (the next rule's list)
        if (strpos($search_string, 'Optimize the following images to reduce their size by') !== false)
        {
            for ($j = $i; $j < $array_size; $j++)
            {
                unset($string_array[$j]);
            }
            break;
        }
        echo '<pre>'.$string_array[$i];
    }
}
The question is: given the following string, how do I extract the URL and the second image dimension?
example.com/Pictures/thumbs/0093.jpg is resized in HTML or CSS from 300x458 to 138x200. Serving a scaled image could save 28.9KiB (79% reduction).
I need:
example.com/Pictures/thumbs/0093.jpg
138x200
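A single regular expression with named groups can pull out both values. This sketch assumes each sentence starts with the URL, as in the list items above. `\S+` accepts the URL with or without the scheme.

```php
<?php
// Sketch: extract the URL and the recommended ("to") dimensions from one
// GTmetrix sentence with a single regular expression and named groups.
$sentence = 'https://www.example.com/Pictures/thumbs/0093.jpg is resized in HTML or CSS '
    . 'from 300x458 to 138x200. Serving a scaled image could save 28.9KiB (79% reduction).';

$pattern = '~^(?<url>\S+) is resized in HTML or CSS from (?<from>\d+x\d+) to (?<to>\d+x\d+)\.~';

if (preg_match($pattern, $sentence, $m)) {
    echo $m['url'], PHP_EOL;// https://www.example.com/Pictures/thumbs/0093.jpg
    echo $m['to'], PHP_EOL;// 138x200
}
```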
I will be optimizing this prototype script, but this is how I am implementing John Conde's answer:
<?php
// #########################################
// AUTOMATED IMAGE OPTIMIZATION
// #########################################
class Image
{
public $image_url;
public $image_name;
public $image_path;
public $image_full_path;
public $original_size;
public $new_size;
}
$debugging = true;
if($debugging === true){echo '<ul class="Results" style="display:block; height:auto;">';}
try
{
$HTML = file_get_contents('https://gtmetrix.com/reports/www.example.com/a_unique_code');// Get Webpage
switch($HTML)
{
case false:
if($debugging === true)
{
$error = error_get_last();
echo '<li class="Error_Msg" style="display:block; height:auto;">';
echo '<span><b>## FATAL ERROR - PROGRAM ABORTED ##</b></span>';
echo '<span><b>Message:</b> Could not retrieve the HTML document</span>';
echo '</li>';
error_clear_last();
exit;
}
break;
default:// START OF WRAPPER
$DOMdoc = new DOMDocument();// Object to store an HTML document
libxml_use_internal_errors(true);//
$html = @$DOMdoc->loadHTML($HTML);// Parse the HTML
$racks = (new DOMXPath($DOMdoc))->query('//tr/td/div//ul/li');// Creates a new DOMXPath object from the XPath expression
$images_info_array = array();// Array for storing image details objects
$document_root = $_SERVER['DOCUMENT_ROOT'];// Define the document root
foreach($racks as $rack)// Traverse over the HTML structure
{
// Define a pattern to search for
$expression = "/https?\:\/\/[^\",]+ is resized in HTML or CSS from \d+x\d+ to \d+x\d+\./";// Accept any image size and require a literal full stop
if(preg_match_all($expression, $rack->nodeValue, $matched) == 1)// If the pattern is found then
{
$url = $rack->firstChild->nodeValue;// Get the URL from the string
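preg_match('/from (\d+x\d+) to (\d+x\d+)/', $rack->nodeValue, $matches);// Read the sizes after "from" so digits in the URL (e.g. a file named 100x100.jpg) are never picked up
[, $original_size, $new_size] = $matches;// $matches[0] is the whole match; groups 1 and 2 are the sizes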
$url_parts = parse_url($url);// Break the URL up into sections
$directory_path = $url_parts['path'];// Get the directory path without the domain
$path_parts = pathinfo($directory_path);// Get information about a file path
$position = strpos($directory_path, '/');// Find the first / in the file path
if ($position !== false)// If found
{
$new_directory_path = substr_replace($directory_path, "", $position, strlen('/'));// Remove the /
$image_info = new Image();// Create a new Image Object
$image_info->image_url = $url;// Store the image URL
$image_info->image_name = basename($url);// Store just the image name
$image_info->image_path = $path_parts['dirname'];// Store image directory without domain & file name
$image_info->image_full_path = $new_directory_path;//
$image_info->original_size = $original_size;// Store the original image size
$image_info->new_size = $new_size;// Store the new image size
array_push($images_info_array, $image_info);// Add the image information to an array
}else{
if($debugging === true)
{
$error = error_get_last();
echo '<li class="Warning_Msg">';
echo '<span><b>## WARNING - FILE PATH CHARACTER MISSING ##</b></span>';
echo '<span><b>Message:</b> / in the file path not found</span>';
echo '</li>';
error_clear_last();
}
}
}else{// If the pattern is not found then
if($debugging === true)
{
$error = error_get_last();
echo '<li class="Error_Msg" style="display:block; height:auto;">';
echo '<span><b>## FATAL ERROR - PROGRAM ABORTED ##</b></span>';
echo '<span><b>Message:</b> Could not find the pattern required to extract the URL & size information</span>';
echo '</li>';
error_clear_last();
exit;
}
}
}
foreach($images_info_array as $image_info)// Traverse the image info array
{
if(file_exists($document_root.'/'.$image_info->image_full_path))// Check if the image exists under the document root (the same path the convert command uses)
{
$temp_path = $document_root.$image_info->image_path.'/temp/';// Define a temporary folder location
if(!is_dir($temp_path))// Create the temporary folder if it does not exist yet
{
mkdir($temp_path, 0777, true);// Deleting and recreating it here would wipe the files already written for earlier images in the same directory
}
// Define the convert command for recommended optimization of the image
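$source = $document_root.'/'.$image_info->image_full_path;// Original image
$destination = $temp_path.$image_info->image_name;// Resized copy in the temporary folder
$command = 'convert -thumbnail '.escapeshellarg($image_info->new_size).' '.escapeshellarg($source).' '.escapeshellarg($destination).' 2>&1';// Quote every argument for the shell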
$last_line = system($command, $return_value);// Run the defined command
if($debugging === true)
{
if($return_value === 0)// convert exits with status 0 on success
{
echo '<li class="Normal_Message">';
echo '<span><b>MESSAGE - THE COMMAND COMPLETED SUCCESSFULLY</b></span>';
echo '<span><b>Command:</b> '.$command.'</span>';
echo '<span><b>Directory:</b> '.$image_info->image_full_path.'</span>';
echo '<span><b>Resized:</b> '.$image_info->new_size.'</span>';
echo '<span><b>Returned:</b> '.$return_value.'</span>';
echo '<span><b>Output:</b> '.$last_line.'</span>';
echo '</li>';
}
else// A non-zero exit status means convert failed; its message is in the captured output, not in error_get_last()
{
echo '<li class="Error_Msg" style="display:block; height:auto;">';
echo '<span><b>## ERROR - THE COMMAND DID NOT COMPLETE ##</b></span>';
echo '<span><b>Command:</b> '.$command.'</span>';
echo '<span><b>Returned:</b> '.$return_value.'</span>';
echo '<span><b>Output:</b> '.$last_line.'</span>';
echo '</li>';
}
}
}
else// If the file does not exist
{
echo '<li class="Warning_Message" style="display:block; height:auto;">The file doesn\'t exist</li>';
}
}
break;// END OF WRAPPER
}
}
catch(Exception $Error_Message)
{
echo $Error_Message;
}
echo '</ul>';
?>
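Building the shell command is the riskiest step, because the file paths come from a scraped page. Below is a sketch of a helper that validates the size and quotes every argument with `escapeshellarg()`. `build_thumbnail_command` is a hypothetical name. The sketch assumes ImageMagick's `convert` is on the PATH; in ImageMagick 7 the preferred entry point is `magick`.

```php
<?php
// Hypothetical helper: builds the ImageMagick "convert -thumbnail" command used in
// the script above, quoting every argument with escapeshellarg() so paths containing
// spaces or shell metacharacters cannot break (or inject into) the command.
function build_thumbnail_command(string $size, string $source, string $destination): string
{
    // Only accept plain WIDTHxHEIGHT geometry from the report
    if (!preg_match('/^\d+x\d+$/', $size)) {
        throw new InvalidArgumentException("Unexpected size: $size");
    }
    return 'convert -thumbnail ' . escapeshellarg($size) . ' '
        . escapeshellarg($source) . ' ' . escapeshellarg($destination) . ' 2>&1';
}

$command = build_thumbnail_command(
    '138x200',
    '/var/www/html/Pictures/thumbs/0093.jpg',
    '/var/www/html/Pictures/thumbs/temp/0093.jpg'
);
echo $command, PHP_EOL;
// To run it: exec($command, $output, $status); an exit status of 0 means success.
```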
Answer
This will parse that HTML and output the text you are looking for:
$html = '<tr class="rules-details" style="display: none">
<td colspan="4">
<a href="/serve-scaled-images.html" class="rule-help btn hover-tooltip" data-tooltip-interactive data-tooltip-max-width="450" title="<h4>Serve scaled images</h4><p>Serving appropriately-sized images can save many bytes of data and improve the performance of your webpage, especially on low-powered (eg. mobile) devices.</p><p class="rule-help-tooltip-more"><a href="/serve-scaled-images.html">Read more</a></p>"><i class="sprite-question"></i><span class="resp-hidden">What\'s this mean?</span></a>
<div>
<p>The following images are resized in HTML or CSS. Serving scaled images could save 1.3MiB (45% reduction).
<ul>
<li><a href="https://www.example.com/Pictures/thumbs/0029.jpg" target="_blank" rel="nofollow noopener noreferrer">https://www.example.com/Pictures/thumbs/0029.jpg</a> is resized in HTML or CSS from 300x623 to 123x200. Serving a scaled image could save 51.3KiB (86% reduction).</li>
<li><a href="https://www.example.com/Pictures/thumbs/0133.jpg" target="_blank" rel="nofollow noopener noreferrer">https://www.example.com/Pictures/thumbs/0133.jpg</a> is resized in HTML or CSS from 300x578 to 135x200. Serving a scaled image could save 44.0KiB (84% reduction).</li>
<li><a href="https://www.example.com/Pictures/thumbs/0075.jpg" target="_blank" rel="nofollow noopener noreferrer">https://www.example.com/Pictures/thumbs/0075.jpg</a> is resized in HTML or CSS from 300x390 to 176x200. Serving a scaled image could save 43.2KiB (69% reduction).</li>
<li><a href="https://www.example.com/Pictures/thumbs/0057.jpg" target="_blank" rel="nofollow noopener noreferrer">https://www.example.com/Pictures/thumbs/0057.jpg</a> is resized in HTML or CSS from 300x436 to 174x200. Serving a scaled image could save 35.0KiB (73% reduction).</li>
<li><a href="https://www.example.com/Pictures/thumb/thumb.png" target="_blank" rel="nofollow noopener noreferrer">https://www.example.com/Pictures/thumbs/thumb.png</a> is resized in HTML or CSS from 148x100 to 68x46. Serving a scaled image could save 31.4KiB (78% reduction).</li>
<li><a href="https://www.example.com/Pictures/thumb/thumb.png" target="_blank" rel="nofollow noopener noreferrer">https://www.example.com/Pictures/thumbs/thumb.png</a> is resized in HTML or CSS from 148x100 to 68x46. Serving a scaled image could save 30.9KiB (78% reduction).</li>
<li><a href="https://www.example.com/Pictures/thumb/thumb.png" target="_blank" rel="nofollow noopener noreferrer">https://www.example.com/Pictures/thumbs/thumb.png</a> is resized in HTML or CSS from 148x100 to 68x46. Serving a scaled image could save 30.7KiB (78% reduction).</li>
<li><a href="https://www.example.com/Pictures/thumb/thumb.png" target="_blank" rel="nofollow noopener noreferrer">https://www.example.com/Pictures/thumbs/thumb.png</a> is resized in HTML or CSS from 148x100 to 68x46. Serving a scaled image could save 30.7KiB (78% reduction).</li>
<li><a href="https://www.example.com/Pictures/thumbs/0093.jpg" target="_blank" rel="nofollow noopener noreferrer">https://www.example.com/Pictures/thumbs/0093.jpg</a> is resized in HTML or CSS from 300x458 to 138x200. Serving a scaled image could save 28.9KiB (79% reduction).</li>
</ul>
</p>
</div>
</td>
</tr>';
$doc = new DOMDocument();
@$doc->loadHTML($html);// @ hides the warnings caused by the <ul> nested inside <p>
$items = (new DOMXPath($doc))->query('//tr/td/div//ul/li');
foreach ($items as $item) {
    $url = $item->firstChild->nodeValue;// The <a> element is the first child of each <li>
    preg_match_all('/\d{1,3}x\d{1,3}/', $item->nodeValue, $matches);// Finds both sizes, e.g. "300x458" and "138x200"
    [$original, $resized] = $matches[0];
    printf('URL:%s Original: %s Resized: %s%s', $url, $original, $resized, PHP_EOL);
}
Output:
URL:https://www.example.com/Pictures/thumbs/0029.jpg Original: 300x623 Resized: 123x200
URL:https://www.example.com/Pictures/thumbs/0133.jpg Original: 300x578 Resized: 135x200
URL:https://www.example.com/Pictures/thumbs/0075.jpg Original: 300x390 Resized: 176x200
URL:https://www.example.com/Pictures/thumbs/0057.jpg Original: 300x436 Resized: 174x200
URL:https://www.example.com/Pictures/thumbs/thumb.png Original: 148x100 Resized: 68x46
URL:https://www.example.com/Pictures/thumbs/thumb.png Original: 148x100 Resized: 68x46
URL:https://www.example.com/Pictures/thumbs/thumb.png Original: 148x100 Resized: 68x46
URL:https://www.example.com/Pictures/thumbs/thumb.png Original: 148x100 Resized: 68x46
URL:https://www.example.com/Pictures/thumbs/0093.jpg Original: 300x458 Resized: 138x200
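The question also needs the new height and width as separate numbers. Below is a small sketch that splits a "Resized" value into integers. It assumes GTmetrix reports width first, as ImageMagick geometry does. `parse_dimensions` is a hypothetical helper.

```php
<?php
// Sketch: split a GTmetrix "WIDTHxHEIGHT" string into integer width and height.
// Assumes width comes first; returns null if the format does not match.
function parse_dimensions(string $dims): ?array
{
    if (!preg_match('/^(\d+)x(\d+)$/', $dims, $m)) {
        return null;
    }
    return ['width' => (int) $m[1], 'height' => (int) $m[2]];
}

$size = parse_dimensions('138x200');
printf("width=%d height=%d%s", $size['width'], $size['height'], PHP_EOL);
```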
Answer
I will offer a slightly different approach from John's answer.
Use XPath to access the desired <a> tags and grab their values. Then take the value of each <a> tag's parent and use preg_match() to isolate the dimension substring after the keyword "to". (\K resets the full-string match, so no capture groups are necessary.)
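To illustrate what \K does, the sketch below runs the same pattern with and without it on one list item's text. The capture-group version is shown only for comparison.

```php
<?php
// Compare \K with a capture group on one list item's text.
$text = 'https://www.example.com/Pictures/thumbs/0029.jpg is resized in HTML or CSS '
    . 'from 300x623 to 123x200. Serving a scaled image could save 51.3KiB (86% reduction).';

// \K drops "to " from the reported match, so $withK[0] is just the dimensions.
preg_match('~to \K\d+x\d+~', $text, $withK);

// Without \K, the dimensions have to be read from capture group 1 instead.
preg_match('~to (\d+x\d+)~', $text, $withGroup);

var_dump($withK[0], $withGroup[0], $withGroup[1]);
```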
Code:
$dom = new DOMDocument();
libxml_use_internal_errors(true);// Silence warnings about the invalid <ul>-inside-<p> markup
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$result = [];
foreach ($xpath->query('//tr/td/div//ul/li/a') as $a) {
    $result[] = [
        $a->nodeValue,// The image URL (the link text)
        preg_match('~to \K\d+x\d+~', $a->parentNode->nodeValue, $m) ? $m[0] : ''// The dimensions after "to"
    ];
}
var_export($result);
Note that I am suppressing the HTML errors that the parser raises because a <ul> is not allowed inside a <p> (see: Should ol/ul be inside <p> or outside?).
For this reason, the XPath expression uses // after div, so it reaches the <ul> whether or not the parser keeps it nested inside the <p>.
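Both answers match every <ul> under any tr/td/div. If the full report page contains several rules-details rows (the first attempt in the question had to strip an "Optimize the following images" list), the query can be restricted to the row whose help link points at /serve-scaled-images.html. The sketch below does this. Its sample HTML, including the second row and its /optimize-images.html link, is made up for illustration.

```php
<?php
// Sketch: only read list items from the "Serve scaled images" rule row.
// The sample markup below is hypothetical; the second row stands in for another rule.
$html = '<table><tr class="rules-details"><td><a href="/serve-scaled-images.html">?</a><div><p>Intro<ul>'
    . '<li><a href="https://www.example.com/a.jpg">https://www.example.com/a.jpg</a> is resized in HTML or CSS from 300x458 to 138x200.</li>'
    . '</ul></p></div></td></tr>'
    . '<tr class="rules-details"><td><a href="/optimize-images.html">?</a><div><ul>'
    . '<li><a href="https://www.example.com/b.jpg">https://www.example.com/b.jpg</a> belongs to another rule</li>'
    . '</ul></div></td></tr></table>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);// The <ul> inside <p> is invalid HTML; keep the warnings quiet
$dom->loadHTML($html);
libxml_clear_errors();

// Keep only rules-details rows that contain the scaled-images help link
$query = '//tr[contains(concat(" ", normalize-space(@class), " "), " rules-details ")]'
    . '[.//a[@href="/serve-scaled-images.html"]]//ul/li/a';

$urls = [];
foreach ((new DOMXPath($dom))->query($query) as $a) {
    $urls[] = $a->nodeValue;
}
print_r($urls);
```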
Output:
array (
0 =>
array (
0 => 'https://www.example.com/Pictures/thumbs/0029.jpg',
1 => '123x200',
),
1 =>
array (
0 => 'https://www.example.com/Pictures/thumbs/0133.jpg',
1 => '135x200',
),
2 =>
array (
0 => 'https://www.example.com/Pictures/thumbs/0075.jpg',
1 => '176x200',
),
3 =>
array (
0 => 'https://www.example.com/Pictures/thumbs/0057.jpg',
1 => '174x200',
),
4 =>
array (
0 => 'https://www.example.com/Pictures/thumbs/thumb.png',
1 => '68x46',
),
5 =>
array (
0 => 'https://www.example.com/Pictures/thumbs/thumb.png',
1 => '68x46',
),
6 =>
array (
0 => 'https://www.example.com/Pictures/thumbs/thumb.png',
1 => '68x46',
),
7 =>
array (
0 => 'https://www.example.com/Pictures/thumbs/thumb.png',
1 => '68x46',
),
8 =>
array (
0 => 'https://www.example.com/Pictures/thumbs/0093.jpg',
1 => '138x200',
),
)
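The same thumb.png appears four times in the output above. Running convert on it once is enough, so the pairs can be collapsed to one entry per URL. The sample $result below is a hand-copied subset of that output. Keeping the first size seen for a URL is an assumption of this sketch.

```php
<?php
// Sketch: collapse [url, size] pairs into one entry per URL.
$result = [
    ['https://www.example.com/Pictures/thumbs/0029.jpg', '123x200'],
    ['https://www.example.com/Pictures/thumbs/thumb.png', '68x46'],
    ['https://www.example.com/Pictures/thumbs/thumb.png', '68x46'],
    ['https://www.example.com/Pictures/thumbs/0093.jpg', '138x200'],
];

$unique = [];
foreach ($result as [$url, $size]) {
    // Keep the first recommendation seen for each URL; a later, different size
    // for the same URL would be ignored (an assumption, not GTmetrix behaviour).
    if (!isset($unique[$url])) {
        $unique[$url] = $size;
    }
}

var_export($unique);
```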