javascript - How to search on any website in Website Scraping
Solution:
Inspect the search form to see how it works: which endpoint the data is posted to and which fields it sends. Then send the same request from your script and process the result, which might be served in various formats (JSON, HTML, XML, etc.). Some sites add extra protection. An ASP.NET site that uses view state, for example, expects hidden form fields such as __VIEWSTATE to be sent back with the request.
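As a rough illustration of "do the same in your script", here is a minimal JavaScript sketch. It assumes a plain form that posts a single URL-encoded field; the endpoint URL and the field name passed in are hypothetical and must be read from the real form's `action` and `name` attributes. `buildSearchRequest` and `submitSearch` are illustrative helper names, not part of any library. The sketch relies on the standard `fetch` and `URLSearchParams` APIs (available in browsers and in recent Node.js versions).

```javascript
// Hypothetical helper: builds the request a browser would send for a
// simple POST search form. The endpoint and field name are assumptions
// that must match the form you are imitating.
function buildSearchRequest(endpoint, fieldName, query) {
  const body = new URLSearchParams({ [fieldName]: query });
  return {
    url: endpoint,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/x-www-form-urlencoded" },
      body: body.toString(),
    },
  };
}

// Sends the request and returns parsed JSON when the server says the
// response is JSON, otherwise the raw text (HTML, XML, ...).
async function submitSearch(endpoint, fieldName, query) {
  const { url, options } = buildSearchRequest(endpoint, fieldName, query);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`Search request failed with status ${response.status}`);
  }
  const contentType = response.headers.get("content-type") || "";
  return contentType.includes("application/json")
    ? response.json()
    : response.text();
}
```

A call might look like `submitSearch("https://www.example.com/search.php", "findMe", "shoes")`. Sites that require extra hidden fields or tokens would need those values fetched from the form page first and added to the request body.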
Answer
Solution:
A fairly straightforward approach is to run a script when the form is submitted. The script searches the text of each web page in your working directory for a match, then displays a page with links to the pages that matched.
I will use PHP for my description of how this is done.
With this in mind, first learn how to read entire pages (i.e. webpages) into a string: http://php.net/manual/en/function.file-get-contents.php
// Adjust the path to match your own file names (this reads the raw file, PHP source included)
$home = file_get_contents('./home.php', FILE_USE_INCLUDE_PATH);
Alternatively, you could fetch the actual web page by URL (this requires allow_url_fopen to be enabled in your PHP configuration):
$home = file_get_contents('http://www.example.com/'); // imagine this is really home.php
$homePageName = "home.php"; // used only for this example
Example:
<!-- Your form / input box -->
<form action="search.php" method="post">
<input type="text" name="findMe" placeholder="Search...">
</form>
Now search.php:
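// $home and $homePageName are assumed to be defined as shown earlier
$search = trim($_POST['findMe'] ?? ''); // avoid a notice when the field is missing
//$search = "example"; // a hard-coded term also works; the form is used here for illustration
// If the term is found in the home page (example.com is used to show it works)
if ($search !== '' && stripos($home, $search) !== false) {
    echo '<a href="'.$homePageName.'">'.$homePageName.'</a>';
}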
If you want to keep things simple and not use an array to store the found pages, repeat the code above for every page you want searched (home, about, products, etc.).
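For comparison, the array-based variant can be sketched as follows. The answer's snippets are PHP; this sketch uses JavaScript only because the question is tagged that way, and the same loop translates directly to PHP. It assumes the page contents have already been loaded into an object keyed by file name. `findMatchingPages`, `renderLinks`, and the sample page names are hypothetical.

```javascript
// Returns the names of all pages whose content contains the search term,
// ignoring case (similar in spirit to the stripos() check in the PHP code).
// `pages` maps a page name to that page's full text.
function findMatchingPages(pages, search) {
  const needle = search.trim().toLowerCase();
  if (needle === "") {
    return []; // an empty search matches nothing
  }
  return Object.entries(pages)
    .filter(([, content]) => content.toLowerCase().includes(needle))
    .map(([name]) => name);
}

// Turns the matching page names into one link per line.
// Page names are assumed to be trusted values you defined yourself.
function renderLinks(pageNames) {
  return pageNames
    .map((name) => `<a href="${name}">${name}</a>`)
    .join("\n");
}
```

With `pages` set to `{ "home.php": "...", "about.php": "..." }`, calling `renderLinks(findMatchingPages(pages, term))` produces the list of links to show on the results page.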
Now a user can search your site (or the pages you want indexed) to find all pages that contain matching text. If you want specific keywords to be searchable, add them to the page metadata. The process described above still works because it searches everything that makes up the page, metadata included.
<meta name="keywords" content="keyword1, keyword2, keyword3" />