Free Website Search Script and Tutorial
The free website search forms for site search page has the site search forms (as pictured above) and the CSS style codes to use on your website, so that visitors can search for words or phrases and get ad-free search results. Please abide by the terms of use. If you'd rather have a whole site search page than a search form placed in the HTML code of one of your site's pages, see the search page search-this-site.php, whose script and tutorial are here: Free-Website-Search-Script.html. The script for the Site Indexer is index-site.php, whose script and tutorial are available on this page: index your site so you can have a free site search on your website.
The Script
The script (search-website.php) is at the bottom of this page. It's the PHP script run by any of the search forms on this page: free website search forms for site search.
The page head section has standard CSS styles except for word-wrap:break-word;white-space:nowrap;overflow:hidden;text-overflow:ellipsis on the textbox class. The first style doesn't actually work on everything, so the rest of the styling is for what happens when something that the search routine finds (in the indexed website data in the MySQL database) insists on being about 3 screens wide, forcing a horizontal scroll bar to appear in all its loveliness. When the break-word property goes awry, we force the search results to stay confined to one page width: the overflow is hidden, and an ellipsis (...) is placed at the end of the chopped-off line. Because white-space:nowrap also stops ordinary wrapping, each line of a result is cut at the box width rather than wrapped onto the next line.
We start out the PHP script with a function called highlight(). It searches case-insensitively for $S (the search term) in $x (some text to search that has been put into the PHP variable $x), and leaves $x with every match highlighted in light blue. Both are global variables, so the function changes $x in place rather than returning it. The PHP function preg_match_all() is used to find the matches, and str_replace() places them between span tags where the text background color can be styled.
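As a hedged illustration of the highlighting idea, here is a minimal sketch that takes the text and the search term as parameters instead of globals. The name highlight_terms() is hypothetical and not part of the script. It also swaps in preg_replace() with preg_quote() for the listing's preg_match_all() and str_replace() pair, so that characters such as . and ? in a search term are matched literally.

```php
<?php
// Hypothetical helper, not part of the original script: wraps every
// case-insensitive occurrence of $term in $text in a light blue span.
function highlight_terms(string $text, string $term): string
{
    if ($text === '' || $term === '') {
        return $text; // nothing to search, or nothing to search for
    }
    // preg_quote() escapes regex metacharacters so the term is matched literally.
    $pattern = '/' . preg_quote($term, '/') . '/i';
    // $0 is the matched text, so the page's own capitalization is preserved.
    return preg_replace($pattern, '<span style="background-color:lightblue;">$0</span>', $text);
}

$demo = highlight_terms('PHP search: php rocks', 'php');
echo $demo, "\n";
```

Returning the highlighted text rather than changing a global makes the function easier to test on its own, at the cost of differing from how the script calls highlight().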
Next we include config.php so the connection to the MySQL database will work right and give our search routine access to all the juicy data which our indexer (previous to any searching) has cleverly inserted into records in our database table called sitepages. Then we get the POSTed values from the search form, which give the PHP script the words to search for ($S) and the type of search ($t). If $t is "words", the words in the search string are treated as a list of words to be dealt with separately; if $t is not "words", the words in the search string are treated as an exact phrase to match, and results are displayed only from pages with this exact phrase. The words type of search is the more complex type, and it is handled first in our script; the bottom part of the script, starting with }else{, deals with the exact phrase searches.
The script next checks whether any search terms were submitted, and asks for them if not. It then looks for the word THE in any form, or for a search string under 3 characters long, and prompts for better input if it finds either. Then a preg_replace() input filter dumps everything but legal characters from the search input: only letters, numbers, spaces (any whitespace, strictly speaking), and . , ! ? are allowed. The mysql_real_escape_string() function that comes next is wise to use on all user input, but since all illegal characters have already been dumped out of the search string, it's not strictly needed here. (It belongs to PHP's original mysql extension, which was removed in PHP 7, so on current PHP versions the script's mysql_ calls need porting to mysqli or PDO.) The escaping also explains the equally unneeded stripslashes() calls further on in the script: mysql_real_escape_string() puts backslashes in front of quotes and such, and stripslashes() takes them back out before display. But since the filter already removed anything that would be escaped, let's just say we're displaying good security habits, however unneeded in this case, and let it go at that.
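The input checks described above can be sketched as one small function. This is only an illustration under stated assumptions: clean_search_input() is a hypothetical name, the sketch filters first and checks the length afterwards (the script checks the length before filtering), and the trim() call is our addition.

```php
<?php
// Hypothetical helper: returns the cleaned search string, or null when the
// input should be rejected (too short, or just the word "the").
function clean_search_input(string $raw): ?string
{
    $s = strip_tags($raw);
    // Keep only letters, digits, whitespace and . , ! ?
    $s = preg_replace('/[^a-zA-Z0-9\s.,!?]/', '', $s);
    $s = trim($s); // assumption: leading and trailing spaces are not wanted
    if (strlen($s) < 3 || strtolower($s) === 'the') {
        return null;
    }
    return $s;
}

$cleaned = clean_search_input("site-search <b>script</b>'; DROP");
echo $cleaned, "\n";
```

Checking the length after filtering means a string made up only of illegal characters is rejected instead of turning into an empty search term.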
Now we initialize the $score and $N arrays. In the current circumstances, reading the N field from the database table to get numbers for the $N array is not needed, since the N values simply run from 1 to the number of pages that have been indexed, as do the id numbers of the records. But we have other search scripts that do need this db table reading to get the N field numbers into the $N array. They are here: articles CMS, website directory CMS, photo gallery CMS, forum CMS, and blog CMS. In these search scripts, users are allowed to delete any no-longer-needed records, so both id numbers and N field numbers often end up nonsequential and in wacky order. So getting each record's N field value turned out to be essential in these cases.
However, in the current circumstances, $number=mysql_result(mysql_query("SELECT COUNT(id) FROM sitepages"),0);$N=range(1,$number); works fine and is faster than db table reading. Likewise, $score=array_fill(0,$number,0); outpaces a for loop that puts the 0s in $score, which is the usual method.
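To see what those two calls produce, here is a small sketch with the database left out. The value 5 is an arbitrary stand-in for the COUNT(id) result, and $scoreLoop is a hypothetical name used only to show the for loop that array_fill() replaces.

```php
<?php
// $number stands in for the COUNT(id) result; 5 is an arbitrary example value.
$number = 5;

// Record numbers 1..$number, built without reading the table.
$N = range(1, $number);

// One zero score per page, indexed 0..$number-1 so it lines up with $N.
$score = array_fill(0, $number, 0);

// The usual for loop that array_fill() replaces.
$scoreLoop = array();
for ($i = 0; $i < $number; $i++) {
    $scoreLoop[$i] = 0;
}
```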
We begin the script section dealing with the "words" search form only. The $searchtermarray is now populated with each individual word found in the search string input, by use of the explode() function which seems built for such a task. The count() function gets the number of terms in the array and this gets put in the $terms variable.
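A short sketch of the splitting step, using a made-up search string. One thing worth knowing is that explode() splits on every single space, so a doubled space produces an empty term; the $filtered lines show one possible way to drop such terms and are our illustration, with hypothetical variable names.

```php
<?php
$S = 'free  website search'; // note the accidental double space

// explode() splits on every single space, so the double space yields an empty term.
$searchtermarray = explode(' ', $S);
$terms = count($searchtermarray);

// Dropping empty strings and re-indexing leaves only real words.
$filtered = array_values(array_filter($searchtermarray, 'strlen'));
$filteredCount = count($filtered);
```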
To use less memory and speed up the query, we used SELECT id rather than SELECT *:
$r=mysql_query("SELECT id FROM sitepages WHERE title LIKE '%" . $S . "%' OR content LIKE '%" . $S . "%' OR description LIKE '%" . $S . "%'") or die('Error ,search failed');
SELECT * is the usual way of using SELECT FROM WHERE LIKE statements in MySQL. The query above appears twice in the script, and its purpose is to see whether any pages have any of the search terms, so the script knows whether the search should be abandoned. It's important to notice the statement on the PHP mysql_query page that says that even if you never fetch anything via mysql_fetch_array(), a resultset is fetched anyway when you use mysql_query(). You might assume that if you did not use mysql_fetch_array() to fetch data (and we did NOT), no fetching would occur, so * would not waste time and memory. That is logical, but incorrect. We could use mysql_unbuffered_query() to send an SQL query to MySQL without fetching and buffering the result rows, but it still returns a resultset, so it wouldn't help. SELECT id is the best way to go, since it fetches the smallest resultset.
Next, if any page matches any of the search terms, we loop through the search terms one at a time, count how many matches each term has on each page (which requires a loop in a loop), and add that count to the $score array element corresponding to that page's record in the MySQL db table. We determine relevance by displaying the highest scored pages first and the lowest scored pages last, except that a score of 0 means we don't display that page at all in the search results.
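The loop-in-a-loop scoring can be tried without a database. In this sketch the $pages array is a hypothetical stand-in for rows of the sitepages table, and the count comes from the return value of preg_match_all() rather than from count($matches[0]) as in the script. Treat it as an illustration of the idea, not as the script's code.

```php
<?php
// Hypothetical stand-in for rows of the sitepages table.
$pages = array(
    array('title' => 'PHP search script', 'description' => 'A free search script',
          'content' => 'Search your site with PHP.'),
    array('title' => 'Contact', 'description' => 'How to reach us',
          'content' => 'Email or phone.'),
    array('title' => 'PHP tips', 'description' => 'Short PHP notes',
          'content' => 'Nothing about searching here... or is there? Search!'),
);
$searchtermarray = array('php', 'search');

$score = array_fill(0, count($pages), 0);
foreach ($searchtermarray as $term) {            // outer loop: one pass per search term
    $pattern = '/' . preg_quote($term, '/') . '/i';
    foreach ($pages as $k => $page) {            // inner loop: every indexed page
        $text = $page['title'] . ' ' . $page['description'] . ' ' . $page['content'];
        // preg_match_all() returns the number of matches it found.
        $score[$k] += preg_match_all($pattern, $text);
    }
}
```

The first page scores 5 (2 for php, 3 for search), the second scores 0 and would not be displayed, and the third scores 4.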
To optimize the code very slightly, you could delete the initial search loop (the one that results in either $gotmatch=1 or $gotmatch=0) and instead check whether the searches all resulted in a 0 score, giving the alert about the unsuccessful search if so. The way to do that is to use array_sum($score) just before array_multisort() and, if the result is 0, give the alert, remembering to use else so that either the alert OR the results are displayed. We tried this, but the speed increase for successful searches was under a second for a 348-page site.
Okay, you're asking yourself why we included this loop, which looks through the entire MySQL table for each search term, in the top half of the script code about words type searches. (The same thing happens in the bottom half of the script code about exact phrase searches, except there's no loop since there's only one term to match.) The answer is simple: it makes unsuccessful searches a whole lot faster. With the loop, it only takes a second to look for the terms and give you the "The search was unsuccessful." message, but it takes several seconds to get this message if you delete the query (displayed above) in both places. It's simply a bad idea to mess with the loop!
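For anyone who wants to experiment with the array_sum() variant anyway, the decision it makes can be sketched like this. The function name search_outcome() and its return strings are hypothetical; in the script itself the two branches would hold the alert and the display code.

```php
<?php
// Hypothetical helper: decides what to show once scoring has finished.
function search_outcome(array $score): string
{
    if (array_sum($score) == 0) {
        return 'alert';   // no page matched: show "The search was unsuccessful."
    }
    return 'results';     // at least one match: go on to array_multisort() and display
}

echo search_outcome(array(0, 3, 1)), "\n";
```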
Note that the PHP function array_multisort() is used on the $score and $N arrays: it sorts $score in descending order and reorders $N to match, ready for the page displaying operation. Without the multisort, the $N array (where the records' N values are) could not guide the search results into relevance order by way of SELECT * FROM sitepages WHERE N = '$N[$k]'.
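Here is a small, database-free look at what that array_multisort() call does. The scores are made-up example values for four records.

```php
<?php
$score = array(2, 0, 7, 2);   // example scores for records N = 1..4
$N     = array(1, 2, 3, 4);

// Sort $score descending; $N is reordered the same way, so each score
// keeps its record number. Equal scores are ordered by $N ascending.
array_multisort($score, SORT_DESC, SORT_NUMERIC, $N);
```

After the call, $score is 7, 2, 2, 0 and $N is 3, 1, 4, 2, so record 3 is displayed first and record 2 (score 0) is skipped.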
Next, the relevant data is pulled from the resultset via mysql_fetch_array(). Notice that displaying occurs only where the score is greater than 0. We loop through the search terms, highlighting any matched search terms as we go. Then the search result is displayed: the page url is used to create a link with the title as link text, and the description, if one was found in the site indexing process, is displayed in blue text. After a horizontal line that makes the display look better, it's time to display the page content.
To make each snippet start on a search term, we use a for loop to find a search term that is in the page content. In the loop comes $starter=stripos($content,$S); if ($starter){break;} which finds the string position of the first match of the search string ($S) in the page content string ($content) and, if it finds a match, breaks out of the loop. Next, if (!$starter){$starter=42;} gets $starter set for the next line in case the search comes up empty in the page content: with $starter at 42, the $starter-42 in that line comes to 0, so the snippet begins at the very start of the page content string.
Then echo "..."; $x=substr($content,$starter-42,350) prints an ellipsis and grabs a 350-character snippet beginning where the span tag in front of the matched search term is (the opening span tag in highlight() is 42 characters long). But because arbitrarily grabbing 350 characters may slice a span tag before its end tag is fully inside the snippet, we occasionally have to do some fancy footwork to make the snippet come out right. That is what the next few lines are for. The code strlen($S)+49 is the length of the search term plus the opening and closing span tags with the CSS styling in them (42 + 7 characters, from the highlight() function). If a <s starts close enough to the end of the snippet that any part of the SPAN-SEARCHTERM-SPAN concatenation falls beyond the end of the snippet, we get a broken tag (open span OR close span tag), OR just the open span tag and part of the search term. In that case the script cuts the snippet off at the <s and appends a complete highlighted search term in its place.
Left alone, a broken tag would make the lightblue background bleed into the next area, where the page url is displayed, giving that a lightblue background as well, so we added a span style='color:green;background-color:#ddd' around the url to fix any bleed. The display page background color is #ddd, so this works. Finally, since the search results show snippets rather than whole pages, another ellipsis is printed at the end of each snippet.
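The snippet logic is easier to follow when pulled out into a function. The sketch below is an illustration under stated assumptions, not the script's code: make_snippet(), SPAN_OPEN and SPAN_CLOSE are hypothetical names, the function returns the snippet instead of echoing it, and the two max(0, ...) guards are our addition to keep offsets from going negative.

```php
<?php
const SPAN_OPEN  = '<span style="background-color:lightblue;">'; // 42 characters
const SPAN_CLOSE = '</span>';                                    // 7 characters

// Hypothetical helper: $content is page text that has already been
// highlighted, $term is the search term, $length the snippet size.
function make_snippet(string $content, string $term, int $length = 350): string
{
    $starter = stripos($content, $term);
    if (!$starter) {
        $starter = strlen(SPAN_OPEN); // no match: the subtraction below gives 0
    }
    // Start at the opening span tag that sits in front of the matched term.
    $x = substr($content, max(0, $starter - strlen(SPAN_OPEN)), $length);

    // Look for a "<s" so close to the end that its span cannot be complete.
    $tail = strlen($term) + strlen(SPAN_OPEN) + strlen(SPAN_CLOSE);
    $from = max(0, strlen($x) - $tail + 1);
    $cut  = strripos($x, '<s', $from);
    if ($cut) {
        // Chop off the broken tag and append a complete highlighted term.
        $x = substr($x, 0, $cut) . SPAN_OPEN . $term . SPAN_CLOSE;
    }
    return '...' . $x . '...';
}

$page = 'Intro text here. ' . SPAN_OPEN . 'PHP' . SPAN_CLOSE . ' is fun.';
echo make_snippet($page, 'php'), "\n";
```

Passing a small $length, such as 70, to a text with two highlighted matches is a quick way to watch the repair step replace a sliced span tag.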
The rest of the code is for "phrase" searches (from a phrase search form, or from choosing phrase in a search form where you choose between phrase and words via radio buttons). The only real difference in the "phrase" section that starts after the }else{ is that there is no extra loop to search for the words one at a time on each page, in either the scoring section or the search results section. The scoring is the same, the multisort is the same, and the snippet display is the same.
Finally the info box in the left sidebar gets the following info in it: "No hyphens (-) or underscores (_) or Enter/Return allowed in search terms. Use letters, numbers, spaces and these: , . ? ! in searches. Click match exact phrase to match exact phrases only. Use search for words to search for 2 or more terms at once. Relevancy will determine the order of sitepages returned as search results—whatever sitepage has the most word matches will be displayed first as it's the most relevant to your search. Servers often have 30-sec. timeouts or memory limits so to avoid these, don't use search for words unless necessary, especially if there are a lot of pages indexed."
The average website is, statistically, 273 to 441 pages, say the best estimates, though in reality most websites are just a couple of pages introducing a person, product, item, or place. No site search needed for these sites! But if your site is average (273 to 441 pages), you DO need site search. Use the exact phrase search form for that many pages. One can often get away with 3- or 4-word searches using the words search form rather than the exact phrase search form on sites with this many pages, but it sometimes leads to 30-sec. timeouts or recoverable memory errors, depending on the current traffic level of the internet, how busy the server is, and the presence or absence of the search terms in the site's indexed pages.
Hopefully you've learned a bit from this tutorial and will grab this script and use it. To learn even more, once you've got it working, try a few experiments. Once these work, try the optimization where the first loop goes away and you use array_sum($score) just before array_multisort() and give the alert if the result is 0. If you happen to be a real brainiac, try optimizing it for speed even more than it already is. How? If we knew, WE would do it! We'd be real curious to see what you come up with. If you're like us, the main thing is that you love experimenting with PHP scripts and have fun tweaking and fine-tuning code to see what you can get PHP to do. So, regardless of whether you tweak code or have fun with the search script just the way it is, have a good time and hopefully learn a lot as well!
The script below is called search-website.php, so when you copy the code below, name the file search-website.php.
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=windows-1252">
<TITLE>Search Website</TITLE>
<meta name="description" content="Search Website">
<meta name="keywords" content="Search Website,Search my Website,Search a Website,Search site,php,CMS,javascript, dhtml, DHTML">
<style type="text/css">
BODY {margin-left:0; margin-right:0; margin-top:0;text-align:left;background-color:#ddd}
p, li {font:13px Verdana; color:black;text-align:left;text-indent:2em;margin-bottom:-1em}
td {padding:10px}
h1 {font:bold 28px Verdana; color:black;text-align:center}
h2 {font:bold 24px Verdana;text-align:center}
h3 {font:bold 15px Verdana;}
.textbox {position:absolute;top:50px;left:190px;width:772px;word-wrap:break-word;white-space:nowrap;overflow:hidden;text-overflow: ellipsis;}
.info {position:absolute;top:0px;left:2px;width:160px;background-color:#bbb;border:1px solid blue;padding:5px}
</style>
</head>
<body>
<h1>Search Website</h1>
<div class='textbox'>
<?php
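function highlight() {
global $x, $S;
if (strlen($x) < 1 || strlen($S) < 1) {return;}
//preg_quote() makes characters like . and ? in the search term match literally
preg_match_all("/".preg_quote($S,"/")."/i", $x, $matches);
if (is_array($matches[0]) && count($matches[0]) >= 1) {
//array_unique() stops a repeated match from being wrapped again inside its own span
foreach (array_unique($matches[0]) as $match) {
$x = str_replace($match, '<span style="background-color:lightblue;">'.$match.'</span>', $x);
}}}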
include_once"config.php";
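//isset() checks avoid undefined index notices when the page is opened without a form submission
$S=isset($_POST['search'])?$_POST['search']:null;$t=isset($_POST['searchtype'])?$_POST['searchtype']:'';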
if(!isset($S)){echo '<script language="javascript">alert("Enter search terms.");</script>';
}else{
if (strlen($S)<3 || strtolower($S)=="the") {echo '<script language="javascript">alert("Enter longer search terms.");</script>';
}else{
$S=strip_tags($S);
$pattern2 = '/[^a-zA-Z0-9\\s\\.\\,\\!\\?]/i';
$replacement = '';
$S=preg_replace($pattern2, $replacement, $S);
$S=mysql_real_escape_string($S);
$number=mysql_result(mysql_query("SELECT COUNT(id) FROM sitepages"),0);
$N=range(1,$number);
$score=array_fill(0,$number,0);
if($t=="words"){
$gotmatch=0;
$searchtermarray = array_values(array_filter(explode(" ", $S), 'strlen')); //drop empty terms left by doubled spaces
$terms=count($searchtermarray);
for($i=0;$i<$terms;$i++) {
$S=$searchtermarray[$i];
$r=mysql_query("SELECT id FROM sitepages WHERE title LIKE '%" . $S . "%' OR content LIKE '%" . $S . "%' OR description LIKE '%" . $S . "%'") or die('Error ,search failed');
$num_rows = mysql_num_rows($r);
if($num_rows>0){$gotmatch=1;}
}
if ($gotmatch==0){echo '<script language="javascript">alert("The search was unsuccessful.");</script>';$S='';
}else{
for($i=0;$i<$terms;$i++) {
$S=$searchtermarray[$i]; //do not need multi-term $S anymore--just the $searchtermarray[$i] values
for($k=0;$k<$number;$k++) {
$res = mysql_query("SELECT * FROM sitepages WHERE N='$N[$k]'") or die(mysql_error());
while($row = mysql_fetch_array($res)){
$x=$row['title']." ".$row['description']; $xxx=$row['content'];
}
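$pattern="/".preg_quote($S,"/")."/i"; //preg_quote() makes . and ? in the search term match literally
preg_match_all($pattern, $x, $matches); $m=count($matches[0]);
$score[$k]=$score[$k]+$m;
preg_match_all($pattern, $xxx, $matches); $m=count($matches[0]);
$score[$k]=$score[$k]+$m;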
}}
array_multisort($score,SORT_DESC,SORT_NUMERIC,$N);
echo '<table width="772" border="1">';
for($k=0;$k<$number;$k++) {
$res = mysql_query("SELECT * FROM sitepages WHERE N = '$N[$k]'") or die(mysql_error());
while($row = mysql_fetch_array($res)){
$tt=$row['title'];
$dc=$row['description'];
$cn=$row['content'];
$pu=$row['pageurl'];
if($score[$k]>0){
for($i=0;$i<$terms;$i++) {
$S=$searchtermarray[$i];
$S=stripslashes($S); //unescape quotes and backslashes so they're normal
$x=stripslashes($tt);highlight();$tt=$x; //highlight search terms
$x=stripslashes($dc);highlight();$dc=$x; //highlight search terms
$x=stripslashes($cn);highlight();$cn=$x; //highlight search terms
}
echo '<tr><td><b>';
echo "<a target='_blank' HREF='".$pu."'>".$tt."</a>";
echo "</b><BR><span style='color:blue;'>".$dc."</span>";
echo "<hr>";
$content=$cn;
for ($i=0;$i<$terms;$i++) {$S=stripslashes($searchtermarray[$i]);
$starter=stripos($content,$S);if ($starter){break;}}
if (!$starter){$starter=42;}
echo "..."; $x=substr($content,$starter-42,350);
$E=strlen($x);$L=strlen($S)+49;$B=$E-$L;
$Y=strripos($x,'<s',$B+1);if ($Y)
{$x=substr($x,0,$Y);$x=$x.'<span style="background-color:lightblue;">'.$S.'</span>';}
echo $x."...<BR><I><span style='color:green;background-color:#ddd'>".$siteurl."/".$pu."</span></I>";
echo "<br><br></td></tr>";
}}}
echo '</table>';
}
}else{
$gotmatch=0;
$r=mysql_query("SELECT id FROM sitepages WHERE title LIKE '%" . $S . "%' OR content LIKE '%" . $S . "%' OR description LIKE '%" . $S . "%'") or die('Error ,search failed');
$num_rows = mysql_num_rows($r);
if($num_rows>0){$gotmatch=1;}
if ($gotmatch==0){echo '<script language="javascript">alert("The search was unsuccessful.");</script>';$S='';
}else{
for($k=0;$k<$number;$k++) {
$res = mysql_query("SELECT * FROM sitepages WHERE N='$N[$k]'") or die(mysql_error());
while($row = mysql_fetch_array($res)){
$x=$row['title']." ".$row['description']; $xxx=$row['content'];
}
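$pattern="/".preg_quote($S,"/")."/i"; //preg_quote() makes . and ? in the search phrase match literally
preg_match_all($pattern, $x, $matches); $m=count($matches[0]);
$score[$k]=$score[$k]+$m;
preg_match_all($pattern, $xxx, $matches); $m=count($matches[0]);
$score[$k]=$score[$k]+$m;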
} //closes the scoring for loop only, so the display code below stays inside the else
array_multisort($score,SORT_DESC,SORT_NUMERIC,$N);
echo '<table width="772" border="1">';
for($k=0;$k<$number;$k++) {
$res = mysql_query("SELECT * FROM sitepages WHERE N = '$N[$k]'") or die(mysql_error());
while($row = mysql_fetch_array($res)){
$tt=$row['title'];
$dc=$row['description'];
$cn=$row['content'];
$pu=$row['pageurl'];
if($score[$k]>0){
$S=stripslashes($S); //unescape quotes and backslashes so they're normal
$x=stripslashes($tt);highlight();$tt=$x; //highlight search terms
$x=stripslashes($dc);highlight();$dc=$x; //highlight search terms
$x=stripslashes($cn);highlight();$cn=$x; //highlight search terms
echo '<tr><td><b>';
echo "<a target='_blank' HREF='".$pu."'>".$tt."</a>";
echo "</b><BR><span style='color:blue;'>".$dc."</span>";
echo "<hr>";
$content=$cn;
$starter=stripos($content,$S);if ($starter==FALSE){$starter=42;}
echo "..."; $x=substr($content,$starter-42,350);
$E=strlen($x);$L=strlen($S)+49;$B=$E-$L;
$Y=strripos($x,'<s',$B+1);if ($Y)
{$x=substr($x,0,$Y);$x=$x.'<span style="background-color:lightblue;">'.$S.'</span>';}
echo $x."...<BR><I><span style='color:green;background-color:#ddd'>".$siteurl."/".$pu."</span></I>";
echo "<br><br></td></tr>";
}}}
echo '</table>';
}}}} //closes the $gotmatch else, the phrase else, and the two input-check elses
mysql_close();
?>
</div>
<div id='info' class='info'>No hyphens (-) or underscores (_) or Enter/Return allowed in search terms. Use letters, numbers, spaces and these: <B> , . ? ! </b> in searches. Click "match exact phrase" to match exact phrases only. Use "search for words" to search for 2 or more terms at once. Relevancy will determine the order of sitepages returned as search results—whatever sitepage has the most word matches will be displayed first as it's the most relevant to your search.<br>Servers often have 30-sec. timeouts or memory limits so to avoid these, don't use "search for words" unless necessary, especially if there are a lot of pages indexed. <A HREF="javascript:history.go(-1)">GO BACK</A> </div>
</body>
</html>