Copyright © 2002 - MCS Investments, Inc.


Index Your Website and Use Our Site Search Forms—Free

Please read the terms of use and other info on how to use our indexer and site search forms to have an ad-free site search on your website. And it's free. The search forms are real, even though you cannot search with them on our pages because we require that you use them only on your own website. It's simple to index your site and install our search form on any web pages you wish. The search form code is on this page: search forms.

Site Search Scripts, Site Indexing Scripts, and MySQL

The bottom line is that to have a site search on your website, you first have to index your site. This means running our indexing script, which crawls your website, grabs most of the searchable text, and stores it in a MySQL database. No tags are stored. For head, style, script, object, embed, applet, noframes, noscript, and noembed tags, and for HTML comments, the content between the start tag and end tag is discarded as well. What is stored is the page URL, the contents of the head's title and meta description, and all the text found between the start and end tags of body tags, paragraph tags, heading tags like <H1>, etc.

If we had one of those site searches that stores the site indexes in their database for you, we would do what they do: support the database hosting on their servers by the use of ads. But our free website indexing and site search scripts support ad-free search results, so therefore we require that you store the indexed site in a MySQL database on your server. Our css-resources.com website currently has nearly 16 MB of files on it, and 3.6 MB of these files are of the HTML, HTM, and PHP types that get indexed, and the rest are images, etc. When indexed into a database, these pages take up 4.2 MB on the server. The reason they take up slightly more space than the pages themselves is that the hundreds of links in the sidebar are indexed for every single page since they are part of the page content. Like most savvy webmasters, we use a PHP include on every page so that any changes required on the left sidebar with the links, the right sidebar with the ads, or the header area with the logo image require only one page change. So we need only store this sidebar and header data on one single page, and the PHP include pulls it in to every page before it is sent from the server to the browser. So even though the text we grab from every page when our indexer is indexing is smaller in size than the actual page file size, for indexing we include the HTML contents of the PHP include file, which is fairly large. There's really no way to avoid sidebar content during indexing (no other available site search indexer avoids such content either), nor would we want to since it is legitimate page content.

We store the page URL, title, description and content of each page in MySQL database table fields called pageurl, title, description and content. The db table is called sitepages. Perhaps you are nervous about this MySQL stuff. If so, you can cease worrying: we've got your back! MySQL is the most popular open source database around and it's easy to deal with. First, the basics: All decent server hosts let you have at least one MySQL database and at least PHP 5.2 (if yours has PHP 4, get them to update; both PHP and MySQL are FREE for them so don't buy excuses). Note that the scripts on this page use the original mysql_ functions, which were deprecated in PHP 5.5 and removed in PHP 7.0, so on a newer PHP they must be ported to mysqli or PDO before they will run. You can put as many tables as you want in a database, subject to the amount of server space your host allots you. If you host your own db, so much the better. Regardless of who hosts, you simply go to your server control panel called cPanel, check that you are allotted at least one "SQL Database" in the sidebar (if not, contact the host ASAP), find the databases section, click on the MySQL Databases icon, run the video tutorial at the top of the next page, then create a db by simply naming it. If you see no evidence of a db, contact your host ASAP. If they have no db for you, lose this host ASAP and get a decent one! Anyway, after naming/creating your db, go lower on the page and assign at least one user and one password (make sure it is a strong one!). Select maximum permissions for each user name, save them, then, lower down the page, add these users to your newly named MySQL database.

Now you have to create your PHP configuration file called config.php. As you can see from that link, this is just a way to connect to your database by using your db name and a username and password. Once you FTP this file into your server in the public_html folder, with YOUR info in it, you are ready to use our scripts to index your site. The script on this page will create your table in your new db for you automatically once you Submit the full website URL (with its filename at the end, such as index.html). Submit it in the indexing script you save (and FTP) from the script in bold at the bottom of the page you are now on. If something goes wrong, you did not enter your info correctly in the config.php file. Note that if your db is called Lotsastuff, you don't enter that as the db name. On our server, we'd enter cssres_Lotsastuff. On yours, you can tell the right name by going back to the database creation screen and, at the top, look at the Create New Database area and you'll see an input box that says New Database: cssres_ just before the input box. Except yours will be New Database: whatever_ instead. So in your config.php file, call your databasename: whatever_Lotsastuff or whatever your db is called. And the username is cssres_yourusername, or for you, whatever_yourusername. Give the actual username, not "yourusername". But the password is just the password—don't add any prefix. Easy enough, right? By the way, the variable names and values we gave things in our configuration file have "psbhost" in them but you can call your string variables anything you want—like the script below—as long as your values are correct for YOUR db. You could use the following for your config.php file and do fine—just make sure you put YOUR info in place of the data in quotes but leave the PHP variable names alone:

<?php
$emailaddress = "emailaddress"; //EDIT ME (totally optional)
$roothostname = "localhost"; //LEAVE ALONE
$username = "username"; //EDIT ME
$password = "password"; //EDIT ME
$databasename = "databasename"; //EDIT ME
mysql_connect($roothostname, $username, $password) or die(mysql_error());
mysql_select_db($databasename) or die(mysql_error());
?>
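
For hosts running PHP 7 or later, where the mysql_ functions no longer exist, a connection file might look like the sketch below. It is only an illustration using the standard mysqli extension: the variable $link is a name chosen here, not something the indexing script knows about, and the mysql_query() calls in the indexing script would need the same kind of conversion before this file could replace config.php.

```php
<?php
// Sketch only: a mysqli-based equivalent of config.php (assumes PHP 7+ with mysqli enabled).
$roothostname = "localhost";    // LEAVE ALONE
$username     = "username";     // EDIT ME (include your cPanel prefix)
$password     = "password";     // EDIT ME (no prefix)
$databasename = "databasename"; // EDIT ME (include your cPanel prefix)

// mysqli_connect() selects the database in the same call.
// $link is a hypothetical variable name; later queries would pass it to mysqli_query().
$link = mysqli_connect($roothostname, $username, $password, $databasename);
if (!$link) {
    die(mysqli_connect_error());
}
?>
```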


We suggest guidance from this site and/or this site for using phpMyAdmin.

To sum up, the db needs creation, as do the username and password, and you need to have a correct config.php file to connect you to the db by naming the db, username, and password. But it is the indexing script itself that will create a table in your db and put the data in it. You will find that a MySQL db multiplies exponentially the capabilities of PHP scripts, and PHP and MySQL are both FREE and relatively easy to interact with. Once you have a table created by indexing your site, check it out by clicking the phpMyAdmin icon in the Databases area of your cPanel. Is that cool or what?

The Site Indexing Script

The script starts with an error reporting function. So error_reporting(E_ERROR) lets you know about the nature of any fatal run-time errors. Errors that can not be recovered from are called "fatal." Execution of the script is halted. Examples: Server time-outs because script execution time limit is usually 30 seconds on hosted servers (unless you are the host), and out-of-memory errors while running a script. Both of these can happen in this script if network connections are weak or you try to index too big a website. We've indexed sites of 789 pages with no problem with this script, but when network connections are weak, we can have a problem with a smaller site. The lesson is to do no indexing during peak network use or peak server use times. This script makes extensive use of the file_get_contents() function, which actually goes out to web pages on the Internet and reads their contents, processes them, and stores them in a MySQL database table. While crawling, the script finds links to other pages and crawls those too. All links are stored in an array and as new links are found, the script searches the array to make sure it's not already in the list. Each page gets extensive processing. You can see why commercial web crawlers host their own servers that have no time limits, since doing all the above in under 30 seconds is asking a lot!

After the error reporter and the including of config.php, an HTML form is echoed to the screen if the user has not yet submitted a URL. If s/he has submitted, the URL is processed. The form action is to reload the page (index-site.php) while POSTing the URL to the PHP script.

If a URL was submitted, it is checked to see that it ends in php, html, or htm. If not, the user did not read the instruction that said to include the file name of the home page as well as the rest of the URL. The script jumps all the way to the bottom where an alert is given and an example of doing it right is shown. Then the page reloads.
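
The extension check can be sketched as a small function. The name has_indexable_extension() is made up for this illustration, and unlike the substr() comparisons in the script it insists on the dot before html and ignores letter case.

```php
<?php
// Hypothetical helper: true when the URL ends in .php, .html, or .htm.
function has_indexable_extension($url) {
    // Anchored at the end of the string; the "i" flag makes it case-insensitive.
    return preg_match('/\.(php|html|htm)$/i', trim($url)) === 1;
}

var_dump(has_indexable_extension('http://www.yoursite.com/index.html'));
var_dump(has_indexable_extension('http://www.yoursite.com/'));
```

The first call accepts the URL because it names the home page file; the second rejects it because the file name is missing.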

If the home page file name was correctly included in the URL, we start getting serious. The URL is parsed with the parse_url() function. Called with the PHP_URL_PATH argument, it returns just the path, which is the /filename in our case. The leading / is removed so we have just index.html (or whatever) in the $home variable. Next we dump the filename from the URL the user input so we have just the site URL in $f without the filename. If there's a / at the end of $f, we dump it. Note that we also dump tags in the input using strip_tags(), trim any spaces the user input before or after the URL, and change any spaces inside the URL to %20. Otherwise the page will screw up, since URLs with spaces mess up PHP functions.
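
Here is a minimal sketch of that splitting step. The function name split_home_url() is invented for the example, and it assumes the submitted URL has no query string or anchor after the file name.

```php
<?php
// Hypothetical helper: split a submitted home page URL into
// array(site URL without trailing slash, home page file name).
function split_home_url($input) {
    $input = trim(strip_tags($input));          // drop tags and outer spaces
    $input = str_replace(' ', '%20', $input);   // no raw spaces inside the URL
    $path  = parse_url($input, PHP_URL_PATH);   // e.g. "/index.html"
    if (!is_string($path) || $path === '') {
        return array(rtrim($input, '/'), '');   // no file name was given
    }
    $home = ltrim($path, '/');                   // "index.html"
    // Cut the path off the end of the input, then drop any trailing slash.
    $site = rtrim(substr($input, 0, strlen($input) - strlen($path)), '/');
    return array($site, $home);
}

list($site, $home) = split_home_url(' http://www.yoursite.com/index.html ');
echo $site, "\n", $home, "\n";
```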

Next we dump the sitepages table if it exists, then recreate it. The content field is of the mediumtext type, which allows over 16 million bytes. Some pages are long!

Now we add / to the site URL and put it in $g as a part to build URLs with, since the file_get_contents() function cannot work with relative addresses. It needs absolutes. We echo the URL of the site to the browser page. Then we use file_get_contents() on the home page and get all its info.

We use XPATH to evaluate the home page. (There are a few places on the Net to learn about $xpath = new DOMXPath($dom).) In this case, we get all the link URLs, trim off outer spaces, and replace inner spaces with %20 so the URL has no holes that prevent it from working correctly in PHP functions. We use strrpos(), which finds the last position at which one string occurs inside another, to dump anchors (#whatever) and query strings (?whatever). Next, relative path syntax such as ../ and ./ is dumped. If the URL starts with / that's dumped too, but / inside the path is left. If the site URL with / at the end is found in the URL (i.e., an absolute URL), it is dumped since we want only relative URLs.
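
The cleanup steps for one link can be sketched as a single function. The name normalize_link() and its parameters are assumptions made for this example. It also differs from the script in two small ways: it cuts at the first # or ? (using strpos() rather than strrpos()) and it strips every leading slash rather than just one.

```php
<?php
// Hypothetical helper: reduce one href to the relative form kept in the URL array.
// $siteRoot is the site URL with a trailing slash, like $g in the script.
function normalize_link($href, $siteRoot) {
    $url = str_replace(' ', '%20', trim($href));     // outer spaces off, inner spaces encoded
    foreach (array('#', '?') as $mark) {              // drop anchors and query strings
        $pos = strpos($url, $mark);
        if ($pos !== false) {
            $url = substr($url, 0, $pos);
        }
    }
    $url = str_replace(array('../', './'), '', $url); // drop relative path syntax
    $url = ltrim($url, '/');                          // drop leading slashes
    return str_replace($siteRoot, '', $url);          // absolute on-site URL becomes relative
}

$root = 'http://www.yoursite.com/';
echo normalize_link(' ../folder/page one.html#top ', $root), "\n";
echo normalize_link('http://www.yoursite.com/a.htm?x=1', $root), "\n";
```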

Then the script dumps offsite, home page or wrong extension links. Finally, if the URL made it this far and is 5 or more characters in length (minimum allowable relative URL: a.htm), it is added to the $a array. Once the home page links are all in the array, $a=array_keys(array_flip($a)) dumps duplicate array values and fills the holes that are left. We use it instead of array_unique(), which we found to be buggy and about 10 times slower. Why does the above work? The PHP manual says of array_flip: "Exchanges all keys with their associated values in an array . . . If a value has several occurrences, the latest key will be used as its value, and all others will be lost." And of array_keys: "Return all the keys of an array". Flipping turns each URL into a key, and since keys are unique the duplicates collapse. array_keys() then returns those unique URLs with fresh consecutive numeric indexes. Finally, the count() function puts the number of array elements into the $r variable.
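
A small worked example of the flip-and-keys trick, using made-up file names. Note that array_flip() only accepts integer and string values, which is fine here because the array holds URL strings.

```php
<?php
// Sample data: the same link found twice on a page.
$a = array('a.htm', 'b.htm', 'a.htm', 'c.htm');

// Values become keys; the second 'a.htm' overwrites the first, so duplicates collapse.
$flipped = array_flip($a);

// The keys come back as values, renumbered from 0 with no holes.
$a = array_keys($flipped);

print_r($a);
```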

Control now jumps over several functions to the line starting with $z=$home, and the link URLs found begin to be printed out with numbers in front of them, counting the pages getting indexed. The functions grab_title_description() and make_html_searchable() are run now and the processed data is inserted into the sitepages db table. Then the stream context is created so that one slow-to-load page cannot stall the file_get_contents() function and the whole run. Before we added stream context a tiny fraction of pages would be bypassed due to remote server busyness. If the remote server hangs or fails to respond, you need to try to program your way around it. Like this PHP experts site says: "Most HTTP requests complete in sub-second time," so a stream context timeout is not usually needed, but when it is, a timeout of 3 seconds really comes to the rescue during a remote server falter.
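
As a rough sketch of the timeout setup, the context can be built once and handed to each fetch. The wrapper name make_fetch_context() is hypothetical; stream_context_create() and stream_context_get_options() are standard PHP functions. The fetch itself is left as a comment so the example runs without a network connection.

```php
<?php
// Hypothetical wrapper: build an HTTP stream context with a read timeout in seconds.
function make_fetch_context($seconds) {
    return stream_context_create(array('http' => array('timeout' => $seconds)));
}

$context = make_fetch_context(3);

// file_get_contents() returns false on failure, so a caller can skip that page:
// $html = @file_get_contents('http://www.yoursite.com/index.html', false, $context);

// Read the option back to confirm what the context holds.
$opts = stream_context_get_options($context);
echo $opts['http']['timeout'], "\n";
```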

Now we run a while loop. The PHP variable $o is the array element number we're now on. We've already processed the home page with functions we'll discuss below. But now we need to loop through all the URLs in the array, processing one page at a time. While on these pages, if more URLs are found not yet in the array, they are added to the end of the array. URLs are echoed to the screen as they are processed. Once the site is indexed, a message telling how many pages were indexed is given in an alert. Now, let's look at the processing functions:
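
The way the loop keeps going as the array grows can be shown without any network access. In the sketch below the $links_on table is invented sample data standing in for "fetch this page and extract its links", and the variable names are not the script's.

```php
<?php
// Made-up site map: page => links found on that page.
$links_on = array(
    'index.html' => array('a.htm', 'b.htm'),
    'a.htm'      => array('b.htm', 'c.htm'),
    'b.htm'      => array('a.htm'),
    'c.htm'      => array(),
);

$home    = 'index.html';
$queue   = $links_on[$home];   // links found on the home page
$visited = array($home);       // the home page is processed first
$i = 0;

// count($queue) is re-evaluated on every pass, so URLs appended
// during the loop are visited too.
while ($i < count($queue)) {
    $page = $queue[$i++];
    $visited[] = $page;
    foreach ($links_on[$page] as $url) {
        if ($url !== $home && array_search($url, $queue) === false) {
            $queue[] = $url;   // new URL: add it to the end of the array
        }
    }
}

print_r($visited);
```

Page c.htm is linked only from a.htm, yet it still gets visited because it was appended while the loop was running.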

The function add_urls_to_array() adds any new URLs found during site crawling to the main URLs array $a. In the first 3 lines, we find folders and subfolders in the URL and put everything but the file name into $folder. When we run the PHP function file_get_contents(), we use the stream context discussed above to make sure we deal with remote server falters or weak network conditions. The filename parameter (besides stream context) in the function file_get_contents() is $g (e.g., http://www.yoursite.com/) concatenated to $z (e.g., folder/subfolder/file.html). We once again use XPATH to get any links on the page and near the end of this function we use the PHP function array_search() to see if the URL is already in the array, and if not, we add it. If the URL starts with http or ./ or ../, we zero the "folder prepending flag" named $sf. Otherwise, we concatenate $folder and $url to get the correct url for the array (e.g., folder/subfolder/file.html). We replace $g in the url (if we find it) with an empty string—only relative addresses are allowed in our array. If we find "http" in the url after all this, we dump it—it's not part of this website. Incidentally, dumping URL filenames like home, placeholder, and default (and a few others) is needed because we've already processed the home page, and these names are all acceptable home pages on websites but we're only recognizing the one we started with—the one the user was asked to put at the end of the site URL, in the HTML form.
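
The folder-prepending rule can be sketched on its own. The function name prepend_folder() is made up for this example, and it covers only the rule described above: links starting with http, ./ or ../ are left alone, and everything else gets the current page's folder in front unless it already starts with it.

```php
<?php
// Hypothetical helper: $page is the current page's relative path (like $z),
// $href is a link found on that page.
function prepend_folder($page, $href) {
    $slash  = strrpos($page, '/');
    $folder = ($slash === false) ? '' : substr($page, 0, $slash + 1);
    // Absolute URLs and ./ or ../ links do not get the folder prepended.
    if (preg_match('#^(http|\.\.?/)#', $href)) {
        return $href;
    }
    // Already starts with the folder: leave it so the folder is not doubled.
    if ($folder !== '' && strpos($href, $folder) === 0) {
        return $href;
    }
    return $folder . $href;
}

echo prepend_folder('folder/subfolder/file.html', 'other.html'), "\n";
echo prepend_folder('index.html', 'a.htm'), "\n";
```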

The function grab_title_description() finds and processes the title and description from the page. As the PHP manual says of preg_match(), "$matches[1] will have the text that matched the first captured parenthesized subpattern." The regular expression ([^>]*) means "whatever is between the codes around me", in this case the title tags. The ^> means any character other than a greater-than symbol and the * means zero or more of these. So, the title grabbing works. Next we use the PHP function get_meta_tags(), since meta tags are where the description lives. The function returns an associative array and the key "description" gets the correct array value. If there's any encoding, we decode it.
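
A minimal sketch of the two lookups, run against a sample HTML string rather than a live page. The helper name extract_title() and the sample markup are assumptions for the example. Because get_meta_tags() reads from a file or URL, the sketch writes the sample to a temporary file first.

```php
<?php
// Hypothetical helper: pull the <title> text out of an HTML string.
function extract_title($html) {
    // ([^>]*) captures the run of non-">" characters that precedes </title>.
    if (preg_match('/<title>([^>]*)<\/title>/si', $html, $m)) {
        return trim($m[1]);
    }
    return ''; // no title tag found
}

$html = '<html><head><TITLE>Index Website</TITLE>'
      . '<meta name="description" content="Index your site"></head><body></body></html>';

// Write the sample page to a temp file so get_meta_tags() can read it.
$tmp = tempnam(sys_get_temp_dir(), 'idx');
file_put_contents($tmp, $html);
$tags = get_meta_tags($tmp);
unlink($tmp);

$title       = extract_title($html);
$description = isset($tags['description']) ? $tags['description'] : '';
echo $title, "\n", $description, "\n";
```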

The function make_html_searchable() processes the page content. It dumps the head, style, script, object, embed, applet, noframes, noscript, noembed and comment tags and everything in between them. The &nbsp; (nonbreaking space) tokens are replaced with spaces. Any other tags, like P, div, font and span, are dumped but what's between them is not touched, obviously. The [^>]*? in the open tag regular expression allows this generic tag pattern to succeed even if there are attributes in the tag, like class='h', for instance. This is critical since many webmasters put such attributes all over the place. In our tests, trying to catch only bare tags, without allowing for attributes, caused TONS OF PAGE TEXT TO SIMPLY VANISH from a page's content string while it was getting indexed, once the strip_tags() function coming up in a couple of lines had run. (So if a generic opening tag pattern without [^>]*? misses one or more tags, it will be because it's not a simple <P> tag, but instead a <P id='main'> tag, one with attributes.) In other words, if we did the opening tag the way we did the closing tag (which takes no attributes), we would risk losing most of the page content on many website pages, simply due to the oversight about tag attributes. (In the replacement string "\\0 ", the \\0 stands for the whole matched tag and is followed by a space, so each tag is put back with a space after it. When strip_tags() then removes the tag, the space stays behind.)
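
The difference the [^>]*? makes is easy to see by counting matches on a small sample. The markup below is invented for the illustration.

```php
<?php
$html = "<P id='main'>dog.</P><P>Dew</P>";

// Without [^>]*? only the bare <P> is matched; the tag with an attribute is missed.
$bare  = preg_match_all('/<[A-Za-z]+>/', $html, $m1);

// With [^>]*? both opening tags are matched.
$loose = preg_match_all('/<[A-Za-z]+[^>]*?>/', $html, $m2);

// "\\0 " re-emits each matched tag followed by a space.
$spaced = preg_replace('/<[A-Za-z]+[^>]*?>/', '\\0 ', $html);

echo $bare, " ", $loose, "\n", $spaced, "\n";
```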

The above generic opening tag dumper was tested on sites with and without [^>]*?, and, sure enough, without [^>]*?, a lot of page content was missing (in the MySQL database sitepages table) that had been nestled between paragraph tags with attributes. But that's not the only story we have for you.

It's hard to believe it, but ONE SINGLE MISSING CHARACTER ON ONE PAGE can make the difference between ALL page content showing up and NO page content showing up—on ALL of a site's pages! True story. We tested a website and the indexing was a sick joke. We investigated why. It turned out to be that a missing character in one file created this entire disaster. It's good to run your pages through an HTML validator, since we hadn't noticed that an ending span tag was coded as </span instead of </span>. The good news is that the browsers were forgiving and merciful and let it slide, and everything displayed as it should regardless of our goof. The bad news is that our indexer was neither forgiving nor merciful. Our generic closing tag dumper expects a complete tag—if that's a problem, use the validator: it caught the incomplete tag. So what happened when the indexer indexed is that it left in the </span, so one would think that the strip_tags() function on the next line would have ignored it. It's not a real tag if it's incomplete, right? Wrong. It DID NOT ignore it. That's the good news—sort of. Unfortunately that is also the bad news. The function looked for any tags missed by our bunch of tag dumpers and dealt with them harshly. It treated the partial tag as one of a set and looked for a second span tag. It didn't find it, so it defaulted to the end of the page! strip_tags() says: "Because strip_tags() does not actually validate the HTML, partial, or broken tags can result in the removal of more text/data than expected." (Ya think?!) The end result is that it removed the page content after the broken tag, which was nearly everything. So why would this goofy tag dump the page content of 240 pages? Because it is a PHP included file, included with code like this: <?php include("important-links.html"); ?>. This code is on every page of that site. 
So the page with the broken tag (which we fixed once we found it, solving the whole problem) was part of the code for all pages—that's how includes work. So the strip_tags() function dumped all page content of all pages, except for the teeny bit of stuff before the broken tag. The moral of the story: always validate your include files.
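
The broken-tag behavior can be reproduced in a few lines. The sample strings are invented. In the second one the closing span tag is missing its final >, just like the include file in the story.

```php
<?php
$good   = 'Intro <span>links</span> more words here';
$broken = 'Intro <span>links</span more words here';  // "</span" never closes

$from_good   = strip_tags($good);
$from_broken = strip_tags($broken);

echo $from_good, "\n";
echo $from_broken, "\n";
```

With the well-formed string all the text survives. With the broken one, strip_tags() treats everything from </span to the end of the string as one unfinished tag and drops it.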

Our strategy to avoid bad search results, besides making good, well-tested, site search and site indexing scripts, was to let the tag dumpers above help avoid unintentional meanings by replacing tags with spaces. Had we not done this, the last word from one paragraph could get concatenated with the first word of the next, and a search would display weird results and/or find weird results. If a paragraph ends with dog. and the next one starts with Dew, searchers would find neither the lone word dog. nor the lone word Dew, but only the word dog.Dew. So it's important to keep words apart. If one paragraph ends with: then the sales fell off., and the next paragraph starts with: The cliff was high but we climbed it. Then: fell off.The cliff would illustrate why paragraphs need spaces between them.
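
Here is a short sketch of the dog.Dew problem, using made-up paragraphs. It applies the same two spacing patterns the script uses before calling strip_tags().

```php
<?php
$html = '<p>He walked the dog.</p><p>Dew covered the grass.</p>';

// Stripping tags directly glues the two paragraphs together.
$joined = strip_tags($html);

// Putting a space after every opening and closing tag first keeps the words apart.
$spaced = preg_replace('/<[A-Za-z]+[^>]*?>/', '\\0 ', $html);
$spaced = preg_replace('/<\/[A-Za-z]+>/', '\\0 ', $spaced);
$spaced = trim(strip_tags($spaced));

echo $joined, "\n", $spaced, "\n";
```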

Finally, we replace carriage returns and newlines with spaces as well as trimming the string, which knocks out spaces, NULLs, and tabs before and after the string.

Our indexer will go at least 3 levels deep if you use folder/subfolder/file.html or ../folder/subfolder/file.html syntax rather than ../folder/../subfolder/file.htm syntax, which confuses the script so you get fewer pages indexed than you wanted. We don't mess with https, pdf, excel, powerpoint, word, text, doc, asp, aspx, xml, xhtml, images, or other file types, or mess with nofollow or noindex attributes, etc. We do not deal with robots, ports, sockets, sessions, keywords, encrypted or hashed or passworded files, or links not on the submitted domain. We just index .html, .htm, and .php file extensions on your domain—period. (The .shtml extension may also work—we haven't tried it. An SHTML html document contains "Server Side Includes" that the server processes before the page gets sent to the browser. And it depends on settings on your particular server.)

The script below is called index-site.php, so when you copy the code below,
name the file index-site.php.

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=windows-1252">
<TITLE>Index Website</TITLE>
<meta name="description" content="Index Website">
<meta name="keywords" content="Index Website,Index a Website,php,CMS,javascript, dhtml, DHTML">
<style type="text/css">
BODY {margin-left:0; margin-right:0; margin-top:0;text-align:left;background-color:#ddd}
p, li {font:13px Verdana; color:black;text-align:left}
h1 {font:bold 28px Verdana; color:black;text-align:center}
h2 {font:bold 24px Verdana;text-align:center}
td {font:normal 13px Verdana;text-align:center;background-color:#ccc}
.topic {text-align:left;background-color:#fff}
.center {text-align:center;}
</style>
</head>
<body>
<?php
error_reporting(E_ERROR);
include_once"config.php";

$f=isset($_POST['siteurl']) ? $_POST['siteurl'] : null; // avoids an undefined-index notice on first load
if (!isset($f)){
echo '<div id="pw" style="position:absolute;top:150px;left:50px;width:950px"><table style="background-color:#8aa;border-color:#00f" border="6" cellspacing=0 cellpadding=6><tr><td><form id="formurl" name="formurl" method="post" action="index-site.php"><b>home page URL (must include /index.html, /index.htm, /index.php or whatever the home page filename is)</b><BR><label for="URL">URL: <input type="text" id="URL" name="siteurl" size="66" maxlength="99" value=""></label><br><br><input type="submit" value="Submit URL"><br><br><input type="reset" value="Reset"></form></td></tr></table></div>';

}else{

if (substr($f,-4)==".htm" || substr($f,-4)=="html" || substr($f,-4)==".php"){
$e=(parse_url($f,PHP_URL_PATH));
if (substr($e,0,1)=="/"){$LLLL=strlen($e);$home=substr($e,1,$LLLL-1);}

$f=strip_tags($f);$f=str_replace($e, "", $f);
$L=strlen($f);if (substr($f,-1)=="/"){$f=substr($f,0,$L-1);}
$f = str_replace(" ", "%20", $f); $f=trim($f);

$sql = "DROP TABLE IF EXISTS sitepages";
mysql_query($sql);

$sql = "CREATE TABLE sitepages (
id int(4) NOT NULL auto_increment,
N int(4) NOT NULL default '0',
pageurl varchar(255) NOT NULL default '',
title varchar(255) NOT NULL default '',
description varchar(255) NOT NULL default '',
content mediumtext NOT NULL,
PRIMARY KEY (id)
) ENGINE=MyISAM AUTO_INCREMENT=1";
mysql_query($sql);

// "mediumtext" allows over 16 million bytes

$a=array();$n=0;$o=-1;$g=$f."/"; echo "<B>".$f."</B><BR>";
$t = file_get_contents($g.$home);
$dom = new DOMDocument();
@$dom->loadHTML($t);
$xpath = new DOMXPath($dom);
$hrefs = $xpath->evaluate("/html/body//a");
for ($i = 0; $i < $hrefs->length; $i++) {
$href = $hrefs->item($i);
$url = $href->getAttribute('href');
$url = trim($url);
$url = str_replace(" ", "%20", $url);
$w=strrpos($url,"#");if ($w){$url=substr($url,0,$w);}
$w=strrpos($url,"?");if ($w){$url=substr($url,0,$w);}
$url = str_replace("../", "", $url);
$url = str_replace("./", "", $url);
if (substr($url,0,1)=="/"){$LL=strlen($url);$url=substr($url,1,$LL-1);}
$ok="0";$url=str_replace($g, "", $url);$L=strlen($url);
if ((substr($url,0,4)<>"http" && substr($url,0,6)<>"index." && substr($url,0,8)<>"default." && substr($url,0,5)<>"home." && substr($url,0,6)<>"Index." && substr($url,0,8)<>"Default." && substr($url,0,5)<>"Home." && substr($url,0,12)<>"placeholder.") && (substr($url,-4)==".htm" || substr($url,-4)=="html" || substr($url,-4)==".php")){$ok="1";} //dumps offsite, home page or wrong extension links
if($L>4 && $ok=="1"){$a[$n]=$url;$n++;}}
$a=array_keys(array_flip($a)); //dump duplicate array values and fill the holes that are left; array_unique has BUG!
$r = count($a);

function grab_title_description(){
global $t; global $z; global $g; unset($m); global $j; global $d;
preg_match('/<title>([^>]*)<\/title>/si',$t,$m);
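$j=isset($m[1]) ? $m[1] : ''; // empty title when the page has no title tag
$b = get_meta_tags($g.$z);$d=isset($b['description']) ? $b['description'] : ''; // empty when there is no meta description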
if(mb_detect_encoding($d, 'UTF-8, ISO-8859-1', true) != 'ISO-8859-1'){$d = utf8_decode($d);}
$d = strtr($d, get_html_translation_table(HTML_ENTITIES));
}

function make_html_searchable(){
global $t;
$pp=array('/<head[^>]*?>.*?<\/head>/si',
'/<style[^>]*?>.*?<\/style>/si',
'/<script[^>]*?>.*?<\/script>/si',
'/<object[^>]*?>.*?<\/object>/si',
'/<embed[^>]*?>.*?<\/embed>/si',
'/<applet[^>]*?>.*?<\/applet>/si',
'/<noframes[^>]*?>.*?<\/noframes>/si',
'/<noscript[^>]*?>.*?<\/noscript>/si',
'/<noembed[^>]*?>.*?<\/noembed>/si',
'/<!--.*?-->/si');
$t = preg_replace($pp,'',$t);
$t = preg_replace('/&nbsp;/si',' ',$t);
$t = preg_replace("/<[A-Za-z]+[^>]*?>/i", "\\0 ", $t);
$t = preg_replace("/<\/[A-Za-z]+>/", "\\0 ", $t);
$t=strip_tags($t);
$t=preg_replace('/[\r\n]+/', ' ', trim($t)); // carriage returns and newlines, alone or paired, become one space
}

function add_urls_to_array(){
global $a; global $g; global $z; global $t; global $r; global $context; $n=$r; $folder=""; // $context must be global or the timeout is never applied
$fo=strrpos($z,"/"); if ($fo){$folder=substr($z,0,$fo+1);}
$LLL=strlen($folder);
$t = file_get_contents($g.$z,false,$context);
$dom = new DOMDocument();
@$dom->loadHTML($t);
$xpath = new DOMXPath($dom);
$hrefs = $xpath->evaluate("/html/body//a");
for ($i = 0; $i < $hrefs->length; $i++) {
$href = $hrefs->item($i);
$url = $href->getAttribute('href');
$url = trim($url);
$url = str_replace(" ", "%20", $url);
if (substr($url,0,4)=="http"){$sf="0";}else{$sf="1";}
if (substr($url,0,3)=="../" || substr($url,0,2)=="./"){$sf="0";}
$url = str_replace("../", "", $url);
$url = str_replace("./", "", $url);
if (substr($url,0,1)=="/"){$LL=strlen($url);$url=substr($url,1,$LL-1);}
if (substr($url,0,4)<>"http" && substr($url,0,$LLL)<>$folder && $sf=="1"){$url=$folder.$url;}
$w=strrpos($url,"#");if ($w){$url=substr($url,0,$w);}
$w=strrpos($url,"?");if ($w){$url=substr($url,0,$w);}
$ok="0";$url=str_replace($g, "", $url);$L=strlen($url);
if ((substr($url,0,4)<>"http" && substr($url,0,6)<>"index." && substr($url,0,8)<>"default." && substr($url,0,5)<>"home." && substr($url,0,6)<>"Index." && substr($url,0,8)<>"Default." && substr($url,0,5)<>"Home." && substr($url,0,12)<>"placeholder.") && (substr($url,-4)==".htm" || substr($url,-4)=="html" || substr($url,-4)==".php")){$ok="1";} //dumps offsite, home page or wrong extension links
$q=array_search($url,$a);if ($L>4 && $ok=="1" && $q===false){$a[$n]=$url;$n++;}}
$r = count($a);
}

$z=$home;$NN=1;echo $NN." ".$z."<BR>";
grab_title_description();
make_html_searchable();
$d=mysql_real_escape_string($d);
$j=mysql_real_escape_string($j);
$t=mysql_real_escape_string($t);
$z=mysql_real_escape_string($z);
$sql="INSERT INTO sitepages(N, pageurl, title, description, content)VALUES('$NN', '$z', '$j', '$d', '$t')"; // id is auto_increment, so it is left out
$result=mysql_query($sql);

$context = stream_context_create(array('http' => array('timeout' => 3))); // Timeout in seconds
$z="";

while ($o<$r-1){
$o++; $z=$a[$o];$NN=$o+2;echo $NN." ".$z."<BR>";
add_urls_to_array();
grab_title_description();
make_html_searchable();
$d=mysql_real_escape_string($d);
$j=mysql_real_escape_string($j);
$t=mysql_real_escape_string($t);
$z=mysql_real_escape_string($z);
$sql="INSERT INTO sitepages(N, pageurl, title, description, content)VALUES('$NN', '$z', '$j', '$d', '$t')"; // id is auto_increment, so it is left out
$result=mysql_query($sql);
}

mysql_close();
unset($f);$r=$r+1;
echo "DONE!";

echo '<script language="javascript">alert("'.$r.' pages were indexed. Press a key to submit another URL.");window.location="index-site.php"; </script>';

}else{

mysql_close();
unset($f);

echo '<script language="javascript">alert("Enter full URL with page filename, like this example:\n\nhttp://www.yoursitename/index.html\n\nPress a key to submit another URL.");window.location="index-site.php"; </script>';}

}

?>

</body>
</html>