Copyright © 2002 -
MCS Investments, Inc.

Get Links from Web Page

This script will get the links from a web page. It is limited to .html, .htm, .php, and .shtml extensions.

The DOM extension is enabled by default in most PHP installations, so the following should work fine; it does for us. The DOM extension lets you operate on XML documents through the DOM API in PHP 5, and it supports XPath 1.0. XPath has been around a while. What is it? XPath is a syntax for addressing parts of an XML document. It uses path expressions to navigate the document and includes a library of standard functions. It is a major element of XSLT and a W3C recommendation.
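As a minimal sketch of what a path expression does, the snippet below runs one XPath query against a small HTML string. The markup in $sample is made up for illustration, and the [@href] predicate (which skips anchors that have no href attribute) is our addition, not part of the script on this page.

```php
<?php
// Hypothetical sample markup, used only for illustration.
$sample = '<html><body><p><a href="a.html">A</a></p>'
        . '<div><a href="b.php#top">B</a></div>'
        . '<a name="x">no href here</a></body></html>';

libxml_use_internal_errors(true);   // keep parser warnings out of the output
$dom = new DOMDocument();
$dom->loadHTML($sample);
$xpath = new DOMXPath($dom);

// "/html/body//a[@href]" means: every <a> element at any depth under <body>
// that carries an href attribute.
$found = array();
foreach ($xpath->query('/html/body//a[@href]') as $node) {
    $found[] = $node->getAttribute('href');
}
print_r($found);
```

The third anchor is left out because it has no href, so only the two real links end up in $found.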

But you can use it to parse web pages as well, as the code below demonstrates. To make the code more useful, we processed the retrieved links.

In the script, we first define a URL to search. You can get this from an HTML form and a POST if you don't want to hardwire the web page address like we did below. Please use your own URL in place of http://www.css-resources.com. This will require a change to substr($url,0,28), since most URLs are not 28 characters long.

Next, the for loop looks up the href attribute of each link and sticks it into $url. We didn't want anchors in links, so we searched for # and dumped it and everything after it. We didn't want query strings in links, so we searched for ? and dumped it and everything after it. You may leave out these 2 lines if you want query strings and anchors.

Next, we made sure that if a link started with http, it was from the current domain, not another site. We also made sure that only .html, .htm, .php, and .shtml extensions were kept; links with other extensions are dumped. If you'd like more or fewer extensions, add or remove them in that test in the script. We also made sure the string length was 5 or more for each link.

We stuck all these links in an array, dumped duplicate array values, and filled the holes that were left; the $r variable holds the link count used by the output loop. We didn't use array_unique because it keeps the original keys, so the holes in the numbering would remain. The code $a=array_keys(array_flip($a)) works great.
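To show the duplicate-removal step on its own, here is a small sketch that compares array_unique with the array_flip trick. The $links list is made-up sample data, and the array_values variant is an alternative we are suggesting, not something the script on this page uses.

```php
<?php
// Made-up sample data with two duplicated links.
$links = array('a.html', 'b.php', 'a.html', 'c.htm', 'b.php');

// array_unique drops the duplicates but keeps the original keys,
// so the result is numbered 0, 1, 3 and has a hole at 2.
$unique = array_unique($links);

// array_flip turns the values into keys (duplicates collapse into one key),
// and array_keys turns them back into a list numbered from 0.
$flipped = array_keys(array_flip($links));

// Another way to get the same list: reindex the array_unique result.
$reindexed = array_values(array_unique($links));

print_r($unique);
print_r($flipped);
```

Note that array_flip only works when every value is a string or an integer, which is the case for a list of links.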

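<?php
$a = array(); $n = 0;
$f = 'http://www.css-resources.com'; // page to scan; use your own URL here
$html = file_get_contents($f);
if ($html === false) { die("Could not fetch $f"); }
$dom = new DOMDocument();
@$dom->loadHTML($html); // @ hides the warnings that sloppy real-world HTML triggers
$xpath = new DOMXPath($dom);
$hrefs = $xpath->evaluate("/html/body//a");
for ($i = 0; $i < $hrefs->length; $i++) {
  $href = $hrefs->item($i);
  $url = $href->getAttribute('href');
  // dump anchors (#...) and query strings (?...); strpos can return 0, so compare against false
  $w = strpos($url, "#"); if ($w !== false) { $url = substr($url, 0, $w); }
  $w = strpos($url, "?"); if ($w !== false) { $url = substr($url, 0, $w); }
  // keep links that are relative or start with $f (28 is the length of $f; change it to match your URL)
  // and that end in .htm, .html, .shtml, or .php
  $ok = "0";
  if ((substr($url, 0, 28) == $f || substr($url, 0, 4) <> "http") && (substr($url, -4) == ".htm" || substr($url, -4) == "html" || substr($url, -4) == ".php")) { $ok = "1"; }
  if (strlen($url) > 4 && $ok == "1") { $a[$n] = $url; $n++; }
}
$a = array_keys(array_flip($a)); // dump duplicate array values and renumber from 0; array_unique would keep the old keys
$r = count($a); // count after removing duplicates so the loop below never runs past the end
for ($i = 0; $i < $r; $i++) { echo $a[$i]; echo "<BR>"; }
?>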
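If you would rather not edit the hardwired 28 and the extension test by hand, one option is to move the link filter into a function. The sketch below is our own variation, not the script above: filter_link, its parameters, and the example URLs are all hypothetical names chosen for illustration. It uses strlen($base) instead of a fixed length and takes the allowed extensions as a list.

```php
<?php
// Hypothetical helper: returns the cleaned link, or null if the link should be dumped.
function filter_link($url, $base, $extensions = array('html', 'htm', 'php', 'shtml')) {
    // Cut the link at the first # (anchor) and the first ? (query string).
    foreach (array('#', '?') as $mark) {
        $pos = strpos($url, $mark);
        if ($pos !== false) { $url = substr($url, 0, $pos); }
    }
    // Same length rule as the script: 5 characters or more.
    if (strlen($url) < 5) { return null; }
    // Links that start with http must start with $base; relative links pass through.
    if (substr($url, 0, 4) === 'http' && substr($url, 0, strlen($base)) !== $base) {
        return null;
    }
    // Keep only links whose file extension is in the allowed list.
    $ext = strtolower(pathinfo($url, PATHINFO_EXTENSION));
    return in_array($ext, $extensions, true) ? $url : null;
}

$base = 'http://www.css-resources.com';
var_dump(filter_link('page.html#top', $base));
var_dump(filter_link('http://other.example/x.php', $base));
```

Like the original test, this only compares the start of the link with $base, so it is a simple prefix check rather than a full domain comparison. To allow more file types, pass your own list as the third argument.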