Note: An update to this script is available here: viewtopic.php?f=5&t=7334&p=78054#p78054
index.php:
Code:
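<?php
ini_set("display_errors", '1'); // handy while debugging; disable on a public server

// Read a GET parameter, or '' if it's missing. No addslashes() any more: the value is
// used as a URL (quotes would corrupt it), and get_magic_quotes_gpc() is gone in PHP 8.
function get_var_GET($varname) {
    return (isset($_GET[$varname]) && is_string($_GET[$varname])) ? $_GET[$varname] : '';
}

$url = get_var_GET('url');
if (substr($url, 0, 7) == 'http://') {
    // $domain = "http://host", $path = everything up to the last "/" (the page's directory).
    // If there is no "/" after the host, both are simply the URL itself.
    $slash = strpos($url, '/', 7);
    $domain = ($slash === false) ? $url : substr($url, 0, $slash);
    $path = ($slash === false) ? $url : substr($url, 0, strrpos($url, '/'));

    // Make a link found in the page absolute and route it back through this script.
    $proxify = function ($m) use ($domain, $path) {
        $link = $m[2];
        if (!preg_match('#^https?://#i', $link)) {
            $link = ($link[0] == '/') ? $domain . $link : $path . '/' . $link;
        }
        return $m[1] . 'index.php?url=' . urlencode($link) . $m[3];
    };

    $lines = file($url);
    if ($lines === false) {
        $lines = array(); // fetch failed; file() has already emitted a warning
    }
    foreach ($lines as $line) {
        // preg_replace() with the /e modifier was removed in PHP 7; use a callback instead.
        $line = preg_replace_callback('/(<a [^<>]* ?href ?= ?["\']?)([\w:\/.\?&;=%~-]+)(["\' >])/i', $proxify, $line);
        $line = preg_replace('/<img [^<>]*>/i', '', $line);
        $line = preg_replace_callback('/(<form [^<>]* ?action ?= ?["\']?)([\w:\/.\?&;=%~-]+)(["\' >])/i', $proxify, $line);
        echo $line;
    }
}
else {
    echo '<form method="get" action="index.php"><input name="url" type="text" value="http://"><input type="submit" value="Go"></form>';
}
?>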
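The heart of the proxy is turning each href/action value into an absolute URL and wrapping it in index.php?url=. Pulled out on its own, that step could look like the sketch below. `proxy_link()` is a hypothetical name; it uses the same base-URL rules as the script (root-relative links joined to the domain, everything else to the page's directory) plus leaving already-absolute http(s) links alone. It assumes `$base` starts with http://, and it does not handle `../` segments, query-only links like `?page=2` or protocol-relative `//host/...` links properly.

```php
<?php
// Hypothetical helper: resolve $link against the page URL $base (assumed to start with
// "http://") using the proxy's simple rules, then wrap it in index.php?url=...
function proxy_link(string $base, string $link): string {
    // Already-absolute links are kept as they are.
    if (!preg_match('#^https?://#i', $link)) {
        $slash  = strpos($base, '/', 7);                 // first "/" after "http://host"
        $domain = ($slash === false) ? $base : substr($base, 0, $slash);
        $path   = ($slash === false) ? $base : substr($base, 0, strrpos($base, '/'));
        // Root-relative links hang off the domain, everything else off the current directory.
        $link = (substr($link, 0, 1) === '/') ? $domain . $link : $path . '/' . $link;
    }
    return 'index.php?url=' . urlencode($link);
}

echo proxy_link('http://example.com/docs/page.html', 'other.html'), "\n";
// index.php?url=http%3A%2F%2Fexample.com%2Fdocs%2Fother.html
echo proxy_link('http://example.com/docs/page.html', '/img/logo.png'), "\n";
// index.php?url=http%3A%2F%2Fexample.com%2Fimg%2Flogo.png
```

urlencode() matters here: without it, a link's own `?` and `&` would be read as extra parameters of index.php instead of being part of the `url` value.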
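Note that it doesn't do any real input validation: apart from checking that it starts with http://, it never verifies that whatever you entered is actually a valid URL. Even scarier, it doesn't check whether the URL points back at its own server or local network (http://localhost/, http://127.0.0.1/, 192.168.x.x, ...), which could lead to information leaks: anything the web server only exposes to local clients (admin or status pages, stray configuration or password files) would be fetched and served to whoever asked.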
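A minimal sketch of the missing checks, assuming only plain http:// URLs are wanted, as in the script. `is_allowed_target()` is a hypothetical helper: it validates the URL with filter_var() and refuses hosts that resolve to loopback, private or reserved IPv4 addresses. It is not airtight: gethostbyname() only does IPv4 (IPv6 hosts are simply refused), file() resolves the name again on its own (so DNS tricks can still slip through), and the http:// wrapper follows redirects by default, so a public URL could still redirect to http://localhost/.

```php
<?php
// Hypothetical pre-flight check: accept only well-formed http:// URLs whose host
// resolves to a public IPv4 address.
function is_allowed_target(string $url): bool {
    // Must be a syntactically valid URL...
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }
    // ...using the http scheme, like the original script expects.
    if (strtolower((string) parse_url($url, PHP_URL_SCHEME)) !== 'http') {
        return false;
    }
    $host = (string) parse_url($url, PHP_URL_HOST);
    // gethostbyname() returns its input unchanged if it cannot resolve it (IPv4 only).
    $ip = gethostbyname($host);
    // Fails for non-IPs and for 10/8, 172.16/12, 192.168/16, 127/8, 0/8, 169.254/16, 240/4.
    return filter_var($ip, FILTER_VALIDATE_IP,
        FILTER_FLAG_IPV4 | FILTER_FLAG_NO_PRIV_RANGE | FILTER_FLAG_NO_RES_RANGE) !== false;
}
```

In the script it could sit in the existing condition, e.g. `if (substr($url, 0, 7) == 'http://' && is_allowed_target($url))`.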
And I never bothered to proxy images, so they are simply stripped. Other external resources aren't rewritten either, for example <script>, <link>, <style>, <input type="image">, ...: relative URLs in them break, and absolute ones are loaded straight from the original site, bypassing the proxy.
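One way to cover those other resources, sketched with PHP's DOMDocument instead of the regexes in the script (a swapped-in technique, not what the script does): parse the whole page and pass every URL-carrying attribute through a rewriting callback. Parsing the whole document also catches tags that span several lines, which the line-by-line file() loop misses. `rewrite_resources()` and its tag/attribute table are hypothetical and not exhaustive: `srcset`, inline `style` attributes and `url(...)` inside <style> blocks are left alone, and the `$proxify` callback is assumed to do the absolute-URL resolution and wrapping.

```php
<?php
// Hypothetical replacement for the regex rewriting: parse the page with DOMDocument
// and route every URL-carrying attribute through the caller-supplied $proxify.
function rewrite_resources(string $html, callable $proxify): string {
    // Tag => attribute holding an external URL (not an exhaustive list).
    $targets = [
        'a' => 'href', 'link' => 'href', 'form' => 'action',
        'img' => 'src', 'script' => 'src', 'iframe' => 'src', 'input' => 'src',
    ];
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect libxml's parse errors instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html); // note: without a <meta charset>, libxml assumes ISO-8859-1
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    foreach ($targets as $tag => $attr) {
        foreach ($doc->getElementsByTagName($tag) as $el) {
            $value = $el->getAttribute($attr);
            if ($value !== '') {
                $el->setAttribute($attr, $proxify($value));
            }
        }
    }
    return $doc->saveHTML();
}
```

For example, `rewrite_resources($page, fn($u) => 'index.php?url=' . urlencode($u))` would send links, scripts, stylesheets and images back through the proxy, as long as the callback first makes relative URLs absolute.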