RawDev.net - Developing development developing developers.

Posts Tagged "php"

RawGallery, a compact gallery script

Friday, April 4th, 2008 by Hekos

Recently I decided to open-source one of the scripts I have been using.

Features:

(more...)

Posted in Linux, Scripting - No Comments

Size of XKCD

Saturday, March 15th, 2008 by Hekos

The last post was about the size of bash.org; this one is about xkcd, the famous comic site. A simple set of scripts gets you the whole archive and a few stats.
Use the script wisely; it puts a strain on the servers.

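#!/bin/bash
# Fetch each comic page once, then download the image it references.
mkdir -p xkcd comics
echo "Downloading 395 pages."
for i in $(seq 1 395); do
	# Skip pages already saved as non-empty files, so the script can be resumed.
	if [ -s "xkcd/$i" ]; then
		continue
	fi
	echo -n "$(date +%H:%M:%S): Trying $i ..."
	lynx --source "http://xkcd.com/$i" > "xkcd/$i"
	echo -n " Done. Image:.. "
	# Pull the image file name out of the saved page.
	image=$(awk 'BEGIN{FS="<img src=\"http://imgs.xkcd.com/comics/";RS="\" title="}/<img/{print $2}' "xkcd/$i")
	# -P sets the download directory; -nH skips the host-name directory.
	wget -q -P "comics" -nH "http://imgs.xkcd.com/comics/$image"
	echo " Done."
	# Wait between requests to go easy on the server.
	sleep 2s
done
echo "All done."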

This piece of code does something special: it extracts the file name of the image from each saved page and uses wget to download it.
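To see what the awk part does on its own, here is a minimal sketch that runs the same field and record separators against a single made-up line of markup. The sample string and the function name extract_image_name are assumptions for illustration; real pages may be laid out differently, and the multi-character RS relies on an awk that treats it as a pattern (gawk and mawk do).

```shell
#!/bin/bash
# Hypothetical sample of the markup the awk one-liner expects; not a real page.
sample='<div><img src="http://imgs.xkcd.com/comics/example_comic.png" title="hover text" alt="Example" /></div>'

# Records end at '" title=', and fields are split at the image URL prefix,
# so the second field of a record containing <img is the bare file name.
extract_image_name() {
	awk 'BEGIN { FS = "<img src=\"http://imgs.xkcd.com/comics/"; RS = "\" title=" } /<img/ { print $2 }'
}

name=$(printf '%s\n' "$sample" | extract_image_name)
echo "$name"
```

The trick is that everything after the file name is cut off by the record separator, so no further trimming is needed.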

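<?php
$n=1;    // page counter
$vse=0;  // total number of quotes found
while ($n < 410) {
	unset ($fajl);
	$fajl=file_get_contents("original/".$n);

	// Assumes each quote header and each quote body is closed by </p>.
	// U makes the groups ungreedy; s lets . match newlines inside a quote.
	preg_match_all("|<p class=\"quote\">(.*)<b>#(.*)</b>(.*)</p><p class=\"qt\">(.*)</p>|Us", $fajl, $out);
	$i=0;
	while (isset($out[0][$i])) {
		// Print the quote id in parentheses, then the quote text.
		echo '('.$out[2][$i].")\n".$out[4][$i]."\n";
		// Print the same quote again with a bare id.
		echo $out[2][$i]."\n".$out[4][$i]."\n";
		$i++;
		$vse++;
	}
	$n++;
}
echo "\n(".$vse.")";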

And a parser that builds the final big file of everything, which coincidentally also makes the comments easy to read.
The comics make up most of the download, at ~22 MB.
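If you want to check numbers like that yourself, something along these lines should do. It is only a sketch: the helper name dir_stats is made up, the directory names xkcd and comics are taken from the download script, and 1 MB is counted as 1048576 bytes here.

```shell
#!/bin/bash
# Print "<file count> files, <size> MB" for a directory.
# Assumption: 1 MB = 1048576 bytes.
dir_stats() {
	local dir=$1 count bytes
	count=$(find "$dir" -type f | wc -l)
	# Concatenate every file and count the bytes; an empty directory gives 0.
	bytes=$(find "$dir" -type f -exec cat {} + | wc -c)
	LC_ALL=C awk -v c="$count" -v b="$bytes" \
		'BEGIN { printf "%d files, %.1f MB\n", c, b / 1048576 }'
}

# Report on the two download directories if they exist.
for d in xkcd comics; do
	if [ -d "$d" ]; then
		echo "$d: $(dir_stats "$d")"
	fi
done
```

Counting bytes this way measures the content itself, not the disk blocks it occupies, so it can come out a little lower than what du reports.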

And as usual, the download link: LINK (22 MB), or email me for the data.

Posted in Hacking, Scripting - No Comments

Size of Bash.org

Saturday, March 15th, 2008 by Hekos

I spent the last few hours on a simple question: how large is the world's largest IRC quote database (bash.org)?
I mean specifically the quotes themselves.

So first I had to get them all; a simple bash script was sufficient.

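#!/bin/bash
# Save every browse page of bash.org into original/, one file per page.
mkdir -p original
echo "Downloading 409 pages."
for i in $(seq 1 409); do
	# Skip pages already saved as non-empty files, so the script can be resumed.
	if [ -s "original/$i" ]; then
		continue
	fi
	echo -n "$(date +%H:%M:%S): Trying $i ..."
	lynx --source "http://www.bash.org/?browse=$i" > "original/$i"
	echo "Done."
	# Wait between requests to go easy on the server.
	sleep 10s
done
echo "All done."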

Please do not use that script; it is a strain on the bash.org servers. Instead, you can grab the original files at the end of the article.
After a couple of hours that was done, and I had my next script ready as well:

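<?php
$n=1;    // page counter
$vse=0;  // total number of quotes found
while ($n < 410) {
	unset ($fajl);
	$fajl=file_get_contents("original/".$n);

	// Assumes each quote header and each quote body is closed by </p>.
	// U makes the groups ungreedy; s lets . match newlines inside a quote.
	preg_match_all("|<p class=\"quote\">(.*)<b>#(.*)</b>(.*)</p><p class=\"qt\">(.*)</p>|Us", $fajl, $out);
	$i=0;
	while (isset($out[0][$i])) {
		// Print the quote id in parentheses, then the quote text.
		echo '('.$out[2][$i].")\n".$out[4][$i]."\n";
		// Print the same quote again with a bare id.
		echo $out[2][$i]."\n".$out[4][$i]."\n";
		$i++;
		$vse++;
	}
	$n++;
}
echo "\n(".$vse.")";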

The last line is there to make sure I got all of them: 20440 at the time.
I ran it from the shell and redirected the output to "final": php parser.php > final
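For a quick check of the result, a sketch like the following reads the count back from the last line of the output and reports the file size in bytes. The helper names reported_count and size_bytes are made up for this example, and it assumes the output file is called final and ends with the "(count)" line the parser prints.

```shell
#!/bin/bash
# Read the count the parser wrote on the last line, e.g. "(20440)" -> 20440.
reported_count() {
	tail -n 1 "$1" | tr -d '()'
}

# Size of a file in bytes (spaces stripped, since some wc versions pad the number).
size_bytes() {
	wc -c < "$1" | tr -d ' '
}

# Only report if the parser output is actually there.
if [ -f final ]; then
	echo "quotes: $(reported_count final)"
	echo "bytes:  $(size_bytes final)"
fi
```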

So the conclusion is that the size of bash.org is ~5 MB.
These are the files if you want them: link (or email me).

Posted in Hacking, Scripting - No Comments