Converting HTML to plaintext
This afternoon I wrote a simple PHP function to convert an
HTML-formatted message to plain text. I'm using it to create a non-HTML
version of a newsletter. It strips out all the HTML, but before it does
that, it captures all links (modeled after this method), replacing each
one with a numbered marker and listing the URLs at the end of the text,
and turns HTML list items into plaintext list items. I thought I'd
share the code:
function html2text($text, $wrap = 0) {
    // Capture simple elements (opening tag, tag name, inner text) before
    // any markup is removed, so the link targets are still available.
    preg_match_all("/(<([\w]+)[^>]*>)([^<]*)(<\/\\2>)/", $text, $matches, PREG_SET_ORDER);

    // Turn line breaks, paragraphs and lists into plaintext equivalents
    // (case-insensitively, so <BR> and <br> are treated alike).
    $text = str_ireplace(array("<br />", "<br>"), "\n", $text);
    $text = str_ireplace("<p>", "\n\n", $text);
    $text = str_ireplace("<li>", "\n * ", $text);
    $text = str_ireplace("</li>", "", $text);
    $text = str_ireplace("</ul>", "\n\n", $text);

    // Replace each link with "text [n]" and remember its URL.
    $urllist = array();
    $urlcount = 0;
    foreach ($matches as $val) {
        if (strtolower($val[2]) == "a") {
            // "\\1" is needed here: in a double-quoted PHP string a bare
            // "\1" is an octal escape, not a regex backreference.
            if (preg_match("|href=([\"'`])(.+?)\\1|i", $val[1], $href)) {
                $urllist[$urlcount] = $href[2];
                $text = str_replace($val[0], $val[3] . " [$urlcount]", $text);
                $urlcount++;
            }
        }
    }

    // Strip the remaining tags before wrapping, so markup does not
    // count towards the line length.
    $text = strip_tags($text);
    if ($wrap > 0) {
        $text = wordwrap($text, $wrap, "\n");
    }

    // Append the numbered list of link targets.
    $text .= "\n";
    foreach ($urllist as $key => $url) {
        $text .= "\n[" . $key . "] " . $url;
    }
    return $text;
}
Feel free to use it if you need it. Please note that the
code is wrapped to fit on this page.
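
To show the link-capturing step on its own, here is a minimal sketch that does only that part. The helper name links_to_footnotes and the example.com URLs are made up for illustration and are not part of the function above. It also assumes PHP 5.3 or later (for the closure) and only handles simple links with a quoted href and no nested tags.

```php
<?php
// Hypothetical helper (not part of html2text): replaces each simple
// <a href="...">text</a> with "text [n]" and collects the URLs in order.
function links_to_footnotes($html) {
    $urls = array();
    $text = preg_replace_callback(
        // Single-quoted string, so \1 reaches the regex engine as a backreference.
        '|<a\s[^>]*href=(["\'])(.+?)\1[^>]*>([^<]*)</a>|i',
        function ($m) use (&$urls) {
            $urls[] = $m[2];                               // remember the link target
            return $m[3] . ' [' . (count($urls) - 1) . ']'; // keep the text, add a marker
        },
        $html
    );
    return array($text, $urls);
}

// Made-up sample input with one double-quoted and one single-quoted href.
list($text, $urls) = links_to_footnotes(
    'Read <a href="http://example.com/news">the news</a> or '
    . '<a href=\'http://example.com/faq\'>the FAQ</a>.'
);

echo $text, "\n\n";
foreach ($urls as $n => $url) {
    echo "[$n] $url\n";
}
```

For the sample input this prints "Read the news [0] or the FAQ [1]." followed by a blank line and the two URLs, numbered 0 and 1. Doing the replacement inside a callback keeps each marker next to the exact link it belongs to, which is one way to avoid the marker and the URL list drifting apart when the same link appears twice.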