Peter Breuls's Weblog

Tuesday, 23 August 2005

Converting HTML to plaintext

This afternoon I wrote a simple PHP function to convert an HTML-formatted message to plain text. I'm using it to create a non-HTML version of a newsletter. It strips out all the HTML. Before it does that, it captures all links (modeled after this method), replacing each one with its text and a numbered marker that points to a list of URLs at the bottom. It also turns line breaks and paragraphs into newlines and HTML list items into plaintext list items. I thought I'd share the code:

function html2text($text, $wrap = 0) {
    // Find every "<tag ...>plain text</tag>" pair. Only the <a> links are
    // used below, but they must be collected before any markup is touched.
    preg_match_all('/(<(\w+)[^>]*>)([^<]*)(<\/\2>)/i', $text, $matches, PREG_SET_ORDER);

    // Turn line breaks, paragraphs and list items into plain-text
    // equivalents (str_ireplace ignores upper/lower case).
    $text = str_ireplace(array("<br />", "<br/>", "<br>"), "\n", $text);
    $text = str_ireplace("<p>", "\n\n", $text);
    $text = str_ireplace("<li>", "\n * ", $text);
    $text = str_ireplace("</li>", "", $text);
    $text = str_ireplace("</ul>", "\n\n", $text);

    // Replace each link with its text plus a numbered marker, and remember
    // its URL for the list at the bottom. Single quotes keep \1 a regex
    // back-reference (in double quotes PHP reads "\1" as an octal escape).
    // A link that occurs twice is replaced (and listed) only once.
    $urllist = array();
    foreach ($matches as $val) {
        if (strtolower($val[2]) == "a"
            && strpos($text, $val[0]) !== false
            && preg_match('/href=(["\'`])(.+?)\1/i', $val[1], $href)) {
            $text = str_replace($val[0], $val[3] . " [" . count($urllist) . "]", $text);
            $urllist[] = $href[2];
        }
    }

    // Remove the remaining tags first, so wordwrap() counts only visible text.
    $text = strip_tags($text);
    if ($wrap > 0) {
        $text = wordwrap($text, $wrap, "\n");
    }

    // Append the link list unwrapped, so long URLs stay intact.
    $text .= "\n";
    foreach ($urllist as $key => $url) {
        $text .= "\n[" . $key . "] " . $url;
    }
    return $text;
}


Feel free to use it if you need it. Please note that the code is wrapped to fit on this page.
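If you're on PHP 5.3 or later, the link step can also be done in a single pass with preg_replace_callback() and an anonymous function. This is a different technique from the function above and only a minimal sketch: links_to_footnotes() is a made-up name, and it assumes every link has a quoted href attribute.

```php
<?php
// Hypothetical helper (not part of html2text above): replaces every
// <a href="URL">text</a> with "text [n]" and collects the URLs in $urls.
// Assumes PHP 5.3+ (anonymous functions) and quoted href attributes.
function links_to_footnotes($html, &$urls) {
    $urls = array();
    return preg_replace_callback(
        '/<a\b[^>]*\bhref=(["\'])(.*?)\1[^>]*>(.*?)<\/a>/is',
        function ($m) use (&$urls) {
            $urls[] = $m[2];  // remember the URL
            // Numbered from 0, the same way html2text() numbers its links.
            return $m[3] . ' [' . (count($urls) - 1) . ']';
        },
        $html
    );
}

$urls = array();
$body = links_to_footnotes('Read <a href="http://example.com/">this</a> now.', $urls);
echo strip_tags($body), "\n";   // Read this [0] now.
foreach ($urls as $n => $url) {
    echo "[$n] $url\n";         // [0] http://example.com/
}
```

Doing the replacement inside the callback means each link is numbered at the moment it is replaced, so the marker and the URL list can't get out of step.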

Planning ahead

Systems Engineer: How long will it take for you to implement [the customer]'s changes?
Engineer: About two-three weeks. So four weeks.
Systems Engineer: Good. And how long will it take you to make your changes?
Intern: Well, I already did it, and it took an hour.
Systems Engineer: Okay, I'll tell them five weeks total.
Sounds like an everyday situation at work.

Google Desktop 2

The question is: does Google Desktop 2 work with Firefox? It doesn't say.

Maybe I should try it out. Looks interesting, with the sidebar and all.