Making WordPress Tag Balancing Work with Exec-PHP

I use the WordPress plugin Exec-PHP to run PHP in my posts, but if I do this with “WordPress should correct invalidly nested XHTML automatically” (a.k.a. tag balancing) checked in Settings > Writing, I get this nasty error whenever I try to use PHP:

Parse error: syntax error, unexpected '?' in /home/thripp/public_html/wp-content/plugins/exec-php/includes/runtime.php(42) : eval()'d code on line 1

The solution the developer provides is to simply disable that feature. That’s fine most of the time, but I encountered a tricky situation where I needed to use PHP and have WordPress close open HTML tags which I simply could not close.

My posts on this blog are usually photos with short descriptions, but occasionally I write long articles which may go on for thousands of words. Up until last year, my tag, category, and archive pages displayed the full content of my posts. WordPress excerpts were unacceptable for two reasons: 1.) they are always 55 words; 2.) I use Post Thumb Revisited to auto-convert 800×600 images to 400×300 thumbnails, but it only converts them through the_content filter, not the_excerpt. While the excerpt length is customizable in WP 2.8 or newer, I am unwilling to upgrade from WPMU 2.7. What I really needed was an excerpt that used the_content, respected all HTML tags, worked with Exec-PHP, and let me customize the excerpt length.

Enter The Excerpt Reloaded. The plugin was 5 years old, so I found an updated version with bugfixes that was only 3 years old. I quickly wrote this code for my theme’s index.php file and left it this way until today:

if(is_category() || is_archive()) the_excerpt_reloaded(200, 'all', 'content',
     true, "<strong>… continue reading</strong>", false, 1, false, false,
     'p', 'Click to see whole entry.', true);
else the_content(__('… CONTINUE READING'));

This has been the best of both worlds. It cuts off the content at 200 words, so most of my photos do not have a “continue reading” link because my descriptions are under 200 words. Longer posts are cut off after 200 words, so my archive pages do not become unnecessarily long. I had to set the 8th argument of the_excerpt_reloaded, $fix_tags, to false, because I would get the same old Exec-PHP error if it was set to true. “No problem,” I thought. I already have tag balancing disabled in WordPress, so what could it hurt to disable it here?

Recently, however, I encountered an insidious bug when a post was cut at 200 words in the middle of a <strong> tag. The tag would never be closed, meaning the rest of the page would be bold! Take a look at this screenshot:

[Screenshot: unbalanced tags]
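To see how a cut like this leaves a tag open, here is a minimal sketch. It is not The Excerpt Reloaded's actual code: naive_word_cut is a hypothetical stand-in that simply keeps the first N whitespace-separated words and knows nothing about HTML.

```php
<?php
// Hypothetical stand-in for an excerpt function: keep the first $limit
// whitespace-separated words and ignore HTML structure entirely.
function naive_word_cut($html, $limit) {
    $words = preg_split('/\s+/', trim($html));
    return implode(' ', array_slice($words, 0, $limit));
}

$post = 'Intro text <strong>very important warning</strong> and the rest.';
$excerpt = naive_word_cut($post, 4);

// The cut lands inside the <strong> element, so its closing tag is lost.
$opened = substr_count($excerpt, '<strong>');
$closed = substr_count($excerpt, '</strong>');

echo $excerpt, "\n";
echo "opened: $opened, closed: $closed\n";
```

Everything the browser renders after that excerpt inherits the open tag, which is what a tag balancer is supposed to prevent.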

What do you do about something like this? Obviously, there are many solutions. I could rewrite the offending article so the 201st word is not in the middle of an HTML tag. All I would have to do is put in a few filler words earlier in the article. I could enable tag balancing, write some code to check if each post contains PHP, and not use the_excerpt_reloaded in those cases. I could use custom fields on posts to determine which mode of behavior should be used. I could upgrade WordPress (oh god no). All of these solutions seem suboptimal.

Instead, I went to the problem’s source. If $fix_tags is true, the_excerpt_reloaded runs the content of the excerpt through balanceTags. What is balanceTags? A WordPress function in /wp-includes/formatting.php which calls force_balance_tags. What is force_balance_tags? A WordPress function in the same file which looks like hieroglyphics. All I wanted to do was force the function to ignore PHP, but I couldn’t figure it out. It wasn’t a simple matter of ignoring <?php ?> tags. My PHP tags often appear in the middle of other HTML tags. Here is the source code of a typical photo on my blog:

<img src="http://thripp.com/files/photos/flash.jpg" alt="<?php the_title(); ?>" />

I took this picture of raindrops falling at night, and my camera’s flash reflected off one of the raindrops. It looks like a star going supernova.

[sniplet fuji-a360], <?php fexf(); ?>

<a href="[sniplet photos-path]stock/<?php echo fprm(); ?>-stock.jpg">[sniplet stock-dl-text]</a> (<?php fsze(fprm() . '-stock.jpg'); ?>) or <a href="[sniplet photos-path]stock/source/<?php echo fprm(); ?>-ss.jpg">[sniplet ss-dl-text-lc]</a> (<?php fsze('source/' . fprm() . '-ss.jpg'); ?>).

[sniplet stock-rights]

Looks pretty terrible, huh? I’ve got sniplets in there, custom PHP functions, concatenation, nesting… you name it. This template creates the file paths for the photos, source files, and stock versions right from the post title, because I upload the photos by FTP and follow a rigid file structure. The only reason there’s a full IMG tag at the top of each post is that Post Thumb Revisited won’t create the thumbnails to automatically generate my gallery without it. The template also displays the size of the source files and then extracts and displays the Exif data from the photos in my preferred format, which was extremely difficult to set up and is something I used to do manually. It runs itself, and the functions are really quite interesting.

Anyway, what I needed was a way to bypass force_balance_tags entirely, but only in regard to PHP code. I still needed the function to close dangling tags like <strong>, <em>, and <u> if the_excerpt_reloaded cuts the post in the middle of a tag.

After a lot of unsuccessful Google searches, I remembered that I solved a similar problem at the beginning of October in Tweet This 1.8. On the “Write Tweet” page, Tweet This uses a modified version of Jeff Roberson’s Linkify URL to delimit URLs with a space on each side (function tt_delimit_urls). A tweet like “Check out http://www.google.com/!” becomes “Check out http://www.google.com/ !”. Then, I use Ext-Conv-Links by Muhammad Arfeen to convert all long URLs to short URLs if the tweet is over 140 characters (class tt_shorten_urls). This works great for most URLs, but I discovered it breaks URLs containing underscores. http://en.wikipedia.org/wiki/South_Africa gets sent to the URL shortener as http://en.wikipedia.org/wiki/South, and the result ends up in the tweet as http://bit.ly/bzLvSK_Africa, which doesn’t work at all. Totally unacceptable.

After many hours of torment trying to fix Jeff’s or Muhammad’s code, I decided to approach the problem from a different angle. Why not just replace underscores with something else on the way in, and then change them back on the way out? Good programming doesn’t dance around problems, but I’ll take a practical solution that works over an idealistic solution that fails, any day. But what string to replace underscores with? I can’t use a special character or something that might be used in a tweet on purpose, because it will get converted into an underscore. After some thought, I settled on t9WGb5. It doesn’t look pretty, but it works, and I doubt any URL containing “t9WGb5” is ever going to be purposefully included in a tweet. So I proceeded to write statements like str_replace('t9WGb5', '_', $url) and str_replace('_', 't9WGb5', $url) at the necessary places throughout the code, and URLs with underscores worked like a charm. As an Easter egg, try writing a tweet over 140 characters containing a URL where you replace an underscore with “t9WGb5” yourself, for example, “test test test test test test test test test test test test test test test test test test test http://en.wikipedia.org/wiki/Southt9WGb5Africa”, then preview it on the Write Tweet page. Check the preview page for the short URL, i.e. http://bit.ly/cRLAis+, and you’ll see that your “t9WGb5” was converted to an underscore before the long URL was even sent to Bit.ly, as an artifact of my kludge-like solution.
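The round trip can be sketched in a few lines. This is only an illustration of the idea, not the Tweet This source: the constant and the two helper names are hypothetical, and only the placeholder string itself comes from the description above.

```php
<?php
// Hypothetical names; 't9WGb5' is the placeholder string described above.
const TT_UNDERSCORE_PLACEHOLDER = 't9WGb5';

// On the way in: hide underscores from the URL-handling code.
function tt_mask_underscores($url) {
    return str_replace('_', TT_UNDERSCORE_PLACEHOLDER, $url);
}

// On the way out: turn every placeholder back into an underscore.
function tt_unmask_underscores($url) {
    return str_replace(TT_UNDERSCORE_PLACEHOLDER, '_', $url);
}

$url      = 'http://en.wikipedia.org/wiki/South_Africa';
$masked   = tt_mask_underscores($url);
$restored = tt_unmask_underscores($masked);

// The Easter egg: a placeholder typed by hand is also "restored".
$easter_egg = tt_unmask_underscores('http://en.wikipedia.org/wiki/Southt9WGb5Africa');

echo $masked, "\n", $restored, "\n", $easter_egg, "\n";
```

The last call shows the trade-off of this kind of kludge: the unmasking step cannot tell a placeholder it inserted from one that was already in the input.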

Couldn’t the tag balancing problem be approached in the same way? Of course it could. A simple modification to /wp-includes/formatting.php did the trick. Right at the start of the force_balance_tags function, I replaced “<?php” and “?>” with “[![?php” and “?]!]” using str_replace, as follows:

function force_balance_tags( $text ) {
     $text = str_replace(array('<?php', '?>'), array('[![?php', '?]!]'), $text);
     $tagstack = array(); $stacksize = 0; $tagqueue = ''; $newtext = '';

Then, at the end of the function, I change it all back:

     // WP fix for the bug with HTML comments
     $newtext = str_replace("< !--","<!--",$newtext);
     $newtext = str_replace("<    !--","< !--",$newtext);
     $newtext = str_replace(array('[![?php', '?]!]'), array('<?php', '?>'), $newtext);
     return $newtext;
}
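The same masking could probably be done without editing WordPress core, by wrapping whatever balancing function is in use. What follows is a sketch under that assumption, not the hack itself: balance_with_php_masked and toy_balancer are hypothetical names, and toy_balancer is a deliberately tiny stand-in that only closes unclosed <strong> tags so the example runs outside WordPress.

```php
<?php
// Hypothetical wrapper: hide the PHP delimiters from a tag balancer,
// run the balancer, then put the delimiters back.
function balance_with_php_masked($text, $balancer) {
    $real   = array('<?php', '?>');
    $masked = array('[![?php', '?]!]');

    $text = str_replace($real, $masked, $text);
    $text = call_user_func($balancer, $text);
    return str_replace($masked, $real, $text);
}

// Toy stand-in for a real balancer: append one closing tag for each
// <strong> that was never closed. It handles nothing else.
function toy_balancer($text) {
    $open = substr_count($text, '<strong>') - substr_count($text, '</strong>');
    return $text . str_repeat('</strong>', max(0, $open));
}

$excerpt  = '<img alt="<?php the_title(); ?>" /> <strong>cut off here';
$balanced = balance_with_php_masked($excerpt, 'toy_balancer');

echo $balanced, "\n";
```

Inside WordPress the second argument would presumably be 'force_balance_tags' instead of the toy function. The limitation is the same as with the core edit: the masked strings themselves can never appear literally in a post.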

All this happens either before or after Exec-PHP executes. I’m not sure when, but it doesn’t matter. My goal of being able to use tag balancing with Exec-PHP has been reached. I now have $fix_tags set to true in the_excerpt_reloaded and “WordPress should correct invalidly nested XHTML automatically” enabled in Settings > Writing, and all I have to do is re-apply the hack when I upgrade WordPress. It’s amazing what thinking outside the box gets you.

I can’t actually write “[![?php” or “?]!]” inside any post on my site, because my hack will convert those strings to real PHP code and they won’t be displayed. How did I display the code above? My actual /wp-includes/formatting.php file uses underscores instead of exclamation points. How did I include the sniplets in the example post without the Sniplets plugin executing them? Breaking the parser with the &shy; HTML entity. Simple.

Earlier, I talked about the functions I use in my photo template to automate display of file size and Exif data. Here are those functions:

// Echo a file's size in decimal units (1 KB = 1000 bytes).
function fsze($f = 'simplicity-stock.jpg', $p =
     '/home/thripp/public_html/wp-content/blogs.dir/2/files/photos/stock/')
     {$n = array('Bytes', 'KB', 'MB', 'GB'); $p = $p . $f;
     if(file_exists($p)) $b = filesize($p);
          else $b = '1000';
     echo round($b/pow(1000, ($i = floor(log($b, 1000)))), 2) . $n[$i];}

// Build the photo's file-name slug from the post title.
function fprm() {
     return str_replace('photo-', '', preg_replace('/-+/', '-',
          preg_replace('/[^a-z0-9-]/', '-',
          strtolower(trim(str_replace(array('?', '…'),
          array('', ''), get_the_title()))))));}

// Echo shutter speed, aperture, focal length, ISO, local time, and photo ID.
function fexf() {
     $exif = exif_read_data('/home/thripp/public_html/wp-content/' .
          'blogs.dir/2/files/photos/' . fprm() . '.jpg', 0, true);
     $shutter = $exif['EXIF']['ExposureTime'];
     $fnum = str_replace('f/', 'F', $exif['COMPUTED']['ApertureFNumber']);
     $focal = $exif['EXIF']['FocalLength'];
     $iso = $exif['EXIF']['ISOSpeedRatings'];
     $date = $exif['EXIF']['DateTimeOriginal'];
     $date = str_replace(':', '-', substr($date, 0, 10)) . 'T' .
          substr($date, 11);
     if(substr($date, 0, 4) <= 2007) {
          $id = substr($date, 0 , 10) . '_' . substr($date, 11, 2) .
          'h' . substr($date, 14, 2) . 'm' . substr($date, 17);}
     elseif(substr($date, 0, 4) >= 2008) {
          $id = str_replace('-', '', substr($date, 0 , 10)) . '-' .
          str_replace(':', '', substr($date, 11)) . 'rxt';}
     $md = str_replace('-', '', substr($date, 5, 5));
     $hms = str_replace(':', '', substr($date, 11));
     // The numeric limits below are written without leading zeros so that
     // PHP reads them as decimal numbers rather than octal literals.
     if(substr($date, 0, 4) == 2004) {
          if(($md < 404) || ($md == '0404' && $hms < 20000) ||
          ($md > 1031) || ($md == '1031' && $hms > 20000))
               $ldate = $date . '-05';
          else $ldate = $date . '-04';}
     if(substr($date, 0, 4) == 2005) {
          if(($md < 403) || ($md == '0403' && $hms < 20000) ||
          ($md > 1030) || ($md == '1030' && $hms > 20000))
               $ldate = $date . '-05';
          else $ldate = $date . '-04';}
     if(substr($date, 0, 4) == 2006) {
          if(($md < 402) || ($md == '0402' && $hms < 20000) ||
          ($md > 1029) || ($md == '1029' && $hms > 20000))
               $ldate = $date . '-05';
          else $ldate = $date . '-04';}
     if(substr($date, 0, 4) == 2007) {
          if(($md < 311) || ($md == '0311' && $hms < 70000) ||
          ($md > 1104) || ($md == '1104' && $hms > 70000))
               $ldate = date("Y-m-d\TH:i:s",
               (strtotime($date) - 18000)) . '-05';
          else     $ldate = date("Y-m-d\TH:i:s",
               (strtotime($date) - 14400)) . '-04';}
     if(substr($date, 0, 4) == 2008) {
          if(($md < 309) || ($md == '0309' && $hms < 70000) ||
          ($md > 1102) || ($md == '1102' && $hms > 70000))
               $ldate = date("Y-m-d\TH:i:s",
               (strtotime($date) - 18000)) . '-05';
          else     $ldate = date("Y-m-d\TH:i:s",
               (strtotime($date) - 14400)) . '-04';}
     if(substr($date, 0, 4) == 2009) {
          if(($md < 308) || ($md == '0308' && $hms < 70000) ||
          ($md > 1101) || ($md == '1101' && $hms > 70000))
               $ldate = date("Y-m-d\TH:i:s",
               (strtotime($date) - 18000)) . '-05';
          else     $ldate = date("Y-m-d\TH:i:s",
               (strtotime($date) - 14400)) . '-04';}
     if(substr($date, 0, 4) == 2010) {
          if(($md < 314) || ($md == '0314' && $hms < 70000) ||
          ($md > 1107) || ($md == '1107' && $hms > 70000))
               $ldate = date("Y-m-d\TH:i:s",
               (strtotime($date) - 18000)) . '-05';
          else     $ldate = date("Y-m-d\TH:i:s",
               (strtotime($date) - 14400)) . '-04';}
     if(substr($date, 0, 4) == 2011) {
          if(($md < 313) || ($md == '0313' && $hms < 70000) ||
          ($md > 1106) || ($md == '1106' && $hms > 70000))
               $ldate = date("Y-m-d\TH:i:s",
               (strtotime($date) - 18000)) . '-05';
          else     $ldate = date("Y-m-d\TH:i:s",
               (strtotime($date) - 14400)) . '-04';}
     if(substr($date, 0, 4) == 2012) {
          if(($md < 311) || ($md == '0311' && $hms < 70000) ||
          ($md > 1104) || ($md == '1104' && $hms > 70000))
               $ldate = date("Y-m-d\TH:i:s",
               (strtotime($date) - 18000)) . '-05';
          else     $ldate = date("Y-m-d\TH:i:s",
               (strtotime($date) - 14400)) . '-04';}
     if(preg_match("/\//", $focal, $m)) {$pieces = explode('/', $focal);
          $focal = intval($pieces['0'])/intval($pieces['1']);}
     if(preg_match("/\//", $shutter, $m)) {$pieces = explode('/', $shutter);
          $shutter = '1/' . round($pieces['1']/$pieces['0']);}
     echo $shutter . ', ' . $fnum . ', ' . $focal . 'mm, ISO' . $iso .
          ', ' . $ldate . ', ' . $id . "\n";}

Those were tough to write. PHP’s filesize() only returns a raw byte count, and the usual helpers for formatting it believe that a kilobyte is 1024 bytes and a megabyte is 1024*1024 bytes, which is completely false and unacceptable. I had to write my own function to calculate proper file sizes. I take all my pictures with my clock set to Greenwich Mean Time, but I still want to display the time in local time (Eastern) with the GMT offset. I couldn’t figure out how to write a generic function, so I just did it for each year up until 2012, using the United States’ Daylight Saving Time rules. I’ll have to update the function in 2013, but I hear the world is going to end in 2012 anyway.
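For what it’s worth, a generic version may be possible with PHP’s DateTime and DateTimeZone classes, which look up the Daylight Saving Time rules in the timezone database. The following is a minimal sketch, assuming PHP 5.3 or newer (for DateTime::createFromFormat), a camera clock set to GMT, and the usual “YYYY:MM:DD HH:MM:SS” Exif timestamp. gmt_exif_to_eastern is a hypothetical name, not part of the template above.

```php
<?php
// Hypothetical helper: convert an Exif-style timestamp recorded in GMT
// to Eastern time, letting the timezone database choose EST or EDT.
function gmt_exif_to_eastern($exif_date) {
    $dt = DateTime::createFromFormat('Y:m:d H:i:s', $exif_date,
        new DateTimeZone('UTC'));
    if ($dt === false) {
        return null; // The timestamp could not be parsed.
    }
    $dt->setTimezone(new DateTimeZone('America/New_York'));
    return $dt->format('Y-m-d\TH:i:sP');
}

$winter = gmt_exif_to_eastern('2009:01:15 16:30:00'); // standard time
$summer = gmt_exif_to_eastern('2009:07:04 16:30:00'); // daylight time

echo $winter, "\n"; // 2009-01-15T11:30:00-05:00
echo $summer, "\n"; // 2009-07-04T12:30:00-04:00
```

Note that the “P” format character prints the offset as -05:00 rather than the shorter -05 used in fexf, so the output would need trimming to match exactly.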

If you think this adds to my page load time, you’re probably right. But I use W3 Total Cache to completely cache each page of my blog, so it doesn’t matter.

Next summer, I’m going to China with my Mom. I will be leaving the Eastern time zone for the first time ever. I will definitely need to update the functions above, and I will probably have to specify the time zones manually for all the photos I post from the trip. Should I worry about that now? Of course not.
