Transform complex and variable XML

I have a complex XML document that I want to transform into HTML. Some tags need to be replaced with HTML tags.

The XML is this:

<root>
    <div>
        <p>
            <em>bol text</em>, some normale text
        </p>
    </div>
    <list>
        <listitem>
            normal text inside list <em>bold inside list</em>
        </listitem>
        <listitem>
            another text in list...
        </listitem>
    </list>
    <p>
        A sample paragraph
    </p>
</root>

The content of the elements is variable, which means that other XML documents I parse can look completely different.

The output I want is this (for this scenario):

<root>
    <div>
        <p>
            <strong>bol text</strong>, some normale text
        </p>
    </div>
    <ul>
        <li>
            normal text inside list <strong>bold inside list</strong>
        </li>
        <li>
            another text in list...
        </li>
    </ul>
    <p>
        A sample paragraph
    </p>
</root>

I wrote a recursive function that visits every node of the XML and replaces it with the matching HTML tag, but it doesn't work:

$doc = new DOMDocument();
$doc->preserveWhiteSpace = false;
$doc->load('section.xml');
echo $doc->saveHTML();

function printHtml(DOMNode $node)
{
    if ($node->hasChildNodes())
    {
        foreach ($node->childNodes as $child)
        {
            printHtml($child);
        }
    }

    if ($node->nodeName == 'em')
    {
        $newNode = $node->ownerDocument->createElement('strong', $node->nodeValue);
        $node->parentNode->replaceChild($newNode, $node);
    }

    if ($node->nodeName == 'listitem')
    {
        $newNode = $node->ownerDocument->createElement('li', $node->nodeValue);
        $node->parentNode->replaceChild($newNode, $node);
    }
}

Can anyone help me? Thanks in advance.

This is an example of a complete XML document:

<root>
    <div>
        <p>
            <em>bol text</em>, some normale text
        </p>
    </div>
    <list>
        <listitem>
            normal text inside list <em>bold inside list</em>
        </listitem>
        <listitem>
            another text in list...
        </listitem>
    </list>
    <media>
        <info isVisible="false">
            <title>
                <p>Image title <em>in bold</em> not in bold</p>
            </title>
        </info>
        <file isVisible="true">
            <href>
                "path/to/file.jpg"
            </href>
        </file>
    </media>
    <p>
        A sample paragraph
    </p>
</root>

Which has to be transformed into:

<root>
    <div>
        <p>
            <strong>bol text</strong>, some normale text
        </p>
    </div>
    <ul>
        <li>
            normal text inside list <strong>bold inside list</strong>
        </li>
        <li>
            another text in list...
        </li>
    </ul>
    <!-- the media tag can be rendered in two ways: with the title hidden or with the title visible -->
    <!-- this is the case when the title is hidden: -->
    <!-- the info tag (inside the media tag) has the attribute isVisible="false", which means the title must not be shown -->
    <img src="path/to/file.jpg" />

    <!-- this is the case when the title is visible: -->
    <!-- if the info tag has isVisible="true", the media tag must be translated into
     <div>
        <img src="path/to/file.jpg" />
        <p>Image title <strong>in bold</strong> not in bold</p>
     </div>
     -->
    <p>
        A sample paragraph
    </p>
</root>

Answers


There's a language specially designed for this task: it's called XSLT, and you can easily express your desired transformation in XSLT and invoke it from your PHP program. There's a learning curve, of course, but it's a much better solution than writing low-level DOM code.

In XSLT you write a set of template rules saying how individual elements should be handled. Many elements in your example are copied through unchanged, so you can start with a default rule that does this:

<xsl:template match="*">
  <xsl:copy><xsl:apply-templates/></xsl:copy>
</xsl:template>

The "match" part says what part of the input you are matching; the body of the rule says what output to produce. The xsl:apply-templates does a recursive descent to process the children of the current element.

Some of your elements are simply renamed, for example:

<xsl:template match="listitem">
 <li><xsl:apply-templates/></li>
</xsl:template>
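
To try the rules so far from PHP, something along these lines should work. It is only a sketch: it assumes the PHP xsl extension is installed, the helper name applyStylesheet is made up for illustration, and the rules for list and em are not from the answer but inferred from the desired output in the question.

```php
<?php
// Sketch: run the default copy rule plus a few rename rules from PHP.
// Assumes ext-xsl is available; applyStylesheet() is a hypothetical helper.

$xml = <<<'XML'
<root>
    <list>
        <listitem>normal text inside list <em>bold inside list</em></listitem>
        <listitem>another text in list...</listitem>
    </list>
    <p>A sample paragraph</p>
</root>
XML;

$xsl = <<<'XSL'
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" indent="no"/>

  <!-- Default rule: copy the element, then process its children. -->
  <xsl:template match="*">
    <xsl:copy><xsl:apply-templates/></xsl:copy>
  </xsl:template>

  <!-- Rename rules (list and em are assumptions based on the desired output). -->
  <xsl:template match="list"><ul><xsl:apply-templates/></ul></xsl:template>
  <xsl:template match="listitem"><li><xsl:apply-templates/></li></xsl:template>
  <xsl:template match="em"><strong><xsl:apply-templates/></strong></xsl:template>
</xsl:stylesheet>
XSL;

function applyStylesheet(string $xml, string $xsl): string
{
    $source = new DOMDocument();
    $source->loadXML($xml);

    $stylesheet = new DOMDocument();
    $stylesheet->loadXML($xsl);

    $processor = new XSLTProcessor();
    $processor->importStylesheet($stylesheet);

    // transformToXml() returns false on failure; the cast turns that into "".
    return (string) $processor->transformToXml($source);
}

$html = applyStylesheet($xml, $xsl);
echo $html;
```

Text nodes need no rule of their own here, because XSLT's built-in rule for text copies them to the output.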

Some of the rules are a little bit more complex, but still easily expressed:

<xsl:template match="media/file[@isVisible='true']">
  <img src="{href}"/>
</xsl:template>
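
For the two media cases described in the question, the file rule probably needs a companion rule on media itself, otherwise the info element would just be copied through by the default rule. The following is a rough sketch under a few assumptions: the two media rules and the helper renderMedia are my own additions, and the translate/normalize-space calls are there only because the sample href contains surrounding whitespace and literal quote characters.

```php
<?php
// Sketch: handle <media> with the title hidden or visible.
// Assumes ext-xsl; renderMedia() and both media rules are illustrative only.

$xsl = <<<'XSL'
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" indent="no"/>

  <xsl:template match="*">
    <xsl:copy><xsl:apply-templates/></xsl:copy>
  </xsl:template>

  <xsl:template match="em"><strong><xsl:apply-templates/></strong></xsl:template>

  <!-- Title visible: wrap the image and the title content in a div. -->
  <xsl:template match="media[info/@isVisible='true']">
    <div>
      <xsl:apply-templates select="file"/>
      <xsl:apply-templates select="info/title/*"/>
    </div>
  </xsl:template>

  <!-- Title hidden (lower priority, no predicate): emit the image only. -->
  <xsl:template match="media">
    <xsl:apply-templates select="file"/>
  </xsl:template>

  <!-- Strip the quote characters and whitespace around the path. -->
  <xsl:template match="media/file[@isVisible='true']">
    <img src="{normalize-space(translate(href, '&quot;', ''))}"/>
  </xsl:template>
</xsl:stylesheet>
XSL;

function renderMedia(string $xsl, string $infoVisible): string
{
    // Build the sample media element with the requested isVisible value.
    $xml = '<root><media>'
         . '<info isVisible="' . $infoVisible . '">'
         . '<title><p>Image title <em>in bold</em> not in bold</p></title>'
         . '</info>'
         . '<file isVisible="true"><href> "path/to/file.jpg" </href></file>'
         . '</media></root>';

    $source = new DOMDocument();
    $source->loadXML($xml);

    $stylesheet = new DOMDocument();
    $stylesheet->loadXML($xsl);

    $processor = new XSLTProcessor();
    $processor->importStylesheet($stylesheet);

    return (string) $processor->transformToXml($source);
}

$hidden  = renderMedia($xsl, 'false');
$visible = renderMedia($xsl, 'true');
echo $hidden, "\n", $visible, "\n";
```

The rule with the predicate wins over the plain media rule because XSLT gives patterns with predicates a higher default priority, so the order of the two rules does not matter. A file element without isVisible="true" is not covered here and would fall through to the default copy rule.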

I hope you agree that this declarative rule-based approach is much clearer than your procedural code; it's also much easier for someone else to change the rules in six months' time.


Well, maybe it's not the most correct idea, but why not just use str_replace? That way you can clearly see the list of changes to apply, and add or remove entries easily.

  1. file_get_contents: $file = file_get_contents('file.xml');
  2. str_replace: $file = str_replace(array('<em>', '</em>'), array('<strong>', '</strong>'), $file);
  3. file_put_contents: file_put_contents('file.html', $file);
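
The steps above could be sketched with a replacement map like the one below. The function name replaceTags and the map entries are just an illustration based on the tags in the question, and the input is an inline string rather than a file so the example runs on its own. Keep in mind that this is plain text substitution, so a tag with attributes would not match, and it cannot handle the conditional media case.

```php
<?php
// Sketch of the str_replace approach with an explicit replacement map.
// Opening and closing tags are listed separately so the output stays balanced.
// replaceTags() is a hypothetical helper, not part of PHP.

function replaceTags(string $xml, array $map): string
{
    return str_replace(array_keys($map), array_values($map), $xml);
}

$map = [
    '<em>'        => '<strong>',
    '</em>'       => '</strong>',
    '<list>'      => '<ul>',
    '</list>'     => '</ul>',
    '<listitem>'  => '<li>',
    '</listitem>' => '</li>',
];

$xml    = '<list><listitem>normal text <em>bold</em></listitem></list>';
$result = replaceTags($xml, $map);

echo $result, "\n";
// prints: <ul><li>normal text <strong>bold</strong></li></ul>
```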

UPDATE (Some more ideas regarding the changes in the question)

This seems a little bit tricky (at least for me right now) to do with PHP + DOM. Maybe it would be more reasonable to use XSL / XSLT (Extensible Stylesheet Language Transformations). In that case, something similar can be found here: How to replace a node-name with another in Xslt?

XSLT is designed specifically for this kind of transformation: http://en.wikipedia.org/wiki/XSLT

