HTML Parsing, Assertions, and Manipulation
The HTML class provides methods to query, manipulate, and assert against HTML
strings and documents. It is built on top of Symfony's DomCrawler component,
which provides a powerful and flexible way to work with HTML.
Usage
An HTML instance can be created with the HTML class constructor or with the
html_string() helper function.
use Mantle\Support\HTML;
use function Mantle\Support\Helpers\html_string;
$html = new HTML( '<div id="test">Hello World</div>' );
// Or using the helper function.
$html = html_string( '<div id="test">Hello World</div>' );
The HTML class accepts an HTML string, document, DOMDocument, DOMNode, or
DOMNodeList. It will parse the HTML using DOMDocument.
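To make the parsing step concrete, here is a minimal sketch that uses only PHP's built-in DOM extension. It is an illustration of what parsing a fragment with DOMDocument looks like, not the HTML class's actual implementation, and the error-handling calls are a common convention rather than something the class is known to do.

```php
<?php
// Illustration only: parse a fragment with PHP's built-in DOM extension.
$document = new DOMDocument();

// Collect libxml warnings internally instead of emitting them while parsing.
$previous = libxml_use_internal_errors( true );
$document->loadHTML( '<div id="test">Hello World</div>' );
libxml_clear_errors();
libxml_use_internal_errors( $previous );

// Query the parsed document with XPath.
$xpath = new DOMXPath( $document );
$node  = $xpath->query( '//div[@id="test"]' )->item( 0 );

echo $node->textContent; // Hello World
```

The HTML class wraps this kind of work behind its query methods, so in practice you would rarely need to touch DOMDocument directly.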
Filtering
The HTML class provides methods to filter nodes based on various criteria, such as ID, class name, tag name, and custom selectors.
Query Selector
use function Mantle\Support\Helpers\html_string;
$html = <<<'HTML'
<div id="test">
<p class="example">Hello World</p>
<p class="example">Hello Universe</p>
</div>
HTML;
$elements = html_string( $html )->get_by_selector( '.example' );
foreach ( $elements as $element ) {
echo $element->text(); // Outputs: Hello World, Hello Universe
}
You can also retrieve the first element that matches a specific selector using
the first_by_selector method.
$element = html_string( $html )->first_by_selector( '.example' );
echo $element->text(); // Outputs: Hello World
XPath
use function Mantle\Support\Helpers\html_string;
$html = <<<'HTML'
<div id="test">
<p class="example">Hello World</p>
<p class="example">Hello Universe</p>
</div>
HTML;
// Multiple elements.
$elements = html_string( $html )->get_by_xpath( '//p[@class="example"]' );
// Single element.
$element = html_string( $html )->first_by_xpath( '//p[@class="example"]' );
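One caveat with the expression above: the XPath predicate `@class="example"` compares the whole attribute value, so it will not match an element that carries several classes, whereas a CSS class selector matches by token. The sketch below demonstrates the difference with PHP's built-in DOMXPath rather than the HTML class; the markup is made up for illustration, and the `contains(concat(...))` expression is the usual XPath idiom for token matching.

```php
<?php
$document = new DOMDocument();
$document->loadHTML(
    '<div><p class="example">One</p><p class="example highlighted">Two</p></div>'
);
$xpath = new DOMXPath( $document );

// Exact comparison: only elements whose class attribute is exactly "example".
$exact = $xpath->query( '//p[@class="example"]' );

// Token comparison: any element whose class list contains "example".
$token = $xpath->query(
    '//p[contains(concat(" ", normalize-space(@class), " "), " example ")]'
);

echo $exact->length; // 1
echo $token->length; // 2
```

When elements may have more than one class, a CSS selector is usually the simpler choice.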
ID / Tag / Test ID
You can also retrieve elements by their ID, tag name, or test ID using the
first_by_id, first_by_tag, and first_by_testid methods respectively.
use function Mantle\Support\Helpers\html_string;
$html = <<<'HTML'
<div id="test">
<p class="example">Hello World</p>
<p class="example">Hello Universe</p>
</div>
HTML;
// Retrieve by ID.
$element = html_string( $html )->first_by_id( 'test' );
// Retrieve by tag name.
$element = html_string( $html )->first_by_tag( 'p' );
// Retrieve by test ID.
$element = html_string( $html )->first_by_testid( 'test' );
There are also get_by_* versions of these methods that return all matching
elements.
use function Mantle\Support\Helpers\html_string;
$html = <<<'HTML'
<div id="test">
<p class="example">Hello World</p>
<p class="example">Hello Universe</p>
</div>
HTML;
// Retrieve all elements by tag name.
$elements = html_string( $html )->get_by_tag( 'p' );
// Retrieve all elements by test ID.
$elements = html_string( $html )->get_by_testid( 'test' );
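A test ID refers to an element's data-testid attribute, as the first_by_testid and get_by_testid reference examples in this document show. As a rough mental model, sketched with PHP's DOM extension rather than the HTML class, the lookup corresponds to an attribute query like the one below. The XPath expression is an assumed equivalent for illustration, not the library's implementation.

```php
<?php
$document = new DOMDocument();
$document->loadHTML(
    '<div id="test"><p data-testid="example">Hello World</p><p>Hello Universe</p></div>'
);
$xpath = new DOMXPath( $document );

// Match any element whose data-testid attribute equals "example".
$matches = $xpath->query( '//*[@data-testid="example"]' );

echo $matches->length;                 // 1
echo $matches->item( 0 )->textContent; // Hello World
```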
Traversing and Looping
The HTML class is iterable, allowing you to loop through the nodes it contains.
use function Mantle\Support\Helpers\html_string;
$html = <<<'HTML'
<div id="test">
<p class="example">Hello World</p>
<p class="example">Hello Universe</p>
</div>
HTML;
$elements = html_string( $html )->get_by_selector( '.example' );
foreach ( $elements as $element ) {
// $element is an instance of the `HTML` class.
echo $element->text(); // Outputs: Hello World, Hello Universe
}
Access a node by its position in the list:
$crawler->filter( 'body > p' )->eq( 0 );
Get the first or last node of the current selection:
$crawler->filter( 'body > p' )->first();
$crawler->filter( 'body > p' )->last();
Get the nodes of the same level as the current selection:
$crawler->filter( 'body > p' )->siblings();
Get the same level nodes after or before the current selection:
$crawler->filter( 'body > p' )->nextAll();
$crawler->filter( 'body > p' )->previousAll();
Get all the child or ancestor nodes:
$crawler->filter( 'body' )->children();
$crawler->filter( 'body > p' )->ancestors();
Get all the direct child nodes matching a CSS selector:
$crawler->filter( 'body' )->children( 'p.lorem' );
Get the first parent (heading toward the document root) of the element that matches the provided selector:
$crawler->closest( 'p.lorem' );
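The snippets above leave `$crawler` undefined. Since the HTML class is built on Symfony's DomCrawler, the following sketch assumes `$crawler` is a plain Symfony `Crawler` and that `symfony/dom-crawler` and `symfony/css-selector` are installed through Composer. The markup and the autoloader path are made up for illustration.

```php
<?php
// Assumption: dependencies were installed with Composer in this directory.
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

$crawler = new Crawler(
    '<body><p class="lorem">One</p><p>Two</p><p>Three</p></body>'
);

$paragraphs = $crawler->filter( 'body > p' );

echo $paragraphs->eq( 1 )->text();               // Two
echo $paragraphs->last()->text();                // Three
echo $paragraphs->first()->siblings()->count();  // 2
echo $paragraphs->first()->nextAll()->count();   // 2

// Only direct children of <body> that match the selector.
echo $crawler->filter( 'body' )->children( 'p.lorem' )->count(); // 1
```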
Accessing Node Values
You can access a node's tag name and text content using the tag_name(),
text(), or innerText() methods.
use function Mantle\Support\Helpers\html_string;
$html = <<<'HTML'
<div id="test">
<p class="example">Hello World</p>
<p class="example">Hello Universe</p>
</div>
HTML;
$element = html_string( $html )->first_by_selector( '.example' );
// Tag name.
echo $element->tag_name(); // Outputs: 'p'.
// Text content.
echo $element->text(); // Outputs: 'Hello World'.
You can also retrieve the attributes of a node using the get_attribute() method.
use function Mantle\Support\Helpers\html_string;
$html = <<<'HTML'
<div id="test" class="example">
<p class="example" data-example="1234">Hello World</p>
</div>
HTML;
$html = html_string( $html );
// Get attribute value.
$html->first_by_id( 'test' )->get_attribute( 'class' ); // Outputs: 'example'.
// Use 'p.example' because the outer <div> also has the "example" class.
$html->first_by_selector( 'p.example' )->get_attribute( 'data-example' ); // Outputs: '1234'.
$html->first_by_selector( 'p.example' )->get_data( 'example' ); // Outputs: '1234'.
Modifying Node Values
You can modify the attributes, classes, and content of nodes using the
modify() method of the HTML class as well as other methods to mutate the
element's contents, attributes, etc.
use Mantle\Support\HTML;
use function Mantle\Support\Helpers\html_string;
$html = <<<'HTML'
<div id="test">
<p class="example">Hello World</p>
<p class="example">Hello Universe</p>
</div>
HTML;
$html = html_string( $html );
$html->filter( 'p' )->modify( function ( HTML $node ) {
// Add a class to all <p> elements.
$node->add_class( 'modified' );
} );
// You can also replace the entire contents of a node.
$html->filter( 'p' )->modify(
fn ( HTML $node ) => "<span>New content</span>"
);
Methods
- Retrieving Nodes
- Modifying Nodes
- Iteration
- Node Information
- Assertions
- assertHasChildren
- assertHasNodes
- assertNodeHasAnyClass
- assertNodeHasClass
- assertQuerySelectorExists
- assertQuerySelectorMissing
- assertElementExists
- assertElementMissing
- assertElementExistsByClass
- assertElementMissingByClass
- assertElementExistsById
- assertElementMissingById
- assertElementExistsByTagName
- assertElementMissingByTagName
- assertElementCount
- assertQuerySelectorCount
- assertElementExistsByTestId
- assertElementMissingByTestId
- assertElement
- assertQuerySelector
Retrieving Nodes
filter/get_by_selector
use function Mantle\Support\Helpers\html_string;
$html = <<<'HTML'
<div id="test">
<p class="example">Hello World</p>
<p class="example">Hello Universe</p>
</div>
HTML;
$elements = html_string( $html )->filter( '.example' );
$elements = html_string( $html )->get_by_selector( '.example' );
first_by_id
use function Mantle\Support\Helpers\html_string;
$html = <<<'HTML'
<div id="test">
<p class="example">Hello World</p>
<p class="example">Hello Universe</p>
</div>
HTML;
$element = html_string( $html )->first_by_id( 'test' );
first_by_selector
use function Mantle\Support\Helpers\html_string;
$html = <<<'HTML'
<div id="test">
<p class="example">Hello World</p>
<p class="example">Hello Universe</p>
</div>
HTML;
$element = html_string( $html )->first_by_selector( '.example' );
first_by_tag
use function Mantle\Support\Helpers\html_string;
$html = <<<'HTML'
<div id="test">
<p class="example">Hello World</p>
<p class="example">Hello Universe</p>
</div>
HTML;
$element = html_string( $html )->first_by_tag( 'p' );
first_by_testid
use function Mantle\Support\Helpers\html_string;
$html = <<<'HTML'
<div id="test">
<p class="example" data-testid="example">Hello World</p>
<p class="example">Hello Universe</p>
</div>
HTML;
$element = html_string( $html )->first_by_testid( 'example' );
first_by_xpath
use function Mantle\Support\Helpers\html_string;
$html = <<<'HTML'
<div id="test">
<p class="example">Hello World</p>
<p class="example">Hello Universe</p>
</div>
HTML;
$element = html_string( $html )->first_by_xpath( '//p[@class="example"]' );
get_by_tag
use function Mantle\Support\Helpers\html_string;
$html = <<<'HTML'
<div id="test">
<p class="example">Hello World</p>
<p class="example">Hello Universe</p>
</div>
HTML;
$elements = html_string( $html )->get_by_tag( 'p' );
get_by_testid
use function Mantle\Support\Helpers\html_string;
$html = <<<'HTML'
<div id="test">
<p class="example" data-testid="example">Hello World</p>
<p class="example">Hello Universe</p>
</div>
HTML;
$elements = html_string( $html )->get_by_testid( 'example' );
get_by_xpath
use function Mantle\Support\Helpers\html_string;
$html = <<<'HTML'
<div id="test">
<p class="example">Hello World</p>
<p class="example">Hello Universe</p>
</div>
HTML;
$elements = html_string( $html )->get_by_xpath( '//p[@class="example"]' );
Modifying Nodes
add_class
Adds a class to the element. Supports multiple classes.
use function Mantle\Support\Helpers\html_string;
$html = <<<'HTML'
<div id="test">
<p class="example">Hello World</p>
<p class="example">Hello Universe</p>
</div>
HTML;
html_string( $html )->first_by_selector( '.example' )->add_class( 'new-class' );
html_string( $html )->first_by_selector( '.example' )->add_class( 'new-class', 'another-class' );
after
Inserts content after the element.
use function Mantle\Support\Helpers\html_string;
$html = <<<'HTML'
<div id="test">
<p class="example">Hello World</p>
<p class="example">Hello Universe</p>
</div>
HTML;
html_string( $html )->first_by_selector( '.example' )->after( '<span>After</span>' );
/*
Outputs:
<div id="test">
<p class="example">Hello World</p>
<span>After</span>
<p class="example">Hello Universe</p>
</div>
*/
append
Appends content to the end of the element.
use function Mantle\Support\Helpers\html_string;
$html = <<<'HTML'
<div id="test">
<p class="example">Hello World</p>
<p class="example">Hello Universe</p>
</div>
HTML;
html_string( $html )->first_by_selector( '.example' )->append( '<span>Appended</span>' );
/*
Outputs:
<div id="test">
<p class="example">Hello World<span>Appended</span></p>
<p class="example">Hello Universe</p>
</div>
*/
before
Inserts content before the element.
use function Mantle\Support\Helpers\html_string;
$html = <<<'HTML'
<div id="test">
<p class="example">Hello World</p>
<p class="example">Hello Universe</p>
</div>
HTML;
html_string( $html )->first_by_selector( '.example' )->before( '<h2>Before</h2>' );
/*
Outputs:
<div id="test">
<h2>Before</h2>
<p class="example">Hello World</p>
<p class="example">Hello Universe</p>
</div>
*/
empty
Empties the content of the element.
use function Mantle\Support\Helpers\html_string;
$html = <<<'HTML'
<div id="test">
<p class="example">Hello World</p>
<p class="example">Hello Universe</p>
</div>
HTML;
html_string( $html )->first_by_selector( '.example' )->empty();
/*
Outputs:
<div id="test">
<p class="example"></p>
<p class="example">Hello Universe</p>
</div>
*/
get_data
Retrieves the value of a data attribute from the element.
use function Mantle\Support\Helpers\html_string;
$html = <<<'HTML'
<div id="test">
<p class="example" data-example="1234">Hello World</p>
</div>
HTML;
html_string( $html )->first_by_selector( '.example' )->get_data( 'example' ); // Outputs: '1234'.
has_any_class
Checks if the element has any of the specified classes.
use function Mantle\Support\Helpers\html_string;
$html = <<<'HTML'
<div id="test">
<p class="example">Hello World</p>
</div>
HTML;
html_string( $html )->first_by_selector( '.example' )->has_any_class( 'example', 'another-class' ); // Outputs: true
has_class
Checks if the element has all of the specified classes.
use function Mantle\Support\Helpers\html_string;
$html = <<<'HTML'
<div id="test">
<p class="example">Hello World</p>
</div>
HTML;
html_string( $html )->first_by_selector( '.example' )->has_class( 'example' ); // Outputs: true
html_string( $html )->first_by_selector( '.example' )->has_class( 'example', 'another-class' ); // Outputs: false
modify
Modifies the element using a callback function. The callback receives the current crawler as an argument. You can modify the element and return null/void or you can return a new element to replace the current one.
use Mantle\Support\HTML;
use function Mantle\Support\Helpers\html_string;
$html = <<<'HTML'
<div id="test">
<p class="example">Hello World</p>
<p class="example">Hello Universe</p>
</div>
HTML;
$html = html_string( $html );
$html->filter( '.example' )->first()->modify( function ( HTML $node ) {
// Add a class to the element.
$node->add_class( 'modified' );
} );
$html->filter( '.example' )->last()->modify( function ( HTML $node ) {
// Replace the content of the element.
return '<span>New content</span>';
} );
prepend
Prepends content to the beginning of the element.
use function Mantle\Support\Helpers\html_string;
$html = <<<'HTML'
<div id="test">
<p class="example">Hello World</p>
<p class="example">Hello Universe</p>
</div>
HTML;
html_string( $html )->first_by_selector( '.example' )->prepend( '<span>Prepended</span>' );
/*
Outputs:
<div id="test">
<p class="example"><span>Prepended</span>Hello World</p>
<p class="example">Hello Universe</p>
</div>
*/
remove
Removes the element from the DOM.
use function Mantle\Support\Helpers\html_string;
$html = <<<'HTML'
<div id="test">
<p class="example">Hello World</p>
<p class="example">Hello Universe</p>
</div>
HTML;
html_string( $html )->first_by_selector( '.example' )->remove();
/*
Outputs:
<div id="test">
<p class="example">Hello Universe</p>
</div>
*/
remove_attribute
Removes an attribute from the element.
use function Mantle\Support\Helpers\html_string;
$html = <<<'HTML'
<div id="test">
<p class="example">Hello World</p>
<p class="example">Hello Universe</p>
</div>
HTML;
html_string( $html )->first_by_selector( '.example' )->remove_attribute( 'class' );
/*
Outputs:
<div id="test">
<p>Hello World</p>
<p class="example">Hello Universe</p>
</div>
*/
remove_class
Removes a class from the element. Supports multiple classes.
use function Mantle\Support\Helpers\html_string;
$html = <<<'HTML'
<div id="test">
<p class="example">Hello World</p>
</div>
HTML;
html_string( $html )->first_by_selector( '.example' )->remove_class( 'example' );
/*
Outputs:
<div id="test">
<p>Hello World</p>
</div>
*/
remove_data
Removes a data attribute from the element.
use function Mantle\Support\Helpers\html_string;
$html = <<<'HTML'
<div id="test">
<p class="example" data-example="1234">Hello World</p>
</div>
HTML;
html_string( $html )->first_by_selector( '.example' )->remove_data( 'example' );
/*
Outputs:
<div id="test">
<p class="example">Hello World</p>
</div>
*/
set_attribute
Sets an attribute on the element.
use function Mantle\Support\Helpers\html_string;
$html = <<<'HTML'
<div id="test">
<p class="example">Hello World</p>
</div>
HTML;
html_string( $html )->first_by_selector( '.example' )->set_attribute( 'data-example', '1234' );
/*
Outputs:
<div id="test">
<p class="example" data-example="1234">Hello World</p>
</div>
*/
set_data
Sets a data attribute on the element.
use function Mantle\Support\Helpers\html_string;
$html = <<<'HTML'
<div id="test">
<p class="example">Hello World</p>
</div>
HTML;
html_string( $html )->first_by_selector( '.example' )->set_data( 'example', '1234' );
/*
Outputs:
<div id="test">
<p class="example" data-example="1234">Hello World</p>
</div>
*/
wrap
Wraps the element with the specified HTML or element.
use function Mantle\Support\Helpers\html_string;
$html = <<<'HTML'
<div>
<ul id="test">
<li class="example">Hello World</li>
<li class="example">Hello Universe</li>
</ul>
</div>
HTML;
html_string( $html )->filter( 'ul' )->wrap( '<div class="wrapper"></div>' );
/*
Outputs:
<div>
<div class="wrapper">
<ul id="test">
<li class="example">Hello World</li>
<li class="example">Hello Universe</li>
</ul>
</div>
</div>
*/
wrap_all
Wraps all matched elements with the specified HTML or element.
use function Mantle\Support\Helpers\html_string;
$html = <<<'HTML'
<div>
<div id="test">
<p class="example">Hello World</p>
<p class="example">Hello Universe</p>
</div>
</div>
HTML;
html_string( $html )->filter( 'p' )->wrap_all( '<div class="wrapper"></div>' );
/*
Outputs:
<div>
<div id="test">
<div class="wrapper">
<p class="example">Hello World</p>
<p class="example">Hello Universe</p>
</div>
</div>
</div>
*/
wrap_inner
Wraps the inner content of the element with the specified HTML or element.
use function Mantle\Support\Helpers\html_string;
$html = <<<'HTML'
<div>
<ul id="test">
<li class="example">Hello World</li>
<li class="example">Hello Universe</li>
</ul>
</div>
HTML;
html_string( $html )->filter( 'li' )->wrap_inner( '<span class="wrapper"></span>' );
/*
Outputs:
<div>
<ul id="test">
<li class="example"><span class="wrapper">Hello World</span></li>
<li class="example"><span class="wrapper">Hello Universe</span></li>
</ul>
</div>
*/
Iteration
next_until
Returns all nodes after a condition is met. The node that matched the
condition is only included if $include is set to true.
use Mantle\Support\HTML;
use function Mantle\Support\Helpers\html_string;
$html = <<<'HTML'
<div id="test">
<p class="example">Hello World</p>
<p class="example">Hello Universe</p>
<p class="example">Hello Galaxy</p>
</div>
HTML;
// Contains the "Hello Galaxy" node.
$elements = html_string( $html )->filter( 'p' )->next_until(
fn ( HTML $element ) => $element->text() === 'Hello Universe',
);
// Contains the "Hello Universe" and "Hello Galaxy" nodes.
$elements = html_string( $html )->filter( 'p' )->next_until(
fn ( HTML $element ) => $element->text() === 'Hello Universe',
include: true
);
previous_until
Returns all nodes before a condition is met. The node that matched the
condition is only included if $include is set to true.
use Mantle\Support\HTML;
use function Mantle\Support\Helpers\html_string;
$html = <<<'HTML'
<div id="test">
<p class="example">Hello World</p>
<p class="example">Hello Universe</p>
<p class="example">Hello Galaxy</p>
</div>
HTML;
// Contains the "Hello World" node.
$elements = html_string( $html )->filter( 'p' )->previous_until(
fn ( HTML $element ) => $element->text() === 'Hello Universe',
);
// Contains the "Hello World" and "Hello Universe" nodes.
$elements = html_string( $html )->filter( 'p' )->previous_until(
fn ( HTML $element ) => $element->text() === 'Hello Universe',
include: true
);
Node Information
dd
Dumps the current nodes and exits the script.
dump
Dumps the current nodes for debugging purposes and returns static.
has_nodes
Checks if the current selection has any nodes.
tag_name
Returns the tag name of the first node in the current selection.
text
Returns the text content of the first node in the current selection.
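The node information methods have no example above, so here is a short sketch that uses only the methods described in this section. It assumes Mantle is installed and Composer's autoloader is already loaded, and that has_nodes() returns a boolean. The exact format of the dump() output is not shown because it is not specified here.

```php
<?php
// Assumption: Composer's autoloader (with Mantle installed) is already loaded.
use function Mantle\Support\Helpers\html_string;

$html = <<<'HTML'
<div id="test">
  <p class="example">Hello World</p>
</div>
HTML;

$paragraphs = html_string( $html )->filter( 'p' );

if ( $paragraphs->has_nodes() ) {
    echo $paragraphs->tag_name(); // p
    echo $paragraphs->text();     // Hello World
}

// dump() returns the same object, so it can sit in the middle of a chain.
$text = $paragraphs->dump()->text();
```

Use dd() instead of dump() when you want the script to stop right after the nodes are printed.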
Assertions
The assertions available on the HTML class build on the same Element
Assertions that are available when testing HTTP requests.
assertHasChildren
Assert that the current node has children.
use function Mantle\Support\Helpers\html_string;
class ExampleTest extends TestCase {
public function testExample() {
$html = <<<'HTML'
<div id="test">
<p class="example">Hello World</p>
<p class="example">Hello Universe</p>
</div>
HTML;
html_string( $html )->first_by_id( 'test' )->assertHasChildren();
html_string( $html )->first_by_id( 'test' )->assertHasChildren( selector: 'p' );
html_string( $html )->first_by_id( 'test' )->assertHasChildren( selector: 'p', count: 2 );
}
}
assertHasNodes
Assert that the current selection has nodes.
use function Mantle\Support\Helpers\html_string;
class ExampleTest extends TestCase {
public function testExample() {
$html = <<<'HTML'
<div id="test">
<p class="example">Hello World</p>
<p class="example">Hello Universe</p>
</div>
HTML;
html_string( $html )->filter( 'p' )->assertHasNodes();
html_string( $html )->filter( 'p' )->assertHasNodes( count: 2 );
}
}
assertNodeHasAnyClass
Assert that the current node has any of the specified classes.
use function Mantle\Support\Helpers\html_string;
class ExampleTest extends TestCase {
public function testExample() {
$html = <<<'HTML'
<div id="test" class="example another-class">
<p class="example">Hello World</p>
</div>
HTML;
html_string( $html )->first_by_id( 'test' )->assertNodeHasAnyClass( 'example', 'another-class' );
}
}
assertNodeHasClass
Assert that the current node has all of the specified classes.
use function Mantle\Support\Helpers\html_string;
class ExampleTest extends TestCase {
public function testExample() {
$html = <<<'HTML'
<div id="test" class="example another-class">
<p class="example">Hello World</p>
</div>
HTML;
html_string( $html )->first_by_id( 'test' )->assertNodeHasClass( 'example' ); // Passes.
html_string( $html )->first_by_id( 'test' )->assertNodeHasClass( 'example', 'another-class' ); // Passes.
html_string( $html )->first_by_id( 'test' )->assertNodeHasClass( 'example', 'non-existent-class' ); // Fails.
}
}
assertQuerySelectorExists
Assert that a given CSS selector exists in the HTML.
$html->assertQuerySelectorExists( string $selector );
assertQuerySelectorMissing
Assert that a given CSS selector does not exist in the HTML.
$html->assertQuerySelectorMissing( string $selector );
assertElementExists
Assert that a given XPath exists in the HTML.
$html->assertElementExists( string $expression );
assertElementMissing
Assert that a given XPath does not exist in the HTML.
$html->assertElementMissing( string $expression );
assertElementExistsByClass
Assert that a given class exists in the HTML.
$html->assertElementExistsByClass( string $class );
assertElementMissingByClass
Assert that a given class does not exist in the HTML.
$html->assertElementMissingByClass( string $class );
assertElementExistsById
Assert that a given ID exists in the HTML.
$html->assertElementExistsById( string $id );
assertElementMissingById
Assert that a given ID does not exist in the HTML.
$html->assertElementMissingById( string $id );
assertElementExistsByTagName
Assert that a given tag name exists in the HTML.
$html->assertElementExistsByTagName( string $tag_name );
assertElementMissingByTagName
Assert that a given tag name does not exist in the HTML.
$html->assertElementMissingByTagName( string $tag_name );
assertElementCount
Assert that the HTML has the expected number of elements matching the given XPath expression.
$html->assertElementCount( string $expression, int $expected );
assertQuerySelectorCount
Assert that the HTML has the expected number of elements matching the given CSS selector.
$html->assertQuerySelectorCount( string $selector, int $expected );
assertElementExistsByTestId
Assert that an element with the given data-testid attribute exists in the HTML.
$html->assertElementExistsByTestId( string $test_id );
assertElementMissingByTestId
Assert that an element with the given data-testid attribute does not exist in the HTML.
$html->assertElementMissingByTestId( string $test_id );
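The signatures above can be combined as in the following sketch. It only calls assertions documented in this section, assumes Mantle is installed with Composer's autoloader loaded, and is meant to run inside a PHPUnit test method. The markup is made up for illustration.

```php
<?php
// Assumption: this runs inside a PHPUnit test with Mantle autoloaded.
use function Mantle\Support\Helpers\html_string;

$markup = <<<'HTML'
<div id="test" class="wrapper">
  <p class="example" data-testid="greeting">Hello World</p>
  <p class="example">Hello Universe</p>
</div>
HTML;

$html = html_string( $markup );

// CSS selector assertions.
$html->assertQuerySelectorExists( 'p.example' );
$html->assertQuerySelectorMissing( 'span' );
$html->assertQuerySelectorCount( '.example', 2 );

// XPath assertions.
$html->assertElementExists( '//div[@id="test"]/p' );
$html->assertElementCount( '//p', 2 );

// Convenience assertions by ID, class, tag name, and test ID.
$html->assertElementExistsById( 'test' );
$html->assertElementExistsByClass( 'wrapper' );
$html->assertElementMissingByTagName( 'h1' );
$html->assertElementExistsByTestId( 'greeting' );
$html->assertElementMissingByTestId( 'farewell' );
```

Note that the assertElement* methods take XPath expressions while the assertQuerySelector* methods take CSS selectors.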
assertElement
Assert that the given element exists in the HTML and passes the given assertion. This can be used to make custom assertions against the element that cannot be expressed in a simple XPath expression or query selector.
$html->assertElement( string $expression, callable $assertion, bool $pass_any = false );
If $pass_any is true, the assertion will pass if any of the elements pass
the assertion. Otherwise, all elements must pass the assertion. Let's take a
look at an example:
use DOMElement;
$html->assertElement(
'//div',
// The callback returns true when the element passes the check.
fn ( DOMElement $element ) => 'Hello World' === $element->textContent
&& '' !== $element->getAttribute( 'class' ),
);
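The difference between the default "all elements must pass" behavior and $pass_any can be sketched in plain PHP. The `nodes_pass()` helper below is hypothetical and is not part of Mantle. It assumes the callback returns a boolean and only mirrors the semantics described above using the built-in DOM extension.

```php
<?php
/**
 * Hypothetical helper (not part of Mantle): checks the nodes matched by an
 * XPath expression against a callback that returns true or false.
 */
function nodes_pass( DOMXPath $xpath, string $expression, callable $check, bool $pass_any = false ): bool {
    $nodes = $xpath->query( $expression );

    // No match at all means the element does not exist.
    if ( false === $nodes || 0 === $nodes->length ) {
        return false;
    }

    foreach ( $nodes as $node ) {
        $passed = (bool) $check( $node );

        if ( $pass_any && $passed ) {
            return true;  // One passing element is enough.
        }

        if ( ! $pass_any && ! $passed ) {
            return false; // Every element must pass.
        }
    }

    // Either nothing passed in "any" mode, or everything passed in "all" mode.
    return ! $pass_any;
}

$document = new DOMDocument();
$document->loadHTML( '<div class="a">Hello World</div><div>Goodbye</div>' );
$xpath = new DOMXPath( $document );

$is_hello = fn ( DOMElement $element ) => 'Hello World' === $element->textContent;

var_dump( nodes_pass( $xpath, '//div', $is_hello ) );       // bool(false)
var_dump( nodes_pass( $xpath, '//div', $is_hello, true ) ); // bool(true)
```

With two `<div>` elements and only one containing "Hello World", the default mode fails while the "any" mode passes.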
assertQuerySelector
Assert that the given CSS selector exists in the HTML and passes the given
assertion. Similar to assertElement, this can be used to make custom
assertions against the element that cannot be expressed in a simple XPath
expression or query selector.
$html->assertQuerySelector( string $selector, callable $assertion, bool $pass_any = false );
Let's take a look at an example:
use DOMElement;
$html->assertQuerySelector(
'div > p',
// The callback returns true when the element passes the check.
fn ( DOMElement $element ) => 'Hello World' === $element->textContent
&& '' !== $element->getAttribute( 'class' ),
);