README.md: 221 additions & 8 deletions
@@ -4,7 +4,7 @@ Most of the PHP parsers I encountered in the past were either too complicated, *
 
 # pQuery Web Scraper tutorial
 ## Getting started
-To start coding with pQuery simply include the main PHP file in this repository and initilize an object class like this:
+To start coding with pQuery simply include the main PHP file in this repository and initialize an object class like this:
 ```php
 // include webscraper.php file
 include "path/webscraper.php";
@@ -409,7 +409,8 @@ These functions are built to test if an element has a specific class/attribute o
 
 ```html
 <p class="rice">Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas.</p>
-<p data-target="lord">Donec eu libero sit amet quam egestas semper. Aenean ultricies mi vitae est. Mauris placerat eleifend leo.</p>
+<p data-target="lord">Donec eu libero sit amet quam egestas semper. Aenean ultricies mi vitae est. Mauris
+placerat eleifend leo.</p>
 ```
 
 ### Php
@@ -437,10 +438,10 @@ p[1] has class "rice": true
 p[2] has attribute "data-target" with value "lady": false
 ```
 
-## `delete`
+## `remove` and `empty`
 
-The delete function is used to delete DOM nodes.
-It accepts a boolean as a parameter: `true` or `false`, `true` tells it to keep its inner content (be it text or html nodes), while `false` tells it the opposite. The default parameter is set to `true`.
+The remove function is used to delete DOM nodes.
+It accepts a boolean parameter: `true` tells it to keep the node's inner content (text or HTML nodes), while `false` tells it to delete that content as well. The default is `false`. The `empty` function, by contrast, clears out a node's inner HTML and keeps the node itself.
 
 ### `$html`
 
@@ -471,7 +472,7 @@ include "path/webscraper.php";
 $doc = new WebScraper();
 $doc->loadHTML($html);
 
-$doc->Q("style")->delete();
+$doc->Q("style")->remove();
 
 $doc->echo();
 ```
@@ -494,7 +495,7 @@ include "path/webscraper.php";
 $doc = new WebScraper();
 $doc->loadHTML($html);
 
-$doc->Q("p")->delete(true);
+$doc->Q("p")->remove(true);
 
 $doc->echo();
 ```
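As a rough illustration of the `remove` and `empty` semantics described above, the sketch below reproduces the three behaviours with PHP's built-in DOM extension. It is not pQuery's implementation: the helper names `removeNode`, `unwrapNode`, `emptyNode` and `demo` are hypothetical and exist only for this example.

```php
<?php
// Sketch only: mimics the described behaviours with PHP's DOM extension.
// None of these helpers are part of pQuery.

// Like remove(false): drop the node together with everything inside it.
function removeNode(DOMNode $node): void
{
    $node->parentNode->removeChild($node);
}

// Like remove(true): drop the node but keep its inner content in place.
function unwrapNode(DOMNode $node): void
{
    $parent = $node->parentNode;
    while ($node->firstChild !== null) {
        // insertBefore() moves the child out of $node and places it in front of $node
        $parent->insertBefore($node->firstChild, $node);
    }
    $parent->removeChild($node);
}

// Like empty(): keep the node but clear its inner content.
function emptyNode(DOMNode $node): void
{
    while ($node->firstChild !== null) {
        $node->removeChild($node->firstChild);
    }
}

// Applies one helper to the first <p> of a small fixture and returns the result.
function demo(string $helper): string
{
    $dom = new DOMDocument();
    $dom->loadXML('<div><p>Hello <b>world</b></p><span>!</span></div>');
    $helper($dom->getElementsByTagName('p')->item(0));
    return $dom->saveXML($dom->documentElement);
}

echo demo('removeNode'), "\n"; // <div><span>!</span></div>
echo demo('unwrapNode'), "\n"; // <div>Hello <b>world</b><span>!</span></div>
echo demo('emptyNode'), "\n";  // <div><p/><span>!</span></div>
```

The middle case is the one that differs from a plain delete: the children are moved in front of the node before the node itself is dropped, so their order is preserved.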
@@ -511,9 +512,48 @@ Will output
 
 </body>
 ```
+An example with `empty`:
+```php
+include "path/webscraper.php";
+$doc = new WebScraper();
+$doc->loadHTML($html);
 
+$doc->Q("p[1]")->empty();
 
-## `iterate` and `replaceText`
+$doc->echo();
+```
+```html
+<head>
+<style>
+code {
+font-family: Consolas,"courier new";
+color: crimson;
+background-color: #f1f1f1;
+padding: 2px;
+font-size: 105%;
+}
+</style>
+</head>
+<body>
+
+<p></p>
+<p>The CSS <code>background-color</code> property defines the background color of an element.</p>
+
+</body>
+```
+## Tip
+
+You can delete the attributes of every tag by passing `::attributes` to `Q` and calling `remove`:
+```php
+include "path/webscraper.php";
+$doc = new WebScraper();
+$doc->loadHTML($html);
+
+$doc->Q("::attributes")->remove();
+
+$doc->echo();
+```
+## `hasAttr` and `hasClass`
 
 These functions are built to test if an element has a specific class/attribute or not. If it does, it returns true.
 
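The kind of check that `hasAttr` and `hasClass` are described as performing can be sketched with PHP's DOM extension. This is an illustration under assumptions, not pQuery's code: `hasClassSketch` and `hasAttrSketch` are hypothetical helpers, and the optional `$value` argument is only inferred from the sample output `has attribute "data-target" with value "lady"`.

```php
<?php
// Sketch only: hypothetical helpers built on PHP's DOM extension, not pQuery's code.

// True when $class appears among the element's whitespace-separated class names.
function hasClassSketch(DOMElement $el, string $class): bool
{
    $classes = preg_split('/\s+/', trim($el->getAttribute('class')), -1, PREG_SPLIT_NO_EMPTY);
    return in_array($class, $classes, true);
}

// True when the attribute exists and, if $value is given, matches it exactly.
function hasAttrSketch(DOMElement $el, string $name, ?string $value = null): bool
{
    if (!$el->hasAttribute($name)) {
        return false;
    }
    return $value === null || $el->getAttribute($name) === $value;
}

$dom = new DOMDocument();
$dom->loadXML('<body><p class="rice">one</p><p data-target="lord">two</p></body>');
$p = $dom->getElementsByTagName('p');

var_dump(hasClassSketch($p->item(0), 'rice'));               // bool(true)
var_dump(hasAttrSketch($p->item(1), 'data-target', 'lady')); // bool(false)
var_dump(hasAttrSketch($p->item(1), 'data-target', 'lord')); // bool(true)
```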
@@ -551,3 +591,176 @@ p[1] has class "rice": true
 
 p[2] has attribute "data-target" with value "lady": false
 ```
+## `replaceText` and `replaceTextCallback`
+
+Both work similarly to the native `preg_replace` and `preg_replace_callback` functions, respectively. The only differences are that you can choose whether or not to inject HTML/XML, and that they automatically iterate over all text nodes, or over the text nodes inside a chosen tag, leaving the overall XML/HTML structure untouched.
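For reference, the native functions mentioned above behave as shown below. The second half sketches one way to apply a replacement to text nodes only, using PHP's DOM extension and XPath. That part is an assumption made for illustration, not pQuery's implementation, and `replaceInTextNodes` is a hypothetical helper name.

```php
<?php
// Native functions the README compares against.
$text = 'Etiam eu ipsum nisi.';

// preg_replace: pattern, replacement string, subject.
echo preg_replace('/ipsum/', 'IPSUM', $text), "\n"; // Etiam eu IPSUM nisi.

// preg_replace_callback: the replacement is computed from each match.
echo preg_replace_callback(
    '/\b\w{5}\b/',
    fn(array $m): string => strtoupper($m[0]),
    $text
), "\n"; // ETIAM eu IPSUM nisi.

// Sketch (assumption): run a replacement over text nodes only, so tag names
// and attribute values are never seen by the pattern.
function replaceInTextNodes(DOMDocument $dom, string $pattern, string $replacement): void
{
    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//text()') as $node) {
        $node->nodeValue = preg_replace($pattern, $replacement, $node->nodeValue);
    }
}

$dom = new DOMDocument();
$dom->loadXML('<p class="ipsum">Etiam eu <b>ipsum</b> nisi.</p>');
replaceInTextNodes($dom, '/ipsum/', 'IPSUM');
echo $dom->saveXML($dom->documentElement), "\n";
// <p class="ipsum">Etiam eu <b>IPSUM</b> nisi.</p>
```

Note that the `class="ipsum"` attribute is left alone, which a plain `preg_replace` over the raw markup would not guarantee.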
+
+### `$html`
+```html
+
+<p>Nam finibus, neque et placerat condimentum, eros ligula mattis libero, eget aliquet nisi dolor nec ex.
+Cras eleifend et nulla rutrum mattis. Etiam eu ipsum nisi. Sed non placerat ante. Aliquam urna tellus,
+faucibus a risus quis, porta eleifend mauris. Nullam sagittis consequat faucibus. Nunc metus tortor,
+blandit sit amet odio sit amet, iaculis pulvinar ipsum. Morbi in urna vel leo fringilla efficitur.
+Vivamus eget rutrum sem. Phasellus posuere nunc sem, vel ultricies metus rutrum nec.</p>