2014-04-05 21:58:17 -04:00
Full article download
=====================
For feeds that provide only a summary, it's possible to download the full content directly from the original website.

How does the content grabber work?
----------------------------------

1. Try the rules (XPath patterns) defined for the domain name first
2. Try to find the text content by using common attributes for class and id
3. Finally, if nothing is found, display the feed content

However, the content grabber does not work well with every website, especially those that rely heavily on JavaScript to generate their content.

**The best results are obtained with an XPath rules file.**

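The three steps above can be sketched in a few lines of PHP. This is only an illustration built on PHP's `DOMDocument` and `DOMXPath` classes, not the actual PicoFeed implementation: the function name `grab_content` and the list of common class and id values are assumptions made for the example.

```php
<?php

// Hypothetical sketch of the three-step fallback, not the real grabber code.
function grab_content(string $html, array $rules, string $feedContent): string
{
    $dom = new DOMDocument();
    // Real-world markup is rarely valid, so parser warnings are silenced.
    @$dom->loadHTML($html);
    $xpath = new DOMXPath($dom);

    // Step 1: XPath patterns taken from the rules file of the domain.
    $queries = $rules['body'] ?? array();

    // Step 2: common class and id values (an assumed, illustrative list).
    foreach (array('article', 'content', 'post', 'entry') as $name) {
        $queries[] = '//*[contains(@class, "'.$name.'") or @id="'.$name.'"]';
    }

    foreach ($queries as $query) {
        $nodes = $xpath->query($query);

        if ($nodes !== false && $nodes->length > 0) {
            // Return the markup of the first matching element.
            return $dom->saveHTML($nodes->item(0));
        }
    }

    // Step 3: nothing matched, keep the content provided by the feed.
    return $feedContent;
}

$html = '<html><body><div class="story-body"><p>Full text</p></div></body></html>';
$rules = array('body' => array('//div[@class="story-body"]'));

// The rule matches, so the full article markup is printed.
echo grab_content($html, $rules, 'Summary only'), "\n";

// No rule and no common attribute match, so the feed summary is printed.
echo grab_content('<html><body><p>Nothing useful</p></body></html>', array(), 'Summary only'), "\n";
```

Because the rule patterns are tried before the generic ones, a precise rules file always takes priority over guesswork based on class and id names.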
How to write a grabber rules file?
----------------------------------
Add a PHP file to the `rules` directory. The filename must be the domain name followed by the suffix `.php`.

Example for the BBC website, `www.bbc.co.uk.php`:

```php
<?php
return array(
    // Article used to check that the rules still work
    'test_url' => 'http://www.bbc.co.uk/news/world-middle-east-23911833',
    // XPath patterns that select the article content
    'body' => array(
        '//div[@class="story-body"]',
    ),
    // XPath patterns for the elements removed from the content
    'strip' => array(
        '//script',
        '//form',
        '//style',
        '//*[@class="story-date"]',
        '//*[@class="story-header"]',
        '//*[@class="story-related"]',
        '//*[contains(@class, "byline")]',
        '//*[contains(@class, "story-feature")]',
        '//*[@id="video-carousel-container"]',
        '//*[@id="also-related-links"]',
        '//*[contains(@class, "share") or contains(@class, "hidden") or contains(@class, "hyper")]',
    )
);
```
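To see what the `body` and `strip` patterns of a rules file do to a page, the following sketch applies them with PHP's `DOMXPath`. It is an assumption about how such rules could be applied, not the PicoFeed code itself, and the helper name `apply_rules` is made up for the example.

```php
<?php

// Hypothetical helper: remove everything matched by the `strip` patterns,
// then return the first element matched by a `body` pattern.
function apply_rules(string $html, array $rules): string
{
    $dom = new DOMDocument();
    // Real-world markup is rarely valid, so parser warnings are silenced.
    @$dom->loadHTML($html);
    $xpath = new DOMXPath($dom);

    foreach ($rules['strip'] ?? array() as $query) {
        $nodes = $xpath->query($query);

        if ($nodes === false) {
            continue; // Invalid XPath expression, skip it.
        }

        // Copy the matches into an array before removing them from the tree.
        foreach (iterator_to_array($nodes) as $node) {
            if ($node->parentNode !== null) {
                $node->parentNode->removeChild($node);
            }
        }
    }

    foreach ($rules['body'] ?? array() as $query) {
        $nodes = $xpath->query($query);

        if ($nodes !== false && $nodes->length > 0) {
            return $dom->saveHTML($nodes->item(0));
        }
    }

    return '';
}

$page = '<html><body><div class="story-body">'
    .'<p class="story-date">1 September</p>'
    .'<p>Article text</p>'
    .'<script>alert(1)</script>'
    .'</div></body></html>';

$sampleRules = array(
    'body' => array('//div[@class="story-body"]'),
    'strip' => array('//script', '//*[@class="story-date"]'),
);

// Prints the story body without the date paragraph and the script element.
echo apply_rules($page, $sampleRules), "\n";
```

Trying a pattern this way against a saved copy of an article is a quick way to check a new rule before adding it to a rules file.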
Currently, only the keys `body`, `strip` and `test_url` are supported.
Miniflux first looks for the file in the [default bundled rules directory](https://github.com/miniflux/miniflux/tree/master/vendor/fguillot/picofeed/lib/PicoFeed/Rules), then tries to load your custom rules.

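The naming convention and the lookup order can be illustrated as follows. This is a simplified sketch under stated assumptions: the function `load_rules` and the directory paths are hypothetical, and the real loader may behave differently in its details.

```php
<?php

// Hypothetical loader: derive the filename from the domain name of the URL
// and return the rules from the first directory that contains it.
function load_rules(string $url, array $directories): array
{
    $hostname = parse_url($url, PHP_URL_HOST);

    if (! is_string($hostname) || $hostname === '') {
        return array();
    }

    foreach ($directories as $directory) {
        // Example: rules/www.bbc.co.uk.php
        $filename = $directory.'/'.$hostname.'.php';

        if (is_file($filename)) {
            // A rules file returns an array, so include gives it back directly.
            $loaded = include $filename;

            return is_array($loaded) ? $loaded : array();
        }
    }

    return array();
}

// Bundled rules first, then custom rules (both paths are placeholders).
$found = load_rules(
    'http://www.bbc.co.uk/news/world-middle-east-23911833',
    array('/path/to/bundled/rules', __DIR__.'/rules')
);

echo empty($found) ? "No rules found\n" : "Rules loaded\n";
```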
Sharing your custom rules with the community
--------------------------------------------
If you would like to share your custom rules with everybody, send a pull request to the [PicoFeed](https://github.com/fguillot/picofeed) project.
They will then be merged into the Miniflux code base.

List of content grabber rules
-----------------------------
[List of rules included by default](https://github.com/miniflux/miniflux/tree/master/vendor/fguillot/picofeed/lib/PicoFeed/Rules).