Full article download
=====================

For feeds that provide only a summary, it is possible to download the full content directly from the original website.

How does the content grabber work?
----------------------------------

1. Try the rules (XPath patterns) written for the domain name first
2. Try to find the text content by using common attributes for class and id
3. Finally, if nothing is found, the feed content is displayed
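
The fallback order above can be sketched as follows. This is a minimal illustration, not the actual grabber code: the function names `first_match()` and `grab_content()` and the list of "common" class and id candidates are assumptions made for the example.

```php
<?php

// Hypothetical helper: returns the HTML of the first node matched by any
// of the given XPath queries, or null when nothing matches.
function first_match(DOMXPath $xpath, array $queries)
{
    foreach ($queries as $query) {
        $nodes = $xpath->query($query);

        if ($nodes !== false && $nodes->length > 0) {
            $node = $nodes->item(0);
            return $node->ownerDocument->saveHTML($node);
        }
    }

    return null;
}

// Hypothetical entry point: $rules is the array returned by a rules file,
// $feed_content is the summary that came with the feed.
function grab_content($html, array $rules, $feed_content)
{
    if (trim($html) === '') {
        return $feed_content;
    }

    // Real-world markup is rarely valid, so collect parser errors silently.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);

    // 1. Rules written for the domain name
    if (! empty($rules['body'])) {
        $content = first_match($xpath, $rules['body']);

        if ($content !== null) {
            return $content;
        }
    }

    // 2. Common class and id attributes (illustrative candidates only)
    $candidates = array(
        '//*[contains(@class, "article-body")]',
        '//*[@id="content"]',
    );

    $content = first_match($xpath, $candidates);

    if ($content !== null) {
        return $content;
    }

    // 3. Nothing found: keep the content provided by the feed
    return $feed_content;
}
```

With this shape, a site without a rules file still gets a chance through the generic candidates, and the feed summary is never lost.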

The content downloader uses a fake user agent, currently Google Chrome on Mac OS X.
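
As a rough sketch of what sending a browser-like user agent looks like in PHP, the example below builds an HTTP stream context. The user agent string, the function names and the choice of PHP streams (rather than cURL) are assumptions for illustration; the real downloader may differ.

```php
<?php

// Illustrative value only; not taken from the Miniflux source.
const FAKE_USER_AGENT = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.63 Safari/537.36';

// Hypothetical helper: builds a stream context that identifies as a browser.
function build_download_context($user_agent = FAKE_USER_AGENT, $timeout = 10)
{
    return stream_context_create(array(
        'http' => array(
            'method' => 'GET',
            'user_agent' => $user_agent,
            'timeout' => $timeout,
            'follow_location' => 1,
        ),
    ));
}

// Hypothetical helper: returns the page HTML, or null when the download fails.
function download_page($url)
{
    $html = @file_get_contents($url, false, build_download_context());

    return $html === false ? null : $html;
}
```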

However, the content grabber does not work well with every website.

**The best results are obtained with an XPath rules file.**


How to write a grabber rules file?
----------------------------------

Add a PHP file to the `rules` directory. The filename must be the domain name followed by the suffix `.php`.

Example with the BBC website, `www.bbc.co.uk.php`:

```php
<?php

return array(
    'test_url' => 'http://www.bbc.co.uk/news/world-middle-east-23911833',
    'body' => array(
        '//div[@class="story-body"]',
    ),
    'strip' => array(
        '//script',
        '//form',
        '//style',
        '//*[@class="story-date"]',
        '//*[@class="story-header"]',
        '//*[@class="story-related"]',
        '//*[contains(@class, "byline")]',
        '//*[contains(@class, "story-feature")]',
        '//*[@id="video-carousel-container"]',
        '//*[@id="also-related-links"]',
        '//*[contains(@class, "share") or contains(@class, "hidden") or contains(@class, "hyper")]',
    )
);
```

Currently, only `body`, `strip` and `test_url` are supported.
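
To make the roles of `body` and `strip` concrete, here is a minimal sketch of how such a rules array could be applied to a page. The function name `apply_rules()` is hypothetical, and the order of operations (strip first, then concatenate every `body` match) is an assumption, not a description of the actual implementation.

```php
<?php

// Hypothetical helper: removes every node matched by the 'strip' queries,
// then returns the HTML of the nodes matched by the 'body' queries.
function apply_rules($html, array $rules)
{
    if (trim($html) === '') {
        return '';
    }

    // Real-world markup is rarely valid, so collect parser errors silently.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);

    $strip = isset($rules['strip']) ? $rules['strip'] : array();

    foreach ($strip as $query) {
        $nodes = $xpath->query($query);

        if ($nodes === false) {
            continue; // invalid XPath expression
        }

        // Work on a copy of the result set while detaching nodes.
        foreach (iterator_to_array($nodes) as $node) {
            if ($node->parentNode !== null) {
                $node->parentNode->removeChild($node);
            }
        }
    }

    $content = '';
    $body = isset($rules['body']) ? $rules['body'] : array();

    foreach ($body as $query) {
        $nodes = $xpath->query($query);

        if ($nodes === false) {
            continue;
        }

        foreach ($nodes as $node) {
            $content .= $dom->saveHTML($node);
        }
    }

    return $content;
}
```

Passing the array returned by a rules file (for example the result of `include 'rules/www.bbc.co.uk.php'`) together with the downloaded page would yield the article body without scripts, forms and the other stripped elements. The `test_url` key is not used here; it only records a page against which the rule can be checked.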

Don't forget to send a pull request or open a ticket to share your contribution with everybody.

List of content grabber rules
-----------------------------

[List of rules included by default](https://github.com/miniflux/miniflux/tree/master/vendor/fguillot/picofeed/lib/PicoFeed/Rules).