Full article download
=====================

For feeds that provide only a summary, it's possible to download the full content directly from the original website.

How does the content grabber work?
----------------------------------

1. Try the rules for the domain name first (XPath patterns, see `PicoFeed\Rules\`)
2. Otherwise, try to find the text content by using common values of the `class` and `id` attributes
3. Finally, if nothing is found, display the content provided by the feed
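
The three steps above can be sketched as follows. This is only an illustration built on PHP's standard `DOMDocument` and `DOMXPath` classes, not PicoFeed's actual code: the function name `grab_content()` and the generic XPath candidates are assumptions made for the example.

```php
<?php

/**
 * Hypothetical helper (not part of PicoFeed): returns the first match,
 * trying the site rules, then generic guesses, then the feed content.
 */
function grab_content($html, array $ruleBody, $feedContent)
{
    if ($html === '') {
        return $feedContent;
    }

    $dom = new DOMDocument();
    // Suppress the warnings raised by real-world, non-validating HTML.
    @$dom->loadHTML($html);
    $xpath = new DOMXPath($dom);

    // Step 2 candidates: assumed examples of "common" class and id values.
    $generic = array(
        '//*[contains(@class, "post-content")]',
        '//*[contains(@class, "article")]',
        '//*[@id="content"]',
    );

    // Step 1 (site rules) is tried before step 2 (generic guesses).
    foreach (array_merge($ruleBody, $generic) as $query) {
        $nodes = $xpath->query($query);
        if ($nodes !== false && $nodes->length > 0) {
            return $dom->saveHTML($nodes->item(0));
        }
    }

    // Step 3: nothing matched, keep the content provided by the feed.
    return $feedContent;
}
```

Calling `grab_content($html, array('//div[@class="story-body"]'), $summary)` would return the matching `div` when the page contains one, and `$summary` otherwise.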

The content downloader uses a fake user agent, currently Google Chrome on Mac OS X.

However, the content grabber does not work well with every website.

**The best results are obtained with an XPath rules file.**


How to write a grabber rules file?
----------------------------------

Add a PHP file to the directory `vendor/fguillot/picofeed/lib/PicoFeed/Rules`. The filename must be the domain name followed by `.php`.

Example with the BBC website, `www.bbc.co.uk.php`:

```php
<?php

return array(
    'test_url' => 'http://www.bbc.co.uk/news/world-middle-east-23911833',
    'body' => array(
        '//div[@class="story-body"]',
    ),
    'strip' => array(
        '//script',
        '//form',
        '//style',
        '//*[@class="story-date"]',
        '//*[@class="story-header"]',
        '//*[@class="story-related"]',
        '//*[contains(@class, "byline")]',
        '//*[contains(@class, "story-feature")]',
        '//*[@id="video-carousel-container"]',
        '//*[@id="also-related-links"]',
        '//*[contains(@class, "share") or contains(@class, "hidden") or contains(@class, "hyper")]',
    )
);
```

Currently, only the `body`, `strip` and `test_url` keys are supported.
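
To give an idea of how such a rules array might be used, here is a minimal sketch that assumes `body` selects the nodes to keep and `strip` lists the nodes to remove. The function `apply_rule()` is hypothetical and the order of operations is an assumption; PicoFeed's real implementation may differ.

```php
<?php

/**
 * Hypothetical helper (not part of PicoFeed): removes the nodes matched
 * by 'strip', then returns the HTML of the nodes matched by 'body'.
 */
function apply_rule($html, array $rule)
{
    $strip = isset($rule['strip']) ? $rule['strip'] : array();
    $body = isset($rule['body']) ? $rule['body'] : array();

    $dom = new DOMDocument();
    // Suppress the warnings raised by real-world, non-validating HTML.
    @$dom->loadHTML($html);
    $xpath = new DOMXPath($dom);

    // Remove unwanted nodes first so they are absent from the body too.
    foreach ($strip as $query) {
        $nodes = $xpath->query($query);
        if ($nodes === false) {
            continue;
        }
        // Copy the list to an array before removing nodes from the tree.
        foreach (iterator_to_array($nodes) as $node) {
            if ($node->parentNode !== null) {
                $node->parentNode->removeChild($node);
            }
        }
    }

    // Concatenate the HTML of every node matched by the body patterns.
    $content = '';
    foreach ($body as $query) {
        $nodes = $xpath->query($query);
        if ($nodes === false) {
            continue;
        }
        foreach ($nodes as $node) {
            $content .= $dom->saveHTML($node);
        }
    }

    return $content;
}
```

Because a rules file returns an array, it could be passed directly, for example `apply_rule($html, require 'www.bbc.co.uk.php')`.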

Don't forget to send a pull request or open a ticket to share your contribution with everybody.

List of content grabber rules
-----------------------------

[List of existing rules on the repository](https://github.com/fguillot/miniflux/tree/master/vendor/fguillot/picofeed/lib/PicoFeed/Rules)

If you want to add new rules, just open a ticket and I will do it.