Commit Graph

97 Commits

Author SHA1 Message Date
Dag e65fd7c822
fix: remove debug line (#3092) 2022-10-06 21:05:50 +02:00
Dag 5d18852108
fix: more verbose error in fb (#3089) 2022-10-05 19:30:42 +02:00
Jan Tojnar 951092eef3
Fix coding style missed by phpbcf (#2901)
$ composer require --dev friendsofphp/php-cs-fixer

$ echo >.php-cs-fixer.dist.php "<?php

$finder = PhpCsFixer\Finder::create()
    ->in(__DIR__);

$rules = [
    '@PSR12' => true,
    // '@PSR12:risky' => true,
    '@PHP74Migration' => true,
    // '@PHP74Migration:risky' => true,
    // buggy, duplicates existing comment sometimes
    'no_break_comment' => false,
    'array_syntax' => true,
    'lowercase_static_reference' => true,
    'visibility_required' => false,
    // Too much noise
    'binary_operator_spaces' => false,
    'heredoc_indentation' => false,
    'trailing_comma_in_multiline' => false,
];

$config = new PhpCsFixer\Config();

return $config
    ->setRules($rules)
    // ->setRiskyAllowed(true)
    ->setFinder($finder);

"

$ vendor/bin/php-cs-fixer --version
PHP CS Fixer 3.8.0 BerSzcz against war! by Fabien Potencier and Dariusz Ruminski.
PHP runtime: 8.1.7

$ vendor/bin/php-cs-fixer fix
$ rm .php-cs-fixer.cache
$ vendor/bin/php-cs-fixer fix
2022-07-08 13:00:52 +02:00
Dag 4f75591060
Reformat codebase v4 (#2872)
Reformat code base to PSR12

Co-authored-by: rssbridge <noreply@github.com>
2022-07-01 15:10:30 +02:00
Dag 5076d09de6
refactor: prepare for PSR2 (#2859) 2022-06-24 18:29:35 +02:00
Eugene Molotov 37cb4091d4
bridges: remove redundant "or returnServerError" after getContents/getSimpleHTMLDom/getSimpleHTMLDomCached (#2398)
When fetching website contents, exceptions already raise on fetching error
2022-01-02 14:36:09 +05:00
Eugene Molotov 4bc534c80f [FacebookBridge] teromene and logmanoriginal do not maintain this bridge defacto 2021-12-03 05:31:46 +05:00
triatic 9c99a1a9c1
[FacebookBridge] Increase cache timeout (#2149)
Facebook aggressively throttles queries now
2021-06-12 23:15:56 +05:00
Joshua Coales efe32aad22
[FacebookBridge] Permalink fix (#1838)
Also removing timestamp which does not work
2020-11-06 18:43:10 +05:00
Joshua Coales 6af87b2f32
[FacebookBridge] Use touch.facebook.com for groups (#1817) 2020-10-29 08:42:49 +05:00
Joshua Coales 45e2f385b3
[FacebookBridge] Handle mobile links and unify host validation (#1789) 2020-10-15 14:08:03 +05:00
jannyba ef54a78430
[InstagramBridge] Fix "Skip reviews" checkbox description (#1702) 2020-08-16 11:23:48 +05:00
Eugene Molotov 6a90a9d33f
phpcs: fix new sudden violations (#1443) 2020-01-31 14:30:31 +01:00
LogMANOriginal df9f7eb778
[FacebookBridge] Fix permalink issue (#1358)
Facebook has changed their strategy regarding permalinks, which
now include lots of unnecessary target data. Fortunately it also
contains the unique story id which we can utilize as URI.
2019-12-01 13:24:11 +01:00
somini 2ac44172ac Facebook: Clarify Facebook bridges (#1221)
* Clarify Facebook bridges status

Distinguish between both Facebook bridges by their title.
This preserves all existing URLs.

* Update all URLs to secure HTTPS versions.
* Configure author name abbreviation
* Improve feed names

Use the correct feed name on each bridge.
Make sure the feed names don't repeat the "Facebook" name.
2019-10-28 19:01:04 +01:00
somini 0eab63d728 Update Facebook URL detection (#1334)
* add detectParameters to FacebookBridge.php
2019-10-16 21:32:29 +02:00
triatic 53fbd2a5a0 [FacebookBridge] Prevent sending empty header (#1239)
* [FacebookBridge] Prevent sending empty header

When running in CLI mode, `getEnv('HTTP_ACCEPT_LANGUAGE')` returns `false`. In that case, don't send the `Accept-Language` header.
2019-09-07 18:32:06 +02:00
triatic 21b27a1042 [FacebookBridge] Remove relative date from content (#1212)
Remove relative date from content, as well as the separator after it.

As mentioned in #1188.
2019-07-26 10:56:34 +02:00
logmanoriginal 6c4098d655 Revert "all: Use ->remove() instead of ->outertext = ''"
This reverts commit 052844f5e1.

There is a bug in ->remove() that causes the parser to incorrectly
identify elements in the DOM tree that shouldn't exist anymore.

References #1151
2019-06-02 13:06:16 +02:00
logmanoriginal 052844f5e1 all: Use ->remove() instead of ->outertext = ''
simplehtmldom 1.9 introduced new functions to recursively remove
nodes from the DOM. This allows removing elements without the need
to re-load the document by using $html->load($html->save()), which
is very inefficient.

Find more information about remove() at
https://simplehtmldom.sourceforge.io/docs/1.9/api/simple_html_dom_node/remove/
2019-06-01 21:29:57 +02:00
Aleś Bułojčyk b9bbc9bdda [FacebookBridge] Fix decoding of cyrillic letters in group names (#842) 2019-03-23 16:39:09 +01:00
triatic 81ee15a161 general: Fix PHP 7.3 deprecation warnings (#982)
Fix PHP 7.3 deprecation warnings. FILTER_VALIDATE_URL implies FILTER_FLAG_SCHEME_REQUIRED and FILTER_FLAG_HOST_REQUIRED since PHP 5.2.1

https://bugs.php.net/bug.php?id=75442
2018-12-28 16:13:03 +01:00
logmanoriginal e7d3a006c8 global: Fix code violations 2018-12-26 21:58:07 +01:00
triatic 3806895059 [FacebookBridge] Improve titles (#924)
A slightly improved version of #454 and #468 . Build titles from content rather than author + pre-content (which doesn't reflect anything useful).
2018-11-16 15:33:54 +01:00
triatic 599d438a0d [FacebookBridge] Decode all elements in $item (#925) 2018-11-16 15:25:58 +01:00
logmanoriginal cb91d9cce8 [FacebookBridge] Fix media origin info is not inside a tag
References #912
2018-11-08 19:24:14 +01:00
triatic bf91f106b4 [FacebookBridge] Remove "Posts" from author name (#917) 2018-11-08 19:04:58 +01:00
logmanoriginal 0b2ede35cd [FacebookBridge] Don't remove origin information from embedded media
References #912
2018-11-08 18:59:12 +01:00
logmanoriginal 5842bdfc83 [FacebookBridge] Simplify implementation 2018-11-08 18:45:25 +01:00
logmanoriginal 68ee24d6bd [FacebookBridge] Remove videos and views
This commit adds filters to remove embedded videos and view counts from
all posts. This doesn't remove the preview image for videos, which are
embedded separately.
2018-11-08 18:36:11 +01:00
logmanoriginal 104ae2298e [FacebookBridge] Remove hidden elements
Hidden elements are used for error conditions and generally made
visible using JavaScript. Since RSS-Bridge doesn't support JS, these
error messages are shown in the final feed. For example:

"It looks like you may be having problems playing this video. If so,
please try restarting your browser."

This commit removes all hidden elements to prevent error messages being
added to the feed.

- "It looks like you may be having problems playing this video. If so,
please try restarting your browser."
2018-11-08 18:24:05 +01:00
logmanoriginal 7026684e34 [FacebookBridge] Don't remove description of embedded media
FB includes origin information (i.e. "YOUTUBE.COM") as well as
descriptions with embedded media (images and video).

These details are currently being removed by the bridge.

This commit changes implementation to only remove origin information
and keep the media description in place. The media description consists
of two elements - title and description. The title provided by FB is
included in an anchor, which gets replaced by a paragraph with the
same contents to improve readability.

References #912
2018-11-08 18:12:57 +01:00
teromene 723bd1150a Remove tracking codes from Facebook posts 2018-11-06 16:58:58 +01:00
logmanoriginal 3031fa406d core: Set code in header() instead of calling http_response_code() 2018-11-05 19:29:01 +01:00
logmanoriginal 7621784598 bridges: Add favicon to bridges missing it
Adds favicon to bridges that support it. Some sites prevent downloading
favicons, those bridges are left untouched.

Affected bridges:

- AutoJMBridge
- BandcampBridge
- BlaguesDeMerdeBridge
- BloombergBridge
- BundesbankBridge
- ChristianDailyReporterBridge
- ContainerLinuxReleasesBridge
- DailymotionBridge
- DiceBridge
- DribbbleBridge
- EliteDangerousGalnetBridge
- ElsevierBridge
- FacebookBridge
- FB2Bridge
- FDroidBridge
- FierPandaBridge
- GooglePlusPostBridge
- JapanExpoBridge
- KATBridge
- KernelBugTrackerBridge
- LegifranceJOBridge
- NotAlwaysBridge
- NyaaTorrentsBridge
- PinterestBridge
- RadioMelodieBridge
- RainbowSixSiegeBridge
- SupInfoBridge
- TagBoardBridge
- TebeoBridge
- TheTVDBBridge
- WhydBridge
- ZoneTelechargementBridge
2018-10-26 19:10:58 +02:00
logmanoriginal c56f7abc2a [FacebookBridge] Reduce occurrence of HTTP error 302
Facebook returns "HTTP/1.1 302 Found" when requesting:
  https://www.facebook.com//pg/username/posts?_fb_noscript=1
Automatically redirecting to:
  https://www.facebook.com/username/posts/

We receive a positive response faster when directly requesting the
correct page:
  https://www.facebook.com/username/posts?_fb_noscript=1

Notice: This is just a minor adjustment to improve performance while
requesting data from the server. The previous version worked fine as
well.
2018-10-24 17:27:46 +02:00
logmanoriginal cb488d9d8c [FacebookBridge] Fix broken feeds
This commit collects the original contents from a different
tag to prevent this issue. The root cause is unknown but closely
related to the regex.

References #877
2018-10-20 15:45:20 +02:00
logmanoriginal 717b0bdd9c Fix items link to localhost
References #864
2018-10-16 19:16:51 +02:00
logmanoriginal 7561c0685d [FacebookBridge] Fix 'SpSonSsoSredS' text in title
The function 'defaultLinkTo' applied to the source HTML does break
regex matches later in the bridge. We need to apply the function
right before adding the contents to the item for the bridge to work
properly.

References #856
2018-10-15 19:53:46 +02:00
logmanoriginal 5779f641c0 [FacebookBridge] Add option to limit number of returned items
This commit adds a new optional parameter 'limit' which can be used
to limit the number of items returned by this bridge (i.e. '&limit=10')

As requested in #669
2018-10-15 17:35:10 +02:00
logmanoriginal fcc9f9fd61 [FacebookBridge] Use alternative URI to load more posts
The URI "https://facebook.com/username?_fb_noscript=1" returns two
posts per user. Some profiles, however, are very active, causing the
bridge to miss items if more than two posts are send within the cache
duration (5 minutes).

The alternative suggested in #669 is to use a different URI:
"https://facebook.com/pg/username/posts?_fb_noscript=1"

While the contents of this URI essentially look the same when viewed
in a browser, it actually returns more than 10 posts depending on the
profile.

References #669
2018-09-26 18:24:46 +02:00
logmanoriginal e1c4914b1c [FacebookBridge] Optimize for readability 2018-09-25 18:56:33 +02:00
ORelio de8cee6a1c Catching up | [Main] Debug mode, parse utils, MIME | [Bridges] Add/Improve 20 bridges (#802)
* Debug mode improvements

 - Improve debug warning message
 - Restore error reporting in debug mode
 - Fix 'notice' messages for unset fields

* Add parsing utility functions

html.php
 - extractFromDelimiters
 - stripWithDelimiters
 - stripRecursiveHTMLSection
 - markdownToHtml (partial)

bridges
 - remove now-duplicate functions
 - call functions from html.php instead

* [Anidex] New bridge

Anime torrent tracker

* [Anime-Ultime] Restore thumbnail

* [CNET] Recreate bridge

Full rewrite as the previous one was broken

* [Dilbert] Minor URI fix

Use new self::URI property

* [EstCeQuonMetEnProd] Fix content extraction

Bridge was broken

* [Facebook] Fix "SpSonsSoriSsés" label

... which was taking space in item title

* [Futura-Sciences] Use HTTPS, More cleanup

Use HTTPS as FS now offer HTTPS
Clean additional useless HTML elements

* [GBATemp] Multiple fixes

- Fix categories: missing "break" statements
- Restore thumbnail as enclosure
- Fix date extraction
- Fix user blog post extraction
- Use getSimpleHTMLDOMCached

* [JapanExpo] Fix bridge, HTTPS, thumbnails

- Fix getSimpleHTMLDOMCached call
- Upgrade to HTTPS as JE now offers HTTPS
- Restore thumbnails as enclosures

* [LeMondeInformatique] Fix bridge, HTTPS

- Upgrade to HTTPS as LMI now offers HTTPS
- Restore thumbnails using small images
- Fix content extraction
- Fix text encoding issue

* [Nextgov] Fix content extraction

- Restore thumbnail and use small image
- Field extraction fixes

* [NextInpact] Add categories and filtering by type

- Offer all RSS feeds
- Allow filtering by article type
- Implement extraction for brief articles
- Remove article limit, many brief articles are publied all at once

* [NyaaTorrents] New bridge

Anime torrent tracker

* [Releases3DS] Cache content, restore thumbnail

- Use getSimpleHTMLDOMCached
- Restore thumbnail as enclosure

* [TheHackerNews] Fix bridge

 - Fix content extraction including article body
 - Restore thumbnail as enclosure

* [WeLiveSecurity] HTTPS, Fix content extraction

- Upgrade to HTTPS as WLS now offers HTTPS
- Fix content extraction including article body

* [WordPress] Reduce timeout, more content selectors

- Reduce timeout to use default one (1h)
- Add new content selector (articleBody)
- Find thumbnail and set as enclosure
- Fix <script> cleanup

* [YGGTorrent] Increase limit, use cache

- Increase item limit as uploads are very frequent
- Use getSimpleHTMLDOMCached

* [ZDNet] Rewrite with FeedExpander

- Upgrade to HTTPS as ZD now offers HTTPS
- Use FeedExpander for secondary fields
- Fix content extraction for article body

* [Main] Handle MIME type for enclosures

Many feed readers will ignore enclosures (e.g. thumbnails) with no MIME type. This commit adds automatic MIME type detection based on file extension (which may be inaccurate but is the only way without fetching the content).

One can force enclosure type using #.ext anchor (hacky, needs improving)

* [FeedExpander] Improve field extraction

- Add support for passing enclosures
- Improve author and uri extraction
- Fix 'notice' PHP error messages

* [Pull] Coding style fixes for #802

* [Pull] Implementing changes for #802

 - Fix coding style issues with str append
 - Remove useless CACHE_TIMEOUT
 - Use count() instead of $limit
 - Use defaultLinkTo() + handle strings
 - Use http_build_query()
 - Fix missing </em>
 - Remove error_reporting(0)
 - warning CSS (@LogMANOriginal)
 - Fix typo in FeedExpander comment

* [Main] More documentation for markdownToHtml

See #802 for more details
2018-09-09 20:20:13 +01:00
Eugene Molotov bf30ad127c [FacebookBridge] Removes query string from post links
* [FacebookBridge] Removes query string from post links
2018-09-09 16:31:15 +01:00
LogMANOriginal 0d80a19e84
[FacebookBridge] Add context for public Facebook groups (#739)
The previous context is now labeled 'User', while the new context is
labeled 'Group'. The existing code was not changed, instead new group*
functions were implemented to handle groups.

The general principle of capturing groups is the same as done for users
with adjustments to account for different HTML structures.

Captcha responses are currently not supported for groups! There doesn't
seem to be a way to trigger them consistently, which makes it hard to
handle them properly.

Features of the group context:

- The feed title is based on the group name
- The group URI used for capturing is returned for the feed URI
- Author names and timestamps are reproduced from the source
- Post titles are reproduced from the source if they exist, otherwise
the title is build manually from the author name and the content
- Original contents are included with the feed
- All images are attached as enclosures as well

Closes #
2018-07-08 17:16:00 +02:00
LogMANOriginal 193ca87afa [phpcs] enforce single quotes (#732)
* [phpcs] Add rule to enforce single quoted strings
2018-06-29 22:55:33 +01:00
logmanoriginal 5087f5f79e [FacebookBridge] Support facebook links as user name
Allows users to paste facebook links as user name. The link must contain
the correct host (www.facebook.com) and a valid path (/user-name/...).
The first part of the path is used for the user name. Errors are returned
in case something went wrong.

References #706
2018-06-24 11:14:08 +02:00
logmanoriginal 4a5f190e0e [FacebookBridge] Add option to skip reviews
Reviews are provided the same way as summary posts and therefore returned
as separate feed item for each review. This commit adds a new option
'&skip_reviews=on' to skip reviews entirely.

References #706
2018-06-24 10:52:22 +02:00
teromene 79ebdc4b39 Warn the user when trying to fetch a non-public facebook page. 2018-05-05 13:49:49 +01:00
logmanoriginal 6caca4946b bridges: Fix bridges with custom headers and options
This commit fixes bridges which called getContents, getSimpleHTMLDOM
or getSimpleHTMLDOMCached with custom settings.
2018-04-06 20:42:19 +02:00