
Feed discovery does not work with relative URLs in links #1385

Open
3 tasks done
mormegil-cz opened this issue May 28, 2021 · 13 comments

@mormegil-cz
Contributor

IMPORTANT

Read and tick the following checkbox after you have created the issue or place an x inside the brackets ;)

  • I have read the CONTRIBUTING.md and followed the provided tips
  • I accept that the issue will be closed without comment if I do not check here
  • I accept that the issue will be closed without comment if I do not fill out all items in the issue template.

Explain the Problem

When I tried to add a blog to the News reader, it failed: News repeatedly claimed that the hostname could not be found.

The (first) problem I found is that, during the discovery phase, the href attributes of <link> elements are used exactly as written. This does not work for relative URLs, which the spec allows.

Steps to Reproduce

Explain what you did to encounter the issue

  1. Try to add a new feed: https://k47.cz/
  2. An error appears: cURL error 6: Could not resolve host: rss.xml (see https://curl.haxx.se/libcurl/c/libcurl-errors.html)

The problem is that the k47.cz page links to its feed using a relative URL, <link rel=alternate type=application/rss+xml href=rss.xml title="RSS zdroj">. That URL is never resolved, so News just attempts to fetch a “URL” of http://rss.xml.

It might be argued that this is an upstream bug, because feed-io’s Explorer could be expected to resolve relative URIs itself. That is hard to tell, since there is no specification of its expected behavior, AFAICT.
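The failure mode can be made concrete with a minimal PHP sketch of a discovery step that returns `<link>` href values exactly as they are written in the page. This is not feed-io's `Explorer`. The function name `discoverFeedHrefs` and the two accepted MIME types are assumptions chosen for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical discovery helper: returns the href of every
 * <link rel="alternate"> that advertises an RSS or Atom feed,
 * exactly as written in the page (no URL resolution is done).
 */
function discoverFeedHrefs(string $html): array
{
    $feedTypes = ['application/rss+xml', 'application/atom+xml'];

    $doc = new DOMDocument();
    // Silence libxml warnings about HTML5 markup it does not recognize.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $hrefs = [];
    foreach ($doc->getElementsByTagName('link') as $link) {
        // rel is a space-separated token list, e.g. "alternate feed".
        $rels = preg_split('/\s+/', strtolower(trim($link->getAttribute('rel'))));
        $type = strtolower(trim($link->getAttribute('type')));
        if (in_array('alternate', $rels, true) && in_array($type, $feedTypes, true)) {
            $hrefs[] = $link->getAttribute('href');
        }
    }
    return $hrefs;
}

// The <link> element from the k47.cz page, inside a minimal document.
$page = '<!doctype html><html><head>'
    . '<link rel=alternate type=application/rss+xml href=rss.xml title="RSS zdroj">'
    . '</head><body></body></html>';

echo implode(PHP_EOL, discoverFeedHrefs($page)), PHP_EOL; // prints: rss.xml
```

Given the k47.cz element, the sketch yields the bare string `rss.xml`. Before fetching, any client has to resolve that string against the page URL.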

I was able to fix the problem by resolving relative URLs after discovery:

Patch fixing the problem
--- FeedServiceV2.php.bak       2021-05-28 07:48:45.524385111 +0000
+++ FeedServiceV2.php   2021-05-28 07:58:19.287691101 +0000
@@ -16,6 +16,7 @@
 use FeedIo\Explorer;
 use FeedIo\Reader\ReadErrorException;
 use HTMLPurifier;
+use Net_URL2;

 use OCA\News\Db\FeedMapperV2;
 use OCA\News\Fetcher\FeedFetcher;
@@ -199,7 +200,13 @@
         if ($full_discover) {
             $feeds = $this->explorer->discover($feedUrl);
             if ($feeds !== []) {
-                $feedUrl = array_shift($feeds);
+                $discoveredUrl = array_shift($feeds);
+                $url2 = new Net_URL2($discoveredUrl);
+                if ($url2->isAbsolute()) {
+                    $feedUrl = $discoveredUrl;
+                } else {
+                    $feedUrl = strval((new Net_URL2($feedUrl))->resolve($discoveredUrl));
+                }
             }
         }
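The patch hands the resolution step to PEAR's Net_URL2. As a dependency-free alternative, the reference-resolution algorithm from RFC 3986, section 5.2, can be sketched in plain PHP. The function names (`splitUri`, `removeDotSegments`, `mergePaths`, `resolveReference`) are hypothetical and do not belong to News or feed-io. Treat this as a sketch checked against the RFC's own examples, not as a drop-in replacement.

```php
<?php
declare(strict_types=1);

// Hypothetical helpers sketching RFC 3986, section 5.2 (not News or feed-io code).

/** Split a reference with the regex from RFC 3986, Appendix B; null = undefined. */
function splitUri(string $uri): array
{
    preg_match('~^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?~s',
        $uri, $m, PREG_UNMATCHED_AS_NULL);
    return [
        'scheme' => $m[2] ?? null,
        'authority' => $m[4] ?? null,
        'path' => $m[5] ?? '',
        'query' => $m[7] ?? null,
        'fragment' => $m[9] ?? null,
    ];
}

/** remove_dot_segments (RFC 3986, section 5.2.4); comments name its rules A-E. */
function removeDotSegments(string $in): string
{
    $out = '';
    while ($in !== '') {
        if (preg_match('~^\.\.?/~', $in, $m) === 1) {        // A: drop "../" or "./"
            $in = substr($in, strlen($m[0]));
        } elseif (preg_match('~^/\.(/|$)~', $in) === 1) {    // B: "/./" or "/." -> "/"
            $in = '/' . substr($in, 3);
        } elseif (preg_match('~^/\.\.(/|$)~', $in) === 1) {  // C: "/../" or "/.." -> "/",
            $in = '/' . substr($in, 4);                      //    dropping the last output segment
            $cut = strrpos($out, '/');
            $out = $cut === false ? '' : substr($out, 0, $cut);
        } elseif ($in === '.' || $in === '..') {             // D
            $in = '';
        } else {                                             // E: move first segment to output
            $next = strpos($in, '/', 1);
            $cut = $next === false ? strlen($in) : $next;
            $out .= substr($in, 0, $cut);
            $in = substr($in, $cut);
        }
    }
    return $out;
}

/** merge (RFC 3986, section 5.2.3). */
function mergePaths(array $base, string $refPath): string
{
    if ($base['authority'] !== null && $base['path'] === '') {
        return '/' . $refPath;
    }
    $slash = strrpos($base['path'], '/');
    return ($slash === false ? '' : substr($base['path'], 0, $slash + 1)) . $refPath;
}

/** Resolve $ref against $base (RFC 3986, section 5.2.2) and recompose (section 5.3). */
function resolveReference(string $base, string $ref): string
{
    $b = splitUri($base);
    $r = splitUri($ref);
    $t = ['scheme' => $r['scheme'] ?? $b['scheme'], 'fragment' => $r['fragment']];

    if ($r['scheme'] !== null || $r['authority'] !== null) {
        // Absolute URI ("https://...") or network-path reference ("//host/...").
        $t['authority'] = $r['authority'];
        $t['path'] = removeDotSegments($r['path']);
        $t['query'] = $r['query'];
    } else {
        $t['authority'] = $b['authority'];
        if ($r['path'] === '') {
            $t['path'] = $b['path'];
            $t['query'] = $r['query'] ?? $b['query'];
        } else {
            // Absolute-path ("/feed") or relative-path ("rss.xml") reference.
            $path = $r['path'][0] === '/' ? $r['path'] : mergePaths($b, $r['path']);
            $t['path'] = removeDotSegments($path);
            $t['query'] = $r['query'];
        }
    }

    return ($t['scheme'] !== null ? $t['scheme'] . ':' : '')
        . ($t['authority'] !== null ? '//' . $t['authority'] : '')
        . $t['path']
        . ($t['query'] !== null ? '?' . $t['query'] : '')
        . ($t['fragment'] !== null ? '#' . $t['fragment'] : '');
}

echo resolveReference('https://k47.cz/', 'rss.xml'), PHP_EOL;       // https://k47.cz/rss.xml
echo resolveReference('https://k47.cz/blog/', '/rss.xml'), PHP_EOL; // https://k47.cz/rss.xml
```

A reference that already has a scheme goes through the first branch and comes back unchanged apart from dot-segment removal. A caller could therefore apply `resolveReference()` to every discovered href, without the separate `isAbsolute()` check used in the patch.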

System Information

  • News app version: 15.4.5
  • Nextcloud version: 20.0.9
  • Cron type: Cron running on systemd timer
  • PHP version: 7.4.18
  • Database and version: mysql 10.5.10
  • Browser and version: Firefox 88.0
  • OS and version: Arch Linux/4.14.232
Contents of nextcloud/data/nextcloud.log
{"reqId":"kqjrWpgGULCAqRwS7NEa","level":0,"time":"2021-05-28T07:38:23+00:00","remoteAddr":"192.0.0.123","user":"Mormegil","app":"news","method":"POST","url":"/apps/news/feeds","message":"new parser added : FeedIo\\Standard\\Json","userAgent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:88.0) Gecko/20100101 Firefox/88.0","version":"20.0.9.1"}
{"reqId":"kqjrWpgGULCAqRwS7NEa","level":0,"time":"2021-05-28T07:38:23+00:00","remoteAddr":"192.0.0.123","user":"Mormegil","app":"news","method":"POST","url":"/apps/news/feeds","message":"new parser added : FeedIo\\Standard\\Atom","userAgent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:88.0) Gecko/20100101 Firefox/88.0","version":"20.0.9.1"}
{"reqId":"kqjrWpgGULCAqRwS7NEa","level":0,"time":"2021-05-28T07:38:23+00:00","remoteAddr":"192.0.0.123","user":"Mormegil","app":"news","method":"POST","url":"/apps/news/feeds","message":"new parser added : FeedIo\\Standard\\Rss","userAgent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:88.0) Gecko/20100101 Firefox/88.0","version":"20.0.9.1"}
{"reqId":"kqjrWpgGULCAqRwS7NEa","level":0,"time":"2021-05-28T07:38:23+00:00","remoteAddr":"192.0.0.123","user":"Mormegil","app":"news","method":"POST","url":"/apps/news/feeds","message":"new parser added : FeedIo\\Standard\\Rdf","userAgent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:88.0) Gecko/20100101 Firefox/88.0","version":"20.0.9.1"}
{"reqId":"kqjrWpgGULCAqRwS7NEa","level":1,"time":"2021-05-28T07:38:23+00:00","remoteAddr":"192.0.0.123","user":"Mormegil","app":"news","method":"POST","url":"/apps/news/feeds","message":"discover feeds from https://k47.cz/","userAgent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:88.0) Gecko/20100101 Firefox/88.0","version":"20.0.9.1"}
{"reqId":"kqjrWpgGULCAqRwS7NEa","level":0,"time":"2021-05-28T07:38:23+00:00","remoteAddr":"192.0.0.123","user":"Mormegil","app":"news","method":"POST","url":"/apps/news/feeds","message":"read access : rss.xml into a feed instance (feed class : FeedIo\\Feed)","userAgent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:88.0) Gecko/20100101 Firefox/88.0","version":"20.0.9.1"}
{"reqId":"kqjrWpgGULCAqRwS7NEa","level":0,"time":"2021-05-28T07:38:23+00:00","remoteAddr":"192.0.0.123","user":"Mormegil","app":"news","method":"POST","url":"/apps/news/feeds","message":"start reading rss.xml","userAgent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:88.0) Gecko/20100101 Firefox/88.0","version":"20.0.9.1"}
{"reqId":"kqjrWpgGULCAqRwS7NEa","level":1,"time":"2021-05-28T07:38:23+00:00","remoteAddr":"192.0.0.123","user":"Mormegil","app":"news","method":"POST","url":"/apps/news/feeds","message":"no 'modifiedSince' parameter given, setting it to 01/01/1970","userAgent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:88.0) Gecko/20100101 Firefox/88.0","version":"20.0.9.1"}
{"reqId":"kqjrWpgGULCAqRwS7NEa","level":1,"time":"2021-05-28T07:38:23+00:00","remoteAddr":"192.0.0.123","user":"Mormegil","app":"news","method":"POST","url":"/apps/news/feeds","message":"hitting rss.xml","userAgent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:88.0) Gecko/20100101 Firefox/88.0","version":"20.0.9.1"}
{"reqId":"kqjrWpgGULCAqRwS7NEa","level":2,"time":"2021-05-28T07:38:23+00:00","remoteAddr":"192.0.0.123","user":"Mormegil","app":"news","method":"POST","url":"/apps/news/feeds","message":"rss.xml read error : cURL error 6: Could not resolve host: rss.xml (see https://curl.haxx.se/libcurl/c/libcurl-errors.html)","userAgent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:88.0) Gecko/20100101 Firefox/88.0","version":"20.0.9.1"}
{"reqId":"kqjrWpgGULCAqRwS7NEa","level":0,"time":"2021-05-28T07:38:23+00:00","remoteAddr":"192.0.0.123","user":"Mormegil","app":"news","method":"POST","url":"/apps/news/feeds","message":"cURL error 6: Could not resolve host: rss.xml (see https://curl.haxx.se/libcurl/c/libcurl-errors.html)","userAgent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:88.0) Gecko/20100101 Firefox/88.0","version":"20.0.9.1"}
@SMillerDev
Contributor

Per https://validator.w3.org/feed/check.cgi?url=https%3A%2F%2Fk47.cz%2Frss.xml, even the feed's self-reference is broken. I'd just recommend alerting the feed's author about this.

@mormegil-cz
Contributor Author

Yes, that was the other problem I hit; I have already contacted the author about that. However, this issue is not caused by the broken self-link in the feed.

@stale

stale bot commented Jul 21, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@stale stale bot added the stale label Jul 21, 2021
@mormegil-cz
Contributor Author

What does “no recent activity” mean? Should I keep commenting that yes, this is still broken?

@stale stale bot removed the stale label Aug 17, 2021
@SMillerDev
Contributor

It means that nobody has time or motivation to do something about it. So at some point it'll be closed automatically unless someone fixes it before then.

@stale

stale bot commented Jan 8, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@stale stale bot added the stale label Jan 8, 2022
mormegil-cz added a commit to mormegil-cz/news that referenced this issue Jan 12, 2022
When a feed is added using the feed discovery feature, and the feed link
uses a relative URL, the discovery needs to resolve the URL relative
to the provided website URL.

Fixes nextcloud#1385
mormegil-cz added a commit to mormegil-cz/news that referenced this issue Jan 12, 2022
When a feed is added using the feed discovery feature, and the feed link
uses a relative URL, the discovery needs to resolve the URL relative
to the provided website URL.

Fixes nextcloud#1385

Signed-off-by: Mormegil <[email protected]>
@stale stale bot closed this as completed Apr 16, 2022
@IgorA100
Contributor

IgorA100 commented Nov 4, 2023

Solved here: alexdebril/feed-io#422

@IgorA100
Contributor

I made another PR fix, but it hasn't been approved yet: alexdebril/feed-io#431

@Grotax
Member

Grotax commented Jan 10, 2025

Yeah, I saw that, thanks. Unfortunately, maintenance activity on feed-io seems to have dropped quite a bit...

@provokateurin
Member

The same is true for absolute paths, for example on https://notapplicable.dev.

@mormegil-cz
Contributor Author

mormegil-cz commented Feb 3, 2025

Yeah, an “absolute path” (/somewhere/something) is still a “relative URI” (resolved relatively to the base URI). :-)
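The distinction can be made concrete with RFC 3986 terminology. A reference with a scheme is a URI. Anything without a scheme is a relative reference (section 4.2), which is further split into network-path, absolute-path and relative-path references. Below is a minimal sketch of this classification. It assumes a hypothetical `classifyReference()` helper that only inspects the leading characters, which is enough to show that both `rss.xml` and `/somewhere/something` need resolution against the page URL.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: names the kind of URI reference a string is,
 * using RFC 3986 terminology (sections 3 and 4.2).
 */
function classifyReference(string $ref): string
{
    // scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ), followed by ":"
    if (preg_match('~^[A-Za-z][A-Za-z0-9+.-]*:~', $ref) === 1) {
        return 'uri';            // has a scheme: usable without a base
    }
    if (strncmp($ref, '//', 2) === 0) {
        return 'network-path';   // relative: takes the scheme from the base
    }
    if (strncmp($ref, '/', 1) === 0) {
        return 'absolute-path';  // relative: takes scheme and authority from the base
    }
    return 'relative-path';      // relative: also merges with the base path
}

foreach (['https://k47.cz/rss.xml', '//k47.cz/rss.xml', '/somewhere/something', 'rss.xml'] as $ref) {
    printf("%-24s %s\n", $ref, classifyReference($ref));
}
```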

[Also, note that this was (automatically) closed as either “stale” or “completed” [!], but it is very much not completed: it is still the same problem as always. And “You do not have permissions to reopen this issue”, though reopening would not help anyway; I guess it would just get closed automatically as stale again. Ceterum censeo https://nostalebots.xyz/ etc.]

@provokateurin
Member

Well, I have the power. And although there is an upstream PR, it has not been merged, and the problem is not fixed in the News app itself either.

@provokateurin provokateurin reopened this Feb 3, 2025
@stale stale bot removed the stale label Feb 3, 2025
@Grotax
Member

Grotax commented Feb 3, 2025

So, regarding this issue: unless someone here takes over maintenance of feed-io, I see no way to fix this.

Taking over maintenance would mean forking the repository, checking all the open PRs and merging them if needed.

It would also mean setting up the release and publishing procedure, because News would still want to pull the library in via Composer, and other projects probably would too.

I have been thinking about doing this myself but haven't had the motivation or time to do it.

Being the maintainer of a project does not mean you have to do all the work, such as coding. It means responding to questions where possible, reviewing PRs, and so on.
