ActivityPub objects contain links to Friendica for hashtags and mentions #8642

koehn · 2020-05-16T12:18:03Z

Expected behavior

Content in objects transmitted over ActivityPub should not contain hashtags and mentions wrapped in links to the Friendica server.

Actual behavior

Content in objects transmitted over ActivityPub contains hashtags and mentions wrapped in links to the Friendica server. The links make the hashtags unparsable by other servers, because they are inserted between the hashtag symbol # and the name of the hashtag. They also carry CSS class information that is relevant only to Friendica servers.

  "content": "Hello Brad! #<a href=\"https://friendica.mrpetovan.com/search?tag=hashtag\" class=\"tag\" rel=\"tag\" title=\"hashtag\">hashtag</a>",

The links lead to confusion for users, as they link away from the server they're on to a page on another server. This is inconsistent with the general user experience for these applications.

Steps to reproduce the problem

curl -H 'Accept: application/activity+json' "https://friendica.mrpetovan.com/objects/735a2029-125e-b326-fb6f-d81402846040" | jq .

annando · 2020-05-16T12:27:29Z

We create these tags in much the same way Pleroma and Mastodon do. To find (and possibly change) the links, have a look at the object's "tag" array, which should list all mentions and hashtags.

See https://mastodon.social/@heluecht/104178141231819096 and https://pleroma.soykaf.com/objects/84c48190-1925-469b-844f-57d2fb99e34c for comparison.
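As a rough sketch of reading that array on the receiving side, the Python below (stdlib only) splits an object's "tag" property into hashtag names and mention URLs. It assumes the Mastodon-style convention of entries with "type": "Hashtag" and a "#"-prefixed name, and "type": "Mention" with an href; the function name, the sample object and its example.* hosts are hypothetical.

```python
import json


def tags_from_object(obj):
    """Split an ActivityPub object's "tag" property into hashtag names and mention hrefs.

    Assumes Mastodon-style entries: {"type": "Hashtag", "name": "#foo", ...}
    and {"type": "Mention", "href": "https://...", ...}.
    """
    raw = obj.get("tag", [])
    if isinstance(raw, dict):  # a lone tag may be serialized as an object, not a list
        raw = [raw]
    hashtags, mentions = [], []
    for entry in raw:
        if not isinstance(entry, dict):
            continue  # skip bare strings/links we don't know how to interpret
        kind = entry.get("type")
        if kind == "Hashtag" and entry.get("name"):
            hashtags.append(entry["name"].lstrip("#"))
        elif kind == "Mention" and entry.get("href"):
            mentions.append(entry["href"])
    return hashtags, mentions


# Hypothetical object; the hosts are placeholders, not real servers.
sample = json.loads("""
{
  "type": "Note",
  "content": "Hello Brad! #hashtag",
  "tag": [
    {"type": "Hashtag", "name": "#hashtag", "href": "https://friendica.example/search?tag=hashtag"},
    {"type": "Mention", "name": "@brad@mammoth.example", "href": "https://mammoth.example/users/brad"}
  ]
}
""")

print(tags_from_object(sample))  # (['hashtag'], ['https://mammoth.example/users/brad'])
```

Because the array names each tag directly, a receiver can use it without digging through the HTML in "content".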

koehn · 2020-05-16T13:10:50Z

It’s pretty absurd to me to present the user with a link that yanks them away from their platform to one where they’re not logged in, possibly with a different UX, and they cannot take any meaningful action on the information presented when they get there. The content doesn’t even appear in a new window.

Even more absurd is to expect other developers to parse out HTML in order to get back to the original, untainted content the user entered in the first place, when it would be easier not to add that HTML at all.

Those links should be applied in the presentation layer: the raw hashtags and mentions are easy to parse, and leaving them unlinked allows developers to provide a simpler, richer user experience. By applying the links earlier, Friendica, Pleroma and Mastodon remove that option from other developers and prevent users from getting a better experience.

So to give my users the experience I want them to have, I need to write code that detects which links in the content are actually hashtags or mentions (while others are not), removes them, and pieces the text back together (because your links break the text nodes by leaving the # outside the anchor) before I can parse the content. I have to write this code for every version of every server that does this, and maintain it so that new versions, or new servers that do it differently, also work.
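A minimal sketch of that clean-up, using Python's standard html.parser: anchors with rel="tag" (hashtags) or a "mention" class are flattened to their text, so #<a ...>hashtag</a> becomes #hashtag again, while every other tag is re-emitted unchanged. Treating the "mention" class as the marker for mentions is an assumption based on Mastodon's markup; other servers may mark mentions differently, which is exactly the per-server maintenance burden described here. The class and function names are hypothetical.

```python
from html import escape
from html.parser import HTMLParser


class TagLinkStripper(HTMLParser):
    """Re-emit HTML, but replace hashtag/mention anchors with their bare text.

    Assumptions: hashtags are anchors with rel="tag"; mentions are anchors
    whose class list contains "mention" (Mastodon-style markup).
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self._a_stack = []  # one entry per open <a>: True if it is being stripped

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            a = dict(attrs)
            rel = (a.get("rel") or "").split()
            cls = (a.get("class") or "").split()
            strip = "tag" in rel or "mention" in cls
            self._a_stack.append(strip)
            if strip:
                return  # drop the opening tag, keep its text
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # e.g. <br/>, kept verbatim

    def handle_endtag(self, tag):
        if tag == "a" and self._a_stack and self._a_stack.pop():
            return  # closing tag of a stripped anchor
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text arrives with entities decoded, so escape it again on output.
        self.out.append(escape(data, quote=False))


def strip_tag_links(html):
    parser = TagLinkStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


content = ('Hello Brad! #<a href="https://friendica.mrpetovan.com/search?tag=hashtag" '
           'class="tag" rel="tag" title="hashtag">hashtag</a>')
print(strip_tag_links(content))  # Hello Brad! #hashtag
```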

Does that seem like a reasonable thing to do, or should all servers just leave the user content unmolested and allow servers to interpret and display it to the best of their ability?

I apologize for the rant, but this seems like a terrible state of affairs.

MrPetovan · 2020-05-16T13:23:41Z

We've been following the philosophy "Send strict, accept lax", but this is clearly a case where we could send stricter content, without links back to the originating node that the receiving server is supposed to discard anyway.

annando · 2020-05-16T13:52:06Z

We can put the hashtag inside the link; that's no problem. But AFAIK the other systems only look for links with rel="tag" and process those. Without a link at all (just the plain hashtag), I suspect other systems wouldn't parse them at all (at least that's what I saw when testing this, if I remember correctly).

MrPetovan · 2020-05-16T14:32:11Z

This is our current policy at Friendica: we treat plain-text hashtags as, well, plain text. However, the Mammoth message contains the hashtag in the tag array, which would allow us to match it and transform it into a link.
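A sketch of that matching step for received messages, in Python: find plain-text hashtags in the body and keep only those the sender also declared as Hashtag entries in the tag array, so the regular expression only proposes candidates and the tag array decides. The function name, the simple #\w+ pattern and the case-insensitive comparison are hypothetical design choices. It should run on text nodes rather than raw HTML, since a # inside an href would otherwise be picked up.

```python
import re

# Hypothetical candidate pattern: '#' followed by word characters, not preceded by one.
PLAIN_HASHTAG = re.compile(r"(?<!\w)#(\w+)")


def declared_hashtag_spans(text, tag_array):
    """Return (start, end, name) for plain-text hashtags that the sender also
    declared as {"type": "Hashtag", "name": "#..."} in the tag array."""
    declared = {
        entry["name"].lstrip("#").casefold()
        for entry in tag_array
        if isinstance(entry, dict) and entry.get("type") == "Hashtag" and entry.get("name")
    }
    return [
        (m.start(), m.end(), m.group(1))
        for m in PLAIN_HASHTAG.finditer(text)
        if m.group(1).casefold() in declared  # compare case-insensitively
    ]


tags = [{"type": "Hashtag", "name": "#mammoth"}]
print(declared_hashtag_spans("Trying #Mammoth and #other", tags))  # [(7, 15, 'Mammoth')]
```

The returned spans are where a receiver could then insert its own local links.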

koehn · 2020-05-16T17:04:26Z

Is there a consistent, agreed-upon specification of precisely what constitutes a hashtag? I'm using /(?:^|\B)#(?![\p{Nd}\p{Pc}]+\b)([\p{L}\p{Nl}\p{Nd}\p{Pc}]{1,30})(?:\b|\r)/gu, which matches a very wide set. But if other servers use a different definition, my users will be confused, because what becomes a hashtag will depend on which version of which server the post was originally submitted to. For example, #enchanté is a complete hashtag on my server, which accepts non-ASCII letters, but a server that doesn't would only recognize #enchant. My users would be justified in wondering why the same text produces different hashtags depending on the originating system, which is a poor experience for everybody. If each server applies its own definition to all hashtags it presents to its users, the result is consistent in a way those users can understand.
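To illustrate how two definitions diverge, here is a Python sketch comparing an ASCII-only pattern with a Unicode-aware approximation of the rule quoted above. Python's re module has no \p{...} classes, so \w (Unicode letters, digits and underscore) stands in for them; this is an approximation, not an exact port, and the pattern and function names are hypothetical.

```python
import re
import unicodedata

# ASCII-only definition: stops at the first letter outside A-Z/a-z.
ASCII_HASHTAG = re.compile(r"(?<![A-Za-z0-9_])#([A-Za-z0-9_]+)")

# Unicode-aware approximation of the rule above, with \w standing in for
# [\p{L}\p{Nl}\p{Nd}\p{Pc}]: 1-30 word characters, not made only of digits/underscores.
UNICODE_HASHTAG = re.compile(r"(?<!\w)#(?![\d_]+\b)(\w{1,30})(?!\w)")


def find_hashtags(text, pattern):
    # NFC-normalize first so "é" is one code point, not "e" plus a combining accent
    # (\w does not match combining marks).
    return pattern.findall(unicodedata.normalize("NFC", text))


post = "Bonjour #enchanté !"
print(find_hashtags(post, ASCII_HASHTAG))    # ['enchant']
print(find_hashtags(post, UNICODE_HASHTAG))  # ['enchanté']
```

Even the Unicode-aware version may cut short tags in scripts that rely on combining vowel signs (Devanagari, Thai and others), because \w does not match combining marks; that is one more way two servers can disagree.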

MrPetovan · 2020-05-16T17:21:11Z

I believe this could be the reason we send and parse links: you don't have to use a regular expression to know what the full tag is, since it's whatever is in the link's text node. This even allows spaces in received tags, although we don't allow creating such tags in Friendica.

But to answer your question: if #enchanté is in the message's tag array, it could prevent hashtag mismatches on remote servers, or at least provide plausible deniability. And for incoming messages, instead of matching a regular expression, you look for links with rel="tag" or for entries in the tag array property. The only time you would need the regular expression is when storing a post, and at that point you can make it as permissive or as restrictive as you want, as long as your users know the rule.
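A sketch of the "look for links with rel=tag" approach, using Python's html.parser: collect the text of every link marked rel="tag" and drop a leading #, which handles both the #<a ...>hashtag</a> form quoted in this issue and a form where the # sits inside the link. Multi-word link text is kept whole, in line with the point about spaces in received tags. The names are hypothetical, and nested or malformed anchors are not handled.

```python
from html.parser import HTMLParser


class RelTagCollector(HTMLParser):
    """Collect the text of every <a rel="tag"> link as a hashtag name."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.names = []
        self._text = None  # text chunks while inside a rel="tag" link, else None

    def handle_starttag(self, tag, attrs):
        if tag == "a" and "tag" in (dict(attrs).get("rel") or "").split():
            self._text = []

    def handle_endtag(self, tag):
        if tag == "a" and self._text is not None:
            name = "".join(self._text).strip().lstrip("#")
            if name:
                self.names.append(name)
            self._text = None

    def handle_data(self, data):
        if self._text is not None:
            self._text.append(data)


def rel_tag_names(html):
    collector = RelTagCollector()
    collector.feed(html)
    collector.close()
    return collector.names


print(rel_tag_names('Hello Brad! #<a href="https://friendica.example/search?tag=hashtag" rel="tag">hashtag</a>'))
# ['hashtag']
```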

annando · 2020-05-16T17:32:54Z

Hashtags can be written in Korean, Icelandic, Cyrillic, Japanese (Hiragana, Katakana or Kanji), Chinese and whatever else, so finding a good rule is a problem. That's why I prefer having the remote system do this work.

I will soon change the code so that the whole hashtag, including the #, is part of the link. I will also check whether other systems need that CSS class in the link or whether rel="tag" is enough. This should help with parsing.
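For the outgoing side, a minimal Python sketch of rendering a hashtag as a single anchor with the # inside the link text and rel="tag", with the CSS class left optional since receivers are assumed to key on rel="tag". The function name and the search URL prefix are placeholders; the markup Friendica actually emits may differ.

```python
from html import escape
from urllib.parse import quote


def hashtag_link(name, search_base, css_class=None):
    """Render a hashtag as one anchor whose text includes the leading '#'.

    search_base is a placeholder prefix such as "https://friendica.example/search?tag=".
    css_class is optional; receivers are assumed to rely on rel="tag".
    """
    href = search_base + quote(name, safe="")  # percent-encode non-ASCII tag names
    class_attr = f' class="{escape(css_class)}"' if css_class else ""
    return f'<a href="{escape(href)}"{class_attr} rel="tag">#{escape(name, quote=False)}</a>'


print(hashtag_link("hashtag", "https://friendica.example/search?tag="))
# <a href="https://friendica.example/search?tag=hashtag" rel="tag">#hashtag</a>
```

Keeping the # inside the anchor means the link's text node is the complete hashtag, so a receiver never has to stitch a separate # back on.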

MrPetovan · 2020-05-16T19:25:04Z

We probably still need to support plain hashtags with the corresponding string in the tag array property.

annando · 2020-05-16T19:39:57Z

I guess that plaintext hashtags don't work well with other systems. I can try it out, of course.

MrPetovan · 2020-05-16T19:49:40Z

Just for received messages. We can keep outputting links in the body.

annando · 2020-05-16T20:15:30Z

AFAIK we always parse the incoming plaintext for hashtags.

MrPetovan · 2020-05-16T20:58:42Z

This isn't the case for the last two Mammoth messages I received from @koehn.

annando mentioned this issue May 16, 2020

issue 8642: Make hashtags more compatible #8645

Merged

MrPetovan closed this May 16, 2020

MrPetovan reopened this May 16, 2020
