ActivityPub objects contain links to Friendica for hashtags and mentions #8642

Open
koehn opened this issue May 16, 2020 · 13 comments

@koehn koehn commented May 16, 2020

Expected behavior

Content in objects transmitted over ActivityPub should not contain hashtags and mentions surrounded by links to the Friendica server.

Actual behavior

Content in objects transmitted over ActivityPub contains hashtags and mentions surrounded by links to the Friendica server. The links make the hashtags unparsable by other servers, because the anchor is inserted between the hashtag symbol # and the name of the hashtag. The links also carry CSS classes relevant only to Friendica servers.

  "content": "Hello Brad! #<a href=\"https://friendica.mrpetovan.com/search?tag=hashtag\" class=\"tag\" rel=\"tag\" title=\"hashtag\">hashtag</a>",

The links lead to confusion for users, as they link away from the server they're on to a page on another server. This is inconsistent with the general user experience for these applications.

Steps to reproduce the problem

curl -H 'Accept: application/activity+json' "https://friendica.mrpetovan.com/objects/735a2029-125e-b326-fb6f-d81402846040" | jq .
Collaborator

@annando annando commented May 16, 2020

We create these tags much the way Pleroma and Mastodon do. To find (and probably change) the links, have a look at the provided tag array, where all mentions and tags are listed.

See https://mastodon.social/@heluecht/104178141231819096 and https://pleroma.soykaf.com/objects/84c48190-1925-469b-844f-57d2fb99e34c for a comparison.

Author

@koehn koehn commented May 16, 2020

It’s pretty absurd to me to present the user with a link that yanks them away from their platform to one where they’re not logged in, possibly with a different UX, and they cannot take any meaningful action on the information presented when they get there. The content doesn’t even appear in a new window.

Even more absurd is to expect other developers to parse out HTML in order to get to the original, untainted content the user entered in the first place, when it would be easier not to do so.

Those links should be applied in the presentation layer: they’re easy to parse and allow developers to provide a simpler, richer user experience. By applying them earlier Friendica, Pleroma and Mastodon remove that option from other developers and prevent users from getting a better experience.

So to get my users the experience I want them to have, I need to write code that somehow detects that some (but not all) links in content are actually hashtags or mentions (while others are not), remove them, piece the text back together (because you’ve broken the text nodes with your links by leaving the # outside the anchor), and then I can parse the content. I have to write this code for every version of every server that does this, and maintain it so that new versions or new servers that do it differently also work.
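To make the cost concrete, the unwrapping step described above could look roughly like the following. This is a minimal sketch, not code from Friendica or any other server: the names `TagLinkStripper` and `strip_tag_links` are hypothetical, it assumes hashtag links can be recognized by `rel="tag"` alone, and it drops HTML comments and declarations. Mentions would need their own detection rule.

```python
from html.parser import HTMLParser


class TagLinkStripper(HTMLParser):
    """Unwrap <a rel="tag"> anchors, keeping their text; pass other markup through."""

    def __init__(self):
        # Keep character references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.open_anchors = []  # one bool per open <a>: True if it was unwrapped

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            rel = (dict(attrs).get("rel") or "").split()
            unwrap = "tag" in rel
            self.open_anchors.append(unwrap)
            if unwrap:
                return  # drop the opening tag, keep the text that follows
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied through unchanged.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "a" and self.open_anchors and self.open_anchors.pop():
            return  # closing tag of an unwrapped anchor
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def strip_tag_links(content):
    """Return content with rel="tag" anchors replaced by their text."""
    parser = TagLinkStripper()
    parser.feed(content)
    parser.close()
    return "".join(parser.out)


example = (
    'Hello Brad! #<a href="https://friendica.mrpetovan.com/search?tag=hashtag" '
    'class="tag" rel="tag" title="hashtag">hashtag</a>'
)
print(strip_tag_links(example))  # Hello Brad! #hashtag
```

Even this small version has to guess which anchors are hashtags, which is the maintenance burden the paragraph above describes.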

Does that seem like a reasonable thing to do, or should all servers just leave the user content unmolested and allow servers to interpret and display it to the best of their ability?

I apologize for the rant, but this seems like a terrible state of affairs.

Collaborator

@MrPetovan MrPetovan commented May 16, 2020

Our philosophy has been "Send strict, accept lax", but this is clearly a case where we could send stricter content, without links back to the originating node that the receiving server is supposed to discard anyway.

Collaborator

@annando annando commented May 16, 2020

We can enclose the hashtag inside the link, that's no problem. But AFAIK the other systems only look for links with rel="tag" and then do their thing. Without a link at all (just the pure hashtag), I have the feeling that other systems wouldn't parse them at all (at least that's what I experienced when testing this, if I remember correctly).

Collaborator

@MrPetovan MrPetovan commented May 16, 2020

This is our current policy at Friendica: we treat plain text hashtags as, well, plain text. However, the Mammoth message contains the hashtag in the tag array, which would allow us to match it and transform it into a link.
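As a rough illustration of that matching step, here is a sketch that turns plain-text hashtags into links only when they are listed in the tag array. It is not Friendica's code: the function name `linkify_from_tag_array` is hypothetical, the content is assumed to be plain text without markup, and tag entries are assumed to carry `type`, `name` and `href` keys with `type` set to `Hashtag`.

```python
import html
import re


def linkify_from_tag_array(text, tags):
    """Wrap plain-text hashtags that appear in the tag array in rel="tag" links."""
    # Map lowercased tag names (without "#") to their link targets.
    hrefs = {}
    for tag in tags:
        if tag.get("type") == "Hashtag" and tag.get("name") and tag.get("href"):
            hrefs[tag["name"].lstrip("#").lower()] = tag["href"]
    if not hrefs:
        return html.escape(text)

    # Longest names first so "#hashtags" is not matched as "#hashtag".
    names = sorted(hrefs, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)#(" + "|".join(re.escape(n) for n in names) + r")(?!\w)",
        re.IGNORECASE,
    )

    out = []
    pos = 0
    for match in pattern.finditer(text):
        out.append(html.escape(text[pos:match.start()]))
        href = hrefs.get(match.group(1).lower())
        if href is None:
            # Case folding edge case: leave the text as it was.
            out.append(html.escape(match.group(0)))
        else:
            out.append(
                '<a href="{}" rel="tag">#{}</a>'.format(
                    html.escape(href), html.escape(match.group(1))
                )
            )
        pos = match.end()
    out.append(html.escape(text[pos:]))
    return "".join(out)


tags = [{"type": "Hashtag", "name": "#hashtag", "href": "https://example.org/tags/hashtag"}]
print(linkify_from_tag_array("Hello Brad! #hashtag and #other", tags))
# Hello Brad! <a href="https://example.org/tags/hashtag" rel="tag">#hashtag</a> and #other
```

Hashtags that are not in the array stay plain text, so the sender's tag list decides what counts as a hashtag rather than a local regular expression.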

Author

@koehn koehn commented May 16, 2020

Is there a consistent, agreed-upon specification of precisely what constitutes a hashtag? I'm using /(?:^|\B)#(?![\p{Nd}\p{Pc}]+\b)([\p{L}\p{Nl}\p{Nd}\p{Pc}]{1,30})(?:\b|\r)/gu, which gets me a very wide set. But if others use a different definition, my users will be confused, because what becomes a hashtag will depend on which version of which server the hashtag was initially submitted to. So while #enchanté is a complete hashtag on my server, which recognizes non-ASCII characters as valid, servers that don't would show it as #enchant followed by a plain é. My users would be justified in wondering why the same text results in different hashtags depending on the originating system, which is a poor experience for everybody. If each server applies its own definition to all hashtags presented to its users, it is consistent in a way that is understandable to those users.
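The mismatch is easy to reproduce. The sketch below compares two hypothetical hashtag definitions that differ only in whether non-ASCII letters count as word characters. Neither pattern is taken from a real server, and neither is the regular expression quoted above: Python's standard `re` module does not support `\p{...}` classes, so `\w` stands in for them here.

```python
import re

# Two hypothetical definitions of "hashtag"; only the character set differs.
ASCII_HASHTAG = re.compile(r"(?<!\w)#(\w{1,30})", re.ASCII)
UNICODE_HASHTAG = re.compile(r"(?<!\w)#(\w{1,30})")


def find_hashtags(pattern, text):
    """Return the hashtag names (without '#') that the pattern finds in text."""
    return pattern.findall(text)


text = "Bonjour #enchant\u00e9"  # "#enchanté" with a precomposed é
print(find_hashtags(ASCII_HASHTAG, text))    # ['enchant']
print(find_hashtags(UNICODE_HASHTAG, text))  # ['enchanté']
```

The same post yields two different tags depending on which definition the parsing server happens to use.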

Collaborator

@MrPetovan MrPetovan commented May 16, 2020

I believe this could be the reason we send/parse links, so that you don't have to use a regular expression to know what the full tag is, since it's going to be whatever is in the text node of the link. This allows spaces in received tags even though we don't allow the creation of such tags in Friendica.

But to answer your question, if #enchanté is in the tag array of the message, it could prevent hashtag mismatch on remote servers, or at least provide plausible deniability. And for incoming messages, instead of matching a regular expression, you look for links with rel=tag or tags in the tag array property. The only time you would have to use the regular expression would be during post storage, and at this point you can make it as permissive or as restricted as you want, as long as your users know the rule.
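For incoming messages, the lookup described above might be sketched as follows. This is an illustration under assumptions, not Friendica's implementation: `RelTagCollector` and `incoming_hashtags` are hypothetical names, and tag entries are assumed to be objects with `type` set to `Hashtag` and a `name` key.

```python
from html.parser import HTMLParser


class RelTagCollector(HTMLParser):
    """Collect the text of every <a rel="tag"> link in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._buffer = None  # text pieces of the rel="tag" link being read

    def handle_starttag(self, tag, attrs):
        if tag == "a" and "tag" in (dict(attrs).get("rel") or "").split():
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._buffer is not None:
            self.names.append("".join(self._buffer).lstrip("#"))
            self._buffer = None


def incoming_hashtags(obj):
    """Return hashtag names from the tag array and from rel="tag" links."""
    raw = obj.get("tag") or []
    if isinstance(raw, dict):  # a single tag may arrive as an object, not a list
        raw = [raw]
    names = [
        t["name"].lstrip("#")
        for t in raw
        if isinstance(t, dict) and t.get("type") == "Hashtag" and t.get("name")
    ]

    collector = RelTagCollector()
    collector.feed(obj.get("content") or "")
    collector.close()
    for name in collector.names:
        if name and name not in names:
            names.append(name)
    return names


message = {
    "content": 'Hello Brad! #<a href="https://friendica.mrpetovan.com/search?tag=hashtag" '
               'class="tag" rel="tag" title="hashtag">hashtag</a>',
    "tag": [{"type": "Hashtag", "name": "#enchant\u00e9"}],
}
print(incoming_hashtags(message))  # ['enchanté', 'hashtag']
```

No regular expression is involved on the receiving side: the tag is whatever the sender listed or linked, including names with spaces.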

Collaborator

@annando annando commented May 16, 2020

Hashtags could be in Korean, Icelandic, Cyrillic, Japanese (Hiragana, Katakana or Kanji), Chinese or any other language, so finding a good rule is a problem. That's the reason why I prefer having this work done by the remote system.

I will soon change the code so that the hashtag symbol is part of the link. I will also check whether other systems need that CSS class in the link or whether rel="tag" is enough. This should help with parsing.

Collaborator

@MrPetovan MrPetovan commented May 16, 2020

We probably still need to support plain hashtags with the corresponding string in the tag array property.

Collaborator

@annando annando commented May 16, 2020

I guess that plaintext hashtags don't work well with other systems. I can try it out, of course.

Collaborator

@MrPetovan MrPetovan commented May 16, 2020

Just for received messages. We can keep outputting links in the body.

Collaborator

@annando annando commented May 16, 2020

AFAIK we always parse the incoming plaintext for hashtags.

Collaborator

@MrPetovan MrPetovan commented May 16, 2020

This isn’t the case for the last two Mammoth messages I received from @koehn.

@MrPetovan MrPetovan closed this May 16, 2020
@MrPetovan MrPetovan reopened this May 16, 2020