Data-driven Design for Recommendations in Decentralized Social Media
Carl Colglazier
Community Data Science Collective
Northwestern University
@carl@hci.social
June 3, 2024
Figure 1: Accounts in the dataset created between January 2022 and March 2023. The top panels show the proportion of accounts still active 45 days after creation, the proportion of accounts that have moved, and the proportion of accounts that have been suspended. The bottom panel shows the count of accounts created each week. The dashed vertical lines in the bottom panel mark the day Elon Musk's acquisition of Twitter was announced, the day the acquisition closed, the day Twitter suspended a number of prominent journalists, and the day Twitter experienced an outage and began rate-limiting accounts.
Caveat: how do we determine success (for servers) in a decentralized social network?
A writer discovers an alternative technology system
Media hypes it as a “killer” of a major platform
The system does not in fact “kill” the major platform
The system is declared a failure
This has happened multiple times already (Zulli, Liu, and Gehl 2020).
Mastodon does not need to replace something else to be successful
Do people find value in the system?
Each server has its own:
Moderation policies and community norms (Nicholson, Keegan, and Fiesler 2023; Gehl and Zulli 2023)
Relationship with other servers (Colglazier, TeBlunthuis, and Shaw 2024)
Local timeline
Working assumptions:
Newcomers are key to Mastodon’s growth and sustainability (Kraut, Resnick, and Kiesler 2011)
Retaining newcomers is better than losing them
We thus ask: do some servers retain newcomers better than others?
Applying a survival model to accounts created in May 2023, we find that accounts on the 12 largest servers featured on joinmastodon.org are more likely to become inactive within their first 91 days, whereas accounts on smaller servers are less likely to become inactive.
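A survival model needs, for each account, a duration and an indicator of whether the event (becoming inactive) was observed within the follow-up window. The sketch below assumes a hypothetical rule, that an account becomes inactive on its last day of activity, and right-censors accounts still active after 91 days. It then applies a Kaplan-Meier estimator, a simpler non-parametric method than the mixed-effects Cox model used here, only to show how such records turn into a survival curve. The function names are illustrative.

```python
from datetime import date

WINDOW_DAYS = 91  # follow-up window from the survival analysis


def to_survival(created, last_active, window=WINDOW_DAYS):
    """Return (duration_days, event) for one account.

    Hypothetical rule: an account becomes inactive on its last day of activity.
    Accounts still active at the end of the window are right-censored (event=0).
    """
    days = (last_active - created).days
    if days < window:
        return days, 1  # became inactive inside the window
    return window, 0    # survived the whole window: censored


def kaplan_meier(records):
    """Kaplan-Meier estimate of S(t) from (duration, event) pairs."""
    curve, surv = [], 1.0
    for t in sorted({d for d, e in records if e == 1}):
        at_risk = sum(1 for d, _ in records if d >= t)
        events = sum(1 for d, e in records if d == t and e == 1)
        surv *= 1 - events / at_risk
        curve.append((t, surv))
    return curve


records = [
    to_survival(date(2023, 5, 1), date(2023, 5, 20)),   # (19, 1)
    to_survival(date(2023, 5, 1), date(2023, 9, 1)),    # (91, 0)
    to_survival(date(2023, 5, 10), date(2023, 6, 19)),  # (40, 1)
    to_survival(date(2023, 5, 15), date(2023, 8, 30)),  # (91, 0)
]
print(kaplan_meier(records))
```

Censoring matters here: an account created late in the observation period that is still posting has not "survived" longer than others, it has simply been observed for less time.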
| Term | Coefficient (log hazard ratio) | 95% CI (hazard ratio) | p-value |
|---|---|---|---|
| Join Mastodon | 0.115 | (0.97, 1.3) | 0.117 |
| General Servers | 0.385 | (1.07, 2.01) | 0.017 |
| Small Server | -0.245 | (0.66, 0.92) | 0.003 |
Cox proportional hazards model with mixed effects, including a random effect for each server.
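The 95% intervals in the table are on the hazard-ratio scale: exponentiating each coefficient gives a value inside its interval, and the intervals that exclude 1 are exactly the rows with p < 0.05. The sketch below performs that check using only the reported numbers; `hazard_ratio` is an illustrative helper name.

```python
import math

# Coefficients (log hazard ratios) and 95% CIs as reported in the Cox model table.
rows = {
    "Join Mastodon": (0.115, (0.97, 1.3)),
    "General Servers": (0.385, (1.07, 2.01)),
    "Small Server": (-0.245, (0.66, 0.92)),
}


def hazard_ratio(coef):
    """Convert a Cox coefficient (log hazard ratio) to a hazard ratio."""
    return math.exp(coef)


for term, (coef, (lo, hi)) in rows.items():
    hr = hazard_ratio(coef)
    assert lo <= hr <= hi  # point estimate lies inside the reported interval
    significant = not (lo <= 1.0 <= hi)  # CI excluding 1 => significant at 5%
    print(f"{term}: HR = {hr:.2f}, significant at 5%: {significant}")
```

A hazard ratio above 1 means a higher rate of becoming inactive. For example, exp(-0.245) is about 0.78, which suggests accounts on small servers become inactive at roughly a 22% lower rate, holding the other terms fixed.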
| Term | Model A Coef. | Model A Std. Error | Model B Coef. | Model B Std. Error |
|---|---|---|---|---|
| Sum | -9.529*** | 0.188 | -10.268*** | 0.718 |
| Nonzero | -3.577*** | 0.083 | -2.861*** | 0.254 |
| Smaller server | 0.709*** | 0.032 | 0.629*** | 0.082 |
| Server size (outgoing) | 0.686*** | 0.013 | 0.655*** | 0.042 |
| Open registrations (incoming) | 0.168*** | 0.046 | -0.250 | 0.186 |
| Languages match | 0.044 | 0.065 | 0.589 | 0.392 |
Exponential-family random graph models (ERGMs) of account movement between Mastodon servers. Accounts in Model A were created in May 2022 and later moved to a new account. Accounts in Model B were created earlier and moved after October 2023.
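The ERGMs treat account movement as a valued, directed network between servers. As a rough illustration (not a model fit), the sketch below builds such a network from hypothetical move records and computes statistics analogous to the Sum and Nonzero terms, which for count-valued ERGMs (as in R's ergm.count) are the total edge value and the number of dyads with a nonzero value. The `smaller_stat` function is only one assumed reading of the "Smaller server" term; the server names and sizes are placeholders.

```python
from collections import Counter

# Hypothetical moves: one (origin, destination) pair per migrated account.
moves = [
    ("big.example", "small.example"),
    ("big.example", "small.example"),
    ("big.example", "other.example"),
    ("small.example", "big.example"),
]
# Hypothetical server sizes (number of accounts).
sizes = {"big.example": 1000, "other.example": 200, "small.example": 50}


def move_network(moves):
    """Valued directed network: edge weight = accounts moving src -> dst."""
    return Counter((src, dst) for src, dst in moves if src != dst)


def sum_stat(net):
    """Total edge value (analogous to the valued-ERGM 'sum' term)."""
    return sum(net.values())


def nonzero_stat(net):
    """Number of directed server pairs with at least one move ('nonzero')."""
    return sum(1 for w in net.values() if w > 0)


def smaller_stat(net, sizes):
    """Assumed reading of 'Smaller server': moves into a smaller server."""
    return sum(w for (src, dst), w in net.items() if sizes[dst] < sizes[src])


net = move_network(moves)
print(sum_stat(net), nonzero_stat(net), smaller_stat(net, sizes))  # 4 3 3
```

An ERGM then asks how likely the observed values of such statistics are relative to networks simulated under the model, which is why Sum and Nonzero act as baseline terms much like an intercept.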
Consent: servers should be able to choose whether to participate
Privacy: do not reveal information about individual accounts
Decentralization: do not concentrate data in one place
Openness: use shared standards and protocols
A decentralized, tag-based collaborative filtering system
Each server reports its top tags from the last three months
Learn, from these reports across servers, which tags are most important for each server
Recommend servers based on selected tags of interest
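The per-server report can be very small. This minimal sketch assumes a hypothetical post schema of (account_id, tag, posted_on) tuples, approximates "three months" as 90 days, and treats hashtags case-insensitively; it counts distinct accounts per tag so that one prolific account cannot dominate. Only aggregate counts leave the server, which fits the privacy principle above.

```python
from collections import defaultdict
from datetime import date, timedelta


def top_tags_report(posts, today, top_k=10, window_days=90):
    """Build one server's report: {tag: number of distinct accounts using it}.

    posts: iterable of (account_id, tag, posted_on) tuples (hypothetical schema).
    Account IDs are used only locally and never appear in the report.
    """
    cutoff = today - timedelta(days=window_days)  # roughly three months
    accounts_by_tag = defaultdict(set)
    for account_id, tag, posted_on in posts:
        if posted_on >= cutoff:
            accounts_by_tag[tag.lower()].add(account_id)  # case-insensitive tags
    counts = {tag: len(ids) for tag, ids in accounts_by_tag.items()}
    # Rank by number of accounts, breaking ties alphabetically for stable output.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return dict(ranked[:top_k])


posts = [
    ("u1", "Fediverse", date(2024, 5, 1)),
    ("u2", "fediverse", date(2024, 5, 2)),
    ("u1", "fediverse", date(2024, 5, 3)),
    ("u3", "cats", date(2024, 5, 10)),
    ("u4", "archive", date(2023, 1, 1)),  # outside the window
]
report = top_tags_report(posts, today=date(2024, 6, 1))
print(report)  # {'fediverse': 2, 'cats': 1}
```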
Report top hashtags used by the most accounts on each server
For robustness, drop hashtags used by too few accounts or servers
Build an \(m \times n\) server-tag matrix \(M\)
Normalize with Okapi BM25 TF-IDF weighting and L2 normalization
Apply singular value decomposition (SVD) to \(M\) to create a new matrix \(M'\)
Match servers to selected tags using cosine similarity
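The steps above can be sketched end to end. This is a minimal sketch under stated assumptions, not the deployed system: reports arrive as {server: {tag: account_count}} dictionaries, the thresholds min_accounts=2 and min_servers=2 are hypothetical, BM25 uses the common defaults k1=1.2 and b=0.75 with a non-negative IDF variant, k=2 latent dimensions are kept, and the selected tags form a binary query vector folded into the SVD space as a pseudo-server. Server names ending in .example are placeholders.

```python
from collections import Counter

import numpy as np


def build_matrix(reports, min_accounts=2, min_servers=2):
    """Turn {server: {tag: n_accounts}} reports into an m x n server-tag matrix."""
    # Drop tags used by too few accounts on a server...
    kept = {s: {t: n for t, n in tags.items() if n >= min_accounts}
            for s, tags in reports.items()}
    # ...then drop tags reported by too few servers.
    n_servers_per_tag = Counter(t for tags in kept.values() for t in tags)
    tags = sorted(t for t, n in n_servers_per_tag.items() if n >= min_servers)
    servers = sorted(kept)
    col = {t: j for j, t in enumerate(tags)}
    M = np.zeros((len(servers), len(tags)))
    for i, s in enumerate(servers):
        for t, n in kept[s].items():
            if t in col:
                M[i, col[t]] = n
    return servers, tags, M


def bm25(M, k1=1.2, b=0.75):
    """Okapi BM25 weights, treating servers as documents and tags as terms."""
    n_docs = M.shape[0]
    df = np.count_nonzero(M, axis=0)                      # servers using each tag
    idf = np.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)  # non-negative IDF variant
    doc_len = M.sum(axis=1, keepdims=True)
    length_norm = k1 * (1 - b + b * doc_len / doc_len.mean())
    return M * (k1 + 1) / (M + length_norm) * idf


def l2_normalize(X):
    """Scale each row to unit length (zero rows stay zero)."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.where(norms == 0, 1.0, norms)


def fit(reports, k=2):
    """Filter, weight, normalize, and reduce with a truncated SVD."""
    servers, tags, M = build_matrix(reports)
    X = l2_normalize(bm25(M))
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    # Rows of U*S embed servers; the top-k rows of Vt map tag vectors in.
    return servers, tags, U[:, :k] * S[:k], Vt[:k]


def recommend(selected_tags, servers, tags, server_emb, Vt_k):
    """Rank servers by cosine similarity to the selected tags."""
    query = np.array([1.0 if t in selected_tags else 0.0 for t in tags])
    q = Vt_k @ query  # fold the query in as a pseudo-server
    if not np.any(q):
        return []
    sims = l2_normalize(server_emb) @ (q / np.linalg.norm(q))
    return [(servers[i], float(sims[i])) for i in np.argsort(-sims)]


reports = {
    "art.example": {"art": 50, "painting": 30, "photography": 10, "rare": 1},
    "photo.example": {"photography": 60, "art": 5, "cameras": 20},
    "tech.example": {"linux": 40, "rust": 25, "cameras": 5, "solo": 10},
    "dev.example": {"rust": 30, "linux": 10, "painting": 2},
}
model = fit(reports, k=2)
print(recommend(["rust"], *model))
```

Here "rare" is dropped for having too few accounts and "solo" for appearing on only one server. The SVD step lets a server rank well for a tag it never reported if it shares other tags with servers that did, which is the collaborative-filtering part of the design.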
Special thanks to my qualifying exam committee: Noshir Contractor, Darren Gergle, and Aaron Shaw; to the Community Data Science Collective; and to the Technology and Social Behavior program at Northwestern University for funding this research.
Mastodon and Fediverse Timeline
2008: OStatus protocol
2016: Mastodon releases v0.1
2018: ActivityPub standard published
2019: Mastodon drops OStatus
2022: Elon Musk's Twitter acquisition; Truth Social launches using Mastodon code
2023: Mastodon reaches 2M active users; Threads (Meta) begins experimental support for ActivityPub