Team: NPA
Departure date: 05/31/2014
Supervision: Renata CRUZ TEIXEIRA
Matching User Accounts Across Online Social Networks: Methods and Applications
The proliferation of social networks and all the personal data that people share brings many opportunities for developing exciting new applications. At the same time, however, the availability of vast amounts of personal data raises privacy and security concerns. These opportunities are even more interesting (and the concerns more serious) if we can aggregate the data that a single individual publishes in different social networks to create a more complete image of the user.
The focus of my thesis is to understand the extent to which we can match accounts that belong to the same individual across today's social networks. We develop methods to build scalable and reliable matching schemes that exploit public data from users' profiles, and we demonstrate the utility of matching accounts by showing that it is a powerful tool for detecting impersonators online.
More precisely, the main contributions of my thesis are the following:
Development of a scheme that exploits public profile attributes of users to match accounts. We study how we can exploit the information users provide about themselves in different social networks (e.g., name, screen name, location, bio, profile photo and friends) to match their accounts. We identify four key properties for evaluating how useful an attribute is for matching accounts: Availability, Consistency, non-Impersonability, and Discriminability (ACID). These properties show that profile attributes have good potential for matching accounts. Yet traditional matching schemes perform poorly on today's social networks, both because the datasets are very large and because only a handful of attributes are available to find the matching account among potentially billions. To overcome this problem, we develop a novel three-step matching scheme that better exploits the attributes and achieves high accuracy even at large scale. For example, our scheme achieves 21% recall at 98% precision when matching Twitter accounts to Facebook accounts, which is close to what humans achieve (25% recall at the same precision).
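To make the idea of attribute-based matching concrete, the following is a minimal, hypothetical sketch; it is not the three-step scheme from the thesis. It assumes each profile exposes only a name, a screen name and a free-text location, scores a candidate pair with a weighted sum of string similarities (Python's `difflib.SequenceMatcher`), and accepts the best candidate only if its score clears a threshold and beats the runner-up by a margin. The `Profile` class, the weights, `threshold` and `margin` are all illustrative assumptions. The margin check is one simple way to reflect the discriminability concern: when several candidates look alike, refusing to match keeps precision high at the cost of recall.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


@dataclass
class Profile:
    account_id: str
    name: str
    screen_name: str
    location: str


def text_sim(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Hypothetical weights; a real scheme would learn them from labelled pairs.
WEIGHTS = {"name": 0.5, "screen_name": 0.3, "location": 0.2}


def pair_score(p: Profile, q: Profile) -> float:
    """Weighted combination of per-attribute similarities."""
    return (WEIGHTS["name"] * text_sim(p.name, q.name)
            + WEIGHTS["screen_name"] * text_sim(p.screen_name, q.screen_name)
            + WEIGHTS["location"] * text_sim(p.location, q.location))


def best_match(p: Profile, candidates: list[Profile],
               threshold: float = 0.8, margin: float = 0.1) -> Optional[str]:
    """Return the id of the best candidate, or None if the match is weak or ambiguous."""
    scored = sorted(((pair_score(p, c), c.account_id) for c in candidates),
                    reverse=True)
    if not scored or scored[0][0] < threshold:
        return None  # no candidate is similar enough
    if len(scored) > 1 and scored[0][0] - scored[1][0] < margin:
        return None  # two candidates look alike: refuse rather than risk a false match
    return scored[0][1]
```

For example, a Twitter profile `Profile("tw1", "Alice Martin", "alice_m", "Paris")` matched against Facebook candidates `fb1` ("Alice Martin", "alicem", "Paris") and `fb2` ("Bob Stone", "bstone", "Lyon") is matched to `fb1`, whereas adding a second, identical "Alice Martin" candidate makes the sketch return `None`.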
Development of a scheme that exploits innocuous activities of users on different social networks to match accounts. We show that we can still match accounts across social networks even if we ignore user profiles and use only information from users' posts, i.e., their activity on different social networks. Specifically, we show that by exploiting only the location, timing and writing style of a user's posts, we can match that user's accounts across social networks. For example, by exploiting the locations from which users post, we can match 60% of Flickr accounts to their corresponding Twitter accounts while introducing only a small percentage of false matches. Moreover, by exploiting user activity we can match 50% of the accounts that we cannot match using names. This demonstrates that, even if users maintain distinct profiles on different social networks, it is still possible to match their accounts.
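The location signal can be illustrated with a minimal sketch. It is hypothetical and not the thesis's exact method: it assumes each post carries a (latitude, longitude) pair, summarizes an account as a histogram of posts per coarse grid cell, compares histograms with cosine similarity, and matches a Flickr account to the most similar Twitter account only if the similarity reaches a threshold. The cell size `CELL_DEG`, the `threshold` value and the function names are illustrative assumptions. A timing profile could be built the same way, for instance with a histogram of posting hours instead of grid cells.

```python
import math
from collections import Counter
from typing import Optional

CELL_DEG = 0.1  # hypothetical grid resolution, roughly 11 km in latitude


def location_profile(posts: list[tuple[float, float]]) -> Counter:
    """Histogram of posts per grid cell; posts are (lat, lon) pairs."""
    return Counter((math.floor(lat / CELL_DEG), math.floor(lon / CELL_DEG))
                   for lat, lon in posts)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse histograms."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def match_by_location(flickr_posts: list[tuple[float, float]],
                      twitter_accounts: dict[str, list[tuple[float, float]]],
                      threshold: float = 0.5) -> Optional[str]:
    """Return the most similar Twitter account id, or None if none is close enough."""
    target = location_profile(flickr_posts)
    best_id, best_sim = None, 0.0
    for acc_id, posts in twitter_accounts.items():
        sim = cosine(target, location_profile(posts))
        if sim > best_sim:
            best_id, best_sim = acc_id, sim
    # A threshold trades recall for precision: fewer matches, fewer false ones.
    return best_id if best_sim >= threshold else None
```

With a Flickr account posting from central Paris and two candidate Twitter accounts, one posting from Paris and one from New York, the sketch returns the Parisian account; with only the New York candidate, it returns `None`.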
Detection and characterization of impersonators online. Matching accounts has many potential applications. For example, matching accounts within a social network allowed us to be the first to tackle the problem of detecting and characterizing impersonators online. Our study shows that traditional fake-account detection methods, based on classifiers that exploit only features of individual accounts, perform poorly at detecting impersonators. A better approach is to build classifiers that exploit features characterizing pairs of accounts instead. More surprisingly, we observe that humans are very poor at detecting impersonating accounts and that our automated schemes perform much better. Finally, our detection methods allow us to perform the first characterization of impersonation attacks on Twitter. One of the most surprising findings is that not only celebrities but also ordinary active Twitter users are impersonated, and that the main purpose of many impersonation attacks is to evade fake-account detection systems and to use the accounts for retweet and follower fraud.
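The difference between single-account and pair-based features can be shown with a toy example. The sketch below is hypothetical: the `Account` fields, the feature names and the two-signal voting rule are our own assumptions, standing in for a trained classifier; they are not the features or classifier used in the thesis. Given two accounts that portray the same person, it computes features of the pair (relative age, relative popularity) and flags the account that looks newer and less popular than the other. Note that neither signal is meaningful for one account in isolation, which is the point of pair-based features.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Account:
    handle: str
    created: date
    followers: int
    tweets: int


def pair_features(a: Account, b: Account) -> dict:
    """Features describing the pair (a, b), not each account alone."""
    return {
        "age_gap_days": abs((a.created - b.created).days),
        "a_is_older": a.created < b.created,
        "follower_ratio": (a.followers + 1) / (b.followers + 1),  # +1 avoids division by zero
        "tweet_ratio": (a.tweets + 1) / (b.tweets + 1),
    }


def suspected_impersonator(a: Account, b: Account) -> Optional[str]:
    """Toy rule (an assumption, not a trained model): vote on which account is suspicious."""
    f = pair_features(a, b)
    votes = 0  # positive votes point at b, negative votes point at a
    # Signal 1: the account created later is more suspicious.
    if f["age_gap_days"] > 0:
        votes += 1 if f["a_is_older"] else -1
    # Signal 2: the account with fewer followers is more suspicious.
    if f["follower_ratio"] != 1:
        votes += 1 if f["follower_ratio"] > 1 else -1
    if votes > 0:
        return b.handle
    if votes < 0:
        return a.handle
    return None  # signals disagree or the accounts are indistinguishable
```

For instance, comparing an account created in 2009 with 5000 followers to a look-alike created in 2013 with 40 followers flags the 2013 account, whichever order the two are passed in.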
Defence: 05/21/2014 - 14:00 - Laboratoire LINCS, 4th floor, 23 avenue d'Italie, 75005 Paris
Jury members:
Jon Crowcroft, University of Cambridge
Krishna Gummadi, MPI-SWS
Clémence Magnien, CNRS
Dina Papagiannaki, Telefónica I+D
Renata Teixeira, Inria Rocquencourt
- O. Goga: “Matching User Accounts Across Online Social Networks: Methods and Applications”, thesis, defence 05/21/2014, supervision Renata Cruz Teixeira (2014)
- O. Goga, H. Lei, S. Parthasarathi, G. Friedland, R. Sommer, R. Teixeira: “Exploiting Innocuous Activity for Correlating Users Across Sites”, The 22nd International Conference on World Wide Web, WWW'13, Rio de Janeiro, Brazil, pp. 447-458, ACM (2013)
- D. Zeaiter Joumblatt, O. Goga, R. Teixeira, J. Chandrashekar, N. Taft: “Characterizing end-host application performance across multiple networking environments”, 2012 Proceedings IEEE INFOCOM, Orlando, FL, United States, pp. 2536-2540 (2012)
- O. Goga, R. Teixeira: “Speed Measurements of Residential Internet Access”, Passive and Active Measurement, vol. 7192, Lecture Notes in Computer Science, Vienna, Austria, pp. 168-178, Springer (2012)
- O. Goga, P. Loiseau, P. Gonçalves: “On the impact of the flow size distribution’s tail index on network performance with TCP connections”, IFIP PERFORMANCE 2011, Amsterdam, Netherlands, pp. 62-64 (2011)