Reza Zafarani :: Research

Research Summary

My research interests are in (I) data science (data mining and machine learning), especially on large-scale data; (II) networks and graphs, especially theoretical research, and (III) analyzing online human behavior.

I am a big fan of multidisciplinary research, and as such I have a significant interest in mining social media, as it gives me the opportunity to pursue all my research interests within one unified ecosystem. I have championed the opportunities in such interdisciplinary studies in my textbook, Social Media Mining: An Introduction (Cambridge University Press). Check it out, it's free!

When conducting such interdisciplinary research, a common pattern in my studies is to collect and analyze large-scale data to glean actionable patterns. When studying online human behavior, I often employ theories from the social sciences, psychology, or anthropology, in addition to developing and using advanced mathematical, statistical, and machine learning machinery to establish the validity of such patterns. My research is supported by an NSF CAREER award.

For a sample of my work see our recent Tutorials:

SDM 2022 - Noise Enhancement: Techniques and Applications (Slides);
WWW 2022 - Interpretable Network Representations (Slides);
WSDM'19/KDD'19 Tutorials: Fake News: Fundamental Theories, Detection Strategies and Challenges;
or older ICDM Tutorial: Social Media Mining: Fundamental Issues and Challenges

and the list of current research directions below.

Research Interests

Data Science (Data Mining and Machine Learning)
Social Media Mining
Networks/Graph Mining
Big Data Analytics
Social Network Analysis
Social Computing
Analyzing Online Human Behavior

Research Directions

Interpretable/Spectral Network/Graph Representations

New network representations that are (I) easy-to-visualize and (II) interpretable (i.e., structurally-informative)

Traditionally, a network is represented by an adjacency matrix, which captures which pairs of nodes are connected in the network. Adjacency matrices can be massive even for large sparse graphs, are not interpretable (e.g., they do not directly capture complex relationships such as paths or cuts), and are hard to visualize, appearing as "hairballs": dense, tangled structures of nodes and edges that often carry no insights. To address these challenges, we have recently developed new network representations that are (I) easy-to-visualize and (II) interpretable (i.e., structurally-informative). See these examples (Spectral Zoo, Spectral Paths and Network Shapes) and their applications in network authentication.
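As a small illustration of why an adjacency matrix does not directly expose paths, the sketch below builds the matrix for a hypothetical four-node path graph and then squares it to count two-step walks. The graph, the function names, and the use of plain Python lists are assumptions made for this example; it is not one of the representations described above.

```python
from typing import List, Tuple


def adjacency_matrix(n: int, edges: List[Tuple[int, int]]) -> List[List[int]]:
    """Build a dense n x n adjacency matrix for an undirected graph."""
    matrix = [[0] * n for _ in range(n)]
    for u, v in edges:
        matrix[u][v] = 1
        matrix[v][u] = 1
    return matrix


def two_step_walks(matrix: List[List[int]]) -> List[List[int]]:
    """Return the matrix square; entry (i, j) counts walks of length 2 from i to j."""
    n = len(matrix)
    return [
        [sum(matrix[i][k] * matrix[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical toy graph: a path 0 - 1 - 2 - 3.
edges = [(0, 1), (1, 2), (2, 3)]
A = adjacency_matrix(4, edges)
A2 = two_step_walks(A)

# The matrix stores n * n entries even though the graph has only 3 edges.
stored_entries = len(A) * len(A[0])
```

Here `A[0][2]` is 0 because nodes 0 and 2 are not adjacent, while `A2[0][2]` is 1 because they are joined by one two-step walk through node 1. The path information only appears after extra computation on the matrix.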
"Fake News" Research

Understanding and characterizing fake news and designing techniques to detect it.

A summary of fake news research can be obtained through our survey. Our work includes research on detecting fake news using content or link (network) information and ways to detect fake news early. We have also recently worked on introducing the first techniques to assess the intent of fake news spreaders: see this paper. For more information see our KDD and WSDM Tutorials on the topic here.
Mining across Sites

Users are often active on multiple social media sites. To systematically study users, we need their information on all sites.

To mine across social media sites, we focus on two specific problems. First, how does user behavior vary across sites (e.g., the difference between LinkedIn friends and Facebook friends)? In addition to designing new techniques, we investigate means to scale and adapt traditional models that analyze user behavior on a single site to multiple sites. For recent results on this research question, see my papers in Information Fusion'16 and ICWSM'14 and this book chapter. Second, I study user behaviors that are only observed across sites. An example is our study on user migrations across sites.
Identifying Users across Sites

Investigating means to identify the same user across social media sites, allowing a comprehensive understanding of users online.

I investigate identifying the same user across multiple sites using link (friendship) information [TKDD'15] and content information [ICWSM'09, KDD'13]. User identification using link information is closely related to the graph isomorphism problem...
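To make the content-based setting concrete, here is a minimal sketch that matches a username from one site against candidate usernames on another using string similarity. Using usernames as the content signal, the `difflib` similarity measure, and all names and sample data are assumptions for illustration; this is not the method of the cited papers.

```python
from difflib import SequenceMatcher
from typing import List


def username_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1] between two usernames."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match(username: str, candidates: List[str]) -> str:
    """Return the candidate username most similar to the given one."""
    return max(candidates, key=lambda c: username_similarity(username, c))


# Hypothetical usernames observed on a second site.
site_b_usernames = ["jane_doe88", "mountain_biker", "j.smith"]
match = best_match("JaneDoe", site_b_usernames)
```

A real system would combine many more signals and would need a threshold for deciding that no candidate matches at all; the sketch always returns the closest candidate.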
Analyzing Human Behavior Using Online Traces

Realistically model, predict, or mine human behavior.

My research has investigated means to realistically analyze human behavior online by focusing on ways to exploit information redundancies generated by user behavior. The methodology has been used to identify sarcasm on Twitter and to identify users across sites, among other behaviors. For more on the topic, see this article, this chapter, or our recent workshop on the topic. As a by-product, my research on human behavior modeling has had implications for information verification, privacy, and security.
Evaluation in Social Media Research

With no face-to-face access to users on social media, how can we guarantee that the patterns that we identify online represent the true intentions of online users?

In data mining terms, ground truth is rarely available online. I recently started to investigate this problem and identified some ways to tackle it. For a succinct review of the topic, see my recent Communications of the ACM (CACM) paper on this issue.
Mining with Absolute Minimum Information

What is the minimum information required to perform data mining tasks on social media?

I have looked at how to utilize minimum information to identify users, detect malicious users, or recommend friends on social media sites with high accuracy. As these methods utilize only minimum information, they scale easily to millions of users. Recently, I have been investigating the theoretical limits of using minimum information.
Theoretical and Empirical Limits of Privacy

How much user privacy is violated by mining users' content?

I have recently investigated the balance between privacy and mining user-generated content by connecting ideas from complexity theory, specifically Kolmogorov complexity. See this paper for some (very!) preliminary results.
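Kolmogorov complexity is not computable, so in practice it is often approximated from above by the length of a compressed encoding. The sketch below uses `zlib` as such a proxy on two hypothetical byte strings; the choice of compressor, the function name, and the sample data are assumptions for illustration and do not reproduce the analysis in the paper.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data: a rough, computable
    stand-in for (an upper bound on) its Kolmogorov complexity."""
    return len(zlib.compress(data, 9))


# Hypothetical highly regular content: the same phrase repeated.
repetitive = b"good morning " * 100

# Hypothetical irregular content of the same length, from a seeded generator.
rng = random.Random(0)
varied = bytes(rng.randrange(256) for _ in range(len(repetitive)))

repetitive_size = compressed_size(repetitive)
varied_size = compressed_size(varied)
```

The repetitive string compresses to far fewer bytes than the irregular one of equal length, which matches the intuition that regular content has a shorter description.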
Psychological and Affective States of Online Users (e.g., sentiments and emotions)

Understanding the role emotions play in social interactions has been a central research question in the social sciences. However, the challenge of obtaining large-scale data on human emotions has left the most fundamental questions on emotions less explored.

Previous research has shown that a person's sentiment and/or mental state depends on those of friends and family. I have investigated how sentiment and information propagate in large-scale networks. See some recent results in our CIKM and ICDM papers and an older paper here.
Online Crisis and Disaster Management

How can we identify areas impacted by natural disasters and provide assistance to individuals impacted by natural disasters using online data?

My research has focused on (1) online means to map areas impacted by natural disasters in real time [ICDM'15], (2) identifying relevant users who provide the most useful information during crises [HT 2014], and (3) systematic approaches to crowdsource user-generated content during disasters [CMOT'12].

Reza Zafarani

Syracuse University
