DSoAP – Distributed Social Analytics Platform

Established: 2015-06-01

Home page: https://www.microsoft.com/en-us/research/project/dsoap-distributed-social-analytics-platform/

Overview

The Distributed Social Analytics Platform (DSoAP) project is focused on the “Huge Data” problem in social policy research caused by the breadth of data involved. Using aggregate social media data to investigate and validate social issues (such as employment, health and fiscal policy) requires analyzing many months or years of data. DSoAP is applying intelligent compaction, pre-indexing and distribution of data across a server cluster to achieve responsive query times for online data exploration.

Twitter is much more than cat pictures and what people eat for lunch: it is a treasure trove of data about people’s life events, experiences, and opinions.

Recent research has started to look at how to use broader aggregate data to investigate and validate social issues such as employment, health and fiscal policy. A defining characteristic of this type of social policy research is the timeline and breadth of data involved. While most tweet analysis concentrates on a short sliding time window of the order of hours or days, extracting meaningful social policy trends typically involves looking at many months or even years of data.

With ~500 million new tweets (~2-3 TB) being added to the Twitter data corpus daily, creating systems that can efficiently handle that volume of data is a challenging task. In the DSoAP project, we are working on solutions for this “huge data” problem by applying intelligent compaction, pre-indexing and distribution of data across a cluster of machines to achieve reasonable query times for online data exploration.
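To give a sense of scale, 2-3 TB per day comes to roughly 730 to 1,095 TB of raw data per year, which is why scanning raw tweets at query time is impractical. The sketch below illustrates the general idea of compaction, pre-indexing and distribution on toy data. It is not the DSoAP implementation: the shard count, the function names (`shard_for`, `compact`, `query`), the whitespace tokenizer and the sample tweets are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from datetime import date
import zlib

NUM_SHARDS = 4  # hypothetical cluster size, chosen only for this example


def shard_for(term: str, num_shards: int = NUM_SHARDS) -> int:
    """Stable hash so a given term always maps to the same shard."""
    return zlib.crc32(term.encode("utf-8")) % num_shards


def compact(tweets):
    """Collapse raw (day, text) tweets into per-shard {term: {day: count}} indexes.

    This stands in for compaction and pre-indexing: the raw text is discarded
    and only daily per-term tweet counts are kept.
    """
    shards = [defaultdict(Counter) for _ in range(NUM_SHARDS)]
    for day, text in tweets:
        # Count each term at most once per tweet (naive whitespace tokenizer).
        for term in set(text.lower().split()):
            shards[shard_for(term)][term][day] += 1
    return shards


def query(shards, term, start, end):
    """Number of tweets mentioning `term` between start and end, inclusive.

    Only the single shard that owns the term is consulted.
    """
    term = term.lower()
    per_day = shards[shard_for(term)].get(term, {})
    return sum(n for day, n in per_day.items() if start <= day <= end)


# Made-up sample data, not real tweets.
tweets = [
    (date(2015, 6, 1), "Lost my job today"),
    (date(2015, 6, 1), "New job new city"),
    (date(2015, 7, 3), "Job hunting again"),
    (date(2016, 1, 9), "Cat pictures"),
]
shards = compact(tweets)
jobs_2015 = query(shards, "job", date(2015, 1, 1), date(2015, 12, 31))
print(jobs_2015)  # 3
```

In this toy design a query over a whole year touches only small daily counters on one shard, so its cost depends on the number of days in the range rather than on the number of raw tweets. A real system would also need to handle richer queries, skewed terms and shard replication, none of which is shown here.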

Publications

Alexandra Olteanu, Onur Varol, and Emre Kiciman. 2017. Distilling the Outcomes of Personal Experiences: A Propensity-scored Analysis of Social Media, in Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing (CSCW ’17). ACM, New York, NY, USA, 370-386. DOI: https://doi.org/10.1145/2998181.2998353

Emre Kıcıman, Scott Counts, Michael Gamon, Munmun De Choudhury, and Bo Thiesson. 2014. Discussion Graphs: Putting Social Media Analysis in Context, in Intl. Conf. on Weblogs and Social Media (ICWSM-14). AAAI, 2 June 2014.