Projects

Our work spans a wide range of uses of social media for health applications. We focus both on novel methods in natural language processing and machine learning and on new applications.

Digital Disease Surveillance: influenza

We have built a system that can estimate the daily or weekly prevalence of influenza in a geographic location based on the normalized volume of Twitter messages ("tweets") that indicate an influenza infection. The system explicitly attempts to automatically distinguish tweets about a flu infection (e.g., "sick with the flu") from tweets that discuss the flu in other ways (e.g., "I'm worried about this swine flu").
  • David Broniatowski, Michael J. Paul, Mark Dredze. National and Local Influenza Surveillance through Twitter: An Analysis of the 2012-2013 Influenza Epidemic. PLOS ONE, 2013. [PDF]
  • Alex Lamb, Michael J. Paul, Mark Dredze. Separating Fact from Fear: Tracking Flu Infections on Twitter. Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), Atlanta. June 2013. [PDF]
  • Mark Dredze, Michael J. Paul, Shane Bergsma, Hieu Tran. Carmen: A Twitter Geolocation System with Applications to Public Health. AAAI Workshop on Expanding the Boundaries of Health Informatics Using AI (HIAI), Bellevue, WA. July 2013. [PDF]
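
The "normalized volume" idea above can be illustrated with a minimal sketch. It assumes that tweets have already been classified as infection reports and counted per week. The function name, the week labels, and the counts are all hypothetical, and this is not the system's actual implementation.

```python
def normalized_weekly_rate(infection_counts, total_counts):
    """Return {week: infection tweets / all tweets} for weeks with any tweets.

    Dividing by the total volume keeps growth or decline in overall
    Twitter usage from being mistaken for a change in flu prevalence.
    """
    rates = {}
    for week, total in total_counts.items():
        if total > 0:  # skip weeks with no collected tweets
            rates[week] = infection_counts.get(week, 0) / total
    return rates


# Hypothetical toy counts, for illustration only.
infection = {"2013-W01": 120, "2013-W02": 90}
total = {"2013-W01": 60000, "2013-W02": 60000, "2013-W03": 0}
rates = normalized_weekly_rate(infection, total)
```

Weeks with no collected tweets are dropped rather than reported as zero, so a gap in data collection is not read as a week without flu.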

Tracking Drug Use: prescription and illicit

We have downloaded over 400,000 messages from an online community of illicit drug users, which can be used for a large-scale analysis of temporal and demographic trends in drug use, and to identify and understand novel and emerging drugs. Using natural language processing tools, we were able to automatically extract user-reported information such as the effects, side effects, and dosage information of new drugs. We are in the process of validating this data by comparing the user demographics of the online community to the same information reported in government surveys. We have also started preliminary investigations into monitoring prescription opioid usage patterns in Twitter.
  • Michael J. Paul and Mark Dredze. Drug Extraction from the Web: Summarizing Drug Experiences with Multi-Dimensional Topic Models. Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), Atlanta. June 2013. [PDF]

Patient Safety

This project attempts to identify informal reports of medical errors in Twitter. We identified 170 tweets describing apparent patient safety events, which three experts coded by the type and source of the error, who reported the error, and the response to the error. This approach has potential because it captures informal self-reported events that might not be captured by traditional indicators of patient safety.
  • Atul Nakhasi, Ralph J. Passarella, Sarah G. Bell, Michael J. Paul, Mark Dredze, Peter J. Pronovost. Malpractice and Malcontent: Analyzing Medical Complaints in Twitter. In the AAAI 2012 Fall Symposium on Information Retrieval and Knowledge Discovery in Biomedical Text, Arlington, VA. November 2012. [PDF]
  • Ralph J. Passarella, Atul Nakhasi, Sarah G. Bell, Michael J. Paul, Peter J. Pronovost, Mark Dredze. Twitter as a Source for Learning about Patient Safety Events. In the AMIA 2012 Annual Symposium (American Medical Informatics Association), Chicago, IL. November 2012. [oral presentation]

Tracking self-medicating behavior

We automatically analyzed 1.6 million Twitter messages that mentioned the names of OTC drugs and counted how often these were mentioned in the context of various illnesses/ailments. We found that some anecdotally popular methods of self-medication were also prominent on Twitter, such as using antihistamines as sleep aids. We also discovered that a large number of users reported taking antibiotics for influenza, confirming a result previously described in the literature.
  • Michael J. Paul and Mark Dredze. You are what you Tweet: Analyzing Twitter for Public Health. International AAAI Conference on Weblogs and Social Media (ICWSM), Barcelona, Spain. July 2011. [PDF]
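
Counting how often drug names appear alongside ailments can be sketched as a simple co-occurrence count. The keyword sets, the function name, and the example tweets below are hypothetical, and plain keyword matching is an assumption made for illustration. The published analysis may use a different method.

```python
import re
from collections import Counter

# Hypothetical keyword lists; a real study would use much larger lexicons.
DRUGS = {"benadryl", "tylenol"}
AILMENTS = {"headache", "insomnia", "flu"}


def count_cooccurrences(tweets, drugs=DRUGS, ailments=AILMENTS):
    """Count tweets in which a drug name and an ailment both appear."""
    counts = Counter()
    for text in tweets:
        # Lowercase and keep alphabetic tokens only.
        tokens = set(re.findall(r"[a-z]+", text.lower()))
        for drug in drugs & tokens:
            for ailment in ailments & tokens:
                counts[(drug, ailment)] += 1
    return counts


# Invented example messages, for illustration only.
tweets = [
    "Took benadryl for my insomnia again",
    "tylenol is not helping this headache",
    "Benadryl, insomnia, repeat",
]
counts = count_cooccurrences(tweets)
```

Each tweet contributes at most one count per drug and ailment pair, so a message that repeats a drug name is not counted twice.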

Measuring healthcare quality via doctor reviews

We downloaded over 50,000 reviews from RateMDs.com, a website where patients write reviews of their healthcare providers. The goal was to automatically discover the prominent themes described in the text of a large number of reviews, in order to identify the issues that are most important to patients. We are currently working to see how the patient perspective varies across geographic regions, and to evaluate whether the perceptions described in the reviews correlate with known metrics of provider quality. There is also potential to analyze these reviews to understand more specific concerns, such as patient safety issues.
  • Michael J. Paul, Byron C. Wallace, Mark Dredze. What Affects Patient (Dis)satisfaction? Analyzing Online Doctor Ratings with a Joint Topic-Sentiment Model. AAAI Workshop on Expanding the Boundaries of Health Informatics Using AI (HIAI), Bellevue, WA. July 2013. [PDF]

Quantifying Mental Health Using Social Media

The ubiquity of social media provides a rich opportunity to enhance the data available to mental health clinicians and researchers, allowing better-informed and better-equipped mental health research. We are investigating the potential of social media data for a number of mental illnesses, including post-traumatic stress disorder (PTSD), major depressive disorder, bipolar disorder, and seasonal affective disorder.
  • Glen Coppersmith, Mark Dredze, Craig Harman. Quantifying Mental Health Signals in Twitter. ACL Workshop on Computational Linguistics and Clinical Psychology, 2014.
  • Glen Coppersmith, Craig Harman, Mark Dredze. Measuring Post Traumatic Stress Disorder in Twitter. International Conference on Weblogs and Social Media (ICWSM), 2014.

Answering BRFSS questions

We have shown that the volume of Twitter messages about various health topics, such as diet and exercise, is correlated with existing survey data on behavioral risk factors, such as rates of physical activity and obesity. We compared 1.6 million (and more recently, 140 million) tweets to the CDC's BRFSS survey for these experiments. We believe that social media analysis could complement traditional survey data.
  • Michael J. Paul and Mark Dredze. You are what you Tweet: Analyzing Twitter for Public Health. International AAAI Conference on Weblogs and Social Media (ICWSM), Barcelona, Spain. July 2011. [PDF]
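
A comparison between tweet volume and survey data can be sketched as a Pearson correlation across regions. The function below is a generic implementation. The variable names and numbers are invented for illustration and are not BRFSS figures or results from the paper, and the choice of Pearson correlation is an assumption.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (norm_x * norm_y)


# Made-up values: per-region share of tweets on a health topic,
# and a per-region survey rate for a related risk factor.
tweet_share = [0.010, 0.020, 0.030, 0.040]
survey_rate = [0.20, 0.25, 0.30, 0.35]
r = pearson(tweet_share, survey_rate)
```

The toy values are exactly linear, so `r` comes out at 1 up to floating-point error. Real data would give a weaker correlation, and one should check the number of regions before reading much into it.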

Tracking Public Health with Search Trends

When people want to learn more about their health, they search the web. Mining trends in search queries can reveal public health information. We identify patterns in these trends to reveal interesting and informative aspects of public health. For example, we have shown that the United States' Great Recession (2008-2011) had a negative impact on health, reflected in increased searches for health problems.
  • John W. Ayers, Benjamin M. Althouse, Mark Dredze. Could Behavioral Medicine Lead the Web Data Revolution? Journal of the American Medical Association (JAMA), 2014.
  • John W. Ayers, Benjamin M. Althouse, Morgan Johnson, Mark Dredze, Joanna E. Cohen. What's the Healthiest Day? Circaseptan (Weekly) Rhythms in Healthy Considerations. American Journal of Preventive Medicine, 2014.
  • Ben Althouse, Jon-Patrick Allem, Matt Childers, Mark Dredze, John W Ayers. Population Health Concerns During the United States' Great Recession. American Journal of Preventive Medicine, 2014;46(2):166-170.