The proliferation of social media -- such as Twitter, Facebook, blogs, and Web forums -- has created an unprecedented, continuous stream of messages containing the thoughts, opinions, and beliefs of millions of people. Beyond the primary benefit that users of this technology enjoy, a secondary benefit is emerging as scientists discover how to analyze this new data source for insights into society. Evidence is mounting that such analysis can be valuable in understanding public health, finance, politics, social unrest, and natural disasters.
The goal of my research is two-fold: (1) to leverage this unprecedented source of data to advance research in automated processing of informal human communication; and (2) to apply these techniques to analyze trends in social media and produce socially beneficial technology. My research contributions can be categorized into three main areas:

| Year | Title | Venue |
| --- | --- | --- |
| 2012 | A demographic analysis of online sentiment during Hurricane Irene | NAACL-HLT Workshop on Language in Social Media, 2012 |
| | Lightweight methods to estimate influenza rates and alcohol sales volume from Twitter messages | Language Resources and Evaluation, Special Issue on Analysis of Short Texts on the Web, 2012. Preprint; final version available at springer.com |
| 2011 | SampleRank: Training factor graphs with atomic gradients | Proceedings of the International Conference on Machine Learning (ICML), 2011 |
| 2010 | Detecting influenza epidemics by analyzing Twitter messages | arXiv:1007.4748v1 [cs.IR], 2010 |
| | Towards detecting influenza epidemics by analyzing Twitter messages | KDD Workshop on Social Media Analytics, 2010 |
| 2009 | SampleRank: Learning preferences from atomic gradients | Neural Information Processing Systems (NIPS) Workshop on Advances in Ranking, 2009 |
| | An entity-based model for coreference resolution | SIAM International Conference on Data Mining, 2009 |
| 2008 | Learning and inference in weighted logic with application to natural language processing | Ph.D. Thesis, University of Massachusetts, Amherst, 2008 |
| 2007 | Canonicalization of Database Records using Adaptive Similarity Measures | Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2007 |
| | Sparse Message Passing Algorithms for Weighted Maximum Satisfiability | New England Student Colloquium on Artificial Intelligence (NESCAI), 2007 |
| | Author Disambiguation using Error-driven Machine Learning with a Ranking Loss Function | Sixth International Workshop on Information Integration on the Web (IIWeb-07), 2007 |
| | First-Order Probabilistic Models for Coreference Resolution | Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics (HLT/NAACL), 2007 |
| 2006 | Corrective Feedback and Persistent Learning for Information Extraction | Artificial Intelligence, 2006 |
| | Tractable Learning and Inference with High-Order Representations | International Conference on Machine Learning Workshop on Open Problems in Statistical Relational Learning, 2006 |
| | Learning field compatibilities to extract database records from unstructured text | Conference on Empirical Methods in Natural Language Processing (EMNLP), 2006 |
| | Practical Markov logic containing first-order quantifiers with application to identity uncertainty | Human Language Technology Workshop on Computationally Hard Problems and Joint Inference in Speech and Language Processing (HLT/NAACL), 2006 |
| | Integrating probabilistic extraction models and data mining to discover relations and patterns in text | Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics (HLT/NAACL), 2006 |
| 2005 | Learning clusterwise similarity with first-order features | Neural Information Processing Systems (NIPS) Workshop on the Theoretical Foundations of Clustering, 2005 |
| | A conditional model of deduplication for multi-type relational data | University of Massachusetts IR-443, 2005 |
| | Joint deduplication of multiple record types in relational data | ACM CIKM International Conference on Information and Knowledge Management, 2005 |
| | Reducing labeling effort for structured prediction tasks | The Twentieth National Conference on Artificial Intelligence (AAAI), 2005 |
| | Gene prediction with conditional random fields | University of Massachusetts, Amherst UM-CS-2005-028, 2005 |
| 2004 | Dependency tree kernels for relation extraction | 42nd Annual Meeting of the Association for Computational Linguistics (ACL), 2004 |
| | Interactive information extraction with constrained conditional random fields | Nineteenth National Conference on Artificial Intelligence (AAAI), 2004. Best Paper Award (Honorable Mention) |
| | Confidence estimation for information extraction | Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics (HLT/NAACL), 2004 |
| | Extracting social networks and contact information from email and the Web | First Conference on Email and Anti-Spam (CEAS), 2004 |
| 2003 | Maximizing cascades in social networks | University of Massachusetts, 2003 |


| Term | Courses |
| --- | --- |
| Spring 2013 | CS207: Programming II; CS300: Client-side Web Development |
| Fall 2012 | CS207: Programming II; CS300: Client-side Web Development |