I’m currently a Lecturer1 at University College London, where I lead the Machine Reading Lab. Before that, I was a postdoc and research scientist with Andrew McCallum at UMass Amherst, a researcher at the University of Tokyo and DBCLS under Tsujii Junichi, and a PhD student with Ewan Klein at the University of Edinburgh.

Whenever we advance our understanding of the world, we write down our findings in publications, patents, webpages, and the like. This results in a vast and ever-growing body of literature that is impossible to access and comprehend effectively. The overarching goal of my research is to extract and distill this knowledge.

To achieve this goal, I have been working in several areas of Natural Language Processing and Machine Learning that I believe need to be advanced:

  • Efficient Probabilistic Inference: I designed methods that efficiently find exact MAP states and approximate marginals for high-tree-width graphical models with millions of variables and factors (UAI08, NAACL10a, NAACL09b, UAI10, EMNLP12).
  • Joint Inference: I developed language processors in which information can flow both up and down the pipeline, reducing the problem of cascading errors (EMNLP11, NAACL09a, ACL09).
  • Distant Supervision: I trained graphical models to extract Freebase facts from text, using no textual annotation but only existing Freebase facts as the training signal (ECML10, EMNLP10, NAACL10b).
  • Linguistic Processors: I developed syntactic and semantic parsers that can efficiently capture global dependencies within the semantic structure to be extracted (EMNLP06, NAACL09a, ICASSP08).
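The distant supervision idea above can be illustrated with a toy sketch. This is not the actual models from the cited papers, just the core labelling heuristic under hypothetical data: a sentence mentioning an entity pair that appears in the knowledge base is labelled with that pair's relation, with no manual textual annotation.

```python
# Toy distant-supervision sketch (illustrative only): heuristically label
# sentences with the KB relation of the entity pair they mention.

# Hypothetical knowledge-base facts: (entity1, entity2) -> relation.
KB = {
    ("Barack Obama", "Hawaii"): "born_in",
    ("Google", "Mountain View"): "based_in",
}

# Hypothetical sentences, each with the entity pair it mentions.
sentences = [
    ("Barack Obama", "Hawaii", "Barack Obama was born in Hawaii."),
    ("Google", "Mountain View", "Google is headquartered in Mountain View."),
    ("Barack Obama", "Google", "Barack Obama visited Google."),
]

def distant_labels(sentences, kb):
    """Pair each sentence with the relation its entity pair has in the KB,
    or "NONE" if the pair is not a known fact (a negative example)."""
    return [(text, kb.get((e1, e2), "NONE")) for e1, e2, text in sentences]

training_data = distant_labels(sentences, KB)
```

The resulting (sentence, relation) pairs then serve as noisy training data for a relation extractor; the real systems additionally model the noise this heuristic introduces.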

The above research has enabled me to develop state-of-the-art systems that consistently rank highest in international competitions on extracting information from biomedical literature: in 2011, in collaboration with colleagues at Stanford, I developed an event extractor for biomedical text that ranked 1st in three tracks of the BioNLP 2011 Shared Task (BIONLP11a, BIONLP11b); in 2009 I developed an event extraction system that ranked 1st in Track 2 of the BioNLP 2009 Shared Task (BIONLP09); and in 2005 my protein-protein interaction extraction model ranked 1st in the Learning Language in Logic Task (LLL05).

I am also very interested in Probabilistic Programming and Statistical Relational Learning. Progress in AI is slowed by the difficulty of engineering large-scale, end-to-end systems that reason under uncertainty, and Probabilistic Programming frameworks dramatically simplify this task. I have developed markov thebeast, a Markov Logic engine used in research institutes around the world; it has helped developers create state-of-the-art probabilistic models without requiring a postgraduate degree in Machine Learning. I am also involved in the development of factorie.

See my publications, software, and scholar pages for more details.

I am also interested in helping out the Ishinomaki Children’s Refuge Center.

1 Roughly equivalent to US Assistant Professor.