Understanding the actionable outcomes of a dialogue requires effectively modeling situational roles of dialogue participants, the structure of the dialogue and the relevance of each utterance to an eventual action. We develop a latent-variable model that can capture these notions and apply it in the context of courtroom dialogues, in which the objection speech act is used as binary supervision to drive the learning process. We demonstrate quantitatively and qualitatively that our model is able to uncover natural discourse structure from this distant supervision