Content Planning for NLG

What is Content Planning

Content Planning, also known as Document Planning and Macroplanning, is the first stage of a Natural Language Generation (NLG) pipeline.

Given a semantic input together with a set of communicative goals, an NLG system aims to produce a textual output that conveys the information present in the input in such a way that the communicative goals are achieved.

This complex task is normally divided into three steps (Reiter, 1994): Document/Content Planning, Microplanning and Surface Realization (see below for details). In such a pipelined design, the input is modified and refined at every step, getting closer to the textual output each time. Abstract concepts get grounded, synonyms are chosen and the actual wording is selected, in order to convey the semantic information present in the input.

For good background reading on the subject (a little dated by now, of course), take a look at my candidacy exam.

Generation Pipeline

Briefly, each of the pipeline steps can be summarized as follows:

  1. Content Planning: deals with Content Selection (choosing what to say) and Document Structuring (choosing how to say it).

  2. Microplanning: involves three subtasks: aggregation (avoiding repeated information), referring expression generation (choosing pronouns and descriptive expressions) and lexical choice (deciding which words to use to refer to a particular concept).

  3. Surface Realization: the last step, which inflects the words provided by the microplanner and builds the actual grammatical sentences, taking into account fine-grained grammatical issues such as agreement and subject-verb ordering.
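As a rough illustration of how the three steps chain together, here is a minimal Python sketch. It is a toy under stated assumptions, not an actual NLG system: the `Message` type, the function names and the (subject, predicate, object) fact format are all hypothetical, and each stage is reduced to a single trivial operation.

```python
from dataclasses import dataclass

# Hypothetical toy types and functions, for illustration only.

@dataclass
class Message:
    subject: str
    predicate: str
    obj: str

def content_planning(facts, goal):
    """Toy content planning: keep the facts about the goal entity, in input order."""
    return [Message(*f) for f in facts if f[0] == goal]

def microplanning(messages):
    """Toy referring expression generation: pronominalize a repeated subject."""
    seen = set()
    refined = []
    for m in messages:
        subject = "it" if m.subject in seen else m.subject
        seen.add(m.subject)
        refined.append(Message(subject, m.predicate, m.obj))
    return refined

def surface_realization(messages):
    """Toy realization: linearize each message and capitalize the sentence."""
    sentences = []
    for m in messages:
        s = f"{m.subject} {m.predicate} {m.obj}."
        sentences.append(s[0].upper() + s[1:])
    return " ".join(sentences)

def generate(facts, goal):
    # Each stage refines the output of the previous one.
    return surface_realization(microplanning(content_planning(facts, goal)))

# Made-up input facts as (subject, predicate, object) triples.
facts = [
    ("the train", "leaves at", "9:00"),
    ("the bus", "leaves at", "9:30"),
    ("the train", "arrives at", "11:15"),
]
text = generate(facts, "the train")
print(text)  # The train leaves at 9:00. It arrives at 11:15.
```

The point of the sketch is only the data flow: the fact about the bus is dropped during content planning, the pronoun is introduced during microplanning, and capitalization and punctuation are handled during realization.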

The Content Planning task is, therefore, the one closest to the semantic input. It focuses on building a Document Plan, a high-level description of a document. This undertaking can be further subdivided into two subtasks:

  • Content Selection

  • Text Structuring
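
The two subtasks can be sketched in Python as follows. This is only one possible reading, under stated assumptions: the `DocumentPlan` class, the dictionary-based fact format, the relevance filter and the fixed section order are all hypothetical choices made for illustration, and real planners use far richer representations.

```python
from dataclasses import dataclass, field

# Hypothetical document plan: a tree whose nodes group selected messages.
@dataclass
class DocumentPlan:
    label: str
    messages: list = field(default_factory=list)
    children: list = field(default_factory=list)

def select_content(facts, relevant_attributes):
    """Content Selection: keep only the facts judged worth communicating."""
    return [f for f in facts if f["attribute"] in relevant_attributes]

def structure_text(selected, section_order):
    """Text Structuring: group the selected facts into ordered sections."""
    root = DocumentPlan("document")
    for section in section_order:
        msgs = [f for f in selected if f["attribute"] == section]
        if msgs:
            root.children.append(DocumentPlan(section, msgs))
    return root

# Made-up semantic input about a single entity.
facts = [
    {"entity": "person-1", "attribute": "occupation", "value": "engineer"},
    {"entity": "person-1", "attribute": "shoe_size", "value": "42"},
    {"entity": "person-1", "attribute": "birth_year", "value": "1970"},
]

selected = select_content(facts, {"birth_year", "occupation"})
plan = structure_text(selected, ["birth_year", "occupation"])
print([child.label for child in plan.children])  # ['birth_year', 'occupation']
```

Here selection drops the irrelevant `shoe_size` fact, and structuring reorders the remaining facts so that the birth year precedes the occupation, independently of their order in the input.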