Tired of keeping French and English versions of your Web site up to date? The field of natural language generation (NLG) has methods and insights that can help you.
In this talk, I will present PHP-NLGen (https://github.com/DrDub/php-nlgen), my general NLG framework written in PHP, and show an example of how to achieve multilingual NLG with it.
NLG is not for the faint of heart, but it is a fun, young field with plenty of opportunities for innovation.
About the speaker: Dr Duboue has a PhD in Computer Science from Columbia University (New York), specializing in NLG, and five years of corporate research experience (including the IBM Research Watson project).
- printf++
- intelligent templates
When to use it
- Text Output vs. Graphs
- Capture generalization across dimensions
IF culture_feries + culture_dimanche < 30 THEN pct_change(reading, -10)
IF culture_spectacles < 50 THEN on_strike(employee(maison_culture))
Combine predicates (such as on_strike) and remove redundant information.
Transform each predicate and its arguments into French or English, as required.
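The "combine predicates" step can be sketched in PHP as follows. This is a minimal illustration, not PHP-NLGen code: it assumes a hypothetical predicate representation `['pred' => name, 'args' => [...]]` and hypothetical actor ids such as `heated_pool_employees`. All on_strike predicates are merged into one whose single argument is the list of actors, kept at the position of the first occurrence.

```php
<?php
// Hypothetical predicate format: ['pred' => name, 'args' => [...]].
// Merges every on_strike(actor) into a single on_strike([actor1, actor2, ...]),
// placed where the first on_strike appeared; other predicates are untouched.
function aggregate_on_strike(array $predicates): array
{
    $result = [];
    $strikeIndex = null; // position of the merged on_strike predicate in $result
    foreach ($predicates as $p) {
        if ($p['pred'] !== 'on_strike') {
            $result[] = $p;
        } elseif ($strikeIndex === null) {
            $strikeIndex = count($result);
            $result[] = ['pred' => 'on_strike', 'args' => [[$p['args'][0]]]];
        } else {
            // Append this actor to the list held by the merged predicate.
            $result[$strikeIndex]['args'][0][] = $p['args'][0];
        }
    }
    return $result;
}

// Hypothetical input produced by the rules.
$preds = [
    ['pred' => 'on_strike', 'args' => ['heated_pool_employees']],
    ['pred' => 'pct_change', 'args' => ['car_accident', -5]],
    ['pred' => 'on_strike', 'args' => ['maison_culture_employees']],
];
print_r(aggregate_on_strike($preds));
```

A realizer that handles a list of actors as a coordinated noun phrase can then produce "The heated pool employees and the employees at the Maison de la Culture will go on strike" from the single merged predicate.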
Les dépenses à des activités culturelles seront les plus basses dans la ville. Les dépenses dans les soins de la rue seront les plus élevées dans la ville. Les fonctionnaires de la piscine intérieure et les fonctionnaires de la Maison de la Culture vont aller en grève. Le nombre d'accidents de voiture diminuera de 5 pour cent.
The spending on cultural activities will be the lowest within the city. The spending on street care will be the highest within the city. The heated pool employees and the employees at the Maison de la Culture will go on strike. The number of car accidents will decrease by 5 percent.
- Aggregation
  - Instead of: "The heated pool employees will go on strike. The employees at the Maison de la Culture will go on strike."
  - Generate: "The heated pool employees and the employees at the Maison de la Culture will go on strike."
- Subsumption
  - Instead of: "Les dépenses dans les soins de la rue seront les plus élevées dans la ville. Les dépenses dans les soins de la rue seront les plus élevées dans la province."
  - Generate: "Les dépenses dans les soins de la rue seront les plus élevées dans la province."
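Subsumption can be decided with the ontology's "includes" relation between regions. The sketch below is an assumption-laden illustration, not the framework's implementation: it uses a hypothetical predicate format `['pred' => ..., 'args' => [position, metric, region]]` and hypothetical helpers `region_includes()` and `remove_subsumed()`. A benchmarked predicate is dropped when another one with the same position and metric holds over a region that includes it.

```php
<?php
// Hypothetical ontology fragment mirroring the "city"/"province" frames.
$ontology = [
    'city'     => ['class' => 'region'],
    'province' => ['class' => 'region', 'includes' => 'city'],
];

// True if region $outer includes region $inner, following "includes" links.
function region_includes(array $onto, string $outer, string $inner): bool
{
    $seen = []; // guards against cycles in the "includes" chain
    while (isset($onto[$outer]['includes']) && !isset($seen[$outer])) {
        $seen[$outer] = true;
        $outer = $onto[$outer]['includes'];
        if ($outer === $inner) {
            return true;
        }
    }
    return false;
}

// Drops benchmarked(position, metric, region) predicates implied by another
// benchmarked predicate with the same position and metric over a larger region.
function remove_subsumed(array $onto, array $predicates): array
{
    $result = [];
    foreach ($predicates as $p) {
        $subsumed = false;
        if ($p['pred'] === 'benchmarked') {
            [$pos, $metric, $region] = $p['args'];
            foreach ($predicates as $q) {
                if ($q['pred'] === 'benchmarked'
                    && $q['args'][0] === $pos
                    && $q['args'][1] === $metric
                    && region_includes($onto, $q['args'][2], $region)) {
                    $subsumed = true;
                    break;
                }
            }
        }
        if (!$subsumed) {
            $result[] = $p;
        }
    }
    return $result;
}

$preds = [
    ['pred' => 'benchmarked', 'args' => ['highest', 'street_care', 'city']],
    ['pred' => 'benchmarked', 'args' => ['highest', 'street_care', 'province']],
];
print_r(remove_subsumed($ontology, $preds));
```

Only the province-level predicate survives, which yields the single French sentence shown above.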
- Adding new entities (and predicates).
- More agents that can go on strike.
- More metrics in which to excel at the city and provincial levels.
- Avoid cut&paste errors.
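- Edit this: "Les dépenses à des activités culturelles seront les plus basses dans la ville."
- Into this: "Le nombre d'accidents de voiture seront les plus basses dans la ville." (the verb and adjective still agree with the old plural subject "les dépenses" instead of the singular "le nombre")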
- Grammar encoded into functions
- A repository of general records
- A repository of lexical entries
- Input: arguments and data as needed (function-dependent).
- Output: the generated string plus a dictionary with semantic information about it.
- For example, generate the subject first, so as to know whether it is singular or plural, before generating the verb phrase.
- Such ordering might not exist for complex sentences (a full-fledged generation system is then needed).
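The subject-first ordering can be illustrated with a small standalone PHP sketch. It is not PHP-NLGen's API: the functions `gen_subject()`, `gen_vp()` and `gen_sentence()` and the noun entries are hypothetical, while the verb forms come from the French "will_decrease" lexicon entry. The subject function returns both text and semantics, and the verb phrase picks its form from the subject's number.

```php
<?php
// Hypothetical mini-lexicon: nouns record their number; verbs have one form
// per number, as in the "will_decrease" entry of the French lexicon.
$nouns = [
    'car_accidents'     => ['text' => "le nombre d'accidents de voiture", 'num' => 'sing'],
    'cultural_spending' => ['text' => 'les dépenses culturelles', 'num' => 'pl'],
];
$verbs = [
    'will_decrease' => ['sing' => 'diminuera', 'pl' => 'diminueront'],
];

// Returns the text together with semantic info, in a ['text', 'sem'] shape.
function gen_subject(array $nouns, string $id): array
{
    return ['text' => $nouns[$id]['text'], 'sem' => ['num' => $nouns[$id]['num']]];
}

// Chooses the verb form that agrees with the subject's number.
function gen_vp(array $verbs, string $id, array $subjectSem): string
{
    return $verbs[$id][$subjectSem['num']];
}

function gen_sentence(array $nouns, array $verbs, string $subject, string $verb): string
{
    $subj = gen_subject($nouns, $subject);      // subject first...
    $vp = gen_vp($verbs, $verb, $subj['sem']);  // ...so the verb can agree with it
    return ucfirst($subj['text']) . ' ' . $vp . '.';
}

echo gen_sentence($nouns, $verbs, 'car_accidents', 'will_decrease'), "\n";
echo gen_sentence($nouns, $verbs, 'cultural_spending', 'will_decrease'), "\n";
```

The same verb id yields "diminuera" or "diminueront" depending on the subject, which is exactly the information a cut-and-paste edit loses.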
function on_strike($data)
{
    $actor = $data[0];
    // Generate the subject first, storing its semantics under the id 'actor'.
    $actor_str = $this->gen('np', array('head' => $actor), 'actor');
    $sem = $this->current_semantics();
    // Pass the subject's semantics on so the verb phrase can agree with it.
    return $actor_str . ' ' . $this->gen('on_strike_vp', array('subject' => $sem['actor']));
}

function np($data)
{
    $head = $data['head'];
    if (is_object($head)) {
        // Nested predicate, e.g. employee(maison_culture).
        $str = $this->gen($head->predicate, $head->args, 'subpred');
        $sem = $this->current_semantics();
        return array('text' => $str, 'sem' => $sem['subpred']);
    } elseif (is_array($head)) {
        // Coordination of two or more noun phrases: "A, B and C".
        $str = array();
        foreach ($head as $item) {
            $str[] = $this->gen('np', array('head' => $item));
        }
        $text = implode(', ', array_slice($str, 0, count($str) - 1))
            . ' ' . $this->lex->string_for_id('conjunction')
            . ' ' . $str[count($str) - 1];
        // A coordinated noun phrase is plural.
        return array('text' => $text, 'sem' => array('num' => 'pl'));
    } else {
        // Plain lexical entry.
        return array('text' => $this->lex->string($head, $data), 'sem' => $data);
    }
}
- Not unlike a NoSQL database.
- The difference is the helper functions defined for each.
- For ontologies we care about the type of a frame.
- For lexicons, we care about the string(s) verbalizing a concept.
Ontology:
  "city": { "class": "region" },
  "province": { "class": "region", "includes": "city" }
Lexicon (English):
  "maison_culture": { "string": "Maison de la Culture", "class": "place" },
  "heated_pool": { "string": "heated pool", "is_short": "1", "class": "place" }
Lexicon (French):
  "will_decrease": { "pl": "diminueront", "sing": "diminuera", "class": "verb" }
generator.php
- 192 loc
- 11 methods
ontology.php
- 96 loc
- 5 methods
lexicon.php
- 352 loc
- 32 methods
- Multilingual extension: a dictionary mapping language names to lexicons.
- Multilingual extension: the context contains the language to generate.
- Multilingual extension: when calling method 'foo', if it is not defined, the system tries 'foo_$lang' instead.
- The methods can return either a string or a dictionary containing the keys 'text' and 'sem'.
- Either way, a tree is built by successive invocations of gen, by means of a stack.
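The 'foo' then 'foo_$lang' fallback can be sketched with `method_exists()`. The class `MiniGenerator` and its methods are hypothetical and do not reproduce PHP-NLGen's code; the point is that a language-independent method such as `on_strike()` delegates to `on_strike_vp`, which only exists in per-language variants.

```php
<?php
// Hypothetical generator illustrating the per-language method fallback.
class MiniGenerator
{
    private string $lang;

    public function __construct(string $lang)
    {
        $this->lang = $lang;
    }

    // Calls $name if it exists; otherwise falls back to "{$name}_{$lang}".
    public function gen(string $name, array $data): string
    {
        if (method_exists($this, $name)) {
            return $this->$name($data);
        }
        $localized = $name . '_' . $this->lang;
        if (method_exists($this, $localized)) {
            return $this->$localized($data);
        }
        throw new BadMethodCallException("No method $name or $localized");
    }

    // Language-independent: only the verb phrase is localized.
    protected function on_strike(array $data): string
    {
        return $data[0] . ' ' . $this->gen('on_strike_vp', $data);
    }

    protected function on_strike_vp_en(array $data): string
    {
        return 'will go on strike';
    }

    protected function on_strike_vp_fr(array $data): string
    {
        return 'vont aller en grève';
    }
}

echo (new MiniGenerator('en'))->gen('on_strike', ['The employees']), "\n";
echo (new MiniGenerator('fr'))->gen('on_strike', ['Les fonctionnaires']), "\n";
```

Only the genuinely language-specific pieces need a `_fr` or `_en` suffix; everything else is shared.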
- Ontology: find, has, find_all_by_class, find_by_path
- Lexicon: find, has, find_all, query, string_for_id, ...
- The lexicon is quite complex and heavily under development.
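To make the ontology helpers concrete, here is an illustrative re-implementation of three of the method names listed above (has, find, find_all_by_class) over a plain associative array. The class `MiniOntology` is hypothetical and the real ontology.php may behave differently; find_by_path is omitted because its semantics are not described here.

```php
<?php
// Illustrative re-implementation of three ontology helpers; not ontology.php.
class MiniOntology
{
    private array $frames;

    public function __construct(array $frames)
    {
        $this->frames = $frames;
    }

    public function has(string $id): bool
    {
        return isset($this->frames[$id]);
    }

    public function find(string $id): ?array
    {
        return $this->frames[$id] ?? null;
    }

    // Ids of all frames whose "class" field equals $class, in insertion order.
    public function find_all_by_class(string $class): array
    {
        $ids = [];
        foreach ($this->frames as $id => $frame) {
            if (($frame['class'] ?? null) === $class) {
                $ids[] = $id;
            }
        }
        return $ids;
    }
}

$onto = new MiniOntology([
    'city'           => ['class' => 'region'],
    'province'       => ['class' => 'region', 'includes' => 'city'],
    'maison_culture' => ['class' => 'place'],
]);
print_r($onto->find_all_by_class('region'));
```

Querying by type is what the ontology is for; string lookup belongs to the lexicon.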
IF culture_feries > 2 AND culture_dimanche == 52 THEN pct_change(reading, 10)
IF culture_feries + culture_dimanche < 30 THEN pct_change(reading, -10)
IF culture_spectacles < 50 THEN on_strike(employee(maison_culture))
IF deneigement_chargements == 5 AND deneigement_findesemaine < 3 THEN pct_change(car_accident, 10)
IF routier_nidsdepoule > 10 THEN pct_change(car_accident, -5)
pct_change(metric, delta)
on_strike(actor)
benchmarked(position, metric, region)
employee(place)
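The rules above can be encoded directly as data. In this sketch each rule is assumed to be a pair of a PHP closure (the condition) and the predicate it emits; `apply_rules()` here is a hypothetical stand-in for the pipeline step of the same name, and the input values are made up for illustration.

```php
<?php
// Each rule: [condition over the input data, predicate emitted when it holds].
function apply_rules(array $rules, array $data): array
{
    $predicates = [];
    foreach ($rules as [$condition, $predicate]) {
        if ($condition($data)) {
            $predicates[] = $predicate;
        }
    }
    return $predicates;
}

$rules = [
    [fn($d) => $d['culture_feries'] > 2 && $d['culture_dimanche'] == 52,
     ['pred' => 'pct_change', 'args' => ['reading', 10]]],
    [fn($d) => $d['culture_feries'] + $d['culture_dimanche'] < 30,
     ['pred' => 'pct_change', 'args' => ['reading', -10]]],
    [fn($d) => $d['culture_spectacles'] < 50,
     ['pred' => 'on_strike', 'args' => [['pred' => 'employee', 'args' => ['maison_culture']]]]],
    [fn($d) => $d['deneigement_chargements'] == 5 && $d['deneigement_findesemaine'] < 3,
     ['pred' => 'pct_change', 'args' => ['car_accident', 10]]],
    [fn($d) => $d['routier_nidsdepoule'] > 10,
     ['pred' => 'pct_change', 'args' => ['car_accident', -5]]],
];

// Hypothetical input values.
$data = [
    'culture_feries' => 1, 'culture_dimanche' => 20, 'culture_spectacles' => 40,
    'deneigement_chargements' => 3, 'deneigement_findesemaine' => 4,
    'routier_nidsdepoule' => 12,
];
print_r(apply_rules($rules, $data));
```

With these inputs the second, third and fifth rules fire, producing pct_change(reading, -10), on_strike(employee(maison_culture)) and pct_change(car_accident, -5).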
- apply_rules
- sort_predicates
- sentence_planning
- Each predicate is verbalized
- pct_change -> metric
- benchmarked -> metric
- on_strike -> np (might call another predicate, e.g., employee), on_strike_vp
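As an example of verbalizing one predicate, the following sketch renders pct_change(metric, delta) in English. The function `pct_change_en()` and the metric phrases are hypothetical; in the framework, pct_change delegates the metric to its own function, which is simplified here to an array lookup. The sign of delta selects "increase" or "decrease" and its absolute value becomes the percentage.

```php
<?php
// Hypothetical English realization of pct_change(metric, delta).
function pct_change_en(array $metricPhrases, string $metric, int $delta): string
{
    $subject = ucfirst($metricPhrases[$metric]);
    if ($delta === 0) {
        return $subject . ' will not change.';
    }
    // The sign picks the verb; the magnitude becomes the percentage.
    $verb = $delta > 0 ? 'increase' : 'decrease';
    return sprintf('%s will %s by %d percent.', $subject, $verb, abs($delta));
}

// Illustrative metric phrases standing in for the metric function.
$metrics = [
    'car_accident' => 'the number of car accidents',
    'reading'      => 'reading',
];
echo pct_change_en($metrics, 'car_accident', -5), "\n";
echo pct_change_en($metrics, 'reading', 10), "\n";
```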
- Building Natural Language Generation Systems (2000) by Reiter and Dale http://amzn.to/p63jzB
- http://www.siggen.org/
- http://wiki.duboue.net/index.php/2011_FaMAF_Intro_to_NLG
- U. de Montréal, Recherche Appliquée en Linguistique Informatique (RALI), http://rali.iro.umontreal.ca/
- Contact me about joint Free Software projects.
- Particularly in the area of Open Data.
- @pabloduboue http://duboue.net