Les Dégoûts (FICTION) (French Edition)






The rationale behind meta-recognition is to reach out for a set of independent classifiers, each looking at different features and possibly using different strategies.


One would then expect that, although none of the standalone methods clearly outperforms the others, their combined judgment offers a robust solution to the recognition problem at hand, as their individual errors will tend to cancel out. The next sections describe the performance obtained with state-of-the-art tools compared to each of our classifiers on a 5M-token test corpus, and the overall gain brought by averaging their results using a fixed weight and a meta-recognition algorithm. The conclusion outlines some analyses of novels that can be performed on the resulting categorization.

We built a test corpus made up of 35 classic works (5M tokens) in French, taken from digitized books that are part of Project Gutenberg. They include 3 sagas, 6 classical books that were translated from other languages and 25 French classical novels. For each work, we produced a corresponding reference file by extracting the proper nouns occurring more than five times and manually labeling each of them with the category it falls in (character, place, or other).


The resulting database references about 3, categorized named entities, along with the work they come from and their number of occurrences. Given our further needs, we considered names referring to pets, companies, and groups of persons as regular characters whenever they serve the same narrative purpose (people mostly interacting with corporations or groups of people). This differs considerably from most knowledge databases, as the very boundaries between characters and places can be fuzzy and context dependent. For the actual test procedure, we ran each classifier on our corpus and computed the average precision (positive predictive value), recall (true positive rate), and F1 score of the classification task.
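As an illustration of the three measures named above, here is a minimal sketch of how they could be computed for one category. The function name, the dictionary-based data layout, and the sample names are assumptions made for the example, not the authors' evaluation code.

```python
def precision_recall_f1(predicted, reference, category):
    """Score one category; both arguments map a proper noun to its label."""
    # True positives: predicted as `category` and labeled so in the reference.
    tp = sum(1 for name, label in predicted.items()
             if label == category and reference.get(name) == category)
    # False positives: predicted as `category` but labeled otherwise.
    fp = sum(1 for name, label in predicted.items()
             if label == category and reference.get(name) != category)
    # False negatives: reference says `category`, prediction disagrees or is missing.
    fn = sum(1 for name, label in reference.items()
             if label == category and predicted.get(name) != category)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical reference and prediction for three proper nouns.
reference = {"Sophie": "character", "Paris": "place", "Paul": "character"}
predicted = {"Sophie": "character", "Paris": "character", "Paul": "character"}
p, r, f = precision_recall_f1(predicted, reference, "character")
```

In this toy case the classifier finds every character (full recall) but wrongly tags a place as a character, which lowers precision.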

Section 2 briefly discusses the state of the art of existing tools under the same constraints, and sections 3. The field of NER is evolving quickly, as big data approaches and access to increasingly large corpora have given rise to several projects promising interesting results. However, these tools are still mainly designed to summarize, categorize, and extract meaning from short texts that relate to the outside world e.

Analyses of large portions of self-contained text, like entire chapters or full fiction books, whose named entities are difficult to look up and interpret out of context, remain a challenging endeavor. Novels in particular differ from traditional applications of NER techniques for at least two reasons:

Many traditional NER methods do not exploit this characteristic, as they instead rely on external databases to identify and classify named entities Jovanovic et al. This phenomenon is obvious enough that it was shown to allow texts to be confidently attributed to their authors in many contexts Stamatatos, This intrinsic diversity makes a one-size-fits-all approach to NER difficult and is problematic for pattern-inducing machine learning algorithms.

In order to establish a baseline for our study, we conducted an initial evaluation with one of the currently widely used NER systems. Even if it was not designed to handle large amounts of text, its processing routine is efficient and produces XML result files that can easily be processed and compared to our reference data. Since OpeNER extracts a wider range of entities than the ones we considered (such as date and time information), we only considered the relevant tagged words and manually resolved naming differences in order to keep the comparison fair and accurate.

In general, spotting proper nouns in French texts is a rather easy task thanks to capitalization rules, which are quite similar to English if not simpler due to fewer false positives e. Still, the task is not entirely trivial, because we need to filter out capitalization due to sentence starts and miscellaneous stylistic effects such as subtitles, quotes, and verses. Additionally, we want full names extracted as single entities, and we want to remove false positives such as honorific titles or named time periods Grevisse and Lenoble-Pinson, In order to do that, we designed a two-pass method:

To do this, we compute all 1-, 2-, and 3-grams of each sentence while leaving out the first word. We then keep all isolated proper nouns i. For each of them, we store an index of the sentences they appear in for further processing. This approach makes it possible to recover all occurrences of each capitalized word, as long as they are not systematically at the start of sentences.
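The n-gram pass described above can be sketched as follows. This is a minimal illustration under stated assumptions: sentences are already tokenized into lists of strings, the function name is hypothetical, and the criterion for "isolated" proper nouns, which is cut off in the text, is not implemented.

```python
from collections import defaultdict


def candidate_names(sentences, max_n=3):
    """Map each capitalized n-gram (n <= max_n) to the sentence indices it occurs in.

    `sentences` is a list of token lists. The first token of every sentence is
    skipped, since its capital letter carries no information.
    """
    index = defaultdict(set)
    for i, tokens in enumerate(sentences):
        body = tokens[1:]  # leave out the sentence-initial word
        for n in range(1, max_n + 1):
            for start in range(len(body) - n + 1):
                gram = body[start:start + n]
                # Keep the n-gram only if every token starts with a capital.
                if all(tok[:1].isupper() for tok in gram):
                    index[" ".join(gram)].add(i)
    return index


# Hypothetical tokenized sentences.
sentences = [
    ["Hier", "Jean", "Valjean", "quitta", "Paris"],
    ["Paris", "dormait"],
    ["Il", "revit", "Jean", "Valjean"],
]
index = candidate_names(sentences)
```

Note that "Paris" is still found through the first sentence even though it opens the second one, which is the property the paragraph above relies on.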

In practice, the resulting list needs to be refined afterwards, as some capitalized common nouns still end up in it. The reasons for this may be multiple, ranging from sentence tokenization errors and typos in the source text to other stylistic effects that may influence punctuation and case. This refining can be done accurately and efficiently by combining three strategies: Figure 1 illustrates that the typical distributions often allow for easy separation, with very few outliers.

Figure 1. Typical mean positions of uppercased words in their respective tokenized sentences vs.

Once identified, proper nouns usually fall into three main categories that serve different narrative purposes: they can be characters, places, or others (brands, abstract concepts, acronyms…). We designed and evaluated six independent classifiers. Each classifier receives one word at a time as input, along with the context that is necessary and relevant for its way of processing data, and returns the predicted category (character, place, or other).

We first present the implementation characteristics of each component before looking in detail at the resulting scores. When one encounters a proper noun in a sentence, a good guess about its nature can sometimes easily be made from the immediate context.


The simplest case, which we will refer to here as obvious context, is when the noun is immediately preceded by a title or a predicate that hints at what it refers to. For this classifier, we compiled a simple list of obvious context classifiers that make it possible to make good guesses about the nature of the proper noun that immediately or closely follows:

In French, like in many other languages, the grammatical structure makes it more likely for sentences to follow a pattern that puts the subject of the action at the beginning, and the location toward the end. When enough examples are available, this characteristic can be used to make a simple, yet quite powerful guess about the global roles of the proper nouns.

The accuracy of this classifier is indeed strongly dependent on the writing style of the author, as the frequent use of specific figures of speech may break its working hypothesis, and longer sentences may narrow the gap between the categories or blur the boundaries. This can be seen clearly in Figure 2, where we show the relative positions of identified classes of names for three different stories.
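The position heuristic discussed above can be sketched as follows. The function names, the 0-to-1 normalization, and the 0.5 threshold are all assumptions chosen for illustration; the text does not state how the decision boundary was set.

```python
def mean_relative_position(name, sentences):
    """Average position of `name` in the sentences containing it (0 = start, 1 = end)."""
    positions = []
    for tokens in sentences:
        if len(tokens) < 2:
            continue  # a one-word sentence has no meaningful relative position
        for i, tok in enumerate(tokens):
            if tok == name:
                positions.append(i / (len(tokens) - 1))
    return sum(positions) / len(positions) if positions else None


def guess_by_position(name, sentences, threshold=0.5):
    """Guess 'character' for names that tend to come early, 'place' otherwise."""
    pos = mean_relative_position(name, sentences)
    if pos is None:
        return "other"  # hypothetical fallback when the name never occurs
    return "character" if pos < threshold else "place"


# Hypothetical tokenized sentences.
sentences = [
    ["Sophie", "alla", "à", "Paris"],
    ["Sophie", "aimait", "Paris"],
    ["À", "Paris", "Sophie", "rit"],
]
sophie_guess = guess_by_position("Sophie", sentences)
paris_guess = guess_by_position("Paris", sentences)
```

The third sentence inverts the usual order, yet the averages still separate the two names, which is why the heuristic needs enough examples to be reliable.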

Figure 2. Relative mean position of character and place names for three classical French novels.

This approach is very different from section 3. For this implementation, we compiled lists of words that are more likely (but not exclusively) to appear, respectively, near characters, places, or abstract concepts. For instance, we expect names of characters to be more often surrounded by words related to emotions, body functions, speech, or professions, whereas names of places would be more closely related to motion verbs, place features, and prepositions.

Starting from common nouns that are unambiguously related to one of the categories we are interested in, we used a French synonyms dictionary service to build a list aiming to be as extensive as possible. The final files resulted in 4, words for characters, for places and 50 for concepts (see Appendix B for the complete list). The script then looks for these words in the neighborhood of the nouns to be disambiguated and returns the most probable category.
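The neighborhood lookup described above can be sketched as follows. The tiny cue lists, the window size, and the function name are placeholders invented for the example; the actual lists are the ones given in Appendix B.

```python
from collections import Counter

# Hypothetical, deliberately tiny cue lists (the real ones are far larger).
CUES = {
    "character": {"dit", "pleura", "sourit", "docteur"},
    "place": {"vers", "rue", "arriva", "dans"},
    "other": {"idée", "principe"},
}


def guess_by_neighbors(name, sentences, window=3):
    """Return the category whose cue words occur most often near `name`."""
    votes = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok != name:
                continue
            # Words within `window` tokens on either side of the mention.
            nearby = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            for word in nearby:
                for category, cues in CUES.items():
                    if word.lower() in cues:
                        votes[category] += 1
    return votes.most_common(1)[0][0] if votes else None


# Hypothetical tokenized sentences.
sentences = [
    ["Sophie", "pleura", "puis", "sourit"],
    ["Elle", "arriva", "dans", "Paris", "hier"],
]
sophie_guess = guess_by_neighbors("Sophie", sentences)
paris_guess = guess_by_neighbors("Paris", sentences)
```

Returning `None` when no cue word is found leaves room for the caller to treat the answer as unreliable rather than forcing a category.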


As characters and places serve different narrative purposes, one may expect the grammatical constructs surrounding them to differ significantly. For instance, place names are often preceded by prepositions or determiners, whereas character names are expected to be more often directly followed by verbs.

We thus introduced a script that classifies names based on its knowledge of the full text, grammatically tagged using TreeTagger and tokenized into sentences. To guess the nature of the names, it matches all sentences containing them against a set of rules describing typical constructions one uses when writing about a person or a place. We tried out a set of seven manually established rules covering the most straightforward grammatical constructs (described in detail in Table 1), plus two that help filter out tokenization errors at the sentence level by flagging words that are preceded by a punctuation mark or that are alone in their sentence.

When matched, they increase or decrease the probability score for one or several classifications, and the category with the highest score is returned in the end.

Many proper nouns can be related to one or several categories unambiguously, or with high probability, based on general knowledge.

But the same knowledge may equally tell us that those same words could also be related to a ship (RMS Queen Elisabeth), an abstract concept (project Manhattan), or a place (Amur river), probably with a lower likelihood if no other context is available. For many nouns, the knowledge we are looking for is well captured in the categorization of their related Wikipedia pages. Using categories instead of the text of the articles also has the advantage of being very straightforward, and it greatly reduces the noisy signals related to text processing techniques.

To test this idea, we implemented a simple algorithm that gathers the categories of the page whose name is closest to the noun we are looking for, and searches them for ones tagging people, places, or abstract concepts. If no category gives a hint (which tends to happen with both very complex and very precise pages), it recursively walks up the hierarchy until the necessary clues are found.

Several works have already shown the relevance of locating direct and indirect speech to identify characters in novels Glass and Bangay, ; Goh et al. Most of these approaches rely heavily on the lexical database WordNet to find speech-related verbs and refine their accuracy, but for performance reasons, and since we wanted the classifiers to remain efficient even on very long texts, we implemented a simpler version that only checks the proximity of detected proper nouns to quotation marks. For each proper noun w appearing m_w times, the system essentially counts the number q_w of mentions that appear near quotations.
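The counting step described above can be sketched as follows. The set of quotation marks, the character-based window of 40, and the function name are assumptions for the example. The sketch stops at the two counts, without assuming how the classifier combines them afterwards.

```python
import re

# Quotation marks considered by this sketch (an assumption, not the authors' list).
QUOTE_MARKS = '«»"“”'


def quote_counts(name, text, window=40):
    """Return (q_w, m_w): mentions of `name` near a quotation mark, and all mentions."""
    m_w = q_w = 0
    for match in re.finditer(r"\b" + re.escape(name) + r"\b", text):
        m_w += 1
        # Look `window` characters before and after the mention.
        start = max(0, match.start() - window)
        context = text[start:match.end() + window]
        if any(mark in context for mark in QUOTE_MARKS):
            q_w += 1
    return q_w, m_w


# Hypothetical text: filler words keep the later mentions away from the quotation.
filler = "x " * 30
text = "« Venez », dit Sophie. " + filler + "Paris était calme. " + filler + "Sophie sortit."
sophie_counts = quote_counts("Sophie", text)
paris_counts = quote_counts("Paris", text)
```

A character window is the cheapest possible notion of "near"; a sentence-based one would be a natural alternative.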

It then computes:

Once all classifiers have returned their answer for a given word, the last step is to compare these results and to decide on a final answer.


This meta-classification step can be done by voting systems, choosing the final result according to the majority of predictions using various strategies, or by a meta-recognition system, aiming to discard classifiers that seem to have encountered a problem on the considered text file. We implemented and discussed the performance of four distinct meta-classification methods. The easiest and most obvious solution to average the different classifications is a simple voting system i. However, since there is an even number of classifiers, ties are to be expected.

This situation is quite unlikely, since it would require exactly three classifiers deciding correctly and the three others agreeing on the same wrong categorization. Still, if this situation occurred, the final choice would be non-deterministic, for lack of a model to support one option or the other. For this reason, we introduced a second meta-classification method, in which each classifier computes a confidence self-assessment score.

For most classifiers, their internal mechanics allow them to evaluate to what extent the strategy they are using seems likely to return reliable results, given the current working context. Hence, a simple strategy to help the voting process in the case of ties is for each classifier to return a confidence index between 0 and 1. This index is expected to equal 1 if the decision was made with no ambiguity and 0 if the clues were equally distributed. For instance, considering the Quotes classifier computes a ratio of 0. Again, this index is expected to tend toward 0 for ambiguous cases and toward 1 for the more clear-cut ones.

On top of that, some classifiers may return 0 to mark their results as known to be invalid, and thus irrelevant at voting time. This can happen, for instance, when we do not find any known title preceding a word throughout the text, if no grammatical rule could be matched, or if Wikipedia does not have any result for the searched word. The improved voting algorithm first discards all classifications that have a confidence mark of 0 and then proceeds to a simple vote between the remaining ones for each noun. In case of a tie, the results rated with the highest confidence are preferred.
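The improved vote described above can be sketched as follows. The function name and the data layout are assumptions. The tie-break used here (the tied category backed by the single most confident classifier wins) is one possible reading of the rule, not necessarily the authors' exact choice.

```python
from collections import Counter


def vote_with_confidence(answers):
    """`answers` is a list of (category, confidence) pairs, one per classifier."""
    # Discard answers flagged as invalid by a confidence of 0.
    valid = [(cat, conf) for cat, conf in answers if conf > 0]
    if not valid:
        return None
    counts = Counter(cat for cat, _ in valid)
    top = max(counts.values())
    tied = [cat for cat, n in counts.items() if n == top]
    if len(tied) == 1:
        return tied[0]
    # Tie: prefer the category backed by the single most confident classifier.
    return max((conf, cat) for cat, conf in valid if cat in tied)[1]


# Hypothetical answers from six classifiers, two of which abstain.
tie_case = [
    ("character", 0.9), ("character", 0.4),
    ("place", 0.95), ("place", 0.3),
    ("other", 0.0), ("other", 0.0),
]
tie_result = vote_with_confidence(tie_case)
```

Here the two abstaining classifiers are ignored, the remaining four split evenly, and the 0.95 confidence behind "place" settles the tie.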

Not all classifiers exhibit the same behavior regarding precision and recall. It can thus be justified to place more confidence in some of them in cases where we know they are more likely to succeed. For this test, we used manually set weights giving more importance to the obvious context classifier section 3.
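A weighted vote of this kind can be sketched as follows. The classifier names, the function signature, and the rule that zero-confidence answers are skipped are assumptions for the example. The weight of 3.0 mirrors the factor discussed in the text for the obvious context classifier.

```python
from collections import defaultdict


def weighted_vote(answers, weights, default_weight=1.0):
    """`answers` maps classifier name -> (category, confidence)."""
    totals = defaultdict(float)
    for classifier, (category, confidence) in answers.items():
        if confidence > 0:  # skip answers flagged as invalid
            totals[category] += weights.get(classifier, default_weight)
    return max(totals, key=totals.get) if totals else None


# Hypothetical weights: only the obvious context classifier is boosted.
weights = {"obvious_context": 3.0}

few_opponents = {
    "obvious_context": ("character", 1.0),
    "position": ("place", 0.6),
    "quotes": ("place", 0.5),
}
many_opponents = dict(
    few_opponents,
    grammar=("place", 0.7),
    wikipedia=("place", 0.8),
)
result_few = weighted_vote(few_opponents, weights)
result_many = weighted_vote(many_opponents, weights)
```

With a weight of 3, the boosted classifier wins against two dissenting classifiers but is still overruled when four of them agree against it.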

With the help of confidence rating section 4. Hence, those cases will be discarded regardless of the coefficient. A good compromise can be reached by giving 3 times more weight to the obvious context classifier, which still allows the others to easily overpower it in the unlikely case that a majority of them agree on a contradictory answer.

A meta-recognition algorithm aims to improve accuracy by entirely removing one classifier if it detects that it is consistently failing, typically due to stylistic biases or other broken assumptions on the considered book.
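One way to detect such a failing classifier can be sketched as follows. This is only an illustration under loud assumptions: the agreement measure (how often a classifier matches the majority of the others over a whole book), the 0.5 cutoff, and all names are hypothetical, and the authors' actual criterion may differ.

```python
from collections import Counter


def majority(labels):
    """Most frequent label in a list, or None for an empty list."""
    return Counter(labels).most_common(1)[0][0] if labels else None


def find_outlier(predictions, min_agreement=0.5):
    """`predictions` maps classifier name -> {proper noun: category}.

    Returns the classifier that agrees least with the majority of the others,
    or None when every classifier reaches `min_agreement`.
    """
    rates = {}
    for clf, own in predictions.items():
        agree = total = 0
        for name, label in own.items():
            others = [p[name] for c, p in predictions.items()
                      if c != clf and name in p]
            if others:
                total += 1
                agree += label == majority(others)
        rates[clf] = agree / total if total else 1.0
    worst = min(rates, key=rates.get)
    return worst if rates[worst] < min_agreement else None


# Hypothetical per-book predictions: the position classifier is systematically off.
consensus = {"Sophie": "character", "Paris": "place", "Paul": "character"}
predictions = {
    "position": {"Sophie": "place", "Paris": "character", "Paul": "place"},
    "grammar": dict(consensus),
    "quotes": dict(consensus),
    "neighbors": dict(consensus),
}
outlier = find_outlier(predictions)
```

Once an outlier is found, its answers would simply be left out of the vote for that book.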

Our hypothesis here is that since the remaining classifiers reached a higher agreement, the discarded one must have globally failed in some way and needs to be put aside. Let us consider in Figure 3 the precisions vs. One can immediately see a pattern typical of any information retrieval system: one parameter is detrimental to the other, and no two classifiers behave in a similar way. We can also see that for each of them, some books get remarkably good results, and a few others turn out very badly.

Interestingly, and as backed up by the full numerical values shown in Table 2, those are almost never the same books, confirming our hypothesis that some methods may work much better or worse on some texts, which gives a strong justification for the multi-classifier (MC) approach. The averaged results seem to confirm this intuition. In Figures 4 and 5, we can see that all meta-classification schemes pushed the results toward the top overall, and at the same time made the clustering denser, hence reducing the differences between the books and yielding more consistent results by removing the worst outliers.

Figure 3. Comparison between precision and recall for each classifier, on each book.

By looking at the numerical results (Tables 2 and 3), several interesting facts can be stated about each classifier: However, one has to note that with a precision and recall of respectively 0. Its precision of 0. One can notice it performed best, with a very satisfying 0. Its F1 score 0. Regarding the meta-classification schemes (Table 4), we first wanted to compare them to our OpeNER baseline. On our test corpus and using a similar evaluation, OpeNER averaged a precision and recall of 0.

This may seem surprisingly low compared to the standards usually set by this tool, but it is actually a good illustration of how difficult it can be to find the correct tagging in fictional texts. The two worst cases (Les Malheurs de Sophie and Germinie Lacerteux) are shown in detail in Tables A1 and A2 (Appendix A), and show that those problems are about as much related to bad classifications as to missing entities.

Our meta-classification methods perform on average better than that, and we can see an encouraging trend that the various strategies we tried out tended to get increasingly better results and to close the standard deviation gap.

That being said, once the standard deviation (SD) is taken into account, this improvement was not statistically significant.