The new version has two additions. First, at the suggestion of Stephen Stigler, we have replaced the Table of Contents by what he calls an Analytic Table of Contents. Following the title of each section or subsection is a description of the content of the section. This material helps the reader in several ways, for example: by giving a synopsis of the book, by explaining where the various data tables are and what they deal with, and by telling what theory is described where. We did several distinct full studies for the Federalist papers as well as many minor side studies. Some or all may offer information both to the applied and the theoretical reader. We therefore try to give in this Contents more than the few cryptic words in a section heading to speed readers in finding what they want. Second, we have prepared an extra chapter dealing with authorship work published from about 1969 to 1983. Although a chapter cannot comprehensively cover a field where many books now appear, it can mention most of the book-length works and the main thread of authorship studies published in English. We found biblical authorship studies so extensive and complicated that we thought it worthwhile to indicate some papers that would bring out the controversies that are taking place. We hope we have given the flavor of developments over the 15 years mentioned. We have also corrected a few typographical errors.



Contents

Analytic Table of Contents.- 1. The Federalist Papers As a Case Study.- 1.1. Purpose.- To study how Bayesian inference works in a large-scale data analysis, we chose to try to resolve the problem of the authorship of the disputed Federalist papers.- 1.2. The Federalist papers.- The Federalist papers were written by Hamilton, Madison, and Jay. Jay's papers are known. Of the 77 papers originally published in newspapers, 12 are in dispute between Hamilton and Madison, and 3 may be regarded as joint by them. Historians have varied in their attributions.- 1.3. Early work.- Frederick Williams and Frederick Mosteller found that sentence length and its variability within papers did not discriminate. Tables 1.3-1, 2, 3, 4 show that they found some discriminating power in percentage of nouns, of adjectives, of one- and two-letter words, and of the's. Together these variables could have decided whether Hamilton or Madison wrote all the disputed papers, if that were the problem, but the problem is to make an effective assignment for each paper.- 1.4. Recent work-pilot study.- We call marker words those which one author often uses and the other rarely uses. Douglass Adair found while (Hamilton) versus whilst (Madison). We found enough (Hamilton) and upon (Hamilton); see Tables 1.4-1, 2 for incidence and rates. Tables 1.4-3, 4, 5 give an overview of marker words for Federalist and non-Federalist writings. Alone, they would not settle the dispute compellingly.- 1.5. Plots and honesty.- Some say that the dispute is not a matter of honesty but a matter of memory. Hamilton was hurried in his annotation by an impending duel, but Madison had plenty of time. Editing may be a hazard. We want to use many words as discriminating variables.- 1.6. The plan of the book.- 2. Words and Their Distributions.- 2.1. Why words?.- Hamilton and Madison use the same words at different rates, and so their rates offer a vehicle for discrimination. 
Some words like by and to vary relatively little in their rates as context changes, others like war vary a lot, as the empirical distributions in the four tables show. Generally, less meaningful words offer more stability.- 2.2. Variation with time.- In Table 2.2-2, a separate study illustrated by Madison's rates for 11 function words over a 26-year period examines the stability of rates through time. We desire stability because we need additional text of known authorship to choose words and their rates for discriminating between authors. Among function words, some pronouns and auxiliary verbs seem unstable.- 2.3. How frequency of use varies.- For establishing a mathematical model, we need to find out empirically how rates of use by an author vary from one chunk of writing to another.- 2.4. Correlations between rates for different words.- Theoretical study shows that the correlation between the rates of occurrence for different words should ordinarily be small but negative. An empirical study whose results appear in Table 2.4-1 shows that these correlations are ordinarily negligible for our work.- 2.5. Pools of words.- Three pools of words produced potential discriminators.- 2.6. Word counts and their accuracies.- Some word counts were carried out by hand using slips of paper, one word per slip. Others were done by a high-speed computer which constructed a concordance.- 2.7. Concluding remarks.- Although words offer only one set of discriminators, one needs a large enough pool of potential discriminators to offer a good chance of success. We need to avoid selection and regression effects. Ideally we want enough data to get a grip on the distribution theory for the variables to be used.- 3. The Main Study.- In the main study, we use Bayes' theorem to determine odds of authorship for each disputed paper by weighting the evidence from words. Bayesian methods enter centrally in estimating the word rates and choosing the words to use as discriminators. 
We use not one but an empirically based range of prior distributions. We present the results for the disputed papers and examine the sensitivity of the results to various aspects of the analysis.- After a brief guide to the chapter, we describe some views of probability as a degree of belief and we discuss the need for and the difficulties of such an interpretation.- 3.1. Introduction to Bayes' theorem and its applications.- We give an overview, abstracted from technical detail, of the ideas and methods of the main study, and we describe the principal sources of difficulties and how we go about meeting them.- 3.1A. An example applying Bayes' theorem with both initial odds and parameters known.- 3.1B. Selecting words and weighting their evidence.- 3.1C. Initial odds.- 3.1D. Unknown parameters.- 3.2. Handling unknown parameters of data distributions.- We begin to set out the components of our Bayesian analysis.- 3.2A. Choosing prior distributions.- 3.2B. The interpretation of the prior distributions.- 3.2C. Effect of varying the prior.- 3.2D. The posterior distribution of (?, ?).- 3.2E. Negative binomial.- 3.2F. Final choices of underlying constants.- 3.3. Selection of words.- The prior distributions are the route for allowing for and protecting against selection effects in choice of words. We use an unselected pool of 90 words for estimating the underlying constants of the priors, and we assume the priors apply to the populations of words from which we developed our pool of 165 words. We then selectively reduce that pool to the final 30 words. We describe a stratification of words into word groups and our deletion of two groups because of contextuality.- 3.4. Log odds.- We compute the logarithm of the odds factor that changes initial odds to final odds and call it simply log odds. 
The computations use the posterior modal estimates as if they were exact and are made under the various choices of underlying constants and using both negative binomial and Poisson models.- 3.4A. Checking the method.- 3.4B. The disputed papers.- 3.5. Log odds by words and word groups.- 3.5A. Word groups.- 3.5B. Single words.- 3.5C. Contributions of marker and high-frequency words.- 3.6. Late Hamilton papers.- We assess the log odds for four of the late Federalist papers, written by Hamilton after the newspaper articles appeared and not used in any of our other analyses. The log odds all favor Hamilton, very strongly for all but the shortest paper.- 3.7. Adjustments to the log odds.- Through special studies, we estimate the magnitude of effects on the log odds of various approximations…
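
The log-odds idea summarized in sections 3.1A and 3.4 can be illustrated with a minimal sketch. It assumes the simplest setting the contents mention: a Poisson model with word rates treated as known, and word counts treated as independent so that their evidence adds. The function names, the rates, and the counts below are hypothetical values chosen for illustration; they are not figures from the book, and the book's actual analysis also uses negative binomial models and estimated rates.

```python
import math

def poisson_log_odds(count, length_thousands, rate_h, rate_m):
    """Natural-log likelihood ratio (Hamilton versus Madison) for one word.

    Under a Poisson model the expected count is rate * length, so the
    factorial terms cancel and the log ratio reduces to
    count * ln(rate_h / rate_m) - length * (rate_h - rate_m).
    Rates are per thousand words; length is in thousands of words.
    """
    return count * math.log(rate_h / rate_m) - length_thousands * (rate_h - rate_m)

def final_log_odds(counts, length_thousands, rates, initial_log_odds=0.0):
    """Add each word's log-odds contribution to the initial log odds.

    Summing assumes the word counts are independent (an assumption of
    this sketch).
    """
    total = initial_log_odds
    for word, count in counts.items():
        rate_h, rate_m = rates[word]
        total += poisson_log_odds(count, length_thousands, rate_h, rate_m)
    return total

# Hypothetical rates per thousand words: (Hamilton, Madison).
RATES = {"upon": (3.0, 0.2), "whilst": (0.1, 0.5)}

# Hypothetical counts in a 2,000-word paper.
counts = {"upon": 0, "whilst": 1}

log_odds = final_log_odds(counts, 2.0, RATES)
print(f"log odds (Hamilton vs. Madison): {log_odds:.2f}")
```

A positive result favors Hamilton and a negative one favors Madison. With even initial odds (initial log odds of zero), the final log odds equal the summed evidence from the words.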

Title
Applied Bayesian and Classical Inference
Subtitle
The Case of The Federalist Papers
EAN
9781461252566
Format
E-Book (pdf)
Publication date
6 December 2012
Digital copy protection
Watermark
Number of pages
303