Few books on statistical data analysis in the natural sciences are written at a level that a non-statistician will easily understand. This book is written in colloquial language. It avoids mathematical formulae as much as possible and explains statistical methods through examples and graphics instead. To use the book efficiently, readers should have some computer experience. The book starts with the simplest statistical concepts and carries readers forward to a deeper and more extensive understanding of the use of statistics in the environmental sciences. It concerns the application of statistical and other computer methods to the management, analysis and display of spatial data. Because these data include locations (geographic coordinates), maps are needed to display both the data and the results of the statistical methods. Although the book draws its examples from applied geochemistry, and from one large geochemical survey in particular, the principles and ideas apply equally well to other natural sciences, e.g., environmental sciences, pedology, hydrology, geography, forestry, ecology, and health sciences/epidemiology.
The book is unique in providing direct access to software solutions for applied environmental statistics, based on R, the Open Source version of the S-language for statistics. Executable R scripts are provided for all graphics and tables presented in the book. In addition, a graphical user interface for R, called DAS+R, was developed for convenient, fast and interactive data analysis.
Statistical Data Analysis Explained: Applied Environmental Statistics with R provides, on an accompanying website, the software to undertake all the procedures discussed, and the data employed for their description in the book.
About the authors
Clemens Reimann (born 1952) holds an M.Sc. in Mineralogy and Petrology from the University of Hamburg (Germany), a Ph.D. in Geosciences from Leoben Mining University, Austria, and a D.Sc. in Applied Geochemistry from the same university. He has worked as a lecturer in Mineralogy and Petrology and in Environmental Sciences at Leoben Mining University, as an exploration geochemist in eastern Canada, and in contract research in environmental sciences in Austria, and he managed the laboratory of an Austrian cement company before joining the Geological Survey of Norway in 1991 as a senior geochemist. From March to October 2004 he was director and professor at the German Federal Environment Agency (Umweltbundesamt, UBA), responsible for Division II, Environmental Health and Protection of Ecosystems. At present he is chairman of the EuroGeoSurveys geochemistry expert group, acting vice president of the International Association of GeoChemistry (IAGC), and associate editor of both Applied Geochemistry and Geochemistry: Exploration, Environment, Analysis.
Peter Filzmoser (born 1968) studied Applied Mathematics at the Vienna University of Technology, Austria, where he also wrote his doctoral thesis and habilitation in the field of multivariate statistics. His research led him to the area of robust statistics, resulting in many international collaborations and numerous scientific papers in this area. His interest in applications of robust methods led to the development of R software packages. He has been involved in organising several scientific events devoted to robust statistics. Since 2001 he has been a Dozent (senior lecturer) at the Statistics Department of the Vienna University of Technology. He has been a visiting professor at the universities of Vienna, Toulouse and Minsk.
Robert G. Garrett (Bob Garrett) studied Mining Geology and Applied Geochemistry at Imperial College, London, and joined the Geological Survey of Canada (GSC) in 1967 following post-doctoral studies at Northwestern University, Evanston. For the next 25 years his activities focused on regional geochemical mapping in Canada, and overseas for the Canadian International Development Agency, to support mineral exploration and resource appraisal. Throughout his career he has used computers and statistics to manage data, assess their quality, and maximise the knowledge extracted from them. In the 1990s he commenced collaboration … crops. Since then he has been involved in various Canadian Federal and university-based research initiatives aimed at providing sound science to support Canadian regulatory and international policy activities concerning risk assessment and risk management for metals. He retired in March 2005 but remains active as an Emeritus Scientist.
Rudolf Dutter is senior statistician and full professor at Vienna University of Technology, Austria. He studied Applied Mathematics in Vienna (M.Sc.) and Statistics at Université de Montréal, Canada (Ph.D.). He spent three years as a post-doctoral fellow at ETH Zurich, working on computational robust statistics. Research and teaching activities followed at the Graz University of Technology and, as a full professor of statistics, at Vienna University of Technology, both in Austria. He also taught and consulted at Leoben Mining University, Austria; currently he consults in many fields of applied statistics, with main interests in computational and robust statistics, development of statistical software, and geostatistics. He is the author or coauthor of many publications and several books, e.g., an early booklet in German on geostatistics.
Contents
Preface
Acknowledgements
About the authors
1 Introduction
1.1 The Kola Ecogeochemistry Project
1.1.1 Short description of the Kola Project survey area
1.1.2 Sampling and characteristics of the different sample materials
1.1.3 Sample preparation and chemical analysis
2 Preparing the Data for Use in R and DAS+R
2.1 Required data format for import into R and DAS+R
2.2 The detection limit problem
2.3 Missing values
2.4 Some "typical" problems encountered when editing a laboratory data report file to a DAS+R file
2.4.1 Sample identification
2.4.2 Reporting units
2.4.3 Variable names
2.4.4 Results below the detection limit
2.4.5 Handling of missing values
2.4.6 File structure
2.4.7 Quality control samples
2.4.8 Geographical coordinates, further editing and some unpleasant limitations of spreadsheet programs
2.5 Appending and linking data files
2.6 Requirements for a geochemical database
2.7 Summary
3 Graphics to Display the Data Distribution
3.1 The one-dimensional scatterplot
3.2 The histogram
3.3 The density trace
3.4 Plots of the distribution function
3.4.1 Plot of the cumulative distribution function (CDF-plot)
3.4.2 Plot of the empirical cumulative distribution function (ECDF-plot)
3.4.3 The quantile-quantile plot (QQ-plot)
3.4.4 The cumulative probability plot (CP-plot)
3.4.5 The probability-probability plot (PP-plot)
3.4.6 Discussion of the distribution function plots
3.5 Boxplots
3.5.1 The Tukey boxplot
3.5.2 The log-boxplot
3.5.3 The percentile-based boxplot and the box-and-whisker plot
3.5.4 The notched boxplot
3.6 Combination of histogram, density trace, one-dimensional scatterplot, boxplot, and ECDF-plot
3.7 Combination of histogram, boxplot or box-and-whisker plot, ECDF-plot, and CP-plot
3.8 Summary
4 Statistical Distribution Measures
4.1 Central value
4.1.1 The arithmetic mean
4.1.2 The geom…