Clustering is often considered more of an art than a science, and books on the subject have been dominated by learning through example, with techniques chosen almost through trial and error. Even the two most popular, and most related, clustering methods (K-Means for partitioning and Ward's method for hierarchical clustering) have lacked the theoretical underpinning req
The ubiquitous challenge of learning and decision-making from rank data arises in situations where intelligent systems collect preference and behavior data from humans, learn from the data, and then use the data to help humans make efficient, effective, and timely decisions. Often, such data are represented by rankings. This book surveys some recent progress toward addressing the challenge from the considerations of statistics, computation, and socio-economics. We will cover classical statistical models for rank data, including random utility models, distance-based models, and mixture models. We will discuss and compare classical and state-of-the-art algorithms, such as algorithms based on M...
A new and refreshingly different approach to presenting the foundations of statistical algorithms, Foundations of Statistical Algorithms: With References to R Packages reviews the historical development of basic algorithms to illuminate the evolution of today’s more powerful statistical algorithms. It emphasizes recurring themes in all statistical algorithms, including computation, assessment and verification, iteration, intuition, randomness, repetition and parallelization, and scalability. Unique in scope, the book reviews the upcoming challenge of scaling many of the established techniques to very large data sets and delves into systematic verification by demonstrating how to derive gen...
Full of real-world case studies and practical advice, Exploratory Multivariate Analysis by Example Using R, Second Edition focuses on four fundamental methods of multivariate exploratory data analysis that are most suitable for applications. It covers principal component analysis (PCA) when variables are quantitative, correspondence analysis (CA) a
The second edition of this bestseller provides a practical and accessible introduction to the main concepts, foundation, and applications of Bayesian networks. This edition contains a new chapter on Bayesian network classifiers and a new section on object-oriented Bayesian networks, along with new applications and case studies. It includes a new section that addresses foundational problems with causal discovery and Markov blanket discovery and a new section that covers methods of evaluating causal discovery programs. The book also offers more coverage on the uses of causal interventions to understand and reason with causal Bayesian networks. Supplemental materials are available on the book's website.
This volume presents the proceedings of the Second International Colloquium on Grammatical Inference (ICGI-94), held in Alicante, Spain, in September 1994. Besides 25 research papers carefully selected and refereed by the program committee, the book contains a survey by E. Vidal. The book is devoted to all those aspects of automatic learning that explicitly focus on principles, theory, and applications of grammars and languages. The papers are organized in sections on formal aspects; language modelling and linguistic applications; stochastic approaches, applications and performance analysis; and neural networks, genetic algorithms, and artificial intelligence techniques.
This work presents approaches to modelling and control problems arising from conditions of ever-increasing nonlinearity and complexity. It prescribes an approach in which a wide range of methods are combined to provide multiple model solutions. Many component methods are described, along with a discussion of the strategies available for building a successful multiple model approach.
Statistical agencies, research organizations, companies, and other data stewards that seek to share data with the public face a challenging dilemma. They need to protect the privacy and confidentiality of data subjects and their attributes while providing data products that are useful for their intended purposes. In an age when information on data subjects is available from a wide range of data sources, as are the computational resources to obtain that information, this challenge is increasingly difficult. The Handbook of Sharing Confidential Data helps data stewards understand how tools from the data confidentiality literature—specifically, synthetic data, formal privacy, and secure compu...
Model a Wide Range of Count Time Series. Handbook of Discrete-Valued Time Series presents state-of-the-art methods for modeling time series of counts and incorporates frequentist and Bayesian approaches for discrete-valued spatio-temporal data and multivariate data. While the book focuses on time series of counts, some of the techniques discussed ca
A popular method for selecting the number of clusters is based on stability arguments: one chooses the number of clusters such that the corresponding clustering results are most stable. In recent years, a series of papers has analyzed the behavior of this method from a theoretical point of view. However, the results are very technical and difficult to interpret for non-experts. In this paper we give a high-level overview of the existing literature on clustering stability. In addition to presenting the results in a slightly informal but accessible way, we relate them to each other and discuss their different implications.
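As a rough illustration of the stability idea described in the abstract above, the following Python sketch scores each candidate number of clusters by how well two clusterings, fitted on independent random subsamples, agree on the full data set. Everything specific here is an assumption made for illustration and is not taken from the paper: the use of k-means, the farthest-first initialization, the 80% subsample size, the adjusted Rand index as the agreement measure, and all function names (`kmeans`, `adjusted_rand`, `stability`, `choose_k`, `demo_points`).

```python
import random
from collections import Counter
from math import comb


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(p, centers):
    """Index of the center closest to point p."""
    return min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))


def kmeans(points, k, rng, iters=50):
    """Lloyd's algorithm; returns the k fitted centers."""
    # Farthest-first initialization: an assumption chosen for simplicity.
    centers = [rng.choice(points)]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        )
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Keep the old center if a cluster happens to become empty.
        new_centers = [
            tuple(sum(x) / len(g) for x in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def adjusted_rand(a, b):
    """Adjusted Rand index between two labelings of the same items."""
    n = len(a)
    index = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(a).values())
    sum_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. one cluster on both sides
        return 1.0
    return (index - expected) / (max_index - expected)


def stability(points, k, rng, trials=10, frac=0.8):
    """Mean agreement between clusterings fitted on two random subsamples."""
    m = int(frac * len(points))
    total = 0.0
    for _ in range(trials):
        labelings = []
        for _ in range(2):
            centers = kmeans(rng.sample(points, m), k, rng)
            # Extend the subsample clustering to every point.
            labelings.append([nearest(p, centers) for p in points])
        total += adjusted_rand(*labelings)
    return total / trials


def choose_k(points, candidates, seed=0):
    """Return the candidate k with the highest stability, plus all scores."""
    rng = random.Random(seed)
    scores = {k: stability(points, k, rng) for k in candidates}
    return max(scores, key=scores.get), scores


def demo_points(seed=1, per_blob=20):
    """Three well-separated synthetic blobs (hypothetical toy data)."""
    rng = random.Random(seed)
    blob_centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 8.66)]
    return [
        (cx + rng.uniform(-1, 1), cy + rng.uniform(-1, 1))
        for cx, cy in blob_centers
        for _ in range(per_blob)
    ]


if __name__ == "__main__":
    best_k, scores = choose_k(demo_points(), [2, 3, 4])
    print(best_k, scores)
```

On this toy data the three blobs are recovered identically from every subsample, so k = 3 attains a stability score of 1.0. The sketch starts from k = 2 because a single cluster is trivially stable, and it makes no attempt to normalize scores across different values of k.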