Class-tested and coherent, this textbook teaches classical and web information retrieval, including web search and the related areas of text classification and text clustering, from basic concepts. It gives an up-to-date treatment of all aspects of the design and implementation of systems for gathering, indexing, and searching documents; methods for evaluating systems; and an introduction to the use of machine learning methods on text collections. All the important ideas are explained using examples and figures, making it perfect for introductory courses in information retrieval for advanced undergraduates and graduate students in computer science. Based on feedback from extensive classroom experience, the book has been carefully structured to make teaching more natural and effective. Slides and additional exercises (with solutions for lecturers) are also available through the book's supporting website to help course instructors prepare their lectures.
Statistical approaches to processing natural language text have become dominant in recent years. This foundational text is the first comprehensive introduction to statistical natural language processing (NLP) to appear. The book contains all the theory and algorithms needed for building NLP tools. It provides broad but rigorous coverage of mathematical and linguistic foundations, as well as detailed discussion of statistical methods, allowing students and researchers to construct their own implementations. The book covers collocation finding, word sense disambiguation, probabilistic parsing, information retrieval, and other applications.
Class-tested and up-to-date textbook for introductory courses on information retrieval.
Word embeddings are a form of distributional semantics increasingly popular for investigating lexical semantic change. However, typical training algorithms are probabilistic, limiting their reliability and the reproducibility of studies. Johannes Hellrich investigated this problem both empirically and theoretically and found some variants of SVD-based algorithms to be unaffected. Furthermore, he created the JeSemE website to make word-embedding-based diachronic research more accessible. It provides information on changes in word denotation and emotional connotation in five diachronic corpora. Finally, the author conducted two case studies on the applicability of these methods, investigating the historical understanding of electricity as well as words connected to Romanticism. These studies showed the high potential of distributional semantics for further applications in the digital humanities.
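The reproducibility point above can be made concrete: sampling-based trainers (such as skip-gram with negative sampling) give slightly different vectors on each run, whereas an SVD of a fixed co-occurrence statistic is a deterministic computation. The sketch below is illustrative only and not taken from the book; the function name `svd_embeddings` and the tiny toy matrix are assumptions, and the PPMI-plus-truncated-SVD recipe is just one common deterministic variant.

```python
import numpy as np

def svd_embeddings(cooc, dim):
    """Deterministic embeddings: PPMI transform of a word-word
    co-occurrence matrix followed by truncated SVD."""
    total = cooc.sum()
    row = cooc.sum(axis=1, keepdims=True)   # word marginals, shape (n, 1)
    col = cooc.sum(axis=0, keepdims=True)   # context marginals, shape (1, n)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log((cooc * total) / (row * col))
    # Positive PMI: clamp negatives; zero out -inf/nan from empty cells.
    ppmi = np.where(np.isfinite(pmi), np.maximum(pmi, 0.0), 0.0)
    U, S, _ = np.linalg.svd(ppmi)
    return U[:, :dim] * S[:dim]             # rows are word vectors

# Toy 3-word co-occurrence counts (hypothetical data).
toy = np.array([[0., 2., 1.],
                [2., 0., 3.],
                [1., 3., 0.]])
a = svd_embeddings(toy, 2)
b = svd_embeddings(toy, 2)
print(np.allclose(a, b))  # identical across calls: no randomness involved
```

Because no random initialization or sampling is involved, repeated runs on the same corpus statistics yield identical vectors, which is exactly the property that matters for reproducible diachronic studies.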
Lucas Haasis found a time capsule: the complete mercantile letter archive of the merchant Nicolaus Gottlieb Luetkens, who lived in 18th-century Hamburg. Luetkens travelled through France between 1743 and 1745 in order to become a successful wholesale merchant. He succeeded in this undertaking through both shrewd business practice and proficient letter-writing skills. Based on this unique discovery, in this microhistorical study Lucas Haasis examines the crucial steps and activities of a mercantile establishment phase, the typical letter practices of Early Modern merchants, and the practical principles of persuasion leading to success in the 18th century.
This volume is concerned with how ambiguity and ambiguity resolution are learned, that is, with the acquisition of the different representations of ambiguous linguistic forms and the knowledge necessary for selecting among them in context. Schütze concentrates on how the acquisition of ambiguity is possible in principle and demonstrates that particular types of algorithms and learning architectures (such as unsupervised clustering and neural networks) can succeed at the task. Three types of lexical ambiguity are treated: ambiguity in syntactic categorisation, semantic categorisation, and verbal subcategorisation. The volume presents three different models of ambiguity acquisition: Tag Space, Word Space, and Subcat Learner, and addresses the importance of ambiguity in linguistic representation and its relevance for linguistic innateness.
This book constitutes the refereed proceedings of the 4th International and Interdisciplinary Conference on Modeling and Using Context, CONTEXT 2003, held in Stanford, CA, USA, in June 2003. The 31 full papers and 15 short papers presented were carefully reviewed, selected, and revised for inclusion in the book. The papers deal with the interdisciplinary topic of modeling and using context from various points of view, ranging through cognitive science, formal logic, artificial intelligence, computational intelligence, philosophical and psychological aspects, and information processing. Highly general philosophical and theoretical issues are complemented by specific applications in various fields.
This volume presents a selection of the best papers from the 21st Annual University of Wisconsin-Milwaukee Linguistics Symposium. Researchers from linguistics, psychology, computer science, and philosophy, using many different methods and focusing on many different facets of language, addressed the question of the existence of linguistic rules. Are such rules best seen as convenient tools for the description of languages, or are rules actually invoked by individual language users? Perhaps the most serious challenge to date to the linguistic rule is the development of connectionist architecture. Indeed, these systems must be viewed as a serious challenge to the foundations of all of contempora...
This open access book introduces Vector semantics, which links the formal theory of word vectors to the cognitive theory of linguistics. The computational linguists and deep learning researchers who developed word vectors have relied primarily on the ever-increasing availability of large corpora and of computers with highly parallel GPU and TPU compute engines, and their focus is on endowing computers with natural language capabilities for practical applications such as machine translation or question answering. Cognitive linguists investigate natural language from the perspective of human cognition, the relation between language and thought, and questions about conceptual universals, rely...