Data-intensive text processing with MapReduce, Jimmy Lin and Chris Dyer, (electronic book)

Label
Data-intensive text processing with MapReduce
Title
Data-intensive text processing with MapReduce
Statement of responsibility
Jimmy Lin and Chris Dyer
Language
eng
Summary
Our world is being revolutionized by data-driven methods: access to large amounts of data has generated new insights and opened exciting new opportunities in commerce, science, and computing applications. Processing the enormous quantities of data necessary for these advances requires large clusters, making distributed computing paradigms more crucial than ever. MapReduce is a programming model for expressing distributed computations on massive datasets and an execution framework for large-scale data processing on clusters of commodity servers. The programming model provides an easy-to-understand abstraction for designing scalable algorithms, while the execution framework transparently handles many system-level details, ranging from scheduling to synchronization to fault tolerance. This book focuses on MapReduce algorithm design, with an emphasis on text processing algorithms common in natural language processing, information retrieval, and machine learning. We introduce the notion of MapReduce design patterns, which represent general reusable solutions to commonly occurring problems across a variety of problem domains. This book not only intends to help the reader "think in MapReduce" but also discusses limitations of the programming model.
Cataloging source
CaBNvSL
Creator date
1979-
Creator name
Lin, Jimmy
Illustrations
illustrations
Index
no index present
Literary form
non fiction
Nature of contents
dictionaries
Related work or contributor name
Dyer, Chris
Series statement
  • Synthesis digital library of engineering and computer science
  • Synthesis lectures on human language technologies
Series volume
7
Subject name
  • Database management
  • Cloud computing
  • Parallel processing (Electronic computers)
  • Electronic data processing
Target audience
adult
Label
Data-intensive text processing with MapReduce, Jimmy Lin and Chris Dyer, (electronic book)
Antecedent source
file reproduced from original
Bibliography note
Includes bibliographical references (p. 149-163)
Color
multicolored
Contents
  • 1. Introduction -- Computing in the clouds -- Big ideas -- Why is this different? -- What this book is not --
  • 2. MapReduce basics -- Functional programming roots -- Mappers and reducers -- The execution framework -- Partitioners and combiners -- The distributed file system -- Hadoop cluster architecture -- Summary --
  • 3. MapReduce algorithm design -- Local aggregation -- Combiners and in-mapper combining -- Algorithmic correctness with local aggregation -- Pairs and stripes -- Computing relative frequencies -- Secondary sorting -- Relational joins -- Reduce-side join -- Map-side join -- Memory-backed join -- Summary --
  • 4. Inverted indexing for text retrieval -- Web crawling -- Inverted indexes -- Inverted indexing: baseline implementation -- Inverted indexing: revised implementation -- Index compression -- Byte-aligned and word-aligned codes -- Bit-aligned codes -- Postings compression -- What about retrieval? -- Summary and additional readings --
  • 5. Graph algorithms -- Graph representations -- Parallel breadth-first search -- PageRank -- Issues with graph processing -- Summary and additional readings --
  • 6. EM algorithms for text processing -- Expectation maximization -- Maximum likelihood estimation -- A latent variable marble game -- MLE with latent variables -- Expectation maximization -- An EM example -- Hidden Markov models -- Three questions for hidden Markov models -- The forward algorithm -- The Viterbi algorithm -- Parameter estimation for HMMs -- Forward-backward training: summary -- EM in MapReduce -- HMM training in MapReduce -- Case study: word alignment for statistical machine translation -- Statistical phrase-based translation -- Brief digression: language modeling with MapReduce -- Word alignment -- Experiments -- EM-like algorithms -- Gradient-based optimization and log-linear models -- Summary and additional readings --
  • 7. Closing remarks -- Limitations of MapReduce -- Alternative computing paradigms -- MapReduce and beyond --
  • Bibliography -- Authors' biographies
Dimensions
unknown
Extent
1 electronic text (ix, 165 p. : ill.)
File format
multiple file formats
Form of item
electronic
Isbn
9781608453436
Level of compression
unknown
Other physical details
digital file.
Quality assurance targets
unknown
Reformatting quality
access
Specific material designation
remote
System details
System requirements: Adobe Acrobat Reader.