Practical Hive : a guide to Hadoop's data warehouse system, Scott Shaw, Andreas François Vermeulen, Ankur Gupta, David Kjerrumgaard, (electronic book)

Label
Practical Hive : a guide to Hadoop's data warehouse system
Title
Practical Hive
Title remainder
a guide to Hadoop's data warehouse system
Statement of responsibility
Scott Shaw, Andreas François Vermeulen, Ankur Gupta, David Kjerrumgaard
Contributor
Author
Subject
Language
eng
Summary
Dive into the world of SQL on Hadoop and get the most out of your Hive data warehouses. This book is your go-to resource for using Hive: authors Scott Shaw, Ankur Gupta, David Kjerrumgaard, and Andreas François Vermeulen take you through learning HiveQL, the SQL-like language specific to Hive, to analyze, export, and massage the data stored across your Hadoop environment. From deploying Hive on your hardware or virtual machine and setting up its initial configuration to learning how Hive interacts with Hadoop, MapReduce, Tez and other big data technologies, Practical Hive gives you a detailed treatment of the software. In addition, this book discusses the value of open source software, Hive performance tuning, and how to leverage semi-structured and unstructured data.
What You Will Learn
  • Install and configure Hive for new and existing datasets
  • Perform DDL operations
  • Execute efficient DML operations
  • Use tables, partitions, buckets, and user-defined functions
  • Discover performance tuning tips and Hive best practices
Who This Book Is For
Developers, companies, and professionals who deal with large amounts of data and could use software that can efficiently manage large volumes of input. It is assumed that readers have the ability to work with SQL.
Member of
Cataloging source
N$T
Dewey number
005.74/5
Illustrations
illustrations
Index
index present
LC call number
QA76.9.D37
Literary form
non fiction
Nature of contents
dictionaries
http://library.link/vocab/relatedWorkOrContributorDate
  • 1970-
  • 1978-
http://library.link/vocab/relatedWorkOrContributorName
  • Shaw, Scott
  • Vermeulen, Andreas François
  • Gupta, Ankur
  • Kjerrumgaard, David
http://library.link/vocab/subjectName
  • Data warehousing
  • Computer Science
  • Computer Science, general
  • Data Storage Representation
  • Systems and Data Security
  • Data Structures
  • Database Management
  • Files
Label
Practical Hive : a guide to Hadoop's data warehouse system, Scott Shaw, Andreas François Vermeulen, Ankur Gupta, David Kjerrumgaard, (electronic book)
Instantiates
Publication
Distribution
Copyright
Note
Includes index
Antecedent source
unknown
Carrier category
online resource
Carrier category code
  • cr
Carrier MARC source
rdacarrier
Color
multicolored
Content category
text
Content type code
  • txt
Content type MARC source
rdacontent
Contents
  • At a Glance; Contents; About the Authors; About the Technical Reviewers; Acknowledgments; Introduction; Chapter 1: Setting the Stage for Hive: Hadoop; An Elephant Is Born; Hadoop Mechanics; Data Redundancy; Traditional High Availability; Hadoop High Availability; Processing with MapReduce; Beyond MapReduce; YARN and the Modern Data Architecture; Hadoop and the Open Source Community; Where Are We Now; Chapter 2: Introducing Hive; Hadoop Distributions; Cluster Architecture; Hive Installation; Finding Your Way Around; Hive CLI; Chapter 3: Hive Architecture; Hive Components; HCatalog
  • Hiveserver2; Client Tools; Execution Engine: Tez; Chapter 4: Hive Tables DDL; Schema-on-Read; Hive Data Model; Schemas/Databases; Why Use Multiple Schemas/Databases; Creating Databases; Altering Databases; Dropping Databases; List Databases; Data Types in Hive; Primitive Data Types; Choosing Data Types; Complex Data Types; Arrays; Maps; Structs; Unions; Tables; Creating Tables; Listing Tables; Internal/External Tables; External Tables; Internal or Managed Tables; External/Internal Table Example; Table Properties; Generating a Create Table Command for Existing Tables; Partitioning and Bucketing
  • Partitioning; Partitioning Considerations; Efficiently Partitioning on Date Columns; Bucketing; Bucketing Considerations; Temporary Tables; Altering Tables; Renaming Tables; Modifying a Table's Storage Properties; ORC File Format; Merging a Table's Files; Altering Table Partitions; Add Partition; Rename Partition; Modifying Columns; Adding Columns; Dropping Tables/Partitions; Drop Tables; Dropping Partitions; Protecting Tables/Partitions; Other Create Table Command Options; Create Table as Select (CTAS); Create Table Like; Chapter 5: Data Manipulation Language (DML); Loading Data into Tables
  • Loading Data Using Files Stored on the Hadoop Distributed File System; Using Hive to Upload a Data File; Loading Data Using Queries; Using an Existing Table to Create a New Table; Writing Data into the File System from Queries; Using an Existing Table to Create an Output Directory; Inserting Values Directly into Tables; Adding Extra Records to an Existing Table; Updating Data Directly in Tables; Updating Records in an Existing Table; Deleting Data Directly in Tables; Updating Records in an Existing Table; Creating a Table with the Same Structure
  • Using an Existing Table to Create a New Table with the Same Structure; Joins; Using Equality Joins to Combine Tables; Joining Tables in Hive; Using Outer Joins; Joining Tables in Hive Using Left Join; Joining Tables in Hive Using Right Join; Joining Tables in Hive Using a Full Outer Join; Using Left Semi-Joins; Performing a Semi-Join; Using Join with Single MapReduce; Joining Three Tables in One MapReduce; Using Largest Table Last; Transactions; What Is ACID and Why Use It?; Hive Configuration; Chapter 6: Loading Data into Hive; Design Considerations Before Loading Data; Loading Data into HDFS
Control code
SPR957557490
Dimensions
unknown
Extent
1 online resource (xxi, 265 pages)
File format
unknown
Form of item
online
Isbn
9781484202715
Level of compression
unknown
Media category
computer
Media MARC source
rdamedia
Media type code
  • c
Other control number
10.1007/978-1-4842-0271-5
Other physical details
illustrations (some color)
Quality assurance targets
not applicable
Reformatting quality
unknown
Sound
unknown sound
Specific material designation
remote