Securing Hadoop
Ebook, 238 pages

About this ebook

This book is a step-by-step tutorial filled with practical examples. It focuses mainly on the key security tools and the techniques for implementing Hadoop security. The book is intended for Hadoop practitioners (solution architects, Hadoop administrators, developers, and Hadoop project managers) who want a solid grounding in Kerberos and who wish to learn how to implement end-to-end Hadoop security within an enterprise setup. Readers are assumed to have a basic understanding of Hadoop and to be familiar with basic security concepts.
Language: English
Release date: Nov 22, 2013
ISBN: 9781783285266

    Securing Hadoop - Sudheesh Narayanan

    Table of Contents

    Securing Hadoop

    Credits

    About the Author

    About the Reviewers

    www.PacktPub.com

    Support files, eBooks, discount offers and more

    Why Subscribe?

    Free Access for Packt account holders

    Preface

    What this book covers

    What you need for this book

    Who this book is for

    Conventions

    Reader feedback

    Customer support

    Errata

    Piracy

    Questions

    1. Hadoop Security Overview

    Why do we need to secure Hadoop?

    Challenges for securing the Hadoop ecosystem

    Key security considerations

    Reference architecture for Big Data security

    Summary

    2. Hadoop Security Design

    What is Kerberos?

    Key Kerberos terminologies

    How Kerberos works?

    Kerberos advantages

    The Hadoop default security model without Kerberos

    Hadoop Kerberos security implementation

    User-level access controls

    Service-level access controls

    User and service authentication

    Delegation Token

    Job Token

    Block Access Token

    Summary

    3. Setting Up a Secured Hadoop Cluster

    Prerequisites

    Setting up Kerberos

    Installing the Key Distribution Center

    Configuring the Key Distribution Center

    Establishing the KDC database

    Setting up the administrator principal for KDC

    Starting the Kerberos daemons

    Setting up the first Kerberos administrator

    Adding the user or service principals

    Configuring LDAP as the Kerberos database

    Supporting AES-256 encryption for a Kerberos ticket

    Configuring Hadoop with Kerberos authentication

    Setting up the Kerberos client on all the Hadoop nodes

    Setting up Hadoop service principals

    Creating a keytab file for the Hadoop services

    Distributing the keytab file for all the slaves

    Setting up Hadoop configuration files

    HDFS-related configurations

    MRV1-related configurations

    MRV2-related configurations

    Setting up secured DataNode

    Setting up the TaskController class

    Configuring users for Hadoop

    Automation of a secured Hadoop deployment

    Summary

    4. Securing the Hadoop Ecosystem

    Configuring Kerberos for Hadoop ecosystem components

    Securing Hive

    Securing Hive using Sentry

    Securing Oozie

    Securing Flume

    Securing Flume sources

    Securing Hadoop sink

    Securing a Flume channel

    Securing HBase

    Securing Sqoop

    Securing Pig

    Best practices for securing the Hadoop ecosystem components

    Summary

    5. Integrating Hadoop with Enterprise Security Systems

    Integrating Enterprise Identity Management systems

    Configuring EIM integration with Hadoop

    Integrating Active-Directory-based EIM with the Hadoop ecosystem

    Accessing a secured Hadoop cluster from an enterprise network

    HttpFS

    HUE

    Knox Gateway Server

    Summary

    6. Securing Sensitive Data in Hadoop

    Securing sensitive data in Hadoop

    Approach for securing insights in Hadoop

    Securing data in motion

    Securing data at rest

    Implementing data encryption in Hadoop

    Summary

    7. Security Event and Audit Logging in Hadoop

    Security Incident and Event Monitoring in a Hadoop Cluster

    The Security Incident and Event Monitoring (SIEM) system

    Setting up audit logging in a secured Hadoop cluster

    Configuring Hadoop audit logs

    Summary

    A. Solutions Available for Securing Hadoop

    Hadoop distribution with enhanced security support

    Automation of a secured Hadoop cluster deployment

    Cloudera Manager

    Zettaset

    Different Hadoop data encryption options

    Dataguise for Hadoop

    Gazzang zNcrypt

    eCryptfs for Hadoop

    Securing the Hadoop ecosystem with Project Rhino

    Mapping of security technologies with the reference architecture

    Infrastructure security

    OS and filesystem security

    Application security

    Network perimeter security

    Data masking and encryption

    Authentication and authorization

    Audit logging, security policies, and procedures

    Security Incident and Event Monitoring

    Index

    Securing Hadoop


    Copyright © 2013 Packt Publishing

    All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

    Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing and its dealers and distributors, will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

    Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

    First published: November 2013

    Production Reference: 1181113

    Published by Packt Publishing Ltd.

    Livery Place

    35 Livery Street

    Birmingham B3 2PB, UK.

    ISBN 978-1-78328-525-9

    www.packtpub.com

    Cover Image by Ravaji Babu (<ravaji_babu@outlook.com>)

    Credits

    Author

    Sudheesh Narayanan

    Reviewers

    Mark Kerzner

    Nitin Pawar

    Acquisition Editor

    Antony Lowe

    Commissioning Editor

    Shaon Basu

    Technical Editors

    Amit Ramadas

    Amit Shetty

    Project Coordinator

    Akash Poojary

    Proofreader

    Ameesha Green

    Indexer

    Rekha Nair

    Graphics

    Sheetal Aute

    Ronak Dhruv

    Valentina D'silva

    Disha Haria

    Abhinash Sahu

    Production Coordinator

    Nilesh R. Mohite

    Cover Work

    Nilesh R. Mohite

    About the Author

    Sudheesh Narayanan is a Technology Strategist and Big Data Practitioner with expertise in technology consulting and implementing Big Data solutions. With over 15 years of IT experience in Information Management, Business Intelligence, Big Data & Analytics, and Cloud & J2EE application development, he has applied his expertise to architecting, designing, and developing Big Data products, Cloud management platforms, and highly scalable platform services. His expertise in Big Data includes Hadoop and its ecosystem components, NoSQL databases (MongoDB, Cassandra, and HBase), Text Analytics (GATE and OpenNLP), Machine Learning (Mahout, Weka, and R), and Complex Event Processing.

    Sudheesh is currently working with Genpact as the Assistant Vice President and Chief Architect – Big Data, with a focus on driving innovation and building Intellectual Property assets, frameworks, and solutions. Prior to Genpact, he was the co-inventor and Chief Architect of the Infosys BigDataEdge product.

    I would like to thank my wife, Smita, and son, Aryan, for their sacrifices and support during this journey, and my dad, mom, and sister for encouraging me at all times to make a difference by contributing back to the community. This book would not have been possible without their encouragement and constant support.

    Special thanks to Rupak and Debika for investing their personal time over weekends to help me experiment with a few ideas on Hadoop security, and for being my sounding board.

    I would like to thank Shwetha, Sivaram, Ajay, Manpreet, and Venky for providing constant feedback and helping me make continuous improvements in my securing Hadoop journey.

    Above all, I would like to extend my sincere thanks to my teacher, Prof. N. C. Jain, and to my leaders and coach, Paddy, Vishnu Bhat, Sandeep Bhagat, Jaikrishnan, Anil D'Souza, and KNM Rao, for their mentoring and guidance in making me who I am today, so that I could write this book.

    About the Reviewers

    Mark Kerzner holds degrees in Law, Math, and Computer Science. He has been designing software for many years, and Hadoop-based systems since 2008. He is the President of SHMsoft, a provider of Hadoop applications for various verticals, and a co-author of the Hadoop Illuminated book/project. He has authored and co-authored books and patents.

    I would like to acknowledge the help of my colleagues, in particular Sujee Maniyam, and, last but not least, my multitalented family.

    Nitin Pawar started his career as a Release Engineer and Tools Developer, then moved into different roles such as operations, solutions engineering, process engineering, and Big Data analytics. Currently, he is working as a Big Data System Architect and is trying to solve problems related to customer success management. He has mainly worked with technologies revolving around the first-generation Hadoop ecosystem.

    www.PacktPub.com

    Support files, eBooks, discount offers and more

    You might want to visit www.PacktPub.com for support files and downloads related to your book.

    Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at for more details.

    At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

    http://PacktLib.PacktPub.com

    Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can access, read and search across Packt's entire library of books.

    Why Subscribe?
