White Paper: Closing the Big Data Management & Security Gap
Contents
Big Data Is Gaining Momentum, but Increasing Concerns, Too
  Big Data Projects Still Rely Heavily on Professional Services
  Security Still a Top Concern for Big Data Platforms
How Organizations Should Automate and Secure Big Data Deployments
Zettaset Delivers a Safer, More Automated and Secure Solution
The Bigger Truth
All trademark names are property of their respective companies. Information contained in this publication has been obtained by sources The
Enterprise Strategy Group (ESG) considers to be reliable but is not warranted by ESG. This publication may contain opinions of ESG, which are
subject to change from time to time. This publication is copyrighted by The Enterprise Strategy Group, Inc. Any reproduction or redistribution of
this publication, in whole or in part, whether in hard-copy format, electronically, or otherwise to persons not authorized to receive it, without the
express consent of The Enterprise Strategy Group, Inc., is in violation of U.S. copyright law and will be subject to an action for civil damages and,
if applicable, criminal prosecution. Should you have any questions, please contact ESG Client Relations at 508.482.0188.
While setup and configuration of a few management and data nodes in a Hadoop cluster may be touted as
relatively easy, the manual effort introduces a chance of error, and that chance grows with each additional
instance. An automated deployment system simplifies this process, making for an environment that is both more
scalable and more reliably protected.
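The way manual errors compound across nodes can be illustrated with simple arithmetic. The sketch below assumes, purely for illustration, that every manually configured node carries the same independent chance of a misconfiguration. The 2% per-node rate and the function name are hypothetical and are not figures from this paper.

```python
def chance_of_any_error(per_node_error_rate: float, node_count: int) -> float:
    """Probability that at least one of `node_count` manually configured
    nodes is misconfigured, assuming independent, identical error rates."""
    if not 0.0 <= per_node_error_rate <= 1.0:
        raise ValueError("per_node_error_rate must be between 0 and 1")
    if node_count < 0:
        raise ValueError("node_count must be non-negative")
    # P(at least one error) = 1 - P(no node has an error)
    return 1.0 - (1.0 - per_node_error_rate) ** node_count


if __name__ == "__main__":
    # Hypothetical 2% chance of a mistake on each manually configured node.
    for nodes in (5, 10, 50):
        print(nodes, round(chance_of_any_error(0.02, nodes), 3))
```

Under these assumptions, even a small per-node error rate makes at least one misconfiguration more likely than not once a cluster reaches a few dozen nodes.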
Encryption may seem like a common “tick box” option on many Hadoop distributions, but not all follow the same
conventions or coverage model. Ensure that all data on disk is covered with strong encryption, and take steps to
also guard against network attacks for data being transferred between nodes; during extract, transform, and load
activities; and when exporting information. Data masking can also be useful if certain fields need to be identifiably
unique for analytics without exposing their actual contents.
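One common way to keep a field identifiably unique without exposing its contents is keyed, deterministic tokenization. The following is a minimal sketch of that idea using HMAC-SHA256 from the Python standard library. It is an illustrative assumption rather than a method described in this paper, and the function names, field names, and sample records are hypothetical.

```python
import hashlib
import hmac


def mask_field(value: str, secret_key: bytes) -> str:
    """Return a keyed, deterministic token for `value`.

    Equal inputs map to equal tokens, so joins and distinct counts still
    work, but the original value does not appear in the output.
    """
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


def mask_records(records, fields_to_mask, secret_key):
    """Copy each record, replacing the named fields with masked tokens."""
    masked = []
    for record in records:
        copy = dict(record)
        for field in fields_to_mask:
            if field in copy:
                copy[field] = mask_field(str(copy[field]), secret_key)
        masked.append(copy)
    return masked


if __name__ == "__main__":
    # Hypothetical key and data; a real key would come from key management.
    key = b"example-only-key-do-not-reuse"
    rows = [
        {"email": "alice@example.com", "amount": 10},
        {"email": "alice@example.com", "amount": 25},
        {"email": "bob@example.com", "amount": 7},
    ]
    for row in mask_records(rows, ["email"], key):
        print(row)
```

Because the token depends on a secret key, the masking is only as strong as the protection of that key, which ties this technique back to key management.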
Though encryption itself may seem quite simple to turn on, key management is often the weak point of solutions,
particularly in larger, more varied, or more dynamic environments. Unique keys should be generated and controlled
via customizable policies, stored in and served from a highly available source, and managed in compliance with KMIP
definitions. Key management should also offer role-based administration and auditing capabilities.
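To make these requirements concrete, the sketch below shows a toy in-memory key store with unique key generation, a per-key reader policy, role-restricted administration, and an audit trail. It is a hypothetical illustration only: it does not implement KMIP, is not highly available, and all class, method, and user names are assumptions.

```python
import secrets
from datetime import datetime, timezone


class KeyStore:
    """Toy key manager: unique keys, per-key read policy, audit trail."""

    def __init__(self, admins):
        self._admins = set(admins)  # users allowed to create keys
        self._keys = {}             # key_id -> key bytes
        self._readers = {}          # key_id -> users allowed to read the key
        self.audit_log = []         # (timestamp, user, action, key_id, allowed)

    def _audit(self, user, action, key_id, allowed):
        stamp = datetime.now(timezone.utc).isoformat()
        self.audit_log.append((stamp, user, action, key_id, allowed))

    def create_key(self, user, key_id, readers):
        """Generate a fresh 256-bit key; only administrators may do this."""
        if user not in self._admins:
            self._audit(user, "create", key_id, False)
            raise PermissionError(f"{user} may not create keys")
        if key_id in self._keys:
            raise ValueError(f"key {key_id!r} already exists")
        self._keys[key_id] = secrets.token_bytes(32)
        self._readers[key_id] = set(readers)
        self._audit(user, "create", key_id, True)

    def get_key(self, user, key_id):
        """Return the key only to users named in that key's policy."""
        allowed = user in self._readers.get(key_id, set())
        self._audit(user, "read", key_id, allowed)  # log denials too
        if not allowed:
            raise PermissionError(f"{user} may not read key {key_id!r}")
        return self._keys[key_id]
```

Every request is logged whether or not it succeeds, since denied attempts are often the most useful entries in an audit.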
Even if the whole environment is defended from external attacks using these mechanisms, steps should be taken to
limit access to particular data sets to authenticated users only. This access control should be fine-grained and
role-based, should integrate automatically with Active Directory (AD) and LDAP directories, and should carry over
the permissions already specified in these proven access control systems.
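As a rough illustration of role-based, deny-by-default access checks driven by directory group membership, consider the sketch below. The group names, roles, data sets, and permissions are all hypothetical, and the directory lookup itself is assumed to have happened elsewhere, so no real AD or LDAP call is shown.

```python
# Hypothetical mapping from directory groups (as AD or LDAP might report
# them) to roles, and from roles to (data set, action) permissions.
GROUP_TO_ROLE = {
    "cn=analysts": "analyst",
    "cn=hadoop-admins": "admin",
}
ROLE_PERMISSIONS = {
    "analyst": {("sales", "read")},
    "admin": {("sales", "read"), ("sales", "write"), ("hr", "read")},
}


def roles_for(user_groups):
    """Translate a user's directory groups into the roles they imply."""
    return {GROUP_TO_ROLE[g] for g in user_groups if g in GROUP_TO_ROLE}


def is_allowed(user_groups, dataset, action):
    """Deny by default; allow only if some role grants (dataset, action)."""
    return any(
        (dataset, action) in ROLE_PERMISSIONS.get(role, set())
        for role in roles_for(user_groups)
    )


if __name__ == "__main__":
    print(is_allowed(["cn=analysts"], "sales", "read"))   # analyst may read sales
    print(is_allowed(["cn=analysts"], "hr", "read"))      # but not HR data
```

Keeping the group-to-role mapping separate from the role permissions means changes made in the directory carry over without editing per-user rules.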
From a broader perspective, additional steps should be explored as best practices, including establishing a security
zone for the analytics servers, deploying these servers in a hardened configuration, frequent scanning and timely
patching, and traffic monitoring. These approaches are not necessarily different for Hadoop environments,
however, and should be considered as a standard part of a larger IT security framework.
Although this is a non-trivial undertaking, IT decision makers should build these requirements into their “must have”
evaluation criteria, and select products that have functionality to match.
Zettaset Delivers a Safer, More Automated and Secure Solution
While many companies, young and old, are rushing to capitalize on the new opportunities afforded by big data,
many vendors are seeking to provide them with the technology to do so. Of these, some focus on performance,
some on connectivity, and some on vertical-specific applications. Zettaset is differentiating itself with a focus on
building rock-solid, enterprise-ready management and security applications that augment and improve the branded
open-source distribution frameworks. In doing so, Zettaset enables other vendors’ big data solutions to also better meet
enterprise operational requirements. As already noted, these requirements may not be top of mind for the DBA or
data scientist, but they will be critical steps before IT infrastructure and operations teams can adopt the new
solutions and begin enterprise-wide production deployments.
Zettaset’s Orchestrator provides a more mature, more comprehensive approach to managing big data
environments, automating and standardizing common activities like cluster configuration, node deployment, setup
of interfaces to applications, general administration, and, not least, securing Hadoop environments.
With the recent Fast-PATH addition, Orchestrator process automation reduces reliance on manual efforts and
accelerates database cluster deployment.
In the company’s internal benchmark testing, Zettaset found Fast-PATH
was able to fully install a 50-node Hadoop cluster in 140 minutes, which would almost certainly be quicker and less
error-prone than a manual effort. The benchmark time includes installation of the Hadoop distribution, as well as
installation of Kerberos, HBase, Hive, Encryption, Key Management, and Zettaset’s patented High-Availability
framework on all nodes.
Orchestrator Fast-PATH dramatically lowers operational costs, reduces the IT resource
requirements necessary to implement Hadoop, and cuts time to value from weeks to hours.
Now Zettaset is going a step further and modularizing key components, like Hadoop security and its patented
multi-service high availability and automated failover, to more easily complement and integrate with popular
Hadoop distributions from Cloudera and Hortonworks. This enterprise-class add-on functionality enhances the
management and security mechanisms of most branded distributions, and will help address the considerations
outlined in Table 1.
Specific modularized Big Data management and security capabilities include:
• Data-at-rest Encryption – Zettaset offers a standards-based, low-overhead approach linking up AES-256
  bit disk partition encryption with existing frameworks, and smoothly interoperates with KMIP-compliant
  key management, PKCS hardware security modules, and a wide range of leading Hadoop
  distributions and NoSQL databases. This complements open source encryption approaches for data in
  motion in Hadoop clusters, and also ensures the Orchestrator console communications are safe.
• Multi-Service High Availability – Hadoop cluster environments are complex, and require multiple
  services to productively function. Zettaset Orchestrator uniquely delivers enterprise class high
  availability with automated fail-over for all Hadoop services running in a cluster, eliminating single
  points of failure that exist in open source Hadoop, and delivering the robust security and compliance
  capabilities that enterprises expect and need.
• Fine-Grained, Role-based Access Control – Because Hadoop may often contain a wide range of
  information, both management tools and data itself must be restricted to those who “need to know.”
  Fine-grained controls ensure that roles and permissions can be easily customized, and that only
  appropriate administrators and users can make changes or access sensitive information.
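The general idea behind automated failover for a multi-instance service can be sketched with heartbeats and a timeout. The example below is a generic, hypothetical illustration and does not describe Zettaset's patented framework; the class name, instance names, and timeout value are assumptions.

```python
import time


class ServiceGroup:
    """Tracks the instances of one service and picks the active one."""

    def __init__(self, name, instances, timeout_seconds=5.0):
        self.name = name
        self.instances = list(instances)  # ordered by preference
        self.timeout = timeout_seconds
        self.last_heartbeat = {}          # instance -> time of last heartbeat

    def heartbeat(self, instance, now=None):
        """Record that `instance` reported itself healthy."""
        if instance not in self.instances:
            raise ValueError(f"unknown instance {instance!r}")
        self.last_heartbeat[instance] = time.monotonic() if now is None else now

    def active(self, now=None):
        """Return the most preferred instance with a recent heartbeat,
        or None if every instance has gone silent."""
        now = time.monotonic() if now is None else now
        for instance in self.instances:
            seen = self.last_heartbeat.get(instance)
            if seen is not None and now - seen <= self.timeout:
                return instance
        return None


if __name__ == "__main__":
    # Hypothetical primary and standby for a single Hadoop service.
    group = ServiceGroup("namenode", ["nn1", "nn2"], timeout_seconds=5.0)
    group.heartbeat("nn1", now=0.0)
    group.heartbeat("nn2", now=0.0)
    print(group.active(now=1.0))    # primary is healthy
    group.heartbeat("nn2", now=8.0)
    print(group.active(now=10.0))   # primary went silent, standby takes over
```

Covering every service in a cluster would mean one such group per service, which is why a single point of failure in any one of them matters.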
Zettaset has a bigger vision, too, including smoother deployments, better reliability, improved performance, and
easier support and administration for broader big data environments. Centralizing and certifying management of all
required functions to meet enterprise operational standards will go a long way to facilitating the adoption of
technologies that are still evolving and maturing. Modularizing the Zettaset offerings opens them up to the wider
community with a flexible “a la carte” menu to suit specific enterprise requirements, while also paving the way for
an expanded, more comprehensive, and fully integrated solution for big data management and security.
The Bigger Truth
Big data is rapidly entering the mainstream, and new data platforms like Hadoop and NoSQL databases are
becoming increasingly popular tools to capture and serve up more enterprise data than ever before, spanning
sensitive personal profile, health, financial, and sometimes R&D information. Not only is more data being collected
and compiled into a single repository, but also more people are being given access to this data across multiple lines
of business for application development and for analysis and reporting. However, these emerging technologies are not yet
fully mature in their security capabilities, increasing the risk of a “super breach.” The financial repercussions and
brand damage of an incident are well documented, as are the limitations of simple perimeter-based security
products.
While many are leaping into the big data opportunity with enthusiasm, the need to build a robust, manageable, and
safe solution is paramount. Many vendors are paying lip service to these issues, but few have really understood the
scope of the problem or yet endeavored to design and implement a truly protected product. Zettaset has focused
on building more comprehensive security and management functionality, and offers a complementary
solution that addresses the inherent risks of Hadoop distribution frameworks.