
GOING HYBRID: THE NEXT ERA OF DATA MANAGEMENT
Best Practices Series, February/March 2015 | DBTA

In this section:
Hortonworks – Going Hybrid: The Next Era of Data Management
MemSQL – A Hybrid Approach to Data Processing
Denodo Technologies – Data Virtualization: The Foundation for a Successful Hybrid Data Architecture
Splice Machine – Powering Real-Time Applications and Offloading Operational Reports With an Operational Data Lake
GridGain Systems – Accelerate Business Insights by Managing Hybrid Data in Memory


GOING
HYBRID

The Next Era of Data Management
Best Practices Series
For powering today’s enterprises, no
single solution does it all. Rather,
organizations rely on varied—and often
eclectic—mixes of databases, platforms,
systems, and frameworks. The spotlight
may currently be on Hadoop and all-flash
storage systems as the stars of the big
data show, but there are many other cast
members as well. A well-functioning
data environment requires an entire
ensemble of approaches that include,
but aren’t necessarily limited to, Hadoop
and all-flash storage. These include
relational database management systems,
enterprise data warehouses, in-memory
systems, disk and tape storage systems,
NoSQL databases, and cloud-based data
environments. Bringing all these elements
together into hybrid approaches may
potentially deliver faster, better, and more
scalable approaches to data management.
Workloads and applications may vary
on a day-to-day basis. A typical hybrid
architecture may consist of a relational
database management system running
a transactional system, with data sent
to an in-memory database supporting
an analytics platform. Or, there may be an
open source framework such as Hadoop
at the back end to manage and create
files with big data that would be too
expensive to send through the extract,
transform, and load processes of the
enterprise data warehouse. A hybrid data
architecture may also consist of remote
databases that capture data from outside
the walls of the enterprise, which is then
sent to an enterprise data warehouse
at the secondary level to support data
transformation and governance. Such
an environment may also have NoSQL
databases at the front end to support data
access and analysis.
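As a simple illustration of this division of labor, the sketch below moves the day’s transactions out of an operational relational database into flat files for inexpensive batch processing, and pushes a small set of hot aggregates into an in-memory store for fast analytics. It is only a stand-in sketch: sqlite3 plays the role of both the transactional database and the in-memory analytics store, a local directory stands in for a Hadoop-style landing zone, and the table and file names are invented; in a real deployment these would be separate systems.

import csv
import sqlite3
from pathlib import Path

# Stand-ins for the pieces of a hybrid architecture:
#   oltp    - the operational relational database (a SQLite file here)
#   landing - a directory playing the role of a Hadoop-style landing zone
#   mem     - an in-memory store holding hot aggregates for analytics
oltp = sqlite3.connect("orders_oltp.db")
mem = sqlite3.connect(":memory:")
landing = Path("landing_zone")
landing.mkdir(exist_ok=True)

oltp.execute("CREATE TABLE IF NOT EXISTS orders "
             "(id INTEGER, region TEXT, amount REAL, order_date TEXT)")
oltp.execute("INSERT INTO orders VALUES (1, 'EMEA', 120.0, '2015-02-01')")
oltp.commit()

# 1) Offload detailed history to cheap batch storage instead of pushing it
#    through the warehouse ETL pipeline.
rows = oltp.execute("SELECT id, region, amount, order_date FROM orders").fetchall()
with open(landing / "orders_2015-02-01.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)

# 2) Keep only lightweight aggregates in the in-memory analytics store.
mem.execute("CREATE TABLE sales_by_region (region TEXT, total REAL)")
mem.executemany(
    "INSERT INTO sales_by_region VALUES (?, ?)",
    oltp.execute("SELECT region, SUM(amount) FROM orders GROUP BY region"),
)
print(mem.execute("SELECT * FROM sales_by_region").fetchall())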
The key is that business requirements
are constantly changing, and the data
infrastructure has to be flexible enough
to evolve with these requirements.
Enterprises need to be able to scale to new
configurations, or even swap out existing
solutions for newer technologies. An open,
hybrid architecture enables such agility.

There are compelling benefits to
deploying hybrid data environments,
especially in terms of speed, flexibility,
and costs. That’s because hybrid data
environments can be built on enterprise-wide service layers that can dynamically
scale against back-end on-premises,
virtualized, or cloud-based resources,
while employing in-memory, clustered,
and parallel processing resources. As
business needs evolve, DBAs and data
managers need to be able to provision
and stand up databases and supporting
infrastructures that can quickly support
such growth. Hybrid environments
provide an array of choices to enable
rapid implementations. At the same
time, businesses need to avoid the costs
involved in investing in high-end systems
that may need to be scaled back, or may
not be suitable for their requirements 3 or
4 years down the road.
Here are eight ways hybrid approaches
can be effectively deployed to support
data management:

WORK WITH BUSINESS REQUIREMENTS
Systems, and even architectures, are in a great state of flux, as enterprises wrestle with ever-shifting requirements while also attempting to stay competitive with digital strategies. There are many options for addressing opportunities and problems, and every enterprise, and every department within an enterprise, has its own business requirements, budgets, existing technologies and approaches, and available skills. That’s why, as enterprises seek to transition to digital, they need to move in deliberate and well-planned steps, as new approaches take root alongside existing legacy systems and processes. Business requirements vary, and budgets and priorities may vary as well.

SUPPORT HYBRID STORAGE SYSTEMS
There are many new options on the table for storage, including traditional hard disk drives and tape, flash memory, solid state drives (SSDs), and cloud-based storage. Costs vary across these modes, requiring an assessment of the business requirements for each. Some forms, such as physical hard disk drives and tape, are lower cost but take more time to access, and therefore may serve better for data that is infrequently accessed, or in back-end archival roles. More costly but faster and better-performing forms of storage such as SSDs may serve caching or short-term storage requirements.

LOOK TO THE NEW BREED OF RELATIONAL DATABASES
To meet varying demands, a new generation of hybrid databases is now emerging in the market. These data platforms are typically relational database systems with a range of new capabilities, offering the option to move to either in-memory or traditional disk-based storage. These databases are optimal when enterprises require high performance and a relatively small footprint without the expense and resources needed for moving data back and forth between disks—often seen as a latency factor in many traditional database settings.
INCORPORATE AND INTEGRATE
DIVERSE DATABASES INTO
A HYBRID ARCHITECTURE
There is a wide range of database types—from relational database management systems to NoSQL to in-memory cloud databases—in today’s environments. Each serves specific
purposes, but the information they
handle needs to be available across the
enterprise. At the same time, each format
brings its own advantages in terms of cost
and ease of use. Play on the strengths of
each, but bring them together.
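One pragmatic way to play on the strengths of each is to let every store do what it does best and combine results in the application or integration tier. The sketch below is a simplified stand-in: customer orders live in a relational database (sqlite3 here), customer profiles sit in a document-style store (a JSON file here), and the two are joined in code. All names and values are invented.

import json
import sqlite3

# Relational store: good at transactional facts and SQL aggregation.
rel = sqlite3.connect(":memory:")
rel.execute("CREATE TABLE orders (customer_id TEXT, amount REAL)")
rel.executemany("INSERT INTO orders VALUES (?, ?)",
                [("c1", 40.0), ("c1", 60.0), ("c2", 15.0)])

# Document-style store: good at flexible, nested profile data.
# (A JSON file stands in for a real NoSQL document database.)
with open("profiles.json", "w") as f:
    json.dump({"c1": {"name": "Acme Corp", "segment": "enterprise"},
               "c2": {"name": "Globex", "segment": "smb"}}, f)

# Combine the two in the application tier: aggregate in SQL,
# then enrich each row with document attributes.
with open("profiles.json") as f:
    docs = json.load(f)

for customer_id, total in rel.execute(
        "SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id"):
    profile = docs.get(customer_id, {})
    print(customer_id, profile.get("name"), profile.get("segment"), total)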

SUPPORT ANY AND
ALL DATA TYPES
Data from various sources—both existing and being added on a regular basis—will be in a variety of formats, be they unstructured or semi-structured, including ASCII, binary, and proprietary formats. A hybrid data architecture is well equipped to handle the variety—expected and unexpected—that the business may be bringing in.

ACHIEVE QUICK
IMPLEMENTATION
Hybrid data storage and warehouse
appliances can be quickly implemented
into existing infrastructures at relatively
low costs and with small footprints.
Appliances offer high-capacity
alternatives for mixed application
workloads and virtualized environments.

ENABLE DATA AS A
SERVICE (DAAS)
The challenge is to provide a data environment from which the entire organization can benefit, enabling all
parties—no matter how distributed they
are—to acquire, transform, move, clean,
stage, model, govern, deliver, explore,
collect, move, replicate, share, analyze,
catalog, publish, search, back up, and
archive the data they are working with.
Ultimately, this ends up as a data as a
service layer that provides for all these
requirements, while ensuring control,
security, privacy, reliability, and scalability
—along with a great user experience on
the front end.

BUILD A SKILLS REPERTOIRE
While there’s always a strong case to
be made for specialization, particularly
in database technologies, enterprises are
fast requiring a broad range of skill sets
to power their data environments. As
hybrid environments and architectures
increasingly become the norm, there
will be critical demand for data
managers capable of addressing multiple
environments, or at least being able to
acquire help on an as-needed basis. This
is part of the ongoing evolution of the
jobs of data managers, who see their roles
evolving from collectors and installers of
data, to high-level consultative roles to the
business, serving as brokers and curators. 


—Joe McKendrick

Sponsored Content

GOING HYBRID
THE NEXT ERA OF DATA MANAGEMENT

As we step into an age where data is a
competitive advantage, our concepts of
data management need to be revised from
the days of the enterprise data warehouse.
In the big data age, a hybrid model of
data management can ensure continued
success and build long-lasting value for
the enterprise. Hortonworks and Red Hat,
two pioneers in the open source space,
have worked closely together to build
agile, enterprise-grade big data solutions
for the enterprise of the future.

WHAT IS APACHE HADOOP?
Shortly after enterprise IT adopted
large scale systems to manage data, the
Enterprise Data Warehouse (EDW)
emerged as the logical home of all
enterprise data. Today, virtually every
company has a Data Warehouse that serves
to model and capture the essence of the
business from their operational systems.
The explosion of new types of data in
recent years—from inputs such as the
web and connected devices, or just sheer
volumes of records—has put tremendous
pressure on the EDW. Organizations
are also seeking to capitalize on business opportunities as they ingest real-time event data streams.
In the meantime, Apache Hadoop
has emerged as a great way to parallelize
analytics on large data sets running on
commodity hardware. As a result, an
increasing number of organizations have
turned to a hybrid model, using Apache
Hadoop to help cost-effectively manage
the enormous increase in data while still
maintaining the integrity of the data in
the EDW. By adopting this new hybrid
model, organizations are beginning to
deploy new analytic applications that
could not exist before, either because it
was too costly to scale their EDW, or it
was not technically possible in the existing
IT infrastructure model.

RED HAT AND HORTONWORKS
—THE VISION
Red Hat and Hortonworks, two open
source leaders, bring Apache Hadoop to
the enterprise. Working together, they
are building on their common, open
source approach to developing software
that addresses the growing big data
requirements of the enterprise. With
an enterprise Hadoop platform that is
tightly integrated with open hybrid cloud
technologies (including OpenStack, Red
Hat Storage, JBoss, Red Hat Enterprise
Linux, and OpenJDK), Hortonworks
and Red Hat deliver infrastructure and
application development solutions that
enable the next generation of big data
applications through IT optimization and
advanced analytics applications.
Companies are now able to move high
volumes of existing data into Hadoop,
offload processing workloads, and enrich
their data architecture with additional
types of data to create new business value.
Additionally, a new, ultra-competitive
breed of businesses is now emerging.
These organizations are able to take
advantage of immense volumes and
varieties of data to create competitive
differentiation—as an example, by
building a single, 360-degree view of
their customers and leveraging advanced
predictive analytics in Apache Hadoop.
Red Hat and Hortonworks are committed
to helping enterprises mine and monetize
their data for deeper business insights.
Learn more about the collaboration at
hortonworks.com/partner/redhat.

HYBRID DEPLOYMENT
SCENARIOS
Hortonworks and Red Hat provide
many choices of infrastructure to deploy
Hortonworks Data Platform (HDP): on-premises, in the cloud, and virtualized. Further,
our customers have a choice of deploying
on Linux and Windows operating
systems. We believe you should not be
limited to just one option, but have the
option to choose the best combination
of infrastructure and operating system
based on the usage scenario. In a hybrid
deployment model, you should have all
of these options. Our customers come
to us asking to meet the requirements
for their organizations for the following
scenarios:
Cluster Backup
IT Operations teams expect Hadoop
to provide robust, enterprise-level
capabilities, like other systems in the data
center, and business continuity through
replication across on-premises and
cloud-based storage targets is a critical
requirement. In HDP 2.2, Hortonworks
helped extend the capabilities of Apache Falcon to establish an automated policy for cloud backup to Red Hat Storage.
In addition, Red Hat engineers have worked closely with the HDP Engineering Team to build a plugin for Red Hat Storage. This new plugin allows customers to run MapReduce applications directly on top of GlusterFS rather than having to deal with expensive and cumbersome data movement to and from HDFS (the Hadoop Distributed File System). The Red Hat Storage option also offers customers a POSIX-compatible environment without a single point of failure.
Development
In enterprise shops, development
environments are normally separated
from production systems. And
development environments are typically
smaller in scale, spun up and down
on a regular basis and are constantly
changing. Today, many organizations
are relying on a cloud-based option
for their development teams. It allows
IT to manage multiple development environments more easily, and also to spin up temporary environments to meet short-term development requirements. As a hybrid option, you
need to be able to port not just data, but
the Hadoop applications as well. Red Hat
and Hortonworks are working closely
together on joint R&D efforts to simplify
instantiation of analytic applications on
HDP leveraging the Platform as a Service
(PaaS) capabilities of Red Hat OpenShift.
Burst
Data Science continues to grow in interest within all of the organizations we see adopting Apache Hadoop. With YARN acting as the data operating system for Apache Hadoop within a production cluster, new advanced analytic applications, whether short- or long-running in nature, are able to spin up application containers in a distributed fashion on Hadoop worker nodes that have the right profiles and available resources for hosting each of these unique workloads. Data Science teams are also able to comfortably spin up temporary clusters (on-premises or in the cloud) to perform discovery-type exploration, develop and test new models, or even run advanced machine-learning algorithms without the worry of impacting other IT systems. Data Scientists can seamlessly incorporate data and application logic from their existing production Hadoop environments.
OpenStack
OpenStack is an open-source
cloud platform typically deployed as
an Infrastructure as a Service (IaaS)
solution. It has gained much popularity
for its affordability, scalability and
flexibility. Hortonworks and Red Hat
have collaborated to bring Hadoop
to OpenStack via the Sahara project.
The goal of Sahara is to eliminate the
complexity of setting up and maintaining
Hadoop clusters, and to lower the TCO
of big data analytics. Deploying HDP on
OpenStack will provide IT shops with
the deployment flexibility and speed
needed to meet today’s rapidly changing
business needs.
Portability is the key to making each of these deployment models successful.
You need to be able to not only move data
back and forth, but to also synchronize
data sets. Hortonworks continues to
focus on providing an Enterprise Ready
Hadoop distribution and invest in this
area. Apache Falcon, Sqoop, Kafka
and Flume are delivered with HDP 2.2
to support data management and
governance.
Further, and even more complex,
is the consistency of the “bits” across
environments. The same version of
the entire Hadoop stack must be
deployed in these environments, or else
you risk a job execution failing as it is
migrated from one to the next. This
portability is a critical requirement for
hybrid deployment of Hadoop. With
Hortonworks providing Apache Ambari

for Hadoop deployment, configuration,
management and monitoring, and the
above-mentioned joint Red Hat and
Hortonworks engineering collaboration,
these challenges can be met.
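A trivial way to picture this consistency check is to compare the component manifests of two environments before migrating a job between them. The sketch below hard-codes two hypothetical manifests; in practice the versions would come from a management tool such as Apache Ambari, and the component versions shown are illustrative only.

# Hypothetical stack manifests for two environments (versions are illustrative).
prod = {"hadoop": "2.6.0", "hive": "0.14.0", "falcon": "0.6.0"}
cloud_dev = {"hadoop": "2.6.0", "hive": "0.13.1", "falcon": "0.6.0"}

def stack_mismatches(source, target):
    """Return components whose versions differ between two environments."""
    return {name: (source.get(name), target.get(name))
            for name in set(source) | set(target)
            if source.get(name) != target.get(name)}

drift = stack_mismatches(prod, cloud_dev)
if drift:
    print("Hold the migration; version drift detected:", drift)
else:
    print("Stacks match; jobs should behave consistently across environments.")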

HORTONWORKS ENABLES THE
MOST CHOICE IN THE INDUSTRY
Agility is a key business imperative for
CEOs and CIOs alike. Agile businesses
run on agile technology, which in turn
is made possible through choice. The
collaboration of two open source leaders
translates into more choice for customers
looking to build big data systems today
that will also evolve to meet the demands
of the enterprise tomorrow. Enabling Red
Hat and Hortonworks partner integration
is key to everyone’s success and a key part
of our joint strategy.

CONCLUSION
Hybrid is more than just a good idea.
It’s the way forward. As traditional lines
blur (IT vs. Business, Cloud vs. On-Premise,
Big Data vs. EDW), it is important that
enterprises are prepared to juggle the
demands of traditional data systems with
a modern data architecture. Monetizing
all types of data has emerged as the new
battleground, and a hybrid model for data
management ensures that tomorrow’s
enterprises are set up for success.

Call to action: Learn more at http://hortonworks.com/labs/redhat/ and go over the tutorials.

HORTONWORKS
For more information, visit
www.hortonworks.com

Sponsored Content

A Hybrid Approach to Data Processing
In-memory computing and untapped business opportunities
are leading organizations to hybrid transactional
and analytical data processing.

Gartner: Hybrid Transaction/Analytical Processing Will Foster Opportunities for Dramatic Business Innovation
https://www.gartner.com/doc/2657815/hybrid-transactionanalytical-processing-foster-opportunities

THE CHALLENGE WITH DATA
MANAGEMENT TODAY
Traditionally, processing of
transactional and analytical data occurs
in separate databases, which leads to
data silos. The ability to run business
operations and gain insights from data
is restricted by the time it takes to move
data from a transactional database to a
data warehouse.
Typically, the process goes like this:
• An online transaction processing (OLTP) database ingests and stores incoming data.
• The process we begrudgingly know as ETL (extract, transform, load) transfers data from the OLTP database to a data warehouse.
• Stale data is then available to run queries against and, hopefully, garner insights to either increase revenue or reduce costs.
This is a problem because, for most
organizations, the highest value data is
also the most recent.

A NEW, HYBRID APPROACH
TO DATA PROCESSING
Thanks to innovations in in-memory
computing coupled with distributed
system architectures, the antiquated OLTP
to OLAP approach to data management
is being turned on its head by what
Gartner has coined Hybrid Transactional/
Analytical Processing, or HTAP for short.
Defining Hybrid Transactional/
Analytical Processing
Hybrid Transactional/Analytical
Processing (HTAP) describes the
capability of a single database that
can perform both online transaction
processing (OLTP) and online analytical
processing (OLAP) for real-time
operational intelligence processing.
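As a concrete illustration, the sketch below runs a transactional write and an analytical aggregation against the same table over the MySQL wire protocol that MemSQL speaks. The host, credentials, and schema are hypothetical placeholders, and pymysql is just one of several MySQL-compatible Python drivers that could be used.

import pymysql

# Hypothetical connection details for a MySQL-wire-compatible HTAP database.
conn = pymysql.connect(host="htap.example.com", user="app",
                       password="secret", database="clickstream")

with conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS events (
            user_id BIGINT,
            action  VARCHAR(32),
            ts      DATETIME
        )""")

    # Transactional path: ingest events as they arrive.
    cur.execute("INSERT INTO events VALUES (%s, %s, NOW())", (42, "purchase"))
    conn.commit()

    # Analytical path: aggregate over the same, freshly written rows
    # (no ETL hop to a separate warehouse, no stale copy).
    cur.execute("""
        SELECT action, COUNT(*)
        FROM events
        WHERE ts > NOW() - INTERVAL 1 HOUR
        GROUP BY action""")
    for action, cnt in cur.fetchall():
        print(action, cnt)

conn.close()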
Market Forces Driving HTAP
Powerful market forces must be in
motion in order for any transformative change to take place, especially when that
change is connected to data management.
The major market forces spurring the
transition from OLTP/OLAP to HTAP
include the following:
Lowering Cost of RAM—Over
the past decade, the price of RAM has
steadily dropped, and is now at the point
where value gained from storing data in
memory far outweighs the costs.
Untapped Business Opportunities—
HTAP gives businesses an accurate
representation of their most recent data.
With this visibility, businesses can extract
revenue from incoming data sources, and
mitigate costs by monitoring application
performance in real-time.
Data Everywhere—Movements like
mobile computing and the Internet of
Things have brought us to an age of
interconnectivity where data rules. For
business to thrive in this era, the ability
to collect, store, and analyze data in real time is an absolute must.
HTAP Solves for Real-Time
Data Processing
HTAP opens new doors for
organizations to make sound decisions
from incoming data without the
restrictions of latency. As data workloads
grow from terabytes to petabytes, HTAP
will enable businesses to scale accordingly.
As a result, organizations will be able to extract value from data in ways that were unthinkable with legacy systems.
HTAP Use Cases
We are in the early days of HTAP, and
it is not always clear how it can be applied
in the real world. As a rule of thumb,
any organization that handles large
volumes of data will benefit from HTAP.
To provide a bit more context, we’ve
compiled the following applications of
HTAP in use today.
Application Monitoring—When millions of users reach mobile or web-based applications simultaneously, it is critical that systems run without any
hiccups. HTAP allows teams of system
administrators and analysts to monitor
the health of applications in real-time to
spot anomalies and save on costs incurred
from poor performance.
Internet of Things—Applications
built for the Internet of Things (IoT) run
on huge amounts of sensor data. HTAP
easily processes IoT scale data workloads,
as it is designed to handle extreme data
ingestion while concurrently making
analytics available in real-time.
Real-Time Bidding—Ad Tech
companies struggle to implement complex real-time bidding features due to the sheer volume of data processing required.
HTAP delivers the processing power that’s
necessary to serve display, social, mobile
and video advertising at scale.
Market Conditions—Financial
organizations must be able to respond
to market volatility in an instant. Any
delay is money out of their pocket. HTAP
makes it possible for financial institutions
to respond to fluctuating market
conditions as they happen.
In each of these use cases, the ability to
react to large data sets in a short amount
of time provides incredible value and,
with HTAP, is entirely possible.

WE BUILT MEMSQL FOR HTAP
MemSQL is built for Hybrid
Transactional and Analytical Processing.
It allows data-reliant businesses to handle large amounts of incoming data with ease, make sound decisions in real time, and manage messy, real-world data models without having to give up the power and familiarity of SQL.
Download a 30-day FREE trial

memsql.com/download

MEMSQL
www.memsql.com

Sponsored Content

Data Virtualization: The Foundation for
a Successful Hybrid Data Architecture
In the era of IoT, Cloud, Mobile, and
Social, data is being generated at a pace
not seen before and has become the fuel
that drives successful businesses.
Traditional data management
solutions—based on rigid information
models—are not flexible enough to deal
with the processing of today’s data and,
as a result, hybrid databases, Big Data and
NoSQL technologies have emerged.
For data architects this represents
a new situation where a hybrid data
architecture is needed. The single-repository, centralized data warehouse approach is no longer suitable, as data
architects need to deploy multiple
repositories to store and process the
new data types.
Doing so without a sound information
architecture imposes new challenges:
• Additional information silos, with the
subsequent risk of a spaghetti-style
point-to-point architecture.
• Data is stored with different
granularities in each repository, in
different formats and accessed using
different protocols.
• Data model mismatches between the
data in enterprise systems and that in
the new repositories.
• Applications find it difficult to access
and consume the information that is
spread across many silos.
As a result of this, IT finds it difficult
to cope with today’s business demands.

DATA VIRTUALIZATION ENABLES
A HYBRID DATA ARCHITECTURE
Data Virtualization is a technology
that provides a data abstraction layer
over multiple distributed heterogeneous
repositories—hiding the complexity
underneath in terms of potential data model
mismatches and different information
granularity and access heterogeneity.
The Data Virtualization engine lies
between the information repositories and
the consuming application layer representing
a unified point of bi-directional access to
the information. It offers the following
architectural advantages:

1. Abstraction: Hides the complexity
of the underlying data sources and
exposes a unified data model that can be
consumed by the application layer. This
information model is typically based on
the well-known relational model and can
be accessed using SQL.
2. Decoupling: A change in the
underlying infrastructure is buffered by
this data virtualization layer, protecting
the consuming applications from the
changes.
3. Unified point of access: The data
virtualization layer is the ideal place
to enforce your data governance and
security rules.
4. Reuse: The data virtualization
approach fosters the deployment of data
services that can be reused across the
whole organization.

BEST PRACTICES FOR A SOUND
HYBRID DATA ARCHITECTURE
1. Introduce data virtualization at the
beginning of a project to avoid the risk of
creating a point-to-point architecture that
will be very difficult to manage in the future.
2. Define the information model to be
exposed to the consuming applications in
this layer. As a best practice, use a canonical
model that represents the key business
entities that your applications require.

3. Enable access to the new repositories (Big Data, NoSQL, etc.) through this layer, avoiding direct access from the consuming applications to them.
4. Apply the needed transformations at this layer to import the information models from the new sources. Advanced data virtualization engines, such as Denodo, can seamlessly import hierarchical, key-value, and other data thanks to an Extended Relational Model, while preserving the native data model in the source.
5. Build reusable data services that expose the information in multiple formats (SQL, SOAP, REST), as illustrated in the sketch after this list.
6. Finally, tune the model in terms of performance to meet your SLAs. Advanced data virtualization platforms apply sophisticated query optimization techniques to make the most of the computing capabilities of Big Data and NoSQL platforms.
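The sketch below illustrates best practice 5 in miniature: a thin service exposes a single canonical customer view and serves it as JSON over REST, while the same view remains reachable through plain SQL. It uses only the Python standard library as a stand-in for a real data virtualization platform; the port, table, and field names are illustrative.

import json
import sqlite3
from http.server import BaseHTTPRequestHandler, HTTPServer

# Underlying repository (stand-in); a virtualization layer would federate several.
db = sqlite3.connect("customers.db", check_same_thread=False)
db.execute("CREATE TABLE IF NOT EXISTS customer (id TEXT, name TEXT, segment TEXT)")
db.execute("INSERT INTO customer VALUES ('c1', 'Acme Corp', 'enterprise')")
db.commit()

CANONICAL_VIEW = "SELECT id, name, segment FROM customer"  # the SQL access path

class CustomerService(BaseHTTPRequestHandler):
    """REST access path: GET /customers returns the canonical view as JSON."""

    def do_GET(self):
        if self.path != "/customers":
            self.send_error(404)
            return
        rows = db.execute(CANONICAL_VIEW).fetchall()
        body = json.dumps(
            [{"id": r[0], "name": r[1], "segment": r[2]} for r in rows]).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("localhost", 8080), CustomerService).serve_forever()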
A hybrid data architecture based on
data virtualization makes it easy to add a
new repository, offering the agility that IT needs to properly scale and react to ever-increasing business demands.
DENODO TECHNOLOGIES
is the leader in Data Virtualization.
www.denodo.com

Sponsored Content

Powering Real-Time Applications and
Offloading Operational Reports
With an Operational Data Lake
Companies are increasingly evaluating
big data technologies to handle massive
data growth. For many, though, the path
to conquering big data is riddled with challenges—both technical and resource-driven. They want to leverage big data,
but they don’t know where to start.
A common starting point is
implementing an operational data lake,
which is a hybrid approach to upgrading
obsolete operational data stores (ODSs)
that are inherently expensive to scale. To
power real-time applications and offload
operational reports, an operational data
lake requires a hybrid of two technologies
to make it happen: an RDBMS for
transactional, real-time updates and a
scale-out architecture from Hadoop.

OPERATIONAL DATA
LAKE STRUCTURES THE
UNSTRUCTURED
While Hadoop is a great platform
for unstructured data, it traditionally
has not been conducive to structured,
relational data. Hadoop uses read-only
flat files, which can make it very difficult
to replicate the cross-table schema in
structured data.
A data lake is operationalized via a Hadoop RDBMS (see Figure 1), where Hadoop handles the scale-out,
and the RDBMS functionality supports
structured data and reliable real-time
updates. With this setup, the operational
data lake is never overwhelmed like a
traditional ODS. It’s nearly bottomless
or limitless in its scalability—companies
can continue to add as much data as they
want because expansion costs so little.
With the data lake, users can extract
structured metadata from unstructured
data on a regular basis and store it in the
operational data lake for quick and easy
querying, thus enabling better real-time
data analysis. And, just as importantly,
because all data is in a single location, the
operational data lake enables easy queries
across structured and unstructured data
simultaneously.
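To make that concrete, the sketch below pulls structured metadata out of unstructured log lines with a regular expression and stores it in a SQL table so it can be queried alongside structured data. sqlite3 stands in for a Hadoop RDBMS such as Splice Machine, and the log format and field names are invented for illustration.

import re
import sqlite3

# Unstructured input: free-form application log lines (invented format).
raw_logs = [
    "2015-02-01 09:14:02 user=alice action=login status=ok",
    "2015-02-01 09:14:09 user=bob action=checkout status=failed",
]

# Pattern that extracts structured metadata from each line.
pattern = re.compile(r"^(\S+ \S+) user=(\w+) action=(\w+) status=(\w+)$")

# sqlite3 stands in for the operational data lake's SQL layer.
lake = sqlite3.connect(":memory:")
lake.execute("CREATE TABLE log_events (ts TEXT, username TEXT, action TEXT, status TEXT)")

for line in raw_logs:
    match = pattern.match(line)
    if match:
        lake.execute("INSERT INTO log_events VALUES (?, ?, ?, ?)", match.groups())

# The extracted metadata is now queryable alongside other structured data.
for row in lake.execute(
        "SELECT action, status, COUNT(*) FROM log_events GROUP BY action, status"):
    print(row)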

Figure 1. Operational Data Lake Architecture

Finally, unlike native Hadoop, an
operational data lake can handle CRUD
(create, read, update, delete) operations
in a highly concurrent fashion. The
system can handle truly structured data
in real time, while using transactions to
ensure that updates are completed in a
reliable manner.
In the following section, a case study
is presented to illustrate the power of the
operational data lake in the enterprise.
The case study specifically demonstrates
how a Hadoop RDBMS, such as Splice
Machine, can bring significant business
value to digital marketers using an
operational data lake as a unified
customer profile.

SPLICING TOGETHER A
SOLUTION: A CASE STUDY
Marketing services company Harte
Hanks needed to power its campaign
management and BI applications to
deliver 360-degree customer views to
its client base, but found that its queries
were slowing to a crawl, taking half an
hour to complete in some cases. Given the
company’s prediction that its data would
grow by 30% to 50%, query performance
would only get worse.
Harte Hanks replaced its Oracle
RAC databases with Splice Machine, a
Hadoop RDBMS, thereby experiencing a threefold to sevenfold increase in query speeds at a cost 75% less than its Oracle implementation.
Splice Machine allows Harte Hanks to seamlessly support its OLTP and OLAP processes, all previously powered by Oracle RAC:
• IBM Unica for campaign management
• IBM Cognos for business intelligence
• Harte Hanks Trillium for data quality
• Ab Initio for ETL
With this operational data lake
powered by Splice Machine, Harte Hanks
can now provide real-time campaign
management more cost-effectively, easily
scaling out to hundreds of terabytes by
adding commodity servers.

CONCLUSION
Creating a Hadoop-based operational
data lake to support core applications
and services involves selecting scale-out technologies that can effectively
encompass the best of all worlds. A
Hadoop RDBMS like Splice Machine
brings together the scalability of Hadoop,
the ubiquity of industry-standard SQL,
and the transactional integrity of a fully
ACID-compliant RDBMS.
An operational data lake can be an
excellent way of implementing a hybrid
architecture approach to not only ride
the wave of big data, but also ensure that
businesses face smooth sailing in the
future.

SPLICE MACHINE
To learn more about how Splice
Machine can power an operational
data lake for your enterprise, visit
www.splicemachine.com.

Sponsored Content

Accelerate Business Insights
by Managing Hybrid Data in Memory
Organizations look to data to provide
answers, but most are ingesting it at a
volume, speed and variety that creates
the even bigger challenge of making
sense of it. Due to the tradition of
keeping operational and analytical data
separate in legacy environments, the
increasing need for processing structured,
unstructured and semi-structured data,
and the convergence of both volatile
and non-volatile storage underneath
these workloads, data management
is threatening to become increasingly
burdened by complex architectures and
unsolvable pain points rather than a
source of insight. And “Big Data” quickly
becomes just that—a huge pile of data
creating a big headache.

HYBRID TRANSACTIONAL AND
ANALYTICAL PROCESSING STILL
EARLY, BUT GAINING TRACTION
Historically, two types of data
processing environments have developed
due to the different characteristics of analytical (OLAP) and transactional (OLTP) workloads, and a reluctance to perform analytical processing on live transactional data. OLAP often requires ad hoc exploratory capabilities and doesn’t have strong SLAs, while OLTP almost always demands strong performance and SLAs for data consistency.
However, increasing demand for real-time analytics, which allows instantaneous
business intelligence and decision making,
is forcing many enterprises to rethink the
fundamental premises behind OLAP
and OLTP. Surging innovations in the
area of In-Memory Computing provide
the technological underpinning for the
new software infrastructure for emerging
hybrid transactional and analytical
processing (HTAP) workloads. With the
performance and scalability benefits of
In-Memory Computing, Big Data can be
effectively stored and processed in DRAM,
and both analytical and transactional
workloads can be effectively executed
without a need for two different systems
or ETL data movement processes.

The GridGain In-Memory Data Fabric provides a unique platform for high-performance data processing of analytical and transactional workloads without the need for costly ETL processes from siloed installations. It combines state-of-the-art transactional processing capabilities with all key analytical processing features in one data layer, sharing the same ultra-high performance characteristics (high throughput, low latency) of in-memory processing.

DIVERSE DATA SOURCES
Another interesting aspect of hybrid
data management is the fact that no longer
is there a single data source that serves the
application or a set of applications. The
typical modern composite application
relies on multiple dedicated data sources
such as traditional RDBMS for OLTP,
NoSQL for OLAP and Hadoop for data
warehousing. One of the key challenges of
hybrid data management is the ability to
effectively query and manage data across a
diverse set of data sources, while providing
a unified and consistent view of all data to
the applications.
The GridGain In-Memory Data Fabric
provides a data access and processing layer
that takes a holistic view of in-memory
processing as a layer on top of any existing
data source—instead of requiring a costly
replacement of any one of them. GridGain’s approach allows organizations to ingest new and traditional data sources without ripping out and replacing existing databases, while offering high-performance processing of diverse data sets in a hybrid environment.
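A very simplified way to picture such an in-memory layer over existing sources is a read-through cache: queries hit memory first and fall back to the underlying stores only on a miss. The sketch below uses a plain Python dictionary and two in-process "source" functions as stand-ins; a real data fabric adds distribution, transactions, SQL, and write-through, none of which is shown here.

# Stand-in back ends: in reality these would be an RDBMS, a NoSQL store, etc.
def fetch_from_rdbms(key):
    return {"order:1001": {"amount": 250.0}}.get(key)

def fetch_from_nosql(key):
    return {"profile:alice": {"segment": "enterprise"}}.get(key)

SOURCES = [fetch_from_rdbms, fetch_from_nosql]
cache = {}  # the in-memory layer (a real fabric would distribute this across nodes)

def get(key):
    """Read-through: serve from memory if present, else load from a source."""
    if key in cache:
        return cache[key]
    for source in SOURCES:
        value = source(key)
        if value is not None:
            cache[key] = value  # keep it in memory for subsequent reads
            return value
    return None

print(get("order:1001"))     # first read: loaded from the relational stand-in
print(get("order:1001"))     # second read: served from memory
print(get("profile:alice"))  # loaded from the NoSQL stand-in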

DATA PERSISTENCE WITHOUT
PERFORMANCE PENALTY

Just as Flash technology is quickly
taking the place of spinning disks as
the default storage for many traditional
workloads, RAM—especially emerging
non-volatile DIMM (NVDIMM)
technology—promises long sought-after data persistence for high-performance, hyper-scale applications. NVDIMM makes normal DDR4 memory persistent and enables dramatic performance optimizations for in-memory-based applications—transactional, analytical, and hybrid (HTAP).
Unlike NAND-based storage, which is always accessed as a block device, DRAM-based NVDIMM is purely byte-addressable memory that is identical to normal DRAM. In fact, from the application’s standpoint there is no difference between accessing normal DRAM or NVDIMM. Most transactional systems assume a tiered memory hierarchy of volatile memory for processing and persistent disk storage (HDD, SSD) for durability of the data. With NVDIMM, these systems gain fast and granular access to persistent storage without the performance penalty of involving disk-based storage.
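The difference between block access and byte addressability can be sketched in ordinary user space with a memory-mapped file: once mapped, individual bytes are read and written like memory rather than through block-sized I/O calls. This is only a loose analogy for how persistent memory is programmed (real NVDIMM programming models add cache-flush and ordering concerns not shown here), and the file name is arbitrary.

import mmap

SIZE = 4096  # one page of "persistent" space, backed by an ordinary file

# Create and size the backing file (a stand-in for a persistent memory region).
with open("pmem_region.bin", "wb") as f:
    f.write(b"\x00" * SIZE)

with open("pmem_region.bin", "r+b") as f:
    region = mmap.mmap(f.fileno(), SIZE)

    # Byte-addressable updates: modify bytes in place instead of issuing
    # block-oriented read/modify/write I/O.
    region[0:5] = b"HELLO"
    region[100:101] = b"\x2a"

    region.flush()  # ask the OS to persist the mapped changes
    print(region[0:5], region[100])
    region.close()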
As a leading provider of open source
and commercial in-memory technology,
GridGain Systems is at the forefront of innovation in hybrid volatile/non-volatile memory environments, with the goal of supporting low-latency write-through operations for real-time applications that cannot afford to lose data.

THE PROMISE
Modern in-memory technology
provides the most logical and
comprehensive way to harness the
computing power necessary to manage
the growing demands of hybrid data
management. The GridGain In-Memory
Data Fabric—available as an open source
project (Apache Ignite incubating) and
a hardened enterprise product—offers
companies unique capabilities and a
competitive advantage in managing diverse
data with the speed and scale necessary to
address the requirements of modern Cloud,
Big Data, social and IoT applications.
It’s easy to test our promise. Download
a free evaluation copy of the GridGain
In-Memory Data Fabric at http://www.gridgain.com/download/.
GRIDGAIN SYSTEMS
www.gridgain.com
