HBase Goes Realtime

Published on September 2019 | Categories: Documents
Uploaded by Oleksiy Kovyrin

Quick Overview
» Who are we?
» What is HBase?
» HBase 0.20 Primary Goal
» HBase 0.20 Architecture and Specifics
» HBase 0.20 By The Numbers
» Zookeeper Integration
» What’s Next?

Who are we?
» Jonathan Gray
  › Co-Founder and CTO, Streamy.com
  › Background in CE, databases, distributed systems
  › User of technology as a competitive advantage
  › Contributing to HBase for ~1 year
  › HBase in production for ~9 months
» Jean-Daniel Cryans
  › HBase Committer

0

1.1K views

0

RELATED TITLES

HBase Goes Realtime Uploaded by Oleksiy Kovyrin



Full description 







Save

Embed

Share

Print

Hadoop Performance

What is HBase?
» HBase is a…
  › Sorted,
  › Distributed,
  › Column-Oriented,
  › Multi-Dimensional,
  › Highly-Available,
  › High-Performance,
  › Persisted Storage System
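
The "sorted" and "multi-dimensional" parts of this definition can be illustrated with a toy in-memory model. This is not HBase code and it ignores distribution, availability and persistence; it only assumes that a cell is addressed by (row, column, timestamp) and that cells sort by row, then column, then timestamp in descending order, so the newest version of a column is met first. The class and method names are hypothetical.

```python
from bisect import bisect_left
from typing import Iterator, List, Optional, Tuple

# Toy model of a sorted, multi-dimensional map (not HBase code).
# Keys sort by row ascending, column ascending, then timestamp DESCENDING.
SortKey = Tuple[bytes, bytes, int]


def sort_key(row: bytes, column: bytes, ts: int) -> SortKey:
    # Negating the timestamp makes newer versions sort first.
    return (row, column, -ts)


class SortedCellMap:
    def __init__(self) -> None:
        self._keys: List[SortKey] = []   # always kept sorted
        self._values: List[bytes] = []   # parallel to self._keys

    def put(self, row: bytes, column: bytes, ts: int, value: bytes) -> None:
        key = sort_key(row, column, ts)
        i = bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._values[i] = value      # same cell and version: overwrite
        else:
            self._keys.insert(i, key)
            self._values.insert(i, value)

    def get_latest(self, row: bytes, column: bytes) -> Optional[bytes]:
        # (row, column) is a prefix of every key for that column and sorts
        # before all of them, so bisect lands on the newest version.
        i = bisect_left(self._keys, (row, column))
        if i < len(self._keys) and self._keys[i][:2] == (row, column):
            return self._values[i]
        return None

    def scan(self, start_row: bytes, stop_row: bytes) -> Iterator[Tuple[bytes, bytes, int, bytes]]:
        # Yields cells with start_row <= row < stop_row, in sorted order.
        i = bisect_left(self._keys, (start_row,))
        while i < len(self._keys) and self._keys[i][0] < stop_row:
            row, column, neg_ts = self._keys[i]
            yield row, column, -neg_ts, self._values[i]
            i += 1
```

Because the keys stay sorted, both a point lookup and the start of a range scan are a binary search; a scan then simply walks forward.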

HBase 0.20 Primary Goal
» First ever Performance Release
  1. Random Access Time
  2. Scan Time
  3. Insert Time
» As a random-access store, we are well suited for the storage and serving of Web applications
  › But high latency and variability (100s of ms to seconds) have reduced the usefulness of HBase an…

0

1.1K views

0

RELATED TITLES

HBase Goes Realtime Uploaded by Oleksiy Kovyrin



Full description 







Save

Embed

Share

Print

Hadoop Performance

HBase – Coprocessors

HBase schema design case

Hbase

HBase 0.20 Architecture
» The Guiding Philosophy – Unjavafy Everything!
  › Zero-copy reads
  › Block-based storage, reading, and indexing
  › Drastically reduce Object instantiation
  › Eliminate widespread usage of Trees
  › Sorted merges using Heap structures
  › Fast and intelligent caching with memory-awareness
» Effort Led By…

0

1.1K views

0

RELATED TITLES

HBase Goes Realtime Uploaded by Oleksiy Kovyrin



Full description 







Save

Embed

Share

Print

Hadoop Performance

HBase – Coprocessors

HBase schema design case

Hbase

HBase 0.20 Architecture – Storage
» New Key Format – KeyValue
  › Contains only (byte[] buf, int offset, int length)
    • Compact binary format with binary comparators
    • Our “pointer” to keys inside blocks
» New File Format – HFile
  › Originally based on TFile (HADOOP-3315) and BigTab…
  › Block-based binary format with a block index
  › Contains any number of Meta blocks
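
The two ideas on this slide, a key that is only a (buf, offset, length) view compared byte by byte, and a block index used to find the block that may contain a key, can be sketched together. This is a hypothetical illustration, not HBase's KeyValue or HFile classes: KeyPointer, compare and find_block are names invented here, and the "index" is just the list of each block's first key.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class KeyPointer:
    """Hypothetical key view: a window into a shared byte buffer, no copy made."""
    buf: bytes
    offset: int
    length: int


def compare(a: KeyPointer, b: KeyPointer) -> int:
    """Unsigned lexicographic comparison that reads bytes in place (no slicing)."""
    n = min(a.length, b.length)
    for i in range(n):
        x = a.buf[a.offset + i]
        y = b.buf[b.offset + i]
        if x != y:
            return -1 if x < y else 1
    # Equal prefixes: the shorter key sorts first.
    return (a.length > b.length) - (a.length < b.length)


def find_block(first_keys: List[KeyPointer], key: KeyPointer) -> int:
    """Binary search over each block's first key.

    Returns the index of the last block whose first key is <= key,
    or -1 if the key sorts before every block.
    """
    lo, hi = 0, len(first_keys)
    while lo < hi:
        mid = (lo + hi) // 2
        if compare(first_keys[mid], key) <= 0:
            lo = mid + 1
        else:
            hi = mid
    return lo - 1
```

With this layout a lookup touches one index search and then one block, and comparisons never allocate new key objects, which is the point of the "pointer" format.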

0

1.1K views

0

RELATED TITLES

HBase Goes Realtime Uploaded by Oleksiy Kovyrin



Full description 







Save

Embed

Share

Print

Hadoop Performance

HBase – Coprocessors

HBase schema design case

Hbase

HBase 0.20 Architecture – API
» New Query API
  › Put, Get, Scan, Delete operations
  › Extended support for versioning
  › Drastically reduces API size and complexity
  › An API that more closely mirrors the implementation
» New Result API and optimized Serialization
  › Result is just a wrapper for KeyValue[]
  › User-friendly Trees are built on-demand, client-side
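
The "flat array on the wire, tree built on demand" design can be modeled in a few lines. This is a hypothetical Python model, not the HBase client's Result class: Cell and ToyResult are invented names, and the nested family -> qualifier -> {timestamp: value} view is only built the first time someone asks for it.

```python
from dataclasses import dataclass
from functools import cached_property
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Cell:
    row: bytes
    family: bytes
    qualifier: bytes
    timestamp: int
    value: bytes


Tree = Dict[bytes, Dict[bytes, Dict[int, bytes]]]


class ToyResult:
    def __init__(self, cells: List[Cell]) -> None:
        self.cells = cells      # flat array of cells, kept exactly as received
        self.tree_builds = 0    # how many times the nested view was built

    @cached_property
    def tree(self) -> Tree:
        # Built lazily on first access, then cached on the instance.
        self.tree_builds += 1
        out: Tree = {}
        for c in self.cells:
            out.setdefault(c.family, {}).setdefault(c.qualifier, {})[c.timestamp] = c.value
        return out

    def value(self, family: bytes, qualifier: bytes) -> Optional[bytes]:
        # Newest version (largest timestamp) of one column, or None.
        versions = self.tree.get(family, {}).get(qualifier)
        if not versions:
            return None
        return versions[max(versions)]
```

A caller that only iterates over cells never pays for the tree; a caller that wants convenient lookups pays once, on the client.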

0

1.1K views

0

RELATED TITLES

HBase Goes Realtime Uploaded by Oleksiy Kovyrin



Full description 







Save

Embed

Share

Print

Hadoop Performance

HBase – Coprocessors

HBase schema design case

Hbase

HBase 0.20 Architecture – Algorithms
» New Scanners – KeyValueScanner / KeyValueH…
  › Replace linear sort logic with an encapsulated Heap
  › Abstract the handling of versions, deletes, query parameters
  › Now capable of processing individual rows with millions of columns and versions
» Linear (or worse) to Logarithmic, Logarithmic to Constant
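
The heap-based merge named above can be sketched as follows. It is not HBase's scanner code; it assumes each scanner is any iterable that already yields mutually comparable keys in sorted order. A min-heap holds the current head of every scanner, so each output step costs O(log k) for k scanners instead of a linear pass over all heads. Python's heapq.merge does the same job; the loop is written out to show the heap.

```python
import heapq
from typing import Iterable, Iterator, List

_DONE = object()  # sentinel marking an exhausted scanner


def merge_scanners(scanners: List[Iterable]) -> Iterator:
    """Merge already-sorted scanners into one sorted stream using a min-heap."""
    iters = [iter(s) for s in scanners]
    heap = []
    for idx, it in enumerate(iters):
        first = next(it, _DONE)
        if first is not _DONE:
            # The scanner index breaks ties, so earlier scanners win on equal keys.
            heap.append((first, idx))
    heapq.heapify(heap)
    while heap:
        key, idx = heap[0]
        yield key
        nxt = next(iters[idx], _DONE)
        if nxt is _DONE:
            heapq.heappop(heap)                  # this scanner is exhausted
        else:
            heapq.heapreplace(heap, (nxt, idx))  # pop head and push successor in one step
```

Ordering the scanners so that the freshest source comes first (for example an in-memory store before on-disk files) is one way to let the tie-break prefer the newest data; that ordering is an assumption of this sketch.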

» New Block Cache – Concurrent LRU
  › Backed by ConcurrentHashMap
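
The eviction policy of a memory-aware LRU block cache can be sketched in Python. The cache on the slide is Java code backed by ConcurrentHashMap; this stand-in swaps in an OrderedDict guarded by a single lock, so it illustrates the policy, not lock-free concurrency. Capacity is counted in bytes rather than entries, which is one reading of "memory-awareness" from the architecture slide. ByteBoundedLRU and the block names are hypothetical.

```python
import threading
from collections import OrderedDict
from typing import Optional


class ByteBoundedLRU:
    """Toy block cache: evicts least-recently-used blocks once total bytes exceed capacity."""

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity = capacity_bytes
        self.size = 0                                   # bytes currently cached
        self._blocks: "OrderedDict[str, bytes]" = OrderedDict()
        self._lock = threading.Lock()                   # one lock, unlike ConcurrentHashMap

    def get(self, name: str) -> Optional[bytes]:
        with self._lock:
            block = self._blocks.get(name)
            if block is not None:
                self._blocks.move_to_end(name)          # mark as most recently used
            return block

    def put(self, name: str, block: bytes) -> None:
        with self._lock:
            old = self._blocks.pop(name, None)
            if old is not None:
                self.size -= len(old)
            self._blocks[name] = block
            self.size += len(block)
            # Evict from the least-recently-used end until we fit again.
            # A single block larger than the capacity ends up evicting itself.
            while self.size > self.capacity and self._blocks:
                _, evicted = self._blocks.popitem(last=False)
                self.size -= len(evicted)
```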

0

1.1K views

0

RELATED TITLES

HBase Goes Realtime Uploaded by Oleksiy Kovyrin



Full description 







Save

Embed

Share

Print

Hadoop Performance

HBase – Coprocessors

HBase schema design case

Hbase

HBase 0.20 By The Numbers (Uncached)
» Tall Table: 1 Million Rows with a single Column
  › Sequential insert – 24 seconds (.024 ms/row)
  › Random reads – 1.42 ms/row (average)
  › Full scan – 11 seconds (117 ms/10,000 rows, .011 ms/row)
» Wide Table: 1000 Rows with 20,000 Columns each
  › Sequential insert – 312 seconds (312 ms/row)
  › Random reads – 121 ms/row (average)
  › Full scan – 146 seconds (14.6 seconds/100 rows, 146 ms/row)
» Fat Table: 1000 Rows with 10 Columns, 1MB values…
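
The per-row figures follow directly from the totals: 24 s over 1,000,000 rows is 0.024 ms/row, and 146 s over 1,000 rows is 146 ms/row, or 14.6 s per 100 rows. The short helper below (a hypothetical name, using only the totals reported above) makes that arithmetic checkable. Note that 117 ms per 10,000 rows works out to 0.0117 ms/row, which agrees with the .011 ms/row derived from the 11 s total to within rounding.

```python
from typing import Dict, Tuple


def ms_per_row(total_seconds: float, rows: int) -> float:
    """Average cost per row, in milliseconds."""
    return total_seconds * 1000.0 / rows


# (total seconds, rows) as reported on the slide
REPORTED: Dict[str, Tuple[float, int]] = {
    "tall: sequential insert": (24, 1_000_000),
    "tall: full scan": (11, 1_000_000),
    "wide: sequential insert": (312, 1_000),
    "wide: full scan": (146, 1_000),
}


def per_row_table(reported: Dict[str, Tuple[float, int]]) -> Dict[str, float]:
    """Convert every (total seconds, rows) pair into ms/row."""
    return {name: ms_per_row(secs, rows) for name, (secs, rows) in reported.items()}
```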

0

1.1K views

RELATED TITLES

0

HBase Goes Realtime Uploaded by Oleksiy Kovyrin



Full description 







Save

Embed

Share

Print

Hadoop Performance

HBase – Coprocessors

HBase schema design case

Hbase

HBase 0.20 Performance Conclusion
» We surprised even ourselves
  › Random read times similar to that of an RDBMS
    • 20-100 times faster with far less variability
  › Scan times reduced
    • 30 times faster than previous versions
  › Insert times reduced
    • 2-10 times faster with less than half the memory usage
» We improved our performance by more than an…

Zookeeper Integration

Why?
» Takes 2 minutes to figure out that a RegionServer has died
» Clients have to ask the Master for the -ROOT- address
» Managing shared state in HBase is a zoo ;)
» And...
  › Master is a SPOF!

0

1.1K views

0

RELATED TITLES

HBase Goes Realtime Uploaded by Oleksiy Kovyrin



Full description 







Save

Embed

Share

Print

Hadoop Performance

HBase – Coprocessors

HBase schema design case

Hbase

Zookeeper?
• Project under Hadoop started by Yahoo!
• Centralized service for maintaining configuration information, naming, providing distributed synchronization, and group services.
• Highly available when used on an ensemble of machines, typically 5 or more.
• ZK’s data model is a simple namespace with permanent and ephemeral nodes.
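
The difference between permanent and ephemeral nodes is about lifetime: an ephemeral node belongs to the client session that created it and disappears when that session ends, while a permanent node stays until it is deleted explicitly. The toy below models only that rule. It is not the ZooKeeper API; ToyNamespace, its methods and the example paths are hypothetical, parent paths are not checked, and watches are left out.

```python
from typing import Dict, Optional, Tuple


class ToyNamespace:
    """In-memory stand-in for a ZooKeeper-style namespace (not the ZooKeeper API)."""

    def __init__(self) -> None:
        # path -> (data, owning session); the owner is None for permanent nodes
        self.nodes: Dict[str, Tuple[bytes, Optional[str]]] = {}

    def create(self, path: str, data: bytes = b"", ephemeral: bool = False,
               session: Optional[str] = None) -> None:
        if path in self.nodes:
            raise KeyError(f"{path} already exists")
        if ephemeral and session is None:
            raise ValueError("an ephemeral node needs an owning session")
        self.nodes[path] = (data, session if ephemeral else None)

    def exists(self, path: str) -> bool:
        return path in self.nodes

    def close_session(self, session: str) -> None:
        # Ephemeral nodes owned by the closing session disappear with it;
        # permanent nodes are never touched.
        doomed = [p for p, (_, owner) in self.nodes.items()
                  if owner is not None and owner == session]
        for path in doomed:
            del self.nodes[path]
```

This is why registering each RegionServer as an ephemeral node is attractive: when a server's session ends, its node vanishes without anyone having to clean up after it.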

0

1.1K views

0

RELATED TITLES

HBase Goes Realtime Uploaded by Oleksiy Kovyrin



Full description 







Save

Embed

Share

Print

Hadoop Performance

HBase – Coprocessors

HBase schema design case

Hbase

Tough Decisions
» Should we impose the usage of a ZK quorum on every setup?
» How much should we rely on ZK for HA? Are there better alternative solutions for some of our problems?
» Should we have an HBase implementation of ZK?
» Should we package our own version of ZK?

0

1.1K views

0

RELATED TITLES

HBase Goes Realtime Uploaded by Oleksiy Kovyrin



Full description 







Save

Embed

Share

Print

Hadoop Performance

HBase – Coprocessors

HBase schema design case

Hbase

Major Integration Points
» Master address is stored in ZK
» Master election is a race for that lock
» -ROOT- address is also stored in ZK
» Region Servers are all registered in ZK
» The RSs watch the Master’s node
» Backup Masters watch both the Master’s node and a “cluster state” node
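
Master election as "a race for that lock" can be sketched with an in-memory stand-in for the two ZooKeeper behaviours it relies on: an atomic create-if-absent of an ephemeral node, and one-shot watches that fire when the node is deleted. Everything here is hypothetical (ToyZK, Master, the node path, and using the master's name as its session id); it is not the ZooKeeper or HBase API, and it omits the RegionServers' watches and the "cluster state" node.

```python
from typing import Callable, Dict, List, Optional, Tuple

MASTER_PATH = "/hbase/master"  # hypothetical node path, for illustration only


class ToyZK:
    """In-memory stand-in for the few ZooKeeper behaviours used here (not the real API)."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Tuple[str, str]] = {}                  # path -> (data, session)
        self.watchers: Dict[str, List[Callable[[str], None]]] = {}

    def create_ephemeral(self, path: str, data: str, session: str) -> bool:
        # Create-if-absent is the "lock": only one caller can win.
        if path in self.nodes:
            return False
        self.nodes[path] = (data, session)
        return True

    def get(self, path: str) -> Optional[str]:
        entry = self.nodes.get(path)
        return None if entry is None else entry[0]

    def watch(self, path: str, callback: Callable[[str], None]) -> None:
        self.watchers.setdefault(path, []).append(callback)

    def expire_session(self, session: str) -> None:
        # Drop the session's ephemeral nodes, then fire each node's watches once.
        for path in [p for p, (_, s) in self.nodes.items() if s == session]:
            del self.nodes[path]
            for callback in self.watchers.pop(path, []):
                callback(path)


class Master:
    def __init__(self, zk: ToyZK, name: str) -> None:
        self.zk, self.name = zk, name
        self.active = False

    def run_for_master(self) -> None:
        # Every candidate races to create the same ephemeral node.
        self.active = self.zk.create_ephemeral(MASTER_PATH, self.name, session=self.name)
        if not self.active:
            # Lost the race: act as a backup and race again when the node goes away.
            self.zk.watch(MASTER_PATH, lambda _path: self.run_for_master())
```

If the active master's session expires, its node disappears, the watches fire, and the first backup to re-create the node becomes the new master; the others go back to watching.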

0

1.1K views

0

RELATED TITLES

HBase Goes Realtime Uploaded by Oleksiy Kovyrin



Full description 







Save

Embed

Share

Print

Hadoop Performance

HBase – Coprocessors

HBase schema design case

Hbase

What it Changes for You
» Standalone and pseudo-distributed setups:
  › A ZK server that listens on localhost is started for you and starts/stops with the rest of the cluster.
» Fully-distributed setup:
  › It is possible to keep the managed ZK server, but it has to point to a non-local IP/hostname.
  › Better: set up a ZK quorum, which gives higher availability and can also be used for other purposes.

0

1.1K views

0

RELATED TITLES

HBase Goes Realtime Uploaded by Oleksiy Kovyrin



Full description 







Save

Embed

Share

Print

Hadoop Performance

HBase – Coprocessors

HBase schema design case

Hbase

Fully-distributed setup
» What you have to do with ZK:
  › hbase-site.xml: set hbase.cluster.distributed to true; also note that hbase.master is deprecated.
  › hbase-env.sh: export HBASE_MANAGES_ZK=false
  › zoo.cfg: server.0=… server.1=… etc. You also have to configure those servers per the ZK documentation.
» You want backup masters?
  › ${HBASE_HOME}/bin/hbase-daemon.sh start mas… It’s also a good idea to set…
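
The three file changes listed above can be generated from a list of ensemble hosts. The helper below is a hypothetical sketch: the function name and host names are made up, and the ports 2888 (peer) and 3888 (leader election) are the ones used in the ZooKeeper documentation's examples, passed as defaults you would adjust. A real zoo.cfg also needs other keys the ZK documentation describes, such as dataDir and clientPort, which are omitted here.

```python
from typing import Dict, List


def render_distributed_config(zk_hosts: List[str], peer_port: int = 2888,
                              election_port: int = 3888) -> Dict[str, str]:
    """Hypothetical helper: returns file name -> text for the settings listed above."""
    hbase_site = (
        "<configuration>\n"
        "  <property>\n"
        "    <name>hbase.cluster.distributed</name>\n"
        "    <value>true</value>\n"
        "  </property>\n"
        "</configuration>\n"
    )
    # Stop HBase from starting and stopping its own ZooKeeper server.
    hbase_env = "export HBASE_MANAGES_ZK=false\n"
    # One server.N line per ensemble member: host, peer port, leader-election port.
    zoo_cfg = "".join(
        f"server.{i}={host}:{peer_port}:{election_port}\n"
        for i, host in enumerate(zk_hosts)
    )
    return {"hbase-site.xml": hbase_site, "hbase-env.sh": hbase_env, "zoo.cfg": zoo_cfg}
```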

0

1.1K views

0

RELATED TITLES

HBase Goes Realtime Uploaded by Oleksiy Kovyrin



Full description 







Save

Embed

Share

Print

Hadoop Performance

HBase – Coprocessors

HBase schema design case

Hbase

New Features from ZK integration in 0.20
» No more SPOF
  › Automatic Master failover
» Rolling upgrades of point releases
» Modify some cluster configuration without full cluster restart

0

1.1K views

0

RELATED TITLES

HBase Goes Realtime Uploaded by Oleksiy Kovyrin



Full description 







Save

Embed

Share

Print

Hadoop Performance

HBase – Coprocessors

HBase schema design case

What’s next?
» More performance and reliability
  › 0.20 was mostly a RegionServer rewrite
  › 0.21 will rewrite Master with better ZK integration
» HBase 0.21 Roadmap
  › Decentralized Master responsibilities + More ZK
    • Further capability to modify configurations at run time
    • State sharing via ZK nodes
    • Ephemeral nodes for region ownership
    • Distributed queue for region assignment
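
A distributed queue on ZooKeeper is usually built with the standard sequential-node recipe: producers create sequential children under one parent, ZooKeeper appends a 10-digit zero-padded counter to each name, and consumers take the child with the lowest number. The toy below models that recipe only; it is not the design HBase 0.21 ended up with, and the class name, root path and "item-" prefix are hypothetical. In real ZooKeeper, two consumers can pick the same child, so a consumer must treat a failed delete as "someone else took it" and retry.

```python
import itertools
from typing import Dict, Optional


class ToySequentialQueue:
    """Toy version of the ZooKeeper queue recipe: sequential children, lowest first."""

    def __init__(self, root: str = "/hbase/unassigned") -> None:  # hypothetical path
        self.root = root
        self._counter = itertools.count()     # plays the role of ZK's sequence counter
        self.children: Dict[str, object] = {}  # child name -> data

    def offer(self, data: object) -> str:
        # Sequential nodes get a 10-digit, zero-padded suffix appended to their name.
        name = f"item-{next(self._counter):010d}"
        self.children[name] = data
        return f"{self.root}/{name}"

    def poll(self) -> Optional[object]:
        if not self.children:
            return None
        # Zero padding makes plain string order match sequence order.
        first = min(self.children)
        return self.children.pop(first)
```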

More Information about HBase
» HBase Website and Wiki
  › http://www.hbase.org
  › http://wiki.apache.org/hadoop/Hbase
» Mailing List
  › http://hadoop.apache.org/hbase/mailing_lists.h
» IRC Channel
  › #hbase on Freenode
  › All committers and core contributors are here
