» Who are we?
» What is HBase?
» HBase 0.20 Primary Goal
» HBase 0.20 Architecture and Specifics
» HBase 0.20 By The Numbers
» Zookeeper Integration
» What’s Next?
Hbase
HBase Goes Realtime Uploaded by Oleksiy Kovyrin
Who are we?
» Jonathan Gray
  › Co-Founder and CTO, Streamy.com
  › Background in CE, databases, distributed systems
  › User of technology as a competitive advantage
  › Contributing to HBase for ~1 year
  › HBase in production for ~9 months
» Jean-Daniel Cryans
  › HBase Committer
What is HBase?
» HBase is a…
  › Sorted,
  › Distributed,
  › Column-Oriented,
  › Multi-Dimensional,
  › Highly-Available,
  › High-Performance,
  › Persisted Storage System
HBase 0.20 Primary Goal
» First ever Performance Release
  1. Random Access Time
  2. Scan Time
  3. Insert Time
» As a random-access store, we are well suited for storing and serving Web applications
  › But high latency and variability (100s of ms to seconds) have reduced the usefulness of HBase an…
› Zero-copy reads
› Block-based storage, reading, and indexing
› Drastically reduce Object instantiation
› Eliminate widespread usage of Trees
› Sorted merges using Heap structures
› Fast and intelligent caching with memory-awareness
Effort Lead By…
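To make the "Sorted merges using Heap structures" item concrete, here is a minimal Python sketch. It is an illustration only, not HBase's code: it assumes every source (say, an in-memory store and a few store files) is already sorted by key, and the function name merge_sorted_sources and the sample data are hypothetical.

```python
import heapq


def merge_sorted_sources(*sources):
    """Yield (key, value) pairs from several key-sorted iterables in global key order.

    Only the current head of each source sits in the heap, so each step costs
    O(log n) in the number of sources instead of a linear scan over all heads.
    """
    heap = []
    for idx, source in enumerate(sources):
        it = iter(source)
        first = next(it, None)
        if first is not None:
            # idx breaks ties between equal keys, so the heap never has to
            # compare values or iterators.
            heap.append((first[0], idx, first[1], it))
    heapq.heapify(heap)

    while heap:
        key, idx, value, it = heapq.heappop(heap)
        yield key, value
        nxt = next(it, None)
        if nxt is not None:
            heapq.heappush(heap, (nxt[0], idx, nxt[1], it))


# Hypothetical sample data: three already-sorted sources.
in_memory = [(b"a", 1), (b"d", 4)]
file_one = [(b"b", 2), (b"e", 5)]
file_two = [(b"c", 3)]
merged = list(merge_sorted_sources(in_memory, file_one, file_two))
```

The heap holds one entry per source, which is why adding more sources grows the per-step cost logarithmically rather than linearly.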
HBase 0.20 Architecture – Storage
» New Key Format – KeyValue
  › Contains only (byte [] buf, int offset, int length)
  › Compact binary format with binary comparators
  › Our “pointer” to keys inside blocks
» New File Format – HFile
  › Originally based on TFile (HADOOP-3315) and BigTable
  › Block based binary format with a block index
  › Contains any number of Meta blocks
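The two ideas on this slide, a key that is only a (buf, offset, length) reference and a file with a block index, can be sketched in a few lines of Python. This is a rough illustration under assumptions, not the HFile or KeyValue implementation: the names KeyValueRef and find_block, the idea of indexing blocks by their first key, and the sample bytes are all hypothetical.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyValueRef:
    """Hypothetical stand-in for a (buf, offset, length) reference into a block."""
    buf: bytes
    offset: int
    length: int

    def view(self) -> memoryview:
        # Slicing a memoryview exposes the bytes without copying them.
        return memoryview(self.buf)[self.offset:self.offset + self.length]


def find_block(first_keys, key):
    """Return the index of the block that could hold `key`.

    `first_keys` is assumed to be the sorted list of each block's first key.
    Returns -1 when `key` sorts before the first block.
    """
    return bisect.bisect_right(first_keys, key) - 1


# Hypothetical data: one buffer holding a row key followed by a column name.
block = b"row1cf:qual"
ref = KeyValueRef(block, 4, 7)
column = bytes(ref.view())  # copy made only here, when the caller asks for bytes

index = [b"apple", b"mango", b"peach"]
block_for_banana = find_block(index, b"banana")
```

Keeping only a reference means many keys can share one block buffer, and the binary search over the index is what lets a reader jump to a single block instead of scanning the file.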
HBase 0.20 Architecture – API
» New Query API
  › Put, Get, Scan, Delete operations
  › Extended support for versioning
  › Drastically reduces API size and complexity
  › An API that more closely mirrors implementation
» New Result API and optimized Serialization
  › Result is just a wrapper for KeyValue[]
  › User-friendly Trees are built on-demand, client-side
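The "wrapper for KeyValue[]" with trees built on demand can be pictured with a small Python sketch. It is not the HBase client API: the classes KeyValue and LazyResult and their methods are hypothetical stand-ins that only show the lazy-construction idea.

```python
from typing import NamedTuple


class KeyValue(NamedTuple):
    """Hypothetical, simplified cell: one value at (row, family, qualifier, timestamp)."""
    row: bytes
    family: bytes
    qualifier: bytes
    timestamp: int
    value: bytes


class LazyResult:
    """Wraps a flat list of cells; the nested map is built only if someone asks for it."""

    def __init__(self, kvs):
        self._kvs = list(kvs)
        self._family_map = None  # not built until first lookup

    def raw(self):
        # Cheapest access path: the flat list exactly as received.
        return self._kvs

    def _build_map(self):
        if self._family_map is None:
            fmap = {}
            for kv in self._kvs:
                versions = fmap.setdefault(kv.family, {}).setdefault(kv.qualifier, {})
                versions[kv.timestamp] = kv.value
            self._family_map = fmap
        return self._family_map

    def get_value(self, family, qualifier):
        """Return the newest value for family:qualifier, or None if absent."""
        versions = self._build_map().get(family, {}).get(qualifier)
        if not versions:
            return None
        return versions[max(versions)]


# Hypothetical sample cells: two versions of the same column.
cells = [
    KeyValue(b"r1", b"cf", b"q", 1, b"old"),
    KeyValue(b"r1", b"cf", b"q", 2, b"new"),
]
result = LazyResult(cells)
```

A caller that only iterates `result.raw()` never pays for the nested map; the cost is incurred on the client, and only on the first `get_value` call.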
HBase 0.20 Architecture – Algorithms
» New Scanners – KeyValueScanner / KeyValueH…
  › Replace linear sort logic with an encapsulated Heap
  › Abstract the handling of versions, deletes, query parameters
  › Now capable of processing individual rows with millions of columns and versions
  › Linear (or worse) to Logarithmic, Logarithmic to Constant
» New Block Cache - Concurrent LRU
  › Backed by ConcurrentHashMap
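For readers unfamiliar with LRU block caches, the following Python sketch shows the eviction policy only. It deliberately swaps in a simpler technique than the slide describes: a single lock around an OrderedDict rather than a ConcurrentHashMap, and the class name SimpleLruCache and the byte budget are assumptions made for illustration.

```python
import threading
from collections import OrderedDict


class SimpleLruCache:
    """Size-bounded cache that evicts the least recently used block first.

    Assumption: a coarse lock stands in for the concurrent map mentioned on
    the slide; this shows the policy, not a high-concurrency design.
    """

    def __init__(self, max_bytes):
        self._max_bytes = max_bytes
        self._size = 0
        self._blocks = OrderedDict()  # oldest entry first
        self._lock = threading.Lock()

    def get(self, name):
        with self._lock:
            block = self._blocks.get(name)
            if block is not None:
                # A hit makes the block the most recently used one.
                self._blocks.move_to_end(name)
            return block

    def put(self, name, block):
        with self._lock:
            old = self._blocks.pop(name, None)
            if old is not None:
                self._size -= len(old)
            self._blocks[name] = block
            self._size += len(block)
            # Evict from the cold end until the cache fits its byte budget.
            while self._size > self._max_bytes and self._blocks:
                _, evicted = self._blocks.popitem(last=False)
                self._size -= len(evicted)


# Hypothetical usage: a 10-byte budget and three 5-byte blocks.
cache = SimpleLruCache(max_bytes=10)
cache.put("a", b"12345")
cache.put("b", b"12345")
cache.get("a")            # touch "a" so "b" becomes the eviction candidate
cache.put("c", b"12345")  # exceeds the budget, so "b" is evicted
```

Bounding the cache by bytes rather than by entry count is what makes it memory-aware: one large block can displace several small ones.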
HBase 0.20 By The Numbers (Uncached)
» Tall Table: 1 Million Rows with a single Column
» Wide Table: 1000 Rows with 20,000 Columns each
  › 20-100 times faster with far less variability
  › 30 times faster than previous versions
  › Insert times reduced
    • 2-10 times faster with less than half the memory usage
» We improved our performance by more than an…
Zookeeper Integration
Why?
» Takes 2 mins to figure out a RegionServer’s death
» Clients have to ask Master for -ROOT- address
» Managing shared state in HBase is a zoo ;)
» And...
» Master is a SPOF!
Zookeeper?
• Project under Hadoop started by Y!
• Centralized service for maintaining configuration information, naming, providing distributed synchronization, and group services.
• Highly available when used on an ensemble of machines, typically 5 or more.
• ZK’s data model is a simple namespace with permanent and ephemeral nodes.
Tough Decisions
» Should we impose the usage of a ZK quorum on every setup?
» How much should we rely on ZK for HA, are there better alternate solutions for some of our problems?
» Should we have an HBase implementation of ZK?
» Should we package our own version of ZK?
Major Integration Points
» Master address is stored in ZK
» Master election is a race for that lock
» -ROOT- address is also stored in ZK
» Region Servers are all registered in ZK
» The RSs watch the Master’s node
» Backup Masters are watching both Master’s node and a “cluster state” node
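The "race for that lock" and the watching backup masters can be simulated without a real ZooKeeper ensemble. The Python sketch below is an in-memory toy, not a ZooKeeper client: the class FakeZooKeeper, its methods, the node path, and the host names are all hypothetical, and real ZooKeeper watches and sessions behave with more subtlety than shown here.

```python
class FakeZooKeeper:
    """In-memory stand-in for a ZK namespace with ephemeral nodes (hypothetical)."""

    def __init__(self):
        self._nodes = {}     # path -> (data, owning session or None)
        self._watchers = {}  # path -> callbacks fired when the node disappears

    def create(self, path, data, session=None):
        """Create a node; returns False if it already exists (the race is lost)."""
        if path in self._nodes:
            return False
        self._nodes[path] = (data, session)
        return True

    def get(self, path):
        entry = self._nodes.get(path)
        return entry[0] if entry else None

    def watch_delete(self, path, callback):
        self._watchers.setdefault(path, []).append(callback)

    def expire_session(self, session):
        """Simulate a process dying: its ephemeral nodes vanish and watchers fire."""
        dead = [p for p, (_, owner) in self._nodes.items() if owner == session]
        for path in dead:
            del self._nodes[path]
            for callback in self._watchers.pop(path, []):
                callback(path)


MASTER_PATH = "/hbase/master"  # assumed path, for illustration only


def try_become_master(zk, address):
    """Whoever creates the ephemeral master node first is the active master."""
    return zk.create(MASTER_PATH, address, session=address)


zk = FakeZooKeeper()
first_won = try_become_master(zk, "m1:60000")
backup_won = try_become_master(zk, "m2:60000")  # loses the race, stays a backup

# The backup watches the master's node and retries when it disappears.
zk.watch_delete(MASTER_PATH, lambda path: try_become_master(zk, "m2:60000"))
zk.expire_session("m1:60000")  # the active master dies
```

Because the master's node is ephemeral, no one has to declare the master dead: the node's disappearance is the signal, and clients reading the node always see the current winner's address.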
What it Changes for You
» Standalone and pseudo-distributed setups:
  › a ZK server that listens on localhost is started for you; it starts/stops with the rest of the cluster.
» Fully-distributed setup:
  › possible to keep the managed ZK server, but you have to make it point to a non-local IP/hostname.
  › better is to get a quorum for higher availability; it can also be used for other purposes.
Fully-distributed setup
» What you have to do with ZK:
  › hbase-site.xml: set hbase.cluster.distributed to true; also notice that hbase.master is deprecated.
  › hbase-env.sh: export HBASE_MANAGES_ZK=false
  › zoo.cfg: server.0=… server.1=… etc. You also have to configure those servers per ZK doc.
» You want backup masters?
  › ${HBASE_HOME}/bin/hbase-daemon.sh start mas…
  › It’s also a good idea to set…
New Features from ZK integration in 0.20
» No more SPOF
  › Automatic Master failover
» Rolling upgrades of point releases
» Modify some cluster configuration without full cluster restart
What’s next?
» More performance and reliability
  › 0.20 was mostly a RegionServer rewrite
  › 0.21 will rewrite Master with better ZK integration
» HBase 0.21 Roadmap
  › Decentralized Master responsibilities + More ZK
    • Further capability to modify configurations at run time
    • State sharing via ZK nodes
    • Ephemeral nodes for region ownership
    • Distributed queue for region assignment