Big Data Tools

Published on June 2016 | Categories: Types, Presentations
Big Data tools
• HBase
• Hive
• ZooKeeper
• Pig

Hadoop Random-Access Databases
• Applications such as HBase, Cassandra, CouchDB, Dynamo, and MongoDB are databases that store huge amounts of data and access it in a random manner.

HBase
• HBase is a distributed, column-oriented database built on top of the Hadoop file system.
• It is an open-source project and is horizontally scalable.
• It is part of the Hadoop ecosystem and provides random, real-time read/write access to data in HDFS.

HBase Architecture
• One can store data in HDFS either directly or through HBase.

• HBase has three major components:
– Client library
– Master server
– Region servers
• Region servers can be added or removed as per requirement.

Components
• Master server
– Assigns regions to the region servers, with the help of Apache ZooKeeper.
– Handles load balancing of the regions across region servers: it unloads busy servers and shifts regions to less occupied ones.
– Is responsible for schema changes and other metadata operations, such as creation of tables and column families.

• Regions
– Regions are tables that have been split up and spread across the region servers.

• Region servers
– Communicate with clients and handle data-related operations.
– Handle read and write requests for all the regions under them.

• ZooKeeper
– ZooKeeper is an open-source project that provides services such as maintaining configuration information, naming, and distributed synchronization.
– Clients locate region servers through ZooKeeper, then communicate with them directly.
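To make these components concrete, a short HBase shell session might look like the following sketch (the `users` table, `info` column family, and row key are invented for illustration):

```
create 'users', 'info'                      # table with one column family
put 'users', 'row1', 'info:name', 'Alice'   # random write to a single cell
get 'users', 'row1'                         # random read of a single row
scan 'users'                                # scan across the table
```

Behind the scenes, the client library consults ZooKeeper to find the region server holding the row, then reads and writes against that server directly.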

HIVE
• Hive is a data warehouse infrastructure tool for processing structured data in Hadoop. It resides on top of Hadoop to summarize Big Data and makes querying and analysis easy.

• Features of Hive
– It stores schema in a database and processed data in HDFS.
– It is designed for OLAP.
– It provides an SQL-like language for querying, called HiveQL or HQL.
– It is familiar, fast, scalable, and extensible.
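To give a flavor of HiveQL's SQL-like syntax, here is a hedged sketch; the `page_views` table, its columns, and the file format are invented for illustration:

```sql
-- hypothetical table of page views, stored as tab-separated text
CREATE TABLE page_views (user_id STRING, url STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

-- SQL-like query: the ten most-viewed URLs
SELECT url, COUNT(*) AS views
FROM page_views
GROUP BY url
ORDER BY views DESC
LIMIT 10;
```

Hive translates such a query into jobs that run on Hadoop, which is what makes querying easy relative to hand-written MapReduce code.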

PIG
• Pig was initially developed at Yahoo! to let people using Hadoop focus more on analyzing large data sets and spend less time writing mapper and reducer programs.

Pig components
• Language
– Called Pig Latin.

• Runtime environment
– Where Pig Latin programs are executed.
• Think of the relationship between a Java Virtual Machine (JVM) and a Java application.

• The Programming Language
1. The first step in a Pig program is to LOAD the data you want to manipulate from HDFS.
2. Then you run the data through a set of transformations.
3. Finally, you DUMP the data to the screen or you STORE the results in a file somewhere.
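The three steps above can be sketched in Pig Latin; the input path and field names below are invented for illustration:

```
-- 1. LOAD the data from HDFS
views  = LOAD '/data/page_views.tsv' USING PigStorage('\t')
         AS (user_id:chararray, url:chararray);

-- 2. Run it through a set of transformations
by_url = GROUP views BY url;
counts = FOREACH by_url GENERATE group AS url, COUNT(views) AS n;

-- 3. DUMP to the screen, or STORE the results in a file
DUMP counts;
-- STORE counts INTO '/data/view_counts';
```

Each statement defines a relation; Pig compiles the whole data flow into MapReduce jobs, so the analyst never writes mappers or reducers by hand.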
