Hadoop Online Training
Best Hadoop Online Training by Real-Time Experts:
Hadoop is an open-source framework for storing and processing large-scale data sets on clusters of commodity hardware. At its core is HDFS, which stores data across large numbers of commodity machines and provides very high aggregate bandwidth to the cluster. For processing, Hadoop uses the MapReduce programming model. Lead Online Training provides online Hadoop training delivered by technical experts in the subject who will make you proficient in the technology, and we are always available to support you.
Hadoop Online Training Course Overview:
Basics of Hadoop:
- Motivation for Hadoop
- Large scale system training
- Survey of data storage literature
- Literature survey of data processing
- Networking constraints
- New approach requirements

Basic concepts of Hadoop:


- What is Hadoop?
- The Hadoop Distributed File System
- How Hadoop MapReduce works
- Anatomy of a Hadoop cluster
- Hadoop daemons
  - Master daemons
    - NameNode
    - JobTracker
    - Secondary NameNode
  - Slave daemons
    - TaskTracker

HDFS (Hadoop Distributed File System):
- Splits and blocks
- Input splits
- HDFS splits
- Replication of data (see the Java sketch after this list)
- Hadoop rack awareness
- High availability of data
- Block placement and cluster architecture
- Case studies
- Performance practices & tuning
- Developing MapReduce programs
- Local mode
  - Running without HDFS
- Pseudo-distributed mode
  - All daemons running on a single node
- Fully distributed mode
  - Daemons running on dedicated nodes
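The HDFS topics above (splits, blocks, replication) can be made concrete with a short client program. The sketch below is illustrative only and not part of the course material: the NameNode address and the file path are placeholders, and it simply uses the standard Hadoop FileSystem API to write a small file, read it back, and print the replication factor HDFS recorded for it.

    // Minimal HDFS round trip through the Java FileSystem API.
    // The fs.defaultFS address and the /user/train path are placeholders.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsRoundTrip {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://namenode:8020"); // placeholder NameNode address

            FileSystem fs = FileSystem.get(conf);
            Path file = new Path("/user/train/sample.txt");

            // The client streams the data; HDFS splits it into blocks and
            // replicates each block (3 copies by default, per dfs.replication).
            try (FSDataOutputStream out = fs.create(file, true)) {
                out.writeUTF("hello hdfs");
            }

            // Read the file back.
            try (FSDataInputStream in = fs.open(file)) {
                System.out.println(in.readUTF());
            }

            // Show the replication factor recorded for the file.
            System.out.println("replication = " + fs.getFileStatus(file).getReplication());
            fs.close();
        }
    }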
Hadoop administration:
- Setting up Hadoop clusters: Apache, Cloudera, Greenplum and Hortonworks
- Building a full Hadoop cluster setup on a single desktop
- Installing and configuring Apache Hadoop on a multi-node cluster
- Installing and configuring the Cloudera distribution in distributed mode
- Installing and configuring the Hortonworks distribution in fully distributed mode
- Configuring the Greenplum distribution in fully distributed mode
- Monitoring the cluster
- Getting familiar with the Hortonworks and Cloudera management consoles
- The NameNode in safe mode
- Data backup
- Case studies
- Cluster monitoring
Hadoop Development:
- Writing a MapReduce program
- Sample MapReduce program (a complete word-count sketch appears at the end of this section)
- Basic API concepts
- Driver code
- Mapper
- Reducer
- Hadoop Streaming
- Running several Hadoop jobs
- The configure and close methods
- Sequence files
- Record reader
- Record writer
- Role of the Reporter
- Counters
- Output collector
- Accessing HDFS
- ToolRunner
- Using the DistributedCache
- Several MapReduce jobs in detail:
  1. Most effective search using MapReduce
  2. Generating recommendations using MapReduce
  3. Processing log files using MapReduce
- Identifying the mapper
- Identifying the reducer
- Exploring the problems in these applications
- Debugging MapReduce programs
- MRUnit testing
- Logging
- Debugging strategies
- Advanced MapReduce programming
- Secondary sort
- Customizing input and output formats
- MapReduce joins
- Monitoring & debugging on a production cluster
- Counters
- Skipping bad records
- Running in local mode
- MapReduce performance tuning
- Reducing network traffic with a combiner (see the word-count sketch after this section)
- Partitioners
- Reducing the amount of input data
- Using compression
- Reusing the JVM
- Speculative execution
- Performance aspects
- Case studies
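To tie the development topics above together (driver code, mapper, reducer, combiners), here is a minimal word-count sketch written against the standard org.apache.hadoop.mapreduce API. It is an illustrative example rather than course material: the class names are arbitrary, the input and output paths come from the command line, and the reducer is reused as a combiner to pre-aggregate map output and cut network traffic.

    // Word count: driver, Mapper and Reducer in one file.
    // The Reducer doubles as a combiner to reduce map-side network traffic.
    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

        // Mapper: emits (word, 1) for every whitespace-separated token.
        public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                for (String token : value.toString().split("\\s+")) {
                    if (!token.isEmpty()) {
                        word.set(token);
                        context.write(word, ONE);
                    }
                }
            }
        }

        // Reducer: sums the counts for each word.
        public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) {
                    sum += v.get();
                }
                context.write(key, new IntWritable(sum));
            }
        }

        // Driver: wires the job together and submits it.
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenMapper.class);
            job.setCombinerClass(SumReducer.class); // combiner pre-aggregates map output
            job.setReducerClass(SumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

Packaged into a jar, the job would typically be launched with the hadoop jar command, passing the input and output directories as the two arguments.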
CDH4 Enhancements:
1. NameNode high availability
2. NameNode federation
3. Fencing
4. MapReduce v2
HADOOP ANALYST
1. Hive concepts
2. Hive architecture
3. Install and configure Hive on a cluster
4. Types of tables in Hive
5. Hive library functions
6. Buckets
7. Partitions
8. Joins
   1. Inner joins
   2. Outer joins
9. Hive UDFs (see the sketch below)
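As an illustration of topic 9, a Hive UDF can be written in Java by extending the classic org.apache.hadoop.hive.ql.exec.UDF base class. The sketch below is only an example; the class name and the behaviour (upper-casing a string column) are chosen for illustration.

    // Minimal Hive UDF: upper-cases a string column.
    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    public class UpperUDF extends UDF {
        // Hive resolves the evaluate() method by its argument types.
        public Text evaluate(Text input) {
            if (input == null) {
                return null;
            }
            return new Text(input.toString().toUpperCase());
        }
    }

Once packaged into a jar, the function would normally be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being used in a query.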
PIG
1. Pig basics
2. Install and configure Pig
3. Pig library functions
4. Pig vs. Hive
5. Writing sample Pig Latin scripts
6. Modes of running
   1. Grunt shell
   2. Java program
7. Pig UDFs (see the sketch below)
8. Pig macros
9. Debugging Pig
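For topic 7, a Pig UDF is typically written in Java by extending EvalFunc. The sketch below is illustrative only; the class name and the behaviour (returning the length of a chararray field) are assumptions made for the example.

    // Minimal Pig EvalFunc: returns the length of a chararray field.
    import java.io.IOException;
    import org.apache.pig.EvalFunc;
    import org.apache.pig.data.Tuple;

    public class StringLength extends EvalFunc<Integer> {
        @Override
        public Integer exec(Tuple input) throws IOException {
            if (input == null || input.size() == 0 || input.get(0) == null) {
                return null;
            }
            return ((String) input.get(0)).length();
        }
    }

In a Pig Latin script the jar would be loaded with REGISTER and the function then invoked like any built-in function.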
IMPALA
1. Differences between Pig, Hive and Impala
2. Does Impala give good performance?
3. Exclusive features
4. Impala and its challenges
5. Use cases
NOSQL
1. HBase
2. HBase concepts
3. HBase architecture
4. HBase basics
5. Server architecture
6. File storage architecture
7. Column access
8. Scans
9. HBase use cases
10. Installing and configuring HBase on a multi-node cluster
11. Creating a database, developing and running sample applications
12. Accessing data stored in HBase using clients such as Python, Java and Perl (a Java sketch follows this list)
13. MapReduce client
14. HBase and Hive integration
15. HBase administration tasks
16. Defining a schema and its basic operations
17. Cassandra basics
18. MongoDB basics
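Topic 12 (accessing data stored in HBase from client programs) can be illustrated with a short Java client. The sketch below is an example only: the table name 'training', the column family 'info' and the qualifier 'course' are placeholders, and the connection settings are assumed to come from an hbase-site.xml on the classpath.

    // Minimal HBase Java client: one Put and one Get against a 'training' table.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseClientDemo {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create(); // reads hbase-site.xml from the classpath

            try (Connection connection = ConnectionFactory.createConnection(conf);
                 Table table = connection.getTable(TableName.valueOf("training"))) {

                // Write one cell: row "student1", column info:course.
                Put put = new Put(Bytes.toBytes("student1"));
                put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("course"), Bytes.toBytes("Hadoop"));
                table.put(put);

                // Read the same cell back.
                Result result = table.get(new Get(Bytes.toBytes("student1")));
                byte[] value = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("course"));
                System.out.println("course = " + Bytes.toString(value));
            }
        }
    }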
Ecosystem Components
1. Sqoop
2. Install and configure Sqoop
3. Connecting to an RDBMS
4. Installing MySQL
5. Importing data from Oracle/MySQL into Hive
6. Exporting data to Oracle/MySQL
7. Internal mechanism
Oozie
1. Oozie and its architecture
2. The workflow XML file
3. Installing and configuring Apache Oozie
4. Specifying the workflow
5. Action nodes
6. Control nodes
7. Job coordinator
Avro, Scribe, Flume, Chukwa, Thrift
1. Concepts of Flume and Chukwa
2. Use cases of Scribe, Thrift and Avro
3. Installing and configuring Flume
4. Creating a sample application

