• What is Cloud Computing
• What is Grid Computing
• What is Virtualization
• How the above three are related to each other
• What is Big Data
• Introduction to Analytics and the need for big data analytics
• Hadoop Solutions - Big Picture
• Hadoop distributions
• Comparing Hadoop vs. Traditional Systems
• Volunteer Computing
• Data Retrieval - Random Access vs. Sequential Access
• NoSQL Databases
The Motivation for Hadoop
• Problems with traditional large-scale systems
• Requirements for a new approach
Hadoop: Basic Concepts
• What is Hadoop?
• The Hadoop Distributed File System
• How MapReduce Works
• Anatomy of a Hadoop Cluster
Hadoop Daemons
• NameNode
• DataNode
• Secondary NameNode
• JobTracker
• TaskTracker
HDFS in Detail
• Blocks and Splits
• Replication
• Data High Availability
• Data Integrity
• Cluster Architecture and Block Placement (see the FileSystem API sketch after this list)
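
A minimal sketch of how blocks, replication, and block placement look from Hadoop's Java FileSystem API; the file path, the replication factor of 3, and the 128 MB block size are illustrative choices, not cluster defaults.

```java
import java.util.Arrays;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsBlocksDemo {
    public static void main(String[] args) throws Exception {
        // Picks up fs.default.name / fs.defaultFS from the configuration on the classpath.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Write a small file with an explicit replication factor (3) and block size (128 MB).
        Path path = new Path("/user/demo/sample.txt");          // illustrative path
        FSDataOutputStream out = fs.create(path, true, 4096, (short) 3, 128L * 1024 * 1024);
        out.writeUTF("hello hdfs");
        out.close();

        // Ask the NameNode where each block of the file was placed.
        FileStatus status = fs.getFileStatus(path);
        for (BlockLocation block : fs.getFileBlockLocations(status, 0, status.getLen())) {
            System.out.println("offset " + block.getOffset() + " -> " + Arrays.toString(block.getHosts()));
        }
    }
}
```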
Programming Practices & Performance Tuning
• Developing MapReduce Programs in:
• Local Mode
• Pseudo-Distributed Mode
• Fully Distributed Mode (see the configuration sketch after this list)
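
A minimal sketch of how the same MapReduce code can be pointed at each of the three modes purely through configuration, assuming a Hadoop 1.x (JobTracker) setup; the host names and ports are placeholders, and YARN-based clusters use fs.defaultFS and mapreduce.framework.name instead.

```java
import org.apache.hadoop.conf.Configuration;

public class ModeConfig {

    // Returns a Configuration for the requested run mode. The property names are the
    // Hadoop 1.x / MR1 ones; host names and ports below are placeholders.
    public static Configuration forMode(String mode) {
        Configuration conf = new Configuration();
        if ("local".equals(mode)) {
            conf.set("fs.default.name", "file:///");   // local filesystem
            conf.set("mapred.job.tracker", "local");   // single-JVM LocalJobRunner
        } else if ("pseudo".equals(mode)) {
            conf.set("fs.default.name", "hdfs://localhost:8020");
            conf.set("mapred.job.tracker", "localhost:8021");
        } else {                                       // fully distributed
            conf.set("fs.default.name", "hdfs://namenode.example.com:8020");
            conf.set("mapred.job.tracker", "jobtracker.example.com:8021");
        }
        return conf;
    }

    public static void main(String[] args) {
        Configuration conf = forMode(args.length > 0 ? args[0] : "local");
        System.out.println("JobTracker: " + conf.get("mapred.job.tracker"));
    }
}
```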
Writing a MapReduce Program
• Examining a Sample MapReduce Program (see the WordCount sketch after this list)
• Basic API Concepts
• The Driver Code
• The Mapper
• The Reducer
• Hadoop's Streaming API
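
The classic WordCount program illustrates the three pieces named above: the driver, the mapper, and the reducer. The sketch below uses the newer org.apache.hadoop.mapreduce API and takes its input and output directories from the command line.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: emits (word, 1) for every token in the input line.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reducer: sums the counts emitted for each word.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable value : values) {
                sum += value.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    // Driver: wires the job together and submits it.
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count"); // on Hadoop 1.x: new Job(conf, "word count")
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Packaged into a jar, it can be launched with something like hadoop jar wordcount.jar WordCount <input dir> <output dir>.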
Setting Up a Hadoop Cluster
• Install and configure Apache Hadoop
• Set up a fully distributed Hadoop cluster on a single laptop/desktop
• Install and configure the Cloudera Hadoop distribution in fully distributed mode
• Install and configure the Hortonworks Hadoop distribution in fully distributed mode
• Monitoring the cluster
• Getting used to the management consoles of Cloudera and Hortonworks
Hadoop Security
• Why Hadoop Security Is Important
• Hadoop's Security System Concepts
• What Kerberos Is and How It Works
• Configuring Kerberos Security
• Integrating a Secure Cluster with Other Systems (see the keytab login sketch after this list)
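
As a small taste of what a secured client looks like in code, the sketch below logs in from a keytab before touching HDFS; the principal and keytab path are placeholders for whatever the cluster's KDC actually issues.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class SecureHdfsClient {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Tell the Hadoop client libraries that the cluster expects Kerberos.
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);

        // Principal and keytab path are placeholders.
        UserGroupInformation.loginUserFromKeytab(
                "etl@EXAMPLE.COM", "/etc/security/keytabs/etl.keytab");

        // Subsequent Hadoop RPCs from this process are authenticated with the Kerberos ticket.
        FileSystem fs = FileSystem.get(conf);
        System.out.println("HDFS root exists: " + fs.exists(new Path("/")));
    }
}
```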
Managing and Scheduling Jobs
• Managing Running Jobs
• Hands-On Exercise
• The FIFO Scheduler
• The FairScheduler
• Configuring the FairScheduler (see the job-submission sketch after this list)
• Hands-On Exercise
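
Which pool a job lands in depends on how the Fair Scheduler is configured: it reads the pool name from the jobconf property named by mapred.fairscheduler.poolnameproperty (user.name by default). The sketch below assumes an administrator has pointed that at a property called pool.name; both that choice and the pool name "research" are illustrative.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class PoolSubmission {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumes the JobTracker runs the Fair Scheduler with
        // mapred.fairscheduler.poolnameproperty set to "pool.name"; otherwise the
        // job simply lands in the pool named after the submitting user.
        conf.set("pool.name", "research");            // "research" is an illustrative pool

        Job job = Job.getInstance(conf, "fair-scheduler demo");
        job.setJarByClass(PoolSubmission.class);
        // ... configure mapper, reducer, input and output paths as in any other job ...
        // job.waitForCompletion(true);
    }
}
```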
Cluster Maintenance
• Checking HDFS Status
• Hands-On Exercise
• Copying Data Between Clusters
• Adding and Removing Cluster Nodes
• Rebalancing the Cluster
• Hands-On Exercise
• NameNode Metadata Backup
Cluster Monitoring and Troubleshooting
• General System Monitoring
• Managing Hadoop's Log Files
• Using the NameNode and JobTracker Web UIs
• Hands-On Exercise
• Cluster Monitoring with Ganglia
• Common Troubleshooting Issues
• Benchmarking Your Cluster
• Hadoop ecosystem components covered as part of the Hadoop Administrator track
Ecosystem component: Ganglia
• Install and configure Ganglia on a cluster
• Configure and use Ganglia
• Use Ganglia to graph cluster metrics
Ecosystem component: Nagios
• Nagios concepts
• Install and configure Nagios on the cluster
• Use Nagios for sample alerts and monitoring
Ecosystem component: Hive
• Hive concepts
• Install and configure Hive on the cluster
• Create a database and access it from the console
• Develop and run sample applications in Java/Python to access Hive (see the JDBC sketch after this list)
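
A minimal Java client for the Java/Python item above might look like the sketch below, which connects through the HiveServer2 JDBC driver; the URL, credentials, and table name are placeholders, and the hive-jdbc jar (with its dependencies) must be on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveJdbcExample {
    public static void main(String[] args) throws Exception {
        // HiveServer2 JDBC URL; host, port, database, and credentials are placeholders.
        String url = "jdbc:hive2://localhost:10000/default";
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        try (Connection con = DriverManager.getConnection(url, "hive", "");
             Statement stmt = con.createStatement()) {
            // Create a throwaway table and list what the metastore knows about.
            stmt.execute("CREATE TABLE IF NOT EXISTS demo (id INT, name STRING)");
            try (ResultSet tables = stmt.executeQuery("SHOW TABLES")) {
                while (tables.next()) {
                    System.out.println(tables.getString(1));
                }
            }
        }
    }
}
```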
Ecosystem component: Sqoop
• Install and configure Sqoop on the cluster
• Import data from Oracle/MySQL into Hive
Overview of other ecosystem components:
• Oozie, Avro, Thrift, REST, Mahout, Cassandra, YARN, MR2, etc.
