Recommended Posts


Real World Hadoop - Automating Hadoop install with Python
MP4 | Video: AVC 1280x720 | Audio: AAC 44KHz 2ch | Duration: 4 Hours | Lec: 19 | 825 MB
Genre: eLearning | Language: English

Deploy a Hadoop cluster (Zookeeper, HDFS, YARN, Spark) with Cloudera Manager's Python API. Hands on.

Deploy a Hadoop cluster (Zookeeper, HDFS, YARN, Spark) with Python! Instruct Cloudera Manager to do the work! Hands on. Here we use Python to instruct an already installed Cloudera Manager to deploy your Hadoop Services.

.The Cloudera Manager API provides configuration and service lifecycle management, service health information and metrics, and allows you to configure Cloudera Manager itself. The API is served on the same host and port as the Cloudera Manager Admin Console, and does not require an extra process or extra configuration. The API supports HTTP Basic Authentication, accepting the same users and credentials as the Cloudera Manager Admin Console.


Here are some of the cool things you can do with Cloudera Manager via the API:

Deploy an entire Hadoop cluster programmatically. Cloudera Manager supports HDFS, MapReduce, YARN, ZooKeeper, HBase, Hive, Oozie, Hue, Flume, Impala, Solr, Sqoop, Spark and Accumulo.
Configure various Hadoop services and get config validation.
Take admin actions on services and roles, such as start, stop, restart, failover, etc. Also available are the more advanced workflows, such as setting up high availability and decommissioning.
Monitor your services and hosts, with intelligent service health checks and metrics.
Monitor user jobs and other cluster activities.
Retrieve timeseries metric data.
Search for events in the Hadoop system.
Administer Cloudera Manager itself.
Download the entire deployment description of your Hadoop cluster in a json file.

Additionally, with the appropriate licenses, the API lets you:

Perform rolling restart and rolling upgrade.
Audit user activities and accesses in Hadoop.
Perform backup and cross data-center replication for HDFS and Hive.
Retrieve per-user HDFS usage report and per-user MapReduce resource usage report.


Here I present a curriculum as to the current state of my Cloudera courses.

My Hadoop courses are based on Vagrant so that you can practice and destroy your virtual environment before applying the installation onto real servers/VMs.


Share this post

Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now