Cloudera Hadoop & Big Data Blog » Blog Archive » Job Scheduling in Hadoop

Topic: Technology 2:56 pm EDT, Mar 24, 2009

Job Scheduling in Hadoop

When Hadoop started out, it was designed mainly for running large batch jobs such as web indexing and log mining. Users submitted jobs to a queue, and the cluster ran them in order. However, as organizations placed more data in their Hadoop clusters and developed more computations they wanted to run, another use case became attractive: sharing a MapReduce cluster between multiple users. The benefits of sharing are tremendous: with all the data in one place, users can run queries that they may never have been able to execute otherwise, and costs go down because system utilization is higher than it would be with a separate Hadoop cluster for each group. However, sharing requires support from the Hadoop job scheduler to provide guaranteed capacity to production jobs and good response time to interactive jobs while allocating resources fairly between users.
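To make the contrast between an in-order queue and fair sharing concrete, here is a toy sketch in Python. It is only an illustration under simplifying assumptions: the function names, the notion of a fixed pool of "slots", and the one-slot-at-a-time round-robin split are all hypothetical and are not taken from Hadoop's scheduler code.

```python
def fifo_allocate(total_slots, demands):
    """Serve jobs strictly in submission order (hypothetical helper).

    Each job gets as many slots as it asks for until the pool runs out.
    """
    allocation = []
    remaining = total_slots
    for demand in demands:
        granted = min(demand, remaining)
        allocation.append(granted)
        remaining -= granted
    return allocation


def fair_allocate(total_slots, demands):
    """Split slots evenly among jobs that still want more (hypothetical helper).

    Slots are handed out one at a time, round-robin, so no job gets a
    second slot before every unsatisfied job has received its first.
    """
    allocation = [0] * len(demands)
    remaining = total_slots
    while remaining > 0:
        # Jobs whose demand is not yet met.
        hungry = [i for i, d in enumerate(demands) if allocation[i] < d]
        if not hungry:
            break
        for i in hungry:
            if remaining == 0:
                break
            allocation[i] += 1
            remaining -= 1
    return allocation


# One large batch job submitted first, then two small interactive jobs.
demands = [100, 5, 5]
print(fifo_allocate(20, demands))  # [20, 0, 0]
print(fair_allocate(20, demands))  # [10, 5, 5]
```

With 20 slots, the in-order queue gives everything to the batch job and the two small jobs wait, while the even split lets both small jobs run in full alongside it. A production scheduler also has to handle guaranteed capacity and changing demand over time, which this sketch leaves out.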

In July 2008, the scheduler in Hadoop became a pluggable component, which opened the door for innovation in this space. The result was two schedulers for multi-user workloads: the Fair Scheduler, developed at Facebook, and the Capacity Scheduler, developed at Yahoo.
