
You can contact us at
sales@californiadigital.com
or call 1-888-546-8948 for more
information.



DQ Software

The ability of Déjà vu to transparently checkpoint, recover, and migrate applications on the fly enables preemptive scheduling for applications that have traditionally been batch queued. Current cluster batch queuing systems from several vendors provide cooperative scheduling algorithms. However, cooperative scheduling has significant drawbacks that reduce scheduling efficiency and limit its capacity for true priority queuing and weighted fair queuing.

To see why, consider the following scenario. A technical computing cluster has 200 processors, all of them in use by currently executing jobs. New jobs are inserted in an execution queue managed by the queuing system. Assume that the first job in the queue requests 64 processors. If a currently executing 32-processor job terminates, the first job in the queue still cannot start, since it requests more processors than are available.

In a cooperative scheduling system, this situation forces a trade-off between scheduling efficiency and process starvation. If the scheduler leaves the 32 free processors idle until the first job fits, efficiency suffers. If it instead schedules jobs from lower down in the queue onto those processors, the result can be process starvation: the top entry in the queue may never get a chance to run. A technique called "backfill" is commonly used to achieve a better trade-off, but it requires users to provide accurate estimates of the computation time their jobs need, and such estimates are usually not available.
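
The trade-off in the scenario above can be sketched in a few lines of Python. This is an illustration only, not how any particular queuing system is implemented; the function names and the jobs "B" and "C" are hypothetical, while the 64-processor request and the 32 free processors come from the scenario.

```python
def strict_fifo(free, queue):
    """Start jobs in queue order; stop at the first job that does not fit."""
    started = []
    for name, procs in queue:
        if procs > free:
            break  # the head of the queue blocks everything behind it
        started.append(name)
        free -= procs
    return started, free


def skip_ahead(free, queue):
    """Start any queued job that fits, even if earlier jobs are still waiting."""
    started = []
    for name, procs in queue:
        if procs <= free:
            started.append(name)
            free -= procs
    return started, free


# Job "A" is the 64-processor job at the head of the queue.
# "B" and "C" are hypothetical smaller jobs queued behind it.
queue = [("A", 64), ("B", 16), ("C", 16)]

fifo_result = strict_fifo(32, queue)  # nothing starts, 32 processors sit idle
skip_result = skip_ahead(32, queue)   # B and C start, A keeps waiting
print(fifo_result)
print(skip_result)
```

Strict ordering protects job "A" but wastes the free processors. Skipping ahead uses them, but if small jobs keep arriving, "A" may wait indefinitely.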

Operating systems on shared memory multiprocessors use time-shared preemptive multitasking to achieve high scheduling efficiency in the above scenario. Clusters, however, have had no equivalent, because they lacked true preemptive multitasking. Here, the ability of Déjà vu to recover from multiple failures offers an interesting capability. Since Déjà vu can recover from "all failures", the case where every node involved in a coupled computation fails, failure recovery can be used for preemption. Intuitively, the "all failures" case is identical to suspending a process and transparently restarting another in its place, which is preemptive time-shared multitasking.
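
As a rough sketch of the idea, preemption can be modeled as checkpointing lower-priority jobs until a higher-priority job fits. Everything here is an assumption made for illustration: the `Job` class, the `preempt_for` function, the job names, and the convention that a larger number means higher priority are all hypothetical and do not describe the actual Déjà vu or DQ interfaces. Setting a job's state to "checkpointed" stands in for a transparent checkpoint that a real system would later restore.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    procs: int
    priority: int  # assumed convention: larger means more important
    state: str = "queued"


def preempt_for(new_job, running, total_procs):
    """Checkpoint lower-priority jobs until new_job fits, then start it.

    Returns the list of checkpointed jobs, or None if room cannot be made
    (in which case nothing is changed).
    """
    free = total_procs - sum(j.procs for j in running)
    victims = []
    # Consider the least important running jobs first.
    for job in sorted(running, key=lambda j: j.priority):
        if free >= new_job.procs or job.priority >= new_job.priority:
            break
        victims.append(job)
        free += job.procs
    if free < new_job.procs:
        return None
    for job in victims:
        job.state = "checkpointed"  # stand-in for a transparent checkpoint
        running.remove(job)
    new_job.state = "running"
    running.append(new_job)
    return victims


# A fully occupied 200-processor cluster, as in the scenario above.
running = [
    Job("low-a", 32, 1, "running"),
    Job("low-b", 32, 1, "running"),
    Job("mid", 136, 2, "running"),
]
urgent = Job("urgent", 64, 3)
victims = preempt_for(urgent, running, total_procs=200)
print([j.name for j in victims], urgent.state)
```

In this model the 64-processor job starts immediately by displacing the two 32-processor jobs, which would resume from their checkpoints once processors become available again.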

DQ is the first preemptive scheduler for distributed memory clusters and comes integrated with systems management software. DQ enables:

  • True Priority Queuing: Jobs may be assigned one of multiple priority levels, including levels that preempt currently running jobs.
  • Weighted Fair Queuing: Multiple "groups" that share a technical computing resource can each be allocated a guaranteed proportional "share" of the system.
  • On-Demand Computing: Priority queuing and preemptive multitasking can be used to achieve guaranteed queuing delays. This enables computational data centers to create priority-based performance guarantees and appropriate pricing strategies.
  • Enhanced Systems Administration Control: Enables fluid control over system resources, including the ability to preemptively control node availability.

These capabilities enable the use of DQ in domains ranging from enterprise technical computing resources to upcoming "on-demand" cycle access data centers.
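
To make the notion of guaranteed "shares" concrete, here is a minimal sketch of a proportional allocation, assuming each group is given a positive integer weight. The function name `fair_shares`, the group names, and the weights are hypothetical, and the rounding rule is one simple choice among many; it is not presented as the allocation method DQ uses.

```python
def fair_shares(total_procs, weights):
    """Split total_procs among groups in proportion to their weights."""
    total_weight = sum(weights.values())
    # Round each proportional share down to a whole number of processors.
    shares = {g: total_procs * w // total_weight for g, w in weights.items()}
    # Hand out the processors lost to rounding, heaviest groups first.
    leftover = total_procs - sum(shares.values())
    for group in sorted(weights, key=weights.get, reverse=True)[:leftover]:
        shares[group] += 1
    return shares


# Hypothetical groups sharing the 200-processor cluster.
weighted = fair_shares(200, {"physics": 2, "chemistry": 1, "biology": 1})
equal = fair_shares(200, {"a": 1, "b": 1, "c": 1})
print(weighted)
print(equal)
```

A preemptive scheduler could enforce such shares by preempting jobs from a group that exceeds its share when another group with unused entitlement submits work.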


 


Call us at 1-888-546-8948 or (510) 651-8811; Copyright © 2005 California Digital