6 Related Work

Variants of leases are widely used when a client holds a resource on a server. The common purpose of a lease abstraction is to specify a mutually agreed time at which the client's right to hold the resource expires. If the client fails or disconnects, the server can reclaim the resource when the lease expires. The client renews the lease periodically to retain its hold on the resource.
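
To make the mechanics concrete, the following Java sketch models the bare abstraction; the Lease class and its methods are illustrative names of our own, not taken from any of the systems discussed below.

    import java.time.Duration;
    import java.time.Instant;

    // Illustrative lease record: the server grants a resource until a
    // mutually agreed expiration time; the client must renew before then.
    class Lease {
        final String resourceId;
        private Instant expiration;

        Lease(String resourceId, Duration term) {
            this.resourceId = resourceId;
            this.expiration = Instant.now().plus(term);
        }

        // The client calls renew() periodically to retain its hold.
        synchronized void renew(Duration term) {
            if (isExpired()) throw new IllegalStateException("lease expired");
            expiration = Instant.now().plus(term);
        }

        // The server checks this before reclaiming the resource; a failed
        // or disconnected client stops renewing, and the check turns true.
        synchronized boolean isExpired() {
            return Instant.now().isAfter(expiration);
        }
    }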

Lifetime management. Leases are useful for distributed garbage collection. The technique of robust distributed reference counting with expiration times appeared in Network Objects [5]; subsequent systems, including Java RMI [29], Jini [27], and Microsoft .NET, have adopted it under the "lease" vocabulary. More recently, the Web Services Resource Framework (WSRF) [10] defines a lease protocol as a basis for lifetime management of hosted services.
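
A minimal sketch of leased reference counting in this spirit (the LeasedObject class and its methods are hypothetical, not the API of any system cited above): each remote reference is backed by a lease, and an expired lease is treated as a dropped reference.

    import java.time.Duration;
    import java.time.Instant;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Each remote holder of the object is tracked with a lease expiration.
    class LeasedObject {
        private final Map<String, Instant> holders = new ConcurrentHashMap<>();

        void addReference(String clientId, Duration term) {
            holders.put(clientId, Instant.now().plus(term));
        }

        void renew(String clientId, Duration term) {
            holders.computeIfPresent(clientId, (id, exp) -> Instant.now().plus(term));
        }

        // Run periodically by the server's collector: drop expired holders
        // and report whether the object is now remotely unreachable.
        boolean sweepAndCheckCollectable() {
            holders.values().removeIf(exp -> Instant.now().isAfter(exp));
            return holders.isEmpty();
        }
    }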

Mutual exclusion. Leases are also useful as a basis for distributed mutual exclusion, most notably in cache consistency protocols [14,21]. To modify a block or file, a client first obtains a lease for it in an exclusive mode. The lease confers the right to access the data without risk of a conflict with another client as long as the lease is valid. The key benefit of the lease mechanism itself is availability: the server can reclaim the resource from a failed or disconnected client after the lease expires. If the server fails, it can avoid issuing conflicting leases by waiting for one lease interval before granting new leases after recovery.
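
Both properties can be sketched as follows; the names (CacheLeaseServer, acquireWriteLease) are ours, and the sketch models only the mechanism described in [14], not its full protocol.

    import java.time.Duration;
    import java.time.Instant;
    import java.util.HashMap;
    import java.util.Map;

    // A write lease confers exclusive access to a block until it expires.
    // A recovering server waits one full lease term before granting new
    // leases, so any lease issued before the failure must have expired.
    class CacheLeaseServer {
        static final Duration TERM = Duration.ofSeconds(30);
        private final Instant recoveredAt = Instant.now();
        private final Map<String, Instant> writeLeases = new HashMap<>();

        synchronized boolean acquireWriteLease(String blockId) {
            if (Instant.now().isBefore(recoveredAt.plus(TERM))) {
                return false; // post-recovery quiet period
            }
            Instant expiry = writeLeases.get(blockId);
            if (expiry != null && Instant.now().isBefore(expiry)) {
                return false; // another client holds a valid lease
            }
            writeLeases.put(blockId, Instant.now().plus(TERM));
            return true;
        }
    }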

Resource management. As in SHARP [13], the use of leases in Shirako combines elements of both lifetime management and mutual exclusion. While providers may choose to overbook their physical resources locally, each offered logical resource unit is held by at most one lease at any given time. If the lease holder fails or disconnects, the resource can be allocated to another guest. This use of leases differs in several respects from the classical lifetime-management and mutual-exclusion uses above.
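
A minimal sketch of the at-most-one-lease invariant under assumed names (UnitAllocator, Grant); it is not Shirako's actual allocation code.

    import java.time.Duration;
    import java.time.Instant;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.Optional;

    // Each logical resource unit maps to at most one live lease; an
    // expired lease frees the unit for reassignment to another guest.
    class UnitAllocator {
        private record Grant(String guest, Instant expires) {}
        private final Map<String, Grant> grants = new HashMap<>();

        synchronized Optional<Instant> lease(String unit, String guest, Duration term) {
            Grant g = grants.get(unit);
            if (g != null && Instant.now().isBefore(g.expires())) {
                return Optional.empty(); // unit is already under a valid lease
            }
            Instant expires = Instant.now().plus(term); // reclaim and reassign
            grants.put(unit, new Grant(guest, expires));
            return Optional.of(expires);
        }
    }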

Leases in Shirako also resemble soft-state advance reservations [8,30], which have long been a topic of study for real-time network applications. A similar model is proposed for distributed storage in the L-Bone [3]. Several efforts propose resource reservations with bounded duration to control service quality in a grid; GARA, for example, supports advance reservations, brokered co-reservations, and adaptation [11,12].

Virtual execution environments. New virtual machine technology expands the opportunities for flexible, reliable, and secure resource sharing. Several projects have explored how to link virtual machines in virtual networks [9] and/or use networked virtual machines to host network applications, including SoftUDC [18], In Vigo [20], Collective [25], SODA [17], and Virtual Playgrounds [19]. Shared network testbeds (e.g., Emulab/Netbed [28] and PlanetLab [4]) are another application of dynamic sharing of networked resources. Many of these systems can benefit from foundation services for distributed lease management.

PlanetLab was the first system to demonstrate dynamic instantiation of virtual machines in a wide-area testbed deployment with a sizable user base. PlanetLab's current implementation and Shirako differ in their architectural choices. PlanetLab consolidates control in one central authority (PlanetLab Central or PLC), which is trusted by all sites. Contributing sites are expected to relinquish permanent control over their resources to the PLC. PlanetLab emphasizes best-effort open access over admission control; there is no basis to negotiate resources for predictable service quality or isolation. PlanetLab uses leases to manage the lifetime of its guests, rather than for resource control or adaptation.

The PlanetLab architecture permits third-party brokerage services with the endorsement of PLC. PlanetLab brokers manage resources at the granularity of individual nodes; currently, the PlanetLab Node Manager cannot control resources across a site or cluster. PLC may delegate control over a limited share of each node's resources to a local broker server running on the node. PLC controls the instantiation of guest virtual machines, but each local broker is empowered to invoke the local Node Manager interface to bind its resources to guests instantiated on its node. In principle, PLC could delegate sufficient resources to brokers to permit them to support resource control and dynamic adaptation coordinated by a central broker server, as described in this paper.
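
The delegation pattern can be modeled abstractly as below; all names are hypothetical and do not correspond to PlanetLab's actual interfaces.

    import java.util.HashMap;
    import java.util.Map;

    // The central authority delegates a bounded share of a node's
    // resources to a broker; the broker binds portions of that share
    // to guests instantiated on the node, never exceeding the share.
    class DelegatedShare {
        private int remainingUnits;
        private final Map<String, Integer> bindings = new HashMap<>();

        DelegatedShare(int units) { this.remainingUnits = units; }

        synchronized boolean bindToGuest(String guest, int units) {
            if (units > remainingUnits) return false; // exceeds delegation
            remainingUnits -= units;
            bindings.merge(guest, units, Integer::sum);
            return true;
        }
    }

    class CentralAuthority {
        private final Map<String, DelegatedShare> delegations = new HashMap<>();

        // Delegate a limited share of one node's resources to a broker,
        // which may then allocate within that share autonomously.
        DelegatedShare delegate(String nodeId, String brokerId, int units) {
            DelegatedShare share = new DelegatedShare(units);
            delegations.put(nodeId + "/" + brokerId, share);
            return share;
        }
    }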

One goal of our work is to advance the foundations for networked resource sharing systems that can grow and evolve to support a range of resources, management policies, service models, and relationships among resource providers and consumers. Shirako defines one model for how the PlanetLab experience can extend to a wider range of resource types, federated resource providers, clusters, and more powerful approaches to resource virtualization and isolation.


7 Conclusion

This paper focuses on the design and implementation of general, extensible abstractions for brokered leasing as a basis for a federated, networked utility. The combination of Shirako leasing services and the Cluster-on-Demand cluster manager enables dynamic, programmatic, reconfigurable leasing of cluster resources for distributed applications and services. Shirako decouples dependencies on resources, applications, and resource management policies from the leasing core, accommodating diverse resource types and allocation policies. While a variety of resources and lease contracts are possible, resource managers that enforce performance isolation enable guest applications to obtain predictable performance and to adapt their resource holdings to changing conditions.
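
As a rough illustration of this decoupling (the interfaces below are our own sketch, not Shirako's actual API), the leasing core can be written against narrow handler and policy interfaces so that new resource types and allocation policies plug in without changes to the core:

    import java.util.List;
    import java.util.Set;

    // Resource-specific logic lives behind a handler interface.
    interface ResourceHandler {
        void setup(String unitId);    // e.g., configure a node or boot a VM
        void teardown(String unitId); // reclaim the unit when its lease ends
    }

    // Allocation policy lives behind a policy interface.
    interface AllocationPolicy {
        // Choose which free units satisfy a request; the core is agnostic.
        List<String> allocate(int count, Set<String> freeUnits);
    }

    // The core grants leases without knowing resource or policy details.
    class LeasingCore {
        private final ResourceHandler handler;
        private final AllocationPolicy policy;

        LeasingCore(ResourceHandler handler, AllocationPolicy policy) {
            this.handler = handler;
            this.policy = policy;
        }

        void grant(int count, Set<String> freeUnits) {
            for (String unit : policy.allocate(count, freeUnits)) {
                handler.setup(unit);
            }
        }
    }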

Bibliography

1
Ant, September 2005.
http://ant.apache.org/.

2
P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt, and A. Warfield.
Xen and the art of virtualization.
In Proceedings of the 19th ACM Symposium on Operating Systems Principles (SOSP), October 2003.

3
A. Bassi, M. Beck, T. Moore, and J. S. Plank.
The logistical backbone: Scalable infrastructure for global data grids.
In Proceedings of the 7th Asian Computing Science Conference on Advances in Computing Science, December 2002.

4
A. Bavier, M. Bowman, B. Chun, D. Culler, S. Karlin, S. Muir, L. Peterson, T. Roscoe, T. Spalink, and M. Wawrzoniak.
Operating system support for planetary-scale network services.
In Proceedings of the First Symposium on Networked Systems Design and Implementation (NSDI), March 2004.

5
A. Birrell, G. Nelson, S. Owicki, and E. Wobber.
Network Objects.
In Proceedings of the 14th ACM Symposium on Operating Systems Principles (SOSP), pages 217-230, December 1993.

6
J. S. Chase, D. C. Anderson, P. N. Thakar, A. M. Vahdat, and R. P. Doyle.
Managing energy and server resources in hosting centers.
In Proceedings of the 18th ACM Symposium on Operating Systems Principles (SOSP), pages 103-116, October 2001.

7
J. S. Chase, D. E. Irwin, L. E. Grit, J. D. Moore, and S. E. Sprenkle.
Dynamic virtual clusters in a grid site manager.
In Proceedings of the Twelfth International Symposium on High Performance Distributed Computing (HPDC-12), June 2003.

8
M. Degermark, T. Kohler, S. Pink, and O. Schelen.
Advance reservations for predictive service in the Internet.
Multimedia Systems, 5(3):177-186, 1997.

9
R. J. Figueiredo, P. A. Dinda, and J. A. B. Fortes.
A case for grid computing on virtual machines.
In International Conference on Distributed Computing Systems (ICDCS), May 2003.

10
I. Foster, K. Czajkowski, D. F. Ferguson, J. Frey, S. Graham, T. Maguire, D. Snelling, and S. Tuecke.
Modeling and managing state in distributed systems: The role of OGSI and WSRF.
Proceedings of the IEEE, 93(3):604-612, March 2005.

11
I. Foster, C. Kesselman, C. Lee, R. Lindell, K. Nahrstedt, and A. Roy.
A distributed resource management architecture that supports advance reservations and co-allocation.
In Proceedings of the International Workshop on Quality of Service, June 1999.

12
I. Foster and A. Roy.
A quality of service architecture that combines resource reservation and application adaptation.
In Proceedings of the International Workshop on Quality of Service, June 2000.

13
Y. Fu, J. Chase, B. Chun, S. Schwab, and A. Vahdat.
SHARP: An architecture for secure resource peering.
In Proceedings of the 19th ACM Symposium on Operating Systems Principles (SOSP), October 2003.

14
C. Gray and D. Cheriton.
Leases: An efficient fault-tolerant mechanism for distributed file cache consistency.
In Proceedings of the Twelfth ACM Symposium on Operating Systems Principles (SOSP), December 1989.

15
M. Hibler, L. Stoller, J. Lepreau, R. Ricci, and C. Barb.
Fast, scalable disk imaging with Frisbee.
In Proceedings of the USENIX Annual Technical Conference, June 2003.

16
D. Irwin, J. Chase, L. Grit, and A. Yumerefendi.
Self-recharging virtual currency.
In Proceedings of the Third Workshop on Economics of Peer-to-Peer Systems (P2P-ECON), August 2005.

17
X. Jiang and D. Xu.
SODA: A service-on-demand architecture for application service hosting utility platforms.
In Proceedings of the 12th IEEE International Symposium on High Performance Distributed Computing (HPDC), June 2003.

18
M. Kallahalla, M. Uysal, R. Swaminathan, D. Lowell, M. Wray, T. Christian, N. Edwards, C. Dalton, and F. Gittler.
SoftUDC: A software-based data center for utility computing.
IEEE Computer, 37(11):38-46, November 2004.

19
K. Keahey, K. Doering, and I. Foster.
From sandbox to playground: Dynamic virtual environments in the grid.
In Proceedings of the 5th International Workshop on Grid Computing, November 2004.

20
I. Krsul, A. Ganguly, J. Zhang, J. Fortes, and R. Figueiredo.
VMPlants: Providing and managing virtual machine execution environments for grid computing.
In Supercomputing, October 2004.

21
R. Macklem.
Not quite NFS, soft cache consistency for NFS.
In USENIX Association Conference Proceedings, pages 261-278, January 1994.

22
D. Oppenheimer, J. Albrecht, D. Patterson, and A. Vahdat.
Design and implementation tradeoffs in wide-area resource discovery.
In Proceedings of the Fourteenth Annual Symposium on High Performance Distributed Computing (HPDC), July 2005.

23
P. M. Papadopoulos, M. J. Katz, and G. Bruno.
NPACI Rocks: Tools and techniques for easily deploying manageable Linux clusters.
In IEEE Cluster 2001, October 2001.

24
J. Pormann, J. Board, D. Rose, and C. Henriquez.
Large-scale modeling of cardiac electrophysiology.
In Proceedings of Computers in Cardiology, September 2002.

25
C. Sapuntzakis, R. Chandra, B. Pfaff, J. Chow, M. S. Lam, and M. Rosenblum.
Optimizing the migration of virtual computers.
In Proceedings of the 5th Symposium on Operating Systems Design and Implementation (OSDI), December 2002.

26
N. Taesombut and A. Chien.
Distributed Virtual Computers (DVC): Simplifying the development of high performance grid applications.
In Workshop on Grids and Advanced Networks, April 2004.

27
J. Waldo.
The Jini architecture for network-centric computing.
Communications of the ACM, 42(7):76-82, July 1999.

28
B. White, J. Lepreau, L. Stoller, R. Ricci, S. Guruprasad, M. Newbold, M. Hibler, C. Barb, and A. Joglekar.
An integrated experimental environment for distributed systems and networks.
In Proceedings of the 5th Symposium on Operating Systems Design and Implementation (OSDI), December 2002.

29
A. Wollrath, R. Riggs, and J. Waldo.
A distributed object model for the Java system.
In Proceedings of the Second USENIX Conference on Object-Oriented Technologies (COOTS), June 1997.

30
L. Zhang, S. Deering, D. Estrin, S. Shenker, and D. Zappala.
RSVP: A new resource ReSerVation protocol.
IEEE Network, 7(5):8-18, September 1993.