The TOPIX GridLab Project - The Distributed Computing System

Transcript

The TOPIX GridLab Project
Cosimo Anglano, Massimo Canonico
Dipartimento di Informatica
Università del Piemonte Orientale
Alessandria, Italy
Outline

• Grid Computing: a very brief introduction
• Peer-to-Peer Grid Computing & OurGrid
• GridLab Architecture
• How to {Use, Contribute}
• Demonstration of use
• Conclusions
Grid Computing

E-Science: massive use of computers to perform scientific research
• investigation tools (simulation, data mining, etc.)
• remote collaboration tools

Grid computing conceived as the answer to these computation needs:
• use of a set of geographically dispersed resources as a single computing platform
Grid Computing: the metaphor

To use an electrical appliance, you just plug the power cord into the outlet ...
Grid Computing: the metaphor

... without caring how electricity has been
transported to your home, and who did it ...
Grid Computing: the metaphor

... and by whom and how it has been generated
Grid Computing: the ultimate goal

Bring computational power to end-users by:
• aggregating as many resources as possible
• hiding which resources are/will be used by applications
• simplifying as much as possible the interface between users and the Grid
Grid Computing: the vision
Grid Computing: the issues

Various issues to be tackled
• resource heterogeneity
• geographic distribution
• security
• seamless access across different administrative domains
Grid Computing: the approach

Use middleware to provide uniform access to heterogeneous resources
• co-exists with local OSes, and resource management/security policies and mechanisms
Grid Computing: the “Big Iron” solution

Individual entities contribute their resources to what is called a Virtual Organization (VO)
• temporary association of individual entities
• resource sharing regulated by out-of-band agreements
• VO composition is static (a new member can join only if the others agree)

Typical Grids aggregate high-end, always-online, and continuously maintained resources
• high-end clusters and/or mainframes
Grid Computing: the “Big Iron” solution

Globus is the de facto standard middleware
• provides mechanisms to deal with heterogeneity, secure communication, authentication, authorization, and resource management
• deployed on dozens of sites
• but requires highly specialized skills and complex off-line negotiations

A good solution for large labs that collaborate with other large labs
Voluntary Grid Computing

Harvest the computing power voluntarily donated by individual computer owners
Voluntary Grid Computing

Popularized by highly visible projects
• the Great Internet Mersenne Prime Search (GIMPS) gives monetary prizes
• SETI@home received very wide media coverage
• other similar projects (FightAIDS@home, Folding@home) have been very successful

BOINC is probably the most prominent middleware
Voluntary Grid Computing

Works only if you
• have a very good support team to run “the server”
• invest a good deal of effort in “advertising”
• have a very high visibility project
• are in a prestigious institution
Grid Computing: an alternative vision

How about small research labs/groups?
• are small
• focus their research on some narrow topic
• do not belong to top Universities
• cannot count on a cutting-edge computer support team

Yet ...
they increasingly demand large amounts of computing power, just as large labs and high-visibility projects do
Peer-to-Peer Grids

Grids in which participants join spontaneously, without prior agreements/negotiations, and may leave without prior notice
• an alternative to the VO concept

Focus on cooperative resource sharing: “I will let you use my resources (when I don’t need them) if I can use yours”
Peer-to-Peer Grids

Approach pioneered by the Brazilian OurGrid project
• carried out at the Universidade Federal de Campina Grande (http://www.ourgrid.org)
• sponsored by HP Brazil
• started in 2003, currently very active
• deployed on a public testbed that can be used by anyone interested
OurGrid Design Principles

Labs can freely join the system without any human intervention
• No need for negotiation; no paperwork

Clear incentive to join the system
• One can’t be worse off by joining the system
• Noticeably reduced response times
• Free-riding resistant

Basic dependability properties
• Some level of security
• Some resilience to failures
• Scalability

Easy to install, configure and program
• No need for a specialized support team
OurGrid Application Model

Focuses on Bag-of-Tasks (BoT) applications
• No communication among tasks
• facilitates scheduling and security enforcement
• Simple fail-over/retry mechanisms to tolerate faults
• No need for QoS guarantees
• Script-based programming is natural
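
To make the BoT model concrete, the following is a minimal sketch (plain Python, not OurGrid code) of a bag-of-tasks executor: independent tasks run in parallel and a task that fails is simply resubmitted. The task names, the simulated failures, and the retry limit are illustrative assumptions.

# Minimal Bag-of-Tasks sketch (illustrative only, not OurGrid code):
# independent tasks, executed in parallel, resubmitted on failure.
from concurrent.futures import ThreadPoolExecutor, as_completed
import random

def run_task(task_id):
    """Stand-in for one independent task (e.g., rendering one image tile)."""
    if random.random() < 0.2:              # simulate an occasional remote failure
        raise RuntimeError(f"task {task_id} failed")
    return f"result-{task_id}"

def run_bag(task_ids, max_retries=3, workers=4):
    results = {}
    attempts = {t: 0 for t in task_ids}
    pending = set(task_ids)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while pending:
            futures = {pool.submit(run_task, t): t for t in pending}
            pending = set()
            for fut in as_completed(futures):
                t = futures[fut]
                try:
                    results[t] = fut.result()
                except RuntimeError:
                    attempts[t] += 1
                    if attempts[t] < max_retries:
                        pending.add(t)      # fail-over: just run the task again
                    else:
                        results[t] = None   # give up on this task
    return results

if __name__ == "__main__":
    print(run_bag(range(8)))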
OurGrid Application Model

Many applications fall in the BoT class:
• data mining
• massive search
• bio computing
• parameter sweep
• Monte Carlo simulations
• fractal calculations
• image processing
• ...
OurGrid architecture
MyGrid: User Interface &
Application Scheduling
OurGrid architecture
OurGrid P2P network
Peer: Site Manager
Grid-wide Resource Sharing
OurGrid architecture
SWAN: Sandboxing
OurGrid: fostering cooperation

To avoid selfish behaviors (a.k.a. free-riding), OurGrid uses the Network of Favors mechanism
• All peers maintain a local balance for all known peers
• Peers with greater balances have priority
• Newcomers and peers with negative balance are treated equally
• The emergent behavior of the system is that by donating more, one gets more resources back
• No additional infrastructure is needed
OurGrid: fostering cooperation

Each peer keeps a local record of the value of the favors it has given to and received from each other peer
• R_A(B) = max{0, value received from B − value donated to B} = max{0, V_A(B,A) − V_A(A,B)}
• V_A(B,A) = t(B,A) × RP_A(B)
• RP_A(B) = e(A)/e(B)
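
A minimal bookkeeping sketch of these formulas (plain Python, not OurGrid code) is shown below; the peer names, CPU times, and power estimates e(·) are illustrative assumptions.

# Network-of-Favors bookkeeping sketch (illustrative, not OurGrid code).
# Peer A records the value of favors donated to and received from each known
# peer, and ranks requesters by the resulting balance R_A(B).

class FavorLedger:
    def __init__(self, my_power):
        self.my_power = my_power            # e(A): estimate of the local power
        self.received = {}                  # peer -> accumulated value received
        self.donated = {}                   # peer -> accumulated value donated

    def value(self, t, peer_power):
        # V_A(B,A) = t(B,A) x RP_A(B), with RP_A(B) = e(A)/e(B) as on the slide
        return t * (self.my_power / peer_power)

    def record_received(self, peer, t, peer_power):
        self.received[peer] = self.received.get(peer, 0.0) + self.value(t, peer_power)

    def record_donated(self, peer, t, peer_power):
        self.donated[peer] = self.donated.get(peer, 0.0) + self.value(t, peer_power)

    def balance(self, peer):
        # R_A(B) = max{0, received from B - donated to B}; unknown peers start at 0
        return max(0.0, self.received.get(peer, 0.0) - self.donated.get(peer, 0.0))

    def prioritize(self, requesters):
        # Peers with greater balances get the idle resources first
        return sorted(requesters, key=self.balance, reverse=True)

# Example: B has a higher balance than C, so B is served first; a newcomer last.
ledger = FavorLedger(my_power=1.0)
ledger.record_received("B", t=100, peer_power=2.0)
ledger.record_donated("B", t=20, peer_power=2.0)
ledger.record_received("C", t=10, peer_power=1.0)
print(ledger.prioritize(["C", "B", "newcomer"]))   # ['B', 'C', 'newcomer']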
OurGrid: fostering cooperation

Each lab can decide whether an external task running on its machines is killed or not when an internal task is generated
• leaving the external task running increases the favor balance, but delays internal tasks
• killing external tasks wastes the corresponding donated CPU time, but does not delay internal tasks
OurGrid: security mechanisms


Running an unknown application that comes from an unknown peer is a clear security threat

OurGrid runs remote tasks inside a Xen virtual machine, with no network access and disk access only to a designated partition
• leverages the fact that BoT applications only communicate to receive their input and return their output
• input/output is done by OurGrid itself, which runs in a Xen virtual machine

Sanity checks are executed before a new task is run
GridLab: TOPIX’s OurGrid testbed

Small set of high-performance machines
• three dual-processor machines (2 GHz quad-core Xeons) with 4 GB RAM each
• located in Alessandria
• anyone interested can contribute their own resources
GridLab Demo

Running a distributed version of POV-Ray
• creates three-dimensional, photo-realistic images using a rendering technique called ray tracing
• the image to be rendered is split into many tiles
• the rendering of each tile corresponds to a task

Side-by-side comparison between the sequential version (running locally) and the distributed version
GridLab Demo: The balcony

Balcony: a picture of a balcony with a view of the sea

POV-Ray performs many consecutive passes to render it accurately
GridLab Demo: molecular rendering

Sometimes, to achieve good performance, you must think about the characteristics of your application
• some tuning may be required
• this is the only effort required to “port” a BoT application to OurGrid
How to use: install MyGrid

MyGrid requirements
• User account
• Disk space (about 15 MB)
• Linux
• Java – version 1.5 or later

MyGrid installation procedure
• Download and unpack the distribution
• Run a script that will ask a few simple questions
How to use: define your job

A job is a group of tasks described in a simple text file:

job:
  label: myjob
  task:
    remote: myApplication par1 > output
    final: get output output
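
Once MyGrid is up, a job file like this one is submitted either from the GUI shown in the next slides or from the MyGrid command line (in OurGrid 3.x, something along the lines of mygrid addjob myjob.jdf; the exact command name and syntax should be checked against the MyGrid manual for your version).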
MyGrid GUI
Job submission
Job Monitoring
Job replication
Job cancellation
Job description file for POV-Ray

job:
  label: balcony
  requirements: (environment = povray)
  init: store balcony.pov $STORAGE/balcony.pov
  task:
    remote: /usr/local/bin/povray +SC68 +EC134 +I$STORAGE/balcony.pov +O$PLAYPEN/balcony-tile-$JOB.$TASK.png
    final: get balcony-tile-$JOB.$TASK.png balcony-tile-$JOB.$TASK.png
  task:
    remote: /usr/local/bin/povray +SC135 +EC200 +I$STORAGE/balcony.pov +O$PLAYPEN/balcony-tile-$JOB.$TASK.png
    final: get balcony-tile-$JOB.$TASK.png balcony-tile-$JOB.$TASK.png
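
Since every tile only changes the start/end column passed to POV-Ray, a job file like the one above can be generated by a small script. Below is a sketch in plain Python (not part of OurGrid; the image width, tile width, and output file name are illustrative assumptions) that writes one task per column range in the same format.

# Sketch of a generator for the job description file above
# (illustrative, not OurGrid code): one POV-Ray task per vertical
# stripe of the image, reusing the $STORAGE/$PLAYPEN/$JOB/$TASK variables.

def write_povray_job(path, scene="balcony.pov", width=800, tile_width=67):
    with open(path, "w") as f:
        f.write("job:\n")
        f.write("  label: balcony\n")
        f.write("  requirements: (environment = povray)\n")
        f.write(f"  init: store {scene} $STORAGE/{scene}\n")
        start = 1
        while start <= width:
            end = min(start + tile_width - 1, width)
            f.write("  task:\n")
            f.write(f"    remote: /usr/local/bin/povray +SC{start} +EC{end} "
                    f"+I$STORAGE/{scene} +O$PLAYPEN/balcony-tile-$JOB.$TASK.png\n")
            f.write("    final: get balcony-tile-$JOB.$TASK.png "
                    "balcony-tile-$JOB.$TASK.png\n")
            start = end + 1

write_povray_job("balcony.jdf")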
How to contribute: install the Peer

Peer
• Same requirements as MyGrid
• Download and unpack the distribution
• Create a Grid Description File
Grid Description File
gum:
  name : frodo.lsd.ufcg.edu.br
  mem : 56
gum:
  name : gandalf.lsd.ufcg.edu.br
  environment : povray
gum:
  name : warlock.lsd.ufcg.edu.br
  os : windows
  port : 2351
How to contribute: install the user agent

User agent: on all computation machines
• For Linux machines
  • Same requirements as for MyGrid
  • Automatic installation script
  • Virtualization with SWAN (optional)
• For Windows machines
  • Java Runtime Environment (JRE) 5.0
  • Windows 2000 or Windows XP
How to contribute: install SWAN

SWAN is an on-demand virtual machine used to run tasks on remote machines
• The task is executed in a sandbox
• All data related to the task execution is cleaned up after the task completes
• SWAN introduces a significant overhead (about 1 minute)

Without SWAN, the execution is carried out by a specific user account on the remote machine