p2p storage networks - e-learning

Transcript

p2p storage networks - e-learning
Università degli Studi di Pisa
Dipartimento di Informatica
Lesson 1
P2P Systems:
an introduction
24/9/2015
Laura Ricci
Dipartimento di Informatica
Università degli Studi di Pisa
Introduction
Laura Ricci
1
USEFUL INFORMATION
The exam can be taken in the following degrees:

Lauree Magistrali in Informatica, Business Informatics, Informatica e
Networking

Laurea triennale (26) (vecchio ordinamento)
Prerequisites:

Computer Networks

Algorithm design
Instructor: Laura Ricci (7 credits), Emanuele Carlini, ISTI, CNR (1 credit)
Hours per week: 2 + 2
Credits: 6.0
Dipartimento di Informatica
Università degli Studi di Pisa
Introduction
Laura Ricci
2
TENTATIVE PROGRAM (WITH SOME INNOVATIONS)
This year we will try to introduce some innovation with respect to the
program of the last years:
●
●
main focus is still on P2P systems
●
algorithms and data structures
●
formal methods
●
applications
P2P systems are complex networks, millions of nodes and connections
●
complex network analysis tools required for the analysis of P2P
topologies.
●
same tools can be applied also in other fields (social networks, big
graphs,...)
●
distributed algorithms for P2P systems exploits local information only.
●
many similarities between P2P
algorithms and vertex-centric
algorithms for the analysis of big graph
Dipartimento di Informatica
Università degli Studi di Pisa
Introduction
Laura Ricci
3
TENTATIVE PROGRAM (WITH SOME INNOVATIONS)
this year we will try to introduce some innovation with respect to the
program of the last years:
●
greater focus on pratical experience
●
not a laboratory course, but some classes devoted to programming
●
acquiring experience on a P2P simulator useful for the final project
●
developing algorithms for P2P systems and for the analysis of big graphs
Dipartimento di Informatica
Università degli Studi di Pisa
Introduction
Laura Ricci
4
TENTATIVE PROGRAM
P2P Systems
●
●
●
●
●
●
unstructured overlays: Flooding, Random Walks
structured overlays: Distributed Hash Tables (DHT)
●
The Chord DHT: a Markov chain model of Chord routing
●
Kademlia
●
Applications:

NoSQL Large Scale Distributed Storage

The eMule KAD implementation of Kademlia
Probabilistic Data Structures for P2P systems
System issues: NAT transversal, heterogenity
P2P Applications
●
Bittorrent: a Content Distribution Network
●
incentives, game based cooperation
●
Bitcoin: Digital currency. Anonimicity in P2P networks
Simulation Environments for P2P Systems: Peersim
Dipartimento di Informatica
Università degli Studi di Pisa
Introduction
Laura Ricci
5
TENTATIVE PROGRAM
Distributed Algorithms for P2P systems and big data:

P2P asynchronous model

synchronous models and frameworks for Big graph computing

vertix centric approaches

epidemic algorithms, random sampling

classical graph algorithms: k-core, connected components, triangle
counting
Dipartimento di Informatica
Università degli Studi di Pisa
Introduction
Laura Ricci
6
TENTATIVE PROGRAM
Models for P2P networks and Big Graphs:

Random Graphs and Small Worlds

Small world navigability: Watts Strogatz and Kleinberg Models

Complex networks navigability: case studies

the Darknet Freenet, a Small World Network

Symphony
Dipartimento di Informatica
Università degli Studi di Pisa
Introduction
Laura Ricci
7
AND IN CASE THERE IS SOME TIME LEFT......

P2P audio/video Streaming

streams diffusion overlays: tree, multi-tree, mesh

Distributed virtual environment: massively multiplayer online games

Distributed Online Social Networks
Dipartimento di Informatica
Università degli Studi di Pisa
Introduction
Laura Ricci
8
DIDACTIC MATERIAL
Mandatory

lesson slides (download this year slides)

tutorial and material published on the course page
Some lesson refer to material from these books

Sasu Takoma, Overlay Networks, Toward Information Networking, 2010

Maarten Van Steen, Graph Theory and Complex Networks, 2010

Korzun, Dmitry, Gurtov, Andrei, Structured Peer-to-Peer Systems:
Fundamentals of Hierarchical Organization, Routing, Scaling, and Security
Other books:




R. Steinmetz, K.Wehrle, Peer to Peer Systems and Applications, LNCS
3485, Springer Verlag, 2005
Peer to Peer Computing, Applications, Architecture, Protocols and
Challanges, Yu-Kwong Ricky Kwok, CRC Press, 2012
Q.Hieu Vu, M.Lupu, B.Chin Ooi, Peer to Peer Computing, Principles and
Applications, Springer Verlag, 2010
Buford, Yu, Lua, P2P Networking and Applications, Morgan Kaufmann,
2009
Dipartimento di Informatica
Università degli Studi di Pisa
Introduction
Laura Ricci
9
EXAM RULES
●


Mid term + Final term:

if the student pass both mid and final term the oral proof, he/she
is exempted from the oral examination

tentative ideas:

mid term: reading a paper and writing a brief relation

final term: programming exercise useful for the final project
choice: written examination or project

project proposals from the students are accepted

the project may be the first part of the final thesis
oral examination
only for those who have not passed the midterm and/or final term
Dipartimento di Informatica
Università degli Studi di Pisa
Introduction
Laura Ricci
10
TOOLS FOR THE PROJECT

Peersim

a simulator oriented to the P2P simulation

highly scalable (until 10000 peers for the implementation of simple
protocols)

http://peersim.sourceforge.net/


some lesson to introduce this tool, full support from my team
PeerfactSim.KOM

a new proposal

supports the high scale simulation on P2P systems

http://peerfact.kom.e-technik.tu-darmstadt.de/
Dipartimento di Informatica
Università degli Studi di Pisa
Introduction
Laura Ricci
11
PROJECT PROPOSALS

Simulation of

unstructured P2P networks

distributed Hash Tables (DHT)

self-organizing systems

analysis of the P2P topologies

implementation of vertex-centric algorithms for the analysis of big
graphs
Dipartimento di Informatica
Università degli Studi di Pisa
Introduction
Laura Ricci
12
PEER TO PEER SYSTEMS

Definition 1:
A peer to peer system is a set of autonomous entities (peers) able to
auto-organize and sharing a set of distributed resources in a computer
network. The system exploits such resources to give a service in a complete
or partial decentralized way

Shared Resources:

Read Only Information (Files)

Read/Write storage space (Distributed File System)

Computing power

Bandwidth
Dipartimento di Informatica
Università degli Studi di Pisa
Introduction
Laura Ricci
13
PEER TO PEER SYSTEMS

Definition 2:
A P2P system is a distributed system defined by a set on nodes
interconnected able to auto-organize and to build different topologies with
the goal of sharing resources like CPU cycles, memory, bandwidth. The system
is able to adapt to a continous churn of the nodes maintaining connectivity and
reasonable performances without a centralized entity (like a server)
Dipartimento di Informatica
Università degli Studi di Pisa
Introduction
Laura Ricci
14
RESOURCE SHARING

P2P: is relative to give and receive from a a community. Each peer shares a
set of resources and obtains, in return, a set of resources




the most common scenario: share musics (audio files) and obtain music in
return (Napster, Gnutella,…)
a peer behaves both like a client and like a server (symmetric
functionality = SERVENT)
but a peer can decide of offering for free a resource, for instance to
participate to a project

searching extra-terrestrial life

cancer therapies research
the shared resources are at “the border” of Internet, they are directly
shared by the peers, there are no “special purpose nodes” for their
management.
Dipartimento di Informatica
Università degli Studi di Pisa
Introduction
Laura Ricci
15
RESOURCE SHARING



the peers' connection is transient: the connections and disconnections to the
network are very frequent
the resources offered by the peers are dynamically added and removed
each peer is paired with a different IP address for each connection to the
system

a resource cannot be located by a static IP

new addressing mechanisms have to be defined, at the application level, not
at the IP level
Dipartimento di Informatica
Università degli Studi di Pisa
Introduction
Laura Ricci
16
P2P: FILE SHARING AND BEYOND...
P2P file sharing

file sharing:light weight/ best effort

persistence and security are not
the main goal

anonymicity is important

Examples:

Napster

Gnutella, KaZaa

eMule

BitTorrent
P2P Content Publishing e Storage Systems

persistency and security are main
goals

Examples: Freenet, initially Wuala then
evolved towards cloud based solutions
Dipartimento di Informatica
Università degli Studi di Pisa
Introduction
Laura Ricci
17
P2P: FILE SHARING AND OTHERS...



Bitcoin Digital Currency:

a peer-to-peer version of the classical electronic payment

the payments may be directly done between the users without a
centralized financial entity

no central entity which guarantees the electronic payment, lower costs

one of my phd students is currently analysing the Bitcoin network
Voice over P2P

P2PSIP working group

definition of a P2P protocol considering NATs

minimum involvement of the centralized server

exploits Distributed Hash Tables

RELOAD protocol definition: Resource Location and Discovery

VoP2P: Skype
IPTV: video streaming applications

PPlive, Ppstream, widely spread in Cina

Joost, Soapcast
Dipartimento di Informatica
Università degli Studi di Pisa
Introduction
Laura Ricci
18
P2P DIFFUSION: MAIN STAGES
2003: Sandvine Study

in Europe (France, Germany, ..)
EDonkey/EMule

in USA
KaZaA/Fastrack
2005:

BitTorrent

the most popular P2P file sharing network

Skype dominates VOIP

KaZaA becomes less popular

eDonkey replaced by eMule exploiting a similar protocol
2008: Spotify in Sweden
Dipartimento di Informatica
Università degli Studi di Pisa
Introduction
Laura Ricci
19
P2P DIFFUSION: MAIN STAGES
2009:

Wuala P2P-based storage service, now cloud-based

users can decide to share with the community the free space on the
hard disks of the users

replication

security issues

KaZaA is no more widely used

PPLive: P2P-based Video Streaming Platform: diffusion in Asia

Vuze (an Azureus evolution); P2P-based Video-on demand

First version of Bitcoin
Some challanges for the future:

P2P social networks?

integration cloud-P2P?

P2P Massively Multiplayer Games?
Dipartimento di Informatica
Università degli Studi di Pisa
Introduction
Laura Ricci
20
FILE SHARING: A 'KILLER APPLICATION'


File Sharing: it begins with the rapid success of Napster, at the end of
nineties, about 10 years later than Wide Web
First generation: Napster

introduces a set of servers where the users register the descriptors of
the files they are going to share



only the content transmission (download/upload) exploits a P2P protocol,
content search is centralized
the presence of a centralized directory has been the 'Achille's heel' of
this application
its was convicted because it did not respect the law on the copyright

it could have detected the content exchanged between the users
through the analysis of the centralized directory
Dipartimento di Informatica
Università degli Studi di Pisa
Introduction
Laura Ricci
21
FILE SHARING: A 'KILLER APPLICATION'


File sharing: second generation

no centralization point

both file search and content transfer are completely distributed

Gnutella, FastTrack/Kazaa, BitTorrent
Side effects of the diffusion of file sharing applications: change of the way
users music is accessed by the

from CD to online music

iTunes
Dipartimento di Informatica
Università degli Studi di Pisa
Introduction
Laura Ricci
22
FILE SHARING: A 'KILLER APPLICATION'

A P2P file sharing application








A user U has a P2P client on its notebook
the connection of the to Internet is intermittent : each time the user obtains a
new IP address for each new connection
the user stores the shared files in a directory and pairs each file with a set of
keys able to identify it (for a song: title, author, publication date,...)
U is interested in finding a song and sends a query to the system
the client finds and shows to the user information about the other peers which
own the required song
U chooses a peer P among these (according some criteria, we will see in the
following)
the file is copied from the notebook of P to that of U
while U is performing the download, other users can upload the parts of the file
already downloaded by U and put in the shared directory
Dipartimento di Informatica
Università degli Studi di Pisa
Introduction
Laura Ricci
23
FILE SHARING: A 'KILLER APPLICATION'
The P2P client behaves like a servlet and allows:

the user to define a directory, in its file system, where the file it is going to
share are stored. Each other peer of the P2P network may download files
from this directory


the user to download files from the shared directory


a peer behaves like a web server
a peer behaves like a client
the user to find the content it is interested in, through queries submitted to
the system

this functionality is similar to that of Google
Dipartimento di Informatica
Università degli Studi di Pisa
Introduction
Laura Ricci
24
FILE SHARING: SOME PROBLEMS



Several P2P clients

can be freely downloaded from the network, but they may contain
spyware or malware,

the client software is free, but it often implies a violation of the
privacy.
Spyware = a software that

collects information related to the online behaviour of a user (visited
sites, online purchases, etc. ) without its authorization.

sends information to an organization which exploits such information
to send focused advertisements to the user
Malware = a client that can damage the computer when it is executed.
Dipartimento di Informatica
Università degli Studi di Pisa
Introduction
Laura Ricci
25
FILE SHARING: SOME PROBLEMS
Pollution (= Inquinamento)


a large amount of corrupted material is injected in the P2P network.
the peers cannot distinguish the good material from the corrupted one, so
they download the corrupted material and contribute to its diffusion.

the number of the corrupted copies may exceed that of those good

pollution companies: Overpeer
The free riders phenomenon:


Peer exploiting the application to download the content shared by other
peers, but they do not offer any content
Solutions:


Incentives mechanisms (eMule credits)
Detecting and isolating free riders (Graph based Algorithms in
Bittorrent)
Dipartimento di Informatica
Università degli Studi di Pisa
Introduction
Laura Ricci
26
VOICE OVER P2P



IMP (Instant Messaging and Presence) applications

Microsoft Messanger, Yahoo Messenger, Jabber, based on a
server architecture.
client
Clients VoIP (Voice over IP) starts diffusion in the middle '90

desktop-to-desktop free calls

low voice quality

limited diffusion
Skype: VoP2P application, voice over P2P: 2003

more than 500 milions of accounts and more than 50 milions of active
users

desktop to desktop calls and desktop to public telephon network calls

good call quality

integrates voice call with IMP characteristics
Dipartimento di Informatica
Università degli Studi di Pisa
Introduction
Laura Ricci
27
SKYPE: GENERAL CHARACTERISTICS




The architecture is derived from that of Kazaa and is based on the presence
of SuperPeers
Based on a proprietary protocol with encripted traffic

no documentation, some analysis based on sniffers
The SuperPeers are responsible of:

looking for the peers and their localization

it is necessary to detect the single peer with whom the call has to be
established (difference with file sharing)

peer localization is a task of the SuperPeer

they behave like rely-router for the peers which are behind a firewall
and a NAT (60% of the user is behind a NAT)
Supports VOIP sessions with a set of users

algorithm for merging of different voice flows
Dipartimento di Informatica
Università degli Studi di Pisa
Introduction
Laura Ricci
28
P2PTV: VIDEO STREAMING
●


The client server model has several problems with these applications
because of the high computational requirements of these applications
Multimedia audio/video files may be downloaded through file sharing
clients: the audio/video file is before downloaded (at least some parts of it)
and then displayed (like a play back)
Video Streaming:


real time transfer and display of the video stream
Cooperation between the peer for then content distribution (content
distribution networks)



Bittorrent
Buffering a set of video frames
Commercial system: SoapCast, PPlive, TVAnts...
Dipartimento di Informatica
Università degli Studi di Pisa
Introduction
Laura Ricci
29
SPOTIFY: P2P LIVE MUSIC STREAMING

A music live streaming serving exploiting an hybrid P2P client/server

Advertised in 2008, initially in Sweden, then in other European countries

In Usa in2011
…..and in Italy on 12 february 2013....(a coincidence??)

Currently more than 20 milions of users, 5 milions with fee

Similar solutions: french Deezer, Pandora from USA


Spotify clients can be freely downlaoded, but the client requires the creation
of a Spotify account
Automatic client update
Dipartimento di Informatica
Università degli Studi di Pisa
Introduction
Laura Ricci
30
SPOTIFY: P2P LIVE MUSIC STREAMING




Combines functionality of P2P and client/server
Main goal for the development of a P2P protocol:

increase the service scalability, decrease the server load

exploit the available network bandwidth
Mechanisms similar to those exploited for file sharing:

unstructured network: it does not exploit the DHT

peer detection: ideas from Gnutella and Bittorrent

NAT transversal mechanism
Streaming requires mechanisms for

buffer management

prefetching

exploit markovian models for the prediciting the client throughput
Dipartimento di Informatica
Università degli Studi di Pisa
Introduction
Laura Ricci
31
P2P BANDWIDTH SHARING

A server pubblishes new content (example: a new game release, a new
release of an operating system, a song...)
Sender
Router
Unicast
Router
Router
Router
Receiver

Receiver
Receiver
Receiver
Receiver
client server model: a single centralized server is a bottleneck
Dipartimento di Informatica
Università degli Studi di Pisa
Introduction
Laura Ricci
32
P2P CONTENT DISTRIBUTION

P2P approaches


To obtain a better balance of the use of the communication bandwidth by
exploiting the less used communication channels
Sender
Peer-to-Peer Content Distribution

the initial file request are served by a

Router
centralized server
further requests are sent to the peers
which have already received and replicated
Receiver/
Sender
Receiver/
Sender
Router
Router
the files
Receiver/
Sender
Receiver/
Sender
Receiver/
Sender
Receiver/
Sender
Receiver/
12
Sender
10
Dipartimento di Informatica
Università degli Studi di Pisa
Introduction
Laura Ricci
33
P2P CONTENT DISTRIBUTION
 A combined use of the P2P and of the client-server model enables the
optimization of the accesses to the server
 Segmented approach (example:BitTorrent)
Doc
Doc
Dipartimento di Informatica
Università degli Studi di Pisa
Introduction
Laura Ricci
34
P2P CONTENT DISTRIBUTION
●
●
A combined use of the P2P and of the client-server model enables the
optimization of the accesses to the server
Segmented approach (example:BitTorrent)
Part
Part
33
Doc
Doc
Part
Part
3
Part
3
Part
22Part
Part
33
Part
Part
22
Part
Part
11
Dipartimento di Informatica
Università degli Studi di Pisa
Introduction
Laura Ricci
35
P2P CONTENT DISTRIBUTION
●
●
A combined use of the P2P and of the client-server model enables the
optimization of the accesses to the server
Segmented approach (example:BitTorrent)
Doc
Doc
Part
Part
11
Part
Part
33
Part
Part
11
Part
Part
22
Part
Part
3
Part
3
Part
22Part
Part
33
Part
Part
11
Dipartimento di Informatica
Università degli Studi di Pisa
Introduction
Laura Ricci
36
P2P CONTENT DISTRIBUTION
●
●
A combined use of the P2P and of the client-server model enables the
optimization of the accesses to the server
Segmented approach (example:BitTorrent)
Part
Part
11Part
Part
22Part
Part
33
Doc
Doc
Part
Part
3
Part
3
Part
22Part
Part
33
Part
Part
1Part
1Part
22
Part
Part
11
Part
Part
22
Dipartimento di Informatica
Università degli Studi di Pisa
Introduction
Laura Ricci
37
P2P CONTENT DISTRIBUTION
●
●
A combined use of the P2P and of the client-server model enables the
optimization of the accesses to the server
Segmented approach (example:BitTorrent)
Part
Part
11Part
Part
22Part
Part
33
Doc
Doc
Part
Part
3
Part
3
Part
22Part
Part
33
Part
Part
1Part
1Part
2Part
2Part
33
Part
Part
11
Part
Part
2Part
2Part
33
Dipartimento di Informatica
Università degli Studi di Pisa
Introduction
Laura Ricci
38
P2P CONTENT DISTRIBUTION
●
●
A combined use of the P2P and of the client-server model enables the
optimization of the accesses to the server
Segmented approach (example:BitTorrent)
Part
Part
11Part
Part
22Part
Part
33
Doc
Doc
Doc
Doc
Part
Part
3
Part
3
Part
22Part
Part
33
Part
Part
1Part
1Part
2Part
2Part
33
Part
Part
11
Part
Part
2Part
2Part
33
Dipartimento di Informatica
Università degli Studi di Pisa
Doc
Doc
Doc
Doc
Introduction
Laura Ricci
39
P2P STORAGE NETWORKS


A P2P Storage Network is a computer cluster which are connected in the
network and exploits all the memory shared by the peers for the definition
of a distributed storage system

Examples: DHT, often store read a read only index

Real distributed file systems: PAST, Freenet, OceanStore, Wuala
Organization:

peer-identifiers association through a hash function

each peer offers a part of its storege space or pays an amount of money



a maximum volume of data are assigned to each peer, according to the
contribution of the peer to the storage network.
document-identifier assignment computed through a hash function
computed on the name of the file or on a part of its content
The storage and search of files in the network is guided by the
identifiers paired with the hash function with peers and files.
Dipartimento di Informatica
Università degli Studi di Pisa
Introduction
Laura Ricci
40
P2P STORAGE NETWORKS
Storage Network Definition
ID
4
Hash
ID
1
ID
ID
ID
Dipartimento di Informatica
Università degli Studi di Pisa
17
ID
25
3
ID
Introduction
8
10
Laura Ricci
41
P2P STORAGE NETWORKS
Storage Network Definition
ID
Hash
ID
4
Hello ???
1
ID
ID
17
ID
25
8
Hello ???
ID
Dipartimento di Informatica
Università degli Studi di Pisa
3
ID
Introduction
10
Laura Ricci
42
P2P STORAGE NETWORKS
Storage Network Definition
ID
4
ID
Hash
Hello ???
ID
1
ID
ID
25
ID
ID
4
17
ID
25
8
3
Hello ???
ID
Dipartimento di Informatica
Università degli Studi di Pisa
3
ID
Introduction
10
Laura Ricci
43
P2P STORAGE NETWORKS
Storage Network Definition
neighbors
3
4
ID
4
ID
25
Hash
Hello ???
ID
1
ID
ID
25
ID
ID
4
17
ID
25
8
3
Hello ???
ID
Dipartimento di Informatica
Università degli Studi di Pisa
3
ID
Introduction
10
Laura Ricci
44
P2P STORAGE NETWORKS
Document Storage
1
17
25
3
4
25
ID
ID
1
4
1
4
10
ID
3
4
8
ID
17
ID
25
Dipartimento di Informatica
Università degli Studi di Pisa
8
3
8
25
1
10
17
ID
10
17
3
ID
Introduction
10
Laura Ricci
45
P2P STORAGE NETWORKS
Document Storage
1
17
25
3
4
25
ID
ID
1
1
4
10
Hash
ID
4
11
ID
3
4
8
ID
17
ID
25
Dipartimento di Informatica
Università degli Studi di Pisa
8
3
8
25
1
10
17
ID
10
17
3
ID
Introduction
10
Laura Ricci
46
P2P STORAGE NETWORKS
Document Storage
1
17
25
3
4
25
ID
ID
1
1
4
10
Hash
ID
4
11
ID
3
4
8
ID
17
ID
25
Dipartimento di Informatica
Università degli Studi di Pisa
8
3
8
25
1
10
17
ID
10
17
3
ID
Introduction
10
Laura Ricci
47
P2P STORAGE NETWORKS
Document Storage
ID
1
17
25
11
3
4
25
ID
ID
1
1
4
10
Hash
ID
4
11
ID
3
4
8
ID
17
ID
25
Dipartimento di Informatica
Università degli Studi di Pisa
8
3
8
25
1
10
17
ID
10
17
3
ID
Introduction
10
Laura Ricci
48
P2P STORAGE NETWORKS
Document Storage
ID
1
17
25
11
3
4
25
ID
ID
1
1
4
10
Hash
ID
4
11
ID
3
4
8
ID
17
ID
25
Dipartimento di Informatica
Università degli Studi di Pisa
8
3
8
25
1
10
17
ID
10
17
3
ID
Introduction
10
Laura Ricci
49
P2P STORAGE NETWORKS
Document Storage
ID
1
17
25
11
3
4
25
ID
ID
1
ID
1
4
10
Hash
ID
4
11
ID
11
3
4
8
ID
17
ID
25
Dipartimento di Informatica
Università degli Studi di Pisa
8
3
8
25
1
10
17
ID
10
17
3
ID
Introduction
10
Laura Ricci
50
P2P STORAGE NETWORKS
Document Storage
ID
1
17
25
11
3
4
25
ID
ID
1
ID
1
4
10
Hash
ID
4
11
ID
11
3
4
8
ID
17
ID
25
Dipartimento di Informatica
Università degli Studi di Pisa
8
3
8
25
1
10
17
ID
10
17
3
ID
Introduction
10
Laura Ricci
51
P2P STORAGE NETWORKS
Document Storage
ID
1
17
25
11
3
4
25
ID
ID
1
ID
1
4
10
Hash
ID
4
11
ID
11
3
4
8
ID
17
ID
25
Dipartimento di Informatica
Università degli Studi di Pisa
8
3
8
25
1
10
17
ID
10
17
3
ID
Introduction
10
Laura Ricci
52
P2P STORAGE NETWORKS
Document Storage
ID
1
17
25
11
3
4
25
ID
ID
1
ID
1
4
10
Hash
ID
4
11
ID
11
3
4
8
ID
Dipartimento di Informatica
Università degli Studi di Pisa
11
10
17
17
ID
25
8
3
8
25
1
10
17
ID
ID
3
ID
Introduction
10
Laura Ricci
53
P2P STORAGE NETWORKS
Document Storage
ID
1
17
25
11
3
4
25
ID
ID
1
ID
1
4
10
Hash
ID
4
11
ID
11
3
4
8
ID
Dipartimento di Informatica
Università degli Studi di Pisa
11
10
17
17
ID
25
8
3
8
25
1
10
17
ID
ID
3
ID
Introduction
10
Laura Ricci
54
P2P STORAGE NETWORKS
Document Storage
ID
1
17
25
11
3
4
25
ID
ID
1
ID
1
4
10
Hash
ID
4
11
ID
11
3
4
8
ID
Dipartimento di Informatica
Università degli Studi di Pisa
11
10
17
17
ID
25
3
8
25
1
10
17
ID
ID
3
ID
Introduction
8
ID
11
10
Laura Ricci
55
P2P STORAGE NETWORKS
Document Storage
ID
1
17
25
3
4
25
11
ID
ID
1
ID
4
1
4
10
11
ID
3
4
8
ID
17
ID
25
3
8
25
1
10
17
ID
Dipartimento di Informatica
Università degli Studi di Pisa
10
17
3
ID
Introduction
8
requestor: 1
ID 11
10
Laura Ricci
56
P2P STORAGE NETWORKS


A P2P Storage System exploits all the storage offered by the peers

PAST, Freenet, OceanStore,

Freenet: mechanism for anonimisation

Substitution of the sender of a message

Encrypting Techniques
Distributed Hash Tables techniques:

Each peer offers a part of its storege or pays an amount of money

According to its contribute, a maximum volume of data is assigned to
each peer

Hash functions to assign

identifiers to peers

identifiers to documents computed by exploiting the file name or part
of its content

The storage and search of the files is guided by the identifiers
computed by the hash function.
Dipartimento di Informatica
Università degli Studi di Pisa
Introduction
Laura Ricci
57
P2P OVERLAY



A P2P protocol defines the set of messages that the peers exchange, their
format and their semantics
A common characteristics of all the P2P protocols:

identification of the peer trough a unique identifier, which is generally
computed through a hash function

the protocols are defined at application level (stack TCP/IP) and define
a routing strategy
Overlay Network: logic network connecting peers and defined at
application level

The overlay defines a set of logic links between the peers, which does
not correspond to phisical connections

Each logical link corresponds to:

a set of physical links

transversal of a set of routers
Dipartimento di Informatica
Università degli Studi di Pisa
Introduction
Laura Ricci
58
P2P OVERLAY CLASSIFICATION
Unstructured P2P systems (Gnutella, Kazaa,…)



A new peer randomly connects to a set of peers which are active in the P2P
network
The resulting logical network (overlay network) is unstructured
Look-up algorithms:

centralized servers (Napster).

flooding (Gnutella),
Look up cost= linear with N, where N is the number of nodes in the network

Problem: scalability.
Dipartimento di Informatica
Università degli Studi di Pisa
Introduction
Laura Ricci
59
UNSTRUCTURED OVERLAY


Peers directly interacts without the presence of a centralized server
Interaction paradigm based on a decentralized cooperation, rather than a
centralized coordinator.
C
C
C
P
P
S
P
P
P
C
C
C
Client/Server
Dipartimento di Informatica
Università degli Studi di Pisa
P
P
P
P
Peer to Peer
Introduction
Laura Ricci
60
HYBRID UNSTRUCTURED NETWORKS

To increase the system performance, hybrid solutions have been defined


Dynamic definition oa a set of Superpeers indexing the resources shared
by the peers
Resources are directly exchanged between the peers.
P
P
P
SP
P
SP
P
SP
P
P
P
Hybrid Peer to Peer
Dipartimento di Informatica
Università degli Studi di Pisa
Introduction
Laura Ricci
61
STRUCTURED P2P OVERLAY

The choice of the neighbours is defined according to a given criteria

The resulting overlay network is structured


Goal: guaranteeing the la scalability

the network structure guarantees that the look up of an information
has a given complexity (for instance O(log N)).

complexity guarantees also for peer joins and peer leave
distributed hash tables approaches
Dipartimento di Informatica
Università degli Studi di Pisa
Introduction
Laura Ricci
62
OVERLAY CLASSIFICATION



When considering P2P overlay for file sharing, we can distinguish different
'generation' of P2P system, while this characterization is not yet feasible for
other classes of applications
The different generation refer to the way the overlay is managed and the way
data are indexed.
Systems with centralization points

A centralized server coordinating the peers


Unstructured overlay

Pure Peer to peer, no server


Seti @home, Napster, Bittorrent (first generation)
Gnutella, Kazaa, Skype (but authentication is centralized)
Structured Overlay

DHT- based

CAN, Chord, Pastry, Emule (Kademlia),...
Dipartimento di Informatica
Università degli Studi di Pisa
Introduction
Laura Ricci
63
SYSTEM WITH CENTRALIZATION: SETI@home
SETI@Home: Search for Extraterrestrial Intelligence





A research project (UC California, Berkeley) supported also from some
industries. Starts at the end of 90-ies.
Analysis of the radio emissions received from the space, collected by the
Arecibo telescope, located at PortoRico
Goal: to build a ‘supercomputer’ by aggregating the computational power
offered by the computers connected to the Internet, by exploiting the
inactive cycles of the host
The cost is reduced by a factor of 100 with respect to the usage of a
supercomputers


200 millions of dollars to buy a computer of 30 TeraFlops
Cost of the project: less than 1 million dollars
Dipartimento di Informatica
Università degli Studi di Pisa
Introduction
Laura Ricci
64
VOLUNTEER DISTRIBUTED COMPUTING



A centralized Data Server and a set of clients
BOINC: Berkeley Open Infrastructure for Network Computing

Screen Saver Program developed for different platforms
A single data server distributes the data

Problem: reduced scalability and reliability

During some period, the clients are able to connect to the server only during the night


Exploits proxy servers: the proxy connects to the central server during the night and
distributes data during the day
High ratio computation/communication


Each client connects for a while to receive the data, perform the computation (high
computational request) and then sends the results
No communications between the peers
Dipartimento di Informatica
Università degli Studi di Pisa
Introduction
Laura Ricci
65
BOINC VOLUNTEER DISTRIBUTED COMPUTING
1) your PC gets a set of tasks from the project's scheduling server. The tasks
depend on your PC: the server does not give it tasks that requires more RAM
than you have. Projects can support several applications, and the server may
send you tasks from any of them.
2) your PC downloads executable and input files from the project's data server.
If the project releases new versions of its applications, the executable files are
downloaded automatically to your PC.
3) your PC runs the application programs, producing output files.
4) your PC uploads the output files to the data server.
5) later (up to several days later, depending on your preferences) your PC
reports the completed tasks to the scheduling server, and gets new tasks.
Dipartimento di Informatica
Università degli Studi di Pisa
Introduction
Laura Ricci
66
BOINC: VOLUNTEER DISTRIBUTED COMPUTING


Several interesting P2P characteristics, for instance the incentive syystem
Credit System: Credit = Cobblestone measures the amount of computed tasks

the users competes to collect credits

credit check:

each task is sent to a set of computers

when a peer returns a result, a credit is given to the peer.

the server compares a set of results and, if they are comparable, the
smallest credit is given to all computers
Dipartimento di Informatica
Università degli Studi di Pisa
Introduction
Laura Ricci
67
VOLUNTEER DISTRIBUTED COMPUTING

A long list of projects....http://boinc.berkeley.edu/projects.php

Prime number search

Cognitive Process simulation

Quantum computing simulation

Modelling of malaria transmissions

…..
Dipartimento di Informatica
Università degli Studi di Pisa
Introduction
Laura Ricci
68
SYSTEM WITH CENTRALIZATION: BITTORRENT

Conceived for the download of large amount of data

Currently it is the P2P protocol generating the maximum amount of
Internet traffic (according to some statistics >60%)

Focus on the parallel content download, rather than of sophisticated
strategies for content search.

Files are divided in chunks for download

Do ut des: definition of di strategies for collaboration incentive

Different strategies to choose pieces of files to download

The set of peers which cooperate to the file download (swarm) is coordinated
by a server (tracker)

Tracker = centralization point

Last versions (tracker-less) exploit a DHT
Dipartimento di Informatica
Università degli Studi di Pisa
Introduction
Laura Ricci
69
UNSTRUCTURED FLAT OVERLAY: GNUTELLA

Pure P2P (or flat P2P): all the nodes have the same role and are interconnected
through an unstructured overlay

Goal: the file look up is completely decentralized

Each node




Share some files
Receives the query submitted from the user and forward them to the
neighbours

Forward the query received from the neighbours

Replies to the query, by checking if the shared files match the query

Periodically explore the network to acquire knowledge of new peers
Mechanism for propagating the queries = flooding
Main problems: low scalability due to the high number of messages exchanged
on the overlay
Dipartimento di Informatica
Università degli Studi di Pisa
Introduction
Laura Ricci
70
UNSTRUCTURED HIERARCHICAL OVERLAY: KAZAA

one of the more popular file sharing systems

An innovative characteristics: a hierarchical structure based in superpeers

Nodes with different characteristics


memory

CPU

connectivity

uptime

reachability
Peer heterogeneity is explooited to assign them different roles


Stable nodes characterized by good computational
(computational power, connectivity) become super peer
resources
Simple Peer: 'unstable peer', natted peer, poor connectivity
Dipartimento di Informatica
Università degli Studi di Pisa
Introduction
Laura Ricci
71
STRUCTURED OVERLAY
Distributed Hash Tables:

HASH functions to assign logical identifiers to peers and data

The same identifier space for peer and data

Requires the definition of a mapping function data-peer

Key based search:


Different proposals developed in the academic area


Routing algorithm definition: each routing hop on the overlay is guided by
the key which identifies the data which is looked for
CHORD, CAN, PASTRY,...
Some commercial file sharing exploits the DHT Kademlia

Emule KAD network

Trackerless Bittorrent
Dipartimento di Informatica
Università degli Studi di Pisa
Introduction
Laura Ricci
72
P2P: ADVANTAGES


Advantages; to exploit computational resources 'in excess' (idel CPU cycles
unused storage space, unused band width,...) to have in return
risources/services/social networks participation,..
Advantages for the community


self scaling property: the participation of a larger number of users/host
naturally increases the system resources and its capacity to serve a
larger number of requests
Advantages for the seller: decrease of the cost to set up a new application



client/server: requires the use of a server farm characterized by high
connectivity so that the request of millions of users can be satisfied
fault tolerance: the server farm must be replicated in different
locations
the server must be managed so to offer a service 24*7
Dipartimento di Informatica
Università degli Studi di Pisa
Introduction
Laura Ricci
73
P2P: THE FUTURE

The success of a P2P application greatly depends from the creation of a
“critical mass” of users


critical mass: a level of participation to the P2P network which enables the
application to self sustain
In the first P2P systems the novelty of the application has been fundamental
for reaching a critical mass

File sharing: free content, wide content selection

VoP2P: low cost for international calls, good audio quality,...


P2PTV: access to a wide set of programs, possibility of accessing the
content from any location
The success of new P2P applications will greatly depend besides a good
engineering of the application

from the new application appealing

from the definition of new 'business model'
Dipartimento di Informatica
Università degli Studi di Pisa
Introduction
Laura Ricci
74
P2P: THE FUTURE
Exploiting classical P2P techniques in different contexts:



data-centers with thousand of machines exploit techniques taken from
the P2P world.
Amazon Dynamo: distributed storage system over hosts belonging to the
data center

highly scalable

simple: key/value put/get

guarantees service level agreement

exploits Distributed Hash Tables

trade-off availabilty/strong consistency
Facebook Cassandra
Dipartimento di Informatica
Università degli Studi di Pisa
Introduction
Laura Ricci
75
SCIENTIFIC CHALLANGES



The development of P2P applications is a challange and requires the
solution of several novel problems
The classic metodologies for the development of distributed systems of
the “old generation” cannot be exploited:

the order of magnitude of a P2P system is different with respect to
that of classical systems (milion of nodes with respect to hundreds of
nodes)

classical algorithms/techniques classiche “do not scale” on network
of this size

The failure of one of the nodes is not a rare event, rather a normal
event
Systems with dimension and with this dynamicity level require new tools

probabilistic algorithms

statistical analysis of complex networks

game theory: strategies for the peer cooperation

development of new computational models
Dipartimento di Informatica
Università degli Studi di Pisa
Introduction
Laura Ricci
76