Lecture 1 - Introduzione.pptx

(Laboratory of) Advanced Information Systems
Giuseppe Manco

Acknowledgements
•  The material for this course is based on the following sources
  –  M. E. J. Newman
    •  The structure and function of complex networks, SIAM Review 45, 167-256 (2003)
    •  http://arxiv.org/abs/cond-mat/0303516
  –  Lada A. Adamic
    •  http://open.umich.edu/education/si/si508/fall2008
  –  Albert-László Barabási
    •  http://barabasilab.neu.edu/courses/phys5116/
  –  David Easley, Jon Kleinberg
    •  Networks, Crowds, and Markets, Cambridge University Press (2010)
    •  http://www.cs.cornell.edu/home/kleinber/networks-book/
  –  … and much other material, which will be credited in detail
  –  Special thanks to these researchers, without whose work structuring these presentations would have been extremely laborious

Outline
•  Network data
•  (Review of) graph theory
•  Visual analytics
•  Mathematical models of networks

Outline
•  Evolution
•  Centrality, importance, cohesion
•  Communities
•  Block models

Outline
•  Homophily and influence
•  Trust
•  Link Prediction
•  Recommendation
•  Topic Models
•  Entity Resolution
•  Collective intelligence

Objectives
•  Introduce properties, models and tools for modelling and analysing large real-life networks
•  Find patterns, rules, clusters, outliers, …
  –  In static and dynamic graphs
•  Structure and dynamics

Why networks?
•  Networks are in every domain
  –  A network is any collection of objects in which some pairs of these objects are connected by links
  –  Behind every complex system there is a network that defines the interactions among its components
•  Different focuses
  –  Initially, analysis of single graphs and of the properties of individual vertices and edges
  –  Computers and communication networks have led to networks with millions (billions) of vertices
    •  Structure and dynamics
  –  Networks with millions of nodes require ad hoc analytical techniques

Why networks?
•  What do we gain from modelling networks?
  –  Patterns and statistical properties of the data
  –  Models and design principles
  –  Understanding the intrinsic organization of networks, and hence predicting the behaviour of complex systems

Notation
"Network" ≡ "Graph"
[Figure labels: node, edge]

                    points      lines
math                vertices    edges, arcs
computer science    nodes       links
physics             sites       bonds
sociology           actors      ties, relations
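
As a minimal sketch of the notation above, a network can be stored as an adjacency structure built from a list of edges. The helper `build_graph` and the four node names are hypothetical, chosen only for illustration; they do not come from the course material.

```python
from collections import defaultdict


def build_graph(edges, directed=False):
    """Build an adjacency map {node: set of neighbours} from (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v]  # touch v so it is registered as a node even without out-links
        if not directed:
            adj[v].add(u)
    return dict(adj)


# Hypothetical toy network: four actors and their ties.
edges = [("anna", "bruno"), ("bruno", "carla"), ("carla", "anna"), ("carla", "dario")]
graph = build_graph(edges)

n = len(graph)                                      # number of nodes (vertices)
m = sum(len(nbrs) for nbrs in graph.values()) // 2  # number of undirected edges (links)
print(n, m)
```

Whatever the field calls them (vertices and edges, nodes and links, actors and ties), the same two collections describe the network.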
Types of networks
•  Social networks
•  Information networks
•  Technological networks
•  Biological networks

Social networks
•  A set of people or groups of people with some pattern of contact or interaction among them
  –  Patterns of friendship between individuals
  –  Business relationships between organizations
  –  Family ties
  –  …
Figure 1.1: The social network of friendships within a 34-person karate club [421].
Source: Easley, Kleinberg: Networks, Crowds, and Markets, Cambridge University Press (2010)
Groups and connections among schoolchildren (legend: boys, girls)
Source: An Attraction Network in a Fourth Grade Class (Moreno, 'Who shall survive?', 1934).

Map of sexual contacts
Source: The structure and function of complex networks, M. E. J. Newman, SIAM Review 45, 167-256 (2003)

The edges represent the relationships established over a period of 6 months
Source: T.A.B. Snijders, Social Network Analysis
Social network analysis examples: Political/Financial Networks
•  Mark Lombardi: tracked and mapped global financial fiascos in the 1980s and 1990s (committed suicide 2000)
•  searched public sources such as news articles
•  drew networks by hand (some drawings as wide as 10ft)
•  Book: Hobbs, Robert. Mark Lombardi: global networks / Robert Hobbs. New York: Independent Curators International, c2003.
The structure of an organization (legend: departments, consultants, external experts)
Source: www.orgnet.com

Map of the connections between companies that share board members
Source: http://mappadelpotere.casaleggio.it/

•  Friendster
"Vizster: Visualizing Online Social Networks." Jeffrey Heer and danah boyd. IEEE Symposium on Information Visualization (InfoVis 2005).
Facebook's "social graph"

"Six degrees of Mohammed Atta"
Uncloaking Terrorist Networks, by Valdis Krebs
Small World
•  Milgram's experiment
  –  HOW TO TAKE PART IN THIS STUDY
  –  1. ADD YOUR NAME TO THE ROSTER AT THE BOTTOM OF THIS SHEET, so that the next person who receives this letter will know who it came from.
  –  2. DETACH ONE POSTCARD. FILL IT AND RETURN IT TO HARVARD UNIVERSITY. No stamp is needed. The postcard is very important. It allows us to keep track of the progress of the folder as it moves toward the target person.
  –  3. IF YOU KNOW THE TARGET PERSON ON A PERSONAL BASIS, MAIL THIS FOLDER DIRECTLY TO HIM (HER). Do this only if you have previously met the target person and know each other on a first name basis.
  –  4. IF YOU DO NOT KNOW THE TARGET PERSON ON A PERSONAL BASIS, DO NOT TRY TO CONTACT HIM DIRECTLY. INSTEAD, MAIL THIS FOLDER (POST CARDS AND ALL) TO A PERSONAL ACQUAINTANCE WHO IS MORE LIKELY THAN YOU TO KNOW THE TARGET PERSON. You may send the folder to a friend, relative or acquaintance, but it must be someone you know on a first name basis.
•  Most of the letters were lost
  –  However, a quarter of the letters reached their destination after 6 hand-offs
•  6 degrees of separation

Information Networks
•  Knowledge-based networks
•  They convey information and links
Citation networks
[Figure: citation network (left) and World-Wide Web (right); FIG. 4, "The two best studied information networks"]
Email exchanges among 436 employees of Hewlett Packard Research Lab
Source: http://www-personal.umich.edu/~ladamic/img/hplabsemailhierarchy.jpg
Networks in the real world
[Figure panels: citation network, World-Wide Web]
FIG. 4 The two best studied information networks. Left: the citation network of academic papers in which the vertices are papers and the directed edges are citations of one paper by another. Since papers can only cite those that came before them (lower down in the figure) the graph is acyclic: it has no closed loops. Right: the World Wide Web, a network of text pages accessible over the Internet, in which the vertices are pages and the directed edges are hyperlinks.
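
The acyclicity property described in the FIG. 4 caption can be checked mechanically. The sketch below uses Kahn's algorithm (a standard technique, not one named in the slides) on two tiny hypothetical graphs: papers that cite only earlier work, and two web pages that link to each other. The function name `is_acyclic` and all data are illustrative assumptions.

```python
from collections import deque


def is_acyclic(links):
    """Return True if the directed graph {vertex: [targets]} has no closed loop.

    Kahn's algorithm: repeatedly remove vertices that have no incoming edges;
    the graph is acyclic exactly when every vertex can be removed this way.
    """
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    in_degree = {v: 0 for v in nodes}
    for targets in links.values():
        for v in targets:
            in_degree[v] += 1

    queue = deque(v for v in nodes if in_degree[v] == 0)
    removed = 0
    while queue:
        u = queue.popleft()
        removed += 1
        for v in links.get(u, ()):
            in_degree[v] -= 1
            if in_degree[v] == 0:
                queue.append(v)
    return removed == len(nodes)


# Hypothetical papers: each one cites only earlier work.
papers = {"p3": ["p1", "p2"], "p2": ["p1"], "p1": []}
# Hypothetical web pages: hyperlinks can point back and close a loop.
pages = {"home": ["about"], "about": ["home"]}

print(is_acyclic(papers), is_acyclic(pages))
```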
[Figure labels: Stanford, MIT]
homophily: what attributes are predictive of friendship?
group cohesion
Source: Lada A. Adamic and Eytan Adar, 'Friends and neighbors on the web', Social Networks, 25(3):211-230, July 2003.

•  Wordnet
Source: http://wordnet.princeton.edu/man/wnlicens.7WN

Movie ratings: preferential graphs
Source: https://wiki.cs.umd.edu/cmsc734_09/index.php?title=Analysis_of_MovieLens_rating_network_using_a_novel_Bipartite_Graph_Layout

Technological networks
•  Networks typically built to distribute some resource
  –  Electricity
  –  Information

Power grid

Airline connection network
Source: Northwest Airlines WorldTraveler Magazine

Railways
Source: TRTA, March 2003 - Tokyo rail map

Internet
Source: Bill Cheswick http://www.cheswick.com/ches/map/gallery/index.html

Biological Networks
•  Metabolic pathways
  –  Metabolic substrates and products
    •  Edges represent a metabolic reaction that turns a given substrate into a product
•  Protein interaction network
•  Gene regulatory networks
•  Food chain

Metabolic network
Source: http://capsid.msu.montana.edu/douglasgroup/index.php/complex-chemical-networks/17-metabolic-networks.html

Protein interactions in yeast

•  Gene regulatory networks
  –  The interaction between genes determines complexity
  –  Can we predict what the inhibition of a gene will produce?
Source: http://www.zaik.uni-koeln.de/bioinformatik/regulatorynets.html.en
[Figure labels: Homo Sapiens, Drosophila Melanogaster]

Complex systems
Many elements connected by different interactions.
The food chain
Source: Newman, The structure and function of complex networks, SIAM Review 45, 167-256 (2003)

Approaches to network analysis
Traditional questions:
•  Social Networks: Who is the most important person in the network?
•  Graph Theory: Is there a cycle in the network that visits every node exactly once?
•  Complex Systems: What fraction of nodes must be removed to disconnect the graph? What kinds of structures emerge from (almost) simple evolution rules?
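
The complex-systems question above (what fraction of nodes must be removed to disconnect the graph?) can be explored with a small experiment. This is only a sketch under stated assumptions: nodes are removed greedily in decreasing order of degree, which is an illustrative choice and not an optimal or canonical attack strategy, and the star network is hypothetical.

```python
def is_connected(adj, removed=frozenset()):
    """Check whether the undirected graph stays connected once `removed` nodes are deleted."""
    remaining = [v for v in adj if v not in removed]
    if not remaining:
        return True  # convention used here: the empty graph counts as connected
    seen = {remaining[0]}
    stack = [remaining[0]]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in removed and v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == len(remaining)


def fraction_to_disconnect(adj):
    """Remove nodes by decreasing degree until the graph disconnects.

    Returns the fraction of nodes removed at that point, or None if this
    removal order never disconnects the graph.
    """
    order = sorted(adj, key=lambda v: len(adj[v]), reverse=True)
    removed = set()
    for v in order:
        removed.add(v)
        if not is_connected(adj, removed):
            return len(removed) / len(adj)
    return None


# Hypothetical star network: one hub linked to four leaves.
star = {"hub": {"a", "b", "c", "d"}, "a": {"hub"}, "b": {"hub"}, "c": {"hub"}, "d": {"hub"}}
print(fraction_to_disconnect(star))
```

Ordering by degree also gives a crude answer to the social-network question: in the star, the hub is the "most important" node, and removing it alone breaks the network apart.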
Research areas
Structural complexity
•  Connections can be simple or complex
•  The network can include different classes of nodes
•  Edges can be heterogeneous, with different weights, directions and signs
Dynamic complexity
•  Dynamics on the network: processes that take place on the network. Examples: spread of disease, synchronization
•  Dynamics of the network: how does the network evolve?

Some numbers…
class         | network               | type       | n           | m             | z      | ℓ     | α       | C(1)  | C(2)  | r      | Ref(s).
social        | film actors           | undirected | 449 913     | 25 516 482    | 113.43 | 3.48  | 2.3     | 0.20  | 0.78  | 0.208  | 20, 416
social        | company directors     | undirected | 7 673       | 55 392        | 14.44  | 4.60  | –       | 0.59  | 0.88  | 0.276  | 105, 323
social        | math coauthorship     | undirected | 253 339     | 496 489       | 3.92   | 7.57  | –       | 0.15  | 0.34  | 0.120  | 107, 182
social        | physics coauthorship  | undirected | 52 909      | 245 300       | 9.27   | 6.19  | –       | 0.45  | 0.56  | 0.363  | 311, 313
social        | biology coauthorship  | undirected | 1 520 251   | 11 803 064    | 15.53  | 4.92  | –       | 0.088 | 0.60  | 0.127  | 311, 313
social        | telephone call graph  | undirected | 47 000 000  | 80 000 000    | 3.16   |       | 2.1     |       |       |        | 8, 9
social        | email messages        | directed   | 59 912      | 86 300        | 1.44   | 4.95  | 1.5/2.0 |       | 0.16  |        | 136
social        | email address books   | directed   | 16 881      | 57 029        | 3.38   | 5.22  | –       | 0.17  | 0.13  | 0.092  | 321
social        | student relationships | undirected | 573         | 477           | 1.66   | 16.01 | –       | 0.005 | 0.001 | −0.029 | 45
social        | sexual contacts       | undirected | 2 810       |               |        |       | 3.2     |       |       |        | 265, 266
information   | WWW nd.edu            | directed   | 269 504     | 1 497 135     | 5.55   | 11.27 | 2.1/2.4 | 0.11  | 0.29  | −0.067 | 14, 34
information   | WWW Altavista         | directed   | 203 549 046 | 2 130 000 000 | 10.46  | 16.18 | 2.1/2.7 |       |       |        | 74
information   | citation network      | directed   | 783 339     | 6 716 198     | 8.57   |       | 3.0/–   |       |       |        | 351
information   | Roget's Thesaurus     | directed   | 1 022       | 5 103         | 4.99   | 4.87  | –       | 0.13  | 0.15  | 0.157  | 244
information   | word co-occurrence    | undirected | 460 902     | 17 000 000    | 70.13  |       | 2.7     |       | 0.44  |        | 119, 157
technological | Internet              | undirected | 10 697      | 31 992        | 5.98   | 3.31  | 2.5     | 0.035 | 0.39  | −0.189 | 86, 148
technological | power grid            | undirected | 4 941       | 6 594         | 2.67   | 18.99 | –       | 0.10  | 0.080 | −0.003 | 416
technological | train routes          | undirected | 587         | 19 603        | 66.79  | 2.16  | –       |       | 0.69  | −0.033 | 366
technological | software packages     | directed   | 1 439       | 1 723         | 1.20   | 2.42  | 1.6/1.4 | 0.070 | 0.082 | −0.016 | 318
technological | software classes      | directed   | 1 377       | 2 213         | 1.61   | 1.51  | –       | 0.033 | 0.012 | −0.119 | 395
technological | electronic circuits   | undirected | 24 097      | 53 248        | 4.34   | 11.05 | 3.0     | 0.010 | 0.030 | −0.154 | 155
technological | peer-to-peer network  | undirected | 880         | 1 296         | 1.47   | 4.28  | 2.1     | 0.012 | 0.011 | −0.366 | 6, 354
biological    | metabolic network     | undirected | 765         | 3 686         | 9.64   | 2.56  | 2.2     | 0.090 | 0.67  | −0.240 | 214
biological    | protein interactions  | undirected | 2 115       | 2 240         | 2.12   | 6.80  | 2.4     | 0.072 | 0.071 | −0.156 | 212
biological    | marine food web       | directed   | 135         | 598           | 4.43   | 2.05  | –       | 0.16  | 0.23  | −0.263 | 204
biological    | freshwater food web   | directed   | 92          | 997           | 10.84  | 1.90  | –       | 0.20  | 0.087 | −0.326 | 272
biological    | neural network        | directed   | 307         | 2 359         | 7.68   | 3.97  | –       | 0.18  | 0.28  | −0.226 | 416, 421

TABLE II Basic statistics for a number of published networks. The properties measured are: type of graph, directed or undirected; total number of vertices n; total number of edges m; mean degree z; mean vertex–vertex distance ℓ; exponent α of degree distribution if the distribution follows a power law (or "–" if not; in/out-degree exponents are given for directed graphs); clustering coefficient C(1) from Eq. (3); clustering coefficient C(2) from Eq. (6); and degree correlation coefficient r, Sec. III.F. The last column gives the citation(s) for the network in the bibliography. Blank entries indicate unavailable data.
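
Two of the columns in TABLE II are easy to reproduce by hand. The sketch below assumes the usual relations z = 2m/n for undirected graphs and z = m/n for directed ones, which the table's values appear consistent with, and computes a mean vertex–vertex distance by breadth-first search on a hypothetical three-vertex path. The averaging convention used here (ordered pairs of distinct reachable vertices) is an assumption and may differ in detail from the definition behind the published ℓ values.

```python
from collections import deque


def mean_degree(n, m, directed):
    """Mean degree z: an undirected edge counts for two vertices, a directed edge for one."""
    return m / n if directed else 2 * m / n


def mean_distance(adj):
    """Average shortest-path length over ordered pairs of distinct reachable vertices."""
    total, pairs = 0, 0
    for source in adj:
        # Breadth-first search gives shortest distances from `source`.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


# n and m copied from TABLE II.
print(round(mean_degree(449_913, 25_516_482, directed=False), 2))  # film actors
print(round(mean_degree(59_912, 86_300, directed=True), 2))        # email messages

# Hypothetical three-vertex path a - b - c.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
print(mean_distance(path))
```

Small values of ℓ relative to n are the quantitative form of the small-world effect seen in Milgram's experiment.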
The Structure and Dynamics of Networks

Model of the spread of the H1N1 pandemic
Source: Newman, Barabási, Watts, The Structure and Dynamics of Networks

The evolution of social networks
[TABLE II is repeated on this slide]
Source: Newman, Barabási, Watts, The Structure and Dynamics of Networks
A matter of scale
•  How do massive networks relate to studies at "manageable" scales?
  –  Added value: global phenomena that are invisible at small scale become observable
  –  Issues: significance at the local level. It is easy to measure; it is hard to understand what to measure