Lecture 1 - Introduzione.pptx
Transcript
(Laboratory of) Advanced Information Systems
Giuseppe Manco

Acknowledgements
• The material for this course is based on the following sources:
  – M. E. J. Newman
    • The structure and function of complex networks, SIAM Review 45, 167-256 (2003)
    • http://arxiv.org/abs/cond-mat/0303516
  – Lada A. Adamic
    • http://open.umich.edu/education/si/si508/fall2008
  – Albert-László Barabási
    • http://barabasilab.neu.edu/courses/phys5116/
  – David Easley, Jon Kleinberg
    • Networks, Crowds, and Markets, Cambridge University Press (2010)
    • http://www.cs.cornell.edu/home/kleinber/networks-book/
  – … and much more material, which will be cited in detail as we go
  – Special thanks to these researchers: without their work, structuring these presentations would have been extremely laborious

Outline
• Network data
• Graph theory (review)
• Visual analytics
• Mathematical models of networks

Outline
• Evolution
• Centrality, importance, cohesion
• Communities
• Block models

Outline
• Homophily and influence
• Trust
• Link prediction
• Recommendation
• Topic models
• Entity resolution
• Collective intelligence

Objectives
• Introduce properties, models and tools for modeling and analyzing large real-life networks
• Find patterns, rules, clusters, outliers, …
  – In static and dynamic graphs
• Structure and dynamics

Why networks?
• Networks are found in every domain
  – A network is any collection of objects in which some pairs of objects are connected by links
  – Behind every complex system there is a network that defines the interactions among its components
• A shifting focus
  – Initially: analysis of single graphs and of the properties of individual vertices and edges
  – Computers and communication networks have led to networks with millions (even billions) of vertices
• Structure and dynamics
  – Networks with millions of nodes require dedicated analytical techniques

Why networks?
• What would we gain from modeling networks?
  – Patterns and statistical properties of the data
  – Models and design principles
  – Understanding the intrinsic organization of networks, and hence predicting the behavior of complex systems

Notation: "Network" ≡ "Graph"
  node        edge              field
  points      lines
  vertices    edges, arcs       math
  nodes       links             computer science
  sites       bonds             physics
  actors      ties, relations   sociology

Types of networks
• Social networks
• Information networks
• Technological networks
• Biological networks

Social networks
• A set of people, or groups of people, with some pattern of contact or interaction among them
  – Friendship patterns among individuals
  – Business relationships between organizations
  – Family relationships
  – …

Figure 1.1: The social network of friendships within a 34-person karate club [421].
Source: Easley, Kleinberg: Networks, Crowds, and Markets, Cambridge University Press (2010)

Groups and connections in a school class (legend: boys, girls)
Source: An Attraction Network in a Fourth Grade Class (Moreno, 'Who shall survive?', 1934).

Map of sexual contagion
Source: The structure and function of complex networks, M. E. J. Newman, SIAM Review 45, 167-256 (2003)

Edges represent relationships established over a 6-month period
Source: T.A.B. Snijders, Social Network Analysis (© Tom A.B. Snijders)

Social network analysis examples: political/financial networks
• Mark Lombardi: tracked and mapped global financial fiascos in the 1980s and 1990s (committed suicide in 2000)
• Searched public sources such as news articles
• Drew networks by hand (some drawings as wide as 10 ft)
• Book: Hobbs, Robert. Mark Lombardi: Global Networks. New York: Independent Curators International, c2003.

The structure of an organization (legend: departments, consultants, external experts)
Source: www.orgnet.com

Map of connections between companies that share board members
Source: http://mappadelpotere.casaleggio.it/

• Friendster
"Vizster: Visualizing Online Social Networks." Jeffrey Heer and danah boyd. IEEE Symposium on Information Visualization (InfoVis 2005).

Facebook's "social graph"!

"Six degrees of Mohammed Atta"
Uncloaking Terrorist Networks, by Valdis Krebs

Small world
• Milgram's experiment
  – HOW TO TAKE PART IN THIS STUDY
  – 1. ADD YOUR NAME TO THE ROSTER AT THE BOTTOM OF THIS SHEET, so that the next person who receives this letter will know who it came from.
  – 2. DETACH ONE POSTCARD. FILL IT AND RETURN IT TO HARVARD UNIVERSITY. No stamp is needed. The postcard is very important. It allows us to keep track of the progress of the folder as it moves toward the target person.
  – 3. IF YOU KNOW THE TARGET PERSON ON A PERSONAL BASIS, MAIL THIS FOLDER DIRECTLY TO HIM (HER). Do this only if you have previously met the target person and know each other on a first name basis.
  – 4. IF YOU DO NOT KNOW THE TARGET PERSON ON A PERSONAL BASIS, DO NOT TRY TO CONTACT HIM DIRECTLY. INSTEAD, MAIL THIS FOLDER (POST CARDS AND ALL) TO A PERSONAL ACQUAINTANCE WHO IS MORE LIKELY THAN YOU TO KNOW THE TARGET PERSON.
    You may send the folder to a friend, relative or acquaintance, but it must be someone you know on a first name basis.
• Most of the letters were lost
  – However, a quarter of the letters reached their destination, after about 6 hand-offs
• 6 degrees of separation

Information networks
• Knowledge-based networks
• They carry information and links

Citation networks
FIG. 4 The two best studied information networks. Left: the citation network of academic papers in which the vertices are papers and the directed edges are citations of one paper by another. Since papers can only cite those that came before them (lower down in the figure) the graph is acyclic: it has no closed loops. Right: the World Wide Web, a network of text pages accessible over the Internet, in which the vertices are pages and the directed edges are hyperlinks.

Email exchanges among 436 employees of the Hewlett-Packard Research Lab
Source: http://www-personal.umich.edu/~ladamic/img/hplabsemailhierarchy.jpg

Stanford, MIT
• Homophily: what attributes are predictive of friendship?
• Group cohesion
Source: Lada A. Adamic and Eytan Adar, 'Friends and neighbors on the web', Social Networks, 25(3):211-230, July 2003.

• WordNet
Source: http://wordnet.princeton.edu/man/wnlicens.7WN

Movie ratings: preferential graphs
Source: https://wiki.cs.umd.edu/cmsc734_09/index.php?title=Analysis_of_MovieLens_rating_network_using_a_novel_Bipartite_Graph_Layout

Technological networks
• Networks typically built to distribute some resource
  – Electricity
  – Information

Power grid

Airline connection network
Source: Northwest Airlines WorldTraveler Magazine

Railways
Source: TRTA, March 2003 - Tokyo rail map

Internet
Source: Bill Cheswick http://www.cheswick.com/ches/map/gallery/index.html

Biological networks
• Metabolic pathways
  – Metabolic substrates and products
    • Edges represent a metabolic reaction that, given a substrate, yields a product
• Protein interaction networks
• Gene regulatory networks
• Food webs

Metabolic network
Source: http://capsid.msu.montana.edu/douglasgroup/index.php/complex-chemical-networks/17-metabolic-networks.html

Protein interactions in yeast

• Gene regulatory networks
  – The interactions among genes determine complexity
  – Can we predict what inhibiting a gene will produce?
Source: http://www.zaik.uni-koeln.de/bioinformatik/regulatorynets.html.en

Homo sapiens, Drosophila melanogaster
Complex systems: many elements connected by different interactions

The food web
Source: Newman, The structure and function of complex networks, SIAM Review 45, 167-256 (2003)

Approaches to network analysis
Traditional questions:
• Social networks: Who is the most important person in the network?
• Graph theory: Is there a cycle in the network that visits every node exactly once?
• Complex systems: What fraction of nodes must be removed to disconnect the graph? What kinds of structure emerge from (nearly) simple rules of evolution?

Research areas
Structural complexity
• Connections can be simple or complex
• The network may contain different classes of nodes
• Edges can be heterogeneous, with different weights, directions and signs
Dynamic complexity
• Dynamics on the network: processes that take place on the network.
  Examples: disease spreading, synchronization
• Dynamics of the network: how does the network evolve?

Some numbers…

Network                 Type                    n              m       z      ℓ        α   C(1)   C(2)       r  Ref(s).
Social
  film actors           undirected        449 913     25 516 482  113.43   3.48      2.3   0.20   0.78   0.208  20, 416
  company directors     undirected          7 673         55 392   14.44   4.60        –   0.59   0.88   0.276  105, 323
  math coauthorship     undirected        253 339        496 489    3.92   7.57        –   0.15   0.34   0.120  107, 182
  physics coauthorship  undirected         52 909        245 300    9.27   6.19        –   0.45   0.56   0.363  311, 313
  biology coauthorship  undirected      1 520 251     11 803 064   15.53   4.92        –  0.088   0.60   0.127  311, 313
  telephone call graph  undirected     47 000 000     80 000 000    3.16             2.1                         8, 9
  email messages        directed           59 912         86 300    1.44   4.95  1.5/2.0          0.16          136
  email address books   directed           16 881         57 029    3.38   5.22        –   0.17   0.13   0.092  321
  student relationships undirected            573            477    1.66  16.01        –  0.005  0.001  −0.029  45
  sexual contacts       undirected          2 810                                    3.2                         265, 266
Information
  WWW nd.edu            directed          269 504      1 497 135    5.55  11.27  2.1/2.4   0.11   0.29  −0.067  14, 34
  WWW Altavista         directed      203 549 046  2 130 000 000   10.46  16.18  2.1/2.7                         74
  citation network      directed          783 339      6 716 198    8.57           3.0/–                         351
  Roget's Thesaurus     directed            1 022          5 103    4.99   4.87        –   0.13   0.15   0.157  244
  word co-occurrence    undirected        460 902     17 000 000   70.13             2.7          0.44          119, 157
Technological
  Internet              undirected         10 697         31 992    5.98   3.31      2.5  0.035   0.39  −0.189  86, 148
  power grid            undirected          4 941          6 594    2.67  18.99        –   0.10  0.080  −0.003  416
  train routes          undirected            587         19 603   66.79   2.16        –          0.69  −0.033  366
  software packages     directed            1 439          1 723    1.20   2.42  1.6/1.4  0.070  0.082  −0.016  318
  software classes      directed            1 377          2 213    1.61   1.51        –  0.033  0.012  −0.119  395
  electronic circuits   undirected         24 097         53 248    4.34  11.05      3.0  0.010  0.030  −0.154  155
  peer-to-peer network  undirected            880          1 296    1.47   4.28      2.1  0.012  0.011  −0.366  6, 354
Biological
  metabolic network     undirected            765          3 686    9.64   2.56      2.2  0.090   0.67  −0.240  214
  protein interactions  undirected          2 115          2 240    2.12   6.80      2.4  0.072  0.071  −0.156  212
  marine food web       directed              135            598    4.43   2.05        –   0.16   0.23  −0.263  204
  freshwater food web   directed               92            997   10.84   1.90        –   0.20  0.087  −0.326  272
  neural network        directed              307          2 359    7.68   3.97        –   0.18   0.28  −0.226  416, 421

TABLE II Basic statistics for a number of published networks. The properties measured are: type of graph, directed or undirected; total number of vertices n; total number of edges m; mean degree z; mean vertex–vertex distance ℓ; exponent α of degree distribution if the distribution follows a power law (or "–" if not; in/out-degree exponents are given for directed graphs); clustering coefficient C(1) from Eq. (3); clustering coefficient C(2) from Eq. (6); and degree correlation coefficient r, Sec. III.F. The last column gives the citation(s) for the network in the bibliography of the source paper. Blank entries indicate unavailable data.
Source: M. E. J. Newman, The structure and function of complex networks, SIAM Review 45, 167-256 (2003)
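The caption defines the columns only in words. The sketch below shows how several of them can be computed, assuming a small, undirected, simple graph stored as a plain Python dict of neighbour sets. The function names and the toy edge list are hypothetical and do not come from the source. It computes n, m, the mean degree z = 2m/n (for the directed rows of the table, z corresponds to m/n), the mean distance ℓ via breadth-first search, C(1) as 3 × triangles / connected triples, and C(2) as the average of the local coefficients C_i. The degree correlation r is not covered. Conventions differ across the literature (for example, whether ℓ includes self-pairs, or how C_i is defined for vertices of degree below 2), so values on small graphs may not match a given paper exactly.

```python
from collections import deque
from itertools import combinations


def build_adjacency(edges):
    """Undirected simple graph as a dict mapping each vertex to its set of neighbours."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # self-loops are ignored in a simple graph
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def edge_count(adj):
    # each undirected edge appears in two neighbour sets
    return sum(len(nbrs) for nbrs in adj.values()) // 2


def mean_degree(adj):
    # z = 2m / n for an undirected graph
    return 2 * edge_count(adj) / len(adj)


def bfs_distances(adj, source):
    """Geodesic (hop) distances from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def mean_distance(adj):
    """Mean geodesic distance over ordered pairs (s, t), s != t, with t reachable from s."""
    total = pairs = 0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0


def local_counts(adj, v):
    """(links among v's neighbours, connected triples centred on v)."""
    nbrs = adj[v]
    k = len(nbrs)
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return links, k * (k - 1) // 2


def clustering_c1(adj):
    # C(1) = 3 * triangles / connected triples;
    # summing neighbour links over all centres counts each triangle 3 times
    closed = triples = 0
    for v in adj:
        links, t = local_counts(adj, v)
        closed += links
        triples += t
    return closed / triples if triples else 0.0


def clustering_c2(adj):
    # C(2) = average of the local C_i; vertices with degree < 2 get C_i = 0 (assumed convention)
    total = 0.0
    for v in adj:
        links, t = local_counts(adj, v)
        total += links / t if t else 0.0
    return total / len(adj)


if __name__ == "__main__":
    # hypothetical toy graph: a triangle 0-1-2 with a tail 2-3-4
    toy = build_adjacency([(0, 1), (1, 2), (0, 2), (2, 3), (3, 4)])
    print("n =", len(toy), " m =", edge_count(toy))
    print(f"z = {mean_degree(toy):.2f}  l = {mean_distance(toy):.2f}")
    print(f"C(1) = {clustering_c1(toy):.3f}  C(2) = {clustering_c2(toy):.3f}")
```

On the toy graph this gives n = 5, m = 5, z = 2.0, ℓ = 17/10 = 1.7, C(1) = 3·1/6 = 0.5 and C(2) = (1 + 1 + 1/3 + 0 + 0)/5 = 7/15 ≈ 0.47. Running a BFS from every vertex costs roughly O(n·m), which is fine for a classroom example but not for the largest networks in the table, where sampled estimates of ℓ are the usual workaround.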
Model of the spread of the H1N1 pandemic
Source: Newman, Barabási, Watts, The Structure and Dynamics of Networks

The evolution of social networks
Source: Newman, Barabási, Watts, The Structure and Dynamics of Networks

A question of scale
• How do massive networks relate to studies at "manageable" scales?
  – Added value: global phenomena that are invisible at small scale become observable
  – Issues: significance at the local level. It is easy to measure; it is hard to understand what to measure