Inference and Learning with Networked Data
Description
In many applications we would like to draw inferences about entities that are
interconnected in complex networks. For example, calls, emails, IM, and web
pointers link people into huge social networks. However, traditional statistical and
machine learning classification methods assume that entities are independent of each
other. I start by discussing various applications of "classification" (scoring) in
networked data, from fraud detection to counterterrorism to network-based marketing.
I then discuss four characteristics of networked data that allow improvements--
sometimes substantial--over traditional classification: (i) models can take into account
"guilt by association," (ii) inference can be performed "collectively," whereby
inferences on linked entities mutually reinforce each other, (iii) characteristics of
linked entities can be incorporated in models, and (iv) models can incorporate specific
identifiers, such as the identities of particular individuals, to improve inference. I
present results demonstrating the effectiveness of these techniques.
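As a rough illustration of points (i) and (ii), the sketch below scores each unlabeled node by a weighted vote of its neighbors and repeats the vote until the scores settle. It is written in the spirit of the wvRN classifier named in the slides, but it is a simplified sketch and not the speaker's implementation: the function name `wvrn_collective`, the toy graph, the edge weights, and the fixed iteration count are all assumptions made for illustration.

```python
from collections import defaultdict


def wvrn_collective(edges, known, n_iter=50):
    """Estimate P(label == 1) for every linked node from its neighbors.

    edges: iterable of (u, v, weight) undirected links, weights > 0.
    known: non-empty dict mapping node -> 0 or 1 for observed labels.
    (Both argument shapes are assumptions of this sketch.)
    """
    neighbors = defaultdict(list)
    for u, v, w in edges:
        neighbors[u].append((v, w))
        neighbors[v].append((u, w))

    # Unlabeled nodes start at the base rate of the observed labels.
    prior = sum(known.values()) / len(known)
    score = {n: float(known.get(n, prior)) for n in neighbors}

    for _ in range(n_iter):
        updated = {}
        for node, nbrs in neighbors.items():
            if node in known:
                # Observed labels stay clamped to their true value.
                updated[node] = float(known[node])
                continue
            total = sum(w for _, w in nbrs)
            # Guilt by association: weighted average of neighbor scores.
            updated[node] = sum(w * score[v] for v, w in nbrs) / total
        # Collective inference: all nodes are re-estimated together,
        # so each estimate feeds into its neighbors on the next pass.
        score = updated
    return score


# Hypothetical toy chain a - b - c - d with a labeled 1 and d labeled 0.
toy_edges = [("a", "b", 1.0), ("b", "c", 1.0), ("c", "d", 1.0)]
toy_scores = wvrn_collective(toy_edges, {"a": 1, "d": 0})
```

On this toy chain, `b` settles at a higher score than `c` because it sits next to the positively labeled node, even though neither has any attributes of its own. The estimate uses only the link structure and the known labels.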
| Time | Slide |
| 0:00 | Inference and Learning with Networked Data |
| 0:16 | Modeling for prediction using networked data |
| 1:50 | Prediction in networked data |
| 3:11 | Prediction tasks in networked data (cf. Getoor Tutorial 2005) |
| 4:40 | Modeling for prediction |
| 5:05 | The problem: Prediction in Networked Data (1) |
| 5:24 | The problem: Prediction in Networked Data (2) |
| 6:22 | The problem: Prediction in Networked Data (3) |
| 6:55 | The problem: Prediction in Networked Data (4) |
| 7:21 | The problem: Prediction in Networked Data (5) |
| 9:18 | The problem: Prediction in Networked Data (6) |
| 9:43 | Example social network application: Target consumers for new product |
| 13:04 | Sales rates are substantially higher for “network neighbors” |
| 17:13 | More-sophisticated network-based attributes? |
| 17:42 | Cumulative % of Consumers Targeted (Ranked by Predicted Sales) |
| 17:53 | Example social network application: Ecommerce firms increasingly are collecting data on explicit social networks of consumers (1) |
| 19:18 | Example social network application: Ecommerce firms increasingly are collecting data on explicit social networks of consumers (2) |
| 19:31 | So, what’s different about networked data? |
| 19:44 | Unique Characteristics of Networked Data (for predictive inference) (1) |
| 20:10 | Unique Characteristics of Networked Data (for predictive inference) (2) |
| 20:44 | Guilt by association: autocorrelation relationship between labels* of neighboring nodes |
| 20:50 | How can predictive models incorporate network autocorrelation? (Part 0) |
| 24:05 | How can predictive models incorporate network autocorrelation? (Part 1) |
| 28:01 | Some univariate network classification techniques (see Macskassy & P. JMLR 2007) |
| 31:28 | How can predictive models incorporate network autocorrelation? (Part 2) |
| 34:04 | How can predictive models incorporate network autocorrelation? (Part 2, cont.) |
| 35:25 | How can predictive models incorporate network autocorrelation? (Part 2, cont.) |
| 37:59 | Is guilt-by-association justified theoretically? (1) |
| 38:59 | Is guilt-by-association justified theoretically? (2) |
| 39:33 | Is guilt-by-association justified theoretically? (3) |
| 42:01 | Is guilt-by-association justified theoretically? (4) |
| 42:42 | Is guilt-by-association justified theoretically? (5) |
| 43:18 | Unique Characteristics of Networked Data (for predictive inference) (1) |
| 44:03 | Unique Characteristics of Networked Data (for predictive inference) (2) |
| 44:10 | Various techniques for collective inference (see also Jensen et al. KDD 2004) |
| 46:50 | Collective inference cartoon: (1) |
| 47:12 | Collective inference cartoon: (2) |
| 47:47 | Collective inference cartoon: (3) |
| 48:10 | Collective inference cartoon: (4) |
| 48:11 | Collective inference cartoon: (5) |
| 48:12 | Collective inference cartoon: (6) |
| 49:39 | Collective inference cartoon: (7) |
| 49:47 | Recall network-based marketing example? |
| 50:26 | Collective inference gives additional improvement, especially for non-network neighbors |
| 53:29 | So, how much “information” is in the network structure alone? |
| 54:04 | Network Classification Case Study |
| 56:04 | How much information is in the network structure? (1) |
| 57:59 | How much information is in the network structure? (2) |
| 60:42 | Univariate network classification techniques (see Macskassy & Provost 2007) (1) |
| 60:47 | Univariate network classification techniques (see Macskassy & Provost 2007) (2) |
| 60:49 | RBN vs wvRN Classifying linked documents (CoRA data) |
| 63:14 | Machine Learning Research Papers (from CoRA data) (1) |
| 64:00 | Machine Learning Research Papers (from CoRA data) (2) |
| 75:05 | Unique Characteristics of Networked Data (for predictive inference) |
| 75:11 | Networks ≠ Graphs? (1) |
| 75:22 | Unique Characteristics of Networked Data (for predictive inference) |
| 75:34 | Machine Learning Research Papers (from CoRA data) (2) |
| 75:45 | Unique Characteristics of Networked Data (for predictive inference) |
| 75:53 | Networks ≠ Graphs? (1) |
| 76:36 | Networks ≠ Graphs? (2) |
| 76:58 | Detecting “bad brokers” (NASD) (Neville et al. KDD 2005) |
| 78:00 | Data on brokers, branches, disclosures (Neville et al. KDD 2005) |
| 78:39 | Relational Learning |
| 79:57 | Traditional Learning and Classification |
| 80:14 | Network Learning and Classification |
| 81:04 | Logic modeling |
| 82:33 | Network data in first-order logic |
| 85:49 | Probabilistic graphical models |
| 86:40 | Example: A Bayesian network modeling consumer reaction to new service |
| 88:41 | Probabilistic relational models |
| 90:11 | Relational prob. model of broker variables (Neville & Jensen, JMLR to appear) |
| 91:38 | Important concept! |
| 92:45 | Recall: broker dependency network |
| 92:58 | Model unrolled on (tiny) data network |
| 93:16 | Putting it all together: Relational dependency networks (Neville & Jensen, JMLR 2007) |
| 93:44 | Model unrolled on (tiny) data network |
| 94:42 | Combining first-order logic and probabilistic graphical models (1) |
| 96:03 | Combining first-order logic and probabilistic graphical models (2) |
| 98:07 | Combining first-order logic and probabilistic graphical models (1) |
| 99:06 | Combining first-order logic and probabilistic graphical models (2) |
| 102:12 | A snippet from an actual network including “bad guys” |
| 103:01 | Side note: not just for “networked data” – id’s important for any data in a multi-table RDB |
| 103:44 | How to incorporate identifiers of related objects (in a nutshell) |
| 104:49 | Density Estimation for Aggregation |
| 108:38 | Classify buyers of most-common title from a Korean E-Book retailer |
| 110:19 | Machine Learning Research Papers (from CoRA data) |
| 111:24 | (recall CoRA from discussion of univariate network models) Using identifiers on CoRA |
| 113:33 | Summary: Unique Characteristics of Networked Data (for predictive inference) |
| 114:20 | http://pages.stern.nyu.edu/~fprovost/ |