A note of caution regarding distances on graphs
published: July 20, 2010, recorded: June 2010, views: 4547
Report a problem or upload filesIf you have found a problem with this lecture or would like to send us extra material, articles, exercises, etc., please use our ticket system to describe your request and upload the data.
Enter your e-mail into the 'Cc' field, and we will keep you updated with your request's status.
Non-geometric data is often represented in form of a graph where edges represent similarity or local relationships between instances. One elegant way to exploit the global structure of the graph is implemented by the commute distance (also known as resistance distance). Supposedly it has the property that vertices from the same cluster are "close" to each other whereas vertices from different clusters are "far" from each other. We study the behavior of the commute distance as the size of the underlying graph increases. We prove that the commute distance converges to an expression that does not take into account the structure of the graph and that is completely meaningless as a distance function on the graph. Consequently, the use of the raw commute distance for machine learning purposes is strongly discouraged for large graphs and in high dimensions. We suspect that a similar behavior holds for several other distances on graphs.
Link this pageWould you like to put a link to this lecture on your homepage?
Go ahead! Copy the HTML snippet !