Processing Linked Data at Warp Speed
published: Oct. 19, 2014, recorded: September 2014, views: 1816
The Web of Data has grown immensely over the past years. From only one dataset in 2007, the linked portion of the Open Data Cloud grew to over 31 billion triples by 2011. That figure covers only what is usually shown in the diagrams; a plethora of open datasets published by individuals, organizations, and governments all over the world is usually not shown. Given this growth, the question arises of how to process these data. Even at 10,000 triples per second, it would still take more than 861 hours to process the whole cloud, so algorithms that traverse the linked data cloud using conventional methods are going to be slow.
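The 861-hour figure follows directly from the two numbers quoted above. A minimal sketch of the arithmetic, in which the function and constant names are illustrative and not taken from the talk:

```python
def hours_to_process(total_triples: int, triples_per_second: int) -> float:
    """Return the hours needed to touch every triple once at a fixed rate."""
    seconds = total_triples / triples_per_second
    return seconds / 3600


# Figures quoted in the abstract: 31 billion triples, 10,000 triples per second.
LOD_TRIPLES = 31_000_000_000
RATE = 10_000

hours = hours_to_process(LOD_TRIPLES, RATE)
print(f"{hours:.1f} hours, about {hours / 24:.1f} days")
# prints "861.1 hours, about 35.9 days"
```

This assumes a single sequential pass over the data at a constant rate, so it is a lower bound for any algorithm that needs more than one pass.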
In this talk I will present two methods for processing large numbers of triples. First, I will introduce the distributed graph-processing framework Signal/Collect, which can process billions of edges in seconds, and highlight its usefulness in three application scenarios. Second, I will briefly touch upon the needs and challenges of processing large graphs as data streams, where the data is not stored in full and only the portions necessary for processing are kept.
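To give a feel for the vertex-centric style that the framework's name suggests, here is a toy, single-machine sketch in plain Python. It is not the Signal/Collect API (the real framework is a distributed Scala system); the function `sssp_signal_collect`, the graph layout, and the synchronous scheduling are all assumptions made for illustration. Vertices send signals along their outgoing edges, and each receiving vertex collects those signals into a new state, here to compute single-source shortest paths with non-negative edge weights:

```python
import math


def sssp_signal_collect(edges, source):
    """Toy synchronous signal/collect loop for single-source shortest paths.

    edges maps a vertex to a list of (target, weight) pairs; weights are
    assumed to be non-negative so that the loop terminates.
    """
    vertices = set(edges)
    for targets in edges.values():
        for target, _ in targets:
            vertices.add(target)

    state = {v: math.inf for v in vertices}
    state[source] = 0.0
    active = {source}  # vertices whose state changed and that must signal

    while active:
        # Signal phase: each active vertex sends (own state + edge weight)
        # along every outgoing edge.
        inbox = {}
        for v in active:
            for target, weight in edges.get(v, []):
                inbox.setdefault(target, []).append(state[v] + weight)

        # Collect phase: each receiver keeps the smallest distance seen so far.
        active = set()
        for v, signals in inbox.items():
            best = min(signals)
            if best < state[v]:
                state[v] = best
                active.add(v)

    return state


graph = {"a": [("b", 1.0), ("c", 4.0)], "b": [("c", 2.0)], "c": [("d", 1.0)]}
distances = sssp_signal_collect(graph, "a")
print(distances["d"])  # prints 4.0
```

Only vertices whose state changed in the previous step signal again, so the work done per step shrinks as the computation converges.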
Download slides: eswc2014_bernstein_processing_linked_data_01.pdf (83.0 MB)