Machine Learning in the Cloud with GraphLab

author: Carlos Guestrin, Computer Science Department, Carnegie Mellon University
published: Jan. 13, 2011,   recorded: December 2010,   views: 960

Slides
0:00 Machine Learning in the Cloud
0:32 In ML we face BIG problems
1:08 Exponential Parallelism
1:43 The Challenges of Parallelism
2:19 Our Current Solution
3:12 MapReduce – Map Phase (1)
3:27 MapReduce – Map Phase (2)
3:31 MapReduce – Map Phase (3)
3:35 MapReduce – Reduce Phase
3:59 MapReduce and ML
4:19 Iterative Algorithms?
5:03 MapAbuse: Iterative MapReduce (1)
5:15 MapAbuse: Iterative MapReduce (2)
5:31 Data-Parallel Algorithms can be Inefficient
7:11 What about Structured Problems?
8:07 Parallel Computing
8:52 Common Properties
9:40 Gibbs Sampling
10:41 GraphLab is the Solution
11:45 A New Framework for Parallel Machine Learning
11:55 GraphLab
12:48 Part 1: Data Graph
13:25 Update Functions
14:03 Update Function Schedule
14:20 Part 2: Update Function Schedule
14:32 Need for Dynamic Scheduling
15:04 Dynamic Schedule (1)
15:27 Dynamic Schedule (2)
16:31 Global Information
16:55 Part 3: Shared Data Table (SDT)
17:15 Sync Operation
17:56 Shared Data Table (SDT)
18:24 Safety and Consistency
18:43 Write-Write Race
19:51 Race Conditions + Deadlocks
20:11 Part 4: Scope Rules
21:04 Full Consistency
21:19 Obtaining More Parallelism
21:44 Edge Consistency
21:51 Obtaining More Parallelism
22:07 Thm: Sequential Consistency
23:18 GraphLab
23:47 Multicore Experiments
23:51 Multicore Experiments - Shared memory
24:11 Graphical Model Learning
24:59 Graphical Model Learning
26:27 Lasso
28:05 Full Consistency
28:33 Relaxing Consistency
29:08 CoEM (Rosie Jones, 2005) (1)
29:15 Relaxing Consistency
29:27 CoEM (Rosie Jones, 2005) (1)
31:00 CoEM (Rosie Jones, 2005) (2)
31:50 GraphLab in the Cloud
32:07 Moving towards the cloud…
32:59 Addressing cloud computing challenges
34:03 GraphLab in the Cloud Experiments
34:49 Experiment Setup
35:20 CoEM (Rosie Jones, 2005) (3)
38:51 Video Cosegmentation (1)
39:17 Video Cosegmentation (2)
39:47 Video Cosegmentation (3)
40:19 Video Co-Segmentation
40:52 Cost-Time Tradeoff
42:04 Bayesian Tensor Factorization (1)
42:39 Bayesian Tensor Factorization (2)
43:40 Parallel GraphLab 1.1: Multicore Available Today, GraphLab in the Cloud soon…
44:01 GraphLab Release 1.1
44:23 C++, Java and Python
44:35 Matlab Interface (1)
44:52 Matlab Interface (2)
45:17 GraphLab (1)
45:33 GraphLab (2)
45:52 Future Work
46:13 Parallel GraphLab 1.1: Multicore Available Today, GraphLab in the Cloud soon…
46:27 Questions
51:43 Bayesian Tensor Factorization (2)

Description

Exponentially increasing dataset sizes have driven machine learning (ML) experts to explore parallel and distributed computing for their research. At the same time, cloud computing resources such as Amazon EC2 have become increasingly available, providing cheap and scalable platforms for large-scale computation. However, because distributed design is complex, it can be difficult for ML researchers to take full advantage of cloud resources. Existing high-level parallel abstractions like MapReduce are insufficiently expressive, while low-level tools like MPI and Pthreads leave ML experts repeatedly solving the same design challenges.

By targeting common patterns in ML, we developed GraphLab, which compactly expresses asynchronous iterative algorithms with sparse computational dependencies, while ensuring data consistency and achieving a high degree of parallel performance. We demonstrate the expressiveness of the GraphLab framework by designing and implementing parallel versions of a variety of ML tasks, including learning graphical models with approximate inference, Gibbs sampling, tensor factorization, Co-EM, Lasso and Compressed Sensing. We show that GraphLab achieves excellent parallel performance on large-scale real-world problems, and we demonstrate the scalability of these implementations on Amazon EC2, using up to 256 processors.
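
The pattern described above (per-vertex update functions on a sparse data graph, driven by a dynamic schedule) can be illustrated with a small sketch. This is not the GraphLab API: it is a sequential, single-threaded Python illustration, and every name in it (`run_dynamic_schedule`, `pagerank_update`, the four-vertex path graph, the tolerance) is an assumption made for the example. It omits the parallel execution, shared data table and scope-based consistency rules covered in the talk.

```python
from collections import deque


def run_dynamic_schedule(neighbors, values, update, tolerance=1e-9,
                         max_updates=100_000):
    """Run `update` on scheduled vertices until the work queue is empty.

    A vertex is rescheduled only when one of its neighbors changed by more
    than `tolerance`, so work concentrates where values are still moving.
    Returns the number of update calls performed.
    """
    queue = deque(neighbors)      # initially every vertex is scheduled
    queued = set(neighbors)       # tracks membership to avoid duplicates
    updates = 0
    while queue and updates < max_updates:
        v = queue.popleft()
        queued.discard(v)
        new_value = update(v, values, neighbors)
        changed = abs(new_value - values[v]) > tolerance
        values[v] = new_value     # asynchronous: visible to later updates
        updates += 1
        if changed:
            for u in neighbors[v]:
                if u not in queued:
                    queue.append(u)
                    queued.add(u)
    return updates


def pagerank_update(v, values, neighbors, damping=0.85):
    """Illustrative update function: it reads only v's neighborhood.

    Each neighbor u contributes values[u] / degree(u) (undirected graph).
    """
    incoming = sum(values[u] / len(neighbors[u]) for u in neighbors[v])
    return (1.0 - damping) + damping * incoming


# Hypothetical data graph: an undirected path 0 - 1 - 2 - 3.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
values = {v: 1.0 for v in neighbors}

num_updates = run_dynamic_schedule(neighbors, values, pagerank_update)
print(num_updates, values)
```

Because each update touches only a vertex and its neighbors, updates on non-adjacent vertices do not conflict. That sparsity is what a parallel engine could exploit, although this sketch runs them one at a time.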
