Scale-out Beyond MapReduce

Published on 2013-09-27 (20673 views)

Raghu Ramakrishnan

The amount of data being collected is growing at a staggering pace. The default is to capture and store any and all data, in anticipation of potential future strategic value, and vast amounts of data

KDD 2013 - Chicago

Presentation

Scale-out Beyond MapReduce 00:00

Outline 00:44

Cloud Information Services Lab (CISL) 01:35

Big Data - What’s the big deal? 02:23

What’s New? 02:30

Challenges 02:56

Web of Concepts 04:01

An Example 04:58

Content Optimization 06:29

CORE Dashboard: Segment Heat Map 07:48

Kinect 09:08

Kinect-based Full Body Gait Analysis 09:58

Connected devices will soon be EVERYWHERE 10:17

HomeOS: Another Instance of IoT 10:43

Big Data - Build it, they’re here already! 12:29

One Slide MapReduce Primer - 1 13:02

One Slide MapReduce Primer - 2 13:27

One Slide MapReduce Primer - 3 13:31

One Slide MapReduce Primer - 4 13:33

One Slide MapReduce Primer - 5 13:48

One Slide MapReduce Primer - 6 13:50

One Slide MapReduce Primer - 7 13:53

One Slide MapReduce Primer - 8 14:08

One Slide MapReduce Primer - 9 14:11

One Slide MapReduce Primer - 10 14:12

One Slide MapReduce Primer - 11 14:15

The Digital Shoebox 15:46

Microsoft 21:00

A Common Vision 22:10

How Far Away is Data? 24:02

Compute Fabric: YARN 24:09

Making YARN Easier to Use: REEF 26:32

Digital Shoebox Architecture 27:42

Motivation: Machine Learning Workflow 29:19

Example: User Activity Modeling 30:10

Feature and Target Windows 30:54

User Modeling Pipeline 31:26

Example Formation: SQL at Scale 31:52

Learning a Language Classifier 33:08

Scaling Model Building (30,000 ft) 34:02

The Challenge - 1 35:18

The Challenge - 2 35:51

Machine Learning Workflow 37:08

Take-Away 38:06

REEF: Retainable Evaluator Execution Framework 38:24

What have we built on top of REEF? 38:35

REEF in the Stack 39:04

REEF: Computation and Data Management 39:25

Running Example: Distributed Shell 40:51

The REEF Control Flow - 1 41:02

The REEF Control Flow - 2 42:04

The REEF Control Flow - 3 42:16

The REEF Control Flow - 4 42:53

The REEF Control Flow - 5 42:56

The REEF Control Flow - 6 43:02

The REEF Control Flow - 7 43:03

The REEF Control Flow - 8 44:13

The REEF Control Flow - 9 44:36

The REEF Control Flow - 10 45:07

The REEF Control Flow - 11 45:13

REEF Control Flow: Summary 45:17

Learning in REEF 45:46

The Task: Learn a Regression Model 45:51

Linear Models 45:57

The Learning Algorithm: Batch Gradient Descent 46:00

How It Maps to REEF 46:20

How It Maps to REEF: Control Flow 47:03

Contrast: Hadoop MapReduce 47:34

Data Management Services 47:58

Contributing to Apache 48:12

There is More, But Not Today 48:54

Conclusions 49:02