Architectures for distributed mining of big data

Published on 2017-01-31, 1284 views

Albert Bifet

Mining Big and Complex Data 2016 - Ohrid

Related categories

Computer Science, Data Science

Presentation

00:00 Architectures for Distributed Mining of Big Data

01:06 Big Data - 1

01:28 Big Data - 2

02:02 Big Data 6V’s

02:22 Controversy of Big Data

04:44 Motivation MapReduce

05:26 How Many Servers Does Google Have?

05:39 Typical Big Data Challenges

07:07 Jeff Dean

07:59 Jeff Dean Facts - 1

08:17 Jeff Dean Facts - 2

08:37 MapReduce

08:42 References

09:03 Numbers Everyone Should Know (Jeff Dean)

10:27 Typical Big Data Problem

11:10 Functional Programming

11:35 Map and Reduce functions

12:49 Simplified view of MapReduce

13:49 An Example Application: Word Count

14:08 WordCount Example

15:03 Simple MapReduce Variations

16:14 MapReduce Framework - 1

17:20 MapReduce Framework - 2

18:08 Fault Tolerance

19:05 Complete MapReduce Framework

19:49 Partitioners and Combiners

20:16 MapReduce Algorithms

20:19 Simple MapReduce Algorithms - 1

21:03 Simple MapReduce Algorithms - 2

21:56 WordCount Example Revisited - 1

22:28 WordCount Example Revisited - 2

23:00 WordCount Example Revisited - 3

23:31 Average Computing Example - 1

24:05 Average Computing Example - 2

25:11 Average Computing Example - 3

25:51 Monoidify!

26:17 Average Computing Example - 4

26:34 MapReduce Big Data Processing

27:29 Apache Flink Motivation - 1

27:33 Apache Flink Motivation - 2

28:00 Real time computation: streaming computation

28:49 Easy to Write Code - 1

29:35 Easy to Write Code - 2

30:07 What is Apache Flink?

31:35 Batch and Streaming Engines

32:04 Batch Comparison

32:32 Streaming Comparison

33:17 Spark Motivation

33:20 Apache Spark - 1

33:52 What is Apache Spark

34:22 Spark Ecosystem

35:03 Spark API

35:27 Apache Spark - 2

35:32 Apache Spark Project

36:14 Resilient Distributed Datasets (RDDs)

36:33 Spark API: Parallel Collections

36:46 Spark API: External Datasets

36:57 Spark API: RDD Operations

37:15 Apache Spark Streaming

37:41 Discretized Streams (DStreams) - 1

37:55 Discretized Streams (DStreams) - 2

38:17 Spark Streaming

38:40 Spark SQL and DataFrames

39:34 Spark Machine Learning Libraries - 1

40:30 Spark Machine Learning Libraries - 2

40:48 Spark GraphX - 1

41:09 Spark GraphX - 2

42:37 Apache Spark Summary

44:15 Apache Kafka

44:19 Apache Kafka from LinkedIn - 1

45:28 Apache Kafka from LinkedIn - 2

45:48 Apache Kafka from LinkedIn - 3

46:18 Apache Storm - 1

46:19 Apache S4 from Yahoo

47:28 Apache Storm - 2

47:35 Storm

47:58 Google Cloud DataFlow

48:04 Google 2004

48:24 Google June 2014

49:01 Google Cloud Data Flow - 1

49:22 Google Cloud Data Flow - 2

50:12 Google Cloud Data Flow Paper

50:29 Google Cloud Data Flow - 3

50:51 Apache Beam - 1

51:14 Apache Beam - 2

51:44 Architectures

51:47 Lambda Architecture

52:08 Kappa Architecture

52:27 Samoa

53:06 Thanks