Mining Massive Data Sets
Description
Today, the amount of data coming from all possible sources is
enormous and growing at a fast pace, due in large part to the ubiquitous Web
and its increasing presence in our everyday life, but also to emails, cell phones,
credit cards, retail, finance, and more. These data serve all sorts of functions: from
query and search to extracting information, providing services, and
managing security. Many fields are involved: statistics, data mining, text
mining, data streams, search, social networks, and others. There is no lack of
sophisticated techniques produced by academic activity, where challenges
mostly deal with novelty, accuracy, and scalability of algorithms. However, in
real-world applications, the challenges are quite different: scalability (usually one
or two orders of magnitude more than in academic publications), ease of use,
and the capability to integrate efficient techniques into working systems in a
transparent way, while always producing value for the customer. Real-world
solutions are complex and usually need to integrate many technical
components from the various fields mentioned above; it thus becomes
important to assess how these fields can complement one another.
In the first part of the talk, I will present the challenges of real-world data
mining applications. I will introduce the general Statistical Learning Theory
framework and discuss some of the technical issues involved (high-dimensional
data sets, missing data, outliers, non-i.i.d. structured data, unlabelled data, and so on). In
the second part, I will show, taking examples from the implementation in
KXEN and from the applications developed, how a theoretical framework (Structural
Risk Minimization [1]) can be used to solve some of the challenges met in the
real world. Finally, I will describe some open practical issues that will require
further theoretical investigation.
Slides
| Time | Slide title |
| 0:00 | Mining Massive Data Sets |
| 0:07 | Agenda |
| 1:12 | A little bit of history – Data mining & NATO |
| 5:23 | A little bit of history |
| 6:30 | A little bit of history (1) |
| 7:41 | A little bit of history (2) |
| 10:03 | Data |
| 10:56 | A little bit of history (2) |
| 11:19 | Data |
| 14:25 | Data (1) |
| 16:57 | Data (2) |
| 17:43 | Data (3) |
| 19:37 | Yahoo! Data – A league of its own … |
| 21:36 | Functions |
| 25:11 | Map of the workshop |
| 26:30 | Agenda |
| 27:53 | What are the issues in the real-world? |
| 30:44 | What are the issues in the real-world? (1) |
| 35:11 | Data mining in practice |
| 37:20 | Data mining in practice (1) |
| 42:40 | Data mining in practice (2) |
| 46:11 | Data mining in practice (1) |
| 46:32 | Data mining in practice (2) |
| 49:00 | Challenges for the real-world |
| 58:34 | Vapnik’s Statistical Learning Theory |
| 60:21 | Vapnik’s Statistical Learning Theory (1) |
| 62:32 | Vapnik’s Statistical Learning Theory (2) |
| 62:43 | Vapnik’s Statistical Learning Theory (3) |
| 69:46 | Vapnik’s Statistical Learning Theory (4) |
| 71:11 | Vapnik’s Statistical Learning Theory (5) |
| 74:53 | Vapnik’s Statistical Learning Theory (6) |
| 77:58 | Vapnik’s Statistical Learning Theory (7) |
| 80:17 | Vapnik’s Statistical Learning Theory (8) |
| 82:36 | Vapnik’s Statistical Learning Theory (9) |
| 83:56 | Vapnik’s Statistical Learning Theory (10) |
| 85:51 | Structural Risk Minimization |
| 86:35 | Structural Risk Minimization (1) |
| 87:07 | Structural Risk Minimization (2) |
| 88:07 | Structural Risk Minimization (3) |
| 88:57 | KXEN implementation |
| 89:51 | KXEN implementation (1) |
| 90:26 | Modelization process in KXEN |
| 90:38 | Modelization process in KXEN (1) |
| 91:01 | Modelization process in KXEN (2) |
| 93:25 | Modelization process in KXEN (3) |
| 93:41 | Modelization process in KXEN (4) |
| 94:45 | Modelization process in KXEN (5) |
| 95:36 | Modelization process in KXEN (6) |
| 96:17 | Modelization process in KXEN (7) |
| 97:17 | Agenda |
| 97:20 | Using textual variables |
| 100:24 | Using textual variables - DataMining Cup'06 |
| 113:57 | Using textual variables - DataMining Cup'06 (1) |
| 116:00 | Large telco operator |
| 119:34 | Large telco operator (1) |
| 130:04 | Questions |