RuSSIR - Russian Summer School in Information Retrieval

IR in Social Media (IRSM)

author: Alexey Maykov, Microsoft Live Labs, Microsoft
author: Matthew Hurst, Microsoft Live Labs, Microsoft

Description

We define Social Media as user-generated content on the Web. Social Media includes, but is not limited to, blogs, Usenet, and forums. The first part of the tutorial is technical and hands-on: we will show the specifics of data acquisition from blogs, microblogs, and Usenet, present our existing data sets, and show how to use them. In the second part we will talk about the specifics of using the obtained data, covering keyword extraction and other data mining techniques.

Spam has become a major problem for Internet users, affecting web search as well as most forms of communication, including email, IM, and discussion forums. The recent popularity of blogging has spurred a surge in blog spam, with many flavors including splogs, comment spam, trackback spam, and ping spam. In this talk we will discuss the differences and commonalities between combating spam in the blog medium and combating other types of spam. The exposition will be supported by results and examples based on real data.

Slides
0:00 IR in Social Media
2:50 Outline
4:56 Outline - Session 1
5:10 Session 1 Outline part1
5:20 Session 1 Outline part2
5:50 Definitions
7:28 Key Social Platforms
7:46 Key Features
9:41 And so it went in the US media...
11:42 Impact
11:58 Reuters and Photoshop
13:10 Rathergate
14:25 Impact Continued
15:47 Blog readership Waves
16:57 Writing blogs: usage trends
17:59 Academia
18:47 Logos
20:32 Conferences
20:48 Blue&Red
21:22 Session 1 Outline - Applications
21:36 Applications 1: BI
21:54 Logos
22:00 Acura MDX
23:57 Applications 2: Consumer
25:18 Live Search
25:22 Applications (addtl)
27:40 Session 1 Outline - Architectures
27:43 Functional Components
27:59 Focus on Content Preparation
29:42 Focus on Content Preparation (cont)
30:40 Maintaining a raw archive allows you to fix preparation issue...
31:40 Challenges
33:05 New Data
38:08 Heterogeneous Data
40:20 Heterogeneous Data (solution)
42:20 Sources of Duplication
44:45 Outline - In-Depth 1
45:03 What to Crawl
54:25 Web Crawler
55:40 Blog Crawler
67:45 Blog Crawler (2)
69:05 Crawl Issues
73:20 Bibliography
83:16 Outline - Session 2

Videos

Part 1 1:25:44
Part 2 1:25:07
