NATO Advanced Study Institute on Mining Massive Data Sets for Security

CiteSeerX & ChemXSeer: Lessons for Cyber-infrastructure and Web

author: Lee Giles, Pennsylvania State University

Description

E-science, or cyberinfrastructure, has become crucial for scientific progress, and open source systems have greatly facilitated the design and implementation of such systems. CiteSeer, a search engine and digital library for academic documents in computer and information science, was one of the first cyberinfrastructure projects to show the promise of improved search and access for scientific information. For chemistry, we propose ChemXSeer (funded by NSF Chemistry), a portal architecture for academic researchers in environmental chemistry that integrates the scientific literature and search with experimental, analytical, and simulation datasets.

Slides
0:00 CiteSeerX and ChemXSeer: Lessons for Cyberinfrastructure and Web Search for eScience or Vertical Search for eScience
1:20 Outline
1:32 The Evolution of Science
2:48 Exponential Data Growth
5:20 How much information is there? (1)
6:13 How much information is there? (2)
6:34 Generations of Search Engines
7:43 Scientific search predecessor
8:45 Vertical Search Engines
9:36 Why Now
11:19 Design Issues
13:03 Our Approach to Vertical Search Engine Design
15:09 Lucene/Nutch Open Architecture
15:47 CiteSeerx today
16:55 What is CiteSeer?
18:02 CiteSeerx Contributors/Collaborators: recent past and present (incomplete list)
18:21 CiteSeer Popularity
19:13 CiteSeer History
21:17 Why CiteSeer became possible
22:46 List of Green Publishers
22:48 Potential CiteSeer Bomb?
24:55 Research/design questions for vertical search engines - CiteSeer
25:33 What Makes CiteSeer Work: Ontology for Academic Documents
27:25 Grouping identical citations Autonomous Citation Indexing
28:11 Citation Graphs – link navigation
28:30 CiteSeer Document Algorithms
29:10 Why Next Generation CiteSeer?
30:01 CiteSeerX: Next-Gen CiteSeer
31:17 Improved Automatic metatagging methods
32:41 Improved and enhanced indexing
33:24 New feature: Acknowledgement Indexing - Beta
33:52 Acknowledgments
34:43 Computational Scientometrics
35:21 Computational Citeometrics
35:56 Citeometrics
37:00 Automated Acknowledge Analysis and Indexing - Research
38:11 Extraction System
38:19 Automatic Acknowledge Passage Identification
38:56 Entity Name Extraction
39:04 Power law for acknowledgements
40:04 Rating Funding Agencies
40:34 Most Acknowledged Authors and Impact Factor
41:26 Examples of Acknowledgements
42:03 Most acknowledged deities
42:11 New feature: Acknowledgement Indexing - Beta (1)
42:14 New feature: Acknowledgement Indexing - Beta (2)
42:49 New feature: Acknowledgement Indexing - Beta 2
43:07 Data Ingestion
44:25 Execution System
44:27 Framework Architecture
44:55 Web Application
45:37 Expandable Data Model
45:55 Who’s who?
46:42 Problem ReFormulation
47:07 Author disambiguation system - clustering
48:11 Dynamic Metadata Updates
48:25 MyCiteSeer
48:53 User Awareness: Staying Current
49:18 Notification Mechanism
49:21 Institutional Tracking
49:44 Social Network Discovery
49:57 Architecture Status Summary
50:41 CiteSeer Facilitated Research
50:54 Computer Science Trend Analysis Using CiteSeer Data
51:21 OverCite - MIT DHT P2P CiteSeer Architecture
51:42 CiteSeer provided urls for crawling
51:54 Accessing CiteSeer: API
52:18 Proposed New feature: CiteSeer Zeitgeist
52:27 Next Generation CiteSeer in progress
53:09 Lessons from CiteSeer
55:17 Status of Data and Publications in the Field of Chemistry
57:29 Status of Data and Publications in the Field of Chemistry
58:21 ChemXSeer Highlights
60:06 Data Interoperability and Information Transfer
60:28 Solution: Data Interoperability and Information Transfer
61:55 ChemXSeer Architecture Design
62:45 Data Handling and Storage
63:30 ChemXSeer Architecture Representation
64:06 http://chemxseer.ist.psu.edu
65:01 ChemXSeer search - alpha
65:31 ChemXSeer Functionalities
65:58 ChemXSeer Formula Search
66:51 Challenges in Formula Search
67:32 Issues in chemical formula search
68:57 Progress so far
69:46 System Architecture for Formula and Document Search in ChemXSeer
70:39 Related Work
72:12 Example Formula Search
72:41 Definitions and Criteria for Formula Indexing
73:54 Formula Search - Query Models
75:13 Ranking formulae
75:55 Experimental Data Search
76:53 Data Search Example in ChemXSeer
77:38 ChemXSeer Table Search (TableSeer)
78:30 Related work: Table search?
78:41 Table ranking
79:16 TableSeer System Architecture
80:16 TableRank - ranking tables in search
81:06 Proposed Query Interface Design
81:36 ChemXSeer Figure Data Extraction
82:03 Proposed Query Interface Design
82:12 ChemXSeer Figure Data Extraction
84:30 System Management - web services
84:50 Conclusions/Future Work
85:51 Robots.txt Search Engine - BotSeer
87:00 Robots Exclusion Protocol
88:30 Top 10 favored and disfavored robots – Ranked by ΔP favorability.
89:49 Robots.txt Search Engine - BotSeer
91:06 Proposed cyberinfrastructure system for archaeology
93:06 Overview
96:38 Summary
98:41 Acknowledgements
