CiteSeerX & ChemXSeer: Lessons for Cyber-infrastructure and Web
author:
Lee Giles, Pennsylvania State University
Description
E-science, or cyberinfrastructure, has become crucial for scientific
progress, and open source systems have greatly facilitated its design and
implementation. CiteSeer, a search engine and digital library for academic
documents in computer and information science, was one of the first
cyberinfrastructure projects to show the promise of improved search and access
for scientific information. For chemistry, we propose ChemXSeer (funded by NSF
Chemistry), a portal architecture for academic researchers in environmental
chemistry that integrates the scientific literature and search with
experimental, analytical, and simulation datasets.
Slides
| Time | Slide title |
| 0:00 | CiteSeerX and ChemXSeer: Lessons for Cyberinfrastructure and Web Search for eScience or Vertical Search for eScience |
| 1:20 | Outline |
| 1:32 | The Evolution of Science |
| 2:48 | Exponential Data Growth |
| 5:20 | How much information is there? (1) |
| 6:13 | How much information is there? (2) |
| 6:34 | Generations of Search Engines |
| 7:43 | Scientific search predecessor |
| 8:45 | Vertical Search Engines |
| 9:36 | Why Now |
| 11:19 | Design Issues |
| 13:03 | Our Approach to Vertical Search Engine Design |
| 15:09 | Lucene/Nutch Open Architecture |
| 15:47 | CiteSeerx today |
| 16:55 | What is CiteSeer? |
| 18:02 | CiteSeerx Contributors/Collaborators: recent past and present (incomplete list) |
| 18:21 | CiteSeer Popularity |
| 19:13 | CiteSeer History |
| 21:17 | Why CiteSeer became possible |
| 22:46 | List of Green Publishers |
| 22:48 | Potential CiteSeer Bomb? |
| 24:55 | Research/design questions for vertical search engines - CiteSeer |
| 25:33 | What Makes CiteSeer Work: Ontology for Academic Documents |
| 27:25 | Grouping identical citations Autonomous Citation Indexing |
| 28:11 | Citation Graphs – link navigation |
| 28:30 | CiteSeer Document Algorithms |
| 29:10 | Why Next Generation CiteSeer? |
| 30:01 | CiteSeerX: Next-Gen CiteSeer |
| 31:17 | Improved Automatic metatagging methods |
| 32:41 | Improved and enhanced indexing |
| 33:24 | New feature: Acknowledgement Indexing - Beta |
| 33:52 | Acknowledgments |
| 34:43 | Computational Scientometrics |
| 35:21 | Computational Citeometrics |
| 35:56 | Citeometrics |
| 37:00 | Automated Acknowledge Analysis and Indexing - Research |
| 38:11 | Extraction System |
| 38:19 | Automatic Acknowledge Passage Identification |
| 38:56 | Entity Name Extraction |
| 39:04 | Power law for acknowledgements |
| 40:04 | Rating Funding Agencies |
| 40:34 | Most Acknowledged Authors and Impact Factor |
| 41:26 | Examples of Acknowledgements |
| 42:03 | Most acknowledged deities |
| 42:11 | New feature: Acknowledgement Indexing - Beta (1) |
| 42:14 | New feature: Acknowledgement Indexing - Beta (2) |
| 42:49 | New feature: Acknowledgement Indexing - Beta 2 |
| 43:07 | Data Ingestion |
| 44:25 | Execution System |
| 44:27 | Framework Architecture |
| 44:55 | Web Application |
| 45:37 | Expandable Data Model |
| 45:55 | Who’s who? |
| 46:42 | Problem ReFormulation |
| 47:07 | Author disambiguation system - clustering |
| 48:11 | Dynamic Metadata Updates |
| 48:25 | MyCiteSeer |
| 48:53 | User Awareness: Staying Current |
| 49:18 | Notification Mechanism |
| 49:21 | Institutional Tracking |
| 49:44 | Social Network Discovery |
| 49:57 | Architecture Status Summary |
| 50:41 | CiteSeer Facilitated Research |
| 50:54 | Computer Science Trend Analysis Using CiteSeer Data |
| 51:21 | OverCite - MIT DHT P2P CiteSeer Architecture |
| 51:42 | CiteSeer provided urls for crawling |
| 51:54 | Accessing CiteSeer: API |
| 52:18 | Proposed New feature: CiteSeer Zeitgeist |
| 52:27 | Next Generation CiteSeer in progress |
| 53:09 | Lessons from CiteSeer |
| 55:17 | Status of Data and Publications in the Field of Chemistry |
| 57:29 | Status of Data and Publications in the Field of Chemistry |
| 58:21 | ChemXSeer Highlights |
| 60:06 | Data Interoperability and Information Transfer |
| 60:28 | Solution: Data Interoperability and Information Transfer |
| 61:55 | ChemXSeer Architecture Design |
| 62:45 | Data Handling and Storage |
| 63:30 | ChemXSeer Architecture Representation |
| 64:06 | http://chemxseer.ist.psu.edu |
| 65:01 | ChemXSeer search - alpha |
| 65:31 | ChemXSeer Functionalities |
| 65:58 | ChemXSeer Formula Search |
| 66:51 | Challenges in Formula Search |
| 67:32 | Issues in chemical formula search |
| 68:57 | Progress so far |
| 69:46 | System Architecture for Formula and Document Search in ChemXSeer |
| 70:39 | Related Work |
| 72:12 | Example Formula Search |
| 72:41 | Definitions and Criteria for Formula Indexing |
| 73:54 | Formula Search -Query Models |
| 75:13 | Ranking formulae |
| 75:55 | Experimental Data Search |
| 76:53 | Data Search Example in ChemXSeer |
| 77:38 | ChemXSeer Table Search (TableSeer) |
| 78:30 | Related work: Table search? |
| 78:41 | Table ranking |
| 79:16 | TableSeer System Architecture |
| 80:16 | TableRank - ranking tables in search |
| 81:06 | Proposed Query Interface Design |
| 81:36 | ChemXSeer Figure Data Extraction |
| 82:03 | Proposed Query Interface Design |
| 82:12 | ChemXSeer Figure Data Extraction |
| 84:30 | System Management - web services |
| 84:50 | Conclusions/Future Work |
| 85:51 | Robots.txt Search Engine - BotSeer |
| 87:00 | Robots Exclusion Protocol |
| 88:30 | Top 10 favored and disfavored robots – Ranked by ΔP favorability. |
| 89:49 | Robots.txt Search Engine - BotSeer |
| 91:06 | Proposed cyberinfrastructure system for archaeology |
| 93:06 | Overview |
| 96:38 | Summary |
| 98:41 | Acknowledgements |