Workshops
Workshop on Scholarly Databases & Data Integration
Date:
August 30 and 31, 2006 (please see the agenda for details)
Meeting Place:
Herman B. Wells Library (map), Indiana University
1320 E. 10th St., Wells Library, Media Showing Room - E 174
Bloomington, IN 47405
Indiana University Campus Map »
Organizers:
Katy Börner
Associate Professor of Information Science, SLIS, Indiana University. Project director of the InfoVis CI and Network Workbench. Co-curator of Places & Spaces.
katy@iu.edu
PR^2 | PPT
Miguel Andrade
Scientist, Molecular Medicine, Ottawa Health Research Institute Assistant, Professor, Departments of Medicine, Cellular and Molecular Medicine, Faculty of Medicine, University of Ottawa.
mandrade@ohri.ca
PR^2 | PPT
Stacy Kowalczyk
Associate Director for Projects and Services, Digital Library Program, SLIS Ph.D. student, Indiana University.
skowalcz@iu.edu
PPT
Barend Mons
Associate Professor in Biosemantics, Erasmus Medical Center and Leiden University Medical Center, the Netherlands &
www.knewco.com Knewco, Inc.
bmons@knewco.com
PR^2 | PPT
Erik van Mulligen
Chief Technology Officer,
Erasmus Medical Center and Leiden University, Medical Center, the Netherlands. www.knewco.com
Knewco, Inc.
mulligen@knewco.com
PR^2
Marc Weeber
Head of Technical Operations USA, www.knewco.com Knewco, Inc.
marc@weeber.net
PR^2 | PPT
Workshop Goals & Agenda:
Read White Paper Draft and send comments to katy@iu.edu.
In recent years, bibliographic databases have become ever more important in scientific research and science management. These databases are essential to the primary work of science - retrieving the correct literature and finding potential collaborators and competitors. But increasingly, data and text mining of these databases have become a science in itself with novel discoveries in a multitude of disciplines. For science management, the use of these databases has become essential in connecting new proposals to what is extant and to find reviewers for these proposals. Additionally, measuring success of both projects and individuals in science has become increasingly important, and bibliographic databases are the key resource.
Uniquely identifying authors is currently an unmet challenge in all freely accessible and searchable bibliographic databases (e.g., PubMed, CiteSeer, arXiv, Google Scholar). To our knowledge, the only extensive (non-free) bibliographic database that has uniquely identified authors is the American Mathematical Society(TM)s Mathematical Reviews Database that has kept track of individual authors since its inception in 1940 (http://www.ams.org/mr- database/mr-authors.html). While this was originally a manual effort it is today assisted by computational means to a certain degree.
The recent success of Wikipedia, a community effort to establish an encyclopedia has lead to a series of spin-off projects and proposals that have community annotation of data at their core. WikiAuthors is such a proposal to collectively annotate and disambiguate science authors. Additionally, there have been several papers about algorithms do uniquely identify authors.
The data integration problem associated with the federation and usage of multiple scholarly databases might be best solved by using existing unique author/institutions/geolocations/etc. lists, and a merge of automatic data integration and manual data integration via a Wiki like approach.
Schedule:
Tuesday, August 29th, 2006
8:30pm | Meeting of the Workshop Organizers at Tutto Bene. |
Wednesday, August 30th, 2006
8:00am | Light Breakfast |
8:30am | Introduction by Participants lead by Katy Börner (5 min per person/organization) |
10:00am | Break |
10:15am | Introduction by participants (continued) |
12:00pm | Lunch |
12:30pm | Discussion of Opportunities and Challenges |
2:30pm |
Break |
3:00pm |
Potential Futures of Scholarly Data Acquisition, Management & Utilization |
4:00pm |
Discussion (Lead by Erik Erik van Mulligen) |
6:00pm |
Dinner at Little Tibet |
Thursday, August 31st, 2006
8:00am | Light Breakfast |
8:30am | Introduction to Wikipedia Ideas and Technology (Erik Moeller) Introduction to Author Name Disambiguation (Neil Smalheiser & Vetle Torvik) |
10:00am | Break |
10:30am |
Software Demos - Scholarly Database by Gavin LaRowe and Sumeet Ambre - WiktionaryZ by Erik Moeller - CIShell by Bruce Herr - Scopus by Dannien Sherman - Network Workbench by Bonnie Huang - Biomedical Visualizations by Ketan Mane - Discovery Logic Tools by Mike Pollard |
11:30am | Discussion of Challenges and Opportunities (Lead by Marc Weeber) |
12:00pm | Lunch |
1:00pm | Breakout Sessions (Lead by Miguel Andrade) |
2:00pm | Breakout Session Reports: 1-standards, 2-funding, 3-community, 4-technical |
2:30pm | Break |
3:00pm | Committments & Discussion of Next Steps (Lead by Barend Mons) |
4:00pm | Adjourn |
6:00pm | Indian Dinner at Shanti |
8:00pm | ExArt & Arthur Murray Dance Studio presents MARIA DE BUENOS AIRES http://buskirkchumley.org/ |
Friday, September 1st, 2006
9:00am | Semantic Tagging by Martijn Schuemie |
10:00am | Scholarly Database by Gavin La Rowe & Sumeet Ambre |
11:00am | Cyberinfrastructure Shell & Network Workbench by Bruce Herr and Weixia (Bonnie) Huang |
12:00pm | Working Lunch and Discussion |
Saturday, September 2nd, 2006
9:00am | Boat Tour and BBQ at Lake Monroe, Bloomington |
Participants Attending:
Erik Möller
Editor of infoAnarchy. Webmaster of The Origins of Peace and Violence and Der Humanist. Lead developer of WiktionaryZ and Wikidata, former Chief Research Officer of the Wikimedia Foundation. Wikipedian since 2001 and one of the developers of the underlying MediaWiki software.
moeller@scireview.de
PR^2 | PPT
Ricardo Pietrobon
Assistant Professor of Surgery
Duke University.
pietr007@gmail.com
PR^2
Mike Pollard
Vice President, Discovery Logic.
mikep@DiscoveryLogic.com
PR^2 | PPT
Bruce Herr
Research Staff, Cyberinfrastructure for Network Science Center, School of Library and Information Science, Indiana University
bh2@BH2.NET
PR^2
Weixia (Bonnie) Huang
Senior System Architect, Cyberinfrastructure for Network Science Center, School of Library and Information Science, Indiana University.
huangb@iu.edu
Sumeet Ambre
SLIS Master Student, Scholarly Database Developer, Cyberinfrastructure for Network Science Center, School of Library and Information Science, Indiana University.
sambre@iu.edu
PR^2
Israel I. Lederhendler
Director, Division of Information Services, OER, OD
National Institutes of Health, DHHS
Bethesda, Maryland.
lederhei@od.nih.gov
PR^2
Thom Hickey
Vice-President, Bibliometrics Science-Metrix
Chief Scientist, OCLC Research.
hickey@oclc.org
PR^2 | PPT
Gavin La Rowe
Research Assistant, Scholarly Database Team Lead, Cyberinfrastructure for Network Science Center, School of Library and Information Science, Indiana University.
glarowe@iu.edu
PR^2
Neil Smalheiser
Assistant Professor,
Department of Psychiatry,
University of Illinois at Chicago.
smalheiser@psych.uic.edu
PR^2
James Pringle
Vice President, Development,
Thomson Scientific.
james.pringle@thomson.com
PR^2
John Burgoon
Master Student, Cyberinfrastructure for Network Science Center, School of Library and Information Science, Indiana University.
jburgoon@iu.edu
PR^2
Ketan Mane
Ph.D. Candidate, Cy berinfrastructure for Network Science Center, School of Library and Information Science, Indiana University.
kmane@iu.edu
PR^2
Martijn Schuemie
Department of Medical informatics,
Erasmus MC,
Rotterdam, The Netherlands.
m.schuemie@erasmusmc.nl
PPT
Lokman I. Meh
Assistant Professor of Library and Information Science, SLIS, Indiana University. Researches citation analysis, digital libraries, and information access.
meho@iu.edu
(will not attend 2nd half of Day 2)
PR^2
Ron Day
Associate Professor of Library and Information Science, SLIS, Indiana University.
Expert in the history, culture and political economy of information, documentation, communication, knowledge, and digital media. roday@iu.edu
Jeff Krause
Professor of Chemistry, Quantum Theory Project, University of Florida & NSF Directorate on Theoretical and Computational Chemistry.
Interested But Cannot Attend:
Robert C. Bolander
Communications & Programs Manager, OCLC Research
bolander@oclc.org
Mark Claydon-Smith
Head of Information Systems Engineering and Physical Sciences Research Council, Polaris House, Mark.Claydon-Smith@epsrc.ac.uk
AnHai Doan
Professor of Computer Science, University of Illinois, Urbana,
anhai@cs.uiuc.edu
Paul Ginsparg
Professor of Physics, Cornell University, Created arXiv.org, ginsparg@cornell.edu
Zachary G. Ives
Assistant Professor of Computer & Information Science,University of Pennsylvania,
zives@cis.upenn.edu
Albert Mons
Chief Executive Officer, Knewco, Inc.
amons@knewco.com
André Skupin
Department of Geography, San Diego State University
skupin@mail.sdsu.edu
Alon Levy
Professor of Computer Science, University of Washington
alon@cs.washington.edu
Andrey Rzhetsky
Associate Professor, Department of Biomedical Informatics, Columbia University
ar345@Columbia.edu
C. Lee Giles
Professor, Computer Science and Engineering, Director, The Intelligent Systems Research Laboratory, Professor, Supply Chain and Information Systems, Associate Director of Research, eBusiness Research Center (eBRC). Pennsylvania State University giles@ist.psu.edu
Adam Jackson
Board Member Knewco, Inc
ajackson@argoglobal.com
Beth Plale
Computer Science Department, Indiana University
plale@iu.edu
Vetle Torvik
Research Assistant Professor, Department of Psychiatry, University of Illinois at Chicago.
vtorvik@uic.edu
PR^2
Chris Rosin
President, Parity Computing, Inc.,
crosin@paritycomputing.com
PR^2 | PPT
Alex Soojung-Kim Pang
Ph.D., Research Director and Blogger-in-Chief, Institute for the Future
apang@IFTF.org
References
- Authors, Authors: Thomson Scientific and Elsevier Scopus Search Them Out, Information Today, Inc. July 24, 2006
- Automated Name Authority Control and Enhanced Searching in the Levy Collection, D-Lib Magazine. April 2001.
- Thomson Scientific Announces Development of full suite of Authorship Tools, Thomson. June 8, 2006
- Schijvenaars, B.J., Mons B., Weeber, M., Schuemie, M.J., van Mulligen E.M.,Wain H.M., Kors J.A. (2005). Thesaurus-based disambiguation of gene symbols . BMC Bioinformatics , 6(1), 149.
- Jelier, R., Jenster, G., Dorssers, L.C., van der Eijk, C.C., van Mulligen E.M., Mons, B., Kors J.A. (2005). Co-occurrence based meta-analysis of scientific texts: Retrieving biological relationships between genes. Bioinformatics, 21(9), 2049-2058.
- Torvik, V.I., Weeber, M., Swanson, D.R. and Smalheiser, N.R. (2003). A Probabilistic Similarity Metric for Medline Records: A model for Author Name Disambiguation. Journal of the American Society for Information Science and Technology, 56 (2). 140-158.
- Han, H., Giles, C.L., Zha, H., Li, C., and Tsioutsiouliklis, K. (2004). Two Supervised Learning Approaches for Name Disambiguation in Author Citations. Proceedings of ACM/IEEE Joint Conference on Digital Libraries (JCDL 2004). 296-305.
- Han, H., Zha, H., and Giles, C.L. (2005). Name Disambiguation in Author Citations using a k-way Spectral Clustering Method. Proceedings of the International Conference on Digital Libraries.
- Newman, M. (2004). Coauthorship Networks and Patterns of Scientific Collaboration. Proceedings of the National Academy of Sciences, 101. 5200-5205.
- Malin, B. (2005). Unsupervised Name Disambiguation via Social Network Similarity. Proceedings of the Workshop on Link Analysis, Counterterrorism, and Security, in conjunction with the SIAM International Conference on Data Mining. Newport Beach, CA. 93-102.
- Dellavalle, R.P., Hester, E.J., Heilig, L.F., Drake, A.L., Kuntzman, J.W., Graber, M., Hester, E., Schilling, Lisa. (2003). Going, Going, Gone: Lost Internet References. Science 302. 787-788.
- Wrenn, J.D., Grissom, J.E., and Conway, T. (2006). E-mail Decay Rates Among Corresponding Authors in MEDLINE. EMBO Reports, 7 (4). 122-127.
- Bennet, D.B., and Williams, P. (2006). Name Authority Challenges for Indexing and Abstracting Databases. Evidence Based Library and Information Practice, 1 (1). 37-57.
Travel/Housing:
Bloomington has a number of hotels, but the most convenient is the Indiana Memorial Union on campus. A block of rooms have been held for this workshop. When you register with the hotel, please let them know that you are with the "Database Integration Workshop". All participants are responsible for their own transportation and accomodations.
Directions:
See the contact page for the Cyberinfrastructure for Network Science Center, http://cns.iu.edu/contact.html or contact Samantha Hale (ude.anaidni@elahjs).
Acknowledgments:
This workshop is sponsored by Knewco, Inc., the Digital Library Program of the Indiana University Libraries, the National Science Foundation under Grant No. CHE-0524661, and a James S. McDonnell Foundation grant in the area Studying Complex Systems entitled Modeling the Structure and Evolution of Scholarly Knowledge