File: www.idi.ntnu.no/grupper/su/publ/db-papers/citation-db-eval.html

         Reidar Conradi
            IDI, NTNU

        conradi@idi.ntnu.no

           24.06.2011

On the utility of 10 major citation databases for scientific papers

       Table of contents

  1. Short examples of using DB.1-DB.10 for Sjøberg (DIK) and Conradi (RC).

  2. Characterizing the major citation databases
    2.1  DB.1  ISI Web of Knowledge (now by Thomson Reuters)
    2.2  DB.2  Microsoft Academic Research (MAR)
    2.3  DB.3  Harzing's Publish or Perish (PoP)
    2.4  DB.4  GoogleScholar
    2.5  DB.5  DBLP
    2.6  DB.6  CiteseerX
    2.7  DB.7  Arnetminer
    2.8  DB.8  Cristin (ex-Frida)
    2.9  DB.9  NTNU-SU group's publication list
    2.10 DB.10 SCOPUS (life and physical sciences)

  3. Concluding remarks

  4. Appendix: Some big examples for Sjøberg and Conradi regarding PoP.
    4.1 CASE-1: Some PoP errors in the 109 papers in pop-sjoberg-all.tex
    4.2 CASE-2: The missing 12 PoP papers in pop-sjoberg-excl-chem-mat.tex
    4.3 CASE-3: Prune Conradi's 337 => 297 papers in pop-conradi-only-EngCSMath.tex

  Log:
    24.Sep.2011 (RC): Adjusted the descriptions of
                      ISI Web of Knowledge and Arnetminer.
    26.Sep.2011 (RC): Added SCOPUS, initially empty.


 1. Short examples of using DB.1-DB.10 for Sjøberg (DIK), plus the SU group:
    Conradi (RC), ... Jaccheri (MLJ), and ... Daniela S. Cruzes (DSC).

    Subject (topic): "IT"; period: all years up to 2010; PhDs: only own;
      'na' means not available.

                               #publications #citations G-index H-index
    DB.1 ISI                   DIK:  38          413        ?      12
                               RC:   89 **       394        ?      11
                               MD:    7            7        ?       2
                               MLJ:  30 ##        65        ?       5
                               TSk:   8            0        ?       0
                               TSt:  42           52        ?       3
                               AIW:  17 (23-6)     5        ?       2
                               DSC:   9            8        ?       2
                            GSindre: 46          377        ?       7
                            JAGulla: 18           22        ?       4

    DB.2 Microsoft Acad. Res.  DIK:  71??        438       18      12
                               RC:  209         1560       33      11
                               MLJ:  57 ##         ?        ?       ?
                               DSC:  17            ?        ?       ?

    DB.3 Publish or Perish     DIK:  88            ?        ?       ?
                               RC:  297            ?        ?       ?
                               MLJ:   ?            ?        ?       ?

    DB.4 GoogleScholar         DIK: 449           na?      na      na
                               RC: 1070           na?      na      na
                               MLJ:   ?           na?      na      na

    DB.5 DBLP                  DIK:  57            ?       na      na
                               RC:  127            ?       na      na
                               MD:   26            ?       na      na
                               MLJ:  34            ?       na      na
                               TSk:   4            ?       na      na
                               TSt:  40            ?       na      na
                               AIW:  29            ?       na      na
                               DSC:  17            ?       na      na

    DB.6 CiteseerX             DIK:  21            ?       na       5
                               RC:  139            ?       na      14
                               MLJ:   ?            ?       na       ?

    DB.7 Arnetminer            DIK:  50          883       na      16
                               RC:  129         2374       na      24
                               MD:   26          339       na      10
                               MLJ:  30 ##       205       na       7
                               TSk:   1            0       na       0
                               TSt:  34          268       na       8
                               AIW:  28          120       na       7
                               DSC:  14           60       na       3

    DB.8 Cristin               DIK: 128           na       na      na
         (last 10-15 years)    RC:  208           na       na      na
                               MD:   67           na       na      na
                               MLJ:  81           na       na      na
                               TSk:  57           na       na      na
                               TSt:  84           na       na      na
                               AIW: 182           na       na      na
                               DSC:  28           na       na      na
                               
    DB.9 NTNU-SU               DIK:  na           na       na      na
                               RC:  207(45 jour.) na       na      na
                               MLJ:  90 ##        na       na      na

    DB.10 SCOPUS (not tried)   DIK:  ??            ?        ?       ?
                               RC:   ??            ?        ?       ?
                               MLJ:  27            ?        ?       ?

Comments:
   ##: MLJ (Jaccheri) has in total 90 entries (by SU), 27 (by ISI)
       and 30 (by ARNET)!!, with 20 overlapping between the latter two,
       i.e. 37 distinct entries from these two DBs, including 2 books
       and 3 book chapters.
       Remaining ca. 50 publications: ca. 25 of sufficient quality
                               (but not found in citation DBs) and
                            ca. 25 informal (OK to list separately
                                   outside the DBs?).
       The MAR database has 57 publications, which for MLJ seems OK in size,
                                   but the contents remain to be checked.
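  The G-index and H-index columns can be cross-checked from a list of
  per-paper citation counts. Below is a minimal Python sketch, assuming the
  counts have already been exported (e.g. from a citation report); the
  function names are illustrative only. It uses the common definitions:
  h is the largest number such that h papers have at least h citations each,
  and g is the largest number such that the g most-cited papers together have
  at least g*g citations (this variant caps g at the number of papers).
  The databases may use other variants or other paper sets, so their figures
  need not match this computation.

```python
def h_index(citations: list[int]) -> int:
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites < rank:
            break
        h = rank
    return h


def g_index(citations: list[int]) -> int:
    """Largest g such that the g most-cited papers have at least g*g citations in total.

    This variant caps g at the number of papers (no padding with zero-cited papers).
    """
    ranked = sorted(citations, reverse=True)
    total = 0
    g = 0
    for rank, cites in enumerate(ranked, start=1):
        total += cites
        if total >= rank * rank:
            g = rank
    return g


if __name__ == "__main__":
    # Made-up citation counts for one author, not taken from any of the databases.
    counts = [25, 8, 5, 3, 3, 2, 0]
    print("h =", h_index(counts), "g =", g_index(counts))  # prints: h = 3 g = 6
```

  The g-index rewards a few highly cited papers more than the h-index does,
  which is why the two columns can diverge for the same author.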

See also the PoP examples CASE-1, CASE-2 and CASE-3 in the Appendix.


 DB.1 ISI Web of Knowledge
 =========================
   Needs a (site) license, e.g. via NTNU.
   Covers all fields, but only articles in indexed journals.
   Thus, it misses most books and book chapters, and, not to forget,
   good conference papers (ICSE, VLDB, IJCAI, OOPSLA, and similar).
   ISI recently won out over SCOPUS as the main "import channel" to Cristin.

  URL: http://apps.webofknowledge.com/
   (This database tool is easier to use than the URL listed below.)
   Make sure you are in "Web of Science".
   Look at the menu below the name "Search", and
     fill in the "Author" slot with text like:
       Sjoberg DIK
       Conradi R*
       (Jaccheri L*) OR (Jaccheri M*)

   Then click the "Search" (or "Clear") button below.
   Ex. With a query of (Conradi) and no further preferences,
       we initially get 1027 publication entries.
   So we need to click on the submenus of "Refine results" under
     "Web of Science Categories", such as
       ENGINEERING ELECTRICAL ELECTRONIC (74) and
       COMPUTER SCIENCE THEORY METHODS (54),
     and so on for more subcategories.
   Finally, we may end up with 127 publication entries,
     each containing a DOI and lots of other info; really impressive!
   To see all the entries, click on the "Create Citation Report" text,
     furthest to the right of "Refine results".
   There is also a manual selection facility to "fine-prune" the "catch".
   
  URL: http://apps.isiknowledge.com, cf. also above.

   First set the button 'Limit to:', then select 'All Years' or similar.
   Then go to 'Web of Science' in the top headings, and
     now select a more precise year interval.
   Then click on the 'Author Finder' command:
     Step 1: Enter Author Name, e.g. 'Conradi R'.
     Step 2: Select Author Variant, e.g. 'Conradi R' => 'Conradi R*'.
     Step 3: Select Subject Category, choose at least one of:
               LIFE SCIENCES & BIOMEDICINE
               MULTIDISCIPLINARY SCIENCE & TECHNOLOGY (use this!)
               PHYSICAL SCIENCES
     Step 4: Select Institution from a list (at most 50 of them).
             Ex. I don't include UNIV HEIDELBERG, if 'Conradi R' is selected.

   Lastly, click on 'Finish Now'.
     You may now fine-tune your selection by sub-subjects and document types.
     Then:
       click on 'Create Citation Report' on the top-right, and
       cut and save the displayed summary with your h-index etc.
     Possibly also:
       click on 'Records' on the bottom-left,
       specify a min-max interval (e.g. papers ordered from '1' to '100')
         and which document types to consider,
       specify whether to include an abstract (say No!) etc., and
       finally specify the record format (e.g. BibTeX) and a file name
         to store the textual records.


  DB.2 Microsoft Academic Research (MAR)
  ======================================
  URL: http://academic.research.microsoft.com/Organization/13557 (for NTNU).
       Free to use, covers all fields, appears to be OK.
       Start by clicking on 'Advanced search',
         then select 'Computer Science'.
       First choose 'Author' (gives #papers, #citations, G/H-index),
         then 'Publication' (to get BibTeX files).
       The user interface is a bit messy, e.g. one must write 'Dag I.K. Sjoberg'.


  DB.3 Harzing's Publish or Perish (PoP)
  ======================================
  URL: www.harzing.com/pop.htm
       First a free, executable version must be downloaded and
       installed on your PC.
       It covers all fields and gets its data from GoogleScholar, so it is immature.

  Often start by clicking on 'Citation analysis' on the top-left,
    then e.g. the submenu 'Author impact analysis'.
  Then give your author name (e.g. 'R Conradi'),
    possibly some excluded names (e.g. 'FR Conradi'),
    a year interval, and finally
    select one or more of the seven subjects below:
      1. Biology, Life Sciences, Environmental Science.
      2. Business, Administration, Finance, Economics.
      3. Chemistry and Materials Science.
      4. Engineering, Computer Science, Mathematics - taken as 'IT'-related.
      5. Medicine, Pharmacology, Veterinary Science.
      6. Physics, Astronomy, Planetary Science.
      7. Social Sciences, Arts, Humanities.
  Finally, click on the 'lookup' button on the top-right.
  Results come as a summary (with h-index etc.), or - if requested via the
  top-left 'File' menu - also as BibTex entries on a given textual file.

  PoP seems much more liberal, and its raw data is provided by Google
  Scholar.  The first ca. 20??% of the cited papers look OK, but the
  last 20% are below any acceptable quality limit.  In addition, there are
  hordes of duplicates, trivial textual errors, and even some plain garbage.
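  Many of these duplicates differ only in case, accents, punctuation or a
  truncated title. A minimal Python sketch of grouping such near-duplicate
  titles, assuming the titles have already been pulled out of the BibTeX file;
  the 0.85 similarity threshold, the whole-word containment rule and the
  explicit folding of 'ø' and 'æ' are assumptions, not something PoP offers.
  Titles with garbled words (such as 'A Su rvey') will still slip through.

```python
import re
import unicodedata
from difflib import SequenceMatcher

# Norwegian letters that NFKD does not decompose; fold them explicitly (assumption).
NORDIC = str.maketrans({"ø": "o", "Ø": "O", "æ": "ae", "Æ": "AE"})


def normalize_title(title: str) -> str:
    """Lowercase, fold accents, drop punctuation and collapse whitespace."""
    folded = unicodedata.normalize("NFKD", title.translate(NORDIC))
    ascii_only = folded.encode("ascii", "ignore").decode("ascii")
    return " ".join(re.findall(r"[a-z0-9]+", ascii_only.lower()))


def likely_duplicates(a: str, b: str, threshold: float = 0.85) -> bool:
    """True if one normalized title contains the other as whole words, or they are very similar."""
    na, nb = normalize_title(a), normalize_title(b)
    if not na or not nb:
        return False
    if f" {na} " in f" {nb} " or f" {nb} " in f" {na} ":
        return True
    return SequenceMatcher(None, na, nb).ratio() >= threshold


def group_duplicates(titles: list[str]) -> list[list[str]]:
    """Greedy grouping: each title joins the first group whose first title matches it."""
    groups: list[list[str]] = []
    for title in titles:
        for group in groups:
            if likely_duplicates(group[0], title):
                group.append(title)
                break
        else:
            groups.append([title])
    return groups


if __name__ == "__main__":
    titles = [
        "A survey of controlled experiments in software engineering",
        "The effectiveness of pair programming: A meta-analysis",
        "A Survey of Controlled Experiments in Software",
        "The effectiveness of pair programming",
        "Are all code smells harmful",
    ]
    for group in group_duplicates(titles):
        print(len(group), group[0])
```

  The whole-word containment test also catches titles where PoP has glued
  the co-author list in front of the real title.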

  To check the actual precision of PoP, we have taken some
  PoP-reported citations for two Norwegian SE researchers:
    'DIK Sjoberg' for Dag Ingar Kondrup Sjøberg, Ifi, UiO.
    'R Conradi'   for Reidar Conradi, IDI, NTNU.

  See files:
   pop-sjoberg-all.tex             109 papers, all seven fields;
                                         testing duplicates, errors etc.
   pop-sjoberg-excl-chem-mat.tex    97 papers, all except Chem&Materials;
                                         testing field impact.
   pop-sjoberg-only-EngCSMath.tex   88 papers, only 'IT'-related;
                                         testing core IT papers.

   pop-conradi-all.tex             981 papers, all seven fields; !!!
                                         includes non-IT colleagues named 'R Conradi'.
   pop-conradi-only-EngCSMath.tex  337 papers, only 'IT'-related;
                                       must filter non-IT papers of
                                         'R Conradi' to get 297 real ones.
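  The paper counts above can be re-checked by counting the entries in each
  exported file. A minimal Python sketch, assuming the files are BibTeX text
  as exported by PoP (read as UTF-8, replacing undecodable bytes); the script
  name in the usage comment is hypothetical. It only looks for '@type{key,'
  at the start of entries, so it is a sanity check rather than a full BibTeX
  parser, and it would also count entries commented out with %%.

```python
import re
import sys
from collections import Counter

# Start of a BibTeX entry such as "@article{pop0000a1," -> ("article", "pop0000a1").
ENTRY_START = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")


def count_entries(bibtex_text: str) -> Counter:
    """Number of entries per entry type, with types lower-cased."""
    return Counter(m.group(1).lower() for m in ENTRY_START.finditer(bibtex_text))


def duplicate_keys(bibtex_text: str) -> list[str]:
    """Citation keys that occur more than once (sorted)."""
    keys = Counter(m.group(2) for m in ENTRY_START.finditer(bibtex_text))
    return sorted(key for key, n in keys.items() if n > 1)


if __name__ == "__main__":
    # Hypothetical usage: python count_bib.py pop-sjoberg-all.tex pop-conradi-only-EngCSMath.tex
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as f:
            text = f.read()
        counts = count_entries(text)
        print(path, sum(counts.values()), dict(counts), duplicate_keys(text))
```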


  DB.4 GoogleScholar
  ==================
  URL: http://scholar.google.com, free to use, all fields.
       Immature, since it includes *all* documents ever co-written
       by a given person (thousands ...) and has many errors.
       It does not yet compute #citations, G-index, or H-index.

  DB.5 DBLP
  =========
  URL: http://www.informatik.uni-trier.de/~ley/db/, from Univ. Trier.
       Free to use, only IT covered, good quality;
       often comes with links to the abstract and a .pdf file.

  DB.6 CiteseerX
  ==============
  URL: http://citeseerx.ist.psu.edu/.
       Free to use, only IT covered, good quality.
       Mostly used to search for single persons and (partial) document titles.
       Emphasis on giving .pdf-files.


  DB.7 Arnetminer
  ===============
  URL: http://arnetminer.org/  -- fairly new and in rapid development.

  Just type in your name.
  Then click on your picture in the "upper-left corner" to get a
    list of your publications, placed at the bottom.
  The publication list is by default sorted from newest to oldest.
  Click on a paper title to get the textual contents as a .pdf-file,
    often via DOI.
  Above this list is a very nice "co-author star" in colors,
  showing you in relation to your co-authors.


  DB.8 Cristin (ex-Frida)
  =======================
  URL: http://www.ntnu.no/ub/cristin -- for Norwegian R&D institutions.
  Not very impressive, as the main functionality is 'bean counting'.
  Queries can only have one author name.
  Ex. One cannot define general groups, only disjoint subsets of existing units.

  DB.9 SU group's publication list
  ================================
  URL: http://www.idi.ntnu.no/grupper/su/INT-PUBL.php3. 
  Free to use, only insiders can enter data.
  Stored as *one* text file written in .php3, with meta-symbols and .pdf files.
  Special file for the ca. 150 PhD theses since 1970.
  Functionality: display papers for a given year (default is current year).
                 display papers according to document type and year.
                 display papers according to search terms combined with AND or OR
                 (e.g. 'Reidar Conradi journal 2011').
  Over 1000 entries, but it needs another platform and wrapping!
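  The AND/OR search can be sketched as follows in Python. This is not the
  actual .php3 code of the SU list; the Entry record and its fields are
  hypothetical, the example entries are made up, and matching is a simple
  case-insensitive substring test on the flattened entry text.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """Hypothetical flattened publication record (not the real SU list format)."""
    authors: str
    title: str
    doc_type: str  # e.g. "journal", "conference"
    year: int

    def text(self) -> str:
        return f"{self.authors} {self.title} {self.doc_type} {self.year}".lower()


def search(entries: list[Entry], query: str, mode: str = "and") -> list[Entry]:
    """Keep entries containing all (AND) or any (OR) of the whitespace-separated terms."""
    if mode not in ("and", "or"):
        raise ValueError("mode must be 'and' or 'or'")
    terms = query.lower().split()
    combine = all if mode == "and" else any
    return [e for e in entries if combine(term in e.text() for term in terms)]


if __name__ == "__main__":
    # Made-up entries; only the query string is taken from the text above.
    entries = [
        Entry("Reidar Conradi", "Example paper A", "journal", 2011),
        Entry("Reidar Conradi", "Example paper B", "conference", 2011),
        Entry("Letizia Jaccheri", "Example paper C", "journal", 2010),
    ]
    for e in search(entries, "Reidar Conradi journal 2011"):
        print(e.title)
```

  Plain substring matching is crude (a year term may also hit a title), but
  it is enough to show how the AND/OR operator narrows or widens the result.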


  DB.10 SCOPUS
  ============
  URL: http://www.scopus.com/home.url

  Free to use, only insiders can enter data.??
  Mostly life and physical sciences.


  3. Concluding remarks
  =====================

 Many and diverse databases for academic papers!!
   If only journal papers are requested: choose ISI.
   If only "IT" coverage is needed: choose DBLP.
   If comprehensive coverage with G/H indices is needed: choose MAR (or Arnetminer).

   PoP, which builds on GoogleScholar, is premature; see the Appendix below.
    Ex. Almost all TeX entries have one or more errors:
        @article instead of @inproceedings, plus missing @proceedings and editor;
        @book    instead of @proceedings.
    Ex. Of 109 TeX entries for Sjøberg, 27 are duplicates and 10 are garbage.
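  Some of these errors can be flagged automatically. A minimal Python sketch,
  assuming each entry has already been parsed into a dict of field names to
  values plus a 'type' key; the checks are heuristics taken from the CASE-1
  examples in the Appendix (a missing or truncated journal, a host name such
  as 'computer.org' in the journal field, and a missing year). They will not
  catch wrong entry types or garbage titles.

```python
import re

# A journal field that looks like a host name, e.g. "computer.org" or "117.55.241."
HOST_LIKE = re.compile(r"^[\w-]+(\.[\w-]+)+\.?$")


def suspect_reasons(entry: dict[str, str]) -> list[str]:
    """Return heuristic reasons why a PoP entry looks broken (empty list if none)."""
    reasons = []
    if entry.get("type", "").lower() == "article":
        journal = entry.get("journal", "")
        stripped = journal.strip()
        if not stripped:
            reasons.append("article without journal")
        else:
            # PoP often cuts journal names, leaving trailing blanks or an ellipsis.
            if journal != journal.rstrip() or stripped.endswith(("…", "...")):
                reasons.append("journal looks truncated")
            if HOST_LIKE.match(stripped):
                reasons.append("journal is a host name")
    if not entry.get("year", "").strip():
        reasons.append("missing year")
    return reasons


if __name__ == "__main__":
    # Fields copied from two CASE-1 entries (pop0000a1 and pop000a94).
    examples = [
        {"type": "article", "journal": "IEEE Transactions on  ", "year": "2005"},
        {"type": "article", "journal": "computer.org"},
    ]
    for e in examples:
        print(suspect_reasons(e))
```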


 4. Appendix: Some big examples for Sjøberg and Conradi regarding PoP.

 4.1 CASE-1: Some PoP-errors in the 109 papers in pop-sjoberg-all.tex
 ====================================================================
 RC: added an 'a' prefix (like 0000a1, 0000a2, ...) to the keys of these papers.

 Many duplicates (21 and counting ...) and much garbage (at least 7).

 D1. CONTEX paper, 5 duplicates
 ------------------------------
  @article{pop0000a1,
	author = {DIK SjÁberg and JE Hannay and O Hansen and ...},
	title = {A survey of controlled experiments in software engineering},
	journal = {IEEE Transactions on  },
	publisher = {computer.org},
	url = {http://www.computer.org/portal/web/csdl/doi/10.1109/TSE.2005.97},
	year = {2005},
	note = {201 cites: http://scholar.google.com/scholar?
         cites=9068752407816095539\&as_sdt=2005\&sciodt=0,5\&hl=en\&num=100},
	note = {Query date: 14.06.2011},
}
  @article{pop000a11,
	author = {DIK Sjoberg and JE Hannay and ...},
	title = {Vigdis By Kampenes, Amela Karahasanovic, Nils-Kristian Liborg,
           Anette C. Rekdal, A Survey of Controlled Experiments in
           Software Engineering},
	journal = {IEEE Transactions on Software Engineering},
	year = {2005},
	note = {57 cites: http://scholar.google.com/scholar?
         cites=3972648676543320866\&as_sdt=2005\&sciodt=0,5\&hl=en\&num=100},
	note = {Query date: 14.06.2011},
}
  @article{pop000a43,
	author = {DIK SJØBERG and JE HANNAY and O HANSEN and ...},
	title = {V., KARAHASANOVI C, A., LIBORG, N.-K., AND C. REKDAL},
	journal = {A. A survey of controlled  },
	year = {2005},
	note = {2 cites: http://scholar.google.com/scholar?
         cites=1303181160453406441\&as_sdt=2005\&sciodt=0,5\&hl=en\&num=100},
	note = {Query date: 14.06.2011},
}
  @article{pop000a55, %% Note also textual error in title.
	author = {DIK SjÁberg and JE Hannay and ...},
	title = {Vigdis By Kampenes, Amela Karahasanovic, Nils-Kristian Liborg,
           and Anette C. Rekdal. 2005."
           A Survey of Controlled Experiments in Software  },
	journal = {IEEE Transactions on Software Engineering},
	note = {3 cites: http://scholar.google.com/scholar?
        cites=14313076272726917795\&as_sdt=2005\&sciodt=0,5\&hl=en\&num=100},
	note = {Query date: 14.06.2011},
}
  @article{pop000a58,
	author = {DIK Sjoberg and JE Hannay and ...},
	title = {Ka mpenes, VB, Kar ahasanovic, A., Liborg, N.-K. and Rekdal,
          AC 2005. A Su rvey of Controlled Experim ents in
          Software Engineering},
	journal = {IEEE Transactions on Software Engineering},
	note = {2 cites: http://scholar.google.com/scholar?
       cites=13627441494292258647\&as_sdt=2005\&sciodt=0,5\&hl=en\&num=100},
	note = {Query date: 14.06.2011},
}
  @article{pop000a69,
	author = {DIK SjÁberg and JE Hannay and O Hansen and
             VB Kampenes and ...},
	title = {A Survey of Controlled Experiments in Software},
	journal = {Engineering. In IEEE  },
	year = {2005},
	note = {2 cites: http://scholar.google.com/scholar?
         cites=6631457363046079464\&as_sdt=2005\&sciodt=0,5\&hl=en\&num=100},
	note = {Query date: 14.06.2011},
}

 D2. Future of Software Engineering research paper: 4 duplicates
 ---------------------------------------------------------------
 @book{pop0000a4, %% @book => @article??
	author = {DIK Sjoberg and T Dyba and ...},
	title = {The future of empirical methods in
            software engineering research},
	publisher = {computer.org},
	url = {http://www.computer.org/portal/web/csdl/doi/10.1109/FOSE.2007.30},
	year = {2007},
	note = {88 cites: http://scholar.google.com/scholar?
        cites=16645473109522102835\&as_sdt=2005\&sciodt=0,5\&hl=en\&num=100},
	note = {Query date: 14.06.2011},
}
  @article{pop000a22,
	author = {DIK Sjoberg and T Dyba and ...},
	title = {The Future of Empirical Methods in Software Engineering Research},
	journal = {  Conference on Software Engineering. IEEE Computer  },
	year = {2007},
	note = {19 cites: http://scholar.google.com/scholar?
     cites=2744968747580700492\&as_sdt=2005\&sciodt=0,5\&hl=en\&num=100},
	note = {Query date: 14.06.2011},
}
  @article{pop000a47,
	author = {DIK SjÁberg and T DybÁ and ...},
	title = {The future of empirical methods in software engineering research.
           Future of Software Engineering},
	journal = {  of the 29th International Conference on  },
	year = {2007},
	note = {2 cites: http://scholar.google.com/scholar?
         cites=6658786638492474235\&as_sdt=2005\&sciodt=0,5\&hl=en\&num=100},
	note = {Query date: 14.06.2011},
}
  @article{pop000a54,
	author = {DIK SjÁberg and T DybÁ and ...},
	title = {The Future of Empirical Methods in Software Engineering Research.
        presented at Future of Software Engineering--
        29th International Conference on  },
	journal = {IEEE Computer Society},
	year = {2007},
	note = {3 cites: http://scholar.google.com/scholar?
        cites=11436713780192218117\&as_sdt=2005\&sciodt=0,5\&hl=en\&num=100},
	note = {Query date: 14.06.2011},
}
  @article{pop000a56,
	author = {DIK SjÁberg and ...}, %% author+title: wrong text.
	title = {JÁrgensen. M. 2007. The Future of Empirical Methods
           in Software Engineering Research},
	journal = {Future of Software Engineering (FOSE'07), ed. by  },
	note = {2 cites: http://scholar.google.com/scholar?
       cites=15412413240314648\&as_sdt=2005\&sciodt=0,5\&hl=en\&num=100},
	note = {Query date: 14.06.2011},
}

 D3. Pair Programming meta-analysis paper: 2 duplicates
 ------------------------------------------------------ 
 @article{pop000a29,
	author = {  and T DybÁ and E Arisholm and DIK SjÁberg},
	title = {The effectiveness of pair programming: A meta-analysis},
	journal = {Information and Software  },
	publisher = {Elsevier},
	url = {http://linkinghub.elsevier.com/retrieve/pii/S0950584909000123},
	year = {2009},
	note = {17 cites: http://scholar.google.com/scholar?
         cites=7044227830545391005\&as_sdt=2005\&sciodt=0,5\&hl=en\&num=100},
	note = {Query date: 14.06.2011},
}
  @article{pop000a78,
	author = {  and T Dyba and E Arisholm and DIK Sjoberg},
	title = {The effectiveness of pair programming},
	journal = {Information and  },
	publisher = {dialnet.unirioja.es},
	url = {http://dialnet.unirioja.es/servlet/articulo?codigo=3003393},
	year = {2009},
	note = {Query date: 14.06.2011},
}
@article{pop000a94,
	author = {T Dybå and E Arisholm and DIK Sjøberg and JE Hannay and ...},
	title = {Studies on effectiveness},
	journal = {computer.org}, %% RC: assuming Journal of IST ??
	note = {Query date: 14.06.2011},
}

 D4. Pair Programming - IEEE SW paper: one duplicate
 ---------------------------------------------------
  @article{pop000a14,
	author = {T Dyba and E Arisholm and DIK Sjoberg and ...},
	title = {Are two heads better than one?
            On the effectiveness of pair programminga},
	journal = {Software,  },
	publisher = {ieeexplore.ieee.org},
	url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4375233},
	year = {2007},
	note = {48 cites: http://scholar.google.com/scholar?
         cites=1267146913481237176\&as_sdt=2005\&sciodt=0,5\&hl=en\&num=100},
	note = {Query date: 14.06.2011},
}
  @article{pop000a92,
	author = {T DYBA and E ARISHOLM and DIK SJOBERG and JE HANNAY and ...},
	title = {On the effectiveness of pair programming},
	journal = {IEEE software},
	publisher = {cat.inist.fr},
	url = {http://cat.inist.fr/?aModele=afficheN\&cpsidt=19205363},
	year = {2007},
	note = {Query date: 14.06.2011},
}

 D5. AQUIS paper: one duplicate
 ------------------------------  
  @article{pop000a63,
	author = {  and DIK SjÁberg},
	title = {A simple effort prediction interval method},
	journal = {Proceedings of Achieving Quality in Information  },
	year = {2002},
	note = {2 cites: http://scholar.google.com/scholar?
      cites=12271826550010681096\&as_sdt=2005\&sciodt=0,5\&hl=en\&num=100},
	note = {Query date: 14.06.2011},
}
  @article{pop000a64,
	author = {  and DIK SjÁberg},
	title = {A Simple Effort Prediction Interval Approach},
	journal = {Achieving Quality in Information Systems (AquIS)},
	year = {2002},
	note = {2 cites: http://scholar.google.com/scholar?
       cites=7105812472919430453\&as_sdt=2005\&sciodt=0,5\&hl=en\&num=100},
	note = {Query date: 14.06.2011},
}

 D6. POS-9 proceedings: one duplicate
 ------------------------------------
  @book{pop000a74,  %% author => editor
	author = {  and A Dearle and DIK SjÁberg},
	title = {Persistent object systems: design, implementation, and use:
     9th International Workshop, POS-9, Lillehammer, Norway,
     September 6-8, 2000: revised papers},
	publisher = {books.google.com},
	year = {2001},
	note = {Query date: 14.06.2011},
}
  @article{pop000a91, %% title has errors??
	author = {  and A Dearle and DIK SjÁberg},
	title = {POS-9: persistenet object systems: design, implementation,
       and use:(Lillehammer, revised papers)},
	journal = {Lecture notes in computer science},
	publisher = {cat.inist.fr},
	url = {http://cat.inist.fr/?aModele=afficheN\&cpsidt=65164},
	year = {2001},
	note = {Query date: 14.06.2011},
}

 D7. Basili SE protocol paper: one duplicate
 -------------------------------------------
  @article{pop000a35,
	author = {VR Basili and MV Zelkowitz and DIK SjÁberg and ...},
	title = {Protocols in the use of empirical software engineering artifacts},
	journal = {Empirical Software  },
	publisher = {Springer},
	url = {http://www.springerlink.com/index/k0508k4648004760.pdf},
	year = {2007},
	note = {10 cites: http://scholar.google.com/scholar?
         cites=6562074275962780612\&as_sdt=2005\&sciodt=0,5\&hl=en\&num=100},
	note = {Query date: 14.06.2011},
}
@book{pop000a80,  %% book/editor??
	author = {VR Basili and MV Zelkowitz and DIK SjÁberg and P Johnson and ...},
	title = {Empir Software Eng DOI 10.1007/s10664-006-9030-4
           Protocols in the use of empirical software engineering artifacts},
	publisher = {Citeseer},
	url = {http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.66.4027},
	year = {2008},
	note = {Query date: 14.06.2011},
}

 D8. Guide to advanced ESE: one duplicate
 ----------------------------------------
  @book{pop000a18,  %% author => editor
	author = {… and J Singer and DIK Sjøberg},
	title = {Guide to advanced empirical software engineering}, %% advanced??
	publisher = {books.google.com},
	year = {2007},
	note = {26 cites: http://scholar.google.com/scholar?
        cites=16889680598587760632\&as_sdt=2005\&sciodt=0,5\&hl=en\&num=100},
	note = {Query date: 14.06.2011},
}
@article{pop00a107,
	author = {… and Janice. Singer and DIK Sjøberg},
	title = {Guide to Advanced Empirical Software Engineering},
	journal = {Springer-Verlag London},
	note = {Query date: 14.06.2011},
}

 D9. Software constraint models: one duplicate
 ---------------------------------------------
  @article{pop00a104,
	author = {DIK Sjøberg},
	title = {Tittel: Software constraint models Undertittel: a means to
            improve maintainability and consistency Publisert år: 1994
            Dokumenttype: Artikkel Språk: Engelsk},
	journal = {duo.uio.no},
	url = {http://www.duo.uio.no/sok/work.html?WORKID=89934},
	note = {Query date: 14.06.2011},
}
@article{pop00a106,
	author = {DIK Sjøberg},
	title = {Software Constraint Models–A Means to
            Improve Maintainability and Consistency},
	journal = {Citeseer},
	note = {Query date: 14.06.2011},
}

 D10. Thesaurus-based methodologies ...: one duplicate
 -----------------------------------------------------
  @book{pop0000a9, %% book is his PhD thesis??
	author = {DIK Sjøberg},
	title = {Thesaurus-based methodologies and tools for
            maintaining persistent application systems},
	publisher = {University of Glasgow},
	year = {1993},
	note = {20 cites: http://scholar.google.com/scholar?
      cites=14572592812034530942\&as_sdt=2005\&sciodt=0,5\&hl=en\&num=100},
	note = {Query date: 14.06.2011},
}
@article{pop000a41,
	author = {DIK Sjøberg and MP Atkinson and ...},
	title = {Thesaurus-based software environments},
	journal = {The Intersection between …},
	publisher = {Citeseer},
	year = {1994},
	note = {5 cites: http://scholar.google.com/scholar?
       cites=180172948683811808\&as_sdt=2005\&sciodt=0,5\&hl=en\&num=100},
	note = {Query date: 14.06.2011},
}
@article{pop000a95,
	author = {DIK Sjøberg and MP Atkinson and ...},
	title = {Tittel: Thesaurus-based software environments Publisert år: 1994 Dokumenttype: Artikkel Språk: Engelsk},
	journal = {duo.uio.no},
	url = {http://www.duo.uio.no/sok/work.html?WORKID=89935},
	note = {Query date: 14.06.2011},
}

 D11. Tichy-shared maintenance paper: one duplicate
 --------------------------------------------------
@article{pop000a21,
	author = {M Vokáč and W Tichy and DIK Sjøberg and E Arisholm and ...},
	title = {A controlled experiment comparing the maintainability of
            programs designed with and without design
           patterns—a replication in a real programming environment},
	journal = {Empirical Software …},
	publisher = {Springer},
	url = {http://www.springerlink.com/index/m370523qvm4489h4.pdf},
	year = {2004},
	note = {33 cites: http://scholar.google.com/scholar?
        cites=15494631168699208001\&as_sdt=2005\&sciodt=0,5\&hl=en\&num=100},
	note = {Query date: 14.06.2011},
}
  @article{pop000a99,
	author = {W Tichy and DIK Sjøberg and E Arisholm and ...},
	title = {A Controlled Experiment Comparing the Maintainability of
            Programs Designed With And Without Design Patterns:
           A Replication In A Real Programming Environment”, …},
	journal = {Empirical Software …},
	publisher = {Citeseer},
	url = {http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.143.3325},
	year = {2004},
	note = {Query date: 14.06.2011},
}

 D12. SPIQ reflection paper for EWSPT'03: one duplicate.
 -------------------------------------------------------
  @article{pop000a46,
	author = {R Conradi and T Dybå and DIK Sjøberg and ...},
	title = {Lessons learned and recommendations from two large
            norwegian SPI programmes},
	journal = {… process technology: 9th …},
	publisher = {books.google.com},
	url = {http://books.google.com/books?hl=en\&lr=\&id=RooizOPAQX8C\&
          oi=fnd\&pg=PA32\&dq=DIK+Sjoberg\&ots=RsJWA7UhZQ\&
          sig=tTV_2jy3I9pBIz8kdxcnbtKkcII},
	year = {2003},
	note = {3 cites: http://scholar.google.com/scholar?
        cites=6917046709958087706\&as_sdt=2005\&sciodt=0,5\&hl=en\&num=100},
	note = {Query date: 14.06.2011},
}
  @article{pop00a103,
	author = {RCT Dybå and DIK Sjøberg and ...},
	title = {Lessons Learned and Recommendations from
            Two Large Norwegian SPI Programmes},
	journal = {Software Process Technology},
	publisher = {Springer},
	url = {http://www.springerlink.com/index/h61mxacxmk3f1y7a.pdf},
	year = {2003},
	note = {Query date: 14.06.2011},
}

 D13. Code smell paper: one duplicate
 ------------------------------------
  @article{pop000a61,
	author = {… and DS Cruzes and DIK Sjoberg},
	title = {Are all code smells harmful},
	journal = {A study of God Classes and Brain Classes in …},
	year = {2010},
	note = {2 cites: http://scholar.google.com/scholar?
       cites=8517321368272658612\&as_sdt=2005\&sciodt=0,5\&hl=en\&num=100},
	note = {Query date: 14.06.2011},
}
@article{pop000a66,
	author = {… and DS Cruzes and DIK Sjoberg},
	title = {Are all code smells harmful? A study of God Classes
           and Brain Classes in the evolution of three open source systems},
	journal = {… (ICSM), 2010 IEEE …},
	publisher = {ieeexplore.ieee.org},
	url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5609564},
	year = {2010},
	note = {2 cites: http://scholar.google.com/scholar?
     cites=16245556814282503450\&as_sdt=2005\&sciodt=0,5\&hl=en\&num=100},
	note = {Query date: 14.06.2011},
}

 D14. Theory Use paper: one duplicate
 ------------------------------------

 D15. ?? paper: one duplicate
 ----------------------------

 D16. ?? paper: one duplicate
 ----------------------------

 -- and there may be more duplicates!!


 Errors and Garbage
 ==================

 G1. Sjøberg's CV as a paper: one paper  %% crazy!!
 --------------------------------------
  @article{pop000a37,
	author = {DIK Sjøberg},
	title = {received the MSc degree in computer science from the
           University of Oslo in 1987 and the PhD degree in
           computing science from the University of …},
	journal = {Empirical Software Enineering},
	note = {2 cites: http://scholar.google.com/scholar?
         cites=5736062937442514044\&as_sdt=2005\&sciodt=0,5\&hl=en\&num=100},
	note = {Query date: 14.06.2011},
}

 G2. Wrong titles: mixed authors and numbers, 'Submitted to', misc.
                     (at least six papers)
 ------------------------------------------------------------
  @article{pop000a77,
	author = {… and SK Shrivastava and DIK Sjoberg and ...},
	title = {Anfindsen, Ole J. 215 Atkinson, Malcolm v, 1, 235,307,
      335 Berman, S. 171, 250 Blackburn, Stephen M. 37, 215, 259, 363},
	journal = {… in persistent object …},
	publisher = {Morgan Kaufmann Pub},
	year = {1999},
	note = {Query date: 14.06.2011},
}

  @article{pop000a49,
	author = {DIK Sjøberg and PC Philbrow and C Waite and ...},
	title = {Build management in database programming language environments},
	journal = {Submitted to: 6th International …},
	year = {1995},
	note = {2 cites: http://scholar.google.com/scholar?
          cites=399355406703947029\&as_sdt=2005\&sciodt=0,5\&hl=en\&num=100},
	note = {Query date: 14.06.2011},
}

  @article{pop000a76,  %% a7 = a48
	author = {… and T Dybå and DIK Sjøberg and JE Hannay and
             DIK Sjøberg and ...},
	title = {ARE ENGINEERING},
	journal = {ieeexplore.ieee.org},
	url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4052582},
	note = {Query date: 14.06.2011},
   %% DIK Sjøberg listed twice as author??
   %%   JE Hannay, DIK Sjøberg, T. Dybå:
   %% 'A Systematic Review of Theory Use in Software Engineering
   %%   Experiments',
   %%   IEEE Transactions on Software Engineering, Feb. 2007,
   %%   33(2):87-107, DOI:10.1109/TSE.2007.12.
}

  @book{pop000a85, %% @book => @article (which paper?)
	author = {… and G Brunet and M Chechik and BCD Anda and
             DIK Sjøberg and ...},
	title = {ARE ENGINEERING},
	publisher = {ieeexplore.ieee.org},
	url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5061639},
	year = {2009},
	note = {Query date: 14.06.2011},
   %%	author = {G Brunet and M Chechik and BCD Anda and
   %%             DIK Sjøberg and ...},
   %% 'xxx',
   %%   IEEE Transactions on Software Engineering, Feb. 20??
   %%   xx(x):xx-xxx, DOI:??.
}
  @book{pop000a87,  %% @book => @article
	author = {… and JE Hannay and E Arisholm and
            H Engvik and DIK Sjøberg and ...},
	title = {ARE ENGINEERING},
	publisher = {ieeexplore.ieee.org},
	url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5401362},
	year = {2010},
	note = {Query date: 14.06.2011},
}
  @article{pop000a96,
	author = {D Wang and FB Bastani and IL Yen and DIK Sjøberg and ...},
	title = {ARE ENGINEERING},
	journal = {117.55.241.},
	url =	{http://117.55.241.6/library/ieee/2005/Software%20Engineering/
            Vol.%2031%20Issue%209/table%20of%20contents.pdf},
	note = {Query date: 14.06.2011},
}
 %%%%%%%%%%%%%%%%%%%%%%%% PoP-errors from pop-sjoberg-all.tex
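 Most of the garbage above follows a few recognizable patterns, so a first
 screening could be automated. This is a minimal Python sketch. It assumes each
 PoP entry has been parsed into a dict of BibTeX fields. suspicious_reasons()
 and every rule in it are hypothetical heuristics derived from the error cases
 in this appendix. They will miss some errors and flag some valid entries (for
 example, a legitimate paper whose only venue is a web host).

```python
import re

# Host names or bare IP fragments used as a venue, e.g. 'ieeexplore.ieee.org'
# or '117.55.241.' (assumed list of top-level domains; extend as needed).
HOST_RE = re.compile(r"(?:[\w-]+\.)+(?:org|com|net|edu|no)|[\d.]+")


def suspicious_reasons(entry):
    """Return heuristic reasons why a PoP entry (a dict of BibTeX fields)
    looks like garbage; an empty list means no rule fired."""
    title = entry.get("title", "").strip()
    # Prefer the journal field; fall back to publisher when journal is absent.
    venue = (entry.get("journal") or entry.get("publisher") or "").strip()
    reasons = []
    if title.upper() == "ARE ENGINEERING":
        reasons.append("truncated title")
    if title.startswith("Tittel:"):
        reasons.append("DUO metadata used as title")
    if title[:1].islower():
        # e.g. the CV fragment 'received the MSc degree ...' in G1
        reasons.append("title starts in lower case")
    if len(re.findall(r"\d+", title)) >= 5:
        # e.g. an index page with author names and page numbers
        reasons.append("title full of numbers")
    if HOST_RE.fullmatch(venue):
        reasons.append("venue is a host name or IP address")
    if venue.startswith("Submitted to"):
        reasons.append("unpublished 'Submitted to' venue")
    return reasons
```

 Entries with a non-empty list would then go to manual inspection rather than
 being deleted automatically.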


 4.2 CASE-2: The missing 12 PoP-papers in pop-sjoberg-excl-chem-mat.tex
 ======================================================================
 pop-sjoberg-all.tex has 109 papers, but pop-sjoberg-excl-chem-mat.tex has
 only 97, so where are the other 12 (named X1-X12 below)?
 None of them deals with Chemistry or Materials Science!

 X1. @article{pop000a11, =a1 in pop*.tex
   author = {DIK Sjoberg and JE Hannay and ...},
	title = {Vigdis By Kampenes, Amela Karahasanovic, Nils-Kristian Liborg,
      Anette C. Rekdal, A Survey of Controlled Experiments in
      Software Engineering},
	journal = {IEEE Transactions on Software Engineering},
	year = {2005},
	note = {57 cites: http://scholar.google.com/scholar?
        cites=3972648676543320866\&as_sdt=2005\&sciodt=0,5\&hl=en\&num=100},
	note = {Query date: 14.06.2011},
   %% Generally OK.
}

 X2. @article{pop000a38, %% @article => @inproceedings 
                         %%            (+some @proceedings & editor)
	author = {… and B Anda and M Jørgensen and DIK Sjøberg},
	title = {Guidelines on Conducting Software Process Improvement
           Studies in Industry},
	journal = {Seminar in Scandinavia ( …},
	publisher = {Citeseer},
	year = {1999},
	note = {9 cites: http://scholar.google.com/scholar?
        cites=12426118134991045755\&as_sdt=2005\&sciodt=0,5\&hl=en\&num=100},
	note = {Query date: 14.06.2011},
   %% author = {Erik Arisholm, Bente Anda, Magne Jrgensen and
   %%           Dag I.K. Sjberg},
   %% title = {Guidelines on Conducting Software Process Improvement
   %%          Studies in Industry},
   %% book = {Proc. 22nd Information Research Seminar in Scandinavia (IRIS)},
   %% editor = {??},
   %% publisher = {??},
   %% year = {1999},
   %% where = {Jyvaeskylae??, Finland},
   %% pages = {??},
}

 X3. @article{pop000a55,  %% = X1 =a11 above, = a1 elsewhere
	author = {DIK Sjøberg and JE Hannay and ...},
	title = {Vigdis By Kampenes, Amela Karahasanovic, Nils-Kristian Liborg,
           and Anette C. Rekdal. 2005." A Survey of Controlled	Experiments
           in Software …},
	journal = {IEEE Transactions on Software Engineering},
	note = {3 cites: http://scholar.google.com/scholar?
       cites=14313076272726917795\&as_sdt=2005\&sciodt=0,5\&hl=en\&num=100},
	note = {Query date: 14.06.2011},
   %% Almost OK.
}

 X4. @article{pop000a76,  = a7
	author = {… and T Dybå and DIK Sjøberg and JE Hannay and
             DIK Sjøberg and ...},
	title = {ARE ENGINEERING},
	journal = {ieeexplore.ieee.org},
	url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4052582},
	note = {Query date: 14.06.2011},
   %% author = {JE Hannay and DIK Sjoberg and T Dyba},%% Note author sequence
   %% title = {A systematic review of Theory Use in software engineering
   %%          experiments},
   %% IEEE TSE, 33(2):87-107, Feb. 2007.
   %% NB: Preceding paper:
   %%   E. Arisholm, H. Gallis, T. Dybå, and D.I.K. Sjøberg:
   %%   Evaluating Pair Programming with Respect to System Complexity and
   %%   Programmer Expertise, on pp. 65-86 in same IEEE TSE issue!!
}

 X5. @book{pop000a79, %% @book => @proceedings, author => editor
	author = {… and AL Opdahl and DIK Sjøberg},
	title = {Proceedings of NWPER'2000: Nordic Workshop on Programming
            Environment Research: Lillehammer, Norway, May 28-30, 2000},
	publisher = {University of Bergen, Dept. of …},
	year = {2000},
	note = {Query date: 14.06.2011},
}

 X6. @article{pop000a84,
	author = {DIK Sjøberg},
	title = {Tittel: Quantifying schema evolution;
           Publisert år: 2009 Dokumenttype: Artikkel Språk: Norsk Bokmål},
	journal = {duo.uio.no},
	url = {http://www.duo.uio.no/sok/work.html?WORKID=89933\&lang=no},
	note = {Query date: 14.06.2011},
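   %% journal = {Information and Software Technology}, 35(1):35-44 (1993).
   %% (The title field is Norwegian DUO metadata: 'Title: ...; Year published:
   %%  2009; Document type: Article; Language: Norwegian Bokmål'.)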
}

 X7. @book{pop000a85, %% @book => @article
	author = {… and G Brunet and M Chechik and BCD Anda and
            DIK Sjøberg and ...},
	title = {ARE ENGINEERING},
	publisher = {ieeexplore.ieee.org},
	url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5061639},
	year = {2009},
	note = {Query date: 14.06.2011},
   %% B.C.D. Anda, D.I.K. Sjøberg, and A. Mockus:
   %%   Variability and Reproducibility in Software Engineering:
   %%   A Study of Four Companies that Developed the Same System.
   %%   IEEE TSE, 35(3):407-429, May-June 2009.
   %%   DOI: 10.1109/TSE.2009.37.
   %% Note the preceding paper in the same issue:
   %%   Sebastián Uchitel, Greg Brunet, Marsha Chechik:
   %%   Synthesis of Partial Behavior Models from Properties and Scenarios.
   %%   IEEE TSE, 35(3):384-406 (2009).
}

 X8. @book{pop000a87, = a28, %% @book => @article
	author = {… and JE Hannay and E Arisholm and H Engvik and
             DIK Sjøberg and ...},
	title = {ARE ENGINEERING},
	publisher = {ieeexplore.ieee.org},
	url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5401362},
	year = {2010},
	note = {Query date: 14.06.2011},
   %% author ={J.E. Hannay and E. Arisholm and H. Engvik and D.I.K. Sjøberg},
   %% title = {Effects of personality on pair programming},
   %% IEEE TSE 36(1):61-80, Jan.-Feb. 2010.
}

 X9. @book{pop000a89, %% @book => @inproceedings
                      %%         (+ some @proceedings + editor)
	author = {DIK Sjøberg and MP Atkinson and ...},
	title = {Managing change in persistent object systems},
	publisher = {en.scientificcommons.org},
	url = {http://en.scientificcommons.org/42169124},
	year = {1993},
	note = {Query date: 14.06.2011},
   %% author = {Malcolm P. Atkinson, Dag I. K. Sjøberg, Ronald Morrison},
   %% Managing Change in Persistent Object Systems.
   %% In Shojiro Nishio, Akinori Yonezawa (Eds.): Object Technologies for
   %% Advanced Software, First JSSST International Symposium, Kanazawa,
   %% Japan, November 4-6, 1993, Proceedings. Lecture Notes in Computer
   %% Science 742, Springer Verlag, 1993, ISBN 3-540-57342-9, pp. 315-338.
}

 %% The same paper in DBLP's BibTeX:
@inproceedings{DBLP:conf/isotas/AtkinsonSM93,
  author    = {Malcolm P. Atkinson and
               Dag I. K. Sj{\o}berg and
               Ronald Morrison},
  title     = {Managing Change in Persistent Object Systems},
  booktitle = {ISOTAS},
  year      = {1993},
  pages     = {315-338},
  ee        = {http://dx.doi.org/10.1007/3-540-57342-9_81},
  crossref  = {DBLP:conf/isotas/1993},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

@proceedings{DBLP:conf/isotas/1993,
  editor    = {Shojiro Nishio and
               Akinori Yonezawa},
  title     = {Object Technologies for Advanced Software, First JSSST
               International Symposium, Kanazawa, Japan, November 4-6, 1993,
               Proceedings},
  booktitle = {ISOTAS},
  publisher = {Springer},
  series    = {Lecture Notes in Computer Science},
  volume    = {742},
  year      = {1993},
  isbn      = {3-540-57342-9},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

 X10. @article{pop000a96, %% remove this entry!!
	author = {D Wang and FB Bastani and IL Yen and DIK Sjøberg and ...},
	title = {ARE ENGINEERING},
	journal = {117.55.241.},
   url =	{http://117.55.241.6/library/ieee/2005/Software%20Engineering/
        Vol.%2031%20Issue%209/table%20of%20contents.pdf}, %% ??
	note = {Query date: 14.06.2011},
   %% D. Wang, F.B. Bastani, I.-L. Yen:   
   %%   Automated aspect-oriented decomposition of process-control
   %%   systems for ultra-high dependability assurance.
   %%   IEEE TSE, 31(9):713-732, Sept. 2005.
   %% But the succeeding paper in this issue has DIK Sjøberg as first author!!
   %%   Dag I. K. Sjøberg, Jo Erskine Hannay, Ove Hansen, Vigdis By Kampenes,
   %%   Amela Karahasanovic, Nils-Kristian Liborg, Anette C. Rekdal: A
   %%   Survey of Controlled Experiments in Software Engineering.
   %%   IEEE TSE, 31(9): 733-753 (2005).
}

 X11. @article{pop000a98, %% @article => @inproceedings
                          %% (+ some @proceedings & author=NN elsewhere)
	author = {DIK Sjøberg and MP Atkinson and J Lopes and ...},
	title = {Tittel: Building an integrated persistent application
         Publisert år: 1993 Dokumenttype: Konferansebidrag Språk: Engelsk},
	journal = {duo.uio.no},
	url = {http://www.duo.uio.no/sok/work.html?WORKID=89912\&lang=no},
	note = {Query date: 14.06.2011},                    
   %% (Norwegian metadata: Konferansebidrag = conference paper, Engelsk = English.)
   %% Dag I. K. Sjøberg, Malcolm P. Atkinson, João Lopes, Philip W. Trinder:
   %%   Building an Integrated Persistent Application.
   %%   Proc. Fourth International Workshop on
   %%   Database Programming Languages - Object
   %%   Models and Languages (DBPL), 1993, pp. 359-375.
}

 X12. @article{pop00a100, = X9
	author = {DIK Sjøberg and MP Atkinson and ...},
	title = {Tittel: Managing change in persistent object systems
         Publisert år: 1993 Dokumenttype: Konferansebidrag Språk: Engelsk},
	journal = {duo.uio.no},
	url = {http://www.duo.uio.no/sok/work.html?WORKID=89932},
	note = {Query date: 14.06.2011},
}
end %%%%%%%%%%%%%%%%%% subset of pop-sjoberg-all.tex 14.06.2011
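 PoP keys appear to be assigned per export (X1 is a11 in one file and a1 in
 another), so the 12 missing papers cannot be found reliably by comparing keys.
 Comparing normalized titles works better. This Python sketch is written under
 that assumption, and the function names are hypothetical. A Counter is used so
 that repeated garbage titles such as 'ARE ENGINEERING' are counted with
 multiplicity. The regex also assumes that titles contain no nested braces,
 which holds for the entries shown here.

```python
import re
from collections import Counter

# A 'title = {...},' field; \b stops 'booktitle' from matching.
TITLE_RE = re.compile(r"\btitle\s*=\s*\{(.*?)\}\s*,", re.DOTALL)


def normalize(title):
    """Lower-case and collapse runs of non-alphanumerics (incl. line breaks)."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def title_counts(bibtex_text):
    """Count normalized titles; a Counter keeps repeated garbage titles."""
    return Counter(normalize(t) for t in TITLE_RE.findall(bibtex_text))


def missing_titles(full_text, subset_text):
    """Titles in full_text that subset_text lacks, with multiplicity."""
    return title_counts(full_text) - title_counts(subset_text)


# Hypothetical usage with the two PoP export files (encoding is an assumption):
# with open("pop-sjoberg-all.tex", encoding="utf-8") as f_all, \
#      open("pop-sjoberg-excl-chem-mat.tex", encoding="utf-8") as f_sub:
#     print(missing_titles(f_all.read(), f_sub.read()))
```

 Because distinct garbage entries share the title 'ARE ENGINEERING', the result
 only tells how many of them are missing, not which ones. Those would still
 need to be matched by URL.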


 4.3 CASE-3: Prune Conradi's 337=>297 papers on pop-conradi-EngCSMath.tex
 ========================================================================
 File:  pop-conradi-onlyEngCSMath.tex  15.06.2011

 Query: author 'R Conradi', subject area Engineering, Computer Science and
        Mathematics only ('IT').

 Summary:

 Result:  337 seemingly valid BibTeX entries.
          285 with 'R Conradi' => 'C cc'.
            1 with 'COnradi' (no182 has a capital 'O')!
            4 where he was added to the author list (no137, no246, no282,
              no290), without an explicit 'C cc'.
            7 with 'R CONRADI' => 'R RRRRRRR' (all in capital letters).
          297 correct ones (285 + 1 + 4 + 7)!!
 Deducted as other persons named Conradi (insert a filter for these?):
           36 '* Conradi' => '* OtherXX' (* is 'R' or empty)
            4 '* CONRADI' => '* OTHERXX' (likewise)
          (In 5 URLs, '*R+Conradi' => 'X+XX'; these create no extra BibTeX entries.)
          297 + 36 + 4 = 337.
          Note: a query for 'R Conradi' cannot be limited to the single
                initial 'R'; it also returns '*R', i.e. any initials ending in 'R'.

 Plus the usual number of inconsistencies!!

end %%%%%%%%%%%%%%%%%% pop-conradi-EngCSMath.tex (IT) 14.06.2011
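 The 'R Conradi' versus '*R Conradi' problem can at least be handled locally,
 after the export, with a regular expression that requires the initial 'R' to
 stand alone. This is a minimal Python sketch. has_r_conradi() and
 count_r_conradi() are hypothetical helpers, not PoP features. The filter
 cannot separate the other persons whose name really is 'R Conradi', so those
 still need a manual check.

```python
import re

# Hypothetical local post-filter on PoP author strings. The initial 'R' must
# not be preceded by a letter, digit or '.', so initials that merely end in R
# ('JR Conradi', 'J.R. Conradi') are rejected. IGNORECASE also accepts
# 'R CONRADI' and the 'R COnradi' variant seen in no182.
R_CONRADI = re.compile(r"(?<![\w.])R\.?\s+Conradi\b", re.IGNORECASE)


def has_r_conradi(author_field):
    """True if the author list contains 'R Conradi' with exactly the initial R."""
    return R_CONRADI.search(author_field) is not None


def count_r_conradi(author_fields):
    """Number of entries whose author list passes the filter."""
    return sum(1 for a in author_fields if has_r_conradi(a))
```

 For example, count_r_conradi(["R Conradi", "JR Conradi", "R. Conradi"]) gives 2.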