Overview of different types of Empirical Studies

From Parastoo Mohagheghi and Reidar Conradi: "Quality, Productivity and Economic Benefits of Software Reuse -- A Review of Industrial Studies", Accepted Jan. 2007 for Journal of Empirical Software Engineering, 55 p.
http://www.idi.ntnu.no/grupper/su/publ/parastoo/jese-reusereview-8mar07.pdf

2.2 The different Study Types

Reidar: rewrite this intro to previous chapter 2.2.
We (the two mentioned authors) decided to include in the review all studies that report quantitative, reuse-related results from industry, and then to classify their study type, leaving out surveys and papers that offer discussion but no hard data. The study type is important information in each study, since it communicates what can be expected from the study and how its evidence should be evaluated. However, a literature search for study types showed that the definitions are neither consistent nor well communicated. We therefore have to define our own perspective on study types.

One definition of study types applied to empirical research is given by Zannier et al. (2006) (see that paper for a complete list of their references). Table 1 shows these definitions, named T1-T9, together with some others that we found.
Reidar: added five new ones, named T10-T14. More to come as Tn, cf. (Zelkowitz and Wallace, 1998)??.

Table 1 Study types and their definitions

Study type | Definition (for T1-T9 verbatim after (Zannier et al., 2006); Reidar: added Quantitative or Qualitative) | Other definitions | Examples of studies (very preliminary)
T1. Controlled experiment All of the following exist: Random assignment of treatments to subjects. Large sample size (>10 participants). Hypotheses formulated. Independent variable selected. Random sampling [1]. Quantitative. - Controlled study (Zelkowitz and Wallace, 1998).
- Experimental study where, in particular, the allocation of subjects to treatments is under the control of the investigator (Kitchenham, 2004).
- Experiment with control and treatment groups and random assignment of subjects to the groups, or single-subject design with observations of a single subject. The randomization applies to the allocation of objects and subjects, and to the order in which the tests are performed (Wohlin et al., 1999).
- Experiments explore the effects of things that can be manipulated. In randomized experiments, treatments are assigned to experimental units by chance (Shadish et al., 2001).
Our note: Randomization is used to assure a valid sample, i.e. a representative subset of the study population, whether in an experiment or in another type of study (a minimal sketch of random assignment is given after Table 1). However, defining the study population and a sampling approach that assures representativeness is not an easy task, as discussed in (Conradi et al., 2005).
refs??
T2. Quasi-experiment One or more points in Controlled Experiment are missing [3]. Mostly Quantitative. - In a quasi-experiment, there is a lack of randomization of either subjects or objects (Wohlin et al., 1999).
- Quasi-experiment where strict experimental control and randomization of treatment conditions are not possible. This is typical in industrial settings (Frakes and Succi, 2001).
- Quasi-experiments lack random assignment. The researcher has to enumerate alternative explanations one by one, decide which are plausible, and then use logic, design, and measurement to assess whether each one is operating in a way that might explain any observed effect (Shadish et al., 2001).
refs??
T3. Case study All of the following exist: Research question stated. Propositions stated. Unit(s) of analysis stated. Logic linking the data to propositions stated. Criteria for interpreting the findings provided. Performed in a "real world" situation [26]=(Yin, 2003). Qualitative/Quantitative. - A case study is an empirical inquiry that investigates a contemporary phenomenon within its real-life context, especially when the boundaries between phenomenon and context are not clearly evident. A sister-project case study refers to comparing two almost similar projects in the same company, one with and the other without the treatment (Yin, 2003).
- Observational studies are either case studies or field studies. The difference is that multiple projects are monitored in a field study, possibly with less depth, while case studies focus on a single project (Zelkowitz and Wallace, 1998).
- Case studies fall under observational studies with uncontrolled exposure to treatments; they may or may not involve a control group, and may be done at one point in time or historically (Kitchenham, 2004).
Some papers on case studies (just a pre-taste!):
  • Hans Jørgen Lied and Tor Stålhane: "Experience from process improvement in an SME", R. Messnarz (Ed.): Proc. European Software Process Improvement Conference (EuroSPI'99), Pori, Finland, October 25-27, 1999, Pori School of Technology and Economics, Serie A25, 13 p.
T4. Exploratory case study One or more of the points in Case Study are missing [26]. Mostly Qualitative. The propositions are not stated, but the other components should be present (Yin, 2003). refs??
T5. Experience report All of the following exist: Retrospective. No propositions (generally). Does not necessarily answer how or why. Often includes lessons learned [17]. Qualitative/Quantitative. Our note 1: Covers Postmortem Analysis (PMA) for situations such as completion of large projects, learning from success, or recovering from failure (Birk et al., 2002).
- Our note 2: Data mining can be a way to perform some kind of case study on available data from past projects, possibly assisted by semi-automatic data capture and processing via scripts etc. (see the data-capture sketch after Table 1).
refs??
T6. Meta-analysis Study incorporates results from previous similar studies in the analysis [9]. Qualitative/Quantitative. - Historical studies examine completed projects or previously published studies (Zelkowitz and Wallace, 1998).
Our note: Meta-analysis covers a range of techniques for statistically summarizing the findings of several studies (see the weighted-mean sketch after Table 1).
refs??
T7. Example application Authors describe an application and provide an example to assist in the description, but the example is not "used to validate" or "evaluate" as far as the authors suggest [21]. Qualitative. Our note: If an example is used to evaluate a technique already developed, or to apply a technique in a new setting, it is not classified under example application.
Maybe informal feasibility study or demonstrator is a better name?
refs??
T8. Survey Structured or unstructured questions given to participants [16]. - The primary means of gathering quantitative data in surveys is a well-designed, textual questionnaire, containing mostly closed questions (Wohlin et al., 1999). Typical ways to fill in a questionnaire are by paper copy via post or possibly fax, by email attachment, by phone or site interviews, or recently by web (Conradi et al., 2005).
- Structured interviews, with both quantitative and qualitative questions, rely on a pre-made interview guide. Such interviews are also used to investigate more open and qualitative research questions with some generalization potential. Sometimes the interview guide can be reworked into a questionnaire, once more is known about the field of study.
refs??
T9. Discussion Provided some qualitative, textual, opinion-oriented evaluation, e.g. compare and contrast, oral discussion of advantages and disadvantages [no reference]. Qualitative. Expert opinion (Kitchenham, 2004). refs??
T10. Literature study / review Systematic and representative review of previous literature on the subject in question. Qualitative/Quantitative. Alternative def.: see [22] on a survey (here called a literature review) of 103 controlled experiments.
T11. Ethnographical field study Observation of the study object - passive or more participative? Qualitative. Alternative def.: see [Perry1994] on an observational study of Lucent developers.
T12. Depth interview No or few questions prepared beforehand, as opposed to a T8 survey. Qualitative. Comment: may take hours or days. refs??
T13. Action research Researcher mingles with the observed persons, often developers; that is, active co-participation, in contrast with the more passive observation of a strict T11 field study. Qualitative. Alternative def.: see [Davison2004] on canonical action research.
T14. Grounded theory Generalization based on several of the more qualitative study types above (T11-T13). Qualitative. See .. on grounded theory applied in informatics. refs??
Tn. More methods?? E.g. the 12 in (Zelkowitz and Wallace, 1998). Quantitative or Qualitative. 12 validation methods, each characterized by Name, Category (one of three, given below in parentheses), Description, and Weakness: Project monitoring, Case study, Assertion, and Field study (Observational); Literature search, Legacy, Lessons learned, and Static analysis (Historical); Replicated, Synthetic, Dynamic analysis, and Simulation (Controlled). Further methods are characterized in four other categories: Scientific method, Engineering method, Empirical method, and Analytical method. refs??
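
Illustration of the randomization note under T1 (our addition): a minimal sketch, in Python, of random assignment of subjects to treatments - the property that separates a controlled experiment (T1) from a quasi-experiment (T2). The subject and treatment names are made up for illustration.

  import random

  # Hypothetical subjects and treatments, for illustration only.
  subjects = [f"developer-{i}" for i in range(1, 13)]  # sample size > 10, cf. T1
  treatments = ["with-reuse", "without-reuse"]

  random.shuffle(subjects)  # chance, not self-selection, decides the groups
  groups = {t: [] for t in treatments}
  for i, subject in enumerate(subjects):
      groups[treatments[i % len(treatments)]].append(subject)

  for treatment, members in groups.items():
      print(treatment, members)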
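
Illustration of note 2 under T5 (our addition): a minimal sketch of semi-automatic data capture from past projects. The file name and column names are assumptions; a real study would adapt them to the company's problem-report database.

  import csv
  from collections import defaultdict

  # Hypothetical export of problem reports from past projects; the file
  # name and column names are assumptions for illustration.
  reports_per_component = defaultdict(int)
  with open("problem_reports.csv", newline="") as f:
      for row in csv.DictReader(f):  # expects a "component" column
          reports_per_component[row["component"]] += 1

  for component, count in sorted(reports_per_component.items()):
      print(f"{component}: {count} problem reports")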
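
Illustration of the note under T6 (our addition): one common meta-analysis technique is the fixed-effect model, which combines per-study effect sizes into an inverse-variance weighted mean. A minimal sketch with made-up numbers:

  # Fixed-effect meta-analysis: inverse-variance weighted mean effect size.
  # The effect sizes and variances below are made up for illustration.
  studies = [(0.40, 0.04), (0.25, 0.09), (0.55, 0.02)]  # (effect size, variance)

  weights = [1.0 / var for _, var in studies]
  combined = sum(w * es for (es, _), w in zip(studies, weights)) / sum(weights)
  variance = 1.0 / sum(weights)  # variance of the combined estimate
  print(f"combined effect size: {combined:.3f} (variance {variance:.4f})")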

Comments (Reidar: rewrite these later):

Zannier et al. (2006) analyzed, using the above classification, a random sample of 63 papers published in 29 ICSE (the International Conference on Software Engineering) proceedings since 1975. The authors of only 25 papers had themselves defined their study type; Zannier et al. report both the authors' classification and their own. We use their definitions, but add that studies performed at a single point in time are called cross-sectional, as opposed to longitudinal studies.

A case study may be comparative. Kitchenham and Pickard (1998) describe three methods of comparison in a quantitative case study: a) comparing the results of using a new method with a company baseline; b) comparing components within a project that are randomly exposed to a new method with the remaining components (within-project component comparison); and c) comparing a project using a new method with a sister project that uses the current method (sister-project case study). An alternative sister-project design is to develop the same product twice using different methods (replicated product design). This review found examples of b) and c) in different types of studies, and we therefore call these methods of comparison component comparison (the components may come from one or several products) and sister-project comparison (including replicated product design).
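
For instance, a within-project component comparison (method b above) can be as simple as contrasting defect densities between components developed with and without the treatment, here reuse; the numbers below are hypothetical:

  # Hypothetical (defects, KLOC) pairs per component in one project, split by
  # whether the component was developed with reuse (the treatment).
  reused = [(3, 12.0), (1, 8.5), (4, 20.0)]
  non_reused = [(9, 10.0), (7, 15.5), (5, 6.0)]

  def defect_density(components):
      # Total defects per KLOC over a set of components.
      defects = sum(d for d, _ in components)
      size = sum(kloc for _, kloc in components)
      return defects / size

  print(f"reused:     {defect_density(reused):.2f} defects/KLOC")
  print(f"non-reused: {defect_density(non_reused):.2f} defects/KLOC")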

References - Reidar: slim this to the actually used ones.

Baldassarre, M.T., Bianchi, A., Caivano, D. and Visaggio, G. 2005. An industrial case study on reuse oriented development. Proc. 21st IEEE Int'l Conf. on Software Maintenance (ICSM'05): 283-292.

Basili, V.R. 1990. Viewing maintenance as reuse-oriented software development. IEEE Software. 7(1): 19-25.

Basili, V.R., Briand, L.C. and Melo, W.L. 1996. How software reuse influences productivity in object-oriented systems. Comm. of the ACM. 39(10): 104-116.

Birk, A., Dingsøyr, T. and Stålhane, T. 2002. Postmortem: Never leave a project without it. IEEE Software. 19(3): 43-45.

Boehm, B., Brown, W., Madachy R. and Yang, Y. 2004. Software product line cycle cost estimation model. Proc. 2004 ACM-IEEE Int'l Symposium on Empirical Software Engineering (ISESE'04): 156-164.

Conradi, R., Li, J., Slyngstad, O.P.N., Kampenes, V. B., Bunse, C., Morisio, M. and Torchiano, M. 2005. Reflections on conducting an international survey of Software Engineering. Proc. 4th International Symposium on Empirical Software Engineering (ISESE'05): 214-223.

Coverity, http://scan.coverity.com/, visited in April 2006.

Dedrick, J., Gurbaxani, V. and Kraemer, K.L. 2003. Information technology and economic performance: a critical review of the empirical evidence. ACM Computing Surveys. 35(1): 1-28.

Dybå, T., Kitchenham, B.A. and Jørgensen, M. 2005. Evidence-based software engineering for practitioners. IEEE Software. 22(1): 58-65.

Dybå, T., Kampenes, V. B. and Sjøberg, D. 2006. A systematic review of statistical power in software engineering experiments. Information and Software Technology. 48(8): 745-755.

Fenton, N., Krause, P. and Neil, M. 2002. Software measurement: uncertainty and causal modeling. IEEE Software. 19(4): 116-122.

Fitzgerald, B. and Kenny, T. 2004. Developing an information system infrastructure with open source software. IEEE Software. 21(1): 50-55.

Frakes, W.B. and Kang, K. 2005. Software reuse research: status and future. IEEE Trans. Soft. Eng. 31(7): 529-536.

Frakes, W.B. and Terry, C. 1996. Software reuse: metrics and models. ACM Computing Surveys. 28(2): 415-435.

Frakes, W.B. and Succi, G. 2001. An industrial study of reuse, quality and productivity. Journal of Systems and Software. 57(2001): 99-106.

Gigerenzer, G. 2004. Mindless statistics. The Journal of Socio-Economics. 33(2004): 587–606.

Glass, R.L. 1997. Telling good numbers from bad ones. IEEE Software. 14(4): 15-16, 19.

Glass, R.L. 2002. In search of meaning (a tale of two words). IEEE Software. 19(4): 136, 134-135.

Gregor, S. 2002. A theory of theories in information science. In Gregor, S. and Hart, D. (Eds.). Information Systems Foundations: Building the Theoretical Base. Australian National University. Canberra. 1-20.

Hallsteinsen, S. and Paci, M. (Eds.). 1997. Experiences in Software Evolution and Reuse. Springer.

Kitchenham, B.A. and Pickard, L.M. 1998. Evaluating software eng. methods and tools - part 10: designing and running a quantitative case study. ACM Sigsoft Soft. Eng. Notes. 23(3): 20-22.

Kitchenham, B.A., Pfleeger, S.L., Hoaglin, D.C., El Emam, K. and Rosenberg, J. 2002. Preliminary guidelines for empirical research in software engineering. IEEE Trans. Soft. Eng. 28(8): 721-734.

Kitchenham, B.A. 2004. Procedures for performing systematic reviews. Joint technical report, Keele University Technical Report TR/SE-0401 and National ICT Australia Technical Report 0400011T.1.

Krueger, C. 2002. Eliminating the adoption barrier. IEEE Software. 19(4): 29-31.

Lee, A.S. and Baskerville, R.L. 2003. Generalizing generalizability in information systems research. Information Systems Research. 14(3): 221-243.

Li, M. and Smidts, C.S. 2003. A ranking of software engineering measures based on expert opinion. IEEE Trans. Soft. Eng. 29(9): 811-824.

Lim, W. C. 1994. Effect of reuse on quality, productivity and economics. IEEE Software. 11(5): 23-30.

Lim, W.C. 1996. Reuse economics: a comparison of seventeen models and directions for future research. Proc. 4th Int'l Conf. on Software Reuse (ICSR'96): 41-50.

Madanmohan, T.R. and Dé, R. 2004. Open source reuse in commercial firms. IEEE Software. 21(6): 62-69.

Mohagheghi, P., Conradi, R., Killi, O.M. and Schwarz, H. 2004. An empirical study of software reuse vs. defect-density and stability. Proc. 26th Int'l Conf. on Software Engineering (ICSE'04): 282-292.

Mohagheghi, P., Conradi, R. and Børretzen, J.A. 2006. Revisiting the problem of using problem reports for quality assessment. Proc. 6th Workshop on Software Quality (WoSQ'06) - as part of Proc. 28th International Conference on Software Engineering & Co-Located Workshops: 45-50.

Mohagheghi, P. and Conradi, R. 2006. Vote-counting for combining quantitative evidence from empirical studies - an example. Proc. 5th ACM-IEEE Int'l Symposium on Empirical Software Engineering (ISESE'06): 24-26.

Mohagheghi, P. and Conradi, R. 2007. An empirical investigation of software reuse benefits in a large telecom product. To appear in ACM Transactions on Software Engineering and Methodology (TOSEM).

Morad, S. and Kuflik, T. 2005. Conventional and open source software reuse at Orbotech - an industrial experience. Proc. IEEE Int'l Conf. on Software - Science, Technology & Engineering (SwSTE'05), 8 p.

Morisio, M., Romano, D. and Stamelos, I. 2002. Quality, productivity, and learning in framework-based development: an exploratory case study. IEEE Trans. Soft. Eng. 28(9): 876-888.

Morisio, M., Tully, C. and Ezran, M. 2000. Diversity in reuse processes. IEEE Software. 17(4): 56-63.

Norris, J.S. 2004. Mission-critical development with open source software: lessons learned. IEEE Software. 21(1): 42-49.

Perry, D.E., Sim, S.E. and Easterbrook, S.M. 2004. Case studies for software engineering. Proc. 26th Int'l Conf. on Software Engineering (ICSE'04): 736-738.

Pfleeger, S.L. 1996. When the pursuit of quality destroys value. IEEE Software. 13(3): 93-95.

Pfleeger, S.L. 2005. Soup or art? The role of evidential force in empirical software engineering. IEEE Software. 22(1): 66-73.

Pickard, L.M., Kitchenham, B.A. and Jones, P.W. 1998. Combining empirical results in software engineering. Information and Software Technology. 40(1998): 811-821.

Ramachandran, M. and Fleischer, W. 1996. Design for large scale software reuse: an industrial case study. Proc. 4th Int'l Conf. on Software Reuse (ICSR'96): 104-111.

Shadish, W.R., Cook, T.D. and Campbell, D.T. 2001. Experimental and Quasi-Experimental Designs for Generalized Causal Inference. Houghton Mifflin Company.

Schneidewind, N.F. 1992. Methodology for validating software metrics. IEEE Trans. Software Eng. 18(5): 410-422.

Selby, R.W. 2005. Enabling reuse-based software development of large-scale systems. IEEE Trans. Soft. Eng. 31(6): 495-510.

SEVO. 2006. http://www.idi.ntnu.no/grupper/su/sevo/index.html

Sommerseth, M. 2006. Component based system development in the Norwegian software industry. NTNU master thesis. http://www.idi.ntnu.no/grupper/su/su-diploma-2006/sommerseth-dipl06.pdf.

Succi, G., Benedicenti, L. and Vernazza, T. 2001. Analysis of the effects of software reuse on customer satisfaction in an RPG environment. IEEE Trans. Soft. Eng. 27(5): 473-479.

Szyperski, C. (with Gruntz, D., Murer, S.). 2002. Component Software, Beyond Object-Oriented Programming. Addison Wesley, 2nd edition.

Thomas, W.M., Delis, A. and Basili, V.R. 1997. An analysis of errors in a reuse-oriented development environment. Journal of Systems and Software. 38(3): 211-224.

Tomer, A., Goldin, L., Kuflik, T., Kimchi, E. and Schach, S.R. 2004. Evaluating software reuse alternatives: a model and its application to an industrial case study. IEEE Trans. Soft. Eng. 30(9): 601-612.

Wang, C. 1993. Sense and Nonsense of Statistical Inference: Controversy, Misuse, and Subtlety. Marcel Dekker.

Webster, J. and Watson, R.T. 2002. Analyzing the past to prepare for the future: writing a literature review. MIS Quarterly. 26(2): xiii-xxiii.

Wohlin, C., Runeson, P., Höst, M., Ohlsson, M.C., Regnell, B. and Wesslén, A. 1999. Experimentation in Software Engineering. Kluwer Academic Publishers.

Yin, R.K. 2003. Case Study Research: Design and Methods, 3rd edition. Sage Publications.

Zannier, C., Melnik, G. and Maurer, F. 2006. On the success of empirical studies in the International Conference on Software Engineering. Proc. 28th Int'l Conf. on Software Engineering (ICSE'06): 341-350.

Zelkowitz, M.V. and Wallace, D.R. 1998. Experimental models for validating technology. IEEE Computer. 31(5): 23-31.

Zhang, W. and Jarzabek, S. 2005. Reuse without compromising performance: industrial experience from RPG software product line for mobile devices. Proc. 9th Int'l Software Product Line Conf. (SPLC'05): 57-69.

----------------------- New references ------------------------------

[Davison2004] Davison, Robert M., Martinsons, Maris G., and Kock, Ned: "Principles of canonical action research", Information Systems Journal, 14(1):65-86, 2004.

[Perry1994] Dewayne E. Perry, Nancy Staudenmayer, and Lawrence G. Votta: "People, Organizations, and Process Improvement", IEEE Software, 11(4):36-45, July 1994 (describes a field study of developers' actual activities at Lucent).

References from Zannier paper:

[1] Basili V.R., et al.; "Experimentation in Software Engineering"; IEEE Trans. Software Engineering SE-12 7 July 1986.

[2] Basili V.R.; "The Experimental Paradigm in S/W Eng." Proc. Int. Wkshp. Experiment. S/W Eng. Issues; V706 1992.

[3] Basili V.R.; "The Role of Experimentation in Software Engineering: Past, Current, and Future"; Proc. 18th Int. Conf. S/W Engineering; pp442-449, 1996.

[4] Basili V.R., et al.; "Building Knowledge through Families of Experiments"; IEEE Trans. Soft. Eng. V 25 No 4, 1999.

[5] Basili V.R., et al.; "Using Experiments to Build a Body of Knowledge" Proc. 3rd Int. Andrei Ershov Memorial Conference on Perspectives of Sys. Informatics; V1755; pp 26-282, 1999.

[6] Briand L., et al.; "Empirical Studies of Object-Oriented Artifacts, Methods and Processes"; Empirical Soft. Eng.; V.4 No 4, pp 287-404, 1999.

[7] Fenton N., et al.; Software Metrics: A Rigorous & Practical Approach 2nd Ed; PWS Pub. Company, 1997.

[8] Fenton N., et al.; "Science and Substance: A Challenge to Software Engineers"; IEEE Soft. V.11 No4, pp 86-95, 1994.

[9] Glass G.V., et al.; Meta-analysis in Social Research; Beverly Hills, CA; Sage, 1981.

[10] Glass R.L., et al.; "Research in software engineering: an Analysis of the Literature" IST 44, pp491-506, 2002.

[11] Jeffery R., et al.; "Has Twenty-five Years of Empirical Software Engineering Made a Difference?" Proc. 9th Asia-Pacific Soft. Eng. Conf (APSEC 02), 2002.

[12] Juristo N., et al.; Basics of Software Engineering Experimentation; Kluwer Acad. Pub. Boston MA, 2001.

[13] Kitchenham B., et al.; "Preliminary Guidelines for Empirical Research in Software Engineering"; IEEE Trans. Software Eng, V.28 No.8: 721-734, 2002.

[14] Lukowicz P., et al.; "Experimental Evaluation in Computer Science: A Quantitative Study"; J. of Sys. & Soft, V.28 No.1 pp 9-18, 1995.

[15] Milton et al.; Introduction to Statistics; McGraw-Hill, 1997.

[16] Patton M.Q.; Qualitative Research & Evaluation Methods 3rd Ed.; Sage Publications, California, 2002.

[17] Perry D.E., et al.; "Case Studies for Software Engineering"; Proc. 26th Int. Conf. on S/W Engineering, 2004.

[18] Perry D., et al.; "Empirical Studies of Software Engineering: A Roadmap"; Int. Conf. on S/W Engineering; Proc. of the Conf. on the Future of S/W Engineering; pp 245-255, 2000.

[19] Pfleeger S.L.; "Soup or Art? The Role of Evidential Force in Empirical Software Engineering"; IEEE Software, 22(1):66-73, Jan-Feb 2005.

[20] Segal J., et al.; "The Type of Evidence Produced by Empirical Software Engineers"; REBSE 05 St. Louis, Missouri, 2005.

[21] Shaw M.; "Writing Good Software Engineering Research Papers" Proc. 25th Int. Conf. on S/W Eng.; pp 726-736, 2003.

[22] Sjøberg D.I.K., et al.; "A Survey of Controlled Experiments in S/W Engineering"; IEEE Trans. Soft. Eng.V31 #9, 2005.

[23] Tichy W.; "Should Computer Scientists Experiment More"; IEEE Computer V31, No.5 pp 32-40, May 1998.

[24] Walker R.J., et al.; "Panel: Empirical Validation - What, Why, When and How"; Proc. 25th Int. Conf. on S/W Engineering; pp 721-722, 2003.

[25] Yancey J.M.; "Ten Rules of Reading Clinical Research Reports"; American J. Orthodontics and Dentofacial Orthopedics, V.109, No.5: pp 558-564, 1996.

[26] Yin R.K; Case Study Research: Design and Methods, 3/e Thousand Oaks, CA: Sage Publications, 2002.

[27] Zelkowitz M.V., et al.; "Experimental Validation of New Software Technology"; S/W Eng. & Knowledge Eng; Empirical S/W Eng.; pp 229-263, 2003.

http://www.ntnu.idi.no/grupper/su/publ/ese/study-types.html


Reidar Conradi
Last modified: Wed Apr 11 13:11:35 MEST 2007