COMPUTATIONAL CREATIVITY EVALUATION (29/11/17)
OUTLINE
Ø Why to evaluate
Ø When to evaluate
Ø What to evaluate
Ø Who should evaluate
Ø How to evaluate
WHY TO EVALUATE
"A comparative, scientific evaluation of creativity is essential for progress in computational creativity, not least to justify how creative a computational creativity system actually is." - Jordanous, 2012
WHY TO EVALUATE
Ø Evaluation highlights progress
Ø Evaluation shows what can be improved
Ø Evaluation (when done well) allows for comparison with other systems
Ø Evaluation argues how a system is creative
WHEN TO EVALUATE
Ø Evaluation should ideally be a part of every project undertaken in CC
Ø Nowadays some type of evaluation is also mandatory for publication!
Ø In addition to evaluation done within the system, the system should be evaluated in a comprehensive way at multiple stages during its development
Ø Evaluation is an iterative, ongoing process
WHEN TO EVALUATE
Ø Systems should be evaluated when:
  Ø A project starts: What can be achieved with the chosen methodology? What future evaluation targets should be set?
  Ø A part of the project is finished: Does that component do what it is intended to do? How can we boost its performance?
  Ø The whole project is finished: Does the system as a whole do what it was intended to do? How can we boost its performance? And how does the system compare to other similar systems?
Ø Summative evaluation provides a summary of a system's creativity, while formative evaluation provides constructive feedback on its strengths and weaknesses.
WHAT TO EVALUATE: JORDANOUS'S FOUR PPPPERSPECTIVES ON COMPUTATIONAL CREATIVITY
Ø Person/Producer: qualities of the system producing creative artefacts (could also apply to whoever designs and implements the system)
Ø Process: algorithmic processes within, and interactions with, the creative entity
Ø Product: the result of the creative process
Ø Press/Environment: the environment in which the creativity is situated
WHAT TO EVALUATE: EVALUATION CRITERIA
Ø Ritchie (2001, 2007): Quality, Novelty, Typicality
  Ø Suggested as metrics for evaluating the Product
  Ø Jordanous suggests they can be used to evaluate all four Ps
  Ø Suggests computing ratings for the different criteria, e.g. the average typicality of the produced items
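The per-criterion averaging suggested above can be sketched as follows. This is a minimal illustration, not Ritchie's formal criteria: the items, the rater scores in [0, 1], and the scale itself are hypothetical.

```python
# Sketch of Ritchie-style product evaluation. The three generated items
# and their ratings below are hypothetical, invented for illustration.
from statistics import mean

# Each produced item gets rater scores in [0, 1] per criterion.
items = [
    {"quality": 0.8, "novelty": 0.4, "typicality": 0.9},
    {"quality": 0.6, "novelty": 0.7, "typicality": 0.5},
    {"quality": 0.7, "novelty": 0.5, "typicality": 0.7},
]

def criterion_average(items, criterion):
    """Average rating for one criterion over all produced items."""
    return mean(item[criterion] for item in items)

for c in ("quality", "novelty", "typicality"):
    print(f"average {c}: {criterion_average(items, c):.2f}")
# average quality: 0.70, average novelty: 0.53, average typicality: 0.70
```

In practice the ratings would come from human judges or automated measures, and Ritchie's actual criteria are defined as more fine-grained conditions over such rating distributions.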
WHAT TO EVALUATE: EVALUATION CRITERIA
Ø Colton (2008): The creative tripod
  Ø Skillful, Appreciative, Imaginative
  Ø Colton originally suggested a shift from evaluating the product to evaluating the producer
  Ø He also recognizes that the programmer, the computer and the consumer can all contribute skill, appreciation and imagination to the creative experience
  Ø The definition of the criteria is vague, but one interpretation is:
    Ø Skillful: the ability to produce
    Ø Appreciative: the ability to evaluate the value of the product
    Ø Imaginative: the ability to produce novel items
WHAT TO EVALUATE: EVALUATION CRITERIA
Ø Colton, Charnley and Pease (2011): Computational Creativity Theory
  Ø IDEA model: Well-being rating, Cognitive-effort rating
  Ø Shifts the focus from viewing the producer, process or product to viewing the effect the creative act has on an ideal audience
  Ø Well-being rating: the personal hedonistic value of a creative act
  Ø Cognitive-effort rating: the time a person is prepared to spend interpreting the creative act and its results
  Ø In the IDEA model, these two ratings are used to compute various effects for a creative act, e.g. disgust
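As an illustration of how audience-impact labels such as "disgust" might be derived from the two ratings, here is a toy aggregation. The scales, thresholds and the classification rule are hypothetical simplifications of my own, not the formulas from Colton, Charnley and Pease (2011).

```python
# Toy sketch of IDEA-style audience aggregation. The response data,
# rating scales and the "disgust" rule are all hypothetical, invented
# for illustration only.
from statistics import mean

# Hypothetical audience responses to one creative act:
# well_being in [-1, 1] (hedonistic value), effort in [0, 1]
# (normalised time the person is prepared to spend interpreting it).
audience = [
    {"well_being": -0.8, "effort": 0.1},
    {"well_being": -0.6, "effort": 0.2},
    {"well_being": -0.9, "effort": 0.1},
]

def mean_well_being(responses):
    return mean(r["well_being"] for r in responses)

def mean_effort(responses):
    return mean(r["effort"] for r in responses)

def provokes_disgust(responses, wb_cut=-0.5, effort_cut=0.3):
    """Toy rule: strongly negative well-being combined with an
    unwillingness to spend time interpreting the act."""
    return (mean_well_being(responses) < wb_cut
            and mean_effort(responses) < effort_cut)

print(provokes_disgust(audience))
```

The point of the sketch is only the shape of the model: effects on an ideal audience are computed from aggregated well-being and cognitive-effort ratings rather than from properties of the artefact itself.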
WHAT TO EVALUATE: EVALUATION CRITERIA
Ø Jordanous (2012): Components of creativity
  Ø 14 themes identified from the literature: Active Involvement and Persistence; Dealing with Uncertainty; Domain Competence; General Intellect; Generation of Results; Independence and Freedom; Intention and Emotional Involvement; Originality; Progression and Development; Social Interaction and Communication; Spontaneity/Subconscious Processing; Thinking and Evaluation; Value; Variety, Divergence and Experimentation
  Ø Can be used as evaluation criteria selectively
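Selective use of the components might look like the following sketch: pick the components relevant to the domain, rate each, and combine them. The ratings, weights and the weighted-mean combination are hypothetical choices for illustration; Jordanous's framework does not prescribe a particular scoring formula.

```python
# Sketch: using a selected subset of Jordanous's 14 components as
# evaluation criteria. All numbers below are hypothetical; in practice
# each rating would come from judges or system-specific tests.
selected = {
    "Domain Competence": 0.8,
    "Originality": 0.6,
    "Value": 0.7,
    "Variety, Divergence and Experimentation": 0.5,
}

# Hypothetical weights reflecting how important each component is
# judged to be in this particular domain.
weights = {
    "Domain Competence": 2.0,
    "Originality": 1.0,
    "Value": 2.0,
    "Variety, Divergence and Experimentation": 1.0,
}

def weighted_score(ratings, weights):
    """Weighted mean of component ratings in [0, 1]."""
    total = sum(weights[c] for c in ratings)
    return sum(ratings[c] * weights[c] for c in ratings) / total

print(f"overall: {weighted_score(selected, weights):.2f}")
```

Reporting the per-component ratings alongside any combined score keeps the evaluation formative: it shows where the system is weak, not just how it ranks.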
WHAT TO EVALUATE: EVALUATION CRITERIA
Ø Van der Velde et al. (2015): Originality, Emotional value, Novelty/innovation, Intelligence, Skill
  Ø A fresh look at evaluating products
  Ø Intended for outside evaluators
WHO SHOULD EVALUATE
Ø Creator vs. audience: should the system be evaluated by the system's creators themselves, by outside experts, or by the intended audience?
Ø Experts vs. laymen: should the system be evaluated by experts in computational creativity, field-specific experts, peers, or laymen?
Ø Evaluation can move on multiple levels
  Ø Different targets can be evaluated by different persons
  Ø A combination can be used to achieve more holistic and more useful results
HOW TO EVALUATE: STANDARDISED PROCEDURE FOR EVALUATING CREATIVE SYSTEMS
Ø SPECS, a Standardised Procedure for Evaluating Creative Systems, was proposed by Jordanous (2012) as a domain-independent way to define an evaluation process for a creative system
HOW TO EVALUATE: SPECS
Ø Step 1: Defining creativity
  Ø Formulate a definition the system should satisfy to be considered creative
  Ø What does it mean to be creative in general?
  Ø What aspects of creativity are important in the particular domain of the system?
  Ø What are you going to evaluate? Which Ps are interesting to you?
HOW TO EVALUATE: SPECS
Ø Step 2: Identifying strands (which of the Ps) to test for
  Ø Transform your definitions from Step 1 into standards for testing the system
  Ø E.g. see the evaluation criteria presented earlier
HOW TO EVALUATE: SPECS
Ø Step 3: Testing systems
  Ø Test the creative system against the standards set in Step 2 and report the results
  Ø The tests depend on the standards, and on the preferences, capabilities, equipment and facilities of the researchers involved
  Ø Methods can be quantitative or qualitative
  Ø Suitable evaluators should be selected
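The three SPECS steps can be sketched as a small evaluation harness. The data structures, the standards, the two system names and the mean-score report below are my own hypothetical choices: SPECS prescribes the procedure, not a data format or a scoring rule.

```python
# Sketch of a SPECS-style evaluation run with hypothetical data.
from statistics import mean

# Step 1/2 output: standards derived from the chosen definition of
# creativity (here a hypothetical subset, scored in [0, 1]).
standards = ["novelty", "value", "domain competence"]

# Step 3: hypothetical test results for two systems against the standards.
results = {
    "system_A": {"novelty": 0.6, "value": 0.8, "domain competence": 0.7},
    "system_B": {"novelty": 0.9, "value": 0.5, "domain competence": 0.6},
}

def report(results, standards):
    """Per-system mean score over the agreed standards, enabling the
    cross-system comparison that Step 3 calls for."""
    return {name: mean(scores[s] for s in standards)
            for name, scores in results.items()}

for name, score in report(results, standards).items():
    print(f"{name}: {score:.2f}")
```

Because both systems are scored against the same explicitly stated standards, the comparison is reproducible: another researcher can rerun Step 3 with their own evaluators and see where the judgements differ.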
CONCLUSIONS
Ø Evaluation is critical for examining the creativity of computational creativity systems
Ø Evaluation is an essential requirement of good research
Ø To conduct a good and thorough evaluation, the researcher must identify when to evaluate, what to evaluate, who should evaluate and how to conduct the evaluation
REFERENCES
Anna Jordanous (2016). Four PPPPerspectives on computational creativity in theory and in practice. Connection Science, 28(2), 194-216.
Anna Jordanous (2012). A standardised procedure for evaluating creative systems: Computational creativity evaluation based on what it is to be creative. Cognitive Computation, 4(3), 246-279.
Graeme Ritchie (2001). Assessing creativity. In G. A. Wiggins (Ed.), Proceedings of the AISB symposium on AI and creativity in arts and science (pp. 3-11). York: The Society for the Study of Artificial Intelligence and Simulation of Behaviour.
Graeme Ritchie (2007). Some empirical criteria for attributing creativity to a computer program. Minds and Machines, 17, 67-99.
Simon Colton, John Charnley, and Alison Pease (2011). Computational creativity theory: The FACE and IDEA descriptive models. In Proceedings of the 2nd International Conference on Computational Creativity, pp. 90-95.
Simon Colton (2008). Creativity versus the perception of creativity in computational systems. In Proceedings of the AAAI symposium on creative systems (pp. 14-20), Stanford, California, USA.
Van der Velde, F., Wolf, R. A., Schmettow, M., & Nazareth, D. S. (2015). A semantic map for evaluating creativity. In Proceedings of the Sixth International Conference on Computational Creativity (pp. 94-101).