Data Challenge Track

Data is the foundation of many important decision-making processes in performance engineering tasks of modern systems. Data can tell us about the past and present of a system’s performance, helping us predict performance or assess the quality of our systems. ICPE 2023 will host the second edition of the data challenge track.

In this track, we provide a novel performance dataset from open source Java systems collected by Traini et al. and published recently in the Empirical Software Engineering journal. Participants are invited to come up with new research questions about the dataset and study those. The challenge is open-ended: participants can choose the research questions they find most interesting. The proposed approaches and/or tools and their findings are described in short papers and presented at the main conference.

How to participate in the challenge

Read the data description
Think of something cool to do with the data. This can be anything you want, including a visualization, an analysis, an approach, or a tool
Implement your idea, evaluate it, and write down your idea and the results in a short paper

Data description

This year, the challenge dataset is provided by Traini et al. and was published alongside their recent study “Towards effective assessment of steady state performance in Java software: Are we there yet?”.

The dataset contains a comprehensive set of performance measurements of 586 microbenchmarks from 30 popular Java open source projects (e.g., RxJava, Log4J2, Apache Hive) spanning various project domains (e.g., application servers, libraries, databases). Microbenchmarks are frequently employed by practitioners to test and ensure the adequate performance of their systems. Microbenchmark measurements help open source maintainers test performance before landing new system features, and identify performance regressions and optimization opportunities. Each benchmark was carefully executed using the Java Microbenchmark Harness (JMH) framework in a controlled environment to reduce measurement noise: results contain, for each benchmark, 3000 measurement batches (JMH iterations) with a minimum execution time of 100ms each, repeated in 10 runs. This amounts to more than 9 billion benchmark invocations for the entire dataset, an experiment that lasted ~93 days.
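To give a feel for the scale these figures imply, the short sketch below multiplies them out. It uses only the numbers quoted in the description; the constant and function names are illustrative, not part of the dataset's tooling. The time it computes is only a lower bound on pure measurement time (every batch running for exactly its 100ms minimum), so it is expected to fall well below the reported ~93 days.

```python
# Figures quoted in the data description above.
BENCHMARKS = 586
RUNS_PER_BENCHMARK = 10
BATCHES_PER_RUN = 3000
MIN_BATCH_SECONDS = 0.1  # 100 ms minimum execution time per batch
SECONDS_PER_DAY = 86_400


def total_batches(benchmarks=BENCHMARKS, runs=RUNS_PER_BENCHMARK,
                  batches=BATCHES_PER_RUN):
    """Total number of measurement batches (JMH iterations) in the dataset."""
    return benchmarks * runs * batches


def min_measurement_days(n_batches, min_batch_seconds=MIN_BATCH_SECONDS):
    """Lower bound on measurement time, assuming every batch takes exactly
    its minimum duration and ignoring any other overhead."""
    return n_batches * min_batch_seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    n = total_batches()
    print(f"measurement batches: {n:,}")
    print(f"lower bound on measurement time: {min_measurement_days(n):.1f} days")
```

In other words, each benchmark contributes 30,000 data points, which is worth keeping in mind when choosing visualization or analysis techniques.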

The dataset contains the following:

Performance measurements of 586 microbenchmarks from 30 widely popular Java open source projects
Git revisions at which the benchmarks were executed
A script to help researchers to read and explore the data

High-level possible ideas for participants include but are not limited to:

Tailor visualization techniques to explore the plethora of data produced by performance microbenchmarks
- Cito et al.: https://doi.org/10.1145/3183440.3183481
Assess the quality of performance measurements within and across different open source software systems
- Laaber et al.: https://doi.org/10.1007/s10664-021-09996-y
- Costa et al.: https://doi.org/10.1109/TSE.2019.2925345
Develop approaches to detect warmup and steady-state phases
- Barrett et al.: https://doi.org/10.1145/3133876
- Laaber et al.: https://doi.org/10.1145/3368089.3409683
- Traini et al.: https://doi.org/10.1007/s10664-022-10247-x
Model the performance of benchmarks through source code features
Replicate a study/approach on the dataset
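
As a starting point for the warmup and steady-state idea above, here is a deliberately naive sketch. It is not the method used in any of the cited works; it is a simple sliding-window heuristic, and the function name, the `window` and `tolerance` parameters, and the assumption that one run is available as a plain list of per-batch execution times are all hypothetical.

```python
from statistics import mean


def first_steady_index(times, window=50, tolerance=0.05):
    """Return the first index from which every sliding-window mean stays
    within `tolerance` (relative) of the mean of the last window.

    Returns None if the series is shorter than one window.
    Naive heuristic: it assumes the run ends in a steady state.
    """
    if len(times) < window:
        return None
    reference = mean(times[-window:])
    steady_from = None
    for start in range(len(times) - window + 1):
        window_mean = mean(times[start:start + window])
        if abs(window_mean - reference) <= tolerance * abs(reference):
            # Remember where the current stable stretch began.
            if steady_from is None:
                steady_from = start
        else:
            # Stability was broken; any earlier candidate is discarded.
            steady_from = None
    return steady_from


if __name__ == "__main__":
    # Synthetic run: 100 slow (warmup) batches followed by 400 fast ones.
    synthetic_run = [2.0] * 100 + [1.0] * 400
    print(first_steady_index(synthetic_run))
```

A heuristic like this breaks down when a run never stabilizes or stabilizes only intermittently, which is exactly the kind of behavior a challenge submission could investigate with more robust techniques.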

Important Dates

The submission time aligns with the other early-year tracks (poster, tutorial, demo, WiP/vision, and workshops) and can be found here.

Submission requirements

A challenge paper should contain the following elements:

A description of the problem that you are studying, and an explanation of why the problem is important
A description of the solution that you are proposing
An evaluation of the solution
A discussion of the implications of your solution and results

We highly encourage the solution’s source code to be included with the submission (e.g., in a GitHub repository), but this is not mandatory for acceptance of a data challenge paper.

Submissions are made via the ICPE EasyChair by selecting the respective track.

The page limit for challenge papers is 4 pages (including all figures and tables) + 1 page for references. Challenge papers will be published in the companion to the ICPE 2023 proceedings. All challenge papers will be reviewed by the program committee members. Note that submissions to this track are double-blind: for details, see the Double Blind FAQ page. The track chairs and the program committee members will give an award to the best data challenge paper.

Instructions for Authors from ACM:

By submitting your article to an ACM Publication, you are hereby acknowledging that you and your co-authors are subject to all ACM Publications Policies, including ACM’s new Publications Policy on Research Involving Human Participants and Subjects. Alleged violations of this policy or any ACM Publications Policy will be investigated by ACM and may result in a full retraction of your paper, in addition to other potential penalties, as per ACM Publications Policy.

Please ensure that you and your co-authors obtain an ORCID ID, so you can complete the publishing process for your accepted paper. ACM has been involved in ORCID from the start and we have recently made a commitment to collect ORCID IDs from all of our published authors. The collection process has started and will roll out as a requirement throughout 2022. We are committed to improving author discoverability, ensuring proper attribution, and contributing to ongoing community efforts around name normalization; your ORCID ID will help in these efforts.

Data Challenge Track Chairs

Diego Costa, The University of Quebec in Montreal, Canada
Michele Tucci, Charles University, Czechia

Contact: icpe2023-data-challenge@easychair.org

Submissions are to be made via EasyChair.