MMSys’22 Grand Challenge on AI-based Video Production for Soccer
Challenge Description
Soccer has a considerable share of the global sports industry, and the interest in viewing videos from soccer games continues to grow. In this respect, it is important to provide game summaries and highlights of the main game events. However, annotating and producing events and summaries often require expensive equipment and a lot of tedious, cumbersome, manual labor. Therefore, automating the video production pipeline to provide fast game highlights at a much lower cost is seen as the "holy grail". In this context, recent developments in Artificial Intelligence (AI) technology have shown great potential. Still, state-of-the-art approaches are far from adequate for practical scenarios with demanding real-time requirements and strict performance criteria (where at least the detection of official events such as goals and cards must be 100% accurate). In addition, event detection should be complemented by annotation and classification, proper clipping, the generation of short descriptions, the selection of appropriate thumbnails for highlight clips, and finally, the combination of event highlights into an overall game summary, similar to what is commonly aired during sports news. Even though the event tagging operation has by far received the most attention, an end-to-end video production pipeline also includes various other operations which serve the overall purpose of automated soccer analysis.
The goal of this challenge is to assist the automation of such a production pipeline using AI. In particular, we focus on the enhancement operations that take place after an event has been detected, namely event clipping, thumbnail selection, and game summarization.
The challenge is open to any individual, academic or commercial institution.
A detailed overview of the challenge can be found in this paper.
Tasks
In this challenge, we focus on event clipping, thumbnail selection, and game summarization. Soccer games contain a large number of event types, but we focus on cards and goals within the context of this challenge.
Task 1: Event Clipping
Highlight clips are frequently used to display selected events of importance from soccer games. When an event is detected (spotted), there is an associated timestamp indicating when the event happened, e.g., a tag in the video where the ball passes the goal line. However, this single annotation is not enough to generate a highlight clip which summarizes the event for viewers. Start and stop timestamps are needed to extract a highlight clip from the soccer game video (e.g., clipping the frames between t1 seconds before the event annotation and t2 seconds after the event annotation).
In this task, participants are asked to identify the appropriate clipping points for selected events from a soccer game, and to generate one clip for each highlight, ensuring that the highlight clip captures the important scenes from the event but also removes "unnecessary" parts. The submitted solution should take the video of a complete soccer game, along with a list of highlights from the game in the form of event annotations, as input. The output should be one clip for each event in the provided list of highlights. The maximum duration for a highlight clip is 90 seconds.
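For illustration only, a fixed-window baseline of this kind can be sketched with ffmpeg. The window lengths t1 and t2, the file names, and the use of stream copying below are assumptions, not part of the challenge specification.

```python
import subprocess

def clip_event(game_video: str, event_ts: float, out_path: str,
               t1: float = 20.0, t2: float = 40.0, max_len: float = 90.0) -> None:
    """Cut a highlight from t1 seconds before to t2 seconds after the event
    annotation, capped at the 90-second limit required by the challenge."""
    start = max(event_ts - t1, 0.0)
    duration = min(t1 + t2, max_len)
    subprocess.run(
        ["ffmpeg", "-y",
         "-ss", str(start),    # seek to the clip start (seconds)
         "-i", game_video,     # full-game input video
         "-t", str(duration),  # clip length (seconds)
         "-c", "copy",         # stream copy, no re-encoding
         out_path],
        check=True)

# Hypothetical usage: one clip per event in the provided highlight list.
# for i, ts in enumerate(event_timestamps):
#     clip_event("game.mp4", ts, f"highlight_{i:02d}.mp4")
```

A learned model would replace the fixed t1/t2 with per-event clipping points; the ffmpeg call itself stays the same.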
Task 2: Thumbnail Selection
Thumbnails capture the essence of video clips and engage viewers by providing a first impression. A good thumbnail makes a video clip more attractive to watch. Thus, selecting an appropriate thumbnail (e.g., by extracting a frame from the video clip itself) is very important. Traditional solutions in the soccer domain rely on the manual or static selection of thumbnails to describe highlight clips, which display important events such as goals and cards. However, such approaches can result in the selection of sub-optimal video frames as snapshots, which degrades the overall quality of the clip as perceived by the viewers, and consequently decreases viewership. Additionally, manual processes are expensive and time-consuming. Therefore, research on the implementation of automated algorithms is needed.
In this task, participants are asked to identify the frame that best represents a game highlight, according to rules established by the participants themselves. The rules can be justified by references to scientific literature and industry practices. The submitted solution should take the video of a complete soccer game, along with a list of highlights from the game in the form of event annotations, as input. The output should be one image (thumbnail candidate) for each event in the provided list of highlights.
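As one possible illustration (not a required approach), a simple rule could be to pick the sharpest frame of the highlight clip. The sketch below assumes OpenCV is available and uses a variance-of-Laplacian sharpness heuristic with an arbitrary sampling step; both the heuristic and the parameters are assumptions made here for illustration.

```python
import cv2

def select_thumbnail(clip_path: str, out_path: str, step: int = 10) -> None:
    """Pick the sharpest sampled frame of a highlight clip as a thumbnail
    candidate, using the variance of the Laplacian as a sharpness proxy."""
    cap = cv2.VideoCapture(clip_path)
    best_score, best_frame, idx = -1.0, None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:  # sample every `step`-th frame to keep the scan cheap
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            score = cv2.Laplacian(gray, cv2.CV_64F).var()
            if score > best_score:
                best_score, best_frame = score, frame.copy()
        idx += 1
    cap.release()
    if best_frame is not None:
        cv2.imwrite(out_path, best_frame)
```

Participants are free to define entirely different rules, e.g., based on player or ball visibility, as long as the rules are justified in the manuscript.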
Task 3: Game Summarization
Soccer game summaries are of tremendous interest for multiple stakeholders including broadcasters and fans. Existing works consider different modalities such as video, audio, and text, but a relatively larger emphasis is put on video summaries in the broadcasting context.
In this task, participants are asked to generate overall game summaries for soccer games. The submitted solution should take the video of a complete soccer game, along with a list of highlights from the game in the form of event annotations, as input. The output should be, per game, a text and/or video which presents an overall summary of the game, including an adequate overview of important events.
Task 3a. Text Summary
In this subtask, participants are asked to output a text in English which serves as a summary of the soccer game, with a maximum length of 100 words.
Task 3b. Video Summary
In this subtask, participants are asked to output a video (audio optional) which serves as a summary of the soccer game, with a maximum duration of 3 minutes (180 seconds). How various events are "concatenated" into a summary is up to the participants; scene transition effects, as well as overlays containing detailed information (such as the names of goal scorers or booked players), are allowed.
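Purely as a sketch of the mechanics, highlight clips that have already been cut (e.g., with the event-clipping baseline above) could be joined with ffmpeg's concat demuxer. Trimming to the 180-second limit, transitions, and overlays are left out here and would be part of the participant's own design.

```python
import subprocess
import tempfile

def concatenate_summary(clip_paths: list[str], out_path: str) -> None:
    """Join pre-cut highlight clips into a single summary video using
    ffmpeg's concat demuxer (clips must share codecs/parameters for -c copy)."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
        for path in clip_paths:
            f.write(f"file '{path}'\n")  # one input clip per line
        list_file = f.name
    subprocess.run(
        ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
         "-i", list_file, "-c", "copy", out_path],
        check=True)
```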
Overall Score for Task 3
The overall score will be calculated as a weighted sum of the scores for subtasks 3a and 3b:
score_final = (25 x score_3a + 75 x score_3b) / 100
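For example, with illustrative subtask scores of 80 for the text summary and 60 for the video summary, score_final = (25 x 80 + 75 x 60) / 100 = 65.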
Dataset
An official training dataset is provided by the challenge organizers. This dataset consists of complete soccer game videos from the Norwegian Eliteserien, accompanied by a list of highlights in the form of event annotations, for each game. The list of highlights includes goal annotations, card annotations, and additional timing metadata. The dataset was curated by employees of ForzaSys AS and researchers at Simula Metropolitan Center for Digital Engineering (SimulaMet).
Prospective participants should fill out this form to request access to the above dataset.
In addition, prospective participants are free to use any other open dataset for training and validation purposes. In particular, interested participants are referred to the excellent and publicly available SoccerNet dataset, which can be used for training and validation, as well as a transfer learning dataset for presenting additional performance results.
The evaluations will be undertaken using a hidden, previously unseen dataset. It will have the same format as the public training dataset provided by the challenge organizers, but will consist of completely different games.
Evaluation
Performance
As the perceived quality of highlight clips, thumbnails, and text/video summaries is highly subjective, the performance of the submitted solutions will be evaluated by a jury. In particular, a subjective survey will be conducted in a double-blind fashion with a jury consisting of unaffiliated video experts selected by the challenge organizers. For each submitted solution for a given task, the jury members will be asked to provide an overall subjective performance score out of 100.
Complexity
The following objective metrics will be used to evaluate the submitted solutions in terms of complexity. Participants are asked to calculate the following metrics for their model and include these values in their manuscript.
- Latency or frame rate: the average runtime per sample (ms), or the average number of frames the submitted solution can analyze per second (fps)
- Number of parameters: Total number of trainable parameters in the submitted solution
- Model size: Storage size (size on disk) of the submitted solution (MB)
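As a non-authoritative sketch of how such numbers could be obtained for a deep-learning solution, the snippet below assumes a PyTorch model and a representative input sample; participants are free to measure these metrics in whatever way fits their solution.

```python
import os
import time
import torch

def report_complexity(model: torch.nn.Module, sample: torch.Tensor,
                      checkpoint_path: str, n_runs: int = 100) -> dict:
    """Report trainable parameters, average per-sample latency (ms),
    and on-disk model size (MB)."""
    n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
    model.eval()
    with torch.no_grad():
        start = time.perf_counter()
        for _ in range(n_runs):
            model(sample)  # average the forward pass over n_runs
        latency_ms = (time.perf_counter() - start) / n_runs * 1000.0
    size_mb = os.path.getsize(checkpoint_path) / (1024 * 1024)
    return {"trainable_params": n_params,
            "latency_ms_per_sample": latency_ms,
            "model_size_mb": size_mb}
```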
Score
Aggregation of the subjective performance scores with the objective complexity scores per submission will be undertaken by the challenge organizers. For Task 3, the text (3a) and video (3b) subtasks are weighted 25% and 75%, respectively.
Important Dates
- Challenge announcement: January 27th
- Dataset release: February 7th
- Paper submission: April 5th
- Notification: April 20th
- Camera ready: April 29th
- Presentations and demos: During the MMSys conference (June 14th-17th)
Submission Guidelines
Submissions should be made according to the guidelines provided below.
- Participation is open to any individual, academic or commercial institution.
- A submission is required for entering the competition and being awarded a prize.
- A submission is composed of a manuscript and relevant software artifacts.
- Separate submissions should be made for different tasks, where each submission includes a manuscript and relevant software artifacts.
- Joint submissions can be made for the same task, if the same group of participants is proposing multiple (alternative) solutions for that task. In this case, one manuscript can be submitted alongside multiple software artifacts corresponding to the different solutions. Necessary information for each solution should be provided in the joint manuscript, and the solutions should be clearly labeled. Each solution can enter the competition separately (i.e., participants can decide whether they want to enter one or more of their solutions into the competition).
- There is no limit on the total number of submissions that can be made by the same participant.
The following are required for each submission.
- Source code for the proposed solution should be provided in a public repository, and referenced in the manuscript. Note that no data should be included within the repository itself.
- A technical paper (manuscript) prepared according to the submission format and style for the Open Dataset & Software Track should be submitted using the online paper submission system. This paper should:
- Describe the proposed solution(s), providing enough details about the implementation and contributions.
- Include references to the relevant software artifacts.
- Include a reference to the challenge overview paper.
Optional: Prospective participants are encouraged to provide an executable version of their proposed solution (e.g., in the form of a public Docker container, or a public online notebook such as Google Colab or Jupyter), and to reference this resource in their manuscript.
All submissions will undergo a single-blind review process. The authors of accepted submissions need to prepare a camera-ready version to be published in the ACM Digital Library.
Awards
Winners of each task will be chosen by the challenge organizers according to the evaluation criteria specified above, and the decision will be final. Results will be announced during the ACM Multimedia Systems Conference (MMSys'22). If contributions of sufficient quality are not received, some or all of the awards may not be granted.
Award prizes:
- Winner of Task 1: €500
- Winner of Task 2: €500
- Winner of Task 3: €500
Participant Support
Please send an e-mail to the challenge organizers if you have any questions about the challenge. You can visit the Google group for the challenge to view past questions. You can also join the #2022-gc-soccer channel in the ACM MMSys Slack workspace for discussions.