Business Process Intelligence Challenge (BPIC)
Eighth International Business Process Intelligence Challenge (BPIC’18)
Platinum Sponsor | Gold Sponsor | Silver Sponsor |
In this challenge, sponsored by Celonis, NWO's DeLiBiDa project and Minit, we provide participants with a real-life event log, and we ask them to analyze these data using whatever techniques are available, focusing on one or more of the process owner's questions or providing other unique insights into the process captured in the event log.
We strongly encourage people to use any tools, techniques, methods at their disposal. There is no need to restrict to open-source tools, and proprietary tools as well as techniques developed or implemented specifically for this challenge are welcome.
Our industrial sponsors provide access to their tools for use with the BPI Challenge dataset. If you would like to use Celonis on this data, please contact them directly at BPI2018@celonis.com. If you would like to try Minit on this dataset, please contact Minit at BPI2018@minit.io.
Important Dates
Publication of the data: | Early February 2018
Abstract submission deadline: | 2 June 2018
Report submission deadline: | 16 June 2018
Presentation of the winners: | At the BPI workshop 2018 in Sydney, Australia
Workshop Days: | 9-10 September 2018
The Challenge
Like last year, we have decided to have three categories, namely students, academics and professionals. Thanks to the sponsoring of both Celonis and Minit, we can invite winners in all three categories to join the workshop in Sydney to present their findings.
The Student Category
This category targets Bachelor, Master and PhD students or student teams. In this category, the focus is on the originality of the results, the validity of the claims and the depth of the analysis of specific issues identified. We expect participants to focus on a specific aspect of interest and analyze this aspect in great detail. Here, one can choose, for example, to focus on specific models, such as control-flow models, social network models, performance models, predictive models, etc.
The winner: Jarno Brils, Nina van den Elsen, Jan de Priester and Tom Slooff of the Honors Academy of Eindhoven University of Technology with their report entitled Analysis and Prediction of Undesired Outcomes
The Academic Category
This category targets academics. The focus in this category is much more on the novelty of the techniques applied than the actual results. This provides a great opportunity for BPI researchers to show the practical applicability of their tools and/or techniques on real-life data.
The winner: Stephen Pauwels and Toon Calders of the University of Antwerp with their report entitled Detecting and Explaining Drifts in Yearly Grant Applications
The Professional Category
This category targets professionals to show their skills in analyzing business processes. The submitted reports are judged on their level of professionalism. The participants are expected to report on a broader range of aspects, where each aspect does not have to be developed in full detail. The report submitted in this category will be judged on its completeness of analysis and usefulness for the purpose of a real-life business improvement setting.
The winner: Lalit Wangikar, Sumit Dhuwalia, Abhilasha Yadav, Bhavy Dikshit and Dikshant Yadav from Cognitio Analytics with their report entitled Faster Payments to Farmers: Analysis of the Direct Payments Process of EU's Agricultural Guarantee Fund
The winners were selected by a jury and the winners presented their findings at the workshop in Sydney, Australia!
The Process
The European Union spends a large fraction of its budget on the Common Agricultural Policy (CAP). Among these expenditures are direct payments, which mainly aim to provide farmers with a basic income decoupled from production. The rest of the CAP budget is spent on market-related expenditures and rural development.
The processes that govern the distribution of these funds are subject to complex regulations captured in EU and national law. The member states are required to operate an Integrated Administration and Control System (IACS), which includes IT systems to support the complex processes of subsidy distribution.
The process considered in the BPI Challenge 2018 covers the handling of applications for EU direct payments for German farmers from the European Agricultural Guarantee Fund. The process repeats every year with minor changes due to changes in EU regulations. About 10% of the cases are subject to a more rigorous on-site inspection.
The Data
The data for this year's challenge is brought to you by the German company data experts, located in Neubrandenburg. They provide data from their Java Enterprise system profil c/s.
Profil c/s supports these processes at the level of federal ministries of agriculture and local departments. The system supports various kinds of administrative processes, but for this challenge, the focus is on the yearly allocation of direct payments, starting with the application and, if all goes well, finishing with the authorization of a payment.
The workflows in profil c/s can be understood in terms of documents, where each document has a state that allows for certain actions. These actions can be executed manually at any point in time through document specific tools or they can be scheduled automatically. The latter may be either explicitly stated in the log or implicitly apparent if a large number of actions is performed by the same user at around the same time (batch processing).
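The batch-processing pattern described above can be sketched as a simple heuristic over the log. The event records, the five-second window and the minimum run length below are purely illustrative assumptions; real events carry the org:resource and time:timestamp attributes documented under "Event attributes".

```python
from datetime import datetime, timedelta
from itertools import groupby

# Hypothetical stand-ins for events parsed from the log.
events = [
    {"org:resource": "user_42", "time:timestamp": datetime(2016, 1, 5, 9, 0, 0)},
    {"org:resource": "user_42", "time:timestamp": datetime(2016, 1, 5, 9, 0, 1)},
    {"org:resource": "user_42", "time:timestamp": datetime(2016, 1, 5, 9, 0, 2)},
    {"org:resource": "user_7",  "time:timestamp": datetime(2016, 1, 5, 14, 30, 0)},
]

def batch_candidates(events, window=timedelta(seconds=5), min_run=3):
    """Return runs of events by the same resource whose consecutive
    timestamps lie within `window` -- a simple batch-processing heuristic."""
    out = []
    by_user = lambda e: e["org:resource"]
    ordered = sorted(events, key=lambda e: (by_user(e), e["time:timestamp"]))
    for user, group in groupby(ordered, key=by_user):
        run = []
        for ev in group:
            # A gap larger than the window closes the current run.
            if run and ev["time:timestamp"] - run[-1]["time:timestamp"] > window:
                if len(run) >= min_run:
                    out.append((user, run))
                run = []
            run.append(ev)
        if len(run) >= min_run:
            out.append((user, run))
    return out
```

On the sample data above, only the three rapid-fire events of user_42 qualify as a batch candidate.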
In total, the event log contains 2,514,266 events for 43,809 applications over a period of three years. The shortest case contains 24 events, the longest 2973 and on average there are 57 events per case referring to 14 activities. As mentioned, the data is centered around documents and for your convenience, we provide both the complete log file as well as log files for each document type, in which each instance of a document is a case. We expect to publish the data in the 4TU datacenter soon!
There are nine different document types in the data listed in the table below. From 2015 to 2016, the Parcel document was succeeded by the Geo Parcel Document. In 2017, the Geo Parcel document also replaced the Department Control Parcels document.
Document type | Sub Process | Explanation
Control summary | Main | A document containing the summarized results of various checks (reference alignment, department control, inspections)
Department control parcels (before 2017) | Main | A document containing the results of checks regarding the validity of parcels of a single applicant
Entitlement application | Main, Objection, Change | The application document for entitlements, i.e., the right to apply for direct payments, usually created once at the beginning of a new funding period
Inspection | On-Site, Remote | A document containing the results of on-site or remote inspections
Parcel Document (before 2016) | Main | The document containing all parcels for which subsidies are requested
Geo Parcel Document (replaces the Parcel document since 2016 and the Department control parcels document since 2017) | Main, Declared, Reported | The document containing all parcels for which subsidies are requested. From 2017, the Geo Parcel Document also replaces the Department control parcels document.
Payment application | Main, Application, Objection, Change | The application document for direct payments, usually filed each year
Reference alignment | Main | A document containing the results of aligning the parcels as stated by the applicant with known reference parcels (e.g., a cadaster)
For each document type, one or more sub-processes can be found in the data. These sub-processes refer to the overall sequence of events that influence a document. The leading document is always an application for which a number of other documents are created. Typically, documents are created by the initialize activity. Then, documents are edited, typically delimited by begin editing and finish editing or a similar pair. While editing, several things may be recorded, for example that some calculations were made or that the application was saved. The log shows the times at which these events were completed, and there are considerable dependencies between the sub-activities of different documents. For example, you will usually only be able to decide an application after all the other documents are in a final state.
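As a minimal sketch of checking such editing pairs, the function below verifies that every begin editing in a document trace is eventually closed by a finish editing. The literal activity names are an assumption for illustration; real logs use document-specific variants.

```python
def balanced_editing(activities):
    """Return True iff every "begin editing" in the activity sequence is
    eventually matched by a "finish editing" (nesting-aware, like
    balanced parentheses)."""
    depth = 0
    for a in activities:
        if a == "begin editing":
            depth += 1
        elif a == "finish editing":
            if depth == 0:
                return False  # finish without a matching begin
            depth -= 1
    return depth == 0  # all begins were closed
```

A trace such as ["initialize", "begin editing", "calculate", "finish editing"] passes, while a trace that begins editing and never finishes does not.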
Download
The data is made available through the 4TU Center for research data as usual. However, for your convenience, we have the data ready for download right now:
- Application log (xes.gz, 150MB) This log contains all event data for three years with application as a case ID,
- Document logs (zip, 150MB) This collection contains eight log files, one for each document type. In each file, only those events relevant for a document are included.
When you use this data, please cite it as “van Dongen, B.F. (Boudewijn); Borchert, F. (Florian) (2018) BPI Challenge 2018. Eindhoven University of Technology. Dataset. https://doi.org/10.4121/uuid:3301445f-95e8-4ff0-98a4-901f1f204972”. The Bibtex or other formats can be downloaded from https://doi.org/10.4121/uuid:3301445f-95e8-4ff0-98a4-901f1f204972/object/citation.
Trace attributes
The following attributes are recorded for each case, where each case represents one application of one applicant in a specific year.
Attribute | Type | Explanation
program-id | literal | Internal id of the funding program
concept:name (and application) | literal | Unique case id for the application
identity:id | UUID | Globally unique case id (UUID)
department | literal | Id of the local department
application | literal | The applicant’s id, the same across years
year | literal | The current year
number_parcels* | discrete | The number of parcels
area | continuous | The total area of all parcels
basic_payment | boolean | Application for basic payment scheme
greening | boolean | Application for greening payment
redistribution | boolean | Application for re-distributive payment
small farmer | boolean | Application for small farmer scheme
young farmer | boolean | Application for payment for young farmers
applicant | literal | Anonymized identifier of applicants
Derived attributes | |
penalty_{xxx} | boolean | Indicates if a penalty was applied for a certain reason {xxx} (see also the business questions). The following reasons can be found in the log: JLP1, AVGP, C4, JLP3, JLP2, JLP5, JLP6, C9, AVJLP, V5, CC, AVUVP, GP1, B16, BGK, C16, AGP, B3, B2, AVBP, B5, B4, B6, ABP, AUVP, AJLP, BGKV, JLP7, B5F, BGP. |
amount_applied{x}* | continuous | Amount (in Euro) applied for in the application. The number indicates the current payment subprocess, starting with zero. If a case requires changes by the department or due to objection by the applicant, this number is increased by 1 for each payment. |
payment_actual{x}* | continuous | Amount (in Euro) actually received by the applicant. For the meaning of {x}, see above. |
penalty_amount{x} | continuous | Penalty applied by the department, e.g., due to over-declaration of parcel sizes. For the meaning of {x}, see above. Only available if penalty_applied is true. |
risk_factor | continuous | An optional, manually assigned risk assessment factor. |
cross_compliance | continuous | A penalty term due to violation of cross-compliance rules. |
selected_random | boolean | Has the application been selected for an inspection at random? |
selected_risk | boolean | Has the application been selected for an inspection due to risk assessment? |
selected_manually | boolean | Has the application been selected for an inspection manually? |
rejected | boolean | Entire rejection of the application |
* The marked attributes have been binned in groups of 100 for anonymization purposes, where each bin is identified by its minimum value. This means that if you encounter a value of 50 ha for “area”, the actual area was at least 50 ha but smaller than the next larger value in the data set. Since the binning is done per year, there may be small differences in the attribute values for applicants across years, as these values indicate the lower bound of the interval. For instance, you may observe that an applicant got € 100 more in 2016 than in 2015, but this may only be due to the boundaries of the bins.
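A toy reconstruction of this binning scheme may help interpret the values. The sketch below uses bins of size 3 instead of 100 and ignores the per-year grouping; the exact tie-handling of the real anonymization is an assumption.

```python
def bin_by_min(values, size=100):
    """Sort the values, split them into chunks of `size`, and report each
    value as the minimum of its chunk -- a sketch of the lower-bound
    binning described in the text."""
    ordered = sorted(values)
    bin_of = {}
    for i in range(0, len(ordered), size):
        chunk = ordered[i:i + size]
        for v in chunk:
            bin_of.setdefault(v, chunk[0])  # chunk is sorted: first = min
    return [bin_of[v] for v in values]
```

With size=3, the raw values [5, 1, 9, 2, 8, 3] are reported as [5, 1, 5, 1, 5, 1]: the two bins {1, 2, 3} and {5, 8, 9} collapse onto their minima 1 and 5.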
Event attributes
The following attributes are recorded for each event. All events are included in the application log and within each application, events are ordered by timestamp. It is important to realize that if two events have exactly the same timestamp, their ordering cannot be concluded from the order in which they appear in the file.
Attribute | Type | Explanation
success | boolean | Indicates whether the event was successful.
concept:name (and activity) | literal | The activity whose completion this event indicates.
docid | literal | Internal id of the document the event refers to.
doctype | literal | Type of the document as indicated in the list of document types above.
eventid | literal | Internal id of an event (may be null in case of an inferred event)
lifecycle:transition | literal | Value is "complete" for all events. Included for compatibility with some tools that require it.
note | literal | Free-text note included for the event. Defaults to "none" if no note is available.
org:resource | literal | Indicates the resource responsible for the event.
subprocess | literal | Subprocess to which the event belongs. Each document is subdivided into a number of subprocesses.
time:timestamp | timestamp | Time at which the event occurred. Note that the ordering of events with identical timestamps cannot be concluded from the file. Also note that some timestamps are manually entered and may therefore contain mistakes.
docid_uuid | UUID | Globally unique id of the document the event belongs to. There is a 1-to-1 correspondence between docid and docid_uuid.
identity:id | UUID | Globally unique id of each event. Supersedes the eventid attribute where the eventid is not unique (e.g., null). Events have a unique identity:id attribute across all files.
Additional files
Applicants can file an application each year and within each application, multiple documents are kept. Hence there is a one-to-many relation between applications and documents. To study the documents independently, we provide separate log files for each document type. Within these files, the same events are included as in the original files, but the case id is based on the docid attribute of the event and only events with the correct doctype attribute are included in each file.
The identity:id attribute of each event in these files is globally unique, i.e., this UUID can be used to cross-reference the various log files. For each document, the traces also have an additional trace-level attribute “application” referring to the application in the application log file to which this document belongs.
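Cross-referencing the files via identity:id amounts to a simple dictionary join. The two miniature event lists below are hypothetical stand-ins for events parsed from the application log and one of the document logs:

```python
# Events from the application log (hypothetical values).
app_log = [
    {"identity:id": "a1", "concept:name": "initialize"},
    {"identity:id": "a2", "concept:name": "begin editing"},
]
# The same events as they appear in a document log, with document context.
doc_log = [
    {"identity:id": "a2", "docid": "D-17", "application": "case-001"},
]

# Index the document log by UUID, then look up each application-log event.
by_uuid = {e["identity:id"]: e for e in doc_log}
matched = [(e, by_uuid[e["identity:id"]])
           for e in app_log if e["identity:id"] in by_uuid]
```

Here only the second application-log event has a counterpart in the document log, and the join recovers its docid and owning application.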
Business Questions
The company has formulated four business questions on the data. They encourage the participants to focus on one or more of these questions; however, any other insights that can be obtained from the data are welcome. In your reports, please indicate clearly which question you answered.
Undesired outcomes
A usual case is opened around May of the respective year and should be closed by the end of the year. By “closed”, we refer to the timely payment of granted subsidies. There are, however, several cases each year where this could not be achieved:
- Undesired outcome 1: The payment is late. A payment can be considered timely if there has been a “begin payment” activity by the end of the year that was not eventually followed by “abort payment”.
- Undesired outcome 2: The case needs to be reopened, either by the department (subprocess “Change”) or due to a legal objection by the applicant (subprocess “Objection”). This may result in additional payments or reimbursements (“payment_actual{x}“ > 0, where x ≥ 1 refers to the xth payment after the initial one)
Question: We would like to detect such cases as early as possible. Ideally, this should happen before a decision is made for this case (first occurrence of “Payment application+application+decide”). You may use data from previous years to make predictions for the current year.
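The timeliness criterion of undesired outcome 1 can be expressed as a small predicate over a case's activity sequence. Representing events as (activity, timestamp) tuples is an assumption for illustration; the real log uses the attributes described earlier.

```python
from datetime import datetime

def timely_payment(events, year):
    """A sketch of undesired outcome 1: the payment is timely iff some
    "begin payment" occurs before the end of `year` and is never
    followed by "abort payment"."""
    deadline = datetime(year + 1, 1, 1)
    for i, (act, ts) in enumerate(events):
        if act == "begin payment" and ts < deadline:
            # Timely only if no later event aborts this payment.
            if all(a != "abort payment" for a, _ in events[i + 1:]):
                return True
    return False

# Hypothetical traces: one paid in time, one aborted afterwards.
paid = [("initialize", datetime(2016, 5, 2)),
        ("begin payment", datetime(2016, 12, 15))]
aborted = paid + [("abort payment", datetime(2017, 1, 10))]
```

The first trace is timely for 2016; the second is not, because the payment started in December is later aborted.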
Prediction of penalties (risk assessment)
The applicant may not receive the total amount of what has been applied for. This may occur for a variety of reasons, e.g., the stated size of the farmland did not match the actual size as determined by alignment with the reference or by a remote or on-site inspection. Other reasons include the violation of cross-compliance rules or noncompliance with the young farmer condition.
The occurrence of such a penalty is indicated by the cut amount (“penalty_amount{x}”) and a code for one or more reasons (“penalty_{xxx}”). Some of these are considered more severe (namely: B3, B4, B5, B6, B16, BGK, C16, JLP3, V5 and BGP, BGKV, B5F in Q2). A certain fraction of applications is selected for the more rigorous (on-site) inspection. This may happen either due to an internal risk assessment (“selected_risk”) or randomly (“selected_random”).
We would expect the risk assessment to reveal a comparatively larger fraction of severe violations in the selected sample. However, we see room for improvement.
Question: Can you draw a better sample of the same size (about 5%) with a better recall in uncovering the severe cases (as defined above)?
Note: You should use as predictors only events that happened before the remote and on-site inspections for the particular year started (for example, the 27th of June 2015). You should also exclude the attributes in the table in the section “derived attributes”, as these are not known before the inspection has taken place and the application was processed.
You may however, use all data from previous years.
In fact, we would be very interested in discovering dependencies across years. However, we would be also interested in statistical evidence that the current year’s risk is independent of whatever happened in the past.
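Evaluating a candidate sample against this question comes down to computing recall over the severe cases. A sketch with hypothetical case ids (the real ids would come from the selected_* and penalty_* attributes):

```python
def sample_recall(selected, severe):
    """Fraction of severe cases that an inspection sample uncovers
    (recall). Returns 0.0 if there are no severe cases."""
    severe = set(severe)
    if not severe:
        return 0.0
    return len(severe & set(selected)) / len(severe)

# Hypothetical inspection sample and ground-truth severe cases.
risk_selected = {"case-A", "case-B", "case-C"}
severe_cases = {"case-B", "case-C", "case-D"}
```

Here the sample catches two of the three severe cases, so its recall is 2/3; a better sample of the same size would push this closer to 1.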
Differences between departments
Departments may have implemented their processes differently and the hypothesis is that there is a relationship between the different processes and the problems described in questions 1 and 2.
Question: How can one characterize the differences between departments and is there indeed a relation?
Differences across years
Usually, around the same number of applications from the same farmers is handled every year. The processes should be similar each year, but may differ due to changes in regulations or in their technical implementation (for instance, the document type “parcel document” has been replaced by the more sophisticated “geo parcel document” in 2016).
Question: How can one characterize these differences as a particular instantiation of concept drift?
Questions about the challenge
Like before, participants can post questions about the data/process on the ProM forum. The company monitors the messages there and will try to respond as soon as possible.
Submissions
Submissions should be made through EasyChair at https://www.easychair.org/conferences/?conf=bpi2018 where you indicate your submission to be a challenge submission. A submission should contain a pdf report of at most 30 pages, including figures, using the LNCS/LNBIP format (http://www.springer.com/computer/lncs?SGWID=0-164-6-791344-0) specified by Springer (available for both LaTeX and MS Word). Appendices may be included, but should only support the main text.