Blog of Joos Buijs

About personal things, process mining and the rest in life.

Posts Tagged ‘TU/e’

The Results of my Master Project

with 2 comments

Update 26-05-2010: The official XESame (or XESMa) website is now located at processmining.org! This post will not be updated further.

So, after 7 months my master project is completed and the results are final!

Last Monday I gave my final presentation (.pptx, 1.7 MB). This presentation gives a good introduction into the problem and topic of my project.

More detailed information about what I did can be found in my master thesis (.pdf, 9.8 MB). This should also be used as a temporary ‘user guide’ for my application.

Warning: This is a prototype! No support or guarantee is given whatsoever! Use at your own risk!

If you want to test/play with the prototype I created, it can be downloaded at the link below. However, use it at your own risk 😉

XESMa Application Prototype (v 1.0) (.zip, 3.4 MB)

How to start XESMa in 3 steps:

  1. Extract the contents of the zip file;
  2. In Eclipse (or any Java editor), create a new Java Project from the folder you just extracted;
  3. Execute ‘Application.java’ in the org.processmining.mapper.ui package.


Now that everything is finished I will enjoy a holiday until May 3. Then I’ll start on a PhD position, here at the TU/e, more about this in another blog post.

I hope I have/get the time to continue to work on XESMa in the future. I have some ideas for improvement. And of course, your feedback is very much appreciated!

Thesis abstract:

Information systems are taking a prominent place in today’s business process execution. Since most systems are complex, enterprise-wide systems, very few users, if any, have a clear and complete view of the overall process. In the area of process mining several techniques have been developed to reverse engineer information about a process from a recording of its execution. To apply process mining analysis on process-aware information systems, an event log is required. An event log contains information about cases and the events that are executed on them.

Although many systems produce event logs, most systems use their own event log format. Furthermore, the information contained in these event logs is not always suitable for process mining. However, since much data is stored in the data storage of the information system, it is often possible to reconstruct an event log that can be used for process mining. Extracting this information from the business data is a time consuming task and requires domain knowledge. The domain knowledge required to define the conversion is most likely held by people from business, e.g. business analysts, since they know or investigate the business processes and their integration with technology. In most cases business analysts have no or limited programming knowledge. Currently there is no tool available that supports the extraction of an event log from a data source that doesn’t require programming.

This thesis discusses important aspects to consider when defining a conversion to an event log. The decisions made in the conversion definition influence the process mining results to a large extent. Defining a correct conversion for the specific process mining project at hand is therefore crucial for the success of the project. A framework to store aspects of such a conversion is also developed in this thesis. In this framework the extraction of traces and events as well as their attributes can be defined. An application prototype, called ‘XES Mapper’ or ‘XESMa’, that uses this conversion framework is built.

The XES Mapper application guides the definition of a conversion. The conversion can be defined without the need to program. The application can also execute the conversion on the data source, producing an event log in the MXML or XES event log format. This enables a business analyst to define and execute the conversion on their own. The application has been tested with two case studies. This has shown that many different data source structures can be accessed and converted.

Keywords: data conversion, database, event log, process mining, process-aware information system

Edit 01-04-2010 11:50: added XESMa execution steps


Written by Joos Buijs

March 31, 2010 at 15:17

Master Project Update: the end is approaching

with one comment

Hi all!

Well, the title says it all, the end of my master’s project is in sight!

The application is ‘nearly done’, there are so many things that could be improved but… well, there is not much time. So, let’s say that the application is at the beta stage then. Yesterday I tried to use it on a real data source instead of my toy database of only 10 records. The results were promising and today I’m processing some of the things we encountered yesterday. For instance the parsing of dates into a Java Date instance is problematic. The format of the date is not always the same and automatically detecting the format used is nearly impossible. Therefore the user (/you) can now define the format used to represent the date and time.
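To give an idea of what ‘letting the user define the format’ means in Java, here is a minimal sketch using the standard SimpleDateFormat class. The class name DateFormatSketch and the example patterns are made up for illustration, this is not the actual code of my application.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

// Hypothetical helper: parse a date with a pattern supplied by the user,
// instead of trying to guess the format from the data.
class DateFormatSketch {

    static Date parse(String text, String pattern) {
        SimpleDateFormat format = new SimpleDateFormat(pattern);
        format.setLenient(false); // reject values that do not match the pattern
        try {
            return format.parse(text);
        } catch (ParseException e) {
            throw new IllegalArgumentException(
                    "'" + text + "' does not match pattern '" + pattern + "'", e);
        }
    }

    public static void main(String[] args) {
        // The same text is a valid date under both patterns, but a different one,
        // which is why automatic detection cannot be trusted.
        Date dayFirst = parse("03-04-2010", "dd-MM-yyyy");   // 3 April 2010
        Date monthFirst = parse("03-04-2010", "MM-dd-yyyy"); // 4 March 2010
        System.out.println(dayFirst.equals(monthFirst));
    }
}
```

A value like 03-04-2010 is accepted by both patterns, so only the user can tell which one is meant.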

Another type of problem we encountered was related to the ODBC driver but that I can not fix… Other improvements are related to me trying to be too smart (which of course turns out wrong). And some performance issues (but these might be related to the ODBC driver used). And of course a lot of small improvements to the user interface can/should be made etc. So much to do, so little time 🙂

Early this week I also ‘finished’ the visualization of the conversion. The idea is to visualize which tables and columns are used in certain attributes. In the screenshot below a very small event log (with one event definition) is visualized. The conversion uses 2 columns of the event.csv table. I know that the visualization shown is very small and larger visualizations will get messy but it’s hard to get it right… And, well, it’s only a prototype 😉

I have also been working on my thesis for about a month now. The content is structured as follows:

  1. Introduction (context, problem, goal, scope and method of the project) [4 pages]
  2. Preliminaries (explanation of process aware information systems (PAIS), event logs, process mining and other conversion tools) [12 pages]
  3. Conversion Aspects (what to consider when defining a conversion) [8 pages]
  4. Solution Approach (how I planned to implement the application) [7 pages]
  5. Solution Implementation (more details of the technical implementation and use of the application) [14 pages]
  6. Case Studies (2 case studies (SAP and a custom system) to show the validity of my application) [to write]
  7. Conclusion (conclusions and future work) [to write]

So, I still have to perform my case studies, write Chapters 6 and 7 plus the abstract, preface etc. and thoroughly read the entire thesis. And all of that within the next 2 to 3 weeks. And then I’ll have to wait for the reviews of my supervisors and prepare for the final presentation of March 29…

You are all invited to my final presentation of course!!! It will be held on March 29, 2010 at 15:00 in Eindhoven, the Netherlands. If you would like to attend, please let me know and I’ll inform you of the location.

If you can not attend the final presentation and/or want to read my thesis or try out my application, keep an eye on this blog. I’ll post a link to both of them just before or after my final presentation.

So, now I’m going back to programming again (stupid SQL error…) and enjoy the weekend in a little bit.

TTFN!

Joos

Written by Joos Buijs

February 19, 2010 at 17:17

Process Mining: A quick overview of web resources on the subject

with 3 comments

Process mining is a hot research subject considering the large number of publications (see for instance Google Scholar and the full publication list of Wil van der Aalst).

Besides official publications there are of course less ‘official’ and less scientific writings about the subject. I was curious what I would find so I started a search on the world wide web…

The number 1 result is of course www.processmining.org, home of the well-known tools ProM and ProM Import developed at the TU/e. This website also explains the basics of process mining. A better introduction to the subject for ‘newbies’ might be the Wikipedia article on Process Mining.

Personally, I would put the LinkedIn group on Process Mining third. This group contains discussions on the subject and links to interesting (blog) posts are added. Another community around process mining is formed by the ProM-user and ProM-developer mailing lists. The ProM forum is not used much, but I personally prefer it over the (‘old fashioned’) mailing lists.

For those already more into process mining the ‘IEEE task force on process mining’-wiki could be of interest. Extra tip: add the wiki changes RSS feed to your RSS reader 😉

Business people excited about the possibilities of process mining should visit the following websites of companies that support process mining (in no particular order or claim of completeness):

  • Futura Process Intelligence The first company specifically aimed at process mining, based in Eindhoven. Especially the ’14 day challenge’ should appeal.
  • Fluxicon Possibly the second company specifically aimed at process mining 😉 Also based in Eindhoven (must have a reason…). This ‘new kid on the block’ is one to keep your eye on, curious to see where they are say 2 years from now.
  • Surprisingly, the next company is also based in Eindhoven. MagnaView visualizes data and now also supports several process mining visualizations.
  • Process mining as a business has crossed the waters to Norway. Businesscape provides the ‘Enterprise Visualization Suite’ incorporating several process mining techniques.
  • Process mining is also incorporated in tools such as ARIS from IDS Scheer, BPM|One from Pallas Athena and Fujitsu’s ESI (although they call it Automated Process Discovery, but it’s the same… (they disagree! but that’s not true…))
  • And of course I forgot many other great companies… (let me know in the comments!)

Next, blog posts. There are many of them ‘out there’, some of them even talk about process mining. A (very) small selection is provided below, no selection is made on quality or actuality.

Recently, I discovered that research on the subject of process mining is also spreading to Italy, and it has already spread to Germany, America and Australia.

Well, I hope I provided at least a few pointers for further reading.

Joos

Edit 28-01-’10: corrected some small typing mistakes
Edit 30-07-’10: Entered the correct link to the IEEE TFPM *oops*

Written by Joos Buijs

January 22, 2010 at 17:58

Process Mining Terms: A Small Glossary

leave a comment »

Recently I helped someone unfamiliar with process mining in starting analysis on a log. One of the things that I noticed is that it is hard to get to know the overall ‘structure’ and meaning of the terms used. This is further complicated by inconsistent use of terminology in conversations and documentation but also in ProM 5.2. In this post I will try to explain some of the most common terms used in process mining and what they (should) mean.

Note: this is not a ‘definitive’ list, it is just how I think the terms should be interpreted and used!!! Furthermore, any suggestions and additions are welcome!!!

The overall picture: A system (e.g. a workflow management system) facilitates the processing of cases using a predefined process in which activities and their ordering are defined. The activities executed in this system are recorded in an event log which can be ‘reverse engineered’ using ProM for instance. The log contains actual executions of events on cases at a certain moment in time by a certain actor etc.
The result of this reverse engineering can be a process model describing the behavior recorded in the log, but performance, social network and constraint analysis are also possible. We won’t go into all the possible analyses in this post.

So, an (event) log contains information about process instances (e.g. cases) and the events that are performed on/for them.

It is also important to understand that there are two levels: one is the conceptual level in which we do not talk about actual instances but generally talk about objects that can appear in a log. The other level is the instance level in which you look at specific instances of process instances, event executions, originators, etc. In the general terms list I tried to indicate whether a term refers to a conceptual aspect or really refers to a (set of) instance(s).
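To make the two levels a bit more concrete, here is a minimal Java sketch. All class and field names (EventLogLevelsSketch, EventInstance, ProcessInstance) are made up for this post and are not taken from ProM, MXML or XES. The objects are the instance level, while the set of distinct activity names is the conceptual level.

```java
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Illustration only: hypothetical classes, not the ProM or MXML data model.
class EventLogLevelsSketch {

    // Instance level: one recorded execution with the details the log stores.
    static final class EventInstance {
        final String activity;   // which activity was executed (a conceptual-level name)
        final String eventType;  // e.g. "start" or "complete"
        final String originator; // the resource that executed it
        final Date timestamp;    // when it happened

        EventInstance(String activity, String eventType, String originator, Date timestamp) {
            this.activity = activity;
            this.eventType = eventType;
            this.originator = originator;
            this.timestamp = timestamp;
        }
    }

    // Instance level: one concrete case and the events recorded for it.
    static final class ProcessInstance {
        final String id;
        final List<EventInstance> events = new ArrayList<>();

        ProcessInstance(String id) {
            this.id = id;
        }
    }

    // Conceptual level: the distinct activities that appear anywhere in the log.
    static Set<String> activities(List<ProcessInstance> log) {
        Set<String> names = new TreeSet<>();
        for (ProcessInstance pi : log) {
            for (EventInstance e : pi.events) {
                names.add(e.activity);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        ProcessInstance order1 = new ProcessInstance("order-1");
        order1.events.add(new EventInstance("Check order", "complete", "Anne", new Date()));
        order1.events.add(new EventInstance("Ship order", "complete", "Bob", new Date()));
        ProcessInstance order2 = new ProcessInstance("order-2");
        order2.events.add(new EventInstance("Check order", "complete", "Bob", new Date()));

        List<ProcessInstance> log = new ArrayList<>();
        log.add(order1);
        log.add(order2);

        // Three event instances were recorded, but only two activities exist.
        System.out.println(activities(log));
    }
}
```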

General Process Mining terms:

(Used in ProM 5.2 and MXML, new terms are used in ProM 6 and the XES event log format)

  • Activity An action or task that can be performed for a process instance (conceptual level);
  • Data attribute An extra attribute recorded in the MXML file. Examples are the amount of a purchase order or the patient’s age. These attributes can for instance be used for decision analysis in ProM (conceptual level);
  • Event This can either refer to an activity or to an event instance performed by a resource at a certain time for a specific process instance. The meaning therefore depends on the context in which it is used;
  • Event Class Used in the ProM Dashboard, it refers to the number of different activities encountered in the log (instance level).
  • Event Log A recording of a set of events, an MXML log is an example of an event log format (instance level);
  • Event Instance A recording of an executed event with information such as execution timestamp, event type and originator (instance level);
  • Event Type Each activity can be in one of several states. The most commonly used states are ‘start’ and ‘complete’. The meaning is very straightforward: an activity is started and a certain amount of time later it is completed. There are several other event types or states, for a complete overview see figure 4 in the ‘MXML paper’ (PDF) (might be outdated) (conceptual level);
  • Log The original log generated by the source system which records things that have happened. In order to be used within ProM this needs to be converted to the MXML format using the ProM Import Framework (instance level).
  • Process Instance (PI) The object you are following and on/for which events occur. Examples are cases, patients, machines etc. (can be both conceptual and instance level);
  • Process mining Analyzing a business process based on an event log, see http://www.processmining.org;
  • ProM An application to apply several process mining techniques to an event log, see http://www.processmining.org. The version at the moment of writing is 5.2 and version 6.0 is under development (nightly builds are available);
  • ProM Import Framework A framework for converting event logs to the MXML event log format. A set of converters for common formats is available but new converters can be programmed in Java;
  • Model Element Used in the ProM Dashboard Summary, it should be interpreted as ‘activity’.
  • MXML A meta model for event logs. An event log needs to be in this XML format to be processed by ProM. More information can be found in the ‘meta model for process mining’ paper (PDF) (conceptual level);
  • MXML log The actual MXML file with all the recordings following the MXML format (instance level);
  • Resource Any actor that can execute an activity, for example humans, the system itself or a web service (conceptual level);
  • Timestamp A time indication consisting of a date and possibly a time part (instance level);
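As a small illustration of the ‘start’ and ‘complete’ event types from the list above, the following Java sketch computes how long each activity took within one process instance. It is only a sketch under a few assumptions: the names (ActivityDurationSketch, Event) are hypothetical, timestamps are plain milliseconds, the events are ordered by time and an activity is not started again before it is completed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: hypothetical classes, not part of ProM.
class ActivityDurationSketch {

    static final class Event {
        final String activity;
        final String type;          // "start" or "complete"
        final long timestampMillis; // simplified timestamp

        Event(String activity, String type, long timestampMillis) {
            this.activity = activity;
            this.type = type;
            this.timestampMillis = timestampMillis;
        }
    }

    // Pairs each 'start' with the next 'complete' of the same activity and
    // sums the elapsed time per activity. Other event types are ignored.
    static Map<String, Long> durations(List<Event> trace) {
        Map<String, Long> open = new HashMap<>();         // activity -> start time
        Map<String, Long> result = new LinkedHashMap<>(); // activity -> total duration
        for (Event e : trace) {
            if ("start".equals(e.type)) {
                open.put(e.activity, e.timestampMillis);
            } else if ("complete".equals(e.type)) {
                Long started = open.remove(e.activity);
                if (started != null) {
                    result.merge(e.activity, e.timestampMillis - started, Long::sum);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Event> trace = new ArrayList<>();
        trace.add(new Event("Check order", "start", 0L));
        trace.add(new Event("Check order", "complete", 5000L));
        trace.add(new Event("Ship order", "start", 6000L));
        trace.add(new Event("Ship order", "complete", 9000L));
        System.out.println(durations(trace));
    }
}
```

A ‘complete’ event without a matching ‘start’ is simply skipped here; a real analysis would have to decide how to treat such cases.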

Well, that’s the list for now. I hope I helped someone and did not add to the confusion. If you have any questions, suggestions or additions, please post a comment!!! Especially the ‘conceptual vs. instance’ part was hard for me to explain so any improvements are welcome.

– Joos –

P.S. @my supervisor: I created this article in the weekend and scheduled it for publication on Tuesday, so don’t think I’m procrastinating 🙂


Written by Joos Buijs

November 17, 2009 at 17:00

Posted in Process Mining


About my master project

with one comment

To make sure that this blog won’t be about funny process models alone it might be a good idea to introduce and explain my master’s project subject: Mapping Data Sources to XES in a Generic Way. Let’s dissect this rather vague sentence to explain what it is all about:

Mapping: in this context it means to let the user define a way to map one data source to another.

Data Sources: most people might think of databases first but text files, XML files and even web services can be considered as data sources. Although it must be seen how many data sources we are able to support, we intend to at least support the common database formats and the CSV (comma separated values) plain text format.

XES: the one requiring the most explanation. Although pronounced (as ‘excess’) similar to one of the well known database formats from a well known software vendor, it means something completely different. In this case we refer to the ‘Extensible Event Stream’ format. This format is an extensible event log format for, well, storing event logs. For people familiar with the MXML format: XES is the new and improved MXML! For people unfamiliar with MXML: visit processmining.org (more specifically, read an informal introduction to MXML (.PPT, 0.9 MB) or read the more formal MXML introduction paper (PDF, 130 KB)). The XES meta-model is implemented in the OpenXES Java library, more information about XES can also be found there.

a Generic Way: of course, we want our application to be applicable in many situations and therefore it must be generic.

So, in brief, the goal of my project is to develop an application that allows a user to define a mapping from a (set of) data source(s) to the XES event log format and to execute this mapping. The result is an event log that can be used for process mining with (the new version of) ProM.
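To illustrate what such a mapping boils down to, here is a minimal hand-written Java sketch that groups CSV rows into traces and writes them as simplified XES-style XML. Everything in it is an assumption for illustration: the class name CsvToTracesSketch, the column layout and the naive CSV splitting are made up, the output leaves out the header and extension declarations of a real XES file, and it does not use the OpenXES library. The whole point of my project is that you should not have to write code like this yourself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: a hard-coded mapping, not how the planned application works.
class CsvToTracesSketch {

    // Groups rows by the case-id column: case id -> activity names in row order.
    static Map<String, List<String>> toTraces(List<String> csvLines, int caseColumn, int activityColumn) {
        Map<String, List<String>> traces = new LinkedHashMap<>();
        for (String line : csvLines) {
            String[] cells = line.split(",", -1); // naive split: no quoted commas
            traces.computeIfAbsent(cells[caseColumn], k -> new ArrayList<>())
                  .add(cells[activityColumn]);
        }
        return traces;
    }

    // Writes simplified XES-style XML. Values are assumed not to need XML escaping.
    static String toXml(Map<String, List<String>> traces) {
        StringBuilder xml = new StringBuilder("<log>\n");
        for (Map.Entry<String, List<String>> trace : traces.entrySet()) {
            xml.append("  <trace>\n");
            xml.append("    <string key=\"concept:name\" value=\"")
               .append(trace.getKey()).append("\"/>\n");
            for (String activity : trace.getValue()) {
                xml.append("    <event>\n");
                xml.append("      <string key=\"concept:name\" value=\"")
                   .append(activity).append("\"/>\n");
                xml.append("    </event>\n");
            }
            xml.append("  </trace>\n");
        }
        return xml.append("</log>\n").toString();
    }

    public static void main(String[] args) {
        // Hypothetical rows: case id, activity name.
        List<String> rows = Arrays.asList("1,Register", "2,Register", "1,Pay");
        System.out.print(toXml(toTraces(rows, 0, 1)));
    }
}
```

Choosing which column identifies the case is exactly the kind of decision the user has to make when defining a mapping: a different case column gives a different event log.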

If you have done one or several process mining projects you must know that preparing the data is one of the most time consuming (and, in my opinion, most annoying) parts of a process mining project. This master project aims at providing an application that will allow you, the process miner, maybe together with the domain expert, to define a mapping from the data source(s) to the XES event log format without the need to write (Java) code.

Any questions, comments, feature requests etc. etc. are more than welcome!!!

See you at the next post! (Which will either be a ‘funny process model’ or a post showing some GUI designs for the application)

Joos

Written by Joos Buijs

November 3, 2009 at 17:00

Hello world!

with one comment

Welcome to my personal blog on WordPress.com. This is my first post.

And since this is my first post, I’ll tell a little bit about me and why I started this blog.

So, I’m Joos Buijs from the Netherlands. At the moment I’m a graduate student in ‘Business Information Systems’ at Eindhoven University of Technology. I’m working on my master thesis/project, which is related to process mining, a ‘hot research topic’ in the group I’m working in. But more about my master project in a later blog post.

The goal of this blog is mainly to create a ‘personal brand’, or in other words, to promote myself to future employers, colleagues and the rest of the world. Secondly, but no less important, I want to get in contact with new people, share ideas and opinions and help each other. For instance, I believe that my master project might help people active in process mining and they can provide me with tips to improve my application.

If you want to get to know me better you can read my ‘online resume’ also known as my LinkedIn profile (which also includes a picture of myself, to make it more personal). And of course, you can always contact me by adding a comment, sending me a message or trying to meet me in real life 🙂

That’s it for now. I cannot promise a regular update schedule, in most cases it’s contradictory to my work schedule. Right now for instance I’m taking a break from reading a book (on plug-in development in Eclipse) but when I’m ‘really working’ I have less time to post something interesting (although then I’m likely to have something interesting to post…). Anywho, hope to see you soon!!!

Take care,

Joos

Written by Joos Buijs

October 20, 2009 at 13:32