Blog of Joos Buijs

About personal things, process mining and the rest in life.

Posts Tagged ‘XES’

ProM 5 versus ProM 6


Precisely a week ago I read the ‘How to Get Started With ProM’ blog post on the Fluxicon Blog (err, ‘Capacitor’). In this blog post, Anne explains how an event log can be constructed (using Nitro in this case) and inspected, and finally how a process model can be mined and animated using ProM. Overall, the blog post is very nice, as are all posts on the Fluxicon Blog.

There is, however, one thing I noticed when I was halfway through the post: they use ProM 5! At first I thought, why? I mean, ProM 6 is ProM 6 after all; it’s not 4.5, it’s not 5.3, it’s 6! Therefore, it should be better than 5. Moreover, Fluxicon, especially Christian, had a great influence on the development of ProM 6: Christian designed the new, slick user interface of ProM 6 and also developed XES, the new event log standard on which ProM 6 is based (ProM 6 remains backward compatible with MXML, the format used by ProM 5). Furthermore, ProM 6 uses ‘packages’ to wrap plug-ins. Packages can be installed and updated independently of the framework, which allows authors to update their plug-ins independently of the ProM 6 release cycle.

So, I wondered, why explain to new users how ProM 5 works? Shouldn’t you point them to ProM 6? Let them use the newest process mining tool, the state of the art, with all its improvements. I’m not saying that ProM 5 is bad, of course not, but ProM 6 is better. Or is it?

Of course, I could have emailed Anne this question and I would have received a reply, but I want to make this a public discussion. Why/when would you use ProM 5 instead of ProM 6?

Well, I can give a couple of reasons, but I would sure like to know yours. And, of course, especially Anne’s reasons for introducing our new process miners to ProM 5 instead of ProM 6.

So, in summary, I believe that the benefits of ProM 6 compared to ProM 5 are:

  • A better graphical interface. In my opinion, the main new feature of the ProM 6 GUI is that it is object-based: a plug-in requires objects of certain types and produces objects of other types. This allows for dynamic ‘chaining’ of plug-ins, each plug-in taking the analysis one step further;
  • Separation between plug-ins and the ProM 6 framework. You can choose which plug-ins / packages to install, and updates can be released more frequently and independently of ProM 6 framework updates;
  • Support for the new XES event log format, while still supporting the well-known MXML format;
  • Separation of GUI and execution: if a plug-in crashes, the framework keeps running in most cases. Furthermore, it also allows for easier ‘grid deployment’ than ProM 5.
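The ‘chaining’ in the first bullet can be illustrated with a small Java sketch. It is not ProM 6’s actual plug-in API: the `Plugin` record, the `PluginRegistry` class and the toy types `ToyLog` and `ToyModel` are all hypothetical. Each plug-in declares the object type it requires and the type it produces, so for any object the framework can list the plug-ins that accept it, and the output of one plug-in becomes the input of the next.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical object types that flow between plug-ins (illustration only).
record ToyLog(List<List<String>> traces) {}

record ToyModel(List<String> activities) {}

// A plug-in declares the object type it requires and the object type it produces.
record Plugin(String name, Class<?> input, Class<?> output, Function<Object, Object> body) {}

final class PluginRegistry {
    private final List<Plugin> plugins = new ArrayList<>();

    void register(Plugin plugin) {
        plugins.add(plugin);
    }

    // Names of all plug-ins whose required input type matches the given object.
    List<String> applicableTo(Object object) {
        List<String> names = new ArrayList<>();
        for (Plugin p : plugins) {
            if (p.input().isInstance(object)) {
                names.add(p.name());
            }
        }
        return names;
    }

    // Runs the named plug-in; its result can be handed to the next plug-in in the chain.
    Object run(String name, Object object) {
        for (Plugin p : plugins) {
            if (p.name().equals(name) && p.input().isInstance(object)) {
                return p.output().cast(p.body().apply(object));
            }
        }
        throw new IllegalArgumentException("No plug-in '" + name + "' accepts this object");
    }
}

final class ChainingDemo {
    static PluginRegistry demoRegistry() {
        PluginRegistry registry = new PluginRegistry();
        // Toy "miner": the distinct activities, in order of first occurrence.
        registry.register(new Plugin("Toy miner", ToyLog.class, ToyModel.class, o -> {
            List<String> seen = new ArrayList<>();
            for (List<String> trace : ((ToyLog) o).traces()) {
                for (String activity : trace) {
                    if (!seen.contains(activity)) {
                        seen.add(activity);
                    }
                }
            }
            return new ToyModel(seen);
        }));
        // Toy "visualizer": renders a model as text.
        registry.register(new Plugin("Model printer", ToyModel.class, String.class,
                o -> String.join(" -> ", ((ToyModel) o).activities())));
        return registry;
    }

    public static void main(String[] args) {
        PluginRegistry registry = demoRegistry();
        ToyLog log = new ToyLog(List.of(List.of("register", "check", "pay"), List.of("register", "pay")));
        System.out.println(registry.applicableTo(log));
        Object model = registry.run("Toy miner", log);
        System.out.println(registry.applicableTo(model));
        System.out.println(registry.run("Model printer", model));
    }
}
```

Running `ChainingDemo.main` lists `[Toy miner]` for the log, `[Model printer]` for the mined model, and finally prints `register -> check -> pay`: the framework only needs the declared types to decide what can come next.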

However, at the moment, ProM 5 has more (how much more?) plug-ins to offer. Each ProM 5 plug-in needs to be updated by its author (or a student) in order to run in ProM 6. So if you plan to do sophisticated analysis, you might want to keep ProM 5 installed.

To conclude, I think that new process miners should be introduced to ProM 6. Its usability is better than that of ProM 5, although both require a learning period.
For those more advanced in process mining, it is necessary to switch between ProM 5 and ProM 6, depending on the type of analysis you want to perform. Hopefully most of the ProM 5 plug-ins will find their way to ProM 6, some with improvements.

But that’s only my opinion; what do you think? Do you think ProM 6 can replace ProM 5 yet? Do you point a new process miner to ProM 5 or ProM 6? And did I miss any (dis)advantages???
Let me know in a comment on this post, on the post at the Fluxicon Capacitor, or in a dedicated discussion on LinkedIn.

Looking forward to your opinions!!!

Joos


Written by Joos Buijs

November 22, 2010 at 14:36

Wanted: Beta testers for XESame


So, finally, the day is approaching on which my baby gets the “1.0” label. But before I dare to put it out there, I would like to have it tested, and not only by me.

So, what (/who) is XESame?
XESame started as XESMa during my Master’s project. The goal of XESame is to extract event logs from data sources. The input can be database tables, text files or even XML files. The output is an event log in the XES or MXML format.
A good slogan for XESame would be: “Opens the cave of process mining wonders”, but that would be a bit of bragging.

Err, sounds great but then what?
This event log can be used in ProM to apply process mining analysis (which is sometimes called ‘magic’ (Dutch article); it also produces very colorful and nice pictures…). More about ProM and process mining can be found at processmining.org.

But why do you want to test it now?
In September 2010, at the BPM’10 conference in New York, the ProM 6 framework will be officially released, and XESame is included in this framework. Over the next few days and weeks, ProM 6 will be tested (internally) for the release. Since this is the first release of XESame and I’m the only one working on it, I would really like some thorough testing and feedback.

Okay, so, how can I help?
Well, you can do several things, depending on what you like to do and how much time you can/will spend. First of all, I would suggest that you download XESame and try to extract an event log from data you have available. Then report back to me whether XESame was useful and why (not).
XESame uses JDBC to connect to the data source. Since I cannot test XESame on ‘all’ data source types out there, I’m interested in how it works on different types of data sources (e.g. different databases such as MySQL, Oracle, MS SQL, etc.).
Furthermore, if you encounter any errors, please let me know so I can try to fix them.
I’m also very interested in what features are missing and how XESame can provide better guidance in defining an extraction of event log data.
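Since XESame talks to data sources through JDBC, testing a new database usually comes down to having the right driver and connection URL. The sketch below is not XESame code: it shows common JDBC URL formats for the databases mentioned above and opens a connection flagged as read-only, in the spirit of the disclaimer at the end of this post. The `JdbcSources` class, host names, ports, database names and credentials are placeholders, and the vendor’s JDBC driver JAR has to be on the classpath.

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.SQLException;

// Illustrative helper (not part of XESame): builds JDBC URLs and opens a read-only connection.
final class JdbcSources {
    enum Vendor { MYSQL, ORACLE, SQLSERVER, POSTGRESQL }

    // Common URL formats; typical default ports are 3306, 1521, 1433 and 5432.
    static String url(Vendor vendor, String host, int port, String database) {
        switch (vendor) {
            case MYSQL:
                return "jdbc:mysql://" + host + ":" + port + "/" + database;
            case ORACLE: // thin driver, database given as a service name
                return "jdbc:oracle:thin:@//" + host + ":" + port + "/" + database;
            case SQLSERVER:
                return "jdbc:sqlserver://" + host + ":" + port + ";databaseName=" + database;
            case POSTGRESQL:
                return "jdbc:postgresql://" + host + ":" + port + "/" + database;
            default:
                throw new IllegalArgumentException("Unknown vendor " + vendor);
        }
    }

    // Opens a connection and asks the driver to treat it as read-only.
    // setReadOnly is only a hint; a database user with read-only rights is the real safeguard.
    static Connection openReadOnly(String url, String user, String password) throws SQLException {
        Connection connection = DriverManager.getConnection(url, user, password);
        connection.setReadOnly(true);
        return connection;
    }

    public static void main(String[] args) throws SQLException {
        String url = url(Vendor.MYSQL, "localhost", 3306, "orders");
        try (Connection c = openReadOnly(url, "reader", "secret")) {
            DatabaseMetaData meta = c.getMetaData();
            // Handy to include in a test report: which database and version was used.
            System.out.println(meta.getDatabaseProductName() + " " + meta.getDatabaseProductVersion());
        }
    }
}
```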

But I already looked at XESMa, do you need my help?
Well, yes, for two reasons. First: what did you think of XESMa when you tried it? Second: the graphical user interface of XESame is completely different from the (rudimentary and bloated) interface of XESMa. So I still need (and will appreciate) your help.

Okay, so how do I get started?
Good question (and I’m glad that you want to get started).

First of all, you of course need to download and run XESame. Go to the ProM 6 BPM’10 release page and download the latest version of the framework and XESame. This should be under the section ‘Download’ or otherwise ‘History’.
For Windows users there is an xesame.exe file that you can start. Mac/Linux/… users can start the MainFrame class in the org.processmining.mapper.ui package from xesame.jar.
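On Mac or Linux, starting the MainFrame class amounts to running `java -cp xesame.jar org.processmining.mapper.ui.MainFrame` from the download folder. The Java sketch below builds that same command with the standard `ProcessBuilder` class; it assumes `xesame.jar` is in the working directory, and the option to append extra JARs (for example a JDBC driver) to the classpath is an assumption about what your setup might need, not documented XESame behaviour. The `XesameLauncher` class is hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Sketch: launch XESame's MainFrame in a separate JVM. Paths are assumptions.
final class XesameLauncher {
    static final String MAIN_CLASS = "org.processmining.mapper.ui.MainFrame";

    // Builds: java -cp xesame.jar[<sep>extra.jar...] org.processmining.mapper.ui.MainFrame
    static List<String> command(String xesameJar, List<String> extraJars) {
        List<String> classpath = new ArrayList<>();
        classpath.add(xesameJar);
        classpath.addAll(extraJars);
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-cp");
        cmd.add(String.join(File.pathSeparator, classpath)); // ':' on Mac/Linux, ';' on Windows
        cmd.add(MAIN_CLASS);
        return cmd;
    }

    public static void main(String[] args) throws IOException {
        ProcessBuilder pb = new ProcessBuilder(command("xesame.jar", List.of()));
        pb.inheritIO(); // show XESame's console output in this terminal
        pb.start();
    }
}
```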
If you haven’t tried XESame or XESMa before, it might be a good idea to read my Master’s thesis (PDF, 8 MB), especially chapters 5 and 6 with all the examples. (Not in any way suggesting that chapters 1, 2, 3, 4 and 7 are not interesting to read, of course.) Although the thesis talks about XESMa a lot, everything should also be applicable to XESame.
And if you’re really interested, look at the XESame source code via http://prom.win.tue.nl:8000/Tracsites/browser/public/XESame/src/org/processmining/mapper or point your SVN client to svn://prom.win.tue.nl/public/XESame (you can use “anonymous/anonymous” for anonymous access, although you cannot commit of course).

Once you’re done fiddling around, or when you encounter a serious error or bug or get stuck, contact me and I’ll try to help you. The best way to reach me is via my employee page: come by my office, give me a call or send me an e-mail (or contact me through Office Communicator on my TU/e e-mail address).
Unfortunately, I’m only human, so on occasion I might be in the restroom, having lunch or even on holiday (from August 9 up to and including August 20).

So, even if you don’t plan to click on any of the above links, I would like to thank you for reading this post. I hope to hear from you soon and until next time,

Update 28-7-2010 16:50 (CET): I forgot to mention that the ‘official home’ of XESame is http://prom.win.tue.nl/research/wiki/xesame/start (I was too excited…).

Joos

P.S. huge disclaimer follows:
Please note that neither the author, nor the department, nor the university can be held responsible for any damage caused by direct or indirect usage of XESame (or XESMa). It is recommended to give XESame only read access to the data source and to run XESame on a copy of the data, not on (the only instance of) the original data source. And of course, XESame has not been extensively tested (yet), so it might do strange things to you or your computer. But rest assured, my computer and I survived all the months of development.

Written by Joos Buijs

July 27, 2010 at 11:30

The Results of my Master Project


Update 26-05-2010: The official XESame (or XESMa) website is now located at processmining.org! This post will not be updated further.

So, after 7 months my master’s project is completed and the results are final!

Last Monday I gave my final presentation (.pptx, 1.7 MB). This presentation gives a good introduction to the problem and topic of my project.

More detailed information about what I did can be found in my master’s thesis (.pdf, 9.8 MB). This should also be used as a temporary ‘user guide’ for my application.

Warning: This is a prototype! No support or guarantee is given whatsoever! Use at your own risk!

If you want to test/play with the prototype I created, it can be downloaded at the link below. However, use it at your own risk 😉

XESMa Application Prototype (v 1.0) (.zip, 3.4 MB)

How to start XESMa in 3 steps:

  1. Extract the contents of the zip file;
  2. In Eclipse (or any Java IDE), create a new Java project from the folder you just extracted;
  3. Execute ‘Application.java’ in the org.processmining.mapper.ui package.


Now that everything is finished, I will enjoy a holiday until May 3. Then I’ll start on a PhD position here at TU/e; more about this in another blog post.

I hope I have/get the time to continue to work on XESMa in the future. I have some ideas for improvement. And of course, your feedback is very much appreciated!

Thesis abstract:

Information systems are taking a prominent place in today’s business process execution. Since most systems are complex, enterprise-wide systems, very few users, if any, have a clear and complete view of the overall process. In the area of process mining several techniques have been developed to reverse engineer information about a process from a recording of its execution. To apply process mining analysis on process-aware information systems, an event log is required. An event log contains information about cases and the events that are executed on them.
Although many systems produce event logs, most systems use their own event log format. Furthermore, the information contained in these event logs is not always suitable for process mining. However, since much data is stored in the data storage of the information system, it is often possible to reconstruct an event log that can be used for process mining. Extracting this information from the business data is a time consuming task and requires domain knowledge. The domain knowledge required to define the conversion is most likely held by people from business, e.g. business analysts, since they know or investigate the business processes and their integration with technology. In most cases business analysts have no or limited programming knowledge. Currently there is no tool available that supports the extraction of an event log from a data source that doesn’t require programming.
This thesis discusses important aspects to consider when defining a conversion to an event log. The decisions made in the conversion definition influence the process mining results to a large extent. Defining a correct conversion for the specific process mining project at hand is therefore crucial for the success of the project. A framework to store aspects of such a conversion is also developed in this thesis. In this framework the extraction of traces and events as well as their attributes can be defined. An application prototype, called ‘XES Mapper’ or ‘XESMa’, that uses this conversion framework is built.
The XES Mapper application guides the definition of a conversion. The conversion can be defined without the need to program. The application can also execute the conversion on the data source, producing an event log in the MXML or XES event log format. This enables a business analyst to define and execute the conversion on their own. The application has been tested with two case studies. This has shown that many different data source structures can be accessed and converted.

Keywords: data conversion, database, event log, process mining, process-aware information system
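To make the idea of a stored conversion definition a bit more concrete, here is a minimal Java sketch of an event definition that maps columns of a source table to XES attributes and turns that into an SQL query. It is not the data model from the thesis or from XESMa: the `EventDefinition` class, the table and column names (`orders`, `order_id`, `created_at`, `created_by`) and the query shape are illustrative assumptions. Only the attribute keys `concept:name`, `time:timestamp` and `org:resource` come from the standard XES extensions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical conversion-definition fragment (not the XESMa/XESame data model):
// an event definition names a source table, the column that identifies the trace,
// and which SQL expression provides each XES attribute.
final class EventDefinition {
    private final String fromTable;
    private final String traceIdColumn;
    private final Map<String, String> attributes = new LinkedHashMap<>(); // XES key -> SQL expression

    EventDefinition(String fromTable, String traceIdColumn) {
        this.fromTable = fromTable;
        this.traceIdColumn = traceIdColumn;
    }

    // Adds an attribute mapping; returns this so mappings can be chained.
    EventDefinition map(String xesKey, String sqlExpression) {
        attributes.put(xesKey, sqlExpression);
        return this;
    }

    // Builds the SELECT statement an extractor could run for this event definition.
    // Aliases are double-quoted (ANSI SQL) because XES keys contain ':'; MySQL needs
    // ANSI_QUOTES mode for this. Expressions are pasted in verbatim, so they must come
    // from the person defining the conversion, never from untrusted input.
    String toSql() {
        StringBuilder sql = new StringBuilder("SELECT ").append(traceIdColumn).append(" AS \"trace_id\"");
        for (Map.Entry<String, String> a : attributes.entrySet()) {
            sql.append(", ").append(a.getValue()).append(" AS \"").append(a.getKey()).append('"');
        }
        return sql.append(" FROM ").append(fromTable).toString();
    }

    public static void main(String[] args) {
        EventDefinition createOrder = new EventDefinition("orders", "order_id")
                .map("concept:name", "'Create order'")
                .map("time:timestamp", "created_at")
                .map("org:resource", "created_by");
        System.out.println(createOrder.toSql());
    }
}
```

The point of keeping the definition as data rather than code is the one the abstract makes: an analyst fills in tables, columns and expressions, and the tool generates and executes the queries.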

Edit 01-04-2010 11:50: added XESMa execution steps

Written by Joos Buijs

March 31, 2010 at 15:17

The preliminary results are in…


…and it looks good 🙂

The first preliminary results are those of my intermediate presentation of December 15. It went well, although there is always room for improvement of course. I managed to have a working version of my application by then so that was nice to show. Furthermore, there were actually people there besides my supervisor, tutor and third committee member.

The other preliminary results are the first XES event logs generated by my application. Although they were generated from a ‘single table source’ using a rather straightforward mapping, it is promising and rewarding to see your event log being loaded in ProM (version 6) and to see everything work.

Enough work remains to be done, some of it small (e.g. changing some texts in the user interface), some of it larger (e.g. ordering the events in the event log and automatically linking the tables used in the mapping). On the other hand, I still have more than a week to implement those functions and completely test my application. For comparison: I needed 2 weeks to build my user interface and update my domain model accordingly, and another 2 weeks were needed to get as far as I am now.
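For a ‘single table source’, the core of executing a mapping is grouping rows into traces by their case identifier and ordering the events within each trace, the ordering being one of the open tasks mentioned above. A minimal sketch, assuming every row already carries a case id, an activity name and a timestamp; the `Row` record and `SingleTableMapper` class are hypothetical and not taken from the application.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical row of a single source table.
record Row(String caseId, String activity, Instant timestamp) {}

final class SingleTableMapper {
    // Groups rows into traces (keyed by case id, in order of first appearance)
    // and sorts the events of each trace by timestamp.
    static Map<String, List<Row>> toTraces(List<Row> rows) {
        Map<String, List<Row>> traces = new LinkedHashMap<>();
        for (Row row : rows) {
            traces.computeIfAbsent(row.caseId(), id -> new ArrayList<>()).add(row);
        }
        for (List<Row> events : traces.values()) {
            // List.sort is stable, so events with equal timestamps keep their table order.
            events.sort(Comparator.comparing(Row::timestamp));
        }
        return traces;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
                new Row("c1", "pay", Instant.parse("2009-12-03T10:00:00Z")),
                new Row("c2", "register", Instant.parse("2009-12-02T09:00:00Z")),
                new Row("c1", "register", Instant.parse("2009-12-01T09:00:00Z")));
        toTraces(rows).forEach((id, events) ->
                System.out.println(id + ": " + events.stream().map(Row::activity).toList()));
    }
}
```

With the sample rows this prints `c1: [register, pay]` and `c2: [register]`: the table order of the rows no longer matters for the resulting traces.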

Since the GUI is rather stable I think I can show it to you. So, here it is:

Basic user interface of the XES mapper application

As you can see, it consists of three main parts. The bottom part is for the ‘general mapping settings’, such as a name and description, the connection settings to the data source, managing the XES extensions (shown in the screenshot), console output and executing the mapping. The top left part is for navigating the mapping definition; here you can select the element (log, trace, event or attribute with ‘children’) you want to edit. The top right part allows you to add, edit and delete attribute definitions (shown in the screenshot), define some mapping properties and, for the log, specify the event classifiers (you probably have no clue why you want those, but don’t worry, you’ll learn about them in the documentation of the new XES version).
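The event classifiers mentioned above tell a tool which attributes together determine the ‘class’ of an event: for example only the activity name, or the activity name plus the lifecycle transition. A minimal sketch of that idea in Java; the keys `concept:name` and `lifecycle:transition` come from the standard XES extensions, while the `EventClassifier` record and the map-based events are hypothetical and not the OpenXES API.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical classifier: an event's class is the combination of the values of the given keys.
record EventClassifier(String name, List<String> keys) {

    // Class of one event, e.g. "register+start"; missing attributes show up as "?".
    String classOf(Map<String, String> event) {
        StringBuilder sb = new StringBuilder();
        for (String key : keys) {
            if (sb.length() > 0) {
                sb.append('+');
            }
            sb.append(event.getOrDefault(key, "?"));
        }
        return sb.toString();
    }

    // Distinct event classes in a list of events, in order of first occurrence.
    Set<String> classes(List<Map<String, String>> events) {
        Set<String> result = new LinkedHashSet<>();
        for (Map<String, String> e : events) {
            result.add(classOf(e));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map<String, String>> events = List.of(
                Map.of("concept:name", "register", "lifecycle:transition", "start"),
                Map.of("concept:name", "register", "lifecycle:transition", "complete"),
                Map.of("concept:name", "pay", "lifecycle:transition", "complete"));
        EventClassifier byName = new EventClassifier("Activity", List.of("concept:name"));
        EventClassifier byNameAndTransition =
                new EventClassifier("Activity + lifecycle", List.of("concept:name", "lifecycle:transition"));
        System.out.println(byName.classes(events));
        System.out.println(byNameAndTransition.classes(events));
    }
}
```

The same three events yield two classes under the first classifier (`[register, pay]`) and three under the second (`[register+start, register+complete, pay+complete]`), which is why the choice of classifier changes what a mined model looks like.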

Also, now that I know the application is likely to be born without complications and to survive, I think I can come up with a (nick)name for my baby… I have a nice one in mind, but I won’t announce it just yet; you’ll see it at the release.

So, in the next week(s) I will add some more functionality to the application and test it thoroughly on test data and eventually on case data. And I will also work on the most exciting part: the thesis! I don’t mind working on the thesis, it is probably the most prominent result of my labor, but it’s not, well, exciting… Programming is nicer: there you can hunt bugs, search for performance improvements and play with your creation. A thesis is just a thing that sits there and you can look at it. Luckily I’m writing it in LaTeΧ, so I can still have compile errors and won’t have to fight with a word processor.

Well, for now I wish you all a nice Christmas holiday and a very nice New Year in case I don’t blog in the next 2 weeks.

ttfn!

Joos

Written by Joos Buijs

December 18, 2009 at 16:50

Process Mining Terms: A Small Glossary


Recently I helped someone unfamiliar with process mining get started analyzing a log. One of the things I noticed is that it is hard to get to know the overall ‘structure’ and meaning of the terms used. This is further complicated by inconsistent use of terminology, not only in conversations and documentation but also in ProM 5.2 itself. In this post I will try to explain some of the most common terms used in process mining and what they (should) mean.

Note: this is not a ‘definitive’ list, it is just how I think the terms should be interpreted and used!!! Furthermore, any suggestions and additions are welcome!!!

The overall picture: a system (e.g. a workflow management system) facilitates the processing of cases using a predefined process in which the activities and their ordering are defined. The activities executed in this system are recorded in an event log, which can then be ‘reverse engineered’, for instance using ProM. The log contains actual executions of events on cases at a certain moment in time, by a certain actor, etc.
The result of this reverse engineering can be a process model describing the behavior recorded in the log, but performance, social network and constraint analyses are also possible. We won’t go into all the possible analyses in this post.

So, an (event) log contains information about process instances (e.g. cases) and the events that are performed on/for them.

It is also important to understand that there are two levels. One is the conceptual level, in which we do not talk about actual instances but about the kinds of objects that can appear in a log. The other is the instance level, in which you look at specific process instances, event executions, originators, etc. In the general terms list I have tried to indicate whether a term refers to a conceptual aspect or to an actual instance (or set of instances).
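One way to see the difference between the two levels is to model them as separate types. In this hypothetical Java sketch, `Activity` and `EventType` belong to the conceptual level, while `EventInstance`, `ProcessInstance` and `EventLog` hold recorded occurrences. The names loosely follow the MXML terms in the list below, but these are not ProM classes.

```java
import java.time.Instant;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Conceptual level: what CAN appear in a log (illustrative types, not ProM classes).
record Activity(String name) {}

enum EventType { START, COMPLETE }

// Instance level: what DID happen, as recorded in the log.
record EventInstance(Activity activity, EventType type, String originator, Instant timestamp) {}

record ProcessInstance(String id, List<EventInstance> events) {}

record EventLog(List<ProcessInstance> processInstances) {

    // Number of recorded event instances (instance level).
    int eventInstanceCount() {
        return processInstances.stream().mapToInt(pi -> pi.events().size()).sum();
    }

    // Distinct activities that occur in the log (conceptual level).
    Set<Activity> activities() {
        return processInstances.stream()
                .flatMap(pi -> pi.events().stream())
                .map(EventInstance::activity)
                .collect(Collectors.toSet());
    }
}
```

For example, a log in which ‘check’ is started and completed for case 1 and only completed for case 2 holds three event instances but a single activity; counting one or the other answers quite different questions.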

General Process Mining terms:

(As used in ProM 5.2 and MXML; ProM 6 and the XES event log format use new terms.)

  • Activity An action or task that can be performed for a process instance (conceptual level);
  • Data attribute An extra attribute recorded in the MXML file. Examples are the amount of a purchase order or the patient’s age. These attributes can for instance be used for decision analysis in ProM (conceptual level);
  • Event This can either refer to an activity or to an event instance performed by a resource at a certain time for a specific process instance. The meaning therefore depends on the context in which it is used;
  • Event Class Used in the ProM Dashboard, it refers to the number of different activities encountered in the log (instance level);
  • Event Log A recording of a set of events, an MXML log is an example of an event log format (instance level);
  • Event Instance A recording of an executed event with information such as execution timestamp, event type and originator (instance level);
  • Event Type Each activity can be in one of several states. The most commonly used states are ‘start’ and ‘complete’. The meaning is very straightforward: an activity is started and a certain amount of time later it is completed. There are several other event types or states, for a complete overview see figure 4 in the ‘MXML paper’ (PDF) (might be outdated) (conceptual level);
  • Log The original log generated by the source system which records things that have happened. In order to be used within ProM this needs to be converted to the MXML format using the ProM Import Framework (instance level);
  • Process Instance (PI) The object you are following and on/for which events occur. Examples are cases, patients, machines etc. (can be both conceptual and instance level);
  • Process mining Analyzing a business process based on an event log, see http://www.processmining.org;
  • ProM An application to apply several process mining techniques to an event log, see http://www.processmining.org. The version at the moment of writing is 5.2 and version 6.0 is under development (nightly builds are available);
  • ProM Import Framework A framework for converting event logs to the MXML event log format. A set of converters for common formats is available but new converters can be programmed in Java;
  • Model Element Used in the ProM Dashboard Summary, it should be interpreted as ‘activity’;
  • MXML A meta model for event logs. An event log needs to be in this XML format to be processed by ProM. More information can be found in the ‘meta model for process mining’ paper (PDF) (conceptual level);
  • MXML log The actual MXML file with all the recordings following the MXML format (instance level);
  • Resource Any actor that can execute an activity, for example humans, the system itself or a web service (conceptual level);
  • Timestamp A time indication consisting of a date and possibly a time part (instance level).
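The ‘start’ and ‘complete’ event types are what make simple performance questions answerable from a log. The sketch below pairs each ‘complete’ with the oldest still-open ‘start’ of the same activity within one process instance and reports the duration. The first-in, first-out pairing rule, the `AuditEntry` and `ActivityDuration` records and the `ActivityDurations` class are simplifying assumptions for illustration; real logs with missing or overlapping events need more care.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical audit trail entry of one process instance: activity, event type, timestamp.
record AuditEntry(String activity, String eventType, Instant timestamp) {}

record ActivityDuration(String activity, Duration duration) {}

final class ActivityDurations {
    // Pairs each "complete" with the oldest still-open "start" of the same activity
    // (first-in, first-out). Entries are assumed to be in chronological order;
    // other event types and unmatched events are ignored.
    static List<ActivityDuration> of(List<AuditEntry> entries) {
        Map<String, Deque<Instant>> openStarts = new HashMap<>();
        List<ActivityDuration> result = new ArrayList<>();
        for (AuditEntry e : entries) {
            if (e.eventType().equals("start")) {
                openStarts.computeIfAbsent(e.activity(), a -> new ArrayDeque<>()).addLast(e.timestamp());
            } else if (e.eventType().equals("complete")) {
                Deque<Instant> starts = openStarts.get(e.activity());
                if (starts != null && !starts.isEmpty()) {
                    result.add(new ActivityDuration(e.activity(),
                            Duration.between(starts.pollFirst(), e.timestamp())));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<AuditEntry> trail = List.of(
                new AuditEntry("register", "start", Instant.parse("2009-11-17T09:00:00Z")),
                new AuditEntry("register", "complete", Instant.parse("2009-11-17T09:15:00Z")),
                new AuditEntry("check", "start", Instant.parse("2009-11-17T09:20:00Z")),
                new AuditEntry("check", "complete", Instant.parse("2009-11-17T10:20:00Z")));
        for (ActivityDuration d : of(trail)) {
            System.out.println(d.activity() + ": " + d.duration().toMinutes() + " min");
        }
    }
}
```

For the sample trail this prints `register: 15 min` and `check: 60 min`.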

Well, that’s the list for now. I hope I helped someone and did not add to the confusion. If you have any questions, suggestions or additions, please post a comment!!! Especially the ‘conceptual vs. instance’ part was hard for me to explain, so any improvements are welcome.

– Joos –

P.S. @my supervisor: I created this article in the weekend and scheduled it for publication on Tuesday, so don’t think I’m procrastinating 🙂


Written by Joos Buijs

November 17, 2009 at 17:00

Posted in Process Mining


About my master project


To make sure that this blog won’t be about funny process models alone, it might be a good idea to introduce and explain my master’s project subject: Mapping Data Sources to XES in a Generic Way. Let’s dissect this rather vague title to explain what it is all about:

Mapping: in this context it means to let the user define a way to map one data source to another.

Data Sources: most people might think of databases first, but text files, XML files and even web services can be considered data sources as well. Although it remains to be seen how many types of data sources we will be able to support, we intend to at least support the common database formats and the CSV (comma-separated values) plain text format.
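Even for the CSV plain text format, reading the data needs some care, because values may contain commas inside quotes. A minimal Java sketch of splitting one CSV line, assuming RFC 4180-style quoting (double quotes around a field, `""` for a literal quote) and no line breaks inside fields; the `CsvLine` class is an illustration, not the parser used in the project.

```java
import java.util.ArrayList;
import java.util.List;

final class CsvLine {
    // Splits one CSV line into fields; quoted fields may contain commas and "" escapes.
    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        field.append('"'); // escaped quote inside a quoted field
                        i++;
                    } else {
                        inQuotes = false; // closing quote
                    }
                } else {
                    field.append(c);
                }
            } else if (c == '"') {
                inQuotes = true; // opening quote
            } else if (c == ',') {
                fields.add(field.toString()); // field separator
                field.setLength(0);
            } else {
                field.append(c);
            }
        }
        fields.add(field.toString()); // last field (possibly empty)
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(split("case 7,\"Check invoice, part 2\",2009-11-03"));
    }
}
```

The sample line splits into three fields, `case 7`, `Check invoice, part 2` and `2009-11-03`, where a naive split on commas would have produced four.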

XES: the one requiring the most explanation. Although it is pronounced (as ‘excess’) similarly to one of the well-known database formats from a well-known software vendor, it means something completely different. In this case we refer to the ‘Extensible Event Stream’ format. This is an extensible format for, well, storing event logs. For people familiar with the MXML format: XES is the new and improved MXML! For people unfamiliar with MXML: visit processmining.org (more specifically, read an informal introduction to MXML (.PPT, 0.9 MB) or the more formal MXML introduction paper (PDF, 130 KB)). The XES meta-model is implemented in the OpenXES Java library, where more information about XES can also be found.
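To give an impression of what an XES event log looks like, the sketch below writes a tiny log as XML: one `<trace>` per case and one `<event>` per recorded event, using the standard Concept and Time extensions (`concept:name`, `time:timestamp`). It deliberately avoids OpenXES so it stays self-contained; the `TinyXesWriter` class is hypothetical, and details such as the declared `xes.version`, namespaces and extension URIs should be checked against the XES standard before relying on the output.

```java
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Minimal, self-contained XES writer for illustration only; real tools use OpenXES.
final class TinyXesWriter {
    record Event(String name, OffsetDateTime timestamp) {}

    // Timestamps with milliseconds and offset, e.g. 2009-11-03T17:00:00.000+01:00
    private static final DateTimeFormatter XES_DATE =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSXXX");

    // Escapes characters that may not appear literally inside an XML attribute value.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Writes one <trace> per map entry (key = case id) with one <event> per Event.
    static String write(Map<String, List<Event>> traces) {
        StringBuilder xml = new StringBuilder("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        xml.append("<log xes.version=\"1.0\">\n");
        xml.append("  <extension name=\"Concept\" prefix=\"concept\" uri=\"http://www.xes-standard.org/concept.xesext\"/>\n");
        xml.append("  <extension name=\"Time\" prefix=\"time\" uri=\"http://www.xes-standard.org/time.xesext\"/>\n");
        for (Map.Entry<String, List<Event>> trace : traces.entrySet()) {
            xml.append("  <trace>\n");
            xml.append("    <string key=\"concept:name\" value=\"").append(escape(trace.getKey())).append("\"/>\n");
            for (Event e : trace.getValue()) {
                xml.append("    <event>\n");
                xml.append("      <string key=\"concept:name\" value=\"").append(escape(e.name())).append("\"/>\n");
                xml.append("      <date key=\"time:timestamp\" value=\"").append(XES_DATE.format(e.timestamp())).append("\"/>\n");
                xml.append("    </event>\n");
            }
            xml.append("  </trace>\n");
        }
        return xml.append("</log>\n").toString();
    }

    public static void main(String[] args) {
        Map<String, List<Event>> traces = new LinkedHashMap<>();
        ZoneOffset cet = ZoneOffset.ofHours(1);
        traces.put("case 1", List.of(
                new Event("register", OffsetDateTime.of(2009, 11, 3, 17, 0, 0, 0, cet)),
                new Event("check", OffsetDateTime.of(2009, 11, 3, 17, 30, 0, 0, cet))));
        System.out.print(write(traces));
    }
}
```

Compared to MXML’s fixed elements, everything in XES is a typed key/value attribute (`string`, `date`, …) whose meaning is given by an extension, which is what makes the format extensible.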

a Generic Way: of course, we want our application to be applicable in many situations and therefore it must be generic.

So, in brief, the goal of my project is to develop an application that allows a user to define a mapping from a (set of) data source(s) to the XES event log format and to execute this mapping, resulting in an event log that can be used for process mining with (the new version of) ProM.

If you have done one or more process mining projects, you know that preparing the data is one of the most time-consuming (and, in my opinion, most annoying) parts of a process mining project. This master’s project aims at providing an application that will allow you, the process miner, possibly together with the domain expert, to define a mapping from the data source(s) to the XES event log format without the need to write (Java) code.

Any questions, comments, feature requests etc. etc. are more than welcome!!!

See you at the next post! (Which will either be a ‘funny process model’ or a post showing some GUI designs for the application)

Joos

Written by Joos Buijs

November 3, 2009 at 17:00