Blog of Joos Buijs

About personal things, process mining and the rest in life.

Archive for the ‘Master Project’ Category

2 Important updates (maybe more)

So, I’m settling nicely into my PhD life. Writing papers, starting up the CoSeLoG project, getting into contact with most of the participants in the project, never a dull moment!

Speaking of the CoSeLoG project, it now has its own, ‘official’, website: http://www.win.tue.nl/coselog. So far it does not contain much information but this will grow in the future (you can follow the RSS feed if you’re interested).

Another big change is the renaming of my XESMa application. Apparently, it can be pronounced like something completely different (which I did not think of). So, from today on, XESMa is renamed to XESame. Not like Sesame Street but Open Sesame of course! (since it will open the process mining cave where all kinds of treasures can be found…)

XESame has also been given a ‘real’ home, at processmining.org! So far the XESame page contains even less information than my blog post but when I find time I’ll add more explanations and examples. Furthermore, I plan to update the GUI of XESame, but CoSeLoG comes first…

So far so good, this was it for now, back to making contacts!

- Joos

Written by Joos Buijs

May 26, 2010 at 17:00

The Results of my Master Project

Update 26-05-2010: The official XESame (or XESMa) website is now located at processmining.org! This post will not be updated further.

So, after 7 months my master project is completed and the results are final!

Last Monday I gave my final presentation (.pptx, 1.7 MB). This presentation gives a good introduction to the problem and topic of my project.

More detailed information about what I did can be found in my master thesis (.pdf, 9.8 MB). This should also be used as a temporary ‘user guide’ for my application.

Warning: This is a prototype! No support or guarantee is given whatsoever! Use at your own risk!

If you want to test/play with the prototype I created, it can be downloaded at the link below. However, use it at your own risk ;)

XESMa Application Prototype (v 1.0) (.zip, 3.4 MB)

How to start XESMa in 3 steps:

  1. Extract the contents of the zip file;
  2. In Eclipse (or any Java editor), create a new Java Project from the folder you just extracted;
  3. Execute ‘Application.java’ in the org.processmining.mapper.ui package.

Now that everything is finished I will enjoy a holiday until May 3. Then I’ll start on a PhD position, here at the TU/e, more about this in another blog post.

I hope I have/get the time to continue to work on XESMa in the future. I have some ideas for improvement. And of course, your feedback is very much appreciated!

Thesis abstract:

Information systems are taking a prominent place in today’s business process execution. Since most
systems are complex, enterprise-wide systems, very few users, if any, have a clear and complete
view of the overall process. In the area of process mining several techniques have been developed to
reverse engineer information about a process from a recording of its execution. To apply process
mining analysis on process-aware information systems, an event log is required. An event log
contains information about cases and the events that are executed on them.
Although many systems produce event logs, most systems use their own event log format.
Furthermore, the information contained in these event logs is not always suitable for process
mining. However, since much data is stored in the data storage of the information system, it is
often possible to reconstruct an event log that can be used for process mining. Extracting this
information from the business data is a time consuming task and requires domain knowledge. The
domain knowledge required to define the conversion is most likely held by people from business,
e.g. business analysts, since they know or investigate the business processes and their integration
with technology. In most cases business analysts have no or limited programming knowledge.
Currently there is no tool available that supports the extraction of an event log from a data source
that doesn’t require programming.
This thesis discusses important aspects to consider when defining a conversion to an event log.
The decisions made in the conversion definition influence the process mining results to a large
extent. Defining a correct conversion for the specific process mining project at hand is therefore
crucial for the success of the project. A framework to store aspects of such a conversion is also
developed in this thesis. In this framework the extraction of traces and events as well as their
attributes can be defined. An application prototype, called ‘XES Mapper’ or ‘XESMa’, that uses
this conversion framework is built.
The XES Mapper application guides the definition of a conversion. The conversion can be
defined without the need to program. The application can also execute the conversion on the data
source, producing an event log in the MXML or XES event log format. This enables a business
analyst to define and execute the conversion on their own. The application has been tested with
two case studies. This has shown that many different data source structures can be accessed and
converted.

Keywords: data conversion, database, event log, process mining, process-aware information
system

Edit 01-04-2010 11:50: added XESMa execution steps

Written by Joos Buijs

March 31, 2010 at 15:17

Master Project Update: the end is approaching

Hi all!

Well, the title says it all, the end of my master’s project is in sight!

The application is ‘nearly done’, there are so many things that could be improved but… well, there is not much time. So, let’s say that the application is at the beta stage then. Yesterday I tried to use it on a real data source instead of my toy database of only 10 records. The results were promising and today I’m processing some of the things we encountered yesterday. For instance the parsing of dates into a Java Date instance is problematic. The format of the date is not always the same and automatically detecting the format used is nearly impossible. Therefore the user (/you) can now define the format used to represent the date and time.
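To give an idea of what a user-defined date format means in practice, here is a minimal sketch in Java. It is not the code of the prototype: the class name `DateFormatSketch`, the method `parseTimestamp` and the two example patterns are made up for illustration, and the time zone is fixed to UTC only to keep the result reproducible.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

class DateFormatSketch {

    // Parses a raw value from the data source with the pattern the user supplied.
    // Hypothetical helper for illustration, not taken from the prototype.
    static Date parseTimestamp(String rawValue, String userPattern) {
        SimpleDateFormat format = new SimpleDateFormat(userPattern);
        format.setLenient(false);                         // reject values such as 31-02-2010
        format.setTimeZone(TimeZone.getTimeZone("UTC"));  // assumption: source times are UTC
        try {
            return format.parse(rawValue);
        } catch (ParseException e) {
            throw new IllegalArgumentException(
                    "Value '" + rawValue + "' does not match pattern '" + userPattern + "'", e);
        }
    }

    public static void main(String[] args) {
        // The same moment written in two different source formats.
        Date dutchStyle = parseTimestamp("19-02-2010 17:17", "dd-MM-yyyy HH:mm");
        Date isoStyle = parseTimestamp("2010-02-19T17:17:00", "yyyy-MM-dd'T'HH:mm:ss");
        System.out.println(dutchStyle.equals(isoStyle)); // prints true
    }
}
```

The point of the sketch is that nothing is guessed: the pattern comes from the user, and a value that does not fit the pattern fails loudly instead of silently turning into a wrong date.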

Another type of problem we encountered was related to the ODBC driver but that I can not fix… Other improvements are related to me trying to be too smart (which of course turns out wrong). And some performance issues (but these might be related to the ODBC driver used). And of course a lot of small improvements to the user interface can/should be made etc. So much to do, so little time :)

Early this week I also ‘finished’ the visualization of the conversion. The idea is to visualize which tables and columns are used in certain attributes. In the screenshot below a very small event log (with one event definition) is visualized. The conversion uses 2 columns of the event.csv table. I know that the visualization shown is very small and larger visualizations will get messy but it’s hard to get it right… And, well, it’s only a prototype ;)

I’m also working on my thesis, for about a month now. The content is structured as follows:

  1. Introduction (context, problem, goal, scope and method of the project) [4 pages]
  2. Preliminaries (explanation of process aware information systems (PAIS), event logs, process mining and other conversion tools) [12 pages]
  3. Conversion Aspects (what to consider when defining a conversion) [8 pages]
  4. Solution Approach (how I planned to implement the application) [7 pages]
  5. Solution Implementation (more details of the technical implementation and use of the application) [14 pages]
  6. Case Studies (2 case studies (SAP and a custom system) to show the validity of my application) [to write]
  7. Conclusion (conclusions and future work) [to write]

So, I still have to perform my case studies, write Chapters 6 and 7 plus the abstract, preface etc. and thoroughly read the entire thesis. And all of that within the next 2 to 3 weeks. And then I’ll have to wait for the reviews of my supervisors and prepare for the final presentation of March 29…

You are all invited to my final presentation of course!!! It will be held on March 29, 2010 at 15:00 in Eindhoven, the Netherlands. If you would like to attend, please let me know and I’ll inform you of the location.

If you can not attend the final presentation and/or want to read my thesis or try out my application, keep an eye on this blog. I’ll post a link to both of them just before or after my final presentation.

So, now I’m going back to programming again (stupid SQL error…) and will enjoy the weekend in a little bit.

TTFN!

Joos

Written by Joos Buijs

February 19, 2010 at 17:17

Taking more time to do a better job

(First of all, happy 2010 to all of you!)

The week before Christmas I had a meeting with my supervisor Wil van der Aalst and my tutor Eric Verbeek where Wil made a tempting suggestion: if you take more time you can visualize the mapping. His argument was that the project as is would certainly be a good master project. However, if I could visualize the mapping between the data source and the event log, the project as a whole would be more coherent and, well, better. It’s the classic ‘time-cost-quality’-triangle decision: by taking more time we can improve the quality.

Of course, this means that I will spend an extra month on my master project, paying tuition for an extra month and starting to earn money a month later. But one of my goals for this master project is to show what I’m capable of, to make it a good conclusion of my educational career. Therefore I have chosen to extend my project by (what turned out to be) more than a month. So, my final presentation is no longer scheduled for February 8 but will be held somewhere around the end of March. This also means that I have (much) more time to write my thesis. I’m still scheduling the whole of January for thesis writing but the pressure is off. In February I’m planning to implement the visualization and polish the application and thesis.

Well, that was it for now, hope to see you again soon :)

Cheers,

Joos

Written by Joos Buijs

January 4, 2010 at 17:30

The preliminary results are in…

…and it looks good :)

The first preliminary results are those of my intermediate presentation of December 15. It went well, although there is always room for improvement of course. I managed to have a working version of my application by then so that was nice to show. Furthermore, there were actually people there besides my supervisor, tutor and third committee member.

The other preliminary results are the first XES event logs generated by my application. Although generated from a ‘single table source’ using a rather straightforward mapping, it is promising and rewarding to see your event log being loaded in ProM (version 6) and everything works.

Enough work remains to be done, some small (e.g. change some texts in the user interface) others larger (e.g. ordering of events in the event log and automatically linking those tables used in the mapping). But on the other hand, I still have more than a week to implement those functions and completely test my application. For comparison: I needed 2 weeks to build my user interface and update my domain model accordingly. Another 2 weeks were needed to get as far as I am now.

Since the GUI is rather stable I think I can show it to you. So, here it is:

Basic user interface of the XES mapper application

As you can see, it consists of three main parts: The bottom part is for the ‘general mapping settings’ such as a name and description, the connection settings to the data source, managing the XES extensions (shown in the screen shot), console output and executing the mapping. The top left part is for navigating the mapping definition, here you can select the element (log, trace, event or attribute with ‘children’) you want to edit. The top right part allows you to add, edit and delete attribute definitions (shown in the screen shot), define some mapping properties and for the log specify the event classifiers (you probably have no clue why you want those but don’t worry, you’ll learn in the documentation of the new XES version).

Also, I think that, now I know the application is likely to be born without complications and is likely to survive, I can think of a (nick)name for my baby… I have a nice one in mind but I won’t announce it just yet, you’ll see it at the release.

So, the next week(s) I will add some more functionality to the application, test it thoroughly on test data and eventually on case data. And I will also work on the most exciting part: the thesis! I don’t mind working on the thesis, it is probably the most prominent result of my labor but it’s not, well, exciting… Programming is nicer, there you can hunt bugs, search for performance improvements and play with your creation. A thesis is just a thing that sits there and you can look at it. Luckily I’m writing it in LaTeX so I can still have compile errors and won’t have to fight with a Word processor.

Well, for now I wish you all a nice Christmas holiday and a very nice New Year in case I don’t blog in the next 2 weeks.

ttfn!

Joos

Written by Joos Buijs

December 18, 2009 at 16:50

My sincere apologies: I’m working

Hello to my few readers,

I’m really sorry that I did not update my blog for the past two weeks. The reason is simple: I’m working. I’ve started the ‘real’ programming phase of my application. This means that I either am happily programming and don’t want to lose the rhythm or I’m stuck and really frustrated and want to solve it as soon as I can. The problem with blogging (and Twittering etc. etc.) is that the more you work and have interesting stuff to tell, the less time you have to tell it. And when you have enough time to tell it, you don’t remember what to talk about…

So, I hope that you can stick with me until new year. By then I hope to be finished with my application and start on my thesis. As you know, writing a thesis is not always that interesting so by then I hope to have enough to talk about because I’m sure that time won’t be a problem :)

As a ‘gift’ (I have a strange sense of what to give to people ;) ) I add the domain model of my application (current status!!!) to this post. This might either add to the confusion or understanding but at least I provide you with data (maybe not information but at least data).

Domain Model (version of 2-12-2009 11:00)

Hope to see you soon!

Joos

Oh, by the way, I give an intermediate presentation on Tuesday December 15 from 9:00 – 10:00 at the TU/e, Main Building, Room 5.95 (Seidelzaal). You are more than welcome to be there. I will also announce my final presentation via this blog in due time :)

Written by Joos Buijs

December 2, 2009 at 17:46

About my master’s project: more concrete

Since the introduction to my master project in my last post might be a little vague and general I thought that it might be a good idea to provide you with the user interface sketches I made a couple of weeks ago. They are not final of course and some details have changed in the meantime but the main idea stays the same.

Two example screens of my future application at work are:

Mapping application just started with only the log and trace elements.

This screen shows the editor after project creation with only the basic log and trace elements (which can not be added or removed).

Mapping Editor, with events

This screen shows the editor with some events and properties defined.

It might be good to note that you should view the mapping at the ‘meta’ level. This means that we do not define the event instances themselves but we define where to find events for the traces (for which we also defined how to retrieve them). To complicate things further, we might not need to specify each event type (e.g. “Create order”, “pay order”) separately but if we have some kind of event log as input, the event name (or type or WFMElt or what you call it) could be stored in the data source. Then you might only specify one event mapping which retrieves all the event instances from the data source.

The second screen for instance shows the definition of the “Create” event which can be found in the table “Order”. The username and timestamp values can be extracted from the data source as defined in this example. This event would be added to each trace that we can extract according to the trace mapping definition.

Additionally, there are some mapping properties needed for execution. These are the ‘default’ entity to use, how to link two entities together (if not defined in the data source) and optional selection criteria for traces and/or events. Furthermore, the trace needs to have a unique identifier defined so we can connect events to traces.

The log, trace, event and attribute terms are re-used from the XES definition and the whole mapping definition quite closely follows this event log meta format (where this mapping is another meta level higher I suppose).
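To make the ‘meta’ level a bit more tangible, here is a minimal sketch in Java of what one event definition and its execution could look like. This is not the domain model of my application: the class `MappingSketch`, the nested `EventDefinition`, the column names (`order_id`, `created_by`, `created_at`) and the in-memory rows standing in for the “Order” table are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MappingSketch {

    // One event definition at the 'meta' level: it says where events can be
    // found and which column fills which attribute, not what the events are.
    static class EventDefinition {
        final String name;           // event name, e.g. "Create"
        final String traceIdColumn;  // column that links a row to its trace
        final Map<String, String> attributes = new LinkedHashMap<>(); // attribute key -> column

        EventDefinition(String name, String traceIdColumn) {
            this.name = name;
            this.traceIdColumn = traceIdColumn;
        }

        EventDefinition attribute(String key, String column) {
            attributes.put(key, column);
            return this;
        }
    }

    // Builds one row of the (hypothetical) "Order" table from column/value pairs.
    static Map<String, String> row(String... columnValuePairs) {
        Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i + 1 < columnValuePairs.length; i += 2) {
            row.put(columnValuePairs[i], columnValuePairs[i + 1]);
        }
        return row;
    }

    // Executes the definition: every row yields one event, grouped by trace id.
    static Map<String, List<Map<String, String>>> execute(
            EventDefinition definition, List<Map<String, String>> rows) {
        Map<String, List<Map<String, String>>> traces = new LinkedHashMap<>();
        for (Map<String, String> row : rows) {
            Map<String, String> event = new LinkedHashMap<>();
            event.put("concept:name", definition.name);
            for (Map.Entry<String, String> attribute : definition.attributes.entrySet()) {
                event.put(attribute.getKey(), row.get(attribute.getValue()));
            }
            String traceId = row.get(definition.traceIdColumn);
            traces.computeIfAbsent(traceId, id -> new ArrayList<>()).add(event);
        }
        return traces;
    }

    public static void main(String[] args) {
        EventDefinition create = new EventDefinition("Create", "order_id")
                .attribute("org:resource", "created_by")
                .attribute("time:timestamp", "created_at");

        List<Map<String, String>> orderTable = new ArrayList<>();
        orderTable.add(row("order_id", "1", "created_by", "alice", "created_at", "2009-11-02T09:00:00"));
        orderTable.add(row("order_id", "2", "created_by", "bob", "created_at", "2009-11-03T10:30:00"));

        System.out.println(execute(create, orderTable));
    }
}
```

Note that the definition is written once and then applied to every row: the “Create” event ends up in each trace that can be extracted, which is exactly the difference between defining event instances and defining where to find them.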

Well, I know that this might still sound rather vague but I hope at least less vague than in my previous post.

If you have any questions, please ask!

Joos

PS: I’m actually on holiday this week (this is a scheduled post) so I might not reply before November 16 (2009).

Edit 13-11-2009 21:15: Improved some text, ‘fixed’ the images and added tags to post. Memo to self: never create a post 2 minutes before you leave for a holiday ;)

Written by Joos Buijs

November 10, 2009 at 17:00

About my master project

To make sure that this blog won’t be about funny process models alone it might be a good idea to introduce and explain my master’s project subject: Mapping Data Sources to XES in a Generic Way. Let’s dissect this rather vague sentence to explain what it is all about:

Mapping: in this context it means to let the user define a way to map one data source to another.

Data Sources: most people might think of databases first but text files, XML files and even web services can be considered as data sources. Although it remains to be seen how many data sources we are able to support, we intend to at least support the common database formats and the CSV (comma separated values) plain text format.

XES: the one requiring the most explanation. Although pronounced (as ‘excess’) similar to one of the well known database formats from a well known software vendor, it means something completely different. In this case we refer to the ‘Extensible Event Stream’ format. This format is an extensible event log format for, well, storing event logs. For people familiar with the MXML format: XES is the new and improved MXML! For people unfamiliar with MXML: visit processmining.org (more specifically, read an informal introduction to MXML (.PPT, 0.9 MB) or read the more formal MXML introduction paper (PDF, 130 KB)). The XES meta-model is implemented in the OpenXES Java library, more information about XES can also be found there.

a Generic Way: of course, we want our application to be applicable in many situations and therefore it must be generic.

So, in brief, the goal of my project is to develop an application that allows a user to define a mapping from a (set of) data source(s) to the XES event log format and to execute this mapping. The result is an event log that can be used for process mining with (the new version of) ProM.
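For those who have never seen an XES file, here is a minimal sketch in Java that writes a tiny log by hand. It deliberately does not use the OpenXES library: the class `XesSketch` and its methods are made up for illustration, and a real log would normally also declare the extensions and classifiers it uses. The element names (`log`, `trace`, `event`) and the attribute keys `concept:name`, `org:resource` and `time:timestamp` follow the XES format.

```java
class XesSketch {

    // Escapes the characters that may not appear raw in an XML attribute value.
    static String escape(String value) {
        return value.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Renders one event with a name, a resource and an ISO 8601 timestamp.
    static String event(String name, String resource, String isoTimestamp) {
        return "    <event>\n"
                + "      <string key=\"concept:name\" value=\"" + escape(name) + "\"/>\n"
                + "      <string key=\"org:resource\" value=\"" + escape(resource) + "\"/>\n"
                + "      <date key=\"time:timestamp\" value=\"" + escape(isoTimestamp) + "\"/>\n"
                + "    </event>\n";
    }

    // Wraps already rendered events in a trace that carries the case identifier.
    static String trace(String caseId, String... events) {
        StringBuilder xml = new StringBuilder("  <trace>\n");
        xml.append("    <string key=\"concept:name\" value=\"")
                .append(escape(caseId)).append("\"/>\n");
        for (String event : events) {
            xml.append(event);
        }
        return xml.append("  </trace>\n").toString();
    }

    // Wraps already rendered traces in the log element.
    static String log(String... traces) {
        StringBuilder xml = new StringBuilder("<log xes.version=\"1.0\">\n");
        for (String trace : traces) {
            xml.append(trace);
        }
        return xml.append("</log>\n").toString();
    }

    public static void main(String[] args) {
        // Hypothetical case data: one order with two events.
        System.out.print(log(trace("order-1",
                event("Create order", "alice", "2009-11-02T09:00:00.000+01:00"),
                event("Pay order", "bob", "2009-11-03T10:30:00.000+01:00"))));
    }
}
```

The nesting is the whole idea: a log contains traces (one per case), a trace contains events, and everything else is expressed as typed key/value attributes.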

If you have done one or more process mining projects you must know that preparing the data is one of the most time consuming (and, in my opinion, most annoying) parts of a process mining project. This master project aims at providing an application that will allow you, the process miner, maybe together with the domain expert, to define a mapping from the data source(s) to the XES event log format without the need to write (Java) code.

Any questions, comments, feature requests etc. etc. are more than welcome!!!

See you at the next post! (Which will either be a ‘funny process model’ or a post showing some GUI designs for the application)

Joos

Written by Joos Buijs

November 3, 2009 at 17:00

Hello world!

Welcome to my personal blog on WordPress.com. This is my first post.

And since this is my first post, I’ll tell a little bit about me and why I started this blog.

So, I’m Joos Buijs from the Netherlands. At the moment I’m a graduate student in ‘Business Information Systems’ at Eindhoven University of Technology. I’m working on my master thesis/project, which is related to process mining, a ‘hot research topic’ in the group I’m working in. But more about my master project in a later blog post.

The goal of this blog is mainly to create a ‘personal brand’, or in other words, to promote myself to future employers, colleagues and the rest of the world. Secondly, but no less important, I want to get in contact with new people, share ideas and opinions, and help each other. For instance, I believe that my master project might help people active in process mining and they can provide me with tips to improve my application.

If you want to get to know me better you can read my ‘online resume’ also known as my LinkedIn profile (which also includes a picture of myself, to make it more personal). And of course, you can always contact me by adding a comment, sending me a message or trying to meet me in real life :)

That’s it for now. I cannot promise a regular update schedule; in most cases it’s contradictory to my work schedule. Right now for instance I’m taking a break from reading a book (on plug-in development in Eclipse) but when I’m ‘really working’ I have less time to post something interesting (although then I’m likely to have something interesting to post…). Anywho, hope to see you soon!!!

Take care,

Joos

Written by Joos Buijs

October 20, 2009 at 13:32
