Figure 1: Example flow diagrams in Petri net, BPMN, and Fuzzy Miner notation. As the figure shows, Fuzzy Miner, being a DFG, has none of the branching nodes found in Petri nets or BPMN, so even though all three depict the same process, the branching rules cannot be determined from a Fuzzy Miner diagram.
In addition to the research approach described later in this article, myInvenio implements a feature called multi-level mining, which reproduces flows that take the convergence and divergence of a process into account by letting the user assign multiple case IDs to a single process.
Latest Process Mining Functionality, Challenges, and Future Evolutionary Trends
1 Latest Functions of Process Mining
Process mining tends to attract attention in terms of technology and tools, but its essence is a theoretical system and methodology (discipline) of data analysis. In fact, as the term “process” mining suggests, it can be considered as a type of data mining. However, unlike data mining, which is a broad concept that targets all kinds of events for analysis, process mining literally targets “processes” for analysis. The basic use of process mining is “process visualization,” and the visualization of processes facilitates the discovery of problems associated with the target processes. As a result, it can play a significant role in process improvement efforts.
1.1 Current Major Functions
As mentioned above, research in process mining started from the establishment of a methodology for “process visualization” and the development of tools. The core function, called “process discovery,” automatically creates a flowchart of business procedures based on data extracted from the IT systems used for business execution. Since then, various functions have been implemented as research has progressed and tools have become more sophisticated. The following are the main analysis functions implemented in most current process mining tools.
Process Discovery
Automatically creates a flowchart of business procedures and calculates the frequency of each activity and the time required.
Conformance Checking
Compares the current (as-is) process discovered from the data with the standard (to-be) process and extracts the deviations of the former from the latter.
Dashboards
Displays the results of aggregating and analyzing the target processes from various perspectives in a variety of graphs and tables.
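As a rough illustration of what process discovery computes under the hood, the sketch below builds a directly-follows graph with frequencies from a toy event log. The log format, activity names, and the `discover_dfg` helper are all hypothetical, chosen only to show the three essential columns (case ID, activity, timestamp) at work.

```python
from collections import Counter

# Toy event log: (case_id, activity, timestamp) triples; names are illustrative.
event_log = [
    ("c1", "receive", 1), ("c1", "check", 2), ("c1", "approve", 3), ("c1", "pay", 4),
    ("c2", "receive", 1), ("c2", "check", 2), ("c2", "pay", 3),
    ("c3", "receive", 1), ("c3", "check", 2), ("c3", "check", 3), ("c3", "pay", 4),
]

def discover_dfg(log):
    """Group events into traces per case, then count directly-follows edges."""
    traces = {}
    for case_id, activity, ts in sorted(log, key=lambda e: (e[0], e[2])):
        traces.setdefault(case_id, []).append(activity)
    dfg = Counter()
    for trace in traces.values():
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dfg

dfg = discover_dfg(event_log)
print(dfg[("receive", "check")])  # 3: "check" directly follows "receive" in all cases
print(dfg[("check", "check")])    # 1: the repetition in case c3
```

A real tool renders this edge/frequency map as the flowchart the user sees; this is only the counting step.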
1.2 Latest Functions
In addition, in recent years, the most advanced process mining tools have begun to include the following latest functions.
Business Rule Mining
When a target process contains a branch (decision node), automatically discovers from the data the criteria (business rules) that determine the routing.
Simulation (What-If Analysis)
Simulates how much improvement can be expected by eliminating or automating some of the tasks in the current process visualized by the process discovery function.
Operational Support
For cases that are currently in progress, ingests data related to business execution in real time, detects deviations in business operations, predicts future problems, and then alerts the person in charge, suggests the best course of action, or automatically implements improvement measures.
Of the three latest functions mentioned above, business rule mining and simulation analyze historical data, i.e., data on cases that have already been completed, while operational support focuses on supporting smooth business execution by processing data on unfinished cases as it arrives. In this sense, operational support is a form of IT solution that goes beyond the framework of an analysis methodology. For this reason, Celonis, the largest company in the process mining industry, calls this function an “EMS (Execution Management System).”
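To illustrate the idea behind business rule mining, here is a minimal sketch that derives a single-attribute threshold rule at a decision point. Real tools typically learn such rules with decision-tree classifiers; the data, attribute names, and the `mine_threshold_rule` helper below are hypothetical.

```python
# Hypothetical decision-point observations: for each case, the attribute value
# seen at the branch and the route actually taken.
cases = [
    {"amount": 120,  "route": "manager_approval"},
    {"amount": 450,  "route": "manager_approval"},
    {"amount": 900,  "route": "director_approval"},
    {"amount": 1500, "route": "director_approval"},
]

def mine_threshold_rule(cases, attr, route_a, route_b):
    """Find a cut on `attr` cleanly separating route_a from route_b, if any."""
    a_vals = [c[attr] for c in cases if c["route"] == route_a]
    b_vals = [c[attr] for c in cases if c["route"] == route_b]
    if max(a_vals) < min(b_vals):
        cut = (max(a_vals) + min(b_vals)) / 2
        return f"{attr} < {cut:g} -> {route_a}, else {route_b}"
    return None  # no clean single-attribute split exists

rule = mine_threshold_rule(cases, "amount", "manager_approval", "director_approval")
print(rule)  # amount < 675 -> manager_approval, else director_approval
```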
2 Issues to Be Overcome for Better Use of Process Mining
As seen in SAP’s acquisition of Signavio, a major tool vendor, and IBM’s acquisition of myInvenio, process mining is increasingly recognized as an important component of IT solutions. However, there are issues that must be overcome for it to be used properly in business practice and to deliver results. In this section, I would like to present the main issues from two perspectives.
2.1 Difficulties in data preprocessing
In data mining, it is said that about 80% of the total time required is spent on data preprocessing such as data collection, extraction, and cleaning. The same is true for process mining. It takes a lot of effort to properly integrate dozens to hundreds of data files extracted from various IT systems, to correct dirty data such as omissions and garbled characters, and to create a “dataset” that can be fed into tools for analysis. One factor that makes preprocessing in process mining difficult is that the data is extracted from a variety of business systems, so an understanding of those systems is necessary. In addition, to create a dataset that yields analysis results contributing to business process improvement, it is necessary to understand the business itself and to have some familiarity with business improvement methods.
2.2 Analysis quality of tools
There are two issues that need to be addressed regarding the quality of analysis. One is the limitation of DFGs (Directly Follows Graphs), and the other is the Convergence/Divergence problem.
2.2.1 Limitations of DFGs
The basic function of process mining, “process discovery,” was initially based on Petri nets, but various algorithms have been developed to reproduce flowcharts closer to reality. However, according to industry experts, most of the process mining tools currently in practical use are said to be based on an algorithm called fuzzy miner (each company is believed to have made its own improvements).
This family of algorithms is commonly referred to as DFGs (Directly-Follows Graphs). Unlike Petri nets and BPMN (Business Process Model and Notation), the world standard for describing business procedures as flowcharts, a DFG is a flowchart in which nodes are connected to each other directly. Since no branching nodes are drawn, the algorithm cannot capture where and what kind of branching occurs, specifically whether it is exclusive (XOR) or concurrent (AND). As a result, even though the current process is automatically reproduced, the reality is that the branching is ambiguous and the model is incomplete. Of course, functional improvements have been made in this regard, such as automatic conversion to BPMN-format flowcharts and the adoption of the business rule mining mentioned above.
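The limitation can be made concrete with a small sketch: the two toy logs below describe different branching behavior (B and C always both occur, versus a choice where either or both may occur), yet they yield exactly the same set of directly-follows edges. Traces and activity names are hypothetical.

```python
def dfg_edges(traces):
    """Collect the set of directly-follows edges over a list of traces."""
    edges = set()
    for t in traces:
        edges.update(zip(t, t[1:]))
    return edges

# Log 1: B and C always both happen, in either order (a parallel AND-split).
log_parallel = [list("ABCD"), list("ACBD")]
# Log 2: B alone, C alone, or both in either order (choice-like behavior).
log_choice = [list("ABCD"), list("ACBD"), list("ABD"), list("ACD")]

print(dfg_edges(log_parallel) == dfg_edges(log_choice))  # True: identical DFG
# structure, even though the two logs have different branching semantics.
```

Edge frequencies would differ between the two logs, but the graph structure alone cannot distinguish AND from XOR behavior, which is exactly the point made above.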
2.2.2 Convergence/Divergence Problem
In process mining, three items, “case ID,” “activity (event),” and “timestamp,” are essential for drawing a flowchart by bundling the activities performed for each case handled in the target process. For example, in an invoice processing process, the invoice number attached to each invoice and activities such as “receipt,” “confirmation,” “approval,” and “payment” for that invoice are extracted from the IT system together with their timestamps.
What we often face in actual processes is that there is no single case ID. Let’s take a concrete example. The figure below shows a general image of an engineering company’s process from order receipt to material procurement.
Since the ordered machine must be manufactured to the ordering company’s specifications, the company first designs the machine after receiving the order, then identifies the necessary materials and parts from the blueprints, and places orders with suppliers. Because multiple blueprints are created for a single machine, a blueprint number is used at the design stage. A parts number is used to identify each material and part, and at procurement time multiple parts are bundled together and a procurement request is issued, to which a procurement request number is assigned. Finally, the multiple procurement requests are aggregated per supplier and an order is placed, managed under an order number.
In this way, convergence and divergence are commonly seen in practice as a single case is processed. In the conventional approach, the construction number at the start of the process is used as the case ID and the entire process is analyzed up to material procurement, but if the process converges or diverges, a model far removed from the actual situation is reproduced (for example, the diverged part is recognized as mere repetition).
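The effect of flattening onto a single case ID can be sketched as follows. The events, object IDs, and the `flatten_on` helper are hypothetical; the point is that two divergent blueprint events collapse into what looks like a repetition on the order.

```python
# Hypothetical object-centric events: each event carries the IDs of the
# objects it touches (None where an object type is not involved).
events = [
    {"activity": "receive order",    "order": "O1", "blueprint": None, "ts": 1},
    {"activity": "design blueprint", "order": "O1", "blueprint": "B1", "ts": 2},
    {"activity": "design blueprint", "order": "O1", "blueprint": "B2", "ts": 3},
    {"activity": "procure parts",    "order": "O1", "blueprint": None, "ts": 4},
]

def flatten_on(events, id_field):
    """Classic flattening: use one ID field as the case ID, sort by timestamp."""
    traces = {}
    for e in sorted(events, key=lambda e: e["ts"]):
        case = e[id_field]
        if case is not None:
            traces.setdefault(case, []).append(e["activity"])
    return traces

flat = flatten_on(events, "order")
print(flat["O1"])
# ['receive order', 'design blueprint', 'design blueprint', 'procure parts']
# -> the divergence into blueprints B1 and B2 now looks like a mere repetition.
```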
This Convergence/Divergence problem is the biggest issue affecting the analysis quality of process mining. In recent years, researchers led by Professor Wil van der Aalst, the godfather of process mining, have been working on solving this problem with a methodology called “Object-Centric Process Mining.”
3 Future Direction of Evolution
We have already mentioned that process mining is playing a role as a business support solution beyond the framework of data analysis. In this section, we will discuss how process mining will evolve in the future from a bird’s eye view.
3.1 Process Mining 1.0
The basic function of process mining was “process discovery,” which automatically reproduces the current process from data. This is “Descriptive Analysis” in that it depicts the current state as it is.
However, what we originally want is to extract the problem areas, such as inefficiencies and bottlenecks, hidden in the process. In other words, we need to find out what is wrong with the process. Tools therefore provide additional functions that point out where the problem lies, for example that the processing time of a certain part is too long or that there are too many repetitions. These functions belong to “Diagnostic Analysis” and, in process mining tools, are generally named “Root Cause Analysis.”
The above are analysis functions for historical data; this stage should be called Process Mining 1.0.
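The diagnostic idea behind such root cause analysis can be sketched in a few lines: compute the average time between consecutive activities over completed traces and flag the slowest hand-off. The traces and time units below are hypothetical.

```python
from collections import defaultdict

# Completed traces with (activity, timestamp) pairs; units are arbitrary.
traces = {
    "c1": [("receive", 0), ("check", 1), ("approve", 8), ("pay", 9)],
    "c2": [("receive", 0), ("check", 2), ("approve", 12), ("pay", 13)],
}

# Diagnostic view: average elapsed time between consecutive activities.
gaps = defaultdict(list)
for trace in traces.values():
    for (a, t1), (b, t2) in zip(trace, trace[1:]):
        gaps[(a, b)].append(t2 - t1)

slowest = max(gaps, key=lambda e: sum(gaps[e]) / len(gaps[e]))
print(slowest)  # ('check', 'approve'): the hand-off that takes longest on average
```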
3.2 Process Mining 2.0
When process mining starts to take in uncompleted, i.e., ongoing, case data in real time as a target of analysis, it becomes possible not only to detect deviations as they happen but also to predict how long a currently running case will take to complete and what deviations may occur in the future. The number of tools implementing such “Predictive Analysis” is increasing.
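A minimal sketch of such a prediction, assuming a naive estimator that averages the remaining time observed after each activity in completed cases (real tools use far richer models); the history data and activity names are hypothetical:

```python
from collections import defaultdict

# Completed cases: (activity, timestamp) pairs in arbitrary time units.
history = {
    "c1": [("receive", 0), ("check", 2), ("pay", 5)],
    "c2": [("receive", 0), ("check", 3), ("pay", 9)],
}

# Learn the average remaining time observed after each activity.
remaining = defaultdict(list)
for trace in history.values():
    end = trace[-1][1]
    for activity, ts in trace:
        remaining[activity].append(end - ts)
avg_remaining = {a: sum(v) / len(v) for a, v in remaining.items()}

# Predict for a running case whose last recorded event was "check".
print(avg_remaining["check"])  # (3 + 6) / 2 = 4.5 time units still to go
```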
Furthermore, based on the prediction results, tools that can suggest what actions should be taken now to shorten the time required or to prevent future deviations from occurring are also emerging. This is the function of “Prescriptive Analysis”.
Such process mining analysis, which deals with incomplete data, is a major upgrade of Process Mining 1.0 and can be called Process Mining 2.0.
Although predictive and prescriptive analyses are still in their infancy and their reliability is not necessarily high, it is certain that they will be introduced to many companies as valuable solutions to support smooth business execution based on enterprise systems such as ERP through further technological progress in the future.
Process mining is an “analytical method.” Merely introducing a process mining tool accomplishes nothing by itself. You will need to plan a series of steps as an “analysis project” and manage their execution.
However, if you have never run a research or analysis project before, the steps of an analysis project may not be easy to grasp. Therefore, I would like to explain the flow of a process mining analysis by contrasting it with the flow of cooking.
First, let’s look at the cooking flow. The assumed setting is a restaurant kitchen. The first activity is purchasing foodstuffs, and the last is serving the dished-up food at the customers’ tables.
COOKING FLOW
1 Purchase of foodstuffs
Purchase a variety of foodstuffs from all over the world through food wholesalers.
2 Foodstuffs
The foodstuffs to be cooked are now available. Check them for insect damage or rot.
3 Precooking
Prepare the foodstuffs, for example by chopping them with a knife or parboiling them to remove bitterness.
4 Cooking
Cook the food using a variety of cooking utensils.
5 Dishing up and serving
Dish up the cooked food and serve the finished dishes to the customers.
Role of Master Chef
Note that the role of the master chef is to oversee the entire cooking process of the restaurant.
Next, let’s explain the steps of the process mining analysis, corresponding to the above cooking steps.
PROCESS MINING PROCEDURE
1 Extraction of data = Purchase of foodstuff
Extract data from the various systems that record and accumulate the event logs to be analyzed, such as an ERP system (typified by SAP), a CRM system such as Salesforce, or a proprietary business system.
As a method of data extraction, it is common to extract data directly from the database with SQL.
Data extraction is basically done by system engineers or system administrators. When the database structure is complex, as with ERP, identifying where the target data resides may require assistance, for example from SAP specialists with deep knowledge of SAP.
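A minimal sketch of direct extraction with SQL, using an in-memory SQLite database as a stand-in for a business system; the table and column names are illustrative, not those of any real ERP schema:

```python
import sqlite3

# Stand-in for a business system's database.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE doc_log (case_id TEXT, activity TEXT, ts TEXT)")
con.executemany(
    "INSERT INTO doc_log VALUES (?, ?, ?)",
    [("INV-1", "receipt", "2024-01-05 09:00"),
     ("INV-1", "approval", "2024-01-06 14:30")],
)

# Typical extraction query: pull the three essential columns, ordered so that
# each case's events come out in chronological order.
rows = con.execute(
    "SELECT case_id, activity, ts FROM doc_log ORDER BY case_id, ts"
).fetchall()
print(rows[0])  # ('INV-1', 'receipt', '2024-01-05 09:00')
```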
2 Data to be analyzed = Foodstuff
The data extracted from the system is collectively referred to as the “event log,” because the history of operations on the system is recorded event by event with a timestamp.
As for the data format, preprocessing in the subsequent step is easier if the data is provided in CSV format. In some cases, the event log is provided in JSON format, and preprocessing a JSON event log can be somewhat cumbersome.
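As a small illustration of that step, the sketch below converts a hypothetical JSON event log into CSV using only the standard library; the field names are illustrative.

```python
import csv
import io
import json

# A hypothetical JSON event log, one object per event.
raw = '''[
  {"case": "INV-1", "event": "receipt",  "time": "2024-01-05T09:00:00"},
  {"case": "INV-1", "event": "approval", "time": "2024-01-06T14:30:00"}
]'''

events = json.loads(raw)

# Write the events out as CSV with a header row.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["case", "event", "time"])
writer.writeheader()
writer.writerows(events)

csv_text = buf.getvalue()
print(csv_text.splitlines()[0])  # case,event,time
```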
3 Data preparation = Precooking
The event log data extracted from the system often consists of multiple files, frequently ten or more: files recording activities and timestamps, as well as files containing master data.
Basically, all the files must be combined into a single file before a process mining tool can analyze them. In addition, the original files contain a lot of data that cannot be analyzed as is, such as garbled characters and empty cells that should have contained a value.
Therefore, it is necessary to remove or correct this noisy data, that is, to perform data cleaning similar to removing the bad parts of foodstuffs. Data preparation is the process of turning the original data into clean data that a process mining tool can analyze.
Data preparation is done by data scientists who know how to make data clean, using ETL tools, Python, and other tools and languages.
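A toy sketch of data preparation using only the standard library: joining an event file with a master file and dropping rows with a missing timestamp. The file contents and column names are hypothetical.

```python
import csv
import io

# Two hypothetical extracted files: an event file and a user master.
events_csv = """case_id,activity,ts,user_id
INV-1,receipt,2024-01-05 09:00,u1
INV-1,approval,,u2
INV-2,receipt,2024-01-07 10:15,u1
"""
users_csv = """user_id,department
u1,Accounting
u2,Purchasing
"""

# Load the master data into a lookup table.
users = {r["user_id"]: r["department"]
         for r in csv.DictReader(io.StringIO(users_csv))}

dataset = []
for row in csv.DictReader(io.StringIO(events_csv)):
    if not row["ts"]:          # cleaning: drop events with a missing timestamp
        continue
    row["department"] = users.get(row["user_id"], "")  # join the master data
    dataset.append(row)

print(len(dataset))              # 2 clean rows remain
print(dataset[0]["department"])  # Accounting
```

In practice this step is done with ETL tools or libraries such as pandas; the logic (merge, then clean) is the same.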
4 Analysis = Cooking
Once the data has been pre-processed and the clean data is ready for analysis, it can finally be fed into process mining tools for various analyses.
A process mining tool is a very versatile tool. It takes some training and experience to become proficient, but it is rewarding to visualize business processes as flowcharts from event log data that looks like nothing more than a litany of numbers, and to uncover inefficiencies and bottlenecks.
Analysis with process mining tools requires tool experts who are familiar with the tools used, but it is the process analyst who provides the analytical perspective on how to conduct the analysis. Data scientists, having gained a deeper understanding of the original data through preprocessing, can also assist in the analytical work.
5 Reporting = Dishing up and serving
Create reports, using graphs, tables, and so on, about the issues and problems of the target process identified in the various analysis results. Since the recipients of the report are not necessarily familiar with data analysis, keep in mind a visual presentation that makes it easy to understand what the issue or problem is.
Ideally, the report should be written by a process analyst, with the assistance of a process consultant with process improvement know-how (Lean, Six Sigma, etc.). It’s also good to have the support of a data scientist or tool expert, as additional analysis may be required.
Role of Project Manager
It is the project manager, corresponding to the restaurant’s master chef, who runs the entire process mining analysis project. The project manager does not have to be able to carry out every step personally, but must understand each step well and, above all, have the skills to execute the project smoothly.
So far I have used the culinary metaphor to explain the standard procedure for process mining analysis. Each process is a highly challenging one that requires a certain level of skill and experience, so it is necessary for experts in each field to work well together to advance the project.
As described above, process mining, data mining/AI, and BPM are complementary to one another. Professor Wil van der Aalst, the godfather of process mining, has said that “process mining is the bridge between data mining and BPM,” and indeed, as a form of data mining specialized for processes, process mining is expected to play a major role in BPM initiatives.