Podcast

Lukáš Mazánek (Raiffeisenbank): We Build Data Products and Implement Data Mesh to Enable Business to Be Independent From It

August 8, 2024

He wants to create a world where businesses deliver real value without relying on IT. By democratizing data and creating a data mesh, he aims to achieve this vision. Our guest is Lukáš Mazánek, Chief Data Officer of Raiffeisenbank in the Czech Republic.


Read the podcast as an interview

(The interview was shortened and edited using ChatGPT)

Ivana Karhanová: Raiffeisenbank has a vision titled ‘Vision 2035’. Can you share more details about it?

Lukáš Mazánek: Certainly. The year 2035 might seem distant, but there’s a reason for setting such a long-term vision. We believe it represents a deep structural change. We want everyone to understand this from the name itself—2035 gives us enough time to implement this change. The vision aims to achieve business autonomy and deliver new value through data and the technology surrounding the data mesh.

Ivana Karhanová: When you mention business autonomy, are you referring to autonomy without IT dependency, which we discussed earlier?

Lukáš Mazánek: Actually, I’m referring to the autonomy of the business as a business domain. This means the business should be capable of delivering value from end to end autonomously, with all necessary roles and technologies within the business domain itself. The challenge lies in connecting such autonomous parts of the company together to deliver one unified product as a bank.

Ivana Karhanová: Product as a bank?

Lukáš Mazánek: Yes, I mean the product within the framework of the bank.

Ivana Karhanová: You are responsible for the data strategy specifically. You aim to create a data mesh and democratize data within Raiffeisenbank. You often work with terms like ‘semantic layer’. Could you explain what a semantic layer is and why it’s so important for data mesh and data democratization?

Lukáš Mazánek: I’d love to. Lately, it’s been a focal point of my work. First, let’s discuss the problem we’re addressing. Our enterprise is growing increasingly complex. When you add a new application to a complex environment, it doesn’t just add to the complexity linearly—it multiplies it. For example, we have about 350 applications and systems. Adding just one more doesn’t make it 351 because of the intricate associations and integrations between applications, which account for 35 to 60% of IT expenditures. Adding one application can lead to hundreds of new data integrations, significantly accelerating the complexity.

Every system typically acts as a silo, locking data within. Data also has different meanings within each application. Take a simple example like a customer’s name. In some countries, people have a third name, and if an application only has fields for first and last names, where do you put the third name? It varies from one application to another. Some might place it in the first name field, others in the last name, and suddenly the meaning of the attribute changes. This is a basic example, but things are much more complex in the banking industry. To provide easy access to data across the enterprise, we need a layer that offers a unified meaning of data and information.
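The name example above can be made concrete with a small sketch. This is purely illustrative, assuming two hypothetical source systems that pack a third (middle) name into different fields; a thin normalization layer maps both onto one canonical concept, which is the kind of unified meaning a semantic layer provides:

```python
# Hypothetical sketch: two source systems store a person's three names
# differently; a small mapping layer normalizes both to one canonical
# "customer name" concept. All field names are invented for illustration.

def normalize_crm(record):
    # System A packs the middle name into the first-name field.
    parts = record["first_name"].split()
    return {
        "given_name": parts[0],
        "middle_name": " ".join(parts[1:]) or None,
        "family_name": record["last_name"],
    }

def normalize_loans(record):
    # System B packs the middle name into the last-name field.
    parts = record["last_name"].split()
    return {
        "given_name": record["first_name"],
        "middle_name": " ".join(parts[:-1]) or None,
        "family_name": parts[-1],
    }

a = normalize_crm({"first_name": "Anna Maria", "last_name": "Novak"})
b = normalize_loans({"first_name": "Anna", "last_name": "Maria Novak"})
assert a == b  # both systems now agree on what the attribute means
```

The point is not the string handling but the contract: every consumer sees the same concept regardless of which silo the data came from.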

Ivana Karhanová: Is this similar to what we call a data catalog?

Lukáš Mazánek: Well, a data catalog is, I would say, just the beginning. It gives you information about what attributes are available throughout the organization and where these attributes are located. But you also need to define the meaning of these attributes. Each attribute represents a concept, and we need to be very careful with the words we use. A concept isn’t the same as an object. For example, when I say “chair,” I have a concept in mind that might include various types of chairs—four-legged, three-legged, or one-legged, in any color. That’s the concept; it’s not the actual chair. This applies to IT as well.

Ivana Karhanová: So, you’re saying that it’s essential to describe the concept clearly so everyone has the same understanding of it, like having a consistent image of what a chair represents, even though the chairs aren’t the same?

Lukáš Mazánek: Exactly. The chairs aren’t the same, but the concept is. For instance, if I say we will move chairs from a room, I don’t need to specify each chair’s appearance. I’ll recognize them as chairs when I see them, and I’ll move them. First, we define that this attribute represents this concept, then we clarify what that concept means, which can vary depending on the context given by the business domain. We’re back to the autonomy of business domains. It’s crucial to define these domains because they provide the context that gives meaning to concepts. This is like when you try to define a business term with representatives from across the company—ask them to define who their customer is, and you’ll never reach consensus. For retail and corporate sectors, a customer might mean a physical person or a legal entity, respectively—two completely different concepts. However, conceptually, they understand each other when referring to customers across departments. Thus, we need to define these concepts for all attributes and create a business glossary. Once we’ve established these concepts at the business domain level, we can understand what each data point means within that domain.

These tools have been available for many years; that’s nothing new. However, the emergence of knowledge graphs is revolutionary. Knowledge graphs visualize concepts and their interrelationships, making it easy to see and understand these connections. This visualization makes the implications immediately apparent. The semantic layer provides an easily understandable and accessible way to interact with all the data within the enterprise. Suddenly, you don’t need to understand every specific system to access its data or information. You can simply query the semantic layer, and it delivers the answers immediately. It’s a paradigm shift. Additionally, a semantic layer is vital for artificial intelligence, which enhances efficiency, makes decisions, and optimizes processes. But AI needs knowledge, not just data. The difference lies in providing context, meaning, experience, and other essential metadata. By enriching data with this metadata and incorporating it into the semantic layer, we enable AI to automate decisions and even generate summaries efficiently. I apologize for the lengthy answer, but it’s necessary to cover the complexities involved.

Ivana Karhanová: And does this mean that you’ve already begun implementing this vision?

Lukáš Mazánek: Yes, we’ve been working on this for several years but started with smaller initiatives. Initially, we began with a data catalog and business glossary. We’re also exploring proof of concepts for knowledge graphs and the semantic layer.

Ivana Karhanová: Which business domain did you choose to start with?

Lukáš Mazánek: That’s an excellent question. Currently, we are in the midst of an agile transformation, constructing our first official business domain. However, our investment sector has been functioning like a business domain for quite some time, although it’s not officially designated as one. We are preparing a proof of concept in the data mesh area for it. It’s important to clarify that data mesh involves more than just sharing data—it’s about creating data products that are standardized, which stems from understanding the data and its meaning, closely tied back to the semantic layer.

Ivana Karhanová: It sounds like managing proper data governance in each domain is also a significant challenge, right?

Lukáš Mazánek: Absolutely, data governance is critical. We adopt domain-driven design by Eric Evans, which views a business domain as a sphere of knowledge and action. Simply drawing lines on an org chart doesn’t create a business domain. It must have influence and act as a coherent unit. When set up correctly, it provides the necessary context for the data.

Ivana Karhanová: So, it evolves from the bottom up?

Lukáš Mazánek: Precisely, it manifests organically. For instance, in a workshop, our retail department restructured into sub-domains like daily banking payments, showing how domains can naturally evolve based on internal dynamics. It’s crucial how these domains are formed because they provide context to the data. I aim to help our business units articulate their operations using conceptual language, as outlined in the concept of business knowledge blueprints by Mr. Ross. This approach helps describe the entire conceptual environment of a domain, making it understandable to those outside it.

Returning to your point on data governance, our philosophy allows each domain the autonomy to manage its affairs as they see fit, even if that means occasionally taking shortcuts for expedience and profit. However, once data crosses the boundaries of a domain, federated governance applies. This governance should be standardized and optimized, incorporating governance as code within the data mesh—meaning data products come with built-in rules. If those rules aren’t followed, access is denied. We plan to establish a top-level ontology to unify definitions across domains, which brings us to the term “ontology”—a term not everyone likes, but it’s crucial for our framework.
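“Governance as code” with built-in rules can be sketched roughly like this. The policy fields and class shape are assumptions for illustration; the essential behavior is that the data product enforces its own rules and denies access when they aren’t met:

```python
# Hedged sketch of governance as code: a data product ships with its own
# access rules and masking policy. Policy fields are invented examples.

class AccessDenied(Exception):
    pass

class DataProduct:
    def __init__(self, name, data, allowed_roles, gdpr_masked_fields):
        self.name = name
        self._data = data
        self.allowed_roles = allowed_roles
        self.gdpr_masked_fields = gdpr_masked_fields

    def read(self, role):
        # Rule built into the product: unauthorized roles are denied.
        if role not in self.allowed_roles:
            raise AccessDenied(f"{role} may not read {self.name}")
        # Mask GDPR-protected fields before data crosses the domain boundary.
        return [
            {k: ("***" if k in self.gdpr_masked_fields else v)
             for k, v in row.items()}
            for row in self._data
        ]

product = DataProduct(
    "retail_transactions",
    [{"customer_id": 42, "amount": 100.0}],
    allowed_roles={"data_scientist", "risk_manager"},
    gdpr_masked_fields={"customer_id"},
)
```

Because the rules travel with the product, federated governance doesn’t depend on every consuming domain re-implementing compliance checks.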

Ivana Karhanová: Like many terms in this field, understanding ‘ontology’ is essential for grasping the full scope of what you’re implementing.

Lukáš Mazánek: I apologize, but ontology helps in communication. Ontology, a concept deeply rooted in philosophy, defines things and their interrelations. A top-level ontology outlines what elements an organization has and how they are related. Fortunately, some of this groundwork is already done for us. There are open ontologies, such as FIBO—the Financial Industry Business Ontology—which provides definitions for nearly every concept in the financial sector. It’s not comprehensive, so you might need to extend it to meet specific needs. But once established, it serves as a standard, allowing each domain to align with it. Within your domain, you can operate independently, but when you cross the border into another domain, you must link your concepts to this top-level ontology, ensuring all business domains understand each other.

Ivana Karhanová: So crossing these domain borders occurs whenever someone outside your domain needs data from it?

Lukáš Mazánek: Exactly, that’s right.

Ivana Karhanová: And this would apply to most use cases?

Lukáš Mazánek: It depends on the industry. In banking or other heavily regulated sectors, crossing borders is common. However, in some industries, activities and data sharing are mostly contained within individual domains, with minimal cross-domain interaction. This is where data mesh becomes particularly useful. It’s a method for organizing and sharing data within a domain in a decentralized and scalable way. We achieve this with domain ownership, creating data products that package data with all necessary metadata, such as refresh rates, delivery frequencies, security policies, and so forth. Crucially, to support autonomy, we need a self-service platform. This prevents each domain from having to start from scratch, which would otherwise lead to a mess rather than an efficient mesh.

Ivana Karhanová: A mess instead of a mesh, indeed.

Lukáš Mazánek: Implementing security measures multiple times across different domains would be too expensive. You can handle it once through a self-service layer, which resolves all the unnecessary and boring compliance and IT stuff so the business can focus on their goals. In my data strategy, I describe this as a robot assembly line—everything is ready and shiny, and suddenly, if the business wants to create a red car instead of a white one, they can start production right away. We also have federated governance, which we’ve touched on before.


Ivana Karhanová: Can you give me an example of a data product?

Lukáš Mazánek: Sure. Take sales, for example. If we wanted to create a machine learning model to analyze customer transactions, it would be handy to have those transactions. In the retail business domain, where transactions for a given customer are captured, these transactions are packaged together with policies like who can access them or GDPR regulations. This package is then provided usually as a service to other business domains and is also registered to a so-called marketplace. This way, everyone in the company can see that this data product, probably called something like “All Transactions of Retail Customers,” is available. It’s provided in the marketplace, and then everyone can access it.

Ivana Karhanová: Okay, so if I understand correctly, I can come to the marketplace, ask for the transactions data of retail customers, and ask a lot of questions I’m interested in from, for example, a marketing perspective to build a better strategy. This is what a data product should be?

Lukáš Mazánek: Definitely. Typically, the customers of the marketplace are data scientists, but it can also be a financial officer or a risk manager who might use the data to detect fraudulent behavior or compute the financial position of a customer.

Ivana Karhanová: So if I come from a different business domain, I am able to more simply ask and receive answers on questions about a data product?

Lukáš Mazánek: What you receive is a package of data. What you do with this package is up to you. You might load it into an application or even Excel. But then, you can do whatever you want with it because the meaning of the data is clear—each attribute is defined in the semantic layer. You can read its definition, understand its intended use, and then use it as needed. You could even set up a new pipeline to read this data regularly—every hour, every day—whatever suits your needs, and build your application around this pipeline. What’s new here is the decoupling, because not all data are stored in the same place. If there’s an outage in one system but the data is decentralized, it remains accessible. If everything were centralized and an outage occurred, everyone would be affected. This exemplifies the concept of decentralization. Not every approach fits every problem; some need centralized solutions, others decentralized. Data mesh suits complex and large organizations. If your business is simple with few systems, data mesh might not be necessary. But if you have hundreds of systems and no clear understanding of them, then data mesh is essential.

Ivana Karhanová: How does this align with the Microsoft Fabric concept, which seems to support the data mesh approach?

Lukáš Mazánek: That’s an excellent question. Data fabric aligns perfectly with data mesh because it automates what would be overwhelming to handle manually. For instance, we utilize an augmented data catalog, heavily populated with metadata from our data warehouse—about 80% of all our enterprise data, well-documented. We plan to train a large language model on this data and metadata to find similarities in other systems and enrich them with metadata. This will help populate our data dictionary and aid in defining concepts using top-level ontology and artificial intelligence, refining our business glossary and semantic layer. So, data fabric and data mesh complement each other well, with one being a technological tool and the other an organizational methodology.

Ivana Karhanová: But this requires a certain level of maturity from both your business and IT personnel, right?

Lukáš Mazánek: Absolutely, and that’s a critical point. One of IT’s main goals is to reduce the cognitive load—the amount of knowledge required to perform tasks. We aim to ensure that not every customer needs to be a data analyst or scientist to work effectively with the data.

Ivana Karhanová: And the customers you’re referring to?

Lukáš Mazánek: Internal customers. I’m speaking from the IT perspective. We need to create an environment that removes these obstacles, automating them with rules and applications. For instance, we’re currently testing a tool for managing our business glossary and data dictionary. This tool allows users to create applications simply by configuring metadata defined within it. Once you define your concepts and relationships, and specify data locations in the data catalog using this tool, you can generate a front-end application. Users input data as usual, but now, the data is captured with its metadata, linking fields to concepts and relationships. This reduces the cognitive load for users, which is a key IT objective.

Ivana Karhanová: Do you believe you will achieve this by 2035?

Lukáš Mazánek: I expect we’ll see results much sooner in selected areas. I don’t favor huge transformation projects that aim to completely overhaul an organization within a decade; it’s overly ambitious. Instead, we should evolve incrementally.

Ivana Karhanová: So, a step-by-step approach?

Lukáš Mazánek: Exactly. Consider data mesh, introduced in 2019 by Zhamak Dehghani—it’s a very new concept. We need to be cautious, starting small and scaling gradually. My initial goal is to capture new data in this new way, not to transform the entire organization immediately, but to avoid worsening the current situation. It might not sound ambitious, but I believe it’s the right approach.

Ivana Karhanová: What are the main challenges you anticipate?

Lukáš Mazánek: The biggest challenge is cultural change. We live in an application-first world. In meetings about new products, the focus is often on the application’s behavior, such as how it will react or where buttons will be placed, rather than on data management. But the paradigm is shifting. I’m currently reading the book “The Data-Centric Revolution,” which I highly recommend. It discusses why these changes are hard. It recounts the story of a Hungarian doctor in the 19th century who linked hygiene to postpartum complications. He demonstrated that better hygiene in hospitals reduced deaths, yet he was ridiculed and ended up in a mental hospital, dying before his findings were accepted. This illustrates how difficult it is for people to shift paradigms, even when lives are at stake.

Take, for example, sailors who suffered from scurvy due to a lack of vitamin C. They conducted experiments with lemon juice, which resolved the issue, yet it took decades to become standard practice to bring lemon juice on voyages. People continued to die in the meantime. The primary challenge we face is cultural change because adopting a data-centric mindset is fundamentally different. It’s about prioritizing data care, aiming for a unified data meaning across the organization, and creating simple applications that can be easily replaced as needed. I’m always amazed—it’s so obvious in corporations, where old systems are replaced without proper attention to the data itself. Data migration is common, but the focus rarely shifts to genuinely caring for the data.

Ivana Karhanová: So, it’s more about leveraging new functions and features?

Lukáš Mazánek: Exactly. Cultural change won’t be easy. It’s the biggest hurdle, not technology or methodology. We have all the tools we need; the challenge is convincing people of this path and engaging them in the process. I often start my talks by asking who finds data governance sexy. Surprisingly, more hands are going up now, even from those in business roles, indicating a shift in perception. But still, data management is often left to a small team, which is ineffective without broader support. We need a network effect, where every employee understands and treats data as a vital asset, though I admit that saying ‘data as an asset’ can sound vague. It’s about recognizing that data is crucial and handling it as the company’s most valuable resource.

Ivana Karhanová: Thank you for joining us in the studio and sharing your insights.

Lukáš Mazánek: Thank you very much. It was a pleasure.
