Data Mesh - What is it, and is it for us?
Everyone is talking about Data Mesh. Whilst the meaning and value of terms like data strategy, data management and data governance are becoming more widely understood, the same can’t be said of Data Mesh. In this blog I’m going to break down exactly what Data Mesh is and how it can be embedded into your business. I’ll also take you through how to judge whether it will have a positive impact on your organisation because, spoiler alert, it's not for everyone.
What is Data Mesh?
According to Thoughtworks, Data Mesh is an approach to data operations that allows data to be managed and analysed in real time. It enables stakeholders to access and examine data in situ, where it already lives, because each team owns the data that its business area or department (its domain) is responsible for.
Data Mesh came into being as a potential solution to the perennial problem of data silos within business departments, and to the long lead times that can come with building a single central data platform. While data mesh may suit many organisations, there are important scale and maturity considerations, which we’ll come to later. There are four key dimensions to Data Mesh:
Domain-oriented decentralised data ownership and architecture
Data as a product
Self-serve data infrastructure as a platform
Federated computational governance
Domain Ownership
Data Mesh places ownership of data on those closest to it (i.e. those managing data within their departments) to allow for flexible, continuous change, scalability and rapid growth. When adopting a Data Mesh approach, each business area is responsible for curating and managing its data and, crucially, not just to meet its own needs. Each area must also cater for users in other parts of the business, and publish and communicate the data that is available. In a data mesh this is done by building and managing what is known as a data product.
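To make “publish and communicate the data that is available” a little more concrete, here is a minimal Python sketch of a shared catalog: domains register the data products they own, and other business areas browse what has been made available to them. The `DataCatalog` and `CatalogEntry` names, their fields and the example products are all hypothetical and not taken from any particular data mesh tool.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CatalogEntry:
    """What a domain advertises about one of its data products (illustrative fields only)."""
    name: str
    owning_domain: str
    description: str


@dataclass
class DataCatalog:
    """Hypothetical shared catalog: domains publish products, other areas discover them."""
    entries: Dict[str, CatalogEntry] = field(default_factory=dict)

    def publish(self, entry: CatalogEntry) -> None:
        # Register a product under a unique name; the owning domain remains accountable for it.
        if entry.name in self.entries:
            owner = self.entries[entry.name].owning_domain
            raise ValueError(f"{entry.name!r} is already published by {owner!r}")
        self.entries[entry.name] = entry

    def discover(self, requesting_domain: str) -> List[CatalogEntry]:
        # Show a domain what other business areas have made available to it.
        return [e for e in self.entries.values() if e.owning_domain != requesting_domain]


# Example usage with made-up domains and products.
catalog = DataCatalog()
catalog.publish(CatalogEntry("customer-orders", "sales", "Orders placed by customers, refreshed daily"))
catalog.publish(CatalogEntry("stock-levels", "warehouse", "Current stock held per product line"))
visible_to_sales = [e.name for e in catalog.discover("sales")]
print(visible_to_sales)  # ['stock-levels']
```

The point of the sketch is the direction of responsibility: the owning domain publishes and stays accountable, while everyone else only discovers and consumes.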
Data as a Product
A key challenge in getting value from data is making sure it is understood, trusted and of good enough quality to use. In many organisations, data is collected and stored during regular business activities but then never used for anything else. Data Mesh changes this mindset, encouraging teams to treat data as a product and the people who understand and use that data as customers. To enable this, data owners should have a high-level understanding of who each ‘customer’ is, how they use and consume data, and what interfaces those ‘customers’ need in order to access it. To embed data as a product within an organisation, it’s suggested that a Data Product Developer role is created: people responsible for building, maintaining and serving the team’s data products within the shared infrastructure.
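One way to picture what a Data Product Developer keeps track of is a small descriptor that records the product’s owner, the interfaces it offers and its known ‘customers’. The Python sketch below is a hypothetical illustration: the `DataProduct` and `Consumer` classes, their fields and the example teams are assumptions, not a standard schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Consumer:
    """One 'customer' of a data product: who they are, why they use it, how they access it."""
    team: str
    use_case: str
    interface: str  # e.g. "SQL view", "REST API", "file export"


@dataclass
class DataProduct:
    """Hypothetical descriptor a Data Product Developer might maintain for their team's product."""
    name: str
    domain: str
    developer: str
    interfaces: List[str]
    consumers: List[Consumer] = field(default_factory=list)

    def unmet_needs(self) -> List[Consumer]:
        # Customers who need an interface the product does not offer yet.
        return [c for c in self.consumers if c.interface not in self.interfaces]


# Example usage with made-up teams and interfaces.
orders = DataProduct(
    name="customer-orders",
    domain="sales",
    developer="sales data product developer",
    interfaces=["SQL view"],
    consumers=[
        Consumer("finance", "monthly revenue reporting", "SQL view"),
        Consumer("marketing", "campaign targeting", "REST API"),
    ],
)
print([c.team for c in orders.unmet_needs()])  # ['marketing']
```

Knowing each customer’s use case and preferred interface is what lets the owning team spot gaps, such as the marketing team above needing an API the product does not yet provide.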
Self-serve Data Platform
For teams to individually own and publish their data products, they need to be able to extract data from their operational systems in a way that is simple and manageable, can scale with the growth of the business, and supports interoperability. What do I mean here? I’m talking about technology. Data from various operational systems will need to be integrated, aggregated and made available for use. Each business area could, of course, do this its own way, but for reasons of cost efficiency, knowledge transfer and ease of overall management, it's highly likely that a shared data platform will be required. What’s important here is that there is a standard technology toolset, and that domains can use as much or as little of it as they need to build and publish their data products.
Federated Computational Governance
Whilst freedom and flexibility are important aspects of a data mesh, for the mesh to work as a whole and be worth more than the sum of its parts, there has to be a rulebook that everyone agrees to abide by. Otherwise, all that's been created is a decentralised model with all the associated problems of duplication and inefficiency.
It's easy to imagine this being a traditional data governance function, but the point of a data mesh model is that there are no central teams; everything should be federated. This is where the concept of computational governance comes into play. It is essentially the automated application of rules, so that data products cannot be published if they do not meet the required standards. This sounds simple, but there is a lot to it when you consider the range of standards that need to be enforced. Is the right technology being used in the right way? Are there data and interface definitions to support the product? Is the product unique, i.e. does it avoid duplicating data or information already provided by another product? Are there ‘release notes’ that describe the product and how and when to use it?
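The idea of rules that stop non-compliant products from being published can be illustrated with a small Python sketch. It encodes simplified versions of the four checks above as functions and refuses to publish a product that fails any of them. Everything here is an assumption for illustration: the `Submission` fields, the approved storage formats and especially the uniqueness test, which only compares source tables and is a crude stand-in for detecting duplicated information.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Optional

# Assumption: the federation has agreed these storage formats are the "right technology".
APPROVED_FORMATS = {"parquet", "delta"}


@dataclass
class Submission:
    """A data product a domain wants to publish (hypothetical fields)."""
    name: str
    storage_format: str
    schema: Dict[str, str]         # data definition: column name -> type
    interface_spec: Optional[str]  # interface definition, e.g. a view or API spec
    release_notes: str
    source_tables: FrozenSet[str]  # crude proxy for "what data this product serves"


# A rule returns None if the product passes, or a message describing the failure.
Rule = Callable[[Submission, List[Submission]], Optional[str]]


def uses_approved_technology(p: Submission, published: List[Submission]) -> Optional[str]:
    if p.storage_format not in APPROVED_FORMATS:
        return f"storage format {p.storage_format!r} is not approved"
    return None


def has_definitions(p: Submission, published: List[Submission]) -> Optional[str]:
    if not p.schema or not p.interface_spec:
        return "missing data or interface definitions"
    return None


def is_unique(p: Submission, published: List[Submission]) -> Optional[str]:
    for other in published:
        if other.source_tables == p.source_tables:
            return f"duplicates data already served by {other.name!r}"
    return None


def has_release_notes(p: Submission, published: List[Submission]) -> Optional[str]:
    if not p.release_notes.strip():
        return "missing release notes"
    return None


RULES: List[Rule] = [uses_approved_technology, has_definitions, is_unique, has_release_notes]


class PublishBlocked(Exception):
    def __init__(self, name: str, failures: List[str]):
        super().__init__(f"{name}: " + "; ".join(failures))
        self.failures = failures


def publish(p: Submission, published: List[Submission]) -> None:
    # Run every rule so the domain sees all failures at once, then block or publish.
    failures = []
    for rule in RULES:
        message = rule(p, published)
        if message is not None:
            failures.append(message)
    if failures:
        raise PublishBlocked(p.name, failures)
    published.append(p)


# Example usage with made-up products.
live_products: List[Submission] = []
publish(Submission("customer-orders", "parquet", {"order_id": "string"},
                   "orders_v1 view", "Daily customer orders", frozenset({"crm.orders"})), live_products)

rejected: List[str] = []
try:
    publish(Submission("orders-copy", "csv", {}, None, "", frozenset({"crm.orders"})), live_products)
except PublishBlocked as blocked:
    rejected = blocked.failures
print(rejected)
```

In practice, checks like these would presumably run inside the shared platform’s publishing pipeline rather than as a standalone script, and the harder standards (is the technology being used in the *right way*, is the information genuinely unique) are much more difficult to express as code than these toy rules suggest.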
Is Data Mesh for us?
When considering data mesh, it is very important to look at the context surrounding your organisation and whether it will actually benefit from adopting it.
Technology Constraints
As previously discussed, for data mesh to be implemented effectively within an organisation, it needs a shared, scalable infrastructure or platform. It's important to realise that an existing centralised data warehouse (the sort of thing many organisations already have) is unlikely to be an effective platform for supporting data mesh in the long term. Organisations therefore need to be thinking about cloud-based platforms that can scale up and down as required. Such a move requires different, and possibly new, skills for an organisation, which makes training and recruitment important considerations. Recall that in a mesh model this is not just about equipping one central team with these skills; each business area/domain will need them.
Putting the foundations in place could therefore be a large and expensive change project in itself, so it needs careful evaluation. Key questions to consider are:
What is the technological journey your company will need to go on?
Are you able to use the cloud for this project?
Does your team have the skill sets needed to build and maintain the data products? If not, how will you equip them?
Governance Constraints
A key pillar of Data Mesh is the governance that allows it to be embedded within an organisation and, as discussed above, this should ideally be automated wherever possible. That sounds easy, but in practice it's difficult to achieve: the technology products required are still emerging, and implementing them is a significant project in its own right.
The backstop could be a central data governance team that implements the governance you can’t automate. This is perhaps not a bad thing to start with, as all change requires a level of leadership, facilitation and coaching to embed.
Of course, if an organisation already has effective governance in place, this type of activity will be more familiar and therefore easier to adapt to. But for organisations that have never really tackled data governance, moving straight to a federated model may be too large a step to take in one go.
Key questions your organisation should answer here are:
How will you embed data mesh within your organisational culture?
How do you ensure people adopt the same standards and what steps will you automate?
How do you intend to lead, manage and nurture the change?
Scale Constraints
There are two types of scale to consider: data complexity and organisation size.
A large multinational organisation is likely to have many different departments, systems and datasets, and potentially different languages and time zones to contend with. In such environments, it's highly likely that ‘local’ teams are already working with data, so it's easy to see how a mesh model could be adopted and help.
Similarly, an organisation with many different types of product or service offering may have many different operational systems and a level of data complexity that makes it hard to bring everything together. Again, it's easy to see how a mesh model could help.
However, in smaller organisations, or those with less data complexity (e.g. a small number of operational systems) and perhaps a historical reliance on a few key data resources, it's harder to see how a data mesh model will add value or be more efficient.
Conclusion
The data mesh model is undoubtedly a useful, value-creating approach to data operations, but it's definitely not for everyone. I also think some technology evolution is still required, specifically to better support the ‘federated computational governance’ element, before it can be truly effective and operate as intended.
Smaller-scale businesses, whether in terms of organisational size or data complexity, would be well advised to take a different approach. For me, there simply won't be enough benefit to outweigh the cost and complexity of moving to a data mesh model in these organisations.
Larger-scale businesses (again, in size or data complexity) have much more to gain and are potentially better placed to move to a data mesh model, given the high likelihood that business area data teams are already in place. However, I would encourage such organisations to think about their data maturity. Adopting a data mesh model requires a level of maturity and, in my opinion, will suit organisations that are more data mature and have established data governance practices. I suspect anyone seeking to use data mesh to avoid data maturity growing pains will find it a very challenging journey.