In The Data Science Playground, The First Lesson Is DataOps – Forbes

LONDON - DECEMBER 18: A concrete surfaced playground is shown December 18, 2002 in London. School playing fields are being sold off at a rate of almost one a week despite government assurances that the Conservative policy of selling sports grounds would be reversed. (Photo by John Li/Getty Images)

The very fabric of cloud is still forming. As fast as we solidify elements of cloud computing structures, networks and services to agree on how they should best be architected, deployed and managed, there is an equal and opposite stream of new development that sees sometimes quite esoteric techniques, best practices and methodologies come to the fore.

Already past its esoteric adolescence and into mainstream deployment and onward augmentation is the concept of Infrastructure-as-Code (IaC).

This approach to creating base-layer cloud network services sees the steps required to provision infrastructure such as servers, data storage and networks being represented as software code (or some form of descriptive model). It has become increasingly common as organizations look to streamline their IT processes and reduce the amount of time needed to create (and destroy, i.e. retire and decommission) IT infrastructure.

Typically incorporated into an organization's DevOps strategy, IaC, which has historically been used to facilitate the creation of virtualized environments, is now seen as a key building block for automating the configuration and provisioning of the services provided by hyperscalers and other technologies that comprise today's polycloud environments.

All well and good so far then, yes. But as we now also look to extend our use of data science at a core operational level, the use of IaC needs to be revisited to give it not just DevOps goodness, but DataOps wellbeing at the same time. This is the opinion of Nelson Petracek in his role as global CTO with Tibco, a company known for its data-centric cloud platform technologies.

Petracek says that data science is driving the most progressive business models out there. This is the creation of data-driven decision intelligence, data-centric business modeling and the use of Artificial Intelligence (AI) and Machine Learning (ML) in all its forms.

But, he argues, as we engineer data science into the operational fabric of business running on Infrastructure-as-Code (IaC) cloud implementations, automating traditional infrastructure provisioning is not enough and supporting DevOps capabilities is not sufficient.

"The need for automation in the world of data science is not just about the software, services and applications, but also the data itself," said Petracek.

Thus, IaC has a new role to play, one focused on the DataOps processes needed for today's modern data fabric, data mesh and data management architectures.

"The difference between DevOps and DataOps is another discussion in and of itself, but - in general - DataOps includes not just DevOps principles for accelerating the creation of analytics products, but also other methodologies needed to optimize the use, delivery, and value of data within an organization," stated Petracek.

Amongst other capabilities, the Tibco CTO explains that DataOps involves the data workflows and processes needed to deliver high-quality data and results to data consumers in a timely and contextual fashion. This is not the realm of DevOps, which instead focuses on the end-to-end delivery of software products and services.

"As a result, IaC in the data science world must be extended beyond DevOps and IT infrastructure creation. Servers, storage and networks still need to be provisioned, but so do data warehouse table structures, data pipelines, data quality checks, model validations and various other supporting infrastructure elements," clarified Petracek.
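To make that extension concrete, here is a minimal sketch of a single declarative description that covers both classic infrastructure and data-side elements. It is illustrative only: the `Resource` model, the `resource` helper, the `plan` function and every resource name are hypothetical and not taken from Tibco or from any real IaC tool. The point is simply that a warehouse table or a data quality check can be declared, compared against the current state and reconciled in the same way as a server.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """One declared element of an environment (hypothetical model)."""
    kind: str    # e.g. "server", "warehouse_table", "quality_check"
    name: str
    spec: tuple  # sorted (key, value) pairs, so two resources compare cleanly


def resource(kind, name, **spec):
    """Build a Resource from keyword settings."""
    return Resource(kind, name, tuple(sorted(spec.items())))


def plan(desired, current):
    """Return the actions needed to move `current` to `desired`."""
    want = {(r.kind, r.name): r for r in desired}
    have = {(r.kind, r.name): r for r in current}
    actions = []
    for key, res in want.items():
        if key not in have:
            actions.append(("create", *key))
        elif have[key] != res:
            actions.append(("update", *key))
    for key in have:
        if key not in want:
            actions.append(("delete", *key))
    return sorted(actions)


# Desired state: infrastructure and data elements declared side by side.
desired = [
    resource("server", "analytics-node", cpus=4),
    resource("warehouse_table", "orders", columns="id:int,amount:float"),
    resource("quality_check", "orders_amount_non_negative", rule="amount >= 0"),
]

# What exists today (assumed for the example).
current = [
    resource("server", "analytics-node", cpus=2),
    resource("warehouse_table", "legacy_orders", columns="id:int"),
]

for action in plan(desired, current):
    print(action)
```

Running the plan against the assumed current state lists the quality check and the new table to be created, the legacy table to be deleted and the server to be updated. A real tool would go on to apply those actions; the sketch stops at the comparison step.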

So what we have here is not just cloud infrastructure and not just composable turn-off-and-onable Infrastructure-as-Code (IaC), we have DataOps-Aware Cloud IaC (or DOACIaC, an acronym that doesn't actually exist).

From Petracek's viewpoint, IaC techniques do indeed have the ability to capture these much more DataOps-compliant elements of Infrastructure-as-Code and subsequently provide a number of benefits.

These benefits would include improved repeatability, simplified maintenance and reduced configuration errors. They also include a reduction in the amount of time and effort required to create or tear down an environment, along with any software application or data service dependencies. Versions of each dependency can be fixed, eliminating incompatibilities and unexpected execution results.

We also get the chance to move forward towards improved audit processes, as the IaC artifacts may be stored and versioned in a shared repository. "It is possible to access this history to prove the environment and data-related configurations associated with a particular analysis or data process at any point in time," said Petracek, who also points to new opportunities for more rapid experimentation, streamlined data science playground creation and cross-team collaboration.
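As a rough illustration of that audit trail, the sketch below fingerprints an environment and data configuration and records which fingerprint each analysis run used. It is an assumption-laden stand-in: in practice the shared repository would most likely be a version control system, and the `fingerprint` function, the `ConfigHistory` class, the run identifiers and the configuration keys are all hypothetical names invented for this example.

```python
import hashlib
import json


def fingerprint(config: dict) -> str:
    """Hash a configuration in a canonical form, so key order is irrelevant."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ConfigHistory:
    """In-memory stand-in for a versioned, shared repository (hypothetical)."""

    def __init__(self):
        self._versions = {}  # fingerprint -> stored copy of the configuration
        self._runs = {}      # run id -> fingerprint used by that run

    def record_run(self, run_id: str, config: dict) -> str:
        fp = fingerprint(config)
        # Round-trip through JSON to keep an independent copy of the config.
        self._versions[fp] = json.loads(json.dumps(config))
        self._runs[run_id] = fp
        return fp

    def config_for(self, run_id: str) -> dict:
        """Return the exact configuration a given run was executed with."""
        return self._versions[self._runs[run_id]]


history = ConfigHistory()

# Pinned dependency versions and data settings (values are made up).
march_config = {"python": "3.11.4", "pandas": "2.1.0", "table": "orders_v3"}
history.record_run("run-001", march_config)

# A later run with one dependency upgraded gets a different fingerprint.
history.record_run("run-002", {**march_config, "pandas": "2.2.0"})

print(history.config_for("run-001"))
```

Because each run maps to a content hash of its full configuration, it is possible to show later which pinned versions and data settings were behind a particular result, which is the property the paragraph above describes.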

Tibco's Petracek concludes by saying that this DataOps-enriched approach to cloud infrastructure enables greater re-use, simplified migrations to different environments and reduced support requirements.

"The definition of Infrastructure as Code (IaC) is expanding as organizations attempt to improve the speed, accuracy and value of their data pipelines. Successful enterprises understand that DataOps should be their focus, with DevOps a subset of that strategy. Perhaps the time has come for IaC to take on an expanded role as well," he concluded.

Software application development, data science and related aspects of cloud engineering are fond of using schoolyard metaphors already; we know that sandboxing techniques describe the ability for kids (in this case: software professionals, data scientists and citizen software/data non-techies too) to mess around with ideas that don't create a mess for everyone else. Extending sandboxes to a wider notion of the data science playground is arguably a logical move.
