Understanding the Role and Attributes of Data Access Governance in Data Science & Analytics – Analytics Insight

Data scientists and business analysts need not only to find answers to their questions by querying data in various repositories, but also to transform that data in order to build sophisticated analyses and models. Read and write operations are at the heart of the data science process and are essential to helping them make quick, well-informed decisions. Supporting those operations is also an imperative capability for data infrastructure teams that are tasked with democratizing data while complying with privacy and industry regulations.

Understanding and meeting the needs of both groups requires a data governance platform capable of accelerating the data sharing process to satisfy the unique requirements of data consumers, while ensuring that the organization as a whole remains in compliance with regulations such as GDPR, CCPA, LGPD, and HIPAA.

Data is the raw material for any type of analytics, whether it is the historical analysis that business analysts present in reports and dashboards or the predictive analysis in which data scientists build a model that anticipates an event or behavior that has not yet occurred. To be truly useful, the raw information that forms the basis of reports and dashboards must be converted into data ready for consumption so business analysts can create reports, dashboards, and visualizations that paint a picture of the overall health of the organization.

Data scientists, too, can benefit from converted data, as they can use it to build and train statistical models using techniques such as linear regression, logistic regression, clustering, and time series analysis. The output of these models can be used to automate decision-making using sophisticated techniques such as machine learning.

But this task is becoming increasingly difficult due to the rise in compliance regulations such as GDPR, CCPA, LGPD, and HIPAA, and the need for organizations to secure sensitive data across multiple cloud services. In fact, according to Gartner's Hype Cycle for Privacy, 2021 report[1], by year-end 2023, 75% of the world's population will have its personal data covered under modern privacy regulations, up from 25% today, and before year-end 2023, more than 80% of companies worldwide will be facing at least one privacy-focused data protection regulation.

Because data analytics is an exploratory exercise, it requires data consumers such as business analysts and data scientists to analyze large bodies of data to reveal patterns, behaviors, or insights that inform some decision-making process. Machine learning, on the other hand, specifically attempts to understand the features with the biggest influence on the target variable. This requires access to a large amount of data that may contain sensitive elements, including personally identifiable information (PII) such as a person's age, social security number, or address.
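To make the idea of sensitive elements concrete, here is a minimal Python sketch of how a dataset might be scanned for columns that look like PII. It is an illustration only, not how any particular governance product works: the function name `classify_columns`, the name hints, and the sample rows are all hypothetical, and the pattern assumes social security numbers written as 123-45-6789.

```python
import re

# Assumption: US social security numbers formatted as 123-45-6789.
SSN_PATTERN = re.compile(r"^\d{3}-\d{2}-\d{4}$")

# Hypothetical column-name hints; a real classifier would use far richer rules.
PII_NAME_HINTS = ("ssn", "social_security", "age", "address")


def classify_columns(rows):
    """Return the set of column names that look like they hold PII.

    `rows` is a list of dicts mapping column name -> value. A column is
    flagged if its name contains a hint or any value matches the SSN pattern.
    """
    flagged = set()
    for row in rows:
        for column, value in row.items():
            name_hit = any(hint in column.lower() for hint in PII_NAME_HINTS)
            value_hit = isinstance(value, str) and SSN_PATTERN.match(value) is not None
            if name_hit or value_hit:
                flagged.add(column)
    return flagged


# Made-up sample data for illustration.
sample_rows = [
    {"customer_id": 1, "age": 42, "tax_ref": "123-45-6789", "basket_total": 18.50},
    {"customer_id": 2, "age": 35, "tax_ref": "987-65-4321", "basket_total": 7.25},
]
pii_columns = classify_columns(sample_rows)
print(sorted(pii_columns))  # ['age', 'tax_ref']
```

Note that `tax_ref` is caught by its values rather than its name, which is why scanning content as well as column names matters when data arrives from many business units.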

In many instances, this data is owned by different business units and is subject to strict data sharing agreements. This presents infrastructure teams with unique challenges, such as balancing the need to provide data consumers with access to enterprise data at the required granularity against the need to comply with privacy regulations and with requirements set by the data owners themselves. Another major challenge for the data infrastructure team is supporting the data science team's rapidly growing demand for data for its analytics and innovation projects.

Data science requires not only reading data but also updating it in the preprocessing that converts raw data into a consumption-ready form. Put simply, data science by nature is a read- and write-intensive activity. To address this, data infrastructure teams usually create sandbox instances for these data consumers whenever they start a new project. However, these too require robust data access governance so that no sensitive or confidential data is exposed during data exploration.

According to the previously mentioned Gartner Hype Cycle for Privacy, 2021 report, through 2024, privacy-driven spending on data protection and compliance technology will break through to more than $15 billion worldwide. To support the growing data science activities in a company, data infrastructure teams need to implement a unified data access governance platform that has four important attributes.

Enterprises can only thrive in this economy if data can flow to the far reaches of the organization to help make decisions that improve the company's profitability and competitive position. However, every company must share data with proper guardrails in place so that only authorized personnel can access the required data. This is mandated by an ever-increasing list of privacy regulations, and it also fosters the trust that customers have placed in the company. The data governance solution that companies need in order to securely extract insights from their data must support both read and write operations, automate the process of identifying and classifying sensitive data, act on that data by encrypting it, and provide visibility into the company's data ecosystem.
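As a rough illustration of how those capabilities fit together, the Python sketch below checks read and write requests against per-role policies, masks sensitive columns on read, and records every decision in an audit log for visibility. Everything in it is an assumption made for the example: the `Policy` and `Governor` classes, the role and table names, and the sample row are hypothetical, and the `mask` helper uses a truncated SHA-256 hash as a stand-in for encryption (hashing is one-way, so a real platform would use reversible encryption with managed keys).

```python
import hashlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy: what one role may do on one table."""
    role: str
    table: str
    operations: frozenset                    # subset of {"read", "write"}
    masked_columns: frozenset = frozenset()  # columns hidden from this role


def mask(value):
    # Stand-in for encryption: a truncated one-way SHA-256 digest.
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()[:12]


@dataclass
class Governor:
    policies: list
    audit_log: list = field(default_factory=list)

    def _find(self, role, table):
        for policy in self.policies:
            if policy.role == role and policy.table == table:
                return policy
        return None

    def authorize(self, role, table, operation):
        """Decide whether the operation is allowed and record the decision."""
        policy = self._find(role, table)
        allowed = policy is not None and operation in policy.operations
        self.audit_log.append((role, table, operation, allowed))
        return allowed

    def read(self, role, table, rows):
        """Return rows with this role's masked columns replaced by digests."""
        if not self.authorize(role, table, "read"):
            raise PermissionError(f"{role} may not read {table}")
        policy = self._find(role, table)
        return [
            {col: mask(val) if col in policy.masked_columns else val
             for col, val in row.items()}
            for row in rows
        ]


governor = Governor(policies=[
    Policy("data_scientist", "customers", frozenset({"read", "write"}), frozenset({"ssn"})),
    Policy("business_analyst", "customers", frozenset({"read"}), frozenset({"ssn", "age"})),
])
rows = [{"customer_id": 1, "age": 42, "ssn": "123-45-6789"}]  # made-up data

masked = governor.read("business_analyst", "customers", rows)
can_write = governor.authorize("business_analyst", "customers", "write")
print(masked[0]["customer_id"], can_write, len(governor.audit_log))  # 1 False 2
```

In this sketch the business analyst can read but not write, sees `age` and `ssn` only as digests, and both the permitted read and the denied write end up in the audit log.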

Balaji Ganesan is CEO and co-founder of both Privacera, the cloud data governance and security leader, and XA Secure, which was acquired by Hortonworks. He is an Apache Ranger committer and a member of its project management committee (PMC). To learn more, visit http://www.privacera.com or follow the company on Twitter.
