PLOS Board Appointments – The Official PLOS Blog – PLOS
After a careful search, I am excited to share with our community four new appointments to the PLOS Board of Directors. This is a critical time for us as we expand our journal offerings and our global reach, and challenge the landscape of Open Access publishing regarding sustainable business models. Each new member brings a depth and breadth of knowledge in their fields, which will enable us to continue to drive our mission forward while serving our scientific communities. The Board plays a key role as a strategic thought partner to PLOS leadership, as well as in oversight of organizational performance (business, strategic and financial), compliance and risk management.
Dr. Arlene Espinal, who joined the Board on September 1, currently serves as the Head of Microsoft Cloud Data Science and Analytics for Microsoft Corp. She is a leader in global strategy, Quantum-AI and next-generation digital technologies. She is also passionate about talent development and leads teams with diversity, inclusion, equity, belonging and acceptance in mind, values she considers essential to community and business. Recognized for her seminal role in driving awareness of and change to social disparities that impact our communities, Dr. Espinal was named a 2020 Top 100 Global Technology Leader by the Hispanic IT Executive Council, which recognized her again this year for her executive contributions. The National Diversity and Leadership Council recognized Dr. Espinal as one of the 2021 Top 50 Most Powerful Women in Technology.
Dr. Israel Borokini, who joined the Board on September 1, is a postdoctoral research fellow in the Department of Integrative Biology, University of California, Berkeley. His research focuses on combining ecological, geospatial, genomic, cytological, and phylogenetic data to identify patterns of community assemblages and biodiversity, and the eco-evolutionary mechanisms that generate and maintain them. Dr. Borokini completed his Ph.D. in the Ecology, Evolution, and Conservation Biology graduate program at the University of Nevada, Reno. He completed his undergraduate and master's degrees in his home country of Nigeria before spending a decade as Principal Scientific Officer at the National Center for Genetic Resources and Biotechnology in Ibadan, Nigeria. Dr. Borokini not only expands the scientific expertise on the Board but also brings a passion for PLOS's mission. He has personally experienced the challenges of access to research in a low-resource environment and will bring valuable perspectives to the Board's discussions as PLOS grows globally and prioritizes equity.
Richard Wilder's deep experience in global public health law has a recurring theme: ensuring access. Prior to private practice, he was the General Counsel and Director of Business Development at the Coalition for Epidemic Preparedness Innovations (CEPI). At CEPI, he directed legal and business development affairs during its initial start-up phase and through the first two years of the response to the COVID-19 pandemic. Before CEPI, he was Associate General Counsel in the Global Health Program at the Bill & Melinda Gates Foundation, where he provided legal expertise to ensure access to drugs, vaccines and diagnostics, with a particular focus on access by affected populations in low- and middle-income countries. His work also addressed how to ensure access to the artifacts of scientific research, including published materials, data, software code and biological materials. His Open Access policy work at Gates won the SPARC Innovator Award in 2015. Richard has also served as a committee member of the Roundtable on Aligning Incentives for Open Science convened by the National Academies of Sciences, Engineering, and Medicine. He joined the Board in June 2022.
Fernán Federici joined the Board in October 2021. As we expand globally, Dr. Federici's perspective from a different research culture will prove invaluable. He is currently an Associate Professor and molecular geneticist at the Pontificia Universidad Católica in Santiago, Chile. He has been a champion of Open Science in a number of areas, including protocols and reagents, where he contributes to Reclone (the Reagent Collaboration Network). Fernán's research group also works on the promotion and development of Free/Libre Open Source technologies for research and education in molecular biology and bioengineering. The group is part of Reclone, the Gathering for Open Science Hardware community (GOSH) and the CYTED-reGOSH network for open technologies in Latin America.
I would be remiss if I did not take the opportunity to express my heartfelt thanks to Robin Lovell-Badge, Mike Carroll and Meredith Niles for their outstanding years of service to the PLOS Board. Their wisdom and counsel have been enormously beneficial to me, and our organization, as we collectively charted a new path for PLOS, one focused on sustainability, inclusivity and expanding our roots globally. While it's hard to say goodbye, we are excited to bring on board so many exceptional individuals with fresh perspectives. Please join me in welcoming our new Board members!
All You Need to Know About MATLAB (Matrix Laboratory) – Spiceworks News and Insights
MATLAB is a proprietary software application and programming language developed by MathWorks that facilitates complex data analysis tasks such as implementing algorithms, interacting with other applications, and manipulating data matrices. This article explains the purpose of MATLAB, its key concepts, and its use cases in 2022.
Figure: How MATLAB operates.
MATLAB stands for "matrix laboratory." It was created by Cleve Moler and is developed by MathWorks. It is a multipurpose programming language for numerical computation.
MATLAB was initially developed to give easy access to the matrix software created by the LINPACK and EISPACK projects. With over 4 million users, MATLAB has become a must-have tool. It is used as an instructional tool in advanced engineering and science courses, and researchers in industry use it as a development and analysis tool.
MATLAB has features such as built-in editing and debugging tools and rich data structures. It has easy-to-use graphics commands and a variety of built-in commands and math functions that enable users to perform mathematical calculations. The software allows users to manipulate matrices, run algorithms, design user interfaces, and visualize multiple functions and data types. It is used for signal processing, image and audio processing, machine learning, and deep learning.
MATLAB offers users numerous benefits that make it an effective tool.
Further, errors in MATLAB are easy to fix because it is an interpreted language rather than a compiled one. It also provides a platform for performing symbolic math operations using symbolic manipulation algorithms and tools.
However, there are a few constraints to remember. MATLAB is designed for scientific computing and is therefore unsuitable for many other applications. As an interpreted language, it is slower than compiled languages such as C++. It is not a general-purpose programming language like Fortran or C. And because only the first function in a .m file can be called from outside that file, users typically create a separate file for each publicly callable function, unlike in many other languages.
Finally, most MATLAB commands lack a direct equivalent in other programming languages, as those commands are specific to MATLAB only. This makes the skills less transferable. Several other software applications, such as GNU Octave and Scilab, offer similar functionality.
As a fourth-generation programming language, MATLAB is primarily applied in technical computing. It provides a user-friendly environment that allows users to perform computation, visualization, and programming tasks.
When a program is written in MATLAB, a just-in-time compiler speeds up its execution. MATLAB assigns mathematical processing jobs to the computer's central processing unit and optimizes library calls, ensuring that the program runs efficiently. The following components power MATLAB's operation:
The term MATLAB environment refers to the collection of tools and infrastructure made accessible to users on the MATLAB platform. Capabilities to manage variables in the workspace, and facilities to import and export data, are included in this component. Tools for organizing, creating, debugging, and profiling M-files and programs designed with MATLAB are also available in the environment.
A MATLAB environment can be used as an interactive calculator or a programming environment. In calculator mode, the built-in functions, algorithms, and toolboxes of MATLAB provide an all-in-one environment to perform calculations and visualize results using graphical plotting. On the other hand, MATLAB, in programming mode, has an editor, a debugger, and a profiler that enables users to write their functions and scripts.
When users start MATLAB, a window with several panels appears. This window has a workspace panel, a command window, a current directory panel, and a command history panel. The command window has a command line prompt used to run functions that work on variables. All variables are made and stored in the workspace, where the workspace panel lets users access them easily.
Users can view saved data files in the current directory panel and access the history of all executed commands from the command history panel. Additionally, MATLAB has other window panels that one can open as the need arises, including a debugger window, an array editor window, and a help browser window. Helpful information about any function or toolbox can be accessed through the command-line help function or the help browser.
On the command line, arrays are built from the ground up. In MATLAB, data is structured into multidimensional arrays. Users can combine arrays through addition or multiplication to achieve different objectives. Addition is performed element by element, while multiplication can be either matrix multiplication or element-wise multiplication.
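The distinction between element-wise and matrix operations can be sketched at the command line (the variable names here are illustrative):

```matlab
A = [1 2; 3 4];
B = [5 6; 7 8];

C = A + B     % addition is always element by element
D = A * B     % matrix multiplication (rows times columns)
E = A .* B    % element-wise multiplication, marked by the dot prefix
```

For these inputs, D(1,1) is 1*5 + 2*7 = 19, while E(1,1) is simply 1*5 = 5.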
In calculator mode, you can change the elements of an array by double-clicking on the arrays name in the workspace panel. This opens the array editor, which lets you change the array by hand.
Meanwhile, in programming mode, you can change elements by placing part of an array on the left-hand side of an assignment statement. Users can delete a row or column by assigning the empty array to it. A 0-by-0 matrix is the empty array.
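As a brief sketch of indexed assignment and deletion (the array name is illustrative):

```matlab
A = magic(4);    % a 4-by-4 example matrix
A(2, 3) = 0;     % indexed assignment changes a single element
A(:, 2) = [];    % assigning the empty array deletes the second column
size(A)          % A is now 4-by-3
```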
In MATLAB, every variable is an array or a matrix. Variables in the workspace are visible in the workspace panel. Users can also inspect variables with the who and whos commands: who lists the variables currently in memory, while whos also displays each variable's size, the memory allocated to it, and whether it is complex. Data files are saved in the current directory, accessible via the current directory panel, in the .mat format.
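A minimal sketch of inspecting and saving workspace variables (the file name is illustrative):

```matlab
x = 3.14;
M = ones(2, 2);

who                 % lists the names of the variables in memory
whos                % also shows size, bytes and class for each variable

save('mydata.mat')  % writes the workspace to mydata.mat in the current directory
clear               % empties the workspace
load('mydata.mat')  % restores x and M from the saved file
```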
In MATLAB, functions are used to perform computational tasks. They were created to extend the basic functionality of MATLAB. Functions interact with one another only via the arguments that act as their inputs and outputs, and each function has an isolated workspace for its variables. In calculator mode, users can prototype a function by executing its lines one after another at the command line and copying them into a file once they work as expected.
A script is a file consisting of several sequential lines of MATLAB commands. Scripts operate on variables in the base workspace. Functions and scripts are both text files with a .m extension; what differentiates a function from a script is that the keyword function appears at the beginning of the file's first line.
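The difference can be sketched with two hypothetical files. The first is a function file: the keyword function opens the file, and its input variable exists only in the function's own workspace.

```matlab
% circle_area.m
function a = circle_area(r)
    a = pi * r.^2;   % r is local to this function
end
```

The second is a script file: plain commands with no function keyword, whose variables land in the base workspace.

```matlab
% compute_areas.m
radii = 1:5;                  % these variables appear in the base workspace
areas = pi * radii.^2;
disp(areas)
```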
N-D arrays are multidimensional arrays used in MATLAB. They are generated either by extending the fundamental 2-D arrays or by constructing them directly with functions such as zeros and ones. N-D arrays are always stored as dense arrays; MATLAB's sparse storage applies only to 2-D matrices.
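A short sketch of both construction routes (variable names are illustrative):

```matlab
Z = zeros(3, 4, 2);        % built directly: a 3-by-4-by-2 array of zeros
P = ones(2, 2);
Q = cat(3, P, 2*P, 3*P);   % built by stacking 2-D pages along dimension 3
size(Q)                    % returns [2 2 3]
```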
A function is a collection of statements that, when combined, carry out a specific job or task. MATLAB functions are specified in separate files, similar to script files. The function's name and the file's name must match.
In general, functions take in one or more input arguments and may return one or more output arguments after processing. A function operates on variables inside its own workspace, known as the local workspace. This workspace is distinct from the base workspace, which is accessed via the MATLAB command prompt.
In MATLAB, functions are created using the following syntax: function [out1, out2, ..., outN] = run(in1, in2, in3, ..., inN). Here, run is the name of the function, which accepts the input arguments in1, in2, ..., inN and returns the outputs out1, out2, ..., outN.
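For instance, a hypothetical function with one input and two outputs could look like this:

```matlab
% stats.m -- illustrative example, not a built-in
function [m, s] = stats(x)
    m = mean(x);   % first output
    s = std(x);    % second output
end
```

It would be called as [m, s] = stats([1 2 3 4]), and a caller may request fewer outputs than the function defines.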
Now let us turn to the five types of MATLAB functions:
Primary functions are defined within a file and are listed first in the function file. Primary functions may be invoked from outside the file in which they are defined, either by other functions or from the command line.
Sub-functions are similarly defined within a file. Optional sub-functions may appear after the primary function within a file. Unlike primary functions, sub-functions cannot be invoked from other functions or from the command line outside of the file that defines them; they are accessible only to the primary function and the other sub-functions inside that file.
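A sketch of a primary function with one sub-function in the same hypothetical file:

```matlab
% rms_demo.m -- only rms_demo is callable from outside this file
function r = rms_demo(x)
    r = sqrt(mean_of_squares(x));
end

function m = mean_of_squares(x)   % sub-function: visible only inside this file
    m = sum(x.^2) / numel(x);
end
```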
Nested functions are defined within another function, called the parent function. A nested function is declared inside the body of its parent and shares the parent function's workspace, so it can access and modify the variables declared by its parent.
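A minimal sketch of a nested function sharing its parent's workspace (names are illustrative):

```matlab
% counter_demo.m
function total = counter_demo()
    count = 0;        % lives in the parent function's workspace
    step();
    step();
    total = count;    % returns 2: the nested calls modified count

    function step()           % nested inside the parent's body
        count = count + 1;    % reads and writes the parent's variable
    end
end
```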
An anonymous function is defined in MATLAB with a single statement. It consists of a single MATLAB expression and can accept multiple input arguments. Anonymous functions can be created at the command line or inside a script or function, which allows users to build simple functions without creating a separate file; they are not stored in program files.
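For example:

```matlab
sq = @(x) x.^2;       % a single-statement function, no file required
sq(5)                 % returns 25

k = 3;
scale = @(x) k * x;   % captures the value of k at creation time
scale(10)             % returns 30, even if k changes afterward
```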
Private functions are accessible only to a limited set of other functions. A private function is a type of primary function that resides in a subfolder named private. Users can create private functions to avoid revealing the implementation of a function. Private functions cannot be invoked from the command line or from outside their parent folder.
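The folder layout can be sketched as follows (folder and file names are illustrative):

```matlab
% mytools/
%     public_entry.m       % can call hidden_helper directly
%     private/
%         hidden_helper.m  % invisible from the command line and other folders
```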
MATLAB is used in several industries, including the automotive, biotech, and pharmaceutical sectors and the electronics, artificial intelligence, robotics, and communication sectors. It is helpful for data scientists, mechanical engineers, machine learning experts, computational finance specialists, and research scientists. The following are the uses of MATLAB:
Data analytics involves studying and analyzing data to extract valuable insights, and it is usually done with software and tools. MATLAB provides an environment where data scientists, engineers, and IT specialists can effectively analyze data. They can also build big data analytics systems, for instance for financial analysis.
Organizations can use MATLAB to perform an economic assessment. It has tools that financial specialists can use to evaluate factors such as profitability, solvency, liquidity, and organizational stability.
MATLAB provides a platform where users can easily control whole systems and devices. Users can use MATLAB to create a control system for various industrial systems. The control systems are based on the control loop. Through the control system, users can give commands to the parts of the system, manage them, and regulate their behavior.
For instance, engineers can create a control system using MATLAB to enable them to control heating systems easily. Additionally, MATLAB has a control system toolbox that allows users to analyze algorithms and apps and design linear control systems.
Embedded systems are computer components comprising roughly 90% hardware and 10% software, designed to perform specific tasks. MATLAB offers a push-button workflow that generates code and runs it directly on embedded hardware. Examples of embedded systems are microwaves, cameras, and printers.
Motor control algorithms help regulate speed and other performance characteristics of an application. MATLAB algorithms help with precision control, energy efficiency, and system protection. During development, MATLAB can help users reduce the time and cost of developing algorithms before committing to expensive hardware testing.
Testing and measuring electronic products is a standard manufacturing best practice. Electronic products are subjected to various tests during this process to ensure that only quality and standard products are sent to the market. Physical examinations are carried out to identify any material defects, while functional tests are carried out to ensure that the products work as expected.
MATLAB allows engineers to perform tasks while testing and measuring electronic products. It provides them with the necessary tools and helps them automate tasks. Additionally, they can use MATLAB to perform live visualization and data analysis from the data they collect.
Computers and unique digital signal processors perform various signal processing operations in digital signal processing. The MATLAB environment makes it easier for users to use signal processing techniques when analyzing time series data. It also provides a unified workflow for developing streaming applications and embedded systems.
Robotics is a multidisciplinary field of science and engineering that involves the creation of robots, or human-like machines. MATLAB provides an all-in-one environment where robotics researchers and engineers can design robots. They can use MATLAB to create and tune algorithms, generate code automatically, and model real-world systems.
Mechatronics combines the scientific fields of electronics and mechanical engineering. In mechatronic systems, electrical, mechanical, control, and embedded software subsystems are integrated. MATLAB provides an all-in-one environment where mechatronic engineers can design and simulate all those subsystems.
Image processing focuses on processing raw images to prepare them for other tasks, such as computer vision. In image processing, pixels of images are managed through the modification of matrix values with the help of math techniques. Meanwhile, computer vision involves looking at pictures as the human eye does, then understanding and predicting the visual output.
MATLAB provides an environment for the vital work of building algorithms and analyzing images. For instance, it includes machine learning algorithms that support applications such as face beautification and barcode scanning. Digital image processing is also helpful in transmitting, receiving, and decoding data from satellites.
Engineers design predictive maintenance techniques to determine equipment condition and figure out when maintenance must be conducted. MATLAB has a predictive maintenance toolbox that engineers can use to manage data, design condition indicators, and estimate the remaining useful life of a machine.
Wireless communication involves connecting two or more devices using a wireless signal. Engineers working in teams can boost productivity by working with MATLAB. With MATLAB, they can reduce development time as they can easily exchange ideas and eliminate design problems early by pointing out overlooked errors. MATLAB also provides streamlined testing and verification of wireless devices.
MATLAB is indispensable for technical teams working with data operations and user interfaces (UI). It simplifies complex calculations, makes it easy to develop AI and ML algorithms, and facilitates UI simulation and design. MATLAB is also available directly online via a web browser, removing the need to install software locally. Ultimately, MATLAB combines visualization, advanced computation, and programming in an easy-to-use way.
Top 10 Best Countries to Study AI for Indian Aspirants in 2023 – Analytics Insight
Life without data is quite difficult to even think about. There isn't a single activity you do that doesn't involve data. With such humongous amounts of data available, it is important that the data collected is put to the best possible use. This is where technology plays a significant role and why artificial intelligence is critical. If you are an AI aspirant, you have landed at the right place. On that note, have a look at the top 10 best countries to study AI for Indian aspirants in 2023.
The UK has gained wide recognition as a place that has succeeded in establishing the link between artificial intelligence (AI) and the financial technology (FinTech) industry. That being said, can anything get better than being a part of a wide range of international summits such as the Deep Learning Summit, the AI Summit, and ODSC's European Conference?
Canada takes pride in being the home to the top universities in the world in the field of artificial intelligence. The universities here have contributed to various scholarly disciplines and commercial innovations involving AI and Big Data.
India, too, has countless opportunities in the field of AI. Among its many cities, Bangalore, Mumbai, and Hyderabad are considered hubs for every significant technological advancement. These cities also boast several dedicated artificial intelligence labs, making India an excellent place to get an AI education.
Undoubtedly, the USA is one of the dream locations for those interested in making a career in AI. A point worth mentioning is that the headquarters of almost every major American tech company can be found in the USA.
The manner and extent to which artificial intelligence, data science, and other technical fields have been adopted in France are quite astonishing. Thus, there's no denying that the future of AI here is bright and set to open doors of opportunity for artificial intelligence aspirants and professionals.
This European country pays quite handsomely, well beyond expectations, in the AI domain. With top organizations such as Dell, HP, IBM, Microsoft, Google, and Oracle, to name a few, offering ample opportunities in the field of artificial intelligence here, you know where to go!
Yes, Germany is known for varied opportunities in the automobile sector. Additionally, the country also offers an ample number of career options and job roles in the field of AI, promising a rewarding career.
Singapore boasts a string of top-rated universities, such as the National University of Singapore, Nanyang Technological University, Singapore Management University, and more. Having said that, you know why the country features in the list of the top 10 best countries to study AI for Indian aspirants.
China is home to several renowned technology-oriented universities, from Peking University and Tsinghua University to Fudan University and Zhejiang University. You now have a portfolio of universities to choose from.
Behind the blackout triggered by Hurricane Fiona is a long-embattled history of Puerto Rico’s weak and outdated electrical grid – CNN
Less than two weeks after Hurricane Fiona made landfall on Puerto Rico, triggering an islandwide blackout for 1.5 million customers, power has been restored to 84% of residents, officials said.
Fiona hit the US territory as a Category 1 storm on September 18, dropping record rainfall, unleashing mudslides, flooding neighborhoods and leaving most of the island without power or water. The island's health department said at least 25 deaths may be linked to the storm.
Fiona made landfall almost exactly five years after 2017's Category 4 Hurricane Maria left many residents without electricity for months and delivered a blow from which the island has never fully recovered.
Power outages on the island have been a long-running source of frustration for Puerto Ricans who rely on a fragile and poorly-maintained power grid, with modernization efforts slow to materialize over several decades, first by a publicly-owned entity and today by a private caretaker.
The highly centralized grid has one major power line which, if compromised, brings down the entire system. The grid continues to suffer from a history of underinvestment and an outdated energy infrastructure, making it vulnerable to natural disasters and prone to extensive outages.
"Because the whole system hasn't been properly cared for or modernized, we are in a position where anytime a storm hits or there is some sort of natural disaster, the whole grid falls apart," said López Varona of the Center for Popular Democracy, an advocacy group organizing recovery efforts for Puerto Rico.
LUMA Energy, the Canadian-American power company responsible for power distribution and transmission on the island, took over management of the grid in 2021 from the government-owned Puerto Rico Electric Power Authority, known as PREPA, which has relied on fossil fuels to power the system.
LUMA, which landed a 15-year operation and maintenance deal, has faced growing criticism from activists and residents for steep billing costs and widespread blackouts, as well as calls for the government to terminate its contract.
PREPA, which is still in charge of power generation on the island, was created in 1941 as the islands sole electric utility.
It filed for bankruptcy in 2017 under Title III of the Puerto Rico Oversight, Management, and Economic Stability Act of 2016, which created a legal framework for restructuring the US territory's $74 billion debt. In September, mediation talks to restructure PREPA's $9 billion debt to bondholders ended without a deal.
Gov. Pedro Pierluisi gave his first public criticism of LUMA in August, saying he was not satisfied with its performance and that it must make changes to significantly improve its services.
The House of Representatives Committee on Energy and Commerce wrote to LUMA on September 27, expressing deep concerns over the power outages following Fiona. It said the company had not prepared the island's energy infrastructure to withstand a Category 1 hurricane.
"Ongoing outages and the complete disruption of power following Hurricane Fiona amplify concerns that LUMA has failed to adequately develop and maintain crucial electrical infrastructure in Puerto Rico despite its lucrative 15-year contract," three committee leaders wrote to LUMA Energy President and CEO Wayne Stensby.
In addition to LUMA's service being "riddled with chronic power outages and disruptions," the letter states, Puerto Ricans spend 8% of their income on electricity, compared to just 2.4% spent by the average citizen in the mainland United States.
In September 2020, the US Federal Emergency Management Agency (FEMA) dedicated more than $9.4 billion in funding for projects to transform the grid, and more funds have been added since, bringing the total to roughly $12 billion, the largest allocation of funds in FEMA's history.
FEMA approved an additional $107.3 million in funding earlier this year to modernize the grid in the aftermath of Maria, which includes 15 major projects to repair and restore the system.
Dr. Shay Bahramirad, senior vice president of Engineering, Asset Management and Capital Programs for LUMA, who is serving as incident commander for Fiona's impact, told CNN significant progress has been made through the projects funded by FEMA.
LUMA has made improvements during the first year of its contract, such as improved safety for employees, 30% fewer outages than PREPA, replacing more than 3,000 broken utility poles and connecting more than 25,000 customers to rooftop solar power energy, according to a progress update released in June.
Bahramirad said the poor design of the system, along with a lack of maintenance and management, has created cascading outages.
"There is a sense of urgency. I completely understand the frustration of our customers. However, it is equally important to do this right," Bahramirad said, adding that it is critical the system is rebuilt based on data science and sound engineering per industry standards.
"Hurricane Maria led to an overall vision for the need of reforming the energy system," said Alejandro Figueroa, director of infrastructure for the Financial Oversight and Management Board for Puerto Rico, which was established under the oversight legislation in 2016. He previously served as general counsel for the Puerto Rico Energy Bureau, the island's energy regulator.
The board was created to restructure PREPAs debt, putting it on a path of fiscal stability and overseeing efforts to make the system more resilient.
Figueroa pointed to two key causes of the system's problems: a revolving door of management at the publicly owned entity, which he said mimicked political cycles as new governors brought in new managers, and PREPA's rates, which remained unchanged for nearly two decades.
He explained that as inflation and recessions raised the cost of materials and labor, and the island's population dwindled, the company could not keep up with maintenance, office operations or its debt, and the system was ripe for disaster when Maria struck.
Under the restructuring, power distribution, transmission and maintenance were shifted to LUMA, a private operator, a move many activists have criticized, arguing it has only worsened the system's reliability.
"Why are there so many blackouts? Mostly, it's because of lack of investment in the system," Figueroa said. "You have frequent equipment breakdowns, you have a maintenance program that reacts to an outage and tries to fix it as quickly as possible instead of proactively identifying a weak point in the system and properly fixing it to reduce the chance of an outage over time."
Additionally, Figueroa said, the lack of investment by PREPA to maintain its distribution and transmission lines has made the system especially vulnerable to clashing with trees and other aspects of the tropical island's ecosystem.
"If you have a wooden pole instead of an aluminum pole, when a hurricane comes, the pole is more likely to fall and therefore causes longer outages than if you had strengthened the system," he said.
Since LUMA took control in 2021, outages happen less often than under PREPA but last longer by an average of 30 minutes, Figueroa pointed out, with a caveat: the island's energy regulator concluded PREPA grossly underreported its numbers for 2019 and 2020.
"A lot of people are frustrated because they perceive the service being delivered today has not improved, or in some cases they perceive that it's getting worse. But a lot of that has to do with the amount of investment and work that needs to be put into fixing and improving one of the United States' largest and most complex energy systems in existence," Figueroa said.
The process will unavoidably take time, he added, for customers to feel a meaningful improvement in the quality of services LUMA provides.
Varona of the Center for Popular Democracy said the organization believes LUMA's contract should be canceled, adding LUMA has not been an effective administrator of the grid.
The advocacy group has criticized the Financial Oversight and Management Board for using its power to "impose devastating austerity measures and negotiate unsustainable debt restructuring plans that enrich Wall Street and hurt Puerto Ricans," a report released last year said.
"There must be real investments in not only modernizing the grid, but decentralizing and replacing it with a renewable energy system," Varona added, echoing calls by activists and many residents to use FEMA's funds for a transition to solar power.
The Queremos Sol coalition has proposed a plan to be adopted by Puerto Rico's government to transform the grid, with the goal of achieving 50% renewable energy generation by 2035 and 100% by 2050.
The proposal provides a pathway to a "self-sufficient system relying on renewable resources, mainly solar," by using clean renewable technologies and inclusive structures and processes "meant to eliminate partisan political interference and systemic corruption," it states.
"We need to make sure that the dollars that were allocated to rebuild the electrical grid move faster so that we can rebuild the grid," Varona said. "And hopefully when we do, we rebuild it through a decentralized system that relies more on the biggest source of power that Puerto Rico has: 365 days of sun."
Getting Started with Pandas Cheatsheet – KDnuggets
Pandas is one of the most widely used and relied-upon libraries in the Python ecosystem, if not the most. Pandas is often the first stop for data scientists for data processing, analysis, and manipulation.
Do you have tabular data you want to process? There is basically no way around using Pandas, nor should you look for one. Pandas is rich in functionality, is incredibly powerful, and provides robust flexibility. Want to inspect data? Pandas can help. Need to query data? Pandas has you covered. Have to prepare tabular data for machine learning? Pandas is here for you.
KDnuggets' Abid Ali Awan further describes Pandas as follows:
Pandas is a flexible and easy-to-use tool for performing data analysis and data manipulation. It is widely used among data scientists for preparing data, cleaning data, and running data science experiments. Pandas is an open-source library that helps you solve complex statistical problems with simple and easy-to-use syntax.
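As a quick illustration of that cleaning workflow, the sketch below builds a small DataFrame with invented values and applies two common preparation steps: dropping a duplicate row and imputing a missing score. This is an editorial example, not taken from the cheatsheet itself.

```python
import pandas as pd
import numpy as np

# Hypothetical survey data: one duplicated row and a missing score
df = pd.DataFrame({
    "respondent": ["a", "b", "b", "c"],
    "score": [3.0, np.nan, np.nan, 5.0],
})

df = df.drop_duplicates()                             # drops the repeated "b" row
df["score"] = df["score"].fillna(df["score"].mean())  # impute with the mean (4.0)

print(df)
```

Two lines of cleaning like this, applied consistently, is often the difference between a dataset you can analyze and one you cannot.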
Do you know how to leverage Pandas in your projects? You really should! There are plenty of resources to help with this, but getting right to work and dirtying your hands is always a great idea. But where do you turn for a quick reference?
To help, KDnuggets has put together this fantastic Pandas primer, which covers some of the important first steps in your Pandas journey.
The following quick reference cheatsheet guide will provide you with the basic Pandas operations needed to start querying and modifying DataFrames, the basic data structure of the library. It will show you how to create DataFrames, import and export data to and from them, inspect the DataFrames, as well subset, query, and reshape the DataFrames. Once you master these introductory operations, you should be ready for more advanced Pandas tasks.
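To give a flavor of those operations, here is a brief sketch, using made-up data, of creating a DataFrame and then subsetting, querying, and reshaping it. The column names and values are invented for illustration.

```python
import pandas as pd

# Invented long-format sales data
df = pd.DataFrame({
    "city": ["Austin", "Austin", "Boston", "Boston"],
    "year": [2021, 2022, 2021, 2022],
    "sales": [100, 120, 90, 95],
})

# Subset: boolean filter plus column selection with .loc
recent = df.loc[df["year"] == 2022, ["city", "sales"]]

# Query: the same row filter written as an expression string
recent_q = df.query("year == 2022")

# Reshape: pivot the long table into a city-by-year grid
wide = df.pivot(index="city", columns="year", values="sales")
print(wide)
```

Each of these one-liners stands in for a family of related operations covered in the cheatsheet.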
Download the quick reference cheatsheet guide PDF here!
Learning Pandas is worth the effort. Beginners are often discouraged by the breadth of operations and the at-first intimidating syntax. But by taking it step by step, mastering the basics, and keeping a reference handy while you practice (like, say, this cheatsheet), you will be making progress with Python's most ubiquitous data processing library in no time.
Matthew Mayo (@mattmayo13) is a Data Scientist and the Editor-in-Chief of KDnuggets, the seminal online Data Science and Machine Learning resource. His interests lie in natural language processing, algorithm design and optimization, unsupervised learning, neural networks, and automated approaches to machine learning. Matthew holds a Master's degree in computer science and a graduate diploma in data mining. He can be reached at editor1 at kdnuggets[dot]com.
Five Best Career Choices For Certified Data Scientists – Spiceworks News and Insights
Data science has been around for a few decades, but it is only recently that companies realized they needed to leverage the approach to put enormous stacks of data to work in decision-making. Employment prospects are expanding as the profession diversifies and rises in prominence. Here are the top five data science careers in terms of roles and responsibilities, skills, certifications, job prospects, and average pay.
Data science has been around since the 1990s, but its importance was acknowledged only when organizations found themselves unable to utilize massive amounts of data for decision-making. Data science has aided organizations in expanding beyond the traditional boundaries of data consolidation. It gives enterprises access to more and more information and lets them perceive familiar things in a new light.
Are you a certified data scientist with a relevant bachelor's degree looking to move beyond your current entry-level job? Data science positions can encompass everything from business intelligence and machine learning to data architecture and big data management, all of which makes it extremely confusing when charting a career path for yourself. Which position will match your talents and goals? To help you decide, here we compare five top data science careers in terms of job objectives and responsibilities, skills, certifications, job outlook, and average salary.
See More: What Is Data Science? Definition, Lifecycle, and Applications
This career path is for those possessing both business acumen and consulting skills and an excellent understanding of data. As a BI analyst, your focus will be on analyzing your organization's existing data, such as monthly sales, quarterly expenses, or customer churn. You'll examine the data in terms of your organization's key performance indicators (KPIs) and business performance and recommend where improvements need to be made. In addition to mining your own company's data, you will be gathering data from various sources, including your competitors and industry data. Among the goals of your analysis will be to find ways your company can improve its market position, profit margins, and the efficiency of its systems, procedures and functions, as well as new ways to better its data collection and analysis methodologies.
Skill Sets
Certification: Certifications available to BI analysts include the Microsoft Certified: Data Analyst Associate and TDWI's Certified Business Intelligence Professional certification. Certifications in specific tools and languages like SAS are also available.
Job outlook: 11% growth rate through 2029 (U.S. Bureau of Labor Statistics (BLS))
Average annual salary: $66,000 to $79,000 (Glassdoor and PayScale)
Individuals pursuing this career path are more interested in using data to help companies make better decisions and improve their business practices than in creating the algorithms used for data discovery and acquisition. As a data analyst, you will use existing tools, systems and data sets to generate actionable insights from your organization's data. You will identify, extract, and analyze key business performance, risk and compliance data and present your findings to the organization's decision-makers. You will be called upon to write reports and present your findings. You will need to be able to recognize and understand the trends and insights that can be found in big data sets. Many data analysts move on to become data engineers, data architects, or data scientists after they have acquired over ten years of experience.
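For a concrete flavor of this kind of work, the hedged sketch below uses pandas (a common but by no means the only tool for the role) to summarize a small, invented set of performance figures the way an analyst might for a report:

```python
import pandas as pd

# Hypothetical performance data a data analyst might summarize for a report
orders = pd.DataFrame({
    "region": ["East", "East", "West", "West", "West"],
    "revenue": [200, 150, 300, 250, 100],
})

# Aggregate revenue per region and rank regions for the write-up
summary = (orders.groupby("region")["revenue"]
                 .agg(["sum", "mean"])
                 .sort_values("sum", ascending=False))
print(summary)
```

The real skill is less in the code than in deciding which aggregations answer the business question at hand.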
Skill sets
Certification: Online certification courses are available for data analytics, including certifications in business analytics, predictive analytics, and data visualization, such as those provided by 365 Data Science and Analytics Vidhya.
Job outlook: 22% growth rate through 2030 (BLS)
Average annual salary: $57,000 to $68,000 (Glassdoor and PayScale)
This career path is for those more interested in building and optimizing data systems than mining them for actionable insights. Unlike the other data science careers, data engineering focuses on the systems and hardware that facilitate an organization's data activities rather than data analysis. As a data engineer, you will use your analytical and decision-making skills to develop your organization's data infrastructure and build data pipelines that ensure the relevant departments and decision-makers can access the data they need. Your focus will be on collecting, managing, analyzing and visualizing large datasets, and ensuring that all big data applications are accessible and working properly. The data engineer career path could also be a stepping stone toward a career in machine learning engineering.
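As a rough illustration of what a data pipeline does, the sketch below uses only the Python standard library to extract records, validate them, and load the survivors into a database. The data and the validity rule are invented; production pipelines use dedicated orchestration and storage tools.

```python
import csv
import io
import sqlite3

# Extract: read raw CSV (an in-memory string stands in for a file)
raw = "id,amount\n1,10\n2,oops\n3,30\n"
rows = list(csv.DictReader(io.StringIO(raw)))

# Transform: drop records that fail a simple validity check
clean = [(int(r["id"]), int(r["amount"]))
         for r in rows if r["amount"].isdigit()]

# Load: write the validated records to a database table
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE payments (id INTEGER, amount INTEGER)")
con.executemany("INSERT INTO payments VALUES (?, ?)", clean)
total = con.execute("SELECT SUM(amount) FROM payments").fetchone()[0]
print(total)
```

Extract, transform, load: the same three-step shape recurs whether the pipeline is ten lines or ten services.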
Skill sets
Certification: Online certification courses are available, such as the Certified Data Management Professional (CDMP) certification offered by Data Management Association (DAMA) International, Google's Certified Professional in data engineering, IBM Certified Engineer in Big Data, the CCP Data Engineer from Cloudera, and the Microsoft Certified Solutions Expert certification in data management and analytics.
Job outlook: Between 22% and 33% through 2030 (BLS)
Average annual salary: $103,000 to $117,000 (Glassdoor and PayScale)
This career path is for analytical and creative individuals whose main interest lies in innovating and designing new solutions for storing and managing complex database systems. As a data architect, you will work with software designers and data engineers to develop databases from the ground up, including design patterns, data modeling, and database integration. You will also be charged with integrating, centralizing, protecting and maintaining all data sources within your company. You're responsible for how your organization's data is collected, stored and accessed.
Skill sets
Certification: Certified Data Management Professional (CDMP) from the Institute for Certified Computing Professionals.
Job outlook: 9% growth rate through 2031 (BLS)
Average annual salary: $104,000 to $125,000 (Glassdoor and PayScale)
This career path is for those excited about the patterns and trends they can learn from building predictive machine learning models. As a data scientist, you need an analytical mindset and a passion for seeing your work improve business outcomes. Individuals pursuing a data scientist career must be able to take on the roles of mathematician, computer scientist, and business strategist and convey their analyses to technical and non-technical stakeholders. You will build and deploy predictive models that go beyond discovering what has happened to predicting what will happen, using machine learning or deep learning techniques. This role requires you to be an excellent problem-solver and be willing to keep your skills current. Many start their data scientist careers as data architects or data analysts.
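As a minimal, illustrative example of the "what will happen" idea, the sketch below fits a least-squares line to invented historical data and uses it to forecast a future value. Real data science work involves far richer models, feature engineering, and validation.

```python
import numpy as np

# Invented history: ad spend (feature) and units sold (target)
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.1, 3.9, 6.1, 8.0])

# Fit a least-squares line: move from "what happened" to "what will happen"
X1 = np.hstack([X, np.ones_like(X)])          # append an intercept column
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)

forecast = float(np.array([5.0, 1.0]) @ coef)  # predict units at spend = 5
print(round(forecast, 2))                      # prints 10.0
```

Even a toy model like this shows the shape of the job: learn a relationship from past data, then use it to anticipate the future.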
Skill sets
Certification: Online certification courses are available for data science practitioners, such as those provided by 365 Data Science and Analytics Vidhya.
Job outlook: 27.9% growth rate through 2026 (BLS)
Average annual salary: $100,000 to $118,000 (Glassdoor and PayScale)
See More: Data Scientist: Job Description, Key Skills, and Salary in 2022
When choosing which career path you should follow, take into consideration the advice of Yvon Chouinard, billionaire and founder of outdoor apparel brand Patagonia:
"I regard purpose as being at the intersection of what the world needs, what you're good at, what you're passionate about, and how you can make money."
The world needs data scientists and is willing to compensate them well for their skills. Thus, choosing the direction of your career comes down to how your skill set matches that required by your chosen career and, more importantly, how passionate you are about it. You can always add to your skill set, but you can never recover the time spent in a job you are not passionate about.
Which data science career path would you like to pursue? Comment below or let us know on LinkedIn, Twitter, or Facebook. We would love to hear from you!
School of Data Science and Society gains momentum – The Well : The Well – The Well
After years of planning by hundreds of faculty and administrators across campus, Carolina's School of Data Science and Society (SDSS) is well underway.
The school's inaugural dean, Stan Ahalt, former director of the Renaissance Computing Institute, has been working tirelessly with an implementation leadership team. They are planning an official launch event later this semester.
"The School of Data Science and Society will leverage the talents of world-class faculty across disciplines and focus on the foundations and applications of data science to improve lives in North Carolina and across the globe," said Chancellor Kevin M. Guskiewicz. "The new school will also prepare students for a changing workplace and help attract and keep competitive employers in our state."
On Sept. 12, Guskiewicz joined Ahalt for a well-attended public conversation about expectations and future plans for the school at the UNC CURRENT ArtSpace + Studio. Also on hand to field questions from Ahalt were Interim Vice Chancellor for Research Penny Gordon-Larsen and Assistant Professor of Art History Kathryn Desplanque. The event was part of the popular Carolina Data Science Now series, co-sponsored by the new school and RENCI to illuminate data science research across disciplines.
"I love the 'and society' at the end of the school's name," Guskiewicz said when Ahalt asked about his expectations for the school. "We're going to bring the social sciences, the human dimension, into the school, in the way we not only capture data, analyze data, interpret data, but (consider) how society uses that data to make informed decisions."
During the event, Ahalt and guests discussed how the school will address the increasing need for data literacy across different industries and research fields, including hiring faculty with diverse research backgrounds, forming relationships with relevant industry partners, providing training on effective data science methods and building a curriculum that addresses critical topics such as cybersecurity, artificial intelligence, and data privacy and ethics.
And they spoke a lot about the school's role in aiding the sharing of data between experts at the University's numerous research centers and institutes.
"Data is really the language of collaboration," Gordon-Larsen said.
Gordon-Larsen's insight into the collaborative nature of data science resonated with those in attendance on Sept. 12 for good reason. Throughout the decade or so of planning that led to this moment, collaboration has been a core principle. The school's interdisciplinary approach will be reflected in a pan-University advisory council to be named later this fall.
Other next steps include:
Over the next few months, Ahalt and the implementation team (RENCI's Jay Aikat, Carolina geneticist Terry Magnuson and administrator Anna Rose Medley) will define research clusters based on the subject areas on which the school will concentrate. At first that might mean three to five research clusters, all interdisciplinary, involving people from different schools focused on a major challenge that needs to be solved.
The implementation team views establishing the curriculum as among the group's most important goals. They're well underway in building the online master's degree program. A minor in the College, introduced in the fall of 2021, has proven extremely popular, attracting more than 500 students in its first year. And the team is working on both the Bachelor of Science and Bachelor of Arts degrees in collaboration with the College and other schools.
The implementation team members and Ahalt welcome engagement with faculty, staff and students who have questions about the school's next steps and how they can get involved. Email sdss@unc.edu with questions, comments or suggestions.
The implementation team projects the first three years as a startup phase, hiring faculty and staff and launching degree programs, with the school reaching a steady state of operations in five to seven years. A brick-and-mortar location will come later.
The school's leadership team is in the process of building out infrastructure to support curriculum, academic and faculty affairs, student enrollment and student mentoring. Additionally, discussions are underway with units across campus to develop a strategic roadmap to promote data literacy and data-related research and training across the entire University.
"The collective expertise we need is already present on this campus. The SDSS will grow this pool of expertise and provide a focal point for collaborations," Ahalt said. "Carolina is a unique institution that practices the credo of collaboration across disciplines. We will focus on the science, methods and technologies that anchor data science as well as applications that have an impact on society."
Based on the interest shown so far by faculty, staff and students, the School of Data Science and Society will be a welcome addition to the University.
What: SDSS Distinguished Speaker Series with Dr. Phil Bourne, founding dean of the School of Data Science at the University of Virginia (seminar open to the public, reception to follow)
When: Wednesday, Sept. 28 at 12:20 p.m.
Where: Kerr Hall, room 2001
Data science is transformative, an easy assertion to make when one has worked in academia for several years. Nevertheless, the digital transformation of society cannot be denied. Academic data science initiatives around the country are responding and contributing to the transformation with new trainees, innovative research and local community action. As UNC launches its new School of Data Science and Society, we will spend time reflecting on the age-old question: What's in it for me?
Analytics and Data Science News for the Week of September 23; Updates from Count, Domino Data, Power BI, and More – Solutions Review
The editors at Solutions Review have curated this list of the most noteworthy analytics and data science news items for the week of September 23, 2022.
Keeping tabs on all the most relevant analytics and data science news can be a time-consuming task. As a result, our editorial team aims to provide a summary of the top headlines from the last month in this space. Solutions Review editors will curate vendor product news, mergers and acquisitions, venture capital funding, talent acquisition, and other noteworthy analytics and data science news items.
Count is a hyper-collaborative data platform that is putting collaboration and problem-solving at the heart of data analysis. Its flagship product, canvas, is an all-in-one data analysis and contextualization platform that helps teams join forces during the entire analytics workflow, accelerating data-driven decision-making across the whole business.
The new Domino integration with NVIDIA GPUs and NetApp data management and storage solutions will allow teams to run AI and machine learning workloads in either data centers or AWS without refactoring them. To support partners as they advance AI centers of excellence with full-stack solutions, Domino has also released an NVIDIA-validated reference architecture for integrating MLOps and on-premises NVIDIA DGX systems.
Domo released the 10th edition of its Data Never Sleeps (DNS) infographic, the annual glimpse at how much data is generated on the internet every minute by the ways people interact online. Over the last decade of chronicling the world's data usage, Domo finds that the use of services such as Instagram, YouTube, Amazon, and Venmo among others has increased hundreds and even thousands of percentage points in some cases.
Matik Team enables individuals and small teams to automate the creation of any presentation that needs to be personalized regularly or updated frequently. Matik Team automates the process of creating these data-driven presentations: all users have to do is provide Matik with a few inputs, like who the presentation is for or a specific date range, and Matik will generate a presentation natively in Google Slides or Microsoft PowerPoint that is ready for use.
Horizontal Fusion is the term used to highlight the approach of fusing multiple smaller data source queries together into a larger data source query. Fewer data source queries mean fewer roundtrips and fewer expensive scans over large data sources, which ultimately results in sizeable DAX performance gains plus reduced processing demand at the data source. Not only do DAX queries run faster with Horizontal Fusion, especially in DirectQuery mode, but scalability also increases.
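Horizontal Fusion itself is an optimization inside the DAX engine, but the underlying idea, computing several measures in one scan of the source rather than one scan per measure, can be illustrated with a pandas analogy. This is an editorial illustration of the concept, not how the Power BI engine is implemented.

```python
import pandas as pd

sales = pd.DataFrame({
    "region": ["East", "West", "East", "West"],
    "amount": [10, 20, 30, 40],
})

# Unfused: three separate passes over the source, one per measure
total = sales["amount"].sum()
count = sales["amount"].count()
peak = sales["amount"].max()

# "Fused": a single pass computes all three measures together
fused = sales["amount"].agg(["sum", "count", "max"])
print(fused)
```

The fewer times the engine has to touch the underlying data source, the cheaper the query, which is exactly the gain Horizontal Fusion targets.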
The aim is to offer a unique webinar featuring an inside look at the vendor's new platform for automation and remediation, which enables visibility across all environments so IT teams can continuously improve the digital workplace by optimizing productivity and cost. Alongside a live product demo, the Spotlight event will also feature an interview about the product with a member of Nexthink's team.
Striim announced at the Big Data LDN conference and expo that it has expanded its worldwide reach and is making its fully managed data streaming service Striim Cloud available to the United Kingdom and Europe. Databricks Technology Partners integrate their solutions with Databricks to provide complementary capabilities for ETL, data ingestion, business intelligence, machine learning, and governance.
ThoughtSpot has announced the opening of a new office in Trivandrum, the company's third R&D center in India. The new investment will fuel continued product innovation for the company's Modern Analytics Cloud vision and product line. The investments ThoughtSpot is making in India over the next five years are part of the company's long-term strategy in the market and a natural evolution of activity to date.
You'll master the skills necessary to become a successful Data Scientist. You'll work on projects designed by industry experts, and learn to run data pipelines, design experiments, build recommendation systems, and deploy solutions to the cloud. It is recommended that students be familiar with machine learning concepts, Python programming, probability, and statistics.
For consideration in future analytics and data science news roundups, send your announcements to the editor: tking@solutionsreview.com.
Tim is Solutions Review's Editorial Director and leads coverage on big data, business intelligence, and data analytics. A 2017 and 2018 Most Influential Business Journalist and 2021 "Who's Who" in data management and data integration, Tim is a recognized influencer and thought leader in enterprise business software. Reach him via tking at solutionsreview dot com.
W&M explores creation of a computing and data science school – College of William and Mary
William & Mary is exploring the possibility of establishing a new academic unit in computing and data science, Provost Peggy Agouris told members of the Board of Visitors on Thursday.
The effort springs from a surge in student interest in applied science, computer science and data science at William & Mary, and a commitment from the university in its strategic plan to support anticipated needs in the Virginia workforce.
To meet anticipated growth, Agouris has formed an exploratory design team with representatives from all five W&M schools, while three core departments are working to develop a model for the proposed academic unit, which could potentially be a separate school.
"It's critical to evaluate how these growing units can be best organized because it can have serious implications on our ability to provide resources for the education that W&M is offering across disciplines and to attract and expand key partnerships," said Agouris, who presented the effort during the Board of Visitors Committee on Academic Affairs meeting in the W&M Alumni House. "The right organizational structure can re-imagine our value in the computational and data space. It can foster important relationships at the state and federal levels, with other institutions, with friends and donors, and with like-minded organizations that might be new partners to us. It's my hope that it deepens our strengths and expands our horizons."
The university has experienced an explosion of interest in the computational sciences in recent years, and computational skills are also increasingly used throughout other disciplines. Over the last 10 years, interest in computational fields has more than tripled at W&M, going from 211 declared majors in just two fields (computer science and math) to 738 in six (computer science, data science, math, computational and applied mathematics and statistics, business analytics (data science), and business analytics (supply chain)).
The growth in those fields reflects an overall increase in student interest in STEM fields at W&M. From 2011 to 2022, the number of graduates in STEM disciplines at W&M has more than doubled, growing from 284 to 693. Looking just at the past two years, the number of computer science degrees that the university conferred went from 78 to 93. In the data science program, which just began in 2020, the number of degrees conferred went from eight in 2021 to 35 in 2022.
At the same time, data has become increasingly important to the university overall. With data as one of four initiatives outlined in the Vision 2026 strategic plan, William & Mary has committed to expanding its presence and influence in computational and data sciences consistent with student demand and Virginia workforce needs.
"This school represents an opportunity to boldly grow the community of William & Mary in new directions, serve new student populations and showcase the incredible talent of our teachers and researchers to new domestic and international audiences," said Dan Runfola, assistant professor of applied science. "By integrating our computational activities into a new unit, we recognize the unique challenges and opportunities these rapidly evolving fields present and gain the ability to nimbly respond to new opportunities without disrupting our ability to offer a world-class liberal arts education."
Formal discussions about a possible computing and data science unit at W&M started in spring 2022 and developed organically, Agouris said, with faculty members initially raising the idea. After an ad hoc design team with representatives from the university's arts and sciences, business, education, law and marine science schools was formed to explore the possibilities, its members began conducting research on similar structures at other universities and considering what might make sense for William & Mary.
Faculty leaders from the departments of computer science, applied science and the data science program are now working on drafting a model based on that research. This semester, the model will be refined as feedback is received from various stakeholders, including the Faculty Assembly.
The model and action plan are expected to be finalized in the spring, with a goal of submitting them to the Board of Visitors and the State Council on Higher Education in Virginia in the fall of 2023.
The exploratory effort is part of William & Mary's continued work to increase its offerings in the computational sciences as career opportunities and student interest grow.
Currently, the university offers bachelor's, master's and doctoral degrees in computer science as well as a computer science minor. In 2020, W&M began offering a bachelor's degree in data science, and subsequently created the popular Jump Start Data Science summer program that can lead to an accelerated minor. The Department of Applied Science has a well-established doctoral program that also offers a data science concentration. Applied science also offers an undergraduate minor and master's degree options.
Increasing the number of students with data science and computational skills is also a focus of the federal and state government. In 2019, the university joined the commonwealth's Tech Talent initiative, which seeks to increase the number of Virginians with computer science-related degrees. The Tech Talent Investment Program provides funding to participating Virginia universities and colleges to help expand that tech talent pipeline.
While preparing interested students to enter that pipeline is one of the key drivers for exploring a new computing and data science unit at W&M, Agouris said it is all still in the early phases and that the university is doing its due diligence in seeing what might be the best fit for the university.
"We want to make sure this makes sense for our university based on the growth we are experiencing, the associated demands, and also what we are hearing from our academic community," said Agouris.
Staff, University News & Media
How to build an effective DataOps team – TechTarget
A DataOps strategy is heavily reliant on collaboration as data flows between managers and consumers throughout the business. Collaboration is essential to DataOps success, so it's important to start with the right team to drive these initiatives.
It's natural to think of DataOps as simply DevOps for data -- not quite. It would be more accurate to say that DataOps is trying to achieve for data what DevOps achieves for coding: a dramatic improvement in productivity and quality. However, DataOps has some other problems to solve, in particular how to maintain a mission-critical system in continuous production.
The distinction is important when it comes to thinking about putting together a DataOps team. If the DevOps approach is a template, with Product Managers, Scrum Masters and Developers, the focus will end up on delivery. DataOps also needs to focus on continuous maintenance and requires some other frameworks to work with.
One key influence on DataOps has been Lean manufacturing techniques. Managers often use terms taken from the classic Toyota Production System, which has been much studied and imitated. Terms like "data factory" also come up when talk turns to data pipelines in production.
This approach requires a distinctive team structure. Let's first look at some roles within a DataOps team.
The roles described here are for a DataOps team deploying data science in mission-critical production.
What about teams who are less focused on data science? Do they need DataOps, too, for example, for a data warehouse? Certainly, some of the techniques may be similar, but a traditional team of extract, transform and load (ETL) developers and data architects is probably going to work well. A data warehouse, by its nature, is less dynamic and more constant than an Agile pipelined data environment. The following DataOps team roles handle the rather more volatile world of pipelines, algorithms and self-service users.
Nevertheless, DataOps techniques are becoming more relevant as data warehouse teams push to be ever more Agile, especially with cloud deployments and data lakehouse architectures.
Let's start with defining the roles required for these new analytics techniques.
Data scientists do research. If an organization knows what it wants and just needs someone to implement a predictive process, it should hire a developer who knows their way around algorithms. The data scientist, on the other hand, explores for a living, discovering what is relevant and meaningful as they go.
In the course of exploration, a data scientist may try numerous algorithms, often in ensembles of diverse models. They may even write their own algorithms.
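To make the ensemble idea concrete, here is a minimal sketch of combining diverse models by majority vote. The models and thresholds are purely illustrative assumptions, not anything from a specific project; in practice a data scientist would combine trained models, not hand-written rules.

```python
# Majority-vote ensemble sketch. Each "model" is just a callable that
# returns a class label (0 or 1) for an input -- hypothetical stand-ins
# for trained models of different kinds.

def rule_based(x):
    # Hypothetical hand-written business rule: flag large values.
    return 1 if x > 10 else 0

def threshold_model(x):
    # Hypothetical learned threshold from a simple model.
    return 1 if x > 7 else 0

def parity_model(x):
    # A deliberately diverse (and weak) third opinion.
    return 1 if x % 2 == 0 else 0

def ensemble_predict(x, models):
    # Majority vote across the individual models.
    votes = sum(m(x) for m in models)
    return 1 if votes > len(models) / 2 else 0

models = [rule_based, threshold_model, parity_model]
print(ensemble_predict(12, models))  # all three vote 1 -> 1
print(ensemble_predict(3, models))   # all three vote 0 -> 0
```

The point of diversity is that models which fail in different ways can correct one another; a vote among three copies of the same model adds nothing.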
The key attributes for this role are restless curiosity and an interest in the domain, as well as technical insight -- especially in statistics -- to understand the significance of what they discover and the real-world impact of their work.
This diligence matters. It is not enough to find one good model and stop there, because business domains evolve rapidly. And while not everyone works in an area with compelling ethical dilemmas, data scientists in every domain sooner or later come across issues of personal or commercial privacy.
This is a technical role, but don't overlook the human side, especially if the organization is hiring only one data scientist. A good data scientist is a good communicator who can explain findings to a nontechnical audience, often executives, while being straightforward about what is and is not possible.
Finally, the data scientist, especially one working in a domain which is new to them, is unlikely to know all the operational data sources -- ERP, CRM, HR systems and so on -- but they certainly need to work with the data. In a well-governed system, they may not have direct access to all the unprocessed data of an enterprise. They need to work with other roles who understand the source systems better.
Generally, it is the data engineer who moves data between operational systems and the data lake -- and, from there, between zones of the lake such as raw data, cleansed and production areas.
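The zone-to-zone movement can be sketched in a few lines. This is a hedged illustration only: the zone names follow the raw/cleansed/production convention mentioned above, but the records and cleansing rules are invented for the example.

```python
# Sketch of promoting records from a raw zone to a cleansed zone.
# The records and validation rules are hypothetical.

raw_zone = [
    {"customer_id": " 42 ", "email": "ANA@EXAMPLE.COM"},
    {"customer_id": "", "email": "bad-row"},          # fails validation
    {"customer_id": "7", "email": "bo@example.com"},
]

def cleanse(record):
    # Illustrative cleansing rules: trim whitespace, lowercase emails,
    # drop records with no customer_id.
    cid = record["customer_id"].strip()
    if not cid:
        return None
    return {"customer_id": cid, "email": record["email"].strip().lower()}

# Promote raw -> cleansed, keeping only records that pass validation.
cleansed_zone = [c for r in raw_zone if (c := cleanse(r)) is not None]
print(len(cleansed_zone))  # 2 records survive
```

In a real lake the zones would be storage locations rather than in-memory lists, but the shape of the work -- validate, standardize, promote -- is the same.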
The data engineer also supports the data warehouse, which can be a demanding task in itself as they must maintain history for reporting and analysis while providing for continuous development.
At one time, the data engineer may have been called a data warehouse architect or ETL developer, depending on their expertise. But data engineer is the new term of art, and it captures better the operational focus of the role in DataOps.
Another engineer? Yes, and one focused on operations. But the DataOps engineer has a different area of expertise: supporting the data scientist.
The data scientist's skills focus on modeling and deriving insight from data. However, what works well on the workbench can be difficult or expensive to deploy into production. Sometimes an algorithm runs too slowly against a production data set, or uses too much compute or storage to scale effectively. The DataOps engineer helps here by testing, tweaking and maintaining models for production.
As part of this, the DataOps engineer knows how to keep a model scoring accurately enough over time as data drifts. They also know when to retrain the model or reconceptualize it, even if that work falls to the data scientist.
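One common way to operationalize that judgment is to track rolling accuracy and flag the model for retraining when it degrades. The sketch below is an assumption-laden illustration, not the article's method; the window size and threshold are placeholders a real team would tune to its domain.

```python
from collections import deque

class DriftMonitor:
    """Flag a model for retraining when rolling accuracy drops.

    The window size and threshold are illustrative assumptions;
    real values depend on the domain and traffic volume.
    """
    def __init__(self, window=100, threshold=0.8):
        # deque(maxlen=...) keeps only the most recent outcomes.
        self.outcomes = deque(maxlen=window)
        self.threshold = threshold

    def record(self, prediction, actual):
        # Store whether each scored prediction turned out correct.
        self.outcomes.append(prediction == actual)

    def needs_retraining(self):
        # Only judge once the window is full, to avoid noisy early flags.
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        return sum(self.outcomes) / len(self.outcomes) < self.threshold

monitor = DriftMonitor(window=10, threshold=0.8)
# 7 correct and 3 incorrect outcomes -> rolling accuracy of 0.7.
for predicted, actual in [(1, 1)] * 7 + [(1, 0)] * 3:
    monitor.record(predicted, actual)
print(monitor.needs_retraining())  # 0.7 < 0.8 -> True
```

Whether the flag triggers an automated retrain or a handoff to the data scientist for reconceptualizing the model is exactly the kind of boundary the two roles negotiate.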
The DataOps engineer keeps models running within budget and resource constraints that they likely understand better than anyone else on the team.
In a modern organization, the data analyst may have a wide range of skills, ranging from technical knowledge to aesthetic understanding of visualization to so-called soft skills, such as collaboration. They are also less likely to have had much technical training compared to, say, a database developer.
Their data ownership -- and influence -- may depend less on where they sit in the organizational hierarchy and more on their personal commitment and their willingness to take ownership of a problem.
These people are in every department. Look around. Someone is "the data person," who, regardless of job title, knows where the data is, how to work with it and how to present it effectively.
To be fair, this role is becoming more formalized today, but there are still a large number of data analysts who have grown into the role from a business rather than technical background.
Is the executive sponsor a member of the team? Perhaps not directly, but the team won't get far without one. A C-level sponsor can be critical for aligning the specific work of a DataOps team with the strategic vision and the tactical decisions of the enterprise. They can also ensure the team has budget and resources with long-term goals in mind.
Few organizations can, or will, immediately stand up a team of four or more just for DataOps. The capabilities and value of the team must grow over time.
How, then, should a team grow? Who should be the first hire? It all depends on where the organization is starting from. But there needs to be an executive sponsor from day zero.
It is unlikely the team is starting from scratch. Organizations need DataOps precisely because they already have work in progress that needs to be better operationalized. They may have started to look at DataOps because they have data scientists stretching the boundaries of what they can manage today.
If so, the first hire should be a DataOps engineer because it is their role to operationalize data science and make it manageable, scalable and comprehensive enough to be mission-critical.
On the other hand, it is possible an organization has a traditional data warehouse, and there are data engineers involved and data analysts downstream from them. In this case, the first DataOps team position would be a data scientist for advanced analysis.
An important question is whether to create a formal organization or a virtual team. This is another important reason for the executive sponsor, who may have a lot of say in the answer. Many DataOps teams start as virtual groups who work across organizational boundaries to ensure data and data flow are reliable and trustworthy.
Whether loosely or tightly organized, these discrete disciplines grow in strength and impact over time, and their strategic direction and use of resources will cohere into a consistent framework for exploration and delivery. As this happens, the organization can add more engineering for scale and governance and more scientists and analysts for insight. At this point, wherever the organization started, the team is likely to become more formally organized and recognized.
It's an exciting process. The DataOps team can make the difference between an enterprise that occasionally does cool things with data and an enterprise that runs efficiently and reliably on data, analytics and insight.