Category Archives: Data Science

Data Science is Here to Spearhead Organizations Through Tough Competition – Analytics Insight

Data science, as a field, started getting recognition in the early 2000s. But it took an entire pandemic to create the demand it now has. Organizations that were reluctant to embrace digital transformation and use modern technologies like data science are accelerating the rate at which they are adopting this analytical technology. It would not be wrong to say that businesses across industries, be it manufacturing, automotive, retail, or pharmaceutical, are leveraging the capabilities of data science to gain a competitive edge. This increasing demand is resulting in a flood of data science jobs.

Those with some knowledge of the field are familiar with the fact that data science professionals are of utmost importance to organizations. Data engineers, data analysts, and data scientists are the roles flooding job portals. As technology develops each year, the skills required of data science professionals also change with time and advancements. For future generations who will be part of dynamic workforces, keeping up with the latest tech trends, in this case data science, is crucial.

From an organization's point of view, data science brings many advantages to the table. Firstly, it helps businesses make better decisions using data-driven approaches. It is a data professional's responsibility to be a trusted advisor to the organization's top management and present the necessary data and metrics that will help teams make informed decisions. Not only that, data science capabilities will also help businesses predict favorable outcomes and forecast potential growth opportunities.

At the end of the day, the main goal of any organization is to earn profits. A data scientist puts his/her skills to use to explore the available data, analyze which business processes work and which don't, and prescribe strategies that will improve overall performance and customer engagement and result in greater ROI. A data professional will also help employees understand their tasks, improve on them, and help teams devote their efforts to tasks that will make a substantial difference.

For every company that sells products and services, it is crucial to ensure its solutions reach the right audience. Instead of relying on assumptions, data science helps companies identify the right target audience. With a thorough analysis of the company's data sources and in-depth knowledge about the company and its goals, data science skills assist teams in targeting the right audience and refining existing strategies for better sales. A data professional's knowledge of the dynamic market, gained through data analysis, can also help in product innovation.

Above everything else, efficient and skilled employees make or break an organization. Data scientists also help recruiters source the right profiles from the available talent. Drawing on social media, corporate databases, and job portals, data professionals should possess the skills to sift through the data points and identify the right candidates for the right roles.

With these advantages and many more, data science is an invaluable asset for organizations. Hence, this field is a lucrative career option that the future generation must prepare for, if they want to make their place in the tech industry. In this magazine edition, Analytics Insight is putting a spotlight on the most prominent analytics and data science institutes that are guiding young tech leaders with the right skills to ace the field of data science. With digital transformation becoming an essential part of every established and upcoming business, the demand for data science professionals is only going to grow.

Analytics Insight is an influential platform dedicated to insights, trends, and opinions from the world of data-driven technologies. It monitors developments, recognition, and achievements made by Artificial Intelligence, Big Data and Analytics companies across the globe.

The Biggest Data Science News Items During the First Half of 2021 – Solutions Review

Our editors curated this list of the biggest data science news items during the first half of 2021, as highlighted on Solutions Review.

Data science is one of the fastest-growing fields in America. Organizations are employing data scientists at a rapid rate to help them analyze increasingly large and complex data volumes. The proliferation of big data and the need to make sense of it all has created a vortex where all of these things exist together. As a result, new techniques, technologies, and theories are continually being developed to run advanced analysis, and they all require development and programming to ensure a path forward.

Part of Solutions Review's ongoing analysis of the big data marketplace includes covering the biggest data science news stories that have the greatest impact on enterprise technologists. This is a curated list of the most important data science news stories from the first half of 2021. For more on the space, including the newest product releases, funding rounds, and mergers and acquisitions, follow our popular news section.

Databricks raised $1 billion in Series G funding in response to the rapid adoption of its unified data platform, according to a press release. The capital injection, which follows a raise of $400 million in October 2019, puts Databricks at a $28 billion valuation. The round was led by new investor Franklin Templeton with inclusion from Amazon Web Services, CapitalG, and Salesforce Ventures. The funding will enable Databricks to move ahead with additional product innovations and scale support for the lakehouse data architecture.

In a media statement, Databricks co-founder and CEO Ali Ghodsi said: "We see this investment and our continued rapid growth as further validation of our vision for a simple, open and unified data platform that can support all data-driven use cases, from BI to AI. Built on a modern lakehouse architecture in the cloud, Databricks helps organizations eliminate the cost and complexity that is inherent in legacy data architectures so that data teams can collaborate and innovate faster. This lakehouse paradigm is what's fueling our growth, and it's great to see how excited our investors are to be a part of it."

OmniSci recently announced the launch of OmniSci Free, a full-featured version of its analytics platform available for use at no cost. OmniSci Free will enable users to utilize the full power of the OmniSci Analytics Platform, which includes OmniSciDB, the OmniSci Render Engine, OmniSci Immerse, and the OmniSci Data Science Toolkit. The solution can be deployed on Linux-based servers and is generally adequate for datasets of up to 500 million records. Three concurrent users are permitted.

In a media statement on the news, OmniSci co-founder and CEO Todd Mostak said: "Our mission from the beginning has been to make analytics instant, powerful, and effortless for everyone, and the launch of OmniSci Free is our latest step towards making our platform accessible to an even broader audience. While our open source database has delivered significant value to the community as an ultra-fast OLAP SQL engine, it has become increasingly clear that many use cases heavily benefit from access to the capabilities of our full platform, including its massively scalable visualization and data science capabilities."

DataRobot recently announced the release of DataRobot 7, the latest version of its flagship AI and machine learning platform. The release is highlighted by MLOps remote model challengers, which allow customers to challenge production models no matter where they are running, regardless of the framework or language in which they were built. Additionally, DataRobot 7 offers a choose-your-own-forecast-baseline capability, which lets users compare the output of their forecasting models with predictions from DataRobot Automated Time Series.

In a media statement, DataRobot SVP of Product Nenshad Bardoliwalla said: "Through ongoing engagement with our customers, we've developed an intimate understanding of the challenges they face, as well as the opportunities they have, with AI. Our latest platform release has been specifically designed to help them seize the transformative power of AI and advance on their journeys to becoming AI-driven enterprises."

Tableau announced the release of Tableau 2021.1, the latest version of the company's flagship business intelligence and data analytics offering. The release is highlighted by the introduction of business science, a new class of AI-powered analytics that enables business users to take advantage of data science techniques. Business science is delivered via Einstein Discovery. Other key additions aim to simplify analytics at scale and expand the Tableau ecosystem to help different user personas understand their environment.

In a media statement about the news, Tableau Chief Product Officer Francois Ajenstat said: "Data science has always been able to solve big problems, but too often that power is limited to a few select people within an organization. To build truly data-driven organizations, we need to unlock the power of data for as many people as possible. Democratizing data science will help more people make smarter decisions faster."

Dataiku recently announced the release of Dataiku 9, the latest version of the company's flagship data science and machine learning platform. The release is highlighted by best practice guardrails to prevent common pitfalls, model assertions to capture and test known use cases, what-if analysis to interactively test model sensitivity, and a new model fairness report to augment existing bias detection methods when building responsible AI models. Dataiku raised $100 million in Series D funding last summer.

The release notes add: "For business analysts engaged in data preparation tasks, the highly requested fuzzy join recipe makes it easy to join close-but-not-equal columns, an updated formula editor requires less time to learn, and updated date functions simplify time and date preparation." The release also touts support for the Dash application framework.

Domino Data Lab recently announced a series of new integrated solutions and product enhancements with NVIDIA, according to a press release. The technologies were unveiled at the NVIDIA GTC Conference. Domino's latest release is highlighted by Domino's availability for the NetApp ONTAP AI Integrated Solution, which upgrades data science productivity with software that streamlines the workflow while maximizing infrastructure utilization. As such, Domino has been tested and validated to run on the packaged offering and is available via the NVIDIA Partner Network.

The new platform automatically creates and manages multi-node clusters and releases them when training is done. Domino currently supports ephemeral clusters using Apache Spark and Ray, and will be adding support for Dask in a product release later in the year. With Domino's support, administrators can also divide a single NVIDIA DGX A100 GPU into multiple instances or partitions (Multi-Instance GPU, or MIG) to support a variety of users. According to the announcement, this allows 7x the number of data scientists to run a Jupyter notebook attached to a single GPU versus without MIG.

Explorium recently announced that it has secured $75 million in Series C funding, according to a press release on the company's website. The funding is Explorium's second round in the last nine months and brings the company's total capital raised to more than $125 million since its founding in 2017. Explorium doubled its customer base during the last 16 months.

In a media statement on the news, Explorium CEO Maor Shlomo said: "As we saw last year, machine learning models and tools for advanced analytics are only as good as the data behind them. And often that data is not sufficient. We're addressing a business-critical need, guiding data scientists and business leaders to the signals that will help them make better predictions and achieve better business outcomes."

Alteryx recently announced product enhancements across its line of data science and analytics tools, as well as the release of Alteryx Machine Learning. The company broke the news at Alteryx Inspire Virtual, its annual user conference. Currently available in early access, Alteryx Machine Learning provides guided, explainable, and fully automated machine learning (AutoML). Key features include feature engineering and deep feature synthesis, automated insight generation, and an Education Mode that offers data science best practices.

In a media statement on the news, Alteryx Chief Product Officer Suresh Vittal said: "We are investing deeply in analytics and data science automation in the cloud, starting with Designer Cloud, Alteryx Machine Learning and AI introduced today. We remain focused on being the best at democratizing analytics so millions of people can leverage the power of data."

Tim is Solutions Review's Editorial Director and leads coverage on big data, business intelligence, and data analytics. A 2017 and 2018 Most Influential Business Journalist and 2021 "Who's Who" in data management and data integration, Tim is a recognized influencer and thought leader in enterprise business software. Reach him via tking at solutionsreview dot com.

Behind the scenes: A day in the life of a data scientist – TechRepublic

Helping others use data is "like giving them a superpower," says the senior data scientist at an ag-tech startup, Plenty.

Data Scientist Dana Seidel at work.

Image: Dana Seidel

Dana Seidel was "traipsing around rural Alberta, following herds of elk," trying to figure out their movement patterns, what they ate, what brought them back to the same spot, when she had an epiphany: Data could help answer these questions.

At the time, enrolled in a master's program at the University of Alberta, she was interested in tracking the movement of deer, elk, and other central foragers. Seidel realized that she could use the math and ecology background she had built at Cornell University to help evaluate a model that could answer these questions. She continued her studies, earning a Ph.D. at the University of California, Berkeley, related to animal movement and the spread of diseases, which she monitored, in part, by collecting data from collars. Kind of like a Fitbit, Seidel explained, "tracking wherever you go throughout the day," yielding GPS data points that could connect to land data, such as satellite images, offering a window into the movement of this wildlife.

Seidel, 31, has since transitioned from academia to the startup world, working as the lead data scientist at Plenty, an indoor vertical farming company. Or, as she would call herself, a "data scientist who is interested in spatial-temporal time series data."

Seidel was born in Tennessee, but grew up in Kansas. She's 31, which she said is "old" for the startup world. As someone who spent her twenties "investing in one career path and then switching over," she doesn't necessarily have the same industry experience as her colleagues. So while she is grateful for her experience, a degree is not a necessity, she said.

"I'm not sure that my Ph.D. helps me in my current job," she said. One area where it did help her, however, was by giving her access to internshipsat Google Maps, in Quantitative Analysts and RStudiowhere she gained experience in software development.

"But I don't think writing more papers about anthrax and zebras really convinced anybody that I was a data scientist," she said.

Seidel learned the programming language R, which she loved, in college, and in her master's program started building databases. She said she "generally taught myself alongside these courses to use the tools." The biggest skill of being a data scientist "may very well just be knowing how to Google things," she said. "That's all coding really is, creative problem-solving."

The field of data science is about a decade old, Seidel said; previously, it was statistics. "The idea of having somebody who has a statistics background or understands inferential modeling or machine learning has existed for a lot longer than we've called it a data scientist," she said, and a master's in data science didn't exist until the last year of her Ph.D.

Additionally, "data scientist" is very broad. Among data scientists, many different jobs can exist. "There are data scientists that focus very much on advanced analytics. Some data scientists only do natural language processing," she said. And the work emcompasses many diverse skills, she said, including "project management skills, data skills, analysis skills, critical thinking skills."

Seidel has mentored others interested in getting into the field, starting with a weekly Women in Machine Learning and Data Science coffee hour at Berkeley. The first piece of advice? "I would tell them: 'You have skills,'" Seidel said. Many young students, especially women, don't realize how much they already know. "I don't think we communicate often to ourselves in a positive way, all of the things we know how to do, and how that might translate," she said.

For those interested in transitioning from academia to industry, she also advises getting experience in software development and best practices, which may have been missing from formal education. "If you understand things like standard industry practices, like version control and git and bash scripting a little bit so that you have some of that language, some of that knowledge, you can be a more effective collaborator." Seidel also recommends learning SQL, one of the easiest languages in her opinion, which she calls "the lingua franca of data analytics and data science. Even though I think it's something you can absolutely learn on the job, it's going to be the main way you access data if you're working in an industry data science team. They're going to have large databases with data and you need a way to communicate that," she said. She also recommends building skills through things like the 25-day Advent of Code, and other ways to demonstrate a clean coding style. "That takes a good amount of legwork, and until you have your industry job, it's unpaid legwork, but it can really help make you stand out," she said.
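
As a rough illustration of the SQL point above, the sketch below pulls a simple aggregate out of a database from Python. The database, table, and column names are hypothetical, and sqlite3 stands in for whatever warehouse an industry team would actually use.

```python
# A minimal, hypothetical example of querying a team's database with SQL.
import sqlite3

conn = sqlite3.connect("farm_metrics.db")  # illustrative database name
rows = conn.execute(
    """
    SELECT device_id, AVG(temperature) AS avg_temp
    FROM sensor_readings
    WHERE reading_date >= DATE('now', '-7 days')
    GROUP BY device_id
    ORDER BY avg_temp DESC
    """
).fetchall()

for device_id, avg_temp in rows:
    print(device_id, round(avg_temp, 1))

conn.close()
```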

On a typical morning at her current job, working from home, Seidel is drinking coffee and answering Slack messages in her home office/quilting studio. She checks to see if there are questions about the data, something wrong with the dashboard, or a question about plant health. Software engineers working on the data may also have questions, she said. There's often a scrum meeting in the morning, and they operate with sprint teams (meeting every two weeks) and agile workflows.

"I have a pretty unique position where I can float between various data scrums we do, we have a farm performance scrum versus a perception team or a data infrastructure team," Seidel explained. "I can decide: What am I going to contribute to in this sprint?" Twice a week there's a leadership meeting, where she is on the software and data leads, and she can listen in on what else is being worked on, and what's coming up ahead, which she said is one of the most important meetings for her, since she can hear directly "when a change is happening on the software side or there's a new requirement coming out of ops for a software or for software or for data that's coming."

In the afternoon, she has a good block of development time, "to dig into whatever issue I'm working on that sprint," she said.

Seidel manages the data warehouse and ensures data streams are "being surfaced to end users in core data models." Last week, she worked on the farm performance scrum, "validating measurements that are coming out of the farm, thinking ahead about the new measurements we need to be collecting, and thinking about the measurements that we have in our South San Francisco farm, measurements streaming in from a couple of thousand devices." She needs to ensure accurate measurement streams, which come from everything from temperature to irrigation, to ensure plant health, and answer questions like: "Why did last week's arugula do better than this week's arugula?"

The primary task is to know if they're measuring the right thing, and to push back and say, "Oh, OK, what is it that you want that data to be explaining? What is the question you're asking?" She needs to stay a few steps ahead, she said, and ask: "What are all the new data sources that I need to be aware of that we need to be supporting?"

The toughest part of the job? "I really hate not having the answer. I hate having to say, 'No, we don't measure that thing yet.' Or, 'We'll have that in the next sprint.'" Balancing giving people the answers with giving them tools to access the answers themselves is a daily challenge, she said, with the ultimate goal of making data accessible.

And saying, "Oh, yes, that data is there and it's this simple query," or, "Oh, have you seen this tool I built a year ago that can solve this problem?" is really gratifying.

"Helping someone learn how to ask and answer questions from data is like giving them a superpower," Seidel said.

Scaling AI and data science: 10 smart ways to move from pilot to production – VentureBeat

Presented by Intel

"Fantastic! How fast can we scale?" Perhaps you've been fortunate enough to hear, or ask, that question about a new AI project in your organization. Or maybe an initial AI initiative has already reached production, but others are needed quickly.

At this key early stage of AI growth, enterprises and the industry face a bigger, related question: How do we scale our organizational ability to develop and deploy AI? Business and technology leaders must ask: What's needed to advance AI (and by extension, data science) beyond the craft stage, to large-scale production that is fast, reliable, and economical?

The answers are crucial to realizing ROI, delivering on the vision of AI everywhere, and helping the technology mature and propagate over the next five years.

Unfortunately, scaling AI is not a new challenge. Three years ago, Gartner estimated that less than 50% of AI models make it to production. The latest message was depressingly similar. "Launching pilots is deceptively easy," analysts noted, "but deploying them into production is notoriously challenging." A McKinsey global survey agreed, concluding: "Achieving (AI) impact at scale is still very elusive for many companies."

Clearly, a more effective approach is needed to extract value from the $327.5 billion that organizations are forecast to invest in AI this year.

As the scale and diversity of data continue to grow exponentially, data science and data scientists are increasingly pivotal to managing and interpreting that data. However, the diversity of AI workflows means that data scientists need expertise across a wide variety of tools, languages, and frameworks that focus on data management, analytics modeling and deployment, and business analysis. There is also increased variety in the best hardware architectures to process the different types of data.

Intel helps data scientists and developers operate in this "wild, wild West" landscape of diverse hardware architectures, software tools, and workflow combinations. The company believes the keys to scaling AI and data science are an end-to-end AI software ecosystem built on the foundation of the open, standards-based, interoperable oneAPI programming model, coupled with an extensible, heterogeneous AI compute infrastructure.

"AI is not isolated," says Heidi Pan, senior director of data analytics software at Intel. "To get to market quickly, you need to grow AI with your application and data infrastructure. You need the right software to harness all of your compute."

She continues: "Right now, however, there are lots of silos of software out there, and very little interoperability, very little plug and play. So users have to spend a lot of their time cobbling multiple things together. For example, looking across the data pipeline, there are many different data formats, libraries that don't work with each other, and workflows that can't operate across multiple devices. With the right compute, software stack, and data integration, everything can work seamlessly together for exponential growth."

Creation of an end-to-end AI production infrastructure is an ongoing, long-term effort. But here are 10 things enterprises can do right now that can deliver immediate benefits. Most importantly, they'll help unclog bottlenecks with data scientists and data, while laying the foundations for stable, repeatable AI operations.

Consider the following from Rise Labs at UC Berkeley. Data scientists, they note, prefer familiar tools in the Python data stack: pandas, scikit-learn, NumPy, PyTorch, etc. However, these tools are often unsuited to parallel processing or terabytes of data. So should you adopt new tools to make the software stack and APIs scalable? "Definitely not!" says Rise. They calculate that it would take up to 200 years to recoup the upfront cost of learning a new tool, even if it performs 10x faster.

These astronomical estimates illustrate why modernizing and adapting familiar tools are much smarter ways to solve data scientists' critical AI scaling problems. Intel's work through the Python Data API Consortium, the modernizing of Python via numba's parallel compilation and Modin's scalable data frames, the Intel Distribution of Python, and the upstreaming of optimizations into popular deep learning frameworks such as TensorFlow, PyTorch, and MXNet and gradient boosting frameworks such as XGBoost and CatBoost are all examples of Intel helping data scientists get productivity gains by maintaining familiar workflows.
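
To make the "keep your familiar tools" point concrete, here is a minimal sketch of how Modin is typically used: the import line changes, while the pandas-style code stays the same. The file name and column names are hypothetical.

```python
# Modin exposes the pandas API but partitions the work across available
# cores (or a cluster engine) behind the scenes.
import modin.pandas as pd  # instead of "import pandas as pd"

df = pd.read_csv("sensor_readings.csv")                # parallelized read
daily = df.groupby("device_id")["temperature"].mean()  # same pandas-style API
print(daily.head())
```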

Hardware AI accelerators such as GPUs and specialized ASICs can deliver impressive performance improvements. But software ultimately determines the real-world performance of computing platforms. Software AI accelerators, performance improvements that can be achieved through software optimizations for the same hardware configuration, can enable large performance gains for AI across deep learning, classical machine learning, and graph analytics. This orders-of-magnitude software AI acceleration is crucial to fielding AI applications with adequate accuracy and acceptable latency, and is key to enabling AI everywhere.

Intel optimizations can deliver drop-in 10-to-100x performance improvements for popular frameworks and libraries in deep learning, machine learning, and big data analytics. These gains translate into meeting real-time inference latency requirements, running more experimentation to yield better accuracy, cost-effective training with commodity hardware, and a variety of other benefits.

Below are example training and inference speedups with Intel Extension for Scikit-learn, an accelerator for scikit-learn, the most widely used package for data science and machine learning. Note that accelerations of up to 322x for training and 4,859x for inference are possible just by adding a couple of lines of code.

Figure 1. Training speedup with Intel Extension for Scikit-learn over the original package

Figure 2. Inference speedup with Intel Extension for Scikit-learn over the original package
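
For context, the "couple of lines of code" refers to patching scikit-learn before its estimators are imported. A minimal sketch using the scikit-learn-intelex package might look like the following; the synthetic data and model choice are illustrative assumptions.

```python
# Patch scikit-learn so supported estimators dispatch to the optimized backend.
from sklearnex import patch_sklearn
patch_sklearn()  # must run before importing scikit-learn estimators

import numpy as np
from sklearn.cluster import KMeans

X = np.random.rand(100_000, 10)                 # synthetic data for illustration
model = KMeans(n_clusters=8, n_init=10).fit(X)  # now accelerated where supported
print(model.inertia_)
```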

Data scientists spend a lot of time trying to cull and downsize data sets for feature engineering and models in order to get started quickly despite the constraints of local compute. But not only do the features and models not always hold up with data scaling, they also introduce a potential source of human ad hoc selection bias and probable explainability issues.

New cost-effective persistent memory makes it possible to work on huge, terabyte-sized data sets and bring them quickly into production. This helps with speed, explainability, and accuracy that come from being able to refer back to a rigorous training process with the entire data set.

While CPUs and the vast applicability of their general-purpose computing capabilities are central to any AI strategy, a strategic mix of XPUs (GPUs, FPGAs, and other specialized accelerators) can meet the specific processing needs of today's diverse AI workloads.

"The AI hardware space is changing very rapidly," Pan says, "with different architectures running increasingly specialized algorithms. If you look at computer vision versus a recommendation system versus natural language processing, the ideal mix of compute is different, which means that what it needs from software and hardware is going to be different."

While using a heterogeneous mix of architectures has its benefits, you'll want to eliminate the need to work with separate code bases, multiple programming languages, and different tools and workflows. According to Pan, the ability to reuse code across multiple heterogeneous platforms is crucial in today's dynamic AI landscape.

Central to this is oneAPI, a cross-industry unified programming model that delivers a common developer experience across diverse hardware architectures. Intel's data science and AI tools, such as the Intel oneAPI AI Analytics Toolkit and the Intel Distribution of OpenVINO toolkit, are built on the foundation of oneAPI and deliver hardware and software interoperability across the end-to-end data pipeline.

Figure 3. Intel AI Software Tools

The ubiquitous nature of laptops and desktops makes them a vast untapped data analytics resource. When you make it fast enough and easy enough to instantaneously iterate on large data sets, you can bring that data directly to the domain experts and decision makers without having to go indirectly through multiple teams.

OmniSci and Intel have partnered on an accelerated analytics platform that uses the untapped power of CPUs to process and render massive volumes of data at millisecond speeds. This allows data scientists and others to analyze and visualize complex data records at scale using just their laptops or desktops. This kind of direct, real-time decision making can cut down time to insight from weeks to days, according to Pan, further speeding production.

AI development often starts with prototyping on a local machine but invariably needs to be scaled out to a production data pipeline on the data center or cloud due to expanding scope. This scale out process is typically a huge and complex undertaking, and can often lead to code rewrites, data duplication, fragmented workflow, and poor scalability in the real world.

The Intel AI software stack lets teams scale out their development and deployment seamlessly, from edge and IoT devices to workstations and servers to supercomputers and the cloud. Explains Pan: "You make your software that's traditionally run on small machines and small data sets run on multiple machines and big data sets, and replicate your entire pipeline environments remotely." Open source tools such as Analytics Zoo and Modin can move AI from experimentation on laptops to scaled-out production.

Throwing bodies at the production problem is not an option. The U.S. Bureau of Labor Statistics predicts that roughly 11.5 million new data science jobs will be created by 2026, a 28% increase, with a mean annual wage of $103,000. While many training programs are full, competition for talent remains fierce. As the Rise Institute notes, trading human time for machine time is the most effective way to ensure that data scientists are not productive. In other words, it's smarter to drive AI production with cheaper computers rather than expensive people.

Intel's suite of AI tools places a premium on developer productivity while also providing resources for seamless scaling with extra machines.

For some enterprises, growing AI capabilities out of their existing data infrastructure is a smart way to go. Doing so can be the easiest way to build out AI because it takes advantage of data governance and other systems already in place.

Intel has worked with partners such as Oracle to provide the plumbing to help enterprises incorporate AI into their data workflow. Oracle Cloud Infrastructure Data Science environment, which includes and supports several Intel optimizations, helps data scientists rapidly build, train, deploy, and manage machine learning models.

Intel's Pan points to Burger King as a great example of leveraging existing big data infrastructure to quickly scale AI. The fast-food chain recently collaborated with Intel to create an end-to-end, unified analytics/AI recommendation pipeline and rolled out a new AI-based touchscreen menu system across 1,000 pilot locations. A key ingredient: Analytics Zoo, a unified big data analytics platform that allows seamless scaling of AI models to big data clusters with thousands of nodes for distributed training or inference.

It can take a lot of time and resources to create AI from scratch. Opting for the fast-growing number of turnkey or customized vertical solutions on your current infrastructure makes it possible to unleash valuable insights faster and at lower cost than before.

The Intel Solutions Marketplace and AI builders program offer a rich catalog of over 200 turnkey and customized AI solutions and services that span from edge to cloud. They deliver optimized performance, accelerate time to solution, and lower costs.

The District of Columbia Water and Sewer Authority (DC Water) worked with Intel partner Wipro to develop Pipe Sleuth, an AI solution that uses deep learning-based computer vision to automate real-time analysis of video footage of the pipes. Pipe Sleuth was optimized for the Intel Distribution of OpenVINO toolkit and Intel Core i5, Intel Core i7, and Intel Xeon Scalable processors, and provided DC Water with a highly efficient and accurate way to inspect their underground pipes for possible damage.

Open and interoperable standards are essential to deal with the ever-growing number of data sources and models. Different organizations and business groups will bring their own data, and data scientists solving for disparate business objectives will need to bring their own models. Therefore, no single closed software ecosystem can ever be broad enough, or future-proof enough, to be the right choice.

As a founding member of the Python Data API consortium, Intel works closely with the community to establish standard data types that interoperate across the data pipeline and heterogeneous hardware, and foundational APIs that span across use cases, frameworks, and compute.

An open, interoperable, and extensible AI compute platform helps solve today's bottlenecks in talent and data while laying the foundation for the ecosystem of tomorrow. As AI continues to pervade domains and workloads, and new frontiers emerge, the need for end-to-end data science and AI pipelines that work well with external workflows and components is immense. Industry and community partnerships that build open, interoperable compute and software infrastructures are crucial to a brighter, scalable AI future for everyone.

Learn More: Intel AI, Intel AI on Medium

Sponsored articles are content produced by a company that is either paying for the post or has a business relationship with VentureBeat, and they're always clearly marked. Content produced by our editorial team is never influenced by advertisers or sponsors in any way. For more information, contact sales@venturebeat.com.

Beware the 1% view of data science – ComputerWeekly.com

This is a guest blogpost by Shaun McGirr, AI Evangelist, Dataiku

As data science and AI become more widely used, two separate avenues of innovation are becoming clear. One avenue, written about and discussed publicly by individuals working at Google, Facebook and peer companies, depends on access to effectively infinite resources.

This generates a problem for further democratisation of AI: success stories told by the top echelon of data companies drown out the second avenue of innovation. There, smaller-scale data teams deliver stellar work in their own right, without the benefit of unlimited resources, and also need a share of the glory.

One thing is certain: a whole class of legacy IT issues don't plague global technology companies at anywhere near the scale of traditional enterprises. Some even staff entire data engineering teams to deliver ready-for-machine-learning data to data scientists, which is enough to make the other 99% of data scientists in the world salivate with envy.

Access to the right data, in a reasonable time frame, is still a top barrier to success for most data scientists in traditional companies, and so the 1% served by dedicated data engineering teams might as well be from another planet!

"Proudly analogue companies need to go on their own data journey on their own terms," said Henrik Göthberg, founder and CEO of Dairdux, on the AI After Dark podcast. This highlights that what is right and good for the 1% of data scientists working at internet giants is unlikely to work for those having to innovate from the ground up, with limited resources. This 99% of data scientists must extract data, experiment, iterate and productionise all by themselves, often with inadequate tooling they must stitch together themselves based on the research projects of the 1%.

For example, one European retailer spent many months developing machine learning models written in Python (.py files) and run on the data scientists' local machines. But eventually, the organisation needed a way to prevent interruptions or failure of the machine learning deployments.

As a first solution, they moved these .py files to Google Cloud Platform (GCP), and the outcome was well received by the business and technical teams in the organisation. However, once the number of models in production went from one to three and more, the team quickly realized the burden involved in maintaining models. There were too many disconnected datasets and Python files running on the virtual machine, and the team had no way to check or stop the machine learning pipeline.

Beyond these data scientists doing the hard yards to create value in traditional organisations, there is also the latent data population, capable but hidden away, who have real-world problems to solve but who are even further from being able to directly leverage the latest innovations. If these people can be empowered to create even a fraction of the value of the 1% of data scientists, their sheer number would mean the total value created for organisations and society would massively outweigh the latest technical innovations.

Achieving this massive scale, across many smaller victories, is the real value of data science to almost every individual and company.

Organisations don't need to be a Facebook to get started on an innovative and advanced data science or AI project. There is still a whole chunk of the data science world (and its respective innovations) that is going unseen, and it's time to give this second avenue of innovation its due.

Top Data Science Jobs to Apply for this Weekend – Analytics Insight

Analytics Insight has selected the top data science jobs for applying this weekend.

Data science is an essential part of any industry today, given the massive amounts of data that are produced. Data science is one of the most debated topics in the industry these days. Its popularity has grown over the years, and companies have started implementing data science techniques to grow their business and increase customer satisfaction.

Location: Bengaluru, Karnataka

Human-led and tech-empowered since 2002, Walmart Global Tech delivers innovative solutions to the world's biggest retailer, Walmart. By leveraging emerging technologies, the team creates omnichannel shopping experiences for customers across the globe and helps them save money and live better. The company is looking for an IN4 Data Scientist for Ad-tech. The position requires skills in building data science models for online advertising.

Apply here.

Location: India

Position Objective

The purpose of this role is to partner with the regional and global BI customers within RSR (who can include, but are not limited to, data engineering, BI support teams, operational teams, internal teams, and RSR clients) and provide business solutions through data. This position has operational and technical responsibility for reporting, analytics, and visualization dashboards across all operating companies within RSR. This position will develop processes and strategies to consolidate, automate, and improve reporting and dashboards for external clients and internal stakeholders. As a Business Intelligence Partner, you will be responsible for overseeing the end-to-end delivery of regional and global account and client BI reporting. This will include working with data engineering to provide usable datasets, creating dashboards with meaningful insights and visualizations within our BI solution (DOMO), and ongoing communication and partnering with the BI consumers. Key to this is that, as a Business Intelligence Partner, you will have commercial and operational expertise with the ability to translate data into insights. You will use this to mitigate risk, find operational and revenue-generating opportunities, and provide business solutions.

Apply here.

Location: Hyderabad, Telangana

In the Data Scientist role within the Global Shared Services and Office of Transformation group at Salesforce, you will work cross-functionally with business stakeholders throughout the organization to drive data-driven decisions. This individual must excel in data and statistical analysis, predictive modeling, process optimization, building relationships in business and IT functions, problem-solving, and communication. S/he must act independently, own the implementation and impacts of assigned projects, and demonstrate the ability to be successful in an unstructured, team-oriented environment. The ideal candidate will have experience working with large, complex data sets, experience in the technology industry, exceptional analytical skills, and experience in developing technical solutions.

Responsibilities

Partner with Shared Services Stakeholder organizations to understand their business needs and utilize advanced analytics to derive actionable insights

Find creative solutions to challenging problems using a blend of business context, statistical and ML techniques

Understand data infrastructure and validate data is cleansed and accurate for reporting requirements.

Work closely with the Business Intelligence team to derive data patterns/trends and create statistical models for predictive and scenario analytics

Communicate insights utilizing Salesforce data visualization tools (Tableau CRM and Tableau) and make business recommendations (cost-benefit, invest-divest, forecasting, impact analysis) with effective presentations of findings at multiple levels of stakeholders through visual displays of quantitative information

Partner cross-functionally with other business application owners on streamlining and automating reporting methods for Shared Services management and stakeholders.

Support the global business intelligence agenda and processes to make sure we provide consistent and accurate data across the organization

Collaborate with cross-functional stakeholders to understand their business needs, formulate a roadmap of project activity that leads to measurable improvement in business performance metrics/key performance indicators (KPIs) over time.

Apply here.

Job Description:

Gathers data, analyses it, and reports findings. Gathers data using existing formats and will suggest changes to these formats. Resolves disputes and acts as an SME and first escalation level.

Conducts analyses to solve repetitive or patterned information and data queries/problems.

Works within a variety of well-defined procedures and practices. Progress and results are supervised; informs management about analysis outcomes. Works autonomously within this scope, with regular steer required, e.g., on project scope and prioritization.

Supports stakeholders in understanding analyses/outcomes and using them on topics related to their own areas of expertise. Interaction with others demands tactful influencing and persuasion to explain and advise on performed analyses of information.

The job holder identifies shortcomings in current processes, systems, and procedures within the assigned unit and suggests improvements. Analyses, proposes, and (where possible) implements alternatives.

Apply here.

Responsibilities

Develop analytical models to estimate annual, monthly, and daily platform returns and other key metrics, with weekly tracking of AOP vs. actual returns performance.

Monitor key OKR metrics across the organization for all departments. Work closely with BI teams to maintain OKR dashboards across the organization.

Work with business teams (rev, mktg, category, etc.) on preliminary hypothesis evaluation of returns leakages/inefficiencies in the system (category, rev and pricing constructs, etc.).

Regular analysis and experimentation to find areas of improvement in returns, maintaining a highly data-backed approach. Maintain monthly reporting and tracking of the SNOP process.

Influence various teams/stakeholders within the organization to meet goals & planning timelines.

Qualifications & Experience

B Tech/BE in Computer Science or equivalent from a tier 1 college with 1-3 years of experience.

Problem-solving skills: the ability to break a problem down into smaller parts and develop a solution approach, with an appreciation for math and business.

Strong analytical bent of mind with strong communication/persuasion skills.

Demonstrated ability to work independently in a highly demanding and ambiguous environment.

Strong attention to detail and exceptional organizational skills.

Strong knowledge of SQL, advanced Excel, R

Apply here.

Thickness and structure of the martian crust from InSight seismic data – Science Magazine

Single seismometer structure

Because of the lack of direct seismic observations, the interior structure of Mars has been a mystery. Khan et al., Knapmeyer-Endrun et al., and Stähler et al. used recently detected marsquakes from the seismometer deployed during the InSight mission to map the interior of Mars (see the Perspective by Cottaar and Koelemeijer). Mars likely has a 24- to 72-kilometer-thick crust with a very deep lithosphere close to 500 kilometers. Similar to the Earth, a low-velocity layer probably exists beneath the lithosphere. The crust of Mars is likely highly enriched in radioactive elements that help to heat this layer at the expense of the interior. The core of Mars is liquid and large, about 1,830 kilometers in radius, which means that the mantle has only one rocky layer rather than two like the Earth has. These results provide a preliminary structure of Mars that helps to constrain the different theories explaining the chemistry and internal dynamics of the planet.

Science, abf2966, abf8966, abi7730, this issue p. 434, p. 438, p. 443; see also abj8914, p. 388

A planet's crust bears witness to the history of planetary formation and evolution, but for Mars, no absolute measurement of crustal thickness has been available. Here, we determine the structure of the crust beneath the InSight landing site on Mars using both marsquake recordings and the ambient wavefield. By analyzing seismic phases that are reflected and converted at subsurface interfaces, we find that the observations are consistent with models with at least two and possibly three interfaces. If the second interface is the boundary of the crust, the thickness is 20 ± 5 kilometers, whereas if the third interface is the boundary, the thickness is 39 ± 8 kilometers. Global maps of gravity and topography allow extrapolation of this point measurement to the whole planet, showing that the average thickness of the martian crust lies between 24 and 72 kilometers. Independent bulk composition and geodynamic constraints show that the thicker model is consistent with the abundances of crustal heat-producing elements observed for the shallow surface, whereas the thinner model requires greater concentration at depth.

The future of data science and risk management – Information Age

This article will explore what the possible future of data science capabilities for risk management could entail

What will the future of data science for risk management hold?

Data science has been vital in enhancing risk management operations in recent times. With cyber attacks, including phishing and ransomware, on the rise since the Covid-19 pandemic took hold, managing and mitigating the effects of such incidents, with the aid of network visibility, is key to business continuity. Additionally, there are IT outages and insider threats to contend with, which also require a strong risk management strategy.

In this article, we explore how the future of data science's role in risk management initiatives will take shape.

With incidents that can bring operations to a standstill becoming more diverse, it's vital that risk management measures are as agile as possible to avoid being caught out. Data science can help businesses better analyse short-term and long-term trends and respond to possible risks and disruption quickly, and this is set to become an even bigger focus going forward.

"Whether in marketing, sales, demand, pricing or operations, the key to risk management is not only in spotting the potential risks, but in understanding their likelihood, scale and impact and then reacting accordingly," said Matt Andrew, partner and UK managing director of Ekimetrics.

"In retail, for example, we've seen the impact of not having a thorough enough understanding of market, category and consumer trends and risks, with mitigations in place soon enough to react in the face of a market-changing pandemic. For the likes of Arcadia Group and Debenhams, factors such as the high cost of brick and mortar stores and a failing offer, including poor e-commerce, became increasingly impossible to deal with. Those that had already begun to invest in this area of data science will have had a better chance to regroup quickly and make better decisions, from big pivots to the ability to capitalise on micro opportunities."

By understanding the potential range of outcomes and how they interact through data analytics, businesses can support greater agility in their decision-making about where and how to invest, and help to future-proof against other risks that are yet to emerge.

A key aspect of data science with a bright future is automation. It decreases the strain on data scientists while speeding up processes, and when it comes to mitigating risks, automation can minimise errors in data reconciliation: the movement and alignment of critical company data between systems.

Douggie Melville-Clarke, head of data science at Duco, explained: "As businesses move towards making more data-first decisions, the emphasis on data automation is growing, with companies automating as much of the data reconciliation process as possible to speed up processes, help businesses scale and, crucially, mitigate risk."

Data reconciliation has traditionally cost financial firms significant sums of money through man hours and regulatory fines. Automation takes away the human error element from data reconciliation. Manual tasks can often become tedious to a human brain, leaving room for error, but a computer can't get bored or show up to work tired. It's consistent. And this consistency is crucial when dealing with large datasets.

Repeatable tasks can be delegated to a computer to handle more efficiently and with a lower error rate, freeing up the workforce to do jobs that add more value to the business, such as new product offerings or adapting to regulatory changes.

Data automation platforms also enable businesses to get a full view of the data transformation process, end to end. Through automated data lineage, businesses can track the cleansing and manipulation processes the data undergoes, giving them a holistic view of the data in a structured way, as opposed to an unstructured one. This aids with error spotting and reporting, both internally and to regulatory boards.
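
To ground the idea, here is a minimal sketch of an automated reconciliation check of the kind described above, written with pandas. The file names, key column, and tolerance are illustrative assumptions rather than anything Duco-specific.

```python
# Reconcile records between a source system and a target system.
import pandas as pd

ledger = pd.read_csv("core_banking_positions.csv")   # source system extract
warehouse = pd.read_csv("reporting_positions.csv")   # target system extract

# Outer-join on the business key and flag rows present on only one side.
merged = ledger.merge(warehouse, on="trade_id", how="outer",
                      suffixes=("_src", "_tgt"), indicator=True)
missing = merged[merged["_merge"] != "both"]

# For matched rows, flag material value differences (tolerance is illustrative).
matched = merged[merged["_merge"] == "both"]
breaks = matched[(matched["amount_src"] - matched["amount_tgt"]).abs() > 0.01]

print(f"{len(missing)} unmatched records, {len(breaks)} value breaks")
```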

According to Trevor Morgan, product manager at comforte AG, the value-add that data science is set to bring to risk management in the near future is twofold: the ability to manage more data in one go, and the ability to look to the future rather than at past events.

"Enterprise data is growing nearly exponentially, and it is also increasing in complexity in terms of data types," said Morgan.

We have gone way past the time when humans could sift through this amount of data in order to see large-scale trends and derive actionable insights. The platforms and best practices of data science and data analytics incorporate technologies which automate the analytics workflows to a large extent, making dataset size and complexity much easier to tackle with far less effort than in years past.

The second value-add is to leverage machine learning, and ultimately artificial intelligence, to go beyond historical and near-real-time trend analysis and look into the future, so to speak. Predictive analysis can unveil new customer needs for products and services and then forecast consumer reactions to resultant offers. Equally, predictive analytics can help uncover latent anomalies that lead to much better predictions about fraud detection and potentially risky behaviour.

Nothing can foretell the future with 100% certainty, but the ability of modern data science to provide scary-smart predictive analysis goes well beyond what an army of humans could do manually.
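
As a rough illustration of the kind of predictive anomaly detection described here, the sketch below flags unusual transactions with scikit-learn's IsolationForest. The synthetic data, features, and contamination rate are assumptions for illustration only.

```python
# Flag potentially risky transactions as anomalies.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Synthetic transactions: [amount, seconds_since_last_transaction]
normal = rng.normal(loc=[50, 3600], scale=[20, 600], size=(1000, 2))
suspicious = rng.normal(loc=[5000, 30], scale=[500, 10], size=(10, 2))
X = np.vstack([normal, suspicious])

model = IsolationForest(contamination=0.01, random_state=0).fit(X)
flags = model.predict(X)  # -1 marks likely anomalies
print(f"Flagged {np.sum(flags == -1)} of {len(X)} transactions for review")
```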

Gartner has forecasted that security and risk management spending worldwide will grow 12.4% to reach $150.4 billion in 2021

While AI has demonstrated the capability of helping to increase the agility of organisations' decision-making, there is also the matter of tighter regulation of the technology to consider, with legislation in the EU being a notable example. To stay compliant, risk management aided by data science is likely to be the way forward.

"Data science and risk management professionals will work hand in hand to ensure risk and governance procedures are at a high standard," said Theresa Bercich, director of product strategy and principal data scientist at Lucinity.

AI compliance will be more regulated, as evidenced by the EU creating legislation around this topic. This means that new job titles, positions, and people will join the world of AI (which has already started) who will create frameworks for governance and risk.

The power of AI and the demand for its value proposition are driving significant changes in the technology space, including the breakdown of traditional silos and the development of intelligent software that deploys data in a productive manner.

How to get clinicians onboard with predictive analytics – Healthcare IT News

Healthcare has higher barriers to adopting data science than other industries. State-of-the-art analytics solutions are already available, but few of them are in use by clinicians.

At University of Virginia Health System, health leaders worked to establish a culture of data-driven decision-making among clinicians, in which data science guides clinicians in finding opportunities for improvement, designing and implementing interventions, and evaluating their impact.

Bommae Kim, senior data scientist at Hackensack Meridian Health, who until last year held the same role at UVA Health, said a key challenge to wider adoption is a lack of interest.

"Due to their disinterest or ambivalence to data science, it may be difficult to find opportunities to work with clinicians to begin with," she said.

Kim, along with Dr. Jonathan Michel, director of data science at University of Virginia Health, will speak on the topic next month at HIMSS21. She said a lack of trust and a lack of understanding are two other challenges to the adoption of analytics solutions.

"Clinicians may disagree with analytics results due to lack of trust in data science," she said. "It may also be challenging to introduce advanced analytics due to the level of data literacy."

She explained that the key opportunities for clinicians adopting data science depend on the analytics maturity and executive leadership support at the organization.

"Of the multiple aspects to consider, I'd like to point out actionability in finding opportunities," said Kim. "Unless strong clinician support is already in place, it would be extremely challenging to succeed in purely clinical topics, for example sepsis."

She noted those clinical topics are certainly important to any health system but may not be readily actionable for many reasons.

On the other hand, Kim noted, some topics are highly relevant to clinicians yet not purely clinical: length of stay (LOS) and readmissions, for instance.

"Their causes and interventions are not necessarily clinical, unlike sepsis, and clinicians seem more open to data scientists' suggestions in less-clinical domains," she said. "I would consider them more actionable topics. Once a strong relationship is built with clinicians, it'll be easier to move to more clinical domains with their support."

She explained that UVA Health Data Science often engages with clinicians by presenting data analysis about their patients and workflows, tailored to their projects or interests. Such sessions naturally lead clinicians toward data-driven decision-making.

"Through such engagement, we built trust and improved data literacy among clinicians," said Kim.

"Moreover, in the process data scientists learned what clinicians truly want and need. What they ask for may not be what they truly want or need. With improved clinician trust and data literacy and a better understanding of clinician needs, we were able to move toward more advanced analytics."

Jonathan Michel and Bommae Kim will address the use of data science among clinicians at HIMSS21 in a session titled "Making Prescriptive Analytics Work for Clinicians." It's scheduled for Thursday, August 12, from 1 to 2 p.m. in room Wynn Lafite 2.

Visit link:

How to get clinicians onboard with predictive analytics - Healthcare IT News

How NASA is using knowledge graphs to find talent – VentureBeat


One of NASA's biggest challenges is identifying where data science skills reside within the organization. Not only is data science a new discipline; it's also a fast-evolving one. Knowledge for each role is constantly shifting due to technological and business demands.

That's where David Meza, acting branch chief of people analytics and senior data scientist at NASA, believes graph technology can help. His team is using Neo4j to build a talent-mapping knowledge graph that shows the relationships between people, skills, and projects.

Meza and his team are currently working on the implementation phase of the project. They eventually plan to formalize the end-user application and create an interface to help people at NASA search for talent and job opportunities. Meza told VentureBeat more about the project.

VentureBeat: What's the broad aim of this data-led project?

David Meza: It's about taking a look at how we can identify the skills, knowledge and abilities, tasks, and technology within an occupation or a work role. How do we translate that to an employee? How do we connect it to their training? And how do we connect that back to projects and programs? All of that is a relationship issue: these things can be connected via certain elements that associate them with one another, and that's where the graph comes in.

VentureBeat: Why did you decide to go with Neo4j rather than develop internally?

Meza: I think there was really nothing out there that provided what we were looking for, so that's part of it. The other part of the process is that we have specific information that we're looking for. It's not very general. And so we needed to build something that was more geared towards our concepts, our thoughts, and our needs for very specific things that we do at NASA around spaceflights, operations, and things like that.

VentureBeat: What's the timeline for the introduction of Neo4j?

Meza: We're still in the implementation phase. The first six to eight months were about research and development and making sure we had the right access to the data. Like any other project, that's probably our most difficult task: making sure we have the right access and the right information, and thinking about how everything is related. While we were looking at that, we also worked in parallel on other issues: what's the model going to look like, what algorithms are we going to use, and how are we going to train these models? We've got the data in the graph system now and we're starting to produce a beta phase of an application. This summer through the end of the year, we're looking towards formalizing that application to make it more of an interface that an end user can use.

VentureBeat: What's been the technical process behind the implementation of Neo4j?

Meza: The first part was trying to think about what's going to be our occupational taxonomy. We looked at: How do we identify an occupation? What is the DNA of an occupation? And similarly, we looked at that from an employee perspective, from a training perspective, and from a program or project perspective. So, simply put, we broke everything down into three different categories for each occupation: a piece of knowledge, a skill, and a task.

VentureBeat: How are you using those categories to build a data model?

Meza: If you can start identifying people that have great knowledge in natural language processing, for example, and the skills they need to do a task, then from an occupation standpoint you can say that specific workers need particular skills and abilities. Fortunately, there's a database from the Department of Labor called O*NET, which has details on hundreds of occupations and their elements. Those elements consist of knowledge, skills, abilities, tasks, workforce characteristics, licensing, and education. So that was the basis for our Neo4j graph database. We then did the same thing with training. Within training, you're going to learn a piece of knowledge; to learn that piece of knowledge, you're going to get a skill; and to get that skill, you're going to do exercises or tasks to get proficient in those skills. And it's similar for programs: we can connect back to what knowledge, skills, and tasks a person needs for each project.
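For readers curious what such a model might look like in practice, here is a minimal, hypothetical sketch using the official Neo4j Python driver. The node labels, relationship types, connection details, and names are assumptions based on Meza's description, not NASA's actual schema or code.

    # Hypothetical talent-graph sketch: occupations require skills, employees have skills.
    from neo4j import GraphDatabase

    driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

    query = """
    MERGE (o:Occupation {name: $occupation})
    MERGE (s:Skill {name: $skill})
    MERGE (e:Employee {name: $employee})
    MERGE (o)-[:REQUIRES]->(s)
    MERGE (e)-[:HAS_SKILL]->(s)
    """

    with driver.session() as session:
        # MERGE creates each node and relationship only if it does not already exist,
        # so repeated loads never duplicate an occupation, skill, or employee.
        session.run(query, occupation="Data Scientist",
                    skill="Natural Language Processing", employee="A. Example")

    driver.close()

A query over a graph like this could then surface every employee whose skills overlap with an occupation's requirements, which is the kind of talent search Meza's team is building an interface for.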

VentureBeat: How will you train the model over time?

Meza: We've started looking at NASA-specific competencies and work roles to assign those to employees. Our next phase is to have employees validate and verify that what we infer about their knowledge, skills, abilities, tasks, and technologies based on the model is either correct or incorrect. Then, we'll use that feedback to train the model so it can do a little bit better. That's what we're hoping to do over the next few months.

VentureBeat: What will this approach mean for identifying talent at NASA?

Meza: I think it will give the employees an opportunity to see what's out there that may interest them to further their career. If they want to make a career change, for example, they can see where they are in that process. But I also think it will help us align our people better across our organization, and it will help us track and maybe predict where we might be losing skills, and where we may need to modify skills based on the shifting of our programs and the shifting of our mission due to administration changes. So I think it'll make us a little bit more agile and it will be easier to move our workforce.

VentureBeat: Do you have any other best practice lessons for implementing Neo4j?

Meza: I guess the biggest lesson that I've learned over this time is to identify as many data sources as you can that help provide some of the information. Start small; you don't need to know everything right away. When I look at knowledge graphs and graph databases, the beauty is that you can add and remove information fairly easily compared to a relational database system, where you have to know the schema upfront. Within a graph database or knowledge graph, you can easily add information as you get it without messing up your schema or your data model. Adding more information just enhances your model. So start small, but think big in terms of what you're trying to do. Look at how you can develop relationships, and try to identify even latent relationships across your graphs based on the information you have about those data sources.
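To illustrate the schema flexibility Meza contrasts with relational databases, the short sketch below adds an entirely new kind of node and relationship to the hypothetical graph from earlier, with no migration step; again, the labels and names are illustrative assumptions.

    # Hypothetical sketch: new information is added to the graph as it arrives.
    from neo4j import GraphDatabase

    driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

    with driver.session() as session:
        # The Certification label and CERTIFIED_IN relationship did not exist before;
        # they simply come into being when this MERGE first runs, with no schema change.
        session.run(
            """
            MERGE (c:Certification {name: $cert})
            MERGE (e:Employee {name: $employee})
            MERGE (e)-[:CERTIFIED_IN]->(c)
            """,
            cert="Machine Learning Fundamentals", employee="A. Example",
        )

    driver.close()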

Read the original post:

How NASA is using knowledge graphs to find talent - VentureBeat