Category Archives: Data Science
Global Data Science Platform Market Anticipated to Hit $224.3 Billion by 2026, Growing at a CAGR of 31.1% from 2019 to 2026 – GlobeNewswire
New York, USA, Aug. 24, 2021 (GLOBE NEWSWIRE) -- According to a report published by Research Dive, the global data science platform market is expected to generate revenue of $224.3 billion by 2026, growing at a CAGR of 31.1% during the forecast period (2019-2026). The report provides a brief overview of the current market scenario, including significant aspects such as growth factors, challenges, restraints, and various opportunities during the forecast period. It also provides the key market figures, making it easier for new participants to understand the market.
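As a quick arithmetic check on the headline figures (this calculation is not part of the report), the CAGR formula links the 2026 projection back to an implied 2019 base; a minimal Python sketch:

```python
# Back-of-the-envelope check of the headline figures, not taken from the report:
# final = base * (1 + CAGR) ** years  =>  base = final / (1 + CAGR) ** years
cagr = 0.311            # 31.1% compound annual growth rate
years = 2026 - 2019     # seven-year forecast period
revenue_2026 = 224.3    # projected 2026 revenue, USD billions

implied_2019_revenue = revenue_2026 / (1 + cagr) ** years
print(f"Implied 2019 base revenue: about ${implied_2019_revenue:.1f} billion")  # roughly $34 billion
```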
Download FREE Sample Report of the Data Science Platform Market Report: https://www.researchdive.com/download-sample/77
Dynamics of the Market
Drivers: The application of data science and its analytical tools substantially helps organizations make better business decisions based on the insights drawn from their data. Furthermore, analytical tools also help organizations predict the purchasing patterns of their customers, enabling them to focus their product innovation and offerings accordingly. These factors are expected to bolster the growth of the market during the forecast period.
Restraints: The lack of skilled and experienced professionals is expected to impede the growth of the market during the forecast period.
Opportunities: Persistent technological advancements in the analytical tools are expected to create vital investment opportunities for the growth of the market during the forecast period.
Check out How COVID-19 impacts the Data Science Platform Market. Click here to Speak with Our Analyst: https://www.researchdive.com/connect-to-analyst/77
Segments of the Market
The report has divided the market into different segments based on type, end-use and region.
Check out all Information and communication technology & media Industry Reports: https://www.researchdive.com/information-and-communication-technology-and-media
Type: Service Sub-segment to be Most Profitable
The service sub-segment is expected to generate revenue of $76.0 billion during the forecast period. Services significantly help organizations analyze the demands of their clients and subsequently aid in increasing customer satisfaction. These factors are expected to accelerate the growth of the sub-segment during the forecast period.
Access Varied Market Reports Bearing Extensive Analysis of the Market Situation, Updated With The Impact of COVID-19: https://www.researchdive.com/covid-19-insights
End-Use: Banking, Financial Services and Insurance Sub-segment to Have the Highest Growth Rate
The banking, financial services and insurance sub-segment is expected to grow exponentially, surging at a CAGR of 29.4% during the forecast period. Data science substantially helps companies in this sub-segment effectively monitor transactions and detect fraud. This factor is expected to drive the growth of the sub-segment during the forecast period.
Region: North America Anticipated to Dominate the Market
The North America data science platform market is expected to generate a revenue of $80.3 billion during the forecast period.
Increasing adoption of analytical tools in the region is expected to fuel the growth of the market during the forecast period. Moreover, rising demand for IoT is further expected to drive the growth of the market in this region.
Key Players of the Market
1. Civis Analytics
2. Domino Data Lab Inc.
3. Databricks
4. Microsoft Corporation
5. Dataiku
6. Alphabet Inc. (Google)
7. Cloudera Inc.
8. Anaconda Inc.
9. Altair Engineering Inc.
10. IBM Corporation
For instance, in February 2021, ThoughtWorks, a global software consultancy firm, acquired Fourkind, a Finland-based management consulting and advisory service company, to maximize ThoughtWorks' ability to support and service clients in countries such as Finland and the Netherlands.
These players are currently focusing on R&D activities, mergers, acquisitions, partnerships and collaborations to sustain the growth of the market. The report also provides an overview of many important aspects including financial performance of the key players, SWOT analysis, product portfolio, and latest strategic developments. Click Here to Get Absolute Top Companies Development Strategies Summary Report.
TRENDING REPORTS WITH COVID-19 IMPACT ANALYSIS
Application Security Market: https://www.researchdive.com/5735/application-security-market
Threat Intelligence Security Solutions Market: https://www.researchdive.com/8355/threat-intelligence-security-solutions-market
Zero Trust Security Market: https://www.researchdive.com/5368/zero-trust-security-market
Liquidity is key to unlocking the value in data, researchers say – MIT Sloan News
Like financial assets, data assets can have different levels of liquidity. Certificates of deposit tie up money for a certain length of time, preventing other use of the funds. Siloed business applications tie up data, which makes it difficult, even impossible, to use that data in other ways across an organization.
A recent research briefing, Build Data Liquidity to Accelerate Data Monetization, defines data liquidity as ease of data asset reuse and recombination. The briefing was written by Barbara Wixom, principal research scientist at the MIT Center for Information Systems Research (CISR) and Gabriele Piccoli of Louisiana State University and the University of Pavia.
Unlike physical capital assets listed on a corporate balance sheet, such as buildings and equipment, data does not deteriorate over time. In fact, it can become more valuable as it is used in different ways.
While data is inherently reusable and recombinable, an organization must activate these characteristics by creating strategic data assets and building out its data monetization capabilities, the authors explain.
Typically, companies use data in linear value-creation cycles, where data is trapped in silos and local business processes. Over time, the data becomes incomplete, inaccurate, and poorly classified or defined.
To increase data liquidity, organizations need to decontextualize the data, divorcing it from a specific condition or context. The authors suggest using best practices in data management including metadata management, data integration and taxonomy/ontology to ensure each data asset is accurate, complete, current, standardized, searchable, and understandable throughout the enterprise.
Such data management practices build key enterprise capabilities like data platform, data science, acceptable data use, and customer understanding, which increases data's monetization potential.
"As a company's strategic data assets become more highly liquid and their number grows, data is made increasingly available for conversion to value, and the company's data monetization accelerates," write the authors.
In explaining how an organization can create highly liquid data assets for use across an enterprise, the authors cite the example of Fidelity Investments, a Boston-based financial services company.
The firm is combining more than 100 data warehouses and analytics stores into one common analytics platform, built upon five foundational structures:
Fidelity's goal is to organize the data around a common set of priorities such as customer, employee, and investible security. The result will be strategic data assets that are integrated and easily consumable. "We want to create long-term data assets for creating value, not only immediately, but also for use cases that are yet to be identified," says Mihir Shah, Fidelity's enterprise head of data, in the briefing.
As long as Fidelity's internal data consumers follow core rules, they can combine data from different sources and build for specific requirements. Not only has Fidelity already created valuable data assets through this platform, it has begun to identify value-add opportunities that were never before possible: activities that would add value for customers, revenue, and efficiency, according to the authors.
Once data is highly liquid, future-ready companies can use it to produce value in three ways, according to MIT CISR research:
Fidelity is one of more than 70 strategic data asset initiatives that MIT CISR researchers uncovered in the course of interviews with its member organizations. The projects illustrate how "the beauty lies not in a single use of data, but in the recurring reuse and recombination of the carefully curated underlying strategic data assets," the authors write.
As companies transform into future-ready entities, they need to view their strategic digital initiatives not simply as a way to exploit digital possibilities, but also as opportunities for reshaping their data into highly liquid strategic data assets, they conclude.
ConcertAI Expands Data Science Collaboration with Janssen to Drive Effective Therapies and Address Health Disparities in Clinical Trials – Woburn…
CAMBRIDGE, Mass., Aug. 24, 2021 /PRNewswire/ -- ConcertAI, LLC (ConcertAI), a market leader for Real-World Data (RWD) and enterprise AI technology solutions for precision oncology, announced today the expansion of its multi-year collaboration with Janssen Research & Development, LLC (Janssen) across several disease area programs.
Through application of advanced AI and Data Science capabilities with 'high depth' real-world clinical data, ConcertAI is partnering with Janssen and its Research & Development Data Science team to advance innovative insights that inform clinical strategies and support study designs at a pace not possible through legacy approaches. The expanded collaboration further extends the two companies' novel work to broaden access to trials in new sites and strengthen trial diversity.
"ConcertAI's novel working model integrates the largest and deepest clinical and genomic data, enterprise AI, and will partner with Janssen and the world's leading data scientists and research scientists to generate evidence in support of critical disease insights and regulatory decisions,"said Jeff Elton, PhD, CEO of ConcertAI. "We are proud to collaborate with Janssen to drive effective medicines for the benefit of patients with the highest unmet medical needs."
It has been reported that while nearly 40 percent of Americans are considered members of a racial or ethnic minority, a smaller portion of patients enlisted in clinical trials are minorities. Janssen is deeply committed to enhancing diversity in clinical trials to ensure trials are representative of the patients most afflicted by disease, recognizing that patient access is inhibited if they are not.
"ConcertAI has a comprehensive, representative, and independently sourced RWD for oncology, hematology and urology with clinically integrated community oncology networks, regional health systems and leading academic centers," said Warren Whyte, PhD, Vice President of Scientific Partnerships & Customer Success at ConcertAI. "That data, and our network of leading experts and advocates for healthcare equity, is moving us forward with leaders like those at Janssen."
Through the expanded and broad collaboration, ConcertAI is broadening the sources of data used, moving earlier into disease states, and assuring that more patients have access to these innovative therapies both through clinical trials and through enhanced evidence generation in support of the new standards of care.
About ConcertAI
ConcertAI is the leader in Real-World Evidence (RWE) and AI technology solutions for life sciences and healthcare. Our mission is to accelerate insights and outcomes for patients through leading real-world data, AI technologies, and scientific expertise in partnership with the leading biomedical innovators, healthcare providers, and medical societies. For more information, visit us at http://www.concertai.com.
Media Contact: Dianne Yurek, dyurek@concertAi.com
View original content:https://www.prnewswire.com/news-releases/concertai-expands-data-science-collaboration-with-janssen-to-drive-effective-therapies-and-address-health-disparities-in-clinical-trials-301361057.html
SOURCE ConcertAI
MSK Study Identifies Biomarker That May Help Predict Benefits of Immunotherapy – On Cancer – Memorial Sloan Kettering
In recent years, immune-based treatments for cancer have buoyed the hopes of doctors and patients alike. Drugs called immune checkpoint inhibitors have provided lifesaving benefits to a growing list of people with several types of cancer, including melanoma, lung cancer, bladder cancer, and many more.
Despite the excitement surrounding these medications, a frustrating sticking point has been the inability of doctors to predict who will benefit from them and who will not.
On August 25, 2021, a group of researchers from Memorial Sloan Kettering Cancer Center reported in the journal Science Translational Medicine that a specific pattern, or signature, of markers on immune cells in the blood is a likely biomarker of response to checkpoint immunotherapy. Within this immune signature, one molecule, LAG-3, provided key information identifying patients with poorer outcomes.
This link was discovered in a group of patients with metastatic melanoma and validated in a second group of patients with metastatic bladder cancer, suggesting that this potential biomarker may be broadly applicable to patients with a variety of cancers.
According to Margaret Callahan, an investigator with the Parker Institute for Cancer Immunotherapy at MSK and the physician-researcher who led the study, the large patient cohorts, robust clinical follow-up, and rigorous statistical approach of the study make her enthusiastic that this immune signature is telling us something important about who responds to immunotherapy and why.
The findings pave the way for prospective clinical trials designed to test whether incorporating this biomarker into patient care can improve outcomes for those who are less likely to benefit from existing therapies.
In making their discoveries, the researchers had data on their side. As one of the first cancer centers in the world to begin treating large numbers of patients with immunotherapy, MSK has a cache of stored blood from hundreds of patients treated over the years, efforts pioneered by MSK researchers Jedd Wolchok and Phil Wong, co-authors on the study. The investigators of this study made their discoveries using pre-treatment blood samples collected from patients enrolled on seven different clinical trials open at MSK between 2011 and 2017.
To mine the blood for clues, researchers used a technique called flow cytometry, a tool that rapidly analyzes attributes of single cells as they flow past a laser. The investigators' goal was to identify markers found on patients' immune cells that correlated with their response to immunotherapy, primarily PD-1-targeting drugs like nivolumab (Opdivo) and pembrolizumab (Keytruda). But this wasn't a job for ordinary human eyeballs.
"When you think about the fact that there are hundreds of thousands of blood cells in a single patient blood sample, and that we're mapping out the composition of nearly 100 different immune cell subsets, it's a real challenge to extract clinically relevant information effectively," says Ronglai Shen, a statistician in the Department of Epidemiology and Biostatistics at MSK who developed some of the statistical tools used in the study. "That's where we as data scientists were able to help Dr. Callahan and the other physician-researchers on the study. It was a perfect marriage of skills."
The statistical tools that Dr. Shen and fellow data scientist Katherine Panageas developed allowed the team to sort patients into three characteristic immune signatures, or immunotypes, based on unique patterns of blood markers.
The immunotype that jumped out was a group of patients who had high levels of a protein called LAG-3 expressed on various T cell subsets. Patients with this LAG+ immunotype, the team found, had a much shorter survival time compared with patients with a LAG- immunotype: For melanoma patients, there was a difference in median survival of more than four years (22.2 months compared with 75.8 months) and the difference was statistically significant.
LAG-3 (short for lymphocyte-activation gene 3) belongs to a family of molecules called immune checkpoints. Like the more well-known checkpoints CTLA-4 and PD-1, LAG-3 has an inhibitory effect on immune responses, meaning it tamps them down. Several drugs targeting LAG-3 are currently in clinical development, although defining who may benefit from them the most has been challenging.
When Dr. Callahan and her colleagues started this research, they did not plan to focus on LAG-3 specifically. "We let the data lead us, and LAG-3 is what shook out," she says.
One strength of the study is its use of both a discovery set and a validation set. What this means is that the investigators performed their initial analysis on one set of blood samples from a large group of patients, in this case 188 patients with melanoma. Then, they asked whether the immune signature they identified in the discovery set could predict outcomes in an entirely different batch of patients: 94 people with bladder cancer.
It could, and quite well.
"When we looked at our validation cohort of bladder cancer patients who received checkpoint blockade, those who had the LAG+ immunotype had a 0% response rate," Dr. Callahan says. "Zero. Not one of them responded. That's compared with a 49% response rate among people who had the LAG- immunotype."
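To make the discovery/validation logic concrete, here is a minimal, purely illustrative Python sketch; the toy data, column names, and the simple LAG-3 cutoff rule are assumptions for demonstration only, not the study's actual data or method.

```python
import pandas as pd

# Toy cohorts: each row is a patient with a flow-cytometry LAG-3 reading (%)
# and a binary response to checkpoint blockade (1 = responder).
discovery = pd.DataFrame({
    "lag3_pct":  [2.1, 8.7, 1.4, 9.9, 0.8, 7.5, 1.9, 6.8],
    "responded": [1,   0,   1,   0,   1,   0,   1,   0],
})
validation = pd.DataFrame({
    "lag3_pct":  [1.2, 9.1, 2.4, 8.3, 0.9, 7.7],
    "responded": [1,   0,   1,   0,   0,   0],
})

# Step 1: derive a cutoff on the discovery set only (here, the midpoint of the
# median LAG-3 levels of responders and non-responders -- an arbitrary rule).
cutoff = discovery.groupby("responded")["lag3_pct"].median().mean()

# Step 2: freeze the rule and apply it unchanged to the independent validation cohort.
validation["immunotype"] = ["LAG+" if v >= cutoff else "LAG-" for v in validation["lag3_pct"]]
print(validation.groupby("immunotype")["responded"].mean())  # response rate per immunotype
```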
Because of the large data set, the scientists were also able to ask how their LAG+ immunotype compares with other known biomarkers of response, specifically PD-L1 status and tumor mutation burden. What they found was that the immunotype provided new and independent information about patient outcomes, rather than just echoing these other biomarkers.
Biomarkers are important in cancer for several reasons. They may help clinicians and patients select the best treatment and may allow them to avoid unnecessary treatment or treatment that is unlikely to work.
"Immunotherapy drugs are not without potential toxicity," Dr. Panageas says. "So, if we can spare someone the potential risks of a treatment because we know they're not likely to respond, that's a big advance."
The second reason is cost. Immunotherapy drugs are expensive, so having a means to better match patients with available drugs is vital.
And, because the researchers identified this biomarker using patient blood samples, it raises the pleasing prospect that patients could be assessed for this marker using a simple blood draw. Other biomarkers currently in use rely on tumor tissue typically obtained by a biopsy.
"If I told you that you could have a simple blood draw and in a couple of days have information to make a decision about what therapy you get, I'd say it doesn't get much better than that," Dr. Callahan says. "Of course, there is still much work to be done before these research findings can be applied to patients in the clinic, but we are really enthusiastic about the potential to apply these findings."
A limitation of the study is that it is retrospective, meaning that the data that were analyzed came from blood samples that were collected years ago and stored in freezers. To confirm that the findings have the potential to benefit patients, investigators will need to test their hypothesis in a prospective study, meaning one where patients are enrolled on a clinical trial specifically designed to test the idea that using this immunotype in treatment decisions can improve patient outcomes.
"What I'm most excited about is prospectively evaluating the idea that not only can we identify patients who won't do as well with the traditional therapies but that we can also give these patients other treatments that might help them, based on our knowledge of what LAG-3 is doing biologically," Dr. Callahan says.
Understanding The Macroscope Initiative And GeoML – Forbes
How is it possible to harness high volumes of data on a planetary scale to discover spatial and temporal patterns that escape human perception? The convergence of technologies such as LIDAR and machine learning is allowing for the creation of macroscopes, which have many applications in monitoring and risk analysis for enterprises and governments.
Microscopes have been around for centuries; they are tools that allow individuals to visualize and research phenomena that are too small to be perceived by the human eye. Macroscopes can be thought of as carrying out the opposite function: they are systems designed to uncover spatial and temporal patterns that are too large or too slow to be perceived by humans. To function, they require both the ability to gather planetary-scale information over specified periods of time and the compute technologies that can handle such data and provide interactive visualization. Macroscopes are similar to geographic information systems, but include other multimedia and ML-based tools.
Dr. Mike Flaxman, Spatial Data Science Practice Lead at OmniSci
In an upcoming Data for AI virtual event with OmniSci, Dr. Mike Flaxman, Spatial Data Science Practice Lead, and Abhishek Damera, Data Scientist, will be giving a presentation on building planetary geoML and the macroscope initiative. OmniSci is an accelerated analytics platform that combines data science, machine learning, and GPU to query and visualize big data. They provide solutions for visual exploration of data that can aid in monitoring and forecasting different kinds of conditions for large geospatial areas.
The Convergence of Data and Technologies
In a world where the amount and importance of data continue to grow exponentially, it is increasingly important for organizations to be able to harness that data. The fact that data now flows digitally changes how we collect and integrate it from many different sources across varying formats. Because of this, getting data from its raw condition to an analysis-ready state and then actually performing the analysis can be challenging and often requires very complex pipelines. Traditional software approaches generally do not scale very well, resulting in teams that are increasingly looking to machine learning algorithms and pipelines to perform tasks such as feature classification, extraction, and condition monitoring. This is why companies like OmniSci are applying ML as part of a larger macroscope pipeline to provide analytics methods for applications such as powerline risk analysis and naval intelligence.
One way that OmniSci is using their technology is in monitoring powerline vegetation by district at a national level in Portugal. Partnering with Tesselo, they are using a combination of imagery and LIDAR technologies to build a more detailed and temporally flexible portrait of land cover that can be updated weekly. Using stratified sampling for ML and GPU analytics for real-time integration, they are able to extract and render billions of data points from sample sites for vegetation surrounding transmission lines.
For large-scale projects such as the above, there are most often two common requirements: extremely high volumes of data are required to provide accurate representations of specific geographical locations, and machine learning is needed for data classification and continuous monitoring. OmniSci aims to address the question of how, on a technical level, these two requirements can be integrated in a manner that is dynamic, fast, and efficient. The OmniSci software is a platform that consists of three layers, each of which can be independently used or combined together. The first layer, OmniSci DB, is a SQL database that provides fast queries and embedded ML. The middle component, the Render Engine, provides server-side rendering that acts similarly to a map server and can be combined with the database layer to render results as images. The final layer, OmniSci Immerse, is an interactive front-end component that allows the user to play around with charts and data and request queries from the backend. Together, the OmniSci ecosystem can take in data from many different sources and formats and talk to other SQL databases through well-established protocols. Data scientists can use traditional data science tools jointly, making it easy to analyze the information. OmniSci's solution centers on the notion of moving the code to the data rather than the data to the code.
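As a rough illustration of the "move the code to the data" idea, here is a sketch of what querying the database layer from Python might look like, assuming the pymapd client (OmniSci's Python connector); the connection details, table, and column names are placeholders, not a real deployment.

```python
# Illustrative only: connection details, table, and column names are placeholders.
from pymapd import connect

con = connect(user="admin", password="HyperInteractive",
              host="localhost", dbname="omnisci")

# Push the aggregation into the database layer rather than pulling raw points out.
query = """
    SELECT district,
           AVG(vegetation_height_m) AS avg_veg_height,
           COUNT(*) AS n_points
    FROM lidar_vegetation
    GROUP BY district
    ORDER BY avg_veg_height DESC
"""
for district, avg_height, n_points in con.execute(query):
    print(district, round(avg_height, 2), n_points)
```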
Case Study in Firepower-Transmission Line Risk Analysis
A specific case study for OmniSci Immerse demonstrates the ability to perform firepower-transmission line risk analysis. Growing vegetation can pose high risks to power lines for companies such as PG&E, and it can be inefficient and challenging to assess changing risks accurately. However, by combining imagery and LIDAR data, OmniSci is providing a better way to map out the physical structures of different geographic areas, such as Northern California, to analyze risk without needing on-site visits. OmniSci's platform combines three factors of physical structure, vegetation health over time, and varying wind speeds over space to determine firepower strike tree risk. They are addressing both the issues of scale and detail to allow utility companies to determine appropriate actions through continuous monitoring.
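The article does not spell out OmniSci's actual scoring model; as a rough sketch of how the three factors might be combined into a single risk score, here is a minimal pandas example with made-up segment data, normalization, and weights.

```python
import pandas as pd

# Hypothetical per-segment inputs along a transmission corridor (all values invented).
segments = pd.DataFrame({
    "tree_height_minus_clearance_m": [1.5, -0.5, 3.2, 0.8],   # physical structure (from LIDAR)
    "vegetation_dryness_index":      [0.7, 0.4, 0.9, 0.6],    # vegetation health over time
    "peak_wind_speed_ms":            [18.0, 9.0, 25.0, 14.0], # wind speed over space
})

# Min-max normalize each factor to [0, 1] so the factors are comparable.
norm = (segments - segments.min()) / (segments.max() - segments.min())

# Weighted combination; the weights are illustrative assumptions, not OmniSci's model.
weights = {"tree_height_minus_clearance_m": 0.4,
           "vegetation_dryness_index": 0.3,
           "peak_wind_speed_ms": 0.3}
segments["strike_risk_score"] = sum(norm[col] * w for col, w in weights.items())
print(segments.sort_values("strike_risk_score", ascending=False))
```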
In addition to the firepower-transmission line risk analysis example, there are many other use cases for macroscope technologies and methods. OmniSci is providing a way to perform interactive analyses on multi-billion-row datasets, and they can provide efficient methods for critical tasks such as anomaly detection. To learn more about the technology behind OmniSci solutions as well as the potential use cases, make sure to join the Data for AI community for the upcoming virtual event.
Cancer Informatics for Cancer Centers: Scientific Drivers for Informatics, Data Science, and Care in Pediatric, Adolescent, and Young Adult Cancer -…
JCO Clin Cancer Inform. 2021 Aug;5:881-896. doi: 10.1200/CCI.21.00040.
ABSTRACT
Cancer Informatics for Cancer Centers (CI4CC) is a grassroots, nonprofit 501(c)(3) organization intended to provide a focused national forum for engagement of senior cancer informatics leaders, primarily aimed at academic cancer centers anywhere in the world but with a special emphasis on the 70 National Cancer Institute-funded cancer centers. This consortium has regularly held topic-focused biannual face-to-face symposiums. These meetings are a place to review cancer informatics and data science priorities and initiatives, providing a forum for discussion of the strategic and pragmatic issues that we faced at our respective institutions and cancer centers. Here, we provide meeting highlights from the latest CI4CC Symposium, which was delayed from its original April 2020 schedule because of the COVID-19 pandemic and held virtually over three days (September 24, October 1, and October 8) in the fall of 2020. In addition to the content presented, we found that holding this event virtually once a week for 6 hours was a great way to keep the kind of deep engagement that a face-to-face meeting engenders. This is the second such publication of CI4CC Symposium highlights, the first covering the meeting that took place in Napa, California, from October 14-16, 2019. We conclude with some thoughts about using data science to learn from every child with cancer, focusing on emerging activities of the National Cancer Institute's Childhood Cancer Data Initiative.
PMID:34428097 | DOI:10.1200/CCI.21.00040
Empowering the Intelligent Data-Driven Enterprise in the Cloud – CDOTrends
Businesses realize that the cloud offers a lot more than digital infrastructure. Around the world, organizations are turning to the cloud to democratize data access, harness advanced AI and analytics capabilities, and make better data-driven business decisions.
But despite heavy investments in building data repositories, setting up advanced database management systems (DBMS), and building large data warehouses on-premises, many enterprises are still challenged with poor business outcomes, observed Anthony Deighton, chief product officer at Tamr.
Deighton was speaking at the "Empowering the Intelligent Data-Driven Enterprise in the Cloud" event by Tamr and Google Cloud in conjunction with CDOTrends. Attended by top innovation executives, data leaders, and data scientists from Asia Pacific, the virtual panel discussion looked at how forward-looking businesses might kick off the next phase of data transformation.
Why a DataOps strategy makes sense
"Despite this massive [and ongoing] revolution in data, customers still can't get a view of their customers, their suppliers, and the materials they use in their business. Their analytics are out-of-date, or their AI initiatives are using bad data and therefore making bad recommendations. The result is that people don't trust the data in their systems," said Deighton.
"As much as we've seen a revolution in the data infrastructure space, we're not seeing a better outcome for businesses. To succeed, we need to think about changing the way we work with data," he explained.
And this is where a DataOps strategy comes into play. A direct play on the popular DevOps strategy for software development, DataOps relies on an automated, process-oriented methodology to improve data quality for data analytics. Deighton thinks the DevOps revolution in software development can be replicated with data through a continuous collaborative approach with best-of-breed systems and the cloud.
"Think of Tamr working in the backend to clean and deliver this centralized master data in the cloud. Offering clean, curated sources to questions such as: Who are my customers? What products have we sold? What vendors do we do business with? What are my sales transactions? And of course, for every one of your [departments], there's a different set of these clean, curated business topics that are relevant to you."
Data in an intelligent cloud
But won't an on-premises data infrastructure work just as well? What benefits does the cloud offer? Deighton outlined two distinct advantages to explain why he considers the cloud the linchpin of the next phase of data transformation.
"You can store infinite amounts of data in the cloud, and you can do that very cost-effectively. It's far less costly to store data in the cloud than it is to try to store it on-premises, in [your own] data lakes," he said.
"Another really powerful capability of Google Cloud is its highly scalable elastic compute infrastructure. We can leverage its highly elastic compute and the fact that the data is already there. And then we can run our human-guided machine learning algorithms cost-effectively and get on top of that data quickly."
Andrew Psaltis, the APAC Technology Practice Lead at Google Cloud, drew attention to the synergy between Tamr and Google Cloud.
"You can get data into [Google] BigQuery in different ways, but what you really want is clean, high-quality data. That quality allows you to have confidence in your advanced analytics, machine learning, and the entire breadth of our analytics and AI platform. We have an entire platform to enable you to collaborate with your data science team; we have the tooling to do so without code, packaged AI solutions, tools for those who prefer to write their code, and everywhere in between."
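As a small illustration of the last step Psaltis describes (landing already-cleaned records in BigQuery for analysis), here is a minimal sketch using the google-cloud-bigquery Python client; the project, dataset, and table names are placeholders, and the upstream cleaning (Tamr's role) is assumed to have already happened.

```python
import pandas as pd
from google.cloud import bigquery

# Stand-in for the curated, deduplicated output of the upstream cleaning step.
clean_customers = pd.DataFrame({
    "customer_id": ["C001", "C002"],
    "name": ["Acme Pte Ltd", "Globex GmbH"],
    "country": ["SG", "DE"],
})

client = bigquery.Client()                     # uses application-default credentials
table_id = "my-project.master_data.customers"  # placeholder destination table

job = client.load_table_from_dataframe(clean_customers, table_id)
job.result()                                   # wait for the load job to finish
print(f"Loaded {client.get_table(table_id).num_rows} rows into {table_id}")
```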
Bridging the data silos
A handful of polls were conducted as part of the panel event, which saw participants quizzed about their ongoing data-driven initiatives. When asked how they are staffing their data science initiatives, the largest group (46%) responded that they have multiple teams across various departments handling their data science initiatives.
The rest are split between either having a central team collecting, processing, and analyzing data or a combination of a central team working with multiple project teams across departments.
Deighton observed that multiple work teams typically result in multiple data silos: "Each team has their silo of data. Maybe the team is tied to a specific business unit, a specific product team, or maybe a specific customer sales team."
"The way to break the data barriers is to bring data together in the cloud to give users a view of the data across teams," he says. "And it may sound funny, but sometimes, the way to break the interpersonal barriers is by breaking the data barriers."
"Your customers don't care how you are organized internally. They want to do business with you, with your company. If you think about it, not from the perspective of the team, but the customer, then you need to put more effort into resolving your data challenges to best serve your customers."
Making the move
When asked about their big data change initiatives for the next three years, the response is almost unanimous: Participants want to democratize analytics, build a data culture, and make decisions faster (86%). Unsurprisingly, the top roadblocks are that IT takes too long to deliver the systems data scientists need (62%) and the cost of data solutions (31%).
"The cloud makes sense, given how it enables better work efficiency, lowers operational expenses, and is inherently secure," said Psaltis. Workers are moving to the cloud, Psaltis noted, as he shared an anecdote about an unnamed organization that loaded the cloud with up to a petabyte of data in relatively short order.
This was apparently done without the involvement or knowledge of the IT department. "Perhaps it might be better if the move to the cloud is done under more controlled circumstances with the approval and participation of IT," says Psaltis.
Finally, it is imperative that data is cleaned and kept clean as it is migrated to the cloud. "Simply moving it into the cloud isn't enough. Without cleaning the data first, you will end up with poor quality, disparate data in the cloud, where each application's data sits within a silo, with more silos than before, and difficulty making quality business decisions," summed up Deighton.
Paul Mah is the editor of DSAITrends. A former system administrator, programmer, and IT lecturer, he enjoys writing both code and prose. You can reach him at [emailprotected].
Image credit: iStockphoto/Artystarty
The Winners Of Weekend Hackathon -Tea Story at MachineHack – Analytics India Magazine
The Weekend Hackathon Edition #2, The Last Hacker Standing: Tea Story challenge, concluded successfully on 19 August 2021. The challenge involved creating a time series analysis model that forecasts for 29 weeks. It had more than 240 participants and 110+ solutions posted on the leaderboard.
Based on the leaderboard score, we have the top 4 winners of the Tea Story Time Series Challenge, who will get free passes to the virtual Deep Learning DevCon 2021, to be held on 23-24 Sept 2021. Here, we look at the winners' journeys, solution approaches, and experiences at MachineHack.
First Rank: Vybhav Nath C A
Vybhav Nath is a final-year student at IIT Madras. He entered this field during his second year of college and started participating in MachineHack hackathons last year. He plans to take up a career in Data Science.
Approach
He says the problem was unique in the sense that many columns in the test set had a lot of null values, so this was a challenging task to solve. He kept his preprocessing steps restricted to imputation and replacing the N.S values. This was the first competition where he didn't use any ML model. Since many columns had null values, he interpolated the columns to get a fully populated test set; the final prediction was then just the mean of these Price columns. He thinks this was a total doosra by the cool MachineHack team.
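A minimal pandas sketch of the approach he describes: interpolate the sparse price columns and take their row-wise mean as the forecast. The file and column names are assumptions, since the competition schema isn't reproduced in the article.

```python
import pandas as pd

test = pd.read_csv("test.csv")  # hypothetical file name for the competition test set
price_cols = [c for c in test.columns if c.startswith("Price")]  # assumed column naming

# Convert price columns to numeric; markers such as 'N.S' (no sale) become NaN.
test[price_cols] = test[price_cols].apply(pd.to_numeric, errors="coerce")

# Fill the gaps by interpolating down each column, then take the row-wise mean
# of the price columns as the prediction -- no ML model involved at all.
test[price_cols] = test[price_cols].interpolate(limit_direction="both")
test["prediction"] = test[price_cols].mean(axis=1)
```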
Experience
He says, "I always participate in MH hackathons whenever possible. There are a wide variety of problems which test multiple areas. I also get to participate with many professionals, which I found to be a good pointer about where I stand among them."
Check out his solution here.
Second Prize: Shubham Bharadwaj
Shubham has been working as a Data Scientist and with large datasets for about 7 years, starting off with SQL, then Big Data analytics, then data engineering, and finally data science. But he is new to hackathons; this is the fourth hackathon in which he has participated. He loves to solve complex problems.
Approach
The data provided was very raw in nature; there were around 70 percent missing values in the test dataset. From his point of view, finding the best imputation method was the backbone of this challenge.
Preprocessing steps followed:
1. Converting the columns to the correct data types.
2. Imputing the missing values. He tried various methods, like filling the null values with the mean of each column, the mean of that row, and MICE, but the best was the KNN imputer with n_neighbors set to 3.
To remove the outliers, he used the IQR (interquartile range) method, which helped in reducing the mean squared error.
The models tried were logistic regression, then XGBRegressor, ARIMA, TPOT, and finally H2OAutoML, which yielded the best result.
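A minimal sketch of the two preprocessing steps he highlights, KNN imputation with three neighbours and IQR-based outlier filtering; the file name, column selection, and target column are placeholders rather than the actual competition data.

```python
import pandas as pd
from sklearn.impute import KNNImputer

df = pd.read_csv("train.csv")                    # hypothetical file name
num_cols = df.select_dtypes("number").columns    # numeric price/feature columns

# KNN imputation: each missing value is filled using the 3 most similar rows.
imputer = KNNImputer(n_neighbors=3)
df[num_cols] = imputer.fit_transform(df[num_cols])

# IQR outlier filtering on the target column (column name assumed for illustration).
q1, q3 = df["target_price"].quantile([0.25, 0.75])
iqr = q3 - q1
df = df[df["target_price"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]
```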
Experience
Shubham says, "I am new to the MachineHack family, and one thing is for sure: I am here to stay. It's a great place; I have already learned so much. The datasets are of a wide variety and the problem statements are unique, puzzling and complex. It's a must for every aspiring and professional data scientist to upskill themselves."
Check out his solution here.
Third Prize: Panshul Pant
Panshul is a Computer Science and Engineering graduate. He has picked up data science mostly from online platforms like Coursera, HackerEarth, and MachineHack, and by watching videos on YouTube. Going through articles on websites like Analytics India Magazine has also helped him in this journey. This problem was based on a time series, which made it unique, though he solved it using machine learning algorithms rather than traditional approaches.
Approach
There were certain string values like N.S, No sale, etc. in all the numerical columns, which he changed to null values and then imputed. He tried various ways to impute the NaNs, such as zero, mean, ffill, and bfill; of these, the forward- and backward-filling methods performed significantly better. Exploring the data, he noticed that the prices increased over the months and years, showing a trend. The target column's values were also very closely related to the average of the prices of all the independent columns. He kept all data, including the outliers, without much change, as tree-based models are quite robust to outliers.
As the prices were related to time, he extracted time-based features as well, out of which day of week proved to be useful. An average-based feature holding the average of all the numerical columns was extremely useful for good predictions. He tried some aggregate-based features as well, but they were not of much help. For predictions he used tree-based models, LightGBM and XGBoost; a weighted average of the two gave the best results.
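A minimal sketch of the pipeline he describes: forward/backward filling, a day-of-week feature, a row-average feature, and a weighted blend of LightGBM and XGBoost. The schema, column names, and 50/50 blend weights are illustrative assumptions.

```python
import pandas as pd
from lightgbm import LGBMRegressor
from xgboost import XGBRegressor

train = pd.read_csv("train.csv", parse_dates=["date"])  # hypothetical schema
test = pd.read_csv("test.csv", parse_dates=["date"])

def build_features(df):
    price_cols = [c for c in df.columns if c.startswith("Price")]  # assumed naming
    df[price_cols] = df[price_cols].apply(pd.to_numeric, errors="coerce")
    df[price_cols] = df[price_cols].ffill().bfill()      # forward then backward fill
    df["day_of_week"] = df["date"].dt.dayofweek          # time-based feature
    df["price_avg"] = df[price_cols].mean(axis=1)        # row-average feature
    return df[price_cols + ["day_of_week", "price_avg"]]

X_train, y_train = build_features(train), train["target_price"]  # target name assumed
X_test = build_features(test)

lgbm = LGBMRegressor().fit(X_train, y_train)
xgb = XGBRegressor().fit(X_train, y_train)

# Weighted average of the two tree-based models; equal weights are illustrative.
pred = 0.5 * lgbm.predict(X_test) + 0.5 * xgb.predict(X_test)
```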
Experience
Panshul says, "It was definitely a valuable experience. The challenges set up by the organisers are always exciting and unique. Participating in these challenges has helped me hone my skills in this domain."
Check out his solution here.
Fourth Prize: Shweta Thakur
Shweta's fascination with data science started when she realised how numbers can guide decision making. She did a PGP-DSBA course from Great Learning. Even though her professional work does not involve data science, she loves to challenge herself by working on data science projects and participating in hackathons.
Approach
Shweta says that the fact that it is a time series problem makes it unique. She observed trend and seasonality in the dataset and high correlation between various variables. She didn't treat the outliers but handled the missing values with the interpolate (linear, spline) method, ffill, bfill, and replacement with other values from the dataset. Even though some of the features were not very significant in identifying the target, removing them didn't improve the RMSE. She tried only SARIMAX.
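A minimal SARIMAX sketch along the lines she describes, using statsmodels; the file name, target column, interpolation choice, and the (p, d, q)(P, D, Q, s) orders are illustrative assumptions rather than her tuned settings.

```python
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

train = pd.read_csv("train.csv", parse_dates=["date"], index_col="date")  # hypothetical schema
y = train["target_price"].interpolate(method="linear")  # fill gaps before modelling

# Weekly data with yearly seasonality (s=52); the orders here are placeholders.
model = SARIMAX(y, order=(1, 1, 1), seasonal_order=(1, 1, 0, 52))
result = model.fit(disp=False)

forecast = result.forecast(steps=29)  # the challenge asked for a 29-week horizon
print(forecast.head())
```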
Experience
Shweta says, "It was a great experience to compete with people from different backgrounds and areas of expertise."
Check out her solution here.
Once again, join us in congratulating the winners of this exciting hackathon, who indeed were the Last Hackers Standing of Tea Story: Weekend Hackathon Edition #2. We will be back next week with the winning solutions of the ongoing challenge, the Soccer Fever Hackathon.
Mathematical Optimization: A Powerful Prescriptive Analytics Technology That Belongs In Your Data Science Toolbox – insideBIGDATA
In this special guest feature, Dr. Gregory Glockner, Vice President and Technical Fellow at Gurobi, explains how you can get started using mathematical optimization and provides some examples of how this prescriptive analytics technology can be combined with machine learning to deliver business benefits across various industries. Prior to joining Gurobi in 2009, Dr. Glockner was partner and Chief Operating Officer for Dwaffler, a provider of decision analysis tools. Dr. Glockner has a B.S. magna cum laude from Yale University in Applied Mathematics and Music, and an M.S. and Ph.D. in Operations Research from the Georgia Institute of Technology. He has trained users of optimization software in Brazil, Hong Kong, Japan, Singapore, South Korea, and throughout the USA and Canada. He is an expert in optimization modeling and software development.
We are in the midst of a golden age of data analytics, where high-quality data abounds and powerful, advanced analytics tools are readily available.
Enterprises across industries are looking to leverage these analytics tools to generate solutions to their mission-critical problems, guide their predictions and decisions, and gain a competitive advantage. But with so many analytics tools on the market, many companies have difficulties determining which ones they truly need.
Broadly speaking, analytics consists of three different types of tools: descriptive, predictive, and prescriptive.
All three types of analytics tools are widely used by organizations today. For example, as governments and the healthcare industry rush to vaccinate the global population against COVID-19, descriptive analytics tools can provide us with an accurate, real-time overview of current vaccination and infection rates; predictive analytics tools can forecast what would happen to infection rates if we administer more vaccines in specific locations at certain times; and prescriptive analytics tools can help us decide exactly where and when to distribute vaccines.
If you as a data scientist or IT professional want to extract maximum value from your data (by utilizing it to drive insights, predictions, decisions, and the best possible business outcomes), you should use all three types of analytics tools, ideally in an integrated manner.
You probably have a very firm grasp of descriptive and predictive analytics tools, but perhaps are not that familiar with prescriptive analytics in general and mathematical optimization (the primary prescriptive analytics tool) in particular.
In this article, I'll briefly explain how you can get started using mathematical optimization and provide some examples of how this prescriptive analytics technology can be combined with machine learning to deliver business benefits across various industries.
Learning to Leverage Mathematical Optimization at Scale
Chances are that you, like most data scientists and IT professionals, already have some experience using mathematical optimization, most likely in Excel.
Like a Swiss Army Knife, Excel provides users access to a number of different tools, including forecasting and scenario analysis functionality and a basic mathematical optimization solver.
Although Excel gives you the opportunity to get your feet wet with these analytics tools and perform simple tasks, this software's capabilities are quite limited, as it can't handle large, multi-dimensional data sets or problems of significant complexity.
If you want to use mathematical optimization or other sophisticated analytics tools at scale, you need a more specialized and robust tool for the job.
When it comes to mathematical optimization, there's a wide array of commercial mathematical optimization computational and modeling tools on the market, many of which interface with the popular programming languages that data scientists are accustomed to, such as Python, MATLAB, and R.
You can use your programming language of choice to build mathematical optimization models and applications just like you do with machine learning.
Of course, it will take some time and effort to learn to write code for mathematical optimization, but in the end it will pay off, as you will be able to utilize this potent prescriptive analytics technology on its own or in combination with machine learning to automatically generate solutions to your most critical and challenging business problems and make optimal decisions.
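For a feel of what writing code for mathematical optimization looks like, here is a minimal sketch of a toy production-planning model in Python with gurobipy; the products, profit coefficients, and capacity figures are invented for illustration and are not tied to any example in this article.

```python
import gurobipy as gp
from gurobipy import GRB

# Toy model: choose production quantities of two products to maximize profit
# subject to shared machine-hour and labor-hour capacities (all numbers invented).
m = gp.Model("production_plan")

x = m.addVar(name="product_a")   # units of product A (non-negative by default)
y = m.addVar(name="product_b")   # units of product B

m.setObjective(40 * x + 30 * y, GRB.MAXIMIZE)            # profit per unit
m.addConstr(2 * x + 1 * y <= 100, name="machine_hours")
m.addConstr(1 * x + 2 * y <= 80, name="labor_hours")

m.optimize()
if m.Status == GRB.OPTIMAL:
    print(x.X, y.X, m.ObjVal)    # optimal quantities and total profit
```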
Making an Impact Across Industries
Mathematical optimization and machine learning have proved to be a dynamic duo, and companies across many different industries have used these two analytics technologies together to address a wide range of real-world business problems and achieve greater productivity and profitability.
Here are just a few examples of how this combination of mathematical optimization and machine learning is delivering major business value in various industry verticals:
Adding Mathematical Optimization to Your Data Science Toolbox
There has been a continuous increase in the number of data scientists using mathematical optimization, as well as the number of different use cases of this prescriptive analytics technology (on its own and in combination with machine learning), across various industries.
If you are interested in adding mathematical optimization to your toolbox, you can get started by exploring and experimenting with mathematical optimization in Excel. Then, when you are ready to experience the full power of this technology, you can move on to industrial-strength mathematical optimization tools that will enable you to tackle problems that are huge in terms of complexity, scale, and significance.
If you want to unlock the true value of your data (by using it to not only derive insights and predictions, but also to drive optimal decision making), then you need mathematical optimization along with machine learning and other analytics technologies in your toolset.
Sign up for the free insideBIGDATA newsletter.
Join us on Twitter: @InsideBigData1 https://twitter.com/InsideBigData1
Data science is a team sport: How to choose the right players – ZDNet
Building deep and ongoing data science capabilities isn't an easy process: it takes the right people, processes and technology. Finding the right people for the right roles is an ongoing challenge, as employers and job seekers alike can attest.
"The people part is probably the least well-understood aspect of this entire equation," John Thompson, global head of advanced analytics & AI at CSL Behring, said during a virtual panel discussion on Thursday.
As the head of analytics at one of the leading international biotechnology companies, Thompson oversees data science teams that tackle a wide range of initiatives. He and the experts in the virtual panel, hosted by MLOps firm Domino Data Lab, agreed that scaling data science requires more than just data scientists.
To kick off data science initiatives at CSL Behring, Thompson says he starts with a "skeleton team you need for a project to be successful." That typically includes engineers, data scientists, a UI or UX data visualist and subject matter experts.
A successful data science team also needs a leader who can make sure projects stay focused on business objectives.
"If we're saying data science is a team sport, you don't just need all the players; you need a coach," said Matt Aslett, research director for the data, AI& Analytics Channel at 451 Research.
It's clear that a complete data science team comprises more than just data scientists -- but it isn't necessarily wise to consolidate data science teams within an IT department, added Nick Elprin, CEO and co-founder at Domino Data Lab.
"One of the things we've seen among companies we work with that are most successful is they closely align those teams with business objectives," he said. "How you guide their work and prioritize, the closer you can get that to the core company objective, the more likely you are to [be successful]. When you move more into IT, you get further away from core objectives."
Managers also need to consider how their teams are organized when they're hiring, Elprin said. They should ask, he said, "What types of skills are you going to make core to the role, and what will you augment with other people you'll collaborate with?"
"Companies have success [building data science teams] with folks who know stats and basic programming and augmenting them with people who know devOps or other engineering capabilities," Elprin added.
Meanwhile, it's important to consider when professional data scientists are truly needed versus tools that purport to "democratize" data science and machine learning.
"It depends on the nature of the problem you're pointing your data science and machine learning folks toward," Elprin said. "For commoditized problems, some of the auto ML solutions can be effective. If you're talking about a problem unique to your business or core to your differentiation, you need more of... the flexibility that comes with developing your own proprietary models and using the power of code to express those ideas."
Finally, advancing impactful data science projects requires buy-in from executives, Thompson noted.
"The real challenge is the macro-level change management process; it's not really about the data science process," he said. To realize the full value of a full data science initiative, he said it's important to convey to executives that "in the end, it's going to drive change. You need to be ready to drive change... if you don't want to do that, maybe we should do a project, not a program."