Category Archives: Data Science

SEO and the future world without cookies – Search Engine Land

Third-party cookie tracking is going away, and the SEO industry is ready and diligently preparing. Or rather, it's ambivalent and posting memes on Twitter.

SEO has been dealing with the lack of cookie tracking for as long as the discipline has existed.

So does the cookie's demise actually matter?

Well, my friends, I'm here to tell you two things.

The good news: this change means something and there's an opportunity.

The bad news: it's not going to be easy.

It takes some smart chops to get done, plus considerable resources you're not likely to get sitting over there in the SEO corner of the office (remote, of course).

Legislation such as the EU General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA) is tightening up what advertisers can use to track users.

I think the claim behind this initiative, that consumers want more control of their data, is partly valid but not as strong as advertised.

Most users don't care and don't think about who's tracking them, except in passing. It's not a concern that causes them to change their online behavior unless they're trying to hide something.

Most of us would prefer to have clarity and reasonable limits on what an advertiser can track and how they target us. But in general, I've found we leave it at that.

The average user doesn't think much about it, especially since it gets technical and specialized quickly, and more so by the year.

But these privacy restrictions are coming and they're a good thing.

Have you noticed, for example, the rise of audio-targeted ads and content delivered to you by Google?

Try an experiment some time in your home.

Start talking about a random but specific topic and repeat the keyword(s) a few times.

You're likely to find it in your news feed, in ads, in search results, and sprinkled around in the most unusual recommended places.

Freaky? Yeah, kinda.

It's probably good we've got legislation setting some limits, however limited they are at this early stage.

It's not new, anyway. The focus on cookies has been happening for years.

For example, Firefox began blocking third-party cookies as early as 2019. Safari followed suit in 2020.

As the move to a cookieless future gains force and creates greater restrictions in digital advertising, SEO needs to keep pace.

We need to get a seat at the table, especially with regard to the measurement of channel effectiveness, attribution, and yes, incrementality (I said it!).

The latter is a big word and a difficult thing to do in SEO.

Traditional measurement models that leverage cookies, such as multi-touch attribution (MTA), will be increasingly phased out of analytics toolkits.

The two primary models marketers have historically used are media mix modeling (MMM) and MTA.

MMM is a top-down approach that typically covers multiple years of data, while MTA is a bottom-up approach, more granular, and reliant upon cookies to track sessions and users.
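
To make the contrast concrete, here is a minimal sketch of the top-down MMM idea: regress an aggregate outcome (weekly sales) on aggregate channel spend. Everything below is simulated for illustration; real MMMs add adstock, saturation and seasonality terms.

```python
# Minimal top-down MMM sketch on simulated weekly data.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
weeks = 104  # two years of weekly observations, a typical MMM horizon
spend = pd.DataFrame({
    "tv": rng.gamma(2.0, 50_000.0, weeks),
    "paid_search": rng.gamma(2.0, 20_000.0, weeks),
    "display": rng.gamma(2.0, 10_000.0, weeks),
})
sales = (500_000                       # "base" sales not driven by media
         + 0.8 * spend["tv"]
         + 1.5 * spend["paid_search"]
         + 0.4 * spend["display"]
         + rng.normal(0, 40_000, weeks))

model = sm.OLS(sales, sm.add_constant(spend)).fit()
print(model.params)  # intercept ~ base sales; slopes ~ lift per unit of spend
```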

The problems with cookies are significant, too. They fail to measure across devices and, more recently, they're opt-in only.

But marketers still need to measure performance. Cookies have been handy for that.

When considering how a cookieless future impacts SEO, follow the model already set forth by other measurement channels: build a clean room.

The reality is that a clean room probably will not be built specifically for SEO. It doesn't need to be, since SEO doesn't have first-party data, anyway.

This is where the hard reality of SEO relative to other channels becomes apparent. Measuring it will not lead to the investment of resources across an organization. Not by itself, anyway.

But you can leverage the work others have done in paid media, for example, to get some interesting measurement applications for SEO.

Rather than using individual data, this approach takes a high-frequency metric (e.g., organic search sessions) and examines how other media (e.g., TV spots) impact the channel.

This type of analysis provides insight into how SEO captures the demand created by a TV ad, offline campaign, or display campaign.
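
A minimal sketch of that aggregated analysis, assuming daily exports of organic sessions from your analytics tool and of TV spend from the media team; the file and column names are hypothetical:

```python
# Does TV spend show up in organic search sessions a few days later?
import pandas as pd

sessions = pd.read_csv("organic_sessions_daily.csv", parse_dates=["date"])
tv = pd.read_csv("tv_spend_daily.csv", parse_dates=["date"])
df = sessions.merge(tv, on="date").sort_values("date")

# Correlate organic sessions against TV spend at short lags.
for lag in range(4):
    corr = df["organic_sessions"].corr(df["tv_spend"].shift(lag))
    print(f"TV spend lagged {lag} day(s): correlation = {corr:.2f}")
```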

Trying to force organic clicks into media mix modeling (MMM) is a misapplication of the metric, because you'll be changing results already reported to the organization.

The paid media team would disagree, and the organization would be distracted and potentially stuck in arguments over attribution.

Instead, we can take the MMM and set aside all the sales driven by media. Then, we can run SEO clicks against the base sales to attempt to tease out the signal of SEO that is hiding in the base.

Additionally, we can consider running a model of paid media impressions against SEO clicks to understand media interaction.

This is similar to the aggregated attribution approach but more granular.
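
In code, that two-step idea might look like the sketch below: subtract the MMM's media-attributed sales, then regress the remaining base on organic clicks. It assumes the MMM team can share a weekly export; the file and column names are hypothetical.

```python
# Look for an SEO signal hiding in the MMM's unexplained "base" sales.
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("mmm_weekly.csv")  # hypothetical: sales, media_attributed, seo_clicks
df["base_sales"] = df["sales"] - df["media_attributed"]

fit = sm.OLS(df["base_sales"], sm.add_constant(df["seo_clicks"])).fit()
print(fit.params)  # the slope is the candidate SEO signal in the base
```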

We have to balance the reality of how much teams are willing to invest in tracking the effectiveness of SEO, relative to other channels.

There is a lot of money pouring into media, obviously, and this drives heavy innovation into media mix modeling and attribution for these channels.

The same cannot be said for SEO. But we need to find ways to measure SEO's effectiveness, and it needs to be sophisticated, in line with other channels' approaches today.

Gone are the days of relying on some third-party Semrush charts, except in cases perhaps where we're looking at competitive insights.

It may very well be that existing MMM solutions already have adequate insights available to them, including owned and earned observations, without risking what analytics teams call collinearity: results skewed because input variables are strongly (i.e., linearly) correlated with one another.
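
Collinearity is at least cheap to check before trusting such a model. A common diagnostic is the variance inflation factor (VIF); the sketch below uses hypothetical file and column names, and values above roughly 5 to 10 suggest a column is largely explained by the others.

```python
# Flag collinear inputs with variance inflation factors (VIF).
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools import add_constant

df = pd.read_csv("channel_inputs_weekly.csv")  # hypothetical weekly export
X = add_constant(df[["tv", "paid_search", "display", "organic_clicks"]])
for i, col in enumerate(X.columns):
    print(col, round(variance_inflation_factor(X.values, i), 1))
```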

Another consideration is that teams may simply not need, or have the budget for, complex modeling such as MMM. In these cases, perhaps Google Analytics 4 and Adobe do everything that's needed at a basic level, which can be augmented with some SEO testing.

The answer to all this is simple but hard to accomplish.

SEO as a channel famously plays second fiddle to media, be it paid search, display or paid social.

Yes, companies invest in SEO and care about SEO.

When everything else is accounted for, however, media dollars will always take precedence in any measurement conversations.

Resources follow the money, and SEO is on the short end of the stick when it comes to resources from the analytics and data science teams.

But it doesn't have to be.

Getting SEO data sets into the clean rooms and aligning them to other data sources is key for gaining insights into the channel.

As digital marketers move toward using clean rooms such as Google Ads Data Hub (ADH) and others, SEO teams need to get site analytics data into these environments.

By bringing the data together, SEOs can look at customer journeys across paid media impressions, clicks and site activity (including a tag for source as organic search).

In this new environment, with SEO analytics data added to a clean room, marketers can also work toward an attribution use case to measure the contribution, and even the incrementality, of the SEO channel and its relationship with the other channels.

But there's a difficult catch here: success depends on generating buy-in from others.

There's already enough focus and resources centered on this transition away from cookies toward clean rooms and more trackable solutions.

This means resources aren't sitting around (usually) waiting to accommodate SEO. And most marketers won't have the interest to rank SEO priorities above media, especially when it comes to things like attribution and channel performance measurement.

But that's exactly what we need as SEOs more than ever: good performance tracking.

And we especially need SEO's contribution, and yes, its incremental addition to the cross-channel picture.

Doing this successfully is part of the SEO's world: navigating resources and teams along with generating buy-in from the right groups to prioritize this work.

If you can do that, the entire organization will benefit from greater clarity of SEO's contribution and its value to the business.

See the original post here:

SEO and the future world without cookies - Search Engine Land

Journal Article: Why It Takes a Village to Manage and Share Data – LJ INFOdocket

The article linked below was recently published by Harvard Data Science Review (HDSR).

Title

Why It Takes a Village to Manage and Share Data

Authors

Christine L. Borgman, UCLA

Philip E. Bourne

Source

Harvard Data Science Review, 4(3)

DOI: 10.1162/99608f92.42eec111

Abstract

Implementation plans for the National Institutes of Health policy for data management and sharing, which takes effect in 2023, provide an opportunity to reflect on the stakeholders, infrastructures, practice, economics, and sustainability of data sharing. Responsibility for fulfilling data-sharing requirements tends to fall on principal investigators, whereas it takes a village of stakeholders to construct, manage, and sustain the necessary knowledge infrastructure for disseminating data products. Individual scientists have mixed incentives and many disincentives to share data, all of which vary by research domain, methods, resources, and other factors.

Motivations and investments for data sharing also vary widely among academic institutional stakeholders such as university leadership, research computing, libraries, and individual schools and departments. Stakeholder concerns are interdependent along many dimensions, seven of which are explored: what data to share; context and credit; discovery; methods and training; intellectual property; data science programs; and international tensions. Data sharing is not a simple matter of individual practice, but one of infrastructure, institutions, and economics. Governments, funding agencies, and international science organizations all will need to invest in commons approaches for data sharing to develop into a sustainable international ecosystem.

Direct to Full Text Article

Direct to Table of Contents, HDSR 4(3)

Filed under: Associations and Organizations, Data Files, Funding, Libraries, Management and Leadership, News

Go here to read the rest:

Journal Article: Why It Takes a Village to Manage and Share Data - LJ INFOdocket

‘Communication and collaboration are everything in data science’ – Siliconrepublic.com

Daniel Moore explains why honesty is the best policy in data science and why the best models and most sophisticated techniques are not what matters most.

Daniel Moore is a lead data scientist at Liberty IT with a decade of experience in the analytics and data science field.

He has a background in biophysics and has worked in diverse fields such as cancer research, drug design, the mobility sector and insurance.

Moore told SiliconRepublic.com that his love of maths and physics was what started him on his STEM career. By the time it came to choosing what to study in university, it was between computer science and human biology. But the latter won out.

"I know there was a turning point in my education and one person that shaped how I got here. In your final year of most university degrees, you undertake a research project. Essentially you work with a professor to explore your thoughts on how to solve a novel problem. Mine was Frank," he said.

"As part of this, I worked with a new group of people who were all really friendly and devoted time to helping me learn. No question was too stupid. This was my first time programming. I remember how amazing it felt to solve mathematical problems automatically and with incredible speed. I didn't know such a field existed where I could work in biology and explore my love for technology and computers."

This experience made him change his career plans. He did a master's degree in computational biology, now known as bioinformatics, and went on to complete a PhD in biophysics and drug design.

"I was exposed to the real, tangible impact of data science on humans," he said. "How we can leverage data to screen for cancer automatically, to design and develop new drugs in a move toward a disease-free world, preventing unnecessary suffering and death. I may have got here by chance, but I stayed because the impact and use case were just incredible."

'The models you produce as a data scientist are useless if you cannot explain how they work' - DANIEL MOORE

When you finish your PhD or your degree, one question always gets asked: should I continue in academia or move to industry? A good work-life balance was important to me, so I decided against studying medicine and chose to explore a career in industry.

It wasn't an obvious choice. I applied to be a lecturer before submitting my CV to a small local start-up company to work as something called a data scientist. It was the first time I heard that term, and with a fear of interviews, I almost didn't go. Looking back, I'm so glad I took that opportunity, for multiple reasons: to experience a career in industry, within a start-up and as a data scientist.

For two years I was fortunate enough to work within the mobility sector on cutting-edge tech such as self-driving cars, biometric wearables and how we can use such sensors to improve driver safety. I got to work with some amazing tech and OEM car brands such as Bentley and Volvo, and experience that start-up culture, even working in Silicon Valley for a brief time. However, start-up life is difficult; your input directly shapes the future of the company, and you feel a tremendous amount of responsibility.

I decided to make a change and joined Liberty IT as a senior data scientist. What drew me to this job was the diversity of projects and the impact they could have. Like many, I would hear the word insurance and practically fall asleep with boredom. However, this is certainly not the case here and the projects I get to work on would put this stereotype to shame.

I never realised just how much insurance impacts our everyday lives, making things right when something unfortunate happens. The projects I get to work on impact the everyday person. Moreover, I have always enjoyed learning. The sheer diversity of projects within computer vision, natural language processing, predictive modelling and MLOps provide enough learning for a lifetime.

No one wants to do the same thing day in, day out, and that's the beauty of data science in general: it's an emerging discipline, and that means you need to constantly learn and adapt. It was one of the great aspects of research and now is part of my career as a data scientist. I love it.

As a data scientist, you tend to work on solving various problems by developing statistical models and learning from historical data. However, you often need to explain your insights, and why you chose to solve a given problem in the manner that you did.

The difficulty is that you need to explain this to a non-technical individual, the stakeholder. It's all too easy to pull the wool over someone's eyes and razzle-dazzle them with fancy terminology and buzzwords. Data science is a discipline where you need to be honest, show your insights in a non-biased way and recognise there is a plethora of methods to solve any given problem.

No matter how experienced you are, sometimes you will not be able to solve that problem, your approach might be incorrect, and you'll often be the bearer of bad news: sorry, Mr Stakeholder, it's unrealistic to train a model to be 100pc accurate.

It's incredibly challenging to be so honest with your insights. In fact, to progress your career in data science it often feels like you need to develop the best-performing model and use the most sophisticated techniques. There have been times in my career when I have chosen to have the easier conversations, to be passive rather than debate why a problem might not be possible to solve with data science.

My advice is always be honest, always have those difficult conversations, push back when you need to and have the confidence to speak openly to your stakeholders regardless of their seniority. Ultimately, people will have more respect and trust for you if you do.

I would say there was a combination of people that influenced my career. My university professor Frank helped give me the confidence to ask the stupidest of questions and sparked my initial interest in research. Irina, my PhD supervisor, showed me what it takes to strive for perfection, and my first boss Gawain helped show me the reality of the commercial world.

On another note, some individuals have explained complex topics in ways that really helped me understand them: Josh Starmer, for example, the individual behind the StatQuest videos on YouTube. He helped me appreciate that everyone can understand the most complex topics.

The most obvious trait is the logical part of me loves the methodical nature of data science. If you are new to the field, it may feel like there are a million and one ways to analyse data or create a predictive model. However, there is method to the madness and as you gain experience you realise that most projects follow a similar lifecycle.

I think there are a few other traits that are well suited to a career in analytics. I am the type of person who needs to understand fully how something works. This inquisitive nature drives me to ask questions and get to the root of a problem. It helps to be inquisitive and understand how your solution will solve the actual problem.

Lastly, communication and collaboration are everything in data science. You can be the most technically proficient individual; however, the models you produce as a data scientist are useless if you cannot explain how they work and then collaborate with your team to move your model into production, to consume real-world data and make actionable insights.

Data science is one of the most diverse fields imaginable. Frankly, it's overwhelming. The norm for progressing to a more senior role is to deepen your knowledge in one of the three big areas of data science (computer vision, NLP or predictive modelling) before branching out into another area.

Try to ignore the feeling that you need to be knowledgeable across all of these areas. If you are reading a job description that is asking for knowledge across all three disciplines, they don't need a single data scientist, they need a full team of specialists.

At Liberty IT, one of the questions we ask all new starts is: what area excites you the most in data science? Ultimately, if you have never worked on a computer vision project but are eager to explore this side of data science, they will pair you with an experienced individual to help you learn and progress in that area.

I feel like there is a real emphasis on training and development at Liberty IT that makes it much easier to progress your career. Everyone in our team is given the opportunity to take on board other responsibilities that help develop their career, such as interviewing, supervising more junior employees and teaching others within the data science community. It's honestly a really nice place to work.

Lastly, I think there is a misconception about career progression in the tech industry. Many think that a promotion simply means more money. As you progress in your career, your responsibility changes. I think it's worthwhile reflecting on what type of work you enjoy the most, the technical aspect or the management and strategy side.

It's my opinion that the most interesting careers are the ones where you find it difficult to describe to your parents exactly what it is you do, and where you define the language of your discipline.

Data science and the general field of analytics have been exactly this for me. It's changing rapidly and there's a lot to learn.

If you're starting your career in analytics or data science, imposter syndrome is something you need to be mindful of. Don't panic about not knowing everything. Ask questions and be open to learning new things.

10 things you need to know direct to your inbox every weekday. Sign up for the Daily Brief, Silicon Republic's digest of essential sci-tech news.

Visit link:

'Communication and collaboration are everything in data science' - Siliconrepublic.com

Not Real News: An Associated Press Roundup of Untrue Stories Shared Widely on Social Media This Week – LJ INFOdocket

Journal Article: "Why It Takes a Village to Manage and Share Data"

The article linked below was recently published by Harvard Data Science Review (HDSR). Title Why It Takes a Village to Manage and Share Data Authors Christine L. Borgman UCLA Philip ...

From the Seattle Public Library: The Seattle Public Library and Seattle arts organization Wa Na Wari are partnering on a project to advance racial equity in American archives, as part ...

17% Salary Increase Part Of First-Ever Librarian Union Deal With University Of Michigan (via @MLive) Information Processing Society of Japan (IPSJ) is Joining The Wikipedia Library! (via Diff) National Science ...

The article linked below was published today by Scientometrics. Title Impact Factions: Assessing the Citation Impact of Different Types of Open Access Repositories Authors Jonathan Wheeler University of New Mexico ...

The article linked below was posted on arXiv. Title Information Retention in the Multi-Platform Sharing of Science Authors Sohyeon Hwang Northwestern University Emke-gnes Horvt Northwestern University Daniel M. Romero University ...

From a Joint News Release: A new global study from AIP Publishing, the American Physical Society (APS), IOP Publishing (IOPP) and Optica Publishing Group (formerly OSA) has found that 82% ...

From the Institute of Museum and Library Services: The Institute of Museum and Library Services today announced 71 awards totaling $21,189,566 to support libraries and archives across the country. The ...

From The Library of Congress: The Library of Congress today announced the appointment of two digital transformation leaders to direct acquisition, discovery, use and preservation of the Librarys collections. Kate ...

The article linked below was published today by Data Science Journal. Title A Critical Literature Review of Historic Scientific Analog Data: Uses, Successes, and Challenges Authors Julia A. Kelly University ...

From Rutgers Today: A Rutgers researcher is teaming up with a professor from Yale to develop a digital database dedicated to the study of Black-authored and Black-published books, magazines, and ...

FCC Announces $77 Million In Emergency Connectivity Funding For Schools And Libraries To Help Close The Homework Gap (via FCC) Ford, Mellon and MacArthur Foundations Transfer Sole Ownership of Historic ...

From a NOAA News Release: A comprehensive update to NOAAs Billion Dollar Disasters mapping tool now includes U.S. census tract data providing many users with local community-level awareness of ...

See more here:

Not Real News: An Associated Press Roundup of Untrue Stories Shared Widely on Social Media This Week - LJ INFOdocket

Hex Wants to Build the Frontend for the Modern Data Stack – thenewstack.io

What Google Docs did for word processing and Figma did for interface design, Hex hopes to do for data science. Which is to say, make data science and analytics a collaborative process, using a slick web-based user interface.

I spoke to Hex CEO Barry McCardel about how the data stack will change over the 2020s and what it means for data science notebooks, a common tool for data scientists.

"We think of ourselves as building the frontend for the modern data stack," McCardel began. While that's the ultimate goal, practically speaking Hex fits into a category of tools known as the data science notebook. It competes with other such products, like Jupyter, Amazon SageMaker and Google Colab. Programming notebooks have actually existed since at least the late 1980s, when Mathematica was launched. But in the modern cloud computing era, the open source Jupyter Notebook has been the flag-bearer for the data science notebook industry. Project Jupyter was launched in 2014, as a spinoff of IPython (Interactive Python).

"I think at the core, notebooks are really just a very nice way to be able to do iterative analysis," McCardel told me. "It basically breaks your code up into chunks, called cells. So you have re-runnable individual chunks. And those chunks can both run as a small unit, but also show you the output. So that was really one of the core innovations with the notebook format, where I can run a small bunch of the code and see what it does, maybe I'm seeing a chart, or a table, or just a result set, or whatever."
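
That cell workflow is easy to picture. The sketch below uses the `# %%` cell delimiter recognized by Jupyter-compatible editors such as VS Code; the file and column names are hypothetical, and each chunk can be re-run on its own while showing its output.

```python
# %% Load the data once; later cells re-run without repeating this step.
import pandas as pd
df = pd.read_csv("sales.csv")

# %% Run a small chunk and look at the output: a quick result set.
df.describe()

# %% Iterate on just this chunk until the chart looks right.
df.groupby("region")["revenue"].sum().plot(kind="bar")
```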

However, McCardel views Hex as more than just a notebook, or a "Jupyter in the cloud" (which is how he classifies some of Hex's competitors). "What we see from our customers is that they aren't just looking for a notebook solution," he said. "They're looking for something that helps them share and helps them bring in people from different types of backgrounds and stakeholders."

Hex is trying to position itself as not just a tool for data scientists, but for data analysts and Business Intelligence (BI) roles, both of which are less technical and more business-focused.

In Hex, users can use a no-code interface, or do queries in SQL or Python. Data scientists have traditionally used Python, but according to McCardel, that isn't necessarily the case with Hex users.

"This idea that, oh, data scientists are high-end technical and use Python, and SQL is lesser, is not true at all! SQL is really good at a bunch of things. And you'll talk to a lot of data scientists who use mostly SQL, and that's totally cool. It doesn't make them less of a data scientist. There's a lot of really great things you can do [with SQL]."

Clearly, though, Hex is eyeing a much broader market than competing tools that simply focus on data scientists. It's reminiscent of many of the popular low-code platforms that have emerged in the enterprise development market in recent years, most of which also target business users.

"We want it to be easily accessible for people of all technicality levels," said McCardel, regarding the target users for Hex. "People with just baseline data knowledge and curiosity, people who might be coming from spreadsheets or BI tools."

So, I asked, are Python users adopting Hex too?

"Yeah, it's everything Jupyter can do and more. If you're coming from Python-based data science work, Hex will be familiar and powerful and give you a lot of new superpowers."

Another interesting aspect of Hex is that it has seemingly joined forces with two other modern data companies. Snowflake and Databricks, two leading cloud data platforms, were both investors in Hex's most recent funding round in March. So I asked McCardel, how would a data professional use Hex alongside one or both of those other tools?

"So, Hex sits on top of those environments, on top of Snowflake and on top of Databricks, and helps customers make the most of the data that are in those environments. If you have brought all of your data into a data warehouse, including both of those two, often the next question is: so what now? How do I make this useful and impactful for the organization? Hex really seeks to answer that question."

Ultimately though, a lot hinges on Hex's ability to emulate the likes of Google Docs and Figma in becoming a user-friendly tool that also offers enough oomph to keep power users happy. In this case, the data scientists are the power users.

McCardel admires how Figma allows designers to share their work and bring other people into the process as stakeholders, and he wants to achieve that in Hex for data scientists. Not only that, he said, but Figma "translates a lot of those people [stakeholders] into editors. The users become creators in the system, not just viewers."

"So I get really excited when I see that type of thing happen in Hex," he continued. "We see engineers and product managers and other people come in and actually become editors. Hex is not just for high-end data scientists."

Finally, we talked about where things are headed for the data stack. With tools like Snowflake, Databricks, and perhaps now Hex, the tools available to enterprise users are increasingly sophisticated and easily accessible via the web. I was curious about McCardel's thoughts on what comes next.

"The last few years have been this revolution in the data world, on the integration story," he said. "I'm using Fivetran or Stitch or Airbyte to get my data from source into my warehouse. I now have my data in my warehouse, I can use DBT [data build tool] to transform it. I can use observability tools to monitor for quality. [...] I can store this at any scale. I can run queries at any scale, you don't need to worry about provisioning servers anymore. I have been working in data for like 10 years, and it's such a huge difference from 10 years ago. It's like the ability as an organization to just have all of my data in one place and have it integrated, clean and ready to go."

However, he added, the story is not over. He thinks the next part of the cloud data revolution will be in frontend tooling, or as he put it, "what that new frontend for the data stack is."

Hex is clearly eyeing the data stack frontend, but we'll just have to wait and see whether it can capture that market as well as Figma did for interface design.

Feature image via Shutterstock.

See the original post here:

Hex Wants to Build the Frontend for the Modern Data Stack - thenewstack.io

MCI Onehealth Partners with MDClone to Accelerate Research through Global Clinical Intelligence Offering – Bio-IT World

By granting global partners access to real-world insights and synthetic data, the new partnership aims to accelerate research and inspire new therapy development to drive better patient outcomes

TORONTO, July 28, 2022 (GLOBE NEWSWIRE) -- MCI Onehealth Technologies Inc. (MCI) (TSX: DRDR), a clinician-led healthcare technology company focused on increasing access to and quality of healthcare, and MDClone, a digital health company and leader in synthetic data, are pleased to announce an advanced clinical intelligence offering for their global partners. This offering combines real-world health insights with mirrored synthetic data to power deeper research and inspire novel therapeutic development.

"MCI's collaboration with MDClone will provide our partners with greater access to high-value data-insights-as-a-service for an array of research, clinical and data science needs," said Dr. Alexander Dobranowski, MD, Chief Executive Officer of MCI. "Whether through MCI's clinic network, international healthcare providers, or pharmaceutical, life sciences and biotech partners, our mutually enhanced insights will help to quickly translate healthcare data and research into improved health and quality of life for patients."

The real-world patient health journeys that MCI's tech-enabled network is able to capture offer a comprehensive picture to researchers, who can benefit from a fuller perspective. The partnership between MCI and MDClone will leverage MDClone's technology to load, organize and protect MCI-generated patient data and use this data to help find insights to improve care. In addition, MCI and MDClone intend to work together to improve data collection and curation to better serve the needs of applied healthcare research.

MDClone offers clients robust, detailed data for thorough end-to-end, real-world analysis. Using the MDClone ADAMS Platform analytics tools and synthetic data capabilities, clinicians, researchers, and healthcare professionals can explore healthcare data more efficiently to accelerate real-world evidence processes. With synthetic data capabilities at the forefront, users can leverage self-service tools to access, analyze, and share information without privacy concerns. Additionally, the real-time identification and extraction of information about a specific population of interest allows users at healthcare systems to overcome some of the common barriers that can slow clinical data projects' progress.

"We're thrilled to partner with innovators like MCI in the healthcare and life science industries and beyond. Together, we can provide tailored clinical insights that meet clients' needs, and from those insights, MDClone can generate synthetic data that researchers can use to better understand disease progression, enhance care delivery, and develop new products that can improve patient outcomes," said Josh Rubel, Chief Operating Officer of MDClone.

In keeping with its objective to be a preeminent health technology leader, MCI nurtures international opportunities to leverage its vast pool of high-quality structured clinical information. The MDClone ADAMS Platform's unique ability to convert datasets and cohorts of interest into synthetic files that are statistically comparable to the original data, but composed entirely of artificial patients, aids in broader and more secure access and opens the doors to third-party access and larger-scale research impact.

MCI's audience for health insights continues to grow in Canada and will further benefit from access to MDClone's global roster of top-tier health system and pharma relationships. The collaboration with MDClone will accelerate MCI's entry into the clinical insights and analytics sectors in the United States of America and Israel, including potential access to headquarter-level decision-makers of global pharma and life science leaders.

"Through this commercial arrangement, we each have the benefit of immediate introduction to the active client rosters of the other, and we each gain a superior and unique offering to acquire new partners, fueling the expansion of MCI's health insight services into international markets," added Dr. Dobranowski.

About MCI

MCI is a healthcare technology company focused on empowering patients and doctors with advanced technologies to increase access, improve quality, and reduce healthcare costs. As part of the healthcare community for over 30 years, MCI operates one of Canada's leading primary care networks with nearly 260 physicians and specialists, serves more than one million patients annually and had nearly 300,000 telehealth visits last year, including online visits via mciconnect.ca. MCI additionally offers an expanding suite of occupational health service offerings that support a growing list of nearly 600 corporate customers. Led by a proven management team of doctors and experienced executives, MCI remains focused on executing a strategy centered around acquiring technology and health services that complement the company's current roadmap. For more information, visit mcionehealth.com.

About MDClone

MDClone offers an innovative, self-service data analytics environment powering exploration, discovery, and collaboration throughout healthcare ecosystems, cross-institutionally and globally. The powerful underlying infrastructure of the MDClone ADAMS Platform allows users to overcome common barriers in healthcare in order to organize, access, and protect the privacy of patient data while accelerating research, improving operations and quality, and driving innovation to deliver better patient outcomes. Founded in Israel in 2016, MDClone serves major health systems, payers, and life science customers in the United States, Canada, and Israel. Visit mdclone.com for more information.

For media enquiries please contact: Nolan Reeds | MCI Onehealth | nolan@mcionehealth.com; Erin Giegling | MDClone | erin.giegling@mdclone.com

Read the original post:

MCI Onehealth Partners with MDClone to Accelerate Research through Global Clinical Intelligence Offering - Bio-IT World

These are the roles available in data and analytics – Siliconrepublic.com

Hays' Martin Pardey explains what a data analytics professional does, what they can expect in their career and how to develop the necessary skills.

Data analytics has become a critical part of many businesses, but even within the analytics space, there are many roles available, including a data analyst, data engineer, data scientist and data manager. All these roles contribute to the goal of deriving meaningful insight from data.

Data analysts derive insight from data, while data engineers extract and manipulate data from systems and build data capability.

Data scientists, meanwhile, are able to build predictive models that help organisations make decisions based on potential future events, as well as driving automation and artificial intelligence systems. Lastly, data managers look after the data to ensure quality, governance and security.

Data roles used to be mainly centred around extracting simple management information and building reports for key stakeholders so that they could accurately analyse company performance. Now, as organisations become more data-centric, these roles have become complex.

There is a greater focus on ensuring that vast amounts of data can be analysed and accessed at any time across the organisation, as well as be used to build predictive models and power AI systems.

As a result, many organisations have now built internal data practices that employ many different types of data professional.

We have seen salaries for these roles increase steadily over the last few years as demand grows among organisations for better data.

An entry-level data analyst in permanent employment can expect to earn anywhere from £25,000 to £30,000 in the UK, while advanced data engineers and data scientists could even command six-figure salaries. Contract rates vary greatly depending on roles and skills.

Firstly, decide what area of data you want to work in. Are you highly analytical? Do you enjoy number-crunching and solving business problems? Or are you more technical, with a thirst for building data platforms and extracting the right data?

Where possible, try to develop your skills. See whether you can get any hands-on experience where you can actually apply the skills although, admittedly, this is much easier if you are already at an organisation or educational institution.

If you're in employment, look for opportunities to work with data within your current company. I've heard of companies that are training employees in non-data roles to become data professionals in internal data academies.

For those in higher education, enquire about any projects you can work on. Many universities now have partnerships with major corporate organisations so that you can contribute to real-life data projects.

While the data and analytics profession is strong, the need for data insights across all business levels means data skills have become critical even outside the technical sphere.

Research from Digital Realty revealed that more than one in five (21pc) IT leaders globally highlighted that the lack of internal talent to analyse data, and the lack of talent to build technical capacity (21pc), are among the greatest obstacles their organisations are facing when drawing insights from their data.

Luckily, there is a whole host of online courses out there, depending on what area of data you wish to pursue. For example, My Learning, Hays' free online learning portal, has lessons in data science and analytics for those interested.

Data analysis forms part of a lot of roles these days. If you're in employment, you will likely have access to data and reporting systems in your current role. Make sure that you are fully trained on how to use them.

Seek out the head of data in your current company and talk to them about what it takes and whether they can provide you with any support. Enquire about learning resources that your employer already provides and whether there are any courses, classes or even modules that are directly relevant to what you want.

The value of being able to work with data effectively is high, so organisations are likely to see the benefit in supporting you in upskilling as they will benefit from your new skillset at the same time.

By Martin Pardey

Martin Pardey is a director for technology solutions at Hays UK with more than 20 years' personal recruitment experience in the sector.

10 things you need to know direct to your inbox every weekday. Sign up for the Daily Brief, Silicon Republic's digest of essential sci-tech news.

Continued here:

These are the roles available in data and analytics - Siliconrepublic.com

ACD/Labs and TetraScience Partner to Help Customers Increase Scientific Data Effectiveness – PR Newswire

BOSTON, July 28, 2022 /PRNewswire/ -- TetraScience, the Scientific Data Cloud company, announced today that ACD/Labs, a leading provider of scientific software for R&D, has joined the Tetra Partner Network to help pharma and biopharma customers achieve greater scientific insights and outcomes.

"We are thrilled to partner with ACD/Labs, who have a long history of innovating how customers use analytical data analysis in R&D," said Simon Meffan-Main, Ph.D., VP, Tetra Partner Network. "Combining their characterization, lead optimization and interpretation products with the Tetra Data Platform will further help customers respond to the ever increasing pace of innovation in biopharma."

For decades ACD/Labs has been helping scientists to assemble multi-technique analytical data from major instrument vendors in a single environment. The company's Spectrus platform standardizes analytical data processing and knowledge management to help customers get answers, make decisions, and share knowledge. Digital interpretations stored with chemical context and the expert's annotations enable R&D organizations to store and manage knowledge that is chemically searchable. ACD/Labs' enterprise technologies remove the burden of routine data analysis from the scientist, automate data marshalling, and improve data accessibility and integrity.

The Tetra Data Platform produces Tetra Data, which is vendor-agnostic, liquid, and FAIR (Findable, Accessible, Interoperable, Reusable) scientific data that can be searched, accessed, and analyzed across the pharmaceutical and biopharmaceutical pipelines. With this partnership, customers will be able to use Tetra Data with ACD/Labs' Spectrus products to accelerate workflows and analyze scientific data with more specificity.

"Solutions from ACD/Labs and TetraScience work to remove the burden of data management from the scientist's workflow and make the IT function more effective," saidGraham McGibbon, Director of Strategic Partnerships, ACD/Labs. "We sharea common goal of creating unrestricted innovation for scientists and IT departments and are delighted to be part of the Tetra Partner Network."

"Industry participants of all kinds global pharmas, biotech startups, informatics providers, CROs, biopharma app companies, and more recognize that this movement to the Scientific Data Cloud must be driven by vendor-neutral and open partnerships that are deeply data-centric," explained Patrick Grady, CEO of TetraScience. "Biopharma needs to unify and harmonize experimental data in the cloud, in order to fully capitalize on the power of AI and data science. In turn, AI and data science will uncover insights that will accelerate discovery and development of therapeutics that extend and enhance human life. We are thrilled to further extend this network together with ACD/Labs."

To learn more about ACD/Labs and our partnership, please read the blog "Science at Your Fingertips - Across the Enterprise".

About TetraScience

TetraScience is the Scientific Data Cloud company with a mission to accelerate scientific discovery and improve and extend human life. The Scientific Data Cloud is the only open, cloud-native platform purpose-built for science that connects lab instruments, informatics software, and data apps across the biopharma value chain and delivers the foundation of harmonized, actionable scientific data necessary to transform raw data into accelerated and improved scientific outcomes. Through the Tetra Partner Network, market-leading vendors access the power of our cloud to help customers maximize the value of their data. For more information, please visit tetrascience.com.

About ACD/Labs

ACD/Labs is a leading provider of scientific software for R&D. We help our customers assemble digitized analytical, structural, and molecular information for effective decision-making, problem solving, and product lifecycle control. Our enterprise technologies enable automation of molecular characterization and facilitate chemically intelligent knowledge management.

ACD/Labs provides worldwide sales and support, and brings decades of experience and success helping organizations innovate and create efficiencies in their workflows. For more information, please visit http://www.acdlabs.com or follow ACD/Labs on Twitter and LinkedIn.

SOURCE TetraScience

View post:

ACD/Labs and TetraScience Partner to Help Customers Increase Scientific Data Effectiveness - PR Newswire

Harvard team wins Boston Regional Datathon for second straight year – Harvard School of Engineering and Applied Sciences

Last year, Aakash Mishra and Frank D'Agostino learned an important distinction in data science while competing in the Citadel Boston Regional Datathon. Their team built a model to accurately predict Airbnb real estate prices in the southern United States, but failed to place. Another Harvard team won the competition by linking public trust in the government with increased mortality rates during the COVID-19 pandemic.

That loss taught the two incoming fourth-year students at the Harvard John A. Paulson School of Engineering and Applied Sciences (SEAS) that no matter how good a model is, data scientists should seek to use that model to address a concrete challenge.

"You have to understand what is an important issue that needs an answer, and then you need to use your technical know-how to answer that question," said D'Agostino, who's pursuing an A.B. in applied mathematics. "It's the combination of not only the topic and how you come up with a problem, but also how you approach it in a way that's rigorous enough to be accepted."

Mishra and D'Agostino took that lesson into their second attempt at the Boston Regional Datathon earlier this year, with much better results. Along with Viet Vu (A.B. '23, statistics) and MIT master's student Doron Hazan, the team proved a causal relationship between the FDA-approved drug glipizide and an increased rate of heart failure in diabetic patients. Their efforts netted them the $15,000 first prize and a trip to the World Championship Datathon later this year in New York.

"What we did this time was very actionable," Mishra said. "Knowing the effect of glipizide or another type of drug on diabetic patients, and being able to provide exact numbers linking it to heart failure, could help inform doctors."

The 2022 datathon was a one-day event in which teams were given data sets in the morning, then had six hours to complete their analyses and submit their methodology and reports to the judges. For the SEAS team, the data consisted of 70,000 anonymized patient records and medical histories.

"When we originally got the data, it was really messy," said Mishra, who's pursuing an A.B. in computer science. "There were a lot of missing rows, parts of the data that didn't make sense, and parts of the data set that weren't filled out correctly. We had to figure out what we were going to keep, what we were going to throw away, and what we were going to infer."
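
That keep/throw-away/infer triage is routine data-wrangling work. A minimal sketch in pandas, with hypothetical file and column names:

```python
# Triage a messy table: keep, throw away, or infer.
import pandas as pd

df = pd.read_csv("patients.csv")

# Keep only rows that have the fields a discharge model cannot do without.
df = df.dropna(subset=["patient_id", "discharge_outcome"])

# Throw away values that don't make sense.
df = df[df["age"].between(0, 120) & (df["length_of_stay"] >= 0)]

# Infer what can reasonably be inferred: impute a numeric field with its
# median instead of discarding the whole row.
df["lab_result"] = df["lab_result"].fillna(df["lab_result"].median())
```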

Cleaning up the data and deciding what challenge to address took about three hours. The second half of the day consisted of running the models while team members worked on the background portions of the report, then analyzing the results in the final hour.

"We all have different backgrounds," Mishra said. "Frank is more into data science, Viet is into computational biology, and I'm computer science. So, a lot of the things I did early on were making sure we had proper features to train our models on and developing the code for the models themselves. Frank tried to figure out the best way to approach creating these models and what we could infer from them, and Viet came up with the mathematical background for them and what results we could take away."

The Datathon forced the team to draw on numerous lessons from their coursework at SEAS. They derived their results using concepts such as hierarchical linear models and synthetic controls, both of which they learned in Harvard courses, as well as an overall approach to data analysis.

"In class, we're taught that you have to explore the data first, and then after that you choose some features," Mishra said. "Then you want to figure out what the most interesting trends in the data are, and after that try to develop a model. Having that in mind helps with doing this quickly."

That approach paid off for Mishra, D'Agostino and their teammates, and they'll need to stick to that formula if they want to capture the $100,000 prize at the world championships in New York City.

Just like in the Boston Regional Datathon, the key will be coming up with the right question to answer using the data.

"In a lot of coursework, they kind of just give us the questions in a problem set and we solve them," D'Agostino said. "In the real world, you don't even know the question half the time. Once you have the question, then it's easy to answer it."

Read more:

Harvard team wins Boston Regional Datathon for second straight year - Harvard School of Engineering and Applied Sciences

Want your companys A.I. project to succeed? Dont hand it to the data scientists, says this CEO – Fortune

Arijit Sengupta once wrote an entire book titled Why A.I. is a Waste of Money. That's a counterintuitive title for a guy who makes his money selling A.I. software to big companies. But Sengupta didn't mean it ironically. He knows firsthand that for too many companies, A.I. doesn't deliver the financial returns company officials expect. That's borne out in a slew of recent surveys, where business leaders have put the failure rate of A.I. projects at between 83% and 92%. "As an industry, we're worse than gambling in terms of producing financial returns," Sengupta says.

Sengupta has a background in computer science, but he also has an MBA. He founded BeyondCore, a data analytics software company that Salesforce acquired in 2016 for a reported $110 million. Now he's started Aible, a San Francisco-based company that provides software that makes it easier for companies to run A.I. algorithms on their data and build A.I. systems that deliver business value.

Aible makes an unusual pledge in the A.I. industry: it promises customers will see positive business impact in 30 days, or they don't have to pay. Its website is chock-full of case studies. The key, Sengupta says, is figuring out what data the company has available and what it can do easily with that data. "If you just say, what do you want, people ask for the flying car from Back to the Future," he says. "We explore the data and tell them what is realistic and what options they have."

One reason most A.I. projects fail, as Sengupta sees it, is that data scientists and machine learning engineers are taught to look at model performance (how well does a given algorithm do with a given data set at making a prediction) instead of business performance (how much money, in either additional revenue or cost-savings, can applying A.I. to a given dataset generate).

To illustrate this point, Aible has run a challenge in conjunction with UC Berkeley: it pits university-level data science students against high school 10th graders using a real-world data set comprising 56,000 anonymized patients from a major hospital. The competing teams must find the algorithm for discharging patients from the 400-bed hospital that will make the hospital the most money, understanding that keeping patients in the hospital unnecessarily adds costs, but so does making a mistake that sees the same patient later readmitted. The winner gets $5,000. The data scientists can use any data science software tools they want, while the high school kids use Aible's software. The high school kids have beaten the data scientists, by a mile, every time they've run the competition, Sengupta says.

The teens, Sengupta says, are able to keep their eyes on the bottom line. They're not concerned with the particular model that Aible suggests (Aible works by training hundreds of different models and finding the one that works best for a given business goal), whereas the data scientists get caught up on training fancy algorithms and maximizing accurate discharge predictions, losing sight of dollars and cents.

Sengupta's point is that ignoring, or not actually understanding, the business use of an A.I. system can be downright dangerous. He describes what he calls the A.I. death spiral, where an A.I. system maximizes the wrong outcome and literally runs a business into the ground. Take for example an A.I. system designed to predict which sales prospects are most likely to convert to paying customers. The system can achieve a higher accuracy score by being conservative, only identifying prospects that are highly likely to convert. But that shrinks the pool of possible customers significantly. If you keep running this optimization process using only the small number of customers who convert, the pool will just keep shrinking, until eventually the business winds up with too few customers to sustain itself. Customer win rate, Sengupta says, is the wrong metric; the A.I. should be trained to optimize revenue or profits, or maybe overall customer growth, not conversion rates.
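
The alternative is mechanical once stated: score candidate models, or decision thresholds, on expected dollars rather than accuracy. A minimal sketch with invented payoff numbers and simulated scores:

```python
# Pick a decision threshold by expected profit, not accuracy.
import numpy as np

def profit(y_true, y_prob, threshold, value_won=1000.0, cost_pursued=50.0):
    """Pursue every prospect scored above threshold: pay the pursuit cost
    for each, earn revenue only on the ones who actually convert."""
    pursued = y_prob >= threshold
    return value_won * np.sum(pursued & (y_true == 1)) - cost_pursued * pursued.sum()

rng = np.random.default_rng(1)
y_true = rng.binomial(1, 0.1, 10_000)                          # 10% truly convert
y_prob = np.clip(0.3 * y_true + rng.beta(2, 8, 10_000), 0, 1)  # noisy model scores

best = max(np.linspace(0.05, 0.95, 19), key=lambda t: profit(y_true, y_prob, t))
print(f"profit-maximizing threshold: {best:.2f}")
# A very conservative (high) threshold can look accurate while shrinking the
# pursued pool -- the start of the death spiral described above.
```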

Sidestepping these pitfalls requires a little bit of machine learning understanding, but a lot of business understanding. Sengupta is not alone in hammering home this theme. It's a point that a lot of those working on A.I. in commercial settings, including deep learning pioneers such as Andrew Ng, are increasingly making: algorithms and computing power are, for the most part, becoming commodities. In most of the case studies on Aible's website, customers used the startup's cloud-based software to train hundreds of different models, sometimes in less than 10 minutes of computing time. Then the business picks the model that works best.

What differentiates businesses in their use of A.I. is what data they have, how they curate it, and exactly what they ask the A.I. system to do. "Building models is becoming a commodity," Sengupta says. "But extracting value from the model is not trivial; that's not a commodity."

With that, here's the rest of this week's A.I. news.

Jeremy Kahn | @jeremyakahn | jeremy.kahn@fortune.com

Read more here:

Want your companys A.I. project to succeed? Dont hand it to the data scientists, says this CEO - Fortune