Category Archives: Data Science
KDnuggets Survey: Benchmark With Your Peers On Data Science … – KDnuggets
Partnership Content
The All Things Insights Survey Committee, along with KDnuggets, AI Business, The AI Summit, Enter Quantum, IOT World Today, the Digital Analytics Association and Marketing Analytics and Data Science, has created a Spend & Trends survey that gives you the opportunity to benchmark against your peers on how they are spending and their mindsets around current trends.
The results from this survey will provide you and your colleagues in our community with much-needed benchmarking information on mindset and focus trends as well as budget and technology spend.
We'll analyze the responses and publish the results in the Spend & Trends Report.
Our goal is to provide resources that help analytics and data science practitioners better collaborate with and within the marketing function, as well as the rest of the organization.
We'll send you the Report as soon as it's released. Your responses will be kept completely confidential. We appreciate your time; this research helps our entire industry, and we can't do it without you. Thank you for helping us advance the analytics and data science discipline.
See the original post:
KDnuggets Survey: Benchmark With Your Peers On Data Science ... - KDnuggets
Analytics and Data Science News for the Week of September 15 … – Solutions Review
Solutions Review editors curated this list of the most noteworthy analytics and data science news items for the week of September 15, 2023.
Keeping tabs on all the most relevant analytics and data science news can be a time-consuming task. As a result, our editorial team aims to provide a summary of the top headlines from the last week in this space. Solutions Review editors will curate vendor product news, mergers and acquisitions, venture capital funding, talent acquisition, and other noteworthy analytics and data science news items.
Included in Toolbox is Anaconda Assistant, the recently released AI assistant designed specifically for Python users and data scientists, which can guide you in your first steps or supercharge your work, even if you have advanced experience.
Read on for more.
The Databricks Lakehouse unifies data, analytics and AI on a single platform so that customers can govern, manage and derive insights from enterprise data and build their own generative AI solutions faster. The support from Databricks' financial and strategic partners comes on the heels of its Q2 momentum.
Read on for more.
This new product, driven by data semantics and real-world relevance, eliminates a major headache for data science teams preparing and deploying AI data. Powered by Generative AI, FeatureByte Copilot saves data science teams significant time, effort, and resources while moving AI projects from ideation to implementation faster, at scale, and with greater accuracy.
Read on for more.
Shared device mode is a device-level configuration that enables single sign-on (SSO) and device-wide sign-out for Microsoft Power BI and all other apps on the device that support this configuration. With shared device mode, frontline workers can securely share a single device throughout the day, signing in and out as needed.
Read on for more.
With Qlik Staige, customers can innovate and move faster by making secure and governed AI part of everything they can do with Qlik, from experimenting with and implementing generative AI models to developing AI-powered predictions.
Read on for more.
Qrvey enables dashboard creators to build reports using different data sources to create customizable dashboards specific to their business needs. This means end users can have a single dashboard that combines data sourced from Snowflake and data sourced from Qrvey.
Read on for more.
The assistant, called Einstein Copilot, can summarize video calls, deliver personalized answers to customer questions and generate emails for marketing campaigns, among other tasks, the company said ahead of its Dreamforce conference this week. AI copilots function like virtual assistants that can set reminders, schedule meetings and create content, while a Generative Pre-trained Transformer (GPT) uses human language to answer questions and produce content requested by the user.
Read on for more.
SAS Viya Workbench is currently available under private preview, with general availability estimated for early 2024. For synthetic data generation, SAS is working with customers in the banking and health care industries. SAS is also extensively researching the application of large language models (LLMs) to industry problems with a primary focus on delivering trusted and secure results to customers.
Read on for more.
The investment has been led by World Trade Ventures with participation from new and existing investors. It takes the total capital raised by SQream to $135 million and comes at a time when data and analytics workloads are increasing at a breakneck pace.
Read on for more.
This long-running annual event provides attendees the opportunity to hear inspiring keynotes, learn from real-world success stories, and gain key insights on how to solve some of the biggest data challenges that companies face.
Read on for more.
Watch this space each week as Solutions Review editors use it to share new Expert Insights Series articles, Contributed Shorts videos, Expert Roundtable and event replays, and other curated content to help you gain forward-thinking analysis and remain on-trend. All to meet the demand for what its editors do best: bring industry experts together to publish the web's leading insights for enterprise technology practitioners.
With the next Spotlight event, the team at Solutions Review has partnered with leading developer tools provider Infragistics. The vendor will bring two of its biggest tools in the market together, App Builder and Reveal, to show you how to quickly create end-to-end solutions with beautiful UX, interactions, theming, data binding, self-service dashboards and embedded BI.
Read on for more.
For consideration in future data science news roundups, send your announcements to the editor: tking@solutionsreview.com.
Tim is Solutions Review's Executive Editor and leads coverage on data management and analytics. A 2017 and 2018 Most Influential Business Journalist and 2021 "Who's Who" in Data Management, Tim is a recognized industry thought leader and changemaker. Story? Reach him via email at tking@solutionsreview dot com.
Read this article:
Analytics and Data Science News for the Week of September 15 ... - Solutions Review
University of Illinois: Information Sciences Professor Developing … – LJ INFOdocket
From the University of Illinois:
JooYoung Seo, a professor of information sciences at the University of Illinois Urbana-Champaign, is developing a data visualization tool that will help make visual representations of statistical data accessible to researchers and students who are blind or visually impaired.
The multimodal representation tool is aimed at the accessibility of statistical graphs, such as bar plots, box plots, scatter plots and heat maps.
"Sighted people can pick up a great deal of insight and get the big picture from visualization, but visualized data is very challenging to those who are visually impaired," said Seo, whose research includes accessible computing, universal design and inclusive data science. Seo, who is blind, is a certified accessibility expert. He also is affiliated with U. of I.'s National Center for Supercomputing Applications, where he is addressing accessibility issues for a National Science Foundation-funded high-performance computing project.
Learn More, Read the Complete Article
Gary Price (gprice@gmail.com) is a librarian, writer, consultant, and frequent conference speaker based in the Washington D.C. metro area. He earned his MLIS degree from Wayne State University in Detroit. Price has won several awards, including the SLA Innovations in Technology Award and Alumnus of the Year from the Wayne State University Library and Information Science Program. From 2006-2009 he was Director of Online Information Services at Ask.com.
Original post:
University of Illinois: Information Sciences Professor Developing ... - LJ INFOdocket
Joe Depa named inaugural chief data and analytics officer at Emory – Emory News Center
ATLANTA – Joe Depa, a global leader in data operations, analytics and artificial intelligence (AI), has been named Emory University's inaugural chief data and analytics officer. He began his new position on Sept. 11.
In this inaugural role, Depa will use the power of data to enhance health outcomes by ensuring better patient care and reducing clinician burnout, expand Emory's academic impact through groundbreaking research and education, and create an environment where the Emory community can thrive by focusing on efficiency and culture. Depa's new position will support both the Emory University and Emory Healthcare data infrastructure.
"Joe's expertise and experience is a perfect fit for Emory at this time, as we seek to leverage the power of data and AI to enhance our capabilities in academic, administrative and research areas and improve patient outcomes," says John Ellis, PhD, interim chief information officer and senior vice provost for Emory University. "Joe is also passionate about using data for good and is committed to our mission of improving the health of individuals and communities at home and throughout the world. We welcome Joe warmly as he begins this pivotal work."
Depa comes to Emory from Accenture, a Fortune 50 technology provider, where he served as the senior managing director and global lead for data and AI for the company's strategy and consulting business. There he managed its award-winning team of global professionals specializing in data science and AI strategy, and served on the global leadership committee. He focused on helping clients in health, life sciences and other industries leverage data to develop new clinical data products, improve the patient and employee experience and reduce operating expenses.
"As health care pivots to address patient access, workforce shortages and ballooning expenses, AI, machine learning and large language models have the potential to help, but only if guided by the right expertise," says Alistair Erskine, MD, chief information and digital officer for Emory Healthcare and vice president of digital health for Emory University. "Joe's experience in and out of health care, combined with his purpose-driven mission to alleviate human suffering, makes him the ideal inaugural leader for this critical role."
"I am excited to join Emory in this new role to help enrich the patient, clinician and researcher experience through AI and data science," says Depa. "This position supports a purpose-driven mission, using the power of data to help advance positive changes in the lives of patients being cared for at Emory, in our daily work on our campuses and in our society."
Depa received a bachelor's degree in industrial and systems engineering and a master's degree in analytics from the Georgia Institute of Technology (Georgia Tech). Outside of work, he is a board member for Cure Childhood Cancer and a founder and supporter of other organizations focused on research and advancing precision medicine for childhood cancer.
See original here:
Joe Depa named inaugural chief data and analytics officer at Emory - Emory News Center
Data-driven insights: Improving remote team performance with time … – Data Science Central
The way we work has changed, with remote teams now a common part of the landscape. While remote work offers flexibility, it also brings challenges. Managing remote teams effectively is crucial to ensure productivity and collaboration.
In this article, we'll explore how using time tracking for remote teams can help manage employees' performance better. Time-tracking tools provide insights into how work is done, helping organizations make informed decisions. We'll see how analyzing time-tracking data reveals when teams are most productive and how tasks are managed. By understanding these patterns, organizations can enhance remote team performance and achieve better outcomes.
Time-tracking apps usually capture detailed information about tasks, projects, and activities, including start and end times, task descriptions, and breaks taken. They generate reports that display time allocation across different projects, clients, or categories, shedding light on where your efforts are concentrated. Furthermore, these apps often provide visual representations like charts and graphs, illustrating productivity trends, peak hours, and patterns of time distribution.
By analyzing this data, individuals and teams can gain valuable insights into how time is being allocated, identify bottlenecks, and streamline processes. This data-driven approach enables better time management and helps prioritize tasks effectively.
At the heart of effective time tracking for remote teams lies the practice of meticulously recording daily activities. From the moment a remote worker starts their day to when they sign off, every task, break, and project engagement is captured. This detailed chronicle not only offers a panoramic view of how time is spent but also highlights potential areas for optimization.
This approach offers transparency into each team member's workflow. Managers gain insights into the types of tasks being executed, the time dedicated to each task, and potential areas where efforts might be misplaced.
Furthermore, tracking daily activities brings to light the ebbs and flows of each team member's work patterns. This knowledge empowers remote teams to identify productivity trends, such as the times when individuals are most focused and effective.
Additionally, some time-tracking tools offer customizable tagging systems, allowing you to categorize tasks based on their nature or complexity. For instance, users can label tasks as "high priority," "creative," or "routine" and later review their tracked time to note when they tackled specific types of tasks with the highest level of energy. This categorization helps you identify peak productivity hours and the kinds of tasks that thrive during these periods.
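To make this concrete, here is a minimal sketch of the kind of peak-hours analysis described above. It assumes a hypothetical CSV export named time_entries.csv with start, end, and tag columns; real tools each use their own export schema, so treat the column names as placeholders.

```python
import pandas as pd

# Hypothetical export from a time-tracking app.
# Assumed columns: start, end (timestamps) and tag (e.g. "creative", "routine").
entries = pd.read_csv("time_entries.csv", parse_dates=["start", "end"])

# Duration of each entry in hours.
entries["hours"] = (entries["end"] - entries["start"]).dt.total_seconds() / 3600

# Bucket entries by the hour of day at which they started.
entries["hour_of_day"] = entries["start"].dt.hour

# Total tracked time per tag per hour of day: a rough map of when
# each kind of work actually happens.
by_hour = entries.pivot_table(
    index="hour_of_day", columns="tag", values="hours", aggfunc="sum", fill_value=0
)

# The hour with the most "creative" time is a candidate peak-productivity window.
print("Peak hour for creative work:", by_hour["creative"].idxmax())
```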
Through time tracking, remote teams can pinpoint bottlenecks that hinder productivity. Whether it's a recurring task that consumes excessive time or a specific step in a project workflow causing delays, these pain points become apparent. Armed with these insights, individuals and teams can identify these time drains and take targeted actions to minimize them.
Moreover, time-tracking data doesn't just show where time is being lost; it offers a deeper understanding of why it's happening. Are there particular tasks that consistently take longer than expected? Are there patterns of multitasking that fragment concentration and efficiency? These insights allow for a more holistic analysis of work habits and the identification of underlying causes of time wastage. As a result, teams can implement strategies to address these specific issues.
In addition, many time-tracking tools for remote teams offer reports that show how time is allocated across different websites and apps. This offers a valuable window into your digital behavior, helping you gauge whether you are spending excessive time on non-work-related websites. By analyzing these reports, team members can gather insights into whether their online activities align with their intended work goals. For example, if the reports show that you often spend a lot of time on social media or entertainment websites during work hours, it's clear that you need to make changes to stay more focused.
By analyzing historical time data across various tasks and projects, teams can gain a clearer understanding of how long certain activities actually take to complete. This insight replaces guesswork with empirical evidence, enabling more accurate and realistic project timelines. As teams delve into the accumulated data, they can identify patterns in task durations, uncover potential bottlenecks, and factor in unforeseen variables that might affect future projects.
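A small sketch of how such empirical estimates might be derived from tracked history; the data here is invented rather than taken from any specific tool's export:

```python
import pandas as pd

# Hypothetical history of completed tasks: type and hours actually spent.
history = pd.DataFrame({
    "task_type": ["report", "report", "report", "design", "design"],
    "hours":     [3.0,      4.5,      3.5,      8.0,      10.0],
})

# Replace guesswork with empirical durations: the median gives a typical
# estimate, and the 90th percentile builds in a buffer for unforeseen variables.
estimates = history.groupby("task_type")["hours"].agg(
    typical="median",
    buffered=lambda h: h.quantile(0.9),
)
print(estimates)
```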
Furthermore, time-tracking data facilitates a proactive approach to managing project scope and client expectations. Armed with a comprehensive record of task durations and progress, project managers can provide clients with more transparent updates and realistic forecasts. Should any deviations from the initial project plan arise, the data serves as a valuable reference point to communicate adjustments and potential impacts. This not only fosters stronger client relationships built on trust but also enables teams to adapt swiftly, ensuring project goals remain achievable within the defined timeframe.
Time-tracking data plays a key role in fostering a healthier work-life balance, especially in the context of remote work, where the boundaries between professional and personal life can blur. By providing a clear picture of how time is allocated throughout the day, it helps you identify when work spills into personal time or vice versa. For instance, if time-tracking data reveals that work-related tasks often extend into the evenings, you can adjust your work pattern to finish a bit earlier.
Time tracking for remote teams also helps to reveal whether there are adequate breaks to rest and recharge, or if there's a tendency to overindulge in extended pauses. This information is crucial for sustaining a balanced work routine. If time-tracking data shows prolonged periods without breaks, it may suggest incorporating short, regular breaks to prevent burnout and maintain focus. Conversely, excessive and frequent breaks might signal an opportunity to structure work periods more effectively. By analyzing the intervals between productive work sessions and short respites, individuals can fine-tune their approach to breaks, optimizing their productivity and well-being in the process.
By harnessing the power of data-driven insights, remote teams can unlock their true potential. From identifying peak productivity hours to enhancing work-life balance, time-tracking analytics pave the way for informed decisions, personalized strategies, and a more harmonious work environment.
Read the original:
Data-driven insights: Improving remote team performance with time ... - Data Science Central
IST researcher among recipients of $29 million fusion energy … – Pennsylvania State University
UNIVERSITY PARK, Pa. – The U.S. Department of Energy awarded a $29 million grant to seven multi-institution teams across the country to explore applications of machine learning, artificial intelligence and data resources in fusion and plasma sciences. A Penn State faculty member is one of the 19 individual recipients recognized, with a share of close to $400,000 to focus on the use of machine learning to help mitigate nuclear reactor disruptions.
Romit Maulik, assistant professor in the Penn State College of Information Sciences and Technology (IST), will collaborate with researchers from Los Alamos National Laboratory, the University of Florida and The University of Texas at Austin (UT Austin) over the next three years with this funding. The team's project is titled "DeepFusion Accelerator for Fusion Energy Sciences in Disruption Mitigation." The researchers will focus on using machine learning to better predict and prevent imminent failures in nuclear fusion reactors, which generate energy through the same process that powers the sun.
"Artificial intelligence and scientific machine learning are transforming the way fusion and plasma research is conducted," said Jean Paul Allain, associate director for fusion energy sciences within the DOE's Office of Science, in a DOE press release. Allain is currently on leave from his role as the head of the Ken and Mary Alice Lindquist Department of Nuclear Engineering at Penn State. "The U.S. is leveraging every tool in its pursuit of an aggressive program that will bring fusion energy to the grid on the most rapid timescale."
Before joining IST this year, Maulik, who is also a Penn State Institute for Computational and Data Sciences co-hire, had been collaborating on the Tokamak Disruption Mitigation project with Los Alamos National Laboratory to build machine learning algorithms to aid scientific discovery in nuclear fusion. He said this grant will support him as he takes a deeper dive into the machine learning side of things.
"Nuclear fusion reactors are prone to catastrophic performance failures," Maulik said. "This creates a safety hazard that prevents nuclear fusion from being commercialized or becoming a power source for the grid."
Maulik said one grand challenge is the inability to predict when a reactor will fail. Simulations provide insight, but they may be too slow and expensive to be used in real time to detect what might happen.
"We want to use data science to accelerate these simulations dramatically," Maulik said. "If we can rapidly predict an imminent failure, we can control the factors that affect our experiment so that this failure may be avoided."
Maulik said the project will develop machine learning models using previously run simulations as well as experimental data that is coming from actual reactor facilities.
"Once we're able to detect failures ahead of time, we'll be able to begin proposing mitigation strategies," he said.
Read more here:
IST researcher among recipients of $29 million fusion energy ... - Pennsylvania State University
TUM Launches Munich Data Science Institute to Drive Collaboration … – The Munich Eye
The Technical University of Munich (TUM) is proud to announce the official inauguration of the Munich Data Science Institute (MDSI), a pivotal initiative within the framework of TUM AGENDA 2030. Supported by funding from the Excellence Initiative of the German government and federal states, the MDSI serves as a central hub for advancing the realms of data science, machine learning, and artificial intelligence (AI) at TUM, from foundational research to practical interdisciplinary applications. It also aims to provide training and education opportunities for master's students, researchers, and professionals in the field of data science.
In recent years, groundbreaking developments in machine learning, AI, natural language processing, and computer-based imaging have fundamentally reshaped society, the economy, and the landscape of scientific knowledge. With the aim of bolstering the foundational principles of modern data sciences, machine learning, and AI, and applying these insights to high-potential applications, TUM has established the Munich Data Science Institute (MDSI). As an integrative research institute, MDSI will harness the expertise of over 60 professors across various interdisciplinary domains.
The official launch event for the institute was held at the Galileo Building on the TUM Research Campus in Garching.
Bavaria's Minister of Science, Markus Blume, underscored the significance of data in his keynote address at the MDSI's opening. He stated, "Data is the treasure of our time. The Munich Data Science Institute is our key to the treasure chest and will open the door to innovation. In the MDSI, TUM is bringing together what must come together in the world of data science: business and science, fundamental research and applications. Because only through collaboration and a strong interdisciplinary network can we play a role in shaping the significant transformation of the digital age."
President Prof. Thomas F. Hofmann emphasized the importance of teamwork in the era of machine learning and AI, stating, "To effectively leverage the potential of the age of machine learning and AI, we need to see modern data science as a team sport. With the MDSI, we are delivering fresh impetus to data-based technology developments and integrating them into real-world applications. Machine learning and AI harbor enormous potential. From life sciences and medicine, material and design sciences to quantum science, astrophysics, and climate science - as well as the dynamics of societal, political, and economic systems - the MDSI will support pioneering data science experts in reshaping the boundaries of what is now feasible."
Stephan Günnemann, Executive Director of the MDSI and Professor of Data Analytics and Machine Learning, outlined the institute's goals, saying, "At the MDSI, we want to study the foundations of modern data science. This relates to the areas of mathematics and informatics that deal with machine learning. But we also want to apply what we learn in specialized areas such as the development of new materials or in personalized medicine."
The MDSI also aims to disseminate research findings to the business world and society at large, facilitating the transfer of AI-based solutions to industry partners and startups in the data-related domain. Additionally, the MDSI will offer support to researchers grappling with the increasing demand for data-related tasks in their work and will serve as a network for interdisciplinary connections among AI experts.
The MDSI is a convergence point for TUM's strategic data-supported activities, ensuring synergy and reducing redundancies between different disciplines. As President Prof. Thomas F. Hofmann emphasized, "Purely quantitative growth by adding new disconnected activities, one after another, in the fields of data science and programs will not have the necessary impact to reach global player status."
Incorporating Various Initiatives and Facilities
The Munich Data Science Institute incorporates a range of initiatives and facilities under its umbrella:
The TUM Georg Nemetschek Institute - Artificial Intelligence for the Built World, an initiative supported by a generous 50 million euro donation from the Nemetschek Innovation Foundation in 2020. This initiative focuses on AI and machine learning applications throughout the entire life cycle of buildings, from planning and construction to sustainable management.
The AI Future Lab AI for Earth Observation (AI4EO), funded by the Federal Ministry of Research and led by Xiaoxiang Zhu, one of the five MDSI directors. AI4EO combines TUM's strengths in geodesy, earth observation, satellite technology, mathematics, AI, and ethics to develop reliable models related to global urbanization, food supply, and natural disaster management.
The Center for Digital Medicine and Health, a new research building with federal and state funding, will be positioned within the medical campus of Klinikum rechts der Isar. Under the leadership of MDSI director Daniel Rückert, it will focus on the development of data-driven approaches and AI methods in medicine.
The Munich Center for Machine Learning (MCML), a collaboration between TUM and LMU, funded by the Federal Ministry of Education and Research and the HighTech Agenda Bayern as one of the National Centers of Excellence for AI Research. The TUM branch of the MCML is integrated into the MDSI infrastructure.
The Konrad Zuse School of Excellence in Reliable AI, coordinated by TUM and LMU, has received funding from the German Academic Exchange Service (DAAD) since 2022. The MDSI houses the business office of the Konrad Zuse School, which is led by MDSI Executive Director Stephan Günnemann.
These initiatives underscore TUM's commitment to advancing data science, machine learning, and AI, while fostering interdisciplinary collaboration to drive innovation and tackle complex challenges in the digital age.
Opening ceremony of the Munich Data Science Institute (MDSI) of the Technical University of Munich (TUM) on September 14, 2023, in the Galileo Congress Center Garching.
Here is the original post:
TUM Launches Munich Data Science Institute to Drive Collaboration ... - The Munich Eye
NIH awards $50.3 million for multi-omics research on human … – National Institutes of Health (.gov)
News Release
Tuesday, September 12, 2023
New research consortium will develop innovative strategies for clinical studies involving ancestrally diverse populations.
The National Institutes of Health is establishing the Multi-Omics for Health and Disease Consortium, with approximately $11 million awarded in the consortium's first year of funding. The new consortium aims to advance the generation and analysis of multi-omic data for human health research.
Multi-omics refers to a research approach that incorporates several omics data types derived from different research areas such as genomics, epigenomics, transcriptomics, proteomics and metabolomics. Each of these data types reveals distinct information about different aspects of a biological system, and leveraging all these data types at once is becoming increasingly possible with advances in high-throughput technologies and data science.
The integration of multiple types of data from an individual participant's biological sample can provide a more holistic view of the molecular factors and cellular processes involved in human health and disease, including untangling genetic and non-genetic contributions. Such an approach offers great promise in areas such as defining disease subtypes, identifying biomarkers and discovering drug targets.
"Beyond gaining insights into individual diseases, the primary goal of this consortium is to develop scalable and generalizable multi-omics research strategies as well as methods to analyze these large and complex datasets," said Joannella Morales, Ph.D., a National Human Genome Research Institute (NHGRI) program director involved in leading the consortium. "We expect these strategies will ultimately be adopted by other research groups, ensuring the consortium's work will have broad and long-lasting impacts for clinical research."
Approximately half of the awarded funds will support the work of six disease study sites, which will examine conditions such as fatty liver diseases, hepatocellular carcinoma, asthma, chronic kidney disease and preeclampsia, among others. The sites will enroll research participants, at least 75% of whom will be from ancestral backgrounds underrepresented in genomics research. The sites will also collect data on participants environments and social determinants of health to be used in conjunction with the multi-omics data. Combining the multi-omic and environmental data can offer an even more comprehensive view of the factors that contribute to disease risk and outcomes.
Specimens provided by participants will be processed at the omics production center, which will use high-throughput molecular assays to generate genomic, epigenomic, transcriptomic, proteomic and metabolomic data that will be analyzed to generate molecular profiles of disease and non-disease states. The data analysis and coordination center will then incorporate all of these data into large, organized datasets that will be made available to the scientific community for further studies.
"Multi-omics studies are at the forefront of biomedical research and promise to advance our understanding of disease onset and progression," said Erin Ramos, Ph.D., M.P.H., deputy director of NHGRI's Division of Genomic Medicine, "all while potentially providing important clues for treatment design and drug-discovery efforts. This new consortium is an important step in making those advances a reality."
Approximately $50.3 million will be awarded to the consortium over five years, pending the availability of funds. The award is funded jointly by NHGRI, the National Cancer Institute (NCI) and the National Institute of Environmental Health Sciences (NIEHS).
Multi-Omics for Health and Disease Consortium
Disease study sites and principal investigators
Omics production center and principal investigator
Data analysis and coordinating center and principal investigator
The National Human Genome Research Institute (NHGRI) is one of the 27 institutes and centers at the NIH, an agency of the Department of Health and Human Services. The NHGRI Division of Intramural Research develops and implements technology to understand, diagnose and treat genomic and genetic diseases. Additional information about NHGRI can be found at: https://www.genome.gov/.
About the National Institutes of Health (NIH): NIH, the nation's medical research agency, includes 27 Institutes and Centers and is a component of the U.S. Department of Health and Human Services. NIH is the primary federal agency conducting and supporting basic, clinical, and translational medical research, and is investigating the causes, treatments, and cures for both common and rare diseases. For more information about NIH and its programs, visit http://www.nih.gov.
NIH…Turning Discovery Into Health
###
See original here:
NIH awards $50.3 million for multi-omics research on human … - National Institutes of Health (.gov)
Podcast: Vanguard’s Ryan Swann on Big Data Strategies for Big … – InformationWeek
Investment management company Vanguard operates with some 20,000 employees, serves more than 50 million investors, and has more than $7 trillion in assets under management.
Big money, big data, and a big responsibility to say the least. Financial institutions lean increasingly on data and technology to better navigate fluctuations of the market, which can see dramatic shifts as well as slow-burn trends. For instance, a few weeks ago LG AI Research and Qraft Technologies signed an agreement at the New York Stock Exchange to support efforts in AI applications and the creation of financial instruments. They are clearly not alone in the race to leverage AI and machine learning in conjunction with financial data.
With an operation of Vanguard's size and scope, using data and analytics becomes a priority to help maximize investments. A combination of co-locating data and analytics teams to work with leaders while centralizing data is part of Vanguard's approach to advising its clients on their investments and reducing risk.
Ryan Swann, Vanguard's chief data analytics officer, shares some of the data strategies and structures employed by his team to further assist customers who invest through Vanguard by identifying behaviors that might leave money on the table.
Listen to the full podcast here
More here:
Podcast: Vanguard's Ryan Swann on Big Data Strategies for Big ... - InformationWeek
Modeling social media behaviors to combat misinformation – William & Mary
Not everyone you disagree with on social media is a bot, but various forms of social media manipulation are indeed used to spread false narratives, influence democratic processes and affect stock prices.
In 2019, the global cost of bad actors on the internet was conservatively estimated at $78 billion. In the meantime, misinformation strategies have kept evolving: detecting them has so far been a reactive affair, with malicious actors always being one step ahead.
Alexander Nwala, a William & Mary assistant professor of data science, aims to address these forms of abuse proactively. With colleagues at the Indiana University Observatory on Social Media, he has recently published an open-access paper in EPJ Data Science to introduce BLOC, a universal language framework for modeling social media behaviors.
"The main idea behind this framework is not to target a specific behavior, but instead provide a language that can describe behaviors," said Nwala.
Automated bots mimicking human actions have become more sophisticated over time. Inauthentic coordinated behavior represents another common deception, manifested through actions that may not look suspicious at the individual account level, but are actually part of a strategy involving multiple accounts.
However, not all automated or coordinated behavior is necessarily malicious. BLOC does not classify good or bad activities but gives researchers a language to describe social media behaviors, based on which potentially malicious actions can be more easily identified.
A user-friendly tool to investigate suspicious account behavior is in the works at William & Mary. Ian MacDonald '25, technical director of the W&M undergraduate-led DisinfoLab, is building a BLOC-based website that can be accessed by researchers, journalists and the general public.
The process, Nwala explained, starts with sampling posts from a given social media account within a specific timeframe and encoding information using specific alphabets.
BLOC, which stands for Behavioral Languages for Online Characterization, relies on action and content alphabets to represent user behavior in a way that can be easily adapted to different social media platforms.
For instance, a string like "TpπR" indicates a sequence of four user actions: specifically, a published post, a reply to a non-friend and then to themselves, and a repost of a friend's message.
Using the content alphabet, the same set of actions can be characterized as "(t)(EEH)(UM)(m)" if the user's posts respectively contain text; two images and a hashtag; a link and a mention of a friend; and a mention of a non-friend.
The BLOC strings obtained are then tokenized into words, which could represent different behaviors. "Once we have these words, we build what we call vectors, mathematical representations of these words," said Nwala. "So we'll have various BLOC words and then the number of times a user expressed the word or behavior."
Once vectors are obtained, data is run through a machine learning algorithm trained to identify patterns distinguishing between different classes of users (e.g., machines and humans).
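As a rough illustration of that pipeline, the sketch below turns toy BLOC-style action strings into count vectors and trains a simple classifier to separate two classes of users. It is not the authors' released code; the action strings, labels, and classifier choice here are invented for the example.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy BLOC-style documents: each token is a word of actions
# (T = post, p = reply to a non-friend, R = repost of a friend).
accounts = [
    "T p R T p",        # varied mix of actions, human-like
    "T T T T T T T T",  # monotonous posting, bot-like
    "R R R R R R",      # pure reposting, bot-like
    "T p T R p T",      # varied mix of actions, human-like
]
labels = ["human", "bot", "bot", "human"]  # invented training labels

# Build vectors: one dimension per BLOC word, valued by how many
# times the account expressed that word.
vectorizer = CountVectorizer(token_pattern=r"\S+", lowercase=False)
X = vectorizer.fit_transform(accounts)

# Train a model that captures machine and human behavior.
clf = MultinomialNB().fit(X, labels)

# Classify an unknown account by whether its behavior vector is
# closer to the human or the machine class.
unknown = vectorizer.transform(["T T T T T T"])
print(clf.predict(unknown))  # -> ['bot'] on this toy data
```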
Human and bot-like behaviors are at the opposite ends of a spectrum: In between, there are cyborg-like accounts oscillating between these two.
"We create models which capture machine and human behavior, and then we find out whether unknown accounts are closer to humans, or to machines," said Nwala.
Using the BLOC framework does not merely facilitate bot detection, equaling or outperforming current detection methods; it also allows the identification of similarities between human-led accounts. Nwala pointed out that BLOC had also been applied to detect coordinated inauthentic accounts engaging in information operations from countries that attempted to influence elections in the U.S. and the West.
"Similarity is a very useful metric," he said. "If two accounts are doing almost the same thing, you can investigate their behaviors using BLOC to see if perhaps they're controlled by the same person and then investigate their behavior further."
BLOC is so far unique in addressing different forms of manipulation and is well-poised to outlive platform changes that can make popular detection tools obsolete.
"Also, if a new form of behavior arises that we want to study, we don't need to start from scratch," said Nwala. "We can just use BLOC to study that behavior and possibly detect it."
As Nwala points out to students in his class on Web Science, the science of decentralized information structures, studying web tools and technologies needs to take into account social, cultural and psychological dimensions.
"As we interact with technologies, all of these forces come together," he said.
Nwala suggested potential future applications of BLOC in areas such as mental health, as the framework supports the study of behavioral shifts in social media actions.
Research work on social media, however, has recently been limited by the restrictions imposed by social media platforms on application programming interfaces (APIs).
"Research like this was only possible because of the availability of APIs to collect large amounts of data," said Nwala. "Manipulators will be able to afford whatever it takes to continue their behaviors, but researchers on the other side won't."
According to Nwala, such limitations affect not only researchers but also society at large, as these studies help raise awareness of social media manipulation and contribute to effective policymaking.
"Just as there's been this steady shout about how the slow decline of local news media affects the democratic process, I think this rises up to that level," he said. "The ability of good-faith researchers to collect and analyze social media data at a large scale is a public good that needs not to be restricted."
Editor's note: Data and democracy are two of four cornerstone initiatives in W&M's Vision 2026 strategic plan. Visit the Vision 2026 website to learn more.
Antonella Di Marzio, Senior Research Writer
See more here:
Modeling social media behaviors to combat misinformation - William & Mary