Category Archives: Data Science

Analytics and Data Science News for the Week of August 12; Updates from Anaconda, insightsoftware, Verta, and More – Solutions Review

The editors at Solutions Review have curated this list of the most noteworthy analytics and data science news items for the week of August 12, 2022.

Keeping tabs on all the most relevant analytics and data science news can be a time-consuming task. As a result, our editorial team aims to provide a summary of the top headlines from the last month in this space. Solutions Review editors will curate vendor product news, mergers and acquisitions, venture capital funding, talent acquisition, and other noteworthy analytics and data science news items.

The Anaconda and Oracle Cloud Infrastructure Partnership will offer open-source Python and R tools and packages by embedding and enabling Anaconda's repository across OCI Artificial Intelligence and Machine Learning Services. Customers have access to Anaconda services directly from within OCI without a separate enterprise license.

Read on for more.

The annual Industry Excellence Awards acknowledge vendors that have achieved a leadership position in the company's 2022 Wisdom of Crowds Analytical Data Infrastructure (ADI), Business Intelligence (BI), and/or Enterprise Performance Management (EPM) Flagship Market Studies. The reports are based on data collected from end-users and provide a broad assessment of each market.

Read on for more.

insightsoftware plans to leverage the Dundas platform to enhance its existing Logi solutions, notably adding a strong extract, transform, and load (ETL) engine and pixel-perfect reporting. The Dundas solution is a flexible, end-to-end BI platform that offers software providers the ability to customize dashboards, reports, and visualizations. It was designed to operate as a one-stop shop for self-service analytics, with integration into multiple data sources.

Read on for more.

Adding BigSquare to the Litera solutions portfolio will empower law firms to make better financial decisions with fast access to financial data and insights. BigSquare's BI software retrieves, analyzes, and visualizes financial data onto a configurable financial intelligence dashboard that is easy to digest and understand. Lawyers and management teams can leverage the practical insights themselves, reducing the need to hire or rely on specialists to interpret the data.

Read on for more.

The report is based on a survey of 500 data and technology leaders, across a variety of industries, who are managing active data workloads of 150 terabytes or more. The purpose of the survey and report is to uncover key trends around how organizations are managing the shift from big-data volumes toward ingesting, storing, and analyzing hyperscale data sets, which include trillions of data records, and the expected technical requirements and business results from that shift.

Read on for more.

The Net Emotional Footprint (NEF) of each software provider is a result of aggregated emotional response ratings across the areas of service, negotiation, product impact, conflict resolution, strategy, and innovation. The NEF is a powerful indicator of overall user sentiment toward the provider and its product from the software user's point of view.

Read on for more.

The new updates include additions to Verta's native integration ecosystem and subsequent capabilities around enterprise security, privacy and access controls, model risk management, and the pursuit of responsible AI. Verta was recently recognized as a 2022 Gartner Cool Vendor in AI Core Technologies.

Read on for more.

For consideration in future analytics and data science news roundups, send your announcements to the editor: tking@solutionsreview.com.

Tim is Solutions Review's Editorial Director and leads coverage on big data, business intelligence, and data analytics. A 2017 and 2018 Most Influential Business Journalist and 2021 "Who's Who" in data management and data integration, Tim is a recognized influencer and thought leader in enterprise business software. Reach him via tking at solutionsreview dot com.

See more here:

Analytics and Data Science News for the Week of August 12; Updates from Anaconda, insightsoftware, Verta, and More - Solutions Review

OU researchers award two NSF pandemic prediction and prevention projects – EurekAlert

Image: Photo of the University of Oklahoma's Norman campus.

Credit: Photo provided by the University of Oklahoma.

Two groups of researchers at the University of Oklahoma have each received nearly $1 million grants from the National Science Foundation as part of its Predictive Intelligence for Pandemic Prevention initiative, which focuses on fundamental research and capabilities needed to tackle grand challenges in infectious disease pandemics through prediction and prevention.

To date, researchers from 20 institutions nationwide have been selected to receive an NSF PIPP Award. OU is the only university to receive two of the grants.

"The next pandemic isn't a question of if, but when," said OU Vice President for Research and Partnerships Tomás Díaz de la Rubia. "Research at the University of Oklahoma is going to help society be better prepared and responsive to future health challenges."

Next-Generation Surveillance

David Ebert, Ph.D., professor of computer science and electrical and computer engineering in the Gallogly College of Engineering, is the principal investigator on one of the projects, which explores new ways of sharing, integrating and analyzing data using new and traditional data sources. Ebert is also the director of the Data Institute for Societal Challenges at OU, which applies OU expertise in data science, artificial intelligence, machine learning and data-enabled research to solving societal challenges.

While emerging pathogens can circulate among wild or domestic animals before crossing over to humans, the delayed response to the COVID-19 pandemic has highlighted the need for new early detection methods, more effective data management, and integration and information sharing between officials in both public and animal health.

Ebert's team, composed of experts in data science, computer engineering, public health, veterinary sciences, microbiology and other areas, will look to examine data from multiple sources, such as veterinarians, agriculture, wastewater, health departments, and outpatient and inpatient clinics, to potentially build algorithms to detect the spread of signals from one source to another. The team will develop a comprehensive animal and public health surveillance, planning and response roadmap that can be tailored to the unique needs of communities.

"Integrating and developing new sources of data with existing data sources combined with new tools for detection, localization and response planning using a One Health approach could enable local and state public health partners to respond more quickly and effectively to reduce illness and death," Ebert said. "This planning grant will develop proof-of-concept techniques and systems in partnership with local, state and regional public health officials and create a multistate partner network and design for a center to prevent the next pandemic."

The Centers for Disease Control and Prevention describes One Health as an approach that bridges the interconnections between people, animals, plants and their shared environment to achieve optimal health outcomes.

Co-principal investigators on the project include Michael Wimberly, Ph.D., professor in the College of Atmospheric and Geographic Sciences; Jason Vogel, Ph.D., director of the Oklahoma Water Survey and professor in the Gallogly College of Engineering School of Civil Engineering and Environmental Science; Thirumalai Venkatesan, director of the Center for Quantum Research and Technology in the Dodge Family College of Arts and Sciences; and Aaron Wendelboe, Ph.D., professor in the Hudson College of Public Health at the OU Health Sciences Center.

Predicting and Preventing the Next Avian Influenza Pandemic

Several countries have experienced deadly outbreaks of avian influenza, commonly known as bird flu, that have resulted in the loss of billions of poultry, thousands of wild waterfowl and hundreds of humans. Researchers at the University of Oklahoma are taking a unique approach to predicting and preventing the next avian influenza pandemic.

Xiangming Xiao, Ph.D., professor in the Department of Microbiology and Plant Biology and director of the Center for Earth Observation and Modeling in the Dodge Family College of Arts and Sciences, is leading a project to assemble a multi-institutional team that will explore pathways for establishing an International Center for Avian Influenza Pandemic Prediction and Prevention.

The goal of the project is to incorporate and understand the status and major challenges of data, models and decision support tools for preventing pandemics. Researchers hope to identify future possible research and pathways that will help to strengthen and improve the capability and capacity to predict and prevent avian influenza pandemics.

"This grant is a milestone in our long-term effort for interdisciplinary and convergent research in the areas of One Health (human-animal-environment health) and big data science," Xiao said. "This is an international project with geographical coverage from North America, Europe and Asia; thus, it will enable OU faculty and students to develop greater ability, capability, capacity and leadership in the prediction and prevention of global avian influenza pandemics."

Other researchers on Xiao's project include co-principal investigators A. Townsend Peterson, Ph.D., professor at the University of Kansas; Diann Prosser, Ph.D., research wildlife ecologist for the U.S. Geological Survey; and Richard Webby, Ph.D., director of the World Health Organization Collaborating Centre for Studies on the Ecology of Influenza in Animals and Birds with St. Jude Children's Research Hospital. Wayne Marcus Getz, professor at the University of California, Berkeley, is also assisting on the project.

The National Science Foundation grant for Ebert's research is set to end Jan. 31, 2024, while Xiao's grant will end Dec. 31, 2023.

Disclaimer: AAAS and EurekAlert! are not responsible for the accuracy of news releases posted to EurekAlert! by contributing institutions or for the use of any information through the EurekAlert system.

Read more here:

OU researchers award two NSF pandemic prediction and prevention projects - EurekAlert

The World is Moving Beyond Big Data, According to Ocient Survey of 500 Data and Technology Leaders – insideBIGDATA

Ocient, a leading hyperscale data analytics solutions company serving organizations that derive value from analyzing trillions of data records in interactive time, released a report, Beyond Big Data: The Rise of Hyperscale. The report is based on a survey of 500 data and technology leaders, across a variety of industries, who are managing active data workloads of 150 terabytes or more. The purpose of the survey and report is to uncover key trends around how organizations are managing the shift from big-data volumes toward ingesting, storing and analyzing hyperscale data sets, which include trillions of data records, and the expected technical requirements and business results from that shift.

The survey was conducted in May 2022 by Propeller Insights. Respondents include partners, owners, presidents, C-level executives, vice presidents and directors in many industries including technology, manufacturing, financial services, retail, and government. Their organizations' annual revenue ranges from $50 million to $5 billion. Approximately 50% of respondents represent companies with annual revenue greater than $500 million.

Key findings of the survey include:

Extraordinary data growth

Data is growing at an extraordinary rate. According to John Rydning, research vice president of the IDC Global DataSphere, a measure of how much new data is created, captured, replicated, and consumed each year, "The Global DataSphere is expected to more than double in size from 2022 to 2026. The Enterprise DataSphere will grow more than twice as fast as the Consumer DataSphere over the next five years, putting even more pressure on enterprise organizations to manage and protect the world's data while creating opportunities to activate data for business and societal benefits."

IDC Global DataSphere research also documented that in 2020, 64.2 zettabytes of data was created or replicated, and forecasted that global data creation and replication will experience a compound annual growth rate (CAGR) of 23% over the 2020-2025 forecast period. At that rate, more than 180 zettabytes (that's 180 billion terabytes) will be created in 2025.
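As a rough sanity check on those figures (this is simply the arithmetic implied by the report, not additional IDC data), compounding the 2020 baseline at 23% for five years reproduces the roughly 180-zettabyte projection:

# Rough check of the projection cited above
baseline_2020_zb <- 64.2          # zettabytes created or replicated in 2020
cagr <- 0.23                      # compound annual growth rate
years <- 5                        # 2020 through 2025
baseline_2020_zb * (1 + cagr)^years
# [1] 180.7   (approximately 180 zettabytes, i.e. 180 billion terabytes)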

The survey respondents reflect forecasts of such exponential data growth. When asked how fast the volume of data managed by their organization will grow over the next one to five years, 97% of respondents answered "fast" to "very fast," with 72% of C-level executives expecting the volume to grow "very fast" over the next five years.

Barriers to supporting data growth and hyperscale analytics

To support such tremendous data growth, 98% of respondents agreed it's somewhat or very important to increase the amount of data analyzed by their organizations in the next one to three years. However, respondents are experiencing barriers to harnessing the full capacity of their data and cited these top three limiting factors:

When asked about their biggest data analysis pain points today, security and risk ranked first among C-level respondents (68%), with metadata and governance (41%) and slow data ingestion (31%) being two other top concerns. When scaling data management and analysis within their organization, 63% said maintaining security and compliance as data volume and needs grow was a challenge they are currently facing.

Survey respondents also indicated legacy systems are another source of pain and a barrier to supporting data growth and hyperscale analytics. When asked if they plan to switch data warehousing solutions, more than 59% of respondents answered yes, with 46% of respondents citing a legacy system as the motivation to switch. When ranking their most important considerations in choosing a new data warehouse technology, "modernizing our IT infrastructure" was ranked number one.

Faster data analytics improve decisions, revenue and success

The survey respondents believe hyperscale data analytics is crucial to their success. Sixty-four percent of respondents indicate hyperscale data analytics provides important insights used to make better business decisions, and 62% said it is essential for planning and strategy.

The survey respondents also indicated there is a strong relationship between implementing faster data analytics and growing the company's bottom line. When asked about this relationship, an overwhelming 78% of respondents agreed there is a definite relationship. For the C-level audience, more than 85% cited the relationship.

"Data analysis is no longer a nice-to-have for organizations. Hyperscale data intelligence has become a mission-critical component for modern enterprises and government agencies looking to drive more impact and grow their bottom line. With the rapid pace of growth, it's imperative for enterprises and government agencies to enhance their ability to ingest, store, and analyze fast-growing data sets in a way that is secure and cost-effective," said Chris Gladwin, co-founder and CEO, Ocient. "The ability to migrate from legacy systems and buy or build new data analysis capabilities for rapidly growing workloads will enable enterprises and government organizations to drive new levels of agility and growth that were previously only imaginable."

Join us on Twitter: @InsideBigData1 https://twitter.com/InsideBigData1

Sign up for the free insideBIGDATA newsletter.

See original here:

The World is Moving Beyond Big Data, According to Ocient Survey of 500 Data and Technology Leaders - insideBIGDATA

Lucy Family Institute for Data & Society funds A&L faculty project proposals // Department of Political Science // University of Notre Dame – nd.edu

Faculty in the College of Arts and Letters are participating in interdisciplinary projects funded by the Lucy Family Institute for Data & Society (the Institute) that inspire novel research and scholarship, enhance stakeholder engagement, foster collaboration, and address wicked problems.

The Institute solicited proposals from affiliates and fellows to discover and define thematic goals of interest to a broader coalition of faculty on campus, as part of its strategic planning initiatives for the next three to five years.

Submitted proposals were within four funding tracks: Convening, Research Accelerator, Infrastructure & Services, and Partnerships. The Institute, after a substantial review process led by the members of the steering committee, awarded the following 13 projects that involve collaboration among all colleges and schools and are intended to generate translational value for societal benefit:

To learn more about these grants and other funding opportunities, visit https://lucyinstitute.nd.edu/about/funding-opportunities/.

Originally published by Alissa Doroh at lucyinstitute.nd.edu on August 08, 2022.

Go here to read the rest:

Lucy Family Institute for Data & Society funds A&L faculty project proposals // Department of Political Science // University of Notre Dame - nd.edu

Building the Best Zeolite – University of Houston

Natural zeolite mineral originating from Croft Quarry in Leicester, England

Jeffrey Rimer, Abraham E. Dukler Professor of chemical and biomolecular engineering at the University of Houston, has summarized methods of making zeolites in the lab and examined how the emergence of data analytics and machine learning are aiding zeolite design.

If science and nature were to have a baby, it would surely be the zeolite. This special rock, with its porous structure that traps water inside, also traps atoms and molecules that can cause chemical reactions. That's why zeolites are important as catalysts, or substances that speed up chemical reactions without harming themselves. Zeolites work their magic in the drug and energy industries and a slew of others. With petrochemicals, they break large hydrocarbon molecules into gasoline and further into all kinds of petroleum byproducts. Applications like fluid catalytic cracking and hydrocracking rely heavily on zeolites.

So important is the use of zeolites that decades ago scientists began making them (synthetic ones) in the lab, with the total number of crystal structures now exceeding 250.

Now, an undisputed bedrock in the global zeolite research community, Jeffrey Rimer, Abraham E. Dukler Professor of chemical and biomolecular engineering at the University of Houston, has published a review in the Nature Synthesis journal summarizing methods over the past decade that have been used to prepare state-of-the-art zeolites with nano-sized dimensions and hierarchical structures.

The findings emphasize that smaller is better and structure is critical.

"These features are critical to their performance in a wide range of industrial applications. Notably, the small pores of zeolites impose diffusion limitations for processes involving catalysis or separations where small molecules must access pores without obstruction from the accumulation of residual materials like coke, which is a carbonaceous deposit that blocks pores," reports Rimer. "This calls for new methods to prepare zeolites with smaller sizes and higher surface area, which is a challenging task because few zeolites can be prepared with sizes less than 100 nanometers."

The review article summarizes advanced methods to accomplish this goal, including work from Rimer's own group on finned zeolites, which he invented. Zeolites with fins are an entirely new class of porous catalysts using unique nano-sized features to speed up the chemistry by allowing molecules to skip the hurdles that limit the reaction.

Rimer also examines how the emergence of data analytics and machine learning is aiding zeolite design and provides future perspectives in this growing area of research. These approaches are among the new methods Rimer considers imperative, offering major advantages by infusing computational and big-data analyses to move zeolite synthesis away from trial-and-error methodologies.

Speeding up the process of crystallizing zeolites, and speeding up the reactions of the zeolites themselves, will result in many socioeconomic advantages, according to Rimer.

"Improved zeolite design includes the development of improved catalysts for energy applications (including advancements in alternative energy), new technologies for regulating emissions that impact the environment and separations to improve industrial processes with impact on petroleum refining, production of chemicals and water purification," he said.

Read the original post:

Building the Best Zeolite - University of Houston

MLOps | Is the Enterprise Repeating the Same DIY Mistakes? – insideBIGDATA

There is a reason the enterprise doesn't build its own cloud computing infrastructure. Last decade, IT infrastructure teams sought to build their own private clouds because they thought they could do it cheaper and better suited to their business versus public cloud. Instead, they ended up taking longer and costing more than expected to build, requiring more resources to maintain, and having fewer of the latest capabilities in security and scaling than what was provided by the public clouds. Instead of investing in core business capabilities, these enterprises ended up investing significant time and headcount in infrastructure that couldn't match expanded business needs.

Many enterprises are now repeating that same do-it-yourself approach to most things MLOps by creating custom solutions cobbled together from various open source tools like Apache Spark.

These custom solutions often result in model deployments taking weeks or even months per model, inefficient runtimes (as measured by inferences run over the compute and time required), and, especially, a lack of the observability needed to test and monitor the ongoing accuracy of models over time. These approaches are too bespoke to provide scalable, repeatable processes for multiple use cases in different parts of the enterprise.

The case of the misdiagnosed problem

In addition, conversations with line-of-business leaders and chief data and analytics officers have taught us that organizations keep hiring more data scientists but aren't seeing the return. As we delved deeper, however, and started asking questions to identify the blockers to their AI, they quickly realized their bottleneck was actually at the last mile: deploying the models to use against live data, running them efficiently so the compute costs didn't outweigh the gains, and then measuring their performance.

Data scientists excel at turning data into models that help solve business problems and make business decisions. But the expertise and skills required to build great models aren't the same skills needed to push those models into the real world with production-ready code, and then monitor and update them on an ongoing basis.

This is where ML engineers come in. ML engineers are responsible for integrating tools and frameworks together to ensure the data, data pipelines, and key infrastructure are working cohesively to productionize ML models at scale (see our more in-depth breakdown comparing the roles of data scientists versus ML engineers available here).

So now what? Hire more ML engineers?

But even with the best ML engineers, enterprises face two major problems in scaling AI:

How to get the most value from AI

Enterprises have poured billions of dollars into AI based on promises around increased automation, personalizing the customer experience at scale, or delivering more accurate and granular predictions. But so far there has been a massive gap between AI promises and outcomes, with only about 10% of AI investments yielding significant ROI.

In the end, to solve the MLOps problem, chief data and analytics officers need to build the capabilities around data science that are core to the business, but invest in technologies that automate the rest of MLOps. Yes, this is the common build vs. buy dilemma, but this time the right way to measure isn't solely OpEx costs, but how quickly and effectively your AI investments are permeating throughout the enterprise, whether generating new revenues through better products and customer segments or cutting costs through greater automation and decreased waste.

About the Author

Aaron Friedman is VP of Operations at Wallaroo.ai. He has a dynamic background in scaling companies and divisions, including IT Outsourcing at Verizon, Head of Operations for Lowes.com and JetBlue, Head of Global Business Development at Qubole, and growing and selling two system integration companies.

Sign up for the free insideBIGDATA newsletter.

Join us on Twitter: @InsideBigData1 https://twitter.com/InsideBigData1

Here is the original post:

MLOps | Is the Enterprise Repeating the Same DIY Mistakes? - insideBIGDATA

Research Associate in Design Analytics and Music Physiology job with KINGS COLLEGE LONDON | 304739 – Times Higher Education

Job description

This is an exciting opportunity for a data scientist with strong musical sensibilities to play a key role in the development of computational tools for remodelling music expressivity to achieve specific cardiovascular (autonomic) aims. The objectives will be to design and implement techniques to morph expressive music parameters in ways that powerfully impact listener perception and physiology in targeted ways, to evaluate these strategies and their effectiveness, and to develop algorithms to analyse users design decisions to learn from their choices.

The work will be carried out in the context of the ERC project COSMOS (Computational Shaping and Modeling of Musical Structures), augmented by the Proof-of-Concept project HEART.FM (Maximizing the Therapeutic Potential of Music through Tailored Therapy with Physiological Feedback in Cardiovascular Disease), on citizen/data science approaches to studying music expressivity and on autonomic modulation through music. See https://doi.org/10.3389/fpsyg.2022.527539.

The remodelled expressions will be rendered synthetically or through the project's reproducing piano. Effectiveness of the expression remodelling at achieving the physiological aims will be tested on listeners, for example, through the HEART.FM mobile app tracking their physiology whilst they listen to the remodelled music. Successful transformations will be integrated into CosmoNote (https://cosmonote.ircam.fr), the web-based citizen science portal of COSMOS, or a sister web application for widespread public deployment. Collaborative designs may be explored.

The successful candidate will make major contributions to, and be involved in, all aspects of the computational modelling, interaction design, and software development; testing and validation, including on listeners (healthy volunteers or patients); and development of algorithms for the design analytics. They will liaise with other research team members and with collaborators across multiple domains, and be able to prioritise and organise their own work to deliver research results.

The successful candidate will have a PhD in computer science or a closely-related field, ideally with experience in human-computer interaction, sound and music computing (including programming with MIDI), or web programming (Javascript: D3.js). They should demonstrate a strong ability to design and implement computational algorithms to solve problems with objectives and constraints, and possess sound musical judgement.

They should be highly motivated, and have strong communication skills and a good track record of scientific publication. Personal integrity, a strong work ethic, and a commitment to uphold the highest standards in research are essential attributes.

The project is hosted by the Department of Engineering in the Faculty of Natural, Mathematical & Engineering Sciences and the School of Biomedical Engineering & Imaging Sciences (BMEIS) in the Faculty of Life Sciences & Medicine (FoLSM) at King's College London. KCL was ranked 6th nationally in the recent Research Excellence Framework exercise. FoLSM was ranked 1st and Engineering was ranked 12th for quality of research.

The research will take place in BMEIS at St Thomas' Hospital and Becket House, on the south bank of the River Thames, overlooking the Houses of Parliament and Big Ben in London.

This post will be offered on a fixed-term contract for 12 months (renewable to 31 May 2025)

This is a full-time post

Key responsibilities

Key responsibilities and outcomes

Designing and developing computational algorithms and sandbox environments to remodel musical expressivity with targeted physiological outcomes

Evaluating and validating the proposed methodologies and assessing their effectiveness and potential for clinical translation

Integrating the expression transformation tools into sandbox environments for the web in collaboration with other software programmer(s)

Following the principles of good software design, development, and documentation practices

Preparing high-quality manuscripts for publication, writing clearly about the computational techniques, outcomes, and design analytics

Presenting key findings at scientific conferences and public engagement events

Maintaining suitable performance levels for the software, following good software design, development, and documentation practices

General

Demonstrate collaborative approach to research and software development

Liaise directly with internal / external colleagues in an independent manner

Use initiative, discretion, knowledge and experience in planning, coordination and problem-solving

Demonstrate ownership of tasks and development of solutions to problems

Governance

Maintain an awareness and observation of ethical rules and legislation governing the storage of project data

Maintain an awareness and observation of confidentiality agreements with collaborators and external organisations

Maintain an awareness and observation of appropriate procedures for the disclosure and protection of inventions and other intellectual property generated as part of the post holder's activities and those of other team members working within the project

Development

To attend regular project meetings and training courses for professional and personal development as required

Communication & Networking

Develop and maintain effective working relationships with staff within the School as well as externally

Regularly communicate information in a clear and precise way

Decision Making, Planning & Problem Solving

Lead in decisions that have a significant impact on their own work, that of others and be party to collaborative decisions

Manage own workload, prioritising these in order to achieve their objectives

Communicate to management any difficulties associated with carrying out work tasks

Resolve problems where the solution may not be immediately apparent and where there is a need to use judgement to achieve resolution

Plan in advance for heavy workload

Use own initiative and creativity to solve problems

The above list of responsibilities may not be exhaustive, and the post holder will be required to undertake such tasks and responsibilities as may reasonably be expected within the scope and grading of the post.

Skills, knowledge, and experience

Essential criteria

1. PhD in operations research, statistics, computer science, music computing, or a related field

2. Experience designing/adapting computational algorithms to solve problems with objectives and constraints

3. Strong musical sensibilities, adaptable, willingness to learn, motivated to work with real-world music and physiological data

4. Good knowledge of software design principles and code management on Git

5. Excellent written and oral communication skills

6. Track record of high-quality, peer-reviewed scientific publications

7. Ability to work with people from diverse backgrounds and specialties

Desirable criteria

1. Experience with music software and related file formats and protocols

2. Experience programming graphical user interfaces to alter music properties

3. Hands-on experience working with sound and music

Please note that this is a PhD level role but candidates who have submitted their thesis and are awaiting award of their PhDs will be considered. In these circumstances the appointment will be made at Grade 5, spine point 30 with the title of Research Assistant. Upon confirmation of the award of the PhD, the job title will become Research Associate and the salary will increase to Grade 6.

Read more:

Research Associate in Design Analytics and Music Physiology job with KINGS COLLEGE LONDON | 304739 - Times Higher Education

The Difference Between Standard Deviation and Standard Error – Built In

Have you ever wondered what the difference is between standard deviation and standard error?

If you haven't, here's why you should care.

Standard deviation measures the dispersion (variability) of the data in relation to the mean. In other words, the closer to zero the standard deviation is, the closer to the mean the values are in the studied data set. For normally distributed data, the standard deviation also gives us valuable information about the percentage of data that falls within one, two and three standard deviations of the mean (roughly 68%, 95% and 99.7%, respectively).

Let's use R to generate some random data:
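A minimal sketch of such a snippet, assuming a simple normal sample (the seed, sample size, mean and standard deviation here are illustrative choices only):

set.seed(42)                                 # fix the random seed so the results are reproducible
data <- rnorm(1000, mean = 50, sd = 10)      # 1,000 random values drawn from N(50, 10)

mean(data)   # sample mean, close to 50
sd(data)     # sample standard deviation, close to 10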

Now, let's generate a plot of the normally distributed data:
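One way to do this, continuing from the hypothetical sample above (again a sketch, not the article's original code):

hist(data, breaks = 30, freq = FALSE,
     main = "Normally distributed data", xlab = "Value")   # histogram of the sample
curve(dnorm(x, mean = mean(data), sd = sd(data)),
      col = "blue", lwd = 2, add = TRUE)                   # overlay the fitted normal density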

When we calculate the mean of a particular sample, we're not really interested in the mean of that sample itself. Instead, we want to draw conclusions about the population from which the sample comes. We usually collect representative sample data because we're limited in terms of resources for collecting information about the whole population. So, we'll use the sample mean as an estimate of the whole population mean.

More on Data: Understanding Box Plots

Of course, there will be different means for different samples from the same population. This is called the sampling distribution of the mean. You can use the standard deviation of the sampling distribution to estimate the variability between the means of different samples. This is the standard error of the estimate of the mean. This is where everybody gets confused: the standard error is a type of standard deviation, but for the distribution of the means.

In short, standard error measures the precision of the estimate of the sample mean.

The standard error is strictly dependent on the sample size. As a result, the standard error falls as the sample size increases. If you think about it, the bigger the sample, the closer the sample mean is to the population mean, and thus, the closer the estimate is to the actual value.

R code for computing the standard error is below:
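Base R has no built-in standard-error function, so a common approach, sketched here with the hypothetical sample from earlier, is to compute it directly from the definition: the sample standard deviation divided by the square root of the sample size.

std_error <- function(x) sd(x) / sqrt(length(x))   # SE = s / sqrt(n)

std_error(data)          # standard error of the mean for the full sample of 1,000
std_error(data[1:100])   # noticeably larger for a smaller sample of the same data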

More on Data: 16 Data Modeling Tools You Should Know

If you need to draw conclusions about the spread and variability of the data, use standard deviation.

If you're interested in finding how precise the sample mean is or you're testing the differences between two means, then standard error is your metric.

Excerpt from:

The Difference Between Standard Deviation and Standard Error - Built In

Cyber and Information Systems – GOV.UK

The Cyber and Information Systems (CIS) division delivers evidence-based resilient sensing, information and effector systems for the defence and security of the UK. The threats that the UK faces every day are many and varied. We advance the tools, techniques and tradecrafts required to establish world-class capabilities to detect and counter these threats through research, development and technology assessment.

CIS collaborates with our partners to offer the skills, knowledge, expertise and facilities needed to support defence in:

Our vision is to deliver transformational information superiority.

Much of our work is sensitive so, in order to keep the nation and the people who work for us safe, this is just a snapshot of the varied and exciting work we do.

The work we do in AI and Data Systems enables defence to turn data into information advantage and accelerates the responsible and ethical adoption of artificial intelligence across defence. Dstl provides a world-class capability applying artificial intelligence, machine learning and data science to defence and security challenges.

Our expertise includes:

Our work with other nations provides our scientists with exciting opportunities to collaborate with our partners in countries such as the US and we have recently established a hub in Newcastle helping us harness talent from across the UK.

Dstl Newcastle

Effective and resilient communications and networks are essential to all military operations. The armed forces must be able to operate in all environments from sub-sea to space and face some of the most difficult communication challenges imaginable.

One of the huge challenges for defence is how we integrate and communicate information from strategic and operational levels down to the tactical applications in warfare across all domains: land, sea, air, space and cyber. We do this through Command, Control, Communications, Computers (C4) Systems.

The C4 Group conducts research to deliver the science and technology advances that enable complex information services like AI to reach into these environments and to overcome threats to our communications.

With employees ranging from physicists to engineers working in electronics, computer science, systems, information and data sciences, we identify ways to connect all military commands and equipment (present and future) and exploit the electromagnetic spectrum. This includes building in resilience against threats posed by highly technical adversaries. It will also involve collaboration with allies, organisations and coalition partners, such as NATO.

Figure: The Electromagnetic Environment from AJP 3.6 C (NATO EW Doctrine) NATO UNCLASSIFIED; Reproduced with permission from MoD Head of Delegation for NATO EW within the NATO EW.

Adversaries will seek to find ways to disrupt communications to deny our forces access to the information that is fundamental to our success. Threats like jamming, interception and cyber-attack are under constant development by hostile states so it is essential that we continually improve our networks to maintain our advantage. Our staff work on developing both next generation and generation-after-next communications and networks systems.

Some of our current work includes:

We use our expertise to deliver communications options for everything from submarines to fast jets and from complex headquarters to autonomous swarms. We achieve this through designing experiments and conducting trials to prove concepts. And because we're at the extreme cutting edge of technology, we work closely with leading experts from academia, commercial companies and international partners.

The military and civilian worlds are entirely dependent on electronic systems and data. These computers are in everything from our everyday phones, tablets and laptops through to our vehicles, power distribution, communications systems, and other invisible systems. Dstl's cyber capability plays a vital role in protecting UK defence and security from the cyber threats to our systems.

It's not the stuff that we can say about our work that makes it exciting; it's the stuff we can't!

Our Cyber group brings diverse, creative minds together to provide unique ideas to solve classified challenges. We work across software, electronics, radio frequency (RF), hardware, security and safety critical systems to:

We use our cyber expertise in everything from security, safety and autonomous systems through to hardware, antennas and large complex software to deliver solutions that bring brand new, cutting-edge capabilities to allow the UK to establish strategic advantage.

The Electronic Warfare Technology and Enterprise (EWTE) Group is comprised of RF, electrical, electronic, software, system and mechanical engineers, physicists, mathematicians and internationally-recognised experts in electronic warfare (EW).

We support the Ministry of Defence (MOD) to deliver substantial EW equipment procurement programmes for the armed forces by:

We work on classified operational capability and provide rapid advice on how to overcome challenges, with our expertise in science and technology in areas such as communications and radars, RF systems, signal processing and machine learning, systems engineering and modelling and simulation.

We further specialist research by sponsoring and mentoring PhDs, in areas including Information Theory, and we have also launched an Electromagnetic Research Hub to provide highly classified analysis and advice into defence operations.

The Intelligence Innovation group develops and delivers transformations in defence analytical capability to UK intelligence through the application of data science, sensor exploitation and collaboration. As an impartial, in-government capability, we:

Our team includes innovative and forward-thinking individuals, from data professionals in science, engineering and fusion to technical professionals in digital signal processing, open-source information exploitation, systems engineering and consultancy.

Operational research (OR) is a scientific approach to the solution of problems in the management of complex systems that enables decision makers to make better decisions.

The Joint Enablers Group conduct operational research which allows the MOD to make informed decisions on capability in the areas of:

We do this through a combination of qualitative and quantitative methods. Qualitative methods are subjective techniques which allow us to explore new concepts and ideas and quantitative methods are objective techniques which allow us to numerically measure difference.

To enable the divisions to execute their projects, ensuring value for money, CIS is home to around 150 project professionals. These include project and programme managers who work closely with our scientists and often experience first-hand the exciting science we deliver to our customers. They do not necessarily have a background in science or technology, but focus on applying international, government and MOD best practice in portfolio, programme and project (P3) management. This group is essential in providing exploitable science and technology to realise tangible defence benefits.

The ability to find your enemy on land, sea, air or space remains key to military success. Dstl's Sensing Group is the core of UK MOD's research and innovation hub dedicated to delivering world-leading military sensing technology.

Our scientists work hand in glove with frontline forces to understand the military challenges of the modern battlefield and develop the sensors needed for tomorrow's challenges and beyond.

With state-of-the-art laboratory, experimental and field trials capabilities, we shape the military sensing landscape of the future to ensure the UK can always maintain a technology advantage over its adversaries.

Covering a plethora of sensing modalities, we:

Our Sensing Group includes professionals in quantum technologies, electro-optic sciences, radio frequency engineering, alternative navigation, precision timing, data processing and sensor system modelling.

The Space Group is the centre for science and technology support to UK defence and security in space. We work to deliver the UK's National and Defence Space Strategies, developing the UK's ability to become a meaningful influencer in space.

Our mission is to provide domain advice and use science and technology to deliver freedom of action in and through space. This is a huge remit covering space domain awareness experiments, novel satellite technology and optical satellite communications. We collaborate with professionals in academia, industry and internationally to develop world-class scientific capability and to design and develop mission demonstrators for Space Command.

Our employees work at all levels within the space architecture. This includes:

The Space Group also provides the evidence base for decisions around the shape and scale of the defence approach to space in the future.

Our professionals in the space team include space systems engineers, payload engineers, synthetic aperture radar scientists, space domain awareness scientists and a wide range of other specialists to deliver the MOD's growing need for space capability.

The CIS division at Dstl is always looking for top talent who can bring innovation, passion and skills to the team.

We need people with diverse and creative minds who can bring their individual perspectives to our classified challenges.

Working within CIS provides a fantastic opportunity to work on cutting edge technologies, interesting projects (some we cannot disclose) and to work with like-minded professionals internally and across academia, industry and internationally.

The benefits of working in CIS include:

Find out more about the benefits of working for Dstl.

At CIS and Dstl we actively encourage applicants from diverse backgrounds. Meet AJ, a software engineer with a neurodiverse background who works on our exciting projects.

Celebrating Neurodiversity at Dstl: AJ's Story

Read the original post:

Cyber and Information Systems - GOV.UK

Zilliz Announces Key Contributions to Milvus 2.1, the Leading Open-Source Vector Database for Structured and Unstructured Data – insideBIGDATA

Zilliz, whose founders created the Milvus open-source project, announced major contributions to the Milvus 2.1 release. The added functionality further bridges the gap between data pools, removes data silos, and offers performance and availability enhancements to address developers' most common concerns. Milvus is one of the world's most advanced vector databases, capable of managing massive quantities of both structured and unstructured data to accelerate the development of next-generation data fabric.


A graduated-stage open-source project under the LF AI & Data Foundation, Milvus is built for scalable similarity search and used by a wide array of enterprises across industries. It embraces distributed architecture and can easily scale as data volumes and workloads increase. Highly scalable, reliable, and exceptionally fast, Milvus supports DML operations (adding, deleting, updating) and near real-time search of vectors on a trillion-byte scale.
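To make the idea of vector similarity search concrete, here is a toy, brute-force sketch in R of the operation Milvus performs at scale. This is purely illustrative and is not the Milvus API; Milvus layers indexing, distributed storage and trillion-scale capacity on top of the same basic idea:

# Toy nearest-neighbour search over vector embeddings (illustrative only)
set.seed(1)
embeddings <- matrix(rnorm(100 * 8), nrow = 100)   # 100 stored vectors, 8 dimensions each
query <- rnorm(8)                                  # an incoming query vector

cosine <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
scores <- apply(embeddings, 1, cosine, b = query)  # similarity of the query to every stored vector
head(order(scores, decreasing = TRUE), 5)          # indices of the 5 most similar vectors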

With this 2.1 update, Milvus sees significant improvement in its performance, reducing search latency on million-scale datasets to five milliseconds, while further simplifying deployment as well as ops workflow.

Bridging Gaps and Improving Performance

Machine learning is producing vast pools of scalar and vector data on a daily basis. With the introduction of more scalar data types, Milvus 2.1 is bridging this critical gap between data pools.

"Data silos can now be better integrated and linked, enabling businesses to unlock the full potential of their data," said Milvus project maintainer Xiaofan "James" Luan, who also serves as the director of engineering at Zilliz. "When it comes to unstructured data, solutions offered by industry incumbents tend to be add-on capabilities or tools in a legacy database management system, whereas Milvus is designed around unstructured data from day one and is now offering more built-in capabilities to unlock more powerful and integrated data processing."

Zilliz's contributions to the 2.1 release specifically include:

An overall performance boost including reduced latency; highly improved throughput for small-NQ application scenarios, such as reverse image search and intelligent chatbot; support of multiple memory replicas for small tables to increase throughput; and 2x increase in search performance.

Improved scalar data processing that adds Varchar into supported data types and supports creating indexes for scalar data, taking hybrid search to a more intuitive level.

Production-grade enhancements and higher availability, with clearer monitoring metrics for observability, easier and more diverse deployment options including embedded Milvus for simple deployment and Ansible for offline deployment, integration that supports Kafka as log storage, and enhanced security supporting password protection and TLS connection.

A developer-friendly ecosystem in the making that includes more tutorials for building real-world applications, connecting Milvus with open-source vector data ETL framework Towhee; and that adds Feder, an open-source tool that helps Milvus users select the index best suited to their application scenario by visualizing the process of vector similarity search.

In addition to the integration and security features enumerated, Milvus will provide more functionalities essential to modern security mechanisms, including ACL (Access Control Lists) and advanced encryption methods.

Commitment to Open-Source Ecosystems

"As data infrastructure for unstructured data, Milvus is revolutionary because it processes vector embeddings and not just strings. In the future, Zilliz, the company founded by the creators of Milvus, seeks to build an ecosystem of solutions around Milvus, and some of the projects that will contribute to this have already surfaced, including Towhee, our open-source vector data ETL framework, and Feder, an interactive visualization tool for unstructured data. With Milvus 2.1 and the new demos, users can see how these products can come together to solve a series of problems that involve unstructured data," added Luan.

Zilliz is committed to the developer community and will continue to contribute to open-source projects like Milvus. The company's technology has broad applications spanning new drug discovery, computer vision, recommender engines, chatbots, and much more.

Sign up for the free insideBIGDATA newsletter.

Join us on Twitter: @InsideBigData1 https://twitter.com/InsideBigData1

Originally posted here:

Zilliz Announces Key Contributions to Milvus 2.1, the Leading Open-Source Vector Database for Structured and Unstructured Data - insideBIGDATA