Category Archives: Machine Learning
Hugging Face: Everything you need to know about the AI platform – Android Police
Hugging Face is a platform for viewing, sharing, and showcasing machine learning models, datasets, and related work. It aims to make Neural Language Models (NLMs) accessible to anyone building applications powered by machine learning. Many popular AI and machine-learning models are accessible through Hugging Face, including LLaMA 2, an open source language model that Meta developed in partnership with Microsoft.
Hugging Face is a valuable resource for beginners to get started with machine-learning models. You don't need to pay for any special apps or programs to get started. You only need a web browser to browse and test models and datasets on any device, even on budget Chromebooks.
Hugging Face provides machine-learning tools for building applications. Notable tools include the Transformers model library, pipelines for performing machine-learning tasks, and collaborative resources. It also offers dataset, model evaluation, simulation, and machine learning libraries.
Hugging Face receives funding from companies including Google, Amazon, Nvidia, Intel, and IBM. Some of these companies have created open source models accessible through Hugging Face, like the LLaMA 2 model mentioned at the beginning of this article.
The number of models available through Hugging Face can be overwhelming, but it's easy to get started. We walk you through everything you need to know about what you can do with Hugging Face and how to create your own tools and applications.
The core of Hugging Face is the Transformers model library, dataset library, and pipelines. Understanding these services and technologies gives you everything you need to use Hugging Face's resources.
The Transformers model library is a library of open source transformer models. Hugging Face has a library of over 495,000 models grouped into data types called modalities. You can use these models to perform tasks with pipelines, which we explain later in this article.
Some of the tasks you can perform through the Transformers model library include text generation, summarization, translation, image classification, and speech recognition.
A complete list of these tasks can be seen on the Hugging Face website, categorized for easy searching.
Within these categories are numerous user-created models to choose from. For example, Hugging Face currently hosts over 51,000 models for Text Generation.
If you aren't sure how to get started with a task, Hugging Face provides in-depth documentation on every task. These docs include use cases, explanations of model and task variants, relevant tools, courses, and demos. For example, the demo on the Text Generation task page uses the Zephyr language model to complete prompts. Refer to each model's page for instructions on how to use it for the task.
These tools make experimenting with models easy. While some are pre-trained with data, you'll need datasets for others, which is where the datasets library comes into play.
The Hugging Face datasets library is suitable for all machine-learning tasks offered within the Hugging Face model library. Each dataset contains a dataset viewer, a summary of what's included in the dataset, the data size, suggested tasks, data structure, data fields, and other relevant information.
For example, the Wikipedia dataset contains cleaned Wikipedia articles of all languages. It has all the necessary documentation for understanding and using the dataset, including helpful tools like a data visualization map of the sample data. Depending on what dataset you access, you may see different examples.
Models and datasets are the power behind performing tasks from Hugging Face, but pipelines make it easy to use these models to complete tasks.
Hugging Face's pipelines simplify using models through an API that cuts out using abstract code. You can provide a pipeline with multiple models by specifying which one you want to use for specific actions. For example, you can use one model for generating results from an input and another for analyzing them. This is where you'll need to refer to the model page you used for the results to interpret the formatted results correctly.
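Conceptually, a pipeline chains preprocessing, model inference, and postprocessing behind a single call. A toy sketch of that idea (the functions below are illustrative stand-ins, not the real Transformers internals):

```python
# Conceptual sketch of what a pipeline abstracts away: preprocessing,
# model inference, and postprocessing chained behind one call.
# (Toy stand-ins for illustration only.)
def preprocess(text):
    return text.lower().split()          # tokenize

def model(tokens):
    # Fake "inference": a real pipeline would run a transformer model here.
    return {"label": "POSITIVE" if "good" in tokens else "NEGATIVE",
            "score": 0.9}

def postprocess(output):
    return f"{output['label']} ({output['score']:.2f})"

def sentiment_pipeline(text):
    return postprocess(model(preprocess(text)))

print(sentiment_pipeline("This movie was good"))  # → POSITIVE (0.90)
```

The real pipelines do the same thing with actual tokenizers and models, which is why one call can hide all the formatting details.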
Hugging Face has a full breakdown of the tasks you can use pipelines for.
Now that you understand the models, datasets, and pipelines provided by Hugging Face, you're ready to use these assets to perform tasks.
You only need a browser to get started. We recommend using Google Colab, which lets you write and execute Python code in your browser. It provides free access to computing resources, including GPUs and TPUs, making it ideal for basic machine-learning tasks. Google Colab is easy to use and requires zero setup.
After you've familiarized yourself with Colab, you're ready to install the Transformers library by running pip install transformers (prefix the command with an exclamation point when running it in a Colab cell).
Then confirm it installed correctly by importing the library and printing its version number.
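A quick way to run that check, assuming a standard Python environment (the helper function name here is illustrative):

```python
# Sanity check that the transformers library installed correctly.
# Prints the installed version, or a hint if the import fails.
import importlib.util

def transformers_status():
    if importlib.util.find_spec("transformers") is None:
        return "transformers is not installed; run `pip install transformers` first"
    import transformers
    return f"transformers version: {transformers.__version__}"

print(transformers_status())
```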
You're now ready to dive into Hugging Face's libraries. There are a lot of places to start, but we recommend Hugging Face's introductory course, which explains the concepts we outlined earlier in detail with examples and quizzes to test your knowledge.
Collaboration is a huge part of Hugging Face, allowing you to discuss models and datasets with other users. Hugging Face encourages collaboration through a discussion forum, a community blog, Discord, and classrooms.
Models and datasets on Hugging Face also have their own forums where you can discuss errors, ask questions, or suggest use cases.
Machine learning and AI are daunting for beginners, but platforms like Hugging Face provide a great way to introduce these concepts. Many of the popular models on Hugging Face are large language models (LLMs), so familiarize yourself with LLMs if you plan to use machine-learning tools for text generation or analysis.
Read more:
Hugging Face: Everything you need to know about the AI platform - Android Police
Microsoft, OpenAI: US Adversaries Armed with GenAI – InformationWeek
Microsoft and OpenAI say Iran, North Korea, Russia, and China have started arming their US cyberattack efforts with generative artificial intelligence (GenAI).
The companies said in a blog post on Microsoft's website Wednesday that they jointly detected and stopped attacks using their AI technologies. The companies listed several examples of specific attacks using large language models to enhance malicious social engineering efforts -- leading to better deepfakes and voice cloning attempting to crack US systems.
Microsoft said North Korea's Kimsuky cyber group, Iran's Revolutionary Guard, Russia's military, and a Chinese cyberespionage group called Aquatic Panda all used the companies' large language model tools for potential attacks and malicious activity. The attacks from Iran included phishing emails pretending to come from an international development agency and another attempt to lure prominent feminists to an attacker-built website on feminism.
Cyberattacks from foreign adversaries have been steadily increasing in severity and complexity. This month, the Cybersecurity and Infrastructure Security Agency (CISA) said China-backed threat actor Volt Typhoon targeted several Western nations' critical infrastructure and has had access to the systems for at least five years. Experts fear such attacks will only increase in severity as nation-states use GenAI to enhance their efforts.
Related:Firms Arm US Against AI Cyberattacks
Nazar Tymoshyk, CEO at cybersecurity firm UnderDefense, tells InformationWeek in a phone interview that even as threats become more sophisticated through GenAI, the fundamentals of cybersecurity should stay the same. The onus for safeguarding, he said, is on the company producing AI. "Every product is AI-enabled, so it's now a feature in every program," he says. "It becomes impossible to distinguish what's an AI attack. So, it's the company who is responsible to put additional controls in place."
Microsoft called the attack attempts "early stage," adding that "our research with OpenAI has not identified significant attacks employing the LLMs we monitor closely. At the same time, we feel this is important research to expose early-stage, incremental moves that we observe well-known threat actors attempting, and share information on how we are blocking and countering them with the defender community."
The companies say hygiene practices like multifactor authentication and zero-trust defenses are still vital weapons against attacks -- AI-enhanced or not. "While attackers will remain interested in AI and probe technologies' current capabilities and security controls, it's important to keep these risks in context."
Related:What CISOs Need to Know About Nation-State Actors
In a separate blog post, OpenAI says it will continue to work with Microsoft to identify potential threats using GenAI models.
"Although we work to minimize potential misuse by such actors, we will not be able to stop every instance. But by continuing to innovate and investigate, collaborate, and share, we make it harder for malicious actors to remain undetected across the digital ecosystem and improve the experience for everyone else."
OpenAI declined to make an executive available for comment.
While Microsoft and OpenAI's report focused on how threat actors are using AI tools for attacks, AI can also be a vector for attack. "That's an important thing to remember with businesses implementing GenAI tools at a feverish pace," Chris "Tito" Sestito, CEO and co-founder of adversarial AI firm HiddenLayer, tells InformationWeek in an email.
"Artificial intelligence is, by a wide margin, the most vulnerable technology ever to be deployed in production systems," Sestito says. "It's vulnerable at a code level, during training and development, post-deployment, over networks, via generative outputs, and more. With AI being rapidly implemented across sectors, there has also been a substantial rise in intentionally harmful attacks, proving why defensive solutions to secure AI are needed."
Related:Microsoft IDs Russia-Backed Actor Behind Leadership Email Hacks
He adds, "Security has to maintain pace with AI to accelerate innovation. That's why it's imperative to safeguard your most valuable assets from development to implementation. Companies must regularly update and refine their AI-specific security program to address new challenges and vulnerabilities."
See the rest here:
Microsoft, OpenAI: US Adversaries Armed with GenAI - InformationWeek
Understanding the Basics of Machine Learning: A Comprehensive Guide – Medium
In the rapidly evolving world of technology, machine learning (ML) stands out as a transformative force, driving innovation and efficiency across numerous industries. This article discusses some of the fundamentals of machine learning, its various components, and how organizations can leverage this technology to gain a competitive edge.
What is Machine Learning?
Machine learning is a subset of artificial intelligence (AI) that enables systems to learn and improve from experience without being explicitly programmed. At its core, ML involves the development of algorithms that can analyze and interpret data, learn from it, and then make decisions or predictions based on what they have learned. The essence of machine learning is its ability to adapt to new data independently, allowing for more accurate outcomes without human intervention.
Critical Components of Machine Learning
Machine learning encompasses a broad spectrum of methods and technologies, but it can be categorized into three main types:
Supervised Learning: This involves training a model on a labeled dataset, which means that each training example is paired with the output it should produce. The goal is for the model to learn to predict the output from the input data, making it suitable for regression and classification problems.
Unsupervised Learning: Unlike supervised learning, unsupervised learning works with datasets without labeled responses. The system tries to learn the patterns and the structure from the data by itself, which is helpful for clustering, dimensionality reduction, and association rule learning.
Reinforcement Learning: This type of learning is based on agents acting in an environment to achieve some goals. Through trial and error, these agents learn the best strategies to achieve their objectives, optimizing their performance based on rewards and punishments.
Each of these components plays a vital role in solving different types of problems and can be chosen based on a project's specific needs and goals.
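As a concrete illustration of supervised learning, here is a minimal sketch that fits a line to a small labeled dataset by least squares and then predicts on unseen input (the data is toy data invented for the example):

```python
# Minimal supervised learning: fit y = m*x + b by least squares on a
# labeled dataset, then predict the output for a new input.
def fit_linear(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    m = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    b = mean_y - m * mean_x
    return m, b

# Labeled training data: each input x is paired with the output it should produce.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]          # generated by y = 2x + 1

m, b = fit_linear(xs, ys)
print(f"learned: y = {m:.1f}x + {b:.1f}")   # → learned: y = 2.0x + 1.0
print(f"prediction for x=5: {m * 5 + b:.1f}")  # → prediction for x=5: 11.0
```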
How Organizations Benefit from Machine Learning
Machine learning offers myriad benefits to organizations, enabling them to operate more efficiently and effectively in today's data-driven world. Some of the key advantages include:
Enhanced Decision Making: By analyzing vast amounts of data, ML models can uncover patterns and insights that humans might overlook, leading to better-informed decisions.
Increased Efficiency: Automating routine tasks with ML algorithms can significantly reduce the time and resources required, allowing human employees to focus on more strategic activities.
Personalization: ML enables businesses to offer personalized experiences to their customers by understanding individual preferences and behaviors, thereby increasing engagement and satisfaction.
Predictive Analytics: Organizations can use ML to forecast future trends and behaviors, allowing them to prepare and adapt to upcoming changes effectively.
Innovation: The insights gained from ML can drive the development of new products and services, opening new markets and opportunities.
Conclusion
Machine learning is a powerful tool that has the potential to revolutionize how we work, live, and interact with the world around us. By understanding its basics and applications, organizations can harness its power to unlock new levels of efficiency, innovation, and growth. As machine learning continues to evolve, staying abreast of its developments will be crucial for anyone looking to thrive in the digital age.
The rest is here:
Understanding the Basics of Machine Learning: A Comprehensive Guide - Medium
Top 5 Robot Trends 2024 International Federation of Robotics reports | RoboticsTomorrow – Robotics Tomorrow
The stock of operational robots around the globe hit a new record of about 3.9 million units. This demand is driven by a number of exciting technological innovations. The International Federation of Robotics reports on the top 5 automation trends in 2024:
The trend of using artificial intelligence in robotics and automation keeps growing. The emergence of generative AI opens up new solutions. This subset of AI is specialized to create something new from things it has learned via training, and it has been popularized by tools such as ChatGPT. Robot manufacturers are developing generative AI-driven interfaces that allow users to program robots more intuitively by using natural language instead of code. Workers will no longer need specialized programming skills to select and adjust the robot's actions.
Another example is predictive AI analyzing robot performance data to identify the future state of equipment. Predictive maintenance can save manufacturers machine downtime costs. In the automotive parts industry, each hour of unplanned downtime is estimated to cost US$1.3 million, the Information Technology & Innovation Foundation reports. This indicates the massive cost-saving potential of predictive maintenance. Machine learning algorithms can also analyze data from multiple robots performing the same process for optimization. In general, the more data a machine learning algorithm is given, the better it performs.
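As a toy illustration of the predictive-maintenance idea, the sketch below flags a machine for service when a rolling average of a sensor reading drifts past a threshold, before outright failure (the threshold, window size, and readings are all invented for the example):

```python
# Toy predictive maintenance: flag a machine for service when the rolling
# mean of a sensor reading exceeds a threshold. Illustrative only.
from collections import deque

def needs_service(readings, window=3, threshold=80.0):
    """Return the index at which the rolling mean first exceeds threshold, or None."""
    recent = deque(maxlen=window)
    for i, reading in enumerate(readings):
        recent.append(reading)
        if len(recent) == window and sum(recent) / window > threshold:
            return i
    return None

# Vibration readings trending upward as a bearing wears out.
readings = [70, 71, 72, 75, 79, 84, 90]
print(needs_service(readings))  # → 6 (schedule maintenance before failure)
```

A real predictive model would learn thresholds and failure patterns from fleet-wide sensor history rather than hard-coding them.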
Human-robot collaboration continues to be a major trend in robotics. Rapid advances in sensors, vision technologies and smart grippers allow robots to respond in real-time to changes in their environment and thus work safely alongside human workers.
Collaborative robot applications offer a new tool for human workers, relieving and supporting them. They can assist with tasks that require heavy lifting, repetitive motions, or work in dangerous environments.
The range of collaborative applications offered by robot manufacturers continues to expand.
A recent market development is the increase of cobot welding applications, driven by a shortage of skilled welders. This demand shows that automation is not causing a labor shortage but rather offers a means to solve it. Collaborative robots will therefore complement, not replace, investments in traditional industrial robots, which operate at much faster speeds and will remain important for improving productivity in response to tight product margins.
New competitors are also entering the market with a specific focus on collaborative robots. Mobile manipulators, the combination of collaborative robot arms and mobile robots (AMRs), offer new use cases that could expand the demand for collaborative robots substantially.
Mobile manipulators, so-called "MoMas," are automating material handling tasks in industries such as automotive, logistics, and aerospace. They combine the mobility of robotic platforms with the dexterity of manipulator arms. This enables them to navigate complex environments and manipulate objects, which is crucial for applications in manufacturing. Equipped with sensors and cameras, these robots perform inspections and carry out maintenance tasks on machinery and equipment. One of the significant advantages of mobile manipulators is their ability to collaborate with and support human workers. A shortage of skilled labor and a lack of staff applying for factory jobs are likely to increase demand.
Digital twin technology is increasingly used as a tool to optimize the performance of a physical system by creating a virtual replica. Since robots are more and more digitally integrated in factories, digital twins can use their real-world operational data to run simulations and predict likely outcomes. Because the twin exists purely as a computer model, it can be stress-tested and modified with no safety implications while saving costs. All experimentation can be checked before the physical world itself is touched. Digital twins bridge the gap between digital and physical worlds.
Robotics is witnessing significant advancements in humanoids, designed to perform a wide range of tasks in various environments. The human-like design with two arms and two legs allows the robot to be used flexibly in work environments that were created for humans, so it can be easily integrated into existing warehouse processes and infrastructure, for example.
The Chinese Ministry of Industry and Information Technology (MIIT) recently published detailed goals for the country's ambitions to mass-produce humanoids by 2025. The MIIT predicts humanoids are likely to become another disruptive technology, similar to computers or smartphones, that could transform the way we produce goods and the way humans live.
The potential impact of humanoids on various sectors makes them an exciting area of development, but their mass market adoption remains a complex challenge. Costs are a key factor and success will depend on their return on investment competing with well-established robot solutions like mobile manipulators, for example.
"The five mutually reinforcing automation trends in 2024 show that robotics is a multidisciplinary field where technologies are converging to create intelligent solutions for a wide range of tasks," says Marina Bill, President of the International Federation of Robotics. "These advances continue to shape the merging industrial and service robotics sectors and the future of work."
Physicists detect elusive ‘Bragg glass’ phase with machine learning tool | Cornell Chronicle – Cornell Chronicle
Cornell quantum researchers have detected an elusive phase of matter, called the Bragg glass phase, using large volumes of X-ray data and a new machine learning data analysis tool. The discovery settles a long-standing question of whether this almost, but not quite, ordered state of Bragg glass can exist in real materials.
Crystal structure of pure ErTe3
The paper, "Bragg glass signatures in PdxErTe3 with X-ray diffraction Temperature Clustering (X-TEC)," published in Nature Physics on Feb. 9. The lead author is Krishnanand Madhukar Mallayya, a postdoctoral researcher in the Department of Physics in the College of Arts and Sciences (A&S). Eun-Ah Kim, professor of physics (A&S), is the corresponding author. The research was conducted in collaboration with scientists at Argonne National Laboratory and at Stanford University.
The researchers present the first evidence of a Bragg glass phase detected by X-ray scattering (a probe that accesses the entire bulk of a material, not just its surface) in a systematically disordered charge density wave (CDW) material, PdxErTe3. They used comprehensive X-ray data and a novel machine learning data analysis tool, X-ray Temperature Clustering (X-TEC).
"Despite its theoretical prediction three decades ago, concrete experimental evidence for CDW Bragg glass in the bulk of the crystal remained missing," Mallayya said.
Read the full story on the College of Arts and Sciences website.
Cracking the Code: How Uber Masters ETA Calculation on a Massive Scale – Medium
Predicting ETAs
Uber's main goal in predicting ETAs was to be reliable. This means that the estimated time of arrival should be very close to the actual time, and this accuracy should be consistent across different places and times.
The simplest approach that comes to mind for predicting the ETA is to use map data to compute the haversine (great-circle) distance between the two points and divide it by an assumed average speed. However, this method is not sufficient: there can be a significant gap between the predicted and actual ETA, since people don't travel in a straight line between two points.
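A sketch of that naive baseline, assuming a flat average speed rather than real routing (the coordinates and the 30 km/h speed are illustrative, not Uber's values):

```python
# Naive ETA baseline: haversine (great-circle) distance divided by an
# assumed average speed. Illustrative only; real ETAs need routing.
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometers."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + \
        math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def naive_eta_minutes(lat1, lon1, lat2, lon2, avg_speed_kmh=30.0):
    return haversine_km(lat1, lon1, lat2, lon2) / avg_speed_kmh * 60

# Two points in San Francisco, roughly 2.9 km apart as the crow flies.
print(round(naive_eta_minutes(37.7793, -122.4193, 37.7955, -122.3937), 1))
```

The estimate ignores the street grid entirely, which is exactly the gap the routing, traffic, and ML layers below are meant to close.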
To address this issue, Uber has incorporated additional layers such as routing, traffic information, map matching, and machine learning algorithms to enhance the reliability of the predicted ETA.
Let's take a deeper dive into these additional layers.
Problem statement: build a large-scale system that computes the route from origin to destination with the least cost and low latency.
To achieve this, they represent the physical map as a graph: every road intersection is a node, and each road segment is a directed edge.
To determine the ETA, they need to find the shortest path in this directed weighted graph. Dijkstra's algorithm is commonly used for this purpose, but with a binary-heap priority queue its time complexity is O((V + E) log V), where V is the number of road intersections (nodes) and E is the number of road segments (edges).
Considering the vast scale of Uber's operations, such as the half a million road intersections in the San Francisco Bay Area alone, Dijkstra's algorithm becomes impractical.
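For reference, here is a compact version of that baseline Dijkstra approach on a toy road graph (nodes are intersections, weighted directed edges are road segments with travel times in minutes; this is not Uber's partitioned variant):

```python
# Dijkstra's shortest-path algorithm on a small directed road graph.
# Edge weights represent travel time in minutes. Toy data for illustration.
import heapq

def dijkstra(graph, source):
    """Return a dict of minimum travel times from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry; a shorter path was already found
        for neighbor, weight in graph.get(node, []):
            nd = d + weight
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                heapq.heappush(heap, (nd, neighbor))
    return dist

roads = {
    "A": [("B", 4.0), ("C", 1.0)],
    "C": [("B", 2.0), ("D", 5.0)],
    "B": [("D", 1.0)],
}
print(dijkstra(roads, "A"))  # best route to D is A→C→B→D at 1 + 2 + 1 = 4 minutes
```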
To address this issue, Uber partitions the graph and precomputes the best path within each partition.
Interacting with the boundaries of graph partitions alone is sufficient to discover the optimal path.
Picture a dense graph represented on a circular map.
To find the best path between two points in a circle, traditionally, every single node in the circle needs to be traversed, resulting in a time complexity proportional to the area of the circle (π · r²).
However, by partitioning and precomputing, efficiency is improved. It becomes possible to find the best path by interacting only with the nodes on the circle's boundary, reducing the time complexity to the perimeter of the circle (2 · π · r).
In simpler terms, this means the number of nodes that must be examined to find the best path in the San Francisco Bay Area drops from about 500,000 to roughly 700.
Once we have the route, we need to determine the travel time. To do that, we require traffic information.
Consider traffic conditions when determining the fastest route between two points.
Traffic depends on factors like time of day, weather, and the number of vehicles on the road.
They used traffic information to determine the edge weights of the graph, resulting in a more accurate ETA.
They integrated historical speed data with real-time speed information to enhance the accuracy of traffic updates, as the inclusion of additional traversal data contributes to more precise traffic information.
Before moving forward, there were two critical questions that needed addressing:
1. Validity of real-time speed: too short a duration might imply a lack of understanding of the current road conditions. Conversely, if it's too long, the data becomes outdated.
2. Integrating historical and real-time speeds: striking a balance here involves a tradeoff between bias and variance. Prioritizing real-time data yields less bias but more variance. Emphasizing historical data introduces more bias but reduces variance. The challenge lies in finding the optimal balance between the two.
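One simple way to express that tradeoff is a convex blend of the two speed estimates, where the weight alpha is the tuning knob (the function and all values here are illustrative, not Uber's actual formula):

```python
# Toy sketch of the bias/variance tradeoff: blend real-time and historical
# segment speeds with a weight alpha. Higher alpha trusts real-time
# observations more (less bias, more variance). Illustrative only.
def blended_speed(realtime_kmh, historical_kmh, alpha=0.7):
    """Convex combination of real-time and historical speed estimates."""
    return alpha * realtime_kmh + (1 - alpha) * historical_kmh

# Rush-hour slowdown: sensors report 20 km/h on a segment averaging 50 km/h.
print(blended_speed(20.0, 50.0))        # → 29.0, leans toward current conditions
print(blended_speed(20.0, 50.0, 0.2))   # → 44.0, leans toward the historical average
```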
GPS signals can be less reliable and less frequent, especially when a vehicle enters a tunnel or an area with many tall buildings that can reflect the GPS signals.
Also, mobile GPS signals are usually close to the street segments but not perfectly on it, which makes it difficult to get the exact street coordinates.
Map matching is like connecting the dots! Imagine you have red dots representing raw GPS signals.
Now, the goal is to figure out which road segments these dots belong to. That's where map matching comes in: it links those red dots to specific road segments.
The resulting blue dots show exactly where those GPS signals align with the road segments. It's like fitting the puzzle pieces together to see the actual path on the map.
They use the Kalman filter for map matching: it takes noisy GPS signals and matches them to road segments. They also use the Viterbi algorithm, a dynamic programming approach, to find the most probable sequence of road segments.
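A toy Viterbi sketch of the map-matching idea: given noisy GPS fixes, pick the most probable sequence of road segments. All the probabilities below are invented for illustration; real map matching scores candidate segments by geometric distance to the fix and by road connectivity:

```python
# Toy Viterbi decoder for map matching. Emission probabilities reward
# segments near the GPS fix; transition probabilities reward staying on
# the same or a connected segment. All numbers invented for illustration.
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (probability, most probable state path) for the observations."""
    V = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    path = {s: [s] for s in states}
    for obs in observations[1:]:
        V.append({})
        new_path = {}
        for s in states:
            prob, prev = max(
                (V[-2][p] * trans_p[p][s] * emit_p[s][obs], p) for p in states
            )
            V[-1][s] = prob
            new_path[s] = path[prev] + [s]
        path = new_path
    best = max(V[-1], key=V[-1].get)
    return V[-1][best], path[best]

segments = ["main_st", "side_st"]
start = {"main_st": 0.6, "side_st": 0.4}
trans = {"main_st": {"main_st": 0.8, "side_st": 0.2},
         "side_st": {"main_st": 0.3, "side_st": 0.7}}
# Probability of observing each (discretized) GPS fix from each segment.
emit = {"main_st": {"near_main": 0.9, "near_side": 0.1},
        "side_st": {"near_main": 0.2, "near_side": 0.8}}

gps_fixes = ["near_main", "near_main", "near_side"]
prob, route = viterbi(gps_fixes, segments, start, trans, emit)
print(route)  # → ['main_st', 'main_st', 'side_st']
```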
Uber's initial aim was to provide reliable ETA information universally. Reliability has been discussed above; now, let's shift the focus to how Uber ensures availability everywhere.
Uber has observed that ETA predictions in India are less accurate compared to North America due to systematic biases or inefficiencies. This is where machine learning (ML) can play a crucial role by capturing variations in:
1. Regions
2. Time
3. Trip types
4. Driver behavior
By leveraging ML, Uber aims to narrow the gap between predicted ETAs and actual arrival times, thereby enhancing the overall reliability and user experience.
Let's define a few terms, and then we will better understand their decisions.
1. Linear model: assumes a linear relationship between the input variables (features) and the output variable, following the equation y = mx + b, where y is the output, x is the input, m is the slope, and b is the intercept. Example: linear regression is a common linear model used for predicting a continuous outcome.
2. Non-linear model: does not assume a linear relationship between the input and output variables. It may involve higher-order terms or complex mathematical functions to capture the patterns in the data. Examples: decision trees, neural networks, and support vector machines with non-linear kernels.
3. Parametric model: makes assumptions about the functional form of the relationship between variables and has a fixed number of parameters, which are fixed once the model is trained. Example: linear regression is parametric since it assumes a linear relationship with fixed coefficients.
4. Non-parametric model: makes fewer assumptions about the functional form and the number of parameters, adapting to the complexity of the data during training. Example: k-Nearest Neighbors (KNN) doesn't assume a specific functional form and adapts to the data at prediction time based on the local neighborhood of points.
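To make the non-parametric idea concrete, here is a minimal k-nearest-neighbors regressor: it keeps the training data and adapts to it at prediction time, with no fixed functional form (the quadratic toy dataset is invented for the example):

```python
# Minimal non-parametric model: k-nearest-neighbors regression.
# Predictions come straight from the stored training data, not from
# fixed learned coefficients. Toy data for illustration.
def knn_predict(train, x, k=3):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(train, key=lambda pt: abs(pt[0] - x))[:k]
    return sum(y for _, y in nearest) / k

# Non-linear ground truth: y = x^2, which a plain linear model cannot fit.
train = [(x, x * x) for x in range(-5, 6)]

print(round(knn_predict(train, 2.4), 2))       # averages neighbors at x = 2, 3, 1
print(round(knn_predict(train, 0.0, k=1), 2))  # → 0.0, the single nearest point
```

Note there is no training step at all: the "model" is the dataset plus a distance rule, which is exactly what makes it non-parametric.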
Since ETA is influenced by factors like location and time of day, and there is no predefined relationship between the variables, they opted for non-linear and non-parametric machine learning models.
In their terms, "With great (modelling) power comes great (reliability) responsibility!" So they maintain fallback ETAs to avoid system downtime situations.
They also monitor ETA to prevent issues for both internal and external consumers.
Link:
Cracking the Code: How Uber Masters ETA Calculation on a Massive Scale - Medium
AI What is it good for? ‘Machine Learning’ at Central Square Theatre takes a look – WBUR News
The longer one lives, the more opportunities there are to act as a caregiver for a loved one in need. Though it's not a glamorous job (it's downright difficult), luckily, there are technological tools that can help. Reminders to take medicine or to call a doctor can be set with Siri or Alexa, family members can use cameras to converse and to ensure a loved one's safety, and there are multiple ways that artificial intelligence (AI) can be used to perform tasks, make predictions, and even get speedier diagnoses of various diseases, particularly cancer.
But even with all its promise, how much should technology take on? Will privacy and other ethical lines continue to blur? Does technology's presence in health care factor in that some people might do better than their prognosis? What of hope and faith? Questions like these shape Francisco Mendoza's probing play "Machine Learning" (through Feb. 25 at Central Square Theater), where a son aims to help his father, who is battling cancer and a penchant for alcohol, with an app he named Arnold (voiced by a perfectly machine-sounding Matthew Zahnzinger). The Central Square production was produced in partnership with Teatro Chelsea, which Rivera helms, and the Catalyst Collaborative@MIT.
What's interesting about the bilingual show is that it doesn't attempt to present definitive answers to these imminent questions about technology's use in health care. However, through the lens of a father and son (Gabriel and Jorge) struggling to connect, it presents a balanced case, never so heavily laden with tech speak as to be unapproachable, showing how leaning too much on tech alone could help or hurt.
Machine learning (ML) is a type of AI that isn't necessarily programmed to perform a specific task but can learn to make decisions or predictions over time as it's exposed to more data. In the play, Jorge (Armando Rivera) lands a paid fellowship and uses his ML-powered app, Arnold, to help manage his dad, Gabriel (Jorge Alberto Rubio), from pills to predictions and recommendations.
It's a solid production under Gabriel Vega Weissman's direction. Multiple suspended screens are aglow with a green churning image when Arnold speaks. The actor voicing Arnold is offstage. The clever (and on-genre) use of video and projections by SeifAllah Salotto-Cristobal, and white screen-shaped squares that hide furniture and other props (courtesy of scenic designer Janie E. Howland and props person Julia Wonkka), bring the audience through multiple settings.
There are even a few telling visits into the past (a terrifying car accident, Gabriel and a young Jorge watching "The Terminator," Jorge's visit to his dad's house after Gabriel and his mom divorced) that highlight how the chasm between them has widened and seems uncrossable. In these scenes, the acting chops of a young Jorge, wonderfully rendered by Xavier Rosario, get to shine.
Despite their challenges, Jorge and Gabriel still love each other. What Jorge lacks when it comes to expressing sentimental emotion, he funnels into monitoring and secretly hoping to save his dad. But what Jorge forgets, like many of us sometimes do, is that we all have a responsibility in relationships. Everyone has the choice to talk about what ails them, to unburden themselves, and often, to forgive. Not doing so can lead to torment.
But most of all, Jorge momentarily forgets that the use of tech doesn't mean that the action or inaction of AI will always be accurate or helpful, or that it will always do what one hopes. After all, the data AI is driven by is derived from humans, with all our innovation and intelligence as well as our biases and shortcomings.
"Machine Learning" runs at Central Square Theater through Feb. 25. The play was produced in partnership with Teatro Chelsea and the Catalyst Collaborative@MIT.
Here is the original post:
AI What is it good for? 'Machine Learning' at Central Square Theatre takes a look - WBUR News
How symmetry can come to the aid of machine learning – MIT News
Behrooz Tahmasebi, an MIT PhD student in the Department of Electrical Engineering and Computer Science (EECS) and an affiliate of the Computer Science and Artificial Intelligence Laboratory (CSAIL), was taking a mathematics course on differential equations in late 2021 when a glimmer of inspiration struck. In that class, he learned for the first time about Weyl's law, which had been formulated 110 years earlier by the German mathematician Hermann Weyl. Tahmasebi realized it might have some relevance to the computer science problem he was then wrestling with, even though the connection appeared on the surface to be thin, at best. Weyl's law, he says, provides a formula that measures the complexity of the spectral information, or data, contained within the fundamental frequencies of a drum head or guitar string.
Tahmasebi was, at the same time, thinking about measuring the complexity of the input data to a neural network, wondering whether that complexity could be reduced by taking into account some of the symmetries inherent to the dataset. Such a reduction, in turn, could facilitate as well as speed up machine learning processes.
Weyl's law, conceived about a century before the boom in machine learning, had traditionally been applied to very different physical situations, such as those concerning the vibrations of a string or the spectrum of electromagnetic (black-body) radiation given off by a heated object. Nevertheless, Tahmasebi believed that a customized version of that law might help with the machine learning problem he was pursuing. And if the approach panned out, the payoff could be considerable.
He spoke with his advisor, Stefanie Jegelka, an associate professor in EECS and an affiliate of CSAIL and the MIT Institute for Data, Systems, and Society, who believed the idea was definitely worth looking into. As Tahmasebi saw it, Weyl's law had to do with gauging the complexity of data, and so did this project. But Weyl's law, in its original form, said nothing about symmetry.
He and Jegelka have now succeeded in modifying Weyl's law so that symmetry can be factored into the assessment of a dataset's complexity. "To the best of my knowledge," Tahmasebi says, "this is the first time Weyl's law has been used to determine how machine learning can be enhanced by symmetry."
The paper he and Jegelka wrote earned a Spotlight designation when it was presented at the December 2023 Conference on Neural Information Processing Systems (NeurIPS), widely regarded as the world's top conference on machine learning.
"This work," comments Soledad Villar, an applied mathematician at Johns Hopkins University, "shows that models that satisfy the symmetries of the problem are not only correct but also can produce predictions with smaller errors, using a small amount of training points. [This] is especially important in scientific domains, like computational chemistry, where training data can be scarce."
In their paper, Tahmasebi and Jegelka explored the ways in which symmetries, or so-called invariances, could benefit machine learning. Suppose, for example, the goal of a particular computer run is to pick out every image that contains the numeral 3. That task can be a lot easier, and go a lot quicker, if the algorithm can identify the 3 regardless of where it is placed in the box, whether it's exactly in the center or off to the side, and whether it is pointed right-side up, upside down, or oriented at a random angle. An algorithm equipped with the latter capability can take advantage of the symmetries of translation and rotation, meaning that a 3, or any other object, is not changed in itself by altering its position or by rotating it around an arbitrary axis. It is said to be invariant to those shifts. The same logic can be applied to algorithms charged with identifying dogs or cats. A dog is a dog is a dog, one might say, irrespective of how it is embedded within an image.
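The translation part of that invariance can be sketched in a few lines of Python (an illustrative toy, not the authors' construction): centering a digit's pixel coordinates on their centroid removes any dependence on where the digit sits in the frame, so two copies of the same stroke at different positions become indistinguishable.

```python
import numpy as np

def centered_coords(img: np.ndarray) -> np.ndarray:
    """Return the 'ink' coordinates of a binary image, shifted so their
    centroid is at the origin. This feature is invariant to translation."""
    ys, xs = np.nonzero(img)
    pts = np.stack([ys, xs], axis=1).astype(float)
    return pts - pts.mean(axis=0)  # subtracting the centroid removes position

img = np.zeros((8, 8), dtype=int)
img[2:5, 3] = 1                               # a small vertical stroke
shifted = np.roll(img, (2, 3), axis=(0, 1))   # same stroke, moved elsewhere

a = centered_coords(img)
b = centered_coords(shifted)
print(np.allclose(np.sort(a, axis=0), np.sort(b, axis=0)))  # True
```

A model fed such centered coordinates never has to learn that position is irrelevant, which is exactly the kind of saving the paper quantifies.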
The point of the entire exercise, the authors explain, is to exploit a dataset's intrinsic symmetries in order to reduce the complexity of machine learning tasks. That, in turn, can lead to a reduction in the amount of data needed for learning. Concretely, the new work answers the question: How much less data is needed to train a machine learning model if the data contain symmetries?
There are two ways of achieving a gain, or benefit, by capitalizing on the symmetries present. The first has to do with the size of the sample to be looked at. Let's imagine that you are charged, for instance, with analyzing an image that has mirror symmetry, the right side being an exact replica, or mirror image, of the left. In that case, you don't have to look at every pixel; you can get all the information you need from half of the image, a factor-of-two improvement. If, on the other hand, the image can be partitioned into 10 identical parts, you can get a factor-of-10 improvement. This kind of boosting effect is linear.
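That factor-of-two claim is easy to see concretely. The following toy sketch (invented data, not from the paper) builds a left-right mirror-symmetric image and reconstructs the whole from its left half alone:

```python
import numpy as np

# Build a mirror-symmetric image: the right half is the left half flipped.
rng = np.random.default_rng(0)
left = rng.random((4, 4))
img = np.concatenate([left, left[:, ::-1]], axis=1)  # symmetric 4x8 image

# Half the pixels carry all the information: reconstruct from the left half.
reconstructed = np.concatenate([img[:, :4], img[:, :4][:, ::-1]], axis=1)
print(np.array_equal(reconstructed, img))     # True
print(img.size // img[:, :4].size)            # 2: the linear gain factor
```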
To take another example, imagine you are sifting through a dataset, trying to find sequences of blocks that feature seven different colors: black, blue, green, purple, red, white, and yellow. Your job becomes much easier if you don't care about the order in which the blocks are arranged. If the order mattered, there would be 5,040 different combinations to look for. But if all you care about are sequences of blocks in which all seven colors appear, then you have reduced the number of things, or sequences, you are searching for from 5,040 to just one.
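The arithmetic behind the block example is a simple permutation count: 7! orderings of the seven colors collapse to a single unordered set.

```python
from math import factorial

colors = ["black", "blue", "green", "purple", "red", "white", "yellow"]

ordered_targets = factorial(len(colors))  # every arrangement is distinct: 7!
unordered_targets = 1                     # all arrangements collapse to one set

print(ordered_targets, unordered_targets)  # 5040 1
```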
Tahmasebi and Jegelka discovered that it is possible to achieve a different kind of gain, one that is exponential, for symmetries that operate over many dimensions. This advantage is related to the notion that the complexity of a learning task grows exponentially with the dimensionality of the data space. Making use of a multidimensional symmetry can therefore yield a disproportionately large return. "This is a new contribution that is basically telling us that symmetries of higher dimension are more important because they can give us an exponential gain," Tahmasebi says.
The NeurIPS 2023 paper that he wrote with Jegelka contains two theorems that were proved mathematically. "The first theorem shows that an improvement in sample complexity is achievable with the general algorithm we provide," Tahmasebi says. The second theorem complements the first, he adds, showing that "this is the best possible gain you can get; nothing else is achievable."
He and Jegelka have also provided a formula that predicts the gain one can obtain from a particular symmetry in a given application. A virtue of this formula is its generality, Tahmasebi notes: it works for any symmetry and any input space. It works not only for symmetries that are known today but could also be applied in the future to symmetries that are yet to be discovered. The latter prospect is not too farfetched to consider, given that the search for new symmetries has long been a major thrust in physics. That suggests that, as more symmetries are found, the methodology introduced by Tahmasebi and Jegelka should only get better over time.
According to Haggai Maron, a computer scientist at the Technion (Israel Institute of Technology) and NVIDIA who was not involved in the work, the approach presented in the paper "diverges substantially from related previous works, adopting a geometric perspective and employing tools from differential geometry. This theoretical contribution lends mathematical support to the emerging subfield of Geometric Deep Learning, which has applications in graph learning, 3D data, and more. The paper helps establish a theoretical basis to guide further developments in this rapidly expanding research area."
See the original post here:
How symmetry can come to the aid of machine learning - MIT News
Data, Artificial Intelligence (AI), and Machine-Learning Are the Cornerstones of Prosperous Real Estate Portfolios – ATTOM Data Solutions
"The only way for investors to achieve sustained outperformance relative to the market and their peers is if they have a unique ability to uncover material facts that are almost completely unknown to everybody else."
Mark J. Higgins, CFA, CFP, CFA Institute
The best investors have an uncanny ability to identify undervalued stocks, the hidden gems. They see a stock that will outperform the market where most investors see nothing at all. The housing market is not the stock market, but some investors manage to jump on the best deals that others miss, and they are tapping data solutions to do so.
In this article, we explore how data, machine learning, and artificial intelligence-powered solutions are now integral to real estate investing at every stage. From property searches and deal negotiations to project and portfolio management, real estate and property AI solutions can help investors make data-driven decisions and be more profitable.
To outperform the market, you need to identify undervalued assets. That means assessing an asset's future potential and understanding all the variables that might affect your investment over time.
In the case of real estate, the variables include: how much cash flow an asset can produce from future rentals; whether units need upgrades or refurbishments; the market demand for properties; economic variables such as employment, crime rates, and interest rates; any risks to the property due to climate or hazards; and more.
Finding such data used to be time-intensive, if it could be found at all, and much of it might be overlooked in the rush to seal a deal. Today, however, investors have all of this information accessible from data platforms and APIs. Investors can tailor analytics to focus on the criteria they care about and still make fast investment decisions.
It used to be that real estate investors relied on networking in their locales to find out about potential projects. The geographic areas for sourcing properties were limited. Real Estate API data platforms have removed boundary limitations by providing real estate and property data on a national level and down to the granular street level. The world has opened up for investors, and the only boundaries investors worry about now are neighborhood boundary lines for school districts, demographics, and local house prices.
The incredible growth of the proptech (property technology) sector has created rapid saturation. Proptech comprises the digital solutions and startups providing tools to real estate professionals, asset managers, and property owners, facilitating the researching, buying, selling, and managing of real estate. According to Globe Newswire, the worldwide proptech market was valued at around USD 19.5 billion in 2022 and is predicted to grow to around USD 32.2 billion by 2030.
Examples of these cutting-edge technologies are ATTOM, a property and real estate data provider; Zillow, another dataset provider; Opendoor, a digital platform for buying and selling homes; and HomeLight, which matches buyers and sellers. Other players include Axonize, a smart-building software-as-a-service (SaaS) platform that uses IoT to help property owners optimize energy consumption, reduce costs, and improve space utilization, and Home365, a property management solution that offers vacancy insurance, rental listings, and tenant management and maintenance.
Before the rise of Proptech and APIs, conventional analytical methods required investors and analysts to wade through millions of records or data points to discern patterns. By the time an investor arrived at a decision, and probably a risky one, the best opportunities were gone.
Let's say a developer is looking for parcel zones suitable for development. Using advanced analytics based on artificial intelligence (AI) and machine learning, the developer can collect hyperlocal community data, expected land use, government planning data, and local economic data to assess the potential ROI of a parcel.
An investor might be looking for a commercial property investment. Combining Yelp data with property price data might show that having two upscale restaurants within a quarter of a mile correlates with higher property prices, while more than four correlates with lower prices. This type of information is an example of how an investor might use data to identify investment targets more quickly than their competitors.
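As a toy sketch of that kind of feature engineering (all rows are invented for illustration; no real Yelp or price data is used), one might bucket properties by nearby upscale-restaurant count and compare average sale prices:

```python
from collections import defaultdict

# (upscale restaurants within 0.25 mi, sale price in $k): invented rows
sales = [(0, 410), (2, 495), (2, 510), (5, 430), (6, 405)]

by_bucket = defaultdict(list)
for n, price in sales:
    bucket = "~2 nearby" if 1 <= n <= 4 else ("none" if n == 0 else ">4 nearby")
    by_bucket[bucket].append(price)

# Average price per bucket: a nonmonotonic pattern like the one described,
# with a sweet spot around two restaurants, shows up directly.
for bucket, prices in by_bucket.items():
    print(bucket, sum(prices) / len(prices))
```

On real data, the same join-count-and-compare pattern would run over a property API feed rather than a hardcoded list.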
AI and machine-learning solutions parse vast amounts of information, combining the right mix of community, pricing, and location-based data to produce results.
Real estate data providers like ATTOM offer expansive data about properties, market trends, and historical sales. They also offer neighborhood data, climate data, and other valuable information that can be used for predictive modeling to manage risk.
The investment decision is just one area where data has changed real estate investing. Property owners also use technology for project management.
Just as identifying potential real estate investments is now a data- and solution-driven process, property management is also digitalized. Solutions like AppFolio and DoorLoop track property performance metrics like occupancy rates, maintenance costs, and rental income for investors.
Many of these solutions, including AppFolio and Buildium, automate rent collection and maintenance tracking, and handle communications between management and tenants using chatbots and automated emails.
Poring over Excel spreadsheets and risk ratios and following due diligence used to be the way to a robust, risk-mitigated portfolio. But digital solutions like BiggerPockets and DealCheck will analyze deals, assess ROI, and evaluate risk for you. They will even educate you on investing and team you up with agents and brokers who serve your niche.
DealCheck's software analyzes deals such as rental property acquisitions, flips, and multifamily buildings. It will estimate profits and configure deal parameters for you.
Granted, these solutions are limited in that they cannot structure an investing strategy. For that, investors must decide their niche or direction and find projects that follow their business model. Then, data analytics can support that strategic direction with long-term roles and goals for projects and investments.
Let's say an investor wants to build a portfolio of multifamily buildings. Machine learning algorithms can identify neighborhoods with potential based on macro data and hyperlocal forecasts, such as the demand for multifamily housing and government subsidies. This allows the asset manager to identify the undervalued properties, the hidden gems.
It's true that institutional investors have the resources to hire teams of experts to build models and create architecture. They can hire translators to apply findings to actions. But just as online investing platforms democratized stock investing, data APIs are leveling the playing field for real estate.
Before the digital transformation, only investors teamed with connected and informed real estate brokers could lead in real estate investing. Today, data and solutions providers have opened up a world where nationwide property data is at investors' fingertips and informed analytical reports are mitigating portfolio risk.
Data, AI, and machine-learning solutions have opened the gates for savvy real estate investors. They are helping to narrow down a competitive field that has reached global proportions.
Learn more about how ATTOM's data can power your portfolio and reveal the hidden gems.
Read the original:
Data, Artificial Intelligence (AI), and Machine-Learning Are the Cornerstones of Prosperous Real Estate Portfolios - ATTOM Data Solutions
Advancing Fairness in Lending Through Machine Learning – Federal Reserve Bank of Philadelphia
Our economy's financial sector is using machine learning (ML) more often to support lending decisions that affect our daily lives. While technologies such as these pose new risks, they also have the potential to make lending fairer. Current regulation limits lenders' use of ML and aims to reduce discrimination by preventing the use of variables correlated with protected class membership, such as race, age, or neighborhood, in any aspect of the lending decision. This research explores an alternative approach that would use an applicant's neighborhood to consciously reduce fairness concerns between low- and moderate-income (LMI) and non-LMI applicants. Since this approach is costly to lenders and borrowers, we propose concurrent use with more advanced ML models that soften some of these costs by improving model predictions of default. The combination of embracing ML and setting explicit fairness goals may help address current disparities in credit access and ensure that the gains from innovations in ML are more widely shared. To successfully achieve these goals, a broad conversation should continue with stakeholders such as lenders, regulators, researchers, policymakers, technologists, and consumers.
Read the rest here:
Advancing Fairness in Lending Through Machine Learning - Federal Reserve Bank of Philadelphia