
Pandemic Cheating in College Compared with Cheating in Online Chess in Episode 004 of The Score FE News – FE News

Two university professors explore the similarities between cheating during online classes and cheating in online chess communities during episode 004 of The Score Podcast on academic integrity

On episode four of The Score, host Kathryn Baron (@TchersPet) spoke with Dr. Alexander Matros, a professor in the Darla Moore School of Business at the University of South Carolina, and Eren Bilen (@Ernbilen), Assistant Professor of Data Analytics at Dickinson College. Both Matros and Bilen are chess players. They discuss their September 2020 study on online cheating amid COVID-19 and explain how the International Chess Federation and the Internet Chess Club deal with cheating. Their suggestions offer insight about how universities might address cheating.

The full episode is available to listen to on Apple, Spotify, or at The Score. Excerpts from the episode follow.

Note: Removal of filler words and minor edits have been made for clarity.

Kathryn Baron (01:38): Why did you select online chess as a barometer for online cheating in colleges and universities?

Dr. Alexander Matros (01:44): If you look back, chess had this problem for over 20 years and chess was played online on different platforms, including this Internet Chess Club. And it was interesting to see how they deal with the problem of cheating. And we tried to use this experience for what we can see now in education.

Kathryn Baron (02:04): Now, what are the differences though? Because clearly a private chess club can do different things than a college or university.

Dr. Alexander Matros: If you look at a private chess club, they have this problem that if somebody's trying to cheat, and this information is public knowledge, then nobody wants to join the club, because it's very difficult to compete with this guy who might win everything. And cheating might take different forms: for example, somebody might help you, or you can use computer support. There are many different ways.

Dr. Alexander Matros (02:39): What we saw last year, for example, during the pandemic, this was a very similar situation, because if you think about cheating on exams at colleges and universities, you can also get outside help from somebody in your room. You can also try to use your computer and search for answers. From this point of view, these are very similar problems. The difference is only that in chess we saw this more than 20 years ago, so we can try to learn from that previous experience. In academia, we only saw it last year.

Kathryn Baron (03:13): Why has it been going on in chess for so long? That just kind of surprised me.

Dr. Alexander Matros (03:19): Just again, this is the nature of the beast. Humans would like to win. They would like to get some money, and in chess, if you are not the top player, you would like to prove that you can beat everybody else. And if you see that you have some problems, one possibility is to use some outside help. Plus a lot of money is at stake. For example, if you qualify for this event, then even if you're eliminated immediately in the first round, you can win like $5,000. And now we are talking about more than 20 years ago. There was a lot of cheating. People tried to get help, people tried to use computers, and so on. This was just one kind of extreme. But if you look at the research side, you can see that in different competitions, people try to show that they're the best. Sometimes money is involved, but even if you remove money, [there is] pride; you can try to show that I'm better than my opponent. And people did some experimental studies where they found that in a tournament without prizes, people would still be ready to put in some effort in order to win the competition.

Kathryn Baron (04:29): So it's not just that the stakes are high, or there's money involved, or winning a competition, but there's also something about human nature that's involved.

Dr. Alexander Matros (04:35): Yes. So even if the prize is equal to zero, then people are ready to put in effort, when you have to spend resources and then at the end you win nothing, but you win your pride. Okay. So I managed to beat these guys. Yeah.

Kathryn Baron (04:51): In academia, you specifically looked at advanced placement exams, which can earn high school students college credit. And what did you find there? And what was different? Was it online for the first time? Were there hybrids? What changed during COVID?

Eren Bilen (05:08): Yeah, the 2020 AP exams were the first time that these AP exams were given online, basically because of COVID. And what happened was, if you look at Google searches, and this is public information, you can just access this information easily. What you see is the 2020 AP exam for the math subject was given on May 12, at 2:00 PM Eastern time in the afternoon. If you look at some of the keywords related to math concepts, such as derivative, integral, critical points, inflection point, things like that, you'll see a spike exactly at 2:00 PM, and then following 3:00 PM, and so on, the spike basically disappears.

The next day, on May 13, it was the English literature subject. If you do a similar study, this time instead of checking math-related keywords, you check literature-related keywords. You can [search] imagery, literary techniques, diction, things like that. You get the spike exactly at 2:00 PM on May 13. This is again the time of the test.

And then last, you can even check physics. For example, this was the next day, on May 14, but this time it was not 2:00 PM, it was 4:00 PM in the afternoon. And you get this spike on physics-related keywords at exactly 4:00 PM on May 14. It looks like students basically did some Google searching in order to find the answers. Was this helpful [to the student]? Yes? No? We're not sure, but at least students tried.

Kathryn Baron (06:57): At least they tried to cheat. Was this an unproctored online exam?

Eren Bilen (07:06): That is correct. It was unproctored.

Kathryn Baron (09:12): And you had posed several questions in your report, including whether colleges or universities can expect a surge in cheating to continue. And you write that, unlike the face-to-face examination, "cheating should be expected in online testing." And you add that "cheating is a part of the student equilibrium strategy in the online examination." So what does this say about us? It just seems to be a sad commentary on who we are and our ethics.

Dr. Alexander Matros (09:42): In our paper, we looked at this problem from at least two directions. First was the theoretical approach, and second, we looked at some data from a real-life exam and had a simple model. In this simple model, we just assumed that a student can either cheat or not cheat. So if nobody cheats, then professors would never monitor. And then it would be so simple to cheat. And then we also looked at data. Eren, maybe you can just talk a little bit about the data, what we found.

Eren Bilen (10:21): In the data, we were, quote-unquote, lucky in the sense that we had one special tool that enabled us to basically pinpoint what's going on. We looked at the time the students took to answer their questions. We gave them basically a test with 20 questions. And these questions were not multiple choice. The students had to enter numbers using their keyboards. And what we saw was that some of the students had very strange timings. For example, on a question that you would expect a student to take, let's say, five minutes on average, the student gave an answer in seven seconds. You can say, okay, this is one occasion; the student just input a random number or something. That was not the case. [What they gave] was the correct answer. For example, the correct answer was, let's say, 347. A student was able to enter that number, 347, in less than 10 seconds. And this kept going and going. Next question: similar. Third question: again, similar. It kept on going for 20 questions. The overall time the student took to complete the exam was about 10 minutes.

Kathryn Baron (12:30): But Eren, in seven seconds, how did they cheat? Could they actually look something up online that quickly?

Eren Bilen (12:36): You cannot do this in seven seconds. What we believe is that students had the answers from other students who volunteered to take the test before they did and gave them the correct answers. So then you basically had a list in front of you with question names and the correct answers. They basically looked at this answer sheet, and it probably took them, on average, 10 seconds to figure out which question they were seeing on the screen. And basically, they inputted the correct number using their keyboards. It looks like this takes about 10 seconds on average.

Kathryn Baron (15:51): You had earlier mentioned fairness. And it does seem that this issue raises some huge ethical issues around fairness, because a student who works very hard to get good grades could very likely do worse in a class because that student didn't cheat. And even though teachers and professors know from, say, homework assignments and classroom participation which students are studying, what can they do when the test results don't reflect that because of cheating?

Dr. Alexander Matros (16:21): Yeah, I think, in a sense, you ask very, very important questions. During this pandemic, during the whole year, we had some expectations; you can call this a social norm, what we expect. Let's say people would come to a class and they would take a test, and then you can rank them based on these results. And everything is, from this point of view, more or less fair. Now, if you take a test at home, especially if it's not proctored, and nobody knows who took this test, then the situation is such that we have another social norm. When you have these expectations, or these beliefs that everybody else is cheating, this immediately puts you in a situation where you might be the best student, but you feel that you have no chance to compete as a student unless you cheat as well.

When you have these expectations, they are self-fulfilling. And now, if everybody cheats, everybody expects that, and then they play according to these norms.

Dr. Alexander Matros (20:06): If you put in a little bit of effort trying to check them, then maybe they would just abstain from this kind of behavior. And then even simple monitoring can remove a lot of cheating. It would definitely not remove all cheating, but it would remove the simple kinds. So, for students like you describe, who would actually prepare in their rooms, you cannot eliminate that, but they put in so much effort; if they would study instead, they would do so much better.

Dr. Alexander Matros (22:38): But online, you have some clues; it's never direct evidence, only indirect evidence. You can say, okay, so the student took a test with 20 questions and finished this test in 5 minutes. It was multiple choice, and their answers were perfect. But then, is it possible? Yes, it's possible. Because, again, you can also win a lottery: you just put in the numbers and then you win. So, the student had a good day and answered everything correctly. It's possible; you cannot say this was impossible. The student guessed correctly, so it's perfect.

Kathryn Baron (24:39): But do your colleagues feel that there is a lot of cheating going on in their classes? I'm just wondering, is there a consensus that, "Yeah, it's going on," or are they sort of in the dark about it?

Dr. Alexander Matros (24:54): No, I think there is clearly a consensus that there was cheating. And what people would do is try to find some ways to deal with that.

Dr. Alexander Matros (26:14): In my first 10 years, I had zero cases. And during the pandemic, yeah, I did report several cases.

Eren Bilen (32:33): Yeah. We have to move from a bad equilibrium to a better one, absolutely. I absolutely agree. In order to do that, we need to use some sort of proctoring. It could be in-person proctoring, it could be live proctoring, but with the use of proctoring, we can basically move from those bad equilibria to the better ones. Because in a bad equilibrium, basically, you give an option to a student to cheat, but if you're using proctoring, then hopefully 99% of the time, a student won't be able to cheat. That's the key takeaway that I want to point out.

Listen to the entire episode 004 of The Score with guests Dr. Alexander Matros and Eren Bilen here.

Link:
Pandemic Cheating in College Compared with Cheating in Online Chess in Episode 004 of The Score FE News - FE News

Read More..

Chess stars of the future to face off at UK Chess Challenge event in Shropshire – Shropshire Star

The 2022 Shropshire Megafinal of the UK Chess Challenge will take place at Charlton School in Wellington this June, with trophies up for grabs for the age group winners and medals for other high-scoring competitors.

Players will compete for places at regional gigafinals later in the year, which in turn could lead to a spot in the national terafinal at Blenheim Palace in October.

The UK Chess Challenge is one of the biggest junior competitions in the sport in the world and has seen more than 1 million youngsters compete since it was launched in 1996.

There will be sections for under eights, under 10s, under 12s, under 14s and under 18s, and the aim is to give youngsters, from beginners to strong players, a taste of tournament chess.

Those who progress from the Wellington event on June 25 will have the chance to compete at one of the gigafinals in Manchester and Harrow in July or online on the Lichess chess-playing website in September.

A Challengers event will also take place in September to determine the top 60 players, 12 in each category, to gain places at the 11-round terafinal on October 15 and 16.

Shropshire has a growing junior chess club that meets on Saturdays at the Nerdy Café in Shrewsbury, while several juniors also compete for clubs in the Shropshire Chess League.

Many players who started out as juniors locally have now become some of the county's strongest players, including Athar Ansari of the Oakengates-based Maddocks club and Newport duo Daniel Hilditch-Love and Thalia Holmes.

Christopher Lewis, organiser of the UK Chess Challenge Shropshire Megafinal, has written to schools across the county urging any which run chess clubs to enter players.

He added he was also on the lookout for volunteers to help make sure the event runs smoothly.

He said: "We are now looking for volunteers to help run the event on the day and also with setting up the evening before. Those on the day will likely act as an arbiter for one of the sections.

"You just need an enthusiasm for junior chess. Take it from me that volunteering at a junior tournament is a surprising amount of fun and an incredibly fulfilling day."

To learn more or to inquire about volunteering at the event, contact Mr Lewis on 07508 487092 or email christopher.d.lewis44@gmail.com

More here:
Chess stars of the future to face off at UK Chess Challenge event in Shropshire - Shropshire Star

Read More..

Chess Valley RFC to ride to Paris in aid of three charities – Watford Observer

Thirteen people will cycle more than 250 miles to Paris to mark a rugby club's 25th anniversary.

The team at Chess Valley RFC will embark on the ride from their home ground at Croxley Guild of Sport on June 3, finishing at the Stade de France in Paris on Sunday.

On the way, the group of players, coaches, support crew and the club chairman will stop off at 25 grassroots rugby clubs.

The 255-mile ride will be in aid of three charities: Cure Parkinson's, Whizz-Kidz, and the Matt Ratana Rugby Foundation.

Team captain Andy Smith said: "We have been training very hard for this ride, and we needed to. We are a few years away from our peaks, and most of us are built more for the Tower of Power than the Tour de France.

"One hundred miles and 17 rugby clubs on day 1 will keep us busy, but the overnight ferry followed by another 155 miles will then challenge us a little more. But the support we have received has been superb, from the Chess Valley RFC community, from friends, and from dozens of other clubs on the route or local to us here in Hertfordshire.

"We've even had support from some English rugby stars. We are all cycling with huge pride as we raise money for three important charities."

Among the 25 clubs the team will visit along the way are Met Police RFC and East Grinstead RFC, in honour of police sergeant Matt Ratana, who was shot inside a police custody facility in London in 2020.

The Matt Ratana Rugby Foundation, which the squad will be raising money for alongside two other charities, exists to support rugby initiatives in schools and in the community.

Several of the Chess Valley RFC team are also Metropolitan Police officers.

Police sergeant Matt Ratana, who was killed while on duty in 2020. Credit: Metropolitan Police

Andy added: "Our aim is to raise as much money as possible for these causes, but also to celebrate the importance of grassroots rugby in the community. The perfect way to mark Chess Valley's 25th year."

To support the team and donate to their challenge visit https://www.justgiving.com/crowdfunding/chessvalleyvelos

More here:
Chess Valley RFC to ride to Paris in aid of three charities - Watford Observer

Read More..

MLPDS Machine Learning for Pharmaceutical Discovery and Synthesis …

MLPDS is a collaboration between the pharmaceutical and biotechnology industries and the departments of Chemical Engineering, Chemistry, and Computer Science at the Massachusetts Institute of Technology. This collaboration will facilitate the design of useful software for the automation of small molecule discovery and synthesis.

The MIT Consortium, Machine Learning for Pharmaceutical Discovery and Synthesis (MLPDS), brings together computer scientists, chemical engineers, and chemists from MIT with scientists from member companies to create new data science and artificial intelligence algorithms along with tools to facilitate the discovery and synthesis of new therapeutics. MLPDS educates scientists and engineers to work effectively at the data science/chemistry interface and provides opportunities for member companies and MIT to collectively create, discuss, and evaluate new advances in data science for chemical and pharmaceutical discovery, development, and manufacturing.

Specific research topics within the consortium include synthesis planning; prediction of reaction outcomes, conditions, and impurities; prediction of molecular properties; molecular representation, generation, and optimization (de novo design); and extraction and organization of chemical information. The algorithms are developed and validated on public data and then transferred to member companies for application to proprietary data. All members share intellectual property and royalty-free access to all developments. MIT endeavors to make tool development and transfer successful through one-on-one meetings and teleconferences with individual member companies, Microsoft Teams channels, GitLab software repositories, and consortium face-to-face meetings and teleconferences.

More here:
MLPDS Machine Learning for Pharmaceutical Discovery and Synthesis ...

Read More..

Leveraging machine learning processes to revolutionize the adoption of AI – Express Computer

By Amaresh Tripathy, Global Analytics Business Leader, Genpact

Since the rise of digitalization in the post-pandemic world, the role of Artificial Intelligence (AI) and Machine Learning (ML) in driving digital business transformation has greatly increased. Enterprise leaders are accelerating digital initiatives at an unprecedented rate across industries, transforming how people live and work. However, as these programmes take shape, it is observed that only around half of all AI proofs of concept make it to production. For most teams, realizing their AI vision is still a long way off.

The push to move to the cloud, as well as the expanding number of machine learning models that witnessed tremendous growth during the pandemic, is projected to continue in the future. However, while operationalizing Artificial Intelligence, it has been found that merely 27% of the projects piloted by organizations successfully move to production.

What is Machine Learning Operations all about?
Machine learning operations (MLOps) is all about how to effectively manage data scientists and operations resources to allow for successful development, deployment, and monitoring of models. Simply put, MLOps assists teams in developing, deploying, monitoring, and scaling AI and ML models in a consistent manner, reducing the risks associated with not having a framework for long-term innovation. Consider MLOps to be a success formula.

The challenge
The disparity between what AI/ML is used for at present and its potential usage stems from a number of problems. These are largely related to model building, iteration, deployment, and monitoring. If AI/ML is to alter the global corporate landscape, these concerns must be solved. Organizations that have already begun their path to operationalize AI/ML, or are generating Proofs of Concept (PoC), might avoid some of these pitfalls by proactively incorporating best practices in MLOps to enable smooth model development and address scaling issues.

Worse still, organizations spend precious time and resources monitoring and retraining models. Successful machine learning experiments are difficult to duplicate, and data scientists lack access to the technical infrastructure required for development.

Paving the way to implementation
The development of a Machine Learning model often begins with a business objective, which can be as simple as minimizing fraudulent transactions to less than 0.1 percent, or being able to recognize people's faces in photographs on social networking platforms. Additionally, business objectives can also include performance targets, technical infrastructure requirements, and financial constraints, all of which can be represented as key performance indicators (KPIs), which in turn allow the business performance of ML models in production to be monitored.

MLOps help ML-based solutions get into production faster through automated model training and retraining processes, as well as continuous integration and continuous delivery strategies for delivering and upgrading Machine Learning pipelines.
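To make the automation concrete, here is a minimal champion/challenger sketch of the kind of retraining-and-promotion gate such CI/CD pipelines schedule. It is illustrative only: the metric (AUC), the promotion threshold, and every function name are assumptions, not part of any particular MLOps product.

```python
# Minimal sketch of an automated retrain-and-promote step (illustrative only).
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

def retrain_and_maybe_promote(X, y, production_model, min_uplift=0.0):
    """Retrain on fresh data; promote only if the challenger beats production."""
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=0)
    challenger = GradientBoostingClassifier().fit(X_tr, y_tr)
    champion_auc = roc_auc_score(y_val, production_model.predict_proba(X_val)[:, 1])
    challenger_auc = roc_auc_score(y_val, challenger.predict_proba(X_val)[:, 1])
    # Continuous-delivery gate: deploy only on a measured improvement
    if challenger_auc > champion_auc + min_uplift:
        return challenger, challenger_auc   # caller registers/deploys this model
    return production_model, champion_auc
```

A scheduler would call such a step whenever fresh labelled data lands, which is the "automated model training and retraining" loop described above.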

MLOps practices and framework allow data engineers to design and build automated data pipelines, data ops platforms, and automated data feedback loops for model improvement, resolving more than 50 issues related to the lack of clean, regulated, governed, and monitored data needed to build production-ready models.

Way forward: The future of MLOps
According to our research, many organizations are keen to have centralized ML operations in the future, as opposed to the current decentralized approach. The benefit of this type of centralized learning is that the model can generalize based on data from a group of devices and thus work with other compatible devices immediately. Centralized learning also implies that data can explain all the differences between devices and their environments.

MLOps, while still uncharted territory for many, is quickly becoming a necessity for businesses across industries, with the hope that it makes the business more dependable, scalable, and efficient. If the benefits of AI are to be realized, the models that increasingly drive business decisions must follow suit. For years, software has been optimized through DevOps in the way it is built, run, and maintained, and it is now time to do the same for Machine Learning. It is critical to make AI work at scale with MLOps.


See the rest here:
Leveraging machine learning processes to revolutionize the adoption of AI - Express Computer

Read More..

Is logistic regression the COBOL of machine learning? – Analytics India Magazine

Joseph Berkson developed logistic regression as a general statistical model in 1944. Today, logistic regression is one of the main pillars of machine learning. From predicting Trauma and Injury Severity Score (TRISS) to sentiment analysis of movie reviews, logistic regression has umpteen applications in ML.

In a recent tweet, Bojan Tunguz, senior software engineer at NVIDIA, compared logistic regression to COBOL, a high-level programming language used for business applications.

"It would be great if we could replace all of the logistic regressions with more advanced algos, but realistically we will never completely get rid of them," said Bojan.

COBOL first gained currency in 1970 when it became the de facto programming language for business applications in mainframe computers around the world.

COBOL's relevance is chalked up to its simplicity, ease of use and portability.

Logistic regression is a simple classification algorithm used to model the probability of a discrete outcome from a given set of input variables. LR is used in supervised learning for binary classification problems.
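As a concrete illustration of that definition (not drawn from the article itself), here is a minimal scikit-learn sketch of logistic regression on a synthetic binary classification task:

```python
# Logistic regression on synthetic data: model the probability of a binary outcome.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))                                  # four input variables
y = (X @ np.array([1.5, -2.0, 0.5, 0.0]) + rng.normal(size=500) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = LogisticRegression().fit(X_train, y_train)

probabilities = model.predict_proba(X_test)[:, 1]              # P(outcome = 1)
print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```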

Pierre François Verhulst published the first logistic function in 1838. Logistic regression was used in the biological sciences in the early twentieth century.


After Verhulst's initial discovery of the logistic function, the most notable developments were the probit model, developed by Chester Ittner Bliss in 1934, and maximum likelihood estimation by Ronald Fisher in 1935.

In 1943, Wilson and Worcester used the logistic model in bioassay, the first known application of its kind. In 1973, Daniel McFadden connected the multinomial logit to the theory of discrete choice, specifically Luce's choice axiom. This gave a theoretical foundation for logistic regression and earned McFadden a Nobel Prize in 2000.

According to a global survey by Micro Focus, COBOL is viewed as strategic by 92% of respondents.


[Chart: The importance of different AI/ML topics in organisations worldwide (2019). Source: Statista]

Bojan Tunguz's tweet garnered responses both for and against.

While many said a simple solution that works should not be messed with, the opposite camp said complex algorithms like XGBoost provide better results.

Andreu Mora, an ML and data science expert at Adyen payments, said: "If a simple algorithm gets you a good performance, it might not be a wise move to increase operational work by 500% for a 5% performance uplift."

To this, Bojan replied: "Depends on the use case. If a 5% improvement in performance can save you $5B, then you totally should consider it."

Amr Malik, a research fellow at Fast.ai, said: "For this scenario to be true, you'd need to be supporting a $100 billion dollar business operation with LR-based models. That'd be a gutsy bet on a really big farm."


On LinkedIn, Damien Benveniste, an ML tech lead at Meta AI, said he never uses algorithms like logistic regression, Naive Bayes, SVM, LDA, KNN, Feed Forward Neural Network, etc. and relies only on XGBoost.

Read more here:
Is logistic regression the COBOL of machine learning? - Analytics India Magazine

Read More..

AI and machine learning are improving weather forecasts, but they won’t replace human experts – The Conversation Indonesia

A century ago, English mathematician Lewis Fry Richardson proposed a startling idea for that time: constructing a systematic process based on math for predicting the weather. In his 1922 book, Weather Prediction By Numerical Process, Richardson tried to write an equation that he could use to solve the dynamics of the atmosphere based on hand calculations.

It didn't work because not enough was known about the science of the atmosphere at that time. "Perhaps some day in the dim future it will be possible to advance the computations faster than the weather advances and at a cost less than the saving to mankind due to the information gained. But that is a dream," Richardson concluded.

A century later, modern weather forecasts are based on the kind of complex computations that Richardson imagined, and they've become more accurate than anything he envisioned. Especially in recent decades, steady progress in research, data and computing has enabled a quiet revolution of numerical weather prediction.

For example, a forecast of heavy rainfall two days in advance is now as good as a same-day forecast was in the mid-1990s. Errors in the predicted tracks of hurricanes have been cut in half in the last 30 years.

There still are major challenges. Thunderstorms that produce tornadoes, large hail or heavy rain remain difficult to predict. And then there's chaos, often described as the butterfly effect: the fact that small changes in complex processes make weather less predictable. Chaos limits our ability to make precise forecasts beyond about 10 days.

As in many other scientific fields, the proliferation of tools like artificial intelligence and machine learning holds great promise for weather prediction. We have seen some of what's possible in our research on applying machine learning to forecasts of high-impact weather. But we also believe that while these tools open up new possibilities for better forecasts, many parts of the job are handled more skillfully by experienced people.

Today, weather forecasters' primary tools are numerical weather prediction models. These models use observations of the current state of the atmosphere from sources such as weather stations, weather balloons and satellites, and solve equations that govern the motion of air.

These models are outstanding at predicting most weather systems, but the smaller a weather event is, the more difficult it is to predict. As an example, think of a thunderstorm that dumps heavy rain on one side of town and nothing on the other side. Furthermore, experienced forecasters are remarkably good at synthesizing the huge amounts of weather information they have to consider each day, but their memories and bandwidth are not infinite.

Artificial intelligence and machine learning can help with some of these challenges. Forecasters are using these tools in several ways now, including making predictions of high-impact weather that the models can't provide.

In a project that started in 2017 and was reported in a 2021 paper, we focused on heavy rainfall. Of course, part of the problem is defining heavy: Two inches of rain in New Orleans may mean something very different than in Phoenix. We accounted for this by using observations of unusually large rain accumulations for each location across the country, along with a history of forecasts from a numerical weather prediction model.

We plugged that information into a machine learning method known as random forests, which uses many decision trees to split a mass of data and predict the likelihood of different outcomes. The result is a tool that forecasts the probability that rains heavy enough to generate flash flooding will occur.
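Since the article names the method, a minimal sketch of that random-forest pattern follows; the predictor variables and synthetic data are invented for illustration and are not the authors' dataset:

```python
# Random forest outputting a probability of flash-flood-scale rain (illustrative).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)
n = 2000
X = np.column_stack([
    rng.gamma(2.0, 10.0, n),    # hypothetical forecast precipitation (mm)
    rng.normal(50, 20, n),      # hypothetical relative humidity (%)
    rng.normal(25, 8, n),       # hypothetical precipitable water (mm)
])
# Label: 1 where observed rainfall was unusually large for the location
y = (X[:, 0] + rng.normal(0, 10, n) > 35).astype(int)

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
# Many decision trees vote; the vote share becomes a probability of heavy rain
p_heavy_rain = forest.predict_proba(X[:5])[:, 1]
print(p_heavy_rain)
```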

We have since applied similar methods to forecasting of tornadoes, large hail and severe thunderstorm winds. Other research groups are developing similar tools. National Weather Service forecasters are using some of these tools to better assess the likelihood of hazardous weather on a given day.

Researchers also are embedding machine learning within numerical weather prediction models to speed up tasks that can be intensive to compute, such as predicting how water vapor gets converted to rain, snow or hail.

It's possible that machine learning models could eventually replace traditional numerical weather prediction models altogether. Instead of solving a set of complex physical equations as the models do, these systems would instead process thousands of past weather maps to learn how weather systems tend to behave. Then, using current weather data, they would make weather predictions based on what they've learned from the past.

Some studies have shown that machine learning-based forecast systems can predict general weather patterns as well as numerical weather prediction models while using only a fraction of the computing power the models require. These new tools don't yet forecast the details of local weather that people care about, but with many researchers carefully testing them and inventing new methods, there is promise for the future.

There are also reasons for caution. Unlike numerical weather prediction models, forecast systems that use machine learning are not constrained by the physical laws that govern the atmosphere. So it's possible that they could produce unrealistic results, for example forecasting temperature extremes beyond the bounds of nature. And it is unclear how they will perform during highly unusual or unprecedented weather phenomena.

And relying on AI tools can raise ethical concerns. For instance, locations with relatively few weather observations with which to train a machine learning system may not benefit from forecast improvements that are seen in other areas.

Another central question is how best to incorporate these new advances into forecasting. Finding the right balance between automated tools and the knowledge of expert human forecasters has long been a challenge in meteorology. Rapid technological advances will only make it more complicated.

Ideally, AI and machine learning will allow human forecasters to do their jobs more efficiently, spending less time on generating routine forecasts and more on communicating the implications and impacts of forecasts to the public or, for private forecasters, to their clients. We believe that careful collaboration between scientists, forecasters and forecast users is the best way to achieve these goals and build trust in machine-generated weather forecasts.

See the original post here:
AI and machine learning are improving weather forecasts, but they won't replace human experts - The Conversation Indonesia

Read More..

CFPB Releases a Warning But No Helpful Guidance on Machine Learning Model Adverse Action Notices – Lexology

On May 26, the Consumer Financial Protection Bureau (CFPB or Bureau) announced that federal anti-discrimination law requires companies to explain to applicants the specific reasons for denying an application for credit or taking other adverse actions, even if the creditor is relying on credit models using complex algorithms.

In a corresponding Consumer Financial Protection Circular published the same day, the CFPB started with the question: "When creditors make credit decisions, do these creditors need to comply with the Equal Credit Opportunity Act's (ECOA) requirement to provide a statement of specific reasons to applicants against whom adverse action is taken?"

"Yes," the CFPB confirmed. Per the Bureau's analysis, both ECOA and Regulation B require creditors to provide statements of specific reasons to applicants when adverse action is taken. The CFPB is especially concerned with something called "black-box" models: decisions based on outputs from complex algorithms that may make it difficult to accurately identify the specific reasons for denying credit or taking other adverse actions.

This most recent circular asserts that federal consumer financial protection laws and adverse action requirements should be enforced, regardless of the technology used by creditors, and that creditors cannot justify noncompliance with ECOA based on the mere fact that the technology they use to evaluate credit applications is too complicated, too opaque in its decision-making, or too new.

The Bureau's statements are hardly novel. Regulation B requires adverse action notices and does not have an exception for machine learning models, or any other kind of underwriting decision-making for that matter. It's difficult to understand why the Bureau thought it was necessary to restate such a basic principle, but what is even more difficult to understand is why the Bureau has not provided any guidance on the appropriate method for deriving adverse action reasons from machine learning models.

The official commentary to Regulation B provides specific adverse action logic applicable to logistic regression models, but the Bureau noted in a July 2020 blog post that there was uncertainty about the most appropriate method to do so with a machine learning model. That same blog post even stated that the Bureau would consider resolving this uncertainty by amending Regulation B or its official commentary. A few months later, the Bureau hosted a Tech Sprint on adverse action notices during which methods for deriving adverse action reasons from machine learning models were specifically presented to the Bureau. Now, a year and a half later, the Bureau has still declined to provide any such guidance, and the May 26 announcement simply emphasizes and perpetuates the same uncertainty that the Bureau itself recognized in 2020, without offering any guidance or solution whatsoever. It is disappointing, to say the least.
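For orientation only, here is a hedged sketch of one approach commonly discussed for score-based models: rank each applicant's per-feature contributions against a reference point and report the largest negative ones as reasons, in the spirit of the Regulation B commentary's logic for regression-based scoring. The function, features, and reference point are hypothetical; nothing here is Bureau-endorsed methodology for complex ML models, and that missing guidance is exactly this article's complaint.

```python
# Hypothetical reason-code derivation for a linear scoring model (illustration).
import numpy as np

def adverse_action_reasons(coefs, x_applicant, x_reference, feature_names, top_k=4):
    """Return features that pulled the applicant's score down the most."""
    contributions = coefs * (x_applicant - x_reference)   # per-feature score delta
    order = np.argsort(contributions)                     # most negative first
    return [feature_names[i] for i in order[:top_k] if contributions[i] < 0]

coefs = np.array([0.8, -1.2, 0.5])                 # hypothetical model weights
x_applicant = np.array([0.2, 0.9, 0.1])
x_reference = np.array([0.6, 0.3, 0.4])            # e.g., average approved applicant
print(adverse_action_reasons(coefs, x_applicant, x_reference,
                             ["income", "utilization", "tenure"]))
```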

View original post here:
CFPB Releases a Warning But No Helpful Guidance on Machine Learning Model Adverse Action Notices - Lexology

Read More..

Incorporation of machine learning and deep neural network approaches into a remote sensing-integrated crop model for the simulation of rice growth |…

Study locations and rice data

The ML and DNN models were developed for the rice growing areas in the entire geographic regions of Cheorwon and Paju in South Korea (Fig. 4). Then, the parameterised ML and DNN models were evaluated for the representative rice growing areas of Gimje, South Korea and Pyeongyang, North Korea. Cheorwon and Paju were selected as these areas are typical rice cultivation regions in the central portion of the Korean peninsula. The paddy rice cultivation regions in Cheorwon and Paju have areas of 10,169 and 6,625 ha, respectively, representing 80.4% and 62.6% of the total staple croplands for each region, according to the Korean Statistical Information Service, KOSIS (https://kosis.kr/).

Study location boundary maps of (a) Cheorwon, (b) Paju, (c) Gimje in South Korea and (d) Pyeongyang in North Korea.

The leading rice cultivar in Cheorwon and Paju was Odae (bred by NICS in 1983), cultivated in more than 80% of the paddy fields during the study period, according to KOSIS. Rice seedlings were transplanted in these areas between May 15 and 20, deemed the ideal transplanting period.

We used the temporal profiles of NDVI from the Terra MODIS MOD09A1 surface reflectance 8-day product with a spatial resolution of 500 m as an input variable for the ML and DNN models. This product is composited imagery built by selecting the best pixels over each eight-day window, considering cloud cover and solar zenith angle [33]. It is essential to secure reliable and continuous phenological NDVI data for determining crop yield in monsoon regions like the current study area, since these profiles serve as input variables for the process-based crop model. Therefore, the cloud-contaminated pixels, along with other poor-quality pixels caused by aerosol quantity or cloud shadow, were interpolated using the spline interpolation algorithm during the rice-growing season to improve data quality during the monsoon season. This approach has been widely used for interpolation in time-series satellite imagery [34,35,36]. The criteria for flagging poor-quality pixels for interpolation were determined from the 16-bit quality assurance (QA) flags of the MOD09A1 product [33].
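A minimal sketch of this gap-filling step, assuming the poor-quality composites have already been flagged from the MOD09A1 QA bits (the QA decoding itself is omitted, and the NDVI values are invented):

```python
# Spline gap-filling of flagged NDVI composites (values and flags are invented;
# decoding the MOD09A1 16-bit QA flags is omitted here).
import numpy as np
from scipy.interpolate import CubicSpline

doy = np.arange(121, 289, 8)          # 21 eight-day composites over the season
ndvi = np.array([0.30, 0.33, 0.38, 0.05, 0.52, 0.60, 0.67,
                 0.72, 0.08, 0.75, 0.74, 0.70, 0.64, 0.57,
                 0.49, 0.42, 0.36, 0.31, 0.27, 0.25, 0.23])
poor = np.array([False, False, False, True, False, False, False,
                 False, True, False, False, False, False, False,
                 False, False, False, False, False, False, False])

# Fit the spline through good observations only, then fill the flagged dates
spline = CubicSpline(doy[~poor], ndvi[~poor])
ndvi_filled = np.where(poor, spline(doy), ndvi)
```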

Furthermore, we estimated the incoming solar radiation on the surface (insolation) obtained from the COMS Meteorological Imager (MI). Insolation reflects the energy source of photosynthesis for the crop canopies. We adopted a physical model to estimate solar radiation by considering atmospheric effects such as aerosol, water vapour, ozone, and Rayleigh scattering [37,38,39,40,41]. Before estimating the solar radiation from the physical model, we classified clear and cloudy sky conditions because cloud effects should be considered for their high attenuation influences. If the pixel image was assigned as a clear sky condition, atmospheric parameterisations were performed for direct and diffuse irradiance owing to the effects of atmospheric constituents and solar-target-satellite sensor geometry [40,42,43,44]. If the pixel images were considered as under cloudy conditions, the cloud attenuation was calculated using a cloud factor for visible reflectance and the solar zenith angle [42]. Finally, the estimated solar radiation from COMS MI was used as one of the main input parameters of the RSCM system. Comprehensive descriptions of those parameters used for the physical model can be referenced from earlier studies [41,43].

The maximum and minimum air temperature data were obtained from the Regional Data Assimilation and Prediction System (RDAPS) provided by the Korea Meteorological Administration (KMA, https://www.kma.go.kr). The spatial resolution of the RDAPS is 12 km, and it is composed of 70 vertical levels up to about 80 km. The global data assimilation and prediction system is provided at 3-h intervals for the Asian regions, and forecasts are performed four times a day (00, 06, 12, and 18 UTC) for 87 h. In addition, the system is operated in a 6-h interval analysis-prediction-circulation system using the four-dimensional variational data assimilation [45]. The weather datasets were resampled to a spatial resolution of 500 m using the nearest neighbour method, which does not change the existing values, to match the MODIS imagery.

The current study employed the RSCM to incorporate an ML or DNN procedure and then simulate rice growth and yields (Supplementary Fig. S1). We integrated an ML or DNN regressor into the RSCM-rice system based on the investigation of the ML and DNN regressors described in the following subsection. The ML or DNN scheme was implemented to improve the mathematical regression approach for the relationship between RS-based VIs and LAI, as described below.

RSCM is a process-based crop model designed to integrate remotely sensed data, allowing crop modellers to simulate and monitor potential crop growth [6]. This model can accept RS data as input to perform its within-season calibration procedure [5], wherein the simulated LAI values are compared to the corresponding observed values. Four different parameters (that is, L0, a, b, and c) are utilised in the within-season procedure to define the crop-growth processes based on the optimisation of LAI using the POWELL procedure [46]. In addition, these parameters can be calibrated using the Bayesian method to obtain acceptable values with a prior distribution that was selected based on the estimates from earlier studies [6,47]. The current research project applied consistent initial conditions and parameters to calibrate the RSCM-rice system.
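A hedged sketch of this within-season calibration loop: the four parameters and the Powell optimiser are as stated above, but the growth function below is a stand-in, since the paper does not reproduce RSCM's internal equations.

```python
# Within-season calibration sketch: fit (L0, a, b, c) to observed LAI with Powell.
# simulate_lai is a hypothetical stand-in for RSCM's growth equations.
import numpy as np
from scipy.optimize import minimize

def simulate_lai(params, days):
    L0, a, b, c = params
    # Placeholder dynamics: logistic canopy rise, then a late-season decline
    return L0 + a / (1.0 + np.exp(-b * (days - 60))) - c * np.maximum(days - 100, 0)

def lai_cost(params, days, lai_obs):
    return np.sum((simulate_lai(params, days) - lai_obs) ** 2)

days_obs = np.array([20, 40, 60, 80, 100, 120])
lai_obs = np.array([0.4, 1.2, 2.8, 4.1, 4.6, 3.9])   # NDVI-derived LAI observations

result = minimize(lai_cost, x0=[0.3, 4.0, 0.1, 0.02],
                  args=(days_obs, lai_obs), method="Powell")
L0, a, b, c = result.x                                # calibrated parameters
```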

The ML models investigated in this study were Polynomial regression, Ridge, Least Absolute Shrinkage and Selection Operator (LASSO), Support Vector Regression (SVR), RF, Extra Trees (ET), Gradient Boosting (GB), Histogram-based Gradient Boosting (HGB), Extreme Gradient Boosting (XGB), and Light Gradient Boosting machine regression (LightGB) regressors. These models are implemented in scikit-learn (https://scikit-learn.org/), while the DNN model (Supplementary Fig. S4) is implemented in Keras (https://keras.io/), which are achievable on Python (https://www.python.org/).

The Polynomial regression model overcomes the limitations of simple linear regression by estimating the relationship with an Nth-degree polynomial. The Ridge and Lasso additionally use the l2-norm and l1-norm, respectively, as constraints on the conventional model. These constraints let the models perform better than conventional linear regression, which finds weights and biases by the least-squares method alone, by reducing overfitting [48,49].

The SVR allows the definition of the amount of allowable error and finds a hyperplane in a higher-dimensional space to fit the data. The SVR is widely used for classification and numerical prediction and is less prone to overfitting and easier to use than neural networks. However, it takes a long time to build an optimisation model, and the results are difficult to interpret [50].

The RF is an ensemble model that trains multiple decision tree models and aggregates their results. It has good generalisation and performance, is easy to tune, and is less prone to overfitting. On the other hand, memory consumption is higher than in other ML models. Also, a large performance improvement cannot be expected even when the amount of training data increases. Extra Trees increases randomness by randomly splitting each candidate feature in the tree, which can reduce bias and variance [51]. The difference from the RF is that ET does not use bootstrap sampling but uses the whole original dataset when building decision trees. The GB belongs to the boosting family of ensemble models, which combines weak learners to create strong learners with increased performance. Meanwhile, the GB training process is slow, and the model is prone to overfitting. The HGB, XGB, and LightGB are variants of the GB that improve performance by increasing training speed and reducing overfitting. The HGB speeds up the algorithm by grouping feature values into histogram bins, reducing the number of candidate splits in each decision tree. The XGB improves learning speed through parallel processing and is equipped with functions necessary to improve performance compared to the GB, such as regularisation, tree pruning, and early stopping. The LightGB significantly shortens training time and decreases memory use by using a histogram-based algorithm, without showing a significant difference in predictive performance compared to the XGB [52].

The DNN increases predictive power by adding hidden layers between the input and output layers. Non-linear combinations of input variables are possible, feature weighting is performed automatically, and performance tends to increase as the amount of data increases. However, since it is difficult to interpret the meaning of the weights, the results are also difficult to interpret. In addition, when fewer training data are available, the ML models mentioned above can perform better [53].
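For orientation, a sketch instantiating the candidate regressors named above, assuming the xgboost and lightgbm packages are installed; the Ridge and Lasso alphas follow the values reported below, while the remaining hyperparameters are library defaults rather than the study's tuned settings:

```python
# Candidate regressors from the list above (alphas per the study; other
# hyperparameters are library defaults). Requires xgboost and lightgbm.
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.svm import SVR
from sklearn.ensemble import (RandomForestRegressor, ExtraTreesRegressor,
                              GradientBoostingRegressor,
                              HistGradientBoostingRegressor)
from xgboost import XGBRegressor
from lightgbm import LGBMRegressor

models = {
    "Polynomial": make_pipeline(PolynomialFeatures(degree=2), LinearRegression()),
    "Ridge": Ridge(alpha=0.1),
    "Lasso": Lasso(alpha=0.01),
    "SVR": SVR(),
    "RF": RandomForestRegressor(),
    "ET": ExtraTreesRegressor(),
    "GB": GradientBoostingRegressor(),
    "HGB": HistGradientBoostingRegressor(),
    "XGB": XGBRegressor(),
    "LightGB": LGBMRegressor(),
}
# Each candidate is then fit on the Cheorwon/Paju training split, for example:
# models["RF"].fit(X_train, y_train)
```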

This study used satellite-based solar radiation and model-based maximum and minimum temperatures to estimate LAI values during the rice-growing seasons at the study sites (Cheorwon, Paju, Gimje, and Pyeongyang) for seven years (2011–2017), employing the ML and DNN regressors. We reproduced rice LAI values from the MODIS-based NDVI values using the empirical relationship between LAI and NDVI (Supplementary Fig. S2). Cheorwon and Paju datasets were used for the ML and DNN model development, while Gimje and Pyeongyang datasets were employed for the model evaluation. The target LAI data used for the model development showed characteristic seasonal and geographical variations (Supplementary Figs. S3 and S4). The model development datasets were divided into training and test sets at a ratio of 0.8 to 0.2 using the scikit-learn procedure. All the ML and DNN regressors were trained and tested to obtain appropriate hyperparameters. Alpha values for the Ridge and Lasso were determined as 0.1 and 0.01, respectively, based on a grid search over a range of possible values (Supplementary Fig. S5). The activation function employed for the DNN model was the rectified linear unit (ReLU), implemented in six fully connected layers with gradually increasing and then decreasing units between 100 and 1,000 (Supplementary Fig. S6). The model was trained with a dropout rate of 0.17, the adam optimizer at a learning rate of 0.001, 1,000 epochs, and a batch size of 100. The DNN hyperparameters were determined with a grid search and trial and error, seeking minimal and steady losses for each study region (Supplementary Fig. S7).
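A Keras sketch consistent with that stated design. The exact unit schedule is one plausible reading of "gradually increasing and then decreasing units between 100 and 1,000" and is an assumption, not the paper's published architecture:

```python
# Six fully connected ReLU layers ramping 100 -> 1,000 -> 100, dropout 0.17,
# adam at learning rate 0.001, as described above (unit schedule assumed).
from tensorflow import keras
from tensorflow.keras import layers

def build_lai_dnn(n_features=3):        # e.g., insolation, Tmax, Tmin
    model = keras.Sequential()
    model.add(keras.Input(shape=(n_features,)))
    for units in (100, 400, 1000, 1000, 400, 100):
        model.add(layers.Dense(units, activation="relu"))
        model.add(layers.Dropout(0.17))
    model.add(layers.Dense(1))          # predicted LAI
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001), loss="mse")
    return model

model = build_lai_dnn()
# model.fit(X_train, y_train, epochs=1000, batch_size=100, validation_split=0.2)
```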

We analysed the performance of the ML (that is, RF) and DNN regimes using two statistical indices computed in Python (https://www.python.org), namely the RMSE and the ME [54]. The ME denotes the comparative scale of the residual variance of the simulated data relative to the variance of the observed data. Furthermore, the ME can assess the agreement between the experimental and simulated data, showing how well these data fit the 1:1 line in a scatter plot. The index value can vary from −∞ to 1. We employed a normalized ME for easier interpretation across the simulation approaches used in model evaluation. Thus, ME = 1, 0, and −∞ correspond to normalized ME = 1, 0.5, and 0, respectively. Therefore, the model is considered reliable if the normalized ME value is nearer to 1, whereas the simulated data are considered less dependable if the value is close to 0.
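A short sketch of the two indices, assuming the ME is the standard modelling-efficiency (Nash-Sutcliffe) form; the normalization nME = 1/(2 − ME) reproduces the stated mapping of ME = 1, 0, and −∞ to 1, 0.5, and 0:

```python
# RMSE and (normalized) modelling efficiency, assuming the Nash-Sutcliffe form.
import numpy as np

def rmse(obs, sim):
    return np.sqrt(np.mean((obs - sim) ** 2))

def modelling_efficiency(obs, sim):
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - np.mean(obs)) ** 2)

def normalized_me(obs, sim):
    # Maps ME = 1 -> 1, ME = 0 -> 0.5, ME -> -inf -> 0
    return 1.0 / (2.0 - modelling_efficiency(obs, sim))

obs = np.array([0.5, 1.4, 2.9, 4.0, 4.5, 3.8])
sim = np.array([0.4, 1.2, 2.8, 4.1, 4.6, 3.9])
print(rmse(obs, sim), normalized_me(obs, sim))
```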

Read more from the original source:
Incorporation of machine learning and deep neural network approaches into a remote sensing-integrated crop model for the simulation of rice growth |...

Read More..

Machine Learning Shows That More Reptile Species May Be at Risk of Extinction Than Previously Thought – SciTechDaily

Potamites montanicola, classified as Critically Endangered by the automated assessment method and as Data Deficient by the IUCN Red List of Threatened Species. Credit: Germán Chávez, Wikimedia Commons (CC-BY 3.0)

Machine learning tool estimates extinction risk for species previously unprioritized for conservation.

Species at risk of extinction are identified in the iconic Red List of Threatened Species, published by the International Union for Conservation of Nature (IUCN). A new study presents a novel machine learning tool for assessing extinction risk and then uses this tool to show that reptile species which are unlisted due to lack of assessment or data are more likely to be threatened than assessed species. The study, by Gabriel Henrique de Oliveira Caetano at Ben-Gurion University of the Negev, Israel, and colleagues, was published on May 26th in the journal PLOS Biology.

The IUCN's Red List of Threatened Species is the most comprehensive assessment of the extinction risk of species and informs conservation policy and practices around the world. However, the process for categorizing species is time-consuming, laborious, and subject to bias, depending heavily on manual curation by human experts. Therefore, many animal species have not been evaluated or lack sufficient data, creating gaps in protective measures.

To assess 4,369 reptile species that could not previously be prioritized for conservation, and to develop accurate methods for assessing the extinction risk of obscure species, the scientists created a machine learning computer model. The model assigned IUCN extinction risk categories to the 40% of the world's reptiles that lacked published assessments or were classified as DD (Data Deficient) at the time of the study. The researchers validated the model's accuracy by comparing its output to the Red List risk categorizations.
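The article does not name the algorithm or the predictors, so the following is only a hedged sketch of the general pattern it describes: train a classifier on species that already have Red List categories, then apply it to unassessed or Data Deficient species. Every feature, label, and setting below is invented for illustration.

```python
# Hypothetical classifier assigning IUCN risk categories from species traits.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
n = 1000
X_assessed = np.column_stack([
    np.log10(rng.lognormal(4, 2, n)),    # invented feature: range size
    rng.normal(0, 1, n),                 # invented feature: standardized body size
    rng.integers(0, 2, n),               # invented feature: island-endemic flag
])
y_assessed = rng.choice(["LC", "NT", "VU", "EN", "CR"], size=n)  # Red List labels

clf = RandomForestClassifier(n_estimators=300, random_state=0)
clf.fit(X_assessed, y_assessed)

# The fitted model is then applied to NE/DD species described by the same traits
X_unassessed = X_assessed[:10]
print(clf.predict(X_unassessed))
```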

The authors found that the number of threatened species is much higher than reflected in the IUCN Red List and that both unassessed (Not Evaluated or NE) and Data Deficient reptiles were more likely to be threatened than assessed species. Future studies are needed to better understand the specific factors underlying extinction risk in threatened reptile taxa, to obtain better data on obscure reptile taxa, and to create conservation plans that include newly identified, threatened species.

According to the authors, "Altogether, our models predict that the state of reptile conservation is far worse than currently estimated, and that immediate action is necessary to avoid the disappearance of reptile biodiversity. Regions and taxa we identified as likely to be more threatened should be given increased attention in new assessments and conservation planning. Lastly, the method we present here can be easily implemented to help bridge the assessment gap on other less known taxa."

Coauthor Shai Meiri adds, "Importantly, the additional reptile species identified as threatened by our models are not distributed randomly across the globe or the reptilian evolutionary tree. Our added information highlights that there are more reptile species in peril, especially in Australia, Madagascar, and the Amazon basin, all of which have a high diversity of reptiles and should be targeted for extra conservation efforts. Moreover, species-rich groups, such as geckos and elapids (cobras, mambas, coral snakes, and others), are probably more threatened than the Global Reptile Assessment currently highlights; these groups should also be the focus of more conservation attention."

Coauthor Uri Roll adds, "Our work could be very important in helping the global efforts to prioritize the conservation of species at risk, for example using the IUCN Red List mechanism. Our world is facing a biodiversity crisis and severe man-made changes to ecosystems and species, yet funds allocated for conservation are very limited. Consequently, it is key that we use these limited funds where they could provide the most benefits. Advanced tools, such as those we have employed here, together with accumulating data, could greatly cut the time and cost needed to assess extinction risk, and thus pave the way for more informed conservation decision-making."

Reference: "Automated assessment reveals that the extinction risk of reptiles is widely underestimated across space and phylogeny" by Gabriel Henrique de Oliveira Caetano, David G. Chapple, Richard Grenyer, Tal Raz, Jonathan Rosenblatt, Reid Tingley, Monika Böhm, Shai Meiri and Uri Roll, 26 May 2022, PLOS Biology. DOI: 10.1371/journal.pbio.3001544

Here is the original post:
Machine Learning Shows That More Reptile Species May Be at Risk of Extinction Than Previously Thought - SciTechDaily

Read More..