Challenges and New Frontiers of AI

By Som Pal Choudhury

The phenomenal impact that Artificial Intelligence (AI) is projected to have on our economy and our daily lives is nothing short of astounding. AI is predicted to contribute an estimated $15.7 trillion to the world economy by 2030. While its prominence has magnified its adoption and use-cases, criticisms abound: adoption resulting in job losses, unintended biases, privacy and surveillance concerns, and even the energy-hungry data centres that build the AI models. As with any new technology, whether it is abused or used safely and productively, with the right ethics and regulations, rests on us.

With significant adoption underway in all facets of life and business, the challenges around training AI with unbiased data, data scarcity, trust, explainability and privacy have become the top barriers to broader adoption. Researchers and thought leaders worldwide are trying to solve them, and several new frontiers are emerging and being explored. We took a deeper dive to understand these challenges and summarise our learnings here.

Artificial Intelligence research has picked up significantly in India, and our review of patents and research shows a solid research base here in edge AI and federated learning. Large tech giants have released edge AI frameworks orthogonal to the well-entrenched cloud-based AI/ML offerings. Federated learning involves a central server that collates information from many edge-trained models to create a global model, without transferring local data for training. It enables a hyper-personalised approach and is time-efficient, cost-effective and supposedly privacy friendly, as user data is not sent to the cloud.
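
For readers who want to see the mechanics, below is a minimal sketch of federated averaging in Python using NumPy. The three simulated clients, the simple linear model and the size-weighted averaging are illustrative assumptions, not any particular framework's API; the point is that only model weights, never raw data, travel to the server.

```python
import numpy as np

def local_update(global_weights, client_data, client_labels, lr=0.1, epochs=5):
    """Train a simple linear model locally; raw data never leaves the client."""
    w = global_weights.copy()
    for _ in range(epochs):
        preds = client_data @ w
        grad = client_data.T @ (preds - client_labels) / len(client_labels)
        w -= lr * grad
    return w

def federated_average(client_weights, client_sizes):
    """Server aggregates the returned weights, weighted by each client's dataset size."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

# Hypothetical simulation: three edge devices, each with private local data.
rng = np.random.default_rng(0)
global_w = np.zeros(4)
clients = [(rng.normal(size=(50, 4)), rng.normal(size=50)) for _ in range(3)]

for communication_round in range(10):
    updates = [local_update(global_w, X, y) for X, y in clients]
    global_w = federated_average(updates, [len(y) for _, y in clients])
```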

AutoML has seen significant progress in freeing data scientists from repetitive and time-consuming tasks: cleaning data, experimenting with different models and hyper-parameters, and eventually fine-tuning them for the best results. AutoML typically uses reinforcement learning and recurrent neural network approaches, so that models and hyper-parameters start from an initial input or are auto-picked, and are then continuously and automatically refined based on results.

There is a wide variety of platforms in the market today, and we are at Gen 3 of AutoML evolution, with more verticalised, domain-specific platforms. Most platforms still only select the model and the hyper-parameters, which means that data scientists still need to do the bulk of the work in data preparation and cleaning, where the majority of time is often spent. More advanced platforms also automate cleaning, encoding and feature extraction, a must for building a good model quickly, but the approach is template-driven and may not always be a good fit.
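
As a rough illustration of the kind of search these platforms automate, here is a hedged sketch using scikit-learn's RandomizedSearchCV over two candidate models (assuming scikit-learn is installed); real AutoML systems layer learned search strategies and automated feature engineering on top of this simple loop.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

X, y = load_breast_cancer(return_X_y=True)

# Candidate models and hyper-parameter spaces; an AutoML system would
# pick and refine these automatically based on validation results.
candidates = [
    (RandomForestClassifier(random_state=0),
     {"n_estimators": [50, 100, 200], "max_depth": [3, 5, None]}),
    (LogisticRegression(max_iter=5000),
     {"C": [0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]}),
]

best_score, best_model = -1.0, None
for model, space in candidates:
    search = RandomizedSearchCV(model, space, n_iter=5, cv=3, random_state=0)
    search.fit(X, y)
    if search.best_score_ > best_score:
        best_score, best_model = search.best_score_, search.best_estimator_

print(f"Best model: {best_model.__class__.__name__}, CV accuracy: {best_score:.3f}")
```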

AI practitioners have always been plagued by a paucity of data, hence the effort to build acceptable models from reduced datasets, or simply the quest to find more data. Finding more data includes public annotated data (e.g. Google public datasets, AWS open data), data augmentation that runs transforms on available data, and transfer learning, where a similar but larger dataset is used to train the models. Rapid progress continues on the creation of artificial or synthetic data. The Synthetic Minority Over-sampling Technique (SMOTE) and several of its modifications are used in the classic case where minority-class data is sparse and hence oversampled. Generating completely new data through self-learning (AlphaGo played itself 4.9 million times) and simulation (recreating city traffic scenarios using gaming engines) are more recent approaches to creating synthetic data. Unfortunately, more data also amplifies the resource and time constraints of training, including the time and effort required to clean the data and remove noise, redundancies, outliers etc. The holy grail of AI training is Few-Shot Learning (FSL), that is, training with a much smaller dataset; it remains an area of active research, as highlighted in a recent survey paper.
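
As a concrete example of oversampling sparse minority data, below is a minimal SMOTE sketch using the open-source imbalanced-learn library (assumed installed); the imbalanced dataset here is synthetically generated purely for illustration.

```python
from collections import Counter

from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification

# Hypothetical imbalanced dataset: roughly 5% minority class.
X, y = make_classification(n_samples=2000, n_features=10,
                           weights=[0.95, 0.05], random_state=42)
print("Before SMOTE:", Counter(y))

# SMOTE synthesises new minority samples by interpolating between
# a minority point and its nearest minority-class neighbours.
X_res, y_res = SMOTE(random_state=42).fit_resample(X, y)
print("After SMOTE: ", Counter(y_res))
```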

A vast amount of open-source models, datasets, active collaboration and benchmarks continues to accelerate AI development. OpenAI's GPT-3 launch took NLP to another level, with 175 billion parameters trained on 570 gigabytes of text. Huawei recently trained a Chinese counterpart of GPT-3 on 1.1 terabytes of Chinese text. Alphabet subsidiary DeepMind's AlphaFold achieved the most significant breakthrough in biology, with 92.4 percent accuracy in the well-known protein structure and folding prediction competition. Cityscapes has built a large-scale dataset of diverse urban street scenes across 50 cities. Beyond image and language recognition, the next frontier of AI is intent understanding from video. While India rose from rank 23 to 5 in the 2021 AI Vibrancy Index, a lot still needs to be done in terms of collaboration, open source and India-specific datasets.

With the growing need to secure sensitive and private information, there is a call for machine learning algorithms to run on data that remains protected by encryption. Homomorphic encryption (HE) is now being leveraged to train models on data without decrypting it and risking data leaks. Intel is one of the players in this space, having collaborated with Microsoft to develop silicon for this purpose. As research and development in this field grows, HE methods will become more commonplace and advanced.
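
To make the idea concrete, here is a small illustrative sketch using the open-source python-paillier (phe) library, which supports additively homomorphic operations: a plaintext linear model is scored directly on encrypted features. The toy weights and features are assumptions for illustration; production HE-for-ML schemes are considerably more involved.

```python
from phe import paillier

# The data owner generates keys and encrypts its private features.
public_key, private_key = paillier.generate_paillier_keypair(n_length=2048)
features = [5.5, 1.2, -3.4]
encrypted_features = [public_key.encrypt(x) for x in features]

# The model owner scores a plaintext linear model directly on ciphertexts:
# Paillier supports ciphertext + ciphertext and ciphertext * plaintext scalar.
weights = [0.4, -1.3, 0.7]
bias = 0.05
encrypted_score = public_key.encrypt(bias)
for w, enc_x in zip(weights, encrypted_features):
    encrypted_score = encrypted_score + enc_x * w

# Only the data owner, holding the private key, can read the prediction.
print("Decrypted score:", private_key.decrypt(encrypted_score))
```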

Removing toxicity and biases is the aim of Ethical AI, or Responsible AI, but development is at a nascent stage. Google and Accenture have announced Responsible AI frameworks, the European Commission's white paper on AI focuses on trust, and the formation of a UN AI ethics committee is an excellent initiative.

The evolution of AI is happening at a breakneck pace, and 2021 will be no different.

The authors are Som Pal Choudhury, Partner, BIF, and Arjun Nair, Intern, BIF, and a junior at Brown University.
