Problem-Solving And Discussion With Experts Are The Best Methods For Studying A Subject: Sumanta Mukherjee, IBM – Analytics India Magazine

Chemical engineering and applied mathematics are a rare combination. Sumanta Mukherjee, a research scientist at IBM, has this unusually broad knowledge base. Sumanta is an experienced research scientist with a track record of accomplishments in the information technology and services industries.

In addition, Sumanta is a researcher with expertise in machine learning, data science, mathematical modelling, computational biology, bioinformatics, and algorithm design. Analytics India Magazine caught up with him to gain insights into his perspectives on some of these topics.

AIM: Given that your career did not begin in data science, you have certainly climbed the ladder well. What obstacles did you face when starting out in data science, and how did you overcome them?

Sumanta Mukherjee: I have a diverse career path. I started my career as a chemical engineer. I then pursued higher studies in computational science, followed by a PhD in applied mathematics.

After completing each degree, I worked in industry for a few years. I have worked as a process engineer, a software developer, and currently, a researcher.

After completing my PhD, I joined IBM Research, Bangalore. I am grateful for the great set of colleagues I have had at my workplace. IBM Research has a very diverse, open, and inclusive environment. Therefore, most of my learning came through interaction with experts in the field and while solving a targeted problem.

From my experience, the best way to learn a topic is by solving a problem, discussing it with people who have experience in that field, and making continuous attempts to improve your solution.

Data science is no different. One big benefit is free access to a large community and freely available resources. However, data science is expanding at a tremendous pace, which makes it a challenge to keep up. It demands continuous reading and keeping yourself updated on the latest trends.

A strong grasp of mathematics, statistics, and programming helps a lot. There are two important dimensions to data science,

Keeping up with both is difficult. So it is better to focus your attention on one specific dimension.

AIM: How significant is participation in hackathons and similar competitions when pursuing a career in data science?

Sumanta Mukherjee: It is very important, and the benefits are multi-faceted

There are also data science-specific competitions, like Kaggle. Anyone seriously pursuing a data science career should be a part of the Kaggle community.

AIM: As someone with a research background and considerable experience working with research laboratories, could you emphasise the importance of research and the areas where companies should focus their efforts in machine learning?

Sumanta Mukherjee: My answer to this question will be biased. My experience is restricted to the IBM research lab, composed of a very able set of individuals.

I think industries are doing very well in finding challenging questions for the research community.

Industrial research serves two purposes: one is to use data science and ML to support the current industry, and the other is to explore new questions. Most industries focus on the first purpose, where there is direct business value. The second purpose is more academic, but it may help improve the future of science and industry. Therefore, I hope industries in India increase their academic collaborations to achieve a balanced and sustainable future.

One specific challenge to the application of data science is ethical restriction. Data can reveal many insights which may violate ethics. Therefore, defining rules and regulations around the application of data science and an effort to build algorithms that respect ethical restrictions should be prioritised.

AIM: Your research and industry experience has focussed on applied mathematics and energy efficiency. When effective energy management is critical, how do you believe data scientists can help solve these problems in today's environment?

Sumanta Mukherjee: I did join the smart energy group at IBM Research, but I am currently part of the retail supply chain team.

Data science is a tool to understand and comprehend a large volume of data. Data is plentiful today. In any field, the volume of data is increasing exponentially. In this context, I will emphasise the two primary goals of data science:

(1) estimation and

(2) knowledge mining (eXplainable AI).

Estimation helps in taking a reactive approach to addressing a problem, while knowledge mining may help us adopt a proactive strategy to address a problem.

If we ask the right question, data science can help us in finding a comprehensive answer. Data science is a tool to help the progress of science and technology if used correctly.

AIM: Which machine learning/deep learning algorithm is your go-to and why?

Sumanta Mukherjee: Every algorithm has a different purpose. The selection of an algorithm depends on the problem. Often, we need to customise the input and output to cast the problem in a form appropriate for an algorithm. Sometimes we may need to tweak the algorithm to cater to the problem.

In the structured data domain, one algorithm stands out: XGBoost. There are many competing alternatives, but it is always my first choice for structured data regression and classification problems. The wide adoption of this algorithm in the applied machine learning community is due to its stability, scalability, and easy library interface. In addition, many explainability tools help in deriving insights from the trained model.

AIM: What suggestions would you provide to someone seeking their first data science position?

Sumanta Mukherjee:

AIM: The rate of advancement in this field, particularly in deep learning, is unmatched. What will be the next frontier for algorithms based on deep learning?

Sumanta Mukherjee: Deep learning is the current trend. What makes it beautiful is that the basic building block of a deep learning model is extremely simple, but when the blocks are put together as a system, it can do magic. The exponential growth in participation at the NeurIPS conference is a direct indicator of its growing popularity.

AIM: Many publicly available datasets can be used to enhance our machine learning abilities. What kind of projects should aspiring data scientists work on to improve their resumes for today's job market, in your opinion?

Sumanta Mukherjee:

AIM: Please share with us the names of role models for you, if any. How has their work inspired you?

Sumanta Mukherjee: Richard P. Feynman has been my role model since childhood. I have always admired his way of understanding and explaining concepts. How easily we can explain a concept to others shows how well we understand it. Only when we understand something well enough (not by jargon, but by its basic functions) can we improve the system or find flaws. Therefore, an in-depth understanding of the fundamentals of data science is essential.

AIM: Are there any research papers that you think every data scientist should read?

Sumanta Mukherjee: Research papers are very application-specific. There are tons of them, and it's hard to list them all. Articles by Geoffrey Hinton are a must-read for those who want to work in deep learning. I closely follow the work of Bernhard Schölkopf, Yoshua Bengio, and Michael Jordan.

A few textbooks for avid data scientists are listed below:

Machine Learning by Tom Mitchell

Pattern Classification by David Stork, Peter Hart, Richard Duda

Machine Learning: A Probabilistic Perspective by Kevin Murphy

Deep Learning by Aaron Courville, Ian Goodfellow, Yoshua Bengio

A Probabilistic Theory of Pattern Recognition by Luc Devroye, László Györfi, Gábor Lugosi

The Elements of Statistical Learning by Trevor Hastie, Robert Tibshirani, Jerome Friedman

Statistical Rethinking: A Bayesian Course with Examples in R and Stan by Richard McElreath

Elements of Information Theory by Joy Thomas, Thomas Cover

Information Theory, Inference and Learning Algorithms by David MacKay

Learning in Graphical Models, edited by Michael Jordan

Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems by Aurélien Géron
