How thousands of servers leverage hybrid IT at Times Internet – ETCIO.com

With more than 550 million monthly active users, 38 brands (TOI, ET, CricBuzz, Gaana, MX Player, Magicbricks, Dineout, and others) under its name, and 5,500 employees, Times Internet Limited (TIL) is India's largest digital products company.

Spread across multiple geographies, the Times Internet datacentre and infrastructure stack is one of the most complex in the market, and Sumit Malhotra, CIO of Times Internet, is taking a hybrid approach to manage and improve the company's infrastructure.

To manage workloads, self-healing hybrid cloud servers have been deployed, powered by 80,000 CPU cores, running over 12,000 instances that allow any of the group's businesses to scale up and down as needed. The company also works with other cloud partners such as AWS, Google Cloud and Azure for different services.

Using ML for managing workloads

For the data centres, a machine learning-backed monitoring system has been developed, which is capable of monitoring 10 lakh metrics per second.

Given the company's wide range of offerings and diverse set of customers, consistency and reliability are expected from the networking infrastructure. To keep performance up, a solution for identifying outages in real time has also been deployed.

Malhotra explained that even a simple check on a particular website is a long process, given the complexities involved.

"A website sitting on CDN (Content Delivery Network) is mapped to a load balancer and then to the application server. Each application server is itself connected to the different database and cache servers--it's a big mesh. Considering we have a lot of interdependencies across, we wanted to make sure that we wanted to identify the problem area very quickly", he said.

After pooling data from different sources (the monitoring system, the CDN and the logging engine), predictive analysis is carried out to find the probable cause of an outage.

According to Malhotra, this cuts the time needed to identify the reason for an outage from anywhere between 2 and 60 minutes to 10 seconds.
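The dependency chain Malhotra describes (CDN to load balancer to application server to database and cache servers) can be illustrated with a small graph walk. The sketch below is not TIL's system, which relies on predictive analysis over pooled data; it is a simple rule-based stand-in that assumes a hypothetical topology and a set of components already flagged as unhealthy, and reports the deepest failing components as probable causes.

```python
from collections import deque

# Hypothetical topology: each component maps to the components it depends on.
DEPENDS_ON = {
    "cdn": ["load_balancer"],
    "load_balancer": ["app_server_1", "app_server_2"],
    "app_server_1": ["database", "cache"],
    "app_server_2": ["database", "cache"],
    "database": [],
    "cache": [],
}


def probable_causes(entry_point, unhealthy, depends_on=DEPENDS_ON):
    """Return unhealthy components reachable from entry_point whose direct
    dependencies are all healthy, i.e. the deepest failures in the chain."""
    causes = []
    seen = {entry_point}
    queue = deque([entry_point])
    while queue:
        node = queue.popleft()
        children = depends_on.get(node, [])
        # A failing node with no failing dependency is a likely origin.
        if node in unhealthy and not any(c in unhealthy for c in children):
            causes.append(node)
        for child in children:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return causes
```

With this topology, `probable_causes("cdn", {"cdn", "load_balancer", "app_server_1", "app_server_2", "database"})` returns `["database"]`: every layer above the database is failing, but only the database has no failing dependency of its own.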

The company also has a centralised system which sends each business head a bill for IT infrastructure consumption and requirements. Through this, the various departments within the firm can take an analytical look at their utilisation of the different IT resources and how much they are spending on each, helping them optimise their costs.

Going forward, Malhotra wants to take this system to another level, where the respective business heads will also get insights saying, "You'll save X amount of money if you follow this recommendation."
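As a rough illustration of this kind of per-business chargeback, the sketch below aggregates usage records into a bill for each business. The record fields, resource names and unit rates are all hypothetical; the article does not describe how TIL's billing system is actually built.

```python
from collections import defaultdict

# Hypothetical unit rates (currency units per core-hour or GB-hour);
# these figures are illustrative and not from the article.
RATES = {"cpu_core_hours": 2.0, "storage_gb_hours": 0.5}


def build_bills(usage_records, rates=RATES):
    """Aggregate usage records into a per-business, per-resource cost table."""
    bills = defaultdict(lambda: defaultdict(float))
    for record in usage_records:
        cost = record["quantity"] * rates[record["resource"]]
        bills[record["business"]][record["resource"]] += cost
    # Convert to plain dicts and add a total for each business.
    return {
        business: {**costs, "total": sum(costs.values())}
        for business, costs in bills.items()
    }


usage = [
    {"business": "news", "resource": "cpu_core_hours", "quantity": 10},
    {"business": "news", "resource": "storage_gb_hours", "quantity": 4},
    {"business": "music", "resource": "cpu_core_hours", "quantity": 3},
]
bills = build_bills(usage)
```

Breaking the total down by resource is what lets a business head see where the spend goes, which is the starting point for any savings recommendation.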

"There are many [priorities] and security is on top of the list. This year a lot more of our investment will go into our SOC. Most of our data centres are scalable and we are moving away from transit links to peering links which will help us in reducing cost", said Malhotra.

"We are also focusing on optimisation which would enable us to get more out of our existing hardware", he added.

Chatbot for IT support

TIL started using a chatbot called Toby two years ago, and it has been extensively integrated with various systems. It began with a small function: the bot would notify the user (a TIL employee) when a visitor arrived and ask for his or her approval.

The chatbot was then extended to other tasks, such as requesting an IT asset, a cab or other services.

"The idea is to give people a window where they can get a lot more done in a lesser amount of time so that the manual requests get really low. More than 80% of the workflow today is done through the chatbot", said Malhotra.

Malhotra's team created bots and integrated them with different systems such as ticketing, a CMDB (Configuration Management Database) covering all assets, transport booking and digital asset management. "We created APIs which integrated with all these systems and integrated with G-Suite (used by all TIL employees)", he said.

Malhotra also plans to integrate Toby with the HR tool, so that an employee can ask the chatbot about things such as the number of leaves left, and much more.

"We are also going to integrate the chatbot with our SOC operations. For example, we get an antivirus alarm and we are finding an anomaly. So, the user will be notified in real-time, that a problem has been going on and the request has been attended. The idea is to take the feedback from different systems and reach out to people through a single window", says Malhotra.
