Category Archives: Data Science

Hot topics and emerging trends in data science – Information Age

We gauged the perspectives of data science experts, asking them about the biggest emerging trends in the field

What does the near future of data science entail?

As one of the fastest-evolving areas of tech, data science has risen up the corporate agenda as fewer and fewer leaders base business decisions on guesswork. With added capabilities such as artificial intelligence (AI) and the edge complementing the work of data scientists, the field is becoming more accessible to employees, though this still requires training in data skills, for the most part. In this article, we explore some key emerging trends in data science, as identified by experts in the field.

Firstly, it's believed that the involvement of AI and machine learning (ML) will increase further, enabling more industries to become truly data-centric.

"As businesses start to see the benefits of artificial intelligence and machine learning enabled platforms, they will invest in these technologies further," said Douggie Melville-Clarke, head of data science at Duco.

In fact, the Duco State of Reconciliation report, which surveyed 300 heads of global reconciliation utilities (including chief operating officers, heads of financial control and heads of finance transformation), found that 42% of those surveyed will investigate the use of more machine learning in 2021 for the purposes of intelligent data automation.

Data science in insurance

Melville-Clarke went on to cite the insurance industry, often perceived as a sector that's had difficulty innovating due to high levels of regulation, as an example of future success when it comes to data science.

He explained: "The insurance industry, for example, has already embraced automation for processes such as underwriting and quote generation. But the more valuable use of artificial intelligence and machine learning is to increase your service and market share through uses like constrained customisation."

"Personalisation is one of the key ways that banks and insurance companies can differentiate themselves, but without machine learning this can be a lengthy and expensive process."

"Machine learning can help these industries tailor their products to meet the individual consumer's needs in a much more cost-effective way, bettering the customer experience and increasing customisation."


Along with rising use of AI and ML models, organisations have been combining AI with robotic process automation (RPA), to reduce operational costs through automating decision making. This trend, known as hyperautomation, is predicted to help companies to continue innovating fast in a post-COVID environment in the next few years.

"In many ways, this isn't a new concept: the key goal of enterprise investment in data science for the past decade has been to automate decision-making processes based on AI and ML," explained Rich Pugh, co-founder and chief data scientist at Mango Solutions, an Ascent company.

"What is new here is that hyperautomation is underpinned by an RPA-first approach that can turbocharge process automation and drive increased collaboration across analytic and IT functions."

"Business leaders need to focus on how to harness enterprise automation and continuous intelligence to elevate the customer experience, whether that is embedding intelligent thinking into the processes that will drive more informed decision making, such as deploying automation around pricing decisions to deliver a more efficient and personalised service, or leveraging richer real-time customer insights in conjunction with automation to execute highly relevant offers and new services at speed."

Embarking on the hyperautomation journey begins with achieving some realistic and measurable future outcomes. Specifically, this should include aiming for high-value processes, focusing on automation and change, and initiating a structure to gather the data that will enable future success.

Dan Sommer, senior director at Qlik, identified software-as-a-service (SaaS) and a self-service approach among users, along with a shift in advanced analytics, as a notable emerging trend in data science.

"To those in the industry, it's clear that SaaS will be everyone's new best friend, with a greater migration of databases and applications from on-premise to cloud environments," said Sommer.

"Cloud computing has helped many businesses, organisations, and schools to keep the lights on in virtual environments, and we're now going to see an enhanced focus on SaaS as hybrid operations look set to remain."

"In addition, we'll see self-service evolving to self-sufficiency when it comes to effectively using data and analytics. Empowering users to access data, insights and business logic earlier and more intuitively will enable the move from visualisation self-service to data self-sufficiency in the near future."

"Finally, advanced analytics need to look different. In uncertain times, we can no longer count on backward-looking data to build a comprehensive model of the future. Instead, we need to give particular focus to, rather than exclude, outliers, and this will define how we tackle threats going forward too."


With employees gradually becoming more comfortable with using data science tools to make decisions, aided by automation and machine intelligence, a concept that's materialised as a hot topic for the next stage of development is the data fabric.

Trevor Morgan, product manager at comforte AG, explained: "A data fabric is more of an architectural overlay on top of massive enterprise data ecosystems. The data fabric unifies disparate data sources and streams across many different topologies (both on-premise and in the cloud), and provides multiple ways of accessing and working with that data for organisational personnel, and with the larger fabric as a contextual backdrop."

"For large enterprises that are moving with hyper-agility while working with multiple or many Big Data environments, data fabric technology will provide the means to harness all this information and make it workable throughout the enterprise."

Another important trend to consider regarding the future of data science is the new career paths and jobs that are set to emerge in the coming years.

"According to the World Economic Forum (WEF)'s Future of Jobs Report 2020, 94% of UK employers plan to hire new permanent staff with skills relevant to new technologies and expect existing employees to pick up new skills on the job," said Anthony Tattersall, vice-president, enterprise, EMEA at Coursera.

"What's more, WEF's top emerging jobs in the UK (data scientists, AI and machine learning specialists, big data and the Internet of Things) all call for skills of this nature."

"We therefore envision that access to a variety of job-relevant credentials, including a path to entry-level digital jobs, will be key to reskilling at scale and accelerating economic recovery in the years ahead."

The Industrial Data Scientist

With regard to new roles set to emerge in data science, Adi Pendyala, senior director at Aspen Technology, predicts the emergence of the Industrial Data Scientist: "These scientists will be a new breed of tech-driven, data-empowered domain experts with access to more industrial data than ever before, as well as the accessible AI/ML and analytics tools needed to translate that information into actionable intelligence across the enterprise."

"Industrial data scientists will represent a new kind of crossroads between our traditional understanding of citizen data scientists and industrial domain experts: workers who possess the domain expertise of the latter but are increasingly shifting over to the data realm occupied by the former."


New tools

Many organisations are affected by a shortage of data scientists relative to demand, but Julien Alteirac, regional vice-president, UK&I at Snowflake, believes that new tools, powered by ML, could help to mitigate this skills gap in the near future.

"When it comes to analysing data, most organisations employ an abundance of data analysts and a limited number of data scientists, due in large part to the limited supply and high costs associated with data scientists," said Alteirac.

"Since analysts lack the data science expertise required to build ML models, data scientists have become a potential bottleneck for broadening the use of ML. However, new and improved ML tools which are more user-friendly are helping organisations realise the power of data science."

"Data analysts are empowered with access to powerful models without needing to manually build them. Specifically, automated machine learning (AutoML) and AI services via APIs are removing the need to manually prepare data and then build and train models. AutoML tools and AI services lower the barrier to entry for ML, so almost anyone will now be able to access and use data science without requiring an academic background."


Training the Next Generation of Indigenous Data Scientists – The New York Times

"Native DNA is so sought after that people are looking for proxy data, and one of the big proxy data is the microbiome," Mr. Yracheta said. "If you're a Native person, you have to consider all these variables if you want to protect your people and your culture."

In a presentation at the conference, Joslynn Lee, a member of the Navajo, Laguna Pueblo and Acoma Pueblo Nations and a biochemist at Fort Lewis College in Durango, Colo., spoke about her experience tracking the changes in microbial communities in rivers that experienced a mine wastewater spill in Silverton, Colo. Dr. Lee also offered practical tips on how to plan a microbiome analysis, from collecting a sample to processing it.

In a data-science career panel, Rebecca Pollet, a biochemist and a member of the Cherokee Nation, noted how many mainstream pharmaceutical drugs were developed based on the traditional knowledge and plant medicine of Native people. The anti-malarial drug quinine, for example, was developed from the bark of a species of Cinchona trees, which the Quechua people historically used as medicine. Dr. Pollet, who studies the effects of pharmaceutical drugs and traditional food on the gut microbiome, asked: "How do we honor that traditional knowledge and make up for what's been covered up?"

One participant, the Lakota elder Les Ducheneaux, added that he believed that medicine derived from traditional knowledge wrongly removed the prayers and rituals that would traditionally accompany the treatment, rendering the medicine less effective. "You constantly have to weigh the scientific part of medicine with the cultural and spiritual part of what you're doing," he said.

Over the course of the IndigiData conference, participants also discussed ways to take charge of their own data to serve their communities.

Mason Grimshaw, a data scientist and a board member of Indigenous in A.I., talked about his research with language data on the International Wakashan A.I. Consortium. The consortium, led by an engineer, Michael Running Wolf, is developing an automatic speech recognition A.I. for Wakashan languages, a family of endangered languages spoken among several First Nations communities. The researchers believe automatic speech recognition models can preserve fluency in Wakashan languages and revitalize their use by future generations.


How to empower the data scientist in the era of edge computing and AI – Information Age

Dan Warner, CEO and co-founder of LGN, discusses how data scientists can be empowered in the era of edge computing and AI

With data constantly evolving, the scientists managing this cannot succeed alone.

For a while now, the position of data scientist has been one of the most hyped roles in technology and, indeed, business. It's not hard to see why: as organisations wake up to the seemingly limitless potential in their data, they've realised they need people who can extract, analyse and interpret large amounts of data. The demand is such that there is ongoing talk of a data scientist shortage, particularly in more experienced, senior roles.

Yet for all this attention, how effective are those data scientists, and how empowered do they actually feel? It's a pertinent question, coming at a time when so much data is underutilised. Are businesses, knowing they need to make better use of their data, hiring data scientists without fully understanding how best to deploy the talent?

Perhaps a better way to look at it is to ask whether businesses know how to make better use of their data: are they hiring data scientists and expecting them to work miracles, or are they ensuring that not only do they have the right talent, but that they are feeding these teams with the right data?


Many might think that it's the job of the data scientist to find the right data, but they're wrong. Ultimately, data scientists can only work with what they're given, in the same way that a salesperson can only do so much with a poor product, or a Formula One driver can only achieve so much with an average car.

What, then, is the right data? Obviously, that varies from business to business, but fundamentally there are a number of principles that good data will follow, irrespective of organisational need. Firstly, it needs to be fresh: that means it needs to reflect the real world as it is at that moment. Everything changes so fast that a lot of data quickly becomes irrelevant. The more it stagnates, the less value it has.

So, if a data scientist is working on old data when there is more recent information available, the insights they can extract are going to be less relevant to the environment the business is operating in.

Secondly, it needs to be live data, so it needs to be from the real world, not training data, and not made up. Why? Because the real world is messy, throwing up anomalies that no one would ever have thought of, creating obstacles that models, and indeed data scientists, brought up on sanitised training data won't be able to process.

Put another way: if an organisation feeds its data scientists and their models stale, offline data, then the best that enterprise can hope for is irrelevant, limited insights.

That means businesses need to find a way of continually feeding their data scientists with live, evolutionary data, in real-time, from the real world. How do they do that? With edge computing.

Edge computing needs no introduction: with the explosion in Internet of Things devices over the last few years, more and more data processing is happening at the edge of networks. Sensors on everything from wind turbines and tractors to fridges and streetlamps are capturing data constantly. It's real, it's live, it's messy, and it is exactly what data scientists need to be working on.

Businesses need to empower their data scientists by giving them training data and performance metrics from the edge. They can then use this to inform their AI models, which in turn are then deployed onto edge devices. These real-world environments give data scientists vital information on how their models stand up to anomalies and variations that can't be recreated in labs or test environments. The models could well perform badly, at least initially. That's a good thing, as it gives data scientists something to dig into, to understand what's come up that they hadn't thought of.

That said, whether the models perform well or poorly, data needs to be accessed, cleaned, annotated and ultimately fed back into the model for training on a continual basis. It's a feedback loop that needs to keep running so that systems can improve and adapt. But it needs to be a smart extraction of data: no system can possibly manage all the data sensors are collecting, so having a way of identifying and getting the most important data back from the edge is critical.

On top of that, data scientists need to be able to redeploy sensors and machines to investigate, re-image and analyse the data sources that are confusing the AI models. Whichever way the data has been gathered, and however automated the process, at some point it was subject to human thinking, assumptions and presumptions. These may have been based on the data and evidence available at the time, but they may no longer be appropriate for capturing the data now needed. This is where being able to shift what data is being collected is vital for data scientists to remain effective, working on the most relevant information.


Ultimately, this all signals a shift away from the old paradigm of collecting big sets of training data, segmenting, training the model and seeing what happens, and towards a new paradigm: one of active learning, where AI models learn how to cope with the real world, and data scientists are empowered to work effectively. In doing so, they will be better equipped to gather the insights and intelligence needed to give their organisations a true competitive edge in increasingly crowded, data-driven marketplaces.


New High Profile Addition to Maritime Optima's Team, Set to Build the Company's Analysis/Data Science Unit – Hellenic Shipping News Worldwide

Sven Melsom Ziegler, formerly of Clarkson Platou, will head and build Maritime Optima's analysis/data science unit. Sven is a well-known and experienced shipping/offshore market analyst.

"I am looking forward to working with the team and the data in Maritime Optima," Mr Melsom Ziegler says. "I was introduced to the team and their data platform during the spring. It is rare to find a startup investing so much in maritime data quality and seeing such a competent team with so much passion and willingness to improve data quality continuously. Since the industry lacks clear definitions of most data, someone must start doing this thoroughly, because machines need clear definitions to create value for humans. Otherwise, the gains from digitalization and automation will be hard to obtain. Here we have a unique data platform that enables state-of-the-art shipping/offshore models and analysis."

Sven Melsom Ziegler started working with RS Platou, where he stayed for 21 years. Sven was born in Cape Town and raised in Larvik and Athens. After completing his education at Strathclyde University of Glasgow and CASS in London, he later settled down in Oslo. He comes from Forum Market Services, where he has been working with mainly oil market/bunker analysis and quantitative strategies.

"We are very proud to have Sven on board," says founder and CEO Kristin Omholt-Jensen. "We have kept a very low profile so far. During the Covid months, we have spent time keeping the team well, together and motivated, investing in and collecting data, defining data templates, testing and scaling. It is super important for us to develop the product together with our users, so since our R&D partners left their offices and went to their home offices while we needed active feedback from real users, we launched a freemium application last autumn. Since the launch, we have been growing very quickly. Today we have close to 9,000 active registered users, and we pick up AIS data every minute for more than 65,000 vessels from more than 600 satellites and terrestrial senders. We also know we manage to do cost-effective scaling. We have a very exciting roadmap and will develop new products and features continuously (based on feedback from the users), and it is perfect timing to have Sven as a part of our team."

Maritime Optima is set up to develop and distribute user-friendly and flexible maritime SaaS software across platforms, helping professionals in the maritime industry save time, work more efficiently, make better decisions and maybe have more fun. Software should be easy to understand and looked upon as a partner, not something you have to use and hate. We believe that colleagues should be working in teams, but you can also work in your one-man team if you want. We think professionals will dare to share more and more, but they will not want to share everything with everyone. So we have made it super-easy to share publicly or keep private, and we have therefore included a user log showing the user's and team's activity, so it is easy to find later.

"The maritime office software industry is young, and there are many startups. To change the way the industry works, its routines, and how things have been done for years will take time, but we are prepared and willing to show that we are here to stay. We continuously launch new features based on user feedback, invest time to improve the data quality and increase the number of users. The maritime software industry is still young and will be consolidated during the next few years, and we want to play an active role in that consolidation," says Omholt-Jensen.

Start building your own maritime office by registering a free account in Maritime Optima: http://www.app.maritimeoptima.com

Source: Maritime Optima


Don’t Forget the Human Factor in Autonomous Systems and AI Development – Datanami


It goes without saying that humans are the intended beneficiaries of the AI applications and autonomous systems that data scientists and developers are creating. But what's the best way to design these AI apps and autonomous systems to maximize human interaction and human benefit? That's a tougher question to answer. It's also the focus of human factors specialists, who are increasingly in demand.

Datanami recently caught up with one of these in-demand human factors specialists. Catherine Neubauer is a research psychologist at the Army Research Lab, and a lecturer at the University of Southern California's online Master of Science in Applied Psychology Program. Neubauer, who holds a Ph.D. in Psychology with an emphasis on Human Factors from the University of Cincinnati, has researched various aspects of the human factors equation, including assessing human performance and decision-making.

According to Neubauer, there are a slew of concerns where humans and the latest technology come together.

"AI and autonomous systems are really becoming very prevalent in our everyday interactions," she says. "We really need to focus on them because if we don't design them with the human user in mind, that interaction is not going to be easy or desirable."

As an applied psychologist working in this field, Neubauer understands the problem from multiple angles. On the one hand, she wants to understand how humans interact with autonomous systems and AI so that humans can be better trained to work with next-gen systems. On the other hand, her work also informs data scientists and developers on how they can build better, more human-centered systems.

There is considerable room for improvement on both sides of the equation. "I think we're getting there," she says, "but I think a lot more work is needed."

Tesla cars have a self-driving mode, but the carmaker warns users not to rely on it (Flystock/Shutterstock)

For instance, in the autonomous driving arena, where Neubauer has spent a considerable amount of time, people may feel that great progress is being made. After all, some new cars can essentially drive themselves, at least in some circumstances. But those "aha" experiences are not what they may appear to be, she says.

"There's this idea of 'Oh great, I have this self-driving car. It's a Tesla. I can just sit back and not pay attention [and] fall asleep.' That's not the case. We're not there yet," she tells Datanami in a recent interview. "There are limitations to this technology. In an ideal state, yes, it can drive around on its own. But the human should always be ready to take over control if they need to."

Similarly, advances in natural language processing (NLP) have supercharged the capabilities of personal assistants, which are able to understand and respond to ever-more-sophisticated questions and requests. But once again, the gains should not overshadow the fact that a lot more work is needed.

"I think we are doing a good job in the sense that we made very large gains in what we're capable of doing," she says. "But I still think that, you know, more work needs to be done to get it to where, you know, you can just easily interact with a personal assistant, that it's like a robot or something like that, with no mistakes, no errors. We're still seeing some kinks that need to be worked out."

Some of Neubauer's latest research involves the algorithmic detection of human emotion. Computer vision technology has made great strides not only in being able to recognize specific faces, but also to detect somebody's mood based on how their face appears. Knowing if a human is happy, sad, or angry can be very valuable, and governments around the world are investing in the technology as part of their defense initiatives.

But, again, the technology is not quite there yet, Neubauer says.

The best AI products are designed with humans in mind (Aurielaki/Shutterstock)

"While I think it's really great that we kind of have this classification system to read the emotion, you kind of have to take that with a grain of salt, because everyone expresses emotions differently," she says. "And some people might feel really happy, but they're just not outwardly expressive. Some people might feel really sad or depressed, but you might not see that expressed for whatever reasons."

Instead of just using the computer vision algorithm, Neubauer is investigating multi-modal forms of emotion detection. This is a promising area of research, she says. "I'm not going to focus specifically on a facial expression. I'm going to get other streams of data to give me more information about a human," she says.

So what should data scientists and autonomous systems developers do if they want to benefit from human factors research? Number one is know your users.

"I think that the best products or systems or technologies that we interact with have been designed with the human user in mind," Neubauer says. "First and foremost, you have to make sure that you're designing the systems for your users, to make them easy to use."

A rule of thumb with this sort of design thinking is to make the product so easy to use that it doesn't require a manual. This often requires limiting the ways in which a user can interact with an application or a system, and encouraging exploration. (There is a limit to this rule, of course: after all, Tesla tells users in the manual to always be ready to take over controls, but many people obviously ignore this.)

Neubauer's second piece of advice for data scientists and autonomous systems developers who want to incorporate human factors advances into their work, interestingly, concerns ethics.

"I like to think of myself as an ethical person, and I am always thinking of where my research and my work is going, and who's going to be using it," she says. "Just because we can do something with technology doesn't mean we should. So anytime we're implementing this technology, building new systems, we have to ask ourselves, is it actually helping society? And who is it helping?"

Catherine Neubauer, PhD, is a research psychologist at the Army Research Lab and a lecturer in human factors at USC

Not all humans are good at assessing risk. It's not necessarily a qualification that data scientists will look to build out, or to put on their resume. But in Neubauer's reading, risk assessment should be part of the creative process for those creating AI apps and autonomous systems, particularly when it comes to the risks that they are asking their users to take.

The risks of a bad outcome are significantly higher when AI and autonomous features are built into self-driving cars, autopilot systems in airplanes, and traffic control systems for trains, for example, than they are in developing a personal assistant or adding probabilistic features to a word processor program (Clippy, we're looking at you).

"If it's some sort of stressful, high-stakes scenario and I have an autonomous agent working with me and it [tells] me to go left when I should go right, because that's the data that it had trained its decision on, that's going to be a problem," Neubauer says. "On the other hand, maybe you're a surgeon going into surgery. You want to make sure your personal assistant is letting you know what your appointments are. So I think it depends on the scenario that you're in and how important it is to make sure that we have a very small, if not non-existent, percentage or likelihood that an error's going to occur."

It appears that we're at the beginning of a period of great progress in the field of AI and autonomous systems. There are many aspects of life and society that can benefit from a certain degree of data-driven decision making.

But in the long run, there are other intangible aspects to the human factors equation that should bear some scrutiny. Neubauer understands that AI and autonomous systems can reduce our cognitive workload and let us get more done. But she also wonders how the ever-growing use of this technology will impact human development.

"Sometimes I get concerned that we basically have these personal assistants in our phone reminding us to do everything," she says. "We don't have to memorize phone numbers anymore. What is actually going to happen to our cognitive system if we have GPS taking us everywhere? We don't have to actually develop a mental map of the cities we live in. Those kinds of basic skills worry me that they're not being used. And if they're not being used there, we're not going to be strong in those areas."



Humanities versus STEM: The forced dichotomy where no one wins – IT PRO

Last autumn, following a difficult six months of redundancies and furloughs that hit the arts and culture sector particularly hard, a governmental campaign encouraging people to retrain to work in the tech industry was met with heavy criticism from the public. The ad showed a young dancer named Fatima tying up her ballet shoes, likely in preparation for rehearsal, with the message "Fatima's next job could be in cyber. (she just doesn't know it yet)" overlaid on the left of the image. In the end, the campaign became so unpopular (not to mention thoroughly ridiculed) that it was ultimately scrapped and the government forced to apologise.

Although it was probably not the main intention by design, the ad became a symbol of the way the arts and culture sectors are seen as frivolous, held in less respect than the supposedly more responsible and important STEM subjects. Last year, arts and design, which could well have been the course studied by the person hired by the government to create the infamous campaign, was ranked ninth out of the ten most popular university subjects, according to QS World University Rankings by Subject. Computer science and information systems and engineering and technology topped the list, while humanities subjects such as history, languages, literature, and philosophy were nowhere to be seen. This shouldn't come as a surprise: Many young people considering studying Arts & Humanities are advised not to pursue this path, as it's stereotypically seen as a fast track to unemployment or redundancy, as seen during the pandemic. In fact, if you imagine a hypothetical usefulness spectrum dictated by the economy, humanities are going to be on the opposite end from subjects such as maths, computer science, and chemistry.

Long-time NASA engineer Peter Scott uses a metaphor of driving a car to illustrate the divide between those who are in the tech industry and those who aren't.

"The fact is that it's a matter of perspective, and this perspective came to me after driving my children everywhere. They have to sit in the back seat all the time, and when one of them gets old enough, and they get to ride in the front seat once in a while, it's like: 'Oh, you can see so much more here.'"

In a world that is rapidly progressing with new technologies, being outside of STEM is a bit like being driven around in a car while being forced to sit in the back seat.

"There are insiders and outsiders in every industry, but the tech industry is the one that's doing the most to reshape where we're going," he says. "For everyone else, it's like: 'We don't know where we're going. [The driver] seems to think you know where you're going, but you're not giving us a good enough picture of it.'"

After three decades at NASA's Jet Propulsion Laboratory (JPL), Scott decided to embark on a more human-facing career of public speaking, which he describes as scarier and less likely to happen than jumping out of a plane. However, he also notes that he managed to achieve both. Nowadays, he blends being a business coach and successful TEDx speaker with contracting for NASA.

"I'm balancing both of these worlds because I want and need to be able to see both sides at the same time. Engineers and scientists, we tend to get locked into a certain view of the world that's driven by the equations and the principles that we know and the natural laws that explain everything. And you can't say that it doesn't, because that's the principle of science," he says.

However, while science may be the vanguard of change, if taken in isolation it leaves out a whole perspective that's driven by poetry, emotion, and artistic values, notes Scott.

"Because they're not quantifiable and measurable, they get not so much disdained as just ignored by scientists and tech people. It doesn't get you where you need to go as a tech person, so you can't afford to spend time on that. So these worlds are like C. P. Snow's Two Cultures: they're growing, if anything, further apart," he warns.

The gap between the two separate cultures of humanities and STEM is especially visible in the evolving technology of artificial intelligence, which is becoming more present in everyday life in our phones, workplaces, and even supermarkets.

"Our journey with AI especially is one that requires a common understanding," says Scott. "We can't advance this technological agenda that upends everyone's life in a largely, and hopefully, positive way without understanding both sides, without finding the bridge between those two cultures."

Although AI is treated as inherently technological, last year's events have proven that it's also a major ethical issue. Nevertheless, despite University of Oxford physicist David Deutsch predicting in 2012 that philosophy will be the key that unlocks artificial intelligence, the New College of the Humanities (NCH) is so far the only university in the UK to offer a joint degree in philosophy and artificial intelligence. Dr Brian Ball, who is the head of NCH's philosophy faculty and an associate professor, tells IT Pro that the degree was launched after the school partnered with the Boston-based Northeastern University.

"We are quite proud of our MA in Philosophy and Artificial Intelligence, and some of our related degrees, such as our MSc AI with a Human Face, and our various bachelor's degrees with humanities majors and data science minors," he says. "They are prompted in part by our joining the global network of Northeastern University, where interdisciplinarity and the cultivation of human, data, and technological literacies are central to higher education, and partly by the intrinsic merits of studying these subjects together."


According to Ball, AI can benefit from at least two of these merits, with philosophy able to provide the technology with ethicality as well as explainability. This is particularly crucial at a time when facial recognition is increasingly under fire for being prone to unethical usage.

Artificial intelligence isn't the only field where philosophy might be useful, however. Over the last five years, Exasol chief data and analytics officer Peter Jackson has recruited plenty of data scientists, but not all of them came from a traditional IT or data science background. In fact, he says the best data scientist ever to be a part of one of his teams, who was creative, curious, and could turn insights into compelling arguments, didn't hold a degree in computing or data science, but in philosophy. Jackson says that, when recruiting, he doesn't only look at candidates' technical skills and their ability to understand data, but also their storytelling skills. According to him, this specific ability can be found in "somebody who's done English literature, who is very good at writing poems and stories, and building a coherent argument."

"I need them to be able to interpret the output of that piece of work, either to me, or the rest of the team, or to stakeholders," he tells IT Pro. "If they can only go so far, and they have to hand [it] over to somebody else to tell the story, you can get a disconnect. The person who's telling the story might not be able to answer some of the technical questions that may arise: 'What training set did you use?' or 'Where did that data come from?'. So I try to recruit data scientists who are able to at least tell the first part of the story of their work."

However, Jackson notes that finding people with both data science and storytelling skills is very hard.

"Sometimes, because of particular skills that you need from the data science point of view, you quite often compromise on that. And that's where you do need the professional storytellers, professional writers who can support it," he adds.

Asked about his thoughts on the split between STEM sciences and humanities, Jackson says that he doesn't see it as a dichotomy.

"I don't think there should be a divide. I think as a society, as an economy, we need smart, educated people and I think that is the priority."



Board approves UW System’s 2021-22 Annual Operating Budget (day 1 news summary) – University of Wisconsin System

MADISON – The University of Wisconsin System Board of Regents unanimously voted Thursday to approve a $6.564 billion annual operating budget for 2021-22.

Key takeaways of the annual budget presented by Sean Nelson, Vice President of Finance, include:

Regent Bob Atwell said he appreciated the UW System's ongoing efforts to keep costs down for students, but urged the Board to consider not just the cost to students, but also the cost of a UW education for students, as part of a larger discussion of meeting higher education needs in Wisconsin.

In response, President Thompson reiterated his call for a blue-ribbon commission to study the overall state of higher education in Wisconsin, including both the UW System and the Wisconsin Technical College System.

Several Regents and Chancellors spoke in support of a commission, emphasizing the need to proceed without delay and to include input from campuses.

System President Thompson told Regents it is vital that the University support the whole student as things return to pre-pandemic life, and that includes supporting students' academic, financial, emotional, and overall health. He said the UW System is expanding some of its traditional programs, like the Summer Bridge Program, to address growing needs.

In a promising sign, Thompson told the Board that new fall freshman applications for UW System universities are up by about 30% over each of the last two years. Moreover, applications by Wisconsin residents, first-generation students, and underrepresented minorities are also up.

"I am thrilled with these positive application numbers," he said. "It shows our strategies are working and we are setting the stage for success."

Over the past 15 months, UW System has worked to simplify the application process, including waiving application fees, creating a new EApp, allowing students to use a single application for multiple universities, and suspending the ACT requirement.

Thompson also updated Regents on recent progress with UW System's Administrative Transformation Program (ATP), the multi-year program to address Systemwide legacy process inefficiencies, risks, and gaps in functionality, and to build an administrative infrastructure for the future.

"I'm going to push this very hard in order for this university to be run like a modern institution," Thompson said.

He reported that he and Chancellor Blank recently approved an amended timeline for implementing ATP that would simultaneously benefit all campuses, rather than UW-Madison going first and others following. Thompson also noted that a 10-year contractual agreement between UW System and Workday, Inc., for cloud-based enterprise resource planning software had been approved by Regents in the Business & Finance Committee earlier in the day and that a new Request for Proposal on implementation services was issued last week.

Turning to budget matters, Thompson noted that Governor Evers had signed the state biennial budget just this morning. That budget provides $8.25 million in additional GPR, $628.7 million for building projects, a 2 percent pay plan increase in each year, and returns tuition setting authority to the Board of Regents by not extending the tuition freeze.

"I want to thank Governor Evers and the legislature for their leadership, especially for investing in critical improvements in our infrastructure and support for our employees," Thompson said.

Regent President Edmund Manydeeds III, in his first report to the Board as President, shared a few thoughts on his expected priorities.

As the UW System works to provide a pre-pandemic college experience for students this fall, he said it's also recognized there likely will be lingering effects of COVID, particularly in the areas of student and employee behavioral health.

"We are actively engaged in a behavioral health initiative to improve the wellbeing and academic success of our students across the UW System because we know that healthy students are more likely to stay in school, graduate, and lead productive and fulfilling lives," he said.

Manydeeds is also focused on improving the campus climate for underrepresented students and employees. He noted the first recipients of the new Wisconsin Regents Opportunity Scholarships, which recognize underrepresented and underserved students, were announced this week. Funded by UW System, 267 students will be awarded scholarships in the inaugural round, totaling $995,000.

"We have to do more than just talk about equity, diversity, and inclusion. We have to live it, and weave it through everything we do in the System," Manydeeds said. "It starts by doing just one thing a day to make that happen."

Manydeeds also told Regents he wants to appoint a special committee to review the Board's bylaws and governance issues. A key item for this committee to address is the UW System's allocation of state GPR dollars to campuses.

The Board welcomed a new colleague, Dr. Jill Underly, who has just begun her term as State Superintendent of Public Instruction. Underly has a deep background in public education, most recently serving six years as superintendent of the Pecatonica School District in southwestern Wisconsin.

President Thompson introduced Dr. Jim Henderson, recently named interim Chancellor at UW-Whitewater after Chancellor Dwight Watson announced his resignation for medical reasons. Henderson previously served as UW System Vice President for Academic and Student Affairs.

The Business & Finance Committee approved a contract between UW-Madison and the National Football League. Under the agreement, the National Football League will support the UW-Madison School of Medicine and Public Health's 4-year, multi-site, longitudinal, multi-discipline investigation involving the development and validation of new technologies toward muscle injury risk mitigation in collegiate football players, with a total budget of $3,999,974. The goal of the project is to incorporate advanced imaging and biomechanics to build digital models of athletes' risk for hamstring strain injury.

In other business, the Business & Finance Committee:

The Education Committee approved amendments to Regent Policy Document (RPD) 4-12, Academic Program Planning, Review, and Approval in the University of Wisconsin System, to incorporate provisions requiring institutions to review credit requirements of degree programs that require more than 130 credit hours to complete and to reduce the number of students who accumulate excess credits. The proposal recognizes and seeks to address the institutional systems and processes that may present a barrier to students completing their degrees in a timely manner without amassing excess credits.

In other business, the Education Committee:

Chief Compliance Officer Katie Ignatowski presented the Fiscal Year 2022 Compliance Plan, which was approved by the Audit Committee. Ignatowski noted that the Office of Compliance and Integrity (OCI) has worked with each UW System institution to identify individuals responsible for key compliance obligations, craft policies to codify standards for compliance, and develop tools and resources necessary to aid compliance efforts.

Ignatowski told Regents the OCI expects to finalize policies in 2022 related to youth protection and records management.

In other business, the Audit Committee:

The Capital Planning & Budget Committee approved UW-Eau Claire's request to lease student athletics, events, and recreation space within the Sonnentag Event and Recreation Complex. It also approved a new segregated fee that will be applied towards the lease of the facility.

The facility, which results from a unique community partnership between Mayo Clinic Health System Northwest, Blugold Real Estate, and the City of Eau Claire, will meet additional athletics and recreation needs for UW-Eau Claire and provide a collaborative use of space with Mayo Clinic, which contributes sports medicine, athletics and human performance training expertise, rehabilitation, medical imaging, and research conducted with the UW-Eau Claire Department of Kinesiology and other academic departments.

In other business, the Capital Planning & Budget Committee:

Presenting to the REDI Committee, three WiSys faculty innovators highlighted their efforts to build a culture of research, discovery, and product commercialization at UW System comprehensive universities. Their unique perspectives were followed by updates from several students who have been recognized systemwide for excellence in undergraduate research, product development and start-ups.

UW-Parkside Chancellor Debbie Ford, WiSys Advisory Board Chair, led a panel discussion with three other chancellors highlighting the growth of intellectual property development, industry partnerships, and entrepreneurial ecosystems at UW System campuses and in their surrounding communities.

The University of Wisconsin System Board of Regents will resume its meeting at 8:45 a.m., July 9, 2021, in Madison.


What is Multiple Regression? – Built In

Linear regression, while a useful tool, has significant limits. As its name implies, it can't easily match any data set that is non-linear. It can only be used to make predictions that fit within the range of the training data set. And, most importantly for our purposes, linear regression can only be fit to data sets with a single dependent variable and a single independent variable.

This is where multiple regression comes in. While it can't overcome all of linear regression's weaknesses, it's specifically designed to create regressions on models with a single dependent variable and multiple independent variables.

Multiple regression is an extension of linear regression models that allows predictions of systems with multiple independent variables.

To start, let's look at the general form of the equation for linear regression:

y = B * x + A

Here, y is the dependent variable, x is the independent variable, and A and B are coefficients dictating the equation. The difference between the equation for linear regression and the equation for multiple regression is that the equation for multiple regression must be able to handle several inputs, instead of only the single input of linear regression. To account for this change, the equation for multiple regression looks like this:

y = B_1 * x_1 + B_2 * x_2 + ... + B_n * x_n + A

In this equation, the subscripts denote the different independent variables. For example, x_1 is the value of the first independent variable, x_2 is the value of the second independent variable, and so on. It keeps going as we add more independent variables until we finally add the last independent variable, x_n, to the equation. (Note that this model allows you to have any number, n, of independent variables; more terms are added as needed.) The B coefficients employ the same subscripts, indicating they are the coefficients linked to each independent variable. A, as before, is simply a constant stating the value of the dependent variable, y, when all of the independent variables, the x's, are zero.

As an example, imagine that you're a traffic planner in your city and need to estimate the average commute time of drivers going from the east side of the city to the west. You don't know how long it takes on average, but you do know that it will depend on a number of factors like the distance driven, the number of stoplights on the route, and the number of other cars on the road. In that case you could create a linear multiple regression equation like the following:

y = B_1 * Distance + B_2 * Stoplights + B_3 * Cars + A

Here y is the average commute time, Distance is the distance between the starting and ending destinations, Stoplights is the number of stoplights on the route, Cars is the number of other cars on the road, and A is a constant representing other time consumers (e.g. putting on your seat belt, starting the car, maybe stopping at a coffee shop).

Now that you have your commute time prediction model, you need to fit your model to your training data set to minimize errors.

Similarly to how we minimize the sum of squared errors to find B in linear regression, we minimize the sum of squared errors to find all of the B terms in multiple regression. The difference here is that since there are multiple terms, and an unspecified number of terms until you create the model, there isn't a simple algebraic solution to find A and B. This means we need to use stochastic gradient descent. You can find a good description of stochastic gradient descent in Data Science from Scratch by Joel Grus, or use tools in the Python scikit-learn package. Fortunately, we can still present the equations needed to implement this solution before reading about the details.
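As a rough illustration of the scikit-learn route, the sketch below fits the commute-time model with SGDRegressor, scikit-learn's stochastic gradient descent regressor. The distances, stoplight counts, car counts and commute times are made-up illustrative values, not data from this article.

import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Each row is one observed commute: [distance, stoplights, cars on the road].
X = np.array([
    [5.0, 4, 120],
    [8.5, 7, 200],
    [3.2, 2, 80],
    [12.0, 10, 310],
    [6.8, 5, 150],
    [9.7, 8, 260],
])
y = np.array([18.0, 29.0, 11.0, 44.0, 23.0, 35.0])  # commute times in minutes

# Scaling the inputs first helps the gradient descent fit converge.
model = make_pipeline(StandardScaler(), SGDRegressor(max_iter=5000, tol=1e-6))
model.fit(X, y)

# Predicted commute time for a hypothetical new route.
print(model.predict([[6.0, 5, 150]]))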

The first step is summing the squared errors on each point. This takes the form:

Error_Point = (Actual - Prediction)^2

In this instance, Error is the error in the model when predicting a person's commute time, Actual is the actual value (or that person's actual commute time), and Prediction is the value predicted by the model (or that person's commute time predicted by the model). Actual - Prediction yields the error for a point, then squaring it yields the squared error for a point. Remember that squaring the error is important because some errors will be positive while others will be negative; if not squared, these errors will cancel each other out, making the total error of the model look far smaller than it really is.

To find the error in the model, the error from each point must be summed across the entire data set. This essentially means that you use the model to predict the commute time for each data point that you have, subtract that value from the actual commute time in the data point to find the error, square that error, then sum all of the squared errors together. In other words, the error of the model is:

Error_Model = sum((Actual_i - Prediction_i)^2)

Here i is an index iterating through all points in the data set.

Once the error function is determined, you need to put the model and error function through a stochastic gradient descent algorithm to minimize the error. The stochastic gradient descent algorithm will do this by iteratively adjusting the B terms (and A) in the equation.
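For readers who want to see the mechanics rather than lean on a library, here is a bare-bones sketch of stochastic gradient descent applied to the squared-error objective above. The learning rate and epoch count are arbitrary illustrative choices, not values from this article.

import numpy as np

def sgd_fit(X, y, lr=0.01, epochs=1000):
    # Fit y = A + X @ B by minimizing the squared error one point at a time.
    n_points, n_features = X.shape
    B = np.zeros(n_features)   # one coefficient per independent variable
    A = 0.0                    # the constant term
    for _ in range(epochs):
        for i in np.random.permutation(n_points):
            error = (A + X[i] @ B) - y[i]   # Prediction - Actual for one point
            B -= lr * 2 * error * X[i]      # gradient of the squared error w.r.t. B
            A -= lr * 2 * error             # gradient of the squared error w.r.t. A
    return B, A

# In practice the inputs should be scaled and the learning rate tuned, e.g.:
# B, A = sgd_fit((X - X.mean(axis=0)) / X.std(axis=0), y)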

Once you've fit the model to your training data, the next step is to ensure that the model fits your full data set well.

To make sure your model fits the data, use the same r^2 value that you use for linear regression. The r^2 value (also called the coefficient of determination) states the portion of variation in the data set that is predicted by the model. The value will range from 0 to 1, with 0 stating that the model has no ability to predict the result and 1 stating that the model predicts the result perfectly. You should expect the r^2 value of any model you create to be between those two values. If it isn't, retrace your steps, because you've made a mistake somewhere.

You can calculate the coefficient of determination for a model using the following equations:

r^2 = 1 - (Sum of squared errors) / (Total sum of squares)

(Total sum of squares) = sum((y_i - mean(y))^2)

(Sum of squared errors) = sum((Actual_i - Prediction_i)^2)
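As a quick illustration, the r^2 calculation above maps almost directly onto a few lines of NumPy; the actual and predicted commute times here are hypothetical.

import numpy as np

def r_squared(actual, predicted):
    sum_squared_errors = np.sum((actual - predicted) ** 2)
    total_sum_of_squares = np.sum((actual - np.mean(actual)) ** 2)
    return 1 - sum_squared_errors / total_sum_of_squares

actual = np.array([18.0, 29.0, 11.0, 44.0, 23.0, 35.0])     # observed commute times
predicted = np.array([17.2, 30.1, 12.4, 42.5, 24.0, 34.1])  # the model's predictions
print(r_squared(actual, predicted))  # sklearn.metrics.r2_score gives the same value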

Here's where testing the fit of a multiple regression model gets complicated. Adding more terms to the multiple regression inherently improves the fit. Additional terms give the model more flexibility and new coefficients that can be tweaked to create a better fit. Additional terms will always yield a better fit to the training data, whether the new term adds value to the model or not. Adding new variables which do not realistically have an impact on the dependent variable will yield a better fit to the training data, while creating an erroneous term in the model. An example of this would be adding a term describing the position of Saturn in the night sky to the driving time model. The regression equations will create a coefficient for that term, and it will cause the model to more closely fit the data set, but we all know that Saturn's location doesn't impact commute times. The Saturn location term will add noise to future predictions, leading to less accurate estimates of commute times even though it made the model more closely fit the training data set. This issue is referred to as overfitting the model.

Additional terms will always improve the model whether the new term adds significant value to the model or not.

This fact has important implications when developing multiple regression models. Yes, you could keep adding more terms to the equation until you either get a perfect match or run out of variables to add. But then you'd end up with a very large, complex model that's full of terms which aren't actually relevant to the case you're predicting.

One way to determine which parameters are most important is to calculate the standard error of each coefficient. The standard error states how confident the model is about each coefficient, with larger values indicating that the model is less sure of that parameter. We can intuit this even without seeing the underlying equations. If the error associated with a term is typically high, that implies the term is not having a very strong impact on matching the model to the data set.

Calculating the standard error is an involved statistical process that can't be succinctly described in a short article. Fortunately, there are Python packages available that will do it for you; the question has been asked and answered on StackOverflow at least once. Those tools should get you started.
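As one example of what those tools look like, assuming the statsmodels package is available, an ordinary least squares fit reports the standard error of every coefficient. The commute data below is invented for illustration.

import numpy as np
import statsmodels.api as sm

# Invented commute data: columns are distance, stoplights and cars.
X = np.array([[5.0, 3, 4],
              [8.5, 6, 7],
              [12.0, 9, 10],
              [3.2, 2, 3],
              [6.8, 4, 5],
              [9.4, 7, 8]])
y = np.array([14.0, 24.0, 36.0, 10.0, 18.0, 28.0])

X = sm.add_constant(X)        # adds the intercept term
results = sm.OLS(y, X).fit()  # ordinary least squares fit

print(results.params)  # the fitted coefficients
print(results.bse)     # the standard error of each coefficient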

After calculating the standard error of each coefficient, you can use the results to identify which coefficients are highest and which are lowest. Since high values indicate that a term adds less predictive value to the model, you know those terms are the least important to keep. At this point you can start choosing which terms to remove, reducing the number of terms in the equation without dramatically reducing the predictive power of the model.

Another method is to use a technique called regularization. Regularization works by adding a new term to the error calculation that is based on the number of terms in the multiple regression equation. More terms in the equation will inherently lead to a higher regularization error, while fewer terms inherently lead to a lower regularization error. Additionally, the penalty for adding terms in the regularization equation can be increased or decreased as desired. Increasing the penalty will also lead to a higher regularization error, while decreasing it will lead to a lower regularization error.

With a regularization term added to the error equation, minimizing the error means not just minimizing the error in the model but also minimizing the number of terms in the equation. This will inherently lead to a model with a worse fit to the training data, but will also inherently lead to a model with fewer terms in the equation. Higher penalty/term values in the regularization error create more pressure on the model to have fewer terms.
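The passage above describes the penalty in terms of the number of terms; a common practical stand-in is L1 (lasso) regularisation, which penalises coefficient magnitudes and drives unimportant coefficients to exactly zero. A minimal scikit-learn sketch, with synthetic data that includes a deliberately irrelevant "Saturn position" column, might look like this.

import numpy as np
from sklearn.linear_model import Lasso

# Invented features: distance, stoplights, cars, plus a deliberately
# irrelevant "Saturn position" column with no real effect on the target.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 4))
true_coeffs = np.array([1.5, 2.3, 0.4, 0.0])  # Saturn's true effect is zero
y = X @ true_coeffs + 5 + rng.normal(0, 0.5, size=50)

# alpha is the regularisation penalty: a larger alpha pushes more
# coefficients to exactly zero, trimming terms from the model.
for alpha in (0.01, 0.5, 2.0):
    model = Lasso(alpha=alpha).fit(X, y)
    print(alpha, np.round(model.coef_, 2))

Raising alpha plays the role of raising the penalty described above: the irrelevant column is typically the first to have its coefficient forced to zero, at the cost of a slightly worse fit to the training data.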

Read more from our experts: What Is Linear Regression? Explaining Concepts and Applications With TensorFlow 2.0

The model you've created is not just an equation with a bunch of numbers in it. Each of the coefficients you derived states the impact an independent variable has on the dependent variable, assuming all others are held equal. For instance, our commute time example says the average commute will take B_2 minutes longer for each stoplight in a person's commute path. If the model development process returns 2.32 for B_2, that means each stoplight in a person's path adds 2.32 minutes to the drive.

This is another reason it's important to keep the number of terms in the equation low. As we add more terms, it gets harder to keep track of the physical significance of (and justify the presence of) each term. Anybody counting on the commute time prediction model would accept a term for commute distance but would be less understanding of a term for the location of Saturn in the night sky.

Note that this model doesn't say anything about how parameters might affect each other. Looking at the equation, there's no way that it could: each coefficient is connected to only a single physical parameter. If you believe two terms are related, you could create a new term based on the combination of those two. For instance, the number of stoplights on the commute could be a function of the distance of the commute. A potential equation for that could be:

Stoplights = C_1 * Distance + D

In this case, C_1 and D are regression coefficients similar to B and A in the commute distance regression equation. This term for stoplights could then be substituted into the commute distance regression equation, enabling the model to capture this relationship.
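As a sketch of how that sub-relationship could be estimated, a degree-1 polynomial fit recovers C_1 and D from measured distances and stoplight counts. The data here is invented for illustration.

import numpy as np

# Invented data: commute distance (km) and the stoplight count on each route.
distance = np.array([3.2, 5.0, 6.8, 8.5, 12.0])
stoplights = np.array([2, 3, 4, 6, 9])

# A degree-1 polynomial fit returns the slope C_1 and intercept D of
# Stoplights = C_1 * Distance + D.
C_1, D = np.polyfit(distance, stoplights, 1)
print(C_1, D)

# The fitted relationship can then stand in for the stoplight count on a
# route where it has not been measured directly (7.5 km is illustrative).
print(C_1 * 7.5 + D)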

Another possible modification includes adding non-linear inputs. The multiple regression model itself is only capable of being linear, which is a limitation. You can however create non-linear terms in the model. For instance, say that one stoplight backing up can prevent traffic from passing through a prior stoplight. This could lead to an exponential impact from stoplights on the commute time. You could create a new term to capture this, and modify your commute distance algorithm accordingly. That would look something like:

Stoplights_Squared = Stoplights^2

y = B_1 * Distance + B_2 * Stoplights + B_3 * Cars + B_4 * Stoplights_Squared + C

These two equations combine to create a linear regression term for your non-linear Stoplights_Squared input.
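Here is a minimal sketch of that idea with invented data: the squared stoplight count is appended as an extra column, and the whole model is still solved as an ordinary linear least squares problem.

import numpy as np

# Invented commute data: distance, stoplights, cars and commute time.
distance = np.array([5.0, 8.5, 12.0, 3.2, 6.8, 9.4])
stoplights = np.array([3, 6, 9, 2, 4, 7])
cars = np.array([4, 7, 10, 3, 5, 8])
commute = np.array([14.0, 25.0, 36.0, 10.0, 18.0, 28.0])

# The regression itself stays linear; only the inputs are transformed.
stoplights_squared = stoplights ** 2
A = np.column_stack([distance, stoplights, cars, stoplights_squared,
                     np.ones_like(distance)])

# Least-squares solution for B_1, B_2, B_3, B_4 and the intercept C.
coeffs, *_ = np.linalg.lstsq(A, commute, rcond=None)
print(np.round(coeffs, 3))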

Multiple regression is an extension of linear regression that allows prediction of systems with multiple independent variables. We do this by adding more terms to the linear regression equation, with each term representing the impact of a different physical parameter. When used with care, multiple regression models can simultaneously describe the physical principles acting on a data set and provide a powerful tool to predict the impacts of changes in the system described by the data.

This article was originally published on Towards Data Science.

Read the original:

What is Multiple Regression? - Built In

Grab the Opportunity: Top AI and Data Science Jobs to Apply Today – Analytics Insight

AI and data science jobs are already seen as a rewarding career path for professionals

Artificial intelligence is a promising technology that has made significant changes in the 21st century. From self-driving cars and robotic assistants to automated disease diagnosis and drug discovery, the stronghold of artificial intelligence is no joke. Along with artificial intelligence, data science has also shifted the way we live and work. With the demand for data science and artificial intelligence spiralling, the job market is opening its doors to AI and data science jobs. The tech sphere has ensured that artificial intelligence jobs and data science jobs provide limitless opportunities for professionals to explore cutting-edge solutions. According to a Gartner report, artificial intelligence jobs rose to over 2.3 million in 2020. While competition in the industry is heating up, AI and data science jobs are already seen as a rewarding career path. Analytics Insight has listed the top AI and data science jobs that aspirants should apply for today.

Location: Bengaluru, Karnataka, India

About the company: IBM, also known as International Business Machines Corporation, is a leading American computer manufacturer. The company has developed a thoughtful, comprehensive approach to corporate citizenship that it believes aligns with IBM's values and maximises the impact it can make as a global enterprise.

Roles and responsibilities: As a data scientist at IBM, the candidate is expected to develop, maintain, and evaluate AI solutions. He/she will be involved in the design of data solutions using artificial intelligence-based technologies such as H2O and TensorFlow. They are responsible for designing and implementing algorithms, including loading data from disparate datasets and pre-processing it using Hive and Pig. The candidate should scope and deliver solutions, with the ability to design solutions independently based on high-level architecture. They should also maintain production systems such as Kafka, Hadoop, Cassandra, and Elasticsearch.

Qualifications

Apply here for the job.

Location: Bengaluru, Karnataka, India

About the company: Accenture is a global professional services company that provides a range of services and solutions in strategy, consulting, digital, technology, and operations. Combining deep experience and specialized skills across 40 industries and business functions, Accenture works at the intersection of business and technology to help clients improve performance and create sustainable value for stakeholders.

Roles and responsibilities: As a senior analyst (artificial intelligence innovation), the candidate will be aligned with Accenture's insights and intelligence vertical and will help generate insights by leveraging the latest artificial intelligence and analytics techniques to deliver value to clients. Generally, the artificial intelligence innovation team at Accenture is responsible for the creation, deployment, and management of project operations. In this role, the candidate will need to analyse and solve increasingly complex problems. He/she should frequently interact with peers at Accenture and with clients to manage development well.

Qualifications

Apply here for the job.

Location: Azcapotzalco, Mexico City, Mexico

About the company: AT&T is a US-based telecom company and the second largest provider of mobile services. AT&T operates as a carrier of both fixed and mobile networks in the US but offers telecoms services elsewhere. The company also provides pay-TV services through DirecTV.

Roles and responsibilities: The artificial intelligence engineer at AT&T is responsible for designing and implementing artificial intelligence and machine learning packages, including data pipelines, to process complex, large-scale datasets used for modelling, data mining, and research purposes. He/she is expected to design, develop, troubleshoot, debug, and modify software for AT&T services or for the management and monitoring of these service offerings. They should interact with systems engineers to realise the technical design and requirements of the service, including management, systems, and data aspects.

Qualifications

Apply here for the job.

Location: Bengaluru, Karnataka, India

About the company: LinkedIn is a social networking site designed to help people make business connections, share their experience and resumes, and find jobs. LinkedIn is free, but a subscription version called LinkedIn Premium offers additional features like online classes and seminars.

Roles and responsibilities: As a data engineer at LinkedIn, the candidate is expected to work with a team of high-performing analytics and data science professionals, and with cross-functional teams, to identify business opportunities and optimise product performance or go-to-market strategy. He/she should build data expertise and manage complex data systems for a product or a group of products. They should perform all the necessary data transformation tasks to serve products that empower data-driven decision-making. The candidate should establish efficient design and programming patterns for engineers as well as for non-technical partners.

Qualifications

Apply here for the job.

Location: Bengaluru, Karnataka, India

About the company: Google is an American search engine company founded in 1998. Having begun as an online search firm, the company now offers more than 50 internet services and products, including email, online document creation, and software for mobile phones and tablet computers.

Roles and responsibilities: As a data scientist at Google, the candidate will evaluate and improve Google's products. They will collaborate with a multi-disciplinary team of engineers and analysts on a wide range of problems, bringing analytical rigour and statistical methods to the challenges of measuring quality, improving consumer products, and understanding the behaviour of end-users, advertisers, and publishers. He/she should work with large, complex data sets and solve difficult, non-routine analysis problems by applying advanced methods. The candidate should conduct end-to-end analysis, including data gathering and requirements specification, processing, analysis, ongoing deliverables, and presentations.

Qualifications

Apply here for the job.

Location: Melbourne, Victoria, Australia

About the company: Tata Consultancy Services, also known as TCS, is a global leader in IT services, digital, and business solutions. The company partners with clients to simplify, strengthen, and transform their business.

Roles and responsibilities: As a data engineer, the candidate should design and build production data pipelines from ingestion to consumption within a big data architecture, using Java, Python, or Scala. He/she should design and implement data engineering, ingestion, and curation functions on AWS cloud using AWS native or custom programming.

Qualifications

Apply here for the job.

Read more:

Grab the Opportunity: Top AI and Data Science Jobs to Apply Today - Analytics Insight

Women in Data Science event will showcase the versatility of a career in data – Siliconrepublic.com

Communication, engagement and creativity are key to fostering the next generation of data scientists, said event organiser Aine Lenihan.

Aine Lenihan, also known as Data Damsel, will host a panel of world-leading women working in data science as part of a free virtual event she has organised.

The event, which coincides with International Women in Engineering Day on 23 June, aims to support women and girls entering the male-dominated field of data science.

All genders are welcome to attend the event, which is part of the Women in Data Science (WiDS) Worldwide conference series run by Stanford University in more than 150 locations worldwide.

In her role as WiDS's Irish ambassador, Lenihan said she curated the event's speakers carefully with the aim of highlighting to attendees the versatility of a data science career.

Unfortunately, the people shaping the data that shapes the world are a homogeneous bunch AINE LENIHAN

We have something for sports fans and entrepreneurs with the amazing Hélène Guillaume, straight from her experience on Dragons' Den. We will have the head of data science at Starling Bank, Harriet Rees, as well as TEDx speaker and clinical neuroscientist Rebecca Pope. We'll also have the managing director of Accenture Labs, the renowned Medb Corcoran, and Jennifer Cruise, who has just joined EY as director of analytics.

Having mentored and lectured in data and database management systems in Trinity College Dublin, as well as working in the industry for over 20 years, Lenihan is passionate about education and increasing the visibility of women role models for the next generation.

Real-world applications of data science, from cybersecurity to combating climate change, are shaping today's world and tomorrow's. Unfortunately, the people shaping the data that shapes the world are a homogeneous bunch, she said.

Somewhere upwards of 78pc to 85pc of data scientists are male. Bringing women into data science is critical for ensuring accurate and unbiased data is available for todays data-driven businesses.

Lenihan's own personal experience as a woman in data science also drives her push for greater gender diversity in her industry. She currently works as a senior analytics architect in IBM's Watson division and previously worked in various software and data roles at AIB.

A challenge I have found throughout my career has been missing the company of having other women on my team. That's a really important factor for me, she said.

Thankfully, there are now lots of active communities, including WiDS, bridging those gaps.

So, what advice would Lenihan give to people hoping to pursue a career in data science and the STEM industry in general?

She predicts the sector will be one of the fastest-growing through the rest of the decade, and that it will be subject to the continual evolution of trends in artificial intelligence. AI will also create completely new jobs we haven't even dreamed up yet, she said.

My advice is not gender-specific, but a call to those who do not consider STEM to be for them because they are creative rather than analytical. To those people, I want them to know that creativity is the secret sauce in STEM. Creativity and STEM are no longer chalk and cheese. There is a place for all skillsets and talents in STEM careers. Being creative rather than analytical does not rule out a career in STEM for anyone. In fact, the magic happens when both work together.

In her role mentoring second- and third-level students on IBM's Pathways to Technology (P-Tech) programme, Lenihan has learned to appreciate the broad expanse of data science and the multitude of backgrounds data scientists can have.

As a mentor with P-Tech, students share their career goals and dreams for the future with me, some of which they think are unrelated to AI. But it really gives me the perfect opportunity to demonstrate how AI will touch every industry. So you want to be a footballer? Well, AI is being used to create future superstars by boosting performance, minimising injury and predicting recovery time. Want to work as a makeup artist? Well, check out the handheld makeup printer, or the NASA-backed skincare micro-mist.

The dynamic Data Damsel, who was given the nickname by her former colleagues due to her relatively rare status as a young, female data expert, is confident about the industry's future.

There really is so much happening to pique the interest of future data damsels, but communication and engagement are key. There will no doubt come a time when I will be more data dame than damsel, but hopefully, if I continue my advocacy, there will be many generations of data damsels behind me.

Read more:

Women in Data Science event will showcase the versatility of a career in data - Siliconrepublic.com