Category Archives: Data Science
KnowBe4’s Vice President of Data Science and Engineering Paras … – PR Web
KnowBe4's vice president of data science and engineering, Paras Nigam, has been recognized with a 2023 3AI Zenith Award in the leaders in AI and analytics industry category.
Nigam is a seasoned entrepreneur with a keen interest in cybersecurity and data science. He serves as the vice president of data science and engineering at KnowBe4, overseeing the development of cutting-edge AI and data-driven products and aligning them with the organization's overarching AI adoption strategy. Additionally, he leads KnowBe4's SecurityCoach product teams across India. Nigam is dedicated to building a high-caliber AI team, with a particular focus on generative AI, and fostering a culture of innovation within KnowBe4. His was one of 8,400 nominations for the 2023 3AI Zenith Awards.
"I am humbled to be recognized as an inspiring leader within the AI and analytics industry," said Nigam. "I am grateful for my family's unwavering support and the invaluable guidance of my mentors. I also want to express my heartfelt gratitude to KnowBe4's CEO Stu Sjouwerman, all of the research and development leaders and my team who have entrusted me with the opportunity to drive the AI practice at KnowBe4. Let's continue to inspire and innovate together!"
For a full list of the 2023 3AI Zenith Award recipients, visit here. To learn more about KnowBe4 and view open positions, visit here.
About KnowBe4
KnowBe4, the provider of the world's largest security awareness training and simulated phishing platform, is used by more than 65,000 organizations around the globe. Founded by IT and data security specialist Stu Sjouwerman, KnowBe4 helps organizations address the human element of security by raising awareness about ransomware, CEO fraud and other social engineering tactics through a new-school approach to awareness training on security. The late Kevin Mitnick, who was an internationally recognized cybersecurity specialist and KnowBe4's Chief Hacking Officer, helped design the KnowBe4 training based on his well-documented social engineering tactics. Organizations rely on KnowBe4 to mobilize their end users as their last line of defense and trust the KnowBe4 platform to strengthen their security culture and reduce human risk.
Media Contact
Amanda Tarantino, KnowBe4, (727) 748-4221, [emailprotected], https://www.knowbe4.com/
SOURCE KnowBe4
Link:
KnowBe4's Vice President of Data Science and Engineering Paras ... - PR Web
Berkeley Space Center at NASA Ames to become innovation hub for … – UC Berkeley
The University of California, Berkeley, is teaming up with NASA's Ames Research Center and developer SKS Partners to create research space for companies interested in collaborating with UC Berkeley and NASA scientists and engineers to generate futuristic innovations in aviation, space exploration and how we live and work in space.
The Berkeley Space Center, announced today (Monday, Oct. 16), aims to accommodate up to 1.4 million square feet of research space on 36 acres of land at NASA Ames' Moffett Field in Mountain View, leased from NASA.
The new buildings, some of which could be ready for move-in as early as 2027, will house not only state-of-the-art research and development laboratories for companies and UC Berkeley researchers, but also classrooms for UC Berkeley students. These students will benefit from immersion in the Silicon Valley start-up culture and proximity to the nation's top aeronautical, space and AI scientists and engineers at Ames.
"We would like to create industry consortia to support research clusters focused around themes that are key to our objectives, in particular aviation of the future, resiliency in extreme environments, space bioprocess engineering, remote sensing and data science and computing," said Alexandre Bayen, a UC Berkeley professor of electrical engineering and computer sciences and associate provost for Moffett Field program development.
"We're hoping to create an ecosystem where Berkeley talent can collaborate with the private sector and co-locate their research and development teams, he added. And since we will be close to NASA talent and technology in the heart of Silicon Valley, we hope to leverage that to form future partnerships."
Ever since Naval Air Station Moffett Field was decommissioned in 1994 and NASA Ames acquired an additional 1,200 acres, NASA has been focused on developing those acres into a world-class research hub and start-up accelerator. Initiated in 2002, NASA Research Park now has some 25 companies on site, including Google's Bay View campus.
"We believe that the research and the capabilities of a major university like Berkeley could be a significant addition to the work being done at Ames," said NASA Ames Director Eugene Tu. "In a more specific way, we would like the potential of having proximity to more students at the undergraduate and graduate level. We would also like the possibility of developing potential partnerships with faculty in the future. The NASA mission is twofold: inspiring the next generation of explorers, and dissemination of our technologies and our research for public benefit. Collaboration between NASA and university researchers fits within that mission."
UC Berkeley hopes eventually to establish housing at Moffett Field to make working at the innovation center easier for students without a 47-mile commute each way. Bayen noted that Carnegie Mellon University already occupies a teaching building at Moffett Field. With the addition of UC Berkeley and the proximity of Stanford University, he expects the intensity of academic activities in the area, both instructional and research, to increase immensely.
"We have major facilities here at Ames the world's largest wind tunnel, NASA's only plasma wind tunnel to test entry systems and thermal protection systems, the agency's supercomputers and the university will likely build facilities here that that we might leverage as well. So, I look at that as a triad of students, faculty and facilities," Tu added. "Then the fourth piece, which is equally important: If the project is approved to move forward, the university will likely bring in partners, will bring in industry, will bring in startups, will bring in incubators that could be relevant to NASA's interest in advancing aeronautics, science and space exploration."
"What they're doing at NASA Ames is transformational, but in order to make it heroic, in order to make it even larger than what is now possible, they have to use the combined resources of the number one public university in the world, private industry and the most innovative place on the planet, which is Silicon Valley," said Darek DeFreece, the projects founder and executive director at UC Berkeley.
Bayen emphasized that many academic institutions are now becoming global universities: New York University has demonstrated the ability to operate independent campuses on different continents (the Middle East and Asia), while Cornell has successfully opened a second campus in Manhattan, five hours from Ithaca. In the same vein, UC Berkeley is innovating by launching this research hub that, over the decades to come, could evolve into a campus as instructional and research and development activities grow.
"This expansion of Berkeley's physical footprint and academic reach represents a fantastic and unprecedented opportunity for our students, faculty and the public we serve," said UC Berkeley Chancellor Carol Christ. "Enabling our world-class research enterprise to explore potential collaborations with NASA and the private sector will speed the translation of discoveries across a wide range of disciplines into the inventions, technologies and services that will advance the greater good. We are thrilled. This is a prime location and a prime time for this public university."
Claire Tomlin, now professor and chair of electrical engineering and computer sciences at UC Berkeley, conducted her first research on automated collision avoidance systems for drones at Moffett Field, and foresees similar opportunities there for UC Berkeley students, especially those enrolled in the College of Engineering's year-old aerospace engineering program.
"With our new aerospace engineering major, it is the right time to get started at Moffett Field. It offers an outdoor testbed for research on how to integrate drones or other unpiloted aerial vehicles, which are being used increasingly for aerial inspection or delivery of medical supplies, into our air traffic control system," she said. "I anticipate great collaborations on topics such as new algorithms in control theory, new methods in AI, new electronics and new materials."
Tomlin envisions research on networks of vertiports to support operations of electric autonomous helicopters or e-VTOLs (electric vertical takeoff and landing vehicles), much like UC Berkeley's pioneering research in the 1990s on self-driving cars; collaborative work on how to grow plants in space or on other planets to produce food, building materials and pharmaceuticals, similar to the ongoing work in UC Berkeley's Center for the Utilization of Biological Engineering in Space (CUBES); and collaborations on artificial intelligence with top AI experts in the Berkeley Artificial Intelligence Research lab (BAIR).
"This is the decade of electric automated aviation, and the Berkeley Space Center should be a pioneer of it, not just by research, but also by experimentation and deployment," Tomlin said. "We're interested in, for example, how one would go about designing networks of vertiports that are economically viable, that are compatible with the urban landscape, that are prone to public acceptance and have an economic reality."
"Advanced air mobility and revolutionizing the use of the airspace and how we use drones and unpiloted vehicles for future air taxis or to fight wildfires or to deliver cargo are other areas of potential collaboration," Tu added.
Hannah Nabavi is one UC Berkeley student eager to see this proposed collaboration with NASA Ames and industry around Silicon Valley, even though she will have graduated by the time it comes to fruition. A senior majoring in engineering physics, she is the leader of a campus club called SpaceForm that is currently tapping NASA Ames scientists for research tips on projects such as how materials are affected by the harsh environment on the moon.
"I think one of the primary advantages to UC Berkeley of having this connection is it allows students to obtain a perspective on what's happening in the real world. What are the real-world problems? What are the goals? How are things getting done?" said Nabavi, who plans to attend graduate school on a path to a career in the commercial space industry. "It also helps students figure out what they want to focus on by providing an early understanding of the research and industrial areas in aerospace."
But beyond the practical benefits, she said, "I think that seeing all of these scientists and engineers tackling issues and questions at the forefront of aerospace can serve as a huge inspiration to students."
In addition, data science and AI/machine learning are rapidly disrupting the aviation and space industry landscape as it evolves toward automation and human-machine interaction and as ever bigger datasets are being produced. The workforce needs retraining in these rapidly evolving fields, and UC Berkeley's College of Computing, Data Science, and Society (CDSS) is well positioned to provide executive and professional education to meet these needs.
"Berkeley Space Center offers the possibility for CDSS students to work on these new challenges, particularly in the fields of aeronautics and astronautics, planetary science and quantum science and technology," said Sandrine Dudoit, associate dean at CDSS, professor of statistics and of public health and a member of the Moffett Field Faculty Steering Committee.
DeFreece noted that there are NASA collaborations already happening on the UC Berkeley campus. Many leverage the mission management and instrument-building skills at the Space Sciences Laboratory, which is responsible for the day-to-day operation of several NASA satellites and is building instruments for spacecraft that NASA will land on the moon or launch to monitor Earth and the sun.
UC Berkeley researchers are already investigating how to print 3D objects in space, how to create materials to sustain astronauts on Mars, how to test for life-based molecules on other planets and moons, and whether squishy robots could operate on other planets. UC Berkeley spin-offs are developing ways to monitor health in space and provide low-cost insertion of satellites into orbit.
"The Berkeley Space Center could be a place where half of the day students are collaborating with center neighbors, and the other half of the day they might be taking classes and seeing their mentors who are supervising class projects on the satellite that is hovering over their heads at that very moment," Bayen said. "Experiences like these just don't exist anywhere else at the present time."
UC Berkeley's Haas School of Business and Berkeley Law are also working on issues surrounding the commercial exploitation of space, including asteroids and other planets, and the laws that should govern business in space.
"Space law and policy are also areas where I think there's some tremendous opportunities to collaborate with the university," Tu said. "What are we going to do when we find resources on the moon, and other countries do as well, and companies want to make money from that?"
In return for its investment and partnership, UC Berkeley will receive a portion of the revenues that the real estate development is projected to generate. While market-based returns are always subject to change, the joint venture conservatively estimates that the research hub will receive revenues more than sufficient to ensure that Berkeley Space Center is self-sustaining, as well as provide new financial support to the core campus, its departments and colleges, and faculty and students.
UC Berkeley also expects significant additional revenue from other, project-related sources, including new research grants, industry participation and partnerships, and the incubation and commercialization of emerging companies born from translational research and technologies created at the site.
SKS Partners, a San Francisco-based investor and developer of commercial real estate properties in the western U.S., will lead the venture. The planning team for the Berkeley Space Center will pursue LEED certification, a mark of sustainability, for its buildings by using solar power, blackwater and stormwater treatment and reuse, and by emphasizing non-polluting transportation.
While construction is tentatively scheduled to begin in 2026, subject to environmental approvals, UC Berkeley is already creating connections with Silicon Valley companies on the NASA Ames property, including executive education programs.
"In the next couple of years, we could conceivably have a semester rotation program, where UC Berkeley students spend one semester at Berkeley Space Center, take three classes taught there, do their research there, are temporarily housed there for a semester, just like they would do a semester abroad in Paris," Bayen said. "Ultimately, we hope to build experiences that currently do not exist for students, staff and faculty and create an innovation ecosystem where breakthroughs that require public-private partnerships are enabled."
The development team includes as co-master planners HOK, an architecture, engineering and planning firm, and Field Operations, an interdisciplinary urban design and landscape architecture firm.
See the article here:
Berkeley Space Center at NASA Ames to become innovation hub for ... - UC Berkeley
IEO evaluation of the Bank of England’s use of data to support its … – Bank of England
Foreword from the Chair of Court
Data are critical to the work of a central bank. The Bank of England has long recognised this. Most recently, we defined "decision-making informed by the best available data, analysis and intelligence" as a timeless enabler of our mission. And, to deliver on that, in 2021 we made "modernise the Bank's ways of working" a strategic priority for the years 2021 to 2024.
At the same time, the pace of innovation in data and analytics continues to increase, as the recent advances in the capabilities of large language models make clear. Every day, the Bank makes decisions that affect millions of the UK's people and businesses; the Bank's data and analytics capabilities support and power that decision-making process. It is therefore vital that we stand back and consider whether our data capabilities will remain fit for purpose in a rapidly changing world, such that we deliver our timeless enabler and ultimately our mission.
To that end, in October 2022 the Bank's Court of Directors commissioned its Independent Evaluation Office (IEO) to conduct an evaluation of the Bank's use of data to support its policy objectives.
The IEO's report is clear. Overall, and despite many positive steps, looking forward the Bank must ensure that its data capabilities advance to match its ambition, especially as data and analytics best practice advances rapidly. While the Bank is not alone in facing this challenge, addressing it is strategically critical. The Bank will therefore need to set itself up for success by stepping up the pace of change, investing in its technology and people, and overcoming the barriers that will impede progress in a rapidly changing data and analytics landscape.
The IEO's recommendations provide a foundation for doing so. The report makes 10 detailed recommendations, grouped into three broad themes: committing to a clear vision for data and analytics, supported by a comprehensive strategy and effective governance; overcoming the institutional, cultural and technological barriers faced by organisations as they move to new and emerging data-centric ways of working to keep in step with a changing world; and ensuring the Bank's staff have the support and skills they need.
At our 22 September meeting, Court welcomed the Bank's commitment to taking forward these recommendations. We will monitor their implementation as part of the IEO's follow-up framework.
David Roberts, Chair of Court
October 2023
Data have long been at the heart of central banking. But the availability of data and the capabilities to draw insights from them have developed rapidly over the past decade or so. These changes, when coupled with expanding remits and global shocks, have created both opportunities and challenges for central banks. In that context, in October 2022 the Court of Directors (the Bank's board) commissioned its IEO to conduct an evaluation of the Bank's use of data to support its policy objectives.
In response to rapid change, central banks have innovated in multiple dimensions, from institutional structures to technological infrastructure, to new analytical methods and data sources. But, like many organisations, they have faced a range of challenges along the way, whether from legacy systems, established working practices or the practicalities of cloud migration.
The Bank of England has been on a similar journey to peer central banks. It made data a prominent feature of the 2014 One Bank strategy and in the strategic priorities for the next three years that it set out in 2021. It created the role of Chief Data Officer supported by an expanding team. It has developed a sequence of data strategies (Box A), founded on credible problem statements. It has rolled out new analytical and storage capabilities with associated training for staff. And, supported by its emerging centres of excellence, it has done pioneering analysis with new techniques and data sources, with examples ranging from the use of machine learning to predict individual bank distress from regulatory returns and to plausibility-check returns from regulated firms, through to tracking the macroeconomy at high frequency during the pandemic with unconventional measures of activity.footnote [1] footnote [2]
The Bank's current data and analytics operating model devolves a large amount of responsibility for data and analytics to its business functions. The central Data and Analytics Transformation Directorate, led by the Chief Data Officer, is responsible for enabling those areas in their delivery of the central data strategy, which is half of one of the Bank's seven strategic priorities for 2021–24. This model is currently in transition, partly in response to our evaluation and partly as a result of leadership change, with a new Chief Data Officer having started in role in April 2023.
Our evaluation took the overarching research question: "Is decision-making to support the Bank's policy objectives informed by the best available data, analysis and intelligence, and can it expect to be so in the future?". We adapted this from the Bank's timeless enabler on data, which was set out alongside the 2021–24 strategic priorities. We broke that down into four detailed areas of investigation, covering broad questions of strategy and governance and three detailed areas of data management: acquisition and preparation; storage and access; and analysis and dissemination. Our evidence gathering involved: conducting around 175 interviews, across the Bank and a range of other organisations, including peer central banks and regulators; a staff survey, complemented by targeted focus groups; and consulting an advisory group of senior Bank staff and, separately, two external expert advisors.
With best practice in data and analytics advancing rapidly, the Bank will need to step up the pace of change and associated investment if it is to take advantage of new opportunities. While progress has been made using a devolved operating model, data capabilities are inconsistent across the organisation and in some cases the current approach is sub-optimal. To progress further, management will need to systematically address a range of foundational technology and process issues and build the capabilities necessary to enable the Bank to take advantage of new data tools so it can be in the best position to deliver on the Bank's mission. We make 10 detailed recommendations, which we grouped into three broad themes: committing to a clear vision, supported by a comprehensive strategy and effective governance; breaking down institutional, cultural and technological barriers to keep in step with a changing world; and ensuring staff have the support and skills they need.
Theme 1: Agree a clear vision for data and analytics, supported by a comprehensive strategy and effective governance.
1. Agree and champion a vision for data use, matching funding to ambition.
2. Collaboratively design deliverable Bank-wide and local business area data strategies to meet measurable business outcomes.
3. Ensure governance structures can support the agreement, co-ordination and monitoring of data transformation, with clear accountability for delivery.
Theme 2: Break down institutional, cultural and technological barriers to keep the Bank's data and analytical practices in step with a changing world.
4. Improve day-to-day collaboration across the business on data and analytics.
5. Agree the approach to sharing data and analytics inside and outside the Bank.
6. Narrow the gap with modern data and analytics practices, with the most impactful initial step being a phased migration to cloud.
7. Systematically monitor and experiment with new approaches and technology for data and analytics.
Theme 3: Ensure staff have the support and skills they need to work effectively with data.
8. Embed common standards to make data and analysis easily discoverable and repeatable.
9. Provide staff with the easily accessible support and guidance they need across the data lifecycle.
10. Develop a comprehensive data skills strategy encompassing hiring, training, retention and role mix.
In addition, the Bank is currently taking a range of actions to strengthen key foundational enablers. Successful execution of these wider initiatives will be crucial to fully delivering the Bank's data ambitions: i) improvements to the approach to setting organisational strategy, prioritisation and budgeting; ii) tackling technology obsolescence; and iii) strengthening of the Bank's central services and change management capabilities. The appointment of an Executive Director to lead a new Change and Planning function, the delivery of the Central Services 2025 programme, and future iterations of the Bank's wider talent strategy will contribute across these areas of focus.footnote [3]
The evaluation was conducted by a dedicated project team reporting directly to the Chair of Court.footnote [4] The IEO team benefited from feedback and challenge from a Bank-wide senior-level advisory group (including Bank Governors). David Craig (founder and former CEO, Refinitiv, former Head of Data and Analytics, LSEG, and Executive Fellow, London Business School) and Kanishka Bhattacharya (Expert Partner, Bain & Company, and Adjunct Associate Professor, London Business School) provided support and independent challenge to the team and reviewed and endorsed the findings in this report.
This report was approved for publication by the Chair of Court at the September 2023 Court meeting.
Data have long sat at the heart of central banking, including at the Bank of England. At least since the heyday of the gold standard, monetary policy makers have drawn on data to determine the stance of monetary policy. The Bank's Quarterly Bulletin, the Bank's flagship publication from its introduction in 1960 through to the 1993 launch of the Inflation Report, offered a detailed commentary on economic and financial developments, supported by an extensive range of statistics. These days, the Monetary Policy Report and Financial Stability Report continue to provide detailed coverage of the data and analytics that have gone into policy formulation. Supervisors now work with a broad range of regulatory returns, with the volume of supervisory data available having increased materially since the global financial crisis.
Nonetheless, the world of data has been changing rapidly and central banks have had to adapt to at least three continuing developments. Global events have presented new policy challenges, most notably the global financial crisis and Covid-19 pandemic. Central banks have often broadened their focus, with some taking on additional macroprudential, microprudential and supervisory roles. More broadly, technological change has led to both vastly more data being available to central banks and the development of powerful new tools to interpret them.
Central banks have had to innovate in response to these developments, although this has not always been easy. They have explored institutional change, including appointing chief data officers, adopting data strategies and experimenting with a range of structures for data governance and management. Many have migrated to cloud. In 2020, 80% of respondents to a BIS survey said that they were using big data sources, up from 30% in 2015.footnote [5] But, at the same time, they have struggled with legacy systems and migrating to new technology, including the unfamiliar IT arrangements that this can involve. New analytics and data practices have needed to fit into existing policy frameworks, including generating reliable results that can be interpreted by policymakers.
The Bank of England has been on the same journey as its major peers. It acquired new responsibilities following the global financial crisis, including: a statutory committee responsible for macroprudential policy; and microprudential policy for, and supervision of, banks, insurers and financial market infrastructures. It has had to adapt to major economic events, including the global financial crisis, the UK's exit from the European Union, the Covid-19 pandemic and, most recently, Russia's invasion of Ukraine. And over the past decade or so the Bank has acquired large amounts of new data, including microdata on firms and households; regulatory data on banks and insurers, and asset- or even transaction-level data on key financial products; and unconventional data from operational, administrative and digital sources.
The Bank made data a prominent feature of its 2014 One Bank strategy and the strategic priorities for the next three years that it set out in 2021, with both strategies supported by credible assessments of the Bank's analytics and data capabilities. In 2014 it created the role of Chief Data Officer, initially at a relatively junior level, before it was made an Executive Director role in 2019. Its data transformation efforts have been supported by an expanding team: from a small Division reporting to the Chief Information Officer, it has grown to a full Directorate, bringing together data transformation with Divisions that were already part of the Monetary Policy area, covering advanced analytics and the collation and publication of statistical and regulatory data.
Over the past decade the Bank has taken significant steps to enhance data and analytics. It launched a rationalised and improved suite of analytical tools, which allowed it to focus support and training resources more effectively. It has expanded the range of storage options available, most notably introducing the Data and Analytics Platform to host large data sets. In 2014 it created an Advanced Analytics Division, to act as a centre of excellence. Together, these steps have facilitated increased uptake of programmatic analytical tools and have allowed further centres of excellence to emerge across the organisation.
As a result, the Bank has been able to conduct innovative data and analytics work. Notable examples include: embedding machine learning into the plausibility checking of returns from regulated firms; a predictive analysis tool to support selection of regulated firms for the Prudential Regulation Authority's (PRA's) Watchlist; tools to analyse insights from firms' management information; and, during the pandemic, the rapid adoption of high-frequency indicators from unconventional sources to track economic developments.footnote [6]
The Bank's current approach to delivering its mission of promoting the good of the people of the United Kingdom is summarised in its strategic priorities, which support cross-Bank prioritisation. Data appear twice within the current strategic plan, both as a timeless enabler of the Bank's mission ("decision-making informed by the best available data, analysis and intelligence") and as Strategic Priority 7 ("modernise the Bank's ways of working"). Strategic Priority 7 has two sponsors at Executive Director level, the Chief Data Officer and the Chief Information Officer, as the majority of actions fall to the Data and Analytics Transformation (DAT) and Technology Directorates. The Bank's data strategy is an integral part of Strategic Priority 7 and is led by the Chief Data Officer and DAT.
Strategic priorities 2021–24
The 2021 data strategy had three broad strands. The first focused on enabling, consisting primarily of expanding and refining existing offerings around data collection, storage, support and training. The second consisted of targeted improvements in business outcomes, such as the work in the PRA on RegTech. The third was the Transforming Data Collection programme, run jointly with the Financial Conduct Authority (FCA), which aimed to ensure regulators get the data they need to fulfil their mission, at the lowest possible cost to industry.footnote [7]
DAT, under the ultimate oversight of the Deputy Governor for Monetary Policy and (since 2022) the Chief Operating Officer, is a central function with four roles, all relevant to delivering the data strategy:
DSID plays a key role in several components of the Bank's data strategy. It provides key support services, including: management of the Data and Analytics Platform; a Data and Analytics email-based helpdesk; provision of training and guidance to support analytical and data management best practice; and ownership of the Data Catalogue, which is intended to support data governance and act as a repository of key data sources in use across the Bank. DSID also partners with business areas to help them to deliver their priority outcomes through better use of data and analytics. That is supported by the Data and Analytics Business Partners team, which provides a formal link between business areas and experts in the data function, and the Analytics Enablement Hub (AEH), which works with business areas on targeted projects. AEH also provides training and guidance to support the use of a range of modern analytical tools (eg R, Python), strategically selected for the Bank's use cases. DSID and the PRA are also working with the FCA and industry to transform data collection from the UK financial sector and have recently established a cross-Bank taskforce to more effectively combine expertise.footnote [8]
More broadly, many of DAT's functions are intended to enable business areas to deliver the central data strategy in line with business-area priorities. The management of data assets and many data transformation initiatives sit with individual business areas. For example, after DAT processes and plausibility-checks collections, regulatory data on banks are held and managed by the PRA, while Monetary Analysis manages a database of macroeconomic time series. Business areas across the Bank have begun to experiment with different approaches to strengthening data science skills, whether through training or hiring. Many areas have developed their own centres of analytical excellence specialising in data science techniques. For example, the PRA RegTech team has developed natural language processing tools and acts as a co-ordinating hub for other data specialist teams working in PRA supervision and policymaking areas. Similarly, the Financial Markets Infrastructure Directorate's Data team has developed expertise in techniques required to analyse large transaction-level markets data sets. With extensive autonomy, some areas have well-developed data strategies focused on business area priorities (in addition to the transforming data collection agenda, the PRA's data strategy covers how regulatory data are accessed, the development of dashboards for supervisors, as well as coaching and digital skills), while others have more minimal arrangements. This dispersion of responsibilities was mirrored in oversight, which at the outset of this evaluation was spread across a large number of data or (for investment) programme boards, as well as strategic committees like the Executive Policy Co-ordination Committee, the Executive Operational Co-ordination Committee and the Operations and Investment Committee.
Overall, the current operating model for data and analytics is in transition, partly in response to our evaluation and partly as a result of leadership change.footnote [9] More broadly, important enablers of the Bank's data activities are undergoing change: a new plan is being drawn up to tackle technology obsolescence, alongside a cloud migration strategy; central services are being upgraded through the Central Services 2025 (CS2025) programme; and the Bank's change capabilities are being strengthened by the appointment of an Executive Director for Change and Planning. Successful execution of these wider initiatives will be crucial to fully delivering the Bank's data ambitions.
We took the overarching research question: "Is decision-making to support the Bank's policy objectives informed by the best available data, analysis and intelligence, and can it expect to be so in the future?". We adapted this from the Bank's timeless enabler on data, which was set out alongside the 2021–24 strategic priorities (Figure 1). We broke that down into four evaluation criteria, each underpinned by a set of benchmarks:
We conducted an extensive evidence-gathering exercise, drawing on three main sources:
Launched in 2014, the One Bank strategy was intended as a transformative strategic plan to help the Bank, which had recently expanded to accommodate the newly created Financial Policy Committee and Prudential Regulation Authority, operate successfully as a single organisation.footnote [10] One of the plan's four pillars was dedicated to analytic excellence, including making creative use of the best analytical tools and data sources to tackle the most challenging and relevant issues. The strategy saw the creation of the Bank's first Chief Data Officer and the Advanced Analytics Division. Specific actions included: external partnering to explore the use of big data and advanced inductive analytics capabilities; and the creation of a One Bank data architecture. The One Bank data architecture aimed to: integrate all the Bank's data under the common oversight of the Chief Data Officer; increase the efficiency of data collection and management; share data more widely inside the Bank; and make greater use of third-party providers with economies of scale.
The National Audit Office (NAO) evaluated progress on the One Bank strategy in 2017.footnote [11] It found that of the 15 initiatives planned as part of the strategy, only one was substantively incomplete, the One Bank data architecture. The NAO found that: This turned out to be much more complex than expected, with the Bank identifying that the new IT would need to support around 182 data systems and 2,700 data sets.
Vision 2020, launched in 2017, was the successor strategic plan to the One Bank strategy. Formally, Vision 2020 had a reduced focus on data relative to its predecessor, with data touched on only in the context of data visualisation, under "Creative, targeted content", and data sharing, under "Unlocking potential". However, in parallel to Vision 2020 in 2017, a data programme was developed as a successor to the One Bank data architecture. Recognising that the previous initiative had been very ambitious relative to the available expertise, budget and planned timescales, the scope agreed in 2017 was narrower and focused on providing self-service tools. Despite the reduced ambition, the 2014 strapline for the programme was preserved: "an integrated data infrastructure across the Bank, to enable information sharing". When the programme closed in mid-2020 it was considered to have substantively delivered on the narrower 2017 objectives, though delivery of the Data and Analytics Platform was separated out and was not fully rolled out until 2022.
In 2018, the Bank commissioned Huw van Steenis to write a report on the future of finance.footnote [12] His 2019 report recommended that the PRA embrace digital regulation, including developing a long-term strategy for data and regulatory technology. In its response, the Bank committed to develop a "world-class regtech and data strategy".footnote [13] Specific commitments included: consulting supervised firms on how to transform the hosting and use of regulatory data; enhancing the Bank's analytics, including peer analysis, machine learning and artificial intelligence; proofs of concept around enhanced analytics and process automation; and making the PRA's rulebook machine readable. As part of delivering its response, the Bank elevated the role of Chief Data Officer to Executive Director level and created the Data and Analytics Transformation Directorate, bringing together a range of existing Divisions.
Launched in 2021, the Bank's strategic priorities for 2021–24 include Strategic Priority 7, "modernise the Bank's ways of working".footnote [14] This has two elements, one focused on data (described in more detail under current operating model) and one on strengthening the Bank's technology. The data elements of the Bank's response to the Future of Finance report were incorporated into Strategic Priority 7.
The Bank has consistently set itself a high level of ambition on data and analytics over the past decade and the effective use of data appears prominently in its current strategic plan. The Bank's ambitions have been grounded in convincing assessments of the Bank's data and analytics capabilities. However, progress has been inconsistent across the organisation, with variation in the degree to which ambition has been matched by resources, plans and management oversight. This highlights the challenges inherent in a more devolved data operating model, especially during times of significant transformation in the external data and analytics landscape. Prompted by emerging findings from this evaluation and the arrival of a new Chief Data Officer in April, this area has seen the most change since our evaluation began, which offers a strong foundation for addressing our Theme 1 recommendations around vision, resources, strategy and governance.
Ensuring consensus around the Bank's vision for data will be vital, because delivering it will require funding that consistently matches ambition and more concerted championing. The broad support that we heard from the Bank's Executive for the current level of ambition suggests the Bank's existing timeless enabler could continue to serve as a benchmark for the Bank's ambitions. But the renewed strategic conversation currently occurring is needed to restate the case and galvanise support. A renewed vision will need to be met with plans and budgets that consistently match the level of ambition, even if the Bank faces competing priorities. The Bank will also need to review how it tracks spending on data and analytics and make sure funding remains consistent with plans. The Bank's senior leaders will need to build on the efforts of the new Chief Data Officer to ensure the centrality of effective data use to the Bank's mission is understood both inside and outside the Bank. The PRA has piloted a data skills coaching programme for senior leaders which, if extended Bank-wide, would help support championing efforts.
With an agreed vision and commitment to funding (Recommendation 1), the Bank's central functions and business areas will need to work together to refresh the Bank's data strategy. This collaborative approach will require common understanding of the art of the possible and of the Bank-wide and individual business areas' target operating models, encompassing data, technology and skills. These inputs would support the development of enterprise, data and technology architectures describing their current and target states.footnote [15] As the Bank-wide strategy is refreshed, business areas will need to develop local strategies that are embedded within that and champion them. Nor can the data strategy stand alone; it will need to be consistent with, and perhaps developed alongside, supporting strategies for technology and people. In order to build trust, it is important that stakeholders can see measurable progress. That would be aided by: quantified and planned expected benefits ex-ante; mechanisms to track progress; and evaluated outcomes ex-post.
The newly established Data and Analytics Board fills an executive-level gap identified in the early stages of the evaluation and will need to ensure its membership, terms of reference and supporting structures allow the refreshed data strategy (Recommendation 2) to be developed, co-ordinated and monitored during implementation.footnote [16] The Board is a promising development; as it becomes established, its co-chairs (the Chief Data Officer and Chief Information Officer) and membership will want to ensure that it: remains a forum that effectively convenes central functions and business areas; forges consensus on Bank-wide data and analytics priorities, including the details of the strategy, such as benefits, deliverables and timescales; ensures all the Bank's data and analytics transformation activities are consistent with the organisational strategy; keeps abreast of the latest technological developments (Recommendation 7); monitors progress; and holds its members to account for delivery of the strategy and key dependencies. As part of this, it will need to review what supporting structures it requires, including: subcommittees (for example, to ensure data and technology initiatives are consistent with agreed strategies and architectures); monitoring tools (for example, executive scorecards); and accountability devices (such as published documents and member objectives).footnote [17] Our external advisors recommended that governance structures should evolve over time as data maturity increases, suggesting the Bank requires stronger central direction at the early stages of the journey before it can move to a more decentralised approach.
The data analysis produced by Bank staff is highly regarded. Policy committees praise the staff's outputs and its centres of excellence conduct innovative analysis with data. The Bank also continues to rank among the most transparent of central banks. But, not unlike other specialist organisations, the Bank has wrestled in recent years with a range of barriers to making the most of the large amounts of data it acquires. Notwithstanding progress made in recent years, difficulties remain. As with other large, specialist organisations the Bank has found it difficult to combine different types of expertise and to collaborate effectively across business areas, and between business areas and central functions, with local areas preferring to develop their own solutions. A perhaps understandable risk aversion has contributed to: a relatively constrained approach to data sharing, both internally and externally, beyond that necessary due to statutory prohibitions; and a nervousness around adopting new technologies, notably cloud solutions, or working practices. Our recommendations focus on breaking down these institutional, cultural and technical barriers, through: strengthening collaboration, particularly through the use of partner roles linking central functions and local areas; articulating principles to guide greater sharing of data and analytics, internally and externally; strengthening the technological foundations of the Bank's data and analytics, particularly by migrating to cloud; and finding ways to draw more extensively on external technical expertise. Continued development of a unified data and technology architecture, supported by improved governance structures, will also be crucial.
The Bank should consider structures that could strengthen collaboration and more effectively combine expertise. This applies across business areas, between business areas and central functions, and between different professions (particularly data, change management and technology specialists). Its business partnerships programme, if fully implemented, offers a promising start, focused on building collaboration between business areas and the data function. This could play a crucial role in helping: central functions understand desired business outcomes; business areas understand what is possible; and the Bank in ensuring that data projects can be incorporated within Bank-wide data and technology architectures. The Bank will need to monitor progress on business partnerships, including a balanced assessment of how business areas have engaged with it, perhaps at the Data and Analytics Board. Further action may be needed to reinforce cultural change. The Bank should review lessons from the partnerships and the newly established cross-Bank AI and data collection taskforces when considering the most effective ways to bring together people with common interests and expertise. More broadly, we came across interesting models at peer central banks, including those focused on combining data and technology experts with business area specialists to produce repeatable products. There are also a range of models (eg guilds, tribes) and delivery frameworks (eg the Data Management Capability Assessment Model) established in the data management profession for combining expertise.footnote [18] This will have wider implications, since CS2025 also proposes partnership models for the Bank's Technology and People Directorates.
Greater openness around data and analytics, internally and externally, would foster greater scrutiny and challenge, helping the Bank gain additional insights and keep up with a rapidly evolving world. The Bank produces large amounts of extremely valuable data and analysis, but much of it is easily available only to subsets of Bank staff. The Bank could adopt a presumption of sharing, but would need to further consider the implications and appropriate guardrails. The Bank has important legal obligations and constraints when it comes to sharing data but, within those, it should articulate and highlight a set of principles for disseminating data and analytics, internally and externally. Guiding principles would allow the Bank to consider how to safely open up wider access to data and analysis and might facilitate external collaboration. This is consistent with the IEO's Research Evaluation, which recommended that the Bank needed to support access to data for external co-authors to broaden the expertise and perspectives that the Bank can draw on. We note that some other organisations that face similar binding restrictions have found means to facilitate access to internal data, for example the ONS Secure Research Service.footnote [19] Moving to cloud (Recommendation 6) could help overcome technical barriers to sharing.
The Bank will need to develop an achievable plan to modernise most of its data and analytics practices, to avoid falling further behind a rapidly evolving frontier. A reliance on inefficient manual processes generates risks and staff frustration. A move to cloud would be the most powerful technological step the Bank could take to close this gap, supported by common standards (Recommendation 8) and upskilling (Recommendation 10). While a move to cloud is no panacea and brings new challenges, we have seen that peers and other organisations have been able to unlock capabilities through the use of modern tools and provide increased computing capacity. Cloud offers a range of enhanced capabilities that could improve data collection, discoverability (for example, automation of data cataloguing) and analysis, and allow some embedding of modern data management practices (Recommendation 8). This could include being more open to buying in off-the-shelf tools than is currently the case. Peers' experience suggests cloud migration might also help with other issues such as efficient use of licences, facilitating access for external experts (Recommendation 5) and obsolescence, with cloud providers keeping tools up to date. As a late adopter of cloud, the Bank can learn lessons from other organisations' experience of making the transition.
The Bank will need to consider the role of its emerging centres of excellence in raising data maturity; centres of excellence like Advanced Analytics and local business area data science hubs have brought deep expertise into the Bank and, through collaboration, have helped develop others' skills. The Bank should ask whether there is more it can learn from others' experience of innovation hubs and how their role should evolve as maturity rises. Further mechanisms might include an external advisory board made up of experienced experts to provide challenge to the Data and Analytics Board; the Bank has used such arrangements effectively in a number of areas and the Monetary Authority of Singapore has used one for data and technology.footnote [20] footnote [21] Coaching senior staff on data could be expanded Bank-wide (Recommendation 1) and draw on lessons from elsewhere, for example reverse-mentoring, where talented analysts and data specialists coach senior staff on the art of the possible.
The Bank has long understood the importance of enabling individuals to work effectively with data. The 2014 data strategy focused on enabling business areas, and encouraging individuals to get and make better use of data within their roles is a key outcome in the Bank's 2021 data strategy, with DAT identifying data, tools, platforms, training and other support services as important enablers to facilitate that outcome. While the Bank has introduced data specialist roles and built up its training offering, developing data skills and embedding new, more modern approaches takes time. Recent technological developments have increased analytical capabilities and offer the potential to automate more processes, freeing staff time to focus more on higher value-added analysis. This may have implications for the skills the Bank wants to develop and how staff best work with each other. We have identified three recommendations to help the Bank make progress by ensuring staff have the support and skills they need: embed common standards to make data and analysis easily discoverable and repeatable; provide staff with accessible support and guidance across the data lifecycle; and develop a comprehensive data skills strategy encompassing hiring, training, retention and skills mix.
The Bank's data and analytics guidance needs to be comprehensive, easier to find and better incentivised. The Bank has recently refreshed its guidance on data management, which will be launched this year. It would benefit from doing the same for analytical common standards, not least to ensure that greater use of programmatic analytical tools is suitably resilient. When these are established, the Bank should consider how to raise engagement and adherence. Training is one option, and is standard at induction in professional services firms, consultancies and banks. Other options include greater leadership championing, recognition of good practice in performance reviews, mandatory training, audits and individuals attesting to compliance with the standards through the Our Code process.footnote [22] In the past, the Bank has used the performance management process to influence behaviour, or enforced compliance top-down. Best practice could be built into future data and analytics platforms as part of cloud migration (Recommendation 6), which could include automation of data cataloguing, which our advisors indicated is widely adopted in data-mature private sector firms.
The Bank could go further in joining up existing data and analytics support. This would materially enhance the support accessible to staff, who currently struggle to navigate the fragmented range of services available. A single front door, clearly visible on the desktop or intranet front page and spanning the data lifecycle, could effectively triage data and analytics requests, directing them to existing resources or escalating to deeper support, as appropriate. The Bank already operates elements of this approach in interactions between hubs, helpdesks and users; joining it up would significantly ease the experience of staff. The Bank should define how this service would relate to that offered by business partners and how those business partners are resourced to meet any increased demand.
The Bank should develop a comprehensive data skills strategy, embedded within wider initiatives around talent and skills. Such a strategy will need to consider the career proposition for data specialists, including both data scientists and the technology specialists that support data work. It will need to articulate where skills should be developed inside the Bank, across all levels of seniority and supported by a training offer, or hired into the organisation. It will need to be informed by a clearly defined operating model (Recommendation 2) that articulates the role of data specialists relative to other skillsets in the organisation, including analysts and technology specialists. This should be linked to the People Directorate's wider talent strategy. The Bank can draw lessons from the mix of approaches business areas have adopted when experimenting with building area-wide data skills.
Read more here:
IEO evaluation of the Bank of England's use of data to support its ... - Bank of England
Unveiling real-time economic insights with search big data – Phys.org
by KeAi Communications Co.
Economic indicators released by the government are pivotal in shaping decisions across both the public and private sectors. However, a significant limitation of these indicators lies in their timeliness, as they rely on macroeconomic factors like inventory turnover and iron production. For instance, in the case of Japan Cabinet Office's Indexes of Business Conditions, the indices are typically released with a two-month delay.
To overcome the drawbacks of conventional macro-variable-driven techniques, a team of Japanese researchers developed a big data-driven method capable of providing accurate nowcasts for macroeconomic indicators. Importantly, this approach eliminates the need for aggregating semi-macroeconomic data and relies solely on non-prescribed search engine query data (Search Big Data) obtained from a prominent search engine used by more than 60% of the nation's internet users.
"Our new model demonstrated the ability to forecast key Japanese economic indicators in real time (= nowcast), even amid the challenges posed by pandemic-related disruptions," said co-corresponding author of the study, Kazuto Ataka. "By leveraging search big data, the model identifies highly correlated queries and performs multiple regression analysis to provide timely and accurate economic insights."
Remarkably, the model showed adaptability and resilience even in the face of rapid economic shifts and unpredictable scenarios. Furthermore, in-depth analysis has revealed that economic activities are influenced not only by economic factors but also by fundamental human desires, including libido and desire for laughter. This underscores the complex interplay between human interests and economic developments.
"Our findings offer a nuanced perspective for understanding real-time economic trends. The model's outstanding performance in nowcasting during the pandemic represents a significant advancement over current methodologies, emphasizing the potential of incorporating various real-time data sources to enhance the precision of economic nowcasting," added Ataka.
The study, published in The Journal of Finance and Data Science, stands as a significant advancement in the field of economic nowcasting, opening avenues for more informed and timely decision-making in both the public and private sectors.
More information: Goshi Aoki et al, Data-Driven Estimation of Economic Indicators with Search Big Data in Discontinuous Situation, The Journal of Finance and Data Science (2023). DOI: 10.1016/j.jfds.2023.100106
Provided by KeAi Communications Co.
Follow this link:
Unveiling real-time economic insights with search big data - Phys.org
5 Data Structures That Every Data Scientist Should Learn – Analytics Insight
5 most common and important data structures that every data scientist should learn and master
Data structures are the building blocks of data science. They are the ways of organizing and storing data in a computer so that it can be accessed and manipulated efficiently. Data structures can affect the performance, complexity, and readability of your code. Therefore, it is important to learn the most common and useful data structures for data science. In this article, we will introduce you to 5 data structures that every data scientist should learn and how they can help you solve various data problems.
1. Stacks- Stacks are data structures that follow the Last In, First Out (LIFO) principle. Elements are added and removed from the top of the stack. Stacks are efficient for implementing operations such as function calls and backtracking.
2. Queues- Queues are data structures that follow the First In, First Out (FIFO) principle. Elements are added to the back of the queue and removed from the front. Queues are efficient for implementing operations such as job scheduling and message processing.
3. Trees- Trees are hierarchical data structures that consist of a set of nodes, where each node can have one or more child nodes. Trees are efficient for storing and searching data that has a hierarchical relationship, such as a file system or a directory of employees.
4. Heaps- Heaps are tree-based data structures that keep the minimum (or maximum) element at the root by maintaining the heap property, in which each parent is ordered relative to its children; the remaining elements are only partially ordered, not fully sorted. Heaps are efficient for implementing priority queues and sorting algorithms such as heapsort.
5. Hash tables- Hash tables are data structures that map keys to values. Hash tables are efficient for finding the value associated with a given key, typically in constant average time. The short sketch below makes the distinctions concrete.
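Here is a minimal sketch of all five structures using only Python's standard library; the values are illustrative.

```python
from collections import deque
import heapq

# 1. Stack (LIFO): a list's append/pop operate at the top in O(1).
stack = []
stack.append("task1")
stack.append("task2")
print(stack.pop())          # task2 -- last in, first out

# 2. Queue (FIFO): deque gives O(1) appends and pops at both ends.
queue = deque()
queue.append("job1")
queue.append("job2")
print(queue.popleft())      # job1 -- first in, first out

# 3. Tree: a simple nested node structure for hierarchical data.
tree = {"name": "root", "children": [{"name": "leaf", "children": []}]}

# 4. Heap: heapq keeps the smallest element at index 0 (a min-heap).
heap = []
for priority in [5, 1, 3]:
    heapq.heappush(heap, priority)
print(heapq.heappop(heap))  # 1 -- the minimum, not a fully sorted order

# 5. Hash table: Python's dict maps keys to values in O(1) on average.
lookup = {"alice": 42, "bob": 7}
print(lookup["alice"])      # 42
```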
Here is the original post:
5 Data Structures That Every Data Scientist Should Learn - Analytics Insight
EDIH Data Scientist, School of Computer Science job with … – Times Higher Education
Applications are invited for a temporary post of an EDIH Data Scientist within the UCD School of Computer Science - CeADAR.
Applications are invited for the positions of EDIH Data Scientists in the newly established European Digital Innovation Hub (EDIH) for AI in Ireland as part of the CeADAR Centre - Ireland's Centre for Applied Artificial Intelligence. CeADAR has been successful in the Europe-wide competitive selection process to be the EDIH for AI in Ireland in addition to its continuing national status.
There are four key services that this AI EDIH will provide.
The EDIH is seeking experienced individuals with a demonstrated track record of success in data science in industrial research settings (more than two years) or in academic centres. Individuals in this role are expected to have proven experience applying artificial intelligence, machine learning, computational statistics and statistics to real-world problems. The ideal candidate will have a keen interest in contributing to the development of proofs of concept that allow companies to leverage the benefits of state-of-the-art AI algorithms.
Relevant areas of interest include: deep learning, explainable AI, computer vision, privacy preserving machine learning, reinforcement learning, natural language processing, self and semi-supervised learning, and active learning.
Equality, Diversity and Inclusion
UCD is committed to creating an inclusive environment where diversity is celebrated, and everyone is afforded equality of opportunity. To that end, the university adheres to a range of equality, diversity and inclusion policies. We encourage applicants to consult those policies at https://www.ucd.ie/equality/. We welcome applications from everyone, including those who identify with any of the protected characteristics that are set out in our Equality, Diversity and Inclusion policy.
Salary Range: €53,000 - €59,000 per annum
Appointment on the above range will be dependent upon qualifications and experience.
Closing date: 17:00hrs (local Irish time) on 26th of October 2023.
Applications must be submitted by the closing date and time specified. Any applications which are still in progress at the closing time of 17:00hrs (Local Irish Time) on the specified closing date will be cancelled automatically by the system. UCD are unable to accept late applications.
UCD do not require assistance from Recruitment Agencies. Any CVs submitted by Recruitment Agencies will be returned.
Prior to application, further information (including application procedure) should be obtained from the Work at UCD website: https://www.ucd.ie/workatucd/jobs/
Continue reading here:
EDIH Data Scientist, School of Computer Science job with ... - Times Higher Education
"Missing Law of Nature" Proposes How Stars and Minerals Evolve … – Technology Networks
An interdisciplinary study, drawing on expertise from fields including philosophy of science, astrobiology, data science, mineralogy and theoretical physics, has identified a previously overlooked aspect of Darwin's theory of evolution. The research extends the theory beyond the traditional confines of biological life and proposes a universal law applicable to an array of systems such as planetary bodies, stars, minerals and even atoms. The paper unveils what the authors term a "missing law of nature" that encapsulates an inherent principle shaping the evolution of complex natural systems.
The study was published in the Proceedings of the National Academy of Sciences.
The new work details a "Law of Increasing Functional Information": the tendency for systems composed of a mix of components to evolve towards increased complexity, diversity and patterning. The law is applicable to any system characterized by a multitude of configurations, living or non-living, where natural processes engender a plethora of arrangements, yet only a select few persist through a process termed "selection for function".
Co-author Jonathan Lunine, the David C. Duncan Professor in the Physical Sciences and chair of astronomy in the College of Arts and Sciences at Cornell University, said that the paper was "a true collaboration between scientists and philosophers to address one of the most profound mysteries of the cosmos: why do complex systems, including life, evolve toward greater functional information over time?"
The additional theory applies to systems, like cells or molecules, that are composed of parts which can be rearranged repeatedly by natural processes. While these phenomena can produce endless variation in structure, only a handful of these configurations tend to endure; the law terms this "selection for function". Darwin's theory looked at a purely biological form of function: survival and reproduction. The new study suggests that this view can be widened to include other types of function, one of which, termed "novelty", embodies the propensity of evolving systems to venture into unprecedented configurations, occasionally culminating in novel characteristics.
The authors also draw parallels between biological evolution and the evolution of stars and minerals. Primordial minerals, they suggest, represented particularly stable atomic arrangements that laid the groundwork for subsequent mineral generations and, in turn, the emergence of life. The example of stellar structures shows how the tendency towards function can build complex systems: the earliest stars, created just after the Big Bang, were composed of only two elements, hydrogen and helium, which later generations of stars built on to create the more than 100 elements that make up our periodic table today.
"If increasing functionality of evolving physical and chemical systems is driven by a natural law, we might expect life to be a common outcome of planetary evolution," concluded Lunine.
Reference: Wong ML, Cleland CE, Arend D, et al. On the roles of function and selection in evolving systems. PNAS. 2023;120(43):e2310223120. doi:10.1073/pnas.2310223120
This article is a rework of a press release issued by Cornell University. Material has been edited for length and content.
See the rest here:
"Missing Law of Nature" Proposes How Stars and Minerals Evolve ... - Technology Networks
Mastering the Art of Data Cleaning in Python – KDnuggets
Data cleaning is a critical part of any data analysis process. It's the step where you remove errors, handle missing data, and make sure that your data is in a format that you can work with. Without a well-cleaned dataset, any subsequent analyses can be skewed or incorrect.
This article introduces you to several key techniques for data cleaning in Python, using powerful libraries like pandas, numpy, seaborn, and matplotlib.
Before diving into the mechanics of data cleaning, let's understand its importance. Real-world data is often messy. It can contain duplicate entries, incorrect or inconsistent data types, missing values, irrelevant features, and outliers. All these factors can lead to misleading conclusions when analyzing data. This makes data cleaning an indispensable part of the data science lifecycle.
We'll cover the following data cleaning tasks: removing unnecessary columns, removing duplicate data, converting data types, and handling missing values.
Before getting started, let's import the necessary libraries. We'll be using pandas for data manipulation, and seaborn and matplotlib for visualizations.
We'll also import the datetime Python module for manipulating the dates.
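The original post's code is not reproduced in this archive, so the snippets that follow are minimal sketches; file names, sample values and the housing dataset itself are assumptions based on the column names mentioned in the surrounding text. The imports might look like this:

```python
# pandas for data manipulation, numpy for numerics,
# seaborn/matplotlib for visualization, datetime for date handling.
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import datetime as dt
```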
First, we'll need to load our data. In this example, we're going to load a CSV file using pandas. We also add the delimiter argument.
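A sketch of the loading step; the file name and the semicolon delimiter are hypothetical:

```python
# Load the dataset; delimiter tells pandas how the columns are separated.
df = pd.read_csv("housing_data.csv", delimiter=";")
```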
Next, it's important to inspect the data to understand its structure, what kind of variables we're working with, and whether there are any missing values. Since the data we imported is not huge, let's have a look at the whole dataset.
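Printing the whole DataFrame is enough here, since the dataset is small:

```python
# The dataset is small enough to display in full.
print(df)
```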
Here's how the dataset looks.
You can immediately see there are some missing values. Also, the date formats are inconsistent.
Now, let's take a look at the DataFrame summary using the info() method.
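That call is simply:

```python
# Show column data types and non-null counts.
df.info()
```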
Here's the code output.
We can see that only the square_feet column has no NULL values, so we'll have to handle the missing data somehow. Also, the advertisement_date and sale_date columns have the object data type, even though they should be dates.
The column location is completely empty. Do we need it?
We'll show you how to handle these issues. We'll start by learning how to delete unnecessary columns.
There are two columns in the dataset that we don't need in our data analysis, so we'll remove them.
The first column is buyer. We don't need it, as the buyer's name doesn't impact the analysis.
We're using the drop() method with the specified column name. We set the axis to 1 to specify that we want to delete a column. Also, the inplace argument is set to True so that we modify the existing DataFrame rather than creating a new DataFrame without the removed column.
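A sketch of that call:

```python
# Remove the 'buyer' column in place; axis=1 targets columns.
df.drop('buyer', axis=1, inplace=True)
```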
The second column we want to remove is location. While it might be useful to have this information, this is a completely empty column, so let's just remove it.
We take the same approach as with the first column.
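```python
# Remove the empty 'location' column the same way.
df.drop('location', axis=1, inplace=True)
```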
Of course, you can remove these two columns simultaneously.
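Instead of the two separate calls above, a single drop() with a list does the same, assuming the columns haven't already been removed:

```python
# Drop both unneeded columns in one call.
df.drop(['buyer', 'location'], axis=1, inplace=True)
```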
Both approaches return the following DataFrame.
Duplicate data can occur in your dataset for various reasons and can skew your analysis.
Let's detect the duplicates in our dataset. Here's how to do it.
The code below uses the duplicated() method to check for duplicates in the whole dataset. Its default setting is to consider the first occurrence of a value as unique and the subsequent occurrences as duplicates. You can modify this behavior using the keep parameter. For instance, df.duplicated(keep=False) would mark all duplicates as True, including the first occurrence.
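```python
# Mark rows that duplicate an earlier row (first occurrences are False).
print(df.duplicated())
```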
Here's the output.
The row with index 3 has been marked as duplicate because row 2 with the same values is its first occurrence.
Now we need to remove duplicates, which we do with the following code.
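```python
# Drop duplicate rows, keeping each first occurrence.
df = df.drop_duplicates()
```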
The drop_duplicates() function considers all columns while identifying duplicates. If you want to consider only certain columns, you can pass them as a list to this function like this: df.drop_duplicates(subset=['column1', 'column2']).
As you can see, the duplicate row has been dropped. However, the indexing stayed the same, with index 3 missing. We'll tidy this up by resetting indices.
This task is performed by using the reset_index() function. The drop=True argument is used to discard the original index. If you do not include this argument, the old index will be added as a new column in your DataFrame. By setting drop=True, you are telling pandas to forget the old index and reset it to the default integer index.
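```python
# Rebuild a clean 0..n-1 index; drop=True discards the old index.
df = df.reset_index(drop=True)
```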
For practice, try to remove duplicates from this Microsoft dataset.
Sometimes, data types might be incorrectly set. For example, a date column might be interpreted as strings. You need to convert these to their appropriate types.
In our dataset, we'll do that for the advertisement_date and sale_date columns, as they are shown as the object data type. Also, the dates are formatted differently across the rows. We need to make them consistent, along with converting them to dates.
The easiest way is to use the to_datetime() method. Again, you can do that column by column, as shown below.
When doing that, we set the dayfirst argument to True because some dates start with the day first.
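A sketch of the column-by-column conversion; note that on recent pandas versions, mixed date formats within a column may additionally require format="mixed":

```python
# Convert each column to datetime; dayfirst=True handles e.g. 25/01/2023.
df['advertisement_date'] = pd.to_datetime(df['advertisement_date'], dayfirst=True)
df['sale_date'] = pd.to_datetime(df['sale_date'], dayfirst=True)
```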
You can also convert both columns at the same time by using the apply() method with to_datetime().
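```python
# The same conversion applied to both columns at once.
df[['advertisement_date', 'sale_date']] = df[
    ['advertisement_date', 'sale_date']
].apply(pd.to_datetime, dayfirst=True)
```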
Both approaches give you the same result.
Now the dates are in a consistent format. However, not all of the data has been converted: there's one NaT value in advertisement_date and two in sale_date. This means those dates are missing.
Let's check that the columns have been converted to dates by using the info() method.
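```python
# Confirm the dtype change took effect.
df.info()
```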
As you can see, both columns are now in the datetime64[ns] format.
Now, try to convert the data from TEXT to NUMERIC in this Airbnb dataset.
Real-world datasets often have missing values. Handling missing data is vital, as certain algorithms cannot handle such values.
Our example also has some missing values, so let's take a look at the two most common approaches to handling missing data.
If the number of rows with missing data is insignificant compared to the total number of observations, you might consider deleting these rows.
In our example, the last row has no values except the square feet and advertisement date. We can't use such data, so let's remove this row.
Here's the code, where we indicate the row's index.
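A sketch that targets the last row without hard-coding its label (the original post indicates the index directly):

```python
# Drop the final row by its index label.
df = df.drop(df.index[-1])
```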
The DataFrame now looks like this.
The last row has been deleted, and our DataFrame now looks better. However, there is still some missing data, which we'll handle using another approach.
If you have significant missing data, a better strategy than deleting could be imputation. This process involves filling in missing values based on other data. For numerical data, common imputation methods involve using a measure of central tendency (mean, median, mode).
In our already changed DataFrame, we have NaT (Not a Time) values in the advertisement_date and sale_date columns. We'll impute these missing values using the mean() method.
The code uses the fillna() method to find and fill the null values with the mean value.
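```python
# Replace missing dates with each column's mean date.
df['advertisement_date'] = df['advertisement_date'].fillna(
    df['advertisement_date'].mean())
df['sale_date'] = df['sale_date'].fillna(df['sale_date'].mean())
```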
You can also do the same thing in one line of code. We use apply() with a function defined using lambda. As above, this function uses the fillna() and mean() methods to fill in the missing values.
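```python
# One-line equivalent via apply() and a lambda.
df[['advertisement_date', 'sale_date']] = df[
    ['advertisement_date', 'sale_date']
].apply(lambda col: col.fillna(col.mean()))
```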
The output in both cases looks like this.
Our sale_date column now has times, which we don't need. Let's remove them.
We'll use the strftime() method, which converts the dates to their string representation in a specific format.
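```python
# Keep only the date part by formatting as YYYY-MM-DD strings.
df['sale_date'] = df['sale_date'].dt.strftime('%Y-%m-%d')
```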
Original post:
Mastering the Art of Data Cleaning in Python - KDnuggets
Girls4Tech STEM program: Closing the gender gap in tech – Mastercard
In 2015 I met Eva M. when the 11-year-old attended our first Girls4Tech program expansion to Gurugram, India.
Three years ago, she interned with us in our Toronto office, and today she's a programmer analyst with Scotiabank. She credits her love of cybersecurity and computer programming to the hands-on, real-world activities she enjoyed with Girls4Tech.
"I had the best time in that workshop," Eva recently told me. "There was so much to learn. What made Girls4Tech different than any other workshop is that we did activities, although on a smaller scale, that would actually happen at Mastercard."
Nearly 10 years ago, we created Girls4Tech, our signature STEM education program, to showcase Mastercard's technology and to help girls see that it takes all kinds of skills, including ones they already possess, like curiosity and initiative, to pursue a STEM career.
At the time, the number of girls pursuing STEM careers was at an all-time low, not just in the U.S. but around the globe. In 2017, one in five boys said they would pursue STEM, while only one in 20 girls were interested in seeking those same degrees, according to the World Economic Forum.
Our goal was to create a program that would focus on girls and engage our employees as role models and mentors, highlighting their payments technology backgrounds. We believed this corporate-community partnership could help level the playing field and reduce the inequities between boys and girls pursuing STEM careers. Because we know this: When there are multiple voices with myriad experiences at the table, we will create better technology and better products and services for our customers.
Since the early 2000s, I'm pleased to say, there has been tremendous advocacy for girls in STEM, not just by us but by governments, major corporations and many nonprofits. You'd think with all this focus that the numbers would have changed dramatically. But according to Deloitte Global, the number of women in large technology companies has increased only 2.6% since 2019, and women represent just 33% of the population in tech roles. In the U.S., women make up only 28% of the STEM workforce, according to the American Association of University Women, and gender gaps are particularly high in some of the fastest-growing and highest-paid jobs of the future.
"We know this: When there are multiple voices, with myriad experiences at the table, we will create better technology and better products and services for our customers."
Susan Warner
So what does that tell us? There is more to be done. Capturing girls' interest in STEM at age 8 or 10 is one thing; keeping that interest is another. STEM role models and mentoring programs are integral to fostering that interest. That's why we will debut a new Girls4Tech mentoring and scholarship program in 2024.
Constant learning is also key. Parents, teachers and girls should check out upskilling programs like the ones Microsoft, Google and IBM have created. Stay on top of the trends: did you know women make up only 25% of the cybersecurity workforce, a field that already suffers from an enormous shortage of professionals? That's a STEM field just waiting for women to apply. And finally, when women join the STEM workforce, we need to retain them, so companies need to take a hard look at who is leaving these roles and why.
As we roll up our sleeves to get ready for the next 10 years, let's take a moment to celebrate Girls4Tech's success. To date, we've reached 5.7 million girls, two years earlier than the goal we announced in 2020, and according to research conducted by Euromonitor, we are now the world's largest STEM program designed for young girls.
We've translated our program into 23 languages, and more than 7,000 employees have volunteered at in-person and digital events in 63 countries. Last week, for International Day of the Girl, we hosted a "follow the sun" event in which we welcomed girls at 15 events in eight countries.
Since the launch of our original Girls4Tech program in 2014, we've expanded the curriculum to include Girls4Tech Cyber and AI, Girls4Tech 2.0, Girls4Tech & Sports and Girls4Tech & Code, a 20-week coding and mentoring program. In August we launched our first Girls4Tech Python Bootcamp for underrepresented college women in tech. And while Girls4Tech was not designed to be a pipeline program at Mastercard, we are very pleased to announce the first full-time hiring of a G4T participant, Zainab Ibrahim, an associate product specialist in Cyber & Intelligence Solutions.
To extend our curriculum reach over the years, Girls4Tech has partnered with education organizations including Scholastic, We Are Teachers, American Indian Foundation and Teach for Ukraine. In 2020 we announced a partnership with Discovery Education to expand Girls4Tech by bringing cyber and AI careers to life for students in the U.S. and Canada. And we're expanding our partnership and our work to include data science, AI and blockchain for 2024. As we look to support girls all over the world, Girls4Tech.com also offers free STEM activities and resources in 10 languages for teachers and parents to encourage those interested in fun STEM activities.
Yes, there's more work to be done to create an equitable workforce in technology. But it's women like Eva and Zainab (and Beatrice, Nahfila, Zoya, Rina, I could go on) who keep us focused. Because we also know this: Every act matters, and together we can make a difference and change the equation.
Originally posted here:
Girls4Tech STEM program: Closing the gender gap in tech - Mastercard
IIT Madras partners with five startups for initiatives in emerging technologies – IndiaTimes
MUMBAI: IIT Madras Pravartak Technologies Foundation is partnering with start-ups for various strategic initiatives in emerging technologies. The key aspects of this collaboration include industry-oriented skilling in niche technologies by start-ups, and project execution in niche areas such as AI, ML and data science. The MoU was signed recently between IITM Pravartak and five startups: Crion Versity, Dataswitch, Neekan Consulting, Rudram Dynamics and Skill Angels. Those present on the occasion included Prof V Kamakoti, Director, IIT Madras; Prof Mangala Sunder Krishnan, Professor Emeritus, IIT Madras; Dr MJ Shankar Raman, CEO, IIT Madras Pravartak Technologies Foundation; and Balamurali Shankar, General Manager, Digital Skills Academy, IIT Madras.

IITM Pravartak is a Section 8 company housing the Technology Innovation Hub (TIH) on sensors, networking, actuators and control systems. It is funded by the Department of Science and Technology, Government of India, under its National Mission on Interdisciplinary Cyber-Physical Systems, and hosted by IIT Madras.

Highlighting the importance of this initiative, Prof V Kamakoti said on Tuesday, "Start-ups must become leading employers and look at IIT Madras for their talent requirements. Start-ups in the skilling sector should intervene early with students and impart cognitive ability and foundational maths and science skills for their success in higher education."

Speaking about this collaboration, MJ Shankar Raman said, "We will work with these start-ups on niche areas like drone pilot training, data analytics, AI/ML and generative AI. Our clients come to us for insights on their complex and sensitive unstructured data. We leverage startups like DataSwitch for such requirements."

He added, "One of our partners, Neekan Consulting, is a technology, process and marketing consulting company enabling SMBs and start-ups in all industry domains. They work with us on product and program management, apart from skilling freshers and making them job-ready. Similarly, SkillAngels (based out of the IITM Research Park) uses gamification, animation and adaptive learning strategies for cognitive assessments and upskilling."

The important outcomes expected from this collaboration include newer ways of understanding cutting-edge technologies through content and platforms belonging to start-ups, and the development of point solutions for niche problem areas in AI, ML and data science.

Further, Balamurali Shankar, General Manager, Digital Skills Academy of IITM Pravartak, said, "These start-ups have a combination of academic and industry expertise, thereby giving a good learning experience to the students. One of our partners, Crion Versity, was founded by IIT Madras alumni who come with rich experience of running a digital twin organization, Crion Technologies, at the IITM Research Park. Their flagship career experience programs provide engaging short-form learning on job skills in areas such as data analytics."

Balamurali Shankar also mentioned that Rudram Dynamics offers specialized programs such as drone pilot training, B2G (business-to-government) analytics programs, as well as cyber law.
With all these start-ups coming together, students and industry professionals have a variety of choices to upskill in their respective domains.
Read this article:
IIT Madras partners with five startups for initiatives in emerging technologies - IndiaTimes