Category Archives: Data Science
Here's how Data Science & Business Analytics expertise can put you on the career expressway – Times of India
Today, Data Science & Business Analytics has become all-pervasive across functions and domains. Data and analytics now shape everything from how we buy our toothpaste to how we choose dating partners and how we lead our lives. Nearly 90% of all small, mid-size, and large organizations have adopted analytical capabilities over the last five years to stay relevant in a market where large volumes of data are recorded every day. They use these capabilities to formulate solutions, build analysis models, simulate scenarios, understand current realities and predict future states.
According to a recent report by LinkedIn, data roles are among the fastest-growing, most in-demand jobs of the past year and the next few years to come. Hiring for the roles of Data Scientist, Data Science Specialist, Data Management Analyst and Statistical Modeler has gone up by 46% since 2019. While there has been a surge in job openings, some common myths co-exist with them. Contrary to popular belief, you don't need a programming background or advanced math skills to learn Data Science and Business Analytics.
This is because most of the tools and techniques are easy to use and find application across domains, serving professionals from vastly different industries such as BFSI, marketing, agriculture, healthcare and genomics. A good knowledge of statistics will need to be developed, though. Data Science and Business Analytics also rests on common human intelligence that can be applied to solve almost any industry problem. Hence, you don't need Fourier series or advanced mathematical algorithms to build analytical models; math learned up to the 10+2 level is good enough and can serve as a starting base for professionals in all domains.
Here are a few of the best high-paying jobs worth pursuing in this field:
1. Data Scientist
Data scientists have to understand business challenges and offer the best solutions using data analysis and data processing. For instance, they are expected to perform predictive analysis and run a fine-toothed comb through unstructured or disorganized data to offer actionable insights. They also do this by identifying trends and patterns that can help companies make better decisions.
2. Data Architect
A data architect creates the blueprints for data management so that databases can be easily integrated, centralized and protected with the best security measures. They also ensure that data engineers have the best tools and systems to work with. A career in data architecture requires expertise in data warehousing, data modelling, and extraction, transformation and load (ETL), and you must also be well versed in tools such as Hive, Pig and Spark.
3. Data Analyst
A data analyst interprets data to produce answers to a specific business problem or bottleneck that needs to be solved. The role differs from that of a data scientist, who is involved in identifying and solving critical business problems that might add immense value if solved. Data analysts interpret and analyse data using statistical techniques, improve statistical efficiency and quality, and implement databases, data collection tools and data analytics strategies. They help with data acquisition and database management, recognize patterns in complex data sets, filter and clean data through regular review, and perform analytics reporting.
4. Data Engineer
Today's companies make considerable investments in data, and the data engineer is the person who builds, upgrades, maintains and tests the infrastructure to ensure it can handle the algorithms thought up by data scientists. They develop and maintain architectures, align them with business requirements, identify ways to ensure data efficiency and reliability, perform predictive and prescriptive modelling, and engage with stakeholders to update and explain analytics initiatives. The good news is that the need for data engineers spans many different industries. As much as 46% of all data analytics and data engineering jobs originate from the banking and financial sector, but business analyst jobs can be found in e-commerce, media, retail and entertainment as well.
5. Database Administrator
The database administrator oversees the use and proper functioning of enterprise databases. They also manage the backup and recovery of business-critical information. Learning about data backup and recovery, as well as security and disaster management, is crucial to moving up in this field. You'll also want a proficient understanding of topics covered in business analyst courses, such as data modelling and design. Database administrators build high-quality database systems, enable data distribution to the right users, provide quick responses to queries, minimise database downtime, document and enforce database policies, and ensure data security, privacy and integrity, among other responsibilities.
6. Analytics Manager
An analytics manager oversees all the aforementioned operations and assigns duties to the respective team leaders based on needs and qualifications. Analytics managers are typically well-versed in technologies like SAS, R and SQL. They must understand business requirements, goals and objectives; source, configure and implement analytics solutions; lead a team of data analysts; build systems for data analysis to draw actionable business insights; and keep track of industry news and trends. Depending on your years of experience, the average Data Science and Business Analyst salary may range between Rs. 3,50,000 and Rs. 5,00,000. The lower end is the salary at entry level with less than one year of work experience, and the higher end is the salary for those with one to four years of work experience.
As your experience increases over time, the salary you earn increases as well. A Business Analyst with 5-9 years of industry experience can earn up to Rs. 8,30,975, whereas a Senior Business Analyst with up to 15 years' experience earns close to Rs. 12,09,787. Location also plays a significant role when it comes to compensation. A Business Analyst in Bangalore or Pune would earn around 12.9% and 17.7% more than the national average, respectively, while analysts in Hyderabad, Noida and Chennai earn about 4.2%, 8.2% and 5.2% less, respectively.
For those interested in upskilling, Great Learning has emerged as one of India's leading professional learning services, with a footprint in 140 countries and more than 55 million learning hours delivered across the world. Top faculty and a curriculum formulated by industry experts have helped learners successfully transition to new domains and grow in their fields. It offers courses in some of today's most in-demand topics, including Data Science and Business Analytics and Artificial Intelligence. Its PG program in Data Science and Business Analytics, offered in collaboration with The University of Texas at Austin and Great Lakes Executive Learning, is becoming a sought-after course among working professionals across industries.
Here are a few highlights:
1. 11-month program: A choice of online and classroom learning experiences, with classroom sessions strictly following all COVID safety measures.
2. World #4 rank in Business Analytics: Analytics ranking (2020) for The University of Texas at Austin.
3. Hours of learning: 210+ hours of classroom learning content and 225+ hours of online learning content.
4. Projects: 17 real-world projects guided by industry experts and one capstone project towards the end of the course.
Yelp data shows almost half a million new businesses opened during the pandemic – CNBC
People order breakfast at Bill Smith's Cafe, after Texas Governor Greg Abbott issued a rollback of coronavirus disease (COVID-19) restrictions in McKinney, Texas, March 10, 2021.
Shelby Tauber | Reuters
Since the World Health Organization declared the coronavirus a pandemic one year ago Thursday, new Yelp data showed nearly a half million businesses opened in America during that time, an optimistic sign of the state of the U.S. economic recovery.
Between March 11, 2020, and March 1, 2021, Yelp saw more than 487,500 new business listings on its platform in the United States. That's down just 14% compared with the year-ago period. More than 15% of the new entities were restaurant and food businesses.
The novel coronavirus, first discovered in China, is believed to have surfaced in Wuhan in late 2019, before spreading rapidly around the world, infecting 118 million people and causing 2.6 million deaths, according to data from Johns Hopkins University.
Virus mitigation efforts in nations all over the world, including the U.S., have ranged from full lockdowns to partial closures to reduced capacity of nonessential businesses and services. Masks and social distancing have been a hallmark of the pandemic. The economic damage from the crisis was swift.
However, according to data compiled by Yelp, which has released local economic impact reports throughout the pandemic, more than 260,800 businesses that had closed due to Covid restrictions reopened between March 11, 2020, and March 1. About 85,000 of them were restaurant and food businesses.
Justin Norman, vice president of data science at Yelp, sees optimism in the numbers.
"As more and more Americans continue to get vaccinated, case counts continue to lower, and Congress' Covid relief bill that offers additional aid is distributed, we anticipate businesses that were once struggling over the last year will bounce back," Norman told CNBC. "We see this evidenced through the 260,000 businesses that have been able to reopen after temporarily closing."
Of the almost half million new businesses that have opened, about 59% were within the "professional, local, home and auto" category on Yelp.
"The number of new business openings particularly the high number of new home, local, professional and auto services businesses also shows great potential for those industries in the future," Norman said.
Yelp said that certain trends borne out of the pandemic may be here to stay. As consumers spend more time at home, Yelp noted an uptick in interest in home improvement. The company saw that average review mentions for home office renovation increased by 75% year over year and bathroom renovations rose by 80%.
"I anticipate that we'll still see people invest in higher-quality home offices or improving their homes," Norman said. "With warmer summer months coming and the number of vaccines being administered continuing to increase, people who aren't planning to return to the office this year may focus on more home improvement projects."
Yelp's new business data also shows the restrictions brought on by the pandemic accelerated the need for businesses to adapt by using technology and changing the ways they interact with their customers.
Of the new business openings, the number of food trucks climbed 12% and food delivery businesses were up 128%.
"The increase in food delivery services would have easily been predicted, although we may not have predicted they would stay on the rise a year later," Norman said.
He also said he was surprised by how local businesses have incorporated the tools technology offers. "It's been incredibly impressive and encouraging to see how much local businesses, both in large cities and smaller towns, have embraced technology to serve customers during this challenging time."
Yelp also saw changes in the ways companies specifically interacted with its app. In 2020, 1.5 million businesses updated their hours through Yelp, 500,000 indicated that they were offering virtual services, and more than 450,000 businesses crafted a custom message at the top of their page, to speak directly to customers.
In addition to the positive data about food delivery and restaurants, Norman was surprised to see some trends through the year that indicated a change in how consumers engaged with everyday life. Yelp saw that consumer interest in psychics increased 74% year over year and astrologers rose by 63%. Yelp measures consumer interest in page views, posts, or reviews.
"It was also surprising to find that consumer interest in notaries were up 52% on Yelp, as many federal and state rules allowed remote notarization," Norman said. "While Yelp data can't provide an in-depth look into what people were notarizing over the last year, Yelp data does show a trend of couples holding smaller, more intimate weddings, instead of more traditional large wedding celebrations, as well as the housing market seeing an astounding demand coupled with low interest rates and housing prices in certain markets."
A year on, it's clear that the businesses that have survived have had to find new ways to operate, and that many of the changes will be permanent.
"We've seen more and more businesses embrace app-enabled delivery, software tools like reservations and waitlists, and consumer-oriented communications tools like the Covid health and safety measures. The digital local business is here to stay," Norman said.
SoftBank Joins Initiative to Train Diverse Talent in Data Science and AI – Entrepreneur
The alliance aims to train and improve the skills of underrepresented communities seeking opportunity.
February 18, 2021 | 3 min read
SoftBank Group Corp, as part of its Academy of Artificial Intelligence (AI), announced on February 18 its support for Data Science for All / Empowerment (DS4A / Empowerment). The alliance aims to train and improve the skills of underrepresented communities seeking opportunities in the field of data science.
Developed by Correlation One, DS4A / Empowerment aims to train 10,000 people over the next three years, giving priority to Afro-descendants, Latinos, women, LGBTQ+ people and United States military veterans, providing new paths to economic opportunity in one of the fastest-growing industries in the world.
The SoftBank AI Academy supports programs that complement the theoretical training of traditional technical education courses with practical lessons, including artificial intelligence and data management skills that can be immediately applied to business needs.
DS4A / Empowerment will provide training to employees of SoftBank Group International portfolio companies, including the Opportunity Fund and Latam Fund, as well as external candidates from the United States and Latin America, including Mexico.
The program is specifically designed to address gender equity and talent gaps in a field that has historically been inaccessible to many people, leading to a significant underrepresentation of women and Afro-descendants. Participants will work on real case studies that are expected to have a measurable impact on the operating performance of participating companies.
IDB Lab, the innovation laboratory of the Inter-American Development Bank Group, will join SoftBank and provide more than 10 full scholarships to underrepresented candidates in Latin America, while Beacon Council will offer 4 full scholarships for underrepresented candidates based in Miami.
Program participants will receive 13 weeks of data and analytics training (including optional Python training) while working on case studies and projects, including projects presented by SoftBank's portfolio of companies. The initiative will also link participants with mentors who will provide career development and guidance. Upon completion of the program, external participants will be connected to employment opportunities at SoftBank and leading companies in the business, financial services, technology, healthcare, consulting and consumer sectors.
DS4A Empowerment is an online program taught in English over a period of 13 weeks. Classes will be held on Saturdays from 10:00 am to 8:00 pm (Eastern Time, ET), beginning April 17, 2021.
The program registration period ends on March 7, 2021. Candidates who should consider applying include employees of SoftBank-affiliated portfolio companies in the region, as well as software engineers, technical product managers, technical marketers and anyone with a STEM background who is interested in learning data analysis. To apply and learn more about the program, interested candidates can visit the official DS4A Empowerment website.
Participating in SoftBank/ Correlation One Initiative – Miami – City of Miami
Published on February 22, 2021
(Miami, FL - February 22, 2021) - The City of Miami has been named as a participant in Data Science for All / Empowerment (DS4A / Empowerment), a new effort designed to upskill and prepare job-seekers from underserved communities for data science careers. The initiative is being backed by SoftBank Group International (SoftBank) as part of its AI Academy and was developed by Correlation One. It aims to train at least 10,000 people from underrepresented communities, including Greater Miami, over the next three years, providing new pathways to economic opportunity in the world's fastest-growing industries.
"We need talent with a deep understanding of data science to build the companies of the future," said Marcelo Claure, CEO of SoftBank Group International. "We're proud to support this effort, continue to upskill our portfolio companies and train more than 10,000 people from underrepresented communities with critical technical skills."
SoftBank's AI Academy supports programs that supplement the theoretical training of traditional technical education courses with practical lessons, including AI and data skills that can be immediately applied to common business needs.
DS4A / Empowerment will provide training for SoftBank Group International portfolio company employees, including portfolio companies of the Opportunity Fund and Latin America Fund, as well as external candidates from the U.S. and Latin America. The program is specifically designed to address talent and equity gaps in a field that has historically been inaccessible for many workers, leading to significant underrepresentation of women and non-white individuals. Participants work on real-world case studies that are expected to have measurable impact on the operational performance of participating companies.
IDB Lab will join SoftBank by providing over ten full-ride Fellowships to underrepresented candidates in Latin America while the Miami-Dade Beacon Council will provide four full-ride fellowships for underrepresented candidates based in Miami.
In addition, The City of Miami will join as an impact partner by providing twenty fellowships to Miami talent and five fellowships to public sector workers.
"As Miami grows as a tech hub, it is important that we empower local entrepreneurs and the public sector to leverage the power of AI. We are proud to support the building of a diverse data-fluent community in Miami through our partnership with Correlation One and SoftBank, said Francis Suarez, the Mayor of Miami.
Participants in the program will receive 13 weeks of data and analytics training (plus optional Python training) while working on case studies and projects, including projects submitted by SoftBank portfolio companies. The initiative will also connect participants with mentors who will provide professional development and career coaching. At the end of the program, external participants will be connected with employment opportunities at SoftBank and leading enterprises across business, financial services, technology, healthcare, consulting, and consumer sectors.
"Miami's success hinges on dramatically expanding opportunity across our community and building a workforce with the skills for the jobs of tomorrow," said Matt Haggman, Executive Vice President of The Beacon Council. "This program is an important step towards creating the innovative and equitable future we can - and must - achieve."
"Training our residents to take on the jobs of the future is critical to ensuring that economic growth is shared across all communities, and to building our local talent so that more leading companies in fields like tech and data science can put down roots in Miami-Dade," said Daniella Levine-Cava, Miami-Dade County's Mayor. "I'm thrilled that this program is unlocking opportunities in a field that has historically been inaccessible for so many, and creating new, inclusive pathways to prosperity in one of the world's fastest-growing industries."
"The COVID-19 pandemic has both accelerated demand for data science talent and exacerbated the access gaps that kept so many aspiring workers locked out of opportunity," said Rasheed Sabar and Sham Mustafa, Co-CEOs and Co-Founders of Correlation One. "We are grateful to work with innovative employers like SoftBank that are stepping up to play a more direct role in helping the workforce prepare themselves for jobs of the future."
Program and Registration Details
DS4A Empowerment is an online program delivered in English over a 13-week period. Classes will convene on Saturdays from 10:00am to 8:00pm ET, beginning on April 17, 2021. Registration for the program ends on March 7, 2021. Candidates who should consider applying include employees of SoftBank-affiliated portfolio companies in the region as well as software engineers, technical product managers, technical marketers, and anyone with a STEM background who is interested in learning data analysis. To apply and find out more information about the program, interested candidates can visit the official DS4A Empowerment website: https://c1-web.correlation-one.com/ds4a-empowerment. For program-related inquiries, please contact ds4aempowerment@correlation-one.com.
About SoftBank
The SoftBank Group invests in breakthrough technology to improve the quality of life for people around the world. The SoftBank Group is comprised of SoftBank Group Corp. (TOKYO: 9984), an investment holding company that includes telecommunications, internet services, AI, smart robotics, IoT and clean energy technology providers; the SoftBank Vision Funds, which are investing up to $100 billion to help extraordinary entrepreneurs transform industries and shape new ones; and the SoftBank Latin America Fund, the largest venture fund in the region. To learn more, please visit https://global.softbank
About Correlation One
Correlation One is on a mission to build the most equitable vocational school of the future. We believe that data literacy is the most important skill for the future of work. We make data fluency a competitive edge for firms through global data science competitions, rigorous data skills assessments, and enterprise-focused data science training.
Correlation One's solutions are used by some of the most elite employers all around the world in finance, technology, healthcare, insurance, consulting and governmental agencies. Since launching in 2015, Correlation One has built an expert community of 250,000+ data scientists and 600+ partnerships with leading universities and data science organizations in the US, UK, Canada, China, and Latin America.
https://www.correlation-one.com/about
Media Contacts:
For SoftBank: Laura Gaviria Halaby, Laura.gaviria@softbank.com
For City of Miami Media: Stephanie Severino, sseverino@miamigov.com
For Beacon Council: Maria Budet, mbudet@beaconcouncil.com
Increasing Access to Care with the Help of Big Data | Research Blog – Duke Today
Artificial intelligence (AI) and data science have the potential to revolutionize global health. But what exactly is AI, and what hurdles stand in the way of more widespread integration of big data in global health? Duke's Global Health Institute (DGHI) hosted a Think Global webinar on Wednesday, February 17, to dive into these questions and more.
The webinar's panelists were Andy Tatem (Ph.D.), Joao Vissoci (Ph.D.) and Eric Laber (Ph.D.), moderated by DGHI's Director of Research Design and Analysis Core, Liz Turner (Ph.D.). Tatem is a professor of spatial demography and epidemiology at the University of Southampton and director of WorldPop. Vissoci is an assistant professor of surgery and global health at Duke University. Laber is a professor of statistical science and bioinformatics at Duke.
Tatem, Vissoci and Laber all use data science to address issues in global health. Tatem's work largely utilizes geospatial data sets to help inform global health decisions, like vaccine distribution within a certain geographic area. Vissoci, who works with the GEMINI Lab at Duke (Global Emergency Medicine Innovation and Implementation Research), tries to leverage secondary data from health systems in order to understand issues of access to and distribution of care, as well as care delivery. Laber is interested in improving decision-making processes in healthcare spaces, attempting to help health professionals synthesize very complex data via AI.
All of their work is vital to modern biomedicine and healthcare, but, as Turner said, "AI means a lot of different things to a lot of different people." Laber defined AI in healthcare simply as "using data to make healthcare better." From a data science perspective, Vissoci said, it is synthesizing data in an automated way to give us back information. This returned information consists of digestible trends and understandings derived from very big, very complex data sets. Tatem stated that AI has already revolutionized what we can do and said it is powerful if it is directed in the right way.
"We often get sucked into a science-fiction version of AI," Laber said, but in actuality it is not some dystopian future but a set of tools that maximizes what can be derived from data.
However, as Tatem stated, "[AI] is not a magic, press-a-button scenario where you get automatic results." A huge part of the work for researchers like Tatem, Vissoci and Laber is the harmonization involved in working with data producers, understanding data quality, integrating data sets, cleaning data and other back-end processes.
This comes with many caveats.
"Bias is a huge problem," said Laber. Vissoci reinforced this, stating that the models built from AI and data science are going to represent whatever data sources they are able to access, bias included. "We need better work in getting better data," Vissoci said.
Further, there must be more up-front listening to and communication with end users from the very start of projects, Tatem outlined. By taking a step back and listening, tools created through AI and data science may be better met with actual uptake and less skepticism or distrust. Vissoci said that direct engagement with the people on the ground transforms data into meaningful information.
Better structures for navigating privacy issues must also be developed. "A major overhaul is still needed," said Laber. This includes things like better consent processes so patients understand how their data is being used, although Tatem said this becomes very complex when integrating data.
Nonetheless, the future looks promising, and each panelist feels confident that the benefits will outweigh the difficulties that are yet to come in introducing big data to global health. One example Vissoci gave of an ongoing project concerns the impact of environmental change through deforestation in the Brazilian Amazon on Indigenous populations. Through work with heavy multidimensional data, Vissoci and his team have also been able to optimize scarce Covid vaccine resources for use in areas where they can have the most impact.
Laber envisions a world with reduced or even no clinical trials if randomization and experimentation are integrated directly into healthcare systems. Tatem noted how he has seen extreme growth in the field in just the last 10 to 15 years, which seems only to be accelerating.
A lot of this work has to do with making better decisions about allocating resources, as Turner stated in the beginning of the panel. In an age of reassessment about equity and access, AI and data science could serve to bring both to the field of global health.
Post by Cydney Livingston
Rochester to advance research in biological imaging through new grant – University of Rochester
February 22, 2021
A new multidisciplinary collaboration between the University of Rochester's departments of biology, biomedical engineering, and optics and the Goergen Institute for Data Science will establish an innovative microscopy resource on campus, allowing for cutting-edge scientific research in biological imaging.
Michael Welte, professor and chair of the Department of Biology, is the lead principal investigator of the project, which was awarded a $1.2 million grant from the Arnold and Mabel Beckman Foundation.
"The grant supports an endeavor at the intersection of optics, data science, and biomedical research, and the University of Rochester is very strong in these areas," Welte says. "The University has a highly collaborative culture, and the close proximity of our college and medical center makes Rochester ideally suited to lead advances in biological imaging."
The project will include developing and building a novel light-sheet microscope that employs freeform optical designs devised at Rochester. The microscope, which will be housed in a shared imaging facility in Goergen Hall and is expected to be online in 2022, enables three-dimensional imaging of complex cellular structures in living samples. Researchers and engineers will continually improve the microscope, and it will eventually become a resource for the entire campus research community.
"The optical engineers working on this project will take light-sheet technology into new domains," says Scott Carney, professor of optics and director of Rochester's Institute of Optics, who is a co-principal investigator on the project. "They will transform a precise, high-end microscope into a workhorse for biologists working at the cutting edge of their disciplines to make discoveries about the very fabric of life at the cellular and subcellular level."
The microscope will produce large amounts of data that will require new methods to better collect, analyze, and store the images.
"These efforts will focus on developing algorithms for computational optical imaging and automated biological image analysis, as well as on big data management," says Mujdat Cetin, a professor of electrical and computer engineering and the Robin and Tim Wentworth Director of the Goergen Institute for Data Science. Cetin is also a co-principal investigator on the project.
While many other research microscopes illuminate objects pixel by pixel, light-sheet technology illuminates an entire plane at once. The result is faster imaging with less damage to samples, enabling researchers to study biological processes in ways previously out of reach.
In addition to funding the construction of the microscope and development of the data science component, the grant from the Arnold and Mabel Beckman Foundation supports three biological research projects.
"Not only am I excited about each of the individual projects, from intimate looks at bacteria to finding new ways to analyze images, I am absolutely thrilled about the prospect of building something even bigger and better via the close collaboration of disciplines Rochester excels at individually: optics, data science, and biomedical research," Welte says. "I believe this joint endeavor is only the first in a long line that will establish Rochester as a leader in biological imaging."
Tags: Anne S. Meyer, Arts and Sciences, Dan Bergstralh, Department of Biology, Goergen Institute for Data Science, grant, Hajim School of Engineering and Applied Sciences, James McGrath, Michael Welte, Mujdat Cetin, Richard Waugh, Scott Carney
Category: Science & Technology
Learn About Innovations in Data Science and Analytic Automation on an Upcoming Episode of the Advancements Series – Yahoo Finance
Explore the importance of analytics in digital transformation efforts.
JUPITER, Fla., Feb. 18, 2021 /PRNewswire-PRWeb/ -- The award-winning series, Advancements with Ted Danson, will focus on recent developments in data science technology, in an upcoming episode, scheduled to broadcast 2Q/2021.
In this segment, Advancements will explore how Alteryx uses data science to enable its customers to solve analytics use cases. Viewers will also learn how Alteryx accelerates digital transformation outcomes through analytics and data science automation. Spectators will see how, regardless of user skillset, the code-free and code-friendly platform empowers a self-service approach to upskill workforces, while speeding analytic and high-impact outcomes at scale.
"As digital transformation accelerates across the globe, the ability to unlock critical business insights through analytics is of the utmost importance in achieving meaningful outcomes," said Alan Jacobson, chief data and analytics officer of Alteryx. "Alteryx allows data workers at almost any experience level to solve complex problems with analytics and automate processes for business insights and quick wins. We look forward to sharing our story with the Advancements audience and to exploring how analytics and data science will shape the technology landscape of the future."
The segment will also uncover how the platform accelerates upskilling across the modern workforce, while furthering digital transformation initiatives and leveraging data science analytics to drive social outcomes.
"As a proven leader in analytics and data science automation, we look forward to highlighting Alteryx and to educating viewers about the importance of analytics," said Richard Lubin, senior producer for Advancements.
About Alteryx: As a leader in analytic process automation (APA), Alteryx unifies analytics, data science, and business process automation in one, end-to-end platform to accelerate digital transformation. Organizations of all sizes, all over the world, rely on the Alteryx Analytic Process Automation Platform to deliver high-impact business outcomes and the rapid upskilling of the modern workforce. Alteryx is a registered trademark of Alteryx, Inc. All other product and brand names may be trademarks or registered trademarks of their respective owners.
For more information visit http://www.alteryx.com.
About Advancements and DMG Productions: The Advancements series is an information-based educational show targeting recent advances across a number of industries and economies. Featuring state-of-the-art solutions and important issues facing today's consumers and business professionals, Advancements focuses on cutting-edge developments, and brings this information to the public with the vision to enlighten about how technology and innovation continue to transform our world.
Backed by experts in various fields, DMG Productions is dedicated to education and advancement, and to consistently producing commercial-free, educational programming on which both viewers and networks depend.
For more information, please visit http://www.AdvancementsTV.com or call Richard Lubin at 866-496-4065.
Media Contact
Sarah McBrayer, DMG Productions, 866-496-4065, info@advancementstv.com
SOURCE Advancements with Ted Danson
Symposium aimed at leveraging the power of data science for promoting diversity – Penn State News
UNIVERSITY PARK, Pa. - Data science can be a useful tool and powerful ally in enhancing diversity. A group of data scientists is holding "Harnessing the Data Revolution to Enhance Diversity," a symposium aimed at discussing the issues, identifying opportunities and initiating the next steps toward improving equity and diversity in academia at the undergraduate and faculty levels.
The online event, organized by the Institute for Computational and Data Sciences and co-sponsored by the Office of the Vice Provost for Educational Equity and the Center for Social Data Analytics, will be held from 1 to 3:30 p.m. on March 16 and 17, and is scheduled to include 10 30-minute talks and a roundtable discussion. Organizers added that the event is designed to help form new collaborations and identify cutting-edge approaches that can enhance diversity at Penn State and in higher education across the country.
"This symposium will bring together researchers from across the computational and social sciences to explore how we can build more diverse communities of researchers that are sensitive to how computational and data science can shape how diverse populations are impacted by change," said Jenni Evans, professor of meteorology and atmospheric science and ICDS director.
Speakers from across the U.S. will discuss issues ranging from quantifying and contextualizing diversity-related issues to examining approaches that have and haven't worked in academia.
Ed O'Brien, associate professor of chemistry and ICDS co-hire, said data science offers several tools to promote diversity, equity and inclusion.
"This symposium is bringing together diverse academic communities to explore how data science can be utilized to enhance diversity, equity and inclusion," said O'Brien. "Leveraging advances in big data and artificial intelligence holds the promise of complementing and accelerating a range of initiatives in this area."
Some of the topics the speakers and participants will address include how to identify a diversity-related goal, how to quantify and contextualize the challenge of increasing diversity, and analyzing approaches that have and have not worked.
Find out more and register for the symposium at https://icds.psu.edu/diversity.
Last Updated February 17, 2021
How Intel Employees Volunteered Their Data Science Expertise To Help Costa Rica Save Lives During the Pandemic – CSRwire.com
Published 02-19-21
Submitted by Intel Corporation
We Are Intel
What do you do when a terrifying pandemic that has shaken the globe threatens to overwhelm your country? For Intel employees in Costa Rica, the answer was to offer their problem-solving expertise, and over 1,000 hours of highly technical work, to help the government develop an effective response plan.
In the early days of the pandemic, the uncertainty of how quickly the virus would spread and how severely it would impact communities made healthcare availability and resources a top concern. Feeling they could use their technical expertise to help, a group of Intel employees reached out to the Caja Costarricense de Seguro Social (CCSS), Costa Rica's main agency responsible for its public health sector.
Luis D. Rojas, one of the volunteer co-leads for the Intel team, laid the groundwork with CCSS to understand where help was needed, and with that, the team quickly began iterating on a statistical model to project anticipated demand for hospital beds and ICU capacity. With their combined expertise in data science, statistical process control, and machine learning system deployment, the team was able to pool their areas of knowledge to present their model and recommendations to the CCSS agency, and even the President. Ultimately, their project became one of the key modeling systems used by the government to inform their pandemic response.
Luis and the rest of the Costa Rica team provided what's called skills-based volunteering, in which volunteers apply the skills they have the most expertise in to help address community challenges.
"What motivated me to help was my parents," said Jonathan Sequeira Androvetto, data scientist and volunteer co-lead. "Both of my parents are in the high-risk population, and knowing that what I was doing was helping my country and my family was incredible. Volunteering in this way gave me a lot of positive energy; it was recharging."
Jonathan and the other volunteers used their expertise in data science and statistics to help the government understand how their containment policies would affect virus reproduction rates and potential hospital and ICU utilization. The team also developed a dashboard intended to be shared with local governments to summarize the state of their cities in terms of how the pandemic is behaving. The dashboard includes metrics such as the growth rates of certain reproduction rates (R) / active cases in respective cities, as well as a 21-day projection of new cases / active cases, if the R trend is sustained in the near future.
"You get back more than you give," shared volunteer co-lead and machine learning engineer Jenny Peraza, on why she went above and beyond to apply her skillset to supporting the government's pandemic response. "When I began this project, it was from a place of intellectual curiosity. It evolved into so much more, and it's been incredibly gratifying to be able to put my process excellence and statistical knowledge towards successfully containing and managing this problem."
The Costa Rica team proved that a coordinated volunteer effort, especially one that maximizes specific skills, can make an increased impact on local and national communities. From a small idea to a national effort, the team was able to come together to help a community much larger than themselves.
When the pandemic is over, both Jenny and Jonathan plan to continue their skills-based volunteering. Jonathan shared, "Giving back to my country in this way has been phenomenal, and the positive energy and relationships I've made through the course of this project, both within and outside of Intel, have meant so much." Added Jenny, "There are a wealth of resources and opportunities out there; sometimes it's just a matter of finding the right fit for you to apply your knowledge and skills to help make your community stronger."
At Intel, corporate responsibility means doing what is right. Respecting people and the world around us. It's how we do business.
A Comprehensive Guide to Scikit-Learn – Built In
Scikit-learn is a powerful machine learning library that provides a wide variety of modules for data access, data preparation and statistical model building. It has a good selection of clean toy data sets that are great for people just getting started with data analysis and machine learning. Even better, easy access to these data sets removes the hassle of searching for and downloading files from an external data source. The library also enables data processing tasks such as imputation, data standardization and data normalization. These tasks can often lead to significant improvements in model performance.
Scikit-learn also provides a variety of packages for building linear models, tree-based models, clustering models and much more. It features an easy-to-use interface for each model object type, which facilitates fast prototyping and experimentation with models. Beginners in machine learning will also find the library useful since each model object is equipped with default parameters that provide baseline performance. Overall, Scikit-learn provides many easy-to-use modules and methods for accessing and processing data and building machine learning models in Python. This tutorial will serve as an introduction to some of its functions.
Scikit-learn provides a wide variety of toy data sets, which are simple, clean, sometimes fictitious data sets that can be used for exploratory data analysis and building simple prediction models. The ones available in Scikit-learn can be applied to supervised learning tasks such as regression and classification.
For example, it has a set called the iris data, which contains information corresponding to different types of iris plants. Users can employ this data for building, training and testing classification models that can classify types of iris plants based on their characteristics.
Scikit-learn also has a Boston housing data set, which contains information on housing prices in Boston. This data is useful for regression tasks like predicting the dollar value of a house. Finally, the handwritten digits data set is an image data set that is great for building image classification models. All of these data sets are easy to load using a few simple lines of Python code.
To start, let's walk through loading the iris data. We first import the pandas and numpy packages, relax the display limits on the columns and rows, load the iris data from Scikit-learn into a pandas data frame, and finally print the first five rows of data using the head() method.
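A minimal sketch of these steps, assuming a Scikit-learn release that still ships the classic toy data sets:

```python
import pandas as pd
import numpy as np
from sklearn.datasets import load_iris

# Relax pandas display limits so the full data frame prints
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

# Load the iris data and store it in a pandas data frame
iris = load_iris()
df_iris = pd.DataFrame(iris.data, columns=iris.feature_names)
df_iris['target'] = iris.target

# Print the first five rows
print(df_iris.head())
```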
We can repeat this process for the Boston housing data set. To do so, let's wrap our existing code in a function that takes a Scikit-learn data set as input.
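One way such a helper might look (get_data_frame is an illustrative name):

```python
def get_data_frame(dataset):
    """Convert a Scikit-learn toy data set (a 'Bunch' object) into a pandas data frame."""
    df = pd.DataFrame(dataset.data, columns=dataset.feature_names)
    df['target'] = dataset.target
    print(df.head())
    return df
```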
We can call this function with the iris data to confirm that it gives the same output as before. Now that we see that our function works, let's import the Boston housing data and call our function with that data as well.
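Both calls might look like this; note that load_boston was deprecated in Scikit-learn 1.0 and removed in 1.2, so an older release is assumed here:

```python
from sklearn.datasets import load_iris, load_boston

df_iris = get_data_frame(load_iris())       # same output as before
df_boston = get_data_frame(load_boston())   # Boston housing data
```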
Finally, let's load the handwritten digits data set, which contains images of handwritten digits from zero through nine. Since this is an image data set, it's neither necessary nor useful to store it in a data frame. Instead, we can display the first five digits in the data using the visualization library matplotlib.
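A small plotting helper along these lines would do the job (plot_digits is an illustrative name):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits

def plot_digits(digits):
    """Display the first five 8x8 digit images with their labels."""
    fig, axes = plt.subplots(1, 5, figsize=(10, 3))
    for ax, image, label in zip(axes, digits.images[:5], digits.target[:5]):
        ax.imshow(image, cmap='gray_r')
        ax.set_title(f'Label: {label}')
        ax.axis('off')
    plt.show()
```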
And if we call our function with load_digits(), we get the first five digits displayed as images.
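Using the helper sketched above, the call is a one-liner:

```python
plot_digits(load_digits())
```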
I can't overstate the ease with which a beginner in the field can access these toy data sets. These sets allow beginners to quickly get their feet wet with different types of data and use cases such as regression, classification and image recognition.
Scikit-learn also provides a variety of methods for data processing tasks. First, let's take a look at data imputation, which is the process of replacing missing data. It is important because real data often contains inaccurate or missing elements, which can result in misleading results and poor model performance.
Being able to accurately impute missing values is a skill that both data scientists and industry domain experts should have in their toolbox. To demonstrate how to perform data imputation using Scikit-learn, we'll work with the University of California, Irvine's data set on household electric power consumption, available from the UCI Machine Learning Repository. Since the data set is quite large, we'll take a random sample of 40,000 records for simplicity and store the down-sampled data in a separate csv file called hpc.csv.
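One possible way to produce that file; the raw file name and the semicolon separator are assumptions based on how the UCI data set is usually distributed:

```python
import pandas as pd

# The raw UCI file is semicolon-separated and marks missing values with '?'
power = pd.read_csv('household_power_consumption.txt', sep=';', low_memory=False)

# Take a random sample of 40,000 records and save it for later use
power.sample(n=40000, random_state=42).to_csv('hpc.csv', index=False)

df = pd.read_csv('hpc.csv')
print(df.head())
```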
As we can see, the third row (second index) contains missing values specified by ? and NaN. The first thing we can do is replace the ? values with NaN values. Let's demonstrate this with Global_active_power.
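Something like the following would work:

```python
import numpy as np

df['Global_active_power'] = df['Global_active_power'].replace('?', np.nan)
```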
We can repeat this process for the rest of the columns.
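A simple loop handles this:

```python
# Replace '?' with NaN in every column of the data frame
for col in df.columns:
    df[col] = df[col].replace('?', np.nan)
```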
Now, to impute the missing values, we import the SimpleImputer method from Scikit-learn and define an imputer object that simply imputes the mean for missing values.
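For example:

```python
from sklearn.impute import SimpleImputer

imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
```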
We can then fit the imputer to the columns with missing values, store the result in a data frame, add back the additional Date and Time columns, and print the first five rows of our new data frame.
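A sketch of those steps; the list of numeric columns is inferred from the layout of the UCI data set:

```python
# Numeric columns that may contain missing values
num_cols = ['Global_active_power', 'Global_reactive_power', 'Voltage',
            'Global_intensity', 'Sub_metering_1', 'Sub_metering_2', 'Sub_metering_3']

# The imputer needs numeric input, so cast the (string-typed) columns to float first
imputed = imputer.fit_transform(df[num_cols].astype(float))

# Store the result in a data frame and add back the Date and Time columns
df_imputed = pd.DataFrame(imputed, columns=num_cols)
df_imputed[['Date', 'Time']] = df[['Date', 'Time']].reset_index(drop=True)

# Print the first five rows of the new data frame
print(df_imputed.head())
```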
As we can see, the missing values have been replaced.
Although Scikit-learn's SimpleImputer isn't the most sophisticated imputation method, it removes much of the hassle around building a custom imputer. This simplicity is useful for beginners who are dealing with missing data for the first time. Further, it serves as a good demonstration of how imputation works. By introducing the process, it can motivate more sophisticated extensions of this type of imputation, such as using a statistical model to replace missing values.
Data standardization and normalization are also easy with Scikit-learn. Both are useful in machine learning methods that involve calculating a distance metric, like K-nearest neighbors and support vector machines. They're also useful in cases where we can assume the data are normally distributed and for interpreting coefficients in linear models as measures of variable importance.
Standardization is the process of subtracting values in numerical columns by the mean and scaling to unit variance (through dividing by the standard deviation). Standardization is necessary in cases where a wide range of numerical values may artificially dominate prediction outcomes.
Let's consider standardizing the Global_intensity column in the power consumption data set. This column has values ranging from 0.2 to 36. First, let's import the StandardScaler() method from Scikit-learn.
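A minimal sketch, applied to the imputed data frame from above:

```python
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()

# Subtract the mean and divide by the standard deviation
df_imputed['Global_intensity_scaled'] = scaler.fit_transform(
    df_imputed[['Global_intensity']]).ravel()

print(df_imputed['Global_intensity_scaled'].describe())
```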
Data normalization scales a numerical column such that its values are between 0 and 1. Normalizing data using Scikit-learn follows similar logic to standardization. Let's apply the normalizer method to the Sub_metering_2 column.
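A sketch of that step; note the comment on what Normalizer actually does when applied to a single column:

```python
from sklearn.preprocessing import Normalizer

# Normalizer rescales each *row* to unit norm, so on a one-column input every
# nonzero value becomes 1.0 and zeros stay 0. For classic min-max scaling of a
# column into [0, 1], MinMaxScaler is the more common choice.
normalizer = Normalizer()
sub_metering_2 = normalizer.fit_transform(df_imputed[['Sub_metering_2']]).ravel()

print(sub_metering_2.min(), sub_metering_2.max())
```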
Now we see that the min and max are 1.0 and 0.
In general, you should standardize data if you can safely assume it's normally distributed. Conversely, if you can safely assume that your data isn't normally distributed, then normalization is a good method for scaling it. Given that these transformations can be applied to numerical data with just a few lines of code, the StandardScaler() and Normalizer() methods are great options for beginners dealing with data fields that have widely varying values or data that isn't normally distributed.
Scikit-learn also has methods for building a wide array of statistical models, including linear regression, logistic regression and random forests. Linear regression is used for regression tasks. Specifically, it works for the prediction of continuous output like housing prices, for example. Logistic regression is used for classification tasks in which the model predicts binary or multiclass output, like predicting iris plant type based on characteristics. Random forests can be used for both regression and classification. We'll walk through how to implement each of these models using the Scikit-learn machine learning library in Python.
Linear regression is a statistical modeling approach in which a linear function represents the relationship between input variables and a scalar response variable. To demonstrate its implementation in Python, let's consider the Boston housing data set. We can build a linear regression model that uses age as an input for predicting the housing value. To start, let's define our input and output variables.
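Using the Boston data frame built earlier (its 'target' column holds the median home value):

```python
X = df_boston[['AGE']]      # proportion of owner-occupied units built prior to 1940
y = df_boston['target']     # median home value
```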
Next, let's split our data for training and testing, import the linear regression module from Scikit-learn, and finally train, test and evaluate the performance of our model using R^2 and RMSE.
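A sketch of those steps:

```python
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_squared_error

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

lr = LinearRegression()
lr.fit(X_train, y_train)
y_pred = lr.predict(X_test)

print('R^2: ', r2_score(y_test, y_pred))
print('RMSE:', np.sqrt(mean_squared_error(y_test, y_pred)))
```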
Since we use one variable to predict a response, this is a simple linear regression. But we can also use more than one variable in a multiple linear regression. Let's build a linear regression model with age (AGE), average number of rooms (RM), and pupil-to-teacher ratio (PTRATIO). All we need to do is redefine the input X.
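For example, keeping the data frame and column names from above:

```python
X = df_boston[['AGE', 'RM', 'PTRATIO']]
```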
Retraining the model with these inputs gives an improvement in performance.
Linear regression is a great method to use if you're confident that there is a linear relationship between input and output. It's also useful as a benchmark against more sophisticated methods like random forests and support vector machines.
Logistic regression is a simple classification model that predicts binary or even multiclass output. The logic for training and testing is similar to linear regression.
Let's consider the iris data for our Python implementation of a logistic regression model. We'll use sepal length (cm), sepal width (cm), petal length (cm) and petal width (cm) to predict the type of iris plant.
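A sketch using the iris data frame from earlier; raising max_iter is my own choice here to avoid convergence warnings with the default solver:

```python
from sklearn.linear_model import LogisticRegression

features = ['sepal length (cm)', 'sepal width (cm)',
            'petal length (cm)', 'petal width (cm)']
X = df_iris[features]
y = df_iris['target']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

logit = LogisticRegression(max_iter=1000)
logit.fit(X_train, y_train)
y_pred = logit.predict(X_test)
```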
We can evaluate and visualize the model performance using a confusion matrix.
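One way to compute and plot it, using Scikit-learn's ConfusionMatrixDisplay:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

cm = confusion_matrix(y_test, y_pred)
ConfusionMatrixDisplay(confusion_matrix=cm,
                       display_labels=load_iris().target_names).plot()
plt.show()
```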
We see that the model correctly captures all of the true positives across the three iris plant classes. Similar to linear regression, logistic regression depends on a linear sum of inputs used to predict each class. As such, logistic regression models are referred to as generalized linear models. Given that logistic regression models a linear relationship between input and output, it's best employed when you know that there is a linear relationship between input and class membership.
Random forests, also called random decision trees, is a statistical model for both classification and regression tasks. Random forests are basically a set of questions and answers about the data organized in a tree-like structure.
These questions split the data into subgroups so that the data in each successive subgroup are most similar to each other. For example, say we'd like to predict whether or not a borrower will default on a loan. A question that we can ask using historical lending data is whether or not the customer's credit score is below 700. The data that falls into the yes bucket will have more customers who default than the data that falls into the no bucket.
Within the yes bucket, we can further ask if the borrower's income is below $30,000. Presumably, the yes bucket here will have an even greater percentage of customers who default. Decision trees continue asking statistical questions about the data until achieving maximal separation between the data corresponding to those who default and those who don't.
Random forests extend decision trees by constructing a multitude of them. In each of these trees, we ask statistical questions on random chunks and different features of the data. For example, one tree may ask about age and credit score on a fraction of the train data. Another may ask about income and gender on a separate fraction of the training data, and so forth. Random forest then performs consensus voting across these decision trees and uses the majority vote for the final prediction.
Implementing a random forests model for both regression and classification is straightforward and very similar to the steps we went through for linear regression and logistic regression. Let's consider the regression task of predicting housing prices using the Boston housing data. All we need to do is import the random forest regressor module, initiate the regressor object, fit, test and evaluate our model.
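A sketch of a random forest regressor, reusing the same three inputs as before (that particular feature choice is an assumption on my part):

```python
from sklearn.ensemble import RandomForestRegressor

X = df_boston[['AGE', 'RM', 'PTRATIO']]
y = df_boston['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

rf = RandomForestRegressor()      # default parameters
rf.fit(X_train, y_train)
y_pred = rf.predict(X_test)

print('R^2: ', r2_score(y_test, y_pred))
print('RMSE:', np.sqrt(mean_squared_error(y_test, y_pred)))
```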
We see a slight improvement in performance compared to linear regression.
The random forest object takes several parameters that can be modified to improve performance. The three I'll point out here are n_estimators, max_depth and random_state. You can check out the documentation for a full description of all random forest parameters.
The parameter n_estimators is simply the number of decision trees that the random forest is made up of. Max_depth measures the longest path from the first question to a question at the base of the tree. Random_state is how the algorithm randomly chooses chunks of the data for question-asking.
Since we didn't specify any values for these parameters, the random forest module automatically selects a default value for each parameter. The default value for n_estimators is 10, which corresponds to 10 decision trees. The default value for max_depth is None, which means there is no cut-off for the length of the path from the first question to the last question at the base of the decision tree. This can be roughly understood as the limit on the number of questions we ask about the data. The default value for random_state is None. This means that, upon each model run, different chunks of data will be randomly selected and used to construct the decision trees in the random forests. This will result in slight variations in output and performance.
Despite using default values, we achieve pretty good performance. This accuracy demonstrates the power of random forests and the ease with which the data science beginner can implement an accurate random forest model.
Let's see how to specify n_estimators, max_depth and random_state. We'll choose 100 estimators, a max depth of 10 and a random state of 42.
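Only the constructor call changes:

```python
rf = RandomForestRegressor(n_estimators=100, max_depth=10, random_state=42)
rf.fit(X_train, y_train)
y_pred = rf.predict(X_test)

print('R^2: ', r2_score(y_test, y_pred))
print('RMSE:', np.sqrt(mean_squared_error(y_test, y_pred)))
```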
We see that we get a slight improvement in both RMSE and R^2. Further, specifying random_state makes our results reproducible, since it ensures the same random chunks of data are used to construct the decision trees.
Applying random forest models to classification tasks is very straightforward. Let's do this for the iris classification task.
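A sketch of the classifier version, reusing the iris features defined earlier and printing the confusion matrix at the end:

```python
from sklearn.ensemble import RandomForestClassifier

X = df_iris[features]
y = df_iris['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

rf_clf = RandomForestClassifier(n_estimators=100, random_state=42)
rf_clf.fit(X_train, y_train)
y_pred = rf_clf.predict(X_test)

print(confusion_matrix(y_test, y_pred))
```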
And the corresponding confusion matrix is just as accurate.
Random forests are a great choice for building a statistical model since they can be applied to a wide range of prediction use cases. This includes classification, regression and even unsupervised clustering tasks. It's a fantastic tool that every data scientist should have in their back pocket. In the context of Scikit-learn, random forests are extremely easy to implement and modify for improvements in performance. This enables fast prototyping and experimentation with models, which leads to accurate results faster.
Finally, all the code in this post is available on GitHub.
Overall, Scikit-learn provides many easy-to-use tools for accessing benchmark data, performing data processing, and training, testing and evaluating machine learning models. All of these tasks require relatively few lines of code, making the barrier to entry for beginners in data science and machine learning research quite low. Users can quickly access toy data sets and familiarize themselves with different machine learning use cases (classification, regression, clustering) without the hassle of finding a data source, downloading and then cleaning the data. Upon becoming familiar with different use cases, the user can then easily port over what they've learned to more real-life applications.
Further, new data scientists unfamiliar with data imputation can quickly pick up how to use the SimpleImputer package in Scikit-learn and implement some standard methods for replacing missing or bad values in data. This can serve as the foundation for learning more advanced methods of data imputation, such as using a statistical model for predicting missing values. Additionally, the standard scaler and normalizer methods make data preparation for advanced models like neural networks and support vector machines very straightforward. This is often necessary in order to achieve satisfactory performance with more complicated models like support vector machines and neural networks.
Finally, Scikit-learn makes building a wide variety of machine learning models very easy. Although I've only covered three in this post, the logic for building other widely used models, such as support vector machines and K-nearest neighbors, is very similar. It is also very suitable for beginners who have limited knowledge of how these algorithms work under the hood, given that each model object comes with default parameters that give baseline performance. Whether the task is model benchmarking with toy data, preparing and cleaning data, or evaluating model performance, Scikit-learn is a fantastic tool for building machine learning models for a wide variety of use cases.