Category Archives: Data Science

LLNL’s Winter Hackathon Highlights Data Science Talks and Tutorial – HPCwire

March 29, 2021. The Data Science Institute (DSI) sponsored LLNL's 27th hackathon on February 11-12. Held four times a year, these seasonal events bring the computing community together for a 24-hour period where anything goes: participants can focus on special projects, learn new programming languages, develop skills, dig into challenging tasks, and more. The winter hackathon was the DSI's second such sponsorship. Organizers were data scientist Ryan Dana, postdoctoral researcher Sarah Mackay, and DSI administrator Jennifer Bellig. DSI director Michael Goldman opened the event by noting, "Hackathons are great opportunities to explore new ideas and make connections with other staff, and to both innovate and learn."

In a new twist to the typical hackathon schedule, organizers offered four optional presentations showcasing data science techniques in COVID-19 drug discovery, inertial confinement fusion, central nervous system modeling, and querying of massive graphs. Participants could also choose to attend an introductory tutorial on deep learning (DL) for image classification. Goldman noted, "Almost every program area at the Lab has some type of data science element. The hackathon is one way to help build that community."

Team and individual presentations at the end of the 24-hour period featured a range of projects. Lisa Hughey, a data analytics applications developer, used the time to learn R Shiny and build an interactive web application. Former hackathon organizer Geoff Cleary experimented with packaging Python applications, while Enterprise Application Services developers Brinda Jana and Yunki Paik continued a previous hackathon project to track radio hazardous waste material.

Tutorial Teamwork

Data scientists Cindy Gonzales and Luke Jaffe ran the two-hour DL tutorial, which explained how to perform multi-class image classification in Python using the PyTorch library. Image classification is a problem in computer vision in which a model recognizes an image and outputs a label for it. This process can play an important role in a variety of mission-relevant scenarios such as chemical detection, remote sensing, optics inspections, and disease diagnosis.

"We designed the material so participants wouldn't need to know anything about deep learning, machine learning in general, or computer vision," said Jaffe, who works in LLNL's Global Security Computing Applications Division (GS-CAD). "We expected some level of comfort with Python, and provided links where participants could learn more about the machine learning theory we covered."

The team provided sample code via Jupyter Notebook and first walked attendees through importing packages and setting up constants and image display utility functions. Next, the tutorial explored working with images as arrays and tensors (i.e., how a computer sees an image in order to classify it) using the CIFAR-10 dataset, which contains images of airplanes, cars, birds, cats, and other vehicles and animals.
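The "image as array" idea can be sketched in a few lines. This is an illustrative example rather than the tutorial's notebook code; it uses NumPy and a random array in place of a real CIFAR-10 image:

```python
import numpy as np

# A CIFAR-10 image is a 32x32 grid of pixels with 3 color channels (RGB),
# stored as unsigned 8-bit integers in the range 0-255.
image = np.random.randint(0, 256, size=(32, 32, 3), dtype=np.uint8)

# Deep learning frameworks such as PyTorch expect a float tensor in
# channels-first (C, H, W) layout, usually scaled to [0, 1].
tensor = image.astype(np.float32).transpose(2, 0, 1) / 255.0

print(tensor.shape)  # (3, 32, 32)
```

(PyTorch's torchvision applies an equivalent conversion automatically via its ToTensor transform.)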

Gonzales and Jaffe went on to describe the concepts behind neural networks, logistic regression to optimize classification accuracy, and different types of gradient descent algorithms. The tutorial included step-by-step instructions for using PyTorch to load data and create, train, and test the DL model.
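The tutorial's hands-on material used PyTorch; as a framework-free sketch of the logistic regression and gradient descent ideas it covered, here is a minimal binary classifier trained with batch gradient descent on invented one-dimensional data (everything below is illustrative, not the tutorial's code):

```python
import math
import random

random.seed(0)

# Toy 1-D binary classification data: class 1 if x > 0, else class 0.
xs = [random.uniform(-1, 1) for _ in range(200)]
ys = [1 if x > 0 else 0 for x in xs]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

w, b, lr = 0.0, 0.0, 0.5
for epoch in range(500):  # batch gradient descent: one update per full pass
    grad_w = grad_b = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)   # predicted probability of class 1
        grad_w += (p - y) * x    # gradient of the cross-entropy loss
        grad_b += (p - y)
    w -= lr * grad_w / len(xs)
    b -= lr * grad_b / len(xs)

correct = sum((sigmoid(w * x + b) > 0.5) == (y == 1) for x, y in zip(xs, ys))
print(f"accuracy = {correct / len(xs):.2f}")
```

Stochastic and mini-batch gradient descent, the other variants the tutorial mentioned, differ only in how many examples contribute to each update.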

Both tutorial leaders are expanding their data science skills on the job. Gonzales came to the Lab as an administrator in 2016 and later changed careers with the help of LLNL's Education Assistance Program (EAP) and Data Science Immersion Program. Now a GS-CAD data scientist, she is pursuing a master's degree in data science via a Johns Hopkins distance-learning program. Jaffe was a Lab intern who was hired full time in 2016 after earning undergraduate and graduate degrees in computer engineering from Northeastern University. He is now using the EAP to fund PhD studies in computer vision at UC Berkeley. The team hopes to present their tutorial again to the Lab's incoming summer interns.

Continuity Is Crucial

The Lab has held four hackathons virtually since the COVID-19 pandemic began, and Goldman emphasized the importance of continuing the event. "We've been out of the office for almost a year. Several new staff haven't been onsite or met colleagues in person, so virtual events are crucial," he stated.

Although online attendance has not been as high as with in-person hackathons, this winter event saw a steady participation of 30-35 hackers throughout. Bellig said, "As much as I missed the energy of an in-person hackathon, I was quite impressed with all the people who participated virtually and, once again, with the presentations and hacking accomplishments of my fellow employees."

These circumstances haven't dampened enthusiasm for the event. Dana, who joined GS-CAD in January 2020, volunteered to help organize the event even though he had not attended a previous hackathon. "I wanted to learn more about how data science is applied throughout the Lab, and network with some of the incredible talent and research that is being done," he said. Gonzales added, "I am definitely interested in participating in future hackathons."

Source: LLNL


Time to insight over precision – Tableau brings data science to business users – Diginomica


When Salesforce announced its $15.7 billion acquisition of Tableau in 2019, Salesforce CEO Marc Benioff said that the company intends to "bring data literacy to everyone in business". And we are beginning to get a sense of what this will look like for the now combined companies, with the latest Business Science release from Tableau.

The acquisition also prompted questions over whether, and how much, Tableau's BI capabilities would overlap with Salesforce's AI product, Einstein. The Business Science announcement from Tableau last week, which aims to put data science in the hands of business users, sheds some light on this question too - as it seemingly showcases how both companies can jointly bring the best of both worlds to the table.

We spoke with Andrew Beers, Chief Technology Officer at Tableau, about Business Science, which brings Einstein Discovery to Tableau's 2021.1 release later this month. The company said that integrating Einstein Discovery into the Tableau platform will help business users go beyond understanding what happened and why it happened, to explore likely outcomes and inform predictive action.

But the most interesting part about the intentions behind the product is that Tableau is urging business users to recognize that there is a trade-off between time to insight and precision - where the latter requires heavy investments in data science teams and tooling. Simply put, if you can get there faster, even if it's not 100% accurate, then there could well be benefits there.

Beers says that the COVID-19 pandemic has amplified the need for business users to adopt more sophisticated data science tools, as data is at the centre of driving change across an organisation. In this context, he says:

The challenge with any democratization effort around data I think is just encouraging companies to pull together the right data. We've done a lot to make data visible to the analyst; we've done a lot to make data sort of workable with the analyst. But getting that data pulled together is always challenge number one - Tableau has helped with that by making the data visible within the organisation, making it discoverable. Tableau was built for the business user, but that landscape has just expanded over the 16 years that we've been selling software.

So we think business science is a natural next step for us. Companies are very, very interested because it is about helping people make decisions in context and bringing business context into decisions.

Tableau says that Business Science is being driven by demand from business users that want data science capabilities, but don't necessarily have the resources or time to support it for every use case. Beers explains:

A lot of companies are starting to reach for those data science tools to improve decision making. And of course there's all kinds of challenges with that, like not everybody's got a data scientist. Data scientists are relatively rare. And so, I may not have one, or I may have data scientists and they're not necessarily gonna be focused on my problem. There's a lot of examples in the business domain, where I need some predictive power, where precision is not necessarily required, but something that is directionally correct is required because it's going to be injected into this place with all kinds of business context.

And so that's why we think there's this opportunity to democratize advanced analytics, in particular putting some of these data science techniques into the hands of business experts.

Beers says that traditional BI focuses on having a bunch of historical data, around something like the sales of products that your company is making. BI would allow a user to look at that data and slice it and dice it in a variety of ways - who's selling it, who's buying it, what do you know about the customer, etc.

However, sometimes the information that you really need to get in front of your sales team is: how likely is this person to renew this service? This more predictive approach allows said user to prioritize their work for the day. Beers adds:

Does that need to be a super precise model? Probably not. It's got to be directionally correct, but it doesn't necessarily have to be precise.

We think that the trade off there is getting the ability for the business experts to build these models through discovery and change them as the conditions of the business change. And then getting that next version of the model out and into the hands of the consumers, which in this case is the sales team. We think that that ends up being a lot more important for these kinds of problems than let's say going through a rigorous redesign of a data science project.

Beers outlines that the Business Science announcement highlights how Salesforce and Tableau are bringing the "full power" of both platforms together. In this case, inside the Tableau analytics products, users will be able to write some relatively simple calculations to call out to the Einstein Discovery models, which will allow users to bring some predictive insights to their Tableau environment. This moves the needle for Tableau's user base, which has typically relied on historical data for insights. Beers says:

The models aren't being prescribed by Tableau. The models are built by the customer using Einstein Discovery. If you've got some sort of KPI that you're trying to maximise or minimise, Discovery is very good at building models that can predict where things are going based on your historical data. And then if you've told it 'here are the things that I can control, here are the things I can't control', then it can say 'well, by controlling this variable, we think you're gonna affect the outcome in this way'.

To make use of Business Science you have to be both a Tableau and a Salesforce customer, as you need access to both platforms. Beers says that there will be a lot of upside here to both companies, as there are a lot of Salesforce customers that Tableau hasn't acquired and vice versa. And we get the distinct impression that this is a sign of the thinking behind how the companies are planning to progress as a combined entity. Beers adds:

Salesforce has long had this message on helping companies go through digital transformation, this has been their message for years. And at the heart of any digital transformation is data. That's one of the driving reasons why they wanted to bring us into the fold, because they realise that data is at the heart of all these things.

In terms of what we are prioritizing, we've got a lot of irons in the fire, absolutely. We're definitely gonna get better with the data in the Salesforce ecosystem. There's a lot of data there, that now we're part of the company, we're going to get some great access to. And then both companies are going to be leveraging each other's assets. And this release is a great example - we're leveraging the Einstein Discovery assets, bringing that together, expanding it to a bunch of new users.

It's good to see the fruits of the Salesforce/Tableau acquisition being brought to market. This will be particularly interesting for Tableau customers, which could see Einstein bringing more predictive prowess to their traditional analytics platforms. Analysing historical data that's static isn't as powerful as putting it to predictive use. However, as ever, the proof will be in the customer stories and the use cases - which we will be chasing to get our hands on.


Data, Science, and Journalism in the Age of COVID – Pulitzer Center on Crisis Reporting

In a year defined by a pandemic, journalists relied on data and science to tell the story that has impacted every corner of the globe. Please join Northwestern University in Qatar (NU-Q) and the Pulitzer Center on Tuesday, April 6, at 5:30pm AST (10:30am EDT) for a conversation with three Pulitzer Center grantees whose work over the past year has set a very high standard for the profession.

Youyou Zhou is a New York-based freelance data journalist working with graphics and code. She produces data-driven, visual, interactive, and experimental journalism that breaks free of words-based formats.

Charles Piller writes investigative stories for Science. He previously worked as an investigative journalist for STAT, the Los Angeles Times, and The Sacramento Bee, and has reported on public health, biological warfare, infectious disease outbreaks, and other topics from the United States, Africa, Asia, Europe, and Central America.

Eliza Barclay is a science and health editor at Vox. Formerly, she was a reporter and editor at NPR, and most recently edited the food blog The Salt. As editor of The Salt, she received a James Beard Award, a Gracie Award, and an Association of Food Journalists Award.

This event will be moderated by Tom Hundley, senior editor at the Pulitzer Center.

NU-Q is part of the Pulitzer Center's Campus Consortium network. To learn more about NU-Q and the Campus Consortium, click here.


The Future of AI: Careers in Machine Learning – Southern New Hampshire University

The robots are coming. If there is one thing we learned from the COVID-19 pandemic, it's that when humans are sent home, machines keep working.

This doesn't mean that robots will take over the world. It does, however, mean that our technical landscape is changing.

Human history has a long and favorable track record of technological advancements, particularly when it comes to ideas that seem ludicrous at the time (Wright brothers, anyone?). The printing press, assembly line and personal computer have all helped move civilization forward by leaps and bounds over the last few centuries.

Imagine being one of the first people to replace glasses with contact lenses by putting them directly on their eyes, no less. Henry Ford replaced horses with the automobile as our main mode of transportation. The process of pasteurization changed the way we eat. Examples like these are endless, because throughout human history, there has been innovation and change.

Even as recently as the 1980s, there was no internet in people's homes. The very means by which you are reading this article did not exist. Online school did not exist, at least not in the way we take college classes online now.

And while each technological advancement may have its detractors, its hard to argue with the benefits of technology as a whole. After all, thinking big got us to the moon, and gave us television, 3-D printing and a host of incredible advances in modern medicine.

So, are you wondering what's next? The future of technology lies squarely with machine learning and with artificial intelligence, known as AI.

Artificial intelligence is part of the field of data science. People who work in data science are skilled in developing mathematical algorithms to answer complex questions. When, for example, a company like Netflix wants to predict what movies a customer might want to watch next, a data scientist will create an algorithm based on that customer's viewing history. Then, they will use that algorithm to offer a list of suggestions.
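That recommendation idea can be caricatured in a few lines of Python. The users, titles, and the crude overlap-based similarity score below are all invented for illustration; real recommender systems are far more sophisticated:

```python
# Recommend titles by overlap with what similar users watched.
# All users and titles here are invented for illustration.
history = {
    "alice": {"Stranger Things", "Dark", "Black Mirror"},
    "bob":   {"Dark", "Black Mirror", "The OA"},
    "carol": {"The Crown", "Bridgerton"},
}

def recommend(user):
    watched = history[user]
    scores = {}
    for other, titles in history.items():
        if other == user:
            continue
        overlap = len(watched & titles)   # simple similarity: shared titles
        for title in titles - watched:    # candidate: seen by other, not by user
            scores[title] = scores.get(title, 0) + overlap
    return sorted(scores, key=scores.get, reverse=True)

print(recommend("alice"))  # "The OA" ranks first
```

Here "The OA" is recommended first for alice because bob, the user whose history overlaps hers most, watched it.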

"Machine learning is a branch of data science which involves using data science programs that can adapt based on experience," said Ben Tasker, technical program facilitator of data science and data analytics at Southern New Hampshire University. "Take a weather predictor, for example. The more weather inputs there are, the better the prediction for what will come next."

While machine learning is useful, its important to note that there is no artificial intelligence involved in its functions. Machine learning involves rote mathematical or mechanical processes only.

Artificial intelligence then advances data science and machine learning even further.

Whereas machine learning can make predictions, artificial intelligence can make adjustments to its computations. "In other words, AI can adjust a program to execute tasks smartly," Tasker said. A fully autonomous, self-driving car, for example, would use full artificial intelligence.

These days, the idea of such a self-driving car is no longer science fiction. As the fields of science and engineering continue to advance, "artificial intelligence is becoming a lot less artificial and a lot more intelligent," Tasker said.

Because so much about the field of data science in general and AI in particular is new, there are many opportunities to make your own niche, "especially now that many companies have started to invest in the idea of artificial intelligence," Tasker said. This creates a wealth of career opportunities for those who thrive on charting their own path. The future of AI is great.

Careers for computer information and research scientists are predicted to grow 15% between now and 2029, according to the U.S. Bureau of Labor Statistics (BLS). That is much faster than the national average for career growth. The median pay is a healthy $122,840 per year, BLS reported.

Some other top career options for machine learning and artificial intelligence include:

So, will robots replace humans moving forward? For some jobs or tasks, quite possibly. For all jobs or tasks? Not likely.

"Of course, robots are already in the workplace," Tasker said. "They are not intelligent, but they perform basic tasks. Car manufacturers use robots on assembly lines already and have for years."

"Whether a company actively uses artificial intelligence or not, all industries will be impacted by it, whether intentionally or unintentionally," Tasker said. "I do think that some industries will have a higher barrier of entry, so to speak, such as medicine," he said. Patients still prefer a human touch for things like receiving a diagnosis or test results.

"As artificial technology continues to develop, humans will need to have an ethical debate about what robots can and cannot do, but yes, we will see more robots," said Tasker.

And as use of robots grows, "without a doubt, ethics is going to play a much larger role as AI grows," said Tasker, "or at least it should."

Careers in machine learning and artificial intelligence are still being defined, which creates generous opportunities to innovate and carve your own career path. If you like math, computer programming, coding, and technology in general, a career in data science, machine learning, or AI is definitely one to consider.

Having a strong foundation in math and STEM can help prepare you for a career in AI. Knowledge of psychology will be particularly helpful, too.

Also important: a large threshold for change. "Data science [and AI with it] changes every year," Tasker said, "so the people working in data science will need to change with it. You will always be learning new technologies, algorithms, and coding languages."

The more math, programming, and experience with cloud computing that you can get under your belt, the better.

And, as more and more adoption of artificial intelligence technologies occurs, "we will begin to see an ethical debate emerge about what AI should and should not be doing," Tasker said. That makes courses in ethics critical, because "as the field of AI grows, more ethical considerations will need to be applied."

Keep in mind that while a bachelor's degree is a great foundation on which to build a career in artificial intelligence, an advanced degree is likely necessary to advance to the highest levels in the field.

"Most jobs in the field of artificial intelligence require a graduate degree, such as a master of science or even doctorate, so be ready to continually learn," said Tasker.

While no career is truly future-proof given the ever-changing technology landscape, there are some ways you can be best prepared to weather the change. By grounding yourself with a strong science, math, and engineering background and then being ready to drive change, you may enjoy a long and prosperous career in the field of artificial intelligence.

Of course, while having a strong academic background is important, being good at math and programming is not enough. To really thrive in this career field, you also need good, old-fashioned grit. In fact, "curiosity, grit, and being humble are key traits toward having a successful, long-term career in data science, and especially in artificial intelligence," said Tasker. "These are traits that you cannot necessarily learn in the classroom, but are helpful to being successful in this field long-term."

We have actually been using AI for some time, and not just in factories and on assembly lines, or to design futuristic cars.

Have you ever filled out a job application and included key words so that the artificial job screening tool doesn't filter you out of contention? That's artificial intelligence.

"Some artificial intelligence programs can even scan how a resume is drafted to see personality traits of an applicant," said Tasker. "Other programs use facial recognition, which scans your facial expressions in an interview to create personality profiles of applicants."

Likewise, if you have ever used a website and a chat bot popped up, saying "How can I help you today?", that is also artificial intelligence. If you've ever thought you were chatting with a real, live human only to be informed that you're chatting with a bot, you already know just how realistic artificial intelligence tools are in the business and retail world.

"Chat bots and virtual assistants are being routinely used to respond to easy emails, schedule appointments, and even take meeting notes for users," Tasker said. While being on the receiving end of a bot can be frustrating at times, many businesses use them because they can perform repetitive tasks that have some known outcomes, such as deciding which department your query needs to be routed to when you contact customer service for a company.
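The routing task described above, deciding which department a query belongs to, can be handled by rules as simple as keyword matching. A toy sketch, with invented department names and keywords, and deliberately naive word-splitting:

```python
# A minimal rule-based routing "bot": map keywords to departments.
ROUTES = {
    "billing":  {"invoice", "charge", "refund", "payment"},
    "shipping": {"delivery", "tracking", "package", "shipped"},
}

def route(message):
    # Naive tokenization: lowercase and split on whitespace.
    words = set(message.lower().split())
    for department, keywords in ROUTES.items():
        if words & keywords:   # any keyword present -> route there
            return department
    return "support"           # fallback department

print(route("Where is my package? The tracking page is blank."))  # shipping
```

Production chat bots replace the keyword sets with trained intent classifiers, but the overall shape, a message in and a known outcome out, is the same.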

There are limitations currently, though. "While chat bots can accomplish a surprisingly large number of tasks, they cannot operate your Tesla, for example," said Tasker.

With the high return on investment from chat bots and interview bots, the use of artificial intelligence in commerce is not likely to go away anytime soon. If anything, the use of AI will continue to grow in new and innovative ways.

With an increased use in artificial intelligence comes an increase in the conversation about how it should be implemented. This is where a background in psychology could be helpful for people working in this field. "Psychology is important because it teaches a student how the human brain works, which is complicated," said Tasker. "To really learn to program AI, learning how the brain works at some basic level would help as well."

"Just because a chat bot can attend a meeting for an employee, does that mean that we should also make a bot that can perform medical exams? Where is the line? What about facilitating a classroom and teaching our children?" Tasker asked. "What about fully autonomous truck driving?"

Is there a line between what we need versus what we can do? And where does focusing on the bottom line financially begin to cost us when it comes to our humanity?

These are big questions for which there are no easy answers. Yet by studying data science, math and STEM, and by embracing the change inherent in the field of machine learning and artificial intelligence, you just might be the next Wilbur or Orville Wright.

Marie Morganelli, PhD, is a freelance content writer and editor.


SMU meets the opportunities of the data-driven world with cutting-edge research and data science programs – The Dallas Morning News

For more than a century, SMU has served societal needs and prepared students to make an impact in their chosen professions. To fulfill that same mission in a data-driven world, the university has developed major new programs in research and data science, combining high-speed computing, mathematics and statistics to extract meaningful insights from extremely large quantities of data. These programs are helping the business community in Dallas and beyond thrive in an increasingly data-driven, complex and interconnected world.

Recently, Elizabeth G. Loboa, SMU provost and vice president for academic affairs, described several of the university's investments in research and data science.

After describing these facilities, Loboa hosted a conversation about research and data science with a group of SMU academic leaders. The participants were James E. Quick, dean of the Moody School of Graduate and Advanced Studies and associate provost for research; Stephanie Knight, dean of the Simmons School of Education and Human Development; Suku Nair, director, SMU AT&T Center for Virtualization; and Peter K. Moore, associate provost for curricular innovation. Highlights from their conversation follow.

Moody School Dean James Quick: During the past decade, expenditures on research at SMU have increased over 400%. During that same time, conferral of Ph.D. degrees has increased over 300%. One of the keys to the increases in both these areas has been the university's decision to focus on the digital revolution. Our ManeFrame II computing system provides both faculty and students access to advanced computing resources when they need them, without overburdening the system and delaying vital research.

For a tangible example of how we use advanced data science, look at the strong SMU program in monitoring nuclear weapons testing. The capabilities we have developed for analyzing seismic activity from around the world and distinguishing earthquake activity from nuclear tests can play a crucial role in improving our national capabilities in that vital arena.

Simmons School Dean Stephanie Knight: The Simmons School of Education and Human Development has always been a nontraditional institution. We take great pride in conducting cutting-edge research and then putting the results of that research into action.

Several years ago, we were approached by Toyota about creating a project to benefit the greater Dallas community. Toyota awarded us a $2 million, three-year planning grant to establish a pre-K through eight school in West Dallas focused on a STEM curriculum. Working with Toyota and Dallas ISD, our objective is to prepare students for jobs and college in STEM-related fields. We expect it to be a center for research and professional development that will not only benefit our students locally but also students throughout the country. Toyota also hopes that the school model can be taken to other communities to promote STEM education.

AT&T Center for Virtualization Director Suku Nair: Our partnership with AT&T came about when the company realized they were going to have to make tremendous changes to stay competitive in the telecom industry, which has seen unimaginable growth in recent years. As our research efforts have grown, other companies like Google, Ericsson, HPE and others are now coming to us for assistance. Of course, they could do much of their own research and data analysis, but one advantage we offer is that we can provide perspectives from many disciplines across our campus. To cite one example, we recently helped L3Harris measure biometric data for student pilots to validate that the company's flight training systems were as effective as they need to be.

The SMU AT&T Center for Virtualization and the Data Science Institute are also providing invaluable assistance in dealing with the COVID-19 pandemic. We are currently in discussions with the federal government's Economic Development Administration to develop analytics tools for effective allocation of resources to deal with the pandemic. At the same time, we also provide data analysis assistance to many smaller medical facilities to help them improve their methods for treating COVID-19 patients.

Additionally, companies often come to us asking for short courses to train their workforce in some area of data science. To date, we have offered short courses in areas such as data security and advanced cryptography, cloud migration, and data center security and reliability.

Associate Provost Peter K. Moore: In the last two months of 2020, several data companies moved their headquarters from California to Texas. That situation makes SMU increasingly aware of the need to produce workers who can operate effectively in this big-data environment if we want to attract more of those companies to D-FW and to Texas.

That's why several years ago we launched one of the nation's first online master's programs in data science. We have also created a number of related professional programs in statistics, economics and business, and this coming fall we will offer a new online artificial intelligence program out of the computer science department.

Last year, we also established a bachelor's degree program and a minor in data science. Both the master's and the undergraduate programs are interdisciplinary in nature and involve faculty from the arts, engineering, humanities, sciences and business.

If we're going to be successful in confronting our nation's most serious challenges in areas like education, public health and climate change, we will need to have expertise in both data science and in working across disciplines. We want to make sure that our students at all levels are prepared to live in the world of data. It's the water in which we all swim.

For additional information on the many academic opportunities offered at SMU, go to smu.edu.


Working at the intersection of data science and public policy – Penn Today

One of the ideas you discuss in the book is algorithmic fairness. Could you explain this concept and its importance in the context of public policy analytics?

Structural inequality and racism are the foundation of American governance and planning. Race and class dictate who gets access to resources; they define where one lives, where children go to school, access to health care, upward mobility, and beyond.

If resource allocation has historically been driven by inequality, why should we assume that a fancy new algorithm will be any different? This theme is present throughout the book. Those reading for context get several in-depth anecdotes about how inequality is baked into government data. Those reading to learn the code get new methods for opening the algorithmic black box, testing whether a solution further exacerbates disparate impact across race and class.
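One widely used check of this kind is the disparate impact ratio, which compares an algorithm's favorable-outcome rates across groups. A minimal sketch, with hypothetical decisions and the "four-fifths rule" threshold from US employment-selection guidelines (the book's own methods are more involved):

```python
# Hypothetical model decisions: 1 = favorable outcome, keyed by group.
decisions = {
    "group_a": [1, 1, 1, 0, 1, 1, 0, 1],   # 6/8 favorable
    "group_b": [1, 0, 0, 1, 0, 0, 0, 1],   # 3/8 favorable
}

def favorable_rate(outcomes):
    return sum(outcomes) / len(outcomes)

rates = {g: favorable_rate(o) for g, o in decisions.items()}
ratio = min(rates.values()) / max(rates.values())

# The "four-fifths rule" flags ratios below 0.8 as potential disparate impact.
print(f"disparate impact ratio = {ratio:.2f}")  # 0.50: flagged
```

A single ratio is only a screening device; fairness auditing in practice also examines error rates, calibration, and the historical data the model was trained on.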

In the end, I develop a framework called algorithmic governance, helping policymakers and community stakeholders understand how to trade off algorithmic utility against fairness.

From your perspective, what are the biggest challenges in integrating tools from data science with traditional planning practices?

Planning students learn a lot about policy but very little about program design and service delivery. Once a legislature passes a $50 million line item to further a policy, it is up to a government agency to develop a program that can intervene with the affected population, allocating that $50 million in $500, $1,000 or $5,000 increments.

As I show in the book, data science combined with government's vast administrative data is good at identifying at-risk populations. But doing so is meaningless unless a well-designed program is in place to deliver services. Thus, the biggest challenge is not teaching planners how to code data science but how to consider algorithms more broadly in the context of service delivery. The book provides a framework for this by comparing an algorithmic approach to service delivery with the business-as-usual approach.
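The point above can be made concrete with a toy sketch: a risk ranking only becomes a program once a delivery budget and allocation increment are attached to it. Every field name, weight, and figure below is hypothetical, purely for illustration:

```python
# Rank households by a naive risk score built from hypothetical
# administrative fields, then allocate a capped budget in fixed
# increments. The ranking alone is not a program; the budget and
# increment are what turn it into service delivery.

records = [
    {"id": 1, "evictions": 2, "er_visits": 1},
    {"id": 2, "evictions": 0, "er_visits": 0},
    {"id": 3, "evictions": 1, "er_visits": 4},
]

def risk_score(r):
    # Illustrative weights only; a real model would be estimated from data.
    return 2 * r["evictions"] + r["er_visits"]

budget, increment = 1000, 500
ranked = sorted(records, key=risk_score, reverse=True)
served = ranked[: budget // increment]          # capacity-constrained allocation
print([r["id"] for r in served])                # -> [3, 1]
```

In practice the scoring model would be far richer, but the structure is the same: the algorithm proposes a priority order, and the program design (budget, increment, eligibility rules) determines who actually receives services.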

Has COVID-19 changed the way that governments think about data science? If so, how?

Absolutely. Speaking of service delivery, data science can help governments allocate limited resources. The COVID-19 pandemic is marked entirely by limited resources: from testing, PPE, and vaccines to toilet paper, home exercise equipment, and blow-up pools (the latter was a serious issue for my 7-year-old this past summer).

Government failed at planning for the allocation of testing, PPE, and vaccines. We learned that it is not enough for government to invest in a vaccine; it must also plan for how to allocate vaccines equitably to populations at greatest risk. This is exactly what we teach in Penn's MUSA Program, and I was disappointed at how governments at all levels failed to ensure that the limited supply of vaccine aligned with demand.

We see this supply/demand mismatch show up time and again in government, from disaster response to the provision of health and human services. I truly believe that data can unlock new value here, but, again, if government is uninterested in thinking critically about service delivery and logistics, then the data is merely a sideshow.

What do you hope people gain by reading this book?

There is no equivalent book currently on the market. If you are an aspiring social data scientist, this book will teach you how to code spatial analysis, data visualization, and machine learning in R, a statistical programming language. It will help you build solutions to address some of today's most complex problems.

If you are a policymaker looking to adopt data and algorithms into government, this book provides a framework for developing powerful algorithmic planning tools, while also ensuring that they will not disenfranchise certain protected classes and neighborhoods.

See the original post here:

Working at the intersection of data science and public policy | Penn Today - Penn Today

Jupyter has revolutionized data science, and it started with a chance meeting between two students – TechRepublic

Commentary: Jupyter makes it easy for data scientists to collaborate, and the open source project's history reflects this kind of communal effort.

Image: iStockphoto/shironosov

If you want to do data science, you're going to have to become familiar with Jupyter. It's a hugely popular open source project that is best known for Jupyter Notebooks, a web application that allows data scientists to create and share documents that contain live code, equations, visualizations and narrative text. This proves to be a great way to extract data with code and collaborate with other data scientists, and has seen Jupyter boom from roughly 200,000 Notebooks in use in 2015 to millions today.
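Under the hood, a Notebook document is just JSON: a list of cells mixing narrative (markdown) and executable code, which is what makes the format easy to share and version. A minimal sketch using only the standard library (in practice the `nbformat` library builds and validates these files for you):

```python
# Build a minimal .ipynb document by hand. The notebook file format is
# JSON; "cells" interleave markdown prose with code, which is the core
# idea behind Jupyter's literate, shareable workflow.
import json

notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {},
         "source": ["# Exploratory analysis\n",
                    "Narrative text lives beside the code it explains."]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [],
         "source": ["import statistics\n", "statistics.mean([1, 2, 3])"]},
    ],
}

text = json.dumps(notebook, indent=1)   # the on-disk .ipynb content
print(len(notebook["cells"]))           # 2
```

Saving `text` to a file named `demo.ipynb` would give Jupyter a valid (if tiny) notebook to open.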

Jupyter is a big deal, heavily used at companies as varied as Google and Bloomberg, but it didn't start that way. It started with a friendship. Fernando Pérez and Brian Granger met the first day they started graduate school at the University of Colorado Boulder. Years later, in 2004, they discussed the idea of creating a web-based notebook interface for IPython, which Pérez had started in 2001. This became Jupyter, but even then, they had no idea how much of an impact it would have within academia and beyond. All they cared about was "putting it to immediate use with our students in doing computational physics," as Granger noted.

Today Pérez is a professor at the University of California, Berkeley, and Granger is a principal at AWS, but in 2004 Pérez was a postdoctoral researcher in Applied Math at UC Boulder, and Granger was a new professor in the Physics Department at Santa Clara University. As mentioned, they first met as students in 1996, and both had been busy in the interim. Perhaps most pertinently to the rise of Jupyter, in 2001 Pérez started dabbling in Python and, in what he calls a "thesis procrastination project," wrote the first IPython over a six-week stretch: a 259-line script now available on GitHub ("Interactive execution with automatic history, tries to mimic Mathematica's prompt system").

SEE: Top 5 programming languages for data scientists to learn (free PDF) (TechRepublic)

It would be tempting to assume this led to Pérez starting Jupyter; it would also be incorrect. The same counterfactual leap could occur if we remember that Granger wrote the code for the actual IPython Notebook server and user interface in 2011. This was important, too, but Jupyter wasn't a brilliant act by any one person. It was a collaborative, truly open source effort that perhaps centered on Pérez and Granger, but also included people like Min Ragan-Kelley, one of Granger's undergraduate students in 2005, who went on to lead development of IPython Parallel, which was deeply influential in the IPython kernel architecture used to create the IPython Notebook.

However we organize the varied people who contributed to the origin of Jupyter, it's hard to get away from "that one conversation."

In 2004 Pérez visited Granger in the San Francisco Bay Area. The old friends stayed up late discussing open source and interactive computing, and the idea to build a web-based notebook came into focus as an extension of some parallel computing work Granger had been doing in Python, as well as Pérez's work on IPython. According to Granger, they half-jokingly talked about these ideas having the potential to "take over the world," but at that point their idea of "the world" was somewhat narrowly defined as scientific computing within a mostly academic context.

Years (and a great deal of activity) later, in 2009, Pérez was back in California, this time visiting Granger and his family at their home in San Luis Obispo, where Granger was now a professor. It was spring break, and the two spent March 21-24 collaborating in person to complete the first prototype IPython kernel with tab completion, asynchronous output and support for multiple clients.

By 2014, after a great deal of collaboration between the two and many others, Pérez, Granger and the other IPython developers co-founded Project Jupyter and rebranded the IPython Notebook as the Jupyter Notebook to better reflect the project's expansion outwards from Python to a range of other languages including R and Julia. Pérez and Granger continue to co-direct Jupyter today.

"What we really couldn't have foreseen is that the rest of the world would wake up to the value of data science and machine learning," Granger stressed. It wasn't until 2014 or so, he went on, that they "woke up" and found themselves in the "middle of this new explosion of data science and machine learning." They just wanted something they could use with their students. They got that, but in the process they also helped to foster a revolution in data science.

How? Or, rather, why was it that Jupyter has helped to unleash so much progress in data science? Rick Lamers explained:

Jupyter Notebooks are great for hiding complexity by allowing you to interactively run high-level code in a contextual environment, centered around the specific task you are trying to solve in the notebook. By ever-increasing levels of abstraction, data scientists become more productive, being able to do more in less time. When the cost of trying something is reduced to almost zero, you automatically become more experimental, leading to better results that are difficult to achieve otherwise.

Data science is...science; therefore, anything that helps data scientists to iterate and explore more, be it elastic infrastructure or Jupyter Notebooks, can foster progress. Through Jupyter, that progress is happening across the industry in areas like data cleaning and transformation, numerical simulation, exploratory data analysis, data visualization, statistical modeling, machine learning and deep learning. It's amazing how much has come from a chance encounter in a doctoral program back in 1996.

Disclosure: I work for AWS, but the views expressed herein are mine.


See original here:

Jupyter has revolutionized data science, and it started with a chance meeting between two students - TechRepublic

Gartner: AI and data science to drive investment decisions rather than "gut feel" by mid-decade – TechRepublic

Turns out, "calling it from the gut" may become a strategy of the past as data increasingly drives decision-making. But how will these data-driven approaches change investment teams?

Image: iStock/metamorworks

In the age of digital transformation, artificial intelligence and data science are allowing companies to offer new products and services. Rather than relying on human-based intuition or instincts, these capabilities provide organizations with droves of data to make more informed business decisions.

Turns out, "calling it from the gut," as the adage goes, may become an approach of the past as data increasingly drives investment decisions. A new Gartner report predicts that AI and data science will drive investment decisions rather than "gut feel" by mid-decade.

"Successful investors are purported to have a good 'gut feel'the ability to make sound financial decisions from mostly qualitative information alongside the quantitative data provided by the technology company," said Patrick Stakenas, senior research director at Gartner in a blog post. "However, this 'impossible to quantify inner voice' grown from personal experience is decreasingly playing a role in investment decision making."

Instead, AI and data analytics will inform more than three-quarters of "venture capital and early-stage investor executive reviews," according to a Gartner report published earlier this month.

"The traditional pitch experience will significantly shift by 2025, and tech CEOs will need to face investors with AI-enabled models and simulations as traditional pitch decks and financials will be insufficient," Stakenas said.

SEE: TechRepublic Premium editorial calendar: IT policies, checklists, toolkits, and research for download (TechRepublic Premium)

Alongside data science and AI, crowdsourcing will also play a role in "advanced risk models, capital asset pricing models and advanced simulations evaluating prospective success," per Gartner. While the company expects this data-driven approach, as opposed to an intuitive one, to become the norm for investors by mid-decade, the report also highlights a specific use case of these methods.

Correlation Ventures uses information gleaned from a VC financing and outcomes database to "build a predictive data science model," according to Gartner, allowing the fund to increase the total number of investments and shorten the investment process timeline "compared with traditional venture investing."

"This data is increasingly being used to build sophisticated models that can better determine the viability, strategy and potential outcome of an investment in a short amount of time. Questions such as when to invest, where to invest and how much to invest are becoming almost automated," Stakenas said.

SEE: Researchers use AI-enabled drones to protect the iconic koala (TechRepublic)

A portion of the report delves into the myriad ways these shifts in investment strategy and decision making could alter the skills venture capital companies seek and transform the traditional roles of investment managers. For example, Gartner predicts that a team of investors "familiar with analytical algorithms and data analysis" will augment investment managers.

These new investors, who are "capable of running terabytes of signals through complex models to determine whether a deal is right for them," will apply this information to enhance "decision making for each investment opportunity," according to the report.

The report also includes a series of recommendations for tech CEOs to develop over the next half-decade. These include correcting or updating quantitative metrics listed on social media platforms and company websites for accuracy. Additionally, to increase a tech CEO's "chances of making it to an in-person pitch," they should consider adapting leadership teams and ensuring that "online data showcases diverse management experience and unique skills," the report said.


More:

Gartner: AI and data science to drive investment decisions rather than "gut feel" by mid-decade - TechRepublic

DefinedCrowd CEO Daniela Braga on the future of AI, training data, and women in tech – GeekWire

DefinedCrowd CEO Daniela Braga. (Dário Branco Photo)

Artificial intelligence is the fourth industrial revolution and women had better play a prominent role in its development, says Daniela Braga, CEO of Seattle startup DefinedCrowd.

"We left the code era to men over the last 30 years and look at where it got us," Braga told attendees of the recent Women in Data Science global conference. "Ladies, let's lead the world to a better future together."

Technology has of course led to amazing advancements in health, communications, education and entertainment, but it has also created a more polarized and extremist society, spread dangerous misinformation and excluded swaths of the population from participation. A 2018 study by Element AI found that only 13% of U.S. AI researchers were women.

Braga thinks we can do better. She is a co-founder of DefinedCrowd, an AI training data technology platform that launched in December 2015. Braga took over as CEO in mid-2016. The company is ranked No. 21 on the GeekWire 200, our list of top Pacific Northwest tech startups, and has reeled in $63.6 million in venture capital, including a $50 million round raised last year.

We caught up with Braga after the conference to learn how AI is usurping coding; the need to impose ethics and regulations on where it takes us; and the need for more women in the industry and in AI leadership. Here are some key takeaways:

"We spent five centuries in the print era, 30 years in software and now AI is on a path to supplant software," Braga said. And while coding and programmers drove software development, it's data and data scientists that produce AI.

"You don't program rules, you teach AI with data that is structured and [there's] a lot of it," Braga said. "The data allows us to train a brain, an artificial brain, in a week instead of what it used to take us months to code."

And it's so much more powerful. Traditional coding, which is essentially built from if-then decision-making rules, isn't capable of controlling complex tasks like self-driving cars or virtual assistants that require subtle assessments and decision making.
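The contrast Braga describes can be sketched in a few lines: a hand-written if-then rule, versus a decision "learned" from labeled examples (here, the simplest possible form of training: a threshold at the midpoint of two class means). All data below is made up for illustration:

```python
# Toy contrast: a programmer-encoded rule vs. a rule derived from data.

def rule_based(x):
    # The programmer writes the decision boundary directly.
    if x > 10:
        return "positive"
    return "negative"

def learn_threshold(samples):
    # "Training": derive the boundary from labeled examples instead.
    pos = [x for x, label in samples if label == "positive"]
    neg = [x for x, label in samples if label == "negative"]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

training = [(2, "negative"), (4, "negative"), (14, "positive"), (16, "positive")]
threshold = learn_threshold(training)   # (15 + 3) / 2 = 9.0

print(rule_based(12))                                # positive
print("positive" if 12 > threshold else "negative")  # positive
```

Real systems replace the midpoint rule with deep networks trained on millions of examples, but the shift is the same one Braga points to: the behavior comes from data rather than from hand-coded rules.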

"We're still at the dawn of AI," Braga said, or what's called narrow AI. In these early days, the field needs to incorporate rules and standards to make sure that AI is used in ways that are ethical and unbiased and that protect privacy. Oversight is needed at an international level that brings in a diversity of voices.

"We need an alliance, almost like a United Nations for AI," she said.

The data used to train AI needs to be high quality, which for Braga means it's accurate, representative and unbiased. It also should be monitored, anonymized so it can't be traced to its sources, and provided with the consent of the people supplying their information. Braga admittedly has a vested interest in this matter, as her company's business is to provide the data that companies use to train their AI. DefinedCrowd's focus is speech and natural language processing.

In an infamous case of what can happen when AI is trained on bad data, Microsoft's AI chatbot Tay was quickly corrupted in 2016 when online users fed it racist, misogynistic language and conspiracy theories that the bot parroted back to the public.

While we're in narrow AI now, the next steps are general AI and super AI, Braga said. As the technology matures, different AI systems will be able to communicate with each other. Navigation, for example, will mix with voice interaction combined with text messaging. Home and work AI domains will talk together.

"There are some people who say when you start interlinking so many things you will have an AI that may become sentient," Braga said, creating technology such as the personal assistant in the movie "Her," who is so lifelike and charming that the protagonist falls in love.

"The super AI is when AI is smarter, thinking faster, thinking better than humans," Braga said. "So that is the part that is science fiction, where the machine will take over the world. We're very far from that."

Women bring emotional intelligence that technology should have, Braga said. "It's just that emotional intelligence component, that creativity, that warmth, that should resemble more a human. That aspect does not come through when built by men alone."

Data science is different from traditional software engineering. While the latter focuses on programming languages, math and statistics, work in AI incorporates linguistics, psychology and ethics to a greater degree.

DefinedCrowd is trying to build a diverse workforce and has an office in Portugal, where Braga was born and raised. The company's staff is about 32% female, but it's difficult to recruit qualified women, particularly for senior roles. What's even tougher, Braga said, is finding women at her level for mentoring and support.

There are a handful of women founder/CEOs at AI-focused companies, including Daphne Koller of insitro and Rana el Kaliouby of Affectiva. And only 22 women who hold the title of founder/CEO have taken their companies public among thousands of IPOs over the decades, according to Business Insider.

"I always have a super hard time finding women to look up to because it just doesn't exist. I'm basically paving my way by myself. I don't have role models," Braga said. "It's really hard to not have a way to bounce ideas within a safe circle."

Read more here:

DefinedCrowd CEO Daniela Braga on the future of AI, training data, and women in tech - GeekWire

Postdoctoral Position in Transient and Multi-messenger Astronomy Data Science in Greenbelt, MD for University of MD Baltimore County/CRESST II -…

Postdoctoral Position in Transient and Multi-messenger Astronomy Data Science

The High Energy Astrophysics Science Archive Research Center (HEASARC) and Time-domain Astronomy Coordination Hub (TACH) at NASA's Goddard Space Flight Center (GSFC) invite applications for postdoctoral research positions in the fields of transient and/or multi-messenger astronomy. Applicants should have a strong astronomy research track record and also deep expertise in the technical disciplines of full-stack software development, cloud computing, and data visualization. Experience in machine learning and/or time-series databases would also be beneficial.

Successful applicants will join HEASARC/TACH and have a central role in shaping Goddard's multi-messenger science output. This position is funded at 100% FTE. Approximately half of the applicant's time will be devoted to HEASARC/TACH, including activities such as software engineering, shaping next-generation Kafka-based NASA astronomy alert systems, pipeline development, and collaboration with Goddard-supported missions. The remainder of the applicant's time is available for self-driven research projects.

GSFC is home to over 100 Ph.D. astronomers, including project teams for Swift, Fermi, NICER, NuSTAR, TESS, JWST, and Roman, as well as ample computational resources. GSFC is also a member of the LIGO Scientific Collaboration. Through the Joint Space-Science Institute (JSI), GSFC is a partner in the Zwicky Transient Facility project. The successful applicants will also have the opportunity to apply for time on the 4.3m Lowell Discovery Telescope in Happy Jack, AZ.

The positions are for two years, renewable for a third year upon mutual agreement, and will be hired through the University of Maryland, Baltimore County on the CRESST II collaborative agreement with GSFC. The nominal starting date is in Fall 2021, but alternate dates are possible depending on availability. Candidates must have a Ph.D. in astronomy, physics, or a related field by the date of appointment.

Candidates should provide a cover letter, CV (including publication list), and a 3-page statement of research interests. Short-listed candidates will be asked to supply three letters of reference at a later date. Completed applications received by Friday, April 30, 2021 will receive full consideration. All application materials and inquiries should be sent to:

Transient and Multi-messenger Astronomy Data Science Postdoctoral Position
CRESST/UMBC
Mail Code 660.8, NASA/GSFC, Greenbelt, MD 20771
or via e-mail to katherine.s.mckee@nasa.gov

Salary and benefits are competitive, commensurate with experience and qualifications. For more information about the proposed research, contact Dr. Judith Racusin (judith.racusin@nasa.gov). For information on CRESST II or UMBC, contact Dr. Don Engel (donengel@umbc.edu).

UMBC is an equal opportunity employer and welcomes all to apply. EOE/M/F/D/V. The TACH project and NASA/GSFC are committed to building a diverse group and encourage applications from women, racial and ethnic minorities, individuals with disabilities and veterans.

Read more:

Postdoctoral Position in Transient and Multi-messenger Astronomy Data Science in Greenbelt, MD for University of MD Baltimore County/CRESST II -...