Category Archives: Data Science
Argentine project analyzing how data science and artificial intelligence can help prevent the outbreak of Covid-19 | Chosen from more than 150…
Can data science and artificial intelligence help prevent COVID-19 outbreaks? This is the focus of an Argentine research project, coordinated by the Interdisciplinary Center for Studies in Science, Technology and Innovation (Ciecti), which was selected from more than 150 proposals from around the world and will receive funding from Canada and Sweden.
The project is called Arphai (an English-language acronym for General Research on Data Science and Artificial Intelligence for Epidemic Prevention), and its goal is to develop tools, models and recommendations that help predict and manage epidemic events such as Covid-19 but are replicable for other viruses.
The initiative originated with Ciecti, a civil association created by the National University of Quilmes (UNQ) and the Latin American College of Social Sciences (FLACSO Argentina), and was selected along with eight other proposals based in Africa, Latin America and Asia. Only two were chosen in Latin America: Arphai in Argentina and another project in Colombia.
On the strength of this recognition, it will be funded by Canada's International Development Research Centre (IDRC) and the Swedish International Development Cooperation Agency (Sida) under the Global South AI4COVID programme.
The project is coordinated by Ciecti and involves the Planning and Policy Secretariat of the Ministry of Science, Technology and Innovation and the National Information Systems Directorate of the Access to Health Secretariat of the Argentine Ministry of Health.
Also working on the initiative are researchers, technical teams from the public administration, and members of 19 institutions, including universities and research centers, across six Argentine provinces and the city of Buenos Aires.
The main goal is to develop technological tools based on artificial intelligence and data science that can be applied to electronic health records (EHR) and make it possible to anticipate and detect potential epidemic outbreaks and to support preventive public health decision-making around Covid-19.
Among the tasks already carried out, progress has been made on a pilot project to implement the electronic health record designed by the Ministry of Health (the Integrated Health History, HSI) in the health networks of two municipalities on the outskirts of Buenos Aires, in order to synthesize what is learned and design a scale-up strategy at the national level.
Another goal is to prioritize an equity perspective, particularly around gender. This criterion shows up in efforts to mitigate bias in the prototypes developed (models, algorithms), in the analysis of and attention to the databases used, and in the diverse make-up of the teams: 60% of the project staff are women, many of them in leadership positions.
Arphai operates under strict standards of confidentiality, protection and anonymity of data and is endorsed by the Ethics Committee of the National University of Quilmes (UNQ).
Business of Sports Institute at UT McCombs School Founded by Gift from Accenture – UT News | The University of Texas at Austin
AUSTIN, Texas. A new sports education and research venture, unlike any other in the United States and aimed at a pressing need in the sports industry, is coming to The University of Texas at Austin.
Accenture has donated the founding $1.4 million gift to establish a Business of Sports Institute in the McCombs School of Business at UT Austin. The new institute will bring together all the advantages of a top business school at a major research institution with an elite sports program, and combine those with the expertise in sports business consulting and analytics that Accenture brings to this multiyear partnership.
"There is no other major business school in the country bringing on-field, on-court performance analytics into the curriculum, into the research lab and to sports industry leaders like we are," said Ethan Burris, faculty director of the McCombs School's Center for Leadership and Ethics, in which the new institute will be housed. "Talent management, performance metrics, sports-adjacent verticals and branding: there are a ton of topic areas we are poised to tackle."
The new Business of Sports Institute will create:
Research within the institute is already underway in men's and women's basketball, with plans to ramp up quickly to other sports.
"This partnership is a colossal boon to our research," Burris said. "We now have the financial resources to hire unique and specialized talent, for example, experts in biomechanics or in data visualization. And Accenture is devoting significant talent and expertise: project managers, data scientists and other engineers."
Using data analytics in new ways became a worldwide obsession after the 2003 publication of Moneyball, which chronicled how Oakland A's general manager Billy Beane got his bottom-of-the-league team to the playoffs by using sabermetrics to hire undervalued but winning players on the smallest budget in the league. His success spurred a new generation of sports data analytics advances.
Jon Berger, managing director and U.S. sports analytics lead at Accenture, was part of this revolution. When Moneyball came out, Berger was working as an NFL and college football analyst for Fox Sports, before moving to ESPN and CBS. Berger had identified early on the potential for data and gaming to proactively inform sports predictions. Now, nearly 20 years after Moneyball and the universal application of big data, Accenture is committed to promoting the expansion of the emerging and rapidly developing field of sports analytics.
"This partnership hinges on the power of Accenture's capabilities and proven track record of turning insights into revenue-generating businesses," Berger said. "That, coupled with UT's dedication to athletic excellence and McCombs' position as a leading business program, creates an unbeatable formula for pushing the envelope in sports analytics, sports science and sports business."
Globally, the sports analytics market is expected to reach $4.6 billion by 2025, expanding at a rate of more than 30%, according to an April 2020 report in Forbes. And only a small portion of revenue-generating teams in the world have dedicated business intelligence groups, Burris said.
"Not only is this a chance for our 580 student-athletes to enhance their craft through data analysis, but the minor in sports analytics will be incredibly attractive for students in a wide variety of majors, from kinesiology to communications," said Christine Plonsky, UT executive senior associate athletics director.
McCombs has become a hub for sports data analytics innovation since its hiring in 2019 of Kirk Goldsberry, the New York Times best-selling author of Sprawlball and a pioneer in the world of basketball analytics. Burris hired Goldsberry to develop coursework, teach sports analytics and oversee sports analytics research at McCombs. Goldsberry's groundbreaking insights have already landed him jobs as vice president of strategic research for the San Antonio Spurs, as the first-ever lead analyst for Team USA Basketball and as a staff writer for ESPN. But his new job as executive director of the Business of Sports Institute, with UT vice president and athletics director Chris Del Conte in the role of strategic adviser, is where he says it all comes together.
"When it comes to sports, there's no university in the world where I'd rather be thinking about this," said Goldsberry. "UT is uniquely positioned with its size and passion to blossom into this hub for sports academic work. If there's such a thing as a perfect university setting for elite sports research, it's right here in Austin, Texas."
‘I Want The Folks in Our Society to Be Data Literate So That We Are Making Good Decisions Together for the Good of the World,’ Says Professor…
What makes a roller coaster thrilling or scary? How do you find a unique restaurant when you're planning to go out for dinner? Although seemingly unrelated, NC State College of Education Professor Hollylynne Lee, Ph.D., shared that both of these questions demonstrate the importance of understanding statistics and data science.
Lee is a professor of mathematics and statistics education, a senior faculty fellow with the college's Friday Institute for Educational Innovation and one of three 2022 finalists for Baylor University's prestigious Robert Foster Cherry Award for Great Teaching. During her Sept. 23 Cherry Award Lecture, entitled "Data Moves and Discourse: Design Principles for Strengthening Statistics Education," she discussed the need to strengthen statistics education and the ways she has used her research to create learning opportunities for both students and teachers.
Through audience participation, Lee highlighted that an understanding of data and statistics has far-reaching implications beyond the classroom, with attendees sharing that they've used data in a variety of scenarios in their own lives, from buying a car and negotiating salaries to deciding where to live and monitoring COVID case numbers.
"We need data literate citizens. I want my neighbors and the folks in our society to be data literate so that we are making good decisions together for the good of the world," Lee said.
To produce those data literate citizens, Lee has devoted her career to helping create lessons that provide students with opportunities to access different mathematical and statistical ideas, keeping in mind the tools available to teachers, the questions that will guide their thinking and the ways that students might interact with the information and each other.
When engaging in purposeful design to create exceptional learning opportunities for students, Lee said that the two most critical aspects are data moves (the actions taken to produce, structure, represent, model, enhance or extend data) and discourse.
"These two things coming together are really what I care a lot about and use in instructional design related to statistics education," she said.
When engaging students in data analysis, Lee said it's important that the data they are looking at is real. Many textbooks have fake datasets that look realistic, but to truly understand data, the sets need to be large, multivariate and sometimes even messy, she said.
With engaging context rooted in reality, educators can then use appropriate tools to facilitate data moves and visualizations to help students uncover links between data representations.
Using data dashboards related to restaurants in the Raleigh and Durham areas and roller coasters at theme parks in several nearby states, Lee demonstrated how the inclusion of multiple variables into various data analyses can help students draw conclusions about data points.
For example, in a video recorded while Lee was working with students in a middle school classroom, she showed how adding data about the material a roller coaster is made of to a scatter plot that already showed speed and height helped students conclude that wooden roller coasters tend to have shorter drops and slower speeds than steel roller coasters.
Although it may seem counterintuitive to the traditional idea of starting off simple when introducing new ideas, the video demonstrates that more complex data sets can actually help enhance student understanding.
"We do not live in a univariable or bivariable world. We live in a multivariable world, and our students are very adept at reasoning that way if we give them the opportunity," Lee said. "We know from a lot of research that special cases in data can help students make sense of the aggregate. Instead of explaining what I wanted [the students] to see, I made the graph more complex. I added a third variable so that it could contextualize something about those roller coasters for the students, and it worked."
To bring data lessons into classrooms, Lee noted it's important for pre-service and practicing teachers to have professional development opportunities around statistics, as many did not have the chance to learn about the subject in their own K-12 careers.
She discussed how she developed courses for the College of Education that ultimately attracted graduate students from multiple colleges across NC State University, and how she applied her data course design principles to four online professional learning courses offered through the Friday Institute for Educational Innovation that have reached more than 6,000 educators in all 50 states and more than 100 countries.
Her Enhancing Statistics Teacher Education with E-Modules (ESTEEM) project, which began in 2016, created more than 40 hours of multimedia statistics modules for university instructors to use as needed in courses for pre-service secondary math teachers. Her current National Science Foundation-funded InSTEP project builds on seven dimensions of teaching statistics and data science (data and statistical practices, central statistical ideas, augmentation, tasks, assessment, data, and technology) that have been proven to make for good learning environments.
Lee noted that, throughout her career, her most joyful moments have always come from working with students and teachers, from watching teachers reflect on their practice and work together to improve their pedagogy to engaging with students as they dig into data and begin to make sense of it.
As she encouraged educators and future educators to think about how they will approach different problems in education and in their daily lives in relation to learning and teaching statistics, she reminded them to have faith in their students' abilities and to be open to learning right alongside them.
"Teaching with data can be scary because you do have to say that you'll be a learner along with them. You're there thinking really hard in the moment about what that next question might be. That can be scary or thrilling," Lee said.
Increase the Readability of Your Python Script With 1 Simple Tool – Built In
One of the biggest challenges beginning programmers face when learning a new language is figuring out how to make their code more readable. Since collaboration is a critical part of the modern working world, it's vital we ensure other people can easily understand and make use of our code. At the same time, beginning programmers are still struggling to figure out how to make their code work at all, and making it user-friendly can seem like an added hurdle they just don't have time for.
I've been there; I get it.
Fortunately, there are some pretty simple steps you can take to write clearer code. One of the main ways to make Python code more readable is by collapsing multiple lines of code into a single line. There are many ways to do this, so we'll dive into one particular method: list comprehensions. The best part about this approach is that, since it's a standard method, other Python programmers reviewing your code will be able to quickly understand what you're doing.
List comprehensions are a Python tool that allows you to create a new list by filtering the elements in your data set, transforming the elements that pass the filter and saving the resulting list, all in a single line of code.
Clarify Your Code: 5 Ways to Write More Pythonic Code
But before we dive into that, let's take a second to think about what code we have to write to accomplish that without list comprehensions.
First we need to create an empty list to fill later.
Then we need to write a for loop to iterate through each value in our data set.
Then we need to write an if statement to filter the data based on a condition.
And finally, we need to write another statement to append the resulting data into the list.
Let's take a look at the code to see what this looks like. Imagine we have a list of 10 numbers and we want to identify and save each number greater than the mean value.
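The code listing itself is missing from this copy of the article, so here is a minimal sketch of the loop-based approach, using the Numbers, Number and Result names referenced below (the specific values in the data set are assumed):

```python
# Our data set: a list of 10 numbers (values assumed for illustration)
Numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

Result = []                                   # 1. create an empty list to fill later
for Number in Numbers:                        # 2. iterate through each value
    if Number > sum(Numbers) / len(Numbers):  # 3. filter on values above the mean
        Result.append(Number)                 # 4. append the data that passes

print(Result)
```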
That example code first creates a list called Numbers, which contains our data set, then executes the four steps outlined above to create the resulting list. What do you think the result will be?
When you're ready to check your answer, here it is: [6, 7, 8, 9, 10].
This takes four lines of code. Four lines isn't an unmanageable amount of code, but if you can write it in a single line that others easily understand, why not?
More on Python Lists: How to Append Lists in Python
The general structure of a list comprehension is as follows:
[Function for Value in DataSet if Condition]
Written in plain English, this says: execute this Function on each Value in DataSet that meets a certain Condition.
Function is how you want to modify each piece of data, which you may not want to do! Modifications aren't necessary, and you can use list comprehensions to store values without modifying them.
Value is the iterator we use to keep track of the particular data value on each pass through the for loop.
DataSet is the data set you're analyzing in the list comprehension.
Condition is the condition the data must meet to be included.
To map those terms to the code in our previous example, we have:
Function: Number. Since we're only storing the data without modification, we store the iterator without calling a function.
Value: Number. This is the iterator name we used in the previous example, so we use it again here. Note: for this example, the term used in Function and Value must be the same because our goal is to store Value.
DataSet: Numbers. This was the list we used as our data set in the previous example, and we're going to use it the same way here.
Condition: if Number > sum(Numbers)/len(Numbers). This if statement identifies numbers that are greater than the mean of the data set and instructs the list comprehension to pass those values.
Here's how it looks written as a single list comprehension:
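The one-line version was also dropped in conversion; a sketch consistent with the description above:

```python
Result = [Number for Number in Numbers if Number > sum(Numbers) / len(Numbers)]
print(Result)  # [6, 7, 8, 9, 10]
```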
The result from executing this code is [6, 7, 8, 9, 10]. It's the same result, now written in a single line of code using a structure that other coders will easily understand.
We've focused on lists, but this method can be applied to sets and dictionaries, too. (FYI, if you're new to coding, you may see dictionary commonly abbreviated as dict.) We only need to make a few slight changes to the syntax that correspond to the different syntax used for those data structures.
The only difference between sets and lists, syntactically, is that sets use curly brackets instead of the square brackets we use for lists. A set comprehension looks like this:
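A sketch of what that looks like with the same data (both the set literal and the set comprehension use curly brackets):

```python
Numbers = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}  # curly brackets: a set, not a list
Result = {Number for Number in Numbers if Number > sum(Numbers) / len(Numbers)}
print(Result)  # {6, 7, 8, 9, 10} (sets are unordered, so display order may vary)
```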
Notice that there are only two differences here. We define Numbers with curly brackets instead of square brackets, which makes it a set instead of a list. We also surround the comprehension creating Result with curly brackets instead of square brackets, which makes it a set comprehension instead of a list comprehension. That's it.
We get the same result in set form instead of list form: {6, 7, 8, 9, 10}.
Learn More With Peter Grant: Learn the Fundamentals of Control Flow in Python
There are two differences between list comprehensions and dictionary comprehensions, both of which are driven by the requirements of dictionaries.
First, dictionaries use curly brackets instead of square brackets, so the comprehension structure must use curly brackets. Since this is the same requirement as for set comprehensions, if you start by treating a dictionary like a set, you're halfway there.
The second difference is driven by the fact that dictionaries use key: value pairs instead of only values. As a result, you have to structure the code to use key: value pairs.
These two changes lead to a structure that looks like this:
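A sketch of the dictionary comprehension, reusing the same Numbers data and storing each passing value as both key and value:

```python
Result = {Number: Number for Number in Numbers if Number > sum(Numbers) / len(Numbers)}
print(Result)  # {6: 6, 7: 7, 8: 8, 9: 9, 10: 10}
```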
This does yield a slightly different result, because now the output is a dictionary instead of a list or set. Dictionaries have both keys and values, so the output has both keys and values. This means that the output will be: {6: 6, 7: 7, 8: 8, 9: 9, 10: 10}.
You may recall I said you can apply functions to these values as you process them. We havent done that yet but its worth taking a moment to consider it now.
So how do we go about applying functions? We simply add their description to the Function part of the code.
For example, if we want to calculate the square of each value instead of simply returning the value, we use the following code:
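A sketch of that variation, squaring each value that passes the filter:

```python
Result = [Number ** 2 for Number in Numbers if Number > sum(Numbers) / len(Numbers)]
print(Result)  # [36, 49, 64, 81, 100]
```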
You could apply any number of other functions, making list comprehensions a flexible and powerful way to simplify your code.
Keep in mind that the purpose of list comprehensions is to make your code easier to read. If your list comprehension makes the code harder to read, then it defeats the purpose. List comprehensions can become difficult to read if the function or the condition is too long. So, when you're writing list comprehensions, keep that in mind and avoid them if you think your code will be more confusing with them than without them.
People learning a new programming language have enough of a challenge figuring out how to make their code work correctly, and they often don't yet have the tools to make their code clear and easy to read. Nevertheless, effective collaboration is vital in the modern workplace, so it's important we have the tools to make our code readable, even to those without a coding background. List comprehensions are a standard Python tool you can use to make your code simpler to read and easier for your colleagues to understand.
This article was originally published on Python in Plain English.
Pandemic oversight board to preserve data analytics tools beyond its sunset date – Federal News Network
The Pandemic Response Accountability Committee got started last year borrowing on what worked more than a decade earlier, when the Recovery Accountability and Transparency Board oversaw more than $480 billion in stimulus spending following the 2008 recession.
But the PRAC, which will continue to operate until the end of September 2025, is learning its own lessons overseeing more than $5 trillion in COVID-19 spending.
Aside from the PRAC overseeing more than six times more stimulus spending than what Congress authorized to recover from the 2008 recession, speed and urgency also factored into how agencies administered COVID-19 programs.
In light of these circumstances, the PRAC, in a report issued last month, documented five main takeaways from how agencies disbursed pandemic spending:
Former PRAC Deputy Executive Director Linda Miller, now a principal with Grant Thornton, said the urgency of pandemic relief put that spending at a higher risk of fraud, waste and abuse.
"Recovery spending was to try to recover from a recession, and it was a lot of shovel-ready construction projects that had time frames that were well established. This was disaster aid; this was more similar to a post-disaster situation, like a hurricane. Money quickly went out the door, and disaster aid is inherently riskier, because you're in a situation where, because people are in dire circumstances, you're willing to lower the guardrails when it comes to controls, and people are more likely to exploit that bad situation to take advantage of unwitting recipients," Miller said.
PRAC Executive Director Robert Westbrooks told the Federal Drive with Tom Temin that the speed of payments also made it difficult for agencies to prioritize funding for underserved communities.
The Small Business Administration, for example, was supposed to collect demographic data and prioritize underserved communities as part of the Paycheck Protection Program, but its inspector general found the agency wasn't initially meeting those goals.
SBA, however, made underserved businesses a priority in subsequent rounds of PPP spending.
"The initial rules were first come, first served. Well, that certainly gives the advantage to folks that have established relationships with the national lenders that were responsible for most of the PPP loans, and it disadvantages underserved communities," Westbrooks said.
The PRAC, however, is ensuring it does something that didn't happen under the Recovery Board: making sure its library of data sets and analytics tools still has a home beyond its sunset date.
Miller said the PRAC plans to turn its Pandemic Analytics Center of Excellence (PACE) over to the Council of the Inspectors General on Integrity and Efficiency, ensuring that the committee has a lasting impact after it disbands in 2025.
The Recovery Board didn't find a permanent home for its Recovery Operations Center (ROC), which resulted in the loss of analytical capabilities once the board disbanded in 2015.
"For many of us in the oversight community, we wanted Treasury or somebody to take over the ROC, because here was all this existing infrastructure, a lot of data was already there, but nobody really had the interest or the funding to take it over. And so we wanted to make sure, when we started the PRAC, that we were not going to have a similar situation," Miller said.
The Government Accountability Office found the Treasury Department had the authority to receive ROC assets after the Recovery Board's sunset date. But GAO said Treasury had no plans to transfer the ROC's hardware and software assets, citing cost, lack of investigative authority and other reasons.
"While OIGs with the financial resources to do so may pursue replication of the ROC's tools, the ROC's termination may have more effect on the audit and investigative capabilities of some small and medium-sized OIGs that do not have the resources to develop independent data analytics or pay fees for a similar service, according to some OIG officials," GAO wrote.
PACE, however, is more than just a ROC 2.0; it has analytics capabilities, algorithms and models developed for specific types of fraud, waste and abuse that can be leveraged by agency IGs.
These tools not only empower IGs, but also nonprofits and individuals who can tip off agency watchdogs about red flags. Former Recovery Board Chairman Earl Devaney said in an interview last year that empowering citizen watchdogs helped IGs oversee stimulus spending.
Miller said the PRAC has a similar mission of making agency oversight more evidence-based and more data-driven.
"Being able to tap into the power of the data science community writ large, whether that's in the private sector or academia or even a college student that's interested in the data, the PRAC absolutely encourages the use of those data sets, and to share anything that has been identified," Miller said.
The PRAC report highlights the importance of agencies using existing federal data sources to determine benefits eligibility, but the committee is also taking steps to improve the quality of its own data on COVID-19 spending recipients.
The American Recovery and Reinvestment Act, which Congress passed in 2009, required recipients to submit data directly to the Recovery Board, which conducted data analysis and also followed up with recipients that didn't submit adequate data.
"The result was a really impressive data set that the Recovery Board had, and I think many people thought, 'Well, that's what's going to happen now. The PRAC is going to be created and they're going to have a similar data set to what the Recovery Board had,'" Miller said.
Less than two weeks after Congress passed the CARES Act, however, OMB issued guidance that directed agencies to report all CARES Act spending through existing channels on USASpending.gov. Miller said the PRAC members disagreed with OMB's decision, which went against best practices learned from the Recovery Board.
"We believed that was not going to provide the level of transparency that we were required to provide through the CARES Act, and we've raised that with OMB on multiple occasions. They felt the recipient reporting burden was too significant to create a separate portal," she said.
The PRAC commissioned a report that found significant reporting gaps in the data available to the PRAC. Miller said the committee conducted its own independent analysis and found about 40,000 awards whose descriptions just said "CARES Act."
Miller said the PRAC's reliance on USASpending.gov requires the committee to comb state government websites and other sources for reliable pandemic spending data. She said this "patchwork quilt" process of pulling data from a variety of sources still continues at the PRAC.
It's a time-consuming process for an organization that has only about 25 people on its staff.
"What we're really trying to do is cobble together something that gets as much data as possible to the public on PandemicOversight.gov," Miller said.
On World Cancer Research Day, Illumina Highlights the Transformative Power of Genomics – Yahoo Finance
On World Cancer Research Day, two experts share their enthusiasm for the potential of genomic and multiomic data to transform oncology
Northampton, MA --News Direct-- Illumina
Cancer is a devastating disease that is quickly becoming the leading cause of death worldwide. In 2020, there were nearly 10 million cancer deaths around the globe. But with the advent of genomics and democratized access to specialized data, cancer researchers have been able to make great strides in recent years, and cancer survival rates have increased thanks to advancements from research.
September 24 marks World Cancer Research Day. The initiative aims to promote research in the oncology field and calls on institutions, leaders, and enterprises to increase their efforts and accelerate momentum as they strive to lessen the burden of this heartbreaking disease.
At Memorial Sloan Kettering Cancer Center (MSKCC) in New York City, Assistant Attending Computational Oncologist Elli Papaemmanuil, PhD, and her lab use Illumina technologies to analyze multiomic data (any combination of DNA, RNA, epigenetics and protein) from thousands of patients to gain actionable insights into cancer. "There is so much information embedded in the cancer genome that speaks to a patient's risk predisposition, disease state, and the likely response to treatment," says Papaemmanuil. "We now have the technologies that allow us to access that information. When thinking about a new problem, reading a paper, or working on a challenging patient, clinicians' thought processes now immediately jump to thinking about how they can use the latest technologies to solve a problem. I find this shift incredibly inspiring. It shows that we're just at the very beginning of adopting these technologies in a much wider context."
Danny Wells, PhD, is the Senior Vice President of Strategic Research & Scientific Co-Founder of Immunai, and is equally enthusiastic about the power of next-generation sequencing to gain critical insights. "I'm a true believer in genomics. I think the sky's the limit. This technology will become central to every major disease research area."
His company, also based in New York City, is mapping the body's immune system using multiomics and AI in order to discover new therapies. Wells's background includes biology and applied mathematics, but today he focuses on immuno-oncology research. "For me, there are two really profound differences between working with genomic data in the context of human health compared to the computational work I did in the past. The first thing is that I just love genomic data. It's so incredible that you can take these unbiased approaches. I don't think any other data types give you access to that much information. Secondly, it's meaningful to be working on human samples and to know that discoveries you're making can help identify new treatments for patients, new ways to get them the right therapy."
Wells is one example of how advances in sequencing have attracted a range of talent to the field of genomics research. According to Papaemmanuil, the global research community is no longer just trained scientists predominantly studying genomics and data science. Now, individuals that may have diverse backgrounds in cell biology or molecular biology are routinely incorporating technologies that allow the decoding of the cancer genome, methylome, and transcriptome on a daily basis. An important point is that these technologies are not only generating the data, but are also naturally paired with an associated workflow for quality control and data mining after the data generation. The stability of that output is what has effectively enabled and democratized the adoption of these technologies across disciplines. The high quality of the output has enabled very reliable new discoveries for us and for many others.
But there is much more work to be done, and it begins with the researcher. "The more comprehensively we interrogate the genome, transcriptome, and methylome, the faster we will be able to develop and validate relevant biomarkers," she says. "Once we have exhausted our search space, we will be able to link multiomic data to patient response status and outcomes in all our correlative studies and clinical trials."
To read Illumina's Cancer Research Methods Guide, click here.
To support cancer researchers, visit World Cancer Research Day and use the hashtag #WorldCancerResearchDay on social media.
View source version on newsdirect.com: https://newsdirect.com/news/on-world-cancer-research-day-illumina-highlights-the-transformative-power-of-genomics-496632398
An Introduction to Portfolio Optimization in Python – Built In
In investing, portfolio optimization is the task of selecting assets such that the return on investment is maximized while the risk is minimized. For example, an investor may be interested in selecting five stocks from a list of 20 to ensure they make the most money possible. Portfolio optimization methods, applied to private equity, can also help manage and diversify investments in private companies. More recently, with the rise in cryptocurrency, portfolio optimization techniques have been applied to investments in Bitcoin and Ethereum, among others.
In each of these cases, the task of optimizing assets involves balancing the trade-offs between risk and return, where return on a stock is the profits realized after a period of time and risk is the standard deviation in an asset's value. Many of the available methods of portfolio optimization are essentially extensions of diversification methods for assets in investing. The idea here is that having a portfolio of different types of assets is less risky than having ones that are similar.
Finding the right methods for portfolio optimization is an important part of the work done by investment banks and asset management firms. One of the early methods is called mean variance optimization, which was developed by Harry Markowitz and, consequently, is also called the Markowitz Method or the HM method. The method works by assuming investors are risk-averse. Specifically, it selects a set of assets that are least correlated (i.e., different from each other) and that generate the highest returns. This approach means that, given a set of portfolios with the same returns, you will select the portfolio with assets that have the least statistical relationship to one another.
For example, instead of selecting a portfolio of tech company stocks, you should pick a portfolio with stocks across disparate industries. In practice, the mean variance optimization algorithm may select a portfolio containing assets in tech, retail, healthcare and real estate instead of a single industry like tech. Although this is a fundamental approach in modern portfolio theory, it has many limitations, such as assuming that historical returns completely reflect future returns.
Additional methods like hierarchical risk parity (HRP) and mean conditional value at risk (mCVAR) address some of the limitations of the mean variance optimization method. Specifically, HRP does not require inverting of a covariance matrix, which is a measure of how stock returns move in the same direction. The mean variance optimization method requires finding the inverse of the covariance matrix, however, which is not always computationally feasible.
Further, the mCVAR method does not make the assumption, which mean variance optimization does, that returns are normally distributed. Since mCVAR doesn't assume normally distributed returns, it is not as sensitive to extreme values as mean variance optimization. This means that if a stock has an anomalous increase in price, mCVAR will be more robust than mean variance optimization and will be better suited for asset allocation. Conversely, mean variance optimization may naively suggest we disproportionately invest most of our resources in an asset that has had an anomalous increase in price.
The Python package PyPortfolioOpt provides a wide variety of features that make implementing all these methods straightforward. Here, we will look at how to apply these methods to construct a portfolio of stocks across industries.
More From Sadrach Pierre: Need to Perform Financial Data Analysis? Why Python Is Your Best Tool.
We will pull stock price data using the Pandas-Datareader library. You can easily install the library using pip in a terminal command line:
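The install command was dropped from this copy; it is simply:

```
pip install pandas-datareader
```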
Next, let's import the data reader in a new Python script:
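A sketch of the import, using the library's conventional alias:

```python
import pandas_datareader.data as web
```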
We should pull stocks from a few different industries, so we'll gather price data in healthcare, tech, retail and finance. We will pull three stocks for each industry. Let's start by pulling a few stocks in healthcare. We will pull two years of stock price data for Moderna, Pfizer and Johnson & Johnson.
First, let's import Pandas and relax the display limits on rows and columns:
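A sketch of that setup:

```python
import pandas as pd

# Relax display limits so wide data frames print in full
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
```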
Next, let's import the datetime module and define start and end dates:
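The exact dates below are assumptions; anything spanning roughly two years works:

```python
import datetime

start = datetime.datetime(2019, 9, 15)
end = datetime.datetime(2021, 9, 15)
```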
Now we have everything we need to pull stock prices. Let's get data for Moderna (MRNA):
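A sketch using the Yahoo Finance source, which Pandas-Datareader supported when this article was written; here we keep the adjusted closing price:

```python
mrna = web.DataReader('MRNA', 'yahoo', start, end)['Adj Close']
print(mrna.head())
```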
Let's wrap this logic in a function that we can easily reuse, since we will be pulling several stocks:
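A minimal version of such a helper (the function name is an assumption):

```python
def get_prices(ticker, start, end):
    """Return the adjusted closing prices for a single ticker."""
    return web.DataReader(ticker, 'yahoo', start, end)['Adj Close']

mrna = get_prices('MRNA', start, end)
```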
Now, let's pull for Pfizer (PFE) and Johnson & Johnson (JNJ):
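Using the same helper:

```python
pfe = get_prices('PFE', start, end)
jnj = get_prices('JNJ', start, end)
```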
Let's define another function that takes a list of stocks and generates a single data frame of stock prices, one column per stock:
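One way to write it, building on the helper above (names assumed):

```python
def combine_stocks(tickers, start, end):
    """Return a data frame of adjusted closing prices, one column per ticker."""
    prices = {ticker: get_prices(ticker, start, end) for ticker in tickers}
    return pd.DataFrame(prices)
```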
Now, let's pull stocks for the remaining industries (see the combined sketch after this list):
Healthcare: Moderna (MRNA), Pfizer (PFE), Johnson & Johnson (JNJ)
Tech: Google (GOOGL), Facebook (FB), Apple (AAPL)
Retail: Costco (COST), Walmart (WMT), Kroger Co (KR)
Finance: JPMorgan Chase & Co (JPM), Bank of America (BAC), HSBC Holding (HSBC)
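A sketch that pulls all 12 tickers listed above into one data frame using the combine_stocks helper:

```python
tickers = ['MRNA', 'PFE', 'JNJ',    # healthcare
           'GOOGL', 'FB', 'AAPL',   # tech
           'COST', 'WMT', 'KR',     # retail
           'JPM', 'BAC', 'HSBC']    # finance

portfolio_prices = combine_stocks(tickers, start, end)
```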
We now have a single data frame of prices for our stocks. Let's write this data frame to a CSV so we can easily read in the data without repeatedly having to pull it using Pandas-Datareader.
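For example (the file name is an assumption):

```python
portfolio_prices.to_csv('portfolio_prices.csv')
```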
Now, let's read in our CSV:
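Reading it back into a data frame, with the date column as the index:

```python
df = pd.read_csv('portfolio_prices.csv', index_col=0)
```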
Now we are ready to implement the mean variance optimization method to construct our portfolio. Let's start by installing the PyPortfolioOpt library:
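As before, the install command itself:

```
pip install PyPortfolioOpt
```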
Now, let's calculate the expected returns and the covariance matrix and store them in the variables mu and S, respectively:
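A sketch using PyPortfolioOpt's built-in helpers (mean historical returns and a Ledoit-Wolf shrunk covariance matrix, one common choice):

```python
from pypfopt.expected_returns import mean_historical_return
from pypfopt.risk_models import CovarianceShrinkage

mu = mean_historical_return(df)             # expected annualized returns
S = CovarianceShrinkage(df).ledoit_wolf()   # covariance matrix of returns
```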
Next, let's import the EfficientFrontier module and calculate the weights. Here, we will use the max Sharpe statistic. The Sharpe ratio is the ratio between returns and risk. The lower the risk and the higher the returns, the higher the Sharpe ratio. The algorithm looks for the maximum Sharpe ratio, which translates to the portfolio with the highest return and lowest risk. Ultimately, the higher the Sharpe ratio, the better the performance of the portfolio.
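A sketch of that step:

```python
from pypfopt.efficient_frontier import EfficientFrontier

ef = EfficientFrontier(mu, S)
weights = ef.max_sharpe()             # optimize for the maximum Sharpe ratio
cleaned_weights = ef.clean_weights()  # round tiny weights to zero for readability
print(dict(cleaned_weights))
```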
We can also display portfolio performance:
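With PyPortfolioOpt this is a one-liner:

```python
ef.portfolio_performance(verbose=True)  # expected return, volatility, Sharpe ratio
```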
Finally, let's convert the weights into actual allocation values (i.e., how many shares of each stock to buy). For our allocation, let's consider an investment amount of $100,000:
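A sketch using the library's DiscreteAllocation helper:

```python
from pypfopt.discrete_allocation import DiscreteAllocation, get_latest_prices

latest_prices = get_latest_prices(df)
da = DiscreteAllocation(weights, latest_prices, total_portfolio_value=100000)
allocation, leftover = da.greedy_portfolio()
print("Discrete allocation:", allocation)
print("Funds remaining: ${:.2f}".format(leftover))
```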
Our algorithm says we should invest in 112 shares of MRNA, 10 shares of GOOGL, 113 shares of AAPL and 114 shares of KR.
We see that our portfolio performs with an expected annual return of 225 percent. This performance is due to the rapid growth of Moderna during the pandemic. Further, the Sharpe ratio value of 5.02 indicates that the portfolio optimization algorithm performs well with our current data. Of course, this return is inflated and is not likely to hold up in the future.
Mean variance optimization doesn't perform very well since it makes many simplifying assumptions, such as returns being normally distributed and the need for an invertible covariance matrix. Fortunately, methods like HRP and mCVAR address these limitations.
The HRP method works by finding subclusters of similar assets based on returns and constructing a hierarchy from these clusters to generate weights for each asset.
Let's start by importing HRPOpt from pypfopt:
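A sketch of the import:

```python
from pypfopt import HRPOpt
```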
We then need to calculate the returns:
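One way to compute them, as daily percentage changes of the price data:

```python
returns = df.pct_change().dropna()
```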
Then run the optimization algorithm to get the weights:
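A sketch of that step:

```python
hrp = HRPOpt(returns)
hrp_weights = hrp.optimize()
```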
We can now print the performance of the portfolio and the weights:
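For example:

```python
hrp.portfolio_performance(verbose=True)
print(dict(hrp_weights))
```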
We see that we have an expected annual return of 24.5 percent, which is significantly less than the inflated 225 percent we achieved with mean variance optimization. We also see a diminished Sharpe ratio of 1.12. This result is much more reasonable and more likely to hold up in the future since HRP is not as sensitive to outliers as mean variance optimization is.
Finally, let's calculate the discrete allocation using our weights:
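Reusing the latest prices pulled earlier and the same $100,000 budget:

```python
da_hrp = DiscreteAllocation(hrp_weights, latest_prices, total_portfolio_value=100000)
allocation_hrp, leftover_hrp = da_hrp.greedy_portfolio()
print("Discrete allocation (HRP):", allocation_hrp)
print("Funds remaining (HRP): ${:.2f}".format(leftover_hrp))
```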
We see that our algorithm suggests we invest heavily into Kroger (KR), HSBC, Johnson & Johnson (JNJ) and Pfizer (PFE) and not, as the previous model did, so much into Moderna (MRNA). Further, while the performance decreased, we can be more confident that this model will perform just as well when we refresh our data. This is because HRP is more robust to the anomalous increase in Moderna stock prices.
The mCVAR is another popular alternative to mean variance optimization. It works by measuring the worst-case scenarios for each asset in the portfolio, which is represented here by losing the most money. The worst-case loss for each asset is then used to calculate weights to be used for allocation for each asset.
Let's import the EfficientCVaR class:
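A sketch of the import:

```python
from pypfopt.efficient_frontier import EfficientCVaR
```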
Calculate the weights and get the performance:
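A sketch of that step, minimizing the conditional value at risk:

```python
ef_cvar = EfficientCVaR(mu, returns)
cvar_weights = ef_cvar.min_cvar()
ef_cvar.portfolio_performance(verbose=True)  # expected return and conditional value at risk
```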
Next, get the discrete allocation:
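Again with a $100,000 budget, reusing the latest prices:

```python
da_cvar = DiscreteAllocation(cvar_weights, latest_prices, total_portfolio_value=100000)
allocation_cvar, leftover_cvar = da_cvar.greedy_portfolio()
print("Discrete allocation (mCVAR):", allocation_cvar)
print("Funds remaining (mCVAR): ${:.2f}".format(leftover_cvar))
```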
We see that this algorithm suggests we invest heavily into JPMorgan Chase (JPM) and also buy a single share each of Moderna (MRNA) and Johnson & Johnson (JNJ). We also see that the expected return is 15.5 percent. As with HRP, this result is much more reasonable than the inflated 225 percent returns given by mean variance optimization, since it is not as sensitive to the anomalous behaviour of the Moderna stock price.
The code from this post is available on GitHub.
More in Finance: Is Your Startup Fundraising in 2021? This Is Your Time.
Although we only considered healthcare, tech, retail and finance, the methods we discussed can easily be modified to consider additional industries. For example, maybe you are more interested in constructing a portfolio of companies in the energy, real estate and materials industries. An example of this sort of portfolio could be made up of stocks such as ExxonMobil (XOM), DuPont (DD) and American Tower (AMT). I encourage you to play around with different sectors in constructing your portfolio.
What we discussed provides a solid foundation for those interested in portfolio optimization methods in Python. Having a knowledge of both the methods and the tools available for portfolio optimization can allow quants and data scientists to run quicker experiments for optimizing investment portfolios.
Life sciences use of digital twins mirrors its application in other industries – MedCity News
Digital twins, virtual representations of objects or processes in the real world, are increasingly finding use in life sciences applications, but the example that many people use to explain the concept is an old one that comes from a different industry entirely. More than a half century ago, the astronauts of Apollo 13 were guided home with the help of simulations conducted here on Earth.
Conditions caused by a potentially catastrophic failure were replicated in simulators erected at NASA, said Andy Greenberg, managing director and North America digital health lead, life sciences, at Accenture. By testing 15 different simulations, NASA was able to find ways to work around problems the astronauts encountered. The test scenarios were analog, not digital. But the concept of conducting tests of equipment or a process while that process is already underway is representative of one of the ways digital twins are employed in the life sciences.
"When it's at its most powerful, it allows us to ask 'what if' questions in ways that we could never imagine before," Greenberg said.
Greenberg moderated the panel "On Demand: The Role of Digital Twins in Life Sciences" during the MedCity News INVEST Digital Health conference this week. He was joined by speakers from GlaxoSmithKline, Dassault Systèmes, IDBS and Unlearn.AI.
GlaxoSmithKline's use of digital twins evokes the simulations that NASA conducted with Apollo 13, according to Sandrine Dessoy, the company's digital innovation lead. In vaccine production, GSK runs simulations to represent what is happening in cell culture or in the purification step that follows. Those models are linked with the actual process in real time. Data from the real process are fed into the simulator, a digital twin that reproduces what can happen in bacteria, a bioreactor or purification equipment. These simulations give insight into what is happening with the quality of the vaccine produced at that moment, so that technicians can see whether the product lines up with vaccine specifications, Dessoy said.
Dessoy added that digital twins will play a key role in the development and production of new Covid-19 products. The technology can be used for designing drugs and vaccines; in the case of vaccines, the technology can help scientists select the best antigen to use, she said. Process development can also be done virtually.
"If you have to run a cell culture experiment, [it] can take three weeks. In a computer it will take you a few minutes," she said. "So, you will drastically reduce the time to develop your process."
At Dassault Systèmes, digital twin research led to the development of a virtual human heart that can test medical products virtually before they are tested in patients. Steven Levine, the company's senior director of virtual human modeling, said that this virtual capability is replacing animal testing. The technology has found its fastest adoption in medical devices. Devices that have been validated with the virtual heart are now going through regulatory review. The heart model is also finding applications in clinical practice, allowing surgeons to run simulations of procedures for challenging pediatric cases.
"They can practice the surgery beforehand, five, 10 times, predict the outcome, the same way the Apollo scientists did," Levine said. "We can do that quickly in the virtual world and then perform those surgeries. And that's happening, and already lives have been saved."
In drug research, digital twin technology is finding use as a way of speeding up clinical trials. Unlearn.AI develops digital twins of individuals who are enrolled in drug studies. Charles Fisher, the startup's founder and CEO, said the technology allows the company to ask what would happen to a person if he or she were randomly assigned to receive an existing comparative treatment. Predictions from these twins allow clinical trials to run more quickly, with fewer patients needing to be assigned to receive a placebo. The company's technology is currently being used in studies of experimental Alzheimer's disease drugs.
"We can help people run clinical trials that have the same degree of rigor and evidence, but require 25 percent or so fewer patients," Fisher said. "That's a dramatic reduction in the timeline and costs of the clinical trials."
Alberto Pascual, director of data science and analytics at IDBS, said that the life sciences industry is looking for ways to improve efficiency and reduce costs. The company, part of global technology giant Danaher, develops software used in drug discovery and development. Each step of the drug development process produces copious amounts of data that must be managed. Those data can be used to create models: digital twins.
IDBS was doing digital twin technology before the company's computer scientists had heard of the term, Pascual said. Much of its use today is focused on individual instruments that scientists want to replicate with a twin. Looking ahead, he envisions broader simulations, perhaps of entire labs.
"I can foresee, a few years from now, a full factory or a manufacturing plant fully automated with multiple twins," Pascual said. "Then you can simulate the full lab."
The Top 3 Tools Every Data Scientist Needs – Built In
I used to work as a research fellow in academia, and I've noticed academia lags behind industry in terms of implementing the latest tools available. I want to share the best basic tools for academic data scientists, but also for early-career data scientists and even non-programmers looking to bring data science techniques into their workflow.
As a field, data science moves at a different speed than other areas. Machine learning constantly evolves, and libraries like PyTorch and TensorFlow keep improving. Research companies like OpenAI and DeepMind keep pushing the boundaries of what machine learning can do (e.g., DALL·E and CLIP). Foundationally, the skills required to be a data scientist remain the same: statistics, Python/R programming, SQL or NoSQL knowledge, PyTorch/TensorFlow and data visualization. However, the tools data scientists use constantly change.
Using the right IDE (Integrated Development Environment) for developing your project is essential. Although these tools are well known among programmers and data science hobbyists, there are still many non-expert programmers who can benefit from this advice. Although academia falls short in implementing Jupyter Notebooks, academic research projects offer some of the best scenarios for implementing notebooks to optimize knowledge transfer management.
More From Manuel Silverio: What's So Great About Jupyter Notebook?
In addition to Jupyter Notebooks, tools like PyCharm and Visual Studio Code are standard for Python development. PyCharm is one of the most popular Python IDEs (and my personal favorite). It's compatible with Linux, macOS and Windows and comes with a plethora of modules, packages and tools to enhance the Python development experience. PyCharm also has great intelligent code features. Finally, both PyCharm and Visual Studio Code offer great integration with Git tools for version control.
There are plenty of options for machine learning as a service (MLaaS) to train models on the cloud, such as Amazon SageMaker, Microsoft Azure ML Studio, IBM Watson ML Model Builder and Google Cloud AutoML.
In terms of services provided by each one of these MLaaS suppliers, things constantly change. A few years ago, Microsoft Azure was the best since it offered services such as anomaly detection, recommendations and ranking, which Amazon, Google and IBM did not provide at the time. Things have changed.
Discovering which MLaaS provider is the very best is outside the scope of this article, but in 2021 it's easy to select a favorite based on user interface and user experience: AutoML from Google Cloud Platform (GCP).
In the past months I have been working on a bot for algorithmic trading. At the beginning, I started working on Amazon Web Services (AWS), but I found a few roadblocks that forced me to try GCP. (I have used both GCP and AWS in the past and am generally in favor of using whichever system is most convenient in terms of price and ease of use.)
Learn More From Our Experts: 5 Git Commands That Don't Get Enough Hype
After using GCP, I recommend it because of the work they've done on their user interface to make it as intuitive as possible. You can jump on it without any tutorial. The functions are intuitive and everything takes less time to implement.
Another great feature to consider from Google Cloud Platform is Google Cloud Storage. It's a great way to store your machine learning models somewhere reachable by any back-end service (or colleague) with whom you might need to share your code. I realized how important Google Cloud Storage was when we needed to deploy a locally trained model. Google Cloud Storage offered a scalable solution and many client libraries in programming languages such as Python, C# or Java, which made it easy for any other team member to implement the model.
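As an illustration, here is a minimal sketch of uploading a trained model with the official Python client library; the bucket and file names are hypothetical:

```python
# pip install google-cloud-storage
from google.cloud import storage

client = storage.Client()                      # picks up your default GCP credentials
bucket = client.bucket("my-ml-models")         # hypothetical bucket name
blob = bucket.blob("models/trading_bot.pkl")   # destination path inside the bucket
blob.upload_from_filename("trading_bot.pkl")   # local file containing the trained model
```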
Anaconda is a great solution for implementing virtual environments, which is particularly useful if you need to replicate someone else's code. This isn't as good as using containers, but if you want to keep things simple, it is still a good step in the right direction.
As a data scientist, I try to always make a requirements.txt file where I include all the packages used in my code. At the same time, when I am about to implement someone else's code, I like to start with a clean slate. It only takes two lines of code to start a virtual environment with Anaconda and install all required packages from the requirements file. If, after doing that, I can't implement the code I'm working with, then it's often someone else's mistake, and I don't need to keep banging my head against the wall trying to figure out what's gone wrong.
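Those two lines look something like this (the environment name and Python version are assumptions):

```
conda create --name clean_env python=3.9 --yes
conda activate clean_env && pip install -r requirements.txt
```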
Before I started using Anaconda, I would often encounter all sorts of issues trying to use scripts that were developed with specific versions of packages like NumPy and Pandas. For example, I recently found a bug in NumPy, and the solution from the NumPy support team was to downgrade to a previous NumPy version (a temporary solution). Now imagine you want to use my code without installing the exact version of NumPy I used. It wouldn't work. That's why, when testing other people's code, I always use Anaconda.
Don't take my word for it: Dr. Soumaya Mauthoor compares Anaconda with pipenv for creating Python virtual environments. As you can see, there's an advantage to implementing Anaconda.
Before You Go... 4 Essential Skills Every Data Scientist Needs
Although many industry data scientists already make use of the tools I've outlined above, academic data scientists tend to lag behind the curve. Sometimes this comes down to funding, but you don't need Google money to make use of Google services. For example, Google Cloud Platform offers a free-tier option that's a great solution for training and storing machine learning models. Anaconda, Jupyter Notebooks, PyCharm and Visual Studio Code are free or open-source tools to consider if you work in data science.
Ultimately, these tools can help any academic or novice data scientist optimize their workflow and become aligned with industry best practices.
This article was originally published on Towards Data Science.
Health Data Science Symposium: Smartphones, Wearables, and Health 11/5 Reduced Registration by 10/5 – HSPH News
You are cordially invited to the 3rd Annual Health Data Science Symposium, "Smartphones, Wearables, and Health," on Nov. 5, 2021.
The 2021 focus is on Digital Phenotyping, Wearables, Smartphones, & Personal Sensing across Health. The symposium brings together leading experts for a day of talks, abstract presentations, and collaborative networking around state-of-the-art advances across academia and industry in the health data sciences.
Keynote Speakers:
Please see the website for the full scientific program and details.
Currently, the symposium will be in person and socially distanced, with virtual attendance options. However, should public health guidelines change, the symposium will become fully virtual.
Reduced registration rates (in-person and virtual) are available before Oct. 5. Abstract submissions are encouraged, particularly from trainees and students, with top-scoring abstracts selected for awards and oral presentations.
Hosted by the Brigham & Women's Hospital/Harvard Medical School Department of Neurosurgery's Computational Neuroscience Outcomes Center and the Harvard School of Public Health Onnela Lab.
Course Directors: Timothy Smith, MD, MPH, PhD; Bryan Iorgulescu, MD; JP Onnela, PhD