Category Archives: Data Science

UC Berkeley spreads the gospel of data science with new college, free curriculum – Los Angeles Times

They comb through troves of legal records and video evidence to challenge wrongful convictions. They organize medical data to help personalize health treatments for better care. They scrutinize school test scores to investigate inequities. Finding safe drinking water is easier thanks to an analysis tool they created.

UC Berkeley's faculty and students are marshaling the vast power of data science across myriad fields to address tough problems. And now the university is set to accelerate those efforts with a new college, its first in more than 50 years, and is providing free curriculum to help spread the gospel of data science to California community colleges, California State University and institutions across the nation and world.

As data floods society faster than ever before, demand has surged for specialists who can organize and analyze it with coding skills, computing prowess and creative thinking. To meet the "insatiable demand," as university officials put it, UC Berkeley will open a College of Computing, Data Science and Society after the University of California Board of Regents approved the plan Thursday.

A new college building is scheduled to open during the 2025-26 academic year and will house the data science major, first offered five years ago, with other degree programs in computer science, statistics, computational biology and computational precision health. Some of the programs will be run jointly with the Berkeley College of Engineering and UC San Francisco. UC Berkeley says no new state funds will be required; the campus has raised private funds for 14 new faculty positions and about $330 million so far in gifts for the new building.

"Infusing the power of data science across multiple disciplines, from basic and applied sciences to the arts and humanities, will help us to fully realize its potential to benefit society, help address our world's most intractable problems, and achieve our most visionary goals," said UC Berkeley Chancellor Carol Christ.

Christ told regents Wednesday that huge faculty and student demand, not a top-down decision, led to the data science program. In just five years, data science has become the university's fourth most popular major among more than 100 offered, with the number of students choosing it nearly doubling to 1,232 in fall 2022 from fall 2019. The number of students who took the introductory data science course was even larger, 4,291 this academic year, and many were majoring in other disciplines, including economics, psychology, sociology, political science and public health.

UC Berkeley's new college comes as the University of Southern California plans to expand its own footprint in the field with its new School of Advanced Computing. USC aims to bring computing instruction to all students as well as dramatically expand the number of degrees it confers in technology-related fields. It is part of a $1-billion plan to advance student understanding of the digital world across industries.

At UC Berkeley, the top-ranked public university also plans a broad reach for its mission. The campus is seeding data science into community colleges and other institutions to make the field more accessible to a diversity of students, offering a path to high-paying careers. UC Berkeley students majoring in computer science, for instance, earn an average annual income of $179,000 four years after graduation, according to federal education data. Graduates in data science earn an average annual income of about $130,000, according to Burning Glass, a nonprofit organization that researches employment trends.

The university has posted its curriculum online, complete with assignments, slides and readings, and shared it with more than 89 other campuses. Classes have launched or are set to begin this fall at six California community colleges, four Cal State campuses and other universities including Howard, Tuskegee, Cornell, Barnard and the United States Naval Academy.

"We want to expand access to computing and the possibilities of how people can learn these skills that will get them a better job," said Eric Van Dusen, a data science faculty member who is leading efforts to share Berkeley's curriculum with community colleges. "The UC is the biggest driver of middle-class advancement we have."

El Camino College in Torrance began offering a data science class based on Berkeley's curriculum in 2021. The professor, Solomon Russell, said the inaugural class included a large number of Black students who were drawn to the subject to investigate issues meaningful to them.

The class, for instance, used census files to examine the racial demographics of Alabama and determined that the probability of seating an all-white jury in a 1965 case involving a Black defendant's rape conviction and death sentence was extremely low, but the U.S. Supreme Court upheld the jury selection.
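The article does not give the class's numbers, but the kind of calculation involved can be sketched. Assuming jurors are drawn independently at random from the eligible population, the chance that a randomly drawn group contains at most k members of a given group (k = 0 means none at all) is a binomial tail probability. The panel size, group share and observed count below are hypothetical values chosen for illustration, not figures from the case or the course.

```python
from math import comb


def binomial_tail_at_most(k: int, n: int, p: float) -> float:
    """Return P(X <= k) for X ~ Binomial(n, p): the chance that a random
    draw of n people includes at most k members of a group whose share
    of the eligible population is p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# Hypothetical inputs, for illustration only.
PANEL_SIZE = 100    # people drawn onto the panel
GROUP_SHARE = 0.26  # assumed share of the group among eligible jurors
OBSERVED = 8        # assumed number of group members actually on the panel

prob = binomial_tail_at_most(OBSERVED, PANEL_SIZE, GROUP_SHARE)
print(f"P(at most {OBSERVED} of {PANEL_SIZE}) = {prob:.2e}")
```

With these assumed inputs the probability comes out well under one in a thousand, which is the kind of result that makes random selection an implausible explanation. Courses often reach the same conclusion by simulation instead of the exact formula.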

"It enables you to be able to follow your curiosities, answer questions you have and look at the world in a new way," Russell said.

He invokes Netflix to pitch his class. Data science, he says, is a cool new field behind the streaming service's recommendations on what a subscriber would probably want to watch based on previous viewing habits.
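Netflix's actual recommender is not described here. As a rough illustration of the idea Russell describes, the Python sketch below implements user-based collaborative filtering. It scores titles a subscriber has not watched by how similar each other subscriber's history is to theirs, using cosine similarity on watched/not-watched vectors. The subscribers, titles and the `recommend` and `similarity` helpers are hypothetical.

```python
from math import sqrt


def similarity(a: set[str], b: set[str]) -> float:
    """Cosine similarity between two binary 'watched' vectors, expressed with sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def recommend(user: str, history: dict[str, set[str]], top_n: int = 3) -> list[str]:
    """Rank unwatched titles by the summed similarity of subscribers who watched them."""
    seen = history[user]
    scores: dict[str, float] = {}
    for other, watched in history.items():
        if other == user:
            continue
        sim = similarity(seen, watched)
        if sim == 0:
            continue  # subscribers with nothing in common contribute no evidence
        for title in watched - seen:
            scores[title] = scores.get(title, 0.0) + sim
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:top_n]


# Hypothetical viewing histories.
history = {
    "ana": {"Space Saga", "Robot Wars", "Baking Duel"},
    "ben": {"Space Saga", "Robot Wars", "Alien Docs"},
    "chloe": {"Baking Duel", "Garden Hour"},
}
print(recommend("ana", history))
```

Running it prints `['Alien Docs', 'Garden Hour']`. Alien Docs ranks first because ben's history overlaps more with ana's than chloe's does.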

Rebecca Gloyer, a second-year El Camino student, said she shied away from computing due to lack of self-confidence but decided to rid herself of that toxic energy to get ahead. She took Russell's course, learned she could do the work without being a crazy coder and found parallels with her love of theater and drama.

"It's storytelling with numbers," said Gloyer, who is weighing offers to transfer to data science programs at UC Berkeley and UCLA.

She said she hopes to use her data science skills to promote environmental sustainability, a passion of hers. One potential project, she said, would use trash collection data to investigate how waste recycling policies are working.

Russell said that El Camino College would not have been able to start the data science class without Berkeley's curriculum, along with an online course offered by the UC campus that he and two colleagues took.

But finding faculty members willing to train themselves in the new field isn't always easy, Van Dusen said, and ensuring that students have the required math training is another challenge. So far, six of 116 California community colleges are offering the class to about 500 students at El Camino, Santa Barbara City College, City College of San Francisco, San Jose City College, Skyline College and Laney College. Cal State Fresno is using the curriculum and CSU campuses at Humboldt, Pomona and Channel Islands plan to do so this fall.

"I'm trying to find the people who are going to lean in to teach this new curriculum," Van Dusen said. "And it's not everybody who's ready to take on learning a hard new thing."

The curriculum includes computing, statistics, ethics and about 25 different areas students can choose, including social justice, biology and environmental sustainability. Ethical questions are front and center.

In one project, students are helping build a public database of California police records that could be used to investigate potential misconduct. But they must also assess the reliability of the information.

Jennifer Chayes, associate provost of the computing and data science division, developed an algorithm that took the bias out of computer screening of resumes after she found that women were less likely to get interviews for tech jobs than men. Students will likewise learn to examine the platforms they build or use to ensure fairness and learn skills to identify and fight misinformation, she said.

In another project, Berkeley's data scientists are working with public defenders nationally to create platforms that can search through reams of data that could help their clients, an effort to level the playing field with better-resourced prosecutors, Chayes said.

"We want to advance equity. We want to advance justice," Chayes said. "We want resources to be allocated equitably across society. These are values that the University of California and Berkeley hold near and dear."

Read the original:

UC Berkeley spreads the gospel of data science with new college, free curriculum - Los Angeles Times

UC Regents’ vote creates UC Berkeley’s first college in 50 years – UC Berkeley

Jennifer Chayes, left, associate provost and dean of UC Berkeley's new College of Computing, Data Science, and Society, smiles Wednesday with Chancellor Carol Christ. (UCLA photo by Reed Hutchinson)

The UC Board of Regents today voted to establish UC Berkeley's College of Computing, Data Science, and Society (CDSS), the campus's first new college in more than 50 years.

The college will develop, implement and share high-quality, ethics-oriented and accessible curricula, educating a diverse student body in data science, computing and statistics. It will also create new fields, applications and solutions to societal problems through groundbreaking, multidisciplinary research that capitalizes on Berkeley's excellence across campus.

"We are thrilled to announce a new college at Berkeley that connects our excellent research and education in computing, data science and statistics with the many data-intensive disciplines across our campus," said Carol T. Christ, Berkeley's chancellor. "Infusing the power of data science across multiple disciplines, from basic and applied sciences to the arts and humanities, will help us to fully realize its potential to benefit society, help address our world's most intractable problems, and achieve our most visionary goals. At Berkeley, we have the opportunity and responsibility to educate data science students from diverse backgrounds to become the ethical leaders we need in private industry, the public service sector, and education."

The vote culminates a three-year process by Berkeley and the UC system to transform the Division of Computing, Data Science, and Society into a college. Now, the college can more effectively form new programs and partnerships, support instruction and research and foster identity and community among faculty, students and alumni.

Graduates celebrate at the 2022 data science commencement. (Photo by KLC fotos)

Berkeley had not added a college to the campus since the late 1960s. The journalism school was added in 1968. The public policy school was established in 1969. The College of Computing, Data Science, and Society's approval comes as artificial intelligence and other technologies are changing how we teach, learn, connect and understand our world. The Regents' vote affirms Berkeley's track record and value as a leader in using scientific and human-centered disciplines to understand and act in this moment of change.

"Artificial intelligence, computing and data science are lenses through which we now experience the world," said CDSS Associate Provost and Dean Jennifer Chayes. "This college provides Berkeley with opportunities to innovate and incubate new fields of inquiry at the intersection of computing and data science with other data-intensive fields. These interdisciplinary areas are often the most active areas of research, leading to some of the most exciting breakthroughs."

The college includes the Data Science Undergraduate Studies program, the Department of Statistics, the Berkeley Institute for Data Science, the Center for Computational Biology and the Bakar Institute of Digital Materials for the Planet.

It shares the Department of Electrical Engineering and Computer Sciences with the College of Engineering, the Social Science Data Lab (D-Lab) with the Social Sciences division and the Computational Precision Health program with UC San Francisco (UCSF).

Computing, Data Science, and Society will celebrate becoming a college at the data science undergraduate commencement ceremony May 18 at 7 pm PDT at the Hearst Greek Theatre in Berkeley. Google Senior Vice President Prabhakar Raghavan will be the keynote speaker and the event will be livestreamed.

As a division, Computing, Data Science, and Society has already been working with campus partners to meet skyrocketing demand from Berkeley students for computing and data science training and from employers in need of employees with these skills. The data science and computer science majors are among the five most popular majors at Berkeley. Many students pursuing other majors also take courses in data science and computer science.

As a top university sending students to nearby Silicon Valley as the next generation of technology leaders, Berkeley has made sure its data and computer science curriculum is interdisciplinary, high quality and society-centered. Its students take courses on how to consider the human context and ethics of their work. Berkeley programs in computer science, data science and statistics are top-ranked by U.S. News & World Report.

Computing, Data Science, and Society has prioritized inclusivity and accessibility for all students. For example, it has shared its data science curriculum with California community colleges to make this lucrative field more accessible to students from non-traditional backgrounds. It has also partnered with institutions like Tuskegee University to develop programs that build strong data and social science foundations and connections. And it has built initiatives to support and accelerate the academic growth of students from all backgrounds.

Jennifer Chayes, right, associate provost and dean of UC Berkeley's new College of Computing, Data Science, and Society, smiles Wednesday during a UC Board of Regents discussion of the matter. (UCLA photo by Reed Hutchinson)

This pattern of excellence extends across Berkeley, making the college well-situated to partner with pioneers in data-intensive disciplines to launch groundbreaking interdisciplinary initiatives and fields. Computing, Data Science, and Society has created an institute to use machine learning to develop cost-efficient, easily deployable, ultra-porous materials to help combat climate change. It has established the field of computational precision health to improve the quality and equity of health care and has developed a research center to help tackle environmental problems.

With the Regents' vote today, the college will now develop its administrative and financial structures to operate similarly to other colleges on campus. Colleges can hire their own faculty, for example, and award degrees to students.

As part of this transformation, the undergraduate data science major and computer science major that are currently within the College of Letters & Science will eventually move to the College of Computing, Data Science, and Society once academic and student support systems are created. The new college will also develop new graduate programs with other Berkeley departments, schools and entities.

The entities that have been part of the division are currently distributed in buildings across campus but will ultimately unite in the 367,270-square-foot Gateway building, which is under construction on Hearst Avenue at Arch Street. The Gateway is scheduled to open during the 2025-2026 academic year.

Board of Regents members expressed enthusiasm for Computing, Data Science, and Society at the Academic and Student Affairs Committee meeting on May 17. Regent Lark Park, the committee's chair, noted the importance of the college to "California, society and our future."

"It really is so impressive to see how this has grown organically: the growth, the interest, the popularity," said Park. "You have acted to make your destiny and that will change the destiny of others."

Continue reading here:

UC Regents' vote creates UC Berkeley's first college in 50 years - UC Berkeley

Tamil Nadu: It's a mad rush for AI, data science courses, and how –

CHENNAI: From predicting the delivery time of food ordered on apps to suggesting short clips or movies on social media sites as per the taste of the user, artificial intelligence already has a compulsive grip on our daily life. The emergence of ChatGPT, Google's Bard and a few live voice translation apps has created so much excitement among students and parents that BTech AI and data science courses have become the most preferred after computer science engineering this year.

With colleges increasing their intake, around 16,000 students will be joining AI and data science courses in 2023-24. But the colleges are hardly ready to teach the rapidly developing and transforming field. Most engineering colleges do not have trained faculty members to teach AI. Whether the graduates will be industry-ready when they pass out after four years is hardly known.

Students, under peer pressure, are already flocking to the highly specialized BTech course under management quota. Experts say that students should have application-oriented knowledge and exposure to data management and machine learning.

Colleges say they will use live problems to teach AI and other emerging areas. Some colleges are developing courses with industry experts and foreign universities.

"We have professors of practice from IT companies like Cognizant who train our students with live problems," said Abhay Meganathan, vice-chairman, Rajalakshmi Group of Institutions.

Some colleges are encouraging their faculty members to enroll in online degrees, certificate courses and hackathons to get trained in coding skills and artificial intelligence.

Experts say fundamentally there is not much difference between computer science engineering and AI. "Fundamentally, the curriculum will be common for these courses," said Anna University vice-chancellor R Velraj.

He also said AI will not take away jobs. "It will increase engineering jobs as we will need more engineers to deploy AI systems," Velraj said.

To tap the opportunities in emerging areas, Anna University plans to offer all engineering students a chance to study AI and data science.

"Colleges should teach students to solve real world problems to produce quality AI engineers. Students who want to become AI engineers should be very good in programming," said professor B Ravindran, head, Centre for Responsible AI, IIT Madras.

He also assured that jobs will not be taken away. "AI is not mature to the point where you can replace humans completely with AI," he added.

Some experts warned against creating more seats in AI, saying students from tier-2 and tier-3 colleges will struggle to get jobs of their choice.

"The number of jobs in AI has increased. But not to the extent that educational institutions have increased their intake, and so on. Students, especially from non-top-tier institutions, often struggle to get jobs of their choice after doing specialisation in AI," said Shourya Roy, president, ACM India Special Interest Group on Knowledge Discovery and Data Mining.

AI CURRENTLY USED IN
Movie recommendations on online streaming platforms, including OTT
Video recommendations on YouTube
Food delivery apps, which use AI to predict expected delivery time
Online shopping sites, which use AI in their operations to combine orders and select warehouses to ship products
Ride-hailing apps, to predict travel time
Virtual assistant devices and assistant apps

HOW WILL IT BE DIFFERENT FROM THE COMPUTER SCIENCE CURRICULUM?
Fundamentals are common for both computer science and AI. Students need to learn more continuous maths like probability theory, linear algebra and optimization. Students need to be strong in the fundamentals of maths and computing. AI engineering needs very good programming skills.

HOW MANY SEATS ARE AVAILABLE IN BTECH AI AND DATA SCIENCE?
More than 16,000 seats will be available in BTech AI and data science courses in engineering colleges for the 2023-24 academic year.

Continue reading here:

Tamil Nadu: It's a mad rush for AI, data science courses, and how -

IIT Madras, University of Birmingham open application process for joint Masters programmes in Data Science and Artificial Intelligence – The Indian…

The Indian Institute of Technology (IIT) Madras and The University of Birmingham today started their application process for their Joint Masters Programme in Data Science and Artificial Intelligence. This is the first time that any IIT has partnered with the UK Russell Group of Universities, IIT Madras claims.


The last date to apply for the joint programme is June 11, 2023. Successful candidates will study in Chennai as well as Birmingham, and the two universities will issue a single degree. The applicants will also be required to carry out a research project.

The Indian Express had earlier reported about the education partnership between IIT Madras and The University of Birmingham.

Students who hold a Bachelor of Science or BTech degree with over 60 per cent marks are eligible to apply. Students are also exempt from submitting IELTS/TOEFL/PTE scores if they scored more than 75 per cent in English in Class 12 from CISCE, CBSE or the West Bengal Board, or more than 80 per cent in English in Class 12 from any other state board.

Students will study at the School of Computer Science, University of Birmingham and will also benefit from the curriculum of IIT Madras. The National Institutional Ranking Framework (NIRF) ranked IIT Madras as the number one engineering institution in India.

The decision to launch the joint masters programme was agreed upon by the Director of IIT Madras, Prof V Kamakoti, and Prof Adam Tickell during the latter's visit to Chennai in November 2022, and the Memorandum of Understanding (MoU) was signed between the two institutions in February 2023.

IE Online Media Services Pvt Ltd

First published on: 18-05-2023 at 16:54 IST

Read the original:

IIT Madras, University of Birmingham open application process for joint Masters programmes in Data Science and Artificial Intelligence - The Indian...

Data Science and AI to Change the Placement Scenario? – Analytics Insight

This article explores the impact of data science and AI on changing the placement scenario

Data science and Artificial Intelligence (AI) have emerged as transformative fields, revolutionizing industries across the globe. With their ability to extract valuable insights from vast amounts of data and create intelligent systems, data science and AI are reshaping the job market and the way higher education institutions approach placements. This article explores the profound impact of data science and AI on placements, highlighting their roles, the evolving job market, and the promising future they offer to aspiring professionals.

Data science encompasses a multidisciplinary approach to derive meaningful insights from large and unstructured data. By employing scientific methods, algorithms, and various techniques such as data mining, machine learning, and data visualization, data science transforms raw data into actionable insights, guiding decision-making and forecasting.

AI, on the other hand, focuses on creating intelligent machines capable of performing tasks that typically require human intelligence, including perception, reasoning, learning, and decision-making. Utilizing machine learning, natural language processing, computer vision, and robotics, AI powers applications like speech recognition, image classification, and fraud detection.

In today's data-driven world, the volume of data is increasing exponentially. According to the International Data Corporation (IDC), global data is predicted to reach a staggering 175 zettabytes by 2025. Organizations recognize the immense value in leveraging data for decision-making and gaining a competitive edge, leading to a surge in demand for data scientists and AI professionals.

The job market for data science and AI experts has been witnessing remarkable growth. Studies indicate that data scientist jobs have become one of the most sought-after positions in the 21st century. The U.S. Bureau of Labor Statistics projects approximately 11.5 million job openings in Data Science by 2028. Forbes reports that 79% of global business executives believe that companies cannot survive without embracing data science and analytics. The World Economic Forum predicts that Data Scientists and Analysts will be the number one emerging role globally by 2028, highlighting the need for data science expertise across all sectors.

Data science and AI have given rise to new job roles that previously did not exist, such as data analysts, data scientists, data architects, data engineers, machine learning engineers, and AI researchers. These professionals possess skills in programming languages like Python, R, Java, and SQL. Additionally, they have strong foundations in statistics, probability, linear algebra, and calculus. Those focusing on AI require expertise in machine learning and deep learning techniques.

The change in placement scenarios and demand for data scientists and AI professionals transcends industry boundaries. Healthcare, manufacturing, finance, sports, fashion, and retail are just a few sectors benefiting from their expertise. Due to the high demand, data science and AI professionals command higher salaries compared to counterparts in other fields. The flexible nature of their work allows them to collaborate with companies from different parts of the world, unlocking a plethora of opportunities for remote work arrangements.

The versatility of data science and AI extends to solving a wide range of business problems across industries. In healthcare, these fields enable personalized medicine, disease diagnosis, and treatment optimization. By analyzing vast amounts of patient data, healthcare providers can identify patterns and develop predictive models, aiding in better decision-making.

In manufacturing, data science and AI drive transformative changes through predictive maintenance, quality control, and supply chain optimization. Sensor data analysis allows manufacturers to identify patterns and predict equipment failure, reducing downtime and increasing efficiency.

In finance, data science and AI enhance financial services by enabling fraud detection, risk assessment, and personalized investment advice. Analyzing vast amounts of financial data helps institutions make informed decisions. The retail industry benefits from data science and AI in the form of personalized marketing, inventory management, and customer experience optimization. By analyzing customer data, retailers can tailor marketing campaigns, increase customer loyalty, and boost sales.

Data science and AI techniques play a vital role in cybersecurity by detecting and responding to threats swiftly and effectively. Machine learning algorithms identify patterns in network traffic, enabling the detection of potential cyber-attacks before they occur. AI-powered security systems monitor networks and systems in real-time, automatically responding to threats and mitigating risks.
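As a simplified illustration of spotting unusual patterns in network traffic, the Python sketch below flags observations that sit far from a historical baseline, using a z-score threshold. This is a basic statistical stand-in, not the machine learning systems the paragraph refers to. The traffic counts, the `flag_anomalies` helper and the threshold of 3 standard deviations are assumptions.

```python
from statistics import mean, stdev


def flag_anomalies(baseline: list[float], observed: list[float],
                   threshold: float = 3.0) -> list[int]:
    """Return indexes of observations whose z-score against the baseline exceeds threshold."""
    mu = mean(baseline)
    sigma = stdev(baseline)  # sample standard deviation; needs at least two points
    if sigma == 0:
        # A perfectly flat baseline: any deviation at all is unusual.
        return [i for i, x in enumerate(observed) if x != mu]
    return [i for i, x in enumerate(observed) if abs(x - mu) / sigma > threshold]


# Hypothetical requests-per-minute counts from one host.
baseline = [120, 115, 130, 125, 118, 122, 128, 119]
observed = [121, 124, 980, 117]
print(flag_anomalies(baseline, observed))
```

With these numbers only the 980-request minute (index 2) is flagged. A production system would also need to handle daily cycles, drifting baselines and attacks that stay within normal volume.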

Data science and AI are revolutionizing the job market, creating new roles and reshaping existing ones. The combination of skillsets gained through these fields, along with their wide applicability, makes them highly valuable and rewarding areas of study.

The placements for data science and AI professionals are expected to continue growing rapidly as the world generates increasingly vast amounts of data. Aspiring professionals can look forward to lucrative career prospects in diverse industries, both locally and globally. Higher education institutions must adapt their curriculum to equip students with the necessary skills and knowledge to meet industry demands.

Data science and AI are transforming the placements scenario, offering a promising future for aspiring professionals. The explosive growth of data, the evolving job market, and the wide applicability of data science and AI techniques across industries contribute to the increasing demand for skilled professionals.

With the ability to extract valuable insights from data, create intelligent systems, and solve complex business problems, data science and AI pave the way for innovation, growth, and remote work possibilities. As the world continues to rely on data-driven decision-making, the importance of data science and AI in shaping the job market will only continue to rise.

Read more from the original source:

Data Science and AI to Change the Placement Scenario? - Analytics Insight

10 Best Data Science Tools and Technologies – Analytics Insight

The article below is an extensive guide to the 10 best data science tools and technologies

Whether you refer to it as business decision-making, planning, or forecasting for the future, data science has become increasingly important in almost every sector of the modern economy. It underpins many of the technologies and trends we are moving ahead with. In the world of digital technology in 2022, we have a lot of data and are using a variety of tools and methods to make it useful for a variety of purposes. If you had to name one popular technology, it would be data science.

To work in data science, you need to know how to use a variety of tools and at least one of the field's programming languages. If you dig a little deeper, there are approximately 524,000 data science jobs worldwide and more than 38,000 in India right now. Given these figures and the growing demand for data scientists in almost every industry, it pays to stay up to date on the most popular data science tools and technologies.

In recent years, Python has been by far the most widely used programming language among data scientists. In the Kaggle survey, 86.7% of data scientists said that they use Python, more than twice the share of the second most common response. Since Python is relatively straightforward to learn, it is simple for people with no prior programming experience to read and write Python code. Many of the most popular data science tools are either written in Python or highly compatible with it.

TensorFlow is an open-source library for developing machine learning applications, created by Google. Offering users a wide range of resources and tools, TensorFlow is known for enabling machine learning developers to build large and highly complex neural networks. Additionally, TensorFlow's software libraries include a large number of pre-written models to assist with specific tasks and are highly compatible with Python.

Apache Hadoop is an open-source framework for processing and storing enormous amounts of data that is extremely popular for big data repositories. Hadoop works by distributing big data tasks across computing clusters. This is crucial because it makes it possible for a company's big data systems to function in a way that is both scalable and economical.
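Hadoop's original processing engine, MapReduce, splits a job into map and reduce phases that run across the cluster. The pure-Python sketch below imitates the map, shuffle and reduce steps of a word count inside a single process, with each string standing in for an input split that a cluster would typically process on a separate node. It does not use Hadoop's actual (Java) API, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(split: str) -> list[tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle(pairs) -> dict[str, list[int]]:
    """Group intermediate values by key, as the framework does between phases."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reducer: combine all values emitted for one key."""
    return key, sum(values)


# Each string stands in for a block of input handled by a different worker.
splits = ["big data big clusters", "data moves to compute", "Big wins"]
mapped = chain.from_iterable(map_phase(s) for s in splits)
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts["big"], counts["data"])  # 3 2
```

Because each mapper only sees its own split and each reducer only sees one key's values, both phases can be spread over many machines. That independence is what lets the approach scale.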

The R programming language is widely used for data science, more specifically for statistical modeling and analysis. Besides Python, it's probably the most important language to know for anyone working in data analysis. R and Python are used by data scientists for a lot of the same things, but there are a few key differences. R places a greater emphasis on the statistical aspects of data science than Python does.

One of the most widely used data visualization tools among data scientists, Salesforce's Tableau can analyze large amounts of both structured and unstructured data. It can then convert the data it analyzes into a variety of helpful visualizations, including interactive charts, graphs, and maps. Tableau's ability to connect to a wide range of data sources is what makes it so useful.

SAS Viya was designed specifically for data analysis, making it one of the most complete platforms available for data management and analysis. Thanks to its strong reliability, security, and ability to work with very large data sets, it is one of the most popular statistical analysis tools among large companies and organizations. In addition, SAS integrates with numerous well-known programming languages and tools to provide data scientists with extensive libraries and tools for data modeling.

Although the ubiquitous spreadsheet program may not be the first tool that comes to mind when you think of data science, it is one of the tools that data scientists use most frequently for data processing, data visualization, data cleaning, and calculation. Additionally, it is simple to pair with SQL for faster data analysis.

While unstructured data stores get a lot of press, data scientists do much of their work with structured data that resides in traditional databases. To access that data, they frequently rely on SQL (Structured Query Language).

Many of them query data from SQL-based databases such as MySQL, PostgreSQL, SQL Server, and SQLite, but you can also use SQL with big data tools like Spark and Hadoop.
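For readers who haven't written SQL, here is a minimal sketch of the kind of query involved, run from Python against SQLite through the standard-library `sqlite3` module. The `sales` table and its rows are made up for illustration. The same `SELECT ... GROUP BY` statement should run largely unchanged on MySQL, PostgreSQL, or SQL Server.

```python
import sqlite3

# In-memory database holding a small, hypothetical sales table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("West", "widget", 120.0),
        ("West", "gadget", 80.0),
        ("East", "widget", 200.0),
        ("East", "gadget", 50.0),
        ("East", "widget", 30.0),
    ],
)

# Total revenue and order count per region, largest total first.
query = """
    SELECT region, SUM(revenue) AS total, COUNT(*) AS n_orders
    FROM sales
    GROUP BY region
    ORDER BY total DESC
"""
rows = conn.execute(query).fetchall()
for region, total, n_orders in rows:
    print(f"{region}: {total:.2f} across {n_orders} orders")
```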

DataRobot uses artificial intelligence and machine learning to help users with data modeling. It really has something for everyone and aims to democratize the data modeling process. Because the platform is easy to use and requires no programming or machine learning expertise, business analysts with little programming experience can build sophisticated predictive models.

Trifacta is a well-known data science tool that can speed up the process of data wrangling and preparation. In a process that would otherwise take a very long time, Trifacta quickly transforms raw data into a format that data scientists can use for actual analysis. It works by combing through raw data sets to find possible transformations and then applying them automatically.

Read the original:

10 Best Data Science Tools and Technologies - Analytics Insight

5 newer data science tools you should be using with Python – InfoWorld

Python's rich ecosystem of data science tools is a big draw for users. The only downside of such a broad and deep collection is that sometimes the best tools can get overlooked.

Here's a rundown of some of the best newer or lesser-known data science projects available for Python. Some, like Polars, are getting more attention than before but still deserve wider notice; others, like ConnectorX, are hidden gems.

Most data sits in a database somewhere, but computation typically happens outside of a database. Getting data to and from the database for actual work can be a slowdown. ConnectorX loads data from databases into many common data-wrangling tools in Python, and it keeps things fast by minimizing the amount of work to be done.

Like Polars (which I'll discuss soon), ConnectorX uses a Rust library at its core. This allows for optimizations like being able to load from a data source in parallel with partitioning. Data in PostgreSQL, for instance, can be loaded this way by specifying a partition column.
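The idea behind partitioned loading can be sketched without ConnectorX itself. You find the range of an integer partition column, split that range into slices, and issue one range-filtered query per slice so the slices can be fetched concurrently. In ConnectorX the corresponding options are passed to `connectorx.read_sql` (for example `partition_on` and `partition_num`). The `partition_queries` helper below is hypothetical and not part of ConnectorX's API. It runs the slices one after another against an in-memory SQLite table, only to show that together they return every row exactly once.

```python
import sqlite3

def partition_queries(query, column, lo, hi, num_partitions):
    """Split `query` into range-filtered subqueries over an integer column.

    Each subquery covers a half-open slice [start, start + step), so every
    value in the closed range [lo, hi] lands in exactly one slice.
    `column` is interpolated directly, so it must be a trusted name.
    """
    step = -(-(hi - lo + 1) // num_partitions)  # ceiling division
    return [
        f"SELECT * FROM ({query}) AS t "
        f"WHERE {column} >= {start} AND {column} < {start + step}"
        for start in range(lo, hi + 1, step)
    ]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)", [(i, i * 1.5) for i in range(1, 101)]
)

# Look up the bounds of the partition column first.
lo, hi = conn.execute("SELECT MIN(id), MAX(id) FROM orders").fetchone()
parts = partition_queries("SELECT id, amount FROM orders", "id", lo, hi, 4)

# A real loader would run these concurrently; here they run sequentially.
rows = [row for q in parts for row in conn.execute(q).fetchall()]
```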

Aside from PostgreSQL, ConnectorX also supports reading from MySQL/MariaDB, SQLite, Amazon Redshift, Microsoft SQL Server and Azure SQL, and Oracle. The results can be funneled into a Pandas or PyArrow DataFrame, or into Modin, Dask, or Polars by way of PyArrow.

Data science folks who use Python ought to be aware of SQLite, a small but powerful and speedy relational database packaged with Python. Since it runs as an in-process library, rather than a separate application, it's lightweight and responsive.

DuckDB is a little like someone answered the question, "What if we made SQLite for OLAP?" Like other OLAP database engines, it uses a columnar datastore and is optimized for long-running analytical query workloads. But it gives you all the things you expect from a conventional database, like ACID transactions. And there's no separate software suite to configure; you can get it running in a Python environment with a single pip install command.

DuckDB can directly ingest data in CSV, JSON, or Parquet format. The resulting databases can also be partitioned into multiple physical files for efficiency, based on keys (e.g., by year and month). Querying works like any other SQL-powered relational database, but with additional built-in features like the ability to take random samples of data or construct window functions.
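Window functions are standard SQL, so the shape of such a query can be shown with Python's bundled `sqlite3` module (SQLite has supported window functions since version 3.25) rather than with DuckDB itself. The `readings` table is hypothetical. The same statement should also work in DuckDB, which would be the more natural engine for this kind of analytical query at scale.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor TEXT, day INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [("a", 1, 10.0), ("a", 2, 12.0), ("a", 3, 14.0),
     ("b", 1, 5.0), ("b", 2, 7.0)],
)

# Running total per sensor: the window restarts for each sensor
# and accumulates values in day order.
query = """
    SELECT sensor, day, value,
           SUM(value) OVER (PARTITION BY sensor ORDER BY day) AS running_total
    FROM readings
    ORDER BY sensor, day
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Unlike a `GROUP BY`, the window keeps every input row and attaches the aggregate alongside it, which is what makes running totals and rankings easy to express.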

DuckDB also has a small but useful collection of extensions, including full-text search, Excel import/export, direct connections to SQLite and PostgreSQL, Parquet file export, and support for many common geospatial data formats and types.

One of the least enviable jobs you can be stuck with is cleaning and preparing data for use in a DataFrame-centric project. Optimus is an all-in-one toolset for loading, exploring, cleansing, and writing data back out to a variety of data sources.

Optimus can use Pandas, Dask, CUDF (and Dask + CUDF), Vaex, or Spark as its underlying data engine. Data can be loaded in from and saved back out to Arrow, Parquet, Excel, a variety of common database sources, or flat-file formats like CSV and JSON.

The data manipulation API resembles Pandas, but adds .rows() and .cols() accessors to make it easy to do things like sort a dataframe, filter by column values, alter data according to criteria, or narrow the range of operations based on some criteria. Optimus also comes bundled with processors for handling common real-world data types like email addresses and URLs.
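Optimus's own email and URL processors aren't reproduced here. As a rough stand-in, the standard library can already split such values into useful parts. The helper names below are hypothetical, and they show only the kind of per-value transformation those processors automate across a column.

```python
from urllib.parse import urlsplit

def split_email(address):
    """Return (user, domain) for an email address, or (None, None) if malformed."""
    user, sep, domain = address.strip().rpartition("@")
    if not sep or not user or not domain:
        return None, None
    return user, domain.lower()

def url_parts(url):
    """Return the scheme, lowercased host, and path of a URL."""
    parts = urlsplit(url.strip())
    return parts.scheme, parts.hostname, parts.path

emails = ["Ada@Example.COM", "not-an-email", "grace@navy.mil "]
urls = ["https://www.example.com/reports/2023", "http://localhost:8080/data"]
print([split_email(e) for e in emails])
print([url_parts(u) for u in urls])
```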

One possible issue with Optimus is that while it's still under active development, its last official release was in 2020. This means it may not be as up-to-date as other components in your stack.

If you spend much of your time working with DataFrames and you're frustrated by the performance limits of Pandas, reach for Polars. This DataFrame library for Python offers a convenient syntax similar to Pandas.

Unlike Pandas, though, Polars uses a library written in Rust that takes maximum advantage of your hardware out of the box. You don't need to use special syntax to take advantage of performance-enhancing features like parallel processing or SIMD; it's all automatic. Even simple operations like reading from a CSV file are faster.

Polars also provides eager and lazy execution modes, so queries can be executed immediately or deferred until needed. It also provides a streaming API for processing queries incrementally, although streaming isn't available yet for many functions. And Rust developers can craft their own Polars extensions using pyo3.
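To make the eager/lazy distinction concrete, here is a toy lazy pipeline in plain Python. Each method only records a step, and nothing runs until `collect()` is called, at which point the whole plan executes at once. The `LazyPipeline` class is hypothetical and far simpler than Polars. Polars exposes the real feature through `DataFrame.lazy()`, `pl.scan_csv()`, and `LazyFrame.collect()`, and it can also optimize the recorded plan before running it.

```python
class LazyPipeline:
    """Records filter/with_column steps over dict rows; runs them on collect()."""

    def __init__(self, rows, steps=()):
        self._rows = rows
        self._steps = tuple(steps)

    def filter(self, predicate):
        # Return a new pipeline with one more step; no data is touched yet.
        return LazyPipeline(self._rows, self._steps + (("filter", predicate),))

    def with_column(self, name, func):
        # Likewise deferred: only the (name, func) pair is stored.
        return LazyPipeline(self._rows, self._steps + (("with_column", (name, func)),))

    def explain(self):
        """Describe the recorded plan without executing it."""
        return [kind for kind, _ in self._steps]

    def collect(self):
        # Execute every recorded step in order, one row at a time.
        out = []
        for original in self._rows:
            row = dict(original)  # copy so the source rows stay unchanged
            keep = True
            for kind, arg in self._steps:
                if kind == "filter":
                    if not arg(row):
                        keep = False
                        break
                else:
                    name, func = arg
                    row[name] = func(row)
            if keep:
                out.append(row)
        return out

rows = [
    {"city": "LA", "temp_f": 75},
    {"city": "SF", "temp_f": 59},
    {"city": "SD", "temp_f": 70},
]
plan = (
    LazyPipeline(rows)
    .with_column("temp_c", lambda r: round((r["temp_f"] - 32) * 5 / 9, 1))
    .filter(lambda r: r["temp_c"] > 20)
)
print(plan.explain())  # the plan exists, but no rows have been processed
result = plan.collect()
print(result)
```

Deferring work this way is what gives a lazy engine room to reorder, fuse, or skip steps before touching the data. The toy above simply replays the steps in order.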

Data science workflows are hard to set up, and even harder to set up in a consistent, predictable way. Snakemake was created to enable just that: automatically setting up data analyses in Python in ways that ensure everyone else gets the same results you do. Many existing data science projects rely on Snakemake. The more moving parts you have in your data science workflow, the more likely you'll benefit from automating it with Snakemake.

Snakemake workflows resemble GNU make workflows: you define the things you want to create with rules, which define what they take in, what they put out, and what commands to execute to accomplish that. Workflow rules can be multithreaded (assuming that gives them any benefit), and configuration data can be piped in from JSON/YAML files. You can also define functions in your workflows to transform data used in rules, and write the actions taken at each step to logs.
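The rule model described above (inputs, an output, and the command that turns one into the other) can be sketched as a tiny make-style runner in Python. This is not Snakemake's API. The `Rule` tuple and `build` function are hypothetical, and Snakemake adds much more, such as wildcards, scheduling, and environment handling. The sketch shows only three things: how the rule for a target is found, how its inputs get built first, and how a step is skipped when its output is already newer than its inputs.

```python
import os
import tempfile
from typing import Callable, NamedTuple, Tuple

class Rule(NamedTuple):
    """One step: the input files it needs and the action that writes its output."""
    inputs: Tuple[str, ...]
    action: Callable[[Tuple[str, ...], str], None]

def build(target, rules, log):
    """Build `target`, first building any inputs that have rules of their own."""
    rule = rules.get(target)
    if rule is None:
        # No rule: this must be an existing source file.
        if not os.path.exists(target):
            raise FileNotFoundError(f"no rule to make {target}")
        return
    for dep in rule.inputs:
        build(dep, rules, log)
    # Skip the step when the output exists and is at least as new as every input.
    if os.path.exists(target) and all(
        os.path.getmtime(target) >= os.path.getmtime(dep) for dep in rule.inputs
    ):
        log.append(f"up to date: {os.path.basename(target)}")
        return
    rule.action(rule.inputs, target)
    log.append(f"built: {os.path.basename(target)}")

def clean_lines(inputs, output):
    # Drop blank lines and lowercase the rest.
    with open(inputs[0]) as src, open(output, "w") as dst:
        dst.writelines(line.strip().lower() + "\n" for line in src if line.strip())

def count_lines(inputs, output):
    with open(inputs[0]) as src, open(output, "w") as dst:
        dst.write(str(sum(1 for _ in src)))

workdir = tempfile.mkdtemp()
raw_file = os.path.join(workdir, "raw.txt")
clean_file = os.path.join(workdir, "clean.txt")
count_file = os.path.join(workdir, "count.txt")
with open(raw_file, "w") as f:
    f.write("Alpha\n\nBeta\nGamma\n")

# Rules are keyed by the file they produce, as in a Makefile.
rules = {
    clean_file: Rule((raw_file,), clean_lines),
    count_file: Rule((clean_file,), count_lines),
}
log = []
build(count_file, rules, log)  # first run builds both files
build(count_file, rules, log)  # second run finds both up to date
print(log)
```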

Snakemake jobs are designed to be portable: they can be deployed on any Kubernetes-managed environment, or in specific cloud environments like Google Cloud Life Sciences or Tibanna on AWS. Workflows can be "frozen" to use some exact set of packages, and any successfully executed workflow can have unit tests automatically generated and stored with it. And for long-term archiving, you can store the workflow as a tarball.

Continued here:

5 newer data science tools you should be using with Python - InfoWorld

UCLA’s Fielding School to offer master’s of data science in health … – UCLA Newsroom

As the amount of health-related electronic data has exploded in recent years, the need for people with the skills to analyze and utilize this information in the service of public health has become crucial. To help meet that demand, the UCLA Fielding School of Public Health will offer a new master of data science in health degree program beginning this fall.

The program, designed both for working professionals seeking to thrive in this data-rich environment and for recent college graduates hoping to enter the burgeoning field, will be housed in the school's department of biostatistics. It will provide instruction in a wide range of data science methods, including statistical modeling, machine learning, and data engineering, mining, visualization and communication.

"By developing the knowledge to effectively process and deploy information from sources as varied as public health surveys, patient medical records and genomic sequencing databases, as well as growing data from wearable health devices, environmental sensors and even social media, program participants will position themselves to help shape public health policy and health industry practices well into the future," said Dr. Ron Brookmeyer, dean of the Fielding School and a distinguished professor of biostatistics.

"The ubiquity of health information presents both an unprecedented opportunity and enormous responsibility," Brookmeyer said. "With our school's longstanding academic strength in this area and our close community partnerships, we are ideally situated to lead multidisciplinary initiatives that turn data science into better health outcomes locally and globally."

The program will be delivered in hybrid mode, with in-person weekend classes and weekday online sessions. Students will typically enroll in two classes per quarter and earn their degree in two years.

Read the full release on the UCLA Fielding School of Public Health website.

Follow this link:

UCLA's Fielding School to offer master's of data science in health ... - UCLA Newsroom

Analytics and Data Science News for the Week of May 19; Updates … – Solutions Review

Solutions Review editors curated this list of the most noteworthy analytics and data science news items for the week of May 19, 2023.

Keeping tabs on all the most relevant analytics and data science news can be a time-consuming task. As a result, our editorial team aims to provide a summary of the top headlines from the last week, in this space. Solutions Review editors will curate vendor product news, mergers and acquisitions, venture capital funding, talent acquisition, and other noteworthy analytics and data science news items.

Akkio's time-series modeling lets you understand patterns, analyze influential factors, and create a model to forecast the future. And if you integrate your forecast with your data warehouse, you can see reports update live on your predictions.


The introduction of Patch Search offers the world's first solution that allows users to perform highly detailed searches using sample clusters of images. The feature lets users search specific regions, or patches, within an image, achieving a level of granularity that was previously impossible. The resulting accuracy makes search results more precise and efficient, significantly improving users' efficiency.


IBM said that its acquisition of Ahana is in line with its strategy to invest in open-source projects and foundations. The company acquired Red Hat in 2018, cementing its open-source strategy. Explaining the rationale behind the Ahana deal, IBM cited the company's contributions to the Presto open-source project. Ahana has four project committers and two technical steering committee members, IBM added.


The combined entity is led by CEO Mike Capone, completing the latest chapter in the company's strategic vision to deliver best-in-class data integration, data quality, and analytics solutions. With Talend, Qlik brings a new approach, offering a full range of best-in-class capabilities that help customers eliminate technical debt and cost while increasing enterprise confidence that trusted data is available for decision-making when it matters most.


The offering enables customers to build an end-to-end data cloud that brings together data from across the enterprise landscape by combining the SAP Datasphere solution with Google's data cloud, so businesses can view their entire data estates in real time and maximize value from their Google Cloud and SAP software investments.


The resulting solutions are part of a new global relationship between the two companies, and are expected to deliver reduced costs, improved profits, increased risk mitigation and greater customer satisfaction for Teradata/FICO customers.


The strategic acquisition of Merilytics will be the foundation of Accordion's Data & Analytics Practice, strengthening long-term support for its CFO clients. Financial terms of the private transaction were not disclosed. Founded in 2011 and headquartered in Hyderabad, India, Merilytics uses decision sciences and an analytics-based approach to generate superior data-driven returns for its PE-focused clients.


The idea behind the platform was to build a single platform that teams can then use to create their ETL pipelines and analytics workflows, as well as their machine learning pipelines. And while there are other projects on the market that offer similar orchestration capabilities, the idea here is to build a tool that is specifically built for the needs of machine learning teams.


Virtualitics will use the funding to accelerate the expansion of its AI Platform, adding more out-of-the-box machine learning and data analytics capabilities for exploring and analyzing data for financial services as well as other industries. Virtualitics allows users to make queries in plain English and generate 3D network graph visualizations that reveal important connections in data.


In this expert roundtable discussion, our panelists will share their experiences, discuss best practices for integrating technology solutions, and offer guidance for establishing a sustainable information risk program that ensures governed access to sensitive corporate data. The 60-minute virtual event is moderated by an independent industry analyst, with a topic introduction hosted by Solutions Review, and is broadcast live to an audience of registered attendees.


For consideration in future data science news roundups, send your announcements to the editor:

Tim is Solutions Review's Executive Editor and leads coverage on data management and analytics. A 2017 and 2018 Most Influential Business Journalist and 2021 "Who's Who" in Data Management, Tim is a recognized industry thought leader and changemaker. Story? Reach him via email at tking@solutionsreview dot com.

See more here:

Analytics and Data Science News for the Week of May 19; Updates ... - Solutions Review

Emory School of Nursing taps Emory Healthcare nurses as data … – Emory News Center

Three Emory Healthcare nurses have been named Project NeLL (Nurses Electronic Learning Library) Scholars at the Emory University Nell Hodgson Woodruff School of Nursing for the 2023-2024 academic year.

A collaboration between the School of Nursing and Emory Healthcare, the Project NeLL Scholars Program is a one-year data science immersion for Emory Healthcare nurses.

During the program, the scholars will learn how to use Project NeLL, the School of Nursing's powerful suite of apps for nursing data science, which enables nurses to lead data-driven solutions to health care challenges. Project NeLL provides access to 2.7 billion de-identified health records and 37 trillion data points from across the care continuum that nurses can use in their research efforts.

The Project NeLL Scholars will have the opportunity to complete a big data research project using the platform's searchable big data repository and to disseminate their findings through peer-reviewed publications.

"Big data allows nurses to gather the insights they need to create solutions that are grounded in the reality of what's happening across health care today," says Vicki Hertzberg, PhD, FASA, professor and founding director of the Center for Data Science at the School of Nursing, which operates Project NeLL. "We are delighted for the appointment of these Project NeLL Scholars, who will no doubt positively affect the nursing landscape through their research efforts."

2023-2024 NeLL Scholars are:

Stephanie Bennett, PhD, MBA, RN
Bennett is director of patient and family centered care and patient education at Emory Healthcare and adjunct assistant professor at the Nell Hodgson Woodruff School of Nursing. Bennett's research centers on strengthening the science of patient and care partner engagement with interprofessional teams and researchers to co-produce patient-centered outcomes. She has a particular interest in improving outcomes for historically underrepresented groups. She holds a PhD from the University of Cincinnati, an MBA from the University of Phoenix, and a BSN from the University of Southern Mississippi.

Monique Bouvier, PhD, ARNP, PNP-BC
Bouvier is an assistant professor at the School of Nursing and a research nurse scientist for Emory Healthcare. She is working on nursing care delivery model redesign and improvements in nursing documentation practice. She has mentored frontline nurses and nurse leaders on research, evidence-based practice, and quality improvement, and she has authored numerous peer-reviewed journal articles and presented at national and international conferences. She obtained her PhD from the University of San Diego with a research focus on influenza-like illness and symptomatology.

Darlene Rogers, PhD, RN, NPD-BC
Rogers is the nurse scientist for Emory Decatur, Emory Hillandale, and Emory Long-Term Acute Care Hospitals, and adjunct assistant professor at the Nell Hodgson Woodruff School of Nursing. She supports nurses and clinicians in their research, evidence-based practice, and quality improvement initiatives. A former gerontological medical-surgical nurse, she obtained a PhD from Mercer University, an MSN from Duke University, a post-graduate certificate in nursing informatics from Duke University, and a BSN from the Nell Hodgson Woodruff School of Nursing. Her research involves two areas: clinician perceptions of robotics in the critical care environment, and nursing care models in the acute care setting.


As one of the nation's top nursing schools, the Nell Hodgson Woodruff School of Nursing at Emory University is committed to educating visionary nurse leaders and scholars. The school offers undergraduate, master's, doctoral and non-degree programs, bringing together cutting-edge resources, distinguished faculty, top clinical experiences, and access to leading health care partners to shape the future of nursing and impact our world's health and well-being.


Emory School of Nursing taps Emory Healthcare nurses as data ... - Emory News Center