NSF Leads Federal Efforts In Big Data
At White House event, NSF Director announces new Big Data solicitation, $10 million Expeditions in Computing award, and awards in cyberinfrastructure, geosciences, training
(Video: Broadcast of OSTP-led federal government big data rollout, held on March 29, 2012, in the AAAS Auditorium in Washington, DC, and featuring: John Holdren, assistant to the President and director, White House Office of Science and Technology Policy; Subra Suresh, director, National Science Foundation; Francis Collins, director, National Institutes of Health; Marcia McNutt, director, United States Geological Survey; Zach Lemnios, assistant secretary of defense for research & engineering, U.S. Department of Defense; Ken Gabriel, acting director, Defense Advanced Research Projects Agency; and William Brinkman, director, Department of Energy Office of Science.
Each official announced initiatives that his or her agency is undertaking to embrace the opportunities and address the challenges afforded by the big data revolution.
The announcements were followed by a panel discussion with industry and academic thought leaders, moderated by Steve Lohr of the New York Times. Panelists were: Daphne Koller, Stanford University (machine learning and applications in biology and education); James Manyika, McKinsey & Company (co-author of major McKinsey report on Big Data); Lucila Ohno-Machado, UC San Diego (NIH’s “Integrating Data for Analysis, Anonymization, and Sharing” initiative); and Alex Szalay, Johns Hopkins University (big data for astronomy).
About Big Data: Researchers in a growing number of fields are generating extremely large and complicated data sets, commonly referred to as “big data.” A wealth of information may be found within these sets, with enormous potential to shed light on some of the toughest and most pressing challenges facing the nation.
To capitalize on this unprecedented opportunity–to extract insights, discover new patterns and make new connections across disciplines–we need better tools to access, store, search, visualize and analyze these data.
Credit: National Science Foundation)
(Image: Throughout the 2008 hurricane season, the Texas Advanced Computing Center was an active participant in a NOAA research effort to develop next-generation hurricane models.
Teams of scientists relied on TACC’s Ranger supercomputer to test high-resolution ensemble hurricane models, and to track evacuation routes from data streams on the ground and from space.
Using up to 40,000 processing cores at once, researchers simulated both global and regional weather models and received on-demand access to some of the most powerful hardware in the world, enabling real-time, high-resolution ensemble simulations of the storm.
This visualization of Hurricane Ike shows the storm developing in the gulf and making landfall on the Texas coast.
Credit: Gregory P. Johnson, Romy Schneider, John Cazes, Karl Schulz, Bill Barth, The University of Texas at Austin; Frank Marks, NOAA; Fuqing Zheng, University of Pennsylvania; Yonghui Weng, Texas A&M.)
National Science Foundation (NSF) Director Subra Suresh today outlined efforts to build on NSF’s legacy in supporting the fundamental science and underlying infrastructure enabling the big data revolution.
At an event led by the White House Office of Science and Technology Policy in Washington, D.C., Suresh joined other federal science agency leaders to discuss cross-agency big data plans and announce new areas of research funding across disciplines in this field.
NSF announced new awards under its Cyberinfrastructure for the 21st Century framework and Expeditions in Computing programs, as well as awards that expand statistical approaches to address big data.
The agency is also seeking proposals under a Big Data solicitation, in collaboration with the National Institutes of Health (NIH), and anticipates opportunities for cross-disciplinary efforts under its Integrative Graduate Education and Research Traineeship program and an Ideas Lab in which researchers will explore using large datasets to enhance the effectiveness of teaching and learning.
NSF-funded research in these key areas will develop new methods to derive knowledge from data, and to construct new infrastructure to manage, curate and serve data to communities. As part of these efforts, NSF will forge new approaches for associated education and training.
“Data are motivating a profound transformation in the culture and conduct of scientific research in every field of science and engineering,” Suresh said.
“American scientists must rise to the challenges and seize the opportunities afforded by this new, data-driven revolution. The work we do today will lay the groundwork for new enterprises and fortify the foundations for U.S. competitiveness for decades to come.”
NSF released a solicitation, “Core Techniques and Technologies for Advancing Big Data Science & Engineering,” or “Big Data,” jointly with NIH.
This program aims to extract and use knowledge from collections of large data sets in order to accelerate progress in science and engineering research. Specifically, it will fund research to develop and evaluate new algorithms, statistical methods, technologies, and tools for improved data collection and management, data analytics and e-science collaboration environments.
“The Big Data solicitation creates enormous opportunities for extracting knowledge from large-scale data across all disciplines,” said Farnam Jahanian, assistant director for NSF’s directorate for computer and information science and engineering.
“Foundational research advances in data management, analysis and collaboration will change paradigms of research and education, and promise new approaches to addressing national priorities.”
Among the awards NSF announced today is a $10 million award under the Expeditions in Computing program to researchers at the University of California, Berkeley.
The team will integrate algorithms, machines, and people to turn data into knowledge and insight. The objective is to develop new scalable machine-learning algorithms and data management tools that can handle large-scale and heterogeneous datasets, novel datacenter-friendly programming models, and an improved computational infrastructure.
NSF’s Cyberinfrastructure Framework for 21st Century Science and Engineering, or “CIF21,” is core to the agency’s strategic efforts.
CIF21 will foster the development and implementation of the national cyberinfrastructure for researchers in science and engineering to achieve a democratization of data.
In the near term, NSF will provide opportunities and platforms for science research projects to develop the appropriate mechanisms, policies and governance structures to make data available within different research communities.
In the longer term, these ground-up efforts will be integrated into a larger-scale national framework for sharing data among disciplines and institutions.
The first round of awards made through an NSF geosciences program called EarthCube, under the CIF21 framework, was also announced today.
These awards will support the development of community-guided cyberinfrastructure to integrate big data across geosciences and ultimately change how geosciences research is conducted.
Integrating data from disparate locations and sources, with eclectic structures and formats, whether archived or captured in real time, will expedite the delivery of geoscience knowledge.
“EarthCube is a groundbreaking NSF program,” said Tim Killeen, assistant director for NSF’s geosciences directorate. “It represents a dynamic new way to access, share and use data of all types to accelerate and transform research for understanding our planet. We are asking experts from all sectors–industry, academia, government and non-U.S. institutions–to form collaborations and tell us what research topics they think are most important. Their enthusiastic and energetic response has resulted in a synergy of exhilarating and novel ideas.”
NSF also announced a $1.4 million award for a focused research group that brings together statisticians and biologists to develop network models and automatic, scalable algorithms and tools to determine protein structures and biological pathways.
And, a $2 million award for a research training group in big data will support training for undergraduates, graduates and postdoctoral fellows to use statistical, graphical and visualization techniques for complex data.
“NSF is developing a bold and comprehensive approach for this new data-centric world, from fundamental mathematical, statistical and computational approaches needed to understand the data, to infrastructure at a national and international level needed to support and serve our communities, to policy enabling rapid dissemination and sharing of knowledge,” said Ed Seidel, assistant director for NSF’s mathematical and physical sciences directorate.
“Together, this will accelerate scientific progress, create new possibilities for education, enhance innovation in society and be a driver for job creation. Everyone will benefit from these activities.”
In addition, anticipated cross-disciplinary efforts at NSF include encouraging data citation to increase opportunities for the use and analysis of data sets; participation in an Ideas Lab to explore ways to use big data to enhance teaching and learning effectiveness; and the use of NSF’s Integrative Graduate Education and Research Traineeship, or IGERT, mechanism to educate and train researchers in data-enabled science and engineering.
(Image: Collaboration and concurrent visualization of 20 simulation runs performed by the International Panel on Climate Change (IPCC) using the HIPerWall (Highly Interactive Parallelized Display Wall) system.
Located at the University of California, Irvine, the HIPerWall system is a facility aimed at advancing earth science modeling and visualization by providing unprecedented, high-capacity visualization capabilities for experimental and theoretical researchers.
It’s being used to analyze IPCC datasets. The room-sized HIPerWall display measures nearly 23 x 9 feet and consists of 50 flat-panel tiles that provide a total resolution of over 200 million pixels, bringing to life terabyte-sized datasets.
Credit: Falko Kuester, California Institute for Telecommunications and Information Technology (Calit2), University of California, San Diego)
A full list of NSF data-enabled science and engineering projects follows.
The following is a list of NSF programs in the Big Data space. Program contacts are noted with each entry.
NATIONAL SCIENCE FOUNDATION (NSF), Lisa-Joy Zgorski
Core Techniques and Technologies for Advancing Big Data Science & Engineering
(Big Data) is a new joint solicitation between NSF and NIH that aims to advance the core scientific and technological means of managing, analyzing, visualizing and extracting useful information from large, diverse, distributed and heterogeneous data sets. Specifically, it will support the development and evaluation of technologies and tools for data collection and management, data analytics, and/or e-science collaborations, which will enable breakthrough discoveries and innovation in science, engineering, and medicine–laying the foundations for U.S. competitiveness for many decades to come. Suzanne Iacono
Cyberinfrastructure Framework for 21st Century Science and Engineering (CIF21) develops, consolidates, coordinates, and leverages a set of advanced cyberinfrastructure programs and efforts across NSF to create meaningful cyberinfrastructure, as well as develop a level of integration and interoperability of data and tools to support science and education. Alan Blatecky and Mark Suskin
CIF21 Track for IGERT. NSF has shared with its community plans to establish a new CIF21 track as part of its Integrative Graduate Education and Research Traineeship (IGERT) program.
This track aims to educate and support a new generation of researchers able to address fundamental Big Data challenges concerning core techniques and technologies, problems, and cyberinfrastructure across disciplines. Mark Suskin and Tom Russell
Data Citation, which provides transparency and increased opportunities for the use and analysis of data sets, was encouraged in a dear colleague letter initiated by NSF’s Geosciences directorate, demonstrating NSF’s commitment to responsible stewardship and sustainability of data resulting from federally funded research.
Data and Software Preservation for Open Science (DASPOS) is a first attempt to establish a formal collaboration of physicists from experiments at the LHC and Fermilab/Tevatron with experts in digital curation, heterogeneous high-throughput storage systems, large-scale computing systems, and grid access and infrastructure.
The intent is to define and execute a compact set of well-defined, entrant-scale activities on which to base a large-scale, long-term program, as well as an index of commonality among various scientific disciplines. Randal Ruchti, Marv Goldberg and Saul Gonzalez
Digging into Data Challenge addresses how big data changes the research landscape for the humanities and social sciences, in which new, computationally-based research methods are needed to search, analyze, and understand massive databases of materials such as digitized books and newspapers, and transactional data from web searches, sensors and cell phone records.
Administered by the National Endowment for the Humanities, this Challenge is funded by multiple U.S. and international organizations. Brett Bobley
EarthCube supports the development of community-guided cyberinfrastructure to integrate data into a framework that will expedite the delivery of geoscience knowledge.
NSF’s just-announced first round of EarthCube awards, made within the CIF21 framework via the EArly Concept Grants for Exploratory Research (EAGER) mechanism, is the first step in laying the foundation to transform the conduct of research in the geosciences. Clifford Jacobs
Expeditions in Computing has funded a team of researchers at the University of California (UC), Berkeley to deeply integrate algorithms, machines, and people to address big data research challenges.
The combination of fundamental innovations in analytics, new systems infrastructure that facilitate scalable resources from cloud and cluster computing and crowd sourcing, and human activity and intelligence will provide solutions to problems not solvable by today’s automated data analysis technologies alone. Mitra Basu
Focused Research Group, stochastic network models. Researchers are developing a unified theoretical framework for principled statistical approaches to network models with scalable algorithms in order to differentiate knowledge in a network from randomness.
Collaborators in biology and mathematics will study relationships between words and phrases in a very large newspaper database in order to provide media analysts with automatic and scalable tools. Peter Bickel and Haiyan Cai
Ideas Lab. NSF released a dear colleague letter announcing an Ideas Lab, for which cross disciplinary participation will be solicited, to generate transformative ideas for using large datasets to enhance the effectiveness of teaching and learning environments. Doris Carver
Information Integration and Informatics addresses the challenges and scalability problems involved in moving from traditional scientific research data to very large, heterogeneous data, such as the integration of new data types models and representations, as well as issues related to data path, information life cycle management, and new platforms. Sylvia Spengler
The Computational and Data-enabled Science and Engineering (CDS&E) in Mathematical and Statistical Sciences program (CDS&E-MSS), created by NSF’s Division of Mathematical Sciences (DMS) and the Office of Cyberinfrastructure (OCI), supports what is becoming a distinct discipline encompassing mathematical and statistical foundations and computational algorithms.
Proposals in this program are currently being reviewed and new awards will be made in July 2012. Jia Li
Some Research Training Groups (RTG) and Mentoring through Critical Transition Points (MCTP) awards relate to big data. The RTG project at UC Davis addresses the challenges associated with the analysis of object data–data that take on many forms, including images, functions, graphs and trees–in a number of fields such as astronomy, computer science, and neuroscience.
Undergraduates will be trained in graphical and visualization techniques for complex data, software packages, and computer simulations to assess the validity of models. The development of student sites with big data applications to climate, image reconstruction, networks, cybersecurity and cancer are also underway. Nandini Kannan
The Laser Interferometer Gravitational Wave Observatory (LIGO) detects gravitational waves, a previously unobserved form of radiation, which will open a new window on the universe.
Processing the deluge of data collected by LIGO is only possible through the use of large computational facilities across the world and the collective work of more than 870 researchers at 77 institutions. Pedro Marronetti and Tom Carruthers
The Open Science Grid (OSG) enables over 8,000 scientists worldwide to collaborate on discoveries, including the search for the Higgs boson. High-speed networks distribute over 15 petabytes of data each year in real-time from the Large Hadron Collider (LHC) at CERN in Switzerland to more than 100 computing facilities.
Partnerships of computer and domain scientists and computing facilities in the U.S. provide the advanced fabric of services for data transfer and analysis, job specification and execution, security and administration, shared across disciplines including physics, biology, nanotechnology, and astrophysics. Marv Goldberg and Saul Gonzalez
The Theoretical and Computational Astrophysics Networks (TCAN) program seeks to maximize the discovery potential of massive astronomical data sets by advancing the fundamental theoretical and computational approaches needed to interpret those data, uniting researchers in collaborative networks that cross institutional and geographical divides and training the future theoretical and computational scientists. Tom Statler and Linda Sparke
©Typologos.com 2012. The article belongs to the National Science Foundation (Press Release 12-060). Credit for the broadcast video belongs to the National Science Foundation. Image credits belong to Gregory P. Johnson, Romy Schneider, John Cazes, Karl Schulz, Bill Barth, The University of Texas at Austin; Frank Marks, NOAA; Fuqing Zheng, University of Pennsylvania; Yonghui Weng, Texas A&M; and Falko Kuester, California Institute for Telecommunications and Information Technology (Calit2), University of California, San Diego.