Science

When tools for collecting, sharing, and analyzing data improve, science advances faster. But many of today’s tools are reaching their breaking points—they fail in the face of growing workloads and increasing complexity, and they have seldom been designed with sustainability in mind. Within IST, we are developing new systems that handle today’s work easily and scale up well. We study the fundamental mathematics and physics of information and apply what we learn to build better tools from the ground up.

Research thrusts

  • Energy-saving, high-performance ways to exchange, process, and store data
  • Tools and methods for extracting vital information from massive datasets
  • Simulations and 3-D visualizations that allow scientists to make accurate predictions even in the presence of uncertainty
  • Tools that help researchers collect and classify data, using machine-learning and machine-vision technologies

What is one of the biggest challenges in every major scientific experiment? Here’s a hint: the Large Hadron Collider alone produces 15 million gigabytes of data annually. Big experiments process data using computer clusters around the world. They need accurate, lightning-fast data transmission. With students and collaborators, Professor of Computer Science and Electrical Engineering Steven Low invented an algorithm called FAST TCP to move immense amounts of data across high-speed networks. Physicists shattered speed records using FAST TCP, repeatedly winning the Supercomputing Bandwidth Challenge. FAST TCP has moved into the commercial sector, where it accelerates video content delivery and traffic for one of the world’s largest social networks.
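FAST TCP’s central idea is to use queueing delay, rather than packet loss, as the congestion signal, so a sender can keep a long, fast link full without first overflowing routers. Below is a minimal Python sketch of the delay-based window-update rule described in the FAST TCP papers; the parameter values are illustrative, not those of any deployed implementation.

    def fast_window_update(w, base_rtt, avg_rtt, alpha=200.0, gamma=0.5):
        """One periodic update of a FAST-style congestion window.

        w        -- current window size, in packets
        base_rtt -- smallest round-trip time observed (propagation-delay estimate)
        avg_rtt  -- current average round-trip time (includes queueing delay)
        alpha    -- target number of this flow's packets queued in the network
        gamma    -- smoothing factor between 0 and 1

        The window grows while measured delay stays near base_rtt (an empty
        path) and levels off once about alpha packets sit in queues, so the
        link stays busy without waiting for packet loss as the signal.
        """
        target = (base_rtt / avg_rtt) * w + alpha
        return min(2.0 * w, (1.0 - gamma) * w + gamma * target)

    # On a nearly empty 100 ms path the window grows each round, heading toward
    # the point where roughly alpha of its packets are queued in the network.
    w = 1000.0
    for _ in range(6):
        w = fast_window_update(w, base_rtt=0.100, avg_rtt=0.102)
        print(round(w))

Because queueing delay rises smoothly as buffers fill, while loss is a late, all-or-nothing signal, a delay-based update like this can hold the sending rate steady near capacity on very high-bandwidth paths.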

Studies of Drosophila melanogaster—the common fruit fly—have yielded insights into neurodegenerative disorders, autism, aging, cancer, and more. In these studies, fruit-fly behavior is often hand-classified by students watching videotape. Allen E. Puckett Professor of Electrical Engineering Pietro Perona and colleagues recently developed machine-vision technology and accompanying mathematical models and computer code that automate tracking and classification. Free software packages developed by Perona, postdocs Kristin Branson and Heiko Dankert, and colleagues are widely used in the scientific community. David Anderson—the Seymour Benzer Professor of Biology—used a system developed by the Perona group in a study that revealed links between dopamine abnormalities and exaggerated hyperactivity; its results could influence treatment of ADHD and learning deficits.
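The general recipe behind such systems is to track each animal frame by frame, derive per-frame features such as speed and distance to a neighbor from the trajectories, and then label behaviors from those features instead of by hand. The toy Python illustration below conveys the idea only; the feature choices and thresholds are invented here and are not taken from the Perona group’s software, which learns its decision rules from labeled examples.

    import numpy as np

    def chase_frames(xy_a, xy_b, fps=30.0, speed_min=8.0, dist_max=5.0):
        """Flag frames where fly A appears to be chasing fly B.

        xy_a, xy_b -- (n_frames, 2) arrays of tracked centroid positions in mm
        Returns a boolean array (length n_frames - 1) marking frames where A
        moves quickly while staying close to B.  The thresholds are placeholders.
        """
        speed = np.linalg.norm(np.diff(xy_a, axis=0), axis=1) * fps   # mm per second
        dist = np.linalg.norm(xy_a[1:] - xy_b[1:], axis=1)            # mm between flies
        return (speed > speed_min) & (dist < dist_max)

    # Made-up trajectories: A trails 3 mm behind B while B walks along the x axis.
    t = np.linspace(0, 1, 31)[:, None]
    xy_b = np.hstack([40 * t, np.zeros_like(t)])
    xy_a = xy_b - np.array([[3.0, 0.0]])
    print(chase_frames(xy_a, xy_b).sum(), "of", len(t) - 1, "frames flagged as chasing")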

Postdoctoral scholar Lulu Qian and colleagues recently created the first artificial neural network made from DNA. The network played a memory game with its inventors: Qian; Shuki Bruck, the Gordon and Betty Moore Professor of Computation and Neural Systems and Electrical Engineering; and Erik Winfree, Professor of Computer Science, Computation and Neural Systems, and Bioengineering. The circuit of interacting molecules recalled memories when given incomplete patterns and could identify a pattern after examining only a subset of its features. Says Qian, “We asked: instead of having a physically connected network of neural cells, can a soup of interacting molecules exhibit brainlike behavior?” Apparently, it can. When the researchers asked questions in 27 different ways, the test-tube network responded correctly each time. The work illuminates biological information processing and principles of neural computing at the molecular and intracellular level.
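In conventional terms, the computation the molecular circuit performs is that of a Hopfield-style associative memory: a matrix of pairwise interaction weights stores a few patterns, and repeated thresholding completes a partial cue into the nearest stored memory. Here is a minimal in-silico Python sketch of that computation; the toy patterns and sizes are arbitrary, and the DNA network itself was built from strand-displacement gates rather than simulated this way.

    import numpy as np

    def train_hopfield(patterns):
        """Hebbian weight matrix for a Hopfield associative memory (entries are +/-1)."""
        p = np.asarray(patterns, dtype=float)
        w = p.T @ p
        np.fill_diagonal(w, 0.0)        # no self-connections
        return w

    def recall(w, cue, steps=10):
        """Complete a partial cue (0 = unknown bit) by repeated thresholding."""
        s = np.asarray(cue, dtype=float)
        for _ in range(steps):
            h = w @ s
            s = np.where(h > 0, 1.0, np.where(h < 0, -1.0, s))  # keep a bit if its field is zero
        return s

    memories = [[1, -1, 1, -1, 1, -1],
                [1,  1, 1, -1, -1, -1]]
    w = train_hopfield(memories)
    print(recall(w, [1, -1, 1, 0, 0, 0]))   # completes the cue to the first stored memory

Each stored pattern becomes a stable state of the dynamics, so an incomplete or noisy cue settles onto the closest memory, which is exactly the behavior the test-tube network exhibited.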

Can an algorithm fix a blurry photo or an error-ridden dataset? Assistant Professor of Applied and Computational Mathematics Joel Tropp explores computational ways to recover information by exploiting underlying patterns in the original data. Surprisingly often, a few basic factors allow a faithful representation of the missing information. This is a modern manifestation of Occam's Razor: do not add terms to a model unless they substantially improve its explanatory power. Balancing the complexity of a model against its fidelity to the data, Tropp's algorithms work incrementally, cherry-picking influential factors to quickly solve complex problems. This work has applications in compressed sensing, a technique for gathering information quickly when the act of measurement is damaging or expensive—from MRI scans on children to characterizations of fragile and fleeting quantum phenomena.
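One greedy method of exactly this incremental, cherry-picking kind is orthogonal matching pursuit, whose recovery guarantees Tropp helped establish: at each step it selects the factor that explains the most remaining error, then refits. A compact Python sketch follows; the random measurement setup is illustrative.

    import numpy as np

    def omp(A, y, n_nonzero):
        """Orthogonal matching pursuit: greedily recover a sparse x with y ~= A @ x."""
        support, residual = [], y.copy()
        x = np.zeros(A.shape[1])
        for _ in range(n_nonzero):
            # Pick the column that explains the most of what is still unexplained.
            support.append(int(np.argmax(np.abs(A.T @ residual))))
            # Refit on the chosen columns, then update the residual.
            coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
            residual = y - A[:, support] @ coef
        x[support] = coef
        return x

    # Example: recover a 3-sparse signal of length 100 from 30 random measurements.
    rng = np.random.default_rng(0)
    A = rng.standard_normal((30, 100)) / np.sqrt(30)
    x_true = np.zeros(100)
    x_true[[5, 40, 77]] = [2.0, -1.5, 1.0]
    x_hat = omp(A, A @ x_true, n_nonzero=3)
    print(np.flatnonzero(np.abs(x_hat) > 1e-8))   # indices of the recovered factors

With far fewer measurements than unknowns, the sparsity assumption is what makes recovery possible, which is the premise of compressed sensing.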

In today’s deluge of data, sorting out relevant information is a central challenge—an astronomer wants data about one kind of galaxy, for example, or a doctor needs a patient’s white-blood-cell counts by class to make a diagnosis. Today, machines help sort data and images, recognizing patterns, learning from hints, and modeling complex behaviors. Their adeptness is partly due to Professor of Electrical Engineering and Computer Science Yaser Abu-Mostafa’s research and teaching in machine learning. His favorite teaching opportunity came when Netflix provided 100 million anonymized data points to computer scientists and statisticians in a bid to improve the way the site offers recommendations. "It was a machine-learning bonanza," he recalled. "The students could experiment with all kinds of ideas to their heart's content."
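A workhorse technique in the Netflix competition was collaborative filtering by matrix factorization: give every user and every movie a short vector of latent factors and predict a rating from their inner product, learning the factors from the observed ratings. The minimal Python sketch below trains such a model by stochastic gradient descent; the toy data and hyperparameters are invented for illustration and this is not code from Abu-Mostafa’s course or the competition.

    import numpy as np

    def factorize(ratings, n_users, n_items, k=8, lr=0.02, reg=0.05, epochs=200, seed=0):
        """Learn latent factors so that user u's rating of item i is roughly U[u] @ V[i].

        ratings -- iterable of (user_index, item_index, rating) triples
        """
        rng = np.random.default_rng(seed)
        U = 0.1 * rng.standard_normal((n_users, k))
        V = 0.1 * rng.standard_normal((n_items, k))
        for _ in range(epochs):
            for u, i, r in ratings:
                pu, qi = U[u].copy(), V[i].copy()
                err = r - pu @ qi                        # how far off the current prediction is
                U[u] += lr * (err * qi - reg * pu)       # gradient step with L2 shrinkage
                V[i] += lr * (err * pu - reg * qi)
        return U, V

    # Toy data: 3 users, 4 movies, six observed ratings on a 1-5 scale.
    ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 4), (2, 3, 2)]
    U, V = factorize(ratings, n_users=3, n_items=4)
    print(round(float(U[0] @ V[2]), 2))   # a prediction for a rating user 0 never gave

The learned factors generalize to unrated movies because users and movies that behave similarly end up with similar vectors.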
