ML2R became the Lamarr Institute – find all current information here!

Coordinator

Dr. Ramses Sanchez

Contact

Challenges and Open Research Questions

Machine learning (ML) can be broadly understood as the study and design of algorithms that build models from data to solve complex problems. Three open problems are central to current ML research:

First, one often encounters problems for which purely data-driven models fall short and data-independent information has to be injected into the learning process. This occurs, for example, when not enough data is available to train a model that generalizes sufficiently well, or when the model needs to meet some problem-specific constraint. How to incorporate such prior knowledge into the ML pipeline in a precise and principled way is one open problem in current ML research.

Second, the performance of ML models strongly depends on which representations of the data are used to build them. ML models should be able to uncover meaningful and causal factors of variation (representations) in data, from which one can understand and reason about the data-generating mechanisms. How to drive models to automatically learn meaningful representations is another fundamental question in ML.

Third, many modern ML models and algorithms are themselves complex systems, whose theoretical analysis is fundamental for their interpretability and explainability. Devising techniques to formally characterize training and generalization in these systems is a third, longstanding goal in ML.

Hybrid ML research is split into three areas, Informed ML, Representation Learning and Theoretical ML, each of which addresses one of these problems in turn.

Current Research Highlights and Activities

Informed ML

Informed ML research addresses a variety of complex problems in fields ranging from natural language processing and computer vision to astrophysics.

Recent contributions

A Taxonomy and Survey of Informed ML approaches:
We have recently proposed a definition of informed ML as learning from hybrid sources that consist of data and additional prior knowledge. Our contribution introduces a taxonomy that classifies informed ML approaches according to the source of knowledge, its representation, and its integration into the ML pipeline.
Using Generative Adversarial Networks to estimate response times:
An example of informed ML is our recently developed deep generative model for response times, suited to the large datasets typical of modern service providers. Our approach is informed by queueing theory and uses Wasserstein Generative Adversarial Networks to learn response-time distributions conditioned on the continuous arrival process. This work constitutes a first bridge between queueing systems and deep generative modelling.
Neural network modelling of environmental stresses in agriculture:
Another informed ML highlight is our method to combine expert knowledge with data-driven techniques for modelling in agriculture, which won first place at the 2019 Syngenta Crop Challenge/Data Science Competition.
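
As a rough illustration of how queueing theory can serve as prior knowledge for response-time modelling, the sketch below simulates an M/M/1 first-come-first-served queue and compares its response times with two exponential candidates using the empirical Wasserstein-1 distance. It is not the published model: the M/M/1 setting, the rates, and the function names `simulate_mm1_response_times` and `wasserstein_1d` are assumptions chosen for illustration, and no neural network is involved.

```python
import random


def simulate_mm1_response_times(arrival_rate, service_rate, n_jobs, rng):
    """Response (waiting + service) times of an M/M/1 FCFS queue.

    Uses the Lindley recursion: the next job waits for whatever work
    is still left in the system when it arrives.
    """
    wait = 0.0
    response_times = []
    for _ in range(n_jobs):
        service = rng.expovariate(service_rate)
        response_times.append(wait + service)
        interarrival = rng.expovariate(arrival_rate)
        wait = max(0.0, wait + service - interarrival)
    return response_times


def wasserstein_1d(xs, ys):
    """Empirical Wasserstein-1 distance between two equal-size 1D samples."""
    if len(xs) != len(ys):
        raise ValueError("samples must have the same size")
    pairs = zip(sorted(xs), sorted(ys))
    return sum(abs(a - b) for a, b in pairs) / len(xs)


rng = random.Random(0)
lam, mu, n = 0.5, 1.0, 20000  # illustrative rates, not taken from the source

simulated = simulate_mm1_response_times(lam, mu, n, rng)
# Queueing theory: stationary M/M/1 response times are Exp(mu - lam).
informed = [rng.expovariate(mu - lam) for _ in range(n)]
# A candidate that ignores queueing delay and models service time only.
uninformed = [rng.expovariate(mu) for _ in range(n)]

d_informed = wasserstein_1d(simulated, informed)
d_uninformed = wasserstein_1d(simulated, uninformed)
print(f"W1 to Exp(mu - lam): {d_informed:.3f}")
print(f"W1 to Exp(mu):       {d_uninformed:.3f}")
```

The candidate that respects the queueing-theoretic result should sit much closer to the simulated data than the one that ignores waiting. A Wasserstein GAN optimizes a learned estimate of a distance of this kind rather than computing it from sorted samples.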

Research directions

Information extraction and knowledge base population from structured documents via probabilistic soft logic and neural networks
Scene understanding from primitive shapes, object detection and pose estimation
Knowledge-constrained learning from astroparticle simulations
Neural-based knowledge graph reasoning
Password-guessing algorithms via variational autoencoders
Learning similarity functions for structured data
Predicting strain-stretch behavior of non-woven fiber materials

Representation Learning

Our research mainly focuses on different aspects of representation learning for natural language processing.

Recent contributions

Does an Artificial Intelligence have a sense of humor?
Research is ongoing, but Artificial Intelligence (AI) is getting better at understanding jokes. Recent results from hybrid Machine Learning research on unifying vector embeddings and symbolic knowledge play a key role. In the exhibit “AI and Humor,” we tell a joke. Using interactive graphics, we explain what makes us humans laugh and what humor looks like from the perspective of AI research.
Imposing tree structures onto vector representations:
We have successfully imposed tree-structured category information onto vector embeddings by promoting the latter into N-balls in a higher-dimensional space, so that (i) vector embeddings are well preserved, and (ii) symbolic tree structures are encoded, without error, by inclusion relations among balls. This approach inherits good features from both symbolic and deep learning techniques.
Learning representations for dynamic language models:
We build on neural network models for text processing to incorporate temporal information and model how review data changes over time. Specifically, we use the dynamic representations of recurrent point process models, which encode the history of when business or service reviews are received, to generate instantaneous language models with improved prediction capabilities.
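
The ball-inclusion encoding described in the N-ball contribution above can be made concrete with a small geometric check. This is a minimal sketch, not the published construction: the `(center, radius)` representation, the helper names, the toy categories, and the requirement that sibling balls be disjoint are all assumptions made for illustration.

```python
import math
from itertools import combinations


def contains(outer, inner):
    """True if ball `inner` lies entirely inside ball `outer`.

    A ball is a (center, radius) pair; inclusion holds when the distance
    between centers plus the inner radius does not exceed the outer radius.
    """
    (c_out, r_out), (c_in, r_in) = outer, inner
    return math.dist(c_out, c_in) + r_in <= r_out


def disjoint(a, b):
    """True if the two balls do not overlap."""
    (c_a, r_a), (c_b, r_b) = a, b
    return math.dist(c_a, c_b) >= r_a + r_b


def tree_is_encoded(balls, parent):
    """Check that every child ball is inside its parent's ball and that
    siblings are disjoint (the latter is an assumption of this sketch)."""
    for child, par in parent.items():
        if not contains(balls[par], balls[child]):
            return False
    for a, b in combinations(parent, 2):
        if parent[a] == parent[b] and not disjoint(balls[a], balls[b]):
            return False
    return True


# Toy 2D example with hypothetical categories.
balls = {
    "animal": ((0.0, 0.0), 10.0),
    "dog": ((3.0, 0.0), 2.0),
    "cat": ((-3.0, 0.0), 2.0),
}
parent = {"dog": "animal", "cat": "animal"}

print(tree_is_encoded(balls, parent))           # True
print(contains(balls["dog"], balls["animal"]))  # False
```

Because inclusion is a hard geometric condition, a category relation either holds or it does not, which is one way to read the statement that the tree structure is encoded without error.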

Research directions

Ball embedding for knowledge representation and reasoning
Representation learning for dynamic language models
Deep generative models of semantic structure for natural language understanding
Disentangled representations for language modelling and controlled text generation
Neural-based variational inference for stochastic processes in continuous-time

Theoretical ML

We study different aspects of generalization and training of deep and wide neural networks.

Recent contributions

Heating up data spaces to understand decision problems:
We have recently studied classifiers’ decision boundaries via Brownian motion processes in ambient data space and associated probabilistic techniques. Intuitively, our ideas correspond to placing a heat source at the decision boundary and observing how effectively the sample points warm up. Our technique is not equivalent to well-known methods and offers a different way to address decision boundary geometry. In particular, it leads to refinements and improvements of previous state-of-the-art decision boundary analysis, as well as new curvature estimates and more detailed boundary density estimation.
Identifying structures in large graphs:
We have adapted sampling techniques from combinatorics to the problem of probabilistic subtree mining in arbitrary databases of many small to medium-size graphs or a single large graph. Our method approximately counts or decides subtree isomorphism for arbitrary transaction graphs in sub-linear time with one-sided error. This allowed us to obtain approximate counts for patterns more than five times as large as those handled by the current state of the art, in graphs with millions of vertices and edges.
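
The heat-source intuition from the decision boundary contribution above can be sketched with a simple Monte Carlo experiment: start Brownian paths at a sample point and record how often they cross the decision boundary within a fixed time. This is only a simplified proxy under stated assumptions, not the published analysis: the linear classifier, the time horizon, the discretization, and the names `linear_predict` and `hit_probability` are all hypothetical.

```python
import math
import random


def linear_predict(x, w=(1.0, 0.0), b=0.0):
    """Toy linear classifier; its decision boundary is the line x[0] = 0."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score > 0 else -1


def hit_probability(x0, predict, t_max, n_steps, n_walks, rng):
    """Fraction of discretized Brownian paths started at x0 whose
    predicted label changes (the boundary is crossed) before t_max."""
    sigma = math.sqrt(t_max / n_steps)  # std of each Gaussian increment
    label0 = predict(x0)
    hits = 0
    for _ in range(n_walks):
        x = list(x0)
        for _ in range(n_steps):
            x = [xi + rng.gauss(0.0, sigma) for xi in x]
            if predict(x) != label0:
                hits += 1
                break
    return hits / n_walks


rng = random.Random(0)
near, far = (0.2, 0.0), (2.0, 0.0)  # distances 0.2 and 2.0 from the boundary
p_near = hit_probability(near, linear_predict, 1.0, 200, 2000, rng)
p_far = hit_probability(far, linear_predict, 1.0, 200, 2000, rng)

# For a flat boundary at distance d, the reflection principle gives the
# continuous-time hitting probability erfc(d / sqrt(2 * t)).
exact_near = math.erfc(0.2 / math.sqrt(2.0))
print(f"near: simulated {p_near:.3f}, continuous-time value {exact_near:.3f}")
print(f"far:  simulated {p_far:.3f}")
```

Points close to the boundary "warm up" quickly (high hitting probability), while distant points stay cold. The discretized walk slightly underestimates the continuous-time value because crossings between steps are missed.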

Research directions

Stable rank and spectral norm effects on the geometry and expressiveness of deep neural networks
Out-of-equilibrium dynamics of training in wide neural networks
Training dynamics of deep generative models via gradient flows in probability space
Maximal closed set and half-space separations in finite closure systems


