2015 is the year of data science. To mark the occasion, this NormaSTIC day is themed around this topic, which cuts across our laboratories. The program includes talks on both the foundational theories of the field and their applications. The talks vary in formality, to encourage discussion and possible collaborations.
The day will take place at GREYC (Caen), room S3-351.
Program:
- 9h30 : welcome/coffee
- 10h00 : « Pattern Aided Regression Modeling » (Guozhu Dong, Wright State University, US). Slides. See abstract below.
- 11h00 : « A scalable pattern spotting system for historical documents » (Sovann En, LITIS and GREYC). Slides. See abstract below.
- 11h30 : « Input/Output Deep Architecture for structured output classification problems » (Soufiane Belharbi and Clément Chatelain, LITIS). Slides. See abstract below.
- Lunch
- 14h00 : « Analyse stratégique de trajectoires dans du jeu vidéo compétitif » (Alexandre Letois et al., GREYC and LITIS). Slides. See abstract below.
- 14h30 : « Graphes & Cie : Fouille de données et modélisation » (Géraldine Del Mondo, LITIS). Slides. See abstract below.
- 15h00 : « Data mining pour la polypharmacologie » (Jean-Philippe Métivier, CERMN and GREYC).
- 15h00 : « On Preference-based (soft) pattern sets » (Bruno Crémilleux et al., GREYC). Slides. See abstract below.
- 15h30 : closing remarks and DAC axis business (review of the 2015 call for projects, preparation of the 2016 call for projects, working groups, …)
- end of the day around 16h30.
Titles and abstracts:
Guozhu Dong (Wright State University, US) « Pattern Aided Regression Modeling »
Abstract: Constructing accurate numerical prediction models is a fundamental task for a wide range of modeling and forecasting applications, including scientific modeling, medical/healthcare modeling, insurance risk modeling, loan default risk modeling, economic forecasting, and severe weather forecasting. As a result, predictive modeling is also a key ingredient of data science. In this talk I will introduce a new type of regression model, namely pattern aided regression (PXR) models. PXR models were motivated by two observations: (1) Regression modeling applications often involve complex, diverse predictor-response relationships, which occur when the optimal regression models (of popular model types) fitting distinct subgroups of the data of a given application are highly different. (2) State-of-the-art regression methods are often unable to adequately model such highly diverse predictor-response relationships. To model them adequately, a PXR model uses several pairs of a pattern and a local regression model, which respectively serve as logical and behavioral characterizations of distinct predictor-response relationships for the application, to define a prediction model. I will present a contrast pattern aided regression (CPXR) method for building accurate and easy-to-explain PXR models. In experiments, the PXR models built by CPXR are very accurate in general, often outperforming state-of-the-art regression methods by large margins. Using seven simple patterns on average and linear local regression models, those PXR models are easy to interpret. CPXR is especially effective for high-dimensional data. The CPXR methodology can also be used for analyzing prediction models and correcting their prediction errors. I will also discuss how to use CPXR for classification, including results on medical risk prediction for traumatic brain injury and heart failure.
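To make the idea of pattern and local model pairs concrete, here is a minimal sketch of how such a model might produce a prediction. It is an illustration only: the names `PatternModel` and `pxr_predict`, the weighted-average combination of matching local models, and the fallback to a default model are assumptions made for this example, not the definition used in CPXR.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, float]  # one data instance: predictor name -> value


@dataclass
class PatternModel:
    """A (pattern, local regression model) pair; all names are hypothetical."""
    pattern: Callable[[Row], bool]   # logical condition describing a subgroup
    model: Callable[[Row], float]    # local regression model for that subgroup
    weight: float = 1.0              # assumed weight used when combining models


def pxr_predict(x: Row, pairs: List[PatternModel],
                default_model: Callable[[Row], float]) -> float:
    """Predict with the local models whose patterns match x.

    Assumption: matching local models are combined by a weighted average,
    and the default model is used when no pattern matches.
    """
    matched = [p for p in pairs if p.pattern(x)]
    if not matched:
        return default_model(x)
    total_weight = sum(p.weight for p in matched)
    return sum(p.weight * p.model(x) for p in matched) / total_weight


# Two subgroups with very different predictor-response relationships.
pairs = [
    PatternModel(lambda r: r["age"] < 40, lambda r: 2.0 * r["dose"] + 1.0),
    PatternModel(lambda r: r["age"] >= 60, lambda r: -1.0 * r["dose"] + 10.0),
]


def default(r: Row) -> float:
    # Baseline model used when no pattern applies (hypothetical).
    return 0.5 * r["dose"]


young = pxr_predict({"age": 30.0, "dose": 2.0}, pairs, default)
old = pxr_predict({"age": 70.0, "dose": 2.0}, pairs, default)
middle = pxr_predict({"age": 50.0, "dose": 2.0}, pairs, default)
```

Each pattern is a readable condition and each local model is linear, which reflects the interpretability argument made in the abstract.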
Bio: Guozhu Dong is a full professor at Wright State University. His main research interests are data science, data mining and machine learning, bioinformatics, and databases. He has published over 150 articles and two books entitled “Sequence Data Mining” and “Contrast Data Mining,” and he holds 4 US patents. He is widely known for his work on contrast/emerging pattern mining and applications, and for his work on first-order maintenance of recursive and transitive closure queries/views.
Sovann En (LITIS and GREYC) « A scalable pattern spotting system for historical documents »
Abstract: Information retrieval in historical documents has long consisted of spotting words. In this paper, we focus on graphical pattern spotting. Contrary to object detection, which relies on previous examples of the query, pattern spotting relies neither on prior information about the query nor on a predefined class of graphical objects. Another challenge is the computational and storage cost of detection and localization. We propose an unsupervised, segmentation-free approach that takes advantage of recent developments in computer vision, namely product quantization (PQ) and asymmetric distance computation (ADC), to overcome these issues. We also investigate the use of new, compact descriptors for the data, namely vectors of locally aggregated descriptors (VLAD) and Fisher Vectors, instead of the usual bag-of-visual-words approach. Results obtained on medieval manuscripts from the DocExplore project show that our approach not only achieves better results but is also more efficient in terms of time and memory than standard approaches. The experiments show that VLAD and Fisher Vectors can be fruitfully used in the future for the description of historical documents.
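For readers unfamiliar with PQ and ADC, the following minimal sketch shows the general principle: a database vector is stored as a short list of centroid indices, and the distance from an uncompressed query is obtained by table lookups. The function names and the tiny hand-written codebooks are assumptions for illustration (in practice codebooks are typically learned, for instance with k-means), and this is not the system described in the talk.

```python
from typing import List, Sequence

Codebook = List[List[float]]  # centroids of one subspace


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def split(v: Sequence[float], m: int) -> List[Sequence[float]]:
    """Split v into m contiguous subvectors of equal length."""
    d = len(v) // m
    return [v[i * d:(i + 1) * d] for i in range(m)]


def pq_encode(v: Sequence[float], codebooks: List[Codebook]) -> List[int]:
    """Product quantization: index of the nearest centroid in each subspace."""
    subs = split(v, len(codebooks))
    return [min(range(len(cb)), key=lambda k: sq_dist(sub, cb[k]))
            for sub, cb in zip(subs, codebooks)]


def adc_table(query: Sequence[float], codebooks: List[Codebook]) -> List[List[float]]:
    """Distances from each query subvector to every centroid of its subspace."""
    subs = split(query, len(codebooks))
    return [[sq_dist(sub, c) for c in cb] for sub, cb in zip(subs, codebooks)]


def adc_distance(table: List[List[float]], code: List[int]) -> float:
    """Asymmetric distance: the query is exact, the database vector is quantized."""
    return sum(table[j][k] for j, k in enumerate(code))


# Hand-written codebooks: 2 subspaces of dimension 2, 2 centroids each.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 0.0], [2.0, 2.0]],
]

code = pq_encode([0.9, 1.1, 0.1, -0.1], codebooks)  # compact database entry
table = adc_table([1.0, 1.0, 0.0, 0.0], codebooks)  # computed once per query
approx = adc_distance(table, code)                  # one lookup per subspace
```

The table is built once per query and reused for every database entry, which is where the time savings come from; storing indices instead of full vectors accounts for the memory savings.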
Soufiane Belharbi and Clément Chatelain (LITIS) « Input/Output Deep Architecture for structured output classification problems »
Abstract: Pre-training of input layers has been shown to be effective for learning deep architectures, addressing the vanishing gradient issue. In this work, we propose to extend the use of pre-training to output layers in order to address structured output problems, which are characterized by dependencies between the outputs (e.g. the classes of pixels in an image labeling problem). Whereas the output structure is generally modeled using graphical models, we propose a fully neural-based model called IODA (Input Output Deep Architecture) that learns both input and output dependencies.
We apply IODA to a toy problem, as well as to two real-world problems. The first one is a medical image labeling problem, where the classes of pixels follow a particular anatomical structure. The second one is a facial landmark detection problem, where the relative positions of the landmarks are strongly dependent.
Alexandre Letois, Alexandre Pauchet and François Rioult (LITIS and GREYC) « Analyse stratégique de trajectoires dans du jeu vidéo compétitif »
Abstract: The world of sport generates a large amount of data, but these data are hard to collect. However, a new kind of sport has been gaining importance in recent years: electronic sport, or e-sport. It consists of competitive video games in which professional players face each other. This environment naturally generates a large amount of easily exploitable data. With competition prizes growing ever larger, a tool that processes these data automatically could become a major asset. The work presented examines whether strategic information can be obtained from the large amount of available data. The intuition is that mining player trajectories makes it possible to predict and analyze their strategic intentions.
Géraldine Del Mondo (LITIS) « Graphes & Cie : Fouille de données et modélisation »
Abstract: Through two application areas, bioinformatics and geomatics, this talk reviews some work in which graphs are used for modeling purposes. In one case the goal is to explicitly extract information from the data; in the other, the aim is above all to model phenomena while preserving the semantics of the relations between the entities that compose them.
Patrice Boizumault, Bruno Crémilleux, Samir Loudni and Willy Ugarte (GREYC) « On Preference-based (soft) pattern sets »
Abstract: In the last decade, the pattern mining community has witnessed a sharp shift from efficiency-based approaches to methods which can extract more meaningful patterns. Recently, new methods adapting results from multi-criteria decision analysis, such as Pareto efficiency or skylines, have been studied. Within pattern mining, this novel line of research allows the easy expression of preferences according to a dominance relation on a set of measures and avoids the well-known threshold issue. In this talk, we present the discovery of soft skyline patterns (or soft skypatterns) combining data mining and constraint programming techniques. To avoid an a priori choice of measures, we propose to use the skypattern cube according to the set of measures. Navigation through the cube indicates differences and similarities between skypattern sets when a measure is added or removed, highlighting the role of the measures.
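The dominance relation behind skylines can be illustrated with a short sketch. It computes the plain (not soft) skyline of a handful of patterns by pairwise comparison; the pattern names, the two measures and their values are made up for the example, and the approach presented in the talk relies on constraint programming rather than this naive enumeration.

```python
from typing import Dict, Set, Tuple

Measures = Tuple[float, ...]  # one value per measure, higher is better


def dominates(a: Measures, b: Measures) -> bool:
    """a dominates b: at least as good on every measure, strictly better on one."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def skyline(scored: Dict[str, Measures]) -> Set[str]:
    """Patterns that no other pattern dominates (the Pareto front)."""
    return {
        name
        for name, m in scored.items()
        if not any(dominates(other, m) for other in scored.values())
    }


# Hypothetical patterns scored on (frequency, growth rate).
patterns = {
    "A": (5.0, 1.0),
    "B": (3.0, 4.0),
    "C": (2.0, 3.0),  # dominated by B
    "D": (5.0, 0.5),  # dominated by A
}

front = skyline(patterns)
```

No threshold appears anywhere: A and B are both kept because each is best on one measure, which is the sense in which the abstract says skylines avoid the threshold issue.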
Practical information
Administrative documents (the mission order must be completed BEFORE the trip).
- For GREYC: mission order (request for a mission order and for use of a personal vehicle)
- For LITIS: mission order (mission order request), request to use a personal vehicle (vehicle use)
Access maps for getting to GREYC.
Access maps are available here.