Data management challenges @ DB group HADAS

Efficient and distributed service based data management

Data management consists of a set of processes for querying, organizing, indexing and replicating data on persistence supports (disk, memory, cache), in order to enable the exploitation of information (e.g., analysis, aggregation) and to guarantee data integrity. We identify four research challenges:

(1) Design and programming of algorithms for optimized querying, data access and retrieval, and data organization on storage supports;
(2) Building data management systems by designing efficient processes that implement the data management algorithms of (1);
(3) Implementation of data management systems on top of target platforms (e.g., service-based platforms);
(4) Deployment of data management processes on target execution platforms (e.g., P2P, cloud).

The research done by the members[1] of the HADAS group working on data management issues concerns points (1), (2) and (3)[2]. There are important results on point (4) through the projects e-CLOUDSS – http://e-cloudss.imag.fr, redSHINE – http://redshine.imag.fr – and CLEVER – http://clever.imag.fr.

We address the challenges introduced by the design of algorithms for managing data (1). In particular, we propose algorithms for computing “hybrid” query plans and for query optimisation using machine learning and operations research techniques (see the projects UBIQUEST – http://ubiquest.imag.fr/ – and OPTIMACS – http://optimacs.imag.fr).

We also address the design of algorithms and protocols for managing storage supports (cache and disk) and for composing event flows, thereby observing the usage model of the resources needed for implementing data management processes.

Major results: a service-based model of hybrid queries, an algorithm for evaluating queries by coordinating services, the prototype HYPATIA[3], service-based query evaluation over continuous and on-demand services[4], the language MQLiST, and a cache model for mashups.
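To give an intuition of query evaluation over continuous and on-demand services, the following minimal Python sketch joins a stream of events with an on-demand lookup. It is an illustration only, not the HYPATIA algorithm: we assume here that a continuous service can be modelled as an iterator of dictionaries and an on-demand service as a function called once per event. The names `hybrid_join`, `stream`, `lookup` and `key` are hypothetical.

```python
from typing import Callable, Dict, Iterable, Iterator, Optional

Record = Dict[str, object]


def hybrid_join(
    stream: Iterable[Record],
    lookup: Callable[[object], Optional[Record]],
    key: str,
) -> Iterator[Record]:
    """Join each event of a continuous source with an on-demand lookup.

    stream: stands in for a continuous service (assumption: an iterable).
    lookup: stands in for an on-demand service (assumption: a function
            returning a record, or None when nothing matches).
    key:    attribute of the event used as the lookup argument.
    """
    for event in stream:
        extra = lookup(event[key])
        if extra is not None:
            # Merge the two records; events without a match are dropped.
            yield {**event, **extra}


# Usage: positions arrive continuously, profiles are fetched on demand.
positions = [{"user": "ana", "lat": 1.0}, {"user": "bob", "lat": 2.0}]
profiles = {"ana": {"age": 30}}
joined = list(hybrid_join(iter(positions), profiles.get, "user"))
```

Because the function is a generator, results are produced as events arrive, which is the property a continuous query needs; a real evaluator would also have to deal with service failures and time windows.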

Students participating in these research problems: Carlos Manuel López Enríquez, Lourdes A. Martínez Medina, Mohamad Othman Abdallah, Juan Carlos Castrejón, Esteban Gutiérrez, Epal Njanem Orléant.

The construction and implementation of data management systems (2, 3) is an important and re-emerging challenge[5]. We use coordination models (workflows) for defining querying, optimization and data storage services. This approach stems from the BPM community, and we use it for describing data management processes. We contribute to database research with a process-oriented approach, rather than to the software engineering domain, which proposes services as architecture units (see for example the research done in the group ADELE at LIG – http://www-adele.imag.fr/).

Once data management is provided as a coordination of services, it is necessary to ensure its reliability and the integrity of the data it manages. We propose systems that ensure these properties, based on a policy model and its associated language for defining and enforcing them within data management services. We have defined policies for ensuring specific properties such as exception handling, atomicity, security and persistence. We have also proposed an environment called Pi-SODM for building service coordinations.
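As an illustration of what an atomicity policy over a service coordination can mean, here is a minimal Python sketch. It does not use the policy language or the Pi-SODM environment mentioned above; it assumes that each step of a coordination is given as a pair (action, compensation), and all names (`run_atomically`, `Step`) are hypothetical.

```python
from typing import Callable, List, Tuple

# Assumption: a coordination step is an action plus the compensation
# that undoes it once it has completed.
Step = Tuple[Callable[[], object], Callable[[], None]]


def run_atomically(steps: List[Step]) -> Tuple[bool, List[object]]:
    """Run all steps in order, or leave no visible effect.

    If a step raises, the compensations of the steps that already
    completed are executed in reverse order and (False, []) is returned.
    """
    completed: List[Callable[[], None]] = []
    results: List[object] = []
    try:
        for action, compensate in steps:
            results.append(action())
            completed.append(compensate)
    except Exception:
        # Exception handling policy: undo what was done, newest first.
        for compensate in reversed(completed):
            compensate()
        return False, []
    return True, results
```

The step that fails is not compensated, since it did not complete; only earlier steps are undone. A policy-based system would typically attach this behaviour declaratively to a coordination instead of hard-coding it in the caller.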

We also address dynamic service substitution in service coordinations. Substitution takes into account functional and non-functional properties (response time, reliability, availability, cost), and it serves to implement recovery strategies that reinforce some integrity and reliability properties.
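The selection of a substitute can be sketched as follows. This is a minimal example under stated assumptions, not the method developed in the group: functional equivalence is reduced to offering the same operation name, and the non-functional properties are combined with a simple weighted sum. `ServiceDescriptor`, `select_substitute` and the default weights are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ServiceDescriptor:
    name: str
    operation: str            # functional property (assumed: operation name)
    response_time_ms: float   # lower is better
    reliability: float        # in [0, 1], higher is better
    availability: float       # in [0, 1], higher is better
    cost: float               # lower is better


def select_substitute(
    failed: ServiceDescriptor,
    candidates: List[ServiceDescriptor],
    weights: Optional[Dict[str, float]] = None,
) -> Optional[ServiceDescriptor]:
    """Return the best functionally equivalent candidate, or None."""
    w = weights or {"response_time": 0.25, "reliability": 0.25,
                    "availability": 0.25, "cost": 0.25}
    # Functional filter: same operation, and not the failed service itself.
    equivalent = [c for c in candidates
                  if c.operation == failed.operation and c.name != failed.name]
    if not equivalent:
        return None
    # Normalise the "lower is better" properties by the worst candidate.
    max_rt = max(c.response_time_ms for c in equivalent) or 1.0
    max_cost = max(c.cost for c in equivalent) or 1.0

    def score(c: ServiceDescriptor) -> float:
        return (w["reliability"] * c.reliability
                + w["availability"] * c.availability
                + w["response_time"] * (1 - c.response_time_ms / max_rt)
                + w["cost"] * (1 - c.cost / max_cost))

    return max(equivalent, key=score)
```

A recovery strategy could call `select_substitute` when a service invocation fails and then retry the step with the returned service; the weights let an application state which non-functional property matters most.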

Major results: a policy-based reliable service coordination model and its associated language; atomicity, persistence and security policies for service coordinations[6]; the Pi-SODM environment.

Students participating in these research problems: Javier A. Espinosa Oviedo, Placido A. Souza Neto, Christiane Kamdem.

We have some results concerning the deployment of data management services on large-scale target execution environments (4, 1). The PhD work of Juan Carlos Castrejón, Esteban Gutiérrez and Epal Orléant proposes, respectively, algorithms for storing, replicating and observing the distribution of resources on the cloud.

[1] Genoveva Vargas Solar, Christine Collet, Christophe Bobineau, Noha Ibrahim.

[2] For example, the BP-GYO algorithm in the PhD thesis of Víctor Cuevas-Vicenttín (see the COOPIS publications).

[3] Víctor Cuevas-Vicenttín, Genoveva Vargas-Solar, Christine Collet, Evaluating Hybrid Queries through Service Coordination in HYPATIA, In Proceedings of the 15th International Conference on Extending Database Technology (EDBT), Berlin, Germany, 2012

[4] Víctor Cuevas-Vicenttín, Christine Collet, Genoveva Vargas-Solar, Noha Ibrahim and Christophe Bobineau, Coordinating services for accessing and processing data in dynamic environments, In Proceedings of the OTM 2010 Conferences, COOPIS 2010, LNCS, 2010

[5] Ionut Subasu, Patrick Ziegler, Klaus R. Dittrich, Towards Service-Based Data Management Systems, BTW Workshops 2007: 296-306

Michael J. Carey, Inside “Big Data Management”: Ogres, Onions, or Parfaits?, In Proceedings of the 15th International Conference on Extending Database Technology (EDBT), Berlin, Germany, 2012

http://www.systems.ethz.ch/

[6] Javier-A. Espinosa-Oviedo, Genoveva Vargas-Solar, José-Luis Zechinelli-Martini and Christine Collet, Policy driven services coordination for building social networks based applications, In Proceedings of the 8th International Conference on Services Computing (SCC’11), Work-in-Progress Track, Washington, DC, USA, 2011

P.A. Souza Neto, M.A. Musicante, G. Vargas-Solar and J.L. Zechinelli-Martini, PEWS-CT: Adding Contract Support to a Web Service Composition Language, LTPD 2010, 4th Workshop on Languages and Tools for Multithreaded, Parallel and Distributed Programming, Salvador, Bahia – Brazil, 2010