Michael Cochez

Assistant Professor at Vrije Universiteit Amsterdam


I have done research in several information technology related areas over the past years. Currently, I mostly work on areas related to data analysis and knowledge representation, such as knowledge graph embedding, scalable hierarchical clustering, prototype-based ontologies, ontology matching, and knowledge evolution.

Below are some more details about my past research, with references to publications. The released software related to my research can be found on the Software page.

A complete list of publications can be found on the Publications page.

Knowledge Graph Embeddings

Most Machine Learning algorithms and models assume that the features of training samples (data points) can be represented as vectors. Knowledge Graphs contain a lot of information which would be useful for training these models. However, these graphs cannot be straightforwardly transformed into vectors suitable for machine learning.

In this research we looked into approaches for transforming the nodes of large graphs into vectors, which can in turn be used to train neural networks.
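
One common family of approaches turns the graph into "sentences" by walking it and then trains a word-embedding model on those sentences. The sketch below shows only the walk-extraction step on a made-up toy graph. The graph contents, the function name and the uniform choice of the next edge are assumptions for illustration; the biased-walks publication listed below weights that choice instead, and the embedding training itself is omitted.

```python
import random

# Toy graph as adjacency lists of (edge_label, target_node) pairs.
# All node and edge names are invented for this example.
GRAPH = {
    "Amsterdam": [("locatedIn", "Netherlands")],
    "Netherlands": [("capital", "Amsterdam"), ("memberOf", "EU")],
    "EU": [],
}

def random_walks(graph, start, walks_per_node, depth, rng):
    """Return `walks_per_node` walks of at most `depth` hops starting at `start`."""
    walks = []
    for _ in range(walks_per_node):
        walk = [start]
        node = start
        for _ in range(depth):
            edges = graph.get(node, [])
            if not edges:
                break  # dead end: stop this walk early
            label, node = rng.choice(edges)  # uniform choice (an assumption)
            walk.extend([label, node])
        walks.append(walk)
    return walks

rng = random.Random(0)
# One "sentence" per walk; a word-embedding model could be trained on this corpus.
corpus = [walk for node in GRAPH for walk in random_walks(GRAPH, node, 2, 3, rng)]
```

Each walk alternates node and edge labels, so both end up as tokens with their own vectors once an embedding model is trained on the corpus.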

  • M. Cochez, P. Ristoski, S. P. Ponzetto, and H. Paulheim. Global RDF vector space embeddings. In C. d’Amato, M. Fernandez, et al., editors, The Semantic Web – ISWC 2017: 16th International Semantic Web Conference, Vienna, Austria, October 21–25, 2017, Proceedings, Part I, pages 190–207. Springer International Publishing, Cham, 2017d. ISBN 978-3-319-68288-4. doi: 10.1007/978-3-319-68288-4_12. preprint
  • M. Cochez, P. Ristoski, S. P. Ponzetto, and H. Paulheim. Biased graph walks for RDF graph embeddings. In Proceedings of the 7th International Conference on Web Intelligence, Mining and Semantics, WIMS ’17, pages 21:1–21:12, New York, NY, USA, 2017c. ACM. ISBN 978-1-4503-5225-3. doi: 10.1145/3102254.3102279. preprint

  • My presentation at ISWC 2017 http://videolectures.net/iswc2017_cochez_space_embeddings/

Scalable Hierarchical Clustering

Many data-mining techniques commonly used across research fields perform poorly on large data sets. Sequential agglomerative hierarchical non-overlapping clustering is one technique whose scaling properties prohibit clustering a large number of items. Besides their quadratic time complexity, these algorithms also have quadratic space complexity, rendering them infeasible for large data sets.

I am working on more scalable, often approximate algorithms which can be used to cluster large amounts of data.
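
To make the scaling problem concrete, here is a minimal sketch of a textbook-style, naive average-linkage clustering. It is not one of the scalable algorithms from the publications below; the function name and the toy data are assumptions. The point is the pairwise distance table, which holds n(n-1)/2 entries and therefore grows quadratically with the number of items.

```python
from itertools import combinations

def naive_average_linkage(points, distance):
    """Merge the two closest clusters until one remains.

    Returns the list of merges and the number of stored pairwise distances.
    """
    n = len(points)
    # Full pairwise distance table: n * (n - 1) / 2 entries (quadratic memory).
    dist = {frozenset((i, j)): distance(points[i], points[j])
            for i, j in combinations(range(n), 2)}

    def average(a, b):
        # Average distance over all cross-cluster pairs.
        return sum(dist[frozenset((i, j))] for i in a for j in b) / (len(a) * len(b))

    clusters = [(i,) for i in range(n)]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: average(*pair))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
        merges.append((a, b))
    return merges, len(dist)

merges, stored = naive_average_linkage([0.0, 1.0, 10.0, 11.0], lambda x, y: abs(x - y))
```

For a million items the table alone would need roughly 5 × 10^11 entries, which is why approximate methods that avoid materializing it are attractive. This naive version is also slower than quadratic, since it rescans all cluster pairs at every merge.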

  • Cochez, M., & Neri, F. (2015). Scalable Hierarchical Clustering : Twister Tries with a Posteriori Trie Elimination. In SSCI 2015 : Proceedings of the 2015 IEEE Symposium Series on Computational Intelligence. Symposium CIDM 2015 : 6th IEEE Symposium on Computational Intelligence and Data Mining (pp. 756-763). IEEE. doi:10.1109/SSCI.2015.12 preprint
  • Cochez, M., & Mou, H. (2015). Twister Tries: Approximate Hierarchical Agglomerative Clustering for Average Distance in Linear Time. In T. Sellis, S. Davidson, & Z. Ives (Eds.), SIGMOD ‘15 : Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data (pp. 505-517). New York: Association for Computing Machinery. doi:10.1145/2723372.2751521 preprint

Prototype-based Ontologies

Traditionally, knowledge bases are designed with a distinction between classes and instances. This, however, leads to several problems, especially in the semantic web, where knowledge is meant to be shared and connected. First, it is sometimes very hard to tell whether an entity is an instance or a class. For example, in an ontology describing the domain of vehicles, a Ford Model T can be seen as an instance. However, when creating an ontology for a collection of rare cars, Ford Model T would be a class. Second, the way ontologies and instances are created does not enable reuse in the sense that an existing definition can be reused multiple times. Prototype-based ontologies solve these problems by having only one sort of entity (everything is a prototype) and by allowing reuse of prototype definitions.
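
The reuse idea can be sketched in a few lines, assuming a prototype is described by a base prototype plus property values to add and to remove. The class, its field names and the example data are hypothetical and are not taken from the publication listed below.

```python
from dataclasses import dataclass, field

@dataclass
class Prototype:
    name: str
    base: "Prototype | None" = None
    add: dict = field(default_factory=dict)     # property -> set of values to add
    remove: dict = field(default_factory=dict)  # property -> set of values to remove

    def properties(self):
        """Resolve inherited properties, then apply this prototype's changes."""
        props = {}
        if self.base is not None:
            props = {k: set(v) for k, v in self.base.properties().items()}
        for prop, values in self.remove.items():
            props[prop] = props.get(prop, set()) - values
        for prop, values in self.add.items():
            props[prop] = props.get(prop, set()) | values
        # Drop properties that ended up without any value.
        return {k: v for k, v in props.items() if v}

# There is no class/instance split: each entity is a prototype that can serve as a base.
model_t = Prototype("FordModelT", add={"maker": {"Ford"}, "wheels": {4}})
my_car = Prototype("MyModelT", base=model_t, add={"colour": {"black"}})
odd_one = Prototype("ThreeWheeler", base=model_t,
                    remove={"wheels": {4}}, add={"wheels": {3}})
```

Here `model_t` plays the role of a "class" for `my_car` and of an ordinary entity in its own right, which mirrors the Ford Model T example above.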

  • Cochez, M., Decker, S., & Prud’hommeaux, E. (2016, October). Knowledge representation on the web revisited: the case for prototypes. In International Semantic Web Conference (pp. 151-166). Springer International Publishing. doi:10.1007/978-3-319-46523-4_10 preprint

Technical reports, documentation and articles (non-reviewed)

  • Cochez, M., Decker, S., & Prud’hommeaux, E. (2016). Knowledge representation on the web revisited: tools for prototype based ontologies. arXiv preprint arXiv:1607.04809 .

Frequent Pattern Mining

When coping with a large transaction database, it is often interesting to know which patterns occur frequently. In addition, it is useful if these patterns are maximal, i.e., there are no larger patterns which also occur frequently. In this research we investigated the use of Apache Spark to implement a scalable maximal frequent pattern mining approach.
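
The notion of a maximal frequent pattern can be illustrated with a brute-force sketch for itemsets on a single machine. This is not the Spark-based approach from the publication below; the function name and the toy transactions are assumptions.

```python
from itertools import combinations

def maximal_frequent_itemsets(transactions, min_support):
    """Return the frequent itemsets that have no frequent proper superset."""
    items = sorted(set().union(*transactions))
    frequent = []
    for size in range(1, len(items) + 1):
        level = [frozenset(candidate)
                 for candidate in combinations(items, size)
                 if sum(1 for t in transactions if set(candidate) <= t) >= min_support]
        if not level:
            break  # no frequent set of this size, so no larger one can be frequent
        frequent.extend(level)
    # Keep only itemsets that are not strictly contained in another frequent itemset.
    return [s for s in frequent if not any(s < other for other in frequent)]

transactions = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread", "eggs"},
    {"milk"},
]
maximal = maximal_frequent_itemsets(transactions, min_support=2)
```

With a minimum support of 2, the single items are all frequent but none of them is maximal, because each is contained in a frequent pair. Enumerating all candidates this way is exponential in the number of items, which is what motivates scalable approaches.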

  • M. R. Karim, M. Cochez, O. D. Beyan, C. F. Ahmed, and S. Decker. Mining maximal frequent patterns in transactional databases and dynamic data streams: a spark-based approach. Information Sciences, Elsevier, 2017

Scalable Ontology Matching

I worked on a scalable approach for string-based ontology matching. The main idea is to use approximate nearest neighbor approaches, known from information retrieval research, in the matching process.
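
As an illustration of the approximate nearest neighbor idea, the sketch below uses MinHash signatures over character trigrams with banding, one common form of locality-sensitive hashing. The function names, the parameters (8 bands of 4 rows) and the example labels are assumptions and do not reproduce the configuration used in the publications below.

```python
import hashlib

def shingles(text, n=3):
    """Set of lower-cased character n-grams of a label."""
    text = text.lower()
    return {text[i:i + n] for i in range(max(1, len(text) - n + 1))}

def minhash(text, num_hashes):
    """MinHash signature: for each seeded hash, the minimum over all shingles."""
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(8, "little")
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(s.encode(), digest_size=8, salt=salt).digest(), "little")
            for s in shingles(text)))
    return tuple(signature)

def candidate_pairs(labels_a, labels_b, bands=8, rows=4):
    """Pairs of labels whose signatures agree on at least one band."""
    buckets = {}
    for label in labels_a:
        sig = minhash(label, bands * rows)
        for b in range(bands):
            buckets.setdefault((b, sig[b * rows:(b + 1) * rows]), []).append(label)
    pairs = set()
    for label in labels_b:
        sig = minhash(label, bands * rows)
        for b in range(bands):
            for match in buckets.get((b, sig[b * rows:(b + 1) * rows]), []):
                pairs.add((match, label))
    return pairs

pairs = candidate_pairs(["Ford Model T", "Bicycle"], ["ford model t", "Aeroplane"])
```

Only labels that share a bucket are compared, so the number of comparisons stays far below the full cross product of the two ontologies. Similar labels are likely, but not guaranteed, to become candidates.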

  • Cochez, M. (2014). Locality-sensitive hashing for massive string-based ontology matching. In D. Ślęzak, B. Dunin-Kęplicz, M. Lewis, & T. Terano (Eds.), 2014 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT) (pp. 134-140). IEEE. doi:10.1109/WI-IAT.2014.26 preprint
  • Cochez, M., Terziyan, V., & Ermolayev, V. (2015). Balanced Large Scale Knowledge Matching Using LSH Forest. In J. Cardoso, F. Guerra, G.-J. Houben, A. M. Pinto, & Y. Velegrakis (Eds.), Semantic Keyword-based Search on Structured Data Sources : First COST Action IC1302 International KEYSTONE Conference, IKC 2015, Coimbra, Portugal, September 8-9, 2015. Revised Selected Papers (pp. 36-50). Lecture Notes in Computer Science (9398). Springer International Publishing. doi:10.1007/978-3-319-27932-9_4 preprint

Knowledge Evolution

Knowledge Evolution is the fact that knowledge changes over time. This is necessary to accommodate changes in the real world.

A common way to encode knowledge is by creating Knowledge Bases (based on some sort of ontology) in which one often distinguishes between the ABox (Assertional knowledge: the data) and the TBox (Terminological knowledge: the schema or description of the structure of the data).

In our work we investigated the creation of a knowledge ecosystem in which artificial knowledge organisms consume the knowledge in the environment. These organisms are then capable of updating their Terminological knowledge if they notice significant changes in the Assertional knowledge. The core idea behind the Ecosystem is that:

The mechanisms of knowledge evolution are very similar to the mechanisms of biological evolution. Hence, the methods and mechanisms for the evolution of knowledge could be spotted from the ones enabling the evolution of living beings.

Starting from this idea, we derived that we could model the knowledge evolution inside the system using ideas from natural evolution. One example is that knowledge which is more fit for its environment has a higher chance to survive than less fit knowledge.

We also proposed some initial ideas about handling streams of tokens that arrive at a high rate and together represent a huge amount of data.

  • Ermolayev, V., Akerkar, R., Terziyan, V., & Cochez, M. (2014). Toward evolving knowledge ecosystems for big data understanding. In R. Akerkar (Ed.), Big Data Computing (pp. 3-56). Boca Raton, FL: Taylor & Francis.
  • Cochez, M., & Terziyan, V. (2012). Quality of an ontology as a dynamic optimisation problem. In V. Ermolayev, et. al (Eds.), ICT in Education, Research and Industrial Applications: Integration, Harmonization and Knowledge Transfer. Proceedings of the 8th International Conference ICTERI 2012 (pp. 249-256). CEUR Workshop Proceedings (Vol-848). Aachen: RWTH Aachen. Retrieved from http://ceur-ws.org/Vol-848/ICTERI-2012-CEUR-WS-DEIS-paper-1-p-249-256.pdf

Technical reports, documentation and articles (non-reviewed)

  • Cochez, M., Periaux, J., Terziyan, V., Kamlyk, K., & Tuovinen, T. (2014). Evolutionary cloud for cooperative UAV coordination. Jyväskylä, Finland: University of Jyväskylä. Reports of the Department of Mathematical Information Technology. Series C, Software engineering and computational intelligence, 1/2014. access

Anomaly Detection for finding Sleeping Cells in Wireless Networks

The Sleeping Cell problem is a particular type of cell degradation in wireless networks. In practice, such a cell outage leads to a lack of network service, and sometimes the operator can discover it only after multiple user complaints. We worked on a data mining framework and experimented with several anomaly detection algorithms to detect this type of outage.

  • Chernov, S., Cochez, M., & Ristaniemi, T. (2015). Anomaly Detection Algorithms for the Sleeping Cell Detection in LTE Networks. In Proceedings of IEEE 81st Vehicular Technology Conference (VTC Spring 2015) (pp. 1-5). IEEE. doi:10.1109/VTCSpring.2015.7145707 preprint

Multi-channel communication

Cloud computing has opened the path to more online, service-oriented business models. Customers are interacting with enterprises’ digital systems through a multitude of interfaces. We regard each of these possible interfaces as a communication channel and investigated how users could get a less fragmented experience.

  • Cochez, M., Helin, S., & Chen, J. (2013). Cloud communication service. In I. Porres, T. Mikkonen, & A. Ashraf (Eds.), Developing Cloud Software: Algorithms, Applications, and Tools (pp. 227-248). TUCS General Publication (60). Turku: Turku Centre for Computer Science. Retrieved from http://urn.fi/URN:ISBN:978-952-12-2952-7
  • Supervised thesis: Jiawen Chen (2015). Smart Semantic Multi-channel Communication. A Master’s Thesis in Information Technology, University of Jyväskylä. link

Multi-Agent Systems (MAS)


I worked as a trainee and junior researcher in the UBIWARE project. The aim of the project was to create a new generation middleware platform which would allow the creation of self-managed complex industrial systems. These systems can consist of distributed, heterogeneous, shared and reusable components of different nature, like smart machines and devices, sensors, actuators, RFIDs, web-services, software components and applications, humans, etc.

The platform was implemented as a Multi-Agent System where agents’ beliefs, desires, intentions and communication are encoded using the Semantic Agent Programming Language (S-APL).

Project home page | Project documentation

Indeterminacy Reduction in Agent Communication

In the field of agent communication, uncertainty and vagueness in the message content and in the achievable results play a key role when two agents (human or artificial) communicate. Although the importance of vagueness and uncertainty was recognized long ago, only recently have mechanisms related to the semantics of communication been designed that allow a practical approach. In our work we sketch how theoretical ideas borrowed from situation semantics theory and the works of Sutton on semantic information can be applied in the field of multi-agent systems, using the Semantic Agent Programming Language (S-APL).

  • Paggi, H., & Cochez, M. (2015). Indeterminacy Reduction in Agent Communication Using a Semantic Language. WSEAS Transactions on Systems, 14, 77-89. preprint
  • Paggi, H., & Cochez, M. (2014). Use of a Semantic Language to Reduce the Indeterminacy in Agents Communication. In Proceedings of the 2014 International Conference on Mathematics and Computers in Sciences and Industry (MCSI 2014) (pp. 281-287). IEEE. doi:10.1109/MCSI.2014.64 preprint
  • Master thesis : Cochez M., Semantic Agent Programming Language: use and formalization, Master’s Thesis in Information Technology, University of Jyväskylä, Jyväskylä, Finland, 2012. [link]
  • Khriyenko, O., & Cochez, M. (2011). Open environment for collaborative cloud ecosystems. In CLOUD COMPUTING 2011 : The Second International Conference on Cloud Computing, GRIDs, and Virtualization (pp. 147-153). USA: IARIA.

Technical reports, documentation and articles (non-reviewed)

  • Cochez, M., & Nagy M. (2012). Ubiware application user guide. Jyväskylä, Finland: University of Jyväskylä. Industrial Ontologies Group. Ubiware documentation. [link]
  • Katasonov, A., Nagy M., & Cochez, M. (2012). Ubiware application developer guide. Jyväskylä, Finland: University of Jyväskylä. Industrial Ontologies Group. Ubiware documentation. [link]
  • Katasonov, A., & Cochez, M. (2012). Ubiware Platform Application Developer’s guide - RAB overview. Jyväskylä, Finland: University of Jyväskylä. Industrial Ontologies Group. Ubiware documentation. [link]
  • Cochez, M. (2012). Ubiware Platform Application Developer’s guide - RAB programming. Jyväskylä, Finland: University of Jyväskylä. Industrial Ontologies Group. Ubiware documentation. [link]
  • Cochez, M., & Nagy M. (2012). Ubiware infrastructure guide. Jyväskylä, Finland: University of Jyväskylä. Industrial Ontologies Group. Ubiware documentation. [link]
  • Terziyan, V., Nikitin, S., Nagy, M., Khriyenko, O., Kesäniemi, J., Cochez, M., Pulkkis, A., UBIWARE Platform Prototype v 3.0, Technical Report (Deliverable D3.3), UBIWARE Tekes Project, Agora Centre, University of Jyväskylä, August 2010, 45pp. [link]

Further notes (not finalized as actual reports):

  • Cochez, M., Turing equivalence of the Ubiware Agent [link]
  • Cochez, M., Policy management engine for Ubiware Agent [link]

Social Networks

As part of our activities in the Cloud Software Program, we investigated the integration of social network profiles and updates and developed a prototype for it.

Technical reports, documentation and articles (non-reviewed)

  • Cochez, M., & Nagy, M. (2011). WP1: Mashupper - agent-enabled social cloud. Espoo: Tieto- ja viestintäteollisuuden tutkimus TIVIT Oy. Cloud Software Program Report, Q1-Q2/2011. report

Semantic Analysis of Human Resource Data

We worked on a way to analyze human resource data in a semantic way. The goal was to find candidates with a suitable profile from within the organization. For this task we used semantic relationships between competences and ways to determine missing data in the dataset. The work was done in collaboration with Tieto Oy.

Professional tooling in higher education teaching

In this research effort we analyzed how computer science students use version control systems and what the major sources of confusion are. I also contributed to a paper in which we studied issues encountered when students study in a self-directed manner.

  • Isomöttönen, V., & Cochez, M. (2014). Challenges and Confusions in Learning Version Control with Git. In V. Ermolayev, et. al (Eds.), Information and Communication Technologies in Education, Research, and Industrial Applications: 10th International Conference, ICTERI 2014, Kherson, Ukraine, June 9-12, 2014, Revised Selected Papers (pp. 178-193). Communications in Computer and Information Science (469). Springer International Publishing. doi:10.1007/978-3-319-13206-8_9 Retrieved from https://jyx.jyu.fi/dspace/handle/123456789/44859
  • Isomöttönen, V., Tirronen, V., & Cochez, M. (2013). Issues with a course that emphasizes self-direction. In ITiCSE ‘13: Proceedings of the 18th ACM conference on Innovation and technology in computer science education (pp. 111-116). New York, NY: ACM. doi:10.1145/2462476.2462495
  • Cochez, M., Isomöttönen, V., Tirronen, V., & Itkonen, J. (2013). How do computer science students use distributed version control systems?. In V. Ermolayev, et. al (Eds.), Information and Communication Technologies in Education, Research, and Industrial Applications, 9th International Conference, ICTERI 2013, Kherson, Ukraine, June 19-22, 2013, Revised Selected Papers (pp. 210-228). Communications in Computer and Information Science (412). Springer International Publishing. doi:10.1007/978-3-319-03998-5_11 Retrieved from http://rd.springer.com/chapter/10.1007/978-3-319-03998-5_11
  • Cochez, M., Isomöttönen, V., Tirronen, V., & Itkonen, J. (2013). The use of distributed version control systems in advanced programming courses. In ICTERI 2013 - ICT in Education, Research and Industrial Applications: Integration, Harmonization and Knowledge Transfer (pp. 221-235). CEUR Workshop Proceedings (1000). Aachen: CEUR Workshop Proceedings. Retrieved from http://ceur-ws.org/Vol-1000/ICTERI-2013-p-221-235.pdf