Born out of the combination of a data-aggregation solution and a semantic-analysis engine, OPPSCIENCE is a French software publisher specializing in Intelligence Analysis Management, an intelligence-driven data analysis platform. The solution lets end users leverage all of their data, focus on the essentials, and thereby improve the quality of decision making. Gilles Andre, its founder and CEO, answers our questions.
You mentioned Intelligence Analysis Management. What is it?
Since the 1970s, computing has moved in cycles, starting with large systems (mainframes). In the age of structured data, relational databases and ETL technologies answered the need to unify information fragmented across different applications within a single data warehouse. These warehouses enabled the emergence of data analytics tools to analyze this data and support the decision-making process. Today, companies have access to a massive amount of data from ever-expanding sources: Big Data.
Intelligence Analysis Management (IAM) is to Big Data what data science and data analytics were to structured data. The goal is to take advantage of all the data (structured and unstructured) to facilitate decision making in a context that no human could navigate alone to reach an informed decision.
IAM is an ecosystem of sustainable technologies that achieves this goal by enabling:
- the convergence and securing of all available information in one place;
- the implementation and coordination of any type of AI operation, depending on the information format;
- the presentation of the knowledge extracted by these processes in a standardized working knowledge model;
- the autonomous, secure use of artificial intelligence by users who have no technical skills.
This ecosystem allows users to focus on the essentials and thereby improve the quality of decision making.
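The "standardized working knowledge model" mentioned above can be pictured as a minimal data structure in which every extracted fact keeps a pointer back to its source. This is an illustrative sketch only; the class and field names are hypothetical, not OPPSCIENCE's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class SourceRef:
    """Where a piece of knowledge was found (document + location)."""
    document_id: str
    offset: int  # character offset in the source document

@dataclass
class Fact:
    """One normalized piece of knowledge in the unified model."""
    subject: str
    predicate: str
    value: str
    sources: list = field(default_factory=list)  # list of SourceRef

# Facts extracted by different AI pipelines converge into one store,
# each keeping full traceability back to its origin.
fact = Fact("Microsoft", "acquired", "LinkedIn",
            sources=[SourceRef("press-release-2016-06-13.txt", 0)])
print(fact.predicate)               # the normalized relation
print(fact.sources[0].document_id)  # the traceable origin
```

The key design point the interview stresses is the `sources` field: no fact enters the model without a reference back to where it came from.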
What are the main challenges facing your sector?
The first challenge rests on our ability to know what to do with the possibilities of artificial intelligence in the broadest sense, which raises the question: how do we avoid losing control of information? The difficulties encountered relate to the following:
- a growing volume of information that outstrips human capacity to assimilate it;
- the diversity of available information, which requires differentiated AI analysis and processing; bringing disparate information together gives the user a better understanding and a global view;
- mastering the AI itself, which consists of a variety of techniques whose implementation can cause the traceability of a piece of information's original source and context to be lost.
We must also deal with the growing internationalization of interactions, with the rise of multilingualism and uncontrolled multichannel communication. The volume of information will keep increasing, and companies will inevitably lose the ability to structure it at scale.
Especially since we are seeing exponential growth in unstructured, heterogeneous data. How does OPPSCIENCE respond to this?
This growth should make us realize that unstructured data holds real wealth and that techniques exist to extract it. Few actors have truly understood the potential of this written material. Among them are the judiciary and the police, fields that have for years placed writing at the center of their data production. Others must follow. Faced with this need, OPPSCIENCE makes it possible to use algorithms, but with a philosophy of traceability: always being able to find the source. The user retains control, with the possibility of correcting what the machine proposes, following a procedure defined with the customer according to their wishes.
What is OPPSCIENCE’s position regarding Large Language Models (LLMs) such as ChatGPT?
The emergence of large language models is clearly a step forward in artificial intelligence, and putting them to work on OPPSCIENCE use cases opens up exciting prospects. We note that their language-understanding and reasoning skills can yield significant performance gains without requiring large amounts of training data.
These developments point to new paradigms for implementing AI models to meet the concrete needs of users. Everyone is now familiar with conversational agents, which can become a powerful tool if they are connected to the right data. However, this mode of interaction is just one possibility among many others, most of which have yet to be invented to make these models effective tools for business problems.
The first challenge is to pinpoint precisely the tasks in which an LLM provides significant benefit without introducing potentially unacceptable risks or biases into the areas we cover. On that basis, it is up to us to propose new ways of interacting with artificial intelligence that get the most out of it, consistent with how our users work.
The second challenge relates to the methodology for feeding customer data into the LLM. Because they are pre-trained on huge amounts of "generic" data, these models gain strong general and language skills but remain ignorant of the information contained in our clients' operational documentation. Various techniques must be implemented: first to adapt ("fine-tune") the generic model so it performs better on this data, and then to supply it, at query time, with the business information it needs to answer a specific user request.
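The second step described here — supplying business information at query time — is commonly implemented as retrieval-augmented generation. A minimal sketch, with a toy keyword-overlap scorer standing in for a real search index and the LLM call itself left out (the corpus and document IDs are invented for illustration):

```python
def score(query, doc):
    """Toy relevance score: number of shared lowercase words."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def retrieve(query, corpus, k=1):
    """Return the k most relevant (doc_id, text) pairs."""
    ranked = sorted(corpus.items(),
                    key=lambda item: score(query, item[1]),
                    reverse=True)
    return ranked[:k]

def build_prompt(query, corpus):
    """Assemble an LLM prompt grounded in retrieved client documents,
    keeping doc_ids so the answer can cite its sources."""
    hits = retrieve(query, corpus)
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in hits)
    return (f"Answer using only the context below and cite the [doc_id].\n"
            f"{context}\nQuestion: {query}")

corpus = {
    "doc-1": "The Q3 audit flagged three suspicious wire transfers.",
    "doc-2": "Office lease renewal is scheduled for January.",
}
prompt = build_prompt("Which transfers did the audit flag?", corpus)
print(prompt)
```

A production retriever would use a proper index or embeddings rather than word overlap, but the shape is the same: the model never sees client data at training time, only the retrieved passages injected into each request.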
The third challenge relates to explainability and the traceability of the information provided. We must give users the elements that led to a given conclusion and refer them to the documentary sources where those elements were found (for certain uses, this is even a legal obligation).
The fourth challenge concerns combating the biases inherent in how these models operate. One of the most pressing problems is the phenomenon known as "hallucination": a generative model is designed to produce a plausible answer, not a guaranteed-true one. When it lacks the necessary information, it can invent an answer from scratch that is difficult to distinguish from a reliable one.
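One pragmatic mitigation, sketched here in deliberately crude form: refuse to answer when the supplied context contains no supporting evidence, rather than letting the generative model improvise. The evidence check below is a simple keyword test and the generator is a stub, not a real LLM call:

```python
def grounded_answer(question, context, generate):
    """Only call the generator when the context plausibly covers the
    question; otherwise return an explicit 'unknown' instead of risking
    a hallucinated answer."""
    keywords = [w for w in question.lower().split() if len(w) > 3]
    supported = any(w in context.lower() for w in keywords)
    if not supported:
        return "No supporting evidence found in the provided sources."
    return generate(question, context)

# Stub generator standing in for a real LLM call.
fake_llm = lambda q, c: f"Based on the sources: {c}"

ctx = "Microsoft acquired LinkedIn on June 13, 2016."
print(grounded_answer("Who acquired LinkedIn?", ctx, fake_llm))
print(grounded_answer("Who founded Tesla?", ctx, fake_llm))
```

Real systems use stronger grounding checks (entailment models, answer-attribution scoring), but the principle is the one the interview describes: an explicit "I don't know" beats a plausible invention.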
By providing concrete answers to these issues, OPPSCIENCE promotes a pragmatic approach to integrating LLMs into its technology arsenal, allowing its clients to consider AI across the full range of solutions while weighing the cost/benefit/risk ratio of each.
Are LLMs replacing Natural Language Processing (NLP) technologies, one of OPPSCIENCE's main areas of expertise?
One of the critical NLP tasks for our uses is information extraction: converting information expressed in natural language into structured, unified data that can be stored, aggregated, correlated, filtered, and so on. For example, the sentence "Microsoft acquired Linkedin on June 13, 2016 for $26 billion" refers to two companies involved in a merger-and-acquisition event. For an investment banker, we would seek to qualify the event, its stakeholders, and its characteristics (date, amount, etc.), then verify and supplement this information by analyzing other sources.
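The conversion described above — free text into a structured event — can be illustrated with a deliberately narrow pattern-based extractor covering just this one sentence shape. A real system would use trained NLP models rather than a single regular expression:

```python
import re

# Pattern for one narrow sentence shape:
# "<Acquirer> acquired <Target> on <Date> for $<Amount>"
PATTERN = re.compile(
    r"(?P<acquirer>\w+) acquired (?P<target>\w+) "
    r"on (?P<date>\w+ \d{1,2}, \d{4}) for \$(?P<amount>[\d.]+ \w+)"
)

def extract_acquisition(sentence):
    """Return a structured M&A event dict, or None if the pattern misses."""
    m = PATTERN.search(sentence)
    return m.groupdict() if m else None

event = extract_acquisition(
    "Microsoft acquired Linkedin on June 13, 2016 for $26 billion"
)
print(event)
```

The output is exactly the kind of record the interview describes: named fields (event type implied by the pattern, stakeholders, date, amount) that can then be stored, correlated, and cross-checked against other sources.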
However, by construction, generative models such as current LLMs are not geared toward this type of task: they adapt naturally to the "text-to-text" paradigm, with applications such as automatic summarization, translation, question answering, or chatbots.
So their applicability to knowledge extraction against a structured model remains, to this day, a relatively open question. LLMs will likely be able to meet needs that were not anticipated, and/or for which they were not specifically trained, at the cost of the various biases mentioned above.
So there are two directions to consider for integration with the methods we already have:
- combining LLMs with other techniques for a number of information extraction tasks: for certain tasks LLMs will substitute for, and for others complement, traditional machine learning and symbolic models.
Because OPPSCIENCE technology is built around principles of modularity and flexibility, including at the level of NLP operations and AI in general, this combinatorial approach fits naturally into how it operates, allowing us to offer our customers the best of each technology.