The primary objective of the Messor Module within the InkBytes ecosystem is to efficiently collect and extract news articles from diverse sources, much as a harvester gathers crops in a field.

Scraping is the process of extracting content from internet sources whose information is not structured or normalized, such as documents, articles, and blogs.

Standalone

Messor operates as a standalone module within the broader InkBytes ecosystem. Following the single-responsibility principle from SOLID, Messor is assigned one job: extracting news from diverse sources and converting it into a consistent format. This compartmentalization enhances modularity within the ecosystem and simplifies collaboration between modules.

Impartial Processing

Messor is not designed to form opinions or judgments about the nature, accuracy, or fidelity of news sources. Its sole responsibility is the meticulous extraction of news articles, employing sophisticated techniques to parse and convert them into the standardized format utilized by the InkBytes system.

Data Normalization

Data normalization is a critical step in the web scraping process. Messor’s goal is to aggregate and make sense of information collected from diverse sources, transforming data into a consistent format that enables efficient access, analysis, and storage.
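
As a minimal sketch of what this normalization could look like, the Python example below maps a raw scraped record onto a single article shape. The `NormalizedArticle` fields, the `normalize` helper, and the raw-record keys are assumptions made for illustration; the actual InkBytes format is not specified here.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class NormalizedArticle:
    """Hypothetical target format; field names are illustrative only."""
    source: str
    url: str
    title: str
    body: str
    authors: tuple = ()
    published_at: Optional[str] = None  # ISO 8601 string in UTC


def _clean(text) -> str:
    """Collapse runs of whitespace and trim both ends."""
    return " ".join(str(text or "").split())


def _to_utc_iso(value) -> Optional[str]:
    """Convert a datetime or ISO 8601 string to a UTC ISO 8601 string."""
    if value is None or value == "":
        return None
    dt = value if isinstance(value, datetime) else datetime.fromisoformat(value)
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # assumption: naive values are UTC
    return dt.astimezone(timezone.utc).isoformat()


def normalize(raw: dict, source: str) -> NormalizedArticle:
    """Map one raw scraped record onto the common article shape."""
    cleaned = (_clean(name) for name in raw.get("authors", []))
    return NormalizedArticle(
        source=source,
        url=_clean(raw.get("url")),
        title=_clean(raw.get("title")),
        body=str(raw.get("text") or "").strip(),  # keep paragraph breaks
        authors=tuple(name for name in cleaned if name),
        published_at=_to_utc_iso(raw.get("publish_date")),
    )
```

Whatever the source, downstream code then only ever sees one record type, which is what makes uniform storage and analysis possible.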

Operational Framework

Messor organizes its functionality into distinct scenarios, each representing a different news source. This allows Messor to adapt to varied sources while keeping a single, versatile approach to news extraction.

Supported Sources

Source                                 Available   Libraries     Technology
Online news outlets (Web)              Yes         newspaper3k   Scraping
PDF Documents                          No          -             -
Stored/Addressed images (TIFF, PNG)    No          -             -
Audio                                  No          -             -
Video/Multimedia                       No          -             -

Note: This is a work in progress.
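
For the one source currently supported, online news outlets, extraction with newspaper3k might look roughly like the sketch below. It uses the library's `Article` class with its `download()` and `parse()` methods; the `extract_article` wrapper, its `article_cls` parameter, and the keys of the returned dictionary are assumptions for illustration, not Messor's actual interface.

```python
def extract_article(url: str, article_cls=None) -> dict:
    """Download and parse one article, returning its main fields.

    article_cls defaults to newspaper.Article; passing another class is a
    hypothetical hook that lets the function be exercised offline.
    """
    if article_cls is None:
        from newspaper import Article  # pip install newspaper3k
        article_cls = Article

    article = article_cls(url)
    article.download()  # fetch the page HTML
    article.parse()     # extract title, text, authors, publish date
    return {
        "url": url,
        "title": article.title,
        "text": article.text,
        "authors": list(article.authors),
        "publish_date": article.publish_date,  # may be None if not detected
    }
```

Calling `extract_article("https://example.com/some-story")` requires network access and the newspaper3k package; the URL here is a placeholder.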