The primary objective of the Messor Module within the InkBytes ecosystem is to efficiently collect and extract news articles from diverse sources, much as a harvester gathers crops from a field.
Scraping is the process of extracting content from internet sources where the information is not structured or normalized, such as documents, articles, and blogs.
Messor operates as a standalone module within the broader InkBytes ecosystem. In keeping with SOLID principles, Messor has a single responsibility: extracting news from diverse sources and converting it into a consistent format. This compartmentalization keeps the ecosystem modular, so other modules can rely on Messor's output without depending on how it was collected.
Messor is not designed to form opinions or judgments about the nature, accuracy, or fidelity of news sources. Its sole responsibility is the meticulous extraction of news articles, employing sophisticated techniques to parse and convert them into the standardized format utilized by the InkBytes system.
Data normalization is a critical step in the web scraping process. Messor’s goal is to aggregate and make sense of information collected from diverse sources, transforming data into a consistent format that enables efficient access, analysis, and storage.
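As a minimal sketch of what such normalization might look like, the snippet below maps a raw extraction result onto one consistent record. The `NormalizedArticle` class, its field names, and the `normalize` helper are hypothetical and are not taken from the InkBytes codebase; treating naive timestamps as UTC is likewise an assumption made only for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class NormalizedArticle:
    """Hypothetical target format; the real InkBytes schema may differ."""
    source: str
    url: str
    title: str
    text: str
    authors: List[str] = field(default_factory=list)
    published_at: Optional[str] = None  # ISO 8601 string in UTC, if known


def normalize(raw: dict, source: str) -> NormalizedArticle:
    """Convert a raw, source-specific dict into a NormalizedArticle."""
    # Authors may arrive as a list or as a comma-separated string.
    authors = raw.get("authors") or []
    if isinstance(authors, str):
        authors = authors.split(",")
    authors = [a.strip() for a in authors if a and a.strip()]

    # Dates are converted to UTC; naive datetimes are ASSUMED to be UTC.
    published = raw.get("publish_date")
    if isinstance(published, datetime):
        if published.tzinfo is None:
            published = published.replace(tzinfo=timezone.utc)
        published_at = published.astimezone(timezone.utc).isoformat()
    else:
        published_at = None

    return NormalizedArticle(
        source=source,
        url=(raw.get("url") or "").strip(),
        # Collapse runs of whitespace left over from HTML extraction.
        title=" ".join((raw.get("title") or "").split()),
        text=(raw.get("text") or "").strip(),
        authors=authors,
        published_at=published_at,
    )
```

Keeping the normalized record separate from the extraction step means each new source only needs to produce a raw dictionary, while downstream storage and analysis see a single shape.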
Messor organizes its functionality into distinct scenarios, one for each type of news source. This lets Messor adapt to varied sources and keeps its approach to news extraction versatile.
Source | Available | Libraries | Technology |
---|---|---|---|
Online news outlets (Web) | Yes | newspaper3k | Scraping |
PDF Documents | No | | |
Stored/Addressed images (TIFF, PNG) | No | | |
Audio | No | | |
Video/Multimedia | No | | |
Note: This is a work in progress.
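The scenario-per-source approach and the table above could be sketched as a small registry that maps each source type to its own extractor. The names `EXTRACTORS`, `register`, `extract`, and `extract_web` are hypothetical and only illustrate the idea; the `newspaper3k` calls (`Article`, `download()`, `parse()`) are the library's standard usage, but how Messor actually invokes them is not documented here.

```python
from typing import Callable, Dict

# Hypothetical registry: source kind -> extractor function.
EXTRACTORS: Dict[str, Callable[[str], dict]] = {}


def register(kind: str):
    """Decorator that registers an extractor for one source kind."""
    def decorator(func: Callable[[str], dict]) -> Callable[[str], dict]:
        EXTRACTORS[kind] = func
        return func
    return decorator


@register("web")
def extract_web(url: str) -> dict:
    """Extract an online news article using newspaper3k."""
    # Imported lazily so the registry loads even without newspaper3k installed.
    from newspaper import Article

    article = Article(url)
    article.download()  # fetch the page over the network
    article.parse()     # populate title, text, authors, publish_date
    return {
        "url": url,
        "title": article.title,
        "text": article.text,
        "authors": article.authors,
        "publish_date": article.publish_date,
    }


def extract(kind: str, location: str) -> dict:
    """Dispatch to the extractor for `kind`, or fail for unsupported kinds."""
    try:
        extractor = EXTRACTORS[kind]
    except KeyError:
        raise NotImplementedError(
            f"No extractor available for source kind {kind!r}"
        ) from None
    return extractor(location)
```

With this layout, calling `extract("web", url)` runs the newspaper3k scenario, while a source type marked "No" in the table, such as `"pdf"`, raises `NotImplementedError` until an extractor is registered for it.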