As the number of files on IPFS grows, a search engine will be needed. Google search does not crawl IPFS files, and the existing IPFS search engine at https://ipfs-search.com has defects. So we made this project.
A large number of documents and webpages are already stored in IPFS. Google search and other mainstream search engines do not crawl IPFS files, so we are making an IPFS search engine.
Google also controls what we see. The content shown in Google search is decided by a few engineers at Google. Things we cannot do in Google search:
- Rank webpages less on popularity and more on relevance to the search term, or vice versa.
- Rank webpages more on credible backlinks and less on social media popularity, to reduce exaggerated news.
- Catch up on older content. Suppose you haven't read any news for 6 months. If you go to Google News, you are shown only recent articles and posts. You cannot pick, say, the top 10 most popular items from the past 6 months.
In general, we do not have the power to modify the ranking parameters. The same applies to the ranking of posts on social media sites like Facebook, LinkedIn and Twitter. So we are working on a search engine/recommendation engine where the ranking data is stored on a public blockchain. Other users can use that data to build their own ranking algorithms.
The robots.txt file also makes it difficult for new search engines to crawl the web. IPFS, being an open platform, encourages people like us to start a search engine. Currently, our search engine for Unstoppable Domains and IPFS is available at https://ipfs.sarchy.online
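As an illustration of what user-adjustable ranking could mean, here is a minimal Python sketch. It is not the project's code: the `Doc` fields, the `rank` function, the 0..1 score ranges and the single `relevance_weight` knob are all assumptions made for the example. It only shows the idea of letting the user trade relevance against popularity and restrict results to a time window.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Doc:
    # All fields are hypothetical; a real index would supply its own signals.
    title: str
    relevance: float   # match with the search term, assumed to be in 0..1
    popularity: float  # normalized popularity, assumed to be in 0..1
    published: date


def rank(docs: List[Doc], relevance_weight: float = 0.5,
         since: Optional[date] = None, limit: int = 10) -> List[Doc]:
    """Sort docs by a user-chosen mix of relevance and popularity.

    relevance_weight=1.0 ranks on relevance only, 0.0 on popularity only.
    `since` keeps only documents published on or after that date.
    """
    popularity_weight = 1.0 - relevance_weight
    candidates = [d for d in docs if since is None or d.published >= since]
    candidates.sort(
        key=lambda d: relevance_weight * d.relevance
        + popularity_weight * d.popularity,
        reverse=True,
    )
    return candidates[:limit]
```

With this shape, "top 10 most popular items from a chosen period" is a call such as `rank(docs, relevance_weight=0.0, since=some_date, limit=10)`, and favoring relevance over popularity is only a change of `relevance_weight`.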
How it's made
The search engine has four parts.
1. Crawling - Get available IPFS hashes/URLs from Common Crawl and from Unstoppable Domains on the Ethereum and Zilliqa (Zil) blockchains. For crawling IPFS pages, Yacy 2 is used. The Scrapy Python package and Apache Nutch were also considered.
2. Assessing the popularity of a domain - Standard PageRank-like techniques using backlinks and social media popularity can also be used for ranking. The presence of an IPFS hash is used as the minimum ranking parameter. As of now, the number of words, number of images, volume of Ethereum transactions and outlinks are used as ranking parameters.
3. Search - Parse content from each webpage and put it into Elasticsearch.
4. Frontend - UI for the search engine and calls to the API.
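The ranking parameters listed in part 2 can be sketched in Python as follows. This is an illustration under stated assumptions, not the project's implementation: the weights, the log scaling, the base score of 1.0 for pages with an IPFS hash, and the names `PageFeatureParser`, `extract_features` and `score` are all hypothetical. Every `<a href>` is counted as an outlink, which is a simplification, and the Ethereum transaction volume is passed in as a plain number rather than read from a chain.

```python
import math
from dataclasses import dataclass
from html.parser import HTMLParser


class PageFeatureParser(HTMLParser):
    """Counts words, images and links in one HTML page."""

    def __init__(self):
        super().__init__()
        self.words = 0
        self.images = 0
        self.outlinks = 0
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "img":
            self.images += 1
        elif tag == "a" and dict(attrs).get("href"):
            self.outlinks += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only visible text counts toward the word total.
        if self._skip_depth == 0:
            self.words += len(data.split())


@dataclass
class PageFeatures:
    has_ipfs_hash: bool
    words: int
    images: int
    outlinks: int
    eth_tx_volume: float


def extract_features(html, has_ipfs_hash, eth_tx_volume=0.0):
    parser = PageFeatureParser()
    parser.feed(html)
    parser.close()
    return PageFeatures(has_ipfs_hash, parser.words, parser.images,
                        parser.outlinks, eth_tx_volume)


# Hypothetical weights, chosen only for the example.
WEIGHTS = {"words": 1.0, "images": 0.5, "outlinks": 0.5, "eth_tx_volume": 2.0}


def score(features, weights=WEIGHTS):
    """Return 0.0 without an IPFS hash, else 1.0 plus weighted log features."""
    if not features.has_ipfs_hash:
        return 0.0  # assumption: the IPFS hash is required to be ranked at all
    return 1.0 + sum(w * math.log1p(getattr(features, name))
                     for name, w in weights.items())
```

The log scaling is a design choice for the sketch: it keeps a page with a very large word count or transaction volume from dominating the other signals.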