Contact Information:
Nickname: Michael
IRC Nick: michael-w1
Email: [Redacted]
GitHub: michael-w1
High-level overview of how search works:
The user enters text into the search field → the text is filtered (converted to lowercase, with accents stripped from characters) → it is converted to tokens by splitting on whitespace and punctuation → certain tokens, such as stop words, are filtered out if needed
I will look into how to reproduce the analysis settings currently used in Elasticsearch in Solr.
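The filtering and tokenizing steps above can be sketched in a few lines of Python. This is only an illustration of the pipeline, not the actual Elasticsearch or Solr analyzer: the function names and the stop-word list are assumptions made for the example, and the real settings would live in the search server's analyzer configuration.

```python
import re
import unicodedata
from typing import List

# Hypothetical stop-word list for illustration only; the real list would
# come from the analyzer configuration of the search server.
STOP_WORDS = {"a", "an", "the", "of", "and"}


def strip_accents(text: str) -> str:
    """Decompose characters (NFKD) and drop the combining accent marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze(text: str, remove_stop_words: bool = True) -> List[str]:
    """Lowercase, strip accents, split on whitespace/punctuation, then filter."""
    normalized = strip_accents(text.lower())
    # [\W_]+ matches runs of non-word characters and underscores,
    # so both whitespace and punctuation act as token separators.
    tokens = [t for t in re.split(r"[\W_]+", normalized) if t]
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return tokens
```

For example, `analyze("Les Misérables, Victor Hugo")` yields the tokens `les`, `miserables`, `victor`, `hugo`, so a query typed without the accent still matches.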
The search is sent to the Elasticsearch index to find results. If the index has not been generated yet, the data is queried from the SQL database and the index is created. To generate the index, the code pulls chunks of 50k records from the database and takes relationships in the data into account, such as the author of a work. I will need to look into doing the same thing in Solr.
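A rough sketch of the chunked indexing idea described above, assuming a paginated database query. `fetch_page`, `to_document`, and the field names (`type`, `author_id`, and so on) are hypothetical placeholders rather than the existing BookBrainz code; only the 50k chunk size comes from the description above.

```python
from typing import Any, Callable, Dict, Iterator, List

Record = Dict[str, Any]
CHUNK_SIZE = 50_000  # chunk size mentioned in the overview above


def iter_chunks(fetch_page: Callable[[int, int], List[Record]],
                chunk_size: int = CHUNK_SIZE) -> Iterator[List[Record]]:
    """Yield successive chunks until the source returns an empty page.

    fetch_page(offset, limit) is a hypothetical stand-in for a SQL query
    of the form: SELECT ... ORDER BY id LIMIT :limit OFFSET :offset.
    """
    offset = 0
    while True:
        page = fetch_page(offset, chunk_size)
        if not page:
            break
        yield page
        offset += len(page)


def to_document(work: Record, authors_by_id: Dict[int, str]) -> Record:
    """Flatten a work row into a search document, resolving its author name."""
    return {
        "id": work["id"],
        "type": "work",  # assumed discriminator field for multi-entity search
        "name": work["name"],
        "author": authors_by_id.get(work["author_id"]),
    }
```

Each chunk would then be converted with `to_document` and sent to the search server in one batch, so memory use stays bounded regardless of how many records the database holds.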
Proposal Timeline
May 8 - June 1 - Community Bonding Period
- Get to know the mentors
- Read the documentation for Elasticsearch and the Solr search server
- Get familiar with the build and testing process
12 Week Project Period
Week 1 - June 2-9 - Initial Research
- Look at the BookBrainz source code that uses Elasticsearch
- Note the current data schema for the books
- Review how other MetaBrainz websites use Solr Search
Week 2 - June 9-16 - Solr Setup and Configuration
- Set up the Solr development environment
- Create Solr development instance and configure it for search queries
Week 3 - June 16-23 - Design Solr Schema
- Analyze the data structure of all entities (authors, works, etc.)
- Design a Solr schema that supports multi-entity search
Week 4 - June 23-30 - Develop Multi-Entity Search Model and Indexing
- Implement a prototype Solr schema to support multi-entity search
- Begin indexing data for different entity types
- Test that the multi-entity search works in Solr with the sample data
- Make sure that the schema supports combined queries across multiple entities
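As a sketch of what a combined query across entity types could look like, the snippet below builds a request URL for Solr's select handler using the standard `q`, `fq`, `rows`, and `wt` parameters. The `type` field, the core name, and the `build_select_url` helper are assumptions for illustration; the actual field names depend on the schema designed in Week 3.

```python
from typing import Dict, Iterable
from urllib.parse import urlencode


def build_select_url(base_url: str, core: str, text: str,
                     entity_types: Iterable[str], rows: int = 20) -> str:
    """Build a Solr select URL that restricts results to the given entity types.

    Assumes every document carries a hypothetical `type` field
    (for example "author", "work", "edition").
    """
    types = " OR ".join(sorted(entity_types))
    params: Dict[str, object] = {"q": text, "rows": rows, "wt": "json"}
    if types:
        # fq filters the result set without affecting relevance scoring.
        params["fq"] = f"type:({types})"
    return f"{base_url}/{core}/select?{urlencode(params)}"


url = build_select_url("http://localhost:8983/solr", "bookbrainz",
                       "tolkien", ["work", "author"])
```

Passing an empty list of entity types omits the filter, so one query searches every entity type. User input containing query-syntax characters would still need escaping before being placed in `q`.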
Week 5 - June 30-July 7 - Integrate with Website
- Adapt search logic and integrate with website
- Update backend logic to use Solr instead of Elasticsearch
- Update website routes to work with Solr
Week 6 - July 7-14 - Data Migration and Indexing
- Migrate data from Elasticsearch to Solr and ensure that it is indexed correctly
- Test the indexing process for different entities
July 14 - Submit midterm evaluation
Week 7 - July 14-21 - Finalize Multi-Entity Search Functionality
- Finalize the multi-entity search feature to ensure all entity types are handled in one query
- Ensure that Solr can return results from authors, works, editions, etc.
Week 8 - July 21-28 - Frontend Integration and Testing
- Integrate Solr-based search into the frontend
- Make sure the UI is working with Solr’s multi-entity search
Week 9 - July 28-August 4 - Performance Optimization and User Testing
- Research whether any performance optimizations can be made
- Test that search is working as expected
- Test if search can handle large datasets and heavy usage
Week 10 - August 4-11 - Refinements, Deployment and Documentation
- Finalize the search server integration and prepare for production deployment
- Document the search system and changes made to the code
- Deploy to production and monitor for any issues
2-week buffer for any unexpected delays
Week 11 - August 11-18
Week 12 - August 18-25
- Submit Project
Community Affinities
- Some music that I listen to is: bd5bd7cf-cb0e-4848-87e2-df54084a3286
- What interests me the most about BookBrainz is that it offers valuable data that is easily accessible to anyone. Because the code is open source, the project is also community driven.
- I’ve tested out BookBrainz search quite a few times
Programming Precedents
- Started programming in 2018
- I have not contributed to any open source project
- Some of my projects include:
- Full-stack websites using React, TypeScript and Express with a REST API
- A programming language interpreter written in Go
Practical Requirements
- I will be using a MacBook Pro (M1 Pro, 16 GB RAM) for the project
- I am able to spend 40 hours per week on this project