Data-Intensive Search Site

Hi,
I want to make a data-intensive search site for colleges. There will be more than 40,000 colleges with images and content details of various types. There will be multiple search criteria for colleges and courses. Should I go for Grav to build such a site, or go for a database-oriented system? How does Grav ensure fast multi-criteria search with loads of data? Is there an existing implementation of something like this?

I have no benchmarks, but in the grand scheme of things, 40k records is not all that many. The only way you’ll know for sure is by trying both and comparing. The speed is going to depend more on the resources available to your server than on whether the back end is flat-file or RDB/NoSQL-driven. And your user flow will dictate certain constraints as well.

Grav is flat file, but flat file does not mean static. There are numerous plugins that load YAML, JSON, or other data for various purposes. Heck, you could even use SQLite if you were feeling cheeky. You could then provide various client-side tools for manipulating that data (there are dozens of datagrid components out there). A Grav plugin can process GET and POST requests and even update data, if you want to go that route.
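
For illustration, here is a rough, untested sketch of what that could look like. The plugin name, endpoint path, and data file are all made up; it just shows a plugin that intercepts a GET request and answers it with filtered JSON:

```php
<?php
namespace Grav\Plugin;

use Grav\Common\Plugin;

class CollegeSearchPlugin extends Plugin
{
    public static function getSubscribedEvents()
    {
        return [
            'onPagesInitialized' => ['onPagesInitialized', 0],
        ];
    }

    public function onPagesInitialized()
    {
        $uri = $this->grav['uri'];

        // Only react to a hypothetical /college-search endpoint
        if ($uri->path() !== '/college-search') {
            return;
        }

        // Load the data set (here a JSON file shipped with the plugin)
        $colleges = json_decode(file_get_contents(__DIR__ . '/data/colleges.json'), true);

        // Filter by an optional ?state=... query parameter
        $state = $uri->query('state');
        $results = array_filter($colleges, function ($college) use ($state) {
            return !$state || $college['state'] === $state;
        });

        // Return JSON and stop normal page rendering
        header('Content-Type: application/json');
        echo json_encode(array_values($results));
        exit;
    }
}
```

A real plugin would add caching and pagination, but the point is that Grav happily sits in front of whatever data store you pick.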

Grav is just a tool. You shouldn’t build a project around a tool. You should define the project independently and then look for the best tool for the job. In most cases, there are many. Grav is pretty flexible. There’s nothing you’ve mentioned so far that necessarily precludes Grav as an option, but whether it’s the best option depends on a lot of other things.

Given the number of records, I would prima facie suggest leveraging a database rather than a flat-file structure. As Perlkonig mentions, SQLite is an apt candidate here, as 40,000 records are neither insurmountable nor massive at this point in time - nor are colleges likely to be added at an exponential rate anytime soon, unless someone spawns a host of online colleges just to spew out degrees.

Grav itself does not, to my knowledge, provide a particular search mechanism, but since these would just be records in a database, searching them would be rapid with any decently written plugin to scour the data. With 40,000 records, practically any PHP approach since 5.3 would be as fast as you’d expect.
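
To give an idea of how little code that takes, here is a minimal sketch (the table and column names are my own assumptions) that turns whichever criteria the visitor filled in into a parameterised SQLite query via PDO:

```php
<?php
// Sketch only: assumes a colleges.sqlite file with a `colleges` table
// holding columns such as name, state, course and fees.
$db = new PDO('sqlite:' . __DIR__ . '/data/colleges.sqlite');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Collect whichever criteria the visitor actually supplied
$criteria = [
    'state'  => isset($_GET['state'])  ? $_GET['state']  : null,
    'course' => isset($_GET['course']) ? $_GET['course'] : null,
];

$where  = [];
$params = [];
foreach ($criteria as $column => $value) {
    if ($value !== null && $value !== '') {
        $where[]            = "$column = :$column";
        $params[":$column"] = $value;
    }
}

// Build and run the multi-criteria query with bound parameters
$sql = 'SELECT name, state, course, fees FROM colleges'
     . ($where ? ' WHERE ' . implode(' AND ', $where) : '')
     . ' ORDER BY name LIMIT 50';

$stmt = $db->prepare($sql);
$stmt->execute($params);
$results = $stmt->fetchAll(PDO::FETCH_ASSOC);
```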

As is always the case in these scenarios, perceived speed is more important: a slow interface for accessing the records will irritate far more than the underlying system, and this is where Grav will shine. You’ll have no unnecessary deadweight between the interface and the database, and you’ll actually only need a very simple plugin for reading the database and returning the results to the page. This also applies to search, as even the smoothest autocomplete-AJAX solution is only hindered by a poor implementation.
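
For the autocomplete part, the endpoint can stay equally small - again just a sketch with assumed file and column names - returning a handful of matching names as JSON for the front end to render:

```php
<?php
// Autocomplete sketch: return up to 10 college names starting with ?q=
$db = new PDO('sqlite:' . __DIR__ . '/data/colleges.sqlite');

$q = isset($_GET['q']) ? trim($_GET['q']) : '';

$stmt = $db->prepare('SELECT name FROM colleges WHERE name LIKE :prefix ORDER BY name LIMIT 10');
$stmt->execute([':prefix' => $q . '%']);

header('Content-Type: application/json');
echo json_encode($stmt->fetchAll(PDO::FETCH_COLUMN));
```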

SQLite is optimal, as it follows Grav’s flat-file ideology closely but without needing a file describing each college. 40,000 files on any system are a hassle to move, copy, store, and delete, whereas the SQLite database remains portable, requires no installation, and doesn’t clutter up your system.

I think you will definitely need to build a search index offline for a search of that size to be effective. SimpleSearch was built to be, well, simple, and it only does a string comparison of the data.
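
If you do go the SQLite route, one way to get such an index without extra infrastructure - assuming your SQLite build includes the FTS5 extension, and again only a sketch - is a full-text virtual table that you rebuild offline and query with MATCH instead of scanning every record:

```php
<?php
$db = new PDO('sqlite:' . __DIR__ . '/data/colleges.sqlite');

// Rebuild the index offline, e.g. from a cron job or CLI command
$db->exec('CREATE VIRTUAL TABLE IF NOT EXISTS colleges_index
           USING fts5(name, city, courses, description)');
$db->exec('DELETE FROM colleges_index');
$db->exec('INSERT INTO colleges_index (name, city, courses, description)
           SELECT name, city, courses, description FROM colleges');

// At search time, MATCH hits the index instead of doing a full scan
$stmt = $db->prepare('SELECT name, city FROM colleges_index
                      WHERE colleges_index MATCH :terms LIMIT 20');
$stmt->execute([':terms' => 'engineering delhi']);
$results = $stmt->fetchAll(PDO::FETCH_ASSOC);
```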

BTW, I would love to build a powerful search engine capability for Grav, just need the time!