Search For Course Builder
A Google Intern's Tale
14 Aug 2013
Who I Am
I am a software engineering intern at Google on the Course Builder team. I attend the University of Texas at Austin, pursuing degrees in computer science and the liberal arts. As a high school student, I made use of free online materials, often in the form of college lectures, when I was not being challenged in the classroom. Technology cannot solve all the problems in education, but I know firsthand that it can help. My decision to work on Course Builder did not begin with a desire to work in online education, however, but with my (almost) lifelong love affair with Google, its products, and its mission to organize the world’s information and make it universally accessible and useful. When I got a call from my future intern host on the Course Builder team offering me a summer job at my first-choice company working on a problem I am passionate about, I accepted the offer rather quickly.
What Course Builder Is
Course Builder is an open source project started by Google in 2012 that provides all of the tools for someone to create an interactive online course. It is Google’s first foray into a tool of this kind and is already being used by educators throughout the world. Course Builder is built on Google App Engine in Python.
My Project
Motivation
When I was thinking about how I might use my summer, I wanted to build out a feature that would be immediately useful for students. I thought about various integrations with other Google products that could facilitate collaboration or help organize course materials for students, but I kept coming back to one feature I thought was obviously missing: search! Imagine a student working on a Latin assignment and struggling to remember how the gerundive is used. In a traditional course, that student might use the textbook’s index to find the relevant page. In an online course, a student would need to find the relevant page by searching through the course’s structure manually. If the gerundive was explained in a video, as content in online courses often is, she would need to tediously skip through the video a minute at a time hoping to hear something familiar. It would be much better if the student could simply search for “gerundive” and be presented with the relevant information.
“But,” you say, “Google already does search! Why reinvent the wheel?” To that I respond: I’m not trying to reinvent the wheel, but I do not think that a full internet search is appropriate here. Our Latin student could have searched the web for information about the gerundive, but she would have been inundated with complicated and unfamiliar definitions. Moreover, if she were studying a more arcane subject, she might not find any information at all.
So, I decided to implement search so that any student taking a Course Builder course could navigate courses quickly.
Research Phase
When I said that I didn’t want to reinvent the wheel, I meant that I never planned to create a search platform from scratch; at the very least, that was a last resort, because it would leave me far less time for developing user-facing features. Before I began searching for possible search platforms, however, I needed a list of key features that I wanted to implement so that I could assess the relative merits of each possible solution. I came up with the following:
- The platform needed to be fast and able to handle thousands of queries per second.
- I wanted to be able to search through transcripts of YouTube videos and return a video fast-forwarded to the relevant spot.
- The delay between a change in the course’s content and the corresponding update of the search index needed to be small.
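To make the second requirement concrete, here is a minimal sketch of what "return a video fast-forwarded to the relevant spot" could look like. It assumes a transcript is available as a list of timed caption segments; the Caption type, the find_first_mention and video_link helpers, and the placeholder video id are all hypothetical names chosen for illustration, not Course Builder code.

```python
from collections import namedtuple

# Hypothetical caption segment: start time in seconds plus the spoken text.
Caption = namedtuple('Caption', ['start_seconds', 'text'])


def find_first_mention(captions, query):
    """Return the start time of the first caption containing query, or None."""
    needle = query.lower()
    for caption in captions:
        if needle in caption.text.lower():
            return caption.start_seconds
    return None


def video_link(video_id, start_seconds):
    """Build a watch URL that starts playback at start_seconds (assumed URL form)."""
    return 'https://www.youtube.com/watch?v=%s&t=%ds' % (video_id, start_seconds)


# Made-up transcript data for the Latin example.
captions = [
    Caption(0, 'Welcome back to Latin grammar.'),
    Caption(75, 'The gerundive expresses necessity or obligation.'),
    Caption(140, 'Compare it with the gerund.'),
]
start = find_first_mention(captions, 'gerundive')
link = video_link('VIDEO_ID', start)
```

Storing the start time next to each caption's text is what lets a text match be turned into a playback position.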
After some searching, I came up with a few options: the Whoosh package for Python, Google’s Custom Search Engine, and the new Full-Text Search API for Google App Engine (GAE). Though Custom Search Engine was initially attractive because it would require a minimal implementation on the Course Builder side, and the search results would benefit from Google’s ranking algorithms, I had to rule it out because it would not allow me the flexibility to search YouTube transcripts. The GAE API was quite new, and I was wary of it. Whoosh seemed like the most full-featured and stable option, so I created two quick prototypes, one outside of GAE and one inside it. I subjected the sans-GAE one to several load tests, and it performed admirably, but when I tried running Whoosh inside GAE, I ran into problems. Whoosh uses the pickle module to serialize its search index, and because of differences between the standard Python environment and GAE’s, this turned out to be a nonstarter. It may not sound like a huge issue, but after burning a couple of days trying to resolve it, I found out just how large a problem serialization can be and abandoned Whoosh. After cobbling together a quick prototype using GAE’s Full-Text Search API inside Course Builder and subjecting it to a load test, I decided that even though it was still in the preview stage, it would work and was my best option.
Design Overview
If you want to know more about how search engines work in general, you can find plenty of information via your favorite search engine. If you need help using your favorite search engine, there’s a great course to help you here!
When I decided to take on search, it was immediately obvious that it would fit in quite well with Course Builder’s existing modular structure. In designing any peripheral feature for a product, the key is defining and limiting the interface with the rest of the product and with the feature’s dependencies. In the most general terms, I decided to separate the search module into two files: search.py, which depends on App Engine’s search API and contains all of the module’s interface methods, and resources.py, which depends on Course Builder and manages the discovery of course resources. Here, resource refers to lessons, announcements, external links, YouTube videos linked to in the course, and others.
I wanted to keep the search interface as small as possible, and I ended up with three methods: index_all_docs, clear_index, and fetch. These three methods were sufficient to cover all of the functionality I needed, and limiting myself to these three helped control the complexity and made the interface easy to test.
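As a rough illustration of how small such an interface can be, the sketch below implements the three method names against an in-memory dictionary. Only the names index_all_docs, clear_index, and fetch come from the module; the signatures, the return values, and the dictionary standing in for the App Engine search index are assumptions made so the example runs anywhere.

```python
# In-memory stand-in for the search index (assumption: the real module
# delegates to App Engine's Full-Text Search API instead of a dict).
_INDEX = {}


def index_all_docs(docs):
    """Add or overwrite every (doc_id, text) pair; return how many were given."""
    count = 0
    for doc_id, text in docs:
        _INDEX[doc_id] = text
        count += 1
    return count


def clear_index():
    """Remove every document; return how many were removed."""
    removed = len(_INDEX)
    _INDEX.clear()
    return removed


def fetch(query, limit=10):
    """Return ids of documents whose text contains every query word as a substring."""
    words = query.lower().split()
    hits = [doc_id for doc_id, text in _INDEX.items()
            if all(word in text.lower() for word in words)]
    return sorted(hits)[:limit]


# Made-up course content for illustration.
indexed = index_all_docs([
    ('lesson-1', 'The gerundive in Latin'),
    ('lesson-2', 'The ablative absolute'),
])
hits = fetch('latin gerundive')
removed = clear_index()
```

With only three entry points, a test can index a handful of documents, query them, clear the index, and check each step in isolation.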
In resources.py, I needed to handle both resource discovery and differential presentation in search results (i.e. displaying YouTube videos as videos). In the interest of clearly separating these two concerns, I implemented two classes, Resource and Result, and for each type of resource in the course that I wanted to index and search over, I created a subclass of each of these. The subclass of Resource is responsible for finding all of the relevant resources in the course and creating an index document, which is then put into the App Engine search index. The subclass of Result is responsible for taking a result returned by the App Engine search index and formatting it based on the type of the original document.
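The split between discovery and presentation can be sketched as follows. Only the class names Resource and Result come from the module; the method names (generate_all, get_document, render), the subclass names, the registries, and the plain dictionary standing in for a course are hypothetical, and index documents are shown as plain dictionaries rather than App Engine search documents.

```python
class Resource(object):
    """Finds resources of one type in a course and builds index documents."""

    TYPE_NAME = None  # Stored in each document so results can be dispatched.

    @classmethod
    def generate_all(cls, course):
        raise NotImplementedError

    def get_document(self):
        raise NotImplementedError


class Result(object):
    """Formats one document returned by the search index."""

    def __init__(self, doc):
        self.doc = doc

    def render(self):
        raise NotImplementedError


class LessonResource(Resource):
    TYPE_NAME = 'lesson'

    def __init__(self, title, body):
        self.title = title
        self.body = body

    @classmethod
    def generate_all(cls, course):
        # The course is a plain dict here (assumption for this sketch).
        for lesson in course.get('lessons', []):
            yield cls(lesson['title'], lesson['body'])

    def get_document(self):
        return {'type': self.TYPE_NAME, 'title': self.title,
                'content': self.body}


class LessonResult(Result):
    def render(self):
        return 'Lesson: %s' % self.doc['title']


class VideoResource(Resource):
    TYPE_NAME = 'video'

    def __init__(self, video_id, transcript):
        self.video_id = video_id
        self.transcript = transcript

    @classmethod
    def generate_all(cls, course):
        for video in course.get('videos', []):
            yield cls(video['id'], video['transcript'])

    def get_document(self):
        return {'type': self.TYPE_NAME, 'video_id': self.video_id,
                'content': self.transcript}


class VideoResult(Result):
    def render(self):
        return 'Video: %s' % self.doc['video_id']


# Hypothetical registries pairing each resource type with its result type.
RESOURCE_TYPES = [LessonResource, VideoResource]
RESULT_TYPES = {'lesson': LessonResult, 'video': VideoResult}


def build_documents(course):
    """Collect index documents from every registered resource type."""
    return [resource.get_document()
            for resource_type in RESOURCE_TYPES
            for resource in resource_type.generate_all(course)]


def render_hit(doc):
    """Pick the Result subclass that matches the document's stored type."""
    return RESULT_TYPES[doc['type']](doc).render()


course = {
    'lessons': [{'title': 'The Gerundive', 'body': 'Expresses necessity.'}],
    'videos': [{'id': 'VIDEO_ID', 'transcript': 'Today: the gerundive.'}],
}
docs = build_documents(course)
rendered = [render_hit(doc) for doc in docs]
```

Recording the type in each index document is what allows the presentation side to choose a formatter without knowing how the document was discovered.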
Future Work
While Course Builder’s search module is a robust, end-to-end feature, there is still plenty of room for improvement. One thing I would like to see is analytics for the search history of a course. This would allow course authors to see what students are searching for most frequently and potentially give them insights into what students are struggling to comprehend or retain. More resources from a course, such as questions or PDF files, could be indexed. Also, I think making the search algorithm context-aware would have great value (e.g. videos and pages a student has recently viewed are given higher priority in the search results).
What I Learned
I have neither the time nor the energy to enumerate all of the things I learned this summer; I would be writing for days. Instead, I will attempt to restrict myself to the highlights. Firstly, I learned a great deal about software engineering and working on a team. This summer, for the first time, I went through a real code review process, and I cannot overstate how valuable it was to receive almost immediate feedback on every line of code I wrote. My coding abilities improved, and I began to internalize the reviewer’s voice so that while I was programming, I was always thinking of what my reviewer might say. I also learned a great deal about minimizing dependencies and isolating features so that processes like the indexing pipeline are fault-tolerant and independent of the rest of the code base.
More importantly, though, I learned a great deal about myself. In a very short amount of time, having never written production Python code and having never used Google’s internal tools or infrastructure, I was able to become a productive member of my team at one of the most technically proficient companies in the world. That felt good, and it taught me that I have what it takes to create good software. I contributed a feature to a product that will help improve education for thousands of students, and doing that showed me that I really can effect change in the world. I like that feeling, too, and I think that for me, this is just the beginning.
So, thank you to Google and the Course Builder team for the opportunity to work on such a great project and for your excellent guidance. It has been a wonderful summer, and I could not have done it without you.