Sp21 CS F002C 02W Adv Data Struct/Algrm In C++ ⻚⾯ Binary Search! Search... 查看所有⻚⾯ Immersive Reader Binary Search Binary Search If you haven't read the Linear Search notes, read those first. Binary Search Linear Search might be fast for small inputs but if you the input (the number of things you're searching through) becomes twice as big, it's going to take about twice as long to search. If the input gets K !mes as big, it's going to take about K !mes as long to search. We can do be"er! With Binary Search and a sorted list, if we double the size of the input, it'll take the same amount of extra !me no ma"er how big the input is. For the rest of this page, we'll use an example where we have a list of words and we want to know if some word is in that list. This might be useful for checking if a word is an English word, like when you're typing and you see the red squiggly lines under the word because it's mispelled, or if you implemented a game like Scrabble and you want to know if a word is valid or not. Idea Let's walk through an example, searching for the word "olive" in this sorted list: index word 0 apple 1 banana 2 durian 3 lemon 4 lime 5 mango 6 orange 7 peach 8 pear 9 raspberry Since the list is sorted, you know that if the word you're looking for is in the list, it must be between the first word (at index 0) and the last word (at index 9). Binary Search is a process of elimina!on. We can look in the middle and see if that middle word comes before or a#er "olive"... and just keep repea!ng this un!l we either find "olive" or eliminate all words in the list. Note: In C++ and many programming languages, it is idioma!c to represent a range of integers by including the lower bound and excluding the upper bound! At each step, you check the middle word in the range. Since we start with [0, 10), we first look at index 5: "mango". lower bound upper bound middle index middle word what happens 0 10 (1 + 10) / 2 == 5 mango "olive" comes a#er "mango" so we update the lower bound to be just past "mango" 6 10 (6 + 10) / 2 == 8 pear "olive" comes before "pear" so we update the upper bound to be just before "pear" 6 8 (6 + 8) / 2 == 7 peach "olive" comes before "peach" so we update the upper bound 6 7 (6 + 7) / 2 == 6 orange "olive" comes before "orange" so we update the upper bound 6 6 (6 + 6) / 2 == 6 orange now our range is empty (lower bound == upper bound), so we know "olive" is not in the list Implementa!on bool _contains_word_bs( const vector
& words, const string& word, int low, int high) { // Base case: Could not find the word if (low == high) { return false; } const int middle = (low + high) / 2; const string& mid_word = words[middle]; if (mid_word == word) { // Base case: Found the word! return true; } else if (word < mid_word) { high = middle; } else { // if (word > mid_word) low = middle + 1; } // Recursive case: We've narrowed down the search to the left or right half. return _contains_word_bs(words, word, low, high); } bool contains_word_bs(const vector& words, const string& word) { return _contains_word_bs(words, word, 0, words.size()); } std::binary_search , defined in the header is a more general version of linear search than contains_word_bs . How fast is it? binary_search_!med.cpp #include #include #include #include #include #include #include using std::cout; using std::endl; using std::ifstream; using std::string; using std::vector; vector read_lines(const string& filepath, const long max_lines=-1) { ifstream f(filepath); vector lines; for(string line; getline(f, line); ) { lines.push_back(line); if (max_lines > 0 && lines.size() >= max_lines) { break; } } return lines; } vector get_random_words(const vector& words, const int sample_size) { std::default_random_engine dre; std::uniform_int_distribution uniform_dist(0, words.size() - 1); vector random_words; for (int i = 0; i < sample_size; ++i) { random_words.push_back(words[uniform_dist(dre)]); } return random_words; } bool _contains_word_bs( const vector& words, const string& word, int low, int high) { // Base case: Could not find the word if (low == high) { return false; } const int middle = (low + high) / 2; const string& mid_word = words[middle]; if (mid_word == word) { // Base case: Found the word! return true; } else if (word < mid_word) { high = middle; } else { // if (word > mid_word) low = middle + 1; } // Recursive case: We've narrowed down the search to the left or right half. // Note that instead of recursing here, we could just as easily put the code // in this function in a loop. return _contains_word_bs(words, word, low, high); } bool contains_word_bs(const vector& words, const string& word) { return _contains_word_bs(words, word, 0, words.size()); } // Returns the number of seconds to search for all `sample_words` in `words`. double time_contains_word_bs(const vector& words, const vector& sample_words) { auto start_time = std::chrono::steady_clock::now(); for (const string& w : sample_words) { contains_word_bs(words, w); } auto end_time = std::chrono::steady_clock::now(); std::chrono::duration elapsed_seconds = end_time - start_time; return elapsed_seconds.count(); } // Returns the number of seconds to search for all `sample_words` in `words`. double time_std_find(const vector& words, const vector& sample_words) { auto start_time = std::chrono::steady_clock::now(); for (const string& w : sample_words) { std::binary_search(words.begin(), words.end(), w); } auto end_time = std::chrono::steady_clock::now(); std::chrono::duration elapsed_seconds = end_time - start_time; return elapsed_seconds.count(); } int main(int argc, const char* argv[]) { const int sample_size = 100000; // Output a CSV that can be graphed. cout << "\"Number of words\",contains_word_bs,std::binary_search\n"; for (int n = 1; n < 1000000; n *= 2) { vector words = read_lines("words.txt", n); std::sort(words.begin(), words.end()); vector random_words = get_random_words(words, sample_size); cout << words.size() << "," << time_contains_word_bs(words, random_words) << "," << time_std_find(words, random_words) << '\n'; } return 0; } $ clang -pedantic -Wall -lm -lstdc++ -std=c++20 -o binary_search_timed binary_search/binary_search_timed.cpp $ ./binary_search_timed "Number of words",contains_word_bs,std::binary_search 1,0.00411836,0.0108771 2,0.00686095,0.0157544 4,0.00941716,0.0164178 8,0.012649,0.0202229 16,0.0162046,0.0231924 32,0.0202056,0.0285961 64,0.0262999,0.0334576 128,0.0304595,0.0389713 256,0.035055,0.0452508 512,0.0411489,0.0487735 1024,0.0474154,0.0549603 2048,0.0519004,0.0593754 4096,0.0587545,0.0648679 8192,0.0656104,0.0705913 16384,0.0741395,0.0783728 32768,0.0823508,0.0884602 65536,0.0985337,0.10054 124536,0.106752,0.111065 124536,0.106102,0.109859 124536,0.105223,0.109041 In prac!ce, it looks like it's much faster than Linear Search, at least for larger inputs. It's possible that Binary Search is actually slower for small inputs but by elimina!ng half of the possible words at every itera!on instead of only one word, even a slow Binary Search will beat a Linear Search on large inputs. This begs the ques!on, if Linear Search has to iterate for every word in the worse case scenario, how many !mes does Binary Search have to iterate? We know that in the base cases (when we narrow down the range to 1 word or 0 words), it only takes one itera!on. We also know that for the general/recursive case, we cut the range of possible words in half. So basically, if there are N words, the most itera!ons we might do (the most recursive calls we might do) is the same as the number of !mes we can divide N by 2: log base 2 of N. "上⼀⻚ 下⼀⻚# Spring 2021 主⻚ 公告 作业 讨论 评分 ⼈员 ⻚⾯ ⽂件 ⼤纲 测验 单元 会议 协作 Chat Library Evalua!onKIT Course Labster Dashboard Foothill Tutoring Tech Ambassador Search 帐户 控制⾯板 课程 ⽇历 收件箱 历史记录 Studio 帮助 Student Support Portal Pronto 欢迎咨询51作业君