Maps and Hash Tables. EECS 2011 Prof. J. Elder - 1 -

Outline Ø Maps Ø Hashing Ø Multimaps Ø Ordered Maps - 2 -

Learning Outcomes Ø By understanding this lecture, you should be able to: Ø Outline the ADT for a map and a multimap Ø Identify applications for which maps, multimaps and ordered maps are appropriate Ø Design and implement a map in Java and verify that it satisfies the requirements of the map ADT Ø Explain the purpose of hash tables Ø Design and implement hashing methods using separate chaining or open addressing and a variety of hashing and double-hashing functions Ø Specify worst-case and average-case asymptotic run-times for standard operations on maps based upon hash tables Ø Specify (worst-case) asymptotic run-times for standard operations on sorted maps - 3 -

Outline Ø Maps Ø Hashing Ø Multimaps Ø Ordered Maps - 4 -

Maps Ø A map models a searchable collection of key-value entries Ø The main operations of a map are for searching, inserting, and deleting items Ø Multiple entries with the same key are not allowed Ø Applications: q address book q student-record database - 5 -

The Map ADT (in java.util) Ø Map ADT methods: q get(k): if the map M has an entry with key k, return its associated value; else, return null q put(k, v): insert entry (k, v) into the map M; if key k is not already in M, then return null; else, replace the old value with v and return the old value associated with k q remove(k): if the map M has an entry with key k, remove it from M and return its associated value; else, return null q size(), isEmpty() q entrySet(): return an iterable collection of the entries in M q keySet(): return an iterable collection of the keys in M q values(): return an iterable collection of the values in M - 6 -

Example
Operation    Output   M
isEmpty()    true     ∅
put(5,A)     null     (5,A)
put(7,B)     null     (5,A),(7,B)
put(2,C)     null     (5,A),(7,B),(2,C)
put(8,D)     null     (5,A),(7,B),(2,C),(8,D)
put(2,E)     C        (5,A),(7,B),(2,E),(8,D)
get(7)       B        (5,A),(7,B),(2,E),(8,D)
get(4)       null     (5,A),(7,B),(2,E),(8,D)
get(2)       E        (5,A),(7,B),(2,E),(8,D)
size()       4        (5,A),(7,B),(2,E),(8,D)
remove(5)    A        (7,B),(2,E),(8,D)
remove(2)    E        (7,B),(8,D)
get(2)       null     (7,B),(8,D)
isEmpty()    false    (7,B),(8,D)
- 7 -
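As a quick check of the Map ADT semantics, the sketch below replays the operations from the example table on java.util.HashMap and records each output as a string. MapExampleReplay is a hypothetical class name. HashMap does not preserve insertion order, so the sketch compares only the outputs, not the printed contents of M.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MapExampleReplay {
    // Runs the operations from the example table on a java.util.HashMap
    // and collects each operation's output, in order, as a string.
    static List<String> replay() {
        Map<Integer, String> m = new HashMap<>();
        List<String> out = new ArrayList<>();
        out.add(String.valueOf(m.isEmpty()));    // true
        out.add(String.valueOf(m.put(5, "A")));  // null: key 5 was absent
        out.add(String.valueOf(m.put(7, "B")));  // null
        out.add(String.valueOf(m.put(2, "C")));  // null
        out.add(String.valueOf(m.put(8, "D")));  // null
        out.add(String.valueOf(m.put(2, "E")));  // C: old value returned, then replaced
        out.add(String.valueOf(m.get(7)));       // B
        out.add(String.valueOf(m.get(4)));       // null: no entry with key 4
        out.add(String.valueOf(m.get(2)));       // E
        out.add(String.valueOf(m.size()));       // 4
        out.add(String.valueOf(m.remove(5)));    // A
        out.add(String.valueOf(m.remove(2)));    // E
        out.add(String.valueOf(m.get(2)));       // null
        out.add(String.valueOf(m.isEmpty()));    // false
        return out;
    }
}
```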

A Simple List-Based Map Ø We could implement a map using an unsorted list q We store the entries of the map in a doubly-linked list S, in arbitrary order [Figure: doubly-linked list running from a header sentinel, through nodes/positions holding the entries (9,c), (6,b), (5,a), (8,d), to a trailer sentinel] - 8 -

The get(k) Algorithm
Algorithm get(k):
    B = S.positions()   {B is an iterator of the positions in S}
    while B.hasNext() do
        p = B.next()   {the next position in B}
        if p.element().getKey() = k then
            return p.element().getValue()
    return null   {there is no entry with key equal to k}
- 9 -

The put(k,v) Algorithm
Algorithm put(k,v):
    B = S.positions()
    while B.hasNext() do
        p = B.next()
        if p.element().getKey() = k then
            t = p.element().getValue()
            S.set(p,(k,v))
            return t   {return the old value}
    S.addLast((k,v))
    n = n + 1   {increment variable storing number of entries}
    return null   {there was no previous entry with key equal to k}
- 10 -

The remove(k) Algorithm
Algorithm remove(k):
    B = S.positions()
    while B.hasNext() do
        p = B.next()
        if p.element().getKey() = k then
            t = p.element().getValue()
            S.remove(p)
            n = n - 1   {decrement number of entries}
            return t   {return the removed value}
    return null   {there is no entry with key equal to k}
- 11 -
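Here is a minimal Java sketch of the three list-based algorithms above. It assumes that java.util.LinkedList (a doubly-linked list) stands in for the positional list S and that an iterator stands in for S.positions(). UnsortedListMap and its nested Entry class are hypothetical names used for illustration. Instead of calling S.set(p,(k,v)), put overwrites the value field of the matching entry, and the list's own size() plays the role of the counter n.

```java
import java.util.Iterator;
import java.util.LinkedList;
import java.util.Objects;

class UnsortedListMap<K, V> {
    // One (key, value) entry stored in the list
    private static final class Entry<K, V> {
        final K key;
        V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    // Doubly-linked list S holding the entries in arbitrary (insertion) order
    private final LinkedList<Entry<K, V>> s = new LinkedList<>();

    int size() { return s.size(); }

    boolean isEmpty() { return s.isEmpty(); }

    // get(k): linear scan, O(n) in the worst case
    V get(K k) {
        for (Entry<K, V> e : s) {
            if (Objects.equals(e.key, k)) return e.value;
        }
        return null;                          // no entry with key equal to k
    }

    // put(k, v): overwrite the value of an existing entry, else append a new one
    V put(K k, V v) {
        for (Entry<K, V> e : s) {
            if (Objects.equals(e.key, k)) {
                V old = e.value;
                e.value = v;
                return old;                   // return the old value
            }
        }
        s.addLast(new Entry<>(k, v));
        return null;                          // there was no previous entry with key k
    }

    // remove(k): unlink the matching node through the iterator
    V remove(K k) {
        Iterator<Entry<K, V>> it = s.iterator();
        while (it.hasNext()) {
            Entry<K, V> e = it.next();
            if (Objects.equals(e.key, k)) {
                it.remove();
                return e.value;               // return the removed value
            }
        }
        return null;                          // no entry with key equal to k
    }
}
```

Each of get, put and remove may scan the whole list, which gives the O(n) worst case for this implementation.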

Performance of a List-Based Map Ø Performance: q put, get and remove take O(n) time since in the worst case (the item is not found) we traverse the entire sequence to look for an item with the given key Ø The unsorted list implementation is effective only for small maps - 12 -

Outline Ø Maps Ø Hashing Ø Multimaps Ø Ordered Maps - 13 -

Hash Tables Ø A hash table is a data structure that can be used to make map operations faster. Ø While worst-case is still O(n), average case is typically O(1). - 14 -

Applications of Hash Tables Ø Databases Ø Caches Ø Programming languages - 15 -

Hash Functions and Hash Tables Ø A hash function h maps keys of a given type to integers in a fixed interval [0, N - 1] Ø Example: h(k) = k mod N is a hash function for integer keys Ø The integer h(k) is called the hash value of key k Ø A hash table for a given key type consists of q hash function h q array (called table) of size N Ø When implementing a map with a hash table, the goal is to store item (k, v) at index i = h(k) - 16 -

Example Ø We design a hash table for a map storing entries as (SIN, Name), where SIN (social insurance number) is a nine-digit positive integer Ø Our hash table uses an array of size N = 1,000 and the hash function h(k) = last three digits of SIN k [Figure: table cells 0-999; 025-612-001 stored in cell 1, 981-101-002 in cell 2, 451-229-004 in cell 4, 200-751-998 in cell 998; cells 0, 3, 997 and 999 empty] Ø Problem: what happens if two entries have the same last three digits? - 17 -

Hash Functions Ø A hash function is usually specified as the composition of two functions: q Hash code: $h_1$: keys → integers q Compression function: $h_2$: integers → $[0, N - 1]$ Ø The hash code is applied first, and the compression function is applied next on the result, i.e., $h(k) = h_2(h_1(k))$ Ø The goal of the hash function is to disperse the keys in an apparently random way - 18 -

Hash Codes Ø Memory address: q We reinterpret the memory address of the key object as an integer (default hash code of all Java objects) q Does not work well when copies of the same object may be stored at different locations. Ø Integer cast: q We reinterpret the bits of the key as an integer q Suitable for keys of length less than or equal to the number of bits of the integer type (e.g., byte, short, int and float in Java) Ø Component sum: q We partition the bits of the key into components of fixed length (e.g., 16 or 32 bits) and we sum the components (ignoring overflows) q Suitable for keys of fixed length greater than or equal to the number of bits of the integer type (e.g., long and double in Java) - 19 -
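Two of these hash codes can be sketched with standard JDK calls. SimpleHashCodes is a hypothetical class name. The integer cast uses Float.floatToIntBits, and the component sum adds the two 32-bit halves of a long. Java's wrap-around int arithmetic ignores overflow, as the slide requires.

```java
class SimpleHashCodes {
    // Integer cast: reinterpret the 32 bits of a float as an int.
    static int floatHash(float x) {
        return Float.floatToIntBits(x);
    }

    // Component sum: add the high and low 32-bit halves of a long;
    // int addition in Java silently wraps around, so overflow is ignored.
    static int longComponentSum(long x) {
        int high = (int) (x >>> 32);
        int low = (int) x;
        return high + low;
    }
}
```

For comparison, the JDK's own Long.hashCode(long) folds the two halves with XOR rather than addition.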

Problems with Component Sum Hash Codes Ø Hashing works when q the number of commonly occurring keys is small relative to the hashing space (e.g., $2^{32}$ for a 32-bit hash code). q the hash codes for commonly occurring keys are well-distributed (do not collide) in this space. Ø Component sum codes ignore the ordering of the components. q e.g., using 8-bit ASCII components, "stop" and "pots" yield the same code. Ø If commonly occurring keys are anagrams of each other, this is a bad idea! - 20 -
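The anagram problem is easy to reproduce. This Java sketch uses a string's characters as the components and sums them. It assumes that Java char values (16-bit UTF-16 code units) replace the slide's 8-bit ASCII components, which gives the same sums for plain ASCII text. ComponentSumCollision is a hypothetical name.

```java
class ComponentSumCollision {
    // Component-sum hash code of a string: add up its character codes (overflow ignored).
    static int componentSum(String s) {
        int sum = 0;
        for (int i = 0; i < s.length(); i++) {
            sum += s.charAt(i);
        }
        return sum;
    }
}
```

componentSum("stop") and componentSum("pots") both return 454 (115 + 116 + 111 + 112).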

Polynomial Hash Codes Ø Polynomial accumulation: q We partition the bits of the key into a sequence of components of fixed length (e.g., 8, 16 or 32 bits) $a_0, a_1, \ldots, a_{n-1}$ q We evaluate the polynomial $p(z) = a_0 + a_1 z + a_2 z^2 + \cdots + a_{n-1} z^{n-1}$ at a fixed value $z$, ignoring overflows q Especially suitable for strings q Polynomial $p(z)$ can be evaluated in $O(n)$ time using Horner's rule: ² The following polynomials are successively computed, each from the previous one in $O(1)$ time: $p_0(z) = a_{n-1}$ and $p_i(z) = a_{n-i-1} + z\,p_{i-1}(z)$ for $i = 1, 2, \ldots, n-1$ q We have $p(z) = p_{n-1}(z)$ - 21 -
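Horner's rule above translates directly into a loop. In this Java sketch (PolynomialHash is a hypothetical name) the components are the characters of the string, $a_i$ = s.charAt(i). Following the slide, int overflow is ignored.

```java
class PolynomialHash {
    // Evaluates p(z) = a_0 + a_1 z + ... + a_{n-1} z^{n-1}, with a_i = s.charAt(i),
    // by Horner's rule, starting from the last component; int overflow is ignored.
    static int hash(String s, int z) {
        int n = s.length();
        if (n == 0) return 0;
        int p = s.charAt(n - 1);               // p_0(z) = a_{n-1}
        for (int i = 1; i < n; i++) {
            p = s.charAt(n - i - 1) + z * p;   // p_i(z) = a_{n-i-1} + z * p_{i-1}(z)
        }
        return p;                              // p(z) = p_{n-1}(z)
    }
}
```

With z = 33, hash("stop", 33) gives 4149766 and hash("pots", 33) gives 4262854, so the two anagrams no longer collide. java.lang.String.hashCode() evaluates the same kind of polynomial with z = 31 but with the coefficients in the opposite order (the first character gets the highest power). As a result, hash applied to the reversed string with z = 31 equals s.hashCode().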

Compression Functions Ø Division: q $h_2(y) = y \bmod N$ q The size N of the hash table is usually chosen to be a prime (on the assumption that the differences between hash keys y are less likely to be multiples of primes). Ø Multiply, Add and Divide (MAD): q $h_2(y) = [(ay + b) \bmod p] \bmod N$, where ² p is a prime number greater than N ² a and b are integers chosen at random from the interval $[0, p - 1]$, with $a > 0$. - 22 -
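Both compression functions can be sketched in Java (Compression and Mad are hypothetical names). The sketch uses Math.floorMod because Java's % operator can return a negative result for negative hash codes. The MAD variant assumes that the caller passes a prime p with N < p ≤ 2^31 - 1 (for example 2^31 - 1 itself), so that a * y + b fits in a long. The constructor checks the range but does not check primality.

```java
import java.util.Random;

class Compression {
    // Division: h2(y) = y mod N, kept in [0, N - 1] even for negative hash codes y.
    static int division(int y, int n) {
        return Math.floorMod(y, n);
    }

    // MAD: h2(y) = [(a*y + b) mod p] mod N, with a in [1, p - 1] and b in [0, p - 1].
    static final class Mad {
        final long a, b, p;
        final int n;

        Mad(long p, int n, Random rnd) {
            if (p <= n || p > Integer.MAX_VALUE) {
                throw new IllegalArgumentException("expected N < p <= 2^31 - 1");
            }
            this.p = p;
            this.n = n;
            this.a = 1 + rnd.nextInt((int) (p - 1));  // random a > 0
            this.b = rnd.nextInt((int) p);            // random b
        }

        int compress(int y) {
            long r = Math.floorMod(a * y + b, p);     // a*y + b cannot overflow a long here
            return (int) Math.floorMod(r, (long) n);
        }
    }
}
```

For example, Compression.division(17, 13) and Compression.division(30, 13) both return 4.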

Collisions Ø Collisions occur when different elements are mapped to the same cell Ø Example: Ø Suppose $h_2(y) = y \bmod N$, where N = 13. Ø y = 17 and y = 30 will hash to the same location (4). - 23 -

Collision Handling Ø Collisions occur when different elements are mapped to the same cell Ø Separate Chaining: q Let each cell in the table point to a linked list of entries that map there q Separate chaining is simple, but requires additional memory outside the table [Figure: cells 0, 2 and 3 empty; cell 1 → 025-612-001; cell 4 → 451-229-004 → 981-101-004] - 24 -

Map Methods with Separate Chaining Ø Delegate operations to a list-based map at each cell:
Algorithm get(k):
    Output: The value associated with the key k in the map, or null if there is no entry with key equal to k in the map
    return A[h(k)].get(k)   {delegate the get to the list-based map at A[h(k)]}
- 25 -

Map Methods with Separate Chaining Ø Delegate operations to a list-based map at each cell:
Algorithm put(k,v):
    Output: Store the new (key, value) pair. If there is an existing entry with key equal to k, return the old value; otherwise, return null
    t = A[h(k)].put(k,v)   {delegate the put to the list-based map at A[h(k)]}
    if t = null then   {k is a new key}
        n = n + 1
    return t
- 26 -

Map Methods with Separate Chaining Ø Delegate operations to a list-based map at each cell:
Algorithm remove(k):
    Output: The (removed) value associated with key k in the map, or null if there is no entry with key equal to k in the map
    t = A[h(k)].remove(k)   {delegate the remove to the list-based map at A[h(k)]}
    if t ≠ null then   {k was found}
        n = n - 1
    return t
- 27 -
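Here is a compact Java sketch of a separate-chaining map. It assumes java.util.LinkedList buckets and h(k) = floorMod(k.hashCode(), N). ChainHashMap is a hypothetical name. For brevity it scans each bucket directly instead of calling a separate list-based map object. The logic still follows the three algorithms above: the counter n changes only when put adds a new key or remove finds one.

```java
import java.util.Iterator;
import java.util.LinkedList;
import java.util.Objects;

class ChainHashMap<K, V> {
    private static final class Entry<K, V> {
        final K key;
        V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    private final LinkedList<Entry<K, V>>[] table;   // bucket array A
    private int n = 0;                              // number of entries

    @SuppressWarnings("unchecked")
    ChainHashMap(int capacity) {
        table = (LinkedList<Entry<K, V>>[]) new LinkedList[capacity];
        for (int i = 0; i < capacity; i++) table[i] = new LinkedList<>();
    }

    // h(k): hash code followed by division compression into [0, N - 1]
    private int h(K k) { return Math.floorMod(Objects.hashCode(k), table.length); }

    int size() { return n; }

    V get(K k) {
        for (Entry<K, V> e : table[h(k)]) {
            if (Objects.equals(e.key, k)) return e.value;
        }
        return null;
    }

    V put(K k, V v) {
        LinkedList<Entry<K, V>> bucket = table[h(k)];
        for (Entry<K, V> e : bucket) {
            if (Objects.equals(e.key, k)) {
                V old = e.value;
                e.value = v;
                return old;                 // existing key: n is unchanged
            }
        }
        bucket.addLast(new Entry<>(k, v));
        n++;                                // k is a new key
        return null;
    }

    V remove(K k) {
        Iterator<Entry<K, V>> it = table[h(k)].iterator();
        while (it.hasNext()) {
            Entry<K, V> e = it.next();
            if (Objects.equals(e.key, k)) {
                it.remove();
                n--;                        // k was found
                return e.value;
            }
        }
        return null;
    }
}
```

With a capacity of 1000 and Integer keys, h(k) is the last three digits of k. The keys 451229004 and 981101004 therefore share bucket 4, and both remain retrievable.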

Open Addressing: Linear Probing Ø Open addressing: the colliding item is placed in a different cell of the table Ø Linear probing handles collisions by placing the colliding item in the next (circularly) available table cell Ø Each table cell inspected is referred to as a probe Ø Colliding items cluster together, so that future collisions cause a longer sequence of probes Ø Example: q h(k) = k mod 13 q Insert keys 18, 41, 22, 44, 59, 32, 31, 73, in this order [Figure: resulting 13-cell table (indices 0-12): cell 2 = 41, 5 = 18, 6 = 44, 7 = 59, 8 = 32, 9 = 22, 10 = 31, 11 = 73; all other cells empty] - 28 -

Get with Linear Probing Ø Consider a hash table A of length N that uses linear probing Ø get(k) q We start at cell h(k) q We probe consecutive locations until one of the following occurs ² An item with key k is found, or ² An empty cell is found, or ² N cells have been unsuccessfully probed
Algorithm get(k)
    i ← h(k)
    p ← 0
    repeat
        c ← A[i]
        if c = ∅ then
            return null
        else if c.key() = k then
            return c.element()
        else
            i ← (i + 1) mod N
            p ← p + 1
    until p = N
    return null
- 29 -

End of Lecture Feb 13, 2018 - 30 -

Remove with Linear Probing Ø Suppose we receive a remove(44) message. Ø What problem arises if we simply remove the key = 44 entry?
k    h(k)   i (cell)
18   5      5
41   2      2
22   9      9
44   5      6
59   7      7
32   6      8
31   5      10
73   8      11
Ø Example: q h(k) = k mod 13 q Insert keys 18, 41, 22, 44, 59, 32, 31, 73, in this order
[Figure: the resulting table (cell 2 = 41, 5 = 18, 6 = 44, 7 = 59, 8 = 32, 9 = 22, 10 = 31, 11 = 73), with an arrow pointing at the entry with key 44]
What happens now if we do a get(31)? - 31 -

Removal with Linear Probing Ø To address this problem, we introduce a special object, called AVAILABLE, which replaces deleted elements Ø AVAILABLE has a null key Ø No changes to get(k) are required: a cell holding AVAILABLE is not empty, and its null key never equals k, so the probe continues past it.
Algorithm get(k)
    i ← h(k)
    p ← 0
    repeat
        c ← A[i]
        if c = ∅ then
            return null
        else if c.key() = k then
            return c.element()
        else
            i ← (i + 1) mod N
            p ← p + 1
    until p = N
    return null
- 32 -

Updates with Linear Probing Ø remove(k) q We search for an entry with key k q If such an entry (k, v) is found, we replace it with the special item AVAILABLE and we return value v q Else, we return null Ø put(k, v) q We start at cell h(k) and then probe consecutive cells q If we encounter a cell labeled AVAILABLE we note its index q We continue probing until either ² A cell i is found that is empty or has key matching k ² N cells have been unsuccessfully probed q If a cell has key matching k, we update the value for this cell to v q Else if a cell labeled AVAILABLE was encountered we store entry (k, v) there. q Else if an empty cell was encountered we store entry (k, v) there q Else if N cells were probed unsuccessfully we throw an exception. - 33 -
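The update rules above, together with the AVAILABLE marker, can be sketched in Java. The sketch assumes int keys, String values, and h(k) = k mod N as in the running example. LinearProbingMap and keyAt are hypothetical names; keyAt exists only to inspect the table layout.

```java
import java.util.Objects;

class LinearProbingMap {
    private static final class Entry {
        final Integer key;
        String value;
        Entry(Integer key, String value) { this.key = key; this.value = value; }
    }

    // Sentinel that replaces removed entries; its key is null, so it never equals a search key.
    private static final Entry AVAILABLE = new Entry(null, null);

    private final Entry[] a;   // null means "empty, never used"
    private int n = 0;

    LinearProbingMap(int capacity) { a = new Entry[capacity]; }

    private int h(int k) { return Math.floorMod(k, a.length); }   // h(k) = k mod N

    // Probe from h(k) until key k, an empty cell, or N probes; returns the cell or -1.
    private int find(int k) {
        int i = h(k);
        for (int p = 0; p < a.length; p++) {
            Entry c = a[i];
            if (c == null) return -1;                 // empty cell: k is absent
            if (Objects.equals(c.key, k)) return i;   // AVAILABLE's null key never matches
            i = (i + 1) % a.length;
        }
        return -1;                                    // N cells probed unsuccessfully
    }

    String get(int k) { int i = find(k); return i < 0 ? null : a[i].value; }

    // remove(k): replace the entry with AVAILABLE instead of emptying the cell.
    String remove(int k) {
        int i = find(k);
        if (i < 0) return null;
        String v = a[i].value;
        a[i] = AVAILABLE;
        n--;
        return v;
    }

    String put(int k, String v) {
        int i = h(k);
        int available = -1;                   // index of the first AVAILABLE cell seen
        for (int p = 0; p < a.length; p++) {
            Entry c = a[i];
            if (c == null) break;             // empty cell: k is not in the table
            if (c == AVAILABLE) {
                if (available < 0) available = i;
            } else if (Objects.equals(c.key, k)) {
                String old = c.value;         // key found: update its value
                c.value = v;
                return old;
            }
            i = (i + 1) % a.length;
        }
        int target = available >= 0 ? available : (a[i] == null ? i : -1);
        if (target < 0) throw new IllegalStateException("hash table is full");
        a[target] = new Entry(k, v);
        n++;
        return null;
    }

    int size() { return n; }

    // Key stored in cell i, or null if the cell is empty or AVAILABLE (inspection only).
    Integer keyAt(int i) { return a[i] == null ? null : a[i].key; }
}
```

With N = 13 and the keys 18, 41, 22, 44, 59, 32, 31, 73 inserted in order, the cells match the earlier example (31 lands in cell 10). After remove(44), get(31) still succeeds because the probe passes over the AVAILABLE cell 6 instead of stopping there. A later put(70, ...) (70 mod 13 = 5) reuses cell 6.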

Open Addressing: Double Hashing Ø Double hashing is an alternative open addressing method that uses a secondary hash function $h'(k)$ in addition to the primary hash function $h(k)$. Ø Suppose that the primary hashing $i = h(k)$ leads to a collision. Ø We then iteratively probe the locations $(i + j\,h'(k)) \bmod N$ for $j = 0, 1, \ldots, N - 1$ Ø The secondary hash function $h'(k)$ cannot have zero values Ø Choose N to be prime. Ø Common choice of secondary hash function $h'(k)$: q $h'(k) = q - (k \bmod q)$, where ² $q < N$ ² q is a prime Ø The possible values for $h'(k)$ are $1, 2, \ldots, q$ - 34 -

Open Addressing: Double Hashing Ø Potential problem: depending upon the relative values of N and $h'(k)$, not all locations may be probed. Ø Example: N = 12, $h'(k) = 6$. Suppose primary hash $i = h(k) = 2$. The probe sequence is then 2, 8, 2, 8, ..., so only 2 of the 12 cells are ever probed. [Figure: 12-cell table (indices 0-11) with the probed locations marked] Ø To avoid this problem, we need to ensure that none of the N probed locations repeat, i.e., we require $(i + j_1 h'(k)) \bmod N \neq (i + j_2 h'(k)) \bmod N$, where $0 \le j_1 < j_2 < N$ - 35 -

Open Addressing: Double Hashing Ø We require $(i + j_1 h'(k)) \bmod N \neq (i + j_2 h'(k)) \bmod N$, where $0 \le j_1 < j_2 < N$ Ø In other words, writing $j = j_2 - j_1$, we require $j\,h'(k) \bmod N \neq 0$ for $j = 1, 2, \ldots, N - 1$ Ø This will be true if $h'(k)$ and N are relatively prime, i.e., share no common factor greater than 1. Ø We have ensured this by making N prime (since $0 < h'(k) \le q < N$). However, this has downsides: q Modulo operations tend to be less efficient. q We have to search for prime numbers Ø It is more common to make N a power of 2. q Makes modulo operations very efficient. q No searching for special numbers required. Ø Q: If N is a power of 2, how can we ensure that $h'(k)$ and N are relatively prime? Ø A: Make $h'(k)$ odd! - 36 -

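Open Addressing: Double Hashing (Revised) Ø Select N to be a power of 2. Ø Let $h'(k)$ be a random odd integer, $0 < h'(k) < N$, e.g., q h'(k) = 2 * Random.nextInt(N/2) + 1, where Random.nextInt(n) generates a pseudo-random integer in the range 0 to n - 1. q The value must still be a deterministic function of k (for example, a generator seeded by k), since get(k) has to reproduce the same probe sequence as put(k, v). - 37 -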
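The claim that an odd step visits every cell of a power-of-2 table can be checked with a short Java sketch. distinctProbes counts the distinct cells in the probe sequence (i + j * step) mod n. secondaryHash is a hypothetical secondary hash that derives an odd value in [1, n - 1] deterministically from k. The multiplier 0x9E3779B9 is just one common scrambling constant, chosen here as an assumption rather than something the slides prescribe.

```java
import java.util.HashSet;
import java.util.Set;

class OddStepDemo {
    // Counts the distinct cells visited by (i + j * step) mod n for j = 0, 1, ..., n - 1.
    static int distinctProbes(int i, int step, int n) {
        Set<Integer> seen = new HashSet<>();
        for (int j = 0; j < n; j++) {
            seen.add((i + j * step) % n);
        }
        return seen.size();
    }

    // Hypothetical secondary hash: an odd value in [1, n - 1], computed deterministically
    // from k, for a table size n that is a power of 2.
    static int secondaryHash(int k, int n) {
        int mixed = k * 0x9E3779B9;                 // scramble the bits (overflow ignored)
        return (mixed >>> 16) % (n / 2) * 2 + 1;    // 2 * (value in [0, n/2 - 1]) + 1
    }
}
```

distinctProbes(2, 6, 12) returns 2, reproducing the N = 12, h'(k) = 6 problem. By contrast, distinctProbes(3, 6, 16) returns 8 (the even step shares the factor 2 with 16), and distinctProbes(3, 5, 16) returns 16.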

Example of Double Hashing Ø Consider a hash table storing integer keys that handles collisions with double hashing q N = 13 q h(k) = k mod 13 q h'(k) = 7 - (k mod 7) Ø Insert keys 18, 41, 22, 44, 59, 32, 31, 73
k    h(k)   h'(k)   Probes
18   5      3       5
41   2      1       2
22   9      6       9
44   5      5       5, 10
59   7      4       7
32   6      3       6
31   5      4       5, 9, 0
73   8      4       8
[Figure: resulting 13-cell table: cell 0 = 31, 2 = 41, 5 = 18, 6 = 32, 7 = 59, 8 = 73, 9 = 22, 10 = 44; all other cells empty]
- 38 -
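To check the table above, here is a small Java sketch that inserts keys with double hashing. It assumes distinct non-negative int keys, h(k) = k mod N and h'(k) = q - (k mod q). DoubleHashingDemo and insertAll are hypothetical names. The method only inserts and reports the final table, with null marking an empty cell.

```java
class DoubleHashingDemo {
    // Inserts distinct non-negative keys into a table of size n using
    // h(k) = k mod n and h'(k) = q - (k mod q), probing (i + j * h'(k)) mod n.
    // Returns the table; null marks an empty cell.
    static Integer[] insertAll(int[] keys, int n, int q) {
        Integer[] table = new Integer[n];
        for (int k : keys) {
            int i = k % n;               // primary hash h(k)
            int step = q - (k % q);      // secondary hash h'(k), always in 1..q
            boolean placed = false;
            for (int j = 0; j < n && !placed; j++) {
                int cell = (i + j * step) % n;
                if (table[cell] == null) {
                    table[cell] = k;
                    placed = true;
                }
            }
            if (!placed) throw new IllegalStateException("no free cell for key " + k);
        }
        return table;
    }
}
```

Calling insertAll(new int[] {18, 41, 22, 44, 59, 32, 31, 73}, 13, 7) places 31 in cell 0 and 44 in cell 10, as in the example.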


Performance of Hashing Ø In the worst case, searches, insertions and removals on a hash table take O(n) time Ø The worst case occurs when all the keys inserted into the map collide Ø The load factor λ = n/N affects the performance of a hash table q For separate chaining, performance is typically good for λ < 0.9. q For open addressing, performance is typically good for λ < 0.5. q java.util.HashMap, by default, keeps λ ≤ 0.75 Ø Open addressing can be more memory efficient than separate chaining, as we do not require a separate data structure. Ø However, separate chaining is typically as fast as or faster than open addressing. - 40 -

Rehashing Ø When the load factor λ exceeds a threshold, the table must be rehashed. q A larger table is allocated (typically at least double the size). q A new hash function is defined. q All existing entries are copied to this new table using the new hash function. - 41 -
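Here is a minimal Java sketch of rehashing. It assumes a separate-chaining set of int keys, h(k) = k mod N, an initial table of 4 buckets and a threshold of 0.75. RehashingIntSet is a hypothetical name. In this sketch, the "new hash function" is simply the same modular rule applied with the doubled N.

```java
import java.util.ArrayList;
import java.util.List;

class RehashingIntSet {
    private static final double MAX_LOAD = 0.75;   // rehash once n / N exceeds this
    private List<List<Integer>> table = newTable(4);
    private int n = 0;

    private static List<List<Integer>> newTable(int capacity) {
        List<List<Integer>> t = new ArrayList<>(capacity);
        for (int i = 0; i < capacity; i++) t.add(new ArrayList<>());
        return t;
    }

    // h(k) = k mod N for the current table size N
    private static int h(int k, int capacity) { return Math.floorMod(k, capacity); }

    boolean add(int k) {
        List<Integer> bucket = table.get(h(k, table.size()));
        if (bucket.contains(k)) return false;         // no duplicate keys
        bucket.add(k);
        n++;
        if ((double) n / table.size() > MAX_LOAD) rehash();
        return true;
    }

    boolean contains(int k) { return table.get(h(k, table.size())).contains(k); }

    int size() { return n; }

    int capacity() { return table.size(); }

    // Allocate a table twice as large and reinsert every key with the new hash function.
    private void rehash() {
        List<List<Integer>> bigger = newTable(2 * table.size());
        for (List<Integer> bucket : table) {
            for (int k : bucket) bigger.get(h(k, bigger.size())).add(k);
        }
        table = bigger;
    }
}
```

Adding the keys 0 through 9 triggers two rehashes (at the 4th and 7th insertions), leaving 10 keys in 16 buckets.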

Outline Ø Maps Ø Hashing Ø Multimaps Ø Ordered Maps - 42 -

Multimap ADT Ø The Multimap ADT is identical to the Map ADT, but allows entries with the same key. Ø Multimaps are also sometimes known as dictionaries. Ø As for maps, the main operations are searching, inserting, and deleting items Ø Applications: q word-definition pairs Ø Dictionary ADT methods: q get(k): if the dictionary has at least one entry with key k, returns one of them, else, returns null q getAll(k): returns an iterable collection of all entries with key k q put(k, v): inserts and returns the entry (k, v) q remove(e): removes and returns the entry e. Throws an exception if the entry is not in the dictionary. q entrySet(): returns an iterable collection of the entries in the dictionary q size(), isEmpty() - 43 -

Multimaps and Java Ø Note: The java.util.Dictionary class actually implements a map ADT. Ø There is no multimap data structure in the Java Collections Framework that supports multiple entries with equal keys. Ø The textbook (Ch. 10.5.3) sketches an implementation of a multimap based upon a map of keys, each entry of which stores a List of the entries with that key. q Note: There are some errors in the textbook Java implementation of the multimap; please ignore Code Fragment 10.17. - 44 -
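One way to realize the idea of a map whose entries hold lists of same-key entries is sketched below in Java. It assumes java.util.HashMap and java.util.AbstractMap.SimpleEntry as building blocks. MapOfListsMultimap is a hypothetical name, and this is not the textbook's code. SimpleEntry compares both key and value in equals, so remove(e) deletes one entry with the same (key, value) pair, matching the remove(e) semantics of the Dictionary ADT.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MapOfListsMultimap<K, V> {
    // Each key maps to the list of all entries that have that key
    private final Map<K, List<Map.Entry<K, V>>> map = new HashMap<>();
    private int total = 0;                    // total number of entries

    // put(k, v): always adds a new entry, even if k is already present
    Map.Entry<K, V> put(K k, V v) {
        Map.Entry<K, V> e = new SimpleEntry<>(k, v);
        map.computeIfAbsent(k, key -> new ArrayList<>()).add(e);
        total++;
        return e;
    }

    // get(k): one entry with key k (here, the earliest still present), or null
    Map.Entry<K, V> get(K k) {
        List<Map.Entry<K, V>> list = map.get(k);
        return list == null ? null : list.get(0);
    }

    // getAll(k): a copy of all entries with key k (empty if there are none)
    List<Map.Entry<K, V>> getAll(K k) {
        List<Map.Entry<K, V>> list = map.get(k);
        List<Map.Entry<K, V>> result = new ArrayList<>();
        if (list != null) result.addAll(list);
        return result;
    }

    // remove(e): removes one entry equal to e (same key and value); throws if absent
    Map.Entry<K, V> remove(Map.Entry<K, V> e) {
        List<Map.Entry<K, V>> list = map.get(e.getKey());
        if (list == null || !list.remove(e)) {
            throw new IllegalArgumentException("entry is not in the dictionary");
        }
        if (list.isEmpty()) map.remove(e.getKey());  // keeps get(k) == null for absent keys
        total--;
        return e;
    }

    int size() { return total; }

    boolean isEmpty() { return total == 0; }
}
```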

Example
Operation        Output        Dictionary
put(5,A)         (5,A)         (5,A)
put(7,B)         (7,B)         (5,A),(7,B)
put(2,C)         (2,C)         (5,A),(7,B),(2,C)
put(8,D)         (8,D)         (5,A),(7,B),(2,C),(8,D)
put(2,E)         (2,E)         (5,A),(7,B),(2,C),(8,D),(2,E)
get(7)           (7,B)         (5,A),(7,B),(2,C),(8,D),(2,E)
get(4)           null          (5,A),(7,B),(2,C),(8,D),(2,E)
get(2)           (2,C)         (5,A),(7,B),(2,C),(8,D),(2,E)
getAll(2)        (2,C),(2,E)   (5,A),(7,B),(2,C),(8,D),(2,E)
size()           5             (5,A),(7,B),(2,C),(8,D),(2,E)
remove(get(5))   (5,A)         (7,B),(2,C),(8,D),(2,E)
get(5)           null          (7,B),(2,C),(8,D),(2,E)
- 45 -

Subtleties of remove(e) Ø remove(e) will remove an entry that matches e (i.e., has the same (key, value) pair). Ø If the dictionary contains more than one entry with identical (key, value) pairs, remove(e) will only remove one. Ø Example:
Operation        Output   Dictionary
e1 = put(2,C)    (2,C)    (5,A),(7,B),(2,C)
e2 = put(8,D)    (8,D)    (5,A),(7,B),(2,C),(8,D)
e3 = put(2,E)    (2,E)    (5,A),(7,B),(2,C),(8,D),(2,E)
remove(get(5))   (5,A)    (7,B),(2,C),(8,D),(2,E)
remove(e3)       (2,E)    (7,B),(2,C),(8,D)
remove(e1)       (2,C)    (7,B),(8,D)
- 46 -

A List-Based Multi-Map (Dictionary) Ø A log file or audit trail is a dictionary implemented by means of an unsorted sequence q We store the items of the dictionary in a sequence (based on a doubly-linked list or array), in arbitrary order Ø Performance: q insert takes O(1) time since we can insert the new item at the beginning or at the end of the sequence q find and remove take O(n) time since in the worst case (the item is not found) we traverse the entire sequence to look for an item with the given key Ø The log file is effective only for dictionaries of small size or for dictionaries on which insertions are the most common operations, while searches and removals are rarely performed (e.g., historical record of logins to a workstation) - 47 -

Outline Ø Maps Ø Hashing Ø Multimaps Ø Ordered Maps - 48 -

Ordered Maps and Dictionaries Ø If keys obey a total order relation, we can represent a map or dictionary as an ordered search table stored in an array. Ø We can then support a fast find(k) using binary search. q at each step, the number of candidate items is halved q terminates after a logarithmic number of steps q Example: find(7) [Figure: four snapshots of the sorted array 1 3 4 5 7 8 9 11 14 16 18 19 (indices 0-11), with low (l), mid (m) and high (h) markers closing in on 7; in the last snapshot l = m = h] - 49 -
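The binary search described above can be sketched in Java over a sorted int array. BinarySearchFind is a hypothetical name. The variables low, mid and high correspond to the l, m and h markers.

```java
class BinarySearchFind {
    // Returns an index i with keys[i] == k, or -1 if k is absent.
    // keys must be sorted in non-decreasing order; each iteration halves [low, high].
    static int find(int[] keys, int k) {
        int low = 0, high = keys.length - 1;
        while (low <= high) {
            int mid = (low + high) >>> 1;   // unsigned shift avoids int overflow of low + high
            if (keys[mid] == k) return mid;
            if (k < keys[mid]) high = mid - 1;
            else low = mid + 1;
        }
        return -1;
    }
}
```

On the array 1 3 4 5 7 8 9 11 14 16 18 19, find(keys, 7) compares against 8, 4, 5 and finally 7, then returns index 4.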

Ordered Search Tables Ø Performance: q find takes O(log n) time, using binary search q insert takes O(n) time since in the worst case we have to shift n items to make room for the new item q remove takes O(n) time since in the worst case we have to shift n items to compact the items after the removal Ø A search table is effective only for dictionaries of small size or for dictionaries on which searches are the most common operations, while insertions and removals are rarely performed (e.g., credit card authorizations) - 50 -
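The O(n) cost of insert and remove comes from shifting array elements. Below is a small Java sketch of an ordered search table of int keys, assuming the array doubles when full. SortedSearchTable and lowerBound are hypothetical names. System.arraycopy performs the shifts, and find uses binary search.

```java
import java.util.Arrays;

class SortedSearchTable {
    private int[] keys = new int[4];
    private int n = 0;

    // Binary search for the first index whose key is >= k (n if there is none).
    private int lowerBound(int k) {
        int low = 0, high = n;                 // search in [low, high)
        while (low < high) {
            int mid = (low + high) >>> 1;
            if (keys[mid] < k) low = mid + 1;
            else high = mid;
        }
        return low;
    }

    // find: O(log n)
    boolean find(int k) {
        int i = lowerBound(k);
        return i < n && keys[i] == k;
    }

    // insert: O(n), shifts keys[i..n-1] one cell to the right to make room
    void insert(int k) {
        if (n == keys.length) keys = Arrays.copyOf(keys, 2 * n);
        int i = lowerBound(k);
        System.arraycopy(keys, i, keys, i + 1, n - i);
        keys[i] = k;
        n++;
    }

    // remove: O(n), shifts keys[i+1..n-1] one cell to the left to close the gap
    boolean remove(int k) {
        int i = lowerBound(k);
        if (i == n || keys[i] != k) return false;
        System.arraycopy(keys, i + 1, keys, i, n - i - 1);
        n--;
        return true;
    }

    int[] toArray() { return Arrays.copyOf(keys, n); }
}
```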

Outline Ø Maps Ø Hashing Ø Multimaps Ø Ordered Maps - 51 -

Learning Outcomes By understanding this lecture, you should be able to: Ø Outline the ADT for a map and a multimap Ø Identify applications for which maps, multimaps and ordered maps are appropriate Ø Design and implement a map in Java and verify that it satisfies the requirements of the map ADT Ø Explain the purpose of hash tables Ø Design and implement hashing methods using separate chaining or open addressing and a variety of hashing and double-hashing functions Ø Specify worst-case and average-case asymptotic run-times for standard operations on maps based upon hash tables Ø Specify (worst-case) asymptotic run-times for standard operations on sorted maps - 52 -