TAFTW (Take Aways for the Week) APT Quiz and Markov Overview. Comparing objects and tradeoffs. From Comparable to TreeMap/Sort

TAFTW (Take Aways for the Week)
- Graded work this week:
  - APT Quiz, details and overview
  - Markov assignment, details and overview
- Concepts: Empirical and Analytical Analysis
  - Algorithms and Data Structures
  - Benchmarking and empirical analyses
  - Terminology, mathematics, analytical analyses
- Java idioms: Interfaces: general and Comparable
- Software Engineering: Unit Testing and JUnit

Compsci 201, Fall 2016 9.1

APT Quiz and Markov Overview
- The APT Quiz is meant to demonstrate mastery of concepts. If you don't do this now, you'll have an opportunity to demonstrate mastery later
  - Self check on where you are, help us too
  - Validate your own work with APTs
  - It's ok to do a green dance, partial dance ok too!
- Markov Assignment
  - Basics of Java Objects, real/interesting scenario
  - Do not leave this until the last two days

Comparing objects and tradeoffs
- How are objects compared in Java?
  - When would you want to compare?
  - What can't be compared?
- Empirical and Analytical Analysis
  - Why are some lists different?
  - Why is adding in the middle fast?
  - Why is adding in the middle slow?
- How do you measure performance?

From Comparable to TreeMap/Sort
- When a class implements Comparable then
  - Instances are comparable to each other
- "apple" < "zebra", 6 > 2
- Sorting Strings, Sorting WordPairs, ...
- Method compareTo invoked when ...
  - Comparable<...> types the parameter to compareTo
  - Return < 0, == 0, > 0 according to results of comparison
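To make the step from Comparable to sorting and TreeMap concrete, here is a minimal sketch. The class WordPairSketch and its fields are hypothetical and are not the course's WordPair; equals and hashCode are omitted for brevity, so this only illustrates how compareTo drives ordering.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

// Hypothetical class for illustration only (an assumption, not course code).
class WordPairSketch implements Comparable<WordPairSketch> {
    final String first;
    final String second;

    WordPairSketch(String first, String second) {
        this.first = first;
        this.second = second;
    }

    // Negative, zero, or positive: compare first words, break ties on second.
    @Override
    public int compareTo(WordPairSketch other) {
        int diff = first.compareTo(other.first);
        return diff != 0 ? diff : second.compareTo(other.second);
    }

    @Override
    public String toString() {
        return first + "/" + second;
    }

    // Collections.sort relies on compareTo to order the elements.
    static List<WordPairSketch> sorted(List<WordPairSketch> pairs) {
        List<WordPairSketch> copy = new ArrayList<>(pairs);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        List<WordPairSketch> pairs = new ArrayList<>();
        pairs.add(new WordPairSketch("zebra", "apple"));
        pairs.add(new WordPairSketch("apple", "zebra"));
        pairs.add(new WordPairSketch("apple", "pie"));
        System.out.println(sorted(pairs));

        // A TreeMap keeps its keys in compareTo order.
        TreeMap<WordPairSketch, Integer> counts = new TreeMap<>();
        for (WordPairSketch p : pairs) {
            counts.put(p, counts.getOrDefault(p, 0) + 1);
        }
        System.out.println(counts.firstKey());
    }
}
```

Neither Collections.sort nor TreeMap needs to know anything about the class beyond the fact that it implements Comparable.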

Strings: simple Comparable
- Strings compare themselves lexicographically, aka dictionary order
  - "zebra" > "aardvark", but "Zebra" < "aardvark"
  - You can't use <, ==, > with Strings (< and > do not compile; == compares references, not contents)
- "zebra".compareTo(s) returns < 0 or == 0 or > 0
  - According to less than, equal to, greater than
  - Helper: "zebra".compareToIgnoreCase(s)
- implements Comparable<String> means?
  - Requires a method, what about correctness?

Comparable?

Liberté, Egalité, Comparable
- Can we compare points?
  - http://stackoverflow.com/questions/5178092/sorting-a-list-of-points-with-java
  - https://courses.cs.washington.edu/courses/cse331/11sp/lectures/slides/04a-compare.pdf
- Key take-away: Comparable should be consistent with equals
  - If a.equals(b) then a.compareTo(b) == 0
  - Converse is also true, i.e., if and only if

How do we compare points?
- Naïve approach? First compare x, then y?
- Let's look at .equals(..) first
  - Why is parameter an Object?
  - Everything is an Object!

    public boolean equals(Object o) {
        // null and non-Point arguments are never equal to a Point
        if (o == null || !(o instanceof Point)) {
            return false;
        }
        Point p = (Point) o;
        return p.x == x && p.y == y;
    }
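The String comparisons described earlier in this part can be checked directly. The sketch below is only an illustration; the class name StringCompareSketch and the helper sign are made up for this example.

```java
class StringCompareSketch {
    // Collapse compareTo's result to -1, 0, or 1 so only the sign is visible.
    static int sign(int value) {
        return Integer.signum(value);
    }

    public static void main(String[] args) {
        // Lowercase 'z' comes after lowercase 'a'.
        System.out.println(sign("zebra".compareTo("aardvark")));
        // Uppercase letters come before lowercase letters in Unicode order.
        System.out.println(sign("Zebra".compareTo("aardvark")));
        // Ignoring case, the two strings are equal.
        System.out.println(sign("Zebra".compareToIgnoreCase("zebra")));
    }
}
```

Only the sign of the value returned by compareTo is meaningful to callers; the magnitude should not be relied on.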

How do we compare points?
- Naïve approach? First compare x, then y?
- Let's look at .compareTo(..)
  - Why is parameter a Point?

    public int compareTo(Point p) {
        if (this.x < p.x) return -1;
        if (this.x > p.x) return 1;
        if (this.y < p.y) return -1;
        if (this.y > p.y) return 1;
        return 0;
    }

Useful math trick
- Use subtraction to help with return values
- http://stackoverflow.com/questions/2654839/rounding-a-double-to-turn-it-into-an-int-java

    public int compareTo(Point p) {
        // Caveat: differences that round to 0 compare as equal
        int deltaX = (int) Math.round(x - p.x);
        int deltaY = (int) Math.round(y - p.y);
        if (deltaX == 0) return deltaY;
        return deltaX;
    }

Comparable and Interfaces
- http://bit.ly/201fall16-sept28-1
- Some questions look at KWICModel.java, code we've previously examined in class. But now we are looking at interfaces

Empirical and Analytical Analysis
- We can run programs to look at "efficiency"
  - Depends on machine, environment, programs
- We can analyze mathematically to look at efficiency from a different point of view
  - Depends on being able to employ mathematics
- We will work on doing both, leading to a better understanding in many dimensions
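As one way to keep compareTo consistent with equals for points, the sketch below compares x first and then y using Double.compare, which avoids the rounding caveat of the subtraction trick. The class PointSketch and its double fields are assumptions made for illustration; this is not the course's Point class.

```java
import java.util.Objects;
import java.util.TreeSet;

// Hypothetical Point with double coordinates; the field types are an assumption.
class PointSketch implements Comparable<PointSketch> {
    final double x;
    final double y;

    PointSketch(double x, double y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof PointSketch)) return false; // also false when o is null
        PointSketch p = (PointSketch) o;
        return Double.compare(p.x, x) == 0 && Double.compare(p.y, y) == 0;
    }

    // Equal points must have equal hash codes.
    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }

    // Compare x first, then y; returns 0 exactly when equals returns true.
    @Override
    public int compareTo(PointSketch p) {
        int byX = Double.compare(x, p.x);
        return byX != 0 ? byX : Double.compare(y, p.y);
    }

    @Override
    public String toString() {
        return "(" + x + ", " + y + ")";
    }

    public static void main(String[] args) {
        TreeSet<PointSketch> points = new TreeSet<>();
        points.add(new PointSketch(1.0, 2.0));
        points.add(new PointSketch(1.0, 1.0));
        points.add(new PointSketch(0.0, 5.0));
        System.out.println(points); // iterates in compareTo order
    }
}
```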

What is a java.util.List in Java?
- Collection of elements, operations?
  - Add, remove, traverse, ...
  - What can a list do to itself?
  - What can we do to a list?
- Why more than one kind of list: Array and Linked?
  - Useful in different applications
  - How do we analyze differences?
  - How do we use them in code?

What's the Difference Here?
- How does find-a-track work? Fast forward?

Analyze Data Structures

    public double removeFirst(List<String> list) {
        double start = System.nanoTime();
        while (list.size() != 1) {
            list.remove(0);
        }
        double end = System.nanoTime();
        return (end - start) / 1e9;   // nanoseconds to seconds
    }

    List<String> linked = new LinkedList<String>();
    List<String> array = new ArrayList<String>();
    double ltime = splicer.removeFirst(splicer.create(linked, 100000));
    double atime = splicer.removeFirst(splicer.create(array, 100000));

- Time taken to remove the first element?
- https://git.cs.duke.edu/201fall16/building-arrays/blob/master/src/listsplicer.java

Remove First in 2011

    Size (x10^3)   link     array
    10             0.003    0.045
    20             0.001    0.173
    30             0.001    0.383
    40             0.002    0.680
    50             0.002    1.074
    60             0.002    1.530
    70             0.003    2.071
    80             0.003    2.704
    90             0.004    3.449
    100            0.007    4.220
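The remove-first experiment can be reproduced with a small self-contained harness along these lines. The class name RemoveFirstTiming and the helper create are assumptions standing in for the course's splicer code, and the numbers printed will vary by machine and JVM.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class RemoveFirstTiming {
    // Fill the given list with n placeholder strings and return it.
    static List<String> create(List<String> list, int n) {
        for (int k = 0; k < n; k++) {
            list.add("x" + k);
        }
        return list;
    }

    // Remove index 0 until one element remains; return elapsed seconds.
    static double removeFirst(List<String> list) {
        long start = System.nanoTime();
        while (list.size() > 1) {
            list.remove(0);
        }
        long end = System.nanoTime();
        return (end - start) / 1e9;
    }

    public static void main(String[] args) {
        for (int n = 10_000; n <= 40_000; n += 10_000) {
            double linked = removeFirst(create(new LinkedList<>(), n));
            double array = removeFirst(create(new ArrayList<>(), n));
            System.out.printf("%6d  link %.4f  array %.4f%n", n, linked, array);
        }
    }
}
```

The loop condition here is size() > 1 rather than != 1 so that an empty list terminates immediately; that is a small defensive choice, not something the slides specify.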

Remove First in 2016
- Why are timings good?
- Why are timings bad?

    Size (x10^3)   link      array
    10             0.0036    0.0102
    20             0.0016    0.0375
    30             0.0012    0.0756
    40             0.0003    0.2228
    50             0.0003    0.235
    60             0.0004    0.2945
    70             0.0005    0.3975
    80             0.0007    0.5456
    90             0.0006    0.7091
    100            0.0007    0.8827

Analytical Analysis
- Since LinkedList is roughly linear
  - Time to remove first element is constant, but must be done N times
  - Vocabulary: time for one removal is O(1), constant, and doesn't depend on N
  - Vocabulary: time for all removals is O(N), linear in N, but slope doesn't matter
- For ArrayList, removing first element entails
  - Shifting N-1 elements, so this is O(N)
- All: (N-1) + (N-2) + ... + 3 + 2 + 1 = O(N^2)
  - Sum is (N-1)N/2

Interfaces
- What is an interface? What does Google say?
  - Term overloaded even in English
  - What is a Java Interface?
- Abstraction that defines a contract/construct
  - Implementing requires certain methods exist
- For example, Comparable interface?
  - Programming to the interface is enabling
- What does Collections.sort actually sort?
- IDE helps by putting in stubs as needed
  - Let Eclipse be your friend

Why use Interfaces?
- Implementation can vary without modifying code
  - Code relies on interface, e.g., addFront or removeMiddle
  - Argument passed has a concrete type, but code uses the interface in compiling
- Actual method called determined at runtime!
- Similar to API, e.g., using the Twitter API
  - Calls return JSON, the format is specified, different languages used to interpret JSON
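A short sketch of programming to the List interface follows. The method addFront echoes the name mentioned above, but its body and the class InterfaceSketch are assumptions for illustration. The same compiled method works with either concrete list type, and which add implementation runs is decided at runtime.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class InterfaceSketch {
    // Written against the List interface only; nothing here names
    // ArrayList or LinkedList.
    static void addFront(List<Integer> list, int n) {
        for (int k = 0; k < n; k++) {
            list.add(0, k); // dispatches to the concrete class's add at runtime
        }
    }

    public static void main(String[] args) {
        List<Integer> array = new ArrayList<>();
        List<Integer> linked = new LinkedList<>();
        addFront(array, 3);
        addFront(linked, 3);
        System.out.println(array);
        System.out.println(linked);
    }
}
```

Both lists end up with the same contents; only the cost of each add(0, k) call differs between the two implementations.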

Markov Interlude: JUnit and Interfaces
- How do we design/code/test EfficientMarkov?
  - Note: it implements an Interface!
  - Note: MarkovTest can be used to test it!
- How do we design/code/test WordGram?
  - Can we use WordGram tester when first cloned?
  - Where is implementation of WordGram?
  - How do you make your own?

JUnit tests
- To run these you must access the JUnit library, a jar file
  - Eclipse knows where this is, but
  - Must add to build-path aka class-path, Eclipse will do this for you if you let it
- Getting all green is the goal, but red is good
  - You have to have code that doesn't pass before you can pass
  - Similar to APTs, widely used in practice
- Testing is extremely important in engineering!
  - See also QA: quality assurance

JUnit Interlude
- Looking at PointExperiment classes:
  - https://git.cs.duke.edu/201fall16/pointexperiment/tree/master/src
- Create JUnit tests for some methods, see live run through and summary
  - http://bit.ly/201-junit
- JUnit great for per-method testing in isolation from other methods

Remove Middle Index

    public double removeMiddleIndex(List<String> list) {
        double start = System.nanoTime();
        while (list.size() != 1) {
            list.remove(list.size() / 2);
        }
        double end = System.nanoTime();
        return (end - start) / 1e9;   // nanoseconds to seconds
    }

- What operations could be expensive here?
  - Explicit: size, remove (only one is expensive)
  - Implicit: find nth element
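To expose the implicit "find nth element" cost in the remove-middle loop, the sketch below uses a hand-rolled singly linked list that counts every node-to-node step. HopCountingList is a teaching toy invented for this example; java.util.LinkedList is doubly linked and implemented differently, though reaching the middle still requires walking through nodes.

```java
class HopCountingList {
    private static final class Node {
        final int value;
        Node next;

        Node(int value, Node next) {
            this.value = value;
            this.next = next;
        }
    }

    private Node head;
    private int size;
    long hops = 0; // total node-to-node steps taken while locating elements

    // Build a list holding 0, 1, ..., n-1.
    HopCountingList(int n) {
        for (int k = n - 1; k >= 0; k--) {
            head = new Node(k, head);
        }
        size = n;
    }

    int size() {
        return size;
    }

    // Unlinking is O(1); walking to the predecessor is the hidden O(n) part.
    void remove(int index) {
        if (index < 0 || index >= size) throw new IndexOutOfBoundsException();
        if (index == 0) {
            head = head.next;
        } else {
            Node prev = head;
            for (int k = 0; k < index - 1; k++) {
                prev = prev.next;
                hops++;
            }
            prev.next = prev.next.next;
        }
        size--;
    }

    // Repeat the remove-middle loop and report the total number of hops.
    static long removeMiddleHops(int n) {
        HopCountingList list = new HopCountingList(n);
        while (list.size() > 1) {
            list.remove(list.size() / 2);
        }
        return list.hops;
    }

    public static void main(String[] args) {
        for (int n = 1000; n <= 8000; n *= 2) {
            System.out.println(n + " elements: " + removeMiddleHops(n) + " hops");
        }
    }
}
```

Doubling n roughly quadruples the hop count, which matches the quadratic growth visible in the linked-list timings for remove middle.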

Remove Middle 2011

    size    link     array
    10      0.105    0.023
    20      0.472    0.09
    30      0.984    0.192
    40      1.83     0.343
    50      3.026    0.534
    60      4.288    0.767
    70      6.078    1.039
    80      7.885    1.363

Remove Middle 2016

    size    link      array
    10      0.0635    0.0057
    20      0.2644    0.0131
    30      0.4808    0.0345
    40      0.8524    0.0531
    50      1.4025    0.0844
    60      1.8418    0.1245
    70      2.9064    0.1777
    80      3.7237    0.2224
    90      4.6833    0.3102
    100     7.8717    0.3824

ArrayList and LinkedList as ADTs
- As an ADT (abstract data type) ArrayList supports
  - Constant-time or O(1) access to the k-th element
  - Amortized linear or O(n) storage/time with add
- Total storage used in n-element vector is approx. 2n, spread over all accesses/additions (why?)
  - Add/remove in middle is "expensive" O(n), why?
- What's underneath here? How Implemented?
  - Concrete: array, contiguous memory, must be contiguous to support random access
  - Element 20 = beginning + 20 x size of a pointer

ArrayList and LinkedList as ADTs
- LinkedList as ADT
  - Constant-time or O(1) insertion/deletion anywhere, but
  - Linear or O(n) time to find where, sequential search
- Linked good for add/remove at front
  - Splicing into middle, also for 'sparse' structures
- What's underneath? How Implemented?
  - Low-level linked lists, self-referential structures
  - More memory intensive than array: two pointers
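The "approx. 2n, amortized" claim for array-backed lists can be illustrated with a tiny growable array that doubles its capacity when full and counts how many elements it copies. GrowableArraySketch and the doubling policy are assumptions chosen to match the 2n figure; the actual growth policy of java.util.ArrayList is an implementation detail and may differ.

```java
class GrowableArraySketch {
    private int[] data = new int[1];
    private int size = 0;
    long copies = 0; // elements copied across all resizes

    // Append a value, doubling the backing array when it is full.
    void add(int value) {
        if (size == data.length) {
            int[] bigger = new int[data.length * 2];
            for (int k = 0; k < size; k++) {
                bigger[k] = data[k];
                copies++;
            }
            data = bigger;
        }
        data[size++] = value;
    }

    // O(1) access: index arithmetic on contiguous storage.
    int get(int k) {
        if (k < 0 || k >= size) throw new IndexOutOfBoundsException();
        return data[k];
    }

    int capacity() {
        return data.length;
    }

    // Build an array holding 0, 1, ..., n-1.
    static GrowableArraySketch filled(int n) {
        GrowableArraySketch a = new GrowableArraySketch();
        for (int k = 0; k < n; k++) {
            a.add(k);
        }
        return a;
    }

    public static void main(String[] args) {
        GrowableArraySketch a = filled(1000);
        System.out.println("capacity " + a.capacity() + ", copies " + a.copies);
    }
}
```

With doubling, the resize copies form the series 1 + 2 + 4 + ..., so n adds copy fewer than 2n elements in total, and the capacity stays below 2n for n of at least 2. That is why a single add is O(1) on average even though an individual resize is O(n).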

Inheritance and Interfaces
- Interfaces provide method names and parameters
  - The method signature we can expect and use!
  - What can we do to an ArrayList? To a LinkedList?
  - What can we do to a Map or Set or PriorityQueue?
  - java.util.Collection is an interface
- New in Java 8: Interfaces can have code!

Nancy Leveson: Software Safety
- Founded the field
- Mathematical and engineering aspects
  - Air traffic control
  - Microsoft Word
- "C++ is not state-of-the-art, it's only state-of-the-practice, which in recent years has been going backwards"
- Software and steam engines once deadly dangerous? http://sunnyday.mit.edu/steam.pdf
- Therac-25: radiation therapy machine whose software errors caused fatal overdoses. http://sunnyday.mit.edu/papers/therac.pdf

Big-Oh, O-notation: concepts & caveats
- Count how many times simple statements execute
  - In the body of a loop, what matters? (e.g., another loop?)
  - Assume statements take a second, cost a penny?
- What's good, what's bad about this assumption?
- If a loop is inside a loop:
  - Tricky because the inner loop can depend on the outer, use math and reasoning
- In real life: cache behavior, memory behavior, swapping behavior, library gotchas, things we don't understand, ...

More on O-notation, big-oh
- Big-Oh hides/obscures some empirical analysis, but is good for general description of algorithm
  - Allows us to compare algorithms in the limit
  - 20N hours vs. N^2 microseconds: which is better?
- O-notation is an upper bound. This means that N is O(N), but it is also O(N^2); we try to provide tight bounds.
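The "20N hours vs. N^2 microseconds" question can be worked out numerically. The sketch below converts both costs to microseconds and finds where the quadratic algorithm starts to lose; the class and method names are invented for this illustration.

```java
class CrossoverSketch {
    // 1 hour = 3600 seconds = 3.6e9 microseconds.
    static final long MICROS_PER_HOUR = 3_600_000_000L;

    // Cost of the "20N hours" algorithm, in microseconds.
    static double linearMicros(double n) {
        return 20.0 * n * MICROS_PER_HOUR;
    }

    // Cost of the "N^2 microseconds" algorithm, in microseconds.
    static double quadraticMicros(double n) {
        return n * n;
    }

    // Setting N^2 = 20 * N * MICROS_PER_HOUR gives N = 20 * MICROS_PER_HOUR.
    static long crossover() {
        return 20L * MICROS_PER_HOUR;
    }

    public static void main(String[] args) {
        System.out.println("Quadratic is faster for N below " + crossover());
    }
}
```

The crossover sits at N = 7.2 x 10^10, so for any input smaller than that the quadratic algorithm is the better choice in practice, even though the linear one wins in the limit.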

More on O-notation, big-oh
- Formally:
  - A function g(N) is O(f(N)) if there exist constants c and n such that g(N) < c f(N) for all N > n

[Figure: curves c f(N) and g(N); g(N) stays below c f(N) to the right of x = n]

Notations for measuring complexity
- O-notation/big-Oh: O(n^2) is used in algorithmic analysis, e.g., Compsci 330 at Duke. Upper bound in the limit
  - Correct to say that linear algorithm is O(n^2), but useful?
- Omega is lower bound: Ω(n log n) is a lower bound for comparison-based sorts
  - Can't do better than that, a little hard to prove
  - We can still engineer good sorts: TimSort!

Simple examples of array/loops: O?

    for (int k = 0; k < list.length; k += 1) {
        list[k] += 1;   // list.set(k, list.get(k) + 1);
    }
    //-----
    for (int k = 0; k < list.length; k += 1)
        for (int j = k + 1; j < list.length; j += 1)
            if (list[j].equals(list[k]))
                matches += 1;
    //---
    for (int k = 0; k < list.length; k += 1)
        for (int j = k + 1; j < list.length; j *= 2)
            value += 1;

Multiplying and adding big-oh
- Suppose we do a linear search then do another one
  - What is the complexity? O(n) + O(n)
  - If we do 100 linear searches? 100*O(n)
  - If we do n searches on an array of size n? n * O(n)
- Binary search followed by linear search?
  - What are big-oh complexities? Sum?
  - What about 50 binary searches? What about n searches?
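One way to explore the three loop examples above is to count how often each innermost statement runs. The sketch below mirrors the loop headers from the slide with n standing in for list.length; the class and method names are made up for this example.

```java
class LoopCountSketch {
    // Single loop: the body runs n times.
    static long single(int n) {
        long count = 0;
        for (int k = 0; k < n; k += 1) {
            count++;
        }
        return count;
    }

    // Inner loop starts at k + 1: the body runs n(n-1)/2 times in total.
    static long nested(int n) {
        long count = 0;
        for (int k = 0; k < n; k += 1) {
            for (int j = k + 1; j < n; j += 1) {
                count++;
            }
        }
        return count;
    }

    // Inner index doubles each step, so each inner loop runs
    // at most about log2(n) + 1 times.
    static long doubling(int n) {
        long count = 0;
        for (int k = 0; k < n; k += 1) {
            for (int j = k + 1; j < n; j *= 2) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        for (int n = 1000; n <= 4000; n *= 2) {
            System.out.println(n + ": " + single(n) + " " + nested(n) + " " + doubling(n));
        }
    }
}
```

The counts are consistent with O(n) for the first loop, O(n^2) for the second, and an O(n log n) upper bound for the third, since each inner doubling loop takes at most a logarithmic number of steps.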

What is big-oh about?
- Intuition: avoid details when they don't matter, and they don't matter when input size (N) is big enough
  - Use only leading term, ignore coefficients

    y = 3x        y = 6x - 2          y = 15x + 44
    y = x^2       y = x^2 - 6x + 9    y = 3x^2 + 4x

- The first family is O(n), the second is O(n^2)
  - Intuition: family of curves, generally the same shape
  - Intuition: linear function: double input, double time; quadratic function: double input, quadruple the time

Some helpful mathematics
- 1 + 2 + 3 + 4 + ... + N
  - N(N+1)/2, exactly = N^2/2 + N/2, which is O(N^2). Why?
- N + N + N + ... + N (total of N times)
  - N*N = N^2, which is O(N^2)
- N + N + N + ... + N + ... + N + ... + N (total of 3N times)
  - 3N*N = 3N^2, which is O(N^2)
- 1 + 2 + 4 + ... + 2^N
  - 2^(N+1) - 1 = 2 x 2^N - 1, which is O(2^N)
  - In terms of the last term, call it X, this is O(X)
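The closed forms for the arithmetic and doubling sums above can be checked by brute force. This is only a sanity-check sketch with invented names; sumPowersOfTwo assumes n is at most 60 so that the values fit in a long.

```java
class SumSketch {
    // 1 + 2 + ... + n, computed term by term.
    static long sumTo(int n) {
        long total = 0;
        for (int k = 1; k <= n; k++) {
            total += k;
        }
        return total;
    }

    // 1 + 2 + 4 + ... + 2^n, computed term by term (assumes n <= 60).
    static long sumPowersOfTwo(int n) {
        long total = 0;
        long term = 1;
        for (int k = 0; k <= n; k++) {
            total += term;
            term *= 2;
        }
        return total;
    }

    public static void main(String[] args) {
        int n = 100;
        // Compare against the closed form N(N+1)/2.
        System.out.println(sumTo(n) == (long) n * (n + 1) / 2);
        // Compare against the closed form 2^(N+1) - 1.
        System.out.println(sumPowersOfTwo(10) == (1L << 11) - 1);
    }
}
```

The second identity explains the "O(X) in terms of the last term" remark: the whole sum is just under twice the final term 2^N.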