Learning Systems. Research at the Intersection of Machine Learning & Data Systems. Joseph E. Gonzalez

Learning Systems Research at the Intersection of Machine Learning & Data Systems Joseph E. Gonzalez Asst. Professor, UC Berkeley jegonzal@cs.berkeley.edu

How can machine learning techniques be used to address systems challenges? Learning Systems How can systems techniques be used to address machine learning challenges?

How can machine learning techniques be used to address systems challenges? Systems are getting increasingly complex: Ø Resource Disaggregation → growing diversity of system configurations and freedom to add resources as needed Ø New Pricing Models → dynamic pricing and the potential to bid for different types of resources Ø Data-centric Workloads → performance depends on the interaction between system, algorithms, and data

Paris Performance Aware Runtime Inference System Neeraja Yadwadkar Bharath Hariharan Randy Katz Ø What vm-type should I use to run my experiment? [figure: cloud of EC2 instance types, from t2.nano to x1.32xlarge]

Paris Performance Aware Runtime Inference System Neeraja Yadwadkar Bharath Hariharan Randy Katz Ø What vm-type should I use to run my experiment? [figure: the same cloud, labeled 54 Instance Types]

Paris Performance Aware Runtime Inference System Neeraja Yadwadkar Bharath Hariharan Randy Katz Ø What vm-type should I use to run my experiment? [figure: the 54 instance types, grouped 54 / 25 / 18] Ø Answer: workload specific and depends on cost & runtime goals

Paris Performance Aware Runtime Inference System Neeraja Yadwadkar Bharath Hariharan Randy Katz Ø Best vm-type depends on workload as well as cost & runtime goals [figure: price and runtime per VM type] Which VM will cost me the least? Is m1.small cheapest?

Paris Performance Aware Runtime Inference System Neeraja Yadwadkar Bharath Hariharan Randy Katz Ø Best vm-type depends on workload as well as cost & runtime goals [figure: price, runtime, and job cost per VM type] Requires accurate runtime prediction.

Paris Performance Aware Runtime Inference System Neeraja Yadwadkar Bharath Hariharan Randy Katz Ø Goal: Predict the runtime of workload w on VM type v Ø Challenge: How do we model workloads and VM types? Ø Insight: Ø Extensive benchmarking to model relationships between VM types Ø Costly, but run once for all workloads Ø Lightweight workload fingerprinting on a small set of test VMs Ø Generalize workload performance to other VMs Ø Results: Runtime prediction 17% Relative RMSE (56% Baseline) [figure: benchmarking across vm1, vm2, ... vm100 combined with workload fingerprinting]
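To make the insight concrete, here is a minimal sketch of the prediction step: combine a workload's lightweight fingerprint (measurements on a few test VMs) with the target VM type's offline benchmark profile and regress the runtime. The data, feature layout, and use of a random forest are assumptions for illustration, not the PARIS implementation.

import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Offline, run once for all workloads: a benchmark profile per VM type (synthetic stand-in here).
vm_types = ["m4.large", "c4.xlarge", "r3.2xlarge"]
benchmark_features = {v: rng.random(8) for v in vm_types}

# Training history: (workload fingerprint on a few test VMs, target VM type, observed runtime).
history = [(rng.random(5), rng.choice(vm_types), rng.random() * 100) for _ in range(200)]

def make_features(fingerprint, vm_type):
    # Concatenate the lightweight workload fingerprint with the target VM's benchmark profile.
    return np.concatenate([fingerprint, benchmark_features[vm_type]])

X = np.array([make_features(f, v) for f, v, _ in history])
y = np.array([r for _, _, r in history])
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

def predict_runtime(fingerprint, vm_type):
    # Predict the runtime of a new workload on a VM type it was never run on.
    return model.predict(make_features(fingerprint, vm_type).reshape(1, -1))[0]

print(predict_runtime(rng.random(5), "c4.xlarge"))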

Hemingway* Modeling Throughput and Convergence for ML Workloads Shivaram Venkataraman Xinghao Pan Zi Zheng Ø What is the best algorithm and level of parallelism for an ML task? Ø Trade-off: Parallelism, Coordination, & Convergence Ø Research challenge: Can we model this trade-off explicitly? [figures: systems metric I(p) = iterations per second as a function of cores p; ML metric L(i, p) = loss as a function of iterations i and cores p] We can estimate I from data on many systems; we can estimate L from data for our problem. *follow-up work to Shivaram's Ernest paper

Hemingway* Modeling Throughput and Convergence for ML Workloads Shivaram Venkataraman Xinghao Pan Zi Zheng Ø What is the best algorithm and level of parallelism for an ML task? Ø Trade-off: Parallelism, Coordination, & Convergence Ø Research challenge: Can we model this trade-off explicitly? With L(i, p) the loss as a function of iterations i and cores p, and I(p) the iterations per second as a function of cores p, the loss reached after wall-clock time t is loss(t, p) = L(t · I(p), p). How long does it take to get to a given loss? Given a time budget and number of cores, which algorithm will give the best result? *follow-up work to Shivaram's Ernest paper
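As a toy illustration of using the two fitted models together, one can sweep the core count p and pick the configuration that minimizes the predicted loss within a time budget. The functional forms of I and L below are stand-in assumptions, not Hemingway's fitted models.

import numpy as np

# Assumed fitted models: a throughput curve with diminishing returns, and a loss curve
# that needs more iterations to converge at higher parallelism (coordination cost).
def I(p):            # iterations per second as a function of cores p
    return 10.0 * p / (1.0 + 0.05 * p)

def L(i, p):         # loss as a function of iterations i and cores p
    return 1.0 / (1.0 + i / (50.0 * np.sqrt(p)))

def loss_at_time(t, p):
    return L(t * I(p), p)          # loss(t, p) = L(t * I(p), p)

budget_seconds = 60.0
best_p = min(range(1, 65), key=lambda p: loss_at_time(budget_seconds, p))
print(best_p, loss_at_time(budget_seconds, best_p))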

Deep Code Completion Neural architectures for reasoning about programs Xin Wang Chang Liu Dawn Song Ø Goals: Ø Smart naming of variables and routines Ø Learn coding styles and patterns Ø Predict large code fragments Ø Char and Symbol LSTMs Ø Programs are more tree shaped Example: def fib(x): if x < 2: return x else: y = fib(x - 1) + fib(x - 2); return y

Deep Code Completion Neural architectures for reasoning about programs Xin Wang Chang Liu Dawn Song Ø Goals: Ø Smart naming of variables and routines Ø Learn coding styles and patterns Ø Predict large code fragments Ø Char and Symbol LSTMs Ø Programs are more tree shaped [figure: parse tree of the fib example]

Deep Code Completion Neural architectures for reasoning about programs Xin Wang Chang Liu Dawn Song Ø Goals: Ø Smart naming of variables and routines Ø Learn coding styles and patterns Ø Predict large code fragments Ø Char and Symbol LSTMs Ø Programs are more tree shaped [figure: parse tree of the fib example] Ø Exploring Tree LSTMs Ø Issue: dependencies flow in both directions Kai Sheng Tai, Richard Socher, Christopher D. Manning. Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks. (ACL 2015)

Deep Code Completion Neural architectures for reasoning about computer programs Xin Wang Chang Liu Dawn Song Ø Goals: Ø Smart naming of variables and routines Ø Learn coding styles and patterns Ø Predict large code fragments Ø Currently studying Char-LSTM and Tree-LSTM on benchmark C++ and JavaScript code Ø Plan to extend Tree-LSTM with downward information flow [figure: vanilla LSTM vs. Tree-LSTM]
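For concreteness, here is a minimal character-level LSTM language model for next-character code prediction, written in PyTorch. The architecture, hyperparameters, and training corpus are assumptions for illustration; the slides do not specify the actual Char-LSTM setup.

import torch
import torch.nn as nn

class CharLSTM(nn.Module):
    # Predicts the next character of a code snippet from the previous characters.
    def __init__(self, vocab_size, embed_dim=64, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, x, state=None):
        h, state = self.lstm(self.embed(x), state)
        return self.out(h), state

# Toy corpus; a real system would train on a large code corpus.
corpus = "def fib(x):\n    if x < 2:\n        return x\n    return fib(x - 1) + fib(x - 2)\n"
chars = sorted(set(corpus))
idx = {c: i for i, c in enumerate(chars)}
data = torch.tensor([idx[c] for c in corpus]).unsqueeze(0)   # shape (1, T)

model = CharLSTM(len(chars))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(200):
    logits, _ = model(data[:, :-1])                           # predict char t+1 from chars <= t
    loss = loss_fn(logits.reshape(-1, len(chars)), data[:, 1:].reshape(-1))
    opt.zero_grad(); loss.backward(); opt.step()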

Fun Code Sample Generated by Char-LSTM [figure: code prefix and the generated code sample] For now, the neural network can learn some code patterns like matching parentheses, if-else blocks, etc., but the variable-name issue still hasn't been solved. *This is trained on LeetCode OJ code submissions from GitHub.

How can machine learning techniques be used to address systems challenges? Learning Systems How can systems techniques be used to address machine learning challenges?

Systems for Machine Learning Big Data Training Big Model Timescale: minutes to days Systems: offline and batch optimized Heavily studied... the primary focus of ML research

Big Data Training Big Model Splash CoCoA Please make a Logo!

Big Data Training Big Model Temgine Splash CoCoA Please make a Logo!

Temgine A Scalable Multivariate Time Series Analysis Engine Francois Billetti Evan Sparks Xin Wang Challenge: Ø Estimate second-order statistics Ø E.g. auto-correlation, auto-regressive models, ... Ø for high-dimensional & irregularly sampled time series [figure: regularly sampled series from several sensors are easy to align (requires sorting); irregularly sampled series are difficult to align]

Temgine A Scalable Multivariate Time Series Analysis Engine Francois Billetti Evan Sparks Xin Wang Challenge: Ø Estimate second-order statistics Ø E.g. auto-correlation, auto-regressive models, ... Ø for high-dimensional & irregularly sampled time series [figure: irregularly sampled series are difficult to align] Solution: Ø Project onto a Fourier basis (does not require data alignment) Ø Infer statistics in the frequency domain (equivalent to kernel smoothing; analysis of bias-variance tradeoff)

Temgine A Scalable Multivariate Time Series Analysis Engine Francois Billetti Evan Sparks Xin Wang Challenge: Ø Estimate second-order statistics Ø E.g. auto-correlation, auto-regressive models, ... Ø for high-dimensional & irregularly sampled time series Solution: Ø Project onto a Fourier basis (does not require data alignment) Ø Infer statistics in the frequency domain (equivalent to kernel smoothing; analysis of bias-variance tradeoff) Temgine: define an operator DAG (like TensorFlow) and then rely on query optimization to produce an efficient execution.
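To make the "project onto a Fourier basis" idea concrete, here is a rough toy sketch (a crude periodogram-style estimator under assumed details, not Temgine's actual estimator): the irregular samples are projected onto a frequency grid with a non-uniform Fourier sum, and an autocorrelation estimate is recovered from the resulting spectrum without ever aligning the raw samples.

import numpy as np

rng = np.random.default_rng(0)

# Irregularly sampled signal: random sample times, a 0.5 Hz sinusoid plus noise.
t = np.sort(rng.uniform(0.0, 100.0, size=500))
x = np.sin(2 * np.pi * 0.5 * t) + 0.3 * rng.standard_normal(t.size)
x = x - x.mean()

# Project onto a Fourier basis: non-uniform Fourier sums evaluated on a frequency grid.
freqs = np.linspace(0.01, 2.0, 400)
coeffs = np.array([np.mean(x * np.exp(-2j * np.pi * f * t)) for f in freqs])
power = np.abs(coeffs) ** 2          # crude spectral estimate in the frequency domain

# Recover an autocorrelation estimate (a second-order statistic) from the spectrum.
lags = np.linspace(0.0, 10.0, 100)
acov = np.array([np.sum(power * np.cos(2 * np.pi * freqs * tau)) for tau in lags])
acorr = acov / acov[0]
print(acorr[:5])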

Learning Big Data Training Big Model

Learning Inference Query Big Data Training Big Model Decision? Application

Inference Learning Query Big Data Training Big Model Decision Application Timescale: ~10 milliseconds Systems: online and latency optimized Less Studied

Why is Inference challenging? Need to render low-latency (< 10ms) predictions for complex models, queries, and features (e.g. Top K; SELECT * FROM users JOIN items, click_logs, pages WHERE ) under heavy load and with system failures.

Inference Learning Big Data Training Big Model Query Decision Application Claim: next big area of research in scalable ML systems Timescale: ~10 milliseconds Systems: online and latency optimized Less studied

Learning Inference Query Big Data Training Feedback Big Model Decision Application

Learning Inference Training Decision Big Data Feedback Application Timescale: hours to weeks Issues: no standard solutions; implicit feedback, sample bias, ...

Why is Feedback challenging? Ø Exposes the system to feedback loops Ø Address the Explore/Exploit trade-off in real time Ø Adversarial feedback Ø Opportunities for multi-task learning and anomaly detection Ø Need to address temporal variation Ø Need to model time directly? When do we forget the past?
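As one standard way to handle the explore/exploit trade-off named above (a generic bandit-style policy, not necessarily the approach used in this work), an epsilon-greedy selector over candidate models can be sketched as follows.

import random

class EpsilonGreedySelector:
    # Chooses among candidate models: explore with probability epsilon,
    # otherwise exploit the model with the best observed feedback so far.
    def __init__(self, models, epsilon=0.1):
        self.models = models
        self.epsilon = epsilon
        self.reward_sums = [0.0] * len(models)
        self.counts = [0] * len(models)

    def select(self):
        if random.random() < self.epsilon or min(self.counts) == 0:
            return random.randrange(len(self.models))                        # explore
        means = [s / c for s, c in zip(self.reward_sums, self.counts)]
        return max(range(len(self.models)), key=lambda i: means[i])          # exploit

    def update(self, i, reward):
        # Called when (possibly delayed) feedback for the prediction made by model i arrives.
        self.reward_sums[i] += reward
        self.counts[i] += 1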

Learning Inference Query Big Data Training Feedback Big Model Decision Application

Learning Inference Query Big Data Training Adaptive (~1 second) Feedback Big Model Responsive (~10ms) Decision Application

Learning Adaptive (~1 second) Inference Responsive (~10ms) Techniques we are studying (or should be): Multi-task Learning Adaptive Batching Online Ensemble Learning Approx. Caching Load Shedding Anytime Inference Model Compression Model Switching Meta-Policy RL Inference on the Edge

Prediction Serving Daniel Crankshaw Xin Wang Giulio Zhou Michael Franklin Ion Stoica

Learning Inference Big Data Training Query Decision Feedback Application

Learning Inference Big Data Training Slow Changing Parameters Fast Changing Parameters Query Decision Feedback Slow Application

Hybrid Offline + Online Learning Model form: f(x; θ)^T w_u Update the feature functions f(x; θ) offline using batch solvers: Leverage high-throughput systems (TensorFlow) Exploit slow change in population statistics Update the user weights w_u online: Simple to train + more robust model Addresses rapidly changing user statistics
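A minimal sketch of the split (the feature function, update rule, and learning rate are assumptions for illustration, not Clipper's exact implementation): f(x; θ) comes from a fixed, offline-trained model, while the per-user weights w_u are refined online with cheap stochastic gradient steps as feedback arrives.

import numpy as np

class HybridUserModel:
    # Offline part: a frozen feature function f(x; theta) (here a stand-in random projection).
    # Online part: per-user linear weights w_u, updated on each observed (x, y) feedback pair.
    def __init__(self, input_dim, feature_dim, lr=0.1):
        rng = np.random.default_rng(0)
        self.theta = rng.standard_normal((input_dim, feature_dim))   # fixed after offline training
        self.w = {}                                                  # user id -> weight vector
        self.lr = lr

    def features(self, x):
        return np.tanh(x @ self.theta)

    def predict(self, user, x):
        w_u = self.w.setdefault(user, np.zeros(self.theta.shape[1]))
        return float(self.features(x) @ w_u)                        # f(x; theta)^T w_u

    def online_update(self, user, x, y):
        # One SGD step on squared error; cheap enough to run per feedback event.
        f = self.features(x)
        w_u = self.w.setdefault(user, np.zeros(f.shape[0]))
        err = f @ w_u - y
        self.w[user] = w_u - self.lr * err * f

model = HybridUserModel(input_dim=10, feature_dim=5)
x = np.ones(10)
model.online_update("alice", x, y=1.0)
print(model.predict("alice", x))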

Common modeling structure: f(x; θ)^T w_u [figure: the same form covers matrix factorization (users × items), deep learning, and ensemble methods over the input]

Clipper Online Learning for Recommendations (Simulated News Rec.) [figure: error vs. number of examples] Partial Updates: 0.4 ms Retraining: 7.1 seconds >4 orders-of-magnitude faster adaptation

Learning Inference Big Data Slow Changing Parameters Fast Changing Parameters Feedback Slow Application

Learning Big Data Slow Changing Parameters Clipper Fast Changing Parameters Inference Feedback Caffe Slow Application

Clipper Serves Predictions across ML Frameworks Fraud Detection Content Rec. Personal Asst. Robotic Control Machine Translation Clipper Create VW Caffe

Clipper Architecture Applications Predict RPC/REST Interface Observe Clipper Create Caffe VW

Clipper Architecture Applications Predict RPC/REST Interface Observe Clipper RPC RPC RPC RPC Model Wrapper (MW) MW MW MW Caffe

Clipper Architecture Applications Predict RPC/REST Interface Clipper Observe Improve accuracy through ensembles, online learning and personalization Provide a common interface to models while bounding latency and maximizing throughput. Model Selection Layer Model Abstraction Layer RPC RPC RPC RPC Model Wrapper (MW) MW MW MW

Clipper Architecture Applications Predict RPC/REST Interface Clipper Observe Anytime Predictions Approximate Caching Adaptive Batching Model Selection Layer Model Abstraction Layer RPC RPC RPC RPC Model Wrapper (MW) MW MW MW

Adaptive Batching to Improve Throughput Ø Why batching helps: A single page load may generate many queries Batching amortizes system overhead and exploits hardware acceleration Ø Optimal batch size depends on: Ø hardware configuration Ø model and framework Ø system load Clipper Solution: be as slow as allowed Ø Application specifies a latency objective Ø Clipper uses a TCP-like tuning algorithm to increase latency up to the objective

[figure: TensorFlow Conv. Net (GPU): latency (ms) and throughput (queries per second) vs. batch size (queries), with the latency deadline and optimal batch size marked]
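A minimal sketch of such a TCP-like (additive-increase, multiplicative-decrease) batch-size tuner; the constants and interface are assumptions for illustration, not Clipper's actual implementation.

class AIMDBatchTuner:
    # Grows the batch size additively while observed batch latency stays under the
    # application's latency objective, and backs off multiplicatively when it does not.
    def __init__(self, latency_objective_ms, increase=1, backoff=0.9):
        self.latency_objective_ms = latency_objective_ms
        self.increase = increase
        self.backoff = backoff
        self.batch_size = 1

    def record(self, observed_latency_ms):
        if observed_latency_ms <= self.latency_objective_ms:
            self.batch_size += self.increase                                   # additive increase
        else:
            self.batch_size = max(1, int(self.batch_size * self.backoff))      # multiplicative decrease
        return self.batch_size

tuner = AIMDBatchTuner(latency_objective_ms=20.0)
for latency in [5.0, 8.0, 12.0, 25.0, 18.0]:
    print(tuner.record(latency))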

Approximate Caching to Reduce Latency Ø Opportunity for caching: popular items may be evaluated frequently Ø Need for approximation: high-dimensional and continuous-valued queries (e.g. bag-of-words models, images) have a low exact cache hit rate Clipper Solution: Approximate Caching, apply locality-sensitive hash functions [figure: cache hit vs. cache miss vs. cache-hit error]
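An illustrative sketch of an approximate prediction cache using random-hyperplane locality-sensitive hashing; the hash family, bucket policy, and error handling are assumptions, not Clipper's exact design.

import numpy as np

class ApproximateCache:
    # Buckets queries by the signs of random-hyperplane projections (an LSH family for
    # cosine similarity), so nearby continuous-valued queries can share a cache entry.
    def __init__(self, dim, n_planes=16, seed=0):
        rng = np.random.default_rng(seed)
        self.planes = rng.standard_normal((n_planes, dim))
        self.cache = {}

    def _key(self, x):
        return tuple((self.planes @ x > 0).astype(int))

    def get(self, x):
        return self.cache.get(self._key(x))      # None on an (approximate) miss

    def put(self, x, prediction):
        self.cache[self._key(x)] = prediction

cache = ApproximateCache(dim=100)
q = np.random.default_rng(1).standard_normal(100)
cache.put(q, prediction=0.9)
print(cache.get(q + 0.001))   # a slightly perturbed query will likely land in the same bucket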

Anytime Predictions [figure: Clipper combines a slow-changing model (served by Caffe) with a fast-changing linear model under a 20ms budget] Solution: Replace a missing prediction with an estimator of its expected value, E[f(x)]

Anytime Predictions [figure: fast-changing and slow-changing (Caffe) models combined in an ensemble] A missing prediction is replaced by its expectation: w_scikit · f_scikit(x) + w_TF · E_X[f_TF(X)] + w_Caffe · f_Caffe(x)
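A small sketch of the anytime-ensemble idea (the weights, component names, and expectation values below are placeholders): each component contributes its prediction if it finished before the deadline, and its precomputed expected prediction otherwise.

def anytime_ensemble(components, deadline_results):
    # components: name -> (weight, precomputed E_X[f(X)]); deadline_results: name -> prediction
    # for every component that returned before the latency deadline.
    total = 0.0
    for name, (weight, expected_prediction) in components.items():
        prediction = deadline_results.get(name, expected_prediction)
        total += weight * prediction
    return total

# Hypothetical setup: per-component weight and E_X[f(X)] estimated from held-out data.
components = {
    "scikit": (0.2, 0.45),
    "tensorflow": (0.5, 0.52),
    "caffe": (0.3, 0.48),
}
# Only the fast models returned before the 20ms deadline in this example.
deadline_results = {"scikit": 0.9, "caffe": 0.7}
print(anytime_ensemble(components, deadline_results))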

Comparison to TensorFlow Serving Takeaway: Clipper is able to match the average latency of TensorFlow Serving while reducing tail latency (2x) and improving throughput (2x)

Evaluation of Throughput Under Heavy Load [figure: accuracy vs. throughput (queries per second)] Takeaway: Clipper is able to gracefully degrade accuracy to maintain availability under heavy load.

Improved Prediction Accuracy (ImageNet) -- a sequence of pre-trained models:
System       Model          Error Rate   #Errors
Caffe        VGG            13.05%       6525
Caffe        LeNet          11.52%       5760
Caffe        ResNet         9.02%        4512
TensorFlow   Inception v3   6.18%        3088

Improved Prediction Accuracy (ImageNet):
System       Model          Error Rate   #Errors
Caffe        VGG            13.05%       6525
Caffe        LeNet          11.52%       5760
Caffe        ResNet         9.02%        4512
TensorFlow   Inception v3   6.18%        3088
Clipper      Ensemble       5.86%        2930
5.2% relative improvement in prediction accuracy!

Clipper [logos: Create, Caffe, VW] Clipper is a prediction serving system that spans multiple ML frameworks and is designed to Ø simplify model serving Ø bound latency and increase throughput Ø and enable real-time learning and personalization across machine learning frameworks

Learning Systems Graduate student collaborators on this work: Joseph E. Gonzalez 773 Soda Hall jegonzal@cs.berkeley.edu Francois Billetti Daniel Crankshaw Ankur Dave Xinghao Pan Xin Wang Neeraja Yadwadkar Wenting Zheng

RISE Real-time, Intelligent, and Secure Systems Lab

RISE Lab From live data to real-time decisions AMP Lab From batch data to advanced analytics

Goal: Real-time decisions (decide in ms) on live data (the current state of the environment, as data arrives) with strong security (privacy, confidentiality, and integrity)

RISE Real-time, Intelligent, and Secure Systems Lab Learn More: CS294 Course on RISE Topics https://ucbrise.github.io/cs294-rise-fa16/ Early RISErs Seminar on Mondays at 9:30 AM

Security: Protecting Models Data is a core asset & models capture the value in data Ø Expensive: many engineering & compute hours to develop Ø Models can reveal private information about the data How do we protect models from being stolen? Ø Prevent them from being copied from devices (DRM? SGX?) Ø Defend against active learning attacks on decision boundaries How do we identify when models have been stolen? Ø Watermarks in decision boundaries?