Getting in Front on Data Quality
Thomas C. Redman, Ph.D., "the Data Doc", in Ciudad Real, España, July 2016
Data Quality Solutions, tomredman@dataqualitysolutions.com
/Redman-ICIQ-June2016 DQS 2000-2016 T. C. Redman, Page 1
But first: Muchas gracias, Alarcos Research Group!!!
Agenda
- Data Quality in Practice
- Most Important Things to Know:
  - Getting in Front Approach
  - Roles and Responsibilities
- The Friday Afternoon Measurement: Please go do this!
Data Quality in AT&T's Access Management Department
Tom Redman
Figure 1: Access Bill Verification at AT&T
[Flow diagram; labels: Telco (supplier); AT&T (customer); Predicted-Bill-Producing Process; Bill-Producing Process; Bills; Compare; Predicted Bills; Find discrepancies; Respond; File Claims; Process rebates; File Counterclaims]
A revelation unfolds
"We need to clean up our data every other year. And it's good for me: I get a big bonus. But somehow it doesn't feel right for the company!"
Twain: "A man with a watch knows what time it is. A man with two is never sure."
This doesn't work (this is stupid!)
You had me at Hello
A Tracked Data Record
Attribute, followed by its value at each Step of Process (A, B, C, D, E):
- Name: XYZ.1234, XYZ-1234, XYZ-1234, XYZ-1234, XYZ-1234 (The only change here involves format. Not of concern.)
- Billing Number: 272-791-2424, 272-791-9100, 272-791-9100
- Bill Code: 1, A, A, A, A (In this case, 1 and A both mean yes. It reflects poor architecture, but is not a data error.)
- Office: 408727, 408727, 408727, 408927, 408970...
Changes in data values. Errors of serious concern. Note we cannot be certain where the errors actually occurred.
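A minimal sketch of how the tracked record on this slide could be checked mechanically. The attribute values are copied from the slide; the two normalization rules (treating "." and "-" as equivalent in Name, and treating 1 and A as the same "yes" code) are assumptions made for illustration, as are all function and variable names.

```python
from typing import Callable, Dict, List, Tuple

# Process steps and attribute values as shown on the slide.
STEPS = ["A", "B", "C", "D", "E"]
TRACKED: Dict[str, List[str]] = {
    "Name": ["XYZ.1234", "XYZ-1234", "XYZ-1234", "XYZ-1234", "XYZ-1234"],
    "Bill Code": ["1", "A", "A", "A", "A"],
    "Office": ["408727", "408727", "408727", "408927", "408970"],
}

# Hypothetical normalizers: they hide differences the slide calls harmless
# (format-only changes and equivalent codes) so only value changes remain.
NORMALIZERS: Dict[str, Callable[[str], str]] = {
    "Name": lambda v: v.replace(".", "-"),
    "Bill Code": lambda v: "yes" if v in {"1", "A"} else v,
}


def real_changes(
    values: List[str], normalize: Callable[[str], str] = lambda v: v
) -> List[Tuple[str, str, str, str]]:
    """Return (from_step, to_step, old, new) wherever the normalized value changes."""
    changes = []
    for i in range(1, len(values)):
        if normalize(values[i - 1]) != normalize(values[i]):
            changes.append((STEPS[i - 1], STEPS[i], values[i - 1], values[i]))
    return changes


report = {
    attr: real_changes(vals, NORMALIZERS.get(attr, lambda v: v))
    for attr, vals in TRACKED.items()
}

for attr, changes in report.items():
    print(attr, changes)
```

The report only shows between which steps a value differs. As the slide notes, it cannot say which of the two values is the correct one.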
Start with a Basic Time-Series Plot
Clearly, this process is broken!
The Search for Root Causes
The Search for Root Causes, Another Perspective
Combining the Two Perspectives
With this plot, we have identified potential root causes well enough to charter specific improvement projects. Eliminating the top three will reduce the overall error rate by 80%!
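The "top three" claim is a Pareto-style calculation: rank the candidate root causes by error count and see what share of all errors the leading few account for. The sketch below shows the arithmetic only. The cause names and counts are hypothetical placeholders chosen to mirror the 80% figure; they are not data from the AT&T study.

```python
from typing import Dict, List, Tuple


def top_causes(error_counts: Dict[str, int], k: int) -> Tuple[List[Tuple[str, int]], float]:
    """Return the k largest causes and the fraction of all errors they cover."""
    total = sum(error_counts.values())
    if total == 0:
        raise ValueError("no errors recorded")
    ranked = sorted(error_counts.items(), key=lambda item: item[1], reverse=True)
    top = ranked[:k]
    share = sum(count for _, count in top) / total
    return top, share


# Hypothetical error counts per root cause (illustrative only).
counts = {
    "cause_1": 40,
    "cause_2": 25,
    "cause_3": 15,
    "cause_4": 10,
    "cause_5": 6,
    "cause_6": 4,
}

top, share = top_causes(counts, k=3)
print(top, f"{share:.0%}")
```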
"This has to be end-to-end."
"I don't understand the details, but I trust the logic. Let's give it a try."
"You know Bob, we don't come in every morning thinking of ways to foul you up."
"What does 'timely and accurate' mean?"
"Hmmm. That's a good question."
(Proposed) Access Financial Assurance Process
Telco (supplier): "This is what we are going to do!"
AT&T (customer): "This is what we are going to do!"
[Process diagram; labels: Tracking Results; Customer Requirements; Step A; Step B; Step C; Step D; Step E; Process management; Bills; Supplier Management; Audit Supplier Performance]
Both: Eliminate Bill Verification (and all it brings)
Business Results
- Data accuracy errors reduced 90%
- Billing errors reduced 98%
- Cycle time (bill period closure) reduced 67%
- Customer costs reduced 73%
- Costs across supplier base reduced 20%
- Achieved Financial Assurance
- Created Financial Predictability
"Tom, I really do appreciate the $100M. What I appreciate even more is that now we can manage the business. That is worth even more."
Getting in Front: The Most Important Things To Know
Rising Middle Manager VS
What Makes Data of High Quality
Data are of high quality if they are fit for their intended uses (by customers) in operations, decision-making, analytics and planning (after Juran).
Data that's fit for use:
- free of defects (Mostly, are the data right?): accessible, accurate, up-to-date, etc.
- possess desired features (attributes) (Largely, the right data): relevant, comprehensive, proper level of detail, easy-to-interpret, etc.
Customers are the ultimate arbiters of quality!!
Day-in, day-out definition
Meeting the most important needs of the most important customers
Data Quality: The Non-delegatable Choice
To Clean Up The Lake, One Must First Eliminate The Sources Of Pollutant
Unmanaged
For Data, Only Two Moments Really Matter
- The Moment of Creation (DATA CREATOR)
- The Moment of Use (DATA CUSTOMER)
Note that they DO NOT occur in IT.
PATH FROM CREATOR TO CUSTOMER
The whole point of data quality management is to connect the two!
Customer Roles and Responsibilities
- Recognize that you've become way too tolerant of junk.
- Recognize your hidden data factories and vow to put a stop to them.
- Sort out and document your most important needs.
- Communicate them to suppliers.
- Work with suppliers to close the most important gaps.
The most important tool in all of data quality management (and maybe all management)
CUSTOMER-SUPPLIER MODEL
[Diagram; labels: Suppliers; inputs; Your Business Process; outputs; Customers; requirements and feedback channels on both the supplier side and the customer side]
To be clear, all must build these communications channels with data customers and suppliers.
Creator Roles and Responsibilities
- Recognize that the data you create impacts others.
- Understand the most important needs of the most important customers.
- Measure quality against those needs.
- Conduct improvement projects to eliminate root causes.
- Put in place controls to close the gaps.
Provocateurs disrupt the dynamic that leads to hidden data factories
- Dissatisfied with the status quo.
- Courage to try something new.
- Great corporate citizens.
- Achieve real results within their spans of control.
- At all levels!
THERE IS A LITTLE PROVOCATEUR IN ALL OF US
Provocateurs Can Go Only So Far
[Chart: Progress of a Typical Data Quality Program. Vertical axis: penetration of DQ across organization. Horizontal axis: time. Stages: traction; real results (order-of-magnitude improvement on some data); plateau; next level = more data]
Getting in Front Takes a Village: New Roles for Everyone
Data customers and creators: Everyone touches data, so these roles are fundamental!
- Ideally, in a process framework.
- Embedded data managers
Provocateurs
DQ Teams: Day-in, day-out faces of DQ
- Chief Data Architect
- Data Maestro
Leadership
Technologists
Either provocateurs or DQ Team lead must help senior management understand what to do.
The Process Management Cycle
1. Establish management responsibilities
2. Understand customer needs
3. Describe process
4. Establish measurement system
5. Establish control & check conformance to requirements
6. Set and pursue target for improvement
7. Make improvements & sustain gains
The process management cycle provides a powerful, repeatable means to bring the tasks to bear.
Typical Result
First-time, on-time results
[Chart: Fraction Perfect Records (0.5 to 1) by Month (0 to 20). Series: Accuracy Rate; ave; lower control limit; upper control limit; target. Annotations: Program start; 1. Rqmts defined; 2. First Meas; 3. Improvements; 4. Control]
Each error not made saves an average of $500. Quickly millions. But the improved competitive position may be worth even more!
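The chart plots an average with lower and upper control limits around the fraction of perfect records. The slide does not say how those limits were computed. One common choice, assumed here, is a three-sigma p-chart: limits at the average fraction plus or minus three times sqrt(p(1 - p) / n), where n is the number of records sampled each month. The monthly values and the sample size of 100 below are hypothetical.

```python
import math
from typing import List, Tuple


def p_chart_limits(fractions: List[float], sample_size: int) -> Tuple[float, float, float]:
    """Return (average, lower limit, upper limit) for a three-sigma p-chart."""
    p_bar = sum(fractions) / len(fractions)
    sigma = math.sqrt(p_bar * (1.0 - p_bar) / sample_size)
    lower = max(0.0, p_bar - 3.0 * sigma)  # a fraction cannot fall below 0
    upper = min(1.0, p_bar + 3.0 * sigma)  # or rise above 1
    return p_bar, lower, upper


def out_of_control(fractions: List[float], lower: float, upper: float) -> List[int]:
    """Return the indexes of points that fall outside the control limits."""
    return [i for i, p in enumerate(fractions) if p < lower or p > upper]


# Hypothetical monthly fractions of perfect records after improvements.
monthly = [0.90, 0.92, 0.91, 0.93, 0.89, 0.92]
avg, lcl, ucl = p_chart_limits(monthly, sample_size=100)

print(f"average={avg:.3f} lower={lcl:.3f} upper={ucl:.3f}")
print("out-of-control months:", out_of_control(monthly, lcl, ucl))
```

A point outside the limits signals that the process has changed and is worth investigating. Points inside the limits are treated as ordinary month-to-month variation.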
Don't Forget the Metadata
- Metadata are data about data.
- You have to take good care of at least some metadata.
- The principles of data quality management apply.
- A Chief Data Architect should lead work on common definitions.
- Embeds play huge roles here!
The Friday Afternoon Measurement
The Friday Afternoon Measurement aims to help answer the question, "Do I need to worry about data quality?"
Friday Afternoon Measurement Protocol
- Assemble last 100 records
- Assemble 2-3 experts
- Mark obviously erred data in red
- Summarize and interpret results
Assemble the last 100 records
Select 10-15 most important attributes
Record     | Attribute 1:   | Attribute 2: Size | Attribute 3: Amount | etc.
Record A   | Jane Doe       | Null              | $472.13             |
Record B   | John Smith     | Medium            | $126.93             |
Record C   | Stuart Madnick | XXXL              | Null                |
Record D   | Thomas Jones   |                   |                     |
...
Record 100 | James Olsen    | One Locked Place  | $76.24              |
After Fig 18.2, Redman, Data Quality: The Field Guide
Mark the obvious errors
Record     | Attribute 1:   | Attribute 2: Size | Attribute 3: Amount | etc.
Record A   | Jane Doe       | Null              | $472.13             |
Record B   | John Smith     | Medium            | $126.93             |
Record C   | Stuart Madnick | XXXL              | Null                |
Record D   | Thomas Jones   |                   |                     |
...
Record 100 | James Olsen    | One Locked Place  | $76.24              |
After Fig 18.2, Redman, Data Quality: The Field Guide
Rate the record as perfect or not
Record     | Attribute 1:   | Attribute 2: Size | Attribute 3: Amount | etc. | Record perfect? (y/n)
Record A   | Jane Doe       | Null              | $472.13             |      | n
Record B   | John Smith     | Medium            | $126.93             |      | y
Record C   | Stuart Madnick | XXXL              | Null                |      | n
Record D   | Thomas Jones   |                   |                     |      | n
...
Record 100 | James Olsen    | One Locked Place  | $76.24              |      | n
After Fig 18.2, Redman, Data Quality: The Field Guide
Count the perfects
Record     | Attribute 1:   | Attribute 2: Size | Attribute 3: Amount | etc. | Record perfect? (y/n)
Record A   | Jane Doe       | Null              | $472.13             |      | n
Record B   | John Smith     | Medium            | $126.93             |      | y
Record C   | Stuart Madnick | XXXL              | Null                |      | n
Record D   | Thomas Jones   |                   |                     |      | n
...
Record 100 | James Olsen    | One Locked Place  | $76.24              |      | n
Count perfect: 67
After Fig 18.2, Redman, Data Quality: The Field Guide
Data Quality = 67%
Here the interpretation is that a full third of recent customer orders had a serious DQ issue. A worry indeed!
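The scoring step of the Friday Afternoon Measurement is simple enough to sketch in a few lines: a record counts as perfect only if the experts marked nothing on it, and the score is the share of perfect records. The class and function names below are hypothetical, and the 100 records are synthetic stand-ins arranged to reproduce the slide's count of 67 perfect records.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ReviewedRecord:
    record_id: str
    # Attributes the experts marked in red as obviously wrong.
    flagged: List[str] = field(default_factory=list)

    @property
    def perfect(self) -> bool:
        # One marked attribute is enough to make the record imperfect.
        return not self.flagged


def dq_score(records: List[ReviewedRecord]) -> Tuple[int, float]:
    """Return (number of perfect records, percentage of perfect records)."""
    if not records:
        raise ValueError("no records to score")
    perfect = sum(1 for r in records if r.perfect)
    return perfect, 100.0 * perfect / len(records)


# Synthetic stand-in for the last 100 records: 67 clean, 33 with one flag each.
records = [ReviewedRecord(f"record-{i}") for i in range(67)]
records += [ReviewedRecord(f"record-{i}", flagged=["Size"]) for i in range(67, 100)]

count, score = dq_score(records)
print(f"{count} perfect of {len(records)}: Data Quality = {score:.0f}%")
```

Rating the whole record rather than individual cells keeps the measure strict, which suits a quick answer to whether data quality is worth worrying about.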
Summary
- Order-of-magnitude improvements are possible.
- Such improvements bring rich rewards.
- Sooner or later, you have to get in front.
- The most important roles are data customers and data creators.
- There is a little provocateur in all of us. Time to let it out.
- Making an initial measurement is not so hard using the Friday Afternoon Measurement.
Questions?
Thomas C. Redman, Ph.D., "the Data Doc"
+1 732-933-4669
tomredman@dataqualitysolutions.com
www.dataqualitysolutions.com
Clinician: Thomas C. Redman, "the Data Doc"
- Ph.D., Statistics, Florida State, 1980.
- Conceived and led the Data Quality Lab at AT&T Bell Labs.
- Formed Data Quality Solutions in 1996.
- Latest and greatest: "Data's Credibility Problem," HBR, Dec. 2013. Data Driven: Profiting from Your Most Important Business Asset, Harvard Business School Press, 2008.
- Known bias: Data are quite obviously the key asset of the Information Age. Yet today's organizations are unfit for data. Further, despite enormous potential, few are yet considering how they will compete with data. Finally, high-quality data is pre-requisite. These crystallize THE management challenges of the 21st century.
The Statistician in View: 1980 and Today