Linear Tabling Strategies and Optimization Techniques

Neng-Fa Zhou
CUNY Brooklyn College and Graduate Center

Summary

Tabling is a technique that can eliminate infinite loops and redundant computations in the execution of recursive logic programs. The main idea of tabling is to memorize the answers to subgoals and use the answers to resolve their variant descendents. Tabling helps narrow the gap between declarative and procedural readings of logic programs. It is useful not only in the problem domains that motivated its birth, such as program analysis, parsing, deductive databases, and theorem proving, but has also been found essential in several other problem domains such as model checking, learning, and data mining. Early resolution mechanisms proposed for tabling, such as OLDT, rely on suspension and resumption of subgoals to compute fixpoints. Recently, a new resolution framework called linear tabling, envisioned by the proposer and several other researchers, has received considerable attention because of its simplicity, ease of implementation, and good space efficiency. The idea of linear tabling is to use depth-first iterative deepening rather than suspension to compute fixpoints. Linear tabling is still immature compared with OLDT, and a great deal of potential remains to be exploited. The objective of this project is to qualitatively and quantitatively analyze possible strategies and to propose effective optimization techniques that make linear tabling scale to large applications such as natural language processing and data mining.

About My Collaborator: Yi-Dong Shen is a Professor at the Software Institute of the Chinese Academy of Sciences in Beijing, a leading institution in China. He is an active researcher in logic programming theories and has published a number of papers in top-ranked journals such as Theoretical Computer Science, Theory and Practice of Logic Programming, New Generation Computing, and ACM Transactions on Computational Logic. I first met Prof. Shen in 1998 while both of us were visiting the Computing Science department of the University of Alberta, and I have had opportunities to collaborate with him on the early tabling mechanisms. His research experience and expertise in logic programming theories and his enthusiasm for research in tabling and data mining fit the scope of the proposed research perfectly.

Scientific Merits: This research will strive to make both theoretical and practical contributions. The soundness and completeness of all the optimization techniques will be proved, and the techniques will be implemented and experimentally compared. The results will be published in conference proceedings and journals, and the resulting system will be disseminated broadly to students, researchers, and users in industry.

Broader Impacts: Dr. Shen has a research group at the Software Institute, and the PI has students, both undergraduate and PhD, working on applying tabled

Prolog to bioinformatics, heuristics-learning for constraint solving, and protocol verification. The cooperation will enhance the connection between our research groups, help establish a formal relationship between CUNY Computer Science and the Software Institute, and also enhance the involvement of students in the application projects.

1 Introduction

1.1 Background

Recently there has been growing research interest in tabling because of its usefulness in a variety of application domains including program analysis, parsing, deductive databases, theorem proving, model checking, and logic-based probabilistic learning [7, 11, 14, 17, 20, 23, 28, 29, 33]. The main idea of tabling is to memorize the answers to some subgoals and use the answers to resolve subsequent variant subgoals. This idea of caching previously calculated solutions, called memoization, was first used to avoid redundant evaluation of functions [13], and was later used to deal with left-recursion and redundancy in top-down descent parsers [11, 16, 29]. Tabling brings several powerful features of bottom-up evaluation, such as completeness and termination, into goal-directed top-down evaluation. Tabling has become a practical technique thanks to the availability of large amounts of memory in computers. It has become an embedded feature in a number of logic programming systems such as ALS [9], B-Prolog [30, 32, 34, 35], Mercury, XSB [22], and YAP [21].

OLDT [25] is the first revised SLD resolution that accommodates the idea of tabling. In OLDT, a table area is used to record subgoals and their answers. When a subgoal (producer) is first encountered in execution, it is resolved by using program clauses just as in SLD resolution, with the exception that the subgoal and its answers are recorded in the table. When a subgoal (consumer) is encountered that is subsumed by one of its ancestors, OLDT does not expand it as in SLD resolution but rather uses the answers in the table to resolve it. 
After the answers are exhausted, the computation of the consumer is suspended until the producer produces new answers into the table. The process is continued until the fixpoint is reached, i.e. when no answers are available for consumers and no producers can produce any new answers. Several other resolution formalisms including SLG [4] and SLS [3] have been developed for tabling that rely on suspension and resumption of subgoals to compute fixpoints. XSB is the first Prolog system that successfully supports tabling based on this idea [22]. OLDT is non-linear in the sense that the state of a consumer must be preserved before execution backtracks to its producer. This non-linearity requires freezing stack segments [22] or copying stack segments into a different area [6] before backtracking

takes place. Recently, another formalism, called linear tabling (see footnote 1), has become an effective alternative tabling mechanism [24, 36, 9]. The main idea of linear tabling is to use depth-first iterative deepening rather than suspension to compute fixpoints. A significant difference between linear tabling and OLDT lies in the handling of variant descendents of a subgoal. In linear tabling, after a descendent consumes all the answers, it either fails or turns into a producer, producing answers by using the alternative clauses of the ancestor. A subgoal is called a looping subgoal if a variant occurs in its evaluation. The evaluation of looping subgoals must be iterated to ensure the completeness of evaluation. Linear tabling is relatively easy to implement on top of a WAM-like abstract machine thanks to its linearity. Linear tabling is more space efficient than OLDT since the states of subgoals need not be preserved. Nevertheless, linear tabling may be slower than OLDT because of the iteration of looping subgoals, especially when programs contain a large number of interdependent recursive predicates. Two Prolog systems, namely B-Prolog [30, 31] and ALS [9], have been extended to support linear tabling.

1.2 Motivation and Objective

Prolog with tabling has found its way into many application areas, ranging from deductive databases, program analysis, natural language processing, and model checking to data mining and learning. We have been developing two application systems using our B-Prolog system: a protocol verifier and a statistical learning system called PRISM [33]. We have been applying the PRISM system to several machine learning tasks such as biosequence analysis and heuristics-learning for constraint solving. Efficient fixpoint computation is of crucial importance for our protocol verifier to find pitfalls in subtle and complex protocols, and for the learning system to learn probability distributions from large amounts of sample data. 
Linear tabling is a framework from which different methods can be derived: each is a combination of strategies for handling looping subgoals in forward execution, backtracking, and iteration. For example, for the forward execution of a looping subgoal that is a descendent of a variant ancestor, one strategy is to fail the subgoal after it consumes all the available answers, and another strategy is to let the descendent produce answers by using the alternative clauses of the ancestor. The decisions on how to handle looping subgoals in forward execution, backtracking, and iteration are orthogonal, and different methods can be designed by combining different strategies.

Footnote 1: Note that the word linear here has nothing to do with complexity or linear logic. It has the same meaning as the L in SLD: a derivation is made up of a sequence of goals $G_0, G_1, \ldots, G_k$ such that $G_{i+1}$ is derived from $G_i$.

The first objective of this project is to qualitatively and quantitatively analyze possible strategies. Current tabling systems incur considerable overhead in the execution of programs. Tamaki and Sato state the following in [25]: "The storage requirement [of tabling] can be too demanding in some cases and the overhead of table manipulation can be too large." This situation has not changed since. The second objective of this project is to develop optimization techniques to tackle this problem. Concretely, we will investigate the following optimization techniques: (1) deferred tabling, which defers tabling to reduce the overhead of over-tabled programs; (2) semi-naive evaluation, an algorithm employed in bottom-up evaluation of Datalog programs [27] to avoid redundant joins; and (3) controlling search in tabled Prolog.

2 Prior Research Results on Tabling

The SLD resolution used in Prolog may not be complete or efficient for programs in the presence of recursion. For example, for a recursive definition of the transitive closure of a relation, a query may never terminate under SLD resolution if the program contains left-recursion or, even when no rule is left-recursive, if the graph represented by the relation contains cycles. For a natural definition of the Fibonacci function, the evaluation of a subgoal under SLD resolution spawns an exponential number of subgoals, many of which are variants. The lack of completeness and efficiency in evaluating recursive programs is problematic: novice programmers may lose confidence in writing declarative programs that terminate, and experienced programmers have to reformulate a natural and declarative formulation to avoid these problems, resulting in less declarative and less readable programs. Tabling [25, 28] is a technique that can eliminate infinite loops for bounded-term-size programs and redundant computations in the execution of recursive Prolog programs. 
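To make the Fibonacci observation concrete, the following is a minimal Python sketch, not Prolog and not the mechanism of any tabling system. It counts how many calls are actually expanded with and without a table of answers. The function names `fib_naive` and `fib_tabled` and the counter lists are hypothetical and introduced only for this illustration.

```python
def fib_naive(n, counter):
    """Plain recursion, analogous to SLD resolution: every call is re-expanded."""
    counter[0] += 1
    if n < 2:
        return n
    return fib_naive(n - 1, counter) + fib_naive(n - 2, counter)


def fib_tabled(n, table, counter):
    """Recursion with a table: a repeated call is answered from the table."""
    if n in table:
        return table[n]
    counter[0] += 1  # counts only the calls that are actually expanded
    if n < 2:
        result = n
    else:
        result = fib_tabled(n - 1, table, counter) + fib_tabled(n - 2, table, counter)
    table[n] = result
    return result


naive_calls, tabled_calls = [0], [0]
value_naive = fib_naive(20, naive_calls)
value_tabled = fib_tabled(20, {}, tabled_calls)
print(value_naive, naive_calls[0], tabled_calls[0])
```

Both versions return the same value, but the tabled version expands each distinct call only once, so the number of expanded calls grows linearly rather than exponentially with the argument.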
The main idea of tabling is to memorize the answers to some subgoals and use the answers to resolve subsequent variant subgoals. This idea of caching previously calculated solutions, called memoization, was first used to speed up evaluation of functions [13]. Tabling in Prolog not only is useful in the problem domains that motivated its birth, such as program analysis, parsing, deductive database and theorem proving but also has been found essential in several other problem domains such as model checking [20] and probabilistic logic learning [19, 23, 33]. OLDT [25] is the first revised SLD resolution that accommodates the idea of tabling. In OLDT, a table area is used to record subgoals and their answers. When a subgoal (producer) is first encountered in execution, it is resolved by using program clauses just as in SLD resolution with the exception that the subgoal and its answers

are recorded in the table. When a subgoal (consumer) is encountered that is subsumed by one of its ancestors, OLDT does not expand it as in SLD resolution but rather uses the answers in the table to resolve it. After the answers are exhausted, the computation of the consumer is suspended until the producer produces new answers into the table. The process is continued until the fixpoint is reached, i.e., when no answers are available for consumers and no producers can produce any new answers. Several extensions of OLDT, including SLG [4] and SLS [3], have been developed for evaluating general logic programs with negation. XSB is the first Prolog system that successfully supports tabling [22]. OLDT is non-linear in the sense that the state of a consumer must be preserved before execution backtracks to its producer. This non-linearity requires freezing stack segments [22] or copying stack segments into a different area [6] before backtracking takes place.

Recently, another formalism, called linear tabling (see footnote 2), has emerged as an alternative tabling method [24, 35, 9]. The main idea of linear tabling is to use iterative computation rather than suspension to compute fixpoints. A significant difference between linear tabling and OLDT lies in the handling of variant descendents of a subgoal. In linear tabling, after a descendent consumes all the answers, it either fails or turns into a producer, producing answers by using the alternative clauses of the ancestor. A subgoal is called a looping subgoal if a variant occurs as a descendent in its evaluation. The evaluation of looping subgoals must be iterated to ensure the completeness of evaluation. Linear tabling is a framework from which different methods can be derived based on the strategies used in handling looping subgoals in forward execution, backtracking, and iteration. The framework can be defined by three primitives on tabled subgoals: start(A), memo(A), and check_completion(A). 
start(A): This primitive is executed when a tabled subgoal A is encountered. The subgoal A is registered into the table if it is not registered yet. If A's state is complete, meaning that A has been completely evaluated before, then A is resolved by using the answers in the table. If A is a pioneer of the current path, meaning that it is encountered for the first time, then it is resolved by using program clauses. If A is a follower of some ancestor $A_0$, meaning that a loop has been encountered, then it is first resolved by using the answers in the table. After the answers are exhausted, two possible actions can be taken. One is to fail A. This strategy is called FDF (Fail Descendent Followers). The other is to resolve A by using the alternative clauses of its ancestor $A_0$. This strategy is called SAC (Steal Alternative Clauses).

memo(A): This primitive is executed when an answer is found for the tabled subgoal A. If the answer is already in the table, then the primitive fails; otherwise the answer is added to the table. Two different strategies can be used after the answer is added. One is called the eager consumption strategy, which lets the subgoal succeed and consume the answer. The other is called the lazy consumption strategy, which fails the primitive and forces the system to backtrack to find the next answer.

check_completion(A): This primitive is executed when the subgoal A is being resolved by using program clauses and all the clauses have been tried. If A has never occurred in a loop, then A's state can be set to complete and A can be failed after all the answers are consumed. If A is a top-most looping subgoal, then different actions should be taken depending on whether this is A's normal evaluation or re-evaluation. These actions depend in turn on what strategies are used by start(A) and memo(A). In any case, if no new answer is produced in the last round of re-evaluation of A, then A's state can be set to complete and A can be failed after all the answers are consumed. For a pioneer in a loop that is not a top-most looping subgoal, two different strategies can be used after the subgoal exhausts all the clauses and answers. One is to fail the subgoal. Under this strategy, re-evaluation starts only at top-most looping subgoals. The other strategy is to iterate the evaluation of the subgoal until it reaches a temporary fixpoint. The fixpoint is temporary since the top-most subgoal on which this subgoal depends has not been completely evaluated.

Footnote 2: Note that the word linear here has nothing to do with complexity or linear logic. It has the same meaning as the L in SLD: a derivation is made up of a sequence of goals $G_0, G_1, \ldots, G_k$ such that $G_{i+1}$ is derived from $G_i$.

Linear tabling is relatively easy to implement on top of a WAM-like abstract machine thanks to its linearity. 
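The interplay of the three primitives can be sketched for a single tabled subgoal. The Python code below is a simplified illustration under stated assumptions, not B-Prolog's implementation: it hard-codes the left-recursive program `path(X,Y) :- path(X,Z), edge(Z,Y).` and `path(X,Y) :- edge(X,Y).`, evaluates the subgoal `path(start, Y)`, treats the recursive body call as a follower that only consumes table answers and then fails (the FDF strategy), and re-evaluates the pioneer until a round adds no new answer. The names `tabled_path`, `memo`, `answers`, and `rounds` are hypothetical.

```python
def tabled_path(edges, start):
    """Evaluate path(start, Y) by iterating the pioneer until a fixpoint."""
    answers = []   # table entry for path(start, Y), in insertion order
    seen = set()   # same answers, kept as a set for duplicate detection
    rounds = 0
    edge_list = sorted(edges)

    def memo(y):
        # memo(A): fail on a duplicate answer, otherwise add it to the table
        if y in seen:
            return False
        seen.add(y)
        answers.append(y)
        return True

    while True:
        rounds += 1
        new_answer = False
        # Clause 1: path(X, Y) :- path(X, Z), edge(Z, Y).
        # The body call path(start, Z) is a follower of the pioneer: it
        # consumes the answers in the table (including ones added during
        # this round) and then fails instead of looping.
        i = 0
        while i < len(answers):
            z = answers[i]
            for (u, y) in edge_list:
                if u == z and memo(y):
                    new_answer = True
            i += 1
        # Clause 2: path(X, Y) :- edge(X, Y).
        for (u, y) in edge_list:
            if u == start and memo(y):
                new_answer = True
        # check_completion(A): a round without new answers is the fixpoint
        if not new_answer:
            return answers, rounds


edges = {("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")}
reachable, rounds = tabled_path(edges, "a")
print(sorted(reachable), rounds)
```

The sketch terminates on the cyclic graph even though the first clause is left-recursive, because the follower never re-expands the pioneer. The final round produces nothing and exists only to confirm the fixpoint; avoiding this kind of repeated work is what the optimization techniques discussed in this proposal target.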
Linear tabling is more space efficient than suspension-based methods since the states of subgoals need not be preserved. In [32] and [34], we have investigated the lazy consumption strategy and proposed several optimization techniques, including subgoal optimization and semi-naive optimization, that can significantly improve the performance.

3 Proposed Research

As described above, linear tabling is a framework from which different methods can be derived: each is a combination of strategies for handling looping subgoals in forward

execution, backtracking, and iteration. For example, for the forward execution of a looping subgoal that is a descendent of a variant ancestor, one strategy is to fail the subgoal after it consumes all the available answers, and another strategy is to let the descendent produce answers by using the alternative clauses of the ancestor. The decisions on how to handle looping subgoals in forward execution, backtracking, and iteration are orthogonal, and different methods can be designed by combining different strategies. Different strategies, called scheduling strategies, have been studied in the context of OLDT [8], but no such work has been done for linear tabling. One of the objectives of this project is to qualitatively and quantitatively analyze possible strategies and perfect a method that is most efficient in terms of both space and time. No tabling method can lead to an efficient system without optimization. Another objective of this project is to investigate several optimization techniques. We will also investigate methods for handling cuts under the various strategies.

3.1 Deferred tabling

Current tabling systems incur significant overhead in the execution of programs. Tamaki and Sato state the following in [25]: "The storage requirement [of tabling] can be too demanding in some cases and the overhead of table manipulation can be too large." This situation has not changed much since. For example, for the naive reverse program, both XSB and B-Prolog slow down by a factor of more than 30 if the predicates are declared tabled. The overhead of tabling is a problem common to all current tabling systems. We propose a technique, called deferred tabling, to tackle this problem. In current tabling systems, a table-declared subgoal is tabled the first time it is encountered, regardless of whether the answers can be reused or not. For a predicate for which no subgoal occurs twice, tabling it is a waste. 
Even for a predicate for which some subgoals occur multiple times, most subgoals may occur only once. Tabling those single-occurring subgoals is a waste. The technique amounts to deferring the tabling of a subgoal until a sufficiently close subgoal occurs again in execution. The deferred tabling technique is analogous to the buffering technique used in operating systems. One of the key components in the design of this technique is the definition of the closeness of subgoals. We will rely on subgoal abstraction to check the closeness of subgoals and explore several definitions of subgoal abstraction.

3.2 Semi-naive evaluation in linear tabling

Linear tabling relies on iterative evaluation of top-most looping subgoals to compute fixpoints. Blind re-computation of all subgoals and clauses is not computationally

acceptable. A system should re-evaluate only those subgoals and should use only those clauses and answers that can contribute to the generation of new answers. We have proposed a framework for incorporating semi-naive evaluation into linear tabling [34]. The semi-naive algorithm used in bottom-up evaluation of Datalog programs [2, 27] can be used to avoid redundant joins in linear tabling. Concretely, in each round of evaluation, the join of the subgoals in the body of each rule must involve at least one new answer produced in the previous round. Let $H$ :- $A_1, \ldots, A_k, \ldots, A_n$ be a clause where $A_k$ is the last subgoal in the body that may depend on $H$ (i.e., all the subgoals to the right of $A_k$ that have occurred in earlier rounds must already be complete when $H$ is re-evaluated). For each combination of the answers of $A_1, \ldots, A_{k-1}$, if the combination does not contain any new answers, then $A_k$ should consume new answers only. In semi-naive evaluation, answers can be consumed either sequentially or incrementally. In sequential consumption, for each subgoal either all the available answers or only new answers are consumed. In incremental consumption, in contrast, answers produced in a round are not consumed until the next round. These two schemes need to be thoroughly investigated, and the conditions under which they are complete need to be identified and proved. The semi-naive algorithm, once fully incorporated, will have as great an impact on linear tabling as it has had on bottom-up evaluation. With this algorithm, linear tabling can consistently outperform the magic-sets method [27], since instantiation information of goals is passed down automatically in top-down evaluation and linear tabling enjoys far better space efficiency than OLDT and bottom-up evaluation methods.

3.3 Single-solution search and cuts

The lazy consumption strategy is suited for finding all answers. 
For certain applications such as planning and theorem proving, it is unreasonable to find all answers, either because the set is infinite or because the search space is huge. The cut operator in Prolog can be used to prune unnecessary branches in a search tree. Nevertheless, under the lazy consumption strategy the goal p(X), !, q(X) produces all the answers for p(X) even though only one is needed. At first glance, the eager consumption strategy seems to be amenable to cuts, but it causes other problems. In particular, more re-computation may be needed to reach fixpoints. For instance, in the goal p(X), p(Y), the subgoal p(Y) may be started before p(X) is complete under the eager consumption strategy, and for each instance of p(X) the join p(X), p(Y) needs to be performed more than once to guarantee that no solution is lost. The effect can be disastrous if there is a heavy computation to be done between these two variant subgoals. We propose a novel and effective approach to dealing with cuts. We first replace

each chunk of subgoals A that resides in the scope of a cut with once(A) and generate a program from the original program for each such call of once. The key idea of the approach to implementing once is to translate clauses into continuation-passing style (CPS) clauses. For a CPS program that is required to return only one solution, we can prove that continuations do not need to be tabled. The transformation from a standard clause to a CPS clause is straightforward [26, 5]. In a CPS program, each subgoal takes an extra argument that explicitly represents the continuation computation of the subgoal. Let $H(Cont)$ be the CPS atomic formula of $H$. The clause $H$ :- $B_1, B_2, \ldots, B_n$ is translated into the following CPS clause:

$H(Cont)$ :- $B_1(B_2(\ldots B_n(Cont) \ldots))$.

If the body of the original clause is empty, then the body of the CPS clause calls the continuation:

$H(Cont)$ :- call($Cont$).

CPS binary programs have the following property: a solution is found for a query if any clause in the transformed CPS predicates succeeds. For a call of once, a solution should be returned immediately after it is found.

3.4 An efficient tabling system for PRISM

Tabling is used in the construction of explanation graphs for observed samples in PRISM [23, 33]. Using tabling resembles using dynamic programming in machine learning algorithms such as the Baum-Welch algorithm for HMMs [18] and the Inside-Outside algorithm for PCFGs [1]. The graphical EM learning algorithm adopted in PRISM first finds an explanation graph and then repeatedly walks it until it converges. Therefore, a compact and efficient representation for explanation graphs is crucially important. To this end, we propose two techniques, namely auto-tabling and graph compression. The auto-tabling technique amounts to devising a standard for automatically tabling predicates. It is daunting for programmers to declare which predicates need to be tabled. Users' declarations can be inaccurate. 
Programs may slow down significantly because of over-tabling, or may fall into infinite loops because some predicates are not tabled. Furthermore, table declarations directly affect the structure of the resulting explanation graph. Therefore, such a standard should take the following into consideration: (1) the overhead of over-tabling; (2) the cost of re-computation because of lack of tabling; (3) the possibility of infinite loops; and (4) the compactness of the resulting explanation graph.

The explanation graph resulting from a program may not be optimal even if a wise standard is used in selecting tabled predicates. The graph compression technique reduces the size of the graph further by eliminating redundant nodes and edges, and by combining subgraphs where possible.

4 Scientific Merits

The result of this project will be an efficient linear tabling method that is an optimal combination of possible strategies. The method will be implemented in our B-Prolog system and will be made widely available to the public. Our protocol verification and statistical learning systems will be used to evaluate different methods, in addition to the de facto standard benchmark suite.

Our application programs will be the direct beneficiaries of the research result. B-Prolog is being used by thousands of users in a number of projects, including several that require tabling. For example, Oege de Moor of Oxford University is using tabled Prolog in a program analysis project, and Douglas R. Miles of Teknowledge, a Stanford University based AI company, is using tabled Prolog in a natural language processing project. All these projects will benefit from our research result.

Our research will also have a much broader impact on logic programming and deductive database research. As far as logic programming is concerned, the need to extend Prolog to narrow the gap between the declarative and procedural readings of programs was urged long ago [15]. Although tabling has been perceived as necessary to narrow the gap, it has not become a standard feature. One primary reason may be the lack of an easy-to-implement and efficient tabling method. Our tabling method will accelerate the acceptance of tabling as a standard feature of logic programming.

Deductive databases used to be a very hot research area.
A number of bottom-up evaluation methods have been invented for evaluating recursive Datalog rules [2, 10, 27], and several systems have been developed [12]. Deductive databases were expected to replace relational databases as the next-generation database model. Nevertheless, deductive database research failed to meet the early expectations. One reason for this failure is that Datalog does not provide the flexibility needed in the representation of data and control in many application domains. Our system does not suffer from this limitation. Our linear tabling method has the potential to revitalize the research and development of deductive database systems and applications.

The lack of completeness and efficiency in evaluating non-tabled recursive programs is problematic: novice programmers may lose confidence in writing declarative programs that terminate, and experienced programmers have to reformulate a natural and declarative formulation to avoid these problems, resulting in less declarative and less readable programs. Therefore, our research will not only enhance the productivity of logic programming but also make logic programming friendlier to beginners.

5 Broader Impacts of Research

Our research will also have broad impacts on society and education. CUNY is striving to establish its reputation as a leading research institution in the region, and some programs, e.g. the Honors Program, are attracting some of the best computer science students in the greater New York region, especially from minority groups. The PI has students, both undergraduate and PhD, working on applying tabled Prolog to bioinformatics, heuristics-learning for constraint solving, and protocol verification. Our research will seek to involve the students, including minority students, and to develop their academic skills.

Our research results will be disseminated to society through several channels, one of which is the CUNY Institute for Software Design and Development (CISDD). CISDD serves as a hub that connects software industries and institutions in the greater New York region. We will make use of office space for students and visitors provided by CISDD, and through CISDD establish a close relationship with industry.

The Institute of Software of the Chinese Academy of Sciences is a leading institution in China. The cooperation between Dr. Shen and the PI will enhance the connection between our research groups, and help establish a formal relationship between CUNY Computer Science and the Institute of Software.

6 A Schedule of Planned Travel

As the above makes clear, our research objectives require both theoretical proofs and experimentation, and the cooperation between Prof. Yi-Dong Shen, a theorist, and the PI, a system architect, can help advance the research. Our cooperation will be carried out in various ways.
We will make frequent use of the Internet to exchange our ideas and developments on the subject. We will also visit each other at least once a year to work closely on diverse topics of mutual interest.

7/1/2005-7/15/2005: Visit Prof. Shen at the Institute of Software of the Chinese Academy of Sciences in Beijing to discuss optimization techniques and prove their completeness.
1/2/2006-1/15/2006: Visit Prof. Shen to discuss the possibility of having a set of conditions for semi-naive evaluation to achieve the full effect of the semi-naive algorithm used in bottom-up evaluation. Investigate possible methods for controlling search in tabled programs.
5/1/2006-5/7/2006: Host the visit of Prof. Shen at CUNY. Discuss the implementation status of the optimization techniques. Invite him to give a talk at the CUNY Computer Science Colloquium.
12/15/2006-12/30/2006: Visit Prof. Shen to finalize some research papers.

7 Experience and Capabilities of the PI

The PI, Dr. Neng-Fa Zhou, has been an active researcher in programming language systems for over ten years. He has authored over thirty papers on programming language, constraint-solving, graphics, and machine learning systems, including over fifteen papers published in top journals (ACM TOPLAS, Journal of Logic Programming, Theory and Practice of Logic Programming, Journal of Functional and Logic Programming, and Software Practice and Experience) and major conferences. His papers on compilation of logic programs, constraint solving, and tabling have received a number of citations. He is the main developer of the B-Prolog system, a cutting-edge constraint logic programming system which has tens of thousands of users worldwide in both academia and industry.

References

[1] Baker, J. K. Trainable grammars for speech recognition. In Speech Communication Papers for the 97th Meeting of the Acoustical Society of America (1979), pp. 547-550.
[2] Bancilhon, F., and Ramakrishnan, R. An amateur's introduction to recursive query processing strategies. Proc. of ACM SIGMOD '86 (1986), 16-52.
[3] Bol, R. N., and Degerstedt, L. Tabulated resolution for the well-founded semantics. Journal of Logic Programming 34, 2 (1998), 67-109.
[4] Chen, W., and Warren, D. S. Tabled evaluation with delaying for general logic programs. Journal of the ACM 43, 1 (1996), 20-74.
[5] Demoen, B., and Marien, A. Implementation of Prolog as binary definite programs. In Proceedings of the Second Russian Conference on Logic Programming (1992), LNAI 592, Springer-Verlag, pp. 165-176.
[6] Demoen, B., and Sagonas, K. CHAT: The copy-hybrid approach to tabling. In Proceedings of Practical Aspects of Declarative Programming (PADL) (1999), LNCS 1551, Springer-Verlag, pp. 106-121.
[7] Eisner, J., Goldlust, E., and Smith, N. A. Dyna: A declarative language for implementing dynamic programs. In Proc. of the 42nd Annual Meeting of ACL (2004).
[8] Freire, J., Swift, T., and Warren, D. S. Beyond depth-first: Improving tabled logic programs through alternative scheduling strategies.
[9] Guo, H.-F., and Gupta, G. A simple scheme for implementing tabled logic programming systems based on dynamic reordering of alternatives. In Proceedings International Conference on Logic Programming (ICLP) (2001), LNCS 2237, Springer-Verlag, pp. 181-195.
[10] Han, J., and Luk, W. What kinds of recursions can be processed by transitive closure strategies? Methodologies for Intelligent Systems 3 (1998), 170-179.
[11] Johnson, M. Memoization of top down parsing. Computational Linguistics 21, 3 (1995).
[12] Liu, M. Deductive database languages: Problems and solutions. ACM Computing Surveys 31, 1 (1999), 27-62.
[13] Michie, D. Memo functions and machine learning. Nature (1968), 19-22.
[14] Nielson, F., Nielson, H. R., Sun, H., Buchholtz, M., Hansen, R. R., Pilegaard, H., and Seidl, H. The succinct solver suite. In Proc. Tools and Algorithms for the Construction and Analysis of Systems: 10th International Conference (TACAS), LNCS 2988 (2004), pp. 251-265.
[15] Parker, D., Carey, M., Jarke, M., Sciore, E., and Walker, A. Logic programming and databases. In Expert Database Systems (1986).
[16] Pereira, F. C. N., and Warren, D. H. D. Parsing as deduction. In Proceedings of 21st Annual Meeting of the Association for Computational Linguistics (June 1983), MIT.
[17] Pientka, B. Tabled higher-order logic programming. PhD thesis, Technical Report CMU-CS-03-185, December 2003.
[18] Rabiner, L. R. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE 77 (1989), 257-286.
[19] Raedt, L. D., and Kersting, K. Probabilistic logic learning. SIGKDD Explorations 5 (2003), 31-48.
[20] Ramakrishnan, C. Model checking with tabled logic programming. In ALP News Letter (2002), ALP.
[21] Rocha, R., Silva, F., and Costa, V. S. On a tabling engine that can exploit or-parallelism. In Proceedings International Conference on Logic Programming (ICLP) (2001), LNCS 2237, Springer-Verlag, pp. 43-58.
[22] Sagonas, K., and Swift, T. An abstract machine for tabled execution of fixed-order stratified logic programs. ACM Transactions on Programming Languages and Systems 20, 3 (1998), 586-634.
[23] Sato, T., and Kameya, Y. Parameter learning of logic programs for symbolic-statistical modeling. Journal of Artificial Intelligence Research (2001), 391-454.
[24] Shen, Y.-D., Yuan, L., You, J., and Zhou, N.-F. Linear tabulated resolution based on Prolog control strategy. Theory and Practice of Logic Programming (TPLP) 1, 1 (2001), 71-103.
[25] Tamaki, H., and Sato, T. OLD resolution with tabulation. In Proceedings of the Third International Conference on Logic Programming (1986), E. Shapiro, Ed., LNCS, Springer-Verlag, pp. 84-98.
[26] Tarau, P., and Boyer, M. Elementary Logic Programs. In Proceedings of Programming Language Implementation and Logic Programming (Aug. 1990), P. Deransart and J. Małuszyński, Eds., no. 456 in Lecture Notes in Computer Science, Springer, pp. 159-173.
[27] Ullman, J. D. Database and Knowledge-Base Systems, vol. 1 & 2. Computer Science Press, 1988.
[28] Warren, D. S. Memoing for logic programs. Comm. of the ACM, Special Section on Logic Programming 35, 3 (1992), 93.
[29] Warren, D. S. Programming in Tabled Prolog. DRAFT 1 (http://www.cs.sunysb.edu/~warren/xsbbook/book.html), 1999.
[30] Zhou, N.-F. Parameter passing and control stack management in Prolog implementation revisited. ACM Transactions on Programming Languages and Systems 18, 6 (1996), 752-779.
[31] Zhou, N.-F. B-Prolog user's manual, version 6.6. Technical report, CUNY Computer Science, 2004.
[32] Zhou, N.-F., and Sato, T. Efficient fixpoint computation in linear tabling. In Fifth ACM-SIGPLAN International Conference on Principles and Practice of Declarative Programming (2003), pp. 275-283.
[33] Zhou, N.-F., Sato, T., and Hasida, K. Toward a high-performance system for symbolic and statistical modeling. In IJCAI Workshop on Learning Statistical Models from Relational Data (2003), pp. 153-159.
[34] Zhou, N.-F., Shen, Y.-D., and Sato, T. Semi-naive evaluation in linear tabling. In Fifth ACM-SIGPLAN International Conference on Principles and Practice of Declarative Programming (2004), pp. 90-97.
[35] Zhou, N.-F., Shen, Y.-D., Yuan, L., and You, J. Implementation of a linear tabling mechanism. Journal of Functional and Logic Programming 2001(1) (2001), 1-15.
[36] Zhou, N.-F., Shen, Y.-D., Yuan, L.-Y., and You, J.-H. Implementation of a linear tabling mechanism. In Proceedings of Practical Aspects of Declarative Programming (PADL) (2000), LNCS 1753, Springer-Verlag, pp. 109-123.