ac

部分匹配表

"部分匹配值"就是"前缀"和"后缀"的最长的共有元素的长度。以"ABCDABD"为例，

　　－　"A"的前缀和后缀都为空集，共有元素的长度为0；

　　－　"AB"的前缀为[A]，后缀为[B]，共有元素的长度为0；

　　－　"ABC"的前缀为[A, AB]，后缀为[BC, C]，共有元素的长度0；

　　－　"ABCD"的前缀为[A, AB, ABC]，后缀为[BCD, CD, D]，共有元素的长度为0；

　　－　"ABCDA"的前缀为[A, AB, ABC, ABCD]，后缀为[BCDA, CDA, DA, A]，共有元素为"A"，长度为1；

　　－　"ABCDAB"的前缀为[A, AB, ABC, ABCD, ABCDA]，后缀为[BCDAB, CDAB, DAB, AB, B]，共有元素为"AB"，长度为2；

　　－　"ABCDABD"的前缀为[A, AB, ABC, ABCD, ABCDA, ABCDAB]，后缀为[BCDABD, CDABD, DABD, ABD, BD, D]，共有元素的长度为0。


"部分匹配"的实质是，有时候，字符串头部和尾部会有重复。比如，"ABCDAB"之中有两个"AB"，那么它的"部分
匹配值"就是2（"AB"的长度）。搜索词移动的时候，第一个"AB"向后移动4位（字符串长度-部分匹配值），就可以来到第二个"AB"的位置。

求next数组(TODO)

next[i]定义

i之前的字符串match[0...i-1]中，必须以match[i-1]结尾的后缀(不能包括match[0]),与必须以match[0]开头的前缀子串(不能包含match[i-1])最大匹配长度是多少？

code

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.Scanner;

public class Main {
    public static Scanner sc = new Scanner(System.in);

    public static BufferedReader br = new BufferedReader(new InputStreamReader(System.in));

    /**
     * 求部分匹配
     * @return
     */
    static int[] getNext(char[] p, int len) {
        int[] next = new int[len];

        next[0] = -1;
        if(len == 1){
            return next;
        }
        next[1] = 0;
        int pos = 2;
        int cn = 0;
        while (pos < len){
            if(p[pos-1] == p[cn]){
                next[pos++] = ++cn;
            }else if(cn > 0){
                cn = next[cn];
            }else {
                next[pos++] = 0;
            }
        }
        return next;
    }

    public static void main(String[] args) throws Exception {
//        String s = sc.nextLine();
//        String match = sc.next();
        String s = br.readLine();
        String match = br.readLine();
        if(s == null || match == null){
            System.out.println(-1);
            return;
        }
        char[] ss = s.toCharArray();
        char[] ms = match.toCharArray();
        int n = ss.length;
        int m = ms.length;

        if(n < 1 || n < m){
            System.out.println(-1);
            return;
        }

        int[] next = getNext(ms, m);

        int si = 0;
        int mi = 0;
        boolean flag = false;
        while (si < n){
            if(ss[si] == ms[mi]){
                si ++;
                mi ++;
            }else if(next[mi] == -1){
                si ++;
            }else {
                // 不匹配了 mi 直接等于 next[mi]
                mi = next[mi];
            }
            if(mi == m){
                System.out.print((si-mi)+" ");
                flag = true;
                mi = 0;
            }
        }
        if(mi != m && !flag) {
            System.out.println(-1);
        }
    }
}
/*
acbc
bc
2

acbc
bcc
-1

ababab
ab
0 2 4
 */

eg

BBC ABCDAB ABCDABCDABDE
    ABCDABD

mi = 6的时候出现了不相等，此时next[mi] = 2, si = 9
令mi = 2，再比较ss[si]和ms[mi]
相当于将字符串后移动了m - next[mi]个位置

BBC ABCDAB ABCDABCDABDE
    ABCDABD

next 求解说明

+--------------------------------------------+
|  pre  ｜C              ｜  suffix  ｜B｜pos |
+--------------------------------------------+

当前所求位置是pos,pos前一个紧挨着的字符是B，next[B]已经求出，它的前缀和后缀相等部分为pre和suffix区域，pre区域紧挨着字符C

如果B == C，那么pre + C = suffix + B,则有

next[pos] = next[B] + 1

如果B != C，那么pre + C != suffix + B, 则继续判断

+------------------------------------------------------+
|      pre        ｜C          ｜  suffix       ｜B｜pos |
| preC|D   suffixC｜                   suffixC ｜       |
+------------------------------------------------------+

C的next的前缀分别是 preC 和 suffixC, 因为B的next前缀是pre和suffix,那么有个对称区域是的suffix里面的后面部分包含suffixC;preC紧挨着的字符是D

如果B=D，那么preC + D = suffixC + B,则有

next[pos] = next[D] + 1

否则，类似这样继续递归的操作，如果跳到了最左，那么

next[pos] = 0

编码时记C位置为cn,cn会根据起next跳到D,递归到往前跳

leetcode 28 题

class Solution {

    int[] getNext(char[] p, int len){
        int[] next = new int[len];
        next[0] = -1;
        if (len == 1) {
            return next;
        }
        next[1] = 0;
        int pos = 2;
        int cn = 0; // cn 为 next[pos-1]，不断的向前取即可
        while (pos < len){
            if (p[pos-1] == p[cn]){
                next[pos] = ++cn;
                pos ++;
            }else {
                if (cn > 0) {
                    cn = next[cn];
                } else {
                    next[pos] = 0;
                    pos ++;
                }
            }
        }
        return next;
    }

    public int strStr(String haystack, String needle) {
        if(haystack == null){
            return -1;
        }
        if(needle == null){
            return -1;
        }
        if(haystack.isEmpty() && needle.isEmpty()){
            return 0;
        }
        if(!haystack.isEmpty() && needle.isEmpty()){
            return 0;
        }
        int n = haystack.length();
        int m = needle.length();
        int[] next = getNext(needle.toCharArray(), m);
        int i = 0;
        int j = 0;
        int firstPos = -1;
        while (i < n && j < m){
            if (haystack.charAt(i) == needle.charAt(j)) {
                i ++;
                j ++;
            } else {
                if(next[j] >= 0){
                    // 让 i 与 nextJ 两个位置比较相等
                    j = next[j];
                } else {
                    // next[j] = -1 表示重新来过, 子串从0开始，s从新的位置开始
                    i ++;
                }
            }
            if(j == m){
                firstPos = i - m;
            }
        }
        return firstPos;
    }
}