Fighting spam with Haskell

One of our weapons in the fight against spam, malware, and other abuse on Facebook is a system called Sigma. Its job is to proactively identify malicious actions on Facebook, such as spam, phishing attacks, posting links to malware, etc. Bad content detected by Sigma is removed automatically so that it doesn’t show up in your News Feed.

We recently completed a two-year-long major redesign of Sigma, which involved replacing the in-house FXL language previously used to program Sigma with Haskell. The Haskell-powered Sigma now runs in production, serving more than one million requests per second.

Haskell isn’t a common choice for large production systems like Sigma, and in this post, we’ll explain some of the thinking that led to that decision. We also wanted to share the experiences and lessons we learned along the way. We made several improvements to GHC (the Haskell compiler) and fed them back upstream, and we were able to achieve better performance from Haskell compared with the previous implementation.

How does Sigma work?

Sigma is a rule engine, which means it runs a set of rules, called policies. Every interaction on Facebook — from posting a status update to clicking “like” — results in Sigma evaluating a set of policies specific to that type of interaction. These policies make it possible for us to identify and block malicious interactions before they affect people on Facebook.

Policies are continuously deployed. At all times, the source code in the repository is the code running in Sigma, allowing us to move quickly to deploy policies in response to new abuses. This also means that safety in the language we write policies in is important. We don’t allow code to be checked into the repository unless it is type-correct.

Louis Brandy of Facebook’s Site Integrity team discusses scalable spam fighting and the anti-abuse structure at Facebook and Instagram in a 2014 @Scale talk.

Why Haskell?

The original language we designed for writing policies, FXL, was not ideal for expressing the growing scale and complexity of Facebook policies. It lacked certain abstraction facilities, such as user-defined data types and modules, and its implementation, based on an interpreter, was slower than we wanted. We wanted the performance and expressivity of a fully fledged programming language. Thus, we decided to migrate to an existing language rather than try to improve FXL.

The following features were at the top of our list when we were choosing a replacement:

1. Purely functional and strongly typed. This ensures that policies can’t inadvertently interact with each other, they can’t crash Sigma, and they are easy to test in isolation. Strong types help eliminate many bugs before putting policies into production.

2. Automatically batch and overlap data fetches. Policies typically fetch data from other systems at Facebook, so we want to employ concurrency wherever possible for efficiency. We want concurrency to be implicit, so that engineers writing policies can concentrate on fighting spam and not worry about concurrency. Implicit concurrency also prevents the code from being cluttered with efficiency-related details that would obscure the functionality, and make the code harder to understand and modify.

3. Push code changes to production in minutes. This enables us to deploy new or updated policies quickly.

4. Performance. FXL’s slower performance meant that we were writing anything performance-critical in C++ and putting it in Sigma itself. This had a number of drawbacks, particularly the time required to roll out changes.

5. Support for interactive development. Developers working on policies want to be able to experiment and test their code interactively, and to see the results immediately.

Haskell measures up quite well: It is a purely functional and strongly typed language, and it has a mature optimizing compiler and an interactive environment (GHCi). It also has all the abstraction facilities we would need, it has a rich set of libraries available, and it’s backed by an active developer community.

That left us with two features from our list to address: (1) automatic batching and concurrency, and (2) hot-swapping of compiled code.

Automatic batching and concurrency: The Haxl framework

All the existing concurrency abstractions in Haskell are explicit, meaning that the user needs to say which things should happen concurrently. For data-fetching, which can be considered a purely functional operation, we wanted a programming model in which the system just exploits whatever concurrency is available, without the programmer having to use explicit concurrency constructs. We developed the Haxl framework to address this issue: Haxl enables multiple data-fetching operations to be automatically batched and executed concurrently.

We discussed Haxl in an earlier blog post, and we published a paper on Haxl at the ICFP 2014 conference. Haxl is open source and available on GitHub.

In addition to the Haxl framework, we needed help from the Haskell compiler in the form of Applicative do-notation, which allows programmers to write sequences of statements that the compiler automatically rearranges to exploit concurrency. We designed and implemented this extension in GHC.
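
As a schematic illustration (this is not actual Sigma code, and friendsOf stands in for a hypothetical Haxl data-fetching operation), consider a policy that counts the common friends of two users. With the ApplicativeDo extension, the two independent fetches below desugar into an Applicative combination, so a Haxl-style scheduler is free to batch them into a single round of data fetching:

    {-# LANGUAGE ApplicativeDo #-}

    -- The two calls to friendsOf do not depend on each other, so this block
    -- desugars with <*> rather than >>=, letting the underlying monad fetch
    -- both friend lists concurrently in one batch.
    numCommonFriends :: Monad m => (Int -> m [Int]) -> Int -> Int -> m Int
    numCommonFriends friendsOf x y = do
      fx <- friendsOf x
      fy <- friendsOf y
      return (length (filter (`elem` fy) fx))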

Hot-swapping of compiled code

Every time someone checks new code into the repository of policies, we want to have that code running on every machine in the Sigma fleet as quickly as possible. Haskell is a compiled language, so that involves compiling the code and distributing the new compiled code to all the machines running Sigma.

We want to update the compiled rules in a running Sigma process on the fly, while it is serving requests. Changing the code of a running program is a tricky problem in general, and it has been the subject of a great deal of research in the academic community. In our case, fortunately, the problem is simpler: Requests to Sigma are short-lived, so we don’t need to switch a running request to new code. We can serve new requests on the new code and let the existing requests finish before we discard the old code. We’re careful to ensure that we don’t change any code associated with persistent state in Sigma.

Loading and unloading code currently uses GHC’s built-in runtime linker, although in principle, we could use the system dynamic linker. Unloading the old version of the code involves the garbage collector, which detects when old code is no longer in use by any running request, so we know when it is safe to unload it from the running process.
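
For the curious, loading new code through GHC’s runtime linker looks roughly like the sketch below, using the GHCi.ObjLink interface from the ghci package. This is heavily simplified: the exact signatures vary between GHC versions, and the production system does far more bookkeeping around versioning and unloading.

    import Foreign.Ptr (Ptr)
    import GHCi.ObjLink (initObjLinker, loadObj, lookupSymbol, resolveObjs)

    -- Load a freshly compiled object file and look up an entry point in it.
    -- New requests are routed to the returned code; the garbage collector
    -- tells us when the previous version is unused and can be unloaded.
    loadPolicyObject :: FilePath -> String -> IO (Maybe (Ptr a))
    loadPolicyObject objFile symbol = do
      initObjLinker
      loadObj objFile     -- map the new compiled policies into the process
      ok <- resolveObjs   -- resolve their symbols against the running image
      if ok then lookupSymbol symbol else return Nothing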

How Haskell fits in

Haskell is sandwiched between two layers of C++ in Sigma. At the top, we use the C++ Thrift server. In principle, Haskell can act as a Thrift server, but the C++ Thrift server is more mature and performant, and it supports more features. It can also work seamlessly with the Haskell layers below, because we can call into Haskell from C++. For these reasons, it made sense to use C++ for the server layer.

At the lowest layer, we have existing C++ client code for talking to other internal services. Rather than rewrite this code in Haskell, which would duplicate the functionality and create an additional maintenance burden, we wrapped each C++ client in a Haxl data source using Haskell’s Foreign Function Interface (FFI) so we could use it from Haskell.

Haskell’s FFI is designed to call C rather than C++, so calling C++ requires an intermediate C layer. In most cases, we were able to avoid the intermediate C layer by using a compile-time tool that demangles C++ function names so they can be called directly from Haskell.
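
Schematically, the pattern looks like this (the wrapper name score_request and its Haskell binding are hypothetical): a small extern "C" function gives the C++ client C linkage, and Haskell imports it through the FFI.

    -- C++ side (wrapper.cpp), exposing a client call with C linkage:
    --   extern "C" int score_request(const char* payload);

    {-# LANGUAGE ForeignFunctionInterface #-}

    import Foreign.C.String (CString, withCString)
    import Foreign.C.Types (CInt (..))

    foreign import ccall safe "score_request"
      c_scoreRequest :: CString -> IO CInt

    -- A Haskell-friendly wrapper, of the kind a Haxl data source would call.
    scoreRequest :: String -> IO Int
    scoreRequest payload =
      withCString payload (fmap fromIntegral . c_scoreRequest)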

Performance

Perhaps the biggest question here is “Does it run fast enough?” Requests to Sigma result from users performing actions on Facebook, such as sending a message on Messenger, and Sigma must respond before the action can take place. So we wanted to serve requests fast enough to avoid interruptions to the user experience.

The graph below shows the relative throughput performance between FXL and Haskell for the 25 most common types of requests served by Sigma (these requests account for approximately 95 percent of Sigma’s typical workload).

Haskell performs as much as three times faster than FXL for certain requests. On a typical workload mix, we measured a 20 percent to 30 percent improvement in overall throughput, meaning we can serve 20 percent to 30 percent more traffic with the same hardware. We believe additional improvements are possible through performance analysis, tuning, and optimizing the GHC runtime for our workload.

Achieving this level of performance required a lot of hard work, profiling the Haskell code, and identifying and resolving performance bottlenecks.

Here are a few specific things we did:

  • We implemented automatic memoization of top-level computations using a source-to-source translator. This is particularly beneficial in our use case, in which multiple policies can refer to the same shared value that we want to compute only once. Note that this is per-request memoization rather than the global memoization that lazy evaluation already provides.
  • We made a change to the way GHC manages the heap, to reduce the frequency of garbage collections on multicore machines. GHC’s default heap settings are frugal, so we also use a larger allocation area size of at least 64 MB per core (see the example after this list).
  • Fetching remote data usually involves marshaling the data structure across the C++/Haskell boundary. If the whole data structure isn’t required, it is better to marshal only the pieces needed. Or better still, don’t fetch the whole thing — although that’s only possible if the remote service implements an appropriate API.
  • We uncovered a nasty performance bug in aeson, the Haskell JSON parsing library. Bryan O’Sullivan, the author of aeson, wrote a nice blog post about how he fixed it. It turns out that when you do things at Facebook scale, those one-in-a-million corner cases tend to crop up all the time.
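
On the second point above, the allocation-area size is a runtime-system setting, so no code change is involved; a server built with GHC’s threaded runtime can simply be started with the appropriate RTS flags (the core count here is illustrative):

    # -N sets the number of capabilities; -A64m gives each core a 64 MB
    # allocation area, reducing the frequency of garbage collections.
    ./sigma +RTS -N16 -A64m -RTS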

Resource limits

In a latency-sensitive service, you don’t want a single request using a lot of resources and slowing down other requests on the same machine. In this case, the “resources” include everything on the machine that is shared by the running requests — CPU, memory, network bandwidth, and so on.

A request that uses a lot of resources is normally a bug that we want to fix. It does happen from time to time, often as a result of a condition that occurs in production that wasn’t encountered during testing — perhaps an innocuous operation provided with some unexpectedly large input data, or pathological performance of an algorithm on certain rare inputs, for example. When this happens, we want Sigma to terminate the affected request with an error (that will subsequently result in the bug being fixed) and continue without any impact on the performance of other requests being served.

To make this possible, we implemented allocation limits in GHC, which place a bound on the amount of memory a thread can allocate before it is terminated. Terminating a computation safely is a hard problem in general, but Haskell provides a safe way to abort a computation in the form of asynchronous exceptions. Asynchronous exceptions allow us to write most of our code ignoring the potential for summary termination and still have all the nice guarantees that we need in the event that the limit is hit, including safe release of resources, closing network connections, and so forth.
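
GHC exposes this mechanism through a pair of primitives in GHC.Conc (available since GHC 7.10). A minimal sketch of bounding a request that runs in its own thread might look like this:

    import Control.Exception (SomeException, try)
    import Data.Int (Int64)
    import GHC.Conc (enableAllocationLimit, setAllocationCounter)

    -- Run an action with a bound on the bytes this thread may allocate.
    -- Exceeding the limit delivers an asynchronous AllocationLimitExceeded
    -- exception, which surfaces here as a Left.
    runWithAllocLimit :: Int64 -> IO a -> IO (Either SomeException a)
    runWithAllocLimit limit action = do
      setAllocationCounter limit
      enableAllocationLimit
      try action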

The following graph illustrates how well allocation limits work in practice. It tracks the maximum live memory across various groups of machines in the Sigma fleet. When we enabled one request type that had some resource-intensive outliers, we saw large spikes in the maximum live memory; those spikes disappeared when we enabled allocation limits.

Enabling interactive development

Facebook engineers develop policies interactively, testing code against real data as they go. To enable this workflow in Haskell, we needed the GHCi environment to work with our full stack, including making requests to other back-end services from the command line.

To make this work, we had to make our build system link all the C++ dependencies of our code into a shared library that GHCi could load. We also customized the GHCi front end to implement some of our own commands and streamline the desired workflows. The result is an interactive environment in which developers can load their code from source in a few seconds and work on it with a fast turnaround time. They have the full set of APIs available and can test against real production data sources.
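
Stock GHCi already supports lightweight custom commands via :def, which gives a flavor of the kind of customization involved (our real commands are built into the front end itself, and the reload action named below is hypothetical):

    -- In a .ghci file: define a :rules command that reloads policy code.
    :def rules (\_ -> return "Policies.reloadAll >>= print")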

While GHCi isn’t as easy to customize as it could be, we’ve already made several improvements and contributed them upstream. We hope to make more improvements in the future.

Packages and build systems

In addition to GHC itself, we make use of a lot of open-source Haskell library code. Haskell has its own packaging and build system, Cabal, and the open-source packages are all hosted on Hackage. The problem with this setup is that the pace of change on Hackage is fast, there are often breakages, and not all combinations of packages work well together. The system of version dependencies in Cabal relies too much on package authors getting it right, which is hard to ensure, and the tool support isn’t what it could be. We found that using packages directly from Hackage together with Facebook’s internal build tools meant that adding or updating a package sometimes led to a yak-shaving exercise: a cascade of updates to other packages, often with an element of trial and error to find the right version combinations.

As a result of this experience, we switched to Stackage as our source of packages. Stackage provides a set of package versions that are known to work together, freeing us from the problem of having to find the set by trial and error.
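
With the stack build tool, for example, pinning an entire project to a Stackage snapshot is a one-line affair (the snapshot name below is illustrative):

    # stack.yaml: build all local packages against one Stackage LTS snapshot
    resolver: lts-3.22
    packages:
      - .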

Did we find bugs in GHC?

Yes, most notably:

  • We fixed a bug in GHC’s garbage collector that was causing our Sigma processes to crash every few hours. The bug had gone undetected in GHC for several years.
  • We fixed a bug in GHC’s handling of finalizers that occasionally caused crashes during process shutdown.

Following these fixes, we haven’t seen any crashes in either the Haskell runtime or the Haskell code itself across our whole fleet.

What else?

At Facebook, we’re using Haskell at scale to fight spam and other types of abuse. We’ve found it to be reliable and performant in practice. Using the Haxl framework, our engineers working on spam fighting can focus on functionality rather than on performance, while the system can exploit the available concurrency automatically.

For more information on spam fighting at Facebook, check out our Protect the Graph page, or watch videos from our recent Spam Fighting @Scale event.

Maximum Likelihood Estimation

(1) The basic idea

Given a probability distribution D, suppose its probability density function f_{D} depends on an unknown parameter \theta. If we draw n samples x_{1},x_{2},...,x_{n} from this distribution, the probability of observing them is

P(x_{1},...,x_{n}) = f_{D}(x_{1},...,x_{n}|\theta).

Here, however, we do not know the value of the parameter \theta, so how to estimate \theta becomes the key question. A simple idea is to draw random samples x_{1},...,x_{n} from the distribution and then use this data to estimate the value of \theta.

The maximum likelihood estimator computes the most plausible value of \theta; that is, the parameter is chosen so as to maximize the probability of the observed sample.

In mathematical language, we first define the likelihood function

L(\theta) = f_{D}(x_{1},...,x_{n}|\theta),

and maximize it over all values of \theta; in practice, we look for the point where the first derivative of L(\theta) vanishes. The value \hat{\theta} that maximizes L(\theta) is called the maximum likelihood estimate of \theta.

Remark. The maximum likelihood estimate need not be unique, and it need not even exist.

(2) The basic algorithm

The general procedure for finding a maximum likelihood estimate:

(1) Define the likelihood function;

(2) Differentiate the likelihood function, or rather its logarithm, which makes the first derivative easier to compute;

(3) Set the first derivative equal to zero to obtain the likelihood equation for the parameter \theta;

(4) Solve the likelihood equation; the solution is the maximum likelihood estimate. If the equation cannot be solved directly, an approximate solution can be obtained with Newton's method.

(3) Examples

(i) Bernoulli distribution

Suppose we have n random samples x_{1},...,x_{n}, where x_{i}=0 if the i-th student does not own a bicycle and x_{i}=1 otherwise, and suppose each x_{i} follows a Bernoulli distribution with unknown parameter p. Our goal is to compute the maximum likelihood estimate of p, the proportion of all students who own a bicycle.

If \{x_{i}:1\leq i\leq n\} are independent Bernoulli random variables, then the probability function of each x_{i} is:

f(x_{i};p)=p^{x_{i}}(1-p)^{1-x_{i}}, \text{ for } x_{i}=0 \text{ or } 1 \text{ and } 0<p<1.

Therefore, the likelihood function L(p) can be defined as:

L(p)=\prod_{i=1}^{n}f(x_{i};p)=p^{\sum_{i=1}^{n}x_{i}}(1-p)^{n-\sum_{i=1}^{n}x_{i}}.

To find the value of the parameter p, we differentiate \ln(L(p)):

\ln(L(p))=(\sum_{i=1}^{n}x_{i})\ln(p) + (n-\sum_{i=1}^{n}x_{i})\ln(1-p)

\frac{\partial\ln(L(p))}{\partial p} = \frac{\sum_{i=1}^{n}x_{i}}{p}-\frac{n-\sum_{i=1}^{n}x_{i}}{1-p}

Setting \frac{\partial\ln(L(p))}{\partial p} = 0, we obtain

p=\sum_{i=1}^{n}x_{i}/n.

That is, the maximum likelihood estimate is

\hat{p}=\sum_{i=1}^{n}x_{i}/n.

(ii) Gaussian distribution

Suppose x_{1},...,x_{n} follow a normal distribution whose parameters \mu and \sigma^{2} are both unknown. The goal is to find the maximum likelihood estimates of the mean \mu and the variance \sigma^{2}.

If \{x_{1},...,x_{n}\} are normally distributed, then the probability density function of each x_{i} is:

f(x_{i};\mu,\sigma^{2}) = \frac{1}{\sqrt{2\pi}\sigma}\exp(-\frac{(x_{i}-\mu)^{2}}{2\sigma^{2}}).

The likelihood function is:

L(\mu,\sigma) = \prod_{i=1}^{n} f(x_{i};\mu,\sigma^{2}) = (2\pi)^{-n/2}\sigma^{-n}\exp(-\frac{\sum_{i=1}^{n}(x_{i}-\mu)^{2}}{2\sigma^{2}})

\ln(L(\mu,\sigma)) = -\frac{n}{2}\ln(2\pi) - n\ln(\sigma) - \frac{\sum_{i=1}^{n}(x_{i}-\mu)^{2}}{2\sigma^{2}}

\frac{\partial \ln(L(\mu,\sigma))}{\partial \mu} = \frac{\sum_{i=1}^{n}(x_{i}-\mu)}{\sigma^{2}}

\frac{\partial \ln(L(\mu,\sigma))}{\partial \sigma} = -\frac{n}{\sigma} + \frac{\sum_{i=1}^{n}(x_{i}-\mu)^{2}}{\sigma^{3}}

Setting \frac{\partial \ln(L(\mu,\sigma))}{\partial \mu} = 0 and \frac{\partial \ln(L(\mu,\sigma))}{\partial \sigma} = 0 and solving the resulting system of equations gives:

\hat{\mu}= \sum_{i=1}^{n}x_{i}/n,

\hat{\sigma}^{2} = \sum_{i=1}^{n}(x_{i}-\hat{\mu})^{2}/n.
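
These closed-form estimates translate directly into code; here is a small illustrative Haskell transcription (the function name is ours):

    -- Gaussian maximum likelihood estimates, exactly as derived above.
    gaussianMLE :: [Double] -> (Double, Double)  -- (mu-hat, sigma^2-hat)
    gaussianMLE xs = (mu, sigmaSq)
      where
        n       = fromIntegral (length xs)
        mu      = sum xs / n
        sigmaSq = sum [(x - mu) ^ 2 | x <- xs] / n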

(iii) Weibull distribution

First, let us recall the definition. The Weibull distribution is a continuous probability distribution whose probability density function is:

f(x;\lambda,k) = \frac{k}{\lambda}(\frac{x}{\lambda})^{k-1}\exp(-(x/\lambda)^{k}) \text{ for } x\geq 0, \quad f(x;\lambda,k)=0 \text{ for } x<0.

Here x is the random variable, \lambda>0 is the scale parameter, and k>0 is the shape parameter. In particular, when k=1 the Weibull distribution reduces to the exponential distribution, and when k=2 it reduces to the Rayleigh distribution.

The cumulative distribution function of the Weibull distribution is

F(x;k,\lambda) = 1- \exp(-(x/\lambda)^{k}) \text{ for } x\geq 0,

F(x;k,\lambda) = 0 \text{ for } x<0.

The quantile function (inverse cumulative distribution function) of the Weibull distribution is

Q(p;k,\lambda) = \lambda(-\ln(1-p))^{1/k} \text{ for } 0\leq p <1.
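
One practical use of the quantile function is inverse-CDF sampling: if U is uniform on (0,1), then Q(U;k,\lambda) follows the Weibull distribution. A one-line Haskell sketch:

    -- Map a Uniform(0,1) draw u to a Weibull(k, lambda) sample via Q.
    weibullSample :: Double -> Double -> Double -> Double
    weibullSample k lambda u = lambda * negate (log (1 - u)) ** (1 / k)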

Next, we compute the maximum likelihood estimates.

Suppose \{x_{1},...,x_{n}\} follow a Weibull distribution with unknown parameters k and \lambda. Then the probability density function of each x_{i} is:

p(x_{i};k,\lambda) = \frac{k}{\lambda}(\frac{x_{i}}{\lambda})^{k-1}\exp(-(\frac{x_{i}}{\lambda})^{k}).

Define the likelihood function as:

L(k,\lambda) = \prod_{i=1}^{n}p(x_{i};k,\lambda)

Taking logarithms gives:

\ln(L(k,\lambda)) = n\ln(k) - nk\ln(\lambda) + (k-1)\sum_{i=1}^{n}\ln(x_{i}) - \sum_{i=1}^{n}x_{i}^{k}/\lambda^{k}.

Computing the first-order partial derivatives gives:

\frac{\partial \ln(L(k,\lambda))}{\partial \lambda} = - \frac{nk}{\lambda} + \frac{k\sum_{i=1}^{n}x_{i}^{k}}{\lambda^{k+1}},

\frac{\partial \ln(L(k,\lambda))}{\partial k} = \frac{n}{k} - n\ln(\lambda) + \sum_{i=1}^{n}\ln(x_{i}) -\sum_{i=1}^{n}(\frac{x_{i}}{\lambda})^{k}\ln(\frac{x_{i}}{\lambda}).

Setting both derivatives equal to zero, we obtain:

\lambda^{k}=\frac{\sum_{i=1}^{n}x_{i}^{k}}{n},

\frac{1}{k} = \frac{\sum_{i=1}^{n}x_{i}^{k}\ln(x_{i})}{\sum_{i=1}^{n}x_{i}^{k}} -\frac{\sum_{i=1}^{n}\ln(x_{i})}{n}.

The first equation yields the maximum likelihood estimate of \lambda. The second is an implicit equation in k that cannot be solved directly; it must be solved numerically, e.g. with Newton's method. Define

f(k) = \frac{\sum_{i=1}^{n}x_{i}^{k}\ln(x_{i})}{\sum_{i=1}^{n}x_{i}^{k}} - \frac{\sum_{i=1}^{n}\ln(x_{i})}{n} - \frac{1}{k}.

Differentiating gives:

f'(k)= \frac{\sum_{i=1}^{n}x_{i}^{k}(\ln(x_{i}))^{2}}{\sum_{i=1}^{n}x_{i}^{k}}-(\frac{\sum_{i=1}^{n}x_{i}^{k}\ln(x_{i})}{\sum_{i=1}^{n}x_{i}^{k}})^{2} + \frac{1}{k^{2}}.

By the Cauchy-Schwarz inequality, we have:

(\sum_{i=1}^{n}x_{i}^{k}(\ln(x_{i}))^{2})\cdot(\sum_{i=1}^{n}x_{i}^{k})\geq (\sum_{i=1}^{n}x_{i}^{k}\ln(x_{i}))^{2}.

Hence f'(k)>0 \text{ for all } k>0; in other words, f(k) is an increasing function of k, and moreover

\lim_{k\rightarrow 0^{+}}f(k) = -\infty,

\lim_{k\rightarrow +\infty}f(k) > 0 \text{ provided } x_{i}>1 \text{ for all } i.

The increasing function f(k) therefore has exactly one zero, so when applying Newton's iteration the initial point can be taken close to zero, say k_{0}=0.0001. Starting from a fairly large number, the Newton iterates may cross over into the negative axis, whereas starting from a small number they stay on the positive axis. Newton's formula is:

k_{0}= 0.0001,

k_{n+1} = k_{n}- \frac{f(k_{n})}{f'(k_{n})} \text{ for all } n\geq 0.

When n is large enough, k_{n} can be taken as the maximum likelihood estimate of k.

How machine learning can help the security industry

Machine learning (ML) is such a hot area in security right now.

At the 2016 RSA Conference, you would be hard pressed to find a company that is not claiming to use ML for security. And why not? To the layperson, ML seems like the magic solution to all security problems. Take a bunch of unlabeled data, pump it through a system with some ML magic inside, and it can somehow identify patterns even human experts can’t find — all while learning and adapting to new behaviors and threats. Rather than having to code the rules, these systems can discover the rules all by themselves.

Oh, if only that were the case! ML is this year’s “big data”: Everyone is claiming to do it, but few actually do it right or even understand what it’s good for. Especially in security, I’ve seen more misapplications than appropriate ones.

Most applications of ML in security use a form of anomaly detection, which is used to spot events that do not match an expected pattern. Anomaly detection is a useful technique in certain circumstances, but too often, vendors misapply it. For example, they will claim to analyze network traffic in an enterprise and use ML to find hackers in your network. This does not work, and you should be immediately skeptical of the vendors who make this claim.

Effective machine learning requires a low-dimensional problem with high-quality labeled data. Unfortunately, deployments in real enterprises have neither. Detecting novel attacks requires either clear, labeled examples of attacks, which by definition you do not have, or a complete, exhaustive understanding of “normal” network behavior, which is impossible for any real network. And any sophisticated attacker will make an attack appear as seamless and “typical” as possible, to avoid setting off alarms.

Where does ML work?

One example where ML and anomaly detection can actually work well for security is in classifying human behavior. Humans, it turns out, are fairly predictable, and it is possible to build fairly accurate models of individual user behavior and detect when it doesn’t match their normal behavior.

We’ve had success in using ML for implicit authentication via analyzing a user’s biometrics, behavior, and environment. Implicit authentication is a technique that allows users to authenticate without performing any explicit actions like entering a password or swiping a fingerprint. This has clear benefits for both the user experience and security. Users don’t need to be bothered with extra steps, we can use many authentication factors (rather than just one, a password), and authentication can happen continuously in the background.

Implicit authentication is well-suited to ML because most of the factors are low dimensional, meaning they involve a small number of parameters, and you can passively gather high-quality labeled data about user identities. Much like ML is effective in matching images for computer vision even in the presence of variance and noise, it is also effective in matching unique human behavioral aspects.

One example of this technology is how we can authenticate users based on unique aspects of the way they move. Attributes of the way you walk, sit, and stand are influenced by a large number of factors (including physiology, age, gender, and muscle memory), but are largely consistent for an individual. It is actually possible to accurately detect some of these attributes from the motion sensors of the phone in your pocket; in fact, four seconds of motion data is enough to capture enough of these attributes to identify you. Another example is using a user’s location history to authenticate them. Humans are creatures of habit, and by looking at where they came from and when, we can make an estimate of whether it’s really them.

There are enough sensors in phones and computers (and more recently, wearables and IoT devices) that it is possible to passively pick up a large number of unique attributes about a user’s behavior and environment. We can then use ML to build a unique model for an individual user and find correlations between factors.
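
As a toy illustration of this flavor of anomaly detection (a sketch in Haskell, not our production models; all names are hypothetical): fit per-user, per-feature Gaussian parameters from a user's labeled history, then flag observations that sit too many standard deviations from that user's norm.

    import Data.List (transpose)

    -- One observation: a fixed-length vector of behavioral features
    -- (e.g., gait statistics derived from the phone's motion sensors).
    type Sample = [Double]

    data Model = Model { featureMeans :: [Double], featureStdDevs :: [Double] }

    -- Per-feature Gaussian fit (sample mean and standard deviation),
    -- ignoring degenerate zero-variance features for simplicity.
    fitUserModel :: [Sample] -> Model
    fitUserModel samples = Model mus sigmas
      where
        n      = fromIntegral (length samples)
        cols   = transpose samples
        mus    = [sum c / n | c <- cols]
        sigmas = [sqrt (sum [(x - mu) ^ 2 | x <- c] / n) | (c, mu) <- zip cols mus]

    -- Flag an observation as anomalous if any feature is more than
    -- `threshold` standard deviations away from this user's mean.
    anomalous :: Double -> Model -> Sample -> Bool
    anomalous threshold (Model mus sigmas) obs =
      or [abs (x - mu) / sigma > threshold | (x, mu, sigma) <- zip3 obs mus sigmas]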

Threat models and anomaly detection

In any security system, it is important to understand the threat models you are trying to protect against. When using ML for security, you need to explicitly gather data, model the threats your system is protecting against, and use the model to train your system. Fortunately, for attacks against authentication, it is often possible to detect behavioral changes. For example, when a device is stolen, there are often clear changes in terms of its movement, location, and usage. And because false negatives are acceptable in that they just require the user to re-authenticate with a different method, we can tune the system to minimize false positives. In fact, once we combine four factors across multiple devices, we can get below a 0.001 percent false positive rate on implicit authentication.

There is no magic machine learning genie that can solve all your security problems. Building an effective security product that uses ML requires a deep understanding of the underlying system, and many security problems are just not appropriate for ML. For those that are, it’s a very powerful technique. And don’t worry, the companies on the hype train will soon move on to newer fads, like mobile self-driving AR blockchain drone marketplaces.

Hausdorff dimension of the graphs of the classical Weierstrass functions

In this paper, we obtain the explicit value of the Hausdorff dimension of the graphs of the classical Weierstrass functions, by proving absolute continuity of the SRB measures of the associated solenoidal attractors.

1. Introduction

In Real Analysis, the classical Weierstrass function is

\displaystyle W_{\lambda,b}(x) = \sum\limits_{n=0}^{\infty} \lambda^n \cos(2\pi b^n x)

with {1/b < \lambda < 1}.

Note that the Weierstrass functions have the form

\displaystyle f^{\phi}_{\lambda,b}(x) = \sum\limits_{n=0}^{\infty} \lambda^n \phi(b^n x)

where {\phi} is a {\mathbb{Z}}-periodic {C^2}-function.

Weierstrass (1872) and Hardy (1916) were interested in {W_{\lambda,b}} because they are concrete examples of continuous but nowhere differentiable functions.

Remark 1 The graph of {f^{\phi}_{\lambda,b}} tends to be a “fractal object” because {f^{\phi}_{\lambda,b}} is self-similar in the sense that

\displaystyle f^{\phi}_{\lambda, b}(x) = \phi(x) + \lambda f^{\phi}_{\lambda,b}(bx)

We will come back to this point later.
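
As a quick numerical sanity check of this self-similarity identity (done in Haskell with a truncated series; the parameters {\lambda=0.7}, {b=2} are arbitrary choices with {1/b<\lambda<1}):

    -- Truncated series for the Weierstrass function with phi(x) = cos(2 pi x).
    wApprox :: Double -> Double -> Double -> Double
    wApprox lam b x =
      sum [lam ^ n * cos (2 * pi * b ^ n * x) | n <- [0 .. 80 :: Int]]

    main :: IO ()
    main = do
      let lam = 0.7
          b   = 2
          x   = 0.3
          lhs = wApprox lam b x
          rhs = cos (2 * pi * x) + lam * wApprox lam b (b * x)
      -- True: the identity holds up to truncation and rounding error.
      print (abs (lhs - rhs) < 1e-9)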

Remark 2 {f^{\phi}_{\lambda,b}} is a {C^{\alpha}}-function for all {0\leq \alpha < \frac{-\log\lambda}{\log b}}. In fact, for all {x,y\in[0,1]}, we have

\displaystyle \frac{f^{\phi}_{\lambda, b}(x) - f^{\phi}_{\lambda,b}(y)}{|x-y|^{\alpha}} = \sum\limits_{n=0}^{\infty} \lambda^n b^{n\alpha} \left(\frac{\phi(b^n x) - \phi(b^n y)}{|b^n x - b^n y|^{\alpha}}\right),

so that

\displaystyle \frac{f^{\phi}_{\lambda, b}(x) - f^{\phi}_{\lambda,b}(y)}{|x-y|^{\alpha}} \leq \|\phi\|_{C^{\alpha}} \sum\limits_{n=0}^{\infty}(\lambda b^{\alpha})^n:=C(\phi,\alpha,\lambda,b) < \infty

whenever {\lambda b^{\alpha} < 1}, i.e., {\alpha < -\log\lambda/\log b}.

The study of the graphs of {W_{\lambda,b}} as fractal sets started with the work of Besicovitch-Ursell in 1937.

Remark 3 The Hausdorff dimension of the graph of a {C^{\alpha}}-function {f:[0,1]\rightarrow\mathbb{R}} is

\displaystyle \textrm{dim}(\textrm{graph}(f))\leq 2 - \alpha

Indeed, for each {n\in\mathbb{N}}, the Hölder continuity condition

\displaystyle |f(x)-f(y)|\leq C|x-y|^{\alpha}

leads us to the “natural cover” of {G=\textrm{graph}(f)} by the family {(R_{j,n})_{j=1}^n} of rectangles given by

\displaystyle R_{j,n}:=\left[\frac{j-1}{n}, \frac{j}{n}\right] \times \left[f(j/n)-\frac{C}{n^{\alpha}}, f(j/n)+\frac{C}{n^{\alpha}}\right]

Nevertheless, a direct calculation with the family {(R_{j,n})_{j=1}^n} does not give us an appropriate bound on {\textrm{dim}(G)}. In fact, since {\textrm{diam}(R_{j,n})\leq 4C/n^{\alpha}} for each {j=1,\dots, n}, we have

\displaystyle \sum\limits_{j=1}^n\textrm{diam}(R_{j,n})^d\leq n\left(\frac{4C}{n^{\alpha}}\right)^d = (4C)^{1/\alpha} < \infty

for {d=1/\alpha}. Because {n\in\mathbb{N}} is arbitrary, we deduce that {\textrm{dim}(G)\leq 1/\alpha}. Of course, this bound is certainly suboptimal for {\alpha<1/2} (because we know that {\textrm{dim}(G)\leq 2 < 1/\alpha} anyway).

Fortunately, we can refine the covering {(R_{j,n})} by taking into account that each rectangle {R_{j,n}} tends to be more vertical than horizontal (i.e., its height {2C/n^{\alpha}} is usually larger than its width {1/n}). More precisely, we can divide each rectangle {R_{j,n}} into {\lfloor n^{1-\alpha}\rfloor} squares, say

\displaystyle R_{j,n} = \bigcup\limits_{k=1}^{\lfloor n^{1-\alpha}\rfloor}Q_{j,n,k},

such that every square {Q_{j,n,k}} has diameter {\leq 2C/n}. In this way, we obtain a covering {(Q_{j,n,k})} of {G} such that

\displaystyle \sum\limits_{j=1}^n\sum\limits_{k=1}^{\lfloor n^{1-\alpha}\rfloor} \textrm{diam}(Q_{j,n,k})^d \leq n\cdot n^{1-\alpha}\cdot\left(\frac{2C}{n}\right)^d\leq (2C)^{2-\alpha}<\infty

for {d=2-\alpha}. Since {n\in\mathbb{N}} is arbitrary, we conclude the desired bound

\displaystyle \textrm{dim}(G)\leq 2-\alpha

A long-standing conjecture about the fractal geometry of {W_{\lambda,b}} is:

Conjecture (Mandelbrot 1977): The Hausdorff dimension of the graph of {W_{\lambda,b}} is

\displaystyle 1<\textrm{dim}(\textrm{graph}(W_{\lambda,b})) = 2 + \frac{\log\lambda}{\log b} < 2

Remark 4 In view of remarks 2 and 3, the whole point of Mandelbrot’s conjecture is to establish the lower bound

\displaystyle \textrm{dim}(\textrm{graph}(W_{\lambda,b})) \geq 2 + \frac{\log\lambda}{\log b}

Remark 5 The analog of Mandelbrot’s conjecture for the box and packing dimensions is known to be true: see, e.g., these papers here and here.

In a recent paper (see here), Shen proved the following result:

Theorem 1 (Shen) For any integer {b\geq 2} and for all {1/b < \lambda < 1}, the Mandelbrot conjecture is true, i.e.,

\displaystyle \textrm{dim}(\textrm{graph}(W_{\lambda,b})) = 2 + \frac{\log\lambda}{\log b}

Remark 6 The techniques employed by Shen also allow him to show that given {\phi:\mathbb{R}\rightarrow\mathbb{R}} a {\mathbb{Z}}-periodic, non-constant, {C^2} function, and given an integer {b\geq 2}, there exists {K=K(\phi,b)>1} such that

\displaystyle \textrm{dim}(\textrm{graph}(f^{\phi}_{\lambda,b})) = 2 + \frac{\log\lambda}{\log b}

for all {1/K < \lambda < 1}.

Remark 7 A previous important result towards Mandelbrot’s conjecture was obtained by Barański-Bárány-Romanowska (in 2014): they proved that for all integers {b\geq 2}, there exists {1/b < \lambda_b < 1} such that

\displaystyle \textrm{dim}(\textrm{graph}(W_{\lambda,b})) = 2 + \frac{\log\lambda}{\log b}

for all {\lambda_b < \lambda < 1}.

The remainder of this post is dedicated to giving some ideas of Shen’s proof of Theorem 1 by discussing the particular case when {1/b<\lambda<2/b} and {b\in\mathbb{N}} is large.

2. Ledrappier’s dynamical approach

If {b\geq 2} is an integer, then the self-similar function {f^{\phi}_{\lambda,b}} (cf. Remark 1) is also {\mathbb{Z}}-periodic, i.e., {f^{\phi}_{\lambda,b}(x+1) = f^{\phi}_{\lambda,b}(x)} for all {x\in\mathbb{R}}. In particular, if {b\geq 2} is an integer, then {\textrm{graph}(f^{\phi}_{\lambda,b})} is an invariant repeller for the endomorphism {\Phi:\mathbb{R}/\mathbb{Z}\times\mathbb{R}\rightarrow \mathbb{R}/\mathbb{Z}\times\mathbb{R}} given by

\displaystyle \Phi(x,y) = \left(bx\textrm{ mod }1, \frac{y-\phi(x)}{\lambda}\right)

This dynamical characterization of {G = \textrm{graph}(f^{\phi}_{\lambda,b})} led Ledrappier to the following criterion for the validity of Mandelbrot’s conjecture when {b\geq 2} is an integer.

Denote by {\mathcal{A}} the alphabet {\mathcal{A}=\{0,\dots,b-1\}}. The unstable manifolds of {\Phi} through {G} have slopes of the form

\displaystyle (1,-\gamma \cdot s(x,u))

where {\frac{1}{b} < \gamma = \frac{1}{\lambda b} <1}, {x\in\mathbb{R}}, {u\in\mathcal{A}^{\mathbb{N}}}, and

\displaystyle s(x,u):=\sum\limits_{n=0}^{\infty} \gamma^n \phi'\left(\frac{x + u_1 + u_2 b + \dots + u_n b^{n-1}}{b^n}\right)

In this context, the push-forwards {m_x := (u\mapsto s(x,u))_*\mathbb{P}} of the Bernoulli measure {\mathbb{P}} on {\mathcal{A}^{\mathbb{N}}} (induced by the discrete measure assigning weight {1/b} to each letter of the alphabet {\mathcal{A}}) play the role of conditional measures along vertical fibers of the unique Sinai-Ruelle-Bowen (SRB) measure {\theta} of the expanding endomorphism {T:\mathbb{R}/\mathbb{Z}\times\mathbb{R} \rightarrow \mathbb{R}/\mathbb{Z}\times\mathbb{R}},

\displaystyle T(x,y) = (bx\textrm{ mod }1, \gamma y + \psi(x)),

where {\gamma=1/\lambda b} and {\psi(x)=\phi'(x)}. In plain terms, this means that

\displaystyle \theta = \int_{\mathbb{R}/\mathbb{Z}} m_x \, d\textrm{Leb}(x) \ \ \ \ \ (1)

where {\theta} is the unique {T}-invariant probability measure which is absolutely continuous along unstable manifolds (see Tsujii’s paper).

As was shown by Ledrappier in 1992, the fractal geometry of the conditional measures {m_x} has important consequences for the fractal geometry of the graph {G}:

Theorem 2 (Ledrappier) Suppose that for Lebesgue almost every {x\in\mathbb{R}} the conditional measures {m_x} have dimension {\textrm{dim}(m_x)=1}, i.e.,

\displaystyle \lim\limits_{r\rightarrow 0}\frac{\log m_x(B(z,r))}{\log r} = 1 \textrm{ for } m_x\textrm{-a.e. } z

Then, the graph {G=\textrm{graph}(f^{\phi}_{\lambda,b})} has Hausdorff dimension

\displaystyle \textrm{dim}(G) = 2 + \frac{\log\lambda}{\log b}

Remark 8 Very roughly speaking, the proof of Ledrappier’s theorem goes as follows. By Remark 4, it suffices to prove that {\textrm{dim}(G)\geq 2 + \frac{\log\lambda}{\log b}}. By the Frostman lemma, we need to construct a Borel measure {\nu} supported on {G} such that

\displaystyle \underline{\textrm{dim}}(\nu) := \textrm{ ess }\inf \underline{d}(\nu,x) \geq 2 + \frac{\log\lambda}{\log b}

where {\underline{d}(\nu,x):=\liminf\limits_{r\rightarrow 0}\log \nu(B(x,r))/\log r}. Finally, the main point is that the assumptions in Ledrappier’s theorem allow one to prove that the measure {\mu^{\phi}_{\lambda, b}} given by the lift to {G} of the Lebesgue measure on {[0,1]} via the map {x\mapsto (x,f^{\phi}_{\lambda,b}(x))} satisfies

\displaystyle \underline{\textrm{dim}}(\mu^{\phi}_{\lambda,b}) \geq 2 + \frac{\log\lambda}{\log b}

An interesting consequence of Ledrappier’s theorem and equation (1) is the following criterion for Mandelbrot’s conjecture:

Corollary 3 If {\theta} is absolutely continuous with respect to the Lebesgue measure {\textrm{Leb}_{\mathbb{R}^2}}, then

\displaystyle \textrm{dim}(G) = 2 + \frac{\log\lambda}{\log b}

Proof: By (1), the absolute continuity of {\theta} implies that {m_x} is absolutely continuous with respect to {\textrm{Leb}_{\mathbb{R}}} for Lebesgue almost every {x\in\mathbb{R}}.

Since {m_x\ll \textrm{Leb}_{\mathbb{R}}} for almost every {x} implies that {\textrm{dim}(m_x)=1} for almost every {x}, the desired corollary now follows from Ledrappier’s theorem. \Box

3. Tsujii’s theorem

The relevance of Corollary 3 is explained by the fact that Tsujii found an explicit transversality condition implying the absolute continuity of {\theta}.

More precisely, Tsujii first introduced the following definition:

Definition 4

  • Given {\varepsilon>0}, {\delta>0} and {x_0\in\mathbb{R}/\mathbb{Z}}, we say that two infinite words {u, v\in\mathcal{A}^{\mathbb{N}}} are {(\varepsilon,\delta)}-transverse at {x_0} if either

    \displaystyle |s(x_0,u)-s(x_0,v)|>\varepsilon

    or

    \displaystyle |s'(x_0,u)-s'(x_0,v)|>\delta

  • Given {q\in\mathbb{N}}, {\varepsilon>0}, {\delta>0} and {x_0\in\mathbb{R}/\mathbb{Z}}, we say that two finite words {k,l\in\mathcal{A}^q} are {(\varepsilon,\delta)}-transverse at {x_0} if {ku}, {lv} are {(\varepsilon,\delta)}-transverse at {x_0} for all pairs of infinite words {u,v\in\mathcal{A}^{\mathbb{N}}}; otherwise, we say that {k} and {l} are {(\varepsilon,\delta)}-tangent at {x_0};
  • {E(q,x_0;\varepsilon,\delta):= \{(k,l)\in\mathcal{A}^q\times\mathcal{A}^q: (k,l) \textrm{ is } (\varepsilon,\delta)\textrm{-tangent at } x_0\}}
  • {E(q,x_0):=\bigcap\limits_{\varepsilon>0}\bigcap\limits_{\delta>0} E(q,x_0;\varepsilon,\delta)};
  • {e(q,x_0):=\max\limits_{k\in\mathcal{A}^q}\#\{l\in\mathcal{A}^q: (k,l)\in E(q,x_0)\}}
  • {e(q):=\max\limits_{x_0\in\mathbb{R}/\mathbb{Z}} e(q,x_0)}.

Next, Tsujii proves the following result:

Theorem 5 (Tsujii) If there exists {q\geq 1} integer such that {e(q)<(\gamma b)^q}, then

\displaystyle \theta\ll\textrm{Leb}_{\mathbb{R}^2}

Remark 9 Intuitively, Tsujii’s theorem says the following. The transversality condition {e(q)<(\gamma b)^q} implies that the majority of strong unstable manifolds {\ell^{uu}} are mutually transverse, so that they almost fill a small neighborhood {U} of some point {x_0} (see the figure below, extracted from this paper of Tsujii). Since the SRB measure {\theta} is absolutely continuous along strong unstable manifolds, the fact that the {\ell^{uu}}‘s almost fill {U} implies that {\theta} becomes “comparable” to the restriction of the Lebesgue measure {\textrm{Leb}_{\mathbb{R}^2}} to {U}.

[Figure from Tsujii’s paper: mutually transverse strong unstable manifolds almost filling a neighborhood]

Remark 10 In this setting, Barański-Bárány-Romanowska obtained their main result by showing that, for adequate choices of the parameters {\lambda} and {b}, one has {e(1)=1}. Indeed, once we know that {e(1)=1}, since {1<\gamma b}, one can apply Tsujii’s theorem and Ledrappier’s theorem (or rather Corollary 3) to derive the validity of Mandelbrot’s conjecture for these parameters {\lambda} and {b}.

For the sake of exposition, we will give just a flavor of the proof of Theorem 1 by sketching the derivation of the following result:

Proposition 6 Let {\phi(x) = \cos(2\pi x)}. If {1/2<\gamma=1/\lambda b <1} and {b\in\mathbb{N}} is sufficiently large, then

\displaystyle e(1)<\gamma b

In particular, by Corollary 3 and Tsujii’s theorem, if {1/2<\gamma=1/\lambda b <1} and {b\in\mathbb{N}} is sufficiently large, then Mandelbrot’s conjecture is valid, i.e.,

\displaystyle \textrm{dim}(\textrm{graph}(W_{\lambda,b})) = 2+\frac{\log\lambda}{\log b}

Remark 11 The proof of Theorem 1 in full generality (i.e., for {b\geq 2} integer and {1/b<\lambda<1}) requires the introduction of a modified version of Tsujii’s transversality condition: roughly speaking, Shen defines a function {\sigma(q)\leq e(q)} (inspired by the Peter-Paul inequality) and he proves

  • (a) a variant of Proposition 6: if {b\geq 2} is an integer and {1/b<\lambda<1}, then {\sigma(q)<(\gamma b)^q} for some integer {q};
  • (b) a variant of Tsujii’s theorem: if {\sigma(q)<(\gamma b)^q} for some integer {q}, then {\theta\ll\textrm{Leb}_{\mathbb{R}^2}}.

See Sections 2, 3, 4 and 5 of Shen’s paper for more details.

We start the (sketch of) proof of Proposition 6 by recalling that the slopes of unstable manifolds are given by

\displaystyle s(x,u):=-2\pi\sum\limits_{n=0}^{\infty} \gamma^n \sin\left(2\pi\frac{x + u_1 + u_2 b + \dots + u_n b^{n-1}}{b^n}\right)

for {x\in\mathbb{R}}, {u\in\mathcal{A}^{\mathbb{N}}}, so that

\displaystyle s'(x,u)=-4\pi^2\sum\limits_{n=0}^{\infty} \left(\frac{\gamma}{b}\right)^n \cos\left(2\pi\frac{x + u_1 + u_2 b + \dots + u_n b^{n-1}}{b^n}\right)

Remark 12 Since {\gamma/b < \gamma}, the series defining {s'(x,u)} converges faster than the series defining {s(x,u)}.

By studying the first term of the expansion of {s(x,u)} and {s'(x,u)} (while treating the remaining terms as a “small error term”), it is possible to show that if {(k,l)\in E(1,x_0)}, then

\displaystyle \left|\sin\left(2\pi\frac{x_0+k}{b}\right) - \sin\left(2\pi\frac{x_0+l}{b}\right)\right| \leq\frac{2\gamma}{1-\gamma} \ \ \ \ \ (2)

and

\displaystyle \left|\cos\left(2\pi\frac{x_0+k}{b}\right) - \cos\left(2\pi\frac{x_0+l}{b}\right)\right| \leq \frac{2\gamma}{b-\gamma} \ \ \ \ \ (3)

(cf. Lemma 3.2 in Shen’s paper).

Using these estimates, we can find an upper bound for {e(1)} as follows. Take {x_0\in\mathbb{R}/\mathbb{Z}} with {e(1)=e(1,x_0)}, and let {k\in\mathcal{A}} be such that {(k,l_1),\dots,(k,l_{e(1)})\in E(1,x_0)} are distinct elements, listed in such a way that

\displaystyle \sin(2\pi x_i)\leq \sin(2\pi x_{i+1})

for all {i=1,\dots,e(1)-1}, where {x_i:=(x_0+l_i)/b}.

From (3), applied to the pairs {(k,l_i)} and {(k,l_{i+1})} together with the triangle inequality, we see that

\displaystyle \left|\cos\left(2\pi x_i\right) - \cos\left(2\pi x_{i+1}\right)\right| \leq \frac{4\gamma}{b-\gamma}

for all {i=1,\dots,e(1)-1}.

Since

\displaystyle (\cos(2\pi x_i)-\cos(2\pi x_{i+1}))^2 + (\sin(2\pi x_i)-\sin(2\pi x_{i+1}))^2 = 4\sin^2(\pi(x_i-x_{i+1}))\geq 4\sin^2(\pi/b),

it follows that

\displaystyle |\sin(2\pi x_i)-\sin(2\pi x_{i+1})|\geq \sqrt{4\sin^2\left(\frac{\pi}{b}\right) - \left(\frac{4\gamma}{b-\gamma}\right)^2} \ \ \ \ \ (4)

Now, we observe that

\displaystyle \sqrt{4\sin^2\left(\frac{\pi}{b}\right) - \left(\frac{4\gamma}{b-\gamma}\right)^2} > \frac{4}{b} \ \ \ \ \ (5)

for {b} large enough. Indeed, this happens because

  • {\sqrt{z^2-w^2}>2(z-w)} if {z+w>4(z-w)};
  • {z+w>4(z-w)} if {z/w:=u < 5/3};
  • {\frac{2\sin(\frac{\pi}{b})}{\frac{4\gamma}{b-\gamma}}\rightarrow \frac{2\pi}{4\gamma} (< \frac{5}{3})} as {b\rightarrow\infty}, and {2\sin(\frac{\pi}{b}) - \frac{4\gamma}{b-\gamma} \rightarrow (2\pi-4\gamma)\frac{1}{b} (>\frac{2}{b})} as {b\rightarrow\infty} (here we used {\gamma<1}).

By combining (4) and (5), we deduce that

\displaystyle |\sin(2\pi x_i)-\sin(2\pi x_{i+1})| > 4/b

for all {i=1,\dots, e(1)-1}.

Since {-1\leq\sin(2\pi x_1)\leq\sin(2\pi x_2)\leq\dots\leq\sin(2\pi x_{e(1)})\leq 1}, the previous estimate implies that

\displaystyle \frac{4}{b}(e(1)-1)<\sum\limits_{i=1}^{e(1)-1}(\sin(2\pi x_{i+1}) - \sin(2\pi x_i)) = \sin(2\pi x_{e(1)}) - \sin(2\pi x_1)\leq 2,

i.e.,

\displaystyle e(1)<1+\frac{b}{2}

Thus, it follows from our assumptions ({\gamma>1/2}, {b} large) that

\displaystyle e(1)<1+\frac{b}{2}<\gamma b

This completes the (sketch of) proof of Proposition 6 (and our discussion of Shen’s talk).

Why are the vast majority of people "low-quality hard workers"?

A note up front: Faced with the breakneck changes of our era, have you ever felt anxious and helpless? Perhaps you have seen what your city looks like at four in the morning, and you have caught the last subway train home. Yet that is not the scary part. The scary part is that you are dog-tired, you feel completely hollowed out, and none of it does you any real good. Perhaps, like me, you have been busy and exhausted every day and yet remained nothing more than a "low-quality hard worker." This article is my own reflection on the question of what "real diligence" means, and I hope to explore it further with you.

(1) A blurry portrait of an ordinary "hard worker"

Office workers rushing about

If you are diligent enough, you probably live the way this era encourages you to live: loving learning, embracing change, riding the winds of rapid growth. Or at least that is what you believe:

First, you have soaked up a good number of inspirational stories that come with no path to realization, you trust that heaven inevitably rewards the diligent, and even on the subway you remember to hold up a copy of The Hard Thing About Hard Things or Rich Dad Poor Dad to match your self-positioning;

Next, you are keenly attuned to where the trends are heading: you never miss a day of Luoji Siwei (罗辑思维) voice broadcasts, you busily shuttle between internet-startup boot camps, you whip out WeChat to scan a QR code at the slightest opportunity, and you imagine you have built connections with all sorts of big names;

Of course, as a member of the rising middle class, you are also all for travel, and impromptu trips are nothing new to you. "Experiencing a different life" certainly makes for an artistic line, but what follows is usually the real point: carefully gathering evidence of your "life elsewhere" with a beauty-filter camera, selectively displaying it on WeChat Moments, and waiting, full of expectation, for 32 likes.

But here is the question: after doing all of the above, will you get the results you want? Have you ever seriously thought about the results at all?

Yes, this is the crux of the matter: what we are discussing is never the "posture of diligence" but the "results that diligence brings."

(2) Performing "diligence," or actually getting things done?

Probably few people would reject the claim that success comes from diligence. Like most people who have a dream, you may well set out at four in the morning to meet a full day of bustle and running around; by the time you finally finish the day's work, too drained to notice how hollowed out you feel, you are rushing for the last subway home. I believe you grind on like this day after day simply because you want better results, to get a little closer to success. Sadly, time is not humane; it treats all things as straw dogs and slips away without pause, day and night. Before you know it, half a year has passed, then another year is gone, and only when you take stock of your harvest do you awkwardly discover the following facts:

1. The IELTS preparation you had planned never got done, so you had to skip the exam or walk in cold, and going abroad fell through;

2. The presentation and writing skills you always meant to improve made no real progress either, so that rare opportunity for public speaking simply slipped away;

3. Even the lovely scene you had been waiting for, confidently declaring yourself to your crush after the diet finally succeeded, never arrived, for reasons we can all guess.

All of this falls far short of the grand ambitions you held when you set your goals. It presses on your nerves until you grow indignant: I invested so much time and did not receive the expected return. It is simply unfair!

Two full presidential terms went by that fast? A little awkward

In fact, I think that calling time unfair is itself the greatest unfairness. Time may be the one magical resource that crosses national boundaries, breaks through class barriers, and ignores the difference between past and present: it is distributed to everyone without discrimination. As for why the same time budget produces such different results, there is indeed a reason: people differ in their ability to perceive time and in how efficiently they use it. Li Xiaolai dissected this in detail in his book Treat Time as a Friend: different levels of mental capacity make the same time resource yield drastically different results in different hands.

Learn to perceive time, and make time your friend

As I see it, this kind of superior mental capacity is above all a strategic advantage: it helps your practice in the way a thinking tool does. But let me take a Ferrari as an example. For a veteran driver to truly unleash its power, beyond "a new enough car and a seasoned enough driver," one more thing is indispensable: there must be enough fuel in the tank.

So what I question has never been whether diligence is useful. My claim is that "performing diligence" is worthless; this seemingly diligent behavior is in essence the protective coloring of a person's "lazy thinking." To borrow a widely circulated line: this is nothing but using tactical diligence to cover up strategic laziness. On the surface you are very hardworking, while in reality you deliberately avoid the genuinely difficult but more valuable parts, and this "lazy thinking" is what finally turns you into the "low-quality hard worker" mentioned at the start of this article.

Actually, I am an actor

Let us return to the scenes from the beginning of the article:

After listening to Luoji Siwei's broadcasts, you order a pile of books on impulse and never read them. That is not hard to understand: buying books is easy, while reading them is much harder. Harder still, you have never thought about which books you should read systematically to better solve your actual problems, or which books would help you the most.

You pay dearly, several thousand yuan, to hear some guru at the height of his fame give a talk, even though the ideas laid out in his 38-yuan book are exactly the same. This too is understandable: attending a talk is both classy and effortless, and you might even hook up with the guru. The less pleasant alternative is still to bury your head in the book. As for whether you can really connect with a big name, I think the only reliable criterion is whether you yourself are big enough; but lazy thinkers will always find reasons to deceive themselves.

As for "traveling to experience life," I fully agree with its value, but I believe its finer qualities remain out of reach for lazy thinkers. I have asked many friends: what is the purpose of your trip? To my surprise, although the answers were all over the map, hardly anyone could give an answer that satisfied even themselves. One woman, however, had thought it through. For her, "travel is a chance to detach from routine daily life and experience a different way of living." Perhaps it is precisely the sense of identification born of that serious thinking that makes her treasure every opportunity to travel: the booking of flights and lodging, the packing of gear and luggage, the travel journal recording her moods, every piece of it carefully planned. I can almost picture the high-quality travel this attitude of active preparation earns her.

Lazy thinking: do you have it?

The behaviors above vary in value, but without exception, you very likely picked the lower-value one. Let me state for the record: although the pronoun I keep using is "you," it is really also "me"; this is a tendency of thought we all share. In fact, once we choose "lazy thinking," we have chosen to be "low-quality hard workers," and with that we have chosen low-value behavior and the low-value results that follow from it.

At this point I have reached a preliminary conclusion: the "low cognitive level" brought about by "lazy thinking" is what produces the "low-quality hard worker." But one thing still puzzled me. So much effort spent, so much hardship endured, so little reward enjoyed: viewed economically, the diligence of the "low-quality hard worker" has an abysmal cost-benefit ratio and no investment value whatsoever. Why, then, do so many people, myself included, keep throwing themselves into it with such relish? Perhaps one sentence from Meituan's Wang Xing gives the game away.

(3) Most people are willing to do anything to avoid real thinking

The first time I saw this sentence I was stunned for a long moment. If you are seeing it for the first time, and you take it to heart, you will probably be shaken too. Its power lies in its refusal of self-deception; it mercilessly rules out every possible excuse. And so what economics could not explain finds its breakthrough in psychology. Human beings are creatures that seek benefit and avoid harm, and for an absolutely long stretch of evolutionary history, humans were never assigned much deep thinking. The reason is simple: stimulus and response alone were enough to solve more than 95 percent of the problems of the past. What the genes never expected was how astonishingly fast humanity would change!

In fact, the psychologist Daniel Kahneman describes this beautifully in his book Thinking, Fast and Slow. Our brains, he argues, have a fast way and a slow way of making decisions. The habitually used, unconscious "System 1" relies on emotion, memory, and experience to make rapid judgments; it is widely informed and quick to react, but easily fooled. The conscious "System 2" mobilizes attention to analyze and solve problems before making a decision; it is slower and less error-prone, but it is lazy, and it often takes the shortcut of adopting System 1's intuitive verdicts.

For a very long time this arrangement, in which the genes adopt System 1's intuitive judgments in a slowly changing environment, posed no problem at all. On the one hand, it produced decisions that, with high probability, were adequate to slow environmental change; on the other hand, the lazy shortcut saved energy, which mattered enormously to early humans, for whom food was extremely costly to obtain. So when we ask why people are given to "lazy thinking," or why they are not in the habit of "deep thinking," we are in effect pressuring our genes to divert energy away from the life-saving gear of conditioned reflex toward the luxury good of deep thinking. To genes that have traveled here from the remote past, this amounts to lowering their carrier's chances of survival. Put simply, at the level of the genes, deep thinking runs against human nature.

Genes did not encourage early humans to think deeply

Now let us travel from the remote past to the present. As far as the eye can see, what kind of thing is today's society? Change, drastic change, extremely drastic change! Change has long since become the tacit consensus between us, and one direct result of this ever-faster change is the exponential growth of information, visible even in the media that carry it: from oracle bones, bamboo slips, and parchment scrolls to printed paper, all the way to virtual storage that is, in theory, unlimited.

Against the knowledge deluge unleashed by this wild sprawl of information, each person's poor cognitive capacity looks utterly insignificant. Cognitive ability has replaced the stockpiling of information as the scarcer resource, building a new wall between people. If at this point you still obey your obstinate genes, continue with lazy thinking, and keep avoiding deep thinking as something against human nature, what will the result be? The result is easy enough to predict: we will fail to enjoy the benefits that growing knowledge and a changing environment bring; at best we will maintain our present quality of life, or else drown in the flood of information, anxious, helpless, and at a loss. There is a good saying: if you want results different from the past, you must do things differently from the past, and the difference has to begin at the level of cognition.

(4) Only "deep thinking" brings the "cognitive upgrade" that makes you a "high-quality hard worker"

On the subject of "deep thinking," there is a remark attributed to Einstein that left a particularly deep impression on me:

If I had one hour to solve a problem on which my life depended, I would spend fifty-five minutes working out what the question was actually asking. Once I knew what it was asking, the remaining five minutes would be enough to answer it.

Imagination is more important than knowledge

Life and death are no small matter. By casting things in life-and-death terms, the passage drives home the importance of "deep thinking" most persuasively. And in real warfare, where lives genuinely hang by a thread, the standing of deep thinking is not weakened by the urgency of chaos and upheaval; the bloodthirsty cruelty of the battlefield raises it higher still.

We all know Sun Tzu's The Art of War. In the chapter on military disposition of this "sacred classic of military science" there is a line: the victorious army wins first and then seeks battle; the defeated army fights first and then seeks victory. In other words, before the two armies cross swords you must prepare thoroughly: collect information from every channel, fully assess the current situation, rack your brains interrogating every hidden danger and every problem that might arise on your own side, then run the war forward in your head, simulating its possible course and carefully working out solutions with the resources at hand. Only when all of this is in place and the two sides actually take the field can you hold everything clearly in mind and be sure of yourself; only then does the battle stand a chance. For the well-prepared side, most of the work of war is finished before the war, inside a deeply thinking mind; stepping onto the battlefield is little more than a formality, for the scales of victory have long since tipped.

Speaking of war, I have to mention someone else here, a military genius I personally admire. Lin Biao, rated by Mao Zedong as an "incomparable, ever-victorious marshal" and praised by Chiang Kai-shek as "the finest general out of Whampoa," swept all before him on the battlefield not on the strength of the simple creed that "when two meet on a narrow road, the braver wins," but on a grasp of the battlefield situation, arrived at through "deep thinking," that was a cut above everyone else's. Many war stories circulate about Lin; the most legendary is probably how "he used big data to capture Liao Yaoxiang alive."

Lin Biao (center) during the war

During the Liaoshen campaign of 1948, late every night Lin Biao heard the military briefing at the forward command post of the Northeast Field Army, with the duty staff officer reading out the day's engagements and captures radioed in by each column, division, and regiment, while Lin carefully recorded the data. At one briefing, on "the captures from the fighting at Hujiawopeng," Lin keenly sensed something odd in the small shift of a figure. Facing a room of baffled subordinates, he pinned down the crux of the matter with three questions:

"Why is the ratio of pistols to rifles captured there slightly higher than in other engagements?"

"Why is the ratio of small carts to large carts captured or destroyed there slightly higher than in other engagements?"

"Why is the ratio of officers to enlisted men captured or killed there slightly higher than in other engagements?"

Before anyone could react, Lin strode to the wall covered with military maps, pointed at a spot, and said: "I conclude that the enemy command post is right here!" Lin could be this certain precisely because of his high-quality diligence: refusing lazy thinking and insisting on deep thought. Long-running data recording and analysis had built those dry numbers into a systematic database in his mind, so the moment a deviation appeared he could spot the difference in time, reason out the accurate intelligence, and find where the key value lay.

With the intelligence Lin had deduced, the command post of the New Sixth Army was soon taken out root and branch. Its commander Liao Yaoxiang, a famous general who had come out of Whampoa, studied at France's celebrated Saint-Cyr military academy, and fought in the Yunnan-Burma campaign, could not accept that the crack field headquarters he had so carefully concealed was destroyed so quickly; unwilling to concede, he called it an accident. But when he learned how Lin Biao had reached his judgment, he said: "I am convinced. To lose to him is no disgrace."

Besides valuing data, the detail of Lin Biao's diligence also shows in the special weight he placed on investigation. As overall commander of the great victory at Pingxingguan, "the first major victory since the Eighth Route Army took the field," he went three times before the battle to survey the ground around Qiaogou at Pingxingguan:

The first time he went with staff officers and a radio. He began at the Pingxingguan pass, climbed the ridge north of it, and studied, map in hand, the hills, gullies, villages, and roads east of the pass. He then went down and followed the road past Xipaochi and Dongpaochi to the stretch from Qiaogou to Donghenan, examining the terrain on both sides of the canyon road;

The second time he went in disguise to scout. He focused on the ground in front of Laoyemiao and the mountainous terrain south of Qiaogou, and a complete plan for an ambush took basic shape in his mind;

The third time, after the mobilization meeting at Shangzhai, Lin Biao and Nie Rongzhen took the brigade and regiment commanders to reconnoiter, assigned each regiment its ambush position on the spot, and fixed the locations of the division, brigade, and regiment command posts.

Preparing for war has always been a tangled business. In the three days before the battle began, combat simulations under every kind of situation never stopped, and that is not counting the wartime troop dispositions, or the pre-battle mobilization of every cadre in the division from company level up. Granted, many factors go into winning a war, but at the very least, Lin Biao's diligent pre-battle preparation, grounded in "deep thinking," deserves no small credit for the outcome at Pingxingguan.

I have a habit: when I find someone out of the ordinary, I analyze the way they think, and there is no better way to understand how a person thinks than to hear it in their own words. In his essay "How to Be a Good Division Commander," Lin Biao himself set out nine points, a meticulous summary of his own working methods. The essay can be found online; it looks plain and unadorned, yet every point carries concentrated force and goes straight to the vital spot. As I see it, very few people could actually carry out these nine points, because they cost too much mental effort: at least four of them demand a heavy investment in "deep thinking." The fifth is such a classic that I quote it unaltered:

5. Think every aspect of the problem through, fully and thoroughly:

In organizing every campaign and every battle, have everyone raise all the problems that might arise, and have everyone look for the answers, and look for them under the worst and gravest scenarios. Answer every problem that has been raised until not a single one is left unanswered; then, when the fighting starts, you will not make major mistakes, and if a mistake is made after all, it will be easier to correct. A problem that has not been answered must not be set aside just because you have thought about it for a long time without result, leaving a knot behind. That is very dangerous: at the critical moment the knot may well surface and leave you without a clear picture, caught off guard. Of course, in a war environment there are many problems to consider; they cannot all be raised at once, nor all answered at once. The whole course of a campaign or battle is a process of constantly raising problems and constantly answering them. Sometimes the brain is very tired, and some problems cannot be answered right away. In that case, besides consulting carefully with others, get a good sleep; once you have slept well and woken up with a clear head, lie in bed and think it over carefully, and things may click; you may think it through, answer it, solve it. In short, no problem may be muddled through. Only when every problem has been answered is the organization of the campaign or battle complete.

I must be honest here: when I first finished reading this essay of Lin Biao's, I was actually quite rattled and broke out in a cold sweat. Never mind taking the field with real weapons; merely reading the nine points showed me how many of them I fall short of. Clearly, upgrading one's cognition through "deep thinking" is no easy matter. At the same time I was relieved, even a little pleased: at least I now know the method, which counts as being on the path.

Having written this far, I have more or less sorted out my own line of thought: diligence matters, and it cannot be overstated; it is a necessary but not sufficient condition for excellent results. How, then, do we make it sufficient as well as necessary? My answer is this: refuse lazy thinking, make deep thinking a habit, and raise your cognitive level. As for how to think deeply, I believe everyone should try to work out their own answer, and I will share my own views on this topic in future articles.

Finishing this article is, for me, only a beginning, because I know full well that "refusing lazy thinking and making a habit of deep thinking" means wrestling with human nature, and I must be ready for a protracted war. At the same time, I hope that everyone who is genuinely diligent can tear off the "low-quality" label and live the high-quality life that your effort deserves.

By 布洛迪的后花园 (a Jianshu author)
Original link: http://www.jianshu.com/p/e26f435b7b0a
Copyright belongs to the author. To republish, please contact the author for permission and credit "Jianshu author."