From: Brian Norris
Date: Tue, 4 Jun 2013 23:14:18 +0000 (-0700)
Subject: Merge branch 'markdown'
X-Git-Tag: oopsla2013-final~15
X-Git-Url: http://demsky.eecs.uci.edu/git/?a=commitdiff_plain;h=a47b5b6b7f12dd7b1a9a8ebe866fb33c5f26dc87;hp=9757697e52c8e15f3994a44d7bf4140d65822b32;p=model-checker.git

Merge branch 'markdown'
---

diff --git a/.gitignore b/.gitignore
index 4acd010..8df9862 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1,6 +1,6 @@
 # generic types
 *.o
-.*.swp
+*.swp
 *.swo
 *.so
 *~
@@ -12,3 +12,4 @@
 /tags
 /doc/docs
 /benchmarks
+/README.html
diff --git a/Makefile b/Makefile
index 0ad9fa6..b7fd60b 100644
--- a/Makefile
+++ b/Makefile
@@ -18,15 +18,20 @@ endif
 
 TESTS_DIR := test
 
-all: $(LIB_SO) tests
+MARKDOWN := doc/Markdown/Markdown.pl
+
+all: $(LIB_SO) tests README.html
 
 debug: CPPFLAGS += -DCONFIG_DEBUG
 debug: all
 
 PHONY += docs
-docs: *.c *.cc *.h
+docs: *.c *.cc *.h README.html
 	doxygen
 
+README.html: README.md
+	$(MARKDOWN) $< > $@
+
 $(LIB_SO): $(OBJECTS)
 	$(CXX) $(SHARED) -o $(LIB_SO) $+ $(LDFLAGS)
 
diff --git a/README b/README
deleted file mode 100644
index 3064ce6..0000000
--- a/README
+++ /dev/null
@@ -1,243 +0,0 @@
-****************************************
- CDSChecker Readme
-****************************************
-
-Copyright (c) 2013 Regents of the University of California. All rights reserved.
-
-CDSChecker is distributed under the GPL v2. See the LICENSE file for details.
-
-This README is divided into sections as follows:
-
- I. Overview
- II. Basic build and run
- III. Running your own code
- IV. Reading an execution trace
-Appendix
- A. References
-
-----------------------------------------
- I. Overview
-----------------------------------------
-
-CDSChecker is a model checker for C11/C++11 exhaustively explores the behaviors
-of code under the C11/C++11 memory model. It uses partial order reduction to
-eliminate redundant executions to significantly shrink the state space.
-The model checking algorithm is described in more detail in this paper -(currently under review): - - http://demsky.eecs.uci.edu/publications/c11modelcheck.pdf - -It is designed to support unit tests on concurrent data structure written using -C11/C++11 atomics. - -CDSChecker is constructed as a dynamically-linked shared library which -implements the C and C++ atomic types and portions of the other thread-support -libraries of C/C++ (e.g., std::atomic, std::mutex, etc.). Notably, we only -support the C version of threads (i.e., thrd_t and similar, from ), -because C++ threads require features which are only available to a C++11 -compiler (and we want to support others, at least for now). - -CDSChecker should compile on Linux and Mac OSX with no dependencies and has been -tested with LLVM (clang/clang++) and GCC. It likely can be ported to other *NIX -flavors. We have not attempted to port to Windows. - -Other references can be found at the main project page: - - http://demsky.eecs.uci.edu/c11modelchecker.php - ----------------------------------------- - II. Basic build and run ----------------------------------------- - -Sample run instructions: - -$ make -$ export LD_LIBRARY_PATH=. -$ ./test/userprog.o # Runs simple test program -$ ./test/userprog.o -h # Prints help information -Copyright (c) 2013 Regents of the University of California. All rights reserved. -Distributed under the GPLv2 -Written by Brian Norris and Brian Demsky - -Usage: ./test/userprog.o [MODEL-CHECKER OPTIONS] -- [PROGRAM ARGS] - -MODEL-CHECKER OPTIONS can be any of the model-checker options listed below. Arguments -provided after the `--' (the PROGRAM ARGS) are passed to the user program. - -Model-checker options: --h, --help Display this help message and exit --m, --liveness=NUM Maximum times a thread can read from the same write - while other writes exist. - Default: 0 --M, --maxfv=NUM Maximum number of future values that can be sent to - the same read. 
- Default: 0 --s, --maxfvdelay=NUM Maximum actions that the model checker will wait for - a write from the future past the expected number - of actions. - Default: 6 --S, --fvslop=NUM Future value expiration sloppiness. - Default: 4 --y, --yield Enable CHESS-like yield-based fairness support. - Default: disabled --Y, --yieldblock Prohibit an execution from running a yield. - Default: disabled --f, --fairness=WINDOW Specify a fairness window in which actions that are - enabled sufficiently many times should receive - priority for execution (not recommended). - Default: 0 --e, --enabled=COUNT Enabled count. - Default: 1 --b, --bound=MAX Upper length bound. - Default: 0 --v[NUM], --verbose[=NUM] Print verbose execution information. NUM is optional: - 0 is quiet; 1 is noisy; 2 is noisier. - Default: 0 --u, --uninitialized=VALUE Return VALUE any load which may read from an - uninitialized atomic. - Default: 0 --t, --analysis=NAME Use Analysis Plugin. --o, --options=NAME Option for previous analysis plugin. - -o help for a list of options - -- Program arguments follow. - -Analysis plugins: -SC - - -Note that we also provide a series of benchmarks (distributed separately), -which can be placed under the benchmarks/ directory. After building CDSChecker, -you can build and run the benchmarks as follows: - - cd benchmarks - make - ./run.sh barrier/barrier -y -m 2 # runs barrier test with fairness/memory liveness - ./bench.sh # run all benchmarks twice, with timing results - ----------------------------------------- - III. Running your own code ----------------------------------------- - -We provide several test and sample programs under the test/ directory, which -should compile and run with no trouble. Of course, you likely want to test your -own code. To do so, you need to perform a few steps. 
- -First, because CDSChecker executes your program dozens (if not hundreds or -thousands) of times, you will have the most success if your code is written as a -unit test and not as a full-blown program. - -Next, test programs should use the standard C11/C++11 library headers -(/, , , ) and must -name their main routine as user_main(int, char**) rather than main(int, char**). -We only support C11 thread syntax (thrd_t, etc. from ). - -Test programs may also use our included happens-before race detector by -including and utilizing the appropriate functions -(store_{8,16,32,64}() and load_{8,16,32,64}()) for loading/storing data from/to -non-atomic shared memory. - -CDSChecker can also check boolean assertions in your test programs. Just -include and use the MODEL_ASSERT() macro in your test program. -CDSChecker will report a bug in any possible execution in which the argument to -MODEL_ASSERT() evaluates to false (that is, 0). - -Test programs should be compiled against our shared library (libmodel.so) using -the headers in the include/ directory. Then the shared library must be made -available to the dynamic linker, using the LD_LIBRARY_PATH environment -variable, for instance. - ----------------------------------------- - IV. Reading an execution trace ----------------------------------------- - -When CDSChecker detects a bug in your program (or when run with the --verbose -flag), it prints the output of the program run (STDOUT) along with some summary -trace information for the execution in question. The trace is given as a -sequence of lines, where each line represents an operation in the execution -trace. These lines are ordered by the order in which they were run by CDSChecker -(i.e., the "execution order"), which does not necessarily align with the "order" -of the values observed (i.e., the modification order or the reads-from -relation). 
- -The following list describes each of the columns in the execution trace output: - - o #: The sequence number within the execution. That is, sequence number "9" - means the operation was the 9th operation executed by CDSChecker. Note that - this represents the execution order, not necessarily any other order (e.g., - modification order or reads-from). - - o t: The thread number - - o Action type: The type of operation performed - - o MO: The memory-order for this operation (i.e., memory_order_XXX, where XXX is - relaxed, release, acquire, rel_acq, or seq_cst) - - o Location: The memory location on which this operation is operating. This is - well-defined for atomic write/read/RMW, but other operations are subject to - CDSChecker implementation details. - - o Value: For reads/writes/RMW, the value returned by the operation. Note that - for RMW, this is the value that is *read*, not the value that was *written*. - For other operations, 'value' may have some CDSChecker-internal meaning, or - it may simply be a don't-care (such as 0xdeadbeef). - - o Rf: For reads, the sequence number of the operation from which it reads. - [Note: If the execution is a partial, infeasible trace (labeled INFEASIBLE), - as printed during --verbose execution, reads may not be resolved and so may - have Rf=? or Rf=Px, where x is a promised future value.] - - o CV: The clock vector, encapsulating the happens-before relation (see our - paper, or the C/C++ memory model itself). We use a Lamport-style clock vector - similar to [1]. The "clock" is just the sequence number (#). The clock vector - can be read as follows: - - Each entry is indexed as CV[i], where - - i = 0, 1, 2, ..., - - So for any thread i, we say CV[i] is the sequence number of the most recent - operation in thread i such that operation i happens-before this operation. - Notably, thread 0 is reserved as a dummy thread for certain CDSChecker - operations. 
- -See the following example trace: - ------------------------------------------------------------------------------------- -# t Action type MO Location Value Rf CV ------------------------------------------------------------------------------------- -1 1 thread start seq_cst 0x7f68ff11e7c0 0xdeadbeef ( 0, 1) -2 1 init atomic relaxed 0x601068 0 ( 0, 2) -3 1 init atomic relaxed 0x60106c 0 ( 0, 3) -4 1 thread create seq_cst 0x7f68fe51c710 0x7f68fe51c6e0 ( 0, 4) -5 2 thread start seq_cst 0x7f68ff11ebc0 0xdeadbeef ( 0, 4, 5) -6 2 atomic read relaxed 0x60106c 0 3 ( 0, 4, 6) -7 1 thread create seq_cst 0x7f68fe51c720 0x7f68fe51c6e0 ( 0, 7) -8 3 thread start seq_cst 0x7f68ff11efc0 0xdeadbeef ( 0, 7, 0, 8) -9 2 atomic write relaxed 0x601068 0 ( 0, 4, 9) -10 3 atomic read relaxed 0x601068 0 2 ( 0, 7, 0, 10) -11 2 thread finish seq_cst 0x7f68ff11ebc0 0xdeadbeef ( 0, 4, 11) -12 3 atomic write relaxed 0x60106c 0x2a ( 0, 7, 0, 12) -13 1 thread join seq_cst 0x7f68ff11ebc0 0x2 ( 0, 13, 11) -14 3 thread finish seq_cst 0x7f68ff11efc0 0xdeadbeef ( 0, 7, 0, 14) -15 1 thread join seq_cst 0x7f68ff11efc0 0x3 ( 0, 15, 11, 14) -16 1 thread finish seq_cst 0x7f68ff11e7c0 0xdeadbeef ( 0, 16, 11, 14) -HASH 4073708854 ------------------------------------------------------------------------------------- - -Now consider, for example, operation 10: - -This is the 10th operation in the execution order. It is an atomic read-relaxed -operation performed by thread 3 at memory address 0x601068. It reads the value -"0", which was written by the 2nd operation in the execution order. Its clock -vector consists of the following values: - - CV[0] = 0, CV[1] = 7, CV[2] = 0, CV[3] = 10 - - ----------------------------------------- - A. References ----------------------------------------- - -[1] L. Lamport. Time, clocks, and the ordering of events in a distributed - system. CACM, 21(7):558–565, July 1978. 
diff --git a/README b/README
new file mode 120000
index 0000000..42061c0
--- /dev/null
+++ b/README
@@ -0,0 +1 @@
+README.md
\ No newline at end of file
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..218e960
--- /dev/null
+++ b/README.md
@@ -0,0 +1,240 @@
+CDSChecker Readme
+=================
+
+Copyright © 2013 Regents of the University of California. All rights reserved.
+
+CDSChecker is distributed under the GPL v2. See the LICENSE file for details.
+
+This README is divided into sections as follows:
+
+ * Overview
+ * Basic build and run
+ * Running your own code
+ * Reading an execution trace
+ * References
+
+Overview
+--------
+
+CDSChecker is a model checker for C11/C++11 exhaustively explores the behaviors
+of code under the C11/C++11 memory model. It uses partial order reduction to
+eliminate redundant executions to significantly shrink the state space.
+The model checking algorithm is described in more detail in this paper
+(currently under review):
+
+ [http://demsky.eecs.uci.edu/publications/c11modelcheck.pdf](http://demsky.eecs.uci.edu/publications/c11modelcheck.pdf)
+
+It is designed to support unit tests on concurrent data structure written using
+C11/C++11 atomics.
+
+CDSChecker is constructed as a dynamically-linked shared library which
+implements the C and C++ atomic types and portions of the other thread-support
+libraries of C/C++ (e.g., std::atomic, std::mutex, etc.). Notably, we only
+support the C version of threads (i.e., `thrd_t` and similar, from ``),
+because C++ threads require features which are only available to a C++11
+compiler (and we want to support others, at least for now).
+
+CDSChecker should compile on Linux and Mac OSX with no dependencies and has been
+tested with LLVM (clang/clang++) and GCC. It likely can be ported to other \*NIX
+flavors. We have not attempted to port to Windows.
+
+Other references can be found at the main project page:
+
+ [http://demsky.eecs.uci.edu/c11modelchecker.php](http://demsky.eecs.uci.edu/c11modelchecker.php)
+
+Basic build and run
+-------------------
+
+Sample run instructions:
+
+
+$ make
+$ export LD_LIBRARY_PATH=.
+$ ./test/userprog.o                   # Runs simple test program
+$ ./test/userprog.o -h                # Prints help information
+Copyright (c) 2013 Regents of the University of California. All rights reserved.
+Distributed under the GPLv2
+Written by Brian Norris and Brian Demsky
+
+Usage: ./test/userprog.o [MODEL-CHECKER OPTIONS] -- [PROGRAM ARGS]
+
+MODEL-CHECKER OPTIONS can be any of the model-checker options listed below. Arguments
+provided after the `--' (the PROGRAM ARGS) are passed to the user program.
+
+Model-checker options:
+-h, --help                  Display this help message and exit
+-m, --liveness=NUM          Maximum times a thread can read from the same write
+                              while other writes exist.
+                              Default: 0
+-M, --maxfv=NUM             Maximum number of future values that can be sent to
+                              the same read.
+                              Default: 0
+-s, --maxfvdelay=NUM        Maximum actions that the model checker will wait for
+                              a write from the future past the expected number
+                              of actions.
+                              Default: 6
+-S, --fvslop=NUM            Future value expiration sloppiness.
+                              Default: 4
+-y, --yield                 Enable CHESS-like yield-based fairness support.
+                              Default: disabled
+-Y, --yieldblock            Prohibit an execution from running a yield.
+                              Default: disabled
+-f, --fairness=WINDOW       Specify a fairness window in which actions that are
+                              enabled sufficiently many times should receive
+                              priority for execution (not recommended).
+                              Default: 0
+-e, --enabled=COUNT         Enabled count.
+                              Default: 1
+-b, --bound=MAX             Upper length bound.
+                              Default: 0
+-v[NUM], --verbose[=NUM]    Print verbose execution information. NUM is optional:
+                              0 is quiet; 1 is noisy; 2 is noisier.
+                              Default: 0
+-u, --uninitialized=VALUE   Return VALUE any load which may read from an
+                              uninitialized atomic.
+                              Default: 0
+-t, --analysis=NAME         Use Analysis Plugin.
+-o, --options=NAME          Option for previous analysis plugin.
+                            -o help for a list of options
+ --                         Program arguments follow.
+
+Analysis plugins:
+SC
+
+ + +Note that we also provide a series of benchmarks (distributed separately), +which can be placed under the benchmarks/ directory. After building CDSChecker, +you can build and run the benchmarks as follows: + + cd benchmarks + make + ./run.sh barrier/barrier -y -m 2 # runs barrier test with fairness/memory liveness + ./bench.sh # run all benchmarks twice, with timing results + +Running your own code +--------------------- + +We provide several test and sample programs under the test/ directory, which +should compile and run with no trouble. Of course, you likely want to test your +own code. To do so, you need to perform a few steps. + +First, because CDSChecker executes your program dozens (if not hundreds or +thousands) of times, you will have the most success if your code is written as a +unit test and not as a full-blown program. + +Next, test programs should use the standard C11/C++11 library headers +(``/``, ``, ``, ``) and must +name their main routine as `user_main(int, char**)` rather than `main(int, char**)`. +We only support C11 thread syntax (`thrd_t`, etc. from ``). + +Test programs may also use our included happens-before race detector by +including and utilizing the appropriate functions +(`store_{8,16,32,64}()` and `load_{8,16,32,64}()`) for loading/storing data from/to +non-atomic shared memory. + +CDSChecker can also check boolean assertions in your test programs. Just +include `` and use the `MODEL_ASSERT()` macro in your test program. +CDSChecker will report a bug in any possible execution in which the argument to +`MODEL_ASSERT()` evaluates to false (that is, 0). + +Test programs should be compiled against our shared library (libmodel.so) using +the headers in the `include/` directory. Then the shared library must be made +available to the dynamic linker, using the `LD_LIBRARY_PATH` environment +variable, for instance. 
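The steps described in "Running your own code" can be sketched as the small test program below. This is a minimal illustration, not code from the repository. It assumes only what the README states: the entry point is named `user_main`, the program uses the standard C11 atomics header, and `MODEL_ASSERT()` is supplied by a CDSChecker header. The `#ifndef` fallback to `assert()` is a hypothetical addition so the sketch also compiles with a plain C11 compiler, and the example is deliberately single-threaded so it does not depend on the details of CDSChecker's thread API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Assumption: when building against CDSChecker, its assertion header defines
 * MODEL_ASSERT. This fallback is hypothetical and only lets the sketch
 * compile on its own. */
#ifndef MODEL_ASSERT
#define MODEL_ASSERT(cond) assert(cond)
#endif

/* Shared atomic variable exercised by the test. */
static atomic_int x;

/* The README requires the entry point to be user_main() instead of main(). */
int user_main(int argc, char **argv)
{
	(void)argc;
	(void)argv;

	atomic_init(&x, 0);
	atomic_store_explicit(&x, 42, memory_order_relaxed);

	/* With a single thread, the load must observe the store above. */
	int v = atomic_load_explicit(&x, memory_order_relaxed);
	MODEL_ASSERT(v == 42);
	return 0;
}
```

When built against CDSChecker, the program would be compiled with the headers in `include/` and linked against `libmodel.so`, as the README describes; the exact compiler flags depend on your setup. To run the sketch standalone, a small `main()` that forwards to `user_main()` is needed.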
+ +Reading an execution trace +-------------------------- + +When CDSChecker detects a bug in your program (or when run with the `--verbose` +flag), it prints the output of the program run (STDOUT) along with some summary +trace information for the execution in question. The trace is given as a +sequence of lines, where each line represents an operation in the execution +trace. These lines are ordered by the order in which they were run by CDSChecker +(i.e., the "execution order"), which does not necessarily align with the "order" +of the values observed (i.e., the modification order or the reads-from +relation). + +The following list describes each of the columns in the execution trace output: + + * \#: The sequence number within the execution. That is, sequence number "9" + means the operation was the 9th operation executed by CDSChecker. Note that + this represents the execution order, not necessarily any other order (e.g., + modification order or reads-from). + + * t: The thread number + + * Action type: The type of operation performed + + * MO: The memory-order for this operation (i.e., `memory_order_XXX`, where XXX is + relaxed, release, acquire, rel_acq, or seq_cst) + + * Location: The memory location on which this operation is operating. This is + well-defined for atomic write/read/RMW, but other operations are subject to + CDSChecker implementation details. + + * Value: For reads/writes/RMW, the value returned by the operation. Note that + for RMW, this is the value that is *read*, not the value that was *written*. + For other operations, 'value' may have some CDSChecker-internal meaning, or + it may simply be a don't-care (such as 0xdeadbeef). + + * Rf: For reads, the sequence number of the operation from which it reads. + [Note: If the execution is a partial, infeasible trace (labeled INFEASIBLE), + as printed during `--verbose` execution, reads may not be resolved and so may + have Rf=? or Rf=Px, where x is a promised future value.] 
+ + * CV: The clock vector, encapsulating the happens-before relation (see our + paper, or the C/C++ memory model itself). We use a Lamport-style clock vector + similar to [1]. The "clock" is just the sequence number (#). The clock vector + can be read as follows: + + Each entry is indexed as CV[i], where + + i = 0, 1, 2, ..., + + So for any thread i, we say CV[i] is the sequence number of the most recent + operation in thread i such that operation i happens-before this operation. + Notably, thread 0 is reserved as a dummy thread for certain CDSChecker + operations. + +See the following example trace: + +
+------------------------------------------------------------------------------------
+#    t    Action type     MO       Location         Value               Rf  CV
+------------------------------------------------------------------------------------
+1    1    thread start    seq_cst  0x7f68ff11e7c0   0xdeadbeef              ( 0,  1)
+2    1    init atomic     relaxed        0x601068   0                       ( 0,  2)
+3    1    init atomic     relaxed        0x60106c   0                       ( 0,  3)
+4    1    thread create   seq_cst  0x7f68fe51c710   0x7f68fe51c6e0          ( 0,  4)
+5    2    thread start    seq_cst  0x7f68ff11ebc0   0xdeadbeef              ( 0,  4,  5)
+6    2    atomic read     relaxed        0x60106c   0                   3   ( 0,  4,  6)
+7    1    thread create   seq_cst  0x7f68fe51c720   0x7f68fe51c6e0          ( 0,  7)
+8    3    thread start    seq_cst  0x7f68ff11efc0   0xdeadbeef              ( 0,  7,  0,  8)
+9    2    atomic write    relaxed        0x601068   0                       ( 0,  4,  9)
+10   3    atomic read     relaxed        0x601068   0                   2   ( 0,  7,  0, 10)
+11   2    thread finish   seq_cst  0x7f68ff11ebc0   0xdeadbeef              ( 0,  4, 11)
+12   3    atomic write    relaxed        0x60106c   0x2a                    ( 0,  7,  0, 12)
+13   1    thread join     seq_cst  0x7f68ff11ebc0   0x2                     ( 0, 13, 11)
+14   3    thread finish   seq_cst  0x7f68ff11efc0   0xdeadbeef              ( 0,  7,  0, 14)
+15   1    thread join     seq_cst  0x7f68ff11efc0   0x3                     ( 0, 15, 11, 14)
+16   1    thread finish   seq_cst  0x7f68ff11e7c0   0xdeadbeef              ( 0, 16, 11, 14)
+HASH 4073708854
+------------------------------------------------------------------------------------
+
+ +Now consider, for example, operation 10: + +This is the 10th operation in the execution order. It is an atomic read-relaxed +operation performed by thread 3 at memory address `0x601068`. It reads the value +"0", which was written by the 2nd operation in the execution order. Its clock +vector consists of the following values: + + CV[0] = 0, CV[1] = 7, CV[2] = 0, CV[3] = 10 + + +References +---------- + +[1] L. Lamport. Time, clocks, and the ordering of events in a distributed + system. CACM, 21(7):558-565, July 1978. diff --git a/doc/Markdown/License.text b/doc/Markdown/License.text new file mode 100644 index 0000000..6d76506 --- /dev/null +++ b/doc/Markdown/License.text @@ -0,0 +1,30 @@ +Copyright (c) 2004, John Gruber + +All rights reserved. + +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are +met: + +* Redistributions of source code must retain the above copyright notice, + this list of conditions and the following disclaimer. + +* Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + +* Neither the name "Markdown" nor the names of its contributors may + be used to endorse or promote products derived from this software + without specific prior written permission. + +This software is provided by the copyright holders and contributors "as +is" and any express or implied warranties, including, but not limited +to, the implied warranties of merchantability and fitness for a +particular purpose are disclaimed. 
In no event shall the copyright owner +or contributors be liable for any direct, indirect, incidental, special, +exemplary, or consequential damages (including, but not limited to, +procurement of substitute goods or services; loss of use, data, or +profits; or business interruption) however caused and on any theory of +liability, whether in contract, strict liability, or tort (including +negligence or otherwise) arising in any way out of the use of this +software, even if advised of the possibility of such damage. diff --git a/doc/Markdown/Markdown Readme.text b/doc/Markdown/Markdown Readme.text new file mode 100644 index 0000000..6fbb95f --- /dev/null +++ b/doc/Markdown/Markdown Readme.text @@ -0,0 +1,341 @@ +Markdown +======== + +Version 1.0.1 - Tue 14 Dec 2004 + +by John Gruber + + + +Introduction +------------ + +Markdown is a text-to-HTML conversion tool for web writers. Markdown +allows you to write using an easy-to-read, easy-to-write plain text +format, then convert it to structurally valid XHTML (or HTML). + +Thus, "Markdown" is two things: a plain text markup syntax, and a +software tool, written in Perl, that converts the plain text markup +to HTML. + +Markdown works both as a Movable Type plug-in and as a standalone Perl +script -- which means it can also be used as a text filter in BBEdit +(or any other application that supporst filters written in Perl). + +Full documentation of Markdown's syntax and configuration options is +available on the web: . +(Note: this readme file is formatted in Markdown.) + + + +Installation and Requirements +----------------------------- + +Markdown requires Perl 5.6.0 or later. Welcome to the 21st Century. +Markdown also requires the standard Perl library module `Digest::MD5`. + + +### Movable Type ### + +Markdown works with Movable Type version 2.6 or later (including +MT 3.0 or later). + +1. Copy the "Markdown.pl" file into your Movable Type "plugins" + directory. 
The "plugins" directory should be in the same directory + as "mt.cgi"; if the "plugins" directory doesn't already exist, use + your FTP program to create it. Your installation should look like + this: + + (mt home)/plugins/Markdown.pl + +2. Once installed, Markdown will appear as an option in Movable Type's + Text Formatting pop-up menu. This is selectable on a per-post basis. + Markdown translates your posts to HTML when you publish; the posts + themselves are stored in your MT database in Markdown format. + +3. If you also install SmartyPants 1.5 (or later), Markdown will offer + a second text formatting option: "Markdown with SmartyPants". This + option is the same as the regular "Markdown" formatter, except that + automatically uses SmartyPants to create typographically correct + curly quotes, em-dashes, and ellipses. See the SmartyPants web page + for more information: + +4. To make Markdown (or "Markdown with SmartyPants") your default + text formatting option for new posts, go to Weblog Config -> + Preferences. + +Note that by default, Markdown produces XHTML output. To configure +Markdown to produce HTML 4 output, see "Configuration", below. + + +### Blosxom ### + +Markdown works with Blosxom version 2.x. + +1. Rename the "Markdown.pl" plug-in to "Markdown" (case is + important). Movable Type requires plug-ins to have a ".pl" + extension; Blosxom forbids it. + +2. Copy the "Markdown" plug-in file to your Blosxom plug-ins folder. + If you're not sure where your Blosxom plug-ins folder is, see the + Blosxom documentation for information. + +3. That's it. The entries in your weblog will now automatically be + processed by Markdown. + +4. If you'd like to apply Markdown formatting only to certain posts, + rather than all of them, see Jason Clark's instructions for using + Markdown in conjunction with Blosxom's Meta plugin: + + + + +### BBEdit ### + +Markdown works with BBEdit 6.1 or later on Mac OS X. 
(It also works +with BBEdit 5.1 or later and MacPerl 5.6.1 on Mac OS 8.6 or later.) + +1. Copy the "Markdown.pl" file to appropriate filters folder in your + "BBEdit Support" folder. On Mac OS X, this should be: + + BBEdit Support/Unix Support/Unix Filters/ + + See the BBEdit documentation for more details on the location of + these folders. + + You can rename "Markdown.pl" to whatever you wish. + +2. That's it. To use Markdown, select some text in a BBEdit document, + then choose Markdown from the Filters sub-menu in the "#!" menu, or + the Filters floating palette + + + +Configuration +------------- + +By default, Markdown produces XHTML output for tags with empty elements. +E.g.: + +
+ +Markdown can be configured to produce HTML-style tags; e.g.: + +
+ + +### Movable Type ### + +You need to use a special `MTMarkdownOptions` container tag in each +Movable Type template where you want HTML 4-style output: + + + ... put your entry content here ... + + +The easiest way to use MTMarkdownOptions is probably to put the +opening tag right after your `` tag, and the closing tag right +before ``. + +To suppress Markdown processing in a particular template, i.e. to +publish the raw Markdown-formatted text without translation into +(X)HTML, set the `output` attribute to 'raw': + + + ... put your entry content here ... + + + +### Command-Line ### + +Use the `--html4tags` command-line switch to produce HTML output from a +Unix-style command line. E.g.: + + % perl Markdown.pl --html4tags foo.text + +Type `perldoc Markdown.pl`, or read the POD documentation within the +Markdown.pl source code for more information. + + + +Bugs +---- + +To file bug reports or feature requests please send email to: +. + + + +Version History +--------------- + +1.0.1 (14 Dec 2004): + ++ Changed the syntax rules for code blocks and spans. Previously, + backslash escapes for special Markdown characters were processed + everywhere other than within inline HTML tags. Now, the contents + of code blocks and spans are no longer processed for backslash + escapes. This means that code blocks and spans are now treated + literally, with no special rules to worry about regarding + backslashes. + + **NOTE**: This changes the syntax from all previous versions of + Markdown. Code blocks and spans involving backslash characters + will now generate different output than before. + ++ Tweaked the rules for link definitions so that they must occur + within three spaces of the left margin. Thus if you indent a link + definition by four spaces or a tab, it will now be a code block. 
+ + [a]: /url/ "Indented 3 spaces, this is a link def" + + [b]: /url/ "Indented 4 spaces, this is a code block" + + **IMPORTANT**: This may affect existing Markdown content if it + contains link definitions indented by 4 or more spaces. + ++ Added `>`, `+`, and `-` to the list of backslash-escapable + characters. These should have been done when these characters + were added as unordered list item markers. + ++ Trailing spaces and tabs following HTML comments and `<hr/>` tags + are now ignored. + ++ Inline links using `<` and `>` URL delimiters weren't working: + + like [this]() + ++ Added a bit of tolerance for trailing spaces and tabs after + Markdown hr's. + ++ Fixed bug where auto-links were being processed within code spans: + + like this: `` + ++ Sort-of fixed a bug where lines in the middle of hard-wrapped + paragraphs, which lines look like the start of a list item, + would accidentally trigger the creation of a list. E.g. a + paragraph that looked like this: + + I recommend upgrading to version + 8. Oops, now this line is treated + as a sub-list. + + This is fixed for top-level lists, but it can still happen for + sub-lists. E.g., the following list item will not be parsed + properly: + + + I recommend upgrading to version + 8. Oops, now this line is treated + as a sub-list. + + Given Markdown's list-creation rules, I'm not sure this can + be fixed. + ++ Standalone HTML comments are now handled; previously, they'd get + wrapped in a spurious `<p>` tag. + ++ Fix for horizontal rules preceded by 2 or 3 spaces. + ++ `<hr/>` HTML tags must occur within three spaces of left + margin. (With 4 spaces or a tab, they should be code blocks, but + weren't before this fix.) + ++ Capitalized "With" in "Markdown With SmartyPants" for + consistency with the same string label in SmartyPants.pl. + (This fix is specific to the MT plug-in interface.) + ++ Auto-linked email address can now optionally contain + a 'mailto:' protocol. I.e. these are equivalent: + + + + ++ Fixed annoying bug where nested lists would wind up with + spurious (and invalid) `

<p>` tags. + ++ You can now write empty links: + + [like this]() + + and they'll be turned into anchor tags with empty href attributes. + This should have worked before, but didn't. + ++ `***this***` and `___this___` are now turned into + + this + + Instead of + + this + + which isn't valid. (Thanks to Michel Fortin for the fix.) + ++ Added a new substitution in `_EncodeCode()`: s/\$/&#036;/g; This + is only for the benefit of Blosxom users, because Blosxom + (sometimes?) interpolates Perl scalars in your article bodies. + ++ Fixed problem for links defined with urls that include parens, e.g.: + + [1]: http://sources.wikipedia.org/wiki/Middle_East_Policy_(Chomsky) + + "Chomsky" was being erroneously treated as the URL's title. + ++ At some point during 1.0's beta cycle, I changed every sub's + argument fetching from this idiom: + + my $text = shift; + + to: + + my $text = shift || return ''; + + The idea was to keep Markdown from doing any work in a sub + if the input was empty. This introduced a bug, though: + if the input to any function was the single-character string + "0", it would also evaluate as false and return immediately. + How silly. Now fixed. + + + +Donations +--------- + +Donations to support Markdown's development are happily accepted. See: + for details. + + + +Copyright and License +--------------------- + +Copyright (c) 2003-2004 John Gruber + +All rights reserved. + +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are +met: + +* Redistributions of source code must retain the above copyright notice, + this list of conditions and the following disclaimer. + +* Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution.
+ +* Neither the name "Markdown" nor the names of its contributors may + be used to endorse or promote products derived from this software + without specific prior written permission. + +This software is provided by the copyright holders and contributors "as +is" and any express or implied warranties, including, but not limited +to, the implied warranties of merchantability and fitness for a +particular purpose are disclaimed. In no event shall the copyright owner +or contributors be liable for any direct, indirect, incidental, special, +exemplary, or consequential damages (including, but not limited to, +procurement of substitute goods or services; loss of use, data, or +profits; or business interruption) however caused and on any theory of +liability, whether in contract, strict liability, or tort (including +negligence or otherwise) arising in any way out of the use of this +software, even if advised of the possibility of such damage. diff --git a/doc/Markdown/Markdown.pl b/doc/Markdown/Markdown.pl new file mode 100755 index 0000000..e4c8469 --- /dev/null +++ b/doc/Markdown/Markdown.pl @@ -0,0 +1,1450 @@ +#!/usr/bin/perl + +# +# Markdown -- A text-to-HTML conversion tool for web writers +# +# Copyright (c) 2004 John Gruber +# +# + + +package Markdown; +require 5.006_000; +use strict; +use warnings; + +use Digest::MD5 qw(md5_hex); +use vars qw($VERSION); +$VERSION = '1.0.1'; +# Tue 14 Dec 2004 + +## Disabled; causes problems under Perl 5.6.1: +# use utf8; +# binmode( STDOUT, ":utf8" ); # c.f.: http://acis.openlib.org/dev/perl-unicode-struggle.html + + +# +# Global default settings: +# +my $g_empty_element_suffix = " />"; # Change to ">" for HTML output +my $g_tab_width = 4; + + +# +# Globals: +# + +# Regex to match balanced [brackets]. See Friedl's +# "Mastering Regular Expressions", 2nd Ed., pp. 328-331. 
+my $g_nested_brackets; +$g_nested_brackets = qr{ + (?> # Atomic matching + [^\[\]]+ # Anything other than brackets + | + \[ + (??{ $g_nested_brackets }) # Recursive set of nested brackets + \] + )* +}x; + + +# Table of hash values for escaped characters: +my %g_escape_table; +foreach my $char (split //, '\\`*_{}[]()>#+-.!') { + $g_escape_table{$char} = md5_hex($char); +} + + +# Global hashes, used by various utility routines +my %g_urls; +my %g_titles; +my %g_html_blocks; + +# Used to track when we're inside an ordered or unordered list +# (see _ProcessListItems() for details): +my $g_list_level = 0; + + +#### Blosxom plug-in interface ########################################## + +# Set $g_blosxom_use_meta to 1 to use Blosxom's meta plug-in to determine +# which posts Markdown should process, using a "meta-markup: markdown" +# header. If it's set to 0 (the default), Markdown will process all +# entries. +my $g_blosxom_use_meta = 0; + +sub start { 1; } +sub story { + my($pkg, $path, $filename, $story_ref, $title_ref, $body_ref) = @_; + + if ( (! $g_blosxom_use_meta) or + (defined($meta::markup) and ($meta::markup =~ /^\s*markdown\s*$/i)) + ){ + $$body_ref = Markdown($$body_ref); + } + 1; +} + + +#### Movable Type plug-in interface ##################################### +eval {require MT}; # Test to see if we're running in MT. +unless ($@) { + require MT; + import MT; + require MT::Template::Context; + import MT::Template::Context; + + eval {require MT::Plugin}; # Test to see if we're running >= MT 3.0. + unless ($@) { + require MT::Plugin; + import MT::Plugin; + my $plugin = new MT::Plugin({ + name => "Markdown", + description => "A plain-text-to-HTML formatting plugin. 
(Version: $VERSION)", + doc_link => 'http://daringfireball.net/projects/markdown/' + }); + MT->add_plugin( $plugin ); + } + + MT::Template::Context->add_container_tag(MarkdownOptions => sub { + my $ctx = shift; + my $args = shift; + my $builder = $ctx->stash('builder'); + my $tokens = $ctx->stash('tokens'); + + if (defined ($args->{'output'}) ) { + $ctx->stash('markdown_output', lc $args->{'output'}); + } + + defined (my $str = $builder->build($ctx, $tokens) ) + or return $ctx->error($builder->errstr); + $str; # return value + }); + + MT->add_text_filter('markdown' => { + label => 'Markdown', + docs => 'http://daringfireball.net/projects/markdown/', + on_format => sub { + my $text = shift; + my $ctx = shift; + my $raw = 0; + if (defined $ctx) { + my $output = $ctx->stash('markdown_output'); + if (defined $output && $output =~ m/^html/i) { + $g_empty_element_suffix = ">"; + $ctx->stash('markdown_output', ''); + } + elsif (defined $output && $output eq 'raw') { + $raw = 1; + $ctx->stash('markdown_output', ''); + } + else { + $raw = 0; + $g_empty_element_suffix = " />"; + } + } + $text = $raw ? 
$text : Markdown($text); + $text; + }, + }); + + # If SmartyPants is loaded, add a combo Markdown/SmartyPants text filter: + my $smartypants; + + { + no warnings "once"; + $smartypants = $MT::Template::Context::Global_filters{'smarty_pants'}; + } + + if ($smartypants) { + MT->add_text_filter('markdown_with_smartypants' => { + label => 'Markdown With SmartyPants', + docs => 'http://daringfireball.net/projects/markdown/', + on_format => sub { + my $text = shift; + my $ctx = shift; + if (defined $ctx) { + my $output = $ctx->stash('markdown_output'); + if (defined $output && $output eq 'html') { + $g_empty_element_suffix = ">"; + } + else { + $g_empty_element_suffix = " />"; + } + } + $text = Markdown($text); + $text = $smartypants->($text, '1'); + }, + }); + } +} +else { +#### BBEdit/command-line text filter interface ########################## +# Needs to be hidden from MT (and Blosxom when running in static mode). + + # We're only using $blosxom::version once; tell Perl not to warn us: + no warnings 'once'; + unless ( defined($blosxom::version) ) { + use warnings; + + #### Check for command-line switches: ################# + my %cli_opts; + use Getopt::Long; + Getopt::Long::Configure('pass_through'); + GetOptions(\%cli_opts, + 'version', + 'shortversion', + 'html4tags', + ); + if ($cli_opts{'version'}) { # Version info + print "\nThis is Markdown, version $VERSION.\n"; + print "Copyright 2004 John Gruber\n"; + print "http://daringfireball.net/projects/markdown/\n\n"; + exit 0; + } + if ($cli_opts{'shortversion'}) { # Just the version number string. + print $VERSION; + exit 0; + } + if ($cli_opts{'html4tags'}) { # Use HTML tag style instead of XHTML + $g_empty_element_suffix = ">"; + } + + + #### Process incoming text: ########################### + my $text; + { + local $/; # Slurp the whole file + $text = <>; + } + print Markdown($text); + } +} + + + +sub Markdown { +# +# Main function. The order in which other subs are called here is +# essential. 
Link and image substitutions need to happen before +# _EscapeSpecialChars(), so that any *'s or _'s in the +# and tags get encoded. +# + my $text = shift; + + # Clear the global hashes. If we don't clear these, you get conflicts + # from other articles when generating a page which contains more than + # one article (e.g. an index page that shows the N most recent + # articles): + %g_urls = (); + %g_titles = (); + %g_html_blocks = (); + + + # Standardize line endings: + $text =~ s{\r\n}{\n}g; # DOS to Unix + $text =~ s{\r}{\n}g; # Mac to Unix + + # Make sure $text ends with a couple of newlines: + $text .= "\n\n"; + + # Convert all tabs to spaces. + $text = _Detab($text); + + # Strip any lines consisting only of spaces and tabs. + # This makes subsequent regexen easier to write, because we can + # match consecutive blank lines with /\n+/ instead of something + # contorted like /[ \t]*\n+/ . + $text =~ s/^[ \t]+$//mg; + + # Turn block-level HTML blocks into hash entries + $text = _HashHTMLBlocks($text); + + # Strip link definitions, store in hashes. + $text = _StripLinkDefinitions($text); + + $text = _RunBlockGamut($text); + + $text = _UnescapeSpecialChars($text); + + return $text . "\n"; +} + + +sub _StripLinkDefinitions { +# +# Strips link definitions from text, stores the URLs and titles in +# hash references. +# + my $text = shift; + my $less_than_tab = $g_tab_width - 1; + + # Link defs are in the form: ^[id]: url "optional title" + while ($text =~ s{ + ^[ ]{0,$less_than_tab}\[(.+)\]: # id = $1 + [ \t]* + \n? # maybe *one* newline + [ \t]* + ? # url = $2 + [ \t]* + \n? # maybe one newline + [ \t]* + (?: + (?<=\s) # lookbehind for whitespace + ["(] + (.+?) # title = $3 + [")] + [ \t]* + )? 
# title is optional + (?:\n+|\Z) + } + {}mx) { + $g_urls{lc $1} = _EncodeAmpsAndAngles( $2 ); # Link IDs are case-insensitive + if ($3) { + $g_titles{lc $1} = $3; + $g_titles{lc $1} =~ s/"/"/g; + } + } + + return $text; +} + + +sub _HashHTMLBlocks { + my $text = shift; + my $less_than_tab = $g_tab_width - 1; + + # Hashify HTML blocks: + # We only want to do this for block-level HTML tags, such as headers, + # lists, and tables. That's because we still want to wrap

s around + # "paragraphs" that are wrapped in non-block-level tags, such as anchors, + # phrase emphasis, and spans. The list of tags we're looking for is + # hard-coded: + my $block_tags_a = qr/p|div|h[1-6]|blockquote|pre|table|dl|ol|ul|script|noscript|form|fieldset|iframe|math|ins|del/; + my $block_tags_b = qr/p|div|h[1-6]|blockquote|pre|table|dl|ol|ul|script|noscript|form|fieldset|iframe|math/; + + # First, look for nested blocks, e.g.: + #

+ # + # The outermost tags must start at the left margin for this to match, and + # the inner nested divs must be indented. + # We need to do this before the next, more liberal match, because the next + # match will start at the first `
<div>` and stop at the first `</div>
`. + $text =~ s{ + ( # save in $1 + ^ # start of line (with /m) + <($block_tags_a) # start tag = $2 + \b # word break + (.*\n)*? # any number of lines, minimally matching + # the matching end tag + [ \t]* # trailing spaces/tabs + (?=\n+|\Z) # followed by a newline or end of document + ) + }{ + my $key = md5_hex($1); + $g_html_blocks{$key} = $1; + "\n\n" . $key . "\n\n"; + }egmx; + + + # + # Now match more liberally, simply from `\n` to `\n` + # + $text =~ s{ + ( # save in $1 + ^ # start of line (with /m) + <($block_tags_b) # start tag = $2 + \b # word break + (.*\n)*? # any number of lines, minimally matching + .* # the matching end tag + [ \t]* # trailing spaces/tabs + (?=\n+|\Z) # followed by a newline or end of document + ) + }{ + my $key = md5_hex($1); + $g_html_blocks{$key} = $1; + "\n\n" . $key . "\n\n"; + }egmx; + # Special case just for
. It was easier to make a special case than + # to make the other regex more complicated. + $text =~ s{ + (?: + (?<=\n\n) # Starting after a blank line + | # or + \A\n? # the beginning of the doc + ) + ( # save in $1 + [ ]{0,$less_than_tab} + <(hr) # start tag = $2 + \b # word break + ([^<>])*? # + /?> # the matching end tag + [ \t]* + (?=\n{2,}|\Z) # followed by a blank line or end of document + ) + }{ + my $key = md5_hex($1); + $g_html_blocks{$key} = $1; + "\n\n" . $key . "\n\n"; + }egx; + + # Special case for standalone HTML comments: + $text =~ s{ + (?: + (?<=\n\n) # Starting after a blank line + | # or + \A\n? # the beginning of the doc + ) + ( # save in $1 + [ ]{0,$less_than_tab} + (?s: + + ) + [ \t]* + (?=\n{2,}|\Z) # followed by a blank line or end of document + ) + }{ + my $key = md5_hex($1); + $g_html_blocks{$key} = $1; + "\n\n" . $key . "\n\n"; + }egx; + + + return $text; +} + + +sub _RunBlockGamut { +# +# These are all the transformations that form block-level +# tags like paragraphs, headers, and list items. +# + my $text = shift; + + $text = _DoHeaders($text); + + # Do Horizontal Rules: + $text =~ s{^[ ]{0,2}([ ]?\*[ ]?){3,}[ \t]*$}{\n tags around block-level tags. + $text = _HashHTMLBlocks($text); + + $text = _FormParagraphs($text); + + return $text; +} + + +sub _RunSpanGamut { +# +# These are all the transformations that occur *within* block-level +# tags like paragraphs, headers, and list items. +# + my $text = shift; + + $text = _DoCodeSpans($text); + + $text = _EscapeSpecialChars($text); + + # Process anchor and image tags. Images must come first, + # because ![foo][f] looks like an anchor. + $text = _DoImages($text); + $text = _DoAnchors($text); + + # Make links out of things like `` + # Must come after _DoAnchors(), because you can use < and > + # delimiters in inline links like [this](). 
+ $text = _DoAutoLinks($text); + + $text = _EncodeAmpsAndAngles($text); + + $text = _DoItalicsAndBold($text); + + # Do hard breaks: + $text =~ s/ {2,}\n/ or tags. +# my $tags_to_skip = qr!<(/?)(?:pre|code|kbd|script|math)[\s>]!; + + foreach my $cur_token (@$tokens) { + if ($cur_token->[0] eq "tag") { + # Within tags, encode * and _ so they don't conflict + # with their use in Markdown for italics and strong. + # We're replacing each such character with its + # corresponding MD5 checksum value; this is likely + # overkill, but it should prevent us from colliding + # with the escape values by accident. + $cur_token->[1] =~ s! \* !$g_escape_table{'*'}!gx; + $cur_token->[1] =~ s! _ !$g_escape_table{'_'}!gx; + $text .= $cur_token->[1]; + } else { + my $t = $cur_token->[1]; + $t = _EncodeBackslashEscapes($t); + $text .= $t; + } + } + return $text; +} + + +sub _DoAnchors { +# +# Turn Markdown link shortcuts into XHTML
tags. +# + my $text = shift; + + # + # First, handle reference-style links: [link text] [id] + # + $text =~ s{ + ( # wrap whole match in $1 + \[ + ($g_nested_brackets) # link text = $2 + \] + + [ ]? # one optional space + (?:\n[ ]*)? # one optional newline followed by spaces + + \[ + (.*?) # id = $3 + \] + ) + }{ + my $result; + my $whole_match = $1; + my $link_text = $2; + my $link_id = lc $3; + + if ($link_id eq "") { + $link_id = lc $link_text; # for shortcut links like [this][]. + } + + if (defined $g_urls{$link_id}) { + my $url = $g_urls{$link_id}; + $url =~ s! \* !$g_escape_table{'*'}!gx; # We've got to encode these to avoid + $url =~ s! _ !$g_escape_table{'_'}!gx; # conflicting with italics/bold. + $result = "? # href = $3 + [ \t]* + ( # $4 + (['"]) # quote char = $5 + (.*?) # Title = $6 + \5 # matching quote + )? # title is optional + \) + ) + }{ + my $result; + my $whole_match = $1; + my $link_text = $2; + my $url = $3; + my $title = $6; + + $url =~ s! \* !$g_escape_table{'*'}!gx; # We've got to encode these to avoid + $url =~ s! _ !$g_escape_table{'_'}!gx; # conflicting with italics/bold. + $result = " tags. +# + my $text = shift; + + # + # First, handle reference-style labeled images: ![alt text][id] + # + $text =~ s{ + ( # wrap whole match in $1 + !\[ + (.*?) # alt text = $2 + \] + + [ ]? # one optional space + (?:\n[ ]*)? # one optional newline followed by spaces + + \[ + (.*?) # id = $3 + \] + + ) + }{ + my $result; + my $whole_match = $1; + my $alt_text = $2; + my $link_id = lc $3; + + if ($link_id eq "") { + $link_id = lc $alt_text; # for shortcut links like ![this][]. + } + + $alt_text =~ s/"/"/g; + if (defined $g_urls{$link_id}) { + my $url = $g_urls{$link_id}; + $url =~ s! \* !$g_escape_table{'*'}!gx; # We've got to encode these to avoid + $url =~ s! _ !$g_escape_table{'_'}!gx; # conflicting with italics/bold. + $result = "\"$alt_text\"";? # src url = $3 + [ \t]* + ( # $4 + (['"]) # quote char = $5 + (.*?) 
# title = $6 + \5 # matching quote + [ \t]* + )? # title is optional + \) + ) + }{ + my $result; + my $whole_match = $1; + my $alt_text = $2; + my $url = $3; + my $title = ''; + if (defined($6)) { + $title = $6; + } + + $alt_text =~ s/"/"/g; + $title =~ s/"/"/g; + $url =~ s! \* !$g_escape_table{'*'}!gx; # We've got to encode these to avoid + $url =~ s! _ !$g_escape_table{'_'}!gx; # conflicting with italics/bold. + $result = "\"$alt_text\"";" . _RunSpanGamut($1) . "\n\n"; + }egmx; + + $text =~ s{ ^(.+)[ \t]*\n-+[ \t]*\n+ }{ + "

" . _RunSpanGamut($1) . "

\n\n"; + }egmx; + + + # atx-style headers: + # # Header 1 + # ## Header 2 + # ## Header 2 with closing hashes ## + # ... + # ###### Header 6 + # + $text =~ s{ + ^(\#{1,6}) # $1 = string of #'s + [ \t]* + (.+?) # $2 = Header text + [ \t]* + \#* # optional closing #'s (not counted) + \n+ + }{ + my $h_level = length($1); + "" . _RunSpanGamut($2) . "\n\n"; + }egmx; + + return $text; +} + + +sub _DoLists { +# +# Form HTML ordered (numbered) and unordered (bulleted) lists. +# + my $text = shift; + my $less_than_tab = $g_tab_width - 1; + + # Re-usable patterns to match list item bullets and number markers: + my $marker_ul = qr/[*+-]/; + my $marker_ol = qr/\d+[.]/; + my $marker_any = qr/(?:$marker_ul|$marker_ol)/; + + # Re-usable pattern to match any entirel ul or ol list: + my $whole_list = qr{ + ( # $1 = whole list + ( # $2 + [ ]{0,$less_than_tab} + (${marker_any}) # $3 = first list item marker + [ \t]+ + ) + (?s:.+?) + ( # $4 + \z + | + \n{2,} + (?=\S) + (?! # Negative lookahead for another list item marker + [ \t]* + ${marker_any}[ \t]+ + ) + ) + ) + }mx; + + # We use a different prefix before nested lists than top-level lists. + # See extended comment in _ProcessListItems(). + # + # Note: There's a bit of duplication here. My original implementation + # created a scalar regex pattern as the conditional result of the test on + # $g_list_level, and then only ran the $text =~ s{...}{...}egmx + # substitution once, using the scalar as the pattern. This worked, + # everywhere except when running under MT on my hosting account at Pair + # Networks. There, this caused all rebuilds to be killed by the reaper (or + # perhaps they crashed, but that seems incredibly unlikely given that the + # same script on the same server ran fine *except* under MT. I've spent + # more time trying to figure out why this is happening than I'd like to + # admit. 
My only guess, backed up by the fact that this workaround works, + # is that Perl optimizes the substition when it can figure out that the + # pattern will never change, and when this optimization isn't on, we run + # afoul of the reaper. Thus, the slightly redundant code to that uses two + # static s/// patterns rather than one conditional pattern. + + if ($g_list_level) { + $text =~ s{ + ^ + $whole_list + }{ + my $list = $1; + my $list_type = ($3 =~ m/$marker_ul/) ? "ul" : "ol"; + # Turn double returns into triple returns, so that we can make a + # paragraph for the last item in a list, if necessary: + $list =~ s/\n{2,}/\n\n\n/g; + my $result = _ProcessListItems($list, $marker_any); + $result = "<$list_type>\n" . $result . "\n"; + $result; + }egmx; + } + else { + $text =~ s{ + (?:(?<=\n\n)|\A\n?) + $whole_list + }{ + my $list = $1; + my $list_type = ($3 =~ m/$marker_ul/) ? "ul" : "ol"; + # Turn double returns into triple returns, so that we can make a + # paragraph for the last item in a list, if necessary: + $list =~ s/\n{2,}/\n\n\n/g; + my $result = _ProcessListItems($list, $marker_any); + $result = "<$list_type>\n" . $result . "\n"; + $result; + }egmx; + } + + + return $text; +} + + +sub _ProcessListItems { +# +# Process the contents of a single ordered or unordered list, splitting it +# into individual list items. +# + + my $list_str = shift; + my $marker_any = shift; + + + # The $g_list_level global keeps track of when we're inside a list. + # Each time we enter a list, we increment it; when we leave a list, + # we decrement. If it's zero, we're not in a list anymore. + # + # We do this because when we're not inside a list, we want to treat + # something like this: + # + # I recommend upgrading to version + # 8. Oops, now this line is treated + # as a sub-list. + # + # As a single paragraph, despite the fact that the second line starts + # with a digit-period-space sequence. 
+ # + # Whereas when we're inside a list (or sub-list), that line will be + # treated as the start of a sub-list. What a kludge, huh? This is + # an aspect of Markdown's syntax that's hard to parse perfectly + # without resorting to mind-reading. Perhaps the solution is to + # change the syntax rules such that sub-lists must start with a + # starting cardinal number; e.g. "1." or "a.". + + $g_list_level++; + + # trim trailing blank lines: + $list_str =~ s/\n{2,}\z/\n/; + + + $list_str =~ s{ + (\n)? # leading line = $1 + (^[ \t]*) # leading whitespace = $2 + ($marker_any) [ \t]+ # list marker = $3 + ((?s:.+?) # list item text = $4 + (\n{1,2})) + (?= \n* (\z | \2 ($marker_any) [ \t]+)) + }{ + my $item = $4; + my $leading_line = $1; + my $leading_space = $2; + + if ($leading_line or ($item =~ m/\n{2,}/)) { + $item = _RunBlockGamut(_Outdent($item)); + } + else { + # Recursion for sub-lists: + $item = _DoLists(_Outdent($item)); + chomp $item; + $item = _RunSpanGamut($item); + } + + "
<li>" . $item . "</li>\n"; + }egmx; + + $g_list_level--; + return $list_str; +} + + + +sub _DoCodeBlocks { +# +# Process Markdown `<pre><code>` blocks.
    +#	
    +
    +	my $text = shift;
    +
    +	$text =~ s{
    +			(?:\n\n|\A)
    +			(	            # $1 = the code block -- one or more lines, starting with a space/tab
    +			  (?:
    +			    (?:[ ]{$g_tab_width} | \t)  # Lines must start with a tab or a tab-width of spaces
    +			    .*\n+
    +			  )+
    +			)
    +			((?=^[ ]{0,$g_tab_width}\S)|\Z)	# Lookahead for non-space at line-start, or end of doc
    +		}{
    +			my $codeblock = $1;
    +			my $result; # return value
    +
    +			$codeblock = _EncodeCode(_Outdent($codeblock));
    +			$codeblock = _Detab($codeblock);
    +			$codeblock =~ s/\A\n+//; # trim leading newlines
    +			$codeblock =~ s/\s+\z//; # trim trailing whitespace
    +
+			$result = "\n\n<pre><code>" . $codeblock . "\n</code></pre>\n\n"; + + $result; + }egmx; + + return $text; +} + + +sub _DoCodeSpans { +# +# * Backtick quotes are used for spans. +# +# * You can use multiple backticks as the delimiters if you want to +# include literal backticks in the code span. So, this input: +# +# Just type ``foo `bar` baz`` at the prompt. +# +# Will translate to: +# +# <p>Just type <code>foo `bar` baz</code> at the prompt.</p>
    +# +# There's no arbitrary limit to the number of backticks you +# can use as delimters. If you need three consecutive backticks +# in your code, use four for delimiters, etc. +# +# * You can use spaces to get literal backticks at the edges: +# +# ... type `` `bar` `` ... +# +# Turns to: +# +# ... type `bar` ... +# + + my $text = shift; + + $text =~ s@ + (`+) # $1 = Opening run of ` + (.+?) # $2 = The code block + (?$c
    "; + @egsx; + + return $text; +} + + +sub _EncodeCode { +# +# Encode/escape certain characters inside Markdown code runs. +# The point is that in code, these characters are literals, +# and lose their special Markdown meanings. +# + local $_ = shift; + + # Encode all ampersands; HTML entities are not + # entities within a Markdown code span. + s/&/&/g; + + # Encode $'s, but only if we're running under Blosxom. + # (Blosxom interpolates Perl variables in article bodies.) + { + no warnings 'once'; + if (defined($blosxom::version)) { + s/\$/$/g; + } + } + + + # Do the angle bracket song and dance: + s! < !<!gx; + s! > !>!gx; + + # Now, escape characters that are magic in Markdown: + s! \* !$g_escape_table{'*'}!gx; + s! _ !$g_escape_table{'_'}!gx; + s! { !$g_escape_table{'{'}!gx; + s! } !$g_escape_table{'}'}!gx; + s! \[ !$g_escape_table{'['}!gx; + s! \] !$g_escape_table{']'}!gx; + s! \\ !$g_escape_table{'\\'}!gx; + + return $_; +} + + +sub _DoItalicsAndBold { + my $text = shift; + + # must go first: + $text =~ s{ (\*\*|__) (?=\S) (.+?[*_]*) (?<=\S) \1 } + {$2}gsx; + + $text =~ s{ (\*|_) (?=\S) (.+?) (?<=\S) \1 } + {$2}gsx; + + return $text; +} + + +sub _DoBlockQuotes { + my $text = shift; + + $text =~ s{ + ( # Wrap whole match in $1 + ( + ^[ \t]*>[ \t]? # '>' at the start of a line + .+\n # rest of the first line + (.+\n)* # subsequent consecutive lines + \n* # blanks + )+ + ) + }{ + my $bq = $1; + $bq =~ s/^[ \t]*>[ \t]?//gm; # trim one level of quoting + $bq =~ s/^[ \t]+$//mg; # trim whitespace-only lines + $bq = _RunBlockGamut($bq); # recurse + + $bq =~ s/^/ /g; + # These leading spaces screw with
<pre> content, so we need to fix that:
+			$bq =~ s{
+					(\s*<pre>.+?</pre>) + }{ + my $pre = $1; + $pre =~ s/^ //mg; + $pre; + }egsx; + + "<blockquote>\n$bq\n</blockquote>\n\n"; + }egmx; + + + return $text; +} + + +sub _FormParagraphs { +# +# Params: +# $text - string to process with html <p> tags +# + my $text = shift; + + # Strip leading and trailing lines: + $text =~ s/\A\n+//; + $text =~ s/\n+\z//; + + my @grafs = split(/\n{2,}/, $text); + + # + # Wrap <p> tags. + # + foreach (@grafs) { + unless (defined( $g_html_blocks{$_} )) { + $_ = _RunSpanGamut($_); + s/^([ \t]*)/<p>/; + $_ .= "</p>
    "; + } + } + + # + # Unhashify HTML blocks + # + foreach (@grafs) { + if (defined( $g_html_blocks{$_} )) { + $_ = $g_html_blocks{$_}; + } + } + + return join "\n\n", @grafs; +} + + +sub _EncodeAmpsAndAngles { +# Smart processing for ampersands and angle brackets that need to be encoded. + + my $text = shift; + + # Ampersand-encoding based entirely on Nat Irons's Amputator MT plugin: + # http://bumppo.net/projects/amputator/ + $text =~ s/&(?!#?[xX]?(?:[0-9a-fA-F]+|\w+);)/&/g; + + # Encode naked <'s + $text =~ s{<(?![a-z/?\$!])}{<}gi; + + return $text; +} + + +sub _EncodeBackslashEscapes { +# +# Parameter: String. +# Returns: The string, with after processing the following backslash +# escape sequences. +# + local $_ = shift; + + s! \\\\ !$g_escape_table{'\\'}!gx; # Must process escaped backslashes first. + s! \\` !$g_escape_table{'`'}!gx; + s! \\\* !$g_escape_table{'*'}!gx; + s! \\_ !$g_escape_table{'_'}!gx; + s! \\\{ !$g_escape_table{'{'}!gx; + s! \\\} !$g_escape_table{'}'}!gx; + s! \\\[ !$g_escape_table{'['}!gx; + s! \\\] !$g_escape_table{']'}!gx; + s! \\\( !$g_escape_table{'('}!gx; + s! \\\) !$g_escape_table{')'}!gx; + s! \\> !$g_escape_table{'>'}!gx; + s! \\\# !$g_escape_table{'#'}!gx; + s! \\\+ !$g_escape_table{'+'}!gx; + s! \\\- !$g_escape_table{'-'}!gx; + s! \\\. !$g_escape_table{'.'}!gx; + s{ \\! }{$g_escape_table{'!'}}gx; + + return $_; +} + + +sub _DoAutoLinks { + my $text = shift; + + $text =~ s{<((https?|ftp):[^'">\s]+)>}{
    $1}gi; + + # Email addresses: + $text =~ s{ + < + (?:mailto:)? + ( + [-.\w]+ + \@ + [-a-z0-9]+(\.[-a-z0-9]+)*\.[a-z]+ + ) + > + }{ + _EncodeEmailAddress( _UnescapeSpecialChars($1) ); + }egix; + + return $text; +} + + +sub _EncodeEmailAddress { +# +# Input: an email address, e.g. "foo@example.com" +# +# Output: the email address as a mailto link, with each character +# of the address encoded as either a decimal or hex entity, in +# the hopes of foiling most address harvesting spam bots. E.g.: +# +# foo +# @example.com +# +# Based on a filter by Matthew Wickline, posted to the BBEdit-Talk +# mailing list: +# + + my $addr = shift; + + srand; + my @encode = ( + sub { '&#' . ord(shift) . ';' }, + sub { '&#x' . sprintf( "%X", ord(shift) ) . ';' }, + sub { shift }, + ); + + $addr = "mailto:" . $addr; + + $addr =~ s{(.)}{ + my $char = $1; + if ( $char eq '@' ) { + # this *must* be encoded. I insist. + $char = $encode[int rand 1]->($char); + } elsif ( $char ne ':' ) { + # leave ':' alone (to spot mailto: later) + my $r = rand; + # roughly 10% raw, 45% hex, 45% dec + $char = ( + $r > .9 ? $encode[2]->($char) : + $r < .45 ? $encode[1]->($char) : + $encode[0]->($char) + ); + } + $char; + }gex; + + $addr = qq{$addr}; + $addr =~ s{">.+?:}{">}; # strip the mailto: from the visible part + + return $addr; +} + + +sub _UnescapeSpecialChars { +# +# Swap back in all the special characters we've hidden. +# + my $text = shift; + + while( my($char, $hash) = each(%g_escape_table) ) { + $text =~ s/$hash/$char/g; + } + return $text; +} + + +sub _TokenizeHTML { +# +# Parameter: String containing HTML markup. +# Returns: Reference to an array of the tokens comprising the input +# string. Each token is either a tag (possibly with nested, +# tags contained therein, such as , or a +# run of text between tags. Each element of the array is a +# two-element array; the first is either 'tag' or 'text'; +# the second is the actual value. 
+#
+#
+#   Derived from the _tokenize() subroutine from Brad Choate's MTRegex plugin.
+#
+#
+
+    my $str = shift;
+    my $pos = 0;
+    my $len = length $str;
+    my @tokens;
+
+    my $depth = 6;
+    my $nested_tags = join('|', ('(?:<[a-z/!$](?:[^<>]') x $depth) . (')*>)' x $depth);
+    my $match = qr/(?s: <! ( -- .*? -- \s* )+ > ) |  # comment
+                   (?s: <\? .*? \?> ) |              # processing instruction
+                   $nested_tags/ix;                   # nested tags
+
+    while ($str =~ m/($match)/g) {
+        my $whole_tag = $1;
+        my $sec_start = pos $str;
+        my $tag_start = $sec_start - length $whole_tag;
+        if ($pos < $tag_start) {
+            push @tokens, ['text', substr($str, $pos, $tag_start - $pos)];
+        }
+        push @tokens, ['tag', $whole_tag];
+        $pos = pos $str;
+    }
+    push @tokens, ['text', substr($str, $pos, $len - $pos)] if $pos < $len;
+    \@tokens;
+}
+
+
+sub _Outdent {
+#
+# Remove one level of line-leading tabs or spaces
+#
+    my $text = shift;
+
+    $text =~ s/^(\t|[ ]{1,$g_tab_width})//gm;
+    return $text;
+}
+
+
+sub _Detab {
+#
+# Cribbed from a post by Bart Lateur:
+#
+#
+    my $text = shift;
+
+    $text =~ s{(.*?)\t}{$1.(' ' x ($g_tab_width - length($1) % $g_tab_width))}ge;
+    return $text;
+}
+
+
+1;
+
+__END__
+
+
+=pod
+
+=head1 NAME
+
+B<Markdown>
+
+
+=head1 SYNOPSIS
+
+B<Markdown.pl> [ B<--html4tags> ] [ B<--version> ] [ B<-shortversion> ]
+    [ I<file> ... ]
+
+
+=head1 DESCRIPTION
+
+Markdown is a text-to-HTML filter; it translates an easy-to-read /
+easy-to-write structured text format into HTML. Markdown's text format
+is most similar to that of plain text email, and supports features such
+as headers, *emphasis*, code blocks, blockquotes, and links.
+
+Markdown's syntax is designed not as a generic markup language, but
+specifically to serve as a front-end to (X)HTML. You can use span-level
+HTML tags anywhere in a Markdown document, and you can use block level
+HTML tags (like <div> and <table> as well).
+
+For more information about Markdown's syntax, see:
+
+    http://daringfireball.net/projects/markdown/
+
+
+=head1 OPTIONS
+
+Use "--" to end switch parsing. For example, to open a file named "-z", use:
+
+    Markdown.pl -- -z
+
+=over 4
+
+
+=item B<--html4tags>
+
+Use HTML 4 style for empty element tags, e.g.:
+
+    <br>
+
+instead of Markdown's default XHTML style tags, e.g.:
+
+    <br />
+
+
+=item B<-v>, B<--version>
+
+Display Markdown's version number and copyright information.
+
+
+=item B<-s>, B<--shortversion>
+
+Display the short-form version number.
+
+
+=back
+
+
+
+=head1 BUGS
+
+To file bug reports or feature requests (other than topics listed in the
+Caveats section above) please send email to:
+
+    support@daringfireball.net
+
+Please include with your report: (1) the example input; (2) the output
+you expected; (3) the output Markdown actually produced.
+
+
+=head1 VERSION HISTORY
+
+See the readme file for detailed release notes for this version.
+
+1.0.1 - 14 Dec 2004
+
+1.0 - 28 Aug 2004
+
+
+=head1 AUTHOR
+
+    John Gruber
+    http://daringfireball.net
+
+    PHP port and other contributions by Michel Fortin
+    http://michelf.com
+
+
+=head1 COPYRIGHT AND LICENSE
+
+Copyright (c) 2003-2004 John Gruber
+
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are
+met:
+
+* Redistributions of source code must retain the above copyright notice,
+  this list of conditions and the following disclaimer.
+
+* Redistributions in binary form must reproduce the above copyright
+  notice, this list of conditions and the following disclaimer in the
+  documentation and/or other materials provided with the distribution.
+
+* Neither the name "Markdown" nor the names of its contributors may
+  be used to endorse or promote products derived from this software
+  without specific prior written permission.
+
+This software is provided by the copyright holders and contributors "as
+is" and any express or implied warranties, including, but not limited
+to, the implied warranties of merchantability and fitness for a
+particular purpose are disclaimed. In no event shall the copyright owner
+or contributors be liable for any direct, indirect, incidental, special,
+exemplary, or consequential damages (including, but not limited to,
+procurement of substitute goods or services; loss of use, data, or
+profits; or business interruption) however caused and on any theory of
+liability, whether in contract, strict liability, or tort (including
+negligence or otherwise) arising in any way out of the use of this
+software, even if advised of the possibility of such damage.
+
+=cut