Caching Youtube

By human

It feels great to finally have YouTube caching (and caching for similar sites) working :D.
After sitting abandoned for months it finally runs, although so far only on a VirtualBox machine.

This how-to is not for people who hate YouTube and Google Maps.
It is for YouTube and Google Maps lovers.

Reference material I read:
http://www.mail-archive.com/squid-users@squid-cache.org/msg54605.html
http://www.mail-archive.com/squid-users@squid-cache.org/msg51076.html
http://wiki.squid-cache.org/Features/StoreUrlRewrite
http://wiki.squid-cache.org/Features/StoreUrlRewrite/RewriteScript

The version I use is squid-2.7.STABLE3; I do not know whether other versions support this.

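  1. Create a script that rewrites the YouTube URLs.
    #!/usr/bin/perl
    # Store-URL rewriter: every request for the same video maps to one cache key.
    $|=1;    # unbuffered output so squid gets each reply immediately
    while (<>) {
        @X = split;
        $url = $X[0];    # the first field of the request line is the URL
        # YouTube: keep only video_id and drop the per-request parameters
        $url =~s@^http://(.*?)/get_video\?(.*)video_id=(.*?)&.*@squid://videos.youtube.INTERNAL/ID=$3@;
        $url =~s@^http://(.*?)/get_video\?(.*)video_id=(.*?)$@squid://videos.youtube.INTERNAL/ID=$3@;
        # Google Video: keep only docid (the "&" form must be tried first)
        $url =~s@^http://(.*?)/videodownload\?(.*)docid=(.*?)&.*@squid://videos.google.INTERNAL/ID=$3@;
        $url =~s@^http://(.*?)/videodownload\?(.*)docid=(.*?)$@squid://videos.google.INTERNAL/ID=$3@;
        print "$url\n";
    }
  2. Then edit squid.conf as shown below: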
    acl store_rewrite_list url_regex ^http://(.*?)/get_video\?
    acl store_rewrite_list url_regex ^http://(.*?)/videodownload\?
    cache allow store_rewrite_list
    
    # Had to uncomment this again, because I couldn't log in to Google Mail using IE6 (Firefox had no trouble):
    acl QUERY urlpath_regex cgi-bin \?
    cache deny QUERY
    
    refresh_pattern ^http://(.*?)/get_video\? 10080 90% 999999 override-expire ignore-no-cache ignore-private
    refresh_pattern ^http://(.*?)/videodownload\? 10080 90% 999999 override-expire ignore-no-cache ignore-private
    
    storeurl_access allow store_rewrite_list
    storeurl_access deny all
    
    storeurl_rewrite_program /usr/local/bin/store_url_rewrite
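
The reason this produces cache hits is that requests for the same video differ only in their per-request parameters. A minimal Perl sketch, assuming the same get_video substitutions as the script above wrapped in a hypothetical `store_key` sub (the sub name is an illustration, not part of squid), shows two request URLs collapsing to one store URL:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: applies the get_video substitutions from the
# rewrite script and returns the resulting store URL.
sub store_key {
    my ($url) = @_;
    $url =~ s@^http://(.*?)/get_video\?(.*)video_id=(.*?)&.*@squid://videos.youtube.INTERNAL/ID=$3@;
    $url =~ s@^http://(.*?)/get_video\?(.*)video_id=(.*?)$@squid://videos.youtube.INTERNAL/ID=$3@;
    return $url;
}

# Two requests for the same video; only the t= token differs.
my @requests = (
    'http://youtube.com/get_video?video_id=2d55B-SiJdM&t=OEgsToPDskKrwAAE_vVIhOqMhPqmPDUQ',
    'http://youtube.com/get_video?video_id=2d55B-SiJdM&t=OEgsToPDskLGVqEnxKjLEN4DGA3HYGse',
);

# Both lines print the same store URL.
print store_key($_), "\n" for @requests;
```

Because both requests end up with the same store URL, squid can serve the second one from the object cached by the first.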

The result shows up in the access log: when the same video is requested again, it is served straight from the cache as a hit.

# grep youtube access.log | grep TCP_HIT

1214834411.379    735 192.168.1.89 TCP_HIT/200 1604459 GET http://youtube.com/get_video?video_id=2d55B-SiJdM&t=OEgsToPDskKrwAAE_vVIhOqMhPqmPDUQ - NONE/- video/flv
1214834487.090    818 192.168.1.94 TCP_HIT/200 1604459 GET http://youtube.com/get_video?video_id=2d55B-SiJdM&t=OEgsToPDskLGVqEnxKjLEN4DGA3HYGse - NONE/- video/flv
1214836269.353   4383 192.168.1.91 TCP_HIT/200 9533167 GET http://youtube.com/get_video?video_id=i6cKRT12jgw&t=OEgsToPDskKeQxYVvYZ7fgEIW4UNC_U- - NONE/- video/flv
1214836514.802   3757 192.168.1.91 TCP_HIT/200 9533167 GET http://youtube.com/get_video?video_id=i6cKRT12jgw&t=OEgsToPDskIEwsTb26LiGFc96hBUUa9Z - NONE/- video/flv
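
For more than a quick grep, the result codes can be tallied. The following is only a sketch, assuming squid's native access.log layout (fourth field is CODE/STATUS, seventh field is the URL); `tally_results` is a hypothetical helper name, and the sample lines are two of the log entries above:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count squid result codes (TCP_HIT, TCP_MISS, ...)
# for get_video requests in native-format access.log lines.
sub tally_results {
    my (@lines) = @_;
    my %count;
    for my $line (@lines) {
        my @f = split ' ', $line;            # whitespace-separated fields
        next unless @f >= 7 && $f[6] =~ m@/get_video\?@;
        my ($code) = split m@/@, $f[3];      # "TCP_HIT/200" -> "TCP_HIT"
        $count{$code}++;
    }
    return %count;
}

my @sample = (
    '1214834411.379    735 192.168.1.89 TCP_HIT/200 1604459 GET http://youtube.com/get_video?video_id=2d55B-SiJdM&t=OEgsToPDskKrwAAE_vVIhOqMhPqmPDUQ - NONE/- video/flv',
    '1214836269.353   4383 192.168.1.91 TCP_HIT/200 9533167 GET http://youtube.com/get_video?video_id=i6cKRT12jgw&t=OEgsToPDskKeQxYVvYZ7fgEIW4UNC_U- - NONE/- video/flv',
);

my %count = tally_results(@sample);
printf "%s %d\n", $_, $count{$_} for sort keys %count;
```

To run it against a real log, the lines of access.log would be read from the file instead of the `@sample` array.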

A note from Horacio Herrera Gonzalez, since the basic script is not specific to particular URLs:

Warning! This code may match other sites not related to YT or GV.

He he he he, watching your bandwidth.
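
One way to reduce the risk in that warning is to rewrite only when the host belongs to the intended site. This is a sketch under an assumption: that the video requests of interest come from youtube.com or one of its subdomains. The sub name `store_key_restricted` and the example.com URL are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical host-restricted variant of the get_video rewrite.
sub store_key_restricted {
    my ($url) = @_;
    # Rewrite only for youtube.com or a subdomain of it (assumed host list).
    if ($url =~ m@^http://(?:[^/]+\.)?youtube\.com/get_video\?(?:.*&)?video_id=([^&]+)@) {
        return "squid://videos.youtube.INTERNAL/ID=$1";
    }
    return $url;    # any other site keeps its own URL as the cache key
}

print store_key_restricted('http://youtube.com/get_video?video_id=2d55B-SiJdM&t=x'), "\n";
print store_key_restricted('http://example.com/get_video?video_id=2d55B-SiJdM'), "\n";
```

The acl and refresh_pattern lines in squid.conf would need the same host restriction for this to have any effect, since they decide which requests reach the rewriter at all.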

Because several users had trouble applying the YouTube caching, the steps below show the sequence on my server.

  1. I use the TSL 3.05 distro with squid-2.7.STABLE3.
  2. ./configure \
    --sysconfdir=/etc/squid \
    --prefix=/usr \
    --enable-async-io \
    --enable-removal-policies=lru,heap \
    --disable-delay-pools \
    --disable-wccp \
    --disable-wccp2 \
    --enable-kill-parent-hack \
    --enable-snmp \
    --enable-default-err-languages=English --enable-err-languages=English \
    --enable-linux-netfilter \
    --disable-auth
  3. The squid.conf with the comment lines (^#) filtered out:
    acl all src all
    acl manager proto cache_object
    acl localhost src 127.0.0.1/32
    acl to_localhost dst 127.0.0.0/8
    acl localnet src 10.0.0.0/8	# RFC1918 possible internal network
    acl localnet src 172.16.0.0/12	# RFC1918 possible internal network
    acl localnet src 192.168.0.0/16	# RFC1918 possible internal network
    acl SSL_ports port 443
    acl Safe_ports port 80		# http
    acl Safe_ports port 21		# ftp
    acl Safe_ports port 443		# https
    acl Safe_ports port 70		# gopher
    acl Safe_ports port 210		# wais
    acl Safe_ports port 1025-65535	# unregistered ports
    acl Safe_ports port 280		# http-mgmt
    acl Safe_ports port 488		# gss-http
    acl Safe_ports port 591		# filemaker
    acl Safe_ports port 777		# multiling http
    acl CONNECT method CONNECT
    http_access allow manager localhost
    http_access deny manager
    http_access deny !Safe_ports
    http_access deny CONNECT !SSL_ports
    http_access allow localnet
    http_access deny all
    icp_access allow localnet
    icp_access deny all
    http_port 3128 transparent
    hierarchy_stoplist cgi-bin ?
    cache_mem 6 MB
    maximum_object_size_in_memory 32 KB
    memory_replacement_policy heap GDSF
    cache_replacement_policy heap LFUDA
    cache_dir aufs /nfs/cache 20000 16 256
    maximum_object_size 64 MB
    cache_swap_low 98
    cache_swap_high 99
    access_log /var/log/squid/access.log squid
    cache_log /var/log/squid/cache.log
    cache_store_log none
    log_fqdn off
    storeurl_rewrite_program /etc/squid/store_url_rewrite
    acl store_rewrite_list url_regex ^http://(.*?)/get_video\?
    acl store_rewrite_list url_regex ^http://(.*?)/videodownload\?
    storeurl_access allow store_rewrite_list
    storeurl_access deny all
    cache allow store_rewrite_list
    acl QUERY urlpath_regex cgi-bin \?
    cache deny QUERY
    refresh_pattern ^http://(.*?)/get_video\? 10080 90% 999999 override-expire ignore-no-cache ignore-private
    refresh_pattern ^http://(.*?)/videodownload\? 10080 90% 999999 override-expire ignore-no-cache ignore-private
    refresh_pattern ^ftp:		1440	20%	10080
    refresh_pattern ^gopher:	1440	0%	1440
    refresh_pattern -i (/cgi-bin/|\?) 0	0%	0
    refresh_pattern .		0	20%	4320
    quick_abort_min 0
    quick_abort_max 0
    quick_abort_pct 98
    acl apache rep_header Server ^Apache
    broken_vary_encoding allow apache
    vary_ignore_expire on
    cache_effective_user squid
    cache_effective_group squid
    log_icp_queries off
    ipcache_size 2048
    ipcache_low 98
    ipcache_high 99
    memory_pools off
    reload_into_ims on
    coredump_dir /usr/var/cache
    pipeline_prefetch on

Caching photobucket

Contributed by apit (YM ID relative_04): caching for Photobucket, which is widely used on Friendster.

In store_url_rewrite:

# keep the album path as the cache key; drop the host number and the query string
$url =~s@^http://i(.*?)\.photobucket\.com/albums/(.*?)\?.*@squid://images.photobucket.INTERNAL/ID=$2@;
$url =~s@^http://vid(.*?)\.photobucket\.com/albums/(.*?)\?.*@squid://videos.photobucket.INTERNAL/ID=$2@;

In squid.conf:

acl store_rewrite_list url_regex ^http://i(.*?).photobucket.com/albums/(.*?)/(.*?)/(.*?)\?
acl store_rewrite_list url_regex ^http://vid(.*?).photobucket.com/albums/(.*?)/(.*?)\?

refresh_pattern ^http://i(.*?).photobucket.com/albums/(.*?)/(.*?)/(.*?)\? 43200 90% 999999 override-expire ignore-no-cache ignore-private
refresh_pattern ^http://vid(.*?).photobucket.com/albums/(.*?)/(.*?)\? 43200 90% 999999 override-expire ignore-no-cache ignore-private

Result:

TCP_HIT/200 5474813 GET http://vid264.photobucket.com/albums/ii163/shannonwiseman12/DSCN0212.flv - NONE/- text/plain
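
As an illustration of the idea, here is a minimal sketch that maps Photobucket image and video URLs to a store URL built from the album path only. It assumes that the number in the host name and the query string do not identify the object; `photobucket_key` is a hypothetical helper, and the second URL below (host vid999, query ?t=1) is made up for the comparison.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: reduce a Photobucket URL to a key based on the album path.
sub photobucket_key {
    my ($url) = @_;
    # Images: iNNN.photobucket.com/albums/<path>[?query]
    $url =~ s@^http://i[^/.]*\.photobucket\.com/albums/([^?]*)(?:\?.*)?$@squid://images.photobucket.INTERNAL/ID=$1@;
    # Videos: vidNNN.photobucket.com/albums/<path>[?query]
    $url =~ s@^http://vid[^/.]*\.photobucket\.com/albums/([^?]*)(?:\?.*)?$@squid://videos.photobucket.INTERNAL/ID=$1@;
    return $url;
}

print photobucket_key('http://vid264.photobucket.com/albums/ii163/shannonwiseman12/DSCN0212.flv'), "\n";
print photobucket_key('http://vid999.photobucket.com/albums/ii163/shannonwiseman12/DSCN0212.flv?t=1'), "\n";
```

Both calls print the same key, which is what lets the second request be answered from the cache.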

Update script

YouTube appears to have changed their system around the first quarter of 2009.
As a result the script above no longer works; fixing it requires changes to the script and to parts of the configuration.
Fortunately a guide already exists at http://wiki.squid-cache.org/ConfigExamples/DynamicContent/YouTube/Discussion

I tested the configuration below on a VMware machine running CentOS 5.2, in July 2009.

To make things easier, I include the modified squid.conf and the URL rewriter script.

acl all src all
acl manager proto cache_object
acl localhost src 127.0.0.1/32
acl to_localhost dst 127.0.0.0/8
acl localnet src 10.0.0.0/8
acl localnet src 172.16.0.0/12
acl localnet src 192.168.0.0/16
acl SSL_ports port 443
acl Safe_ports port 80
acl Safe_ports port 21
acl Safe_ports port 443
acl Safe_ports port 70
acl Safe_ports port 210
acl Safe_ports port 1025-65535
acl Safe_ports port 280
acl Safe_ports port 488
acl Safe_ports port 591
acl Safe_ports port 777
acl CONNECT method CONNECT
http_access allow manager localhost
http_access deny manager
http_access deny !Safe_ports
http_access deny CONNECT !SSL_ports
http_access allow localnet
http_access deny all
icp_access allow localnet
icp_access deny all
http_port 3128 transparent
hierarchy_stoplist cgi-bin ?
cache_mem 6 MB
maximum_object_size_in_memory 32 KB
memory_replacement_policy heap GDSF
cache_replacement_policy heap LFUDA
cache_dir aufs /cache 20000 16 256
maximum_object_size 64 MB
cache_swap_low 98
cache_swap_high 99
access_log /var/log/squid/access.log squid
cache_log /var/log/squid/cache.log
cache_store_log none
log_fqdn off

#storeurl_rewrite_program /etc/squid/store_url_rewrite
#acl store_rewrite_list url_regex ^http://(.*?)/get_video\?
#acl store_rewrite_list url_regex ^http://(.*?)/videoplayback\?

acl store_rewrite_list urlpath_regex \/(get_video\?|videodownload\?|videoplayback.*id) \.(jp(e?g|e|2)|gif|png|tiff?|bmp|ico|flv)\? \/ads\?
acl store_rewrite_list_web url_regex ^http:\/\/([A-Za-z-]+[0-9]+)*\.[A-Za-z]*\.[A-Za-z]*
acl store_rewrite_list_path urlpath_regex \.(jp(e?g|e|2)|gif|png|tiff?|bmp|ico|flv)$

acl store_rewrite_list_web_CDN url_regex ^http:\/\/[a-z]+[0-9]\.google\.com doubleclick\.net
acl QUERY2 urlpath_regex get_video\? videoplayback\? \.(jp(e?g|e|2)|gif|png|tiff?|bmp|ico|flv)\?
cache allow QUERY2
cache allow store_rewrite_list_web_CDN

acl QUERY urlpath_regex cgi-bin \?
cache deny QUERY

storeurl_access allow store_rewrite_list
# this is not related to YouTube videos; it is only for CDN pictures
storeurl_access allow store_rewrite_list_web_CDN
storeurl_access allow store_rewrite_list_web store_rewrite_list_path
storeurl_access deny all
# the rewrite program path was based on Windows, so use your own path
storeurl_rewrite_program /etc/squid/cacheyoutube2.pl
storeurl_rewrite_children 1
storeurl_rewrite_concurrency 10

refresh_pattern ^http://(.*?)/get_video\? 10080 90% 999999 override-expire ignore-no-cache ignore-private
refresh_pattern ^http://(.*?)/videoplayback\? 10080 90% 999999 override-expire ignore-no-cache ignore-private

refresh_pattern -i (get_video\?|videoplayback\?id|videoplayback.*id) 161280 50000% 525948 override-expire ignore-reload
#and for pictures
refresh_pattern -i \.(jp(e?g|e|2)|gif|png|tiff?|bmp|ico|flv)(\?|$) 161280 3000% 525948 override-expire reload-into-ims

refresh_pattern ^ftp:		1440	20%	10080
refresh_pattern ^gopher:	1440	0%	1440
refresh_pattern -i (/cgi-bin/|\?) 0	0%	0
refresh_pattern .		0	20%	4320
quick_abort_min 0
quick_abort_max 0
quick_abort_pct 98
acl apache rep_header Server ^Apache
broken_vary_encoding allow apache
vary_ignore_expire on
cache_effective_user squid
cache_effective_group squid
log_icp_queries off
ipcache_size 2048
ipcache_low 98
ipcache_high 99
memory_pools off
reload_into_ims on
coredump_dir /usr/var/cache
pipeline_prefetch on
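
The min and max fields of refresh_pattern are given in minutes, which makes values such as 161280 or 999999 hard to read at a glance. A small sketch that converts the values used in these configurations into days (`minutes_to_days` is a hypothetical helper name):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: refresh_pattern min/max values are minutes;
# one day is 24 * 60 = 1440 minutes.
sub minutes_to_days {
    my ($minutes) = @_;
    return $minutes / 1440;
}

# min/max values that appear in the refresh_pattern lines of this post
for my $m (10080, 43200, 161280, 525948, 999999) {
    printf "%7d minutes = %.1f days\n", $m, minutes_to_days($m);
}
```

So 10080 is one week and 43200 is thirty days, while 999999 is close to two years.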

The storeurl program is as follows.

Contents of the file cacheyoutube2.pl:

#!/usr/bin/perl
$|=1;
while (<>) {
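    @X = split;
        $x = $X[0] . " ";   # channel ID plus separator, echoed at the start of every reply
        $_ = $X[1];         # request URL, matched by the rules below
        $u = $X[1];         # same URL, used for the site-name checks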

if (m/^http:\/\/([0-9.]{4}|www\.youtube\.com|.*\.googlevideo\.com|.*\.video\.google\.com).*?(videoplayback\?id=.*?|video_id=.*?)\&(.*?)/) {
        $z = $2; $z =~ s/video_id=/get_video?video_id=/; # compatible to old cached get_video?video_id
        print $x . "http://video-srv.youtube.com.SQUIDINTERNAL/" . $z . "\n";
                        # new youtube

} elsif (m/^http:\/\/([0-9.]{4}|www\.youtube\.com|.*\.googlevideo\.com|.*\.video\.google\.com).*?\&(id=[a-zA-Z0-9]*)/) {
        print $x . "http://video-srv.youtube.com.SQUIDINTERNAL/" . $2 . "\n";

} elsif (m/^http:\/\/www\.google-analytics\.com\/__utm\.gif\?.*/) {
        print $x . "http://www.google-analytics.com/__utm.gif\n";
                        #cache high latency ads
} elsif (m/^http:\/\/(.*?)\/(ads)\?(.*?)/) {
        print $x . "http://" . $1 . "/" . $2  . "\n";

                        # specific servers start here....
} elsif (m/^http:\/\/(www\.ziddu\.com.*\.[^\/]{3,4})\/(.*?)/) {
        print $x . "http://" . $1 . "\n";
                        #rapidshare
} elsif ( ($u =~ /rapidshare/) && (m/^http:\/\/(([A-Za-z]+[0-9-.]+)*?)([a-z]*\.[^\/]{3}\/[a-z]*\/[0-9]*)\/(.*?)\/([^\/\?\&]{4,})$/)) {
        print $x . "http://cdn." . $3 . "/SQUIDINTERNAL/" . $5 . "\n";

} elsif ( ($u =~ /maxporn/) && (m/^http:\/\/([^\/]*?)\/(.*?)\/([^\/]*?)(\?.*)?$/)) {
#       $z = $1; $z =~ s/[A-Za-z]+[0-9-.]+/cdn/;
        print $x . "http://" . $1 . "/SQUIDINTERNAL/" . $3 . "\n";      

                        #like porn hub: variable url and center part of the path, filename extension 3 or 4 chars, with or without ? at the end
} elsif ( ($u =~ /tube8|pornhub/) && (m/^http:\/\/(([A-Za-z]+[0-9-.]+)*?)\.([a-z]*[0-9]?\.[^\/]{3}\/[a-z]*)(.*?)((\/[a-z]*)?(\/[^\/]*){4}\.[^\/\?]{3,4})(\?.*)?$/)) {
        print $x . "http://cdn." . $3 . $5 . "\n";
                        #...specific servers end here.
                        #general purpose for cdn servers. add above your specific servers.
} elsif (m/^http:\/\/([0-9.]*?)\/\/(.*?)\.(.*)\?(.*?)/) {
        print $x . "http://squid-cdn-url//" . $2  . "." . $3 . "\n";
                        #for yimg.com
} elsif (m/^http:\/\/(.*?)\.yimg\.com\/(.*?)\.yimg\.com\/(.*?)\?(.*?)/) {
        print $x . "http://cdn.yimg.com/"  . $3 . "\n";
                        #generic http://variable.domain.com/path/filename."ext" or "exte" with or without "?"
} elsif (m/^http:\/\/(([A-Za-z]+[0-9-.]+)*?)\.(.*?)\.(.*?)\/(.*?)\.([^\/\?\&]{3,4})(\?.*)?$/) {
        print $x . "http://cdn." . $3 . "." . $4 . "/" . $5 . "." . $6 . "\n";
                        # generic http://variable.domain.com/...
} elsif (m/^http:\/\/(([A-Za-z]+[0-9-.]+)*?)\.(.*?)\.(.*?)\/(.*)$/) {
        print $x . "http://cdn." . $3 . "." . $4 . "/" . $5 .  "\n";
                        # specific extension that ends with ?
} elsif (m/^http:\/\/(.*?)\/(.*?)\.(jp(e?g|e|2)|gif|png|tiff?|bmp|ico|flv|on2)\?(.*)/) {
        print $x . "http://" . $1 . "/" . $2  . "." . $3 . "\n";
                        # all that ends with ;
} elsif (m/^http:\/\/(.*?)\/(.*?)\;(.*)/) {
        print $x . "http://" . $1 . "/" . $2  . "\n";

} else {
        print $x . $_ . "\n";
}
}

Do not forget to chmod +x the Perl file so that it can be executed.
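
With storeurl_rewrite_concurrency set, each request line sent to the helper starts with a channel ID, and the reply has to start with the same ID followed by a space. The sketch below illustrates that exchange for a single simplified videoplayback rule. It is not the full script: `rewrite_line` is a hypothetical helper, the host names are taken from the log lines quoted in the comments, and the query strings are made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: takes one "ID URL ..." request line and returns
# the "ID URL" reply that squid expects from a concurrent helper.
sub rewrite_line {
    my ($line) = @_;
    my ($id, $url) = split ' ', $line;
    # Simplified rule: any *.youtube.com videoplayback request is keyed by its id= parameter.
    if ($url =~ m@^http://[^/]*\.youtube\.com/videoplayback\?(?:.*&)?(id=[a-zA-Z0-9]+)@) {
        $url = "http://video-srv.youtube.com.SQUIDINTERNAL/$1";
    }
    return "$id $url";
}

# Same video requested from two cache hosts on two channels (made-up query strings).
print rewrite_line('0 http://v19.lscache4.c.youtube.com/videoplayback?ip=0.0.0.0&id=abc123def&itag=34'), "\n";
print rewrite_line('1 http://v5.cache8.c.youtube.com/videoplayback?id=abc123def&itag=34'), "\n";
```

A quick manual test of the real helper works the same way: type a line such as `0 http://...` on its standard input and check that the reply begins with `0` and a space.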

167 Comments

  • At 2010.02.25 10:03, human said:

    In general it is the same for every Linux distribution.

    Just make sure ClarkConnect uses squid 2.7.x and that Perl is available.

    • At 2010.03.06 23:37, mada said:

      Great, got it.

      • At 2010.05.14 16:21, bas said:

        Boss human… does it really have to be compiled by hand?
        If I use the squid from apt-get install, will it still work or not? Thank you..

        • At 2010.05.19 16:29, human said:

          My worry is that with the apt-get package the regex function could cause problems, because it is not part of squid's default build.

        • At 2010.05.14 17:45, bezt said:

          Thank you, kang human. I use squid 2.7 STABLE6 without recompiling and it worked smoothly. Before this I used the paid videocache software (although I did not pay for my copy) and ran into a bug where a video had to be requested twice before it was cached.

          But with the recipe above, whoosh… it flies, kang!
          Thank you very much.. ^^

          • At 2010.05.19 16:28, human said:

            Good to hear it is running. Do not make a habit of using pirated goods, OK … he he he

          • At 2010.08.06 02:17, masemen said:

            Please help

            1281035522.448 258366 192.168.99.1 TCP_MISS/200 21647496 GET http://v19.lscache4.c.youtube.com/videoplayback? - DIRECT/208.117.252.152 video/x-flv
            1281035572.956 1133 192.168.99.1 TCP_MISS/200 19055 GET http://www.youtube.com/watch? - DIRECT/64.233.181.93 text/html
            1281035573.688 122292 192.168.99.1 TCP_MISS/200 8171268 GET http://v5.cache8.c.youtube.com/videoplayback? - DIRECT/74.125.209.244 video/x-flv
            1281035574.013 298 192.168.99.1 TCP_MISS/204 462 GET http://www.youtube.com/get_video? - DIRECT/64.233.181.93 text/html
            1281035574.169 932 192.168.99.1 TCP_MISS/204 260 GET http://v22.lscache3.c.youtube.com/generate_204? - DIRECT/208.117.241.33 text/html
            1281035589.145 1021 192.168.99.1 TCP_MISS/200 20486 GET http://www.youtube.com/watch? - DIRECT/64.233.181.93 text/html
            1281035590.070 315 192.168.99.1 TCP_MISS/204 462 GET http://www.youtube.com/get_video? - DIRECT/64.233.181.93 text/html
            1281035590.371 962 192.168.99.1 TCP_MISS/204 260 GET http://v14.lscache1.c.youtube.com/generate_204? - DIRECT/204.246.232.32 text/html
            ^C

            How come there are no HITs at all?
            Can anyone help?

            • At 2010.09.17 14:07, wisnu said:

              Is there a way to make cached objects live longer.. and which setting controls that? Thanks

                • At 2010.12.14 17:33, pLuTo said:

                  How do I update to squid 2.7 STABLE3.. I am still using ClearOS with squid 2.6.. I want to set up YouTube caching, please help… :(

                  • At 2010.12.17 07:32, dani said:

                    Bro, please help… what is the correct way to set up delay pools on a ClearOS 5.1 server? It keeps failing, bro. I want to limit downloads and handle YouTube streaming… I feel sorry for the people playing PB who get lag… oh, and I use a 2 Mb Speedy line…. thanks bro…

                    • At 2011.03.26 10:42, mr x said:

                      acl video_youtube dstdomain .youtube.com
                      delay_pools 1
                      delay_class 1 2
                      delay_parameters 1 -1/-1 8000/1000000
                      delay_access allow video_youtube
                      delay_access deny all

                    • At 2011.02.16 10:13, xander said:

                      My squid does not work either; I followed the steps but still get lots of errors. What is wrong? I use squid Version 2.7.STABLE6. Please repost it so it is less complicated.

                      • At 2011.03.01 11:59, aldi said:

                        I would like to ask whether the squid example above can still cache YouTube. It used to work, but I just noticed that YouTube is no longer being cached at my site. Thanks

                        • At 2011.03.26 11:07, mr x said:

                          #############################################
                          acl youtube dstdomain .youtube.com
                          cache allow youtube

                          ## or, as simple as just ##

                          cache allow all
                          #############################################

                          ## if squid has not been patched, add ##
                          minimum_object_size 512 bytes

                          ## make sure maximum_object_size is set appropriately ##
                          maximum_object_size 2 GB

                          ####################################################################
                          ## Note: the script shown by human may only work for old YouTube videos !!
                          ####################################################################

                          • At 2011.04.03 10:13, human said:

                            Thanks for the info, mr x. So now it is that simple? Which squid version are you using?

                            It is true that I have not updated the script on my blog.
                            I no longer have a proxy machine to tinker with.
                            Right now I am being made to work on poip-related things.

                          • At 2012.04.04 11:00, ifdall said:

                            If the info is out of date, just delete it, bro…
